Run function (featureCounts) Over Multiple Collections

CWunder · December 17, 2018, 8:34pm

I am following along with the de-novo transcript analysis tutorial here. I have been using collections to organize the data as I go (1 collection for Megakaryocyte data and 1 for G1E data).

The collections were not a problem until I needed to run featureCounts() on the alignment files from all 4 samples (the 2 MegaK replicates with the 2 G1E replicates). It does not appear that featureCounts() will allow you to select multiple collections for analysis (i.e., it only allows me to choose either the MegaK collection or the G1E collection, but not both). I also cannot figure out how to ‘unpack’ the collections into individual files so that I can run all four files together.

Any help on either 1) running two collections together or 2) ‘unpacking’ the collections so that I can submit the individual files together, would be greatly appreciated.

CWunder · December 17, 2018, 9:36pm

I seem to have answered my own question. I will leave this here in case anyone else has the same question in the future.

To run multiple collections together I used the Merge Collections function to merge my two collections into one, then selected the merged collection as the input to featureCounts.
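For anyone who finds it easier to think about this in code, here is a rough conceptual sketch of what merging does. It is not Galaxy's implementation: it just models a collection as a mapping from element name to file, the element names are made up, and raising an error on a duplicate name is my own assumption for the sketch (the actual tool may handle name clashes differently).

```python
# Conceptual model only, not Galaxy code: a collection is treated as an
# ordered mapping from element identifier to dataset (here, a file name).
def merge_collections(*collections):
    merged = {}
    for collection in collections:
        for identifier, dataset in collection.items():
            # Assumption for this sketch: identifiers must be unique.
            if identifier in merged:
                raise ValueError(f"duplicate element identifier: {identifier}")
            merged[identifier] = dataset
    return merged


# Hypothetical element names for the four alignment files.
megak = {"MegaK_rep1": "MegaK_rep1.bam", "MegaK_rep2": "MegaK_rep2.bam"}
g1e = {"G1E_rep1": "G1E_rep1.bam", "G1E_rep2": "G1E_rep2.bam"}

merged = merge_collections(megak, g1e)
print(list(merged))
```

The merged result is a single flat collection holding all four alignments, which is the shape featureCounts accepted as one input in my case.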

To split collections into individual files (or subgroups of files) I used the Filter List (from a text file) command. I found that I needed the exact file names for the filtering to work; partial names were not matched. To separate a collection into individual files I called Filter List multiple times, using a single file name each time. I found the easiest way to make the text files was to click the upload button, select the Paste/Fetch Data button in the popup display, and then paste in the file names I wanted to extract.
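The exact-name behaviour I ran into can be pictured with a small sketch. Again, this is only a conceptual model under my own assumptions (one name per line in the text file, whole-name comparison, made-up element names), not how Galaxy implements the tool.

```python
# Conceptual model only, not Galaxy code: keep the elements whose
# identifier appears, as a whole line, in the pasted text.
def filter_collection(collection, identifiers_text):
    wanted = {
        line.strip()
        for line in identifiers_text.splitlines()
        if line.strip()
    }
    # Exact comparison: a partial name such as "MegaK" selects nothing.
    return {name: ds for name, ds in collection.items() if name in wanted}


# Hypothetical element names for a collection of four alignments.
alignments = {
    "MegaK_rep1": "MegaK_rep1.bam",
    "MegaK_rep2": "MegaK_rep2.bam",
    "G1E_rep1": "G1E_rep1.bam",
    "G1E_rep2": "G1E_rep2.bam",
}

single = filter_collection(alignments, "MegaK_rep1\n")
partial = filter_collection(alignments, "MegaK\n")
subgroup = filter_collection(alignments, "G1E_rep1\nG1E_rep2\n")
```

Here `single` holds one element, `subgroup` holds the two G1E replicates, and `partial` is empty because "MegaK" is not a complete element name.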

jennaj · December 18, 2018, 1:07am

What you describe will work but is not necessary for all cases.

An alternative is to run each of the collections through tools independently. This version of nearly the same tutorial uses collections in this way, as an example: https://galaxyproject.org/tutorials/nt_rnaseq/. Note that the collections there were originally grouped in a way that made advanced manipulations unnecessary.

Once you have an analysis completed, a workflow can be extracted (and edited). Then it won’t matter so much how many steps there are. All workflow steps will be started up at the same time and executed in order. Intermediate data can be hidden, data can be renamed to be meaningful, and importantly, how collection data is grouped/ungrouped to move it through tools will be for you to decide.
