Run function (featureCounts) Over Multiple Collections

I am following along with the de novo transcript analysis tutorial here. I have been using collections to organize the data as I go (1 collection for Megakaryocyte data and 1 for G1E data).

The collections were not a problem until I needed to run featureCounts() on the alignment files from all 4 samples (the 2 MegaK replicates and the 2 G1E replicates). featureCounts() does not appear to allow selecting multiple collections (i.e., it lets me choose either the MegaK collection or the G1E collection, but not both). I also cannot figure out how to 'unpack' the collections into individual files so that I can run all four files together.

Any help on either 1) running two collections together or 2) ‘unpacking’ the collections so that I can submit the individual files together, would be greatly appreciated.

I seem to have answered my own question. I will leave this here in case anyone else has the same question in the future.

To run multiple collections together I used the Merge Collections function to merge my two collections into one, then selected the merged collection as the input to featureCounts.
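To make the merge step concrete, here is a small conceptual model in Python. It is not Galaxy code: it assumes a list collection can be thought of as an ordered mapping of element identifier to dataset, and the identifiers and file names are hypothetical. It also assumes identifiers are unique across the two inputs and simply rejects duplicates, rather than modeling any conflict-handling options the real Merge Collections tool may provide.

```python
# Conceptual model only (not Galaxy code): a list collection is treated as an
# ordered mapping of element identifier -> dataset file name.
def merge_collections(*collections):
    """Combine list collections into one, keeping elements in input order.

    Assumption: element identifiers are unique across all inputs; this sketch
    raises an error on a duplicate instead of resolving the conflict.
    """
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for collection in collections:
        for identifier, dataset in collection.items():
            if identifier in merged:
                raise ValueError(f"duplicate element identifier: {identifier}")
            merged[identifier] = dataset
    return merged


# Hypothetical identifiers and file names, for illustration only.
megak = {"MegaK_R1": "megak_r1.bam", "MegaK_R2": "megak_r2.bam"}
g1e = {"G1E_R1": "g1e_r1.bam", "G1E_R2": "g1e_r2.bam"}

# One merged collection holding all four alignments, which is the single
# input that featureCounts is then pointed at.
all_samples = merge_collections(megak, g1e)
print(list(all_samples))
```

Running this prints the four identifiers in order, MegaK replicates first, then G1E replicates.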

To split collections into individual files (or subgroups of files), I used the Filter List (from a text file) command. The file names had to match exactly for the filtering to work; partial names were not matched. To separate a collection into individual files, I ran Filter List multiple times, giving it a single file name each time. The easiest way I found to make the text files was to click the Upload button, select Paste/Fetch Data in the popup, and paste in the names of the files I wanted to extract.
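The filtering behavior described above (exact names only, one run per file to pull out singles) can be sketched as a conceptual Python model. This is not Galaxy's implementation: the collection is again modeled as a mapping of element identifier to dataset, the pasted text file is assumed to hold one name per line, trimming whitespace from each line is an assumption of this sketch, and returning the non-matching remainder is just a convenience here. All identifiers are hypothetical.

```python
# Conceptual model only (not Galaxy code) of filtering a list collection by a
# pasted text file of element identifiers, one per line.
def parse_identifier_file(text):
    """Return the non-empty lines of a pasted name list.

    Assumption: surrounding whitespace on each line is ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def filter_collection(collection, identifiers):
    """Split a collection into (matched, unmatched) by exact identifier match.

    Matching is whole-name equality, mirroring the behavior reported in the
    post: a partial name such as "MegaK" does not select "MegaK_R1".
    """
    wanted = set(identifiers)
    matched = {k: v for k, v in collection.items() if k in wanted}
    unmatched = {k: v for k, v in collection.items() if k not in wanted}
    return matched, unmatched


# Hypothetical collection, for illustration only.
samples = {
    "MegaK_R1": "megak_r1.bam",
    "MegaK_R2": "megak_r2.bam",
    "G1E_R1": "g1e_r1.bam",
    "G1E_R2": "g1e_r2.bam",
}

# A subgroup chosen by a pasted list of exact names.
kept, rest = filter_collection(samples, parse_identifier_file("MegaK_R1\nG1E_R2\n"))

# A partial name matches nothing.
partial, _ = filter_collection(samples, ["MegaK"])

# One filter run per name yields single-element collections.
singles = {name: filter_collection(samples, [name])[0] for name in samples}
```

Here `kept` holds MegaK_R1 and G1E_R2, `partial` is empty, and each entry of `singles` is a one-element collection, which corresponds to calling Filter List once per file name.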

What you describe will work but is not necessary for all cases.

An alternative is to run each collection through the tools independently. This version of nearly the same tutorial uses collections that way, as an example: https://galaxyproject.org/tutorials/nt_rnaseq/. Note that its collections were grouped from the start in a way that made advanced manipulations unnecessary.

Once an analysis is complete, a workflow can be extracted from it (and edited). Then the number of steps matters much less: all workflow steps are started at the same time and executed in order. Intermediate data can be hidden, datasets can be renamed to be meaningful, and, importantly, you decide how collection data is grouped or ungrouped to move it through tools 🙂