How to use Falco on a dataset collection

I’m working on bulk RNA-seq data. I simply created a list with the pairs of each sample (forward and reverse) and then I used the “flatten collection” tool to create a simple list. Next step is to use the Falco tool for quality control. I wanted to apply the tool on this flattened collection, so I clicked on “dataset collection” and selected the file, but it doesn’t work. This is the message that appears:

Parameter ‘input_file’: dataset collection supplied to single input dataset parameter; to run the tool over each element of the collection, use the map-over option.

I don’t know what this means and there is no “map-over option”. The data inside the collection are fastqsanger.gz.

What can I do?

Welcome @maria.vittoria

Hopefully we can help!

The “mapping over” refers to how collection elements are processed one by one. A longer description is in a hover-over message in the warning. I’ve copied it below. It is pretty technical but the general interpretation is that some tools can consume a List of Pairs collection shape (paired-end) and others expect a simple Flat List shape (single end or a split paired-end).

When a tool consumes a dataset but is run with a collection, the tool is mapped over the collection. This means instead of just running the tool once - the tool will be run once for each element of the provided collection. Additionally, the outputs of the tool will be collected into a collection that matches the structure of the provided collection. This matching structure means the output collections will have the same element identifiers as the provided collection and they will appear in the same order.
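As a rough illustration of the paragraph above (not Galaxy's actual implementation), mapping over a flat list can be sketched in plain Python. The names `map_over` and `fake_qc`, the file names, and the element identifiers are all hypothetical; a dict stands in for a collection because it keeps its insertion order.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a flat list collection:
# element identifier -> dataset (here just a file name).
FlatList = Dict[str, str]


def map_over(tool: Callable[[str], str], collection: FlatList) -> FlatList:
    """Run `tool` once per element, keeping identifiers and order."""
    return {identifier: tool(dataset) for identifier, dataset in collection.items()}


def fake_qc(dataset: str) -> str:
    """Placeholder for a per-dataset tool such as Falco; returns a report name."""
    return f"report_for_{dataset}"


reads = {
    "sampleA_forward": "A_1.fastqsanger.gz",
    "sampleA_reverse": "A_2.fastqsanger.gz",
    "sampleB_forward": "B_1.fastqsanger.gz",
}

# One "job" per element; the output collection mirrors the input structure.
reports = map_over(fake_qc, reads)
print(list(reports))
```

The point of the sketch is only that one submission yields one job per element, and that the output collection reuses the input's identifiers in the same order.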

It is easiest to visualize “mapping over” a collection in the context of a tool that consumes a dataset and produces a dataset, but the semantics apply rather naturally to tools that consume collections or produce collections as well.

For instance, consider a tool that consumes a paired collection and produces an output dataset. If a list of paired collections (collection type list:paired) is passed to the tool - it will produce a flat list (collection type list) of output datasets with the same number of elements in the same order as the provided list of paired collections.

In the case of outputs, consider a tool that takes in a dataset and produces a flat list. If this tool is run over a flat list of datasets - that list will be “mapped over” and each element will produce a list. These lists will be gathered together in a nested list structure (collection type list:list) where the outer element count and structure matches that of the input and the inner list for each of those is just the outputs of the tool for the corresponding element of the input.
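The two structural cases described above can be sketched the same way. This is only an illustrative model in plain Python, assuming dicts as collections and tuples as pairs; `count_pair` and `split_dataset` are made-up tools, not real Galaxy tools.

```python
from typing import Any, Callable, Dict, Tuple

Pair = Tuple[str, str]  # (forward, reverse)


def map_over(tool: Callable[[Any], Any], collection: Dict[str, Any]) -> Dict[str, Any]:
    """Run `tool` once per outer element, keeping identifiers and order."""
    return {identifier: tool(element) for identifier, element in collection.items()}


def count_pair(pair: Pair) -> str:
    """Hypothetical tool: consumes one pair, produces one dataset."""
    forward, reverse = pair
    return f"summary({forward},{reverse})"


def split_dataset(dataset: str) -> Dict[str, str]:
    """Hypothetical tool: consumes one dataset, produces a flat list."""
    return {f"chunk{i}": f"{dataset}.part{i}" for i in (1, 2)}


list_paired = {"sampleA": ("A_1.fq", "A_2.fq"), "sampleB": ("B_1.fq", "B_2.fq")}
flat_list = {"sampleA": "A.fq", "sampleB": "B.fq"}

# list:paired in, one dataset per pair out -> flat list
summaries = map_over(count_pair, list_paired)

# list in, one flat list per element out -> nested list:list
nested = map_over(split_dataset, flat_list)

print(summaries)
print(nested)
```

In both cases the outer identifiers (`sampleA`, `sampleB`) survive unchanged; only the inner shape of each element differs.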

How this works

Falco, like FastQC, is a per-sequence operation and expects a Flat List collection type. The input collection will be labeled with just List.

After running Flatten Collection on a List of Pairs shape (the collection itself will be labeled with just Pairs), the reads inside it will have “forward” and “reverse” added to their element identifiers, and the result will be a Flat List shape labeled with just List.
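To make the effect on element identifiers concrete, here is a small sketch of what flattening does conceptually. It is not the Flatten Collection tool's code; the `flatten` function, the sample names, and the underscore separator are assumptions for illustration.

```python
from typing import Dict


def flatten(list_paired: Dict[str, Dict[str, str]], separator: str = "_") -> Dict[str, str]:
    """Turn a list of pairs into a flat list.

    The separator joining the sample name and the read direction is an
    assumption for this sketch.
    """
    flat: Dict[str, str] = {}
    for sample, pair in list_paired.items():
        for direction in ("forward", "reverse"):
            flat[f"{sample}{separator}{direction}"] = pair[direction]
    return flat


list_paired = {
    "sampleA": {"forward": "A_1.fastqsanger.gz", "reverse": "A_2.fastqsanger.gz"},
    "sampleB": {"forward": "B_1.fastqsanger.gz", "reverse": "B_2.fastqsanger.gz"},
}

flat = flatten(list_paired)
for identifier, dataset in flat.items():
    print(identifier, dataset)
```

Each pair contributes two elements to the flat list, so a per-file tool such as Falco would then run once per reads file.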

What to do

Double-check that the collection being input has the List annotation, or expand the collection to inspect its elements. Next, make sure you are selecting the correct collection when running the tool. Does the message still appear when you do this? Does it prevent you from starting the tool?

Later on, you can put all of these steps into a little workflow for reuse!

More help is in this topic, and you could swap in Falco for FastQC in the demonstration workflow. → Quality Control Start Here! multQC issue and guidance? - #2 by jennaj

Let us know how this goes! If you would like more help, it is difficult to guess further without seeing the data, so please generate a share link to your history and post it back for troubleshooting feedback! See → https://training.galaxyproject.org/training-material/faqs/galaxy/histories_sharing.html

I’m also running a quick independent test to see if this is working as expected! More soon. :slight_smile:

Update! It seems that all tool forms are having trouble consuming collections at UseGalaxy.eu right now. So, not just Falco. They likely know about this already but let’s ping one of the administrators.

Hi @wm75 I’m not able to select a collection input. All Ok?

Testing history with different collection shapes

No, we weren’t aware until now! Thanks for tagging me @jennaj

We’re on it, trying to understand how this happened.

Ok, as far as we understand this at the moment, the update we deployed yesterday, which was supposed to bring bug fixes for release 26.0, shipped with broken collection handling in the tool interface.

It looks like this got fixed upstream already, but now we need another update to capture that fix.

Hopefully, tomorrow everything will be working again.

Very sorry for the inconvenience @maria.vittoria and thanks a lot for reporting the problem.

Update!

Yes, the root cause was a small problem with how some newer warnings are included on tool forms (to warn if the collection shape is a poor choice for the tool). Details → [26.0] Allow batch-wrapped HDCA on single data param in from_json by mvdbeek · Pull Request #22421 · galaxyproject/galaxy · GitHub

But I have GOOD NEWS! The correction has been applied at UseGalaxy.org (confirmed) and it seems this was also applied to UseGalaxy.eu (we can confirm this tomorrow).

Would you like to try again at the EU server, @maria.vittoria, to see if it works for you now as well? Choosing the flattened collection should be possible now (it was for me!). Please let us know how this goes! And thank you again for reporting the problem: unexpected, but a really good catch! We are finalizing the next release, and feedback about actual usage is helpful. If you or anyone else discovers anything else odd, now or later, please do open a new topic and ask! :rocket:

Yes, the fix was deployed on Galaxy Europe overnight and it seems to work.

@jennaj @wm75 Thank you so much, you were extremely helpful!! The bug is indeed fixed and Falco is now working normally!