Flattening List of List of Pairs to List of Pairs

Hi there

I have used the fasterq-dump tool over a collection of input files (each input file has a list of accessions to download) to retrieve some data from NCBI SRA. The output is a list of lists of pairs. I would like to convert this into a flat list, i.e. simply a list of pairs. How can I do this?

Thanks!

I’m a bit surprised you got this, but here’s what I would try:

  1. Flatten collection
    Super-easy in case it works, but I’m not sure it will do the right thing.
  2. If it doesn’t: try Apply Rule to Collection, which is the rule-based uploader in tool form.
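
To make the goal concrete, here is a minimal Python sketch of the reshaping being asked for: a list of lists of pairs becomes a single list of pairs. It only illustrates the data shape and is not how Flatten collection or Apply Rule to Collection are implemented. The function name, the `_` separator, and the sample identifiers and file names are all hypothetical.

```python
from typing import Dict, Tuple

# A pair is modelled as (forward, reverse) file names; purely illustrative.
Pair = Tuple[str, str]


def flatten_list_of_lists_of_pairs(
    nested: Dict[str, Dict[str, Pair]], sep: str = "_"
) -> Dict[str, Pair]:
    """Collapse outer/inner identifiers into one level, keeping pairs intact."""
    flat: Dict[str, Pair] = {}
    for outer_id, inner in nested.items():
        for inner_id, pair in inner.items():
            # Join both identifiers so the origin of each pair stays visible.
            key = f"{outer_id}{sep}{inner_id}"
            if key in flat:
                raise ValueError(f"duplicate flattened identifier: {key}")
            flat[key] = pair
    return flat


# Hypothetical input: two accession lists, each yielding paired reads.
nested = {
    "batch1": {"sampleA": ("A_1.fastq", "A_2.fastq")},
    "batch2": {
        "sampleB": ("B_1.fastq", "B_2.fastq"),
        "sampleC": ("C_1.fastq", "C_2.fastq"),
    },
}

flat = flatten_list_of_lists_of_pairs(nested)
for identifier, (forward, reverse) in flat.items():
    print(identifier, forward, reverse)
```

The point of the sketch is that the pairing survives while only the outer nesting level disappears, which is the property to check for in whatever a Galaxy tool produces.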

Neither Flatten collection nor Apply Rule To Collection allows me to select the nested collection.

If that is true, then, most likely, the download failed for some datasets. Try Filter failed on the collection and see if the result becomes selectable by the other tools.

Filter Failed only takes list or list:paired input: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tools/filter_failed_collection.xml#L10

The Apply Rule To Collection and Flatten tools have no obvious limit to their inputs, but also do not seem to offer nested collections as inputs. Perhaps this is a bug.

This is not what I’m observing. All three tools work just fine for me on nested lists.
Can you maybe share an example history with a problematic collection?

I have shared my GBRC history with you on usegalaxy.eu (using your uni-freiburg.de email)

Hmm, there seems to be something wrong with these nested collections.
At least in my copy of your shared history there just don’t seem to be any datasets inside of them. Also the dump logs are all empty. If that’s the same for you, it might explain your downstream problems.

Thank you! I will investigate what might have gone wrong.

I tried again with a smaller set of data, simply this list:

SRR11810704
SRR11810705

and used the Download and Extract Reads in FASTA/Q tool. The output is an empty list, which is surprising. It should be a list of 2 fastq datasets (this is single-end data).

Oops, you’re right. Turns out EU doesn’t have version 2.10.7+galaxy2 of fasterq-dump yet, which brings the bug fix for the accession list mode.

We’ll update ASAP, and thanks for reporting!

As a workaround, you should be able to use v 2.10.7+galaxy0 for now.
I think the +galaxy1 version only introduced support for an sra_manifest.tabular datatype and the bug :wink:

Version 2.10.7+galaxy2 of all sra_tools is available on EU, meaning the latest version of fasterq-dump works as intended now.
Thanks again, Peter!