How to create a paired dataset collection from files that are already paired

I tried Create a collection from a list of datasets, but it fails with this error:

The following selections could not be included due to problems:

  • Paired-end data (fastq-dump) is a collection, this is not allowed
  • Paired-end data (fastq-dump) is a collection, this is not allowed

At least one element is needed for the collection. You may need to cancel and reselect new elements.

Could anybody let me know how to do that in Galaxy?
Thank you,

Hi @crisprucd

Try using Collection tools to transform.

If your data is currently in two collection “lists” (single end each) – one for forward reads and one for reverse reads – create a paired end collection using the tool Zip collections.

If that is not a match for what you want to do, what kind of collections do you already have? There is probably a tool for it. Modifying collection types is super important, and used all the time.

Technically, you could unhide all of the current collection’s datasets and then start over from scratch, but the only reason to do that is if the original collection groupings were incorrect for some reason.

The collection tutorials at https://training.galaxyproject.org cover this, and the tool forms have short descriptions.

Hello Jennaj,
Thank you for the help. I downloaded 6 SRR files using Download and Extract Reads in FASTQ format from NCBI SRA. Each download produced two items in the history: a Single-end data (fastq-dump) collection, which is empty, and a Paired-end data (fastq-dump) collection. I want to combine the paired files into one collection.
I tried a few ways in Galaxy, but without success.

The output from fasterq_dump is in collections already.

The paired end collection output will contain one pair per accession.

Did you fetch the accessions all in one job? Or did you run that tool one time per accession? You can input a list instead to get them all at once.

If you ran the tool once per accession, there are still ways to manipulate what you have: Unzip collection → Merge collections (twice, once per end) → then Zip collection. The “merge” expects list collections. There are reasons around this that are too much to put here…
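The Unzip → Merge → Zip sequence can be pictured with a small conceptual model. This is only a sketch: plain Python dicts stand in for collections, the functions are not Galaxy APIs, and the accessions and file names are made up for illustration.

```python
# Conceptual model only: plain dicts stand in for Galaxy collections.
# These functions are NOT Galaxy APIs; all names and data are hypothetical.

def unzip(list_of_pairs):
    """Split a list of pairs into a forward list and a reverse list."""
    forward = {name: pair["forward"] for name, pair in list_of_pairs.items()}
    reverse = {name: pair["reverse"] for name, pair in list_of_pairs.items()}
    return forward, reverse

def merge(lists):
    """Combine several list collections (identifier -> dataset) into one list."""
    merged = {}
    for lst in lists:
        for identifier, dataset in lst.items():
            if identifier in merged:
                raise ValueError(f"duplicate element identifier: {identifier}")
            merged[identifier] = dataset
    return merged

def zip_lists(forward, reverse):
    """Pair two lists element by element into one list of pairs."""
    if list(forward) != list(reverse):
        raise ValueError("forward and reverse lists must have matching identifiers")
    return {name: {"forward": forward[name], "reverse": reverse[name]}
            for name in forward}

# One paired output per separate download job (hypothetical accessions).
job_outputs = [
    {"SRR0000001": {"forward": "SRR0000001_1.fastq", "reverse": "SRR0000001_2.fastq"}},
    {"SRR0000002": {"forward": "SRR0000002_1.fastq", "reverse": "SRR0000002_2.fastq"}},
]

unzipped = [unzip(output) for output in job_outputs]   # Unzip, once per job output
all_forward = merge(fwd for fwd, _ in unzipped)        # Merge the forward lists
all_reverse = merge(rev for _, rev in unzipped)        # Merge the reverse lists
combined = zip_lists(all_forward, all_reverse)         # Zip into one list of pairs
print(sorted(combined))
```

The merge step runs twice because each end is handled as its own list; the zip step only works when both lists carry the same element identifiers in the same order.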

If that does not help enough, maybe some screenshots or a shared history will help me to understand what is going on.

I tried Paste/Fetch data for the 6 SRR files together and it didn’t go through. So I had to use the tool Download and Extract Reads in FASTQ format from NCBI SRA, running it once per accession. Now I am trying to use FASTQ Splitter to separate them into single-end files, and then I will try Zip collection to put them together.
Thanks,

The list of accessions should be in a plain text file. Just the SRR or ERR identifiers with no extra content, one per line.

You can create that with the Upload tool, or load up a file you created on your computer. You can even edit it more once in your history: click on the “Visualize” icon for the dataset and search with the keyword “edit” to find the function.
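If you prepare the accession list on your own computer, a short script can strip stray whitespace and catch malformed entries before you upload the file. This is a minimal sketch: the function names, the example accessions, and the output file name `accessions.txt` are hypothetical, and the pattern only checks the general SRR/ERR/DRR shape of a run accession.

```python
import re

# Run accessions look like SRR, ERR, or DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")

def clean_accessions(raw_lines):
    """Return unique, validated accessions in their original order."""
    seen, cleaned = set(), []
    for line in raw_lines:
        accession = line.strip()          # drop spaces, tabs, and newlines
        if not accession:
            continue                      # skip blank lines
        if not RUN_ACCESSION.fullmatch(accession):
            raise ValueError(f"not a run accession: {accession!r}")
        if accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    return cleaned

def write_accession_file(accessions, path):
    """Write one accession per line, with no extra content."""
    with open(path, "w", newline="\n") as handle:
        handle.write("\n".join(accessions) + "\n")

# Hypothetical input with stray whitespace, a blank line, and a duplicate.
raw = ["SRR0000001 ", "\tSRR0000002", "", "SRR0000001", "ERR0000003\n"]
cleaned = clean_accessions(raw)
print(cleaned)

if __name__ == "__main__":
    write_accession_file(cleaned, "accessions.txt")
```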

That sounds good except for the FASTQ Splitter part. That is for paired reads that are joined into the same sequence. These reads are in separate sequences but interleaved/interlaced. Try using FASTQ de-interlacer on paired end reads (Galaxy Version 1.1.5) instead. Hope it works out.
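To make "interleaved" concrete: in an interleaved FASTQ file, each forward record is followed directly by its reverse mate, so de-interlacing means sending alternating records to two outputs. The sketch below is not the Galaxy tool's implementation. It assumes exactly 4 lines per record and strict forward/reverse alternation, and the read names are invented.

```python
def deinterlace(lines):
    """Split interleaved FASTQ lines into (forward_lines, reverse_lines).

    Assumes 4 lines per record and strict forward/reverse alternation.
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 8 != 0:
        raise ValueError("expected whole forward/reverse record pairs (8 lines each)")
    forward, reverse = [], []
    for start in range(0, len(lines), 8):
        forward.extend(lines[start:start + 4])       # first record of the pair
        reverse.extend(lines[start + 4:start + 8])   # its mate
    return forward, reverse

# Hypothetical interleaved input: two read pairs, mates adjacent.
interleaved = """@read1/1
ACGT
+
IIII
@read1/2
TTGG
+
IIII
@read2/1
GGCC
+
IIII
@read2/2
AATT
+
IIII
""".splitlines()

forward, reverse = deinterlace(interleaved)
print(forward[0], reverse[0])
```

A joined read, by contrast, would hold both mates in a single sequence line, which is the case a splitter tool addresses.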

Thank you so much.
I will correct that.

Hi Jennaj,
I unzipped the individual collections, and the forward and reverse reads are now separated. Then I tried to use Zip collection to put them together directly, but the multiple datasets icon does not recognize the files. Should I first use a tool to put the forward files and the reverse files into 2 collections, so that Zip collection can then combine the forward collection and the reverse collection into one?
Thank you,

I’m confused, maybe misunderstanding. Unzipping a list collection does not create two files. A list collection contains only one list, not two paired lists, so there is nothing to unzip.

What you are trying to do is super tedious. While getting the data reorganized is certainly possible (a few different ways, not just what we discussed so far), it seems well worth your time to start over. Learning how to do this will serve you now, and going forward.

Try retrieving the sequences from NCBI again. Put all of the accessions to retrieve into a simple text file (one accession per line, no extra spaces or tabs), use the tool Faster Download and Extract Reads in FASTQ format from NCBI SRA, and select that file with the accession list instead of entering accessions directly on the form. Everything will be sorted correctly into collections in a usable format with zero chance of any mixups.

Hi Jennaj,

Thank you for the suggestions.

The forward reads and reverse reads are paired for each sample. However, Zip collection can’t recognize them, so it can’t combine them into one collection.

I tried FASTQ de-interlacer to separate them into individual forward and reverse files, but it didn’t go through.

So I tried Unzip collection, which seemed to work and separated them into several forward and reverse files.

Then I tested Zip collection, but it does not recognize the files through the multiple datasets option. The separated files are still treated as individual collections by Zip collection. I may have used the Zip collection tool the wrong way; it may only work on 2 separate collections, which it then combines.

I will try the Faster Download and Extract Reads in FASTQ format from NCBI SRA tool to download the datasets together. But if we are in the same situation again, with only individual paired files, how can we combine them into one collection?

Thank you for your time.

If data are ever in a collection, organized in a way that you don’t want to use, and the collection manipulation tools are not working for some reason, you can try this:

  1. Go into the Hidden tab of your history
  2. Unhide the datasets you want to restructure
  3. Create new collections
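When creating the new collections in step 3, the key task is matching each forward dataset with its reverse mate. The matching logic can be sketched as below. This is an illustration only: the `_1`/`_2` suffixes and the dataset names are assumptions, and the names in your history may follow a different pattern (for example, forward/reverse).

```python
from collections import defaultdict

def pair_by_suffix(names, fwd_suffix="_1", rev_suffix="_2"):
    """Group dataset names into {sample: {"forward": ..., "reverse": ...}}.

    The suffixes are assumptions; adjust them to match your dataset names.
    """
    pairs = defaultdict(dict)
    for name in names:
        if name.endswith(fwd_suffix):
            pairs[name[:-len(fwd_suffix)]]["forward"] = name
        elif name.endswith(rev_suffix):
            pairs[name[:-len(rev_suffix)]]["reverse"] = name
        else:
            raise ValueError(f"cannot tell which end this is: {name}")
    incomplete = sorted(s for s, ends in pairs.items()
                        if set(ends) != {"forward", "reverse"})
    if incomplete:
        raise ValueError(f"samples missing a mate: {incomplete}")
    return dict(pairs)

# Hypothetical names of unhidden datasets.
datasets = ["SRR0000001_1", "SRR0000002_2", "SRR0000001_2", "SRR0000002_1"]
paired = pair_by_suffix(datasets)
print(paired)
```

Checking for incomplete pairs before building the collection guards against a forward file silently ending up without its mate.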

Collections are like folders. The files inside them are datasets. It is fine to delete collection folders that you don’t need – just be sure not to delete the dataset files inside of collections or you will need to upload the data again.

Thank you,
