How to create a paired dataset collection from files that are already paired

crisprucd · July 27, 2023, 3:58pm

I tried Create a collection from a list of datasets, then the error shows:

The following selections could not be included due to problems:

Paired-end data (fastq-dump) is a collection, this is not allowed
Paired-end data (fastq-dump) is a collection, this is not allowed

At least one element is needed for the collection. You may need to cancel and reselect new elements.

Could anybody let me know how to do that in Galaxy?
Thank you,

jennaj · July 27, 2023, 4:47pm

Hi @crisprucd

Try using Collection tools to transform.

If your data is currently in two collection “lists” (single end each) – one for forward reads and one for reverse reads – create a paired end collection using the tool Zip collections.

If that is not a match for what you want to do, what kind of collections do you already have? There is probably a tool for it. Modifying collection types is super important, and used all the time.

Technically, you could unhide all of the current collection’s datasets and then start over from scratch, but the only reason to do that is if the original collection groupings were incorrect for some reason.

There are collection tutorials at https://training.galaxyproject.org, and the tool forms have short descriptions too.

crisprucd · July 27, 2023, 5:13pm

Hello Jennaj,
Thank you for the help. I downloaded 6 SRR files using Download and Extract Reads in FASTQ format from NCBI SRA. Each download shows two items in the history: a Single-end data (fastq-dump) collection, which is empty, and a Paired-end data (fastq-dump) collection. I want to combine the paired files into one collection.
I tried a few ways in Galaxy, but had no success.

jennaj · July 27, 2023, 6:36pm

The output from fasterq_dump is in collections already.

The paired end collection output will contain one pair per accession.

Did you fetch the accessions all in one job? Or did you run that tool one time per accession? You can input a list instead to get them all at once.

If you ran the tool one time per accession, there are still ways to manipulate what you have: Unzip collection → Merge collections (twice, once per end) → then Zip collection. The “merge” expects list collections. There are reasons around this that are too much to put here…

If that does not help enough, maybe some screenshots or a shared history will help me to understand what is going on.
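As a rough mental model of that Unzip → Merge → Zip chain, here is a small sketch in plain Python. It is not Galaxy code and does not call any Galaxy API; the accession and file names are made up, and the choice to reject duplicate identifiers is an assumption of this toy model rather than a description of what the Merge collections tool does.

```python
# Toy model of collection operations; NOT Galaxy's implementation.
# A "list of pairs" is modeled as {identifier: {"forward": ..., "reverse": ...}}.
# All accession and file names below are hypothetical.

def unzip(pairs):
    """Split a list of pairs into a forward list and a reverse list."""
    forward = {name: pair["forward"] for name, pair in pairs.items()}
    reverse = {name: pair["reverse"] for name, pair in pairs.items()}
    return forward, reverse


def merge(*lists):
    """Merge several list collections into one (toy rule: no duplicate identifiers)."""
    merged = {}
    for collection in lists:
        for name, dataset in collection.items():
            if name in merged:
                raise ValueError(f"duplicate element identifier: {name}")
            merged[name] = dataset
    return merged


def zip_lists(forward, reverse):
    """Combine a forward list and a reverse list into one list of pairs."""
    if list(forward) != list(reverse):
        raise ValueError("forward and reverse lists do not line up")
    return {name: {"forward": forward[name], "reverse": reverse[name]} for name in forward}


# Two separate download jobs, one accession each.
run_a = {"SRR0000001": {"forward": "SRR0000001_1.fastq", "reverse": "SRR0000001_2.fastq"}}
run_b = {"SRR0000002": {"forward": "SRR0000002_1.fastq", "reverse": "SRR0000002_2.fastq"}}

fwd_a, rev_a = unzip(run_a)
fwd_b, rev_b = unzip(run_b)
all_forward = merge(fwd_a, fwd_b)   # first merge: forward ends
all_reverse = merge(rev_a, rev_b)   # second merge: reverse ends
combined = zip_lists(all_forward, all_reverse)
print(combined)
```

The merge step runs twice because the forward and reverse ends are separate lists until the final zip puts them back together.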

crisprucd · July 27, 2023, 7:11pm

I tried paste/fetch data for the 6 SRR files together and it didn’t go through. So I had to use the tool Download and Extract Reads in FASTQ format from NCBI SRA, which I ran individually for each accession. Now I am trying to use FASTQ Splitter to separate them into single-end files, then I will try Zip collection to put them together.
Thanks,

jennaj · July 27, 2023, 7:15pm

The list of accessions should be in a plain text file. Just the SRR or ERR identifiers with no extra content, one per line.

You can create that with the Upload tool, or load up a file you created on your computer. You can even edit it more once in your history: click on the “Visualize” icon for the dataset and search with the keyword “edit” to find the function.
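If you would rather prepare the accession list on your own computer before uploading it, a minimal sketch could look like the one below. The function name, the file name `accessions.txt`, and the example accessions are all hypothetical, and the pattern (SRR, ERR, or DRR followed by digits) is an assumption about what a plain run accession looks like.

```python
import re
import tempfile
from pathlib import Path

# Assumed shape of a run accession: SRR/ERR/DRR followed by digits only.
ACCESSION = re.compile(r"[SED]RR\d+")


def write_accession_list(accessions, path):
    """Write one accession per line, with no extra whitespace or duplicates."""
    cleaned = []
    for raw in accessions:
        accession = raw.strip()
        if not accession:
            continue  # skip blank entries
        if not ACCESSION.fullmatch(accession):
            raise ValueError(f"not a plain run accession: {raw!r}")
        if accession not in cleaned:
            cleaned.append(accession)
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


# Hypothetical accessions, written to a temporary folder for the example.
out_file = Path(tempfile.mkdtemp()) / "accessions.txt"
written = write_accession_list(["SRR0000001", " SRR0000002 ", "", "SRR0000001"], out_file)
print(out_file.read_text())
```

The resulting file can then be uploaded and selected on the tool form as the accession list.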

That sounds good except for the FASTQ Splitter part. That tool is for paired reads that are joined into the same sequence. These reads are in separate sequences but interleaved/interlaced. Try using FASTQ de-interlacer on paired end reads (Galaxy Version 1.1.5) instead. Hope it works out.

The tutorial NGS data logistics explains the different fastq formats, with examples and tools.
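To illustrate what "interleaved" means here, the following sketch splits an interleaved FASTQ into forward and reverse records. It is not the implementation of the Galaxy tool; it assumes four-line records that strictly alternate forward, reverse, forward, reverse, and the read names and sequences are invented.

```python
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as 4-line tuples: header, sequence, '+', quality."""
    iterator = iter(lines)
    while True:
        record = tuple(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        yield record


def deinterlace(lines):
    """Split strictly alternating records into (forward, reverse) lists."""
    forward, reverse = [], []
    for index, record in enumerate(read_records(lines)):
        (forward if index % 2 == 0 else reverse).append(record)
    if len(forward) != len(reverse):
        raise ValueError("odd number of records; input is not fully paired")
    return forward, reverse


# Hypothetical interleaved input: two pairs, mates stored back to back.
interleaved = """@read1/1
ACGT
+
IIII
@read1/2
TGCA
+
IIII
@read2/1
GGCC
+
IIII
@read2/2
CCGG
+
IIII
""".splitlines()

forward, reverse = deinterlace(interleaved)
print([record[0] for record in forward])
print([record[0] for record in reverse])
```

A splitter for joined reads would instead cut each single sequence in two, which is why it is the wrong operation for this kind of file.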

crisprucd · July 27, 2023, 7:22pm

Thank you so much.
I will correct that.

crisprucd · July 28, 2023, 2:27pm

Hi Jennaj,
I unzipped the individual files, and the forward and reverse reads are now separated. Then I tried to use Zip collection to put them together directly, but the multiple datasets icon does not recognize the files. Should I first use a tool to put the forward files and the reverse files into 2 collections, so that Zip collection can then combine the forward collection and the reverse collection into one?
Thank you,

jennaj · July 28, 2023, 5:11pm

I’m confused, maybe misunderstanding. Unzipping a list collection does not create two files. A list collection contains only one list, not two paired lists, so there is nothing to unzip.

What you are trying to do is super tedious. While getting the data reorganized is certainly possible (a few different ways, not just what we discussed so far), it seems well worth your time to start over. Learning how to do this will serve you now, and going forward.

Try retrieving the sequences from NCBI again. Put all of the accessions to retrieve into a simple text file (one accession per line, no extra spaces or tabs), use the tool Faster Download and Extract Reads in FASTQ format from NCBI SRA, and select that file with the accession list instead of entering accessions directly on the form. Everything will be sorted correctly into collections in a usable format with zero chance of any mixups.

crisprucd · July 28, 2023, 6:39pm

Hi Jennaj,

Thank you for the suggestions.

The forward reads and reverse reads are paired for each sample. However, Zip collection can’t recognize them, so I can’t add them together as one collection.

I tried FASTQ de-interlacer to separate them into individual forward and reverse files, but it didn’t go through.

So I tried Unzip collection, which looked like it worked and separated them into several forward and reverse files.

Then I tested Zip collection, but it does not recognize the files through the multiple datasets option. The separated files are still recognized as individual collections by Zip collection. I may have used the Zip collection tool the wrong way; it may only work on 2 separate collections and add them together.

I will try the Faster Download and Extract Reads in FASTQ format from NCBI SRA tool to download the datasets together. But if we are in the same situation again, where we only have individual paired files, how can we add them together as one collection?

Thank you for your time.

jennaj · July 28, 2023, 8:04pm

If data are ever in a collection, organized in a way that you don’t want to use, and the collection manipulation tools are not working for some reason, you can try this:

Go into the Hidden tab of your history
Unhide the datasets you want to restructure
Create new collections

Collections are like folders. The files inside them are datasets. It is fine to delete collection folders that you don’t need. Just be sure not to delete the dataset files inside of collections, or you will need to upload the data again.
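Once the datasets are unhidden, building a new paired collection comes down to deciding which forward file belongs with which reverse file. The sketch below shows that grouping idea in plain Python. It is not Galaxy's collection builder, and the `_1` / `_2` file naming and the accession names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_1.fastq and <sample>_2.fastq (optionally .gz).
PAIR_NAME = re.compile(r"(?P<sample>.+)_(?P<end>[12])\.fastq(\.gz)?")


def group_into_pairs(names):
    """Group dataset names into {sample: {"forward": ..., "reverse": ...}}."""
    ends = defaultdict(dict)
    for name in names:
        match = PAIR_NAME.fullmatch(name)
        if match is None:
            raise ValueError(f"name does not look like a paired file: {name}")
        slot = "forward" if match["end"] == "1" else "reverse"
        if slot in ends[match["sample"]]:
            raise ValueError(f"two {slot} files for sample {match['sample']}")
        ends[match["sample"]][slot] = name
    incomplete = sorted(s for s, found in ends.items() if len(found) != 2)
    if incomplete:
        raise ValueError(f"samples missing a mate: {incomplete}")
    return {
        sample: {"forward": ends[sample]["forward"], "reverse": ends[sample]["reverse"]}
        for sample in sorted(ends)
    }


# Hypothetical unhidden datasets, in no particular order.
pairs = group_into_pairs([
    "SRR0000002_2.fastq",
    "SRR0000001_1.fastq",
    "SRR0000001_2.fastq",
    "SRR0000002_1.fastq",
])
print(pairs)
```

Checking for missing or duplicated mates before building the collection is what guards against the mixups mentioned earlier in the thread.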

crisprucd · July 28, 2023, 8:15pm

Thank you,
