How to merge multiple fastq.gz files from one sample into one fastq.gz file

tingting081 · February 24, 2023, 6:26pm

Hi there,

We did Nanopore sequencing. Because the read data from one sample is very large, MinKNOW automatically split it into over 40 separate fastq.gz files. All 40 files come from a single run on a single sample.
We need to merge those 40 files into one fastq.gz file. Is there a workflow that can do this? I am looking forward to your help. Thanks so much.

Tingting

jennaj · February 24, 2023, 9:13pm

Hello @tingting081

If all of the reads are placed into the same Dataset Collection, you can use the tool "Collapse Collection into single dataset in order of the collection".

These tutorials explain how to use/manipulate collections.

→ Keyword search

→ Topic/category

tingting081 · February 24, 2023, 10:22pm

Thank you for your prompt feedback. I tried Collapse Collection and Concatenate as well, and then ran Kraken2 on each result. I found a vast difference between the results from those two methods. After merging, the dataset is 19.9 MB. Kraken2 classified about 3000 reads from the Collapse Collection output, but about 8000 reads from the Concatenate output. My questions are: 1. Which method better fits Nanopore sequencing data? 2. What are possible reasons for such a considerable discrepancy?

Thank you so much.

Thanks,
Tingting

jennaj · February 27, 2023, 6:59pm

Hi @tingting081

Glad that you were able to concatenate your reads.

Do you mean a difference between using the collapse collection and concatenate methods? These should produce exactly the same output if the order of the individual read files was the same.

Did you really attempt to concatenate by manually entering all 40 files on the same tool form? That seems error-prone.

Use the collapse collection method. It is designed to work with large numbers of datasets, and it handles any kind of dataset as long as all are the same datatype and the same format (no extra headers, etc).
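
To make the idea of "same files, same order" concrete, here is a minimal sketch of the same kind of merge done outside Galaxy, assuming the fastq.gz files sit on a local disk. It is not how the Galaxy tools are implemented; the function names are hypothetical. It relies on the fact that gzip files appended byte for byte form a valid multi-member gzip file, and it assumes the common four-lines-per-record FASTQ layout when counting reads.

```python
import gzip
import shutil


def merge_fastq_gz(parts, merged_path):
    """Append each gzip file to merged_path, byte for byte, in the given order.

    Concatenated gzip members form a valid gzip file, so no recompression
    is needed.
    """
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path):
    """Count FASTQ records, assuming 4 lines per record (no wrapped sequences)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

A quick sanity check after merging is that count_reads on the merged file equals the sum of count_reads over the parts. Pass the parts in an explicit order; a plain alphabetical sort puts a name ending in _10 before one ending in _2.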

If both result datasets are identical, then you should expect (mostly) identical results from tools using either. My guess is that these two are actually different. 19.9 MB is very small for a merged Nanopore run, so one of the results is possibly incomplete or even an empty file.
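
One way to test whether the two merged datasets really are identical is to download both and compare their decompressed content. The sketch below is only an illustration under stated assumptions: both files are local fastq.gz files with four lines per record, and the function names are hypothetical. It compares decompressed content and not the raw bytes, because two gzip files can hold the same reads and still differ in how they were compressed.

```python
import gzip
import hashlib


def fastq_summary(path):
    """Return (read_count, sha256 hex digest) of the decompressed content."""
    digest = hashlib.sha256()
    lines = 0
    with gzip.open(path, "rb") as handle:
        for line in handle:
            digest.update(line)
            lines += 1
    # Assumes 4 lines per FASTQ record.
    return lines // 4, digest.hexdigest()


def same_reads(path_a, path_b):
    """True if both files hold the same reads in the same order."""
    return fastq_summary(path_a) == fastq_summary(path_b)
```

If same_reads returns False, comparing the two read counts shows whether one merge dropped files or the files were only combined in a different order.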
