How to merge multiple fastq.gz files from one sample into one fastq.gz file

Hi there,

We did Nanopore sequencing. Because the read data from one sample was very large, MinKNOW automatically split it into over 40 separate fastq.gz files. To be specific, all 40 files come from a single run on a single sample.
We need to merge those 40 files into one fastq.gz file. Is there any workflow able to do this? I am looking forward to your help. Thanks so much.

Tingting

Hello @tingting081

If all of the reads are placed into the same Dataset Collection, you can use the tool Collapse Collection into single dataset in order of the collection.

These tutorials explain how to use/manipulate collections.

→ Keyword search

→ Topic/category
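
For anyone working outside Galaxy, the same kind of merge can be sketched in a few lines of Python. This is only a minimal sketch and not what the Collapse Collection tool does internally. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file. The function names `merge_fastq_gz` and `count_reads` are made up for this illustration, and `count_reads` assumes plain four-line FASTQ records.

```python
import gzip
import shutil


def merge_fastq_gz(input_paths, output_path):
    """Concatenate gzip-compressed FASTQ files byte-for-byte, in the order given.

    No decompression is needed: back-to-back gzip members are still valid gzip.
    """
    with open(output_path, "wb") as merged:
        for path in input_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, merged)


def count_reads(fastq_gz_path):
    """Count reads, assuming the standard four lines per FASTQ record."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

One possible use: pass `sorted(glob.glob("reads/*.fastq.gz"))` (after `import glob`, with `reads` being a hypothetical folder holding the MinKNOW files) as `input_paths`. Sorting fixes the file order, so repeated merges give the same output, and the read count of the merged file should equal the sum of the counts of the parts.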

Thank you for your prompt feedback. I tried both collapse collection and concatenate, then ran Kraken 2 on each result. The results differ considerably between the two methods. After merging, the dataset is 19.9 MB. With the collapse collection output, about 3000 reads were classified, while 8000 reads were classified with the concatenate output. My questions are: 1. Which method better fits Nanopore sequencing data? 2. What could cause such a large discrepancy?

Thank you so much.

Thanks,
Tingting

Hi @tingting081

Glad that you were able to concatenate your reads.

Do you mean a difference between using the collapse collection and concatenate methods? These should produce exactly the same output if the order of the individual read files was the same.

Did you really attempt to concatenate by manually entering all 40 files on the same tool form? That seems error-prone.

Use the collapse collection method: it is designed to work with large numbers of datasets. It handles any kind of dataset, as long as all of them are the same datatype and the same format (no extra headers, etc.).

If both result datasets are identical, then you should expect (mostly) identical results from tools using either. My guess is that these two are actually different. 19.9 MB is very small and possibly an empty file.
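
If you download both merged datasets, one way to check whether they really are identical is to compare their read counts and a checksum of the decompressed content. The snippet below is a rough sketch under a few assumptions: the helper names `fastq_summary` and `same_reads` are invented here, records are standard four-line FASTQ, and each file ends with a newline.

```python
import gzip
import hashlib


def fastq_summary(path):
    """Return (read_count, sha256 hex digest) of the decompressed FASTQ content."""
    # Assumption: names ending in ".gz" are gzip-compressed, anything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    digest = hashlib.sha256()
    newlines = 0
    with opener(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            newlines += chunk.count(b"\n")
    # Assumption: four lines per read and a trailing newline at the end of the file.
    return newlines // 4, digest.hexdigest()


def same_reads(path_a, path_b):
    """True if both files hold the same reads in the same order."""
    return fastq_summary(path_a) == fastq_summary(path_b)
```

If the read counts match but the checksums differ, the files were probably merged in a different order. If the read counts differ, one of the merges likely dropped or duplicated input files.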