How do I combine multiple fastq.gz files for the R1 and R2 reads of a genome sequence?

I have four R1 and four R2 fastq.gz files of a bacterial genome sequence. Do I need to concatenate each of four R1 or each of four R2 files directly?

Hello, the title of your post and the text ask different questions.

Do I need to concatenate each of four R1 or each of four R2 files directly?

Mostly not. What do you want to do with the files? What kind of analysis do you want to perform?
This can also be a good place to start:


Hi gbbio, I would like to assemble the reads, so I need to combine the fastq files. Thanks.

Do you need to do a de novo assembly or a “reference-based” assembly/mapping? In other words, do you have a reference, or do you know whether a genome of your sequenced bacterium is already available? Either way, you mostly don’t need to concatenate your files unless you do some sort of in silico pooling. The assembly/mapping tools have R1 and R2 input fields. If you really want to concatenate, you can use a tool called Concatenate datasets tail-to-head (cat). Combine the four R1 files into one dataset and the four R2 files into another, in the same order, so that the mates stay paired.
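For reference, outside Galaxy the same idea is plain byte concatenation, which is what `cat` does. A gzip file may consist of several compressed members back to back, and standard readers decompress them in sequence, so joining .fastq.gz files this way yields a valid .fastq.gz. The Python sketch below assumes you pass the R1 and R2 file lists in matching order; the function names `concat_gz` and `concat_pairs` are hypothetical helpers for illustration, not part of any tool.

```python
import gzip
import shutil


def concat_gz(inputs, output):
    """Concatenate gzip files byte-for-byte, like `cat a.gz b.gz > out.gz`.

    Each input stays a separate gzip member inside the output; gzip readers
    (including Python's gzip module) decompress all members in order.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def concat_pairs(r1_files, r2_files, out_r1, out_r2):
    """Combine R1 files and R2 files separately (hypothetical helper).

    Assumption: r1_files[i] and r2_files[i] come from the same sequencing
    run/lane, so joining both lists in the same order keeps read n of the
    combined R1 file paired with read n of the combined R2 file.
    """
    if len(r1_files) != len(r2_files):
        raise ValueError("need the same number of R1 and R2 files")
    concat_gz(r1_files, out_r1)
    concat_gz(r2_files, out_r2)


def first_record(path):
    """Return the header line of the first FASTQ record (quick sanity check)."""
    with gzip.open(path, "rt") as handle:
        return handle.readline().rstrip("\n")
```

For example, `concat_pairs(sorted(r1_paths), sorted(r2_paths), "all_R1.fastq.gz", "all_R2.fastq.gz")` works only if your file names sort into matching lane order; check the names before relying on sorting.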


Sorry for the late reply, and thanks for your help.

Could you please help me determine the fold coverage?

Fold coverage is often not that informative, but as far as I know you can estimate it from the FastQC report (total sequences and sequence length) and do the calculation yourself: total sequenced bases divided by the genome size. Other tools you could take a look at are Samtools coverage and Samtools depth, which work on reads mapped to a reference (BAM). If you search for “depth” or “coverage” in the tool menu, you will find even more options.
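The “calculation yourself” part is just total sequenced bases divided by genome size. A minimal Python sketch, assuming plain 4-line FASTQ records and a genome size you supply yourself (for example from a closely related reference genome); `count_bases` and `fold_coverage` are hypothetical helper names:

```python
import gzip


def count_bases(fastq_gz_paths):
    """Return (number of reads, total bases) across gzipped FASTQ files.

    Assumes every record has exactly 4 lines: header, sequence, '+', qualities.
    """
    n_reads = 0
    n_bases = 0
    for path in fastq_gz_paths:
        with gzip.open(path, "rt") as handle:
            for i, line in enumerate(handle):
                if i % 4 == 1:  # the sequence line of each record
                    n_reads += 1
                    n_bases += len(line.rstrip("\r\n"))
    return n_reads, n_bases


def fold_coverage(total_bases, genome_size):
    """Expected average depth: sequenced bases / genome length (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size
```

Pass all R1 and all R2 files to `count_bases`, since both mates contribute bases. With hypothetical numbers, 2,000,000 read pairs of 150 bp give 2,000,000 × 2 × 150 = 600,000,000 bases, and for a 5,000,000 bp genome that is `fold_coverage(600_000_000, 5_000_000)`, i.e. 120x. This is a raw estimate before trimming and mapping; Samtools depth on a BAM file reflects what actually aligned.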


I’m looking for a solution to the same problem.


@gbbio thanks, I’ll try.

@dna I am not familiar with de novo assemblies, so in that case it may work differently. The same applies to @tjh, since it is not clear what kind of assembly is being done. Just keep that in mind: the answer is a possible tip but may not be the full solution to the question.