I have the raw R1 and R2 files of a bacterial genome sequence that have been assembled into contigs using SPAdes. How can I calculate the genome coverage / depth? Thanks.
Dear @dna,
For paired-end sequencing, the expected genome coverage is typically calculated with the equation:
C = (N · L · 2) / G
N = number of read pairs (i.e. the number of reads in R1 alone)
L = read length
G = genome size
2 = factor for the paired-end sequencing
Kind regards,
Florian
@Flow thanks, I tried this equation but I'm still unsure if it's accurate. There are ~40,000,000 raw reads (R1 and R2 combined) from 150 bp paired-end sequencing, and the assembly is ~2,000,000 nucleotides. So it's 3000x coverage?
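The arithmetic can be checked with a short sketch. It assumes that N in the equation is counted as read pairs, which is what the factor 2 implies, so the 40,000,000 reads across R1 and R2 are halved first. The function name `paired_end_coverage` is made up for illustration.

```python
def paired_end_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Expected coverage C = (N * L * 2) / G, with N counted as read pairs (assumption)."""
    return read_pairs * read_length * 2 / genome_size


# Numbers from the post above: ~40,000,000 reads in total (R1 + R2),
# 150 bp read length, ~2,000,000 nt assembly.
total_reads = 40_000_000
read_pairs = total_reads // 2  # each pair contributes one R1 and one R2 read

coverage = paired_end_coverage(read_pairs, 150, 2_000_000)
print(f"{coverage:.0f}x")  # prints "3000x"
```

Using the total read count as N together with the factor 2 would count every read twice and give 6000x instead.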
Dear @dna,
Just to make sure, I asked a colleague of mine, @pavanvidem. He confirmed my assumptions.
The coverage calculated after mapping the reads back to the assembly may be more informative, since your raw data might contain some low-quality reads that do not map.
A coverage of 3000x might be good for a variant analysis. So whether 3000x is “right” or “wrong” really depends on what you want to do with the data.
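One possible way to get the coverage after mapping, as a rough sketch: map the reads back to the contigs with a mapper of your choice, then run `samtools depth -a` on the sorted BAM file. That command prints one tab-separated line per position (contig, position, depth). The code below assumes output in that three-column form. The function name `mean_depth` and the example values are made up for illustration.

```python
import io


def mean_depth(lines) -> float:
    """Mean per-base depth from tab-separated lines: contig, position, depth."""
    total = 0
    positions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _pos, depth = line.split("\t")
        total += int(depth)
        positions += 1
    return total / positions if positions else 0.0


# Tiny made-up example standing in for a real depth file.
example = io.StringIO("contig_1\t1\t10\ncontig_1\t2\t20\ncontig_2\t1\t30\n")
print(mean_depth(example))  # prints 20.0
```

For a real file, pass an open file handle instead, e.g. `with open("depth.txt") as fh: print(mean_depth(fh))`, where `depth.txt` is a hypothetical file name.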
Kind regards,
Florian
Thanks again