I have the raw R1 and R2 files for a bacterial genome, which have been assembled into contigs using SPAdes. How can I calculate the genome coverage/depth? Thanks.
For paired-end sequencing, the expected genome coverage is usually estimated with the equation:
C = (N · L · 2) / G
N = number of read pairs (i.e. the reads in R1 only, not R1 and R2 combined)
L = read length
G = genome size
2 = factor for paired-end sequencing (each pair contributes two reads of length L)
@Flow thanks, I tried this equation but I'm still unsure whether it's accurate. There seem to be ~40,000,000 raw reads (R1 and R2 combined) from 150 bp paired-end sequencing, and the assembly is ~2,000,000 nucleotides. So it's 3000x coverage?
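A quick check of that arithmetic, as a minimal sketch. The function names are made up for illustration. The one assumption worth flagging: since the 40,000,000 figure already counts both R1 and R2, the factor of 2 is not applied a second time. If you count read pairs (20,000,000 here), the factor of 2 is applied and the result is the same.

```python
def coverage_from_total_reads(total_reads: int, read_length: int, genome_size: int) -> float:
    """Expected coverage when total_reads already counts R1 and R2 together."""
    return total_reads * read_length / genome_size


def coverage_from_read_pairs(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Expected coverage when counting pairs: C = (N * L * 2) / G."""
    return read_pairs * read_length * 2 / genome_size


# Numbers from the post above (all approximate).
total_reads = 40_000_000   # R1 + R2 combined
read_length = 150          # bp
genome_size = 2_000_000    # assembly size in nucleotides

from_total = coverage_from_total_reads(total_reads, read_length, genome_size)
from_pairs = coverage_from_read_pairs(total_reads // 2, read_length, genome_size)

print(f"{from_total:.0f}x")  # 3000x
print(f"{from_pairs:.0f}x")  # 3000x
```

Both ways of counting give ~3000x, so the estimate is consistent with the equation as long as the factor of 2 is only applied once.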
The coverage after mapping is probably more informative, since your raw data might contain some low-quality reads.
A coverage of 3000x might be good for variant analysis. So whether 3000x is “right” or “wrong” really depends on what you want to do with the data.
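The mapping-based coverage mentioned above could be summarised along these lines. This is only a sketch under some assumptions: the reads have already been mapped back to the contigs with an aligner of your choice, and `samtools depth -a` has been run on the sorted BAM to produce a three-column, tab-separated table (contig, position, depth). The function name and the tiny example table are made up for illustration and are not real data.

```python
import io
from collections import defaultdict


def depth_summary(lines):
    """Summarise per-position depth lines of the form: contig<TAB>position<TAB>depth."""
    per_contig = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, _pos, depth = line.split("\t")
        per_contig[contig].append(int(depth))

    all_depths = [d for depths in per_contig.values() for d in depths]
    return {
        # Mean depth over every reported position.
        "mean_depth": sum(all_depths) / len(all_depths),
        # Fraction of positions covered by at least one read.
        "breadth": sum(d > 0 for d in all_depths) / len(all_depths),
        # Mean depth for each contig separately.
        "per_contig_mean": {c: sum(d) / len(d) for c, d in per_contig.items()},
    }


# Made-up example table standing in for real `samtools depth -a` output.
example = io.StringIO(
    "contig_1\t1\t10\n"
    "contig_1\t2\t12\n"
    "contig_1\t3\t0\n"
    "contig_2\t1\t8\n"
)
summary = depth_summary(example)
print(summary["mean_depth"])  # 7.5
print(summary["breadth"])     # 0.75
```

For a real run you would pass an open file handle instead of the in-memory example. The per-contig means can also help spot contigs with unusually high depth, such as plasmids or repeats.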