Hi, my input is 9M reads. I ran it through samtools mpileup following BWA alignment. I set the max per-file depth to 1M and I got an output of about 250,000 reads. I know I didn’t lose that much to quality, so why is my output number higher? I also noticed that if I set the max per file depth to anything between 300,000-1,000,000, I still get 250,000 as my output. Does samtools have a read limit?
I never used samtools mpileup on galaxy but how did you counted the number of reads in the mpileup file?
Oh I counted later down the pipline. I use compute and I end up with a table with read number included
Did you used the output option --output-QNAME?
I am asking because although it is possible the get the total amount of reads from a mpileup file it is not that easy. As far as I can think of. And also as far as I know (but can be wrong) you can not do that with the compute tool. And again I can be wrong but I have the feeling you are misunderstanding the mpileup output.
I did not use the output option --output-QNAME. Well if the max-per-file depth does not determine how many reads at each base position, what does it do? Column 4 shows read number at each position? or am I completely wrong?
Yes column 4 shows the depth (amount of reads) mapped on that specific position so that is correct. But just summing up that column will not give you the correct total amount of mapped reads. Because let’s say you have position 500 and 501, a (big) part of the reads mapped on those positions are the same reads and counted double. And if you have 9M reads, they will be spread out over the full reference.
If you want to know the total amount of mapped reads you need to use Samtools flagstat.
You are right. I used Samtools flagstat and I got 9 M reads aligned (99.5% aligned). Is there anyway of checking how much reads I have after Samtools?
I still only have max 250,000 reads at a position following samtools.
So you are expecting more then 250,000 reads per position. You could double check this with Samtools depth. Also in samtools mpileup you can disable the max depth by using a value of 0.
EDIT:
You also may need to rephrase your question.
I also noticed that if I set the max per file depth to anything between 300,000-1,000,000, I still get 250,000 as my output.
If the max is 300,000 and the depth is lower then that (250,000) why would it change? I would expect a change if you set the max-depth to a lower then 250,000 value. Like 8000 or something
With Samtools depth I am getting ~7000 coverage at each position.
I expected more than 250,000 at each position because its 9 M reads over a genome size of 4 kb.
It seems I am losing 10,000 coverage at the BWA alignment stage. It is not samtools. Thanks for your help.