DeepTools: bamCoverage output doesn't match manual counts

I am following this Galaxy tutorial to analyze Tn-Seq data: (Essential genes detection with Transposon insertion sequencing). This transposon inserts at TA sites. After mapping with Bowtie2, I ran bamCoverage (1 nt window and only report the 5’ end of the reads (position 1)). Inspection of the output bedgraph doesn’t match the read counts observable in the BAM (using VGI). In some cases, the count is way higher in the output, in some cases lower. I repeated bamCoverage to only include forward or reverse reads to get a clearer picture, but they don’t match either.
(1) should the read counts match a manual count of the reads?
(2) is there an alternative tool to compare outputs with?

Hello @sdmoore

The tool BamCoverge includes some scaling, normalization and smoothing parameters, so the counts might not match up exactly.

You could try tuning those parameter settings to directly count to see what happens. This can create a massive job – so do limit the region when testing this out. Help is on the tool form, with more here https://deeptools.readthedocs.org/

An alternative tool is BAM Coverage Plotter.

Both are included in tutorials – see the help sections for the links.

Thanks Jenna,

No scaling or smoothing was implemented - hence the puzzle (also not sure how counts would increase). The Deeptools doc doesn’t indicate what the output columns are from bamCoverage: I think the second column is the position of interest, not sure what the third column is (seems to be adjacent), and the fourth is read count (assumed). The plotter appears to make an image only.

Regards -S

1 Like

Hi @sdmoore

Oh - Ok. I’d like to look closer at your specific example and come up with a solid explanation for what you are observing. The author of this tool is part of the Galaxy community, so we can pull them into this too as needed.

My other guesses are that the BAM contains reads that do not pass whatever bamCompare considers to be valid mapped reads (can lower the direct positional count) and that some default normalizing function is considering a region between mapped ends of a read (increases the direct positional count). Filtering the BAM might address some but not all of this. But let’s move on from guesses and review the actual data. :slight_smile:

Please do this if you have time and I’ll review

  1. Copy that BAM into a new history
  2. Run the tool in that new history, once or a few times with different parameters
  3. Also run BAM Coverage Plotter if you can (or I can later)
  4. Generate a history share link, and post that back here Sharing your History
  5. Include an example of genomic coordinates where you can notice the discrepancy so I can also find it quickly. Something like: chrN:NNN-NNNN – similar to the coordinate formatting in the IGV display

Hi Again, thanks for your help. I picked a region with several examples.

Here is a share link:

/u/sdmoore/h/sdmooredebug-coverage

I had to break the URL because this mail window wouldn’t allow posting of this host…

(BTW, the instructions for the share linking are incorrect, the gear icon is not the path to the sharing, the drop-down arrow is)

The Coverage plotter looks OK, I think: a nice peak at replication origin (~ position 3,880,000 in this reference)

As an example, view region: NC_000913.3:135-301

The ‘TA’ at position 201-202 (1-start visual, 0-start data, so discussion is off by 1):
21 forward read 5’ positions
11 reverse read 5’ positions
bamCoverage (both read orientations) = 32 at position 200, 11 at position 201
bamCoverage (‘forward’, which may be reverse, 11 at position 201)
bamCoverage (‘reverse’, 32 at position 201)

The ‘TA’ at position 212-213:
5 forward 5’ ends
6 reverse 5’ ends
bamCoverage reports 8 and 6

TA at position 231-232:
7 forward
15 reverse
bamCoverage reports 7 and 27

Bedtools Genome Coverage does a similar thing, but sometimes different numbers.

SOLVED: IGV Viewer was not showing all reads, and not in a proportionate manner. Opening the BAM on another computer showed reads that matched the bamCoverage counts. I suspect a memory issue prevented or altered loading.

1 Like

Great! Thanks @sdmoore for posting back the answer :slight_smile: