DeepVariant low quality sorting

Why is my mapping quality so low (below 60)? E. coli, WGS, 100X coverage

  • Map with minimap2 - A fast pairwise aligner for genomic and spliced nucleotide sequences (Galaxy Version 2.28+galaxy0)
  • DeepVariant - deep learning-based variant caller (Galaxy Version 1.5.0+galaxy1)
  • SnpEff eff: - annotate variants (Galaxy Version 4.3+T.galaxy2)

help me! :-?

Welcome, @Nicole

From what I know about Minimap2, a MAPQ of 60 describes a perfect unique hit, so 60 is the best score a read can get rather than a minimum to expect from every read. I searched to confirm and found the same statement on the tool author's GitHub: see "MAPQ = 60 meaning" (Issue #447, lh3/minimap2), with more details in the topic "Uniquely mapped reads" (Issue #528, lh3/minimap2). You might find more nuance by searching there too, or by reviewing the author's guide (minimap2/README.md at master, lh3/minimap2) and the FAQ (minimap2/FAQ.md at master, lh3/minimap2).
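For intuition about the scale: the SAM specification defines MAPQ as -10 * log10 of the probability that the mapping position is wrong, rounded to the nearest integer. Here is a minimal sketch of that conversion. It shows the nominal definition only; each aligner estimates the score in its own way, so treat the probabilities as approximate. The function names are just for illustration.

```python
import math


def mapq_to_error_prob(mapq: int) -> float:
    """Nominal probability that the mapping position is wrong (SAM definition)."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p: float) -> int:
    """Inverse of the above: -10 * log10(p), rounded to the nearest integer."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return round(-10 * math.log10(p))


if __name__ == "__main__":
    # Print the nominal error probability for a few common MAPQ values.
    for q in (0, 10, 30, 60):
        print(f"MAPQ {q}: nominal error probability {mapq_to_error_prob(q):.0e}")
```

Read this way, a MAPQ of 60 corresponds to a nominal one-in-a-million chance that the placement is wrong, while a MAPQ of 0 usually means the read fits equally well in more than one place.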

For the Galaxy part, when using DeepVariant, if your mapping quality is influencing how many variants you can call (and then annotate), I would suggest exploring the query read data quality first, then the alignment parameters, and then the target database, to see if any of these explain what is going on in your experiment. In short, if the read quality is OK, try adjusting your alignment parameter settings to fit the data better. Also review whether there is anything special about the target database that could reduce unique hits (and create more potential multi-mappers), such as known duplicated regions.
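In Galaxy, a tool such as FastQC is the usual way to check read quality. If you want a quick look outside of a tool, here is a rough sketch that computes the mean base quality per read from FASTQ text. It assumes plain four-line FASTQ records (no wrapped sequences) and Phred+33 encoding, and the function name is made up for this example. A simple arithmetic mean of Phred scores is only a coarse summary.

```python
from typing import Iterable, Iterator


def mean_read_qualities(fastq_lines: Iterable[str], offset: int = 33) -> Iterator[float]:
    """Yield the mean Phred score of each read.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the fourth line of each record holds the qualities
            qual = line.rstrip("\n")
            if qual:
                yield sum(ord(c) - offset for c in qual) / len(qual)


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    toy = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "5555"]
    for n, q in enumerate(mean_read_qualities(toy), start=1):
        print(f"read {n}: mean quality {q:.1f}")
```

If many reads have a low mean quality, trimming or filtering before mapping is worth testing before you change any alignment settings.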

Then if that is not enough… you could also consider using a different mapping tool, even if just for a comparison. If you have short read WGS data, alternative tools for mapping before calling variants can include Bowtie2 or BWA-MEM.

I also tend to inspect alignment data that seems off, since visualizing provides another layer of understanding beyond what summary statistics show. Samtools and other BAM parsing tools in Galaxy can be used to review trends across the entire dataset, but the UCSC or IGV browsers are good choices for a closer look. Load up your BAMs, and review regions that you think might be interesting. Maybe choose a site that has a known variant, or compare areas that have a different range of MAPQ scores. If UCSC hosts your genome, consider turning on tracks related to conservation, genome rearrangement, repeats, and other known annotation.
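To review the MAPQ trend across the whole dataset, one option is to convert the BAM to SAM text (for example with `samtools view -h`, or a BAM-to-SAM tool in Galaxy) and tally column 5. Below is a minimal sketch of that tally. It relies only on the SAM column layout and FLAG bits from the SAM specification; the function names and the toy records are invented for illustration.

```python
from collections import Counter
from typing import Iterable

# FLAG bits defined by the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

# Toy alignment records, invented for illustration only.
TOY_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchr\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t256\tchr\t300\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r5\t0\tchr\t400\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]


def mapq_histogram(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per MAPQ value (SAM column 5)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep one primary alignment per mapped read
        counts[mapq] += 1
    return counts


def fraction_below(counts: Counter, threshold: int = 60) -> float:
    """Fraction of counted alignments with MAPQ below the threshold."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for q, n in counts.items() if q < threshold) / total


if __name__ == "__main__":
    hist = mapq_histogram(TOY_SAM)
    for q in sorted(hist):
        print(f"MAPQ {q}: {hist[q]} alignments")
    print(f"fraction below 60: {fraction_below(hist):.2f}")
```

For real data, pass an open SAM file handle instead of the toy list. If only a small fraction of primary alignments fall below 60, and they cluster in a few regions, those regions are good candidates to open in IGV or UCSC.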

Hope this helps! :slight_smile: