DeepVariant low quality sorting

Nicole · July 30, 2024, 6:23am

Why is my mapping quality so low (below 60)? E. coli, WGS, 100X coverage

Map with minimap2 - A fast pairwise aligner for genomic and spliced nucleotide sequences (Galaxy Version 2.28+galaxy0)
DeepVariant - deep learning-based variant caller (Galaxy Version 1.5.0+galaxy1)
SnpEff eff: - annotate variants (Galaxy Version 4.3+T.galaxy2)

help me! :-?

jennaj · July 31, 2024, 12:55am

From what I know about Minimap2, a MAPQ of 60 describes a perfect unique hit, so reads that score below 60 are probably less confident or not unique. I searched to confirm and found a post at the tool author's GitHub that states the same: "MAPQ = 60 meaning" (Issue #447, lh3/minimap2), with more details in the topic "Uniquely mapped reads" (Issue #528, lh3/minimap2). You might find more nuance by searching there too, or by reviewing the author's guide (minimap2/README.md at master, lh3/minimap2) and FAQ (minimap2/FAQ.md at master, lh3/minimap2).
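As a rough aid for reading MAPQ values: the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong. The sketch below converts a MAPQ score into that nominal probability. The function name is made up for illustration, and aligners (Minimap2 included) estimate MAPQ with their own heuristics, so treat the result as an approximate guide and not as a calibrated error rate.

```python
def mapq_to_error_probability(mapq):
    """Nominal probability that a mapping position is wrong, per the SAM
    definition MAPQ = -10 * log10(P_wrong).

    Hypothetical helper for illustration only. A MAPQ of 255 means
    "not available" in SAM, so it has no probability to report.
    """
    if mapq == 255:
        raise ValueError("MAPQ 255 means the mapping quality is not available")
    return 10 ** (-mapq / 10)


# A few scores side by side: higher MAPQ means a lower nominal error rate.
for score in (0, 20, 60):
    print(score, mapq_to_error_probability(score))
```

Read this way, a MAPQ of 0 carries no confidence in the reported position, which is typical of reads that fit several places equally well.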

For the Galaxy part, when using DeepVariant, if your mapping quality is influencing how many variants you can call (and then annotate), I would suggest exploring the query read data quality first, then the alignment parameters, and then the target database, to see if any of these explain what is going on in your experiment. In short, if the read quality is OK, maybe adjust your alignment parameter settings to fit the data better? And maybe review whether there is anything special about the target database that could reduce unique hits (create more potential multi-mappers), such as known duplicated regions.

Then if that is not enough… you could also consider using a different mapping tool, even if just for a comparison. If you have short read WGS data, alternative tools for mapping before calling variants can include Bowtie2 or BWA-MEM.

And I tend to just inspect alignment data that seems off, since visualizing provides another layer of understanding that can be hidden inside summary statistics. Samtools and other BAM parsing tools in Galaxy can be used to review trends across the entire dataset, but visualizing the data in the UCSC or IGV browsers is a good choice for a closer look. Load up your BAMs and review regions that you think might be interesting. Maybe choose a site that has a known variant, or compare areas that have a different range of MAPQ scores. If UCSC hosts your genome, consider turning on tracks related to conservation, genome rearrangement, repeats, and other known annotation.
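If you would like a quick dataset-wide look at MAPQ before opening a browser, here is a minimal sketch of that kind of summary. It assumes the alignments are available as SAM-formatted text lines (for example after converting the BAM to SAM), and the function name and the toy records are made up for illustration. It counts every mapped record, including secondary and supplementary alignments.

```python
from collections import Counter


def mapq_summary(sam_lines, threshold=60):
    """Tally MAPQ values for mapped records in SAM-formatted text lines.

    Returns (counts, fraction_below), where counts maps MAPQ -> number of
    records and fraction_below is the share of mapped records whose MAPQ
    is below `threshold`. Hypothetical helper for illustration only.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        if flag & 0x4:
            continue  # 0x4 set means the read is unmapped
        counts[int(fields[4])] += 1  # column 5: MAPQ
    total = sum(counts.values())
    below = sum(n for q, n in counts.items() if q < threshold)
    fraction_below = below / total if total else 0.0
    return counts, fraction_below


# Toy records (not real data): two confident hits, one MAPQ 0, one unmapped.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchr\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t0\tchr\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
counts, fraction_below = mapq_summary(example_sam)
print(dict(counts), fraction_below)
```

A large share of records below the threshold, especially at MAPQ 0, would point toward multi-mapping regions worth inspecting in the browser.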

Hope this helps!
