Retrieve SNP info from bam files

Elaine · March 1, 2022, 6:08pm

My project involves genotyping individuals at targeted sites in the human SNP databases. The original DNA has been enriched for the targeted SNPs by capture hybridization. I have BAM files of the aligned sequences resulting from the capture hybridization and a list of approximately 250 targeted SNP sites. I would like the output file(s) to list the two alleles at each site, the allele frequency and the coverage. Does anyone have suggestions as to how to retrieve this information from the BAM files using Galaxy?

igor · March 1, 2022, 11:25pm

Hi @Elaine,
BAM files contain reads aligned to a reference genome. You need to call variants first; FreeBayes, for example, is a good caller. You may want to process the alignments before calling variants, for example by left-aligning indels. The VCF format contains information about variants, including depth of coverage. Once you have a list of variants, IDs from dbSNP can be added to the identified variants using tools like SnpSift Annotate SNPs from dbSnp. Have a look at GTN tutorials in
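To illustrate how alleles, coverage and allele frequency could be read out of a VCF once variants are called, here is a minimal Python sketch. It assumes the caller writes `DP` (total depth) and `AO` (alternate allele observation count) tags in the INFO column, as FreeBayes-style output typically does; the example line is made up, and the helper names `parse_info` and `summarize_site` are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=40;RO=22;AO=18' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item:
            info[item] = True  # INFO flag without a value
    return info


def summarize_site(vcf_line):
    """Return alleles, depth and per-ALT allele frequency for one VCF data line.

    Assumes INFO carries DP and AO tags (an assumption about the caller's output).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    info = parse_info(fields[7])
    depth = int(info["DP"])
    alt_alleles = alt.split(",")
    alt_counts = [int(x) for x in info["AO"].split(",")]
    # Frequency of each ALT allele relative to total depth at the site
    alt_freq = [count / depth if depth else 0.0 for count in alt_counts]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt_alleles,
        "depth": depth,
        "alt_freq": alt_freq,
    }


# Made-up example record, not real data
example = "chr1\t12345\t.\tA\tG\t50\t.\tDP=40;RO=22;AO=18\tGT\t0/1"
site = summarize_site(example)
print(site["chrom"], site["pos"], site["ref"], site["alt"], site["depth"], site["alt_freq"])
```

Looping this over every data line of a VCF (skipping lines that start with `#`) would give one row per site, which is roughly the table asked for in the original question.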

Kind regards,
Igor

Elaine · March 2, 2022, 6:24pm

Thank you Igor for responding to my request for help. I understand the steps I would take to obtain the allelic information. However, it seems like more steps than would be necessary, given that I already have a list of SNPs I am interested in, and am not trying to discover new ones. What has me stuck is that when I visualize the reads on IGV, searching by SNP location, all of the allelic information that I want is present. The problem is that I need to copy that information separately from each locus. Isn’t there a way to retrieve the information for multiple SNPs at once?

igor · March 4, 2022, 3:59am

BAM files contain information about reads mapped to the genome, while IGV displays pileup-style data in the coverage track at the top of the alignment. It converts information about mapped reads into “coverage”. This is what we do during variant calling.
I don’t know if information about variants and coverage can be extracted in IGV. It might be doable using intersection or other options, but I would not do it: it shows total coverage for a position, so it is easy to get confused by multi-mapped reads.
On the other hand, FreeBayes can call variants in all samples and produce a single table. To speed things up, you can call variants only around the target sites.
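One way to restrict calling to the target sites is to turn the SNP list into a BED file of small regions. The sketch below is only an illustration: it assumes the SNP list is available as (chromosome, 1-based position) pairs and that the variant caller can take a BED file of regions; the function name `snp_to_bed` and the example coordinates are hypothetical.

```python
def snp_to_bed(sites, flank=0):
    """Convert 1-based SNP positions to BED lines (0-based start, exclusive end).

    `flank` adds that many bases on each side of the SNP.
    """
    lines = []
    for chrom, pos in sites:
        start = max(0, pos - 1 - flank)  # BED start is 0-based, never below 0
        end = pos + flank                # BED end is exclusive
        lines.append(f"{chrom}\t{start}\t{end}")
    return lines


# Made-up example sites, not real SNP coordinates
sites = [("chr1", 12345), ("chr2", 5)]
bed_lines = snp_to_bed(sites, flank=10)
print("\n".join(bed_lines))
```

The resulting lines can be saved as a BED dataset and uploaded to Galaxy, then supplied wherever the caller lets you limit the analysis to a set of regions.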
Kind regards,
Igor
