Hello,
I am new to exome sequencing analysis. I have been reading up on the theory and have tried running an analysis in Galaxy. Following a published pipeline from the shared data (which uses freebayes), I got 2 mutations, but only one of them was confirmed by Sanger sequencing.
Looking back at my SnpSift output, I see that although the unconfirmed variant is of good quality (its “QUAL” value is good), the CIGAR string for that variant is given simply as “CIGAR=1X”.
From a theoretical point of view, my understanding is that a CIGAR string generally consists of some matches, then some mismatches/insertions/deletions, and then some more matches. So I don't understand how it can be just “CIGAR=1X”. If that is possible, what does it say about the quality of that variant?
Hi,
If the variant in question is a single-nucleotide change, then CIGAR=1X is perfectly fine, though it also tells you nothing about the quality of the call. CIGAR strings in the VCF INFO field (which is what I think we are talking about here) describe how to align an alternate allele to the reference allele. If your REF and ALT alleles both have a length of 1 nt, then the only way to align the two is with one mismatch, which is exactly what 1X means.
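To make that concrete, here is a minimal Python sketch of how such an allele-level CIGAR comes about for alleles of equal length. The function name allele_cigar is made up for illustration, and this is not how freebayes computes the value internally. It only compares REF and ALT base by base and run-length encodes matches (M) and mismatches (X), so it does not handle insertions or deletions.

```python
from itertools import groupby


def allele_cigar(ref: str, alt: str) -> str:
    """Illustrative only: CIGAR-like string for two equal-length alleles.

    Each position is labelled M (bases match) or X (bases differ), and
    consecutive identical labels are collapsed into <count><label>.
    """
    if len(ref) != len(alt):
        # Indels need a real alignment; this sketch deliberately skips them.
        raise ValueError("this sketch only handles equal-length alleles")
    labels = ("M" if r == a else "X" for r, a in zip(ref.upper(), alt.upper()))
    return "".join(f"{len(list(run))}{label}" for label, run in groupby(labels))


print(allele_cigar("A", "G"))      # 1X
print(allele_cigar("ACT", "AGT"))  # 1M1X1M
```

With a 1 nt REF and a 1 nt ALT there is only one position to compare, so the result can never be anything other than 1X for a SNP.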
What you might be confusing this with is the CIGAR string associated with each aligned read in a BAM file. That version describes how the read aligns to the reference sequence, and it is probably more relevant for assessing the quality of the variant call. However, it is typically easier to look at the reads aligned to the reference in a genome browser (like IGV) than to inspect CIGAR strings manually one by one.
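If you do want to read a few read-level CIGAR strings by hand, a small sketch like the following may help. The CIGAR value 45M2D30M5S is a made-up example, not taken from your data, and the helper names are hypothetical. The operation letters and the set of operations that consume reference bases (M, D, N, = and X) follow the SAM specification.

```python
import re

# One CIGAR operation: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a read CIGAR string into (length, operation) pairs."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to make sure nothing was silently skipped.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


example = "45M2D30M5S"  # hypothetical read: 45 aligned, 2 deleted, 30 aligned, 5 soft-clipped
print(parse_cigar(example))     # [(45, 'M'), (2, 'D'), (30, 'M'), (5, 'S')]
print(reference_span(example))  # 77
```

Reads whose CIGAR strings show a lot of soft clipping or indels right around the variant position are the kind of thing you would also spot visually in IGV.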