SNP variant tool gives a string instead of one nucleotide

Hi everyone!

The SNP analysis gives me two long strings (differing in only one character) as the SNP variant for my strains at one position, namely POS:2521921
Ref GAAAAAAAAAAAAAAAAAAAGATGAAACTAAGAAAAC
Alt AAAAAAAAAAAAAAAAAAAAGATGAAACTAAGAAAAC
(The first character is the only difference. At every other position, the reported variant was just a single-nucleotide difference.)

Is this because of some kind of tool error? Or could a potential double transition (except at that character) explain this?

Looks like this is an artefact of variant calling. (It’s beyond the scope of this help forum, but it could reflect what types of variants and combinations thereof the variant caller initially considered for that region. Eventually it decided to call only this single SNP, but may have considered other possible changes.)

Whatever the reason, it is good practice to process your VCF with bcftools norm after variant calling, which should turn cases like this into their standard single-nucleotide representation and will, more generally, ensure the result conforms to VCF standards.
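
To make concrete what "standard single-nucleotide representation" means here, below is a minimal Python sketch of the allele-trimming idea. It is only an illustration, not how bcftools is implemented: the function name trim_alleles is made up for this example, and it only strips bases shared by REF and ALT, whereas bcftools norm also left-aligns indels against the reference FASTA (which you supply with -f). The REF/ALT strings are copied from the question.

```python
def trim_alleles(pos, ref, alt):
    """Reduce a REF/ALT pair to a minimal representation (simplified sketch).

    Hypothetical helper for illustration only; it does not left-align indels.
    """
    # Trim shared trailing bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim shared leading bases; each trimmed base shifts POS right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The record from the question (POS, REF, ALT).
pos, ref, alt = trim_alleles(
    2521921,
    "GAAAAAAAAAAAAAAAAAAAGATGAAACTAAGAAAAC",
    "AAAAAAAAAAAAAAAAAAAAGATGAAACTAAGAAAAC",
)
print(pos, ref, alt)
```

If the two alleles really differ only in their first base, as described in the question, everything after that base is trimmed away and the record should come out as a plain G to A change at the same position. For real data, run bcftools norm rather than a script like this.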
