snippy analysis on bacterial isolates

I am analysing 9 bacterial isolates of acinetobacter baumannii using snippy to identify SNPs within each sequence when compared with the reference genome. I have run BLAST on these isolates against the reference genome and can see 100% coverage within the chromosome but plasmid integration in different areas of each isolate. It is always the same sequence from the plasmid which is integrating but at different positions within each isolate. When I run SNP analysis using snippy, there are no SNPs identified in any of the isolates - this may be a silly question but if there is 100% coverage in BLAST, would this be correct that there are no SNPs identified in snippy? Any help would be greatly appreciated.

Welcome, @Jonathon_McLaughlin

Coverage or percent identity?

For a BLAST alignment:

Coverage is a metric that describes overlap between the query and target within the aligned region.

  • The query will have a coverage result == how much of that query sequence is contained within the aligned region
  • And, the target will have a coverage result == how much of that target sequence is contained with the aligned region

Percent identity is a metric that describes how “exact” the basepair matches are between the query and target within the aligned region. This one is a bit more complicated, but for SNPs, you will care about discovering if the query sequence (“sample”) has any different bases compared to the target (“reference”).

For Snippy:

The Coverage metric here is a bit different and can get more complex overall – coverage can inform about how many query sequences (sample reads) overlap with a particular basepair of interest in the target (reference). This is also commonly called the “depth of coverage”.

These are lines of evidence considered when deciding if, at a particular position in the target reference sequence, the query sequences that overlap at the same region have a different basepair (SNP), or an extra basepair (Insert), or a missing basepair (Deletion), and if that matters (based on coverage, quality, ratios, etc). If yes, that basepair is a potential SNP/indel candidate.

This is on the tool form in the Help section. Learn more about what this means in the tutorials and other linked resources.

Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels).

So, to bring it all together, for the first part of your question here …

The first thought I have is that you might not have enough read coverage (query reads) over the isolate (target reference) at some or all locations to call SNPs. In other words, not enough “depth”. Or, you may truly have exact matches over the whole area.

Maybe play around with the Snippy parameters or output more of the optional reports to see if you can figure this part out, or confirm your conclusion! This is a judgement call by the scientist doing the analysis but you could also consider learning more about what others may be doing when working with this species or analysis domain e.g. what thresholds they are also using to call SNPs. You could also run your data through other variant calling protocols and compare (this is pretty common).

Extra: Load all of your data into a genome browser like IGV and visually inspect the region. Do you notice anything from that overview? Taking a look won’t generate statistics, or variant calls, but getting an idea of the big picture tends to be helpful, and can reveal areas to explore next.

Hope this helps! :slight_smile:

1 Like

Hi @jennaj,
Thank you so much for replying to my comment, it is much appreciated!
In BLAST, I am getting 100% percent identity and 99% query coverage. I will attempt to refine the parameters in Snippy to see if any changes occur in the results, I have the minimum coverage set to 0.1 in hopes that the smaller coverage will identify smaller changes - but I will have to go back over the tutorial to refine my knowledge on the tool.
In terms of other variant calling protocols, do you have any recommendations I could use? I have tried using Mauve to align then output SNPs but there has been zero identified there, although I do not think Mauve is ideal for this analysis! This is my first time doing this sort of work and normally focus on proteomics, so I really appreciate your input on this matter!
Thanks again,
Jonathon

Hi @Jonathon_McLaughlin

Great, glad that helped!

Freebayes is another common tool for calling SNPs. We have a few tutorials that incorporate it if you want review examples. :slight_smile: Link → Tutorials using devteam/freebayes