SnpEff error while processing VCF entry on Galaxy

Hi,

I have been trying to annotate my vcf.gz files using the SnpEff tool on Galaxy. They’re human samples, so I’m using hg19 as the SnpEff Genome Version Name.

I keep getting the following error message. I’m guessing from it that the problem is that the reference and alternate alleles are not single bases but two bases each (TG and CA, respectively, in the example below). Is that the case?
How can I resolve this error? GATK was used for variant calling.
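
To check how many records in a file are multi-base substitutions like the one in the error, the REF and ALT columns can be compared by length. This is only a minimal sketch: the function names are made up for illustration, the classification is deliberately simplified (symbolic alleles such as `<DEL>` are not handled), and it assumes a tab-separated VCF data line.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length (simplified)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == len(alt):
        return "mnp"  # same length, more than one base
    return "ins" if len(alt) > len(ref) else "del"


def variant_types(vcf_line: str) -> list[str]:
    """Return one type per ALT allele of a tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    return [classify_variant(ref, alt) for alt in alts]


# Shortened version of the failing record (most INFO fields omitted).
line = "chr1\t16890671\t.\tTG\tCA\t125.238\tPASS\tTYPE=mnp"
print(variant_types(line))  # ['mnp']
```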

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/corral4/main/jobs/040/966/40966912/_job_tmp -Xmx7g -Xms256m

Error: Error while processing VCF entry (line 176) :

chr1 16890671 . TG CA 125.238 PASS AF=0.53012;AO=44;DP=84;FAO=44;FDP=83;FDVR=10;FR=.;FRO=39;FSAF=21;FSAR=23;FSRF=20;FSRR=19;FWDB=-0.0269551;FXX=0.0119048;HRUN=1;HS_ONLY=0;LEN=2;MLLD=170.568;OALT=CA;OID=.;OMAPALT=CA;OPOS=16890671;OREF=TG;PB=0.5;PBP=1;PPD=0;QD=6.03554;RBI=0.0269856;REFB=0.0133951;REVB=-0.00128241;RO=39;SAF=21;SAR=23;SPD=0;SRF=20;SRR=19;SSEN=0;SSEP=0;SSSB=-0.0302714;STB=0.516705;STBP=0.775;TYPE=mnp;VARB=-0.00753567 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:91:84:83:39:39:44:44:0.53012:23:21:20:19:23:21:20:19

java.lang.StringIndexOutOfBoundsException: String index out of range: 3

java.lang.StringIndexOutOfBoundsException: String index out of range: 3

at java.lang.String.substring(String.java:1963)

at org.snpeff.snpEffect.HgvsProtein.simplifyAminoAcidsLeft(HgvsProtein.java:395)

at org.snpeff.snpEffect.HgvsProtein.simplifyAminoAcids(HgvsProtein.java:384)

at org.snpeff.snpEffect.HgvsProtein.toString(HgvsProtein.java:491)

at org.snpeff.snpEffect.VariantEffect.getHgvsProt(VariantEffect.java:633)

at org.snpeff.vcf.VcfEffect.set(VcfEffect.java:1031)

at org.snpeff.vcf.VcfEffect.<init>(VcfEffect.java:147)

at org.snpeff.outputFormatter.VcfOutputFormatter.addInfo(VcfOutputFormatter.java:98)

at org.snpeff.outputFormatter.VcfOutputFormatter.toString(VcfOutputFormatter.java:286)

at org.snpeff.outputFormatter.OutputFormatter.endSection(OutputFormatter.java:112)

at org.snpeff.outputFormatter.VcfOutputFormatter.endSection(VcfOutputFormatter.java:230)

at org.snpeff.outputFormatter.OutputFormatter.printSection(OutputFormatter.java:145)

at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:292)

at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotateVcf(SnpEffCmdEff.java:468)

at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:142)

at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1029)

at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)

at org.snpeff.SnpEff.run(SnpEff.java:1183)

at org.snpeff.SnpEff.main(SnpEff.java:162)

Thank you in advance for your help.

Hi @rozita,
the error seems to be related to this issue: String index out of range · Issue #177 · pcingola/SnpEff · GitHub. SnpEff will be updated to version 5.1; I think that will fix the problem: Update PR.

Regards

Hi @gallardoalba ,
Thank you for your help. From what I’ve read, it seems the update will take at least until next week. Is there any alternative that I can use in the meantime?

You can try to run it in a local Galaxy instance by cloning my repository. To do so, you need to install planemo on your computer and run the command planemo s inside this folder. That will let you launch the latest SnpEff version in your local Galaxy instance.

Regards

I’m sorry, I’m a beginner with this kind of analysis, so I may have a lot of questions that seem trivial. I was using SnpEff on https://usegalaxy.org/ for the analysis. My understanding is that the steps you mentioned are done on the command line. I followed them, and I think I’ve managed to install planemo and clone the repository, and I can view the tools. However, I’m not sure about the exact steps needed to launch SnpEff. Would you be able to assist me with that? Many thanks for your help.

Hi @rozita,
the error is related to the fact that the VCF line encodes an MNP rather than a simple SNP, yes, but it is not directly caused by that.
Rather, it seems that SnpEff, after having completed the annotation of the line internally, tries to simplify the amino acid change it’s going to report by stripping off leading and trailing identical amino acids.
Let’s assume, for example, that in your TG → CA change the T → C is silent (leaving, say, an Ala unaltered), but the G → A changes a Leu → Ser (completely making this up, of course). Then the original amino acid change annotation would be AlaLeu → AlaSer, but obviously that could be simplified to just Leu → Ser.

Now the problem is that SnpEff assumes in that simplifying code that amino acids are encoded in 3-letter code at this point. For some reason that doesn’t seem to be true in your case, but you seem to arrive at that piece of code with single-letter code (like AL → AS in the example). Now when the code tries to compare the first three letters of the left string to the first three letters of the right one, it fails with that String index out of range: 3 from the error message.
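
To make the failure mode concrete, here is a rough Python sketch of that kind of prefix stripping. It is not SnpEff's actual code: `substring` and `strip_common_prefix` are hypothetical names, and `substring` only imitates the way Java's `String.substring` raises on an out-of-range end index, since Python slicing would silently truncate instead.

```python
def substring(s: str, begin: int, end: int) -> str:
    """Imitate Java's String.substring, which raises on out-of-range indices."""
    if begin < 0 or end > len(s) or begin > end:
        raise IndexError(f"String index out of range: {end}")
    return s[begin:end]


def strip_common_prefix(ref_aa: str, alt_aa: str, code_len: int = 3):
    """Drop leading amino acids shared by both strings, code_len letters at a time."""
    while ref_aa and alt_aa:
        if substring(ref_aa, 0, code_len) != substring(alt_aa, 0, code_len):
            break
        ref_aa, alt_aa = ref_aa[code_len:], alt_aa[code_len:]
    return ref_aa, alt_aa


# Three-letter codes: the shared leading Ala is removed as intended.
print(strip_common_prefix("AlaLeu", "AlaSer"))  # ('Leu', 'Ser')

# One-letter codes reaching code that assumes three letters: the lookup overruns.
try:
    strip_common_prefix("AL", "AS")
except IndexError as err:
    print(err)  # String index out of range: 3
```

With `code_len=1` the same function handles the one-letter input, which is the mismatch described above: the step size has to agree with the encoding of the strings it receives.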
Looking for an explanation of why you arrive there with one-letter codes, I can only think of this setting in the SnpEff annotation options:
“Use one letter Amino acid codes in HGVS notation. E.g. p.R47G instead of p.Arg47Gly”. Maybe that is what’s incompatible with MNPs?
I’m just guessing here, but if you have checked that option, try rerunning without it.
Good luck,
Wolfgang


Hi @wm75 ,

The only SnpEff annotation options that I’ve checked were the following, so I’m not sure that is the case:

  • Use ‘EFF’ field compatible with older versions (instead of ‘ANN’)
  • Override classic and use Sequence Ontology terms for effects (missense_variant vs NON_SYNONYMOUS_CODING)
  • Old notation style notation: E.g. ‘c.G123T’ instead of ‘c.123G>T’ and ‘X’ instead of ‘*’
  • Use transcript ID in HGVS notation. E.g. ENST00000252100:c.914C>G instead of c.914C>G
  • Only use protein coding transcripts
  • Use gene ID instead of gene name (VCF output)
  • Add loss of function (LOF) and nonsense mediated decay (NMD) tags
  • Perform ‘cancer’ comparisons (somatic vs. germline)

Out of these ‘EFF’ is the most likely culprit then. Try to go for ‘ANN’.
Which version of SnpEff are you trying to use exactly btw?

I’m using version 4.3 on usegalaxy.org and I got the same error message after choosing ANN instead of EFF as suggested.

It worked when I didn’t select any of the options and converted the vcf.gz files to vcf.
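
For anyone who wants to do the vcf.gz to vcf conversion on a local copy of the file before uploading, a minimal sketch with Python's standard library could look like the following. The function name is made up for illustration, and this is not necessarily how the conversion was done in this thread.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_vcf(src, dest=None) -> Path:
    """Decompress a gzip-compressed VCF; by default strip the trailing '.gz'."""
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")
    # Stream the data so large files are not loaded into memory at once.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Calling `gunzip_vcf("sample.vcf.gz")` (a placeholder file name) would write `sample.vcf` next to the input and return its path.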
