SnpEff eff - ERROR_CHROMOSOME_NOT_FOUND

Hello,

I am trying to annotate a .vcf file of nucleotide changes in a sample of the Influenza B virus genome, relative to the reference strain (B/Victoria/02/1987), with the corresponding amino acid changes.

I have built a SnpEff database using SnpEff build and the reference genome assembly (ASM3108318v1) for this strain, which I obtained from NCBI in gbff.gz format.

I have run the SnpEff chromosome-info tool on the database, and the chromosome names and co-ordinates appear to be correct and to match the chromosome names in my .vcf file for this sample. Here is the output of that tool:

CY018764 1’ 2351
CY018763 1’ 2334
CY018762 1’ 2271
CY018757 1’ 1843
CY018760 1’ 1803
CY018759 1’ 1521
CY018758 1’ 1151
CY018761 1’ 1061

However, when I run SnpEff eff using the database and my .vcf, every row in the output VCF shows the error “NO CHROMOSOME FOUND” and no amino acid nomenclature is present. For example:

CY018764 1822 . G A . PASS DP=1340;MQ=249.42;FractionInformativeReads=1;SoftClipRatio=0;STR;RU=G;RPA=2;ANN=A||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 1:67.9:0,1333:1:0,695:0,638:1333:0,0,927,406:0,0,701,632

I will eventually need to repeat this process for a variety of Influenza strains (many of which do not have pre-existing databases).

I am unsure why this is occurring and wondered if anyone could help me troubleshoot this issue. Any help would be hugely appreciated!

Hi @Martin_Larke,

I think the `1’` column in your chromosome info output represents the sequence version, i.e., your DB sequences are really called `CY018764.1`, `CY018763.1`, etc., while your VCF uses the unversioned names.
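One way to check this kind of mismatch is to compare the CHROM values in the VCF against the names stored in the database. The following is only a minimal sketch in plain Python: the helper names are made up for illustration, and the database names (`CY018764.1`, etc.) are assumed from the guess above, not read from SnpEff itself.

```python
import re

def strip_version(name):
    """Drop a trailing GenBank-style version suffix such as '.1'."""
    return re.sub(r"\.\d+$", "", name)

def vcf_chromosomes(vcf_lines):
    """Collect the distinct CHROM values from the data lines of a VCF."""
    names = set()
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names

def compare_names(vcf_names, db_names):
    """Classify each VCF chromosome name against the database names."""
    db_names = set(db_names)
    by_base = {strip_version(n): n for n in db_names}
    report = {}
    for name in sorted(vcf_names):
        if name in db_names:
            report[name] = "exact match"
        elif strip_version(name) in by_base:
            report[name] = "version mismatch with " + by_base[strip_version(name)]
        else:
            report[name] = "not found"
    return report

# Assumed database name (hypothetical, based on the guess above).
db = ["CY018764.1"]
vcf = ["#CHROM\tPOS\tID\tREF\tALT", "CY018764\t1822\t.\tG\tA"]
print(compare_names(vcf_chromosomes(vcf), db))
```

A "version mismatch" result would point to the `.1` suffix as the cause, whereas "not found" would suggest a different problem.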

SnpEff build has the option `Remove sequence version label?` to adjust for this issue. Alternatively, you would need to change the reference sequence used to generate the VCFs so that it uses the `.1` identifiers.
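If neither rebuilding the database nor regenerating the VCFs is convenient, a third route (not one of the two options above, just a related workaround) would be to rename the chromosomes in the existing VCF. The sketch below is plain Python, not a SnpEff feature; the function names are hypothetical, and it assumes every sequence is version 1, which you would need to confirm against the GenBank records.

```python
import re

# Matches the ID field of a '##contig=<ID=...>' header line.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")

def add_version(name, version="1"):
    """Append '.<version>' unless the name already ends in a version suffix."""
    if re.search(r"\.\d+$", name):
        return name
    return name + "." + version

def rename_vcf_line(line, version="1"):
    """Return the VCF line with its chromosome name versioned."""
    if line.startswith("##contig=<ID="):
        return CONTIG_ID.sub(
            lambda m: m.group(1) + add_version(m.group(2), version), line
        )
    # Leave other header lines and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    return add_version(chrom, version) + sep + rest

def rename_vcf(src_path, dst_path, version="1"):
    """Copy an uncompressed VCF, versioning every chromosome name."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_vcf_line(line, version))
```

For example, `rename_vcf("sample.vcf", "sample.versioned.vcf")` (both file names are placeholders) would write a copy whose CHROM column and `##contig` lines use `CY018764.1`-style names.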

Best,

Wolfgang

Hi Wolfgang,

You are correct, I tested on one chromosome and that has fixed the error! Many thanks for your help.
