Problems with snpEff databases

Welcome @angelbg

SnpEff is very picky about all of the files being in sync. If the version of the reference genome you mapped against and then called variants with is not exactly the same as the version the pre-built index is based on, errors can come up. We had a discussion yesterday about some of those corner cases, if you are interested.

The part of the job log you shared is a good start, and this pop-up usually means something else is going on, sometimes with the metadata of the job’s input datasets. This could be a different database key on some of the data, empty input files (header only?), or other things, some of which you can adjust.
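For the "header only?" case, here is a minimal sketch of a check you could run on a downloaded copy of the VCF, outside of Galaxy. The function name `count_vcf_records` is just an illustration (it is not part of Galaxy or SnpEff), and it assumes a plain-text or gzip-compressed VCF.

```python
import gzip


def count_vcf_records(path):
    """Count data lines in a VCF, skipping header lines and blank lines.

    Hypothetical helper for illustration only. Header lines in a VCF
    start with '#', so anything else that is non-empty is a record.
    """
    # Assumption: a '.gz' suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )
```

A result of 0 would mean the file has a header but no variants, so the problem is likely upstream of the annotation step.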

I had trouble downloading pre-built indexes in May and am trying that again on a test dataset to see what happens.

If you would like to share the history containing your job, we can take a closer look and try to figure out a solution. Make sure the history has all the upstream jobs and data for the failing sample, since the early files can matter.

The alternative is to create the index yourself from a GenBank (.gbk) record. Doing this at the start, having that tool parse out the custom genome fasta (and annotation), and then using that version of the assembly for the rest of the analysis tends to work better (and is how the issue yesterday was resolved).

Calling and annotating SNPs is very exacting: the coordinate scheme and assembly bases must match exactly across all data, or the results will be incorrect or the tools can fail.
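If you want a quick sanity check on downloaded copies of the files, the sketch below compares sequence names and lengths between a reference fasta and the `##contig` lines in a VCF header. All of the function names are hypothetical, it assumes uncompressed files, and not every VCF carries `##contig` lines. Matching names and lengths are only a first check: they do not prove that the bases are identical.

```python
import re


def fasta_lengths(path):
    """Return {sequence_name: length} for an uncompressed FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first word of the title line as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def vcf_contigs(path):
    """Return {contig_id: length or None} from '##contig=<...>' header lines."""
    contigs = {}
    with open(path) as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # end of the meta-information header
            if line.startswith("##contig="):
                id_match = re.search(r"ID=([^,>]+)", line)
                len_match = re.search(r"length=(\d+)", line)
                if id_match:
                    length = int(len_match.group(1)) if len_match else None
                    contigs[id_match.group(1)] = length
    return contigs


def find_mismatches(fasta_path, vcf_path):
    """List contigs in the VCF header that disagree with the FASTA."""
    reference = fasta_lengths(fasta_path)
    problems = []
    for contig, length in vcf_contigs(vcf_path).items():
        if contig not in reference:
            problems.append(f"{contig}: not found in the reference fasta")
        elif length is not None and length != reference[contig]:
            problems.append(
                f"{contig}: VCF header length {length}, "
                f"fasta length {reference[contig]}"
            )
    return problems
```

An empty list from `find_mismatches` means the names and lengths agree. Anything else (for example `chr1` in one file and `1` in the other) points at a reference mismatch of the kind described above.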

I’ll post back after my test (to double check that there are no server issues). If you are not sure how to generate a history share link to post back here (for data issue resolution), please see the banner topic at this forum. Thanks! :slight_smile: