Snpeff database run errors

jennaj · March 5, 2025, 6:22pm

Yes, building a database with SnpEff can be a bit tricky since the tools are very particular about the formatting of identifiers (chromosomes, genes, other features). In short, try to locate files that are all based on the exact same assembly build/version, then use the simplest file formats possible (applying “cleanup” steps after getting the data from a provider might be necessary).
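As one illustration of what a “cleanup” step can look like, here is a minimal Python sketch that trims each fasta title line down to its first whitespace-delimited token, so the sequence name is just the bare identifier. The function name and the example records are made up for illustration, and this is not how any particular Galaxy tool is implemented; it only shows the idea.

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy a fasta from src to dst, keeping only the first
    whitespace-delimited token of each '>' title line.

    Returns the number of title lines that were changed.
    (Hypothetical helper, for illustration only.)
    """
    renamed = 0
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            new_line = ">" + name + "\n"
            if new_line.rstrip("\n") != line.rstrip("\r\n"):
                renamed += 1
            dst.write(new_line)
        else:
            # Sequence lines are copied through unchanged.
            dst.write(line)
    return renamed


# Made-up example data: one title line with a description, one without.
example = io.StringIO(">chr1 Homo sapiens chromosome 1\nACGT\n>chr2\nGGCC\n")
out = io.StringIO()
changed = simplify_fasta_headers(example, out)
print(out.getvalue(), end="")
print("title lines changed:", changed)
```

With real files, the two `io.StringIO` objects would be replaced by file handles opened for reading and writing.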

We have prior troubleshooting for this tool in these topics: snpeff and snpeff_build_gb plus any with reference-genome or reference-annotation tags.

For a short review of the human genome as an example, which has many different assemblies that are not directly compatible (but can be manipulated to be), please see →

Reference genomes at public Galaxy servers: GRCh38/hg38 example

And for a recent post where this type of data was reformatted (they are doing something a bit different, but it may be helpful anyway), please see →

Getting NCBI Reference genome indexed for tools: custom genome, reference genome, reference annotation

Then for this part

Those indexes come directly from the tool authors at the SnpEff & SnpSift home page. You could report the issue to them, but there might not be a lot they can do since the process is fully automatic and relies on public data. If that data is flawed, anything created from it will carry the problems forward, as you noticed!

So, all of that is a lot to read through! If you get stuck and would like to share back a history with just your reference data and the failed runs, we can probably help to diagnose what is going wrong and fix it up. Right now, it looks like you have mismatched chromosome identifiers, meaning the reference genome fasta and the reference annotation are not “matching up” for some reason. That could be a file format issue (simplifying the format is the place to start, maybe with gffread and NormalizeFasta), or it could be an actual difference in the data itself. In that second case, you’ll need to either standardize the identifiers across the files (with the Replace column tool, if a mapping exists) or locate different reference data.
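If you want to check for an identifier mismatch yourself before sharing, a rough Python sketch like the one below compares the sequence names in a fasta against the first column of a GFF/GTF annotation. The function names and the tiny example records are hypothetical, and it assumes a tab-delimited annotation with `#` comment lines; treat it as a quick diagnostic idea rather than a definitive check.

```python
import io


def fasta_ids(handle):
    """Collect the first token of every '>' title line in a fasta."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_ids(handle):
    """Collect column 1 (the sequence identifier) of a GFF/GTF file."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            # GFF3 files may embed sequences after this directive.
            break
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_handle, annotation_handle):
    """Report which identifiers are shared and which appear in one file only."""
    f = fasta_ids(fasta_handle)
    a = annotation_ids(annotation_handle)
    return {
        "shared": sorted(f & a),
        "fasta_only": sorted(f - a),
        "annotation_only": sorted(a - f),
    }


# Made-up example: the fasta uses "chr1"-style names, the annotation uses "1".
fasta = io.StringIO(">chr1 some description\nACGT\n>chr2\nGGCC\n")
gff = io.StringIO(
    "##gff-version 3\n"
    "1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    "2\tsrc\tgene\t1\t4\t.\t+\t.\tID=g2\n"
)
report = compare_ids(fasta, gff)
for key, values in report.items():
    print(key, values)
```

An empty "shared" list, as in this made-up example, is the kind of result that points to a naming mismatch between the two files rather than a problem with the tool itself.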

Hope this helps and we can follow up!