Downloading hg19/GRCh37 SnpEff database into history

jennaj · July 11, 2025, 11:26pm

Let’s start over in a new topic, since the timing, the tool version, the server you are working at, and the SnpEff source location can all make a difference.

What is going on: after a recent update at the external location where these files are hosted (and downloaded from), there are some newer issues. It sounds like this is your use case, and the solution below is what we would recommend for you, too.

Please give this a try! You will want to create the SnpEff index using the GenBank file and output both the fasta and gff3 from the tool. Then use those exact reference data for all of the upstream steps (mapping and variant calling), including using the output fasta as a custom genome.

Any questions while working through this, please let us know!

Problems with snpEff databases

Update

I was able to confirm that the downloaded versions of the database can be problematic. This has to do with how the precomputed indexes are hosted by the source. → Usage issues with downloaded snpeff 5.N databases ORG and EU · Issue #6956 · galaxyproject/tools-iuc · GitHub

From here, you can create your own index with the GenBank file for your species from NCBI. → FAQ: NCBI reference data

Or, you can build it from the fasta and gtf or gff3 files from any other source.

If you need help locating the files you need, we can help. We’ll need to know what indexes you are already using and which server you are working at. The assembly version is important. You could also build your SnpEff index first, get all the reference data prepared, then run the mapping and variant calling against those files instead, to make sure everything is internally consistent.
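One way to confirm that the reference data are internally consistent is to compare the sequence identifiers in the fasta against those in the annotation. The following is only a minimal sketch for use outside of Galaxy, assuming a plain-text fasta and a tab-delimited gff3 that have been downloaded locally. The function names are made up for illustration and are not part of SnpEff or Galaxy.

```python
import sys


def fasta_ids(lines):
    """Collect sequence identifiers: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff3_ids(lines):
    """Collect the seqid column (column 1) from gff3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines follow
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_lines, gff3_lines):
    """Report identifiers that appear in only one of the two files."""
    in_fasta = fasta_ids(fasta_lines)
    in_gff3 = gff3_ids(gff3_lines)
    return {
        "only_in_fasta": sorted(in_fasta - in_gff3),
        "only_in_gff3": sorted(in_gff3 - in_fasta),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_ids.py genome.fasta annotation.gff3
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gff3:
        print(compare_ids(fasta, gff3))
```

Identifiers listed under only_in_gff3 are the ones most likely to cause trouble, since the annotation then refers to sequences that the genome does not contain. Identifiers listed under only_in_fasta can be harmless, for example unplaced scaffolds with no annotated features.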

Hope this helps!

You will want to plan ahead to avoid sequence identifier conflicts, especially when working with human data: genome, annotation, other annotation like this one, and where you plan to visualize the data. IGV can accept any genome but UCSC will require UCSC identifiers. You can try to convert identifiers in some file types but not all, see Replace column.
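To show what an identifier conversion involves, here is a small sketch that renames the first column of a tab-delimited file. It is not the Replace column tool itself. The mapping is an assumption for illustration: it only covers the primary human chromosomes in Ensembl style (1, X, MT) and rewrites them in UCSC style (chr1, chrX, chrM). Anything not in the mapping is left unchanged.

```python
def build_mapping():
    """Hypothetical Ensembl-style to UCSC-style map, primary chromosomes only."""
    mapping = {str(n): f"chr{n}" for n in range(1, 23)}
    mapping.update({"X": "chrX", "Y": "chrY", "MT": "chrM"})
    return mapping


def rename_first_column(lines, mapping):
    """Yield lines with column 1 renamed.

    Header/comment lines (starting with '#') and blank lines pass through
    unchanged. Identifiers missing from the mapping are kept as they are.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        seqid, sep, rest = line.partition("\t")
        yield mapping.get(seqid, seqid) + sep + rest


if __name__ == "__main__":
    example = ["#CHROM\tPOS\n", "1\t100\n", "MT\t5\n", "GL000191.1\t7\n"]
    for converted in rename_first_column(example, build_mapping()):
        print(converted, end="")
```

This also shows why not every file type can be converted this way. The sketch only touches the first column of data lines, so identifiers stored elsewhere (for example, contig lines in a VCF header) would stay as they were. Renaming also only changes the labels and does not check that the underlying sequences are the same between sources.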

Use case: working at UCSC is important to you, or you have other reference data with UCSC (not Ensembl) identifiers. Solution: Pull in the UCSC version of the reference genome and annotation (RefSeq Genes is usually best), build the SnpEff index, and all should be good to go!

XRef → Reference genomes at public Galaxy servers: GRCh38/hg38 example

