Downloading hg19/GRCh37 SnpEff database into history

Welcome @boi_incog

Let’s start over in a new topic, since time, the tool version, the server you are working on, and the SnpEff source location can all make a difference.

What is going on: after a recent update at the external location where these files are hosted (and downloaded from), there are some newer issues. It sounds like this is your use case, and the solution below is what we would recommend for you, too.

Please give this a try! You will want to create the SnpEff index using the GenBank file, and output both the fasta and gff3 from the tool. Then use that exact reference data for all of the upstream steps, including using the output fasta as a custom genome.

Any questions while working through this, please let us know! :slight_smile:

You will want to plan ahead to avoid sequence identifier conflicts, especially when working with human data. Consider the genome, the annotation, any other annotation like this one, and where you plan to visualize the data. IGV can accept any genome, but UCSC requires UCSC identifiers. You can try to convert identifiers in some file types but not all; see Replace column.
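
If it helps to see what an identifier conversion involves, here is a minimal Python sketch, not something from the Galaxy tools themselves. It assumes Ensembl-style names (`1`..`22`, `X`, `Y`, `MT`) in the first column of a tab-delimited file and rewrites only those primary chromosomes to UCSC style (`chr1`, `chrM`). The function names `ensembl_to_ucsc` and `convert_column` are made up for illustration, and unplaced scaffolds or alternate contigs are deliberately left untouched because they need a real lookup table.

```python
# Assumption: only the primary human chromosomes are handled here.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name: str) -> str:
    """Map an Ensembl-style primary chromosome name to UCSC style."""
    if name == "MT":
        return "chrM"  # the mitochondrial genome is named differently
    if name in PRIMARY:
        return "chr" + name
    return name  # scaffolds and anything unknown are left unchanged


def convert_column(lines, column=0):
    """Yield tab-delimited lines with one identifier column converted."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header and comment lines as they are
            continue
        fields = line.rstrip("\n").split("\t")
        if column < len(fields):
            fields[column] = ensembl_to_ucsc(fields[column])
        yield "\t".join(fields) + "\n"


if __name__ == "__main__":
    example = ["#chrom\tpos\tref\n", "1\t100\tA\n", "MT\t5\tG\n"]
    for out in convert_column(example):
        print(out, end="")
```

Whatever method you use, check that the converted identifiers match the genome exactly before running anything else, since a single mismatch is enough to produce empty results.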

Use case: working at UCSC is important to you, or you have other reference data with UCSC (not Ensembl) identifiers. Solution: pull in the UCSC version of the reference genome and annotation (RefSeq Genes is usually best), build the SnpEff index, and all should be good to go!

XRef → Reference genomes at public Galaxy servers: GRCh38/hg38 example