Problems with snpEff databases

Hi,

I’m trying to run snpEff eff on a FreeBayes VCF. Since there appear to be no ‘Locally installed snpEff databases’ (I get ‘No options available’ in the field below), I tried to download a database with snpEff download. That works, but when I try to use it by choosing ‘Downloaded snpEff database in your history’, I get the following pop-up when I try to run the job:

Pasting the text as well, for convenience:

{
    "history_id": "d2153e1f600a6ba5",
    "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff/5.2+galaxy0",
    "tool_version": "5.2+galaxy0",
    "inputs": {
        "input": {
            "batch": false,
            "product": false,
            "values": [
                {
                    "id": "26c75dcccb616ac8356bd79e72b21da6",
                    "src": "hda",
                    "map_over_type": null
                }
            ]
        },
        "inputFormat": "vcf",
        "outputFormat": "vcf",
        "csvStats": false,
        "generate_stats": true,
        "snpDb|genomeSrc": "history",
        "snpDb|snpeff_db": {
            "batch": false,
            "product": false,
            "values": [
                {
                    "id": "26c75dcccb616ac802561507caf5baf2",
                    "src": "hda",
                    "map_over_type": null
                }
            ]
        },
        "snpDb|reg_section|regulation": null,
        "udLength": "0",
        "spliceSiteSize": "2",
        "spliceRegion|setSpliceRegions": "no",
        "annotations": null,
        "intervals": null,
        "transcripts": null,
        "filterOut": null,
        "filter|specificEffects": "no",
        "chr": null,
        "noLog": true
    }
}

I tried the databases ‘hg19’ and ‘hg38’, with the same result. I also tried the ‘Download on demand’ option with the same databases, but those jobs seem to get stuck in the running state and never finish.

Can you help?

Thanks!

Welcome @angelbg

SnpEff is very picky about all of the files being in sync. If the reference genome version you mapped against and called variants with is not exactly the same as the version the pre-built index is based on, errors can come up. We had a discussion yesterday about some of those corner cases, if you are interested.
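If you want to check for this kind of version mismatch yourself, one quick test is to compare the `##contig` lengths in the VCF header (if your VCF has them; not every caller writes them) against the sequence lengths in the reference FASTA. Below is a minimal Python sketch, assuming both files are available as plain text; the function names are hypothetical and not part of SnpEff or Galaxy. A contig that is missing from the reference, or has a different length, points to a different assembly version or a different naming convention.

```python
def vcf_contig_lengths(vcf_lines):
    """Collect {contig_id: length} from '##contig=<ID=...,length=...>' header lines."""
    lengths = {}
    for line in vcf_lines:
        line = line.strip()
        if not line.startswith("##contig=<"):
            continue
        body = line[len("##contig=<"):-1]  # drop the '##contig=<' prefix and the trailing '>'
        fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
        if "ID" in fields and "length" in fields:
            lengths[fields["ID"]] = int(fields["length"])
    return lengths


def fasta_lengths(fasta_lines):
    """Return {sequence_id: length}, using the first word of each '>' header as the ID."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def compare_contigs(vcf_lengths, ref_lengths):
    """List contigs that are missing from the reference or have a different length."""
    problems = []
    for name, length in sorted(vcf_lengths.items()):
        if name not in ref_lengths:
            problems.append(f"{name}: missing from reference")
        elif ref_lengths[name] != length:
            problems.append(f"{name}: VCF says {length}, reference has {ref_lengths[name]}")
    return problems


if __name__ == "__main__":
    # Toy data: chr1 agrees, but the VCF says 'chrM' while the FASTA uses 'MT'.
    header = ["##fileformat=VCFv4.2", "##contig=<ID=chr1,length=12>", "##contig=<ID=chrM,length=8>"]
    fasta = [">chr1 toy", "ACGTACGTAC", "GT", ">MT", "ACGTACGT"]
    for problem in compare_contigs(vcf_contig_lengths(header), fasta_lengths(fasta)):
        print(problem)  # prints: chrM: missing from reference
```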

The part of the job log you shared is a good start. This pop-up usually means something else is going on, sometimes with the metadata of the job’s input datasets. Possible causes include a different database key (dbkey) on some of the data, empty input files (header only?), and other things, some of which you can adjust.
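For the empty-input case, a header-only VCF is easy to spot once the file is downloaded. This is a small Python sketch with hypothetical helper names: it counts the data lines (non-blank lines that do not start with `#`), so a result of 0 means the file has a header but no variants to annotate.

```python
import gzip


def count_vcf_records(lines):
    """Count data lines: non-blank lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_vcf_records_in_file(path):
    """Open a plain or gzip-compressed VCF (chosen by file extension) and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_vcf_records(handle)
```

Usage, with a hypothetical file name: `count_vcf_records_in_file("freebayes_output.vcf")`.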

I had trouble downloading pre-built indexes in May and am trying that again on a test dataset to see what happens.

If you would like to share the history with your job, we can review it more closely and try to figure out a solution. Make sure the history includes all of the upstream jobs and data for the failing sample, since the early files can matter.

The alternative is to create the index yourself from a .gbk (GenBank) record. Doing this at the start, letting that tool parse out the custom genome FASTA (and annotation), and then using that version of the assembly for everything downstream tends to work better (and is how yesterday's issue was resolved).
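To make the idea concrete, here is a simplified Python sketch of what parsing out the genome FASTA from a GenBank record involves: the sequence sits between the `ORIGIN` line and the `//` terminator, mixed with position numbers and spaces. The Galaxy tool does this for you; this is only an illustration. Using the `LOCUS` name as the FASTA ID is an assumption here (other tools may use the accession.version instead), and whichever ID you end up with has to match the chromosome names in your BAM and VCF.

```python
def genbank_to_fasta(gb_lines, width=60):
    """Extract each record's sequence from GenBank flat-file lines as FASTA text.

    Simplified sketch: uses the LOCUS name as the FASTA ID and ignores all features.
    """
    records = []
    name, parts, in_sequence = None, [], False
    for line in gb_lines:
        if line.startswith("LOCUS"):
            # e.g. 'LOCUS       TOY1    12 bp    DNA     linear': the 2nd field is the name
            name, parts, in_sequence = line.split()[1], [], False
        elif line.startswith("ORIGIN"):
            in_sequence = True  # sequence lines follow until the '//' terminator
        elif line.startswith("//"):
            if name is not None:
                records.append((name, "".join(parts).upper()))
            name, parts, in_sequence = None, [], False
        elif in_sequence:
            # keep only the bases: drop position numbers, spaces, and newlines
            parts.append("".join(ch for ch in line if ch.isalpha()))
    out = []
    for record_name, seq in records:
        out.append(f">{record_name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, `genbank_to_fasta(open("my_record.gbk"))` (hypothetical file name) returns the FASTA text as a string.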

Calling and annotating SNPs is very exacting work: the coordinate scheme and the assembly bases must match exactly across all data, or the results will be incorrect or the tools can fail.
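A direct way to test whether the assembly bases agree is to check that every REF allele in the VCF matches the reference sequence at that position (VCF `POS` is 1-based). Minimal Python sketch, assuming the reference has already been loaded into a dict of `{chromosome: sequence}`; the function name is hypothetical.

```python
def check_ref_alleles(vcf_lines, reference):
    """Return (chrom, pos, ref, observed) for records whose REF disagrees with the reference.

    observed is None when the chromosome name is not in the reference at all.
    """
    mismatches = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, int(pos), ref, None))
            continue
        start = int(pos) - 1  # convert 1-based VCF POS to a 0-based string index
        observed = seq[start:start + len(ref)].upper()
        if observed != ref.upper():
            mismatches.append((chrom, int(pos), ref, observed))
    return mismatches
```

Any hit here means the VCF and the reference are not describing the same assembly (or the same chromosome names), which is exactly the situation where SnpEff results go wrong.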

I’ll post back after my test (to double-check that there are no server issues). If you are not sure how to generate a history share link to post back here (for data issue resolution), please see the banner topic on this forum. Thanks! 🙂

Update

I was able to confirm that the downloaded versions of the databases can be problematic. This has to do with how the source hosts the precomputed indexes. → Usage issues with downloaded snpeff 5.N databases ORG and EU · Issue #6956 · galaxyproject/tools-iuc · GitHub

From here, you can create your own index from the GenBank file for your species from NCBI. → FAQ: NCBI reference data

Or, build it from FASTA plus GTF or GFF3 files from any other source.
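Before building from FASTA plus GTF/GFF3, it is worth confirming that every sequence name in column 1 of the annotation also appears as a FASTA ID, since a mismatch here (`chr1` vs `1` is the classic case) produces a database that does not line up with your data. A minimal Python sketch with hypothetical helper names; it stops at a GFF3 `##FASTA` section, if present, so embedded sequence lines are not mistaken for features.

```python
def annotation_seqnames(gff_lines):
    """Collect the column-1 sequence names from GTF or GFF3 feature lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 may append sequences after this directive; they are not features
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_ids(fasta_lines):
    """Return the first word of each '>' header line."""
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}


def unmatched_seqnames(gff_lines, fasta_lines):
    """Sequence names used in the annotation but absent from the FASTA, sorted."""
    return sorted(annotation_seqnames(gff_lines) - fasta_ids(fasta_lines))
```

An empty result means every annotated sequence has a matching FASTA entry; anything listed needs renaming on one side before the build.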

If you need help locating the files, we can help. We’ll need to know which indexes you are already using and which server you are working on. The assembly version is important. You could also build your SnpEff index first, prepare all of the reference data, and then run the mapping and variant calling against those same files, so that everything is internally consistent.

Hope this helps!