How to add Triticum aestivum snpEff4.3 genome database or appropriate wheat genome database in Galaxy for VCF annotation?

vcf
snpeff
wheat
annotation

#1

I added the T. aestivum snpEff4.3 genome database in Galaxy (https://usegalaxy.org/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fsnpeff%2FsnpEff%2F4.3%2BT.galaxy1&version=4.3%2BT.galaxy1&__identifer=rjgm18erx4). It seems the database was added, as I can see the T. aestivum snpEff4.3 genome database in the history panel on the right side of the Galaxy page. But when I select the VCF file and the snpEff4.3 wheat database as inputs, it shows an error. I would like to annotate natural variants against the wheat genome (particularly against wheat homeolog 3DL). I also tried Ensembl VEP, which works, but I would like to try Galaxy to see the annotations it produces. Does anyone know about this?


#2

What error exactly are you getting? Please post the error message here. It’s hard to help without this info.


#3

I attached the file. It shows that the added wheat database is unavailable when I run a VCF file with the added database.


#4

Hi - We are in the process of updating the SnpEff tool suite at Galaxy Main https://usegalaxy.org. You might have run into one of the known bugs the update will address. The ticket with details if interested: https://github.com/galaxyproject/usegalaxy-playbook/issues/157

Choices:

  1. Wait for the tools to be updated at Galaxy Main (the ticket linked above will close when that is completed) and rerun using the new tool versions.

  2. Use the tools at Galaxy EU https://usegalaxy.eu. EU has the updated tool versions already installed and as far as I know, those are working as expected now. Should the tool still present problems at Galaxy EU, then more is going on.

@wm75 is an admin for the EU server and I am an admin for the Main server – either of us can help with more troubleshooting to determine if this is a usage problem versus some tool issue that remains to be addressed.

Update: Looking at your graphic more closely, I am wondering if the snpeff database dataset is in a hidden/unstable state or from an earlier SnpEff tool version (not always compatible). It appears to have been uploaded rather than created new in the same working history. If you want to direct message me here and share your registered account email address at Galaxy Main (I do NOT need your password, and you should never share that with anyone), I can take a look to see if that is a factor first, to save you some time before bothering with reruns/moving your data to another server. Your choice - thanks!


#5

@Karthikeyan_Thiyagar If this is really, as noted by @jennaj and suggested by your screenshot, a snpeffdb dataset downloaded from Galaxy, then reuploaded, then this is not going to work on any server at the moment.
As things are implemented currently, there is extra data associated with a snpeffdb dataset, which will not be included in your download, but which is necessary for snpEff. In other words, SnpEff genomes obtained through Galaxy cannot be exported from that particular Galaxy instance to anywhere else in a useable form. This situation is different from what you may be used to from other Galaxy-generated data, but is technically not easy to avoid.

So the remaining question is why you tried this approach in the first place. Was that in reaction to other ways not working either? If so that would most likely indicate a bug with a snpEff tool.
Otherwise, I would suggest that you download the snpEff genome again (creating a new dataset in your history), then use snpEff with that genome directly.
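
To illustrate the point about extra data, here is a toy model in Python. It is only a sketch, not how Galaxy is actually implemented: the file names (`dataset_13.dat`, `dataset_13_files`, `snpEffectPredictor.bin`) and all the helper functions are assumptions made up for the illustration. It shows a dataset that consists of a small primary file plus a separate directory of extra files, and what is left when only the primary file is copied.

```python
import shutil
import tempfile
from pathlib import Path


def make_toy_dataset(root: Path) -> Path:
    """Create a toy dataset: a primary file plus a sibling extra-files directory."""
    primary = root / "dataset_13.dat"  # hypothetical name
    primary.write_text("SnpEff database description (toy)\n")
    extra = root / "dataset_13_files"  # hypothetical name
    extra.mkdir()
    (extra / "snpEffectPredictor.bin").write_bytes(b"\x00toy-binary")
    return primary


def download_primary_only(primary: Path, dest_dir: Path) -> Path:
    """Copy only the primary file, as a plain download does in this toy model."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(primary, dest_dir / primary.name))


def is_usable(primary: Path) -> bool:
    """A toy dataset is usable only if its extra-files directory exists and is non-empty."""
    extra = primary.with_name(primary.stem + "_files")
    return extra.is_dir() and any(extra.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    server = Path(tmp) / "server"
    server.mkdir()
    original = make_toy_dataset(server)
    copy = download_primary_only(original, Path(tmp) / "laptop")
    original_ok = is_usable(original)  # extra files are present on the "server"
    copy_ok = is_usable(copy)          # extra files were never copied

print("original usable:", original_ok)
print("downloaded copy usable:", copy_ok)
```

In this model the downloaded copy has nothing next to it, so re-uploading it cannot bring the extra files back. That is why creating the genome dataset fresh on the server you are working on is the suggested route.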


#6

Thanks @wm75 for the clarification!

@Karthikeyan_Thiyagar Please try this at the Galaxy EU server for now using the latest version of the SnpEff tools. This will avoid known prior issues with older versions (now fixed). It is fine to transfer other data between servers (your vcf data, etc). Please let us know how this goes!


#7

Thanks @jennaj and @wm75. I just tried again to add the same database with the latest version of the snpEff tool, but it shows the same error as before.

Please see the attached file with the files history.


#8

It looks like the file selected is the “available database” listing (tool SnpEff databases: list available databases).

You want to select the .db result in dataset 13. That looks like the result of the tool SnpEff build (and not SnpEff download), but both produce a “snpEff database” db output.

Note: Dataset 13 is a collection – so click on the collection “folder” icon instead of the single dataset icon to select the proper input. Hover over the three icons to see the pop-ups describing what each of the three are for.

If choosing the .db input doesn’t work for some reason or you run into more problems, please confirm that this was done at the EU server (it appears so, judging from the low quota usage).
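
The selection rule above can be sketched in a few lines of Python. This is a rough illustration only: the history listing is made-up data, the dictionary keys (`hid`, `name`, `extension`) and the `tabular` type for the listing output are assumptions, and `usable_snpeff_inputs` is a hypothetical helper, not part of Galaxy. The idea is to pick the input by datatype (`snpeffdb`) rather than by a name that merely mentions SnpEff.

```python
from typing import Dict, List

# Made-up history contents; only "dataset 13" is taken from the thread.
history: List[Dict[str, object]] = [
    {"hid": 11, "name": "SnpEff databases: available databases", "extension": "tabular"},
    {"hid": 12, "name": "variants.vcf", "extension": "vcf"},
    {"hid": 13, "name": "SnpEff build: database", "extension": "snpeffdb"},
]


def usable_snpeff_inputs(items: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only items whose datatype is 'snpeffdb'; the database listing is excluded."""
    return [item for item in items if item.get("extension") == "snpeffdb"]


for item in usable_snpeff_inputs(history):
    print(item["hid"], item["name"])
```

The listing of available databases is just a table of names, so it is filtered out here even though its name contains "SnpEff".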


#9

Thanks a lot Jennaj. I got the same error message even on the European Galaxy server. But I found an article, “https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0137549”, in which the authors used the snpEff toolbox to annotate SNPs using the wheat reference sequence.
Article title: "Mutation Scanning in Wheat by Exon Capture and Next-Generation Sequencing"

I contacted the corresponding author of this article via email, and he told me to contact a bioinformatician from that article to get an idea about the usage of snpEff. I will contact the bioinformatician of the paper; if I find an answer or an idea for using the wheat snpEff database, I will post it here. Thanks