Question about the Galaxy tutorial “Exome sequencing data analysis”: the dbSNP VCF file

Hi, I am learning from the “Exome sequencing data analysis” tutorial in order to develop the skills to analyze my genomics data against the reference genome and identify variants in specific genes of my samples.

I followed the tutorial and used “SnpSift Annotate” to add variant IDs to the VCF file of the “father”.
However, the dbSNP VCF file (dbSNP_138.hg19.vcf) offered in the tutorial does not contain ID information.
So SnpSift Annotate runs, but the output VCF still has no IDs.
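One quick way to confirm that a VCF really lacks IDs is to count the records whose ID column (the third tab-separated column) holds “.”, which is the VCF convention for a missing value. The following is a minimal sketch in plain Python, assuming an uncompressed or gzip-compressed VCF on disk; the helper names `open_vcf` and `count_ids` are hypothetical, chosen for illustration only.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def count_ids(lines):
    """Return (records, records_with_id) for an iterable of VCF lines.

    Header lines start with '#'. The ID column is the third tab-separated
    field; '.' means the record has no ID.
    """
    records = with_id = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        records += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != ".":
            with_id += 1
    return records, with_id
```

For example, `with open_vcf("dbSNP_138.hg19.vcf") as fh: print(count_ids(fh))` would report how many of the file's records carry an rs ID; a second number of 0 would match what is described above.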

I also downloaded the dbSNP VCF file from the NCBI database, but I could not run SnpSift Annotate on the “father” VCF against it. I guess this is because the chromosomes in the NCBI dbSNP VCF file are not named “chr#” but only “#”.
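If the naming mismatch is the problem, one workaround is to rewrite the chromosome names in the NCBI file to the UCSC “chr” style before annotating. The sketch below is an illustration under stated assumptions, not a tested Galaxy procedure: it assumes the NCBI file uses plain names such as “1”, “X” and “MT”, and it maps “MT” to “chrM”. Note that the hg19 chrM and the GRCh37 MT are different mitochondrial sequences, so MT positions may not line up. Renaming also cannot fix a genome build mismatch (for example, hg38 coordinates against an hg19 VCF). The function names are hypothetical.

```python
import re

# Matches the contig name in header lines like '##contig=<ID=1,length=...>'
CONTIG_RE = re.compile(r"^(##contig=<ID=)([^,>]+)")


def ucsc_name(chrom):
    """Map an NCBI/Ensembl-style chromosome name to the UCSC 'chr' style.

    Assumption: 'MT' becomes 'chrM'; names already starting with 'chr'
    are left unchanged.
    """
    if chrom.startswith("chr"):
        return chrom
    if chrom == "MT":
        return "chrM"
    return "chr" + chrom


def add_chr_prefix(lines):
    """Yield VCF lines with CHROM values and ##contig IDs renamed."""
    for line in lines:
        if line.startswith("##contig="):
            m = CONTIG_RE.match(line)
            if m:
                line = m.group(1) + ucsc_name(m.group(2)) + line[m.end():]
            yield line
        elif line.startswith("#") or not line.strip():
            yield line  # other header lines and blank lines pass through
        else:
            # CHROM is the first tab-separated field of a data line
            chrom, sep, rest = line.partition("\t")
            yield ucsc_name(chrom) + sep + rest
```

Streaming line by line keeps memory use flat, which matters because a full dbSNP VCF is many gigabytes.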

Do I understand the error correctly? Any suggestions how I can solve this?

Many thanks.

Susan


Hi Susan,
so you’re trying a tutorial that’s currently broken in several ways, and I’m feeling somewhat guilty because I was planning to fix it last week but didn’t get round to it.
However, by correctly diagnosing the first problem, and even trying to work around it, you’re demonstrating that you have a sound understanding of the topic.
Indeed, the dbSNP file used in the tutorial is not what it’s supposed to be. I would have thought that SnpSift could handle the different chromosome names in your VCF and the NCBI file, but it’s possible that they pose a problem. Another possibility is that the genome versions differ. Did you download the hg19 or the hg38 version of dbSNP?
Anyway, you don’t really need to run this step at all, because GEMINI, the tool used next, already knows about dbSNP IDs. However, if you’re working on usegalaxy.org, you will soon find out that GEMINI is not currently available there. You can find it on usegalaxy.eu, but in a newer and heavily reworked version compared to the one described in the tutorial.
So overall this experience will not be as comfortable as it should be, and I’m sorry for that.
If this is not super urgent, you may be better off waiting 1 or 2 weeks longer for my fixes.
Best,
Wolfgang


Thanks. I did use hg19 as the reference genome.
I will look into other tools and tutorials first and come back to this one.

I appreciate your prompt reply and clarification. Very helpful. :slight_smile:


Hi Susan,
fixing the exome-seq tutorial turned out to be more complicated than I first thought, and I ended up rewriting it almost completely. I’m finally done with it though, and the new tutorial is live now at https://galaxyproject.github.io/training-material/topics/variant-analysis/tutorials/exome-seq/tutorial.html.
Because some recently updated tools used in the new version are not yet available on usegalaxy.org, you will have to follow it from usegalaxy.eu for now.
I hope you find the contents instructive, and thanks for your patience,
Wolfgang
