I am failing to download the gnomAD dataset into Galaxy. I think it cannot recognize the vcf.bgz.tbi extension. Could you suggest how I can download this dataset?

GEMINI can currently only build one database per VCF, so to analyze several samples together you need to produce a multisample VCF first. As I tried to explain in "merge multiple VCF files - variant analysis and sample organization", whether that makes sense strongly depends on what you’re analyzing…

Hello @matty5 The data is sourced from here, correct? https://gnomad.broadinstitute.org/downloads Choices: Download the data using the methods described by the data provider. Upload just the .vcf dataset to Galaxy. The resulting vcf will work with more of the tools wrapped for Galaxy. This is yo…
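For the "upload just the .vcf" route, the downloaded `.vcf.bgz` file has to be decompressed locally first. A minimal sketch, assuming Python is available on the machine where the file was downloaded: BGZF (the format written by bgzip) is a series of concatenated gzip members, so the standard `gzip` module can read it. The function name `bgz_to_vcf` is hypothetical, and the uncompressed file will be considerably larger than the compressed one.

```python
import gzip
import shutil


def bgz_to_vcf(src_path, dest_path, chunk_size=1024 * 1024):
    """Decompress a bgzip/gzip-compressed VCF into a plain-text VCF.

    Hypothetical helper for illustration; not part of Galaxy or GEMINI.
    """
    # BGZF is a sequence of gzip members; gzip.open reads them transparently.
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream in chunks so a multi-gigabyte file never sits in memory.
        shutil.copyfileobj(src, dest, chunk_size)
    return dest_path
```

The `.tbi` file is only an index for the compressed data, so nothing in it needs converting for this step.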

Update: Data from https://gnomad.broadinstitute.org/ will have a few issues once loaded by URL into Galaxy. The data does load with the Upload tool. I tested at Galaxy EU https://usegalaxy.eu and ran the Upload tool twice – both were successful. The data will be in vcf_bgzip format (autodetected …
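To check locally whether a downloaded file really is bgzip-compressed (rather than plain gzip or text), one can inspect the first bytes of the file. This is only a sketch of the idea and not Galaxy's actual datatype sniffer. It assumes the `BC` extra subfield comes first in the gzip header, which is how bgzip writes it, and the function name `is_bgzf` is hypothetical.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header.

    Illustrative check only; assumes the 'BC' subfield is the first
    extra subfield, as written by bgzip.
    """
    with open(path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return False
    id1, id2, method, flags = header[0], header[1], header[2], header[3]
    # gzip magic bytes, deflate compression, and the FEXTRA flag (bit 2) set
    if (id1, id2, method) != (0x1F, 0x8B, 8) or not flags & 4:
        return False
    # Bytes 12-15: subfield identifier 'B', 'C' and a 2-byte payload length
    si1, si2, slen = struct.unpack("<BBH", header[12:16])
    return (si1, si2, slen) == (ord("B"), ord("C"), 2)
```

A file compressed with ordinary gzip fails this check because its header carries no extra field.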

Also asked about in this post: merge multiple VCF files - variant analysis and sample organization (usegalaxy.eu support) Thank you for your answer! I followed the tutorial you suggested and also the “Identification of somatic and germline variants from tumor and n…

Hi @jennaj, Thanks a lot for your answers. In the meantime I tried to upload “gnomad.genomes.r2.1.1.exome_calling_intervals.sites.vcf.bgz” with the GalaxyEU tool. It apparently worked, but the file was automatically assigned the vcf_bgzip datatype (~10 GB). It is not recognized by the GEMINI annotate tool (as you also s…

@jennaj, @matty5 oops, I thought that both gemini load and gemini annotate already accepted vcf_bgzip as input, but that’s not the case. I just added the necessary logic to these tools, which means, @matty5, that you only need to be patient until these changes make it into the Galaxy t…

wm75: I just added the necessary logic to these tools, which means, @matty5, that you only need to be patient until these changes make it into the Galaxy toolshed and from there onto usegalaxy.eu. Expect a couple of days for that. Everything went fine with the tool update on usegalaxy…

Thanks @wm75 for replying and updating gemini annotate. Regarding the answer below, I would actually prefer, if I can, to run one analysis for all my samples, batching them in one initial database in gemini load. Is there any way to batch the VCFs while keeping track of patient info/ID? Thanks wm75: If …
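On keeping track of patient IDs: in a VCF, sample names live in the `#CHROM` header line, after the nine fixed columns (`CHROM` through `FORMAT`), and a multisample VCF carries one such column per sample. A small sketch for listing them before and after merging, so each patient can be matched to a column. The function name `vcf_sample_names` is hypothetical, and choosing the opener by file extension is an assumption made for brevity.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample names declared in a VCF's #CHROM header line.

    Hypothetical helper for illustration. Returns an empty list for a
    sites-only VCF, which has no FORMAT or sample columns.
    """
    # Assumption: compressed files are recognizable by their extension.
    opener = gzip.open if path.endswith((".gz", ".bgz")) else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # The first nine columns are fixed; samples follow FORMAT.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached data lines without finding a header
    return []
```

If the per-patient VCFs all report the same generic sample name, the names would need to be made unique before merging, otherwise the columns of the merged file cannot be told apart.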

download gnomad vcf.bgz.tbi dataset into galaxy

jennaj January 23, 2020, 8:53pm 4

Also asked about in this post: