How do I count the number of variants per gene from an annotated vcf file?


I generated a vcf file and then annotated it using bcftools.
My header is in the format: Chrom Pos ID Ref Alt Qual Filter Info Format data

and some example data looks like this (most important is the GENE=___ at the end of the INFO section):
chr16 868776 . CGGGGGGGGGGGGC CGGGGGGGGGGGC 61.251 . AB=0.363636;ABP=4.78696;AC=1;AF=0.5;AN=2;AO=4;CIGAR=1M1D12M;DP=11;DPB=11.1429;DPRA=0;EPP=11.6962;EPPR=5.18177;GTI=0;LEN=1;MEANALT=6;MQM=37.5;MQMR=42;NS=1;NUMALT=1;ODDS=5.44144;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=136;QR=32;RO=1;RPL=0;RPP=11.6962;RPPR=5.18177;RPR=4;RUN=1;SAF=4;SAP=11.6962;SAR=0;SRF=1;SRP=5.18177;SRR=0;TYPE=del;technology.ILLUMINA=1;GENE=ENST00000262301.15

How do I group and count variants that have the same gene? Any help would be greatly appreciated.




Still need help with this if anyone has any insight!

Hi @gcarson

Try the Gemini tools at Galaxy EU

Tutorials that cover how to use Gemini:

Galaxy Main does not have the latest Gemini tools/indexes for technical reasons. These issues are being addressed. Once done, we’ll add the tools/indexes.


Hi! Sorry for the late reply. Starting from the vcf, I used snpeff eff → gemini load → gemini query (with no parameters, just the defaults).
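For anyone who wants per-gene counts out of the database that gemini load produces, here is a minimal sketch. It assumes the GEMINI database is a SQLite file containing a `variants` table with a `gene` column; the function name `variants_per_gene` and the file name in the usage comment are hypothetical, so check them against your own data before relying on this.

```python
import sqlite3


def variants_per_gene(conn):
    """Return (gene, count) rows, most frequent gene first.

    Assumes a table named `variants` with a `gene` column, which is how
    a GEMINI database is expected to be laid out (an assumption here).
    """
    query = (
        "SELECT gene, COUNT(*) AS n "
        "FROM variants "
        "WHERE gene IS NOT NULL "
        "GROUP BY gene "
        "ORDER BY n DESC, gene"
    )
    return conn.execute(query).fetchall()


# Usage (hypothetical file name):
#   conn = sqlite3.connect("my_variants.db")
#   for gene, n in variants_per_gene(conn):
#       print(gene, n, sep="\t")
```

Variants without a gene annotation are skipped by the `WHERE` clause; drop that clause if you want them counted as their own group.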

Thanks so much for the help!


Hi everyone!
I am trying to build a workflow for variant calling, and the GEMINI tools seem useful for my purposes after running FREEBAYES or GATK. However, I am having trouble finding them on the Galaxy platform, and I do not know why.
Can anyone help? Is there another tool already in Galaxy that could replace GEMINI?
Thank you in advance.


Hi @Laila_Toum

We are working to get the Gemini suite back at Galaxy Main. We had technical issues that are expected to be resolved soon, but don’t wait for that: this is still very much a work in progress.

Meanwhile, you can use the tool suite at Galaxy EU


More help for anyone running their own Galaxy server

Gemini can be installed from the ToolShed and the indexes created. Be sure to install the latest version of the tool suite and the latest associated Data Manager to create new indexes. Older indexes will not work with the updated tool version without a few non-trivial modifications. Make sure your Galaxy version itself is up to date first, or additional problems may arise.

I faced the same issue. I used “bcftools count” and it worked like a charm on my .vcf.
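If you would rather script it, here is a minimal sketch that counts records per gene directly from the VCF. It assumes an uncompressed, tab-separated VCF where each record carries a single `GENE=...` entry in the INFO column, as in the example in the original question; the function name `count_variants_per_gene` is just illustrative.

```python
from collections import Counter


def count_variants_per_gene(lines, tag="GENE"):
    """Count VCF records per value of one INFO tag (GENE by default)."""
    counts = Counter()
    prefix = tag + "="
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO is the 8th column: semicolon-separated KEY=VALUE entries.
        for entry in fields[7].split(";"):
            if entry.startswith(prefix):
                counts[entry[len(prefix):]] += 1
                break  # count each record once
    return counts
```

To use it, open the file and pass the handle, for example `with open("annotated.vcf") as fh: counts = count_variants_per_gene(fh)` (the file name is a placeholder), then loop over `counts.most_common()` to list genes by variant count. Records without the tag are simply not counted.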

Good luck!
