How do I count the number of variants per gene from an annotated VCF file?


I generated a VCF file and then annotated it using bcftools.
My header is in the format: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT data

Some example data looks like this (the most important part is the GENE=___ at the end of the INFO field):
chr16 868776 . CGGGGGGGGGGGGC CGGGGGGGGGGGC 61.251 . AB=0.363636;ABP=4.78696;AC=1;AF=0.5;AN=2;AO=4;CIGAR=1M1D12M;DP=11;DPB=11.1429;DPRA=0;EPP=11.6962;EPPR=5.18177;GTI=0;LEN=1;MEANALT=6;MQM=37.5;MQMR=42;NS=1;NUMALT=1;ODDS=5.44144;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=136;QR=32;RO=1;RPL=0;RPP=11.6962;RPPR=5.18177;RPR=4;RUN=1;SAF=4;SAP=11.6962;SAR=0;SRF=1;SRP=5.18177;SRR=0;TYPE=del;technology.ILLUMINA=1;GENE=ENST00000262301.15

How do I group variants by their GENE value and count how many fall under each one? Any help would be greatly appreciated.
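One way to do this outside Galaxy is a short script that reads the VCF, pulls the GENE key out of the INFO column (the 8th column) and tallies it. This is a minimal sketch, not something from the thread, assuming a plain or gzip-compressed, tab-delimited VCF where the annotation is stored as `GENE=...` in INFO; records without a GENE key are counted under a hypothetical `NO_GENE` label.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, TextIO


def info_to_dict(info: str) -> dict[str, str]:
    """Split a VCF INFO string such as 'AC=1;AF=0.5;PAIRED' into a dict.

    Flag entries without '=' map to an empty string.
    """
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def count_variants_per_gene(lines: Iterable[str], key: str = "GENE") -> Counter:
    """Count data records per value of the given INFO key (default: GENE)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, '##' meta-information lines and the '#CHROM' header.
        if not line.strip() or line.startswith("#"):
            continue
        # VCF columns are tab-separated; INFO is the 8th column (index 7).
        columns = line.rstrip("\n").split("\t", 8)
        info = info_to_dict(columns[7])
        # 'NO_GENE' is a hypothetical placeholder for records lacking the key.
        counts[info.get(key, "NO_GENE")] += 1
    return counts


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        # Print one "gene<TAB>count" line per gene, most variants first.
        for gene, n in count_variants_per_gene(handle).most_common():
            print(f"{gene}\t{n}")
```

Saved as, say, `count_genes.py` (hypothetical name), it would be run as `python count_genes.py annotated.vcf`. Note that the value in the example (`ENST00000262301.15`) is an Ensembl transcript ID, so the counts are per transcript unless those IDs are mapped to genes first.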



Still need help with this if anyone has any insight!

Hi @gcarson

Try the GEMINI tools on Galaxy EU.

Tutorials that cover how to use Gemini:

Galaxy Main does not have the latest Gemini tools/indexes for technical reasons. These issues are being addressed. Once done, we’ll add the tools/indexes.


Hi! Sorry for the late reply. From the VCF, I used snpeff eff -> gemini load -> gemini query (with no parameters, just the defaults).

Thanks so much for the help!
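Since the default query just lists variants, the per-gene count can instead be written as a SQL `GROUP BY` over the GEMINI database, roughly what `gemini query -q "select gene, count(*) from variants group by gene" your.db` would return. The sketch below uses Python's `sqlite3` module and assumes the usual GEMINI schema (GEMINI databases are SQLite files with a `variants` table holding one row per variant and a `gene` column filled from the annotation); check the column names against your GEMINI version.

```python
import sqlite3
from typing import List, Optional, Tuple


def count_variants_per_gene_gemini(
    conn: sqlite3.Connection,
) -> List[Tuple[Optional[str], int]]:
    """Return (gene, number_of_variants) rows, most variants first.

    Assumes a GEMINI-style 'variants' table with a 'gene' column;
    variants with no gene annotation come back with gene = None.
    """
    query = """
        SELECT gene, COUNT(*) AS n_variants
        FROM variants
        GROUP BY gene
        ORDER BY n_variants DESC, gene
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    import sys
    from contextlib import closing

    if len(sys.argv) > 1:
        # sys.argv[1] is the path to the .db file written by gemini load.
        with closing(sqlite3.connect(sys.argv[1])) as conn:
            for gene, n in count_variants_per_gene_gemini(conn):
                print(f"{gene}\t{n}")
```

Taking an open connection rather than a path keeps the function easy to try against a small in-memory table before pointing it at a real GEMINI database.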

