How to find the genes with the polymorphic variants in a VCF file

Hi, I have a VCF file with polymorphic variants and I am trying to find the 5 genes with the highest number of these variants. I have intersected the VCF file with a BED file of genes from UCSC, but I can't figure out how to get from there to the genes themselves. I tried converting the resulting VCF file to pgSnp so that I could then work with intervals, but the conversion fails with the following error:

bad variant nt AACACACACACACACACACAAACAT,AACACACACACACACACACACACAAACAT for nt 2 at /opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/vcf2pgsnp/5fca46616675/vcf2pgsnp/vcf2pgSnp.pl line 95, line 173

Any input?

Thanks for the help.

Have you tried SnpSift Intervals (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_int/4.3+t.galaxy0)?
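If you would rather count directly from the VCF and your gene BED file, the counting step itself is simple. Below is a minimal Python sketch, not a Galaxy tool. It assumes a tab-separated BED file whose fourth column holds the gene name (a UCSC export may put transcript IDs there instead, in which case you would be counting per transcript), and it only looks at each record's POS, so the full span of an indel is ignored. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def read_genes(bed_lines):
    """Group BED intervals by chromosome: chrom -> list of (start, end, name).

    Assumes column 4 holds the gene name (hypothetical layout).
    """
    genes = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        genes[chrom].append((start, end, name))
    return genes


def count_variants_per_gene(vcf_lines, genes):
    """Count VCF records overlapping each gene; a record counts once per gene."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # VCF POS is 1-based
        hit = set()
        for start, end, name in genes.get(chrom, []):
            # BED intervals are 0-based half-open, so a 1-based POS lies in (start, end]
            if start < pos <= end:
                hit.add(name)
        counts.update(hit)
    return counts


def top_genes_from_files(bed_path, vcf_path, n=5):
    """Return the n genes with the most overlapping VCF records."""
    with open(bed_path) as bed, open(vcf_path) as vcf:
        return count_variants_per_gene(vcf, read_genes(bed)).most_common(n)
```

For example, `top_genes_from_files("genes.bed", "variants.vcf")` would return a list of `(gene, count)` pairs. The overlap check is a plain loop over the genes on each chromosome, which is fine for a single gene BED file but slow for very large inputs; a sorted or interval-tree lookup would scale better.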

Alternatively, but more complex: you could consider SnpEff (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff/4.3+T.galaxy1) to annotate your VCF with genomic effects. The annotation will include the gene names together with other details. Just make sure to suppress the upstream/downstream annotations; otherwise variants that merely lie near a gene will also be attributed to it.
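Once the VCF is annotated, counting per gene means reading the gene name out of SnpEff's `ANN` INFO field, whose pipe-separated entries start with allele, effect, impact and gene name. The following is a rough Python sketch under that assumption. It also skips annotations whose only effects are upstream, downstream or intergenic, in case they were not suppressed when running SnpEff; the `NON_GENIC` set and the function names are illustrative choices, not part of SnpEff.

```python
from collections import Counter

# Effects that place a variant near, not inside, a gene (assumed set; adjust as needed).
NON_GENIC = {"upstream_gene_variant", "downstream_gene_variant", "intergenic_region"}


def genes_from_ann(info):
    """Return the set of gene names in the SnpEff ANN entry of a VCF INFO column."""
    genes = set()
    for item in info.split(";"):
        if not item.startswith("ANN="):
            continue
        for ann in item[len("ANN="):].split(","):  # one annotation per transcript/effect
            parts = ann.split("|")
            if len(parts) < 4:
                continue
            effects = set(parts[1].split("&"))  # combined effects are joined with '&'
            if effects <= NON_GENIC:
                continue  # only upstream/downstream/intergenic effects: skip
            if parts[3]:
                genes.add(parts[3])  # 4th field is the gene name
    return genes


def count_genes(vcf_lines):
    """Count annotated VCF records per gene; a record counts once per gene."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        counts.update(genes_from_ann(info))
    return counts
```

`count_genes(open("annotated.vcf")).most_common(5)` would then give the five genes with the most variants. Collecting names in a set per record keeps a variant from being counted several times when SnpEff reports multiple transcripts of the same gene.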

Finally, I would strongly recommend excluding indels from this type of analysis. Polymorphic indels are very likely to be alignment artefacts, and they also complicate things because one indel at a site may affect a gene while another one at the same site falls just outside it.
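As a rough illustration of that filtering step, here is a Python sketch that keeps only records where REF and every ALT allele are a single A/C/G/T base. A multi-allelic site like the one in your error message is dropped as a whole, even if one of its alleles is an SNV. The function names are hypothetical, and in Galaxy a dedicated VCF filtering tool would likely be the more practical route.

```python
def is_snv(ref, alt):
    """True if REF and every comma-separated ALT allele is a single A/C/G/T base."""
    alleles = [ref] + alt.split(",")
    return all(len(a) == 1 and a.upper() in "ACGT" for a in alleles)


def snv_records(vcf_lines):
    """Yield header lines plus only those records that are pure SNVs."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the VCF header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and is_snv(fields[3], fields[4]):
            yield line
```

Writing the output of `snv_records(open("variants.vcf"))` to a new file gives an SNV-only VCF that can then be fed into the gene counting.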