Hi, I have a VCF file with polymorphic variants and I am trying to find the 5 genes with the highest number of these variants. I have intersected the VCF file with a BED file of gene data from UCSC. However, I can’t figure out how to get from there to the genes. I tried to convert the resulting VCF file to pgSnp so that I could then work with intervals, but the conversion fails with the following error:
bad variant nt AACACACACACACACACACAAACAT,AACACACACACACACACACACACAAACAT for nt 2 at /opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/vcf2pgsnp/5fca46616675/vcf2pgsnp/vcf2pgSnp.pl line 95, line 173
Finally, I would strongly recommend excluding indels from this type of analysis. Polymorphic indels are very likely to be alignment artifacts, and they also complicate the counting, because one indel allele at a site may affect a gene while another allele at the same site falls just outside it.
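If it helps, here is a rough Python sketch of the counting step with indels excluded. It is only an illustration under a few assumptions that you would need to check against your own data: the intersected file is tab-separated, each line holds the original VCF columns followed by the BED columns (as, for example, `bedtools intersect -wa -wb` writes them), the BED file has 6 columns, and the gene name is in the 4th BED column. The names `is_snv`, `top_genes` and `bed_cols` are made up for this example.

```python
from collections import Counter


def is_snv(ref, alt):
    """Return True if REF and every ALT allele is a single A/C/G/T base."""
    alleles = [ref] + alt.split(",")
    return all(len(a) == 1 and a.upper() in "ACGT" for a in alleles)


def top_genes(lines, bed_cols=6, n=5):
    """Count SNV sites per gene and return the n genes with the most sites.

    lines    -- iterable of intersected lines (VCF columns, then BED columns)
    bed_cols -- ASSUMPTION: number of BED columns appended to each line
    """
    counts = Counter()
    seen = set()
    for line in lines:
        # Skip blank lines and any VCF header lines that were carried over.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        # Drop indels and any other site with a multi-base allele.
        if not is_snv(ref, alt):
            continue
        # ASSUMPTION: the gene name is the 4th column of the BED record.
        gene = fields[len(fields) - bed_cols + 3]
        # A gene can have several overlapping BED records (e.g. transcripts),
        # so count each variant only once per gene.
        key = (chrom, pos, ref, alt, gene)
        if key in seen:
            continue
        seen.add(key)
        counts[gene] += 1
    return counts.most_common(n)
```

To run it on a real file you would pass an open file handle, e.g. `top_genes(open("intersected.tsv"))`, where the file name is just a placeholder. Note that the two alleles in your error message are multi-base alleles of different lengths, i.e. an indel, so this filter would skip that site.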