As mentioned, my question has to do with variant calling using RNAseq data. My workflow is basically: Trimmomatic → HISAT2 → FreeBayes → VCF filter → snpEff eff
My question is about the snpEff results (see below). Why am I detecting so many upstream, downstream, and intergenic variants when my starting point is RNAseq data, which I would expect to cover mostly transcribed regions?
| Type (alphabetical order) | Count | Percent |
|---|---|---|
| DOWNSTREAM | 5,285 | 17.280% |
| EXON | 5,148 | 16.832% |
| GENE | 2 | 0.007% |
| INTERGENIC | 6,870 | 22.462% |
| INTRON | 5,317 | 17.384% |
| SPLICE_SITE_ACCEPTOR | 333 | 1.089% |
| SPLICE_SITE_DONOR | 360 | 1.177% |
| SPLICE_SITE_REGION | 1,602 | 5.238% |
| TRANSCRIPT | 2 | 0.007% |
| UPSTREAM | 5,218 | 17.061% |
| UTR_3_PRIME | 300 | 0.981% |
| UTR_5_PRIME | 148 | 0.484% |
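
To put a number on "so many", here is a minimal Python sketch that tallies the counts from the table above. The counts are copied from the snpEff summary; grouping UPSTREAM, DOWNSTREAM, and INTERGENIC together as "outside annotated transcripts" is my own assumption for illustration, and the names `effect_counts`, `OUTSIDE`, and `share` are hypothetical.

```python
# Counts copied from the snpEff summary table above.
effect_counts = {
    "DOWNSTREAM": 5285,
    "EXON": 5148,
    "GENE": 2,
    "INTERGENIC": 6870,
    "INTRON": 5317,
    "SPLICE_SITE_ACCEPTOR": 333,
    "SPLICE_SITE_DONOR": 360,
    "SPLICE_SITE_REGION": 1602,
    "TRANSCRIPT": 2,
    "UPSTREAM": 5218,
    "UTR_3_PRIME": 300,
    "UTR_5_PRIME": 148,
}

# Assumption for illustration: these three categories are treated as
# lying outside annotated transcripts.
OUTSIDE = ("UPSTREAM", "DOWNSTREAM", "INTERGENIC")


def share(counts, categories):
    """Return the percentage of all counted entries that fall in `categories`."""
    total = sum(counts.values())
    return 100.0 * sum(counts[c] for c in categories) / total


total_effects = sum(effect_counts.values())
outside_pct = share(effect_counts, OUTSIDE)

print(f"total entries in table: {total_effects}")
print(f"upstream + downstream + intergenic: {outside_pct:.1f}%")
```

By this tally, more than half of the entries in the table fall in the three categories I am asking about.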