As mentioned, my question has to do with variant calling from RNA-seq data. My workflow is basically: Trimmomatic → HISAT2 → FreeBayes → VCF filter → SnpEff eff.
My question is about the SnpEff results (see below). Why am I detecting so many upstream, downstream, and intergenic variants if my starting point is RNA-seq data?
| Type (alphabetical order) | Count | Percent |
|---|---|---|
| DOWNSTREAM | 5,285 | 17.280% |
| EXON | 5,148 | 16.832% |
| GENE | 2 | 0.007% |
| INTERGENIC | 6,870 | 22.462% |
| INTRON | 5,317 | 17.384% |
| SPLICE_SITE_ACCEPTOR | 333 | 1.089% |
| SPLICE_SITE_DONOR | 360 | 1.177% |
| SPLICE_SITE_REGION | 1,602 | 5.238% |
| TRANSCRIPT | 2 | 0.007% |
| UPSTREAM | 5,218 | 17.061% |
| UTR_3_PRIME | 300 | 0.981% |
| UTR_5_PRIME | 148 | 0.484% |
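
In case it helps anyone reproduce the numbers, here is a rough sketch of how I could tally the annotations myself to see whether the counts above are per variant or per annotation. It assumes the annotated VCF carries SnpEff's `ANN` INFO field (pipe-separated, with the effect in the second sub-field), so the effect names will look like `upstream_gene_variant` rather than the labels in the summary table. The function name `count_effects` and the demo records are made up for illustration; they are not my real data.

```python
from collections import Counter


def count_effects(vcf_lines):
    """Return (number of variant records, Counter of effect -> annotation count).

    Assumes SnpEff's ANN INFO field: comma-separated annotations, each
    pipe-separated, with the effect name in the second sub-field.
    """
    effects = Counter()
    n_variants = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        n_variants += 1
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        for field in info.split(";"):
            if not field.startswith("ANN="):
                continue
            for ann in field[len("ANN="):].split(","):
                parts = ann.split("|")
                if len(parts) > 1:
                    # Combined effects are joined with '&'; count each one.
                    for eff in parts[1].split("&"):
                        effects[eff] += 1
    return n_variants, effects


# Made-up records, only to show the shape of the input.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t"
    "DP=10;ANN=G|upstream_gene_variant|MODIFIER|GENE1,"
    "G|downstream_gene_variant|MODIFIER|GENE2\n",
    "chr1\t900\t.\tC\tT\t60\tPASS\tDP=12;ANN=T|intergenic_region|MODIFIER|GENE2-GENE3\n",
]

n_variants, effects = count_effects(demo)
print(n_variants, "variant records,", sum(effects.values()), "annotations")
for eff, n in sorted(effects.items()):
    print(eff, n)
```

If the total number of annotations comes out larger than the number of variant records, the summary is counting effects rather than variants, which would matter for how I read the percentages.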