Variant Calling from RNAseq

mitchellgc · January 15, 2024, 4:25pm

As mentioned, my question has to do with variant calling using RNAseq data. My workflow is basically: Trimmomatic → HISAT2 → FreeBayes → VCF filter → snpeff eff

My question is about the results of snpeff (see below). How/why am I detecting so many upstream, downstream, and intergenic variants if my starting point is RNAseq data?

	Type (alphabetical order)		Count
—	—	—	—
DOWNSTREAM		5,285	17.28%
EXON		5,148	16.832%
GENE		2	0.007%
INTERGENIC		6,870	22.462%
INTRON		5,317	17.384%
SPLICE_SITE_ACCEPTOR		333	1.089%
SPLICE_SITE_DONOR		360	1.177%
SPLICE_SITE_REGION		1,602	5.238%
TRANSCRIPT		2	0.007%
UPSTREAM		5,218	17.061%
UTR_3_PRIME		300	0.981%
UTR_5_PRIME		148	0.484%

jennaj · January 16, 2024, 7:26pm

Hi @mitchellgc

Are you using a stranded protocol for all steps?

If using unstranded for any, or just some, that could explain your results. Using the wrong strand could also end up this way.

You can check strand with a tool like Infer Experiment.

Other than that, no data is perfect. And depending on the genome, not all transcribed elements may have an annotated feature yet. You could pre-filter to use only the “known genes” in all steps if you are not interested in discovery.

Topic		Replies	Views
SnpEff annotation- transcript information discordant to the information available on the Ensemble website usegalaxy.eu support variant-analysis , snpeff	11	3715	January 29, 2020
Assistance with Variant Analysis usegalaxy.org support variant-analysis , vcf	6	198	November 21, 2023
Ways to make a variant calling for RNA Seq (paired-end) usegalaxy.org support freebayes , transcriptomics , variant-analysis , vcf	5	517	April 12, 2023
Naive Variant Caller, Samtools mpileup inaccuracies variant-analysis , vcf	4	312	April 7, 2022
Filtering SnpEff output with SnpSift variant-analysis , vcf , snpeff	2	803	January 4, 2023

Variant Calling from RNAseq

Related Topics