featureCounts strand setting and impact on DESeq2 results

Hi all,
I am a beginner into RNA-Seq data analysis and was wondering if someone could explain what exactly the ‘Specify strand information’ setting from the featureCounts tool is doing. For my pair-end RNA-Seq data analysis I am following the workflow as per this tutorial. BTW, I am performing this analysis for data (biological triplicates) generated under 3 experimental conditions A, B and C. So total 9 files.

After estimation of the strandedness (using the infer experiment tool) of my pair-end RNA-Seq data alignments I find that my libraries are reverse stranded (i.e. on average for all files “1±,1-+,2++,2–”: 0.9670). Does it mean that for featureCounts I need to ‘Specify strand information’ as stranded (Reverse)?

I have tried both combinations of Stranded (Forward) and Stranded (Reverse) with the idea that I do not want to miss out on read counts that originated from overlapping but opposite strand genes or the antisense strand of non-overlapping genes i.e. antisense strand where there are no annotated features yet on the target genome thereby ‘discovering’ potential new transcribed regions. I then created 3 dataset collections as:

  1. ‘all counts’ - containing forward and reverse count files (total 18 tabular datasets) for each sample (A, B and C).
  2. ‘all forward counts’ - containing only forward count files (total 9 tabular datasets) for each sample (A, B and C).
  3. ‘all reverse counts’ - containing only reverse count files (total 9 tabular datasets) for each sample (A, B and C).

Subsequently, I extracted identifiers and tagged all files from all 3 dataset collections accordingly. I then performed DESeq2, filtered out genes (2-fold change and adjusted p-value<0.05). I now have 3 dataset collections each containing filtered lists (A vs B, A vs C and B vs C) i.e. DEGs from considering ‘all counts’, ‘all forward counts’ and ‘all reverse counts’. My main question is, which of these would be the correct collection to refer to? Would it be only the collection of lists obtained from considering the ‘all reverse counts’ count files?

I would like to thank you very much in advance for your time, efforts and patience in answering my questions.

Best regards.

Hi @agschindler

Setting the stand filters which mapped read will be counted up per annotation feature.

  • Reads map to a genomic regions with strand assigned during mapping (BAM).
  • Features are placed on genomic regions with strand assigned in reference annotation (GTF, GFF3, or the built in indexes).
  • FeatureCounts compares the two, and counts up reads-per-feature at the gene summary level.

If your data is stranded, then you should use stranded protocols for both mapping and counting.

Why? The sequencing chemistry protocol was designed to produce stranded reads. Any reads mapped or counted on the opposite strand are non-specific. That can create a lot of noise in the results, and valid results could be omitted due to the over-counting.

Yes, those would be the results to consider.

FeatureCounts produces graphs describing which reads could be assigned to a feature with specificity, and when not, the reason why. Maybe examine those reports to see if you can confirm they produce the “best” results?

For this purpose, Salmon, Sailfish and Kallisto are some choices for transcriptome-based discovery analysis.

Hi @jennaj,

Thank you very much for this explanation and the additional pointers for transcriptome-based discovery.

Best wishes.

1 Like