I am facing a very similar issue while analyzing bulk RNA-seq data from 6 samples. I trimmed the adapters, ran FastQC on my samples, and then ran HISAT2 and featureCounts.
I got a very high alignment rate from HISAT2 according to MultiQC (~97%), but only about 5% of reads are assigned to genes by featureCounts. I checked the FastQC results and saw a high percentage of sequence duplication. However, I don’t understand how sequence duplication would contribute to such a low assignment rate (my library is RF, mm10).
Let’s start over in a new topic to review your results.
The bulk of your reads are either not mapping uniquely to a feature, or are not matching to the same coordinates as any feature at all!
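To see which of those two cases dominates, it can help to break down the featureCounts summary by status. Below is a minimal Python sketch, assuming the standard tab-separated summary layout (a `Status` header row followed by rows such as `Assigned`, `Unassigned_NoFeatures`, `Unassigned_Ambiguity`, `Unassigned_MultiMapping`). The function names are made up for illustration, and only the first sample column is read.

```python
def parse_featurecounts_summary(text):
    """Return {status: read_count} for the first sample column of a summary table."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank or malformed lines and the "Status <sample>" header row.
        if len(fields) < 2 or fields[0] == "Status":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def percent_by_status(counts):
    """Convert raw counts to percentages of all reads, dropping empty categories."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        status: round(100.0 * n / total, 1)
        for status, n in counts.items()
        if n > 0
    }


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the input.
    example = (
        "Status\tsample.bam\n"
        "Assigned\t50\n"
        "Unassigned_NoFeatures\t900\n"
        "Unassigned_Ambiguity\t50\n"
        "Unassigned_MultiMapping\t0\n"
    )
    for status, pct in percent_by_status(parse_featurecounts_summary(example)).items():
        print(f"{status}\t{pct}%")
```

A large `Unassigned_NoFeatures` share suggests the reads are not landing on any annotated feature (wrong annotation, wrong assembly, or wrong strand setting), while a large `Unassigned_MultiMapping` or `Unassigned_Ambiguity` share points to reads that cannot be assigned uniquely.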
The first place I usually start is with examining the reference data choices. Which genome are you using? Where did you source the feature annotation reference? Have you confirmed that both are based on the same genome assembly?
This topic explains a bit more for the human genome, but any species can have multiple assemblies, so the advice there, especially the review and troubleshooting advice, is where to start.
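One quick check for a genome/annotation mismatch is to compare the sequence names used by each file, since a `chr1` versus `1` naming difference alone is enough to leave reads with no matching feature. The following is only a rough Python sketch: the function names are hypothetical, it works on lines of text already read from a FASTA and a GTF file, and matching names do not prove that the coordinates come from the same assembly.

```python
def fasta_seqnames(fasta_lines):
    """Collect sequence names from FASTA header lines (text after '>' up to first whitespace)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(gtf_lines):
    """Collect sequence names from the first column of a GTF, skipping comments and blanks."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(genome_names, annotation_names):
    """Report which sequence names are shared and which appear on one side only."""
    return {
        "shared": sorted(genome_names & annotation_names),
        "genome_only": sorted(genome_names - annotation_names),
        "annotation_only": sorted(annotation_names - genome_names),
    }


if __name__ == "__main__":
    # Tiny made-up inputs; in practice these would be lines read from real files.
    fasta = [">chr1 example", "ACGT", ">chr2", "GGCC"]
    gtf = [
        "#comment",
        '1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
        'chr2\tsrc\tgene\t5\t50\t.\t-\t.\tgene_id "g2";',
    ]
    print(compare_seqnames(fasta_seqnames(fasta), gtf_seqnames(gtf)))
```

If `shared` is empty or small while both "only" lists are long, the annotation and the genome very likely use different naming schemes or builds.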
Please give that a try and let us know what happens! If you need help with the review, you are welcome to share back your history in your reply and we can try to advise!
Apparently the library prep was forward, not reverse :) I reran featureCounts with the forward setting and the percentage of assigned reads increased from 5% to more than 60%. I assume 60% is a good number, but let me know if it is still unsatisfactory or signals that something is wrong.
Also, I reran HISAT2 with the forward setting and observed something strange:
The HISAT2 run with the reverse (i.e., wrong) setting gave me an alignment rate of 94-97% per sample. When I ran it again with the correct setting (i.e., forward), the alignment rate decreased slightly (92-95%). I was wondering if this is normal and what the reason could be. Is strandedness (forward vs. reverse) important for HISAT2 at all?
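For reference, here is a small Python sketch of how the "forward"/"reverse" labels line up with the command-line options as I understand them: featureCounts `-s` takes 0 (unstranded), 1 (stranded), or 2 (reversely stranded), and HISAT2 `--rna-strandness` takes `F`/`R` for single-end reads or `FR`/`RF` for paired-end reads. The helper name and the label strings are made up for illustration, and the Galaxy tool forms expose these choices as menu options rather than raw flags.

```python
# Mapping from a library-type label to (featureCounts -s value, HISAT2 strandness base).
# The labels "unstranded", "forward", and "reverse" are illustrative names only.
STRAND_SETTINGS = {
    "unstranded": (0, None),
    "forward": (1, "F"),
    "reverse": (2, "R"),
}


def strand_flags(library_type, paired=True):
    """Return the command-line fragments for featureCounts and HISAT2."""
    if library_type not in STRAND_SETTINGS:
        raise ValueError(f"unknown library type: {library_type!r}")
    fc_value, hisat_base = STRAND_SETTINGS[library_type]
    flags = {"featureCounts": ["-s", str(fc_value)], "hisat2": []}
    if hisat_base is not None:
        # Paired-end uses FR/RF; single-end uses F/R.
        if paired:
            value = "FR" if hisat_base == "F" else "RF"
        else:
            value = hisat_base
        flags["hisat2"] = ["--rna-strandness", value]
    return flags


if __name__ == "__main__":
    print(strand_flags("forward", paired=True))
    print(strand_flags("reverse", paired=False))
```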