Hi, I’m running through my rat samples, which are paired-end and reverse stranded. However, I ended up with a low HISAT2 mapping rate (70-75%) and also a low featureCounts assigned rate (only around 50%). Is there any way to improve those rates?
Review the QA performed on the reads. Did FastQC reveal anything interesting after the reads went through trimming? And did the QA do what you expected it to do (compare read quality before and after trimming)?
Did you confirm the “strandedness” of the read data and use those settings when mapping?
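As a quick reference for which strand settings go together, here is a minimal sketch. The helper `strand_settings` and the table layout are hypothetical, made up for illustration; the option values themselves are the documented HISAT2 `--rna-strandness` and featureCounts `-s` settings. On Galaxy these show up as choices on the tool forms rather than as command-line flags, so treat this as a lookup aid, not as the way the tools must be run.

```python
# Hypothetical lookup: (read layout, strandedness) -> matching tool settings.
# An empty HISAT2 value means "leave the option unset" (unstranded default).
STRAND_TABLE = {
    ("paired", "reverse"): {"hisat2": "--rna-strandness RF", "featurecounts": "-s 2"},
    ("paired", "forward"): {"hisat2": "--rna-strandness FR", "featurecounts": "-s 1"},
    ("single", "reverse"): {"hisat2": "--rna-strandness R", "featurecounts": "-s 2"},
    ("single", "forward"): {"hisat2": "--rna-strandness F", "featurecounts": "-s 1"},
    ("paired", "unstranded"): {"hisat2": "", "featurecounts": "-s 0"},
    ("single", "unstranded"): {"hisat2": "", "featurecounts": "-s 0"},
}


def strand_settings(layout, strandedness):
    """Return the strand options that belong together for one library type."""
    key = (layout.lower(), strandedness.lower())
    if key not in STRAND_TABLE:
        raise ValueError(f"unknown library type: {layout}, {strandedness}")
    return STRAND_TABLE[key]


if __name__ == "__main__":
    # The library described in the question: paired-end, reverse stranded.
    print(strand_settings("paired", "reverse"))
```

If the mapping step and the counting step were run with settings from different rows of this table, a drop in the assigned rate would be expected, so that is worth ruling out first.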
Did you use reference annotation during mapping? Did you restrict to known annotation junctions to guide mapping? Should you consider doing that? Should you not? featureCounts has optional built-in annotation that probably does not exactly match the user-supplied annotation you may have given to other tools. So, you could try the different combinations and compare.
Did you set the spliced alignment parameters during mapping? I’m asking since this can be missed.
Have you explored what those reads do map to, if not the rat genome? Or, if they do map but are not captured by featureCounts, have you explored what annotated features are in those genomic regions? (UCSC is great for this.)
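One way to start on the second question is to look at where the unassigned reads went in the featureCounts summary output. The sketch below assumes the usual tab-separated summary layout (a header row, then one `Status` row per category with a count per sample) and reads only the first sample column. The function name `summarize_featurecounts` and the numbers in the example are invented for illustration; they are not from your data.

```python
import csv
import io


def summarize_featurecounts(summary_text):
    """Return {status: fraction of reads} for the first sample column.

    Categories with a zero count are dropped to keep the result short.
    """
    rows = list(csv.reader(io.StringIO(summary_text), delimiter="\t"))
    # Skip the header row ("Status", sample name) and any malformed lines.
    counts = {row[0]: int(row[1]) for row in rows[1:] if len(row) >= 2}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items() if n > 0}


if __name__ == "__main__":
    # Made-up example summary, for illustration only.
    example = (
        "Status\tsample.bam\n"
        "Assigned\t500\n"
        "Unassigned_NoFeatures\t300\n"
        "Unassigned_Ambiguity\t100\n"
        "Unassigned_MultiMapping\t100\n"
        "Unassigned_Unmapped\t0\n"
    )
    fractions = summarize_featurecounts(example)
    for status, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{status}\t{frac:.1%}")
```

As a rough guide, a large `Unassigned_NoFeatures` share tends to point toward the annotation or the strand setting, while a large `Unassigned_MultiMapping` share tends to point toward repetitive regions. Either way, the genomic regions involved are the ones to inspect in a browser.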
If you are not sure how to do some of these steps, each is covered in the GTN tutorials. The tools you are using are included in several, so check the bottom of tool forms to find the links. The first three Transcriptomics tutorials in the Intro section go over the details with scientific reasoning.
In short, this could be low-quality read bases, a small amount of contamination, low-resolution assembly in the genome, or missing features in the annotation. To me, this doesn’t look like a technical problem with the reference genome itself, the annotation, or the match between the two, but you could double-check against → FAQ: Extended Help for Differential Expression Analysis Tools