I have paired-end FASTQ files from an Illumina NovaSeq run (whole-transcriptome mRNA-seq profiling). My RNA STAR alignment results look OK (aligned with the hg38 GTF file from the UCSC Table Browser):

| Metric | Value |
|---|---|
| Number of input reads | 38164847 |
| Average input read length | 201 |
| **UNIQUE READS** | |
| Uniquely mapped reads number | 30007228 |
| Uniquely mapped reads % | 78.63% |
| Average mapped length | 200.97 |
| Number of splices: Total | 18124401 |
| Number of splices: Annotated (sjdb) | 17921546 |
| Number of splices: GT/AG | 17970447 |
| Number of splices: GC/AG | 117674 |
| Number of splices: AT/AC | 16850 |
| Number of splices: Non-canonical | 19430 |
| Mismatch rate per base, % | 0.19% |
| Deletion rate per base | 0.01% |
| Deletion average length | 1.73 |
| Insertion rate per base | 0.01% |
| Insertion average length | 1.47 |
| **MULTI-MAPPING READS** | |
| Number of reads mapped to multiple loci | 7150744 |
| % of reads mapped to multiple loci | 18.74% |
| Number of reads mapped to too many loci | 62501 |
| % of reads mapped to too many loci | 0.16% |
| **UNMAPPED READS** | |
| % of reads unmapped: too many mismatches | 0.00% |
| % of reads unmapped: too short | 2.44% |
| % of reads unmapped: other | 0.03% |
| **CHIMERIC READS** | |
| Number of chimeric reads | 0 |
| % of chimeric reads | 0.00% |

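STAR writes these statistics to its `Log.final.out` file as right-aligned `name |<TAB>value` lines, with section headers such as `UNIQUE READS:` on their own lines. Below is a minimal sketch, assuming that layout, that parses such lines and checks how the reads split between uniquely mapped, multi-mapped and unmapped. The example text only reproduces the percentages reported in this post; `parse_star_log`, `pct` and `FATES` are illustrative names, not part of STAR.

```python
def parse_star_log(text: str) -> dict:
    """Collect 'name | value' pairs from a STAR Log.final.out-style summary."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats


def pct(stats: dict, name: str) -> float:
    """Return a percentage field such as '78.63%' as the float 78.63."""
    return float(stats[name].rstrip("%"))


# Subset of the STAR summary from this post, in the assumed Log.final.out layout.
example = "\n".join([
    "                                 UNIQUE READS:",
    "                      Uniquely mapped reads % |\t78.63%",
    "                          MULTI-MAPPING READS:",
    "           % of reads mapped to multiple loci |\t18.74%",
    "           % of reads mapped to too many loci |\t0.16%",
    "                               UNMAPPED READS:",
    "     % of reads unmapped: too many mismatches |\t0.00%",
    "               % of reads unmapped: too short |\t2.44%",
    "                   % of reads unmapped: other |\t0.03%",
])

# Mutually exclusive read fates reported by STAR (chimeric reads left out).
FATES = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]

stats = parse_star_log(example)
total = sum(pct(stats, name) for name in FATES)
unique = pct(stats, "Uniquely mapped reads %")
multi = pct(stats, "% of reads mapped to multiple loci")
multi_share = multi / (unique + multi)  # multi-mappers among mapped reads

print(f"read fates sum to {total:.2f}%")
print(f"multi-mapped reads: {multi_share:.1%} of mapped reads")
```

With these values the read fates add up to 100%, and multi-mapped reads make up roughly 19% of all mapped reads (18.74 / (78.63 + 18.74)).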
However, when I run featureCounts, very few reads are assigned; most of them end up in the Unassigned_MultiMapping category. (I use the reverse-stranded option, as indicated by RSeQC's Infer Experiment.)

| Status | Count |
|---|---|
| Assigned | 7712538 |
| Unassigned_Unmapped | 0 |
| Unassigned_MappingQuality | 0 |
| Unassigned_Chimera | 0 |
| Unassigned_FragmentLength | 0 |
| Unassigned_Duplicate | 0 |
| Unassigned_MultiMapping | 37553849 |
| Unassigned_Secondary | 0 |
| Unassigned_NonSplit | 0 |
| Unassigned_NoFeatures | 4138912 |
| Unassigned_Overlapping_Length | 0 |
| Unassigned_Ambiguity | 18155778 |

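featureCounts writes this table to a tab-separated `.summary` file: a `Status` header row followed by one row per category. Below is a minimal sketch, assuming a single-sample summary in that layout (the BAM file name is hypothetical and only the non-zero rows from this post are included), that turns the counts into shares of the total.

```python
def parse_featurecounts_summary(text: str) -> dict:
    """Parse a single-sample featureCounts .summary table (tab-separated)."""
    counts = {}
    for line in text.strip().splitlines():
        status, value = line.split("\t")[:2]
        if status == "Status":
            continue  # header row: "Status<TAB><BAM file name>"
        counts[status] = int(value)
    return counts


def shares(counts: dict) -> dict:
    """Fraction of the summed counts that falls into each non-empty category."""
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items() if n > 0}


example = "\n".join([
    "Status\tsample.bam",  # hypothetical BAM name
    "Assigned\t7712538",
    "Unassigned_MultiMapping\t37553849",
    "Unassigned_NoFeatures\t4138912",
    "Unassigned_Ambiguity\t18155778",
])

counts = parse_featurecounts_summary(example)
fractions = shares(counts)
# Largest categories first.
for status, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{status:<25} {frac:6.1%}")
```

On these numbers Unassigned_MultiMapping is about 56% of the total and Assigned about 11%. The categories sum to 67,561,077, which is more than the 38,164,847 input reads STAR reports.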
What could be causing this, and is there a way to improve it? I am losing a lot of reads to multi-mapping.
I also checked the read distribution, and it looks OK as well.
