I have paired-end FASTQ files from an Illumina NovaSeq run (whole-transcriptome mRNA-seq profiling). My RNA STAR results look OK (I used the hg38 GTF file from the UCSC Table Browser):
Number of input reads | 38164847
Average input read length | 201
UNIQUE READS:
Uniquely mapped reads number | 30007228
Uniquely mapped reads % | 78.63%
Average mapped length | 200.97
Number of splices: Total | 18124401
Number of splices: Annotated (sjdb) | 17921546
Number of splices: GT/AG | 17970447
Number of splices: GC/AG | 117674
Number of splices: AT/AC | 16850
Number of splices: Non-canonical | 19430
Mismatch rate per base, % | 0.19%
Deletion rate per base | 0.01%
Deletion average length | 1.73
Insertion rate per base | 0.01%
Insertion average length | 1.47
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 7150744
% of reads mapped to multiple loci | 18.74%
Number of reads mapped to too many loci | 62501
% of reads mapped to too many loci | 0.16%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 2.44%
% of reads unmapped: other | 0.03%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
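As a quick sanity check on the log above, here is a minimal Python sketch that parses the "field | value" lines of a STAR Log.final.out and recomputes the unique and multi-mapping percentages from the raw counts. The function names are hypothetical; the numbers in the excerpt are the ones from this post.

```python
def parse_star_log(text):
    """Parse 'field | value' lines from a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip section headers such as "UNIQUE READS:"
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def pct(part, total):
    """Percentage of total, rounded to two decimals as STAR reports it."""
    return round(100.0 * part / total, 2)


# Counts copied from the STAR log in this post.
LOG_EXCERPT = """\
Number of input reads | 38164847
UNIQUE READS:
Uniquely mapped reads number | 30007228
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 7150744
Number of reads mapped to too many loci | 62501
"""

stats = parse_star_log(LOG_EXCERPT)
total = int(stats["Number of input reads"])
unique = int(stats["Uniquely mapped reads number"])
multi = int(stats["Number of reads mapped to multiple loci"])

print(pct(unique, total))  # 78.63
print(pct(multi, total))   # 18.74
```

The recomputed values match the 78.63% and 18.74% in the log, so at the alignment stage fewer than a fifth of the reads are multi-mappers.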
However, when I run featureCounts, very few reads are assigned; most of them fall into the Unassigned_MultiMapping category, even though STAR reports only 18.74% of reads as mapped to multiple loci. (I use the reverse-stranded option, as indicated by ‘infer experiment’.)
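To compare the featureCounts breakdown directly with the STAR percentages, a minimal Python sketch like the following could turn the featureCounts `.summary` file into per-status fractions. It assumes the usual summary layout (a header row starting with `Status`, then one tab-separated row per status, with one count column per BAM file) and only reads the first count column; the script and file names are hypothetical.

```python
import sys


def summarize_featurecounts(lines):
    """Return {status: fraction of all counted records} for a featureCounts .summary file.

    Assumes a header row starting with "Status" followed by tab-separated
    "status<TAB>count" rows; only the first count column is used.
    Statuses with a count of zero are left out.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] == "Status":
            continue  # skip the header row and blank lines
        counts[fields[0]] = int(fields[1])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items() if n > 0}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fc_summary.py counts.txt.summary
    with open(sys.argv[1]) as handle:
        fractions = summarize_featurecounts(handle)
    # Print statuses from largest to smallest share.
    for status, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{status}\t{frac:.2%}")
```

Printing the statuses sorted by share makes it easy to see how large Unassigned_MultiMapping is relative to Assigned, next to the 18.74% multi-mapping rate from STAR.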
What could be causing this, and is there a way to improve it? I am losing a lot of reads to multimapping.
I also checked the read distribution, and it looks OK too.