I ran RNA STAR to map paired-end reads to a genome.
Many reads came back unmapped and classified as “too short”; see the report below:
Number of input reads | 35461958
Average input read length | 269
Uniquely mapped reads number | 12102031
Uniquely mapped reads % | 34.13%
Average mapped length | 274.83
Number of splices: Total | 8397273
Number of splices: Annotated (sjdb) | 8344187
Number of splices: GT/AG | 8236046
Number of splices: GC/AG | 66713
Number of splices: AT/AC | 3518
Number of splices: Non-canonical | 90996
Mismatch rate per base, % | 1.00%
Deletion rate per base | 0.04%
Deletion average length | 2.50
Insertion rate per base | 0.03%
Insertion average length | 2.89
Number of reads mapped to multiple loci | 5260654
% of reads mapped to multiple loci | 14.83%
Number of reads mapped to too many loci | 3293617
% of reads mapped to too many loci | 9.29%
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 41.43%
% of reads unmapped: other | 0.32%
Number of chimeric reads | 0
% of chimeric reads | 0.00%
I also ran HISAT2, which reports a large number of unpaired reads:
HISAT2 summary stats:
Total pairs: 35461958
Aligned concordantly or discordantly 0 time: 14150787 (39.90%)
Aligned concordantly 1 time: 10475830 (29.54%)
Aligned concordantly >1 times: 7754092 (21.87%)
Aligned discordantly 1 time: 3081249 (8.69%)
Total unpaired reads: 28301574
Aligned 0 time: 25752505 (90.99%)
Aligned 1 time: 2199936 (7.77%)
Aligned >1 times: 349133 (1.23%)
Overall alignment rate: 63.69%
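As a sanity check on the HISAT2 numbers, here is a small sketch that recomputes two of the reported values from the counts above. The variable names are only illustrative. It suggests that "Total unpaired reads" is simply two mates for each pair that did not align as a pair, rather than reads that were unpaired in the input files.

```python
# Counts copied from the HISAT2 summary above.
total_pairs = 35461958
pairs_unaligned = 14150787         # aligned concordantly or discordantly 0 times
pairs_concordant_once = 10475830
pairs_concordant_multi = 7754092
pairs_discordant_once = 3081249
mates_aligned_once = 2199936       # from the "unpaired" block
mates_aligned_multi = 349133       # from the "unpaired" block

# Each pair that failed to align as a pair contributes two single mates.
unpaired_mates = 2 * pairs_unaligned

# Overall rate counts individual mates: both mates of every aligned pair,
# plus the mates that aligned on their own.
aligned_pairs = pairs_concordant_once + pairs_concordant_multi + pairs_discordant_once
aligned_mates = 2 * aligned_pairs + mates_aligned_once + mates_aligned_multi
overall_rate = 100 * aligned_mates / (2 * total_pairs)

print(unpaired_mates)              # 28301574, matches "Total unpaired reads"
print(f"{overall_rate:.2f}%")      # 63.69%, matches "Overall alignment rate"
```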
Do you have any idea what could have caused this?
I was wondering whether concatenating the FASTQ files from different libraries with cat could be causing the problem.
But I have done this in the past and it never caused any issue.
Or perhaps I should sort the FASTQ files?
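One way to test the concatenation hypothesis would be to check that the R1 and R2 files still list the reads in the same order. The following is only a sketch under some assumptions: each record takes exactly four lines, mate names differ at most by a trailing /1 or /2, and the function names and file names are made up for illustration.

```python
import gzip
from itertools import islice, zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_names(lines):
    """Yield the read name of every 4-line record, without a /1 or /2 suffix."""
    for header in islice(lines, 0, None, 4):
        fields = header[1:].split()          # drop '@' and any comment field
        name = fields[0] if fields else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_unsynced_pair(r1_lines, r2_lines):
    """Return (record_index, name1, name2) for the first mismatch, else None.

    A file that ends early shows up as a mismatch against None.
    """
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None


# Example usage (hypothetical file names):
# with open_fastq("sample_R1.fastq.gz") as r1, open_fastq("sample_R2.fastq.gz") as r2:
#     print(first_unsynced_pair(r1, r2))
```

If this returns None, the two files are in the same order and sorting should not be needed; any other result gives the index of the first record where the mates diverge.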
Please let me know if you have any idea how to solve the problem.
The reads themselves should not be too short, as they are between 75 and 150 bp.
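The length claim can be verified directly from the FASTQ files. Below is a minimal sketch, again assuming four lines per record; the function name and file name are hypothetical. Assuming STAR's "Average input read length" is the sum of both mates for paired-end data, the reported 269 would be consistent with mates of up to 150 bp.

```python
import gzip
from collections import Counter
from itertools import islice


def length_histogram(lines):
    """Count sequence lengths; the sequence is line 2 of each 4-line record."""
    return Counter(len(seq.strip()) for seq in islice(lines, 1, None, 4))


# Example usage (hypothetical file name):
# with gzip.open("sample_R1.fastq.gz", "rt") as handle:
#     histogram = length_histogram(handle)
#     print(min(histogram), max(histogram))
```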
Thank you in advance for the help!