Unassigned MultiMapping in featureCounts

I have paired-end FASTQ files from an Illumina NovaSeq run, generated by whole-transcriptome mRNA-seq profiling. My RNA STAR result looks OK (using an hg38 GTF file from the UCSC Table Browser).

                      Number of input reads |	38164847
                  Average input read length |	201
                                UNIQUE READS:
               Uniquely mapped reads number |	30007228
                    Uniquely mapped reads % |	78.63%
                      Average mapped length |	200.97
                   Number of splices: Total |	18124401
        Number of splices: Annotated (sjdb) |	17921546
                   Number of splices: GT/AG |	17970447
                   Number of splices: GC/AG |	117674
                   Number of splices: AT/AC |	16850
           Number of splices: Non-canonical |	19430
                  Mismatch rate per base, % |	0.19%
                     Deletion rate per base |	0.01%
                    Deletion average length |	1.73
                    Insertion rate per base |	0.01%
                   Insertion average length |	1.47
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |	7150744
         % of reads mapped to multiple loci |	18.74%
    Number of reads mapped to too many loci |	62501
         % of reads mapped to too many loci |	0.16%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |	0.00%
             % of reads unmapped: too short |	2.44%
                 % of reads unmapped: other |	0.03%
                              CHIMERIC READS:
                   Number of chimeric reads |	0
                        % of chimeric reads |	0.00%

However, when I use featureCounts I get very few assigned reads; most of the reads fall in the Unassigned_MultiMapping category. (I use the reverse-stranded option, as indicated by ‘infer experiment’.)

Assigned 7712538
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 37553849
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 4138912
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 18155778

What could be the reason behind it? Is there a way to improve on that? I am losing a lot of reads due to multimapping.

I also checked the read distribution. It looks OK to me too.
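As a quick sanity check on the numbers above, the short Python sketch below tallies the featureCounts summary against the STAR log. All counts are copied from this post; the variable and function names are made up for illustration. It suggests that the three non-multimapping categories add up exactly to STAR's uniquely mapped count, and that Unassigned_MultiMapping is roughly five times the number of multi-mapped reads STAR reports. One possible reading (an assumption, not something confirmed in this thread) is that every reported alignment of a multi-mapped read is tallied separately.

```python
# Counts copied from the STAR log and the first featureCounts summary in this post.
star_uniquely_mapped = 30_007_228
star_multi_mapped = 7_150_744

summary_text = """\
Assigned 7712538
Unassigned_MultiMapping 37553849
Unassigned_NoFeatures 4138912
Unassigned_Ambiguity 18155778
"""


def parse_summary(text):
    """Parse 'Status count' lines into a dict of integer counts."""
    counts = {}
    for line in text.splitlines():
        if line.strip():
            status, value = line.split()
            counts[status] = int(value)
    return counts


counts = parse_summary(summary_text)

# Everything that is not flagged as multi-mapping.
non_multi = sum(v for k, v in counts.items() if k != "Unassigned_MultiMapping")

# Multi-mapping tally relative to STAR's multi-mapped read count.
ratio = counts["Unassigned_MultiMapping"] / star_multi_mapped

# Share of the uniquely mapped reads that end up assigned to a gene.
assigned_share = counts["Assigned"] / non_multi

print(non_multi == star_uniquely_mapped)  # True
print(f"{ratio:.2f}")                     # 5.25
print(f"{assigned_share:.1%}")            # 25.7%
```

Seen this way, only about a quarter of the uniquely mapped reads are assigned in this run, and Unassigned_Ambiguity accounts for most of the remainder.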

Welcome @srashid

Avoid the reference GTFs from the UCSC Table Browser. These often end up truncated, and there is also a serious data content concern. The reasons are covered in more detail in this FAQ:

Good sources for hg38 GTF reference annotation are described in this prior Q&A (and are included in the FAQ above as well):

Give one or both of those a try and see if your “Unassigned_Ambiguity” and “Unassigned_MultiMapping” counts reduce – they should (“gene_id” and “transcript_id” will no longer be the same value).

You may even get fewer “Unassigned_NoFeatures” if the UCSC data was truncated when extracted from the Table Browser.
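To check whether a given GTF has the “gene_id” equals “transcript_id” problem mentioned above, a minimal Python sketch along these lines may help. The function name and the two example records are hypothetical and made up for illustration; the only assumption about the file is the standard nine-column GTF layout with `key "value";` pairs in the last column.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def identical_id_fraction(gtf_lines):
    """Return the fraction of records whose gene_id equals their transcript_id."""
    same = total = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or truncated records
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            same += attrs["gene_id"] == attrs["transcript_id"]
    return same / total if total else 0.0


# Made-up records for illustration only (not taken from any real annotation).
example = [
    'chr1\tsrcA\texon\t100\t200\t.\t+\t.\tgene_id "TX_0001"; transcript_id "TX_0001";',
    'chr1\tsrcB\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_0001";',
]
print(identical_id_fraction(example))  # 0.5
```

For a real file, pass an open file handle instead of the list. A fraction close to 1.0 would suggest that the annotation has no usable gene-level grouping.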

Thank you! While waiting for your reply, I tried the iGenomes GTF file. It definitely reduces the ambiguity, but the multimapping issue remains.

This is the output of featureCounts now:

Assigned 23399833
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 37557436
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 6348801
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 259191

Hi @srashid

With default settings, featureCounts only counts uniquely mapped reads.

Your reads are likely hitting more than one “exon”, which leads to “multimapping” counts when summarized at the Gene level.

Review the “Advanced Options”. Pay particular attention to the parameters below, but also try the others and compare the results. There isn’t a single right answer for everyone; it depends on how you want these reads counted, if at all.

  • “Allow read to contribute to multiple features” (default=no)
  • “Largest overlap” (default=no)
  • “Count multi-mapping reads/fragments” (default=disabled) and the sub-option (when enabled) “Assign fractions to multimapping reads”
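For anyone running featureCounts outside Galaxy, the options above appear to correspond to the command-line flags `-O`, `--largestOverlap`, `-M` and `--fraction`. That mapping from the Galaxy labels is an assumption, so check it against the tool help. The Python sketch below only builds the command and does not run it; the function name and the file names are placeholders.

```python
import shlex


def featurecounts_command(bam, gtf, out, multi_mapping=False, fraction=False,
                          multi_overlap=False, largest_overlap=False):
    """Build (but do not run) a featureCounts command as a list of arguments."""
    # -p: paired-end input; -s 2: reverse stranded, as used in this thread.
    # Note: recent Subread releases may also need --countReadPairs to count
    # fragments rather than reads (check the help for your version).
    cmd = ["featureCounts", "-p", "-s", "2", "-a", gtf, "-o", out]
    if multi_mapping:
        cmd.append("-M")              # count multi-mapping reads/fragments
        if fraction:
            cmd.append("--fraction")  # each alignment contributes a fraction
    if multi_overlap:
        cmd.append("-O")              # read may contribute to multiple features
    if largest_overlap:
        cmd.append("--largestOverlap")
    cmd.append(bam)
    return cmd


# Placeholder file names, for illustration only.
cmd = featurecounts_command("aligned.bam", "genes.gtf", "counts.txt",
                            multi_mapping=True, fraction=True)
print(shlex.join(cmd))
```

Building the command as a list keeps each option explicit, so it is easy to produce a few variants and compare the resulting summaries side by side.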

Thanks!