Unassigned MultiMapping in featureCounts

I have paired-end FASTQ files from an Illumina NovaSeq run (whole-transcriptome mRNA-seq profiling). My RNA STAR result looks OK (using an hg38 GTF file from the UCSC Table Browser).

                      Number of input reads |	38164847
                  Average input read length |	201
                                UNIQUE READS:
               Uniquely mapped reads number |	30007228
                    Uniquely mapped reads % |	78.63%
                      Average mapped length |	200.97
                   Number of splices: Total |	18124401
        Number of splices: Annotated (sjdb) |	17921546
                   Number of splices: GT/AG |	17970447
                   Number of splices: GC/AG |	117674
                   Number of splices: AT/AC |	16850
           Number of splices: Non-canonical |	19430
                  Mismatch rate per base, % |	0.19%
                     Deletion rate per base |	0.01%
                    Deletion average length |	1.73
                    Insertion rate per base |	0.01%
                   Insertion average length |	1.47
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |	7150744
         % of reads mapped to multiple loci |	18.74%
    Number of reads mapped to too many loci |	62501
         % of reads mapped to too many loci |	0.16%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |	0.00%
             % of reads unmapped: too short |	2.44%
                 % of reads unmapped: other |	0.03%
                              CHIMERIC READS:
                   Number of chimeric reads |	0
                        % of chimeric reads |	0.00%

However, when I run featureCounts I get very few assigned reads; most of the reads fall into the Unassigned_MultiMapping category. (I use the reverse-stranded option, as indicated by ‘Infer Experiment’.)

Assigned 7712538
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 37553849
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 4138912
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 18155778

What could be the reason behind it? Is there a way to improve on that? I am losing a lot of reads due to multimapping.

I also checked the read distribution. It looks OK to me too.

Welcome @srashid

Avoid the UCSC reference GTFs from their Table Browser. These often end up truncated, and there is a serious data content concern: “gene_id” and “transcript_id” carry the same value. The reasons are covered in this FAQ in more detail:

Good sources for hg38 GTF reference annotation are described in this prior Q&A (and are included in the FAQ above as well):

Give one or both of those a try and see if your “Unassigned_Ambiguity” and “Unassigned_MultiMapping” counts drop. They should, since “gene_id” and “transcript_id” will no longer be the same value.

You may even get fewer “Unassigned_NoFeatures” if the UCSC data was truncated when extracted from the Table Browser.

Thank you! While waiting for your reply, I tried the iGenomes GTF file. It definitely reduces ambiguity, but the multimapping issue remains.

This is the featureCounts output now:

Assigned 23399833
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 37557436
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 6348801
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 259191

Hi @srashid

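featureCounts only counts uniquely mapped reads with default settings.

Reads that the aligner placed at more than one locus (flagged by the NH tag in the BAM) go to “Unassigned_MultiMapping”, and each reported alignment of such a read appears to be tallied, which would explain why that number is larger than the multi-mapping read count in your STAR log. Reads that hit more than one feature are a separate case and are reported as “Unassigned_Ambiguity” when summarized at the gene level.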
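As a sanity check, the first featureCounts summary in this thread can be reconciled with the STAR log. This is only a sketch using the figures posted above; the variable names are made up for illustration, and the reading that multi-mapping reads are tallied once per reported alignment is an inference from the arithmetic, not something the tool output states.

```python
# Figures copied from the STAR log and the first featureCounts summary in this thread.
star_unique_reads = 30_007_228
star_multi_reads = 7_150_744

fc = {
    "Assigned": 7_712_538,
    "Unassigned_MultiMapping": 37_553_849,
    "Unassigned_NoFeatures": 4_138_912,
    "Unassigned_Ambiguity": 18_155_778,
}

# Everything featureCounts did not flag as multi-mapping.
non_multi = fc["Assigned"] + fc["Unassigned_NoFeatures"] + fc["Unassigned_Ambiguity"]

# True here: the non-multi-mapping total equals STAR's uniquely mapped read count.
print(non_multi == star_unique_reads)

# Average number of tallied alignments per multi-mapping read (inferred, see above).
alignments_per_multi_read = fc["Unassigned_MultiMapping"] / star_multi_reads
print(round(alignments_per_multi_read, 2))

# Share of uniquely mapped reads that ended up assigned to a gene.
assigned_share = fc["Assigned"] / non_multi
print(f"{assigned_share:.1%}")
```

Read this way, the large Unassigned_MultiMapping figure does not mean that most reads are lost. The more informative number is the assigned share of uniquely mapped reads, which is roughly a quarter with the UCSC GTF and is dominated by Unassigned_Ambiguity.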

Review the “Advanced Options”. In particular, pay attention to these parameters, but also review others and see what results. There isn’t a single right answer for everyone. It depends on how you want these counted up, if at all.

  • “Allow read to contribute to multiple features” (default=no)
  • “Largest overlap” (default=no)
  • “Count multi-mapping reads/fragments” (default=disabled) and the sub-option (when enabled) “Assign fractions to multimapping reads”
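For anyone running featureCounts outside Galaxy, the options above roughly correspond to command-line flags. The sketch below builds such a command in Python. The helper name, the file names, and the mapping from the Galaxy labels to `-O`, `--largestOverlap`, `-M` and `--fraction` are my assumptions for illustration, so check them against the Subread documentation for your version.

```python
import shlex


def build_featurecounts_cmd(bam, gtf, out, *, multi_overlap=False,
                            largest_overlap=False, count_multimapping=False,
                            fractional=False):
    """Return a featureCounts argument list (hypothetical helper, not part of Subread)."""
    if fractional and not (multi_overlap or count_multimapping):
        # --fraction is only meaningful together with -O and/or -M.
        raise ValueError("fractional counting needs multi_overlap or count_multimapping")
    cmd = [
        "featureCounts",
        "-a", gtf,   # annotation file
        "-o", out,   # output count table
        "-p",        # paired-end input
        "-s", "2",   # reverse-stranded library
    ]
    if multi_overlap:
        cmd.append("-O")                # read may contribute to multiple features
    if largest_overlap:
        cmd.append("--largestOverlap")  # assign to the feature with the largest overlap
    if count_multimapping:
        cmd.append("-M")                # count multi-mapping reads
    if fractional:
        cmd.append("--fraction")        # split the count across alignments/features
    cmd.append(bam)
    return cmd


# Placeholder file names; nothing is executed here, the command is only printed.
cmd = build_featurecounts_cmd("sample.bam", "genes.gtf", "counts.txt",
                              count_multimapping=True, fractional=True)
print(shlex.join(cmd))
```

Two caveats. Newer Subread releases may also need `--countReadPairs` to count fragments instead of reads, so check the version you run. Fractional counts are not integers, which matters if the downstream differential expression tool expects integer counts.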

Thanks!

Did you resolve your issue? It would be interesting to know what the solution was.

Hi.
I have the same problem: a lot of Unassigned_MultiMapping reads. When I select the “Allow reads to map to multiple features” option, the problem is fixed and more than 50% of reads are assigned. Is it scientifically and technically sound to allow reads to map to multiple features? (The aim of my analysis is to find DEGs.)

@mmomeni, you can find some discussions/methods here:

https://www.biostars.org/p/273609/

https://doi.org/10.1016/j.csbj.2020.06.014

https://www.biostars.org/p/311322/

Thanks for pointing me to these discussions; I read them.
Can we use one of the tools Salmon, RSEM, or Kallisto in Galaxy to deal with multi-mapped reads?
If so, are there any tutorials for that? If Galaxy has any other tool for this purpose, please let me know.

Hi @mmomeni,
Yes, Kallisto, RSEM and Salmon are available in Galaxy. I recommend having a look at this tutorial to learn how to use Salmon for gene quantification: Quantification of gene expression: Salmon.
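For context on what gene quantification with Salmon involves: Salmon estimates abundances per transcript, and gene-level values are usually obtained afterwards by summing over each gene's transcripts. Here is a minimal sketch of that summing step, assuming the standard `quant.sf` columns (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). The function name, the transcript and gene IDs, and the numbers are made up for illustration, and in practice a dedicated tool such as tximport would normally do this.

```python
import csv
import io
from collections import defaultdict


def gene_level_counts(quant_sf, tx2gene):
    """Sum Salmon's estimated read counts per gene (hypothetical helper)."""
    totals = defaultdict(float)
    for row in csv.DictReader(quant_sf, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript missing from the mapping: skipped in this sketch
        totals[gene] += float(row["NumReads"])
    return dict(totals)


# Made-up example data standing in for a real quant.sf file.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA1\t1500\t1300.0\t10.0\t120.5\n"
    "txA2\t900\t700.0\t8.0\t79.5\n"
    "txB1\t2000\t1800.0\t2.0\t30.0\n"
)
tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}

result = gene_level_counts(example, tx2gene)
print(result)
```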

Regards

Thanks a lot. That was an interesting tutorial and an interesting tool!
