Unassigned Multimapping in featurecounts

jennaj · October 31, 2019, 7:55pm

Avoid the reference GTFs from the UCSC Table Browser. These often end up truncated, and there is a serious data content concern: “gene_id” and “transcript_id” carry the same value. This FAQ explains why in more detail:

Extended Help for Differential Expression Analysis Tools

Good sources for hg38 GTF reference annotation are described in this prior Q&A (and are included in the FAQ above as well):

RNA-STAR and hg38 GTF reference annotation

The GTF should be based on the UCSC “hg38” genome build. Some choices:

For Gencode, copy the link to the GTF and paste it into the Upload tool. The hg38 data is available at https://www.gencodegenes.org/ . After it is loaded, remove the headers (lines that start with a “#”) with the Select tool, using the option “NOT Matching” with the regular expression ^# . Once the formatting is fixed, change the datatype to gtf under Edit Attributes (pencil icon). The data will be given the datatype gff by default, which works fine with some tools but not with others. Avoid the gff3 version of this particular data (it contains duplicated IDs, and several RNA-seq tools do not work with annotation in that format anyway).
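If you would rather clean the file locally before uploading, the header-removal step can be approximated outside Galaxy. This is only a minimal sketch of the same "NOT Matching ^#" filter, not what the Select tool runs internally; the function name and the file names in the usage note are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Same pattern used in the Select tool step: lines that start with "#".
HEADER = re.compile(r"^#")


def strip_header_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do NOT match ^# (hypothetical helper)."""
    for line in lines:
        if not HEADER.search(line):
            yield line
```

For example, `open("annotation.clean.gtf", "w").writelines(strip_header_lines(open("annotation.gtf")))` would write a header-free copy; both file names are placeholders. The datatype still needs to be set to gtf after upload.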

For iGenomes, download the archive for the target genome/build locally, unpack the tar archive, and upload just the genes.gtf data to Galaxy (browse the local file, or use FTP). Find all available genome/builds here: iGenomes
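The unpack step can also be scripted. The sketch below pulls the first regular file named genes.gtf out of a downloaded archive. It assumes the archive is a tar file (optionally compressed) and does not assume any particular directory layout inside it; if an archive holds several genes.gtf copies, check that the one extracted is the one you want. The function name is hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def extract_genes_gtf(archive_path, out_dir):
    """Copy the first regular file named genes.gtf from the tar archive
    into out_dir and return its path (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Mode "r" lets tarfile detect gzip/bz2/xz compression automatically.
    with tarfile.open(archive_path, "r") as tar:
        for member in tar:
            if member.isfile() and Path(member.name).name == "genes.gtf":
                target = out_dir / "genes.gtf"
                src = tar.extractfile(member)
                with src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return target
    raise FileNotFoundError("no genes.gtf found in " + str(archive_path))
```

The resulting genes.gtf is the only file that needs to be uploaded to Galaxy.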

Give one or both of those a try and see if your “Unassigned_Ambiguity” and “Unassigned_MultiMapping” counts decrease. They should, because “gene_id” and “transcript_id” will no longer be the same value.

You may even get fewer “Unassigned_NoFeatures” if the UCSC data was truncated when extracted from the Table Browser.
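To see whether a given GTF has the “gene_id” equals “transcript_id” problem before running featureCounts, a quick local check along these lines may help. It is a rough sketch that assumes standard nine-column GTF lines with quoted attribute values; the function name is hypothetical.

```python
import re

# Captures the quoted values of the two attributes of interest.
ATTR = re.compile(r'(gene_id|transcript_id) "([^"]*)"')


def count_identical_ids(lines):
    """Return (same, total): how many annotation lines carry both attributes,
    and in how many of those the two values are identical."""
    same = total = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same, total
```

If `same` is close to `total`, the annotation has the problem described above. With a Gencode or iGenomes GTF, `same` would be expected to be at or near zero.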
