High unassigned ambiguity counts for featureCounts data on bacterial transcriptomics

ngwj · March 24, 2020, 6:08am

Hi, I am relatively new to RNA-seq data analysis following the " Finding and quantifying new transcripts" tutorial (https://galaxyproject.org/tutorials/nt_rnaseq/). The transcriptomics study that I done was on the effects of a chemical on a Mycobacterial species. I uploaded the fasta and gtf file and followed closely to what the tutorial suggested. My Hisat2 summary was as follows:
HISAT2 summary stats:
Total pairs: 20863625
Aligned concordantly or discordantly 0 time: 3642143 (17.46%)
Aligned concordantly 1 time: 8046746 (38.57%)
Aligned concordantly >1 times: 9064704 (43.45%)
Aligned discordantly 1 time: 110032 (0.53%)
Total unpaired reads: 7284286
Aligned 0 time: 5004928 (68.71%)
Aligned 1 time: 931102 (12.78%)
Aligned >1 times: 1348256 (18.51%)
Overall alignment rate: 88.01%.

I followed up filtering only for quality reads (during filtering did not check for paired reads and paired reads mapped in proper pair), stringtie and stringtie merge to create a merged transcriptomics file as per the tutorial and followed up with featureCounts to determine the counts of my data. However, the summary for featureCounts for one of my datasets looked like this:

Status	Filtered LJ5 Control 1 BAM
Status	Filtered LJ5 Control 1 BAM
Assigned	1844744
Unassigned_Unmapped	0
Unassigned_MappingQuality	0
Unassigned_Chimera	7171
Unassigned_FragmentLength	0
Unassigned_Duplicate	0
Unassigned_MultiMapping	0
Unassigned_Secondary	0
Unassigned_NonSplit	0
Unassigned_NoFeatures	19955
Unassigned_Overlapping_Length	0
Unassigned_Ambiguity	7025997

with a high amount of reads unassigned due to ambiguity.

I followed the tutorial pretty strictly with regards to parameters to set for featureCounts except for strand specificity (I have half for either so unstranded). Please help if possible. Thank you.

Yours sincerely,
Wei Jie

ngwj · March 24, 2020, 6:36am

The fasta and gtf files were downloaded from the shogun sequenced genome of the type strain of the Mycobacterium species ob BLAST assembly database. Most of the annotated proteins are hypothetical. I am unsure if this might have affected the results for featureCounts. I also only filtered for quality because I am interested in both functional gene and tRNA expression levels.

ngwj · March 25, 2020, 9:07am

I tried multi-mapping the reads and apparently featureCounts was able to align them. Parameters used are as follows:

|Alignment file|* 299: Filter SAM or BAM, output SAM or BAM on data 222: bam|
|Specify strand information|Unstranded|
|Gene annotation file|history|
|Gene annotation file|* 314: Merged Transcriptome (Mapped paired reads)|
|Output format|Gene-ID “\t” read-count (MultiQC/DESeq2/edgeR/limma-voom compatible)|
|Create gene-length file|False|
|pe_parameters||
|Count fragments instead of reads|-p|
|Check paired-end distance||
|Only allow fragments with both reads aligned|False|
|Exclude chimeric fragments|True|
|extended_parameters||
|GFF feature type filter|exon|
|GFF gene identifier|transcript_id|
|On feature level|False|
|Allow read to contribute to multiple features|True|
|Count multi-mapping reads/fragments|-M|
|Assign fractions to multimapping reads|False|
|Minimum mapping quality per read|0|
|Exon-exon junctions||
|Long reads|False|
|Count reads by read group|False|
|Largest overlap|False|
|Minimum bases of overlap|1|
|Minimum fraction (of read) overlapping a feature|0|
|Minimum fraction (of feature) overlapping a read|0|
|Read 5’ extension|0|
|Read 3’ extension|0|
|Reduce read to single position|Leave the read as it is|
|Only count primary alignments|False|
|Ignore reads marked as duplicate|False|
|Annotates the alignment file with ‘XS:Z:’-tags to described per read or read-pair the corresponding assigned feature(s).|False|
|Ignore unspliced alignments|False|

The summary output looks like this:

Status	Filtered LJ5 Control 1 BAM
Status	Filtered LJ5 Control 1 BAM
Assigned	7773515
Unassigned_Unmapped	0
Unassigned_MappingQuality	0
Unassigned_Chimera	0
Unassigned_FragmentLength	0
Unassigned_Duplicate	0
Unassigned_MultiMapping	0
Unassigned_Secondary	0
Unassigned_NonSplit	0
Unassigned_NoFeatures	24615
Unassigned_Overlapping_Length	0
Unassigned_Ambiguity	0

Please let me know if I input any parameters wrongly. While this would be an ideal result for me, the downstream DeSeq2 analysis would be gravely affected if I did anything wrongly with this analysis.

Topic		Replies	Views
Unassigned_Ambiguity problem in featureCounts usegalaxy.org support transcriptomics , rna_star	4	1601	May 10, 2021
Seeking advice on amount of ambiguity in featureCounts usegalaxy.org support	7	2109	September 9, 2019
featureCounts high Unassigned_NoFeatures - New to RNA-seq! usegalaxy.org support gtn-tutorial , troubleshooting	1	2460	November 10, 2020
Unassigned Multimapping in featurecounts reference-annotation , salmon	9	9916	June 15, 2021
Featurecounts in built genome giving no read assignments mapping , transcriptomics , featurecounts	3	461	February 16, 2023

High unassigned ambiguity counts for featureCounts data on bacterial transcriptomics

Related topics