Fewer than 50 percent of reads are assigned by featureCounts

Hi.
I am analyzing a human single-end, unstranded RNA-seq dataset. I used the built-in hg38 index in HISAT2 as the reference genome and the built-in hg38 GTF file in featureCounts as the annotation. However, fewer than 40% of the reads are assigned by featureCounts. I changed the GTF file several times and tried different GTF files, but it did not help. What should I do to increase the percentage of assigned reads? Is something wrong with this dataset? Should I not use it?

Welcome!
Which tools are you using?

Hi. HISAT2 for mapping and featureCounts for assigning reads to annotated features.

Sorry, I meant before HISAT2: quality checking/trimming?

FastQC for quality checking and Cutadapt for trimming.

Ok.
So, assuming your reads are of high quality, how do the HISAT2 alignment reports look? Can you post them here?


Please accept my apologies for the delayed answer. To build the MultiQC report I needed the HISAT2 summary files, so I ran the HISAT2 mapping again to generate them. Here is the MultiQC report:

Sample Name % Aligned
HISAT2 on data 447_ Mapping summary 90.2%
HISAT2 on data 449_ Mapping summary 86.9%
HISAT2 on data 451_ Mapping summary 88.0%
HISAT2 on data 453_ Mapping summary 86.7%
HISAT2 on data 455_ Mapping summary 86.2%
HISAT2 on data 457_ Mapping summary 85.9%
HISAT2 on data 459_ Mapping summary 87.2%
HISAT2 on data 461_ Mapping summary 88.0%
HISAT2 on data 463_ Mapping summary 86.6%
HISAT2 on data 465_ Mapping summary 85.0%
HISAT2 on data 467_ Mapping summary 86.8%
HISAT2 on data 469_ Mapping summary 86.0%
HISAT2 on data 471_ Mapping summary 80.2%
HISAT2 on data 473_ Mapping summary 81.6%
HISAT2 on data 475_ Mapping summary 88.8%
HISAT2 on data 477_ Mapping summary 89.9%
HISAT2 on data 479_ Mapping summary 86.8%
HISAT2 on data 481_ Mapping summary 85.5%
HISAT2 on data 483_ Mapping summary 87.3%
HISAT2 on data 485_ Mapping summary 83.0%
HISAT2 on data 487_ Mapping summary 88.0%
HISAT2 on data 489_ Mapping summary 87.1%
HISAT2 on data 491_ Mapping summary 84.8%
HISAT2 on data 493_ Mapping summary 85.8%
HISAT2 on data 495_ Mapping summary 83.1%
HISAT2 on data 497_ Mapping summary 87.9%
HISAT2 on data 499_ Mapping summary 89.1%
HISAT2 on data 501_ Mapping summary 86.7%
HISAT2 on data 503_ Mapping summary 88.7%
HISAT2 on data 505_ Mapping summary 88.8%
HISAT2 on data 507_ Mapping summary 86.9%
HISAT2 on data 509_ Mapping summary 85.1%
HISAT2 on data 511_ Mapping summary 85.7%
HISAT2 on data 513_ Mapping summary 87.7%
HISAT2 on data 515_ Mapping summary 88.2%
HISAT2 on data 517_ Mapping summary 87.3%
HISAT2 on data 519_ Mapping summary 85.5%
HISAT2 on data 521_ Mapping summary 87.0%
HISAT2 on data 523_ Mapping summary 86.1%
HISAT2 on data 525_ Mapping summary 88.8%
HISAT2 on data 527_ Mapping summary 88.9%
HISAT2 on data 529_ Mapping summary 84.6%
HISAT2 on data 531_ Mapping summary 85.1%
HISAT2 on data 533_ Mapping summary 83.8%
HISAT2 on data 535_ Mapping summary 88.9%
HISAT2 on data 537_ Mapping summary 88.4%
HISAT2 on data 539_ Mapping summary 85.9%
HISAT2 on data 541_ Mapping summary 88.0%
HISAT2 on data 543_ Mapping summary 88.9%
HISAT2 on data 545_ Mapping summary 88.7%
HISAT2 on data 547_ Mapping summary 86.3%
HISAT2 on data 549_ Mapping summary 87.2%
HISAT2 on data 551_ Mapping summary 86.3%
HISAT2 on data 553_ Mapping summary 85.6%
HISAT2 on data 555_ Mapping summary 87.2%
HISAT2 on data 557_ Mapping summary 90.4%
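For reference, the "% Aligned" value (presumably HISAT2's overall alignment rate) lumps together reads aligned exactly once and reads aligned more than once, so a high rate here says nothing about how many reads are multi-mappers. A minimal Python sketch, assuming the default plain-text layout HISAT2 writes with `--summary-file` for single-end reads, that pulls those numbers apart. The function name and the sample text are made up for illustration:

```python
import re

# Illustrative, made-up HISAT2 summary for single-end reads.
SAMPLE = """\
10000 reads; of these:
  10000 (100.00%) were unpaired; of these:
    1000 (10.00%) aligned 0 times
    6000 (60.00%) aligned exactly 1 time
    3000 (30.00%) aligned >1 times
90.00% overall alignment rate
"""

def parse_hisat2_summary(text: str) -> dict:
    """Extract read counts and the overall rate from a single-end HISAT2 summary."""
    result = {}
    patterns = {
        "total_reads": r"^\s*(\d+) reads; of these:",
        "aligned_0": r"(\d+) \([\d.]+%\) aligned 0 times",
        "aligned_1": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
        "aligned_multi": r"(\d+) \([\d.]+%\) aligned >1 times",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            result[key] = int(match.group(1))
    match = re.search(r"([\d.]+)% overall alignment rate", text)
    if match:
        result["overall_rate"] = float(match.group(1))
    return result

summary = parse_hisat2_summary(SAMPLE)
aligned = summary["aligned_1"] + summary["aligned_multi"]
# Share of aligned reads that HISAT2 placed at more than one locus.
multi_share = 100.0 * summary["aligned_multi"] / aligned
print(f"overall {summary['overall_rate']}%, multi-mapped {multi_share:.1f}% of aligned")
```

Running this on each sample's summary file would show whether a large multi-mapped fraction is hiding behind the good overall rates.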

And these are the FastQC and Cutadapt reports prepared by MultiQC.

@mmomeni,
The mapping summary looks good, so I’d try to check whether your reads are mapped to regions that are not found in your annotation.
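A rough way to run that check yourself, sketched in Python under simplifying assumptions: alignments are already reduced to hypothetical `(chrom, start, end)` tuples in 1-based inclusive coordinates, only `exon` lines of the GTF are used, and a linear scan stands in for an interval tree, so this only suits small test sets. It also reports chromosome names that appear in the reads but not in the GTF (for example `chr1` versus `1`), since such a mismatch would leave those reads without any feature. The helper names are hypothetical:

```python
from collections import defaultdict

def load_exons(gtf_lines):
    """Group exon intervals (1-based, inclusive, as in GTF) by chromosome."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon records
        exons[fields[0]].append((int(fields[3]), int(fields[4])))
    return exons

def count_unannotated(reads, exons):
    """Count reads that overlap no exon and collect chromosomes absent from the GTF."""
    unannotated = 0
    missing_chroms = set()
    for chrom, start, end in reads:
        intervals = exons.get(chrom)  # .get() does not create an entry in the defaultdict
        if not intervals:
            missing_chroms.add(chrom)
            unannotated += 1
        elif not any(s <= end and start <= e for s, e in intervals):
            unannotated += 1
    return unannotated, missing_chroms

# Hypothetical toy data: one gene with two exons on chr1.
gtf = [
    'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "G1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1";',
    'chr1\tsrc\texon\t400\t500\t.\t+\t.\tgene_id "G1";',
]
reads = [("chr1", 150, 200), ("chr1", 250, 300), ("1", 150, 200)]
print(count_unannotated(reads, load_exons(gtf)))
```

Here the intronic read and the read on the differently named chromosome are both reported as unannotated.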

Can you share which featureCounts parameters you are using? Maybe you can tweak some options related to stringency, overlap, or quality, such as "Allow reads to map to multiple features", "Minimum mapping quality per read", "Minimum fraction (of read) overlapping a feature", etc.
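Outside Galaxy, these options roughly correspond to featureCounts command-line flags: `-M` (count multi-mapping reads), `-O` (assign reads to all overlapping features), `--fraction` (split those counts fractionally; only valid together with `-M` and/or `-O`), `-Q` (minimum mapping quality, default 0) and `--fracOverlap` (minimum fraction of a read overlapping a feature, default 0). The mapping to the Galaxy labels is my assumption, so check the tool help. A small Python sketch that only assembles the command makes the defaults explicit; the helper name `featurecounts_cmd` is hypothetical:

```python
from typing import List

def featurecounts_cmd(bam_files: List[str], gtf: str, out: str, *,
                      strand: int = 0,
                      count_multimapping: bool = False,
                      multi_overlap: bool = False,
                      fractional: bool = False,
                      min_mapq: int = 0,
                      frac_overlap: float = 0.0) -> List[str]:
    """Assemble (but do not run) a featureCounts command line."""
    if fractional and not (count_multimapping or multi_overlap):
        raise ValueError("--fraction only makes sense together with -M and/or -O")
    cmd = ["featureCounts", "-a", gtf, "-o", out, "-s", str(strand)]  # -s 0 = unstranded
    if count_multimapping:
        cmd.append("-M")            # count multi-mapping reads
    if multi_overlap:
        cmd.append("-O")            # assign reads to every overlapping feature
    if fractional:
        cmd.append("--fraction")    # split those counts fractionally
    if min_mapq > 0:
        cmd += ["-Q", str(min_mapq)]                 # minimum mapping quality
    if frac_overlap > 0:
        cmd += ["--fracOverlap", str(frac_overlap)]  # minimum overlapping fraction of the read
    return cmd + list(bam_files)
```

With everything left at its default, no multi-mapping or multi-overlap flag is added, so multi-mapping reads stay uncounted. If Subread's `featureCounts` is on `PATH`, the returned list could be passed to `subprocess.run(cmd, check=True)`.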

No, I did not change any of the options you mentioned; all of these parameters are at their defaults.

Hello again. What do you think about the problem? Can I use the results of this RNA-seq dataset?

Hello, @mmomeni. I’d still like to hear your feedback on:

Have you read this?

And

So you don’t want to try refining the parameters for (possibly) better results?

Yes, I have read that, but it did not help me because in that case the reads were unassigned due to NoFeatures and Ambiguity.
As you can see below, most of the unassigned reads are due to multi-mapping:


Given this, which of the mentioned parameters should I change?
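To put numbers on a breakdown like this, here is a minimal Python sketch that reads the tab-separated `.summary` file featureCounts writes alongside the counts table (first column `Status`, one column per BAM) and converts each category to a percentage of that sample's column total. The function names and counts are hypothetical, and using the column total as the denominator is a choice made here:

```python
def parse_fc_summary(text: str) -> dict:
    """Parse a featureCounts .summary file into {sample: {status: count}}."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    samples = lines[0].split("\t")[1:]          # header: Status <bam1> <bam2> ...
    table = {sample: {} for sample in samples}
    for line in lines[1:]:
        status, *counts = line.split("\t")
        for sample, count in zip(samples, counts):
            table[sample][status] = int(count)
    return table

def percentages(counts: dict) -> dict:
    """Express each status as a percentage of the sample's column total."""
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items()} if total else {}

# Made-up example with a single sample.
SUMMARY = ("Status\ts1.bam\nAssigned\t40\nUnassigned_MultiMapping\t50\n"
           "Unassigned_NoFeatures\t10\nUnassigned_Ambiguity\t0\n")
pct = percentages(parse_fc_summary(SUMMARY)["s1.bam"])
worst = max((s for s in pct if s.startswith("Unassigned")), key=pct.get)
print(f"assigned {pct['Assigned']:.1f}%, largest unassigned category: {worst}")
```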

Now that’s a good report.

Have you tried this?

I tried it, and the percentage of assigned reads improved a little (38 to 60 percent). But something strange happened: the percentage of Unassigned_NoFeatures increased a lot! What is the reason for that?
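One way to see where the reads moved is to compare the two `.summary` files category by category. A plausible reading, assuming featureCounts sets multi-mapping reads aside before it checks feature overlap: once they are counted, those that overlap no annotated feature show up as Unassigned_NoFeatures instead of Unassigned_MultiMapping. The Python sketch below (hypothetical names, made-up counts) only shows the shift; it does not prove the cause:

```python
def read_counts(text: str, column: int = 1) -> dict:
    """Read one sample column of a featureCounts .summary file into {status: count}."""
    counts = {}
    for line in text.strip().splitlines()[1:]:  # skip the "Status" header line
        fields = line.split("\t")
        counts[fields[0]] = int(fields[column])
    return counts

def compare(before: dict, after: dict) -> list:
    """Return (status, before, after, change) rows, largest absolute change first."""
    statuses = sorted(set(before) | set(after))
    rows = [(s, before.get(s, 0), after.get(s, 0), after.get(s, 0) - before.get(s, 0))
            for s in statuses]
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)

# Made-up counts for one sample, before and after counting multi-mapping reads.
BEFORE = "Status\ts.bam\nAssigned\t40\nUnassigned_MultiMapping\t50\nUnassigned_NoFeatures\t10\n"
AFTER = "Status\ts.bam\nAssigned\t50\nUnassigned_MultiMapping\t0\nUnassigned_NoFeatures\t50\n"
for status, b, a, change in compare(read_counts(BEFORE), read_counts(AFTER)):
    print(f"{status:<26}{b:>8}{a:>8}{change:>+8}")
```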

I guess I can’t give you much better help.
Have you compared all the options under "Allow reads to map to multiple features"?