Modify GTF datasets with select tool

Kunyuan_Tian · September 28, 2022, 10:02am

I’m running featureCounts on BAM files and got a summary with a very high Unassigned_NoFeatures read count.

While troubleshooting, I came across this Datatype information page, which advises trimming the headers from the GTF annotation.

I am using the annotation for mouse GRCm39 and saw that the header lines start with ##, so I decided to try this out.

However, I was unable to find Select (instance) among the tools. Has it been renamed? How do I remove the headers from a GTF file?

Many thanks
KY

gallardoalba · September 28, 2022, 12:45pm

Hi @Kunyuan_Tian,
I think this is the tool that you require: Select. You are right, it doesn’t show up when typing Select, but the search bar will be improved quite a lot in the next release.
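For anyone working outside Galaxy, the same header trimming (dropping every line that begins with #) can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Select tool runs internally; the function name `strip_gtf_headers` and the example records are made up for this sketch.

```python
import io

def strip_gtf_headers(src, dst):
    """Copy lines from src to dst, skipping comment/header lines that start with '#'.

    src and dst are text file objects. Returns the number of lines kept.
    """
    kept = 0
    for line in src:
        if line.startswith("#"):
            continue  # header or comment line, e.g. "##description: ..."
        dst.write(line)
        kept += 1
    return kept

# Tiny in-memory example; the header text and the record are hypothetical.
example = io.StringIO(
    "##description: example header\n"
    "##provider: EXAMPLE\n"
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n'
)
out = io.StringIO()
n_kept = strip_gtf_headers(example, out)
```

With real files, the same function can be called with two handles from `open(...)`, one opened for reading and one for writing.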

Regards

Kunyuan_Tian · September 28, 2022, 3:53pm

Thank you so much!

I have tried rerunning featureCounts with the modified GTF dataset, but there are still too many Unassigned_NoFeatures reads. Please see below.

What are the likely reasons for a high number of NoFeatures reads? As far as I know, there is less than 0.1% adapter content after Cutadapt, and the RNA STAR alignment was decent (90% uniquely mapped reads). The library is unstranded: I tested it with Infer Experiment and got 47% for both forward and reverse. I used GRCm39 for alignment and uploaded the GRCm39 annotation from GENCODE. Any advice on how to track down the cause of the high NoFeatures count would be really appreciated!

Best regards

jennaj · September 28, 2022, 10:26pm

The scientific and technical reasons other people have run into are covered in several other Q&A topics at the author’s support forum, the Bioconductor Forum, and more are here, specific to Galaxy. There will be some overlap. Please have a look first, then share more details about your analysis if you still need help (or an R → Galaxy tool form translation).

https://help.galaxyproject.org/search?q=featurecounts%20unassigned

Some basic checks include: Did you map against the built-in index mm39 (at usegalaxy.eu, the UCSC version of GRCm39)? Are you certain that the genome annotation is in GTF format (GENCODE also hosts GFF3 format)? GENCODE also hosts multiple annotation subsets, so maybe you need to try a different one? You could also back up a step: did the original and post-trimming read QA/QC steps reveal anything special about the content? Is the data public or your own?
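The GTF versus GFF3 check above can be done by eye, but a rough programmatic version is sketched here. It assumes the usual conventions: both formats have nine tab-separated columns, GTF attributes look like `gene_id "X";`, and GFF3 attributes look like `ID=X;`. The function name `guess_annotation_format` and the sample lines are hypothetical, and this is a heuristic, not a validator.

```python
import re

# Attribute column (9th field) patterns, based on the usual conventions.
GTF_ATTR = re.compile(r'^\s*\w+ "[^"]*"')   # key "value"
GFF3_ATTR = re.compile(r'^\s*[\w.-]+=')     # key=value

def guess_annotation_format(lines):
    """Guess 'GTF', 'GFF3', or 'unknown' from the first data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            return "unknown"
        attrs = fields[8]
        if GTF_ATTR.match(attrs):
            return "GTF"
        if GFF3_ATTR.match(attrs):
            return "GFF3"
        return "unknown"
    return "unknown"

# Hypothetical sample lines for illustration only.
gtf_lines = ['##header\n', 'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "x";\n']
gff3_lines = ['##gff-version 3\n', 'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tID=G1;gene_id=G1\n']
gtf_guess = guess_annotation_format(gtf_lines)
gff3_guess = guess_annotation_format(gff3_lines)
```

Only the first data line is inspected, so a file that is malformed further down would not be caught by this.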

Your reads are mapped but not overlapping with annotated regions. Opening the BAMs and your GTF in a tool like IGV might reveal what is scientifically going on after any remaining technical issues are resolved.
