Missing gene id labeling in limma

Jeff · February 18, 2025, 12:57am

I have run limma on some data and the statistical side of the output appears to be OK, but the gene ID information does not seem to be attached. I am using the same GENCODE annotation GTF file for both featureCounts and limma, so it should have the information, but there is a mismatch somehow. Has anyone come across this error?

jennaj · February 19, 2025, 1:01am

Welcome @Jeff

Very odd. You do not have a chromosome location for those observations either. Do you want to share the run so we can provide feedback about how to correct this? See → How to get faster help with your question

Or, you can try a few simple fixes to learn if those are the problem. The most obvious is the presence of # header lines in GTF files from GENCODE. You can remove those headers to see what happens first, then come back here if it wasn’t enough.
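For anyone who prefers to strip the header lines outside Galaxy, here is a minimal Python sketch. It drops every line that starts with `#` (plus any blank lines) and keeps the tab-separated feature lines untouched. It assumes the GTF is uncompressed plain text; the toy lines are made up for illustration and are not copied from a GENCODE release.

```python
def strip_gtf_headers(lines):
    """Drop '#'-prefixed header/comment lines and blank lines from a GTF.

    `lines` can be any iterable of strings, such as an open file handle.
    Feature lines are returned unchanged, newline included.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            continue  # skip provenance headers such as '##description: ...'
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Toy input (made up, not taken from a real annotation release).
    toy = [
        "##description: example header\n",
        "##provider: example\n",
        'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG00000000001.1";\n',
    ]
    for line in strip_gtf_headers(toy):
        print(line, end="")
```

To clean a whole file, pass an open file handle and write the result back out, e.g. `dst.writelines(strip_gtf_headers(src))`, with whatever file names you are using.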

Let’s start there, thanks!

Jeff · February 19, 2025, 11:15am

Thanks for the reply, Jennifer. I had suspected the # header lines and tried removing them, but the problem remained. I’ll run through everything again from the start after cleaning up the GTF file and see what the result is. If I can’t resolve it, I’ll be back.

jennaj · February 19, 2025, 8:18pm

Jeff · February 20, 2025, 8:59pm

Ok. I was able to remove the headers in the GENCODE file, which I thought was the issue; however, after running everything through again, I still have the same issue. Could the problem be using the built-in genome in featureCounts and then the GENCODE file in limma? Shared history: Galaxy | Australia

jennaj · February 20, 2025, 11:54pm

Hi @Jeff

Yes, mixing up annotation sources will lead to this problem.

If you want to use this combination of tools and the built-in annotation supplied by Bioconductor in featureCounts, this tutorial does exactly that and lists all of the intermediate steps.

Hands-on: 2: RNA-seq counts to genes / Transcriptomics

Or, you can use the Gencode reference annotation with all steps.

More tips are in the tutorials above, and the FAQs I shared already, especially the second one with the extended help (includes warnings about mixing up annotation sources with more details about formatting and related logic).
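One way to check whether annotation sources have been mixed is to compare the gene IDs in the featureCounts output with the IDs in the annotation file given to limma. This is a minimal Python sketch, assuming both files are tab-separated, have a single header row, and carry the gene ID in the first column; check those assumptions against your own files. If almost nothing overlaps, the two files are using different identifier schemes.

```python
def first_column_ids(lines, skip_header=True):
    """Return the first tab-separated field of each non-blank line."""
    ids = []
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue  # assumed single header row
        line = line.rstrip("\n")
        if line.strip():
            ids.append(line.split("\t", 1)[0])
    return ids


def id_overlap(count_ids, annotation_ids, examples=5):
    """Summarise how many count-table IDs also appear in the annotation."""
    counts = set(count_ids)
    annotation = set(annotation_ids)
    return {
        "count_ids": len(counts),
        "annotated": len(counts & annotation),
        # A few unmatched IDs make the mismatch obvious at a glance.
        "unannotated_examples": sorted(counts - annotation)[:examples],
    }


if __name__ == "__main__":
    # Toy tables: numeric IDs in the counts, Ensembl-style IDs in the annotation.
    counts = ["Geneid\tsample1\n", "100\t5\n", "200\t7\n"]
    annotation = ["GeneID\tSymbol\n", "ENSG00000000001\tGENE1\n", "100\tGENE2\n"]
    print(id_overlap(first_column_ids(counts), first_column_ids(annotation)))
```

With the toy tables this reports 1 of 2 count IDs annotated, with `200` listed as unmatched.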

Hope this helps!

Jeff · February 20, 2025, 11:55pm

Hmm. For some reason I couldn’t get featureCounts to use GENCODE.

Jeff · February 21, 2025, 12:00am

Actually, correction: I can use the same GENCODE GTF for featureCounts and limma. It’s the reference genome in HISAT2 I was wondering about, where I used human hg38.

jennaj · February 21, 2025, 6:56pm

Hi @Jeff

The Gencode GTF with the headers removed will work with every tool I can think of. This is where I would expect it to be sourced. Note that the GFF3 might work with some tools but not others.

GENCODE - Human Release 47

These will be “a match” with the UCSC hg38 reference genome indexed at the UseGalaxy servers. The # header lines that data providers include for provenance are usually better removed, to bring the file into strict GTF specification.
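To confirm that an annotation really is “a match” for the genome it is used with, you can compare the chromosome names in the GTF with the sequence names in the reference. The Python sketch below assumes the reference sequence names are available as a FASTA index (`.fai`, as written by `samtools faidx`, where the name is the first column); that input is an assumption, and any list of contig names would do. Annotation lines on chromosomes the genome does not contain cannot receive any reads.

```python
def gtf_seqnames(lines):
    """Chromosome/contig names (column 1) used by GTF feature lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # ignore blank and header lines
        names.add(line.split("\t", 1)[0])
    return names


def fai_seqnames(lines):
    """Sequence names from a FASTA index (.fai); the name is column 1."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def seqname_report(gtf_names, genome_names):
    """Which annotation chromosomes are (not) present in the genome."""
    return {
        "shared": sorted(gtf_names & genome_names),
        "only_in_gtf": sorted(gtf_names - genome_names),
    }


if __name__ == "__main__":
    # Toy data: one 'chr'-style name and one bare '1', against a 'chr'-style genome.
    gtf = [
        "##description: example header\n",
        'chr1\tsrc\tgene\t1\t9\t.\t+\t.\tgene_id "G1.1";\n',
        '1\tsrc\tgene\t1\t9\t.\t+\t.\tgene_id "G2.1";\n',
    ]
    fai = ["chr1\t1000\t6\t60\t61\n", "chr2\t1000\t1029\t60\t61\n"]
    print(seqname_report(gtf_seqnames(gtf), fai_seqnames(fai)))
```

Anything listed under `only_in_gtf` is a naming mismatch worth investigating before counting.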

This guide explains more details about pairing up reference data, using this assembly as an example.

Reference genomes at public Galaxy servers: GRCh38/hg38 example

Please give that a review, and you can ask more follow up questions. We’d like to get this working for you and it all seems to be very close given your updates!

Jeff · February 21, 2025, 9:33pm

Ok. That’s the source of my GENCODE GTF file, so it seems I have the correct input; there must be some small detail somewhere that I am missing.
I will have a read of the links before running things through again.

Jeff · February 22, 2025, 6:19am

I think I am losing the will to live, and may have to concede defeat. It doesn’t seem to make a difference which annotation file I use (GENCODE, UCSC; I’ve even followed a tutorial to construct my own). All give versions of the same result: the annotation doesn’t work.

Jeff · February 23, 2025, 6:55am

I gave it one last run-through with different combinations of reference genomes and annotations, based on the link you provided, and finally achieved annotation in limma. The only combination that gave annotated output was the native/built-in hg38 canonical genome in HISAT2, the native/built-in hg38 annotation in featureCounts, and then a self-constructed annotation file in limma-voom.
Using the UCSC or GENCODE files in limma failed.
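The thread doesn’t say how the homemade annotation file was built. As a purely hypothetical illustration, a simple annotation table can be derived from the `gene` records of a GTF, as in the Python sketch below. The column names (`GeneID`, `Symbol`, `Chr`) are assumptions, not a format required by any Galaxy tool. Whatever the layout, the ID column has to use exactly the same identifiers as the count table, otherwise nothing can be joined; this sketch cannot guarantee that.

```python
import re

# key "value" pairs in GTF column 9; unquoted values (e.g. `level 2;`) are ignored.
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')
# A trailing ".N" version number on an ID.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def gene_annotation_rows(gtf_lines):
    """Build (GeneID, Symbol, Chr) rows from the 'gene' records of a GTF."""
    rows = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # ignore blank and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # one row per gene: skip transcripts, exons, etc.
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        gene_id = _VERSION_SUFFIX.sub("", attributes.get("gene_id", ""))
        rows.append((gene_id, attributes.get("gene_name", "NA"), fields[0]))
    return rows


def to_tsv(rows):
    """Render rows as a tab-separated table with a (hypothetical) header."""
    lines = ["GeneID\tSymbol\tChr"]
    lines.extend("\t".join(row) for row in rows)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy GTF lines (made up for illustration).
    toy = [
        "##description: example header\n",
        'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG00000000001.2"; gene_name "GENE1"; level 2;\n',
        'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\tgene_id "ENSG00000000001.2"; transcript_id "ENST00000000001.1";\n',
    ]
    print(to_tsv(gene_annotation_rows(toy)), end="")
```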

jennaj · February 24, 2025, 6:29pm

Hi @Jeff

Ok – I see the problem now.

Some of the Bioconductor tools do not understand the “dot” in identifiers. It’s a bit of a gotcha, but that is how the tools behave everywhere. In short, R is interpreting the dot when it shouldn’t be. I think you can quote it, but I haven’t tested that with every tool in this pipeline.

We have a few topics about it if you are curious. The solution can be to remove the .N part of the identifier. This is one example.

how to replace these ID with official gene names?

The FAQ here has a troubleshooting warning but it is easy to miss!

FAQ: Extended Help for Differential Expression Analysis Tools
- Sometimes these tools do not understand transcript_id.N and gene_id.N notation (where N is a version number).
- This notation could be in fasta or tabular inputs.
- Try removing .N from all inputs, and check for the accidental creation of new duplicates!
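The version-stripping step from that FAQ, including the duplicate check, can be sketched in Python as follows. It is a minimal example, not what any Galaxy tool does internally: it removes only a trailing `.N` (digits after a final dot) and reports any IDs that collapse onto the same base identifier. Apply it to the ID columns of every input consistently, and not to gene symbols, since some symbols legitimately contain a dot.

```python
import re
from collections import Counter

# A trailing ".N" version, e.g. the ".5" in "ENSG00000000001.5".
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(identifier):
    """Remove a trailing .N version; IDs without one pass through unchanged."""
    return _VERSION_SUFFIX.sub("", identifier)


def strip_versions_checked(identifiers):
    """Strip versions from every ID and list base IDs that became duplicated."""
    stripped = [strip_version(i) for i in identifiers]
    tally = Counter(stripped)
    duplicates = sorted(base for base, n in tally.items() if n > 1)
    return stripped, duplicates


if __name__ == "__main__":
    # Toy IDs (made up): two versions of the same gene collapse into a duplicate.
    ids = ["ENSG00000000001.3", "ENSG00000000002.1", "ENSG00000000002.4", "GENE3"]
    stripped, duplicates = strip_versions_checked(ids)
    print(stripped)
    print("duplicates:", duplicates)
```

Any IDs reported as duplicates need a decision (merge, drop, or keep the version) before the table goes back into the pipeline.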

This trips up people using the tools directly, too!

https://support.bioconductor.org/post/search/?query=version+numbers+on+identifiers

But I’m glad you were able to get this working! You might be able to get the other annotation sources working too.

Jeff · February 27, 2025, 3:37am

While troubleshooting this issue, as well as removing the headers, I also removed the versioning from the GENCODE file I was using; however, annotation still failed. There must be something I’ve missed in that file.

jennaj · February 27, 2025, 6:24pm

Hi @Jeff

If you want to share the history with the error, I’ll take a look. Maybe I’ll notice what is going wrong.

Jeff · February 28, 2025, 5:07am

Sure Jennifer

1482 used the UCSC annotation and 1496 used GENCODE. The subsequent ones that worked (1498 and 1520) used a ‘homemade’ file.

jennaj · March 14, 2025, 7:07pm

Hi @Jeff

The other history links are still not loading, just your original share link. That’s Ok. The problem with the annotation was likely technical, and if you have this working, then that’s the main goal! If you have problems later on, we can follow up more.

To be clear about sharing for anyone else reading:

FAQ: Sharing your History
The history share link needs to be copied from the Share or Publish form; this is not the same as the browser URL.
If you toggle this form back to not sharing, then everyone else loses access.
To share a history with everyone working at the same server (and anyone with the link, too), consider the publish toggle instead to list your history under Histories → Public Histories.
When troubleshooting, a shared history is appreciated or it is hard to guess about what the problem might be. You can always unshare after.
