Missing gene id labeling in limma

I have run limma on some data and the statistical side of the output looks OK, but the gene ID information does not seem to be attached. I am using the same GENCODE annotation GTF file for both featureCounts and limma, so it should have the information, but there is a mismatch somehow. Has anyone come across this error?

Welcome @Jeff

Very odd. You do not have a chromosome location for those observations either. Do you want to share the run so we can provide feedback about how to correct this? See → How to get faster help with your question

Or, you can try a few simple fixes to learn if those are the problem. The most obvious is the presence of # header lines in GTF files from GENCODE. You can remove those headers to see what happens first, then come back here if it wasn't enough.

Let's start there, thanks! :slight_smile:
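
For anyone wanting to do the header removal outside of Galaxy, here is a minimal sketch in Python. It assumes the headers are simply the lines that begin with `#`; the function name and the example lines are made up for illustration and are not taken from a real GENCODE file.

```python
def strip_gtf_headers(lines):
    """Yield only the lines that do not start with '#'."""
    for line in lines:
        if not line.startswith("#"):
            yield line


# Illustrative input: two header lines followed by one feature line.
example = [
    "##description: example header\n",
    "##provider: GENCODE\n",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_name "DDX11L1";\n',
]

cleaned = list(strip_gtf_headers(example))
print(len(cleaned), "line(s) kept")
```

For a real file, the same function can be fed an open file handle and the result written out line by line, so the whole GTF never has to sit in memory.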

Thanks for the reply Jennifer. I had assumed it was potentially the # header lines and attempted to remove them, however the problem has remained. I'll run through things again from the start after cleaning up the GTF file and see what the result is. If I can't resolve it, I'll be back :rofl:

:heart_eyes_cat:

Ok. I was able to remove the headers in the GENCODE file, which I thought was the issue. However, after running through again, I still have the same issue. Could the problem be using the inbuilt genome in featureCounts and then the GENCODE annotation in limma? Galaxy | Australia

Hi @Jeff

Yes, mixing up annotation sources will lead to this problem.

If you want to use this combination of tools and the built-in annotation supplied by Bioconductor in featureCounts, this tutorial does exactly that and lists out all of the intermediate steps.

Or, you can use the Gencode reference annotation with all steps.

More tips are in the tutorials above, and the FAQs I shared already, especially the second one with the extended help (includes warnings about mixing up annotation sources with more details about formatting and related logic).

Hope this helps! :slight_smile:
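
One quick way to see whether two annotation sources have been mixed is to compare the gene IDs in the count matrix with the gene IDs in the annotation file. The sketch below assumes both ID lists have already been pulled out of the files; the function name and the example IDs are illustrative only (numeric IDs on the count side, Ensembl-style IDs on the annotation side).

```python
def id_overlap(count_ids, annotation_ids):
    """Return the shared IDs and the fraction of count IDs that are annotated."""
    count_ids = set(count_ids)
    annotation_ids = set(annotation_ids)
    shared = count_ids & annotation_ids
    fraction = len(shared) / len(count_ids) if count_ids else 0.0
    return shared, fraction


# Illustrative values: the two sides use different ID systems.
count_ids = ["100287102", "653635", "102466751"]
annotation_ids = ["ENSG00000223972.5", "ENSG00000227232.5"]

shared, fraction = id_overlap(count_ids, annotation_ids)
print(f"{len(shared)} shared IDs ({fraction:.0%} of count rows)")
```

An overlap at or near zero suggests the counts and the annotation come from different sources, which would leave the gene columns empty in the results.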

Hmm. For some reason I couldn't get featureCounts to use GENCODE

Actually - correction. I can use the same GENCODE GTF for featureCounts and limma. It's the reference genome in HISAT2 I was wondering about, where I used human hg38

Hi @Jeff

The Gencode GTF with the headers removed will work with every tool I can think of. This is where I would expect it to be sourced. Note that the GFF3 might work with some tools but not others.

These will be "a match" with the UCSC hg38 reference genome indexed at the UseGalaxy servers. The # header lines that data providers include for provenance are usually better removed to bring the file into strict GTF specification.

This guide explains more details about pairing up reference data, using this assembly as an example.

Please give that a review, and you can ask more follow up questions. We'd like to get this working for you and it all seems to be very close given your updates! :scientist:
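
As a rough illustration of what "a match" between a genome and an annotation means at the most basic level, the sketch below checks that every sequence name in column 1 of the GTF also exists in the reference genome. The function names and the example data are assumptions for illustration; in practice the genome names would come from the FASTA headers or the BAM header.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names from column 1 of non-header GTF lines."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def unmatched_seqnames(gtf_lines, genome_names):
    """Return GTF sequence names that are absent from the genome."""
    return gtf_seqnames(gtf_lines) - set(genome_names)


# Illustrative data: a UCSC-style genome and one GTF line with a bare "1".
genome_names = ["chr1", "chr2", "chrM"]
gtf_lines = [
    "##header line\n",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5";\n',
    '1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id "example_gene";\n',
]

print(sorted(unmatched_seqnames(gtf_lines, genome_names)))
```

Any name reported here (for example `1` when the genome uses `chr1`) points to features that cannot be placed on the genome that was used for mapping.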

Ok. That's the source of my GENCODE GTF file, so I seem to have the correct input. There must be some small detail somewhere that I am missing.
I will have a read of the links before running things through again

I think I am losing the will to live, and may have to concede defeat. It doesn't seem to make a difference what annotation file I use: GENCODE, UCSC, I've even followed a tutorial to construct my own. All have versions of the same result - the annotation doesn't work.

I gave it one last run through with different combinations of reference genomes and annotations, based on the link you provided. I finally achieved annotation in limma. The only combination that gave annotated output was the native/built-in hg38 canonical genome in HISAT2, the native/built-in hg38 annotation in featureCounts, and then a self-constructed annotation file in limma-voom.
Using UCSC or GENCODE files in limma failed.

Hi @Jeff

Ok – I see the problem now.

Some of the Bioconductor tools do not understand the "dot" in identifiers. Sort of a gotcha but that is how the tools work everywhere. In short, R is interpreting the dot when it shouldn't be. I think you can quote it but I haven't tested that with every tool in this pipeline.

We have a few topics about it if you are curious. The solution can be to remove the .N part of the identifier. This is one example.

The FAQ here has a troubleshooting warning but it is easy to miss!

This trips up people using the tools directly, too!

But I'm glad you were able to get this working! You might be able to get the other annotation sources working too. :slight_smile:
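
To show what removing the .N part can look like, here is a minimal Python sketch that strips the version suffix from the `gene_id` and `transcript_id` attributes of a GTF line. It assumes the usual `gene_id "ENSG….N";` attribute layout; the function name is made up for illustration. Note that everything after the first dot is dropped, so any extra suffix after the version number is lost as well.

```python
import re

# Matches gene_id / transcript_id values that contain a dot.
_VERSIONED = re.compile(r'\b(gene_id|transcript_id) "([^".]+)\.[^"]*"')


def strip_id_versions(gtf_line):
    """Drop everything from the first dot onward in gene/transcript IDs."""
    return _VERSIONED.sub(r'\1 "\2"', gtf_line)


line = (
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\t'
    'gene_id "ENSG00000223972.5"; gene_name "DDX11L1";'
)
print(strip_id_versions(line))
```

The IDs in the count matrix need the same treatment (or need to be produced from the already-stripped GTF), otherwise the two sides will still not match.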

While troubleshooting this issue, as well as removing the headers, I also previously removed the versioning in the GENCODE file I was using. However, annotation still failed, so there must be something I've missed in that file.

Hi @Jeff

If you want to share the history with the error, I'll take a look. Maybe I'll notice what is going wrong.

Sure Jennifer

1482 used UCSC annotation and 1496 used GENCODE. The subsequent ones that worked (1498 and 1520) used a 'homemade' file

Hi @Jeff

The other history links are still not loading, just your original share link. That's Ok. The problem with the annotation was likely technical, and if you have this working, then that's the main goal! If you have problems later on, we can follow up more.

To be clear about sharing for anyone else reading:

  • FAQ: Sharing your History
  • The history share link needs to be copied from the Share or Publish form. This is not the same as the browser URL.
  • If you toggle this form back to not sharing, then everyone else loses access.
  • To share a history with everyone working at the same server (and anyone with the link, too), consider the publish toggle instead to list your history under Histories → Public Histories.
  • When troubleshooting, a shared history is appreciated or it is hard to guess about what the problem might be. You can always unshare after.