I have run Limma on some data and the statistical side of the output looks OK, but the gene ID information does not seem to be attached. I am using the same Gencode annotation GTF file for both featurecounts and Limma, so it should have the information, but there is a mismatch somehow. Has anyone come across this error?
Welcome @Jeff
Very odd. You do not have a chromosome location for those observations either. Do you want to share the run so we can provide feedback about how to correct this? See: How to get faster help with your question
Or, you can try a few simple fixes to learn if those are the problem. The most obvious is the presence of # header lines in GTF files from GENCODE. You can remove those headers to see what happens first, then come back here if it wasn't enough.
- FAQ: Working with GFF GFT GTF2 GFF3 reference annotation
- FAQ: Extended Help for Differential Expression Analysis Tools
Let's start there, thanks!
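For anyone who prefers to remove the headers outside of Galaxy, here is a minimal sketch of the idea in Python. The function name `strip_gtf_headers` and the sample lines are made up for illustration; it only assumes that header lines start with `#`, and it also drops blank lines.

```python
def strip_gtf_headers(lines):
    """Yield GTF records, skipping '#' header/comment lines and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line


if __name__ == "__main__":
    # Hypothetical sample: two provenance headers followed by one record.
    sample = [
        "##description: example header\n",
        "##provider: GENCODE\n",
        'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5";\n',
    ]
    for record in strip_gtf_headers(sample):
        print(record, end="")
```

The same generator can be fed an open file handle instead of a list, since it only iterates over lines.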
Thanks for the reply Jennifer. I had assumed it was potentially the # header lines and attempted to remove them, however the problem has remained. I'll run through things again from the start after cleaning up the GTF file and see what the result is. If I can't resolve it, I'll be back.
Ok. I was able to remove the headers in the Gencode file, which I thought was the issue, however after running through again I still have the same issue. Could the problem be using the inbuilt genome in featurecounts and then the Gencode annotation in Limma? Galaxy | Australia
Hi @Jeff
Yes, mixing up annotation sources will lead to this problem.
If you want to use this combination of tools and the built-in annotation supplied by Bioconductor in Featurecounts, this tutorial is doing exactly that and lists out all of the intermediate steps.
Or, you can use the Gencode reference annotation with all steps.
More tips are in the tutorials above, and the FAQs I shared already, especially the second one with the extended help (includes warnings about mixing up annotation sources with more details about formatting and related logic).
Hope this helps!
Hmm. For some reason I couldn't get featurecounts to use Gencode.
Actually, correction: I can use the same Gencode GTF for featurecounts and Limma. It's the reference genome in hisat2 I was wondering about, where I used human hg38.
Hi @Jeff
The Gencode GTF with the headers removed will work with every tool I can think of. This is where I would expect it to be sourced. Note that the GFF3 might work with some tools but not others.
These will be "a match" with the UCSC hg38 reference genome indexed at the UseGalaxy servers. The # header lines that data providers include for provenance are usually better removed, which brings the file into strict GTF specification.
This guide explains more details about pairing up reference data, using this assembly as an example.
Please give that a review, and you can ask more follow up questions. We'd like to get this working for you and it all seems to be very close given your updates!
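One quick way to check whether an annotation and a genome are "a match" is to compare their sequence names. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, it assumes the genome is available as FASTA text, and it treats the first whitespace-delimited word of each `>` line as the sequence name.

```python
def gtf_seqnames(gtf_lines):
    """Collect the values of column 1 (sequence name) from GTF records."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(fasta_lines):
    """Collect sequence names from FASTA '>' title lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def unmatched_seqnames(gtf_lines, fasta_lines):
    """Return GTF sequence names that do not appear in the FASTA."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines))


if __name__ == "__main__":
    # Hypothetical inputs: one record uses "chr1", the other uses "1".
    gtf = [
        "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id \"A\";\n",
        "1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id \"B\";\n",
    ]
    fasta = [">chr1 example\n", "ACGT\n"]
    print(unmatched_seqnames(gtf, fasta))
```

An empty result means every sequence name in the annotation is present in the genome; anything listed is a name the genome does not know about.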
Ok. That's the source of my Gencode GTF file, so I seem to have the correct input. There must be some small detail somewhere I am missing.
I will have a read on the links before running things through again
I think I am losing the will to live, and may have to concede defeat. It doesn't seem to make a difference what annotation file I use: Gencode, UCSC, I've even followed a tutorial to construct my own. All give versions of the same result: the annotation doesn't work.
I gave it one last run through with different combinations of reference genomes and annotations, based on the link you provided. I finally achieved annotation in Limma. The only combination that gave annotated output was the native/built-in hg38 canonical genome in hisat2, the native/built-in hg38 in featurecounts, and then a self-constructed annotation file in Limma voom.
Using UCSC or Gencode files in Limma failed.
Hi @Jeff
Ok, I see the problem now.
Some of the Bioconductor tools do not understand the "dot" in identifiers. Sort of a gotcha but that is how the tools work everywhere. In short, R is interpreting the dot when it shouldn't be. I think you can quote it but I haven't tested that with every tool in this pipeline.
We have a few topics about it if you are curious. The solution can be to remove the .N part of the identifier. This is one example.
The FAQ here has a troubleshooting warning but it is easy to miss!
- FAQ: Extended Help for Differential Expression Analysis Tools
- Sometimes these tools do not understand transcript_id.N and gene_id.N notation (where N is a version number).
- This notation could be in fasta or tabular inputs.
- Try removing .N from all inputs, and check for the accidental creation of new duplicates!
This trips up people using the tools directly, too!
But I'm glad you were able to get this working! You might be able to get the other annotation sources working too.
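To make the FAQ advice concrete, here is a minimal Python sketch of removing the .N version and then checking for newly created duplicates. The function names are hypothetical, and it assumes GTF attributes written as `gene_id "ID.N";` and `transcript_id "ID.N";`. Anything after the version number (for example a `_PAR_Y` style suffix) is kept.

```python
import re
from collections import Counter

# Matches gene_id "X.N..." or transcript_id "X.N..." in a GTF attribute column.
_VERSIONED_ATTR = re.compile(r'\b(gene_id|transcript_id) "([^".]+)\.\d+([^"]*)"')


def strip_attr_versions(gtf_line):
    """Drop the .N version from gene_id and transcript_id in one GTF line."""
    return _VERSIONED_ATTR.sub(r'\1 "\2\3"', gtf_line)


def strip_version(identifier):
    """Drop the first .N version suffix from a bare identifier."""
    return re.sub(r"\.\d+", "", identifier, count=1)


def new_duplicates(identifiers):
    """Return stripped IDs that more than one distinct input ID collapses onto."""
    counts = Counter(strip_version(i) for i in set(identifiers))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical record and ID list for illustration only.
    record = ('chr1\tHAVANA\ttranscript\t11869\t14409\t.\t+\t.\t'
              'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2";')
    print(strip_attr_versions(record))
    print(new_duplicates(["ENSG1.1", "ENSG1.2", "ENSG2.1"]))
```

`strip_attr_versions` is meant for GTF lines, while `strip_version` and `new_duplicates` are meant for bare identifiers such as the first column of a tabular count file. The same stripping would need to be applied to every input so the identifiers still agree with each other.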
While troubleshooting this issue, as well as removing the headers, I had also removed the versioning in the Gencode file I was using. However, annotation still failed, so there must be something I've missed in that file.
Hi @Jeff
If you want to share the history with the error, I'll take a look. Maybe I'll notice what is going wrong.
Sure Jennifer
1482 used UCSC annotation and 1496 used Gencode. The subsequent ones that worked (1498 and 1520) used a "homemade" file.
Hi @Jeff
The other history links are still not loading, just your original share link. That's Ok. The problem with the annotation was likely technical, and if you have this working, then that's the main goal! If you have problems later on, we can follow up more.
To be clear about sharing for anyone else reading:
- FAQ: Sharing your History
- The history share link needs to be copied from the Share or Publish form, this is not the same browser URL.
- If you toggle this form back to not sharing, then everyone else loses access.
- To share a history with everyone working at the same server (and anyone with the link, too), consider the publish toggle instead to list your history under Histories → Public Histories.
- When troubleshooting, a shared history is appreciated or it is hard to guess about what the problem might be. You can always unshare after.