featureCounts output not compatible with Annotate DESeq2/DEXSeq output tables

Henk · March 5, 2021, 4:54pm

Hi,
I am just starting out as a Galaxy user and have run into an issue concerning annotation. I have a number of featureCounts output files for DE analysis, from which I obtained a number of significantly differentially expressed genes with a fold change (FC) of >2. However, when I try to annotate them using ‘Annotate DESeq2/DEXSeq output tables’, the columns with the annotation info contain only NAs. My impression is that the cause is that the featureCounts files come with Entrez IDs, while this information is not present in the .gtf file that I got from the Ensembl repository (GRCh38.102.gtf.gz; I removed the header lines containing a #). Any suggestions on how to change the Entrez gene IDs in my featureCounts output, or how to get ENS numbers in my .gtf file? Thanks for your help,
Henk
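
The suspected mismatch can be checked directly before any annotation step. Below is a minimal Python sketch, not part of any Galaxy tool, that reads the `gene_id` values from the attribute column of a GTF file and reports what fraction of the count-table identifiers occur among them. The function names and the toy rows are hypothetical, and the sketch assumes the count table is tab-delimited with one header row and the identifiers in the first column.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_gene_ids(gtf_lines):
    """Collect gene_id values from GTF lines, skipping '#' header lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs:
            ids.add(attrs["gene_id"])
    return ids


def count_table_ids(table_lines):
    """Return first-column IDs; assumes the first non-comment row is a header."""
    rows = [l for l in table_lines if l.strip() and not l.startswith("#")]
    return [l.split("\t", 1)[0].strip() for l in rows[1:]]


def matched_fraction(table_ids, gtf_ids):
    """Fraction of count-table IDs that occur as a gene_id in the GTF."""
    if not table_ids:
        return 0.0
    return sum(1 for i in table_ids if i in gtf_ids) / len(table_ids)


# Toy data for illustration only (coordinates and counts are made up).
toy_gtf = [
    "# header line\n",
    '17\ttoy\tgene\t1\t100\t.\t-\t.\tgene_id "ENSG00000141510"; gene_name "TP53";\n',
    '17\ttoy\tgene\t200\t300\t.\t-\t.\tgene_id "ENSG00000012048"; gene_name "BRCA1";\n',
]
toy_counts = ["Geneid\tsample1\n", "7157\t120\n", "672\t35\n"]

# Entrez IDs in the counts, Ensembl IDs in the GTF: nothing matches.
print(matched_fraction(count_table_ids(toy_counts), gtf_gene_ids(toy_gtf)))
```

A fraction near zero suggests that the two files use different identifier systems, which would be consistent with an annotation step returning only NAs.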

gallardoalba · March 8, 2021, 6:59pm

Hi @HUJI_stu,
You can use the annotateMyIDs tool to convert the identifiers, using the featureCounts file as input. Let me know if it works.

Regards

Henk · March 9, 2021, 10:40am

Dear Cristobal,

I tried annotateMyIDs and it works fine. Unfortunately, it requires some additional steps of joining datasets, which is done automatically if you use Annotate DESeq2/DEXSeq output tables. I found out that the datasets with my own data use Entrez IDs as the transcript identifier, and that is where things went wrong, because the gtf file I used did not contain this information in the attribute column.

Kind regards,

Henk
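
The additional joining step mentioned above amounts to a left join of the results table with the ID-conversion table on the identifier column. Here is a minimal Python sketch of that join under stated assumptions: both tables are tab-delimited with a header row and the Entrez ID in the first column, and the column names and values are hypothetical toy data, not actual tool output.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def left_join(results, annotation, na="NA"):
    """Append annotation columns to each results row, matching on column 1.

    Rows without a match are kept and padded with the `na` placeholder.
    """
    res_header, *res_rows = results
    ann_header, *ann_rows = annotation
    lookup = {row[0]: row[1:] for row in ann_rows}
    width = len(ann_header) - 1
    joined = [res_header + ann_header[1:]]
    for row in res_rows:
        joined.append(row + lookup.get(row[0], [na] * width))
    return joined


# Toy differential-expression results keyed by Entrez ID (values made up).
results = read_tsv("GeneID\tlog2FC\tpadj\n7157\t1.8\t0.001\n999999999\t2.3\t0.010\n")

# Toy conversion table, in the spirit of an ID-mapping output.
annotation = read_tsv("ENTREZID\tENSEMBL\tSYMBOL\n7157\tENSG00000141510\tTP53\n")

for row in left_join(results, annotation):
    print("\t".join(row))
```

Keeping unmatched rows with NA placeholders makes it easy to see which identifiers failed to convert, rather than dropping them silently.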