GOEnrichment support

Hello,

I am currently working with a non-model species in Galaxy (usegalaxy.eu) and I would like to perform GO enrichment analysis. However, I am having trouble understanding how to generate the Gene Product Annotation File (.gaf) required by the GOEnrichment tool.

From what I understand, the .gaf file should contain the GO annotations associated with the genes of the species, but I am not sure what the correct workflow is to produce this file within Galaxy. Should it be generated from an existing annotation dataset, converted from another format (e.g., GFF/GTF), or downloaded from an external resource and then formatted for Galaxy?

Could someone clarify the correct way to obtain or generate a .gaf Gene Product Annotation File for a non-model species in Galaxy, or point me to the appropriate tool or documentation?

Thank you in advance for your help.

Welcome @Edith_Tittarelli

We have some examples in the tutorials here!

In short, get the reference data from a public source. Then annotate your data using the same identifiers as that reference (or convert your IDs to them once the data is reduced). The annotation input does not have to be a full .gaf file: a simple two-column table also works, with the gene ID, a tab, then the GO term.
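As a rough illustration of that two-column layout, here is a minimal Python sketch. It assumes you already have a tab-separated annotation table in which one column holds the gene ID and another holds several GO terms joined by commas. The column positions, the `#` comment lines, the `-` placeholder for unannotated genes, and the function name `to_two_column` are all assumptions made for the example, so adjust them to whatever your annotation source actually produces.

```python
import csv
import io


def to_two_column(rows, gene_col=0, go_col=1, sep=","):
    """Yield unique (gene, GO term) pairs, one GO term per pair.

    `rows` is an iterable of lists (e.g. from csv.reader). Column indexes
    and the separator are assumptions; change them to match your table.
    """
    seen = set()
    for row in rows:
        # Skip empty lines, comment/header lines, and rows that are too short.
        if not row or row[0].startswith("#") or len(row) <= max(gene_col, go_col):
            continue
        gene = row[gene_col].strip()
        for term in row[go_col].split(sep):
            term = term.strip()
            # Keep only real GO identifiers; placeholders such as "-" are dropped.
            if term.startswith("GO:") and (gene, term) not in seen:
                seen.add((gene, term))
                yield gene, term


# Hypothetical input: gene ID in column 1, comma-separated GO terms in column 2.
example = (
    "#query\tGOs\n"
    "geneA\tGO:0008150,GO:0003674\n"
    "geneB\t-\n"
    "geneC\tGO:0005575\n"
)

reader = csv.reader(io.StringIO(example), delimiter="\t")
pairs = list(to_two_column(reader))

# One "gene<TAB>GO term" line per pair, the layout described above.
for gene, term in pairs:
    print(f"{gene}\t{term}")
```

Inside Galaxy you would normally do the same reshaping with the text manipulation tools rather than a script; the sketch is only meant to show what the final table should look like.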



GOEnrichment requires:

  • A Gene Ontology file in either OBO or OWL format (see Download ontology).
  • A tabular annotation file in GAF (Download annotations) format, BLAST2GO format, or a simple two-column table (e.g. from BioMart) with gene product ids in the first column and GO terms in the second one.
  • A list of gene products comprising the study set (a flat text file with one gene product per line).
  • Optionally, a list of gene products comprising the population set (if none is submitted, the population set will be the set of gene products listed in the annotation file).
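Because the study set, the population set, and the annotation table all have to use the same identifiers, a quick sanity check before running the tool can save some confusion. The sketch below is only an illustration under assumed inputs: the gene names are made up, and `split_by_annotation` is a hypothetical helper, not part of GOEnrichment or Galaxy.

```python
def split_by_annotation(study_ids, annotated_ids):
    """Return (matched, unmatched) study IDs relative to the annotation table.

    Both results are sorted lists. IDs are compared as exact strings, so
    differences such as version suffixes will show up as unmatched.
    """
    study = {s.strip() for s in study_ids if s.strip()}
    annotated = set(annotated_ids)
    return sorted(study & annotated), sorted(study - annotated)


# Hypothetical data: first column of the annotation table, and a study set
# with one gene product per line.
annotation_pairs = [
    ("geneA", "GO:0008150"),
    ("geneA", "GO:0003674"),
    ("geneC", "GO:0005575"),
]
study_lines = ["geneA", "geneC", "geneZ", ""]

annotated_genes = [gene for gene, _ in annotation_pairs]
matched, unmatched = split_by_annotation(study_lines, annotated_genes)

# If no population set is supplied, the tool uses the gene products listed
# in the annotation file, which corresponds to this set of IDs.
default_population = sorted(set(annotated_genes))

print("matched:", matched)
print("unmatched:", unmatched)
print("default population:", default_population)
```

If most of your study IDs end up unmatched, that usually points to an identifier mismatch (for example transcript IDs versus gene IDs) rather than a lack of annotation.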


What kind of data are you working with? RNA-Seq? If so, tutorials like these will lead you to the steps where marker genes can be isolated from expression data.

Let’s start there, thanks! :slight_smile: