Help with interpreting GO enrichment results using goseq

Hi there! I’m new to RNA-Seq analysis. Following the tutorial “Reference-based RNA-Seq data analysis”, I have so far completed mapping, annotation, and differential expression analysis. The next step I want to do is gene enrichment analysis with GO, so I followed the tutorial steps using the “goseq” tool. I was able to obtain and understand the “Ranked category list - Wallenius method” and “Top over-represented GO terms plot” outputs, but I couldn’t understand the “DE genes for categories (GO/KEGG terms)” output. Specifically, although the ranked category list tells me how many genes are associated with each GO term (with adj. p value < 0.05), I could not find the gene_id associated with any specific GO term. All I can find in the “DE genes for categories” file are some strange numbers separated by “NA”… I have attached a figure of those results for one of my datasets (BP vs FPNP). Can somebody please show me whether there is a way to find the gene_ids associated with each GO term listed in the ranked category list (my target organism is an E. coli strain)? Many thanks in advance.
DE genes for categories (?)


Ranked category list

The GO annotation file that I uploaded to goseq

Hi @ding66

Compare these two inputs:

  1. List of genes with the true/false
  2. Your “The GO annotation file that I uploaded to goseq” file

Do both use the same gene ID format?

If there is a mismatch, that is likely the problem. You would need to adjust the file in item 2 above to match your existing list of genes from item 1. A “replace” tool could do that if you can find the mapping. See: Data Manipulation Olympics
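One way to run that comparison outside of Galaxy is a small script like the one below. It is only a sketch, assuming both files are tab-separated with the gene ID in the first column; the function names and the toy rows are made up for illustration and are not taken from the attached files.

```python
import csv
import io


def read_first_column(handle, delimiter="\t"):
    """Return the set of non-empty values found in the first column."""
    ids = set()
    for row in csv.reader(handle, delimiter=delimiter):
        if row and row[0].strip():
            ids.add(row[0].strip())
    return ids


def compare_ids(de_handle, go_handle):
    """Report which gene IDs are shared and which appear in only one input."""
    de_ids = read_first_column(de_handle)
    go_ids = read_first_column(go_handle)
    return {
        "shared": de_ids & go_ids,
        "only_in_de": de_ids - go_ids,
        "only_in_go": go_ids - de_ids,
    }


if __name__ == "__main__":
    # Toy inputs (invented): the second GO row differs only in letter case.
    de = io.StringIO("GL980_000001\tTRUE\nGL980_000002\tFALSE\n")
    go = io.StringIO("GL980_000001\tGO:0008150\ngl980_000002\tGO:0003674\n")
    for label, ids in compare_ids(de, go).items():
        print(label, sorted(ids))
```

With real data, open the two downloaded files with `open(path, newline="")` in place of the `io.StringIO` objects. Anything listed under `only_in_de` or `only_in_go` points to a formatting difference (case, prefix, version suffix, stray whitespace) worth checking.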

I might be guessing wrong, but you can share all the inputs and we can look at this closer.

Hi Jenn,

Thanks very much for your response and suggestions. I double checked and I do believe that my “gene ID and DE” file (please see the picture below) and the “GO Annotation file” are a match…

Gene ID and DE file:

Thanks @ding66 for posting those details.

I agree – those look like a match!

The only other thing I can think of is that maybe the underscore is causing some “match up” problem. We know that in other use cases a dot character can be problematic (Ensembl gene IDs with a version), and while that usually causes a mismatch that presents in a slightly different way … you could run a quick test with that underscore character removed from all of the inputs to see what happens.

For example … change GL980_000001 to GL980000001.

That can be done in batch on all files with one of the replace tools, or with sed.

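Replacing the _ with nothing using sed would use this string: s/_// (that removes only the first underscore on each line; use s/_//g if a line could contain more than one).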
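If sed is not available, the same substitution can be sketched in Python. The helper name `remove_underscores` is hypothetical; by default it removes every underscore on a line, and `first_only=True` removes just the first one.

```python
def remove_underscores(line: str, first_only: bool = False) -> str:
    """Delete underscores from a line of text.

    first_only=True removes only the first underscore on the line;
    the default removes every underscore.
    """
    if first_only:
        return line.replace("_", "", 1)
    return line.replace("_", "")


if __name__ == "__main__":
    # The example ID from the post above, followed by a DE flag column.
    print(remove_underscores("GL980_000001\tTRUE"))
```

Note that this touches every underscore on the line, so a header such as `gene_id` would be changed as well. Apply the same change to every input so the IDs still match each other.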

Maybe try that first, and if it doesn’t work, you can share back your history and I’ll take a look at it and try to figure out what might be going wrong that way.

You can post the link back here, or I can start up a direct message you can share it in. Please try to make that history as simple as possible e.g. put a copy of the inputs into a new history, run the _ manipulations, then the tool, and share that.

Thanks for your suggestion Jenn! I have tried your method by removing the underscore in the gene_id. However, the problem seems to have remained… How do I share the history with you?

The link to the FAQ is in here
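
While the goseq output is being sorted out, the gene IDs behind each GO term can also be rebuilt directly from the two input files. The sketch below is not what goseq does internally. It assumes a tab-separated DE file (gene ID, then TRUE/FALSE) and a tab-separated GO annotation file with one gene ID and one GO term per line; the function name is made up for illustration.

```python
import csv
import io
from collections import defaultdict


def de_genes_per_term(de_handle, go_handle, delimiter="\t"):
    """Map each GO term to the DE genes (flag TRUE) annotated with it."""
    # Collect gene IDs whose second column reads TRUE (case-insensitive).
    de_genes = {
        row[0].strip()
        for row in csv.reader(de_handle, delimiter=delimiter)
        if len(row) >= 2 and row[1].strip().lower() == "true"
    }
    # Walk the annotation file and keep only pairs involving a DE gene.
    term_to_genes = defaultdict(list)
    for row in csv.reader(go_handle, delimiter=delimiter):
        if len(row) >= 2 and row[0].strip() in de_genes:
            term_to_genes[row[1].strip()].append(row[0].strip())
    return dict(term_to_genes)


if __name__ == "__main__":
    # Toy inputs (invented gene-to-term pairs, for illustration only).
    de = io.StringIO("GL980_000001\tTRUE\nGL980_000002\tFALSE\n")
    go = io.StringIO("GL980_000001\tGO:0008150\nGL980_000002\tGO:0008150\n")
    for term, genes in de_genes_per_term(de, go).items():
        print(term, ",".join(genes))
```

The terms of interest can then be looked up by the GO IDs listed in the ranked category list.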