Goseq NA NA NA NA values

Hi all,

I am working on an RNA-seq analysis in which I used the goseq tool for GO over-representation analysis. Using the three input files (screenshots below) and following the Galaxy tutorial, the tool ran smoothly, but I have one issue: in the table of enriched GO terms, the genes are listed as NA1, NA2, NA3 instead of the actual locus_tags or gene IDs.

  • I went through several previous posts that addressed this issue
  • I removed the underscores in the gene IDs as mentioned by @jennaj
  • I used the data for my second microbe

However, I could not solve this issue/bug.

Any help is much appreciated.




Hi @Seraph

Very strange results, I don’t think I’ve seen this before!

Are you really working at UseGalaxy.org, or some other public server? The URL at the top of your browser window will confirm which one. If you are not working at one of the UseGalaxy servers yet, maybe try at one of them as a cross-check? This tool has had a few revisions, so you’ll want to be using the most current version to avoid older bugs.

Thanks for reviewing the earlier Q&A, it sounds like you’ve done all of the usual checks. And, yes, underscores should be Ok with this tool (written in R), and removing them hasn’t helped other people to get a result, but I guess that is something to at least try. Meaning: substitute the underscore with nothing in all files, so that each GENEID term is oneWord.
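For anyone who wants to try the underscore substitution outside of Galaxy, here is a minimal Python sketch. It assumes the inputs are tab-separated text with the gene ID in the first column. The function name `strip_underscores_from_ids` and the example IDs are made up for illustration and are not part of goseq or Galaxy.

```python
def strip_underscores_from_ids(text, id_column=0, sep="\t"):
    """Remove underscores from the gene ID column only, leaving other columns as-is."""
    cleaned_lines = []
    for line in text.splitlines():
        fields = line.split(sep)
        if len(fields) > id_column:
            fields[id_column] = fields[id_column].replace("_", "")
        cleaned_lines.append(sep.join(fields))
    return "\n".join(cleaned_lines) + "\n"


# Hypothetical two-line DE table: gene ID, then TRUE/FALSE
de_table = "ABC_0001\tTRUE\nABC_0002\tFALSE\n"
cleaned = strip_underscores_from_ids(de_table)
print(cleaned)
```

The same function would need to be applied to every input file, so that the identifiers still match each other afterwards.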

And, I’d like to take a look at this – assuming you are working at a public server. Would you like to share your history back here for feedback? You can restrict the history to just the inputs and outputs of this tool (“copy datasets”), and see the banner at this forum for how to create the share link, or here directly → How to get faster help with your question. You can unshare once we are done.

Let’s start there, thanks! :slight_smile:


Update:

I can reproduce this with test data. Hum :confused:

  • test history → https://usegalaxy.eu/u/jenj/h/test-goseq

  • tool version → toolshed.g2.bx.psu.edu/repos/iuc/goseq/goseq/1.50.0+galaxy0

  • parameter → Extract the DE genes for the categories (GO/KEGG terms)? set to Yes

This will likely need a tool wrapper ticket. More soon, and thanks for reporting the problem! :hammer_and_wrench:


Hi Jennaj,

Yes, I am using https://usegalaxy.org/. It looks like you’ve encountered the same bug.

Could you please let me know whether this issue can be resolved within a few days, or whether it might take longer? It is very important for my analysis.

Also, do you still need me to share the history?

Thanks again for your prompt response.

S

Hi @Seraph

Thanks for confirming this was at ORG. I was able to reproduce this same result at both EU and ORG, so I think it is a problem with the tool wrapper (so far). That means a longer correction time, even if what is going on is immediately clear (and it isn’t to me so far).

What to try (I haven’t yet)

  • The UseGalaxy.au server
  • Older versions of the tool. (FAQ: Changing the tool version)
  • The underlying R package directly inside of an interactive environment, either in Galaxy or outside of it. RStudio is one example. This shouldn’t be overly difficult… please scroll to the bottom of the tool form for tutorials if you know R, but are not sure how to connect your history data to/from the environment.

More soon.


Hi Jennaj

  1. The UseGalaxy.au server resulted in the same NA issue.
  2. The older versions of the Galaxy goseq tool (Galaxy Version 1.44.0+galaxy2 and Galaxy Version 1.44.0+galaxy0) resulted in the same NA issue.

A. Inside Galaxy: I believe that using the underlying R package directly inside an interactive environment will apply the same tool source code in R, and so will produce the same issue (correct me if this is the wrong idea).

B. In local RStudio: goseq in R works in a different way and requires Entrez IDs (which I could not find using any of the conversion tools) to match the IDs to their GO terms.
This is why goseq on Galaxy can solve this issue: I can simply upload the go-term-gene-ids file.

Please let me know as soon as the bug is solved so that I can complete this step.

Thanks, I really appreciate it,
S

Hi @Seraph

I’m testing today to answer your questions, but for the gene identifier conversion part, would AnnotateMyIDs (link to tool) work for you?


Hi Jennaj,

As I checked under the organism list, this tool is restricted to a few organisms (human, mouse, rat, fruit fly and zebrafish). No bacteria or fungi are supported by this tool!

Thanks,
S

Hi @Seraph

Yes, the original package natively supports only a few. I didn’t look up your IDs in the original post to check which you are working with.

Since you are working in R, the help at the Bioconductor forum might be interesting. The native packages for species are the same anywhere you work, and everyone working with a different species needs to build a custom package. That’s what you are doing when inputting your own GO mapping through the Galaxy form. You could output those working files with the option Output RData file? if you wanted to.

I’m still looking at the Galaxy wrapper. More soon.


Hi Jenna,

I’m working with non-model organisms that have customized IDs (locus_tags) and RefSeq IDs, which were generated by mapping to a group of genomes within the same genus. I previously contacted NCBI about this issue but couldn’t resolve it, as many of the IDs don’t have corresponding Entrez IDs.

I’m familiar with R and attempted to use the Output RData file for goseq. However, the file opened in R instead of RStudio, and had several issues that prevented me from proceeding.

I appreciate your attention to this matter and look forward to resolving the Galaxy goseq bug.

Thanks,
S

Thanks @Seraph for following up.

So … I now have a different result that actually did output the gene identifiers. This uses one of the GTN tutorial examples. I am still reviewing what the potential differences data-wise might be between your “index from the history” run, and mine with the test data that also used a non-native index, but thought I would share it here anyway so we both can look through it and maybe find what is going on faster.

My initial guess (a true guess, nothing is confirmed yet): supplying the data “from the history” to create a mini-package for the species isn’t building everything that the tool needs to populate the output we are troubleshooting. The issue isn’t something trivial like a gene identifier format. It is instead probably a missing temporary data structure/table.

Now, there may be a technical reason why that cannot be constructed: maybe that data structure requires extra content not supplied in the simple two column gene-to-GO mapping. Meaning, for the Galaxy wrapper usage, the simple usage was offered as a convenience, even if all of the output options are not possible. Galaxy wrappers do not alter the underlying tool, and the underlying tool is normally expecting the full “data package” when used directly on the command line in R.
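To make the expected output concrete, here is a rough Python sketch of a per-category list of DE genes built from only the two simple inputs. This is not how goseq or the Galaxy wrapper builds its output. It assumes a DE table with gene ID plus TRUE/FALSE and a two-column gene-to-GO mapping, and the function name and the gene IDs are invented for the example.

```python
from collections import defaultdict


def de_genes_per_category(de_rows, gene_to_go_rows):
    """Map each GO term to the sorted DE gene IDs annotated with it."""
    # Keep only the genes flagged as differentially expressed
    de_genes = {gene for gene, flag in de_rows if flag.strip().upper() == "TRUE"}
    by_term = defaultdict(set)
    for gene, term in gene_to_go_rows:
        if gene in de_genes:
            by_term[term].add(gene)
    return {term: sorted(genes) for term, genes in by_term.items()}


# Hypothetical inputs, already parsed into (column 1, column 2) pairs
de_rows = [("geneA", "TRUE"), ("geneB", "FALSE"), ("geneC", "TRUE")]
gene_to_go_rows = [
    ("geneA", "GO:0008150"),
    ("geneB", "GO:0008150"),
    ("geneC", "GO:0008150"),
    ("geneC", "GO:0003674"),
]
result = de_genes_per_category(de_rows, gene_to_go_rows)
print(result)
```

A lookup like this could serve as a stopgap for reading off the identifiers behind an enriched term, but it says nothing about what the tool needs internally.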

Here is the data

Dataset 38 is the file output you are trying to replicate. This test example is using the Drosophila package that comes with Bioconductor natively (as far as I know), or maybe it was created using the original tools by us. Either way it is complete, and differs from my original test (that builds the mini-package on demand at runtime, similar to what you are doing).

Screenshot of Dataset 38, mostly for clarity, just to make sure we are both reviewing/expecting the same thing as we continue to troubleshoot.

That’s where I am now. I’ll try to look more at this today before the weekend, and at least get this ticketed for developer feedback. The more specific that ticket is, the better that feedback (and faster the potential action).

This also loops back to my prior post: you could consider creating a real, complete data package for your species, then running GOSEQ directly in R against it. The “bug” seems to be data based on the “mini” temporary data indexes created with the Galaxy wrapper. Creating your own indexes would avoid that. Meaning, this isn’t a problem with the R packages from Bioconductor or GOSEQ specifically (run in Galaxy or otherwise).

For some reason I am now thinking this has come up before … but I can’t find the discussion. It was probably many years ago. But I’ll search again! :hammer_and_wrench:

Xref: https://bioconductor.org/packages/devel/bioc/vignettes/goseq/inst/doc/goseq.pdf

Hi Jennaj,

Thank you for the time you are spending to solve this issue.
Creating a real, complete data package for the bacterial and fungal species is a challenge, since it requires combining the genome assemblies for all strains and merging the GFF files with consistent gene IDs before checking their corresponding GO terms.

I would appreciate any easier way using Galaxy, if somehow this bug can be resolved.

Thank you,
S

Hi @Seraph

The tool itself needs this same reference data available pre-processed in order to include certain data points in the output. Galaxy isn’t doing anything extra, like building the complete index, for the complications you describe. Even if the index-building functions for Bioconductor tools were also wrapped, the inputs would all still need to “match up” for those features/identifiers.
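One quick way to see whether the identifiers “match up” across the inputs is a set comparison, sketched below in Python. It assumes three tab-separated inputs (DE table, gene lengths, gene-to-GO mapping) with the gene ID in the first column. The helper names and the data are hypothetical and only illustrate the check.

```python
def first_column_ids(text, sep="\t"):
    """Collect the set of identifiers found in the first column."""
    return {line.split(sep)[0].strip() for line in text.splitlines() if line.strip()}


def compare_id_sets(de_text, length_text, go_text):
    """Report identifiers that appear in one input but not in another."""
    de_ids = first_column_ids(de_text)
    length_ids = first_column_ids(length_text)
    go_ids = first_column_ids(go_text)
    return {
        "de_missing_length": sorted(de_ids - length_ids),
        "de_missing_go": sorted(de_ids - go_ids),
        "go_not_in_de": sorted(go_ids - de_ids),
    }


# Hypothetical inputs; the GO mapping spells one ID differently (gene_B vs geneB)
de_text = "geneA\tTRUE\ngeneB\tFALSE\n"
length_text = "geneA\t1200\ngeneB\t900\n"
go_text = "geneA\tGO:0008150\ngene_B\tGO:0003674\n"
report = compare_id_sets(de_text, length_text, go_text)
print(report)
```

Genes with no GO annotation are normal, so a non-empty `de_missing_go` list is not an error by itself. IDs in `go_not_in_de` that look like near-misses of DE IDs are the ones worth a second look.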

To be clear: the tools available in Galaxy are the same tools available anywhere else. The “galaxy part” just adds in some extra bits around the original tool to make it nicer to use in a web interface that supports workflows, other tools written by other developers, and throughput features. The original tool itself is still always sourced from the same general tool packages (usually the conda version).

So, I don’t think this is a “bug”, instead, this is just how it works with limited reference data.

You could follow up by reaching out to the Bioconductor authors. The small slice of extra data you need might be something they could layer into the “limited reference data” functions (that part isn’t Galaxy specific either, and part of the underlying tool).

Not a great answer but I am not sure how to help more!


Hi Jennaj,

Thank you so much, I really appreciate your clarification! And yes, in the end it’s part of the tool code itself, and not related to Galaxy.

Thanks again,
