Thanks @Seraph for following up.
So … I now have a different result that actually did output the gene identifiers. This uses one of the GTN tutorial examples. I am still reviewing how the data might differ between your “index from the history” run and mine with the test data that also used a non-native index, but I thought I would share it here anyway so we can both look through it and maybe find what is going on faster.
My initial guess (a true guess, nothing is confirmed yet): supplying the data “from the history” to create a mini-package for the species isn’t building everything that the tool needs to populate the output we are troubleshooting. The issue isn’t something trivial like a gene identifier format; it is instead probably a missing temporary data structure/table.
Now, there may be a technical reason why that cannot be constructed: maybe that data structure requires extra content not supplied in the simple two-column gene-to-GO mapping. Meaning, in the Galaxy wrapper, the simple usage was offered as a convenience, even if not all of the output options are possible with it. Galaxy wrappers do not alter the underlying tool, and the underlying tool normally expects the full “data package” when used directly on the command line in R.
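While we compare runs, one quick sanity check is to work out by hand which of your DE gene identifiers fall under each GO term according to the two-column mapping, then compare that against whatever the tool reports. Below is a minimal sketch in Python, assuming a tab-separated file with the gene ID in column 1 and the GO term in column 2. The function names and the toy rows are made up for illustration, and this is not what the Galaxy wrapper or goseq does internally.

```python
import csv
import io
from collections import defaultdict


def read_gene2go(handle):
    """Parse a two-column, tab-separated gene-to-GO table.

    Returns {gene_id: set of GO terms}. Rows with fewer than two
    columns or an empty gene ID are skipped.
    """
    gene2go = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue
        gene2go[row[0].strip()].add(row[1].strip())
    return dict(gene2go)


def de_genes_per_term(gene2go, de_genes):
    """Invert the mapping for the DE genes only.

    Returns {GO term: sorted list of DE gene IDs annotated to it}.
    DE genes that are absent from the mapping are silently ignored.
    """
    term2genes = defaultdict(set)
    for gene in de_genes:
        for term in gene2go.get(gene, ()):
            term2genes[term].add(gene)
    return {term: sorted(genes) for term, genes in term2genes.items()}


# Toy rows, invented for this example (hypothetical gene IDs).
example = (
    "geneA\tGO:0000001\n"
    "geneA\tGO:0000002\n"
    "geneB\tGO:0000001\n"
    "geneC\tGO:0000003\n"
)

gene2go = read_gene2go(io.StringIO(example))
result = de_genes_per_term(gene2go, ["geneA", "geneB"])
print(result)
```

With a real file, you would pass `open("your_mapping.tsv")` (a placeholder name) instead of the `io.StringIO` toy data. If the per-term lists look right here but the tool output still lacks them, that would at least point away from the identifiers themselves.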
Here is the data
Dataset 38 is the file output you are trying to replicate. This test example uses the Drosophila package that ships natively with Bioconductor (as far as I know), or maybe we created it using the original tools. Either way it is complete, and the run differs from my original test, which builds the mini-package on demand at runtime, similar to what you are doing.
Screenshot of Dataset 38, mostly for clarity, just to make sure we are both reviewing/expecting the same thing as we continue to troubleshoot.
That’s where I am now. I’ll try to look more at this today before the weekend, and at least get this ticketed for developer feedback. The more specific that ticket is, the better the feedback (and the faster any potential action).
This also loops back to my prior post: you could consider creating a real, complete data package for your species, then running GOSEQ directly in R against it. The “bug” seems to be data-based, tied to the “mini” temporary data indexes created by the Galaxy wrapper. Creating your own indexes would avoid that. Meaning, this isn’t a problem with the R packages from Bioconductor or GOSEQ specifically (run in Galaxy or otherwise).
For some reason I am now thinking this has come up before … but I can’t find the discussion. It was probably many years ago. But I’ll search again!
Xref: https://bioconductor.org/packages/devel/bioc/vignettes/goseq/inst/doc/goseq.pdf