Greetings! How can I obtain FASTA format sequences of orthologous genes detected using eggNOG Mapper? Thanks in advance.
Good question! The answer is a bit hidden, but I’ll try to explain.
The idea is to create an annotation reference first, with one optional setting toggled on (the md5 hash; see also --md5). In later analysis steps, you supply that file as the reference and set the Basis for annotation mode to Use Cached annotations (cache).
Then you’ll see a toggle on the form to output a fasta file: a subset of the reference data in fasta format, associated with the ortholog calls made for your primary query fasta data.
I’ve set up the form below to show where to look for it. When you review the form yourself, notice that the fasta output option is not always present. When it is absent, the other annotation modes are how you generate the data needed for the next round.
This is described briefly in the Help section at the bottom of the form, and the linked documentation covers it in more detail. The tool works about the same in Galaxy as it does elsewhere, except that the command line flags are options on the form instead.
Screenshot
That pre-computed cached reference with md5 values is created in a prior run. The toggle that adds the md5 values is at the bottom of the form, in Output Options. The “cache” part of the name just means pre-computed, a sort of index, and that is how the output is used in subsequent runs.
Screenshot
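To make the “sort of index” idea more concrete, here is a small conceptual sketch in Python. It is not eggNOG Mapper’s own code, and the exact way the tool hashes and matches sequences may differ. The function names, the toy sequences, and the annotation text are all made up for illustration. It only shows the general pattern: hash each sequence with md5, keep annotations keyed by that hash, then look up new queries by their hash.

```python
import hashlib


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def seq_md5(sequence):
    """Return the hex md5 digest of a sequence string."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def split_by_cache(fasta_text, cache):
    """Split queries into cache hits (header -> annotation) and misses."""
    hits, misses = {}, []
    for header, seq in parse_fasta(fasta_text):
        key = seq_md5(seq)
        if key in cache:
            hits[header] = cache[key]
        else:
            misses.append((header, seq))
    return hits, misses


# Toy "first run": one annotated sequence, stored under its md5 hash.
# The annotation string is a placeholder, not real eggNOG output.
cache = {seq_md5("MKTAYIAK"): "hypothetical annotation for geneA"}

# Toy "second run": q1 has the same sequence as the cached entry, q2 does not.
second_run = ">q1\nMKTAYIAK\n>q2\nMSLLTEVE\n"
hits, misses = split_by_cache(second_run, cache)
print(hits)
print(misses)
```

The lookup depends only on the sequence, not on its name, which is why the md5 values have to be present in the file from the first run.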
Tool form options can include very specific language to better link together all of the tool’s resources. In most cases, the full tool publications are the definitive usage guides. When an option or flag is named in the paper, that is probably the name you’ll also find on the Galaxy form’s option toggles, which makes it easier to work out what each one does and how to use it.
So, in short, run the tool twice with those options: first to create a reference, then to use that reference for your queries and produce the fasta output subsets.
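For anyone running outside Galaxy, the two runs might look roughly like the sketch below. It only builds and prints the command lines, it does not execute anything. The flags (-i, -o, --md5, -m cache, -c) are as I understand them from the eggNOG Mapper documentation, so please confirm them with emapper.py --help for your version. The file names and prefixes are placeholders, and I am assuming the first run writes its annotations to a file named after the output prefix.

```python
import shlex


def build_reference_run(query_fasta, prefix):
    """Run 1: a normal annotation run, with md5 hashes added to the output."""
    return ["emapper.py", "-i", query_fasta, "-o", prefix, "--md5"]


def build_cached_run(query_fasta, prefix, cache_file):
    """Run 2: annotate from the cached annotations file made by run 1."""
    return ["emapper.py", "-i", query_fasta, "-o", prefix,
            "-m", "cache", "-c", cache_file]


# Placeholder file names; substitute your own data.
run1 = build_reference_run("reference_proteins.fasta", "ref")
# Assumption: run 1 writes its annotations to <prefix>.emapper.annotations.
run2 = build_cached_run("my_queries.fasta", "queries", "ref.emapper.annotations")

print(shlex.join(run1))
print(shlex.join(run2))
```

On the Galaxy form these correspond to the md5 toggle in Output Options for the first run, and the Use Cached annotations (cache) mode for the second.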
Please let us know if this actually helps or not!