Genome index or dbkey not accessed by tools on a local Galaxy - Solution: Run tool-specific Data Managers

pa.saunders · May 16, 2019, 4:45am

I am setting up a local Galaxy instance on my laptop and want to have reference genomes available. I followed the example at https://galaxyproject.org/events/gcc2014/training-day/data-managers/ (the 16:10-16:30 session, “Install a DataManager from the ToolShed”).

As indicated, I edited galaxy.yml (not “universe_wsgi.ini”) and made the change

enable_data_manager_user_view = True

and installed

data_manager_fetch_genome_all_fasta

from the ToolShed. When I ran it, the history showed the dbkey was created, but when I tried to use it with HISAT2, no reference genome was found. What do I need to do to make the dbkey available as a reference genome?

jennaj · May 16, 2019, 4:30pm

Hello @pa.saunders!

You’ll need to run more Data Managers. This prior Q&A lists the best order to run these for a baseline setup. After those four are done, use the HISAT2 DM to create indexes for that tool, and run any other DMs for the indexes needed by the other tools you plan to use.

Thanks!

pa.saunders · May 16, 2019, 8:00pm

What I am trying to achieve is having the reference genomes accessible as a single pull-down menu item that works for multiple programs, the way they are on the public Galaxy server.

I realize that the programs have their own particular needs for how the genome is indexed, but I thought that is why ToolShed programs are loaded with dependencies, e.g. to process stored FASTA files. Are you telling me that each reference genome listed is already pre-indexed for all the available programs and stored somewhere?

jennaj · May 16, 2019, 9:06pm

Some tools use just the FASTA and do not require further indexing. Others need additional tool-specific indexes, and each of those has its own Data Manager tool.

We run Data Managers to provide indexes for tools on the public servers. Everyone else who runs their own server does the same, unless they use a cloud/Docker version of Galaxy that includes pre-computed indexes (created by a Data Manager). Even those cloud/Docker instances can have more tools and indexes added by their administrators.
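
To illustrate why a dbkey can exist while HISAT2 still shows no reference genome: each tool reads its own data table, so an entry in the FASTA table does not imply an entry in the HISAT2 table. Here is a minimal Python sketch of that idea; it is not Galaxy's own code. The table names (`all_fasta`, `hisat2_indexes`), the tab-separated column order (value, dbkey, name, path), and the sample path are assumptions for illustration, so check the `.loc` files on your own instance.

```python
# Hypothetical contents of two tool data table .loc files (tab-separated).
# Assumed column order: value, dbkey, name, path.
ALL_FASTA_LOC = "sacCer3\tsacCer3\tYeast (sacCer3)\t/data/sacCer3/seq/sacCer3.fa\n"
HISAT2_LOC = "# no entries yet: the HISAT2 Data Manager has not been run\n"


def dbkeys_in_loc(text):
    """Return the set of dbkeys listed in the text of a .loc file."""
    keys = set()
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            keys.add(fields[1])  # second column is assumed to be the dbkey
    return keys


def tables_with_dbkey(tables, dbkey):
    """Map each table name to True/False depending on whether it lists dbkey."""
    return {name: dbkey in dbkeys_in_loc(text) for name, text in tables.items()}


status = tables_with_dbkey(
    {"all_fasta": ALL_FASTA_LOC, "hisat2_indexes": HISAT2_LOC},
    "sacCer3",
)
print(status)  # {'all_fasta': True, 'hisat2_indexes': False}
```

In this made-up state the genome would be usable by tools that only need the FASTA, but it would not be offered by HISAT2 until the HISAT2 Data Manager adds a row to its table.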

The dependencies that the tool shed installs are for the underlying tool itself. These are not reference genome indexes.

Check the ToolShed under the category “Data Managers” to find all DMs that are available.

To run Data Managers in batch, you can use Ephemeris in the “Run data managers” mode. https://ephemeris.readthedocs.io/

Ephemeris does require that certain configuration files are customized. For an example, please see this tutorial: https://galaxyproject.github.io/training-material/topics/instructors/tutorials/setup-galaxy-for-training/tutorial.html

Thanks!

pa.saunders · May 18, 2019, 2:50am

The genome was imported under the dbkey without problems. How do I make it available to programs as a pull-down menu option?

jennaj · May 18, 2019, 9:06pm

Did you run the Data Managers for the targeted tools?

If yes, restart your server so that the index tables are updated.
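
After the restart, one way to confirm that the tables picked up the new entries is to ask the server for them. A hedged sketch using BioBlend follows. It assumes BioBlend is installed, that you have an admin API key, and that the response of `show_data_table` carries `columns` and `fields` keys. The URL, the key, and the sample response are placeholders, not values from this thread.

```python
def dbkeys_in_table(table):
    """Extract the dbkeys from a data table description.

    `table` is assumed to be a dict with a 'columns' list and a 'fields'
    list of rows, as returned by Galaxy's tool data API.
    """
    idx = table["columns"].index("dbkey")
    return sorted({row[idx] for row in table["fields"]})


def check_server(url, api_key, table_ids):
    """Query a running Galaxy for the dbkeys in each named data table."""
    # Imported here so the helper above can be tried without BioBlend.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=url, key=api_key)
    return {t: dbkeys_in_table(gi.tool_data.show_data_table(t)) for t in table_ids}


# Offline example with a hypothetical response, so the helper can be tried
# without a server.
sample = {
    "name": "hisat2_indexes",
    "columns": ["value", "dbkey", "name", "path"],
    "fields": [["sacCer3", "sacCer3", "Yeast (sacCer3)", "/data/sacCer3/hisat2/sacCer3"]],
}
print(dbkeys_in_table(sample))  # ['sacCer3']

# Against a live instance (placeholders, adjust to your setup):
# print(check_server("http://localhost:8080", "YOUR_ADMIN_API_KEY",
#                    ["all_fasta", "hisat2_indexes"]))
```

If the dbkey shows up under the FASTA table but not under the HISAT2 table, the HISAT2 Data Manager most likely has not been run for that genome yet.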