use Create DBKey and Reference Genome to define new entries

Error adding new builds after adding CVMFS tool_data_tables.

I’ve tried both release_24.1 and release_23.0. In both versions, after adding the line below (which defines the external tool_data_tables) to the galaxy section of galaxy.yml, I’m no longer able to add any build (e.g. local, from UCSC, or from NCBI).

tool_data_table_config_path: /cvmfs/data.galaxyproject.org/byhand/location/tool_data_table_conf.xml,/cvmfs/data.galaxyproject.org/managed/location/tool_data_table_conf.xml

When I remove the line, everything works again as expected. Adding the default ‘/srv/galaxy/server/config/tool_data_table_conf.xml.sample’ path to the specification above also does not work. Did I forget to specify another path?
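One way to rule out a simple path problem is to check each entry of the comma-separated value individually. The following is a minimal sketch, not part of Galaxy: the helper names `split_config_paths` and `check_paths` are made up for illustration, and it only tests that each file exists and is readable by the user running the script (ideally the same user that runs Galaxy).

```python
import os


def split_config_paths(value):
    """Split a comma-separated tool_data_table_config_path value into paths."""
    return [p.strip() for p in value.split(",") if p.strip()]


def check_paths(paths):
    """Map each path to 'ok', 'missing', or 'unreadable'."""
    report = {}
    for path in paths:
        if not os.path.isfile(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK):
            report[path] = "unreadable"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    # The value from galaxy.yml, as quoted in the post above.
    value = (
        "/cvmfs/data.galaxyproject.org/byhand/location/tool_data_table_conf.xml,"
        "/cvmfs/data.galaxyproject.org/managed/location/tool_data_table_conf.xml"
    )
    for path, status in check_paths(split_config_paths(value)).items():
        print(f"{status:10s} {path}")
```

If any entry is reported as missing or unreadable, that points at the CVMFS mount or at permissions rather than at the data manager itself.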

Hi @frans

The line referenced is correct, and matches our administrative documents here → Hands-on: Reference Data with CVMFS / Reference Data with CVMFS / Galaxy Server administration

Are you using data managers on your local server for this?

Hi Jennifer,

I’m going through the motions of setting up a galaxy server using ansible, following the steps as described in the GTN Galaxy Server administration documents. I’ve got two data managers installed: BWA-MEM index and Create DBKey and Reference Genome. One of the steps describes the process of downloading and installing a reference genome using the run-data-managers command (sacCer3 with Ephemeris). Running this example always results in a timeout error (something wrong with the ftp server?). Because of this error I reverted to the option to manually add a reference genome. I downloaded the Arabidopsis genome from https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz and uploaded the file to my galaxy history. When I try to add this with the data_manager_fetch_genome_all_fasta_dbkey tool, using a new key and pointing to a file from my history, this invariably results in an exec_after_process_hook_failed error. However, when I replace the line:

tool_data_table_config_path: /cvmfs/data.galaxyproject.org/byhand/location/tool_data_table_conf.xml,/cvmfs/data.galaxyproject.org/managed/location/tool_data_table_conf.xml

with:

tool_data_table_config_path: /srv/galaxy/server/config/tool_data_table_conf.xml.sample

it works as expected. Because I followed the steps as closely as possible I’m not sure what is going wrong. Even the version I downloaded from (Release step-10 · hexylena/git-gat (github.com)) displays the same behavior.

Hi @frans

So … maybe these were just examples, but… when using the CVMFS reference http://datacache.galaxyproject.org/

Both sacCer3 and Arabidopsis_thaliana_TAIR10 are already indexed in CVMFS, yes? If true, then if there is some tool specific index you want to create that is missing, you can use the already indexed genome (referenced by the database dbkey) to create a new tool-specific index.

You should see those databases as a local option in the BWA Data Manager. But, I also see BWA indexes for these two already.

And for the path changes, yes, you won’t be able to write to CVMFS. But you should be able to read from it. Your problem here also points to a CVMFS connection issue.
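To tell "cannot read" apart from "cannot write", a small probe like the one below may help. It is only a sketch under assumptions: `probe_dir` is a hypothetical helper, the two directories are taken from the paths quoted earlier in this thread, and it should be run as the user that runs Galaxy. For a healthy CVMFS mount the expected picture is listable but not writable.

```python
import os


def probe_dir(path):
    """Report whether a directory exists, can be listed, and is writable."""
    result = {"exists": False, "listable": False, "writable": False}
    try:
        # Listing the directory also triggers an autofs mount, if one is configured.
        os.listdir(path)
    except FileNotFoundError:
        return result
    except OSError:
        # Present but not listable (permissions, not a directory, I/O error).
        result["exists"] = os.path.exists(path)
        return result
    result["exists"] = True
    result["listable"] = True
    result["writable"] = os.access(path, os.W_OK)
    return result


if __name__ == "__main__":
    for directory in (
        "/cvmfs/data.galaxyproject.org/byhand/location",
        "/srv/galaxy/server/config",
    ):
        print(directory, probe_dir(directory))
```

A CVMFS directory that is not even listable suggests a mount or connection problem rather than a Galaxy configuration problem.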

What am I misunderstanding about your question? Were you not able to get CVMFS installed on your server? Would you like to solve that first? Share more details about what was going on with that and we can try to help.


XRef
For completely new genomes and associated indexes, this topic has the general details: Indexing reference genomes with Data Managers: Resources, tutorials, troubleshooting - #2 by jennaj.

In short, get the genome indexed first (fasta idx, samtools, picard), then generate tool-specific indexes as wanted. Why? Tool-specific indexes tend to use the fasta/samtools/picard indexes to generate new indexes (anywhere, not just Galaxy).

You can do all that manually if you are careful enough (tedious and error prone, but possible; that’s what the “hands-on” in CVMFS is, all of our pre-data manager content), or in a batch stream with Ephemeris (exact, reproducible, and highly recommended if at all possible): Hands-on: Galaxy Tool Management with Ephemeris / Galaxy Tool Management with Ephemeris / Galaxy Server administration.

Hi Jennifer,

Thanks for your quick reply and support. At the moment we are running a containerized version of galaxy (docker-galaxy-stable, version 20.09) that is set up to use the CVMFS references. With that version I’m also able to add local reference genomes with the create_key_and_reference_genome_fetching data manager. Because I want to update to a newer version of galaxy, I’m looking at the ansible administration tutorials of GTN. After running the ansible scripts I now have a working galaxy server (24.1) with a minimal set of tools installed (i.e. BWA and BWA-MEM). I’m able to access the references from the CVMFS locations when running either tool. I also expected to be able to add local reference genomes in the same way as before, using the create_key_and_reference_genome_fetching data manager. However, this now results in an error. So my question is this:

Should I be able to add local reference genomes with the create_key_and_reference_genome_fetching data-manager after including the CVMFS references in the tool_data_table_config_path specification of galaxy.yml (like I could do before)? Thanks again for your support.

Yes, I think this should work, and I am not sure why it is not working for you when running the current 24.1 release.

This is worth following up on, since there might be something we can adjust. I’m going to cross-post this question over to our Admin chat for help. This is a group of people who are also running small servers for researchers, and the core administrators from our team are there to assist. They may reply here or there, and feel free to join the chat! You're invited to talk on Matrix