Unable to determine location of rsync data for all_fasta (human b38)

Greetings.

Thanks in advance for any help. I’m new to Galaxy and set up a fresh installation of Galaxy (19_09) on Ubuntu, backed by PostgreSQL. I installed the data manager “data_manager_rsync_g2” (e0329ab30f6d) without issue. I then attempted to use the rsync data manager to install the FASTA reference and data table entries for the human GRCh38 (hg38) genome.

Parameters chosen: 
    genome:  Human Dec. 2013 (GRCh38/hg38)(hg38);  
    Desired Data Tables:  all_fasta; bwa_mem_indexes;  
    Desired Data Table Entries:  selected all data table entries except the female and Canonical tables  (i.e. hg38full.fa and hg38.fa)

The job submits successfully, but when I click on it in the history I see that it actually failed with the error: “unable to determine location of rsync data for all_fasta {‘dbkey’: ‘hg38’, ‘name’: ‘Human (Homo sapiens) (b38): hg38’, ‘path’: ‘/cvmfs/data.galaxyproject.org/byhand/hg38/seq/hg38.fa’, ‘value’: ‘hg38’}”

In galaxy.log I see a section that looks like an error, although it is not flagged as one:

galaxy.tool_util.deps DEBUG 2020-01-28 10:33:55,938 [p:51638,w:1,m:0] [LocalRunner.work_thread-3] Using dependency python version 3.7 of type conda
galaxy.tool_util.deps DEBUG 2020-01-28 10:33:55,938 [p:51638,w:1,m:0] [LocalRunner.work_thread-3] Using dependency rsync version 3.1.3 of type conda
galaxy.jobs.command_factory INFO 2020-01-28 10:33:55,951 [p:51638,w:1,m:0] [LocalRunner.work_thread-3] Built script [/apps/galaxy/galaxy_app/database/jobs_directory/000/4/tool_script.sh] for tool command [[ "$(basename "$CONDA_DEFAULT_ENV")" = "$(basename '/apps/galaxy/galaxy_app/database/dependencies/_conda/envs/mulled-v1-815a44c80bfad16368e1558d06f2a6ab4d4a80b679909085f17817b64abb8167')" ] ||
MAX_TRIES=3
COUNT=0
while [ $COUNT -lt $MAX_TRIES ]; do
    . '/apps/galaxy/galaxy_app/database/dependencies/_conda/bin/activate' '/apps/galaxy/galaxy_app/database/dependencies/_conda/envs/mulled-v1-815a44c80bfad16368e1558d06f2a6ab4d4a80b679909085f17817b64abb8167' > conda_activate.log 2>&1
    if [ $? -eq 0 ];then
        break
    else
        let COUNT=COUNT+1
        if [ $COUNT -eq $MAX_TRIES ];then
            echo "Failed to activate conda environment! Error was:"
            cat conda_activate.log
            exit 1
        fi
        sleep 10s
    fi
done ; [ "$(basename "$CONDA_DEFAULT_ENV")" = "$(basename '/apps/galaxy/galaxy_app/database/dependencies/_conda/envs/mulled-v1-815a44c80bfad16368e1558d06f2a6ab4d4a80b679909085f17817b64abb8167')" ] ||
MAX_TRIES=3
COUNT=0
while [ $COUNT -lt $MAX_TRIES ]; do
    . '/apps/galaxy/galaxy_app/database/dependencies/_conda/bin/activate' '/apps/galaxy/galaxy_app/database/dependencies/_conda/envs/mulled-v1-815a44c80bfad16368e1558d06f2a6ab4d4a80b679909085f17817b64abb8167' > conda_activate.log 2>&1
    if [ $? -eq 0 ];then
        break
    else
        let COUNT=COUNT+1
        if [ $COUNT -eq $MAX_TRIES ];then
            echo "Failed to activate conda environment! Error was:"
            cat conda_activate.log
            exit 1
        fi
        sleep 10s
    fi
done ; python '/apps/galaxy/galaxy_app/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_rsync_g2/e0329ab30f6d/data_manager_rsync_g2/data_manager/data_manager_rsync.py' '/apps/galaxy/galaxy_app/database/files/000/dataset_4.dat']
galaxy.jobs.runners DEBUG 2020-01-28 10:33:55,966 [p:51638,w:1,m:0] [LocalRunner.work_thread-3] (4) command is: rm -rf working; mkdir -p working; cd working; /bin/bash /apps/galaxy/galaxy_app/database/jobs_directory/000/4/tool_script.sh > ../tool_stdout 2> ../tool_stderr; return_code=$?; cd '/apps/galaxy/galaxy_app/database/jobs_directory/000/4';
[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; _galaxy_setup_environment True
python "metadata/set.py"; sh -c "exit $return_code"
galaxy.jobs.runners.local DEBUG 2020-01-28 10:33:55,985 [p:51638,w:1,m:0] [LocalRunner.work_thread-3] (4) executing job script: /apps/galaxy/galaxy_app/database/jobs_directory/000/4/galaxy_4.sh
galaxy.jobs DEBUG 2020-01-28 10:33:56,000 [p:51638,w:1,m:0] [LocalRunner.work_thread-3] (4) Persisting job destination (destination id: local)

Good day,

The issue here turned out to be that the data manager was trying to retrieve content from non-existent or invalid paths: one on the rsync server and one in the Python script for the data manager itself.
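
To illustrate the kind of failure involved, here is a minimal sketch of how a path-translation step can end in this error. This is not the actual code from data_manager_rsync.py; the function name, the prefix table, and the rsync URL are hypothetical placeholders chosen for illustration. The idea is simply that when an entry's path does not begin with any prefix the script knows how to map, there is nothing to build a remote location from.

```python
# Hypothetical illustration only; NOT the code from data_manager_rsync.py.
# The prefix and the rsync URL below are made-up placeholders.
KNOWN_PREFIXES = {
    "/galaxy/data/": "rsync://rsync.example.org/indexes/",
}


def to_rsync_url(table_name, entry, prefixes=KNOWN_PREFIXES):
    """Translate a data table entry's local path into a remote rsync URL."""
    path = entry["path"]
    for local_prefix, remote_prefix in prefixes.items():
        if path.startswith(local_prefix):
            # Swap the known local prefix for the remote one.
            return remote_prefix + path[len(local_prefix):]
    # No known prefix matched, so the remote location cannot be derived.
    raise ValueError(
        f"unable to determine location of rsync data for {table_name} {entry}"
    )


# Entry as reported in the error message above.
entry = {
    "dbkey": "hg38",
    "name": "Human (Homo sapiens) (b38): hg38",
    "path": "/cvmfs/data.galaxyproject.org/byhand/hg38/seq/hg38.fa",
    "value": "hg38",
}
```

With the placeholder table above, calling to_rsync_url("all_fasta", entry) raises, because the path starts with /cvmfs/... rather than a prefix the table knows about. Adding a mapping for that prefix makes the same call succeed.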

I’ve created a pull request to update the data manager, and my colleague @nate has adjusted the rsync server’s configuration. Once you see version 0.0.4 of data_manager_rsync_g2 available for installation on the main ToolShed, you should be able to update the data manager and retrieve data from the data cache.

Thank you so much! I was feeling very defeated and tried just about everything to trick this into working. I have to say I’m very impressed with the support on this forum from the developers and community.

Good morning. This is my first bug and I do not know what to tell my users for an ETA. Is there a normal/average timeframe for something like this: days, weeks, or months? Thanks in advance.

Hi @wdtraughber

How long a correction takes can vary. This one is still pending, and there isn’t an ETA yet.

I asked for an update directly on the PR (pull request) here: https://github.com/galaxyproject/tools-iuc/pull/2839. You could do the same for any future fixes you may be tracking or waiting for.

Once the tool is updated, the PR will close out and you’ll find version 0.0.4 in the ToolShed (as @dave noted). The repository is here https://toolshed.g2.bx.psu.edu/view/devteam/data_manager_rsync_g2/e0329ab30f6d. This is the same repository as browsed from within your server when installing tools as an admin.

Corrections/enhancement/updates lifecycle:

  1. Problem identified. An issue ticket may be created along with a PR or in some cases just an issue ticket (no PR yet), or just a PR (more common for technical fixes). This particular problem is tracked here at Galaxy Help, has an open PR, and had an issue ticket (linked to the PR).
  2. Fix is made. PR will close out first, then any associated issue tickets will close out.
  3. Fix applied. If the fix needs to be applied to a public server or resource that the issue ticket/PR is referencing, that is an extra step. When you are the admin pulling in a fix, the fix is for you to apply to your server. This could mean re-installing a tool or updating your server to capture a code update.

As an open-source project, we welcome and rely on community help and contributions! If you are interested in getting involved, please see: https://galaxyproject.org/develop/

If you are ever unsure about the status of a change, you can either review the issue/PR to check the status, update/test yourself, or ask for an update (on the issue ticket or PR is usually best for technical corrections). For this case, what you are watching for is the updated Data Manager, at the specified version, to become available in the ToolShed. But you could also follow or comment on the PR at GitHub to get notified with an alert when it closes out, changes state, has new comments, etc.
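
If you would rather check the PR state from a script than from the web page, the following is a minimal sketch using GitHub's public REST API (the `pulls` endpoint, whose response includes `state` and `merged` fields). The helper names are my own, not part of Galaxy or any Galaxy tooling, and unauthenticated requests to the API are rate-limited.

```python
import json
import urllib.request


def fetch_pull_request(owner, repo, number):
    """Fetch a pull request's JSON description from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def describe_pull_request(pr):
    """Summarize the PR JSON as 'merged', 'closed without merging', or 'open'."""
    if pr.get("merged"):
        return "merged"
    if pr.get("state") == "closed":
        return "closed without merging"
    return "open"


# Example (performs a network request, so it is left commented out):
# pr = fetch_pull_request("galaxyproject", "tools-iuc", 2839)
# print(describe_pull_request(pr))
```

Keep in mind that a merged PR is only the signal to go look: the thing to confirm afterwards is that the new Data Manager version is actually listed in the ToolShed.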

If you are in a hurry or just don’t want to wait: The Human GRCh38 genome sourced from UCSC (hg38) doesn’t need to be sourced from the Rsync server. That is just one choice. You could also install the same exact genome directly from UCSC using different Data Managers. This prior Q&A explains how: Indexing reference genomes with Data Managers: Resources, tutorials, troubleshooting

Thanks!

Wow, thank you for the thorough and detailed reply! As you noted, we are not blocked by this. I’m sure this will not be my last post, so knowing the process is immensely helpful for me.
