Galaxy throws an error when running internal tools with Pulsar (e.g. uploading data)

Hi everyone,

I’m having some problems with my Pulsar installation and was wondering if anyone can give me some advice on what I might be doing wrong.

In my current setup I can run some tools from the Tool Shed through Pulsar, but internal Galaxy tools do not work. For example, the internal upload tool fails on my Pulsar server using the web variant. The job returns the following error:

An error occurred with this dataset
Traceback (most recent call last):
  File "/data/galaxy/server/files/staging/36/tool_files/data_fetch.py", line 13, in <module>
    from galaxy.datatypes import sniff
ModuleNotFoundError: No module named 'galaxy.datatypes'

When I add export TEST_GALAXY_LIBS=1 to local_env.sh in my Pulsar configuration, I also get the error below, which makes me suspect that Pulsar cannot use its local Galaxy installation correctly because I’ve misconfigured something.

Sourcing file ./local_env.sh
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: cannot import name 'eggs' from 'galaxy' (/data/galaxy/venv/lib/python3.8/site-packages/galaxy/__init__.py)
Failed to setup Galaxy environment properly, is GALAXY_HOME (/data/galaxy) a valid Galaxy instance.

My Galaxy installation is in /data/galaxy. I’ve tried setting PYTHONPATH in local_env.sh to several values, such as:

export PYTHONPATH=${PYTHONPATH}:/data/galaxy/server/lib/galaxy/jobs/rules

or

export PYTHONPATH=${PYTHONPATH}:/data/galaxy/server/lib/

But neither seems to be working.
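
One way to narrow this down is to check which galaxy package the Pulsar-side Python interpreter actually resolves, and whether galaxy.datatypes is visible to it. The sketch below is only a diagnostic under assumptions: module_origin is a hypothetical helper name, and the module names are simply the ones from the tracebacks above. It does not change any configuration.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found.

    Note: for a dotted name, find_spec imports the parent package first,
    so running this executes the parent's __init__.py.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # A missing parent package (e.g. no 'galaxy' at all) raises
        # ModuleNotFoundError instead of returning None.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Module names taken from the tracebacks in this post.
    for name in ("galaxy", "galaxy.datatypes", "galaxy.datatypes.sniff"):
        print(f"{name}: {module_origin(name) or 'NOT FOUND'}")
    # The search path shows whether the PYTHONPATH entries were picked up.
    print("sys.path:")
    for entry in sys.path:
        print(f"  {entry}")
```

To be meaningful, this would need to run with the same interpreter and environment that Pulsar uses for jobs, for example after sourcing local_env.sh in the same shell. If galaxy resolves to a path under the venv's site-packages (as in the ImportError above) rather than under /data/galaxy/server/lib, that suggests the PYTHONPATH change is not reaching the job environment.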

This is my local_env.sh configuration:
 
 ## Place local configuration variables used by Pulsar and run.sh in here. For example
 
 ## If using the drmaa queue manager, you will need to set the DRMAA_LIBRARY_PATH variable,
 ## you may also need to update LD_LIBRARY_PATH for underlying library as well.
 #export DRMAA_LIBRARY_PATH=/path/to/libdrmaa.so
 
 
 ## If you wish to use a variety of Galaxy tools that depend on galaxy.eggs being defined,
 export GALAXY_HOME=/data/galaxy
 export PYTHONPATH=${PYTHONPATH}:/data/galaxy/server/lib
 export TEST_GALAXY_LIBS=1

And this is my job_conf.xml config on the side of the Galaxy server:

<job_conf>
    <plugins workers="4">
        <plugin id="local_plugin" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarLegacyJobRunner"/>
    </plugins>
    <destinations default="remote_cluster">
        <destination id="local_destination" runner="local_plugin"/>
        <destination id="remote_cluster" runner="pulsar">
            <param id="url">http://pulsar-server-02:8913/</param>
            <param id="dependency_resolution">remote</param>
        </destination>
    </destinations>
    <tools>
    </tools>
</job_conf>

A workaround I found is to configure the __DATA_FETCH__ tool to run on the local_destination destination. However, I’d like to know whether there is a way to have all tools, including internal ones, run on the Pulsar side, or whether there is a better way entirely to solve this problem.

I also found a GitHub issue that seems to be related to my question ("Galaxy eggs?", issue #134 in galaxyproject/pulsar), but unfortunately the fix from that issue did not work for me.

Any advice or pointers would be greatly appreciated!


Welcome, @Turgon

Thanks for posting all of these details. I’ve asked our Admin experts for advice in their chat. They may reply here or there, and feel free to join the chat on Matrix!

This issue can be marked as solved: I found out that I shouldn’t be using Pulsar in the way I described above, and in the meantime I found a better way to run a Galaxy cluster.
