MUNGE errors submitting jobs to SGE (gridengine) from Galaxy 18.09

admin
drmaa
cluster
galaxy_1809
#1

I have a new install of Galaxy 18.09 and am attempting to run a test job that returns the number of cores on the cluster (the same job we ran at the Galaxy admin training at Penn State in February). The job errors out at submission with MUNGE ERROR: INVALID CREDENTIAL FORMAT.

I have verified that MUNGE and gridengine are configured correctly on the submitting host. As the galaxy user, I can go into the jobs_directory and manually run qsub with the same options Galaxy would have used, and the job submits successfully.

I will place log output and versioning below:

Logged info with errors:

galaxy.jobs.runners DEBUG 2019-02-19 16:00:05,795 [p:17958,w:1,m:0] [DRMAARunner.work_thread-0] (16) command is: rm -rf working; mkdir -p working; cd working; /Dedicated/clingalproddata/database/jobs_directory/000/16/tool_script.sh; return_code=$?; cd '/Dedicated/clingalproddata/database/jobs_directory/000/16';
[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; _galaxy_setup_environment True
python "/Dedicated/clingalproddata/database/jobs_directory/000/16/set_metadata_BVKS9G.py" "/Dedicated/clingalproddata/database/jobs_directory/000/16/registry.xml" "/Dedicated/clingalproddata/database/jobs_directory/000/16/working/galaxy.json" "/Dedicated/clingalproddata/database/jobs_directory/000/16/metadata_in_HistoryDatasetAssociation_16_ZwFjOq,/Dedicated/clingalproddata/database/jobs_directory/000/16/metadata_kwds_HistoryDatasetAssociation_16_Vl4XIt,/Dedicated/clingalproddata/database/jobs_directory/000/16/metadata_out_HistoryDatasetAssociation_16_momZ6M,/Dedicated/clingalproddata/database/jobs_directory/000/16/metadata_results_HistoryDatasetAssociation_16_B5Qnpf,/Dedicated/clingalproddata/database/files/000/dataset_16.dat,/Dedicated/clingalproddata/database/jobs_directory/000/16/metadata_override_HistoryDatasetAssociation_16_Kwuloa" 5242880; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa DEBUG 2019-02-19 16:00:05,865 [p:17958,w:1,m:0] [DRMAARunner.work_thread-0] (16) submitting file /Dedicated/clingalproddata/database/jobs_directory/000/16/galaxy_16.sh
galaxy.jobs.runners.drmaa DEBUG 2019-02-19 16:00:05,865 [p:17958,w:1,m:0] [DRMAARunner.work_thread-0] (16) native specification is: -V -q IIHG -pe smp 2
error: getting configuration: MUNGE authentication failed: Invalid credential format

Manually submitting the Galaxy-created job as the galaxy user:

[svc-clingalprod@clinical-galaxy 16]$ qsub -clear -V -q IIHG -pe smp 2 galaxy_16.sh
Your job 2711251 ("galaxy_16.sh") has been submitted

Versioning Info

[svc-clingalprod@clinical-galaxy 16]$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[svc-clingalprod@clinical-galaxy 16]$ yum list installed | grep gridengine
*Note* Spacewalk repositories are not listed below. You must run this command as root to access Spacewalk repositories.
gridengine.x86_64              8.1.9-2.el7            @loveshack-SGE
gridengine-devel.noarch        8.1.9-2.el7            @loveshack-SGE
[svc-clingalprod@clinical-galaxy 16]$ yum list installed | grep munge
*Note* Spacewalk repositories are not listed below. You must run this command as root to access Spacewalk repositories.
munge.x86_64                   0.5.11-3.el7           @epel-centos-x86_64-7
munge-devel.x86_64             0.5.11-3.el7           @epel-centos-x86_64-7
munge-libs.x86_64              0.5.11-3.el7           @epel-centos-x86_64-7
[svc-clingalprod@clinical-galaxy 16]$ source /Dedicated/clingalaxy/.venv/bin/activate
(.venv) [svc-clingalprod@clinical-galaxy 16]$ pip search drmaa
drmaa (0.7.9)         - a python DRMAA library
#2

At first glance this looks like a drmaa-python problem, similar to https://github.com/pygridtools/drmaa-python/issues/44.
You could install drmaa-python following the instructions in its README and submit a job from Python, as in the opening post of that issue. If that fails, try submitting without the thread pool, or with just a single thread. If single-threaded submission works, we may be able to lock job submission somehow; if it doesn't, we'd need to dig deeper into the drmaa / drmaa-python libraries.
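A minimal sketch of such a comparison, assuming drmaa-python is installed and `DRMAA_LIBRARY_PATH` points at the gridengine libdrmaa. The function names (`submit_one`, `run_single`, `run_pooled`, `main`) are hypothetical, the script path and native specification are copied from the log above, and this is not the script from the linked issue.

```python
from concurrent.futures import ThreadPoolExecutor

# Values taken from the log output above; adjust for your own job.
SCRIPT = "/Dedicated/clingalproddata/database/jobs_directory/000/16/galaxy_16.sh"
NATIVE_SPEC = "-V -q IIHG -pe smp 2"


def submit_one(session, script=SCRIPT, native_spec=NATIVE_SPEC):
    """Submit one job through an already-initialized DRMAA session."""
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = script
        jt.nativeSpecification = native_spec
        return session.runJob(jt)
    finally:
        session.deleteJobTemplate(jt)


def run_single(session, submit=submit_one):
    """Submit from the calling thread only."""
    return submit(session)


def run_pooled(session, n_jobs=4, workers=4, submit=submit_one):
    """Submit n_jobs from a thread pool that shares one session."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(submit, session) for _ in range(n_jobs)]
        return [f.result() for f in futures]


def main():
    # Imported lazily so the helpers above can be exercised without libdrmaa.
    import drmaa

    with drmaa.Session() as session:
        print("single thread:", run_single(session))
        print("thread pool:  ", run_pooled(session))
```

Calling `main()` on the submit host as the galaxy user should show whether the error appears only in the pooled case.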

#3

I agree. I wrote a single-thread and a thread-pool Python script outside of Galaxy to test. I posted the results in the thread "drmaa library threads and munge: Invalid Credential Format". Single-thread submission works.

My skill set falls short of digging into the code and writing a wrapper to lock submission in Galaxy. I thought about spinning up Pulsar and using it to submit the job to the SGE cluster, bypassing the DRMAA job runner. Could that act as a workaround?
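For reference, the locking idea could look roughly like the sketch below. It is a generic Python pattern, not Galaxy's code: `serialized` and `submit_job` are hypothetical names, and whether serializing submissions actually avoids the MUNGE error here is untested.

```python
import functools
import threading

# One process-wide lock shared by every wrapped function.
_submit_lock = threading.Lock()


def serialized(fn):
    """Wrap fn so that only one thread at a time can be inside it."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with _submit_lock:
            return fn(*args, **kwargs)
    return wrapper


@serialized
def submit_job(session, job_template):
    """Hypothetical helper: submit through a drmaa-python Session under the lock."""
    return session.runJob(job_template)
```

Worker threads would call `submit_job(session, jt)` instead of `session.runJob(jt)` directly, so submissions happen one at a time even when they come from a thread pool.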

#4

No need to open a new post for this, but OK, we can continue in the other thread.

drmaa library threads and munge: Invalid Credential Format
#5

Honestly that first post was so all over the place I wanted to delete it and rewrite something a bit more coherent and professional. Apologies.

#6

linked post