Job submission fails in daemon mode

When I run sh run.sh, jobs start as expected using this job_conf.xml:

      <?xml version="1.0"?>
      <!-- A sample job config that explicitly configures job running the way it is
           configured by default (if there is no explicit config). -->
      <job_conf>
          <plugins>
              <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
              <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
          </plugins>
          <destinations default="local">
              <destination id="local" runner="local"/>
              <destination id="big_jobs" runner="drmaa">
                  <param id="nativeSpecification">-q cassava.q -pe snode 35</param>
              </destination>
          </destinations>
          <tools>
             <tool id="maker" destination="big_jobs" />
             <tool id="metaspades" destination="big_jobs" />
          </tools>
      </job_conf>

When I run sh run.sh --daemon, jobs are not submitted to the queue (log below).
Do I need something more in the config?

    [pid: 43975|app: 0|req: 60/66] 127.0.0.1 () {46 vars in 1134 bytes} [Fri Apr 24 17:41:18 2020] GET /api/histories/33b43b4e7093c91f/contents?order=hid&v=dev&q=update_time-ge&q=deleted&q=purged&qv=1970-01-01T00%3A00%3A00.000Z&qv=False&qv=False => generated 4572 bytes in 258 msecs (HTTP/1.1 200) 3 headers in 139 bytes (1 switches on core 3)
    galaxy.jobs.runners.drmaa ERROR 2020-04-24 17:41:19,971 [p:43975,w:1,m:0] [DRMAARunner.work_thread-0] (17) drmaa.Session.runJob() failed unconditionally
    Traceback (most recent call last):
      File "lib/galaxy/jobs/runners/drmaa.py", line 188, in queue_job
        external_job_id = self.ds.run_job(**jt)
      File "/media/vol2/home/galaxy/galaxy/.venv/lib/python3.6/site-packages/pulsar/managers/util/drmaa/__init__.py", line 67, in run_job
        return DrmaaSession.session.runJob(template)
      File "/media/vol2/home/galaxy/galaxy/.venv/lib/python3.6/site-packages/drmaa/session.py", line 314, in runJob
        c(drmaa_run_job, jid, sizeof(jid), jobTemplate)
      File "/media/vol2/home/galaxy/galaxy/.venv/lib/python3.6/site-packages/drmaa/helpers.py", line 302, in c
        return f(*(args + (error_buffer, sizeof(error_buffer))))
      File "/media/vol2/home/galaxy/galaxy/.venv/lib/python3.6/site-packages/drmaa/errors.py", line 151, in error_check
        raise _ERRORS[code - 1](error_string)
    drmaa.errors.DrmCommunicationException: code 2: unable to send message to qmaster using port 6444 on host "node01": got send timeout
    galaxy.jobs.runners.drmaa ERROR 2020-04-24 17:41:20,025 [p:43975,w:1,m:0] [DRMAARunner.work_thread-0] (17) All attempts to submit job failed
    galaxy.tools.error_reports DEBUG 2020-04-24 17:41:20,864 [p:43975,w:1,m:0] [DRMAARunner.work_thread-0] Bug report plugin <galaxy.tools.error_reports.plugins.sentry.SentryPlugin object at 0x7fb824231470> generated response None
    galaxy.tools.error_reports DEBUG 2020-04-24 17:41:20,865 [p:43975,w:1,m:0] [DRMAARunner.work_thread-0] Bug report plugin <galaxy.tools.error_reports.plugins.sentry.SentryPlugin object at 0x7fb824231470> generated response None
    galaxy.tools.error_reports DEBUG 2020-04-24 17:41:20,865 [p:43975,w:1,m:0] [DRMAARunner.work_thread-0] Bug report plugin <galaxy.tools.error_reports.plugins.sentry.SentryPlugin object at 0x7fb824231470> generated response None
    127.0.0.1 - - [24/Apr/2020:17:41:23 +0200] "GET /api/histories/33b43b4e7093c91f/contents?order=hid&v=dev&q=update_time-ge&q=deleted&q=purged&qv=1970-01-01T00%3A00%3A00.000Z&qv=False&qv=False HTTP/1.1" 200 - "http://127.0.0.1:8081/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"

Have you confirmed that the Galaxy worker node can access qmaster using port 6444 on host “node01”?

Yes, node01 can access qmaster, but I don’t know how to check the port. Does the port used by drmaa change when daemon mode is activated?
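
One way to check the port from the Galaxy host is a plain TCP connection test. The snippet below is a minimal sketch, assuming Python 3 is available on that machine; `can_connect` is a hypothetical helper name, and the host and port are the ones from the error message above. A successful connection only shows that the port is reachable. It does not prove that qmaster will accept a submission.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `can_connect("node01", 6444)` as the same user that runs Galaxy, once from a normal shell and once from wherever the daemon is started, would show whether the two contexts see the port differently.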

After modifying job_conf to add the path to the DRMAA C library, the port error disappeared, but I still have the issue in daemon mode.
The job never reaches the step where the destination id is persisted.

Without daemon:

    galaxy.jobs.runners.drmaa DEBUG 2020-04-28 17:07:21,424 [p:32191,w:1,m:0] [DRMAARunner.work_thread-1] (47) submitting file /media/vol2/home/galaxy/galaxy/database/jobs_directory/000/47/galaxy_47.sh
    galaxy.jobs.runners.drmaa DEBUG 2020-04-28 17:07:21,425 [p:32191,w:1,m:0] [DRMAARunner.work_thread-1] (47) native specification is: -q cassava.q -pe snode 10
    galaxy.jobs.runners.drmaa INFO 2020-04-28 17:07:21,440 [p:32191,w:1,m:0] [DRMAARunner.work_thread-1] (47) queued as 1007701
    galaxy.jobs DEBUG 2020-04-28 17:07:21,441 [p:32191,w:1,m:0] [DRMAARunner.work_thread-1] (47) Persisting job destination (destination id: big_jobs)
    galaxy.jobs.runners.drmaa DEBUG 2020-04-28 17:07:22,042 [p:32191,w:1,m:0] [DRMAARunner.monitor_thread] (47/1007701) state change: job is queued and active

With daemon:

    galaxy.jobs.runners.drmaa DEBUG 2020-04-28 15:39:16,138 [p:26779,w:1,m:0] [DRMAARunner.work_thread-1] (45) submitting file /media/vol2/home/galaxy/galaxy/database/jobs_directory/000/45/galaxy_45.sh
    galaxy.jobs.runners.drmaa DEBUG 2020-04-28 15:39:16,138 [p:26779,w:1,m:0] [DRMAARunner.work_thread-1] (45) native specification is: -q cassava.q -pe snode 10

Current job_conf:

    <?xml version="1.0"?>
    <!-- A sample job config that explicitly configures job running the way it is
         configured by default (if there is no explicit config). -->
    <job_conf>
        <plugins>
            <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
            <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
                <param id="drmaa_library_path">/media/vol1/gridengine/lib/linux-x64/libdrmaa.so</param>
            </plugin>
        </plugins>
        <destinations default="local">
            <destination id="local" runner="local"/>
            <destination id="big_jobs" runner="drmaa">
                <param id="nativeSpecification">-q cassava.q -pe snode 10</param>
            </destination>
        </destinations>
        <tools>
           <tool id="maker" destination="big_jobs" />
           <tool id="metaspades" destination="big_jobs" />
        </tools>
    </job_conf>
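
The logs above do not show why submission stalls in daemon mode. One thing that may be worth ruling out, and this is only an assumption, is a difference in the Grid Engine related environment variables between the foreground process and the daemonized one. The sketch below compares them on Linux by reading `/proc/<pid>/environ`. The helper names are hypothetical, and the variable list is a guess at what is relevant: `SGE_ROOT`, `SGE_CELL`, `SGE_QMASTER_PORT` and `SGE_EXECD_PORT` are standard Grid Engine variables, and `DRMAA_LIBRARY_PATH` is read by the Python drmaa package.

```python
# Assumption: these are the variables worth comparing; extend the tuple as needed.
SGE_VARS = ("SGE_ROOT", "SGE_CELL", "SGE_QMASTER_PORT",
            "SGE_EXECD_PORT", "DRMAA_LIBRARY_PATH")


def parse_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            name, _, value = entry.partition(b"=")
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid):
    """Read the environment a running process was started with (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())


def diff_sge_vars(foreground, daemon):
    """Map each variable that differs to a (foreground value, daemon value) pair."""
    return {
        name: (foreground.get(name), daemon.get(name))
        for name in SGE_VARS
        if foreground.get(name) != daemon.get(name)
    }
```

To use it, start Galaxy once in each mode, take the process ID from the `p:` field of the log lines, and call `diff_sge_vars(read_process_environ(fg_pid), read_process_environ(daemon_pid))`. Note that `/proc/<pid>/environ` reflects the environment at process start, so a value set later from inside the process will not show up there. An empty result would suggest the environment is not the cause.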

Was this ever resolved? I’m having the same issue with daemon mode: jobs are never submitted to SGE. When I execute run.sh without --daemon, it works. There are no identifiable errors in the log.