Galaxy CloudMan instance: jobs randomly not dispatching

Hello. I recently launched a Galaxy CloudMan 18.05 instance on an AWS m5a.4xlarge server. Some Galaxy jobs are not dispatching (i.e., they stay gray in the history and never turn yellow). If I resubmit the same job, it often dispatches immediately, but the original hung job never starts running. The stuck jobs seem to occur randomly. Please let me know if anyone has advice on this (e.g., a configuration change that avoids the problem or a way to un-hang a queued job). Thank you.

Geoffrey H. Smith, MD
Emory University

My first thought was that there weren’t enough CPUs available on the cluster, particularly as some tools specify larger requirements than others. That doesn’t sound like the cause, though, given that resubmitting the same job works. Can you check the Galaxy log for any messages about job scheduling? You can reach the Galaxy log from the CloudMan Admin page and sift through it yourself, or post it somewhere and reply here with a link.
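
If the log is long, a small filter can make the sifting easier. The following is only a minimal sketch, assuming you have saved the log to a local file. The keyword list is my guess at what job-scheduling messages might contain, not a list taken from Galaxy's logging code, and `job_lines` is a hypothetical helper name, so adjust both to what you actually see in the log.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumed keywords for job-scheduling messages; these are guesses,
# not names confirmed against Galaxy's actual log output.
JOB_PATTERN = re.compile(r"job|handler|runner|queue|dispatch", re.IGNORECASE)


def job_lines(lines: Iterable[str], pattern: "re.Pattern[str]" = JOB_PATTERN) -> Iterator[str]:
    """Yield only the lines that match the pattern, without trailing newlines."""
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the user; no default location is assumed.
    with open(sys.argv[1], errors="replace") as fh:
        for matched in job_lines(fh):
            print(matched)
```

Saved under a name of your choosing, the script takes the path of the saved log as its only argument and prints the matching lines. Searching for the ID of one of the stuck jobs in that output may narrow things down further.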

Thank you. I restarted the AWS virtual machine this morning. So far, all jobs submitted since the restart have dispatched, and I have been unable to reproduce the problem. If it recurs today, I will post the Galaxy log from the CloudMan Admin page.
