Debugging a stuck workflow

Peter_van_Heusden · April 11, 2021, 3:34pm

Hi there

Our Galaxy server at SANBI is running Galaxy 20.01 with 5 mules.

I have a workflow invocation that appears to be stuck even though it is reported as running. The workflow takes two inputs, a dataset and a list of pairs, and submits its jobs to our cluster here at SANBI. The workflow invocation screen and part of the history look like this:

This suggests that some running jobs are still waiting to complete. However, zooming in on one of the final collections shows jobs that are waiting to run:

Digging through the details of the workflow execution shows exactly 8 BWA-MEM jobs in the “new” state. Here is a snapshot of one from the job view in the Admin panel:

Somehow the scheduler seems to have stopped scheduling these jobs. Any tips on how to find out why, and on how to get the scheduler to pick them up again?

Thanks,
Peter
P.S. There are no jobs in error, and the upstream datasets for these jobs completed fine.
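
For anyone wanting to repeat the check described above (finding jobs left in the “new” state), here is a minimal sketch in Python. It assumes you already have the job records as a list of dictionaries with `tool_id` and `state` keys, for example exported from the Admin job view or fetched through the Galaxy API. The function name `summarize_unscheduled`, the short tool ids and the sample records are all made up for illustration, not taken from the server in question.

```python
from collections import Counter


def summarize_unscheduled(jobs, state="new"):
    """Count the jobs that are in `state`, grouped by tool id.

    `jobs` is assumed to be a list of dicts with at least the keys
    "tool_id" and "state" (hypothetical input format for this sketch).
    """
    return Counter(job["tool_id"] for job in jobs if job.get("state") == state)


# Hypothetical sample records, not real data from the stuck invocation.
sample_jobs = [
    {"id": "a1", "tool_id": "bwa_mem", "state": "new"},
    {"id": "a2", "tool_id": "bwa_mem", "state": "new"},
    {"id": "a3", "tool_id": "bwa_mem", "state": "ok"},
    {"id": "a4", "tool_id": "fastp", "state": "ok"},
]

stuck = summarize_unscheduled(sample_jobs)
for tool_id, count in stuck.items():
    print(f"{tool_id}: {count} job(s) still in the 'new' state")
```

A non-empty result while the upstream datasets are already finished would match the symptom in this post: jobs that were created but never picked up for scheduling.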

Peter_van_Heusden · April 11, 2021, 5:40pm

Restarting the Galaxy server got the jobs re-scheduled and completed.
