Jobs Stuck processing with a workflow: Using the job cache to rerun workflows with canceled outputs

Hi,

I’ve started a workflow with 42 samples. They were all scheduled for RNA Star, and 5 hours later 39 samples had completed successfully. However, now, 19 hours after the samples went into Star, 2 are still processing. Is this normal? What should I do in this case without messing up the data already produced and the workflow?

Is this still an issue? We no longer see any Star jobs running, so maybe the two remaining jobs have completed by now.

In general, it can sometimes happen that individual jobs end up on compute that’s experiencing degraded performance for unrelated reasons.

The best way to report really long-running jobs is via an email to Galaxy Europe, in which you also state your username or include a link to the details page of the long-running job.

Most of the time, just waiting patiently will be your best option, but we can then let you know if rerunning is better in the specific case.

Thank you for your advice. I ended up cancelling the workflow and ran those two samples manually. It would be great if jobs had a “push” or “re-queue” option that pops up, e.g. after 12 or 24 hours, since it’s particularly elaborate in Galaxy to manipulate collections (delete, re-add entries), if not impossible.

Hi @M_M

Instead of re-queuing individual jobs, there is a better way! (A per-job re-queue option would be misused on the public server, leading to havoc for our admins! Not even intentionally, just from not understanding why it would end up sorting the queues “too much”.)

:face_with_monocle: Use case: A few jobs failed or were canceled. In any case, they can’t be resumed/replaced for some reason.

:blush: Solution: Rerun the entire workflow, with the original inputs, and toggle the Re-use jobs with identical parameters option.

Tool tip: Enabling this option will use cached jobs for the workflow invocation. This is useful if you want to reuse the results of a previous job for the same input data.

Translation: As long as the results of that exact same job still exist, and you originally ran the job yourself, it is in your cache somewhere. (“Exact” means everything: inputs, parameters, and tool version. Also, don’t delete all copies of the outputs yet!) This option pulls in the prior successful run instead of queuing a job from scratch. Any jobs that didn’t complete have no valid, successful cached result available, so those get newly queued as usual. In practice, this makes reruns quick, with no one-off collection manipulations, since only the jobs that didn’t work the first time are re-queued. There is a rerun button at the top of workflow invocation reports that most people would use for this, so it takes just a few clicks, plus this extra toggle, to get it going.
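To make the matching rule concrete, here is a small conceptual sketch in Python. It is not Galaxy's actual implementation: the class names, the state labels, and the example tool ID, version, and parameter are all made up for illustration. It only shows the idea described above: a prior job is reused when the user, tool, tool version, inputs, and parameters all match, the job finished successfully, and its outputs still exist. Everything else is queued fresh.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class JobKey:
    """Everything that has to match exactly for a prior result to be reused."""
    user: str
    tool_id: str
    tool_version: str
    inputs: Tuple[str, ...]
    parameters: Tuple[Tuple[str, str], ...]


@dataclass
class JobRecord:
    key: JobKey
    state: str                  # illustrative labels: "ok", "error", "canceled"
    outputs_exist: bool = True  # False once every copy of the outputs is deleted


def find_cached(key: JobKey, previous: List[JobRecord]) -> Optional[JobRecord]:
    """Return a prior successful job with an identical key, if there is one."""
    for job in previous:
        if job.key == key and job.state == "ok" and job.outputs_exist:
            return job
    return None


def plan_rerun(requested: List[JobKey], previous: List[JobRecord]):
    """Split requested jobs into those served from cache and those queued anew."""
    reused, queued = [], []
    for key in requested:
        if find_cached(key, previous) is not None:
            reused.append(key)
        else:
            queued.append(key)
    return reused, queued


def star_key(sample: str) -> JobKey:
    # Hypothetical user, tool ID, version, and parameter, for illustration only.
    return JobKey("example_user", "rna_star", "1.0", (sample,),
                  (("genome", "example_genome"),))


# First run: three samples finished, two were canceled.
samples = [f"sample_{i}" for i in range(1, 6)]
first_run = [JobRecord(star_key(s), "ok") for s in samples[:3]]
first_run += [JobRecord(star_key(s), "canceled") for s in samples[3:]]

# Rerun of the whole workflow with the same inputs and the reuse option on.
reused, queued = plan_rerun([star_key(s) for s in samples], first_run)
print("reused from cache:", [k.inputs[0] for k in reused])
print("newly queued:", [k.inputs[0] for k in queued])
```

In this toy run, only the two canceled samples end up in the queue, and the collection itself never has to be edited by hand. Changing any part of the key (for example, the tool version) would make the lookup miss and queue that job again.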

For other use cases, if you have a question about a collection manipulation, please ask! The combinations should cover all or most use cases, but if you think we missed something, we can get it ticketed for review. Or, you are welcome to share a use-case and open a ticket at the main repository. We rely on this kind of real-analysis feedback and you’ll see many more issues there, both resolved and pending, just like this!

Hope this helps, and glad you got the work finished!