Hi @Abhisek_Dey
Thanks for sharing more details!
Situation
The link you posted this time did include job errors, and I noticed that the server administrators now have a banner message posted. Have you seen it yet? I think it explains your situation.
Sporadic errors
We currently have a technical problem leading to sporadic errors occurring on random jobs, with a typical “Unable to finish job” error. We are aware of this problem, and we do our best to find a solution to this tricky problem. If you encounter this problem, the workaround is to rerun the job with the same parameters.
What to do
I would suggest trying workflow reruns, not just job reruns (although you can try that too!). Here is how to do that at any Galaxy server → Rerunning only failed jobs in a workflow: Replace and Resume functions
The problem is likely with the cluster configuration, and some jobs are failing “by chance” for unclear reasons. Some random failures always happen at a low level (these clusters process thousands, sometimes tens of thousands, of jobs a day), but the failure rate is higher right now for a reason only those administrators can examine. They will likely post a notice once this is corrected, either as another server message or possibly at their forum.
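If you ever submit jobs from a script instead of the web interface, the "rerun with the same parameters" workaround could be sketched roughly as below. This is only an illustration under assumptions: `rerun_until_ok` and the `submit` callable are hypothetical names, not part of Galaxy or any client library, and the `"ok"` state string is simply assumed to mark a finished job.

```python
import time
from typing import Any, Callable, Dict


def rerun_until_ok(
    submit: Callable[[Dict[str, Any]], str],
    params: Dict[str, Any],
    max_attempts: int = 3,
    wait_seconds: float = 0.0,
) -> int:
    """Call submit(params) until it reports "ok"; return the number of attempts.

    `submit` is a hypothetical stand-in for whatever actually launches the job
    and waits for its final state. It must return that state as a string.
    """
    for attempt in range(1, max_attempts + 1):
        # Pass a copy so every attempt sees exactly the same parameters.
        state = submit(dict(params))
        if state == "ok":
            return attempt
        if attempt < max_attempts:
            # Optional pause before trying again.
            time.sleep(wait_seconds)
    raise RuntimeError(f"job still failing after {max_attempts} attempts")
```

Capping the attempts matters here: if a job fails every time, the cause is probably not the sporadic cluster problem, and it is worth reporting to the administrators instead of retrying forever.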
This might be linked to parameter choices (since those can influence which cluster nodes jobs are sorted to) or it could be truly random! You could share your example and observations with them. Maybe it can help to sort out the underlying issue?
Frustrating, I know, but I still hope this helps!
