Kraken2 and SPAdes not running

Hi @zschong and @Jon_Colman

UseGalaxy.eu is very busy right now. I see a lot of Kraken2 jobs and related tools running. You should both leave those jobs queued; otherwise, they may never get a chance to process.

If I go to the EU server homepage and click into the statistics (how to), I see three categories:

  • Processing work

  • Scheduled work (data is ready, waiting for a cluster node to free up)

  • Waiting to be scheduled (data isn’t ready yet, maybe still processing in an upstream tool)

From this data, in very general terms, I’d say that there is quite a bit of metagenomics work going on right now: maybe a training (so, smaller jobs that run quickly), along with a lot of workflows (real data, which usually run longer), plus people using the tools directly.

The server will balance that load and process the work fairly. The rules are a bit complicated, and cluster node allocation on the server can be dynamic… but the process is “fair” and as balanced as possible, meaning everyone has an equal chance to move up in that queue line.

So, with all that context in mind:

  • Every time a job is deleted and rerun, that new job goes back to the very end of that queue line.

  • If you delete often enough, those jobs may never have a chance to move up in the queue far enough along to get to the processing stage.

  • You can run other jobs even if some are queued in your account.

  • The best advice is to get work queued, then let it process. It is the only way to get real work done under this kind of resource competition at a public cluster. Often the same is true in private cluster situations like a university server, but those might also have “priority users”. At public servers, all public users have equal priority.

  • Using collections (folders of similar data) and/or workflows (tools strung together to process the inputs in a sequence) can help. Yes, it keeps the history organized, but most importantly all the jobs submitted are scheduled together – that means they are put into one of these “waiting lines” all at once, right at the start. With the new workflow graphics, you can even go to your workflow reports and watch that happening.

  • The alternative, clicking tool-by-tool and dataset-by-dataset, will also queue your work, but only click-by-click. That is much more tedious, and your data is not “in line” for the next tool until you manually start that job, when it could have been in line the whole time if a workflow had been used from the start. Running real data this way will always take much longer to complete because of all the time gaps between data becoming “ready” from an upstream job and the next job actually getting into the queue.
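To make the “back of the line” point concrete, here is a toy model in Python. It assumes a simple first-in, first-out line where exactly one job starts per time step while other users keep submitting. The real scheduler at UseGalaxy.eu is fairer and more complicated than this, and `simulate` is a hypothetical helper for illustration only, not anything Galaxy provides.

```python
from collections import deque
from typing import Optional


def simulate(ahead: int, arrivals_per_step: int,
             resubmit_every: Optional[int] = None,
             max_steps: int = 200) -> Optional[int]:
    """Toy FIFO queue (an assumption, not Galaxy's real scheduler).

    Returns the step at which our job starts processing,
    or None if it never starts within max_steps.
    """
    queue = deque(f"other{i}" for i in range(ahead))
    queue.append("mine")
    next_id = ahead
    for step in range(1, max_steps + 1):
        # A cluster node frees up: the job at the front starts processing.
        if queue.popleft() == "mine":
            return step
        # Other users keep submitting work behind us.
        for _ in range(arrivals_per_step):
            queue.append(f"other{next_id}")
            next_id += 1
        # Hypothetical impatient user: delete and rerun, which puts
        # the job back at the very end of the line.
        if resubmit_every and step % resubmit_every == 0:
            queue.remove("mine")
            queue.append("mine")
    return None


print(simulate(5, 1))                    # wait patiently
print(simulate(5, 1, resubmit_every=3))  # delete and rerun every 3 steps
```

With 5 jobs ahead and one new arrival per step, waiting gets the job started at step 6, while deleting and rerunning every 3 steps returns `None`: the job keeps getting reset to the back before it can ever reach the front.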

Since I can only check accounts at the server where I am an administrator, let’s ask the EU administrators for some more feedback. There does seem to be a heavy load on these metagenomics tools in particular. Hi @wm75, is this all expected right now? Thanks!