Job still running for more than 15 hours

Hi @Monish_V

Thanks for sharing the history! Very helpful.

I don’t see anything special about the mapping run here. It looks like the mapping step never actually started: the job was canceled while it was still queued, so it hadn’t been assigned to a cluster node yet. Is this the example you meant to share? Could you share a history with the errors from a job that failed during processing? Or have you canceled all of the prior jobs?

The best way to get information about a strange job is to allow it to finish processing. A mapping job could sit in the queue for a day or so before it finally starts processing. After 15 hours, it may still be queued (as the job above appears to have been) or it may be just starting the actual mapping.

This doesn’t need to delay your other analysis. Meaning, you can keep going and start new work on other data at the same time. That can be in the same history or a different history – either is fine! The server knows how to keep track of tens of thousands of your jobs all at the same time.

I’m wondering if explaining how the clusters work would help.

The color of a dataset can give some clues about which processing stage a job is in. The topic below has some short help about computational resources at the public servers.


You’ll find many more explanations in topics tagged with queued-gray-datasets. Some explain how to investigate server performance, for example → How to see the UseGalaxy.eu job queue statistics

Most public servers work about the same way! The best advice is to get your jobs into the queue, then allow them to process completely. If an odd error comes up later, including a resource issue, you can share the example and we’ll be able to offer advice. This can include reaching out to cluster administrators to learn whether resources can be adjusted, but also helping you organize your data or parameters a bit differently.

As a test, I started a history here that I’ll let run over the weekend. It uses the same accession you were working with. I pulled in the SRR11296739 sample GSM4408849: MSI-H, likely Lynch due MLH1 germline sample 7 tumor tiss... - SRA - NCBI from NCBI the same way you did to start with. Next, I ran a simple generic QA workflow in Galaxy on the raw reads to see what happens.

Next, I’ll try to map the reads using HISAT2 (defaults) against the hg38 native index.
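For anyone curious what those same steps look like outside of Galaxy, here is a rough command-line sketch. This is only an illustration, not what the Galaxy jobs literally run: the tool versions, thread counts, and the index path `grch38/genome` are assumptions you would adjust for your own setup.

```shell
# Fetch the raw reads for the SRA run (sra-tools)
prefetch SRR11296739
fasterq-dump SRR11296739 --split-files --threads 4

# Basic QC on the raw reads (FastQC)
fastqc SRR11296739_1.fastq SRR11296739_2.fastq

# Map with HISAT2 at default settings against a prebuilt hg38 index
# (index path is an assumed placeholder; point -x at your local copy)
hisat2 -p 4 -x grch38/genome \
  -1 SRR11296739_1.fastq -2 SRR11296739_2.fastq \
  -S SRR11296739.hg38.sam
```

In Galaxy the same tools run behind the tool forms, and the server decides which cluster node executes them, which is where the queueing discussed above comes in.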

This sample isn’t overly large, but as @Gabriel explained, it will route to a larger and busier cluster node than you may have experienced before (during a training session, or simply when the server was less busy!). My job will probably have the same automatic rerun invoked, unless the additional QA helps the reads map more cleanly. Either way, that’s ok – let’s let it run to completion.

Hope this helps to keep things going, and after the weekend we’ll have some more data to look at. You are still welcome to share back any job that failed on its own (not canceled by you) for a closer look.

Thanks! :slight_smile: