Job still running for more than 15 hours

I’ve been running HISAT2 on the job with the API ID given below.
Kindly provide support.


11ac94870d0bb33a951ae66565734a8e

Hello!

The logs show that the job went over the cgroup memory limit. The job was automatically resubmitted with double the memory and, in total (including the resubmission), it ran for 8 hours. This means the dataset is bigger than usual and will take more time to run as well.

Could you please share the history? We can then provide you with better analysis :slight_smile:

Thank you,

Gabriel

Hello!
Thank you for the reply.

Could you please direct me to a source that teaches how to do that? Or could you show me how to do it?

Monish

Hello,

You can follow the instructions here: FAQ: Sharing your History

Gabriel

Hi Galaxy team,

I’m running HISAT2 on Galaxy for an RNA-seq splicing project comparing MSI vs MSS samples.

I’m seeing a consistent issue where MSI FASTQ files larger than ~2GB run indefinitely (15+ hours and still “running”), while:

  • MSS samples finish in ~45–60 minutes

  • Smaller MSI FASTQs also complete normally

All jobs use the same workflow, reference genome, and default HISAT2 parameters.

The affected jobs don’t fail — they just stay in the running state. FASTQs appear normal, and read counts aren’t dramatically different.
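Since read counts came up, here is a minimal shell sketch for sanity-checking a FASTQ read count locally. The file below is tiny demo data standing in for a real (decompressed) FASTQ, so the snippet is self-contained:

```shell
# Create a tiny two-read demo FASTQ so this sketch is self-contained.
# Replace "demo.fastq" with your real file (pipe through `zcat` if gzipped).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGC\n+\nIIII\n' > demo.fastq

# Each FASTQ record is exactly 4 lines, so reads = lines / 4.
reads=$(( $(wc -l < demo.fastq) / 4 ))
echo "$reads reads"   # prints "2 reads"
```

Comparing this number between the MSI and MSS inputs shows whether the size difference comes from read count or from read length.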

As advised by one of the admins earlier, I also tried resubmitting the job with higher requested memory, but unfortunately that didn’t change anything — the MSI jobs still keep running without completing.

I’m wondering if this could be related to:

  • Galaxy-side limits (walltime / I/O),

  • HISAT2 behavior with higher mismatch/indel rates in MSI,

  • or something specific to larger input files.

Has anyone encountered similar behavior with HISAT2 on Galaxy?

Are there recommended parameter tweaks, preprocessing steps, or alternative aligners (e.g., STAR) that might help in this situation?
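One preprocessing test (an assumption on my part, not an official recommendation) is to downsample the large MSI FASTQ and rerun the alignment: if a half-size subset completes normally, sheer input size is the likely trigger rather than MSI-specific read content. Because FASTQ records are exactly 4 lines, keeping the first N reads is a single `head` call. A self-contained sketch with demo data standing in for the real file:

```shell
# Demo FASTQ with three reads, standing in for the real large MSI file.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGC\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' > big.fastq

# Keep only the first n_reads records (4 lines per FASTQ record).
n_reads=2
head -n $(( n_reads * 4 )) big.fastq > subset.fastq

echo "$(( $(wc -l < subset.fastq) / 4 )) reads kept"   # prints "2 reads kept"
```

Note that `head` takes the first reads rather than a random sample; for a random subsample, a tool such as `seqtk sample` (with a fixed seed so paired files stay in sync) is the usual choice.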

https://usegalaxy.eu/u/monish123/h/rna-analysis

I’ve also made the history accessible; any help would be appreciated.

Thanks!

Hi @Monish_V

Thanks for sharing the history! Very helpful.

I don’t see anything special about the mapping run here. It seems the job never started the mapping step at all; instead it was canceled before it finished queueing, so it was never even assigned to a cluster node. Is this the example you meant to share? Would you like to share the history with the errors from a job that failed during processing? Or did you cancel all of the prior jobs?

The best way to get information about a strange job is to allow it to finish processing. A mapping job can queue for a day or so and then finally start processing. After 15 hours it may still be queued (as the job above appears to have been), or it may be just starting the actual mapping step.

This doesn’t need to delay your other analysis. Meaning, you can keep going and start new work on other data at the same time. That can be in the same history or a different history – either is fine! The server knows how to keep track of tens of thousands of your jobs all at the same time.

I’m wondering if explaining how the clusters work would help?

The color of a dataset gives some clues about which processing stage it is in. The topic below has some short help about computational resources at the public servers.


You’ll find many more explanations in topics tagged queued-gray-datasets. Some explain how to investigate server performance, for example → How to see the UseGalaxy.eu job queue statistics

Most public servers work about the same way! The best advice is to get your jobs into the queue, then allow them to process completely. If an odd error comes up later, including a resource issue, you can share the example and we’ll be able to offer advice. That can include reaching out to cluster administrators to learn whether resources can be adjusted, and also helping you to organize your data or parameters a bit differently.

As a test, I started a history here that I’ll let run over the weekend. It uses the same accession you were using. I pulled in the SRR11296739 sample (GSM4408849: MSI-H, likely Lynch due MLH1 germline sample 7 tumor tiss... - SRA - NCBI) from NCBI the same way you did to start with. Next, I ran a simple generic QA workflow in Galaxy on the raw reads to see what happens.

Next, I’ll try to map the reads using HISAT2 (defaults) against the hg38 native index.
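For readers following along, a default paired-end HISAT2 run corresponds roughly to the command sketched below. Galaxy assembles the real command itself, and the index and file names here are placeholders, so the sketch only prints the command rather than executing it:

```shell
# Roughly what a default paired-end HISAT2 job boils down to.
# Index and file names are placeholders; this only prints the command.
cmd="hisat2 -p 4 -x hg38 -1 reads_1.fastq -2 reads_2.fastq -S aligned.sam"
echo "$cmd"
```

The output SAM is then typically sorted and converted to BAM within Galaxy before downstream tools see it.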

This sample isn’t overly large, but as @Gabriel explained, it will route to a larger and busier cluster node than you may have experienced before (during a training session, or simply when the server was less busy!). My job will probably trigger the same automatic rerun unless the additional QA helps the reads map more cleanly. Either way, that’s OK; let’s let it run to completion.

Hope this helps to keep things going, and after the weekend we’ll have some more data to look at. You are still welcome to share back any job that failed on its own (not canceled by you) for a closer look.

Thanks! :slight_smile: