How jobs queue and execute

I am facing the same issue with RNA STAR. I tried rerunning the job multiple times and spent 6 days doing so; however, 1 or 2 samples in the batch get stuck every time.

Welcome, @Vikrant577

This is the part that is likely the root issue

Why? Every time you restart the job, it goes back to the very end of the server queue. If you do this often enough, the job may never get a chance to run!
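To make the queue effect concrete, here is a minimal toy sketch. It assumes a strict first-in, first-out queue that starts one job per tick while one new job from another user arrives per tick. The function `ticks_until_start` and its parameters are hypothetical illustrations, not Galaxy code, and real schedulers on public servers are more complex than this.

```python
from collections import deque
from itertools import count


def ticks_until_start(jobs_ahead, resubmit_every=None, max_ticks=100):
    """Return the tick at which "mine" starts, or None if it never starts.

    Toy model (an assumption, not the real scheduler): one job starts per
    tick, and one job from another user joins the back of the queue per tick.
    """
    queue = deque(f"other-{i}" for i in range(jobs_ahead))
    queue.append("mine")
    newcomer = count(jobs_ahead)

    for tick in range(1, max_ticks + 1):
        # Restarting the job removes it and re-queues it at the very end.
        if resubmit_every and tick % resubmit_every == 0:
            queue.remove("mine")
            queue.append("mine")

        # The server starts whichever job is at the front of the queue.
        started = queue.popleft()
        if started == "mine":
            return tick

        # Meanwhile, another user's job joins the back of the queue.
        queue.append(f"other-{next(newcomer)}")

    return None


patient = ticks_until_start(jobs_ahead=5)
impatient = ticks_until_start(jobs_ahead=5, resubmit_every=3)
print(patient, impatient)
```

In this model, the job that is left alone starts once the five jobs ahead of it have started, while the job restarted every three ticks never reaches the front within the simulated window.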

When you next queue your jobs, consider checking the box to set an email notification. That way you’ll know when they are done.

Then, longer term, when you move on to using workflows, those will queue all of your jobs at once. This allows data to flow through tools without manual interactions. Plus, these also have notification settings.

More about how jobs at public clusters work

Hope this helps! :slight_smile:

I am experiencing an issue with batch RNA STAR jobs. After waiting 2 days for a job to finish, it was stopped with the error “time out.” I have noticed that when running batch RNA STAR jobs, the process works smoothly for almost all samples but gets stuck on the last one or two samples.

When I run each sample individually using RNA STAR, I do not encounter this issue. However, running each sample separately is extremely time-consuming, especially since the samples are already organized in a paired-end collection. This makes it challenging to create an efficient workflow.

Hi @Vikrant577

I’m not sure what this means or where it is presenting. Can you explain a bit more about it?

For example, I have 10 paired-end RNA-seq FASTQ files in a collection named X. When I provide collection X as input to RNA STAR along with the reference genome, 8 files in the collection are analyzed smoothly. However, the last 2 files remain in the running state (orange color) for 2 days, and eventually the job gets killed due to a timeout.

In contrast, when I input each paired-end file individually into RNA STAR, the job runs smoothly and completes within approximately 30 minutes for each of the 10 paired-end files (about 5 hours for all).

How can you tell these were timed out? This is the part I am confused about (sorry!). Are you getting a message in the logs? Could you share that? Or, if you are running with a workflow, is that showing up on the workflow invocation report?

Technically, each job in a batch run over a collection is its own single job. It shouldn’t make any difference to the processing whether those are run in a batch or not. The same goes for all individual workflow jobs. The “net” around the batch of jobs is just a way of grouping them.
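As a rough illustration of that point, the sketch below models a batch as nothing more than a list of independent jobs, one per collection element, each with its own state. The `Job` class, `map_over_collection`, and the sample names are hypothetical and only mirror the 8 finished / 2 running situation described above; this is not Galaxy's internal data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    element: str            # collection element this job was created for
    state: str = "queued"   # each job tracks its own state


def map_over_collection(elements):
    """Create one independent job per collection element (toy model)."""
    return [Job(element=e) for e in elements]


# Hypothetical collection of 10 paired-end samples.
jobs = map_over_collection([f"sample_{i}" for i in range(1, 11)])

# Mirror the situation described in the thread: 8 finished, 2 still running.
for job in jobs[:8]:
    job.state = "ok"
for job in jobs[8:]:
    job.state = "running"

# The "batch" is only a grouping, so each job can be inspected on its own.
summary = Counter(job.state for job in jobs)
unfinished = [job.element for job in jobs if job.state != "ok"]
print(dict(summary))
print(unfinished)
```

Looking at the unfinished elements one at a time (their own logs and job details) is usually more informative than treating the whole batch as a single failed run.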

Also, I forgot which server you are working on, so please clarify that as well, since it will make a difference for some error messages.