Workflow troubleshooting after the 24.2 release

Hi,

We are suddenly experiencing two errors in a workflow that used to run fine. We are getting the same error in fastp as well as in Medaka, on two different accounts. Is there an explanation for these errors? Both accounts are currently at approximately 50-60% of their storage quota. We’ve used our workflow for a few months now without any errors.




Hi @mmb_zorgregio

That error is produced by the tool and the cluster the tool was running on, and it is unrelated to how much storage space your account is using.

We have a few topics that explain it in more detail; this is one of them.

If you can share some details about the server you were working on, along with the other run details, we can try to help you work out what is going wrong. If you have used this workflow before, there is likely something particular about the new data. Maybe it doesn’t fit the existing parameters, or it has a format problem?
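If you want to rule out a format problem before sharing anything, here is a minimal sketch for a quick local check (assuming Python 3 and a local copy of one of the gzipped fastq files; the file name below is just a placeholder):

```python
import gzip

# Hypothetical file name; substitute one of your own fastq.gz files.
path = "barcode01.fastq.gz"

records = 0
with gzip.open(path, "rt") as handle:
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed record around record {records + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in record {records + 1}")
        records += 1

print(f"{records} records checked; basic fastq structure looks OK")
```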

Instructions for generating the share link are in the forum banner, and also here → How to get faster help with your question. I’ll help trace this all the way back to the root cause, so if something about the release itself is affecting this, we will learn about that too!

Thanks! :slight_smile:

Hi,

Thanks for your reply. The data input hasn’t changed (as far as I know). It’s data generated with the Nanopore MinION (rapid barcoding). The data was generated during the last couple of months and used to work fine until this Wednesday. I’ve tried to run fastq files generated in different runs on different days, and none of them works.

The errors (seen in multiple samples, from multiple runs):

Worked fine (same run as the other two samples below).

One of the histories failing at the fastp step

History failing at the Porechop step (the data is from the same batch as the history above)

Different account, different run, with an error at the Medaka step.


Hi @mmb_zorgregio

These jobs are actually running out of memory during processing. The sample that did process correctly has under half the data, and slightly shorter reads.

We are reviewing whether the tools can be allocated more resources. This will not be immediate, but we might have some feedback later next week. We’ll follow up here.

For now, to get this data processed, you can try the UseGalaxy.eu server, since all of these tools have a bit more runtime memory allocated there by default.

This guide explains how to move data between servers and how to manage the account quota at each. This error is unrelated to quota, but if you are running batches of work, quota probably matters to you for other reasons. :slight_smile:

Thanks! More follow-up next week! Please feel free to ping me for an update.

Hi,

Thank you for your explanation. It would be nice if more resources could be allocated to these specific tools. However, I doubt that will solve the problem. Below you can find a run where the data input is larger than in some of the workflows that now give an error. I just checked some more old assemblies, and we do have more with >1.5 Gb that worked fine. In addition, I tried to run the workflow on a relatively small dataset and it also failed. So I’m not sure the size of the datasets is the issue.

Hi @mmb_zorgregio, having had a quick look at your workflow, it looks as if we are doing similar work with Nanopore sequences, and I have also been having similar issues with fastp. The way I got around it is by collapsing (or concatenating) the collection after fastp rather than before. It does mess up your HTML report, as you get one report for each fastq file.

Hi @mmb_zorgregio

Thanks for sharing the history (the second link just points to a general, default history listing).

Read length is my next guess for what is scaling up the memory usage so much. The longest read here is around 80k, and in the failed runs the longest reads are over 100k. The reads that processed successfully in your other example were shorter, too.

Other than that, there may be something about the read content, so yes, I agree that the “size on disk” of the data file is not necessarily the root issue. The fastp manual might have more detail about how memory is used for some of your data use cases.
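If you want to compare the inputs yourself, a rough way to see whether read length or total bases differ between a run that worked and one that failed is a sketch like the one below (assuming Python 3 and local copies of the gzipped fastq files; the file names are placeholders only):

```python
import gzip

def read_stats(path):
    """Return (number of reads, total bases, longest read) for a gzipped fastq file."""
    n_reads = 0
    total_bases = 0
    longest = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of each 4-line fastq record
                length = len(line.rstrip("\n"))
                n_reads += 1
                total_bases += length
                longest = max(longest, length)
    return n_reads, total_bases, longest

# Hypothetical file names; substitute a working and a failing input.
for label, path in [("worked", "run_ok.fastq.gz"), ("failed", "run_failed.fastq.gz")]:
    reads, bases, longest = read_stats(path)
    print(f"{label}: {reads} reads, {bases} bases, longest read {longest} bp")
```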

I was trying to load the prior histories you shared above, but they seem to have changed since I looked at them originally. Did you rerun some of the data in those? Do you still have an example of the problem?

And @jessemartin, do you want to share an example? Your issue seems to be occurring after fastp. With which tool? Medaka? There are a few ways to run that, so maybe we can come up with a solution. Examples would help.

Thanks!

Hi @jennaj

This is an example of the issues I was having. This history shows three separate workflow invocations using the same fastq files. In the first, all 112 fastq files are passed to fastp individually before the filtered fastq files are concatenated for assembly etc. The second invocation is the same, just with some sleight of hand to make the sample report appear correctly. In the third, the 112 fastq files are concatenated first and then filtered with fastp, which then gives the error.

(Note: my standard workflow is to concatenate first and then filter, and this usually works; it’s only about once or twice in every 50 samples that I’m forced to use this workaround.)
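For anyone trying the same trick outside Galaxy, the command-line equivalent of this workaround is roughly the sketch below: run fastp on each barcode file separately, then concatenate the filtered outputs. This is only an illustration of the ordering (assuming Python 3, fastp on the PATH, and placeholder file names); in Galaxy itself this is done with the collection and concatenation tools.

```python
import glob
import shutil
import subprocess

# Hypothetical paths; substitute your own per-barcode fastq.gz files.
inputs = sorted(glob.glob("barcode01/*.fastq.gz"))
filtered = []

# Run fastp on each file separately (one HTML/JSON report per input file,
# which is the report caveat mentioned above).
for i, fq in enumerate(inputs):
    out = f"filtered_{i:03d}.fastq.gz"
    subprocess.run(
        ["fastp", "-i", fq, "-o", out,
         "--json", f"report_{i:03d}.json", "--html", f"report_{i:03d}.html"],
        check=True,
    )
    filtered.append(out)

# Concatenate the filtered outputs; concatenated gzip members still form a valid gzip file.
with open("all_filtered.fastq.gz", "wb") as combined:
    for out in filtered:
        with open(out, "rb") as part:
            shutil.copyfileobj(part, combined)
```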

Hope this helps, or at least doesn’t confuse the issue further.

Jesse


Hi @jessemartin

Thanks for sharing the history! Very helpful.

I sent a bug report to the EU administrators. Maybe they can get more details about exactly where the error is coming from, i.e. whether it is from fastp itself or from something else.

Offhand, it looks like a memory error, which is a bit harder to run into at the EU server due to the way the clusters are allocated. There is some predictive estimation of the “job size” that determines the target cluster. Maybe that can be adjusted for use cases like yours.

They’ll get back to us here. :slight_smile: