Workflow troubleshooting after the 24.2 release

Hi,

We are suddenly experiencing two errors in a workflow that used to run fine. We are getting the same error in fastp as well as in Medaka, on two different accounts. Is there an explanation for these errors? Both accounts are currently at approximately 50-60% of their storage quota. We’ve used our workflow for a few months now without any errors.




Hi @mmb_zorgregio

That error is produced by the tool and the cluster the tool was running on, and it is unrelated to how much storage space your account is using.

We have a few topics that explain it in more detail; this is one of them.

If you can share some details about the server you were working on, along with the other run details, we can try to help you work out what is going wrong. If you have used this workflow before, there is likely something particular about the new data. Maybe it doesn’t fit the existing parameters, or it has a format problem?
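If you want to rule out a format problem before sharing anything, here is a minimal sketch for a quick local check (assuming Python 3 and a local copy of one of the gzipped fastq files; the file name below is just a placeholder):

```python
import gzip

# Hypothetical file name; substitute one of your own fastq.gz files.
path = "barcode01.fastq.gz"

records = 0
with gzip.open(path, "rt") as handle:
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed record around record {records + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in record {records + 1}")
        records += 1

print(f"{records} records checked; basic fastq structure looks OK")
```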

Instructions for generating the share link are in the forum banner, and also here → How to get faster help with your question. I’ll help trace this all the way back to the root cause, so if something about the release itself is affecting this, we will learn about that too!

Thanks! :slight_smile:

Hi,

Thanks for your reply. The data input hasn’t changed (as far as I know). It’s data generated with the Nanopore MinION (rapid barcoding). The data was generated during the last couple of months and used to work fine until this Wednesday. I’ve tried to run fastq files generated in different runs on different days, and none of them works.

The errors (seen in multiple samples, from multiple runs):

Worked fine (same run as the other two samples below).

One of the histories failing at the fastp step

History failing at the Porechop step (the data is from the same batch as the history above)

Different account, different run, with an error at the Medaka step.


Hi @mmb_zorgregio

These jobs are actually running out of memory during processing. The sample that did process correctly has under half the data, and slightly shorter reads.

We are reviewing whether the tools can be allocated more resources. This will not be immediate, but we might have some feedback later next week. We’ll follow up here.

For now, to get this data processed, you can try the UseGalaxy.eu server, since all of these tools have a bit more runtime memory allocated there by default.

This guide explains how to move data between servers and how to manage the account quota at each. This error is unrelated to quota, but if you are running batches of work, quota probably matters to you for other reasons. :slight_smile:

Thanks! More follow-up next week! Please feel free to ping me for an update.

Hi,

Thank you for your explanation. It would be nice if more resources could be allocated to these specific tools. However, I doubt that will solve the problem. Below you can find a run where the data input is larger than in some of the workflows that now give an error. I just checked some more old assemblies, and we do have more with >1.5 Gb that worked fine. In addition, I tried to run the workflow on a relatively small dataset and it also failed. So I’m not sure the size of the datasets is the issue.

Hi @mmb_zorgregio, having had a quick look at your workflow, it looks as if we are doing similar work with Nanopore sequences, and I have also been having similar issues with fastp. The way I got around it is by collapsing (or concatenating) the collection after fastp rather than before. It does mess up your HTML report, as you get one report for each fastq file.

Hi @mmb_zorgregio

Thanks for sharing the history (the second link just points to a general, default history listing).

Read length is my next guess for what is scaling up the memory usage so much. The longest read here is around 80k, and in the failed runs the longest reads are over 100k. The reads that processed successfully in your other example were shorter, too.

Other than that, there may be something about the read content, so yes, I agree that the “size on disk” of the data file is not necessarily the root issue. The fastp manual might have more detail about how memory is used for some of your data use cases.
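If you want to compare the inputs yourself, a rough way to see whether read length or total bases differ between a run that worked and one that failed is a sketch like the one below (assuming Python 3 and local copies of the gzipped fastq files; the file names are placeholders only):

```python
import gzip

def read_stats(path):
    """Return (number of reads, total bases, longest read) for a gzipped fastq file."""
    n_reads = 0
    total_bases = 0
    longest = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of each 4-line fastq record
                length = len(line.rstrip("\n"))
                n_reads += 1
                total_bases += length
                longest = max(longest, length)
    return n_reads, total_bases, longest

# Hypothetical file names; substitute a working and a failing input.
for label, path in [("worked", "run_ok.fastq.gz"), ("failed", "run_failed.fastq.gz")]:
    reads, bases, longest = read_stats(path)
    print(f"{label}: {reads} reads, {bases} bases, longest read {longest} bp")
```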

I was trying to load the prior histories you shared above, but they seem to have changed since I looked at them originally. Did you rerun some of the data in those? Do you still have an example of the problem?

And @jessemartin, do you want to share an example? Your issue seems to be occurring after fastp. With which tool? Medaka? There are a few ways to run that, so maybe we can come up with a solution. Examples would help.

Thanks!

Hi @jennaj

This is an example of the issues I was having. This history shows three separate workflow invocations using the same fastq files. In the first, all 112 fastq files are passed to fastp individually before the filtered fastq files are concatenated for assembly etc. The second invocation is the same, just with some sleight of hand to make the sample report appear correctly. In the third, the 112 fastq files are concatenated first and then filtered with fastp, which then gives the error.

(Note: my standard workflow is to concatenate first and then filter, and this usually works; it’s only about once or twice in every 50 samples that I’m forced to use this workaround.)
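For anyone trying the same trick outside Galaxy, the command-line equivalent of this workaround is roughly the sketch below: run fastp on each barcode file separately, then concatenate the filtered outputs. This is only an illustration of the ordering (assuming Python 3, fastp on the PATH, and placeholder file names); in Galaxy itself this is done with the collection and concatenation tools.

```python
import glob
import shutil
import subprocess

# Hypothetical paths; substitute your own per-barcode fastq.gz files.
inputs = sorted(glob.glob("barcode01/*.fastq.gz"))
filtered = []

# Run fastp on each file separately (one HTML/JSON report per input file,
# which is the report caveat mentioned above).
for i, fq in enumerate(inputs):
    out = f"filtered_{i:03d}.fastq.gz"
    subprocess.run(
        ["fastp", "-i", fq, "-o", out,
         "--json", f"report_{i:03d}.json", "--html", f"report_{i:03d}.html"],
        check=True,
    )
    filtered.append(out)

# Concatenate the filtered outputs; concatenated gzip members still form a valid gzip file.
with open("all_filtered.fastq.gz", "wb") as combined:
    for out in filtered:
        with open(out, "rb") as part:
            shutil.copyfileobj(part, combined)
```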

Hope this helps, or at least doesn’t confuse the issue further.

Jesse


Hi @jessemartin

Thanks for sharing the history! Very helpful.

I sent a bug report to the EU administrators. Maybe they can get more details about exactly where the error is coming from, i.e. whether it is from fastp itself or from something else.

Offhand, it looks like a memory error, which is a bit harder to run into at the EU server due to the way the clusters are allocated. There is some predictive estimation of the “job size” that determines the target cluster. Maybe that can be adjusted for use cases like yours.

They’ll get back to us here. :slight_smile: