CheckM lineage_wf

sm9412 · January 31, 2025, 10:22am

I am unable to run CheckM on Galaxy. Previously, I used the public server usegalaxy.org, but the job failed with the message, “The job was terminated because it was using more memory than allocated.” I attempted to run the same job with smaller files in the MB range, but the problem persisted.

I then tried using other Galaxy servers, and now I am receiving an error stating, “Problem with the dataset.” However, the dataset contains only FASTA files.

Can someone please help me with this issue?

jennaj · January 31, 2025, 8:58pm

Hi @sm9412

I think I helped you with the first error via a bug report, yes?

Good that you tried at the EU server!

Do you want to share the entire job log message? Find this using the i-icon within a dataset. Then scroll down into the report and expand the sections.

You can screenshot all of that and post back here in a reply. Make sure to capture the parameters at the top, expand each input section, then fully expand the outputs and all the different log sections since I’m not sure exactly which part will matter yet. This will be several screenshots but that is ok! If you have more than one job that failed, and the messages are different, then please capture those, too. We are looking for clues to solve this!

Another clue could come from a tool like Fasta Statistics. You could run this on the same collection you are using for this tool’s input. What is the base composition? Any IUPAC characters? What about Ns?
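If you would rather run the same kind of check outside Galaxy, here is a minimal Python sketch of the idea. It is not how the Fasta Statistics tool is implemented; the function names `fasta_composition` and `summarize` are made up for illustration, and it assumes plain, uncompressed nucleotide FASTA.

```python
from collections import Counter
import io


def fasta_composition(handle):
    """Count sequence characters in a FASTA stream, skipping header lines."""
    counts = Counter()
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and ">" header lines
        counts.update(line.upper())  # treat soft-masked (lowercase) bases as bases
    return counts


def summarize(counts):
    """Split the counts into plain ACGT, N, and everything else (e.g. IUPAC codes)."""
    total = sum(counts.values())
    acgt = sum(counts[base] for base in "ACGT")
    n_count = counts["N"]
    return {"total": total, "ACGT": acgt, "N": n_count, "other": total - acgt - n_count}


# Tiny in-memory example; for a real bin, pass open("your_bin.fasta") instead
# ("your_bin.fasta" is a placeholder name).
example = io.StringIO(">contig_1\nACGTNNRY\n>contig_2\nacgt\n")
print(summarize(fasta_composition(example)))
```

A non-zero `other` count means the file has characters besides A, C, G, T, and N, which is the kind of clue we are looking for.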

You could also post back a link to your shared history with the inputs and the failed job output(s). But I remember what those inputs looked like before (nucleotide bin output from MetaBAT2, correct?), so that might not be needed yet if these are the same. You decide. More context means better, more specific help.

Let’s start there!

sm9412 · February 4, 2025, 9:47am

Hey,
I have attached all the screenshots of the job. Could you please help me resolve this issue?

Thank you!

jennaj · February 4, 2025, 6:40pm

Great, thanks @sm9412!

If you could expand dataset 600, we could inspect the message reported on the dataset, or at least see the metadata, and maybe find a clue.

And, were you able to run the statistics tool? You can post back those stats. We are checking for any non-ATCG (and maybe N) characters.

From that information we can probably determine if this is a technical issue or an actual scientific “result” – meaning, maybe this data will require different parameters than the defaults.

This tool needs more memory allocated at the ORG server (it fails there even with the tutorial data), but that wouldn’t normally be expected at the EU server, unless the data is pushing memory usage far higher than usual for some reason. Experimenting with the QA options on/off might help, but let’s see the message on that red dataset first. Or, do both?

Thanks!