"cannot allocate memory" error

#1

Hi - looking for any advice or a solution. I am running some workflows and am unable to complete them because a step crashes with the “cannot allocate memory” error. However, it occurs randomly during different steps in the workflow. None of my steps are very demanding (they are simple things like Sort or Join Two Datasets), although I am doing a BLAST search on a transcriptome that I uploaded. Is there a memory cache I need to purge somewhere? I have purged deleted files and am nowhere near my disk quota.

Thanks for any help, I am stuck.

#2

Welcome, @grakster!

The memory used to execute jobs is separate from the disk space used to store account data. Symptoms like the ones you describe have in the past indicated that jobs are approaching the memory limit of what the public server can process. In short, sometimes a job lands on a cluster node that can execute it successfully, and sometimes it does not.

Sorting data can require quite a bit of memory, depending on the datatype and size of the dataset. There are a few different “sort” tools, so I can’t offer many more details without knowing which you are using.
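To illustrate why sort memory use scales with dataset size, here is a minimal “external merge” sort sketch in Python: it sorts fixed-size chunks in memory, spills each sorted run to a temporary file, and then merges the runs with a heap so only one line per run is held at a time. This is only an illustration of the general technique, not how any particular Galaxy sort tool is implemented; the `chunk_size` value is an arbitrary assumption.

```python
import heapq
import os
import tempfile

def _spill(sorted_chunk):
    """Write one sorted run to a temporary file and return its path."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    f.write("\n".join(sorted_chunk) + "\n")
    f.close()
    return f.name

def external_sort(lines, chunk_size=1000):
    """Sort lines using bounded memory: chunk, spill, then heap-merge."""
    runs, chunk = [], []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    files = [open(r) for r in runs]
    try:
        # heapq.merge streams the runs; memory stays bounded by run count.
        yield from (l.rstrip("\n") for l in heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for r in runs:
            os.unlink(r)

result = list(external_sort(["b", "d", "a", "c"], chunk_size=2))
```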

That said, a highly fragmented transcriptome can easily exceed resources, whether the output is BAM or tabular. Filtering out smaller target sequences can often help, especially if they are too small to capture meaningful hits anyway. Modifying the hit parameters (more stringent criteria) can also help. In a BAM result, the header can get very large (it includes every target), causing a Sort BAM type of job to fail at the indexing or sorting step. Often tabular output is a better choice: only “hits” are retained, and these can later be filtered to reduce redundancy before further manipulations that compare the data to other data (join functions). Tools to filter might include: Select, Filter, Sed, Awk, etc.

BLAST can create a great deal of “common key”, “coordinate overlapping”, and “low specificity” hits. Sometimes running with strict parameters first, then remapping whatever didn’t map originally at a lower stringency threshold, is a good strategy to avoid generating too much output (often these are uninformative sub-hits).
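As a sketch of the length-filtering idea above, the snippet below drops transcriptome contigs shorter than a minimum length before running BLAST. The 500 bp cutoff and the tiny FASTA parser are illustrative assumptions only; in Galaxy you would use a filtering tool rather than a script.

```python
MIN_LEN = 500  # illustrative cutoff, not a recommendation

def filter_fasta(lines, min_len=MIN_LEN):
    """Yield (header, sequence) pairs whose sequence meets min_len."""
    header, seq = None, []
    for line in list(lines) + [">"]:  # sentinel flushes the last record
        line = line.strip()
        if line.startswith(">"):
            if header is not None and len("".join(seq)) >= min_len:
                yield header, "".join(seq)
            header, seq = line, []
        else:
            seq.append(line)

# Hypothetical two-contig example: only the long contig survives.
records = [">contig1", "A" * 600, ">contig2", "A" * 100]
kept = list(filter_fasta(records))
```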

Join Two Datasets can also be memory intensive, both in processing and in handling the output. It is possible to create pathologically large outputs where each row of the first dataset matches every row of the second. We do recommend inputting the larger of the two datasets second on the tool form to maximize how the tool uses resources. You might need to modify your query to restrict the output (be more specific), or split the larger dataset into chunks first (tools might include: Line/Word/Character count, Select first/last lines, Remove beginning of a file), then run the join as distinct jobs, and at the end combine the results (tool: Concatenate).
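The split-join-concatenate strategy above can be sketched like this: hold only the smaller dataset in a lookup, stream the larger one in chunks, and concatenate the per-chunk join results. The join column, chunk size, and sample rows are all assumptions for illustration, not Galaxy tool behavior.

```python
from itertools import islice

def join_on_key(small_rows, big_rows):
    """Inner join on column 0; only the smaller dataset is held in memory."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[0], []).append(row)
    for row in big_rows:
        for match in lookup.get(row[0], []):
            yield row + match[1:]

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical tabular rows keyed on the first column.
small = [("k1", "a"), ("k2", "b")]
big = [("k1", "x"), ("k3", "y"), ("k2", "z")]

# Join each chunk separately, then concatenate, so no single join's
# output has to fit in memory at once.
results = []
for chunk in chunked(big, 2):
    results.extend(join_on_key(small, chunk))
```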

If it turns out that you need your jobs processed as-is, and they are exceeding resources on the public Galaxy Main server at https://usegalaxy.org (or any usegalaxy.* server), the option to move to your own cloud, Docker, or local Galaxy is available. Enough job-processing memory would need to be allocated, and that is usually easier with a cloud solution for a few reasons (less administrative setup, custom high-memory cluster choices). Keep in mind that Galaxy itself is always free, but commercial cloud services are not (although AWS does offer grants for education/research). You could also check whether your affiliations include access to academic cloud resources. If interested, please see:

FAQs: https://galaxyproject.org/support/#troubleshooting

Hope that helps!

#3

Thanks so much, Jennifer, for your prompt and detailed reply. I will look into some of these details, particularly the join and sort statements. I will also investigate doing some filtering of my assembled transcriptomes and BLAST results, as I’m only interested in a few things, and there are doubtless a lot of hits that are not meaningful or useful. Again, I appreciate your advice. Though I am new to Galaxy, I’ve quickly found it to be essential to what I’m working on.
