Hi- I am wondering again if there is some way around these memory use errors I keep getting. This one is with MACS2 bdgcmp. Is it possible to increase the allocation for this tool?
Hi @TTP
This tool already has a higher memory setting at UseGalaxy.org (is that where you are working?).
You can try at UseGalaxy.eu and UseGalaxy.org.au to see what happens. Resources are a bit different between the servers.
Maybe also investigate some potential input data content or some parameter issue to see if that is the root problem. It is somewhat easy to ask the tool to do something that may not exactly fit the data well (e.g. method choice).
Consider running the data on a per chromosome basis for really large work that you think the public site cannot process. If that also fails, it would be one way to learn if the inputs are Ok or if the tool can process the work at all.
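If it helps, here is a minimal sketch of the per-chromosome split for a bedGraph file, done outside of Galaxy. The function name `split_bedgraph_by_chrom` and the `.bdg` output naming are my own choices for illustration, not part of MACS2 or Galaxy. It assumes a whitespace-delimited bedGraph with the chromosome in the first column.

```python
from pathlib import Path


def split_bedgraph_by_chrom(bedgraph_path, out_dir):
    """Write one bedGraph file per chromosome; return {chrom: output path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    paths = {}
    try:
        with open(bedgraph_path) as src:
            for line in src:
                # Skip blank lines and header lines (track/browser/comments).
                if not line.strip() or line.startswith(("track", "browser", "#")):
                    continue
                chrom = line.split(None, 1)[0]
                if chrom not in handles:
                    # Hypothetical naming scheme: <chrom>.bdg in out_dir.
                    paths[chrom] = out_dir / f"{chrom}.bdg"
                    handles[chrom] = open(paths[chrom], "w")
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

This keeps one output file open per chromosome, so an assembly with thousands of small contigs could hit the open-file limit. In that case, restrict the split to the main chromosomes first.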
Hope this helps!
OK, thanks. The issue is often caused by the fact that we use multiple control (input) datasets in MACS2 callpeak. This generates a control bedgraph output that is really large (>3 or 4 GB sometimes). There is usually no error at this stage, but the huge control file triggers a memory error when the treatment and control bedgraphs go into MACS2 bdgcmp. I wonder if there is a better way to do this such that the input datasets can be somehow combined or averaged before MACS2 callpeak?
Do you mean that the same base might be included in multiple but overlapping regions (lines) of the bedgraph file? Yes, that will cause a problem.
MACS2 wants a 1-1 relationship between a base and a score per file. Then a tool like this one is comparing the score in one file to the score in another file for each specific base and generating the rest.
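One way to check for that kind of overlap is a quick scan of the file. This is only a sketch: `find_overlaps` is a hypothetical helper name, and it assumes the lines are already sorted by chromosome and then by start position. bedGraph intervals are 0-based and half-open, so an interval that starts exactly where the previous one ended is not counted as an overlap.

```python
def find_overlaps(bedgraph_lines):
    """Return (chrom, start, prev_end) for each interval that starts before
    the preceding intervals on the same chromosome have ended.

    Assumes the lines are sorted by chromosome, then by start position.
    """
    overlaps = []
    prev_chrom, prev_end = None, 0
    for line in bedgraph_lines:
        # Skip blank lines and header lines (track/browser/comments).
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if chrom == prev_chrom and start < prev_end:
            overlaps.append((chrom, start, prev_end))
        if chrom != prev_chrom:
            prev_chrom, prev_end = chrom, end
        else:
            # Track the furthest end seen so far on this chromosome.
            prev_end = max(prev_end, end)
    return overlaps
```

An empty list means each base is covered by at most one line, which is the 1-1 relationship described above.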
For the part about combining the input datasets before MACS2 callpeak:
Are you pooling the control files? If not, I would suggest testing that out.
What “grows” the overall size of a BEDGRAPH file is more data lines. In the context of this data, more data lines means smaller windows (fewer bases) captured per line. (This is different from overlapping windows sharing bases in common.)
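To see how fragmented a given file is, you could count the data lines and the average number of bases per line. A rough sketch, with `bedgraph_stats` as a made-up helper name and the usual chrom/start/end/score column order assumed:

```python
def bedgraph_stats(bedgraph_lines):
    """Return (number of data lines, mean bases covered per line)."""
    n_lines = 0
    total_bases = 0
    for line in bedgraph_lines:
        # Skip blank lines and header lines (track/browser/comments).
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        _, start, end = line.split()[:3]
        n_lines += 1
        total_bases += int(end) - int(start)
    mean_width = total_bases / n_lines if n_lines else 0.0
    return n_lines, mean_width
```

Comparing these two numbers between the pooled control and each single control should show which input is driving the line count up. A small mean width with a very large line count is the pattern that inflates the file.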
You might decide to drop one or more of the controls from that pooled set if it seems to have outliers, or is fragmenting the dataset too much.
One way to explore this is to run a matrix of different inputs and different parameters, and see what happens. Maybe stick with just one chromosome while you sort this out. At least you would get a result and some message at the end. Those messages can be searched at the MACS Google group – the author still visits the site from what I can tell, plus prior Q&A goes into the scientific rationale behind the errors (it isn’t just technical troubleshooting).
Update:
I’ve decided to open an issue ticket to allocate more working memory for this tool at UseGalaxy.org. You can track this here → Memory request: macs2_bdgcmp at usegalaxy.org · Issue #816 · galaxyproject/usegalaxy-tools · GitHub
I don’t expect that to happen quickly, so do consider testing whether a different public server can process the job. Try at one of these → UseGalaxy.eu or UseGalaxy.org.au.