Diffbind memory issues

Hi GalaxyHelp Group,

I am trying to run a Diffbind analysis using previously successful parameters and have submitted a series of “bug” reports over the last 4-5 days without a response, so I figured it may be best to reach out here. My analyses are being performed under my .upenn account address [listed in my Account here; the email sign-in link isn’t working for that address, which is a separate issue].

At any rate, the error message for my analysis has consistently been, “This job was terminated because it used more memory than it was allocated”. I’m performing this run with n=3 for each condition. “narrowPeaks” .bed files and associated .bam files are the inputs.

I’ve previously run this analysis with n=4 inputs with no issues. Could you please advise?

Thank you!

Welcome, @readlikeabook

Yes, posting questions here is how to get the fastest feedback.

I reviewed your jobs earlier today and wrote back.

Two important parts:

  1. The first condition must be labeled as Condition (exactly and only – it is just how Bioconductor wrote the tool as far as I know!)

  2. Put all of the peaks and BAM datasets into collections, and make sure these are in the same sort order so that the tool can “match up” a peak file with a BAM file.

This will mean a minimum of four collections for two conditions. You can create these using two collections of fastq files (then map, filter, call peaks) to start with, or you can organize your uploaded data afterwards. Keep the element identifiers very simple (or adjust them to be simple) or the tool will have a different sort of trouble.
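To make the "same sort order" requirement concrete, here is a minimal sketch of a check you could run on your own notes before launching the tool. It assumes you have typed out the element identifiers of a peak collection and its BAM collection as two lists, and that you gave matching samples identical identifiers (that naming convention is an assumption for this sketch, not something the tool requires). The function name `check_pairing` is hypothetical.

```python
def check_pairing(peak_ids, bam_ids):
    """Compare two lists of element identifiers position by position.

    Returns a list of problem descriptions; an empty list means the
    two collections have the same size and the same order.
    """
    problems = []
    if len(peak_ids) != len(bam_ids):
        problems.append(
            f"size mismatch: {len(peak_ids)} peak files vs {len(bam_ids)} BAM files"
        )
    # zip stops at the shorter list, so only shared positions are compared
    for position, (peak, bam) in enumerate(zip(peak_ids, bam_ids), start=1):
        if peak != bam:
            problems.append(f"position {position}: peak '{peak}' vs BAM '{bam}'")
    return problems


# Hypothetical identifiers for one condition with three replicates
peaks = ["treated_rep1", "treated_rep2", "treated_rep3"]
bams = ["treated_rep1", "treated_rep3", "treated_rep2"]
for line in check_pairing(peaks, bams):
    print(line)
```

With the example lists above, the second and third positions are reported because the BAM list is in a different order than the peak list.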

Your input collections should look something like this. Notice how simple the names of the collections are. If the collections were expanded to show the element identifiers, those labels are also very simple and in the same order between peak-bam sets. By “simple” I mean these are R friendly: letters, numbers, no spaces, not starting with a number, optional underscores (the only special character allowed).
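As a rough illustration of the "R friendly" rule above, this sketch checks a name and, if needed, suggests a cleaned-up version. It is only an approximation: the helper names `is_r_friendly` and `make_r_friendly` are hypothetical, the check is slightly stricter than the rule (it requires a leading letter), and prefixing "X" to names that start with a digit is an arbitrary choice for this example.

```python
import re

# Letters, digits and underscores only, starting with a letter
R_FRIENDLY = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_r_friendly(name):
    """Return True if the whole name fits the simple pattern above."""
    return R_FRIENDLY.fullmatch(name) is not None


def make_r_friendly(name):
    """Suggest a simplified name: replace other characters with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Names must not start with a digit; prefix a letter if they do
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "X" + cleaned
    return cleaned


# Hypothetical element identifiers
for name in ["treated_rep1", "sample 1.bam", "1sample"]:
    print(name, is_r_friendly(name), make_r_friendly(name))
```

Note that a period in a file name (for example a `.bam` suffix left in the element identifier) fails the check, so it is worth renaming those before building the collections.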

If you are not sure how to manipulate data inside collection folders, this tutorial is a sort of “cheat sheet” → Hands-on: Using dataset collections / Using dataset collections / Using Galaxy and Managing your Data

And more tips for Bioconductor tools are in here. → FAQ: Extended Help for Differential Expression Analysis Tools

If you get stuck, you can share back a link to your history and we can use it as an example while I help you to get it organized in a way this tool can understand. :slight_smile:

ps: The error message about “memory” can mean a job is actually too large to process at the public clusters, but that is somewhat rare. It is much more likely that a tool ran away due to problems with the inputs – content or labeling – so that is what we are exploring first. More about that catch-all error → FAQ: Understanding 'exceeds memory allocation' error messages

Thank you so much for your help and for taking the time to reply twice! I followed your advice and renamed my input .bam files (provided by a commercial group that we had sequence, trim, align, and filter the data) per your recommendations. I then put these files into collections (based on treatment group), converted to BED, and performed MACS2. Then I sorted all collections to ensure they were in order, confirmed the names were ok, and tried to run Diffbind (the first “Group” in Diffbind was named “Condition”, the second was named “Control”). Unfortunately, still no luck: the same error message showed up. I’ve included a link to the history, below, but will make it private after today to safeguard the data. If needed, I can also share directly with your or your group’s preferred email! Thank you!

https://usegalaxy.org/u/kread/h/cae-r2

Best,
Kaitlin

Hm-- on second view, I found one more potential issue-- original .bam files (added to each collection) have a period in their name (thus, not R friendly). I will correct and retry/report back!

Following back up – I seem to have complicated the issue. I’m attempting to re-upload the files for CAE R2 (the history I shared) and will start over, there. I’m now working through another dataset for this project, and once I rename the files (to play nice with R), additional tools are failing to work. (E.g. I am trying to convert the .bam files to BED files for use in MACS2, and the tool is failing for “unknown reasons”.) Please find the link to this history, below. Your help and guidance to get these data organized to go through Diffbind is much appreciated.

https://usegalaxy.org/u/kread/h/cae-r1

I used this workflow for years with other samples; perhaps I just got lucky in how I named the files for that previous analysis…

Thank you!

Best,
Kaitlin

Hi @readlikeabook

Thanks for sharing the history! :slight_smile:

The errors that look like this are a temporary problem we are investigating right now at UseGalaxy.org. Let’s get that resolved first so we can solve your runtime issues.

Do you mean these same steps, or a Galaxy workflow? You could share that with the history together if you wanted.

I’d really like to get this working for you. More feedback about this very soon.

Thanks so much-- I thought it was something I did that generated an error in the file(s). I removed my original CAE R2 history (files and all); is there, by chance, a way to get that back?

Regarding my previous work, apologies for the unclear language; I used these same steps with different samples (though those .bam files I filtered and prepped myself). I was able to get access to the account (which used my previous institution’s email address), and I’ve included a link to that history, below. It does look like I was lucky enough to have named the files appropriately. However, I named the conditions… whatever. Definitely not R friendly, or “Condition”, which is odd.

I’m trying to post the link to my older account, but keep getting a, “you cannot post a link to that host” error message. Perhaps because of the different login? I look forward to hearing back from you-- thanks again!

Hi @readlikeabook

The server issue at UseGalaxy.org should be resolved now if you want to continue.

Was your other account at a different server? Or was this also at UseGalaxy.org?

You could extract a workflow from your prior history and use that anywhere else. Maybe there were extra steps that were missed.

For more about accounts at public servers, please see here. You can update the email address associated with an account if that is what you were trying to do?

Hi @jennaj,

Quick update: I reviewed my prior data (now moved to my current account) and realized that I had previously used an older version of Diffbind (the version immediately before the most recent one). I switched over to that version and ran the same datasets that I’d been attempting. Lo and behold, it worked just fine. It seems my issue was with the newest version of Diffbind only…

Best,
Kaitlin

Hi @readlikeabook

Interesting result! Maybe the algorithm changed a bit between tool versions.

I’ll run some tests here to make sure nothing more is going on, so thank you for letting us know the details, and I am really glad you have this working now!! :star_struck: