Plots but no intervals in DiffBind

Hi all, I’m running DiffBind on the usegalaxy.org server. Although my analysis fails when I compare two sets of triplicates, I still get output in the Plots file, showing volcano plots with seemingly differentially accessible sites. Has anybody experienced this before, and any idea what may be going on? Thank you!

Hi @alexbisias

Have you determined why the job was failing yet? I would focus on that first. While attempting to interpret partial or errored data can sometimes give clues about what went wrong upstream… that’s not a great way to solve underlying problems.

Instead, start at the start: make sure the inputs make sense (format/content), then interpret data results. We can help with that here but will need more details.

Hi @jennaj, thanks for the message, much appreciated! To be a bit clearer, I have three datasets in triplicate – let’s call them A, B, and C, each with the relevant bam and bed files. A-vs-B and B-vs-C have both given me results (differentially accessible peaks, with all associated output files as well), so I know the files have the right format/content. However, A-vs-C is producing the issue I mentioned. I get the following message in the preview of the resulting Differentially Bound Sites file on the right-hand side of the screen:

“[1] “”
Warning message:
In Sys.setlocale(“LC_MESSAGES”, “en_US.UTF-8”) :
OS reports request to set locale to “en_US.UTF-8” cannot be honored
6h 6h 6hXDMSO 1 raw
6h 6h 6hXDMSO 1 raw
6h 6h 6hXDMSO 1 raw
6h 6h 6hXdTAG 1 raw
6h 6h 6hXdTAG 1 raw
6”

The Plots output file shows the same message in its preview, but once I click on the “view” (eye) icon, I get a pretty PDF with the seemingly differentially accessible sites I mentioned. I’m also happy to share this dataset if that helps at all (I believe it is an option?). Many thanks again!

Hi @alexbisias

That message is just part of the job log, and looks totally normal. There are four (!) logs, and this is just the one that is exposed inside a dataset (“job information”).

Click into the “i” icon for one of the datasets to reach the three other logs. Does the stdout or stderr have anything more? You could post that back along with the content from the section right above with the input/parameter table, too. If you click on those datasets, a small “peek” view is visible. Combined, that view provides all of the information needed for most troubleshooting. You can copy/paste back, or screenshot, or generate a share link to your history and post it back.

This is in the banner here but maybe you cleared it already:

How to get faster help with your question

FAQ: What information should I include when reporting a problem?

Any persistent problems can be reported in a new question for community help. Be sure to provide enough context so others can review the situation exactly and quickly offer advice.

Consider FAQ: Sharing your History or posting content from the Job Information view as described in FAQ: Troubleshooting errors.

Hi @jennaj, thanks again for your reply! I’d forgotten that there is more to the logs… I’ve now looked at stdout, which looks the same as in a successful job. The stderr looks a bit different:

Warning message:
In Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored
6h 6h  6hXDMSO  1 raw
6h 6h  6hXDMSO  1 raw
6h 6h  6hXDMSO  1 raw
6h 6h  6hXdTAG  1 raw
6h 6h  6hXdTAG  1 raw
6h 6h  6hXdTAG  1 raw
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Error in wilcox.test.default(toplot[[i]], toplot[[j]], paired = FALSE) : 
  not enough 'y' observations
Calls: dba.plotBox ... pv.plotBoxplot -> pvalMethod -> wilcox.test.default
Warning message:
In wilcox.test.default(toplot[[i]], toplot[[j]], paired = TRUE) :
  cannot compute exact p-value with zeroes

Whereas in a successful DiffBind it looks like this:

Warning message:
In Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored
6h 6h  6hXBRGi  1 raw
6h 6h  6hXBRGi  1 raw
6h 6h  6hXBRGi  1 raw
6h 6h  6hXdTAG  1 raw
6h 6h  6hXdTAG  1 raw
6h 6h  6hXdTAG  1 raw
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Warning messages:
1: In wilcox.test.default(toplot[[i]], toplot[[j]], paired = TRUE) :
  cannot compute exact p-value with zeroes
2: In wilcox.test.default(toplot[[i]], toplot[[j]], paired = TRUE) :
  cannot compute exact p-value with zeroes
3: In wilcox.test.default(toplot[[i]], toplot[[j]], paired = TRUE) :
  cannot compute exact p-value with zeroes

Are there just some, but not enough, differentially accessible sites in my dataset to carry out the Wilcoxon test (“Error in wilcox.test.default(toplot[[i]], toplot[[j]], paired = FALSE) : not enough ‘y’ observations”)?

I found this post from someone who had a similar problem (Diffbind error: Not enough y observations). Someone suggested either changing the FDR threshold to 0.1, or using a different tool entirely (macs2 bdgdiff) to identify differential peaks in that particular experiment.
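One possible reading of the error, sketched in Python rather than taken from DiffBind's source: if the box plot step compares groups of significant sites (assumed here to be sites gaining versus sites losing accessibility), the rank-sum test cannot run when one group is empty. Under that assumption, loosening the FDR threshold could help only if it lets at least one site into the empty group. The site values, field names, and function names below are all hypothetical, and the error text is only mimicked from the log above.

```python
import math


def split_by_direction(sites, fdr_threshold):
    """Split sites passing the FDR cutoff into gained and lost groups."""
    passing = [s for s in sites if s["fdr"] <= fdr_threshold]
    gained = [s["conc"] for s in passing if s["fold"] > 0]
    lost = [s["conc"] for s in passing if s["fold"] < 0]
    return gained, lost


def rank_sum_u(x, y):
    """Two-sample rank-sum (Mann-Whitney) U statistic for x versus y."""
    x = [v for v in x if not math.isnan(v)]
    y = [v for v in y if not math.isnan(v)]
    # Mimics the kind of input check that produces the R error in the log.
    if not x:
        raise ValueError("not enough 'x' observations")
    if not y:
        raise ValueError("not enough 'y' observations")
    # U counts the (x, y) pairs where x is larger; ties count as one half.
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def compare_groups(sites, fdr_threshold):
    """Return the U statistic, or the error text if a group is empty."""
    gained, lost = split_by_direction(sites, fdr_threshold)
    try:
        return rank_sum_u(gained, lost)
    except ValueError as err:
        return str(err)


# Hypothetical sites, invented for illustration only.
SITES = [
    {"fold": 1.8, "fdr": 0.01, "conc": 6.2},
    {"fold": 2.3, "fdr": 0.03, "conc": 7.1},
    {"fold": 1.1, "fdr": 0.04, "conc": 5.4},
    {"fold": -1.5, "fdr": 0.08, "conc": 4.9},
    {"fold": -0.4, "fdr": 0.60, "conc": 3.0},
]

print(compare_groups(SITES, 0.05))  # all passing sites are "gained"
print(compare_groups(SITES, 0.10))  # one "lost" site now passes
```

With these made-up numbers, the 0.05 cutoff leaves the "lost" group empty and the comparison reports the missing 'y' observations, while the 0.10 cutoff admits one lost site and the statistic can be computed. If the plots are drawn in sequence, this reading would also be consistent with a volcano plot being written before the box plot step stops, though that ordering is an assumption here.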

I’m sharing the history with you (https://usegalaxy.org/u/alexbis/h/6h-abde3-atac) in case it is helpful. If you look at an “unsuccessful” output (files 19-24) there is in fact something in file 20 (Plots). Compare to, for example, files 25-30 where the operation was successful. In the unsuccessful DiffBind operation, if I weren’t getting anything in Plots at all I’d just assume it’s a matter of no differentially accessible peaks present – but that’s not the case. (I deleted the .bam files from the history recently for space saving purposes, but happy to re-upload them if this further helps you help me.) Thanks again for your patience.

Hi @alexbisias

This is the same data that I already reviewed through your bug report, correct? If so, I didn’t notice any technical problems, so agree this is now a scientific problem to explore and solve.

I would suggest also checking the Bioconductor support site that I pointed you to in the email (https://support.bioconductor.org for any others reading). The authors post those replies directly, and they know their own tools best :).

Hope this works out!

Hi @jennaj,

Thank you for the advice, it’s much appreciated! I will reach out for advice on the Bioconductor support website. I was just wondering one last thing: since there does seem to be a PDF with volcano plots of differentially accessible sites in the job which otherwise seems to be failing, is there a way to access the data that went into generating it, or is it outwith the scope of Galaxy? Many thanks again!

Hi @alexbisias

I just took another look at one of your newer failures. These were purged but still accessible.

This is different from your prior runs. Notice how the ordering of the paired data between the peak and bam files differs within a group.

This is on the tool form but sometimes missed:

The input order of the BAM files for the samples MUST match the input order of the peaks files.

The tool “pairs” the first peak with the first bam, the second peak with the second bam, the third peak with the third bam…
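The positional pairing described above can be checked before launching the job. The following is a minimal Python sketch, not part of Galaxy or DiffBind: it assumes a hypothetical naming rule in which the sample ID is the file name up to the first dot, and the file names are invented for illustration.

```python
import os


def sample_key(path):
    """Hypothetical rule: the sample ID is the file name up to the first dot."""
    return os.path.basename(path).split(".")[0]


def check_pairing(peak_files, bam_files):
    """Return (position, peak, bam) for every slot whose sample IDs differ."""
    if len(peak_files) != len(bam_files):
        raise ValueError("peak and BAM lists differ in length")
    mismatches = []
    # Pair by position, the same way the tool form is described as doing.
    for position, (peak, bam) in enumerate(zip(peak_files, bam_files), start=1):
        if sample_key(peak) != sample_key(bam):
            mismatches.append((position, peak, bam))
    return mismatches


# Invented file names; replicates 2 and 3 are swapped in the BAM list.
peaks = ["DMSO_rep1.bed", "DMSO_rep2.bed", "DMSO_rep3.bed"]
bams = ["DMSO_rep1.bam", "DMSO_rep3.bam", "DMSO_rep2.bam"]

for position, peak, bam in check_pairing(peaks, bams):
    print(f"slot {position}: {peak} is paired with {bam}")
```

An empty result means the two lists agree slot by slot under the assumed naming rule. It says nothing about whether the files themselves are correct.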

Getting that ordering correct when using files outside of a collection is very tricky. Instead, you could put the input files into a collection before running the peak calling tool. That preserves the order.

I see that you are uploading these files from somewhere else, after peak calling. But you can still put your files into collections. I sent you instructions via email but these are the tutorials again: https://training.galaxyproject.org/training-material/search2?query=collection

You can also try uploading the files using the tool form, being careful about the order, and see whether that works.

Getting that sorted out might solve your problem, and would be required before you ask the Bioconductor team for scientific help (they will notice). The prior failed run I looked at had this same error message, but the files were ordered correctly (so, valid scientific result). This newer error has the files not ordered correctly, so is a technical problem.

To show exactly which run I reviewed this time, screenshot:

Job Details from Dataset 37 in the history you shared.

This output was not meaningful since it was based on mispaired data. And, for the data that was input, the screenshots above show how those pairs were organized and the parameters applied. The tool didn’t think it was “different” enough, and reported why.

Repeating myself but this is complicated :)

  • For the run I looked at via email, the reasons were scientific. Correctly paired data with no differential expression.
  • For the run above, the reasons were technical. Not correctly paired, with no differential expression.
  • The tool only knows what you tell it to do … so not everything like this can be trapped with specificity. The scientist using the tool has to sort out what is going on, and that is what most science forums are about!

And yes, since the job failed, the R files were not output. That is something we might be able to change, and it is a good idea, but it wouldn’t be immediate. I’ll write up a ticket and post it back here.

Update: https://github.com/galaxyproject/tools-iuc/issues/5725

Hope that helps!

Hi @jennaj, apologies for the late reply and thank you so much for this information, it’s all very much appreciated. I will try out your recommendations! And thank you very much for writing up that ticket.
