DiffBind_ChIP-seq_Error

Hi,
I have ChIP-seq data from 4 samples (2 groups with 2 replicates each). I called peaks with SICER and then tried to run DiffBind, but I always get the same error: [1] “en_US.UTF-8”.
By the way, I tried the first condition with the name “Condition”, but got the same error.

Can you please help me with this?
Thanks

Hi @akiko,
could you share your history with me? I’ll have a look at it.

Regards!

Error Localization

Dataset 77495831 (bbd44e69cb8906b5b0c1f33cca20c0d9)
History 5490068 (259ea8208f94a88d)
Failed Job 694: DiffBind on data 198, data 97, and others: Differentially bound sites (bbd44e69cb8906b5c7f7671d99f250a8)

Hi Cristóbal,
did you find out anything about the cause of the error after that?

Hi @akiko,
sorry, I cannot access your history. Could you follow the instructions in “Sharing your History”?

Regards.

Hi Cristóbal,
here is the URL for my history.
https://usegalaxy.org/u/akikosaito/h/cutrun

Thank you very much.

Hi @akiko

The output from SICER is in a non-standardized bed format, so it is assigned the interval datatype. Even if the coordinates are adjusted to be “0-based, fully-closed start” (an option on the tool form, and you used that), the change applied only makes the first three columns of the data match strict bed format.

DiffBind as wrapped for Galaxy expects a strict bed input for peaks. If a similar tabular datatype is input, Galaxy will try to convert it to bed format at runtime. For SICER peak outputs, that conversion is not enough: it loses information the tool requires, which leads to the particular error you ran into.

If you want to use the SICER output with DiffBind, you’ll need to transform all peak input data to bed6 formatted datasets. Use tools in the “Data Manipulation” tool group. This will involve rearranging and removing existing data columns, plus filling in some new column data.

The current tool error will be avoided once 1) the peak inputs are adjusted to have the content below, 2) the tool form is changed to designate the 5th column as the “score” value, and 3) the datasets are assigned the bed datatype.

The bed6 contents:

  • columns 1-3 should be the original data (chrom, start, stop)
  • column 4 can be a default value but must not be empty (name, see the first FAQ below for accepted values)
  • column 5 should be the peak calling statistic you want to be used (score)
  • column 6 should be filled in with + for all lines for your use case (strand, set to forward)
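
For anyone who prefers to do the rearrangement outside Galaxy and upload the result, the column layout above can be sketched in a few lines of Python. This is only an illustration, not part of the DiffBind or SICER tools: the function name `sicer_to_bed6` and the `peak_N` naming scheme are made up here, and the sketch assumes the input is tab-separated with chrom, start, end, and score in the first four columns. Check your own file before relying on that.

```python
def sicer_to_bed6(lines, name_prefix="peak"):
    """Turn 4-column island lines into bed6 lines.

    Assumption (verify against your file): each input line is
    tab-separated with chrom, start, end, score in columns 1-4.
    """
    bed6 = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        chrom, start, end, score = line.split("\t")[:4]
        # Column 4: a non-empty placeholder name; column 6: forward strand.
        name = f"{name_prefix}_{len(bed6) + 1}"
        bed6.append("\t".join([chrom, start, end, name, score, "+"]))
    return bed6


# Example with two made-up islands and one blank line.
example = ["chr1\t100\t799\t25.3\n", "\n", "chr2\t0\t599\t12.0\n"]
for row in sicer_to_bed6(example):
    print(row)
```

After writing the converted lines to a file and uploading it, the dataset would still need to be assigned the bed datatype, and the 5th column chosen as the score on the tool form, as described above.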

FAQs:

Hope that helps!

Thanks for the reply.

One of the output files from SICER, “Test-W200-G600-FDR0.01-islandfiltered.bed”, was a strict bed6 file. So, I ran DiffBind using this file and the job was completed.

However, the number of regions in the result was unusually small, and when I checked the details, I found an error message in “Tool standard error” of “Job information”.
What should I do?

Hi @akiko

That output contains the mapped reads that contributed to called peaks, not the actual peaks.

Hi Jennifer,
I uploaded the SICER (test-W200-G600.scoreisland) file to History after modifying it to bed6. I ran DiffBind using this file. I got an error message when using the sample named KO d14 (see dataset numbers 860 and 863).