Hi,
I have ChIP-seq data from 4 samples (2 groups with 2 replicates each). I ran SICER for peak calling and then tried to run DiffBind, but I always get the same error: [1] “en_US.UTF-8”.
By the way, I tried the first condition with the name “Condition”, but got the same error.
The output from SICER is in a non-standardized bed format, so it is assigned the interval datatype. Even if the coordinates are adjusted to be “0-based, fully-closed start” (an option on the tool form, and you used that), the change applied only makes the first three columns of the data match strict bed format.
DiffBind, as wrapped for Galaxy, expects strict bed input for peaks. If a similar tabular datatype is input, Galaxy will try to convert it to bed format at runtime. For SICER peak outputs, that conversion is not enough: it drops required information, which leads to the particular error you ran into.
If you want to use the SICER output with DiffBind, you’ll need to transform all peak input data into bed6-formatted datasets. Use tools in the “Data Manipulation” tool group. This will involve rearranging and removing existing data columns, and filling in some new columns.
Once the 1) peak inputs are adjusted to have the content below, 2) the tool form is changed to designate the 5th column as the “score” value, and 3) the datasets are assigned to the bed datatype, the current tool error will be avoided.
The bed6 contents:
columns 1-3 should be the original data (chrom, start, stop)
column 4 can be a default value but must not be empty (name, see the first FAQ below for accepted values)
column 5 should be the peak calling statistic you want to be used (score)
column 6 should be filled in with + on every line for your use case (strand, set to forward)
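If you would rather do the column rearrangement outside Galaxy before uploading, the layout above can be sketched in a few lines of Python. This is only a rough illustration, not the tool's own method. It assumes the SICER file is tab-separated with chrom, start, and stop in the first three columns and the statistic you want in the fourth column (`score_col=3`, zero-based). Check your own file and adjust that index. The function name `sicer_to_bed6` and the `peak_N` placeholder names are made up for this example, so confirm that the names you use are accepted values.

```python
def sicer_to_bed6(text, score_col=3, name_prefix="peak"):
    """Convert tab-separated interval lines to bed6 lines.

    Assumptions (hypothetical, verify against your own data):
    - columns 1-3 are chrom, start, stop
    - score_col is the zero-based index of the statistic to use as score
    """
    bed6_lines = []
    peak_number = 0
    for line in text.splitlines():
        # Skip blank lines and comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        peak_number += 1
        chrom, start, stop = fields[0], fields[1], fields[2]
        name = f"{name_prefix}_{peak_number}"  # column 4: non-empty placeholder
        score = fields[score_col]              # column 5: chosen statistic
        strand = "+"                           # column 6: forward on all lines
        bed6_lines.append("\t".join([chrom, start, stop, name, score, strand]))
    return bed6_lines


if __name__ == "__main__":
    example = "chr1\t100\t300\t12.5\nchr2\t500\t900\t7.0\n"
    for row in sicer_to_bed6(example):
        print(row)
```

After uploading the converted file, you would still need to assign it the bed datatype and set the 5th column as the “score” on the tool form, as described above.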
One of the output files from SICER, “Test-W200-G600-FDR0.01-islandfiltered.bed”, was a strict bed6 file, so I ran DiffBind using this file and the job completed.
However, the number of regions in the result was unusually small, and when I checked the details, I found an error message in “Tool standard error” of “Job information”.
What should I do?
Hi Jennifer,
I converted the SICER (test-W200-G600.scoreisland) file to bed6, uploaded it to my history, and ran DiffBind using this file. I got an error message when using the sample named KO d14 (see dataset numbers 860 and 863).