Different total numbers of regions after bedtools Multiple Intersect function

PARKS98 · September 30, 2021, 7:46pm

I am using the “Multiple Intersect” function to identify the number of regions that overlap among three different ChIP-seq narrowPeak files.

When I look at the “Multiple Intersect” results, the total number of regions for each ChIP-seq data set differs from my input file. All three data files have more regions than they had originally.

For example, one of the three files had 63,251 regions, but the “Multiple Intersect” results suggest that there are 105,417 regions. I got this number by counting all the "1"s in the column for my file. Am I doing something wrong?

Flow · October 4, 2021, 9:12am

Dear @PARKS98,
Just to clarify, because I am unsure which tool you used: was it bedtools Multiple Intersect? Please have a look at the example for that tool to see why you have more “regions” in your result file. The tool outputs subregions based on the intersections of your input files. That means your original regions are probably split into multiple regions in the result, and thus you have more lines in the output than in the input.

I hope I could help and best wishes,
Florian

PARKS98 · October 4, 2021, 2:55pm

Hi Florian,

Thanks for getting back to me. Yes, my question is about bedtools Multiple Intersect. Maybe I am using this tool incorrectly, but this is what I did:

Input – I provided three different ChIP-seq MACS narrowPeak BED files because I was interested in their overlaps. (I could have used Intersect, but I had three inputs instead of two.)

Output – Yes, it seems that the output file gathers all regions from the three input files. My question is that when I counted all the “1”s for each input file, there were more of them than there were regions in the original input file. I am sorry, but I still don’t understand how splitting into multiple regions gives more results. Do you mean, for instance, that one interval can be split into two or more by bedtools Multiple Intersect?

Best,

Kevin

Flow · October 4, 2021, 4:37pm

Yes, as shown in the example.

a.bed

chr1 6 12
chr1 10 20
chr1 22 27
chr1 24 30

b.bed

chr1 12 32
chr1 14 30

c.bed

chr1 8 15
chr1 10 14
chr1 32 34

Example adding a header line:

chrom start end num list a.bed b.bed c.bed
chr1 6 8 1 1 1 0 0
chr1 8 12 2 1,3 1 0 1
chr1 12 15 3 1,2,3 1 1 1
chr1 15 20 2 1,2 1 1 0
chr1 20 22 1 2 0 1 0
chr1 22 30 2 1,2 1 1 0
chr1 30 32 1 2 0 1 0
chr1 32 34 1 3 0 0 1

The first region of a.bed (chr1 6 12) is split into two regions in the result file (chr1 6 8 and chr1 8 12), because of the overlap with the other files.
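To make the splitting concrete, here is a minimal Python sketch that rebuilds the table above from the three example files. It is not how bedtools is implemented; it only assumes a single chromosome and half-open intervals, and the names `multi_intersect`, `covers`, `rows` and `ones_per_file` are made up for illustration.

```python
# Example intervals copied from the post (chr1 only, half-open [start, end)).
files = {
    "a.bed": [(6, 12), (10, 20), (22, 27), (24, 30)],
    "b.bed": [(12, 32), (14, 30)],
    "c.bed": [(8, 15), (10, 14), (32, 34)],
}

def covers(intervals, start, end):
    """True if any interval of one file fully covers the segment [start, end)."""
    return any(s <= start and end <= e for s, e in intervals)

def multi_intersect(files):
    """Cut the chromosome at every start/end and record which files cover each piece."""
    points = sorted({p for ivs in files.values() for iv in ivs for p in iv})
    rows = []
    for start, end in zip(points, points[1:]):
        flags = tuple(int(covers(ivs, start, end)) for ivs in files.values())
        if not any(flags):
            continue  # no file covers this piece, so it is not reported
        if rows and rows[-1][1] == start and rows[-1][2] == flags:
            # Same set of files as the adjacent previous piece: extend it.
            rows[-1] = (rows[-1][0], end, flags)
        else:
            rows.append((start, end, flags))
    return rows

rows = multi_intersect(files)
for start, end, flags in rows:
    print("chr1", start, end, sum(flags), *flags)

# Count the "1"s per file and compare with the number of input intervals.
ones_per_file = {
    name: sum(flags[i] for _, _, flags in rows)
    for i, name in enumerate(files)
}
for name, ivs in files.items():
    print(name, "input intervals:", len(ivs), "rows with a 1:", ones_per_file[name])
```

For this example, a.bed has 4 input intervals but 5 rows with a "1", and b.bed has 2 input intervals but 5 rows with a "1". The count of "1"s therefore measures sub-intervals, not original peaks, which is why it can exceed the size of the input file.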

The tool you used reports the sub-intervals that the files have in common, whereas you want to know which of your original regions overlap.

You are better off using the tool bedtools Intersect intervals. See here for a better description of the tool. The tool accepts multiple files.
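As a rough illustration of the difference, the sketch below counts whole a.bed intervals that overlap the other files, rather than counting sub-intervals. It is plain Python on the example data, not a call to bedtools, and the helper names (`overlaps`, `hits_any`, `hits_all`) are hypothetical. On the command line, something like `bedtools intersect -u -a a.bed -b b.bed c.bed` should report each a.bed interval at most once if it overlaps anything in the other files, but please check the tool help for the exact options available in your version.

```python
# Example intervals from the post (chr1 only, half-open [start, end)).
a = [(6, 12), (10, 20), (22, 27), (24, 30)]
b = [(12, 32), (14, 30)]
c = [(8, 15), (10, 14), (32, 34)]

def overlaps(x, y):
    """True if two half-open intervals share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def hits_any(query, *others):
    """Query intervals that overlap an interval in at least one other file."""
    return [q for q in query
            if any(overlaps(q, o) for other in others for o in other)]

def hits_all(query, *others):
    """Query intervals that overlap an interval in every other file."""
    return [q for q in query
            if all(any(overlaps(q, o) for o in other) for other in others)]

in_any = hits_any(a, b, c)
in_all = hits_all(a, b, c)
print("a.bed intervals overlapping b.bed or c.bed:", len(in_any))
print("a.bed intervals overlapping both b.bed and c.bed:", in_all)
```

Because each original interval is reported at most once, these counts can never be larger than the number of lines in the input file.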

Best wishes,
Florian
