Bedtools Intersect intervals results interpretation

I ran the bedtools Intersect intervals tool to find common peaks between two MACS2 broadPeak files. The resulting file has the following columns:

Chrom Start End Name Score Strand ThickStart ThickEnd ItemRGB BlockCount BlockSizes BlockSta

What are the values in columns 7, 8, and 9? Do any of them contain the fold change or q-value?

Welcome @amaric

The output is in BED12 format.

This format was originally specified by UCSC, so their FAQ is the official description, and it works the same way in Galaxy. → https://genome.ucsc.edu/FAQ/FAQformat.html#format1

The format describes only the coordinates of each feature, without any additional computed metrics, so extra metrics carried in the inputs (which are not part of the strict BED format) can be lost. That said, some tools store a simple whole number in the Score field. This is mostly used to control how features are shaded or color coded in a browser display.
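To make the column layout concrete, here is a minimal Python sketch that maps one tab-separated BED12 line onto the twelve fields named in the UCSC format description. The field names follow UCSC; the parser itself and the example line are only illustrations, not part of bedtools or Galaxy.

```python
from typing import List, NamedTuple


class Bed12(NamedTuple):
    """One BED12 record; field names follow the UCSC format description."""
    chrom: str
    chrom_start: int         # column 2, 0-based start
    chrom_end: int           # column 3, end (exclusive)
    name: str                # column 4
    score: int               # column 5, 0-1000 in the UCSC spec
    strand: str              # column 6, "+", "-" or "."
    thick_start: int         # column 7, start of the "thick" drawing region
    thick_end: int           # column 8, end of the "thick" drawing region
    item_rgb: str            # column 9, e.g. "255,0,0" or "0"
    block_count: int         # column 10
    block_sizes: List[int]   # column 11, comma-separated list
    block_starts: List[int]  # column 12, comma-separated, relative to chrom_start


def _int_list(field: str) -> List[int]:
    # "400,30," -> [400, 30]; UCSC lists often end with a trailing comma
    return [int(x) for x in field.rstrip(",").split(",") if x]


def parse_bed12(line: str) -> Bed12:
    f = line.rstrip("\n").split("\t")
    if len(f) != 12:
        raise ValueError(f"expected 12 tab-separated columns, got {len(f)}")
    return Bed12(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                 int(f[6]), int(f[7]), f[8], int(f[9]),
                 _int_list(f[10]), _int_list(f[11]))
```

For example, parse_bed12("chr1\t100\t500\tpeak1\t0\t.\t100\t500\t0\t1\t400,\t0,").thick_start gives 100. Because columns 7 and 8 are parsed with int(), a line whose columns 7 to 9 hold decimals (such as 3.5) raises a ValueError in this sketch, which is a quick way to see that those columns are not holding thickStart/thickEnd coordinates.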

If you are interested in other metrics, inspect the original files. Fold change and q-value are computed for each original peak file independently, and since those values are not "merged" during the intersection, you'll need to consider other methods. For example, if you are only interested in common peaks within some range of fold change values, consider filtering the input files first and then intersecting. Every peak in that result will then fall within the filtered score range of both input files.
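The "filter first, then intersect" idea can be sketched in plain Python. This assumes the MACS2 broadPeak layout with fold change in column 7, and the overlap step only roughly mimics reporting each peak from the first file once if it overlaps any peak in the second (as bedtools intersect -u does); it is a naive illustration, not a replacement for bedtools.

```python
from typing import Iterable, List

Row = List[str]  # one broadPeak line split on tabs


def read_rows(lines: Iterable[str]) -> List[Row]:
    # skip blank lines and header/track/browser lines
    return [l.rstrip("\n").split("\t") for l in lines
            if l.strip() and not l.startswith(("#", "track", "browser"))]


def filter_by_fold_change(rows: List[Row], min_fc: float,
                          max_fc: float = float("inf")) -> List[Row]:
    # assumes fold change sits in column 7 (index 6), as in MACS2 broadPeak
    return [r for r in rows if min_fc <= float(r[6]) <= max_fc]


def overlaps(a: Row, b: Row) -> bool:
    # BED intervals are 0-based and half-open: [start, end)
    return a[0] == b[0] and int(a[1]) < int(b[2]) and int(b[1]) < int(a[2])


def common_peaks(a_rows: List[Row], b_rows: List[Row]) -> List[Row]:
    # report each A peak once if it overlaps any B peak (naive O(n*m) scan)
    return [a for a in a_rows if any(overlaps(a, b) for b in b_rows)]
```

Usage would look like common_peaks(filter_by_fold_change(a, 2.0), filter_by_fold_change(b, 2.0)), where the threshold of 2.0 is just an example value. Filtering before the overlap step means every reported peak already passed the cutoff in its own file.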

Does this make sense? Please let us know if it helps or if you have any follow up questions! :slight_smile:

Dear @jennaj

Thanks for your prompt help! Yes, that makes sense. I had already filtered the original files before intersecting them. I just wondered if there is a way to keep the fold change (FC) information, but I guess I will just retrieve it in R.

One other dilemma: in the MACS2 broadPeak results file, the fold change is in column 7 and -log10(qvalue) is in column 9, right? The file doesn't say so explicitly, but comparing the broadPeak output with the tabular output file suggests so.
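As I understand the MACS2 README, broadPeak is a BED6+3 format (the narrowPeak layout without the summit column), with fold change in column 7, -log10(pvalue) in column 8, and -log10(qvalue) in column 9, which matches what you found. A minimal Python sketch of reading one line under that assumption, including undoing the -log10 transform:

```python
from typing import NamedTuple


class BroadPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    fold_change: float   # column 7
    neg_log10_p: float   # column 8, -log10(pvalue)
    neg_log10_q: float   # column 9, -log10(qvalue)

    @property
    def qvalue(self) -> float:
        # undo the -log10 transform: q = 10^(-x)
        return 10 ** (-self.neg_log10_q)


def parse_broadpeak(line: str) -> BroadPeak:
    f = line.rstrip("\n").split("\t")
    if len(f) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(f)}")
    return BroadPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                     float(f[6]), float(f[7]), float(f[8]))
```

For instance, a column 9 value of 2.0 corresponds to a q-value of 0.01. Keeping the raw -log10 values and converting only when needed avoids precision loss for very small q-values.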

Hi @amaric

Glad that helped! And yes, you could export the data and load it into R wherever you usually work. You could also use one of the join tools from the tool panel, RStudio in Galaxy, or your own script inside a Jupyter notebook in Galaxy. There are lots of options for custom data manipulation tasks. The benefit of keeping it all in Galaxy is the data provenance information and the ability to put everything into a simple workflow for the next batch of data, or even for this set of data if you want to tune a parameter and compare those results with the original set. Workflows can be extracted from existing work, too.
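As a rough illustration of what a join step does here, the sketch below recovers fold change for each row of an intersect result by looking up the peak name (column 4) in the original broadPeak file. It assumes peak names are unique within the original file and that the name column survives the intersection; joining on the name rather than the coordinates is a deliberate choice, since intersect output coordinates may be trimmed to the overlapping region.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def _rows(lines: Iterable[str]) -> List[List[str]]:
    # split tab-separated lines, skipping blanks and header/track lines
    return [l.rstrip("\n").split("\t") for l in lines
            if l.strip() and not l.startswith(("#", "track", "browser"))]


def fold_change_by_name(original_lines: Iterable[str]) -> Dict[str, float]:
    # assumes MACS2 broadPeak layout: name in column 4, fold change in column 7
    return {row[3]: float(row[6]) for row in _rows(original_lines)}


def add_fold_change(
    intersect_lines: Iterable[str], fc_by_name: Dict[str, float]
) -> List[Tuple[str, int, int, str, Optional[float]]]:
    # keep the intersect coordinates; fold change is None if the name is unknown
    return [(row[0], int(row[1]), int(row[2]), row[3], fc_by_name.get(row[3]))
            for row in _rows(intersect_lines)]
```

The Galaxy join tools or an R merge() on the name column would do the same thing without custom code; the sketch just shows the logic.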

How-to → Data Manipulation in Galaxy at GTN Materials Search (query=olympics)

Then, if you are interested in seeing how extracting a workflow can help, this is my favorite short demonstration. Worth a glance? → Hands-on: Galaxy Basics for everyone / Introduction to Galaxy Analyses

Finally, for the data formats, I tend to use UCSC as a starting place for any of the “bed” style of data, especially when a tool itself doesn’t go into enough detail.

Start here → Genome Browser FAQ

It has a link to the specification, which I think is what you are looking for! → Genome Browser FAQ. If you want fold change values, you may want to have a look at this discussion: https://groups.google.com/g/macs-announcement/c/i9tUydTElrE/m/2UoWdKGvbMoJ and at the wiki page → Build Signal Track · macs3-project/MACS Wiki · GitHub

The Google group for the tool is a great reference! The author answered many questions there directly, and the discussions cover not only how to "correct errors" but also the original scientific rationale behind the algorithm's parameters, logs, and results. → https://groups.google.com/g/macs-announcement. Another example: https://groups.google.com/g/macs-announcement/c/2ARZwLHzI28/m/9zHF_motfl8J

The top of a tool form shows the Galaxy wrapper version, and the bottom of the tool form lists the underlying tool and dependency versions. Track the Galaxy wrapper version for usage/reproducibility reasons, and the original tool version to understand how that tool works. Here is an older discussion as an example of the kinds of questions others have asked that seem similar to yours: https://groups.google.com/g/macs-announcement/c/i9tUydTElrE/m/2UoWdKGvbMoJ. Those changes were later layered into the analysis package, possibly in a different tool, and how they work is documented in the wiki at the author's GitHub site and in the Google group. We have discussions here too, but the focus is mostly on technical usage.

Good questions and now lots of information! Hope this helps and isn’t too overwhelming. :slight_smile:

You can retain information from a specific column using bedtools map with -c set to your column of interest. See the bedtools map documentation (bedtools 2.31.0).
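On the command line this would be something like bedtools map -a peaks_a.bed -b peaks_b.bed -c 7 -o max, with both files sorted by chromosome and start first (the file names here are hypothetical). To show the idea behind -c and -o, here is a small pure-Python emulation: for each interval in A it summarizes column c of every overlapping B interval, and writes "." when nothing overlaps. It covers only a handful of operations and is an illustration, not bedtools itself.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# a few of the summary operations that bedtools map's -o accepts
OPS: Dict[str, Callable[[List[float]], float]] = {
    "max": max,
    "min": min,
    "sum": sum,
    "mean": mean,
}


def overlaps(a: Sequence[str], b: Sequence[str]) -> bool:
    # same chromosome and half-open [start, end) intervals intersect
    return a[0] == b[0] and int(a[1]) < int(b[2]) and int(b[1]) < int(a[2])


def map_column(a_rows: List[Sequence[str]], b_rows: List[Sequence[str]],
               col: int, op: str = "max") -> List[List[str]]:
    """Append to each A row a summary of column `col` (1-based, like -c)
    over all overlapping B rows, or "." when nothing overlaps."""
    out = []
    for a in a_rows:
        values = [float(b[col - 1]) for b in b_rows if overlaps(a, b)]
        summary = str(OPS[op](values)) if values else "."
        out.append(list(a) + [summary])
    return out
```

With col=7 on broadPeak files, each common peak carries the fold change of its overlapping partner(s); choosing max versus mean matters when one peak overlaps several peaks in the other file.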

That’s a good idea @vojtech !! :slight_smile:

Here is how to find this in Galaxy (sharing since it might be hard to find, because the flag is nested a bit inside the operation settings!)

Use the bedtools MapBed tool. Expand the section for Insert Applying operations and choose one of the “existing data values” options.