I am currently analyzing ChIP-seq data for a specific TF, derived from one control (Wt) and two ChIPed TF-GFP samples. My pipeline starts with Bowtie2 for mapping the reads (paired-end, 2x50, Illumina), followed by MACS2 for peak calling. I do get plenty of statistically significant peaks from MACS2 for each comparison (Wt vs TF-GFP-1 and Wt vs TF-GFP-2). However, when I inspect peaks with >2.0-fold enrichment (as reported in the MACS2 tabular file), those peaks appear to be at pretty much the same level as the background (the control (Wt) sample) in IGB, i.e. with no visible fold enrichment:
The vertical gray line shows the position of the peak as detected by MACS2. This is just one example where the MACS2 results do not seem to agree with the visualization of the peaks; the same datasets contain plenty of them.
If MACS2 reports a peak with 2.0 fold enrichment (f.e.) for TF-GFP versus control (with p<0.05), then this 2.0-fold difference should also be visible in the IGB browser. My problem is that for a great number of peaks this is not the case; in fact, thousands of peaks with >2.0 f.e. (all with p<0.05) seem to show no difference when I inspect the corresponding bigwig files in IGB.
Am I doing something wrong? Is there anything else that could lead to such a discrepancy between the MACS2 results and their visualization in IGB?
So … you are stating that the data lines are correct but the visualization of that data seems off?
The “summits” are just part of the original peak’s genomic footprint. And the bamCoverage windows (bins) are another type of footprint. Maybe add the other MACS2 outputs to the display to see if that explains what is going on?
Other than that, maybe there is some issue with how IGB is interpreting your files. How does this same data look in IGV or UCSC?
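To make the footprint point concrete, here is a toy sketch in Python of how a ratio read at a single summit position can differ from the ratio seen in a wider coverage bin. The per-base depths, the 10 bp bin size, and the `bin_means` helper are all hypothetical and chosen only for illustration; this is not how MACS2 or bamCoverage compute their values internally.

```python
def bin_means(coverage, bin_size):
    """Average per-base coverage over consecutive fixed-size bins."""
    means = []
    for i in range(0, len(coverage), bin_size):
        chunk = coverage[i:i + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means

# Hypothetical per-base read depth over a 20 bp window (illustrative only).
treatment = [5] * 8 + [10, 12, 12, 10] + [5] * 8
control = [5] * 20

# Ratio at the single highest position of the treatment track.
summit = max(range(len(treatment)), key=lambda i: treatment[i])
summit_ratio = treatment[summit] / control[summit]

# Ratio of the bin-averaged tracks (what a binned coverage track would show).
binned_ratio = [
    t / c for t, c in zip(bin_means(treatment, 10), bin_means(control, 10))
]

print("ratio at summit:", summit_ratio)
print("ratio per 10 bp bin:", binned_ratio)
```

With these made-up numbers the ratio at the summit is 2.4, while each 10 bp bin shows a ratio of roughly 1.24, so a narrow enrichment can look nearly flat in a binned track.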
@So … you are stating that the data lines are correct but the visualization of that data seems off?
Actually, we cannot really say what is correct and what is wrong, can we? I will probably need another line of evidence for that. What surprises me, though, is that I have analyzed several different ChIP-seq datasets with the same tools and have never noticed anything like this.
@ Maybe add the other MACS2 outputs to the display to see if that explains what is going on?
Could you please be more specific? In your opinion, what exactly should I do: which other MACS2 outputs should I look at or combine, and how, in order to get an answer to my question? As far as I know, the p-value and the enrichment of the peaks are the most prominent ones. Could you please give an example, if possible?
For this, try filtering your file down to just a few regions, and see if the values in the browser match your data file. That isolates the problem and rules out the browser misinterpreting your data as a potential cause. Then, if you suspect the IGB browser is the problem, try the same in a different browser (UCSC, IGV). This isolates the problem even more (e.g. reproducible across browsers == file content problem).
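A minimal sketch of that filtering step in Python, assuming the peaks are in the tab-separated narrowPeak layout (chrom, start, end, name, score, strand, signalValue, pValue, qValue, summit offset), where column 7 holds the fold enrichment. The example rows, the region list, and the `filter_peaks` helper are hypothetical; adapt the column index if you filter a different output file.

```python
import csv
import io

def overlaps(chrom, start, end, region):
    """True if the interval overlaps a (chrom, start, end) region (0-based, half-open)."""
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start < r_end and end > r_start

def filter_peaks(handle, regions):
    """Yield narrowPeak rows that overlap any of the given regions."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and browser/track/comment header lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end = row[0], int(row[1]), int(row[2])
        if any(overlaps(chrom, start, end, r) for r in regions):
            yield row

# Hypothetical narrowPeak rows standing in for a real peaks file.
example = io.StringIO(
    "chr1\t100\t300\tpeak_1\t50\t.\t2.5\t6.1\t3.2\t90\n"
    "chr1\t5000\t5200\tpeak_2\t40\t.\t2.1\t4.0\t2.0\t100\n"
    "chr2\t100\t300\tpeak_3\t30\t.\t3.0\t5.0\t2.5\t80\n"
)
regions = [("chr1", 0, 1000)]  # the few regions you want to inspect

kept = list(filter_peaks(example, regions))
for row in kept:
    print(row[0], row[1], row[2], "fold enrichment:", row[6])
```

With a real file, replace the `io.StringIO` object with `open("your_peaks.narrowPeak")` (a placeholder file name), then compare the printed values against what the browser shows at the same coordinates.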
This was just a broad suggestion … seeing all the data together might explain what is going on scientifically. I would personally load up all the data: inputs, reference annotation, mapped reads, any peaks – then reason through what manipulations were applied to better interpret my data (and potentially find actual data problems or just format issues, or clues about what parameter changes to explore).