Hi Galaxy Community,
I’m analyzing Cut&Run data for a transcription factor with the following setup:
- Multiple time points with replicates
- IgG control processed identically and supplied as a BAM file as the control for MACS2
- Libraries have excellent quality (0.7% duplication rate before peak calling)
- Using MACS2 callpeak with BAMPE format and hg38 reference
- Current filtering: keeping fragments shorter than 120 bp with a samtools filter for TF analysis, as per "CUT&RUNTools: a flexible pipeline for CUT&RUN processing and footprint analysis" (Genome Biology)
- Also filtering with these BAM filter settings:
  "filters": [
    {
      "id": "1",
      "isProperPair": "true",
      "mapQuality": ">=30",
      "reference": "!chrM"
    }
  ]
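To make the combined filtering explicit, here is a minimal Python sketch of the logic as I understand it: proper pairs only, MAPQ of at least 30, no chrM, and fragments shorter than 120 bp. The `Alignment` record and its field names are hypothetical stand-ins for illustration, not the API of samtools or any Galaxy tool, and the example reads are made up.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Hypothetical minimal record; field names are illustrative only.
    reference: str        # reference sequence name, e.g. "chr1"
    mapq: int             # mapping quality
    is_proper_pair: bool  # proper-pair flag
    template_length: int  # signed fragment size (TLEN in SAM)


MAX_FRAGMENT = 120  # keep fragments strictly shorter than this (bp)
MIN_MAPQ = 30       # keep reads with MAPQ >= 30


def passes_filters(aln: Alignment) -> bool:
    """Return True if the read survives all four filters described above."""
    fragment = abs(aln.template_length)  # TLEN is negative for the reverse mate
    return (
        aln.is_proper_pair
        and aln.mapq >= MIN_MAPQ
        and aln.reference != "chrM"
        and 0 < fragment < MAX_FRAGMENT
    )


# Made-up example reads, one per failure mode plus one that passes.
reads = [
    Alignment("chr1", 42, True, 95),    # passes
    Alignment("chrM", 42, True, 95),    # mitochondrial
    Alignment("chr2", 12, True, 80),    # low MAPQ
    Alignment("chr3", 40, True, -180),  # fragment too long
    Alignment("chr4", 40, False, 70),   # not a proper pair
]
kept = [r for r in reads if passes_filters(r)]
print(f"kept {len(kept)} of {len(reads)} reads")
```

Written this way, it is easy to see that each condition removes reads independently, so the 120 bp cutoff can be relaxed on its own to check how much it contributes to the low peak counts.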
Main Issues:
- Very low peak counts: Only 11-16 peaks per sample (expecting more)
- No motifs found: MEME-ChIP returns no significant motifs
- Low alignment rate: about 50-70% across samples after Bowtie2
- High duplication in both sample and control: 40-50% duplicate reads before and after alignment. I deduplicated, but many CUT&RUN analysis guides recommend retaining duplicates.
Questions:
- Should I remove the <120bp size filter for Cut&Run peak calling?
- Are there better motif discovery tools than MEME-ChIP for novel TF targets with few peaks?
- Could MACS2 parameters be too stringent? Should I try --broad mode?
- Is hg38 vs hg19 reference choice affecting peak detection?
- Any Cut&Run-specific Galaxy workflows recommended for low-abundance TFs?
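
For reference, the peak-calling setup described above corresponds roughly to the MACS2 command built in the sketch below. The file names and the helper `macs2_callpeak_cmd` are hypothetical placeholders. The `-g hs` genome size shortcut and the `-q 0.05` cutoff are MACS2's human preset and its default q-value, which I am assuming here rather than copying from my Galaxy job.

```python
import shlex


def macs2_callpeak_cmd(treatment_bam, control_bam, name, qvalue=0.05, broad=False):
    """Build a MACS2 callpeak argument list for paired-end BAM input."""
    cmd = [
        "macs2", "callpeak",
        "-t", treatment_bam,  # TF sample
        "-c", control_bam,    # IgG control
        "-f", "BAMPE",        # use real fragment sizes from paired-end reads
        "-g", "hs",           # human effective genome size (assumed)
        "-n", name,           # output prefix
        "-q", str(qvalue),    # q-value cutoff (assumed MACS2 default)
    ]
    if broad:
        cmd.append("--broad")  # the broad-peak mode asked about above
    return cmd


# Placeholder file names, for illustration only.
narrow = macs2_callpeak_cmd("tf_t0_rep1.bam", "igg.bam", "tf_t0_rep1")
broad = macs2_callpeak_cmd("tf_t0_rep1.bam", "igg.bam", "tf_t0_rep1_broad", broad=True)
print(shlex.join(narrow))
print(shlex.join(broad))
```

Printing both variants side by side makes it clear that the only differences between the runs I am considering are the q-value cutoff and the `--broad` flag.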