Cut&Run TF Analysis - Very Low Peak Counts Despite Good Library Quality

Hi Galaxy Community,

I’m analyzing Cut&Run data for a transcription factor with the following setup:

  • Multiple time points with replicates
  • IgG control, processed identically, supplied to MACS2 as the control BAM
  • Libraries have excellent quality (0.7% duplication rate before peak calling)
  • Using MACS2 callpeak with BAMPE format and hg38 reference
  • Current filtering: samtools filter keeping fragments shorter than 120 bp for TF analysis, as recommended in "CUT&RUNTools: a flexible pipeline for CUT&RUN processing and footprint analysis" (Genome Biology)
  • Also filtering with:
    {
      "filters": [
        {
          "id": "1",
          "isProperPair": "true",
          "mapQuality": ">=30",
          "reference": "!chrM"
        }
      ]
    }
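The combined filtering criteria above (proper pair, MAPQ >= 30, not chrM, fragment shorter than 120 bp) can be sketched in Python as follows. This is a minimal illustration, assuming each alignment has already been parsed into a simple record; the `Alignment` class and `passes_tf_filter` function are hypothetical, and real code would read the BAM with a library such as pysam. Fragment size is taken as the absolute value of the SAM TLEN field, and "proper pair" is the SAM FLAG bit 0x2.

```python
from dataclasses import dataclass

PROPER_PAIR = 0x2  # SAM FLAG bit: read mapped in a proper pair (as judged by the aligner)


@dataclass
class Alignment:
    """Hypothetical minimal record holding only the SAM fields the filter needs."""
    qname: str
    flag: int
    rname: str
    mapq: int
    tlen: int  # observed template (fragment) length; negative for one mate of the pair


def passes_tf_filter(aln: Alignment, min_mapq: int = 30, max_fragment: int = 120,
                     excluded_refs: frozenset = frozenset({"chrM"})) -> bool:
    """Keep proper pairs with MAPQ >= min_mapq, off chrM, with fragment < max_fragment bp."""
    if not aln.flag & PROPER_PAIR:
        return False
    if aln.mapq < min_mapq:
        return False
    if aln.rname in excluded_refs:
        return False
    # TLEN is positive for one mate and negative for the other, so compare its magnitude
    return 0 < abs(aln.tlen) < max_fragment


reads = [
    Alignment("r1", 99, "chr1", 42, 95),   # proper pair, short fragment: kept
    Alignment("r2", 99, "chr1", 42, 180),  # fragment too long: dropped
    Alignment("r3", 99, "chrM", 42, 80),   # mitochondrial: dropped
    Alignment("r4", 99, "chr2", 10, 80),   # MAPQ below 30: dropped
    Alignment("r5", 65, "chr3", 42, 80),   # proper-pair bit not set: dropped
]
kept = [a.qname for a in reads if passes_tf_filter(a)]
print(kept)  # ['r1']
```

Keeping the thresholds as parameters makes it easy to see how many fragments each criterion removes, for example by rerunning with `max_fragment` raised to test whether the 120 bp cutoff is discarding most of the signal.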

Main Issues:

  1. Very low peak counts: Only 11-16 peaks per sample (expecting more)
  2. No motifs found: MEME-ChIP returns no significant motifs
  3. Low alignment rate: about 50-70% across samples with Bowtie2
  4. High duplication in both sample and control: 40-50% before and after alignment. I performed deduplication, but many CUT&RUN analysis methods recommend retaining duplicates.
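To compare duplication levels consistently before and after deduplication, the rate can be computed from fragment coordinates. The sketch below is a simplified assumption: it treats paired-end fragments with identical chromosome, start, end and strand as duplicates, and the `(chrom, start, end, strand)` tuple representation is hypothetical. Dedicated tools such as Picard MarkDuplicates use more detailed rules (for example unclipped 5' positions and mate orientation), so their numbers will not match this exactly.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical fragment representation: (chrom, start, end, strand)
Fragment = Tuple[str, int, int, str]


def duplication_rate(fragments: Iterable[Fragment]) -> float:
    """Fraction of fragments whose coordinates repeat those of another fragment."""
    counts = Counter(fragments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = len(counts)  # each distinct coordinate set counts once
    return (total - unique) / total


frags = [
    ("chr1", 100, 210, "+"),
    ("chr1", 100, 210, "+"),  # identical coordinates: counted as a duplicate
    ("chr1", 150, 260, "+"),
    ("chr2", 500, 590, "-"),
]
print(duplication_rate(frags))  # 0.25
```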

Questions:

  1. Should I remove the <120bp size filter for Cut&Run peak calling?
  2. Are there better motif discovery tools than MEME-ChIP for novel TF targets with few peaks?
  3. Could MACS2 parameters be too stringent? Should I try --broad mode?
  4. Could the choice of hg38 vs hg19 reference affect peak detection?
  5. Any Cut&Run-specific Galaxy workflows recommended for low-abundance TFs?
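For experimenting with MACS2 stringency (question 3) and duplicate handling, the sketch below builds a `macs2 callpeak` argument list without running it. It assumes the flags `-t`, `-c`, `-f BAMPE`, `-g hs`, `-n`, `--keep-dup`, `-p` and `--broad`; the file names, sample name and p-value are hypothetical placeholders, and the values are illustrations to vary, not recommendations.

```python
import shlex
from typing import List, Optional


def macs2_callpeak_args(treatment: str, control: str, name: str,
                        pvalue: Optional[float] = None, broad: bool = False,
                        keep_dup_all: bool = True) -> List[str]:
    """Build a MACS2 callpeak argument list for paired-end CUT&RUN BAMs (sketch only)."""
    args = ["macs2", "callpeak",
            "-t", treatment,   # filtered sample BAM
            "-c", control,     # IgG control BAM
            "-f", "BAMPE",     # use fragment spans from read pairs
            "-g", "hs",        # MACS2 shortcut for the human effective genome size
            "-n", name]
    if keep_dup_all:
        args += ["--keep-dup", "all"]  # keep duplicate fragments instead of collapsing them
    if pvalue is not None:
        args += ["-p", str(pvalue)]    # p-value cutoff; replaces the default q-value cutoff
    if broad:
        args.append("--broad")         # also call broad regions
    return args


cmd = macs2_callpeak_args("sample_t1_rep1.bam", "igg.bam", "t1_rep1", pvalue=1e-3)
print(shlex.join(cmd))
```

Building the arguments as a list makes it straightforward to run several parameter combinations (with and without `--keep-dup all`, different `-p` cutoffs, `--broad`) and compare peak counts across them.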

Hi @skabra3,

Please do not post identical messages to the forum. Your questions are about methodology rather than Galaxy itself. You may still get some answers here, but consider also asking on bioinformatics forums.

Regarding your question 4, I expect hg38 and hg19 to return similar results.

Kind regards,
Igor

Got it. Thank you for your reply!