Hello @candyc
Thanks for sharing the history!
Yes, this error is a bit confusing. The tool is complaining about unexpected data content within the genome index. This comes from MACS2 callpeak itself and indicates that the tool found unexpected mapping positions it couldn’t parse when building up the internal data structures.
This can sometimes come from a genome mismatch, but in your case it is caused by mapping against a genome that includes the haplotype/fragment sequences. Using the canonical reference instead is the solution.
GTN Example → Hands-on: CUT&RUN data analysis / Epigenetics (#hands-on-mapping-reads-to-reference-genome)
Two solutions
You may get slightly different results between the two, since reads compete for mapping positions a bit differently when the extra sequences are present in the reference. But the difference should be minor due to the other strict mapping quality filters applied. Your choice!
- Remap the reads against the hg38 canonical reference genome.
- Or, you can use one of the filtering tools on the BAM or BED files to restrict the reported regions to the primary autosomes + sex chromosomes (and sometimes just chrX).
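If it helps to see what the second option does to the data, here is a minimal Python sketch of the filtering logic. It is only an illustration, not one of the Galaxy tools: the function name `filter_bed_lines` is made up for this example, and it assumes UCSC-style hg38 names (`chr1` to `chr22`, `chrX`, `chrY`). Anything else, including `chrM`, `_alt`, `_random`, and `chrUn_` sequences, is dropped.

```python
import re

# Assumption: UCSC-style hg38 chromosome names.
# Matches chr1..chr22, chrX and chrY only (no chrM, alt, random or chrUn).
CANONICAL = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y)")


def filter_bed_lines(lines):
    """Yield only the BED records whose first column is a canonical chromosome.

    Hypothetical helper for illustration. Blank lines and any header or
    comment lines are dropped too, since their first field does not match.
    """
    for line in lines:
        fields = line.split(None, 1)  # first whitespace-separated column
        if fields and CANONICAL.fullmatch(fields[0]):
            yield line


if __name__ == "__main__":
    example = [
        "chr1\t100\t200\n",
        "chr1_KI270706v1_random\t5\t50\n",
        "chrX\t10\t20\n",
    ]
    for record in filter_bed_lines(example):
        print(record, end="")
```

The same idea applies to BAM files: keep only the alignments whose reference name is in the canonical set before calling peaks.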
Please give that a try and let us know if it helps!

