BWA-MEM built-in genome(s)

Ah, yes, you’re right (sorry for not reading the question carefully the first time).
hg38 should be identical to hg38 Full, but it’s different from hg38 Canonical.
So, no, you cannot use hg38 Canonical as the reference in a bwa-mem alignment.

That said, you should not do that anyway. Aligning against only the canonical chromosomes can cause reads that originate from non-canonical sequences to be misaligned, simply because the aligner finds no better match for them than a stretch of canonical sequence.
The better approach is to align against the full genome, then eliminate non-canonical mappings by filtering or by using the canonical genome during variant calling. Agreed, this can be tricky depending on downstream tools. Some variant callers, for example, may refuse to work with sequences mentioned in the input BAM header that are not found in the reference genome, in which case you would have to reheader your BAM dataset first.
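
To make the "align against full, then filter" idea concrete, here is a minimal Python sketch that works on plain SAM text. It is only an illustration, not what any Galaxy tool does internally: the function name `filter_canonical_sam` is made up, and I am assuming that "canonical" means chr1 to chr22, chrX, chrY and chrM. It drops the `@SQ` header lines of non-canonical contigs (the "reheader" part) and the alignment records placed on them (the "filter" part).

```python
# Assumption: "canonical" = chr1..chr22, chrX, chrY, chrM. Adjust to your build.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def filter_canonical_sam(lines, keep=CANONICAL):
    """Yield SAM lines restricted to the contigs named in `keep`.

    Hypothetical helper for illustration only. It removes:
      * @SQ header lines whose SN: contig is not in `keep`
      * alignment records whose RNAME (column 3) is not in `keep`
    Unmapped reads (RNAME "*") are dropped as well, and mate fields
    (RNEXT/PNEXT) are left untouched, which a real tool would handle.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields look like TAG:VALUE, separated by tabs
                fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
                if fields.get("SN") not in keep:
                    continue  # drop header entry of a non-canonical contig
            yield line
            continue
        rname = line.split("\t")[2]  # column 3 is the reference name
        if rname in keep:
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:248956422",
        "@SQ\tSN:chr1_KI270706v1_random\tLN:175055",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
        "r2\t0\tchr1_KI270706v1_random\t5\t0\t4M\t*\t0\t0\tACGT\tFFFF",
    ]
    for kept in filter_canonical_sam(sam):
        print(kept)
```

On real BAM data you would normally do both steps with a dedicated tool such as samtools rather than by parsing text yourself; the sketch is just meant to show which parts of the file have to change so that a picky variant caller sees only canonical sequence names.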

All in all, my recommendation would be to rerun your bowtie2 jobs using the full hg38 version, not to look for solutions for making bwa-mem work with the canonical version.

At the same time, it’s probably true that we should offer hg38 Canonical for bwa-mem if we do so for bowtie2, so thanks for bringing this up.
