Ah, yes, you’re right (sorry for not reading the question carefully the first time).
hg38 should be identical to hg38 Full, but it’s different from hg38 Canonical.
So, no, you cannot use hg38 Canonical as the reference in a bwa-mem alignment.
That said, you shouldn’t do that anyway. Aligning against only the canonical chromosomes can misalign reads that originate from non-canonical sequences: with their true source missing from the reference, the aligner places them on whatever stretch of canonical sequence matches best.
The better approach is to align against the full genome, then eliminate non-canonical mappings by filtering or by using the canonical genome during variant calling. Admittedly, this can be tricky depending on downstream tools. Some variant callers, for example, may refuse to work with a BAM whose header lists sequences that are not found in the reference genome, in which case you would have to reheader your BAM dataset first.
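To make the filter-plus-reheader idea concrete, here is a minimal sketch in plain Python that works on SAM text lines. It is only an illustration, not what any Galaxy tool does internally: `filter_sam_to_canonical` is a hypothetical helper, and the `CANONICAL` set assumes UCSC-style names (chr1 to chr22, chrX, chrY, chrM), which you should check against your actual canonical reference.

```python
# Assumption: "canonical" means chr1..chr22, chrX, chrY and chrM (UCSC-style names).
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def filter_sam_to_canonical(lines, canonical=CANONICAL):
    """Yield SAM lines restricted to canonical contigs (hypothetical helper).

    Drops @SQ header lines for non-canonical sequences (the "reheader" part)
    and drops alignment records that map, or whose mate maps, to one of them.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
                names = [f[3:] for f in line.split("\t")[1:] if f.startswith("SN:")]
                if not names or names[0] not in canonical:
                    continue
            yield line
            continue
        fields = line.split("\t")
        rname, rnext = fields[2], fields[6]
        if rname not in canonical:
            continue  # also drops unmapped reads, whose RNAME is "*"
        if rnext not in ("=", "*") and rnext not in canonical:
            continue  # mate sits on a contig that is no longer in the header
        yield line
```

Dropping reads whose mate lies on a removed contig is a design choice made here to keep the output consistent with the trimmed header; you may prefer to keep them and reset the mate fields instead. The sketch also ignores optional tags such as SA that can mention other contigs. For real BAM datasets you would normally use dedicated tools for both steps (for example, region-based filtering followed by a reheader step) rather than text processing.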
All in all, my recommendation would be to rerun your bowtie2 jobs using the full hg38 version, not to look for solutions for making bwa-mem work with the canonical version.
At the same time, it’s probably true that we should offer hg38 Canonical for bwa-mem if we do so for bowtie2, so thanks for bringing this up.