Map With BWA-MEM not working with one particular reference genome

I’m trying to use the tool Map With BWA-MEM with the reference genome “Human (Homo sapiens) (b37): hg19 Canonical”, but it is throwing the following error:

[E::bwa_idx_load_from_disk] fail to locate the index files
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from “-”

It’s odd because it’s working with every other reference genome I’ve tried.

The core error suggests that the index files for the “hg19 Canonical” reference genome are missing or not in the expected location.

Can you check on the index files on your Galaxy instance?
They will have extensions like .amb, .ann, .bwt, .pac, and .sa and could/should exist in the same directory as the reference genome file (.fa).

The owner/admin of your Galaxy Instance may be able to check on these indexes for you.

You may be able to make the needed indexes yourself with a command like bwa index hg19.fa
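For anyone with file-system access to the reference data, the check described above can be sketched in a few lines of Python. This is a minimal sketch, not how Galaxy itself locates indexes: it assumes the default `bwa index` naming, where each extension is appended to the full FASTA file name (for example `hg19.fa.amb`), and the function name `missing_bwa_index_files` is made up for illustration.

```python
from pathlib import Path

# Extensions written by `bwa index`; by default each one is appended to the
# full FASTA file name (hg19.fa -> hg19.fa.amb, hg19.fa.ann, ...).
BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta_path):
    """Return the expected BWA index files that are absent next to the FASTA.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that their contents are valid.
    """
    fasta = Path(fasta_path)
    expected = [Path(str(fasta) + ext) for ext in BWA_INDEX_EXTENSIONS]
    return [path for path in expected if not path.is_file()]
```

Calling `missing_bwa_index_files("hg19.fa")` (the path is a placeholder) returns an empty list when all five files are present. A corrupted index would still pass this check, since only existence is tested.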

Hi @numbergirl86
I reproduced the hg19 Canonical issue. The server admins can fix it. Ping @Jennifer_J: an issue with hg19 Canonical and a similar issue with hg38 Canonical Female (I got an error with it on ORG).

In the meantime, you have several options. Use hg19 or hg19 Canonical Female. The whole hg19 has additional contigs, while Canonical Female contains all chromosomes excluding the Y chromosome.

You can upload hg19 from UCSC Genome Browser and remove sequences you don’t need, basically, create a custom version of hg19.
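If you prefer to trim the FASTA outside Galaxy before uploading it, the "remove sequences you don't need" step could look roughly like the sketch below. It assumes an uncompressed FASTA with UCSC-style names (`chr1` ... `chr22`, `chrX`, `chrY`, `chrM`); the exact set of sequences to keep is an assumption, and the function names are hypothetical. Drop `chrY` from the set to mimic a Canonical Female style genome.

```python
def canonical_names(include_y=True):
    """UCSC-style names to keep (an assumption; adjust to your needs)."""
    names = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrM"}
    if include_y:
        names.add("chrY")
    return names


def filter_fasta(lines, keep):
    """Yield only the lines of FASTA records whose sequence name is in `keep`.

    The sequence name is taken as the first whitespace-delimited token
    after '>' on the header line.
    """
    writing = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            writing = bool(fields) and fields[0] in keep
        if writing:
            yield line


def filter_fasta_file(src, dst, keep):
    """Stream `src` to `dst`, keeping only the selected records."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filter_fasta(fin, keep))
```

For example, `filter_fasta_file("hg19.fa", "hg19_custom.fa", canonical_names())` would write a trimmed copy; both file names are placeholders. The input is read line by line, so the whole genome is never held in memory.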

Out of curiosity: why do you use the Canonical genome from hg19, released in 2009? Can you use hg38 Canonical?

Kind regards,
Igor


Thanks,

I decided to use the Canonical Female reference. I wanted to use hg19 to create VCFs to merge with a couple of other VCFs I have that were aligned to hg19.


Update 1

Reproduced the BWA-MEM index problems with two of the genomes :confused:

hg19 Canonical
hg38 Canonical Female

More soon…


Hi @numbergirl86 and thanks @igor

I’m running some tests in this history to see if I can also reproduce the issue with the hg19 Canonical index. Then we can proceed to a correction. :hammer_and_wrench:

And, for variant analysis, the hg19 Canonical Female is usually a scientifically preferred choice, along with the hg38 Canonical Female. But all of these should be technically working!!

For the merging, you could check which was already used. That might help with the merging step too, e.g. when deciding which header to attach to the final VCF. All should be based on the same genome assembly backbone (same actual basepairs) for anything retained, but you can manipulate which specific chromosomes to keep, or how those chromosomes are labeled. :slight_smile:
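One way to check which chromosomes and labels each VCF already uses is to compare the `##contig` lines in their headers. The sketch below is only an illustration: it assumes uncompressed VCF text whose headers actually carry `##contig=<ID=...>` lines (not every VCF does), and the function names are hypothetical.

```python
import re

# Captures the ID value from a header line such as
# ##contig=<ID=chr1,length=249250621>
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def contig_ids(vcf_lines):
    """Return the contig IDs declared in a VCF header, in file order."""
    ids = []
    for line in vcf_lines:
        if not line.startswith("#"):
            break  # the header is over once the first data line appears
        match = CONTIG_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def compare_contigs(ids_a, ids_b):
    """Return (IDs only in A, IDs only in B), each sorted alphabetically."""
    set_a, set_b = set(ids_a), set(ids_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)
```

If both returned lists are empty, the two headers declare the same contig names. Names that differ only by a `chr` prefix, or a `chrY` present on one side only, would show up here before the merge is attempted.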

More soon as those tests complete and I investigate.


I’ve ticketed the issue here: BWA-MEM corrupted indexes · Issue #53 · galaxyproject/idc · GitHub
