Map With BWA-MEM not working with one particular reference genome

numbergirl86 · August 17, 2024, 7:27pm

I’m trying to use the tool Map With BWA-MEM with the reference genome “Human (Homo sapiens) (b37): hg19 Canonical”, but it throws the following error:

[E::bwa_idx_load_from_disk] fail to locate the index files
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"

It’s odd because it’s working with every other reference genome I’ve tried.

colinbrislawn · August 18, 2024, 8:20pm

The core error suggests that the index files for the “hg19 Canonical” reference genome are missing or not in the expected location.

Can you check on the index files on your Galaxy instance?
They will have extensions like .amb, .ann, .bwt, .pac, and .sa and should exist in the same directory as the reference genome file (.fa).
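If you or your admin have shell access to the server, a check like the one below can list which of those files are absent. This is only a minimal sketch: the function name is made up for illustration, and it assumes the index was built with the default prefix, so the files are named after the full FASTA file name (for example hg19.fa.bwt). An index built with a custom prefix would need a different path.

```python
from pathlib import Path

# Extensions of the index files listed above.
BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta_path):
    """Return the expected index files that are not present next to the FASTA.

    Assumption: the index prefix is the full FASTA file name,
    so hg19.fa is expected to have hg19.fa.amb, hg19.fa.ann, and so on.
    """
    fasta = Path(fasta_path)
    expected = [fasta.with_name(fasta.name + ext) for ext in BWA_INDEX_EXTENSIONS]
    return [str(path) for path in expected if not path.is_file()]


# Example call with a hypothetical path; an empty list means nothing is missing.
# print(missing_bwa_index_files("/data/genomes/hg19/hg19.fa"))
```

An empty result only shows that the files exist, not that their contents are intact, so a corrupted index would still pass this check.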

The owner/admin of your Galaxy Instance may be able to check on these indexes for you.

You may be able to make the needed indexes yourself with a command like bwa index hg19.fa

igor · August 19, 2024, 2:54am

Hi @numbergirl86
I reproduced the hg19 Canonical issue. The server admins can fix it. Ping @Jennifer_J: there is an issue with hg19 Canonical and a similar issue with hg38 Canonical Female (I got an error with it on ORG).

In the meantime, you have several options. Use hg19 or hg19 Canonical Female. The full hg19 has additional contigs, while Canonical Female contains all chromosomes except the Y chromosome.

You can also upload hg19 from the UCSC Genome Browser and remove the sequences you don’t need, that is, create a custom version of hg19.
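Outside Galaxy, trimming a FASTA down to the sequences you want could look roughly like the sketch below. The function name and the set of names to keep are illustrative assumptions, not an official definition of the Canonical builds; adjust the set to whatever chromosomes you actually need.

```python
def filter_fasta(lines, keep):
    """Yield only the FASTA records whose name is in `keep`.

    The record name is taken as the first word after '>' on the header line.
    """
    keeping = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keeping = name in keep
        if keeping:
            yield line


# Hypothetical selection: autosomes plus chrX and chrM, with UCSC-style names.
KEEP = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrM"}

# Example use with hypothetical file names:
# with open("hg19.fa") as src, open("hg19_custom.fa", "w") as dst:
#     dst.writelines(filter_fasta(src, KEEP))
```

The custom FASTA can then be uploaded and used as a reference from the history, which avoids the built-in index entirely.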

Out of curiosity: why do you use the Canonical genome from hg19, released in 2009? Can you use hg38 Canonical?

Kind regards,
Igor

numbergirl86 · August 19, 2024, 8:28am

Thanks,

I decided to use the Canonical Female reference. I wanted to use hg19 to create VCFs to merge with a couple of other VCFs I have that were aligned to hg19.

jennaj · August 26, 2024, 8:39pm

Update 1

Reproduced the BWA-MEM index problems with two of the genomes

hg19 Canonical
hg38 Canonical Female

More soon…

Hi @numbergirl86 and thanks @igor

I’m running some tests in this history to see if I can also reproduce the issue with the hg19 Canonical index. Then we can proceed to a correction.

https://usegalaxy.org/u/jen-galaxyproject/h/test-bwa-mem-human-hg19-hg38-indexes

And, for variant analysis, the hg19 Canonical Female is usually a scientifically preferred choice, along with the hg38 Canonical Female. But all of these should be technically working!!

For the merging, you could check which reference was already used for the other VCFs. That might help with the merging step too, e.g. deciding which header to attach to the final VCF. All should be based on the same genome assembly backbone (same actual base pairs) for anything retained, but you can manipulate which specific chromosomes to keep, or how those chromosomes are labeled.
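One way to see how the chromosomes are labeled in each file is to compare the contig lines in the VCF headers. The sketch below is only an illustration with made-up function names. It assumes uncompressed VCF text and that each header declares its contigs with `##contig=<ID=...>` lines, which not every VCF does.

```python
import re

# Matches header lines such as: ##contig=<ID=chr1,length=249250621>
CONTIG_LINE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_ids(vcf_lines):
    """Return the contig IDs declared in a VCF header, in file order."""
    ids = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            break  # end of the header, data records follow
        match = CONTIG_LINE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def contig_label_differences(vcf_a_lines, vcf_b_lines):
    """Return (labels only in A, labels only in B) as sorted lists."""
    a, b = set(contig_ids(vcf_a_lines)), set(contig_ids(vcf_b_lines))
    return sorted(a - b), sorted(b - a)


# Example use with hypothetical file names:
# with open("mine.vcf") as a, open("other.vcf") as b:
#     print(contig_label_differences(a, b))
```

If one file uses labels like chr1 and the other uses 1, the labels need to be harmonized before merging, even when both were aligned to the same assembly.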

More soon as those tests complete and I investigate.

jennaj · August 30, 2024, 5:46pm

I’ve ticketed the issue here: BWA-MEM corrupted indexes · Issue #53 · galaxyproject/idc · GitHub
