Troubleshooting BWA-MEM2 resources under Docker Galaxy

Ann_Holtz-Morris · September 20, 2023, 8:11pm

Hi Jenna,
I ran into this same error.
With fastp-cleaned data, I can use bwa-mem2 to align a dataset against 1 or 2 genes of the hg38 reference. I then tried running the 48 pairs in the collection against the single gene, and that works. But when I scaled up to the complete genome, I got the same error (below). The difference is that I’m running in Docker Desktop, using the latest bgruening/galaxy-stable Docker image on Ubuntu 20.04 LTS, with 32 GB RAM, a 4 TB SSD, and two 1 TB hard drives. My system says I’m using half the RAM and 10 of the cores.

Details

Execution resulted in the following messages:

Fatal error: Exit code 1 ()

Tool generated the following standard error:

Looking to launch executable “/export/tool_deps/_conda/envs/mulled-v1-88bfe9d3fb5d8ab3673a5b08b613f2c0d466656f329fd172728c59fa3917261d/bin/bwa-mem2.avx2”, simd = .avx2
Launching executable “/export/tool_deps/_conda/envs/mulled-v1-88bfe9d3fb5d8ab3673a5b08b613f2c0d466656f329fd172728c59fa3917261d/bin/bwa-mem2.avx2”
[bwa_index] Pack FASTA… 14.26 sec
* Entering FMI_search
init ticks = 262953660145
ref seq len = 6418597448
binary seq ticks = 147693711887
Allocation of 47.82 GB for suffix_array failed.
Current Allocation = 53.80 GB

Regards,
Ann

jennaj · September 20, 2023, 9:24pm

Hi @Ann_Holtz-Morris

What is your reference genome? Or is it an exome? If there is a public link to the fasta data, please share it for context; it may also point to a potential workaround.

And, you are currently using the Custom genome function, correct? A fasta file from the history?

Indexing the fasta ahead of time is probably the solution. The indexing step can be computationally expensive, more so than the actual alignment step, and we had some trouble with BWA-MEM2 indexing too (close to being resolved). A pre-built index would also avoid spending compute time recreating the index each time you align against that reference.
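Outside of Galaxy, the "index once, align many times" idea can be sketched as two separate command lines. This is only an illustration: the file names and the index prefix are hypothetical, and inside Galaxy the same result would normally come from a Data Manager or a pre-computed index rather than from running these by hand. The indexing step still needs enough memory wherever it runs.

```python
import shlex

def index_command(fasta, prefix):
    """Build the one-time indexing command (run once per reference)."""
    return ["bwa-mem2", "index", "-p", prefix, fasta]

def align_command(prefix, reads_1, reads_2, threads=10):
    """Build a paired-end alignment command that reuses the existing index."""
    return ["bwa-mem2", "mem", "-t", str(threads), prefix, reads_1, reads_2]

# Hypothetical paths, for illustration only.
index_cmd = index_command("hg38.fa", "indexes/hg38")
align_cmd = align_command("indexes/hg38", "sample_R1.fastq.gz", "sample_R2.fastq.gz")

# Print the commands instead of running them, so the sketch is safe to try.
print(shlex.join(index_cmd))
print(shlex.join(align_cmd))
```

With a stored index, each of the 48 pairs only pays for the alignment step.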

This looks very similar to the errors we had when attempting to index the human genome originally. The root problem was lack of memory on the cluster node where the job ran. I don’t recall the details, but we can find those if needed, e.g. how the memory scales, for resource estimates.
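As a rough check on the numbers in the error above, the failed allocation can be reproduced from the reported reference length. This is a minimal sketch, assuming 8 bytes per suffix array entry and that the log's "GB" means GiB (2^30 bytes); both are inferred from the log figures lining up, not taken from BWA-MEM2 documentation.

```python
GIB = 1024 ** 3  # bytes per GiB

def suffix_array_gib(ref_seq_len, bytes_per_entry=8):
    """Estimate suffix array size in GiB (8 bytes per entry is an assumption)."""
    return ref_seq_len * bytes_per_entry / GIB

ref_seq_len = 6418597448   # "ref seq len" from the error log
host_ram_gib = 32          # RAM reported for the Docker host

needed = suffix_array_gib(ref_seq_len)
print(f"suffix array alone: {needed:.2f} GiB")
print(f"fits in {host_ram_gib} GiB of RAM: {needed <= host_ram_gib}")
```

Under those assumptions the suffix array alone comes to about 47.82 GiB, matching the failed allocation in the log and exceeding the 32 GB on the host, before counting anything else the indexer has already allocated.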

Data Managers: how to index local data and how to incorporate pre-computed indexes hosted at public servers: https://training.galaxyproject.org/training-material/search2?query=cvmfs

Working group’s repository for indexing data (new!): galaxyproject/idc on GitHub (Simon's Data Club - Reference data for Galaxy servers)
