LUMPY index file error -- Add Read Groups to your BAM

Erin_Bredeweg · December 17, 2024, 5:40am

I have tried many permutations of running LUMPY on usegalaxy.eu, including using mapped files from both BWA-MEM and BWA-MEM2 for LUMPY pre-processing, ensuring that output files had content, and ensuring they had readgroups. These were all suggested as previous troubleshooting strategies in this forum.

The repeated error I get is:

[E::idx_find_and_load] Could not retrieve index file for 'f0.bam'
Traceback (most recent call last):
  File "/data/jwd02f/main/076/742/76742719/configs/tmpuayovkvu", line 31, in
    known_rg_records[(rg_record['ID'], rg_record['SM'])] = bam_file

Is there a possibility that the index file is not being correctly called? The BWA-MEM output has a .bai file associated with it. However, the ‘LUMPY preprocessing: Collated and groomed on collection ##’ does not appear to have a .bai file. I have tried each of these as the BAM input data set with the same error about the index file. Any help to correct this error would be appreciated. Thanks!

jennaj · December 17, 2024, 10:40pm

Welcome, @Erin_Bredeweg

I ran some quick tests with really small testing data and these tools seem to be working as expected. I’ve shared that here if you would like to review and see if you can notice any differences between your run and my tests.

https://usegalaxy.eu/u/jenj/h/test-lumpy-and-lumpy-preprocessing

BAM data in Galaxy always has an index. That index is created when a BAM is created (mapping job) and is re-created when a BAM is loaded (Upload tool). So, you don’t need to upload the bai file or manage how tools find and use it.

That means I am a bit curious about how you are trying to use these tools (and any others that process a BAM input). Try selecting just the BAM file itself from the history as the input dataset. If you click on the little disc-icon to download the file, you’ll see that the index is there and linked in but you don’t need to handle it specially for other uses.

The sort order may have been a problem, too, but maybe the other error presented first, so we are not getting a message about that yet.

Summary of things to know about BAMs in Galaxy

Upload just the file.bam data. Galaxy will create the file.bam.bai index.
The file.bam.bai index will not show up as a separate dataset in the history but you can download it from the disc-icon for the parent file.bam.
BAM data are always coordinate sorted by Galaxy at Upload. Always let Galaxy “guess” the datatype. You will be expecting bam to be assigned. If you get something else, something went wrong! Is your file truncated?
Use tools in Galaxy to manipulate the bam more. Example: use Convert BAM to queryname-sorted BAM to create a qname_sorted.bam (“queryname”) sorted file, and a new datatype will be assigned by that tool.
Most simple format conversions can be found using the pencil-icon too.
Click on the eye-icon to see a peek view of the BAM header for quick review or for troubleshooting reasons.
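
To make the sort-order point concrete, here is a minimal sketch of reading the sort order from BAM header text, such as what the eye-icon peek shows or what `samtools view -H file.bam` prints. The function name and the example header (including the sequence name `chr1`) are hypothetical and only for illustration.

```python
def sort_order(header_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                key, _, value = field.partition(":")
                if key == "SO":
                    return value
    return None


# Hypothetical header text for illustration only.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order(example_header))  # coordinate
```

A dataset with the bam datatype should report `coordinate` here, while a queryname-sorted file should report `queryname`.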

What to try now

Upload the BAM again if needed
Sort as needed
Rerun this tool
If you still get an error, you can share back that history and we can troubleshoot more. See the banner at this forum for how to do that, or see here How to get faster help with your question. You can copy/paste that share link back in a reply to this topic.

Please let us know how this goes, and I hope this helps!

Erin_Bredeweg · December 18, 2024, 1:42pm

Thanks for looking into the error. I’m still having the same problem. The .bam file I am using is output from BWA-MEM2, mapping a cache of reads from SRA against a reference genome (Neurospora crassa OR74A in this case). I tried running the BWA-MEM2 output file through samtools sort, to see if that would change the LUMPY result. As before, LUMPY preprocessing completed and produced output with content, and I used these files as input for LUMPY. The error and the history that generates it are included below:
[E::idx_find_and_load] Could not retrieve index file for 'f0.bam'
Traceback (most recent call last):
  File "/data/jwd05e/main/076/776/76776175/configs/tmpqflzsz1l", line 30, in
    for rg_record in ibam.header['RG']:
  File "pysam/libcalignment"

history: Galaxy, and the latest attempt is item #282.

Any additional guidance would be welcome. Thank you!

jennaj · December 20, 2024, 8:43pm

Hi @Erin_Bredeweg

Ok, thanks for sharing. It seems I missed the clue in your original error. The problem is that no read group was specified when running BWA-MEM2. It is pretty common for variant calling tools to expect one, as it seems you know, but I don’t see read groups actually applied in some of your BAMs.

This is the rerun-icon form for the upstream mapping job for the last error in your history. The mapping job listed next in the history is the same way. I didn’t check the others but you could if those also have problems in later steps.
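
One way to confirm this diagnosis is to look for `@RG` lines in the BAM header. The sketch below assumes the header text has already been obtained (for example from the eye-icon peek or `samtools view -H`). It checks for the `ID` and `SM` tags because those are the two keys the traceback shows the LUMPY wrapper script reading. The function names and example headers are hypothetical.

```python
def read_groups(header_text):
    """Return one dict of TAG -> value per @RG line in SAM header text."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        record = {}
        for field in line.split("\t")[1:]:
            key, _, value = field.partition(":")
            record[key] = value
        groups.append(record)
    return groups


def read_group_problems(header_text):
    """List reasons a header would fail a lookup of rg['ID'] and rg['SM']."""
    groups = read_groups(header_text)
    if not groups:
        return ["no @RG line in header"]
    problems = []
    for rg in groups:
        for tag in ("ID", "SM"):
            if tag not in rg:
                problems.append(f"@RG {rg.get('ID', '?')} lacks {tag}")
    return problems


# Hypothetical headers for illustration only.
no_rg = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
with_rg = no_rg + "@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\n"
print(read_group_problems(no_rg))    # ['no @RG line in header']
print(read_group_problems(with_rg))  # []
```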

You can correct this in two ways:

Rerun BWA-MEM2 and use the option

Set read groups information?

Or, run a tool like

AddOrReplaceReadGroups add or replaces read group information
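
For anyone working outside Galaxy, the two options above can be sketched as command lines. This is an illustration under stated assumptions, not what the Galaxy tool forms run internally: it uses the `-R` option of `bwa-mem2 mem` for the first option and swaps in `samtools addreplacerg` as a command-line stand-in for the second. The helper names, file names, and the read group values (`sample1`, `ILLUMINA`) are hypothetical. The code only builds the argument lists; it does not run anything.

```python
def rg_header_line(rg_id, sample, platform="ILLUMINA"):
    """Build an @RG line with literal backslash-t separators.

    The tools expect the two characters backslash and t between fields
    and convert them to real tabs themselves.
    """
    return "\\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"])


def bwa_mem2_command(reference, reads_1, reads_2, rg_line):
    """Option 1: set the read group while mapping."""
    return ["bwa-mem2", "mem", "-R", rg_line, reference, reads_1, reads_2]


def addreplacerg_command(in_bam, out_bam, rg_line):
    """Option 2: add a read group to an existing BAM."""
    return ["samtools", "addreplacerg", "-r", rg_line, "-o", out_bam, in_bam]


# Hypothetical sample and file names for illustration only.
rg = rg_header_line("sample1", "sample1")
print(rg)
print(bwa_mem2_command("ref.fa", "reads_1.fq", "reads_2.fq", rg))
print(addreplacerg_command("mapped.bam", "mapped.rg.bam", rg))
```

Either list could be passed to `subprocess.run` on a system where the tools are installed. In Galaxy itself, the same information is entered through the tool form instead.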

Then, one final item is that you do not need to coordinate sort after mapping since that is already the “automatic” sort order used in Galaxy whenever the bam datatype is assigned. You can click on the eye icon on the dataset to peek at the headers to confirm. Maybe you were just troubleshooting here which is fine of course.

Please give that a try to see if it is enough! I had some trouble accessing some of the files and I’m not sure what exactly was happening, maybe sharing permissions. So, just as a gentle reminder, try not to interact with hidden datasets or hide data yourself. That tab is used to nest data into collections “automatically”, and while you will have access, some strange situations can come up if the status is directly adjusted (but you can look/review all you want!).

Hope this helps but let us know!

bjoern.gruening · December 28, 2024, 9:33pm

@jennaj Do we need to improve the tool help of lumpy? Or should we tag this question so that it appears in the tool form?

Erin_Bredeweg · December 30, 2024, 9:59pm

Thank you so much for the guidance. I re-mapped my reads in BWA-MEM2 using read groups, and the resulting file ran correctly through LUMPY. It would be very helpful to include that requirement either in the guidance for running LUMPY or in a forum topic tag, as it isn’t stated anywhere I found.

As for the hidden data–these datasets were pulled in from SRA using a galaxy tool rather than uploading them, so I didn’t originally intend to designate them as hidden.

Thanks again for your help!
