I have tried many permutations of running LUMPY on usegalaxy.eu, including using mapped files from both BWA-MEM and BWA-MEM2 for LUMPY pre-processing, ensuring that output files had content, and ensuring they had readgroups. These were all suggested as previous troubleshooting strategies in this forum.
The repeated error I get is:
[E::idx_find_and_load] Could not retrieve index file for 'f0.bam'
Traceback (most recent call last):
File "/data/jwd02f/main/076/742/76742719/configs/tmpuayovkvu", line 31, in
known_rg_records[(rg_record['ID'], rg_record['SM'])] = bam_file
Is there a possibility that the index file is not being correctly called? The BWA-MEM output has a .bai file associated with it. However, the ‘LUMPY preprocessing: Collated and groomed on collection ##’ does not appear to have a .bai file. I have tried each of these as the BAM input data set with the same error about the index file. Any help to correct this error would be appreciated. Thanks!
I ran some quick tests with really small testing data and these tools seem to be working as expected. I’ve shared that here if you would like to review and see if you can notice any differences between your run and my tests.
BAM data in Galaxy always has an index. That index is created when a BAM is created (mapping job) and is re-created when a BAM is loaded (Upload tool). So, you don’t need to upload the bai file or manage how tools find and use it.
That means I am a bit curious about how you are trying to use these tools (and any others that process a BAM input). Try selecting just the BAM file itself from the history as the input dataset. If you click on the little disc-icon to download the file, you’ll see that the index is there and linked in but you don’t need to handle it specially for other uses.
The sort order may have been a problem, too, but maybe the other error presented first, so we are not getting a message about that yet.
Summary of things to know about BAMs in Galaxy
Upload just the file.bam data. Galaxy will create the file.bam.bai index.
The file.bam.bai index will not show up as a separate dataset in the history but you can download it from the disc-icon for the parent file.bam.
BAM data are always coordinate sorted by Galaxy at Upload. Always let Galaxy “guess” the datatype; you should see bam assigned. If you get something else, something went wrong! Is your file truncated?
Use tools in Galaxy to manipulate the bam more. Example: use Convert BAM to queryname-sorted BAM to create a qname_sorted.bam (“queryname”) sorted file, and a new datatype will be assigned by that tool.
Most simple format conversions can be found using the pencil-icon too.
Click on the eye-icon to see a peek view of the BAM header for quick review or for troubleshooting reasons.
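As a rough illustration of what to look for in that header peek, here is a minimal Python sketch that reads the sort order from the @HD line. It assumes you have the header as plain text (for example, copied from the peek view); the function name and the example header are hypothetical and are not part of Galaxy or LUMPY.

```python
def sort_order(header_text):
    """Return the SO value from the @HD line of SAM/BAM header text, or None."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
    return None


# Hypothetical header text, made up for illustration only.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000"
print(sort_order(example_header))
```

A value of coordinate is what the bam datatype should show; queryname is what you would expect after a queryname sort.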
What to try now
Upload the BAM again if needed
Sort as needed
Rerun this tool
If you still get an error, you can share that history back and we can troubleshoot more. See the banner at this forum for how to do that, or see the topic “How to get faster help with your question”. You can copy/paste the share link in a reply to this topic.
Please let us know how this goes, and I hope this helps!
Thanks for looking into the error. I’m still having the same problem. The .bam file I am using is output from BWA-MEM2, mapping a cache of reads from SRA against a reference genome (Neurospora crassa OR74A in this case). I tried running the BWA-MEM2 output file through samtools sort, to see if that would change the LUMPY result. As before, LUMPY preprocessing completed and produced output with content, and I used these files as input for LUMPY. The error and the history that generates it are included:
[E::idx_find_and_load] Could not retrieve index file for 'f0.bam'
Traceback (most recent call last):
File "/data/jwd05e/main/076/776/76776175/configs/tmpqflzsz1l", line 30, in
for rg_record in ibam.header['RG']:
File "pysam/libcalignment"
history: Galaxy, and the latest attempt is item #282.
Any additional guidance would be welcome. Thank you!
Ok, thanks for sharing. It seems I missed the clue in your original error. The problem is that no read group was specified when running BWA-MEM2. It is pretty common for variant calling tools to expect one, as it seems you know, but I don’t see that actually applied in some of your BAMs.
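To make the read group check concrete, here is a minimal Python sketch that looks for @RG lines with ID and SM tags in BAM header text. The traceback lines suggest the wrapper script reads those two tags, but that is an assumption about why it fails, not a confirmed description of the tool. The function names and example headers are hypothetical.

```python
def read_groups(header_text):
    """Parse @RG lines from SAM/BAM header text into a list of dicts."""
    groups = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            # Split each TAG:VALUE field at the first colon only.
            fields = (f.partition(":") for f in line.split("\t")[1:])
            groups.append({tag: value for tag, _, value in fields})
    return groups


def rg_problems(header_text, required=("ID", "SM")):
    """Return a list of problems; an empty list means the read groups look usable."""
    groups = read_groups(header_text)
    if not groups:
        return ["no @RG line in header"]
    problems = []
    for i, rg in enumerate(groups):
        for tag in required:
            if tag not in rg:
                problems.append(f"@RG #{i + 1} lacks {tag}")
    return problems


# Hypothetical headers, made up for illustration only.
without_rg = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000"
with_rg = without_rg + "\n@RG\tID:run1\tSM:sampleA"
print(rg_problems(without_rg))
print(rg_problems(with_rg))
```

If the first kind of result shows up for a BAM, a downstream tool that looks up the read groups would have nothing to read, which would be consistent with the error above.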
This is the rerun-icon form for the upstream mapping job for the last error in your history. The mapping job listed next in the history is set up the same way. I didn’t check the others, but you could if those also have problems in later steps.
A tool that can add read groups to an existing BAM: AddOrReplaceReadGroups (add or replaces read group information)
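For reference, read group information ends up as an @RG line in the BAM header. A minimal Python sketch of what such a line looks like is below. The helper name, the ID and sample values, and the choice of optional tags are hypothetical; in Galaxy you would set these through the tool form rather than by hand.

```python
def rg_line(rg_id, sample, platform="ILLUMINA", library=None):
    """Build a tab-separated @RG header line with ID, SM, PL and optional LB tags."""
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"]
    if library:
        fields.append(f"LB:{library}")
    return "\t".join(fields)


# Hypothetical identifiers, made up for illustration only.
print(rg_line("run1", "sampleA"))
```

Each BAM should get its own ID, and SM should name the biological sample.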
Then, one final item: you do not need to coordinate sort after mapping, since that is already the “automatic” sort order used in Galaxy whenever the bam datatype is assigned. You can click on the eye icon on the dataset to peek at the headers to confirm. Maybe you were just troubleshooting here, which is fine of course.
Please give that a try to see if it is enough! I had some trouble accessing some of the files and I’m not sure what exactly was happening; it may have been sharing permissions. As a gentle reminder, try not to interact with hidden datasets or hide data yourself. The hidden tab is used to nest data into collections “automatically”, and while you will have access, some strange situations can come up if the status is adjusted directly (but you can look and review all you want!).
Thank you so much for the guidance. I re-mapped my reads in BWA-MEM2 using read groups, and the resulting file ran correctly through LUMPY. It would be very helpful to state that requirement either in the guidance for running LUMPY or in a forum topic tag, as it isn’t stated anywhere I found.
As for the hidden data: these datasets were pulled in from SRA using a Galaxy tool rather than uploaded, so I didn’t originally intend to designate them as hidden.