How database assignments are used to access a fasta index: custom genomes, custom builds, server indexes

I’m currently encountering some issues while running a dataset through the Qualimap BamQC tool. Specifically, I’m receiving the following errors:

  1. Fatal error: Exit code 1
  2. Input mapping file not found

Has anyone experienced similar issues or knows why this might be happening? Why would the tool not have access to the mapping file? I’ve double-checked the file paths. Any insights or suggestions would be greatly appreciated. I also want to mention that we are hosting the site on AWS if that is important.

Thank you in advance for your help!

Welcome, @Choute

Thanks for including the part about AWS! Knowing that the error is occurring on your own server is important.

Some tools use the metadata assigned to input datasets to access server-indexed reference files.

For this tool specifically, the metadata to pay attention to is the database assignment. The database key informs the tool about which fasta index (genome.fa.fai) to use during processing.
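To make the index itself less abstract: a `.fai` file, as written by `samtools faidx`, is a tab-separated text file with one line per sequence (name, length, byte offset, bases per line, bytes per line). Here is a minimal sketch that reads one into a name-to-length mapping. The function name `parse_fai` is just for illustration, and the offsets in the example text are made up; only the names and lengths matter here.

```python
def parse_fai(text):
    """Return {sequence_name: length} from the text content of a .fai file."""
    lengths = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        lengths[fields[0]] = int(fields[1])
    return lengths


# Illustrative content only: the offset column is invented for this example.
example_fai = "chr1\t248956422\t6\t60\t61\nchrM\t16569\t253105708\t60\t61\n"
index_lengths = parse_fai(example_fai)
print(index_lengths)
```

On a real server you would read the text from the actual `genome.fa.fai` that the database key points to, rather than from a string.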

That fasta index can be a global reference accessible to all users of the instance or a custom reference specific to an account.

For a global reference, that could be an index that you created with Data Managers, or an index included in a mounted CVMFS resource.

For a custom reference, the steps to create one are described at:

  • https://training.galaxyproject.org/training-material/faqs/galaxy/reference_genomes_custom_genomes.html
  • Remember that a custom reference is account-specific. This means if one person creates it, the index will only be available to them and not other users.
  • For reference genomes that will be used by multiple people, as the administrator you will need to decide whether to index the genome on the server, or to put the fasta into a common location like a Data Library and explain to users how to create the database key themselves. The first option will be much simpler for users who are collaborating, and easier for you to support.

Since the tool might not report exactly where the problem occurred, you’ll need to check the entire chain:

  1. Confirm that the input BAM has a database assignment (required)
  2. If a global reference, check that the index is mounted correctly, and that it was indexed correctly in the first place.
  3. If a custom reference, check to see if that database is defined in the account the tool was run in.
  4. I’m not sure exactly how this tool traps mismatches between the assigned database reference (meaning the actual content of that reference’s index) and the reference the input data was originally mapped against. So I would look at that if the other steps do not resolve the problem. As an example, this guide explains the technical variations in common human genome builds → Reference genomes at public Galaxy servers: GRCh38/hg38 example
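For the last check, one way to look for a mismatch outside of Galaxy is to compare the `@SQ` lines in the BAM header (for example from `samtools view -H`) against the sequence names and lengths in the fasta index. This is a rough sketch of that comparison, not what Qualimap itself does. The helper names and the example header are hypothetical, and the index is assumed to already be loaded as a name-to-length dictionary.

```python
def parse_sq_lines(header_text):
    """Return {sequence_name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # ignore @HD, @RG, @PG, and so on
        # Each field after the record type looks like TAG:VALUE
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def compare_references(bam_lengths, index_lengths):
    """List BAM sequences that are absent from the index or differ in length."""
    missing = sorted(n for n in bam_lengths if n not in index_lengths)
    wrong_length = sorted(
        n for n in bam_lengths
        if n in index_lengths and bam_lengths[n] != index_lengths[n]
    )
    return missing, wrong_length


# Hypothetical header: chrM has the hg19 length, and one scaffold is not in the index.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrM\tLN:16571\n"
    "@SQ\tSN:scaffold_extra\tLN:100\n"
)
# Assumed to come from the .fai of the assigned database (hg38 lengths here).
example_index = {"chr1": 248956422, "chrM": 16569}

missing, wrong_length = compare_references(parse_sq_lines(example_header), example_index)
print("Not in index:", missing)
print("Length differs:", wrong_length)
```

If either list is non-empty, the BAM was probably mapped against a different build or variant of the genome than the one the database key points to.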

Let’s start there! There are other items to check depending on the parameters used.