Freebayes errors out in v. 1.3.6 vs 1.3.1

Update

This was bothering me, so I decided to dig a bit more :slight_smile:

The root problem is one of these:

  1. The input BAM does not have a database assigned
  2. The input BAM has a different or mismatched database assigned
  3. The genome fasta was not originally indexed with the Data Manager data_manager_fetch_genome_dbkeys_all_fasta, or manually indexed to do the same (this involves more than just the fasta.fai index itself). Either way, the indexing must be done without conflicts such as a duplicated dbkey or missing indexes (including indexes that point to other indexes :melting_face:)

Docs: Galaxy Community Hub

The “first step” for all newly indexed genomes is very important.

  • Galaxy has a built-in list of dbkeys for known/common genomes to enable consistent labeling across all (most?) Galaxy installs. Those keys are indexed themselves for UI functions (Upload tool, assigning/changing database metadata).

    • If you create a new dbkey it is either added to that master index of all dbkeys available to all users (when using the “fetch” DM) or to just that specific user’s custom version of that master dbkeys index (when creating a custom genome build).
    • For both, the fasta index is also created since the actual sequences are now available.
  • Duplicated dbkeys lead to all sorts of issues across tools and functions and are tedious to fix. If this seems to be the problem and the Galaxy install is new, it is usually better to just start over with a fresh Galaxy base install. Then decide whether to use CVMFS, your own local indexes, or both.

    • If a dbkey already exists, use that when fetching a new genome, or expect problems.
    • If a dbkey does not exist yet, you can create a new one for your local indexes, but it must be different from any that already exist. For brand new dbkeys, the DM will run a few more steps, including updating the master dbkeys index.
  • dbkey is the technical label in files/tables for what is displayed as the database metadata in the end-user portion of the UI.

  • A known dbkey is the same reserved “key:value” pairing that data providers like UCSC also use to label specific, exact assemblies (all of it: the dbkey, the fasta title lines, AND the sequences).

  • A dbkey is what enables direct connections to/from external sites or applications. Some of those external resources also support the creation/use of custom dbkeys. So, if a key that is an exact “match” between the applications is used, useful functions like dataset displays are possible, directly.
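
A quick way to look for the duplicated-dbkey problem described above is to scan a reference table for keys that appear more than once. This is only a minimal sketch: it assumes a tab-separated, .loc-style table with the dbkey in the second column, and the function name, the example rows, and the paths are all made up for illustration. Adjust the column to match whatever table you are actually checking.

```python
from collections import Counter


def find_duplicate_dbkeys(loc_lines, dbkey_column=1):
    """Return the dbkeys that appear more than once in tab-separated lines.

    Assumption: the dbkey sits in column `dbkey_column` (0-based) and
    comment lines start with '#'. This is a hypothetical helper, not
    part of Galaxy.
    """
    counts = Counter()
    for line in loc_lines:
        # Skip blank lines and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip rows too short to contain a dbkey column.
        if len(fields) <= dbkey_column:
            continue
        counts[fields[dbkey_column]] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up example rows: unique id, dbkey, display name, fasta path.
example = [
    "# unique_id\tdbkey\tname\tpath",
    "hg38full\thg38\tHuman hg38\t/data/hg38/hg38.fa",
    "hg38copy\thg38\tHuman hg38 (second copy)\t/data/hg38_b/hg38.fa",
    "mm10\tmm10\tMouse mm10\t/data/mm10/mm10.fa",
]

print(find_duplicate_dbkeys(example))
```

With the example rows, hg38 is reported because it is registered twice. On a real server you would pass in the lines read from the table you want to audit.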

So, the specific error is produced by Galaxy and is related to data indexes. Why the two wrappers perform differently, I don’t know. But these are the relevant technical items beyond the indexes.



Hi @prao123

Maybe upgrade Freebayes to the latest version instead of some prior version?

Version 1.3.6 isn’t even hosted on some of the public servers. The most current (and recommended) version is 1.3.6+galaxy0. This is expected to be paired with the most current release of Galaxy, 22.05 (see Releases in the Galaxy Project documentation).

All versions are stored in the ToolShed (Galaxy | Tool Shed), with links out to the development GitHub repository that tracks all the changes, if you want to investigate more about exactly why 1.3.6 isn’t working. I don’t remember why, and it may not matter if you can get the more current version installed and it works.

Indexes can be tricky to produce directly. At a minimum, all distinct assemblies should have at least four core Data Managers applied, in order, with tool-specific indexes layered in afterward as needed. I am guessing that one of these steps was missed, or has a typo, or maybe an extra space, although the latter is much more common when NOT using Data Managers.

Did you know that indexes are available for local Galaxy servers through CVMFS? training-material/search?query=cvmfs

The error you had can come up when the database metadata assigned to the BAM inputs (by an upstream mapping tool) is not an exact match for the build key (also called the dbkey) used to create/label all of the related built-in indexes. You can either adjust your current indexes to be consistently named, fix typos, etc., or (easier) link in the CVMFS indexes. These include the full suite of hg19 and hg38 indexes (across tools) plus everything else you see indexed at usegalaxy.* sites. Everything is hosted from a remote file system, and only slices of data are used when called by a tool.
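
To make the "exact match" point concrete, here is a small sketch of how strict the comparison is. It is not Galaxy code: the function name and return strings are hypothetical, and it assumes that an unassigned database shows up as an empty string or "?". The idea is just that a trailing space or a case difference is enough to count as a mismatch.

```python
def diagnose_dbkey(bam_dbkey, index_dbkeys):
    """Classify how a BAM's dbkey relates to the dbkeys used by the indexes.

    Hypothetical helper for illustration only. Assumes an unassigned
    database appears as "" or "?".
    """
    if bam_dbkey in ("", "?"):
        return "unassigned"
    # The match has to be exact, character for character.
    if bam_dbkey in index_dbkeys:
        return "exact match"
    # Look for keys that differ only by whitespace or letter case.
    normalized = bam_dbkey.strip().lower()
    for key in index_dbkeys:
        if key.strip().lower() == normalized:
            return f"near miss: {bam_dbkey!r} vs {key!r}"
    return "no match"


indexed = ["hg19", "hg38"]
for candidate in ["hg38", "hg38 ", "Hg19", "?", "danRer11"]:
    print(repr(candidate), "->", diagnose_dbkey(candidate, indexed))
```

A "near miss" result is the typo or extra-space case: the key looks right in the UI, but it will not line up with the indexes until the names are made identical.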

Hope that helps!