Reference Genome in some tools - Fully indexing genomes with Data Managers

Hi @tomlodz – just to add a bit to @wm75’s reply:

To avoid problems, index reference genomes with the four primary Data Managers first (the sam index is one of them). This ensures basic functionality for the tools, and for Galaxy’s built-in functions, that do not have a tool-specific Data Manager.

More help is in the post below. It lists the four primary Data Managers that should be run on every genome, and the order that tends to work best:

These four can be run in a different order or after other tool-specific DMs are run, but that sometimes leads to problems.

  • For this genome, run the sam index, Picard, and twoBit DMs (the “fasta fetch” genome DM was already run when you indexed for the mapping tools).
  • When you index your next genome, run those four DMs first, in the recommended order, then any tool-specific DMs.
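If it helps to think of the two cases above as a checklist, here is a minimal Python sketch. It does not call Galaxy at all. The names in it are plain labels rather than real Data Manager tool IDs, "BWA-MEM index" is a hypothetical stand-in for a tool-specific DM, and the order of the list simply follows the order the DMs are mentioned in this reply, which is an assumption; use the order given in the linked post.

```python
# Plain labels for the four primary Data Managers (NOT Galaxy tool IDs).
# Assumption: this order only mirrors the order mentioned in this reply;
# replace it with the recommended order from the linked post.
PRIMARY_DMS = ["fasta fetch", "sam index", "Picard", "twoBit"]


def plan_data_managers(already_run, tool_specific=()):
    """Return the DMs still to run for one genome.

    Primary DMs that are missing come first, followed by any
    tool-specific DMs that have not been run yet.
    """
    done = set(already_run)
    primary = [dm for dm in PRIMARY_DMS if dm not in done]
    extra = [
        dm for dm in tool_specific
        if dm not in done and dm not in PRIMARY_DMS
    ]
    return primary + extra


# This genome: "fasta fetch" and a mapping index were already run.
# "BWA-MEM index" is a hypothetical example label.
print(plan_data_managers(["fasta fetch", "BWA-MEM index"]))

# Next genome: nothing run yet, one tool-specific DM wanted afterwards.
print(plan_data_managers([], tool_specific=["BWA-MEM index"]))
```

The first call lists only the primary DMs that are still missing for the current genome. The second shows the pattern for a new genome: all four primary DMs, then the tool-specific ones.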

To review how others have solved problems with creating indexes in prior Q&A, click on the “data-managers” tag I added to your post.

Thanks!
