Indexing reference genomes with Data Managers: Resources, tutorials, troubleshooting

jennaj · May 29, 2019, 8:30pm

Review this guide first. It has many details, more than my original post, and includes a video: Galaxy Community Hub

But I still strongly recommend running the DMs (Data Managers) in the order I suggest. Let that first set finish completely, one by one. After that, you can run other indexes in batch, but only if your Galaxy has enough resources allocated to do that (capacity to run multiple high-memory jobs, plus space to store the results). Some indexes can be imported pre-computed from CVMFS; the DM form will note that if available.

Ephemeris has a “data manager” mode for batch work, though it takes a bit more work to set up. If interested, see: Welcome to the Ephemeris documentation! (Ephemeris 0.10.3 documentation)
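As a rough sketch of what a batch run looks like, the snippet below only assembles the command line for the Ephemeris `run-data-managers` tool. The flag names (`--config`, `-g`, `-a`) are as I recall them from the Ephemeris docs, so treat them as assumptions and confirm with `run-data-managers --help` on your installed version. The file name, URL, and key are placeholders.

```python
import shlex


def build_run_data_managers_cmd(config_path, galaxy_url, api_key):
    """Return the argv list for one Ephemeris batch run.

    Flag names are assumptions; check `run-data-managers --help`.
    """
    return [
        "run-data-managers",
        "--config", config_path,  # YAML file listing the data managers to run
        "-g", galaxy_url,         # URL of the target Galaxy server
        "-a", api_key,            # API key of an admin account
    ]


# Placeholder values, not real settings.
cmd = build_run_data_managers_cmd(
    "data_managers.yaml", "http://localhost:8080", "YOUR_ADMIN_API_KEY"
)
print(shlex.join(cmd))
```

Printing the command instead of executing it lets you review it first. Once it looks right, it can be pasted into a shell or passed to `subprocess.run(cmd, check=True)`.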

The tools are under the “Admin” link in the top masthead. See the first section at the top: it lists the Data Manager tools along with the associated “loc” (location) files they create and other related data. I like to create one new history for data manager runs and make it active before running any of them. That way I keep all the runs together someplace I can refer back to. I tend to do this in batches and name/date the histories so they are easy to find.
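If you script any of this, a small helper can keep the history names consistent. This is only a sketch of one possible naming scheme; the function name and label format are made up for illustration and are not a Galaxy convention.

```python
from datetime import date


def dm_history_name(batch_label, run_date=None):
    """Build a dated history name; the date goes first so names sort by day."""
    run_date = run_date or date.today()
    return f"Data Managers {run_date.isoformat()} {batch_label}"


# Example with a fixed date and a hypothetical batch label.
name = dm_history_name("first index set", date(2019, 5, 29))
print(name)

# Optional, assuming BioBlend is installed and you have an API key:
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_ADMIN_API_KEY")
#   gi.histories.create_history(name=name)
```

The ISO date format (YYYY-MM-DD) keeps the histories in chronological order when the list is sorted by name.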

You will have trouble with the wheat genome as-is, no matter where you are working. The PLAZA resource is where you should get the data (genome + matched reference annotation). We have not added this to CVMFS (the core data repository) yet, but this ticket explains what we want to do and includes info and links from the data organizers: Add PLAZA (plant) genomes to test, main, and cvmfs · Issue #187 · galaxyproject/usegalaxy-playbook · GitHub