Adding the tardigrade (H. exemplaris) genome to Galaxy

Hello,

Would it be possible to add the tardigrade genome to the built-in list of genomes?

I help maintain a genome browser (Integrated Genome Browser) that has been integrated into Galaxy so that users can click “display in IGB” on data files in their History and visually analyze their data. However, this only works if they can assign a database/build to their data, which is why having the tardigrade genome already built in would be very helpful. For reference, I’ve already tested adding the genome to my local Galaxy as a custom reference genome and it worked well, so users should also be able to do this in the meantime.

There is an annotated scaffold-level genome assembly for the tardigrade H. exemplaris available on NCBI which I will link here. We ran our own script to convert the .gff3 file to .bed so that it’s formatted correctly for the browser – I’d be happy to share that file with you.
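
For readers curious what a .gff3 to .bed conversion involves, here is a minimal Python sketch. It is not the script mentioned above, and the function name `gff3_line_to_bed` is hypothetical. It assumes a simple six-column BED output and sets the score to 0 as a placeholder, since GFF3 scores do not map directly onto BED's 0-1000 range. Displaying full gene models would likely need a 12-column BED with exon blocks, which this sketch does not attempt.

```python
def gff3_line_to_bed(line):
    """Convert one GFF3 feature line to a BED6 line.

    Returns None for comments, directives, blank lines, and malformed rows.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = fields

    # GFF3 attributes are 'key=value' pairs separated by semicolons.
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    # Prefer a human-readable name, then the ID, then the feature type.
    name = attributes.get("Name") or attributes.get("ID") or ftype

    # GFF3 is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start coordinate shifts.
    bed_start = int(start) - 1

    # BED allows only '+', '-', or '.' for strand.
    bed_strand = strand if strand in ("+", "-") else "."

    return "\t".join([seqid, str(bed_start), end, name, "0", bed_strand])


if __name__ == "__main__":
    example = "scaffold0001\tmaker\tgene\t101\t200\t.\t+\t.\tID=gene1;Name=abc"
    print(gff3_line_to_bed(example))
```

The main thing to get right is the coordinate shift: a feature at 101..200 in GFF3 becomes 100..200 in BED.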

There are also quite a few more genomes that are available in our browser as well as UCSC’s Genome Browser and possibly IGV that aren’t available in Galaxy (e.g., giant panda, American eel, Candida albicans, etc.), so if you’re able to incorporate the tardigrade genome, I’d be happy to help identify more genomes to add.

Please let me know if I can provide any more information or answer any questions!

All the best,
Paige


Welcome, @paige_kulzer

We are about ready to start adding more genomes to the CVMFS resource, and that will populate the public servers and any local servers that also use it.

We don’t have a set process for organizing the additions yet, and are still testing out the process with a few genomes, but if you want to take a look at the project, this is our GitHub repository → Issues · galaxyproject/idc · GitHub

If you click into the Pull Requests, see the non-UCSC genomes for the minimum information needed per genome. This would be needed for each of your genomes, along with the URL to the fasta file for the genome.

dbkey: Ecoli-O157-H7-Sakai
description: "Escherichia coli O157-H7 Sakai"
id: Ecoli-O157-H7-Sakai
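
If you have many genomes to list, a small script can keep the entries consistent. The Python sketch below is only an illustration: the function name `format_genome_request` and the `fasta_url` / `annotation_url` key names are hypothetical and are not taken from the IDC repository. It also assumes `id` simply mirrors `dbkey`, as in the example above, and the URL is a placeholder.

```python
def format_genome_request(dbkey, description, fasta_url, annotation_url=None):
    """Render one genome request as plain 'key: value' lines."""
    lines = [
        f"dbkey: {dbkey}",
        f'description: "{description}"',
        f"id: {dbkey}",  # assumption: id mirrors dbkey, as in the example above
        f"fasta_url: {fasta_url}",  # hypothetical key name
    ]
    if annotation_url:
        # Hypothetical key name; annotation may not be indexed at all.
        lines.append(f"annotation_url: {annotation_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_genome_request(
        dbkey="Ecoli-O157-H7-Sakai",
        description="Escherichia coli O157-H7 Sakai",
        fasta_url="https://example.org/genome.fa.gz",  # placeholder, not a real link
    ))
```

The output can be pasted into a post or an issue ticket, one block per genome.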

So – we could do this two ways:

Use GitHub

  1. Create an issue ticket at that repository and list out those three pieces of information for each genome, plus the fasta URL.
  2. If something else is needed, we’ll let you know on the ticket.
  3. Then, when we are ready to accept PRs, you might be able to submit these yourself.
  4. The reference annotation is the tricky part – we are not currently keeping that in CVMFS for any genome, but that could still change. I would suggest including the reference annotation URL link along with the genome fasta. Maybe it can be used.
  5. If we can’t index the annotation, then the best recommendation is to keep it in a Data Library on your server.
  6. Or, you could possibly incorporate the annotation URL in the genome long description, if stable and unlikely to change (often true for some genomes). See the current file format for how some of the others are organized.

Use Galaxy Help

The alternative is to post back that same information here.

The point is that if you settle on the important keys in the request (listed in a post here, or in an issue ticket at GitHub), it will be faster for someone to add them in later (might be me). You could also get involved with the IDC project and create PRs directly! We are looking for community involvement in the curation project overall, but that is still in development.

And, yes, using custom genome builds is how to get a new dbkey in common between the different apps, but those are per-account. You could also index the genomes for your server locally using Data Managers. But getting all of it into CVMFS is the larger goal, with community help just like what you are doing now. :rocket:

Thank you so much for the quick reply and thorough instructions!

I’ve gone ahead and opened a new issue ticket in the GitHub repository that you linked to. That can be found here. It contains all of the information you needed (dbkey, description, id, and fasta URL), plus links to NCBI where you can find and download reference annotations.

I’m excited to see more genomes being added to Galaxy! :grinning:
