Welcome, @Justine_Jay_Santonin
There wouldn’t be a single reference genome (yet) for these species, correct? So HISAT2 wouldn’t be the best tool choice. If I am understanding your question correctly, another concern is whether a particular gene has been annotated on that genome at all. With this method, you would need to map your sample reads to a reference genome and then look for coverage that overlaps the annotation. That works well in some cases but not in others; it depends on your sample source (single species, yes; mixed species, no).
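To make the "overlapping coverage" idea concrete, here is a toy Python sketch. It is not how Galaxy tools work internally; in practice a counting tool (for example featureCounts) does this job on real BAM and GTF/GFF files. The gene names, coordinates, and the BED-style half-open coordinate convention are all made-up assumptions for illustration. Reads that land where nothing is annotated are counted separately, since that coverage is what would hint at an unannotated gene (or noise).

```python
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style; assumption for this sketch)
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def reads_per_gene(
    reads: List[Interval], genes: Dict[str, Interval]
) -> Tuple[Dict[str, int], int]:
    """Naively count aligned reads overlapping each gene (O(reads * genes)).

    Returns the per-gene counts plus the number of reads that overlap no
    annotated gene at all.
    """
    counts = {name: 0 for name in genes}
    unassigned = 0
    for read in reads:
        hit = False
        for name, gene in genes.items():
            if overlaps(read, gene):
                counts[name] += 1
                hit = True
        if not hit:
            unassigned += 1  # coverage with no annotation underneath it
    return counts, unassigned


# Hypothetical annotation and read alignments
genes = {
    "geneA": Interval("chr1", 100, 500),
    "geneB": Interval("chr1", 800, 1200),
}
reads = [
    Interval("chr1", 450, 550),    # overlaps geneA
    Interval("chr1", 600, 700),    # falls between genes
    Interval("chr1", 1150, 1250),  # overlaps geneB
    Interval("chr2", 100, 200),    # no annotation on chr2
]
print(reads_per_gene(reads, genes))
```

This only works when the reads and the annotation share the same reference coordinates, which is exactly why a mixed-species sample without a single reference is a poor fit for this approach.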
Maybe try to find a publication that does what you want to do? Then you’ll have a better idea of the reference data available for the tools that others use to explore the same kind of sample you have: how to get from reads to assembly to annotation, and maybe on to expression or diversity insights.
This is just the first hit I found, as an example. Everything in it could be done in Galaxy: some steps with the exact same tools, some with analogous tools, and all of it could be bundled into a workflow. The reads were cleaned up and assembled, then run through prediction tools, compared to known sequences, and summarized.
We have examples of analysis pathways in our tutorials.
- You could start here if you are new to Galaxy or bioinformatics → https://training.galaxyproject.org/training-material/learning-pathways/intro-to-galaxy-and-genomics.html
- Then explore pathways like this → Learning Pathway: Introduction to Galaxy and Ecological data analysis
- Or browse topics like this one directly → Ecology / Tutorial List
- And these are the most similar to the publication I used as an example → Microbiome / Tutorial List. See the top of that listing for a link to a chat that will reach the scientists who created these and who can offer the best scientific guidance for the domain.
Back to HISAT2: you can use any reference genome you want with this tool by supplying it as a Custom Reference Genome. It works just like one of the native indexes we host on the server. You will need to locate the public data provider that hosts the genome, load it into Galaxy, prepare the format, and then you’ll be ready to go. If you run into technical problems, we can help you here at the forum.
- https://training.galaxyproject.org/training-material/faqs/galaxy/reference_genomes_custom_genomes.html
- This is a recent topic with an example of all the data prepared in a history, ready to use with tools, including HISAT2. → Inquiry about lettuce genome - #2 by jennaj
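For the "prepare the format" step, the FAQ linked above is the authority. As a rough illustration only, here is a local Python sketch of one common cleanup for a custom genome FASTA: truncating each header at the first whitespace (so identifiers stay simple and can match the chromosome names in an annotation file), rewrapping sequence lines to a fixed width, and rejecting duplicate identifiers. The function name `normalize_fasta` and the 60-base default width are assumptions; inside Galaxy you would normally do this with a FASTA normalization tool rather than a script.

```python
def normalize_fasta(text: str, width: int = 60) -> str:
    """Return FASTA text with simple headers and uniformly wrapped sequence.

    - Headers are truncated at the first whitespace (">chr1 some note" -> ">chr1").
    - Sequence lines are joined and rewrapped to `width` characters.
    - Blank lines are dropped; duplicate identifiers raise ValueError.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    records = []  # list of (identifier, [sequence lines]) pairs
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no identifier")
            records.append((fields[0], []))  # keep only the first word
        else:
            if not records:
                raise ValueError("sequence data before the first header")
            records[-1][1].append(line)

    seen = set()
    out = []
    for name, parts in records:
        if name in seen:
            raise ValueError(f"duplicate identifier: {name}")
        seen.add(name)
        out.append(">" + name)
        seq = "".join(parts)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical input with descriptive headers and uneven line lengths
example = ">chr1 example description\nACGTACGTAC\nGTAC\n\n>chr2 another note\nTTTT\n"
print(normalize_fasta(example, width=6), end="")
```

Whatever tool you use, the key check is that the identifiers in the cleaned FASTA exactly match the chromosome names used by any annotation or other datasets you plan to combine with it.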
Let’s start there, and you can explain more if I guessed poorly about what kind of data you have or what your larger goals are.