Hello, I want to use Kraken2 with the MAARJAM database as the reference database. Is there a way I could build a custom database, or could the Galaxy team install it?
Welcome, @khadija
Would this be the correct source for the database index?
- File source https://maarjam.ut.ee/ (“Download” link)
It appears to be open source, but I see a few topics on other forums about needing special permission from the authors to use it. Maybe I am misunderstanding, but the data would need to be openly available to everyone for us to be able to host it on a public Galaxy server.
The alternative would be to set up a Galaxy server and create a custom index there for your own use.
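In case it helps with the custom-index route: as far as I know, Kraken2 needs every reference sequence to map to an NCBI taxonomy ID, either through an accession it can look up or through a `kraken:taxid|<id>` tag in the FASTA header. Here is a minimal Python sketch of the tagging step. The function name, the sequence ID, and the taxid are all made up for illustration, and you would have to supply the real ID-to-taxid mapping yourself.

```python
def tag_fasta_headers(lines, taxid_by_seq_id):
    """Yield FASTA lines with '|kraken:taxid|<id>' appended to each sequence ID."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split the header into the sequence ID and an optional description.
            seq_id, _, description = line[1:].partition(" ")
            # Raises KeyError if a sequence has no taxid in the mapping.
            taxid = taxid_by_seq_id[seq_id]
            header = f">{seq_id}|kraken:taxid|{taxid}"
            if description:
                header += f" {description}"
            yield header
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical record: the ID and taxid are placeholders, not real MAARJAM data.
example = [">SEQ0001 example AMF sequence", "ACGTACGT"]
tagged = list(tag_fasta_headers(example, {"SEQ0001": 12345}))
print("\n".join(tagged))
```

The tagged file could then go through `kraken2-build --download-taxonomy --db <dir>`, `kraken2-build --add-to-library <fasta> --db <dir>`, and `kraken2-build --build --db <dir>`. Please treat that as an outline and check the Kraken2 manual before relying on it.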
Let’s ask for more advice. Hi @wm75 – what do you think?
Hi @khadija,
License-wise, I don’t see a problem with the MAARJAM sequence collection, as the downloaded sequences all seem to have GenBank IDs.
I’m no expert in fungal taxonomy, but the MAARJAM sequence collection seems to have a really narrow focus on rRNA sequences of Glomeromycota.
My naive expectation would be that the SILVA database contains most of these sequences and many more from other taxa and should almost always be a better choice than anything we could build from MAARJAM.
I checked the SILVA Search page for taxonomy Glomeromycotina and it yields 19,000 hits in their LSU (large subunit) database and almost 74,000 hits in their SSU (small subunit) database, which is more than the number of sequences I can download from MAARJAM.
I need to stress again, though, that I’m not an expert, so please feel free to correct me and/or explain why you think you need the MAARJAM sequences for your specific research.
The only thing to be aware of is that the Kraken2 SILVA database is built from “just” the SSU sequence collection of SILVA, not the LSU one, which may leave some reads unclassified depending on the source of your sequencing data.
I am working on ITS sequences of arbuscular mycorrhizal fungi (AMF). I have tried the SILVA and fungi genome (2019) databases available on Galaxy, but the major part of the sequences was classified as Ascomycota and the rest remained unclassified. For my requirements, MAARJAM is the most suitable database for AMF ITS sequences.
If possible, could you recommend any other database on the server that fits my requirements?
Well, it could be that the MAARJAM ITS sequences are what you want then, but it still depends on what your input looks like.
Since that set of sequences is so narrowly defined, anything in your input that is not a Glomeromycota ITS sequence will remain unclassified or, worse, get misclassified.
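One way to keep an eye on this is to look at the phylum-level breakdown in the Kraken2 report after each run. Below is a rough Python sketch that assumes the standard six-column, tab-separated report (percentage, clade reads, direct reads, rank code, taxid, name). The function name is made up, and the numbers and taxids in the example are invented, not real results.

```python
def phylum_breakdown(report_lines):
    """Return {name: percent} for unclassified reads and phylum-level clades."""
    breakdown = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip anything that is not a standard six-column report line.
        if len(fields) != 6:
            continue
        percent, _clade_reads, _direct_reads, rank, _taxid, name = fields
        # Rank code "U" is unclassified, "P" is phylum.
        if rank in ("U", "P"):
            breakdown[name.strip()] = float(percent)
    return breakdown


# Invented example report: percentages, read counts, and taxids are placeholders.
report = [
    " 30.00\t300\t300\tU\t0\tunclassified",
    " 70.00\t700\t0\tR\t1\troot",
    " 65.00\t650\t10\tP\t111\t      Ascomycota",
    "  5.00\t50\t5\tP\t222\t      Glomeromycota",
]
result = phylum_breakdown(report)
for name, percent in result.items():
    print(f"{name}: {percent}%")
```

If most reads still land outside Glomeromycota or stay unclassified, that would suggest the input contains a lot of non-AMF sequence.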
On a somewhat different line of thought:
Are you aware of this tutorial: Hands-on: Identifying Mycorrhizal Fungi from ITS2 sequencing using LotuS2 (Microbiome)?
Maybe this is what you are looking for?