Database for Kraken2

Hello, I want to use Kraken2 with the EuPathDB genome database. Is there a way I could customize the database, or could the Galaxy team install it? I noticed that Galaxy Australia has a EuPathDB database for this tool, but it is not up to date, so it was not useful for me. Thanks!

Hi, I am guessing you are referring to EuPathDB46? I’ll work on getting this installed on usegalaxy.org.

Thanks @nate

I made a tracking ticket that everyone can follow: Add updated EupathDBs to kraken2 tools at UseGalaxy servers · Issue #57 · galaxyproject/idc · GitHub

Yes, EuPathDB46. It would be very helpful. I’ll be waiting, thank you!

EuPathDB-46 should now be available in Kraken2 on usegalaxy.org. Please give it a try and let us know if you run into any issues.

Hi, it’s running fine, but the results do not include hits for all of the strain genomes available in EuPathDB46. The file EuPathDB46_Contents.txt (ftp://ftp.ccb.jhu.edu/pub/data/EuPathDB46/EuPathDB46_Contents.txt) lists the Trypanosoma cruzi BrazilA4 genome (TriTrypDB-46_TcruziBrazilA4) as included in EuPathDB46, but when I ran BrazilA4 reads, none of them were classified as the Brazil A4 strain. Curiously, the only classifications shown for T. cruzi were the ones that have a TaxID in NCBI (Taxonomy browser (Trypanosoma cruzi)). Could this be the problem? Thanks!

Hi @Aylla_Ermland

The index was sourced directly from the URL you show.

Would you like to share your test? I can help review the parameter settings that might be leading to your observations, and maybe find out more. How to share your history is explained in the banner at this forum, and also here: How to get faster help with your question.

You can generate the link and post it back here with your comments. Thanks! :slight_smile:

Of course, see my test at this link: Galaxy
Note that the file “SRR11803988.fastq” contains the original reads of the Trypanosoma cruzi BraA4 strain, which is included in EuPathDB-46. Thanks!

Great, thanks! I’m reviewing … I think I understand your observations, but I will ask questions if I’m not sure, if that’s OK :slight_smile: More soon!

Hi @Aylla_Ermland

Thanks for your patience! Ok, this is what is going on.

The genome is in the database, but since it doesn’t have a TaxID, it is getting lumped into the unclassified “TaxID 0”.

The part you probably care about is that all genomes without a TaxID will have their “hits” clustered into that unclassified group. Strains without a TaxID cannot be classified by this tool against the NCBI Taxonomy tree.
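
For anyone who wants to check this in their own results, here is a minimal Python sketch that tallies reads per TaxID from Kraken2's per-read output. It assumes the standard tab-separated columns (C/U status, read ID, TaxID, read length, k-mer mapping) and that the tool was run without the option that replaces TaxIDs with names. The function name `tally_taxids` and the sample records are made up for illustration.

```python
from collections import Counter


def tally_taxids(kraken_lines):
    """Count reads per TaxID in Kraken2 per-read output lines."""
    counts = Counter()
    for line in kraken_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # Assumed columns: status (C/U), read ID, TaxID, length, k-mer mapping
        status, taxid = fields[0], fields[2]
        # Unclassified reads are grouped under TaxID 0
        counts[taxid if status == "C" else "0"] += 1
    return counts


# Made-up records, for illustration only
example = [
    "C\tread1\t12345\t100\t12345:66",
    "U\tread2\t0\t100\t0:66",
    "C\tread3\t12345\t100\t12345:66",
]
print(tally_taxids(example))
```

A large count under `0` for reads that are known to come from a genome in the index would be consistent with the behavior described above.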

As a test, I downloaded the genome that the Kraken2 index used, split the first chromosome into a sort of fake fastq file, then ran the tool. This is a “self-hit” test. If you are interested, I have shared that here.
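
A self-hit test along these lines could be sketched in Python as follows. This is not the exact procedure used above, only one way to do it: take the first sequence from a FASTA file, cut it into fixed-length windows, and write each window as a FASTQ record with a constant dummy quality. The function names, the 100 bp default read length, and the quality character are all assumptions.

```python
def first_fasta_record(lines):
    """Return (name, sequence) for the first record in FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                break  # second header reached: stop after the first record
            name = (line[1:].split() or ["seq"])[0]
        elif name is not None and line:
            chunks.append(line)
    return name, "".join(chunks)


def to_fake_fastq(name, seq, read_len=100, step=100):
    """Yield FASTQ records cut from seq in fixed-length windows."""
    starts = range(0, len(seq) - read_len + 1, step)
    for i, start in enumerate(starts):
        read = seq[start:start + read_len]
        # 'I' is an arbitrary placeholder quality; the reads are not real
        yield f"@{name}_{i}\n{read}\n+\n{'I' * read_len}\n"


# Tiny made-up input, for illustration only
fasta = [">chr1 demo", "ACGTACGTAC", "GTACGTACGT", ">chr2", "TTTT"]
name, seq = first_fasta_record(fasta)
records = list(to_fake_fastq(name, seq, read_len=8, step=4))
print("".join(records))
```

To use it on a real genome, pass an open FASTA file handle to `first_fasta_record` and write the yielded records to a `.fastq` file before uploading it to Galaxy.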

The next step is probably to reach out to the assembly authors to find out the taxonomy status of the assembly. You could also try asking NCBI.

Hope this helps. Even if it is not a great answer, it confirms what you also noticed.

If there is anything I missed, please let me know!

Thanks Jenna! I will reach out to NCBI. :slight_smile:
