Kaiju database request

chuanzhai · May 22, 2026, 8:05am

Dear Galaxy Team,

I would like to run Kaiju using a reference database that includes bacteria, archaea, and eukaryotes, such as refseq_nr. Could you please help add this database? Thank you!

jennaj · May 23, 2026, 1:22am

Welcome @chuanzhai

The Kaiju indexes do not appear to be undergoing updates anymore (as of 2024). This is likely why the tool is only hosted at UseGalaxy.eu and not the other UseGalaxy servers.

Kaiju: Fast and sensitive taxonomic classification for metagenomics

Instead, please have a look at Kraken2 and related tools. We have many tutorials that can guide you through using them, and the indexes are current and come from the same public source that others use when working outside of Galaxy.

Galaxy Training Network

Tutorials including Kraken2: assign taxonomic labels to sequencing reads

Relevant Tutorials

Assembly / Decontamination of a genome assembly
Ecology / Checking expected species and contamination in bacterial isolate
Microbiome / Identification of the micro-organisms in a beer using Nanopore sequencing
Microbiome / 16S Microbial analysis with Nanopore data
Microbiome / Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition
Microbiome / Taxonomic Profiling and Visualization of Metagenomic Data
Sequence analysis / Quality and contamination control in bacterial isolate using Illumina MiSeq Data
Variant Analysis / M. tuberculosis Variant Analysis

Where we source the indexes → Kraken2 databases question - #2 by jennaj

We hope this explains the current situation and provides an alternative!

wm75 · May 25, 2026, 7:53am

@chuanzhai while the lack of DB updates will become increasingly problematic, I have now triggered installation of all the existing genomic DBs on Galaxy Europe. They should be available as DB choices in the tool from tomorrow on.

Cheers,

Wolfgang

Jon_Colman · May 25, 2026, 10:37pm

Hopefully you can get this running on Galaxy. I have wanted this for a long time, I think it gives a more accurate classification than Kraken2.

jennaj · May 26, 2026, 6:07pm

Hi @chuanzhai and @Jon_Colman

The indexes appear to be in place! Please give the tool a try!

I didn’t go through and test each, so if either of you run into problems, please share back the full log messages and inputs/parameters and we can help to investigate!

Glad this could be done!

Jon_Colman · May 28, 2026, 7:20pm

I tried running Kaiju twice, using different databases, both failed??

jennaj · June 2, 2026, 9:46pm

Ah, ok, the quick addition was worth a try!

I was able to reproduce your use case with tool test data and found another small issue as well. I’ve ticketed these here → Corrections for kaiju_kaiju 1.10.1+galaxy1 · Issue #8045 · galaxyproject/tools-iuc · GitHub.

@wm75 is out right now but he’ll see this when he returns. Maybe some part of the nr, nr_euk, and refseq indexes didn’t get replicated into the correct location for the working job directory to see it. The others are OK.

A warning: the other issue I found will need to be corrected before you can use the same options that you applied with a different index. In short, “Enable SEG low complexity filter” needs to be toggled to Yes or the job falls through to a different problem. This is technically supported by the underlying tool and I didn’t find a known issue, so it may be spurious and something else may be happening here.

Hope this helps and more next week!

Jon_Colman · June 2, 2026, 10:00pm

Yeah, I suspected some small issues. I didn’t want to spend too much time on it, as processing was slow.

wm75 · June 9, 2026, 9:23am

Ah, yes. Not surprisingly, the tool has very different memory requirements depending on the DB. Our default is only good enough for the very small viral and plasmid ones.

We’ll need to configure per-DB memory requirements. Give us a day or two to get this set up.

wm75 · June 10, 2026, 6:33am

All installed DBs on Galaxy Europe should now be working. Please report if there are remaining issues.

Jon_Colman · June 10, 2026, 6:57pm

I will give it a try!!

slghose · June 16, 2026, 7:36pm

Hello @jennaj and @wm75,

I ran Kaiju with the refseq_nr database (2024-08-13) on Galaxy Europe, and did not get any assignments to viruses or microbial eukaryotes (all assigned reads were to cellular organisms, and within that, only bacteria and archaea).

I know from running Kaiju with the nr_euk database and Kraken2 with the core_nt database that there are viruses and eukaryotes in my samples. I am wondering if the refseq_nr database on Galaxy EU is incomplete?

Thanks for your help!

wm75 · June 18, 2026, 12:24pm

@slghose I’m not sure this is a technical issue with that DB on Galaxy Europe.
The files we have for it look ok, at least superficially, and there are no complaints from the tool either.

On the other hand it is entirely possible to have hits in refseq, nr and nr_euk that you don’t get with refseq_nr. Not sure how exactly sequences are selected for inclusion in the latter, but from

RefSeq non-redundant proteins:

“Non-redundant RefSeq protein records are currently provided for archaeal and bacterial RefSeq genomes, with the exception of selected reference genomes, by the NCBI prokaryotic genome annotation pipeline.”

So this matches well with your observation, doesn’t it?

slghose · June 18, 2026, 6:17pm

@wm75 Thanks for your response. I based my assumption that the refseq_nr database for Kaiju also included some viruses and microbial eukaryotes on this Kaiju documentation that describes the databases. There it says that refseq_nr from 2023-06-17 and 2024-08-13 should contain “Protein sequences from Archaea, bacteria, and microbial eukaryotes from NCBI RefSeq non-redundant protein collection, as well as viral protein sequences from NCBI RefSeq.” I used the refseq_nr version from 2024, so I think it should have some viruses/microbial eukaryotes in it.
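
In case it helps with narrowing this down, here is a rough sketch of how the two runs could be compared outside of Galaxy. It assumes Kaiju's default (non-verbose) tab-separated output, where column 1 is C or U, column 2 is the read name, and column 3 is the NCBI taxon ID. The function names and the toy reads are made up for illustration (562 and 4932 are the taxon IDs for E. coli and S. cerevisiae), so please treat it as a starting point and not a tested workflow.

```python
import csv
from collections import Counter
from typing import Iterable


def count_taxa(lines: Iterable[str]) -> Counter:
    """Count classified reads per NCBI taxon ID in Kaiju's default output."""
    counts: Counter = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        # Column 1 is C (classified) or U (unclassified); column 3 is the taxon ID.
        if len(row) >= 3 and row[0] == "C":
            counts[row[2]] += 1
    return counts


def only_in_first(first: Counter, second: Counter) -> dict:
    """Return taxon IDs (with read counts) seen in the first run but not the second."""
    return {taxid: n for taxid, n in first.items() if taxid not in second}


# Toy stand-ins for the two Kaiju outputs (hypothetical reads, not real results).
nr_euk_run = ["C\tread1\t562", "C\tread2\t4932", "U\tread3\t0"]
refseq_nr_run = ["C\tread1\t562", "U\tread2\t0", "U\tread3\t0"]

missing = only_in_first(count_taxa(nr_euk_run), count_taxa(refseq_nr_run))
print(missing)  # {'4932': 1}
```

With real data, each list would be replaced by an open file handle on the downloaded Kaiju output. The taxon IDs that show up only in the nr_euk run could then be looked up to see whether they are all viral or eukaryotic.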