Kraken 2 core nt database and large number of unclassified reads?

Lily_ofthepond · December 29, 2024, 8:40pm

Hi there, is there a way to make the core_nt database available for Kraken 2 on the galaxy.eu server?

I have multiple metagenome samples from nanopore sequencing. After QC with Porechop and fastp, Kraken 2 with confidence = 0.5 leaves 85-90% of the reads unclassified.
Lowering the confidence threshold reduces this fraction, but it also yields unexpected results. This is a bioreactor consortium of environmental origin, so I am mostly expecting soil bacteria.
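For anyone wanting to recompute the unclassified fraction outside Galaxy: in the standard per-read Kraken 2 output, the first tab-separated column is `C` (classified) or `U` (unclassified). The sketch below assumes that format; the function name and the example lines are hypothetical, made up for illustration rather than taken from these samples.

```python
from collections import Counter


def classification_summary(lines):
    """Count classified ("C") vs unclassified ("U") reads in Kraken 2 per-read output.

    Assumes the standard tab-separated output, where the first column is
    "C" or "U". Blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        status = line.split("\t", 1)[0]
        counts[status] += 1
    total = counts["C"] + counts["U"]
    return {
        "total": total,
        "classified": counts["C"],
        "unclassified": counts["U"],
        "pct_unclassified": 100.0 * counts["U"] / total if total else 0.0,
    }


# Made-up lines in Kraken 2 output format (not real data).
example = [
    "C\tread1\t562\t1500\t562:10 0:5",
    "U\tread2\t0\t900\t0:40",
    "U\tread3\t0\t1200\t0:60",
    "U\tread4\t0\t800\t0:30",
]
print(classification_summary(example))
```

On a downloaded file, the same function can be fed the open file handle directly, e.g. `with open(path) as fh: classification_summary(fh)`, where `path` is whatever the output was saved as.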

I have tried the 8 GB, full, 2022, and 2024 prebuilt RefSeq indexes but got similar results. My understanding is that using the core_nt database would either help with this or show that the issue lies in my samples. Running Kraken 2 locally is not an option at this time. Any recommendations?

Thanks

jennaj · January 3, 2025, 11:39pm

Hi @Lily_ofthepond

The UseGalaxy.org server also hosts the EUPathDB if you want to try it and see what happens. The EU server will host it soon too, but in the meantime you can move data between servers to access the distinct indexes.

The core_nt database is likely much too large to host at a public Galaxy server but I’ve logged the request with our team anyway to see what others think. Maybe there is something special that can be done in the future at the public sites.

Request for Kraken2 core_nt index hosting at UseGalaxy · Issue #60 · galaxyproject/idc · GitHub

If this is something you want to try yourself, the limitation is not Galaxy itself but the attached cluster nodes that execute the public jobs. Those nodes are substantial, but this database index is truly large. One idea is to run Galaxy yourself (maybe the Docker version) and attach it to a cluster node that can handle the job (maybe cloud-based).

If that interests you, please see → Galaxy Platform Directory: Servers, Clouds, and Deployable Resources - Galaxy Community Hub

Hopefully this helps!

bjoern.gruening · January 5, 2025, 3:06pm

Hi,

the core_nt database is now installed on the European Galaxy server.

Ciao,
Bjoern

Lily_ofthepond · January 6, 2025, 7:59pm

Thanks for the quick support! I will give all of that a try!

Lily_ofthepond · January 7, 2025, 5:36pm

Update: I tried the same analysis with the core_nt database versus the Standard PFP, and only about 4 percentage points more of the reads were classified (from 38% unassigned to 34%), with no significant changes in taxonomic assignment. So it may not be more accurate. For both, I used confidence 0.05 and minimum hit groups 3 on pre-treated nanopore metagenome reads.
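Going from 38% to 34% unassigned is a gain of 4 percentage points; relative to the 62% already classified, that is roughly 6.5% more classified reads. If the Kraken 2 reports are downloaded, this comparison can be scripted. A minimal sketch, assuming the standard report layout (clade read count in the second column, rank code third from the end, NCBI taxid second from the end); the report lines below are made up to mirror the percentages above, not taken from the actual runs.

```python
def unclassified_percent(report_lines):
    """Percentage of reads left unclassified, computed from a Kraken 2 report.

    Uses clade read counts from the "U" line (taxid 0) and the root line
    (taxid 1). Reading rank and taxid from the end of the line also works
    for reports that carry extra minimizer columns.
    """
    unclassified = classified = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank, taxid = fields[-3].strip(), fields[-2].strip()
        if rank == "U" and taxid == "0":
            unclassified = clade_reads
        elif taxid == "1":
            classified = clade_reads
    total = unclassified + classified
    return 100.0 * unclassified / total if total else 0.0


# Made-up report excerpts (not real data).
standard_pfp = [
    "38.00\t380\t380\tU\t0\tunclassified",
    "62.00\t620\t2\tR\t1\troot",
]
core_nt = [
    "34.00\t340\t340\tU\t0\tunclassified",
    "66.00\t660\t3\tR\t1\troot",
]

before = unclassified_percent(standard_pfp)
after = unclassified_percent(core_nt)
gain_points = before - after                             # percentage points
gain_relative = 100.0 * gain_points / (100.0 - before)   # relative to classified reads
print(f"{before:.1f}% -> {after:.1f}% unclassified: "
      f"+{gain_points:.1f} points, +{gain_relative:.1f}% more classified reads")
```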

This might be coming from our samples. Does anyone have any other suggestions?

jennaj · January 7, 2025, 6:09pm

Hi @Lily_ofthepond

Hum, a third of the reads are still unassigned. It seems you could explore these two areas.

Assess the quality and content of your read samples.
- Examples are in this dedicated tutorial → Hands-on: Quality Control / Quality Control / Sequence analysis
Explore different parameter settings.
- Several tutorials in here work with Nanopore reads → Microbiome / Tutorial List
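Alongside the Galaxy QC tools, a rough per-read summary can also be computed straight from a FASTQ file. This is a minimal sketch assuming uncompressed 4-line FASTQ records with Phred+33 qualities; mean quality is computed by averaging error probabilities rather than raw Phred scores, a common convention for nanopore reads. The function names and the `min_q=10` cutoff are hypothetical choices for illustration.

```python
import math
from statistics import median


def read_stats(fastq_lines):
    """Yield (read_id, length, mean_quality) for each record of a 4-line FASTQ."""
    lines = iter(fastq_lines)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(lines).strip()
        next(lines)  # '+' separator line
        qual = next(lines).strip()
        # Phred+33: score -> error probability, average, then back to Phred.
        probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
        mean_q = -10 * math.log10(sum(probs) / len(probs)) if probs else 0.0
        yield header[1:].split()[0], len(seq), mean_q


def summarize(fastq_lines, min_q=10.0):
    """Read count, median length, and fraction of reads with mean quality below min_q."""
    stats = list(read_stats(fastq_lines))
    if not stats:
        return {"reads": 0, "median_length": 0, "frac_below_min_q": 0.0}
    return {
        "reads": len(stats),
        "median_length": median(length for _, length, _ in stats),
        "frac_below_min_q": sum(q < min_q for _, _, q in stats) / len(stats),
    }


# Two made-up records: uniform Q40 ('I') and uniform Q5 ('&').
example = [
    "@read1 runid=x", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTA", "+", "&&&&&",
]
print(summarize(example))
```

For gzipped reads, `gzip.open(path, "rt")` yields text lines that can be passed in the same way.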

My last suggestion is to reach out to the scientists of the Galaxy micro community to see if they have more ideas. The link to their chat is at the very top of the tutorials above, and I’ve cross-posted your question over there to get this started.


Thanks!

paulzierep · January 8, 2025, 9:52am

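Dear @Lily_ofthepond,
the confidence score in Kraken 2 is a per-read threshold: with confidence 0.05, a read keeps its label only if at least 5% of its k-mers support that taxon (or its descendants); otherwise the label moves up the taxonomy until the threshold is met, or the read is left unclassified. This can remove many reads, particularly from taxa poorly represented in the database, and low-abundance taxa may disappear from the results entirely. You might have many low-abundance taxa in your samples. You could try lowering the confidence score or even setting it to 0. Although Kraken 2 has a high false-positive rate, 0.02 is often a good trade-off. Here is a good discussion: Guidance on confidence score · Issue #265 · DerrickWood/kraken2 · GitHub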
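To make the per-read nature of the threshold concrete, here is a simplified sketch of the scoring described in the Kraken 2 manual: the score for a label is C/Q, where C is the number of k-mers whose LCA falls in the clade rooted at that label and Q is the number of k-mers without ambiguous nucleotides. If the score is below the threshold, the label moves to its parent; if no node qualifies, the read is unclassified. The toy taxonomy, taxids, and k-mer hits are made up, and how Kraken 2 chooses the initial label is not modeled here.

```python
# Toy taxonomy (hypothetical taxids): child -> parent; the root (1) has no parent.
PARENT = {2: 1, 100: 2, 101: 100, 102: 100}


def in_clade(taxid, clade_root, parent):
    """True if taxid is clade_root itself or one of its descendants."""
    while taxid is not None:
        if taxid == clade_root:
            return True
        taxid = parent.get(taxid)
    return False


def apply_confidence(label, kmer_taxa, parent, threshold):
    """Return the label kept after confidence filtering, or 0 if unclassified.

    kmer_taxa holds one entry per k-mer: its LCA taxid, 0 if it had no hit,
    or None if it contains an ambiguous nucleotide (excluded from Q).
    """
    informative = [t for t in kmer_taxa if t is not None]
    q = len(informative)
    node = label
    while node is not None and q > 0:
        c = sum(1 for t in informative if t != 0 and in_clade(t, node, parent))
        if c / q >= threshold:
            return node
        node = parent.get(node)  # climb one level toward the root
    return 0


# One made-up read of 20 k-mers: 3 hit species 101, 2 hit its sister 102,
# 1 hits the higher-level node 2, and 14 have no database hit.
read = [101] * 3 + [102] * 2 + [2] + [0] * 14
for threshold in (0.0, 0.05, 0.2, 0.5):
    print(threshold, apply_confidence(101, read, PARENT, threshold))
```

With these made-up numbers the read stays at species 101 for thresholds 0 and 0.05, moves up to genus 100 at 0.2, and becomes unclassified at 0.5, because no clade gathers half of its k-mers.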

You could also try MetaPhlAn, check whether it assigns more reads, and compare the results.
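One simple way to compare the Kraken 2 and MetaPhlAn profiles is to renormalize both at the same rank (e.g. genus) and compute a Bray-Curtis dissimilarity (0 means identical composition, 1 means no shared taxa), alongside the taxa reported by only one tool. This is just one possible metric, shown as a sketch; the genus names and abundances are made up, and the two tools estimate abundance differently, so some disagreement is expected even on the same sample.

```python
def compare_profiles(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles, plus unshared taxa.

    a, b: dicts mapping taxon name -> abundance (counts or percentages).
    Each profile is renormalized to sum to 1 before comparison.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both profiles need a positive total abundance")
    taxa = set(a) | set(b)
    pa = {t: a.get(t, 0) / total_a for t in taxa}
    pb = {t: b.get(t, 0) / total_b for t in taxa}
    # With proportions summing to 1, Bray-Curtis reduces to half the L1 distance.
    bc = 0.5 * sum(abs(pa[t] - pb[t]) for t in taxa)
    return {
        "bray_curtis": bc,
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }


# Made-up genus-level profiles (not real results).
kraken = {"Pseudomonas": 50, "Bacillus": 30, "Streptomyces": 20}
metaphlan = {"Pseudomonas": 60, "Bacillus": 40}
print(compare_profiles(kraken, metaphlan))
```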
