Workflow specialized for eukaryotes using metagenomes

Marius_Sanders · October 4, 2025, 7:10pm

Hello everyone,

I’m really sorry to have to ask, as I don’t want to just offload my problems, but I’m currently stuck.

I’m writing my master’s thesis and wanted to prepare some data for it.

I’m actually writing a purely literature-based paper on the influence of tire abrasion on eukaryotic microbes. I’ve now found a dataset on NCBI that uses metagenomics to collect data on precisely this topic, but the original analysis only evaluates prokaryotes. I have started trying to evaluate the eukaryotic microbes myself. However, I have several problems:

1. I have doubts about my workflow and cannot find a suitable database in Kraken2 that delivers good results. My workflow so far has been: download and extract reads as FASTQ, then fastp, then Filter with SortMeRNA, then Kraken2. I am unsure whether this is suitable for a statistical evaluation of diversity.

2. I have been looking for a better way, but I cannot find a tool that excludes all prokaryotes at the start, which would reduce the amount of data. Alternatively, is there another tool that is well suited for eukaryotes?

3. I am unsure whether the samples should first be subsampled to equal size for a scientifically sound evaluation of alpha and beta diversity in R.
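
To make the third question concrete, here is a minimal Python sketch of what "subsamples of equal size" would mean in practice: each sample is randomly subsampled without replacement to the depth of the smallest sample, and Shannon diversity is computed afterwards. The sample names, taxon names, and counts are made up for illustration, and the helper names `rarefy` and `shannon` are hypothetical. Whether this step is appropriate for a given analysis is exactly the open question; the sketch only shows the mechanics.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {taxon: read_count} table to exactly `depth` reads, without replacement."""
    # Expand the table into one entry per read, then draw `depth` of them at random.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def shannon(counts):
    """Shannon diversity index (natural log) of a {taxon: read_count} table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# Hypothetical per-sample read counts (not real data).
samples = {
    "sample_A": {"taxon_1": 500, "taxon_2": 300, "taxon_3": 200},
    "sample_B": {"taxon_1": 50, "taxon_2": 30, "taxon_3": 20, "taxon_4": 10},
}

# Use the smallest sample's total read count as the common depth.
depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}

for name, c in rarefied.items():
    print(name, sum(c.values()), round(shannon(c), 3))
```

Fixing the seed makes the subsampling reproducible; a different seed gives a different random draw for any sample larger than the common depth.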

I apologize for my novice questions, but I am only studying to become a teacher and am still unfamiliar with working with the tools and data.

EDIT: Or is it perhaps even possible to integrate a tool such as EukDetect?

igor · October 7, 2025, 3:38am

Hi @Marius_Sanders,

Can you filter out prokaryotic reads using Kraken2 with the standard database or mini-d? Enable the "Split classified and unclassified outputs" option. You are after the unclassified reads.
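
The Kraken2 option above already does the splitting for you. Purely to illustrate what "unclassified" means here, the following Python sketch reads Kraken2's standard per-read output, where the first tab-separated column is `C` or `U` and the second is the read ID, and keeps only the FASTQ records marked `U`. The function names and the tiny in-memory example data are hypothetical, and the sketch assumes single-end reads in plain four-line FASTQ records.

```python
import io


def unclassified_read_ids(kraken_output):
    """Collect IDs of reads marked 'U' (unclassified) in Kraken2 per-read output."""
    ids = set()
    for line in kraken_output:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is C/U, column 2 is the sequence ID.
        if len(fields) >= 2 and fields[0] == "U":
            ids.add(fields[1])
    return ids


def filter_fastq(fastq, keep_ids):
    """Yield four-line FASTQ records whose read ID is in `keep_ids`."""
    while True:
        header = fastq.readline()
        if not header:
            break
        seq, plus, qual = fastq.readline(), fastq.readline(), fastq.readline()
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        if read_id in keep_ids:
            yield header + seq + plus + qual


# Made-up example data standing in for real files.
kraken_text = "C\tread1\t562\t8\t562:1\nU\tread2\t0\t8\t0:1\n"
fastq_text = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCAATG\n+\nIIIIIIII\n"

ids = unclassified_read_ids(io.StringIO(kraken_text))
kept = list(filter_fastq(io.StringIO(fastq_text), ids))
print("".join(kept), end="")
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles.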

Description of Kraken2 databases: Index zone by BenLangmead

Kraken2's PlusPF database contains Standard plus protozoa and fungi. It is available in Europe, including two mini-versions, PlusPF-8 and PlusPF-16. Maybe this one can work for you directly, without read filtering. Try the mini-versions of PlusPF first: they cover the same species but contain fewer k-mers, and the results should be consistent with the corresponding full-sized database.
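
If PlusPF is used directly, the eukaryotic part of the result can be pulled out of the Kraken2 report afterwards. Below is a minimal Python sketch, assuming the standard six-column, tab-separated report (percentage, clade reads, direct reads, rank code, NCBI taxon ID, name indented by two spaces per level) and NCBI taxon ID 2759 for Eukaryota. The function name and the tiny example report are made up for illustration.

```python
def eukaryote_species_counts(report_lines, root_taxid="2759"):
    """Return {species name: clade read count} for species below Eukaryota in a Kraken2 report."""
    counts = {}
    root_indent = None  # indentation of the Eukaryota row while we are inside its subtree
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        clade_reads, rank, taxid, name = int(fields[1]), fields[3], fields[4].strip(), fields[5]
        indent = len(name) - len(name.lstrip(" "))
        # A row at the same or a shallower indentation ends the Eukaryota subtree.
        if root_indent is not None and indent <= root_indent:
            root_indent = None
        if root_indent is None:
            if taxid == root_taxid:
                root_indent = indent
            continue
        if rank == "S":
            counts[name.strip()] = clade_reads
    return counts


# Made-up mini report (numbers are not real data).
report = [
    " 50.00\t50\t0\tR\t1\troot",
    " 50.00\t50\t0\tR1\t131567\t  cellular organisms",
    " 20.00\t20\t0\tD\t2759\t    Eukaryota",
    " 20.00\t20\t0\tK\t4751\t      Fungi",
    " 12.00\t12\t12\tS\t4932\t        Saccharomyces cerevisiae",
    "  8.00\t8\t8\tS\t5476\t        Candida albicans",
    " 30.00\t30\t0\tD\t2\t    Bacteria",
    " 30.00\t30\t30\tS\t562\t      Escherichia coli",
]

euk_counts = eukaryote_species_counts(report)
print(euk_counts)
```

A table like `euk_counts`, one per sample, is the kind of input that diversity calculations would then start from.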

It is OK to have accounts on different public Galaxy servers: one account per user per server.

Maybe someone else will comment on other questions.

Kind regards,

Igor

jennaj · October 8, 2025, 9:55pm

Hi @Marius_Sanders

Have you seen our Metagenomics tutorials at the GTN training site?

Start here → Microbiome / Tutorial List

From there, I wonder if this pathway might be of interest. It explores diversity with a bit of a wider net. → Learning Pathway: Introduction to Galaxy and Ecological data analysis

Hope this helps!