GTDB-Tk Classify genomes

Does GTDB-Tk classify metagenomic contigs?

Welcome, @matamela27

Scroll down on any tool form to see its Help section. This provides a short description of the tool, sometimes sample data or examples, links to the authors' resources, and citations/publications.

You’ll also find links to tutorials if this exact tool is used in any of them. To find related tools, browse the Galaxy Training Network (GTN) for your analysis domain of interest: https://training.galaxyproject.org/

Help

What it does

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes.

This tool accepts one or more FASTA (genome) files and determines the taxonomic classification of each genome by maximum-likelihood (ML) placement. The classification workflow consists of three steps: identify, align, and classify.
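Because GTDB-Tk treats each input FASTA file as one genome, it helps to picture how contigs relate to that input. The minimal Python sketch below assumes one genome (for example, one MAG) per FASTA file, with every contig in a file treated as a piece of that genome; the function name `load_genomes` and the `fna` extension default are illustrative assumptions, not part of GTDB-Tk or Galaxy.

```python
from pathlib import Path


def load_genomes(genome_dir, extension="fna"):
    """Map each genome ID (the file stem) to the contig IDs in that file.

    Hypothetical helper: assumes one genome per FASTA file, so all contigs
    in a file are treated as parts of the same genome.
    """
    genomes = {}
    for path in sorted(Path(genome_dir).glob(f"*.{extension}")):
        contigs = []
        with path.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    # Contig ID = first whitespace-delimited token of the header.
                    contigs.append(line[1:].split()[0])
        genomes[path.stem] = contigs
    return genomes
```

With this layout, contigs binned into the same MAG are classified together as a single genome; an individual contig would only be treated as a "genome" if it were supplied as its own file.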

The identify step calls genes using Prodigal, and uses HMM models and the HMMER package to identify the 120 bacterial and 122 archaeal marker genes used for phylogenetic inference. Multiple sequence alignments (MSA) are obtained by aligning marker genes to their respective HMM model.
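The outcome of the identify step can be summarised per genome as markers found once, found more than once, or not found. The sketch below shows that bookkeeping in Python under stated assumptions: gene calling and the HMMER searches are not shown, hits are assumed to be already parsed into `(gene_id, marker_id)` pairs, and `summarise_markers` is a hypothetical helper rather than GTDB-Tk code.

```python
from collections import Counter


def summarise_markers(hits, marker_set):
    """Split the expected markers into single-copy, multi-copy, and missing.

    hits: iterable of (gene_id, marker_id) pairs (assumed pre-parsed HMM hits).
    marker_set: all marker IDs expected, e.g. the 120 bacterial markers.
    Hits to IDs outside marker_set are ignored.
    """
    copies = Counter(marker for _gene, marker in hits if marker in marker_set)
    single = sorted(m for m in marker_set if copies[m] == 1)
    multiple = sorted(m for m in marker_set if copies[m] > 1)
    missing = sorted(m for m in marker_set if copies[m] == 0)
    return {"single": single, "multiple": multiple, "missing": missing}
```

Many missing markers usually point to an incomplete genome, while many multi-copy markers can point to contamination, which are the two problems mentioned at the end of the Help text.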

The align step concatenates the aligned marker genes and filters the concatenated MSA to approximately 5,000 amino acids.
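The concatenate-then-filter idea can be sketched as follows. This is a simplified illustration, not GTDB-Tk's implementation: as far as I know the tool trims the concatenated MSA with its own predefined column mask, whereas this sketch swaps in a simple gap-fraction filter. The input shapes and function names are hypothetical.

```python
def concatenate_markers(per_marker_alignments, marker_order):
    """Concatenate per-marker aligned sequences into one row per genome.

    per_marker_alignments: {marker_id: {genome_id: aligned_seq}}
    A genome lacking a marker gets a run of gaps as wide as that marker.
    """
    genomes = sorted({g for aln in per_marker_alignments.values() for g in aln})
    rows = {g: [] for g in genomes}
    for marker in marker_order:
        aln = per_marker_alignments.get(marker, {})
        width = len(next(iter(aln.values()))) if aln else 0
        for g in genomes:
            rows[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in rows.items()}


def filter_columns(msa, max_gap_fraction=0.5):
    """Stand-in filter: drop columns whose gap fraction exceeds the limit."""
    ids = list(msa)
    length = len(msa[ids[0]])
    keep = [
        i for i in range(length)
        if sum(msa[g][i] == "-" for g in ids) / len(ids) <= max_gap_fraction
    ]
    return {g: "".join(msa[g][i] for i in keep) for g in ids}
```

Padding missing markers with gaps keeps every row the same length, which a placement tool needs.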

Finally, the classify step uses pplacer to find the maximum-likelihood placement of each genome in the GTDB-Tk reference tree. GTDB-Tk classifies each genome based on its placement in the reference tree, its relative evolutionary divergence, and/or average nucleotide identity (ANI) to reference genomes.
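The ANI part of the classify step can be pictured as a threshold test against reference genomes. The Python sketch below assumes ANI values and alignment fractions have already been computed; the 95% ANI and 0.65 alignment-fraction defaults are illustrative assumptions (95% ANI is a widely used species boundary), not values read from GTDB-Tk. The tree placement and relative evolutionary divergence (RED) logic used when no reference passes is not shown.

```python
from typing import List, Optional, Tuple


def assign_species(
    ani_hits: List[Tuple[str, float, float]],
    ani_threshold: float = 95.0,
    min_af: float = 0.65,
) -> Optional[str]:
    """Return the closest reference species passing both thresholds, else None.

    ani_hits: (species_name, ani_percent, alignment_fraction) tuples.
    """
    passing = [h for h in ani_hits if h[1] >= ani_threshold and h[2] >= min_af]
    if not passing:
        return None  # caller would fall back to placement/RED-based ranks
    return max(passing, key=lambda h: h[1])[0]
```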

Results can be affected by missing marker genes (for example, in highly incomplete genomes) or by contamination.
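One way a lack of marker genes shows up is as a mostly-gap row in the concatenated MSA. The sketch below computes the fraction of non-gap columns for one genome and applies a cutoff. GTDB-Tk filters out genomes with too few amino acids in the MSA, but the 10% cutoff and the function names here are assumptions for illustration.

```python
def fraction_aa_present(aligned_seq, gap_chars="-."):
    """Fraction of MSA columns where the genome has an amino acid, not a gap."""
    if not aligned_seq:
        return 0.0
    present = sum(ch not in gap_chars for ch in aligned_seq)
    return present / len(aligned_seq)


def enough_markers(aligned_seq, min_fraction=0.10):
    """Hypothetical filter: keep a genome if its MSA row is at least min_fraction filled."""
    return fraction_aa_present(aligned_seq) >= min_fraction
```

An individual contig typically carries only a few of the 120 or 122 markers, so its row may be almost entirely gaps and fall below such a cutoff.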


Update: the indexes have now been added! For details, see the thread "gtdbtk_classify_wf Missing Database!" (post #4 by jennaj).