I would like to run my metaSPAdes contigs through GTDB-Tk. Which binning program should I use to produce the input for GTDB-Tk? I have WGS reads from which I'm trying to identify Mycobacterium species. After pre-processing the paired reads, I mapped them against a large Mycobacterium database with Minimap and then assembled them with metaSPAdes.
We have a Galaxy tutorial with a binning example, plus a short description of what the other binners do. Maybe try a few and compare the results? You might find that one works "better" than the others for your species and data.
And finally, maybe try to find a publication that focuses on your species domain to learn what challenges others have identified and how those were accommodated. You could also reach out at the Microbiome chat at the top of the tutorial listing to ask the Galaxy scientists working in this domain if they know of a preference (someone may have a specialization, or can refer you to the best scientific forum). Actually, I’m going to cross-post this topic over there to get this started, but feel free to join that chat directly, too!
Dear @Jon_Colman,
We are currently working on a MAGs workflow (mags-individual-workflow) that uses four different binners and DAS Tool for bin refinement. Since we use a consensus approach, we do not rely on finding the best binner for a specific target; instead, we retrieve the best MAGs from all binners.
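To illustrate what a consensus approach means in practice, here is a deliberately simplified Python sketch. It is not DAS Tool's algorithm (DAS Tool scores bins from single-copy genes and iteratively re-scores them after removing contigs already used); it swaps in a common heuristic quality score, completeness minus 5 × contamination, and greedily keeps the best non-overlapping bins across all binners. The binner labels, bin names, contig IDs, quality values and the 50-point cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    binner: str           # binner that produced the bin (hypothetical labels)
    name: str             # bin identifier
    contigs: frozenset    # contig IDs assigned to this bin
    completeness: float   # percent, e.g. as estimated by CheckM
    contamination: float  # percent, e.g. as estimated by CheckM


def quality(b: Bin) -> float:
    # Heuristic score (an assumption here, NOT the DAS Tool score)
    return b.completeness - 5 * b.contamination


def select_consensus(bins, min_score=50.0):
    """Greedily keep the highest-scoring bins from all binners,
    skipping any bin that shares contigs with a bin already kept."""
    selected = []
    used = set()
    for b in sorted(bins, key=quality, reverse=True):
        if quality(b) < min_score:
            break  # remaining bins all score lower, so stop
        if used & b.contigs:
            continue  # overlaps a better bin that was already chosen
        selected.append(b)
        used |= b.contigs
    return selected


bins = [
    Bin("metabat2", "mb.1", frozenset({"c1", "c2", "c3"}), 95.0, 2.0),
    Bin("maxbin2", "mx.1", frozenset({"c1", "c2"}), 80.0, 1.0),
    Bin("concoct", "cc.4", frozenset({"c7", "c8"}), 70.0, 3.0),
    Bin("semibin", "sb.2", frozenset({"c9"}), 40.0, 1.0),
]
kept = select_consensus(bins)
print([b.name for b in kept])  # ['mb.1', 'cc.4']
```

Here mx.1 is dropped because it overlaps the better-scoring mb.1, and sb.2 falls below the cutoff, so the final set mixes bins from different binners rather than trusting any single one.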
This workflow also includes GTDB-Tk for MAG taxonomy assignment, so you may want to try it on your data. The workflow uses MEGAHIT for assembly, but you could remove this step and use your metaSPAdes assembly as input instead.
Feel free to reach out if you need any help with the workflow. If it works for you, we would be happy to add your analysis as a use case for the project where this workflow is developed. Best, Paul
This looks interesting! So I would take a set of paired reads after trimming and adapter removal and use that as the input, which should eventually give me the GTDB-Tk taxonomy?
Here is what I'm currently doing; let me know if it makes sense.
I'm working with some whole blood samples that contain numerous microorganisms, including Mycobacterium and Plasmodium. From what I can tell, I may have a couple of Mycobacterium species that aren't in the Core-nt database (they are flagged as minor contamination), and the Plasmodium species has a newer reference, published last year, that has not yet been added to the NCBI references. So I'm taking my cleaned reads, mapping them with Bowtie against references for what "appears" to be in my samples, and then using BBTools Tadpole to error-correct the mapped reads. From what I have read, I'm assuming Tadpole works well as a step before assembly?
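The two steps described in that post (keep only reads that map to the target references, then error-correct them with Tadpole) can be sketched as command lines. This Python sketch only builds the argument lists; it assumes Bowtie2 (the post just says "Bowtie"), a pre-built Bowtie2 index of the combined Mycobacterium/Plasmodium references, and hypothetical file and index names. `--al-conc-gz` writes pairs that align concordantly, with `%` replaced by 1 and 2.

```python
import shlex


def keep_mapped_cmd(index, r1, r2, prefix, threads=8):
    # Bowtie2: write only concordantly aligned pairs to <prefix>_R1/_R2;
    # the SAM output itself is discarded
    return ["bowtie2", "-p", str(threads), "-x", index,
            "-1", r1, "-2", r2,
            "--al-conc-gz", f"{prefix}_R%.fq.gz",
            "-S", "/dev/null"]


def tadpole_correct_cmd(r1, r2, out1, out2):
    # BBTools Tadpole in error-correction mode on the retained pairs
    return ["tadpole.sh", f"in={r1}", f"in2={r2}",
            f"out={out1}", f"out2={out2}", "mode=correct"]


steps = [
    keep_mapped_cmd("targets_index", "clean_R1.fq.gz", "clean_R2.fq.gz", "mapped"),
    tadpole_correct_cmd("mapped_R1.fq.gz", "mapped_R2.fq.gz",
                        "ec_R1.fq.gz", "ec_R2.fq.gz"),
]
for cmd in steps:
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to execute
```

Keeping the commands as lists rather than shell strings avoids quoting problems if file names contain spaces; note that mapping first restricts the assembly to organisms close to the chosen references.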
Is it recommended to remove the host reads first, or just run everything together?
The pipeline I shared starts from trimmed reads. We usually use fastp for trimming and adapter removal, but this is a user choice.
However, I would definitely remove host reads first (e.g., with Bowtie2), since they can affect the assembly significantly.
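A minimal sketch of that pre-processing order (fastp trimming, then host removal with Bowtie2), again written as Python functions that only build the command lines. It assumes a human host and an existing Bowtie2 index of the host genome; the index name GRCh38_index and all file names are hypothetical placeholders. `--un-conc-gz` writes the pairs that do not align concordantly to the host, which are the reads you would carry forward to assembly.

```python
import shlex


def fastp_cmd(r1, r2, out1, out2):
    # fastp: quality/adapter trimming of a read pair,
    # with adapter auto-detection enabled for paired-end data
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2,
            "--detect_adapter_for_pe"]


def host_removal_cmd(index, r1, r2, prefix, threads=8):
    # Bowtie2: align to the host index and keep the pairs that do NOT
    # align concordantly; % in the name becomes 1 and 2
    return ["bowtie2", "-p", str(threads), "-x", index,
            "-1", r1, "-2", r2,
            "--un-conc-gz", f"{prefix}_R%.fq.gz",
            "-S", "/dev/null"]


steps = [
    fastp_cmd("raw_R1.fq.gz", "raw_R2.fq.gz", "trim_R1.fq.gz", "trim_R2.fq.gz"),
    host_removal_cmd("GRCh38_index", "trim_R1.fq.gz", "trim_R2.fq.gz", "nonhost"),
]
for cmd in steps:
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to execute
```

The non-host pairs (nonhost_R1.fq.gz, nonhost_R2.fq.gz in this sketch) would then be the input for the assembly step of the workflow.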