Want to retrieve all bacterial sequences from whole genome

Hi all,

I would like to retrieve all bacterial sequences from a whole-genome dataset. All I have is an assembled FASTA file (contigs, not yet a complete genome sequence). How can I retrieve the bacterial sequences from these contigs?

Thanks
Santatra

Welcome, @santatra

I’m not sure I have understood your question completely. Could you explain a bit more about your goals and what you would like to do?

Meanwhile, I can share some analysis protocols through tutorials that may help to frame the kinds of questions we can answer here.

If you are completely new to Galaxy, this is a good place to start.

Then, this tutorial is an example of a bacterial genome assembly. It involves WGS and ONT reads.

You can find more tutorials using keywords or by navigating the training site directly.

Hope this helps! :slight_smile:

Hi @jennaj,

Thank you for your response.

I would like to know the bacterial community composition of my sample. However, my data (shotgun metagenomics reads) are still at the contig stage (more than 50 contigs per sample, in a FASTA file). Do I need to assemble these contigs into scaffolds before running the microbiome analysis, or can I retrieve all bacterial sequences from the contigs directly?

Thank you

Hi @santatra

There are a few tools you can use for metagenomic profiling. See that same training site for examples and the different tools you can try.

But in short, Kraken2 is usually a good choice for WGS reads. For amplicon data, there is Mothur. Both are covered in the examples, along with all the other little steps: data preparation, assembly (if any), and result interpretation with graph generation.
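
If the goal is to pull the bacterial contigs out of the FASTA file after classification, here is a minimal Python sketch of one possible way to do it outside Galaxy. It is not taken from the tutorials. It assumes Kraken2 was run on the contigs without `--use-names`, so the third column of the per-sequence output is a plain taxid, and that a standard report is available, with the taxid in the second-to-last column and the name in the last column, indented two spaces per taxonomic level. The function names are made up for this illustration.

```python
from typing import Iterable, Iterator, Set


def bacterial_taxids(report_lines: Iterable[str], root_name: str = "Bacteria") -> Set[str]:
    """Collect the taxids of the root_name clade from a Kraken2 report.

    Assumption: tab-separated report where the last column is the name,
    indented by depth, and the column before it is the taxid.
    """
    taxids: Set[str] = set()
    root_depth = None  # indentation of the clade root while we are inside it
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        taxid, raw_name = fields[-2].strip(), fields[-1]
        depth = len(raw_name) - len(raw_name.lstrip(" "))
        name = raw_name.strip()
        if root_depth is not None and depth <= root_depth:
            root_depth = None  # reached a sibling or ancestor: clade is over
        if root_depth is None and name == root_name:
            root_depth = depth  # entering the clade of interest
        if root_depth is not None:
            taxids.add(taxid)
    return taxids


def bacterial_contig_ids(kraken_lines: Iterable[str], taxids: Set[str]) -> Set[str]:
    """Return IDs of classified sequences whose taxid is in taxids.

    Assumption: per-sequence output columns are status (C/U), sequence ID,
    taxid, length, and k-mer mapping, with no --use-names.
    """
    ids: Set[str] = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C" and fields[2].strip() in taxids:
            ids.add(fields[1])
    return ids


def filter_fasta(fasta_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose ID (first header token) is kept."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line
```

To use it, open the report, the per-sequence output, and the contig FASTA as text files, pass the file objects to the three functions in that order, and write the lines yielded by `filter_fasta` to a new file. Contigs that Kraken2 leaves unclassified are dropped, so the result depends on the database used.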