Extracting WGS reads belonging to a particular organism

I have sequencing data from an environmental culture. It contains about three 16S rRNA sequences belonging to three different organisms. I want to extract the reads belonging to each organism. How can I do that?
Also, when I use Bowtie, the reads do not map to the reference genomes of the three organisms at all. What should I do?

Hello,

Mapping to a reference genome with Bowtie is unlikely to capture 16S hits.

For how to classify 16S reads (and for other analysis operations), please see the Metagenomics tutorials from the GTN.

Thanks!

OK, thank you. Let me look at the metagenomics tutorials.

So do you mean that even when the 16S sequences in our genome are similar to the 16S sequences in the reference genome, Bowtie won't show hits?

It is better to use a dedicated 16S reference database (example: Silva).

Possible reasons include:

  • Full reference genomes can have ribosomal regions masked.
  • 16S reads do not map specifically against a full reference genome.
  • 16S reads do not meet the minimum mapping criteria set (e.g., short length, too much variation).

But mine is not 16S sequencing data. I have whole genome sequencing data.

Thanks for clarifying, I thought this ^^ was what you had sequenced.

WGS reads can be mapped with Bowtie2 or BWA/BWA-MEM or BLASTN against a reference genome. Success involves doing some QA/QC on the reads, setting the correct parameters for the mapping tool used, and (possibly, if a built-in index is not available) making sure the custom reference genome is formatted correctly.
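Once mapping works, one way to separate reads by organism is to group the mapped read names by the reference sequence they aligned to. Below is a minimal Python sketch of that idea. It assumes the reads were mapped against a combined reference containing all three genomes and that the output is in SAM format. The reference names (`orgA_chr`, `orgB_chr`) and the example records are hypothetical placeholders, not real data.

```python
from collections import defaultdict

UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "read is unmapped"


def group_reads_by_reference(sam_lines):
    """Return {reference name: set of read names} for mapped alignments."""
    reads_by_ref = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        read_name, flag, ref_name = fields[0], int(fields[1]), fields[2]
        if flag & UNMAPPED_FLAG or ref_name == "*":
            continue  # read did not map anywhere
        reads_by_ref[ref_name].add(read_name)
    return dict(reads_by_ref)


# Hypothetical SAM records, for illustration only.
example_sam = [
    "@SQ\tSN:orgA_chr\tLN:1000",
    "@SQ\tSN:orgB_chr\tLN:1000",
    "read1\t0\torgA_chr\t10\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t16\torgB_chr\t55\t30\t8M\t*\t0\t0\tTTGCAAGC\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]
groups = group_reads_by_reference(example_sam)
for ref_name, read_names in sorted(groups.items()):
    print(ref_name, sorted(read_names))
```

The resulting read-name lists could then be used to pull the matching records out of the original FASTQ files. Reads that map equally well to more than one genome would need extra handling (for example, a mapping-quality filter), which this sketch does not attempt.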

See the tool forms for abbreviated help and links to the full documentation. Parameters available on Galaxy tool forms are for the most part the same as when using the tools on the command line.
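For reference, here is a hedged Python sketch of what the equivalent Bowtie2 command lines might look like when assembled outside Galaxy. It assumes paired-end reads. All file names and the index prefix are hypothetical, and `--very-sensitive-local` is only one preset that may be worth trying when few reads map, not a recommended setting for this dataset.

```python
import shlex


def bowtie2_build_cmd(reference_fasta, index_prefix):
    """Argument list for indexing a custom reference FASTA."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def bowtie2_map_cmd(index_prefix, reads_1, reads_2, out_sam, threads=4):
    """Argument list for mapping paired-end reads in local alignment mode."""
    return [
        "bowtie2",
        "--very-sensitive-local",  # preset; an assumption, tune as needed
        "-p", str(threads),        # number of alignment threads
        "-x", index_prefix,        # index built by bowtie2-build
        "-1", reads_1,             # forward reads
        "-2", reads_2,             # reverse reads
        "-S", out_sam,             # SAM output file
    ]


# Hypothetical file names, for illustration only.
build_cmd = bowtie2_build_cmd("three_genomes.fasta", "three_genomes")
map_cmd = bowtie2_map_cmd(
    "three_genomes", "reads_R1.fastq", "reads_R2.fastq", "mapped.sam"
)
for cmd in (build_cmd, map_cmd):
    print(shlex.join(cmd))
```

The sketch only prints the commands. Passing either list to `subprocess.run(cmd, check=True)` would execute it, provided Bowtie2 is installed.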

Many of the Galaxy tutorials include QA/QC and mapping steps. Help for Custom Genome formatting is here. Also, please see the troubleshooting resources for errors or unexpected results.
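As a quick pre-check before uploading a custom reference, a small script along these lines can flag a few common FASTA formatting problems. This is a minimal sketch. The three checks (blank lines, duplicate identifiers, description text on the title line) are assumptions about typical issues and not a complete list of formatting requirements, and the example sequences are made up.

```python
def check_fasta(text):
    """Return a list of human-readable formatting problems found in FASTA text."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if not seq_id:
                problems.append(f"line {number}: empty identifier")
            elif seq_id in seen_ids:
                problems.append(f"line {number}: duplicate identifier {seq_id!r}")
            else:
                seen_ids.add(seq_id)
            if len(parts) > 1:
                problems.append(f"line {number}: description text after identifier")
    return problems


# Hypothetical FASTA content, for illustration only.
example_fasta = (
    ">orgA_chr\n"
    "ACGT\n"
    "\n"
    ">orgB_chr some description\n"
    "ACGT\n"
    ">orgA_chr\n"
    "ACGT\n"
)
found = check_fasta(example_fasta)
for problem in found:
    print(problem)
```

An empty result means only that none of these particular problems were found.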

I have tried Bowtie, Bowtie2, and local BLAST too, but I get very few hits or no mapping at all against all three reference genomes.