I am very new to bioinformatics. I recently started a project that aims to perform taxonomic analysis on raw reads from mixed samples taken from the environment.
For example, we performed NGS (whole genome) on an insect, and we want to identify the taxonomy of every symbiotic bacterium in the raw reads.
Currently, I can use Kraken2 to perform the analysis. However, I have a few questions.
How can I focus only on the bacteria, remove the rest of the data, and make a summary table or visualization (the percentage of each bacterial strain)?
In the future, we will switch to mixed environmental samples and focus on 16S rRNA, so I would like to know how to perform the taxonomic analysis by first identifying 16S rRNA in the raw reads and then analyzing only those reads. Mixed environmental samples will contain not only bacterial reads but also eukaryotic DNA reads, so I want to identify the eukaryotic reads and set them aside for later analysis to reduce the processing time.
Following from that, what process and tools should I use?
I found the tutorial "16S Microbial Analysis with mothur". However, when I tried it with my current NGS data from insects, it spent a very long time making contigs from non-bacterial reads. I am wondering if there is any method that can get rid of the non-bacterial reads up front.
Additionally, I found other tools such as RNAmmer, barrnap, and prokka. However, these tools seem to accept only bacterial whole genomes, not mixed reads.
If you can share some experience, a good workflow, or tools to try, I would very much appreciate it.
Thank you very much for your great help.
The samples are mixed so all of that would happen at the same time, correct? Or am I not understanding correctly?
Are the jobs just taking a long time to process? If they are failing because of this on one server, you could try a different server that offers longer runtimes; UseGalaxy.eu is one choice.
The step where you are making the contigs is what will allow you to find all the reads that may belong to species that you are not interested in. A regular pipeline where you map reads to filter them out probably wouldn’t work for this step, or not that I know of. You could check literature sources for alternative ideas though.
For this, you could filter the Kraken2 results before the visualization steps.
You could do this directly on the output report with data manipulation tools, or use a tool like this to filter the entire read dataset: Krakentools: Extract Kraken Reads By ID (extracts reads that were classified by the Kraken family at specified taxonomic IDs).
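If you want to try the report-filtering route outside of Galaxy, here is a minimal sketch in Python. It assumes the standard tab-separated Kraken2 report layout (percentage, clade reads, taxon reads, rank code, NCBI taxid, and a name indented by two spaces per level) and that Bacteria has NCBI taxid 2. The function names and the sample rows are made up for illustration and are not part of Kraken2 or KrakenTools.

```python
# Sketch only: keep the Bacteria clade from a Kraken2-style report and
# summarize species as a percentage of all bacterial reads.
# Assumes the last three columns are rank code, taxid, indented name.

def filter_report_to_clade(lines, root_taxid="2"):
    """Return the split rows for root_taxid and everything nested under it."""
    kept = []
    root_depth = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name_field = fields[-1]
        taxid = fields[-2].strip()
        # Two leading spaces per taxonomic level in the name column.
        depth = (len(name_field) - len(name_field.lstrip(" "))) // 2
        if root_depth is None:
            if taxid == root_taxid:
                root_depth = depth
                kept.append(fields)
        elif depth > root_depth:
            kept.append(fields)
        else:
            break  # the report is depth-first, so the clade has ended
    return kept


def species_percentages(rows):
    """Map species name to percent of reads in the clade root (rows[0])."""
    if not rows:
        return {}
    root_reads = int(rows[0][1])
    if root_reads == 0:
        return {}
    result = {}
    for fields in rows[1:]:
        if fields[-3].strip() == "S":
            result[fields[-1].strip()] = 100.0 * int(fields[1]) / root_reads
    return result


# Made-up, simplified example rows (not real data).
SAMPLE_REPORT = [
    " 50.00\t50\t50\tU\t0\tunclassified",
    " 50.00\t50\t0\tR\t1\troot",
    " 50.00\t50\t0\tR1\t131567\t  cellular organisms",
    " 40.00\t40\t0\tD\t2\t    Bacteria",
    " 40.00\t40\t0\tP\t1224\t      Pseudomonadota",
    " 30.00\t30\t30\tS\t562\t        Escherichia coli",
    " 10.00\t10\t10\tS\t573\t        Klebsiella pneumoniae",
    " 10.00\t10\t0\tD\t2759\t    Eukaryota",
    " 10.00\t10\t10\tS\t9606\t      Homo sapiens",
]

if __name__ == "__main__":
    bacteria_rows = filter_report_to_clade(SAMPLE_REPORT)
    for name, pct in species_percentages(bacteria_rows).items():
        print(f"{name}\t{pct:.2f}")
```

With a real report you would pass the open file object instead of SAMPLE_REPORT. Note that the percentages are recomputed relative to the bacterial reads only, which is usually what you want for a "bacteria only" summary table.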
@jennaj
Thank you very much!!
Yes, my samples are taken from the environment, so many unknown species will be mixed in each sample. We want to extract the bacterial component from the sample.
I tried UseGalaxy.eu and it is working great.
I tried two alternative methods: (1) identifying rRNA first (barrnap) and performing taxonomic analysis after that, and (2) using MetaPhlAn directly to analyze the sample. I found that MetaPhlAn works well and efficiently on UseGalaxy.eu.
The Extract Kraken Reads By ID works great. Thank you very much!!
Additionally, I would like to install MetaPhlAn into my local Galaxy setup.
However, my local setup requires a proxy to access the internet, and I cannot find an option to set a global proxy in Galaxy. Without the proxy, the installation of all databases failed. For some data managers, I managed to change the Python code directly to incorporate the proxy, but for MetaPhlAn I cannot find a way to do it. Do you know how to add a proxy to Galaxy?
Finally, I found a tool that can merge the output of MetaPhlAn. Could you give me some suggestions on drawing a stacked bar graph from the merged MetaPhlAn output?
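For reference, this is roughly what I have in mind for the stacked bar graph, as a rough sketch in Python. It assumes the merged table is tab-separated, with comment lines starting with "#", a header row whose first column is the clade name, and one numeric relative-abundance column per sample; a table with an extra non-numeric column would need adjusting. The function names and the example rows are made up, and the plotting part assumes matplotlib is installed.

```python
# Sketch only: pull species-level rows out of a merged MetaPhlAn-style
# table and draw one stacked bar per sample.

def species_table(lines):
    """Return (sample_names, {species: [abundance per sample]})."""
    samples = None
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if samples is None:
            samples = fields[1:]  # first non-comment line is the header
            continue
        last_level = fields[0].split("|")[-1]
        # Keep rows whose deepest level is a species (s__), nothing deeper.
        if last_level.startswith("s__"):
            table[last_level[3:]] = [float(x) for x in fields[1:]]
    return (samples or []), table


def plot_stacked_bars(samples, table, out_path="stacked_bar.png"):
    """Draw stacked bars with matplotlib (imported here so parsing works without it)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    bottoms = [0.0] * len(samples)
    for species, values in sorted(table.items()):
        ax.bar(samples, values, bottom=bottoms, label=species)
        bottoms = [b + v for b, v in zip(bottoms, values)]
    ax.set_ylabel("Relative abundance (%)")
    ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
    fig.savefig(out_path, bbox_inches="tight")


# Made-up example rows with shortened lineages (not real data).
SAMPLE_TABLE = [
    "#example merged table",
    "clade_name\tsampleA\tsampleB",
    "k__Bacteria\t100.0\t100.0",
    "k__Bacteria|g__Escherichia\t60.0\t20.0",
    "k__Bacteria|g__Escherichia|s__Escherichia_coli\t60.0\t20.0",
    "k__Bacteria|g__Wolbachia|s__Wolbachia_pipientis\t40.0\t80.0",
]

if __name__ == "__main__":
    names, abundances = species_table(SAMPLE_TABLE)
    plot_stacked_bars(names, abundances)
```

Keeping only the species-level rows avoids counting the same reads several times, since each higher level in the table already includes the abundance of the levels below it.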