mapping RNA-seq data to a composite build of bacterial genomes

S_B · May 30, 2019, 9:07am

Hi, hope you can help. I’m trying to map some RNA-seq data generated from libraries prepared from RNA extracted from swabs taken from the mouse oral cavity. I’m expecting a mixed population (Lactobacillus, Staphylococcus, Enterococcus, etc.). Is there a way I can build a ‘mini’ database within Galaxy? Or is there another way to map the reads if you expect a mixed population? I tried ‘MetaPhlAn’ but then had to select a specific database for mapping. The main outcome I’m hoping for is to see which species are present and then run the files through a QC metrics tool to see how much of the content is mRNA and how much is rRNA (this was a pilot experiment with depletion of both human and mouse rRNA). Any advice would be great.
Thank you.
SB

jennaj · May 31, 2019, 5:32pm

Hi @S_B!

Right, these metagenomics tools map against mixed populations. Kraken is another example of a tool that works that way (but it has three options, including Bacterial). Still, it might be useful to learn what your primary target species are.

Once (or if) you know the target species, a composite custom reference genome can be created. Just make sure that each “chromosome” label (one chromosome per genome in your case) is named distinctly in the fasta file. Proper formatting for custom genomes really matters. If you plan to include reference annotation in your analysis later on, make sure it “matches” the same genome/build used for mapping.
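
If it helps, here is a minimal sketch of one way to put such a composite fasta together outside Galaxy before uploading it as a custom genome. It is only an illustration: the function name `build_composite`, the labels, and the file names are hypothetical, and it assumes plain, uncompressed fasta input. Each header is reduced to its first word and prefixed with a per-genome label, so the sequence names stay distinct even when two genomes both use something like `chr1`.

```python
def build_composite(genomes, out_path):
    """Concatenate several fasta files into one composite reference.

    genomes:  dict mapping a short label (e.g. "lacto") to a fasta path.
    out_path: path of the composite fasta to write.
    Returns the list of new sequence IDs in the order written.
    """
    new_ids = []
    seen = set()
    with open(out_path, "w") as out:
        for label, fasta_path in genomes.items():
            with open(fasta_path) as handle:
                for line in handle:
                    line = line.strip()
                    if not line:
                        # Skip blank lines, which can upset downstream tools.
                        continue
                    if line.startswith(">"):
                        # Keep only the first word of the original header
                        # and prefix it with the genome label.
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else "seq"
                        new_id = f"{label}_{seq_id}"
                        if new_id in seen:
                            raise ValueError(f"duplicate sequence name: {new_id}")
                        seen.add(new_id)
                        new_ids.append(new_id)
                        out.write(f">{new_id}\n")
                    else:
                        out.write(line + "\n")
    return new_ids
```

A call would look something like `build_composite({"lacto": "lactobacillus.fa", "staph": "staphylococcus.fa"}, "composite.fa")`, with your own labels and files swapped in. If you add annotation later, its sequence names would need the same prefixes so that they match the composite fasta.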

FAQs: https://galaxyproject.org/support

Galaxy tutorials: https://galaxyproject.github.io/training-material/