I am working with 2 different bacterial samples: one has very high GC content, while the other has GC content similar to the human genome. I mixed these samples in equal amounts and performed a WGS experiment. I have 2 questions:
1). What is the best aligner to use for prokaryotic genomes? I have most often used bowtie but is there a better one or recommended one for prokaryotic genomes?
2). From what I have read in other Galaxy chats, people have recommended concatenating the reference genomes of the 2 bacterial species and using this combined reference genome. What would be best practice here? Also, I am not doing variant calling with these datasets; I am simply collecting some NGS library quality metrics.
I would really appreciate any help or advice on this.
Bowtie2 should be fine as far as I know. You could also look at Minimap2. Maybe run both and compare?
You could also go with a very simple approach with a tool like BLASTN (megablast) if the goal is only about statistics since the output is a basic tabular file and not a BAM.
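If you go the BLASTN route, here is a rough sketch of how the tabular output could be summarized. It assumes the default 12-column tabular layout (query ID in column 1, subject ID in column 2, bit score in column 12); the function name `best_hit_counts` and the read and sequence names in the example are made up for illustration.

```python
import csv
from collections import Counter


def best_hit_counts(lines):
    """Count queries per subject, keeping only each query's highest-bit-score hit.

    Assumes the default 12-column BLAST tabular layout:
    column 1 = query ID, column 2 = subject ID, column 12 = bit score.
    """
    best = {}  # query ID -> (bit score, subject ID)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return Counter(subject for _, subject in best.values())


# Hypothetical hits: read1 matches both species, read2 matches only one.
example = [
    "read1\tspeciesA_chr\t99.0\t100\t1\t0\t1\t100\t501\t600\t1e-50\t180.0",
    "read1\tspeciesB_chr\t90.0\t100\t10\t0\t1\t100\t701\t800\t1e-30\t120.0",
    "read2\tspeciesB_chr\t100.0\t100\t0\t0\t1\t100\t11\t110\t1e-52\t185.0",
]
print(best_hit_counts(example))
```

Ties are resolved in favor of the first hit seen, which may or may not be what you want for your metrics.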
Then for the part about concatenating the reference genomes:
With a mixed genome query sample, the primary benefit of combining the reference genomes together is that those target genome sequences will compete against each other for the “best hit per query sequence” result.
If you are generating metrics about that best hit, then yes, concatenating the target genomes together seems to be what you’ll want to do… Just be aware that you could get some crossover, with reads assigned to the wrong species, if the two species are similar. Again here, you could try both ways and review the results.
Concatenating FASTA genomes is quick and will probably run in a few seconds. Customizing a workflow to run through multiple tools, or the same tool with different targets, is also pretty quick. You can create your own workflow or customize a public workflow.
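Outside of Galaxy, the concatenation step might look something like the sketch below. Prefixing each sequence header with a species label is my own suggestion, not a requirement: it just makes it easier to tell which genome a hit landed on later. The function name `concatenate_fastas` and the labels `speciesA` and `speciesB` are hypothetical.

```python
def concatenate_fastas(genomes):
    """Merge several FASTA inputs into one list of lines.

    genomes: dict mapping a species label to an iterable of FASTA lines.
    Each header line is rewritten as ">label|original_header" so that
    sequence names stay unique and traceable in the combined reference.
    """
    combined = []
    for label, lines in genomes.items():
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # drop blank lines between records
            if line.startswith(">"):
                combined.append(f">{label}|{line[1:]}")
            else:
                combined.append(line)
    return combined


# Tiny made-up sequences, only to show the header rewriting.
merged = concatenate_fastas({
    "speciesA": [">chr1 example\n", "ACGTACGT\n"],
    "speciesB": [">chr1 example\n", "GGCCGGCC\n", "\n"],
})
print("\n".join(merged))
```

For real genomes you would pass open file handles instead of lists and write the result to a new FASTA file.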