Recently I analyzed my FASTQ files using your Galaxy aligners (Bowtie2, BWA and minimap2), but very few reads mapped to the references, only 1, see below:
reference: QOD39769.1_Coxsackievirus_A6, length 6606, mapped reads 1
I then used CLC Genomics Workbench for the analysis and found that many reads mapped to the references, see below:
Maybe different parameters were used between the runs? This assumes the query, targets, and tool choices are otherwise similar.
You are running these tools through different platforms but that shouldn’t make a difference. You should be able to reproduce the results anywhere given the same tool, inputs, parameters. These platforms are mostly managing the technical details of “jobs” for you – not changing what the tools themselves produce. You are still in control of the scientific parts.
This tutorial covers how to map in general in Galaxy. Mapping
The type of your query reads will matter for some tool choices.
If you are not sure what kind of data your reads represent, this tutorial can help. NGS data logistics
Most tools, including those you mention, have all (most?) of the command-line options implemented directly on the tool form. Those tend to be described inline where the option is set, with more information down in the help section. This includes the command flags. Maybe compare the command strings between the runs you are trying to replicate?
How to find the command-string in Galaxy is described here (in short, click on the “i” job information icon in any dataset to learn what was submitted, exactly: inputs, parameters, logs). The FAQ is labeled as being for errors, but the same advice also applies for odd outputs, or really for any reason! Troubleshooting errors
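Once you have both command strings, a small script can make the differences easier to spot. This is only a rough sketch: the two command strings below are made-up placeholders (the index name `refs` and the file `reads.fastq` are hypothetical), so paste in the real strings from the job information page and from the run you are trying to replicate.

```python
import shlex

def option_diff(cmd_a, cmd_b):
    """Return the tokens found only in cmd_a and only in cmd_b, in original order."""
    tokens_a = shlex.split(cmd_a)
    tokens_b = shlex.split(cmd_b)
    only_a = [t for t in tokens_a if t not in tokens_b]
    only_b = [t for t in tokens_b if t not in tokens_a]
    return only_a, only_b

# Hypothetical command strings, for illustration only.
first_cmd = "bowtie2 -p 4 -x refs -U reads.fastq --end-to-end --sensitive"
second_cmd = "bowtie2 -p 4 -x refs -U reads.fastq --local --very-sensitive-local"

only_first, only_second = option_diff(first_cmd, second_cmd)
print("only in first run: ", only_first)
print("only in second run:", only_second)
```

This compares tokens, not option/value pairs, so a value that changes (for example a different seed length) shows up as a bare number on each side. That is usually enough to tell you where to look.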
Please give those a review first. Then, if you need more help translating a command string, we’ll need more details about the job you are trying to replicate. The details will matter, so please be specific. My guess is that either the tool choice or the target database you are mapping against is the root difference (so, at a higher level than tool alignment parameter choices), but we can confirm that.
My job is genotyping enteroviruses from patients. As you may know, there are lots of subtypes of enteroviruses, so we align the fastq files against multiple fasta references. When we used your Galaxy tools, like Bowtie2, BWA, etc., only very few reads aligned to the multi-fasta references, but with CLC Genomics Workbench, lots of reads were found to align to the multi-fasta references.
We tried to tweak the parameters of the bowtie2 tool, but no improvement could be seen.
Bowtie2 and BWA both map DNA Illumina reads, so please confirm that is also your read type.
Both of those tools were designed to report only the best same-species matches versus a single genome. Adjusting alignment parameters helps more with finding weaker hits than with finding the best hit per target sequence. The sequences in the target are expected to all be from the same genome, e.g. chromosomes or contigs, not distinct assemblies.
For cross-species mappings, a tool like BLAST+ is likely a better choice. The blastn tool in BLAST+ will map nucleotide reads to a nucleotide target, which sounds like what you are doing. Other tools in the suite can handle protein/nucleotide targets and queries in different combinations – the tool forms explain what each does, or you could read about them in detail at NCBI.
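If you want to see how the reads that did map are spread across your references, you can tally them directly from the SAM output. This is a minimal sketch, not an official tool: it assumes a plain-text SAM file with the standard column layout (FLAG in column 2, reference name in column 3, MAPQ in column 5), and the example records and reference names (`refA`, `refB`) are invented. It also counts reads with MAPQ 0, since aligners commonly assign low mapping quality to reads that fit several similar targets about equally well.

```python
from collections import Counter

def tally_sam(lines):
    """Count primary mapped reads per reference, and how many of those have MAPQ 0."""
    mapped = Counter()
    mapq_zero = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        mapped[rname] += 1
        if mapq == 0:
            mapq_zero[rname] += 1
    return mapped, mapq_zero

# Invented SAM records, for illustration only.
example_sam = [
    "@SQ\tSN:refA\tLN:100",
    "@SQ\tSN:refB\tLN:100",
    "r1\t0\trefA\t10\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\trefA\t20\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t16\trefB\t5\t1\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t256\trefB\t7\t0\t5M\t*\t0\t0\t*\t*",
]

mapped, mapq_zero = tally_sam(example_sam)
for ref in sorted(mapped):
    print(ref, "mapped:", mapped[ref], "MAPQ 0:", mapq_zero[ref])
```

For a real file, pass an open file handle instead of the list, e.g. `tally_sam(open("mapped.sam"))`, where the file name is a placeholder. BAM output would need to be converted to SAM first.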
I’m guessing that the CLC workflow you are running incorporates a tool like BLAST+, and not the other two tools for the same reasons outlined above. From what I have read in their online tutorials, details about the published workflows or bundled tools are probably accessible in the web application somewhere. If you cannot find those details, maybe ask them for help?
Your earlier summary of the hits doesn’t include alignment quality statistics, but those are likely available. Meaning, some hits will be a better quality match than others. If the CLC results were filtered for exact/best matches per target sequence, many of those hits will probably fall out. You’ll need to do that same sort of filtering with raw BLAST results as well. Bowtie2/BWA apply a “best match” type of filtering at runtime (on purpose!).
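As a rough illustration of that kind of filtering, the sketch below keeps only the best-scoring hit per read and then counts reads per reference. It assumes the standard 12-column BLAST tabular output (query id, subject id, percent identity, ..., e-value, bit score). The 90% identity cutoff, the function name, and the example rows are assumptions made for illustration, not recommended settings.

```python
from collections import Counter

def best_hits_per_reference(lines, min_identity=90.0):
    """Keep each read's highest-bitscore hit above min_identity; count reads per reference."""
    best = {}  # read id -> (bitscore, reference id)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, bitscore = float(f[2]), float(f[11])
        if pident < min_identity:
            continue
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, sseqid)
    return Counter(ref for _, ref in best.values())

# Invented tabular rows, for illustration only.
example_hits = [
    "read1\trefA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180",
    "read1\trefB\t91.0\t100\t9\t0\t1\t100\t1\t100\t1e-30\t140",
    "read2\trefB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-35\t160",
    "read3\trefA\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-5\t50",
]

counts = best_hits_per_reference(example_hits)
for ref, n in sorted(counts.items()):
    print(ref, n)
```

Ties are resolved in favor of the first hit seen, which is arbitrary. For closely related subtypes you may prefer to report tied reads separately instead of assigning them to one reference.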
Replicating a published analysis pipeline is usually a good place to start when deciding which tools to use. My suggestions are very general, and mostly about explaining why your results may differ when using these specific tools.
Galaxy has a suite of tutorials here Galaxy Training!. Many involve replicating a prior published result, and include a workflow that can be imported and customized. If these don’t cover your exact use case, you can run the custom analysis, then extract your own workflow for reuse. Searching the tool panel with the keywords (example: “genotype”) is another resource to get oriented. Full references, including tool author publication links, are at the bottom of each tool’s form.
Our reads are Illumina DNA reads. We align the reads to multiple references from the same species. I have adjusted the alignment parameters, such as sensitivity and seed size, but no improvement can be seen. Could you tell me how to adjust the alignment parameters? Thanks.