Hi @Michael_Thon
Thanks for sharing the history, very helpful!
It seems that the list of pairs collection was run in paired-end mode, and the individual files were run in single-end mode. This is relevant for the scientific result but also for the technical processing: tools handle paired-end data differently from single-end data.
Screenshots using the rerun-icon
SPAdes expects paired-end reads to be intact, meaning that if one end of a pair was discarded for quality control reasons, the other end should be removed as well. This is just a built-in sanity check SPAdes applies (anywhere, not just in Galaxy!).
However, your input read files didn’t have this type of filter applied during QA. You can see the difference in the read counts in the FastQC reports in your history. SPAdes reports the problem in the job Details logs (using the i-icon).
(paired_readers.cpp : 42) The number of left read-pairs is larger than the number of right read-pairs
0:00:08.716 0M / 268M ERROR General (paired_readers.cpp : 44) Unequal number of read-pairs detected in the following files: “/data/jwd07/main/099/418/99418033/working/paired_reads1/SRR16911041.fastq.gz” “/data/jwd07/main/099/418/99418033/working/paired_reads1/SRR16911042.fastq.gz”
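If you ever want to confirm the same mismatch outside of Galaxy, here is a minimal sketch that counts the records in two FASTQ files and compares them. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the function names are just illustrative, not part of SPAdes or Galaxy. It only compares counts; it does not check that the read names match in order.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    path = Path(path)
    # Pick a gzip-aware opener for .gz files, plain open otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path} has {n_lines} lines, not a multiple of 4")
    return n_lines // 4


def pairs_are_intact(forward_path, reverse_path):
    """Return True when both files hold the same number of reads."""
    return count_fastq_reads(forward_path) == count_fastq_reads(reverse_path)
```

For example, `pairs_are_intact("forward.fastq.gz", "reverse.fastq.gz")` (hypothetical file names) would return `False` for a pair of files like the two named in the error above.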
Backing up, it seems that Trimmomatic was also run in single-end mode! This resulted in the tool not filtering out the unpaired reads. In paired mode, the tool automatically splits out the reads that are still paired after QA and also reports the unpaired reads separately (should you want to explore them more!).
Instead, choose one of the paired modes.
What to do
If you have paired-end data, be sure to process the data as paired through all steps, unless there is a very specific reason (exploratory?) to use single-end mode. Processing paired data as single-end has scientific implications. Tools may not always fail; they may instead give odd results that do not show up until a downstream or data reduction tool is used later on.
I would strongly suggest that you consider creating a workflow for yourself that runs all of these QA steps together: FastQC → Trim → FastQC → MultiQC! Then reuse it whenever you have a new sample to process and just review the final report to assess. This can avoid the tedious clicks and parameter gotchas.
I have a small example at the EU server here → https://usegalaxy.eu/u/jenj/w/quality-control-q20-l20. You could swap out the QA tool and adjust MultiQC to parse it instead? Or maybe the preview is enough to show how this works? I think having QA as a separate workflow is super useful! Then create another small workflow for analysis steps as you develop them. A workflow can be as small as two tools, whatever is useful for you!
IWC Workflows
You can also consider using our workflows for the assembly!
This one looks like what you want to do? Shovill invokes SPAdes and provides some really nice metrics in the extra job logs, which is very useful for reruns if the first-pass results are not quite what you want yet.
Note: if a workflow consumes distinct forward and reverse reads, no problem! Use Unzip Collection to create the two inputs.
Hope this helps and let us know if this works out with the reruns! 