Issues with receiving results from FastQC in MultiQC from several collections of samples

Thanks for sharing these! I see the problem.

I notice that you have used two collection operations:

  1. Unzip collection - to get the forward and reverse reads into separate list collections.
  2. Flatten collection - to transform a paired-end collection organized by sample into a flat list with both forward and reverse reads together, but with a unique full sample name for each: the base sample name plus the forward/reverse designation, combined into the element identifier.

It is fine to keep the forward and reverse reads together. This means you can run just step 2 above and skip step 1.
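A minimal Python sketch of the identifiers Flatten collection produces, assuming the sample identifier and the forward/reverse identifier are joined with an underscore. The dictionary stands in for a paired collection and is purely illustrative; it is not a Galaxy API.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a paired collection: {sample: {"forward": dataset, "reverse": dataset}}
PairedCollection = Dict[str, Dict[str, str]]


def flatten_paired(collection: PairedCollection, sep: str = "_") -> List[Tuple[str, str]]:
    """Return (element_identifier, dataset) pairs, joining sample and direction with `sep`."""
    flat = []
    for sample, pair in collection.items():
        for direction in ("forward", "reverse"):
            flat.append((f"{sample}{sep}{direction}", pair[direction]))
    return flat


paired = {
    "sampleA": {"forward": "A_R1.fastq.gz", "reverse": "A_R2.fastq.gz"},
    "sampleB": {"forward": "B_R1.fastq.gz", "reverse": "B_R2.fastq.gz"},
}

for identifier, dataset in flatten_paired(paired):
    print(identifier, dataset)  # first line: "sampleA_forward A_R1.fastq.gz"
```

Both reads of a sample end up in one list, and every identifier is still unique, which is why the unzip step is not needed.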

Then, to avoid conflicts between the trimmed and untrimmed reads, you can modify the element identifiers for one version of the collections, or you can modify both and let the tools sort all of the data by an “inferred sample” name.
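The conflict can be illustrated with a small, hypothetical Python sketch: two lists of element identifiers (raw and trimmed) collide until a distinct suffix is appended to each. The helper names `relabel` and `find_conflicts` are made up for this example and only model the naming logic; in Galaxy the renaming itself would be done with a collection operation on the element identifiers.

```python
from typing import List, Set


def relabel(identifiers: List[str], suffix: str) -> List[str]:
    """Append a suffix to every element identifier (sampleA_forward -> sampleA_forward_raw)."""
    return [f"{ident}{suffix}" for ident in identifiers]


def find_conflicts(*collections: List[str]) -> Set[str]:
    """Return identifiers that appear more than once across all collections."""
    seen: Set[str] = set()
    dupes: Set[str] = set()
    for collection in collections:
        for ident in collection:
            if ident in seen:
                dupes.add(ident)
            seen.add(ident)
    return dupes


raw = ["sampleA_forward", "sampleA_reverse"]
trimmed = ["sampleA_forward", "sampleA_reverse"]  # trimming keeps the same identifiers

print(find_conflicts(raw, trimmed))  # both identifiers clash
print(find_conflicts(relabel(raw, "_raw"), relabel(trimmed, "_trimx")))  # empty set
```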

How this works: the base name in all of the different collection identifiers is the “sample name”. Specifically, the “sample” is the part of the element identifier before the first underscore (_) character. Content after the first _ is also interpreted, but you can control how that is handled.
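A sketch of the “inferred sample” rule described above, using identifiers shaped like the ones in this thread. `inferred_sample` and `group_by_sample` are hypothetical helpers that only model the rule; they are not the parsing code FastQC or MultiQC actually use.

```python
from collections import defaultdict
from typing import Dict, List


def inferred_sample(identifier: str) -> str:
    """The 'sample' is everything before the first underscore."""
    return identifier.split("_", 1)[0]


def group_by_sample(identifiers: List[str]) -> Dict[str, List[str]]:
    """Sort identifiers, then group them under their inferred sample name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ident in sorted(identifiers):
        groups[inferred_sample(ident)].append(ident)
    return dict(groups)


reports = [
    "sampleB_reverse_trimx",
    "sampleA_forward_raw",
    "sampleA_forward_trimx",
    "sampleA_reverse_raw",
    "sampleB_forward_raw",
]

for sample, members in group_by_sample(reports).items():
    print(sample, members)
```

Every full identifier stays unique, while all reports for one sample still land under the same base name.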

Warning! These tools have some reserved terms!! If you only append _trim or _trimmed, it will be parsed out by FastQC and you’ll lose samples in the report! So add something else. I used _raw and _trimx. Later on, MultiQC still parsed out _raw in the final report, but I was OK with that. You could experiment with different terms to see what passes through and what doesn’t.

  • _raw was added for the raw reads sample names
  • _trimx was added for the trimmed reads sample names
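Why does a reserved suffix lose samples? The sketch below assumes a tool that strips a fixed list of suffixes and keeps only one report per cleaned name. `STRIPPED_SUFFIXES` is a hypothetical list chosen for illustration, not the actual FastQC/MultiQC configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical suffixes a reporting tool might strip; NOT the real FastQC/MultiQC list.
STRIPPED_SUFFIXES = ("_trimmed", "_trim")


def clean_name(identifier: str) -> str:
    """Remove the first matching reserved suffix, if any."""
    for suffix in STRIPPED_SUFFIXES:
        if identifier.endswith(suffix):
            return identifier[: -len(suffix)]
    return identifier


def build_report(reports: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map cleaned name -> report; a later duplicate silently replaces an earlier one."""
    table: Dict[str, str] = {}
    for identifier, report in reports:
        table[clean_name(identifier)] = report
    return table


bad = [("sampleA_forward", "raw.html"), ("sampleA_forward_trim", "trimmed.html")]
good = [("sampleA_forward_raw", "raw.html"), ("sampleA_forward_trimx", "trimmed.html")]

print(len(build_report(bad)))   # the two reports collide on one cleaned name
print(len(build_report(good)))  # both reports survive
```

A suffix such as _trimx is safe here only because it is not on the stripped list, which is the same reason an unusual term works in practice.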

You could use something like _banana and _apple and that would probably work too! :slight_smile:

Screenshots

Your workflow with the conflict annotated.

One of my workflow’s subworkflows, annotated. I did almost the same thing for the raw reads in the other subworkflow (using a different suffix at the end, _raw).

My main workflow. The subworkflows consume a paired collection, then output the FastQC reports. All four versions of the reports have a unique full sample name, but they share the same base name! The results were returned, then sorted, then summarized by MultiQC.

I moved these over to the EU server (from the ORG server) to make them easier to find and examine in more places. In the Activity bar, go to Workflows → Public Workflows and search with the keyword “quality”. Or, these links will take you to each directly.

The subworkflows can be used by anyone in other workflows. They consume a paired-end collection of Illumina reads and output a collection of FastQC reports. Which trimming tool you used won’t matter. You can also import and edit them any way you want! Just give them a unique name to help keep track of what is doing what, and if you decide to publish on the server, add yourself as an editor along with a short description of what you changed.

How do you use a subworkflow? Import it first, make changes if you want to (for example, change _trimx to something else), save it, then go in to edit your primary workflow.

Once in your primary workflow, you can add a subworkflow’s content to your workflow. You can either copy the steps, or keep the steps broken out into a subworkflow.

Unique sample names are required when using the FastQC/MultiQC tools directly on the command line, too. People often rename files in their sequence directories to avoid the conflicts. You will be doing the same thing in Galaxy: the element identifiers of collections are analogous to the “file names” that the tools interpret in this context.
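For the command-line case, a minimal Python sketch of that renaming step, assuming FASTQ files with the extensions listed in `FASTQ_EXTS` and a suffix inserted before the extension. `add_suffix` and `rename_all` are hypothetical helpers; `dry_run=True` (the default) only reports the planned renames.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed FASTQ extensions, including compound ones such as .fastq.gz.
FASTQ_EXTS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def add_suffix(path: Path, suffix: str) -> Path:
    """Insert `suffix` before the extension: A_R1.fastq.gz -> A_R1_trimx.fastq.gz."""
    for ext in FASTQ_EXTS:
        if path.name.endswith(ext):
            return path.with_name(path.name[: -len(ext)] + suffix + ext)
    # Fallback for other files: insert before the last extension only.
    return path.with_name(path.stem + suffix + path.suffix)


def rename_all(directory: Path, suffix: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Plan (and optionally perform) renames for every FASTQ file in `directory`."""
    fastqs = sorted(p for p in directory.iterdir() if p.is_file() and p.name.endswith(FASTQ_EXTS))
    plan = []
    for path in fastqs:
        target = add_suffix(path, suffix)
        plan.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return plan


if __name__ == "__main__":
    # Demo on a throwaway directory so no real data is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("A_R1.fastq.gz", "A_R2.fastq.gz"):
            (Path(tmp) / name).touch()
        print(rename_all(Path(tmp), "_trimx", dry_run=False))
```

Running with the default dry run first lets you review the planned names before anything on disk changes.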

So, you have some choices! I hope this helps, but let us know whether it actually does and whether you are able to get this to work! You don’t have to use my workflows, but examining what I did will probably help. :hammer_and_wrench: