MultiQC issue and guidance?

Hi @Martyn

The immediate issue is that the input collection is in a nested “paired list” shape. MultiQC is expecting a simple “list” collection.

How to get from the SRA “paired list” output collection shape to a simple “list” collection shape?

Use the Faster Download and Extract Reads in FASTQ format from NCBI SRA tool

Then, before running FastQC on a paired end collection, first apply the Collection Operations -> Flatten Collection tool. This gives each end of the pair its own unique element identifier inside a simple “list” collection. MultiQC will then report the results for each sequence end, and summarize correctly.
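To picture what flattening does to the collection, here is a small Python sketch. It is only an illustration of the shape change, not Galaxy's actual code: the sample names, file names, and the `flatten_paired_list` helper are all made up for the example, and the underscore separator is an assumption about how the identifiers get joined.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a nested "paired list" collection:
# each sample identifier holds a forward and a reverse dataset.
PairedList = Dict[str, Dict[str, str]]


def flatten_paired_list(nested: PairedList, sep: str = "_") -> List[Tuple[str, str]]:
    """Turn a nested paired list into a flat list of (identifier, dataset).

    Each read end gets its own identifier built from the sample name
    plus "forward" or "reverse", so no two elements share a name.
    """
    flat = []
    for sample, pair in nested.items():
        for end in ("forward", "reverse"):
            flat.append((f"{sample}{sep}{end}", pair[end]))
    return flat


# Placeholder names, for illustration only.
nested = {
    "sampleA": {"forward": "sampleA_1.fastq", "reverse": "sampleA_2.fastq"},
    "sampleB": {"forward": "sampleB_1.fastq", "reverse": "sampleB_2.fastq"},
}

for identifier, dataset in flatten_paired_list(nested):
    print(identifier, dataset)
```

The flat result has one element per read end, which is the simple “list” shape that FastQC and then MultiQC can work through one dataset at a time.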

Single dataset QA example

Multiple paired end dataset QA example

A demonstration is in the Quality Control Q20-L20 public workflow.
How to find it →

What to do

  1. Run the Flatten Collection tool on the output from Faster Download and Extract Reads in FASTQ.
  2. Run FastQC on the flattened result (a simple “list” collection format).
  3. Run MultiQC on the FastQC output.

Or, you can try the workflow. You can customize the trimming step, or ignore it, or remove that step in your copy of the workflow.

Please let us know if this helps or not, and we can follow up more! :slight_smile: