MultiQC not working correctly

I have 24 untrimmed, paired-end fastq files for RNA-seq. I have them in a data collection and ran FastQC on them with no problem, but when I try to aggregate the FastQC results with MultiQC it only reports two files, forward and reverse, instead of each of the 24 individual files. What am I doing wrong? I’ve followed the tutorials and have set all parameters per those tutorials but keep getting the same result.

You mention that you have only two output files, but did you also download and open the files?
I am assuming that in one file you have 12 R1 fastq files combined and in the other the R2 files.

EDIT:
Not a good answer, see below.

@stealsh

I’ve seen this before, too, for about the last 4-5 months, since collections changed a bit in the 21.09 release. These two tools don’t work in series the way they used to.

Details: The problem comes from the way the data is organized and where the sample names are derived from in a nested collection. The datasets are named the same in the top level of the nested structure: one “forward” and one “reverse”. The actual sample names are one level deeper.
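To make the naming collision concrete, here is a minimal Python sketch. It is only an illustration, not Galaxy's or MultiQC's actual code: the sample names, the `datasets` list, and the `aggregate_by_last_identifier` helper are all hypothetical. It assumes a report keyed only by the "forward"/"reverse" label, which is enough to reproduce the symptom of several pairs collapsing into two entries.

```python
# Hypothetical model: each FastQC output carries a tuple of identifiers,
# (sample name, read direction). None of these names come from Galaxy itself.
datasets = [
    (("sampleA", "forward"), "fastqc_A_R1"),
    (("sampleA", "reverse"), "fastqc_A_R2"),
    (("sampleB", "forward"), "fastqc_B_R1"),
    (("sampleB", "reverse"), "fastqc_B_R2"),
]


def aggregate_by_last_identifier(items):
    """Key each result by its last identifier only (an assumed behavior)."""
    report = {}
    for identifiers, payload in items:
        # Every pair shares "forward"/"reverse", so later samples
        # silently overwrite earlier ones.
        report[identifiers[-1]] = payload
    return report


report = aggregate_by_last_identifier(datasets)
print(sorted(report))  # only two keys survive: 'forward' and 'reverse'
```

With four inputs the report ends up with two rows, which matches the behavior described in this thread.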

I couldn’t figure out how to solve it before, gave up, and no one else reported the issue. Will create the test again and ask others to review it. There is probably a solution, and it would probably involve organizing the collection differently. “Flatten collection” was one option I reviewed but that didn’t produce the MultiQC output properly either (forward and reverse from the same sample had the same “identifier” that MultiQC was interpreting instead, so again there was data loss from common naming). “Rename collections” was problematic, too, but I forget why.

If there isn’t a good workaround, I will open a ticket. In either case, expect another reply tomorrow with an update. The FastQC tool itself might need a change – or maybe MultiQC (although that tool is trickier to change).

Meanwhile, one of these might work, and probably only the latter:

  1. Expand the collection and drag and drop the datasets from inside to the MultiQC tool input. This involves a LOT of clicking.
  2. Or – unhide the datasets in your history, then multi-select those for the input. I think this worked only when all forward reads were combined in one run and all reverse reads in another, but not together. Be warned that this will create a lot of clutter in the history. Maybe copy just the FastQC output into a new history and try it there, so any tests are easier to get rid of.

Thanks for reporting this! And @gbbio if you can think of a way to do this, feel free to add more to our replies. It is easily replicated: put any two pairs in a collection then run FastQC > MultiQC. MultiQC is only able to report back one pair, not both, no matter how the collection is arranged. I guess one option is to create some new collections just for input to MultiQC but that doesn’t combine by sample ID. Maybe I missed something obvious that fresh eyes will find :slight_smile:

Hi,

Thanks for the detailed reply. Yes, I see now that all the files, when paired, were named either forward or reverse. I will try the workarounds later today, when I have some free time.

Update: I attempted both of the suggested workarounds but got the same result, with MultiQC only outputting forward and reverse.

Hi @stealsh

Ok, it was worth trying. These two specific tools won’t work together for now when inputting the collection as a whole.

This will need a ticket, I’ll get to it this week, and post that link back here for reference/tracking. I can’t estimate how long it will take for the review and actual change to make it back to the server, so don’t wait for that. The individual FastQC reports can be reviewed as an alternative for now.

Thanks for reporting the problem and so sorry there isn’t some easier or immediate solution.


Update

Looks like this is a known issue still pending a correction: MultiQC - Use "element_identifier" as "sample name" for all tools · Issue #1595 · galaxyproject/tools-iuc · GitHub

Hi @stealsh
Try “Flatten collection” from the Collection Operations section on the collection of paired reads before the FastQC step. Datasets in a flattened collection have unique names.
Hope this helps.
Kind regards,
Igor
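For readers wondering why flattening before FastQC helps, here is a small Python sketch of the idea. It is an illustration only and not the Galaxy tool's implementation: the `flatten` helper, the sample names, and the underscore separator are assumptions made for the example. The point is that joining the outer and inner identifiers gives every dataset its own name.

```python
# Hypothetical nested collection: sample name -> {direction -> dataset}.
nested = {
    "sampleA": {"forward": "A_R1.fastq", "reverse": "A_R2.fastq"},
    "sampleB": {"forward": "B_R1.fastq", "reverse": "B_R2.fastq"},
}


def flatten(collection, separator="_"):
    """Join outer and inner identifiers into one unique name per dataset.

    The separator is an assumption for this sketch.
    """
    flat = {}
    for sample, pair in collection.items():
        for direction, dataset in pair.items():
            flat[f"{sample}{separator}{direction}"] = dataset
    return flat


flat = flatten(nested)
print(sorted(flat))  # four distinct names, one per input file
```

Because each flattened name includes the sample, a downstream report keyed by name keeps one entry per file instead of collapsing to forward and reverse.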

Hi Igor,

Thanks for pointing this out. It works perfectly!

Steve