MultiQC not working correctly

I have 24 untrimmed FASTQ files for RNA-seq. They are paired files. I have them in a data collection and run FastQC on them with no problem, but when I try to aggregate the FastQC results with MultiQC, it only reports two files, forward and reverse, instead of each of the 24 individual files. What am I doing wrong? I’ve followed the tutorials and set all parameters as described there, but I keep getting the same result.


@stealsh

I’ve seen this before, too, for about the last 4-5 months, since collections changed a bit in the 21.09 release. These two tools don’t work in series the way they used to.

Details: The problem comes from how the data is organized and where the sample names are taken from in a nested collection. They are named the same at the top level of the nested structure: one “forward” and one “reverse”. The actual sample names are one level deeper.
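To make the naming problem concrete, here is a minimal Python sketch. It is a simplified, hypothetical model, not Galaxy code: each sample holds a forward/reverse pair, and it assumes the name MultiQC sees comes from the pair element (“forward” or “reverse”) alone, which is the behavior described above.

```python
# Hypothetical model of a paired collection (not Galaxy's internal format):
# each sample identifier maps to a pair whose elements are named
# "forward" and "reverse".
collection = {
    "sample1": {"forward": "sample1_R1.fastq", "reverse": "sample1_R2.fastq"},
    "sample2": {"forward": "sample2_R1.fastq", "reverse": "sample2_R2.fastq"},
}


def pair_element_names(nested):
    """Return only the pair element identifier for every dataset."""
    return [element for pair in nested.values() for element in pair]


names = pair_element_names(collection)
print(names)            # ['forward', 'reverse', 'forward', 'reverse']
print(len(set(names)))  # 2 distinct names for 4 datasets
```

Four datasets collapse to two names, which matches the “only forward and reverse” report.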

I couldn’t figure out how to solve it before, gave up, and no one else reported the issue. I will recreate the test and ask others to review it. There is probably a solution, and it would probably involve organizing the collection differently. “Flatten collection” was one option I reviewed, but that didn’t produce the MultiQC output properly either: the forward and reverse files from the same sample had the same “identifier”, which MultiQC was using instead, so again there was data loss from common naming. “Rename collections” was problematic, too, but I forget why.

If there isn’t a good workaround, I will open a ticket. Either way, expect another reply tomorrow with an update. The FastQC tool itself might need a change, or maybe MultiQC (although that tool is trickier to change).

Meanwhile, one of these might work, and probably only the latter:

  1. Expand the collection and drag and drop the datasets from inside to the MultiQC tool input. This involves a LOT of clicking.
  2. Or, unhide the datasets in your history, then multi-select those for the input. I think this worked only when all the forward reads were combined, then all the reverse reads, but not together. Be warned that this will create a lot of clutter in the history. Maybe copy just the FastQC output into a new history and try it there, so any tests are easier to get rid of.

Thanks for reporting this! And @gbbio, if you can think of a way to do this, feel free to add more to our replies. It is easily replicated: put any two pairs in a collection, then run FastQC > MultiQC. MultiQC is only able to report back one pair, not both, no matter how the collection is arranged. I guess one option is to create some new collections just for input to MultiQC, but that doesn’t combine by sample ID. Maybe I missed something obvious that fresh eyes will find :slight_smile:

Hi,

Thanks for the detailed reply. Yes, I see now that all the paired files were named either forward or reverse. I will try the workarounds later today, when I have some free time.

Update: I attempted both of the suggested workarounds but got the same result, with MultiQC only outputting forward and reverse.


Hi @stealsh

Ok, it was worth trying. These two specific tools won’t work together for now when inputting the collection as a whole.

This will need a ticket. I’ll create it this week and post the link back here for reference/tracking. I can’t estimate how long the review and the actual change will take to reach the server, so don’t wait for that. For now, the individual FastQC reports can be reviewed as an alternative.

Thanks for reporting the problem and so sorry there isn’t some easier or immediate solution.


Update

Looks like this is a known issue still pending a correction: MultiQC - Use "element_identifier" as "sample name" for all tools · Issue #1595 · galaxyproject/tools-iuc · GitHub

Hi @stealsh
Try the “Flatten collection” tool from the Collection Operations section on your collection of paired reads before the FastQC step. The datasets in the flattened collection have unique names.
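A minimal sketch of why flattening helps, assuming the flattened identifier is built by joining the sample identifier and the pair element identifier with an underscore (the separator is an assumption here). The function name `flatten_identifiers` is hypothetical and only illustrates the renaming; it is not the tool’s implementation.

```python
def flatten_identifiers(nested, sep="_"):
    """Join outer and inner identifiers so every dataset gets a unique name.

    Hypothetical illustration; the "_" separator is an assumption.
    """
    flat = {}
    for sample, pair in nested.items():
        for element, dataset in pair.items():
            flat[f"{sample}{sep}{element}"] = dataset
    return flat


collection = {
    "sample1": {"forward": "sample1_R1.fastq", "reverse": "sample1_R2.fastq"},
    "sample2": {"forward": "sample2_R1.fastq", "reverse": "sample2_R2.fastq"},
}

for name in flatten_identifiers(collection):
    print(name)
# sample1_forward
# sample1_reverse
# sample2_forward
# sample2_reverse
```

Because every flattened name carries both the sample and the read direction, no two FastQC reports share a name when they reach MultiQC.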
Hope this helps.
Kind regards,
Igor


Hi Igor,

Thanks for pointing this out. It works perfectly!

Steve

I have 6 samples trimmed with Trimmomatic, and I want to run MultiQC on them, but every time the result is the same: it only reports one file instead of six. How do I fix this? Do I need to flatten the trimmed files?

I am not sure; I have not used MultiQC myself. But it could be because of the file names. Are they different, and do they have the correct format? Could you share the full filenames here? (But maybe anonymize them.)

EDIT:

I only just now noticed that this is an older thread and there are already answers above. It is already mentioned that the file names need to be unique. I think the problem can also happen if there is something like “_trim” before “_R1” in the filename.


Hi @maram_Nh

A complete pathway through a trimming tool (you can swap in your preferred tool), along with FastQC and MultiQC both before and after trimming, is described in this tutorial: Hands-on: Quality Control / Quality Control / Sequence analysis

The reason for manipulating the collection is to give each file a unique name. This prevents MultiQC from getting confused: it uses the file name to assign the sample name, and if two or more inputs have the same sample name, only one of them is retained.
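The collision can be simulated in a few lines of Python. This is a hypothetical model, not MultiQC’s code: reports are stored in a dict keyed by sample name, so a later report silently replaces an earlier one with the same name. Which duplicate MultiQC actually keeps is not something this sketch claims to reproduce. A helper like `find_duplicate_names` (also hypothetical) can flag the problem before building the report.

```python
from collections import Counter


def collect_reports(reports):
    """Key reports by sample name; duplicates overwrite earlier entries."""
    by_sample = {}
    for sample_name, report in reports:
        by_sample[sample_name] = report
    return by_sample


def find_duplicate_names(reports):
    """Return sample names that appear more than once in the input."""
    counts = Counter(name for name, _ in reports)
    return sorted(name for name, n in counts.items() if n > 1)


reports = [
    ("forward", "sample1 forward FastQC"),
    ("reverse", "sample1 reverse FastQC"),
    ("forward", "sample2 forward FastQC"),
    ("reverse", "sample2 reverse FastQC"),
]

print(len(collect_reports(reports)))  # 2 names survive out of 4 inputs
print(find_duplicate_names(reports))  # ['forward', 'reverse']
```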

For one paired end set of files, that is four different names: forward reads before trimming, reverse reads before trimming, forward reads after trimming, reverse reads after trimming. You will want all represented uniquely when asking MultiQC to graph them into a summary report.
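As a quick check of the count, this hypothetical sketch builds one name per combination of trimming stage and read direction for a single pair. The naming pattern is an assumption chosen for illustration, not a Galaxy convention.

```python
from itertools import product


def qc_names(sample, stages=("raw", "trimmed"), directions=("forward", "reverse")):
    """Build one unique name per (stage, direction) combination for a sample."""
    return [f"{sample}_{stage}_{direction}" for stage, direction in product(stages, directions)]


names = qc_names("sample1")
print(names)
# ['sample1_raw_forward', 'sample1_raw_reverse',
#  'sample1_trimmed_forward', 'sample1_trimmed_reverse']
```

Two stages times two directions gives four names, and all four stay distinct in the summary report.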

If you are not using a collection for some reason, that can get a bit tedious. If you have questions about collections, there are more tutorials, and you can ask new questions. Think of collections like folders that contain files that are all of the same type. You can change the “shape” of a collection with the tools in Collection Operations: a simple list, a paired list, renamed listings of the same files, etc.

Another important consideration if you have a lot of data: copies of the same underlying file in different collections/shapes do not consume extra quota space, whereas a copy of a regular single file with a different name will consume extra space.

And @gbbio is correct about both items. First, the original help still applies to you (we have introduced some changes since 2022, and Flatten Collection is the tool to use for auto-renaming). Second, the older topic wasn’t fully updated with the link to the newer tutorial, so I’m going to close out this question to avoid confusion. Going forward, it is usually best to ask new questions in new topics. If you want to reference a prior topic, you can quote it or copy its URL. If you have more questions about this protocol after reviewing the tutorial, you can ask a new question and we can try to help. Share context about where you are now; the banner on this forum explains how to do that, for example by briefly sharing your history. :slight_smile:

Hope this helps!