MultiQC troubleshooting

I have a merged collection of FastQC files from a paired-end RNA-Seq dataset that MultiQC is not recognizing. I have tried re-merging the FastQC files with the Merge collections tool, but MultiQC still does not recognize the merged txt file.

Kindly provide support.


Hi @Sanjukta_Ghosh

It is difficult for other people to guess correctly what might be going on if you don’t share enough details for troubleshooting. The banner at this site has instructions about the kind of information that helps. If you cleared that away already, you can find it here → How to get faster help with your question

As a guess – did you:

  1. Choose the correct tool that the reports were created from on the MultiQC drop down menu? You will want FastQC.
  2. In the file select menu, input the raw reports from FastQC.
  3. Double-check that the reports actually contain data and are not empty.
  4. If you created those reports outside of Galaxy, note that a particular flag needs to be set when running FastQC on the command line. The form specifies what that is right under the input area.
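For point 3, if you download the reports, a quick local check can flag empty files or files that are not FastQC raw data. This is a minimal sketch, not part of Galaxy or MultiQC, and it assumes that a FastQC raw data report (the `fastqc_data.txt` content) begins with a header line starting with `##FastQC`; adjust the check if your files look different. The function names are hypothetical.

```python
from pathlib import Path


def looks_like_fastqc_raw_data(path: str) -> bool:
    """Return True if the file exists, is non-empty, and starts with a FastQC header.

    Assumption: FastQC raw data reports begin with a line like "##FastQC\t<version>".
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False  # missing or empty report
    with p.open("r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline()
    return first_line.startswith("##FastQC")


def check_reports(paths):
    """Map each path to True/False so problem files are easy to spot."""
    return {path: looks_like_fastqc_raw_data(path) for path in paths}
```

Any file that comes back `False` is worth opening in Galaxy to inspect its contents and datatype before rerunning MultiQC.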

FAQ → https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/faqs/multiqc_error.html

Tutorials with many examples can be found here (this is the link from the bottom of the tool form) → https://training.galaxyproject.org/training-material/by-tool/iuc/multiqc.html

Please give all those a try first, then post back your job details and we can try to help more. If you already solved the problem, you can also let us know what worked to help others reading.

Let’s start there :slight_smile:

Hi @jennaj

Thanks for your detailed answer. Sorry, I left out the obvious details when framing the question.

I have selected FastQC for "Which tool was used to generate logs?" and
Raw data for "Type of FastQC output?".

In the FastQC output field, initially no files were displayed; now non-FastQC files are being listed.


What should I do?

Also, I change the name of the collection every time an output is generated. Do you think that is why it cannot be identified? (I have done miRNA differential expression analysis, where I always changed the output names slightly to help with identification, and there was no problem there.)

Hi @Sanjukta_Ghosh

The name of the collection is not used. Sometimes the file name does matter. But what really matters is the metadata: database and datatype (format). Click all the way into a dataset to see those in the expanded view.

This tool wants files with the datatype text. That should be automatically assigned by FastQC for the collection labeled with the “raw” original naming.

One tip: if the input reads were in a "list of pairs" type of nested collection, you should flatten the collection before running FastQC. Flattening appends _forward and _reverse to the names of the files inside the flattened collection. FastQC does use those file names, and if the collection is not flattened the two ends will have the "same name" by default, so you won't get all of the outputs later on from MultiQC.
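To see why the shared name loses outputs, here is a toy model of the naming, not Galaxy's actual code. It assumes that without flattening each read end inherits only the outer sample identifier, that flattening joins the outer and inner identifiers with an underscore, and that a report summary keyed by name keeps only one entry per name. All function names are hypothetical.

```python
def names_without_flattening(list_of_pairs):
    # Each element of a pair inherits only the outer (sample) identifier.
    return [sample for sample, pair in list_of_pairs.items() for _ in pair]


def names_after_flattening(list_of_pairs, sep="_"):
    # Flattening joins outer and inner identifiers, e.g. SRR001_forward.
    return [
        f"{sample}{sep}{direction}"
        for sample, pair in list_of_pairs.items()
        for direction in pair
    ]


def reports_kept(names):
    # A summary keyed by name keeps one report per distinct name.
    return len(set(names))


# Two samples, each a forward/reverse pair (file names are placeholders).
list_of_pairs = {
    "SRR001": {"forward": "SRR001_1.fastq", "reverse": "SRR001_2.fastq"},
    "SRR002": {"forward": "SRR002_1.fastq", "reverse": "SRR002_2.fastq"},
}
print(reports_kept(names_without_flattening(list_of_pairs)))  # 2 of 4 reports
print(reports_kept(names_after_flattening(list_of_pairs)))    # all 4 reports
```

In this model, four FastQC reports collapse to two distinct names without flattening, while the flattened names stay unique.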

It sounds trickier than it is, and once you do this you’ll remember. This tutorial has an example of exactly what to do in the section I linked here → https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html#quality-control

Please give that a try :slight_smile:

And I'm going to add another tip! Try using tags to label data instead of changing the file names. Sometimes it is meaningful to know which tool produced which output. So, start a job, add the tag once the output shows up in the history (this works while the job is queued or running, unlike dataset/collection renaming), then go on to the next step. The tags will show in the pull-down menus. This is not really how tags are meant to be used, but I do this all the time!! Our tutorials show the "correct" way, and you should explore those; they become more helpful once you have a workflow. For exploratory work, use tags however they work best for you!


Thanks. It helped.
