Hi, I have a problem running the MultiQC tool. I tried to run it on data from Featurecounts (there are 2 samples), but every time I run MultiQC I get results only for the first sample.
Hi @walaa
Thanks for sharing the history, I’m reviewing. If you are following a tutorial, linking that back in here too would be helpful. More soon!
Ok, that was quick!
It looks like dataset 60 was generated by running against a collection folder of Featurecounts summary reports. That output has both samples included in the final MultiQC report. So that one seems correct. Two sample reports were combined into a single summary report.
Then in dataset 84, there was just a single dataset of a Featurecounts summary report selected, and the output is based just on that single sample in the final MultiQC report.
Maybe you were able to solve this? Or do you want to clarify which dataset in the history you have a question about? Screenshots of what is going wrong can help too, to make sure we are looking at the same data.
Thanks and we can follow up more!
Thanks for your answer. Yes, the problem is that I get results for only one sample when I run MultiQC on the Featurecounts data. I’m not sure if I should include the summary result, the count result, or both?
Ok, thanks for explaining @walaa
The output from Featurecounts is what will be input to downstream differential expression tools. These are the count files. The tool is comparing features in the reference annotation (coordinates on the genomic strand) to where the reads mapped to the same genomic strand (these also have coordinates). When they overlap according to special rules, then a read is “counted up” for that feature. Differences between these counts have biological meaning that is assessed by later tools. Summary and QA reports explain what is happening, but the “data” are in the primary files: reads, BAMs, count files. The primary files can be hard to read, so the summary files are output too, or sometimes you’ll generate these separately, like you did with FastQC.
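If it helps to see the counting idea concretely, here is a toy sketch in Python. It is not how Featurecounts is implemented, and it only approximates the “special rules”: it assumes stranded counting, treats any shared base as an overlap, and leaves a read unassigned when it overlaps more than one feature. The gene names, coordinates, and the `overlaps` and `count_reads` helpers are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified records: real annotations and alignments carry much more detail.
Feature = namedtuple("Feature", "name chrom start end strand")
Read = namedtuple("Read", "chrom start end strand")


def overlaps(feature, read):
    """True when read and feature share a chromosome, a strand, and at least one base."""
    return (
        feature.chrom == read.chrom
        and feature.strand == read.strand
        and read.start <= feature.end
        and feature.start <= read.end
    )


def count_reads(features, reads):
    """Count each read for the single feature it overlaps; otherwise leave it unassigned."""
    counts = {f.name: 0 for f in features}
    unassigned = 0
    for read in reads:
        hits = [f for f in features if overlaps(f, read)]
        if len(hits) == 1:
            counts[hits[0].name] += 1
        else:
            # No overlapping feature, or an ambiguous overlap with several features.
            unassigned += 1
    return counts, unassigned


features = [
    Feature("geneA", "chr1", 100, 200, "+"),
    Feature("geneB", "chr1", 150, 300, "+"),
    Feature("geneC", "chr1", 500, 600, "-"),
]
reads = [
    Read("chr1", 90, 120, "+"),   # overlaps geneA only
    Read("chr1", 160, 190, "+"),  # overlaps geneA and geneB: ambiguous
    Read("chr1", 250, 280, "+"),  # overlaps geneB only
    Read("chr1", 510, 540, "-"),  # overlaps geneC only
    Read("chr1", 510, 540, "+"),  # right place, wrong strand
    Read("chr1", 700, 730, "+"),  # no feature here
]

counts, unassigned = count_reads(features, reads)
print(counts)      # {'geneA': 1, 'geneB': 1, 'geneC': 1}
print(unassigned)  # 3
```

The `counts` part corresponds to the “count” output that goes on to differential expression, and the `unassigned` tally is the kind of number that ends up in the “summary” output.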
One way to review what happened when counting, across multiple samples or a single sample, is to also run the summary of Featurecounts through MultiQC. The MultiQC tool is used to assess what happened, but it doesn’t generate primary data. The graphics are a bit easier to read than the summary tabular files with just numbers. Looking at all of your samples together this way can alert you to potential problems, or just important trends. When you later have 100s of samples, this becomes more important.
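As a rough illustration of what “looking at all of your samples together” means, here is a small Python sketch that combines summary tables into one view. It is not MultiQC’s code. It assumes each summary is a two-column, tab-separated table with a header line followed by `Status<TAB>count` rows, and the sample names, numbers, and helper functions are invented for the example.

```python
def parse_summary(text):
    """Parse one summary table: a header line, then 'Status<TAB>count' rows."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    sample = lines[0].split("\t")[1]  # second header column names the sample
    stats = {}
    for ln in lines[1:]:
        status, value = ln.split("\t")
        stats[status] = int(value)
    return sample, stats


def combine(summaries):
    """Build {sample: {status: count}} from every summary that was passed in."""
    table = {}
    for text in summaries:
        sample, stats = parse_summary(text)
        table[sample] = stats
    return table


def percent_assigned(stats):
    """Share of reads with status 'Assigned', as a percentage of all reads."""
    total = sum(stats.values())
    return 100.0 * stats.get("Assigned", 0) / total if total else 0.0


# Made-up numbers for two hypothetical samples.
summary_1 = "\n".join([
    "Status\tsample_1.bam",
    "Assigned\t800",
    "Unassigned_NoFeatures\t150",
    "Unassigned_Ambiguity\t50",
])
summary_2 = "\n".join([
    "Status\tsample_2.bam",
    "Assigned\t600",
    "Unassigned_NoFeatures\t300",
    "Unassigned_Ambiguity\t100",
])

both = combine([summary_1, summary_2])  # two inputs: a two-sample report
only_one = combine([summary_1])         # one input: a one-sample report

for sample, stats in both.items():
    print(sample, round(percent_assigned(stats), 1))
```

The report can only cover the summaries it is given, which is why selecting a single summary dataset produces a single-sample report, while selecting the collection with both summaries produces a combined one.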
Does that make sense? You did something similar in upstream steps when you were assessing the read quality. The data sort of splits – one path is to generate QA reports (the “summary” output) and one is for the next analysis step (the “count” output). As you go through the process, the data will continue to have forks like this. The data will also merge together – each sample travels along the workflow independently up to a certain stage, then all are compared together.
Collections are just folders of the datasets. This too is useful once you have many more samples and many more steps.
Have you seen a Galaxy workflow yet? These have a graphical format that can make it a bit easier to see what is going on. The slides in the first tutorial here explain more about what these tools are doing scientifically but also the overall process and why. Then go into either of the first tutorials and look at the workflow. You’ll see the branching and merging I am talking about.
- Transcriptomics / Tutorial List
- This is one of the workflows from the second tutorial in that listing. It shows a small preview of a workflow that is doing something similar to what you are doing. → QC + Mapping + Counting - Ref Based RNA Seq - Transcriptomics - GTN - subworkflows / Reference-based RNA-Seq data analysis / Transcriptomics
- Later on you won’t have to click tools individually. You can create a workflow, then run that. Much easier!
And, it looks like you are already taking a class, but if you wanted to learn more about how this all works, we have a free training event coming up in May. It is asynchronous, you can choose a path even if it is just one tutorial, and we’ll have a special chat with many scientists able to help.
Ok – that’s a lot of information! Please let us know if you have more questions.
Thanks, now it works very well when I rerun the feature count data