Hello
I was working on a dataset. Unfortunately, during the analysis phase with Infer Experiment my data became unknown, and for this reason I cannot run MultiQC. Can you help me with this?
Hi @Sina
The problem is with the content inside the BAM files. It is a scientific problem that this downstream tool is letting you know about (instead of just failing).
The datatype on the MultiQC output is correct (txt) but the file is empty. Notice the mini report at the top in the dataset description: “Total 0 usable reads were sampled”. That’s because of a reference genome mismatch problem – the BAM was mapped against apiMel2 but the reference BED data was based on Human hg38. The chromosomes do not “match up” between the two different assemblies, so no reads could be sampled and summarized per chromosome, resulting in an empty output. Most tools cannot trap every possible situation exactly – instead they will try to report back what they can detect somewhere in the outputs: job logs, annotation notes on the dataset, parameter settings or metadata.
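If you want to confirm this kind of mismatch yourself, a minimal sketch of the check is below. It assumes you have the BAM header as plain text (for example from `samtools view -H`) and the BED file contents as text. The function names and the sample contig names are made up for illustration and are not part of any Galaxy tool.

```python
def sam_header_contigs(header_text):
    """Collect sequence names from the @SQ lines of a SAM/BAM header."""
    contigs = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    contigs.add(field[3:])
    return contigs


def bed_contigs(bed_text):
    """Collect the first column (chromosome name) from BED records."""
    contigs = set()
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contigs.add(line.split()[0])
    return contigs


def shared_contigs(header_text, bed_text):
    """Names present in both files; an empty set means nothing can match."""
    return sam_header_contigs(header_text) & bed_contigs(bed_text)


# Illustrative data only: invented names in the style of two different assemblies.
bam_header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:Group1\tLN:1000\n@SQ\tSN:Group2\tLN:2000\n"
bed_records = "chr1\t100\t200\tgeneA\t0\t+\nchr2\t300\t400\tgeneB\t0\t-\n"

print(shared_contigs(bam_header, bed_records))  # prints set()
```

An empty result is the same situation the tool reported: no chromosome in the BAM overlaps the annotation, so no reads can be sampled.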
What to do
You should back up and map those reads again. It looks like the reference genome choice was set incorrectly when running RNA-STAR.
The choice you used is the default (the first genome in the alphabetically sorted list), Honey Bee “apiMel2”. You should use the Human hg38 choice instead. This will match your reference annotation.
Screenshot from the job details i-icon view for one of your mapping jobs.
Reference Data
How could I tell which reference index to use? The metadata at the top of the GTF/GFF3 is one place – you will want to ensure that all reference data for an analysis is based on the same exact genome assembly, or the scientific results will be wrong (tools may not even fail!). If you ever have a file without metadata, or are not sure which server index is a match for other data you want to use, this guide can help more.
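As a rough sketch of what "checking the metadata at the top" means, the snippet below pulls the leading comment lines out of a GTF/GFF3 file's text. The helper name and the sample header are hypothetical; real files vary in which fields they include, and some have no header at all.

```python
def gtf_header_lines(gtf_text):
    """Return the leading comment lines of a GTF/GFF3 file, without the '#' or '#!' prefix."""
    header = []
    for line in gtf_text.splitlines():
        if line.startswith("#"):
            header.append(line.lstrip("#!").strip())
        elif line.strip():
            # First data record reached: the header block is over.
            break
    return header


# Illustrative sample only; the values are invented for this example.
sample_gtf = (
    "#!genome-build GRCh38\n"
    "#!genome-version GRCh38\n"
    "chr1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id \"geneA\";\n"
)

for entry in gtf_header_lines(sample_gtf):
    print(entry)
```

If the build named there does not correspond to the genome index chosen for mapping, that is the point to stop and fix before running anything downstream.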
Job Details view: inputs, parameters, logs
The job details view is a great place to review a job if the outputs seem odd, or if downstream tools using that data fail or have an odd result. You can check exactly what was done and confirm your parameter choices. If something errored, this is also where you will find the job logs, a bit further down the page.
I’m glad you asked since this can all be a bit confusing at first! But I do hope this helps. You can get rid of all the files from the mapping step forward since you will be recreating that data.
Consider using a workflow
We have workflows for RNA-seq in a few of our tutorials if you would like to try with those, too. Getting all of the little parameters set up once, then not needing to worry about them later is a big reason why workflows are so popular!
- RNA-seq tutorials that include this tool → Galaxy Training!
- This is the first one in a series that goes through an entire pipeline for DE analysis. Each has a workflow. Adjust the target database from Mouse mm10 to Human hg38 and all should work great! → Hands-on: 1: RNA-Seq reads to counts / Transcriptomics
If you are not sure how to use a workflow, this is my favorite introduction tutorial.
And this one is a good reference.
The idea is to spend time creating a workflow – designing your analysis process – then plug in the data at the end. You can rerun until you like the results. Play around with parameter settings. Layer in more reports and graphics at the end. You can also check out our stable published workflows at Dockstore and WorkflowHub.
- This is an example search → Search.
- You could import and customize any of these.
- You can also search for public workflows under the Workflow menu since those external sources are searched, too.
Hope this helps! I’m expanding this help since it is a good example use case, and the advice here will help both you and others that visit here. You can ask more questions, too, if something isn’t clear.