MiRDeep2 quantifier and MiRDeep2 identification tools do not work (getting errors)

Hi @namarig

You are not getting meaningful results at the mapping step.

You can find that result reported in the job logs (click the i-icon on the dataset).

[Screenshot: Screen Shot 2024-10-15 at 10.11.58 AM]

I played around a bit with your data. Using BLASTN with very permissive settings – specifically, allowing multi-mappings and very short, weak hits – I can get about 20k of the reads (out of over a million!!) to map. The aligned length was as short as 14 bases. You could explore this too, using BLAST as a sanity check.
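If you run that kind of BLAST sanity check, the tabular output (`-outfmt 6`) is easy to screen programmatically. This is a minimal sketch, assuming tabular BLASTN output; the toy hit lines and the 14-base cutoff below are illustrative, not from your data:

```python
# Sketch: filter BLASTN tabular hits (-outfmt 6) by alignment length.
# Column 4 of outfmt 6 is the alignment length; qseqid is column 1.
MIN_ALN_LEN = 14  # matches the shortest hit mentioned above; adjust as needed

def count_short_hits(tabular_text, min_len=MIN_ALN_LEN):
    """Return (number of hits passing the length cutoff,
    set of query read IDs that mapped at all)."""
    mapped_reads = set()
    kept = 0
    for line in tabular_text.strip().splitlines():
        cols = line.split("\t")
        qseqid, aln_len = cols[0], int(cols[3])
        if aln_len >= min_len:
            kept += 1
            mapped_reads.add(qseqid)
    return kept, mapped_reads

# Made-up example rows in outfmt 6 layout (12 tab-separated columns):
demo = (
    "read1\tchr1\t95.0\t20\t1\t0\t1\t20\t100\t119\t1e-5\t38.1\n"
    "read1\tchr2\t100.0\t10\t0\t0\t1\t10\t50\t59\t0.01\t20.0\n"
    "read2\tchr1\t100.0\t14\t0\t0\t1\t14\t200\t213\t0.005\t26.0"
)
kept, reads = count_short_hits(demo)
print(kept, sorted(reads))  # the 10-base hit is dropped
```

Counting how many distinct reads map at all, versus total hits, helps separate "few reads map" from "reads multi-map everywhere".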

Now, your source data is not from the same species as your reference (in the traditional sense). If you are hunting, that’s science! But you’ll need to take an “exploratory result” perspective. What you are trying to do is not simple, and the very low mapping rates may be real (once you verify the QA and correct the read-collapsing steps).

What to do

  1. Review how QA was performed on the original reads.
  2. Review how you are collapsing the reads. If you don’t remove the duplicates, you will get non-specific mapping results in the downstream step, and depending on the mapping settings, those alignments may not pass the filters and be reported.
  3. Hold off on the downstream tools until all of the results feeding into them are usable. So far, the likely problems are the quality of the reads and how they map to your reference after collapsing.
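To make step 2 concrete, "collapsing" means merging identical reads into one record that carries the copy number. A minimal sketch of the idea, with made-up read sequences and a header format modeled on miRDeep2's collapsed IDs (e.g. `seq_1_x3`):

```python
from collections import Counter

def collapse_reads(reads, prefix="seq"):
    """Merge identical reads; return (header, sequence) pairs with the
    copy number encoded in the header, most abundant first."""
    counts = Counter(reads)
    collapsed = []
    for i, (seq, n) in enumerate(counts.most_common(), start=1):
        collapsed.append((f"{prefix}_{i}_x{n}", seq))
    return collapsed

# Three copies of one read plus a singleton (sequences are illustrative).
reads = ["TGAGGTAGTAGGTTGTATAGTT"] * 3 + ["ACTGGCCTTGGAGTCAGAAGG"]
for header, seq in collapse_reads(reads):
    print(f">{header}")
    print(seq)
```

After collapsing, 1M raw reads often shrink to a much smaller set of unique sequences, so the mapper sees each sequence once and reports its abundance instead of re-aligning duplicates.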

We don’t want to provide too much scientific advice here since our focus is on using Galaxy. Maybe visit a scientific forum where other scientists spend time. Or, if you can find a publication, you can probably replicate it in Galaxy. :slight_smile:

These tools also have a tutorial. I know it is complicated, but understanding what happens at each step will give you an example of what to check with your own data. You could even run parts of the tutorial data through your own custom workflow just to make sure the basics are intact. Then you can focus on the data interpretation.

Hope this helps!