MiRDeep2 Quantifier and MiRDeep2 identification tools do not work (get errors)

namarig · October 14, 2024, 10:15pm

Hi, this is Namarig Elmalih, a second-year graduate student at A&T University working with Dr. Perpetua Muganda. I've found errors in the MiRDeep2 quantifier and identifier tools in certain samples but not others. This is my history link: Galaxy

jennaj · October 14, 2024, 10:31pm

Hi @namarig

Thanks for posting all the details, very helpful!

Review your mapping input for this job. You probably need to run MiRDeep2 Mapper with the correct files again.

The error is reporting that the target “genome” contains identifiers that are not found in the mapping file (and the reverse!). The reads part of the mapping appears to be OK, so the fix belongs in the upstream tool: map against the same reference genome that you are supplying at this step.

Where is this documented?

Under that mapping input area on the tool form:

Reads mapped against genome. Mappings should be in ARF format.

Then, down in the Help section of the tool form:

What it does

MiRDeep2 is a software package for identification of novel and known miRNAs in deep sequencing data. Furthermore, it can be used for miRNA expression profiling across samples.

Input

A FASTA file with deep sequencing reads, a FASTA file of the corresponding genome, a file of mapped reads to the genome in miRDeep2 arf format, an optional FASTA file with known miRNAs of the species being analysed, and an optional FASTA file of known miRNAs of related species.

Arf format: Is a proprietary file format generated and processed by miRDeep2. It contains information of reads mapped to a reference genome. Each line in such a file contains 13 columns:

read identifier
length of read sequence
start position in read sequence that is mapped
end position in read sequence that is mapped
read sequence
identifier of the genome-part to which a read is mapped to. This is either a scaffold id or a chromosome name
length of the genome sequence a read is mapped to
start position in the genome where a read is mapped to
end position in the genome where a read is mapped to
genome sequence to which a read is mapped
genome strand information. Plus means the read is aligned to the sense-strand of the genome. Minus means it is aligned to the antisense-strand of the genome.
Number of mismatches in the read mapping
Edit string that indicates matches by lowercase ‘m’ and mismatches by uppercase ‘M’
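
To check the identifier mismatch yourself, the column list above can be turned into a small script. This is only a rough sketch, not part of MiRDeep2 or Galaxy: it assumes the ARF file is tab-separated, that the genome identifier is the first whitespace-delimited token of each FASTA header, and the field names, function names, and example record are all made up for illustration.

```python
from collections import namedtuple

# Field names are illustrative labels for the 13 columns listed above.
ArfRecord = namedtuple("ArfRecord", [
    "read_id", "read_length", "read_start", "read_end", "read_seq",
    "genome_id", "genome_length", "genome_start", "genome_end", "genome_seq",
    "strand", "mismatches", "edit_string",
])


def parse_arf_line(line):
    """Split one ARF line (assumed tab-separated) into its 13 columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 13:
        raise ValueError(f"expected 13 columns, found {len(fields)}")
    rec = ArfRecord(*fields)
    return rec._replace(read_length=int(rec.read_length),
                        mismatches=int(rec.mismatches))


def fasta_ids(fasta_lines):
    """Collect sequence identifiers from FASTA header lines.

    Assumption: the identifier is the first token after '>'.
    """
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def compare_ids(arf_lines, fasta_lines):
    """Return (ids only in the ARF file, ids only in the genome FASTA)."""
    arf_ids = {parse_arf_line(l).genome_id for l in arf_lines if l.strip()}
    genome_ids = fasta_ids(fasta_lines)
    return arf_ids - genome_ids, genome_ids - arf_ids


# Made-up example record: a 22-base read mapped to a sequence named "chrII".
read = "tgaggtagtaggttgtatagtt"
EXAMPLE_LINE = "\t".join([
    "seq_0_x5", str(len(read)), "1", str(len(read)), read,
    "chrII", str(len(read)), "100", "121", read, "+", "0", "m" * len(read),
])
```

Calling `compare_ids(open("mapped.arf"), open("genome.fasta"))` (hypothetical file names) returns two sets. Anything in the first set is a mapping target that does not exist in the genome you supplied, which is the situation the error describes. Identifiers in the second set are not necessarily a problem by themselves, since a sequence with no mapped reads will also show up there.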

Hope this helps!

namarig · October 14, 2024, 11:03pm

Hi jennaj
I repeated my work, and it still gives me errors in both the MiRDeep2 Quantifier and Identifier.

jennaj · October 15, 2024, 6:02pm

Hi @namarig

You are not getting meaningful results at the mapping step.

Find that scientific result in the job logs (using the i-icon).

Screenshot

I played around a bit with your data. I can get about 20k of all the reads (out of over a million!!) to map with BLASTN using very permissive settings – specifically, by allowing multi-mappings and very short weak hits. That independent mapped length was as short as 14 bases. You could explore this, too, by using BLAST, as a type of sanity check.

Now, your source data is not the same species (from a traditional perspective). If you are hunting, that’s science! But you’ll need to have an “exploratory result” perspective. What you are trying to do is not so simple, and the very low mapping rates may be real (once you verify the QA and correct the redundancy steps).

Genome: Human gammaherpesvirus 4, complete genome - Nucleotide - NCBI (virus, from human source, no human DNA present, correct?)
Reads: https://www.ncbi.nlm.nih.gov/sra/?term=SRR7547891 (tumor, from human source, with all the other human DNA still present!)

What to do

Review how QA was performed on the original reads.
Review how you are collapsing the reads. If you don’t get rid of the duplications, you will have non-specific mapping results in the downstream step, and depending on the map settings, those alignments will be filtered out and not reported.
You don’t need to attempt to use the downstream tools until all of the results that are input to those tools are usable. So far, the open questions appear to be the quality of the reads and how they map to your reference after collapsing them.
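
To make the collapsing step in the list above concrete, here is a minimal sketch of what collapsing does: identical read sequences are merged into one record whose identifier carries the copy number. This is not the MiRDeep2 Mapper's own code. The `prefix_index_xCOUNT` identifier style is an assumption modeled on typical collapsed-read names, and the function names are hypothetical.

```python
from collections import Counter


def collapse_reads(sequences, prefix="seq"):
    """Merge identical read sequences and record how often each occurred.

    Returns a list of (identifier, sequence) pairs, most abundant first.
    The identifier layout prefix_index_xCOUNT is an assumption.
    """
    counts = Counter(s.upper() for s in sequences if s)
    return [
        (f"{prefix}_{i}_x{n}", seq)
        for i, (seq, n) in enumerate(counts.most_common())
    ]


def redundancy(sequences):
    """Fraction of reads that are duplicates of an earlier read."""
    reads = [s.upper() for s in sequences if s]
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)
```

For example, `collapse_reads(["ACGT", "ACGT", "TTTT"])` gives one record for `ACGT` with a count of 2 and one for `TTTT` with a count of 1. Comparing the number of records before and after collapsing is a quick way to confirm that the duplicates were actually removed before mapping.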

We don’t want to provide too much scientific advice here since our focus is on using Galaxy. Maybe visit a scientific forum where other scientists spend time. Or, if you can find a publication, you can probably replicate that in Galaxy.

These tools also have a tutorial. I know it is complicated, but understanding what is happening at each step will provide you with an example of what to check with your own data. You could even just use parts of the tutorial data to run through your own custom workflow just to make sure that the basics are intact. Then you can focus on the data interpretation parts.

Hands-on: Whole transcriptome analysis of Arabidopsis thaliana (Transcriptomics)

Hope this helps!
