MiRDeep2 Quantifier

Hi,
I tried to run MiRDeep2 Quantifier after mapping with MiRDeep2 Mapper.
I downloaded the human sequence from the site and ran MiRDeep2 Mapper against hg38.
I then used the collapsed reads from MiRDeep2 Mapper, plus hairpin.fa and mature.fa from miRBase.
However, I get this message:

Fatal error: Exit code 1 ()
getting samples and corresponding read numbers

Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics

#desc total mapped unmapped %mapped %unmapped
total: 14124170 14124170 0.000 1.000
seq: 14124170 14124170 0.000 1.000
analyzing data
Expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_not_expressed.csv

Creating miRBase.mrd file

make_html2.pl -q expression_analyses/expression_analyses_galaxy/miRBase.mrd -k dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat -y galaxy -o -i expression_analyses/expression_analyses_galaxy/dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat_mapped.arf -l -M miRNAs_expressed_all_samples_galaxy.csv
miRNAs_expressed_all_samples_galaxy.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files

Can you tell me how to solve this issue?

Welcome, @Koichi_Yuki

Is there more in other logs output by the tool? Check the view under the i icon in your error dataset for this.

Our topic in the banner explains more about what we’ll need to troubleshoot, or see here directly → How to get faster help with your question

Let's start there. With more shared details, we can probably help to solve this. Related topics are under rbc_mirdeep2.

Thank you, Jenna. This is what I have under the i icon.
I used hairpin.fa and mature.fa from miRBase, simply downloaded as they are (including all species). Do I need to process them in some way first?

command line (I assume the same)
quantifier.pl -p /corral4/main/objects/2/9/d/dataset_29dd6934-a51e-4e14-b093-f0ff6ae79e94.dat -m /corral4/main/objects/f/a/e/dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat -r /corral4/main/objects/9/1/9/dataset_919a7de9-5800-4ccc-ae26-81275ab808f5.dat -e 2 -f 5 -g 1 -y galaxy ; cp expression_galaxy.html /corral4/main/jobs/060/490/60490242/outputs/dataset_b4c9764d-dbbf-4e5a-9355-fd786917001d.dat 2> /dev/null ; mkdir -p /corral4/main/jobs/060/490/60490242/outputs/dataset_b4c9764d-dbbf-4e5a-9355-fd786917001d_files 2> /dev/null ; cp -R pdfs_galaxy /corral4/main/jobs/060/490/60490242/outputs/dataset_b4c9764d-dbbf-4e5a-9355-fd786917001d_files 2> /dev/null

Tool standard output
53327 mature mappings to precursors

after READS READ IN thing

Tool standard error
getting samples and corresponding read numbers

Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics

#desc total mapped unmapped %mapped %unmapped
total: 14124170 14124170 0.000 1.000
seq: 14124170 14124170 0.000 1.000
analyzing data
Expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_not_expressed.csv

Creating miRBase.mrd file

make_html2.pl -q expression_analyses/expression_analyses_galaxy/miRBase.mrd -k dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat -y galaxy -o -i expression_analyses/expression_analyses_galaxy/dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat_mapped.arf -l -M miRNAs_expressed_all_samples_galaxy.csv
miRNAs_expressed_all_samples_galaxy.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files

Job message
Fatal error Exit code 1()
error level 3
exit code 1
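
The key signal in the log above is the mapping statistics: %mapped is 0.000 and %unmapped is 1.000, so none of the reads aligned to the precursors. A minimal Python sketch for reading that value from a statistics line follows. It assumes that the last two whitespace-separated fields are %mapped and %unmapped, as the `#desc` header suggests; `mapped_fraction` is a hypothetical helper, not part of miRDeep2 or Galaxy.

```python
def mapped_fraction(stats_line: str) -> float:
    """Return the %mapped value from a 'total:' or 'seq:' statistics line.

    Assumption: the last two whitespace-separated fields are %mapped and
    %unmapped, in that order, as the '#desc ...' header in the log suggests.
    """
    fields = stats_line.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected statistics line: {stats_line!r}")
    return float(fields[-2])


line = "total: 14124170 14124170 0.000 1.000"  # copied from the log above
fraction = mapped_fraction(line)
if fraction == 0.0:
    print("No reads mapped to the precursors; check the reference FASTA files.")
else:
    print(f"Fraction of reads mapped: {fraction:.3f}")
```

A zero here points at the inputs (reference FASTA content or read format) rather than at the report-generation steps that run afterwards.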

Hi @Koichi_Yuki

Without reviewing the files, it would be hard to guess if they are the correct content, and in the right format. You will want nucleotide fasta files for this. You can use tools like these to check/standardize: NormalizeFasta, Fasta Statistics.
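
If you also want a quick look outside of Galaxy, here is a minimal Python sketch of the same kind of check: it counts which characters appear on the sequence lines, so RNA bases (U) or IUPAC ambiguity codes stand out. The function name `sequence_alphabet` and the second example record are hypothetical, and the sketch assumes a standard FASTA layout with `>` title lines.

```python
from collections import Counter
from typing import Iterable


def sequence_alphabet(lines: Iterable[str]) -> Counter:
    """Count each character found on the sequence lines of a FASTA file."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and '>' title lines
        counts.update(line.upper())
    return counts


example = [
    ">hsa-let-7a-2 MI0000061 Homo sapiens let-7a-2 stem-loop",
    "AGGUUGAGGUAGUAGGUUGUAUAG",
    ">example-with-ambiguity-code",  # hypothetical record for illustration
    "UUGUGYGCG",
]
counts = sequence_alphabet(example)
unexpected = sorted(set(counts) - set("ACGT"))
print("Characters outside A/C/G/T:", unexpected)
```

To run it on a real file, pass an open file handle instead of the list, for example `sequence_alphabet(open("hairpin.fa"))`.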

What I’ve done is started up a very small tool test, and it worked fine at both the UseGalaxy.eu and UseGalaxy.org servers. Maybe review and see if you can notice what may be different between your data or parameter choices, and what is happening here? Results are included in this shared history.

For a larger example, we have a Galaxy Training Network (GTN) tutorial that you can compare against as well. Find the link in the Help section of the tool form (scroll down to find it) → GTN tutorials including MiRDeep2 Quantifier: fast quantitation of reads mapping to known miRBase precursors. Sometimes the workflow in a tutorial can be modified or reused, or you can just run it on the tutorial data to get a better idea of how the included tools work, with a real example using representative data.

Let us know how this works out, and we can follow up more.

Thank you again.
This is what my hairpin.fa looks like:

hsa-let-7a-2 MI0000061 Homo sapiens let-7a-2 stem-loop
AGGUUGAGGUAGUAGGUUGUAUAGUUUAGAAUUACAUCAAGGGAGAUAACUGUACAGCCUCCUAGCUUUCCU
hsa-let-7a-3 MI0000062 Homo sapiens let-7a-3 stem-loop
GGGUGAGGUAGUAGGUUGUAUAGUUUGGGGCUCUGCCCUGCUAUGGGAUAACUAUACAAUCUACUGUCUUUCCU
So do I need to remove the description text (MI0000062 Homo sapiens let-7a-3 stem-loop, etc.) and also change U to T?
Is it the same for mature.fa?
Would you mind pointing me to a way to do both?
The files are too large to edit manually, so I would appreciate your feedback.

Jenna,

Thank you again.
For mature.fa, I was able to change U to T with NormalizeFasta followed by RNA/DNA converter. For hairpin.fa, I ran NormalizeFasta and the fasta width converter, then tried RNA/DNA converter, but I received the following error.
fasta_nucleotide_changer: found invalid nucleotide sequence (UUGUGUGCGUGCCUGGCUCCCUGUAUGCCACACAUGUAGCGCCCAACCCAGAUGYUAAGGUUGCCUGCUGUGGGUGGCGUGCAAGGGGCCAAGCAUGCAUCCAU) on line 2782
It looks like the hairpin.fa I downloaded from miRBase contains Y or R in some sequences.
Is there any way to remove these hairpin sequences so that I can run the U to T conversion? Please advise. Thank you so much!

Hi @Koichi_Yuki

Getting the files in the correct format from the data source is best, so check for that first. Then if you need to do some data standardization, that can probably be done in Galaxy.

For confirmed content issues, these two tools might be enough:

  • NormalizeFasta – has an optional setting on the form to trim off the description content on the > title line, leaving just the identifier.

  • RNA/DNA converter – will standardize the bases themselves.

Then, if you still have IUPAC characters left, you can try either of these other tools:

  • Replace parts of text
  • Text transformation with sed

Finally, confirm that the content is OK with a QC tool.

  • Fasta Statistics – counts up the bases along with summaries.

XRef → Hands-on: Data Manipulation Olympics / Foundations of Data Science
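
For anyone who prefers to script this outside of Galaxy, the steps above can be sketched in Python as follows. This is only an illustration of the same idea, not what the Galaxy tools do internally: it keeps just the identifier from each title line, converts U to T, and drops any record that still contains characters other than A, C, G, or T. Dropping (rather than replacing) ambiguous records is an assumption made here for simplicity, and the names `read_fasta` and `clean_records` and the second example record are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs; multi-line sequences are joined."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def clean_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Trim titles to the identifier, convert RNA to DNA, drop ambiguous records."""
    for title, seq in read_fasta(lines):
        fields = title.split()
        if not fields:
            continue  # skip records with an empty title
        dna = seq.upper().replace("U", "T")
        # Assumption: records with any non-ACGT character are discarded.
        if dna and set(dna) <= set("ACGT"):
            yield fields[0], dna


raw = [
    ">hsa-let-7a-2 MI0000061 Homo sapiens let-7a-2 stem-loop",
    "AGGUUGAGGUAGUAGGUUGUAUAGUUUAGAAUUACAUCAAGGGAGAUAACUGUACAGCCUCCUAGCUUUCCU",
    ">example-ambiguous hypothetical record with an IUPAC code",
    "UUGUGYGCG",
]
for name, seq in clean_records(raw):
    print(f">{name}\n{seq}")
```

Only the first record is printed, since the second contains a Y. Whatever method you use, it is still worth running Fasta Statistics on the result afterwards.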

Hope this helps!

Thank you so much.
I believe I figured out the format.
However, I am still getting an error.
In your C. elegans example, the names in hairpin.fa and mature.fa correspond directly, such as cel-miR-36 (mature) vs cel-mir-36 (hairpin).
However, in my human files, for example:
hairpin

hsa-mir-15a
CCTTGGAGTAAAGTAGCAGCACATAATGGTTTGTGGATTTTGAAAAGGTGCAGGCCATATTGTGCTGCCTCAAAAATACAAGG
hsa-mir-16-1
GTCAGCAGTGCCTTAGCAGCACGTAAATATTGGCGTTAAGATTCTAAAATTATCTCCAGTATTAACTGTGCTGCTGAAGTAAGGTTGAC
hsa-mir-17
GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATGTGCATCTACTGCAGTGAAGGCACTTGTAGCATTATGGTGAC

Mature

hsa-miR-15a-5p
TAGCAGCACATAATGGTTTGTG
hsa-miR-15a-3p
CAGGCCATATTGTGCTGCCTCA
hsa-miR-16-5p
TAGCAGCACGTAAATATTGGCG
hsa-miR-16-1-3p
CCAGTATTAACTGTGCTGCTGA
hsa-miR-17-5p
CAAAGTGCTTACAGTGCAGGTAG
hsa-miR-17-3p
ACTGCAGTGAAGGCACTTGTAG

The names do not match at all. Is this going to be a problem?
If so, how should I fix it?
When I downloaded your C. elegans example, MiRDeep2 Quantifier worked fine, so I assume this is a formatting problem. Please let me know.
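
One way to check whether the two files are consistent with each other is to test, for every mature sequence, which hairpin sequences contain it, and whether the names line up. The Python sketch below does this on a few of the records shown above. The name rule used here (lowercase the mature name, strip a trailing -5p/-3p, then compare it as a prefix of the hairpin name) is only an illustrative heuristic and an assumption, not the documented matching logic of MiRDeep2 Quantifier; both helper names are hypothetical.

```python
import re
from typing import Dict, List


def normalize_mature_name(name: str) -> str:
    """Lowercase the name and strip a trailing -5p/-3p arm suffix."""
    return re.sub(r"-[35]p$", "", name.lower())


def hairpins_containing(mature_seq: str, hairpins: Dict[str, str]) -> List[str]:
    """Return names of hairpins whose sequence contains the mature sequence."""
    return [name for name, seq in hairpins.items() if mature_seq in seq]


# Records copied from the post above (already trimmed and converted to DNA).
hairpins = {
    "hsa-mir-15a": "CCTTGGAGTAAAGTAGCAGCACATAATGGTTTGTGGATTTTGAAAAGGTGCAGGCCATATTGTGCTGCCTCAAAAATACAAGG",
    "hsa-mir-16-1": "GTCAGCAGTGCCTTAGCAGCACGTAAATATTGGCGTTAAGATTCTAAAATTATCTCCAGTATTAACTGTGCTGCTGAAGTAAGGTTGAC",
}
mature = {
    "hsa-miR-15a-5p": "TAGCAGCACATAATGGTTTGTG",
    "hsa-miR-16-5p": "TAGCAGCACGTAAATATTGGCG",
    "hsa-miR-16-1-3p": "CCAGTATTAACTGTGCTGCTGA",
}

for name, seq in mature.items():
    hits = hairpins_containing(seq, hairpins)
    prefix = normalize_mature_name(name)
    # Heuristic only: a hairpin counts as name-consistent if it starts
    # with the normalized mature name.
    by_name = [h for h in hits if h.startswith(prefix)]
    print(f"{name}: found in {hits}; name-consistent with {by_name}")
```

A mature sequence that is found in no hairpin at all usually points to a content problem (for example, one file still in RNA and the other in DNA) rather than a naming problem.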

Sorry for the many questions.
I think I figured out the name mismatch between hairpin.fa and mature.fa.
I am using an RNA-seq dataset stored in GEO that others have used in a publication about miRNA, so I assume I should be able to detect miRNAs.
However, I keep getting no matches.
I am now confident that hairpin.fa and mature.fa are OK.
However, I am getting this kind of error message:

getting samples and corresponding read numbers.

Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics

#desc total mapped unmapped %mapped %unmapped
total: 14124170 14124170 0.000 1.000
seq: 14124170 14124170 0.000 1.000
analyzing data
Expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_not_expressed.csv

Creating miRBase.mrd file

make_html2.pl -q expression_analyses/expression_analyses_galaxy/miRBase.mrd -k dataset_178aaa1f-f4bc-4368-b25a-c2ccc7ca7a9a.dat -t human -y galaxy -i expression_analyses/expression_analyses_galaxy/dataset_178aaa1f-f4bc-4368-b25a-c2ccc7ca7a9a.dat_mapped.arf -m hsa -M miRNAs_expressed_all_samples_galaxy.csv
miRNAs_expressed_all_samples_galaxy.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files

Is there any setting I need to change? These are my current settings:

  • Sort reads by sample in PDF: Yes
  • Include ID-flexible mapping: Yes
  • Skip file conversion: No
  • Skip mapping against precursor: No
  • Consider the whole precursor as the mature sequence: No
  • Discard all read multimapper: No
  • Upstream nucleotides: 2 (also tried up to 10)
  • Downstream nucleotides: 5 (also tried up to 10)
  • Allowed mismatches: 1 (also tried 2)

Do any of these need to be corrected?
Best

Actually, all set. I figured out what went wrong.
Thank you for taking the time.
