Hi
I tried to run miRDeep2 Quantifier after mapping with miRDeep2 Mapper.
Basically, I downloaded human sequence data from the site and ran miRDeep2 Mapper against hg38.
Then I used the collapsed reads from miRDeep2 Mapper, plus hairpin.fa and mature.fa from miRBase.
However, I get this message:
Fatal error: Exit code 1 ()
getting samples and corresponding read numbers
Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 14124170 14124170 0.000 1.000
seq: 14124170 14124170 0.000 1.000
analyzing data
Expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_not_expressed.csv
Creating miRBase.mrd file
make_html2.pl -q expression_analyses/expression_analyses_galaxy/miRBase.mrd -k dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat -y galaxy -o -i expression_analyses/expression_analyses_galaxy/dataset_fae632b1-cc36-4b54-831c-7ed7fc416b51.dat_mapped.arf -l -M miRNAs_expressed_all_samples_galaxy.csv
miRNAs_expressed_all_samples_galaxy.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
Thank you, Jenna. This is what I see under the "i" (dataset details) icon.
I used hairpin.fa and mature.fa from miRBase, simply downloading them as-is (including all species). Do I need to do anything else with them?
Tool standard output
53327 mature mappings to precursors
(this appears after the "READS READ IN" lines)
Tool standard error
getting samples and corresponding read numbers
Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 14124170 14124170 0.000 1.000
seq: 14124170 14124170 0.000 1.000
analyzing data
Expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_not_expressed.csv
Without reviewing the files, it would be hard to guess whether they have the correct content and are in the right format. You will want nucleotide FASTA files for this. You can use tools like these to check and standardize them: NormalizeFasta, Fasta Statistics.
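As a rough illustration of what such a check looks for, here is a minimal Python sketch (not the Galaxy tool itself) that tallies residues across a FASTA file and lists any record containing characters other than A, C, G, T or U. The function names are hypothetical, and it assumes plain FASTA with `>` title lines.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title without '>', joined sequence) for each FASTA record."""
    title, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(parts)
            title, parts = line[1:], []
        elif line:
            parts.append(line)  # sequence lines before any title are ignored
    if title is not None:
        yield title, "".join(parts)


def check_fasta(lines: Iterable[str], allowed: str = "ACGTU"):
    """Count residues and list (id, unexpected characters) for each bad record."""
    counts: Counter = Counter()
    flagged: List[Tuple[str, List[str]]] = []
    for title, seq in read_fasta(lines):
        seq = seq.upper()
        counts.update(seq)
        unexpected = sorted(set(seq) - set(allowed))
        if unexpected:
            record_id = (title.split() or [""])[0]
            flagged.append((record_id, unexpected))
    return counts, flagged


# Toy input: one clean record plus one containing Y (illustrative only).
demo = [">good example", "ACGUACGU", ">bad example", "ACGYUACG"]
counts, flagged = check_fasta(demo)
print(dict(sorted(counts.items())))
print(flagged)
```

To check a real file, pass an open file handle instead of the toy list, e.g. `check_fasta(open("hairpin.fa"))`.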
What I've done is run a very small tool test, and it worked fine on both the UseGalaxy.eu and UseGalaxy.org servers. Maybe review it and see whether you can spot what differs between your data or parameter choices and mine, and what is happening here? The results are included in this shared history.
For a larger example, we have a Galaxy Training Network (GTN) tutorial that you can compare to as well. Find the link in the Help section of the tool form (scroll down to find it) → GTN tutorials including MiRDeep2 Quantifier: fast quantitation of reads mapping to known miRBase precursors. Sometimes the workflow in a tutorial can be modified or reused, or you can simply run it on the tutorial data to get a better idea of how the included tools work, with a real example using representative data.
Let us know how this works out, and we can follow up more.
>hsa-let-7a-2 MI0000061 Homo sapiens let-7a-2 stem-loop
AGGUUGAGGUAGUAGGUUGUAUAGUUUAGAAUUACAUCAAGGGAGAUAACUGUACAGCCUCCUAGCUUUCCU
>hsa-let-7a-3 MI0000062 Homo sapiens let-7a-3 stem-loop
GGGUGAGGUAGUAGGUUGUAUAGUUUGGGGCUCUGCCCUGCUAUGGGAUAACUAUACAAUCUACUGUCUUUCCU
So I need to remove "MI… Homo sapiens let-7a-3 stem-loop" etc. from the title lines, along with changing U to T?
Similarly for mature.fa?
Could you tell me how to remove those and also change U to T?
The files are too large to edit manually, so I would appreciate your feedback.
Thank you again.
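Outside Galaxy, the two steps asked about here (trimming each title line down to its identifier, as the NormalizeFasta trim option does, and converting U to T, as the RNA/DNA converter does) can be sketched in a few lines of Python. This is a minimal sketch with hypothetical function names, not the Galaxy tools' code; it also rewraps sequences at a fixed width (60 by default, an arbitrary choice).

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title without '>', joined sequence) for each FASTA record."""
    title, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(parts)
            title, parts = line[1:], []
        elif line:
            parts.append(line)
    if title is not None:
        yield title, "".join(parts)


def to_dna_fasta(lines: Iterable[str], width: int = 60) -> List[str]:
    """Trim titles to their first token and convert U to T, rewrapping sequences."""
    out: List[str] = []
    for title, seq in read_fasta(lines):
        # Keep only the identifier, e.g. "hsa-let-7a-2 MI0000061 Homo ..." -> "hsa-let-7a-2"
        out.append(">" + (title.split() or [""])[0])
        dna = seq.upper().replace("U", "T")  # RNA -> DNA alphabet
        out.extend(dna[i:i + width] for i in range(0, len(dna), width))
    return out


record = [
    ">hsa-let-7a-2 MI0000061 Homo sapiens let-7a-2 stem-loop",
    "AGGUUGAGGUAGUAGGUUGUAUAGUUUAGAAUUACAUCAAGGGAGAUAACUGUACAGCCUCCUAGCUUUCCU",
]
print("\n".join(to_dna_fasta(record)))
```

Unlike the Galaxy converter, this sketch does not reject other letters such as Y or R; they pass through unchanged. For a whole file, pass an open handle, e.g. `to_dna_fasta(open("mature.fa"))`, and write the returned lines out joined by newlines.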
For mature.fa, I was able to change U to T with NormalizeFasta followed by the RNA/DNA converter. For hairpin.fa, I ran NormalizeFasta and then the FASTA Width converter, then tried the RNA/DNA converter… but I received the following error:
fasta_nucleotide_changer: found invalid nucleotide sequence (UUGUGUGCGUGCCUGGCUCCCUGUAUGCCACACAUGUAGCGCCCAACCCAGAUGYUAAGGUUGCCUGCUGUGGGUGGCGUGCAAGGGGCCAAGCAUGCAUCCAU) on line 2782
It looks like the hairpin.fa I downloaded from miRBase contains Y or R characters in some sequences.
Is there any way to remove these hairpin sequences so that I can run the U to T conversion? Please advise. Thank you so much!
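If dropping the affected precursors is acceptable, a minimal Python sketch (hypothetical names, assuming plain FASTA input) can set aside every record whose sequence contains anything other than A, C, G, T or U and report the IDs it removed. Keep in mind that reads originating from a dropped precursor will then go uncounted.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title without '>', joined sequence) for each FASTA record."""
    title, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(parts)
            title, parts = line[1:], []
        elif line:
            parts.append(line)
    if title is not None:
        yield title, "".join(parts)


def drop_ambiguous(lines: Iterable[str], allowed: str = "ACGTU") -> Tuple[List[str], List[str]]:
    """Split records into kept FASTA lines and the IDs of dropped records."""
    ok = set(allowed)
    kept: List[str] = []
    dropped: List[str] = []
    for title, seq in read_fasta(lines):
        if set(seq.upper()) <= ok:
            kept.extend([">" + title, seq])  # each sequence written on one line
        else:
            dropped.append((title.split() or [""])[0])
    return kept, dropped


# Toy records (illustrative only): the second one contains Y and R.
kept, dropped = drop_ambiguous([">ok toy", "ACGU", ">amb toy", "ACYGR"])
print(dropped)
```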
Getting the files in the correct format from the data source is best, so check for that first. Then if you need to do some data standardization, that can probably be done in Galaxy.
For confirmed content issues, these two tools might be enough:
NormalizeFasta – has an optional setting on the form to trim off the description content on the > title line, leaving just the identifier.
RNA/DNA converter – will standardize the bases themselves.
Then, if you still have IUPAC characters left, you can try either of these other tools:
Replace parts of text
Text transformation with sed
Finally, confirm that the content is OK with a QC tool.
Fasta Statistics – counts up the bases along with summaries.
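For the replacement route (what Replace parts of text or sed would be used for), the key detail is to touch only sequence lines: title lines such as "Homo sapiens" contain letters like H, m and s that are also IUPAC ambiguity codes, so a file-wide replace would corrupt them. Here is a minimal Python sketch, assuming that masking ambiguity codes as N is acceptable downstream (a judgment call, since aligners typically treat N as a mismatch); the function name is hypothetical.

```python
import re
from typing import Iterable, List

# IUPAC ambiguity codes other than N, in upper and lower case.
AMBIGUOUS = re.compile(r"[RYKMSWBDHVrykmswbdhv]")


def mask_ambiguous(lines: Iterable[str]) -> List[str]:
    """Replace ambiguity codes with N on sequence lines; leave title lines untouched."""
    out: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            out.append(line)  # titles like "Homo sapiens" must not be edited
        else:
            out.append(AMBIGUOUS.sub("N", line))
    return out


print(mask_ambiguous([">toy Homo sapiens", "GAUGYUAAGG"]))
```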
Thank you so much.
I believe I figured out the format…
However, I am still getting an error.
In your C. elegans example, the names in hairpin.fa and mature.fa correspond directly, e.g. cel-miR-36 (mature) vs. cel-mir-36 (hairpin).
However, in my human files, for example:
hairpin
The names do not match at all. Is this going to be a problem?
If so, how should I fix it?
When I downloaded your C. elegans example, miRDeep2 Quantifier worked fine on it… so I assume this is a formatting problem. Please let me know.
Sorry for so many questions.
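Since the miRBase files were downloaded with all species included, one thing worth ruling out is mixing species. miRBase IDs begin with a three-letter species code followed by a hyphen (hsa- for human, cel- for C. elegans), so a minimal Python sketch that keeps only one species' records might look like this. The function name is hypothetical, and it assumes each `>` title line starts with the ID.

```python
from typing import Iterable, List


def filter_species(lines: Iterable[str], prefix: str = "hsa-") -> List[str]:
    """Keep only records whose ID starts with the given species prefix."""
    out: List[str] = []
    keep = False  # lines before the first title are dropped
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = line[1:].startswith(prefix)
        if keep:
            out.append(line)  # title and its sequence lines
    return out


# Toy records (illustrative only).
toy = [">hsa-toy-1 human", "UGAG", ">cel-toy-1 worm", "UGAG"]
print(filter_species(toy))
```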
I think I figured out the name mismatch between hairpin.fa and mature.fa.
I am using an RNA-seq dataset stored in GEO that others have already used in a published miRNA analysis… so I assume I should be able to detect miRNAs.
However, I keep getting no matches.
I am now confident that hairpin.fa and mature.fa are OK.
However, I am getting this kind of message:
getting samples and corresponding read numbers
Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 14124170 14124170 0.000 1.000
seq: 14124170 14124170 0.000 1.000
analyzing data
Expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_galaxy/miRNA_not_expressed.csv
Creating miRBase.mrd file
make_html2.pl -q expression_analyses/expression_analyses_galaxy/miRBase.mrd -k dataset_178aaa1f-f4bc-4368-b25a-c2ccc7ca7a9a.dat -t human -y galaxy -i expression_analyses/expression_analyses_galaxy/dataset_178aaa1f-f4bc-4368-b25a-c2ccc7ca7a9a.dat_mapped.arf -m hsa -M miRNAs_expressed_all_samples_galaxy.csv
miRNAs_expressed_all_samples_galaxy.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
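The mapping statistics above show a %mapped value of 0.000, i.e. essentially none of the 14124170 reads aligned to the precursors. One thing worth checking (an assumption, not something confirmed in this thread) is whether the collapsed reads actually look like small-RNA reads: adapter-trimmed and mostly around miRNA length. Here is a minimal Python sketch, assuming the collapsed-read title format `>xxx_N_xCOUNT` produced by miRDeep2 Mapper's collapsing step (e.g. `>seq_12_x345`) and one sequence line per record; the function names are hypothetical, and the 18 to 26 nt window is an illustrative choice, not a miRDeep2 setting.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed collapsed-read title, e.g. ">seq_12_x345" means this sequence was seen 345 times.
COLLAPSED_TITLE = re.compile(r"^>[^_\s]+_\d+_x(\d+)$")


def read_length_profile(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Return read-count-weighted length counts and the number of unexpected titles."""
    lengths: Counter = Counter()
    odd_titles = 0
    weight = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = COLLAPSED_TITLE.match(line)
            if match:
                weight = int(match.group(1))
            else:
                odd_titles += 1
                weight = 1  # count a record with an unexpected title once
        elif weight is not None:
            lengths[len(line)] += weight  # assumes one sequence line per record
            weight = None
    return lengths, odd_titles


def fraction_in_range(lengths: Counter, lo: int = 18, hi: int = 26) -> float:
    """Fraction of reads whose length falls within [lo, hi] (illustrative window)."""
    total = sum(lengths.values())
    inside = sum(n for length, n in lengths.items() if lo <= length <= hi)
    return inside / total if total else 0.0


demo = [">seq_0_x10", "TGAGGTAGTAGGTTGTATAGTT", ">seq_1_x5", "A" * 75]
profile, odd = read_length_profile(demo)
print(sorted(profile.items()), odd, fraction_in_range(profile))
```

For a real file, something like `read_length_profile(open("collapsed.fa"))` (hypothetical file name) returns the weighted length counts; if most of the weight sits well above miRNA length, the reads may still need adapter trimming before mapping.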
Is there any setup I need to do?
Sort reads by sample in PDF: Yes
Include ID-flexible mapping: Yes
Skip file conversion: No
Skip mapping against precursor: No
Consider the whole precursor as the mature sequence.: No
Discard all read multimapper: No
Upstream nucleotides: 2 (also tried up to 10)
Downstream nucleotides: 5 (also tried up to 10)
Allowed mismatches: 1 (also tried 2)
Is there any setting I need to correct?
Best