Problems Running MirDeep2

Hello everyone,

I am having some issues running MirDeep2. When I run it, I enter the following parameters:

  • For “Collapsed deep sequencing reads” I input the collapsed reads of MirDeep2 Mapper
  • For “genome” I input my genome file (genome.fna.gz)
  • For “Mappings” I input my Mapping output from MirDeep2 Mapper.
  • The rest of the settings are default

When I run the code, I receive the following error:

Fatal error: Exit code 25 ()
Fatal error:
#Starting miRDeep2
/home/galaxy/tool_dependencies/_conda/envs/__mirdeep2@2.0.1.2/bin/miRDeep2.pl /home/galaxy/galaxy/database/datasets/000/185/dataset_185262.dat /home/galaxy/galaxy/database/datasets/000/181/dataset_181109.dat /home/galaxy/galaxy/database/datasets/000/185/dataset_185263.dat none none none -t hsa -g 50000 -b 0

miRDeep2 started at 17:16:17

mkdir mirdeep_runs/run_05_02_2024_t_17_16_17

Error: Genome file /home/galaxy/galaxy/database/datasets/000/181/dataset_181109.dat has not allowed whitespaces in its first identifier


Does anyone know what the issue could be?

This:

Check if there are white spaces in the identifiers. This is the line that starts with a “>”.
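
If the genome is too large to check by eye, a short script can list the title lines that contain whitespace. This is only a minimal sketch, not part of miRDeep2 or Galaxy: the function names are made up for illustration, and it assumes a plain or gzip-compressed FASTA file.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def headers_with_whitespace(path):
    """Return (line_number, header) for every FASTA title line containing whitespace."""
    hits = []
    with open_text(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(">"):
                header = line.rstrip("\r\n")
                # Spaces and tabs both count as whitespace here.
                if any(ch.isspace() for ch in header):
                    hits.append((number, header))
    return hits
```

Calling `headers_with_whitespace("genome.fna.gz")` returns an empty list when every title line is a bare identifier.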

Thank you for the response. I was able to remove the white spaces in the reference genome. I did this by replacing all whitespaces with underscores. I also made sure that the line started with a “>”.

After running it again, I received a new error message:


Fatal error: Exit code 255 ()

#Starting miRDeep2
/home/galaxy/tool_dependencies/_conda/envs/__mirdeep2@2.0.1.2/bin/miRDeep2.pl /home/galaxy/galaxy/database/datasets/000/185/dataset_185259.dat /home/galaxy/galaxy/database/datasets/000/186/dataset_186828.dat /home/galaxy/galaxy/database/datasets/000/185/dataset_185260.dat none none none -g 50000 -b 0

miRDeep2 started at 7:36:22

mkdir mirdeep_runs/run_28_02_2024_t_07_36_22

The mapped reference id NC_060925.1 from file /home/galaxy/galaxy/database/datasets/000/185/dataset_185260.dat is not an id of the genome file /home/galaxy/galaxy/database/datasets/000/186/dataset_186828.dat

Do you know what could be causing this new issue?

Hi @tgray4

Try cleaning up the fasta file so that just the sequence identifiers are on the title lines. You can use the tool NormalizeFasta. Be sure to toggle the option to strip everything after the first white space.

Tools are trying to “match up” the sequence identifiers across different files. So the idea is to isolate the identifiers in each file to make that easier for the tool to process.

So, leave just the sequence identifier on the fasta file's “>” title lines.

Fasta files are best formatted as
>identifier

Not like this (what your first error was complaining about)
>identifier description maybe several words

And not like this (what your second error was about)
>identifier_description_maybe_several_words
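
For anyone working outside Galaxy, the same cleanup can be approximated with a few lines of Python. This is a rough sketch of the idea (keep only the text before the first whitespace on each title line), not the NormalizeFasta implementation, and `strip_descriptions` is a made-up name. It has to be run on the original fasta file: once the spaces have been replaced with underscores, there is no whitespace left to split on.

```python
def strip_descriptions(src_path, dst_path):
    """Copy a FASTA file, keeping only the identifier on each '>' title line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Everything after the first whitespace is description text.
                fields = line[1:].split()
                identifier = fields[0] if fields else ""
                dst.write(">" + identifier + "\n")
            else:
                # Sequence lines are copied unchanged.
                dst.write(line)
```

With this, a title line such as `>identifier description maybe several words` is written out as `>identifier`.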

The identifier value is what would be in other files:

  • BAM, in the SN tag of the @SQ lines of the header and the third column of data lines.
  • GTF or GFF3, in the first column of data lines
  • BED, in the first column of data lines
  • More details FAQ: How to use Custom Reference Genomes?
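
To see which identifiers fail to match before rerunning the tool, the two files can be compared directly. The sketch below rests on an assumption to verify against your own data: that the Mapper output is a tab-separated file with the genome identifier in the sixth column. The function names are hypothetical.

```python
def fasta_ids(path):
    """Collect the first whitespace-delimited token of every '>' title line."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def mapped_ids(path, column=5):
    """Collect reference ids from a tab-separated mapping file.

    column=5 (the sixth column, zero-based) is an assumption about the
    Mapper output layout; change it if your file differs.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > column:
                ids.add(fields[column])
    return ids


def missing_from_genome(mapping_path, genome_path, column=5):
    """Return mapped reference ids that have no matching FASTA identifier."""
    return sorted(mapped_ids(mapping_path, column) - fasta_ids(genome_path))
```

An empty result means every mapped reference id has a matching title line in the genome; anything listed is an id the tool would not be able to find.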

Hope this helps!