Need to know the possible sources of duplicate listings of mitochondrial genes resulting from split sequences in the MITOS output

Greetings to the online community of Galaxy users! I want to know why some genes detected in my assembled mitochondrial genome are listed more than once, with their nucleotide and protein sequences split into separate entries in the multi-FASTA files generated by the MITOS tool. Is this an artifact of the genome assembly, a real feature of the mitochondrial genome structure, or a result of MITOS annotation heuristics? Thank you in advance. Here is the history of my annotation run: Galaxy

Hi @gerry_ramos_jr

The publications from the tool authors explain how the parameter settings work and how to interpret these outputs. You can find the citations linked from the tool form (scroll down to the bottom), and also here.

Citations

Donath, A., Jühling, F., Al-Arab, M., Bernhart, S. H., Reinhardt, F., Stadler, P. F., Middendorf, M., & Bernt, M. (2019). Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes. Nucleic Acids Research, 47(20), 10543–10552.

Arab, M. A., zu Siederdissen, C. H., Tout, K., Sahyoun, A. H., Stadler, P. F., & Bernt, M. (2017). Accurate annotation of protein-coding genes in mitochondrial genomes. Molecular Phylogenetics and Evolution, 106, 209–216.



One of the tool authors is part of the Galaxy community and is active on this forum. Hi @bernt-matthias, would you like to offer any more advice about resources for interpreting the results? Thanks! :slight_smile:

There are a few things here that I would like to point out:

  • Some of the duplicated genes (or rather fragments of the same gene) are adjacent. This is likely due to frameshifts.
  • The genome seems to be rather large for an avian mitogenome.
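A minimal sketch of how one might screen an annotation for the first pattern: group features by gene name, flag genes that occur more than once, and note whether the copies sit next to each other on the same strand. It assumes the MITOS annotation has already been parsed into (name, start, end, strand) records with BED-style coordinates and with any fragment suffixes stripped from the names; the `Feature` type, `find_split_genes`, the `max_gap` threshold, and the coordinates below are hypothetical toy values, not real MITOS output.

```python
from collections import defaultdict
from typing import NamedTuple


class Feature(NamedTuple):
    """One annotated feature (hypothetical record layout, not a MITOS format)."""
    name: str    # base gene name, e.g. "cox1", fragment suffixes already stripped
    start: int   # 0-based start, as in BED
    end: int     # exclusive end, as in BED
    strand: str  # "+" or "-"


def find_split_genes(features, max_gap=50):
    """Report genes annotated more than once.

    For each such gene, also report whether any two consecutive copies on the
    same strand lie within `max_gap` bp of each other (or overlap), which hints
    at one gene split into fragments rather than a true duplication.
    `max_gap` is an arbitrary, illustrative threshold.
    """
    by_name = defaultdict(list)
    for f in features:
        by_name[f.name].append(f)

    report = {}
    for name, copies in by_name.items():
        if len(copies) < 2:
            continue  # gene annotated once: nothing to flag
        copies.sort(key=lambda f: f.start)
        adjacent = any(
            a.strand == b.strand and b.start - a.end <= max_gap
            for a, b in zip(copies, copies[1:])
        )
        report[name] = {"copies": len(copies), "adjacent": adjacent}
    return report


# Toy annotation: cox1 split into two neighbouring pieces, rrnL present twice far apart.
features = [
    Feature("cox1", 5000, 5800, "+"),
    Feature("cox1", 5810, 6550, "+"),
    Feature("nad5", 12000, 13800, "+"),
    Feature("rrnL", 1000, 2600, "+"),
    Feature("rrnL", 15000, 16600, "+"),
]
print(find_split_genes(features))
```

Adjacent copies point toward a single gene split into pieces (for example by a frameshift), while distant copies might instead reflect a genuine duplication or a duplicated segment in the assembly. The sketch ignores features that wrap around the origin of the circular genome.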

Both points make me wonder whether the observed problems are caused by assembly errors.

Besides this, you need to check duplicated or fragmented genes manually, e.g. by a multiple sequence alignment (MSA) with related species.
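As a quick complementary check (not a substitute for the MSA), one could take the region spanning two adjacent fragments and count internal stop codons in each forward reading frame: a frameshift typically produces a premature stop in the frame of the upstream fragment. The sketch below assumes the region has already been extracted as a plain string on the coding strand (reverse-complement minus-strand genes first) and uses the stop codons of NCBI translation table 2 (vertebrate mitochondrial code). The function names and toy sequences are hypothetical, and incomplete stop codons (completed by polyadenylation in many mitochondrial genes) are not handled.

```python
# Stop codons of NCBI translation table 2 (vertebrate mitochondrial code).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stops(seq, frame=0):
    """Return 0-based positions of in-frame stop codons, excluding the last codon."""
    seq = seq.upper()
    codons = [(i, seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]
    # The final complete codon may be the genuine stop, so it is not counted.
    return [i for i, codon in codons[:-1] if codon in VERT_MITO_STOPS]


def frame_report(seq):
    """Count internal stop codons in each of the three forward reading frames."""
    return {frame: len(internal_stops(seq, frame)) for frame in range(3)}


# Toy example: an intact ORF and the same ORF with one inserted base.
intact = "ATGGCCAAAGGGCTTTAA"
shifted = "ATGGCCAAAAGGGCTTTAA"
print(internal_stops(intact, 0), internal_stops(shifted, 0))
print(frame_report(shifted))
```

A stop that appears in the expected frame only within the fragmented region, and disappears once a single base is added or removed, would be consistent with a frameshift; whether it stems from the assembly or is real still needs the comparison with related species.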