Funannotate failing in various ways

Hello! While working with Funannotate, I have seen it fail in several ways without completing its task:

  1. The Arthropoda 2023 database does not work:

[Jun 21 10:21 AM]: OS: Rocky Linux 8.6, 125 cores, ~ 438 GB RAM. Python: 3.8.10
[Jun 21 10:21 AM]: Running funannotate v1.8.9
[Jun 21 10:21 AM]: ERROR: arthropoda busco database is not found, install with funannotate setup -b arthropoda

  2. While using the 2022 database:
  • Names are too long:

[Jun 21 11:23 AM]: OS: Rocky Linux 8.6, 125 cores, ~ 438 GB RAM. Python: 3.8.10
[Jun 21 11:23 AM]: Running funannotate v1.8.9
[Jun 21 11:23 AM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Jun 21 11:23 AM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 21 11:23 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program     Training-Methode
augustus    busco
glimmerhmm  busco
snap        busco
[Jun 21 11:24 AM]: Genome assembly error: headers contain more characters than the max (16), reformat headers to continue.
[Jun 21 11:24 AM]: First 5 headers that failed names:

  • When the names are fine:

[Jun 21 11:25 AM]: OS: Rocky Linux 8.6, 125 cores, ~ 438 GB RAM. Python: 3.8.10
[Jun 21 11:25 AM]: Running funannotate v1.8.9
[Jun 21 11:25 AM]: Can’t find Repeat Database at /data/db/data_managers/funannotate/2022-01-17-193541/repeats.dmnd, you may need to re-run funannotate setup

In conclusion, the tool seems to fail no matter which configuration I select. Thanks for your attention, and sorry for the inconvenience!

Hi @dairon

Yes, the 2023 database is known to be a problem. Ref: problem with funannotate - #3 by jennaj

The 2022 database has been working for others, but maybe there is still a problem!

We’ll need to wait for the indexes to be fixed for your use case. All indexes are undergoing a reorganization this summer, so in a few months this will be sorted out :slight_smile: Thanks for letting us know about the problems!


@dairon Jen’s remark about the 2023 database is, unfortunately, accurate.
Regarding your second issue, though, the error message is accurate as well.
In funannotate, FASTA sequence names cannot be longer than 16 characters (see Preparing your Assembly — Funannotate 1.8.14 documentation), and if I’m counting right, the names in question have 17 characters.
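If you want to check an assembly for this before submitting a run, a minimal Python sketch like the one below lists the offending names. It assumes funannotate counts only the sequence ID (the first whitespace-delimited token after `>`), which may not match its exact rule, and the function names are hypothetical.

```python
from typing import Iterable, List

MAX_HEADER_LEN = 16  # limit quoted in the funannotate error message above


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return each sequence ID: the first whitespace-delimited token after '>'."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def too_long(ids: Iterable[str], max_len: int = MAX_HEADER_LEN) -> List[str]:
    """Return the IDs that are longer than max_len characters."""
    return [seq_id for seq_id in ids if len(seq_id) > max_len]


if __name__ == "__main__":
    example = [">contig_0000000001 some description", "ACGT", ">chr1", "ACGT"]
    for seq_id in too_long(fasta_ids(example)):
        print(seq_id, len(seq_id))  # prints: contig_0000000001 17
```

To check a real file, pass an open file handle instead of the example list, e.g. `too_long(fasta_ids(open("genome.fasta")))` (the file name is a placeholder).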


Hi there! Thanks for the heads up :slight_smile:

I understand the first two errors (the 2023 database and the name issue), but is there a workaround for the last error message? As it stands, I can’t use the tool with any configuration. Thanks again!!

Sure, you can shorten the sequence identifiers with one of the Replace Text tools Galaxy has. You could, for example, drop the first few characters and add them back in the results later.
Exactly how you should shorten them depends on your data, since the IDs must stay unique.
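The drop-a-prefix approach can be sketched in Python as a minimal illustration, not a replacement for the Galaxy Replace Text tools. The helper names (`shorten_ids`, `rename_fasta`, `restore_gff_seqids`) are hypothetical, and the sketch assumes the results you want to map back are GFF3 files whose first column holds the sequence ID. It refuses to continue if dropping the prefix would leave an ID empty, still over 16 characters, or no longer unique.

```python
from typing import Dict, List

MAX_LEN = 16  # funannotate's header limit, per the error message in this thread


def shorten_ids(ids: List[str], drop: int, max_len: int = MAX_LEN) -> Dict[str, str]:
    """Return a short-ID -> original-ID map built by dropping the first `drop` characters."""
    mapping: Dict[str, str] = {}
    for original in ids:
        short = original[drop:]
        if not short or len(short) > max_len:
            raise ValueError(f"{original!r} -> {short!r} is empty or still too long")
        if short in mapping:
            raise ValueError(f"{original!r} and {mapping[short]!r} both become {short!r}")
        mapping[short] = original
    return mapping


def rename_fasta(lines: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite FASTA headers to the short IDs (any description text is dropped)."""
    to_short = {original: short for short, original in mapping.items()}
    out = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            out.append(">" + to_short.get(seq_id, seq_id))
        else:
            out.append(line)
    return out


def restore_gff_seqids(lines: List[str], mapping: Dict[str, str]) -> List[str]:
    """Put the original IDs back into column 1 of GFF3 feature lines."""
    out = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            out.append(line)  # keep comments, directives and blank lines unchanged
            continue
        cols = line.split("\t")
        cols[0] = mapping.get(cols[0], cols[0])
        out.append("\t".join(cols))
    return out


if __name__ == "__main__":
    m = shorten_ids(["sample_contig_00001", "sample_contig_00002"], drop=len("sample_"))
    print(rename_fasta([">sample_contig_00001 len=4", "ACGT"], m))
    print(restore_gff_seqids(["contig_00001\tfunannotate\tgene\t1\t4\t.\t+\t.\tID=g1"], m))
```

Dropping a shared prefix keeps the names recognisable; if no prefix can be dropped without collisions, renaming to sequential IDs and keeping the same mapping would also work.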