BUSCO error: refseq db not found

I ran BUSCO to evaluate a draft bacterial genome assembly and got this error:
“FileNotFoundError: [Errno 2] No such file or directory: ‘/cvmfs/data.galaxyproject.org/byhand/busco/v5/lineages/bacilli_odb10/refseq_db.faa.gz’”

Thanks for your help in advance.

Best regards

Hi @H1889

Yes, this can happen with certain combinations of settings. We have some troubleshooting in similar topics → #busco

This is a fresh testing history. I’m guessing that your options were set up like the first example run that errored.

So, instead, try one of the other combinations. Some of the results can be unexpected. But if the error message says a database “isn’t available”, that means the underlying tool cannot process the data with that combination of settings.
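For anyone with shell access to a machine where the CVMFS reference data is mounted (most users of a public Galaxy server will not have this), a quick way to confirm that the file named in the error is really absent might look like the sketch below. The function name `check_lineage_file` is made up for illustration; only the path comes from the error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a lineage file exists and is non-empty.
check_lineage_file() {
    local path="$1"
    if [ -s "$path" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Path copied verbatim from the FileNotFoundError in the first post.
check_lineage_file '/cvmfs/data.galaxyproject.org/byhand/busco/v5/lineages/bacilli_odb10/refseq_db.faa.gz'
```

A result of "missing" only says the file is not visible from that machine. It does not say which tool version or setting asked for it.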

:graduation_cap: GTN Tutorials for Genome Annotation / Tutorial List

Thanks again for your kind help.

This behavior seems strange to me because 8 months ago I used the same combination of options and everything went well.

Anyway, I will follow your recommendations.

Greetings

Hmm, if you can show an example of where that worked before, I’d be willing to take a look.

Now, the prior versions with the simplified form all required a lineage selection (most similar to the first example in datasets 8-9 above), so offhand that may have been it. But now, with the compound form, I think it is all working as expected.

Hello again, in the post you mentioned above, there is something that doesn’t make sense.
Prokaryotic data is used, but the ortholog database used is from plants (liliopsida).
BUSCO works well if you use version 5.8.0+galaxy0.

Here is my command line:

busco --in '/jetstream2/scratch/main/jobs/71098230/inputs/dataset_fea5ce6a-c055-42f0-8e6d-0ddcaa7e5762.dat' --mode 'geno' --out busco_galaxy --cpu ${GALAXY_SLOTS:-4} --evalue 0.001 --limit 3 --contig_break 10   --lineage_dataset 'bacilli_odb10'  --miniprot  && mkdir BUSCO_summaries && ls -l busco_galaxy/run_*/ && cp busco_galaxy/short_summary.*.txt BUSCO_summaries/ && generate_plot.py -wd BUSCO_summaries -rt specific

Therefore, the error does not stem from the absence of orthologous protein databases for each lineage, but rather from the version of the program used. It is clear that the latest version does not work.

Best wishes

Hi @H1889

I was just setting the default combinations at the top level to show what would be produced – my example isn’t a scientific example, more about the settings. I should have worded this better! I’ll try again.

You have bacterial data, correct? Using Metaeuk is the only requirement; the remainder you can set as needed. The lineage can be “auto”, as I used, but you can also choose a specific lineage. In short, using Miniprot won’t work with the “auto” choice because it only functions with a lineage selection.
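The rule in the paragraph above can be written down as a small pre-flight check. This is only a sketch of that one rule as stated in this reply. The function name and the labels "miniprot", "metaeuk" and "auto" are illustrative, not BUSCO or Galaxy options, and passing the check does not guarantee that a given tool version has the reference files it needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 1 for the one combination described above
# as not working (Miniprot together with automatic lineage selection).
check_busco_combo() {
    local predictor="$1"   # illustrative label: "miniprot" or "metaeuk"
    local lineage="$2"     # "auto" or a dataset name such as "bacilli_odb10"
    if [ "$predictor" = "miniprot" ] && [ "$lineage" = "auto" ]; then
        echo "unsupported: miniprot needs an explicit lineage"
        return 1
    fi
    echo "ok"
}

check_busco_combo metaeuk auto
check_busco_combo miniprot bacilli_odb10
check_busco_combo miniprot auto || true   # keep going after the expected failure
```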

Some more details are here. → https://help.galaxyproject.org/t/busco-for-bacterial-genome-assemblies/14475

Does this now help? :slight_smile: