Roary core genome alignment file is empty (resolved)

Good morning,
I ran Roary on 59 genomes and got 600 core genes, but the alignment file is empty.
Can someone please help me understand why? I ran Roary twice and got the same result.
Thank you

As an update: I reran the job with the Roary tool on Galaxy Europe and got the same result, an empty alignment file. I checked my GFF files and found no problem with them. If someone can guide me on what to do, I will be grateful.
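For anyone double-checking GFF inputs the same way, here is a minimal sketch of one check that can be scripted. Roary expects GFF3 files that carry the nucleotide sequence in a `##FASTA` section at the end (as Prokka writes them), so the sketch only tests for that section. The function names and the `*.gff` glob are illustrative choices, and passing this check does not prove a file is otherwise valid.

```python
from pathlib import Path


def gff_has_fasta_section(path):
    """Return True if the file has a ##FASTA directive followed by a sequence header."""
    seen_fasta = False
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not seen_fasta:
                # The directive sits on its own line after the annotation rows.
                if line.strip() == "##FASTA":
                    seen_fasta = True
            elif line.startswith(">"):
                # At least one FASTA record follows the directive.
                return True
    return False


def check_inputs(directory):
    """Map each *.gff file name in `directory` to whether it passes the check."""
    return {
        p.name: gff_has_fasta_section(p)
        for p in sorted(Path(directory).glob("*.gff"))
    }
```

Calling `check_inputs("my_gff_folder")` on a local copy of the inputs (the folder name is a placeholder) returns a dictionary, and any `False` entry is a file worth opening by hand.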

Hello everyone,

I’m responding to this topic because I’m experiencing the same issue. Roary runs, but the output is empty (0 lines for the “Summary statistics” and “Gene Presence Absence” tabular files and 0 bytes for the “Core Gene Alignment”). It worked fine when I tested a few genomes, but the problem appears when I use more (552 in my case).
I got the same result with different values of “minimum percentage identity for blastp” (90, 85, 65, 60 or 50%).

Does anyone have a solution?
Thanks in advance.

Update: the problem I reported occurs only at usegalaxy.eu. It worked at usegalaxy.org. Still no idea why.

Hi @Cosmo,
thanks for the report; we are investigating the cause of the problem. Could you share a Galaxy history with me in order to reproduce it?

Regards

Hi @gallardoalba,

This issue of getting an empty core genome alignment file is still not resolved at usegalaxy.eu and usegalaxy.org. I ran Roary on 1000 genomes and got 1978 core genes, but the core genome alignment file is missing/empty.

Thank you in advance.

Update 2023-03-07

File naming issue resolved. Use version 3.13.0+galaxy2 or later.


Hi @Ashikha @Cosmo @Ola_Alessa

Roary is one of the few tools that interprets dataset names by necessity. Most other tools don’t – metadata like the datatype, or the actual dataset contents are used instead.
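Because the names matter, a quick pre-check of the dataset names before launching a large job may save a rerun. The sketch below is only an illustration: the "safe" character set and the duplicate check are my assumptions, not rules documented by Roary or Galaxy, and `flag_dataset_names` is a hypothetical helper. The issue ticket remains the reference for which names actually cause trouble.

```python
import re
from collections import Counter

# Assumption: a conservative "safe" set, not a documented Roary/Galaxy rule.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def flag_dataset_names(names):
    """Return {name: [reasons]} for dataset names that look risky."""
    counts = Counter(names)
    problems = {}
    for name in names:
        reasons = []
        if not SAFE_NAME.fullmatch(name):
            reasons.append("characters outside A-Z a-z 0-9 . _ -")
        if counts[name] > 1:
            reasons.append("duplicate name")
        if reasons:
            problems[name] = reasons
    return problems
```

Feeding it the list of names from a history gives back only the names that deserve a second look, each with the reason it was flagged.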

Workarounds that tend to solve most issues with weird results from this tool are listed in the issue ticket here: Roary 3.13.0 fails at usegalaxy.org -- likely installation issue · Issue #293 · galaxyproject/usegalaxy-playbook · GitHub

If those don’t work for some reason, we would definitely like an example in a shared history link. A small public example is best, as we might want to clone it to include in that ticket’s workaround or to reassess whether any part of the issue can be addressed. And if the tool is actually failing silently for suspected resource reasons with larger data or stricter parameters, an example would be helpful. Post the shared link here as a reply (public), or ask @gallardoalba or me to start a private message to share it in (if not shared already!).

Every example of odd failures or odd outputs that I’ve reviewed was independent of the number or size of inputs; all were dataset naming issues. But that could be a biased set. While it is certainly possible to create a massive, computationally expensive job that fails to produce results at a public site (this could happen at one or several, since each uses different resources), those jobs usually end in “red” error results. Very few produce empty (putatively successful) “green” results, and I don’t think that is a known issue yet for this particular tool.

Thanks in advance for the feedback!

(post deleted by author)

Hi @Ashikha the URL you posted is not a history share link. How to generate these is described in the FAQ above. Thanks!

https://usegalaxy.eu/u/kitchlu/h/copy-of-pangenome-analysis

Hi @Ashikha

Thanks for sharing the link.

None of the outputs in the example run have any content. The stderr for the job reports a missing input file (find that log under the :information_source: icon). That file is created by the tool at runtime, so the sub-job that should have written it must have failed.

I didn’t notice anything wrong with that input data but you can double check it. It does work fine when run in a job with fewer inputs.

While it is possible to create a job that is simply too large to process at a public Galaxy site, try a few more reruns before considering a move to a private Galaxy server. Why? The problem seems to be the number of concurrent sub-jobs, e.g. some small fraction might fail by chance. From what I can tell, it isn’t a resource issue.
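To make the "fail by chance" reasoning concrete, here is a small back-of-the-envelope sketch. It assumes each sub-job fails independently with the same probability, which is a simplification, and the failure rate used in the usage note is a made-up number, not a measurement from any Galaxy server.

```python
def p_any_failure(n_subjobs, p_fail):
    """Probability that at least one of n independent sub-jobs fails."""
    return 1.0 - (1.0 - p_fail) ** n_subjobs


def p_all_reruns_fail(n_subjobs, p_fail, reruns):
    """Probability that every one of `reruns` independent attempts hits a failure."""
    return p_any_failure(n_subjobs, p_fail) ** reruns
```

With a hypothetical 0.1% failure rate per sub-job, `p_any_failure(59, 0.001)` is small while `p_any_failure(1000, 0.001)` is well over one half, which would be consistent with small runs working and large ones coming back empty. Under the same assumptions, `p_all_reruns_fail` shrinks with each extra attempt, which is why a few reruns are worth trying.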

Tool Standard Error
Use of uninitialized value in require at /usr/local/tools/_conda/envs/__roary@3.13.0/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.
Use of uninitialized value in require at /usr/local/tools/_conda/envs/__roary@3.13.0/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.
Couldn’t open /data/jwd01/main/050/809/50809869/working/out/QMTNnhP_8F/SRR1999712_output_fasta_gff3.gff.proteome.faa: No such file or directory at /usr/local/tools/_conda/envs/__roary@3.13.0/lib/site_perl/5.26.2/Bio/Roary/SplitGroups.pm line 84.
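If the stderr is long, the missing-file lines can be pulled out with a few lines of Python. This is a sketch that assumes the message has the same shape as the line above. The regular expression and both helper names are mine, and the `.proteome.faa` suffix is simply taken from this log.

```python
import re

# Accepts both the straight and the typographic apostrophe in "Couldn't".
MISSING_FILE = re.compile(r"Couldn['’]t open (\S+): No such file or directory")


def missing_files(stderr_text):
    """Return the paths reported as missing, in order, without duplicates."""
    seen = []
    for match in MISSING_FILE.finditer(stderr_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


def sample_name(path, suffix=".proteome.faa"):
    """Strip directories and the intermediate-file suffix to recover the input name."""
    base = path.rsplit("/", 1)[-1]
    return base[: -len(suffix)] if base.endswith(suffix) else base
```

Applied to the log above, `sample_name` points back at the `SRR1999712_output_fasta_gff3.gff` input, which is the dataset to inspect first.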

Note: The wrapper could probably be improved to trap processing issues better. If you want to make suggestions, please post them to the original GitHub issue ticket, where the tool author will see them. Referencing this topic/discussion will provide context.


Update: I decided to post a summary of what I think this tool wrapper needs. Feel free to add more ideas. Roary 3.13.0 fails at usegalaxy.org -- likely installation issue · Issue #293 · galaxyproject/usegalaxy-playbook · GitHub

2 posts were split to a new topic: Roary troubleshooting (version 3.13.0+galaxy2 or later)