Did you ever find a solution? I’ve got the same problem. I ran Roary on 37 E. coli genomes, but the core genome alignment file is empty and no sequences come out.
All of the UseGalaxy.* servers have been updated to the latest tool version: Galaxy Version 3.13.0+galaxy2. If this is not the version you are running, please try with the updated wrapper.
I updated the ticket with the details about the resolution addressing the “spaces in names” issue.
I’m going to split off your question into a new topic. If you want more help, would you please post back a shared history link (see: Sharing your History)? You can also start a rerun in that same history if you haven’t already (a rerun can resolve transient failures).
Test history with example usage: https://usegalaxy.org/u/jen/h/test-prokka-roary
@jennaj Posting here as I am facing the same issue on
I am running Roary on a collection of 2015 gff3 files. Tried multiple times - all the outputs of Roary are green but empty. In the stderr of the tool I get:
Use of uninitialized value in require at /email@example.com/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.
Use of uninitialized value in require at /firstname.lastname@example.org/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.
Couldn't open /data/jwd04/main/057/965/57965461/working/out/MHL1WUEQd_/SRR9971496.gff.proteome.faa: No such file or directory at /email@example.com/lib/site_perl/5.26.2/Bio/Roary/SplitGroups.pm line 84.
I noticed that there is a “failed to allocate memory” message in the job details.
I have downloaded and checked that all files are of expected size and do not contain invalid characters (spaces etc). Also, the tool works fine on a small subset of the data.
Anything else I should try? Thanks!
The “out of memory” message might not actually be what is going wrong. UseGalaxy.eu has significant resources on the cluster nodes. It seems the tool was attempting to use a file that was expected but not found: an intermediate file created by the tool itself.
People sometimes hit the same problem when running the tool directly. Example ticket: Could not open .proteome.faa · Issue #475 · sanger-pathogens/Roary · GitHub (there are several others; search the Issues for “proteome”). And this ticket: Core genome command not running · Issue #541 · sanger-pathogens/Roary · GitHub (keywords “core genome”).
One of those tickets at the authors’ repo suggests adding a flag to the command string, which solved a similar problem for at least one person. I’m not sure whether that will work for all failures like this, but I asked the developers to review it here, and you can follow that ticket if interested (or help to modify/test the wrapper): Roary: adjust the command line to prevent intermediate file deletion? · Issue #5205 · galaxyproject/tools-iuc · GitHub
Other than that, you could review the data content and try to work around whatever is making the tool unhappy. The Galaxy wrapper stages and runs a tool; it does not control the tool’s internal algorithm. Is there anything special about the input referenced in the stderr: SRR9971496.gff? Does the GFF actually describe any protein/CDS regions to extract (maybe try extracting them separately and running a BLAST yourself)? What happens if you leave that dataset out, or input it in a different order, or input it with a smaller number of inputs, etc.? You might be able to figure out how to get it to work. E.g., work out why that file is the “first” problem reported and which other datasets have similar content, then leave all of those out. Or maybe try different parameters. From a super quick look, the read quality does not seem great for that accession: SRA Archive: NCBI
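If you download the GFF files, a quick local check along these lines might help spot inputs with nothing for the tool to extract. This is only a rough sketch, not anything Roary or the Galaxy wrapper runs: the function names and the `gff_inputs` directory are made up for illustration, and it assumes Prokka-style GFF3 files (tab-separated, feature type in column 3, sequences after a `##FASTA` line).

```python
from collections import Counter
from pathlib import Path


def summarize_gff(path):
    """Count feature types in a GFF3 file and note whether a ##FASTA section exists."""
    counts = Counter()
    has_fasta = False
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                has_fasta = True
                break  # everything after this line is sequence, not features
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 9:
                counts[fields[2]] += 1  # column 3 holds the feature type
    return counts, has_fasta


def flag_suspect_files(directory, pattern="*.gff"):
    """Return names of GFF files with no CDS features or no ##FASTA section."""
    suspects = []
    for path in sorted(Path(directory).glob(pattern)):
        counts, has_fasta = summarize_gff(path)
        if counts["CDS"] == 0 or not has_fasta:
            suspects.append(path.name)
    return suspects
```

Calling `flag_suspect_files("gff_inputs")` on a folder of downloaded files returns the names worth a closer look. A file showing up here is not proof that it caused the failure, but leaving the flagged files out of the collection is a cheap first experiment.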
Hope that helps