Missing Gene trees in orthofinder outputs

valeriolollo · March 26, 2026, 8:39am

Hi!

I’m working with OrthoFinder 2.5.5+ and I’m having difficulties with the gene tree outputs. The error says “problem building datasets for collection”. While reading the help section I noticed this line: “This galaxy tool implements the first part of the Orthofinder program, e.g. the clustering of orthogroups of genes”. Here is the Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/orthofinder_onlygroups/orthofinder_onlygroups/2.5.5+galaxy1

And I share also the “tool standard error message”:

/usr/local/bin/scripts_of/tree.py:367: SyntaxWarning: invalid escape sequence '\-'
"""
/usr/local/bin/scripts_of/tree.py:1422: SyntaxWarning: invalid escape sequence '\-'
"""
/usr/local/bin/scripts_of/newick.py:54: SyntaxWarning: invalid escape sequence '\['
_ILEGAL_NEWICK_CHARS = ":;(),\[\]\t\n\r="
/usr/local/bin/scripts_of/newick.py:57: SyntaxWarning: invalid escape sequence '\['
_NHX_RE = "\[&&NHX:[^\]]*\]"
/usr/local/bin/scripts_of/newick.py:58: SyntaxWarning: invalid escape sequence '\d'
_FLOAT_RE = "[+-]?\d+\.?\d*(?:[eE][-+]\d+)?"
/usr/local/bin/scripts_of/newick.py:60: SyntaxWarning: invalid escape sequence '\['
_NAME_RE = "[^():,;\[\]]+"
/usr/local/bin/scripts_of/newick.py:337: SyntaxWarning: invalid escape sequence '\s'
MATCH = '%s\s*%s\s*(%s)?' % (FIRST_MATCH, SECOND_MATCH, _NHX_RE)
/usr/local/bin/scripts_of/probroot.py:10: SyntaxWarning: invalid escape sequence '\i'
"""
/usr/local/bin/scripts_of/probroot.py:201: SyntaxWarning: invalid escape sequence '\l'
"""
/usr/local/bin/scripts_of/probroot.py:267: SyntaxWarning: invalid escape sequence '\l'
"""

It’s not clear if this version of orthofinder includes building gene trees or not.

Best regards

Valerio

wm75 · March 26, 2026, 9:46am

Well, the tool has a “Full run (including gene trees)” option, so yes, it can do what you expect, but judging from the error message it doesn’t like your fasta input (probably the sequence identifiers).

To debug this in detail, we would need at least a link to a shared history.

In general, this is way easier if you just send a proper bug report from the failed dataset on the server you are working on because that report will already include all the necessary details. (Sorry in case you tried this already and did not get a reply.)

valeriolollo · March 27, 2026, 8:13am

Thank you for your kind reply. Here is my OrthoFinder history:
toolshed.g2.bx.psu.edu/repos/iuc/orthofinder_onlygroups/orthofinder_onlygroups/2.5.5+galaxy1

Kind Regards
Valerio

wm75 · March 27, 2026, 8:52am

@valeriolollo are you?

valeriolollo · March 27, 2026, 9:01am

Yes it’s me

wm75 · March 27, 2026, 9:29am

I mean are you going to share the history?

jennaj · April 1, 2026, 7:53pm

Hi @valeriolollo

I checked at UseGalaxy.org and didn’t find any bug reports sent in for this tool either. Maybe you were working at a different server?

But first – if you are not sure how to review your job details, how to send in a bug report, or how to generate and post back a shared history link, please see the banner topic at this forum! It links to FAQs that explain the how-to.

How to get faster help with your question → FAQ: Troubleshooting errors

That said, I agree with the guess @wm75 provided! The tool seems to be reporting that it found unexpected characters in the sequence headers. You can simplify the FASTA “>” title lines to see if that is enough! Try a very simple identifier format with only letters and numbers: no special characters and no whitespace. The tool uses these identifiers as a primary key in its other manipulations, and special characters can lead to problems like yours!

>goodFormat1

>not-as_$readable by Tools | due to spaces and [special] % characters
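
If you would rather do the cleanup outside of Galaxy, here is a minimal Python sketch of the idea. It is not part of OrthoFinder or Galaxy, and the function name `simplify_fasta_headers` and the `seq` fallback prefix are made up for illustration. It assumes that keeping only the letters and digits of the first word on each title line is acceptable for your data, and it returns a mapping so you can trace the new identifiers back to the original titles.

```python
import re


def simplify_fasta_headers(lines, prefix="seq"):
    """Reduce each '>' title line to a unique identifier of letters and digits.

    Hypothetical helper for illustration only.
    Returns (new_lines, mapping), where mapping is new identifier -> original title.
    """
    new_lines = []
    mapping = {}
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            original = line[1:].strip()
            tokens = original.split()
            # Keep only the first whitespace-delimited word of the title
            first_token = tokens[0] if tokens else ""
            # Drop everything that is not a letter or a digit
            base = re.sub(r"[^A-Za-z0-9]", "", first_token) or prefix
            # Append a counter when the cleaned identifier was already used
            candidate = base
            n = 1
            while candidate in seen:
                n += 1
                candidate = f"{base}{n}"
            seen.add(candidate)
            mapping[candidate] = original
            new_lines.append(">" + candidate)
        else:
            # Sequence lines pass through unchanged
            new_lines.append(line)
    return new_lines, mapping
```

To use it, read your FASTA file into a list of lines, pass that list to the function, and write the returned lines to a new file that you upload to Galaxy. Keep the mapping somewhere safe if you need the original titles later.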

We have some tutorials that teach how to use some of the Text/Fasta Manipulation tools! You can also search the tool panel for common utility names to find these – the Help on the form is a mini-guide!

If you need help confirming this issue, or with resolving it, we’ll need to see the exact data in context in a shared history. You can leave your own manipulation attempts (if any yet!) and the error results in this same history! Create the link and copy/paste it back here in a reply. You posted a tool_id last time, which is helpful, but for these we need to also see the data.

→ Hands-on: Data Manipulation Olympics / Introduction to Galaxy Analyses

If you solved this already, please let us know!