IQ-TREE configuration

I’m trying to specify the substitution model in IQ-TREE. This is supposed to be done with the -m tag, but when I tried to do this via Galaxy Europe, I got an error. I think my syntax for entering the model is correct, according to IQ-TREE’s website. As an example, one of my datasets got a best model of TPM3uf+G4 with ModelTest-NG 0.1.7, so the appropriate syntax should be -m TPM3uf+G4 (although I tried variations), but Galaxy’s IQ-TREE kept giving me errors like this: ERROR: File not found TPM3X.

Ultimately, I tried running IQ-TREE with its included model selector, ModelFinder, although the resulting tree looks weird, and I can’t help but think it’s a problem of using the wrong model. Apparently ModelTest-NG outperforms ModelFinder. Can anyone tell me what I need to do to specify a model in Galaxy’s IQ-TREE?

Hi @bernt-matthias and @wm75 - would one of you be able to help? Or suggest someone who could? Thanks!


I’m also having a tangential issue: IQ-TREE fails after running for a month, but gives no error message. Any ideas on either that problem or the topic of the thread?

Hi @jaredbernard

I haven’t used this tool enough to learn all the little details, but I did just start a test using the example linked on the tool form. I’ve shared it here: https://usegalaxy.eu/u/jenj/h/tes-iq-tree

That file is a larger example than what is on the form directly. But reviewing that example seems like a good place to start.

7 28
Frog       AAATTTGGTCCTGTGATTCAGCAGTGAT
Turtle     CTTCCACACCCCAGGACTCAGCAGTGAT
Bird       CTACCACACCCCAGGACTCAGCAGTAAT
Human      CTACCACACCCCAGGAAACAGCAGTGAT
Cow        CTACCACACCCCAGGAAACAGCAGTGAC
Whale      CTACCACGCCCCAGGACACAGCAGTGAT
Mouse      CTACCACACCCCAGGACTCAGCAGTGAT

This is how I interpret the formatting (a flavor of PHYLIP)

ASB
C1TDDDDDDDDDDD1$
C2TDDDDDDDDDDD2$
C3TDDDDDDDDDDD3$

Where:

A = the number of species lines (C)
B = the length of the sequences (D)
C = species names, alphanumeric, all one word, and distinct per line
D = sequence, all the same length
S = space
T = tab
$ = new line
No extra blank lines at the end
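
As a rough sketch of the rules above, the following Python checks a file against my interpretation of the format. The function name `check_phylip` and the wording of the messages are made up for illustration, and it assumes each record sits on one line (name, whitespace, sequence). It accepts either tabs or spaces between name and sequence, since the example uses spaces.

```python
def check_phylip(text):
    """Return a list of problems found in a one-record-per-line PHYLIP-style file."""
    lines = text.split("\n")
    # A single final newline is fine; anything beyond that counts as a blank line.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return ["file is empty"]

    header = lines[0].split()
    if len(header) != 2 or not all(field.isdigit() for field in header):
        return ["header must be two integers: <n_species> <seq_length>"]
    n_species, seq_len = int(header[0]), int(header[1])

    problems = []
    records = lines[1:]
    if any(not rec.strip() for rec in records):
        problems.append("blank line found after the header")
    records = [rec for rec in records if rec.strip()]
    if len(records) != n_species:
        problems.append(f"header says {n_species} species, found {len(records)}")

    seen = set()
    for rec in records:
        parts = rec.split()
        if len(parts) != 2:
            problems.append(f"expected '<name> <sequence>': {rec!r}")
            continue
        name, seq = parts
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        if len(seq) != seq_len:
            problems.append(f"{name}: length {len(seq)}, header says {seq_len}")
    return problems


good = "3 4\nFrog AAAT\nBird CTAC\nCow CTAC\n"
bad = "3 4\nFrog AAAT\nFrog CTA\n"
print(check_phylip(good))
print(check_phylip(bad))
```

An empty list means the file passed these particular checks. It does not guarantee that IQ-TREE will accept the file.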

For the other options, check the documentation for the tool via the links out. I would suggest starting very simple: not too many species, strong homology in the MFA sequences, default parameters. Then layer in options. Once that works on a smaller data set, scale up the inputs to see the limits. You could even play around with the larger example when exploring parameters, since I’m guessing that was the intention of providing it in so many places.

The EU server will let jobs run for a long time, and can rerun jobs one time (happens automatically, so can be missed). The Galaxy job logs might be lost if the tool eventually spins out and dies (memory reasons, etc). Smaller and shorter runs probably report more meaningful logs. I also see a place to input a “smaller” tree to guide the run – this is likely for iterative use to chomp through larger analysis pools of data (guess!).

So, the tool is working as far as I can tell. The usage can get quite complex. There is likely public discussion about it too. Translating direct usage to Galaxy usage should be possible, as the form options are labeled with command string tags.
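
To make the form-to-command-line mapping concrete, here is a minimal sketch that assembles a direct IQ-TREE call from the two options discussed in this thread: `-s` for the alignment and `-m` for the model. The binary name `iqtree2`, the file name `alignment.phy`, and the helper `build_iqtree_command` are assumptions for illustration. The sketch only builds and prints the command; it does not run it, and it does not check whether IQ-TREE recognizes the model name.

```python
import shlex


def build_iqtree_command(alignment, model, binary="iqtree2"):
    """Assemble an IQ-TREE argument list; the binary name is an assumption."""
    return [binary, "-s", alignment, "-m", model]


# Model string taken from the first post; whether IQ-TREE accepts this exact
# spelling is the open question in this thread.
cmd = build_iqtree_command("alignment.phy", "TPM3uf+G4")
print(shlex.join(cmd))
```

It may also be worth comparing the model name against the list in IQ-TREE's own documentation. As far as I can tell, IQ-TREE and ModelTest-NG do not always spell the same model identically (I believe IQ-TREE writes TPM3u where ModelTest-NG writes TPM3uf), so the name might need translating before it goes into `-m`.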

Sorry I couldn’t help more! Others are still welcome to comment more :slight_smile:

ps: I forgot to mention that there is a Galaxy tutorial linked on the form (scroll all the way down). That also seems promising!

Thanks for looking into this! I doubt it’s simply an input formatting problem, because I had one or two runs that stopped almost immediately because of formatting issues – which for me were that some sequences had identical names. But that was corrected and then the jobs appeared to run for over four weeks. I’m reasonably sure the inputs were correct, but I’ll check through it all again to be sure.

I’ll take a look at the tutorial as well to see if I’m missing anything, so thanks for that. My datasets are massive, so that could be an issue. I’ll look into maybe subsetting for a trial run, as you suggest.
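
For the trial run, something like this minimal sketch could cut a smaller file out of a big alignment. It assumes the one-record-per-line layout described earlier in the thread, and the function name `subset_phylip` is made up. It simply keeps the first `n_keep` records and rewrites the header count, so it is not a representative sample of the taxa.

```python
def subset_phylip(text, n_keep):
    """Keep the first n_keep records of a one-record-per-line PHYLIP-style file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    seq_len = lines[0].split()[1]      # sequence length is unchanged
    kept = lines[1:1 + n_keep]         # records follow the header line
    header = f"{len(kept)} {seq_len}"  # species count must match what was kept
    return "\n".join([header] + kept) + "\n"


full = "3 4\nFrog AAAT\nBird CTAC\nCow CTAC\n"
print(subset_phylip(full, 2), end="")
```

If the small file runs cleanly, the number of records can be stepped up to see where the jobs start to stall.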

And again, it would be ideal if I could input the selection model so IQ-TREE doesn’t need to run ModelFinder, but I immediately got errors when I tried that, as described in my first post.

Thanks for the suggestions, and I’ll appreciate any further ideas as well! :melting_face:

Hi @jaredbernard This is interesting, and I wonder if it is related to the larger query (probably not). It sounds like the choice is not supported (could be just at that server).

If you want to suggest the option change to the tool wrapper authors, a prior post explains how, so I won’t repeat that (see below). The more details or scientific guidance you can offer for the change, the more likely it is to be picked up soon-ish. I don’t know if the choice is available in the original tool or not, but maybe you do, and sometimes it is possible to layer that type of advanced configuration into the wrapper anyway. Apologies that I misunderstood this part originally!

And, the rest sounds good! Smaller data seems like the way to go for now. Four weeks is a bit nuts!