How to do Augustus training or Maker training?

Error when running Maker:


I followed exactly the steps you showed me to download the EST and SwissProt FASTA data for rat, and ran Maker with exactly the settings the tutorial recommends. The error message says it is possible to “fix it with -fix_nucleotides”, but there is no such option in the tool interface.
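As far as I understand it, the -fix_nucleotides message means at least one nucleotide FASTA input contains characters other than A, C, G, T or N (for example IUPAC ambiguity codes such as R or Y), and the flag makes Maker replace them with N. Before re-running, a quick check of the genome and EST files can show which records are affected. Below is a minimal Python sketch under that assumption: the function names are hypothetical, only ACGTN (either case) counts as valid, and it is meant for nucleotide files only, since protein FASTA such as SwissProt legitimately contains other letters.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: Maker accepts only A, C, G, T and N (upper or lower case) in nucleotide input.
INVALID = re.compile(r"[^ACGTNacgtn]")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_invalid(lines: Iterable[str]) -> Dict[str, int]:
    """Return {header: number of non-ACGTN characters} for records that have any."""
    counts = {}
    for header, seq in read_fasta(lines):
        n = len(INVALID.findall(seq))
        if n:
            counts[header] = n
    return counts


def fix_nucleotides(seq: str) -> str:
    """Replace every non-ACGTN character with 'N' (hypothetical stand-in for the Maker flag)."""
    return INVALID.sub("N", seq)


if __name__ == "__main__":
    example = [">est1", "ACGTRYAC", "GGTN", ">est2", "acgtnnacgt"]
    print(count_invalid(example))
    print(fix_nucleotides("ACGTRYAC"))
```

To scan a real file, you could pass an open file handle, e.g. `count_invalid(open("est.fasta"))`, where `est.fasta` is a placeholder for your own dataset.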

Hi @daikez,
I updated the tool to include this parameter; it will be available soon. The pull request with this change is “Update Maker: Add paramter -fix_nucleotides”.

Regards

The parameter -fix_nucleotides is still not available in the Maker tool interface (or is it hiding somewhere?)

Hi @daikez,
the new update will be installed on Saturday (all Galaxy updates are scheduled for Saturday).

Regards.

Understood! Thanks! 👍


The new update ran without errors (and weirdly quickly), but it returned some empty files.

Should I try re-downloading the EST and protein evidence?

I ran into the same problem with the “@2” interpolation error. You may find this conversation helpful.

Thanks! 👍 I am trying your settings and will see whether the run goes through overnight. 🙂

Still running, not finished yet. So far, with my dataset, the Maker tool accepts Dfam with no species selected and with tRNAscan disabled (enabling tRNAscan always returns an empty file with the @2 error).

Should I also post my test results in your other thread once I have finished?

In my experience each Maker run takes 4-10 days. I have been getting results by using Dfam without selecting a species. When I tried to select a species, I got another error message saying RepeatMasker could not detect the selected species.

Has anyone had very poor recovery of genes after multiple rounds of Maker, despite using both Augustus and SNAP training?

My genome assembly has a very high BUSCO score (~98%), as does my transcriptome assembly (~98%), but my annotations recover only ~2% of BUSCOs. I’ve tried annotating against the proteome of a relative and also using the UniProt/SwissProt protein set. The latter works slightly better, but recovery is still quite low.

Is there a better way to train Augustus? Or any other ideas? I just tried adding a transcriptome assembly GFF from a relative, but it goes in the “other_gff” field, so it may not help much.

It looks strange to get so few BUSCOs. Did you run all the training rounds? Could you share your history with me (anthony.[myname]@inria.fr) so I can have a look?

Augustus training is a mystery to me, since it seems to require a predefined training set, and my species of interest (rat) is not on the list; there is not even a rodent. Could I borrow the “Homo sapiens” training set instead, odd as that sounds? How can anyone start Augustus training if no reference model exists in advance?

I sent it to you, replacing the brackets with your last name. Thanks so much for any ideas you have!

Others can answer better than I can, but I think it really depends on why you are training Augustus; there are different ways to train it depending on the purpose.

When running BUSCO to assess the completeness of a genome assembly, for instance, you can use Augustus to self-train against the closest model species available.

For annotating a genome with Maker, you can use the “Train Augustus” tool to train Augustus on the imperfect annotations that Maker produced for your de novo genome in the first round. That Augustus model can then be used in the next round of Maker annotation.
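To make the round structure concrete: on a command-line MAKER install these choices live in maker_opts.ctl as key=value lines, and in Galaxy the same options appear on the Maker tool form. The Python sketch below assumes the commonly used keys est2genome, protein2genome, augustus_species and snaphmm; the helper names and the model names in the usage note are hypothetical, so check them against your own maker_opts.ctl.

```python
from typing import Dict, Optional


def maker_round_options(round_no: int,
                        augustus_species: Optional[str] = None,
                        snap_hmm: Optional[str] = None) -> Dict[str, str]:
    """Return maker_opts.ctl-style settings for one annotation round (hypothetical helper)."""
    if round_no < 1:
        raise ValueError("rounds are numbered from 1")
    if round_no == 1:
        # Round 1: no trained predictors yet, so gene models come straight from the evidence.
        return {"est2genome": "1", "protein2genome": "1"}
    if augustus_species is None and snap_hmm is None:
        raise ValueError("later rounds need a trained Augustus model and/or SNAP HMM")
    # Later rounds: rely on predictors trained on the previous round's output.
    opts = {"est2genome": "0", "protein2genome": "0"}
    if augustus_species is not None:
        # e.g. a model produced by the "Train Augustus" tool from round-1 annotations
        opts["augustus_species"] = augustus_species
    if snap_hmm is not None:
        opts["snaphmm"] = snap_hmm
    return opts


def to_ctl(opts: Dict[str, str]) -> str:
    """Render settings as key=value lines, the layout used in maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in opts.items())
```

For example, `print(to_ctl(maker_round_options(2, augustus_species="rat_round1")))` would print the round-2 settings for a hypothetical Augustus model named rat_round1.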

You may find this thread and this thread helpful. And I’m sure others can give more accurate answers.

After a few weeks of running Maker with Dfam and no species selected, I still got empty files with the @2 error.

I am now trying the last option: running Maker on the RepeatMasked assembly with Dfam disabled.


I got the @2 error message immediately.

Sorry to hear that you are still having problems, @daikez.

In my very limited experience, the @2 interpolation problem did not actually prevent Maker from producing results, and @abretaud has said the same. So although this Perl interpreter problem should probably be fixed, it is likely not the reason your datasets are empty. Your full error message will probably show other errors, but unfortunately the @2 issue is listed as the main error in the dataset metadata, which leads one to think it is why the job failed. In my case, the error message revealed that the Dfam repeat database step was trying to use a species from the list that was apparently not supported (Drosophila), as you can see in the error message I posted here. None of the other model organisms on the list suited my work, so I did not select a species, which meant using the entire curated Dfam database of repeats. Once I did that, I started getting results from Maker and could train Augustus on them.

(By the way, I’m now using a custom repeat library I built instead of the whole Dfam database, but that’s another story.)

So all I can suggest is to read your entire error message closely; the real cause is likely several lines below the initial @2 warning.
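When a Maker job fails, its stderr can be long and the @2 warning tends to come first, so a small filter that drops that warning and keeps lines that look like errors may make the real cause easier to spot. This is a minimal Python sketch assuming the @2 line is Perl’s standard “Possible unintended interpolation of @2 in string” warning; the keyword list is a guess to tune against your own logs, and the function name is hypothetical.

```python
import re
from typing import List

# Assumption: the @2 noise is Perl's standard warning,
# "Possible unintended interpolation of @2 in string at <file> line <n>."
AT2_WARNING = re.compile(r"Possible unintended interpolation of @2\b")

# Hypothetical keywords that often mark real failures; tune them for your logs.
ERROR_HINTS = re.compile(r"\b(error|died|failed|cannot|could not|not found)\b",
                         re.IGNORECASE)


def real_errors(log_text: str) -> List[str]:
    """Return lines that look like errors, skipping the @2 interpolation warning."""
    hits = []
    for line in log_text.splitlines():
        if AT2_WARNING.search(line):
            continue  # reported in this thread as not blocking Maker results
        if ERROR_HINTS.search(line):
            hits.append(line.strip())
    return hits
```

You could save the job’s stderr to a file (for example a hypothetical maker_stderr.txt) and run `print("\n".join(real_errors(open("maker_stderr.txt").read())))`.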

Maybe someone from Galaxy will have better suggestions.

For anyone with the low BUSCO recovery problem I described earlier: I seem to have resolved it by building my own custom repeat library to use with Maker. I’m still in the early Augustus-trained rounds, but BUSCO recovery has already increased to nearly 80%. To produce the de novo repeat library, I used the EDTA workflow, which combines several packages, including RepeatModeler and LTRharvest.
