How to train Augustus or Maker?

I want to do Augustus training, but the rat model is not available on the list. Are there any other ways to do it?

Also, when I look at the Maker training set, I see that I have to use EST and protein evidence databases. Where can I find those databases? I have downloaded the rat (rn7.2) assembly dataset. Is it correct to use the rna.fna file from the rn7.2 assembly as the ESTs or assembled cDNA, and the protein.fna from the rn7.2 assembly as the protein evidence? It did not seem to work this way, so what should I use instead?

Hi @daikez,
Regarding your first question, you can generate a new Augustus model with the Augustus training tool.

Regarding your other questions: if you intend to re-annotate the rn7.2 assembly, it is not a good idea to use the protein.fna and rna.fna files from the rn7.2 assembly. That information has already been used to produce the existing annotation, so it won't provide any additional evidence.

You can get an updated EST dataset from GenBank. I recorded a short video showing how to get the sequences.
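
The video is not reproduced here, but as a rough scripted alternative, NCBI's E-utilities can be queried directly. This minimal sketch only builds the request URLs; it assumes the EST division is still searchable as db=nucest (NCBI may route EST searches through the nucleotide database instead, so check before relying on it), and the function names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "nucest") -> str:
    """Build an esearch URL. db="nucest" is an assumption, see above."""
    # usehistory=y keeps the hit list on NCBI's server so that efetch
    # can download it in batches using WebEnv/query_key.
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv: str, query_key: str, retstart: int,
               retmax: int = 500, db: str = "nucest") -> str:
    """Build an efetch URL for one batch of FASTA records."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("Rattus norvegicus[Organism]"))
```

The esearch reply contains Count, QueryKey and WebEnv values; these feed efetch_url while stepping retstart by retmax until Count is reached. Keep requests slow (NCBI limits unauthenticated clients to a few requests per second).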

The protein sequences can be obtained from UniProt.
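
For the reviewed (Swiss-Prot) rat entries, the UniProt REST API can stream a FASTA file directly. A minimal sketch, assuming the rest.uniprot.org stream endpoint and its organism_id and reviewed query fields; 10116 is the NCBI taxonomy ID for Rattus norvegicus, and the function name is hypothetical.

```python
from urllib.parse import urlencode

RAT_TAXON_ID = 10116  # Rattus norvegicus


def uniprot_fasta_url(taxon_id: int, reviewed_only: bool = True) -> str:
    """Build a UniProt REST URL that streams all matching proteins as FASTA."""
    query = f"organism_id:{taxon_id}"
    if reviewed_only:
        # reviewed:true restricts the result to Swiss-Prot entries
        query += " AND reviewed:true"
    params = {"query": query, "format": "fasta"}
    return "https://rest.uniprot.org/uniprotkb/stream?" + urlencode(params)


if __name__ == "__main__":
    print(uniprot_fasta_url(RAT_TAXON_ID))
```

Passing the printed URL to urllib.request.urlretrieve (or pasting it into Galaxy's upload tool) should save the FASTA for use as Maker protein evidence.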

Let me know if you need additional information.

Regards.

Also, this Galaxy tutorial on genome annotation with Maker includes Augustus training, so perhaps you’ll find it helpful.

Thanks a lot for your videos! Now I can continue with the Galaxy tutorial. :+1: :slight_smile:

Error when running Maker:

I followed exactly the steps you showed me to download the EST and Swiss-Prot FASTA data for rat, and ran Maker with exactly the settings the tutorial recommends. The error message says it's possible to "fix it with -fix_nucleotides", but there is no such option in the tool interface.
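
Until the option is exposed in Galaxy, one possible workaround is to clean the EST FASTA before uploading it. This sketch assumes the error is triggered by characters other than A, C, G, T and N (for example IUPAC ambiguity codes such as R or Y) and replaces them with N; it approximates what we assume -fix_nucleotides does and is not Maker's own code. The function name is hypothetical.

```python
import re

# Anything that is not A/C/G/T/N (either case) is treated as invalid here.
INVALID = re.compile(r"[^ACGTNacgtn]")


def fix_nucleotides(fasta_text: str) -> str:
    """Replace non-ACGTN characters in sequence lines with N."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are left untouched
        else:
            out.append(INVALID.sub("N", line.strip()))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(fix_nucleotides(">est1\nACGTRYKM\nacgt\n"), end="")
```

Lowercase bases are kept as they are, so any soft-masking information in the sequences survives.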

Hi @daikez,
I updated the tool to include this parameter; it will be available soon. This is the PR with the modification: Update Maker: Add paramter -fix_nucleotides.

Regards

The parameter -fix_nucleotides is still not available in the Maker tool interface. Or is it hidden somewhere?

Hi @daikez,
The new update will be installed on Saturday (all Galaxy updates are scheduled for Saturdays).

Regards.

Understood! Thanks! :+1:


The run with the new update went through (and weirdly quickly), but it returned some empty files.

Should I try re-downloading the EST and protein evidence?
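
Before re-downloading, it may be worth checking whether the evidence files themselves are intact. This sketch only counts records, empty sequences and duplicate IDs in a FASTA file; it cannot tell you why Maker produced empty outputs, and the function name is hypothetical.

```python
def fasta_stats(fasta_text: str) -> dict:
    """Count records, empty sequences and duplicate IDs in FASTA text."""
    ids, lengths = [], []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")  # ID = first word
            lengths.append(0)
        elif lengths:
            lengths[-1] += len(line)  # add sequence length to last record

    seen, dups = set(), set()
    for seq_id in ids:
        if seq_id in seen:
            dups.add(seq_id)
        seen.add(seq_id)

    return {
        "records": len(ids),
        "empty": sum(1 for n in lengths if n == 0),
        "duplicate_ids": sorted(dups),
    }


if __name__ == "__main__":
    with open("est.fasta") as handle:  # hypothetical file name
        print(fasta_stats(handle.read()))
```

Zero records or many empty sequences would point to a failed download; duplicate IDs can also confuse downstream tools.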

I ran into the same problem, an interpolation error involving "@2". You may find this conversation helpful.

Thanks! :+1: I am trying your settings and will see if the run goes through overnight. :slightly_smiling_face:

Still running, not finished yet. But so far, with my dataset, the Maker tool accepts Dfam (with no species selected) and no tRNA scan (with tRNA scan enabled, it definitely returns an empty file with @2).

Should I also post my test results in your other thread once I have finished?

In my experience each Maker run takes 4-10 days. I have been getting results by using Dfam without selecting a species. When I tried to select a species, I got another error message saying RepeatMasker could not detect the selected species.

Has anyone had very poor recovery of genes after multiple rounds of Maker, despite using both Augustus and SNAP training?

My genome assembly has very high BUSCO completeness (~98%), and so does my transcriptome assembly (~98%), but my annotations only recover ~2%. I've tried annotating against the proteome of a relative and also against the UniProt/Swiss-Prot protein set. The latter works slightly better, but recovery is still very low.
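
To compare the assembly, transcriptome and annotation scores side by side (the annotation being assessed by running BUSCO in protein mode on the predicted proteins), the one-line BUSCO summary can be parsed. A minimal sketch, assuming the usual summary format C:x%[S:x%,D:x%],F:x%,M:x%,n:N; the strings in the test are illustrative, not anyone's actual results, and the function name is hypothetical.

```python
import re

SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(line: str) -> dict:
    """Extract Complete/Single/Duplicated/Fragmented/Missing % and n."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no BUSCO summary string found")
    result = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
    result["n"] = int(m.group("n"))  # number of BUSCO groups searched
    return result


if __name__ == "__main__":
    print(parse_busco("C:98.0%[S:97.0%,D:1.0%],F:1.0%,M:1.0%,n:255"))
```

A large gap in C between the genome run and the protein run on the same lineage points at the gene predictions rather than the assembly.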

Is there a better way to train Augustus? Or any other ideas? I just tried adding a transcriptome assembly gff from a relative, but this is in the “other_gff” part so it may not help much.

It looks strange to get so few BUSCOs. Did you run all the training rounds? Could you share your history with me (anthony.[myname]@inria.fr) so I can have a look?

Augustus training is a mystery to me, since it requires a predefined training set, and my species of interest, rat, is not on the list; not even a rodent is. Could I borrow the "Homo sapiens" training set instead, even though this sounds odd? How can people start Augustus training at all if no reference model is in place in advance?

I sent it to you, replacing the brackets with your last name. Thanks so much for any ideas you have!

Others can answer better than me, but I think it really depends on why you are training Augustus; you can train it in different ways depending on the purpose.

For instance, when running BUSCO to assess the completeness of a genome assembly, you can let Augustus self-train starting from the closest model species available.

For annotating a genome with Maker, you can use the tool “Train Augustus” to train Augustus on the imperfect annotations of your de novo genome that Maker produced in the first round. Then that Augustus model can be used in the following round of Maker annotations.
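
A common heuristic (not necessarily what the Galaxy tutorial does) is to train only on the better-supported first-round models, judged by the _AED value Maker writes into mRNA attributes. This sketch lists mRNA IDs at or below an AED cutoff; the 0.25 threshold and the function name are assumptions for illustration.

```python
def low_aed_mrna_ids(gff3_text: str, max_aed: float = 0.25) -> list:
    """Return IDs of mRNA features whose _AED is <= max_aed."""
    keep = []
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # Maker GFF3 may end with an embedded FASTA section
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        aed = attrs.get("_AED")
        if aed is not None and float(aed) <= max_aed:
            keep.append(attrs.get("ID", ""))
    return keep
```

The kept IDs (and their parent genes) could then be used to subset the round-one GFF3 before passing it to the training tool.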

You may find this thread and this thread helpful. And I’m sure others can give more accurate answers.
