Issue with Train Augustus tool

Hello, I am trying to use the Train Augustus tool with a FASTA format assembly as the genome to annotate and a Maker final annotation as the annotation to use for training. I get this error:

“1 Checking fasta headers in file /corral4/main/objects/c/5/6/dataset_c561dd1e-ebb2-4ead-8848-7b7b2dd0c7c0.dat…
WARNING: Fasta headers in file /corral4/main/objects/c/5/6/dataset_c561dd1e-ebb2-4ead-8848-7b7b2dd0c7c0.dat seem to contain non-letter and no”

I am not sure which file the error refers to, or how to edit that file to fix the header problem. I see similar topics here about errors with this tool, but not this specific error. Any help would be appreciated; I am just beginning to learn Galaxy!

Welcome, @Sylvie_Weaver_Fraley

The tool is complaining about the format of the > title lines in your assembly’s fasta file. That format was propagated into the Maker annotation file.
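
If you want to see exactly which header lines are affected (for example after downloading the assembly dataset and inspecting it outside Galaxy), here is a minimal Python sketch. It assumes the warning is about any character other than ASCII letters and digits. That is a conservative reading of the truncated message, so underscores and spaces are flagged too, and not every hit is necessarily a problem for the tool. The function name `suspicious_headers` is hypothetical.

```python
import re

# Assumption: treat any character other than an ASCII letter or digit as
# "special". This is a conservative approximation of the warning, not the
# tool's own check.
LETTERS_AND_DIGITS = re.compile(r"[A-Za-z0-9]+")


def suspicious_headers(fasta_lines):
    """Return (line_number, header_text) for each '>' line with other characters."""
    flagged = []
    for number, line in enumerate(fasta_lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            header = line[1:]
            # An empty header also fails fullmatch and is flagged.
            if not LETTERS_AND_DIGITS.fullmatch(header):
                flagged.append((number, header))
    return flagged
```

Calling it on an open file, e.g. `suspicious_headers(open("assembly.fasta"))` (placeholder file name), returns the line number and text of each flagged header.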

Simplifying the identifiers in the fasta file is usually a good idea. Tools can have trouble interpreting data with spaces or odd characters, since they are trying to “match up” common identifiers between two or more different files.
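
If the identifiers themselves contain odd characters (for example `|` or `:`), removing the description is not enough, and the same new names have to be applied to both files. Below is a sketch of one way to do that in Python. It assumes the annotation is GFF3 with the sequence ID in column 1, as Maker output is, and it also rewrites `##sequence-region` directives and an embedded `##FASTA` section if the file has one. The function name `rename_sequences` and the `seq1`, `seq2`, ... naming scheme are illustrative choices, not something the tool requires.

```python
def rename_sequences(fasta_lines, gff_lines, prefix="seq"):
    """Give every FASTA record a short ID (seq1, seq2, ...) and rewrite the GFF3
    seqid column (column 1) with the same mapping so the two files still agree."""
    mapping = {}
    new_fasta = []
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            old_id = line[1:].split()[0]  # identifier = text before the first whitespace
            mapping[old_id] = f"{prefix}{len(mapping) + 1}"
            new_fasta.append(">" + mapping[old_id])
        else:
            new_fasta.append(line)

    new_gff = []
    in_fasta_section = False
    for line in gff_lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            in_fasta_section = True  # embedded sequences follow until end of file
            new_gff.append(line)
        elif in_fasta_section:
            if line.startswith(">"):
                old_id = line[1:].split()[0]
                line = ">" + mapping.get(old_id, old_id)
            new_gff.append(line)
        elif line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) >= 2:
                parts[1] = mapping.get(parts[1], parts[1])
            new_gff.append(" ".join(parts))
        elif line.startswith("#") or not line.strip():
            new_gff.append(line)  # other comments, directives and blank lines unchanged
        else:
            columns = line.split("\t")
            columns[0] = mapping.get(columns[0], columns[0])
            new_gff.append("\t".join(columns))
    return new_fasta, new_gff, mapping
```

Keeping the returned `mapping` lets you translate the short names back to the original contig names after annotation.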

Please give this a try. You might only need to remove the description content from the fasta > title lines, leaving just the “identifier” (the text before the first space). The tool should then be able to match each fasta identifier with the value in the first column of the annotation file (the sequence ID column, sometimes labeled CHROM).
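
A minimal Python sketch of that cleanup, assuming the identifier is the first whitespace-delimited word of each > line: `strip_descriptions` drops everything after it, and `gff_seqids_missing_from_fasta` lists any column-1 values in a GFF3 annotation that do not match a FASTA identifier (an empty list means the two files agree). Both function names are hypothetical.

```python
def strip_descriptions(fasta_lines):
    """Yield FASTA lines with each '>' line cut down to its identifier."""
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            yield ">" + (fields[0] if fields else "")
        else:
            yield line


def gff_seqids_missing_from_fasta(fasta_lines, gff_lines):
    """Return GFF3 column-1 values that do not match any FASTA identifier."""
    fasta_ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                fasta_ids.add(fields[0])
    missing = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop at an embedded sequence section, if present
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        seqid = line.split("\t", 1)[0]
        if seqid not in fasta_ids:
            missing.add(seqid)
    return sorted(missing)
```

For example, `"\n".join(strip_descriptions(open("assembly.fasta")))` gives the cleaned FASTA text; the file name is a placeholder.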

Let us know if this helps or if you want to follow up more. :slight_smile: