Prokka error "Please rename your contigs" - "contig id must <=37 chars long"

Dear usegalaxy.eu community,

Prokka would not accept a draft genome assembly (which I would like to annotate) because the headers of the sequences are too long:
“Please rename your contigs”
“contig id must <=37 chars long”

Could I kindly ask for suggestions on how to shorten the headers of all sequences in this draft genome assembly?

The draft genome assembly (containing over 100 sequences) looks like this:

lcl|NODE_1_length_269672_cov_95.829929 Escherichia sp. A1234
TTGAGAAGGGAGAGATAAAACACGAGGCAAGACGAGTTGAACTTTGGAGTAATAAGTCAGTTGTCCT…

lcl|NODE_2_length_176203_cov_83.720917 Escherichia sp. A1234
AAAGGCAATTCCGAAAGGATATCTATGCTATTTAAGTTTTACCGGACAGCAATAAAAGTAAATAAAAAAA…
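Before renaming anything, it can help to see exactly which IDs break the limit. Below is a minimal Python sketch to run locally; it is not part of Galaxy or Prokka. It assumes that Prokka measures the "contig id" as the header text between ">" and the first whitespace, which is presumably what the error message refers to. The function name long_ids is hypothetical.

```python
# Minimal sketch: list FASTA records whose ID (header text after ">" up to the
# first whitespace) is longer than the 37-character limit from the Prokka error.
MAX_ID_LEN = 37  # limit quoted in the error message "contig id must <=37 chars long"


def long_ids(fasta_lines, max_len=MAX_ID_LEN):
    """Return (id, length) pairs for every header ID longer than max_len."""
    too_long = []
    for line in fasta_lines:
        if line.startswith(">"):
            # First whitespace-separated token of the header; "" for a bare ">".
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            if len(record_id) > max_len:
                too_long.append((record_id, len(record_id)))
    return too_long
```

Usage, with a placeholder file name: `with open("assembly.fasta") as fh: print(long_ids(fh))`.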

Thank you very much in advance for your thoughts on this (basic) question!

Best,
Michael

P.S.:
I had mistakenly thought that “Relabel List Identifiers from contents of a file” (Galaxy Version 1.0.0) could be used to rename the headers, e.g. from:

lcl|NODE_1_length_269672_cov_95.829929 Escherichia sp. A1234

lcl|NODE_2_length_176203_cov_83.720917 Escherichia sp. A1234

to:

NODE_1_A1234

NODE_2_A1234

and I have already prepared a separate file which looks like this:

NODE_1_A1234
NODE_2_A1234

but only afterwards did I notice that, unfortunately, this would not work in my case.
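If you prefer to do the renaming locally, the prepared name file can be applied directly. Here is a minimal Python sketch, assuming the names in the file are in the same order as the records in the FASTA file. rename_headers, short_name and HEADER_RE are hypothetical helpers, and the regular expression is only an assumption built from the two example headers. It also shows how to derive names like NODE_1_A1234 from the headers themselves, without a separate file.

```python
import re

# Hypothetical pattern based on the example headers:
# "lcl|NODE_<n>_length_..._cov_... Escherichia sp. <strain>"
HEADER_RE = re.compile(r"^>?lcl\|(NODE_\d+)_.*\s(\S+)\s*$")


def short_name(header):
    """Build e.g. "NODE_1_A1234" from the node number and the last header word."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unexpected header: {header!r}")
    return f"{m.group(1)}_{m.group(2)}"


def rename_headers(fasta_lines, new_names):
    """Replace the i-th FASTA header with ">" + the i-th name in new_names."""
    names = [n.strip() for n in new_names if n.strip()]  # skip blank lines
    out, index = [], 0
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if index >= len(names):
                raise ValueError("more FASTA records than names")
            out.append(">" + names[index])
            index += 1
        else:
            out.append(line)  # sequence lines are copied unchanged
    if index != len(names):
        raise ValueError(f"{len(names)} names but {index} FASTA records")
    return out
```

The count checks are there because a name file that is one line off would otherwise silently attach the wrong name to every following contig.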

Hi,

There is a very basic trick: convert your FASTA file to tabular format with the FASTA-to-Tabular tool on Galaxy Europe.

This lets you use any of Galaxy's tabular tools to edit the name column as you like. When you are finished, you can use the Tabular-to-FASTA tool on Galaxy Europe to convert it back to FASTA 🙂
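The FASTA to tabular to FASTA round trip can be sketched in Python to show what happens to the data. This is only an illustration, assuming a tabular layout of two tab-separated columns (title, sequence); the helpers fasta_to_tabular, edit_name_column and tabular_to_fasta are hypothetical and are not the Galaxy tools themselves.

```python
def fasta_to_tabular(fasta_lines):
    """Convert FASTA lines to "title<TAB>sequence" lines, one per record."""
    records = []  # each entry: [title, list of sequence chunks]
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line[1:], []])
        elif records and line.strip():
            records[-1][1].append(line.strip())
    return [title + "\t" + "".join(chunks) for title, chunks in records]


def edit_name_column(tab_lines, new_title):
    """Apply new_title() to the first (name) column of each tabular line."""
    edited = []
    for row in tab_lines:
        title, seq = row.rstrip("\n").split("\t", 1)
        edited.append(new_title(title) + "\t" + seq)
    return edited


def tabular_to_fasta(tab_lines, width=60):
    """Convert "title<TAB>sequence" lines back to FASTA, wrapping at width."""
    fasta = []
    for row in tab_lines:
        row = row.rstrip("\n")
        if not row:
            continue
        title, seq = row.split("\t", 1)
        fasta.append(">" + title)
        fasta.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return fasta
```

For example, `tabular_to_fasta(edit_name_column(fasta_to_tabular(lines), my_edit))` applies any name function `my_edit` you write yourself. Because the sequence sits in its own column, editing the name column cannot touch the sequence.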

You could also try Picard NormalizeFasta on Galaxy Europe with the option “Truncate sequence names at first whitespace”.
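A caveat on truncating at whitespace: for the example headers, the part before the first whitespace (lcl|NODE_1_length_269672_cov_95.829929) is already 38 characters, one over the limit. Truncating alone may therefore not be enough, whereas also dropping the lcl| prefix leaves 34 characters. Below is a minimal Python sketch of both steps. truncate_headers and its strip_prefix parameter are hypothetical and are not Picard options.

```python
def truncate_headers(fasta_lines, strip_prefix=""):
    """Keep only the header text before the first whitespace.

    strip_prefix (e.g. "lcl|") is an optional prefix that is also removed
    from the start of the ID.
    """
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            record_id = parts[0] if parts else ""
            if strip_prefix and record_id.startswith(strip_prefix):
                record_id = record_id[len(strip_prefix):]
            line = ">" + record_id
        out.append(line)
    return out
```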

Hope that helps,
Bjoern

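Another alternative is to use the Text transformation with sed tool with this expression:
s/(^lcl.{n}).*/\1/, where n is the number of characters to keep after "lcl". The shortened ID is n + 3 characters long, so n = 34 or less keeps it within Prokka's 37-character limit. The parentheses and {n} are extended regular expression syntax (sed -E); in basic syntax they must be escaped as \( \) and \{n\}.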
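To see what the sed substitution does, here is a Python counterpart using the re module. It is a minimal sketch under two assumptions: the pattern is anchored after the ">" that begins every FASTA header, which the posted sed expression does not include, and the function name sed_like_truncate is hypothetical. It also checks that truncation does not make two IDs identical.

```python
import re


def sed_like_truncate(fasta_lines, n=34):
    r"""Python counterpart of s/(^lcl.{n}).*/\1/ applied to FASTA header lines.

    Keeps "lcl" plus the next n characters and drops the rest of the header,
    so the new ID is n + 3 characters long (n=34 gives 37). Unlike the sed
    expression as posted, the pattern expects the leading ">".
    """
    pattern = re.compile(r"^(>lcl.{%d}).*" % n)
    out, seen = [], set()
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            line = pattern.sub(r"\1", line)  # headers shorter than 3 + n stay as-is
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            # Cutting IDs to a fixed length could make two of them identical.
            if record_id in seen:
                raise ValueError(f"duplicate ID after truncation: {record_id}")
            seen.add(record_id)
        out.append(line)
    return out
```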

Regards

Hi bjoern.gruening,
Hi gallardoalba,

Thank you both very much for your quick replies and helpful suggestions on my very basic question. It is great to have a choice of different approaches for shortening the headers of my input data.

Best,
Michael