Dear usegalaxy.eu community,
Prokka does not accept a draft genome assembly that I would like to annotate, because the sequence headers are too long:
“Please rename your contigs”
“contig id must <=37 chars long”
Could I kindly ask for suggestions on how to shorten the headers of all sequences in this draft genome assembly?
The draft genome assembly (containing over 100 sequences) looks like this:
lcl|NODE_1_length_269672_cov_95.829929 Escherichia sp. A1234
TTGAGAAGGGAGAGATAAAACACGAGGCAAGACGAGTTGAACTTTGGAGTAATAAGTCAGTTGTCCT…
…
lcl|NODE_2_length_176203_cov_83.720917 Escherichia sp. A1234
AAAGGCAATTCCGAAAGGATATCTATGCTATTTAAGTTTTACCGGACAGCAATAAAAGTAAATAAAAAAA…
…
Thank you very much in advance for your thoughts on this (basic) question!
Best,
Michael
P.S.:
I had mistakenly thought that “Relabel List Identifiers from contents of a file” (Galaxy Version 1.0.0) could be used to rename the headers, e.g. from:
lcl|NODE_1_length_269672_cov_95.829929 Escherichia sp. A1234
…
lcl|NODE_2_length_176203_cov_83.720917 Escherichia sp. A1234
…
to:
NODE_1_A1234
…
NODE_2_A1234
…
and I have already prepared a separate file which looks like this:
NODE_1_A1234
NODE_2_A1234
…
but I realized only afterwards that, unfortunately, this would not work in my case.
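
For illustration, the renaming described above could be sketched outside of Galaxy with a short Python script. This is only a minimal sketch under several assumptions: that the assembly is a standard FASTA file whose header lines start with ">" (the marker is not visible in the excerpt above), that every header follows the "lcl|NODE_<n>_length_..._cov_... Escherichia sp. A1234" pattern, and that the last word of the header is the strain name. The function names (shorten_header, rename_fasta, rename_file) and the 37-character limit parameter are hypothetical helpers for this example, not part of Prokka or Galaxy.

```python
import re


def shorten_header(header, max_len=37):
    """Reduce a long header to 'NODE_<n>_<strain>' (e.g. 'NODE_1_A1234')."""
    fields = header.lstrip(">").split()
    contig_id = fields[0].split("|")[-1]          # drop a prefix such as "lcl|"
    match = re.match(r"NODE_\d+", contig_id)      # keep only "NODE_<n>"
    node = match.group(0) if match else contig_id
    # Assumption: the last word of the description is the strain name.
    new_id = f"{node}_{fields[-1]}" if len(fields) > 1 else node
    if len(new_id) > max_len:
        raise ValueError(f"ID still longer than {max_len} characters: {new_id}")
    return new_id


def rename_fasta(lines):
    """Yield FASTA lines with shortened headers; sequence lines are unchanged."""
    seen = set()
    for line in lines:
        if line.startswith(">"):
            new_id = shorten_header(line)
            if new_id in seen:                    # contig IDs must stay unique
                raise ValueError(f"Duplicate ID after renaming: {new_id}")
            seen.add(new_id)
            yield f">{new_id}\n"
        else:
            yield line


def rename_file(in_path, out_path):
    """Read a FASTA file and write a copy with shortened headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_fasta(src))
```

Calling rename_file("assembly.fasta", "assembly_renamed.fasta") (both file names are placeholders) would write a copy of the assembly with headers such as ">NODE_1_A1234", which can then be uploaded to Galaxy and passed to Prokka. The duplicate check is there because two contigs that collapse to the same short ID would otherwise become indistinguishable.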