Error: Duplicate seq_ids are found

Hello everyone, I am using the web-based Galaxy tool, not the command-line version. I merged several FASTA files into one, and I'm trying to build a BLAST database from these local sequences with makeblastdb. I get an error message that reads, “Error: Duplicate seq_ids are found: GNL|BL_ORD_ID|9650923”.

Can anyone help me find a way to remove the duplicate seq IDs, preferably using the web-based Galaxy tool?

Thank you!

Hi @R_J,
How did you merge the FASTA files? Did you use FASTA Merge Files and Filter Unique Sequences?

Regards

Hey @gallardoalba,
I did use FASTA Merge Files and Filter Unique Sequences. I also used the Normalize FASTA function to truncate the sequence name at the first whitespace.
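
For anyone who wants to check a merged file outside Galaxy, here is a minimal Python sketch that lists IDs that repeat once headers are truncated at the first whitespace. It assumes the ID is simply the header text up to the first whitespace; the function name `duplicate_ids` and the example records are made up for illustration, and this is not what any Galaxy tool runs internally.

```python
from collections import Counter


def duplicate_ids(fasta_lines):
    """Return IDs (header text up to the first whitespace) seen more than once."""
    counts = Counter(
        line[1:].split(None, 1)[0]
        for line in fasta_lines
        # Only header lines that actually carry some text after ">".
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Hypothetical records: two headers differ only after the first whitespace.
example = [
    ">seq1 sample A",
    "ACGT",
    ">seq1 sample B",
    "ACGA",
    ">seq2 sample A",
    "TTGA",
]

print(duplicate_ids(example))
```

If this reports any IDs, the truncation step has made headers that were distinct look identical, which may or may not be related to the makeblastdb error.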

What happens if you change the regex to ^>([^.]+).*$ (without using Normalize FASTA)?
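
To see what that regex captures, here is a small Python sketch. It assumes the first capture group is what ends up being used as the sequence identifier, which is my reading and not confirmed tool behaviour; the helper `captured_id` and the example headers are hypothetical.

```python
import re

# The regex suggested above: capture everything after ">" up to the first ".".
HEADER_RE = re.compile(r"^>([^.]+).*$")


def captured_id(header):
    """Return the first capture group for a FASTA header, or None if no match."""
    match = HEADER_RE.match(header)
    return match.group(1) if match else None


# Hypothetical headers for illustration only.
for header in [">ABC123.1 Homo sapiens", ">ABC123.2 Homo sapiens", ">seq1 sample A"]:
    print(header, "->", captured_id(header))
```

Two things to watch for: headers that differ only after the first dot (such as version suffixes) collapse to the same captured text, and a header with no dot at all is captured whole, whitespace included.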
