FASTA Header Changes not shown in BLAST database

Hello everyone,

I used the “Text Transformation” tool in Galaxy to run sed and change the FASTA headers in my sequence files. When I then build a BLASTn database from these files with the “makeblastdb” tool, the headers revert to the original ones from the FASTQ files, not the new headers I created after converting FASTQ to FASTA.

Has anyone had this problem before?

Sincerely,
Rupinder

Hi @R_J

This may sound simple – but try double-checking which dataset you are inputting to the makeblastdb tool.
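One way to check this outside Galaxy is to download the FASTA dataset you selected and look at its first few header lines. Here is a minimal Python sketch, assuming the download is a plain-text FASTA file; the function names and the file name are placeholders for illustration, not part of Galaxy or BLAST:

```python
from itertools import islice


def fasta_headers(lines):
    """Yield the header text (without the leading '>') of each FASTA record."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].rstrip("\r\n")


def first_headers(path, limit=5):
    """Return up to `limit` headers from the FASTA file at `path`."""
    with open(path) as handle:
        return list(islice(fasta_headers(handle), limit))


# Example call (hypothetical file name):
# print(first_headers("renamed_sequences.fasta"))
```

If the headers printed are still the original ones, the dataset given to makeblastdb is not the sed output.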

After going through my steps again, I am sure that I am building the BLAST database with the “makeblastdb” tool from the FASTA files whose headers were changed with the “Text Transformation” tool (sed).

When I open the tabular file of the BLAST result, the subject accession has definitely reverted to the original header.

Hi @R_J

A few questions:

  1. Did you run makeblastdb on the original fasta first?
  2. Does that original output still exist in your account (in any history) or have you purged the resulting dataset (and all copies)?
  3. If you ran makeblastdb twice (original + post sed) – did you name the database both times (on the tool form with the same name) or did you use the default naming both times?
  4. What happens if you rerun makeblastdb on the correct fasta, and give it a distinct name?

I’m wondering if there is a corner-case bug where the original index stays attached to your account. If the new index is created with the same “name”, the old one may not be cleared, or there may be a naming conflict (both are technical problems).

Purging the original index (the dataset that represents it in your history) might resolve that kind of problem quickly for you now, then on our side, we can take a look at how those are stored server-side by the indexing tool and make changes as needed. Ideally, each time an index is created it would have a distinct internal name, even if not specified on the tool form. That may not be happening.

I’ll also run a quick test today – but want to give you something to try for now.

  1. Test if purging the original output resolves the problem. Just run a few queries through so it runs quickly. If the target (subject) IDs are correct, then you can run the entire query.
  2. If that doesn’t work, try recreating the index with a specific name (not the default), then test as above (just a few queries).
  3. It would be helpful if you post back what happens.
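
For the small test runs, the comparison can be scripted once the tabular BLAST output and the renamed FASTA are downloaded. The sketch below is only an illustration and rests on two assumptions: that the output uses the standard tabular layout with the subject ID in the second column, and that BLAST reports the first whitespace-delimited word of each header as that ID. The function names are made up for this example.

```python
def fasta_ids(lines):
    """Collect the first whitespace-delimited word of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unexpected_subjects(blast_lines, known_ids, subject_col=1):
    """Return subject IDs in the BLAST table that are not in known_ids.

    subject_col is zero-based; 1 assumes the subject ID is the second column.
    """
    missing = set()
    for line in blast_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) > subject_col and cols[subject_col] not in known_ids:
            missing.add(cols[subject_col])
    return missing
```

An empty result means every subject ID in the test run matches a renamed header. Any IDs returned are not present in the renamed FASTA, which would point to the old index still being used.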

Thank you!