Extract subsequence from FASTA/Q file

Which tool can I use to extract FASTA sequences from a file based on sequence ID? seqtk_subseq doesn't work for my data any more (it used to), and I have no idea what went wrong.
Here is the link to the history.

Hi @droslj

There are many ways to do this … these are the top choices for me.

  • Convert to tabular → keyword filter (by sequence id) → convert back to fasta
  • tool → Fasta regular expression finder (Search in fasta for regexp match)
  • tool → Filter FASTA on the headers and/or the sequences
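For anyone who ends up doing this outside Galaxy, the "filter by sequence id" idea behind all three options can be sketched in a few lines of Python. This is only a rough illustration, not what any of the tools above actually run: the function names (`read_fasta`, `extract_by_id`) and the example records are made up, and it assumes the identifier is the first whitespace-delimited word on the title line.

```python
def read_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None and line.strip():
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def extract_by_id(lines, wanted_ids):
    """Return the records whose identifier is in wanted_ids.

    Assumption: the identifier is the first word of the title line;
    anything after the first whitespace is treated as description.
    """
    wanted = set(wanted_ids)
    kept = []
    for title, seq in read_fasta(lines):
        parts = title.split(None, 1)
        seq_id = parts[0] if parts else ""
        if seq_id in wanted:
            kept.append((title, seq))
    return kept


# Hypothetical example data, not taken from the history in this thread.
example = [
    ">seqA first record",
    "ACGT",
    "TTGA",
    ">seqB",
    "GGCC",
    ">seqC third",
    "AAAA",
]
selected = extract_by_id(example, ["seqA", "seqC"])
```

With a real file you would pass an open file handle instead of the list, for example `extract_by_id(open("input.fasta"), ids)`, where `input.fasta` is a placeholder name.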

More: Data Manipulation Olympics

Thanks for the hint, I managed to complete the analysis. I still don’t understand what the issue was; it must have been something in the reference file I pulled down from the INSAflu site.

Hi @droslj

Oh, then maybe more was going on. Many tools do not expect to find description content on the fasta title lines, just the identifiers. For cases like that, NormalizeFasta is the tool to use to update the format.
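To make the "identifiers only" point concrete, here is a small Python sketch of that kind of cleanup. It is not NormalizeFasta itself, just the same general idea under my own assumptions: the title line is cut at the first whitespace and the sequence is re-wrapped to a fixed width. The function name `normalize_fasta` and the `width` parameter are hypothetical.

```python
def normalize_fasta(lines, width=60):
    """Return FASTA lines with description text dropped from title lines.

    Assumptions (illustrative only): the identifier is everything up to
    the first whitespace on the title line, and sequence lines are
    re-wrapped to `width` characters.
    """
    out = []
    chunks = []

    def flush():
        # Write out the sequence collected so far, wrapped to `width`.
        seq = "".join(chunks)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        chunks.clear()

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            parts = line[1:].split(None, 1)
            out.append(">" + (parts[0] if parts else ""))
        else:
            chunks.append(line)
    flush()
    return out


# Hypothetical input: the first title line carries a description.
cleaned = normalize_fasta(
    [">seq1 some description", "ACGT", "AC", ">seq2", "GG"], width=4
)
```

After this, `cleaned` holds `>seq1` and `>seq2` as title lines, so a tool that matches on the whole title line sees only the identifier.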

These FAQs are for custom genomes, but the same format “rules”, when applied, can solve other fasta issues. See: Fasta (Reference Genomes)

Glad your problem was resolved even if none of this was your use case! I’m mostly posting the extra help for anyone else who lands on the topic later. :slight_smile:
