Protein sequence alignment

Hi,

I was wondering if I could get suggestions on how to align multiple protein sequences, currently stored in an Excel file, to a reference protein sequence in FASTA format. I would like the output to show which sequences align and which do not.

Thanks for any tool suggestions.

Hi @stauntok,
Excel files are usually a poorly supported format. I suggest generating a FASTA file from your sequences and then using the Align sequences to a reference tool.
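If you would rather do the conversion outside Galaxy, here is a minimal sketch in Python. It assumes the sheet has been exported as tab-separated text with the sequence name in the first column and the sequence in the second; the function name `tabular_to_fasta` and the example rows are made up for illustration.

```python
import csv
import io


def tabular_to_fasta(handle, delimiter="\t", width=60):
    """Convert rows of (identifier, sequence) into FASTA text.

    Assumes column 1 holds the sequence name and column 2 the sequence.
    Rows with fewer than two non-empty cells are skipped.
    """
    records = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        name = row[0].strip()
        # Remove whitespace that spreadsheets sometimes leave inside cells.
        sequence = "".join(row[1].split()).upper()
        # Wrap the sequence into fixed-width lines.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + name + "\n" + "\n".join(lines))
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical rows standing in for a sheet exported as tab-separated text.
example = io.StringIO("pep1\tMKTAYIAKQR\npep2\tmkt ayi\n")
fasta_text = tabular_to_fasta(example)
print(fasta_text)
```

For a real file, open it with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.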

Regards


Hi @gallardoalba,
Thanks for your advice. I have generated a FASTA file using Tabular-to-Fasta and then tried the Align sequences to a reference tool. However, I got the error below. Do you know how to fix it? I need an alignment file, right?

Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/__python-bioext@0.19.7/bin/bealign", line 196, in <module>
    retcode = main(
  File "/usr/local/tools/_conda/envs/__python-bioext@0.19.7/bin/bealign", line 82, in main
    _align_par(
  File "/usr/local/tools/_conda/envs/__python-bioext@0.19.7/lib/python3.8/site-packages/BioExt/uds/__init__.py", line 80, in _align_par
    aln = Aligner(
  File "/usr/local/tools/_conda/envs/__python-bioext@0.19.7/lib/python3.8/site-packages/BioExt/align/__init__.py", line 93, in __init__
    raise ValueError('codon alignment requires a protein score matrix')
ValueError: codon alignment requires a protein score matrix

Thanks,
Kara

Hi @stauntok,
I cannot reproduce that error. Which Galaxy instance are you using? Note that this tool requires the reference to be a nucleotide sequence.
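If you are unsure what a file contains, a rough check like the one below can help. It is only a heuristic sketch: the function name `looks_like_nucleotide` and the 0.9 threshold are assumptions of mine, not part of any Galaxy tool, and short or unusual sequences can fool it.

```python
def looks_like_nucleotide(sequence, threshold=0.9):
    """Guess whether a sequence is nucleotide rather than protein.

    Returns True when at least `threshold` of the letters are
    A, C, G, T, U or N. The threshold is an arbitrary assumption.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return False
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    return nucleotide_like / len(letters) >= threshold


# Made-up example sequences, one of each kind.
print(looks_like_nucleotide("ATGGCCATTGTAATGGGCCGC"))
print(looks_like_nucleotide("MKTAYIAKQRQISFVKSHFSRQ"))
```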

Regards

Ahhh, my reference is a protein sequence and my sequences of interest are protein sequences.


Hi @stauntok

Maybe try this tool instead?

  • NCBI BLAST+ blastp Search protein database with protein query sequence(s). The “database” can be a single sequence (custom database).
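To get the list of which sequences align and which do not, you can compare the query FASTA against the BLAST results. Here is a minimal sketch in Python, assuming the output is in BLAST's tabular format, where the first column is the query identifier and queries without any hit produce no rows. The function name `split_by_hit` and all example values are made up for illustration.

```python
def split_by_hit(fasta_text, blast_tabular_text):
    """Return (aligned, unaligned) lists of query identifiers, in FASTA order."""
    # The identifier is the text after '>' up to the first whitespace,
    # which is how BLAST reports the query name.
    query_ids = [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]
    # First column of each tabular row is the query identifier.
    hit_ids = {
        line.split("\t")[0]
        for line in blast_tabular_text.splitlines()
        if line.strip() and not line.startswith("#")
    }
    aligned = [q for q in query_ids if q in hit_ids]
    unaligned = [q for q in query_ids if q not in hit_ids]
    return aligned, unaligned


# Hypothetical inputs: three queries, two of which have a row in the results.
queries = ">pep1\nMKTAYIAKQR\n>pep2\nGGGSGGGS\n>pep3\nMKTAYLAKQR\n"
hits = (
    "pep1\tref\t100.0\t10\t0\t0\t1\t10\t1\t10\t1e-05\t25.0\n"
    "pep3\tref\t90.0\t10\t1\t0\t1\t10\t1\t10\t1e-03\t20.0\n"
)
aligned, unaligned = split_by_hit(queries, hits)
print("aligned:", aligned)
print("unaligned:", unaligned)
```

Whether a hit counts as a real alignment still depends on the thresholds you choose (for example E-value or percent identity), so you may want to filter the rows before running a comparison like this.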

If you don’t have too many sequences, these tools might also be worth trying. Both will attempt to align all-vs-all.

  • ClustalW multiple sequence alignment program for DNA or proteins
  • MAFFT Multiple alignment program for amino acid or nucleotide sequences

Search the tool panel with the term “protein alignment” for more choices.

