MAFFT Multisequence alignment

Hello,

I am new to Galaxy, and I am trying to do a phylogenetic analysis of a fungal species. I have about 85 sequences to align with MAFFT (multiple-sequence alignment). The job has been running for more than 24 hours and is still not complete. Is there a way to check the progress of the alignment?

Thank you

Hi @Pallavi,
To the best of my knowledge, Galaxy users cannot check job progress.
Kind regards,
Igor


Hi @igor

Can you suggest roughly how long an MSA should take for data this large (85 sequences, approx. 24 GB)?

Thanks


Probably forever, or until your job gets killed.

Multiple-sequence alignment is computationally rather demanding, and MAFFT is not meant for whole-genome alignments. It’s good for, say, single-gene or protein alignments (a few thousand bases or amino acids per sequence).
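To check whether an input is in that "few thousand residues" range before submitting it, a quick length summary of the FASTA file can help. This is only a minimal sketch: the function names and the 10,000-residue cutoff are my own illustrative assumptions, not a limit documented by MAFFT or Galaxy.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: residue count} for FASTA-formatted lines.

    The name is the first word after '>'. If a name is repeated, the later
    record overwrites the earlier one (a simplification for this sketch).
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence data may span many lines
    return lengths


def too_long(lengths: Dict[str, int], max_len: int = 10_000) -> List[str]:
    """Names of sequences longer than max_len (hypothetical cutoff)."""
    return [name for name, n in lengths.items() if n > max_len]


# Tiny in-memory example; for a real file, pass an open file handle instead:
#     with open("input.fasta") as fh:   # "input.fasta" is a placeholder path
#         lengths = fasta_lengths(fh)
example = [">chr1 some description", "ACGT", "AC", "", ">chr2", "A" * 12]
lengths = fasta_lengths(example)
flagged = too_long(lengths, max_len=10)
```

If most names end up in the flagged list, the input is probably chromosome- or genome-sized rather than gene-sized.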


Hi @wm75

Thank you for the response.
So which MSA tool would you suggest for data this large? ClustalW?

Thank you

No, all these classical MSA programs face much the same limitations.
You’ll need specialized tools like Cactus or progressiveMauve (I know these two are available on the EU server).


Assuming that it’s indeed genomes you are trying to align here.

yeah @wm75


Hi @Pallavi,
As another option, you may consider using protein sequences from orthologous genes, if the genomes are annotated.
How did you prepare the input file for the MAFFT job, by concatenating genomes? If your fungal species have multiple chromosomes, you can terminate the job: both MAFFT and ClustalW align every sequence against the other sequences, and usually there is very little similarity between the chromosomes of a genome.
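To give a feel for the scale, here is a rough back-of-the-envelope sketch. It takes the all-against-all description above at face value and assumes a textbook dynamic-programming pairwise alignment, whose table size is the product of the two sequence lengths. MAFFT's actual heuristics differ, so treat the numbers as an illustration only; the function names and the 100,000-base lengths are made up for the example.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of distinct sequence pairs among n sequences: n * (n - 1) / 2."""
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return n_sequences * (n_sequences - 1) // 2


def dp_cells(len_a: int, len_b: int) -> int:
    """Cells in a naive dynamic-programming table for one pair of sequences."""
    return len_a * len_b


pairs = pairwise_comparisons(85)        # the 85 sequences from this thread
cells = dp_cells(100_000, 100_000)      # hypothetical lengths, illustration only
print(f"{pairs} pairs, {cells:.1e} table cells for one such pair")
```

With 85 sequences that is 3570 pairs, and the per-pair cost grows with the product of the lengths, which is why chromosome-sized inputs are out of reach for this kind of tool.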
Kind regards,
Igor
