Question about multiple sequence alignment
geoff at ebi.ac.uk
Tue Jul 27 04:22:20 EST 1999
The limiting factor with Clustal and most other hierarchical multiple
alignment methods is the time required to perform all N(N-1)/2 pairwise
comparisons needed to generate the tree, which is then followed to produce
the multiple alignment. Once you have the tree, the multiple alignment stage
itself only takes N-1 comparisons of sequences, so it is pretty quick even
for very large numbers of sequences.
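
To put rough numbers on that, here is a small sketch of the two counts. The function name is made up for illustration, and it only counts comparisons; it says nothing about how long each comparison takes, which depends on sequence length.

```python
def comparison_counts(n):
    """Return (pairwise comparisons for the tree, alignment steps along the tree)."""
    pairwise = n * (n - 1) // 2   # every sequence against every other one
    progressive = n - 1           # one merge per internal node of the tree
    return pairwise, progressive


# 69 is the data set size mentioned in the quoted post; 1000 is the
# "hundreds or even thousands" case.
for n in (69, 1000):
    pairwise, progressive = comparison_counts(n)
    print(f"N={n}: {pairwise} pairwise comparisons, {progressive} alignment steps")
```

For 69 sequences that is 2346 pairwise comparisons against 68 alignment steps; for 1000 sequences it is 499500 against 999, which is why the tree-building stage dominates.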
Some programs allow you to skip the tree-building step and just take a
single order. For example, if you do a database scan with a sequence, then
want to produce a multiple alignment of the top N hits, a good first
approximation is to add sequences 2 to N to the alignment, one at a time. I
believe Clustal can do this, and so does AMPS (my rather ancient alignment
program). The alignments that result from adding one sequence at a time are
almost as good as those from the tree method, provided the sequences are very
similar to each other. If there are 2 or more distinct sub-groups of
sequences, then the single-order approach does less well.
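
The idea of adding sequences one at a time in a given order can be sketched roughly as follows. This is only an illustration, not how AMPS or Clustal actually do it: it assumes a simple match/mismatch/linear-gap score, and it aligns each new sequence against a majority consensus of the alignment so far rather than against a proper profile. All function names here are hypothetical.

```python
from collections import Counter

GAP = "-"


def consensus(rows):
    """Most common non-gap character in each column of the alignment."""
    out = []
    for col in zip(*rows):
        residues = [c for c in col if c != GAP]
        out.append(Counter(residues).most_common(1)[0][0] if residues else GAP)
    return "".join(out)


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with a linear gap penalty (assumed toy scores).

    Returns a list of operations: "M" pairs a[i] with b[j],
    "A" pairs a[i] with a gap, "B" pairs b[j] with a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            ops.append("M"); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ops.append("A"); i -= 1
        else:
            ops.append("B"); j -= 1
    ops.reverse()
    return ops


def add_sequence(rows, seq):
    """Add one sequence to an existing alignment (list of equal-length strings)."""
    ops = global_align(consensus(rows), seq)
    new_rows = [[] for _ in rows]
    new_seq = []
    col = pos = 0
    for op in ops:
        if op == "B":
            # Residue with no matching column: open a gap column in every old row.
            for r in new_rows:
                r.append(GAP)
            new_seq.append(seq[pos]); pos += 1
        else:
            for r, old in zip(new_rows, rows):
                r.append(old[col])
            new_seq.append(seq[pos] if op == "M" else GAP)
            pos += 1 if op == "M" else 0
            col += 1
    return ["".join(r) for r in new_rows] + ["".join(new_seq)]


def align_in_order(seqs):
    """Start from sequence 1, then add sequences 2..N in the order given."""
    rows = [seqs[0]]
    for s in seqs[1:]:
        rows = add_sequence(rows, s)
    return rows


print("\n".join(align_in_order(["ACGT", "ACGT", "AGT"])))
```

Only N-1 alignments are performed and no tree is needed, which is the attraction. The weakness is also visible in the sketch: each sequence is placed against whatever has accumulated so far, so with distinct sub-groups an early bad placement is never revisited.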
I agree with David and Andrew that full N-way dynamic programming is usually
a waste of time.
Doug Eernisse <DEernisse at fullerton.edu> wrote in message
news:DEernisse-2507991812390001 at rolltmpxa80937kbbw.fullerton.edu...
> In article <7n8360$2n4$1 at holly.csv.warwick.ac.uk>, David Jones
> <jones at globin.bio.warwick.ac.uk> wrote:
> > The bottom line is that both methods are going to produce alignments that
> > are not biologically correct - so why not use the faster approximate
> > method?
> > On a more practical note, we are now facing situations where we now have
> > to produce good multiple alignments for hundreds or even thousands of
> > sequences. Even the faster approximate MSA programs take a long time to
> > align this many sequences.
> Just out of curiosity, does anyone have empirical experience with
> how many sequences can be aligned with the current version of Clustal?
> I was surprised that I could do something like 69 sequences without
> much problem on my Mac G3 (Clustal X, rDNA sequences about 1.8 kb average,
> default settings). Now I am wondering how many I could do if I waited
> or installed Clustal W on a Unix box. I am suspecting that Clustal is
> about as fast as any, reasonably effective, program, but please
> enlighten me if I am wrong.
> Doug Eernisse
> Department of Biological Science
> California State University
> Fullerton, CA 92834-6850 USA