Question about multiple sequence alignment

Geoff Barton geoff at ebi.ac.uk
Tue Jul 27 04:22:20 EST 1999


The limiting factor with Clustal and most other hierarchical multiple alignment
methods is the time required to perform all N(N-1)/2 pairwise comparisons.
These are needed to generate the tree, which is then followed to produce the
multiple alignment.  Once you have the tree, the multiple alignment stage
itself only takes N-1 comparisons of sequences, so it is pretty quick even for
very large numbers of sequences.
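
To put rough numbers on the two stages, here is a minimal sketch that just
evaluates the N(N-1)/2 and N-1 counts for a few values of N.  The function
names are made up for illustration and are not part of Clustal or any other
program.

```python
def tree_building_comparisons(n):
    """Pairwise comparisons needed before the tree can be built: N(N-1)/2."""
    return n * (n - 1) // 2


def alignment_stage_comparisons(n):
    """Comparisons in the multiple alignment stage once the tree exists: N-1."""
    return max(n - 1, 0)


if __name__ == "__main__":
    # Print both counts side by side for a few set sizes.
    for n in (10, 69, 1000):
        print(n, tree_building_comparisons(n), alignment_stage_comparisons(n))
```

The first count grows quadratically with N while the second grows only
linearly, which is why the tree-building step dominates for large sets.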

Some programs allow you to skip the tree-building step and just take an
arbitrary order.  For example, if you do a database scan with a sequence and
then want to produce a multiple alignment of the top N hits, a good first
approximation is to add sequences 2 to N to the alignment, one at a time.  I
believe Clustal lets you do this, and so does AMPS (my rather ancient alignment
program).  The alignments that result from adding one sequence at a time are
almost as good as those from the full-tree method, provided the sequences are
very similar to each other.  If there are two or more distinct sub-groups of
sequences, then the single-order approach does less well on average.
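
The single-order idea can be sketched as a schedule of N-1 steps, each adding
the next hit to the alignment built so far.  This is only an illustration of
the ordering, not how Clustal or AMPS implement it; the function name and the
sequence names are hypothetical, and the actual sequence-to-alignment
comparison is left out.

```python
def single_order_steps(hits):
    """List the steps for adding hits 2..N to the alignment one at a time.

    `hits` holds sequence names in database-scan rank order.  Each step is a
    pair: (names already in the alignment, name being added next).
    """
    steps = []
    aligned = list(hits[:1])  # start from the top-ranked sequence, if any
    for name in hits[1:]:
        steps.append((tuple(aligned), name))
        aligned.append(name)
    return steps


if __name__ == "__main__":
    # Hypothetical hit names, in rank order.
    for already_aligned, added in single_order_steps(["query", "hit2", "hit3", "hit4"]):
        print("add", added, "to", already_aligned)
```

No pairwise distance matrix or tree is needed here: the order comes straight
from the hit list, so only the N-1 alignment steps remain.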

I agree with David and Andrew that full N-way dynamic programming is usually a
waste of time.
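
As a rough sense of scale, the sketch below assumes the textbook
generalisation of pairwise dynamic programming to N sequences, in which the
lattice has one cell per combination of positions and each interior cell has
2^N - 1 predecessors.  The function names are made up for illustration, and
no particular program is being described.

```python
def nway_lattice_cells(lengths):
    """Cells in a full N-way dynamic programming lattice: product of (L_i + 1)."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells


def predecessors_per_cell(n):
    """Moves into an interior cell: every non-empty subset of the N sequences."""
    return 2 ** n - 1


if __name__ == "__main__":
    # Assumed example: N sequences of 300 residues each.
    for n in (2, 3, 5, 10):
        print(n, nway_lattice_cells([300] * n), predecessors_per_cell(n))
```

The cell count grows exponentially with N, in contrast to the N(N-1)/2
pairwise comparisons of the hierarchical methods.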

Geoff.

Doug Eernisse <DEernisse at fullerton.edu> wrote in message
news:DEernisse-2507991812390001 at rolltmpxa80937kbbw.fullerton.edu...
> In article <7n8360$2n4$1 at holly.csv.warwick.ac.uk>, David Jones
> <jones at globin.bio.warwick.ac.uk> wrote:
>
> > The bottom line is that both methods are going to produce alignments
> > which are not biologically correct - so why not use the faster
> > approximate method?
> >
> > On a more practical note, we are now facing situations where we now have
> > to produce good multiple alignments for hundreds or even thousands of
> > sequences. Even the faster approximate MSA programs take a long time to
> > align this many sequences.
> >
>
> Just out of curiosity, does anyone have empirical experience with
> how many sequences can be aligned with the current version of Clustal
> (1.8)? I was surprised that I could do something like 69 sequences without
> much problem on my Mac G3 (Clustal X, rDNA sequences about 1.8 kb average,
> default settings). Now I am wondering how many I could do if I waited
> longer or installed Clustal W on a Unix box. I am suspecting that Clustal is
> about as fast as any, reasonably effective, program, but please
> enlighten me if I am wrong.
>
> Thanks.
>
>
>
> Doug
>
> --
> Doug Eernisse
> Department of Biological Science
> California State University
> Fullerton, CA 92834-6850 USA
> <http://biology.fullerton.edu/deernisse/>
>





