Alignment programs

higgins at ebi.ac.uk
Thu Dec 14 06:43:01 EST 1995


Mark Siddall wrote:

>In MALIGN the assumptions are less constraining than in others.
>For example PILEUP of the GCG package is explicitly order dependent.
>It just adds sequences to the growing alignment in the order you give them.

Mark,
are you sure about that??
The original PILEUP did what CLUSTAL did i.e. it made a UPGMA tree first
and then aligned according to the tree.  Unless it has changed, what you
say is simply untrue.  
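For readers unfamiliar with the guide-tree step, here is a minimal Python sketch of UPGMA clustering on a precomputed pairwise distance table. It records only the order in which sequences or groups would be joined, which is the order a tree-guided progressive aligner follows. It is not PILEUP's or Clustal's code, and the function name `upgma_order` and the frozenset-keyed distance table are assumptions made for illustration.

```python
from itertools import combinations


def upgma_order(names, dist):
    """Return UPGMA merges as (cluster_a, cluster_b, distance) tuples.

    names: leaf labels. dist: maps frozenset({a, b}) -> distance for every
    pair of leaves (a hypothetical input format chosen for this sketch).
    Each step joins the closest pair of current clusters; distances to the
    new cluster are size-weighted averages, which is the UPGMA rule.
    """
    size = {n: 1 for n in names}  # number of leaves in each current cluster
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    merges = []
    while len(size) > 1:
        # "Most similar remaining" pair: the smallest current distance.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        new = f"({a},{b})"
        merges.append((a, b, d[pair]))
        # Distance from the merged cluster to every other cluster.
        for c in size:
            if c in (a, b):
                continue
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        # Drop every distance that involves the two clusters just merged.
        for key in [k for k in d if a in k or b in k]:
            del d[key]
        size[new] = size.pop(a) + size.pop(b)
    return merges
```

With leaves A to D where A and B are closest, the first merge is (A, B), then C joins that pair, then D joins last; a progressive aligner would align the sequences in that same order.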

>In CLUSTAL you are aligning things that are more similar to each other in that
>order and are thus constrained to a phenetic alignment that may not
>be logical (i.e., if rates are different).

The "logic" behind using UPGMA is so that the "most similar remaining" sequences
are always aligned next.  The more similar two sequences or groups of
sequences are, the more accurate the alignment is.  By aligning the most
similar sequences next, you always choose the most accurate next step.
This is regardless of rate differences!!!!!!!
Even if the UPGMA tree is phylogenetically incorrect, you might still get
a better alignment by following it.  In Clustal W (published a year
ago) we use the Neighbour-Joining method of Saitou and Nei anyway, which
gives us the best of both worlds.
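The practical difference from UPGMA is in how the next pair is picked. Neighbour-Joining does not take the raw smallest distance; it minimises Q(i,j) = (n-2)d(i,j) - r_i - r_j, where r_i is the sum of row i of the distance matrix, and that correction is what makes it tolerant of unequal rates. The sketch below shows only the join order (branch lengths omitted). It is an illustration, not Clustal W's implementation, and `nj_join_order` is a hypothetical name.

```python
def nj_join_order(names, d):
    """Neighbour-Joining join order from a full symmetric distance matrix.

    names: list of labels; d: list-of-lists of distances (same order).
    Returns the (label_i, label_j) pairs in the order they are joined.
    Branch lengths are omitted to keep the sketch short.
    """
    names = list(names)
    d = [row[:] for row in d]  # work on a copy
    joins = []
    while len(names) > 2:
        n = len(names)
        r = [sum(row) for row in d]  # total distance from each node
        # Pick the pair minimising Q(i,j) = (n-2) d(i,j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        joins.append((names[i], names[j]))
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, dist_u in zip(d, new_row):
            row.append(dist_u)
        d.append(new_row + [0.0])
        names = [names[k] for k in keep] + [f"({names[i]},{names[j]})"]
    joins.append((names[0], names[1]))
    return joins
```

On the standard five-taxon textbook matrix (a to e), the first pair joined is (a, b), whose Q value of -50 is the minimum even though d and e have the smaller raw distance (3 versus 5). UPGMA would join d and e first.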

>Neither PILEUP nor CLUSTAL bother to search for less costly alignments
>after constructing one from the first pass through the data.

>However because MALIGN is so thorough it takes a lot of memory and
>processor time.
>So you have a choice... get your alignment quick or be rigorous in your
>science.

Until the mathematicians figure out the multiple alignment problem in
more detail, the SOLE criterion that should be used to judge the quality
of a method is the quality of the alignments.  Philosophical justifications
are meaningless if the alignments are silly.  We never claimed that Clustal
was all-powerful; we merely pointed out that it was fast and gave good
quality alignments in cases where you have a good idea what the alignment
should look like.   There are now DOZENS of test cases of
both RNA and protein sequences where there are sufficient data (e.g. 2-D or 3-D
structures known and/or sheer volume of primary sequence data) to judge the
quality of multiple alignment programs.  Clustal does pretty well but
is obviously far from perfect (very far); about half a dozen other programs
(e.g. by Jotun Hein, Willie Taylor, Florence Corpet, Randall Smith,
Feng and Doolittle, Geoff Barton to mention just some of the relatives 
and precursors of clustal) perform similarly well WHEN YOU LOOK AT THE QUALITY 
OF THE ALIGNMENTS!!!!!!
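One common way to put a number on "the quality of the alignments" is a sum-of-pairs score. The post does not name a metric, so this choice is an assumption. The score is the fraction of residue pairs aligned in a trusted reference alignment (for example one based on known 2-D or 3-D structure) that the test alignment also aligns. The sketch below assumes gaps are written as '-' and that both alignments contain the same sequences in the same order. The function names are hypothetical.

```python
def aligned_pairs(alignment):
    """Set of (seq_i, res_i, seq_j, res_j) residue pairs sharing a column.

    alignment: list of equal-length gapped strings, '-' marks a gap.
    Residues are identified by their 0-based index in the ungapped sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        # Residue index at this column for each sequence, or None for a gap.
        here = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                here.append(None)
            else:
                here.append(counters[s])
                counters[s] += 1
        for i in range(len(alignment)):
            for j in range(i + 1, len(alignment)):
                if here[i] is not None and here[j] is not None:
                    pairs.add((i, here[i], j, here[j]))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref)
```

For example, against the reference `["AC-GT", "ACTGT"]` the test alignment `["ACG-T", "ACTGT"]` recovers 3 of the 4 reference pairs (it misaligns the G), giving a score of 0.75.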


Two questions:

1) Can you explain the exact procedure/algorithm followed by MALIGN.
2) Would you like some test cases to try out?

Des Higgins




More information about the Mol-evol mailing list