
Alignment programs

Doug Eernisse DEernisse at fullerton.edu
Thu Dec 14 19:58:47 EST 1995

In article <1995Dec14.124301 at ebi.ac.uk>, higgins at ebi.ac.uk wrote:

> Mark Siddall wrote:
> >In MALIGN the assumptions are less constraining than in others.
> >For example PILEUP of the GCG package is explicitly order dependent.
> >It just adds sequences to the growing alignment in the order you give them.
> Mark,
> are you sure about that??
> The original PILEUP did what CLUSTAL did i.e. it made a UPGMA tree first
> and then aligned according to the tree.  Unless it has changed, what you
> say is simply untrue.  

Yes, I think Mark is wrong. The order of pairwise alignments is by default
determined from a distance matrix of the remaining sequences. I haven't
checked this in a while, but I don't think there is an option for
specifying the pairwise addition order. Does anyone know differently?
Can Clustal W do this now? I never was able to get it working with
Clustal V.

> >In CLUSTAL you are aligning things that are more similar to each other in that
> >order and are thus constrained to a phenetic alignment that may not
> >be logical (i.e., if rates are different).
> The "logic" behind using UPGMA is so that the "most similar remaining"
> are always aligned next.  The more similar two sequences or groups of
> sequences are, the more accurate the alignment is.

You may get the optimal alignment for the given model used but
I don't think you can claim it is the most accurate in terms of
representing the actual history of insertion/deletion events.
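
A minimal sketch of why "optimal for the given model" and "accurate" are
different things, in Python. It is not code from Clustal, PILEUP, or MALIGN;
the two toy sequences, the scoring values, and the function name nw_score are
all made up for illustration. It computes the optimal global (Needleman-Wunsch)
score with a simple linear gap cost and shows that the best alignment needs
gaps under one gap cost and none under another, so the implied
insertion/deletion history depends on the parameters chosen.

```python
def nw_score(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of s and t with a linear gap cost."""
    # prev[j] holds the best score for aligning s[:i-1] with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]  # s[:i] aligned against an empty prefix of t
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


s, t = "AAGG", "GGAA"   # hypothetical toy sequences
ungapped = -4            # four mismatched columns, no gaps

cheap_gaps = nw_score(s, t, gap=-1)   # optimum beats the ungapped alignment
costly_gaps = nw_score(s, t, gap=-3)  # optimum is the ungapped alignment

print("gap=-1:", cheap_gaps, "gap=-3:", costly_gaps, "ungapped:", ungapped)
# gap=-1: -2 gap=-3: -4 ungapped: -4
```

Both results are optimal under their own model, yet one postulates four gap
positions and the other none. Nothing in the score says which, if either,
matches what actually happened.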

>  By aligning the most
> similar sequences next, you always choose the most accurate next step.
> This is regardless of rate differences!!!!!!!

I don't know how you can claim this. UPGMA is notoriously sensitive
to unequal rates. The problem is that as soon as you align one sequence
to another, you are biasing a subsequent phylogenetic analysis to
a result that puts those two sequences together. After all, you are
aligning the first two sequences optimally with each other, then
aligning a consensus of these two with the remaining sequence most
similar to it, and so on. One shouldn't be too surprised if, say, a 
parsimony analysis of the Clustal V aligned data is identical to the 
phenogram reconstructed from the initial _unaligned_ UPGMA distance 
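
To make the unequal-rates point concrete, here is a small Python sketch. It is
only an illustration under stated assumptions: the four taxa, the branch
lengths, and the function name upgma_merge_order are hypothetical, and the
distances are taken to be exact path lengths on a tree ((A,B),(C,D)) in which
B and D evolve five times faster than A and C (terminal branches A=1, B=5,
C=1, D=5, internal branch 1). UPGMA joins the closest pair first, and that
first join is also the first pair a UPGMA-guided progressive alignment would
align.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]
# Path-length distances on the assumed true tree ((A,B),(C,D)).
dist = {
    frozenset("AB"): 6, frozenset("AC"): 3, frozenset("AD"): 7,
    frozenset("BC"): 7, frozenset("BD"): 11, frozenset("CD"): 6,
}


def upgma_merge_order(names, dist):
    """Return the cluster pairs in the order UPGMA joins them."""
    clusters = [(name,) for name in names]
    merges = []

    def avg(c1, c2):
        # Unweighted average of all leaf-to-leaf distances between clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        merges.append((c1, c2))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges


merges = upgma_merge_order(names, dist)
print("first join:", merges[0])
# first join: (('A',), ('C',))
```

A and C are joined first because they are the two slowly evolving sequences,
even though the assumed true sisters are A+B and C+D. A guide tree built this
way would align A with C before either sees its real sister.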

> Even if the UPGMA tree is phylogenetically incorrect, you might still get
> a better alignment by following it.  

Mark is claiming that it would be preferable to consider the
phylogenetic reconstruction simultaneously with the alignment.
Malign is similar to Jotun Hein's earlier program in this way.
I agree with you that the proof is in the pudding. In my experience,
an older version of Malign gave very similar alignments to Clustal V, 
and was quite a bit more difficult to use because there are so many possible
parameter combinations that can wildly affect the alignment (and it
tended to bomb with certain combinations, hopefully fixed by now).
Still, both programs did pretty well in general.

> In Clustal W (published a year
> ago) we use the Neighbour-Joining method of Saitou and Nei anyway which
> gives us the best of both worlds.
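
For comparison, a minimal Python sketch of the first Neighbour-Joining step
(the Saitou and Nei criterion in its usual Q-matrix form). The four taxa and
their distances are hypothetical: exact path lengths on an assumed tree
((A,B),(C,D)) with terminal branches A=1, B=5, C=1, D=5 and internal branch 1,
so B and D evolve much faster than A and C. The function name nj_first_join is
made up, and this is not the Clustal W implementation.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]
# Path-length distances on the assumed true tree ((A,B),(C,D)).
dist = {
    frozenset("AB"): 6, frozenset("AC"): 3, frozenset("AD"): 7,
    frozenset("BC"): 7, frozenset("BD"): 11, frozenset("CD"): 6,
}


def d(x, y):
    return dist[frozenset((x, y))]


def nj_first_join(names):
    """Return the pair minimising Q(i,j) = (n-2)*d(i,j) - r_i - r_j."""
    n = len(names)
    # r[x] is the sum of distances from x to every other taxon.
    r = {x: sum(d(x, y) for y in names if y != x) for x in names}
    q = {
        (x, y): (n - 2) * d(x, y) - r[x] - r[y]
        for x, y in combinations(names, 2)
    }
    return min(q, key=q.get), q


pair, q = nj_first_join(names)
print(pair, q[pair])
# ('A', 'B') -28
```

On these numbers the raw closest pair is A and C (distance 3), yet NJ joins
A with B (tied with C and D, the other true sister pair). That only shows NJ
tolerating unequal rates when the distances fit a tree exactly; distances
estimated from unaligned sequences need not behave this well.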

I was never clear why this wasn't done in the first place, but Mark
may not have known about this welcome recent change because it is a bit 
buried in the Clustal W documentation. Look for it in the publication
concerning Clustal W or in the file that accompanies Clustal W
called "clustalw.ms ascii". Again, there is no guarantee that the
NJ topology derived from the _unaligned_ data reflects the
true history of branching, so that one is still biasing subsequent
phylogenetic results towards agreement with this topology. James Lake
published a paper on this a few years back (Lake, 1991. MBE 8:378-385).
Another unrelated point is that the NJ topology is not necessarily
the topology that best fits an optimality criterion, such as
"minimum evolution."

> >Neither PILEUP nor CLUSTAL bothers to search for less costly alignments
> >after constructing one from the first pass through the data.
> >However because MALIGN is so thorough it takes a lot of memory and
> >processor time.
> >So you have a choice... get your alignment quick or be rigorous in your
> >science.
> Until the mathematicians figure out the multiple alignment problem in
> more detail, the SOLE criterion that should be used to judge the quality
> of a method is the quality of the alignments. 


> Two questions:
> 1) Can you explain the exact procedure/algorithm followed by MALIGN.
> 2) Would you like some test cases to try out?
> Des Higgins

I would appreciate the test cases by email if you are willing
to send them.

Doug Eernisse <DEernisse at fullerton.edu>
Dept. Biological Science MH282
California State University
Fullerton, CA 92634
