Alignment programs

Mark Siddall mes at zoo.toronto.edu
Wed Dec 13 22:32:46 EST 1995


In article <patersoa.82.00090741 at lincoln.ac.nz> patersoa at lincoln.ac.nz (Paterson, Adrian Mark) writes:
>Hi All
>
>Our lab has decided to give up aligning sequences by eye and will move into 
>the computer age.  Some questions:
>
>1)  Is this a good idea? Or is aligning sequences by eye still useful?
>

Yes, this is wise!  Most importantly for the sake of objectivity, but also
because (as you will discover if you follow my advice below) any
particular alignment has a particular cost associated with it (all do,
even your by-eye ones, implicitly), and there are frequently other
alignments that have the same cost and are thus equally compelling
vis-a-vis making homology statements.
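
To make the notion of cost concrete, here is a minimal illustrative sketch
(not taken from MALIGN) that scores a pairwise alignment by summing column
costs.  The function name alignment_cost and the cost values (substitution
1, gap 2) are assumptions chosen for the example.  It scores two alignments
of the same pair of sequences that differ only in which C of the longer
sequence is matched to the single C of the shorter one.

```python
# Illustrative costs only; these are assumptions, not MALIGN defaults.
SUB_COST = 1
GAP_COST = 2


def alignment_cost(row_a, row_b, sub_cost=SUB_COST, gap_cost=GAP_COST):
    """Sum the column costs of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    cost = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue              # a column of gaps only costs nothing
        if a == "-" or b == "-":
            cost += gap_cost      # indel
        elif a != b:
            cost += sub_cost      # base substitution
    return cost


# Two alignments of ACCT and ACT that differ in where the gap sits.
cost_first = alignment_cost("ACCT", "A-CT")
cost_second = alignment_cost("ACCT", "AC-T")
print(cost_first, cost_second)
```

Both alignments need one gap and no substitutions, so under these assumed
costs neither can be preferred over the other.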

I would say that aligning by eye is not useful in and of itself.
Nonetheless, I would argue that you should go in and look at your
alignment after it is churned out by an algorithm, because
base-substitution costs and indel costs are (usually) applied
equivalently across all sites.  This can confound alignment in regions
that behave differently from the rest of the sequence, and may require
that you either fix it up a bit or carve those areas out, re-align them
under different parameters and then re-insert them into the larger
alignment.

>2)  Which programs are recommended for aligning sequences?

I recommend only MALIGN by Wheeler and Gladstein.  
It is available by anonymous ftp from ftp.amnh.org as a DOS binary
or as a tar file with compilable source code for SUN-OS (or any other
platform), making it usable on your local server with all of its speed
and memory.
MALIGN aligns according to a parsimony optimality criterion.
It allows you to specify base substitution costs, gap costs, leading and
trailing gap costs, and so on.
It has a variety of search strategies (it actually performs searches
analogous to tree searches to find the most parsimonious alignment given
the parameters you have set).
It will give you multiple equally optimal alignments if they exist.
It has options for maintaining reading frames in coding sequences.
It will also generate PAUP-ready or Hennig86-ready output (or both).
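
As a rough illustration of what "most parsimonious alignment given the
parameters" and "multiple equally optimal alignments" mean, the sketch
below uses textbook pairwise dynamic programming (Needleman-Wunsch style)
to find the minimum cost for two sequences and to count how many distinct
alignments reach it.  This is not MALIGN's multiple-alignment search; the
function name min_cost_and_count and the default costs are assumptions
made for the example.

```python
def min_cost_and_count(seq_a, seq_b, sub_cost=1, gap_cost=2):
    """Return (minimum alignment cost, number of alignments with that cost)."""
    n, m = len(seq_a), len(seq_b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    ways[0][0] = 1  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                # align seq_a[i-1] with seq_b[j-1]
                step = 0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
                options.append((cost[i - 1][j - 1] + step, ways[i - 1][j - 1]))
            if i > 0:
                # seq_a[i-1] opposite a gap
                options.append((cost[i - 1][j] + gap_cost, ways[i - 1][j]))
            if j > 0:
                # seq_b[j-1] opposite a gap
                options.append((cost[i][j - 1] + gap_cost, ways[i][j - 1]))
            best = min(c for c, _ in options)
            cost[i][j] = best
            # add up the ways over every move that ties for the best cost
            ways[i][j] = sum(w for c, w in options if c == best)
    return cost[n][m], ways[n][m]


best_cost, n_optimal = min_cost_and_count("ACCT", "ACT")
print(best_cost, n_optimal)
```

A program that reports only one alignment is silently picking one of the
n_optimal candidates for you.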


>
>3) What sorts of hidden assumptions are there in these programs?

In MALIGN the assumptions are less constraining than in others.
For example PILEUP of the GCG package is explicitly order dependent.
It just adds sequences to the growing alignment in the order you give them.
In CLUSTAL you align sequences in order of their similarity to each
other, and are thus constrained to a phenetic alignment that may not
be logical (e.g., if rates are different).
Neither PILEUP nor CLUSTAL bothers to search for less costly alignments
after constructing one from the first pass through the data.
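
The difference between the two ordering rules described above can be
sketched as follows.  This is a deliberately crude illustration, not the
actual procedure of either program: mismatch_distance is a stand-in for a
real pairwise alignment score, similarity_order is a simple greedy rule
rather than a guide tree, and the taxon names and sequences are made up.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Crude distance: mismatches over the shared length plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def input_order(seqs):
    """Add sequences in exactly the order the user supplied them."""
    return list(seqs)


def similarity_order(seqs):
    """Start from the closest pair, then keep adding the nearest remaining sequence."""
    names = list(seqs)
    dist = {
        frozenset(pair): mismatch_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(names, 2)
    }
    closest = min(combinations(names, 2), key=lambda pair: dist[frozenset(pair)])
    placed = list(closest)
    remaining = [name for name in names if name not in placed]
    while remaining:
        nearest = min(
            remaining,
            key=lambda name: min(dist[frozenset((name, p))] for p in placed),
        )
        placed.append(nearest)
        remaining.remove(nearest)
    return placed


# Hypothetical data for illustration only.
seqs = {"taxonA": "ACGTACGT", "taxonB": "TTGTACAA", "taxonC": "ACGTACGA"}
print(input_order(seqs))
print(similarity_order(seqs))
```

Either way, the order is fixed before any alignment cost is evaluated, and
the alignment built along that order is never revisited.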

However, because MALIGN is so thorough, it takes a lot of memory and
processor time.
So you have a choice... get your alignment quick or be rigorous in your
science.

Mark

-- 
Mark E. Siddall                "I don't mind a parasite...
mes at vims.edu                    I object to a cut-rate one" 
Virginia Inst. Marine Sci.                     - Rick
Gloucester Point, VA, 23062
