Sequence alignments -- where to chop?

Byron Adams bjadams at CRCVMS.UNL.EDU
Tue Apr 16 20:21:56 EST 1996

Wally said-
>Just trying to figure out the most "appropriate" course of action...

        You'll probably get a bunch of grief from shadetree systematists
(sensu J. Clabaugh) about the necessity of complete sequences.  I wouldn't
be surprised if most responses to your question are suggestions to get off
your butt and sequence the missing segment.  I'm sure that if it were that
simple, you'd have done it by now.
        My suggestion is that you play around with the data set, and see
how it behaves (e.g. when you come across something new, poke it with a
stick and see what happens).  Do an analysis of all the taxa minus the one
with the gap.  Then cut the missing segment from all the taxa, include the
gapped taxon, and see if the new topology is different.  This could give a
clue (albeit maybe a misleading one if topologies do not differ) as to just
how important that hunk of sequence is to overall tree support.  Check your
tree "statistics" (e.g. decay indices, etc.) and see how much you gain or
lose by including or excluding the gapped region or the gapped taxon.
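        For what it's worth, here is a rough sketch of how the two data
sets could be put together before handing them to whatever tree-building
program you use.  It assumes the alignment is held as a plain dictionary of
taxon name -> aligned sequence, and that you already know the column range
of the missing segment.  The function name, the taxon names, and the
start/end arguments are all made up for illustration; nothing here comes
from a particular package.

```python
def build_datasets(alignment, gapped_taxon, start, end):
    """Build the two comparison data sets from an aligned matrix.

    alignment    -- dict mapping taxon name to aligned sequence (equal lengths)
    gapped_taxon -- name of the taxon missing the segment
    start, end   -- half-open column range [start, end) of the missing segment
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    if gapped_taxon not in alignment:
        raise KeyError(gapped_taxon)

    # Data set 1: full-length columns, but without the gapped taxon.
    without_taxon = {
        name: seq for name, seq in alignment.items() if name != gapped_taxon
    }

    # Data set 2: every taxon kept, but the missing segment cut from all of them.
    without_segment = {
        name: seq[:start] + seq[end:] for name, seq in alignment.items()
    }
    return without_taxon, without_segment


# Toy alignment; taxon_C lacks columns 3 through 6.
alignment = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGTAC",
    "taxon_C": "ACG????TAC",
}
without_taxon, without_segment = build_datasets(alignment, "taxon_C", 3, 7)
```

Run the same analysis on both data sets and compare the resulting
topologies and support values.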
        Cutting the gap out might violate some of the assumptions
associated with certain types of analyses, such as statistical or distance
methods.  Also, you may (or may not) be missing important information
associated with secondary structure.  Still, as long as you use an
objective means of polarizing the character states of the ingroup,
cladistic integrity will not be compromised.
        In many systematic endeavors, investigators fail to recover
evidence for certain characters in some taxa.  In fact, in most sequence
alignments (homology statements) we have to insert gaps.  Some of these may
be informative; others are completely ambiguous.  We often treat the latter
as "missing" information.  Operationally I don't see this as being any
different from your proposal.  Just don't try to be sneaky about it... :)

        You should probably check out:

Maddison, W. P.  1993.  Missing data versus missing characters in
phylogenetic analysis.  Systematic Biology 42:576-581.



Byron J. Adams
University of Nebraska
406 Plant Sciences Hall
P.O. Box 830722
Lincoln, NE 68583-0722
(402) 472-2858                      "Wilderness, once we have given
fax (402) 472-2853                   it up, is beyond our
bjadams at crcvms.unl.edu               reconstruction."
badams at unlgrad1.unl.edu                 -Wallace Stegner
