Sequence alignments -- where to chop?
higgins at ebi.ac.uk
Thu Apr 18 05:50:53 EST 1996
In article <firstname.lastname@example.org>,
bjadams at CRCVMS.UNL.EDU (Byron Adams) writes:
> Wally said-
>>Just trying to figure out the most "appropriate" course of action...
> You'll probably get a bunch of grief from shadetree systematists
> (sensu J. Clabaugh) about the necessity of complete sequences. I wouldn't
> doubt if most responses to your question are suggestions to get off of your
> butt and sequence the missing segment. I'm sure that if it were that
> simple, you'd have done it by now.
> My suggestion is that you play around with the data set, and see
> how it behaves (e.g. when you come across something new, poke it with a
> stick and see what happens). Do an analysis of all the taxa minus the one
> with the gap. Then cut the missing segment from all the taxa, include the
> gapped taxa, and see if the new topology is different. This could give a
> clue (albeit maybe a misleading one if topologies do not differ) as to just
> how important that hunk of sequence is to overall tree support. Check your
> tree "statistics" (e.g. decay indices, etc.) and see how much you gain or
> lose by inclusion/exclusion of the gap/taxa.
> Cutting the gap out might violate some of the assumptions
> associated with certain types of analyses, such as statistical or distance
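I could be wrong, but ...
as long as you cut the same section out of all the sequences, you actually
protect yourself from these "violations", because every pairwise distance
is then computed over the same set of sites. The classic violation is to
break the so-called triangle inequality: the distance between two sequences
should never exceed the sum of their distances to a third.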
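
Here is a toy sketch of what I mean, in Python. Everything in it is made up
for illustration: the three sequences, the function names, and the choice of
a plain p-distance that skips sites where either sequence has a gap (one
assumed way of handling the missing segment, not the only one). Taxon C is
missing the last eight columns.

```python
from itertools import permutations

# Hypothetical toy alignment: C lacks the last 8 columns.
seqs = {
    "A": "AAAA" + "AAAAAAAA",
    "B": "AAAA" + "CCCCCCCC",
    "C": "CAAA" + "--------",
}

def p_distance(a, b, columns):
    """Proportion of differing sites over `columns`, skipping any
    column where either sequence has a gap (pairwise deletion)."""
    usable = [i for i in columns if a[i] != "-" and b[i] != "-"]
    if not usable:
        raise ValueError("no shared ungapped columns")
    return sum(a[i] != b[i] for i in usable) / len(usable)

def distance_matrix(seqs, columns):
    """All pairwise p-distances, keyed by (name1, name2)."""
    return {(x, y): p_distance(seqs[x], seqs[y], columns)
            for x in seqs for y in seqs if x != y}

def ungapped_columns(seqs):
    """Columns in which no sequence has a gap."""
    length = len(next(iter(seqs.values())))
    return [i for i in range(length)
            if all(s[i] != "-" for s in seqs.values())]

def triangle_violations(dist, names):
    """Ordered triples (x, y, z) with d(x,z) > d(x,y) + d(y,z)."""
    return [(x, y, z) for x, y, z in permutations(names, 3)
            if dist[(x, z)] > dist[(x, y)] + dist[(y, z)] + 1e-12]

names = list(seqs)

# Each pair uses whatever sites it happens to share.
pairwise = distance_matrix(seqs, range(12))
print("pairwise deletion:", triangle_violations(pairwise, names))

# Same section cut out of every sequence.
common = distance_matrix(seqs, ungapped_columns(seqs))
print("common columns:   ", triangle_violations(common, names))
```

With pairwise deletion, A and B are compared over all twelve sites but each
is compared with C over only four, so the A-B distance ends up larger than
the A-C and C-B distances combined. Once the same columns are dropped from
all three, every distance is a mismatch count over identical sites and the
inequality holds.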
> Also, you may (or may not) be missing important information
> associated with secondary structure. Still, as long as you use an
> objective means of polarizing the character states of the ingroup,
> cladistic integrity will not be compromised.
> In many systematic endeavors, investigators fail to recover
> evidence for certain characters in some taxa. In fact, in most sequence
> alignments (homology statements) we have to insert gaps. Some of these may
> be informative, others are completely ambiguous. We often treat the latter
> as "missing" information. Operationally I don't see this as being any
> different than your proposal. Just don't try and be sneaky about it... :)
> You should probably check out:
> Maddison, W. P. 1993. Missing data versus missing characters in
> phylogenetic analysis. Systematic Biology 42:576-581.
> Byron J. Adams
> University of Nebraska
> 406 Plant Sciences Hall
> P.O. Box 830722
> Lincoln, NE 68583-0722
> (402) 472-2858
> fax (402) 472-2853
> bjadams at crcvms.unl.edu
> badams at unlgrad1.unl.edu
>
> "Wilderness, once we have given it up, is beyond our reconstruction."
> -Wallace Stegner