DNA substitutions saturated?

Joe Felsenstein joe at evolution.genetics.washington.edu
Sun Dec 3 01:51:20 EST 1995

In article <DIxqu9.H54 at zoo.toronto.edu>,
Mark Siddall <mes at zoo.toronto.edu> wrote:
>This is in response to Doug Eernisse's post.
>(Hi Doug!).
>You asked about something to do with "How to deal with gaps in the alignment".
>I'd like to follow this thread here by inquiring of the 
>readership how they feel about the 2 propositions:
>1) indel events are not observed data (that is one does not 
>observe gaps in a sequence), they are a matter of inference, thus, should
>not be treated as observed data points (i.e., code them as "missing").
>2) in order to achieve a multiple alignment, one must assign a cost
>to a gap (or string thereof), thus phylogenetic analysis of the 
>aligned data without coding for gaps is inconsistent with the
>epistemology of having gotten the alignment itself. (Can't have your 
>cake and eat it too).

#2 is the more persuasive.  However, the best thing to do is what David Sankoff
(Sankoff, Morel, and Cedergren, 1973; Sankoff, 1975; Sankoff and Rousseau, 1975)
suggested: do the phylogenies and the alignments as part of the same process.
He suggested doing this by parsimony (his strategy has been implemented in
programs by Jotun Hein and, more exactly, by Ward Wheeler and David Gladstein).
I would prefer likelihood, of course, as an extension of the work of
Bishop and Thompson (1986) and Thorne et al. (1990), assigning probabilities
to events rather than weights.  But this computation is still impractical.
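
To make the point behind #2 concrete, here is a minimal sketch of how a
pairwise alignment score depends on the cost given to a gap.  It is not
Sankoff's tree-alignment method and not any of the programs named above; it
is an ordinary two-sequence dynamic program, and the function name and the
default costs are assumptions chosen only for illustration.

```python
def alignment_cost(a, b, mismatch=1, gap=2):
    """Minimum total cost of globally aligning sequences a and b.

    Hypothetical illustration: a match costs 0, a substitution costs
    `mismatch`, and every gapped position costs `gap` (linear gap cost).
    """
    # prev[j] holds the cost of aligning the first i-1 symbols of a
    # with the first j symbols of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned against a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned against a gap
            curr.append(min(substitute, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


# The same pair of sequences is explained by two substitutions when gaps
# are expensive, and by two gaps when gaps are cheap.
expensive = alignment_cost("ACGT", "AGCT", mismatch=1, gap=2)
cheap = alignment_cost("ACGT", "AGCT", mismatch=1, gap=0.5)
```

Changing `gap` changes which alignment is optimal, so the choice of gap cost
is already built into any alignment that is later handed to a phylogeny
program.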

Most people instead first align, then do phylogenies.  This is an approximation
to the better, integrated strategy.  If people code gaps as if they are
missing nucleotides, once the alignment is estimated, they are doing so
not out of an ideological position that gaps are non-data, but just because
they don't really know another way of coding them that fits the phylogeny
inference.  Coding them as missing nucleotides leaves out the gap/nongap
information and leaves the tree to be inferred from the nucleotide
substitutions alone.  So the practice of #1 is usable, but not for the reason
given in #1.
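
As a small sketch of what coding gaps as missing nucleotides does to an
already-estimated alignment, the example below replaces each gap with a
missing-data symbol and, separately, extracts the gap/nongap pattern that
this coding leaves out.  The function names, the "-" and "?" symbols, and
the toy alignment are all assumptions made for illustration; the second
function only shows the information being left out and is not offered as a
recommended way of coding gaps.

```python
def code_gaps_as_missing(alignment, gap="-", missing="?"):
    """Return a copy of the alignment with every gap recoded as missing data."""
    return {name: seq.replace(gap, missing) for name, seq in alignment.items()}


def gap_pattern(alignment, gap="-"):
    """Return the gap/nongap pattern for each column that contains a gap.

    The result maps column index -> {taxon name: "1" if gapped else "0"}.
    This is the information that the missing-data coding leaves out.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    pattern = {}
    for pos in range(length):
        states = {n: ("1" if alignment[n][pos] == gap else "0") for n in names}
        if "1" in states.values():
            pattern[pos] = states
    return pattern


# Hypothetical three-taxon alignment, all rows the same length.
toy = {"taxonA": "AC-GT", "taxonB": "ACTGT", "taxonC": "A--GT"}
recoded = code_gaps_as_missing(toy)
left_out = gap_pattern(toy)
```

After recoding, the tree is inferred only from the columns' nucleotide
states; the shared gap in column 2 of taxonA and taxonC no longer counts as
evidence for or against any grouping.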

Joe Felsenstein         joe at genetics.washington.edu     (IP No.
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA
