Logic of cladistics

Joe Felsenstein joe at GENETICS.WASHINGTON.EDU
Mon Jun 13 10:05:39 EST 1994


[my e-mail address may be incorrectly generated by doing a Reply: use the one
 at end of this posting instead]

There has been a lot of interesting discussion and useful points made here,
but I want to return to some points of the original discussion.  Mark
Siddall wrote a few days ago:

> Though definitely a cladist myself, I don't fall into
> that group that eschews issues of probability and have even taken a stab
> at some of this myself (see below).

Fine.  There is disagreement among cladists as to whether to disallow
statistical frameworks.  However I still cannot put this together with
Mark's reasons for backing parsimony.  One might give reasons why parsimony
is expected to behave well statistically, but instead Siddall's argument is

> Perhaps the perspective is that
> there is a feeling that one cannot ever know the "true" phylogeny so 
> how does one go about looking to empirically measure the performance of
> a statistical approach to phylogeny reconstruction?

One could computer-simulate its behavior, in which case one has godlike
powers and knows the true phylogeny.  But more central is Mark's point that:

> Parsimony is seen by many (like myself) to be a logical "path of least
> resistance" approach to phylogeny reconstruction.  That is, why
> propose widespread convergence (for example) when there's a simpler
> explanation.

The question is, is following this path of least resistance and measuring
hypotheses by this measure of "simplicity" contradictory to making a
statistical inference?  How does this "simplicity" behave statistically?
We know of cases where it behaves badly, making statistically inconsistent
estimates, for example.
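
One such case can be illustrated by computer simulation, where the true
phylogeny is known.  The sketch below is only an illustration under assumptions
that are not part of this posting: four species, a true tree ((A,B),(C,D)),
two-state characters with symmetric change, and hypothetical per-branch change
probabilities (high on the branches to A and C, low elsewhere).  The function
names are made up for the example.  With four species, parsimony prefers the
grouping supported by the largest number of informative characters.

```python
import random


def evolve(state, change_prob, rng):
    """Return the state at the end of a branch: flipped with probability change_prob."""
    return 1 - state if rng.random() < change_prob else state


def simulate_pattern_counts(n_chars, p_long, q_short, seed=0):
    """Simulate binary characters on the true tree ((A,B),(C,D)).

    Assumed (hypothetical) setup: branches to A and C change with probability
    p_long; branches to B and D, and the interior branch, with q_short.
    Returns how many characters support each of the three possible groupings.
    """
    rng = random.Random(seed)
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_chars):
        u = rng.randint(0, 1)          # state at the ancestor of A and B
        v = evolve(u, q_short, rng)    # state at the ancestor of C and D
        a = evolve(u, p_long, rng)
        b = evolve(u, q_short, rng)
        c = evolve(v, p_long, rng)
        d = evolve(v, q_short, rng)
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1       # supports the true tree
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1       # groups the two long branches
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts


def parsimony_choice(counts):
    """For four species, the most parsimonious tree has the most supporting characters."""
    return max(counts, key=counts.get)


if __name__ == "__main__":
    counts = simulate_pattern_counts(5000, p_long=0.4, q_short=0.05)
    print(counts, "-> parsimony prefers", parsimony_choice(counts))
```

With these assumed probabilities, characters grouping the two long branches are
expected to outnumber those supporting the true tree, so adding more characters
makes the wrong choice more certain rather than less.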

>    I think that one would likely find that those that resist a 
> probabilistic approach have their grounding in morphology and not in
> sequence data.
> Whereas an argument can (and has) been made to think of the phylogenetic
> signal in sequence data to be a very jumbled one and in need of some
> filtering, perhaps by a maximum likelihood approach, or by transversional
> weighting or whatever, the equivalent can not really be said of
> the evolution of a femur.

As Mike Zwick also just said, this sounds like an argument that, whereas
we must make assumptions when analyzing molecular sequences, we make none
when using parsimony to analyze morphology.

I sense that two different frameworks are being used here at the same
time -- a statistical one and a nonstatistical one, and not in a way that
is logically connectable.

When I argued that:

> >If one is using the trees for some secondary analysis such as looking at
> >host-parasite coevolution, and one concentrates only on most parsimonious
> >trees, it would seem that if a statistical framework is allowed even in
> >principle, then one is effectively assuming a 100% probability for the
> >set of most parsimonious trees.

Mark replied that:

> This IS disturbing, I agree.  The above, however, assumes that the
> coevolutionary biologist is concerned with confidence in their
> coevolutionary hypothesis.  Rarely are they.  Rod Page is.  So am I.
> The extent to which I have taken it (submitted... and crossing my fingers)
> is to ask can I get a fit of the host and parasite cladograms as good
> or better when I randomize the observed associations of host(s) and
> parasite(s).   The upshot of this is, that where the answer is:
> observed is no better than random...  then less than 100% confidence
> in the contributing cladograms isn't going to make it any better.
> Where the answer is: non-random association... one could make the argument
> that this is only a partial probability of the system.
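
The kind of randomization Mark describes can be sketched roughly as follows.
This is not his submitted method, which is not given here.  It is a minimal
illustration under assumptions: each cladogram is represented as a list of
clades (sets of tip names), and the fit score, `congruence`, is a toy measure
invented for the example (the number of parasite clades whose hosts form a
host clade).  All names and the example data are hypothetical.

```python
import random


def congruence(parasite_clades, host_clades, assoc):
    """Toy fit score (an assumption): parasite clades whose hosts form a host clade."""
    host_set = set(host_clades)
    return sum(
        frozenset(assoc[p] for p in clade) in host_set
        for clade in parasite_clades
    )


def randomization_p_value(parasite_clades, host_clades, assoc, n_perm=999, seed=0):
    """Compare the observed fit with fits obtained after shuffling the associations."""
    rng = random.Random(seed)
    observed = congruence(parasite_clades, host_clades, assoc)
    parasites = sorted(assoc)
    hosts = [assoc[p] for p in parasites]
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(hosts)                       # randomize who lives on whom
        shuffled = dict(zip(parasites, hosts))
        if congruence(parasite_clades, host_clades, shuffled) >= observed:
            as_good += 1
    # Count the observed arrangement itself among the permutations.
    return observed, (as_good + 1) / (n_perm + 1)


# Hypothetical example: identical host and parasite topologies.
host_clades = [frozenset(s) for s in (
    {"H1", "H2"}, {"H1", "H2", "H3"}, {"H4", "H5"}, {"H4", "H5", "H6"})]
parasite_clades = [frozenset(s) for s in (
    {"P1", "P2"}, {"P1", "P2", "P3"}, {"P4", "P5"}, {"P4", "P5", "P6"})]
assoc = {f"P{i}": f"H{i}" for i in range(1, 7)}

if __name__ == "__main__":
    print(randomization_p_value(parasite_clades, host_clades, assoc))
```

Note that this test treats both cladograms as known without error, which is
exactly the step at issue in the surrounding discussion.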

The difficulty here is that I suspect the people who publish conclusions
drawn from a set of phylogenies in which one has much less than 95%
confidence do not say so.  They imply that these trees embody all that the
phylogenetic analysis can yield.

> And, of course, make certain assumptions about
> the nature and structure of the data.  I am not afraid of such things...
> I wish my fellow cladists would more often own up to it though.

Sorry to go on so long, but I am trying to make one central point here,
not just to score points off Mark, who is being very helpful by being
willing to openly discuss all this.  The point is that the way cladists
currently argue for the use of parsimony credits it with freedom from
assumptions and with simplicity, and does not connect with a statistical
framework at all.

I'm NOT arguing that one should not use parsimony, or that one must use
likelihood, just that if you do use parsimony you should defend your practice
based on its statistical properties under some assumptions that you are willing to
discuss.  Mark notes that his fellow cladists are not often willing to
own up to their assumptions.  And notions of "simplicity" themselves
escape from discussion of assumptions.

-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 Internet:         joe at genetics.washington.edu     (IP No. 128.95.12.41)


