"Logic of Cladistics"

Bruce Rannala rannala at MINERVA.CIS.YALE.EDU
Fri Jun 10 12:44:20 EST 1994


>To: molecular-evolution at net.bio.net
>From: rannala at minerva.cis.yale.edu (Bruce Rannala)
>Subject: RE: "Logic of Cladistics"
>
>
>In the interests of polite conversation, I would generally avoid this 
topic, along with religion and politics. However, I think Felsenstein is 
correct in his assertion that the question of the logic underlying so-called 
"cladistic approaches" to phylogeny reconstruction has been largely ignored 
by the status quo mostly because of the unsettling outcomes (or lack of 
outcomes). I offer a few of my own observations on these questions, and my 
apologies for any nihilistic tendencies. The "classical" method of 
maximum-likelihood, due to R.A. Fisher, is generally applicable in cases 
where the family of possible distributions is known, apart from a finite 
number of unknown real parameters. Often, there may exist a unique set of 
parameter values that is most likely (i.e., maximizes the log-likelihood 
function), although this is by no means always the case. So the key feature 
here is that the family of "possible" distributions on the sample space is 
known (there are, of course, all sorts of other possible complications 
involving measure-theoretic considerations for discrete and continuous 
random variables which we biologists generally ignore).
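
As a small illustration of the "known family, unknown real parameter" setting described above, here is a minimal Python sketch. It assumes the Jukes-Cantor (JC69) substitution model, which is not named in this post and is chosen only because its single parameter (the distance d between two sequences) has a closed-form maximum-likelihood estimate. The function names and the example counts are hypothetical. It also shows a case with no finite maximum: when 75% or more of the sites differ, the likelihood keeps increasing with d.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (expected substitutions per site) under
    the assumed JC69 model, dropping the constant binomial coefficient."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(a site differs | d)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def jc69_mle(n_sites, n_diff):
    """Closed-form maximum-likelihood estimate of d under JC69."""
    p_hat = n_diff / n_sites
    if p_hat >= 0.75:
        return math.inf  # likelihood has no finite maximizer
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)

def grid_search_mle(n_sites, n_diff, step=0.001, n_points=2000):
    """Brute-force check: the grid value of d with the highest log-likelihood."""
    grid = [step * k for k in range(1, n_points + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

if __name__ == "__main__":
    # Hypothetical data: 150 differences observed in 1000 aligned sites.
    print("closed-form estimate:", jc69_mle(1000, 150))
    print("grid-search estimate:", grid_search_mle(1000, 150))
```

The grid search is there only to confirm that the closed-form value really does maximize the log-likelihood; it is not a recommended way to fit the model.
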
>A practical outcome of this constraint is that some "model" of evolution is 
generally needed to postulate the form of the distribution of character 
states over the sample space and estimate branch lengths or other parameters 
of interest. An example of a possible model of gene frequency change is the 
stochastic process known as "Brownian motion."
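
For readers who want to see what such a model implies, the following Python sketch simulates Brownian motion along two lineages that split from a common ancestor. It treats the (possibly transformed) gene frequency as an unbounded real number and uses a constant variance rate sigma2; both are simplifying assumptions of this sketch, and the function names and parameter values are hypothetical. Under these assumptions the difference between the two lineages after time t has variance 2 * sigma2 * t, which the simulation should approximate.

```python
import math
import random
import statistics

def brownian_path(x0, sigma2, t, n_steps, rng):
    """Simulate Brownian motion from x0 over total time t.
    Returns the starting value followed by the value after each step."""
    dt = t / n_steps
    step_sd = math.sqrt(sigma2 * dt)  # each increment is Normal(0, sigma2 * dt)
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path

def divergence(x0, sigma2, t, n_steps, rng):
    """Difference between two independent lineages that share ancestor x0
    and have each evolved for time t."""
    left = brownian_path(x0, sigma2, t, n_steps, rng)[-1]
    right = brownian_path(x0, sigma2, t, n_steps, rng)[-1]
    return left - right

if __name__ == "__main__":
    rng = random.Random(1994)  # fixed seed so the run is repeatable
    sigma2, t = 0.01, 1.0
    diffs = [divergence(0.5, sigma2, t, 10, rng) for _ in range(4000)]
    print("simulated variance of divergence:", statistics.variance(diffs))
    print("value expected under the model:  ", 2 * sigma2 * t)
```
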
>        Many cladists of the Elliot Sober school get rather upset over 
simple models of evolution, such as "Brownian motion." In many cases, this 
is probably justified. However, the alternative they espouse is to adopt a 
"statistic," parsimony-based minimum branch length, to decide among trees 
without understanding any of the properties of this statistic. At this 
point, Occam's Probative is hurried in to save face. So the important 
question is: how well-behaved is the cladist's statistic, really? Under what 
sorts of evolutionary models does parsimony work well, or not so well? A 
number of authors including Felsenstein, Hillis, Nei and Penny (to name only 
a few), have tried to answer this question by evaluating the efficiency of 
parsimony methods under several different models of the evolutionary 
process, and also empirically using a known phylogeny (I believe it was for 
bacteria). The bottom line? Realistic evolutionary models tend to be 
multivariate stochastic processes that defy analytical solution, and 
closed-form expressions for the character state distributions (i.e., distribution 
functions or probability density functions) are generally unavailable. 
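
The question of when parsimony works well can be made concrete in the one setting simple enough for an exact answer. The Python sketch below assumes a symmetric two-state character model on the four-taxon tree ((A,B),(C,D)); the model, the function name and the branch values are assumptions of this sketch, not something specified in this post. It enumerates all change/no-change combinations on the five branches and sums the probability of each parsimony-informative site pattern. With unlimited data, parsimony picks the tree whose supporting pattern is most probable, so comparing the three numbers shows whether it converges on the true tree.

```python
from itertools import product

def pattern_probs(p_a, p_b, p_c, p_d, p_int):
    """Exact probabilities of the three parsimony-informative site patterns
    when the true tree is ((A,B),(C,D)). Each argument is the probability
    that the character state differs between the two ends of that branch."""
    branch_p = (p_a, p_b, p_c, p_d, p_int)
    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for changes in product((0, 1), repeat=5):
        weight = 1.0
        for changed, p in zip(changes, branch_p):
            weight *= p if changed else 1.0 - p
        c_a, c_b, c_c, c_d, c_int = changes
        # States are coded relative to the internal node joining A and B
        # (fixed at 0; by symmetry the other starting state gives the same result).
        a, b = c_a, c_b
        c, d = c_int ^ c_c, c_int ^ c_d
        if a == b and c == d and a != c:
            probs["AB|CD"] += weight  # pattern supporting the true tree
        elif a == c and b == d and a != b:
            probs["AC|BD"] += weight
        elif a == d and b == c and a != b:
            probs["AD|BC"] += weight
    return probs

if __name__ == "__main__":
    print("equal branches:   ", pattern_probs(0.1, 0.1, 0.1, 0.1, 0.1))
    print("two long branches:", pattern_probs(0.4, 0.05, 0.4, 0.05, 0.05))
```

With all branches equal, the pattern supporting the true tree is the most probable. With long branches leading to A and C and short branches elsewhere, the pattern grouping A with C becomes the most probable, so under these assumed values parsimony would favor the wrong tree more strongly as more characters are added.
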
>        It seems to me that there are three avenues from here: (1) 
increasingly complex computer simulations, inspired by research on 
evolutionary mechanisms, that attempt to evaluate the statistical properties 
of various phylogenetic estimators; (2) tracking real (perhaps 
artificially-accelerated) evolution in those organisms for which this is 
possible (mainly bacteria and viruses) and evaluating the statistical 
properties of the methods empirically (this would be tedious and allow for 
few generalizations to other species; each species might require a different 
estimator due to its "different" evolution); (3) developing phylogeny 
estimation methods with statistical properties that do not depend on any 
particular family of distributions over the sample space. For example, 
least-squares estimators require no knowledge of the form of the 
distribution of the error vector, apart from the mean and variance matrix. 
Recent methods of "partial-likelihood" might also be helpful here (please 
don't ask me to explain PL methods as I am no expert, see your local stats 
professor).
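
To make the least-squares remark concrete, here is a minimal Python sketch that fits branch lengths to pairwise distances on a fixed four-taxon tree ((A,B),(C,D)) by ordinary least squares. Nothing in the fit refers to a distribution for the errors. The sketch does assume that the errors in the six distances have mean zero, equal variance and no correlation (that is, a variance matrix proportional to the identity), which is a simplification; the tree, the function names and the example distances are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Rows: pairwise distances AB, AC, AD, BC, BD, CD.
# Columns: branches leading to A, B, C, D, then the internal branch.
# An entry is 1 when that branch lies on the path between the pair.
DESIGN = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]

def least_squares_branch_lengths(distances):
    """Ordinary least-squares branch lengths from the normal equations
    (X'X) beta = X'y for the fixed tree encoded in DESIGN."""
    n_cols = len(DESIGN[0])
    xtx = [[sum(row[i] * row[j] for row in DESIGN) for j in range(n_cols)]
           for i in range(n_cols)]
    xty = [sum(row[i] * d for row, d in zip(DESIGN, distances))
           for i in range(n_cols)]
    return solve_linear(xtx, xty)

if __name__ == "__main__":
    # Hypothetical distances in the order AB, AC, AD, BC, BD, CD.
    print(least_squares_branch_lengths([0.31, 0.44, 0.56, 0.55, 0.64, 0.71]))
```

This unconstrained fit can return negative branch lengths for noisy data, and it does nothing to choose among trees; it only estimates lengths on the tree it is given.
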
>        It is very telling that so few professional statisticians have 
ventured into the phylogenetics controversy (compare this with the field of 
theoretical population genetics which has attracted some of the most 
brilliant probabilists of this century: Bartlett, Feller, Karlin, Kolmogorov 
and Moran to name only a few). I would guess that the reason is that the 
mathematical issues are still very poorly defined in the field of 
phylogenetics. If we biologists were able to clarify our thinking; if we 
were able to decide what exactly we are trying to achieve with our 
phylogenetic methods, and to consolidate our views on what constitute valid 
evolutionary models, then we might have some hope of interesting the 
mathematical types and some real progress in the theory of phylogeny 
estimation might be made. Obviously, as Siddall suggests, the place to begin 
is with sequence data, since single-locus genetic models are generally much 
more tractable than quantitative genetic models, and require fewer assumptions.
>        I think that there is light at the end of the tunnel, but much of 
the current methodology in phylogenetics is bound to become obsolete. My 
advice to an ambitious young cladist would be: don't hitch your wagon too 
tightly to any particular train; change is, after all, the indication of a 
healthy scientific field. Buddhism and Christianity have both far outlived 
Newtonian physics. Please direct any replies to this news-group, rather than 
my email address.
>  
> 
>
Bruce Rannala, Department of Biology, Yale University
rannala at minerva.cis.yale.edu



