>Mark Siddall writes...
>
>On this thread, and more pertinent to this group perhaps, I am
>increasingly concerned with a tendency to crank off sequences,
>fire them into some phylogenetic software like PHYLIP, PAUP, MEGA or
>Hennig86 without full understanding of what's going on. The argument can
>be made that it is an awkward, if not dangerous, thing to simply pump out a
>parsimony tree, a Fitch-Margoliash distance tree, a UPGMA, and whatever
>and publish them all side by side. There appears to be in some arenas
>a fundamental ignoring of the issues, as though phylogenetic
>investigation was so-much recipe work. It is not. There are issues that
>need to be addressed by all practitioners in their analyses regarding multiple
>trees, assumptions, defensibility of a chosen approach, and so much

It is quite easy to understand how a particular algorithm for
phylogeny estimation is applied; it is far more difficult to understand
its statistical behaviour (how well it performs in estimating the "true"
phylogeny). As an analogy, consider the "method of moments" estimator of
variance. It is quite simple to calculate an estimate of variance using
this method (there is generally no need for a "black box" unless the
data set is large), yet it requires a good deal more thought to
understand the statistical properties of the estimate. Is it unbiased,
consistent,...? It is not generally expected that an author publishing a
MOM estimate of variance will discuss all the properties of the
estimator, or even be familiar with them. Since we know the method that
has been applied, we can easily research its properties on our own.
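
A minimal sketch of the distinction drawn above, between computing a method-of-moments variance estimate and knowing its statistical properties. The normal data, sample sizes, replicate count, and the function names mom_variance and average_estimate are all illustrative assumptions, not anything from the original discussion. The sketch averages the estimate over many simulated samples and sets it beside the usual n - 1 estimator.

```python
import random
import statistics


def mom_variance(xs):
    """Method-of-moments variance: mean squared deviation, dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def average_estimate(estimator, n, true_sd=1.0, reps=4000, seed=1):
    """Average an estimator over `reps` simulated normal samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        total += estimator(sample)
    return total / reps


if __name__ == "__main__":
    # True variance is 1.0; the MoM estimator has expectation (n - 1) / n.
    for n in (5, 50, 500):
        mom = average_estimate(mom_variance, n)
        corrected = average_estimate(statistics.variance, n)
        print(n, round(mom, 3), round(corrected, 3), (n - 1) / n)
```

The estimate itself takes three lines to compute. That it is biased by a factor of (n - 1)/n, yet consistent because the bias vanishes as n grows, only becomes apparent from theory or from a simulation of this kind.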
Still, some statistical estimators tend to be used more often than
others, and are frequently advocated by statisticians based on their
statistical properties (these being the subject of "estimation theory").
Unfortunately, in phylogenetics there is no formal "theory of
estimation" that even grossly approaches that available for our more
familiar statistical estimators, although we can sometimes evaluate
certain properties, such as consistency, by using computer simulations
of simple evolutionary models. Thus, there are often few reasons to
unambiguously choose one phylogenetic method over another. Methods that
are known to behave well under particular evolutionary models (neutral
genetic drift for example) might be chosen if we think the changes in
the gene, or sequence, are likely to have been selectively neutral. Of
course, those in the field of population genetics who are actively
trying to determine whether particular genes are neutral know how
difficult this can be. Nonetheless, we do our best with the information
we have available. Parsimony is often presented as independent of
evolutionary models, relying instead on mediaeval axioms for its
justification (and analogies to bird migration). So when should
parsimony be chosen? Some would say "always," others might instead
evaluate the ability of parsimony to recover phylogeny for "simple"
evolutionary models and base their choice on the result. In any case,
the decision is anything but clear, and some conscientious scientists
choose to publish trees generated using several methods. I fail to
perceive the danger in that.
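
As a rough illustration of evaluating parsimony under a "simple" evolutionary model, the sketch below simulates two-state characters on a four-taxon tree whose true topology is AB|CD and asks which topology parsimony prefers. Everything specific here is an assumption made for illustration: the symmetric two-state model, the per-branch change probabilities, and the function names. The first setting puts long branches on the non-sister taxa A and C, the well-known case in which parsimony can be inconsistent.

```python
import random


def flip(state, prob, rng):
    """Change a 0/1 state with probability `prob` along a branch."""
    return state ^ 1 if rng.random() < prob else state


def simulate_site(p_a, p_b, p_c, p_d, p_int, rng):
    """One character on the true tree ((A,B),(C,D)); returns states of A, B, C, D."""
    x = rng.randrange(2)        # internal node joining A and B
    y = flip(x, p_int, rng)     # internal node joining C and D
    return (flip(x, p_a, rng), flip(x, p_b, rng),
            flip(y, p_c, rng), flip(y, p_d, rng))


def parsimony_choice(sites):
    """Count informative sites per topology and return the best-supported one.

    For four taxa and two states, the topology with the most supporting
    informative sites is the one with the shortest parsimony length.
    Ties are broken in favour of the first topology listed.
    """
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return max(support, key=support.get), support


def run(p_long, p_short, n_sites, seed=1):
    """A and C get `p_long`; B, D and the internal branch get `p_short`."""
    rng = random.Random(seed)
    sites = [simulate_site(p_long, p_short, p_long, p_short, p_short, rng)
             for _ in range(n_sites)]
    return parsimony_choice(sites)


if __name__ == "__main__":
    print(run(0.4, 0.05, 5000))   # unequal branches (assumed values)
    print(run(0.1, 0.1, 5000))    # equal branches (assumed values)
```

Under these assumed values, more data does not rescue parsimony in the unequal-branch case, whereas it recovers the true tree when all branches are equal. Which regime a real data set resembles is exactly the kind of question that is hard to answer in advance.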

>Mark Siddall writes:
>
>I would in fact suggest that one thoroughly investigate all lines, BUT
>do not be a fence-sitter. Not all approaches are philosophically
>reconcilable and one must have conviction.

I would argue that "fence-sitting" is entirely rational (and quite
scientific) when no one particular method is the "correct" approach
given the current state of knowledge. It is curious that Siddall places
such importance on philosophy, since statistical properties of the
estimators (such as the probability of recovering the "true" phylogeny)
would seem to be of greater importance (at least from a scientific standpoint).

Bruce Rannala, Department of Biology, Yale University
rannala at minerva.cis.yale.edu