In article <8q6b79$hvt$1 at mercury.hgmp.mrc.ac.uk>, Chris Conroy
<chris.conroy at stanford.edu> wrote:
> Is there any reason to pinpoint sequences that look like they
> have long branches in the unconstrained tree to leave out?
I know there is a diversity of opinions on this, but I recommend using
objective methods to identify sources of confusion in the data and remove
them. We use a method developed primarily by my ex-grad student, James
Lyons-Weiler. James and I published a paper on this tree-independent
approach to identifying taxa in your matrix that represent long branches
(i.e. the data for those taxa have been so scrambled by independent
evolution that they have lost phylogenetic signal in the context of the
other taxa in your matrix). The reference is:
Lyons-Weiler, J. and G. A. Hoelzer. 1997. Escaping from the Felsenstein
zone prior to the inference of a phylogenetic tree. Molecular
Phylogenetics and Evolution 8: 375-384.
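
To make "tree-independent" concrete, here is a minimal Python sketch of a
screen that never looks at a topology. It is NOT the method from the paper
cited above. It is a crude stand-in that flags taxa whose mean pairwise
p-distance to the rest of the matrix is an outlier. The function names, the
cutoff k, and the median + k*MAD rule are all assumptions made for
illustration.

```python
from statistics import median


def p_distance(a, b):
    """Proportion of differing sites, skipping gaps and missing data."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-?N" and y not in "-?N"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def flag_divergent_taxa(alignment, k=3.0):
    """Return taxa whose mean distance to all others exceeds median + k*MAD.

    `alignment` maps taxon name -> aligned sequence (equal lengths assumed).
    The cutoff k is an arbitrary illustrative choice, not a published value.
    Note that if the MAD is zero, any taxon above the median is flagged.
    """
    names = list(alignment)
    mean_d = {
        n: sum(p_distance(alignment[n], alignment[m]) for m in names if m != n)
        / (len(names) - 1)
        for n in names
    }
    med = median(mean_d.values())
    mad = median(abs(d - med) for d in mean_d.values())
    return sorted(n for n, d in mean_d.items() if d > med + k * mad)
```

The point of the sketch is only that the criterion is computed from the
character matrix itself, so it cannot favor one topology over another.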
> What about
> those sequences that jump around between constrained and unconstrained
> trees? Is it "fair" or statistically valid to sequentially remove
> outliers until the tree behaves in a clock-like manner?
Why not? Even your total data set represents a somewhat arbitrary
collection of taxa. You could have designed your a priori sampling scheme
differently. Surely that would not have invalidated your results from the
start. I would say that as long as you use objective (not topology based)
criteria for pruning the matrix, it is statistically valid because it
should not bias your phylogenetic inferences. Of course, reducing a data
set can reduce your statistical power. That gives you a way to check
whether the pruning worked: shrinking a data matrix is expected to
destabilize phylogenetic inferences, so increased stability after pruning
is evidence that it was effective (you removed the right stuff). Again,
this criterion for judging the effectiveness of your data pruning assumes
that you did not remove data in a way that biases the result. It should
only remove noise that was obscuring the imprint of phylogeny on the data.
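
The before/after comparison can be sketched in Python as follows. The post
does not say how stability should be measured, so this assumes bootstrap
support as the measure. It also assumes each analysis is summarized as a
dict mapping a clade (a frozenset of taxon names) to its support, and the
function name is hypothetical.

```python
def mean_support_change(full, pruned, dropped):
    """Mean change in support for clades present in both analyses.

    Only clades that contain none of the dropped taxa are compared, so
    each clade means the same thing before and after pruning.
    Returns None when no clade is shared.
    """
    dropped = frozenset(dropped)
    changes = [
        pruned[clade] - support
        for clade, support in full.items()
        if clade.isdisjoint(dropped) and clade in pruned
    ]
    return sum(changes) / len(changes) if changes else None
```

A positive value means the shared clades are better supported after
pruning, which is the direction the argument above treats as evidence that
noise rather than signal was removed.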
--
Guy Hoelzer
Department of Biology
University of Nevada Reno
Reno, NV 89557