metric on trees
joe at GENETICS.WASHINGTON.EDU
Sat Jun 25 01:12:03 EST 1994
Michael J. Hennebry (hennebry at plains.NoDak.edu) suggested concerning
distances between trees:
> Try this metric:
> dist(T1, T2)^2 = Sum (dfatca(T1, a, b)-dfatca(T2, a, b))^2
> Where dfatca(T, a, b) is the distance in tree T, from labeled node a to
> its common ancestor with labeled node b and the sum is over all pairs
> of labeled nodes a and b in the trees.
Note that this is well-defined only when we have a clocklike tree.
Otherwise the distance from a to the common ancestor of a and b is
different from the distance from b to that common ancestor, and then
one doesn't know which of them to use. Using the total path length
from a down to the common ancestor and back up to b would let us
define the metric for both clocklike and nonclocklike trees.
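A minimal sketch of the path-length version, in Python. The tree encoding
(a dict mapping each node to its parent and the length of the branch leading
to it, with the root mapped to None) and the names root_path, path_length,
and path_distance are assumptions made for illustration; they do not come
from the posts above.

```python
from itertools import combinations
from math import sqrt


def root_path(tree, node):
    """Map each ancestor of `node` (and node itself) to its distance from node."""
    dist = {node: 0.0}
    total = 0.0
    while tree[node] is not None:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def path_length(tree, a, b):
    """Total branch length from a to the common ancestor and on to b."""
    up_a = root_path(tree, a)
    up_b = root_path(tree, b)
    # With non-negative branch lengths, the most recent common ancestor
    # is the shared ancestor that minimizes the summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def path_distance(t1, t2, labels):
    """Square root of the summed squared path-length differences over all pairs."""
    return sqrt(sum(
        (path_length(t1, a, b) - path_length(t2, a, b)) ** 2
        for a, b in combinations(labels, 2)
    ))


# Hypothetical example trees: ((A:1,B:1)X:1,C:2)R and ((A:1,C:1)Y:1,B:2)R
t1 = {"R": None, "X": ("R", 1.0), "A": ("X", 1.0), "B": ("X", 1.0), "C": ("R", 2.0)}
t2 = {"R": None, "Y": ("R", 1.0), "A": ("Y", 1.0), "C": ("Y", 1.0), "B": ("R", 2.0)}

print(path_distance(t1, t2, ["A", "B", "C"]))
```

In this example the pairs A,B and A,C each differ by 2 between the trees and
the pair B,C agrees, so the sum of squares is 8.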
One disadvantage of either of these is that when we duplicate a species
and thus have it sitting right next to its duplicate, with zero branch
length between them, we increase the weight given to that part of the
tree in computing the distance.
Try this metric (Kuhner and Felsenstein, MBE, 1994):
dist(T1, T2)^2 = Sum (blength(T1, x) - blength(T2, x))^2
where the x's are all branches that appear in either or both trees.
blength(T, x) is the length of branch x in tree T. If a branch does not
appear in one of the trees, its length is taken to be 0 in that tree. When all
branches are assumed to have length 1, dist^2 is Robinson and Foulds's dT.
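A rough sketch of this computation in Python, under the assumption that a
branch is identified by the split of the species it induces, so that "the
same branch" can be recognized in both trees. The dict encoding and the names
TIPS, split, branch_score_sq, and unit are illustrative choices, not taken
from the paper.

```python
TIPS = "ABCD"  # hypothetical species labels


def split(side):
    """Canonical key for a branch: the side of its split that lacks the first tip."""
    side = frozenset(side)
    return frozenset(TIPS) - side if min(TIPS) in side else side


def branch_score_sq(t1, t2):
    """Sum of squared branch-length differences; an absent branch counts as 0."""
    branches = set(t1) | set(t2)
    return sum((t1.get(x, 0.0) - t2.get(x, 0.0)) ** 2 for x in branches)


def unit(tree):
    """Copy of the tree with every branch length set to 1."""
    return {x: 1.0 for x in tree}


# Two unrooted four-species trees: ((A,B),(C,D)) and ((A,C),(B,D)).
t1 = {split("AB"): 0.5, split("A"): 1.0, split("B"): 1.0,
      split("C"): 1.0, split("D"): 1.0}
t2 = {split("AC"): 0.3, split("A"): 1.0, split("B"): 2.0,
      split("C"): 1.0, split("D"): 1.0}

print(branch_score_sq(t1, t2))              # squared distance with branch lengths
print(branch_score_sq(unit(t1), unit(t2)))  # all lengths set to 1
```

With unit lengths every shared branch contributes 0 and every branch found in
only one tree contributes 1, so the sum simply counts the branches on which
the two trees disagree (two in this example).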
Of course there are many other possible metrics, and it just depends on what
properties you want the distance to have.
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
Internet: joe at genetics.washington.edu (IP No. 220.127.116.11)