Program for tree comparison

Joe Felsenstein joe at evolution.genetics.washington.edu
Thu Apr 13 22:19:38 EST 1995


In article <3mjgtu$91h at newshound.uidaho.edu>,
James Foster <foster at cs.uidaho.edu> wrote:
>Korbinian Strimmer (strimmer at wap18.zi.biologie.uni-muenchen.de) wrote:
>: I want to count how many trees in the
>: big treefile are identical to the specified tree. As there are many
>: possibilities for writing down a given tree in the "New Hampshire"
>: form one can not simply compare the two files with a text editor
>: but one must think of another way. I suppose that this program must 
>: work in a way Consense (from PHYLIP) works, but Consense alone gives
>: no answer to my problem.
>
>You're looking for algorithms for the "tree isomorphism" problem: given
>two trees, determine if there is some way of naming all the nodes in
>each to get the same tree.
>
>There are polynomial time algorithms for this...

I don't think Strimmer's original problem is the one Foster discusses.
My reading of Strimmer's post is that he does NOT want to allow the tips
to be renamed.  He sees that  ((a,b),(c,d)); is the same tree as,
for example ((c,d),(b,a));   If the trees are unrooted there are even more
possibilities that two strings of symbols describe the same tree.  Strimmer
does not, I think, want to count either of these as identical to
((c,a),(b,d));

If one could write a program to read each tree in and flip subtrees at
each node into a canonical order (such as putting the subtree with the
species name that is lexicographically first ahead of one that has its
earliest species lexicographically later), then one could, after that was
done, compare strings to see if the trees are identical.

My program CONSENSE tabulates frequencies of subsets that occur on the trees,
and as a result does not lend itself well to what Strimmer wants.  Alas,
neither do Foster's algorithms.  The best I can think of is to do many
runs of CONSENSE, each making a consensus of two trees, one of the
sampled ones and the target tree.  Then look  at each output to see if
all the groups on it occur 100% of the time.  But that is tedious.

-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 Internet:         joe at genetics.washington.edu     (IP No. 128.95.12.41)



More information about the Mol-evol mailing list