Program for tree comparison

James Foster foster at cs.uidaho.edu
Fri Apr 14 10:49:08 EST 1995

Joe Felsenstein (joe at evolution.genetics.washington.edu) wrote:
: In article <3mjgtu$91h at newshound.uidaho.edu>,
: James Foster <foster at cs.uidaho.edu> wrote:
: >Korbinian Strimmer (strimmer at wap18.zi.biologie.uni-muenchen.de) wrote:
: >: I want to count how many trees in the
: >: big treefile are identical to the specified tree. As there are many
: >
: >You're looking for algorithms for the "tree isomorphism" problem: given
: >two trees, determine if there is some way of naming all the nodes in
: >each to get the same tree.
: >
: >There are polynomial time algorithms for this...

: I don't think Strimmer's original problem is the one Foster discusses.
: My reading of Strimmer's post is that he does NOT want to allow the tips
: to be renamed.  He sees that  ((a,b),(c,d)); is the same tree as,
: for example ((c,d),(b,a));   If the trees are unrooted there are even more
: possibilities that two strings of symbols describe the same tree.  Strimmer
: does not, I think, want to count either of these as identical to
: ((c,a),(b,d));

Joe is entirely correct.  I was looking at the problem as a
mathematician, not as a biologist.

: My program CONSENSE tabulates frequencies of subsets that occur on the trees,
: and as a result does not lend itself well to what Strimmer wants.  Alas,
: neither do Foster's algorithms.  The best I can think of is to do many
: runs of CONSENSE, each making a consensus of two trees, one of the
: sampled ones and the target tree.  Then look  at each output to see if
: all the groups on it occur 100% of the time.  But that is tedious.

In general, I think there is no known fast solution to this problem.
Joe's suggestion is a good heuristic workaround.  The weakness is that
there is no way to quantify how good (or poor) it is...is there?  In my
experience, biologists eyeball the data, use considerable background
expertise, and get a pretty good idea of which trees are reasonably
close to each other.  I'm wondering to what extent this process can be
automated, and what can be said mathematically about the accuracy of
these automated solutions.
James A. Foster			email: foster at cs.uidaho.edu
Laboratory for Applied Logic	Dept. of Computer Science
University of Idaho		www: http://www.cs.uidaho.edu/~foster

More information about the Mol-evol mailing list

Send comments to us at biosci-help [At] net.bio.net