Joe Felsenstein (joe at evolution.genetics.washington.edu) wrote:
: In article <3mjgtu$91h at newshound.uidaho.edu>,
: James Foster <foster at cs.uidaho.edu> wrote:
: >Korbinian Strimmer (strimmer at wap18.zi.biologie.uni-muenchen.de) wrote:
: >: I want to count how many trees in the
: >: big treefile are identical to the specified tree. As there are many
: >
: >You're looking for algorithms for the "tree isomorphism" problem: given
: >two trees, determine if there is some way of naming all the nodes in
: >each to get the same tree.
: >
: >There are polynomial time algorithms for this...
: I don't think Strimmer's original problem is the one Foster discusses.
: My reading of Strimmer's post is that he does NOT want to allow the tips
: to be renamed. He sees that ((a,b),(c,d)); is the same tree as,
: for example ((c,d),(b,a)); If the trees are unrooted there are even more
: possibilities that two strings of symbols describe the same tree. Strimmer
: does not, I think, want to count either of these as identical to
: ((c,a),(b,d));
Joe is entirely correct. I was looking at the problem as a
mathematician, not as a biologist.
: My program CONSENSE tabulates frequencies of subsets that occur on the trees,
: and as a result does not lend itself well to what Strimmer wants. Alas,
: neither do Foster's algorithms. The best I can think of is to do many
: runs of CONSENSE, each making a consensus of two trees, one of the
: sampled ones and the target tree. Then look at each output to see if
: all the groups on it occur 100% of the time. But that is tedious.
In general, I think there is no known fast solution to this problem.
Joe's suggestion is a good heuristic workaround. The weakness is that
there is no way to quantify how good (or poor) it is...is there? In my
experience, biologists eyeball the data, use considerable background
expertise, and get a pretty good idea of which trees are reasonably
close to each other. I'm wondering to what extent this process can be
automated, and what can be said mathematically about the accuracy of
these automated solutions.
--
James A. Foster email: foster at cs.uidaho.edu
Laboratory for Applied Logic Dept. of Computer Science
University of Idaho www: http://www.cs.uidaho.edu/~foster