Program for tree comparison

James Foster foster at cs.uidaho.edu
Thu Apr 13 10:45:02 EST 1995

Korbinian Strimmer (strimmer at wap18.zi.biologie.uni-muenchen.de) wrote:

: To all tree reconstructors out there! 

: Using the programs from the PHYLIP package of Joe Felsenstein I have
: produced quite a huge number of trees written down as specified within
: the "New Hampshire" standard for computer readable trees (an example 
: for an unrooted tree may be  (a, b, ((c, d), e)); ) Now I want to compare
: all these trees (that are all in one big treefile) to one specific tree
: that is also given in another file. I want to count how many trees in the
: big treefile are identical to the specified tree. As there are many
: possibilities for writing down a given tree in the "New Hampshire"
: form one can not simply compare the two files with a text editor
: but one must think of another way. I suppose that this program must 
: work in a way Consense (from PHYLIP) works, but Consense alone gives
: no answer to my problem.

: I am very sure that many people must have encountered this problem before,
: and I am sure that there exists already a solution to this. If you know
: how to deal with this problem please give me a hint and contact me!!

: Thank you

: Korbinian Strimmer

You're looking for algorithms for the "tree isomorphism" problem: given
two trees, determine if there is some way of naming all the nodes in
each to get the same tree.

There are polynomial time algorithms for this.  See "The Graph
Isomorphism Problem" by Kobler, Schoning, and Turan.  

If I remember correctly, the idea is to label each leaf (of each
tree) with a 1.  Then recursively label each node n which would be a leaf
if you removed the nodes already labelled with (d,l), where l is
the label of the labelled node adjacent to n and d is the degree of n.
These labellings can then be matched up in the two trees if and only if
the trees are isomorphic.

A much harder problem is this: given a fixed tree T and a collection of
trees C, find the tree in C which is "most similar" to T.  In your case,
how do you know that one of the trees in your Phylip collection IS the
one you're interested in? 

I would be most interested in thoughts about this latter problem.

James A. Foster			email: foster at cs.uidaho.edu
Laboratory for Applied Logic	Dept. of Computer Science
University of Idaho		www: http://www.cs.uidaho.edu/~foster

More information about the Mol-evol mailing list

Send comments to us at biosci-help [At] net.bio.net