Program for tree comparison

John P. Huelsenbeck johnh at phylo.zo.utexas.edu
Mon Apr 17 19:08:02 EST 1995


In article <3mdtof$8sj at sunserver.lrz-muenchen.de>,
strimmer at wap18.zi.biologie.uni-muenchen.de (Korbinian Strimmer) wrote:

> To all tree reconstructors out there! 
> 
> Using the programs from the PHYLIP package of Joe Felsenstein I have
> produced quite a huge number of trees written down as specified within
> the "New Hampshire" standard for computer readable trees (an example 
> for an unrooted tree may be  (a, b, ((c, d), e)); ) Now I want to compare
> all these trees (that are all in one big treefile) to one specific tree
> that is also given in another file. I want to count how many trees in the
> big treefile are identical to the specified tree. As there are many
> possibilities for writing down a given tree in the "New Hampshire"
> form one can not simply compare the two files with a text editor
> but one must think of another way. I suppose that this program must 
> work in a way Consense (from PHYLIP) works, but Consense alone gives
> no answer to my problem.
> 
> I am very sure that many people must have encountered this problem before,
> and I am sure that there exists already a solution to this. If you know
> how to deal with this problem please give me a hint and contact me!!
> 
> Thank you
> 
> Korbinian Strimmer
> 
> 
> ----------------------------------
> strimmer at zi.biologie.uni-muenchen.de

I think that I can help you out.

A lot of my work involves comparing trees (usually a model tree from
simulations to trees estimated using various phylogenetic methods).  The
program does almost exactly what you want.  That is, I have a file of
"True trees" (corresponding to your one model tree) and another file
with the trees I want to examine.

The program reads New Hampshire formatted trees.  It then determines
the taxon bipartitions for each tree.  I label the n taxa 2^0, 2^1, ...,
2^(n-1).  I then traverse the tree from the tips to the root, and for each
internal node, I assign it a number which is the sum of the numbers to
the left and to the right of the node (i.e., if the taxon to the left
of the node is 1 and the taxon to the right is 8, then I assign the
node the number 9).  It is then an easy task to compare whether or not
two trees are identical.  Unfortunately, the program is not very user
friendly (I am currently the only user; I think that you will need to change
the taxon labels to 1, 2, 3, 4, 5, 6, ...).  However, if you are willing
to compile the code (in C) and would kindly remind me, I will be happy
to send it along.
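Below is a minimal sketch in C of the bipartition encoding described above. It is not John's program. It assumes taxa have already been relabelled 1..n with n <= 64, so each taxon's power of two fits in one bit of a uint64_t. The names SplitSet, tree_splits and same_topology are hypothetical. Each internal node's number is formed with bitwise OR rather than addition; because every taxon's bit appears only once below a node, the result is the same sum. Two steps go beyond the description and are assumptions for unrooted trees. Each split is stored as whichever side excludes the lowest-numbered taxon, so a split and its complement compare equal. Trivial splits (a single taxon against the rest) are dropped.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_TAXA 64                 /* assumption: one bit per taxon in a uint64_t */
#define MAX_SPLITS (2 * MAX_TAXA)   /* generous bound on internal nodes */

typedef struct {                    /* hypothetical container, not from the original */
    uint64_t split[MAX_SPLITS];     /* one taxon bitmask per internal node */
    int count;
    uint64_t all;                   /* bitmask of every taxon in the tree */
    int error;                      /* set on malformed input */
} SplitSet;

static void skip_space(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

/* Skip an optional node label or ":branch_length" up to the next delimiter. */
static void skip_annotation(const char **p) {
    while (**p && **p != ',' && **p != ')' && **p != ';') (*p)++;
}

static int popcount64(uint64_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Return the bitmask of taxa below this node; record each internal node. */
static uint64_t parse_node(const char **p, SplitSet *s) {
    uint64_t mask = 0;
    skip_space(p);
    if (**p == '(') {
        (*p)++;
        for (;;) {
            mask |= parse_node(p, s);       /* equals the sum: child bits are disjoint */
            if (s->error) return 0;
            skip_space(p);
            if (**p == ',') { (*p)++; continue; }
            if (**p == ')') { (*p)++; break; }
            s->error = 1;
            return 0;
        }
        if (s->count < MAX_SPLITS) s->split[s->count++] = mask;
        else s->error = 1;
    } else {                                /* leaf: integer label 1..MAX_TAXA */
        char *end;
        long label = strtol(*p, &end, 10);
        if (end == *p || label < 1 || label > MAX_TAXA) { s->error = 1; return 0; }
        *p = end;
        mask = (uint64_t)1 << (label - 1);  /* taxon k gets 2^(k-1) */
    }
    skip_annotation(p);
    return mask;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Parse one Newick tree into sorted, deduplicated, non-trivial unrooted splits.
   Returns 0 on success, -1 on malformed input. */
int tree_splits(const char *newick, SplitSet *s) {
    const char *p = newick;
    s->count = 0;
    s->error = 0;
    s->all = parse_node(&p, s);
    if (s->error) return -1;
    uint64_t first = s->all & (~s->all + 1);   /* lowest-numbered taxon present */
    int n = popcount64(s->all), kept = 0;
    for (int i = 0; i < s->count; i++) {
        uint64_t m = s->split[i];
        if (m & first) m = s->all & ~m;        /* use the side without that taxon */
        int k = popcount64(m);
        if (k >= 2 && n - k >= 2) s->split[kept++] = m;
    }
    qsort(s->split, (size_t)kept, sizeof s->split[0], cmp_u64);
    int uniq = 0;
    for (int i = 0; i < kept; i++)
        if (uniq == 0 || s->split[i] != s->split[uniq - 1]) s->split[uniq++] = s->split[i];
    s->count = uniq;
    return 0;
}

/* Same taxa and same set of bipartitions means the same unrooted tree. */
int same_topology(const SplitSet *a, const SplitSet *b) {
    if (a->all != b->all || a->count != b->count) return 0;
    for (int i = 0; i < a->count; i++)
        if (a->split[i] != b->split[i]) return 0;
    return 1;
}
```

For example, `(1,2,((3,4),5));`, `((5,(4,3)),2,1);` and the rooted `((1,2),((3,4),5));` all reduce to the splits {3,4} and {3,4,5} (masks 12 and 28), so same_topology treats them as one tree, while `(1,3,((2,4),5));` differs. Counting how many trees in a big treefile match a model tree is then a loop that calls tree_splits on each tree and same_topology against the model tree's split set.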

John Huelsenbeck

johnh at phylo.zo.utexas.edu



More information about the Mol-evol mailing list