Propositions concerning NH treefiles

Joe Felsenstein joe at evolution.genetics.washington.edu
Fri Apr 26 17:19:25 EST 1996

In article <4lo1rb$bc8 at lyra.csx.cam.ac.uk>,
Korbinian Strimmer  <strimmer at zi.biologie.uni-muenchen.de> wrote:
>I'd like to put the following two propositions concerning NH treefiles
>to the public for discussion:
>a) Sometimes you need to draw a tree with edge lengths *and* corresponding boot
>   values for each internal branch. For example, consider this tree with edge l
>   At the moment, it is not possible to have *both* information in one treefile, and it is
>   not possible to print let's say a phenogram with bootstraps.
>   I therefore would like to propose that the current NH convention should be e
>   to allow for more than one : extension. The tree drawing programs (DRAWTREE,
>   TreeView) could then allow the user to specify the task of each entry, say the first
>   is the edge length, the second entry is the bootstrap value for each (internal) branch.

The original standard (which we have not yet written up as all its authors got
too busy) allows comments to follow the branch lengths, in square brackets.
Why not just put the bootstrap values in square brackets there?

((A:0.1345[100],B:0.12[100]):0.0345[76], .... (etc.)
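As a rough illustration of how a drawing program could read both values back, here is a minimal Python sketch. It assumes each branch is written as an optional label, a colon, a length, and an optional bracketed comment holding the bootstrap value. The function name `branch_annotations`, the regular expression, and the taxon `C` that completes the example tree are all hypothetical; this is not how DRAWTREE or TreeView actually parse trees.

```python
import re

# Optional label, then ":length", then an optional "[comment]".
# This pattern is an assumption made for the sketch, not part of any standard.
BRANCH = re.compile(
    r"([A-Za-z_][\w.]*)?"               # optional taxon label
    r":(\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"  # branch length
    r"(?:\[([^\]]*)\])?"                # optional bracketed comment
)

def branch_annotations(newick):
    """Return (label, length, comment) for every branch that has a length."""
    rows = []
    for match in BRANCH.finditer(newick):
        label, length, comment = match.groups()
        rows.append((label or "", float(length), comment))
    return rows

# The taxon C and its length are made up to close the truncated example.
tree = "((A:0.1345[100],B:0.12[100]):0.0345[76],C:0.2);"
for row in branch_annotations(tree):
    print(row)
```

Internal branches come back with an empty label, and a branch without a bracketed comment yields `None` in the third position, so a program that ignores comments loses nothing.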

>   Another possibility would be that the tree drawing programs allow for the input of 2 treefiles
>   at the same time to generate 1 picture of a tree. (Probably this is even a better solution
>   than allowing for more than one : extension)

That sounds too complicated.

>b) As another extension of the NH scheme I'd like to propose that treefiles should *always*
>   look like this:
>     1
>   (LungfishAu:0.1246,(LungfishSA:0.1338,LungfishAf:0.1324):0.0769,(
>   (((Platypus:0.1441,Opossum:0.1302):0.0380,((Mouse:0.0566,Rat:0.0717)
>   :0.1017,(((Cow:0.0803,Whale:0.0935):0.0334,Seal:0.0848):0.0267,Human:0.1514)
>   :0.0355):0.0495):0.0865,((((Crocodile:0.1948,Bird:0.1546):0.0560,Sphenodon:0.2029)
>   :0.0239,Lizard:0.2302):0.0362,Turtle:0.1508):0.0581):0.0842,Frog:0.1664)
>   :0.0619);
>   This means that in the first line there should always be a number indicating the number
>   of trees following. This would have the advantage that input trees and output trees have
>   the same format (DNAML 4.0!) and you could just rename the corresponding files. It would
>   also create a compatibility between PHYLIP tree files and MOLPHY topology files.

The problems with this are twofold.  Sometimes a program does not know in
advance how many trees it is going to write out.  Your proposal would then
require it to reopen the file and overwrite the first line (which it
might have written as a blank line in anticipation).  This is a nuisance.
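To make the nuisance concrete, here is a minimal Python sketch of that workaround. It reserves a fixed-width blank first line, writes the trees as they are produced, then seeks back and fills in the count. The name `write_trees_with_count` and the width of 10 characters are assumptions chosen for illustration, and the approach only works on a seekable stream, not on a pipe.

```python
import io

COUNT_WIDTH = 10  # hypothetical fixed width reserved for the tree count

def write_trees_with_count(stream, trees):
    """Write a placeholder line, then the trees, then patch in the count."""
    stream.write(" " * COUNT_WIDTH + "\n")   # blank line written in anticipation
    count = 0
    for tree in trees:                       # the total is unknown until the end
        stream.write(tree + "\n")
        count += 1
    stream.seek(0)                           # go back to the first line
    stream.write(str(count).rjust(COUNT_WIDTH))
    stream.seek(0, io.SEEK_END)              # restore the position at the end
    return count

buffer = io.StringIO()
write_trees_with_count(buffer, iter(["(A,B);", "(A,(B,C));"]))
print(buffer.getvalue())
```

The placeholder has to be wide enough for the largest count the program could ever write, which is one more thing each program would have to get right.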

The Newick's Standard (the original authors seem to have voted for that name
instead of New Hampshire) does not really specify the whole file format.
So your suggestion is outside the standard but not incompatible with it.
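Because the count line sits outside the tree description itself, a reading program could accept files with or without it. The following Python sketch shows one way to do that; the function name `read_trees` is hypothetical, and the sketch assumes that no semicolon appears inside a bracketed comment or a quoted label.

```python
def read_trees(text):
    """Return (declared_count_or_None, list_of_tree_strings)."""
    lines = text.strip().splitlines()
    declared = None
    # Treat a first line made only of digits as the optional tree count.
    if lines and lines[0].strip().isdigit():
        declared = int(lines[0].strip())
        lines = lines[1:]
    # Trees may be wrapped over several lines, so join before splitting.
    body = "".join(line.strip() for line in lines)
    trees = [piece + ";" for piece in body.split(";") if piece]
    return declared, trees

with_count = "  1\n(A:0.1,\n(B:0.2,C:0.3):0.05);\n"
without_count = "(A,B);(C,D);"
print(read_trees(with_count))
print(read_trees(without_count))
```

A reader written this way can compare the declared count with the number of trees actually found, or simply ignore it.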

The other problem is that in some formats, such as the
Maddison-Swofford-Maddison NEXUS format, the file has a much more complex
structure that is quite different from this.

Joe Felsenstein         joe at genetics.washington.edu     (IP No.
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA
