tree length, ci, ri

Mark Siddall mes at zoo.toronto.edu
Wed May 10 02:38:34 EST 1995

In article <3okk7g$ab7 at mserv1.dl.ac.uk> "Essop, FM, Dr" <MFESSOP at chempath.uct.ac.za> writes:
>I have done some restriction mapping and constructed cladograms using 
>Hennig86.  I am not sure what the following values indicate: 

Oh my! _quae nocent docent_
I see we have much work to do here.  
My first comment is that your non-understanding suggests that you are one
who has simply stuck things into Hennig86 without sound knowledge of
the foundations and philosophies of phylogeny reconstruction.
A little knowledge is dangerous and I would highly recommend that you
spend a lot of time becoming learned.
That said I will answer your queries albeit _in extenso_...

>i) consistency index - what for eg. does a ci value of 54 compared to 
>45 mean ?

Both the CI and RI relate to number of steps (tree length).
The global CI is simply the ratio of the minimum possible number of 
steps for a data set of that size to the number of steps actually
required on the tree.
Thus, if there are 20 binary characters the minimum possible number of steps
is 20.  That is, each character has to change at least once.
In most cases, however, there is no tree that will allow this absolute
minimum.  That is, some characters will have to change more than once by
way of convergence or reversals (we call this "homoplasy").
If with this same data set your shortest tree was 40 steps then the 
CI would be 0.50.  That is, on average, characters change twice.
A CI of 54 indicates less homoplasy than a CI of 45.  
Pari ratione, a CI of 54 indicates less "noise" than a CI of 45.
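
To make the arithmetic concrete, here is a minimal sketch of the CI
calculation described above.  The function name is my own and nothing here
is taken from Hennig86 itself; it also assumes that values such as 54 and
45 are the ratio expressed as a percentage.

```python
def consistency_index(min_steps, tree_length):
    """Ensemble CI: minimum conceivable steps divided by steps on the tree."""
    if min_steps <= 0 or tree_length < min_steps:
        raise ValueError("need 0 < min_steps <= tree_length")
    return min_steps / tree_length

# 20 binary characters need at least 20 steps; the shortest tree has 40.
ci = consistency_index(20, 40)   # 0.5, i.e. each character changes twice on average
ci_percent = round(100 * ci)     # 50 when written as a percentage
```

A tree on which every character changed exactly once would give a CI of
1.0; more homoplasy pushes the value down toward zero.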

>ii) retention index - same question as above

The CI was criticized because one could artificially inflate its value
by including uninformative characters.  Any character that changes only
once and for only one taxon is not informative of group relationships and
will be compatible with any tree of any shape.  We call these characters 
"autapomorphies".  So imagine a data set with 10 binary characters and
suppose all of these are informative, and suppose the shortest tree is
20 steps long.  The CI = 10/20 = 0.50.  Add 10 uninformative characters
and you will get THE SAME tree, now 30 steps long, but the CI will be
20/30 = 0.67 which looks better than 0.50.
So, the RI instead asks how much of the possible homoplasy is actually
required by the tree.  It is one minus the ratio of the number of extra
steps required to the maximum number of extra steps possible.  It is
sensitive both to the number of informative characters and the number of
taxa and is not affected by autapomorphies.
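
The autapomorphy example can be checked with a short sketch.  It uses the
usual formulas CI = M/S and RI = (G - S)/(G - M), where M is the minimum
possible number of steps, S is the number of steps on the tree, and G is
the maximum number of steps the data could require on any tree.  The value
G = 50 for the ten informative characters is invented purely for
illustration; an autapomorphy has M = S = G = 1.

```python
def ensemble_ci(m, s):
    """CI = minimum possible steps / steps on the tree."""
    return m / s

def ensemble_ri(m, s, g):
    """RI = (max steps - observed steps) / (max steps - min steps)."""
    return (g - s) / (g - m)

# Ten informative binary characters, shortest tree 20 steps.
# g = 50 is a hypothetical value chosen for the example.
m, s, g = 10, 20, 50
ci_before = ensemble_ci(m, s)       # 0.5
ri_before = ensemble_ri(m, s, g)    # 0.75

# Add ten autapomorphies: each adds exactly 1 to m, s and g.
n_aut = 10
ci_after = ensemble_ci(m + n_aut, s + n_aut)             # 20/30, about 0.67
ri_after = ensemble_ri(m + n_aut, s + n_aut, g + n_aut)  # still 0.75
```

The added characters raise the numerator and denominator of the CI by the
same amount, which pulls it toward 1, whereas they cancel out of both
differences in the RI.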

>iii) tree length - what for eg. does a tree length of 307 compared to 
>288 mean ?  How does one "decide" what the best tree is - based on 
>tree length ?

If you used Hennig86 (properly) you will have got the shortest tree.
This is the most parsimonious tree.
This is the best tree.
The length is simply the number of evolutionary changes that need to be
postulated for the data.
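
For anyone who wants to see what "counting steps" amounts to, here is a
rough sketch using Fitch's method for unordered characters on a rooted
binary tree.  This is only an illustration: the taxa, the matrix and the
function names are made up, and I make no claim that this is how Hennig86
does its counting internally.

```python
def fitch_steps(tree, states):
    """Return (state set, steps) for one character.

    tree   : a taxon name (str) or a 2-tuple of subtrees
    states : dict mapping taxon name -> character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        # Descendants can share a state: no change needed at this node.
        return common, lsteps + rsteps
    # No shared state: one change must be postulated here.
    return lset | rset, lsteps + rsteps + 1

def tree_length(tree, matrix):
    """Sum the Fitch step counts over every character in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

# Hypothetical data: four taxa, three binary characters.
matrix = {"A": "110", "B": "100", "C": "011", "D": "001"}
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))

length_ab = tree_length(tree_ab, matrix)   # 4 steps
length_ac = tree_length(tree_ac, matrix)   # 5 steps
```

On these made-up data the first tree needs fewer changes, so it is the
more parsimonious of the two.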

I hope this helps.
You should really seek out a course in cladistics or otherwise a 
My impression is that there are a few at the University of Otago.

Alternatively, I would recommend that you write Diana Lipscomb for her
teaching manual for use with Hennig86 (it's not free but it is good).
She is <biodL at gwuvm.gwu.edu> and is also the editor of the journal

Mark E. Siddall                "I don't mind a parasite...
mes at vims.edu                    I object to a cut-rate one" 
Virginia Inst. Marine Sci.                     - Rick
Gloucester Point, VA, 23062

More information about the Mol-evol mailing list
