ML scores

James McInerney james.o.mcinerney at may.ie
Wed Aug 30 06:41:54 EST 2000


Nicole,

Tough one this.  You are effectively asking the question "Is my data
better than random?".  Absolute maximum parsimony scores are pretty
meaningless (is 1,000 steps good or bad?  Can't really say), and in the
same way it is not possible to say that an ML score of 12000 is bad or
good.  However, you can compare these scores to the null hypothesis of
impossibly good scores or impossibly bad scores.  There are many maximum
parsimony indices that theoretically could also be applied to ML.  For
instance there are consistency indices (also pretty useless), which
compare the score for each character with the minimum possible score for
that character.  Unfortunately, it is again difficult to say what is a
good or bad CI value, and indeed this value can be correlated with the
number of taxa in the dataset.  There is also the
Retention Index (RI), which is a little more independent of the number of
taxa in the dataset, but is also problematic in the sense that it is not
possible to say what is a 'good' or 'bad' RI value.  Analogous indices
could be calculated using ML instead of MP.
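
For reference, here is a minimal sketch of the two parsimony indices
mentioned above, using the usual textbook definitions (CI = m/s and
RI = (g - s)/(g - m), where m, s and g are the minimum possible, observed
and maximum possible numbers of steps).  The function and parameter names
are my own and only illustrative; the sketch covers the parsimony versions
only and does not attempt an ML analogue.

```python
def consistency_index(min_steps, observed_steps):
    """CI = m / s: minimum possible steps over observed steps on the tree."""
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    return min_steps / observed_steps


def retention_index(min_steps, observed_steps, max_steps):
    """RI = (g - s) / (g - m).

    Returns None when g == m (the character cannot vary in fit, so the
    index is undefined for it).
    """
    if max_steps == min_steps:
        return None
    return (max_steps - observed_steps) / (max_steps - min_steps)
```

A character that needs at least 1 step, takes 2 on the tree and could take
at most 3 gets CI = 0.5 and RI = 0.5, which illustrates the point above:
the numbers are easy to compute but hard to label as 'good' or 'bad'.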

Possibly the best way of assessing whether or not you have a 'good' or
'bad' ML value for a dataset of any particular size is to use a PTP
(permutation tail probability) test.  In this test, you randomise within
characters, i.e. shuffle the states of each character among the taxa, and
then work out the ML score for the new dataset.  Repeat this many times
(say, 100) and compare the original ML value to the values of the
randomised datasets. 
If your original ML value is 'significantly' (a word I'm using
advisedly) better than any of the randomised values, then your ML value
is probably good for that original datamatrix.  The problem with this
approach might be that if there is a little phylogenetic structure in
part of the dataset, then the original datamatrix might pass the test,
even though a lot of the rest of the data is not good (random).
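
The procedure just described could be sketched roughly as follows.  This
is only an outline under stated assumptions: the alignment is a list of
equal-length strings (one per taxon), and score_fn is a hypothetical
stand-in for whatever program you use to find the best tree and return its
log likelihood (higher = better; if you work with -ln L values, negate
them first).  The (count + 1) / (replicates + 1) tail probability is one
common convention for permutation tests, not something specific to any
package.

```python
import random


def permute_within_characters(alignment, rng):
    """Shuffle each column (character) independently among the taxa.

    The state frequencies of every character are preserved, but any
    covariation between characters is destroyed.
    """
    n_taxa = len(alignment)
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    # Reassemble one sequence per taxon from the shuffled columns.
    return ["".join(col[i] for col in columns) for i in range(n_taxa)]


def ptp_test(alignment, score_fn, n_replicates=100, seed=0):
    """Compare the observed score with scores of permuted datasets.

    score_fn is an assumed callable: alignment -> score, higher is better.
    Returns (observed score, list of null scores, tail probability).
    """
    rng = random.Random(seed)
    observed = score_fn(alignment)
    null_scores = [
        score_fn(permute_within_characters(alignment, rng))
        for _ in range(n_replicates)
    ]
    # Count permuted datasets that score at least as well as the original.
    as_good = sum(1 for s in null_scores if s >= observed)
    ptp = (as_good + 1) / (n_replicates + 1)
    return observed, null_scores, ptp
```

With 100 replicates the smallest tail probability you can get is 1/101,
reached when none of the randomised datasets scores as well as the
original.  Bear in mind that each call to score_fn is a full ML search, so
the test costs roughly n_replicates times your original analysis.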

Sorry about the length of the reply, but maybe you will find something
here.  There is some information about these approaches on our molecular
systematics teaching website:

http://www.dbbm.fiocruz.br/james

or soon at:

http://www.bioinf.org/


Look among the lecture notes, or download the powerpoint presentation.

Kindest regards,

James.


Nicole Dubilier wrote:
> 
> Hi!
> 
> I'm a bit worried about bringing down the wrath of the entire ML
> community on me for asking such a stupid question, but here goes:
> 
> what is the absolute worth of a log likelihood value for a final tree? I
> understand that for a given data set I can try to improve ln log by
> comparing runs with different e.g. transition/transversion ratios but
> once I have found the best ln log value for my data set, how do I know
> if this is a good value? For example, with my data sets I have -ln log
> values around 12000 and this seems "bad" to me because in published
> trees I have looked at -ln log values are usually around 2000 to 4000.
> 
> thanks very much, Nicole
> 
> Dr. Nicole Dubilier
> Dept. of Molecular Ecology
> Max-Planck Institute for Marine Microbiology
> Celsiusstr. 1, D-28359 Bremen, Germany
> Tel.: +49 421 2028-932, Fax: +49 421 2028-580
> ndubilie at mpi-bremen.de
> 
> ---

-- 
Dr. James O. McInerney,         Phone +353 1 708 3860  
Dept. Biology,                  Fax   +353 1 708 3845  
Natl. Univ. Ireland,            Email james.o.mcinerney at may.ie          
Maynooth, Co. Kildare, Ireland
http://www.may.ie/academic/biology/jmbioinformatics.shtml
http://www.bioinf.org/


---

More information about the Mol-evol mailing list