ML scores

Mary K.Kuhner mkkuhner at kingman.genetics.washington.edu
Tue Aug 29 14:41:23 EST 2000

In article <8oguil$pnu$1 at mercury.hgmp.mrc.ac.uk>,
Nicole Dubilier  <ndubilie at postgate.mpi-bremen.de> wrote:

>I'm a bit worried about bringing down the wrath of the entire ML
>community on me for asking such a stupid question, but here goes:

Hey, don't be--if you don't ask, you'll never find out....

>what is the absolute worth of a log likelihood value for a final tree? I
>understand that for a given data set I can try to improve ln log by
>comparing runs with different e.g. transition/transversion ratios but
>once I have found the best ln log value for my data set, how do I know
>if this is a good value? For example, with my data sets I have -ln log
>values around 12000 and this seems "bad" to me because in published
>trees I have looked at -ln log values are usually around 2000 to 4000.

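The likelihood is the probability of your data, given the tree (and the
model of evolution); the log likelihood is its natural logarithm.  The
single biggest influence on its magnitude is the amount of data you
have--the more data you have, the more collectively improbable it
will be.  (Any given roll of one die is pretty likely.  Any specific
sequence of ten thousand die rolls is very unlikely.)  The log likelihood
gets more and more negative as you add more sites or tips.  For sites
this is easy to see: the log likelihood of an alignment is a sum of
per-site terms, each of them negative.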
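
The die analogy can be checked directly.  This is a minimal sketch: the
function name and the roll sequence are made up for illustration.  It
shows that independent events contribute additive log-probability terms,
so the total scales with the number of events.  Under the usual
assumption that sites evolve independently, a tree's log likelihood is
built the same way, as a sum of per-site log likelihoods.

```python
import math

def log_likelihood_fair_die(rolls):
    """Natural-log probability of one specific sequence of fair-die rolls.

    Rolls are independent, so the log probability is a sum of per-roll
    terms, each equal to ln(1/6).  (Hypothetical helper for illustration.)
    """
    return sum(math.log(1 / 6) for _ in rolls)

one_roll = [3]
# One specific, made-up sequence of 10,000 rolls.
ten_thousand_rolls = [1, 2, 3, 4, 5] * 2000

print(log_likelihood_fair_die(one_roll))            # about -1.79
print(log_likelihood_fair_die(ten_thousand_rolls))  # about -17917.6, i.e. 10000 * ln(1/6)
```

Neither sequence is "worse" than the other; the second simply contains
10,000 times as many events.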

The second biggest influence is the amount of variability in your data:
a data set consisting of ten identical sequences requires less explanation
than one consisting of ten diverse sequences.
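
To make the variability point concrete, here is a rough sketch that
assumes the Jukes-Cantor (JC69) model and only two sequences rather than
ten; the function name and site counts are hypothetical, and real ML
programs handle many tips and richer models.  It computes the best
attainable log likelihood (at the ML branch length) for a pair of
sequences, so the difference between the two cases comes from the data,
not from a poorly chosen tree.

```python
import math

def jc69_max_log_likelihood(n_sites, n_diffs):
    """Maximized log likelihood of a two-sequence DNA alignment under JC69.

    Assumes equal base frequencies (1/4) and independent sites.  n_diffs is
    the number of sites at which the two sequences differ.  (Hypothetical
    helper for illustration.)
    """
    p = n_diffs / n_sites
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    # ML branch length (expected substitutions per site) under JC69.
    b = -0.75 * math.log(1 - 4 * p / 3)
    e = math.exp(-4 * b / 3)
    p_same = 0.25 + 0.75 * e    # prob. a site still shows the starting base
    p_change = 0.25 - 0.25 * e  # prob. it shows one specific other base
    n_same = n_sites - n_diffs
    log_l = 0.0
    if n_same:
        log_l += n_same * math.log(0.25 * p_same)
    if n_diffs:
        log_l += n_diffs * math.log(0.25 * p_change)
    return log_l

identical = jc69_max_log_likelihood(1000, 0)  # two identical 1000-site sequences
diverse = jc69_max_log_likelihood(1000, 300)  # same length, 300 differing sites
print(identical, diverse)  # about -1386.3 and -2326.7
```

Both values are the best each data set can do under this model; the
diverse pair scores lower because there is more to explain.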

So you can see that the magnitude of the log likelihood is *not* a
useful guide to how good your tree is.  Log likelihoods can only be
compared among trees (or models) fitted to the same data set, not
between data sets.
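
A sketch of what comparing within one data set looks like in practice,
again assuming JC69 and two sequences, where the "tree" reduces to a
single branch length b.  The data sets, candidate branch lengths, and
function name are invented for illustration.

```python
import math

def jc69_log_likelihood(n_sites, n_diffs, b):
    """Log likelihood of a two-sequence alignment under JC69, branch length b > 0.

    Assumes equal base frequencies and independent sites.  (Hypothetical
    helper for illustration.)
    """
    e = math.exp(-4 * b / 3)
    p_same = 0.25 + 0.75 * e
    p_change = 0.25 - 0.25 * e
    n_same = n_sites - n_diffs
    return n_same * math.log(0.25 * p_same) + n_diffs * math.log(0.25 * p_change)

# Two hypothetical data sets with the same proportion (10%) of differing sites.
datasets = {"short": (500, 50), "long": (5000, 500)}
candidates = [0.1, 0.3]  # two candidate "trees" (branch lengths)

for name, (n, k) in datasets.items():
    scores = {b: jc69_log_likelihood(n, k, b) for b in candidates}
    best = max(scores, key=scores.get)
    print(name, scores, "best b =", best)  # b = 0.1 wins in both data sets
```

Within each data set the comparison picks the same candidate, yet both
candidates score far lower on the long data set than either does on the
short one.  That gap reflects the amount of data, not the quality of
the trees.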

I'd say that your -12000 figure suggests that you have longer 
sequences than other people do, or a gene with a higher evolutionary
rate, or more tips.  This is not a bad thing!

Mary Kuhner mkkuhner at genetics.washington.edu


More information about the Mol-evol mailing list
