# ML scores

Mike Syvanen syvanen at ucdavis.edu
Wed Aug 30 13:57:46 EST 2000

```
James McInerney wrote:

> Nicole,
>
> Tough one this.  You are effectively asking the question "Is my data
> better than random?".  Absolute maximum parsimony scores are pretty
> meaningless (is 1,000 steps good or bad? Can't really say), and in the
> same way it is not possible to say that an ML score of 12000 is bad or
> good.  However, you can compare these scores to the null hypothesis of
> impossibly good scores or impossibly bad scores.  There are many maximum
> parsimony indices that theoretically could also be applied to ML.  For
> instance there are consistency indices (pretty useless also), that
> compare the score for each character with the minimum possible score for
> that character.  Unfortunately, again it is difficult to say what is an
> unacceptably good or bad CI value and indeed this value is (can be)
> correlated with the number of taxa in the dataset.  There is also the
> Retention Index (RI), that is a little more independent of the number of
> taxa in the dataset, but is also problematic in the sense that it is not
> possible to say what is a 'good' or 'bad' RI value.  Analogous indices
> could be calculated using ML instead of MP.
>
> Possibly the best way of assessing whether or not you have a 'good' or
> 'bad' ML value for a dataset of any particular size is to use a PTP
> test.  In this test, you randomise within characters and then work out
> the ML score for the new dataset.  Repeat this many times (say, 100) and
> compare the original ML value to the values of the randomised datasets.
> If your original ML value is 'significantly' (a word I'm using
> advisedly) better than any of the randomised values, then your ML value
> is probably good for that original datamatrix.

The problem with this approach is the implicit assumption that the
homoplasy within the data set (and if a tree with no homoplasy
were possible, we wouldn't be discussing this problem) is generated
randomly within some unique tree.  If biologically no unique tree exists,
or if homoplasy within such a unique tree is generated non-randomly,
then this approach will not work either.  A general solution to this problem
is, in my experience, not available.

Mike Syvanen

```
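The PTP procedure described in the quoted message (randomise within characters, rescore, repeat, compare) can be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual implementation: the names `permute_within_characters`, `ptp_test` and `shared_pattern_score` are hypothetical, the scoring function is a toy stand-in rather than a likelihood calculation, and the score is assumed to be "higher is better" (so a -lnL value would need to be negated first). In real use, `score_fn` would wrap a call to whatever ML program produced the original score.

```python
import random


def permute_within_characters(matrix, rng):
    """Shuffle each character (column) independently across taxa (rows)."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    # Rebuild one string per taxon from the shuffled columns.
    return ["".join(col[i] for col in columns) for i in range(len(matrix))]


def ptp_test(matrix, score_fn, n_replicates=100, seed=0):
    """Compare the observed score with scores of permuted data sets.

    Assumes score_fn returns a value where HIGHER means BETTER.
    Returns (observed score, list of null scores, permutation p-value).
    """
    rng = random.Random(seed)
    observed = score_fn(matrix)
    null_scores = [
        score_fn(permute_within_characters(matrix, rng))
        for _ in range(n_replicates)
    ]
    # Count replicates that score at least as well as the real data;
    # the +1 terms count the observed data set as one of the permutations.
    as_good = sum(1 for s in null_scores if s >= observed)
    p_value = (as_good + 1) / (n_replicates + 1)
    return observed, null_scores, p_value


def shared_pattern_score(matrix):
    """Toy stand-in for an ML score (NOT a likelihood): the number of
    character pairs with identical state patterns across taxa."""
    cols = list(zip(*matrix))
    return sum(
        1
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if cols[i] == cols[j]
    )


if __name__ == "__main__":
    # Hypothetical alignment: 6 taxa, 8 perfectly congruent characters.
    alignment = ["AAAAAAAA"] * 3 + ["CCCCCCCC"] * 3
    obs, null, p = ptp_test(alignment, shared_pattern_score)
    print("observed:", obs, "best permuted:", max(null), "p:", round(p, 3))
```

Shuffling within columns keeps each character's state frequencies but destroys any agreement between characters, which is the null model the test relies on. The objection raised in the reply still applies: a small p-value only says the data are more structured than this particular random model, not that the homoplasy arose randomly on a single true tree.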