James McInerney wrote:
> Nicole,
>
> Tough one this. You are effectively asking the question "Is my data
> better than random?". In the same way that maximum parsimony absolute
> scores are pretty meaningless (is 1,000 steps good or bad? Can't really
> say), it is not possible to say that an ML score of 12000 is bad or
> good. However, you can compare these scores to the null hypothesis of
> impossibly good scores or impossibly bad scores. There are many maximum
> parsimony indices that theoretically could also be applied to ML. For
> instance there are consistency indices (pretty useless also), that
> compare the score for each character with the minimum possible score for
> that character. Unfortunately, again it is difficult to say what is an
> unacceptably good or bad CI value and indeed this value is (can be)
> correlated with the number of taxa in the dataset. There is also the
> Retention Index (RI), that is a little more independent of the number of
> taxa in the dataset, but is also problematic in the sense that it is not
> possible to say what is a 'good' or 'bad' RI value. Analogous indices
> could be calculated using ML instead of MP.
>
> Possibly the best way of assessing whether or not you have a 'good' or
> 'bad' ML value for a dataset of any particular size is to use a PTP
> test. In this test, you randomise within characters and then work out
> the ML score for the new dataset. Repeat this many times (say, 100) and
> compare the original ML value to the values of the randomised datasets.
> If your original ML value is 'significantly' (a word I'm using
> advisedly) better than any of the randomised values, then your ML value
> is probably good for that original datamatrix.
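The permutation procedure described in the quoted text above can be sketched roughly as follows. This is only a minimal illustration, not anyone's actual implementation: `score_fn` is a hypothetical stand-in for whatever program computes the ML score of a data matrix (assumed here to be lower-is-better, e.g. -lnL), the function names are made up for the example, and the (k + 1) / (n + 1) p-value is one common convention for permutation tests rather than something stated in the original message.

```python
import random


def permute_within_characters(matrix, rng):
    """Shuffle the states of each character (column) among the taxa (rows).

    matrix: list of equal-length strings, one aligned sequence per taxon.
    Each column keeps its state frequencies but loses any congruence
    with the other columns.
    """
    n_taxa = len(matrix)
    n_sites = len(matrix[0])
    columns = [[row[i] for row in matrix] for i in range(n_sites)]
    for col in columns:
        rng.shuffle(col)  # in-place shuffle of one character
    return ["".join(col[t] for col in columns) for t in range(n_taxa)]


def ptp_test(matrix, score_fn, n_replicates=100, seed=0):
    """Compare the observed score with scores from permuted matrices.

    score_fn is a hypothetical callable returning a score where LOWER is
    better; in practice it would wrap an external ML tree search.
    """
    rng = random.Random(seed)
    observed = score_fn(matrix)
    null_scores = [
        score_fn(permute_within_characters(matrix, rng))
        for _ in range(n_replicates)
    ]
    # Count permuted data sets that score at least as well as the original.
    as_good = sum(1 for s in null_scores if s <= observed)
    p_value = (as_good + 1) / (n_replicates + 1)
    return observed, null_scores, p_value
```

A small p-value here only says the original matrix scores better than matrices whose characters have been decoupled from one another; it does not say the absolute score is 'good' in any other sense.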
The problem with this approach is the implicit assumption that the
homoplasy within the data set (and if a tree with no homoplasy
were possible, we wouldn't be discussing this problem) is generated
randomly within some unique tree. If biologically no unique tree exists,
or if homoplasy within such a unique tree is generated non-randomly,
then this approach will not work either. A general solution to this problem
is, in my experience, not available.
Mike Syvanen
---