
# justification for "maximum maximum likelihood"

Andrew J. Roger roger at evol5.mbl.edu
Fri Jul 18 07:51:35 EST 1997

```
Hi,

I was wondering if anyone has an up to date take on the debate between
Felsenstein and Sober regarding how one comes up with nuisance
parameters (such as branch lengths or any parameter in
the model) during maximum likelihood analysis.

The argument went (I think..please correct me if I make mistakes)
as follows:

When you calculate the probability of observing the data
given a tree, you need to have information about
probabilities of change over the branches of the tree.
But these quantities are unknown-- so how should one
get such estimates?  Take just a single parameter for
example.  If you do not know this parameter in advance
what value should you use for it?  Felsenstein suggested
that one can estimate it by finding a value that maximizes
the likelihood of the data given the tree.  However, Sober
suggested that maybe this is not always going to
give one an appropriate value.  Take two trees A and B
and a single parameter that needs to be estimated
to calculate the likelihood of the data under trees A
and B.  If one estimates this parameter under each tree
separately by maximizing the likelihood, one can
then ask which tree confers a higher likelihood on the
data (given the model and the ML parameter estimates).

So say for instance the answer is:

Pr(Data| Tree A, model, P*A) > Pr(Data| Tree B, model, P*B)

where P*A is the maximum likelihood estimate of the parameter P
under tree A and P*B is the ML estimate of the parameter P
under tree B.

You can now say that Tree A is preferred over Tree B...Or can
you?  What if for most of the possible values of parameter
P, Pr(Data| tree A, model, P) < Pr(Data| tree B, model, P),
and only at the maxima of the two likelihood curves does tree A
confer a higher probability on the data?  Wouldn't one want to say
that overall tree B is a better choice than tree A?

What I would like to know, is if anyone has views about
whether this potential problem in calculating likelihood
is actually a real problem for real data (or simulated
data).

I hope that this inspires a little debate-- much lacking
recently on this newsgroup!

Cheers
Andrew J. Roger

```
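
The scenario in the post can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the two likelihood curves `lik_a` and `lik_b` are made-up Gaussian-shaped functions of a single parameter P on [0, 1], not likelihoods computed from any real tree, alignment, or substitution model. Tree A is given a tall, narrow peak and tree B a lower, broad one, and the averaging step assumes a flat weighting over P, which is itself a choice that neither side of the debate is committed to.

```python
import math

# Hypothetical likelihood curves, Pr(Data | tree, model, P), for one
# nuisance parameter P in [0, 1]. The shapes are invented for illustration.
def lik_a(p):
    # Tree A: tall but very narrow peak at P = 0.5
    return 0.9 * math.exp(-((p - 0.5) / 0.02) ** 2)

def lik_b(p):
    # Tree B: lower but broad peak at P = 0.5
    return 0.6 * math.exp(-((p - 0.5) / 0.3) ** 2)

# Evaluate both curves on an evenly spaced grid of P values.
grid = [i / 1000 for i in range(1001)]
a = [lik_a(p) for p in grid]
b = [lik_b(p) for p in grid]

# Comparison 1: each tree at its own best-fitting P (maximized likelihood).
max_a, max_b = max(a), max(b)

# Comparison 2: likelihood averaged over P, assuming a flat weighting.
mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)

# Fraction of grid values of P at which tree B beats tree A.
frac_b_higher = sum(y > x for x, y in zip(a, b)) / len(grid)

print("maximized:", "A" if max_a > max_b else "B", "preferred")
print("averaged: ", "A" if mean_a > mean_b else "B", "preferred")
print("fraction of P where B > A:", round(frac_b_higher, 3))
```

With these invented curves, comparing maxima favours tree A, while tree B confers the higher likelihood at most values of P and on average. Whether real data ever produce curves shaped like this is exactly the open question asked in the post.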