In article <8ojk88$fha$1 at mercury.hgmp.mrc.ac.uk>,
Guy Hoelzer <hoelzer at unr.edu> wrote:
>In article <8ojcvc$73u$1 at mercury.hgmp.mrc.ac.uk>, Mary K. Kuhner
><mkkuhner at kingman.genetics.washington.edu> wrote:
>> The log likelihood is the log of the probability of your data, given the tree.
>Is this the same as the probability of the tree, given your data? It
>seems to me that these sorts of likelihood statements are not necessarily
>reversible, but I am not sure in this context.
It's not the same. If we had some way to find out the probability
of the tree, given the data, we would be in a much more convenient
position: we could not only say which tree was best, but how
probable each tree was. (I.e. "My tree inference has a 95% chance of
being correct.")
Think of "the probability of the data given the tree" as "how frequently would
I get the same data if I simulated data at random down this tree?" (No
wonder the numbers are so small! For any realistic amount of data, you
would have to simulate for a long time to get exactly the same data back.)
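
To make the "simulate down the tree" picture concrete, here is a minimal sketch. It does not use a real phylogenetic model: the "tree" is reduced to a single hypothetical parameter p_diff, the chance that a site differs between two sequences, and every name in it is my own invention for illustration.

```python
import math
import random

def likelihood(data, p_diff):
    """P(data | tree) for the toy model: a product over independent sites."""
    return math.prod(p_diff if differs else 1.0 - p_diff for differs in data)

def simulated_frequency(data, p_diff, n_reps, rng):
    """Fraction of simulated data sets that reproduce `data` exactly."""
    hits = 0
    for _ in range(n_reps):
        sim = tuple(rng.random() < p_diff for _ in data)
        hits += (sim == data)
    return hits / n_reps

# Hypothetical observed data: True means the site differs between the sequences.
observed = (True, False, False, True, False)
p_diff = 0.3  # assumed stand-in for a branch length

exact = likelihood(observed, p_diff)
approx = simulated_frequency(observed, p_diff, 100_000, random.Random(1))
print(exact, approx)
```

Even with five sites the exact value is only about 0.031, and each extra site multiplies it by a number below one, which is why likelihoods for realistic data sets are reported as logs.
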
"The probability of the tree given the data" would be "how frequently would
I find this tree if I examined all evolutionary scenarios leading to these data?"
It can only be calculated if you are willing to assume something about
the prior distribution of trees and use a Bayesian approach. Likelihood fans
tend to reject the idea that we can know what the prior on trees is.
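
Written out, the relationship between the two quantities is Bayes' rule. The notation below is mine, and P(tree) is the prior that you would have to be willing to supply:

```latex
P(\text{tree} \mid \text{data})
  = \frac{P(\text{data} \mid \text{tree}) \, P(\text{tree})}
         {\sum_{\text{tree}'} P(\text{data} \mid \text{tree}') \, P(\text{tree}')}
```

Without a value for P(tree), the right-hand side cannot be evaluated, which is the sticking point for likelihood fans.
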
One way to clarify this idea is to ask "What would I have to add up, in order
to get the total probability to equal one?"
For likelihoods [P(data|tree)] you would have to add up the probabilities of
all possible data sets of that size on the given tree.
For the elusive P(tree|data) you would have to add up the probabilities of
all possible trees that could explain the given data.
These are not, except by vanishingly rare accident, ever going to be the
same.
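
The two sums can be checked on a toy case. This is only a sketch under loud assumptions: three hypothetical "trees", each reduced to one made-up parameter p_diff, three sites, and a flat prior chosen purely for illustration.

```python
import itertools
import math

# Hypothetical candidate "trees", each summarized by an assumed p_diff.
TREES = {"tree_A": 0.1, "tree_B": 0.3, "tree_C": 0.6}
PRIOR = {name: 1.0 / len(TREES) for name in TREES}  # assumed flat prior
N_SITES = 3

def likelihood(data, p_diff):
    """P(data | tree) for the toy model: a product over independent sites."""
    return math.prod(p_diff if differs else 1.0 - p_diff for differs in data)

# Every possible data set of this size (True = site differs).
all_data_sets = list(itertools.product([False, True], repeat=N_SITES))

# Fixed tree, sum over all data sets: this is what totals one for a likelihood.
sum_over_data = {
    name: sum(likelihood(d, p) for d in all_data_sets)
    for name, p in TREES.items()
}

# Fixed data, sum of likelihoods over trees: no reason for this to be one.
observed = (True, False, False)
lik = {name: likelihood(observed, p) for name, p in TREES.items()}
sum_over_trees = sum(lik.values())

# P(tree | data) needs the prior and a normalizing sum over trees.
evidence = sum(lik[name] * PRIOR[name] for name in TREES)
posterior = {name: lik[name] * PRIOR[name] / evidence for name in TREES}

print(sum_over_data)
print(sum_over_trees)
print(posterior)
```

Each entry of sum_over_data comes out at one, while the likelihoods of the observed data across the three trees add up to 0.324. The posterior sums to one only because it was divided by a total taken over trees, and that step needed the prior.
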
Mary Kuhner mkkuhner at genetics.washington.edu