ML analysis on PAUP

John Huelsenbeck johnh at brahms.biology.rochester.edu
Thu Feb 3 21:23:36 EST 2000


In article <86qkp2$4ae$1 at mercury.hgmp.mrc.ac.uk>, Brice Quenoville
<quenovib at si.edu> wrote:

I'll try to answer the other two questions.
>Hi,
>
>I have 4 questions regarding ML analysis. Thanks for any insights.
>
>1/ I am analyzing nuclear sequence data in which some sites are
>coded as ambiguities (following the IUB code). I am wondering exactly how
>the latest PAUP version treats such positions during an ML search. Although
>heterozygosity may not be informative at the taxonomic level I'm working at,
>I still want to include these positions because some of them are
>heterozygous in only one sequence and parsimony informative in others.

Mary's explanation is correct.

>3/ I did an ML run based on a "branch and bound" search and it took 6
>hours 45 min, leading to a unique best tree. I then did an exhaustive
>search with the same settings and it took 45 min (I have 7 taxa), leading
>to a different tree with a slightly higher -ln L value. Some people here
>told me that "branch and bound" is good for the parsimony criterion but may
>get lost in an ML search because it is dealing with probabilities and not
>numbers of steps. Is it true that the branch and bound method is not
>recommended for ML searches?

This worries me. You might consider reporting it to Swofford as a possible
bug. Branch and bound is guaranteed to find the best tree only if adding a
taxon can never improve the score. I wonder whether it is true that adding a
taxon guarantees that the likelihood must decrease? It seems that it should,
and that B-and-B should therefore work for ML. However, I've never actually
heard of anyone trying it for ML.

>4/ Finally, what exactly is the unconstrained -ln L value? Is it a value
>calculated on a star tree, or on the data patterns without reference to any
>topology? Then, if one has sequence data for the same gene in two
>different groups of taxa, is it meaningful to use the unconstrained -ln L
>value, or the ratio between that value and the best-tree value, to compare
>the groups (whether or not they have the same number of taxa, the same
>number of branches and the same "type" of topology)? Or is it meaningless,
>or only as informative as comparing the % of informative sites between
>these groups?

The unconstrained likelihood is the likelihood calculated under a multinomial
model, so it makes no reference to any topology, star tree or otherwise.
Likelihood sees the data as counts of different site patterns. For four
species, for example, there are 4^4 = 256 site patterns that could be
observed. In a real data set of 4 species, some of these site patterns
will not be observed (have counts of 0) and others will be observed one
or more times. These counts follow a multinomial distribution. You can
consider the unconstrained likelihood as the best likelihood that you could
achieve for your data; under the multinomial model, no evolutionary
assumptions have been made, though the different sites are still considered
to be independent. The difference between the unconstrained likelihood and
the likelihood you calculate under your favorite evolutionary model is the
cost associated with assuming a model of evolution. Goldman (1993; JME)
showed how this difference in log likelihoods could be used to formulate a
test of model adequacy. It involves simulating many data sets under your
evolutionary model to obtain the distribution of the difference expected
if the model is correct.
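
To make the multinomial calculation concrete, here is a minimal Python
sketch. It assumes the maximized multinomial log likelihood is the sum over
observed patterns of n_i * ln(n_i / N), where n_i is the count of pattern i
and N is the number of sites, and it leaves out the constant multinomial
coefficient; whether a particular program includes that constant is
something to check. The function name and the toy alignment are made up for
illustration.

```python
from collections import Counter
from math import log


def unconstrained_lnL(alignment):
    """Multinomial (unconstrained) log likelihood of an alignment.

    alignment: list of equal-length sequences, one string per taxon.
    Assumption: the constant multinomial coefficient is omitted.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()

    # A site pattern is one column of the alignment (one state per taxon).
    counts = Counter(zip(*alignment))

    # Unobserved patterns have count 0 and contribute nothing to the sum.
    return sum(n * log(n / n_sites) for n in counts.values())


# Hypothetical 4-taxon, 4-site alignment: pattern AAAA occurs twice,
# CCCC once and GTGG once.
toy = ["AACG", "AACT", "AACG", "AACG"]
print(unconstrained_lnL(toy))
```

Subtracting the ln L obtained under a substitution model for the same data
from this value gives the difference that the test described above is built
on.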


John Huelsenbeck






