> Not exactly. Actually I am using this option of puzzle currently to assign
> the site rates. However, what puzzle generates is actually a "most
> probable" assignment. Somehow it goes through and asks which rate
> contributes the most to the likelihood at that particular site and spits
it out. This is obviously different from estimating the rate at a site as
> an actual parameter of the model (and using that assignment to calculate
> the likelihood). I want the latter. Gary Olsen implemented this in
> DNArates for DNA, but hasn't done it for proteins yet. In most cases it
> will be useless for tree inference because the number of parameters
> estimated = number of sites. But I think if you are only after the site
> rates (under an assumed tree), then it should be more accurate than the
> puzzle "most probable rate category" assignments.
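To make the distinction concrete, here is a toy sketch of the two approaches. Everything in it is a hypothetical simplification: substitution counts at a site are modeled as Poisson with mean rate * tree length, standing in for the real pruning likelihood on a tree. The contrast is between picking the discrete rate category that contributes most to the site likelihood (the puzzle-style assignment) and maximizing the likelihood over the rate as a free parameter (the DNArates-style estimate, which for a Poisson model has a closed form).

```python
import math

# Toy stand-in for the per-site likelihood: substitutions at a site are
# Poisson with mean rate * tree_len.  (Hypothetical simplification; the
# real programs evaluate the full likelihood on a tree.)
def site_lik(count, rate, tree_len=10.0):
    mean = rate * tree_len
    return math.exp(-mean) * mean**count / math.factorial(count)

categories = [0.1, 0.5, 1.0, 2.0]   # discrete rates, equal weights assumed
counts = [0, 3, 8, 25]              # hypothetical observed changes at four sites

for c in counts:
    # puzzle-style: which category contributes most to this site's likelihood?
    best_cat = max(categories, key=lambda r: site_lik(c, r))
    # DNArates-style: treat the rate as a free parameter; for a Poisson
    # model the ML rate is simply count / tree_len
    ml_rate = c / 10.0
    print(f"count={c:2d}  most-probable category={best_cat}  ML rate={ml_rate}")
```

The point the sketch makes is that the category assignment is constrained to the preset grid of rates, while the ML estimate can fall anywhere (e.g. a site with 3 changes gets category 0.5 but ML rate 0.3).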
Not to be too pedantic about this, but Olsen's DNArates program actually estimates the rate of change for each unique data pattern rather than for each position in the
alignment. Consequently, the number of estimated parameters (i.e., the number of patterns) will usually be far smaller than the number of positions for sequence data of
any useful length, unless the data are extraordinarily noisy. The program then partitions the rates into a user-defined number of categories (up to 35), assigning each
category a rate that is some average of the rates of the patterns within it.
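The pattern-collapsing step is easy to illustrate. This sketch (with a hypothetical toy alignment) just counts how many distinct columns a set of positions reduces to; each unique column would get one rate estimate, however many positions share it:

```python
from collections import Counter

# Hypothetical toy alignment: rows are sequences, columns are positions
alignment = ["ACGTACGA",
             "ACGTACGT",
             "ACCTACGA"]

# A "pattern" is an entire column; identical columns share one rate estimate
columns = ["".join(row[i] for row in alignment)
           for i in range(len(alignment[0]))]
pattern_counts = Counter(columns)

print(f"{len(columns)} positions collapse to {len(pattern_counts)} unique patterns")
for pattern, n in pattern_counts.items():
    print(f"pattern {pattern!r} occurs {n}x -> one shared rate estimate")
```

Here 8 positions collapse to 6 patterns; on real data of any length the reduction is typically far larger, which is why the parameter count stays manageable.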
It's not my intention to instigate a wave of responses on the dangers of over-parameterization, but when it comes to ML tree inference with correction for site-to-site
variability in evolutionary rate, it's not clear to me how this method is so very different from, for example, the user-input rate categories that can be used with
DNAML in PHYLIP.
Sean Turner, Ph.D.
National Library of Medicine
NIH, Building 38A, Room 8N-805
8600 Rockville Pike
Bethesda, MD 20894, USA
phone: 301-435-8943
fax: 301-480-2918
e-mail: turner at ncbi.nlm.nih.gov