Kent Holsinger writes (in response to my query about whether one
should limit a transition/transversion estimate from a maximum
likelihood tree to the most closely related sequences):
Kent> Joe can answer this more authoritatively than I, but in principle
Kent> there should be no need to restrict the method to a subset of the most
Kent> closely related sequences. DNAML includes a model of the
Kent> substitutional process that will correct for multiple substitutions.
[good stuff deleted about how to test for different Ts/Tv in different parts
of the tree]
Let me make the question more specific.
I understand that DNAML chooses a branch length (in effect adding
multiple substitutions) until it finds a length that gives the
greatest likelihood of observing the original distribution of
differences (assuming the Ts/Tv ratio you specified). If you run it with
different Ts/Tv values, you get a curve with its peak at the most likely
Ts/Tv and a range of uncertainty defined by where the curve drops below
some acceptable likelihood ratio. If transitions are saturated in the
data to start with, then the curve will never drop below threshold on
the high Ts side. Instead, the program will just make the branch longer
by adding multiple transitions and then report that a higher Ts/Tv is
reasonably accommodated by the data. In effect, you transfer
uncertainty about the length of branches into uncertainty about Ts/Tv. So
far, this accurately reflects what you can actually tell from the data.
Now my question is, what happens if you merge divergent sequences
with saturated transitions into data with more closely related sequences?
Will the likelihood curve for Ts/Tv broaden over what it would have been
with the closely related sequences alone? That is, will you convert
uncertainty about the divergent branch lengths into uncertainty about
the global Ts/Tv?
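The likelihood curve described above can be sketched numerically for
isolated pairs of sequences. This is only a toy illustration, not DNAML
itself. It assumes the Kimura two-parameter model as a simpler stand-in
for DNAML's substitution model, with kappa defined as the
transition/transversion rate ratio (not necessarily the same
parameterisation as DNAML's Ts/Tv option). The site counts are made up,
the 1.92 log-likelihood cutoff is one arbitrary choice of "acceptable
likelihood ratio", and each pair gets its own independent branch length,
so the sketch says nothing about branches shared within a real tree.

```python
import math

def k2p_loglik(n_same, n_ts, n_tv, d, kappa):
    """Log-likelihood (up to a constant) of pairwise site counts under K2P.
    d = expected substitutions per site, kappa = transition/transversion rate ratio."""
    bt = d / (kappa + 2.0)              # beta * t (each transversion type)
    at = kappa * bt                     # alpha * t (transitions)
    e1 = math.exp(-4.0 * bt)
    e2 = math.exp(-2.0 * (at + bt))
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2  # site differs by a transition
    p_tv = 0.5 - 0.5 * e1               # site differs by a transversion
    p_same = 1.0 - p_ts - p_tv
    return (n_same * math.log(p_same) + n_ts * math.log(p_ts)
            + n_tv * math.log(p_tv))

def profile_loglik(counts, kappa):
    """Best log-likelihood over the distance d for a fixed kappa."""
    n_same, n_ts, n_tv = counts
    f = lambda logd: k2p_loglik(n_same, n_ts, n_tv, math.exp(logd), kappa)
    # Coarse log-spaced grid over d, from 1e-4 up to roughly 1e4.
    grid = [math.log(1e-4) + 0.2 * i for i in range(93)]
    best = max(range(len(grid)), key=lambda i: f(grid[i]))
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, len(grid) - 1)]
    # Golden-section refinement inside the bracket around the best grid point.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(40):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return f(0.5 * (lo + hi))

def support_interval(kappas, profile, drop=1.92):
    """Range of kappa whose profile log-likelihood is within `drop` of the peak."""
    top = max(profile)
    inside = [k for k, ll in zip(kappas, profile) if ll >= top - drop]
    return min(inside), max(inside)

kappas = [float(k) for k in range(1, 201)]

# Hypothetical counts per 1000 sites: (identical, transitions, transversions).
close = (950, 40, 10)         # closely related pair
saturated = (348, 336, 316)   # divergent pair, transitions near saturation

prof_close = [profile_loglik(close, k) for k in kappas]
prof_sat = [profile_loglik(saturated, k) for k in kappas]
# With independent branch lengths the joint profile is just the sum.
prof_both = [a + b for a, b in zip(prof_close, prof_sat)]

for name, prof in [("close", prof_close), ("saturated", prof_sat),
                   ("both", prof_both)]:
    lo_k, hi_k = support_interval(kappas, prof)
    print(name, "kappa within 1.92 log units of the peak:", lo_k, "to", hi_k)
```

An upper limit equal to the top of the kappa grid means the curve never
dropped below the cutoff on the high side within the range searched.
Comparing the "close" and "both" lines shows how the interval changes
under these pairwise assumptions only; whether the same holds on a full
tree is the question being asked.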
Steve Hardies, Dept. Biochem., Univ. Texas HSC at San Antonio
Hardies at thorin.uthscsa.edu