frank at sass.sari.ac.uk (Frank Wright) writes:
>A colleague is interested in testing whether the synonymous
>and non-synonymous distances are statistically different for a
>protein coding DNA sequence. The data available consists of
>seven sequences known to have the following tree topology:
>
>                                / v
> a \                           /\ w
>    >------------------------</   x
> b /                           \/ y
>                                \ z
>
>Rather than use the formula for the variance of the distance
>estimate (Jukes-Cantor method) calculated on the mean distance
>(average of the 10 pairwise distances between <a,b> and
><v,w,x,y,z>),
>I presume that a more accurate test would be to do a t-test on Ka-Ks
>for the 10 pairwise distances from <a,b> to <v,w,x,y,z>. This would
>be tested against an H0 that Ka-Ks = 0. I'm assuming that the
>distances within the 2 clusters are negligible compared to the
>branch length connecting them, and the 10 pairwise distances
>give a more reliable estimate of the variance of the distance
>estimate than does the theoretical formula for the variance of
>the Jukes-Cantor distance.
>
>Is this a reasonable approach?

No, because all "long" distances in the tree shown above are highly
correlated. Actually, if all differences inside each cluster are
negligible, the ten distances add little information beyond a single
distance.

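To make the correlation argument concrete, here is a minimal sketch
under a simplifying assumption that is mine, not taken from the papers
below: the n distances all have the same variance sigma2 and the same
pairwise correlation rho. The function names are hypothetical. It
compares the true variance of the mean with the value a t-test would
implicitly use.

```python
def variance_of_mean(sigma2, n, rho):
    """True variance of the mean of n equally correlated estimates."""
    return sigma2 * (1.0 + (n - 1) * rho) / n


def naive_variance_of_mean(sigma2, n, rho):
    """Variance of the mean a t-test would assume.

    The expected sample variance of equally correlated values is
    sigma2 * (1 - rho); the t-test divides that by n.
    """
    return sigma2 * (1.0 - rho) / n


def underestimation_factor(n, rho):
    """Ratio of the true to the naive variance of the mean (rho < 1)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return (1.0 + (n - 1) * rho) / (1.0 - rho)


# Ten pairwise distances, as in the question (assumed equicorrelated).
for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, variance_of_mean(1.0, 10, rho),
          naive_variance_of_mean(1.0, 10, rho))
```

As rho approaches 1 the true variance of the mean approaches sigma2,
the variance of a single distance, while the naive value shrinks toward
zero, so the t-test would look far more significant than it should.
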
More about the estimation of distance variances can be found in:
Tajima F (1992) Statistical method for estimating the standard errors
of branch lengths in a phylogenetic tree reconstructed without
assuming equal rates of nucleotide substitution among different
lineages. Mol. Biol. Evol. 9:168-181.
Rzhetsky A, Nei M (1992) A simple method for estimating and testing
minimum-evolution trees. Mol. Biol. Evol. 9:945-967.
Bulmer M (1989) Estimating the variability of substitution rates.
Genetics 123:615-619.

and about the estimation of Ka and Ks in:
Li W-H (1993) Unbiased estimation of the rates of synonymous and
nonsynonymous substitution. J. Mol. Evol. 36:96-99.
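
For reference, the theoretical variance mentioned in the question can
be sketched as follows. This assumes the plain Jukes-Cantor model
applied to a proportion p of differing sites out of n_sites compared,
with the usual large-sample variance formula; it is not the Ka/Ks
method of Li (1993), and the function names are hypothetical.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_variance(p, n_sites):
    """Large-sample variance of the Jukes-Cantor distance estimate."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)


# Example with made-up numbers: 30% differing sites over 300 sites.
d = jc_distance(0.3)
se = math.sqrt(jc_variance(0.3, 300))
print(d, se)
```

This gives the variance of one distance. It says nothing about the
covariance between distances that share the long branch, which is what
the papers above address.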