A colleague is interested in testing whether the synonymous
and non-synonymous distances are statistically different for a
protein coding DNA sequence. The data available consists of
seven sequences known to have the following tree topology:
                               / v
a \                          /\  w
   >------------------------<    x
b /                          \/  y
                               \ z
Rather than use the theoretical formula for the variance of the
distance estimate (Jukes-Cantor method), calculated on the mean
distance (the average of the 10 pairwise distances between the
<a,b> and <v,w,x,y,z> clusters), I presume that a more accurate
test would be a t-test on Ka-Ks for the 10 pairwise distances
from <a,b> to <v,w,x,y,z>, tested against an H0 that
Ka-Ks = 0. I'm assuming that the
distances within the 2 clusters are negligible compared to the
branch length connecting them, and the 10 pairwise distances
give a more reliable estimate of the variance of the distance
estimate than does the theoretical formula for the variance of
the Jukes-Cantor distance.
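
To make the two options concrete, here is a minimal Python sketch of
both: the usual large-sample variance of a Jukes-Cantor distance, and
the t statistic for the 10 Ka-Ks differences. The Ka and Ks values and
all function names are hypothetical, made up purely for illustration,
and the t-test treats the 10 differences as independent observations,
which is exactly the assumption in question.

```python
import math
import statistics


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_variance(p, n_sites):
    """Large-sample variance of the Jukes-Cantor distance estimate."""
    return p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)


def one_sample_t(diffs):
    """Return (t statistic, degrees of freedom) for H0: mean(diffs) = 0."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample SD, n - 1 divisor
    return mean / std_err, n - 1


# Hypothetical Ka and Ks values for the 10 pairs (a or b) x (v, w, x, y, z).
ka = [0.051, 0.048, 0.055, 0.050, 0.047, 0.053, 0.049, 0.056, 0.052, 0.046]
ks = [0.210, 0.195, 0.188, 0.205, 0.220, 0.199, 0.192, 0.215, 0.201, 0.207]

diffs = [a - s for a, s in zip(ka, ks)]
t_stat, df = one_sample_t(diffs)

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
T_CRIT_9DF = 2.262
reject_h0 = abs(t_stat) > T_CRIT_9DF

print(f"t = {t_stat:.2f} on {df} df; reject H0 at 5%: {reject_h0}")
```

The theoretical alternative would instead call jc_variance() on the
observed proportion of differing sites, once for synonymous and once
for non-synonymous sites, with the appropriate numbers of sites.
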
Is this a reasonable approach?
Frank Wright
SASS, University of Edinburgh,
Edinburgh, Scotland, U.K.
e-mail: frank at sass.sari.ac.uk