combining distances

Joe Felsenstein joe at evolution.genetics.washington.edu
Wed Sep 10 23:28:03 EST 1997


In article <3416EC03.53A5 at evol5.mbl.edu>,
Andrew J. Roger <roger at evol5.mbl.edu> wrote:
...
>However, in a lot of cases, I have noticed that the distances
>between a pair of taxa will be separately estimated
>from each individual gene and then the distances from many 
>genes will be averaged somehow.
>
>My question is, does the averaging of distances over 
>many different genes DECREASE the variance of the final
>distance estimate between two taxa in the same way that
>concatenating the sequences would?
>
>Intuitively, I would think that the averaging of distances
>will, in the end, only lead to an average variance that is
>comparable to the variance of any of the original datasets.

Averaging distances would lead to a distance that has
the same expected value but a lower variance.  Your intuition
is wrong on this: the variance of an average of n independent
estimates is roughly 1/n as large as the variance of a single one.
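
The 1/n behaviour can be checked with a small simulation. The sketch
below is illustrative only: it assumes each gene gives an independent,
normally distributed estimate of the same true distance, and the
function name, the true distance of 0.3 and the standard deviation of
0.05 are made-up values, not anything from the original question.

```python
import random
import statistics

def variance_of_average(n, true_distance=0.3, sd=0.05,
                        replicates=20000, seed=1):
    """Empirical variance of the mean of n independent distance estimates.

    Assumption: each per-gene estimate is drawn from a normal
    distribution centred on the same true distance.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(replicates):
        estimates = [rng.gauss(true_distance, sd) for _ in range(n)]
        averages.append(sum(estimates) / n)
    return statistics.pvariance(averages)

single = variance_of_average(1)     # one gene
averaged = variance_of_average(10)  # average over ten genes
ratio = averaged / single           # should come out close to 1/10
print(f"variance ratio (10 genes vs 1): {ratio:.3f}")
```

If the per-gene estimates were correlated, the reduction would be
smaller than 1/n, which is why independence is stated as an assumption.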

Concatenating sequences will reduce the variance too, and to
about the same extent.

But there is one reason why averaging distances might be better.
If loci differ in their evolutionary rates, concatenating would
give incorrect results, as the overall correction for unobserved
substitutions (reversals, etc.) will then be wrong.  But averaging
distances across loci would do better, as each locus would get its
correction set to its own rate of change.
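
A small numerical sketch of this effect follows. It is not taken from
the post: it assumes the Jukes-Cantor correction, two hypothetical loci
of equal length with true distances 0.1 and 1.0, and it uses the
expected proportions of differing sites, so there is no sampling noise.
The function names are made up for the illustration.

```python
import math

def jc_expected_p(d):
    """Expected proportion of differing sites at true distance d
    (Jukes-Cantor model, assumed here for illustration)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc_distance(p):
    """Jukes-Cantor corrected distance from a proportion of differences p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical slow and fast locus, assumed to have equal lengths.
true_distances = [0.1, 1.0]
p_per_locus = [jc_expected_p(d) for d in true_distances]

# Averaging: correct each locus separately, then average the distances.
averaged = sum(jc_distance(p) for p in p_per_locus) / len(p_per_locus)

# Concatenating: pool the sites (equal lengths, so the pooled proportion
# is the plain mean), then apply a single correction to the pooled value.
pooled_p = sum(p_per_locus) / len(p_per_locus)
concatenated = jc_distance(pooled_p)

true_mean = sum(true_distances) / len(true_distances)
print(f"true mean distance: {true_mean:.3f}")
print(f"averaged:           {averaged:.3f}")
print(f"concatenated:       {concatenated:.3f}")
```

Under these assumptions the averaged distance recovers the mean of the
two true distances (0.55), while the single correction applied to the
pooled sites gives about 0.42, an underestimate, because the correction
is a convex function of the proportion of differences.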

-- 
Joe Felsenstein         joe at genetics.washington.edu     (IP No. 128.95.12.41)
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA


