I wonder if someone can tell me whether the following is correct.
The estimate of distance between two sequences has
a variance associated with it that is a function of
the sequence length. For short sequences, therefore,
the variance of the distance estimate is large. So,
to reduce the variance, one could obtain more sequence
and calculate the distance from the longer sequences.
This is similar (if not identical) to the practice
of concatenating datasets to improve the distance
estimate (assuming that they are all evolving in a similar
manner).
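
To make the dependence on sequence length concrete, here is a
minimal sketch. It assumes the simplest case only: an uncorrected
p-distance (the proportion of sites that differ) with independent
sites, so that the number of differences is binomial. The function
name and the example values are my own illustration; corrected
distances would have different variance formulas.

```python
def p_distance_variance(p, n):
    """Binomial sampling variance of an observed p-distance.

    p: assumed true proportion of differing sites (hypothetical value)
    n: number of aligned sites compared
    """
    return p * (1.0 - p) / n


# Variance shrinks in proportion to 1/n as more sites are compared.
for n in (100, 500, 1000, 5000):
    print(n, p_distance_variance(0.1, n))
```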
However, in a lot of cases, I have noticed that the distances
between a pair of taxa will be separately estimated
from each individual gene and then the distances from many
genes will be averaged somehow.
My question is, does the averaging of distances over
many different genes DECREASE the variance of the final
distance estimate between two taxa in the same way that
concatenating the sequences would?
Intuitively, I would think that the averaging of distances
will, in the end, only lead to an average variance that is
comparable to the variance of any of the original datasets.
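
One way to check this intuition might be to compare the two
approaches under a deliberately simple model. The sketch below
assumes uncorrected p-distances, independent sites, genes that are
independent of one another, and the same true distance in every
gene; the gene lengths and the value of p are made up for
illustration, and "averaged" is taken to mean a plain unweighted
mean, which may not be what is done in practice.

```python
def var_single(p, n):
    """Binomial variance of a p-distance estimated from n sites."""
    return p * (1.0 - p) / n


def var_concatenated(p, lengths):
    """Variance when all genes are concatenated into one alignment."""
    return p * (1.0 - p) / sum(lengths)


def var_unweighted_average(p, lengths):
    """Variance of the plain mean of k independent per-gene estimates.

    Var(mean) = (1 / k^2) * sum of the per-gene variances.
    """
    k = len(lengths)
    return sum(var_single(p, n) for n in lengths) / (k * k)


p = 0.1                     # hypothetical true distance
lengths = [200, 500, 1500]  # hypothetical gene lengths in sites

for n in lengths:
    print("single gene, n =", n, var_single(p, n))
print("concatenated:", var_concatenated(p, lengths))
print("unweighted average:", var_unweighted_average(p, lengths))
```

Under these assumptions the two approaches give the same variance
when all genes have the same length, and the unweighted average is
never better than concatenation when the lengths differ. Whether
this carries over to corrected distances or to genes evolving at
different rates is not something this sketch addresses.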
I hope someone can explain this to me!!!!
Andrew J. Roger
Marine Biological Laboratory
Woods Hole, MA