In article <4mvjdq$nnq at mserv1.dl.ac.uk>,
Dr. J.P. Clewley <jclewley at hgmp.mrc.ac.uk> wrote:
>>I would like to take PFGE profiles (and possibily also ribotypes and
>PCR-RFLP patterns) of different strains of the same organism, and
>generated with several different restriction enzymes, and combine the
>data to produce a phenetic (phylogenetic) tree.
>>For the PFGE profiles neither the location of the cutting sites nor
>the precise number of sites can be known. Thus, RESTML cannot be
>used as it assumes that the sites are mapped. Is this correct?
Quite correct. It assumes that the presence/absence of individual sites
can be scored, not just that of individual fragments.
>Therefore, I propose to use the equation of Nei and Lei (PNAS 76: 5269,1979)
>D = 1 - 2(n(xy))/(n(x)/n(y)) as described by Vilgalys and Hester (J Bact 172:
>4238, 1990) and Gurtler et al (J Gen Microbiol 137: 2673, 1991).
>>In this approach the individual distance matrices of D values are averaged
>to produce a single matrix for e.g. FITCH. Is this valid?
In principle it is valid, but do go back to Nei and Li's paper for your
formula. Aside from a typo that I see (n(x) should be added to n(y), not
divided by it), you need to have a distance that is additive across
branch lengths. That
is, if we evolve along branch X we accumulate distance D (on average),
if we evolve along branch Y we accumulate distance D', and if we
evolve along one followed by the other, we accumulate D+D'. Arbitrary
dissimilarity formulas do not have this property. In the formula you
give, the distance reaches a maximum of 1, so it cannot be additive
(i.e. if branch X gives distance 0.6, and so does branch Y, you want
branch X followed by branch Y to give 1.2, not 1.0).
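As a rough illustration of the additivity point, here is a minimal Python
sketch. It computes the fraction of shared fragments F = 2 n(xy) / (n(x) + n(y))
and the dissimilarity D = 1 - F for two made-up band profiles, then shows how
D behaves over two successive branches. Everything beyond F and D is an
assumption for illustration only: the fragment sizes are invented, bands are
matched by exact size (real gels need a tolerance), and the toy rule that
sharing decays multiplicatively along branches, with -ln(F) as an additive
scale, is NOT the distance formula from Nei and Li's paper.

```python
import math


def shared_fraction(bands_x, bands_y):
    """Fraction of shared fragments, F = 2*n_xy / (n_x + n_y)."""
    x, y = set(bands_x), set(bands_y)
    if not x and not y:
        raise ValueError("both profiles are empty")
    return 2 * len(x & y) / (len(x) + len(y))


def dissimilarity(bands_x, bands_y):
    """Bounded dissimilarity D = 1 - F; it can never exceed 1."""
    return 1.0 - shared_fraction(bands_x, bands_y)


# Hypothetical fragment sizes (kb) for two strains; exact matching is a
# simplification of how bands would be scored on a real gel.
strain_a = [48, 97, 145, 194, 242]
strain_b = [48, 97, 150, 194, 300]

f_ab = shared_fraction(strain_a, strain_b)  # 2*3 / (5+5) = 0.6
d_ab = dissimilarity(strain_a, strain_b)    # about 0.4

# Toy assumption (not from the paper): the shared fraction decays
# multiplicatively, so two successive branches give F = F_x * F_y.
f_branch = 0.4                   # D = 0.6 on each branch
f_both = f_branch * f_branch     # sharing left after X followed by Y
d_both = 1.0 - f_both            # about 0.84, not 0.6 + 0.6 = 1.2

# Under the same toy assumption a log transform does add up.
log_branch = -math.log(f_branch)
log_both = -math.log(f_both)     # equals 2 * log_branch

print("D over one branch:", 1.0 - f_branch)
print("D over both branches:", d_both)
print("-ln F over one branch:", log_branch)
print("-ln F over both branches:", log_both)
```

The bounded D saturates as branches are chained, while the log-scaled
quantity doubles. That is the property to look for in whichever distance
formula is finally used.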
I think that Nei and Li's paper will give a distance formula that is
additive (under a simple model of DNA evolution) and I would use that.
The strategy of adding up distance measures is in general fine.
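A minimal Python sketch of the combining step follows, assuming one square
distance matrix per restriction enzyme with the strains in the same order in
each. The function name average_matrices and the numbers are hypothetical,
made up purely to show the element-wise average. Writing the result out in
the input format the tree program expects is left aside.

```python
def average_matrices(matrices):
    """Element-wise mean of several square distance matrices."""
    if not matrices:
        raise ValueError("need at least one matrix")
    n = len(matrices[0])
    for m in matrices:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all matrices must be square and the same size")
    k = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / k for j in range(n)]
        for i in range(n)
    ]


# Hypothetical distances among three strains, one matrix per enzyme.
enzyme_1 = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]
enzyme_2 = [
    [0.0, 0.4, 0.7],
    [0.4, 0.0, 0.6],
    [0.7, 0.6, 0.0],
]

combined = average_matrices([enzyme_1, enzyme_2])
for row in combined:
    print(row)
```

Summing instead of averaging changes every entry by the same constant factor,
so it rescales the branch lengths without changing their relative sizes.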
--
Joe Felsenstein joe at genetics.washington.edu (IP No. 128.95.12.41)
Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA