In article <cgrunau-070795175303 at biog024.riken.go.jp>,
cgrunau at rkna50.riken.go.jp says...
> Hi RFLPers!
>
> Does anyone knows a paper about the experimental proof for Nei & Li's paper
> from 1979 "Mathematical model for studying genetic variations in terms of
> restriction endonucleases"?
>
> What I am doing is:
> #1) PCR amplification of hsp70 from a mixed population
> #2) RFLP of the obtained clones to reduce data redundancy by estimating the
>     substitutions per nucleotide site (Nei & Li) and hierarchical clustering
> #3) choosing a type clone for each cluster and sequencing the type clones
> #4) sequence comparison
>
> Problem: RFLP and sequence comparison gives SAME tree topology but more
> than one order of magnitude DIFFERENT distance values (substitutions per
> nucleotide site). The RFLP tree is "smaller" than the sequence tree.
>
> I would appreciate to hear your opinion.
> Thanks in advance.
> Christoph Grunau
In the paper you mention (Nei & Li 1979), the authors describe two
different approaches for two different kinds of data: restriction sites
and restriction fragments. I'm not sure which kind of data you are dealing
with, but I guess you are talking about restriction fragments.
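To make the distinction concrete, here is a minimal Python sketch of the
two similarity measures the estimators start from, as they are usually
summarized: S = 2*m_xy/(m_x + m_y), the proportion of shared restriction
sites, and F = 2*n_xy/(n_x + n_y), the proportion of shared fragments. The
function names are hypothetical, and matching fragments by length alone
(as you would read them off a gel) is an assumption.

```python
from collections import Counter


def site_similarity(sites_x: set[int], sites_y: set[int]) -> float:
    """S = 2*m_xy / (m_x + m_y): proportion of restriction sites shared.

    Sites are identified by their position in a common alignment (an
    assumption; any hashable site label works).
    """
    m_x, m_y = len(sites_x), len(sites_y)
    if m_x + m_y == 0:
        raise ValueError("no restriction sites in either sequence")
    m_xy = len(sites_x & sites_y)
    return 2 * m_xy / (m_x + m_y)


def fragment_similarity(frags_x: list[int], frags_y: list[int]) -> float:
    """F = 2*n_xy / (n_x + n_y): proportion of fragments shared.

    Fragments are compared by length only; n_xy counts matching lengths
    with multiplicity (multiset intersection).
    """
    n_x, n_y = len(frags_x), len(frags_y)
    if n_x + n_y == 0:
        raise ValueError("no fragments in either sample")
    shared = Counter(frags_x) & Counter(frags_y)
    n_xy = sum(shared.values())
    return 2 * n_xy / (n_x + n_y)
```

With site data only presence/absence at homologous positions matters; with
fragment data a single lost site changes several bands at once, which is
why the two cases need different estimators.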
In a similar experiment whose data I analyzed, some colleagues of mine
amplified by PCR a region of around 1 kb, which was then digested with
5 different enzymes. I calculated the genetic distances for each enzyme
according to the Nei & Li estimator and combined the results as suggested
by Nei & Miller (1990) Genetics 125:873. Since the sequences of some of
the samples were known, I had the opportunity to plot the distances
obtained from RFLP against those obtained from the sequences, and I
observed exactly what you have observed: the slope was around 0.1,
although the correlation coefficient was very high. The topologies were
identical, as in your case.
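The comparison described above amounts to regressing the RFLP-based
distances on the sequence-based ones for the same pairs of samples. A
minimal Python sketch, assuming two paired lists of distances (the function
name is hypothetical): a slope near 0.1 together with r close to 1 means
both methods rank the pairs in the same order, which is why the topologies
agree, while the RFLP values are about ten times smaller.

```python
import math


def slope_and_correlation(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope of y on x and the Pearson correlation r.

    x: distances from sequences; y: distances from RFLP, same pair order.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("one of the distance lists has zero variance")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r
```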
Published observations concerning this:
First, the final expression proposed (the famous iterative formula) is
obtained after some simplifications. It is also known that the formula
underestimates d (the distance) when d >= 0.1 (Kaplan, 1983, Statistical
analysis of DNA sequence data, Marcel Dekker, NY, p. 75), and Nei states
in his book (Molecular evolutionary genetics, 1987, Columbia Univ Press)
that the formula is accurate for d <= 0.05. So its range of application
is very narrow.
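For reference, the fragment-based iterative formula is commonly summarized
as G = [F(3 - 2G)]^(1/4), started from G = F^(1/4), with d = -(2/r) ln G,
where F is the proportion of shared fragments and r the length of the
recognition site. The Python sketch below solves it by fixed-point
iteration; the function name and tolerance are assumptions, and the
constants should be checked against Nei & Li (1979) before relying on it.

```python
import math


def nei_li_fragment_distance(F: float, r: int, tol: float = 1e-12,
                             max_iter: int = 1000) -> float:
    """Estimate d (substitutions per site) from the shared-fragment fraction F.

    Solves G = [F (3 - 2G)]^(1/4) by fixed-point iteration from G0 = F^(1/4),
    then returns d = -(2/r) ln G, with r the recognition-site length in bp.
    """
    if not 0.0 < F <= 1.0:
        raise ValueError("F must be in (0, 1]")
    G = F ** 0.25
    for _ in range(max_iter):
        G_new = (F * (3.0 - 2.0 * G)) ** 0.25
        converged = abs(G_new - G) < tol
        G = G_new
        if converged:
            break
    return -(2.0 / r) * math.log(G)
```

A simple self-check is to build F from a chosen d with the forward relation
F = G^4 / (3 - 2G), G = exp(-r*d/2), and confirm the function gives d back.
Note that this only tests the algebra; it says nothing about the
underestimation at d >= 0.1 mentioned above, which comes from the
simplifications behind the model itself.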
Now, here are my own lucubrations:
In my opinion there is still another problem. The efficiency of the
method has usually been checked by computer simulation of sequences in
which mutations are assumed to occur at random along the sequence. This
assumption rarely holds, especially in coding sequences. If mutations tend
to occur at hot spots, we can expect a drastic decrease in the number of
restriction sites (which are randomly distributed along the sequence) that
change for a given number of differences between two sequences. Moreover,
we can speculate that restriction sites overlapping nonsynonymous
positions will hardly ever change, at least in the range of genetic
distances in which we are working (d > 0.1).
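The hot-spot argument can be explored with a small simulation: apply the
same number of substitutions either uniformly along a sequence or only
within a restricted set of positions, and count how many restriction sites
are lost or gained. A minimal Python sketch, assuming exact recognition
sequences and equal rates for all substitutions; the "GATC" site, the
hot-spot region and all function names are hypothetical choices for
illustration.

```python
import random

SITE = "GATC"  # example 4-bp recognition site (MboI/Sau3AI); swap in your enzyme


def site_positions(seq: str, site: str = SITE) -> set[int]:
    """Start positions of every occurrence of the recognition site."""
    return {i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)}


def mutate(seq: str, positions: list[int], rng: random.Random) -> str:
    """Replace the base at each position with a different, randomly chosen base."""
    s = list(seq)
    for p in positions:
        s[p] = rng.choice([b for b in "ACGT" if b != s[p]])
    return "".join(s)


def changed_sites(seq: str, k: int, allowed: list[int], rng: random.Random,
                  site: str = SITE) -> int:
    """Apply k substitutions at distinct positions drawn from `allowed`;
    return the number of sites lost plus sites gained."""
    mutated = mutate(seq, rng.sample(allowed, k), rng)
    return len(site_positions(seq, site) ^ site_positions(mutated, site))


def mean_changed_sites(seq: str, k: int, allowed: list[int], trials: int = 1000,
                       seed: int = 0, site: str = SITE) -> float:
    """Average number of changed sites over independent replicates."""
    rng = random.Random(seed)
    return sum(changed_sites(seq, k, allowed, rng, site) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(42)
    seq = "".join(rng.choice("ACGT") for _ in range(1000))  # random test sequence
    k = 50  # substitutions per replicate
    uniform = mean_changed_sites(seq, k, list(range(len(seq))))
    hotspot = mean_changed_sites(seq, k, list(range(100, 200)))  # hypothetical hot spot
    print(f"uniform: {uniform:.2f}  hot spot: {hotspot:.2f}")
```

To mimic the point about nonsynonymous positions, `allowed` could instead
be restricted to third codon positions, e.g.
`[i for i in range(len(seq)) if i % 3 == 2]`, assuming the reading frame
starts at the first base of the sequence.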
For me this is perhaps the most serious problem. Fortunately, in several
"experiments" in which we simulated the digestion of known sequences, we
always observed that, although distances were clearly underestimated, the
topologies obtained were ALWAYS correct. In the few cases where topologies
did change, they only failed to place branches that were not well defined
in the original set of sequences (branching points with values of 60% and
smaller).
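Simulating the digestion of a known sequence only requires locating the
recognition sites and cutting between them. A minimal Python sketch for a
linear molecule, assuming an exact (non-degenerate) recognition sequence
and looking at the top strand only; `digest` and `cut_offset` are
hypothetical names.

```python
def digest(seq: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return fragment lengths, in order along the sequence, after cutting
    a linear sequence at every occurrence of `site`.

    cut_offset is where the enzyme cuts relative to the first base of the
    site on the top strand (e.g. 1 for EcoRI, G^AATTC; 0 for MboI, ^GATC).
    """
    cuts = [i + cut_offset for i in range(len(seq) - len(site) + 1)
            if seq.startswith(site, i)]
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces produced by a cut at either end of the molecule.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Example: one EcoRI site, cut after the G.
print(digest("AAGAATTCAAAA", "GAATTC", cut_offset=1))
```

The resulting length lists, one per enzyme and sample, are what a
shared-fragment comparison operates on. Fragments below the resolution of
the gel would normally be discarded first; that threshold is left out here.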
We are currently developing some software for using RFLPs in population
studies and molecular epidemiology. Although it is not yet in the public
domain (it is still a very early alpha or beta version), you are kindly
invited to visit our Web page at the URL:
http://www.cnb.uam.es/www/programas/pag-soft.html
You may find something of interest there.
Regards
Joaquin Dopazo