Is there experimental proof for Nei & Li (1979)?

Joaquin Dopazo dopazo at samba.cnb.uam.es
Wed Jul 12 03:42:16 EST 1995


In article <cgrunau-070795175303 at biog024.riken.go.jp>, 
cgrunau at rkna50.riken.go.jp says...
>
>Hi RFLPers!
>
>Does anyone know of a paper giving experimental proof for Nei & Li's 1979
>paper "Mathematical model for studying genetic variations in terms of
>restriction endonucleases"?
>
>What I am doing is:
>
>#1) PCR amplification of hsp70 from a mixed population
>
>#2) RFLP of the obtained clones to reduce data redundancy by estimating
>    the substitutions per nucleotide site (Nei & Li) and hierarchical
>    clustering
>
>#3) choosing a type clone for each cluster and sequencing the type clones
>
>#4) sequence comparison
>
>Problem: RFLP and sequence comparison give the SAME tree topology but
>distance values (substitutions per nucleotide site) that DIFFER by more
>than one order of magnitude. The RFLP tree is "smaller" than the
>sequence tree.
>
>I would appreciate hearing your opinion.
>
>Thanks in advance.
>
>Christoph Grunau

In the paper you mention, Nei & Li (1979), the authors describe two 
different approaches for two different kinds of data: restriction sites 
and restriction fragments. I'm not sure which kind of data you are 
dealing with, but I guess you are talking about restriction fragments.

In a similar experiment whose data I analyzed, some colleagues of mine 
amplified by PCR a region of around 1 kb, which was then digested with 
5 different enzymes. I calculated the genetic distances for each enzyme 
according to the Nei & Li estimator and combined the results as 
suggested by Nei & Miller (1990) Genetics 125:873. Since the sequences 
of some of the samples were known, I was able to plot the distances 
obtained from RFLP against those obtained from the sequences, and I 
observed exactly what you have observed: the slope was around 0.1, 
although the correlation coefficient was very high. Topologies were 
identical, as in your case.
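
The comparison described above (RFLP distances plotted against sequence 
distances for the same pairs) comes down to a least-squares slope and a 
correlation coefficient. A minimal sketch follows; the function name and 
argument names are hypothetical, and this is not the code used for the 
analysis reported here. Both lists must hold distances for the same 
pairs of samples, in the same order.

```python
from math import sqrt

def slope_and_correlation(seq_d, rflp_d):
    """Return (slope, r) for RFLP distances regressed on sequence distances.

    seq_d  : pairwise distances estimated from aligned sequences
    rflp_d : distances for the same pairs estimated from RFLP data
    """
    n = len(seq_d)
    if n != len(rflp_d) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(seq_d) / n
    mean_y = sum(rflp_d) / n
    sxx = sum((x - mean_x) ** 2 for x in seq_d)
    syy = sum((y - mean_y) ** 2 for y in rflp_d)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(seq_d, rflp_d))
    if sxx == 0 or syy == 0:
        raise ValueError("distances must not all be identical")
    slope = sxy / sxx               # ordinary least-squares slope
    r = sxy / sqrt(sxx * syy)       # Pearson correlation coefficient
    return slope, r
```

A slope well below 1 together with r close to 1 is the pattern reported 
in both cases: distances are scaled down, but their ranking is kept.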

Published observations concerning this:
First, the final expression proposed (the famous iterative formula) is 
obtained after some simplifications. It is also known that the formula 
underestimates d (the distance) when d >= 0.1 (Kaplan, 1983, Statistical 
analysis of DNA sequence data, Marcel Dekker, NY, p. 75), and Nei states 
in his book (Molecular evolutionary genetics, 1987, Columbia Univ Press) 
that the formula is accurate for d <= 0.05. So the range of application 
is very narrow.
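
For reference, the iterative fragment formula as it is usually stated is 
F = 2*n_xy / (n_x + n_y), G = [F(3 - 2G)]^(1/4) iterated from the 
starting value G = F^(1/4), and d = -(2/r) ln G, where r is the length 
of the recognition sequence. The sketch below implements that form; the 
function name, the default r = 6, the tolerance, and the iteration cap 
are assumptions made for illustration.

```python
import math

def nei_li_fragment_distance(n_x, n_y, n_xy, r=6, tol=1e-10, max_iter=1000):
    """Estimate substitutions per site from shared restriction fragments.

    n_x, n_y : number of fragments in samples x and y
    n_xy     : number of fragments shared by both samples
    r        : length of the enzyme recognition sequence (assumed 6 here)
    """
    if n_x + n_y == 0:
        raise ValueError("no fragments to compare")
    f = 2.0 * n_xy / (n_x + n_y)        # proportion of shared fragments
    if not 0.0 < f <= 1.0:
        raise ValueError("F must lie in (0, 1]; d is undefined when F = 0")
    g = f ** 0.25                       # starting value for the iteration
    for _ in range(max_iter):
        g_new = (f * (3.0 - 2.0 * g)) ** 0.25
        converged = abs(g_new - g) < tol
        g = g_new
        if converged:
            break
    return -(2.0 / r) * math.log(g)
```

For example, nei_li_fragment_distance(10, 10, 5) gives the estimate for 
two samples with 10 fragments each, 5 of them shared. With several 
enzymes, each enzyme gives its own estimate, to be combined afterwards.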

Now, here are my own speculations:
In my opinion there is still another problem. The efficiency of the 
method has usually been checked by computer simulation of sequences in 
which mutations are assumed to occur at random along the sequence. This 
assumption rarely holds, especially in coding sequences. If mutations 
tend to occur at hot spots, we can expect a drastic decrease in the 
number of randomly distributed restriction sites that change for a given 
number of differences between two sequences. Moreover, we can speculate 
that restriction sites overlapping nonsynonymous positions will hardly 
change, at least in the range of genetic distances in which we are 
working (d > 0.1).

For me this is perhaps the most serious problem. Fortunately, in 
several "experiments" we made simulating the digestion of known 
sequences, we observed that, although distances were clearly 
underestimated, the topologies obtained were ALWAYS correct for the 
well-defined branches. In the few cases where topologies changed, the 
only branches misplaced were ones that were not well defined in the 
original set of sequences (branching points with values of 60% and 
smaller).
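
A simulated digestion of known sequences can be sketched in a few lines. 
This is an illustration under simplifying assumptions, not the software 
mentioned in this post: the molecule is linear, the cut is placed at the 
first base of the recognition site, and two fragments count as shared 
when their lengths are exactly equal (a real gel resolves sizes far less 
precisely). The function names are hypothetical.

```python
from collections import Counter

def digest(seq, site):
    """Return fragment lengths after cutting a linear sequence at each site."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        if pos > 0:                     # a cut at position 0 makes no fragment
            cuts.append(pos)            # simplification: cut at start of site
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def shared_fragment_fraction(frags_x, frags_y):
    """F = 2*n_xy / (n_x + n_y), matching fragments by exact length."""
    n_xy = sum((Counter(frags_x) & Counter(frags_y)).values())
    return 2.0 * n_xy / (len(frags_x) + len(frags_y))
```

Digesting each known sequence with the same enzymes used at the bench 
gives an F value per pair that can be turned into a fragment-based 
distance and compared with the distance computed from the aligned 
sequences themselves.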

We are currently developing some software for using RFLPs in population 
and molecular epidemiology studies. Although it is not yet in the public 
domain (it is in a very early beta or alpha version), you are kindly 
invited to visit our Web page at the URL:
http://www.cnb.uam.es/www/programas/pag-soft.html
You may find something of interest there.

Regards,

Joaquin Dopazo



