Frequency of BstE II cutting?

Chris Boyd chrisb at hgu.mrc.ac.uk
Tue Jul 2 05:16:37 EST 1996


Mikhail Alexeyev (malexeyev at biost1.thi.tmc.edu) wrote:
: In article <4qubv9$188 at scotsman.ed.ac.uk>, chrisb at hgu.mrc.ac.uk (Chris
: Boyd) wrote:

: > Mikhail Alexeyev (malexeyev at biost1.thi.tmc.edu) wrote:
: > : In article <DtIG58.8wK.B.midge at bath.ac.uk>, bspwrb at bath.ac.uk (W R
: > : BENNETT) wrote:
: > 
: > : > If BstE II has a restriction site of GGTNACC, does it cut at the same
: > : > frequency as a six-cutter (i.e. an average 1 in 4096 disregarding sequence
: > : > distribution considerations), which is the "intuitive" answer, or does it
: > : > cut with reduced frequency (which is what I'd like!).  Promega's technical
(snip)
: > 
: > : Yet another way to put it is in terms of probability to encounter a
: > : specific nucleotide at a specific position. For BstEII it should be:
: > 
: > :  G   G   T   N   A   C   C
: > : 1/4 1/4 1/4 4/4 1/4 1/4 1/4
: > 
: > : Probability is: (1/4)^6 x 4/4= 4^-6= 1 in 4096
: > 
: > Yes, this is a fair first approximation way of looking at this, and is
: > all you need for most applications. In reality, however, the occurrence
: > frequency of any given query sequence is markedly affected by the base
: > composition and sequence microstructure (CpG islands etc.) of the
: > target DNA.  E.g., CTAG is far rarer than GATC in the E. coli genome.
: > 
: > For pedantically accurate theoretical results, you unfortunately have
: > to do a Markov chain analysis to explain why, and calculate to what
: > extent, sequences with repeated adjacent bases are commoner than the
: > above naive analysis would suggest.
: > 
:  

: Very true, but the original post implied to DISREGARD these considerations
: by asking to compare frequency of BstEII cutting with that of a generic
: six-cutter (1 in 4096 disregarding sequence distribution considerations),
: didn't it? 

Mikhail, I realised that, but I thought it would be useful to give some
extra relevant information.  In particular, I took the original
poster's caveat about sequence distributions to refer to the substrate
DNA, not the recognition sequence.  I was pointing out (inter alia)
that you have to take account of the recognition sequence also, even if
the substrate DNA is constant and of random sequence.  In such a
substrate, for example, BssHII (GCGCGC) sites will occur more
frequently than BstEII sites.
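
For anyone who wants to check this kind of claim numerically before running a gel, here is a minimal Python sketch. It is not taken from any published analysis, and the function names (site_probability, count_sites, random_dna) are invented for illustration. It assumes a substrate of independent bases, each with probability 1/4, and it counts every match including overlapping ones. Whether and how the two enzymes differ depends on that counting convention, so the sketch only prints the observed count next to the naive per-position expectation and leaves the comparison to the reader.

```python
import random
import re

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def site_probability(site):
    """Naive probability that a match starts at a given position.

    Assumption: bases are independent and each occurs with probability 1/4,
    which is the per-position arithmetic quoted earlier in the thread.
    """
    p = 1.0
    for code in site:
        p *= len(IUPAC[code]) / 4
    return p


def count_sites(site, seq):
    """Count matches of an IUPAC site in seq, including overlapping ones."""
    pattern = "".join("[" + IUPAC[code] + "]" for code in site)
    # A zero-width lookahead lets the regex report overlapping matches too.
    return len(re.findall("(?=" + pattern + ")", seq))


def random_dna(length, seed=0):
    """Uniform random sequence; the seed is fixed so runs are repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


if __name__ == "__main__":
    seq = random_dna(1_000_000)
    for name, site in [("BstEII", "GGTNACC"), ("BssHII", "GCGCGC")]:
        expected = site_probability(site) * (len(seq) - len(site) + 1)
        observed = count_sites(site, seq)
        print(f"{name} {site}: observed {observed}, naive expected {expected:.1f}")
```

Swapping random_dna for a real sequence read from a file gives the same comparison on actual substrate DNA, where base composition and local sequence structure come into play.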

In the real world, a quick gel will give the best answer!

Best wishes,
--
Chris Boyd                       | from, | MRC Human Genetics Unit
chrisb at hgu.mrc.ac.uk             |  not  |  Western General Hospital
http://www.hgu.mrc.ac.uk/~chrisb |   for |   Edinburgh EH4 2XU, SCOTLAND


