frequency of restriction enzymes cutting

Nancy Terryn nater at
Fri Dec 23 08:07:21 EST 1994

Dear net,

I include in this mail the answers I got to my question. Maybe they can be useful to some of you.

Merry X-mas and happy new year


----- Begin Included Message -----

> Dear net-friends,
> can anybody suggest to me where I could find a kind of list or table on 
> the estimated cuttings of certain restriction enzymes in Arabidopsis?
> I think I once saw it in a company catalogue but I don't seem to find it 
> back.
> Any help will be appreciated. I will mail the answers back to the net, so 
> it can be of use to others.
> Thanks a lot!
> Nancy Terryn
> Lab of Genetics
> K.L. Ledegankcstraat 35
> 9000 Gent/ Belgium
> Tel 32 9 264 50 04
> Fax 32 9 264 53 49
Answers:


This was posted on the net a few years ago:
>From deanre%anders.dnet at SERVER.UGA.EDU Wed Jun  5 05:04:44 1991
From: deanre%anders.dnet at SERVER.UGA.EDU
Newsgroups: bionet.genome.arabidopsis
Subject: table
Message-ID: <9106051204.AA06990 at>
Date: 5 Jun 91 12:04:44 GMT
Sender: daemon at
Distribution: bionet
Lines: 39

     The following is a table of some common restriction enzyme
sites in the Arabidopsis genome. This table was composed using Jamie's
computer program and data I obtained through searches.  The table itself
was made by John McDowell using the information from the computer
printouts of data.

             ENZYME         RECOGNITION SITE    FREQUENCY (%)   AVE SIZE (bp)

1            Apa I          GGGCCC              0.0026          38,460
2            Xma III        CGGCCG              0.0053          18,870
3            Sma I          CCCGGG              0.0066          15,150
4            Sac II         CCGCGG              0.0092          10,860
5            Kpn I          GGTACC              0.0118          8,470
6            Xho I          CTCGAG              0.0144          6,940
7            Bam HI         GGATCC              0.0144          6,940
8            Xba I          TCTAGA              0.0158          6,330
9            SalI/HincII    GTCGAC              0.0158          6,320
10           Spe I          ACTAGT              0.0197          5,070
11           Sac I          GAGCTC              0.0249          4,016
12           Pst I          CTGCAG              0.0289          3,460
13           Eco RV         GATATC              0.0302          3,310
14           Eco RI         GAATTC              0.0368          2,590
15           Cla I          ATCGAT              0.0394          2,530
16           Hind III       AAGCTT              0.0617          1,620
17           AhaIII/DraI    TTTAAA              0.0703          1,422

If you have any questions my email address is
DEANRE%gandal.dnet at

The table was put together using the known sequences of Arabidopsis as
in GenBank and EMBL.  The computer program (which fits a Markov chain)
takes these sequences and searches for trinucleotide, tetranucleotide, and
hexanucleotide counts (compares random to expected).  John took the
hexanucleotide counts and looked for common restriction sites, which he
put in table form.
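
As a rough illustration of the counting step described above, the following
Python sketch counts occurrences of a recognition site in a sequence and
converts the count into a percent frequency and an average spacing.  The
function name and the toy sequence are hypothetical, and reading the
FREQUENCY column as a percentage (so that AVE is roughly 100/FREQUENCY) is
an assumption inferred from the table's numbers.  This is not the program
mentioned above, and it does not fit a Markov chain.

```python
def site_stats(seq, site):
    """Count overlapping occurrences of `site` in `seq` (one strand only).

    Returns (count, percent of windows that match, average spacing in bases).
    For palindromic sites such as GAATTC, scanning one strand is enough.
    """
    seq, site = seq.upper(), site.upper()
    k = len(site)
    windows = len(seq) - k + 1          # number of positions a site can start
    if windows <= 0:
        return 0, 0.0, None
    count = sum(1 for i in range(windows) if seq[i:i + k] == site)
    percent = 100.0 * count / windows
    spacing = windows / count if count else None
    return count, percent, spacing


# Hypothetical 200-base toy sequence containing two EcoRI sites.
toy = "GAATTC" + "A" * 94 + "GAATTC" + "C" * 94
print(site_stats(toy, "GAATTC"))
```

On real data the same function would be run over the concatenated database
entries, one recognition site at a time.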
                                         Best of luck,


----- End Included Message -----

----- Begin Included Message -----

I wrote a Hypercard stack for the Macintosh that does this for the
species Brassica oleracea (and some other plant and animal species).
From the readme file:

The stack calculates the frequency of the various
six-cutter restriction enzymes in a few different genomes. The algorithm is
based on the frequency of dinucleotide pairs, i.e., nearest-neighbor frequency.
Unfortunately, nearest-neighbor analysis has been performed on only a few
species, and they're the ones analyzed in this stack. If you look at this
stack, and you're aware of nearest-neighbor data for any other species, please
tell me! DNAs with available nearest-neighbor data: bacteriophage lambda,
Mus musculus, wheat, Brassica, E. coli, Chlamydomonas, Saccharomyces,
Gallus, Homo sapiens.
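
One common way to turn nearest-neighbor (dinucleotide) frequencies into a
site frequency is a first-order Markov estimate: the probability of the first
base times the conditional probability of each following base given its
neighbor.  The Python sketch below shows that idea only; the readme does not
give the stack's actual formula, so the function names and the uniform
example frequencies are assumptions for illustration.

```python
from collections import Counter


def nearest_neighbor_model(seq):
    """Return (mononucleotide, dinucleotide) frequency dicts for `seq`."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = sum(mono.values()), sum(di.values())
    return ({b: c / n_mono for b, c in mono.items()},
            {d: c / n_di for d, c in di.items()})


def site_probability(site, mono, di):
    """First-order Markov estimate: P(b1) * product of P(b[i+1] | b[i])."""
    site = site.upper()
    p = mono.get(site[0], 0.0)
    for a, b in zip(site, site[1:]):
        fa = mono.get(a, 0.0)
        # P(b | a) is approximated as f(ab) / f(a); zero if `a` never occurs.
        p *= di.get(a + b, 0.0) / fa if fa else 0.0
    return p


# Hypothetical uniform frequencies: every base 1/4, every dinucleotide 1/16.
bases = "ACGT"
mono = {b: 0.25 for b in bases}
di = {a + b: 1 / 16 for a in bases for b in bases}
p = site_probability("GAATTC", mono, di)
print(round(1 / p))  # 4096 for uniform frequencies
```

With measured nearest-neighbor data for a species, the two dictionaries
would be filled from that data instead of the uniform values.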

Ftp address -
Or I can mail it to you if you want it.

Brian Osborne

Plant Gene Expression Center
800 Buchanan Street
Albany, CA  USA 94710
TEL 510-559-5919
FAX 510-559-5718

----- Begin Included Message -----

A good guide is to take into account the G-C content of Arabidopsis.  If it
were 50% then the chances of getting any 6 base recognition sequence would
be 1:4096 (4^6).  Since Arabidopsis is 41% G-C, things change a bit.  The
odds of any one position being a G (or, equally, a C) are 1:4.88, and of its
being an A (or a T) 1:3.39. If you do the math you will find that the
frequency of finding a site that has 6 Gs and Cs is 1:13,505 bases; 4 Gs and
Cs and 2 As and Ts is 1:6,517; 2 Gs and Cs and 4 As and Ts is 1:3,145; and
6 As and Ts 1:1,517.
These numbers correlate well with empirical observations.  Good luck.
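
The arithmetic above can be sketched in a few lines of Python.  The function
name is hypothetical, and the model assumes every position is independent
with G = C = gc/2 and A = T = (1 - gc)/2.  Because it uses unrounded
probabilities rather than the rounded odds 4.88 and 3.39, its results come
out slightly lower than the figures quoted above (by well under 1%).

```python
def expected_spacing(site, gc=0.41):
    """Expected bases per occurrence of `site`, assuming independent bases.

    Any character other than G or C is treated as A/T.
    """
    p_gc = gc / 2          # probability of G (or of C) at one position
    p_at = (1 - gc) / 2    # probability of A (or of T) at one position
    p = 1.0
    for base in site.upper():
        p *= p_gc if base in "GC" else p_at
    return 1 / p


# One example site for each G-C composition discussed above.
for site in ("GGGCCC", "GGATCC", "GAATTC", "TTTAAA"):
    print(site, round(expected_spacing(site)))
```

Passing gc=0.5 recovers the 1:4096 figure for any six-base site.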

Scott Michaels
Department of Biochemistry
University of Wisconsin-Madison
420 Henry Mall
Madison, WI  53706
Phone: 608-262-4640
----- End Included Message -----

bosborne at

----- End Included Message -----
