frequency of restriction enzyme cutting

I am including in this mail the answers I received to my question. Maybe they will be useful to some of you.

> Dear net-friends,
> can anybody suggest to me where I could find a kind of list or table on 
> the estimated cuttings of certain restriction enzymes in Arabidopsis?
> I think I once saw it in a company catalogue but I can't seem to find it 
> again.
> Any help will be appreciated. I will mail the answers back to the net, so 
> it can be of use to others.
> Thanks a lot!
Answers:


This was posted on the net a few years ago:
     The following is a table of some common restriction enzyme
sites in the Arabidopsis genome. It was compiled using Jamie's
computer program and data I obtained through searches. The table itself
was made by John McDowell from the computer printouts of the data.

             ENZYME         RECOGNITION SITE    FREQUENCY (%)   AVE. SPACING (bp)

1            Apa I          GGGCCC              0.0026          38,460
2            Xma III        CGGCCG              0.0053          18,870
3            Sma I          CCCGGG              0.0066          15,150
4            Sac II         CCGCGG              0.0092          10,860
5            Kpn I          GGTACC              0.0118          8,470
6            Xho I          CTCGAG              0.0144          6,940
7            Bam HI         GGATCC              0.0144          6,940
8            Xba I          TCTAGA              0.0158          6,330
9            Sal I/Hinc II  GTCGAC              0.0158          6,320
10           Spe I          ACTAGT              0.0197          5,070
11           Sac I          GAGCTC              0.0249          4,016
12           Pst I          CTGCAG              0.0289          3,460
13           Eco RV         GATATC              0.0302          3,310
14           Eco RI         GAATTC              0.0368          2,590
15           Cla I          ATCGAT              0.0394          2,530
16           Hind III       AAGCTT              0.0617          1,620
17           Aha III/Dra I  TTTAAA              0.0703          1,422
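
Judging from the numbers alone, AVE looks like 100 divided by FREQUENCY (100/0.0026 is about 38,460, 100/0.0703 is about 1,422, and so on). That would make FREQUENCY the percentage of positions at which a site starts and AVE the average spacing between sites in bp. This is an inference, not something the post states. The Python sketch below (the names TABLE, implied_spacing and inconsistent_rows are hypothetical) checks every row; it flags only the Eco RI row (GAATTC), where 100/0.0368 is about 2,717 rather than 2,590. The post does not say which of the two numbers is the typo.

```python
# Rows transcribed from the table: (recognition site, FREQUENCY, AVE).
TABLE = [
    ("GGGCCC", 0.0026, 38460),
    ("CGGCCG", 0.0053, 18870),
    ("CCCGGG", 0.0066, 15150),
    ("CCGCGG", 0.0092, 10860),
    ("GGTACC", 0.0118, 8470),
    ("CTCGAG", 0.0144, 6940),
    ("GGATCC", 0.0144, 6940),
    ("TCTAGA", 0.0158, 6330),
    ("GTCGAC", 0.0158, 6320),
    ("ACTAGT", 0.0197, 5070),
    ("GAGCTC", 0.0249, 4016),
    ("CTGCAG", 0.0289, 3460),
    ("GATATC", 0.0302, 3310),
    ("GAATTC", 0.0368, 2590),
    ("ATCGAT", 0.0394, 2530),
    ("AAGCTT", 0.0617, 1620),
    ("TTTAAA", 0.0703, 1422),
]


def implied_spacing(freq_percent: float) -> float:
    """Average bp between sites, ASSUMING FREQUENCY is the percentage of
    positions at which the site starts (inferred, not stated in the post)."""
    return 100.0 / freq_percent


def inconsistent_rows(rows, tolerance=0.01):
    """Sites whose AVE differs from 100/FREQUENCY by more than `tolerance` (relative)."""
    return [
        site
        for site, freq, ave in rows
        if abs(implied_spacing(freq) - ave) / ave > tolerance
    ]


if __name__ == "__main__":
    for site, freq, ave in TABLE:
        print(f"{site}  FREQ={freq:.4f}  AVE={ave:>6,}  100/FREQ={implied_spacing(freq):>8,.0f}")
    print("Rows off by more than 1%:", inconsistent_rows(TABLE))  # ['GAATTC']
```
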

The table was put together using the known Arabidopsis sequences in
GenBank and EMBL. The computer program fits a Markov chain to these
sequences and computes trinucleotide, tetranucleotide, and hexanucleotide
counts, comparing the observed counts with those expected at random. John
took the hexanucleotide counts, picked out the common restriction sites, and
put them in table form.
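
The counting step described above can be sketched in Python. This shows only the observed-count side (sliding a 6-bp window along a sequence and tallying the restriction sites); it does not reproduce the Markov-chain fitting of the original program, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def hexamer_counts(seq: str) -> Counter:
    """Count every overlapping 6-bp word on one strand of `seq`."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


def site_table(seq: str, sites: dict) -> dict:
    """Map each enzyme name to (count, frequency in %, average spacing in bp).

    All the sites in the Arabidopsis table are palindromes, so counting one
    strand is enough; a non-palindromic site would also need its reverse
    complement counted.
    """
    counts = hexamer_counts(seq)
    n_words = max(len(seq) - 5, 0)  # number of 6-bp windows examined
    result = {}
    for name, site in sites.items():
        c = counts[site.upper()]
        freq = 100.0 * c / n_words if n_words else 0.0
        spacing = n_words / c if c else float("inf")
        result[name] = (c, freq, spacing)
    return result


if __name__ == "__main__":
    toy = "GAATTCAAAGAATTCTTTAAA"  # made-up 21-bp sequence, for illustration only
    print(site_table(toy, {"Eco RI": "GAATTC", "Dra I": "TTTAAA", "Apa I": "GGGCCC"}))
```

On a real genome the same tallies, divided by the expected counts from a background model, give the observed-versus-expected comparison the post mentions.
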

I wrote a HyperCard stack for the Macintosh that does this for
Brassica oleracea (and some other plant and animal species).
From the readme file:

The stack calculates the frequency of the various
six-cutter restriction enzymes in a few different genomes. The algorithm is
based on the frequency of dinucleotide pairs, i.e., nearest-neighbor frequency.
Unfortunately, nearest-neighbor analysis has been performed on only a few
species, and they are the ones analyzed in this stack. If you know of
nearest-neighbor data for any other species, please tell me! DNAs with
available nearest-neighbor data: bacteriophage lambda, Mus musculus, wheat,
Brassica, E. coli, Chlamydomonas, Saccharomyces, Gallus, Homo sapiens.
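
The readme does not give the stack's exact formula. A common way to turn dinucleotide (nearest-neighbor) frequencies into a site frequency is a first-order Markov model, where the probability of a site s1...s6 is f(s1 s2) multiplied by f(s_i s_i+1)/f(s_i) for each following pair. The Python sketch below assumes that model; the function names are hypothetical, and it estimates frequencies from a sequence you supply rather than from the published nearest-neighbor tables.

```python
from collections import Counter


def nearest_neighbor_freqs(seq: str):
    """Estimate base and dinucleotide (nearest-neighbor) frequencies from one strand of `seq`."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n1, n2 = sum(mono.values()), sum(di.values())
    return ({b: c / n1 for b, c in mono.items()},
            {d: c / n2 for d, c in di.items()})


def expected_site_probability(site: str, mono: dict, di: dict) -> float:
    """Probability that `site` starts at a given position under an ASSUMED
    first-order Markov model: P = f(s1 s2) * prod f(s_i s_(i+1)) / f(s_i)."""
    site = site.upper()
    p = di.get(site[:2], 0.0)
    for i in range(1, len(site) - 1):
        base = mono.get(site[i], 0.0)
        if base == 0.0:
            return 0.0
        p *= di.get(site[i:i + 2], 0.0) / base
    return p


if __name__ == "__main__":
    # Sanity check: with every base at 1/4 and every dinucleotide at 1/16 the
    # model reduces to one site per 4**6 = 4,096 bp.
    mono = {b: 0.25 for b in "ACGT"}
    di = {a + b: 1 / 16 for a in "ACGT" for b in "ACGT"}
    print(1 / expected_site_probability("GAATTC", mono, di))  # prints 4096.0
```

The advantage over base composition alone is that dinucleotide biases (for example, CG depletion in many eukaryotic genomes) are carried into the estimate.
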

A good guide is to take the G-C content of Arabidopsis into account. If it
were 50%, the chance of finding any given 6-base recognition sequence at a
particular position would be 1 in 4,096 (4^6). Since Arabidopsis is 41% G-C,
things change a bit: the probability of a particular G or C at a given
position is 1 in 4.88 (0.41/2 = 0.205), and of a particular A or T is 1 in
3.39 (0.295). If you do the math, you will find that a site made up of 6 Gs
and Cs occurs about once every 13,505 bases; 4 Gs and Cs and 2 As and Ts,
once every 6,517; 2 Gs and Cs and 4 As and Ts, once every 3,145; and 6 As
and Ts, once every 1,517.
These numbers correlate well with empirical observations. Good luck.
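
The base-composition arithmetic above can be written as a short Python function. It assumes, as the answer does, that bases are independent with G = C = GC/2 and A = T = (1 - GC)/2; the function names are hypothetical.

```python
def spacing_from_gc(gc_fraction: float, n_gc: int, n_at: int) -> float:
    """Expected bp between occurrences of one specific site containing n_gc G/C
    and n_at A/T positions, ASSUMING independent bases with G = C = gc/2 and
    A = T = (1 - gc)/2."""
    p_gc = gc_fraction / 2        # chance of one particular G (or C) at a position
    p_at = (1 - gc_fraction) / 2  # chance of one particular A (or T)
    return 1.0 / (p_gc ** n_gc * p_at ** n_at)


def spacing_for_site(site: str, gc_fraction: float) -> float:
    """Same estimate, counting the G/C and A/T positions in a recognition site."""
    site = site.upper()
    n_gc = sum(base in "GC" for base in site)
    return spacing_from_gc(gc_fraction, n_gc, len(site) - n_gc)


if __name__ == "__main__":
    for n_gc in (6, 4, 2, 0):
        print(n_gc, "G/C:", round(spacing_from_gc(0.41, n_gc, 6 - n_gc)))
    print("Eco RI (GAATTC):", round(spacing_for_site("GAATTC", 0.41)))
```

With unrounded probabilities the results land within about 0.5% of the figures quoted above, which were computed from odds rounded to 1 in 4.88 and 1 in 3.39.
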

