Regulatory region analysis: NSITE

Victor Solovyev solovyev at sanger.ac.uk
Wed Aug 18 19:52:19 EST 1999


 We have installed the NSITE program for analyzing genomic regulatory regions
===============================================================================
It is available at http://genomic.sanger.ac.uk/ on the web server of our
Computational Genomic Group
(http://genomic.sanger.ac.uk/gf/gf.html)

NSITE Program Description

NSITE - Search for consensus patterns with statistical estimation

by Ilham Shahmuradov and Victor Solovyev

Analysis of nucleotide sequences is available through WWW:
http://genomic.sanger.ac.uk/gf/gf.shtml

NSITE is used to analyze regulatory regions and their functional
motif composition. The program was developed on UNIX and
adapted to work with Transfac-type sites.

Method description:
     The method is based on statistical estimation of the expected number
     of occurrences of a nucleotide consensus pattern in a given sequence
     [1-2]. It uses an NSITE-formatted data file, which can include any set
     of consensus sequences of functional motifs. In the current version this
     file consists of the public release of Transfac sequences (3.4, 1998),
     composite elements [3] and a set of additional functional
     motifs.
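
     The expected-number idea can be illustrated with a minimal sketch.
     It assumes independent nucleotides drawn with the sequence's own
     base frequencies and exact (zero-mismatch) matching; the precise
     NSITE formula is the one in [1-2], and the function names below are
     hypothetical. Under these assumptions the values for R00103 and
     R00037 round to the 0.004 and 0.006 reported in the output example
     of this post.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("acgtrykmswbdhvn", "tgcayrmkswvhdbn")


def reverse_complement(pattern):
    """Return the pattern as it would read on the opposite strand."""
    return pattern.lower().translate(COMPLEMENT)[::-1]


def expected_number(pattern, freqs, seq_len):
    """Expected count of exact matches on one strand (assumed model:
    independent positions, probabilities taken from base frequencies)."""
    p = 1.0
    for symbol in pattern.lower():
        p *= sum(freqs[base] for base in IUPAC[symbol])
    positions = max(seq_len - len(pattern) + 1, 0)
    return positions * p


# Base frequencies and length taken from the ace-1 example in this post.
freqs = {"a": 0.31, "g": 0.16, "t": 0.35, "c": 0.18}
e_first = expected_number("cttgcgtca", freqs, 2140)
# A site on the 2nd chain is treated here as the reverse complement
# of the pattern searched on the 1st chain.
e_second = expected_number(reverse_complement("ccttwyatgg"), freqs, 2140)
print(round(e_first, 3), round(e_second, 3))  # 0.004 0.006
```

     How NSITE adjusts the estimate when mismatches are allowed is not
     described in this post, so the sketch does not attempt it.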

     If a pattern is found whose expected number is significantly less
     than 1, the analysed sequence can be supposed to
     possess the pattern's function.
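
     One way to make "significantly less than 1" concrete is sketched
     below. It assumes chance occurrences are roughly Poisson distributed
     with the expected number as mean; this is an assumption for
     illustration, not something stated about NSITE, and both function
     names are hypothetical. Under this reading an expected number of
     0.006 gives a 95% upper bound of 0 chance occurrences, so a single
     found site already exceeds it. That would be consistent with the
     "Conf.Interval: 0 Found: 1" lines in the output example, although
     the post does not define that field.

```python
import math


def prob_at_least_one(expected):
    """P(count >= 1) for a Poisson variable with the given mean."""
    return -math.expm1(-expected)


def poisson_upper_bound(expected, level):
    """Smallest k such that P(count <= k) >= level (0 < level < 1)."""
    k = 0
    term = math.exp(-expected)  # P(count == 0)
    cdf = term
    while cdf < level:
        k += 1
        term *= expected / k    # P(count == k) from P(count == k - 1)
        cdf += term
    return k


print(round(prob_at_least_one(0.006), 5))  # 0.00598
print(poisson_upper_bound(0.006, 0.95))    # 0
```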

     For each pattern found, the NSITE output shows the pattern, its position
     in the sequence, and the accession number, ID, motif description and
     binding factor name from the original database, if they exist.
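
     The kind of search that yields such positions can be sketched as
     follows. This is a simple illustrative scan, not the NSITE code:
     the function names are hypothetical, and the conventions (1-based
     positions, begin greater than end on the 2nd chain, site printed in
     pattern orientation) are inferred from the output example in this
     post.

```python
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}
COMPLEMENT = str.maketrans("acgtrykmswbdhvn", "tgcayrmkswvhdbn")


def reverse_complement(text):
    return text.lower().translate(COMPLEMENT)[::-1]


def count_mismatches(pattern, window):
    """Number of positions where the base is not allowed by the pattern."""
    return sum(base not in IUPAC[sym] for sym, base in zip(pattern, window))


def find_sites(pattern, sequence, max_mismatch):
    """Scan both chains; return dicts with chain, begin, end, mismatch, site."""
    pattern, seq = pattern.lower(), sequence.lower()
    n = len(pattern)
    hits = []
    for chain, pat in ((1, pattern), (2, reverse_complement(pattern))):
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            mm = count_mismatches(pat, window)
            if mm > max_mismatch:
                continue
            if chain == 1:
                begin, end, site = i + 1, i + n, window
            else:
                # Report 2nd-chain sites in pattern orientation.
                begin, end, site = i + n, i + 1, reverse_complement(window)
            hits.append({"chain": chain, "begin": begin, "end": end,
                         "mismatch": mm, "site": site})
    return hits


# Toy sequences built around the SRF site from the output example.
forward_hits = find_sites("ccttwyatgg", "ttttt" + "ccttttatgg" + "aaaa", 0)
reverse_hits = find_sites("ccttwyatgg", "gg" + "ccataaaagg" + "tt", 0)
print(forward_hits)
print(reverse_hits)
```

     Applied to the apoAII site in the output example,
     count_mismatches("cttcaacctttaccctggt", "cttcaacgttgtccctgat")
     gives 4, the mismatch count printed there.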

     Acknowledgments: We thank Igor Rogozin, who took part in
     developing some applications of this method for
     searching nucleotide consensus patterns on the IBM PC [4].

     Output example:


      Program  *** N S I T E *** Shahmuradov, Solovyev
                (http://genomic.sanger.ac.uk)

      File with SITEs:     nsite.dat
      File with SEQUENCEs: ace1.seq
      Search PARAMETRS: Expected. Number -  0.0100000
      Siginicance Level -  0.9500000  Print Status - Yes

      Note: AC - Accession no. in TRANSFAC   or  NSITE DB
            DE - Description (gene or gene product)
            RE - Gene region (e.g. promoter,enhancer or unknown)
            BF - Binding factor(s)
            OS - Organism species

      ***************************************************************************
     > ace-1 /acetylcholinesterase 1 (ACHE)/* Chr. 10*/C.elegans/-2200:-1/
     Frequencies:  A -  0.31   G -  0.16   T -  0.35   C -  0.18 ... Length = 2140

              10        20        30        40        50        60
      aaaaaaaactacgtgactagacatatcacgtttcggccgctactactttttgcgttgata
      ttttttttgatgcactgatctgtatagtgcaaagccggcgatgatgaaaaacgcaactat
      .......................................................
          2110      2120      2130      2140
      tctcccggcggtccaaacgattatgatttgttgaagaagc
      agagggccgccaggtttgctaatactaaacaacttcttcg
     ===========================================================================
         25. [  3] T: AC: R00037  / DE: beta-actin
      RE: unknown                            / OS: human, Homo sapiens
      BF:  SRF ..

              10
      ccttwyatgg
     ---------- Sites in  2nd chain ----------
      Max mismatch :  2
      Exp.Number:    0.006 Conf.Interval:   0 Found:   1
      begin: 1704 end: 1695 mismatch:   0 exp.num.:   0.006, site:CCTTTTATGG
      ===========================================================================
         74. [  1] T: AC: R00103  / DE: AMV (avian myeloblastosis virus)
      RE: unknown                            / OS: AMV, avian myeloblastosis virus
      BF:  C/EBPalpha ..

              10
      cttgcgtca
     ---------- Sites in  1st chain ----------
      Max mismatch :  0
      Exp.Number:    0.004 Conf.Interval:   0 Found:   1
      begin: 1920 end: 1928 mismatch:   0 exp.num.:   0.004, site:CTTGCGTCA
      ===========================================================================
        103. [  1] T: AC: R00140  / DE: apoAII (apolipoprotein AII)
      RE: unknown                            / OS: human, Homo sapiens
      BF:  Tf-LF1 ..  NF-BA1 ..

              10        20
      cttcaacctttaccctggt
     ---------- Sites in  2nd chain ----------
      Max mismatch :  4
      Exp.Number:    0.001 Conf.Interval:   0 Found:   1
      begin:  897 end:  879 mismatch:   4 exp.num.:   0.001, site:CTTCAACgTTgtCCCTGaT
      ===========================================================================
        287. [  1] T: AC: R00381  / DE: EGF receptor
      RE: unknown                            / OS: human, Homo sapiens
      BF:  Sp1 ..

              10        20
      tccgccccccgcacgg
     ---------- Sites in  1st chain ----------
      Max mismatch :  4
      Exp.Number:    0.004 Conf.Interval:   0 Found:   1
      begin:   79 end:   94 mismatch:   4 exp.num.:   0.004, site:TCCGtCCCCgcCACtG
      ===========================================================================
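
     For downstream processing, the site lines of a report like the one
     above could be collected with a small parser. This is only a sketch
     based on the layout shown in this example; the regular expression
     and the function name are hypothetical, and the real output format
     may differ in details.

```python
import re

# Matches lines such as
#   begin: 1704 end: 1695 mismatch:   0 exp.num.:   0.006, site:CCTTTTATGG
# and tolerates a line break before "site:".
SITE_RE = re.compile(
    r"begin:\s*(\d+)\s+end:\s*(\d+)\s+mismatch:\s*(\d+)\s+"
    r"exp\.num\.:\s*([\d.]+),\s*site:\s*(\S+)"
)


def parse_sites(report):
    """Return one dict per site line found in the report text."""
    sites = []
    for begin, end, mismatch, exp_num, site in SITE_RE.findall(report):
        sites.append({
            "begin": int(begin),
            "end": int(end),
            # begin > end marks a site on the 2nd chain in the example.
            "chain": 1 if int(begin) <= int(end) else 2,
            "mismatch": int(mismatch),
            "exp_num": float(exp_num),
            "site": site,
        })
    return sites


sample = """
      begin: 1704 end: 1695 mismatch:   0 exp.num.:   0.006, site:CCTTTTATGG
      begin:  897 end:  879 mismatch:   4 exp.num.:   0.001,
site:CTTCAACgTTgtCCCTGaT
"""
parsed = parse_sites(sample)
for row in parsed:
    print(row)
```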


     References:

     [1] Shahmuridov K.A., Kolchanov N.A., Solovyev V.V., Ratner V.A.
         Enhancer-like structures in middle repetitive sequences of the
         eukaryotic genomes. Genetics (Russ), 22, 357-368 (1986).
     [2] Solovyev V.V., Kolchanov N.A. Search for functional sites using
         consensus. In: Computer analysis of Genetic macromolecules
         (eds. Kolchanov N.A., Lim H.A.), World Scientific, p. 16-21 (1994).
     [3] Heinemeyer T., Chen X., Karas H., Kel A.E., Kel O.V., Liebich I.,
         Meinhardt T., Reuter I., Schacherer F., Wingender E. (1999).
         Expanding the TRANSFAC database towards an expert system of
         regulatory molecular
     [4] Solovyev V.V., Rogozin I.B. The program package of the context analysis
         of DNA, RNA and protein sequences. 1. Search for homology
         and functional sites. Institute of Cytology and Genetics of the
         USSR Academy of Science, Novosibirsk (Russ), 1-70 (1986).



-- 
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk  http://genomic.sanger.ac.uk
Phone: 44-1223-494799  FAX:   44-1223-494919



