I- Presentation

NRSub (which means "Non Redundant Subtilis") is a database
containing a "clean" set of Bacillus subtilis sequences taken
from the SubtiList collection. By "clean" we mean that all
duplications have been removed from its nucleotide sequences.
Additional data on gene mapping and codon usage are also
included, as well as cross-references to the EMBL, Swiss-Prot
and Enzyme collections.

NRSub release 4 contains a total of 248 contigs (61 composite).
All these sequences are chromosomal (plasmid sequences have been
removed) and total 1,251,557 bp. This represents approximately
30% of the entire Bacillus subtilis chromosome, which consists
of about 4,165 kbp. These sequences contain 1053 CDS (358 ORF),
72 tRNA and 27 rRNA. Finally, a total of 423 bibliographic
references can be accessed.
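
The proportions quoted above can be checked with a few lines of
Python. This is only an arithmetic sketch: the variable names are
illustrative, and the chromosome size is the approximate figure
given in the text, not a measured value.

```python
# Figures taken from the release 4 description above.
total_bp = 1_251_557        # bp in NRSub release 4
chromosome_bp = 4_165_000   # approximate chromosome size (4,165 kbp)
cds_total = 1053            # annotated CDS
cds_orf = 358               # CDS that are ORF

coverage = total_bp / chromosome_bp   # fraction of the chromosome covered
orf_share = cds_orf / cds_total       # fraction of CDS that are ORF

print(f"chromosome coverage: {coverage:.1%}")
print(f"ORF share of CDS:    {orf_share:.1%}")
```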

II- System requirements

NRSub is provided either in EMBL flat file format or structured
under the ACNUC database model. The flat file can be used on any
kind of computer. The ACNUC version, on the other hand, needs
the retrieval program query. We provide executables of query for
the following architectures: Sun Sparc (under SunOS 4.1.x or
Solaris 2.x), IBM RISC, SGI, and DEC Alpha. Sources of the
line-mode version of query (in Fortran and C) are also included
in the distribution. This line-mode version can be compiled and
run on almost any UNIX system (BSD or SysV). To do so, you need
a Fortran compiler and a C compiler installed on your computer.
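
Because the flat file is plain text, it can be inspected without
query. The following Python sketch counts entries and feature
keys, assuming only the general EMBL conventions (an entry starts
with an "ID" line and ends with "//"; feature table lines start
with "FT", with the feature key beginning in column 6). The
function name and the two sample entries are invented for
illustration and are not taken from NRSub.

```python
from collections import Counter

def summarize_embl(lines):
    """Return (number of entries, Counter of feature keys)."""
    entries = 0
    features = Counter()
    for line in lines:
        if line.startswith("ID "):
            # Each entry opens with exactly one ID line.
            entries += 1
        elif line.startswith("FT") and len(line) > 5 and line[5] != " ":
            # A non-blank column 6 marks a new feature; qualifier and
            # continuation lines leave that column blank.
            features[line[5:].split()[0]] += 1
    return entries, features

# Hypothetical two-entry sample, for illustration only.
SAMPLE = """\
ID   EXAMPLE1   standard; DNA; PRO; 300 BP.
FT   CDS             1..120
FT                   /gene="exA"
FT   tRNA            200..275
//
ID   EXAMPLE2   standard; DNA; PRO; 150 BP.
FT   rRNA            1..150
//
"""

entries, features = summarize_embl(SAMPLE.splitlines())
print(entries, dict(features))
```

To run it on the real file, pass an open file object instead of
the sample, for example summarize_embl(open("NRSub.dat")).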

Detailed instructions for set-up and use are given in the INSTALL
file of the package.

III- Distribution

Release 4 of NRSub is available from the NIG anonymous FTP
server, in the directory /pub/db/nrsub.
It is also possible to access NRSub through a WWW server at URL:

The distribution includes: 

  - The NRSub database under ACNUC and the sources of the query
    program for line-mode use (file NRSub.r4.tar.Z).

  - The flat version of the NRSub database in EMBL format (file
    NRSub.dat or NRSub.dat.Z).

  - The binaries of the graphical version of the query program
    (files query_win.*.Z): query_win.SUN is for Sun Sparc under
    SunOS, query_win.SOL is for Sun Sparc under Solaris,
    query_win.RS6000 is for IBM RISC, query_win.ALPHA is for DEC
    Alpha, and query_win.SGI is for Silicon Graphics.

All the *.Z files are compressed using the UNIX command
'compress'. To uncompress them, use 'uncompress'. The flat file
version of NRSub is distributed either compressed (NRSub.dat.Z)
or in plain text (NRSub.dat) for people not working on a UNIX
machine.
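
As a sketch of how the decompression step could be scripted, the
Python helper below calls 'uncompress -c' and writes the result
next to the archive. It assumes that an 'uncompress' command is
available on the PATH; the function names are illustrative and
are not part of the NRSub distribution.

```python
import subprocess

def target_name(archive):
    """Return the output file name for a .Z archive (suffix removed)."""
    if not archive.endswith(".Z"):
        raise ValueError(f"not a compress(1) archive: {archive}")
    return archive[:-2]

def decompress(archive):
    """Decompress archive with 'uncompress -c', keeping the original."""
    out = target_name(archive)
    with open(out, "wb") as handle:
        # -c writes the decompressed data to standard output,
        # so the .Z file itself is left untouched.
        subprocess.run(["uncompress", "-c", archive],
                       stdout=handle, check=True)
    return out
```

For example, decompress("NRSub.dat.Z") would produce NRSub.dat in
the same directory. The ACNUC archive NRSub.r4.tar.Z additionally
has to be unpacked with tar once it is decompressed.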

If you have any problems or questions, feel free to contact me
at the following address:

                 Guy Perriere
                 National Institute of Genetics
                 Shizuoka-ken 411, Mishima

                 Email: gperrier at

