MycDB 3-8 data release

Staffan Bergh staffan at biochem.kth.se
Wed Jun 21 00:16:51 EST 1995


Stockholm & Paris June 18, 1995

The eighth release of the Mycobacterium database, MycDB, is now available.

MycDB is funded by the WHO and the Fondation Raoul Follereau and is maintained
jointly by the Unite de Genetique Moleculaire Bacterienne at the Institut
Pasteur in Paris, France and the Department of Biochemistry at the Royal
Institute of Technology in Stockholm, Sweden.

MycDB is available free of charge via Internet network transfer. A complete
description of the procedure to retrieve and install the software and database
is available through the Internet (see the end of this message for addresses).

MycDB is also available through the WorldWideWeb servers at the Royal Institute
of Technology (URL: http://www.biochem.kth.se/MycDB.html) and (through the good
offices of the Genome Informatics Group at the National Agricultural Library)
on the Agricultural Genome Information Server, AGIS, in Beltsville, Maryland,
USA (WorldWideWeb URL: http://probe.nalusda.gov:8300/other/index.html or gopher
URL: gopher://probe.nalusda.gov:7000/11/genome.databases/mycdb/).

MycDB uses the excellent database software written by Richard Durbin (MRC-LMB,
UK) and Jean Thierry-Mieg (CNRS, France). The ACEDB software allows the user to
browse information by simply pointing and clicking with the workstation mouse.
A variety of powerful query methods are also available. However, our experience
is that most users choose the mouse interface to find the information they are
interested in.

As far as possible all information is connected to other information in the
database. The database software presents the information in separate windows
that allow many parts of the database to be viewed at one time. There are also
many paths to any piece of information, allowing the user to easily navigate
the connections between the various types of information.

New in release 3-8:

   * New and edited sequences from EMBL/GenBank/DDBJ, as of May 31, 1995. Due
     to extensive reanalysis, annotation and merging of overlapping sequences,
     and the removal of the blast search data (pending new data), the number of
     sequences has decreased drastically (from over 4000 Sequence objects to
     1832, and from 543 DNA objects to 515). However, the total number of
     nucleotides has increased by 171,008 (12.3%) to 1,560,886 nt. Thus, this
     is a considerably less redundant set of sequences from mycobacteria than
     in previous releases. We intend to continue this analysis and merging of
     sequences, and also add blast searches and coding frame predictions to the
     data set.
     There are two new subclasses, derived from the Sequence class:
     NucleotideSeq contains Sequence objects with attached nucleotide sequence
     (this replaces the earlier DNA class in the Main Window) and PeptideSeq
     contains protein sequences.

   * 112 new literature citations from MedLine, current as of May 31, 1995.
     The links to abstracts in the set of papers in earlier releases have been
     rechecked and errors removed. Of 3180 Paper objects in release 3-8, 2399
     have attached abstracts.

   * The Antigen class has been updated, mostly from a preview of a manuscript
     by Jelle Thole et al. The number of antigens has increased slightly (from
     202 to 249), and the number of antigens crossreferenced to sequences has
     almost doubled (from 49 to 97). Three new antigen codes have been assigned
     (in consultation with Jelle Thole).

   * The Locus class has been cleaned up, uninformative or erroneous names
     removed and crosslinking to Sequences and Antigens improved. The number of
     loci has consequently dropped from 373 to 312. We have tried to follow the
     locus naming convention used in the E. coli community. When function
     assignment is based on homology, the name used for the corresponding locus
     in E. coli has generally been used.
     There is a new subclass derived from the Locus class: Gene contains Loci
     for which sequence is known.

   * The number of objects in the Strain class has almost doubled (from 185 to
     302) and the Strain entries in sequences have been completely reanalysed,
     resulting in more complete crosslinking between strains and sequences.
     Strains are now crossreferenced to Antibodies and Antigens.

   * The Colleague class has been updated with some new additions and new
     addresses for some colleagues.

For information on earlier releases, see MycDB.3-6.Release.

The database currently requires a Unix workstation running X-Windows. A variety
of precompiled versions of the ACEDB database software are available through
anonymous ftp. See the file MycDB.Retrieval for more info. A Macintosh version
of the ACEDB software is also available, and we are working with the developers
of that version to produce a complete Macintosh archive, with a preloaded
database.

The model changes for this release are minor, and we are therefore only
distributing difference files. Thus no complete reload is necessary. The update
has been split in two files to keep file sizes reasonable. The files are in
compressed tar files called update.myc.3_7.tar.Z and update.myc.3_8.tar.Z. They
are intended to be installed together. If you already have MycDB installed on
your system, you only need these two files - put them in the same directory as
the database directory, uncompress and unpack them ('zcat update.myc.3_*.tar.Z
| tar xvf -') and then start MycDB, choose Add Update File from the drop-down
menu in the Main Window, and click on the All Updates button. If you do not
have the database installed on your system, the procedure to install the
database from scratch is described in the file MycDB.Retrieval.
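
The unpacking step above can also be done one archive at a time. This is only
a sketch, not part of the official procedure: some tar implementations stop
reading at the end of the first archive when several are concatenated on
standard input, so the second update could be skipped silently. The function
name unpack_mycdb_updates and its directory argument are hypothetical, and
'gzip -dc' is used here in place of zcat on the assumption that gzip (which
reads compress-format .Z files) is installed.

```shell
# Hypothetical helper: unpack every MycDB update archive found in the
# given directory (default: current directory), one archive at a time.
# Runs in a subshell so the caller's working directory is unchanged.
unpack_mycdb_updates() (
    cd "${1:-.}" || exit 1
    found=0
    # The shell expands the pattern in sorted order, so 3_7 comes before 3_8.
    for archive in update.myc.3_*.tar.Z; do
        # Skip the literal pattern when no archive matches.
        [ -f "$archive" ] || continue
        echo "unpacking $archive"
        # Assumption: gzip is available; it decompresses .Z files like zcat.
        gzip -dc "$archive" | tar xf - || exit 1
        found=1
    done
    # Report failure if there was nothing to unpack.
    [ "$found" -eq 1 ]
)
```

Called as 'unpack_mycdb_updates /path/to/parent' (the directory holding the
database directory and the two update files), it returns a non-zero status if
no update archive was found or if any archive failed to unpack. The update is
then added from within MycDB as described above.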

This release, like the earlier releases 3-5 and 3-6, is intended to be used
with version 3_0 of the ACEDB database manager.

Unfortunately, there is currently some confusion with the binary distributions
of the different versions of ACEDB: ACEDB3_0 is available from MRC in Cambridge
(cele.mrc-lmb.cam.ac.uk (131.111.84.1) in pub/acedb/ace3). Other sites only
have ACEDB3_7, which is a preview of ACEDB4, a greatly enhanced version of the
database manager that is currently in beta test. ACEDB3_7 has some bugs that
will make it crash on updating, printing and on displaying some Sequence
objects. It also requires a different set of database specifications. As soon
as version 4 is stable, we will make a new release of data for MycDB, but in
the meantime please try to use ACEDB3_0. If you absolutely can't get version
3_0, contact Staffan.

If you wish to obtain the MycDB database please contact us, via e-mail, fax,
mail or telephone. If you are impatient and are already familiar with
internet/ftp, all relevant information can be found at

 * the WWW server at the Department of Biochemistry, KTH:
     http://www.biochem.kth.se/MycDB.html
 * the ftp server at the Department of Biochemistry, KTH:
     ftp.biochem.kth.se (130.237.52.64) in pub/MycDB
 * the ftp server at Institut Pasteur:
     ftp.pasteur.fr (157.99.64.12) in pub/MycDB
 * or the mirror site at the Weizmann Institute in Israel (thanks Jaime!):
     bioinformatics.weizmann.ac.il (132.76.55.12) in pub/databases/acedb/mycdb

The file MycDB.Retrieval describes in more detail the database system
requirements, network retrieval procedures for obtaining the database and
methods of obtaining future updates.

For more information contact Staffan Bergh or Stewart Cole.

Staffan Bergh
Biochemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden
Email: staffan at biochem.kth.se
Fax: (46 8) 24 54 52
Voice: (46 8) 790 8758

Stewart Cole
Unite de Genetique Moleculaire Bacterienne, Institut Pasteur, F-75724 Paris
Cedex 15, France
Email: stcole at pasteur.fr
Fax: (33 1) 45.68.85.93
Voice: (33 1) 45.68.84.46


