ACEDB data update 1-15 to 1-21 and code 1-10

Mike Cherry CHERRY at FRODO.MGH.HARVARD.EDU
Tue Aug 3 19:20:41 EST 1993


ACEDB data update 1-15 to 1-21 and code 1-10
--------------------------------------------

We are sending this list to all those who have asked to be on our
mailing lists, plus all those who have installed ACEDB using the
automatic INSTALL script.  We have eliminated all exact duplicates,
but some of you may have received more than one copy, in which case
please excuse us.

This is a major data release for the C. elegans database.  There are
also a number of code improvements since 1-9, but we are planning a
more significant code update for the next release.

Installation
------------
Ftp the correct tar.Z files and our INSTALL script into the ACEDB
main directory and type INSTALL. The details are explained at the
end of this letter.

NEW DATA: 
---------
We know that we have let the C. elegans database run nearly 6 months
out of date.  We apologize for not distributing data regularly.  We
now have a lot of new data, including the new CGC genetic map and the
sequencing project cosmids as at the worm meeting.

The files in this update are compatible with the previous versions
of ACEDB.  In order to make the new genetic map, we have worked in
Cambridge with a slightly different data format that allows us to
represent properly the map data that was sent in earlier this year.
We plan to make a new release 2-1 in the next month or two which
will use the new data structures.

There are 7 new update files.
1-15	new genetic map, including estimated accuracies
	new genetic map data, in old form (see above)
	new gene data, and gene names
	physical map 28/6/93
	bibliography up to around March 1993
	gazette and worm meeting titles and authors up to wm93
	changes to other minor classes
1-16	text for all gazette articles up to wbg12.1 (September 1991),
		(thanks to WCS for allowing us to take this data)
1-17	new DNA sequences, including all the cosmids referred to
		at the worm meeting.
1-18, 1-19, 1-20, 1-21 annotations for sequences

The gazette article text is stored under the "Abstract" tag.  You
can search it all for any component word by setting the "Long
Search" option on the main window before starting a text search.
This will take a little longer than a standard text search, since
the abstracts must be read in from disk.

When loading this data you will be asked what size to increase the
database to.  Reply 90000 (ninety thousand).  This takes 90Mb
of disk space.

We realise that the sequence annotations are very bulky.  The main
reason is that we are storing the complete results of a BLASTX
database search for all the cosmids and cDNA sequences.  This
includes the name and title of all matching protein sequences, as
well as scores and offsets.  We plan to change this structure
before the next release, which will shrink the size of the whole
database significantly.

NEW PORTS:
----------
Acedb has now been ported to several new Unix platforms: Sun-Solaris,
HP, PC-486 running LINUX, DEC-alpha, Convex etc. The complete list
is in the wmake directory. It has however become impossible to test
everything on every machine, so in case of problems please contact
mieg at kaa.cnrs-mop.fr

NEW CODE:
---------
The code has evolved in several directions.

-Fonts and Printer:
You can now choose your fonts and drive a color laser printer by
editing the self documented files wspec/xfonts.wrm and
wspec/psfonts.wrm 

-Subclass mechanism:
You can construct subclasses, which can be used in queries and from
the main menu, by editing wspec/subclasses.wrm

-Queries: 
You can now ask for, say, all authors with first letter m to z by typing
'> m' (greater than m) in the main window.  You can also search the
items in the LongText class by using the 'Long Search' button.

In addition 2 new interfaces, contributed by Gary Aochi of
LBL-Berkeley, will help you retrieve data. 'Query by examples' follows
a simple 'fill-the-blanks' paradigm. It is available from the main
menu. Query builder helps you write complex queries; it is available
as a button in the query window.
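
As an illustration only, here is one reading of what the '> m' query
selects, written as a small Python sketch.  This is not ACEDB code: the
function name names_from and the sample names are made up for the
example, and we assume the comparison is case-insensitive and made on
the first letter, as the description "first letter m to z" suggests.

```python
def names_from(names, lower_bound):
    """Keep names whose first letter is at or after lower_bound.

    Assumption: the match ignores case and looks only at the first
    character, mirroring "first letter m to z" for '> m'.
    """
    bound = lower_bound.lower()
    return sorted(n for n in names if n[:1].lower() >= bound)


# Sample names, for illustration only.
authors = ["Cherry M", "Aochi G", "Sonnhammer E", "McCarthy J"]
selected = names_from(authors, "m")
print(selected)
```

The real query language may treat case or non-letter characters
differently.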

-Tables:
Table maker, available from the main menu, is a relatively friendly
interface that lets you export data as relational tables compatible
with other systems like Sybase or Lotus-1-2-3. In addition, you can
register some table definitions with some classes in
wspec/table.menu.wrm and they are then available as a button in text
displays of the objects of that class. 

-Server clients:
A server client architecture is being developed for acedb. At
present it only runs in non graphic mode. It can however be very
useful to import and export data in and out of acedb in batch mode.
The export supports several formats: fasta, ace, and a user
defined format which allows you to intersperse your text with
exported data (see wdoc/client.report.example). The binaries for the
server client architecture are not distributed; you must recompile
them or contact us.

-Metadata:
If you are interested in acedb for organisms other than the nematode,
John McCarthy and his group at LBL are working on an online help system
on the significance and use of the tags and models of acedb.  This
code is not part of the present release, but may be of interest to
you.

-Genetic map display: 
The presentation of the genetic map in this release is nearly exactly
the same as in the previous release.  As explained above, a modified
display will be released in the future, that will not be compatible
with the present data model. Developers will have the choice of
keeping the older code or reformatting their data in the newer form.
They should contact us now for a prerelease of the new code.

-Sequence display: 
The same is true to a lesser extent of the feature map that presents
the dna sequence and features.  However in this case, the new data
model is rather a simplification of the older one.  The reformatting
of the data is easy and the present and future code are nearly
identical.

-Restriction analysis:
Restriction analysis of known sequences, available from DNA on the
main menu or Analysis at the end of the fmap menu, has been improved
and can also be used to locate PCR primers on your own sequences.

-Sequence fetch: 

An interface to external protein databases has been written by Erik
Sonnhammer at the Sanger Centre.  When you look at a sequence, blastx
homologies are displayed as vertical blue rectangles.  These support a
sub-menu that says "Fetch" and "Show Align".  You select it with the
right mouse button.  Fetch retrieves the whole entry from an external
database, and displays it in text form in an acedb window.  Show Align
retrieves the sequences and displays a multiple alignment to your
translated DNA sequence.  Both require an external program in
directory $ACEDB/wscripts called "fetch" which acts as follows:
  Usage: fetch -options <query>
         Options:
         -B <database> Choose other database [SW|PIR|EMBL]
         -q            Only the sequence (on one line - used by align)
Note that this is not quite the same as the GCG fetch command
(unfortunately).
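
If you want to drive such a "fetch" program from a script of your own,
the following Python sketch shows how the command line might be put
together from the usage text above.  It only builds the argument list
and does not run anything.  The function name build_fetch_command and
the example identifier are hypothetical; only the -B and -q options and
the SW, PIR and EMBL database codes come from the usage message.

```python
import os
import posixpath

# Database codes listed in the usage message above.
KNOWN_DATABASES = ("SW", "PIR", "EMBL")


def build_fetch_command(query, database=None, sequence_only=False,
                        acedb_dir=None):
    """Return the argument list for $ACEDB/wscripts/fetch.

    Hypothetical helper: it assembles the command but does not run it.
    """
    if acedb_dir is None:
        # Fall back to the ACEDB environment variable, else current dir.
        acedb_dir = os.environ.get("ACEDB", ".")
    argv = [posixpath.join(acedb_dir, "wscripts", "fetch")]
    if database is not None:
        if database not in KNOWN_DATABASES:
            raise ValueError("unknown database: %r" % (database,))
        argv += ["-B", database]          # choose another database
    if sequence_only:
        argv.append("-q")                 # only the sequence, on one line
    argv.append(query)
    return argv


# "EXAMPLE_ID" is a placeholder, not a real database entry.
print(build_fetch_command("EXAMPLE_ID", database="SW", sequence_only=True,
                          acedb_dir="/usr/local/acedb"))
```

The resulting list could be passed to subprocess.run on a machine where
the fetch program and its indices are actually installed.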

The program that we use requires EMBL CDROM style indices, as used by
the Staden package.  Source code for this version is available in 
fetch.tar.Z from the acedb directory at the standard ftp sites.

-----------------------------------------------------------------

	Instructions for obtaining updates/the whole thing

All the files are available in the following public access accounts
(anonymous ftp sites) accessible over the internet:

  lirmm.lirmm.fr (193.49.104.10) in France, in directory genome/acedb
  cele.mrc-lmb.cam.ac.uk (131.11.84.1) in England, in pub/acedb
  ncbi.nlm.nih.gov (130.14.20.1) in the USA, in repository/acedb

In each case, log in as user "anonymous" and give a user identifier
as password.  Remember to transfer the files in BINARY mode by
typing the word "binary" at the start of your ftp session.  Many
thanks to NCBI for letting us share in their excellent resource.

Example:

ftp 193.49.104.10
login: anonymous
password: 'whatever you want'
cd genome/acedb
binary
mget *
get README  y/n   answer yes   etc
..
..
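
The same session can be scripted.  The Python sketch below uses the
standard ftplib module to log in anonymously and copy every file in a
directory in binary mode.  The function names download_all and
fetch_acedb and the default password string are our own choices for
the example, and the sketch assumes the directory contains only plain
files.  Whether a given site still answers is not something the script
can promise.

```python
from ftplib import FTP
import os


def download_all(ftp, remote_dir, local_dir):
    """Copy every file listed in remote_dir into local_dir.

    ftp is an already logged-in connection.  retrbinary always uses
    binary transfer, the equivalent of typing "binary" by hand.
    Assumption: remote_dir holds only plain files, no subdirectories.
    """
    ftp.cwd(remote_dir)
    names = ftp.nlst()
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    for name in names:
        local_path = os.path.join(local_dir, os.path.basename(name))
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        fetched.append(name)
    return fetched


def fetch_acedb(host, remote_dir, local_dir, password="anonymous@"):
    """Log in as "anonymous" and download the whole directory."""
    with FTP(host) as ftp:
        ftp.login("anonymous", password)
        return download_all(ftp, remote_dir, local_dir)
```

For example, fetch_acedb("lirmm.lirmm.fr", "genome/acedb", "acedb")
would mirror the manual session for the French site listed above.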



