Tue Aug 24 17:03:06 EST 1993

[Given the growing importance of informatics for biological research, this
message is being posted to multiple lists; apologies to those who subscribe to
more than one of these mailing lists and therefore receive multiple copies.]

                         11 August 1993 

Dear Colleague,

Biologists who are routine users of the Internet are all well aware
of the twin revolutions in the biological sciences and in
information technology.  Uniting the advances in computers,
informatics, and networking with those in biology is widely
recognized as a key to understanding the inherent complexities of
living systems.  As the genome project develops the requisite
capacity to increase sequencing throughput by up to three orders of
magnitude, the technological burden will shift increasingly to
informatics.  An Advisory Group to the Department of Energy (DOE)
has observed: "The success of the genome project will increasingly be
judged by the ease with which accurate and timely answers to
interesting questions about genomic data may be obtained."  Indeed,
if the genome project is to serve, as expected, as the basis for
future departures in biological research and medicine, a diverse,
creative, and robust informatics effort linking broad areas of
biological knowledge with DNA sequence data will be essential.

The DOE Office of Health and Environmental Research (OHER) has
supported a portfolio of research projects in genome informatics. 
We expect to expand such informatics and computational biology
support broadly over the next few years, since we are committed to
ensuring that the necessary computational tools and data resources
are developed and enhanced to exploit the products of the genome
project for the biological community.  Recognizing the growing
importance of computer and information science across all of
biology, we will also seek to embed support for informatics
infrastructure within structural biology, microbial genome
research, and other areas of biotechnology and biomedical
applications that DOE supports.

Of particular importance are tools and data resources that permit
integrated views of diverse biological data.  To ensure the high
quality of the programs we support and to develop a strong vision
for the longer-term future, we asked an independent panel to review
the entire existing OHER program in bioinformatics, including
research at private institutions, universities, and the national
laboratories.  We then held a series of planning meetings and
discussions focusing on community needs for information tools and
resources.  We are now circulating for comment a draft document
that is intended to become a white paper capturing the essence of
the advice we have received.

After additional input from the community, this draft will be
turned into a White Paper on Bioinformatics, to be released by
October 1, 1993, if not sooner.  By that time, the White Paper will
be available electronically, including through the computational
biology Gopher at Johns Hopkins ( under the directory "Mathematics
and Biology".  The draft version can already be obtained
electronically under that directory; comments may be sent to
jay.snoddy%er at
or by mail to HGMIS (see below).  Hard copies of the draft version
are also available from HGMIS; after October 1, 1993, hard copies
of the final White Paper will be available from HGMIS at:
               HGMIS (Human Genome Information Management System)
               Oak Ridge National Laboratory
               PO Box 2008
               Oak Ridge, TN 37831-6050
                    Internet: bkq at
                    Fax  615-574-9888

We expect this first White Paper, and others to be developed, to
be valuable to the international community and to funding
agencies, and to encourage widespread discussion on how best to
unite the twin revolutions in biology and information technology.
DOE is using the White Paper as one input in setting priorities for
continued funding of critical areas of bioinformatics research and
in identifying new areas for development.  DOE will cooperate
closely with all interested funding agencies, software developers,
and researchers.  DOE is strongly committed to ensuring that the
informatics infrastructure required for the genome project,
including DNA sequence data and mapping data, will be maintained,
improved, deeply integrated, and available across the Internet
through new interfaces supported by rigorous standards.  DOE is
also committed to facilitating not only nationwide US efforts but
also coordination with international collaborators, so as to
deliver an ever-increasing level of support to the community as a
whole.

To ensure that the growing informatics needs of the genome
community are met, to exploit advances in computer networking
technology, and to respond to our advisors and to the suggestions
made in the White Paper, DOE will support further development of
the DNA sequence database activities at Los Alamos National
Laboratory (LANL).  In these efforts, LANL is collaborating with
private institutions, universities, and other national
laboratories.  Building on LANL's core work in this area over more
than a decade, DOE research support for further development of
these collaborative efforts will focus on the following
components:

** Increased emphasis on on-line submission and maintenance. **
For the growing portion of the user community on the Internet, we
will emphasize the use of on-line data submission tools, such as
DOE's Annotator's WorkBench (AWB), over batch-submission tools
such as Authorin.  In addition to simplifying the submission
process, the AWB enables submitters to correct their own data
without the intervention of our staff (although the staff will
continue to review all such changes), and makes it possible for
other researchers to add their own (properly attributed) annotation
to existing entries.

** Renewed emphasis on remote database access. **  In addition to
our support for relational satellite copies of the DOE-Los Alamos
database, we will provide direct Sybase client-server access at
Los Alamos and a number of other DOE-supported sites for remote
SQL queries.  Further, the relational schema of the DNA Sequence
Database will evolve rapidly, both to better support the most
common queries and to provide the basis for queries involving
other key genome and structural biology databases.

** Increased emphasis on quality control. **  In addition to the
work that the annotation and review staff does on quality control
of submissions, DOE/Los Alamos and other collaborators will focus
increasingly on the task of automating the quality control process.

** Continued processing of direct submissions. **  To enhance the
relational version of the DNA Sequence Database and maintain the
necessary capacity for genome submitters and the genome project,
DOE/Los Alamos will continue to receive and process direct
submissions (normally within 48 hours) at the following email and
postal addresses.  (Genome submitters may obtain a Mac or PC
version of Authorin by anonymous FTP from, in the pub/authorin
directory.)

    - Electronic submissions: gb-sub at  
    - Corrections, additions: update at   
    - Diskette submissions:
                    Data Submissions
                    Group T-10, Mail Stop K710
                    Los Alamos National Laboratory
                    Los Alamos, NM  87545

** Continued international collaboration. **  All data processed by
DOE/Los Alamos will be shared with the DDBJ, EMBL, and NCBI
databases.  Collectively, the collaborating databases will continue
to work toward the goal of presenting at least the appearance of a
single international data collection, to facilitate access by
researchers and software developers.


David J. Galas
Associate Director 
Health and Environmental Research
Washington, DC 20585

