Sequence databases questions

David Kristofferson kristoff at
Sat Apr 25 11:55:16 EST 1992

timmer at CCWF.CC.UTEXAS.EDU (Richard T. Timmer) writes:

>I'm sure that the following are a series of naive FAQ's, but
>any help would be greatly appreciated (and I hope that this
>is the correct bionet section to post this on).  So with those
>introductions aside, I hope that someone can help with the 

>     1. Is there a GenBank mail server that can automatically
>process searches for sequences by query keyword or phrase?

Not at GenBank.  You need to access our IRX account (info below) for
this purpose.  The account requires interactive login.  Dan Davison's
server at Houston allows keyword searches by e-mail and I leave a
description of that to him.

>     2. In addition to FASTA and BLAST, what other types of 
>access to GenBank (or other genetic/protein databases) are 
>available via the Internet?

Again see the info below.

> Where does one find out about these 
>other types of access?

Right here or better yet by posting to the
GenBank-BB/bionet.molbio.genbank newsgroup.  E-mail address is
genbankb at

> In addition to the ability to do 
>keyword or phrase searches of the databases (as mentioned 
>above), I also was curious whether there was something similar
>to NCBI's EntrezSequences available via some type of Internet 
>server or via Gopher. 

Don Gilbert has set one up.  He can respond to this.  I should note
that the GenBank On-line Service is shutting down at the end of
September, so the services mentioned herein are of a temporary nature.
NCBI will be picking up some of them.

>    3. Has anyone (individual or organization) collected into 
>some type of database a compilation of motifs (DNA, RNA, and 
>protein) described in the literature?

PROSITE (available for FTP from and other places) is a
public domain database for protein motifs.  The only DNA motif databank that
I am personally aware of (since I work for IntelliGenetics) is KeyBank
which works with the IG Suite.

> If such a compilation is
>available, can it be searched by homology comparison against
>a query sequence (e.g. my favorite protein)?

Yes.  You can get access to KeyBank via a class 2 GenBank On-line
Service account.

>     4.  I suppose this is an extension of #2, or perhaps more
>exactly what I was trying to ask initially.  I was curious as 
>to the types of genetic/protein databases that are accessible via 
>Internet (FTP or Gopher) and the nature of the access (or 
>database search services) available.

Most of the major sequence databases are available for FTP from and as well as from several other


				Dave Kristofferson
				GenBank Manager

				kristoff at


The GenBank On-Line Service

The GenBank On-Line Service (GOS) provides access to the most recent
quarterly releases of the GenBank and EMBL nucleic acid sequence databases, 
as well as the data added to each of these since their most recent releases 
(in the New Data databases).  In addition, the Swiss-Prot protein sequence 
database and GenPept, a database of peptide sequences derived by the 
automatic translation of annotated coding regions of entries in the 
GenBank databases, are available.  Users can query the databases by 
annotation keywords, search for sequence similarity, and retrieve entries 
of interest.  The GOS is available through e-mail servers, anonymous FTP, 
anonymous interactive login, and login to established, password-protected, 
individual accounts.  Access to all GOS services is available to both 
commercial and non-commercial users at the same cost.  On-line help is 
available for all aspects of this Service.  User manuals, information
on costs, and application forms may be requested from GenBank at


Interactive access to the GOS databases is provided through the SprintNet
public data network and via remote login over the Internet.  At present,
the IRX (Information Retrieval Experimental Workbench) program is the
primary interactive database retrieval program.  Three usage classes are
available for the GOS; these classes are described below.

Class 0 Accounts

Anonymous users of the interactive system are provided with 20 minute
sessions using the IRX retrieval program.  With this program, entries
in any of the on-line databases can be located by searching for a
keyword or combination of keywords appearing in any of the fields of
the entries' annotations.  Located entries can be displayed on the
terminal or downloaded to the user's computer with the Kermit
file-transfer program.  (The Kermit program is available for a wide
variety of computers from numerous software bulletin boards, user
groups, and from Columbia University.  MS-DOS and Macintosh versions
are available from GenBank on request.)  New users of the IRX program
should read the on-line introduction which can be displayed by
answering 'Y' to the first question the program asks ("Do you want

To use the GOS Class 0 account, one must have a supported terminal or
a computer with software for emulating one of those terminals (see the
list in the Example at the end of this message) and a modem capable of
communicating at 300, 1200, 2400, or 9600 baud.  Instructions for
dialing to access the GenBank computer are shown in the example below.
After completing the login procedure shown in the example, the IRX
database query program is immediately started.

Class 1 Accounts

To gain access to additional services, users of the GOS may wish to
establish accounts on the GOS computer.  These accounts provide access
to the GOS computer, 1 Mbyte of disk space for user files, access to
IRX, the GenBank relational database management system, and
interactive and batch mode use of the FASTA, TFASTA (a version of FASTA
that compares a peptide sequence with a nucleic acid sequence database
by translating the database sequences in up to six reading frames "on
the fly"), and BLAST similarity search programs.  Class 1 accounts
also provide electronic mail access for contacting other users of the
GOS and users of computers connected to the Internet and other
computer networks. Access to a wide variety of electronic bulletin
boards is also provided.  Newsgroups that may be of special interest
are the bionet.journals.contents newsgroup which provides on-line
versions of the tables of contents of several important journals
before publication and bionet.sci-resources which provides on-line
copies of the NIH Guides to Grants and Contracts.  Several other
newsgroups are available for exchange of information on experimental
protocols and other areas of scientific interest.

Class 2 Accounts

For an additional fee, Class 2 users are provided with access to the
IntelliGenetics Suite of sequence analysis programs and databases
formatted for those programs.  Additional databases (e.g., the PIR Protein
Sequence Database, KeyBank(TM), and VectorBank(TM)) are also available to
Class 2 users.  Class 2 users also have access to all the facilities
available to Class 1 users.


In addition to providing interactive access, GenBank currently offers
three electronic mail servers, two for sequence similarity searching
and one for database entry retrieval.  These are freely available to
anyone who can send mail to an Internet address.  The following
networks have gateways to the Internet: BITNET, EARN, NETNORTH and
JANET.  Users of computers on these networks may need to change the
format of the addresses given below to send the message through a
forwarding gateway.  Users should consult their computer system
managers or administrators to determine the proper forwarding gateway
and address form.  Questions regarding the use of the e-mail servers
(or other technical support questions about GOS) may be addressed to:

FASTA Server

The GenBank FASTA Server receives mail messages containing a nucleic acid
or protein query sequence with instructions for the search. The server
then performs a FASTA sequence similarity search against the specified
database, and returns the results by electronic mail.

To use the FASTA Server, send an electronic mail message containing the
formatted query sequence to the following Internet address:
SEARCH at GENBANK.BIO.NET.  To receive instructions for formatting the query
sequence, send a mail message to this address containing the word "HELP"
as the only line of the message.

BLAST Server

The GenBank BLAST Server receives mail messages containing a nucleic acid
or protein query sequence with instructions for the search. The server
then performs a BLAST sequence similarity search against the specified
database, and returns the results by electronic mail.

To use the BLAST Server, send an electronic mail message containing
the formatted query sequence to the following Internet address:
BLAST at GENBANK.BIO.NET.  To receive instructions for formatting the
query sequence, send a mail message to this address containing the
word "HELP" as the only line of the message.
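
As a rough illustration of the HELP request described for the FASTA and
BLAST Servers, the following Python sketch builds (but does not send) such
a message.  The function name build_help_request and the sender address
are hypothetical, and writing the server addresses with "@" is an
assumption based on the "NAME at HOST" form shown in this message.

```python
from email.message import EmailMessage

# Addresses as named in the text; the "@" spelling is an assumption,
# because this message shows them in "NAME at HOST" form.
SERVERS = {
    "fasta": "SEARCH@GENBANK.BIO.NET",
    "blast": "BLAST@GENBANK.BIO.NET",
}


def build_help_request(server: str, sender: str) -> EmailMessage:
    """Build, without sending, a HELP request for one of the mail servers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVERS[server]
    # The text says the word HELP must be the only line of the message.
    msg.set_content("HELP\n")
    return msg


if __name__ == "__main__":
    # Hypothetical sender address, for illustration only.
    print(build_help_request("blast", "user@example.org"))
```

The query-sequence format itself is not shown here because it is defined
only in the instructions the servers return.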

Entry Server

E-mail access to sequence database entries is provided for three
reasons: 1) to enable users of the FASTA and BLAST Servers to retrieve
entries identified by sequence similarity searches; 2) to enable users
of the Class 0 interactive system described above, who access it by
network remote login (e.g., telnet) to retrieve copies of entries of
interest; and 3) to enable readers of journals that identify published
sequences by accession number to retrieve computer-readable versions
of those sequences.  To retrieve a database entry, send a mail message
containing only the entry name or the accession number (not both) to
the address: RETRIEVE at GENBANK.BIO.NET.  Multiple entries may be
requested in the same message if entered on separate lines.  The
on-line databases are searched and the entry (if any) which
corresponds to the supplied entry name or accession number will be
returned by electronic mail.  To receive instructions on using the
Entry Server, send a mail message to the RETRIEVE address (above)
containing the word "HELP" as the only line of the message.  Because
of the order in which the databases are searched, if both GenBank and
EMBL data banks contain entries with the same primary accession number
(the usual case), a query on the accession number will result in the
GenBank version of the entry being returned.  If the EMBL-format
version of the entry is required, it can be retrieved from the EMBL


In addition to interactive access and electronic mail servers, GenBank
also provides files for anonymous FTP (File Transfer Protocol),
including GenBank and EMBL new data and contributed software.  Each week
the new entries created in the GenBank database are collected into an
update file.  The file has a name in the form of gbMMDD.seq, where MM is the
number of the month and DD is the date of file creation.  Likewise, new
EMBL entries are collected into files with names in the form of emMMDD.seq.
The weekly update files are kept in the new data directories until they
are superseded by a new quarterly release of the database.
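
The naming scheme for the weekly update files can be expressed as a small
Python helper.  This is a sketch under two assumptions: that MM and DD are
zero-padded to two digits, and that the date used is the file creation
date.  The function name update_filename is hypothetical.

```python
from datetime import date

# Prefixes as given in the text: gbMMDD.seq and emMMDD.seq.
PREFIXES = {"genbank": "gb", "embl": "em"}


def update_filename(created: date, database: str = "genbank") -> str:
    """Return the weekly update file name for a given creation date."""
    # Zero-padding of month and day is assumed from the MMDD pattern.
    return f"{PREFIXES[database]}{created.month:02d}{created.day:02d}.seq"


if __name__ == "__main__":
    print(update_filename(date(1992, 4, 25)))
    print(update_filename(date(1992, 4, 25), "embl"))
```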

To access any of the files available for anonymous FTP, one should use
the FTP protocol to connect to GENBANK.BIO.NET [], using
"anonymous" as the Username and one's surname as the Password.

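The anonymous FTP login described above (Username "anonymous", surname as
the Password) might look like this with Python's standard ftplib module.
It is a minimal sketch, not a tested recipe for this server: the function
name fetch_anonymous and the ftp_factory parameter are hypothetical, and
the remote path must be supplied by the caller because no directory names
are given in this message.

```python
import ftplib


def fetch_anonymous(host, remote_path, local_path, surname,
                    ftp_factory=ftplib.FTP):
    """Download one file by anonymous FTP, using the surname as password."""
    ftp = ftp_factory(host)
    try:
        # "anonymous" as the Username and one's surname as the Password.
        ftp.login(user="anonymous", passwd=surname)
        with open(local_path, "wb") as fh:
            # RETR streams the remote file to the callback in binary mode.
            ftp.retrbinary(f"RETR {remote_path}", fh.write)
    finally:
        ftp.close()
```

A call would take the form
fetch_anonymous("GENBANK.BIO.NET", remote_path, "update.seq", "smith"),
where remote_path and "smith" are placeholders.  The ftp_factory argument
exists only so the function can be exercised without a network connection.
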
Example.   Login to the free GOS IRX account

ATDT14159616860			Use ATDP for pulse dialing phone.
CONNECT 2400			Connect to 2400 baud modem

login:genbank		Typing 'genbank' allows you to access the GenBank
Password:4nigms		This is the password for the GenBank computer; it
			MUST be entered in lowercase characters.  
Last login...		This message includes a date showing the last 
			anonymous login, as well as other system
SunOS Release 4.0.3 (GENBANK)

The following is a list of commonly used terminals 
Designation 	Terminal Type 
adm3a 		Lear Siegler (ADM) 
aaa-48 		Ann-Arbor Ambassador in 48 line mode
aaa-60 		Ann-Arbor Ambassador in 60 line mode 
dm3025 		Datamedia 3025a 
h19		Heath H19 or Zenith 
hp2621 		Hewlett Packard HP2621 
hp2648-iv 	Hewlett Packard HP2648A 
sun 		Sun Microsystems Workstation console 
tvi912 		Televideo 912, 920 
tvi950 		Televideo 950 
vi200 		Visual 200 
vt100 		Digital Equipment VT100 (default) 
vt102 		Digital Equipment VT102 
vt200 		Digital Equipment VT200 
Press Return to select vt100, or enter the appropriate terminal

(type the designation of the appropriate terminal type followed by <CR>)

After completing the login procedure shown above, the IRX sequence entry
searching program is immediately started.


Further information about the GenBank On-line Service may be obtained by
contacting GenBank at:

      c/o IntelliGenetics Inc.
      700 East El Camino Real
      Mt. View, CA  94040
      (415) 962-7364
      gos at
