Latest FTP instructions from GenBank

Dave Kristofferson kristoff at GENBANK.BIO.NET
Tue Feb 20 19:29:48 EST 1990

	     Example of FTPing data updates from GenBank
		    (last update 2/20/90 - D.K.)

In the following example NET<n> is the Unix prompt on the computer
used for this example.  Your computer's system prompt will be
different.  Details of the ftp program may also differ on your
local computer, so please consult your local systems manager if the
following commands are not recognized.

Comments in the example below are set off by ***'s as here or by ;'s
at the end of a line.  Input is underlined and <cr> means press the
return key.  NOTE that the GenBank UNIX system is CASE-SENSITIVE.  Use
lower case input unless specifically noted otherwise below.

NET<1>ftp<cr>		;FTP to the computer
Connected to
220 GENBANK.BIO.NET FTP server (SunOS 4.0) ready.
Name ( anonymous<cr>	;use name "anonymous"
331 Guest login ok, send ident as password.
Password:	 <cr>				;use your user name as a
	 ------------				;password, e.g., kristoff
						;in my case.
230 Guest login ok, access restrictions apply.
ftp> ls<cr>					;list the directory contents
200 PORT command successful.
150 ASCII data connection for /bin/ls (,2591).
226 ASCII Transfer complete.
50 bytes received in .68 seconds (0.072 Kbytes/s)

The GenBank new data is actually kept a few subdirectories down under
pub/db/gb-newdata.  EMBL new data is under pub/db/embl-newdata.  For
the sake of brevity we simply change directly to the pub/db directory.

ftp> cd /pub/db<cr>
250 CWD command successful.
ftp> ls<cr>				;show GenBank and EMBL subdirectories
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
embl-newdata				;directory for new EMBL data
gb-newdata				;directory for new GenBank data
gb-rel62				;directory for GenBank Release 62
gp-rel62				;directory for GenPept Release 62
gp-newdata				;directory for new GenPept data
226 Transfer complete.
79 bytes received in 0.04 seconds (1.9 Kbytes/s)

Finally we change into the GenBank new data subdirectory.  The above
steps could have been omitted and one could simply use "cd
pub/db/gb-newdata" to switch to this directory after logging in.  The
ls command below shows the names of the new data files.  Note that the
file names all start with "gb" for GenBank, followed by the month and
day on which the week covered ends (each file contains the previous
week's data), and finally the extension .seq, which indicates
that the file contains nucleic acid sequence data.  A similar
file naming convention is used for EMBL data as can be seen by using
the ls command in the directory pub/db/embl-newdata.  Files that end
with a .Z extension after the .seq are compressed using the UNIX
"compress" utility.
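
The naming convention just described can be checked mechanically.  The
following Python sketch is only an illustration: the names
parse_weekly_name and WeeklyFile are hypothetical, and the pattern
assumes a lower-case prefix, a two-digit month, a two-digit day, the
.seq extension and an optional upper-case .Z.  Only the "gb" prefix is
confirmed by the text above.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for weekly update names such as "gb1225.seq" or
# "gb1225.seq.Z": prefix, 2-digit month, 2-digit day, ".seq", optional ".Z".
_WEEKLY_NAME = re.compile(r"^([a-z]+)(\d{2})(\d{2})\.seq(\.Z)?$")


class WeeklyFile(NamedTuple):
    prefix: str       # e.g. "gb" for GenBank
    month: int        # month of the week-ending date
    day: int          # day of the week-ending date
    compressed: bool  # True when the name ends in ".Z"


def parse_weekly_name(name: str) -> Optional[WeeklyFile]:
    """Return the parts of a weekly update file name, or None if it does not fit."""
    match = _WEEKLY_NAME.match(name)
    if match is None:
        return None
    prefix, month_text, day_text, z_suffix = match.groups()
    month, day = int(month_text), int(day_text)
    # Reject digit pairs that cannot be a calendar month or day.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return WeeklyFile(prefix, month, day, z_suffix is not None)
```

The match is case-sensitive on purpose, so a name ending in a lower
case ".z" is rejected, and cumulative files such as gbseq.all.Z do not
match because they carry no date.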

Notes: Weekly update files are available as standard ASCII files or
       as compressed ASCII files.  The compressed files are about
       one-third the size of the standard files.  They can be
       distinguished by the .Z suffix and can be uncompressed after
       transfer with the standard Unix uncompress utility.

       In addition to the weekly (incremental) update files, the
       newdata directories each contain a file which represents the
       cumulative contents of the (GenBank, GenPept, or EMBL) new data
       since the last (quarterly) public release, pruned of duplicate
       entries.  A given weekly update file will not contain duplicate
       instances of an entry, but a given entry may appear (with
       changed status) in more than one of the weekly update files.
       Within a cumulative update file (gbseq.all, gpseq.all, or
       emseq.all) there should be no duplicate entries.  The cumulative
       update files are updated daily and are available in compressed
       form only.

       The current GenBank release files (in gb-rel62) and
       GenPept release files (in gp-rel62) are provided
       in compressed form only.
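
The note above on duplicate entries can be illustrated with a short
Python sketch that combines several weekly files and keeps only the
most recent copy of each entry.  This is NOT the procedure used to
build the cumulative files; it is a sketch under two assumptions:
that each entry starts with a LOCUS line and ends with a "//" line, as
in the GenBank flat-file format, and that the locus name identifies an
entry.  The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def split_entries(text: str) -> List[str]:
    """Split flat-file text into entries, each terminated by a '//' line."""
    entries: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":
            entries.append("\n".join(current) + "\n")
            current = []
    # Any trailing lines without a closing '//' are ignored as incomplete.
    return entries


def locus_name(entry: str) -> str:
    """Return the second field of the entry's LOCUS line (assumed to be the key)."""
    for line in entry.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                return fields[1]
    raise ValueError("entry has no usable LOCUS line")


def merge_weekly(weekly_texts: Iterable[str]) -> List[str]:
    """Merge weekly files given oldest first; a later copy replaces an earlier one."""
    latest: Dict[str, str] = {}
    for text in weekly_texts:
        for entry in split_entries(text):
            name = locus_name(entry)
            latest.pop(name, None)  # drop the older copy so order follows last update
            latest[name] = entry
    return list(latest.values())
```

If you do not need to track week-by-week changes, retrieving the
cumulative file avoids this step altogether.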

ftp> cd gb-newdata<cr>
250 CWD command successful.
ftp> ls<cr>
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
gb1225.seq			;data for the week ending 12/25.
gb1225.seq.Z			;compressed data for the week ending 12/25.
gbseq.all.Z			;compressed cumulative update file.
226 Transfer complete.
372 bytes received in 0.02 seconds (18 Kbytes/s)

Next use the "get" command to retrieve the desired file.  These files
are typically 1 to 4 megabytes in size.  Transfer time will most
likely be limited by the speed of your local Internet connection.
The GenBank connection runs at 1.54 Mbps, so it will not be the
rate-limiting step in most cases.  The "get" command retrieves the file
over the Internet into your directory on your local computer.
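
To get a feel for the numbers above, the following Python sketch
computes a lower bound on transfer time from file size and line speed.
The 56,000 bps figure is a hypothetical slower local link chosen only
for comparison, and the calculation ignores protocol overhead and
competing traffic, so real transfers will take longer.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on transfer time: payload bits divided by the line rate."""
    return size_bytes * 8 / link_bits_per_second


GENBANK_LINK_BPS = 1.54e6  # the 1.54 Mbps GenBank connection mentioned above
SLOW_LINK_BPS = 56_000     # hypothetical slower local link, for comparison only

for megabytes in (1, 4):
    size = megabytes * 1_000_000
    fast = transfer_seconds(size, GENBANK_LINK_BPS)
    slow = transfer_seconds(size, SLOW_LINK_BPS)
    print(f"{megabytes} MB: at least {fast:.1f} s at 1.54 Mbps, {slow:.1f} s at 56 kbps")
```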

NOTES FOR VMS USERS: Some of the data files are in compressed format
(using the UNIX "compress" utility) and are named as follows:
gbxxxx.seq.Z.  To transfer these successfully to a VMS system, you will
have to rename them locally (before beginning the transfer) to
eliminate one of the "."'s, e.g., rename the file to gbxxxx_seq.Z.
Also note that case is important in the file names, i.e., the final
"Z" is definitely in uppercase!
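
The renaming rule for VMS can be written as a small Python helper.
This is a sketch only; the name vms_safe_name is hypothetical, and it
generalizes the example above by replacing every "." except the last
with "_" while leaving case untouched.

```python
def vms_safe_name(remote_name: str) -> str:
    """Replace every '.' except the last one with '_', preserving case."""
    stem, dot, extension = remote_name.rpartition(".")
    if not dot:
        return remote_name  # no '.' in the name, nothing to change
    return stem.replace(".", "_") + "." + extension
```

With many ftp clients the local name can be given as a second argument
to "get", e.g. "get gb1225.seq.Z gb1225_seq.Z"; check your local client
if this form is not accepted.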

ftp> get gb1106.seq<cr>		;get data for week ending Nov. 6th.
200 PORT command successful.
150 ASCII data connection for gb1106.seq (,2595) (1510609 bytes).
226 ASCII Transfer complete.
local: gb1106.seq remote: gb1106.seq
1535022 bytes received in 19 seconds (80 Kbytes/s)

After the first transfer is complete you can retrieve other files or
use the "bye" or "quit" command to end the session.

ftp> bye<cr>
221 Goodbye.
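
The same session can be scripted.  The sketch below uses Python's
standard ftplib module and is an illustration only: fetch_file and
download_update are hypothetical names, the host name is left as a
parameter for you to supply, and the file is transferred in binary
mode, which is the safe choice for the compressed .Z files.

```python
from ftplib import FTP


def fetch_file(ftp, remote_dir: str, filename: str, local_path: str) -> int:
    """Change to remote_dir, download filename in binary mode, return bytes written."""
    ftp.cwd(remote_dir)
    written = 0
    with open(local_path, "wb") as out:
        def save(chunk: bytes) -> None:
            # Called by ftplib once for each block of data received.
            nonlocal written
            out.write(chunk)
            written += len(chunk)
        ftp.retrbinary("RETR " + filename, save)
    return written


def download_update(host: str, ident: str, filename: str, local_path: str) -> int:
    """Log in as "anonymous" with ident as the password and fetch one update file."""
    with FTP(host) as ftp:  # the connection is closed when the block exits
        ftp.login(user="anonymous", passwd=ident)
        return fetch_file(ftp, "pub/db/gb-newdata", filename, local_path)
```

A call such as download_update(host, "kristoff", "gb1225.seq.Z",
"gb1225.seq.Z") would mirror the interactive steps shown above, with
host set to the name of the GenBank FTP server.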

