Genbank key search & fetch thru IUBio Gopher hole

Don Gilbert gilbertd at sunflower.bio.indiana.edu
Mon Feb 17 07:56:08 EST 1992


In article <1992Feb17.075807.10132 at urz.unibas.ch> doelz at urz.unibas.ch writes:
>...
>> The currently installed Genbank is release 0.01 (January 1992) from
>> NCBI, which has some 62,807 sequence entries (nearly 200 megabytes
>> of sequence and descriptive data).  This is based on release 70 of
>> Genbank plus many entries from Medline added at NCBI.  It was
>> obtained by anonymous ftp to ncbi.nlm.nih.gov, cd ncbi-genbank.
>> 
>I'm just curious ... do these 'many' entries reflect additions to the 
>annotation or are these 'real' ? 

These are real -- taken from published journal articles that were not
otherwise submitted to Genbank.  By count, Gb release 70 has 
  58952 loci, 77337678 bases, from 74023 reported sequences
while NCBI-Genbank release 0.01 has
  62807 loci, 81434522 bases, from 62807 reported sequences
Each NCBI-Genbank entry also has a protein translation, where relevant,
in the features table (translation="...").
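For anyone who wants to pull those translations back out of a fetched entry, here is a minimal sketch in Python. It is not part of the Gopher or waisindex code. It assumes the qualifier appears as translation="..." (optionally with a leading slash) and may wrap across several indented lines; the function name and the sample entry are made up for illustration.

```python
import re

# Matches translation="..." with or without a leading slash.
# [^"] also matches newlines, so wrapped qualifiers are captured whole.
TRANSLATION = re.compile(r'/?translation="([^"]*)"')

def translations(entry):
    """Return every protein translation in one entry, with line wraps removed."""
    return ["".join(match.split()) for match in TRANSLATION.findall(entry)]

# Hypothetical fragment of a features table, for illustration only.
sample_entry = (
    'FEATURES             Location/Qualifiers\n'
    '     CDS             1..30\n'
    '                     /translation="MKTAY\n'
    '                     IAKQR"\n'
)

print(translations(sample_entry))
```

An entry with no coding region simply yields an empty list.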

>Sounds *very* interesting to me. I have the CD ROM of EMBL installed, and 
>certainly would like to make indexes available this way also. 

An interesting feature of this Gopher index/search is that the indexes
(40 MB) can be kept on a fast hard disk/machine for quick lookups, while
the bulky data, which needs to be read only for fetches, can be
kept on a slow disk, a CD-ROM, or even another computer.
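The split between index and data can be sketched roughly as follows. This is a toy illustration in Python, not the actual index code: it assumes entries start with a LOCUS line and end with "//", and it indexes only by locus name. The function names and sample data are hypothetical.

```python
import io

def build_index(data):
    """Scan the flat file once; map each locus name to (offset, length)."""
    index = {}
    name = None
    start = 0
    pos = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b"LOCUS"):
            start = pos
            name = line.split()[1].decode("ascii")
        pos += len(line)
        if line.startswith(b"//") and name is not None:
            index[name] = (start, pos - start)
            name = None
    return index

def fetch(data_file, index, name):
    """Look up the small index, then read just one entry from the bulky data."""
    offset, length = index[name]
    data_file.seek(offset)
    return data_file.read(length)

# Hypothetical two-entry flat file, for illustration only.
sample = (
    b"LOCUS       AAA\nDEFINITION  first entry.\n//\n"
    b"LOCUS       BBB\nDEFINITION  second entry.\n//\n"
)
index = build_index(sample)            # small; would live on the fast disk
data_file = io.BytesIO(sample)         # stands in for the slow disk or CD-ROM
print(fetch(data_file, index, "BBB").decode("ascii"))
```

Only fetch() touches the data file, and it reads one entry per request, so the speed of that device matters much less than the speed of the index lookup.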

>Are you willing to share, sell or otherwise disclose the code? I would 
>appreciate to hear about it. 

Of course I'll give it away -- it was just a quick weekend hack anyway.
(it took me 1/2 day to get it working, then another 1/2 day for a
few more patches like the NOT operator, then another 1/2 day for
a few more diddles like the long list...).

I need to check the copyrights of the original source to see if I can
redistribute a modified source; if not, I'll make difference files
available for "patch" use.  I also need to add some instructions, etc.
If someone with EMBL and programming ability wants to install that format
in the waisindex (Reinhard ?), I'll help you out first and we can 
distribute the version with both Genbank and EMBL.  I think it will be
just a matter of substituting "DE   " for "DESCRIPTION  ", etc.
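To show what that substitution might look like, here is a small Python sketch in which the line tags are kept in a per-format table. It is not the waisindex source. The tag values are assumptions: the message above writes "DESCRIPTION", while GenBank flat files label that line DEFINITION, which is what the sketch uses, and EMBL uses ID and DE. Continuation lines of a multi-line title are ignored, and all names are hypothetical.

```python
# Per-format line tags; adding a format means adding one row here.
FORMATS = {
    "genbank": {"name": "LOCUS", "title": "DEFINITION", "end": "//"},
    "embl":    {"name": "ID",    "title": "DE",         "end": "//"},
}

def index_fields(text, fmt):
    """Return one {'name', 'title'} dict per entry, using the tags for fmt."""
    tags = FORMATS[fmt]
    entries = []
    current = None
    for line in text.splitlines():
        if line.startswith(tags["name"] + " "):
            current = {"name": line.split()[1], "title": ""}
        elif current is not None and line.startswith(tags["title"] + " "):
            current["title"] = line[len(tags["title"]):].strip()
        elif current is not None and line.startswith(tags["end"]):
            entries.append(current)
            current = None
    return entries

# Hypothetical one-entry samples in each format, for illustration only.
genbank_sample = "LOCUS       ECOLAC   7477 bp\nDEFINITION  E. coli lactose operon.\n//\n"
embl_sample = "ID   ECOLAC     standard; DNA; PRO; 7477 BP.\nDE   E. coli lactose operon.\n//\n"

print(index_fields(genbank_sample, "genbank"))
print(index_fields(embl_sample, "embl"))
```

Both calls produce the same name and title, which is the point: the scanning logic stays the same and only the tag table differs between formats.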

-- Don

-- 
Don Gilbert                                     gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405



