David Kristofferson kristoff at
Thu Aug 29 12:24:35 EST 1991

This kind of thing crossed my mind too, but we indexed only the
individual contents pages for the following reasons:

1) they weren't that long and the WAIS software highlights the
individual keywords where hits are found within the TOC.

2) we did not want to invest too much time on this until we got some
kind of feedback on how quickly people will start to use this.  To
date this has been about one day's worth of effort.  There are quite a
few whiz-bang projects that we could do, but we have to have some kind
of justification that they will actually be used, so we start off
modestly.  I am elated that you have already tried this out and I
would appreciate hearing from anyone else who uses WAIS with our

It should not be too much additional work for Eliot to index the TOCs
by article.  I'll talk to him about it soon and I wouldn't be
surprised if it happens within a day or so.

Once again, I'd like to hear from anyone who finds this to be useful.
It has often been our experience that people can be very
conservative when adopting new software and it would be great to be
proved wrong in this case.

We have also considered indexing all GenBank entries for WAIS
retrieval but have held off on this because WAIS requires YET ANOTHER
INDEX into the database and substantial consumption of disk space for
this task.  I have asked Eliot to try indexing the GenBank short
directory of entries, which would allow users to do keyword searches on
information in the DEFINITION line.  Here is a sample of the gbsdr.txt
file, which would be indexed by line:

AGMAPHAAA   African green monkey alpha-DNA.                               208bp
AGMERLTR1   African green monkey endogenous retroviral 5' LTR, segment 1  612bp
AGMERLTR2   African green monkey endogenous retroviral 3' LTR, segment 2  550bp
AGMGIBSC1   African green monkey BSC-1 cell growth inhibitor, complete   1585bp
AGMHSV1A    C.aethiops open reading frame A gene, complete cds and open  1164bp
AGMHSV1B    C.aethiops gene sequence.                                     243bp
AGMHSV1C    C.aethiops gene sequence.                                     118bp
AGMHSV1D    C.aethiops gene sequence.                                    2304bp
AGMHSV1E    C.aethiops gene sequence.                                     201bp
AGMHSV1F    C.aethiops gene sequence.                                     303bp
AGMHSV1G    C.aethiops gene sequence.                                     190bp
AGMKPNRSA   African green monkey kpni family interspersed repeat; ls1.   1784bp
AGMKPNRSB   African green monkey kpni family interspersed repeat; a7.     495bp
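
To make the "indexed by line" idea concrete, here is a minimal Python
sketch of a keyword search over lines in the format shown above.  It
is not how WAIS works internally: WAIS builds its own index and ranks
hits, while this sketch just does a case-insensitive match in which
every query word must appear in the definition text.  The names Entry,
parse_line and search are hypothetical, and the assumed layout (LOCUS
name, definition text, length ending in "bp") is inferred only from
the sample lines.

```python
import re
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One line of the short directory (layout assumed from the sample)."""
    locus: str
    definition: str
    length_bp: int


# Assumed layout: LOCUS name, whitespace, definition text, whitespace,
# then the sequence length written as digits followed by "bp".
LINE_RE = re.compile(r"^(\S+)\s+(.*?)\s+(\d+)bp\s*$")


def parse_line(line: str) -> Optional[Entry]:
    """Parse one directory line; return None if it does not fit the layout."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return Entry(match.group(1), match.group(2), int(match.group(3)))


def search(entries: List[Entry], query: str) -> List[Entry]:
    """Return entries whose definition contains every word of the query."""
    words = query.lower().split()
    return [e for e in entries
            if all(w in e.definition.lower() for w in words)]


# A few lines copied from the sample above.
SAMPLE = """\
AGMAPHAAA   African green monkey alpha-DNA.                               208bp
AGMERLTR1   African green monkey endogenous retroviral 5' LTR, segment 1  612bp
AGMERLTR2   African green monkey endogenous retroviral 3' LTR, segment 2  550bp
AGMGIBSC1   African green monkey BSC-1 cell growth inhibitor, complete   1585bp
AGMHSV1B    C.aethiops gene sequence.                                     243bp
"""

ENTRIES = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]

if __name__ == "__main__":
    # Print the LOCUS names of entries mentioning "retroviral".
    for entry in search(ENTRIES, "retroviral"):
        print(entry.locus, entry.length_bp)
```

The LOCUS names returned by such a search are the handles a user would
need for the retrieval step.
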

Users can then use the LOCUS names to retrieve entries via our e-mail
server.  Admittedly this is less elegant than allowing people to
retrieve the entire entry using WAIS, but I am not going to commit to
the consumption of a substantial amount of additional disk space until
it is apparent that a sizeable user base exists which demands this.
If you would find this useful, please contact me at
kristoff at  Once I receive enough responses we will
move in this direction.  Please don't bother sending these to the
newsgroups (!!!) because I am interested in getting a broad response
and not just dealing with a vocal minority.


				Dave Kristofferson
				GenBank Manager

				kristoff at

