Drosophila database searches thru Gopher and WAIS

Don Gilbert gilbertd at sunflower.bio.indiana.edu
Wed Nov 4 10:44:15 EST 1992


New Drosophila data search and retrieval services are now available
over the Internet from the computer ftp.bio.indiana.edu.
These services can be reached using Internet Gopher or Wide Area
Information Servers (WAIS).


The Gopher services for Drosophila at ftp.bio.indiana.edu
look like this:


                    Root gopher server: ftp.bio.indiana.edu

      1.  About-IUBio-Gopher  [21Jun92, 3kb].
      2.  About-New-Features  [ 1Nov92, 3kb].
 -->  3.  Drosophila/
      4.  Genbank-Sequences/
      5.  IUBio-Software+Data/
            (more...)


                                  Drosophila

 -->  1.  About Drosophila Gopher  [23Feb92, 1kb].
      2.  Clone database search <?>
      3.  Cytological features search <?>
      4.  Drosophila Archive/
      5.  Drosophila Information Newsletter <?>
      6.  Drosophila Stocks at Bloomington, USA  <?>
      7.  Drosophila Stocks at Bowling Green, USA  <?>
      8.  Drosophila Stocks at Umea, Sweden <?>
      9.  Fly worker & GSA address search <?>
      10. Flybase/
      11. Flybase search  <?>
      12. Redbook/
      13. Redbook search <?>

All of the <?> services are WAIS/Gopher searches of fly data
files that reside in the Drosophila Archive:
   Clone database search == search clonelist.txt
   Cytological features search == search Amero.txt
   Drosophila Information Newsletter == search newsletter issues
   Drosophila Stocks ... == search stock lists
   Fly worker & GSA address search == search Haynie & GSA address files
   Flybase search == search Ashburner flybase files
   Redbook search == search complete Lindsley & Zimm Genome book

These search services are also available via WAIS client software.
The relevant WAIS source for the IUBio archive is:
(:source 
   :version  3 
   :ip-address "129.79.224.25"
   :ip-name "ftp.bio.indiana.edu"
   :tcp-port 210
   :database-name "INFO"
   :cost 0.00 
   :cost-unit :free 
   :maintainer "archive at bio.indiana.edu"
   :description "
This WAIS service includes several indexed Biology information sources,
including Genbank nucleic acid gene sequence databank, Drosophila genetics,
BioSci/Bionet network news, and others. 
")

The fly WAIS databases are named:
   :database-name "fly-address"
   :database-name "fly-amero"   
   :database-name "fly-clones"
   :database-name "fly-din"
   :database-name "flybase"
   :database-name "flystock-bg"
   :database-name "flystock-bl"
   :database-name "flystock-um"
   :database-name "redbook"
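
As a rough sketch of how the source description and the database names
fit together, the Python below fills in the template for any one of the
fly databases.  The function name wais_source is hypothetical, the
description field is left out for brevity, and it is an assumption that
your WAIS client accepts a source description in exactly this shape.

```python
# Hypothetical helper: fill in the IUBio source template for one fly database.
# Host, address, port and maintainer values are copied from the source above.
FLY_DATABASES = [
    "fly-address", "fly-amero", "fly-clones", "fly-din", "flybase",
    "flystock-bg", "flystock-bl", "flystock-um", "redbook",
]


def wais_source(database_name):
    """Return a WAIS source description (as text) for one database."""
    return "\n".join([
        "(:source",
        "   :version  3",
        '   :ip-address "129.79.224.25"',
        '   :ip-name "ftp.bio.indiana.edu"',
        "   :tcp-port 210",
        f'   :database-name "{database_name}"',
        "   :cost 0.00",
        "   :cost-unit :free",
        '   :maintainer "archive at bio.indiana.edu"',
        ")",
    ])


if __name__ == "__main__":
    # Print the description for the Redbook database.
    print(wais_source("redbook"))
```

Looping over FLY_DATABASES and saving each result to its own file would
give one source description per database, if your client wants them
that way.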


As a reminder, Internet Gopher client software for Macintosh, MS-DOS,
Unix, VMS and other computer systems is available via anonymous ftp
to boombox.micro.umn.edu, in /pub/gopher, and client software for
WAIS is available via ftp to ftp.think.com.   Some of these clients
are also available via ftp to ftp.bio.indiana.edu, in the
/util/gopher and /util/wais directories.


I've modified the WAIS indexing and searching software in several
ways to make it more suitable for biology and genetic data searching.
These modifications include 
   a) use of symbols, so that queries like "In(4;5)red39" should
      work
   b) boolean 'and' and 'not' operators to limit a query's results
   c) partial word searches, so that "hum*" matches human and hummingbird
   d) literal phrase searches, such as finding "red rooster[45]" exactly
   e) output of data file headers (Gopher only so far).
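
To illustrate items (b) and (c), here is a toy Python sketch of what
boolean and partial word matching do to a result list.  It is not the
modified WAIS code; the function names and the three sample "documents"
are invented for the example, and it does not try to reproduce the real
query syntax.

```python
# Toy illustration only; this is not the modified WAIS software.
def matches_term(term, words):
    """True if term matches any word; a trailing '*' means prefix match."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words


def search(documents, required, excluded=()):
    """Return documents holding every required term and no excluded term."""
    hits = []
    for doc in documents:
        words = doc.lower().split()
        has_all = all(matches_term(t, words) for t in required)
        has_bad = any(matches_term(t, words) for t in excluded)
        if has_all and not has_bad:
            hits.append(doc)
    return hits


docs = ["human gene map", "hummingbird song", "fly gene map"]
print(search(docs, ["hum*"]))           # ['human gene map', 'hummingbird song']
print(search(docs, ["gene"], ["fly"]))  # ['human gene map']
```

The second call shows how 'not' trims a result list: both gene map
entries match "gene", but the one that mentions "fly" is dropped.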


The use of symbols is still somewhat problematic.  Since WAIS is based
on free text indexing, rather than on indexing of delimited fields
in databases, it needs to use some characters and symbols to delimit
words.  I've tried to find a distinction between symbols needed for genetic
"words" and symbols needed for distinguishing words (other than spaces),
but there is some overlap.  If you use the literal phrase search,
by enclosing a phrase that contains symbols in single quote (') or
double quote (") marks, you may get better results.

For instance, in some of the fly data files, esp. redbook, "(" and ")"
are used both as genetic symbols and as word delimiters.  Thus
a search for
  Df(3)something
will generally be parsed into a search for the three words "Df", "3"
and "something", producing lots of matches.  Using a literal
search instead,
  'Df(3)something'
should limit the results to just that phrase.
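
The difference can be sketched in a few lines of Python.  This is only
a model of the behavior described above, under the assumption that
"(", ")" and white space are the delimiters in play; the real indexer
uses its own delimiter set, and parse_query is a made-up name.

```python
import re

# Assumption for this sketch: parentheses and white space delimit words.
DELIMITERS = re.compile(r"[()\s]+")


def split_words(query):
    """Split a free text query into words at the assumed delimiters."""
    return [w for w in DELIMITERS.split(query) if w]


def parse_query(query):
    """Quoted query -> one literal phrase; unquoted -> separate words."""
    q = query.strip()
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "'\"":
        return [q[1:-1]]  # keep the phrase intact, symbols included
    return split_words(q)


print(parse_query("Df(3)something"))    # ['Df', '3', 'something']
print(parse_query("'Df(3)something'"))  # ['Df(3)something']
```

The unquoted form turns into three common words, each of which matches
many records, while the quoted form stays a single search term.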

There are other ways to better index genetic symbols, but they involve
more effort.  I'd like to get some feedback first on the usefulness
of this, from the general community of Internet-enabled fly
researchers.


The header file output adds a useful touch.  Here is one result
returned from a search of flybase for "ashburner":

This section is from the document '//Drosophila/Drosophila Archive/flybase/ABREFS.TEXT'.

gene-symbol         first-author           reference
------------        ----------------       -----------------------
Df(2L)ScoR+4                            Ashburner, Genetics 126:679
                                        McGill, Genetics 119:647
 

And a search for "red" produces this:

This section is from the document '//Drosophila/Drosophila Archive/flybase/LOCI.TEXT'.

gene-name-abbrev;  full-gene-name
    gene-map-position   cyto-map-position
        function
            nucleic-acid-databank-accession-number
                ditto-for-species-other-than-melanogaster
                    protein-database-accession-number
                        ditto-for-other-species
----------------------------------------------------------
red;    red
          3-53.6     88B1-88B2

-- 
Don Gilbert                                     gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405


