
Improved searching for the Arabidopsis Research Companion

Mike Cherry 726-5955 CHERRY at FRODO.MGH.HARVARD.EDU
Mon Nov 16 12:48:08 EST 1992


The searching power of the Arabidopsis Research Companion has been
improved. All of these improvements are the result of work by Don
Gilbert (Indiana University) and Tim Gauslin (USGS). The three WAIS
databases provided by the Arabidopsis Research Companion are the
Arabidopsis thaliana Genome Database (AAtDB), the Arabidopsis
BioSci/MSU electronic conference archive, and the Caenorhabditis
elegans Genome Database (ACEDB).

Don Gilbert has improved the ability to search WAIS indexed databases
by extending the WAIS software. The 'and' and 'not' search modifiers,
partial words using wildcards, and literal phrases make it possible to
craft a more specific question of these databases.

Below are Don Gilbert's examples that explain these new features:

Search Modifiers: The terms 'and' and 'not' are effective in modifying
the query.  For example:

    Query: red and green not blue
    Result: just those records with both the words 'red' and 'green',
            excluding all records with the word 'blue'.
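
The effect of these modifiers can be illustrated with a small Python
sketch. This is not the WAIS code itself; the function names and the
sample records are made up for illustration, and a record is treated
simply as a set of lowercase, whitespace-separated words.

```python
def words(text):
    """Lowercase a record and split it into a set of words."""
    return set(text.lower().split())


def match_and_not(record, required, excluded):
    """True if the record has every required word and no excluded word."""
    w = words(record)
    return all(r in w for r in required) and not any(e in w for e in excluded)


# Hypothetical sample records, not taken from any real database.
records = [
    "red green leaves",
    "red green blue petals",
    "green stems only",
]

# Equivalent of the query: red and green not blue
hits = [r for r in records if match_and_not(r, ["red", "green"], ["blue"])]
print(hits)
```

Only the first record qualifies: the second contains 'blue' and the
third lacks 'red'.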
            
Partial words: The asterisk (*) applied at the end  of a  partial word
will match all documents with words that start  with the partial word.
For example,
    
    Query: hum*
    Result: all records with 'hum', 'hummingbird', 'human',
            'humbug', etc.
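
A trailing wildcard amounts to a prefix test on each word. The Python
sketch below shows the idea under that assumption; matches_prefix and
the sample records are hypothetical and are not part of the WAIS
software.

```python
def matches_prefix(record, pattern):
    """True if any word in the record starts with the pattern's prefix.

    The pattern is expected to end in '*', for example 'hum*'.
    """
    prefix = pattern.rstrip("*").lower()
    return any(word.startswith(prefix) for word in record.lower().split())


# Hypothetical sample records.
records = [
    "a hummingbird feeder",
    "human genetics",
    "the hum of bees",
    "thumb war",
]

hits = [r for r in records if matches_prefix(r, "hum*")]
print(hits)
```

The record 'thumb war' is not returned, because the wildcard only
extends the end of the partial word, not the beginning.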
            
Literal phrases: If single quotes (') or double quotes (") surround a
phrase, the query will match that phrase exactly.  For example,
     
     Query: 'red rooster-39'
     Result: only those records with the full string
            'red rooster-39' will be matched.

There are some practical limits on this. The first part of a literal
phrase must be a word that is otherwise indexed, so your literal
cannot start with a symbol or other word delimiter. Within quotes,
the search modifiers ('and' and 'not') and the partial word wildcard
are not active.
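
One plausible reading of these limits is that the first word of the
phrase is looked up in the word index, and only the records found that
way are checked for the full string. The Python sketch below works
under that assumption; it is not a description of the actual WAIS
internals, and build_index, literal_search and the sample records are
invented for illustration.

```python
import re


def build_index(records):
    """Map each lowercase word to the set of record numbers containing it."""
    index = {}
    for i, text in enumerate(records):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(i)
    return index


def literal_search(records, index, phrase):
    """Return records containing the phrase exactly (case-insensitive).

    The first word of the phrase selects candidates from the index;
    each candidate is then checked for the whole string. Words such as
    'and' inside the phrase are treated as plain text.
    """
    phrase = phrase.lower()
    first = re.match(r"[a-z0-9]+", phrase)
    if first is None:
        return []  # phrase starts with a symbol: nothing to look up
    candidates = index.get(first.group(0), set())
    return [records[i] for i in sorted(candidates) if phrase in records[i].lower()]


# Hypothetical sample records.
records = [
    "Red Rooster-39 seed lot",
    "red hen and rooster-39",
    "the red rooster-39 line",
]
index = build_index(records)

hits = literal_search(records, index, "red rooster-39")
symbol_first = literal_search(records, index, "-39 seed")
with_and = literal_search(records, index, "red hen and rooster-39")
print(hits, symbol_first, with_and)
```

In this sketch the phrase '-39 seed' finds nothing even though the
text occurs in the first record, which mirrors the restriction that a
literal cannot start with a symbol.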

Here are  a few examples  that  might  be of  use  to the  Arabidopsis
community:

      tt4 and ttg
      ethylene and muta* and 199*
      light and harvesting
      'light harvesting'
      '2,4-d'
      dwarf* and flower
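
To show how the modifiers and the wildcard combine in queries like
these, here is a minimal Python sketch of a query evaluator. It rests
on assumptions that the announcement does not spell out: terms are
whitespace-separated, 'not' applies to the terms that follow it until
the next 'and', and quoted phrases are not handled. The names and
sample records are hypothetical.

```python
def term_matches(term, words):
    """Match a plain word exactly, or a 'prefix*' term by prefix."""
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words


def evaluate(query, record):
    """True if the record satisfies a simple 'and' / 'not' query."""
    words = record.lower().split()
    negate = False
    for token in query.lower().split():
        if token == "and":
            negate = False
        elif token == "not":
            negate = True
        elif term_matches(token, words) == negate:
            # A required term is missing, or an excluded term is present.
            return False
    return True


# Hypothetical sample records.
records = [
    "ethylene insensitive mutant 1990",
    "ethylene response mutation 1988",
    "auxin mutant 1991",
]

hits = [r for r in records if evaluate("ethylene and muta* and 199*", r)]
print(hits)
```

Only the first record is returned: the second is dated 1988, so it
fails '199*', and the third does not mention ethylene.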

Hints: The WAIS software provides an index of every word, except for
words that appear more than 20,000 times. Because searches run very
quickly, it is reasonable to start with a general question and then
proceed to more specific questions. Also, because of the variability
in English, it is good to try the '*' wildcard to match several forms
of a word. For example: dwarf* will match dwarf, dwarfs and dwarf-like.
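
The frequency cutoff can be pictured as a filter applied while the
index is built. The Python sketch below is only an illustration: the
function name is made up, words are split on whitespace, and the
cutoff is lowered from 20,000 to 2 so that the effect is visible on
three sample records.

```python
from collections import Counter


def build_index(records, max_count=20000):
    """Index every word except those occurring more than max_count times."""
    counts = Counter()
    postings = {}
    for i, text in enumerate(records):
        for word in text.lower().split():
            counts[word] += 1
            postings.setdefault(word, set()).add(i)
    return {w: ids for w, ids in postings.items() if counts[w] <= max_count}


# Hypothetical sample records.
records = [
    "the dwarf mutant",
    "the dwarfs flower late",
    "the dwarf-like phenotype",
]

# A tiny cutoff for demonstration; 'the' occurs three times and is dropped.
index = build_index(records, max_count=2)

# Indexed words that the query dwarf* would reach.
dwarf_forms = sorted(w for w in index if w.startswith("dwarf"))
print("the" in index, dwarf_forms)
```

A search for a very common word would find nothing in such an index,
which is one reason to prefer more distinctive search terms.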

The Arabidopsis Research Companion is an Internet Gopher server that is
available from the host weeds.mgh.harvard.edu. The three WAIS
databases mentioned above are served from the same host under the
following database names:

AAtDB                   An Arabidopsis thaliana Database
ACEDB                   A Caenorhabditis elegans Database
Arabidopsis-BioSci      ARAB-GEN BioSci mailing list plus all messages
                         from the original MSU Arabidopsis mailing list. 

If you do not have either Gopher or WAIS client software, a public
access account is available via the Internet telnet command. Here is
an example:

$ telnet ochre.mgh.harvard.edu
username: gopher
password: thaliana

If you need more information or have suggestions, please contact the
AAtDB group via email at curator at frodo.mgh.harvard.edu or via FAX at
(617) 726-6893.

Mike Cherry


