Sequence of Bacillus subtilis chromosome

Tom Schneider toms at fcs260c2.ncifcrf.gov
Wed Oct 23 15:48:29 EST 1991


In article <Oct.22.11.41.00.1991.2335 at genbank.bio.net> julie at genbank.bio.net
(Julie Lawrence) writes:
>In response to the following post by Conrad Halling:
>
>> Furthermore, I'd like to point out that it is virtually impossible to
>> obtain the accession number of such a "complete" entry if you know only
>> that an entry with the sequence of a chromosome of organism X exists.
>> Using IRX to search GenBank, the following search strings give the following
>> number of hits:
>
>> sequence of Bacillus subtilis chromosome                30,992 entries
>> sequence AND Bacillus subtilis AND chromosome              544 entries

>I thought it might be useful to point out that these searches
>yielded so many hits because of the way they were phrased.  IRX
>interprets a phrase such as 'Bacillus subtilis' as an implied 'OR';
>thus the queries above are asking for every entry that has either
>Bacillus OR subtilis in it--hence, the large number of hits.
>
>If the query is rephrased as:
>
>  sequence AND bacillus AND subtilis AND chromosome
>
>only 26 hits are found--a much more manageable result.

Of course, one must still wade through these to find what one wants.

In my own searches, I made the same assumption Conrad did.  Would it be hard
to change the program so that "Bacillus subtilis" searches for the word
"Bacillus" followed immediately by the word "subtilis"?  I don't think the
program can do this now, and it would be VERY useful!

Note that these would all be different searches:

Bacillus subtilis
Bacillus OR subtilis
Bacillus AND subtilis

The first would find the whole phrase "Bacillus subtilis" (with an arbitrary
amount of white space between the two words).  The second would find entries
with either word.  The third would find entries with BOTH words, though not
necessarily adjacent or in that order.  You could implement the first one as
an AND followed by a check for order and pure white space in between.
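
To make the three meanings concrete, here is a minimal sketch in Python.  It
is not how IRX works internally (I don't know that); the function names
match_or, match_and and match_phrase are made up for illustration, and an
"entry" is assumed to be just a string of text.  The phrase search is done
the way suggested above: an AND, then a check for order and pure white space.

```python
import re


def tokenize(text):
    """Split text into lowercase words, dropping white space and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_or(text, words):
    """'Bacillus OR subtilis': the entry contains at least one of the words."""
    tokens = set(tokenize(text))
    return any(w.lower() in tokens for w in words)


def match_and(text, words):
    """'Bacillus AND subtilis': the entry contains every word, in any order."""
    tokens = set(tokenize(text))
    return all(w.lower() in tokens for w in words)


def match_phrase(text, words):
    """'Bacillus subtilis': the words in order, separated only by white space.

    Hypothetical implementation: first the cheap AND test, then a check that
    the words appear consecutively with nothing but white space between them.
    """
    if not match_and(text, words):
        return False
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

With words = ["Bacillus", "subtilis"], an entry reading "subtilis-like
protease from Bacillus licheniformis" passes match_or and match_and but fails
match_phrase, while "Bacillus   subtilis chromosome" passes all three.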

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms at ncifcrf.gov


