an EMBL gopher

Reinhard Doelz doelz at
Sun Mar 12 05:00:54 EST 1995

Reinhard Doelz (doelz at wrote:

:         (2) WAIS indexing is full-text indexing. This means that the full
:             text is analyzed for keywords. The more keywords there are,
:             the slower WAIS becomes. Worse, it will not allow searches
:             for trivial words ('protein', 'mrna'), because it has a built-in
:             limit that excludes words occurring more than a given
:             number of times.
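To illustrate the frequency limit described above, here is a minimal sketch of an inverted index that drops any word occurring in more than `max_df` documents. The names `build_index` and `max_df` and the sample entries are hypothetical; this is not the actual WAIS indexer, only an assumed model of its exclusion rule.

```python
from collections import defaultdict

def build_index(docs, max_df):
    """Build a word -> set(doc_id) inverted index from a dict of documents.

    Assumption: words found in more than `max_df` documents are dropped
    entirely, modelling the built-in exclusion limit described in the post.
    Returns (index, excluded_words).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    index = {w: ids for w, ids in postings.items() if len(ids) <= max_df}
    excluded = sorted(w for w, ids in postings.items() if len(ids) > max_df)
    return index, excluded

# Hypothetical toy entries standing in for database records.
docs = {
    "E1": "human mrna for protein kinase",
    "E2": "mouse mrna partial cds",
    "E3": "human est clone similar to protein",
}
index, excluded = build_index(docs, max_df=1)
print(excluded)  # ['human', 'mrna', 'protein']
```

With a limit of one document, the common words 'human', 'mrna' and 'protein' vanish from the index, so no search can ever find them.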

Colleagues, to give you an example, look at the xembl index: the following
words (about 300) are affected. Some of these are not real words but
artifacts caused by the index blocks for these words not being merged.
(xembl contains only the daily updates; the full 'embl' index excludes
many more words.)

adams, adamskerlavage, animalia, animaliametazoa, assessment, based,
basedupon, basepairs, bednarik, bednarikcao, blake, blakebrandon, bp,
brandon, brandonchiu, bult, bultlee, cao, caocepeda, catarrhini,
catarrhinihominidae, cdna, cepeda, cepedacoleman, chiu, chiuclayton,
chordata, chordatavertebrata, clark, clarkdubuque, clayton, claytoncline,
cline, clinecotton, clone, coleman, colemancollins, collins, collinsdimke,
cotton, cottonearle, dillon, dillonfannon, dimke, dimkefeng, diversity,
dubuque, dubuqueelliston, earle, earlehughes, elliston, ellistonhawkins,
est, esthomo, estproject, eukaryota, eukaryotaanimalia, eukaryotaplantae,
eutheria, eutheriaprimates, expression, expressionpatterns, fannon,
fannonrosen, feng, fengferrie, ferrie, ferriefischer, fields,
fieldsfraser, fine, finefitzgerald, fischer, fischerhastings, fitzgerald,
fitzgeraldfitzhugh, fitzhugh, fitzhughfritchman, fleischmanfuldner,
fleischmann, fragment, fraser, fraserventer, fritchman,
fritchmangeoghagen, fuldner, fuldnerbult, gene, genediversity, genexpress,
genexpressgenexpress, genexpressthe, geoghagen, geoghagenglodek, glodek,
glodekgnehm, gnehm, gnehmhanna, gocayne, gocaynewhite, greene,
greenegruber, gruber, gruberhudson, hanna, hannahedblom, haplorhini,
haplorhinicatarrhini, haseltine, haseltinefields, hastings, hastingshe,
hawkins, hawkinsholman, hedblom, hedblomhinkle, hehu, hillier,
hillierclark, hinkle, hinklejr, holman, holmanhultman, hominidae,
hominidaeadams, hominidaegenexpress, hominidaehillier, homo, hu, hudson,
hudsonkim, hughes, hughesfine, hugreene, hultman, hultmankucaba, human,
humangene, initial, initialassessment, ji, jili, jr, jrkelley, kelley,
kelleyklimek, kelleyliu, kerlavage, kerlavagefleischman, kim, kimkozak,
kirkness, kirknessweinstock, klimek, klimekkelley, kozak, kozakkunsch,
kucaba, kucabale, kunsch, kunschji, le, lee, leekirkness, lelennon,
lennon, lennonmarra, li, libednarik, limeissner, liu, liumarmaros,
mammalia, mammaliatheria, marmaros, marmarosmerrick, marra, marraparsons,
mcdonald, mcdonaldnguyen, meissner, meissnerolsen, merck, merckest,
merrick, merrickmoreno, metazoa, metazoachordata, millionbasepairs,
moreno, morenopalanques, nguyen, nguyenpellegrino, olsen, olsenraymond,
palanques, palanquesmcdonald, parsons, parsonsrifkin, partial, patterns,
patternsbased, pellegrino, pellegrinophillips, phillips, phillipsryder,
plantae, primates, primateshaplorhini, program, project, raymond,
raymondwei, rifkin, rifkinrohlfing, rna, rnaest, rohlfing, rohlfingtan,
rosen, rosenhaseltine, ruben, rubendillon, ryder, ryderscott, sapiens,
saudek, saudekshirley, scott, scottsaudek, sequence, shirley,
shirleysmall, similar, small, smallspriggs, spriggs, spriggsutterback,
standard, sutton, suttonblake, tan, tantrevaskis, thegenexpress, theria,
theriaeutheria, transcribed, trevaskis, trevaskiswaterston, utterback,
utterbackweidman, venter, venterinitial, vertebrata, vertebratamammalia,
washu, washumerck, waterston, waterstonwilliamson, wei, weidman,
weidmanli, weinstock, weinstockgocayne, weiwing, white, whitesutton,
williamson, williamsonwohldmann, wilson, wilsonwashu, wing, wingxu,
wohldmann, wohldmannwilson, xu, xuyu, yu, yuruben

As you can see, a Gopher built on WAIS indices may fail to retrieve
some authors who are heavily involved in publishing :-)

More seriously, queries such as 'mrna and mammalia' will not give
the expected results.
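The effect on Boolean queries can be sketched as follows, again under assumptions: the index drops frequent words (as in a frequency-limited full-text index), and an excluded term is treated as having no postings, so an AND query containing it matches nothing. A system that instead silently drops excluded terms from the query would return a broader result than expected. All names and data here are hypothetical.

```python
from collections import defaultdict

def build_index(docs, max_df):
    """word -> set(doc_id); words in more than max_df documents are dropped."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {w: ids for w, ids in postings.items() if len(ids) <= max_df}

def query_and(index, terms):
    """AND query via the index; an excluded term contributes an empty set."""
    result = None
    for term in terms:
        ids = index.get(term.lower(), set())
        result = ids if result is None else result & ids
    return result or set()

def scan_and(docs, terms):
    """Brute-force scan of the full text: the result a user would expect."""
    return {d for d, text in docs.items()
            if all(t.lower() in text.lower().split() for t in terms)}

docs = {
    "E1": "homo sapiens mammalia mrna",
    "E2": "drosophila mrna",
    "E3": "yeast mrna",
}
index = build_index(docs, max_df=2)  # 'mrna' (3 documents) is excluded
print(query_and(index, ["mrna", "mammalia"]))  # set()
print(scan_and(docs, ["mrna", "mammalia"]))    # {'E1'}
```

The index-based query misses E1 even though its text contains both words, which is the kind of unexpected result described above.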


 R.Doelz         Klingelbergstr.70| Tel. x41 61 267 2247  Fax x41 61 267 2078|
 Biocomputing        CH 4056 Basel| electronic Mail    doelz at|
 Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
 EMBnet Switzerland: info at

More information about the Embl-db mailing list