an EMBL gopher

Reinhard Doelz doelz at comp.bioz.unibas.ch
Sun Mar 12 05:00:54 EST 1995


Reinhard Doelz (doelz at comp.bioz.unibas.ch) wrote:

:         (2) WAIS indexing is a full-text search. This means that the full
:             text is analyzed for keywords. The more keywords there are,
:             the slower WAIS will become. Worse, it will not allow searches
:             for trivial words ('protein', 'mrna'), as it has a built-in
:             limitation that excludes words occurring more than a given
:             number of times.
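
The exclusion described in the quoted point (2) can be illustrated with a
minimal sketch. This is not the WAIS indexer itself: the function name
build_index, the max_postings cutoff, and the four sample entries are all
assumptions made up for illustration, and the cutoff here counts documents
rather than raw occurrences.

```python
from collections import defaultdict

def build_index(docs, max_postings):
    """Build a word -> set(doc ids) index, dropping over-frequent words."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    # Words found in more than max_postings documents are left out entirely.
    kept = {w: ids for w, ids in postings.items() if len(ids) <= max_postings}
    dropped = sorted(w for w in postings if w not in kept)
    return kept, dropped

# Hypothetical sample entries, not real EMBL records.
docs = {
    1: "homo sapiens mrna for kinase",
    2: "mus musculus mrna partial sequence",
    3: "homo sapiens gene for actin",
    4: "rattus norvegicus mrna clone",
}
index, dropped = build_index(docs, max_postings=2)
print(dropped)  # ['mrna']
```

With a cutoff of 2, 'mrna' (present in three of the four entries) never
reaches the index, so no query can find it.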

Colleagues, to give you an example, look at the xembl index, where the
following words (about 300) are affected. This is not a real word
but a phenomenon caused by not merging the index block of this word.
(xembl contains only the daily updates. The real 'embl' index has a much
higher exclusion rate.)

adams, adamskerlavage, animalia, animaliametazoa, assessment, based,
basedupon, basepairs, bednarik, bednarikcao, blake, blakebrandon, bp,
brandon, brandonchiu, bult, bultlee, cao, caocepeda, catarrhini,
catarrhinihominidae, cdna, cepeda, cepedacoleman, chiu, chiuclayton,
chordata, chordatavertebrata, clark, clarkdubuque, clayton, claytoncline,
cline, clinecotton, clone, coleman, colemancollins, collins, collinsdimke,
cotton, cottonearle, dillon, dillonfannon, dimke, dimkefeng, diversity,
dubuque, dubuqueelliston, earle, earlehughes, elliston, ellistonhawkins,
est, esthomo, estproject, eukaryota, eukaryotaanimalia, eukaryotaplantae,
eutheria, eutheriaprimates, expression, expressionpatterns, fannon,
fannonrosen, feng, fengferrie, ferrie, ferriefischer, fields,
fieldsfraser, fine, finefitzgerald, fischer, fischerhastings, fitzgerald,
fitzgeraldfitzhugh, fitzhugh, fitzhughfritchman, fleischmanfuldner,
fleischmann, fragment, fraser, fraserventer, fritchman,
fritchmangeoghagen, fuldner, fuldnerbult, gene, genediversity, genexpress,
genexpressgenexpress, genexpressthe, geoghagen, geoghagenglodek, glodek,
glodekgnehm, gnehm, gnehmhanna, gocayne, gocaynewhite, greene,
greenegruber, gruber, gruberhudson, hanna, hannahedblom, haplorhini,
haplorhinicatarrhini, haseltine, haseltinefields, hastings, hastingshe,
hawkins, hawkinsholman, hedblom, hedblomhinkle, hehu, hillier,
hillierclark, hinkle, hinklejr, holman, holmanhultman, hominidae,
hominidaeadams, hominidaegenexpress, hominidaehillier, homo, hu, hudson,
hudsonkim, hughes, hughesfine, hugreene, hultman, hultmankucaba, human,
humangene, initial, initialassessment, ji, jili, jr, jrkelley, kelley,
kelleyklimek, kelleyliu, kerlavage, kerlavagefleischman, kim, kimkozak,
kirkness, kirknessweinstock, klimek, klimekkelley, kozak, kozakkunsch,
kucaba, kucabale, kunsch, kunschji, le, lee, leekirkness, lelennon,
lennon, lennonmarra, li, libednarik, limeissner, liu, liumarmaros,
mammalia, mammaliatheria, marmaros, marmarosmerrick, marra, marraparsons,
mcdonald, mcdonaldnguyen, meissner, meissnerolsen, merck, merckest,
merrick, merrickmoreno, metazoa, metazoachordata, millionbasepairs,
moreno, morenopalanques, nguyen, nguyenpellegrino, olsen, olsenraymond,
palanques, palanquesmcdonald, parsons, parsonsrifkin, partial, patterns,
patternsbased, pellegrino, pellegrinophillips, phillips, phillipsryder,
plantae, primates, primateshaplorhini, program, project, raymond,
raymondwei, rifkin, rifkinrohlfing, rna, rnaest, rohlfing, rohlfingtan,
rosen, rosenhaseltine, ruben, rubendillon, ryder, ryderscott, sapiens,
saudek, saudekshirley, scott, scottsaudek, sequence, shirley,
shirleysmall, similar, small, smallspriggs, spriggs, spriggsutterback,
standard, sutton, suttonblake, tan, tantrevaskis, thegenexpress, theria,
theriaeutheria, transcribed, trevaskis, trevaskiswaterston, utterback,
utterbackweidman, venter, venterinitial, vertebrata, vertebratamammalia,
washu, washumerck, waterston, waterstonwilliamson, wei, weidman,
weidmanli, weinstock, weinstockgocayne, weiwing, white, whitesutton,
williamson, williamsonwohldmann, wilson, wilsonwashu, wing, wingxu,
wohldmann, wohldmannwilson, xu, xuyu, yu, yuruben


As you can see, a GOPHER built on WAIS indices might fail to retrieve
some authors heavily involved in publishing :-)

More seriously, retrieval expressions like 'mrna and mammalia' will
not give the expected results.
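
Why a boolean query breaks can be shown with a small sketch. It assumes a
plain word -> entry-id mapping and a hypothetical search_and helper; neither
is taken from WAIS or GOPHER, and the index contents are invented.

```python
def search_and(index, terms):
    """Return ids of entries containing every term, according to the index."""
    result = None
    for term in terms:
        ids = index.get(term.lower(), set())  # excluded words look absent
        result = ids if result is None else result & ids
    return result or set()

# Hypothetical index in which 'mrna' was dropped as too frequent,
# although entries 1 and 2 really contain both 'mrna' and 'mammalia'.
index = {"mammalia": {1, 2, 3}, "kinase": {1}}

print(search_and(index, ["mrna", "mammalia"]))    # set()
print(search_and(index, ["kinase", "mammalia"]))  # {1}
```

The excluded term contributes an empty set, so the intersection is empty and
the query returns nothing, even though matching entries exist.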


Regards
Reinhard

-- 
 R.Doelz         Klingelbergstr.70| Tel. x41 61 267 2247  Fax x41 61 267 2078|
 Biocomputing        CH 4056 Basel| electronic Mail    doelz at ubaclu.unibas.ch|
 Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
 EMBnet Switzerland: info at ch.embnet.org    http://beta.embnet.unibas.ch/



More information about the Embl-db mailing list