Reinhard Doelz (doelz at comp.bioz.unibas.ch) wrote:
: (2) WAIS-indexing is a full-text search. This means that a full
: text is analyzed for keywords. The more keywords there are
: the slower WAIS will become. Worse, it will not allow searches
: for trivial words ('protein', 'mrna') as it has a built-in
: limit that excludes words occurring more than a given number
: of times.
Colleagues, to give you an example, look at the xembl index: about 300
words are affected, listed below. Not all of these are real words; some
are a phenomenon caused by not merging the index block of the word.
(xembl holds only the daily updates. The real 'embl' index has a much
higher exclusion rate.)
adams, adamskerlavage, animalia, animaliametazoa, assessment, based,
basedupon, basepairs, bednarik, bednarikcao, blake, blakebrandon, bp,
brandon, brandonchiu, bult, bultlee, cao, caocepeda, catarrhini,
catarrhinihominidae, cdna, cepeda, cepedacoleman, chiu, chiuclayton,
chordata, chordatavertebrata, clark, clarkdubuque, clayton, claytoncline,
cline, clinecotton, clone, coleman, colemancollins, collins, collinsdimke,
cotton, cottonearle, dillon, dillonfannon, dimke, dimkefeng, diversity,
dubuque, dubuqueelliston, earle, earlehughes, elliston, ellistonhawkins,
est, esthomo, estproject, eukaryota, eukaryotaanimalia, eukaryotaplantae,
eutheria, eutheriaprimates, expression, expressionpatterns, fannon,
fannonrosen, feng, fengferrie, ferrie, ferriefischer, fields,
fieldsfraser, fine, finefitzgerald, fischer, fischerhastings, fitzgerald,
fitzgeraldfitzhugh, fitzhugh, fitzhughfritchman, fleischmanfuldner,
fleischmann, fragment, fraser, fraserventer, fritchman,
fritchmangeoghagen, fuldner, fuldnerbult, gene, genediversity, genexpress,
genexpressgenexpress, genexpressthe, geoghagen, geoghagenglodek, glodek,
glodekgnehm, gnehm, gnehmhanna, gocayne, gocaynewhite, greene,
greenegruber, gruber, gruberhudson, hanna, hannahedblom, haplorhini,
haplorhinicatarrhini, haseltine, haseltinefields, hastings, hastingshe,
hawkins, hawkinsholman, hedblom, hedblomhinkle, hehu, hillier,
hillierclark, hinkle, hinklejr, holman, holmanhultman, hominidae,
hominidaeadams, hominidaegenexpress, hominidaehillier, homo, hu, hudson,
hudsonkim, hughes, hughesfine, hugreene, hultman, hultmankucaba, human,
humangene, initial, initialassessment, ji, jili, jr, jrkelley, kelley,
kelleyklimek, kelleyliu, kerlavage, kerlavagefleischman, kim, kimkozak,
kirkness, kirknessweinstock, klimek, klimekkelley, kozak, kozakkunsch,
kucaba, kucabale, kunsch, kunschji, le, lee, leekirkness, lelennon,
lennon, lennonmarra, li, libednarik, limeissner, liu, liumarmaros,
mammalia, mammaliatheria, marmaros, marmarosmerrick, marra, marraparsons,
mcdonald, mcdonaldnguyen, meissner, meissnerolsen, merck, merckest,
merrick, merrickmoreno, metazoa, metazoachordata, millionbasepairs,
moreno, morenopalanques, nguyen, nguyenpellegrino, olsen, olsenraymond,
palanques, palanquesmcdonald, parsons, parsonsrifkin, partial, patterns,
patternsbased, pellegrino, pellegrinophillips, phillips, phillipsryder,
plantae, primates, primateshaplorhini, program, project, raymond,
raymondwei, rifkin, rifkinrohlfing, rna, rnaest, rohlfing, rohlfingtan,
rosen, rosenhaseltine, ruben, rubendillon, ryder, ryderscott, sapiens,
saudek, saudekshirley, scott, scottsaudek, sequence, shirley,
shirleysmall, similar, small, smallspriggs, spriggs, spriggsutterback,
standard, sutton, suttonblake, tan, tantrevaskis, thegenexpress, theria,
theriaeutheria, transcribed, trevaskis, trevaskiswaterston, utterback,
utterbackweidman, venter, venterinitial, vertebrata, vertebratamammalia,
washu, washumerck, waterston, waterstonwilliamson, wei, weidman,
weidmanli, weinstock, weinstockgocayne, weiwing, white, whitesutton,
williamson, williamsonwohldmann, wilson, wilsonwashu, wing, wingxu,
wohldmann, wohldmannwilson, xu, xuyu, yu, yuruben
As you can see, GOPHER built on WAIS indices might fail to retrieve
some authors heavily involved in publishing :-)
More seriously, retrieval expressions like 'mrna and mammalia' will
not give the expected results.
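
To make the effect concrete, here is a minimal sketch in Python of a toy
full-text index with an occurrence cap. It is not WAIS code: the names
build_index, search_and and max_occurrences, the sample entries, and the
rule that an excluded word simply matches nothing are all assumptions
made for illustration.

```python
from collections import defaultdict


def build_index(docs, max_occurrences):
    """Build a toy inverted index (hypothetical, not the WAIS format).

    Words seen more than max_occurrences times are left out entirely,
    mimicking the built-in exclusion limit described above.
    """
    counts = defaultdict(int)
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            counts[word] += 1
            postings[word].add(doc_id)
    return {w: ids for w, ids in postings.items() if counts[w] <= max_occurrences}


def search_and(index, *terms):
    """Return the entries containing every term.

    Assumption: a term missing from the index matches no entries.
    """
    result = None
    for term in terms:
        ids = index.get(term.lower(), set())
        result = ids if result is None else result & ids
    return result or set()


# Invented sample entries; 'mrna' occurs three times, 'mammalia' twice.
docs = {
    "entry1": "homo sapiens mrna partial sequence mammalia",
    "entry2": "mus musculus mrna clone mammalia",
    "entry3": "arabidopsis thaliana mrna plantae",
}

index = build_index(docs, max_occurrences=2)

hits_mrna = search_and(index, "mrna")              # 'mrna' was excluded
hits_mammalia = search_and(index, "mammalia")      # still indexed
hits_both = search_and(index, "mrna", "mammalia")  # empty despite real matches
```

Under these assumptions, entry1 and entry2 contain both words, yet the
combined query returns nothing because 'mrna' never made it into the index.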
Regards
Reinhard
--
R.Doelz Klingelbergstr.70| Tel. x41 61 267 2247 Fax x41 61 267 2078|
Biocomputing CH 4056 Basel| electronic Mail doelz at ubaclu.unibas.ch|
Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
<a href=http://beta.embnet.unibas.ch/>EMBnet Switzerland:info at ch.embnet.org</a>