Scientometric OAI Search Engines
harnad at ecs.soton.ac.uk
Sun Aug 25 09:40:48 EST 2002
This message is addressed to researcher/authors of papers in
peer-reviewed research journals:
Something revolutionary is in the making in the form of scientometric
OAI search engines.
Citebase http://citebase.eprints.org/ is a prototype OAI service
http://www.openarchives.org/service/listproviders.html now available
(free, of course) to give research authors, users, their institutions
and their research-funders a foretaste of what is coming and what
Citebase has just been incorporated as an experimental feature for
all users of the Physics Archive http://arxiv.org -- the largest
http://arxiv.org/show_monthly_submissions and most heavily used
http://arxiv.org/show_weekdays_graph Eprint Archive to date.
We are hoping that by demonstrating the remarkable possibilities that a
full-text citation-linked open-access corpus opens up, Citebase will
help to accelerate the rate at which the refereed research
literature is made openly accessible online through institutional
The mother of all hyperlinks is bibliographic
citation. Google's spectacularly successful system of
ranking digital content by the number of incoming links
http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm is simply
a generalized case of the pre-existing scholarly practice of following
the links that authors provide by citing their references. The
number of incoming reference links has long been used for ranking
scholarly/scientific content by, for example, the Institute for
Scientific Information (ISI) in the form of the citation impact factor
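As a rough illustration of ranking by incoming reference links, here is a minimal Python sketch. The article IDs and reference lists are hypothetical toy data, and the function names are made up for this example. It uses only a raw count of incoming links; it is not Google's or ISI's actual method.

```python
from collections import Counter

# Hypothetical toy data: each article ID maps to the IDs it cites.
references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}


def incoming_link_counts(references):
    """Count how many distinct other articles cite each article."""
    counts = Counter({article: 0 for article in references})
    for citing, cited_list in references.items():
        for cited in set(cited_list):  # count each citing article once
            if cited != citing:        # ignore self-links
                counts[cited] += 1
    return counts


def rank_by_citations(references):
    """Return article IDs, most-cited first (ties broken by ID)."""
    counts = incoming_link_counts(references)
    return sorted(counts, key=lambda a: (-counts[a], a))


if __name__ == "__main__":
    for article in rank_by_citations(references):
        print(article, incoming_link_counts(references)[article])
```

Any real service would first have to extract and resolve the references from the full texts, which this sketch simply assumes has been done.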
Here are the elements in the chain:
(1) A manuscript (preprint) is submitted to a journal for evaluation
in the form of peer review.
(2) The manuscript is peer-reviewed, revised and, if successful,
published under the journal's name, certifying that it has met that
journal's quality standards.
(3) The journal's name and track record for quality are then used by
researchers, research-funders, and the author's own institution as
one of the guides in evaluating whether the work should be read,
used, cited and further funded, and whether the author should be
rewarded through salary increases, promotion, or prizes.
(4) In addition to the journal-name's established reputation for
peer-review standards, its "citation impact factor" (the average
number of citation links to its articles from other articles) is
used as an evaluative guide by potential users and funders.
(5) Articles and authors can also be evaluated and ranked, not just
by the name-brand and citation impact of the journal in which they
appear, but by the individual citation impact of each individual
article and/or author.
(6) Journal reputations and journal/article/author citation impacts
can also be supplemented by evaluations in review articles and
commentaries and by various forms of promotion and self-promotion
by journals, authors, alerting services, and the public press
(although these evaluations themselves would need to be evaluated,
if they were not simply to be counted as further citations).
(7) A new potential measure of on-line impact, not available in the
on-paper era, is usage, in the form of "hits." This measure is noisy
(it can be inflated by automated web-crawlers, short-changed by
intermediate caches, abused by deliberate self-hits from authors,
and undiscriminating between nonspecific site-browsing and
item-specific reading) yet it seems to have some signal-value too,
partly correlated with and partly independent of citation impact:
(8) Nor do citations and hits exhaust the potential of online
performance indicators. They are just the beginning of a wealth of
potential scientometric guides to users and evaluators, including
co-citation analysis, time-series analysis, and other potentially
predictive analyses of correlations and trends among citations, hits,
and even articles' content-words that will no doubt be invented and
discovered as more of this corpus comes online.
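Some of the indicators in the list above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the records, journal names, counts, and function names are hypothetical. The "impact" is the plain average described in item (4), without the fixed time window that a published impact factor would use.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy records: article -> (journal, citations received, hits).
articles = {
    "a1": ("J1", 10, 200),
    "a2": ("J1", 2, 50),
    "a3": ("J2", 4, 120),
    "a4": ("J2", 0, 30),
}

# Hypothetical reference lists of three citing papers.
reference_lists = {
    "p1": ["a1", "a2", "a3"],
    "p2": ["a1", "a2"],
    "p3": ["a1", "a3"],
}


def journal_impact(articles):
    """Average number of citations per article, for each journal."""
    totals = defaultdict(lambda: [0, 0])  # journal -> [citations, articles]
    for journal, citations, _hits in articles.values():
        totals[journal][0] += citations
        totals[journal][1] += 1
    return {j: total / n for j, (total, n) in totals.items()}


def cocitation_counts(reference_lists):
    """Count how often each pair of articles is cited by the same paper."""
    pairs = Counter()
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            pairs[pair] += 1
    return pairs


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(journal_impact(articles))
    print(cocitation_counts(reference_lists).most_common())
    cites = [c for _j, c, _h in articles.values()]
    hits = [h for _j, _c, h in articles.values()]
    print(pearson(cites, hits))
```

With real data the hit counts would need cleaning first (crawlers, caches, self-hits), as item (7) warns, before any correlation with citations could be trusted.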
So try out Citebase, and don't forget to supplement your experience with
(a) Citebase content right now is preponderantly in physics,
mathematics and computer science. Imagine what it would be like if
the full-text open-access content http://www.soros.org/openaccess/
were up there in all the other disciplines too. (And remember
that getting it up there depends on -- and waits on -- only
(b) Notice how natural and useful it feels to navigate the literature
via citation links, guided by author or article ranking in terms of
citation impact or hit impact. Imagine how much more useful it will
feel when all the research literature is up there, gap-free, and
spawns still newer and more powerful online scientometric guides.
Harnad, S. (2001) "Research access, impact and assessment." Times Higher
Education Supplement 1487: p. 16.
NOTE: A complete archive of the ongoing discussion of providing open
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00 & 01):
Discussion can be posted to: september98-forum at amsci-forum.amsci.org
See also the Budapest Open Access Initiative:
and the Free Online Scholarship Movement: