Central vs. Distributed Archives
harnad at ecs.soton.ac.uk
Wed Sep 10 16:02:41 EST 2003
Ebs Hilf -- who will host a meeting on the subject next week:
-- confirms that the rate of growth of the biggest and oldest open-access
archive -- the Physics Arxiv -- is still far, far too slow. I entirely
This does not detract from the credit due to Arxiv for having been the
first; but now, 12 years down the road, this unchangingly slow rate
suggests that something more may be needed than what has been feeding
Arxiv across the years, and my own guess (and Ebs's) is that that
something more may well be distributed institution-based self-archiving,
instead of Arxiv's central discipline-based self-archiving.
The reason institutional self-archiving is more likely to speed up
self-archiving and to generalize it across disciplines is that
researchers and their institutions both share the benefits of the impact
of their research output, whereas researchers and their disciplines do
not. It is not the discipline that exercises the incentive of
the "publish-or-perish" carrot-and-stick on researchers, it is their
research institutions. As the co-investor in and co-beneficiary of the
rewards of research impact (research funding, overheads, reputation,
prizes) the researcher's institution is in a position to mandate not only
"publish or perish" but "publish with maximal impact" -- which means
maximal access, which means open access, which means self-archiving.
I think on all this we agree with Ebs Hilf. Ebs too notes the likely
remedy for the sluggish growth rate of self-archiving in physics:
institutional (indeed, departmental) self-archiving. What is needed to
accelerate that is compelling empirical demonstrations of the correlation
between access and impact, to make researchers and their institutions
realize that self-archiving is in their own interest (and how much so)
-- in all disciplines.
There is, however, in Ebs's summary below, a rather important and
potentially misleading ambiguity: He conflates self-archiving with
publishing -- referring to depositing papers in Arxiv as "publishing"
them, in contrast to "self-archiving" them in institutional eprint
archives. But surely *both* of these are self-archiving and not
publishing! The publishing is done in the journals (in both cases). The
self-archiving is merely the provision of a supplementary version of
the paper, its full text accessible online toll-free for all would-be
users webwide (in either a central discipline-based eprint archive or in
distributed institution-based eprint archives).
Both central disciplinary archives like Arxiv and distributed
institutional archives include, in addition to the all-important
peer-reviewed, published version of each article (the
"postprint") also the pre-peer-review preprint version(s) and
sometimes also postpublication updated and enhanced versions
("post-postprints"). But the critical version, and the one that
counts as the publication, is of course the published postprint:
That (and not unpublished preprints or revisions) is what
"publish-or-perish" is all about!
But apart from these minor points, I don't think Ebs and I disagree. Here
is the quote/commentary:
On Wed, 10 Sep 2003, Eberhard R. Hilf wrote:
> Dear Stevan and the list members,
> here are some arguments for
> 1. All physicists will publish in the ArXiv not before the year 2050,
> although the arxiv size is growing quadratically, not linearly with time.
> Earlier estimates [St. Harnad,
> slide 25 are to be revised].
> [see http://isn-oldenburg.de/~hilf/ ]
If readers look at slide 25 above, they will find that according to
Ebs's estimate (which I accept!), it would have to be revised to extend
the linear growth to 2050 instead of 2020. According to Ebs, at the
present growth rate, 2050 would be the first year in which *all* physics
articles published in that year are self-archived in Arxiv.
But note that that's *self-archived* in Arxiv, not *published* in Arxiv:
There is absolutely no reason to believe that all those articles will
not continue (*exactly* as they all do now) being published in the
appropriate peer-reviewed journal for their area and their quality-level.
("Publication" will continue to mean, as it does now, peer-review and
certification of having met that journal-name's quality standards.)
And the portion of total annual published journal article output in
physics that is self-archived will grow (linearly!) from now until it
reaches 100% in 2050, at exactly the same unchanging rate at which it
has been growing for 12 years now.
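The arithmetic behind that projection can be sketched as follows. This is
only an illustration under loudly stated assumptions, not Ebs's actual
model: the start year 1991 is inferred from "12 years" before 2003, the
self-archived fraction is assumed to start at zero, and the total annual
article output is assumed constant. The function names are hypothetical.
It also shows why linear growth in the annual fraction goes with roughly
quadratic growth in total archive size.

```python
START_YEAR = 1991  # assumption: 2003 minus the "12 years" mentioned above
FULL_YEAR = 2050   # Ebs's estimated year of 100% self-archiving


def self_archived_fraction(year: int) -> float:
    """Fraction of that year's articles self-archived, under linear growth."""
    fraction = (year - START_YEAR) / (FULL_YEAR - START_YEAR)
    # Clamp to the meaningful range [0, 1].
    return min(1.0, max(0.0, fraction))


def cumulative_deposits(year: int, articles_per_year: float = 1.0) -> float:
    """Total deposits from START_YEAR through `year`.

    Assumes a constant annual output (articles_per_year); summing a
    linearly growing fraction gives roughly quadratic growth in size.
    """
    return sum(
        self_archived_fraction(y) * articles_per_year
        for y in range(START_YEAR, year + 1)
    )


if __name__ == "__main__":
    for y in (2003, 2020, 2050):
        print(y, self_archived_fraction(y), cumulative_deposits(y))
```

Under these assumptions the steepness of the line is the only free
parameter: anything that speeds up self-archiving shows up as an earlier
FULL_YEAR.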
> 2. Usage of repositories seems to be proportional to their size,
> but independent of absolute size.
> The full text you find at
> physicists will publish in the ArXiv not before the year 2050
Usage means downloads of papers, as opposed to deposits. It will be
interesting to see the figures as the number of distributed institutional
eprint archives ("repositories") grows. Each university will have
an eprint archive in each of its disciplines. All eprint archives will
be OAI-compliant. All will be harvested by cross-archive search-engines
such as oaister http://oaister.umdl.umich.edu/o/oaister/ and citebase
http://citebase.eprints.org/cgi-bin/search and even Elsevier's scirus
Of course the number of downloads from a particular eprint archive will
be proportional to its size (and the quality of its eprints), but it
will also be indicative of the research impact of that institution. (No
real equivalent for a central archive.)
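How a cross-archive service might address many distributed OAI-compliant
archives can be sketched as below. This is a minimal sketch, not how
oaister or citebase actually work: the archive base URLs are hypothetical
placeholders, and the helper names are mine. Only the request parameters
(verb=ListRecords, metadataPrefix=oai_dc, and the optional from date) come
from the OAI protocol for metadata harvesting. The sketch only builds the
request URLs; it does not fetch anything.

```python
from urllib.parse import urlencode

# Hypothetical institutional archive endpoints, for illustration only.
ARCHIVES = [
    "https://eprints.physics.example.edu/oai",
    "https://eprints.psychology.example.edu/oai",
]


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI ListRecords request URL for one archive."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Restrict the harvest to records added or changed since from_date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def harvest_plan(base_urls, from_date=None):
    """One request URL per archive; a harvester would fetch each in turn."""
    return [list_records_url(url, from_date=from_date) for url in base_urls]


if __name__ == "__main__":
    for url in harvest_plan(ARCHIVES, from_date="2003-09-10"):
        print(url)
```

Because every archive answers the same kind of request, adding one more
institution to the harvest is a one-line change to the list.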
> The ArXiv is unique in that it serves its own usage and submission logs.
So will the institutional eprint archives.
> Other new developments may have a much steeper rise of spreading,
> notably the self-archiving by the authors, their institutes or
> Universities and their libraries forming a distributed net of repositories.
I certainly hope so -- and there is also reason to expect it, for the
carrot/stick reasons shared by researchers and their institutions.
> The advantage is its scalability, flexibility, the business model
> (distributed funding by the institutions of the creators of the
> the retaining of the author's rights, the update possibility,
> and the acceptance spreading: to convince a large body such as a
> learned community to set up a central service such as the ArXiv for
> is much harder, then to convince a percentage of local distributed
> and institutes (the multiple small versus one large barrier chance).
It takes little to persuade institutions to maximize their own research
impact (in all their disciplines). Larger central bodies have no such
shared interests with their depositors.
By the way, there is no great author-rights retention problem: The only
right in question is the self-archiving right, and 55% of the 7000+
journals sampled so far by Romeo already formally support
self-archiving, while many of the rest will agree if asked on a
And there is always the preprint+corrigenda for the rest:
> The challenges are to set up the needed international standards,
What standards (apart from OAI-interoperability plus multilingual
> to allow intelligent search engines to serve the retrieval,
Look at the many ingenious OAI search engines that have already
been spawned, even by the minimal open-access content available so far!
> to stimulate the discussion and communication between the authors,
Mike Jewell has almost finished a variant of Eprints -- Jprints --
(a generic version modeled on its two open-peer-commentary
journal variants, bbsprints and psycprints) in the same way eprints was
modeled on CogPrints. Jprints will be GNU open-source software for
self-archiving both articles and commentaries and author-responses.
Simon Buckingham Shum has created similar generic commentary software
in association with eprints too:
> At present, the ArXiv is still unique in serving unconditional time stamp,
> and long term readability.
All eprint archives can and will provide date stamps. XML documents are
the target for long-term readability (Jprints will generate XML) but for
now, when content is thin, beggars cannot (and should not) be choosers!
Eprints accepts any format that includes a screen-readable version (XML,
HTML, PDF, PS, TeX, ASCII).
> All (usage) numbers are astonishingly low, as we know from libraries usage
> of journals and books.
I couldn't follow this. If you mean the average article has
astonishingly few users, that's true. But the relevant thing is how
many *more* users it has if it is open-access than if it is
toll-access! (And how many more citations that eventually generates.)