Central vs. Distributed Archives

Stevan Harnad harnad at cogito.ecs.soton.ac.uk
Sat Feb 3 05:28:19 EST 2001

On Fri, 2 Feb 2001, Greg Kuperberg wrote:

> On Sun, Dec 31, 2000 at 09:57:50PM +0000, Stevan Harnad wrote:
> >
> >   http://www.cogsci.soton.ac.uk/~harnad/Tp/resolution.htm
> >
> >   "Physicists have already shown the way, but at their current
> >   self-archiving rate, even they will take another decade to free the
> >   entire Physics literature"
>
> Of course you are entitled to your opinion that institution-based open
> archiving (sorry, I won't call it "self-archiving") is the bugle call
> of the revolution.

Terminology is terminology, but calling one's own archiving of one's own
papers "self-archiving" sure sounds like calling a spade a spade...

Besides, the Open Archives Initiative (OAI, http://www.openarchives.org)
has informed me in no uncertain terms that I should
NOT characterize self-archiving as open-archiving or vice versa. The
OAI is a much broader initiative than the self-archiving initiative.

OAI is dedicated to providing shared interoperability standards for the
entire on-line digital literature, whether self-archived or not,
whether for-free or for-fee, whether journal, book or other, whether
full-text or not, whether centralized or distributed.

It is true that the OAI was originally proposed as the "UPS" (Universal
Preprint Service), which was indeed a form of self-archiving (though a
limited form, focussing on the unrefereed preprint rather than on both
the unrefereed preprint and the refereed postprint, as self-archiving
does). But "UPS" was quickly dropped and the OAI has since vastly
outgrown those limited original objectives.

> In my opinion, institution-based archives are,
> o in physics, all but superseded by the arXiv,

On-line archives (apart from the Physics arXiv) are all but non-existent.

The hope is that institution-based, distributed self-archiving (perhaps
with the newfound help of the http://www.eprints.org archive-creating
software) will now remedy this.

And, as I said above, even in Physics, self-archiving is still growing
too slowly to free the Physics literature in less than a decade. It
seems to me that the central self-archiving model, admirable and
welcome though it is, can use all the help it can get.

> o in mathematics, a politically appealing distraction, and

I have no idea why you mention politics. The only "appeal" is to
researchers, that they should free their refereed research from its
obsolete access- and impact-barriers by self-archiving it, now. I have
no "political" preference for their doing it the central way or the
distributed way: We should all just go ahead and DO it!

I used to lean towards central self-archiving myself, seeing no reason
why it should not all be subsumed under arXiv; but that just isn't
happening, and the clock is ticking; so it's time to add more powerful
and general means of self-archiving.

Besides, the whole point of OAI-compliance and interoperability is that
it should no longer MATTER which way you self-archive: centrally or
institutionally. It's all harvestable into the same global virtual
archive anyway, thanks to the OAI protocol.

Unless one's "political" objective becomes, publisher-like, to protect
one's own proprietary (centralized?) turf instead of to free the
research literature...

> o in computer science and economics, the inadequate status quo.

I have no idea what you mean by the above.

> As I said before, I know that NCSTRL and RePEc, which are the efforts
> in computer science and economics to make institutional archives
> interoperable, are important major projects.  I don't mean to slight
> them.  But they are not a panacea and they do not match the arXiv.

Nobody is trying to "match" anything. We are trying to free the research
literature, as quickly and as effectively as possible.

> Computer science has a second important project, ResearchIndex/CiteSeer,
> which has some good features that the arXiv does not.  But (a) it doesn't
> match the arXiv either, (b) it relies on search engine intelligence and
> not bureaucratic standards, and (c) an arXiv search facility could be
> made as intelligent as CiteSeer.

I really can't follow any of this, and I have no idea who you think is
competing with whom for what:

ResearchIndex/CiteSeer is a wonderful tool, harvesting and
citation-linking papers on the Web, whether in OAI-compliant archives
or not. As the OAI-compliant corpus grows (with the growth of central
and distributed self-archiving), ResearchIndex/CiteSeer's harvest will
grow, and surely we all welcome that!

I don't know what you have in mind with "bureaucratic standards," but you
need not sell me on search-engine intelligence: I love it already.

Moreover, as the OAI-compliant corpus grows, it will spawn still
further and more powerful Open Archive Service Providers (e.g., OpCit
http://opcit.eprints.org and ARC http://arc.cs.odu.edu/).

But the main goal now is to do whatever can be done to make that corpus
grow into the full refereed literature in all disciplines as soon as
possible. This is not the time to squabble over who has the best
current archive or search engine...

> Also, in my opinion the main use for OAI is not interoperability between
> institutions, but rather between entire disciplines.

Fine. But before you can make the on-line research literature
interoperable across disciplines you have to get it on-line (and free)!
And if central, discipline-based self-archiving is not freeing it
on-line anywhere near fast enough, it's time to give the institutional
route a go, with the help of OAI interoperability.

Stevan Harnad                     harnad at cogsci.soton.ac.uk
Professor of Cognitive Science    harnad at princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
             Computer Science     fax:   +44 23-80 592-865
University of Southampton         http://www.cogsci.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM           

