Prospects for institutional e-print repositories study
harnad at ecs.soton.ac.uk
Sun Jul 13 11:58:45 EST 2003
> Prospects for institutional e-print repositories in the United Kingdom
> Michael Day UKOLN, University of Bath, Bath BA2 7AY, United Kingdom
> http://www.ukoln.ac.uk/ m.day at ukoln.ac.uk
> Institutional e-print repositories are one of a range of responses
> to what is generally known as the serials pricing crisis.
There is a genuine serials pricing crisis, and it does need to be
remedied, and eprint archives may well contribute to its remedy,
but in my opinion it is a (substantial) strategic mistake to see and
portray self-archiving merely as a response to the serials pricing
crisis. Researchers are far less equipped and motivated to solve
the serials pricing crisis than they are to take advantage of the
dramatic and unprecedented new opportunity that the online medium
(and OAI interoperability) has provided to enhance the impact of their
own research output, in terms of access, use, application, and citation. (6
recent papers on this specific point are cited at the end of this
commentary. A 7th appears below.)
Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003)
Mandated online RAE CVs Linked to University Eprint
Archives: Improving the UK Research Assessment
Exercise whilst making it cheaper and easier. Ariadne.
> The costs of the present journal-based communication system,
> both in terms of subscription prices and restricted access, have
> meant that the impending demise of traditional journals has been
> confidently predicted for a number of years... However, for a variety
> of reasons, this has not yet happened.
True. And this should suggest that the access-toll-factor, though it
plays an important role in the motivation for change, may not be playing
the *direct* causal role that a focus on pricing or on trying to change
journals' practice would suggest: It is *researchers'* practice that can
(and will) change.
> No longer, as in the print era, do authors need to trade the copyright
> of works to publishers in exchange for having them printed and
> distributed (Harnad & Hey, 1995, p. 114).
The picture is now more in focus than it was in 1995, and the most
transparent and direct way of describing the new possibility is that
authors do not need to trade access-blockage (and hence
impact-blockage) in exchange for peer reviewed publication. They can
supplement it by self-archiving an open-access version of their own
paper in their institutional eprint archives, freely accessible to all
potential users worldwide whose institutions cannot afford the tolls to
the toll-access version. (It is a mistake to suggest that authors need
to retain copyright in order to do this -- welcome though that would
be. There is no need to wait for a change in publishers' copyright
retention policy. All that is needed is immediate self-archiving: of the
pre-refereeing preprint in any case, and then of either the refereed
postprint or a list of the corrections made between the preprint and
the postprint, whichever is appropriate.)
> Those scientists and scholars that support the initiative are
> encouraged to facilitate its aims in two ways, firstly by supporting
> open-access journals, secondly through the 'self-archiving' of
> peer-reviewed research papers (or e-prints).
The numbers dictate that these two strategies should be described to
researchers as follows:
BOAI-2: For those of your papers for which a suitable open-access
journal exists, publish them there. (This amounts to about 5%
of the annual 2,000,000 papers that appear in the planet's 20,000
peer-reviewed journals.)
BOAI-1: For those of your papers for which a suitable open-access
journal does not yet exist, self-archive their preprints and
postprints in your institutional eprint archive. (This amounts to
the remaining 95% of the annual 2,000,000 papers that appear in the
planet's 20,000 peer-reviewed journals.)
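To make the proportions concrete, here is a rough back-of-the-envelope
sketch using only the figures quoted above (2,000,000 papers a year,
about 5% with a suitable open-access journal). The function and variable
names are purely illustrative, and the 5% figure is the approximate
estimate given in this commentary, not a measured value.

```python
# Figures taken from the commentary above; all names here are illustrative.
ANNUAL_PAPERS = 2_000_000    # papers per year in peer-reviewed journals
OPEN_ACCESS_SHARE = 0.05     # approximate share with a suitable open-access journal


def split_by_strategy(total, oa_share):
    """Return (papers covered by BOAI-2, papers left for BOAI-1)."""
    boai2 = round(total * oa_share)   # publish in an open-access journal
    boai1 = total - boai2             # self-archive in an institutional archive
    return boai2, boai1


boai2_papers, boai1_papers = split_by_strategy(ANNUAL_PAPERS, OPEN_ACCESS_SHARE)
print(f"BOAI-2 (open-access journals): {boai2_papers:,} papers/year")
print(f"BOAI-1 (self-archiving):       {boai1_papers:,} papers/year")
```

On these assumptions, self-archiving is the only route available today
for roughly nineteen papers out of every twenty.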
> Institutional repositories have been defined in a recent Scholarly
> Publishing and Academic Resources Coalition (SPARC) position paper
> as digital collections that capture and preserve the intellectual
> output of a single or multi-university community (Crow, 2002, p.
> 4). ... In this, they are not necessarily
> limited to e-prints of the research literature, but can provide an
> institutional focus for the collection and preservation of scientific
> data, learning resources, image collections and many other different
> types of content.
See the discussion thread "EPrints, DSpace or ESpace?" about ways in
which the institutional-repositories agenda has become entangled in
other institutional archiving agendas that may be obscuring and
distracting from (and hence slowing) the growth of the very specific
(and urgent, and overdue) agenda of self-archiving institutional refereed
research output, for the sake of maximizing its research impact:
"EPrints, DSpace or ESpace?"
"Cliff Lynch on Institutional Archives"
(One exception: The self-archiving of the data on which refereed research
articles are based is not a distraction, but a direct and natural
ally of the agenda of self-archiving the refereed research itself.)
> While a number of subject-based e-print repositories are based
> (or have mirror sites) in the UK, there are, to date, very few
> institutional repositories. Those that do exist contain a relatively
> small number of e-prints.
It is not enough to simply create an institutional eprint archive.
The primary challenge for those who wish to hasten open-access to all of
refereed research is to inform researchers, their institutions and their
funders about the vast benefits of open-access for research visibility,
accessibility, usability and citation, and to inform them also about how
these benefits can be attained by implementing systematic institutional
(and national) self-archiving policies:
> ...a wider advocacy role may be required. This
> would need both to focus on the incentives for academics and
> researchers to deposit in institutional repositories and to answer
> some of their concerns.
Here are some powerpoints that may help:
> The challenge of the FAIR programme
> and related initiatives will be to help foster the setting up of
> well-populated institutional repositories in UK higher and further
> education institutions.
Indeed; and the UK is in a position to lead the way, if it couples its
systematic self-archiving efforts with the reform of the Research
Assessment Exercise, involving all the Research Councils. The US "Bethesda
Statement" focuses on open-access journals (the 5% solution).
A concerted UK focus on self-archiving could help capture the other 95%!
> Van de Sompel and Lagoze (2002, p. 145) note that technologies like
> the OAI-PMH exhibit network effects, in that "initial adoption may be
> slow and steady and positive feedback then dramatically increases the
> adoption rate." This means that it can be difficult to measure success,
> particularly in the early stages of adoption. The same will apply
> to measuring the long-term impact of institutional repositories.
"How to compare research impact of toll- vs. open-access research"
> Direct advocacy of institutional repositories is not part of the main
> scope of ePrints UK (although it is part of the FAIR programme itself
> or projects like SHERPA)
Yet it is the one thing that we need the most at this time!
> 5. Potential impediments to success
> 5.1 Practical issues
> There are doubtless a number of reasons why e-prints and open-access
> principles have not yet caught the imagination of those who would be
> perceived to most benefit from them, i.e. individual researchers.
> While many academics and researchers seem happy to make copies
> of selected research papers available on institutional, project
> or personal Web pages, they appear to be less sure about the role
> of e-print repositories, whether subject-based or institutional.
> Concerns are often raised about practical issues like copyright
> or peer-review.
More use should be made of the information in the self-archiving FAQs to
dispel the many persistent misunderstandings about copyright and
peer-review in relation to self-archiving and open access:
> 5.1.1 Copyright
> One possible impediment to the success of institutional e-print
> repositories is the traditional assignment of copyright to publishers.
It is fine to try to retain copyright where feasible, but the only
thing it is necessary to retain is the right to self-archive. 55% of
publishers already explicitly support this, and many of the others will
agree if asked. (So at the *very* least, 55% of the peer-reviewed literature
should be self-archived *already*!) Here is the current Romeo table
(slightly transformed for more transparency):
> the latest copyright agreement issued
> by Nature Publishing Group asks authors to grant them an exclusive
> licence to publish. Authors are allowed to "re-use the papers
> in any printed volume of which they are an author; to post a PDF
> copy on their own (not-for-profit) website; to copy (and for their
> institutions to copy) their papers for use in coursework teaching;
> and to re-use figures and tables" (Nature, 2003). However, the
> licence expressly excludes "open archival websites, such as those
> that host collections of articles by an institution's researchers."
The Nature license itself makes no reference to such incoherent and
unenforceable distinctions. It is only an accompanying FAQ, which is
meant to *clarify* the license (!) that formulates this nonsensical
distinction, thereby successfully confusing the would-be self-archiver
utterly! Nature has been invited to clarify this, and has promised me to
do so, but they seem so far to have been caught in endless ruminations
about it. Authors are best advised to ignore Nature's FAQ, take Nature's
license wording at its word, and self-archive in their own institution's
eprint archive without giving the empty hemming and hawing another thought.
"Open Letter to Philip Campbell, Editor, Nature"
> Those responsible for institutional repositories will have to be
> aware of their responsibilities as de facto publishers.
I disagree and I would like to respectfully suggest that this is *very*
bad advice to give to would-be self-archivers and their
institutions! Self-archiving is something that is done by the *authors*
of published, refereed research. It is done on the world wide web. What
is put on the world wide web is publicly accessible to anyone with access
to the web. Authors are not *self-publishing* their published
postprints. They are already published in the refereed journal in which
they appeared! They are *self-archiving* them, in order to maximize their
impact, by maximizing access to them. Their research institution is
providing the websites for their researchers. There is no difference
whatsoever whether that website is labelled "personal website" or
"personal sector in the institution's research repository." There is
also no difference whether or not the data are tagged using the OAI
metadata tags for interoperability. These are all red herrings and should
be politely side-stepped (with nose-plugs) as they deserve to be. The
Nature license correctly states that the author retains the right to
publicly self-archive his own full-text on the web. "That is all Ye know
on earth, and all ye need to know."
> These should include (for example) warranties on the part of the
> author that they are not breaching any third party agreement -
> or copyright - by posting the eprint. This would also ensure that
> authors explicitly accepted the terms under which the content is
> being made available to others.
It is a pity that -- in switching from the central, discipline-based
self-archiving that physicists (for example) have been doing for more than
a decade now, to the much more general and potentially faster-growing
distributed institutional self-archiving model we are now advocating
as reflecting the joint interests of authors and their institutions
in maximizing their research impact -- some well-meaning advisers of
institutions are recommending needless constraints that were never even
considered by the sensible physicists as they went about doing the right
thing for over a decade now.
May I recommend (again) the "Los Alamos Lemma" that if there is something
that anyone tells you you *must* do or worry about before you can safely
go ahead and self-archive, and that something was *not* something that
the physicists self-archiving in Los Alamos bothered about (yet here they
are, a decade hence, with the biggest body of self-archived full-text
to date), then that something is *not* something you must do or worry
about before you can safely go ahead and self-archive?
"The 'Los Alamos Lemma'"
> It has been proposed that one way of solving at least some of
> the copyright issues of institutional repositories would be for
> universities and other educational institutions to assert copyright
> ownership of the research outputs of employees (Gadd, Oppenheim &
> Probets, 2003).
Fine idea, but a separate project, *not* a prerequisite for immediate
self-archiving of all institutional refereed-research output. Go for it,
but not as a precondition; it's not necessary, and it would waste more
time instead of going ahead and doing what is already long overdue.
> 5.1.2 Peer-review and quality control
> Another objection to e-print repositories is that it might enable the
> bypassing of peer-review.
The way to disable this red herring is ever so simple: State clearly
that institutional eprint archives are primarily intended for the
self-archiving of *peer-reviewed, published research articles,* before and
after peer review and publication. Continue to distinguish (as researchers
have always done) unrefereed preprints from refereed postprints:
And don't forget that self-archiving is not self-publication:
"1.4. Distinguish self-publishing (vanity press) from
self-archiving (of published, refereed research)"
> In order
> to ensure a certain level of quality control, some institutions
> may decide to separate peer-reviewed e-prints from those that have
> not been reviewed.
Unnecessary and misleading about the nature of institutional archiving
and metadata tagging: All that is needed is that peer-reviewed,
published articles should be clearly *tagged* as such (including their
journal-name, etc.), and that unrefereed preprints should likewise be
clearly tagged as such, as they are in the eprints.org archives. Leave
the rest to the researcher-users, as it has always been left to them. The
institution is not the publisher; it is merely providing (tagged) access
to its own researchers' research output, pre- and post-peer-review.
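As a minimal sketch of what such tagging amounts to (this is not the
eprints.org schema; the record type and its field names are hypothetical
and chosen only for illustration), a single archive can hold both kinds
of paper side by side and simply label each one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EprintRecord:
    """Hypothetical archive record; field names are illustrative only."""
    title: str
    status: str                    # "preprint" (unrefereed) or "postprint" (refereed)
    journal: Optional[str] = None  # given only for refereed, published postprints

    def label(self) -> str:
        # The tag tells the user what they are reading; no separate archive is needed.
        if self.status == "postprint":
            return f"[refereed, published in {self.journal}] {self.title}"
        return f"[unrefereed preprint] {self.title}"


records = [
    EprintRecord("Example paper A", "preprint"),
    EprintRecord("Example paper A", "postprint", journal="Example Journal"),
]
labels = [record.label() for record in records]
for line in labels:
    print(line)
```

The point of the sketch is only that the refereed/unrefereed distinction
is carried by a tag on each record, not by splitting the archive in two.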
> 5.1.3 Long-term preservation
> Another potential problem is what will happen to e-print repositories
> in the longer-term (e.g., Smith, 2003).
Is it not yet apparent that preservation is one of the most roseate and
disfragrant of the many red herrings repeatedly flung into
*If* it is not sufficient to note that self-archiving is not
self-publication -- that it is merely a supplement to, not a substitute
for, publication, meant to provide immediate open-access, now, for those
would-be users whose institutions cannot afford toll-access to one's
published research, and that hence any preservation worries should be
redirected to the primary locus, the locus classicus, of this published
research, namely, the publisher's official on-paper and on-line version,
*not* the author's institutional open-access supplement to it:
-- *then* surely the fact that (as Michael Day points out) the
institutional archives are near empty (Strathclyde has exactly *zero*
eprints, though it has been up for at least a year) should make it clear
that the preservation of the non-contents of these near-empty archives
is *not* the problem: *filling* them is.
> The move towards licensing content threatens the role of libraries as
> the preserver of scientific knowledge.
> ...institutions that set up repositories may not
> always be aware of their responsibility to ensure the long-term
> preservation of content.
Librarians need to more fully think through the implications of their
new roles in digital repository curation: For the time being, these
archives are supplements only, intended to provide immediate open access
to institutional research output. The primary locus of that research
output is still the peer-reviewed journals. Please separate concerns
about the changing library role in ensuring the permanence of those
primary contents (bought in from the journal publishers, as always) from
the brand new role of providing immediate access to university research
output, as a supplement to its traditional and still-primary means of
access. In other words, the problem in self-archiving is how to get
the archives filled, not how to preserve their contents. (Get those
supplementary contents first, then perhaps move on to the luxury problem
of preserving them. And above all, don't conflate the primary-corpus
preservation problem with anything concerning this secondary, supplemental
corpus, whose urgent problem is its virtual nonexistence, not its
> 5.2 Cultural issues
> While some advocates of e-prints argue that the
> authors of peer-reviewed papers write primarily for research impact
> (e.g., Harnad, 2001, p. 1024), the multiple roles that journals
> have evolved over time to fulfil suggest that this may not be the
> whole story.
And what are those further multiple roles? Research impact means being
seen, read, used, applied and cited. We are talking only about
peer-reviewed research. So apart from providing peer-review and impact,
what further roles does one have in mind here? Distribution?
Self-archiving is a supplement to the usual (on-paper and on-line) modes
of distribution. Preservation of the primary (publisher's) corpus, whether
on-line or on-paper, is independent of self-archiving, as noted. So what
are these multiple roles? And what other roles are needed for eprints,
apart from maximizing immediate access and hence impact, in order to
complete the whole story?
> [the view] that the main function of journals is dissemination
> ...is incomplete.
> In the first place, peer-reviewed journals are not, and have never
> been, the only way for scientists or scholars to disseminate. It
> is one of a range of different dissemination methods - including
> informal discussion, conference papers, pre-prints, books, etc. -
> that are now being supplemented in the digital era.
True, but irrelevant to the main functions of *journals*, as well as to
the main function of self-archiving, which is to supplement access to
> Secondly, while
> dissemination remains one of the more important roles of peer-reviewed
> journals, they have evolved into a sophisticated system that provides
> (at least) the following additional features (Rowland, 1997):
> Quality control through editorial processes and peer-review
> The basis for the establishment of priority over advances in
> science and the recognition of authors
> The establishment of a distributed public-domain archive
Peer-review is indisputably one of the essential functions performed by
peer-reviewed journals (but it is not at issue, since self-archiving is
intended mainly in order to provide open access to peer-reviewed
research).
Publication provides priority; self-archiving provides access.
I don't know about "public domain." (Peer-reviewed research authors
would like their papers to be openly accessible, i.e., publicly
accessible, but I am not sure they want to relinquish all copyright
protection by making them "public-domain"! I suspect this is a red
herring too.) Let institutions self-archive their research in
OAI-compliant eprint archives and the distributedness will take care of
itself (partly through the web itself, partly through
> In short, the multiple essential functions that are fulfilled by
> journals may mean that scientists and scholars may be reluctant
> to adopt forms of scientific communication that emphasise the
> importance of dissemination over its other roles.
I can't follow this. Researchers publish to make their (peer-reviewed)
work publicly accessible and usable. Self-archiving is merely a way to
enhance access and usage.
> It is perhaps instructive that many of the papers deposited in arXiv
> are also submitted for publication in peer-reviewed journals, thus
> combining the rapid dissemination abilities of digital technology
> with the other functions best provided by journals.
It is not just instructive (and not just "many" but virtually *all*)!
One wonders what grounds anyone ever had to suppose otherwise.
Absolutely nothing has changed in physics publication practices since
self-archiving began in 1991: Virtually everything continues to be
submitted to peer-reviewed journals, refereed, revised, and published,
exactly as before. Self-archiving is merely to enhance access to it,
both before and after peer-review.
> Following this, the advocates of self-archiving now argue that
> depositing e-prints in institutional repositories need not mean
> that authors should give up publishing in high-impact journals.
Who ever said otherwise? Self-archiving has always been the
self-archiving of research, pre- and post-peer review, to maximize
access and impact. The conjecture that self-archiving unrefereed
research might become a *substitute* for peer-reviewed publication, and
that peer review might disappear or be replaced by something else, is,
and has always been, pure speculation, based on nothing that has
ever actually happened throughout the past dozen years or more.
It seems as remote from reality to speculate about whether open access
will alter or eliminate peer review (when most research is still only
available through toll-access) as to worry about preserving the contents
of near-empty eprint archives! What we need is not speculation but
> there remain 'perverse incentives' for scholars and scientists to
> publish papers in the existing journal-based system...
> Researchers and scholars choose whether (or not) to publish in
> journals depending on a range of objective or subjective criteria
> - e.g., prestige, perceived quality, audience, high-impact, etc.
> - but not primarily on price.
These all seem to be non-sequiturs! Almost all researchers today
submit their research to and, if accepted, publish their research in
peer-reviewed journals (or peer-reviewed conference proceedings), as they
did for decades previously. Most of them do not yet self-archive their
research. But even those who do self-archive nevertheless continue to do
exactly the same as those who don't, namely, they continue to submit their
research to and, if accepted, publish their research in peer-reviewed
journals. Of course this has nothing to do with the journal's price! It
has to do with the journal's quality and track-record for quality (hence
the journal's peer-review quality, rigor, selectivity).
"Publish-or-Perish" never meant vanity-press publication, or
self-publication (or the mere self-archiving of unpublished, unrefereed
work). It meant peer-reviewed publication, where the track-record of the
journal for its level of peer review and quality mattered, both to
would-be users and to funders and evaluators.
Nothing has changed -- except that the impact of that research can now
be maximized by self-archiving it.
> the parts of
> institutions that actually spend money on journal subscriptions
> or licenses are not always the ones who read or submit articles to
And hence we see again why the pricing crisis has little direct
connection with the self-archiving incentive: Researchers are aware that
they themselves cannot access everything, and that of course makes them
wish it cost less, or that their institutions could afford more. But
that has no obvious connection with the incentive to self-archive. The
connection comes at *other* institutions: those that cannot afford the
tolls for access to the journals in which you publish. For their
would-be users cannot then see or use your research. And that means lost
impact for you. What is the solution? Not to strive to lower or
eliminate journal prices (welcome as that would be), but to make your
own research immediately accessible to all, toll-free, by self-archiving
> establishment of priority over a particular advance or discovery is
> one of the basic motivations of most scientists and is, on occasion,
> considered more important than being read or cited by peers
You can't establish priority if no one can read your paper; so priority
is just a form of impact. But it is certainly true that the earlier
one can establish priority, the better. This is another incentive
for also self-archiving the pre-peer-review preprint, and not just the
post-peer-review postprint. But a constraint (for both the author and
the user) is that, in general, unrefereed results are less reliable than
refereed results. One does not want to make a public fool of oneself
either (nor to waste one's limited reading time on -- let alone trying to
build upon -- unfiltered work that might not be sound). So (as always),
the rush to priority is tempered by prudence about first successfully
meeting the standards and scrutiny of peer review.
> The publication of papers is also becoming increasingly important
> in institutional contexts as a response to the growing culture of
> research assessment.
And the impact of those papers is just as important as publishing them
at all. Unread, unused research is less likely to be funded and rewarded.
And inaccessible research is less likely to be read or used.
> Historians write essays and articles for many specific and
> often unrelated reasons - to launch their careers, to establish a
> reputation, to keep their hand in; to please themselves, to impress
> their colleagues, to reach a broader audience; to sketch out a new
> idea, to anticipate a major work, to avoid writing a book; to take
> a break from a big project, to dabble but not delve too deeply, to
> revisit old friends and old haunts; to give as conference papers,
> to deliver as public lectures, to contribute to edited volumes;
> to indulge their scholarly curiosity, to make some (but not much)
> money, and (most recently and regrettably) to provide essential
> fodder for the Research Assessment Exercise.
But it is unlikely that a historian ever writes with indifference about
whether his writing will be read and used.
(And I doubt that even historians receive any royalties from their
peer-reviewed journal articles.)
> While few of these - with the partial exception of the economic
> motive - would invalidate the possibility of depositing such essays
> in an institutional archive, their complexity might suggest that
> there may be little direct incentive for some researchers to do so.
Disciplinary practices may change too, as webmetric measures of impact
become more sensitive and powerful, in the wake of self-archiving.
"Can journal-based research impact assessment
be generalised to book-based disciplines?"
> 5.2.3 Cultural differences between subject domains
There are no doubt disciplinary differences, but I challenge anyone to
find a discipline that is indifferent to whether or how much its
refereed publications are read and used.
> 5.2.4 The diverse nature of research institutions
> what should happen to that proportion
> of published research that is not published by academic institutions.
> ... need to work with other national services and repositories set up
> by non HE institutions.
All institutional refereed-research output should be self-archived,
whether the institution is a university or not.
> Great progress has been made in the development of standards and
> software tools that permit the easy creation of repositories....
> The organisational side is less well developed,
> and some stakeholders have concerns about copyright, peer-review and
> long-term preservation.
It is certainly true that the organisational side of the self-archiving
initiative is less well-developed, and that that is where the thought
and resource-investment is needed now. The information for allaying
all concerns about copyright, peer-review and preservation is available
and has been (both in print and online) for some time. It now has to be
clearly and systematically communicated to the "stakeholders" so we can
get on with it!
> More seriously, the cultural dependence of
> academics on the existing journal system may mean that the take-up
> of self-archiving and other open-access methods may be incremental
> rather than rapid, focusing more on some subject disciplines than
> on others.
This again sounds like a non-sequitur to me. Disciplines have certainly
varied in their self-archiving progress to date, and that is what needs
the systematic work to accelerate. But no discipline fails to benefit
from enhanced research impact; they vary only in the degree to which
they may be *aware* of the benefits -- and the newfound possibility of
> ePrints UK should support the significant advocacy activity proposed
> by Pinfield (2003).
Hear, hear! http://www.ariadne.ac.uk/issue29/open-archives/
> Once sufficient content is available, it will be possible to evaluate
> both the proposed ePrints UK national service and the Web services
> designed to support their development and use.
It is already possible to evaluate and quantify the impact enhancement
provided by what self-archiving there has been. Quantifying and
publicizing this systematically and extensively will help propagate the
> Lynch, C.A. (2003). "Institutional repositories: essential
> infrastructure for scholarship in the digital age." ARL Bimonthly
> Report, 226. Available at: http://www.arl.org/newsltr/226/ir.html
"Cliff Lynch on Institutional Archives"
Some references follow.
Harnad, S. (2003) Self-Archive Unto Others as Ye Would Have Them
Self-Archive Unto You. The Australian Higher Education Supplement.
Harnad, S. (2003) Measuring and Maximising UK Research
Impact. Times Higher Education Supplement. Friday, June 6 2003.
Harnad, S. (2003) Maximising UK Research Impact Through
Harnad, S. (2003) Electronic Preprints and Postprints. Encyclopedia
of Library and Information Science. Marcel Dekker, Inc.
Harnad, S. (2003) Online Archives for Peer-Reviewed Journal
Publications. International Encyclopedia of Library and
Information Science. John Feather & Paul Sturges (eds). Routledge.
Harnad, S. (2003) Back to the Oral Tradition Through
Skywriting at the Speed of Thought. Interdisciplines.
NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):
Discussion can be posted to: september98-forum at amsci-forum.amsci.org