Blackwell Publishing & Online Open

Stevan Harnad harnad at ecs.soton.ac.uk
Tue Mar 8 14:50:59 EST 2005


On Tue, 8 Mar 2005, Matthew Cockerill wrote:

> What I am saying is that there is huge scope for the scientific community to
> 'add value' to published scientific research in various ways.

OA is not about adding value, it is about accessing published peer-reviewed
journal articles (with the value already in them).

> Adding value to the literature is made dramatically easier (and therefore
> is much more likely to happen) if the scientific research concerned can be
> downloaded in a structured XML form. 

Are you referring to articles here or research data? If research data, we are
talking apples and oranges, because OA is about articles. If you are talking
about articles, what is the XML problem? (If it ever becomes indispensable,
authors can XML-ize their own articles; but I doubt they will think it's
worth the trouble, and that's the point!)

> I can perhaps clarify what I mean by "added value" with an analogy. 
> Take the new service Google Maps <http://maps.google.com>

Maps are data we use. Texts are literature we read. Database collection,
enhancement and provision are irrelevant to OA, which is about access to
articles, written by authors and given away by them for research impact.

> Structured information includes: reliably and consistently formatted
> bibliographic citation information, mathematical formulae, chemical
> structures, figure legends, author affiliation information etc etc. Sure,
> much of this can be partially reverse-engineered from an unstructured
> version of the document, but this process is complex, error-prone and a
> major unnecessary hurdle to re-use. The more hurdles there are, the less
> chance there is that any given idea for adding value will see the light of
> day. It's not simply a question of whether something is possible - we need
> to be making it *easy* to add value to the data.

The many would-be users who cannot afford it have no access to 80% of
these articles today, and you are worrying about reliability of formatting?
How about settling for making them 100% accessible to everyone first? Then
we can talk about adding more value...

> Stevan, I know that you will object that some of these things could
> theoretically be achieved without access to the XML, and without rights to
> redistribution, but my point is not that these things are *impossible* in
> the absence of structured XML and rights for reuse/redistribution, but that
> all are made immeasurably *easier* if freely re-distributable structured
> data is available.  The practical result of this is that making the
> re-usable and re-distributable XML available has an immensely stimulating
> effect on innovation.

I don't dispute that; I'm just pointing out that it's the ("added-value")
tail wagging the (80% non-existent) dog right now!

> * Image mining/reuse

If authors want their data or images re-used, they can self-archive them on
their own sites (and link them to the publisher's official version of record too).

> * Text mining

Ditto for text.

> * Antbase

Data-mining again.

> * PubMed's Bookshelf 

Irrelevant: OA's target now is the 2.5 million annual articles in the 24,000
peer-reviewed journals. Every one of those articles is an author give-away. Not
so, in general, for books. So don't hitch their fates together. (Let those authors who want
to self-archive their books do so...)

> mathematical formulae.

Do it in your self-archived version.

> * Biological database linking

Irrelevant. If the data are OA, link to them. If you want to make your data OA,
self-archive them.

> molecules

Data-archiving again. See:
http://www.psigate.ac.uk/ebank/

> Again, I'm not saying that all the above is *impossible* with self-archived
> material in unstructured form.

Nor is it impossible to structure self-archived material!

> a whole lot more... will be
> achievable once the bulk of the scientific literature is fully open access
> in the Bethesda/Berlin/Budapest sense (i.e. available in a structured form,
> and fully redistributable).

No doubt. But almost as much will be achievable if it is just made OA in the
sense that pre-dates and post-dates this needless pre-emptive complexification:

    Immediate, permanent, full-text, online access, free for all users. 

That's OA (and we only have 20% of it today). Let's talk about the icing
when we have 100% of the cake!

> But in the long term, is it really the best use of resources invested by the
> funder/employer for researchers to spend their time self-archiving their
> research

Yes, that 6-10 minutes of keystrokes per article is eminently well spent!
(Les Carr will shortly have a paper with the empirical timing data
on this.)

> identifying any significant changes which happened during
> post-acceptance proofing and copy-editing

Only as important as the changes are significant (i.e., not much!).

> reconciling their various versions 

Reconciling versions for what? The official version of record is safely on the
publisher's website, for those users who can afford it, as it always was. With
self-archiving, the author's supplementary version is available for all those
users who cannot afford the official version. The refereed, accepted final
draft is already the difference between night and day for those users, and
you are talking about a bit of photometer fine-tuning (when it is still 80%
night!).

> coordinating XML markup etc? 

For those who wish to bother: let them go ahead!

> Surely this can far more efficiently
> be taken care of as an inherent part of the publishing process, rather than
> being tacked on the end?

Indeed! But as only 5% of journals (at most!) are currently offering to do this,
can we get back to the 95% solution?

Stevan Harnad

More information about the Jrnlnote mailing list