VMS database submission form program
pgil at HISTONE.LANL.GOV
Thu Aug 1 13:08:05 EST 1991
I have no wish to exert control over what our users and
submitters do; however, I now believe it is necessary to voice
an opinion on these latest developments designed to encourage
the use of the ASCII submission form in preference to Authorin.
Let me say first that I empathise completely with the
conditions that make it necessary to take this particular
route. I have no desire to enter into a discussion on the
difficulties of Authorin usage in these environments, or the
required solutions. What I believe I must point out is the COST
to the users and to the database, of adopting this alternative
My principal argument rests on the business end of Authorin,
the transaction output which represents the database submission
file. It is in this file that the predominant impact of the
program lies, particularly with respect to the processing of
the data here at Los Alamos.
Almost all data which we now routinely receive from Authorin is
processed automatically. Data are read directly into the
database, and if the transaction is successful, accession
numbers are issued immediately. In the case of email
transactions, we are able to provide roughly same-day service
for return of accession number(s). (I say almost, because some
early PC versions must be massaged first, but are then treated
as for other versions--this will change in the upcoming PC
In the case of Authorin, the process of "annotation" has gone
from one of data extraction and interpretation to one of
review. Essentially, the author has already entered the data
and their attendant biological interpretation. Our task is to
review the data and make sure it is consistent with the rest of
the database by applying a set of integrity checks to the
Again, this entire process is made vastly more efficient by the
fact that the data are already in the database.
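As a rough illustration of the automated path described above (read the transaction into the database, issue accession numbers only if it succeeds, then leave the entries for review), here is a minimal Python sketch. Everything in it is hypothetical: the `Database` class, the `process_transaction` function, the record layout, and the accession numbering scheme are invented for illustration and do not describe the actual GenBank or Authorin software.

```python
# Hypothetical sketch only: none of these names or formats come from
# GenBank or Authorin; they just illustrate the workflow in the post.
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy in-memory stand-in for the sequence database."""
    entries: dict = field(default_factory=dict)
    next_serial: int = 1

    def load(self, record: dict) -> str:
        """Store one record, flagged as awaiting review, and return its accession number."""
        accession = f"X{self.next_serial:05d}"  # made-up numbering scheme
        self.next_serial += 1
        self.entries[accession] = {**record, "reviewed": False}
        return accession


def process_transaction(db: Database, records: list) -> list:
    """Load a whole transaction; issue accession numbers only if every record is usable."""
    if not all(r.get("sequence") for r in records):
        return []  # failed transaction: nothing is loaded, handle it manually
    return [db.load(r) for r in records]


if __name__ == "__main__":
    db = Database()
    issued = process_transaction(db, [{"sequence": "ACGT"}, {"sequence": "GGCC"}])
    print("accession numbers issued:", issued)
```

The point of the sketch is only that the data land in the database first, so the later annotation step is a review of existing entries rather than fresh data entry.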
Our average turnaround on non-confidential data submissions is
now of the order of DAYS, i.e., days from the point of receipt
to release of the data to our servers. One of the most
significant reasons that we have reached and can hold to this
steady state is the increasing use of
Authorin, which now holds a 30% share of the submissions market
Equally, the process of releasing confidential data has vastly
improved. All confidential data are held outside the database
until their release. This means that though we have had the
data in our hands for months, we are essentially starting
annotation from scratch at the point of release. With Authorin
transactions, we can accomplish this in a day, and the fact
that so much of our data comes in this manner allows us the
time to deal efficiently with the rest of the non-Authorin
We have steadfastly maintained that in order to deal with a
rate of increase of data flow that is of the order of a yearly
doubling, we MUST increase the efficiency with which we can deal
with the data--either that or double our staff every year!
That's why we developed the submissions protocol and why we
developed automated data submission capabilities.
Secondly, Authorin can provide us with far more extensive annotation
on the data than we can provide ourselves or, more importantly, have
the time to provide, even if the data came in on a submission form.
Right now our performance is at a steady-state, all-time high,
yet the growth rate in data production has not abated. In 1985
it took us a year to process 1.5M bp; today we release that
much to the servers in a day.
THAT is where Authorin has its most powerful impact.
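To put rough numbers on the figures above, here is a back-of-the-envelope Python sketch. The 1.5M bp figure and the 1985 and 1991 dates come from this post; the 365-day year and the assumption of an exact, steady yearly doubling are simplifications made for illustration, and the function names are made up.

```python
# Back-of-the-envelope arithmetic only. The 1.5M bp figure is from the post;
# the 365-day year and exact yearly doubling are simplifying assumptions.

BP_BATCH = 1_500_000   # bp: a year's work in 1985, a day's release in 1991
DAYS_PER_YEAR = 365    # assumption: ignores leap years and non-working days


def throughput_speedup(days_then: float, days_now: float) -> float:
    """Ratio of processing rates when the same amount of data takes less time."""
    return days_then / days_now


def relative_flow(years: float, doubling_period_years: float = 1.0) -> float:
    """Data flow relative to the starting year, assuming steady doubling."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    print(f"bp per day in 1985: {BP_BATCH / DAYS_PER_YEAR:,.0f}")
    print(f"bp per day in 1991: {BP_BATCH:,}")
    print(f"throughput speedup: {throughput_speedup(DAYS_PER_YEAR, 1):.0f}x")
    print(f"flow growth over 1985-1991 at yearly doubling: {relative_flow(1991 - 1985):.0f}x")
```

Under these assumptions, a year's worth of 1985 processing released in a single day is roughly a 365-fold increase in throughput, against a 64-fold increase in data flow if the flow had doubled every year since 1985.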
If you must write something for the VMS (or UNIX) world, then
we plead that it be transaction capable, because no matter how
glitzy or easy to use you make your product, without that
capability you are dooming your users to an antiquated and
increasingly outdated mode of data submission, and in the end,
it is they who will not thank you for it.
Biology Domain Leader,
GenBank, Los Alamos.