VMS database submission form program

Paul Gilna pgil at HISTONE.LANL.GOV
Thu Aug 1 13:08:05 EST 1991

	I have no wish to exert control over what our users and
	submitters do; however, I now believe it is necessary to voice
	an opinion on these latest developments designed to encourage
	the use of the ASCII submission form in preference to Authorin.

	Let me say first that I empathise completely with the
	conditions that make it necessary to take this particular
	route. I have no desire to enter into a discussion on the
	difficulties of Authorin usage in these environments, or the
	required solutions. What I believe I must point out is the COST
	to the users and to the database, of adopting this alternative

	My principal argument rests on the business end of Authorin,
	the transaction output which represents the database submission
	file.  It is in this file that the predominant impact of the
	program lies, particularly with respect to the processing of
	the data here at Los Alamos.

	Almost all data which we now routinely receive from Authorin is
	processed automatically. Data are read directly into the
	database, and if the transaction is successful, accession
	numbers are issued immediately. In the case of email
	transactions, we are able to provide roughly same-day service
	for return of accession number(s). (I say "almost" because some
	early PC versions must be massaged first, but are then treated
	as for other versions--this will change in the upcoming PC

	In the case of Authorin, the process of "annotation" has gone
	from one of data extraction and interpretation to one of
	review.  Essentially, the author has already entered the data
	and their attendant biological interpretation.  Our task is to
	review the data and make sure it is consistent with the rest of
	the database by applying a set of integrity checks to the

	Again this entire process is made vastly more efficient by the
	fact that the data are already in the database.

	Our average turnaround on non-confidential data submissions is
	now of the order of DAYS, i.e., days from the point of receipt
	to release of the data to our servers. One of the most
	significant reasons that we have reached and can hold to this
	steady state is the increasing use of Authorin, which now
	holds a 30% share of the submissions market and climbing.

	Equally, the process of releasing confidential data has vastly
	improved. All confidential data are held outside the database
	until their release.  This means that though we have had the
	data in our hands for months, we are essentially starting
	annotation from scratch at the point of release. With Authorin
	transactions, we can accomplish this in a day, and the fact
	that so much of our data comes in this manner allows us the
	time to deal efficiently with the rest of the non-authorin

	We have steadfastly maintained that in order to deal with a
	rate of increase of data flow that is of the order of a yearly
	doubling, we MUST increase the efficiency with which we can deal
	with the data--either that or double our staff every year!
	That's why we developed the submissions protocol and why we
	developed automated data submission capabilities. 
	Secondly, Authorin can provide us with far more extensive
	annotation on the data than we can provide ourselves or, more
	importantly, have the time to provide, even if the data came in
	on a submission form.
	Right now our performance is at a steady-state, all-time high,
	yet the growth rate in data production has not abated--in 1985,
	it took us a year to process 1.5M bp; today we release that
	much to the servers in a day.

	THAT is where Authorin has its most powerful impact.

	If you must write something for the VMS (or UNIX) world, then
	we plead that it be transaction capable, because no matter how
	glitzy or easy-to-use you make your product, without that
	capability you are dooming your users to an antiquated and
	increasingly outdated mode of data submission, and in the end,
	it is they who will not thank you for it.

/end (soapbox)


Paul Gilna
Biology Domain Leader,
GenBank, Los Alamos.
