Dataless entries in EMBL, GenBank

tony at tony at
Mon Nov 11 10:40:12 EST 1991

	I have a question for the database collators that probably needs a
wide airing given the implications.

	I do not know whether many of you have noticed it (and this may be an error
rather than a policy initiative of the databases), but since release 26 or 27 of
EMBL, continuing in release 28, and, because of the cross-pollination of EMBL
into GenBank, also in GenBank 69, there is an entry that has reference/annotation
information but no sequence data. There may be others, but this is the only one
I have detected to date. The entry is M37586 (HIVXXXXX in EMBL and HIVVXXXXX in
GenBank).

	The entry follows:

ID   HIVXXXXX   standard; RNA; VRL; 1 BP.
AC   M37586;
DT   01-FEB-1991 (Rel. 26, Last updated, Version 2)
DT   22-DEC-1990 (Rel. 26, Created)
DE   Temporary entry.
KW   .
OS   Human immunodeficiency virus type 1
OC   Viridae; ss-RNA enveloped viruses; Positive strand RNA viruses;
OC   Retroviridae; Lentivirinae.
RN   [1]
RP   1-1
RA   LaRosa G.J., Davide J.P., Weinhold K., Waterbury J.A., Profy A.T.,
RA   Lewis J.A., Langlois A.J., Dreesman G.R., Boswell R.N.,
RA   Shadduck P., Holley L.H., Karplus M., Bolognesi D.P.,
RA   Matthews T.J., Emini E.A., Putney S.D.;
RT   "Conserved sequence and structural elements in the HIV-1 principal
RT   neutralizing determinant";
RL   Science 249:932-935(1990).
CC   The data associated with reference [1] are currently being d and
CC   checked for accuracy by the database staff and authors. The full
CC   dataset will be available as soon has been completed.
FH   Key             Location/Qualifiers
SQ   Sequence  1 BP;  0 A; 0 C; 0 G; 0 T; 1 other;

	On the one hand, I do not like the idea of withholding data. On the
other, I would like to see placeholder entries like the one above for data that
are on hold until the publication date. Then the world at large has an accession
number to relate to papers etc., and can scream out when a data set is still on
hold after publication. However, as the publication date has passed in this
case, the withholding of the data, even for the reasons given, is a worry.

	However, is the main body of the database (in this case the VRL
division) the appropriate place for such entries? We have an unannotated
division; how about a "dataless" division? As it is, some software broke on
the above entry. For example, depending on which version of TFASTA you are
using and on which system, it had trouble translating the six frames of
this "sequence" - surprise, surprise!
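
	Until the question is settled, local software could screen such entries
out before passing a flat file to search programs. What follows is only a
minimal sketch in Python, not anything the databases provide. It assumes the
EMBL flat-file layout shown above, with entries terminated by a "//" line. The
function names and the "fewer than 2 bases, or no A/C/G/T at all" rule are my
own illustrative assumptions.

```python
import re

# Matches the EMBL "SQ" header line as it appears in the entry above, e.g.
# "SQ   Sequence  1 BP;  0 A; 0 C; 0 G; 0 T; 1 other;"
SQ_LINE = re.compile(
    r"^SQ\s+Sequence\s+(\d+)\s+BP;\s*(\d+)\s+A;\s*(\d+)\s+C;"
    r"\s*(\d+)\s+G;\s*(\d+)\s+T;\s*(\d+)\s+other;"
)


def is_dataless(entry_lines, min_length=2):
    """Return True if an entry appears to carry no real sequence data.

    Heuristic (an assumption, not a database rule): the entry has no
    recognisable SQ line, or the SQ line reports fewer than `min_length`
    bases, or it reports no A, C, G or T at all.
    """
    for line in entry_lines:
        match = SQ_LINE.match(line)
        if match:
            length, a, c, g, t, _other = map(int, match.groups())
            return length < min_length or (a + c + g + t) == 0
    return True


def split_entries(lines):
    """Yield each entry as a list of lines; a '//' line ends an entry."""
    entry = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    # Keep a final unterminated entry only if it is not just blank lines.
    if any(text.strip() for text in entry):
        yield entry


def partition(lines):
    """Split a flat file into (entries with data, dataless entries)."""
    with_data, dataless = [], []
    for entry in split_entries(lines):
        (dataless if is_dataless(entry) else with_data).append(entry)
    return with_data, dataless
```

	Calling partition() on the lines of a flat file would leave the
placeholder entries in the second list, where they can be reported or written
to a separate file instead of being handed to a translation step.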

	If entries like this are to become a trend, can they not be put in a
separate division? Or do we withhold such entries completely?

	What do you think, folks?

				Regards, Tony
Dr. Tony Kyne, Head, Computer Sciences Unit,
               The Walter and Eliza Hall Institute of Medical Research,
               P.O. Royal Melbourne Hospital, Victoria, 3050, Australia.
Phone: International +61-3-345-2586   FAX: International +61-3-347-0852
            National 03-345-2586                    National 03-347-0852
Email: Internet: tony at
        PSIMAIL: PSI%0505233430002::tony
