Dataless entries in EMBL,GenBank
tony at wehi.edu.au
Mon Nov 11 10:40:12 EST 1991
I have a question for the database collators that probably needs a
wide airing given the implications.
I do not know whether many of you have noticed it (and it may be an error
rather than a policy initiative of the databases), but in release 26 or 27 of
EMBL, continuing in release 28 and, through the cross-pollination of EMBL into
GenBank, in GenBank 69, there is an entry that has reference and annotation
information but no sequence data. There may be others, but this is the only
one I have detected to date.
The entry is M37586 (HIVXXXXX in EMBL and HIVVXXXXX in GenBank).
The entry follows:
ID HIVXXXXX standard; RNA; VRL; 1 BP.
DT 01-FEB-1991 (Rel. 26, Last updated, Version 2)
DT 22-DEC-1990 (Rel. 26, Created)
DE Temporary entry.
OS Human immunodeficiency virus type 1
OC Viridae; ss-RNA enveloped viruses; Positive strand RNA viruses;
OC Retroviridae; Lentivirinae.
RA LaRosa G.J., Davide J.P., Weinhold K., Waterbury J.A., Profy A.T.,
RA Lewis J.A., Langlois A.J., Dreesman G.R., Boswell R.N.,
RA Shadduck P., Holley L.H., Karplus M., Bolognesi D.P.,
RA Matthews T.J., Emini E.A., Putney S.D.;
RT "Conserved sequence and structural elements in the HIV-1 principal
RT neutralizing determinant";
RL Science 249:932-935(1990).
CC The data associated with reference  are currently being d and
CC checked for accuracy by the database staff and hors. The full
CC dataset will be available as soon has been completed.
FH Key Location/Qualifiers
SQ Sequence 1 BP; 0 A; 0 C; 0 G; 0 T; 1 other;
On the one hand, I do not like the idea of withholding data, but I
would like to see entries like the one above for data sets that are on hold
until the publication date. Then the world at large has an accession number to
relate to papers etc., and can scream out when a data set is still on hold
after publication. However, as the publication date has passed in this case,
the withholding of the data, even for the reasons given, is a worry.
However, is the main body of the database (in this case the VRL
division) the appropriate place for such entries? We have an unannotated
division, so how about a "dataless" division? As it is, some software broke on
the above entry. For example, depending on the version of TFASTA and the
system you are using, it had trouble translating the six frames of
this "sequence" - surprise, surprise!
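Until such entries are moved or removed, a site could screen its local copy of the database before handing it to search programs. What follows is only a minimal sketch in Python, not anything the databases or TFASTA provide. It assumes the flat file uses the ID and SQ line types shown in the entry above, and it treats an entry as "dataless" when its SQ line reports zero A, C, G and T. The function name `find_dataless_entries` and the second sample entry (EXAMPLE1) are hypothetical and exist for illustration only.

```python
import re

# Matches the per-base counts on an EMBL SQ line, e.g. "0 A;" or "1 other;".
# The leading "1 BP" is deliberately not matched.
SQ_COUNT = re.compile(r"(\d+)\s+(A|C|G|T|other)\b")


def find_dataless_entries(lines):
    """Return the IDs of entries whose SQ line reports no A, C, G or T.

    Assumption: each entry has an ID line before its SQ line, as in the
    M37586 entry quoted in this message.
    """
    dataless = []
    entry_id = None
    for line in lines:
        if line.startswith("ID "):
            parts = line.split()
            entry_id = parts[1] if len(parts) > 1 else None
        elif line.startswith("SQ "):
            counts = {base: int(n) for n, base in SQ_COUNT.findall(line)}
            if sum(counts.get(base, 0) for base in "ACGT") == 0:
                dataless.append(entry_id)
    return dataless


sample = [
    "ID HIVXXXXX standard; RNA; VRL; 1 BP.",
    "SQ Sequence 1 BP; 0 A; 0 C; 0 G; 0 T; 1 other;",
    "//",
    "ID EXAMPLE1 standard; DNA; VRL; 4 BP.",  # hypothetical entry
    "SQ Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;",
    "     acgt",
    "//",
]
print(find_dataless_entries(sample))
```

The IDs returned by such a check could be used to exclude those entries from the files given to translation-based searches, or simply to report them to the database staff.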
If this entry is to become a trend, cannot such entries be put in a
separate division? Or do we withhold such entries completely?
What do you think, folks?
Dr. Tony Kyne, Head, Computer Sciences Unit,
The Walter and Eliza Hall Institute of Medical Research,
P.O. Royal Melbourne Hospital, Victoria, 3050, Australia.
Phone: International +61-3-345-2586, National 03-345-2586
FAX: International +61-3-347-0852, National 03-347-0852
Email: Internet: tony at wehi.edu.au