The recent discussion on BIOSCI about "Genbank errors" made it clear that
there is a (debatable) case for keeping different versions of the same
sequence in the databases.
I also understood that some merging of sequences DID go on, hence the
multiple accession numbers carried by some sequences.
NO FLAME - Just curious...
What justification is there for the following two entries in the
database, by the same authors (Rao et al.), published in the same
year (1986)?
AC M16288 length 3443 bp (EMBL:HSPDGFBA, GB:HUMPDGFBA)
AC M12783 length 3798 bp (EMBL:HSSISPDG, GB:HUMSISPDG)
They are identical along their shared 3443 bp stretch, i.e. the whole of
M16288 is contained in M12783. Surely the data should be merged into one entry?
Maybe this is just an oversight. Any ideas how many other examples like this
exist? Such duplicates seem an unnecessary waste of disk space, and of CPU time
in searches, but it would be quite a job finding them all...
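For what it's worth, the exact-containment case (one entry lying wholly inside
another, as M16288 does inside M12783) is at least easy to test for. Below is a
minimal sketch in Python, assuming the sequences have already been extracted
from the flatfiles into plain strings keyed by accession number. The function
name find_contained_pairs and the toy accessions are hypothetical, and only
exact matches are caught: entries that differ by even one base, or that only
partially overlap, would be missed.

```python
def find_contained_pairs(seqs):
    """Return (shorter_id, longer_id) pairs where the shorter sequence
    occurs verbatim inside the longer one (case-insensitive).

    seqs: dict mapping a (hypothetical) accession string to a sequence string.
    """
    # Normalise case so 'acgt' and 'ACGT' compare as equal.
    norm = {acc: s.upper() for acc, s in seqs.items()}
    # Sort by length so each sequence is only tested against
    # sequences at least as long as itself.
    items = sorted(norm.items(), key=lambda kv: len(kv[1]))
    pairs = []
    for i, (short_id, short_seq) in enumerate(items):
        for long_id, long_seq in items[i + 1:]:
            # Plain substring test: exact containment only, no mismatches.
            if short_seq in long_seq:
                pairs.append((short_id, long_id))
    return pairs


# Toy example with made-up accessions.
toy = {"SEQ_A": "GATTACA", "SEQ_B": "ccGATTACAgg", "SEQ_C": "TTTTTT"}
print(find_contained_pairs(toy))  # [('SEQ_A', 'SEQ_B')]
```

The all-pairs loop grows with the square of the number of entries, so across a
whole database one would want some kind of index (for example on short words)
to pick candidate pairs first; the sketch only shows the test itself.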
Cary O'Donnell
*****************************************************************************
AFRC Computing Division JANET : AFRC.ARCB::ODONNELL
West Common INTERNET: ODONNELL at ARCB.AFRC.AC.UK
Harpenden Tel: (+44) 582 762271 ext 229
Herts AL5 2JE Fax: (+44) 582 761710
U.K. (AFRC = Agricultural & Food Research Council)
-----------------------------------------------------------------------------