Duplicate database entries

Cary O'Donnell ODONNELL at ARCB.AFRC.AC.UK
Tue Dec 10 09:28:00 EST 1991


The recent discussion on BIOSCI about "Genbank errors" made it clear that
there is a (debatable) case for keeping different versions of the same
sequence in the databases.

I also understood that some merging of sequences DID go on - hence multiple
accession numbers for each sequence.

NO FLAME - Just curious...

   What justification is there for the following two entries in the
   database, by the same authors (Rao et al), published in the same
   year (1986):

      AC M16288 length 3443 bp (EMBL:HSPDGFBA, GB:HUMPDGFBA)
      AC M12783 length 3798 bp (EMBL:HSSISPDG, GB:HUMSISPDG)

   They are identical along their shared 3443 bp stretch. Surely the
   data should be merged into one entry?

Maybe this is just an oversight. Does anyone know how many other examples
like this exist? Such duplicates seem an unnecessary waste of disk space and
of CPU time in searches, but it would be quite a job finding them all.
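
As a rough illustration of what such a search might look like, here is a
minimal sketch in Python. It assumes the entries have already been loaded
into a dictionary mapping accession number to sequence; the function name
and the toy data are hypothetical and not part of any database tool. It
only detects exact containment on the same strand, and it compares every
pair, so it would be far too slow for a whole database as written.

```python
def find_contained_entries(entries):
    """Return (shorter, longer) accession pairs where the shorter
    sequence occurs verbatim inside the longer one."""
    # Sort by length so each entry is only compared with entries
    # that are at least as long as itself.
    ordered = sorted(entries.items(), key=lambda kv: len(kv[1]))
    pairs = []
    for i, (short_acc, short_seq) in enumerate(ordered):
        for long_acc, long_seq in ordered[i + 1:]:
            # Case-insensitive exact substring test (same strand only).
            if short_seq.upper() in long_seq.upper():
                pairs.append((short_acc, long_acc))
    return pairs


# Toy data, invented for illustration only (not real database sequences).
toy = {"A1": "ACGTACGT", "A2": "TTACGTACGTGG", "A3": "GGGGCCCC"}
print(find_contained_entries(toy))  # [('A1', 'A2')]
```

A real search would also need to consider the reverse complement and
near-identical (not just exact) matches.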

Cary O'Donnell

*****************************************************************************
AFRC Computing Division         JANET   : AFRC.ARCB::ODONNELL
West Common                     INTERNET: ODONNELL at ARCB.AFRC.AC.UK
Harpenden                       Tel: (+44) 582 762271 ext 229
Herts AL5 2JE                   Fax: (+44) 582 761710
U.K.                            (AFRC = Agricultural & Food Research Council)
-----------------------------------------------------------------------------


