same sequence is different in EMBL and GENBANK

Tom Schneider toms at fcsparc6.ncifcrf.gov
Mon May 9 11:50:45 EST 1994


In article <1994May5.062921.10972 at comp.bioz.unibas.ch>
doelz at comp.bioz.unibas.ch (Reinhard Doelz) writes:

| Tom Schneider (toms at fcsparc6.ncifcrf.gov) wrote:
| ...
| : In article <2q8h6o$av8 at mserv1.dl.ac.uk> Massimo Delledonne
| : <DELLE%IPCUCSC.earn at earn-relay.ac.uk> writes:
| ...
| : | Who made the mistake?  The authors who submitted the correct version to Genbank
| ...
| : Although horrifying, this is a perfect example of one of the reasons that
| : "federations" of databases are likely to fail horribly.  EMBL and GENBANK are
| : supposed to work closely together!  What will happen when we have 50 sequence
| : databases and they DON'T even try to work together?  People will be checking
| : one database against the other and finding errors.  If a database gets a
| : reputation for handling data poorly, won't people simply stop using it?
| 
| I strongly argue against this. We have now PIR versus Swissprot, 
| Los Alamos/NCBI/EMBL, not to mention Pacific Rim originating sources ...
| We MUST make the databases work together. We'll never cope with the 
| flood otherwise. I am scared to read that you seem to suggest doing 
| once again a new database or even ?stop? using a database because it is 
| poor. Can we afford, fund, pay for this? 

The GenBank advisors (of which I was one) were trying to get a unified database
10 years ago.  The hope was that GenBank and EMBL could have identical
formats.  This was not politically possible because both sides wanted control.
So the two worked together, hopefully having the same data.  As we see now this
hasn't worked very well.  It means that people writing programs have to handle
two different formats, and it means that the two databases drift apart.  I was
not proposing that people choose a database, but rather observing that many
databases have chosen to work against one another, not with each other.  Instead
of a single unified database we are aiming to have a database for every
chromosome of every species...  Under that ridiculous circumstance, the several
databases which attempt to gather all the data under one format will be in
competition for use by researchers.  Darwin has something to say about
that.

Do I like this circumstance?  No.  Can we afford to have a bunch of databases
pulling in different directions?  No.

| PS: We have most of the databases available worldwide and crosscheck 
| them, e.g. genbank vs EMBL, via standard programs like 'GCG' or 'nrdb'
| from GCG Inc, and NCBI, resp.

Are you doing this and not telling the databases how to become closer to one
another?  If your effort at crosschecking were to be fed back into the
databases you wouldn't have to do it at every release.  You would also save a
lot of effort by others who probably have to do the same thing.

| Counting all, we get about 20 GByte needed
| for this effort. Justification being, we are an EMBnet node and try 
| to deliver what people need. Your suggestion would be to shift all that 
| to the users? END users?  It might be useful to bring up oddities as 
| above to these newsgroups but the direct partner to report database 
| problems should be the database vendors. I found both GENBANK and EMBL 
| staff very helpful (special thanks to Peter Stoehr). They really appreciate
| detailed reports. 

As you say in another posting, the end users have to become more involved.  But
end users cannot make the overall database consistent.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms at ncifcrf.gov


