Different entry is same sequence (Re: same sequence is different in EMBL and GENBANK)

Tom Schneider toms at fcsparc6.ncifcrf.gov
Mon May 9 12:00:50 EST 1994


In article <1994May8.084902.4583 at comp.bioz.unibas.ch> doelz at comp.bioz.unibas.ch
(Reinhard Doelz) writes:

| I haven't analyzed this systematically but I am afraid that inconsistencies 
| like this make database providers' lives difficult.

It makes the database user's life extremely difficult.

| As human intervention
| is extremely expensive (manpower) and we (customers) don't want to pay, the 
| prediction that it will become worse in the future is a safe guess. 

Yes, unless action is taken soon, there will eventually be a crisis.

| I think we all agree that databases are non-optimal. On the other hand, 
| if you see those guys working, they don't feel lazy, nor do they enjoy 
| being reminded that they do produce low-quality data. (I won't talk 
| about proteins here but the situation there is even worse). The data need
| better MAINTENANCE! 

Yes.

| We could spend another XX M$ on both sides of the Atlantic to have a 
| staff of workers clean up the past, and cope with the flood of the future. 
| But still, this wouldn't help. I think that there's something severely 
| wrong with responsibilities. The researchers don't do what they should, namely 
| take care of their own entries or areas, and correct the entries as appropriate.

BINGO!

| And, for the future, the genome projects should adopt slightly more 
| responsibility for what they produce. Just dumping thousands of low-quality
| data entries to the databases, generated by robots, and complaining afterwards
| doesn't help. The funding agencies must understand that a genome project 
| is USELESS (read: wasted money) if the data are not integrated well into the 
| data sets. The coordinators of the projects must refrain from cooking their 
| own little databases as they complain the loudest about the inability of the 
| general database providers. We certainly don't need hundreds of small databases
| but rather one set which is complete, and high quality. 
| ?We ? 

BINGO!

| Who are 'We' that we tolerate these duplications without doing something
| ourselves? A change in culture is needed. 

Duplication should not be tolerated; that's why it is the first principle in my
database philosophy paper.  (Available by anonymous ftp from
ftp.ncifcrf.gov/pub/delila/philgen* but in revision at the moment.  If you
would like me to tell you when the next revision is out, please send me a
note.)

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms at ncifcrf.gov


