GenBank Errors

Tom Schneider toms at fcs260c2.ncifcrf.gov
Wed Oct 16 13:19:03 EST 1991


Folks:

I have found that the following entries overlap:

TRN5IR1 1438 1737   =  TRN5NEO   1 301
TRN5NEO 901  1300   =  TRNTN5STR 1 400
TRN5NEO             =  ECOTN5X   (exactly, reported in previous posting)

In addition, the data are inconsistent:
TRN5IR1 is missing the base found at position 284 (c) of TRN5NEO.
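
The missing base already shows up in the coordinates above, before any sequence is compared. Here is a minimal sketch of that check, assuming the spans are 1-based and inclusive; the tuple layout and function names are made up for illustration and are not part of any GenBank tool.

```python
# Overlap spans as quoted in this post: (entry, start, end), 1-based inclusive.
OVERLAPS = [
    (("TRN5IR1", 1438, 1737), ("TRN5NEO", 1, 301)),
    (("TRN5NEO", 901, 1300), ("TRNTN5STR", 1, 400)),
]


def span_length(start, end):
    """Number of bases in a 1-based inclusive span."""
    return end - start + 1


def length_mismatches(overlaps):
    """Return (left_entry, right_entry, difference) for spans of unequal length."""
    mismatches = []
    for (a, a_start, a_end), (b, b_start, b_end) in overlaps:
        diff = span_length(b_start, b_end) - span_length(a_start, a_end)
        if diff != 0:
            mismatches.append((a, b, diff))
    return mismatches


print(length_mismatches(OVERLAPS))  # [('TRN5IR1', 'TRN5NEO', 1)]
```

The first pair comes out as 300 bases against 301, which fits a single base missing from TRN5IR1. The second pair is 400 against 400.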

TRN5IR2 is mostly internal to TRN5IR1, with some base changes and then
about 67 bases that differ at one end.  Yet they are both supposed to be
Tn5.  I have not tracked down the source of this discrepancy.  I think the
differing region is outside the transposon.

By joining these entries, the entire sequence of the Tn5 transposon would be in
the database, and anyone wanting it could just grab it.

I keep getting blubber from people who say that "Oh, we can handle that, we'll
just have the entries separate and we'll provide you with a view of the data
that is merged".

Well, get on with it.  So far it's hot air.

Oh yes, in TRN5NEO, the end of the neomycin phosphotransferase gene is at 945
(which would be the A of the TGA).
TRNTN5STR says that the end of the kanamycin phosphotransferase
(nb, the same gene as above, isn't it??)
is at position 45, the A of TGA.  Thus the two are inconsistent.
HOW COME THIS WAS NOT DETECTED BY A PROGRAM??????
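
A program for this need not be elaborate. What follows is only a sketch, assuming the TRN5NEO 901-1300 = TRNTN5STR 1-400 overlap is an ungapped offset; the dictionary layout and function names are hypothetical, and the annotation values are just the ones quoted in this post.

```python
def map_position(pos, src_start, dst_start):
    """Map a 1-based position from one entry to another across an ungapped overlap."""
    return pos - src_start + dst_start


# Annotations as described in this post; the record layout is hypothetical.
neo = {"entry": "TRN5NEO", "gene": "neomycin phosphotransferase", "end": 945}
kan = {"entry": "TRNTN5STR", "gene": "kanamycin phosphotransferase", "end": 45}


def compare_annotations(a, b, a_start, b_start):
    """List the disagreements between two annotations of the same overlapping region."""
    problems = []
    mapped_end = map_position(a["end"], a_start, b_start)
    if mapped_end != b["end"]:
        problems.append(f"end mismatch: {mapped_end} vs {b['end']}")
    if a["gene"] != b["gene"]:
        problems.append(f"name mismatch: {a['gene']!r} vs {b['gene']!r}")
    return problems


for problem in compare_annotations(neo, kan, a_start=901, b_start=1):
    print(problem)
```

Under that ungapped assumption, position 945 of TRN5NEO maps to position 45 of TRNTN5STR, so this sketch reports only the differing gene name. A real check would also have to cope with gaps such as the missing base noted earlier.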

Comparing TRN5NEO to the identically sequenced ECOTN5X, we see that in one
case some things are in features; in the other, they are still in comments.
Inconsistent.

Also, the name of the gene is in a NOTE.  Put the names in something other than
notes and comments so we can read them with programs!!  I've been saying this
for 10 years and the GenBank staff has STILL not gotten it through their thick
skulls.  No wonder they lost the contract.  Will you do any better, David
Lipman?

Sorry.  This is an international disaster and nobody cares.

GenBank is:

  INCONSISTENT
  REDUNDANT
  FULL OF ERRORS

With exponential growth, this is only going to get worse.  PLEASE, if you
know of errors, REPORT THEM ON THE NET FROM NOW ON so we can all see how bad
the situation is.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms at ncifcrf.gov


