Duplicate use of entry codes

POSTMAST at GUNBRF.BITNET
Fri Aug 28 16:52:01 EST 1992


In message <9208271604.AA10206 at genbank.bio.net> ODONNELL at ARCB.AFRC.AC.UK
(Cary O'Donnell) expressed concern about an apparent "duplicated use of the
sequence ID for two very similar (almost identical!!) sequences."

Several people brought to our attention a problem concerning duplicated entry
identification codes and accession numbers among the data sections PIR1, PIR2,
and PIR3 in the PIR-International Protein Sequence Database.  We apologize for
this difficulty and have modified our procedures to ensure that this does
not recur in future releases.  We thank those who brought this problem to our
attention and will greatly appreciate any further comments, corrections, or
recommendations concerning the database.

We would like to take this opportunity to restate our policy concerning entry
identification codes and accession numbers, in order to clarify (we hope)
the situation.

The entry identification code (on the `header-line' in NBRF format; on the
`ENTRY' record in CODATA format) is a unique code assigned to every entry in
PIR1, PIR2, and PIR3.  The code should be unique across all three data
sections.  The code is not a permanent identifier, however; it is subject to
change from release to release.  The duplication of entry identification codes
reported in version 33 was a mistake and has been corrected.

An accession number as it appears within the reference section of an entry
refers uniquely to the sequence as reported by the authors in the corresponding
publication, manuscript, or submission.  These sequences are being compiled
into an archival data set.  The accession number is the entry identification
code of the archival sequence entry.  These accession numbers are permanent
identifiers of the `reported' sequences and will remain associated with the
reported sequences as long as they remain in the database.

When the data are processed by PIR-International staff and entered in the PIR1
and PIR2 data sections, the accession numbers are placed in the accession field
of the appropriate reference.  In NBRF format, they occur on `A;Accession:'
lines following the corresponding reference.  In CODATA format, they occur
within the `REFERENCE #accession' fields.  Please refer to the document CXFSD
available from FILESERV at GUNBRF.BITNET (SEND CXFSD) for specifics concerning the
CODATA format.
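The reference-level accession numbers can be pulled out of a CODATA-format entry with a short parser. This is only a sketch under stated assumptions: the layout (an unindented REFERENCE record followed by indented `#Field value' lines) is taken from the entry appended to this message rather than from CXFSD, the backslash separation of several numbers within one field is assumed by analogy with the ACCESSION record, and the function name reference_accessions is hypothetical.

```python
import re


def reference_accessions(entry_text):
    """List the #Accession values of each REFERENCE block, in order.

    Assumes the CODATA layout of the appended entry: an unindented
    REFERENCE record followed by indented '#Field value' lines.
    (Hypothetical helper.)
    """
    refs = []
    in_reference = False
    for line in entry_text.splitlines():
        if line.startswith("REFERENCE"):
            refs.append([])  # start a new reference block
            in_reference = True
        elif line and not line[0].isspace():
            # Any other unindented record (KEYWORDS, SUMMARY, ...) ends the block.
            in_reference = False
        elif in_reference and line.strip().startswith("#Accession"):
            values = line.strip()[len("#Accession"):]
            # Assumed: several numbers in one field are split by backslashes/blanks.
            refs[-1].extend(v for v in re.split(r"[\\\s]+", values) if v)
    return refs
```

Applied to the last two REFERENCE blocks of the appended entry, this would give [['A26616'], ['PX0015']].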

Note that the reference-specific accession numbers should not be confused with
the list on the `C;Accession:' line (NBRF format) or the ACCESSION record
(CODATA format).  That list contains every accession number that has ever been
associated with the entry; some of these do not correspond to specific reported
sequences, because our original policy was to associate them with the entire
`merged' PIR entry.
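One way to see which numbers on the entry-level list are not tied to any particular reference (for example, numbers attached to the whole merged entry under the original policy) is to compare the ACCESSION record with the reference-level accessions. A minimal Python sketch, assuming the backslash-separated layout of the ACCESSION record shown in the appended entry; parse_accession_record and entry_level_only are hypothetical names.

```python
import re


def parse_accession_record(line):
    """Split a CODATA ACCESSION record into its accession numbers.

    Assumes the layout shown in the appended entry: the word ACCESSION
    followed by a backslash-separated list.  (Hypothetical helper.)
    """
    if not line.startswith("ACCESSION"):
        raise ValueError("not an ACCESSION record")
    body = line[len("ACCESSION"):]
    return [v for v in re.split(r"[\\\s]+", body) if v]


def entry_level_only(accession_record, reference_accessions):
    """Accession numbers listed for the entry but cited by no reference.

    `reference_accessions` is a list of lists, one per REFERENCE block.
    """
    cited = {acc for ref in reference_accessions for acc in ref}
    return [acc for acc in parse_accession_record(accession_record)
            if acc not in cited]
```

For the merged entry JS0468 appended to this message, every number on the ACCESSION record (JS0468, A26616, PX0015) is cited by one of the three references, so entry_level_only would return an empty list.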

The PIR3 section of the database consists of all entries in the archival data
set that have not been entered into PIR1 and PIR2.  These entries have not been
`merged' and the entry identification code and the accession number are
identical.  There should not be any case where accession numbers found in PIR1
and PIR2 overlap with those in PIR3.  However, there may be overlap among
accession numbers found within the PIR1 and PIR2 sections.
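The two PIR3 rules stated above can be expressed as a consistency check. A minimal Python sketch, assuming the PIR3 entries are given as a mapping from entry code to accession number and the PIR1/PIR2 reference-level accessions as a flat list; the function name check_pir3_consistency and the codes used in testing are hypothetical. Overlap within PIR1 and PIR2 is deliberately not reported, since it is permitted.

```python
def check_pir3_consistency(pir3_entries, pir12_accessions):
    """Return a list of rule violations; an empty list means the rules hold.

    pir3_entries: mapping of PIR3 entry code -> its accession number.
    pir12_accessions: reference-level accession numbers from PIR1 and PIR2
    (repeats are allowed, since PIR1 and PIR2 may overlap).
    (Hypothetical helper.)
    """
    problems = []
    # Rule 1: an unmerged PIR3 entry's code equals its accession number.
    for code, accession in pir3_entries.items():
        if code != accession:
            problems.append(f"PIR3 entry {code} has accession {accession}")
    # Rule 2: no PIR3 accession number may also occur in PIR1 or PIR2.
    overlap = set(pir3_entries.values()) & set(pir12_accessions)
    for accession in sorted(overlap):
        problems.append(f"accession {accession} appears in PIR1/PIR2 and PIR3")
    return problems
```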

The entry in question, A26616, will be merged with JS0468 in the next release.
A copy of the current version of that merged entry, which clearly presents the
origin of the sequence difference, is appended below.
------------------------------------------------------------------------
                                 Dr. David G. George
                                 Dr. John S. Garavelli
                                 Protein Identification Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMASTER at GUNBRF.BITNET
------------------------------------------------------------------------
\\\
ENTRY           JS0468       #Type Protein
TITLE           Cytochrome-b5 reductase, placental - Human
                  #EC-number 1.6.2.2

DATE            17-Jul-1992 #Sequence 17-Jul-1992 #Text 17-Jul-1992
PLACEMENT          0.0    0.0    0.0    0.0    0.0
SOURCE          Homo sapiens #Common-name man
ACCESSION       JS0468\ A26616\ PX0015

REFERENCE
   #Authors     Tomatsu S., Kobayashi Y., Fukumaki Y., Yubisui T.,
                  Orii T., Sakaki Y.
   #Journal     Gene (1989) 80:353-361
   #Title       The organization and the complete nucleotide
                  sequence of the human NADH-cytochrome b5 reductase
                  gene.
   #Reference-number JS0468
   #Accession   JS0468
   #Molecule-type DNA
   #Residues    1-301 <TOM>
   #Cross-reference GB:M28705
   #Comment     The authors translated the codon CCA for residue 66
                  as Ser.

REFERENCE
   #Authors     Yubisui T., Naitoh Y., Zenno S., Tamura M.,
                  Takeshita M., Sakaki Y.
   #Journal     Proc. Natl. Acad. Sci. U.S.A. (1987) 84:3609-3613
   #Title       Molecular cloning of cDNAs of human liver and
                  placenta NADH-cytochrome b-5 reductase.
   #Reference-number A94154
   #Accession   A26616
   #Molecule-type mRNA
   #Residues    8-65,'S',67-240 <YUB>

REFERENCE
   #Authors     Murakami K., Yubisui T., Takeshita M., Miyata T.
   #Journal     J. Biochem. (1989) 105:312-317
   #Title       The NH2-terminal structures of human and rat liver
                  microsomal NADH-cytochrome b5 reductases.
   #Reference-number PX0016
   #Accession   PX0015
   #Molecule-type protein
   #Residues    2-25 <MUR>

KEYWORDS        oxidoreductase

SUMMARY       #Molecular-weight 34245  #Length 301  #Checksum   370
SEQUENCE
                5        10        15        20        25        30
      1 M G A Q L S T L G H M V L F P V W F L Y S L L M K L F Q R S
     31 T P A I T L E S P D I K Y P L R L I D R E I I S H D T R R F
     61 R F A L P P P Q H I L G L P V G Q H I Y L S A R I D G N L V
     91 V R P Y T P I S S D D D K G F V D L V I K V Y F K D T H P K
    121 F P A G G K M S Q Y L E S M Q I G D T I E F R G P S G L L V
    151 Y Q G K G K F A I R P D K K S N P I I R T V K S V G M I A G
    181 G T G I T P M L Q V I R A I M K D P D D H T V C H L L F A N
    211 Q T E K D I L L R P E L E E L R N K H S A R F K L W Y T L D
    241 R A P E A W D Y G Q G F V N E E M I R D H L P P P E E E P L
    271 V L M C G P P P M I Q Y A C L P N L D H V G H P T E R C F V
    301 F
///
\\\
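The origin of the sequence difference in the merged entry above can be reproduced from its #Residues fields. In the second reference, `8-65,'S',67-240' reads as entry residues 8 to 65, then Ser where the entry has residue 66, then entry residues 67 to 240. The Tomatsu et al. comment notes that codon CCA was translated as Ser; in the standard genetic code CCA codes for Pro, the residue shown at position 66 of the entry. The Python sketch below applies such a specification and checks the codon. The parsing rules are inferred from this one entry rather than from the CODATA format document, apply_residue_spec and CODON_TABLE are hypothetical names, and to stay short the example uses only residues 61-70 of JS0468 with a correspondingly narrowed, hypothetical specification.

```python
import re

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_residue_spec(sequence, spec, start=1):
    """Rebuild a reported sequence from a '#Residues' specification.

    `sequence` is the entry sequence, or a fragment whose first residue is
    numbered `start`.  Ranges such as '8-65' (or single numbers) copy entry
    residues; quoted items such as 'S' insert the residues the authors
    reported instead.  A trailing tag such as '<YUB>' is ignored.
    """
    spec = spec.split("<")[0].strip()
    pieces = []
    for item in spec.split(","):
        item = item.strip()
        if item.startswith("'") and item.endswith("'"):
            pieces.append(item[1:-1])  # residues reported by the authors
        else:
            m = re.fullmatch(r"(\d+)(?:-(\d+))?", item)
            if m is None:
                raise ValueError(f"unrecognized item {item!r}")
            first = int(m.group(1))
            last = int(m.group(2) or first)
            pieces.append(sequence[first - start:last - start + 1])
    return "".join(pieces)


if __name__ == "__main__":
    # Residues 61-70 of JS0468, copied from the SEQUENCE record above.
    fragment = "RFALPPPQHI"
    print(apply_residue_spec(fragment, "63-65,'S',67-70 <YUB>", start=61))  # ALPSPQHI
    print(fragment[63 - 61:])    # ALPPPQHI (entry residues 63-70)
    print(CODON_TABLE["CCA"])    # P
```

With the full 301-residue sequence and start=1, the same function would apply the specification exactly as it appears in the entry.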


