EMBL vs. PIR entries/errors?

Fri Jun 11 16:25:54 EST 1993

In message <9306111647.AA17621 at net.bio.net> Roland (rhubner at molbiol.ox.ac.uk) wrote:
> 2)lipase 2 of Moraxella appears however in PIR2 under A39556 as well as lipase
>   3, but lipase 1 is not there... Do sequences that have been translated (ONLY)
>   appear in PIR? I thought that only aa sequenced stuff appears
>   there...
> 3)another lipase from Psychrobacter entered correctly EMBL (X67712; 
>   Empro:Pilipaa), but has TWO entries in PIR3: S28225 and S26486!!?

Lipase 1, 2 and 3 from Moraxella sp. all appear in PIR.
  PIR3:S12104  Lipase 1 - Moraxella sp.
  PIR2:A39556  triacylglycerol lipase (EC 2 - Moraxella sp. TA144
  PIR3:S14276  Triacylglycerol lipase (EC - Moraxella sp.
(Judging from the title of the article, the S14276 entry appears to be the
product of the "lip3" gene; without having read the paper, I must assume that
is what it is.)

It would be a very poor database indeed if the PIR did not include translated
sequences, since these days most larger sequences are available only as
translations of nucleotide sequences.  You are perhaps confusing this issue
with the PIR submission policy.  The PIR does not accept the _submission_ of
peptide sequences determined solely by the translation of nucleotide sequences. 
The PIR requests authors to submit nucleotide sequences (possibly with
translations) to the recognized nucleotide sequence depositories for assignment
of nucleotide sequence accession codes.  The PIR acquires the entries from the
nucleotide sequence depositories, checks the translations and assigns accession
codes to the protein sequences with the nucleotide cross-references already
made.  This policy saves time for the authors (they only have to deal with one
database to get a publishable accession number for the proper experimental
entity) and for the databases because the correct cross-references can be
created without having to do sequence searches.
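
As a rough illustration of what "checking a translation" can involve, here is
a minimal Python sketch.  It is not the PIR's actual procedure.  It assumes
the standard genetic code, a coding sequence that starts in frame at its first
base, and translation that stops at the first stop codon.  The function names
`translate` and `translation_matches` are hypothetical, chosen only for this
example.

```python
from itertools import product

# Standard genetic code (an assumption; some organisms and organelles differ).
# Amino acids are listed in the conventional TCAG order of the three bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence in frame 1, stopping at the first stop codon.

    Codons containing ambiguous bases are rendered as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translation_matches(cds, deposited_protein):
    """Return True if the deposited protein equals the conceptual translation."""
    return translate(cds) == deposited_protein.strip().upper()
```

A mismatch here would only flag an entry for a closer look; it would not by
itself say which of the two sequences is wrong.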

The appearance of both S26486 and S28225 is explained by this policy.  The
entry derived from the EMBL submission is S26486.  The entry derived from
journal scanning is S28225.  It occasionally happens (more often than you
would think) that the published sequence is not the same as the sequence
submitted by the same authors.  Just in case, entries are prepared for both.
(We would have to do it anyway just to be able to compare them.)  Eventually,
the entries will be merged and annotated.  This all goes especially well when,
as in this case, the submitted and published sequence translations are
identical.

                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMAST at GUNBRF.BITNET
                                 POSTMASTER at NBRF.GEORGETOWN.EDU
