From owner-embldatabank@net.bio.net Mon Jun 07 23:00:00 1993
Path: biosci!daresbury!mrccrc!gwilliam
From: gwilliam@crc.ac.uk (Gary Williams x3294)
Newsgroups: bionet.molbio.embldatabank
Subject: Warning - amino acid sequence in EMBL pro.dat
Keywords: warning
Message-ID: <1993Jun8.131546.24238@crc.ac.uk>
Date: 8 Jun 93 13:15:46 GMT
Sender: news@crc.ac.uk
Organization: MRC Human Genome Resource Centre
Lines: 48
Nntp-Posting-Host: tin


The following sequence occurs in the file 'pro.dat' in EMBL release 35
available from the EMBL FTP server ftp.embl-heidelberg.de in the file
/pub/databases/embl/release/pro.dat.Z date-stamped 'June 5 17:14'


ID   A01448     standard; DNA; PRO; 397 BP.
XX
AC   A01448;
XX
DT   29-MAR-1993 (Rel. 35, Created)
DT   29-MAR-1993 (Rel. 35, Last updated, Version 1)
XX
DE   E.coli tyrB aminotransferase protein sequence
XX
KW   .
XX
OS   Escherichia coli
OC   Prokaryota; Bacteria; Gracilicutes; Scotobacteria;
OC   Facultatively anaerobic rods; Enterobacteriaceae; Escherichia.
XX
RN   [1]
RA   Primrose S.B., Edwards R.M.;
RT   "The cloning and utilization of aminotransferase genes.";
RL   Patent number EP0293514-A/2, 07-DEC-1988.
RL   G.D. Searle & Co..
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..397
FT                   /organism="Escherichia coli"
XX
SQ   Sequence 397 BP; 51 A; 5 C; 30 G; 15 T; 296 other;
     vfqkvdayag dpiltlmerf kedprsdkvn lsiglyyned giipqlqava eaearlnaqp        60
     hgaslylpme glncyrhaia pllfgadhpv lkqqrvatiq tlggsgalkv gadflkryfp       120
     esgvwvsdpt wenhvaifag agfevstypw ydeatngvrf ndllatlktl parsivllhp       180
     cchnptgadl tndqwdavie ilkarelipf ldiayqgfga gmeedayair aiasaglpal       240
     vsnsfskifs lygervggls vmcedaeaag rvlgqlkatv rrnyssppnf gaqvvaavln       300
     dealkaswla eveemrtril amrqelvkvl stempernfd yllnqrgmfs ytglsaaqvd       360
     rlreefgvyl iasgrmcvag lntanvqrva kafaavm                                397
//
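
An entry like the one above could in principle be caught by a simple
composition check. The following is only a sketch in Python: the function
names and the 50% threshold are assumptions made for this example and are
not part of any EMBL software. It treats an entry as suspect when most of
its residues are something other than a, c, g, t or n, much as the SQ line
above reports 296 "other" residues out of 397.

```python
import re

def sequence_from_entry(entry_text):
    """Return the residues of one EMBL flat-file entry as a lower-case string."""
    residues = []
    in_sequence = False
    for line in entry_text.splitlines():
        if line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Sequence lines hold blocks of ten residues plus a position counter;
            # keep the letters only.
            residues.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(residues).lower()

def looks_like_protein(entry_text, threshold=0.5):
    """Heuristic (assumed threshold): mostly non-acgtn residues suggests protein."""
    seq = sequence_from_entry(entry_text)
    if not seq:
        return False
    plain_bases = sum(seq.count(base) for base in "acgtn")
    return (len(seq) - plain_bases) / len(seq) > threshold
```

A genuine nucleotide entry with many ambiguity codes could also trip this
check, so a hit is a reason to look at the entry, not proof of an error.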


-- 
GARY WILLIAMS,  Computing Services,          Janet:       G.Williams@UK.AC.CRC
MRC Human Genome Mapping Project,            Internet:    G.Williams@CRC.AC.UK
Watford Rd, HARROW, Middx, HA1 3UJ, UK         ** Sequence databases have  **
Tel 081-869 3294   Fax 081-423 1275            ** about a 3% error content **

From owner-embldatabank@net.bio.net Tue Jun 08 23:00:00 1993
Path: biosci!comp.bioz.unibas.ch
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Newsgroups: bionet.announce,bionet.molbio.embldatabank
Subject: EMBL database availability
Keywords: keeping FTP traffic low
Message-ID: <1993Jun9.141817.26005@comp.bioz.unibas.ch>
Date: 9 Jun 93 14:18:17 GMT
Sender: kristoff@net.bio.net
Reply-To: doelz@urz.unibas.ch
Organization: EMBnet Switzerland [BASEL]
Lines: 57
Approved: bionews-moderator@net.bio.net
Xref: biosci bionet.announce:576 bionet.molbio.embldatabank:183


You might know that we maintain a very large molecular biology database 
server accessible via FTP, GOPHER, and HASSLE. Periodically, I receive 
messages like (original quote) 

>         My apologies for ftping some unwanted files. I wanted to get a 
> copy  of the recent embl and pir dbases for running blast. And I realized 
> I copied some outdated files. 
>        Once again my apologies for the apparent misuse of network 
> resources.
> I would appreciate if you can advise me how I can get the above mentioned
> dbases (and the updates) in an acceptable manner.

The reason for 'outdated files' is that the EMBL CD-ROM usually arrives 
a couple of days after we install the magnetic version in the different 
formats. I won't describe PIR International access, but will comment on the 
EMBL databases. Several points should be made here: 

(1) Computer networks are extremely unsuited for transporting full database 
    releases. Unless there is a well-functioning data distribution scheme, 
    ftp'ing databases of this size over transnational lines is not appreciated
    by the network providers, and we (Molecular Biology Users) can ruin 
    our image entirely if all end-users do this in an uncoordinated fashion. 

(2) CD-ROMs are cheap and easy to ship. If you need a database that is 
    not currently in your portfolio, ask the database providers to 
    ship you the databases you want. 

(3) If you are looking for these in Europe, the European Molecular Biology
    network (EMBnet) is the umbrella organisation which keeps copies
    of the EMBL database in each country, on the so-called 'national node',
    in Norway, Sweden, Finland, Denmark, UK, Netherlands, Belgium, France,
    Germany, Switzerland, Spain, Italy, Greece, Israel. Other nodes are 
    about to be started in Portugal, Austria and more countries. Inquiries 
    welcome! EMBL also maintains an ftp server with their current database, 
    and this archive is mirrored by the Israeli EMBnet node, and also 
    partially by others.

(4) If you look for the EMBL database in the US, Mike Cherry volunteered
    to make the EMBL database available on his machine, anonymous ftp 
    frodo.mgh.harvard.edu, in compressed form. 

(5) If you are looking for the _updates_ to the EMBL database, there are 
    several possibilities. (a) The NCBI collaborates with EMBL to incorporate
    the entries concerned into the GENBANK releases. (b) Mike Cherry 
    mirrors the updates on a weekly basis from Europe. (c) Various national
    EMBnet nodes provide the data to their community on demand or via 
    ftp archive. 
-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz@urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------

From owner-embldatabank@net.bio.net Thu Jun 10 23:00:00 1993
Path: biosci!NBRF.GEORGETOWN.EDU!POSTMASTER
From: POSTMASTER@NBRF.GEORGETOWN.EDU
Newsgroups: bionet.molbio.embldatabank
Subject: Re: EMBL vs. PIR entries/errors?
Message-ID: <01GZ9C3WC8C2A9L0Y9@NBRF.Georgetown.Edu>
Date: 11 Jun 93 21:25:54 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 49

In message <9306111647.AA17621@net.bio.net> Roland (rhubner@molbiol.ox.ac.uk)
inquires:
> 2)lipase 2 of Moraxella appears however in PIR2 under A39556 as well as lipase
>   3, but lipase 1 is not there... Do sequences that have been translated (ONLY)
>   appear in PIR? I thought that only aa sequenced stuff appears
>   there...
> 3)another lipase from Psychrobacter entered correctly EMBL (X67712; 
>   Empro:Pilipaa), but has TWO entries in PIR3: S28225 and S26486!!?

Lipase 1, 2 and 3 from Moraxella sp. all appear in PIR.
  PIR3:S12104  Lipase 1 - Moraxella sp.
  PIR2:A39556  triacylglycerol lipase (EC 3.1.1.3) 2 - Moraxella sp. TA144
  PIR3:S14276  Triacylglycerol lipase (EC 3.1.1.3) - Moraxella sp.
(From the title of the article the S14276 entry appears to be a product of the
"lip3" gene, and without reading the paper I must assume that is what it
probably is.)

It would be a very poor database, indeed, if the PIR did not have translated
sequences since these days most larger sequences are only available from
nucleotide sequence translations.  You are perhaps getting this issue confused
with the PIR submission policy.  The PIR does not accept the _submission_ of
peptide sequences determined solely by the translation of nucleotide sequences. 
The PIR requests authors to submit nucleotide sequences (possibly with
translations) to the recognized nucleotide sequence depositories for assignment
of nucleotide sequence accession codes.  The PIR acquires the entries from the
nucleotide sequence depositories, checks the translations and assigns accession
codes to the protein sequences with the nucleotide cross-references already
made.  This policy saves time for the authors (they only have to deal with one
database to get a publishable accession number for the proper experimental
entity) and for the databases because the correct cross-references can be
created without having to do sequence searches.

The appearance of both S26486 and S28225 can be explained because of this
policy.  The entry derived from EMBL submission is S26486.  The entry derived
from journal scanning is S28225.  It occasionally happens (more often than you
would think) that the published sequence is not the same as the sequence
submitted by the same authors.  Just in case, entries are prepared for both.
(We would have to do it anyway just to be able to compare them.)  Eventually,
the entries will be merged and annotated.  This all goes especially well when,
as in this case, the submitted and published sequence translations are
identical.
------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMAST@GUNBRF.BITNET
                                 POSTMASTER@NBRF.GEORGETOWN.EDU

From owner-embldatabank@net.bio.net Thu Jun 10 23:00:00 1993
Path: biosci!daresbury!buzz.bmc.uu.se!corax.udac.uu.se!sunic!pipex!uknet!comlab.ox.ac.uk!oxuniv!oxpath!rhubner
From: rhubner@molbiol.ox.ac.uk
Newsgroups: bionet.molbio.embldatabank
Subject: EMBL vs. PIR entries/errors?
Message-ID: <1993Jun11.163234.1@molbiol.ox.ac.uk>
Date: 11 Jun 93 15:32:34 GMT
Organization: Oxford University Molecular Biology Data Centre
Lines: 13
Nntp-Posting-Host: heatly
Nntp-Posting-User: rhubner

Hi group, friends of mine are puzzled by database entries in EMBL and PIR.
1)the entries X53868 and J03545 were assigned to the lipase 2 of Moraxella sp.
  and human pancreatic elastase nucleotide sequences respectively. Where are
  they? Note that lipase 1 and 3 of Moraxella are in the database under
  Empro:Mslip3 (X53869) and Empro:Mslipase (X53053)...but the above sequences
  did not turn up in string searches of various kinds!
2)lipase 2 of Moraxella appears however in PIR2 under A39556 as well as lipase
  3, but lipase 1 is not there... Do sequences that have been translated (ONLY)
  appear in PIR? I thought that only aa sequenced stuff appears
  there...
3)another lipase from Psychrobacter entered correctly EMBL (X67712; 
  Empro:Pilipaa), but has TWO entries in PIR3: S28225 and S26486!!?
thanks for any comment, Roland

From owner-embldatabank@net.bio.net Thu Jun 10 23:00:00 1993
Path: biosci!daresbury!buzz.bmc.uu.se!embl-heidelberg.de!stoehr
From: stoehr@embl-heidelberg.de
Newsgroups: bionet.molbio.embldatabank
Subject: Re: Warning - amino acid sequence in EMBL pro.dat
Message-ID: <1993Jun9.220045.92012@embl-heidelberg.de>
Date: 9 Jun 93 21:00:45 GMT
References: <1993Jun8.131546.24238@crc.ac.uk>
Organization: EMBL, European Molecular Biology Laboratory
Lines: 15

In article <1993Jun8.131546.24238@crc.ac.uk>, gwilliam@crc.ac.uk
(Gary Williams x3294) writes:
> 
> The following sequence occurs in the file 'pro.dat' in EMBL release 35
> available from the EMBL FTP server ftp.embl-heidelberg.de in the file
> /pub/databases/embl/release/pro.dat.Z date-stamped 'June 5 17:14'
>...

We apologise for this mistake. It is an amino acid sequence captured from the
patent literature which was erroneously flagged as being nucleotide :-(
It should not happen again.

Regards,
Peter Stoehr
EMBL Data Library

From owner-embldatabank@net.bio.net Fri Jun 11 23:00:00 1993
Path: biosci!NET.BIO.NET!kristoff
From: kristoff@NET.BIO.NET (David Kristofferson)
Newsgroups: bionet.molbio.embldatabank
Subject: IMPORTANT BIOSCI INFORMATION
Message-ID: <9306120900.AA28151@net.bio.net>
Date: 12 Jun 93 09:00:03 GMT
Sender: kristoff@net.bio.net
Distribution: bionet
Lines: 143


Three important items follow: BIOSCI archive searching by e-mail, the
BIOSCI FAQ, and the BIOSCI User Address Directory form.  If you have
not yet listed yourself in our e-mail address directory, please take a
few minutes to complete and return the form below.  If your address
information has changed since you listed yourself, please send us an
updated form.

				Sincerely,

				Dave Kristofferson
				BIOSCI/bionet Manager

				kristoff@net.bio.net



	  **** SEARCHING BIOSCI ARCHIVES WITH WAISMAIL ****

E-mail users can search the BIOSCI archives by using our waismail
e-mail server.  For instructions send the message

help

to waismail@net.bio.net.  Leave the Subject: line blank.  Other
methods of searching the archives via WAIS and gopher are described in
the BIOSCI FAQ ...


       **** BIOSCI FREQUENTLY ASKED QUESTIONS (FAQ) SHEET ****

New users of BIOSCI/bionet may want to read the "Frequently Asked
Questions" or "FAQ" sheet for BIOSCI.  The FAQ provides details on how
to participate in these forums and is available for anonymous FTP from
net.bio.net [134.172.2.69] in pub/BIOSCI/biosci.FAQ.  It may also be
requested by sending e-mail to biosci@net.bio.net (use plain English
for your request).  The FAQ is also posted on the first of each month
to the newsgroup BIONEWS/bionet.announce immediately following the
posting of the BIOSCI information sheet.


	       **** BIOSCI USER ADDRESS DIRECTORY ****

Please take this opportunity to add your name and address information
to the BIOSCI User Address Database if you have not already done so.

Below is the address form that we would like each reader of the
BIOSCI/bionet newsgroups to complete and return if you would like to
be listed in our database.  The database will serve as a directory
that will enable biologists, who are currently using (or even just
reading) the BIOSCI newsgroups, to look up e-mail addresses and other
information about our users.

The address database will be indexed for WAIS and waismail access
(waismail is our WAIS e-mail server, more below) and will also be
available for access via other gopher sites if they wish to permit it.
The raw unindexed data will be available for FTP from net.bio.net and
is atomized sufficiently to allow import into your local RDBMS should
you so desire.

Please carefully follow the instructions for completing the form
below and return it to either of the following two addresses
(whichever is more convenient for you).  Thanks in advance for taking
the time to complete and return the form.

Addresses for returning forms         Location        Network
-----------------------------         --------        -------
biovote@net.bio.net                   U.S.A.          Internet/BITNET
biovote@daresbury.ac.uk               U.K.            JANET


	     MAKING SURE THAT YOUR INFORMATION IS CURRENT

This notice will be mailed bimonthly to each newsgroup.  You should
check our WAIS source or waismail e-mail server from time-to-time to
see if your address information is still up-to-date.  Send the message

help

to waismail@net.bio.net for instructions on using waismail.  Leave the
Subject: line in your message blank.

(Note as of 5 May 93 - the address database will be updated nightly
and will go on-line soon after we have collected and processed the
initial rush of data).


	    IMPORTANT INSTRUCTIONS - PLEASE READ CAREFULLY

Please enter all responses after the : on each line, leaving one (1)
blank space after the : (i.e., before the start of your text).

Please do NOT extend your responses past the end of each line (80
characters) or alter any of the field identifiers such as "first name: ". 
Several lines are provided at the end of the form for comments, but,
please adhere to the line length restriction.

On the date: line, please enter the date in the DD-MM-YY format, e.g.,
05-05-93 for 5 May 1993.  This line will tell others when the
information was last updated.  Please be sure to include the 0's for
single digit days or months, e.g., 05-05-93, not 5-5-93.

Note that the "e-mail network: " line below is for specifying, e.g.,
"Internet," "BITNET," "EARN," "JANET," or whatever other network that
your computer may be on.

If you are uncertain about any field, please feel free to leave it
blank, but please DO NOT DELETE the field identifier from the form!

In the first field below, "New information or Update ...", please
enter "N" if this is the first time that you have registered in the
directory or "U" if you are correcting a listing that you sent to us
previously.
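
For anyone who wants to check a completed form before mailing it, the rules
above can be sketched in Python as follows. This is only an illustration:
the function name and the wording of the messages are assumptions of this
example, and it is not the program BIOSCI uses to process the forms.

```python
import re

DATE_FIELD = re.compile(r"^\d{2}-\d{2}-\d{2}$")  # DD-MM-YY with leading zeros

def check_form(form_text, max_width=80):
    """Return a list of problems found in a completed directory form."""
    problems = []
    for number, line in enumerate(form_text.splitlines(), start=1):
        if len(line) > max_width:
            problems.append(f"line {number}: longer than {max_width} characters")
        name, colon, value = line.partition(":")
        if not colon:
            continue  # not a "field: value" line
        if value and not value.startswith(" "):
            problems.append(f"line {number}: no blank space after the colon")
        text = value.strip()
        if not text:
            continue  # blank fields are allowed
        if name.startswith("date") and not DATE_FIELD.match(text):
            problems.append(f"line {number}: date should look like 05-05-93")
        if name.startswith("New information") and text not in ("N", "U"):
            problems.append(f"line {number}: enter N or U")
    return problems
```

An empty list means that none of the rules checked here were broken. The
sketch does not verify that every field identifier is still present.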

Thanks again for your cooperation!

--------------- please cut here and return portion below ---------------

New information or Update to old record (enter N or U): 
date (DD-MM-YY): 
first name: 
middle initial: 
family name: 
job title: 
e-mail address: 
e-mail network: 
phone number: 
FAX number: 
institution: 
address1: 
address2: 
address3: 
city: 
state/province: 
country: 
postal code: 
research interest: 
research interest: 
comment: 
comment: 
comment: 
comment: 
comment: 

From owner-embldatabank@net.bio.net Tue Jun 15 23:00:00 1993
Path: biosci!daresbury!cnbvx3.cnb.uam.es!mrege
From: mrege@cnbvx3.cnb.uam.es
Newsgroups: bionet.molbio.embldatabank
Subject: Riders of the lost gene
Message-ID: <1993Jun16.134141.47@cnbvx3.cnb.uam.es>
Date: 16 Jun 93 13:41:41 GMT
Organization: C.N.Biotecnologia,  CSIC
Lines: 21

Hi, Netters,

This is to report an error in the EMBL nucleic acid database.
We are looking for the gene and protein sequence of lom, a gene from 
bacteriophage lambda: Barondess and Beckwith (1990) 'A bacterial virulence 
determinant encoded by lysogenic coliphage lambda' Nature, 346:871-74.

The sequence we retrieved from the database was BLAMLOM  X55792, which
we have found to be the sequence of the gene 'bor', described in the same
paper. Searching for 'bor' did not retrieve any sequence, neither lom nor bor.

Please, can anyone find this sequence for us, or is it definitively lost
in cyberspace?

Thanks,


Miquel Regue                                    e-mail:regue@farmacia.ub.es
Dept. Microbiology
Fac. of Pharmacy
University of Barcelona

From owner-embldatabank@net.bio.net Tue Jun 15 23:00:00 1993
Path: biosci!agate!howland.reston.ans.net!darwin.sura.net!welchgate.welch.jhu.edu!danj
From: danj@welchgate.welch.jhu.edu (Dan Jacobson)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: Riders of the lost gene
Message-ID: <1993Jun16.144848.642@welchgate.welch.jhu.edu>
Date: 16 Jun 93 14:48:48 GMT
References: <1993Jun16.134141.47@cnbvx3.cnb.uam.es>
Organization: Johns Hopkins Univ. Welch Medical Library
Lines: 117

In article <1993Jun16.134141.47@cnbvx3.cnb.uam.es> mrege@cnbvx3.cnb.uam.es writes:
>Hi, Netters,
>
>This is to report an error in the EMBL nucleic acid database.
>We are looking for the gene and protein sequence of lom, a gene from 
>bacteriophage lambda: Barondess and Beckwith (1990) 'A bacterial virulence 
>determinant encoded by lysogenic coliphage lambda' Nature, 346:871-74.
>
>The sequence we retrieved from the database was BLAMLOM  X55792, which
>we have found to be the sequence of the gene 'bor', described in the same
>paper. Searching for 'bor' did not retrieve any sequence, neither lom nor bor.
>
>Please, can anyone find this sequence for us, or is it definitively lost
>in cyberspace?
>

A quick gopher search of EMBL for: 

lom or bor 

finds the following two entries.
  

-----------

ID   BLAMLOM    standard; DNA; PHG; 326 BP.
XX
AC   X55792;
XX
DT   01-APR-1993 (Rel. 35, Created)
DT   01-APR-1993 (Rel. 35, Last updated, Version 1)
XX
DE   Bacteriophage lambda lom gene
XX
KW   envelope protein.
XX
OS   Bacteriophage lambda
OC   Viridae; ds-DNA nonenveloped viruses; Siphoviridae.
XX
RN   [1]
RP   1-326
RA   Barondess J.J., Beckwith J.;
RT   "A bacterial virulence determinant encoded by lysogenic coliphage
RT   lambda";
RL   Nature 346:871-874(1990).
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..326
FT                   /organism="Bacteriophage lambda"
FT   CDS             24..317
FT                   /gene="lom"
XX
SQ   Sequence 326 BP; 104 A; 72 C; 69 G; 81 T; 0 other;
     caaaaagcat cgggaataac accatgaaaa aaatgctact cgctactgcg ctggccctgc        60
     ttattacagg atgtgctcaa cagacgttta ctgttcaaaa caaaccggca gcagtagcac       120
     caaaggaaac catcacccat catttcttcg tttctggaat tgggcagaag aaaactgtcg       180
     atgcagccaa aatttgtggc ggcgcagaaa atgttgttaa aacagaaacc cagcaaacat       240
     tcgtaaatgg attgctcggt tttattactt taggcattta tactccgctg gaagcgcgtg       300
     tgtattgctc acaataattg catgag                                            326
//


--------------------------


ID   BLAMBOR    standard; DNA; PHG; 701 BP.
XX
AC   X55793;
XX
DT   01-APR-1993 (Rel. 35, Created)
DT   01-APR-1993 (Rel. 35, Last updated, Version 1)
XX
DE   Bacteriophage lambda bor gene
XX
KW   envelope protein.
XX
OS   Bacteriophage lambda
OC   Viridae; ds-DNA nonenveloped viruses; Siphoviridae.
XX
RN   [1]
RP   1-701
RA   Barondess J.J., Beckwith J.;
RT   "A bacterial virulence determinant encoded by lysogenic coliphage
RT   lambda";
RL   Nature 346:871-874(1990).
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..701
FT                   /organism="Bacteriophage lambda"
FT   CDS             66..686
FT                   /gene="bor"
XX
SQ   Sequence 701 BP; 157 A; 144 C; 232 G; 168 T; 0 other;
     ctgagtgtgt tacagaggtt cgtccgggaa cgggcgtttt attataaaac agtgagaggt        60
     gaacgatgcg taatgtgtgt attgccgttg ctgtctttgc cgcacttgcg gtgacagtca       120
     ctccggcccg tgcggaaggt ggacatggta cgtttacggt gggctatttt caagtgaaac       180
     cgggtacatt gccgtcgttg tcgggcgggg ataccggtgt gagtcatctg aaagggatta       240
     acgtgaagta ccgttatgag ctgacggaca gtgtgggggt gatggcttcc ctggggttcg       300
     ccgcgtcgaa aaagagcagc acagtgatga ccggggagga tacgtttcac tatgagagcc       360
     tgcgtggacg ttatgtgagc gtgatggccg gaccggtttt acaaatcagt aagcaggtca       420
     gtgcgtacgc catggccgga gtggctcaca gtcggtggtc cggcagtaca atggattacc       480
     gtaagacgga aatcactccc gggtatatga aagagacgac cactgccagg gacgaaagtg       540
     caatgcggca tacctcagtg gcgtggagtg caggtataca gattaatccg gcagcgtccg       600
     tcgttgttga tattgcttat gaaggctccg gcagtggcga ctggcgtact gacggattca       660
     tcgttggggt cggttataaa ttctgattag ccaggtaaca c                           701
//




Best of luck,

Dan Jacobson

danj@welchgate.welch.jhu.edu

From owner-embldatabank@net.bio.net Wed Jun 16 23:00:00 1993
Path: biosci!TWNAS886.BITNET!MBDMITRY
From: MBDMITRY@TWNAS886.BITNET
Newsgroups: bionet.molbio.embldatabank
Subject: sequence charon 4a
Message-ID: <01GZI32TDQ4K90PSJ1@ccvax.sinica.edu.tw>
Date: 18 Jun 93 06:05:00 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 9

I would like to receive the sequence of the lambda charon 4a vector left arm
DNA from position 18560 (Kpn1 site) to position 20660 (Sac1 site of the
lac5 insert). Thanks very much
Dmitry Bessarab
Institute of Molecular Biology
Academia Sinica
Nankang, Taipei,11529
Rep. of China
e-mail mbdmitry@ccvax.sinica.edu.tw

From owner-embldatabank@net.bio.net Thu Jun 17 23:00:00 1993
Path: biosci!daresbury!buzz.bmc.uu.se!corax.udac.uu.se!sunic!pipex!uknet!comlab.ox.ac.uk!oxuniv!oxpath!rhubner
From: rhubner@molbiol.ox.ac.uk
Newsgroups: bionet.molbio.embldatabank
Subject: EMBL vs. PIR entries/errors? --continued
Message-ID: <1993Jun18.221142.1@molbiol.ox.ac.uk>
Date: 18 Jun 93 21:11:42 GMT
Organization: Oxford University Molecular Biology Data Centre
Lines: 21
Nntp-Posting-Host: heatly
Nntp-Posting-User: rhubner

Hi again,
 Thanks for the message. Initially PIR had been searched ONLY by EC number
and we therefore unfortunately missed lipase 1...
 Interestingly, I could NOW also find lip2 of Moraxella in EMBL release 35
with a string search for "lip" or "lipase"!!!
***
ID   MSLIP2     standard; DNA; PRO; 2134 BP.
XX   
AC   X53868;
XX   
DT   27-APR-1993 (Rel. 35, Created)
DT   27-APR-1993 (Rel. 35, Last updated, Version 2)
XX   
DE   Moraxella sp. lip2 gene for lipase 2
***  
 I think I erred in the elastase (pancreatic) accession number I reported...
should be J03516 according to publication... NOT found with stringsearch under
elastase and also not with the number [J03517 will give an entry]...
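
For readers who keep a local copy of the flat files, a string search of this
kind can be sketched in Python as below. The function names are assumptions
made for this example, and this is not the search tool referred to above. It
looks for a term in the AC and DE lines of each entry, ignoring case.

```python
def split_entries(flat_file_text):
    """Yield each entry of an EMBL flat file as a list of lines ('//' ends an entry)."""
    current = []
    for line in flat_file_text.splitlines():
        if line.startswith("//"):
            if current:
                yield current
            current = []
        else:
            current.append(line)

def find_entries(flat_file_text, term):
    """Return (ID, accession, description) for entries whose AC or DE lines contain term."""
    term = term.lower()
    hits = []
    for entry in split_entries(flat_file_text):
        entry_id = next((l.split()[1] for l in entry if l.startswith("ID")), "")
        accessions = " ".join(l[5:].strip() for l in entry if l.startswith("AC"))
        description = " ".join(l[5:].strip() for l in entry if l.startswith("DE"))
        if term in accessions.lower() or term in description.lower():
            hits.append((entry_id, accessions.rstrip(";"), description))
    return hits
```

Searching by accession number and by a word from the description are then
the same call, for example find_entries(text, "X53868") or
find_entries(text, "lipase").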

Roland
 

From owner-embldatabank@net.bio.net Sun Jun 20 23:00:00 1993
Path: biosci!daresbury!buzz.bmc.uu.se!embl-heidelberg.de!ouzounis
From: ouzounis@embl-heidelberg.de
Newsgroups: bionet.molbio.embldatabank
Subject: GenPept question
Message-ID: <1993Jun21.153603.95282@embl-heidelberg.de>
Date: 21 Jun 93 14:36:03 GMT
Organization: EMBL, European Molecular Biology Laboratory
Lines: 13

Hi

one point for clarification: how is GenPept produced? Only from doc lines
in GenBank, or also by some intelligent program for ORF identification??

Thanks...

C A Ouzounis
EMBL
Heidelberg
Germany

ouzounis@embl-heidelberg.de

From owner-embldatabank@net.bio.net Sun Jun 20 23:00:00 1993
Path: biosci!FCRFV2.NCIFCRF.GOV!GUNNELL
From: GUNNELL@FCRFV2.NCIFCRF.GOV ("Mark A. Gunnell")
Newsgroups: bionet.molbio.embldatabank
Subject: RE: GenPept question
Message-ID: <930621111751.20209310@FCRFV2.NCIFCRF.GOV>
Date: 21 Jun 93 15:17:51 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 25

C A Ouzounis, EMBL, Heidelberg, asked:

> one point for clarification: how is GenPept produced? Only from doc lines
> in GenBank, or also by some intelligent program for ORF identification??

The version of GenPept that I distribute from fconvx.ncifcrf.gov is entirely 
derived from the GenBank "/translation=" feature qualifiers (see 1.4.6 in the 
GenPept 77.0 release notes).  According to the GenBank Release 77.0 release 
notes, the "/translation=" data is "automatically generated".  I can't say
whether or not the program used by the NCBI folks is utilizing an ORF 
identification scheme to generate this information.  Would anyone from NCBI
like to comment? 

Best Regards,

-Mark Gunnell
-------------------------------------------------------------------------------
Mark A. Gunnell                   | Internet: gunnell@ncifcrf.gov
Sci. Applications Analyst         | Bitnet:   gunnell%ncifcrf.gov@cunyvm.bitnet
Biomedical Supercomputer Center   | Phone:   (301) 846-5779
PRI/DynCorp                       |
NCI-FCRDC                         |
PO Box B, Bldg 430                |
Frederick, MD 21702-1201  USA     |
-------------------------------------------------------------------------------

From owner-embldatabank@net.bio.net Mon Jun 21 23:00:00 1993
Path: biosci!agate!howland.reston.ans.net!math.ohio-state.edu!cs.utexas.edu!uunet!pipex!doc.ic.ac.uk!daresbury!cnbvx3.cnb.uam.es!mrege
From: mrege@cnbvx3.cnb.uam.es
Newsgroups: bionet.molbio.embldatabank
Subject: Re: Riders of the lost gene/THANKS!
Message-ID: <1993Jun22.170723.49@cnbvx3.cnb.uam.es>
Date: 22 Jun 93 17:07:22 GMT
References: <1993Jun16.134141.47@cnbvx3.cnb.uam.es> <1993Jun16.144848.642@welchgate.welch.jhu.edu>
Organization: C.N.Biotecnologia,  CSIC
Lines: 23

Hello,

Thank you for your answer and thanks are also given to the people who
answered me directly.

Now we have got both sequences and after comparing them with the paper
they were reported in (Nature, 346:871-874), we have found that their
names are, as we figured out, interchanged. So the entry named BLAMLOM
corresponds to gene and protein bor, and the one named BLAMBOR is actually
the gene and protein lom.
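
A comparison of this kind can be repeated by translating the annotated CDS
of each entry and setting the result beside the protein sequence printed in
the paper. The following Python is only a sketch under stated assumptions:
the function name is hypothetical, it handles only a simple "start..end" CDS
location on the forward strand, and it uses the standard genetic code.

```python
import re

BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in t, c, a, g order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def cds_translation(entry_text):
    """Translate the first simple 'FT   CDS   start..end' feature of an EMBL entry."""
    location = re.search(r"^FT\s+CDS\s+(\d+)\.\.(\d+)", entry_text, re.MULTILINE)
    if location is None:
        raise ValueError("no simple CDS location found")
    start, end = int(location.group(1)), int(location.group(2))
    # Everything between the SQ header line and the '//' terminator is sequence.
    sequence_block = entry_text.split("\nSQ", 1)[1].split("\n", 1)[1]
    sequence = re.sub(r"[^a-z]", "", sequence_block.split("//")[0].lower())
    coding = sequence[start - 1:end]  # EMBL locations are 1-based and inclusive
    codons = (coding[i:i + 3] for i in range(0, len(coding) - 2, 3))
    # Codons with ambiguity codes are reported as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

Applied to the two entries posted earlier in this thread, it would translate
positions 24..317 of BLAMLOM and 66..686 of BLAMBOR.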

Hope this helps to correct the error.
Cheers,



Miquel

====================================================================
Miquel Regue             e-mail:  regue@farmacia.ub.es
Dept. Microbiology
Fac. of Pharmacy
University of Barcelona 

From owner-embldatabank@net.bio.net Mon Jun 21 23:00:00 1993
Path: biosci!agate!howland.reston.ans.net!darwin.sura.net!lhc!ray!dab
From: dab@ray.nlm.nih.gov (Dennis Benson)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: GenPept question
Message-ID: <1993Jun22.195631.20426@nlm.nih.gov>
Date: 22 Jun 93 19:56:31 GMT
References: <930621111751.20209310@FCRFV2.NCIFCRF.GOV>
Sender: news@nlm.nih.gov
Distribution: bionet
Organization: National Library of Medicine
Lines: 20
X-Newsreader: Tin 1.1 PL4

GUNNELL@FCRFV2.NCIFCRF.GOV (Mark A. Gunnell) writes:
: C A Ouzounis, EMBL, Heidelberg, asked:
: 
: > one point for clarification: how is GenPept produced? Only from doc lines
: > in GenBank, or also by some intelligent program for ORF identification??
: 
: The version of GenPept that I distribute from fconvx.ncifcrf.gov is entirely 
: derived from the GenBank "/translation=" feature qualifiers (see 1.4.6 in the 
: GenPept 77.0 release notes).  According to the GenBank Release 77.0 release 
: notes, the "/translation=" data is "automatically generated".  I can't say
: whether or not the program used by the NCBI folks is utilizing an ORF 
: identification scheme to generate this information.  Would anyone from NCBI
: like to comment? 

NCBI is not searching the sequence to add potential coding regions to GenBank
-- the author's specification of the coding region (the CDS feature) is used
to do the translation.
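
As an illustration of what deriving protein sequences from the
"/translation=" qualifiers involves, here is a minimal Python sketch. It is
not the program used to build GenPept, whose details are not given in this
thread, and the function name is an assumption of this example. It collects
each qualifier value from GenBank flat-file text and joins the continuation
lines.

```python
import re

# A qualifier value runs from the opening quote to the next quote and may
# span several indented lines of the feature table.
TRANSLATION = re.compile(r'/translation="([^"]*)"')

def translations(genbank_text):
    """Return every /translation= value with line breaks and indentation removed."""
    return [re.sub(r"\s+", "", value) for value in TRANSLATION.findall(genbank_text)]
```

Entries without a CDS feature simply contribute nothing to the result.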

Dennis Benson
NCBI

From owner-embldatabank@net.bio.net Wed Jun 23 23:00:00 1993
Path: biosci!daresbury!bioftp.unibas.ch!comp.bioz.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: On the usefulness of sequences in the databases
Keywords: Patents, database quality, trash
Message-ID: <1993Jun24.170658.14826@comp.bioz.unibas.ch>
Date: 24 Jun 93 17:06:58 GMT
Sender: usenet@comp.bioz.unibas.ch (NEWS transaction account)
Reply-To: doelz@urz.unibas.ch
Organization: EMBnet Switzerland [BASEL]
Lines: 45
Xref: biosci bionet.molbio.genbank:1318 bionet.molbio.embldatabank:196
Nntp-Posting-Host: biox.embnet.unibas.ch

Colleagues, 
I have full sympathy with the eagerness to collect anything and everything in 
the sequence data arena. Therefore, I am currently comparing GENBANK 77 with 
EMBL 35 and trying to figure out the differences. Most can be explained;  
however, look at the following sequence: 

LOCUS       A00674          6 bp    DNA             PAT       29-JAN-1993
DEFINITION  Nucleotide sequence 3 from patent number WO8601533
ACCESSION   A00674
KEYWORDS    .
SOURCE      Unknown
  ORGANISM  Unknown
            Unclassified.
REFERENCE   1  (bases 1 to 6)
  AUTHORS   
  TITLE     'PRODUCTION OF CHIMERIC ANTIBODIES'
  JOURNAL   Patent: WO 8601533-A 3 13-MAR-1986;
  STANDARD  full automatic
BASE COUNT        3 a      2 c      0 g      1 t
ORIGIN      
        1 cactaa


I am starting to wonder what the purpose of these sequences is. I am aware
that this is an extreme case, but there are a few more very short 
sequences. In particular, these 'patent' data are of more than doubtful 
quality. However, as both EMBL and GENBANK incorporate them, I need to 
explain to my customers why a sequence database needs a hexanucleotide 
that occurs 28340 times in over 70000 sequences.
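
A count of this kind can be reproduced on any set of sequences with a few
lines of Python. The sketch below is an assumption about how such a count
might be made, not the procedure actually used for the figures above: it
scans the given strand only, allows overlapping matches, and the function
name is hypothetical.

```python
def count_motif(sequences, motif):
    """Return (total occurrences, number of sequences containing the motif)."""
    motif = motif.lower()
    occurrences = 0
    containing = 0
    for sequence in sequences:
        sequence = sequence.lower()
        # Test every start position so that overlapping matches are counted.
        hits = sum(
            1
            for start in range(len(sequence) - len(motif) + 1)
            if sequence.startswith(motif, start)
        )
        occurrences += hits
        if hits:
            containing += 1
    return occurrences, containing
```

Counting the reverse complement as well would require a second pass with
the complemented motif.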


Regards
Reinhard 
  


-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz@urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------

From owner-embldatabank@net.bio.net Thu Jun 24 23:00:00 1993
Path: biosci!agate!usenet.ins.cwru.edu!magnus.acs.ohio-state.edu!math.ohio-state.edu!darwin.sura.net!lhc!object!ostell
From: ostell@object.nlm.nih.gov (Jim Ostell)
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: Re: On the usefulness of sequences in the databases
Message-ID: <1993Jun25.125802.23133@nlm.nih.gov>
Date: 25 Jun 93 12:58:02 GMT
References: <1993Jun24.170658.14826@comp.bioz.unibas.ch>
Sender: news@nlm.nih.gov
Organization: National Library of Medicine
Lines: 52
Xref: biosci bionet.molbio.genbank:1322 bionet.molbio.embldatabank:197
X-Newsreader: Tin 1.1 PL4

Patents in the sequence databases:

There has been some confusion about the purpose, meaning, and quality of the
patent sequences appearing in the sequence databases. To give a little
background, the European, U.S., and Japanese Patent Offices have made an
agreement to exchange patent related sequences with each other and to make
them available to the public. Each office has made its own arrangements for
capturing their patent sequence data. In the U.S., the NCBI is working with
the U.S. Patent and Trademark Office to enter the backlog of patents, gather
the new patents, and to distribute them in GenBank.

Speaking now only for the U.S. effort, most of the backlog has been entered.
In the last two releases of GenBank, NCBI has included patent sequences in a
separate division of the database, "PAT". The sequences come jointly from the
data capture efforts of EMBL for European patents and the NCBI for the U.S.
patents.  The agreement with the USPTO was to take only DNA sequences at
least 10 residues long. Because a patent is a legal, not a scientific,
document, it is often difficult or impossible to reliably capture all the
information of biological interest such as features or even the organism
name. Further, it is often difficult or impossible, except by careful legal
scrutiny, to determine what sequence (if any) is claimed to be patented and
what is just an "exhibit" or associated information. Finally, the sequences
themselves may be presented in a sufficiently complex manner that it can be
hard to determine what the sequence itself is in some regions.

In consultation with USPTO then, it became clear that patent sequences could
not be guaranteed to be a source of new biological data. We took the approach
instead that the purpose of the patent sequences was to serve as a search
key back to the original patent.  To achieve this, we entered all sequences
in the patent which met the length limits and associated them with the
patent document. No attempt was made to biologically annotate them or
determine what was claimed by the patent. Their most valuable use is to
search by sequence similarity to discover patents that may overlap with a
possible claim made on a new sequence.  The other use would be to find the
sequences relevant to a given patent.
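
The "search key" idea in the paragraph above can be pictured roughly as
follows. This is a sketch under stated assumptions, not a description of
how NCBI built the PAT division: the (patent, sequence) record layout, the
function names, the patent labels, and the toy sequences are all
hypothetical, and exact shared 10-mers stand in for a real similarity
search. Only the minimum length of 10 residues comes from the text.

```python
from collections import defaultdict

MIN_LENGTH = 10  # length limit quoted above for the USPTO agreement
K = 10           # hypothetical word size for this toy lookup


def build_index(entries, k=K, min_length=MIN_LENGTH):
    """Map each k-mer to the set of patents whose sequences contain it.

    `entries` is an iterable of (patent_number, sequence) pairs.
    Sequences shorter than `min_length` are skipped.
    """
    index = defaultdict(set)
    for patent, seq in entries:
        seq = seq.lower()
        if len(seq) < min_length:
            continue
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(patent)
    return index


def patents_sharing_words(index, query, k=K):
    """Return the patents sharing at least one exact k-mer with `query`."""
    query = query.lower()
    found = set()
    for i in range(len(query) - k + 1):
        found |= index.get(query[i:i + k], set())
    return found


# Hypothetical patent labels and sequences, for illustration only.
entries = [
    ("PATENT-A", "acgtacgtacgtttgaca"),
    ("PATENT-B", "cactaa"),              # too short, never indexed
    ("PATENT-C", "ttgacaggctagctagga"),
]
index = build_index(entries)
print(patents_sharing_words(index, "ggacgtacgtacgtcc"))  # prints: {'PATENT-A'}
```

The same index read in the other direction (patent to sequences) would
cover the second use mentioned above, finding the sequences relevant to a
given patent.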

Since these uses are so specialized, we felt it was important to keep these
sequences in a separate division. Note that it is typical for a patent to
refer to published sequences or for sequence data to be published after a
patent is awarded. So most of these sequences have cognates in the usual
divisions of the database.

As the sequences directly submitted electronically to USPTO as part of the
patent application process become available, it is possible that they will
be more fully annotated biologically by the submitter. However, since the
immediate submitter is usually a lawyer's office and since such annotation
is not required for the patent to be considered, do not expect too much.

We hope this clarifies the attributes of patented sequence entries.

  Jim Ostell
  NCBI

From owner-embldatabank@net.bio.net Sun Jun 27 23:00:00 1993
Path: biosci!ACS.UCALGARY.CA!sjszarka
From: sjszarka@ACS.UCALGARY.CA (Steve Szarka)
Newsgroups: bionet.molbio.embldatabank
Subject: PCR sequences in Databases?
Message-ID: <9306280258.AA22023@acs4.acs.ucalgary.ca>
Date: 28 Jun 93 02:58:31 GMT
Sender: daemon@net.bio.net
Reply-To: sjszarka@acs.ucalgary.ca
Distribution: bionet
Lines: 13


What is the present policy on submitting DNA sequences from PCR
products to the EMBL or GenBank databases? I looked through the
Authorin v3.0 program and could not find a PCR heading for the
type of molecule sequenced. Is PCR fidelity not reliable enough
to include sequence information in the databases?


Thanks Steve
sjszarka@acs.ucalgary.ca




From owner-embldatabank@net.bio.net Mon Jun 28 23:00:00 1993
Path: biosci!bcm!cs.utexas.edu!uunet!mcsun!news.funet.fi!funic!convex!harper
From: harper@convex.csc.FI (Rob Harper)
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: Re: On the usefulness of sequences in the databases
Message-ID: <1993Jun29.123212.18851@nic.funet.fi>
Date: 29 Jun 93 12:32:12 GMT
References: <1993Jun24.170658.14826@comp.bioz.unibas.ch> <1993Jun25.125802.23133@nlm.nih.gov>
Sender: usenet@nic.funet.fi
Organization: Finnish Academic and Research Network Project - FUNET
Lines: 70
Xref: biosci bionet.molbio.genbank:1324 bionet.molbio.embldatabank:200
Nntp-Posting-Host: convex.csc.fi

In <1993Jun25.125802.23133@nlm.nih.gov> ostell@object.nlm.nih.gov 
(Jim Ostell) writes:

>Patents in the sequence databases:

When I was in the States, Brewster Kahle said that one of the most
popular WAIS sources was the patent-sampler.src, which covered
about 2 weeks of patent applications (18 MBytes) from the US Patent Office.

Last week when I was at EMBL, Peter Stoehr showed me the patent.dat
that is available from the EPO (European Patent Office), and I thought
it might be a good idea to make a WAIS source from the data.

This is a copy of the epo.src

------------------8< --------- clip-------------8<---------
(:source
   :version  3
   :ip-address "128.214.6.100"
   :ip-name "wais.funet.fi"
   :tcp-port 210
   :database-name "embnet/epo"
   :cost 0.00
   :cost-unit :free
   :maintainer "harper@wais.funet.fi"
   :description "Server created with WAIS release 8 b5 on
    Jun 28 13:47:08 1993 by harper@wais.funet.fi
    This is a wais source of the genetic material in the
    European Patent office.
   The files of type catalog used in the index were:
   /pub/sci/molbio/databases/patent/patent.dat
"
)
------------------8< --------- clip-------------8<---------

If you do not have Xwais for searching, I have also included the epo.src
on the gopher at CSC under the directory (WAIS WORLD for the biologist)
(That's near enough to WAYNE'S WORLD... isn't it, Daniel?)

                          WAIS WORLD for the Biologist

      1.  Agriculture/
      2.  Journal TOC search <?>
      3.  Materials & Methods search <?>
      4.  Prosite protein database search <?>
      5.  Search Specialised Databases at Welchlab /
      6.  Search Usenet News "sci"-groups (UK) <?>
      7.  Search for Software at EMBL (Mac, Dos, Unix, Vax)/
      8.  Search for Software at Indiana - IUBio (Mac, Dos, Unix, Vax)/
      9.  Search the Biosci/Bionet newsgroups (USA) <?>
 -->  10. Search the European Patent Office database (Finland) <?>
      11. Search the GDB database (USA) <?>
      12. Search the INFO-GCG mailing list (FINLAND) <?>
      13. Search the Protein Data Bank (USA) <?>
      14. Search the WAIS directory of servers (USA) <?>

So all those commercial firms that have just patented a 10 base pair
sequence can now quickly check and see if someone has already beaten them
to it :-)

RGDS -=ROB=-




--
 Rob Harper                        E-mail:          harper@convex.csc.fi    
 Center for Scientific Computing   Molbio/software: harper@nic.funet.fi
 Tietotie 6, P.O. Box 405          Telephone:       +358 0 457 2076
 SF-02101 Espoo Finland            Fax:             +358 0 457 2302

From owner-embldatabank@net.bio.net Wed Jun 30 23:00:00 1993
Path: biosci!uwm.edu!ux1.cso.uiuc.edu!sdd.hp.com!cs.utexas.edu!uunet!mcsun!uknet!comlab.ox.ac.uk!oxuniv!oxpath!rhubner
From: rhubner@molbiol.ox.ac.uk
Newsgroups: bionet.molbio.embldatabank
Subject: secondary accession numbers
Message-ID: <1993Jul1.234108.1@molbiol.ox.ac.uk>
Date: 1 Jul 93 22:41:08 GMT
Organization: Oxford University Molecular Biology Data Centre
Lines: 11
Nntp-Posting-Host: heatly
Nntp-Posting-User: rhubner

Hi there group,
 Just a note for those who aren't aware: an accession number you are given
(e.g. J03516, from my recent enquiry about a nucleotide sequence for human
pancreatic elastase) may be a *secondary* accession number, and entries
cannot be found under it with FETCH (and especially not with STRINGSEARCH,
which looks only at the ref fields) in GCG. This was the reason why I could
not locate the correct entry for this sequence [however, this case is
peculiar, as there seem to be several entries with J03516 as a secondary
number]. The SRS software can do the job properly...
 Special thanks to Kate Rice (EMBL - Heidelberg) for pointing out these
details to me!!!
Roland
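
For anyone without SRS, a lookup table from every accession number
(primary or secondary) to the primary number(s) of the entries listing it
can be built from the flat file directly. The sketch below relies on the
EMBL flat-file convention that AC lines list the primary accession number
first and that "//" closes an entry. The function name and the two toy
entries (X00001 and X00002, shown carrying J03516 as secondary number) are
made up for illustration and are not the real entries for J03516.

```python
def accession_index(flatfile_text):
    """Map each accession number to the primaries of the entries listing it.

    Assumes EMBL flat-file conventions: 'AC' lines hold semicolon-separated
    accession numbers with the primary one first, and '//' ends an entry.
    An entry without a closing '//' is ignored by this sketch.
    """
    index = {}
    accessions = []
    for line in flatfile_text.splitlines():
        if line.startswith("AC   "):
            accessions += [a.strip() for a in line[5:].split(";") if a.strip()]
        elif line.startswith("//"):
            if accessions:
                primary = accessions[0]
                for acc in accessions:
                    # A secondary number may point at several entries.
                    index.setdefault(acc, set()).add(primary)
            accessions = []
    return index


# Hypothetical entries; the accession numbers X00001/X00002 are invented.
TOY_FLATFILE = """\
ID   X00001     standard; DNA; PRI; 12 BP.
AC   X00001; J03516;
SQ   Sequence 12 BP;
     acgtacgtac gt                                                      12
//
ID   X00002     standard; DNA; PRI; 12 BP.
AC   X00002; J03516;
SQ   Sequence 12 BP;
     ttgacaggct ag                                                      12
//
"""

index = accession_index(TOY_FLATFILE)
print(sorted(index["J03516"]))  # prints: ['X00001', 'X00002']
```

Mapping to a set rather than a single value allows for the situation
described above, where several entries share one secondary number.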

