From owner-embldatabank@net.bio.net Tue Nov 01 22:00:00 1994
Path: biosci!rutgers!gatech!howland.reston.ans.net!news.sprintlink.net!EU.net!Germany.EU.net!news.dfn.de!urmel.informatik.rwth-aachen.de!news.rhrz.uni-bonn.de!ibm.rhrz.uni-bonn.de!UZS13B
From: UZS13B@ibm.rhrz.uni-bonn.de (Stefan Kahlert)
Newsgroups: bionet.molbio.embldatabank,bionet.molbio.genbank
Subject: How to search just on promoter-sequences?
Date: Wed, 02 Nov 94 14:59:12 MEZ
Organization: RHRZ Uni-Bonn
Lines: 15
Message-ID: <17062D2C0S85.UZS13B@ibm.rhrz.uni-bonn.de>
NNTP-Posting-Host: ibm.rhrz.uni-bonn.de
Xref: biosci bionet.molbio.embldatabank:383 bionet.molbio.genbank:1815

Dear Colleagues
 
I am looking for a particular sequence (a binding motif), but only in *promoters*.
FASTA lets me search whole genes, but only promoters are relevant to
my problem, and I have no idea how to restrict a search to promoters.
It would be very helpful if you could tell me how to get a help file
from a server that searches a promoter database.
 
Thanks a lot for your help...
 
Stefan Kahlert
 
Medical Polyclinic
University of Bonn

From owner-embldatabank@net.bio.net Tue Nov 01 22:00:00 1994
Path: biosci!rutgers!gatech!udel!news.sprintlink.net!EU.net!uunet!heifetz.msen.com!emory!usenet
From: bcresas@bimcore.emory.edu (Scott Sammons)
Newsgroups: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Subject: E. coli Database Collection -->  GCG format
Date: 2 Nov 1994 20:28:59 GMT
Organization: Biomolecular Computing Resource, Emory University
Lines: 16
Distribution: world
Message-ID: <398sqb$79b@emory.mathcs.emory.edu>
Reply-To: bcresas@bimcore.emory.edu
NNTP-Posting-Host: bimcore.cc.emory.edu
Xref: biosci bionet.software.gcg:801 bionet.software:9889 bionet.molbio.embldatabank:384

Greetings:

Has anyone successfully reformatted the ECD data into GCG-formatted databases?
The program embltogcg core dumps when I try it with one of the ECD .dat
files.

Scott Sammons
=======================================================================
| Scott A. Sammons                                    (404) 727-2780  |
| Emory University                               FAX: (404) 727-3659  |
| Biomolecular Computing Resource                                     |
| 3025 Rollins Research Center       Email: sammons@bimcore.emory.edu |
| Atlanta, GA 30322                                                   |
=======================================================================



From owner-embldatabank@net.bio.net Wed Nov 02 22:00:00 1994
Path: biosci!rutgers!gatech!howland.reston.ans.net!pipex!lyra.csx.cam.ac.uk!nntp-serv.cam.ac.uk!pmr
From: pmr@staffa.sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Subject: Re: E. coli Database Collection -->  GCG format
Date: 03 Nov 1994 09:45:14 GMT
Organization: University of Cambridge, England
Lines: 24
Distribution: world
Message-ID: <PMR.94Nov3094514@staffa.sanger.ac.uk>
References: <398sqb$79b@emory.mathcs.emory.edu>
NNTP-Posting-Host: staffa.sanger.ac.uk
In-reply-to: bcresas@bimcore.emory.edu's message of 2 Nov 1994 20:28:59 GMT
Xref: biosci bionet.software.gcg:802 bionet.software:9901 bionet.molbio.embldatabank:385

In article <398sqb$79b@emory.mathcs.emory.edu> bcresas@bimcore.emory.edu (Scott Sammons) writes:
>   Has anyone successfully reformatted the ECD data into GCG formatted databases.
>   The program embltogcg core dumps when I try it with one of the ECD .dat
>   files.

As I am involved in both ECD and EGCG, I will try to put something into EGCG 8.0
when it is ready. I take it you are referring to the very latest (new format)
ECD here. Do you mean the genorf.dat file or the contigs/*.dna files (which are
closer to EMBL format and have more sequence data)?

Beware though - E.coli sequencing is now going so well that ECD has a contig (and
more to follow) over the 350k mark, which will give some problems with GCG.
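One way to stay under a per-sequence size limit is to split a long contig into overlapping pieces before loading it. The Python sketch below assumes a limit of 350,000 bases (the figure quoted above, not one taken from GCG documentation) and an arbitrary, hypothetical overlap of 1,000 bases, so that any feature shorter than the overlap appears whole in at least one piece.

```python
def split_contig(seq: str, max_len: int = 350_000, overlap: int = 1_000) -> list[tuple[int, str]]:
    """Split seq into pieces of at most max_len bases.

    Consecutive pieces share `overlap` bases. Returns (1-based start, piece)
    pairs so each piece can be named after its position in the contig.
    max_len (assumed limit) and overlap are illustrative choices.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    pieces = []
    start = 0
    while True:
        pieces.append((start + 1, seq[start:start + max_len]))
        if start + max_len >= len(seq):   # this piece reaches the end of the contig
            break
        start += step
    return pieces
```

For a 700,000-base contig this yields three pieces, starting at positions 1, 349,001 and 698,001.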

Another option would be to use a script (or Perl) to reformat into enough of an
"EMBL" format for EMBLtoGCG to accept it.

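A minimal Python sketch of such a script, assuming all you have is an entry name and a raw nucleotide sequence, and assuming EMBLtoGCG needs little more than an ID line, an SQ header and the sequence lines ending in "//" (which lines it actually requires is not stated here). Sequence lines follow the usual EMBL layout of six 10-base blocks per line with the running position right-justified at column 80; the "standard; DNA; PRO;" qualifiers are placeholders.

```python
from collections import Counter

def to_minimal_embl(name: str, seq: str) -> str:
    """Wrap a raw nucleotide sequence in a bare-bones EMBL-style entry."""
    seq = "".join(seq.split()).lower()          # drop whitespace; EMBL uses lower case
    counts = Counter(seq)
    other = len(seq) - sum(counts[b] for b in "acgt")
    lines = [
        f"ID   {name}  standard; DNA; PRO; {len(seq)} BP.",   # placeholder qualifiers
        "XX",
        (f"SQ   Sequence {len(seq)} BP; {counts['a']} A; {counts['c']} C; "
         f"{counts['g']} G; {counts['t']} T; {other} other;"),
    ]
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        blocks = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        # 5 leading spaces, blocks padded to 65 columns, position right-justified in 10
        lines.append(f"     {blocks:<65}{start + len(chunk):>10}")
    lines.append("//")
    return "\n".join(lines) + "\n"
```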

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr    | England

From owner-embldatabank@net.bio.net Wed Nov 02 22:00:00 1994
Path: biosci!daresbury!bioftp.unibas.ch!citi2.fr!jussieu.fr!cea.fr!usenet
From: cisitm@albert.cad.cea.fr (Pierre Didierjean)
Newsgroups: bionet.molbio.embldatabank
Subject: *** Q: WHAT KIND OF PEOPLE ON THE NET ?
Date: 3 Nov 1994 16:17:03 GMT
Organization: SSII
Lines: 28
Sender: cisitm@albert.cad.cea.fr
Message-ID: <39b2dv$9pd@anemone.saclay.cea.fr>
NNTP-Posting-Host: nyassa.cad.cea.fr

I'd like to know what kind of people I can find on the net.

Students, business people, administrators, scientists, or what?

Does anybody know, or have any statistics?


What do YOU do for a living?

I am a system administrator.


Thanks for your answers, and sorry for my English.



Bye


+-----------------------------------------------------------------------------+
|		Pierre DIDIERJEAN 					      |
|									      |
|		UNIX System Administrator				      |
|		Cisi, Aix-en-Provence 					      |
|		France							      |
+-----------------------------------------------------------------------------+
|	email : 	cisitm@albert.cad.cea.fr 			      |
+-----------------------------------------------------------------------------+

From owner-embldatabank@net.bio.net Wed Nov 02 22:00:00 1994
Path: biosci!agate!darkstar.UCSC.EDU!pellinore!rafael
From: rafael@cse.ucsc.edu (David Konerding)
Newsgroups: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Subject: Re: E. coli Database Collection -->  GCG format
Followup-To: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Date: 3 Nov 1994 21:14:47 GMT
Organization: UC Santa Cruz CIS/CE
Lines: 38
Distribution: world
Message-ID: <39bjs7$t3p@darkstar.UCSC.EDU>
References: <398sqb$79b@emory.mathcs.emory.edu> <PMR.94Nov3094514@staffa.sanger.ac.uk>
NNTP-Posting-Host: pellinore.cse.ucsc.edu
X-Newsreader: TIN [version 1.2 PL2]
Xref: biosci bionet.software.gcg:805 bionet.software:9916 bionet.molbio.embldatabank:387

Peter Rice (pmr@staffa.sanger.ac.uk) wrote:
: In article <398sqb$79b@emory.mathcs.emory.edu> bcresas@bimcore.emory.edu (Scott Sammons) writes:
: >   Has anyone successfully reformatted the ECD data into GCG formatted databases.
: >   The program embltogcg core dumps when I try it with one of the ECD .dat
: >   files.

: As I am involved in both ECD and EGCG, I will try to put something into EGCG 8.0
: when it is ready. I take it you are referring to the very latest (new format)
: ECD here. Do you mean the genorf.dat file or the contigs/*.dna files (which are
: closer to EMBL format and have more sequence data)?

: Beware though - E.coli sequencing is now going so well that ECD has a contig (and
: more to follow) over the 350k mark, which will give some problems with GCG.

: Another option would be to use a script (or Perl) to reformat into enough of an
: "EMBL" format for EMBLtoGCG to accept it.

Hmm.  Can anybody give me a pointer to where I can find the ECD?  I am writing
a thesis on computational methods for finding genes in prokaryotic
DNA, and any database more complete than EcoSeq6 would be great.

Thanks.


--
  O~_    -------------  David Konerding (University of California, Santa Cruz)
 c/ /'   -------        rafael@cse.ucsc.edu
( ) \( ) ---            rafael@cats.ucsc.edu


From owner-embldatabank@net.bio.net Wed Nov 02 22:00:00 1994
Path: biosci!rutgers!gatech!howland.reston.ans.net!pipex!lyra.csx.cam.ac.uk!nntp-serv.cam.ac.uk!pmr
From: pmr@staffa.sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Subject: Re: E. coli Database Collection -->  GCG format
Followup-To: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Date: 03 Nov 1994 22:44:38 GMT
Organization: University of Cambridge, England
Lines: 29
Distribution: world
Message-ID: <PMR.94Nov3224438@staffa.sanger.ac.uk>
References: <398sqb$79b@emory.mathcs.emory.edu> <PMR.94Nov3094514@staffa.sanger.ac.uk>
	<39bjs7$t3p@darkstar.UCSC.EDU>
NNTP-Posting-Host: staffa.sanger.ac.uk
In-reply-to: rafael@cse.ucsc.edu's message of 3 Nov 1994 21:14:47 GMT
Xref: biosci bionet.software.gcg:806 bionet.software:9918 bionet.molbio.embldatabank:388

In article <39bjs7$t3p@darkstar.UCSC.EDU> rafael@cse.ucsc.edu (David Konerding) writes:
>   Peter Rice (pmr@staffa.sanger.ac.uk) wrote:
>   : In article <398sqb$79b@emory.mathcs.emory.edu> bcresas@bimcore.emory.edu (Scott Sammons) writes:
>   : >   Has anyone successfully reformatted the ECD data into GCG formatted databases.
>   : >   The program embltogcg core dumps when I try it with one of the ECD .dat
>   : >   files.
>
>   : As I am involved in both ECD and EGCG, I will try to put something into EGCG 8.0
>   : when it is ready. I take it you are referring to the very latest (new format)
>   : ECD here. Do you mean the genorf.dat file or the contigs/*.dna files (which are
>   : closer to EMBL format and have more sequence data)?
>
>   Hmm.  Can anybody give me a pointer to where I can find the ECD?  I am writing
>   a thesis which regards computational methods for finding genes in prokaryotic
>   DNA, and any database more complete than EcoSeq6 would be great.

The latest ECD is available from ftp.ebi.ac.uk in directory pub/databases/ecdc
(note the extra c at the end :-)

The new format is described in: Kroeger et al. (1994); Nucleic Acids Research 22:3450-3455.

The contigs directory has the non-redundant contiguous sequences from E.coli K-12.

From owner-embldatabank@net.bio.net Wed Nov 02 22:00:00 1994
Newsgroups: bionet.software.gcg,bionet.software,bionet.molbio.embldatabank
Path: biosci!rutgers!gatech!howland.reston.ans.net!news.sprintlink.net!EU.net!sun4nl!sci.kun.nl!jackl
From: jackl@sci.kun.nl (Jack Leunissen)
Subject: Re: E. coli Database Collection --> GCG format
Message-ID: <Cypp0K.L9@sci.kun.nl>
Sender: news@sci.kun.nl (News owner)
Nntp-Posting-Host: wn2.sci.kun.nl
Organization: University of Nijmegen, The Netherlands
References: <398sqb$79b@emory.mathcs.emory.edu>
Date: Thu, 3 Nov 1994 21:57:55 GMT
Lines: 17
Xref: biosci bionet.software.gcg:807 bionet.software:9921 bionet.molbio.embldatabank:389

In <398sqb$79b@emory.mathcs.emory.edu> bcresas@bimcore.emory.edu (Scott Sammons) writes:

>Has anyone successfully reformatted the ECD data into GCG formatted databases.
>The program embltogcg core dumps when I try it with one of the ECD .dat
>files.

You might try my program "embl2nbrf", which reformats EMBL-formatted data
into NBRF (=PIR) format. GCG handles this, provided you change the format
identifier in the .header file from GCG to NBRF.
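The idea behind such a conversion can be sketched in a few lines of Python. This is not embl2nbrf itself, only an illustration assuming a single EMBL entry with ID, DE and SQ lines: it writes the NBRF/PIR layout of a ">DL;NAME" line (DL marking linear DNA), a one-line description, and the sequence terminated by "*". Features and references are ignored.

```python
def embl_to_nbrf(entry: str) -> str:
    """Convert a single EMBL flat-file entry to NBRF/PIR format (sketch)."""
    name, desc, seq_parts, in_seq = "", [], [], False
    for line in entry.splitlines():
        code = line[:2]
        if code == "ID":
            name = line[5:].split()[0].rstrip(";")   # entry name is the first token
        elif code == "DE":
            desc.append(line[5:].strip())
        elif code == "SQ":
            in_seq = True                            # sequence lines follow
        elif code == "//":
            break
        elif in_seq:
            # keep letters only: drops block spaces and position numbers
            seq_parts.append("".join(c for c in line if c.isalpha()))
    seq = "".join(seq_parts).upper()
    body = "\n".join(seq[i:i + 60] for i in range(0, len(seq), 60))
    return f">DL;{name}\n{' '.join(desc)}\n{body}*\n"
```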

You can find the program at "ftp.caos.kun.nl", via anonymous FTP, in the
directory "pub/molbio/embl2nbrf".

Best regards,

Jack Leunissen
CAOS/CAMM Center

From owner-embldatabank@net.bio.net Sun Nov 06 22:00:00 1994
Newsgroups: bionet.molbio.embldatabank
Path: biosci!rutgers!gatech!swrinde!pipex!uunet!utcsri!utnut!nott!cunews!freenet.carleton.ca!FreeNet.Carleton.CA!af910
From: af910@FreeNet.Carleton.CA (Tom Trottier)
Subject: Is there anyone there?
Message-ID: <Cywwt2.AI0@freenet.carleton.ca>
Sender: news@freenet.carleton.ca (Usenet News Admin)
Reply-To: af910@FreeNet.Carleton.CA (Tom Trottier)
Organization: The National Capital FreeNet
Date: Mon, 7 Nov 1994 19:29:26 GMT
Lines: 7


Tom
--
\ \~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
 \Tom Trottier  613 594-4829  fax:594-8944  af910@freenet.carleton.ca\
  \Information Animation, 199 Holmwood Ave Ottawa ON Canada K1S 2P3   \
   \___________________________________________________________________\

From owner-embldatabank@net.bio.net Wed Nov 09 22:00:00 1994
Newsgroups: bionet.molbio.embldatabank
Path: biosci!agate!sunsite.doc.ic.ac.uk!hgmp.mrc.ac.uk!ebi.ac.uk!jecop
From: jecop@ebi.ac.uk (Jeroen Coppieters)
Subject: Re: Is there anyone there?
Message-ID: <Cz1pn1.2nI@ebi.ac.uk>
Sender: news@ebi.ac.uk (Mr news)
Organization: European Bioinformatics Institute
X-Newsreader: TIN [version 1.2 PL2]
References: <Cywwt2.AI0@freenet.carleton.ca>
Date: Thu, 10 Nov 1994 09:42:36 GMT
Lines: 30

Tom Trottier (af910@FreeNet.Carleton.CA) wrote:
:Is there anyone there?

: Tom
: --
: \ \~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
:  \Tom Trottier  613 594-4829  fax:594-8944  af910@freenet.carleton.ca\
:   \Information Animation, 199 Holmwood Ave Ottawa ON Canada K1S 2P3   \
:    \___________________________________________________________________\
where is there?
I know I'm here.

Jeroen
--
======================================================================

         . O .                               Jeroen Coppieters
     . O O o   O .                            Software Support
   O O O O *o    O O               Jeroen.Coppieters@ebi.ac.uk
  O O O O(   *o  )O O                         ++44 1223 494422
  )O O O O   o*  O O(                        
  O O O O( o*    )O O
  )O O O O  *o   O O(                      EMBL Outstation EBI
  O O O O(   *o  )O O      (European Bioinformatics Institute)
  )O O O O   o*  O O(                             Hinxton Hall
    O O O( o*   )O('                                   Hinxton
     ` O(   *o O  '                         Cambridge CB10 1RQ
         ` O '                                              UK
http://www.ebi.ac.uk
======================================================================

From owner-embldatabank@net.bio.net Wed Nov 09 22:00:00 1994
Path: biosci!bloom-beacon.mit.edu!spool.mu.edu!howland.reston.ans.net!pipex!uunet!newsfeed.ACO.net!info.univie.ac.at!coil!gl
From: gl@coil.mdy.univie.ac.at (Gerald Loeffler)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: Is there anyone there?
Date: 10 Nov 1994 12:49:40 GMT
Organization: Inst. for Theoretical Chemistry / Univ. of Vienna
Lines: 16
Sender: gl@coil (Gerald Loeffler)
Distribution: world
Message-ID: <39t4t4$6e6@infosrv.edvz.univie.ac.at>
References: <Cywwt2.AI0@freenet.carleton.ca>
NNTP-Posting-Host: coil.mdy.univie.ac.at

yes
-- 
Gerald Loeffler
PhD student in Theoretical Biochemistry

Email: gl@mdy.univie.ac.at
Phone: +43 1 40480 612
Fax:   +43 1 4028525
Mail:  University of Vienna
       Institute for Theoretical Chemistry
       Theoretical Biochemistry Group
       Waehringerstrasse 17/Parterre
       A-1090 Wien, Austria

Chauvinistic Statement: "Austria Erit In Orbe Ultima"


From owner-embldatabank@net.bio.net Fri Nov 11 22:00:00 1994
Newsgroups: bionet.molbio.yeast,bionet.molbio.embldatabank
Path: biosci!agate!sunsite.doc.ic.ac.uk!daresbury!bioftp.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Subject: How much yeast is known? [Re: Progress Report YEAST Genome Sequencing]
Message-ID: <1994Nov12.071107.19651@comp.bioz.unibas.ch>
Organization: EMBnet Switzerland [Basel]
X-Newsreader: TIN [version 1.2 PL2]
References: <39dl63$kiu@rc1.vub.ac.be>
Date: Sat, 12 Nov 1994 07:11:07 GMT
Lines: 47
Xref: biosci bionet.molbio.yeast:1888 bionet.molbio.embldatabank:393

Colleagues, 
a colleague (not working at the Biozentrum, but within our EMBnet
services) asked me what fraction of the yeast genome has been sequenced,
or rather, what fraction is available when a BLAST, FASTA or MPsrch
search is launched. I searched the archive and found the data recently
posted to the yeast bulletin board. To be sure I give a correct answer
to the second question, I would like to ask the following questions
about the included posting. 

Mordant Philippe (pmordant@rc1.vub.ac.be) wrote:

: *********************************************************************
: *                                                                   *
: *      IN EMBL + GENBANK + MIPS + SDB ON NOVEMBER 01, 1994          *
: * PROGRESS REPORT FOR THE SYSTEMATIC SEQUENCING OF THE YEAST GENOME *
: *                                                                   *
: *********************************************************************

Does this include MIPS sequences which are kept confidential, i.e. 
not available to researchers who are not MIPS customers? 

[...]
: ---------------------------------------------------------------------
:                           submitted estimated            new in
:                         up to today    length        October 94
:                                (kb)      (kb)     (%)      (kb)
: ---------------------------------------------------------------------
[...]
: ----------------------------------------------------------------
: TOTAL YEAST GENOME             5687     12400     46%       669
: ----------------------------------------------------------------
[...]

As a considerable amount of yeast sequence has already been published 
in the sequence databases (with functions etc. assigned), what would be 
a more customer-oriented estimate of the fraction of sequence available 
to the community in the EMBL database? 

Regards
Reinhard 


-- 
 R.Doelz         Klingelbergstr.70| Tel. x41 61 267 2247  Fax x41 61 267 2078|
 Biocomputing        CH 4056 Basel| electronic Mail    doelz@ubaclu.unibas.ch|
 Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
 EMBnet Switzerland: http://beta.embnet.unibas.ch/   info@ch.embnet.org

From owner-embldatabank@net.bio.net Fri Nov 11 22:00:00 1994
Path: biosci!agate!news.Stanford.EDU!fafner.Stanford.EDU!cherry
From: cherry@fafner.Stanford.EDU (Mike Cherry)
Newsgroups: bionet.molbio.yeast,bionet.molbio.embldatabank
Subject: Re: How much yeast is known? [Re: Progress Report YEAST Genome Sequencing]
Date: 12 Nov 1994 17:34:20 GMT
Organization: Stanford University Genetics Department
Lines: 48
Message-ID: <3a2uas$42r@nntp.Stanford.EDU>
References: <39dl63$kiu@rc1.vub.ac.be> <1994Nov12.071107.19651@comp.bioz.unibas.ch>
NNTP-Posting-Host: fafner.stanford.edu
Xref: biosci bionet.molbio.yeast:1889 bionet.molbio.embldatabank:394

In article <1994Nov12.071107.19651@comp.bioz.unibas.ch>,
Reinhard Doelz <doelz@comp.bioz.unibas.ch> wrote:
>Mordant Philippe (pmordant@rc1.vub.ac.be) wrote:
>
>: *      IN EMBL + GENBANK + MIPS + SDB ON NOVEMBER 01, 1994          *
>: * PROGRESS REPORT FOR THE SYSTEMATIC SEQUENCING OF THE YEAST GENOME *
>
>Does this include MIPS sequences which are kept confidential, i.e. 
>not being available to the non-MIPS-customer research community? 

I believe the answer is yes. The numbers posted by Dr. Philippe appear
to include non-public sequences held at MIPS or at one of the other
sequencing centers around the world. These would be regions that are
considered almost done and now being annotated and verified, but not
yet submitted to the public databases.

Note that SGD, which might be what is referred to as SDB above, only
contains publicly available sequences.  That is, SGD does not contain
any sequences that are not already in GenBank/EMBL.

>:                           submitted     estimated	new in
>:                            up today     length		october 94
>:                                (kb)     (kb)	(%)	(kb)
>: TOTAL YEAST GENOME             5687   12400    46%    669
>
>As there is a considerable amount of yeast sequences already published 
>in the sequence database (with functions etc. assigned), what would be 
>a more customer-oriented guess on what fraction of sequences is available 
>to the community in the EMBL database? 

We estimate that 52% of the S. cerevisiae genome is present in
GenBank/EMBL.  This estimate includes some analysis that we have done
on yeast sequences in GenBank: we built a non-redundant set of
sequences, or a consensus sequence contig, out of the GenBank
sequences. When the consensus sequences are combined with the genomic
sequencing results we obtain a total of 6.4Mbp. The amount of
publicly available sequence from the genome sequencing projects is
expected to increase by 4Mbp in 1995, so we advise you to search
GenBank/EMBL regularly in case your region of interest has been
made public.
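The percentages in this thread can be checked against the 12,400 kb estimated genome length given in the progress report. A quick Python check, rounding to a whole percent, reproduces the report's 46% for 5,687 kb and shows that 6.4 Mbp is consistent with the 52% estimate above; whether that estimate used exactly this denominator is an assumption.

```python
GENOME_KB = 12_400   # estimated S. cerevisiae genome length from the progress report

def percent_of_genome(kb: float) -> int:
    """Share of the estimated genome, rounded to a whole percent."""
    return round(100 * kb / GENOME_KB)

print(percent_of_genome(5_687))   # 46  (submitted, per the report)
print(percent_of_genome(6_400))   # 52  (public GenBank/EMBL consensus, 6.4 Mbp)
```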

Mike

J. Michael Cherry                       Internet: cherry@genome.stanford.edu
Head, Computing                         Stanford DNA Sequence & Tech. Center
Project Manager                         Saccharomyces Genome Database
Stanford University School of Medicine  Stanford, CA 94305-5120
Voice: 415-723-7541                     FAX: 415-723-7016

From owner-embldatabank@net.bio.net Sun Nov 13 22:00:00 1994
Path: biosci!bloom-beacon.mit.edu!gatech!swrinde!pipex!sunic!news.funet.fi!hydra.Helsinki.FI!news.helsinki.fi!kruuna!zheng
From: zheng@cc.Helsinki.FI (Huanquan Zheng)
Newsgroups: bionet.molbio.embldatabank,bionet.molbio.genbank
Subject: pET22b sequence needed
Date: 14 Nov 1994 13:47:51 GMT
Organization: University of Helsinki
Lines: 8
Message-ID: <3a7pq7$jpe@oravannahka.Helsinki.FI>
NNTP-Posting-Host: kruuna.helsinki.fi
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Newsreader: TIN [version 1.2 PL0]
Xref: biosci bionet.molbio.embldatabank:395 bionet.molbio.genbank:1822

I cannot find pET22b in either the EMBL or the GenBank database. 
Any hints? 

Thanks!


--
	Huanquan Zheng

From owner-embldatabank@net.bio.net Sun Nov 13 22:00:00 1994
Newsgroups: bionet.molbio.yeast,bionet.molbio.embldatabank
Path: biosci!bloom-beacon.mit.edu!gatech!swrinde!pipex!sunsite.doc.ic.ac.uk!hgmp.mrc.ac.uk!ebi.ac.uk!tflores
From: tflores@ebi.ac.uk (Tom Flores)
Subject: Re: How much yeast is known? [Re: Progress Report YEAST Genome Sequencing]
Message-ID: <Cz99Hw.CBF@ebi.ac.uk>
Lines: 43
Sender: news@ebi.ac.uk (Mr news)
Reply-To: tflores@ebi.ac.uk (Tom Flores)
Organization: European Bioinformatics Institute (EMBL) - UK
X-Newsreader: mxrn 6.18-16
References: <39dl63$kiu@rc1.vub.ac.be> <1994Nov12.071107.19651@comp.bioz.unibas.ch>
Date: Mon, 14 Nov 1994 11:34:44 GMT
Xref: biosci bionet.molbio.yeast:1895 bionet.molbio.embldatabank:396


In article <1994Nov12.071107.19651@comp.bioz.unibas.ch>, Reinhard Doelz writes:
>
>As there is a considerable amount of yeast sequences already published 
>in the sequence database (with functions etc. assigned), what would be 
>a more customer-oriented guess on what fraction of sequences is available 
>to the community in the EMBL database? 
>

and

In article <3a2uas$42r@nntp.Stanford.EDU>, Mike Cherry writes:
>
>We estimate that 52% of the S. cerevisiae genome is present in
>GenBank/EMBL.  This estimate includes some analysis that we have done
>on yeast sequences in GenBank. We built a non-redundant set of
>sequences, or a consensus sequence contig, out of the GenBank
>sequences. When the consensus sequences are combined with the genomic
>sequencing results we obtain a number of 6.4Mbp. The amount of
>publicly available sequences expected from the genomic sequencing
>projects should increase by 4Mbp in 1995. Thus it is advised that you
>regularly search the GenBank/EMBL on the chance that your region of
>interest has been made public.
>

I recently carried out a survey for the EC in which the amount
of database redundancy was estimated. For release 39 of EMBL the
redundancy for Saccharomyces cerevisiae in FUN.DAT was estimated
at 34.8%. If we apply this value to release 40 (12th September
1994) of the EMBL database, which holds 9.8Mb of S. cerevisiae
sequence, we also get a figure of 6.4Mb, in agreement with the
figure above.
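The arithmetic behind this cross-check, as a small Python sketch: removing a 34.8% redundant fraction from 9.8 Mb leaves 9.8 × (1 - 0.348), about 6.39 Mb, which rounds to the 6.4 Mb quoted above. The function name is illustrative only.

```python
def nonredundant_mb(total_mb: float, redundancy: float) -> float:
    """Sequence left after removing the redundant fraction."""
    return total_mb * (1 - redundancy)

print(round(nonredundant_mb(9.8, 0.348), 1))   # 6.4
```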

Tom

-- 
================================================
Tomas Flores PhD          Tel:+44-(0)1223 494414
The EBI Data Library      Fax:+44-(0)1223 494400
Hinxton Hall              Email:flores@ebi.ac.uk
Hinxton
Cambridge CB10 1RQ
UK                         "FLAMES >> /dev/null"
================================================

From owner-embldatabank@net.bio.net Tue Nov 15 22:00:00 1994
Path: biosci!bloom-beacon.mit.edu!grapevine.lcs.mit.edu!uhog.mit.edu!sgiblab!spool.mu.edu!howland.reston.ans.net!usc!nic-nac.CSU.net!newshub.sdsu.edu!lif_sci_n3-mac8.sdsu.edu!user
From: vnewman@lifsci.sdsu.edu (Vicky Newman)
Newsgroups: bionet.molbio.embldatabank,bionet.molbio.genbank
Subject: Re: pET22b sequence needed
Date: 15 Nov 1994 23:25:07 GMT
Organization: SDSU Biology
Lines: 17
Message-ID: <vnewman-1511941527530001@lif_sci_n3-mac8.sdsu.edu>
References: <3a7pq7$jpe@oravannahka.Helsinki.FI>
NNTP-Posting-Host: lif_sci_n3-mac8.sdsu.edu
Xref: biosci bionet.molbio.embldatabank:397 bionet.molbio.genbank:1824

In article <3a7pq7$jpe@oravannahka.Helsinki.FI>, zheng@cc.Helsinki.FI
(Huanquan Zheng) wrote:

> I can not find the pET22b in both databases EMBL and GenBank, 
> any hint? 
> 
> Thanks!
> 
> 

 Zheng,
   Novagen is the company that makes that plasmid, and they sell
a computer disk with the sequences on it. The disk costs $15.
I do not know of any other way to get the sequences unless you
know someone who already has it.

Vicky

From owner-embldatabank@net.bio.net Wed Nov 16 22:00:00 1994
Newsgroups: bionet.molbio.embldatabank
Path: biosci!ns1.faseb.org!darwin.sura.net!howland.reston.ans.net!pipex!uunet!newsgate.watson.ibm.com!hawnews.watson.ibm.com!puffin!dflash
From: dflash@watson.ibm.com (The dFLASH Project)
Subject: dFLASH server for latest GenBank/PIR/SwissProt (Release 1.1.0)
Sender: news@hawnews.watson.ibm.com (NNTP News Poster)
Message-ID: <CzF9Io.6yD7@hawnews.watson.ibm.com>
Date: Thu, 17 Nov 1994 17:20:48 GMT
Disclaimer: This posting represents the poster's views, not necessarily those of IBM.
Nntp-Posting-Host: puffin.watson.ibm.com
Organization: IBM T.J. Watson Research
Lines: 476


The dFLASH Group wishes to announce release 1.1.0 of the dFLASH electronic
mail server.  Beginning with this release of the server, we will be supporting
the latest release of the GENBANK, PIR and SWISSPROT databases.

In particular, users  can now carry out searches in 
        GENBANK    Release 85 (September 30, 1994)                             
        PIR        Release 42 (September 30, 1994) --> DEFAULT Database <--    
        SWISSPROT  Release 30 (October   30, 1994)                             

Full bibliographic references can optionally be included with the computed
alignments, for all three databases.

Notice that a number of necessary changes and additions have been incorporated
in the "query language".  For example, since we now support a larger set of
databases, "target protein" is no longer a valid directive! The appended help
file describes the changes and available functions in detail.

NEW FEATURES:
   o    the reported results can now be sorted using a sorting key specified by
   	the user via the "query language"

   o    a smart email filter has been implemented:  various specification
        errors are now caught and corrected automatically, and the user is
        notified of every action taken.

It is our intention to update the server with the latest release of each of the
above databases within two weeks of its becoming available.

The server is accessible through the Internet and is now operating 24 hours a
day, 7 days a week and can be accessed both directly and through "Grail" of the
Oak Ridge National Lab.

Sincerely,

The dFLASH Group





------------------------------>  CUT HERE <-----------------------------------

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! The dFLASH server now supports the GenBank, PIR and SWISSPROT databases.  !!
!!      The supported releases are:                                          !!
!!      GENBANK    Release 85 (September 30, 1994)                           !!
!!      PIR        Release 42 (September 30, 1994) --> DEFAULT Database <--  !!
!!      SWISSPROT  Release 30 (October   30, 1994)                           !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                         N O T A     B E N E                               !!
!! The dFLASH server is still under development.  If some of the answers do  !!
!! not make sense it is very likely that this is due to a bug in our code.   !!
!! Please, email bug reports and comments to dflash@watson.ibm.com with      !!
!! subject line "bug" or "comments".                                         !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Dear User, welcome to Release 1.1.0 of the dFLASH server!

    The dFLASH server is a "homologous sequence retrieval" program for PROTEIN
and DNA sequences.  

    dFLASH is a parallel system running on an IBM SP/x architecture. Intra-node
communication, evidence integration and alignment are performed in parallel. 
The system has been implemented using IBM's Concert/C language for distributed
programming. The server is available 24 hours a day, 7 days a week and can be
accessed both directly and through "Grail" of the Oak Ridge National Lab.

    Incremental changes and improvements made to the server will be reflected
in the "Message of the day" at the beginning of this help file:  we recommend
that users periodically issue a `send help' request for up to date information
on the server.

    For the moment, we can process requests originating from email addresses of 
the form 
                 user@[machine.][subdomain.]institution.type
                        or 
                 user%machine@[machine.][subdomain.]institution.type
                        or 
                 "string::user"@[machine.][subdomain.]institution.type

We plan to further expand the accepted formats, depending on demand.
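The three accepted address templates can be written as a regular expression. The Python sketch below is one reading of them (two to four dot-separated labels after the "@", the last one alphabetic; the allowed local-part characters are chosen for illustration); it is not the server's actual filter.

```python
import re

# [machine.][subdomain.]institution.type -> two to four dot-separated labels
DOMAIN = r"(?:[A-Za-z0-9-]+\.){1,3}[A-Za-z]+"
LOCAL = (
    r"[A-Za-z0-9._-]+"                   # user
    r"|[A-Za-z0-9._-]+%[A-Za-z0-9.-]+"   # user%machine
    r'|"[^"]+::[^"]+"'                   # "string::user"
)
ADDRESS = re.compile(rf"(?:{LOCAL})@{DOMAIN}")

def address_accepted(addr: str) -> bool:
    """True if addr matches one of the three documented templates."""
    return ADDRESS.fullmatch(addr) is not None
```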


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

HOW TO USE THE SERVER: You can use the dFLASH facilities by sending an email 
---------------------- message with the appropriate syntax to the address 
		       "dflash@watson.ibm.com" (without the quotes).
			
SUBJECT LINE: It is important that the "Subject" line of your message contain 
------------- one of: { dflash, dFlash, dFLASH, DFLASH }.  Messages whose
              subject line does NOT conform to this rule **WILL BE LEFT
	      UNPROCESSED**.  The reason for this restriction is that we want
	      to be able to automatically distinguish between messages that are
	      addressed to the server and those that are meant for one of the
	      group members.
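A minimal Python sketch of preparing such a request with the standard library's email package. It only builds the message and does not send it; the helper names are hypothetical, and "contain" is read here as a case-sensitive substring match against the four listed spellings.

```python
from email.message import EmailMessage

ACCEPTED_SUBJECTS = ("dflash", "dFlash", "dFLASH", "DFLASH")

def subject_ok(subject: str) -> bool:
    """True if the subject contains one of the four accepted spellings."""
    return any(word in subject for word in ACCEPTED_SUBJECTS)

def make_request(sender: str, body: str, subject: str = "dFLASH") -> EmailMessage:
    """Build (but do not send) a request addressed to the server."""
    if not subject_ok(subject):
        raise ValueError(f"subject {subject!r} would be left unprocessed")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "dflash@watson.ibm.com"
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```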

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


MESSAGE FORMAT: The typical message-body of an email request looks as follows
---------------

     BLOSUM 62                                  (optional  | DIRECTIVE)
     VERBOSE  10 20                             (optional  | DIRECTIVE)
     SEQUENCES  100                             (optional  | DIRECTIVE)
     ALIGNMENTS 50                              (optional  | DIRECTIVE)
     THRESHOLD  30                              (optional  | DIRECTIVE)
     KEY XMATCH					(optional  | DIRECTIVE)
     SOURCE PROTEIN				(optional  | DIRECTIVE)
     TARGET SP                                  (optional  | DIRECTIVE)
     BEGIN                                      (mandatory | DIRECTIVE)
     >A_ONE_LINE_TEST_SEQ_LABEL                 (mandatory -- notice the '>' )
     a_sequence_of_{amino_acids,nucleic_acids,spaces,tabs}
     1                                          (mandatory terminator)

    The PAM/BLOSUM, VERBOSE, SEQUENCES, ALIGNMENTS, THRESHOLD, KEY, SOURCE and
TARGET directives can appear in any order but they *must* precede the BEGIN
directive. The BEGIN line must be followed by the LABEL line which in turn
should be followed by the test sequence.
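
    The ordering rules above fit in a small helper. It writes the directives first, then BEGIN, then the '>' label, then the sequence, and finally the terminating '1'. This is a hypothetical Python sketch and not part of the dFLASH distribution. The 60-character line width is an arbitrary choice.

```python
def build_request(directives, label, sequence, width=60):
    """Assemble a dFLASH message body (hypothetical helper).

    directives: (keyword, value) pairs such as ("blosum", "62"); they are
                written before BEGIN, in the order given.
    label:      one-line description; a leading '>' is added if missing.
    sequence:   residues, wrapped at `width` per line (whitespace is ignored
                by the server, so the wrapping is cosmetic).
    """
    lines = [f"{key} {value}" for key, value in directives]
    lines.append("begin")
    lines.append(label if label.startswith(">") else "> " + label)
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    lines.append("1")  # mandatory terminator
    return "\n".join(lines) + "\n"
```

Keywords should be passed all in lower case or all in upper case, as explained in the NOTA BENE below.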

    The test sequence should contain at least 18 (protein) or 54 (DNA) and at
most 1500 amino acid or nucleotide characters.  It may, however, contain ANY
NUMBER of carriage-return, tab and space characters; these are, of course, not
counted when computing the length of the test sequence.  Neither the label nor
the test sequence is case-sensitive.  If the test sequence is longer than 1500
characters, the e-mail filter will truncate it to the first 1500 characters,
send a note to that effect to the originator of the query, and then submit the
truncated sequence to the search engine.
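
    The length rules can be mimicked as follows. This is a sketch only. The help text does not say how the filter reacts to a sequence that is too short, so raising an error in that case is an assumption. The function name is hypothetical.

```python
MIN_LENGTH = {"protein": 18, "dna": 54}
MAX_LENGTH = 1500


def normalize_sequence(raw, source="protein"):
    """Apply the documented length rules to a test sequence (a sketch).

    Spaces, tabs and newlines are dropped and do not count towards the
    length; case is ignored, so everything is upper-cased here.
    Returns (sequence, notes), where notes lists any corrections made.
    """
    seq = "".join(raw.split()).upper()
    notes = []
    if len(seq) < MIN_LENGTH[source]:
        # Assumption: a too-short sequence is rejected outright.
        raise ValueError(f"sequence has {len(seq)} residues; "
                         f"at least {MIN_LENGTH[source]} are required")
    if len(seq) > MAX_LENGTH:
        notes.append(f"truncated from {len(seq)} to {MAX_LENGTH} characters")
        seq = seq[:MAX_LENGTH]
    return seq, notes
```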

NOTA BENE:  The words appearing on the lines marked DIRECTIVE above can be in 
----------  lower case or upper case; in other words, you can have pam or PAM, 
	    threshold or THRESHOLD, alignments or ALIGNMENTS, etc.  However,
	    something like ThReShOlD will not work.

    The directive pertaining to the scoring matrix allows the user to specify
the matrix to be used for computing the alignment scores.  You can use either
the word PAM followed by a space and the desired distance, or the word BLOSUM
followed by a space and the desired distance.  Examples:  PAM 250, BLOSUM 62.
If no matrix directive is included in the message, PAM 250 is used as the
default.  Depending on the value of the TARGET directive (see below), the
matrix directive, if present, may be ignored.

    The VERBOSE line allows the sender to also retrieve the data about authors,
dates, entries, superfamilies etc. that are contained in the original PIR,
SwissProt and GenBank databases.  This directive accepts one OR two arguments;
for example:
                verbose         15      25
means "send me the text data for the sequences occupying positions 15 through 25
in the final ranking."  On the other hand,
                verbose         15
means "send me the text data for the sequences occupying the first 15 positions
in the final ranking."  If no verbose line appears, no citation data is sent.
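
    The two forms of the VERBOSE line map to ranking positions as shown below. The function name is hypothetical.

```python
def verbose_positions(args):
    """Return the 1-based ranking positions covered by a VERBOSE line.

    'verbose 15 25' -> positions 15..25; 'verbose 15' -> positions 1..15.
    """
    if len(args) == 1:
        first, last = 1, args[0]
    elif len(args) == 2:
        first, last = args
    else:
        raise ValueError("VERBOSE takes one or two arguments")
    return range(first, last + 1)
```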

    The SEQUENCES line allows one to restrict the reported sequences to the
given number.  This directive controls the number of entries in the ``short
list'' of recovered database sequences only.  If no SEQUENCES line is given,
the server code will set it to an appropriate default value (100).

    The ALIGNMENTS line allows one to restrict the reported alignments to the
given number.  If no ALIGNMENTS line is given, the server code will set it to
an appropriate default value (100).  The ALIGNMENTS value cannot exceed 5000.
Values larger than 5000 are reduced to 5000.

    The THRESHOLD line restricts the reported sequences (and thus alignments)
to those whose Score exceeds the given THRESHOLD value.  If no THRESHOLD line
is given, the server code will set it to a default value: 50 for DNA
sequences and 80 for protein sequences.  There is also a *hard* threshold of
40 for DNA and 30 for PROTEIN sequences; if the user-requested value is
smaller than the hard threshold, it will be raised to the hard threshold.
NOTA BENE:  (1) if the THRESHOLD value is too small, you run the risk of
----------  overwhelming your mailer program, since you are likely to
            receive a very big file as a reply from the server.
	    (2) if the THRESHOLD is too high, the list of recovered entries
	    will be empty or very short; you should decrease the threshold's
	    value and resubmit your query.
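
    The defaults and limits for SEQUENCES, ALIGNMENTS and THRESHOLD can be summarized in one function. This is a hypothetical sketch that uses only the numbers given above. It does not model the filter's cross-checks between directives.

```python
DEFAULT_THRESHOLD = {"protein": 80, "dna": 50}
HARD_THRESHOLD = {"protein": 30, "dna": 40}
MAX_ALIGNMENTS = 5000


def effective_limits(source="protein", sequences=None, alignments=None,
                     threshold=None):
    """Apply the documented defaults and limits to the numeric directives."""
    sequences = 100 if sequences is None else sequences
    alignments = 100 if alignments is None else min(alignments, MAX_ALIGNMENTS)
    if threshold is None:
        threshold = DEFAULT_THRESHOLD[source]
    # Requested values below the hard threshold are raised to it.
    threshold = max(threshold, HARD_THRESHOLD[source])
    return {"sequences": sequences, "alignments": alignments,
            "threshold": threshold}
```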

    The KEY line allows the user to specify the key to be used when sorting the
results (retrieved sequences) of a search request.  The keyword KEY can be
followed by one of { SCORE,score,  LENGTH,length,  PEAK,peak,  GAP,gap,
MATCH,match,  XMATCH,xmatch }.  In every case the retrieved sequences are
sorted in decreasing order of the chosen quantity:

    SCORE,score     total computed score
    LENGTH,length   length of the retrieved sequence
    PEAK,peak       maximum score over *any* 18-residue (protein) or
                    54-residue (DNA) window of the recovered match
    GAP,gap         largest gap inserted to obtain the best alignment with
                    the query strand
    MATCH,match     total (=conservative+exact) number of matches with the
                    query strand
    XMATCH,xmatch   number of exact matches with the query strand

If no KEY directive is specified, the retrieved sequences are sorted in order
of decreasing "score".

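    The KEY semantics amount to a descending sort on one field. In the sketch below, the Hit record and its field names are hypothetical. Ties keep their input order because Python's sort is stable; how the server breaks ties is not documented.

```python
from dataclasses import dataclass

SORT_KEYS = {"score", "length", "peak", "gap", "match", "xmatch"}


@dataclass
class Hit:
    """One retrieved sequence; the field names are hypothetical."""
    label: str
    score: int
    length: int
    peak: int
    gap: int     # largest gap inserted in the best alignment
    match: int   # exact + conservative matches
    xmatch: int  # exact matches only


def sort_hits(hits, key="score"):
    """Sort hits in decreasing order of the KEY field (SCORE by default)."""
    field = key.lower()
    # Like the directives, the value must be all lower or all upper case.
    if key not in (field, field.upper()) or field not in SORT_KEYS:
        raise ValueError(f"unsupported KEY {key!r}")
    return sorted(hits, key=lambda hit: getattr(hit, field), reverse=True)
```
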
    The SOURCE line allows the user to specify the type of the query strand as
being a { PROTEIN,protein,    DNA,dna } sequence.  By setting  SOURCE to one of
{PROTEIN,protein} the user indicates that the query strand is a sequence of
amino acids.  By setting SOURCE to one of {DNA,dna}  the user indicates that the
query strand is a sequence of nucleotides. 

    The TARGET line allows the user to specify the target database as one of
{ PIR,pir,   SP,sp,   GB,gb }, and thus to control the database in which the
search will be carried out: {PIR,pir} selects the PIR database, {SP,sp} the
SWISSPROT database, and {GB,gb} the GenBank database.  Requests for searches in
unsupported databases will be *IGNORED* by the server, which will send a
complaint message back to the originator of the request.

If *only* SOURCE is specified, then TARGET will be set automatically: if
SOURCE is set to one of { protein, PROTEIN }, the search will be carried out in
the "PIR" database, whereas if SOURCE is set to one of { dna, DNA }, the search
will take place in the "GB" database.  If *neither* SOURCE *nor* TARGET is
given, the server will assume it is dealing with an amino acid strand and carry
out the search against the "PIR" database.
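
    The SOURCE/TARGET defaults can be written as a small decision function. This is a sketch with a hypothetical name, and it expects lower-case values for brevity. The help text does not cover the case where TARGET is given without SOURCE, so the inference made there is an assumption.

```python
SUPPORTED_TARGETS = {"pir", "sp", "gb"}
DEFAULT_TARGET = {"protein": "pir", "dna": "gb"}


def resolve_search(source=None, target=None):
    """Return (source, target) after applying the documented defaults."""
    if target is not None and target not in SUPPORTED_TARGETS:
        # The server ignores such requests and sends a complaint.
        raise ValueError(f"unsupported database {target!r}")
    if source is None and target is None:
        return "protein", "pir"
    if target is None:
        return source, DEFAULT_TARGET[source]
    if source is None:
        # Assumption: infer the query type from the chosen database.
        source = "dna" if target == "gb" else "protein"
    return source, target
```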

    The LABEL line allows the user to enter mnemonic information pertaining to
the test sequence, the time of day, etc.  The information on this line will
be reproduced in the Subject line of the reply message.  Notice that the
LABEL line *must* begin with the character '>'.

    All submitted messages must be terminated by the number '1'.  This
number can follow the last character of the test sequence or be on a line by
itself.


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

A 'SMART' FILTER:   The email filter that allows for the above message format 
-----------------   has been improved in this release.  In particular, the 
filter is 'smart' enough to catch inconsistencies in the user's message. The 
filter will correct them and send a note to the originator of the message. 
*Unlike* older releases of the filter, this version will submit the corrected
message to the search engine.  The filter will also send one email note to the
originator of the query for *every* change it has carried out; the note(s)
will contain information about the actions that the filter has taken.

For example, if the user's note contains the following lines

	sequences 20
	alignments 50 
	verbose 10 30

the filter will reset the value of 'alignments' to 20 and the value of
'verbose_to' (the second VERBOSE argument) to 20, and then submit the
corrected query to the search engine.  Since two changes took place, the
filter will also send two email notes to the originator of the query
detailing the actions it has taken.
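
    The correction in this example can be reproduced by the sketch below. Its rule, capping ALIGNMENTS and the upper VERBOSE bound at SEQUENCES, is inferred from this single example. The real filter may check other inconsistencies as well, and the function name is hypothetical.

```python
def smart_filter(sequences, alignments, verbose=None):
    """Reconcile ALIGNMENTS and VERBOSE with SEQUENCES (inferred rule).

    verbose is None or a (from, to) pair.  Returns (alignments, verbose,
    notes), with one note per change, mirroring one email per change.
    """
    notes = []
    if alignments > sequences:
        notes.append(f"alignments reset from {alignments} to {sequences}")
        alignments = sequences
    if verbose is not None and verbose[1] > sequences:
        notes.append(f"verbose_to reset from {verbose[1]} to {sequences}")
        verbose = (verbose[0], sequences)
    return alignments, verbose, notes
```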

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


EXAMPLES:  Two example inputs follow
---------

Example 1: 
                pam 250
                sequences 50
                alignments 30
                threshold  100
		target pir
                begin
                > HBA_HUMAN STANDARD; PRT; 141 AA. P01922; HEMOGLOBIN ALPHA 
                VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFP
                TTKTYFPHFDLSHGSAQVKGHG     KKVADALTNA
                V A H V D D M PNALSALSDLHAHKLRVDPVNFK
                llshcllvtlaahlpaeftpavhasldkflasvstvltskyr
                1

            Note:  all amino acids from "VLSP" through "ltskyr" will be used
            in the search.  No more than the 50 top-scoring sequences will be
            reported in the short list, and the alignments for the top 30
            scoring sequences will be returned.  No reported sequence will have
            a score less than 100, and the reported sequences will be
	    sorted in order of decreasing score (the default KEY).  The test
	    sequence is a sequence of amino acids and will be searched
	    against the PIR database.

Example 2:
                BLOSUM 62
                KEY  XMATCH
	        BEGIN
                > Sequence sent to dflash on Fri May 20 13:40:17 EDT 1994
                VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFP
                TTKTYFPHFDLSHGSAQVKGHG     KKVADALTNA

                V A H V D D M PNALSALSDLHAHKLRVDPVNFK

                llshcllvtlaahlpaeftpavhasldkflasvstvltskyr
                1

            Note:  all amino acids from "VLSP" through "ltskyr" will be used
            in the search.  Parameters not given in the message are set to
            their default values:  the server will treat the test sequence
	    as a sequence of amino acids and will search against the
	    "PIR" database with a score threshold of 80.
	    The retrieved sequences will be reported in order of decreasing
	    number of exact matches with the query strand.

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

SCORING MATRICES:
-----------------

    You can use both PAM and BLOSUM scoring matrices for protein searches. These
can be requested via the optional { pam, PAM, blosum, BLOSUM } directive. The
currently supported distances are

for BLOSUM:  30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90, 100

for PAM:     10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150,
             160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280,
             290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410,
             420, 430, 440, 450, 460, 470, 480, 490, and 500.

For DNA searches, the PAM/BLOSUM declarations are ignored.
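
    A matrix directive can be checked against these lists before submission. This is a hypothetical Python sketch. It enforces the all-lower/all-upper keyword rule and falls back to the PAM 250 default when no line is given.

```python
BLOSUM_DISTANCES = {30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90, 100}
PAM_DISTANCES = set(range(10, 501, 10))  # 10, 20, ..., 500


def parse_matrix(line=None):
    """Parse 'PAM n' or 'BLOSUM n'; no line means the PAM 250 default."""
    if line is None:
        return "PAM", 250
    word, distance = line.split()
    family = word.upper()
    # Keywords must be entirely lower case or entirely upper case.
    if word not in (word.lower(), family) or family not in ("PAM", "BLOSUM"):
        raise ValueError(f"not a matrix directive: {word!r}")
    supported = PAM_DISTANCES if family == "PAM" else BLOSUM_DISTANCES
    if int(distance) not in supported:
        raise ValueError(f"{family} {distance} is not supported")
    return family, int(distance)
```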


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


NOTE ON ALIGNMENT:
------------------

    The server's alignment code implements the Smith-Waterman algorithm (dynamic
programming) to align each of the retrieved sequences with the test input. This
is *NOT* to be confused with the indexing method that we use to determine the
candidates to be aligned.
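
    For reference, the textbook Smith-Waterman recurrence computes the best local-alignment score as shown below. This is not the server's code. It uses a toy match/mismatch score and a linear gap penalty, whereas the server scores with PAM/BLOSUM matrices and its gap penalties are not documented here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```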

    The listing returned by the dFLASH server has the following form:

   .....
   ....

   Score Matrix: PAM250
   Max Reported Sequences:  1000
   Max Reported Alignments: 10
   Score Threshold  At: 65

    Id  Label:                                   Score  NRes  Ex% Tot% Sig  Pk
   ----------------------------------------------------------------------------
     1. HAHU hemoglobin alpha chain - human        655   141 100% 100% 100  89
     2. HACZ hemoglobin alpha chain - chimpanzee   655   141 100% 100% 100  89
     3. HACZP hemoglobin alpha chain - pygmy chi   655   141 100% 100% 100  89
     4. HAGO hemoglobin alpha chain - lowland go   654   141  99% 100%  99  89
     5. HAMQP hemoglobin alpha chain - hanuman l   653   141  97% 100%  99  89
     6. B27792 hemoglobin alpha-1 chain - orangu   649   141  97% 100%  99  89
     7. A25126 hemoglobin alpha-1 chain - Sumatr   649   141  97% 100%  99  89
    ...
    .....
    ..

The columns of this listing have the following meaning:

NRes:  the number of residues (amino acids) in the recovered match
Score: sequence similarity score of the recovered sequence based on the
       selected mutation matrix
Ex%:   percentage of *exact* matching residues
Tot%:  percentage of *total* (=exact+conservative) matching residues
Sig:   100 times the ratio between the actual computed score and the score
       obtained by matching the retrieved sub-segment with itself; the
       denominator is the maximum obtainable score for the sub-segment in
       question (all gaps removed).
Pk:    (peak) the maximum score value over *any* 18-residue (protein) or
       54-residue (DNA) window of the recovered match.
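
    The Sig and Pk columns can be sketched from their definitions. The per-position score list passed to peak() is a hypothetical input, since the help text does not say how the server splits a score into columns. Truncating Sig to an integer is also an assumption.

```python
def sig(score, self_score):
    """Sig: 100 * actual score / score of the sub-segment against itself.

    Assumption: the result is truncated to an integer.
    """
    return 100 * score // self_score


def peak(column_scores, source="protein"):
    """Pk: best total score over any 18 (protein) or 54 (DNA) residue window."""
    window = 18 if source == "protein" else 54
    if len(column_scores) <= window:
        return sum(column_scores)
    total = best = sum(column_scores[:window])
    for i in range(window, len(column_scores)):
        # Slide the window one position: add the new column, drop the old one.
        total += column_scores[i] - column_scores[i - window]
        best = max(best, total)
    return best
```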


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


TO OBTAIN HELP:
---------------
    You can obtain this message at any moment by sending a message with one of:
{ dflash, dFlash, dFLASH, DFLASH } in the "Subject" line and a body containing
one of { help, HELP, send help, SEND HELP }.


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


TO OBTAIN ON-LINE REPRINTS OF PAPERS
------------------------------------
    You can obtain reprints (in PostScript) of relevant papers by sending a
message with one of: { dflash, dFlash, dFLASH, DFLASH } in the "Subject" line
and a body containing 

one of {flashpaper, FLASHPAPER, send flashpaper, SEND FLASHPAPER }        
                                        ---> returns to the originator of the 
                                        request a copy of the FLASH paper
					that will appear in `CABIOS'

one of {dflashpaper, DFLASHPAPER, send dflashpaper, SEND DFLASHPAPER }        
                                        ---> returns to the originator of the 
                                        request a copy of a paper that contains
                                        a description of dFLASH that has 
					appeared in `IEEE Computational Science
					and Engineering'

one of {concertpaper, CONCERTPAPER, send concertpaper, SEND CONCERTPAPER } 
                                        ---> returns to the originator of the 
                                        request a copy of a high-level paper
                                        describing the CONCERT/C language

one of {bayespaper, BAYESPAPER, send bayespaper, SEND BAYESPAPER }
                                        ---> returns to the originator of the
                                        request a copy of a paper, to appear
                                        in `CVGIP-IU', describing a
                                        computer-vision application based on
                                        indexing principles similar to those
                                        of dFLASH

    Notice that there can only be *one* such request per message!  Also, do
not issue a new paper request until the previous request has returned all of
its PostScript files to you and you have removed them from your mailbox:  the
returned messages are rather big (between 1 and 4 Megabytes) and can easily
overflow the disk space set aside for mail messages on many systems.


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


Thank you for your interest in the dFLASH server. 

                                        Sincerely,

                                        The dFLASH Group


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

COMMENTS??  We will appreciate receiving your feedback, suggestions, comments,
----------  or bug reports; all of these can be sent to "dflash@watson.ibm.com".
	    Please make sure your "Subject" line contains the word "comments"
	    or "bug".

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

REFERENCES  If you make use of the dFLASH server, please reference 
----------

     A. Califano and I. Rigoutsos, "FLASH: A Fast Look-up Algorithm for String
     Homology."  In  CABIOS.  To appear.

     I. Rigoutsos and A. Califano, "Searching In Parallel for Similar Protein
     Strings."  In IEEE Computational Science and Engineering, June 1994.

If you wish to find out more, you can contact Isidore Rigoutsos and Andrea
Califano at {rigoutso,acal}@watson.ibm.com


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


For more information on the Concert/C language, please refer to

     J. Auerbach, D. Bacon, A. Goldberg, G. Goldszmidt, A. Gopal, M. Kennedy,
     A. Lowry, J. Russell, W. Silverman, R. Strom, D. Yellin, and S. Yemini,
     "High-level language support  for programming reliable distributed
     systems."  In Proceedings of the International Conference on Computer
     Languages, April 1992, Oakland, California.

or contact Jim Russell (jrussell@watson.ibm.com)

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


------------------------------>  CUT HERE <-----------------------------------


From owner-embldatabank@net.bio.net Fri Nov 18 22:00:00 1994
Path: biosci!biosci!not-for-mail
From: dflash@watson.ibm.com (The dFLASH Project)
Newsgroups: bionet.announce,bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: dFLASH server for latest GenBank/PIR/SwissProt (Release 1.1.0)
Date: 18 Nov 1994 22:25:42 -0800
Organization: IBM T.J. Watson Research
Lines: 473
Sender: kristoff@net.bio.net
Approved: bionews-moderator@net.bio.net
Distribution: world
Message-ID: <CzF8qG.7Brs@hawnews.watson.ibm.com>
NNTP-Posting-Host: net.bio.net
Disclaimer: This posting represents the poster's views, not necessarily those of IBM.
Xref: biosci bionet.announce:1583 bionet.molbio.genbank:1827 bionet.molbio.embldatabank:399

The dFLASH Group wishes to announce release 1.1.0 of the dFLASH electronic
mail server.  Beginning with this release, we will be supporting the latest
releases of the GENBANK, PIR and SWISSPROT databases.

In particular, users  can now carry out searches in
        GENBANK    Release 85 (September 30, 1994)
        PIR        Release 42 (September 30, 1994) --> DEFAULT Database <--
        SWISSPROT  Release 30 (October   30, 1994)

Full bibliographic references can optionally be included with the computed
alignments, for all three databases.

Notice that a number of necessary changes and additions have been incorporated
in the "query language".  For example, since we now support a larger set of
databases, "target protein" is no longer a valid directive!  The appended help
file describes the changes and the available functions in detail.

NEW FEATURES:
   o    the reported results can now be sorted using a sorting key specified by
   	the user via the "query language"

   o    a smart-email filter has been implemented:  various specification
        errors  are now caught and corrected automatically; notifications
        are sent to the user for all taken actions.

It is our intention to update the server with the latest release of each of the
above databases within two weeks of its becoming available.

The server is accessible through the Internet and is now operating 24 hours a
day, 7 days a week and can be accessed both directly and through "Grail" of the
Oak Ridge National Lab.

Sincerely,

The dFLASH Group




------------------------------>  CUT HERE <-----------------------------------

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! The dFLASH server now supports the GenBank, PIR and SWISSPROT databases.  !!
!!      The supported releases are:                                          !!
!!      GENBANK    Release 85 (September 30, 1994)                           !!
!!      PIR        Release 42 (September 30, 1994) --> DEFAULT Database <--  !!
!!      SWISSPROT  Release 30 (October   30, 1994)                           !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                         N O T A     B E N E                               !!
!! The dFLASH server is still under development.  If some of the answers do  !!
!! not make sense it is very likely that this is due to a bug in our code.   !!
!! Please, email bug reports and comments to dflash@watson.ibm.com with      !!
!! subject line "bug" or "comments".                                         !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Dear User, welcome to Release 1.1.0 of the dFLASH server!

    The dFLASH server is a "homologous sequence retrieval" program for PROTEIN
and DNA sequences.

    dFLASH is a parallel system running on an IBM SP/x architecture. Intra-node
communication, evidence integration and alignment are performed in parallel.
The system has been implemented using IBM's Concert/C language for distributed
programming. The server is available 24 hours a day, 7 days a week and can be
accessed both directly and through "Grail" of the Oak Ridge National Lab.

    Incremental changes and improvements made to the server will be reflected
in the "Message of the day" at the beginning of this help file:  we recommend
that users periodically issue a `send help' request for up to date information
on the server.

    For the moment, we can process requests originating from email addresses of
the form
                 user@[machine.][subdomain.]institution.type
                        or
                 user%machine@[machine.][subdomain.]institution.type
                        or
                 "string::user"@[machine.][subdomain.]institution.type

We plan to further expand the accepted formats, depending on demand.


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

HOW TO USE THE SERVER: You can use the dFLASH facilities by sending an email
---------------------- message with the appropriate syntax to the address
		       "dflash@watson.ibm.com" (without the quotes).
			
SUBJECT LINE: It is important that the "Subject" line of your message contain
------------- one of: { dflash, dFlash, dFLASH, DFLASH }.  Messages whose
              subject line does NOT conform to this rule **WILL BE LEFT
	      UNPROCESSED**.  This restriction lets us automatically
	      distinguish messages addressed to the server from those meant
	      for one of the group members.

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


MESSAGE FORMAT: The typical message body of an email request looks as follows:
---------------

     BLOSUM 62                                  (optional  | DIRECTIVE)
     VERBOSE  10 20                             (optional  | DIRECTIVE)
     SEQUENCES  100                             (optional  | DIRECTIVE)
     ALIGNMENTS 50                              (optional  | DIRECTIVE)
     THRESHOLD  30                              (optional  | DIRECTIVE)
     KEY XMATCH					(optional  | DIRECTIVE)
     SOURCE PROTEIN				(optional  | DIRECTIVE)
     TARGET SP                                  (optional  | DIRECTIVE)
     BEGIN                                      (mandatory | DIRECTIVE)
     >A_ONE_LINE_TEST_SEQ_LABEL                 (mandatory -- notice the '>' )
     a_sequence_of_{amino_acids,nucleic_acids,spaces,tabs}
     1                                          (mandatory terminator)

    The PAM/BLOSUM, VERBOSE, SEQUENCES, ALIGNMENTS, THRESHOLD, KEY, SOURCE and
TARGET directives can appear in any order but they *must* precede the BEGIN
directive. The BEGIN line must be followed by the LABEL line which in turn
should be followed by the test sequence.

    The test sequence should contain at least 18 (protein) or 54 (DNA) and at
most 1500 amino acid or nucleotide characters.  It may, however, contain ANY
NUMBER of carriage-return, tab and space characters; these are, of course, not
counted when computing the length of the test sequence.  Neither the label nor
the test sequence is case-sensitive.  If the test sequence is longer than 1500
characters, the e-mail filter will truncate it to the first 1500 characters,
send a note to that effect to the originator of the query, and then submit the
truncated sequence to the search engine.

NOTA BENE:  The words appearing on the lines marked DIRECTIVE above can be in
----------  lower case or upper case; in other words, you can have pam or PAM,
	    threshold or THRESHOLD, alignments or ALIGNMENTS, etc.  However,
	    something like ThReShOlD will not work.

    The directive pertaining to the scoring matrix allows the user to specify
the matrix to be used for computing the alignment scores.  You can use either
the word PAM followed by a space and the desired distance, or the word BLOSUM
followed by a space and the desired distance.  Examples:  PAM 250, BLOSUM 62.
If no matrix directive is included in the message, PAM 250 is used as the
default.  Depending on the value of the TARGET directive (see below), the
matrix directive, if present, may be ignored.

    The VERBOSE line allows the sender to also retrieve the data about authors,
dates, entries, superfamilies etc. that are contained in the original PIR,
SwissProt and GenBank databases.  This directive accepts one OR two arguments;
for example:
                verbose         15      25
means "send me the text data for the sequences occupying positions 15 through 25
in the final ranking."  On the other hand,
                verbose         15
means "send me the text data for the sequences occupying the first 15 positions
in the final ranking."  If no verbose line appears, no citation data is sent.

    The SEQUENCES line allows one to restrict the reported sequences to the
given number.  This directive controls the number of entries in the ``short
list'' of recovered database sequences only.  If no SEQUENCES line is given,
the server code will set it to an appropriate default value (100).

    The ALIGNMENTS line allows one to restrict the reported alignments to the
given number.  If no ALIGNMENTS line is given, the server code will set it to
an appropriate default value (100).  The ALIGNMENTS value cannot exceed 5000.
Values larger than 5000 are reduced to 5000.

    The THRESHOLD line restricts the reported sequences (and thus alignments)
to those whose Score exceeds the given THRESHOLD value.  If no THRESHOLD line
is given, the server code will set it to a default value: 50 for DNA
sequences and 80 for protein sequences.  There is also a *hard* threshold of
40 for DNA and 30 for PROTEIN sequences; if the user-requested value is
smaller than the hard threshold, it will be raised to the hard threshold.
NOTA BENE:  (1) if the THRESHOLD value is too small, you run the risk of
----------  overwhelming your mailer program, since you are likely to
            receive a very big file as a reply from the server.
	    (2) if the THRESHOLD is too high, the list of recovered entries
	    will be empty or very short; you should decrease the threshold's
	    value and resubmit your query.

    The KEY line allows the user to specify the key to be used when sorting the
results (retrieved sequences) of a search request.  The keyword KEY can be
followed by one of { SCORE,score,  LENGTH,length,  PEAK,peak,  GAP,gap,
MATCH,match,  XMATCH,xmatch }.  In every case the retrieved sequences are
sorted in decreasing order of the chosen quantity:

    SCORE,score     total computed score
    LENGTH,length   length of the retrieved sequence
    PEAK,peak       maximum score over *any* 18-residue (protein) or
                    54-residue (DNA) window of the recovered match
    GAP,gap         largest gap inserted to obtain the best alignment with
                    the query strand
    MATCH,match     total (=conservative+exact) number of matches with the
                    query strand
    XMATCH,xmatch   number of exact matches with the query strand

If no KEY directive is specified, the retrieved sequences are sorted in order
of decreasing "score".

    The SOURCE line allows the user to specify the type of the query strand as
being a { PROTEIN,protein,    DNA,dna } sequence.  By setting  SOURCE to one of
{PROTEIN,protein} the user indicates that the query strand is a sequence of
amino acids.  By setting SOURCE to one of {DNA,dna}  the user indicates that the
query strand is a sequence of nucleotides.

    The TARGET line allows the user to specify the target database as one of
{ PIR,pir,   SP,sp,   GB,gb }, and thus to control the database in which the
search will be carried out: {PIR,pir} selects the PIR database, {SP,sp} the
SWISSPROT database, and {GB,gb} the GenBank database.  Requests for searches in
unsupported databases will be *IGNORED* by the server, which will send a
complaint message back to the originator of the request.

If *only* SOURCE is specified, then TARGET will be set automatically: if
SOURCE is set to one of { protein, PROTEIN }, the search will be carried out in
the "PIR" database, whereas if SOURCE is set to one of { dna, DNA }, the search
will take place in the "GB" database.  If *neither* SOURCE *nor* TARGET is
given, the server will assume it is dealing with an amino acid strand and carry
out the search against the "PIR" database.

    The LABEL line allows the user to enter mnemonic information pertaining to
the test sequence, the time of day, etc.  The information on this line will
be reproduced in the Subject line of the reply message.  Notice that the
LABEL line *must* begin with the character '>'.

    All submitted messages must be terminated by the number '1'.  This
number can follow the last character of the test sequence or be on a line by
itself.


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

A 'SMART' FILTER:   The email filter that allows for the above message format
-----------------   has been improved in this release.  In particular, the
filter is 'smart' enough to catch inconsistencies in the user's message. The
filter will correct them and send a note to the originator of the message.
*Unlike* older releases of the filter, this version will submit the corrected
message to the search engine.  The filter will also send one email note to the
originator of the query for *every* change it has carried out; the note(s)
will contain information about the actions that the filter has taken.

For example, if the user's note contains the following lines

	sequences 20
	alignments 50
	verbose 10 30

the filter will reset the value of 'alignments' to 20 and the value of
'verbose_to' (the second VERBOSE argument) to 20, and then submit the
corrected query to the search engine.  Since two changes took place, the
filter will also send two email notes to the originator of the query
detailing the actions it has taken.

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


EXAMPLES:  Two example inputs follow.
---------

Example 1:
                pam 250
                sequences 50
                alignments 30
                threshold  100
		target pir
                begin
                > HBA_HUMAN STANDARD; PRT; 141 AA. P01922; HEMOGLOBIN ALPHA
                VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFP
                TTKTYFPHFDLSHGSAQVKGHG     KKVADALTNA
                V A H V D D M PNALSALSDLHAHKLRVDPVNFK
                llshcllvtlaahlpaeftpavhasldkflasvstvltskyr
                1

            Note:  all amino acids from "VLSP" through "ltskyr" will be used
            in the search.  Not more than the 50 top scoring sequences will be
            reported in the short list.  Also, the alignments for the top 30
            scoring sequences will be returned.  No reported sequence will have
            a score that is less than 100, and the reported sequences will be
	    sorted in order of decreasing score.  The test sequence is declared
	    to be a sequence of amino acids and should be searched against the
	    PIR database.

Example 2:
                BLOSUM 62
                KEY  XMATCH
	        BEGIN
                > Sequence sent to dflash on Fri May 20 13:40:17 EDT 1994
                VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFP
                TTKTYFPHFDLSHGSAQVKGHG     KKVADALTNA

                V A H V D D M PNALSALSDLHAHKLRVDPVNFK

                llshcllvtlaahlpaeftpavhasldkflasvstvltskyr
                1

            Note:  all amino acids from "VLSP" through "ltskyr" will be used
            in the search.  The server code will set the various parameters to
            appropriate default values.  The server will treat the test sequence
	    as a sequence of amino acids (default) and will search against the
	    "PIR" database (default) with a score threshold set at 80 (default).
	    The retrieved sequences will be reported in order of decreasing
	    number of exact matches with the query strand.
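Both examples rely on the server ignoring embedded spaces, blank lines and letter case when it collects the residues between the LABEL line and the terminating '1'. The sketch below shows one way to do that extraction; the function name is hypothetical, and upper-casing the residues is an assumption about how mixed case is normalized.

```python
import re
from typing import List, Tuple


def extract_sequence(body: str) -> Tuple[str, str]:
    """Hypothetical parser: return (label, residues) from a query body."""
    lines = body.splitlines()
    starts = [i for i, ln in enumerate(lines) if ln.strip().startswith(">")]
    if not starts:
        raise ValueError("no LABEL line starting with '>' found")
    start = starts[0]
    label = lines[start].strip()[1:].strip()
    pieces: List[str] = []
    for ln in lines[start + 1:]:
        if "1" in ln:
            # Keep the residues that precede the terminator, then stop.
            pieces.append(ln.split("1", 1)[0])
            break
        pieces.append(ln)
    # Drop spaces, tabs and blank lines; upper-casing is an assumption.
    residues = re.sub(r"[^A-Za-z]", "", "".join(pieces)).upper()
    return label, residues
```

Applied to the body of Example 1, this recovers 141 residues running from "VLSP" to "LTSKYR", consistent with the 141 AA stated on the LABEL line. The digits on the LABEL line itself are not scanned for the terminator.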

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

SCORING MATRICES:
-----------------

    You can use both PAM and BLOSUM scoring matrices for protein searches. These
can be requested via the optional { pam, PAM, blosum, BLOSUM } directive. The
currently supported distances are

for BLOSUM:  30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90, 100

for PAM:     10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150,
             160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280,
             290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410,
             420, 430, 440, 450, 460, 470, 480, 490, and 500.

For DNA searches, the PAM/BLOSUM declarations are ignored.
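A client can check a scoring-matrix directive against the supported distances before mailing a query. This is a hypothetical helper: the sets simply restate the lists above, and returning None for DNA reflects the statement that PAM/BLOSUM declarations are ignored there.

```python
from typing import Optional

BLOSUM_DISTANCES = {30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90, 100}
PAM_DISTANCES = set(range(10, 501, 10))  # 10, 20, ..., 500


def matrix_name(directive: str, source: str = "protein") -> Optional[str]:
    """Hypothetical check of a 'pam N' / 'blosum N' directive."""
    if source.lower() == "dna":
        return None  # PAM/BLOSUM declarations are ignored for DNA searches
    keyword, value = directive.split()  # ValueError if not exactly two tokens
    distance = int(value)
    if keyword in ("pam", "PAM") and distance in PAM_DISTANCES:
        return f"PAM{distance}"
    if keyword in ("blosum", "BLOSUM") and distance in BLOSUM_DISTANCES:
        return f"BLOSUM{distance}"
    raise ValueError(f"unsupported scoring matrix: {directive!r}")
```

For instance, "pam 250" maps to PAM250 and "BLOSUM 62" to BLOSUM62, whereas "blosum 63" is rejected because 63 is not in the supported BLOSUM list.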


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


NOTE ON ALIGNMENT:
------------------

    The server's alignment code implements the Smith-Waterman algorithm (dynamic
programming) to align each of the retrieved sequences with the test input. This
is *NOT* to be confused with the indexing method that we use to determine the
candidates to be aligned.
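For readers unfamiliar with Smith-Waterman, the core recurrence for the best local alignment score is short. This sketch is not the server's alignment code: it assumes a linear gap penalty and takes the substitution score as a function argument (in practice a PAM or BLOSUM lookup), and the server's actual gap model is not described here.

```python
from typing import Callable


def smith_waterman(a: str, b: str, score: Callable[[str, str], int],
                   gap: int = -8) -> int:
    """Best local alignment score of a and b (linear gap penalty assumed)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                0,                                         # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                prev[j] + gap,                             # gap in b
                curr[j - 1] + gap,                         # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

The zero floor in the recurrence is what makes the alignment local: a poorly matching prefix is discarded rather than dragging the score down, so "ACGT" embedded in "XXACGTYY" scores the same as "ACGT" against itself.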

    The meaning of the variables in the listing that is returned by the dFLASH
server

   .....
   ....

   Score Matrix: PAM250
   Max Reported Sequences:  1000
   Max Reported Alignments: 10
   Score Threshold  At: 65

    Id  Label:                                   Score  NRes  Ex% Tot% Sig  Pk
   ----------------------------------------------------------------------------
     1. HAHU hemoglobin alpha chain - human        655   141 100% 100% 100  89
     2. HACZ hemoglobin alpha chain - chimpanzee   655   141 100% 100% 100  89
     3. HACZP hemoglobin alpha chain - pygmy chi   655   141 100% 100% 100  89
     4. HAGO hemoglobin alpha chain - lowland go   654   141  99% 100%  99  89
     5. HAMQP hemoglobin alpha chain - hanuman l   653   141  97% 100%  99  89
     6. B27792 hemoglobin alpha-1 chain - orangu   649   141  97% 100%  99  89
     7. A25126 hemoglobin alpha-1 chain - Sumatr   649   141  97% 100%  99  89
    ...
    .....
    ..

is the following:

NRes:  the number of residues (amino acids) in the recovered match
Score: sequence similarity score of the recovered sequence based on the
       selected mutation matrix
Ex%:   percentage of *exact* matching residues
Tot%:  percentage of *total* (=exact+conservative) matching residues
Sig:   100 times the ratio between the actual computed score and the score
       obtained by matching the retrieved sub-segment with itself; the
       denominator is the maximum obtainable score for the sub-segment in
       question (all gaps removed).
Pk:    (Peak) the maximum score over *any* window of 18 residues (proteins)
       or 54 residues (DNA) within the recovered match.
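The column definitions can be made concrete for a single gapped alignment. This is a sketch under several stated assumptions, none confirmed by this file: "conservative" means a positive substitution score, the percentage denominators are the number of alignment columns, gaps cost a linear penalty, and the peak window slides over alignment columns. The function name and parameters are hypothetical.

```python
from typing import Callable, Dict

Scorer = Callable[[str, str], int]


def listing_columns(query: str, hit: str, score: Scorer,
                    gap: int = -8, window: int = 18) -> Dict[str, int]:
    """Hypothetical NRes/Score/Ex%/Tot%/Sig/Pk for one aligned pair ('-' = gap)."""
    if len(query) != len(hit) or not hit:
        raise ValueError("need two non-empty aligned strings of equal length")
    pairs = list(zip(query, hit))
    # Per-column contribution: substitution score, or the assumed gap penalty.
    per_col = [gap if "-" in (q, h) else score(q, h) for q, h in pairs]
    cols = len(per_col)
    exact = sum(1 for q, h in pairs if q == h and q != "-")
    # Assumption: a conservative match is any positive substitution score.
    matched = sum(1 for q, h in pairs
                  if "-" not in (q, h) and (q == h or score(q, h) > 0))
    segment = hit.replace("-", "")  # retrieved sub-segment, gaps removed
    self_score = sum(score(c, c) for c in segment)
    total = sum(per_col)
    w = min(window, cols)  # use the whole alignment if it is shorter
    peak = max(sum(per_col[i:i + w]) for i in range(cols - w + 1))
    return {
        "NRes": len(segment),
        "Score": total,
        "Ex%": round(100 * exact / cols),
        "Tot%": round(100 * matched / cols),
        "Sig": round(100 * total / self_score),
        "Pk": peak,
    }
```

With a toy scorer (5 for identity, 1 for I/L, -4 otherwise), aligning "ACDIK" with "ACDLK" gives Ex% 80, Tot% 100 and Sig 84, because the score of 21 is compared against a self-score of 25.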


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


TO OBTAIN HELP:
---------------
    You can obtain this message at any time by sending a message with one of:
{ dflash, dFlash, dFLASH, DFLASH } in the "Subject" line and a body containing
one of { help, HELP, send help, SEND HELP }.


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


TO OBTAIN ON-LINE REPRINTS OF PAPERS
------------------------------------
    You can obtain reprints (in PostScript) of relevant papers by sending a
message with one of: { dflash, dFlash, dFLASH, DFLASH } in the "Subject" line
and a body containing

one of {flashpaper, FLASHPAPER, send flashpaper, SEND FLASHPAPER }
                                        ---> returns to the originator of the
                                        request a copy of the FLASH paper
					that will appear in `CABIOS'

one of {dflashpaper, DFLASHPAPER, send dflashpaper, SEND DFLASHPAPER }
                                        ---> returns to the originator of the
                                        request a copy of a paper that contains
                                        a description of dFLASH that has
					appeared in `IEEE Computational Science
					and Engineering'

one of {concertpaper, CONCERTPAPER, send concertpaper, SEND CONCERTPAPER }
                                        ---> returns to the originator of the
                                        request a copy of a high-level paper
                                        describing the CONCERT/C language

one of {bayespaper, BAYESPAPER, send bayespaper, SEND BAYESPAPER }
                                        ---> returns to the originator of the
                                        request a copy of a paper describing
                                        a computer-vision application based
                                        on indexing principles similar to
					those of dFLASH; it will appear in `CVGIP-IU'

    Notice there can only be *one* such request per message! Also, make sure
you do not issue a new paper request until after the previous request has
returned to you all of the PostScript files and you have removed the latter
from your mailbox:  the returned messages are rather big (between 1 and 4
Megabytes) and are guaranteed to overflow the disk set aside for mail messages
on most systems.
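Because each message may carry only one help or paper request, a client can refuse to build a message that asks for more. This sketch uses Python's standard `email.message.EmailMessage`. The server address is a parameter because it is not given in this part of the file (the address in the usage note is a placeholder). Folding the request to lower case is a choice made here; the lower-case spellings are among those the server accepts.

```python
from email.message import EmailMessage

REQUESTS = {"help", "flashpaper", "dflashpaper", "concertpaper", "bayespaper"}


def build_request(server_addr: str, request: str) -> EmailMessage:
    """Hypothetical builder for a single dFLASH help or paper request."""
    words = request.split()
    if words and words[0].lower() == "send":
        words = words[1:]  # "send help" is equivalent to "help"
    if len(words) != 1 or words[0].lower() not in REQUESTS:
        raise ValueError("exactly one help or paper request per message")
    msg = EmailMessage()
    msg["To"] = server_addr
    msg["Subject"] = "dflash"  # one of the accepted Subject spellings
    msg.set_content(words[0].lower())
    return msg
```

For example, `build_request("dflash-server@example.invalid", "SEND BAYESPAPER")` returns a message with the Subject "dflash" and the body "bayespaper". A combined request such as "flashpaper concertpaper" is rejected.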


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


Thank you for your interest in the dFLASH server.

                                        Sincerely,

                                        The dFLASH Group


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

COMMENTS??  We will appreciate receiving your feedback, suggestions, comments,
----------  or bug reports; all of these can be sent to "dflash@watson.ibm.com"
	    Please make sure your "Subject" line contains the word "comments"
	    or "bug".

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

REFERENCES  If you make use of the dFLASH server, please reference
----------

     A. Califano and I. Rigoutsos, "FLASH: A Fast Look-up Algorithm for String
     Homology."  In  CABIOS.  To appear.

     I. Rigoutsos and A. Califano, "Searching In Parallel for Similar Protein
     Strings."  In IEEE Computational Science and Engineering, June 1994.

If you wish to find out more, you can contact Isidore Rigoutsos and Andrea
Califano at {rigoutso,acal}@watson.ibm.com


$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


For more information on the Concert/C language, please refer to

     J. Auerbach, D. Bacon, A. Goldberg, G. Goldszmidt, A. Gopal, M. Kennedy,
     A. Lowry, J. Russell, W. Silverman, R. Strom, D. Yellin, and S. Yemini,
     "High-level language support for programming reliable distributed
     systems."  In Proceedings of the International Conference on Computer
     Languages, April 1992, Oakland, California.

or contact Jim Russell (jrussell@watson.ibm.com)

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$


------------------------------>  CUT HERE <-----------------------------------

From owner-embldatabank@net.bio.net Tue Nov 22 22:00:00 1994
Path: biosci!rutgers!gatech!newsxfer.itd.umich.edu!zip.eecs.umich.edu!umn.edu!newsdist.tc.umn.edu!mayonews.mayo.edu!NewsWatcher!user
From: ander@fermat.mayo.edu (RAA)
Newsgroups: bionet.molbio.embldatabank
Subject: Genbank Search Result?????
Followup-To: bionet.molbio.embldatabank
Date: 21 Nov 1994 20:05:15 GMT
Organization: Mayo Foundation
Lines: 6
Distribution: world
Message-ID: <ander-211194140513@129.176.176.10>
NNTP-Posting-Host: romulus.mayo.edu

I have isolated what I thought to be a new cDNA clone.  I ran this against
the GenBank.  I found a 330 bp cDNA match that had recently been identified
by a group called  "Genexpress cDNA program".  Does anyone know what this
is?  Is this a Biotech company or part of the human genome project?  Any
comments would be appreciated.
RAA

From owner-embldatabank@net.bio.net Tue Nov 22 22:00:00 1994
Newsgroups: bionet.molbio.embldatabank
Path: biosci!agate!howland.reston.ans.net!pipex!sunsite.doc.ic.ac.uk!daresbury!hgmp.mrc.ac.uk!ebi.ac.uk!tome
From: tome@ebi.ac.uk (Patricia Rodriguez-Tome)
Subject: Re: Genbank Search Result?????
Message-ID: <CzpsnL.EF6@ebi.ac.uk>
Lines: 27
Sender: news@ebi.ac.uk (Mr news)
Reply-To: tome@ebi.ac.uk (Patricia Rodriguez-Tome)
Organization: European Bioinformatics Institute (EMBL) - UK
X-Newsreader: mxrn 6.18-16
References:  <ander-211194140513@129.176.176.10>
Date: Wed, 23 Nov 1994 09:50:09 GMT


In article <ander-211194140513@129.176.176.10>, ander@fermat.mayo.edu (RAA) writes:
>I have isolated what I thought to be a new cDNA clone.  I ran this against
>the GenBank.  I found a 330 bp cDNA match that had recently been identified
>by a group called  "Genexpress cDNA program".  Does anyone know what this
>is?  Is this a Biotech company or part of the human genome project?  Any
>comments would be appreciated.
>RAA
>

The "Genexpress cDNA program" is a research project of Genethon in France.

Genethon is a human genome research center with academic status
in France.
If you want more information you can mail genexpress@genethon.fr
or Remi.Houlgatte@isagrogn.vjf.inserm.fr

Regards

Pat RT
-- 
=======================================================================
Dr. Patricia Rodriguez-Tome		| Email:tome@ebi.ac.uk
EBI - European Bioinformatics Institute	| URL:	http://www.ebi.ac.uk
Hinxton Hall, Hinxton			| Tel:	+44 (0)223 494 414
Cambridge CB10 1RQ, UK			| Fax:	+44 (0)223 494 468
========================================================================

From owner-embldatabank@net.bio.net Wed Nov 23 22:00:00 1994
Path: biosci!bloom-beacon.mit.edu!gatech!howland.reston.ans.net!pipex!uunet!zib-berlin.de!informatik.tu-muenchen.de!lrz-muenchen.de!ipp-garching.mpg.de!alf.biochem.mpg.de!krasel
From: krasel@alf.biochem.mpg.de (Cornelius Krasel)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: Genbank Search Result?????
Date: 24 Nov 1994 10:17:55 GMT
Organization: Rechenzentrum der Max-Planck-Gesellschaft in Garching
Lines: 22
Message-ID: <3b1p8j$2b2l@sat.ipp-garching.mpg.de>
References: <ander-211194140513@129.176.176.10> <CzpsnL.EF6@ebi.ac.uk>
NNTP-Posting-Host: alf.biochem.mpg.de
X-Newsreader: TIN [version 1.2 PL2]

Patricia Rodriguez-Tome (tome@ebi.ac.uk) wrote:

> In article <ander-211194140513@129.176.176.10>, ander@fermat.mayo.edu (RAA) writes:
> >I have isolated what I thought to be a new cDNA clone.  I ran this against
> >the GenBank.  I found a 330 bp cDNA match that had recently been identified
> >by a group called  "Genexpress cDNA program".  Does anyone know what this
> >is?  Is this a Biotech company or part of the human genome project?  Any
> >comments would be appreciated.

> The "Genexpress  cDNA program" is a research project from Genethon in France

There is another "Genexpress cDNA program" which runs at the University
of Munich. They sequence ESTs from human heart. You can contact the
leader, Dr. Brigitte Obermaier, at
	obermaier@vms.biochem.mpg.de

--Cornelius.

--
/* Cornelius Krasel, Abt. Lohse, Genzentrum, D-82152 Martinsried, Germany  */
/* email: krasel@alf.biochem.mpg.de                 fax: +49 89 8578 3795  */
/* "Science is the game you play with God to find out what His rules are." */

From owner-embldatabank@net.bio.net Sun Nov 27 22:00:00 1994
Path: biosci!bloom-beacon.mit.edu!usc!nic-nac.CSU.net!newshub.sdsu.edu!ucsnews!ucssun1!shumard
From: shumard@ucssun1.sdsu.edu (shumard)
Newsgroups: bionet.molbio.embldatabank
Subject: what i'm doing in life
Date: 28 Nov 1994 05:22:54 GMT
Organization: San Diego State University Computing Services
Lines: 4
Message-ID: <3bbpfe$q7d@gondor.sdsu.edu>
NNTP-Posting-Host: 130.191.1.100
X-Newsreader: TIN [version 1.2 PL0]

Hi, I'm a PhD student (molecular biology) at San Diego State University.
I would like to know who else is surfing the net.  My address is
t1001@ucssun1.sdsu.edu


From owner-embldatabank@net.bio.net Mon Nov 28 22:00:00 1994
Path: biosci!TAWA.FRI.CRI.NZ!BISHOPS
From: BISHOPS@TAWA.FRI.CRI.NZ (Sharon Bishop-Hurley)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: what i'm doing in life
Date: 29 Nov 1994 11:34:20 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 12
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <edae6c70@tawa.fri.cri.nz>
NNTP-Posting-Host: net.bio.net



Hi, I saw your message on the bionet news and thought I would send you a message.
My name is Sharon Bishop-Hurley and I have just recently started a PhD in
molecular biology at New Zealand Forest Research Institute.

My area of research is embryo development in radiata pine with the aim of 
isolating some embryo-specific genes.

Keep cool
sharon


From owner-embldatabank@net.bio.net Tue Nov 29 22:00:00 1994
Path: biosci!rutgers!gatech!howland.reston.ans.net!pipex!uunet!newstf01.news.aol.com!newsbf01.news.aol.com!not-for-mail
From: drrutledge@aol.com (Drrutledge)
Newsgroups: bionet.molbio.embldatabank
Subject: How do you download from genbank?
Date: 29 Nov 1994 21:35:05 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 3
Sender: news@newsbf01.news.aol.com
Message-ID: <3bgocp$r06@newsbf01.news.aol.com>
NNTP-Posting-Host: newsbf01.news.aol.com

What is the easiest way to download a sequence from genbank to then be
used in the new Oligo for windows program?  Should I be getting to genbank
through world wide web (mosaic)?  If so, what is the URL address?

From owner-embldatabank@net.bio.net Tue Nov 29 22:00:00 1994
Path: biosci!BORDUAS.NLM.NIH.GOV!francis
From: francis@BORDUAS.NLM.NIH.GOV (Francis Ouellette)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: How do you download from genbank?
Date: 29 Nov 1994 19:21:27 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 17
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9411300321.AA08114@borduas.nlm.nih.gov>
Reply-To: francis@ncbi.nlm.nih.gov
NNTP-Posting-Host: net.bio.net


> What is the easiest way to download a sequence from genbank to then be
> used in the new Oligo for windows program?  Should I be getting to genbank
> through world wide web (mosaic)?  If so, what is the URL address?

The URL for GenBank is:

http://www.ncbi.nlm.nih.gov

regards,

francis

--
| B.F. Francis Ouellette  
|
| francis@ncbi.nlm.nih.gov   

