From owner-embldatabank@net.bio.net Wed Dec 01 22:00:00 1993
Path: biosci!bcm!cs.utexas.edu!uunet!hearst.acc.Virginia.EDU!murdoch!dayhoff.med.Virginia.EDU!wrp
From: wrp@dayhoff.med.Virginia.EDU (William R. Pearson)
Newsgroups: bionet.software,bionet.molbio.embldatabank,bionet.molbio.genbank
Subject: Fasta updates
Message-ID: <CH2Cqn.FG4@murdoch.acc.Virginia.EDU>
Date: 25 Nov 93 19:56:47 GMT
Sender: usenet@murdoch.acc.Virginia.EDU
Organization: University of Virginia
Lines: 9
Xref: biosci bionet.software:6625 bionet.molbio.embldatabank:263 bionet.molbio.genbank:1456


	New versions of the FASTA package of programs are available
from "virginia.EDU" in pub/fasta/fasta16c24b.shar(.Z) and
pub/fasta/beta/fasta16c31a.shar(.Z).  The programs have been belatedly
modified to read the current EMBL format and the beta/fasta16c31.shar
program has been more extensively tested (and corrected) to work with
the NCBI BLASTP/N formats.

Bill Pearson

From owner-embldatabank@net.bio.net Mon Dec 06 22:00:00 1993
Path: biosci!JUDY.ENG.UCI.EDU!liang
From: liang@JUDY.ENG.UCI.EDU (liang)
Newsgroups: bionet.molbio.embldatabank
Subject: CHINESES_BIOTECH_NET_FOUNDED
Message-ID: <9312061945.AA20484@judy.eng.uci.edu>
Date: 6 Dec 93 19:45:46 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 12


CBNet (Chinese Biotechnology Network) is a non-profit organization composed of
professionals in biological, chemical, medical sciences, engineering
and related fields.  The CBNet sponsors the Chinese Biotechnology Internet
Forum (CBIF) newsletter. To subscribe to CBIF, please send an email to
Listserv@UCSD.Edu with the message body: Add CB-Net.

          
       




From owner-embldatabank@net.bio.net Wed Dec 08 22:00:00 1993
Path: biosci!daresbury!zeta.bmc.uu.se!corax.udac.uu.se!sunic!pipex!uunet!caen!batcomputer!ghost.dsi.unimi.it!genes!pongor
From: pongor@genes.icgeb.trieste.it (Sandor Pongor)
Newsgroups: bionet.molbio.embldatabank
Subject: Mail server for protein functional domain homologies - ICGEB Trieste
Keywords: mail server, molecular biology, protein domains, sequences
Message-ID: <1993Dec9.100711.21830@genes.icgeb.trieste.it>
Date: 9 Dec 93 10:07:11 GMT
Organization: ICGEB
Lines: 148



        SS   BBB      A      SS   EEEE      H   H  EEEE L    PPP  
       S  S  B  B    A A    S  S  E         H   H  E    L    P  P 
       S     B B    A   A   S     E         H   H  E    L    P  P 
        S    BB     AAAAA    S    EEE       HHHHH  EEE  L    PPP  
         S   B B    A   A     S   E         H   H  E    L    P    
          S  B  B   A   A      S  E         H   H  E    L    P    
      S   S  B  B   A   A  S   S  E         H   H  E    L    P    
       SSS   BBB    A   A   SSS   EEEE      H   H  EEEE LLLL P    

              ----------------------------------
     This is the help file of the SBASE Email Server at the
International Centre for Genetic Engineering and Biotechnology
     AREA Science Park, Padriciano 99, 34012 Trieste, Italy.
              ----------------------------------

The SBASE Email Server is at present experimental. Please send comments 
          to sbase-comment@icgeb.trieste.it. Thanks.


The SBASE e-mail server accepts a specially formatted mail message
containing a protein query sequence, and, as a response, it sends the list
of the most probable domain homologies. A database search is performed 
against the SBASE library of protein domains using the BLAST algorithm, 
and the search results, provided with annotations, are returned in a mail 
message.


How to use:

Getting HELP:
============

This help file can be obtained by sending an email to

    	    sbase@icgeb.trieste.it

containing the word HELP alone in the first line of the message body.

Making a QUERY:
===============

Send e-mail to sbase@icgeb.trieste.it in the format of the example below.

Note: the parameters before the BEGIN token are optional (the defaults are
listed here); the lines after BEGIN are required.

Example:

MATRIX PAM120
SCORE PARAMETER 35
ANNOTATIONS YES
BEGIN
> mysequence
LRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG
GLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG
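
The query format above can be assembled programmatically. What follows is
only a minimal sketch in Python: the function name build_sbase_query, the
70-column line width, the "ANNOTATIONS NO" value and the local rejection of
scores below 30 are assumptions made for illustration and are not confirmed
by this help file.

```python
import textwrap


def build_sbase_query(label, sequence, matrix="PAM120", score=35,
                      annotations=True, width=70):
    """Return a message body laid out like the example query."""
    # Remove all whitespace so the residues can be re-wrapped cleanly.
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("empty query sequence")
    # The help file gives 30 as the minimum cutoff; rejecting lower values
    # locally is an assumption, the server may handle them differently.
    if score < 30:
        raise ValueError("score parameter below the documented minimum of 30")
    lines = [
        f"MATRIX {matrix}",
        f"SCORE PARAMETER {score}",
        # Only YES appears in the help file; NO is assumed here.
        f"ANNOTATIONS {'YES' if annotations else 'NO'}",
        "BEGIN",
        f"> {label}",
    ]
    lines.extend(textwrap.wrap(residues, width))
    return "\n".join(lines) + "\n"
```

The returned string is meant to be sent as the body of a plain-text mail to
the server address given above.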


Response time
=============

Requests are handled immediately, in serial order. At present, response
time is quite short and is restricted by the network load rather than CPU
availability. Please note that large output files, such as sometimes occur
with very short query sequences, may need a long time to traverse the net.

Evaluation of the output:
=========================

The output of the server consists of BLAST search results against the SBASE
protein domain library, organized as follows:

1) List of the best scoring domain entries. As SBASE entries are named 
by domain names (function, structure, etc.), this list may already give 
some information on the expected domain composition.

2) List of alignments. For each SBASE entry you will find the complete 
annotation of the domain, followed by one (or several) alignments with 
(different parts of) the query. If one domain is found several times in 
your query, you may find several alignments with the same or related 
entries at different parts of the query. Please note that it depends on the 
score parameter whether or not you see all the alignments. Do not use very 
low cutoff values because that results in prohibitively long output 
files. (For the time being, we have set the default cutoff to 35 and 
the minimum cutoff to 30).

3) Run statistics. These are usually not essential for the evaluation of 
the results; you can get a complete description of these and other BLAST 
parameters by sending a HELP message to blast.ncbi.nlm.nih.gov.

Important: Failure to see a homology with a known domain may be due to 
several reasons: i) The domain type is not (yet) included in the SBASE 
domain library; ii) The threshold score parameter was set too high for the 
domain to be detected; iii) A different scoring matrix may be necessary in 
order to detect the alignment with the domain type in question. In the present 
experimental version of the SBASE server we support only the matrices used 
by BLAST; "customized matrices" will be added later to the final version.

Papers to reference in reporting results:
=========================================

Pongor, S., Skerl, V., Cserzo, M., Hatsagi, Z., Simon, G. and
Bevilacqua, V. (1992): The SBASE domain library release 2.0: A
collection of annotated protein sequence segments, Nucleic Acids Res.,
21, 311-315.

Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990) Basic local alignment search tool J. Mol. 
Biol. 215:403-410.

Software availability
=====================

The SBASE database is available by anonymous ftp at
ftp.icgeb.trieste.it:/pub/sbase2

The BLAST software is available by anonymous ftp at
ncbi.nlm.nih.gov:/pub/


Protection of your sequence data
================================

The query sequences are not stored in any form. 

Further info:
=============

Server functions:   Zsolt Hatsagi 
                    <hatsagi@icgeb.trieste.it> 
                    Tel: +39-40-3757342

                    Valeria Bevilacqua, systems manager
                    (valeria@icgeb.trieste.it)
                    Tel.: +39-40-3757330

General info:       Sandor Pongor
                    <pongor@icgeb.trieste.it>
                    Tel: +39-40-3757300

FAX:                +39-40-226-555

Mail:	    	    International Centre for Genetic Engineering
    	    	    and Biotechnology
    	    	    AREA Science Park, Padriciano 99
    	    	    34012 Trieste, Italy

From owner-embldatabank@net.bio.net Thu Dec 09 22:00:00 1993
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!pipex!uknet!daresbury!not-for-mail
From: jclewley@crc.ac.uk (Dr. J.P. Clewley)
Newsgroups: bionet.molbio.embldatabank
Subject: Unavailability of published sequences with embl accession numbers
Message-ID: <2ear7b$gpq@mserv1.dl.ac.uk>
Date: 10 Dec 93 21:54:51 GMT
Sender: daemon@mserv1.dl.ac.uk
Distribution: bionet
Lines: 18
Original-To: embl-db@dl.ac.uk


 
Sequences reported in: van Camp et al, System. Appl. Microbiol. 16, 361-368
(1993) with EMBL accession numbers x67758-x67775, described as "available"
in the legend to Table 1 in that paper, could not be found in the EMBL database
of 5 December, 1993.
 
 
When will these sequences be available?
 
 
Jon Clewley
 
Central Public Health Laboratory, Colindale, London, UK
 
 
email: jclewley@crc.ac.uk
 

From owner-embldatabank@net.bio.net Tue Dec 21 22:00:00 1993
Newsgroups: bionet.molbio.embldatabank
Path: biosci!daresbury!doc.ic.ac.uk!warwick!pipex!uunet!newsgate.watson.ibm.com!watnews.watson.ibm.com!hawnews.watson.ibm.com!puffin!dflash
From: dflash@watson.ibm.com (The dFLASH Project)
Subject: a dFLASH server update
Sender: news@hawnews.watson.ibm.com (NNTP News Poster)
Message-ID: <CIGG2L.sBo@hawnews.watson.ibm.com>
Date: Wed, 22 Dec 1993 21:08:44 GMT
Disclaimer: This posting represents the poster's views, not necessarily those of IBM.
Nntp-Posting-Host: puffin.watson.ibm.com
Organization: IBM T.J. Watson Research
Lines: 371



The dFLASH group announces the availability of a new release of the dFLASH
email server.  The new release features:

o  greatly improved sensitivity 
o  improved mail interface
o  improved syntax-error checking
o  ability to recover text data pertaining to the retrieved sequences (see the
   option VERBOSE below)
o  new, more comprehensive protein database (we now use PIR Rel. 38)
o  a new computational platform
o  much improved alignment results
o  new request handling approach:  all of the submitted requests will
   eventually be processed, independent of whether the server is running upon
   reception of the request;  users will not receive the "server UNavailable"
   messages anymore, and there will be no need for re-submission.


Also, registration is *no* longer required in order to use the dFLASH server.
All submitted requests will be honored, provided that the sender's email address
conforms to one of the accepted formats (see below).  The server's help file is
included at the end of this message, with details on the use of the system.

Finally, we are looking forward to receiving your feedback on the current
version of the server.  We will greatly appreciate your suggestions for
modifications, comments, criticism, etc., which should be sent to
dflash@watson.ibm.com (Subject: Comments).


Sincerely,


The dFLASH Group





---------------------------------- Cut Here  ----------------------------------


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                             N O T A     B E N E                           !!
!! The dFLASH server is still under development.  If some of the answers do  !!
!! not make sense it is very likely that this is due to a bug in our code.   !!
!!                                                                           !!
!! Reporting of such bugs will help us to incorporate all the needed fixes.  !!
!!                                                                           !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                                                                           !!
!! The database that we use is PIR Release 38 with *no* incremental updates. !!
!! For more information, contact: 					     !!
!!                                                                           !!
!!		     Protein Information Resource (PIR) 		     !!
!!		   National Biomedical Research Foundation		     !!
!!		   	   3900 Reservoir Road, N.W.,		             !!
!!		   	  Washington, DC  20007, USA		     	     !!
!!                                                                           !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Dear User, welcome to the dFLASH server!

    The dFLASH server is a "homologous sequence retrieval" program for protein
sequences (see also NOTES below).  dFLASH is a distributed system which runs
on a 16-node IBM SP/1.  Although the SP/1 has a fast interconnection network
for inter-node communication, dFLASH currently uses regular TCP/IP for message
delivery.  Furthermore, evidence integration and alignment are performed on a
single node, instead of in parallel on all 16 nodes.  As is evidenced by the
difference between the total CPU usage and the elapsed wall clock time, a large
portion of the total time is consumed by the network communication and the
serial processing.  We will soon exploit the SP/1's fast interconnection
feature and also parallelize the evidence integration/alignment code, resulting
in an expected 16-fold speedup.

    The system has been implemented using IBM's Concert/C language for
distributed programming.  The server is now available 24 hours a day, 7 days a
week.  Incremental changes and improvements made to the server will be
reflected in the text of this help file:  it is recommended that users
periodically issue a `send help' request for up-to-date information on the
server.



    Effective today, November 16, 1993, *no* registration is required in order
to use the dFLASH server.  



For the moment, we can process requests originating from email addresses of the
form 
		"user@[machine.]institution.type"  
			or 
		"user%machine@[machine.]institution.type"  
We plan to further expand the accepted formats, depending on demand.
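
To check an address against the two accepted forms before sending a request,
something like the following Python sketch could be used.  The character set
allowed in each name and the choice to permit more than one machine label are
assumptions; the server's actual matching rules are not given in this file.

```python
import re

# One dot-free name made of letters, digits, '_' or '-'
# (the permitted characters are an assumption).
_NAME = r"[A-Za-z0-9_-]+"

# user@[machine.]institution.type  or  user%machine@[machine.]institution.type
_ADDRESS = re.compile(
    rf"{_NAME}(?:\.{_NAME})*"          # user
    rf"(?:%{_NAME}(?:\.{_NAME})*)?"    # optional %machine
    rf"@(?:{_NAME}\.)+{_NAME}"         # at least institution.type
)


def is_accepted_address(address):
    """Return True if the address has one of the two documented shapes."""
    return _ADDRESS.fullmatch(address.strip()) is not None
```

Under these assumptions a bang-path address such as "host!user" is rejected,
while "user@watson.ibm.com" and "user%puffin@watson.ibm.com" are accepted.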

    You can use the dFLASH facilities by sending an email message to 

			"dflash@watson.ibm.com"

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$

VERY IMPORTANT:    the "Subject" line of the message should be one of: { dflash,
---------------    dFlash, dFLASH, DFLASH }.  Messages whose subject line does
                   not conform to this rule will be left **unprocessed**.  The
                   reason for this restriction is that we want to be able to
                   automatically distinguish between messages that are
                   addressed to the server and those that are meant for one of
                   the group members.

$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$
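
The subject rule amounts to an exact match against four spellings.  A short
Python sketch follows; the function name is hypothetical, and ignoring
surrounding whitespace is an assumption about how the server reads the line.

```python
# The four spellings listed in the help file; anything else is left unprocessed.
ACCEPTED_SUBJECTS = {"dflash", "dFlash", "dFLASH", "DFLASH"}


def is_server_request(subject):
    """Return True if the Subject line would be routed to the server."""
    # Stripping surrounding whitespace is an assumption.
    return subject.strip() in ACCEPTED_SUBJECTS
```

Note that the match is not a general case-insensitive comparison: a spelling
such as "Dflash" is not in the list.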


REQUEST FORMAT:
---------------
The typical message-body of an email request looks like:

     PAM   250  				(mandatory | DIRECTIVE)
     VERBOSE  10 20				(optional  | DIRECTIVE)
     SEQUENCES  100  				(optional  | DIRECTIVE)
     ALIGNMENTS 50  				(optional  | DIRECTIVE)
     THRESHOLD  30  				(optional  | DIRECTIVE)
     BEGIN 					(mandatory | DIRECTIVE)
     >A_ONE_LINE_TEST_SEQ_LABEL               	(mandatory -- notice the '>' )
     a_sequence_of_{amino_acids,spaces,tabs}
     1						(mandatory terminator)

The PAM/BLOSUM, VERBOSE, SEQUENCES, ALIGNMENTS, and THRESHOLD directives can
appear in any order but they *must* precede the BEGIN directive.  The BEGIN
line must precede the LABEL line, and the latter must precede the test sequence.
The test sequence should contain at least 30 and not more than 1,000 amino
acids; it *may* contain CARRIAGE RETURNS, TABS and SPACES.  Neither the label
nor the test sequence is case sensitive.

The words appearing on the lines marked DIRECTIVE above can be in lower case or
upper case; in other words, you can have pam or PAM, threshold or THRESHOLD,
alignments or ALIGNMENTS, etc.  However, something like ThReShOlD will not work.
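
The case rule for directive words can be expressed as follows.  This is a
Python sketch only; the function name and the use of None for a rejected
word are choices made for the illustration.

```python
# Directive words named in this help file.
KNOWN_DIRECTIVES = {"PAM", "BLOSUM", "VERBOSE", "SEQUENCES",
                    "ALIGNMENTS", "THRESHOLD", "BEGIN"}


def normalize_directive(word):
    """Return the upper-case directive, or None if the word is not accepted."""
    # Only all-lower-case or all-upper-case spellings are accepted.
    if not (word.islower() or word.isupper()):
        return None
    upper = word.upper()
    return upper if upper in KNOWN_DIRECTIVES else None
```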

The VERBOSE line allows the sender to also retrieve the data about authors,
dates, entries, superfamilies etc. that are contained in the original PIR 
database.  This directive can take one or two arguments; for example:
		verbose 	15 	25
means "send me the text data for the proteins occupying positions 15 through 25
in the final ranking."  On the other hand,
		verbose 	15
means "send me the text data for the proteins occupying the first 15 positions
in the final ranking."  If no verbose line appears, no text data will be sent.
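
The two forms of the VERBOSE line map to a range of ranking positions as in
this Python sketch.  The function name is hypothetical, and the error handling
for malformed lines is an assumption.

```python
def verbose_positions(line):
    """Return the ranking positions selected by a VERBOSE line."""
    word, *args = line.split()
    if word not in ("verbose", "VERBOSE"):
        raise ValueError("not a VERBOSE line")
    numbers = [int(a) for a in args]
    if len(numbers) == 1:
        # One argument: the first N positions of the final ranking.
        first, last = 1, numbers[0]
    elif len(numbers) == 2:
        # Two arguments: positions FIRST through LAST, inclusive.
        first, last = numbers
    else:
        raise ValueError("VERBOSE takes one or two arguments")
    if first < 1 or last < first:
        raise ValueError("invalid position range")
    return range(first, last + 1)
```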


The SEQUENCES line allows one to restrict the reported sequences to the given
number.  This directive controls the number of entries in the ``short list''
of recovered database sequences only.  If no SEQUENCES line is given, the
server code will set it to an appropriate default value.


The ALIGNMENTS line allows one to restrict the reported alignments to the given
number.  If no ALIGNMENTS line is given, the server code will set it to an
appropriate default value.  The ALIGNMENTS value cannot exceed 1000.  Values
larger than 1000 are reduced to 1000.


The THRESHOLD line allows one to restrict the reported sequences (and thus
alignments) to only those whose Score exceeds the given THRESHOLD value.
If no THRESHOLD line is given, the server code will set it to an appropriate
default value.  The THRESHOLD value cannot be less than 30.  Values smaller
than 30 are increased to 30.  Notice:  if the THRESHOLD value is too small, you
run the risk of upsetting your mailer program, since chances are that you will
receive a very big file as a reply from the server.
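
The limits on ALIGNMENTS and THRESHOLD described above behave like simple
clamps.  A Python sketch, with hypothetical function names:

```python
MAX_ALIGNMENTS = 1000   # values larger than 1000 are reduced to 1000
MIN_THRESHOLD = 30      # values smaller than 30 are increased to 30


def effective_alignments(requested):
    """Number of alignments the server would report for a requested value."""
    return min(requested, MAX_ALIGNMENTS)


def effective_threshold(requested):
    """Score threshold the server would apply for a requested value."""
    return max(requested, MIN_THRESHOLD)
```

For instance, a request for 5000 alignments with a threshold of 10 would be
treated as 1000 alignments with a threshold of 30.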


The LABEL line *must* now be preceded by the character '>'.


Finally, notice that you need to terminate the sequence with the terminator '1'.


Two example requests follow:

Example 1: 
		pam 250
		sequences 50
		alignments 30
		threshold  100
		begin
		> HBA_HUMAN STANDARD; PRT; 141 AA. P01922; HEMOGLOBIN ALPHA 
		VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFP
		TTKTYFPHFDLSHGSAQVKGHG     KKVADALTNA
		V A H V D D M PNALSALSDLHAHKLRVDPVNFK
		llshcllvtlaahlpaeftpavhasldkflasvstvltskyr
		1

            Note:  all amino acids from "VLSP" through "ltskyr" will be used
            in the search.  Not more than the 50 top scoring sequences will be
            reported in the short list.  Also, the alignments for the top 30
            scoring sequences will be returned.  No reported sequence will have
            a score that is less than 100.

Example 2:
		BLOSUM 62
		BEGIN
		>     Your-Favorite-Label Goes Here
		VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFP
		TTKTYFPHFDLSHGSAQVKGHG     KKVADALTNA

		V A H V D D M PNALSALSDLHAHKLRVDPVNFK

		llshcllvtlaahlpaeftpavhasldkflasvstvltskyr
		1

     	    Note:  all amino acids  from "VLSP" through "ltskyr"  will be used 
	    in the search.  The server code will set the various parameters to
	    appropriate default values.
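
The format rules illustrated by the two examples can be checked before a
request is mailed.  The Python sketch below is one possible reading of those
rules and not the server's own parser: the function name and the returned
dictionary are hypothetical, blank lines are skipped, and directive words are
upper-cased without enforcing the case rule given earlier.

```python
def parse_request(body):
    """Split a request body into directives, label and cleaned sequence."""
    # Drop indentation and blank lines; both appear in the examples.
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    directives = {}
    i = 0
    # Everything before BEGIN is a directive line: WORD followed by arguments.
    while i < len(lines) and lines[i].upper() != "BEGIN":
        word, *args = lines[i].split()
        directives[word.upper()] = args
        i += 1
    if i == len(lines):
        raise ValueError("missing BEGIN line")
    if not directives.keys() & {"PAM", "BLOSUM"}:
        raise ValueError("a PAM or BLOSUM line is mandatory")
    i += 1
    if i >= len(lines) or not lines[i].startswith(">"):
        raise ValueError("label line must start with '>'")
    label = lines[i][1:].strip()
    i += 1
    # Collect residues up to the terminator line '1'.
    residues = []
    while i < len(lines) and lines[i] != "1":
        residues.append("".join(lines[i].split()))
        i += 1
    if i == len(lines):
        raise ValueError("missing terminator '1'")
    sequence = "".join(residues).upper()
    if not 30 <= len(sequence) <= 1000:
        raise ValueError("sequence must contain 30 to 1,000 amino acids")
    return {"directives": directives, "label": label, "sequence": sequence}
```

Applied to either example above, the function should return the label and a
single upper-case sequence with all spaces and line breaks removed.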



SCORING MATRICES:
-----------------
You can use both PAM and BLOSUM scoring matrices. These can be requested via
one of { pam, PAM, blosum, BLOSUM }. The currently supported distances are

for BLOSUM:  30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90, 100

for PAM:     10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150,
	     160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280,
	     290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410,
	     420, 430, 440, 450, 460, 470, 480, 490, and 500.
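
The supported matrices listed above can be written down compactly, since the
PAM distances are the multiples of 10 from 10 to 500.  A Python sketch with a
hypothetical function name:

```python
SUPPORTED_DISTANCES = {
    "BLOSUM": {30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90, 100},
    "PAM": set(range(10, 501, 10)),   # 10, 20, ..., 500
}


def is_supported_matrix(family, distance):
    """Return True if e.g. ("pam", 250) names a supported scoring matrix."""
    # Only the all-lower-case and all-upper-case spellings are listed.
    if family not in ("pam", "PAM", "blosum", "BLOSUM"):
        return False
    return distance in SUPPORTED_DISTANCES[family.upper()]
```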


NOTE ON ALIGNMENT:
------------------
The server's alignment code now implements dynamic programming.  This is not to
be confused with the indexing method that is used to determine the candidates
to align.

The meaning of the variables in the listing that is returned by the dFLASH
server

   .....
   ....

   Score Matrix: PAM250
   Max Reported Sequences:  1000
   Max Reported Alignments: 10
   Score Threshold  At: 65

     Id  Label:                                   Score  NRes  Ex% Tot% Sig  Pk
   ----------------------------------------------------------------------------
      1. HAHU hemoglobin alpha chain - human        655   141 100% 100% 100  89
      2. HACZ hemoglobin alpha chain - chimpanzee   655   141 100% 100% 100  89
      3. HACZP hemoglobin alpha chain - pygmy chi   655   141 100% 100% 100  89
      4. HAGO hemoglobin alpha chain - lowland go   654   141  99% 100%  99  89
      5. HAMQP hemoglobin alpha chain - hanuman l   653   141  97% 100%  99  89
      6. B27792 hemoglobin alpha-1 chain - orangu   649   141  97% 100%  99  89
      7. A25126 hemoglobin alpha-1 chain - Sumatr   649   141  97% 100%  99  89
    ...
    .....
    ..

is the following:

NRes:  the number of residues (amino acids) in the recovered match
Score: sequence  similarity score of the recovered sequence based on the
       selected mutation matrix
Ex%:   percentage of *exact* matching residues
Tot%:  percentage of *total* (=exact+conservative) matching residues
Sig:   100 times the ratio between the actual computed score and the score
       obtained by matching the retrieved sub-segment with itself; the
       denominator is the maximum obtainable score for the sub-segment in
       question (all gaps removed).
Pk:    peak score, i.e. the maximum score value over *any* 20-residue window of
       the recovered match
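
As a rough illustration of the Sig and Pk definitions, the Python sketch below
computes both for a gapped pairwise alignment.  It rests on several
assumptions: pair_score is a toy stand-in (5 for identical residues, -1
otherwise) and not PAM250 or any matrix used by the server, gap columns are
charged a flat hypothetical penalty, and the 20-residue window is taken over
alignment columns.

```python
def pair_score(a, b):
    """Toy substitution score; a stand-in for a real PAM/BLOSUM lookup."""
    return 5 if a == b else -1


def column_scores(query_aln, match_aln, gap_penalty=-4):
    """Per-column scores of two equal-length aligned strings ('-' = gap)."""
    if len(query_aln) != len(match_aln) or not match_aln:
        raise ValueError("alignment rows must be non-empty and equal in length")
    return [gap_penalty if "-" in (q, m) else pair_score(q, m)
            for q, m in zip(query_aln, match_aln)]


def sig(query_aln, match_aln):
    """100 * actual score / score of the matched sub-segment against itself."""
    actual = sum(column_scores(query_aln, match_aln))
    segment = match_aln.replace("-", "")          # all gaps removed
    best = sum(pair_score(c, c) for c in segment)
    if best == 0:
        raise ValueError("matched segment contains no residues")
    return 100.0 * actual / best


def peak(query_aln, match_aln, window=20):
    """Maximum summed score over any window of `window` columns."""
    scores = column_scores(query_aln, match_aln)
    window = min(window, len(scores))
    return max(sum(scores[i:i + window])
               for i in range(len(scores) - window + 1))
```

With identical rows the ratio is 1, which corresponds to the Sig value of 100
shown for the top entries in the listing above.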



TO OBTAIN HELP:
---------------
    You can obtain this message at any moment by sending a message with one of:
{ dflash, dFlash, dFLASH, DFLASH } in the "Subject" line and a body containing
one of { help, HELP, send help, SEND HELP }.


TO OBTAIN ON-LINE REPRINTS OF PAPERS
------------------------------------
    You can obtain reprints (in PostScript) of relevant papers by sending a
message with one of: { dflash, dFlash, dFLASH, DFLASH } in the "Subject" line
and a body containing 

one of {flashpaper, FLASHPAPER, send flashpaper, SEND FLASHPAPER }        
					---> returns to the originator of the 
					request a copy of the FLASH paper

one of {dflashpaper, DFLASHPAPER, send dflashpaper, SEND DFLASHPAPER }        
					---> returns to the originator of the 
					request a copy of a paper that contains
					a description of dFLASH (long)

one of {concertpaper, CONCERTPAPER, send concertpaper, SEND CONCERTPAPER } 
                                        ---> returns to the originator of the 
					request a copy of a high-level paper
					describing the CONCERT/C language

one of {bayespaper, BAYESPAPER, send bayespaper, SEND BAYESPAPER } 
                                        ---> returns to the originator of the 
                                        request a copy of a paper describing 
                                        a computer-vision application based 
                                        on indexing principles similar to 
                                        those of dFLASH (long)

Notice there can only be *one* such request per message!
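
The help and reprint requests follow one pattern: the bare keyword or
"send <keyword>", in either all lower case or all upper case.  A Python sketch
of how a message body might be classified follows; the function name is
hypothetical, and treating any other body as unrecognized is an assumption.

```python
COMMANDS = ("help", "flashpaper", "dflashpaper", "concertpaper", "bayespaper")


def classify_body(body):
    """Return the requested command name, or None if the body matches none."""
    text = body.strip()
    for name in COMMANDS:
        accepted = (name, name.upper(),
                    f"send {name}", f"SEND {name.upper()}")
        # The whole body must be a single request; two keywords in one
        # message match nothing here.
        if text in accepted:
            return name
    return None
```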



OTHER  NOTES:
-------------

(1) for the time being we do not incorporate incremental updates of PIR.
(2) the reply from the server now contains the label on its Subject line; we
    thought this might be useful to some users.
(3) format checking and error reporting have been improved considerably.
(4) at the moment we are putting together the version of the server that will
    allow sequence searches in GenBank.  The current projection is that the
    GenBank search server will be available before the middle of January.
(5) dFLASH searches are currently available through GRAIL of the Oak Ridge
    National Laboratory.

Thank you for your interest in the dFLASH server. 

					Sincerely,

					The dFLASH Group


###############################################################################

COMMENTS??
----------
We will appreciate receiving your feedback, suggestions, comments, or bug
reports; all of these can be sent to "dflash@watson.ibm.com".  Please make sure
your "Subject" line contains the word "comments".

###############################################################################

REFERENCES
----------

If you make use of the dFLASH server, please reference 

     A. Califano and I. Rigoutsos, "FLASH: A Fast Look-up Algorithm for String
     Homology."  In Proceedings of the First International Conference on
     Intelligent Systems for Molecular Biology, July 1993, Bethesda, MD.

If you wish to find out more about the dFLASH server, you can contact Andrea
Califano (acal@watson.ibm.com) or Isidore Rigoutsos (rigoutso@watson.ibm.com).

###############################################################################


For more information on the Concert/C language, please refer to

     J. Auerbach, D. Bacon, A. Goldberg, G. Goldszmidt, A. Gopal, M. Kennedy,
     A. Lowry, J. Russell, W. Silverman, R. Strom, D. Yellin, and S. Yemini,
     "High-level language support  for programming reliable distributed
     systems."  In Proceedings of the International Conference on Computer
     Languages, April 1992, Oakland, California.

or contact Josh Auerbach (jsa@watson.ibm.com)

###############################################################################


---------------------------------- Cut Here  ----------------------------------



