Access to sequence analysis of yeast proteins

Chris Sander sander at embl-heidelberg.de
Thu Mar 14 13:26:24 EST 1996


To:   Yeast community
From: GeneQuiz Team
Re:   This note explains how to access the results of a computer
      sequence analysis of yeast protein sequences.

Reply to: GeneQuiz at embl-heidelberg.de


ANALYSIS OF PUBLICLY AVAILABLE YEAST PROTEIN SEQUENCES

A computer sequence analysis of more than 5000 yeast protein 
sequences was performed March 4-7, 1996, using the most recent 
public sequence databases. The automated analysis was done using the 
GeneQuiz software developed by current and former EMBL scientists on 
a 64 processor powerCHALLENGE array at the Silicon Graphics 
Supercomputing Technology Center in Cortaillod, Switzerland. The 
results of the analysis are available on the Internet and can be 
viewed using WWW browsers such as Netscape or Mosaic at the 
following Web sites.

Overview of the March 4-7 runs:

	http://genecrunch.sgi.ch     (Europe)
	http://genecrunch.sgi.com    (USA)

Direct access to the yeast results:

	http://genecrunch.sgi.ch/yeast.html   (Europe)
	http://genecrunch.sgi.com/yeast.html  (USA)

Access to GeneQuiz summaries of Mycoplasma genitalium and 
Haemophilus influenzae as well as yeast:

	http://www.sander.embl-heidelberg.de/genequiz/

...  enjoy !

Georg Casari, Reinhard Schneider, Antoine de Daruvar, Chris Sander

13-March-96, EMBL Heidelberg-Cambridge


READ ON FOR MORE INFORMATION AND USER GUIDE TO THE GENEQUIZ SERVER

For proteins with an informative functional annotation already in
the public sequence databases, the GeneQuiz summary simply mirrors 
the database functional assignments. For proteins of unknown 
function, the analysis aims at the prediction of protein function 
and 3D-structure by homology, as deduced from sequence similarity.

Homology information is labeled as 'clear', 'tentative' or only 
'marginal', depending on the level of sequence similarity. The 
corresponding functional assignments are those of the homologues in 
the database and provide a hypothesis (or prediction) regarding the 
function of the search protein. Note that with increasing 
evolutionary distance the function of the search protein may differ 
significantly from that of the homologues.

The results of this large scale sequence analysis are a distillation 
of massive amounts of data. To respond efficiently to your questions, 
we provide the results through queries, enabling you to select sets 
of proteins of particular interest to you. Criteria for selection 
can be a search string, a gene name, a chromosome number, etc.

Examples of queries available through the WWW pages:

*   What is the closest homologue and the number of homologues in
    the databases for RA51_YEAST (chr5) ?
    Answer: RA51_HUMAN, 24 homologues.

*   Which proteins code for AMD genes ?
    Answer: AMDM_YEAST (chr13), AMDY_YEAST (chr4).

*   Which proteins on chromosome 3 have a homologue of known 3D
    structure at the level of clear homology ?
    Answer: KCC4_YEAST, LEU3_YEAST, CISZ_YEAST (3D model available),
    etc.

*   How many proteins on chromosome 14 have "KINASE" in the derived
    annotation, i.e., are homologous to a kinase ?
    Answer: 9, of which 3 are probable protein kinases (KNOS_YEAST,
    SCCHXIV43_19, etc.).

*   Which proteins located on chromosome 5 have a tentatively
    predicted function ?
    Answer: left as an exercise to the reader ...
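
As a rough illustration of how such selection criteria combine, here 
is a minimal Python sketch. It is not the GeneQuiz server code. The 
Protein record and the select() function are hypothetical names 
introduced only for this example; the identifiers and chromosome 
numbers are the ones quoted in the examples above.

```python
from typing import NamedTuple, Optional


class Protein(NamedTuple):
    """Hypothetical record: identifier plus chromosome number."""
    identifier: str
    chromosome: int


# Identifiers and chromosome numbers as quoted in the examples above.
PROTEINS = [
    Protein("RA51_YEAST", 5),
    Protein("AMDM_YEAST", 13),
    Protein("AMDY_YEAST", 4),
    Protein("KCC4_YEAST", 3),
    Protein("LEU3_YEAST", 3),
    Protein("CISZ_YEAST", 3),
    Protein("KNOS_YEAST", 14),
]


def select(proteins, search: Optional[str] = None,
           chromosome: Optional[int] = None):
    """Return the proteins that satisfy every criterion supplied."""
    hits = []
    for protein in proteins:
        # Case-insensitive substring match on the identifier.
        if search is not None and search.upper() not in protein.identifier.upper():
            continue
        # Exact match on the chromosome number.
        if chromosome is not None and protein.chromosome != chromosome:
            continue
        hits.append(protein)
    return hits


if __name__ == "__main__":
    print([p.identifier for p in select(PROTEINS, search="AMD")])
    print([p.identifier for p in select(PROTEINS, chromosome=3)])
```

Criteria that are left out are simply ignored, so the same function 
covers a search by string, by chromosome, or by both together.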

Note that some proteins may appear more than once in the lists you 
obtain. This occurs when two databases contain the same protein in 
slightly different form (example: swissprot:amdy_yeast is 99.8% 
identical to trembl:sc8419_9). The set analyzed here contains 
6613 sequences, probably corresponding to more than 5000 unique 
proteins (the entire yeast genome is estimated to have about 6200 
unique proteins). Although this set has been cleaned to remove 
multiple entries (100% identity), the more subtle redundancies will 
remain until the complete genome is released (announced for spring 
1996) and the duplications in the genome can be reliably 
distinguished from duplications in the databases. See
http://genecrunch.sgi.ch/nryeast.html for a more detailed 
description of the sequence set analyzed. Note also that some 
sequences have compositionally biased regions masked as 'XXXX' in 
the alignment reports, i.e., 'X' does not mean 'unknown' but 
'removed for purposes of improving search selectivity'.
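
To make the distinction between exact and near duplicates concrete, 
here is a small Python sketch. It is an illustration only, not the 
procedure used to build the sequence set. The entry names and the 
short sequences are invented toy data, and percent_identity() 
assumes two already aligned sequences of equal length.

```python
def remove_exact_duplicates(entries):
    """Keep the first entry seen for each distinct sequence string."""
    seen = set()
    unique = []
    for name, sequence in entries:
        if sequence not in seen:
            seen.add(sequence)
            unique.append((name, sequence))
    return unique


def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Invented toy entries: two are identical, one differs at one position.
ENTRIES = [
    ("db1:entry_a", "MKTAYIAKQR"),
    ("db2:entry_b", "MKTAYIAKQR"),
    ("db2:entry_c", "MKTAYIAKQK"),
]

if __name__ == "__main__":
    cleaned = remove_exact_duplicates(ENTRIES)
    print([name for name, _ in cleaned])
    print(percent_identity(cleaned[0][1], cleaned[1][1]))
```

Removing exact duplicates drops entry_b but keeps entry_c, which 
differs in a single residue. This is the kind of near-identical pair 
that can still show up twice in a result list.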

The GeneQuiz yeast server is likely to evolve and expand in 
functionality in response to your feedback and as the result of new 
developments in the GeneQuiz methodology.

We hope that the information accessible from the GeneQuiz yeast 
server will be useful in planning future experiments on gene 
function.

Let us know what you think.

GeneQuiz at embl-heidelberg.de

____________________________________________________________________
Credits

We are indebted to all scientists world-wide who have made
sequences and other experimental results publicly available, 
especially to all those involved in the yeast genome sequencing 
projects and to the staff of database centers such as EMBL-EBI, 
GenBank, DDBJ, Swissprot, PIR, MIPS, YPD, and SGD.

GeneQuiz is a collaborative effort between scientists at 
EMBL-Heidelberg and EMBL-EBI (the European Bioinformatics Institute 
near Cambridge) and former EMBL scientists at CNB Madrid, MDC 
Berlin, and SRI Menlo Park: Georg Casari, Antoine de Daruvar, 
Reinhard Schneider, Michael Scharf, Peer Bork, Miguel Andrade, 
Javier Tamames, Alfonso Valencia, Christos Ouzounis and Chris 
Sander. Special thanks to: Thure Etzold, Burkhard Rost and Gerrit
Vriend, EMBL.

The GeneCrunch team at the Silicon Graphics Supercomputing 
Technology Center at Cortaillod, Switzerland included: Pam Bremer, 
Michael Schlenkrich, Richard Mercille, Horst Vollhardt, Ron Larson, 
Christophe Desperrier, Ove Hansen, Oliver Enzmann.

Thanks to all  :-))
____________________________________________________________________


