ANNOUNCEMENT OF MUCIN TYPE O-GLYCOSYLATION PREDICTION E-MAIL SERVER

Jan Hansen janhan
Mon Jun 26 07:17:43 EST 1995


*************** NetOglyc Mail Server V1.0 *************** 
                      
Prediction of Mucin type O-glycosylation of mammalian proteins



Center for Biological Sequence Analysis
The Technical University of Denmark
DK-2800 Lyngby, Denmark  

DESCRIPTION:

The  NetOglyc  mail  server  is  a  service  producing  neural  network
predictions of mucin type O-glycosylation sites in mammalian proteins as 
described in: 
J.E. Hansen, O. Lund, J. Engelbrecht, H. Bohr, J.O. Nielsen, J.E.S.
Hansen and  S.  Brunak,  Prediction  of  O-glycosylation  of  mammalian
proteins:  Specificity patterns of UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase.  
The   Biochemical   Journal,   308, 801-813, 1995. 

ABSTRACT:

The specificity of the enzyme(s) catalyzing the covalent  link  between
the hydroxyl side-chains of serine or threonine and  the  sugar  moiety
GalNAc is unknown. Pattern recognition by  artificial  neural  networks
and weight matrix algorithms  was  performed  to  determine  the  exact
position of in vivo O-linked GalNAc glycosylated serine  and  threonine
residues from the primary sequence exclusively. The  acceptor  sequence
context for O-glycosylation of serine was found to differ from that  of
threonine and the two types  were  therefore  treated  separately.  The
context of the sites showed a high abundance  of  proline,  serine  and
threonine extending far beyond the previously reported region  covering
positions -4 through +4  relative  to  the  glycosylated  residue.  The
O-glycosylation sites  were  found  to  cluster  and  to  have  a  high
abundance in the amino-terminal part of the  protein.  The  sites  were
also found to have an increased preference for three different  classes
of beta-turns. No simple consensus like rule could be deduced  for  the
complex glycosylation sequence acceptor patterns. The  neural  networks
were trained on the hitherto largest data  material  consisting  of  48
carefully   examined    mammalian    glycoproteins    comprising    264
O-glycosylation sites. For detection  neural  network  algorithms  were
much more reliable than weight matrices. The networks  correctly  found
60-95% of the O-glycosylated serine/threonine residues  and  89-97%  of
the non-glycosylated residues in two independent  test  sets  of  known
glycoproteins.  A  computer  server  using  E-mail  for  prediction  of
O-glycosylation sites has been implemented and made publicly available. 

FURTHER INFORMATION:

The NetOglyc server returns a help file if the submitted file  contains
the word `help'. 
        
CONFIDENTIALITY:

Your submitted sequences  will  be  deleted  automatically  immediately
after processing by NetOglyc. 

PAPER TO REFERENCE IN REPORTING RESULTS:

Jan E. Hansen, Ole  Lund,  Jacob  Engelbrecht,  Henrik  Bohr,  Jens  O.
Nielsen,  John-E.S.   Hansen,   and   Soren   Brunak.   Prediction   of
O-glycosylation  of  mammalian  proteins:   Specificity   patterns   of
UDP-GalNAc:polypeptide  N-acetylgalactosaminyltransferase.  Biochemical
Journal 308, 801-813, 1995. 

COMMENTS AND SUGGESTIONS:

Since an expanded data set  with  additional  O-glycosylated  sequences
would increase the performance of the network, we are  very  interested
in receiving such material. If you  have  knowledge  of  experimentally
determined O-glycosylation sites in glycoproteins not  already  in  the
data set (see reference Biochem. J. 308, 801-813, 1995.) we would  like
to include them. Any other comments regarding the  predictions  or  the
data may be sent to: 

   Jan Hansen (janhan at cbs.dtu.dk)
        
   Center for Biological Sequence Analysis
   The Technical University of Denmark
   Building 206
   DK-2800 Lyngby
   Denmark
   
   Tel: +45 45252485
   Fax: +45 45934808

PROBLEMS: 

Should be addressed to: 

   Kristoffer Rapacki (rapacki at cbs.dtu.dk)
   
   or
 
   Karsten Dalsgaard (karsten at cbs.dtu.dk)

   Center for Biological Sequence Analysis
   The Technical University of Denmark
   Building 206
   DK-2800 Lyngby
   Denmark
   
   Tel: +45 45252477
   Fax: +45 45934808  

-----------------------------------------------------------------------

INSTRUCTIONS for using the NetOglyc mail server:

In order to use the mail server for prediction on amino acid sequences: 

1) Prepare a text file including one or more sequences. Each sequence
must be preceded by a line starting with the symbol > followed by a name
(identifier) of the sequence. 
The following lines contain the sequence. There must be at least one
character on each line of each sequence. 

The sequences must be submitted using the one letter abbreviations  for
the amino acids: `acdefghiklmnpqrstvwyACDEFGHIKLMNPQRSTVWY'. 
N.B. Other characters will be accepted, but not encoded in the  network
window, when making the prediction.  
        
Example: Create a text file `sequence.txt' using an editor. The contents
of the file may look like this: 
        
>seq_name1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVY
GETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASG
NNFVECT
>seq_name2
TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRMTLEKVVFESHKCVVLGSHIVHAKMEVGDLAATKG
GHAWAMGFAETIPMYFEIAYAETPKSANAAVIYPKGD

2) Mail the text file to NetOglyc at cbs.dtu.dk: 

In the UNIX environment you may mail the text  file  `sequence.txt'  to
NetOglyc at genome.cbs.dtu.dk by typing:

mail NetOglyc at .cbs.dtu.dk < sequence.txt

3) You will receive a mail containing the prediction, or possibly error
messages from the server. If the file contains the  word  `help',  this
help file will be returned. Response time depends on system load. 

4) The WWW server at http://www.cbs.dtu.dk/ may also be used.
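
The file layout described in step 1 can be sketched in Python. This is
only an illustration, not part of the server: the function names are
hypothetical, and the line width of 68 is an assumption taken from the
example above (the instructions do not state a limit).

```python
# One-letter amino acid codes listed in the instructions above.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def format_entry(name, sequence, width=68):
    """Return one entry: a '>' name line followed by the sequence lines.

    width=68 mirrors the example file; it is an assumption, not a
    documented limit of the server.
    """
    seq = "".join(sequence.split())  # drop spaces and line breaks
    if not seq:
        raise ValueError("empty sequence")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def unencoded_characters(sequence):
    """Return characters outside the 20-letter alphabet, sorted.

    Per the N.B. above, such characters are accepted by the server
    but not encoded in the network window.
    """
    seq = "".join(sequence.split()).upper()
    return sorted(set(seq) - AMINO_ACIDS)


if __name__ == "__main__":
    entry = format_entry("seq_name2", "TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRM")
    print(entry, end="")
    print(unencoded_characters("ACDX*"))
```

The output of format_entry can be written to `sequence.txt' and mailed
as described in step 2.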


FORMAT OF NetOglyc PREDICTION OUTPUT:

IDENTIFIER:               <sequence name>     
LENGTH:                   <length of sequence in amino acids>  
DISTRIBUTION:             <number of predicted O-glycosylations>

SSTTGVAMHTSTSSSVTKSYISSQT   <sequence>
s........s.s.....s..s...   <Predicted assignment (serine)>

SINGLE RESIDUE ACTIVITIES:

ID          <sequence name>
POSITION    <position in sequence of serines or threonines>
RESIDUE     <amino acid>
ASSIGNMENT  <predicted assignment: t or s=O-glycosylated threonine or
             serine, .=non-glycosylated>
ACTIVITY    <prediction strength; values above the threshold of 0.5
             denote O-glycosylated serine or threonine>
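
The relation between ACTIVITY and ASSIGNMENT can be written as a small
Python sketch. It assumes that "above 0.5" means strictly greater than
0.5; the function name is hypothetical and this is not the server's own
code.

```python
THRESHOLD = 0.5  # threshold stated in the format description


def assign(residue, activity, threshold=THRESHOLD):
    """Map an activity to the assignment character used in the output.

    Returns 's' or 't' for a predicted O-glycosylated serine or
    threonine, and '.' otherwise. Strict '>' is an assumption.
    """
    residue = residue.upper()
    if residue not in ("S", "T"):
        raise ValueError("only serine (S) and threonine (T) are scored")
    return residue.lower() if activity > threshold else "."


if __name__ == "__main__":
    print(assign("T", 0.522))  # activity above the threshold
    print(assign("S", 0.404))  # activity below the threshold
```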

 
EXAMPLE OF OUTPUT OF PREDICTION OF seq_name1 mentioned above. 

 NetOglyc Mail Server Output 
 Prediction for: THREONINE RESIDUES

 IDENTIFIER: seq_name1    LENGTH:     143
 DISTRIBUTION: t:       1

ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
................t..............................................................
..............................................................

 SINGLE RESIDUE ACTIVITIES:
 (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)

 seq_name1         5 T .  0.150
 seq_name1        18 T t  0.522
 seq_name1        32 T .  0.283
 seq_name1        71 T .  0.376
 seq_name1       130 T .  0.188
 seq_name1       132 T .  0.312
 seq_name1       143 T .  0.157

 NetOglyc Mail Server Output 
 Prediction for: SERINE RESIDUES   

 IDENTIFIER: seq_name1    LENGTH:     143
 DISTRIBUTION: s:       1

ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
...............................................................................
....................s.........................................

 SINGLE RESIDUE ACTIVITIES:
 (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)

 seq_name1         8 S .  0.243
 seq_name1        12 S .  0.181
 seq_name1        13 S .  0.290
 seq_name1        14 S .  0.404
 seq_name1        17 S .  0.043
 seq_name1        35 S .  0.186
 seq_name1        37 S .  0.227
 seq_name1        51 S .  0.089
 seq_name1        53 S .  0.087
 seq_name1        54 S .  0.046
 seq_name1        63 S .  0.390
 seq_name1        64 S .  0.075
 seq_name1        74 S .  0.077
 seq_name1        76 S .  0.203
 seq_name1        90 S .  0.089
 seq_name1        92 S .  0.087
 seq_name1        93 S .  0.046
 seq_name1       102 S s  0.618
 seq_name1       103 S .  0.177
 seq_name1       108 S .  0.202
 seq_name1       111 S .  0.197
 seq_name1       135 S .  0.120
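
For further processing, the SINGLE RESIDUE ACTIVITIES rows can be read
with a short Python sketch such as the one below. The column layout is
inferred from the example output above and may differ in other versions
of the server; all names in the code are hypothetical.

```python
import re
from typing import NamedTuple


class Activity(NamedTuple):
    seq_id: str
    position: int
    residue: str
    assignment: str
    activity: float


# ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY, separated by whitespace.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+([ST])\s+([st.])\s+(\d*\.\d+)\s*$")


def parse_activities(text):
    """Return an Activity for each row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            seq_id, pos, res, assignment, act = match.groups()
            rows.append(Activity(seq_id, int(pos), res, assignment, float(act)))
    return rows


def predicted_sites(rows):
    """Return (id, position, residue) for rows assigned 's' or 't'."""
    return [(r.seq_id, r.position, r.residue) for r in rows if r.assignment != "."]


if __name__ == "__main__":
    sample = (
        " SINGLE RESIDUE ACTIVITIES:\n"
        " (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)\n"
        "\n"
        " seq_name1         5 T .  0.150\n"
        " seq_name1        18 T t  0.522\n"
        " seq_name1       102 S s  0.618\n"
    )
    print(predicted_sites(parse_activities(sample)))
```

Applied to the example above, this lists threonine 18 and serine 102 as
the predicted sites.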

CURRENT NETWORK:

The network will be updated, and predictions may change between
versions. The network is balanced to give optimal predictions whether
or not the submitted sequences are homologous to the known
O-glycosylated proteins. If, however, a submitted sequence is very
close or identical to a sequence in our training data set, we will
notify you by sending both the assignment of the homologous (or
identical) sequence in our data set and the prediction.

Jan Hansen (janhan at cbs.dtu.dk)



