ANNOUNCEMENT OF O-GLYCOSYLATION PREDICTION SERVER

Jan Hansen janhan
Tue Jul 4 02:57:13 EST 1995


*************** NetOglyc Mail Server V1.0 ***************

Prediction of Mucin type O-glycosylation of mammalian proteins

Center for Biological Sequence Analysis The Technical University of Denmark
DK-2800 Lyngby, Denmark

DESCRIPTION:

The NetOglyc mail server is a service producing neural network predictions of
mucin type O-glycosylation sites in mammalian proteins as described in: 
J.E. Hansen, O. Lund, J. Engelbrecht, H. Bohr, J.O. Nielsen, J-E.S. Hansen 
and S. Brunak.
Prediction of O-glycosylation of mammalian proteins: Specificity patterns of 
UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase.
The Biochemical Journal, 308, 801-813, 1995.

ABSTRACT:

The specificity of the enzyme(s) catalyzing the covalent link between the
hydroxyl side-chains of serine or threonine and the sugar moiety GalNAc is
unknown. Pattern recognition by artificial neural networks and weight matrix
algorithms was performed to determine the exact position of in vivo O-linked
GalNAc glycosylated serine and threonine residues from the primary sequence
exclusively. The acceptor sequence context for O-glycosylation of serine was
found to differ from that of threonine and the two types were therefore treated
separately. The context of the sites showed a high abundance of proline, serine
and threonine extending far beyond the previously reported region covering
positions -4 through +4 relative to the glycosylated residue. The
O-glycosylation sites were found to cluster and to have a high abundance in the
amino-terminal part of the protein. The sites were also found to have an
increased preference for three different classes of beta-turns. No simple
consensus like rule could be deduced for the complex glycosylation sequence
acceptor patterns. The neural networks were trained on the hitherto largest
data material consisting of 48 carefully examined mammalian glycoproteins 
comprising 264 O-glycosylation sites. For detection neural network algorithms 
were much more reliable than weight matrices. The networks correctly found 
60-95% of the O-glycosylated serine/threonine residues and 89-97% of the 
non-glycosylated residues in two independent test sets of known glycoproteins. 
A computer server using E-mail for prediction of O-glycosylation sites has 
been implemented and made publicly available.

FURTHER INFORMATION:

The NetOglyc server returns a help file if the submitted file contains the word
`help'.

CONFIDENTIALITY:

Your submitted sequences will be deleted automatically immediately after
processing by NetOglyc.

PAPER TO REFERENCE IN REPORTING RESULTS:

Jan E. Hansen, Ole Lund, Jacob Engelbrecht, Henrik Bohr, Jens O.  Nielsen,
John-E.S. Hansen, and Soren Brunak. Prediction of O-glycosylation of mammalian
proteins: Specificity patterns of UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase. Biochemical Journal 308, 801-813, 1995.

COMMENTS AND SUGGESTIONS:

Since an expanded data set with additional O-glycosylated sequences would
increase the performance of the network, we are very interested in receiving
such material. If you have knowledge of experimentally determined
O-glycosylation sites in glycoproteins not already in the data set (see
reference Biochem. J. 308, 801-813, 1995.) we would like to include them. Any
other comments regarding the predictions or the data may be sent to:

 Jan Hansen (janhan at cbs.dtu.dk)

 Center for Biological Sequence Analysis The Technical University of Denmark
 Building 206 DK-2800 Lyngby Denmark

 Tel: +45 45252485 Fax: +45 45934808

PROBLEMS:

Should be addressed to:

 Kristoffer Rapacki (rapacki at cbs.dtu.dk)

 or

 Karsten Dalsgaard (karsten at cbs.dtu.dk)

 Center for Biological Sequence Analysis The Technical University of Denmark
 Building 206 DK-2800 Lyngby Denmark

 Tel: +45 45252477 Fax: +45 45934808

-----------------------------------------------------------------------

INSTRUCTIONS for using the NetOglyc mail server:

In order to use the mail server for prediction on amino acid sequences:

1) Prepare a text file including one or more sequences. Each sequence must be
preceded by a line starting with the symbol > followed by a name
(identifier) of the sequence. The following lines contain the sequence. There
must be at least one character on each line of each sequence. Note: Any
character after the symbol > will be interpreted as sequence.

The sequences must be submitted using the one letter abbreviations for the
amino acids: `acdefghiklmnpqrstvwyACDEFGHIKLMNPQRSTVWY'.  N.B. Other characters
will be accepted, but not encoded in the network window when making the
prediction.

Example: Create a text file `sequence.txt' using an editor. The syntax of the
file may look like this:

>seq_name1 
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVY
GETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASG
NNFVECT
>seq_name2 
TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRMTLEKVVFESHKCVVLGSHIVHAKMEVGDLAATKG
GHAWAMGFAETIPMYFEIAYAETPKSANAAVIYPKGD
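
Before mailing, the rules in step 1 can be checked locally. The following is
a minimal sketch in Python, not part of the NetOglyc server; the function name
check_submission and the wording of the messages are assumptions made for
illustration only. It splits a submission into (name, sequence) records and
reports lines that break the rules above, or that contain characters the
network would accept but not encode.

```python
# Hypothetical pre-submission check; not the server's own validation.
VALID = set("ACDEFGHIKLMNPQRSTVWY")  # one letter codes listed above, upper-cased


def check_submission(text):
    """Return (records, problems) for a submission in the format of step 1."""
    records = []    # list of (name, sequence) tuples
    problems = []   # human-readable messages, one per offending line
    name, parts = None, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
            if not name:
                problems.append(f"line {lineno}: header has no identifier")
        elif name is None:
            problems.append(f"line {lineno}: sequence data before first '>' header")
        elif not line:
            problems.append(f"line {lineno}: empty line inside sequence '{name}'")
        else:
            unknown = sorted(set(line.upper()) - VALID)
            if unknown:
                problems.append(
                    f"line {lineno}: characters not encoded by the network: "
                    + "".join(unknown)
                )
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records, problems
```

Unknown characters are reported as warnings rather than errors, since the
server accepts them and only leaves them out of the network window.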

2) Mail the text file to NetOglyc at cbs.dtu.dk:

In the UNIX environment you may mail the text file `sequence.txt' to
NetOglyc at genome.cbs.dtu.dk by typing:

mail NetOglyc at genome.cbs.dtu.dk < sequence.txt

3) You will receive a mail containing the prediction, or possibly error
messages from the server. If the file contains the word `help', this help
file will be returned. Response time depends on system load.

4) A www server: http://www.cbs.dtu.dk/netOglyc/cbsnetOglyc.html may also be
used.


FORMAT OF NetOglyc PREDICTION OUTPUT:

IDENTIFIER:    	<sequence name> 
LENGTH:     	<length of sequence in amino acids>
DISTRIBUTION:   <number of predicted O-glycosylations>

 SSTTGVAMHTSTSSSVTKSYISSQT <sequence>
 .s........s.s.....s..s... <Predicted O-glycosylated assignment (serine)>



SINGLE RESIDUE ACTIVITIES:

ID   	   <sequence name>
POSITION   <position in sequence of serines or threonines>
RESIDUE    <amino acid> 
ASSIGNMENT <predicted assignment: s or t=O-glycosylated, .=non-glycosylated> 
ACTIVITY   <prediction strength; values above the threshold of 0.5 mean
            O-glycosylated serine or threonine>
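
The relation between ACTIVITY and ASSIGNMENT can be sketched as follows. This
is an illustration in Python under stated assumptions, not the server's code:
the function name assignment_string and the dictionary layout (1-based
position mapped to activity) are hypothetical. Only the threshold of 0.5 and
the s / t / . symbols are taken from the description above.

```python
THRESHOLD = 0.5  # threshold stated in the ACTIVITY description


def assignment_string(sequence, activities, residue):
    """Build the assignment line for one residue type ("S" or "T").

    activities maps 1-based sequence positions to prediction strengths.
    Positions scoring above THRESHOLD are marked with the lower-case
    residue letter; every other position is marked with a dot.
    """
    marks = []
    for pos, aa in enumerate(sequence.upper(), start=1):
        score = activities.get(pos)
        if aa == residue and score is not None and score > THRESHOLD:
            marks.append(residue.lower())
        else:
            marks.append(".")
    return "".join(marks)
```

For instance, a threonine at position 18 with activity 0.522 would be marked
`t', while one at position 5 with activity 0.150 would remain a dot.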


EXAMPLE OF OUTPUT OF PREDICTION OF seq_name1 mentioned above.

NetOglyc Mail Server Output Prediction for: THREONINE RESIDUES

Message 1/1 From NetOglyc mail server     Jun 26 '95 at 12:27 pm 120

IDENTIFIER: seq_name1 LENGTH:  143 DISTRIBUTION: t:  1
 
 ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
 YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
 .................t..............................................................
 ...............................................................

 SINGLE RESIDUE ACTIVITIES:  (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)

 seq_name1    5 T . 0.150 
 seq_name1   18 T t 0.522 
 seq_name1   32 T . 0.283
 seq_name1   71 T . 0.376 
 seq_name1  130 T . 0.188 
 seq_name1  132 T . 0.312
 seq_name1  143 T . 0.157

 NetOglyc Mail Server Output Prediction for: SERINE RESIDUES

 IDENTIFIER: seq_name1 LENGTH:  143 DISTRIBUTION: s:  1

 ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
 YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
 ................................................................................
 .....................s.........................................

 SINGLE RESIDUE ACTIVITIES:  (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)

 seq_name1    8 S . 0.243
 seq_name1   12 S . 0.181 
 seq_name1   13 S . 0.290
 seq_name1   14 S . 0.404 
 seq_name1   17 S . 0.043 
 seq_name1   35 S . 0.186
 seq_name1   37 S . 0.227 
 seq_name1   51 S . 0.089 
 seq_name1   53 S . 0.087
 seq_name1   54 S . 0.046 
 seq_name1   63 S . 0.390 
 seq_name1   64 S . 0.075
 seq_name1   74 S . 0.077 
 seq_name1   76 S . 0.203 
 seq_name1   90 S . 0.089
 seq_name1   92 S . 0.087 
 seq_name1   93 S . 0.046 
 seq_name1  102 S s 0.618
 seq_name1  103 S . 0.177 
 seq_name1  108 S . 0.202 
 seq_name1  111 S . 0.197
 seq_name1  135 S . 0.120
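
A returned report can be read back into a program by parsing the SINGLE
RESIDUE ACTIVITIES rows. The sketch below is written in Python and assumes
the five-column layout shown in the example above (ID, POSITION, RESIDUE,
ASSIGNMENT, ACTIVITY); the names Site, parse_sites and predicted are
hypothetical and not part of NetOglyc.

```python
import re
from typing import NamedTuple


class Site(NamedTuple):
    seq_id: str
    position: int
    residue: str      # "S" or "T"
    assignment: str   # "s", "t" or "."
    activity: float


# One row of the activities table, e.g. " seq_name1   18 T t 0.522 "
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+([ST])\s+([st.])\s+(\d*\.\d+)\s*$")


def parse_sites(report):
    """Collect every activities row found in a report; other lines are skipped."""
    sites = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            sites.append(
                Site(m.group(1), int(m.group(2)), m.group(3),
                     m.group(4), float(m.group(5)))
            )
    return sites


def predicted(sites, threshold=0.5):
    """Return the sites whose activity lies above the threshold."""
    return [s for s in sites if s.activity > threshold]
```

Applied to the example output above, predicted() would keep the threonine at
position 18 and the serine at position 102, matching the DISTRIBUTION lines.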

CURRENT NETWORK:

The network will be updated, and predictions may change between
versions. The network is balanced to give optimal predictions whether
or not the submitted sequences are homologous to the known O-glycosylated
proteins. If, however, a submitted sequence is identical to
a sequence in our training data set, we will notify you by sending
you both the assignment of the identical sequence in our data set
and the prediction.



