Prediction Bacterial coding regions New server

Victor V. Solovyev solovyev at cmb.bcm.tmc.edu
Fri Mar 31 00:58:29 EST 1995


*************************************************************************
	CDSB - Prediction of Protein coding regions in Bacterial sequences
 			(Version 1.   3.21.95)
	Department of Cell Biology, Baylor College of Medicine
=========================================================================
	Analysis of uncharacterized bacterial sequences is available through the
University of Houston and Weizmann Institute of Science e-mail servers. Send a
file containing your sequence to one of the addresses below, with the name of
the program (cdsb) in the subject line.

Examples:
mail -s cdsb service at theory.bchs.uh.edu < test.seq
mail -s cdsb services at bioinformatics.weizmann.ac.il < test.seq

where test.seq is a file containing your sequence.

Method description:
**********************
   The method is based on discriminant analysis of open reading frames
   flanked by an ATG (or GTG) start codon and a STOP codon. Prediction is
   performed by a linear discriminant function that combines characteristics
   of the 5'- and 3'-mRNA regions and of the coding region of each open
   reading frame. The program parameters were calculated from annotated
   E. coli gene sequences.
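
   The two steps named above (collecting candidate open reading frames and
   scoring each with a linear function) can be illustrated with a minimal
   sketch. This is not the CDSB code: the function names, the forward-strand
   restriction, the choice to report the region without its stop codon, and
   the feature and weight values are all assumptions made for illustration.
   The actual features and trained weights are not given in this announcement.

```python
# Illustrative sketch only; names and conventions here are hypothetical.
START_CODONS = {"atg", "gtg"}
STOP_CODONS = {"taa", "tag", "tga"}


def find_orfs(seq, min_codons=1):
    """List candidate ORFs on the forward strand.

    Returns (start, end) pairs, 1-based and inclusive, running from an
    ATG/GTG codon to the last codon before the next in-frame stop codon.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        starts = []  # in-frame start codons seen since the previous stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START_CODONS:
                starts.append(i)
            elif codon in STOP_CODONS:
                for s in starts:
                    if (i - s) // 3 >= min_codons:
                        orfs.append((s + 1, i))  # stop codon excluded
                starts = []
    return sorted(orfs)


def discriminant_score(features, weights, bias=0.0):
    """Linear discriminant: bias plus the weighted sum of the features."""
    return bias + sum(w * x for w, x in zip(weights, features))


if __name__ == "__main__":
    demo = "ccatgaaagtgccctaag"
    for start, end in find_orfs(demo):
        print(start, end)
    # Placeholder feature values and weights, not trained parameters.
    print(discriminant_score([1.0, 2.0], [0.5, 0.25], bias=-0.5))
```

   An ORF would be reported as a predicted coding region when its score
   exceeds the chosen threshold; how CDSB resolves overlapping candidates is
   not described here and is left out of the sketch.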

Accuracy:
********************************
  Accuracy is about 94% in recognizing coding nucleotides within a
sequence up to 400000 bp long. The program predicts nonoverlapping
protein coding regions in a given sequence.

Submitting sequences via email:
***********************************
  For email submission the sequences must have the following format:  

Name of your sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
aagacagatacatcccaccatgatcaggatcacccaaccttcaacaagatcacccccaac
ctggctgagttcgccttcagcctataccgccagctggcacaccagtccaacagcaccaat
atcttcttctccccagtgagcatcg...............

   (Restrict the line length to 80 characters or less).
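
   A file in this layout can be produced with a few lines of Python. The
   sketch below is only a convenience helper under stated assumptions: the
   function name format_submission and the default width of 60 characters
   (taken from the sample above) are not part of the server, and lowercasing
   simply mirrors the sample rather than a documented requirement.

```python
def format_submission(name, sequence, width=60):
    """Return the text of a submission file: a name line, then the sequence
    wrapped to `width` characters per line (the server asks for 80 or less).
    """
    if not 1 <= width <= 80:
        raise ValueError("line width must be between 1 and 80 characters")
    # Remove any whitespace or line breaks already present in the sequence.
    cleaned = "".join(sequence.split()).lower()
    lines = [name.strip()]
    lines += [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_submission("MYSEQ", "ACGT" * 40)
    with open("test.seq", "w") as handle:
        handle.write(text)
```

   The resulting test.seq can then be mailed as shown in the examples above.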


CDSB output:
******************
   1st line - name of the program
   2nd line - name of your sequence
   3rd line - length of your sequence, threshold (will be optional),
              number of potential CDS
   4th line and next - positions and scores of predicted coding regions
              and amino acid sequences of predicted CDS
   For example:

 CDSB search for protein coding regions in E.coli sequences
 ECAPTS       3708 bp    DNA             BCT                                                                              
Length:   3708 Threshold:   0.0, # of potential CDS:   3
   152 -    886 w=  0.94
  1612 -   2277 w=  0.26
  2572 -   3300 w=  0.72
 CDS-      1  Amino acid sequence -    245aa
MKKVLIAALIAGFSLSATAAETIRFATEASYPPFESIDANNQIVGFDVVDLAQALCKEID
ATCTFSNQAFDSLIPSLKFRRVEAVMGGMDITPEREKQVLFTTPYYDNSALFVGQQGKYT
SVDQLKGKKVRSVQNGTTHQKFIMDKHPEITTVPYDSYQNAKLDLENGRIDGVFGDTAVV
HEWLKDNPKLVVVGDKVTDKDYFGTGLGIAVRQGNTELQQKLNTALEKVKKDGTYETIYN
KWFQK
 CDS-      2  Amino acid sequence -    222aa
MFEYLPELMKGLHTSLTLTVASLIVALILALIFTIILTLKTPVLVWLVRGYITLFTGTPL
LVRIFLIYYGPGQFPTLQEYPALWHLLSEPWLCALIALSVNSAAYTTQLFYGAIRAIPEG
QWQSCSALGMSKKDTLAILLPYAFKRSLSSYSNEVVLVFKSTSLAYTITLMEVMGYSQLL
YGRTYDVMVFGAAGIIYLVVNGLLTLMMRLIERKAVAFERRN
 CDS-      3  Amino acid sequence -    243aa
MKKLVLAALLASFTFGASAAEKINFGVSATYPPFESIGANNEIVGFDIDLAKALCKQMQA
ECTFTNHAFDSLIPSLKFRKYDAVISGMDITPERSKQVSFTTPYYENSAVVIAKKDTYKT
FADLKGKCIGMENGTTHQKYIQDQHPEVKTVSYDSYQNAFIDLKNGRIDGVFGDTAVVNE
WLKTNPQLGVATEKVTDPQYFGTGLGIAVRPDNKALLEKLNNALAAIKADGTYQKISDQW
FPQ
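
   For users who want to process the mailed reply automatically, the layout
   above can be read with a short parser. This is a sketch that assumes the
   reply looks exactly like the example shown; the function name
   parse_cdsb_output and the returned structure are hypothetical, and the
   regular expressions may need adjusting if the server output differs.

```python
import re

SUMMARY_RE = re.compile(
    r"Length:\s*(\d+)\s+Threshold:\s*(-?[\d.]+),\s*# of potential CDS:\s*(\d+)"
)
REGION_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s+w=\s*(-?[\d.]+)$")
CDS_RE = re.compile(r"CDS-\s*(\d+)\s+Amino acid sequence\s*-\s*(\d+)aa")


def parse_cdsb_output(text):
    """Return (summary, regions, proteins) from a CDSB-style report.

    summary  - dict with sequence length, threshold and number of CDS
    regions  - list of (start, end, score) tuples
    proteins - list of amino acid sequences, one per CDS header
    """
    summary, regions, proteins = {}, [], []
    in_protein = False
    for line in text.splitlines():
        stripped = line.strip()
        m = SUMMARY_RE.match(stripped)
        if m:
            summary = {"length": int(m.group(1)),
                       "threshold": float(m.group(2)),
                       "n_cds": int(m.group(3))}
            continue
        m = REGION_RE.match(stripped)
        if m:
            regions.append((int(m.group(1)), int(m.group(2)),
                            float(m.group(3))))
            continue
        if CDS_RE.match(stripped):
            proteins.append("")  # start collecting a new protein
            in_protein = True
            continue
        # Protein lines are plain upper-case letters after a CDS header.
        if in_protein and stripped.isalpha() and stripped.isupper():
            proteins[-1] += stripped
    return summary, regions, proteins
```

   In the example above, the length of each predicted region equals three
   times the length of its amino acid sequence (for instance 886 - 152 + 1 =
   735 = 3 x 245), which gives a simple consistency check on parsed results.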

References for the method:

  The method is described in detail in:
  Solovyev V.V., Salamov A.A., Lawrence C.B.
   Predicting internal exons by oligonucleotide composition and
   discriminant analysis of spliceable open reading frames.
   Nucl. Acids Res., 1994, 22(24), 5156-5163.

  Solovyev V.V., Salamov A.A., Lawrence C.B.
   The prediction of human exons by oligonucleotide composition and
   discriminant analysis of spliceable open reading frames.
   In: The Second International Conference on Intelligent Systems
   for Molecular Biology (eds. Altman R., Brutlag D.,
   Karp R., Lathrop R. and Searls D.), AAAI Press, Menlo Park, CA,
   1994, 354-362.

Problems, comments, and suggestions:
   can be mailed to solovyev at cmb.bcm.tmc.edu.


