Prediction Bacterial coding regions New server
Victor V. Solovyev
solovyev at cmb.bcm.tmc.edu
Wed Mar 29 12:48:03 EST 1995
*************************************************************************
CDSB - Prediction of Protein coding regions in Bacterial sequences
(Version 1. 3.21.95)
Department of Cell Biology, Baylor College of Medicine
=========================================================================
Analysis of uncharacterized bacterial sequences is available through the
University of Houston and Weizmann Institute of Science e-mail servers:
send a file containing your sequence to the server address,
with the name of the program (cdsb) in the subject line.
Examples:
mail -s cdsb service at theory.bchs.uh.edu < test.seq
mail -s cdsb services at bioinformatics.weizmann.ac.il < test.seq
where test.seq is a file containing your sequence.
Method description:
**********************
The method is based on discriminant analysis of open reading
frames flanked by start (ATG or GTG) and stop codon pairs. Prediction is
performed by a linear discriminant function that combines characteristics
describing the 5'- and 3'-mRNA regions and the coding region of each open
reading frame. The program parameters were calculated from annotated
E. coli gene sequences.
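
As an illustration of the first step only, here is a minimal Python sketch
that enumerates open reading frames beginning with ATG or GTG and ending at
the first in-frame stop codon. It is not the CDSB code: the function name
find_orfs, the min_codons cutoff and the forward-strand-only scan are
assumptions made for the example, and the discriminant features and weights
are not given in this post, so no scoring is attempted. Coordinates exclude
the stop codon, which matches the example output in this post, where each
region length is three times the reported protein length.

```python
# Hypothetical sketch, not the CDSB implementation: list ORFs that start
# with ATG or GTG and run to the first in-frame stop codon (forward strand).
START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end) pairs, 1-based inclusive, stop codon excluded."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # 0-based index of the first start codon in the open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # i is the first base of the stop codon, so the last coding
                # base is at 0-based i - 1, which is 1-based position i.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i))
                start = None
    return sorted(orfs)
```

Each candidate returned this way would then be scored by the discriminant
function; only the ORFs scoring above the threshold are reported.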
Accuracy:
********************************
The program recognizes about 94% of coding nucleotides within a
sequence up to 400000 bp long. The program predicts nonoverlapping
protein coding regions in a given sequence.
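
The post does not define exactly how the accuracy figure is computed. One
plausible reading, sketched below in Python purely as an assumption, is the
fraction of annotated coding nucleotides that fall inside some predicted
region. The function name coding_nucleotide_recall and the region
representation (1-based inclusive start/end pairs) are hypothetical.

```python
# Hypothetical sketch: nucleotide-level recognition rate under the
# assumption that it means "annotated coding positions covered by a
# prediction". Reading frame and strand are ignored.
def coding_nucleotide_recall(annotated, predicted):
    """Fraction of annotated coding positions covered by predicted regions."""
    true_positions = set()
    for start, end in annotated:
        true_positions.update(range(start, end + 1))
    predicted_positions = set()
    for start, end in predicted:
        predicted_positions.update(range(start, end + 1))
    if not true_positions:
        return 0.0
    return len(true_positions & predicted_positions) / len(true_positions)
```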
Submitting sequences via email:
***********************************
For email submission the sequences must have the following format:
Name of your sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
aagacagatacatcccaccatgatcaggatcacccaaccttcaacaagatcacccccaac
ctggctgagttcgccttcagcctataccgccagctggcacaccagtccaacagcaccaat
atcttcttctccccagtgagcatcg...............
(Restrict the line length to 80 characters or less).
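
A small Python sketch of preparing a file in this format is given below. It
assumes only what is stated above (a name line followed by sequence lines of
at most 80 characters); the function name format_submission and the default
width of 60, taken from the example lines, are choices made for the
illustration.

```python
# Hypothetical helper: build the submission text as a name line followed
# by the sequence wrapped to a fixed width (80 characters at most).
def format_submission(name, sequence, width=60):
    """Return the text of a sequence file ready to be mailed to the server."""
    if not 1 <= width <= 80:
        raise ValueError("width must be between 1 and 80")
    # Remove spaces and line breaks, then use lowercase as in the example.
    cleaned = "".join(sequence.split()).lower()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([name] + lines) + "\n"
```

The returned string can be written to test.seq and mailed as shown in the
examples at the top of this message.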
CDSB output:
******************
1st line - name of the program
2nd line - name of your sequence
3rd line - length of your sequence, threshold (will be optional),
number of potential CDS
4th line and next - positions and scores of predicted coding regions
and amino acid sequences of predicted CDS
For example:
CDSB search for protein coding regions in E.coli sequences
ECAPTS 3708 bp DNA BCT
Length: 3708 Threshold: 0.0, # of potential CDS: 3
152 - 886 w= 0.94
1612 - 2277 w= 0.26
2572 - 3300 w= 0.72
CDS- 1 Amino acid sequence - 245aa
MKKVLIAALIAGFSLSATAAETIRFATEASYPPFESIDANNQIVGFDVVDLAQALCKEID
ATCTFSNQAFDSLIPSLKFRRVEAVMGGMDITPEREKQVLFTTPYYDNSALFVGQQGKYT
SVDQLKGKKVRSVQNGTTHQKFIMDKHPEITTVPYDSYQNAKLDLENGRIDGVFGDTAVV
HEWLKDNPKLVVVGDKVTDKDYFGTGLGIAVRQGNTELQQKLNTALEKVKKDGTYETIYN
KWFQK
CDS- 2 Amino acid sequence - 222aa
MFEYLPELMKGLHTSLTLTVASLIVALILALIFTIILTLKTPVLVWLVRGYITLFTGTPL
LVRIFLIYYGPGQFPTLQEYPALWHLLSEPWLCALIALSVNSAAYTTQLFYGAIRAIPEG
QWQSCSALGMSKKDTLAILLPYAFKRSLSSYSNEVVLVFKSTSLAYTITLMEVMGYSQLL
YGRTYDVMVFGAAGIIYLVVNGLLTLMMRLIERKAVAFERRN
CDS- 3 Amino acid sequence - 243aa
MKKLVLAALLASFTFGASAAEKINFGVSATYPPFESIGANNEIVGFDIDLAKALCKQMQA
ECTFTNHAFDSLIPSLKFRKYDAVISGMDITPERSKQVSFTTPYYENSAVVIAKKDTYKT
FADLKGKCIGMENGTTHQKYIQDQHPEVKTVSYDSYQNAFIDLKNGRIDGVFGDTAVVNE
WLKTNPQLGVATEKVTDPQYFGTGLGIAVRPDNKALLEKLNNALAAIKADGTYQKISDQW
FPQ
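
For readers who want to process such replies automatically, the following
Python sketch extracts the region lines and the protein sequences from
output laid out like the example above. It is inferred from that single
example only; the function name parse_cdsb_output and the regular
expressions are assumptions, and the actual server output may vary.

```python
import re

# Hypothetical parser based only on the example output shown above.
REGION_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s+w=\s*(-?\d+(?:\.\d+)?)\s*$")
CDS_RE = re.compile(r"^\s*CDS-\s*(\d+)\s+Amino acid sequence\s*-\s*(\d+)\s*aa\s*$")


def parse_cdsb_output(text):
    """Return (regions, proteins) parsed from a CDSB-style reply.

    regions  -- list of (start, end, score) tuples
    proteins -- dict mapping CDS number to its amino acid sequence
    """
    regions = []
    proteins = {}
    current = None  # number of the CDS whose sequence lines are being read
    for line in text.splitlines():
        match = REGION_RE.match(line)
        if match:
            start, end, score = match.groups()
            regions.append((int(start), int(end), float(score)))
            current = None
            continue
        match = CDS_RE.match(line)
        if match:
            current = int(match.group(1))
            proteins[current] = ""
            continue
        # Sequence lines consist of letters only.
        if current is not None and line.strip().isalpha():
            proteins[current] += line.strip()
    return regions, proteins
```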
References for the method:
The method is described in detail in:
Solovyev V.V., Salamov A.A., Lawrence C.B.
Predicting internal exons by oligonucleotide composition and
discriminant analysis of spliceable open reading frames.
Nucl. Acids Res., 1994, 22(24), 5156-5163.
Solovyev V.V., Salamov A.A., Lawrence C.B.
The prediction of human exons by oligonucleotide composition and
discriminant analysis of spliceable open reading frames.
In: The Second International Conference on Intelligent Systems
for Molecular Biology (eds. Altman R., Brutlag D.,
Karp R., Lathrop R. and Searls D.), AAAI Press, Menlo Park, CA,
1994, 354-362.
Problems, comments, and suggestions
can be mailed to solovyev at cmb.bcm.tmc.edu.