New Gene-finder program to test E. coli contamination

Victor V. Solovyev solovyev at cmb.bcm.tmc.edu
Thu Apr 27 11:01:56 EST 1995


*************************************************************************
	HBR - Recognition of Human and E.coli sequences to test 
              a library for E. coli contamination
 	Department of Cell Biology, Baylor College of Medicine
=========================================================================
	Sequence analysis is available by email through the
Weizmann Institute of Science server; put the name of the
program (hbr) in the subject line.

Example:
mail -s hbr services at bioinformatics.weizmann.ac.il < test.set

HBR will soon be installed on the University of Houston server:
mail -s hbr service at theory.bchs.uh.edu < test.set

where test.set is a file containing one or more sequences.

 You can also run the program on the WWW from the
BCM Human Genome Center Search Launcher home page
URL:http://kiwi.imgen.bcm.tmc.edu:8088/search-launcher/launcher.html

which gives access to the Gene-finder prediction programs and help files:
->  BCM Gene Finder

Description:
**********************
HBR (recognition of human and bacterial sequences) tests
a library for E. coli contamination by analysing sequenced sample
clones. For each sequence in your set the program calculates the
probability that it is a human sequence (P) or an E.coli sequence
(1-P), and it reports the total percentage of human and bacterial
sequences in the set.
  
  The method is based on linear discriminant functions; see
  Solovyev V.V., Salamov A.A., Lawrence C.B.
   Predicting internal exons by oligonucleotide composition and 
   discriminant analysis of spliceable open reading frames. 
  (Nucl.Acids Res.,1994, 22,24, 5156-5163).

Accuracy:
********************************
The accuracy of recognition is about 99%. For best results, submit
long sequences and a sufficiently representative set of them.
   We recommend analysing sequences of 400 bp or longer and
disregarding sequences with 0.4 < P < 0.6, which cannot be
reliably assigned to the human or the E.coli group.
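
The two recommendations above can be applied to HBR results with a few
lines of code. The following is only a sketch and not part of HBR: the
function name reliable_calls and the constants are hypothetical, and it
assumes the lengths and P values have already been read from the output.

```python
# Hypothetical post-processing helper; not part of the HBR program.
MIN_LENGTH = 400      # recommended minimum sequence length, in bp
LOW, HIGH = 0.4, 0.6  # P values strictly between these are unreliable


def reliable_calls(lengths, probs):
    """Return (sequence number, group) for sequences passing both rules."""
    calls = []
    for number, (length, p) in enumerate(zip(lengths, probs), start=1):
        if length < MIN_LENGTH:
            continue  # too short to be analysed reliably
        if LOW < p < HIGH:
            continue  # cannot be assigned to either group
        calls.append((number, "human" if p >= HIGH else "E.coli"))
    return calls


if __name__ == "__main__":
    for number, group in reliable_calls([900, 360, 720, 480],
                                        [1.00, 1.00, 0.57, 0.01]):
        print(number, group)
```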


Submitting sequences via email:
***********************************
  For email submission the sequences must have the following format:  

 Name of 1st sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
aagacagatacatcccaccatgatcaggatcacccaaccttcaacaagatcacccccaac
ctggctgagttcgccttcagcctataccgccagctggcacaccagtccaacagcaccaat
 Name of 2nd sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
atcttcttctccccagtgagcatcg...............
......
   (Restrict the line length to less than 80 characters.
    The line with the sequence name must begin with at least one
    space character.)
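
A file in this format can be written by a short script. The sketch
below is an illustration only: the function name format_submission and
the default width of 60 characters per sequence line are assumptions
(60 matches the example above; the only stated requirement is less
than 80).

```python
def format_submission(records, width=60):
    """Format (name, sequence) pairs for email submission.

    Each name line starts with one space; the sequence is wrapped so
    that every line is shorter than 80 characters.
    """
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    lines = []
    for name, sequence in records:
        name_line = " " + name.strip()
        if len(name_line) >= 80:
            raise ValueError("name line too long: " + name_line)
        lines.append(name_line)
        # Drop any whitespace or line breaks already in the sequence.
        sequence = "".join(sequence.split())
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_submission([("Name of 1st sequence", "ccatctctgt" * 13)])
    with open("test.set", "w") as handle:
        handle.write(text)
```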

HBR output:		
******************
   1st line - total number of sequences and % of human and E.coli sequences
	followed by groups of 3 lines:
   1st line - numbers of your sequences
   2nd line - their lengths
   3rd line - probability of being a human sequence (P) or bacterial (1-P)

   For example:

 Number of sequences-     12 % human=  50 % bacterial=  50
     1     2     3     4     5     6     7     8     9    10
   900   960  1501   360   360  1020   330   480   240   541
  1.00  1.00  0.83  1.00  1.00  0.00  0.01  0.01  0.00  0.01
    11    12
   720   540
  0.57  0.01
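
Output of this shape can be read back into per-sequence records. The
sketch below is not part of HBR; the function name parse_hbr_output is
hypothetical, and it assumes that the columns are separated by
whitespace and that blank lines can be ignored.

```python
EXAMPLE = """\
 Number of sequences-     12 % human=  50 % bacterial=  50
     1     2     3     4     5     6     7     8     9    10
   900   960  1501   360   360  1020   330   480   240   541
  1.00  1.00  0.83  1.00  1.00  0.00  0.01  0.01  0.00  0.01
    11    12
   720   540
  0.57  0.01
"""


def parse_hbr_output(text):
    """Return (summary line, list of (number, length, probability))."""
    lines = [line for line in text.splitlines() if line.strip()]
    summary, body = lines[0].strip(), lines[1:]
    records = []
    # After the summary, the output comes in groups of three lines:
    # sequence numbers, lengths, probabilities.
    for start in range(0, len(body) - 2, 3):
        numbers = [int(field) for field in body[start].split()]
        lengths = [int(field) for field in body[start + 1].split()]
        probs = [float(field) for field in body[start + 2].split()]
        records.extend(zip(numbers, lengths, probs))
    return summary, records


if __name__ == "__main__":
    summary, records = parse_hbr_output(EXAMPLE)
    print(summary)
    for number, length, prob in records:
        print(number, length, prob)
```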

Problems, comments, and suggestions
   can be mailed to solovyev at cmb.bcm.tmc.edu.
   



