PROSITE Search with Statistics (BCM WWW) PSITE

Victor V. Solovyev solovyev at cmb.bcm.tmc.edu
Sun Apr 28 17:45:33 EST 1996


	PSITE - Search for PROSITE patterns with statistical estimation
		(Version 1) by Solovyev V.V.

	Analysis of amino acid sequences is available through WWW:

http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html
 
Method description:

The method is based on a statistical estimate of the expected number of
occurrences of a PROSITE pattern in a given sequence. It uses the PROSITE
database (author: Amos Bairoch, 1995) of functional motifs. If a pattern
is found whose expected number is significantly less than 1, it can be
supposed that the analysed sequence possesses the function associated
with the pattern. Version 1 is the simplest version: it searches only for
patterns that match a given PROSITE consensus exactly, without any
deviation. A following version will allow such deviations.
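
The idea of an expected number can be illustrated with a minimal sketch.
It is not the PSITE implementation: it assumes independent residues whose
frequencies are taken from the analysed sequence itself, and it handles
only fixed-length patterns (no x(2,4) ranges and no < or > anchors). The
function names parse_pattern, find_sites and expected_number are
hypothetical.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parse_pattern(pattern):
    """Turn a fixed-length PROSITE pattern into one allowed-residue set per position."""
    positions = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?", element)
        if m is None:
            # Variable-length repeats and terminal anchors are not covered here.
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.group(1), int(m.group(2) or 1)
        if core == "x":
            allowed = set(AMINO_ACIDS)                     # any residue
        elif core.startswith("["):
            allowed = set(core[1:-1])                      # any of the listed residues
        elif core.startswith("{"):
            allowed = set(AMINO_ACIDS) - set(core[1:-1])   # any residue except those listed
        else:
            allowed = {core}                               # one fixed residue
        positions.extend([allowed] * repeat)
    return positions

def find_sites(sequence, positions):
    """Return (start, end, site) for every exact match, with 1-based coordinates."""
    m = len(positions)
    return [
        (i + 1, i + m, sequence[i:i + m])
        for i in range(len(sequence) - m + 1)
        if all(sequence[i + k] in positions[k] for k in range(m))
    ]

def expected_number(sequence, positions):
    """Expected number of matches under an independent-residue model (an assumption)."""
    counts = Counter(sequence)
    prob = 1.0
    for allowed in positions:
        prob *= sum(counts[a] for a in allowed) / len(sequence)
    windows = max(len(sequence) - len(positions) + 1, 0)
    return windows * prob

seq = "RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLAKTFIDQGKLI"
pattern = parse_pattern("S-G-x-G.")
print(find_sites(seq, pattern))        # [(12, 15, 'SGKG')]
print(expected_number(seq, pattern))
```

Under this simplified model the expected number is the number of possible
start positions multiplied by the product of the per-position match
probabilities; the values printed by PSITE may be computed differently.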

The PSITE output shows each PROSITE pattern found, its position in the sequence,
and its accession number, ID and description in the PROSITE database, as well as
the number of the document in which the characteristics of the pattern are outlined.

Note that in this version, patterns that PROSITE restricts to the beginning or
end of a protein sequence are recognized anywhere along the sequence. This may
be useful for analysis of ORFs or 6-frame translations.

 Acknowledgments: We acknowledge Ilgam Sahmuradov and Igor Rogozin, who
                 took part in developing some applications of this method for
                 nucleotide consensus searching, and Asya Salihova for
                 protein site searching on IBM PC.

 Submitting sequences via WWW:


Paste your amino acid sequence into the WWW page window:
  
  RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGV
  KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRA
  QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
  PLVQREDDRPETVVK............
   (Restrict the line length to 75 characters).
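
A sequence stored as one long line can be re-wrapped before pasting. The
helper below is a small sketch; the name wrap_sequence is hypothetical,
and removing all whitespace before wrapping is an assumption about what
the form accepts.

```python
def wrap_sequence(sequence, width=75):
    """Remove whitespace and split a sequence into lines of at most `width` characters."""
    cleaned = "".join(sequence.split())
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]

print("\n".join(wrap_sequence("RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGV" * 3)))
```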

Example of PSITE output:		

 PSITE V1 - search for Prosite patterns
         10        20        30        40        50        60
 RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLAKTFIDQGKLI
         70        80        90       100       110       120
 PDDVMTRLVLHELKN*TQYNWLLDGFPRTLPQAEALDRAYQIDTVINLNVPFEVIKQRLT
        130       140       150       160       170       180
 ARWIHPGSGRVYNIEFNPPKTMGIDDLTGEPLVQREDDRPETVVKRLKAYEAQTEPVLEY
        190       200       210       220       230       240
 YRKKGVLETFSYTETNKIWPHVYAFLQTKLPDANKDDALDQREWSAAAAWLAAAAALDLN
        250       260       270       280       290       300
 AGCPAAALAAAAAGSAACAAAAAFAAAAAACCAACAAAAAAACAAAADAACGAYAYACAP

ID   GLYCOSAMINOGLYCAN; RULE.
AC   PS00002;
DE   Glycosaminoglycan attachment site.
DO   PDOC00002;
PA   S-G-x-G.
 Sites found:  1 Expected number:   0.0272 95% confidential interval:   0
  #  Start  End  Expected  Site sequence
  1    12    15   0.0272  SGKG
ID   EF_HAND; PATTERN.
AC   PS00018;
DE   EF-hand calcium-binding domain.
DO   PDOC00018;
PA   D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
PA   [DE]-[LIVMFYW].
 Sites found:  1 Expected number:   0.0004 95% confidential interval:   0
  #  Start  End  Expected  Site sequence
  1   212   224   0.0004  DANKDDALDQREW
ID   ADENYLATE_KINASE; PATTERN.
AC   PS00113;
DE   Adenylate kinase signature.
DO   PDOC00104;
PA   [LIVMFYW](3)-D-G-[FY]-P-R-x(3)-[NQ].
 Sites found:  1 Expected number:   0.0000 95% confidential interval:   0
  #  Start  End  Expected  Site sequence
  1    81    92   0.0000  WLLDGFPRTLPQ
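
For further processing, a report like the one above can be read into
records. The parser below is a sketch that assumes the layout is exactly
as in this example (two-letter field codes, a "Sites found" summary line,
then one line per site); the name parse_psite_report and the dictionary
keys are hypothetical.

```python
import re

FIELD = re.compile(r"^(ID|AC|DE|DO|PA)\s+(.*\S)\s*$")
SUMMARY = re.compile(r"Sites found:\s*(\d+)\s+Expected number:\s*(\d+\.\d+)")
SITE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\S+)\s*$")

def parse_psite_report(text):
    """Collect one dictionary per pattern, including its list of sites."""
    records = []
    for line in text.splitlines():
        field = FIELD.match(line)
        if field:
            tag, value = field.groups()
            if tag == "ID":
                # An ID line starts a new pattern record.
                records.append({"ID": value, "PA": "", "sites": []})
            elif records:
                if tag == "PA":
                    # Long patterns continue over several PA lines.
                    records[-1]["PA"] += value
                else:
                    records[-1][tag] = value.rstrip(";")
            continue
        if not records:
            continue  # sequence listing before the first record
        summary = SUMMARY.search(line)
        if summary:
            records[-1]["found"] = int(summary.group(1))
            records[-1]["expected"] = float(summary.group(2))
            continue
        site = SITE.match(line)
        if site:
            _, start, end, expected, residues = site.groups()
            records[-1]["sites"].append({
                "start": int(start),
                "end": int(end),
                "expected": float(expected),
                "sequence": residues,
            })
    return records
```

Each record keeps the PA lines joined into a single pattern string, so a
pattern split over two lines, as for EF_HAND above, is recovered whole.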

 Reference:

Solovyev V.V., Kolchanov N.A. (1994)
  Search for functional sites using consensus.
  In: Computer Analysis of Genetic Macromolecules (eds. Kolchanov N.A., Lim
  H.A.), World Scientific, p. 16-21.
=======================================================================================
 
The other programs in BCM Gene-Finder service:
========================================================================================
	Analysis of uncharacterized human sequences is available through the
	Weizmann Institute of Science Gene-Server: send a file containing the
	sequence (with the sequence name in the first line)

 to  services at bioinformatics.weizmann.ac.il       with the subject line "fgenehb".

Example: mail -s fgenehb services at bioinformatics.weizmann.ac.il < test.seq
		    where test.seq is a file containing the sequence.
 


You can also use the WWW BCM Human Genome Center Search Launcher
  home page, URL: http://kiwi.imgen.bcm.tmc.edu:8088/search-launcher/launcher.html

for access to the Gene-Finder prediction help files and programs (-> BCM Gene Finder),

or go directly to: http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html



----------------------------------------------------------------------------------

 Questions: solovyev at cmb.bcm.tmc.edu

================== The  services are =============================================================== 

FGENEHB - search for Mammalian gene structure with exon assembly by dynamic programming,
		using similarity information with known proteins from data base scanning with fasta
FEXHB   - search for Mammalian coding exons using exon recognition functions and similarity information
                with known proteins from data base scanning with fasta

		(the above 2 programs are available by ftp to run locally;
                 the others can be used through the WWW and Email servers of
                 Houston University and the Weizmann Institute of Science):

	mail -s fgeneh services at bioinformatics.weizmann.ac.il < test.seq
	
	mail -s fexh service at theory.bchs.uh.edu < test.seq

FGENEH - search for Mammalian gene structure with exon assembly by dynamic
         programming
FEXH   - search for 5'-, internal and 3'-exons
HEXON  - search for internal exons
HSPL   - search for splice sites
RNASPL - prediction of exon-exon junctions in cDNA sequences
CDSB   - prediction of Bacterial coding regions
HBR    - recognition of human and bacterial sequences to test a library
         for E. coli contamination by sequencing example clones
TSSG   - recognition of human promoter regions (Ghosh/Prestridge motif data)
TSSW   - recognition of human promoter regions (Wingender motif data base)
POLYAH - recognition of 3'-end cleavage and polyadenylation region
         of human mRNA precursors

FGENED - search for Drosophila gene structure with exon assembly by dynamic
	 programming
FEXD - search for Drosophila 5'-, internal and 3'-exons
DSPL - search for Drosophila splice sites


FGENEN - search for Nematode gene structure with exon assembly by dynamic
	 programming
FEXN - search for Nematode 5'-, internal and 3'-exons
NSPL - search for Nematode splice sites

FGENEA - search for Plant gene structure with exon assembly by dynamic
	 programming
FEXA - search for Plant 5'-, internal and 3'-exons
ASPL - search for Plant splice sites
============================================================================

WWW address: http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html

SSP    - prediction of a-helix and b-strand in globular proteins
	 by segment-oriented approach. 
NSSP   - prediction of a-helix and b-strand segments in globular proteins
         by nearest-neighbor algorithm.








