Blastp search in Human Predicted Genes and Proteins Database (INFOGENEP)

Victor Solovyev solovyev at sanger.ac.uk
Mon Sep 28 06:10:42 EST 1998


Blastp search in DB of Human Predicted Genes and Proteins
------------------------------------------------------------------
  We have installed NCBI's Gapped BLASTP search against the database of protein
sequences of predicted genes (INFOGENEP) from finished and unfinished human
sequences. It is available at
    http://genomic.sanger.ac.uk/db.html
on the web page of the Computational Genomic Group of the Sanger Centre.

If you find an interesting similarity with your sequence, you can use the
ID of the hit to check the gene structure of this protein in the INFOGENEP DB
and get the clone name and sequence.

Example:
==============
Sequence T0078 from CASP3 has significant similarity to the protein of
predicted gene GHS005230 from clone dJ337O18 of the Sanger finished sequences:

Query= Query:
         (288 letters)

Database: INFOGENE_PREDICTIONS
           18,574 sequences; 3,010,387 total letters

Searching..................................................done

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

ID GHS005230  dJ337O18  human_pg  Sanger_finished 319                212  8e-56
ID GHS005240  dJ337O18  human_pg  Sanger_finished  477               124  2e-29

>ID GHS005230  dJ337O18  human_pg  Sanger_finished 319
           Length = 319

 Score =  212 bits (533), Expect = 8e-56
 Identities = 120/282 (42%), Positives = 167/282 (58%), Gaps = 10/282 (3%)

Query: 12  TLLNLEKIEEGLFRGQSEDLGLRQVFGGQVVGQALYAAKETVPEERLVHSFHSYFLRPGD 71
           T+LNLE ++E LFRG+   +  +++FGGQ+VGQAL AA ++V E+  VHS H YF+R GD
Sbjct: 30  TVLNLEPLDEDLFRGRHYWVPAKRLFGGQIVGQALVAAAKSVSEDVHVHSLHCYFVRAGD 89

Query: 72  SKKPIIYDVETLRDGNSFSARRVAAIQNGKPIFYMTASF-QAPEAGFEHQKTMPSAPAPD 130
            K P++Y VE  R G+SFS R V A+Q+GKPIF   ASF QA  +  +HQ +MP+ P P+
Sbjct: 90  PKLPVLYQVERTRTGSSFSVRSVKAVQHGKPIFICQASFQQAQPSPMQHQFSMPTVPPPE 149

Query: 131 G-LPSETQIAQ-----SLAHLLPPVLKDKFICDRPLEVRPVEFHNPLKGHVAEPHRQVWI 184
             L  ET I Q     +L    P  L      + P+E++PV      +    EP +  W+
Sbjct: 150 ELLDCETLIDQYLRDPNLQKRYPLALNRIAAQEVPIEIKPVNPSPLSQLQRMEPKQMFWV 209

Query: 185 RANGSVPD-DLRVHQYLLGYASDLNFLPVALQPHGIGFLEPGIQIATIDHSMWFHRPFNL 243
           RA G + + D+++H  +  Y SD  FL  AL PH   +      + ++DHSMWFH PF
Sbjct: 210 RARGYIGEGDMKMHCCVAAYISDYAFLGTALLPH--QWQHKVHFMVSLDHSMWFHAPFRA 267

Query: 244 NEWLLYSVESTSASSARGFVRGEFYTQDGVLVASTVQEGVMR 285
           + W+LY  ES  A  +RG V G  + QDGVL  +  QEGV+R
Sbjct: 268 DHWMLYECESPWAGGSRGLVHGRLWRQDGVLAVTCAQEGVIR 309

-----------------------------------------------
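
To follow up on a hit, the ID and clone name have to be pulled out of the
summary line. Here is a minimal Python sketch, assuming the summary lines keep
the layout shown in the example (the literal "ID", then gene ID, clone,
category, source, protein length, bit score and E value, separated by
whitespace). The names `Hit` and `parse_hit` and the field names are
illustrative only; they are not part of BLAST or INFOGENEP.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene_id: str    # e.g. GHS005230
    clone: str      # e.g. dJ337O18
    category: str   # e.g. human_pg
    source: str     # e.g. Sanger_finished
    length: int     # protein length in residues
    bits: float     # bit score
    evalue: float   # expectation value


def parse_hit(line: str) -> Hit:
    """Split one summary line of the form shown in the example output."""
    fields = line.split()
    # Assumed layout: exactly eight whitespace-separated fields, first is "ID".
    if len(fields) != 8 or fields[0] != "ID":
        raise ValueError(f"unexpected summary line: {line!r}")
    _, gene_id, clone, category, source, length, bits, evalue = fields
    return Hit(gene_id, clone, category, source,
               int(length), float(bits), float(evalue))


# The two summary lines from the example search.
summary = """\
ID GHS005230  dJ337O18  human_pg  Sanger_finished 319                212  8e-56
ID GHS005240  dJ337O18  human_pg  Sanger_finished  477               124  2e-29
"""

hits = [parse_hit(l) for l in summary.splitlines() if l.startswith("ID ")]
for h in hits:
    print(h.gene_id, h.clone, h.bits, h.evalue)
```

The gene ID and clone name obtained this way are the values to look up in the
INFOGENEP DB.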

  The DB currently includes genes predicted for Sanger finished and unfinished
sequences: about 1500 loci and 18000 protein sequences corresponding to
predicted genes (predicted by the Fgenes and Genscan programs).
  Exons predicted by both programs are much more often real ones.
Known protein and EST similarities are included in the data.
  This DB will include all predicted genes and proteins for the human genome
draft, as well as genes and proteins predicted for other model organisms.
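
The remark that exons predicted by both programs are more often real suggests
a simple consensus filter. The Python sketch below is only one possible
reading: it assumes that "predicted by both" means identical start, end and
strand, which this post does not specify. The function name `consensus_exons`
and all coordinates are invented for illustration and are not taken from the
database.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int, str]  # (start, end, strand), 1-based inclusive


def consensus_exons(a: List[Exon], b: List[Exon]) -> List[Exon]:
    """Return exons with identical start, end and strand in both sets."""
    shared: Set[Exon] = set(a) & set(b)
    return sorted(shared)


# Made-up coordinates for illustration only; not real predictions.
fgenes_exons = [(1200, 1350, "+"), (2100, 2260, "+"), (3400, 3525, "+")]
genscan_exons = [(1200, 1350, "+"), (2080, 2260, "+"), (3400, 3525, "+")]

for start, end, strand in consensus_exons(fgenes_exons, genscan_exons):
    print(start, end, strand)
```

In this made-up case the middle exon is dropped because the two predictions
disagree on its start position; a looser criterion (for example, overlap
above some threshold) would keep it.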
  The database list:
Predicted GENES Structure Database (INFOGENEP Rel 1.)
Nucleotide and Protein sequences of INFOGENEP genes
GENES Structure and Functioning Database (INFOGENE Rel 1.)
Nucleotide and Protein sequences of INFOGENE genes


-- 
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk  http://genomic.sanger.ac.uk
Phone: 44-1223-494799  FAX:   44-1223-494919



