modelling protein structures

Andrej Sali sali at tamika.rockefeller.edu
Tue Jan 31 10:15:09 EST 1995


In article <D38E1q.8y9 at ebi.ac.uk> ewan birney <birney at molbiol.ox.ac.uk> writes:
> sali at tamika.rockefeller.edu (Andrej Sali) wrote:
> >
> > In article <3gbenc$and at mserv1.dl.ac.uk> <bionet at cgmvax.cgm.cnrs-gif.fr>  
> > writes:
> > > >
> > > >This is much too pessimistic. About one third of all currently known
> > > >sequences are related to at least one currently known structure. 
> > > >
> > > 
> > > ??? You really mean that 15,000 sequences from Swissprot (for example) are
> > > related to at least one entry in the PDB? I'd be interested in getting a
> > > reference on this subject.
> > > 
> > > Cheers,
> > > 
> > > Jean-Loup
> > > 
> > > 
> > >    ---------------------------------------------------------------------
> > >   Jean-Loup Risler                      Tel:  (33 1) 69 82 31 34
> > >   CNRS		                        Fax:  (33 1) 69 07 49 73
> > >   Centre de Genetique Moleculaire	Email:  risler at cgmvax.cgm.cnrs-gif.fr
> > >   91198 Gif sur Yvette Cedex  France    
> > >    ---------------------------------------------------------------------
> > > 
> > > 
> > 
> > I did not mean exactly what you said, because I wanted to be
> > conservative, but it is close enough. Many sequence-structure pairs that
> > are actually related cannot (yet) be detected as such, because the usual
> > sequence alignment and even threading techniques are not (yet) perfect.
> > You can get the hard numbers in the very nice paper by Orengo, Jones and
> > Thornton, Nature 372, p. 631, 1994. Sander, Holm et al. also did some
> > nice work along these lines.
> > 
> > Andrej
> > 
> > P.S. My own hard number related to this argument is that about one third
> > of currently deposited PDB structures have significant sequence similarity
> > (>30%) to at least one already deposited PDB structure.
> 
> This of course assumes that the currently determined PDB structures
> are a random selection of protein sequences, which I think is 
> unlikely, but someone can correct me if he/she has evidence.
> 
> (I have no idea how robust this sort of extrapolation would be, either,
> as you start to deviate from a Normal distribution...)
> 
> ewan

The argument based exclusively on the PDB is of course strongly dependent on how representative the
proteins in the PDB are, and I agree with you that this estimate is very approximate as far as the
completeness of the fold database is concerned. However, comparisons between the sequence and PDB
databases do not depend on that so much, and they give similar results (10-40% is similar). See
the Orengo, Jones and Thornton paper in Nature, which also discusses this issue, among others.

Andrej
