BLAST

chai_z at wehi.edu.au
Fri Mar 5 01:45:05 EST 1993


In article <1993Mar2.120540.1382 at gserv1.dl.ac.uk>, risler at cgmvax.cgm.cnrs-gif.fr writes:
> 
>  Dear fellow netters,
> 
>  Like many of you, I use BLAST at NCBI for searching sequence databanks.
>  Like many of you, I don't like using programs when I don't understand what
>  they do and how they do it.
>  Hence I've tried to read the original papers about BLAST and, in particular,
>  I've tried to understand how they compute the probability P(N) associated
>  with a given score. I must confess that I failed to understand it fully,
>  whether because I'm just stupid or because it is not clearly written. In any
>  case, I thought that P(N) was computed from figures obtained from a very
>  large number of simulations. If this were true, then this probability should
>  be the same for the same hit, whichever databank is used.
> 
>  A colleague of mine recently searched a protein sequence with BLAST against
>  the "non-redundant protein databank" and against Swissprot. She got in both
>  cases the same hit with the same score, but with different probabilities.
>  With the non-redundant database P(N) was 0.84 and with Swissprot P(N) was
>  0.51. The segment pairs were exactly the same in both cases.
> 
>  Could somebody help me understand?
> 
>  Thank you,
> 
>   --------------------------------------------------------------------
>  | Jean-Loup Risler                   |                               |
>  | CNRS                               | risler at frcgm51.bitnet         |
>  | Centre de Genetique Moleculaire    | risler at cgmvax.cgm.cnrs-gif.fr |
>  | 91198  Gif sur Yvette Cedex France |                               |
>   --------------------------------------------------------------------
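
One possible reading of the two figures quoted above, offered as a rough
sketch and not as a statement of what the BLAST code actually does: suppose
the reported probability follows a Poisson model, P = 1 - exp(-E), where E is
the expected number of chance hits scoring at least as well, and suppose E
grows in proportion to the size of the databank searched. Under those
assumptions, and treating the hit as a single segment pair, the same score
gives a larger P against a larger databank. The function and variable names
below are made up for illustration.

```python
import math

def expected_chance_hits(p):
    """Invert the assumed Poisson relation P = 1 - exp(-E) to recover E."""
    return -math.log(1.0 - p)

# Probabilities quoted in the article for the same hit and the same score.
p_nonredundant = 0.84
p_swissprot = 0.51

# Expected number of chance hits implied by each probability (assumed model).
e_nonredundant = expected_chance_hits(p_nonredundant)
e_swissprot = expected_chance_hits(p_swissprot)

# If E scales with databank size, this ratio estimates how much larger the
# non-redundant databank was than Swissprot at the time of the two searches.
size_ratio = e_nonredundant / e_swissprot

print(f"E (non-redundant) = {e_nonredundant:.3f}")
print(f"E (Swissprot)     = {e_swissprot:.3f}")
print(f"implied size ratio = {size_ratio:.2f}")
```

If these assumptions hold, the two probabilities are consistent with one
score searched against databanks whose effective sizes differ by a factor of
roughly 2.6. The figures would then not contradict each other.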






