# How to calculate ?

Steven Brenner brenner at mole.bio.cam.ac.uk
Sun Aug 11 12:55:41 EST 1996

```
While very interesting, the equation below won't work (I think)
because the various parameters aren't all independent if you're only
considering the best (or near best) regions of alignment.

There is an extensive theory behind probabilities of two sequences
matching each other with a given level of similarity over a particular
length.

An introduction to the theory, with many references, can be found on
the BLAST help page at:
http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html

An excerpt is:

From Karlin and Altschul (1990), the principal equation
relating the score of an HSP to its expected frequency of
chance occurrence is:

E = K N exp(-Lambda S)

where E is the expected frequency of chance occurrence of an
HSP having score S (or one scoring higher); K and Lambda are
Karlin-Altschul parameters; N is the product of the query
and database sequence lengths, or the size of the search
space; and exp is the exponentiation function.

Lambda may be thought of as the expected increase in
reliability of an alignment associated with a unit increase in
alignment score. Reliability in this case is expressed in
units of information, such as bits or nats, with one nat
being equivalent to 1/log(2) (roughly 1.44) bits.

leen at bio-3.bsd.uchicago.edu (Lee Newberg) writes:
>The average number of "matches" with exactly those parameters
>that arises randomly is not too difficult to figure out.
...
> Putting it all together gives

>E = (L1 + 1 - LR) * (L2 + 1 - LR) * (LR choose N) * (25%)^(LR-N) * (75%)^N

>In article <4u6oio$5tg at mserv1.dl.ac.uk>,
>> Dear all,
>>
>> I can't find a good way to calculate the following:
>>
>> When I compare two sequences (amino acid or nucleotide)
>> with lengths L1 and L2, I get a common region with
>> length LR, containing N mismatches.
>> The questions are:
>> What is the chance of obtaining such a region in unrelated sequences?
>> Can I use the binomial formula for this case?
>>
>> Could anybody send me the formulas to calculate this chance,
>> or a reference for it?
>>
>>
>>

--
Steven E. Brenner                    | S.E.Brenner at bioc.cam.ac.uk
MRC Laboratory of Molecular Biology  |
Hills Road                           | Office:   +44 1223 248011
Cambridge CB2 2QH, UK                | Fax:      +44 1223 213556

```
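
For readers who want to try the two formulas quoted in this message, here is a minimal Python sketch. It is an illustration only, not code from either poster or from BLAST. The function names and the `p_match` argument are made up for this example, and the `k` and `lam` values in the usage section are arbitrary placeholders rather than real Karlin-Altschul parameters, which depend on the scoring system. As the reply argues, the binomial count is probably not valid when the region is the best (or near-best) one found, so treat its output with caution.

```python
import math


def binomial_expected_count(l1, l2, lr, n, p_match=0.25):
    """Evaluate the quoted binomial formula:

        E = (L1 + 1 - LR) * (L2 + 1 - LR) * C(LR, N)
            * p_match^(LR - N) * (1 - p_match)^N

    p_match = 0.25 reproduces the 25% / 75% terms in the quoted post;
    making it an argument is an assumption of this sketch.
    """
    if lr < 1 or lr > min(l1, l2) or not 0 <= n <= lr:
        return 0.0
    placements = (l1 + 1 - lr) * (l2 + 1 - lr)  # ungapped start positions
    patterns = math.comb(lr, n)                 # where the mismatches fall
    return placements * patterns * p_match ** (lr - n) * (1 - p_match) ** n


def karlin_altschul_evalue(score, search_space, k, lam):
    """Evaluate E = K * N * exp(-Lambda * S) from the quoted excerpt.

    search_space is N, the product of query and database lengths.
    k and lam must come from the scoring system; none are supplied here.
    """
    return k * search_space * math.exp(-lam * score)


def nats_to_bits(nats):
    """Convert nats to bits: one nat is 1/ln(2) (roughly 1.44) bits."""
    return nats / math.log(2)


if __name__ == "__main__":
    # Two 100-residue sequences, a 10-residue region with no mismatches.
    print(binomial_expected_count(100, 100, 10, 0))
    # Placeholder K and Lambda, chosen only to show the call shape.
    print(karlin_altschul_evalue(score=40, search_space=100 * 100, k=0.1, lam=0.3))
    print(nats_to_bits(1.0))
```

The two functions answer different questions: the first counts exact placements of a fixed-length region with a fixed number of mismatches, while the second gives the expected number of chance HSPs scoring at least S.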