Sean Eddy wrote:
>> In article <6anmv7$osi at net.bio.net> Iddo Friedberg <idoerg at cc.huji.ac.il> writes:
> >This Monte-Carlo strategy of evaluating alignment scores is being used
> >routinely in the GCG sequence alignment programs. Basically, the idea is
> >as you stated it. Once you make, say, 100 randomizations, you get a
> >normal distribution of scores (vs. the random) with a given mean, and
> ^^^^^^^^^^^^^^^^^^^
[snip...]
> And it's since been shown (papers by Karlin, Altschul, and others)
> that the reason for this is that the score distribution for local
> alignments is not a normal distribution. Z-scoring is unreliable,
> giving overestimates of how significant a score is. The score
> distribution is instead closer to an extreme value distribution, with
> a longer tail than the Gaussian. Bill Pearson's FASTA/SSEARCH software
> package is an example of a package that lets you do Monte Carlo
> estimation of alignment significance using the extreme value
> distribution.
>
I stand corrected: the tail is indeed longer than a Gaussian's, hence the
overestimated significance Sean describes.
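For anyone who wants to see the difference concretely, here is a rough
Python sketch of the shuffle-and-score procedure. It compares the tail
probability a Z-score implies (Gaussian) with the one from an extreme
value (Gumbel) fit to the same shuffled scores. Everything in it is my own
illustrative assumption, not GCG's or FASTA/SSEARCH's actual code: the toy
Smith-Waterman scoring (match +2, mismatch -1, linear gap -2), the number
of shuffles, the example sequences, and the method-of-moments Gumbel fit
(real packages may estimate the EVD parameters differently).

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).
    The scoring values are illustrative assumptions, not a real matrix."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffled_scores(a, b, n=100, seed=0):
    """Monte Carlo null: score a against n random shuffles of b
    (same composition, order destroyed)."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(n):
        rng.shuffle(letters)
        scores.append(sw_score(a, "".join(letters)))
    return scores


def normal_tail(x, mean, sd):
    """P(S >= x) if scores were Gaussian, i.e. what a Z-score assumes."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def gumbel_fit_moments(scores):
    """Method-of-moments Gumbel fit: returns (location mu, scale beta)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_tail(x, mu, beta):
    """P(S >= x) under a Gumbel (extreme value) distribution:
    1 - exp(-exp(-(x - mu) / beta)), computed with expm1 for accuracy."""
    return -math.expm1(-math.exp(-(x - mu) / beta))


if __name__ == "__main__":
    # Hypothetical example sequences, for illustration only.
    query = "ACGTTGCATGCATCGATCGGATCCA"
    target = "TTGCATGCATCGAAGGCTAGCTAACG"
    observed = sw_score(query, target)
    scores = shuffled_scores(query, target, n=100)
    mean, sd = statistics.mean(scores), statistics.stdev(scores)
    mu, beta = gumbel_fit_moments(scores)
    print("observed score:", observed)
    print("Z-score:", (observed - mean) / sd)
    print("P (Gaussian):", normal_tail(observed, mean, sd))
    print("P (Gumbel):  ", gumbel_tail(observed, mu, beta))
```

Given the same mean and standard deviation, the Gumbel fit assigns a
larger tail probability than the Gaussian to scores well above the mean
(roughly 0.012 versus 0.0013 at three standard deviations), which is the
overestimate of significance that Z-scoring produces.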
Iddo
--
Iddo Friedberg
Phone: (972)-2-6758647
email: idoerg at cc.huji.ac.il
web: http://www.ls.huji.ac.il/~idoerg
More info: finger idoerg at cc.huji.ac.il