estimating K and Lambda from an extreme value distribution

Kevin Karplus karplus at cheep.cse.ucsc.edu
Mon Mar 1 16:11:33 EST 2004


In article <giu11czfui.fsf at pusch.xnet.com>, Gordon D. Pusch wrote:
> Just to stir the pot a little about the near-universal abuse of extreme value 
> theory that routinely occurs in bioinformatics: Since a "random sequence"
> model underlies the derivation of the so-called "Karlin-Altschul distribution" 
> used by BLAST (whose correct name is the "Gumbel distribution," since
> Gumbel discovered it and the other two asymptotic classes of extreme value
> distribution decades before Karlin and Altschul), should not this exact
> same objection also be equally true of the "standard" P-values returned 
> by BLAST --- which everyone still uses on a routine basis ???  >:-I

The derivation of the Gumbel distribution may not be rock
solid---indeed there are problems with the null model assumptions used
in BLAST, but the authors of BLAST have continued to improve the
composition and length corrections, and have good empirical evidence
that the Gumbel distribution is a good fit to their scores.
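
For readers who want to check such a fit on their own null scores, here is a
minimal sketch of one way to estimate lambda and K. It assumes the usual
parameterization P(S >= x) = 1 - exp(-K*N*exp(-lambda*x)), where N is the
search-space size (query length times database length). It uses a simple
method-of-moments fit, which is my choice for illustration and not the
procedure the BLAST authors use. The function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores, search_space):
    """Method-of-moments estimate of (lambda, K) from null-model scores.

    A Gumbel with scale 1/lambda and location mu has
    mean = mu + gamma/lambda and variance = pi^2 / (6 lambda^2);
    K follows from mu = ln(K * search_space) / lambda.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    lam = math.pi / (std * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    k = math.exp(lam * mu) / search_space
    return lam, k


def e_value(score, lam, k, search_space):
    """Expected number of chance hits scoring at least `score`."""
    return k * search_space * math.exp(-lam * score)


def p_value(score, lam, k, search_space):
    """P(S >= score) = 1 - exp(-E), computed stably for small E."""
    return -math.expm1(-e_value(score, lam, k, search_space))


def sample_gumbel(rng, lam, mu):
    """Draw one Gumbel variate by inverting the CDF (for testing only)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu - math.log(-math.log(u)) / lam


if __name__ == "__main__":
    rng = random.Random(0)
    true_lam, true_k, n_space = 0.3, 0.1, 1.0e6
    mu = math.log(true_k * n_space) / true_lam
    scores = [sample_gumbel(rng, true_lam, mu) for _ in range(20000)]
    lam, k = fit_gumbel_moments(scores, n_space)
    print("lambda =", lam, "K =", k)
```

Note that K depends exponentially on lambda times the mean score, so a small
error in lambda becomes a large error in K. A maximum-likelihood fit, or a fit
restricted to the high-scoring tail, may well behave better on real data.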

A good paper to read is 
@article{improved-psiblast-2001,
	author={Sch\"affer, Alejandro A.  and Aravind, L. 
		and Madden, Thomas L. and Shavirin, Sergei
		and Spouge, John L. and Wolf, Yuri I.
		and Koonin, Eugene and Altschul, Stephen F.},
	title ="Improving the accuracy of {PSI-BLAST} protein database
	searches with composition-based statistics and other refinements",
	journal="Nucleic Acids Research",
	volume=29, number=14,
	year=2001,
	pages="2994-3005"
	}

Of course, the real reason that people routinely use the BLAST
e-values is not because the authors of BLAST have been very careful to
make the e-values as accurate as they can (although they have been
careful), but because many biologists have blind faith in their
computational tools.  One would expect wet-lab scientists to have a
healthy scepticism of any results, knowing how often experiments fail,
and how much bad data has made it out into the literature, but many
seem to have an almost mystical faith in anything produced by
computation.  (On the other hand, computational people seem to have an
almost mystical faith in wet-lab verification---expecting experiments
to be neat, quick deterministic tests like "if" statements in code.)

-- 
Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
Affiliations for identification only.



