estimating K and Lambda from an extreme value distribution

Kevin Karplus karplus at cheep.cse.ucsc.edu
Mon Feb 23 00:15:04 EST 2004


In article <cb92c3bc.0402181458.7cd25d94 at posting.google.com>, Ranjeeva wrote:
> Hi All,
> 
> I'm trying to fit a set of scores I get from searching a database of
> 1000 amino acid sequences with an HMM. I want to calculate a p-value
> for each matching score. My questions are:
> 
> a) How do you estimate the scaling factors K and Lambda to fit my
> scores (1000) to an extreme value distribution?
> 
> b) How do I then calculate a p-value/E-value from this distribution for
> a given score?

Fitting an extreme-value distribution to the scores from a real search
of a real database is a bad idea.  If there are any matches in the
database, they will make the tail you are trying to fit much fatter,
and your p-values will be very inaccurate.  You need to calibrate the
HMM on sequences that look like the real database sequences but do NOT
include any true positives.
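
To make the calibration step concrete, here is a minimal Python sketch,
not a definitive implementation.  It assumes you already have a list of
scores from null sequences (for example shuffled or otherwise
decoy sequences; how you generate them is up to you), and it uses a
simple method-of-moments fit of a Gumbel (type I extreme-value)
distribution rather than the maximum-likelihood fits that are generally
more accurate.  All function names (fit_gumbel_moments, p_value,
e_value, k_from_mu, sample_gumbel) and the parameter n are hypothetical,
introduced only for illustration.  The relation K*n = exp(lambda*mu)
assumes the form P(S >= x) = 1 - exp(-K*n*exp(-lambda*x)).

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(null_scores):
    """Method-of-moments estimate of (mu, lam) from NULL scores only.

    For a Gumbel distribution with location mu and scale 1/lam:
    variance = pi^2 / (6 lam^2) and mean = mu + gamma / lam.
    """
    mean = statistics.fmean(null_scores)
    sd = statistics.stdev(null_scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def p_value(score, mu, lam):
    """P(S >= score) for a single null sequence under the fitted Gumbel."""
    z = -lam * (score - mu)
    if z > 700.0:  # exp(z) would overflow; the probability is 1 anyway
        return 1.0
    # 1 - exp(-t), computed with expm1 to stay accurate for tiny t
    return -math.expm1(-math.exp(z))


def e_value(score, mu, lam, db_size):
    """Expected number of null sequences scoring >= score in db_size."""
    return db_size * p_value(score, mu, lam)


def k_from_mu(mu, lam, n):
    """K under the ASSUMED form 1 - exp(-K n exp(-lam x)); n is hypothetical."""
    return math.exp(lam * mu) / n


def sample_gumbel(rng, mu, lam, count):
    """Draw synthetic Gumbel scores by inverse-CDF sampling (for testing)."""
    out = []
    for _ in range(count):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
        out.append(mu - math.log(-math.log(u)) / lam)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for scores of the HMM against decoy sequences.
    null = sample_gumbel(rng, -20.0, 0.7, 5000)
    mu, lam = fit_gumbel_moments(null)
    print("mu = %.3f  lambda = %.3f" % (mu, lam))
    for s in (-18.0, -10.0, 0.0):
        print(s, p_value(s, mu, lam), e_value(s, mu, lam, 1000))
```

The fitted mu and lambda are then applied to the scores from the real
search: the E-value for a database of 1000 sequences is just 1000 times
the per-sequence p-value.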

There are several papers in the literature on doing such fits.
I believe a fairly recent one is

@ARTICLE{BAILEY98,
 KEY            ="BAILEY",
 AUTHOR         ="Bailey, TL and Gribskov, M.",
 TITLE          ="Methods and Statistics for Combining Motif Match Scores",
 JOURNAL        ="Journal of Computational Biology",
 YEAR           ="1998",
 
 month          ="SUMMER",
 volume         ="5",
 number         ="2",
 pages          ="211-221",
}

but I don't have a copy immediately at hand to verify that I have the
right citation.


-- 
Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
Affiliations for identification only.



