[Computational-biology] Re: ask help for sequences alignment
Kevin Karplus
via comp-bio At net.bio.net
(by karplus At cheep.cse.ucsc.edu)
Mon Dec 11 11:20:09 EST 2006
On 2006-12-09, jianhao cao <biocjh At gmail.com> wrote:
> i am writing to ask help for some question on sequences alignment.
> 1 what is the different between the P value and E value?
The P-value of an event with a given null model is the probability of
seeing that event under the null model in a single trial. For
sequence alignment, the null model is usually a crude model that
assumes that all letters are independent and drawn from the same
background distribution.
The E-value is the expected number of events in a series of trials.
If there are n trials, the E-value is n times the p-value.
Some people and programs (such as WU-BLAST) use a p-value of a
composite event---the event of having at least one occurrence in n trials:
P_n = 1 - (1-p_1)^n
This is approximately equal to the E-value (n * p_1) when both are
small, but gets uselessly close to 1 as the E-value gets bigger.
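A small Python sketch of the two quantities may make the comparison
concrete. The function names and the sample values of p_1 and n are
made up for illustration; only the two formulas come from the text above.

```python
import math

def e_value(p_single: float, n_trials: int) -> float:
    """Expected number of events in n trials: n * p_1."""
    return n_trials * p_single

def p_at_least_one(p_single: float, n_trials: int) -> float:
    """P_n = 1 - (1 - p_1)^n, computed via log1p/expm1 so that
    very small p_1 does not get lost to rounding."""
    return -math.expm1(n_trials * math.log1p(-p_single))

# Hypothetical (p_1, n) pairs, chosen only to show the two regimes.
for p1, n in [(1e-9, 1_000), (1e-6, 100_000), (1e-4, 100_000)]:
    print(f"p_1={p1:g} n={n}: E={e_value(p1, n):g} "
          f"P_n={p_at_least_one(p1, n):.6g}")
```

In the first two cases the two numbers nearly agree; in the last one the
E-value is 10 while P_n is pinned just below 1 and no longer
distinguishes "10 expected hits" from "1000 expected hits".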
> how to calculate the two value in a certain alignment?
Read Karlin and Altschul's paper:
@article{karlin90,
author="Karlin, Samuel and Altschul, Stephen F.",
title="Methods for Assessing the Statistical Significance of
Molecular Sequence Features by Using General Scoring Schemes",
journal=pnas,
year=1990, month=Mar,
volume=87,
pages="2264--2268"
}
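For ungapped local alignments, the Karlin-Altschul result is usually
written as E = K*m*n*exp(-lambda*S), with P = 1 - exp(-E) for at least
one hit scoring S or better. The sketch below just evaluates those two
formulas. The parameter values (lambda, K, query length, database size,
score) are illustrative assumptions, not values from the paper or from
any particular program; real values depend on the scoring matrix and the
background distribution.

```python
import math

def karlin_altschul_e_value(score: float, m: float, n: float,
                            lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S) for raw score S,
    query length m, and database length n."""
    return k * m * n * math.exp(-lam * score)

def p_from_e(e: float) -> float:
    """P(at least one hit) = 1 - exp(-E), computed stably for small E."""
    return -math.expm1(-e)

# Hypothetical inputs, for illustration only.
LAMBDA = 0.318      # assumed scale parameter
K = 0.134           # assumed Karlin-Altschul K
QUERY_LEN = 250     # assumed query length in residues
DB_LEN = 1e8        # assumed total database length in residues

for s in (40, 60, 80):
    e = karlin_altschul_e_value(s, QUERY_LEN, DB_LEN, LAMBDA, K)
    print(f"S={s}: E={e:.3g}  P={p_from_e(e):.3g}")
```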
> 2 why if the P value<0.001 in a sequence alignment, the result has a
> biological meaning?
It doesn't, necessarily. All a P-value <0.001 means is that there is
only one chance in a thousand of getting a score that good from the
null model. It says nothing about the biology---just that the null
hypothesis is not a good one. This could be because of a biological
phenomenon, or because the null model is ignoring some important
characteristic of the data that has nothing to do with the hypothesis
you think you are testing.
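A toy calculation shows how a crude null model can produce a tiny
P-value with no biology involved. Everything here is a made-up
example: two unrelated AT-rich DNA sequences of length 200 are compared
position by position, and the number of identical positions is scored
against two null models. The letter frequencies and the observed count
are assumptions chosen for illustration.

```python
import math

def binomial_tail(n: int, x: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(x, n + 1))

def match_probability(freqs: dict) -> float:
    """Chance that two independent letters drawn from freqs agree."""
    return sum(f * f for f in freqs.values())

LENGTH = 200
OBSERVED_MATCHES = 82   # hypothetical count of identical positions

# Null model 1: all four bases equally likely (match prob 0.25).
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Null model 2: AT-rich background (assumed frequencies).
at_rich = {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45}

for name, freqs in [("uniform", uniform), ("AT-rich", at_rich)]:
    p_match = match_probability(freqs)
    p_value = binomial_tail(LENGTH, OBSERVED_MATCHES, p_match)
    print(f"{name}: match prob={p_match:.2f}  P-value={p_value:.3g}")
```

Under the uniform null the P-value is far below 0.001; under the
AT-rich null the same observation is entirely unremarkable. The small
P-value rejected the uniform-composition assumption, not the hypothesis
that the sequences are unrelated.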
In general, bioinformaticians and biologists need to pay a lot more
attention to their null models---a lot of the trash I see published is
because people assume that invalidating a stupid null model somehow
establishes their bizarre hypothesis, when there are much simpler
explanations available.
------------------------------------------------------------
Kevin Karplus karplus At soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE) (Board of Directors & Chair of Education Committee, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.