Some myths concerning statistical hypothesis testing
Richard Vickery
Richard.Vickery at unsw.edu.au
Fri Nov 8 08:06:57 EST 2002
Glenn,
your posts are couched in a somewhat hostile manner which does not
encourage others to join in or to ask for clarification.
> 1.) a p-value is a conditional probability of the form p(A|B) where A is the
> observation and B is the truth of the null hypothesis.
>
> 2.) you don't know if B is true or false.
>
> Conclusion: whatever a p-value is, it cannot be a quantitative assessment of
> the truth of B because the meaning of the p-value is dependent on B and you
> don't know what B is. Now attack the premises or the conclusion. I dare you.
So p is the probability of A given B. I am not sure where truth comes
into it; the quantitative assessment is the conditional probability.
I know what B is (typically, that the two results are sampled from a
common population that is normally distributed, homoscedastic, etc.); I
just don't know if it is true.
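To make the conditional concrete: the p-value is computed assuming B
holds, i.e. by asking how often data at least as extreme as A would turn
up if the null were true. A minimal Python sketch (the sample size,
spread, and observed difference are illustrative numbers, not from any
real study):

```python
import random
import statistics

random.seed(1)

def simulated_p_value(observed_diff, n_per_group, sigma, n_sims=10000):
    """Estimate P(|difference| >= observed | B) by repeatedly drawing
    two groups from the SAME normal population (i.e. assuming the null)."""
    count = 0
    for _ in range(n_sims):
        a = [random.gauss(0, sigma) for _ in range(n_per_group)]
        b = [random.gauss(0, sigma) for _ in range(n_per_group)]
        if abs(statistics.mean(a) - statistics.mean(b)) >= observed_diff:
            count += 1
    return count / n_sims

p = simulated_p_value(observed_diff=1.0, n_per_group=10, sigma=1.0)
print(p)  # small: such a difference is rare when B (the null) holds
```

Note the conditioning: every simulated dataset is generated under B, so
the number that comes out says nothing directly about whether B itself is
true; that is the decision we then have to make.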
> > Marc does not say what follows in his paper, but this misconception
> > has produced a state of affairs in which a great deal of importance is
> > attached to findings before it is clear that the finding is reliable.
> > The result is that there is, all things being equal, a great deal of
> > discrepant results in the various scientific literatures that rely on
> > statistical significance testing. In contrast, for sciences in which
> > the reliability is demonstrated in each subject (usually repeatedly),
> > or "subject" if the preparation is not a whole animal, there is far
> > less failure to replicate (this is because such data are published
> > only when there have been numerous demonstrations of reliability
> > within and across subjects). For an example of how this is done, you
> > may examine my paper: The Effects of Acutely Administered Cocaine on
> > Responding Maintained by a Progressive-ratio Schedule of Food
> > Presentation, which is in press in Behavioural Pharmacology. Or, you
> > may examine virtually any paper in the Journal of the Experimental
> > Analysis of Behavior. Or you may obtain a copy of Sidman's Tactics of
> > Scientific Research, or even Claude Bernard's classic book.
>
> Mat: doh! you are doing the very same as the people you chastise! by
> repeating the experiments you are increasing your n, such that if there is a
> true difference it should become apparent.
>
> GS: Nonsense. What I am doing, and what others like me do is directly
> demonstrating the reliability. That's why it is not unheard of to publish
> data collected and analyzed "individual-subject style" with 3 subjects. And
> such data are, as I explained, generally proven to be reliable through
> direct and systematic replication. What "thinkers" like you do is increase
> the N because doing so will almost always result in differences even if the
> "effect" is virtually nonexistent (see below).
Glenn, it is very dependent upon what you work on. I record single
neurons. They just don't hang around long enough to do a lot of repeated
measures. I also can't see why you prefer 3 people tested 5 times to 15
people tested once, unless you need trained subjects, or you want to look
at intra- and inter-subject variability, which might be important for some
things. For many clinical trials the patient gets better with treatment,
and it is not ethical to make them sick again ;-)
> averaged together. And if, say, only two subjects showed the effect in
> question, I wouldn't publish the data, but I would strongly suspect that
> there was something worth pursuing, and I might try to figure out why I got
> the effect in only two of the animals.
Surely this depends on what the effect is. Isn't there a small
proportion of people who are HIV positive but never develop AIDS? Even if
they were 2 out of 100, they would be worth investigating. This is
really to do with being a good scientist, not a stats abuser. I don't
think anyone is disagreeing with this. This is a very different
situation from a controlled randomized trial where you are not exploring,
but simply testing a simple hypothesis.
> GS: No, it doesn't. It tells you that IF THE NULL HYPOTHESIS IS TRUE (which
> you don't know) there is a 5% or 1% chance of obtaining the data again.
> Since you don't know if the null hypothesis is true or not, you have no
> quantitative assessment of the likelihood of obtaining the observation,
But you're not interested in the likelihood of getting the observation -
you already have it. The issue is that if the likelihood of getting the
data was small given that the null hypothesis is true, we choose to take
a punt and say the null hypothesis is likely not true.
> GS: Think about this: if you have a drug that produces large effects in 40%
> of the sample, and no effect in the other 60%, one could obtain statistical
> significance if one increased the N enough. So now we have an effect that
> works in only 40% of the population and it is deemed important and reliable?
> If you are dying, you might want to try it, but only an insipid idiot would
> call it reliable. Yet this is, apparently, your version of "modern science."
> But, of course, in most experiments, not even the researcher may know how
> many of his subjects actually showed an effect. All he or she may know
> (because that is all they are paying attention to) is that p<.01. And
> certainly the reader usually has no clue as to how many of the subjects
> actually "had" the "effect." In medical research, fortunately, there is some
> pressure to pay close attention to the individual effects (BTW, Mat, if it
> is possible to judge an effect in an individual, what do you need statistics
> for?) . However, I argue, and occasionally some enlightened MD argues, that
> significance testing is dangerous. Sometimes you have nothing else but quite
> often you do.
Aren't we all on the same page? You plot the data. You look for sub-
groups and weird effects. You can test for some of these properties. If
everything looks like a homogeneous group then you can do some
inferential stats on them. In your example, the data would have two
peaks (at 0 and +x% effect) and would not be normally distributed. Anyone
testing this without caution is an idiot, but it does not make the
statistical tests wrong.
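Glenn's 40%-responder scenario is easy to simulate, and it does behave as
he says: with enough subjects, a mixed population in which most people
show no effect still yields a "significant" group difference. A sketch
(the responder fraction, effect size, and group size are illustrative
assumptions):

```python
import random
import statistics

random.seed(2)

N = 500  # a deliberately large group size
control = [random.gauss(0, 1) for _ in range(N)]
# Treatment: only 40% of subjects respond (shift of +1 SD); 60% show nothing.
treated = [random.gauss(1 if random.random() < 0.4 else 0, 1) for _ in range(N)]

def welch_t(x, y):
    """Welch's t statistic for the difference in group means."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = (vx / len(x) + vy / len(y)) ** 0.5
    return (statistics.mean(y) - statistics.mean(x)) / se

t = welch_t(control, treated)
print(round(t, 1))  # well above 1.96, i.e. "significant" at p < .05
```

Which is exactly why plotting comes first: the bimodality is visible in a
histogram of the treated group, but invisible in the t statistic alone.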
> GS: Usually the null hypothesis is, in the simplest case, that there is no
> difference between the control group (or control condition as in the paired
> t-test, which is the simplest form of repeated-measures ANOVA; hehehe) and
> experimental group. So, yes, if you are doing ANYTHING it is likely to have
> SOME effect, and if you throw enough subjects at it, you will eventually
> reach a point where you "obtain statistical significance." This is, in fact,
> usually what happens in the sort of "science" you are talking about. BTW, in
> physics and many other sciences, what functions as the null hypothesis is,
> in fact, the scientist's own prediction! That is, the scientist does
> everything in his or her power to reject their own prediction, and when this
> does not occur they begin to assert the importance of their hypothesis. In
> contrast, "scientists" like you do everything in their power to reject the
> strawman notion that there is no effect which, as I have pointed out, is
> almost certain to be false.
Come on Glenn, I don't think that too many papers are pointing out a 5%
difference even if it is significant at p<0.001. Maybe you've had a bad
experience lately that you want to share? Clinical significance involves the
idea that the effect is worth risking a change in therapy and so must be
a substantial improvement (not 5%) as well as a statistically significant
improvement.
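The statistical-versus-clinical distinction is worth a back-of-envelope
number. For a two-group comparison, the sample size at which a given mean
difference d just reaches z = 1.96 follows from the standard error of the
difference; a sketch (d and sigma are illustrative values):

```python
# z = d / (sigma * sqrt(2/n))  =>  n = 2 * (z * sigma / d)**2 per group
def n_for_significance(d, sigma, z=1.96):
    """Subjects per group at which a true difference d of trivial size
    first reaches the conventional z = 1.96 threshold."""
    return 2 * (z * sigma / d) ** 2

# A difference of 5% of a standard deviation:
print(round(n_for_significance(d=0.05, sigma=1.0)))  # ~3073 per group
```

So a trivial 0.05-SD difference does become "significant" around three
thousand subjects per group, which is precisely why the size of the
improvement has to be judged separately from the p-value.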
Yours in enquiry
Richard Vickery