# Some myths concerning statistical hypothesis testing

Glen M. Sizemore gmsizemore2 at yahoo.com
Wed Nov 6 18:22:18 EST 2002

```
> My apologies if this gets double posted....I gave my lame server two days to
> post it, but they lose about 30% of what I try to post. Anyway I thought
> that this would interest/anger many of you.
>
> Some myths concerning statistical hypothesis testing (from a recent paper
>
> 1.) Tests of statistical significance do not provide a quantitative estimate
> of the reliability of the result.
>
> 2.) Tests of statistical significance do not estimate the probability that
> the results were due to chance.
>
> 3.) Tests of statistical significance usually do not answer a question to
> which the answer is unknown.

Mat: was there any reasoning behind these statements?

GS: What do you suppose, Mat?

Mat: any mathematics to back it up or did he just 'say' and 'argue' the
above?

GS: He uses the definitions themselves.

Mat: Unless he gave a rigorous mathematical proof that the above are
correct then it is pointless arguing about them as statistics lies in the
realm of mathematics, obviously. You don't 'discuss' mathematics.

The first assumption is incorrect even before we begin to debate. Statistics
churns out numbers, so it is by definition quantitative. What those numbers
actually mean is another matter.

GS: You really should learn to be patient. The first is not an assumption,
it is a fact. A p-value expresses a conditional probability. That is, a
p-value expresses the probability of obtaining the observation in question
GIVEN THAT THE NULL HYPOTHESIS IS TRUE. Since it is not known whether or not
the null hypothesis is true (at least ostensibly, but see below), the notion
that a small p-value means the finding is highly likely to be replicated is
clearly false. The only way to demonstrate the reliability of data is to
replicate the finding. Marc does not say what follows in his paper, but this
misconception has produced a state of affairs in which a great deal of
importance is attached to findings before it is clear that the finding is
reliable. The result is that there are, all things being equal, a great many
discrepant results in the various scientific literatures that rely on
statistical significance testing. In contrast, for sciences in which the
reliability is demonstrated in each subject (usually repeatedly), or
"subject" if the preparation is not a whole animal, there is far less
failure to replicate (this is because such data are published only when
there have been numerous demonstrations of reliability within and across
subjects). For an example of how this is done, you may examine my paper: The
Effects of Acutely Administered Cocaine on Responding Maintained by a
Progressive-ratio Schedule of Food Presentation, which is in press in
Behavioural Pharmacology. Or, you may examine virtually any paper in the
Journal of the Experimental Analysis of Behavior. Or you may obtain a copy
of Sidman's Tactics of Scientific Research, or even Claude Bernard's classic
book.

Mat: The second point - is the argument that the procedures are incorrect
(i.e. the algorithm) or that the underlying basic assumptions are
incorrect (e.g. normal distribution)? If it is the former, then again
it's rubbish; if it's the latter then this argument is well known and he
presents nothing new.

GS: Wrong. Remember that a p-value represents the probability that one will
observe certain data given that the null hypothesis is true. To assert
that the p-value is really the probability that the null hypothesis is true
given the data (which is the same thing as saying it represents the
probability that the observed data are "due to chance") is to "reverse the
conditionality." As Marc says, this is tantamount to saying that the
probability of rain given that it is cloudy is the same as the probability
that it is cloudy given that it is raining. Think about it when your blood
pressure returns to normal.

Mat: What does he mean by 'answer'? no, stats rarely gives categorical yes
or no (which is in a sense a qualitative answer, which he's previously
argued stats does give)[...]

GS: No, that's not what he said, as should now be clear (as well as being a
pile of crap; see below).

Mat: [...]but that's not what people expect. Stats is
used to get a better understanding by measuring data pertinent to a
particular question, with a host of well known caveats as to the
conclusions it allows to be drawn, and it never produces a 'certain'
answer either way. Everyone knows this.

GS: Now it is my turn to use the term "rubbish" (here in the States, we
usually call it "garbage," but BS is probably more appropriate). If you
"obtain significance" you write a paper and submit it. If you do not, you
throw the data in the garbage (sounds pretty damn "categorical" to me), or
you just "increase the N" until you have found your "truth" (the fact that
all you have to do usually to reject the null hypothesis is simply add more
subjects should tell you something). You know this is true. But, in any
event, you are not on the right track. The point is that the strawman null
hypothesis is almost always not true. Marc, quoting Kraemer (ref. on
request), writes, "something nonrandom is almost always going on, and it
seems a trivial exercise to redemonstrate that fact." At the end of this
section Branch concludes, "Perhaps it is not so bad that significance tests
do not estimate the truth of the null hypothesis, because we already know
that it is false. Also, a procedure designed to help us decide about
something we already know could hardly be one that would provide
quantitative estimation of reliability."

Cordially,

Glen

"mat" <mats_trash at hotmail.com> wrote in message

```
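
Two of the points argued in the post can be checked with a few lines of arithmetic. The sketch below is not from Branch's paper or from the post; it is a minimal illustration under stated assumptions. The first function applies Bayes' rule to show that P(significant result | null true) and P(null true | significant result) are different quantities; the prior fraction of true nulls and the power are hypothetical inputs. The second function computes the two-sided p-value of a one-sample z-test with known standard deviation, assuming the observed mean sits exactly at a small true effect, to show how adding subjects alone drives the p-value down. All function names and numeric values are made up for illustration.

```python
import math


def prob_null_given_significant(prior_null, alpha, power):
    """P(H0 true | test is significant), by Bayes' rule.

    prior_null: assumed fraction of tested hypotheses where H0 is true (hypothetical)
    alpha:      P(significant | H0 true), the significance level
    power:      P(significant | H0 false) (hypothetical)
    """
    significant_and_null = alpha * prior_null
    significant_and_alt = power * (1.0 - prior_null)
    return significant_and_null / (significant_and_null + significant_and_alt)


def z_test_p_value(effect, sigma, n):
    """Two-sided p-value of a one-sample z-test against a null mean of 0.

    Assumes the observed sample mean equals `effect` exactly and that the
    population standard deviation `sigma` is known.
    """
    z = abs(effect) * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Reversed conditionality: alpha is 0.05, yet the probability that the
    # null is true given a significant result can be far from 0.05.
    posterior = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.5)
    print(f"P(significant | H0) = 0.05, P(H0 | significant) = {posterior:.3f}")

    # "Increase the N": a fixed, tiny effect becomes significant with enough subjects.
    for n in (100, 1_000, 10_000):
        p = z_test_p_value(effect=0.05, sigma=1.0, n=n)
        print(f"n = {n:>6}: p = {p:.3g}")
```

With the hypothetical inputs above (90% true nulls, power 0.5), the probability that the null is true given a significant result works out to 0.045 / 0.095, roughly 0.47, even though the significance level is 0.05. The second loop uses an effect of 0.05 standard deviations throughout; only n changes between rows.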