Some myths concerning statistical hypothesis testing
Glen M. Sizemore
gmsizemore2 at yahoo.com
Thu Nov 7 12:51:38 EST 2002
Glen M. Sizemore wrote:
> 1.) Tests of statistical significance do not provide a quantitative
> estimate of the reliability of the result.
SM: They don't. Do neuroscientists generally believe this?
GS: Oh yeah.
> 2.) Tests of statistical significance do not estimate the probability that
> the results were due to chance.
SM: They estimate the probability of getting at least as
impressive results by fluke.
GS: You know something? I like you.
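A minimal sketch of what "at least as impressive results by fluke" means, using a permutation test. This is only an illustration, not a method either poster proposed: the data values, the function name permutation_p_value, and the choice of the mean difference as the statistic are all assumptions. The number it returns is the probability of the data given chance alone, not the probability that chance produced the data.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often random relabelling gives a mean difference
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any real link between label and value
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # small tolerance so rounding error does not hide exact ties
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # the +1 terms count the observed labelling itself
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical measurements, invented for this example only.
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]

p = permutation_p_value(treated, control)
print(f"fraction of shuffles at least as extreme: {p:.4f}")
```

A small value here says only that such a difference is rare under random relabelling. It says nothing about how often the result would replicate (point 1) or about the probability that the null hypothesis is true (point 2).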
> 3.) Tests of statistical significance usually do not answer a question to
> which the answer is unknown.
SM: Also true.
Statisticians with knowledge of Bayesian theory have claimed
this since Pearson, Neyman and Fisher began publishing their
works in the early 20th century. This is not new. Unfortunately,
the majority of contemporary researchers didn't learn about Bayesian
statistics in their applied statistics class.
GS: Tell me about Bayesian statistics, please!
SM: And what actually
matters seems to be the probability of getting a paper accepted,
not the reliability of the result.
GS: The use of inferential statistics to "decide" whether or not data are
any good has all but devastated psychology, behavioral neuroscience, and
other fields as well. Given a twisted interpretation of "operational
definitions," and the fact that most of them have the philosophy skills of a
slug, mainstream psychology and behavioral neuroscience are failures,
despite all the auto-backpatting that goes on. Decade of the brain, indeed!
They seem to be trying to outdo each other's stupidity. Let's call it
"keeping up with the Joneses" (a private joke).
SM: And since this usually means
convincing a referee who knows nothing about statistics, one has
to speak a language the referee knows. And that means using
"tests" that don't actually answer the question being asked.
GS: Fortunately, I am back in a position where I don't usually have to
publish in journals that require inferential statistics. But the number of
such journals is small, and rapidly diminishing. Ironically, many journals
accept "individual-subject" data for nonhuman primates (do you know what I
am talking about?) because you don't "need" a "large N" to make judgements
(monkeys are expensive). It is ironic, of course, because such studies are
"tolerated" when in fact the method is far superior to the nonsensical
significance-testing approach. What a bunch of morons.
SM: There are generally two penalties for using proper statistical
methods:
- "Classical" tests overestimate the true significance,
sometimes by an order of magnitude. This is good for getting
papers accepted. And nobody gets fried for using "classical"
statistics.
- Referees don't know Bayesian statistics and will ask
for the flawed "classical tests" anyhow.
As long as science is about getting papers published and researchers aren't
educated about Bayesian statistics,
this will not change. Scientists generally don't know
what the tests imply, and don't care either.
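One way to put a rough number on the claim that classical tests overstate significance is the -e * p * ln(p) calibration of Sellke, Bayarri and Berger, which gives a lower bound on the Bayes factor in favour of the null for p < 1/e. This is a swapped-in illustration, not necessarily the calculation SM has in mind. The 50:50 prior and the function names are assumptions made for the sketch.

```python
import math

def min_bayes_factor(p):
    """Lower bound on the Bayes factor for the null: -e * p * ln(p).
    The bound is only valid for 0 < p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)

def min_posterior_null(p, prior_null=0.5):
    """Smallest posterior probability of the null under the bound.
    prior_null = 0.5 is an assumed, not a recommended, prior."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = min_bayes_factor(p) * prior_odds
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} null is still at least "
          f"{min_posterior_null(p):.3f} probable")
```

Under these assumptions a p-value of 0.01 leaves the null with a posterior probability of roughly 0.11, about ten times the figure a casual reader takes from the p-value.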
GS: Amen brother. But.....tell me about Bayesian statistics. I never even
heard of them!
SM: There is a way out of this dilemma, that anyone can
(and probably should) use:
There is a third school of statisticians, those who rely on
graphing data. The idea is that one should know enough
about plotting data, and ways to plot data, in order to
evaluate the results visually (this is usually possible).
If the effect doesn't show on a properly selected graph,
it is not interesting regardless of the "significance",
as the effect size must be negligible. But if the graph is
convincing, classical statistics still doesn't matter.
Investigations have shown that visual evaluation is superior
to "tests of significance" on properly graphed data sets,
and that the correspondence between visual judgement
and exact Bayesian tests is usually high.
There are almost only good aspects to this approach. It
evaluates the effect size as well as the "significance".
Publishers like papers with nice figures. One can still
include classical statistics to convince undereducated
referees.
William Cleveland's books "Visualizing Data" and "The Elements
of Graphing Data" are good places to start.
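The point that an effect invisible on a graph can still be "significant" follows from simple arithmetic: for a fixed standardized difference d, the test statistic grows with the square root of the sample size. A minimal sketch, assuming a two-sample z-test with equal group sizes and known unit variance. The value d = 0.02 and the function names are hypothetical.

```python
import math

def z_for_standardized_difference(d, n_per_group):
    """z statistic for a mean difference of d standard deviations
    between two groups of n_per_group observations each."""
    return d * math.sqrt(n_per_group / 2)

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

d = 0.02  # assumed effect: two hundredths of a standard deviation
for n in (100, 10_000, 1_000_000):
    z = z_for_standardized_difference(d, n)
    print(f"n per group = {n:>9,}  z = {z:6.2f}  "
          f"p = {two_sided_p_from_z(z):.3g}")
```

The effect size never changes, and two distributions 0.02 standard deviations apart overlap almost completely on any plot, yet the p-value can be driven as low as one likes by collecting more data.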
GS: I - and others like me - are in this category. In addition, we insist
that data from individual subjects not be averaged together. Basically, if
you are not familiar with the approach, baselines of behavior are
established and independent variables are repeatedly introduced and
withdrawn. The effects are judged by the difference between baseline
performance and the performance under various levels of the independent
variable. Basically, if a variable produces an effect outside of the range
of baseline variation then one says that variable has an effect. Data
analysis typically consists of plotting the data in many different ways so
that one may make a judgement. This has several beneficial effects:
1.) since the effect is repeatedly demonstrated in each subject (and the
usual number of subjects is 3-6) one may be very certain that the effect is
reliable upon direct replication. By the time such data are submitted for
publication the effect has essentially been replicated 20 or 30 times.
2.) the data are relevant to the behavior of individual subjects. Data
averaged across subjects may have no relevance to behavior. For example, it
is frequently said that the function relating rate of response to dose of
self-administered cocaine is an inverted u-shaped function under ratio
schedules. However, if one looks at the data for individual subjects, one
usually finds that low doses maintain about as much responding as saline,
and as the dose gets larger the rate of response suddenly jumps to its
maximum. Further increases result in decreases in rate of response. Thus,
there really is no ascending portion of the function. However, that is not
the picture that emerges when the data are averaged together. The averaged
curve completely misrepresents the actual functions.
3.) the scientist must take personal responsibility for judging the quality
of the data. As it stands now, if you publish a paper, and the effect cannot
be replicated, your reputation does not suffer.....after all, it is not your
fault that statistics came out the way they did!
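The averaging artifact described in point 2 can be reproduced with made-up numbers. In this sketch every simulated subject stays at the saline rate until its own threshold dose, jumps straight to its maximum, and then declines. The rates, the decline factor, and the thresholds are all invented for illustration and are not taken from any self-administration data.

```python
SALINE_RATE = 2.0   # assumed response rate maintained by saline
PEAK_RATE = 100.0   # assumed maximum response rate
DECLINE = 0.5       # assumed drop per dose step past the threshold

def subject_curve(threshold_index, n_doses):
    """Step-then-decline dose-response curve for one subject."""
    curve = []
    for i in range(n_doses):
        if i < threshold_index:
            curve.append(SALINE_RATE)  # low doses act like saline
        else:
            curve.append(PEAK_RATE * DECLINE ** (i - threshold_index))
    return curve

def count_increases(curve):
    """Number of dose steps at which the response rate goes up."""
    return sum(1 for a, b in zip(curve, curve[1:]) if b > a)

n_doses = 5
# Subjects differ only in the dose at which responding switches on.
subjects = [subject_curve(t, n_doses) for t in (1, 2, 3)]
mean_curve = [sum(col) / len(col) for col in zip(*subjects)]

for i, curve in enumerate(subjects, start=1):
    print(f"subject {i}: {curve}  increases: {count_increases(curve)}")
print(f"group mean: {[round(v, 1) for v in mean_curve]}  "
      f"increases: {count_increases(mean_curve)}")
```

Each simulated subject shows a single jump and no graded ascent, while the group mean climbs over several doses before falling: an inverted U that describes none of the individuals.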
"Sturla Molden" <sturla at molden_dot_net.invalid> wrote in message
news:aqdgde$37u$1 at tyfon.itea.ntnu.no...
More information about the Neur-sci