There are lies, there are damned
lies and then there are statistics. At the heart of almost
every research paper lies a P value, a number that
sums up the statistical validity of research results.
The lower the P value, the less likely it is that
the results are a fluke. Large numbers of participants
tend to drive the P value down, as do wide gaps
in outcome between treatment and placebo groups.
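
To make that relationship concrete, here is a minimal sketch, not drawn from the article or the study: it simulates treatment and placebo outcomes and compares them with a two-sample t-test. The group sizes and effect sizes are illustrative assumptions, chosen only to show that bigger samples and wider gaps between groups push the P value down.

    # A minimal sketch (illustrative assumptions, not from the study):
    # simulated treatment vs. placebo outcomes compared with a t-test.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def p_value(n_per_group, effect):
        # Placebo outcomes centred on 0, treated outcomes shifted by `effect`.
        placebo = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        treated = rng.normal(loc=effect, scale=1.0, size=n_per_group)
        return stats.ttest_ind(treated, placebo).pvalue

    for n in (20, 200, 2000):
        for effect in (0.1, 0.5):
            print(f"n={n:4d}  effect={effect:.1f}  P={p_value(n, effect):.4f}")

Running it shows the P value shrinking as either the sample size or the gap between groups grows.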
It's widely accepted that a P
value below 0.05 is a sign of statistical significance.
With so much research being published, this provides
a quick and handy gauge of the statistical strength
of a study. Why bother reading the detailed methodology
when all of these factors can be neatly summed up in
a single figure? It's a great shortcut if the
P value can be trusted. But a review published
May 28 in BMC Medical Research Methodology
suggests that this trust might be misplaced.
The P value is easy to
read, but hellishly complicated to generate, and producing
it involves a set of skills that most clinicians lack. In fact,
doctors' dread of the Cox proportional hazards regression
model or the Poisson distribution is one of the major obstacles
to research. Only the biggest trials can afford the
luxury of a specialized statistician. The rest must
muddle along hoping that peer review will catch any
howlers.
Emili García-Berthou and Carles
Alcaraz, of the University of Girona in Spain, set out
to learn if the system is working. Their conclusion?
Well, no, not really. No fewer than 11.6% and 11.1%
of the statistical results published in Nature and
the British Medical Journal (BMJ), respectively,
during 2001 were wrong. A whopping 38% of the papers
in Nature contained at least one such error, as did
25% in the BMJ.
Most of the inaccuracies appeared
to be due to errors in transcription or in rounding-off,
rather than to bad math. In other words, most errors
crept in at the write-up and checking stage, the very
point at which they are supposed to be caught.
Mercifully, only one of the 28
errors found actually turned a nonsignificant result
into an apparently significant one. But that appears
to be mostly a matter of luck, because 12% of the errors
changed the reported P value by at least one order
of magnitude.
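
One way to see how such a slip plays out is to recompute the P value implied by a reported test statistic and compare it with the P value as printed. The sketch below uses hypothetical numbers, not figures taken from the study, and the choice of a t statistic and the matching tolerance are assumptions made for illustration.

    # A minimal sketch with hypothetical numbers (not taken from the study):
    # recompute the two-sided P value implied by a reported t statistic and
    # flag cases where it does not match the P value as printed.
    from scipy import stats

    def two_sided_p(t, df):
        # Two-sided P value implied by a t statistic with df degrees of freedom.
        return 2 * stats.t.sf(abs(t), df)

    # (reported t, degrees of freedom, P value as printed)
    reports = [
        (2.10, 25, 0.046),   # congruent: recomputed P matches the printed one
        (2.10, 25, 0.0046),  # a dropped digit shifts P by an order of magnitude
        (1.71, 25, 0.04),    # a slip makes a nonsignificant result look significant
    ]

    for t, df, printed in reports:
        recomputed = two_sided_p(t, df)
        flag = "ok" if abs(recomputed - printed) < 0.005 else "mismatch"  # rough tolerance
        print(f"t={t}, df={df}: printed P={printed}, recomputed P={recomputed:.3f} ({flag})")

The third line in the table is the troubling case: the statistic itself implies a P value of about 0.10, but the printed figure slips under the 0.05 threshold.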
"Although these kinds of errors
may leave the conclusions of a study unchanged, they
are indicative of poor practice," said the authors.
"The quality of research and scientific papers need
improvement and should be more carefully checked and
evaluated in these days of high publication pressure."
They suggested that one way to
minimize the errors would be for journals to publish
the raw data on the internet. Richard Smith, editor
of the BMJ, agreed that such an approach might
help. Philip Campbell, editor-in-chief of Nature,
said his journal has changed its editing practices since
2001, but added that Nature will examine the
study's findings before deciding whether further changes
are needed.