Statistical errors in neuroscience: how a mouse turned into an elephant

Analyzing a large corpus of the neuroscience literature, we found that the same statistical error – comparing effect sizes by comparing their significance levels – appears throughout even the most prestigious journals in neuroscience.

Early in 2011 I reviewed two manuscripts for Nature Neuroscience. In both manuscripts some of the main conclusions were based on a statistical error, which I realized I had seen many times before: the researchers wanted to claim that one effect (for example, a practice effect on neural activity in mutant mice) was larger or smaller than the other effect (the practice effect in control mice). To support this claim, they needed to report a statistically significant interaction (between amount of practice and type of mice), but instead they reported that one effect was statistically significant, whereas the other effect was not. Although superficially compelling, the latter type of statistical reasoning is erroneous because the difference between significant and not significant need not itself be statistically significant.
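The fallacy is easy to reproduce in simulation. The sketch below uses purely hypothetical numbers (not data from any study): two practice effects of similar true size are drawn with small samples, so one effect may clear the p < .05 threshold while the other does not, even though the correct test – a direct comparison of the two effects, i.e. the interaction – finds no reliable difference between them.

```python
# Hypothetical illustration of "significant vs. not significant" reasoning.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20  # small samples, as is common in such experiments

# Practice effects (gain scores) in two groups; the true effects are similar.
mutant_gain = rng.normal(0.5, 1.0, n)    # practice effect in "mutant" mice
control_gain = rng.normal(0.25, 1.0, n)  # practice effect in "control" mice

# Each effect tested separately against zero (the tempting but wrong route):
t_mut, p_mut = stats.ttest_1samp(mutant_gain, 0.0)
t_con, p_con = stats.ttest_1samp(control_gain, 0.0)

# The correct question: do the two effects differ from EACH OTHER?
t_diff, p_diff = stats.ttest_ind(mutant_gain, control_gain)

print(f"mutant effect vs. zero:   p = {p_mut:.3f}")
print(f"control effect vs. zero:  p = {p_con:.3f}")
print(f"difference of effects:    p = {p_diff:.3f}")
```

Depending on the random draw, the first test may be significant and the second not, while the third – the only one that licenses a claim about a *difference* between effects – is far from significant.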

I decided to ask the editor if Nature Neuroscience would be interested in publishing a brief ‘letter to the editor’ in which I would point out this common mistake. To my surprise, she wrote back to say she would prefer a full-length opinion article if an extensive literature analysis confirmed my claim about the ubiquity of the error. Wow, a rare opportunity to write an article for a Nature journal—and about statistics, something I had never written about! Quite a challenge. I soon realized I needed to ask some colleagues, one of them a statistical expert, to help me carry out the literature analysis.

Together, we reviewed 513 recent neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience), and found that 78 used the correct procedure and 79 used the incorrect procedure: the error was even more common than I’d expected. In our article we reported the literature analysis and various scenarios in which the erroneous procedure is particularly alluring. The reviewers of our manuscript understood the urgency of the topic and supported publication. In September 2011, the article was published.

What we had not anticipated was the huge amount of attention our article would receive outside academia. Our research was featured in several prominent European newspapers, in less prominent newspapers such as de Pers and the Leidsch Dagblad, in numerous science blogs, and in countless Twitter messages. I just googled the exact title of our article and received 75,000 hits. The article was well-timed, given that the media and general public have become increasingly interested in science and, in particular, bad science… But who would have thought an article on statistics would ever draw so much attention? It was certainly my first experience with massive media attention, which, I realized, requires rather different skills than those I'd needed so far in my career.


Annelies de Haan

[In response to Micheal on February 11, 2013 at 17:51]:
"Nice article, but how do 78 and 79 add up to 513?"

Nicely spotted, but presumably not all of the articles examined needed to make this comparison (in such a way that an interaction SHOULD have been reported, whether it was or wasn't). To quote the article in question: "In 157 of these 513 articles (31%), the authors describe at least one situation in which they might be tempted to make the error." (Nieuwenhuis, Forstmann, & Wagenmakers, 2011, pg. 1105).


"Together, we reviewed 513 recent neuroscience articles ... and found that 78 used the correct procedure and 79 used the incorrect procedure"

Nice article, but how do 78 and 79 add up to 513?

Eefje Poppelaars

I really enjoyed reading this; an inside look at how one could get to publish in Nature.

Mark de Rooij

[In reaction to 'confused' on February 7, 2013 at 17:35]

Well, you are seriously confused!

In your set-up, where you want to see whether treatments A and B differ, there cannot be an interaction (an interaction of group (A/B) with what?).

So the actual question is not whether groups A and B differ but whether the relationship between variable X and Y is different for groups A and B. To answer this question you need an interaction and you cannot just look at the relationship between X and Y separately for groups A and B.
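In the X–Y case described above, the interaction can be tested in a single regression: include group, X, and their product, and test the product term. Below is a minimal sketch using the statsmodels formula interface; all variable names and numbers are hypothetical, chosen only to illustrate the test.

```python
# Hypothetical data in which the X-Y slope differs between groups A and B.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 50  # per group

x = rng.normal(size=2 * n)
group = np.repeat([0, 1], n)  # 0 = group A, 1 = group B
# True slopes: 0.2 in group A, 0.8 in group B (interaction of 0.6).
y = 0.2 * x + 0.6 * x * group + rng.normal(size=2 * n)

df = pd.DataFrame({"x": x, "group": group, "y": y})

# "y ~ x * group" expands to x + group + x:group;
# the x:group coefficient tests whether the slope differs by group.
fit = smf.ols("y ~ x * group", data=df).fit()
print(fit.pvalues["x:group"])
```

Fitting X–Y regressions separately per group and noting that one slope is significant while the other is not would be exactly the error the article describes; the x:group term is the test that actually addresses the question.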

To do justice to science, I think it is appropriate to say that the idea comes from Gelman and Stern in their paper "The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant", The American Statistician, 2006, pp. 328-331.


Why would you need to use an interaction? Why can't you just compare the two? For example, by testing whether the effect of treatment A (taken as a baseline) is statistically significantly different from the effect of treatment B.
A test for an interaction effect would do the job, but a direct comparison is, well, more direct...