Your brilliant PhD student ran an experiment last week that investigated whether chanting the words “unicorns, Brexit, fairies” repeatedly every morning before dawn raises people’s estimates of the likelihood that they will win the next lottery in comparison to a control group that instead chants “reality, reality, reality”. The manipulation seems to have worked, as the estimates in the experimental group are higher than in the control group. A t-test is highly significant (p < .0001), confirming that the effect was highly unlikely to be due to chance alone, and therefore was likely the result of the experimental intervention. Your student is ready to write this up and submit it to the Journal of Fantastical Politics.
Not so fast, please!
That significant t-test did not tell you that the effect was unlikely to be due to chance alone.
What it did tell you is that if chance were the only effect present, then it is unlikely that you would have observed a t-value that large or larger.
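To make the distinction concrete (using the standard notation of $H_0$ for the null hypothesis, $T$ for the test statistic, and $t_{\text{obs}}$ for the value actually observed), the p-value is a statement about the data given the hypothesis, not about the hypothesis given the data:

$$
p = P\!\left(|T| \ge t_{\text{obs}} \,\middle|\, H_0\right) \;\neq\; P\!\left(H_0 \,\middle|\, |T| \ge t_{\text{obs}}\right).
$$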
The difference between “unlikely to be due to chance alone” and “if chance were the only effect present, then it is unlikely…” may be semantically subtle but it is statistically and conceptually profound.
So profound, in fact, that it may destroy lives.
Sally Clark was an English solicitor who in 1999 was found guilty of murdering two of her sons, who had died suddenly as babies, apparently from sudden infant death syndrome (SIDS). Her conviction rested largely on statistical evidence that the chances of two children in the same family dying of SIDS were only 1 in 73 million. That’s a very small p-value indeed, and one might be tempted to support the court’s decision to reject the null hypothesis of innocence (i.e., reject the idea that the deaths were due to SIDS).
The court’s decision was, alas, statistically deeply flawed, as the Royal Statistical Society pointed out in a letter to the Lord Chancellor in 2002, after Sally Clark had been languishing in prison for several years. In the letter, Peter Green, the President of the Royal Statistical Society, argued:
“The jury needs to weigh up two competing explanations for the babies’ deaths: SIDS or murder. The fact that two deaths by SIDS is quite unlikely is, taken alone, of little value. Two deaths by murder may well be even more unlikely. What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation.”
If you want to walk through the steps that underlie this conclusion, here is a brief but instructive video that explains the statistics involved in the Sally Clark case:
The video suggests that the chance of Sally Clark being innocent was around 50%. Even that figure is likely an underestimate, because another mathematician involved in the case, Ray Hill, put the odds of her innocence at somewhere between 9 to 1 and 17 to 1.
Sadly, although Sally Clark was released in 2003 after her conviction was overturned, largely because the statistical evidence against her had been discredited, she never recovered from the trauma and died in 2007.
Let us return from a miscarriage of justice to the experiment of your brilliant PhD student. Let’s prevent a miscarriage of statistical inference.
That p-value of <.0001 tells you how likely an outcome at least as extreme as the one observed was under the null hypothesis of chance alone—in the same way that the probability of two children dying from SIDS can be calculated (although it’s much higher than 1 in 73 million, which reveals another flaw in the prosecution’s case against Sally Clark). But the p-value does not tell you how likely your PhD student’s result would have been if the null hypothesis were false—in the same way that the 1 in 73 million figure levelled against Sally Clark did not consider the likelihood of a mother having murdered two of her children.
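For the record, the widely reported 1 in 73 million figure was obtained by squaring an estimate of roughly 1 in 8,543 for a single SIDS death in a family with the Clarks’ characteristics, thereby treating the two deaths as independent:

$$
\left(\tfrac{1}{8543}\right)^{2} \approx \tfrac{1}{73{,}000{,}000}.
$$

Because genetic and environmental risk factors are shared within a family, a second SIDS death becomes considerably more likely once a first has occurred, so the true probability is much higher than the squared figure suggests; that is the flaw alluded to above.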
Perhaps most crucially, the p-value tells you nothing about the relative likelihood of the hypotheses you are interested in—guilt or innocence in Sally Clark’s case, and the null hypothesis versus its alternatives in the case of your brilliant PhD student.
In a recent article on inference in psychology, Jeff Rouder and colleagues helpfully rephrased Peter Green’s letter to the Lord Chancellor to drive home this point:
“The researcher needs to weigh up two competing explanations for the data: The null hypothesis or the alternative hypothesis. The fact that the observed data are quite unlikely under the null hypothesis is, taken alone, of little value. The observed data may well be even more unlikely under the alternative hypothesis. What matters is the relative likelihood of the data under each hypothesis, not just how unlikely they are under one hypothesis.”
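The “relative likelihood” in that restatement has a standard form: it is the ratio of the probability of the data under the two hypotheses, better known as the Bayes factor:

$$
\mathrm{BF}_{10} \;=\; \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}.
$$

A p-value reports only how surprising the data are under $H_0$; the Bayes factor weighs that surprise against how well the alternative accounts for the same data.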
So where does this leave us?
If we cannot use p-values to draw reliable conclusions, how can we avoid miscarriages of statistical inference?
One answer was provided by an 18th-century clergyman, the Reverend Thomas Bayes. His fundamental contribution was presented in An Essay towards solving a Problem in the Doctrine of Chances, which was read to the Royal Society posthumously in 1763.
In a nutshell, Bayes’ theorem resolved the problem known as “inverse probability”, which is precisely the problem we wish to solve in statistical inference: A t-test gives us the probability of an event occurring conditional upon a state of the world (namely, that the null hypothesis is true). What researchers want, however, is the inverse—namely an insight into the likely state of the world given the statistical evidence at hand.
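In its odds form, Bayes’ theorem makes that inversion explicit: the plausibility of the hypotheses after seeing the data is the plausibility before seeing the data, updated by how well each hypothesis predicted what was observed:

$$
\underbrace{\frac{P(H_1 \mid \text{data})}{P(H_0 \mid \text{data})}}_{\text{posterior odds}}
\;=\;
\underbrace{\frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}}_{\text{Bayes factor}}
\;\times\;
\underbrace{\frac{P(H_1)}{P(H_0)}}_{\text{prior odds}}.
$$

It is the left-hand side, the probability of the hypotheses given the evidence, that researchers (and jurors) actually want.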
To date, quite frequent(ist)ly, researchers have achieved that desired conclusion by a process of wishful misinterpretation of p-values (the statistical equivalent of pre-dawn incantation of “unicorns, Brexit, fairies”).
A better way of obtaining the desired conclusion is to discard conventional null-hypothesis testing and replace it with Bayesian statistics, as more and more researchers have come to realize.
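For readers who like to see the arithmetic spelled out, here is a minimal sketch of that Bayesian calculation in Python. All of the numbers are hypothetical placeholders, not estimates from the chanting experiment or from the Clark case:

```python
# Minimal sketch of Bayesian updating in odds form.
# All numbers below are hypothetical placeholders for illustration only.

prior_h1 = 0.5                 # prior probability of the alternative hypothesis
prior_h0 = 1 - prior_h1        # prior probability of the null hypothesis

likelihood_h1 = 0.08           # P(data | H1): how well the alternative predicts the data
likelihood_h0 = 0.01           # P(data | H0): how well the null predicts the data

bayes_factor = likelihood_h1 / likelihood_h0            # relative likelihood of the data
posterior_odds = bayes_factor * (prior_h1 / prior_h0)   # posterior odds = BF x prior odds
posterior_h1 = posterior_odds / (1 + posterior_odds)    # convert odds back to a probability

print(f"Bayes factor (H1 over H0): {bayes_factor:.1f}")
print(f"Posterior probability of H1: {posterior_h1:.2f}")
```

With these made-up inputs the Bayes factor is 8 and the posterior probability of the alternative is about .89; with different priors or likelihoods the conclusion changes accordingly, which is exactly the point.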
It is not surprising, then, that if you search this Featured Content site for the string “Bayes” you get 41 hits already. This number will increase next week, because our next digital event, #BayesInPsych, which commences on Monday 19 February, is dedicated to this Bayesian revolution.
The digital event coincides with the publication of a special issue of the Psychonomic Bulletin & Review dedicated to Bayesian Inference for Psychology. The issue was guest edited by Joachim Vandekerckhove (University of California, Irvine), Jeffrey N. Rouder (University of California, Irvine, and University of Missouri), and John K. Kruschke (Indiana University).
The articles in this issue will remain freely accessible to the public until early April. Here are the titles and authors of the articles; they can be accessed from this landing page:
- Editorial: Bayesian methods for advancing psychological science (Joachim Vandekerckhove, Jeffrey N. Rouder, and John K. Kruschke)
- Introduction to Bayesian Inference for Psychology (Alexander Etz and Joachim Vandekerckhove)
- Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications (Eric-Jan Wagenmakers, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Jonathon Love, Ravi Selker, Quentin F. Gronau, Martin Šmíra, Sacha Epskamp, Dora Matzke, Jeffrey N. Rouder, and Richard D. Morey)
- Bayesian inference for psychology. Part II: Example applications with JASP (Eric-Jan Wagenmakers, Jonathon Love, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Ravi Selker, Quentin F. Gronau, Damian Dropmann, Bruno Boutin, Frans Meerhoff, Patrick Knight, Akash Raj, Erik-Jan van Kesteren, Johnny van Doorn, Martin Šmíra, Sacha Epskamp, Alexander Etz, Dora Matzke, Tim de Jong, Don van den Bergh, Alexandra Sarafoglou, Helen Steingroever, Koen Derks, Jeffrey N. Rouder, and Richard D. Morey)
- Bayesian inference for psychology, Part III: Parameter estimation in nonstandard models (Dora Matzke, Udo Boehm, and Joachim Vandekerckhove)
- Bayesian inference for psychology, Part IV: parameter estimation and Bayes factors (Jeffrey N. Rouder, Julia M. Haaf and Joachim Vandekerckhove)
- Determining informative priors for cognitive models (Michael D. Lee and Wolf Vanpaemel)
- Bayes factor design analysis: Planning for compelling evidence (Felix D. Schönbrodt and Eric-Jan Wagenmakers)
- A simple introduction to Markov Chain Monte Carlo sampling (Don van Ravenzwaaij, Pete Cassey, and Scott D. Brown)
- Bayesian data analysis for newcomers (John K. Kruschke and Torrin M. Liddell)
- The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective (John K. Kruschke and Torrin M. Liddell)
- Four reasons to prefer Bayesian analyses over significance testing (Zoltan Dienes and Neil Mclatchie)
- How to become a Bayesian in eight easy steps: An annotated reading list (Alexander Etz, Quentin F. Gronau, Fabian Dablander, Peter A. Edelsbrunner, and Beth Baribault)
- Fitting growth curve models in the Bayesian framework (Zita Oravecz and Chelsea Muth)
- Bayesian latent variable models for the analysis of experimental psychology data (Edgar C. Merkle and Ting Wang)
- Sensitivity to the prototype in children with high-functioning autism spectrum disorder: An example of Bayesian cognitive psychometrics (Wouter Voorspoels, Isa Rutten, Annelies Bartlema, Francis Tuerlinckx, and Wolf Vanpaemel)
Beginning on Monday, 19 February, we will be discussing some of those articles here in our next digital event #BayesInPsych. Posts will be by the following contributors, in the likely order of appearance:
- Joachim Vandekerckhove will provide an overview of the special issue.
- Andy Gelman will summarize the four benefits of using Bayesian inference.
- Simon Farrell will address the problem of specifying priors during Bayesian modelling.
- Clintin Davis-Stober will discuss how using priors can bring you closer to your research dreams.
- Trisha van Zandt and Brandon Turner will explore how we can go beyond the analysis of simple, tractable models like those presented in the special issue to complex, realistic computational models that have no explicit likelihood.
Much to look forward to—please tune in for all of next week.