#BayesInPsych: Preventing miscarriages of justice and statistical inference

Your brilliant PhD student ran an experiment last week that investigated whether chanting the words “unicorns, Brexit, fairies” repeatedly every morning before dawn raises people’s estimates of the likelihood that they will win the next lottery in comparison to a control group that instead chants “reality, reality, reality”. The manipulation seems to have worked, as the estimates in the experimental group are higher than in the control group. A t-test is highly significant (p < .0001), confirming that the effect was highly unlikely to be due to chance alone, and therefore was likely the result of the experimental intervention. Your student is ready to write this up and submit it to the Journal of Fantastical Politics.

Not so fast, please!

That significant t-test did not tell you that the effect was unlikely to be due to chance alone.

What it did tell you is that if chance were the only effect present, then it is unlikely that you would have observed a t-value that large or larger.

The difference between “unlikely to be due to chance alone” and “if chance were the only effect present, then it is unlikely…” may be semantically subtle but it is statistically and conceptually profound.

So profound, in fact, that it may destroy lives.

Sally Clark was an English solicitor who in 1999 was found guilty of murdering two of her sons, who had died suddenly as babies, apparently from sudden infant death syndrome (SIDS). Her conviction rested largely on statistical evidence that the chances of two children in the same family dying of SIDS were only 1 in 73 million. That’s a very small p-value indeed, and one might be tempted to support the court’s decision to reject the null hypothesis of innocence (i.e., reject the idea that the deaths were due to SIDS).

The court’s decision was, alas, statistically deeply flawed, as the Royal Statistical Society pointed out in a letter to the Lord Chancellor in 2002, after Sally Clark had been languishing in prison for several years. In the letter, Peter Green, the President of the Royal Statistical Society, argued:

“The jury needs to weigh up two competing explanations for the babies’ deaths: SIDS or murder. The fact that two deaths by SIDS is quite unlikely is, taken alone, of little value. Two deaths by murder may well be even more unlikely. What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation.”

If you want to walk through the steps that underlie this conclusion, there is a brief but instructive video that explains the statistics involved in the Sally Clark case.

The video suggests that the chance of Sally Clark being innocent was around 50%. In fact, even that probability is likely an underestimate, because another mathematician involved in the case, Ray Hill, put the odds in favour of her innocence at somewhere between 9 to 1 and 17 to 1.
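Green’s argument can be made concrete with a few lines of arithmetic. The following Python sketch assumes, for simplicity, that SIDS and murder are the only two explanations, and that each input is the probability of that explanation producing two deaths in a randomly chosen family. The function name and the double-murder figures are hypothetical placeholders chosen for illustration, not estimates from the case; only the 1 in 73 million figure comes from the trial.

```python
def probability_of_innocence(p_two_sids, p_two_murders):
    """Probability that two deaths were SIDS rather than murder,
    assuming these are the only two possible explanations."""
    return p_two_sids / (p_two_sids + p_two_murders)


# Figure quoted at the trial.
p_two_sids = 1 / 73_000_000

# HYPOTHETICAL double-murder probabilities, for illustration only.
equally_rare = 1 / 73_000_000
ten_times_rarer = 1 / 730_000_000

print(probability_of_innocence(p_two_sids, equally_rare))     # 0.5
print(probability_of_innocence(p_two_sids, ten_times_rarer))  # about 0.909
```

The tiny number settles nothing on its own: the answer depends entirely on how it compares with the probability of the competing explanation.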

Sadly, although Sally Clark was released in 2003 largely because the initial statistical evidence was overturned, she never recovered from the trauma and died in 2007.

Let us return from a miscarriage of justice to the experiment of your brilliant PhD student. Let’s prevent a miscarriage of statistical inference.

That p-value of <.0001 tells you how likely an outcome at least as extreme as the one observed would be under the null hypothesis of chance alone, in the same way that the probability of two children dying from SIDS can be calculated (although that probability is much higher than 1 in 73 million, which reveals another flaw in the prosecution’s case against Sally Clark). But the p-value does not tell you how likely your PhD student’s result would have been if the null hypothesis had been false, in the same way that the 1 in 73 million figure levelled against Sally Clark did not consider the likelihood of a mother having murdered two of her children.

Perhaps most crucially, the p-value tells you nothing about the relative likelihood of the hypotheses you are interested in—guilt or innocence in Sally Clark’s case, and the null hypothesis versus its alternatives in the case of your brilliant PhD student.
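What the p-value does quantify can be illustrated by simulation. The sketch below repeatedly draws two groups from the same normal distribution, so that chance is the only effect present, and counts how often the resulting t-value is at least as extreme as an observed one. The group size, number of simulations, seed, and the observed t of 2.0 are arbitrary assumptions, not values from the experiment described here, and the helper names are made up for this example.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (statistics.variance(a) / len(a)
          + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulated_p_value(t_observed, n_per_group=20, n_sims=2000, seed=1):
    """Share of null-only experiments whose |t| is at least |t_observed|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Both groups come from the SAME distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(welch_t(a, b)) >= abs(t_observed):
            hits += 1
    return hits / n_sims


# Should land near the conventional two-sided p of roughly .05 for t = 2.
print(simulated_p_value(2.0))
```

Every simulated experiment here assumes the null is true, so the resulting proportion cannot say how probable the null is; it only describes the data given the null.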

In a recent article on inference in psychology, Jeff Rouder and colleagues helpfully rephrased Peter Green’s letter to the Lord Chancellor to drive home this point:

“The researcher needs to weigh up two competing explanations for the data: The null hypothesis or the alternative hypothesis. The fact that the observed data are quite unlikely under the null hypothesis is, taken alone, of little value. The observed data may well be even more unlikely under the alternative hypothesis. What matters is the relative likelihood of the data under each hypothesis, not just how unlikely they are under one hypothesis.”
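A toy calculation shows how the observed data can be unlikely under the null and yet even more unlikely under an alternative. The Python sketch below makes strong simplifying assumptions: the data are summarised by a single z-score with unit standard error, and each alternative hypothesis is a single point value for the true effect. Bayesian analyses typically average over a range of effect sizes weighted by a prior, so this is an illustration of the logic rather than a full Bayes factor. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def likelihood_ratio(z_observed, effect_under_h1):
    """Likelihood of the observed z-score under a point alternative,
    divided by its likelihood under the null (true effect of zero)."""
    under_h1 = NormalDist(mu=effect_under_h1, sigma=1).pdf(z_observed)
    under_h0 = NormalDist(mu=0, sigma=1).pdf(z_observed)
    return under_h1 / under_h0


z = 2.0  # hypothetical observed z-score
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_two_sided, 3))               # 0.046
print(round(likelihood_ratio(z, 2.0), 2))  # 7.39
print(round(likelihood_ratio(z, 5.0), 2))  # 0.08
```

The same "significant" result supports an alternative that predicts an effect of 2 about seven times better than the null, but supports the null about twelve times better than an alternative that predicts an effect of 5.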

So where does this leave us?

If we cannot use p-values to draw reliable conclusions, how can we avoid miscarriages of statistical inference?

One answer was provided by the 18th-century gentleman, Reverend Thomas Bayes. The fundamental contribution of Bayes was presented in An Essay towards solving a Problem in the Doctrine of Chances, which was read to the Royal Society posthumously in 1763.

In a nutshell, Bayes’ theorem resolved the problem known as “inverse probability”, which is precisely the problem we wish to solve in statistical inference: A t-test gives us the probability of an event occurring conditional upon a state of the world (namely, that the null hypothesis is true). What researchers want, however, is the inverse—namely an insight into the likely state of the world given the statistical evidence at hand.
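For reference, the theorem can be written out as follows. The symbols are notation introduced here rather than taken from the special issue: H stands for a hypothesis, D for the observed data, and H_0 and H_1 for the null and alternative hypotheses. The first expression is Bayes’ theorem itself; the second is the same statement in odds form, which is the version that compares two hypotheses directly.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
\qquad\text{and}\qquad
\frac{P(H_1 \mid D)}{P(H_0 \mid D)}
  = \frac{P(D \mid H_1)}{P(D \mid H_0)} \times \frac{P(H_1)}{P(H_0)}
```

A p-value concerns only quantities of the form P(D | H_0). The left-hand sides, which condition on the data, are what researchers usually want to know.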

To date, quite frequent(tist)ly, researchers have achieved that desired conclusion by a process of wishful misinterpretation of p-values (the statistical equivalent of pre-dawn incantation of “unicorns, Brexit, fairies”).

A better way of obtaining the desired conclusion is to discard conventional null-hypothesis testing and replace it with Bayesian statistics, as more and more researchers have come to realize.

It is not surprising, then, that if you search this Featured Content site for the string “Bayes” you get 41 hits already. This number will increase next week, because our next digital event, #BayesInPsych, which commences on Monday 19 February, is dedicated to this Bayesian revolution.

The digital event coincides with the publication of a special issue of the Psychonomic Bulletin & Review dedicated to Bayesian Inference for Psychology. The issue was guest edited by Joachim Vandekerckhove (University of California, Irvine), Jeffrey N. Rouder (University of California, Irvine, and University of Missouri), and John K. Kruschke (Indiana University).

The articles in this issue will remain free to access by the public until early April. They can be accessed from the special issue’s landing page.

Beginning on Monday, 19 February, we will be discussing some of those articles here in our next digital event #BayesInPsych. Posts will be by the following contributors, in the likely order of appearance:

  • Joachim Vandekerckhove will provide an overview of the special issue.
  • Andy Gelman will summarize the four benefits of using Bayesian inference.
  • Simon Farrell will address the problem of specifying priors during Bayesian modelling.
  • Clintin Davis-Stober will discuss how using priors can bring you closer to your research dreams.
  • Trisha van Zandt and Brandon Turner will explore how we can go beyond analysis of simple, tractable models like those presented in the special issue to complex, realistic and computational models that have no likelihood.

Much to look forward to—please tune in for all of next week.

 

