(This post was co-authored with Brandon Turner).
Sharon Bertsch McGrayne’s 2012 book, The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, traces the difficulties that statisticians and empirical researchers alike have had in embracing Bayesian methods. Despite the obvious strengths of these methods, a fear of introducing bias into a statistical analysis (by use of informed priors) and strong personalities in the statistical community prevented many from fully exploring and developing Bayesian methods until the latter half of the last century, fully 150 years after they were introduced by Pierre-Simon Laplace.
In the 1980s, quantitatively minded psychology faculty, heady with the newfound attention that Bayesian methods were receiving, told their graduate students (of whom Trish Van Zandt was one) that the Bayesian revolution was, finally, just around the corner. Some of us began studying and preparing for this paradigm shift, but most of us did not. And so the Bayesian revolution languished, at least in experimental psychology.
The publication of this special issue marks an important milestone: Now, it seems, the Bayesian revolution is well underway. In a series of papers, our colleagues lay out the foundations of Bayesian inference, including hypothesis testing, parameter estimation, meta-analysis, and model selection. They emphasize the importance of making the shift from traditional frequentist methods to Bayesian methods, and highlight the added value of the Bayesian approach. They demonstrate how Bayesian methods can be applied across experimental psychology, and showcase analytic tools that are available to everyone, even those without much programming experience. Finally, and perhaps most important, Lee and Vanpaemel address the selection of priors for Bayesian models, the primary issue that kept Bayesian techniques closeted through most of the last century.
The real power of Bayesian analyses is that they incorporate within them the structure of the model under consideration. The specification of the likelihood (the probability of the data given the parameters) is the model, and so inference about a model's parameters can proceed without intermediary assumptions, such as the linearity of an ANOVA or regression model, getting in the way. As Lee and Vanpaemel argue, this power extends to the specification of the priors, which reflect both theoretical constraints and our current understanding (or lack of understanding) of how parameters fluctuate with changes in physical conditions.
As cognitive models become more elaborate and realistic, their likelihoods become more complex and computationally difficult to incorporate into the Bayesian framework. The article by Matzke and colleagues demonstrates how posterior estimates can be derived for a multinomial processing tree model of recognition memory, a flexible structure that can account for a wide range of effects in ecologically valid memory tasks. While the multinomial processing tree model has a relatively simple, analytic likelihood function, the most interesting and powerful models in cognitive psychology do not.
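To make concrete what a "simple, analytic likelihood" means here, consider a minimal sketch in the spirit of the classic two-high-threshold processing tree for recognition (our illustration, not the specific model Matzke and colleagues analyze): each response probability is a polynomial in the tree's parameters, and the likelihood is just a product of binomials.

```python
from scipy.stats import binom

def two_ht_loglik(d, g, hits, n_old, fas, n_new):
    """Log-likelihood of a two-high-threshold MPT model.

    d -- probability of detecting an item's old/new status
    g -- probability of guessing "old" when detection fails
    hits, n_old -- "old" responses to old items, number of old items
    fas, n_new  -- "old" responses to new items, number of new items
    """
    p_hit = d + (1 - d) * g   # detect the old item, or fail and guess "old"
    p_fa = (1 - d) * g        # fail to detect the new item and guess "old"
    return (binom.logpmf(hits, n_old, p_hit)
            + binom.logpmf(fas, n_new, p_fa))

# For example: 80 hits out of 100 old items, 20 false alarms out of 100 new items.
print(two_ht_loglik(d=0.6, g=0.5, hits=80, n_old=100, fas=20, n_new=100))
```

Because this function can be written down and evaluated directly, standard Bayesian machinery applies; it is exactly this step that fails for simulation-based models.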
Models like the leaky competing accumulator (LCA; Usher and McClelland, 2001), retrieving effectively from memory (REM; Shiffrin and Steyvers, 1997), and the Leabra cognitive architecture (O’Reilly et al., 2015) are computational: they make predictions by way of simulation. Because we cannot write down a likelihood for these models, or because the likelihoods are very complex and/or badly behaved, even standard analyses such as parameter estimation by maximum likelihood are difficult.
One potential solution to model fitting and parameter estimation for these models is approximate least squares (Malmberg et al., 2004), in which a model to be fit is used to simulate a large number of synthetic data sets, and these data sets are compared to the observed data. Best-fitting parameters are selected according to how close the synthetic and observed data are to each other in a least-squares sense.
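As a rough sketch of the idea (the ex-Gaussian simulator, summary statistics, and parameter ranges below are our illustrative choices, not those of Malmberg and colleagues):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_model(theta, n):
    """Stand-in simulation-based model: ex-Gaussian response times
    with normal mean mu, normal sd sigma, and exponential mean tau."""
    mu, sigma, tau = theta
    return rng.normal(mu, sigma, n) + rng.exponential(tau, n)

def summaries(y):
    """Summary statistics: the deciles of the data."""
    return np.quantile(y, np.linspace(0.1, 0.9, 9))

observed = simulate_model((0.4, 0.05, 0.2), 500)   # pretend these are real data
s_obs = summaries(observed)

# Evaluate many candidate parameter sets; keep the one whose synthetic
# data are closest to the observed data in a least-squares sense.
best_theta, best_ss = None, np.inf
for _ in range(5000):
    theta = (rng.uniform(0.1, 1.0),    # mu
             rng.uniform(0.01, 0.2),   # sigma
             rng.uniform(0.05, 0.5))   # tau
    ss = np.sum((summaries(simulate_model(theta, 500)) - s_obs) ** 2)
    if ss < best_ss:
        best_theta, best_ss = theta, ss

print(best_theta, best_ss)
```

Because the fit is judged entirely through simulation, nothing about the model itself needs to be analytically tractable.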
While the approximate least-squares method is slow and can be difficult, Bayesian treatments of these models seem impossible. Because the likelihood, which these models lack, is the foundation on which the posterior distributions of the parameters are constructed, as Etz and Vandekerckhove note in their article, there seems to be no way to derive or estimate the posterior distributions of the models’ parameters.
Except there is.
We have worked for several years on developing approximate methods that can incorporate computational (simulation-based) models in Bayesian hierarchical structures (Turner and Van Zandt, 2014; Turner et al., 2013; Turner and Van Zandt, 2012; Turner and Sederberg, 2012, 2014; Turner et al., 2016, 2018), and more recently compiled these efforts into a workbook for cognitive modelers (Palestro et al., 2018).
These methods, developed originally by Pritchard and colleagues (1999), are formalizations of approximate least squares that permit Bayesian inference. As in approximate least squares, the model simulates a synthetic data set, and then proposed model parameters are evaluated by way of how well the simulated data resemble the observed data. This requires that we define what we mean by “resemble,” and that we develop methods for selecting proposed model parameters.
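For instance, “resemble” might be operationalized as a Euclidean distance between a handful of summary statistics; the particular statistics below are illustrative choices, not prescriptions:

```python
import numpy as np

def summaries(y):
    """Reduce a data set to a few summary statistics:
    here, the mean, standard deviation, and quartiles."""
    return np.array([y.mean(), y.std(),
                     *np.quantile(y, [0.25, 0.5, 0.75])])

def distance(y_obs, y_sim):
    """Euclidean distance between the summary statistics
    of the observed and synthetic data."""
    return np.linalg.norm(summaries(y_obs) - summaries(y_sim))
```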
Perhaps the simplest method for performing Bayesian inference on computational models relies on the rejection algorithm, which proceeds as follows: First, a candidate parameter value θ∗ is proposed, perhaps by sampling from the prior for that parameter. Second, a synthetic data set Y∗ is generated by simulating the model under the parameter θ∗. Third, a distance between the observed data Y and the synthetic data Y∗ is computed. This distance is usually defined in terms of summary statistics of the samples Y and Y∗, such as the sample moments or quantiles, and may reflect the Euclidean distance between the statistics or the squared differences between them. Fourth, if the distance is small enough, the candidate θ∗ is retained as a sample from the desired posterior.
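In code, the whole procedure might be sketched as follows (a minimal, illustrative implementation; sample_prior, simulate, and distance stand in for whatever prior, model simulator, and distance a given application requires):

```python
import numpy as np

def abc_rejection(y_obs, sample_prior, simulate, distance, epsilon, n_keep):
    """Rejection-based approximate Bayesian computation.

    Repeatedly: (1) propose a candidate theta* from the prior,
    (2) simulate a synthetic data set Y* under theta*, (3) compute
    the distance between the observed data Y and Y*, and (4) keep
    theta* if that distance falls below the tolerance epsilon.
    """
    accepted = []
    while len(accepted) < n_keep:
        theta = sample_prior()                  # step 1: propose from the prior
        y_sim = simulate(theta)                 # step 2: simulate synthetic data
        if distance(y_obs, y_sim) < epsilon:    # steps 3 and 4: compare, accept/reject
            accepted.append(theta)
    return np.array(accepted)
```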
While the rejection algorithm has limitations, it works surprisingly well given that it requires only 9 lines of code to implement. For example, Turner and Van Zandt (2012) showed how the rejection algorithm can accurately estimate both the individual and hyper-level posterior distributions of the parameters of a binomial model. While this was a very simple model and not one that would be difficult to analyze using standard Bayesian techniques, it was important to show that approximate methods can be accurate.
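A toy, non-hierarchical version of that kind of check, reusing the abc_rejection sketch above (the prior, tolerance, and synthetic data here are our illustrative choices): because the binomial model has a conjugate Beta posterior, the rejection algorithm's output can be compared against the exact answer.

```python
import numpy as np

rng = np.random.default_rng(11)

# "Observed" data: 10 participants, 20 Bernoulli trials each.
n_subj, n_trials, true_theta = 10, 20, 0.7
y_obs = rng.binomial(n_trials, true_theta, size=n_subj)

post = abc_rejection(
    y_obs,
    sample_prior=lambda: rng.uniform(),                          # Beta(1, 1) prior
    simulate=lambda th: rng.binomial(n_trials, th, size=n_subj),
    distance=lambda y, y_sim: abs(y.mean() - y_sim.mean()),      # summary: mean count
    epsilon=0.5,
    n_keep=2000,
)

# The exact conjugate posterior is Beta(1 + S, 1 + N - S), so the quality
# of the approximation can be checked directly.
S, N = y_obs.sum(), n_subj * n_trials
a, b = 1 + S, 1 + N - S
print("ABC:  ", post.mean(), post.std())
print("Exact:", a / (a + b), np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1))))
```

Shrinking the tolerance epsilon tightens the approximation, at the cost of a lower acceptance rate and hence longer run times.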
Turner and colleagues (2013) went further, demonstrating how approximate methods can be used to distinguish between two computational models of memory: the REM model (Shiffrin and Steyvers, 1997) and the Bind-Cue-Decide model (BCDMEM; Dennis and Humphreys, 2001). Using approximate methods, they fit hierarchical versions of the models to empirical data, something that had never before been accomplished with computational models of this kind, and performed quantitative model comparisons. They were also able to examine, by way of the parameters’ joint posterior distributions, how the models’ parameters covaried. This kind of exploration provides greater insight into a model’s structure and the psychological concepts represented by each parameter.
While the special issue speaks to the power of Bayesian machinery for simple (descriptive) models, we want to emphasize that approximate techniques will be important for working with neurally plausible, complex models that will carry us forward in our understanding of how cognition and the brain work. The development of powerful numerical methods like Markov chain Monte Carlo (MCMC), which permit us to sample from a posterior distribution without an analytic form, and which van Ravenzwaaij and colleagues explain in their article, has brought standard Bayesian techniques within everyone’s reach. Approximate methods are no more difficult than MCMC. Even models with very difficult or intractable likelihoods are now amenable to Bayesian inference.
In 1975, D. V. Lindley noted that “The only good statistics is Bayesian statistics.” That this endorsement was made in 1975, and that we are only now, over 40 years later, willing to discuss changing how statistical analyses are performed in experimental psychology, attests to how long the Bayesian revolution has been in the making.
In his award-winning 2004 novel Cloud Atlas, David Mitchell wrote, “All revolutions are the sheerest fantasy until they happen, then they are historical inevitabilities.” We saw this coming a long time ago; indeed, it was inevitable.
References
Dennis, S. and Humphreys, M. S. (2001). A context noise model of episodic word recognition. Psychological Review, 108, 452–478.
Malmberg, K. J., Zeelenberg, R., and Shiffrin, R. M. (2004). Turning up the noise or turning down the volume? On the nature of the impairment of episodic recognition memory by Midazolam. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 540–549.
O’Reilly, R. C., Hazy, T. E., and Herd, S. A. (2015). The Leabra cognitive architecture: How to play 20 principles with nature and win! In Chipman, S., editor, Oxford Handbook of Cognitive Science. Oxford University Press, Oxford.
Palestro, J. J., Sederberg, P. B., Osth, A. F., Van Zandt, T., and Turner, B. M. (2018). Likelihood-free methods for cognitive science. In Criss, A. H., editor, Computational Approaches to Cognition and Perception, pages 1–129. Springer International Publishing.
Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., and Feldman, M. W. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.
Shiffrin, R. M. and Steyvers, M. (1997). A model for recognition memory: REM – retrieving effectively from memory. Psychonomic Bulletin and Review, 4, 145–166.
Turner, B. M., Dennis, S., and Van Zandt, T. (2013). Likelihood-free Bayesian analysis of memory models. Psychological Review, 120, 667–678.
Turner, B. M., Schley, D. R., Muller, C., and Tsetsos, K. (2018). Competing models of multi-attribute, multi-alternative preferential choice. In press.
Turner, B. M. and Sederberg, P. B. (2012). Approximate Bayesian computation with Differential Evolution. Journal of Mathematical Psychology, 56, 375–385.
Turner, B. M. and Sederberg, P. B. (2014). A generalized, likelihood-free method for parameter estimation. Psychonomic Bulletin and Review, 21, 227–250.
Turner, B. M., Sederberg, P. B., and McClelland, J. L. (2016). Bayesian analysis of simulation-based models. Journal of Mathematical Psychology, 72, 191–199.
Turner, B. M. and Van Zandt, T. (2012). A tutorial on approximate Bayesian computation. Journal of Mathematical Psychology, 56, 69–85.
Turner, B. M. and Van Zandt, T. (2014). Hierarchical approximate Bayesian computation. Psychometrika, 79, 185–209.
Usher, M. and McClelland, J. L. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psychological Review, 108, 550–592.