Psychonomic Society Featured Content

The ABC in #BayesInPsych: Approximating likelihoods in simulation models
Fri, 23 Feb 2018

(This post was co-authored with Brandon Turner).

Sharon Bertsch McGrayne’s 2012 book, The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, traces the difficulties that statisticians and empirical researchers alike have had in embracing Bayesian methods. Despite the obvious strengths of these methods, a fear of introducing bias into a statistical analysis (by use of informed priors) and strong personalities in the statistical community prevented many from fully exploring and developing Bayesian methods until the latter half of the last century, fully 150 years after they were introduced by Pierre-Simon Laplace.

In the 1980s, quantitatively-minded psychology faculty, heady with the newfound attention that Bayesian methods were experiencing, told their graduate students (of which Trish Van Zandt was one) that the Bayesian revolution was, finally, just around the corner. Some of us began studying and preparing for this paradigm shift, but most of us did not. And so the Bayesian revolution languished, at least in experimental psychology.

The publication of this special issue marks an important milestone: Now, it seems, the Bayesian revolution is well underway. In a series of papers, our colleagues lay out the foundations of Bayesian inference, including hypothesis testing, parameter estimation, meta-analysis, and model selection. They emphasize the importance of making the shift from traditional frequentist methods to Bayesian methods, and highlight the value-add of the Bayesian approach. They demonstrate how Bayesian methods can be applied across experimental psychology, and showcase analytic tools that are available to everyone, even those without much programming experience. Finally, and perhaps most important, Lee and Vanpaemel address the selection of priors for Bayesian models, the primary issue that kept Bayesian techniques closeted through most of the last century.

The real power of Bayesian analyses is that they incorporate within them the model structure under consideration. The specification of the likelihood (the probability of the data) is the model, and so inference about those model parameters can be done without any intermediary assumptions to get in the way, such as the linearity of an ANOVA or regression model. As Lee and Vanpaemel argue, this power extends to the specification of the priors, which reflect both theoretical constraints and our current understanding (or lack of understanding) of how parameters fluctuate with changes in physical conditions.

As cognitive models become more elaborate and realistic, their likelihoods become more complex and computationally difficult to incorporate into the Bayesian framework. The article by Matzke and colleagues demonstrates how posterior estimates can be derived for a multinomial processing tree model of recognition memory, a flexible structure that can account for a wide range of effects in ecologically valid memory tasks. While the multinomial processing tree model has a relatively simple, analytic likelihood function, the most interesting and powerful models in cognitive psychology do not.

Models like the leaky competing accumulator (LCA; Usher and McClelland, 2001), retrieving effectively from memory (REM; Shiffrin and Steyvers, 1997), and the Leabra cognitive architecture (O’Reilly et al., 2015) are computational: they make predictions by way of simulation. Because we cannot write down a likelihood for these models, or because the likelihoods are very complex and/or badly behaved, even standard analyses such as parameter estimation by maximum likelihood are difficult.

One potential solution to model fitting and parameter estimation for these models is approximate least squares (Malmberg et al., 2004), in which a model to be fit is used to simulate a large number of synthetic data sets, and these data sets are compared to the observed data. Best-fitting parameters are selected according to how close the synthetic and observed data are to each other in a least-squares sense.

While the approximate least-squares method is slow and can be difficult, Bayesian treatments of these models seem impossible. Because the likelihood, missing in these models, is the foundation on which the posterior distributions of parameters are constructed, as Etz and Vandekerckhove note in their article, there seems to be no way to derive or estimate the posterior distributions of the models’ parameters.

Except there is.

We have worked for several years on developing approximate methods that can incorporate computational (simulation-based) models in Bayesian hierarchical structures (Turner and Van Zandt, 2014; Turner et al., 2013; Turner and Van Zandt, 2012; Turner and Sederberg, 2012, 2014; Turner et al., 2016, 2018), and more recently compiled these efforts into a workbook for cognitive modelers (Palestro et al., 2018).

These methods, developed originally by Pritchard (Pritchard et al., 1999), are formalizations of approximate least squares that permit Bayesian inference. As in approximate least squares, the model simulates a synthetic data set, and then proposed model parameters are evaluated by way of how well the simulated data resemble the observed data. This requires that we define what we mean by “resemble,” and that we develop methods for selecting proposed model parameters.

Perhaps the simplest method for performing Bayesian inference on computational models relies on the rejection algorithm, which proceeds as follows: First, a candidate parameter value θ∗ is proposed, perhaps by sampling from the prior for that parameter. Second, a synthetic data set Y ∗ is generated by simulating the model under the parameter θ∗. Third, a distance between the observed data Y and the synthetic data Y ∗ is computed. This distance is usually defined in terms of summary statistics of the samples Y and Y ∗, such as the sample moments or quantiles, and may reflect the Euclidean distance between the statistics or the squared differences between them. If the distance is small enough, the candidate θ∗ is retained as a sample from the desired posterior.
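To make those four steps concrete, here is a minimal Python sketch of the rejection algorithm for a simple binomial model, the kind of setting examined by Turner and Van Zandt (2012). The prior, tolerance, sample sizes, and choice of summary statistic below are illustrative assumptions, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: 100 Bernoulli trials with a generating success probability
# of 0.7 (which the algorithm does not know and must recover)
observed = rng.binomial(1, 0.7, size=100)
obs_stat = observed.mean()  # summary statistic of the observed data

def rejection_abc(n_samples, tolerance):
    """Draw approximate posterior samples for the binomial success probability."""
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(0, 1)                     # 1. propose from the prior
        synthetic = rng.binomial(1, theta, size=100)  # 2. simulate the model
        distance = abs(synthetic.mean() - obs_stat)   # 3. distance between summaries
        if distance <= tolerance:                     # 4. keep theta if close enough
            accepted.append(theta)
    return np.array(accepted)

posterior = rejection_abc(n_samples=500, tolerance=0.02)
print(posterior.mean())  # concentrates near the generating value
```

Note that the model enters only through the simulation step; nothing in the loop requires an analytic likelihood.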

While the rejection algorithm has limitations, it works surprisingly well given that it requires only 9 lines of code to implement. For example, Turner and Van Zandt (2012) showed how the rejection algorithm can accurately estimate both the individual and hyper-level posterior distributions of the parameters of a binomial model. While this was a very simple model and not one that would be difficult to analyze using standard Bayesian techniques, it was important to show that approximate methods can be accurate.

Turner and colleagues (2013) went further, and demonstrated how approximate methods can be used to distinguish between computational models of memory, the REM model (Shiffrin and Steyvers, 1997) and the Bind-Cue-Decide model (BCDMEM; Dennis and Humphreys, 2001). Using approximate methods, Turner and colleagues were able to fit hierarchical versions of the models to empirical data, something that had never before been accomplished with computational models of this kind, and to perform quantitative model comparisons. They were also able to examine, by way of the parameters’ joint posterior distributions, how the models’ parameters covaried. This kind of exploration provides greater insight into a model’s structure and the psychological concepts represented by each parameter.

While the special issue speaks to the power of Bayesian machinery for simple (descriptive) models, we want to emphasize that approximate techniques will be important for working with neurally plausible, complex models that will carry us forward in our understanding of how cognition and the brain work. The development of powerful numerical methods like MCMC, which permits us to sample from a posterior distribution without an analytic form, and which van Ravenzwaaij and colleagues explain in their article, has brought standard Bayesian techniques within everyone’s reach. Approximate methods are not any more difficult than MCMC. Even models with very difficult or intractable likelihoods are now amenable to Bayesian inference.

In 1975, D. V. Lindley noted that “The only good statistics is Bayesian statistics.” The fact that this endorsement was made in 1975, and that we are only now, over 40 years later, willing to discuss changing how statistical analyses are performed in experimental psychology, attests to how long the Bayesian revolution has been in the making.

In his award-winning 2004 novel Cloud Atlas, David Mitchell wrote, “All revolutions are the sheerest fantasy until they happen, then they are historical inevitabilities.” We saw this coming a long time ago; indeed, it was inevitable.


Dennis, S. and Humphreys, M. S. (2001). A context noise model of episodic word recognition. Psychological Review, 108, 452–478.

Malmberg, K. J., Zeelenberg, R., and Shiffrin, R. M. (2004). Turning up the noise or turning down the volume? On the nature of the impairment of episodic recognition memory by Midazolam. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 540–549.

O’Reilly, R. C., Hazy, T. E., and Herd, S. A. (2015). The Leabra cognitive architecture: How to play 20 principles with nature and win! In Chipman, S., editor, Oxford Handbook of Cognitive Science. Oxford University Press, Oxford.

Palestro, J. J., Sederberg, P. B., Osth, A. F., Van Zandt, T., and Turner, B. M. (2018). Likelihood-free methods for cognitive science. In Criss, A. H., editor, Computational Approaches to Cognition and Perception, pages 1–129. Springer International Publishing.

Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., and Feldman, M. W. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.

Shiffrin, R. M. and Steyvers, M. (1997). A model for recognition memory: REM – retrieving effectively from memory. Psychonomic Bulletin and Review, 4, 145–166.

Turner, B. M., Dennis, S., and Van Zandt, T. (2013). Likelihood-free Bayesian analysis of memory models. Psychological Review, 120, 667–678.

Turner, B. M., Schley, D. R., Muller, C., and Tsetsos, K. (2018). Competing models of multi-attribute, multi-alternative preferential choice. In press.

Turner, B. M. and Sederberg, P. B. (2012). Approximate Bayesian computation with Differential Evolution. Journal of Mathematical Psychology, 56, 375–385.

Turner, B. M. and Sederberg, P. B. (2014). A generalized, likelihood-free method for parameter estimation. Psychonomic Bulletin and Review, 21, 227–250.

Turner, B. M., Sederberg, P. B., and McClelland, J. L. (2016). Bayesian analysis of simulation-based models. Journal of Mathematical Psychology, 72, 191–199.

Turner, B. M. and Van Zandt, T. (2012). A tutorial on approximate Bayesian computation. Journal of Mathematical Psychology, 56, 69–85.

Turner, B. M. and Van Zandt, T. (2014). Hierarchical approximate Bayesian computation. Psychometrika, 79, 185–209.

Usher, M. and McClelland, J. L. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psychological Review, 108, 550–592.

#BayesInPsych: Spiking a slab with sleepless pillow talk and prior inequalities
Thu, 22 Feb 2018

I recently finished reading Suzanne Buffam’s A Pillow Book. This is a book of non-fiction poetry about thoughts and musings that may enter the mind as one drifts off to sleep, ranging from the historical consideration of pillows to comprehensive lists of sleeping aids.

I’ve spent more than a few nights drifting off to sleep considering the following question: How can I test fundamental properties of decision making? I want to proceed by making as few assumptions about human behavior as possible—with the end goal being to test only the property of interest—no more, no less. Admittedly, the reader may be thinking that this doesn’t sound very Bayesian. Doesn’t Bayesian analysis require even more assumptions than a classical approach? These are clearly the thoughts of a sleep-deprived individual.

Suppose we wanted to know whether an individual selecting among sleep aids at the local pharmacy acted as if she were evaluating their relative pros (sleep!) and cons (groggy the next day). Could her choices be described by a mathematical function that reflects an optimized balance of these evaluated pros and cons?

One way to proceed would be to run a choice experiment and apply the analysis from Falmagne (1978). In that seminal paper, Falmagne derived a collection of linear inequalities on choice probabilities that are both necessary and sufficient for choices to be described by optimization. Applied to our example, one of the inequalities would be:

(the probability of selecting melatonin over doxylamine) + (the probability of selecting doxylamine over bourbon) – (the probability of selecting melatonin over bourbon) < 1.

The beauty of this result is that the inequalities are general—they can be applied to any set of choice alternatives (not just sleep aids) and require very few assumptions about human behavior. If a person’s choices “satisfy” the inequalities, then she can be described as optimizing her choices; if not, then she cannot be choosing in this way. The hard part is reconciling the observed choice data with the inequalities.
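Checking one such inequality against estimated choice proportions is itself trivial; the hard part, as noted, is the statistics. A sketch, with entirely hypothetical choice proportions:

```python
def triangle_condition(p_ab, p_bc, p_ac):
    """One of Falmagne's (1978) inequalities for binary choice probabilities:
    P(a over b) + P(b over c) - P(a over c) must be less than 1."""
    return p_ab + p_bc - p_ac < 1.0

# Hypothetical proportions: a = melatonin, b = doxylamine, c = bourbon
print(triangle_condition(0.8, 0.7, 0.9))  # True: consistent with optimizing
print(triangle_condition(0.9, 0.9, 0.3))  # False: cannot be optimizing
```

Of course, these observed proportions are noisy estimates of the underlying probabilities, which is exactly why a statistical model is needed before the inequalities can be tested.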

The inequalities are just algebra and say nothing about the variability of the data. To carry out a statistical test to determine if the person’s choices truly conform to the inequalities, we need a statistical model, preferably one that forms the most direct bridge between the choice data and the inequalities of interest.

Why would Bayesian statistics make sense here? While a Bayesian approach requires the specification of a prior, this prior can, perhaps counter-intuitively, make the model simpler and more direct. As described by Lee and Vanpaemel in this special issue, and as already noted on this blog by Simon Farrell yesterday, a prior can be used to many useful ends, and specifying it to be “non-informative” may not always be the best choice.

In the above case, we could start with a very simple statistical model of the choice data and use a prior to encode the inequalities—thereby embedding the theory directly into the statistical model (see McCausland and Marley, 2014, for a nice application of this very idea). This is an example of using theory to determine the prior advocated by Lee and Vanpaemel. While we are making some assumptions, we would be using the prior to bring the theory closer to the data, not further from it. Even better, the prior would be adding constraints to the model, not making it more complex.

We could go even further with this idea. Suppose we wanted to detect individuals who, in their sleep deprived state, simply chose among sleep aids at random? We could go to the trouble of specifying a separate model for those individuals, complete with another likelihood function, or we could make the prior do the work for us. Following Rouder, Haaf, and Vandekerckhove, also in this special issue, we could use a “spike and slab” approach where we place additional prior weight on parameter values that correspond to guessing. Such a modification would be useful in detecting whether an individual is optimizing or guessing. In this case, the prior is handling guessing without requiring more complexity in the likelihood function itself. As discussed by Rouder and colleagues, this “spike and slab” approach is general and could be applied to whatever research question may be on your mind.
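As a toy illustration of the idea (and not the implementation in Rouder, Haaf, and Vandekerckhove), suppose the choice data for a pair of alternatives are binomial. Placing a spike of prior mass exactly at the guessing value θ = 0.5, with a Beta "slab" elsewhere, lets conjugacy deliver the posterior probability of guessing in closed form. The counts, spike weight, and slab shape below are invented for the example:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def posterior_prob_guessing(k, n, spike_weight=0.5, a=2.0, b=2.0):
    """Posterior probability that the chooser is guessing (theta = 0.5),
    under a spike-and-slab prior: a point mass at 0.5 mixed with a Beta(a, b) slab."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # Marginal likelihood under the spike: a plain binomial at theta = 0.5
    spike_ml = math.exp(log_choose + n * math.log(0.5))
    # Marginal likelihood under the slab: beta-binomial with a Beta(a, b) prior
    slab_ml = math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))
    num = spike_weight * spike_ml
    return num / (num + (1 - spike_weight) * slab_ml)

print(posterior_prob_guessing(k=52, n=100))  # near-chance data: the spike is favored
print(posterior_prob_guessing(k=90, n=100))  # strong preference: guessing is ruled out
```

The likelihood here stays a one-line binomial throughout; the prior alone does the work of distinguishing guessers from optimizers.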

The key takeaway is that the specification of a prior in a Bayesian model can be based on far more than a simple “principle of indifference” argument. Using a theory-informed prior, it is possible to seamlessly integrate behavioural constraints into your statistical models. This has the benefit of making analysis both more direct and more interpretable.

To close, I encourage you to embrace your night-time thought wanderings. While you do so, please keep a broad perspective on how Bayesian model specification can be used to dig deep and evaluate precisely the research questions you want to answer. Below is a Buffam-esque poem of my very own:

P1. Other musings may include whether people’s choices become more or less rational (Bayesian) when they are sleep deprived. I’ll save you the sleepless nights; they become less rational (Dickinson, Drummond, and Dyche, 2016).


Dickinson, D. L., Drummond, S. P., & Dyche, J. (2016). Voluntary sleep choice and its effects on Bayesian decisions. Behavioral Sleep Medicine, 14, 501-513.

Falmagne, J.-C. (1978). A representation theorem for finite random scale systems. Journal of Mathematical Psychology, 18, 52-72.

McCausland, W. J., & Marley, A. A. J. (2014). Bayesian inference and model comparison for random choice structures. Journal of Mathematical Psychology, 62, 33-46.

We often know more than we think: Using prior knowledge to avoid prior problems #BayesInPsych
Wed, 21 Feb 2018

One of the unique features of Bayesian statistical and computational modelling is the prior distribution. A prior distribution is both conceptually and formally necessary to do any sort of Bayesian modelling. If we are estimating the values of model parameters (e.g., regression coefficients), we do this by updating our prior beliefs about the parameter values using the information from our experiment—and without priors, we’d have nothing to update!

Priors also play a major role when performing model selection by Bayes Factors, which includes the hypothesis testing described in several of the papers in the Special Issue. A key quantity in Bayesian model selection is the marginal likelihood, which tells us how consistent our observed data are with a target model. One problem is that models will usually make different quantitative predictions for different parameter values, and we do not know the values of the parameters. The marginal likelihood solves this problem by calculating a weighted average of goodness of fit across all possible parameter values (that is, across the entire parameter space), the weights being determined by our prior distribution of parameter values.

So whether we are estimating parameters, or performing model selection, the prior distribution on the model parameters needs to be specified. But where do the prior distributions come from, and what makes a good prior? In their paper in the Special Issue, Lee and Vanpaemel tackle these questions by discussing how we can specify informative priors for cognitive models.

Cognitive models—as opposed to statistical models—specify psychological theories as mathematical equations or computational algorithms. A default technique used by many modellers is to specify a non-informative or weakly informative prior. For example, if we have a parameter in our model that varies between 0 and 1—for example, a forgetting rate—it seems reasonable to assume that all values between 0 and 1 are equally plausible. However, when we do so, we are arguably throwing away information that is relevant to our modelling. We probably have a fair idea that some forgetting takes place (so the parameter is unlikely to be 0), and that forgetting isn’t catastrophic either (so values close to 1 are also implausible).
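That intuition about forgetting rates can be encoded directly in the prior. A sketch, in which a Beta(2, 5) distribution is one (arbitrary, for illustration) way of saying "some forgetting, but not catastrophic," in contrast to the flat prior:

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

grid = [x / 10 for x in range(1, 10)]  # candidate forgetting rates 0.1 .. 0.9

# Non-informative prior: every forgetting rate equally plausible
uniform = {x: 1.0 for x in grid}

# Informative prior: moderate forgetting most plausible, extremes downweighted
informative = {x: beta_pdf(x, 2, 5) for x in grid}

print(max(informative, key=informative.get))  # mode at (2-1)/(2+5-2) = 0.2
print(informative[0.9] < informative[0.2])    # catastrophic forgetting is implausible
```

The uniform prior treats a forgetting rate of 0.9 as just as plausible as 0.2; the informative prior puts that background knowledge to work.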

Lee and Vanpaemel argue that modellers should be more dedicated to specifying informative priors that specify that some parameter values are clearly more likely than others. Before applying a model to data, we almost always have some idea about the data we are going to see. We all have some notion that some results in an experiment would be more surprising than others, and we usually know enough about our experiments to detect anomalies in our results.

The issue is that the researchers who develop models invest large amounts of time into developing their model (the likelihood in Bayesian inference) to capture key effects in their domain, only to helplessly throw up their arms when it comes to specifying the prior. The view advanced by Lee and Vanpaemel, and others, is that we shouldn’t shy away from specifying an informative prior.

Indeed, Lee and Vanpaemel identify several benefits of specifying informative priors. Informative priors can solve issues with statistical ambiguity (dealing with issues of parameter identifiability) and theoretical ambiguity. Importantly, models can be made simpler by using prior distributions to emphasise certain regions of parameter space as being more plausible. This acts to narrow the range of predictions the model makes, and so makes the theory less flexible and more falsifiable.

As noted by Lee and Vanpaemel, this can work both ways. A falsifiable model is more easily rejected, but if that model’s predictions match the data despite the constraints introduced by an informative prior, that model should receive more support. Several philosophers of science (Popper, Lakatos, and Meehl are just a few) have proposed that the “riskiness” of predictions should be taken into account when a model accounts for one or more sets of data. A well-known maxim due to Wesley Salmon is that it would be a “damn strange coincidence” if a tight range of predictions corresponds to the data, the implication being that data should be taken as giving greater support for a model if its predictions are more precise.

Bayesian model selection is one formalisation of the idea of risky predictions giving greater support. As mentioned earlier, the marginal likelihood is calculated by working out how well the model predicts the data for each possible set of parameter values, and then averaging all those values (this is a simplification—the parameter space is usually continuous and so we use exact or approximate integration).

Critically, this averaging is weighted by the prior distribution. If we place lots of weight (i.e., much of the mass of the prior) in a part of parameter space that doesn’t give good fits to the data, the average fit of the model will be poor: we have made a risky prediction that failed. Alternatively, if we place lots of weight in a part of parameter space that captures the data well, those good fits will be highly weighted, and the model will be better supported than one where we used a more diffuse prior: the model has made a successful risky prediction.
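This weighted-averaging logic can be sketched with a Monte Carlo estimate of the marginal likelihood for a toy binomial problem; the data and both priors below are invented for illustration:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

k, n = 8, 10  # invented data: 8 successes in 10 trials

def likelihood(theta):
    """Binomial probability of the observed data given theta."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def marginal_likelihood(prior_sampler, draws=20_000):
    """Monte Carlo estimate: average the likelihood over draws from the prior."""
    return float(np.mean([likelihood(t) for t in prior_sampler(draws)]))

# A risky prior concentrates its mass where these data actually lie (near 0.8)...
risky = marginal_likelihood(lambda m: rng.beta(8, 2, size=m))
# ...while a diffuse prior spreads its mass over the whole parameter space
diffuse = marginal_likelihood(lambda m: rng.uniform(0, 1, size=m))

print(risky > diffuse)  # True: the successful risky prediction earns more support
```

Both "models" share the identical likelihood; only the prior differs, and the prior that made the riskier, successful prediction receives the larger marginal likelihood.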

But how do we build our informative priors?

There often won’t be an obvious prior for parameters that have complex and indirect effects on model performance. Lee and Vanpaemel discuss several ways in which we can obtain these priors. One is eliciting priors from experts (e.g., researchers in the field). This can be done directly (“What rates of forgetting are plausible?”), or by working backwards from the predictions: by having experts determine which predictions in the set are more or less plausible a priori, modellers can work backwards to find those regions of parameter space that generate the endorsed predictions. One can also use previous model applications; if a model has been fit to a variety of experiments, modellers will have some basis for constructing a prior based on the parameter estimates from those previous applications.

Another approach, used by Kary et al. (2016), is to obtain a data-informed prior. Under this approach, the model is first fit to a subset of the data so as to obtain a posterior distribution on the parameters. This posterior can then be used as a prior for fitting the remaining data; this is especially useful when performing model selection on this second part of the data.
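In conjugate models the posterior-as-prior step can be written down exactly. A toy beta-binomial sketch (the counts are invented; in Kary et al. the payoff is model selection on the held-out data, which this sketch does not show):

```python
# Split the data, use the posterior from the first subset as the prior for
# the second, and recover the full-data posterior.
a, b = 1, 1      # flat Beta(1, 1) prior on the success probability
k1, n1 = 14, 20  # first subset: 14 successes in 20 trials
k2, n2 = 30, 40  # remaining data: 30 successes in 40 trials

# Posterior after the first subset becomes the data-informed prior
a1, b1 = a + k1, b + (n1 - k1)

# Updating that prior with the remaining data...
a2, b2 = a1 + k2, b1 + (n2 - k2)

# ...matches analysing all the data at once under the original flat prior
a_full, b_full = a + k1 + k2, b + (n1 - k1) + (n2 - k2)
print((a2, b2) == (a_full, b_full))  # True
```

The data-informed prior for the second stage, Beta(15, 7) here, is far more concentrated than the flat prior we started with, which is exactly what makes the subsequent model selection sharper.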

It is notable that many of Lee and Vanpaemel’s comments also apply to statistical modelling. Recent years have seen the introduction of easy-to-use methods for calculating Bayes Factors for t-tests and ANOVAs, such as JASP—which is described in the special issue by Wagenmakers and colleagues—and the BayesFactor package of Morey, Rouder, and Jamil. Any models which specify an effect must specify a prior on that effect. These packages typically come with default priors for effects that are centered on zero, and with some default spread.

These default priors are reasonable for many problems, especially for effects that have not been well studied. However, in many situations we have good information about the possible distribution for effect sizes. For example, if we run a standard recognition memory experiment, we can be fairly confident that people can tell apart “old” and “new” items; the effect of old vs. new should be positive. Accordingly, a symmetric 0-centered distribution—in which we give equal weight to positive and negative effect sizes, and the most weight to the effect size of 0—is inappropriate.

As with cognitive models, psychological researchers can spend a fair amount of time working out their statistical models, especially when using more complicated methods such as mixed effects modelling or structural equation modelling. A message from Lee and Vanpaemel and others (e.g., Dienes, 2014) is that we might also spend a bit more time thinking about our priors.


Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, 781.

Kary, A., Taylor, R., & Donkin, C. (2016). Using Bayes factors to test the predictions of models: A case study in visual working memory. Journal of Mathematical Psychology, 72, 210-219.

The four horsemen of #BayesInPsych
Tue, 20 Feb 2018

I see four benefits to the use of Bayesian inference:

  1.  Inclusion of prior information.
  2.  Regularization.
  3.  Handling models with many parameters or latent variables.
  4.  Propagation of uncertainty.

Another selling point is a purported logical coherence – but I don’t really buy that argument so I’ll forget that, just as I’ll also set aside philosophical objections against the use of probability to summarize uncertainty.

We’re concerned here with practicalities, not philosophy, and, although I do believe that the philosophy of statistics can be of applied importance (see, for example, Gelman and Hennig, 2017), we have enough directly practical issues to discuss here.

By mentioning the above four benefits, I’m not trying to say that Bayes is the right way to go, or the only way to go, or even the best solution in particular data analysis problems in psychology or elsewhere.

Rather, by laying out these four advantages, I’d like to separate them and consider ways in which various non-Bayesian methods can be used to deliver the same or similar benefits:

  1.  Prior information.  Bayesian inference includes priors directly and easily.  There are, however, ways in which non-Bayesian analyses incorporate prior information: (a) Design.  It is considered acceptable and even desirable to use plausible pre-data estimates of effect sizes and variation to set design parameters and sample size.  (b) Determination of type M (magnitude) and S (sign errors).  We can use effect size estimates obtained independently of the data to assess reporting biases; see Gelman and Carlin (2014).  Indeed, there are settings in which prior information is so strong that a Bayesian or a non-Bayesian analysis can reveal the futility of a classical confidence interval or hypothesis test (see for example Gelman and Weakliem, 2009).
  2.  Regularization.  Bayesian inference with an informative prior gives more stable parameter estimates and predictions, compared to the corresponding inferences from least squares, maximum likelihood, or other traditional statistical procedures.  Again, though, we can ask whether newer non-Bayesian methods can achieve the benefits of regularization, and again the answer is yes:  methods such as lasso regression and false discovery rate analysis extend the classical ideas of point estimation and hypothesis testing to perform regularization, yielding stable inferences even in the limit of increasing number of parameters to estimate or hypotheses to test.
  3.  Models with many parameters or latent variables.  Making use of the rules of probability, Bayesian inference can work when the dimensionality of parameters and latent data is large.  Machine learning approaches such as deep networks can give good inferences in high dimensions too, but by using ideas that are very close to Bayesian, using tools such as variational inference to average over the distribution of latent parameters.
  4.  Propagation of uncertainty from inference to decision.  Bayesian decision analysis requires the user to specify a cost or utility function as well as a prior distribution, but the resulting coherent process is difficult to construct using any existing non-Bayesian approaches.  Alternatives such as p-value thresholds do not do the job (see McShane et al., 2017).  And, for that matter, approaches to Bayesian decision making that avoid explicit cost or utility functions–here I’m thinking of rules based on Bayes factors or posterior probabilities–also fail to be coherent, and in my opinion do not have good practical value.  Bayesian inference without real priors can still be useful–it still allows propagation of uncertainty and can perform some minimal level of regularization–but Bayesian decision making without some measures of costs and benefits does not make sense to me.
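The regularization point (item 2) has an exact correspondence in the linear model: the ridge-regression estimate is the posterior mode under independent zero-centered Gaussian priors on the coefficients. A small sketch on simulated data (all numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated regression data: y = X @ beta + noise
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=50)

lam = 2.0  # penalty weight; in the Bayesian reading, 1 / prior variance

# Classical least squares, and its ridge-regularized counterpart
ols = np.linalg.solve(X.T @ X, X.T @ y)
ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# The ridge solution is exactly the posterior mode under independent
# N(0, sigma^2 / lam) priors on the coefficients: the informative prior
# pulls the estimate toward its mean of zero.
print(np.linalg.norm(ridge) < np.linalg.norm(ols))  # True: shrinkage toward 0
```

Which is the point of the comparison above: whether one reaches stable estimates through a penalty term or through a prior, the arithmetic is the same.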

In summary, I think Bayesian methods are helpful in psychology and many other applied fields.  Adding prior information can be crucial in constructing inferences that make sense; even weak prior distributions can usefully regularize; in any case, Bayesian inference can handle high-dimensional uncertainty; and Bayesian posterior probabilities can be mapped into decisions, in which case it makes sense to check model fit and carefully examine the assumed cost or utility functions.

To varying degrees, many of the benefits of Bayesian inference can be reaped using existing non-Bayesian approaches.  This statement is not meant as a disparagement of Bayesian inference, as the statement could be flipped around to read that many of the benefits of various popular statistical methods can be reconstructed using the Bayesian calculus in an implementation that for many applications is modular and transparent.


Efron, B., & Hastie, T. (2016).  Computer Age Statistical Inference.  Cambridge University Press.

Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9, 641-651.

Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics (with discussion and rejoinder). Journal of the Royal Statistical Society A 180, 967-1033.

McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2017).  Abandon statistical significance.

From classical to new to real: A brief history of #BayesInPsych Mon, 19 Feb 2018 08:11:15 +0000

The #BayesInPsych Digital Event kicked off yesterday and as the leading Guest Editor of the special issue of Psychonomic Bulletin & Review, I take this opportunity to provide more context for this week’s posts.

The simple act of deciding which among competing theories is most likely—or which is most supported by the data—is the most basic goal of empirical science, but the fact that it has a canonical solution in probability theory is seemingly poorly appreciated. It is likely that this lack of appreciation is not for want of interest in the scientific community; rather, I suspect that many scientists hold misconceptions about statistical methods.

Indeed, many psychologists attach false probabilistic interpretations to the outcomes of classical statistical procedures (p-values, rejected null hypotheses, confidence intervals, and the like), as papers by Gigerenzer (1998), Hoekstra, Morey, Rouder, and Wagenmakers (2014), and Oakes (1986) have documented. Yesterday’s post offers a quick glimpse of the reasons for those misconceptions, and the adverse consequences they may entail.

Because the false belief that classical methods provide the probabilistic quantities that scientists need is so widespread, researchers may be poorly motivated to abandon these practices.

The American Statistical Association recently published an unusual warning against inference based on p-values (Wasserstein & Lazar, 2016). Unfortunately, their cautionary message did not conclude with a consensus recommendation regarding best-practice alternatives, leaving something of a recommendation gap for applied researchers.

In psychological science, however, a replacement had already been suggested in the form of the “New Statistics” (Cumming, 2014)—a set of methods that focus on effect size estimation, precision, and meta-analysis, and that would forgo the practice of ritualistic null hypothesis testing and the use of the maligned p-value. However, because the New Statistics’ recommendations regarding inference rest on the same flawed logic as the thoughtless application of p-values, they are subject to the same misconceptions and fall short in the same way: it is not clear how to interpret an effect size estimate without also knowing the uncertainty of that estimate.

And despite common misconceptions, confidence intervals do not measure that uncertainty (Morey, Hoekstra, Rouder, Lee, & Wagenmakers, 2016). You may recall that we ran a Digital Event that was dedicated to those common misconceptions about confidence intervals. The “New Statistics” also does not tell us how to decide which among competing theories is most supported by data.

In the #BayesInPsych special issue of Psychonomic Bulletin & Review, we review a different set of methods and principles, now based on the theory of probability and its deterministic sibling, formal logic (Jaynes, 2003; Jeffreys, 1939). The aim of the special issue is to provide and recommend this collection of statistical tools that derives from probability theory: Bayesian statistics.

Overview of the special issue on Bayesian inference

The special issue is divided into four sections. The first section is a coordinated five-part introduction that starts from the most basic concepts and works up to the general structure of complex problems and to contemporary issues. The second section is a selection of advanced topics covered in-depth by some of the world’s leading experts on statistical inference in psychology. The third section is an extensive collection of teaching resources, reading lists, and strong arguments for the use of Bayesian methods at the expense of classical methods. The final section contains a number of applications of advanced Bayesian analyses that provide a sense of the wide reach of Bayesian methods for psychological science.

Section I: Bayesian Inference for Psychology

The special issue opens with Introduction to Bayesian inference for psychology, in which Etz and Vandekerckhove describe the foundations of Bayesian inference. They illustrate how all aspects of Bayesian statistics can be brought back to the most basic rules of probability, and show that Bayesian statistics is nothing more nor less than the systematic application of probability theory to problems that involve uncertainty.

Wagenmakers, Marsman, et al. continue in Part I: Theoretical advantages and practical ramifications by illustrating the added value of Bayesian methods, with a focus on its desirable theoretical and practical aspects. Then, in Part II: Example applications with JASP, Wagenmakers, Love, et al. showcase JASP: free software that can perform the statistical analyses that are most common in psychology, and that can execute them in both a classical (frequentist) and Bayesian way.

However, the full power of Bayesian statistics comes to light in its ability to work seamlessly with far more complex statistical models. In Part III: Parameter estimation in nonstandard models, Matzke, Boehm, and Vandekerckhove discuss the nature of formal models and how to implement models of high complexity in modern statistical software packages.

Rounding out the section, Rouder, Haaf, and Vandekerckhove in Part IV: Parameter estimation and Bayes factors discuss the fraught issue of estimation-versus-testing. The paper illustrates that the two tasks are one and the same in Bayesian statistics, and that the distinction in practice is not a distinction of method but of how hypotheses are translated from verbal to formal statements.

Section II: Advanced Topics

The Advanced Topics section covers three important issues that go beyond the off-the-shelf use of statistical analysis. In Determining informative priors for cognitive models, Lee and Vanpaemel highlight the sizable advantages that prior information can bring to the data analyst turned cognitive modeler.

Because Bayesian analyses are not in general invalidated by “peeking” at data, the need for sample-size planning and power analysis is somewhat diminished. Nonetheless, it is sometimes useful for logistical reasons to calculate ahead of time how many participants a study is likely to need. In Bayes factor design analysis: Planning for compelling evidence, Schoenbrodt and Wagenmakers provide exactly that.

Finally, there arise occasions where even the most sophisticated general-purpose software will not meet the needs of the expert cognitive modeler. In A simple introduction to Markov chain Monte-Carlo sampling, van Ravenzwaaij, Cassey, and Brown describe the basics of sampling-based algorithms and illustrate how to construct custom algorithms for Bayesian computation.

Section III: Learning and Teaching

Four articles make up the Learning and Teaching section. The goal of this section is to collect the most accessible, self-paced learning resources for an engaged novice.

While it is of course their mathematical underpinnings that support the use of Bayesian methods, their intuitive nature provides a great advantage to novice learners. In Bayesian data analysis for newcomers, Kruschke and Liddell cover the basic foundations of Bayesian methods using examples that emphasize this intuitive nature of probabilistic inference.

With The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and planning from a Bayesian perspective, Kruschke and Liddell lay out a broad and comprehensive case for Bayesian statistics as a better fit for the goals of the aforementioned New Statistics (Cumming, 2014).

How to become a Bayesian in eight easy steps is notable in part because it is an entirely student-contributed paper. Etz, Gronau, Dablander, Edelsbrunner, and Baribault thoroughly review a selection of eight basic works (four theoretical, four practical) that together cover the bases of Bayesian methods.

The fourth and final paper in the section for teaching resources is Four reasons to prefer Bayesian analyses over significance testing by Dienes and McLatchie. It is likely that widespread misconceptions about classical methods have led researchers to believe that their staple methods possess the desirable properties of Bayesian statistics when, in fact, they do not (as we noted yesterday). Dienes and McLatchie present a selection of realistic scenarios that illustrate how classical and Bayesian methods may agree or disagree, showing that the attractive properties of Bayesian inference are often missing from classical analyses.

Section IV: Bayesian Methods in Action

The concluding section contains a selection of fully-worked examples of Bayesian analyses. Three powerful examples were chosen to showcase the broad applicability of the unifying Bayesian framework.

The first paper, Fitting growth curve models in the Bayesian framework by Oravecz and Muth, provides an example of a longitudinal analysis using growth models. This framework is likely to gain prominence as more psychologists focus on the interplay of cognitive, behavioral, affective, and physiological processes that unfold in real time and whose joint dynamics are of theoretical interest.

In a similar vein, methods for dimension reduction have become increasingly useful in the era of Big Data. In Bayesian latent variable models for the analysis of experimental psychology data, Merkle and Wang give an example of an experimental data set whose various measures are jointly analyzed in a Bayesian latent variable model.

The final section of the special issue is rounded out by Sensitivity to the prototype in children with high-functioning autism spectrum disorder: An example of Bayesian cognitive psychometrics by Voorspoels, Rutten, Bartlema, Tuerlinckx, and Vanpaemel. The practice of cognitive psychometrics involves the construction of often complex nonlinear random-effects models, which are typically intractable in a classical context but pose no unique challenges in the Bayesian framework.

Additional coverage

As part of our efforts to make our introductions to Bayesian methods as widely accessible as possible, we have set up a social media help desk where questions regarding Bayesian methods and Bayesian inference, especially as they are relevant for psychological scientists, are welcomed. This digital resource is likely to expand in the future to cover new developments in the dissemination and implementation of Bayesian inference for psychology.

Finally, we have worked to make many of the contributions to the special issue freely available online. The full text of many articles is freely available via the Open Science Framework. Here, too, development of these materials is ongoing, for example with the gradual addition of exercises and learning goals for self-teaching or classroom use.


Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.

Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. Behavioral and Brain Sciences, 21, 199–200.

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.

Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.

Jeffreys, H. (1939). Theory of probability (1st ed.). Oxford, UK: Oxford University Press.

Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23, 103–123.

Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. New York: Wiley.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70, 129–133.

This post is an abridged and edited version of the Editorial to the special issue on #BayesInPsych that was co-authored with Jeffrey N. Rouder and John K. Kruschke.

#BayesInPsych: Preventing miscarriages of justice and statistical inference Sun, 18 Feb 2018 19:20:52 +0000

Your brilliant PhD student ran an experiment last week that investigated whether chanting the words “unicorns, Brexit, fairies” repeatedly every morning before dawn raises people’s estimates of the likelihood that they will win the next lottery in comparison to a control group that instead chants “reality, reality, reality”. The manipulation seems to have worked, as the estimates in the experimental group are higher than in the control group. A t-test is highly significant (p < .0001), confirming that the effect was highly unlikely to be due to chance alone, and therefore was likely the result of the experimental intervention. Your student is ready to write this up and submit it to the Journal of Fantastical Politics.

Not so fast, please!

That significant t-test did not tell you that the effect was unlikely to be due to chance alone.

What it did tell you is that if chance were the only effect present, then it is unlikely that you would have observed a t-value that large or larger.

The difference between “unlikely to be due to chance alone” and “if chance were the only effect present, then it is unlikely…” may be semantically subtle but it is statistically and conceptually profound.
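The direction of that conditional can be made concrete with a small simulation (a sketch with made-up numbers, not your student’s data): construct a world in which chance is the only effect present, and count how often a t-value as large as some observed one occurs.

```python
import numpy as np

rng = np.random.default_rng(1)

def t_stat(a, b):
    """Two-sample t statistic (equal-variance form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

t_obs = 2.5          # hypothetical observed t-value
n_sims, n = 20_000, 30
exceed = 0
for _ in range(n_sims):
    a = rng.normal(size=n)  # both groups drawn from the same population:
    b = rng.normal(size=n)  # the null hypothesis is true by construction
    if abs(t_stat(a, b)) >= t_obs:
        exceed += 1

p = exceed / n_sims  # P(|t| >= 2.5 | chance alone) -- this is what a p-value is
print(round(p, 3))   # roughly 0.015
```

The proportion computed here is what a p-value estimates: the probability of data this extreme given that chance alone is operating. It says nothing about the probability that chance alone is operating given the data.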

So profound, in fact, that it may destroy lives.

Sally Clark was an English solicitor who in 1999 was found guilty of murdering two of her sons, who had died suddenly as babies, apparently from sudden infant death syndrome (SIDS). Her conviction rested largely on statistical evidence that the chances of two children in the same family dying of SIDS were only 1 in 73 million. That’s a very small p-value indeed, and one might be tempted to support the court’s decision to reject the null hypothesis of innocence (i.e., reject the idea that the deaths were due to SIDS).

The court’s decision was, alas, statistically deeply flawed, as the Royal Statistical Society pointed out in a letter to the Lord Chancellor in 2002, after Sally Clark had been languishing in prison for several years. In the letter, Peter Green, the President of the Royal Statistical Society, argued:

“The jury needs to weigh up two competing explanations for the babies’ deaths: SIDS or murder. The fact that two deaths by SIDS is quite unlikely is, taken alone, of little value. Two deaths by murder may well be even more unlikely. What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation.”

If you want to walk through the steps that underlie this conclusion, here is a brief but instructive video that explains the statistics involved in the Sally Clark case:

The video suggests that the chance of Sally Clark being innocent was around 50%. In fact, even that probability is likely an under-estimate, because another mathematician involved in the case, Ray Hill, put the odds in favor of her innocence at somewhere between 9 to 1 and 17 to 1.

Sadly, although Sally Clark was released in 2003 largely because the initial statistical evidence was overturned, she never recovered from the trauma and died in 2007.

Let us return from a miscarriage of justice to the experiment of your brilliant PhD student. Let’s prevent a miscarriage of statistical inference.

That p-value of <.0001 tells you how likely the outcome was to have been observed under the null hypothesis of chance alone—in the same way that the probability of two children dying from SIDS can be calculated (although it’s much higher than 1 in 73 million, which reveals another flaw in the prosecution’s case against Sally Clark). But the p-value does not tell you how likely your PhD student’s result would have been even if the null hypothesis had been false—in the same way that the 1 in 73 million figure levelled against Sally Clark did not consider the likelihood of a mother having murdered two of her children.

Perhaps most crucially, the p-value tells you nothing about the relative likelihood of the hypotheses you are interested in—guilt or innocence in Sally Clark’s case, and the null hypothesis versus its alternatives in the case of your brilliant PhD student.

In a recent article on inference in psychology, Jeff Rouder and colleagues helpfully rephrased Peter Green’s letter to the Lord Chancellor to drive home this point:

“The researcher needs to weigh up two competing explanations for the data: The null hypothesis or the alternative hypothesis. The fact that the observed data are quite unlikely under the null hypothesis is, taken alone, of little value. The observed data may well be even more unlikely under the alternative hypothesis. What matters is the relative likelihood of the data under each hypothesis, not just how unlikely they are under one hypothesis.”
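The logic in Green’s and Rouder’s quotations can be sketched numerically. In odds form, Bayes’ rule says that posterior odds equal prior odds times the likelihood ratio; the numbers below are hypothetical, chosen only to illustrate the calculation, and are not the actual figures from the Clark case:

```python
# Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
# All numbers below are hypothetical, chosen only to illustrate the logic.

p_data_given_h0 = 1e-7   # P(evidence | first explanation), e.g. two SIDS deaths
p_data_given_h1 = 5e-8   # P(evidence | second explanation), e.g. two murders

prior_odds = 1.0         # prior odds of H0 vs H1 (assumed even here)
likelihood_ratio = p_data_given_h0 / p_data_given_h1
posterior_odds = prior_odds * likelihood_ratio

p_h0 = posterior_odds / (1 + posterior_odds)  # convert odds to a probability
print(posterior_odds, round(p_h0, 3))         # 2.0 0.667
```

Even though the data are very unlikely under both explanations, what matters is that they are twice as likely under the first as under the second; combined with even prior odds, this yields a posterior probability of about two-thirds for the first explanation. Neither tiny probability, taken alone, settles anything.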

So where does this leave us?

If we cannot use p-values to draw reliable conclusions, how can we avoid miscarriages of statistical inference?

One answer was provided by an 18th-century gentleman, the Reverend Thomas Bayes.

The fundamental contribution of Bayes was presented in An Essay towards solving a Problem in the Doctrine of Chances, which was read to the Royal Society posthumously in 1763.

In a nutshell, Bayes’ theorem resolved the problem known as “inverse probability”, which is precisely the problem we wish to solve in statistical inference: A t-test gives us the probability of an event occurring conditional upon a state of the world (namely, that the null hypothesis is true). What researchers want, however, is the inverse—namely an insight into the likely state of the world given the statistical evidence at hand.

To date, quite frequent(tist)ly, researchers have achieved that desired conclusion by a process of wishful misinterpretation of p-values (the statistical equivalent of pre-dawn incantation of “unicorns, Brexit, fairies”).

A better way of obtaining the desired conclusion is to discard conventional null-hypothesis testing and replace it with Bayesian statistics, as more and more researchers have come to realize.

It is not surprising, then, that if you search this Featured Content site for the string “Bayes” you get 41 hits already. This number will increase next week, because our next digital event, #BayesInPsych, which commences on Monday 19 February, is dedicated to this Bayesian revolution.

The digital event coincides with the publication of a special issue of the Psychonomic Bulletin & Review dedicated to Bayesian Inference for Psychology. The issue was guest edited by Joachim Vandekerckhove (University of California, Irvine), Jeffrey N. Rouder (University of California, Irvine, and University of Missouri), and John K. Kruschke (Indiana University).

The articles in this issue will remain free to access by the public until early April, and they can be accessed from the special issue’s landing page.

Beginning on Monday, 19 February, we will be discussing some of those articles here in our next digital event #BayesInPsych. Posts will be by the following contributors, in the likely order of appearance:

  • Joachim Vandekerckhove will provide an overview of the special issue.
  • Andy Gelman will summarize the four benefits of using Bayesian inference.
  • Simon Farrell will address the problem of specifying priors during Bayesian modelling.
  • Clintin Davis-Stober will discuss how using priors can bring you closer to your research dreams.
  • Trisha van Zandt and Brandon Turner will explore how we can go beyond analysis of simple, tractable models like those presented in the special issue to complex, realistic and computational models that have no likelihood.

Much to look forward to—please tune in for all of next week.


How the takete got its spikes: Why some words sound like what they are Thu, 15 Feb 2018 09:22:36 +0000

Human beings have an incredible capability to communicate. Unlike other species, humans have evolved to use language to express our states, desires, observations, and, I guess, tweet about them. Language is a powerful system of communication because it allows the expression of counterfactuals: we can easily discuss the past and future; distant, unseen locations; complex emotional states; abstract concepts; and many things that are difficult, if not impossible, to see, hear, and feel in the real world.

But with great power comes great inscrutability. While we can look directly at a GIF and see emotional states (excitement, disgust), or look at an Emoji and see visual icons (eggplant, brain), we cannot hear the word “eggplant” and evoke the features of eggplant.

To illustrate this point, try to imagine what a “takete” might be like compared to a “maluma.” The phonemes that make up these (imaginary) words convey nothing about what a takete or maluma might look like or be. (Phonemes are the smallest units of speech sound that distinguish one word from another.)

This mapping of the sounds of a word onto its meaning has often been thought to be arbitrary. We cannot derive any of the features of a takete from hearing the word, because the sounds in it are not meaningfully related to the thing in the real world.

Or can we?

Sound symbolic association, a theory in modern linguistics, argues that the sounds of words are not arbitrarily related to their meanings, but show systematic relationships with them. For example, the sounds in “takete” are sharp and abrupt – which might be why people who encounter this (imaginary) word are more likely to associate it with a sharp object than with the softer-sounding, longer vowels and consonants of “maluma.” (You may recall that we’ve covered this phenomenon before on this blog with round “bouba” and pointy “kiki.”) It might also be why people can guess the meaning of (real) foreign antonyms above chance – because those words are likely to sound like what they are.

Sound symbolic associations can occur with several different degrees of iconicity – whether the form of the phoneme matches an aspect of its meaning. Sound symbolic associations occur directly in cases where the sound resembles the symbol (like onomatopoeia – “bang” or “whirr”). They also occur indirectly in cases where, for example, making the sound in “round” causes the mouth to become round. Sound symbolic associations can occur across several words – teeny (with its short, high-frequency vowel) is smaller than tiny (with its longer, lower-frequency vowel).

Examples abound of sound symbolic associations, within and across languages, but it’s unclear why. A recent paper published in Psychonomic Bulletin & Review by David Sidhu and Penny Pexman discusses five possible mechanisms behind sound symbolic associations. Of course, this list of mechanisms may not be exhaustive, and the mechanisms may also combine to create some of the sound symbolic associations that have been discovered. But for simplicity, Sidhu and Pexman discuss them one at a time. Here they are:

1. Statistical co-occurrence. Perhaps sound symbolic associations arise because dimensions of things in the world themselves tend to occur together.

  • Example: Large things—such as elephants—tend to make lower frequency noises and resonate at lower frequencies than small things—such as hummingbirds. Over time, humans could have observed these co-occurrences, which would then be echoed in language. Front vowels (like the vowel in “min”) have a higher second-formant frequency than back vowels (like the vowel in “max”). These frequencies map onto the size meanings of “minimum” and “maximum.”

2. Shared properties. Sounds may share some property directly with the things they refer to. This would mean certain sounds would naturally be paired with particular things. According to Sidhu and Pexman, this mapping could be at a perceptual level (like the sound in round, described above), or a conceptual level (like a high-frequency vowel being associated with a high-frequency tone, and relating to the concept of bright). These often occur across modalities (like, from sound to vision), which is why they are often called cross-modal perceptions (as we’ve discussed before on this blog).

  • Example: Takete and maluma each share properties with sharpness and roundness that match their phonemic qualities.

3. Neural factors. Overlap in neural processing is not uncommon, and could explain why the production of some sounds corresponds with certain concepts.

  • Example: Neural control of the articulation of certain phonemes (like “t” versus “g”) corresponds with a precision mouth grip (opening a sunflower seed) and a power mouth grip (bobbing for apples), respectively. This neural association between mouth grasping and a phoneme might result in “t” being associated with small things and “g” with large things.

4. Species-general associations. On an evolutionary time scale, natural selection may have put pressure on organisms to identify particular sounds, and thus gain information about their environment.

  • Example: Across species, low-frequency utterances tend to indicate threats, which could be because organisms are trying to appear large. Thus, evolutionary pressure would favor organisms that associated low frequencies with large size.

5. Language patterns. Like statistical co-occurrence, language pattern explanations of sound symbolic associations emphasize commonalities in meaning within language for certain phonemes.

  • Example: Glow, glisten, gleam, glitter, glamour, and glory all relate to brightness and shininess and all share an initial “gl” sound, despite “gl” not having anything to do with brightness on its own.

Sound symbolic associations fly in the face of what most of us think about language – that, except for a few outliers, language is mostly arbitrary. Instead, language might contain meaningful relations that make it easier for infants to learn and for speakers to remember, and that may have made it easier for language to evolve in the first place.

Reference for the article discussed in this post:
Sidhu, D.M., & Pexman, P.M. (2017). Five mechanisms of sound symbolic association. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-017-1361-1.

Phineas Gage in a bottle: Alcohol decreases prefrontal activity Mon, 12 Feb 2018 02:48:50 +0000

There are many ways to become famous. Phineas Gage, an American railway construction foreman in the mid-19th century, experienced one of the most improbable (and least recommended) paths to eternal fame. Few first-year psychology students around the world will have escaped the story of Phineas, and his mishap with an iron rod used to tamp blasting powder into a hole in the rock in preparation for a controlled explosion. Things didn’t go to plan, and the explosion was rather lacking in control. Wikipedia relates the story:

“Rocketed from the hole, the tamping iron‍—‌1 1⁄4 inches (3.2 cm) in diameter, three feet seven inches (1.1 m) long, and weighing 13 1⁄4 pounds (6.0 kg)‍—‌entered the left side of Gage’s face in an upward direction, just forward of the angle of the lower jaw. Continuing upward outside the upper jaw and possibly fracturing the cheekbone, it passed behind the left eye, through the left side of the brain, and out the top of the skull through the frontal bone.”

Improbably, Phineas Gage survived, and it is his survival that secured his place in the hall of fame of neuroscience. After the accident, his behavior seems to have been altered dramatically. According to his physician, “the balance between his intellectual faculties and animal propensities seems to have been destroyed”. Phineas apparently became grossly profane, irreverent, and showed “but little deference for his fellows.”

We now know that those changes in personality likely resulted from damage to Phineas Gage’s frontal lobes, the part of the brain now commonly implicated in numerous aspects of executive control, including in particular the tempering of a person’s level of aggression. No wonder, then, that Phineas experienced a lasting change in personality, although the exact details of this personality change are actually remarkably fuzzy.

Our knowledge of how the brain operates, and what parts are involved in particular aspects of our behavior and cognition, has expanded considerably since the mid-19th century. Modern technology has played a major role in this development as we have moved from tamping irons to non-invasive techniques, such as functional magnetic resonance imaging (fMRI), which we have discussed here previously on several occasions. fMRI exploits the fact that when neurons in a particular brain region are active, more fresh oxygen-rich blood flows into these regions, replacing oxygen-depleted blood. fMRI detects these changes in blood flow, via magnetic resonance (hence the name), allowing us to measure how active a brain region is at any given time.

A recent article in the Psychonomic Society’s journal Cognitive, Affective, & Behavioral Neuroscience reported a study using fMRI that focused on why people often become aggressive and even violent after drinking alcohol. Most existing theories have linked this aggression to alcohol-related changes in the functioning of the prefrontal cortex: Alcohol may have consequences that are less permanent than those suffered by Phineas, but perhaps it acts on the same regions as that mid-19th century tamping iron?

Researchers Thomas Denson, Kate Blundell, Timothy Schofield, Mark Schira, and Ulrike Krämer addressed this question by inviting participants to the laboratory to consume a beverage containing vodka or a placebo (in the control condition), before participating in an aggression-eliciting task. The amount of alcohol consumed was calibrated to each participant’s weight to achieve a blood alcohol concentration of .05; that is, the point at which it is illegal to drive in most jurisdictions.

Participants in both conditions were told that they would be consuming alcohol. The drinks were mixed in front of participants, using a vodka bottle to add the alcohol (in the experimental condition) or tonic-water placebo (in the control condition). In the control condition, a small amount of vodka was smeared on the rim of the participant’s cup to provide an odor of alcohol.

The task was the Taylor Aggression Paradigm (TAP), which is illustrated in the figure below and which was administered while participants were lying in the fMRI scanner. The basic idea behind the TAP is that participants believe they are competing against an opponent in a reaction-time task. Depending on whether they win or lose on a given trial, they administer or receive a noise blast to/from their opponent. In reality, no opponent exists; the sequence of wins and losses, as well as the intensity of the noise blasts delivered by the “opponent,” is controlled by a computer.

As shown in the figure, participants first decide on a level of aggression that they wish to deliver to the opponent (on an intensity scale from 1 to 4) should they win the next trial. They then wait for a colored square to appear, whereupon they press a button as quickly as possible. On a “winning” trial, when participants were “faster” than the imaginary opponent, the burst of noise of the pre-determined level of intensity was (ostensibly) delivered to the opponent. On a “losing” trial, the participant would be exposed to the noise blast selected by the opponent. Denson and colleagues created two “opponents” that differed in the level of provocation: Opponent 1 only delivered noise blasts of intensity 1 or 2, whereas Opponent 2 delivered blasts of intensity 3 or 4 only.

The main behavioral results are shown in the figure below, which plots the participant-selected noise level across trials for each opponent in the two conditions.

Aggression was greater overall when the game was played against a high-provocation opponent.

No surprises there.

But what does this have to do with Phineas Gage?

The figure below shows the effects of alcohol on activation in brain regions selected as regions of interest on the basis of prior theory. The figure shows that activation was lowered by alcohol in the prefrontal cortex (PFC; top panel), the caudate (second panel), and the ventral striatum (third panel). The opposite result, heightened activity under alcohol, was observed in the hippocampus.

These results are compatible with the idea that the PFC is instrumental in controlling aggression, and that alcohol impairs the PFC’s capacity to exert that control.

Further confirmation of the role of the PFC in managing aggression was obtained by correlating activity in the dorsomedial and dorsolateral PFC with aggressive behavior. These results are shown in the figure below: it is clear that activity was related to aggression, but only for intoxicated participants.

Denson and colleagues opine that when considered together, these findings

“suggest that when intoxicated, the PFC becomes dysregulated relative to sobriety, but that the activity that is present may facilitate intoxicated aggression.”

The results reported by Denson and colleagues are largely consistent with a growing body of research about the neural basis of aggression, and how it is triggered by altering the function of the prefrontal cortex, the limbic system and reward-related regions of the brain.

Alcohol seems to trigger those changes much like a tamping iron does, although with (usually) considerably less permanence.

Psychonomics article featured in this post:

Denson, T. F., Blundell, K. A., Schira, M. M., & Krämer, U. M. (2018). The neural correlates of alcohol-related aggression. Cognitive, Affective, & Behavioral Neuroscience. DOI: 10.3758/s13415-017-0558-0.

When working memory works with ⺙x – 2 = ⻂: Effects of prior training on performance Thu, 08 Feb 2018 20:49:47 +0000

“Working memory” is a broad term that describes what we do with information that is consciously accessible. For instance, when students take notes in class, they are hearing the lecturer’s sentences, placing them in the context of what they know about the topic, and synthesizing both to form the note they ultimately write on the page. This may require no manipulation — if, say, they write down exactly what was said — or substantial manipulation, if they are building a mindmap on the fly.

Working memory and long-term memory interact with one another; things you’ve previously learned help you understand and structure the information you’re currently working with. One of the most common examples is “chunking”, in which items are combined into meaningful groups to be remembered. The eight-digit number sequence 31412718 is easier to remember if you know it is the first four digits of π and the first four digits of e. Eight numbers become two — but only if you already know π and e.

The connection between working memory and long-term memory is a matter of debate, but it is typically assumed that well-learned material from long-term memory can be manipulated in working memory with considerable and equal ease. This is consistent with working memory and long-term memory being conceptualized as, in some sense, separate systems.

In their article “Item strength affects working memory capacity”, published recently in the Psychonomic Society’s journal Memory & Cognition, Shen, Popov, Delahay, and Reder challenge this view. They show that the ability of participants to remember associations between Chinese characters and numbers — and then use those associations — depends on how often participants have previously seen those Chinese characters, in spite of the fact that all the characters were well-learned.

This is difficult to reconcile with well-known conceptions of working memory, such as those by Atkinson and Shiffrin or by Baddeley and Hitch, that depend on long-term and working memory being separate, with abstract representations of items in memory being “transferred” between them.

When we learn new things — such as words in a new language, or mathematical symbols — the frequency of different words or symbols will vary. For instance, in English some words are common (“the”) and others rare (“qat”). Shen and colleagues wanted to explore the effect of this variability on working memory, but they did not want to rely on natural variability in already-learned items, because frequency is often confounded with other variables; for instance, the most common words in English are function words like “the”, with nouns and verbs being less common. Any differences between common and uncommon words might therefore be attributed to factors other than how often they are seen. To get around this, Shen and colleagues used Chinese characters that were unfamiliar to the research participants. These characters were randomly assigned to be “high frequency” or “low frequency”, with high-frequency characters shown twenty times more often than low-frequency ones.

To teach the Chinese characters to the participants, the authors used a visual search task (see the figure below). Participants were shown a target Chinese character (high or low frequency) for one second and then shown a group of four similar characters that might, or might not, contain the target character. Participants were then asked to indicate whether the target character was present. In nine sessions across several weeks, the participants performed over 6,000 such trials, gradually improving to about 95% accuracy, on average.

Shen and colleagues’ visual search task. Participants performed over 6,000 trials over several weeks.

Consider how long-term memory supports such a task. If an English reader were to perform the same task with Latin letters (say, a), they would only need to remember the letter’s identity and hunt for it in the search group, because a is already well known to English readers. That is not possible with previously unknown characters. The authors’ purpose in using the visual search task was to induce long-term learning of the Chinese characters.

After weeks of training in the visual search task, the participants were asked to perform a combined memory and algebra task. On each trial, participants were asked to remember associations between two of the Chinese characters and two digits. They were then immediately asked to solve an algebraic equation such as x/2 – 2 = 1, for which x = 6. Notice that this equation takes two steps to solve: adding 2 to both sides, then multiplying both sides by 2. An equation like x – 3 = 5, on the other hand, requires only one step. Participants were presented with equations of both types.

In an additional twist, participants might also be provided with equations in which two of the numbers were replaced by the Chinese characters they were asked to remember: for instance, ⺙x – 2 = ⻂. In order to solve this equation, one would have to remember both associations between Chinese characters and digits, mentally substitute the digits into the equation, and then solve for x.

An equation could thus be one of four types: one step or two steps, and requiring substitution (from Chinese) or not. After solving the equation, the participants were tested on their memory for the Chinese characters and digits. Participants were asked to indicate which of the several Chinese characters was presented, and what digits were associated with them.

Shen and colleagues’ combined memory and algebra task. Participants were asked to solve an equation while simultaneously remembering the digits associated with two Chinese characters.

Let’s stop to consider what we’d predict will happen. The algebra task is taxing for working memory already; when we add mental substitution of Chinese characters, it seems reasonable to predict that the task will be even harder. Also, increasing the number of steps in the algebra task should make it harder. As the figure below shows, that’s exactly what Shen and colleagues found; increasing the number of steps and requiring substitution both decreased performance.

Average performance on the algebra task.

What Shen and colleagues were really after, though, was a potential effect of whether the Chinese characters were high frequency or low frequency during the initial training. As long as the Chinese characters were well-learned, there’s no particular reason why their frequency during training would affect how well you can juggle them during the algebra task. You can see from the figure above, however, that when two algebra steps were required, substitution of a low frequency character caused a performance decrease of over 10%, compared to high frequency characters.

Shen and colleagues call this an effect of “item strength” on working memory, although it should be noted that “item strength” is a metaphor rather than a measured quantity (and even granted the metaphor, an item’s representation could be “strong” but the item might still be low in frequency relative to other items). The reason for the effect of high vs. low frequency Chinese characters is not understood, but it seems clear any conception of working memory that considers long-term memory as a separate system that simply feeds into working memory is inadequate.

Interestingly, Shen and colleagues also discuss the implications of their findings for the debate about whether working memory capacity is best thought of as made up of discrete “slots” or a continuous resource. Is your memory limited by the number of discrete things you try to remember — say, four digits — or is it more flexible? Could you remember, say, ten simple things, but only two complex things? Shen and colleagues believe that their results rule out discrete working memory models, because in these models an item is simply that: an item. There does not seem to be any room for learned frequency to affect performance.

I suspect that advocates of “slot” models will object that 1) only the strongest of such models would fail, and 2) resource models benefit from being extremely flexible and making no prediction at all, which is not much of a win in scientific terms. It isn’t clear why learned frequency would affect the resources an item requires, either. One thing is certain, though: Shen and colleagues’ results provide fertile ground for further debate over the relationship of working memory to long-term memory.

Psychonomics article discussed in this post:

Shen, Z., Popov, V., Delahay, A. B., & Reder, L. M. (2017). Item strength affects working memory capacity. Memory & Cognition. DOI: 10.3758/s13421-017-0758-4.


Learning to classify better than a Student’s t-test: The joys of SVM Tue, 06 Feb 2018 19:29:20 +0000

Is a picture necessarily worth a thousand words? Do bilinguals always find some grammatical features in their second language to be more difficult than native speakers of that language? Is the Stroop effect necessarily larger when the task is to name the ink color of a color word than when the task is to read that word?

Pretty much any experiment conducted by researchers in cognitive science will involve a comparison between at least two groups or two conditions to which participants are exposed. That comparison will necessarily involve some form of statistical test, be it a frequentist or Bayesian test. There is almost no escaping the use of a test, because even confidence intervals really are variants of a statistical test, albeit with properties that many researchers do not necessarily understand.

So let us consider statistical tests.

Even the most basic of tests, for example the t-test that was invented to monitor the quality of Guinness stout (with great success, in my view), rests on various assumptions. For example, the data must be sampled independently from two normal distributions (in the two-sample case we are concerned with here) with, ideally, equal variances.

What happens if we violate those assumptions? Can we avoid those assumptions altogether?

The answer to the first question is nuanced, but in some cases it is “not much”. Almost 60 years ago, Alan Boneau published what I believe to be the first Monte Carlo experiment on the properties of the t-test. Let us look at his methodology in some detail because it will also help us understand the answer to the second question.

A Monte Carlo experiment relies on simulating a process or procedure by sampling of random numbers—hence the name. Because we can control the exact nature of those random numbers, and because we know exactly how they were sampled, we can use Monte Carlo techniques to gather insight into the behavior of statistical tests. In a nutshell, we create a situation in which we know with 100% certainty that something is true—for example, we may know that the null hypothesis is true because we sample some random numbers from two populations with identical means and variances.

Suppose we do precisely that (with 15 observations per group, let’s say) and then conduct a t-test. What’s the expected probability of us finding a significant difference between our two samples? Exactly, it’s .05 (assuming we set our alpha level to .05).

Now suppose we repeat that process 1,000 times and count the actual number of times the t-test is significant. We would expect to count around 50 such events, each of which represents a dreaded Type I error, give or take a few because the process is random.

Enter the important question: what happens if we violate the assumptions and we sample from, say, two uniform distributions instead of the normal distributions as required by the t-test? What if we introduce inequality between the variances?
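
Boneau’s Monte Carlo logic is easy to reproduce today. The sketch below is ours, not Boneau’s: it uses plain Python with a hard-coded two-tailed critical value of 2.048 (α = .05, df = 28) so that no statistics library is needed, and counts how often a pooled-variance t-test wrongly rejects a true null hypothesis, sampling from either normal or uniform populations:

```python
import random
import statistics

T_CRIT = 2.048  # two-tailed critical value for alpha = .05, df = 28

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic (equal group sizes)."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    se = (2 * pooled_var / n) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se

def type_i_rate(sampler, reps=1000, n=15):
    """Proportion of simulated null experiments wrongly declared significant."""
    hits = 0
    for _ in range(reps):
        x = [sampler() for _ in range(n)]
        y = [sampler() for _ in range(n)]  # same population: null is true
        if abs(two_sample_t(x, y)) > T_CRIT:
            hits += 1
    return hits / reps

random.seed(1)
print(type_i_rate(lambda: random.gauss(0, 1)))     # normal data: near .05
print(type_i_rate(lambda: random.uniform(-1, 1)))  # uniform data: still near .05
```

Consistent with Boneau’s conclusion, the Type I error rate stays close to the nominal .05 even when the normality assumption is violated (here, with equal variances and equal sample sizes).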

Boneau explored several of those potential violations. Here is what he had to say:

“it is demonstrated that the probability values for both t, and by generalization for F are scarcely influenced when the data do not meet the required assumptions. One exception to this conclusion is the situation where there exists unequal variances and unequal sample sizes. In this case the probability values will be quite different from the nominal values.”

That’s fairly good news because we do not have to lie awake at night wondering whether our data really are normally distributed. But there is that rather large fly in the ointment, namely that our presumed Type I error level is not what we think it is when we have unequal sample sizes (often unavoidable) and the variances between our two groups are different (also often unavoidable).

What then?

This brings us to the second question, can we avoid those assumptions altogether? Could we perform comparisons between conditions without having to worry about, well, anything really?

A recent article in the Psychonomic Society’s journal Behavior Research Methods addressed this question and introduced a new method for statistical comparisons that does not make any assumptions about how the data are to be modeled. Researchers Bommae Kim and Timo von Oertzen based their technique on an algorithm developed by artificial-intelligence researchers known as a Support Vector Machine (SVM).

In a nutshell, an SVM learns from examples how to assign labels to objects. The range of applications of SVMs is incredibly broad: SVMs can learn to detect fraudulent credit card transactions by examining thousands of transactions for which it has already been established whether they are fraudulent. SVMs can learn to recognize handwriting by training on a large collection of images of handwritten digits or letters. And the list goes on.

How does an SVM do this?

The figure below, taken from an excellent primer on SVMs, shows the simplest possible example. The data are taken from genetics in this instance, but the same principle applies to any other data set consisting of two groups (in this case the green vs. red dots that are separated along two dimensions of measurement).

The panel on the left shows the data, including one observation whose group membership is unknown (the blue dot). The panel on the right shows the “hyperplane” (in this case a line) that the SVM learns to arrange in a way that optimally differentiates between the two clusters. The unknown observation is now clearly identified as belonging to the red cluster.

Unlike a t-test, the SVM does not make any assumptions about the nature of the data: it simply seeks to differentiate between two clusters as best it can. If the SVM can assign group membership more accurately than expected by chance, then it has successfully learned the difference between the two groups. Crucially, it can only do so if there is a discernible difference between the two groups. In the above figure, if the red and green dots were randomly intermixed, this difference could not be learned and classification of unknown test cases (i.e., the blue dot) would be at chance. (In reality, an SVM does a lot more than drop a line in between two clusters; this tutorial provides a good introduction.)

So here, then, is the SVM equivalent of a t-test: two groups differ on one (or more) measure(s) if the machine can learn to assign unknown cases with above-chance accuracy. The unknown cases are simply those that the SVM is not trained on: this simply means we leave out some subset of the observations during training and then seek to predict the group membership of those “unknown” items after training. To maximize power, each observation can take a turn across multiple applications of the SVM to play the role of a single “unknown” observation.
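
The leave-one-out logic can be illustrated without any machine-learning library. The sketch below is only a stand-in for Kim and von Oertzen’s method: it swaps the SVM for a much simpler nearest-centroid rule, purely to show how classification accuracy can serve as a group-comparison test. Each observation is held out in turn and classified by whichever group’s centroid (computed from the remaining observations) it is closer to:

```python
import random

def loo_accuracy(data, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a simple stand-in for an SVM). Chance level is .5 for two groups."""
    hits = 0
    for i in range(len(data)):
        centroids = {}
        for lab in set(labels):
            rest = [x for j, x in enumerate(data) if j != i and labels[j] == lab]
            centroids[lab] = sum(rest) / len(rest)
        # classify the held-out observation by the nearer centroid
        guess = min(centroids, key=lambda lab: abs(data[i] - centroids[lab]))
        hits += guess == labels[i]
    return hits / len(data)

random.seed(2)
group_a = [random.gauss(0.0, 1.0) for _ in range(30)]
group_b = [random.gauss(1.5, 1.0) for _ in range(30)]  # shifted mean
acc = loo_accuracy(group_a + group_b, ["a"] * 30 + ["b"] * 30)
print(acc)  # well above the .5 chance level, so the groups differ
```

If the two groups came from the same population, accuracy would hover around chance; the further it climbs above .5, the stronger the evidence for a group difference.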

Kim and von Oertzen reported multiple Monte Carlo experiments to demonstrate the utility of the SVM as a statistical analysis tool.

The simplest experiment is sketched in the figure below. Each panel contains two distributions, assumed to represent two different groups in an experiment. The two distributions differ either in terms of means only (panel a), or only variances (b), or shape (c), or all of the above (d).

The next figure shows the results of this experiment. All cell entries refer to the proportion of times that the tests yielded a significant difference between the two groups. The top row (Condition 1) refers to the situation in which the null hypothesis was perfectly true and the groups differed neither in mean (M), nor in variance (SD), nor in shape. The entries for that row therefore reflect Type I errors, and it can be seen that both the t-test and the SVM were close to the expected .05 level.

Now consider the remaining rows of the table. Although the t-test was more powerful than the SVM when groups differed only in mean (.31 vs. .13) or in mean and variance (.27 vs. .15), the SVM outperformed the t-test in all other situations, in particular those involving a difference in shape between conditions.

Across a number of further conditions, including an experiment involving multivariate measures, Kim and von Oertzen observed that

“the SVMs showed the most consistent performance across conditions. Moreover, SVMs’ power improved when group differences came from multiple sources or when data contained multiple variables.”

The SVM is therefore particularly useful if the research question is to find any kind of differences between groups, whereas conventional methods are more useful if the focus is on specific differences between groups. Kim and von Oertzen argue that the search for any differences can be crucial in clinical applications or program evaluations, for example to ascertain that control and treatment groups do not differ on multiple measures after randomization.

Psychonomics article highlighted in this blogpost:

Kim, B., & von Oertzen, T. (2017). Classifiers as a model-free group comparison test. Behavior Research Methods. DOI: 10.3758/s13428-017-0880-z.

When a flash a memory makes: Memorability of pictures in an RSVP task Wed, 31 Jan 2018 17:43:35 +0000

What is it we remember, and why? Research in cognitive psychology has provided a broad and often very reliable sketch of the variables that determine memory performance. For example, recall of words is better when word repetitions are spaced rather than massed. To learn the Lithuanian word for cookie, you are better off spreading apart repetitions of sausainis rather than crowding them together (“sausainis – sausainis – sausainis – …”). The spacing effect is sufficiently strong for it to be a main component of a technique that doubles the learning rate for the acquisition of Lithuanian from 15 words an hour to 30 and that we blogged about here.

We also know that memory recall is better for items presented at the start or end of a list than for items in the middle. Few people order a beer from the middle of the list that a waitress recites (“Miller, Heineken, Bud, Bud Light, Beck’s, …”) when you ask her what’s on tap. This serial-position effect, like the spacing effect, is so strong that it replicates pretty much anywhere—from the classroom to the lab or the pub.

Although these variables are powerful, they predict memory performance at the level of stimulus ensembles rather than individual items. Words buried in the middle of a list are recalled worse than those at the ends, irrespective of which items are assigned to the positions. In fact, experiments usually randomize those items and collect data from many trials precisely because we are not interested in particular items but broad principles of memory.

There is, however, another question we can ask about memory: which particular items are remembered better than others? Is sausainis a more memorable Lithuanian word than, say, palepstis? Can we predict the memorability of specific items from their attributes?

Those questions have been tackled in a line of recent research that examined and compared the memorability of more than 2,000 pictures. The figure below shows a sample of those pictures, taken from a study by Phillip Isola and colleagues, and separates them by their memorability.

A crucial aspect of studies into the memorability of items is that noise and random variability must be differentiated from systematic idiosyncratic differences. After all, any set of stimuli will produce varied responses across items, so how do we know what is noise and what are stable properties of individual stimuli?

In this instance, we can be fairly confident that the differences between the memorable items on the left of the figure, and the forgettable items on the right, result from more than just random variability: the difference is consistent across observers and across retention intervals ranging from 36 seconds to 40 minutes. Moreover, memorability can be predicted on the basis of high-level properties of the scenes. Scenes are more memorable if they include people, interiors, foregrounds, and human-scale objects. In contrast, exteriors, wide-angle vistas, backgrounds, and natural scenes tend to be less well remembered. Intriguingly, low-level properties such as hue, saturation, or luminance, or the number of objects in a picture play no role.

A recent article in the Psychonomic Bulletin & Review took this research one step further by asking what role differences in perceptibility play in determining memorability. Are pictures remembered better when they are easily and readily perceived?

To shed light on this issue, researchers Nico Broers, Mary Potter, and Mark Nieuwenstein presented pictures of known but varied memorability to participants in a rapid serial visual presentation (RSVP) task, in which stimuli are flashed up briefly in rapid succession. The presentation duration was varied from only 13 ms to 360 ms—so from 1/75th of a second to about a third of a second. 1/75th of a second is rapid indeed.

The sequence of events in the experiment by Broers and colleagues is shown in the figure below:

People saw 6 pictures followed by a single item for which they had to decide whether it was among the original 6 (respond yes) or a new item (no). The critical memory item always appeared in position 2, 3, 4, or 5, and it was either of high or low long-term memorability as determined by earlier research.

The results are shown in the figure below for two experiments that differed only in the range of presentation durations. Performance is represented by d′, a signal-detection statistic that measures sensitivity independently of response bias.
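
For readers unfamiliar with d′: it is obtained by converting the hit rate and the false-alarm rate into z scores and taking the difference. The sketch below uses made-up rates for illustration, not data from the paper (Python 3.8+, which added `NormalDist`):

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """d' = z(hit rate) - z(false-alarm rate): sensitivity free of response bias."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Illustrative rates: 85% hits, 20% false alarms
print(d_prime(0.85, 0.20))
```

A participant who says “yes” to everything has a high hit rate but also a high false-alarm rate, so the two z scores cancel and d′ stays near zero; only genuine sensitivity pushes d′ upward.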

In both experiments, performance increased with presentation duration. This result is unsurprising because a brief flash (1/75th of a second) is barely sufficient to recognize a scene let alone remember it. The more interesting result is that highly memorable pictures were remembered better in the RSVP task than less memorable pictures at any presentation duration. Moreover, for those memorable pictures, performance increased over duration sooner and at a steeper rate (at least in Experiment 1) than for the less memorable pictures.

At first glance, one might be tempted to dismiss this result as either circular or unsurprising. After all, if pictures are memorable, why is it surprising that they are remembered better?

What is surprising about the results is that they link memorability to the information that can be extracted in a single sweep of a scene, in 1/75th of a second. This is before any top-down feedback (e.g., direct eye movements to a critical object) can kick in that might enhance processing and extract more meaning from the picture.

As Broers and colleagues put it: “In short, the present results suggest that there is a strong link between the speed of understanding a picture and the likelihood of remembering it. What you are more likely to remember, you may also be more likely to see.”

This result is particularly intriguing in light of the findings from earlier research, mentioned above, that low-level features of a picture, such as hue and saturation or luminance, are not related to long-term memorability. So whatever happens in the first brief sweep of an image is already going beyond the surface-feature level and is extracting meaning and structure from the image—provided that structure involves people, interiors, foregrounds, and human-scale objects.

Article focused on in this post:

Broers, N., Potter, M. C., & Nieuwenstein, M. R. (2017). Enhanced recognition of memorable pictures in ultra-fast RSVP. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-017-1295-7.

Nine bets on replicability that you will win everywhere Wed, 24 Jan 2018 20:13:51 +0000

If you had to bet on a psychological effect replicating, what effect would you bet on?

Though it seems like an unlikely bet to be asked to make, it’s a reality for anyone conducting a psychology class demo. You want an effect that holds up no matter who your students are. You don’t want an effect that disappears at certain times of the day, or in certain classrooms. Since you’re not sure what classes students have taken before, what demos they’ve seen, and what experiments they’ve been in, you want an effect that replicates even for non-naïve participants (i.e., participants who have done the experimental task before).

When choosing a demo for a psychology research methods class, the advice I got from a seasoned lecturer was to always bet on the classic Deese–Roediger–McDermott false memory task. This task was a failsafe workhorse, and its intended effect would come through no matter how uninterested the students seemed, no matter what time the class was held, no matter what the classroom was like.

In the class demo version of this task, students listened to lists of words, and then recalled as many words as they could from each list. Some of the lists were semantically related. For example, one list went something like this: sour, candy, sugar, bitter, good, taste, soda, chocolate, honey, etc. Even though they were told to only write down words they were reasonably sure they had heard, a good proportion of the students recalled hearing the word “sweet,” though it was not in the list.

The false memory effect – remembering a word that wasn’t encountered before but that was semantically related to all the words in a list—held. Bet won. Psychology instructor: 1, failure to replicate: 0.

What effects replicate, and why?

The reproducibility crisis in science, and especially in psychology, has garnered a lot of attention and thought. A large replication effort of 100 studies found that fewer than half of cognitive and social psychology findings could be replicated.

But the picture was not as bleak across the entire field. Certain effects seemed to replicate better than others. Cognitive psychology effects fared better (50% replicated) than social psychology effects (25% replicated).

Is it the case that cognitive psychology effects, especially ones that rely on within-participant comparisons, are more robust and more likely to hold up under a variety of conditions?

In a recent article in the Psychonomic Bulletin & Review, Rolf Zwaan and colleagues tested nine widely used tasks from three subfields of cognitive psychology to examine whether the tasks' effects held up under conditions that might be expected to decrease the likelihood of reproducibility – namely, online testing and repeated completion of the same task.

The researchers selected three tasks each from three domains in cognitive psychology — perception/action, memory, and language. The tasks chosen were ones thought to be robust, the workhorses of the field:

  • (1) Perception/action: Simon task. Key effect: responses are faster when a target is spatially compatible with a response (target and response are both on the left) than when they are incompatible (the target is on the left and the response is on the right).
  • (2) Perception/action: Flanker task. Key effect: Responses are faster when distractors flanking a central target are compatible (AAAAA) than when they are incompatible (AAEAA).
  • (3) Perception/action: Motor priming. Key effect: Responses to stimuli (<<) are faster when primed by compatible items (<<) than incompatible items (>>).
  • (4) Memory: Spacing effect. Key effect: Recall of words is better when word repetitions are spaced than massed.
  • (5) Memory: False memories (described above). Key effect: Words that are semantically related to words in a list are falsely recognized as presented before.
  • (6) Memory: Serial position. Key effect: Memory recall is better for items presented at the start or end of a list than for items in the middle.
  • (7) Language: Associative priming. Key effect: Responses to a target are faster when the target is preceded by a related prime than when preceded by an unrelated prime.
  • (8) Language: Repetition priming. Key effect: Responses to an item are faster when the item is repeated than when the item is new.
  • (9) Language: Shape simulation. Key effect: Responses to a picture are faster when the picture matches the shape implied in the sentence preceding the picture than when it does not match.
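Each of these key effects reduces to a difference between condition means. As a hedged illustration (not code from the article), a compatibility effect such as the Simon effect can be computed from trial-level reaction times like this:

```python
# Illustrative sketch (invented data): a Simon-style compatibility
# effect is the mean incompatible RT minus the mean compatible RT.
from statistics import mean

def compatibility_effect(trials):
    """trials: list of (condition, rt_ms) tuples, where condition
    is 'compatible' or 'incompatible'. Returns the effect in ms."""
    compat = [rt for cond, rt in trials if cond == "compatible"]
    incompat = [rt for cond, rt in trials if cond == "incompatible"]
    # A positive value means slower responses on incompatible trials,
    # i.e., the expected key effect.
    return mean(incompat) - mean(compat)

trials = [
    ("compatible", 420), ("incompatible", 455),
    ("compatible", 430), ("incompatible", 465),
    ("compatible", 410), ("incompatible", 450),
]
print(compatibility_effect(trials))
```

The flanker, priming, and shape-simulation effects follow the same logic with different condition labels.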

Participants were recruited online and completed each task twice. Some participants completed the task twice with the same materials; others completed the task with different materials each time.

The researchers examined effect sizes to see whether the effect differed across instances of completing the tasks, and whether the use of the same materials mattered.

The effect sizes turned out to be remarkably stable across repetitions, both when the same materials were used and when different materials were used. This is shown in the figure below, which plots effect sizes for tasks completed by participants the first time (Wave 1) vs. the second time (Wave 2). Each number corresponds to a task from the list above. Tasks completed with the same materials both times are plotted in blue, tasks completed with different materials are plotted in red.
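Concretely, the stability check amounts to computing a standardized effect size per task in each wave and comparing the two values. Here is a minimal sketch of that logic (the per-participant differences are invented; this is not the authors' analysis code):

```python
# Illustrative sketch (hypothetical data, not the authors' code):
# a standardized within-subject effect size per wave.
from statistics import mean, stdev

def cohens_d(diffs):
    """Standardized effect size for paired data: the mean of
    per-participant condition differences divided by their SD."""
    return mean(diffs) / stdev(diffs)

# Hypothetical per-participant RT differences (incompatible - compatible)
wave1 = [30, 45, 25, 38, 42, 28, 35, 40]
wave2 = [32, 41, 27, 36, 44, 30, 33, 39]

d1, d2 = cohens_d(wave1), cohens_d(wave2)
# Stability means d1 and d2 are close, as in the article's figure.
print(round(d1, 2), round(d2, 2))
```

Plotting each task's Wave 1 effect size against its Wave 2 effect size, as the figure does, makes the stability visible as points hugging the diagonal.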

Reliability vs. sensitivity: What are we trying to replicate?

Why were these tasks' effects so stable? The authors argue that these experimental tasks are so constraining that they shield behavior from outside influences such as the testing environment (participants were tested in a variety of environments), task repetition, and the specifics of the task materials.

Whether this is desirable or not depends on what research questions experimental tasks are intended to be used for.

As brought up in a recent Psychonomics featured content post, there are actually different kinds of reliability that researchers may be seeking to maximize with an experimental task.

There is the reliability of observing an effect using a task across individuals and environments, as in the nine tasks described above.

Another desirable feature of a task may be to generate reliable differences in an effect between individuals. In this case, individual participants could be reliably distinguished by their task performance, even across repetitions of the task or across environments.

A final desirable feature may be the ability to reliably identify differences in context or environment using task performance. This is something I care a lot about in my research on the effects of indoor environments on people. For example, to test whether an environment, like an office, helps people sustain attention, I’d be looking for an attention measure that responds reliably to changes in environmental conditions.

Different kinds of reliability and sensitivity are desirable for different research questions, and there is no experimental task that can do it all. With the recent thoughtful discussion on how to move forward from the replicability crisis, considering what we seek to replicate and why will help the field grow.

Psychonomics article featured in this post:

Zwaan, R. A., Pecher, D., Paolacci, G., Bouwmeester, S., Verkoeijen, P., Dijkstra, K., & Zeelenberg, R. (2017). Participant nonnaiveté and the reproducibility of cognitive psychology. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-017-1348-y.

NIH, clinical trials, and the Psychonomic Society: A comment from the chair of the governing board Mon, 22 Jan 2018 18:21:42 +0000

Although I typically do not submit grants to NIH, I recently was perusing their Funding Opportunity Announcements (FOAs) to find out if any would align with my educationally relevant research.  The great news is that I found some promising announcements, but the “Clinical Trial Not Allowed” warning made me flash back to NIH’s recent decision to begin implementing a definition of clinical trials that was established in 2014.

Several concerns and questions immediately came to mind:  Would my particular question and methodological approach count as a clinical trial?  If so, how could I change my approach so that it was not a clinical trial?  And, if my research were considered a clinical trial and I could submit, would it be sent to a review panel that understood my science or instead would it go to a special committee that reviews proposals with clinical trials?  No doubt if you have been thinking about applying to NIH, you too may need answers to these and other questions.

Fortunately, Jeremy Wolfe (past chair of the Psychonomic Society and current editor of CRPI) has done our field a great service by interviewing Dr. Mike Lauer (Deputy Director for Extramural Research at NIH) in search of answers to pressing concerns about NIH clinical trial policies.

I encourage you to read the interview for important details, but I'd also like to point out a few highlights. First, the policies have been in place since 2014; what has changed is that NIH is now attempting to implement them. In the simplest terms, NIH wants to make sure that applicable clinical trials are subject to the oversight and penalties of the FDA system, which presumably will increase the chances that funded clinical trials are conducted and reported in a timely fashion. Such transparency is what Psychonomics members desire, so we should all be on board with those goals.

Second, now that NIH is implementing the policies, many have taken a closer look, and it is not obvious what exactly counts as a clinical trial, especially given that the detailed case studies appear to have broadened the scope of what counts. Jeremy's focused questions did help Lauer clear up some confusion, for instance by noting "that most of basic behavioral and brain science clinical trials funded by NIH are likely to be mechanistic trials," and hence not subject to FDA oversight. As chair of the governing board of the Psychonomic Society, I'm somewhat embarrassed to admit that I'm still a bit confused about whether my research would be considered a clinical trial. Such confusion can be costly, because grants that include clinical trials must be submitted to a different FOA. So, what should you do if there is any doubt? Dr. Lauer suggests several avenues, such as consulting the case studies NIH has compiled or its list of FAQs (for links, please refer to the complete interview).

When in doubt, however, I think a key take-home message from the interview (and one that often holds about any questions concerning a grant) is to contact a program officer for guidance.

It is too early to tell how much NIH’s implementation of the clinical trial policies will impact our research and funding.  The Wolfe-Lauer interview did assuage some of my own concerns by clarifying NIH’s agenda for implementing the policies. No doubt we will seek further clarification, and we will continue to share what we have learned. Finally, many thanks to Jeremy Wolfe for his continued efforts to clarify and inform Psychonomics members on these issues: His posts here, here, and here are essential reading for anyone concerned about this issue.

I certainly feel more confident about how to proceed, but I also plan on contacting my program officer before moving forward with the next NIH grant.

Play – the good, the bad, and the cornerstone of life Fri, 19 Jan 2018 07:11:34 +0000

Rolling down a hill in a park in Ottawa, Canada, with my 12-year-old son and 9-year-old daughter.

Climbing in trees to play Barbie vs. GI Joe with my brother when we were 10 and 8.

Pretending the couch was a small raft in a dangerous river of lava, which required jumping from the "raft" to the "island in the lava river" (i.e., the chair) to avoid the lava spewing up, while also pushing my 5-year-old brother off the "raft" so that I could "save" him from the lava.

As we have seen all this week, whether an adult or a kid, a bird or a rat, a turtle or a dolphin, play emerges in many contexts and forms.

Some people prefer to play with words, making up jokes or puns.

Some people like games – card games, board games, ball games, computer games, pretend and role-playing games, puzzles.

Some people like to test their physical limits – rough-housing, playing king of the hill or ring-around-the-rosy, jumping off things, flying high on a swing, sky-diving.

And some people like to create things – writing, drawing, building things, carving, concocting “potions”, cooking, baking, painting, welding, gardening.

The Good: Play is Easy to Identify

Play appears to be ubiquitous in humans of all ages, possibly emerging as early as 4 to 6 months, when infants begin testing their surroundings by making things rattle, squeak, and bang. This was first identified by Jean Piaget and later examined by Philippe Rochat (1989). Free play increases as children age, as Belsky and Most reported in 1981, with pretend play and imaginary friends developing as children become more self-aware and begin to master language and reading behavioral cues (discussed in three articles by Artin Göncü, Tracy R. Gleason, and Lili Ma published in Learning and Behavior's special issue on play).

Children often re-kindle adult interest in play as parents fight off the monsters that live under their children's beds each night and travel to distant places on dragons and unicorns while riding around on "horses" better known as brooms.

Animals also remind us of the freedom that play provides to just let loose. Whether it is ravens soaring on air currents, killer whales surfing waves, or juvenile vervet monkeys performing acrobatic moves during a play fight (figure below), play is universal across species.

As described by Gordon Burghardt in his seminal book "The Genesis of Animal Play" (2005), which he echoed in yesterday's post, and as recently followed up by Vladimir Dinets (2015), even reptiles manipulate toys that make noise and play tug-of-war with hoses. Turtles chase balls; crocodiles hold them:

In their paper for the special issue, Serge Pellis and Vivian Pellis describe play-fighting in red-river hogs (image below) and warty pigs, which consists of aggressive pushing and shoving during head-to-head competitions until one of the contestants “loses”.  This play fight is signified by the use of a “submission” signal between the two contestants, which terminates the “contest”.  This signal does not occur during true fights.

A “Playful” Workshop

As described by Alex De Voogt in his introductory post, 16 experts in play from around the world attended the meeting that ultimately gave rise to the special issue and this digital event. The goal of the workshop was to define play in a way that could increase cohesion across the diverse fields represented. For three days, we heard from developmentalists studying pretense and imaginary friends in human children and the importance of play for children with chronic medical issues, as well as from cognitive psychologists and anthropologists studying the influence of various games on cognitive abilities and socialization processes. Cindy Clark's post earlier this week illustrates these socialization processes. We also heard from experts in animal behavior about the forms and potential functions of play in a variety of species, including reptiles, birds, horses, marine mammals, pigs, cows, and various rodents.

And each day, after intense discussions and a number of presentations, we enjoyed the hospitality of Brookfield Zoo where we had several opportunities to “play” ourselves: we interacted with a number of birds, watched children enjoy the beautiful weather and open grounds of the zoo, and of course watched plenty of animals play.

The Bad: An Impossible Dream?

After all of the talks were given, we were tasked with the difficult question of how to define, unify, and plan for the future of play.  What are the functions of play? How did play evolve across a myriad of taxa and contexts? Can and should play be used as an indicator of an individual’s current welfare state?

As part of the evolutionary question, are play signals present in all species?

Dogs bow to one another, primates display play faces, humans smile.  What about birds, pandas, and marine mammals? What is the function of these stereotyped displays?  Elisabetta Palagi and Chiara Scopa believe these displays may have evolved to facilitate social communication through mimicry, a topic they discuss in their special-issue article.

As both Marek Špinka and Gordon Burghardt described in their posts, some animal models (e.g., rodents) have progressed to mapping the neurophysiology of play, and researchers have tested the effects of social play, and of its deprivation, on immediate and future behavioral and neurophysiological outcomes.

Unfortunately, three days were insufficient to answer all of these questions. But, as we agreed at the end of the workshop, it is critical for scientists studying play to open their minds and begin to address these difficult questions.  The posts this week have extended our initial efforts from 2016, with Špinka discussing the idea that perhaps a “play engine” powers the varied components of play across myriad species, and Burghardt picking up where we left off with the need to examine play from all sides and disciplines as we attempt to integrate and expand our knowledge of play into a cohesive theory.

Crossing the Line

Burghardt discusses the need for a comparative approach to the study of play in his post. As comparative psychologists, we straddle both sides – humans and non-human animals. Fascinated by the use of play as an indicator of animal welfare, and by the current lack of evidence that cognitively challenging games, such as chess, have any clear-cut cognitive benefits, I left the workshop with mixed feelings: excitement and trepidation for the monumental task at hand. Burghardt expressed these very thoughts in his post, and they were also echoed by Špinka.

There is still so much to learn about the nature of play on each side before we can delve deeper into functions and more proximate mechanisms. First and foremost, we need a common terminology, as expressed by De Voogt in the opening post and by Lance Miller in his article in the special issue.

This week, we have read a number of posts about play in humans, from imaginary friends to elves on shelves to the "social glue" of many cultures. We have read a perspective about play in animals and the current state of the field. Written by experts in the field of play, each perspective, whether by an anthropologist, a comparative psychologist, or an animal ethicist, reinforces that we all have the same goal – to better understand play and its consequences.

Play may be easy to identify, but it is definitely not easy to quantify or manipulate experimentally.

We argue that children need unstructured outdoor play (i.e., recess) to enhance their overall development (e.g., greater cognitive success, better social skills, improved physical health). Yet we have not demonstrated the mechanisms that support these positive outcomes. These concerns were described by Špinka in his post.

Pursuing the origins, characteristics, and mechanisms of play will give us better opportunities to evaluate its functions: we must first uncover the characteristics of play and the neurological mechanisms underlying it before we can address its possible functions. Studying non-human animals from a comparative perspective provides opportunities to evaluate universals and species-specific characteristics. Elisabetta Palagi and her colleagues have studied play in a number of primates using this comparative approach, with some very intriguing results, especially with regard to play fighting and play faces. Palagi and colleagues share their perspective in their article on mimicry in play.

The Cornerstone of Life

To unravel the remarkable secrets of play, it is clear that we still have many more years of effort ahead of us. Knowing that rats laugh when tickled, as Jaak Panksepp and his colleagues discovered, that some primates "grin" during play to minimize conflict, and that depriving young rats of play-fighting opportunities limits their future social behavior are all steps toward understanding the importance of play in animals, human and non-human. The posts this past week have emphasized both the importance and the mystery of play.

Despite the current state of organized chaos, one universal does exist: Watching others play is as delightful as play itself.

As my former mentor and one of the imagineers for this workshop, the late Dr. Stan Kuczaj, always said: “Life is short. Ride a roller coaster. Eat a hot dog. And most importantly play!”

Merging multiple shades of play in multiple ways Thu, 18 Jan 2018 10:31:35 +0000

The special issue on the evolution and psychological significance of play in Learning and Behavior covers multiple topics, species, and ages and is most welcome. I hope the issue and thoughtful papers receive the attention that they deserve. With the great influx of research interest in play over the last 20 years, some of the isolation is breaking down between those focused on play in animals, the neuroscience of play, child development, sports, artistic endeavors, folklore, and applications to education, animal welfare, psychological therapy, and so on.

In this issue are 11 papers that, while not addressing all the critical issues in play, do tackle important topics and move the field forward. These papers include research program overviews, conceptual issues, and empirical studies including ones on dogs, cetaceans, children, and indigenous populations.

A true feast.

Rather than primarily react to a specific paper or topic, or comment on each one individually (virtually impossible in a short essay), I will briefly reflect on the issues and approaches on display in the papers in the issue. I will not cover each paper directly, but use them as grist for comments that go beyond the points made by the editors, De Voogt and Miller, in their editorial and opening post. The editors hope for cross-fertilization among researchers on diverse topics and species, while also acknowledging a point made by several of us years ago: play is a heterogeneous phenomenon reflecting different causal mechanisms, developmental processes, and evolutionary histories.

In my 2005 book, I tried to address the heterogeneous nature of play and its functions by viewing play phenomena as primary, secondary, or tertiary. Play may thus have no adaptive function (either immediate or delayed), aid in maintaining behavioral competencies, or have benefits of many sorts. Play may also be costly, cruel, and even maladaptive (e.g., gambling, or risky play such as running class 6 whitewater rapids). Thus, perhaps in contrast to Marek Špinka's preceding post in this digital event, I argue that the search for the function of play is doomed to failure. The search should be, if that is the researcher's aim, for the role that play plays in specific aspects, or behavior systems, of the human or nonhuman animal being studied.

It is also important to study whether any of these putative benefits generalize to other behavioral contexts and not just assume generalization. The article on chess in this issue (by Sala and Gobet) is a fine example of this type of approach. The many studies on play ‘fighting’ in rodents also reflect this interest.

Defining and characterizing play was the focus of several papers in this issue (Miller, Göncü). When I started to seriously study play from a comparative perspective, it was clear to me that existing definitions had serious deficiencies. Play was largely thought to be a feature of mammals and perhaps some birds. My goal was to see if we could identify behavior as play in species and contexts where it had not been seriously considered, as in reptiles, fish, and insects, by removing the anthropomorphic aspects.

This is difficult for many observers who think that play must show obvious signs to them of joy, pleasure, or having fun. I am writing this while attending a small conference of leading animal behavior researchers covering a wide gamut of species and topics. Yesterday I gave a talk on play and ritual where I showed this video clip of a cat and turtle playing tag around a pole:

Immediately someone raised the objection that it was certainly play for the cat but not the turtle, because one could see that the cat was enjoying itself but we could not know this for the turtle. This comment shows the resistance to expanding the evolutionary and comparative reach of play.

I think the five criteria I derived from the literature that jointly need to be satisfied have done this successfully within the play research community if not outside it. Although the criteria have been slightly refined since 2005, given that play is such a mixed class of phenomena it is to be expected that overarching criteria cannot be overly precise. We need now to have more detailed characterizations of specific types of play such as object play, social play, pretense, and other types sensitive to the contexts and species involved. Several papers in the issue are doing this regarding shared intentionality (Heesen), strategic games (Voogt), imaginary play (Göncü, Gleason), and pretense (Ma).

Tinbergen’s classic 1963 paper on the four aims of ethology, namely to understand the causal mechanisms, ontogeny, evolution, and adaptive function of behavior, was a most useful advance in clarifying the strategies and tactics of behavior research (see the article in the special issue by Palagi). Yet in my mind it was incomplete in that it ignored the psychological and private experiences of the organism. Although influenced by von Uexküll’s perceptual and effector components of behavior, Tinbergen ignored von Uexküll’s focus on the ‘inner world’ of the animal. Indeed, the irony here is that Tinbergen in that 1963 paper rejected the study of play as it was too subjective! Given that a focus of this issue of Learning & Behavior is on the psychological significance of play, the message I tried to deliver in 2005 on the importance of adding a fifth aim, that of private experience, to Tinbergen’s four seems especially salient.

Often those studying play in humans focus on pretend, imaginative, sociodramatic, and educational play, whereas those studying animal play focus more on overt behavior in the three basic categories of locomotor/rotational play, object/predatory play, and social play (typically wrestling, chasing, etc.). This is shown in this issue in papers discussing play in rodents (Pellis), dogs (Mehrkam), cetaceans (Hill), and nonhuman primates (Palagi). The progress being made in understanding rodent, especially rat, social play is far ahead of the pack, but the studies on breed differences in dogs and on relationships across play types addressed in this issue reflect the growing importance of canine studies in the field of comparative cognition in general.

While the literature cited in the human and nonhuman focused papers in this issue overlap to some extent, the disconnect between them is still too great. Human-play researchers often focus on what I generically call mental play, while nonhuman play researchers focus largely on active behavioral play. While ‘mental’ play is a special challenge for those studying nonhuman animals, active physical behavioral play is not so difficult to study with humans. Certainly, there is research on behavioral play in humans, but too often it seems focused on what it tells us about mental capacities rather than its role and function qua behavior, and especially its role on functioning in social groups.

The paper on shared attention (Heesen) is an admirable bridge across these literatures. The paper on a Piagetian analysis of behavioral development in whales and dolphins (Hill) also is one of the most thorough and insightful such efforts and shows what students of nonhuman animals can gain from the child development literature. The paper on mimicry (Palagi) also effectively shows the value and importance of comparative papers that include the human animal to derive important insights that bridge the anthropocentric divide. Indeed, play in all its guises may be one of the best conduits in developing a genuine integrative comparative psychology that includes the human animal.

One important means of aiding this integration is relating specific research papers to some of the more important integrative theoretical books on play. I also think that some scholars are unduly neglected. Only one article cited Tom Henricks, but not his recent book (Henricks, 2015). One article cited Robert Mitchell's work on pretend play in humans and animals, but his work on defining play as being focused on intentionality was not cited. Two papers cited Brian Sutton-Smith, perhaps the most important play scholar since Huizinga (who was cited by one article). Henricks (2015) and Sutton-Smith (1997) provide perspectives that reach into the sociological, historical, anthropological, and other areas that will eventually need to be incorporated into a truly integrative conception of play. Such integration will best be done by accumulating careful, detailed studies of specific and diverse play phenomena, and this special issue is a major advance toward reaching this 'impossible dream.'

The Engine of Play Wed, 17 Jan 2018 09:20:00 +0000

Play is rich and fascinating; it is also strange and puzzling. It plays all kinds of tricks on seriously-minded thinkers and researchers. Play is easy to recognize in children below one year of age, yet professors at the zenith of their play-research careers struggle to work out a simple and useful definition of it. It leaves psychologists, anthropologists, and philosophers bewildered by the labyrinth of its forms among humans of all ages and cultures, luring them into digging deep into one aspect while so many others evade them. And biologists fare no better. Here is their paradox: evolutionary biologists, who by definition investigate the adaptive functions of behavior, define play behavior as "having no obvious function".

With all the diversity in form and variation in function, what keeps “play” connected? The puzzle is made even more challenging by the fact that the variety of play really expands in two realms that are only connected through a rather narrow isthmus. For animal ethologists, the richness of play resides in the plethora of forms across animal species, with us humans being just one case among thousands of others. And for human psychologists, the play variability extends across the variety of child, youth, adolescent and adult forms of gamboling, games and grotesques, reaching into sports, arts, rituals and politics. Of these play forms, there are just a few that we share with other species – rough-and-tumble, for instance, or juvenile object play.

One dimension of variability that is common both to mammalian play in general and to human play specifically is the continuum between solitary and social activities. In between purely solitary play (such as a lone weasel gamboling in fresh snow) and elaborated social play (such as the structured rough-and-tumble play bouts described in the article by Heesen and colleagues), there are various degrees of play (non-)coordination in groups of mammals. Thus, young calves or foals engage in non-contact yet parallel playful scampering; piglet play stretches from solitary runs-jumps-pivots through isolated and fleeting head-knock contacts to committed pair-wise shoveling duels.

Social play has attracted far more research interest than solitary play – understandably so, as we humans are an extremely social species. One particularly vivid research field focuses on highly structured and rule-based forms of play, and seeks to uncover in it evidence for high cognitive and communicative abilities, either in non-human animals or in very young children. Thus, Pellis and Pellis devote their article to a look at how the balance between competition and cooperation is achieved and maintained during play fighting bouts by various rodent and pig species. The various species employ different strategies to strike the competition-cooperation balance: restraint in executing the competitive tactics, not taking advantage after a competitive success, or honoring signals of submission. Notwithstanding these differences in how it is achieved, the competition-cooperation dynamics make it necessary for the animal to monitor continuously both their own actions and those of their partner.

Heesen and colleagues make the case that animal rough-and-tumble play requires a very high level of coordination because the play partners need to agree on opening, maintaining, and closing a play bout. In a brief comparative overview, the authors review evidence that, while cooperatively managing play bouts, various mammalian and bird species employ cognitive and communicative means (e.g., mutual responsiveness and role reversal) that belong to the building blocks of human “shared intentionality” as it was defined by Tomasello and Moll in 2010.

Thus, structured animal social play may be a window through which we can see how the specifically human “cognition for interaction” came into existence. Earlier, Bekoff suggested that social play requires the play partners to behave “fairly”, i.e., not to take too much advantage at the expense of the other – which could be one of the building blocks for the evolution of morality.

The articles in the special issue of Learning & Behavior dedicated to The Evolutionary and Psychological Significance of Play reflect a general trend in comparative play research: complex forms of social play get most research and theoretical attention. Researchers hope that the study of the highly elaborated play cases will advance our understanding of social cognition, communication, emotional coordination or even morality. This “top-down” approach (no derogation here) begins with some very complex play skills and then looks for their evolutionary and ontogenetic roots.

Let us here take, for a while, the opposite, bottom-up approach (as summarized, for example, by de Waal and Ferrari), starting with the simplest form of play – solitary locomotor-rotational play, such as a young monkey leaping on and bouncing off a substrate it cannot hold onto, or a calf kicking its hind feet sideways during a boisterous gambol, thus rotating its body asymmetrically along the spine. Describing limb flexions and torso rotations might seem a boring task. In reality, watching animals as they jump and scamper, frolic and gambol, slide and swing, roll and rotate is most amusing, and, at the scientific level, treating cognition as an embodied phenomenon is gaining currency.

The surprising fact is that there is a fascinating similarity between the space-time structure of locomotor play, the communicative dynamics of bodily social play, and the mental engagement in human cognitive and verbal games. In all these realms of playing, there is the same pattern of three crucial aspects. Firstly, the animal puts itself temporarily in an “as-if” play Umwelt that somehow mirrors the “serious” Umwelt but is free of the pragmatic evolutionary-fitness load. The second aspect is that play is performed as a series of repetitive actions limited to a small subsection in the vast multidimensional space offered by the as-if world. And finally, the repetitive stream of playing abounds in unpredictable variations that are invited through deliberate lack of control by the playing animal(s), as I noted with colleagues some time ago.

Here are some examples:

  • Locomotor play: a weasel in Europe emerges from a hole to find snow cover for the first time in her life. The unreal substance incites her to frolic wildly, jumping, pivoting and twisting her long body in all kinds of wriggling shapes. No dignity, no elegance, no efficiency, no care.
  • Social play: two young Norway rats play fight, attacking and counterattacking in a 3D mêlée that includes many probabilistic role reversals and a lot of excited 50-kHz squeals. A successful attack may result in the winner pinning the other to the ground with its forepaws. But once the play partner is pinned down, the on-top animal does a maneuver that gives up the hard-won control – it switches to standing on its supine partner with all four of its paws, thus making its position vulnerable to an easy counterattack.
  • And finally, cognitive play: U.S. parents gang up with a somewhat unruly and rather unpredictable breed of ‘elves-on-a-shelf’ to stage ongoing night-to-night series of pre-Christmas performances for their kids. Yesterday’s post by Cindy Clark explained this in detail.

Do you see the pattern? I think it is quite obvious, but that raises the next question – where does the pattern come from? How come play is so deeply similar across the various mammalian species and across the bodily, social and mental realms?

My guess (nothing more at the moment) is that what is working behind the scenes is a phylogenetically old mammalian capacity which I hereby call the ‘play engine’. This label deliberately refers to Stephen Levinson’s influential concept of the ‘interaction engine.’ The interaction engine, according to Levinson, is an ensemble of cognitive skills and motivational predispositions that enable us humans to get into, maintain, and get out of a devoted communication space.

Similarly, the putative play engine is an assemblage of the cognitive ability to enter and leave the as-if Umwelt, the motivational proclivity to visit this world with gusto, and the emotional capacity for fun that relishes the rollercoaster-like switches between losing and regaining control over what is happening, as shown in the work of Panksepp, me and colleagues, and Trezza and colleagues.

Interestingly, there is also an ontogenetic parallelism between the two engines: pre-verbal human infants master the interaction engine as effortlessly as very young mammals handle the as-if world of play. But then there is a difference: the interaction engine was seemingly assembled into a saleable product just recently in the Homo sapiens lineage. In contrast, the play engine is an old mammalian invention that had originally been utilized for bodily gamboling in 3D space, before being drawn into the even richer hyperspace of social tumbling and finally exploding in the meta-space of human culture and language.

The capacity to enter, roam and leave the playful mode of existence is widespread, if not universal, among mammalian species. At the same time, the as-if Umwelts of various species are vastly diverse. Even closely related species often contrast strikingly in their way of playing. For instance, rats are avid play-fighters while mice are devoted solitary pouncers. The extreme variety in the semblance of play (and, as far as we can say, in its ultimate functions) indicates that while the core engine remains conserved, there has been a wide diversification in the mammalian phylogeny of the adaptive machines driven by it.

Often the play engine has been coupled with other capacities, such as with exploration in object play, or yoked for a new purpose, such as to mitigate aggression among adults. In humans, the derivative forms of play are manifold, including, among others, the rough-and-tumble, role playing, pretend play, symbolic play, imaginative play and strategic games discussed in the special issue.

What I propose is that inside each of those fancy gadgets, there is the good old play engine that drives it. And what I argue is that besides investigating all the sophisticated functions of the products, it is important to also understand the workhorse that drives them. Psychologists are increasingly aware of the central role of play in human development, and educationists are concerned about the shrinking physical and social space for free child play. And yet, physical locomotor-rotational play is primarily valued just for its anti-obesity and physical-fitness function. That is important, for sure, but it is not the whole story. The engine in a machine is not just for gas-guzzling. We need to understand the engine and take proper care of it. What use is a machine if the driver floods the engine with too much gas? And is the engine maintained properly so that it will start reliably even in bitter frost?

On the submission deadline day, I am returning to the text, with the intent to write a nice coda. I enter the para-real world of competing and collaborating play theories. Here I see the new beast: the idea that all the diverse and advanced forms of mammalian and human play are highly transfigured clones of the phylogenetically ancient bodily gamboling. I pounce on it, trying to seize it with both arms. It kicks with several of its protean buts, escapes and grimaces from an inviting distance. It is great fun, anyway. I will keep trying to pin it down, for a while.

Psychonomic Society Featured Content
Morality Play, New Jersey Style: The American version of isumaqsayuq Tue, 16 Jan 2018 09:53:43 +0000

In contrast to the animal play that is covered in the special issue of Learning & Behavior dedicated to The Evolutionary and Psychological Significance of Play, humans often use elaborate representation (language and other symbols) in their play.  An example that occurs during contemporary Christmas season is the elf-on-the-shelf.

By the time I visited the homes of low-income children in New Jersey (the week after Christmas in 2015), the elf-on-the-shelf had departed its temporary residence. In the Madera home, where John (age seven) and his three-year-old brother lived, the elf had first made its appearance shortly after Thanksgiving, but was gone by Christmas Eve.

A 2011 children’s book by Carol Aebersold and her daughter Chanda Bell, who started elf-on-the-shelf as their own Christmas tradition, explains the premise behind the small toy elf’s visits:

Have you ever wondered how Santa could know if you’re naughty or nice each year as you grow? …. At holiday time Santa sends me to you.  I watch and report on all you do.  My job’s an assignment from Santa himself.  I am his helper, a friendly scout elf … Each night while you’re sleeping to Santa I’ll fly … I tell him if you have been good or been bad.

John’s father Ben drove a cab at night and by day cared for the two boys, while Mrs. Madera worked her day job. Ben Madera had been a tattoo artist prior to driving a cab, and had an affinity for making art that his seven-year-old son, John, shared. Despite two working parents in the home, the Maderas’ income qualified first-grader John to receive a subsidized school lunch. They were part of the legion of working poor in post-Great Recession America, whom the recent economic recovery had not lifted out of struggle.

When I asked young John Madera to draw Santa for me, he drew the elf Max as well:

It was clear that John believed in a close tie between Max – the name he and his brother had given to the family’s elf-on-the-shelf – and Santa. John thought, for instance, that possibly Santa had learned about toy-making from elves, rather than vice versa. The elf-on-the-shelf is a contemporary syncretism.  This bears similarity to Rudolph the Red-nosed Reindeer (a character conceived in 1939 in a publication distributed by the catalog company Montgomery Ward), another syncretic element now widely assumed real within children’s Santa mythology.

In the Madera household, in a pattern similar to many other homes and classrooms during December, Max the elf took on a subjunctive or imaginal reality sustained not only by children, but through the support of performances engineered by an adult. Adults moved the elf-on-a-shelf from place to place, overnight, thus making it possible that children would find the elf in a new place each day.

Customarily, when moving the elf, adults “set a scene” implying the elf had been active, even mischievous, at night. Mr. Madera relished the creative act of constructing situations that made Max seem naughty, such as tangling Max in Christmas lights (as if he had messed up the lights), or staging scenarios that made it seem Max had gotten into a physical altercation with John’s toy action figures. One night, Max “went crazy” and landed in the Christmas tree. Another night, Max got stuck to the front door when he presumably was playing with adhesive tape. At times Max succumbed to temptation, as when he was found next to the crumbs of a jelly donut meant for the next day’s breakfast. These playful scenarios also take place in other families, many of whom post photographed scenes of their elf’s misadventures. During December, Facebook pages become devoted to showing the mischief and trickster-like playfulness of elves in various homes.

In early elementary school classrooms where some children’s teachers placed an elf-on-the-shelf, children told me they were reminded that touching the elf would cancel out its magic.  Teachers, like parents, moved the elf-on-a-shelf at night while creating mischievous scenarios for the elf.  One elf raided the teacher’s supply of Hershey’s kisses and distributed them, one on each child’s desk.   Another drew a picture on the chalkboard.  An elf fell off the shelf after attempting to string a zip-line (with a pulley) from one place to another.

The elf toy thus is assembled together with other props in a kind of still puppetry. Intriguingly, these stagings correspond in many ways with a less materially based form of emotionally charged socializing drama: the dramas enacted for the three-year-old Inuit child Chubby Maata in Jean Briggs’s ethnography, Inuit Morality Play.

Inuit adults engage children with tableaux as well, tableaux enacted through dialogue with a child, rather than through toys and material props.  Such dramas are consciously thought of as having socializing value.  They comprise a facet of what the Inuit call isumaqsayuq, an approach to socialization that literally translates in English to “cause thought.” Adults present Inuit tots with emotionally charged dilemmas that cannot be ignored, such as by asking a question that implies danger to the child.

To take one example, a child who is resentful or jealous of a new sibling might be asked by Inuit adults: Do you love or wish to kill your baby sibling? Through as-if scenarios that develop via dialogue, the child works through the presented moral dilemma, as the child’s own ideas about actions and intents are applied to dramatic scenarios. This is a kind of morality play – the titular allusion of Briggs’s book to the ritual allegorical drama of medieval Europe.

Like the playful dramas of isumaqsayuq among the Inuit, elf-on-a-shelf both requires and invites an allotment of playfulness and creativity from participating adults, although adult scene-making by American parents is done without the child’s participation and takes tangible form. The opportunity for creative subjunctivity seems central to the elf-on-a-shelf’s growing appeal among U.S. adults. I described in my 1985 book Flights of Fancy, Leaps of Faith how American adults reap from Christmas an opportunity to vicariously enter such states of mind as wonder, awe and the suspension of disbelief, through their child’s participation in the Santa ritual. Elf-on-a-shelf allows the adult to directly and actively shape Christmas fantasia.

In a sense, Mr. Madera and children’s classroom teachers were motivated to invest their time in creating elf-on-the-shelf tableaux as a means of partaking in the fuller Santa mythology and ritual. As facilitators of elfin scenarios, under the guise of socializing children, they experienced a version of second childhood via playful subjunctivity. Examples of adult-made tableaux range from an elf seated in front of a game board across from a doll who is also “playing” to an elf seated alone, eating a meal of pancake syrup, marshmallows, and spaghetti. Christmas can be a socially sanctioned catalyst for adults to scavenge their childhoods of memory, to let go of literal reality, and to act out a bit of mischief – all justified as instilling children with righteousness, since the elf is presumed to be a moral monitor.

The fact that Max and other elves engage in high jinks and mayhem lends an isumaqsayuq-like force to adult-made elfin displays. For elfin scenarios do seem to serve as a jolt to moral thinking for the children who discover morning-after evidence of acts an elf has committed the night before.  Children like John are not unsympathetic to their elf’s mishaps, but they are also prompted to reflect on the outcome of unchecked elfin impulses, whether this is one less jelly donut or disturbed ornaments on the Christmas tree.  John Madera, for example, thought aloud about the mixed-up outcome that occurred when his elf Max’s impulse to fight went unchecked:

Maxwell … sometimes he fights with toys.  Last time the toys were fighting, they got scotch tape, taped him to the door … My toys captured him and taped him to the door.  They even taped his mouth. [John, 7]

Thanks to parental creations of elfin tableaus, children witness a pageant of miniature morality plays in the weeks leading to Christmas.  These little dramas make children into eyewitnesses of possible implications of mischief and unchecked indulgence.  Elves-on-shelves promulgate moral reflection through an American fantasia in material form, through moral possibilities that are patently dramatized.

In an article on conflict resolution in medieval morality plays, Dorothy Wertz (1969) discussed the fact that allegorical drama historically included participation by characters that were literally devilish. She wrote that “Drama … gives the individual the opportunity to participate in fantasies forbidden in ordinary walks of life.”  Wertz called this “test identification,” a kind of trial scenario in which individuals can try out potential social, religious, or personal roles.  At times a temptation may arise to over-identify with less than heroic roles, but Wertz asserts that ego-mastery provides a check on carrying such roles beyond the drama, beyond the stage.  Watching trespass in an as-if frame is a way to bring possibilities to awareness, without actual harm to self or society.  The Inuit might agree, based on their played out scenarios, described by Briggs.

American children with elves in their homes or classrooms thus derive moral lessons from elf-on-the-shelf in at least two ways. First, there is the implied warning that the elf will report their bad or good behavior to Santa, a credible possibility to young believers. Second, elves also serve as negative exemplars of the troubles that ensue if one’s impulses go unexamined and unchecked. In this latter sort of lesson, American children benefit from a materially enacted, American version of isumaqsayuq, by which thought is caused about the implications of actions. Hewing to cultural patterns, the American version of isumaqsayuq is carried out, first, with a bricolage of material artifacts and, second, with an opportunity for adults to exploit Christmas fantasy for vicarious playfulness.

Getting ready to play Mon, 15 Jan 2018 13:42:04 +0000

The Digital Event on The Evolutionary and Psychological Significance of Play got under way yesterday with an overview post. Today is the first day of this event, and it serves to introduce the special issue of the Psychonomic Society’s journal Learning & Behavior on which it is based.

In June of 2016, the Chicago Zoological Society–Brookfield Zoo hosted the Psychonomic Society Leading Edge Workshop on The Evolutionary and Psychological Significance of Play. In a partnership between the Chicago Zoological Society, American Museum of Natural History in New York, and the University of Southern Mississippi, we brought together sixteen of the leading experts from a diversity of fields that study play. The goal of the workshop was to examine the evolutionary and psychological significance of play while increasing interest in cross-disciplinary research.

Photos of the workshop and its participants can be found here.

The resulting special issue of Learning & Behavior, and the posts in this digital event, highlight many of the discussion points throughout the workshop, starting with an exploration of common terminology and definitional criteria.

Behavioral observations and experiments are guided by increasingly theoretical questions to test ideas about what, how, and, ultimately, why play occurs. Certain types of play are understood well enough to theorize convincingly that they have a particular psychological or evolutionary significance. For example, children’s imaginary play research is grounded in a long tradition of developmental psychology and ideas about its psychological significance. In contrast, adult game play is studied in cognitive psychology, with increasing evidence that its significance is minimal in terms of cognitive transfer, while its social significance does not seem generalizable across cultural groups.

This splitting of fields might appear to create an unwanted contradiction, but rather it is a useful outcome of bringing scholars together. It allows for a better understanding of specific behaviors, reveals the limitations of current research, and highlights possibilities for future investigation.

The crossing of the divides, often one of the noble goals for multidisciplinary meetings, is visible in other ways. Research on imaginary companions that suggests play to be useful for later adaptive social functioning strikes a chord with research on mammalian social play. The paucity of research on crocodilian play, because of its rare occurrence, is mirrored in limited descriptive research on board games, as they have often been overlooked by anthropologists. The different disciplines share theoretical questions and descriptive limitations, as well as the same diversity of play that makes generalizations so problematic.

As confirmed by several authors in this volume, there is a wide assortment of play. This includes everything from play in cetaceans, rats, and dogs to board games and imaginary play in humans. Given the diversity of topics and frequency of research, the importance of play behavior remains elusive. There is growing consensus among scholars of play that the phenomenon may be too diverse and complex to capture in one all-encompassing theory. Considering the different species, types of play, and contexts, there is little reason to assume that all play shares the same universal significance.

This notion is highlighted among the contributions of this special issue. It will also be taken up by the posts during the remainder of the week. Please join us between now and Friday and contribute to the discussion.

(This post is based on an Editorial for the special issue that was jointly written by Alex de Voogt and Lance Miller.)

38 shades of play: Commencing a digital event on the science of a diverse and pervasive behavior Sun, 14 Jan 2018 17:39:45 +0000

We all know what it means to play. We play badminton, we play with others, we are playfully exploring an environment…. Come to think of it, there is so much to playing, what does it mean to play?

According to the Oxford English dictionary, the verb “play” has 7 different meanings, ranging from “Engaging in activity for enjoyment and recreation rather than a serious or practical purpose” to “Allowing (a fish) to exhaust itself pulling against a line before reeling it in.” Many of those 7 different meanings, in turn, come either with or without an object, and if they come without an object they can still be adverbial. Altogether, Oxford provides 28 definitional examples for the verb “play”, plus another 10 for the noun “play”. In case you forgot, the noun applies to light (“the artist exploits the play of light across the surface”), markets (“our policy allows the market to have freer play”), and personal ruin (“a young nobleman, ruined by play”).

On top of that, consider the fact that play is not limited to humans. Animals play, too. Sometimes on their own, and sometimes because humans put them on skis or on surfboards, as you can see here:

Clearly, play is a fundamental and pervasive behaviour of living organisms. No wonder that Carl Jung opined that “The creation of something new is not accomplished by the intellect but by the play instinct.”

Our next digital event, which commences Monday 15 January, is therefore dedicated to the scientific examination of play in all its forms.

The digital event coincides with the publication of a special issue of Learning & Behavior dedicated to The Evolutionary and Psychological Significance of Play. The issue was guest edited by Alex De Voogt (American Museum of Natural History) and Lance Miller (Chicago Zoological Society). The articles in this issue will remain free to access by the public for a month, and they can be accessed from the special issue’s landing page.

Beginning on Monday, 15 January, we will be discussing some of those articles here in our next digital event. The following posts, listed in the likely order of their publication, will contribute to the event:

  • Alex De Voogt and Lance Miller will provide an overview of the articles in the special issue.
  • Cindy Dell Clark will bring to bear her anthropological expertise on the morality aspect of play (New Jersey style).
  • Marek Špinka will ask what keeps the concept of “play” connected, given all its diversity in form and variation in function.
  • Gordon Burghardt will provide some reflections on the state of play research and future directions inspired by the special issue on play and the diverse topics and approaches covered by these leading scholars.
  • Heather Hill, one of our Digital Associate Editors, will conclude by providing an integrative commentary on the special issue, to which she has also contributed a paper.

I look forward to this event and I hope many readers will join us in our exploration of play next week.


A new look at old data: Results may look better but different Thu, 11 Jan 2018 01:24:15 +0000

Replication and reanalysis of old data is critical to doing good science. We have discussed at various points how to increase the replicability of studies (e.g. here, here, here, and here), and have covered a few meta-analyses (here, here). Maybe because technology is constantly changing, and because we forget where we left files and on which hard drive or flash drive, it has been relatively rare to see researchers go back and examine their old data.

More recently, however, Psychonomic Bulletin & Review (PB&R) has been encouraging publications that reexamine research previously published in PB&R. Today we will cover a study just out in PB&R in which the authors looked back on their own initial findings and critically reassessed a theory. The authors report that the article grew out of the second author’s skepticism, voiced after the first author joined the lab, about the validity of the original results given how they were analyzed.

To remedy the original study’s shortcomings, the authors attempt to re-analyze the original study, guided by the first author’s memory and the published paper; they then present a novel analysis of the original data, along with several additional tests of a computational model using the original data. They also provide a cautionary tale about keeping track of your old analyses and emphasize the importance of open datasets.

Researchers Sean Duffy and John Smith sought to replicate analyses from their original study (Duffy et al., 2010, also published in PB&R), which compared a Bayesian model of categorization to human performance on a spatial judgment task.

The original model, known as the category adjustment model (CAM), is a Bayesian model of categorization judgments. It has been used to explain why individuals’ category representations are often close to the average (e.g. the size, length, loudness, voice onset time, etc. of the stimuli in an experiment). This phenomenon is known as the central tendency bias. CAM, as a model, has been applied to explain a wide variety of tasks. For example, it has been used to explain how the most common way of saying a word influences speech perception, or how the average properties of speech sounds impact word and sound learning. It has also been applied to spatial categories and facial recognition.

CAM accounts for the central tendency bias by stating that participants have an imperfect memory of the stimulus and retain a running average of the stimuli they have seen in addition to remembering how variable the stimuli have been. Below is a simple graphic of the two contributing factors that influence judgments, taken from the original paper by Duffy and colleagues:

As one example of a task that might demonstrate this bias, imagine participants are shown images of cats of different sizes, which disappear from view, and the participants attempt to recreate the size of the last cat from memory by sizing an image up or down. The size of these stimulus cats might come from a symmetrical normal distribution, or it could be skewed towards small sizes, skewed towards large sizes, or the cats might be uniformly varying in size.

The central tendency bias would predict that participants keep track of the distribution, thus remembering the average size of the cats as well as how variable the cats have been in size. Because participants are biased toward the mean, their responses are typically closer to the mean than expected. More important, under CAM, judgments should not be influenced by recent stimuli, except insofar as those stimuli shift the running average and variance. The figure below shows a simple experimental timeline that depicts these different-sized cats over time.

The task of Duffy et al. (2010) and the reanalysis in the latest article by Duffy and Smith involved not pictures of cats but horizontal lines on computer screens.
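It is worth seeing how little machinery the central tendency bias requires under a CAM-style account. Below is a minimal sketch (my own illustrative weight and stimulus values, not parameters or code from the paper) of the precision-weighted compromise between a remembered stimulus and the running mean:

```python
import random

random.seed(1)

def cam_estimate(stimulus, running_mean, w=0.7):
    # Precision-weighted compromise between the (noisy) memory of the
    # stimulus and the running category mean. The weight w stands in for
    # the relative precision of the stimulus memory; 0.7 is an arbitrary
    # illustrative value, not a fitted parameter.
    return w * stimulus + (1 - w) * running_mean

# Line lengths (say, in pixels) drawn from a uniform distribution.
lines = [random.uniform(100, 500) for _ in range(500)]
mean_len = sum(lines) / len(lines)

# Signed error of each model judgment relative to the true length.
errors_short = [cam_estimate(s, mean_len) - s for s in lines if s < mean_len]
errors_long = [cam_estimate(s, mean_len) - s for s in lines if s > mean_len]
```

Because each judgment is a weighted average, the signed error works out to (1 − w) times the stimulus’s distance from the mean: short lines come out overestimated and long lines underestimated, which is exactly the pattern Duffy and colleagues reported.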

In their line-length judgment task, participants see horizontal lines that disappear, and must then adjust a line to match the length of the line that just disappeared. Typically, participants judge lines as being closer to the mean length of the lines they have seen than those lines actually are. Duffy and colleagues noted in their original study that participants often judged short lines as being longer than they actually were, whereas long lines were judged to be shorter than they actually were. This was true regardless of whether the majority of lines were short, long, or came from a uniform distribution of line lengths. Below is the graphic that shows error (deviation from the horizontal line) across the three distributions, taken from the original paper:

Maybe you feel skeptical about these results – well, I am sure we have all had discussions with advisors, collaborators, and people who come to the microphone after conference talks who noticed a hitherto hidden flaw or questioned our results, analysis, initial theoretical conclusions, or all of the above. Thankfully, this time that skepticism led to the 2017 Duffy and Smith paper.

Rerunning the original analysis: In the years that have passed since the original study appeared (2010), researchers have gotten better about recording the exact steps of their analyses, which improves reproducibility at the level of analysis. So, the authors attempted to recreate the original analysis of the data from two experiments.

In the original analysis, the authors modeled the length of the line participants were reproducing from memory as a function of the actual stimulus length, the running mean, and the mean of the preceding 20 trials, without specifying exactly how that recent-trials mean was calculated. In the original paper, the running mean predicted responses and the recent trials did not; in the reanalysis, the pattern reversed: the running mean was unimportant, and the recent trials mattered. This appears to be bad news for the category adjustment model (CAM).

A different analysis using repeated measures regression: The original analysis worked with aggregate data, averaging over all of the time points for each of the line lengths and ignoring confounds between the different variables. In a new analysis, the authors used repeated-measures mixed-effects models (which we have covered before here) to account for participants' responses on each trial as a function of the running mean, the mean of some set of recent targets (e.g., the last 3, 5, 10, 15, and 20 targets), and the target's actual length.
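To make the trial-level setup concrete, here is a minimal sketch of how the predictors for such a regression might be constructed per trial. The function name and the toy values are our own; the authors' actual analysis code may differ.

```python
import statistics

# Hypothetical sketch: build each trial's predictors for a trial-level
# regression - the target's actual length, the running mean of all targets
# so far, and the mean of the most recent k targets.
def trial_predictors(targets, k):
    rows = []
    for t, target in enumerate(targets):
        history = targets[: t + 1]
        rows.append({
            "target": target,
            "running_mean": statistics.mean(history),      # all trials so far
            "recent_mean": statistics.mean(history[-k:]),  # last k trials only
        })
    return rows

rows = trial_predictors([2, 4, 6, 8], k=2)
print(rows[3])  # {'target': 8, 'running_mean': 5.0, 'recent_mean': 7.0}
```

Fitting the mixed-effects model is then a matter of feeding these per-trial rows to a regression package, with participant as a random effect. Note that the recent-trials mean is, by construction, part of the running mean, which is why the two predictors compete.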

In this analysis, the authors found even more bad news for CAM: the previous targets continue to influence participants’ responses, and the running mean only matters if models do not include information about previous targets (which are part of the running mean). This highlights the importance of modeling at the individual trial level, because many variables like running averages are correlated with previous observations.

The authors also took advantage of the computational properties of CAM to further explore their original data. They looked at two previously unconsidered consequences of the model that should be evident in the behavioral data:

(1) As more stimuli are observed over the course of an experiment, uncertainty about the category mean goes down: with each observation, the estimate of the mean becomes more and more confident. Duffy and Smith therefore tested whether participants became more sensitive to the mean length of the lines they were judging as the experiment went on. More bad news for CAM: The authors found that participants do not show greater bias toward the average as that average becomes less uncertain, that is, as the experiment goes on.
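The shrinking-uncertainty prediction follows from standard conjugate Bayesian updating. Below is a minimal sketch with made-up variances (not values from the paper): with a normal prior on the category mean and normally distributed observations, the posterior variance of the mean falls as observations accumulate, so the pull toward the mean should strengthen as the experiment goes on.

```python
# Minimal sketch of the shrinking-uncertainty prediction (illustrative
# variances, not values from the paper): conjugate normal-normal update
# of the category mean with known observation variance.
def posterior_variance(n, obs_var=400.0, prior_var=1000.0):
    """Posterior variance of the category mean after n observations."""
    return 1.0 / (1.0 / prior_var + n / obs_var)

uncertainty = [posterior_variance(n) for n in (1, 10, 100)]
# Strictly decreasing: the model grows more confident in the mean.
print(uncertainty[0] > uncertainty[1] > uncertainty[2])  # True
```

Participants' behavior did not show the corresponding increase in bias toward the mean, which is what makes this a problem for the model.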

(2) Related to the previous question, the authors asked whether participants avoid making responses that are shorter than the minimum or longer than the maximum, as the trials go on. Again, bad news: Participants do not avoid making responses that are impossible after having learned the distribution of the stimuli.

So what can we take away from this latest article by Duffy and Smith, other than that it is always worth looking at your old data with a skeptic’s eye?

Duffy and Smith provide us with a bit of advice, wherever possible:

(1) Make all of your data available at the individual observation level, and analyze it at that level.

(2) Make your analyses available for re-examination.

(3) If you have variables that can be operationalized in many different ways (like the influence of the previous k trials), test multiple variants of those and report all of them.

(4) Consider whether your model has to have a specific structure, or whether other types of models (e.g., non-Bayesian versions) would make the same predictions.

(5) Strongly consider experimental results that go against your model's predictions, or that go against an entire class of models.

Altogether, the replication crisis need not be a crisis at all; sometimes looking at old data can provide new insights. Furthermore, everyone benefits from re-examining an experiment's results with the statistical methods of the present, even when the new results go against popular thinking.

Psychonomic Society journal articles featured in this post:

Duffy, S. & Smith, J. (2017). Category effects on stimulus estimation: Shifting and skewed frequency distributions – A reexamination. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-017-1392-7.
