“But when does lack of ‘simplicity’ in the protective belt of theoretical adjustments reach the point at which the theory must be abandoned?” – Lakatos, 1976
What does it take to falsify a psychological theory? This question sounds straightforward: if you find data that are inconsistent with the theory, you reject the theory. But in the real world, when testing theories, we have to specify how exactly the broad theory ‘grounds out’ in a specific set of data. This is not always easy.
If we say, “Coffee makes you alert,” how do we then measure alertness? There are many possible ways, and each could give a different answer. This is because researchers can have very different ideas about which measure best captures constructs like alertness. If some of these measures happen to show decreased alertness, does this mean we should reject the theory as a whole? Imagine you give people coffee and then test how well they do on a challenging game that requires alertness but also very steady hands. Because coffee makes some people jittery, you find that, on average, people do worse at this game after drinking coffee. Does this mean we should reject the broad theory that coffee makes you alert? Many people would say no, pointing out that the broad theory might be true while the specific measure used to test it in this case is flawed.
Our article, written by Maria Robinson, Jamal Williams, John Wixted, and Tim Brady, and published in the Psychonomic Society journal Psychonomic Bulletin & Review, is driven by the core question, “What does it take to falsify a psychological theory?” The paper builds on recent work on theory assessment practices in psychology. Using an accessible ‘case study’ to examine a fundamental idea from the philosophy of science, we demonstrate how the ‘protective belt’ of auxiliary assumptions (like whether your measure of alertness is a ‘good’ one) affects our ability to test theories. Broad, verbal theories are often hard to test rigorously, and many people expect that very well-specified computational models, whose parameters appear to map straightforwardly onto constructs like ‘alertness,’ would not suffer from this issue and would be easy to evaluate. In our paper, we show how this expectation fails and what we can do to avoid the problem.
We focus on a concrete example of prominent theories, well specified in mathematical models, that have been repeatedly tested by sophisticated computational modellers. In particular, we target the theory that people can “remember 3-4 items in working memory,” the classic idea of the ‘slot model’ of memory: when faced with many items to remember, only a few (N, usually 4) of those items can be stored, and all others are completely forgotten. This intuitive model is tested against another popular theory, often called ‘continuous resource’ theory, which posits that some information about every item is always present in visual working memory, even though it may be highly degraded and essentially ‘noise-like.’
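To make the contrast concrete, here is a minimal sketch of how the two theories generate predictions for a simple change-detection experiment. This is our illustration, not code from the paper: the capacity value, the equal-variance signal-detection form, and the square-root resource-splitting rule are all assumptions chosen for simplicity.

```python
import numpy as np
from scipy.stats import norm

def slot_model(set_size, K=3.5, guess=0.5):
    """Discrete slot model: the probed item is either stored in one of K
    slots (change detected with certainty) or entirely absent (pure guess)."""
    d = min(1.0, K / set_size)         # probability the probed item is stored
    hit = d + (1 - d) * guess          # detect the change, or guess 'change'
    fa = (1 - d) * guess               # false alarms arise only from guessing
    return hit, fa

def resource_model(set_size, d1=3.0, criterion=0.5):
    """Continuous-resource model: every item is encoded, but with precision
    that falls as the resource is split among more items (equal-variance
    signal detection)."""
    dprime = d1 / np.sqrt(set_size)    # illustrative resource-splitting rule
    hit = norm.sf(criterion - dprime)  # P(evidence > criterion | change)
    fa = norm.sf(criterion)            # P(evidence > criterion | no change)
    return hit, fa

for n in (2, 4, 8):
    print(n, slot_model(n), resource_model(n))
```

Both models predict worse performance at larger set sizes; telling them apart therefore hinges on finer-grained features of the data and on auxiliary choices, such as how guessing is modelled and where the decision criterion sits.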
We specifically examine tests of these theories in a widely cited article published in a high-profile outlet (Rouder et al., 2008, Proceedings of the National Academy of Sciences), as well as a replication and extension of that study, which took the first steps towards addressing limitations of the original. We use these articles as case studies because they continue to have a significant influence on how researchers theorize about and measure visual working memory. Moreover, both articles involve well-established models tested by highly skilled computational researchers, which should be the best-case scenario for assessing psychological theories.
In our reanalysis, we find that these papers still fall prey to the difficulty of sorting out which aspects of a theory are core to it, and which are just “auxiliary” assumptions needed to ground out the theory in specific data and methodology. We show, through a systematic reanalysis, that when core theoretical and analytic assumptions are checked, the data in these papers are either non-diagnostic or support conclusions opposite to those drawn in the original work.
In other words, the two theories of visual working memory were tested using different or unchecked auxiliary assumptions, and these ancillary decisions, rather than the merits of slot versus resource theory per se, drove the specific conclusions in these papers. Therefore, while the original articles reported support for slot models, our reanalysis indicates that the data actually show more support for resource models.
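As a toy illustration of how much the auxiliary ‘belt’ can matter (this is our own sketch, not the paper’s reanalysis; the simulated observer, trial counts, and grid-search fit are all assumptions), consider scoring the very same core slot theory under two different auxiliary assumptions about guessing:

```python
import numpy as np
from scipy.stats import norm, binom

rng = np.random.default_rng(1)
set_sizes = np.array([2, 4, 8])
n_trials = 200  # change and no-change trials per set size

# Simulate data from a continuous-resource observer (the "truth" here)
dprime = 3.0 / np.sqrt(set_sizes)          # precision falls with set size
crit = 0.5
hits = rng.binomial(n_trials, norm.sf(crit - dprime))
fas = rng.binomial(n_trials, np.full(3, norm.sf(crit)))

def slot_loglik(K, guess):
    """Log-likelihood of the data under a slot model with capacity K and a
    guessing rate applied whenever the probed item is not stored."""
    d = np.minimum(1.0, K / set_sizes)
    p_hit = d + (1 - d) * guess
    p_fa = (1 - d) * guess
    return (binom.logpmf(hits, n_trials, p_hit).sum()
            + binom.logpmf(fas, n_trials, p_fa).sum())

Ks = np.linspace(0.5, 8.0, 151)
gs = np.linspace(0.01, 0.99, 99)

# Auxiliary variant (a): guessing rate fixed at 0.5, a common default
ll_fixed = max(slot_loglik(K, 0.5) for K in Ks)
# Auxiliary variant (b): guessing rate treated as a free parameter
ll_free = max(slot_loglik(K, g) for K in Ks for g in gs)

print(f"fixed-guess slot fit: {ll_fixed:.1f}, free-guess slot fit: {ll_free:.1f}")
```

The same core claim about discrete slots yields very different fits depending on a seemingly innocuous choice in the protective belt, which is exactly the kind of dependence our reanalysis makes explicit.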
Lakatos answers his own question by arguing that the ‘protective belt’ of auxiliary assumptions is difficult to assess on the basis of parsimony alone. Instead, auxiliary assumptions should be treated as a series of subsidiary theories, which support the core theory and may warrant revision or replacement as part of theoretical development. While Lakatos’ philosophy of falsification was nuanced, we use an accessible and prominent example to outline basic steps towards identifying and testing such subsidiary theories and addressing current challenges in falsification in psychology. In sum, our article integrates and builds on critical contemporary issues in theory assessment practices, and we hope researchers from various sub-disciplines engage with this work.
Links to related papers
For readers interested in additional contemporary work on theory assessment practices in psychology, we recommend the following articles as a starting sample:
- Anatomy of a psychological theory: Integrating construct-validation and computational-modelling methods to advance theorizing
- If mathematical psychology did not exist we might need to invent it: A comment on theory building in psychology
- How computational modelling can force theory building in psychological science
- Theory before the test: How to build high-verisimilitude explanatory theories in psychological science
- Arrested theory development: The misguided distinction between exploratory and confirmatory research
- The ‘paradox’ of converging evidence
- A model hierarchy for psychological science
- Rigorous exploration in a model-centric science via epistemic iteration
- If god handed us the ground-truth theory of memory, how would we recognize it?
Featured Psychonomic Society article
Robinson, M., Williams, J. R., Wixted, J. T., & Brady, T. (2024). Zooming in on what counts as core and auxiliary in theory assessment: A case study on recognition models of visual working memory. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-024-02562-9