Psychological science needs theory development before preregistration

“(…) a substantial proportion of research effort in experimental psychology isn’t expended directly in the explanation business; it is expended in the business of discovering and confirming effects”

—Cummins (2000).

I am contributing to this digital event from a theoretician’s perspective. I thought I’d be upfront about this to set the right expectations. Theoretical perspectives have been largely neglected in efforts to improve psychological science over the last decade, and the current discussion surrounding preregistration hasn’t made it easier to bring in such a perspective. If anything, it seems to have made it harder.

Advocates of preregistration have centered their concerns on things such as (a) error control in statistical inference, and (b) improving the replicability of empirical findings. I will leave it to others to comment on (a); nothing I wish to argue here hinges on it. As for (b), it is worth noting that it remains unclear whether improving replicability should be a goal in itself. A single-minded focus on improving replicability likely risks that we, as a field, get stuck studying only simple systems and/or mistaking complex systems for simple ones (cf. Devezer et al., 2018). Be that as it may, this blog post focuses on a different (though related) issue.

My main concern, here, is that the current discussion surrounding preregistration continues to distract psychological science from a deeper and longstanding problem: the poor theoretical basis of psychological science and the pervasive illusion that gathering more reliable “effects” can make up for it.

One may object (as some have) that improving psychological science on one aspect need not come at the cost of improving another important aspect. Yet, in practice, it does. I’ll explain why. But for that, I need to cover some ground.

Philosophy of science

As any view on the pros and cons of preregistration depends on one’s underlying philosophy of science, I should disclose that my view is grounded in the philosophy of psychological science (e.g., Cummins, 2000; Wright & Bechtel, 2007). This philosophy departs from traditional philosophy of science, which focuses primarily on physics. Whereas theories in physics often involve subsumption under law, cognitive psychological theories aim to provide mechanistic explanations. In other words, cognitive psychological theories aim to answer the question “How does it work?” and not just “What are the laws?” (Cummins, 2000).

Ever since the cognitive revolution started in the 1950s, computational concepts and formalisms have proven vital for casting explanations in cognitive psychology in mechanistic terms. However, widespread adoption of these tools in psychological science, broadly construed, seems to have been hindered by historical and sociological factors.

Much of current psychology is in the business of discovering and confirming “effects”. But effects are explananda (things to be explained), not explanations. As Cummins (2000) notes, no one would claim, e.g., “that the McGurk effect explains (…) why someone hears a consonant like the speaking mouth appears (…) That just is the McGurk effect”. Moreover, such effects as we typically observe in the lab aren’t even the primary explananda for psychology. They are secondary explananda. Testing for effects should be a means of testing our explanations of primary explananda.

Psychology’s primary explananda are not effects

Primary explananda are the set of key phenomena that define a field of study. For instance, cognitive psychology’s primary explananda are the various cognitive capacities that humans and other animals seem endowed with. These include the capacity for learning, language, high-level vision, concepts and categories, memory, decision-making, and reasoning, to name just a few.

It is only through the way in which we postulate that such capacities are exercised that we come to predict effects. A simple example is given by Cummins, using multiplication as an illustration:

“Consider two multipliers, M1 and M2. M1 uses the standard partial products algorithm we all learned in school. M2 uses successive addition. Both systems have the capacity to multiply: given two numerals, they return a numeral representing the product of the numbers represented by the inputs. But M2 also exhibits the “linearity effect”: computation is, roughly, a linear function of the size of the multiplier. It takes twice as long to compute 24 X N as it does to compute 12 X N. M1 does not exhibit the linearity effect. Its complexity profile is, roughly, a step function of the number of digits in the multiplier.” (Cummins, 2000).

The example illustrates several points. First, effects that are typically tested in our labs are incidental to the way in which capacities are exercised. They can be used to test competing explanations of “how it works” (e.g., by giving a person different pairs of numerals and measuring response times, one can test whether their timing profile fits M1, M2, or some other candidate M′). Second, candidate explanations of capacities (e.g., multiplication) come in the form of different algorithms (e.g., the partial products method or repeated addition) that compute a particular function (i.e., the product of two numbers). Such algorithms aren’t constructed to account for the incidental effects, but postulated as a priori candidate procedures for realizing the target capacity.
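
To make the contrast concrete, here is a minimal sketch of the two multipliers in Python. This is my own illustration, not code from Cummins (2000): the function names and the step counts are stand-ins for response time, intended only to show how the same capacity (multiplication), realized by different algorithms, predicts different effects.

```python
# Toy illustration (not from Cummins, 2000): two procedures that both realize
# the capacity "multiplication" but predict different timing profiles.

def m2_repeated_addition(a, b):
    """M2: add b to a running total, a times.
    Step count grows roughly linearly with the magnitude of the multiplier a."""
    total, steps = 0, 0
    for _ in range(a):
        total += b
        steps += 1
    return total, steps

def m1_partial_products(a, b):
    """M1: school-style partial products.
    Step count depends on the number of digits, i.e., it is roughly a step
    function of the multiplier's magnitude."""
    total, steps = 0, 0
    for i, da in enumerate(reversed(str(a))):      # digits of a, least significant first
        for j, db in enumerate(reversed(str(b))):  # digits of b
            total += int(da) * int(db) * 10 ** (i + j)  # one partial product
            steps += 1
    return total, steps

for a in (12, 24, 48):
    _, s1 = m1_partial_products(a, 7)
    _, s2 = m2_repeated_addition(a, 7)
    print(f"{a} x 7: M1 steps = {s1}, M2 steps = {s2}")
# M2's step count doubles when the multiplier doubles (the "linearity effect");
# M1's step count jumps only when the multiplier gains a digit.
```

Both procedures return the correct product, so the capacity is the same; only the hypothesized timing profiles (the predicted effects) differ, and it is these that a response-time experiment can discriminate between.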

Not effects but explanations need to be discovered

The primary explananda in cognitive psychology do not need to be “discovered” (Cummins, 2000). Everyone knows that humans can learn languages, interpret complex visual scenes, and navigate a dynamic and uncertain physical and social world. These abilities are so complex to understand computationally that we have no idea yet how to emulate them in artificial systems at human levels of sophistication. What needs to be high on a fundamental research agenda is not the discovery of arbitrary effects, but the discovery of plausible ways of explaining capacities.

Only when we have good candidate explanations can we know which effects are most informative and relevant to test for. Having such explanations will also enable us to formulate theoretically motivated ideas about when—under what experimental conditions—an effect can rationally be expected to replicate and when it would not, which can be used to rigorously test between competing computational explanations (see e.g. Kokkola et al., in press).

But it takes time and resources to get there. Developing good candidate explanations is a non-trivial pursuit. Just-so stories or verbal theories won’t do. Good candidate explanations have to be precise, unambiguous specifications, ideally ones that can be computationally simulated. It requires real ingenuity to come up with even one plausible candidate explanation for human abilities like, say, pragmatic inference in communication (Goodman & Frank, 2016), analogical reasoning (Gentner, 1983), or the ability to generate novel hypotheses (Blokpoel et al., 2018). And even then we run into obstacles, since such candidate explanations may yield models that fit toy problems in the lab but still fail to scale or generalize to more complex real-world situations (van Rooij, 2008)—a bit like a multiplier that would work for 2 x 50 but fail for 86 x 27.

Exploration of theoretical spaces 

What we need, then, is exploration. Not just of data, but of theoretical ideas about how to construct candidate explanations for primary explananda. The search space is vast. Explaining a cognitive capacity involves specifying (1) the function, or problem, that it computes (Marr’s, 1982, computational-level theory), and (2) the algorithms by which that function is hypothesized to be computed (the algorithmic-level theory). Both spaces—of possible computable functions and of possible algorithms—are infinite. Even trickier, the search space is rugged. By this I mean that whatever search methods we adopt to find good candidate computational- and algorithmic-level theories, they must be able to escape local optima where models seem locally well-behaved and corroborated but are in fact globally far from the truth (i.e., have low verisimilitude).
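
To give a feel for what “rugged” means here, below is a purely illustrative sketch (my own, not taken from any of the cited work). It treats “theory space” as a single parameter and “fit” as a toy function with many local peaks; a greedy local search that only ever accepts improvements typically halts on a nearby peak well below the global optimum, much as locally well-corroborated models can be globally far from the truth.

```python
import math
import random

def fit(theta):
    """Toy 'fit' landscape over a one-dimensional 'theory space':
    a broad global peak at theta = 5 plus high-frequency ripples
    that create many local optima."""
    return -(theta - 5.0) ** 2 + 1.5 * math.sin(12.0 * theta)

def hill_climb(theta, step=0.05, iters=2000):
    """Greedy local search: accept a small random move only if it improves fit."""
    for _ in range(iters):
        candidate = theta + random.uniform(-step, step)
        if fit(candidate) > fit(theta):
            theta = candidate
    return theta

random.seed(1)
for start in (0.0, 2.0, 5.0, 9.0):
    end = hill_climb(start)
    print(f"start at {start:3.1f} -> stuck near {end:4.2f} (fit {fit(end):6.2f})")
# Each starting point gets trapped on a different local peak; only the search
# that happens to start near theta = 5 ends up close to the global optimum.
```

Real theory spaces are, of course, not one-dimensional or enumerable, which is precisely why brute-force or purely local search strategies offer little guarantee of converging on explanations with high verisimilitude.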

We know from complexity theory that searching rugged spaces can be intractable even when we have perfect knowledge of all relevant information (Arora & Barak, 2009). In empirical science things are even worse: besides combinatorial complexity, we have uncertainty to deal with. It is tempting to think we can best cope with that uncertainty by trying to work bottom-up from simple effects: make sure we get those right first, and then build theory up from there. But searching a theory space based on local binary decisions about the presence or absence of effects is highly inefficient and arguably ill-conceived (see also Newell’s, 1973, “You can’t play 20 questions with nature and win”). And even if one moves beyond a binary-decision approach, inferring theory from observation remains an ill-defined problem. We may be able to formalize parts of it, but, again, such an approach may work for simple statistical problems where stringent assumptions hold true, yet it does not scale straightforwardly beyond toy scenarios (Navarro, 2018).

It is not a limitation of our methods that we cannot deduce theories from data, but a fundamental limitation of the scientific enterprise (cf. the Duhem-Quine thesis; see also Meehl’s, 1997, “The problem is epistemology, not statistics”). At best we can abduce theories. And abduction, we know, is not a local inference but highly sensitive to background knowledge (this is in part why abduction has so far eluded tractable formalization in cognitive science; Blokpoel et al., 2018). Hence, we cannot rationally expect to be able to determine what is true locally without weighing the evidence in a global context.

Theory development requires resources and tools

Because theory is hard, some believe it is wise to start with easier and more tangible goals, such as improving statistical inference and replicability. As a theoretician who studies the fundamental limits of resource-bounded minds, I am not convinced. Improving any aspect of our field requires resources (time, money, researcher training, tool development, etc.).

The vast majority of resources have been invested in strengthening our experimental and statistical tools. Much less has been invested in strengthening our tools for building theories. This can be understood historically and sociologically: many psychologists see the “experimental method” as the mark of psychology as a science, whereas theoretical research is often perceived as mere philosophizing. The status quo is maintained by our psychology curricula. Every psychology student receives training in experimental design and statistics, but virtually no formal training in theoretical or mathematical psychology. It is no wonder, then, that people complain that what is often called “theory” is little more than loose storytelling. What else could it be, if the vast majority of researchers in psychology lack the basic tools and training required for building solid theories?

Communication is furthermore complicated by confusion about what theory development entails. I regularly encounter claims like “making good theories just means making theories that make (falsifiable) predictions.” But, to me, this confuses several things. First, stating a desired feature for theories is not yet a constructive approach to building theories. Second, on the philosophy of psychological science assumed here, good theories are not defined by how well they predict, but by how well they explain; i.e., how well they answer the question “how does it work?” (Cummins (2000) explains how prediction and explanation are dissociable; for example, to this day we predict the tides with tide tables and could do so long before we had an explanation of the tides). Predictions of incidental effects can serve to perform confirmatory tests of candidate explanations. But we need theory development to get any good candidate explanations on the table in the first place.

How to move forward from here

These days, anyone who ventures to question the use of preregistration risks being looked at with suspicion. This was visible in online responses to Van Zandt’s and Shiffrin’s Psychonomics talks two months ago. Such an atmosphere is unhelpful.

Concerns about a generalized push for preregistration are currently coming from researchers in cognitive science and mathematical psychology. These researchers build on a long tradition of mathematical and computational rigor in their theory-driven science. Dismissing their concerns out of hand misses an opportunity for learning about different perspectives and methods for improving psychological science.

I hope this digital event will be the start of fruitful dialogue. There are many opportunities, and an urgent need, for developing open science practices that support the rigorous and cumulative development of theories (see e.g. Guest & Rougier, 2016). I see it as the next important challenge for open science to build more common ground with theory-driven researchers on what we need to move psychological science forward.

Acknowledgements

I am grateful for the many discussions on- and offline that have shaped my thoughts on this topic. Thanks to Mark Blokpoel, Berna Devezer, Remi Gau, Will Gervais, Olivia Guest, Steve Lindsay, Esther Mondragon, Roland van Mossel, Danielle Navarro, Ven Popov, Anne Scheel, Joshua Skewes, Paul Smaldino, Sanjay Srivastava, Ivan Toni, and Brad Wyble, and others who I may have forgotten to list.

