Prediction, pre-specification and transparency

I have a confession to make. I like the idea behind preregistration a lot, and over the last few years I’ve been making a concerted effort to start using it more often, but I still don’t preregister as much as I “should”. Ever since I was asked to write this post I’ve been wondering where my reluctance stems from (besides laziness, I suppose). I think the answer lies in the fact that preregistration can be used for many different purposes, and there is often ambiguity about which goals are being served. In my own work I typically run into three related but distinct goals: prediction, pre-specification and transparency.

Perhaps the simplest of these three goals is the desire to make predictions about the results of an experiment, and to avoid hindsight bias by putting my predictions in writing before running the study. To serve this goal properly, I’ve found it helps to be as brutally honest with myself as I can. In most experiments I run, I really don’t have any strong predictions to make, because I don’t have a lot of faith in my own theories about the world. At most I have a vague and mild hunch about what might happen: if so, making vague and mild claims seems entirely appropriate. There are exceptions of course. For instance, when running a replication of one of my earlier studies, I’m much more likely to preregister a stronger prediction.

When using preregistration to lodge a prediction, I don’t find much use in making the preregistration particularly detailed. In contrast, a second use case for preregistration is to create a pre-specified analysis plan for a confirmatory data analysis, and in this case the details are everything. If the reason I’m preregistering something is to help control Type I errors in the sense laid out by Jerzy Neyman, then I must be incredibly specific, and I cannot deviate from the plan. The moment I depart from my plan, the p-value I compute will no longer be correct. Personally, I’m not much of a believer in Type I error control. I don’t think that Neyman’s approach to hypothesis testing is a particularly useful way to do inferential statistics, and I try to avoid using it. However, even for a Bayesian, I think that confirmatory data analysis can benefit from pre-specification. Clearly stating one’s priors and the statistical model one intends to apply can be a worthwhile exercise to prevent innocent self-deception. This is something I don’t do as much as I should, and it’s an area where I’m trying to do better.
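To make that a little more concrete, here is a minimal sketch of what a pre-specified Bayesian analysis might look like if it were written down before the data arrive. Everything in it, from the one-group design to the Normal likelihood, the prior on the effect and the variable names, is a hypothetical illustration rather than an analysis from any real study:

```python
# A minimal, hypothetical pre-specification: prior and likelihood written
# down before any data are collected. The design, priors and names below
# are invented for illustration, not taken from a real study.
import numpy as np
from scipy import stats

# Pre-specified model: standardised effects y_i ~ Normal(mu, 1),
# with a weakly informative prior mu ~ Normal(0, 0.5).
PRIOR_MEAN, PRIOR_SD = 0.0, 0.5
LIKELIHOOD_SD = 1.0

def posterior_over_mu(y, grid=np.linspace(-2, 2, 2001)):
    """Grid approximation to p(mu | y) under the pre-specified model."""
    log_prior = stats.norm.logpdf(grid, PRIOR_MEAN, PRIOR_SD)
    log_lik = np.array([stats.norm.logpdf(y, m, LIKELIHOOD_SD).sum() for m in grid])
    log_post = log_prior + log_lik
    density = np.exp(log_post - log_post.max())
    step = grid[1] - grid[0]
    return grid, density / (density.sum() * step)   # normalised posterior density

# Once the data arrive, the confirmatory analysis is a single function call:
# mu_grid, density = posterior_over_mu(observed_effects)
```

The value of a sketch like this isn’t the code itself: it’s that the priors and the likelihood are on record before the results exist, so they can’t quietly drift once the data come in.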

That being said, one of the big reasons I rarely do this in practice is that very few of my analyses are genuinely confirmatory. There aren’t many occasions where I have enough confidence in my beliefs about the phenomenon I’m studying that I would feel comfortable trying to specify an analysis method before encountering the data. Most of my work has enough of an exploratory flavor to it that I don’t think that confirmatory analyses are the right thing to be doing. In fact, if I’m being completely honest, in most cases the only reason I report a p-value (or Bayes factor) is that reviewers and journal editors require me to do so. With only a few exceptions, hypothesis testing – whether orthodox or Bayesian – is simply not how I would do my research if I had the freedom to do so.

A third role for preregistration – and for me, the one that causes most headaches – is to aid transparency of the research process. Can other people work out exactly what I did or what I knew at different stages of the research project? To understand why I have reservations about preregistration in this context, it’s useful to talk a bit about what I do. Most of my research relates to the development of computational models of learning, reasoning and decision making. For instance, in one recent paper I built a model for some inductive reasoning problems based on Gaussian process regression, and in an older paper I modelled category learning using Dirichlet process mixture models.

But those are just one flavor of cognitive model: I might equally be interested in building connectionist models, sequential sampling models, or a variety of other possibilities. What I’ve consistently found when building new models is that the class of possible cognitive models is limited only by one’s own creativity. It is very different to fitting a GLM or running an ANOVA, simply because the space of possible models is so much bigger, and the theoretical constraints supplied by existing literature are often quite modest. The consequences of this are that during the model construction process (a) I cannot prespecify an analysis plan because I don’t have one and (b) I can make only the most vague of “predictions” about what my final model will look like because I simply don’t know. In this context the utility of preregistration would seem to be determined solely by whether it is a good method for ensuring transparency.

I want to argue here that it is not the best solution to this problem.

There are reasons why one might want to employ something akin to preregistration here: building a new computational model is a creative and iterative process of trying out different possible models, evaluating them against data, revising the model and so on. As a consequence, of course, there is quite rightly a concern that any model that emerges will be overfit to the data from my experiments. There are tools that I can use to minimize this concern (e.g., cross validation on a hold-out data set, evaluating the model on a replication experiment, and so on), but to employ them effectively I need to have alternatives to my model, and this is where an extremely high degree of transparency is important. Should someone else (even if that’s just me a few months later) want to test this model properly at a later date, it helps if they can follow my trail backwards through the “garden of forking paths”, see all the little decisions I made along the way, and ask which alternative models I didn’t build. To my mind this really matters – it’s very easy to make your preferred model look good by pitting it against a few competitors you aren’t all that enthusiastic about. To develop a “severe test”, you should evaluate a model against the best set of competitors you can think of, and that’s the reason I want to be able to “go back in time” to various different decision points in my process and then try to work out what plausible alternatives might have emerged if I’d followed different paths.
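As a rough illustration of the hold-out idea mentioned above, here is a small sketch in which two toy “cognitive models” (an exponential and a power-law retention curve) are fit to a training set and then compared on data they never saw. The models, parameters and data are all invented for the purposes of the example:

```python
# A toy illustration of hold-out evaluation for competing models.
# Both candidate models and the "data" are invented for this sketch.
import numpy as np
from scipy.optimize import curve_fit

def exponential(t, a, b):        # candidate model 1
    return a * np.exp(-b * t)

def power_law(t, a, b):          # candidate model 2
    return a * (t + 1.0) ** (-b)

rng = np.random.default_rng(seed=1)
t = np.linspace(0, 20, 60)
y = 0.9 * np.exp(-0.15 * t) + rng.normal(0, 0.05, t.size)  # fake retention data

train = rng.random(t.size) < 0.7   # random 70/30 split into train and hold-out
test = ~train

for name, model in [("exponential", exponential), ("power law", power_law)]:
    params, _ = curve_fit(model, t[train], y[train], p0=(1.0, 0.1), maxfev=10000)
    holdout_mse = np.mean((y[test] - model(t[test], *params)) ** 2)
    print(f"{name:12s} hold-out MSE: {holdout_mse:.4f}")
```

The point is not these particular models, of course: a comparison like this only means something if the set of competitors is a serious one, which is exactly why the trail of decisions matters.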

With this in mind, I don’t think that (for example) the current OSF registration system provides the right toolkit. To produce the fine-grained document trail that shows precisely what I did, I would need to create a great many registrations for every project (dozens, at the very least). This is technically possible within the OSF system, of course, but there are much better ways to do it, because what I’m really talking about here is something closer to an “open notebook” approach to research, and there are other excellent tools that can support this. For my own part I try to use git repositories to leave an auditable trail of commit logs that can be archived on any number of public servers (e.g., GitHub, Bitbucket or GitLab), and I use literate programming methods such as R Markdown and Jupyter notebooks to allow me to document my thinking on the fly during the model building process. Other researchers might have different approaches.

Although the goals of the open notebook approach are not dissimilar to preregistration insofar as transparency is relevant to both, there are a lot of differences. The workflow around the OSF registration system makes it easy to lodge a small number of detailed registrations, whereas a notebook approach based around git repositories emphasizes many small registrations (“commits”) and allows many paths to be explored in parallel in different “branches” of the repository. Neither workflow seems inherently “better” than the other in my view: they’re useful for different things, and as such it is best not to conflate them. Trying to force an open notebook to fit within a framework designed for preregistrations seems likely to cause confusion rather than clarity, so I typically don’t use preregistration as a tool for aiding transparency.

In a sense, the argument I’m making about transparency in computational modelling has much in common with what Trish Van Zandt had to say about exploratory data analysis at the last Psychonomics meeting. As she puts it,

There is a tendency today to view exploratory data analysis with suspicion, and this is a mistake. Exploratory data analysis is absolutely critical for model development and experimental design. What is also critical is that such analyses are performed in a completely transparent fashion; preregistration is not a guarantee of transparency, and I fear that it will instead hinder transparency.

I think she’s completely right. Along similar lines, when I express reservations about the utility of preregistration for computational modelling, it is not because I am opposed to “open science principles” (whatever those are taken to be). On the contrary, it is because transparency is my primary goal: preregistration doesn’t provide the kind of transparency I’m looking for, so I use different tools.

I’d like to think that’s reasonable.
