Preregistration of a forking path – What does it add to the garden of evidence?

Preregistration has many advantages, which have been pointed out in Steve Lindsay’s post yesterday and in many other places. The most important advantage is probably that it demonstrates beyond doubt that the hypotheses and the data-analysis path chosen for a study were not chosen in response to the data, with an eye towards obtaining the desired results.

The question I want to raise is: Does a finding from a preregistered study have more evidential weight than one from a study without preregistration (everything else being equal)? Some proponents of preregistration evidently think so, and this was also reflected in our expert survey reported here and here. For instance, Hal Pashler, in his keynote lecture at the 2018 Psychonomics Meeting (video here), argued that only findings from preregistered studies should be included in textbooks. What this implies is that we evaluate evidence differently depending on what we believe went on in the researcher’s mind: If we believe the researcher made all the choices before looking at the data, we assign the same evidence more weight than if we believe the researcher made these choices only while interacting with the data.

This has paradoxical consequences.

Consider a study in which there are 29 different analysis paths to answer the same research question, all of which are equally justifiable. I did not choose the number 29 arbitrarily: it is the number of analyst teams who analyzed the same data to adjudicate whether there is racial bias in the handing down of red cards in soccer (Silberzahn et al., 2018). As it happened, the different analyst teams came to different conclusions. Now imagine that Silberzahn and colleagues had introduced an experimental manipulation: Suppose a random subset of the teams had been assigned to a “preregistration” group that preregistered its analysis plans, whereas the others did not. If we give results from preregistered analyses more evidential weight, then whether we conclude that referees give more red cards to dark-skinned players could depend on which analysis teams happened to end up in the preregistration group. I think we can all agree that our conclusions from data should not depend on decisions made by chance.
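To make the arbitrariness concrete, here is a minimal simulation sketch (in Python, invented for this post, not part of the original study). It assumes that 20 of the 29 teams concluded “bias” and 9 concluded “no bias” – roughly the split reported by Silberzahn et al. (2018) – and that giving preregistered results more weight amounts to adopting the majority verdict among a randomly chosen preregistered subset of teams.

```python
import random

# Hypothetical verdicts of the 29 analyst teams (an assumed split, roughly
# matching Silberzahn et al., 2018, where most but not all teams found a
# significant effect).
verdicts = ["bias"] * 20 + ["no bias"] * 9

def verdict_of_preregistered_subset(verdicts, n_prereg, rng):
    """Randomly assign n_prereg teams to the 'preregistration' group and
    adopt the majority verdict among those teams alone."""
    prereg = rng.sample(verdicts, n_prereg)
    return "bias" if prereg.count("bias") > n_prereg / 2 else "no bias"

rng = random.Random(2018)
runs = [verdict_of_preregistered_subset(verdicts, 5, rng) for _ in range(10_000)]

# If the conclusion depended only on the evidence, this fraction would be
# 0 or 1; instead it falls in between, because the verdict depends on which
# teams happened to be assigned to the preregistration group.
print(runs.count("bias") / len(runs))
```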

The more typical situation is that of a single analyst team facing 29 or so equally justifiable analysis paths and choosing one for preregistration. If the different options are really equally defensible, that choice can only be arbitrary. Why should we give its outcome more evidential weight than the outcome of any of the other 28 unregistered analysis paths? Imagine our team submitted the results of its preregistered analysis for publication, and a reviewer asked for an additional analysis using one of the other analysis options. Suppose the authors oblige, and their second analysis arrives at a different conclusion. Should we now give the original, preregistered analysis more weight than the new, non-preregistered one? The author questionnaire of JEP:HPP, introduced in 2017, asks whether authors have clearly distinguished between confirmatory analyses that were planned a priori and exploratory analyses, which include analyses suggested by a reviewer. If the reviewer’s suggestion is just as appropriate as the preregistered analysis for answering the research question, I do not see why we should make that distinction. (If it is not, the authors should decline to do it.)

If we are in a situation where different, equally justifiable choices along the garden of forking paths lead to different answers to our research question based on the same data, then the only reasonable conclusion to draw is that the data are not suited to answer our question. In that case, preregistration of one arbitrarily chosen analysis path does not get us closer to that conclusion; on the contrary, it will probably increase our confidence in the answer we obtained from the one analysis path that happened to be preregistered and reported. If we are in a situation where the answer we get is robust against reasonable variations in our analytical choices, then there is no need to preregister: Even if our analysis path were influenced by what we see in the data, it would make no difference for what we conclude.

What to do? If there are multiple equally justifiable analysis paths, we should run all of them, or a representative sample of them, to see whether our results are robust; a sketch of such a check follows below. Perhaps a researcher does not want to invest that much time in data analysis; perhaps we don’t trust a single researcher to select an unbiased sample of analysis methods. In that case, making the raw data publicly available enables other researchers – in particular those skeptical of the original authors’ claims – to run their own analyses and check whether the results are robust. It seems to me that, once publication of the raw data becomes common practice, we have all we need to guard against bias in the choice of analysis paths, without giving undue weight to the outcome of the one analysis method that a research team happens to preregister.
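As one way to implement such a robustness check, here is a hedged, multiverse-style sketch in Python. Everything in it is an assumption made for illustration: the synthetic data stand in for a real, publicly shared dataset, and the analysis paths (two outlier rules crossed with four covariate sets) stand in for whatever set of paths is actually defensible for a given question.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; a real check would load the shared raw data.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "skin_tone": rng.uniform(0, 1, n),
    "position": rng.choice(["defender", "forward"], n),
    "league": rng.choice(["A", "B", "C"], n),
})
df["red_cards"] = rng.poisson(0.2 + 0.1 * df["skin_tone"])

# Enumerate the (assumed) equally justifiable analysis paths:
# outlier handling crossed with covariate choices.
covariate_sets = ["", " + position", " + league", " + position + league"]
outlier_rules = [
    lambda d: d,  # keep all observations
    lambda d: d[d["red_cards"] <= d["red_cards"].quantile(0.99)],  # trim extremes
]

estimates = []
for covs, rule in itertools.product(covariate_sets, outlier_rules):
    fit = smf.poisson("red_cards ~ skin_tone" + covs, data=rule(df)).fit(disp=0)
    estimates.append(fit.params["skin_tone"])

# Robustness question: do all paths point the same way, or does the
# answer hinge on one arbitrary choice?
print(f"{len(estimates)} paths; skin_tone estimates range "
      f"{min(estimates):.3f} to {max(estimates):.3f}")
```

If the estimates agree in sign and rough size across all paths, preregistering one of them would have added little; if they do not, no single preregistered path should settle the question.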

