Avoiding Nimitz Hill with more than a Little Red Book: Summing up #PSprereg

At 1 am on 6 August 1997 Korean Air Flight 801, on approach to Guam, flew into Nimitz Hill, 6 km short of the runway, killing 228 of the 254 people on board. The approach occurred in limited visibility and while the instrument landing system was out of service. The crash was a classic example of controlled flight into terrain; an accident where nothing really goes wrong except that the aircraft is deposited in an unwelcoming place.

In the case of Flight 801, this occurred despite automated warnings that alerted the crew that they were far outside routine parameters, and despite the fact that at least 2 of the 3 men in the cockpit knew that something had gone awry. The NTSB accident report reveals considerable confusion in the cockpit during the approach, with instructions being disregarded without cross-checking and at least one team member likely aware of the impending doom.

This was far from the only tragedy involving Korean Air in the 1980s and 1990s. Nor was it the only one due to a collective failure of the crew even though at least one person in the cockpit had been aware of a problem.

And then everything changed.

Korean Air has not had a passenger fatality for more than 20 years and the single incident since 2000 involved an engine fire during takeoff. Korean Air now has a 7 out of 7 safety rating.

This dramatic improvement did not result from new “gee-whiz” technology or new equipment. It was the result of a change in culture: a revolutionary change inside a company that acknowledged its cultural legacy and then retrained its pilots in Cockpit Resource Management—the ability to exchange information and cross-check team members’ performance without regard to hierarchy or rank.

Culture is (nearly) everything in aviation safety.

Culture is also (nearly) everything in science.

Experimental psychology currently feels a little like Korean Air after the Guam crash. We are in the middle of a cultural transformation, and like all cultural upheavals, this is both exciting and unsettling.

One of the elements of this cultural change is the issue of preregistration that we have been discussing in the #PSprereg digital event.

Steve Lindsay laid out the case for preregistration beautifully, and his introductory post was followed by 5 contributions that added further nuances and critique to the issue of preregistration. My intention in this post is to contribute my own views to the discussion and to draw some connections between the diverse voices we have heard during the last week.

Cultural revolution, but with footnotes

There are several new practices that clearly improve the scientific culture in experimental psychology. All of those new practices are terrific. But they should come with footnotes rather than (just) passion.

For example, I fully support openness and transparency. Wherever possible, data—and ideally also all analysis software—should be made publicly available to researchers (or indeed anyone). I expressed that commitment by joining the Peer Reviewers’ Openness Initiative, which means that I (and the other 508 signatories) will not review a paper unless the authors either make the data publicly available or explain why they cannot do so.

But this commitment is not absolute: Openness and transparency have a price tag, and together with Dorothy Bishop, I argued in Nature a few years ago that there is a difference between being open and transparent on the one hand, and being naked and defenseless on the other. We must be open but also resilient.

The same commitment applies to preregistration: Nothing (much) happens in my lab without preregistration. All student projects are preregistered and my own studies, too, have been conducted with preregistration for the last few years.

But this endorsement is not absolute.

Preregistration is a proxy, not a panacea

A fundamental premise of preregistration is that if hypotheses, the sampling plan (i.e., clear stopping rules for data collection), and the analysis (especially discretionary decisions such as removal of outliers) are put on record before the data are collected and analyzed, then the “researcher degrees of freedom” are limited and hence questionable research practices (QRPs) such as HARKing and p-hacking are prevented.
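Putting a plan "on record" is what makes later departures from it detectable. The sketch below illustrates that idea under explicitly hypothetical names (`PreregistrationPlan`, `fingerprint`, `register`, `matches_registration`). It holds the three elements named above, serializes them canonically, hashes them with SHA-256, and stamps the digest with the time of registration. In practice a public registry keeps the record. Nothing here is meant to describe how any particular registry works.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreregistrationPlan:
    """Hypothetical record of what preregistration puts on record before data collection."""
    hypotheses: tuple[str, ...]
    sampling_plan: str               # e.g., a fixed N or an explicit stopping rule
    analysis_plan: tuple[str, ...]   # including discretionary choices such as outlier removal


def fingerprint(plan: PreregistrationPlan) -> str:
    """SHA-256 digest of a canonical JSON serialization (sorted keys, no extra whitespace)."""
    canonical = json.dumps(asdict(plan), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(plan: PreregistrationPlan) -> dict:
    """Store the digest with a UTC timestamp (a stand-in for a public registry entry)."""
    return {"sha256": fingerprint(plan),
            "registered_at": datetime.now(timezone.utc).isoformat()}


def matches_registration(plan: PreregistrationPlan, record: dict) -> bool:
    """True only if the plan is identical to the one that was registered."""
    return fingerprint(plan) == record["sha256"]


if __name__ == "__main__":
    plan = PreregistrationPlan(
        hypotheses=("Responses are faster in condition A than in condition B",),
        sampling_plan="Test exactly 60 participants per condition; no interim analyses",
        analysis_plan=("Exclude trials more than 2.5 SD from each participant's mean",
                       "Two-sided Welch t-test, alpha = .05"),
    )
    record = register(plan)
    # Switching to a different exclusion rule after seeing the data changes the digest.
    revised = replace(plan, analysis_plan=("Exclude trials more than 2 SD from each participant's mean",
                                           "Two-sided Welch t-test, alpha = .05"))
    print(matches_registration(plan, record))     # True
    print(matches_registration(revised, record))  # False
```

The hash does not judge whether a deviation is justified. It only makes the deviation visible, which is all the temporal-order argument requires.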

This is indisputably true.

One simply cannot (honestly) run additional subjects after the first t-test shows p < .06 if the sampling plan has been specified a priori. One simply cannot pretend that one’s theory “predicted” that only female participants without college education would be responsive to the experimental intervention when the preregistered hypotheses did not mention this outcome.
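A small simulation shows why running additional subjects until the test "works" is a QRP. This is a sketch under stated assumptions. Both groups are drawn from the same normal distribution, so the null hypothesis is true. A z-test with known unit variance stands in for the t-test mentioned above. The schedule of interim looks (20, 30, ..., 100 participants per group) is hypothetical, and so are the function names.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p(a, b):
    """Two-sided p for a mean difference, assuming known unit variance in both groups.

    A simplification standing in for a t-test."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def study_is_significant(rng, peek, n_start=20, n_step=10, n_max=100, alpha=0.05):
    """Simulate one two-group study in which there is no true effect.

    peek=False: a single test at n_max per group (sampling plan fixed in advance).
    peek=True: test at n_start, n_start + n_step, ..., n_max; stop at the first p < alpha.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
    looks = (list(range(n_start, n_max, n_step)) + [n_max]) if peek else [n_max]
    # any() stops at the first significant look, which is exactly optional stopping.
    return any(z_test_p(a[:n], b[:n]) < alpha for n in looks)


def false_positive_rate(peek, n_sims=2000, seed=1):
    """Share of simulated null studies that end with p < alpha."""
    rng = random.Random(seed)
    return sum(study_is_significant(rng, peek) for _ in range(n_sims)) / n_sims


if __name__ == "__main__":
    print("fixed N:          ", false_positive_rate(peek=False))
    print("optional stopping:", false_positive_rate(peek=True))
```

Both runs use the same seed and therefore the same simulated data. Every peeking study also includes the final look at `n_max`. As a result, the optional-stopping rate can never fall below the fixed-N rate. Under these assumptions the fixed-N rate should sit near the nominal .05, and the optional-stopping rate should be clearly above it.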

Preregistration is therefore sufficient to prevent several QRPs. That makes it advisable, and it is why I preregister virtually all my studies.

But it would be a logical fallacy to suggest that preregistration is also necessary to avoid QRPs. I believe that much of the confusion, and many of the dissenting voices, are at least tacitly connected to this logical fallacy.

The paradox of predictivism

Preregistration enforces a temporal order between hypothesizing and interpreting the data. In so doing, it prevents hypotheses from being developed on the basis of the results, which is considered bad practice for very good reasons. (As the Texas sharpshooter example vividly illustrates.) But temporal order ultimately does not matter—it is merely a proxy variable for the distinction that really matters, namely between justified and arbitrary hypotheses and analyses.

In philosophy of science, the question whether temporal order matters in establishing the validity of hypotheses is known as the paradox of predictivism. The paradox is between our strong intuition that a theory receives more support from a successful prediction of a novel, hitherto unknown, result, and an equally strong intuition that it should not matter if at the time a hypothesis is formulated, the results were already known to a tribal chieftain in New Guinea but no one else. The notion that the mental and social history of a researcher—e.g., whether she has traded cellphone numbers with the chieftain—should impact the degree of support that a theory receives from a finding seems untenable.

Instead, a strong theme in philosophy of science now holds that the degree of support a theory receives from having “predicted” outcome X is proportional to the strength of independent support for the theory. In a nutshell, if a theorist does not appeal to outcome X to motivate her theory, then successful observation of X supports the theory irrespective of when that observation was gathered. What matters, therefore, is epistemic novelty, not an arbitrary temporal order. Those are not just abstract musings: most physicists consider the success of Einstein’s general theory of relativity in accounting for a known anomaly in the orbit of Mercury as highly confirmatory, even though the anomaly had been known for a long time. Einstein constructed his theory based on principles and constraints that had nothing to do with empirical data about Mercury.

So yes, preregistration enforces a temporal order that ensures that the theory being tested has been formulated independently of the subsequent discovery of X. But it does not follow that Einstein’s theory of relativity could have found support from accommodating the orbit of Mercury only if Einstein had lived 100 years earlier, before the anomaly was known.

If a hypothesis is entailed by an independently formulated theory, it does not matter when the theorist becomes aware of the supporting results.

In a nutshell: Preregistration is a convenient proxy, but not a necessary condition to avoid HARKing. Likewise, preregistration is a convenient way to ensure that your data analysis was not guided by the desired outcome. But omitting preregistration does not imply you HARKed or decided on the analysis only because then p < .05.
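A small simulation makes concrete what it means for an analysis to be "guided by the desired outcome." The simulation rests on several assumptions. There is a hypothetical set of four outlier-exclusion rules, each of which might seem defensible. The data are normally distributed with no true group difference. A Welch-type statistic is referred to the normal distribution, a large-sample approximation standing in for a t-test. The sketch compares committing to one rule in advance with picking whichever rule yields the smallest p.

```python
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()
# Hypothetical exclusion rules: None keeps all data; a number k drops values
# more than k sample SDs from the sample mean.
RULES = (None, 2.0, 2.5, 3.0)


def exclude_outliers(xs, k):
    if k is None:
        return list(xs)
    m, s = fmean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


def approx_p(a, b):
    """Two-sided p for a mean difference: Welch-type statistic, normal approximation."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def error_rates(n=50, n_sims=1000, alpha=0.05, seed=2):
    """Return (false-positive rate with the rule fixed in advance,
    false-positive rate when the best-looking rule is chosen afterwards).

    Both groups come from the same distribution, so every significant result is a false positive.
    """
    rng = random.Random(seed)
    fixed = chosen = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ps = [approx_p(exclude_outliers(a, k), exclude_outliers(b, k)) for k in RULES]
        fixed += ps[0] < alpha      # plan fixed in advance: keep all data
        chosen += min(ps) < alpha   # rule picked after seeing which one "works"
    return fixed / n_sims, chosen / n_sims


if __name__ == "__main__":
    print(error_rates())
```

The keep-all path is one of the four options, so the "chosen" rate can never fall below the "fixed" rate. How much higher it climbs depends on how strongly the paths disagree. None of this requires bad intent. It only requires that the choice among defensible paths is made after seeing their outcomes.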

Going beyond the proxy of preregistration

With those considerations in mind, we can now revisit the posts of this digital event in a new light.

  • Klaus Oberauer agrees that preregistration “demonstrates without doubt that the hypotheses and data-analysis path chosen for a study were not chosen in response to the data with an eye towards obtaining the desired results.” He then goes on to explore paradoxical consequences similar to the ones I just reviewed and comes to the conclusion that “if there are multiple equally justifiable analysis paths, we should run all of them, or a representative sample, to see whether our results are robust.” This recommendation augments, but does not negate, the utility of preregistration.
  • Richard Morey worries that the problem with preregistration “is not that preregistration is a straightjacket, but rather that it actually lulls one into thinking that methodological vice is virtue” by legitimizing one analysis (often determined by the “tyranny of the default”) over multiple other sceptical and thorough analyses. His solution is the same as Oberauer’s: encourage researchers to run multiple analyses irrespective of whether all (or indeed any) have been preregistered.
  • Danielle Navarro is predicting the anomalous orbit of Mercury. Well, not exactly, but her open notebook approach renders visible the processes behind theory construction which allow us to ascertain the degree of support a theory should receive from a finding irrespective of its temporal history (and hence irrespective of preregistration). An open notebook or a GitHub archive allows even greater scrutiny—and greater protection against HARKing—than a single preregistration.
  • Iris van Rooij provides another philosophical perspective. Much like Navarro, she cites the importance of theory and notes how concerns about preregistration are expressed primarily by “researchers [who] build on a long tradition of mathematical and computational rigor in their theory-driven science.” I believe that those concerns arise in that community precisely because rigorous theorizing is less in need of preregistration.
  • Rich Shiffrin recognizes that “many arguments for pre-registration are based on the imperative that decisions about analysis and interpretation should be made prior to seeing the results.” As we have seen, the temporal order should be seen only as a convenient proxy, not as an imperative. Accordingly, Shiffrin suggests that “it is absolutely correct for a scientist to look at the data and then decide how to proceed.” Yes, nothing speaks against that, provided the decision is guided by criteria other than obtaining a desired result—and here, pace Shiffrin, I believe that cognitive scientists know enough about human decision making to realize that this intellectual purity will be elusive even with the best of intentions. We are too good at fooling ourselves to proceed without preregistering our analysis plan or reporting a multiverse analysis.
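The multiverse recommendation shared by Oberauer and Morey can be sketched as a function that runs every analysis path and reports all of them, along with the share of paths that reach significance. The exclusion rules, the function names, the example reaction times, and the normal-approximation test are hypothetical illustrations. They are not the procedures used by either author.

```python
from statistics import NormalDist, fmean, median, stdev


def sd_filter(k):
    """Hypothetical rule: drop values more than k sample SDs from the sample mean."""
    def keep(xs):
        m, s = fmean(xs), stdev(xs)
        return [x for x in xs if abs(x - m) <= k * s]
    return keep


def mad_filter(k):
    """Hypothetical rule: drop values more than k median absolute deviations from the median."""
    def keep(xs):
        med = median(xs)
        mad = median([abs(x - med) for x in xs])
        if mad == 0:  # degenerate case: no spread to scale by, so keep everything
            return list(xs)
        return [x for x in xs if abs(x - med) <= k * mad]
    return keep


ANALYSIS_PATHS = {
    "keep all": list,
    "drop > 2.5 SD": sd_filter(2.5),
    "drop > 3 SD": sd_filter(3.0),
    "drop > 3 MAD": mad_filter(3.0),
}


def welch_z_p(a, b):
    """Two-sided p: Welch-type statistic referred to the normal distribution (an approximation)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs((fmean(a) - fmean(b)) / se)))


def multiverse(a, b, paths=ANALYSIS_PATHS, alpha=0.05):
    """Run every path; return one row per path plus the share of paths with p < alpha."""
    rows = []
    for name, keep in paths.items():
        a_kept, b_kept = keep(a), keep(b)
        rows.append({"path": name, "n": len(a_kept) + len(b_kept),
                     "diff": fmean(a_kept) - fmean(b_kept), "p": welch_z_p(a_kept, b_kept)})
    share = sum(r["p"] < alpha for r in rows) / len(rows)
    return rows, share


if __name__ == "__main__":
    # Hypothetical reaction times (ms) for two conditions.
    a = [512, 498, 530, 545, 501, 489, 520, 515, 1240, 507, 533, 495]
    b = [540, 561, 552, 538, 570, 549, 559, 566, 544, 551, 547, 1390]
    rows, share = multiverse(a, b)
    for r in rows:
        print(f"{r['path']:<14} n={r['n']:>3} diff={r['diff']:8.1f} p={r['p']:.3f}")
    print(f"share of paths with p < .05: {share:.2f}")
```

Reporting the whole table, rather than the one row that looks best, is what protects the reader. A preregistered plan can simply name one row as primary and still report the others.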

We must not fly into Nimitz Hill. We need to change our culture. But the cultural revolution should rest on more than one Little Red Book. Preregistration is a very useful driver of this revolution but, like transparency and openness, it requires nuanced understanding rather than unconditional adherence.

Intriguingly, the reform that turned around Korean Air was based on an infusion of adaptability and flexibility into the cockpit, to replace exclusive reliance on rank and uncritical execution of procedures.

 

The Psychonomic Society (Society) is providing information in the Featured Content section of its website as a benefit and service in furtherance of the Society’s nonprofit and tax-exempt status. The Society does not exert editorial control over such materials, and any opinions expressed in the Featured Content articles are solely those of the individual authors and do not necessarily reflect the opinions or policies of the Society. The Society does not guarantee the accuracy of the content contained in the Featured Content portion of the website and specifically disclaims any and all liability for any claims or damages that result from reliance on such content by third parties.


1 Comment

  1. The thousands of manuscript submissions I have seen over the last few years lead me to believe that there is a vast population of researchers who strive to make contributions to the psychological literature but who do so in ways that put them at high risk of false positives or exaggerated effect-size estimates. My contribution to this Digital Event aimed to communicate to such researchers.

    The thesis of my piece was that if you are going to conduct hypothesis-testing research then it is a good idea to work out and record your plans for a study in advance. The aim is not to limit what you do after the data are in hand, but to enable you (and others) to distinguish between your (a) a priori plans and predictions and (b) decisions that arose only in light of the data. Preregistering a research plan for hypothesis-testing research promotes transparency and, I believe, fosters replicability. Those aren’t the only aims of science, but they aren’t chopped liver.

    As things turned out, if the audience I sought has been following this Digital Event they were probably discouraged from recording a plan for their research studies before beginning data collection. Perhaps those readers will instead be inspired to take up Bayes, multiverse analysis, severe testing, computational modeling, and theory development. But probably they will stick with their standard practices. Just in case it might help, here I make another pitch for preregistration.

    In his post, Richard Shiffrin made a number of wise observations regarding the nature of science. He is a great scientist, and who could argue with his points about the complexity of science? But, with all due respect, those insights did not dissuade me from believing that many research psychologists would benefit from preregistration.

    Rich argued that because science is inherently complex “simple methodological rules intended to apply generally, or even most of the time, are misplaced and likely to do more harm than good.” It seems to me that it is precisely because science is inherently complex that it is so valuable to follow general principles of good practice, such as showing your work, using work-flows that reduce error rates, having systems to double-check data and analyses, assessing the reliability of measurement tools, taking steps to minimize experimenter effects, etc. In my view, one such general good-practice guideline is to keep and safeguard detailed records of your work, including your plans for conducting and analyzing hypothesis-testing research.

    Rich asked “What is the harm” that follows from publishing claims that turn out to be exaggerated or completely misplaced? What is the harm, for example, if it turns out that power posing does not confer dramatic lasting benefits, that grit and growth mindsets aren’t potent and malleable determinants of human performance, that passive exposure to socially laden primes does not affect gait or honesty, or that exercising self-control does not deplete a limited resource of will? I do not mean to foreclose on the status of these hypotheses, but if it turns out that these effects are not real (or that they are tiny, fleeting, and of very limited generality), then it seems to me that the harm will have been substantial.

    The ill-effects of publishing false positives and exaggerated effect-size estimates would be smaller if our error-correction practices were better. But in many areas of psychology it is common for researchers to conduct a large number of low-powered studies that afford many researcher degrees of freedom. And it is common to submit for publication only those studies from which the predicted effect was successfully wrested. Thus the psychology literature includes phenomena that appear in meta-analyses to be large and robust but that fail to materialize (or to be much smaller than originally reported) in large preregistered studies (see https://www.psychologicalscience.org/publications/replication/ongoing-projects ). (Note, btw, that psychology has many robust effects that do replicate in large preregistered studies; see, e.g., Zwaan et al., 2018).

    To turn Rich’s question around, what is the harm of preregistering a research plan? Richard Morey expressed concern that preregistering might lead researchers to analyze their data in shallow ways, slavishly adhering to the plan even if alternative analyses would be more telling. That’s possible, and certainly there is no harm in encouraging psychologists who preregister to keep their eyes and minds open when analyzing their data (which I believe was Richard’s main aim). I also think it would be harmful if people assumed preregistered=good and non-preregistered=bad (because of course it just ain’t so), but that isn’t a harm of preregistration but merely a misapprehension of it.

    As per Iris van Rooij’s thought-provoking post, it will avail us little to be rigorous in studies of trivia. Researchers must think hard and long about measurement, theory, and context if we are to develop a practically useful psychological science. Those are hard tasks, big challenges, and crucially important. I agree that there is a sense in which theory is prior to test. But I do not see that as an argument against preregistering plans when conducting hypothesis-testing research. The best-laid plans gang aft agley but that is not a reason to sally forth willy-nilly. If we had better theory we could probably do better preregistrations, but let’s not wait for theory to do what we can to reduce the frequency with which psychology publishes false positives and exaggerated effect-size estimates. That seems like a rather easy problem to address.

    I plan to continue to think through, record, and register my plans for empirical studies before I begin data collection. I believe that the benefits of doing so greatly outweigh the costs. Of course, other psychological scientists must decide for themselves whether or not (or, as per Danielle Navarro’s nuanced posting, for which sorts of projects) preregistration is worthwhile. But I hope that they will at least consider it (see, e.g., Rouder, https://jeffrouder.blogspot.com/2018/11/preregistration-try-it-to-see-if-you.html ).

    Steve Lindsay

    Zwaan, R. A., Pecher, D., Paolacci, G., Bouwmeester, S., Verkoeijen, P. P. J. L., Dijkstra, K., & Zeelenberg, R. (2018). Participant nonnaiveté and the reproducibility of cognitive psychology. Psychonomic Bulletin & Review, 25, 1968-1972. doi: 10.3758/s13423-017-1348-y