Complexity of science v. #PSprereg?

I have written about a number of issues concerning the practice of science out of concern that the present narrative is unbalanced: I believe that science is doing very well, even in our fields, despite the problems many have identified. One essay, albeit aimed at all of science, is found in the recent PNAS article Scientific progress despite irreproducibility: A seeming paradox (with my co-authors Katy Börner and Stephen M. Stigler).

Of my various proposals, one has sparked an especially large number of negative reactions: The argument that pre-registration, in many and possibly most cases, has more costs than benefits. The issue is quite complex, with good arguments on both sides. However, some advocates take on a somewhat evangelistic tenor, strongly suggesting that most scientists most of the time ought to employ this methodology. I therefore think it important to present the contrary case, in an effort to balance the story.

My arguments are rooted in a fundamental conception about the way science works, one that I believe becomes a misconception in the hands of many arguing for pre-registration. I think it clear that science starts with data, and data lead to hypotheses, theories, quantitative models, new tests, new studies, and further data—science is post-hoc, not a priori. Furthermore, the universe we inhabit is infinitely complex, much of it unknown to scientists, and progress occurs when new and unexpected data are discovered. Scientists are not omniscient and generally do not and cannot accurately anticipate the results of the very studies that move science forward. Pre-registration at least implicitly assumes the universe is known, and is designed to affirm what is already known or claimed. This theme will play out in the following essay.

A second theme, equally important, is what I see as an incredibly oversimplified view of the way science operates that seems to underlie arguments for pre-registration: The data we observe in our experiments arise from an extremely complex mixture of processes, and even in the best-controlled studies they differ because individuals differ, employ differing processes, and choose differing strategies. We should not be asking whether an effect or a process is valid or true. Rather, we should be trying to discover which effects occur in which settings and with what sizes, and ask what the primary cognitive and behavioral processes are that operate in that range of settings. The question “Is an effect present?” is misguided, since every variable manipulated has some effect, and even small effects can be important (as when a manipulation designed to affect voting has a minuscule effect but, operating across a population of millions, can swing an election).
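
To make the scale concrete, using purely hypothetical numbers: a manipulation that shifts the choices of only 0.1% of voters, applied across an electorate of 10 million, moves 0.001 × 10,000,000 = 10,000 votes, more than the winning margin in many close elections.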

I do not want to imply that all scientists in our fields should operate as I tend to do, exploring patterns of results over many conditions and studies, and using quantitative modeling to try to capture the main processes at work. Simpler studies targeting ‘single’ effects can also be important and informative, but should be interpreted in the context of the complex mixture of effects and processes that are always at work.

There are many forms of registration and pre-registration, some being employed now and some that could be employed in the future. Some require review by third parties, but I will only consider versions in which the scientist places the plans for a yet-to-be-carried-out experiment in a publicly accessible archive. Identifying the situations in which different versions will produce a gain for science rather than a loss is a complex matter.

Let me remark that I do not see a sharp dichotomy between exploratory and confirmatory science, so I intend the following remarks to apply to all the science we do, though in varying degrees. Consider the ‘trial and error’ basis for scientific research. We plan a study as best we can, collect data, and are often surprised by the results. Many times the results are not scientifically interesting, except that they show the design or execution was poor or inadequate. For example, there may have been a programming error, a poor analysis method, a critical control condition that was omitted, an inappropriate range of control variables, a poor set of instructions, choices of design that produce results providing no useful information (e.g., all conditions produce data at ceiling or floor), and much more along these lines. The scientist then produces a redesign and tries again. This process may iterate several times before a good experiment emerges, and only then might the results and analyses deserve submission for publication. At what point in this iteration is it worth archiving the plans for the next study? Other times the results are surprising and, although not publishable as they stand, lead to a new line of investigation. Pre-registration seems misplaced in such scenarios.

Some methodologists accept different versions of this reasoning for exploratory science but argue for pre-registration mainly, or only, for confirmatory science. I believe this distinction is debatable. Consider a strong version of confirmatory science: A result or theory has been proposed and published, and a scientist carries out a new study to assess the validity of the result or theory. A little thought reveals this is not a yes-or-no issue. All theories are wrong, and all results are context dependent, applying in certain situations and not others. Confirmation is thus a ‘fuzzy’ concept, not easy to define. The simplest and strongest form of confirmation would be a demonstration that a study identical to the original produces a result ‘close’ to the original (putting aside for the moment how close counts as ‘close’, itself difficult to define, especially when a result is a multi-dimensional pattern). The problem of course is that it is impossible to carry out a perfect replication—there are always differences, some of which could be important. Thus even in its strongest form confirmation is actually a matter of generalization. In that sense, confirmation is actually exploration of a new part of the empirical space. At the very least, the strength of arguments for pre-registration of replication studies must depend on the degree to which the replication approximates the original study, and on the degree of correspondence of the two sets of results, but these are not factors that can be established by some arbitrary statistical criterion. Good judgment is needed.

Many arguments for pre-registration are based on the imperative that decisions about analysis and interpretation should be made prior to seeing the results. This imperative is embedded in many forms of Bayesian inference, with a strong distinction between the prior probability, based on information available before the study, and the likelihood, based on the data (something I have argued is not always appropriate in the face of vague information, as is often the case in science; see the paper by Pothos, Shiffrin, and Busemeyer, 2014). Pre-registration is therefore supposed to keep scientists ‘honest’ and prevent HARKing, p-hacking, and analyses and interpretations decided after seeing the data.
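
For readers less familiar with the terminology, the distinction being invoked is the standard factoring in Bayes’ rule, p(hypothesis | data) ∝ p(data | hypothesis) × p(hypothesis): the prior, p(hypothesis), is set before any data are seen, while the likelihood, p(data | hypothesis), is the only place the observed data enter the inference.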

However, science is post-hoc, and I believe it is absolutely correct for a scientist to look at the data and then decide how to proceed. This essay is word-limited and far too short to delve into the issue in detail, but let me take a simple example raised by Joachim Vandekerckhove: deciding which subjects and which data to discard. One can specify the criteria for doing so in advance, but I do not believe this is sensible. Even for simple one-dimensional data, one would want to look at the distribution of data (across people, conditions, trials) and trim data based on points that are clearly deviant from most of the distribution. (Of course this might discard data that are valid but produced by a long-tailed process, but it is far more likely such deviant data are produced by different processes and possibly errors.) Perhaps one could try to specify how one might deal with all possible distributional forms, but this would be almost impossible. If, for example, there were a clearly bi-modal result with half the data completely deviant from the other half, one would likely want to analyze each half separately. Establishing all such possibilities in quantitative detail would be impossible, and this difficulty occurs even for simple one-dimensional data. The difficulties are immensely larger when the data are multi-dimensional, across many conditions and measures. Even more critical, no amount of cogitation and foresight could anticipate new results that prior theories and data would not have suggested, or would even have ruled out; a recent example from physics is dark matter and dark energy.
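
To make the trimming example concrete, here is a minimal sketch in Python, using hypothetical simulated data (it is illustrative only, not a procedure anyone has proposed): a trimming criterion fixed before data collection behaves sensibly for the kind of distribution the planner anticipated, but says nothing useful about a qualitatively different, bimodal result.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def trim_3sd(x):
        # A typical pre-specified rule: drop points more than 3 SDs from the mean.
        z = (x - x.mean()) / x.std()
        return x[np.abs(z) < 3.0]

    # Case 1: the anticipated situation -- one cluster of (say) response times
    # plus a few extreme stragglers.  The fixed rule removes the stragglers.
    anticipated = np.concatenate([rng.normal(500, 50, 990),
                                  rng.normal(5000, 500, 10)])

    # Case 2: an unanticipated bimodal result, e.g. two sub-groups of
    # participants using entirely different strategies.  The two modes inflate
    # the standard deviation, so the fixed rule flags nothing at all.
    unanticipated = np.concatenate([rng.normal(400, 30, 500),
                                    rng.normal(1200, 60, 500)])

    for name, x in [("anticipated", anticipated), ("unanticipated", unanticipated)]:
        print(name, "kept by the pre-specified rule:", trim_3sd(x).size, "of", x.size)

In the bimodal case the fixed rule removes nothing, yet the data plainly come from two different processes and are better analyzed separately, a decision that only becomes apparent after inspecting the distribution.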

The example of data trimming is quite trivial, but it illustrates the need to use the data to guide all aspects of the analysis and the inferences drawn from them. Let us say we observe an unexpected finding that could be important, and publish it along with a new theory to explain the processes involved. Let us further assume that the scientist overstates the reliability of the result, or pretends that the new theory predicted the result and is thereby confirmed by it. The benefit is letting other scientists know that something new and potentially important ought to be considered and perhaps followed up with further research. What is the harm? Scientists are skeptical and know that many publications misstate and mislead; knowing this, they use their judgment to decide which results and theories are promising enough to pursue. Methodology designed to promote validity rather than promise, one justification for pre-registration, would likely stifle progress.

My bottom line: Science is immensely complex, with many avenues toward progress, depending on goals, domain, setting, resources, and much more. Simple methodological rules intended to apply generally, or even most of the time, are misplaced and likely to do more harm than good. Good scientific judgment is needed to deal with the complexities of science and with continual new discoveries. Of course those who wish to use pre-registration should be free to do so, whether it makes them feel better, increases the probability that a result or a null result will be accepted for publication, or increases the reader’s belief in the importance of using and following up the findings and conclusions; these and other benefits indeed provide reasons for its use. However, I believe there are strong arguments against general use, some of which I have hinted at in this brief essay.



1 Comment

  1. Shiffrin’s Complexity of science v. #PSprereg?

    Thanks, Richard, for this insightful essay. By way of introduction, I am a retired computer scientist with a career split between mainstream CS and bioinformatics. I spent about 20 years working with biologists as an expat from my native field of CS. This gave me the opportunity to observe biologists at work without the cultural biases that come from formal training in their field. Of course, I brought cultural biases from my own training that no doubt affected the aspects of biological practice I found noteworthy.

    Your description of the scientific process resonates with my experience in biology. Biology experiments are technically difficult. A student can easily spend a year or more getting an experiment set up and working robustly. By then, the original hypothesis that started the student down the path is long forgotten. The “hypothesis” that makes it into print is a statement that can be demonstrated by the experimental method the student has struggled to master. On top of this, there’s the whole genomics thrust where people don’t even claim to be doing “hypothesis-driven science” but rather use the term “discovery science”.

    The scientific process that I see in biology is about the accumulation of data. The statements called “hypotheses” are attempts to draw succinct conclusions from the data. Authors usually put the “hypothesis” in the introduction (and abstract) of their research papers and then repeat the “hypothesis” with more detail in the conclusion. Students are trained to read papers from the inside out, starting with the experimental methods and data, then working out to the introduction and conclusion. They are trained to critically evaluate the methods and data and to regard the stated “hypothesis” as spin, a narrative usually informed by the professor’s research agenda. The real science is the stuff in the middle; the narrative connects the results to the overall flow of the field.

    My sense from the replication crisis and preregistration literature is that some scientists feel a great angst about the validity of results in their field. Every scientific field has flaws, as do all human endeavors. My experience in biology is that the flaws are deeper than statistics. The validity problems in biology reflect problems in experimental methods, such as the use of measurement techniques that don’t actually measure what people imagine they do; biological “models”, e.g., specialized mice, that are supposed to mimic a human disease or physiology but don’t; analysis methods that are just plain wrong; and more. The focus on statistics and replication and preregistration distracts from the need to identify and fix the deeper flaws.

    That said, I feel strongly that biology is making progress. I can’t prove it, but I can point to advances in medical treatment across a wide range of diseases, as well as improvements in agriculture. It may be true, as Ioannidis argued in 2005, that “Most Published Research Findings Are False”, but enough are true to allow progress.