To create social good, psychology needs credible evidence

Authors: Patrick S. Forscher*, Simine Vazire*, and Farid Anvari*

In his 1969 address, former APA president George Miller issued a challenge: psychologists should “give psychology away” by using their science to solve social problems. Miller argued that psychology is relevant to everything that people do, giving it enormous potential to create social good.

Yet Miller also noted a yawning gap between psychological science and the real-world social problems it seeks to solve. Whereas “harder” disciplines like physics had created a clear division of labor between the scientist and the practitioner (e.g., the physicist vs. the engineer), psychology had not.1 The result was that, whereas physics had demonstrable working technologies based on the principles it had uncovered, such as electricity, cars, and airplanes, psychology had no such technology. These comparisons to physics undoubtedly had a peculiar resonance for Miller’s audience: psychology has long aspired to be taken as seriously as its harder cousins.

More than fifty years later, the world faces a global crisis of enormous scope. Moreover, because the responses to the crisis depend on the coordinated action of many people, this is a crisis that psychological scientists ought to be particularly suited to address. Is this the time that psychological scientists can save countless lives and prove that the discipline is ready for application?

Maybe not.

Sound applications require sound science. Unfortunately, the fifty years since Miller’s speech have revealed that the foundations of many areas of psychological science are not as sound as we might like. Many of the problems that afflict the credibility of our evidence were known, at least in part, in Miller’s time, but they have received renewed attention as part of psychology’s current movement to re-examine its evidence base. In the next section of this post, we focus on three of these problems: poor replicability, unknown generalizability, and unknown measurement validity. These are just a subset of the many challenges psychology faces, which also include weak theory, dubious causal inference, and a lack of engagement with the stakeholders and communities affected by our research. These problems can and should affect our belief in the credibility of our science. Moreover, they raise the serious possibility that many direct applications of psychological findings to novel situations, like the COVID-19 pandemic, are unwise and, in the worst case, disastrous. In the second half of this post, we describe a system we call the Evidence Readiness Levels (ERLs) that can help us clearly identify and communicate which findings are ready for application and what needs to be done to move up the evidence readiness ladder.

Threats to psychology’s credibility: Replicability, generalizability, and measurement validity

Plenty of research shows that at least some findings in psychology lack replicability. Most findings have not been subjected to direct replication attempts, so we don’t know which findings are replicable, nor which may be directionally replicable but misleading because the size of the reported effects is inflated by questionable research practices. Even when we are confident about the existence and magnitude of an effect, our designs do not typically rule out the possibility that the effect is produced by idiosyncratic and theoretically-irrelevant aspects of the sample, stimuli, operationalizations, or settings. In one particularly compelling demonstration of this problem, fifteen research teams designed and executed studies to answer five original research questions. The resulting effects varied so widely in magnitude and direction as to raise questions about the validity of conclusions based on any single research paradigm.
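
To make the inflation problem concrete, here is a minimal R simulation of our own (the numbers are illustrative assumptions, not estimates from any of the studies cited above). When only the studies that reach statistical significance get reported, the average reported effect overestimates the true effect.

set.seed(1)
true_d <- 0.2      # small true standardized effect (assumed)
n <- 30            # participants per group in a typical small study (assumed)
sims <- replicate(10000, {
  x <- rnorm(n, mean = true_d)   # "treatment" group
  y <- rnorm(n, mean = 0)        # "control" group
  d_obs <- (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)
  c(d = d_obs, sig = t.test(x, y)$p.value < .05)
})
mean(sims["d", ])                      # close to 0.2: all studies together
mean(sims["d", sims["sig", ] == 1])    # roughly 0.5: significant studies only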

Even when we know a finding is replicable, we do not necessarily know whether it generalizes to other contexts. This knowledge is crucial if we want to claim that our findings apply to a setting of interest. For example, are the results specific to Western, Educated, Industrialized, Rich, and Democratic (WEIRD) cultures, or would they hold up for different populations of people? Does a lab-based finding in which, for example, stressful experiences are mimicked by a stressful laboratory task generalize to actual stressful experiences in everyday life? The challenge of establishing that our results can be applied to other populations, concepts, or contexts is enormous.
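
As a toy illustration of the generalizability problem (every number below is an assumption we made up for this sketch), suppose the true effect of some manipulation varies across settings. A precise estimate from the one setting we happened to study then says very little about the others, and in some settings the effect may even reverse.

set.seed(2)
n_settings <- 50
mean_effect <- 0.2    # average effect across settings (assumed)
heterogeneity <- 0.2  # SD of the effect across settings (assumed)
setting_effects <- rnorm(n_settings, mean = mean_effect, sd = heterogeneity)
setting_effects[1]         # the one setting we studied
range(setting_effects)     # what the effect looks like elsewhere
mean(setting_effects < 0)  # share of settings where the effect reverses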

Another major obstacle to confidence in our results is the threat of poor measurement validity. While replicability and generalizability have received the lion’s share of research attention, poor measurement validity can seriously undermine the credibility of the conclusions we can draw from a piece of research. In social psychology, many measures are created ad hoc with no supporting validity evidence. Even when measures are validated, the validations are often narrow, addressing one aspect of construct validity but ignoring others. Yet when we tell non-experts that we’ve measured something, they’re likely to assume that we’ve used validated measures, that we’ve truly measured the thing we say we’ve measured, particularly given our authority as quantitative scientists. Without validity evidence, we just don’t know. We usually don’t even know whether the results of a study would remain the same if a different measure of the same construct were used. Without evidence of measurement validity, we can’t confidently apply our insights in a crisis situation without seriously considering the potential for negative consequences.
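
Validity is broader than reliability, but reliability alone already shows how much measurement quality matters. The sketch below applies Spearman’s classic attenuation formula; the correlation and reliability values are assumptions chosen purely for illustration.

# observed correlation = true correlation x sqrt(reliability_x * reliability_y)
attenuated_r <- function(true_r, rel_x, rel_y) {
  true_r * sqrt(rel_x * rel_y)
}
attenuated_r(true_r = 0.5, rel_x = 0.9, rel_y = 0.9)  # 0.45 with well-validated measures
attenuated_r(true_r = 0.5, rel_x = 0.6, rel_y = 0.6)  # 0.30 with noisy ad hoc measures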

Assessing the costs and benefits of real-world applications

The three issues we reviewed clearly undermine psychology’s credibility, and there are many more threats than we had space to review here. These threats are devastating for our prospects of rising to Miller’s challenge of giving psychology away by creating “psychological technology”.

To see why, consider the perspective of a policymaker. Policymakers operate in a resource-constrained world. “Resource” here means money, of course, but it also means attention, human resources, and time. For this reason, most policymakers perform a formal or informal cost-benefit analysis: how can I get the greatest policy benefit at a cost that my limited budget, staffing, and time can absorb? This analysis must also consider opportunity costs (does pursuing one policy constrain resources enough that it closes the door to other policies with similar or greater benefits?) and side effects (does the policy have unintended costs when deployed in real world settings?). From the perspective of this policymaker, doubts about the expected benefits of a policy due to poor replicability, generalizability, and measurement validity are a veritable kiss of death.
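
A toy expected-value calculation makes the point (every number here is an assumption invented for illustration): the same policy can flip from worth funding to not worth funding purely as a function of how much the policymaker can trust the underlying finding.

benefit_if_real <- 1000000   # value created if the finding holds in this context (assumed)
rollout_cost    <- 400000    # cost of implementing the policy (assumed)
expected_net <- function(p_finding_holds) p_finding_holds * benefit_if_real - rollout_cost
expected_net(0.9)   #  500000: worth funding if the evidence is credible
expected_net(0.3)   # -100000: not worth funding if the evidence is shaky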

How can we do better?

The challenges psychology faces on the road to application are formidable. Yet we see potential for psychology to make some improvements so that it can meet and overcome these challenges.

First, psychologists can clearly and transparently communicate areas of uncertainty that may obstruct the path to application. One way to communicate this uncertainty accurately is to use systematic frameworks to judge the application-worthiness of our findings. We have proposed one such framework, which we call the Evidence Readiness Levels (ERLs). The ERL framework outlines a series of steps to follow from theory to application and highlights areas of uncertainty at each Evidence Readiness Level. These steps are by no means exhaustive, nor do we have any illusions that our framework is perfectly applicable to all subdisciplines. However, we believe the framework can help highlight gaps between research and application, which helps practitioners accurately weigh the costs and benefits of potential evidence-based policies.
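
To give a purely hypothetical flavor of what such a framework asks of a finding, the R sketch below records whether the evidence behind a claim covers a few key steps on the way from theory to application. The fields are our own illustrative placeholders, not the published ERL definitions, and a real assessment involves far more than a checklist.

finding <- list(
  claim              = "Intervention X increases behavior Y",
  direct_replication = TRUE,    # has the basic effect been independently replicated?
  validated_measures = TRUE,    # are the key measures validated for this use?
  field_test         = FALSE,   # has it been tested outside the lab?
  target_population  = FALSE    # has it been tested in the population of interest?
)
all(unlist(finding[-1]))   # FALSE: by this (hypothetical) standard, not yet application-ready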

Second, psychologists can systematically improve the quality of the evidence base. We see particular promise in “team science” approaches that pool the resources of multiple labs into one large study. These team-based approaches vastly scale up the resources that can be invested in a single study beyond what could be easily achieved by one lab. The larger scale of resources allows an idea to be tested in a variety of settings, cultures, and measures. From the perspective of the ERL framework, this rigorous testing allows ideas to rapidly ascend to higher levels of evidence readiness.
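
A quick power calculation in base R shows why (the effect size and sample sizes are assumptions): a small effect that a single lab has little chance of detecting becomes easy to detect when twenty labs pool their data.

effect <- 0.2   # small standardized mean difference (assumed)
power.t.test(n = 40,  delta = effect, sd = 1)$power   # one lab, 40 per group: ~0.14
power.t.test(n = 800, delta = effect, sd = 1)$power   # 20 labs pooled: ~0.98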

Third, psychologists can embrace the role of “organized skepticism” (Merton, 1942) in identifying flaws in past evidence. Organized skepticism is one of the central norms of science precisely because the ideas that survive such skepticism are tested with sufficient severity that they can be trusted. Here we believe the entire scientific ecosystem can play a role. Publishing scientists can actively seek criticism during the research process, either by submitting their articles as Registered Reports, or by implementing their own “internal review” / “red team” system to directly incentivize outsiders to spot flaws in their own projects. Journal editors and reviewers can ensure that published articles clearly identify constraints on generality and other areas of uncertainty, perhaps using a framework like the ERLs. Individual scientists can make a practice of conducting post-publication peer review using, for example, commenting platforms like PubPeer, Twitter, and hypothes.is. The building blocks of a highly robust system of organized skepticism are already present in our scientific ecosystem – we just need to use them.

Conclusion

The challenges of applying psychological findings are formidable. But the mere difficulty of these challenges does not absolve scientists from dealing with the messiness of the “real world”. As Miller noted in his address, “difficulty is no excuse for surrender”. Ensuring that psychology fulfills its great potential will require hard work, honesty, and a dash of humility about what we do and do not understand.

—–

*All authors contributed equally. We determined order with the following R code:

seed <- 3333
set.seed(seed)
order <- sample(c("simine", "farid", "patrick"))
print(order)

1 Some fields of psychology (e.g., parts of clinical, educational, community, and industrial/organizational psychology) are more applied than others: they develop interventions based on research findings and then implement and evaluate them in the field.

Authors

  • Patrick Forscher

    Patrick Forscher studies social disparities and what to do about them. He also has strong interests in statistics and research methods and open and reproducible science.

  • Simine Vazire

Simine Vazire is a social and personality psychologist who studies how self-perception and self-knowledge influence one's personality and behavior. She is also interested in research methods and factors that affect the validity and replicability of psychological research.

  • Farid Anvari

    Farid Anvari researches the impact of hierarchical decision structures on performance in explore-exploit search tasks and how competition affects performance and search behavior. He also works on the impact of psychology’s replicability crisis on people’s trust in psychological science and on finding methods to determine the smallest effect size of interest for specific research questions.


