Why you should check–not just test–your statistical assumptions

Picture this. After a long and effortful process of gathering data for your latest experiment, you sit down in front of the computer, coffee in hand, ready to analyze the results. Before you dive into interpreting your analyses and seeing whether your predictions came to fruition, you decide to check the assumptions for your statistical test of choice. You’re using a linear regression model, so you run a Shapiro-Wilk test to see whether the residuals are normally distributed. You get a p-value above .05, so you conclude your residuals are normal and move on. Yay! But here’s a question: if you were to plot the residuals, which of these graphs do you think the plot could look like?

Three graphs of distributions of data compared with normal distributions. Graph A is noisy, with the distribution deviating from the normal distribution in a random way. Graph B is skewed, with a larger left tail than the normal distribution has. Graph C is bimodal, with a lack of density in the middle of the distribution. — *Which of these distributions do you think is most likely to be deemed “normal” by a Shapiro-Wilk test? Distributions are shown in blue, and the normal distribution is shown in orange. Figure adapted from Shatz (2023).*

Trick question! The answer is all of them. Although every one of these graphs looks non-normal on visual inspection, each would be interpreted as “normal” if all you had was the result of a Shapiro-Wilk test. In fact, every graph was created by simulating data from a non-normal distribution.

This example illustrates one problem with the traditional way researchers test statistical assumptions: false negatives, where you conclude that an assumption is not violated when it actually is. Here, running a statistical test of the assumption is misleading; only when we check the assumption, in this case with a graph, do we get the real picture.
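To make the graphical check concrete, here is a minimal standard-library Python sketch, offered as a hypothetical illustration rather than code from Shatz (2023). It fits a simple linear regression by ordinary least squares, computes the residuals, and prints a crude text version of a normal quantile-quantile (QQ) plot, where each row compares an observed standardized residual (`o`) with where it would sit if the residuals were perfectly normal (`|`). The helper names (`residuals_from_fit`, `qq_pairs`, `text_qq_plot`) and the simulated exponential noise are assumptions made for the example; in practice you would draw the plot with a real plotting library.

```python
import random
import statistics
from statistics import NormalDist


def residuals_from_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; return the residuals."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def qq_pairs(values):
    """Pair each sorted, standardized value with the standard normal quantile it
    would have under perfect normality (plotting position (i - 0.5) / n)."""
    n = len(values)
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    observed = sorted((v - mean) / sd for v in values)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def text_qq_plot(pairs, width=41, lo=-3.0, hi=3.0):
    """Rough text QQ plot, one row per observation: '|' = expected position under
    normality, 'o' = observed standardized value, '*' = the two coincide."""
    def column(v):
        col = round((v - lo) / (hi - lo) * (width - 1))
        return min(max(col, 0), width - 1)  # clamp values outside [lo, hi]

    lines = []
    for theoretical, observed in pairs:
        row = [" "] * width
        t_col, o_col = column(theoretical), column(observed)
        row[t_col] = "|"
        row[o_col] = "*" if o_col == t_col else "o"
        lines.append("".join(row))
    return lines


if __name__ == "__main__":
    rng = random.Random(1)
    x = list(range(20))
    # Exponential (right-skewed) noise, so the residuals are non-normal by construction.
    y = [2.0 * xi + rng.expovariate(1.0) for xi in x]
    for line in text_qq_plot(qq_pairs(residuals_from_fit(x, y))):
        print(line)
```

If the residuals were normal, the `o` marks would track the `|` marks closely; systematic drift at one end, as skewed noise tends to produce, is exactly the kind of pattern a single p-value hides.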

A recent article published in the Psychonomic Society’s journal Behavior Research Methods makes the case for assumption checking, not just assumption testing, when conducting statistical analyses. Most statistical methods have underlying assumptions, such as normality of the residuals in linear regression models. Researchers who want to make sure these assumptions are met before beginning their analyses often rely on null hypothesis significance testing, or NHST, as in the Shapiro-Wilk example above. However, author Itamar Shatz, pictured below, argues that relying solely on NHST can lead to several problems.

Photograph of author of the featured article, Itamar Shatz. — *Featured article author, Itamar Shatz.*

What kinds of problems? In addition to the false negatives we already discussed, assumption tests can also produce false positives (concluding there is an assumption violation when there really isn’t one). Another issue is the false binary created by interpreting these tests against a strict threshold (usually p < .05), which tells you only whether an assumption is met or not. What this doesn’t tell you is the type or magnitude of the violation. This limited descriptiveness is a problem in its own right, because knowing what kind of violation you have, and how large it is, might lead you to make different decisions about how to proceed with your analyses.
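Numerical effect sizes can fill in the type and magnitude that a pass/fail threshold leaves out. As one hedged example for normality, the sketch below computes sample skewness (direction and size of asymmetry) and excess kurtosis (tail weight relative to a normal distribution). These two measures are an assumption chosen for illustration, not necessarily the effect sizes Shatz recommends, and the plain moment-based formulas used here differ slightly from the bias-adjusted versions some statistics packages report. The function names are hypothetical.

```python
import statistics


def shape_effect_sizes(values):
    """Return (skewness, excess_kurtosis) using plain moment-based formulas.

    skewness > 0: longer right tail; skewness < 0: longer left tail.
    excess kurtosis > 0: heavier tails than a normal; < 0: lighter tails.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # a normal distribution scores 0 here
    return skewness, excess_kurtosis


def describe_shape(values):
    """Turn the two effect sizes into a short description of type and magnitude."""
    skew, kurt = shape_effect_sizes(values)
    if skew > 0:
        kind = "longer right tail"
    elif skew < 0:
        kind = "longer left tail"
    else:
        kind = "symmetric"
    if kurt > 0:
        tails = "heavier tails than normal"
    elif kurt < 0:
        tails = "lighter tails than normal"
    else:
        tails = "normal-like tails"
    return f"skewness {skew:.2f} ({kind}), excess kurtosis {kurt:.2f} ({tails})"


if __name__ == "__main__":
    # Hypothetical, perfectly bimodal sample: two clusters, nothing in the middle.
    bimodal = [-2.0] * 10 + [2.0] * 10
    print(describe_shape(bimodal))
    # prints: skewness 0.00 (symmetric), excess kurtosis -2.00 (lighter tails than normal)
```

A symmetric bimodal shape like Graph C shows up as roughly zero skewness with strongly negative excess kurtosis, which is more informative than "p > .05" alone; a plot is still the best way to see the two humps directly.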

Another problem, concerning NHST more broadly, is misinterpretation. For example, a p-value might be treated as an effect size, so a really small p-value is read as a large assumption violation even when that is not the case. Because p-values depend on sample size, a large sample can produce a tiny p-value for a trivial violation, while a small sample can miss a serious one. Finally, assumption tests often have assumptions of their own, so they can fail when those assumptions aren’t met.
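The gap between a p-value and an effect size can be shown with a little arithmetic. The sketch below, a hypothetical illustration rather than a test from the article, holds the effect size fixed (a sample skewness of 0.3) and computes an approximate two-sided p-value for "skewness = 0" at three sample sizes. It assumes the crude large-sample standard error of skewness, √(6/n); real normality tests use more refined approximations, so treat the numbers as illustrative only.

```python
import math


def skewness_z_test(skewness, n):
    """Approximate two-sided test of H0: population skewness = 0.

    Uses the crude large-sample standard error sqrt(6 / n); real normality
    tests rely on more refined approximations.
    """
    se = math.sqrt(6.0 / n)
    z = skewness / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of a standard normal
    return z, p


if __name__ == "__main__":
    # Same effect size (skewness = 0.3) at three hypothetical sample sizes.
    for n in (30, 300, 3000):
        z, p = skewness_z_test(0.3, n)
        print(f"n = {n:4d}: z = {z:5.2f}, p = {p:.2g}")
```

With identical skewness, the p-value sits well above .05 at n = 30 and far below it at n = 3000, so the size of the p-value says more about the sample size than about how badly the assumption is violated.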

What can be done about these problems? According to Shatz, creating visualizations and looking at numerical effect sizes can be helpful when checking assumptions. One visualization method presented in the article is the lineup protocol: you simulate similar datasets that don’t contain the assumption violation, place the plot of your real data among the plots of the simulated data, and see whether other people can tell the original apart from the simulated plots. If few people can pick out the original plot, then the violation is likely not very severe. Other ways to use visualizations in your assumption checking include looking at example plots with known assumption violations to calibrate your expectations, and showing your plots to unbiased experts who can help interpret them.
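The lineup protocol can be sketched in a few lines. In this hypothetical standard-library Python version, the real sample is hidden at a random position among 19 null samples drawn from a normal distribution with the same mean and standard deviation (that null model and the 20-panel layout are assumptions, not details taken from Shatz’s article), and each panel is printed as a one-line text histogram on a shared axis. The `lineup_p_value` helper adds a common way to quantify the result from the broader visual-inference literature: the probability that at least k of n observers would pick the real panel if everyone were guessing. All function names are illustrative.

```python
import math
import random
import statistics

LEVELS = " .:-=+*#%@"  # characters from empty to full, for crude text histograms


def make_lineup(observed, n_panels=20, seed=None):
    """Hide the observed sample among n_panels - 1 null samples.

    Assumed null model: normal data with the observed mean and SD, i.e. data
    without the suspected violation. Returns (panels, real_index).
    """
    rng = random.Random(seed)
    mean, sd = statistics.fmean(observed), statistics.stdev(observed)
    panels = [[rng.gauss(mean, sd) for _ in observed] for _ in range(n_panels - 1)]
    real_index = rng.randrange(n_panels)  # secret position of the real data
    panels.insert(real_index, list(observed))
    return panels, real_index


def sparkline(values, lo, hi, bins=12):
    """One-line histogram on a shared [lo, hi] axis so panels are comparable."""
    counts = [0] * bins
    span = (hi - lo) or 1.0
    for v in values:
        i = int((v - lo) / span * bins)
        counts[min(max(i, 0), bins - 1)] += 1  # clamp the top edge into the last bin
    top = max(counts)
    return "".join(LEVELS[round(c / top * (len(LEVELS) - 1))] for c in counts)


def lineup_p_value(k, n_observers, n_panels=20):
    """Probability that k or more of n_observers pick the real panel by chance,
    if each observer guesses uniformly at random among n_panels panels."""
    p = 1.0 / n_panels
    return sum(
        math.comb(n_observers, j) * p**j * (1 - p) ** (n_observers - j)
        for j in range(k, n_observers + 1)
    )


if __name__ == "__main__":
    data_rng = random.Random(7)
    observed = [data_rng.expovariate(1.0) for _ in range(50)]  # simulated skewed residuals
    panels, real_index = make_lineup(observed, seed=42)
    lo = min(min(panel) for panel in panels)
    hi = max(max(panel) for panel in panels)
    for number, panel in enumerate(panels, start=1):
        print(f"{number:2d} |{sparkline(panel, lo, hi)}|")
    # Show the lineup to observers first; reveal the answer only afterwards.
    print("real data was panel", real_index + 1)
    # For example, if 3 of 10 hypothetical observers picked the real panel:
    print("chance of that by guessing:", round(lineup_p_value(3, 10), 4))
```

Using one shared axis for every panel is a deliberate choice: if each panel were scaled to its own range, differences in spread and tail length, which are often the very violations you are looking for, would be hidden.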

In the article, Shatz outlines some practical recommendations for assumption checking. Here, we summarize a few:

Keep these issues with assumption testing in mind when doing your analysis.
Distinguish between assumption testing (using statistical tests) and assumption checking (a broader method of assessment that can include visualizations and numerical effect sizes, along with statistical tests).
Clearly explain the thinking behind your analysis decisions, such as what assumptions you checked, how you checked them, and why.
Keep in mind that assumption violations are a complex spectrum, not a pass/fail binary. Consider multiple aspects of a violation, including what kind of violation, how severe it is, and how it might affect your specific analyses.

These are just a sampling of the many tips highlighted in Shatz’s article. So next time you find yourself with fresh data in hand, be sure to properly investigate your assumptions to maintain the integrity of your results.

Webcomic of a group of three people meeting at a table under the banner “assumption club.” The leader of the meeting begins by saying “so, I think we all know why we’re here.” — *We all know the dangers of making assumptions… Hopefully, you now know the dangers of not checking them. Source: https://www.amiiillustrates.com/gallery*

Featured Psychonomic Society Article

Shatz, I. (2023). Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics. Behavior Research Methods, 1-20. https://doi.org/10.3758/s13428-023-02072-x

Authors

Raunak Pillai

Raunak Pillai is a doctoral candidate in the Department of Psychology and Human Development at Vanderbilt University, advised by Dr. Lisa Fazio. He studies the psychological mechanisms by which people come to believe true and false information about the world, with a specific focus on the role of memory processes.
Kelly Cotton

Kelly Cotton is a doctoral candidate in Cognitive and Comparative Psychology at the City University of New York. Her research interests include working memory, long-term memory, and how these functions differ in various populations.