Picture this. After a long and effortful process of gathering data for your latest experiment, you sit down in front of the computer, coffee in hand, ready to analyze the results. Before you dive into interpreting your analyses and seeing whether your predictions came to fruition, you decide to check the assumptions for your statistical test of choice. You’re using a linear regression model, so you test whether the residuals are normally distributed using a Shapiro-Wilk test. You get a p-value above .05, so you conclude your residuals are normal and move on. Yay! But here’s a question: if you were to make a graph plotting the residuals, which of these plots do you think it could look like?
Trick question! The answer is all of them. Despite all of these graphs looking non-normal on visual inspection, all of them would be interpreted as “normal” if all you had was the results of a Shapiro-Wilk test. In fact, all of the graphs were specifically created by simulating non-normal distributions.
This example illustrates one issue with the traditional approach researchers take to testing statistical assumptions: sometimes you get false negatives, concluding that an assumption is not violated when it really is. In this case, conducting a statistical test of our assumption alone is misleading; it is only when we check the assumption, here by using a graph, that we get the real picture.
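This false-negative problem is easy to reproduce in a few lines. Here is a minimal sketch of my own (not from the article), using SciPy’s `shapiro`: with small samples, the Shapiro-Wilk test only rarely flags data drawn from a clearly non-normal (uniform) distribution.

```python
# Sketch: how often does Shapiro-Wilk detect non-normality in small samples?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_sims, n = 1000, 15
rejections = 0
for _ in range(n_sims):
    sample = rng.uniform(size=n)   # uniform data: decidedly non-normal
    _, p = stats.shapiro(sample)
    if p < 0.05:
        rejections += 1

power = rejections / n_sims
print(f"Flagged non-normality in {power:.0%} of small samples")
```

With samples this small, the test misses the violation most of the time, which is exactly the situation in the opening example.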
A recent article published in the Psychonomic Society’s journal Behavior Research Methods shows the importance of assumption checking while conducting statistical analyses, beyond just assumption testing. Most statistical methods have underlying assumptions, such as normality in linear regression models. Researchers who want to make sure these assumptions are met before beginning their analyses often use null hypothesis significance testing, or NHST. However, author Itamar Shatz, pictured below, argues that relying solely on NHST can lead to several problems.
What kinds of problems? In addition to the false negatives we already discussed, NHST can also lead to false positives (concluding there is an assumption violation when there really isn’t). Another issue is the false binary used to interpret the results of assumption testing. These tests are usually read against a strict threshold (typically p < 0.05) that says an assumption is either met or not. What this doesn’t tell you is the type or magnitude of the violation, a problem of limited descriptiveness, and knowing those details might lead you to make different decisions about how to proceed with your analyses.
Another problem, concerning NHST more broadly, is misinterpretation. For example, a p-value might be read as an effect size, so a very small p-value may be taken to indicate a large assumption violation, even when that is not the case. Finally, these assumption tests often have assumptions of their own, and the tests themselves can fail when those assumptions go unchecked.
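To see why a tiny p-value is not the same thing as a large violation, consider another sketch of my own (the simulation choices here are illustrative, not from the article): with a few thousand observations, a Shapiro-Wilk test rejects decisively even when the actual departure from normality, measured here by sample skewness, is modest.

```python
# Sketch: a large sample turns a modest deviation into a tiny p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

n = 3000
# Nearly normal data with a faint skewed component mixed in
x = rng.normal(size=n) + 0.5 * rng.exponential(size=n)

_, p = stats.shapiro(x)
skew = stats.skew(x)
print(f"Shapiro-Wilk p = {p:.1e}, sample skewness = {skew:.2f}")
```

The p-value here is vanishingly small, yet the skewness is mild; reading the p-value as an effect size would badly overstate the violation.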
What can be done about these problems? According to Shatz, creating visualizations and looking at numerical effect sizes can be helpful when checking assumptions. One visualization method presented in the article is the lineup protocol, which involves simulating similar distributions that don’t contain the assumption violation and seeing whether other people can tell the difference between the original and simulated plots. If few people can pick out the original plot, the violation is likely not very severe. Other ways to use visualizations in your assumption checking include looking at example plots with known assumption violations to calibrate your expectations and showing your plots to unbiased experts to help interpret them.
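The lineup idea is straightforward to prototype. Below is a hypothetical sketch (the variable names and simulation choices are mine, not the article’s): the real residuals are hidden among panels simulated under the normality assumption, and a viewer would then be asked to pick the odd one out.

```python
# Sketch of a lineup: hide the real residuals among simulated null panels.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real residuals (skewed, so the assumption is violated)
real_residuals = rng.exponential(size=50) - 1.0
n_panels = 20

# Null panels: residuals simulated under the normality assumption,
# matched to the real data's scale
panels = [rng.normal(0.0, real_residuals.std(), size=50)
          for _ in range(n_panels - 1)]
panels.append(real_residuals)

# Shuffle the panels so the viewer can't guess from position
order = rng.permutation(n_panels)
lineup = [panels[i] for i in order]
true_position = int(np.where(order == n_panels - 1)[0][0])

# In practice, each panel would be plotted (e.g., as a histogram or Q-Q
# plot) and viewers asked to pick the odd one out; if few succeed, the
# violation is probably mild.
print(f"The real residuals are hidden in panel {true_position}")
```

The severity judgment then comes from people, not a threshold: the harder the real panel is to spot, the less the violation matters.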
In the article, Shatz outlines some practical recommendations for assumption checking. Here, we summarize a few:
- Keep these issues with assumptions testing in mind when doing your analysis.
- Distinguish between assumption testing (using statistical tests) and assumption checking (a broader method of assessment that can include visualizations and numerical effect sizes, along with statistical tests).
- Clearly explain the thinking behind your analysis decisions, such as what assumptions you checked, how you checked them, and why.
- Keep in mind that assumption violations are a complex spectrum, not a pass/fail binary. Consider multiple aspects of a violation, including what kind of violation, how severe it is, and how it might affect your specific analyses.
These are just a sampling of the many tips highlighted in Shatz’s article. So next time you find yourself with some fresh new data in hand, be sure to properly investigate your assumptions to maintain the integrity of your results.
Featured Psychonomic Society Article
Shatz, I. (2023). Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics. Behavior Research Methods, 1-20. https://doi.org/10.3758/s13428-023-02072-x