I see four benefits to the use of Bayesian inference:
- Inclusion of prior information.
- Regularization.
- Handling models with many parameters or latent variables.
- Propagation of uncertainty.
Another selling point is a purported logical coherence, but I don't find that argument convincing, so I'll set it aside, just as I'll also set aside philosophical objections to the use of probability to summarize uncertainty.
We're concerned here with practicalities, not philosophy, and, although I do believe that the philosophy of statistics can be of applied importance (see, for example, Gelman and Hennig, 2017), we have enough directly practical issues to discuss.
By mentioning the above four benefits, I’m not trying to say that Bayes is the right way to go, or the only way to go, or even the best solution in particular data analysis problems in psychology or elsewhere.
Rather, by laying out these four advantages, I’d like to separate them and consider ways in which various non-Bayesian methods can be used to deliver the same or similar benefits:
- Prior information. Bayesian inference includes priors directly and easily. There are, however, ways in which non-Bayesian analyses incorporate prior information: (a) Design. It is considered acceptable and even desirable to use plausible pre-data estimates of effect sizes and variation to set design parameters and sample size. (b) Determination of Type M (magnitude) and Type S (sign) errors. We can use effect size estimates obtained independently of the data to assess reporting biases; see Gelman and Carlin (2014) and the first sketch after this list. Indeed, there are settings in which prior information is so strong that a Bayesian or a non-Bayesian analysis can reveal the futility of a classical confidence interval or hypothesis test (see, for example, Gelman and Weakliem, 2009).
- Regularization. Bayesian inference with an informative prior gives more stable parameter estimates and predictions than the corresponding inferences from least squares, maximum likelihood, or other traditional statistical procedures. Again, though, we can ask whether newer non-Bayesian methods can achieve the benefits of regularization, and again the answer is yes: methods such as lasso regression and false discovery rate analysis extend the classical ideas of point estimation and hypothesis testing to perform regularization, yielding stable inferences even as the number of parameters to estimate or hypotheses to test grows; the second sketch after this list gives a toy lasso example.
- Models with many parameters or latent variables. Making use of the rules of probability, Bayesian inference can work when the dimensionality of parameters and latent data is large. Machine learning approaches such as deep networks can give good inferences in high dimensions too, but they do so with ideas that are very close to Bayesian, relying on tools such as variational inference to average over the distribution of latent parameters; the third sketch after this list shows the mechanics of variational inference in a toy model.
- Propagation of uncertainty from inference to decision. Bayesian decision analysis requires the user to specify a cost or utility function as well as a prior distribution, but the resulting coherent process is difficult to construct using any existing non-Bayesian approach. Alternatives such as p-value thresholds do not do the job (see McShane et al., 2017). And, for that matter, approaches to Bayesian decision making that avoid explicit cost or utility functions (here I'm thinking of rules based on Bayes factors or posterior probabilities) also fail to be coherent and, in my opinion, do not have good practical value. Bayesian inference without real priors can still be useful, as it still allows propagation of uncertainty and can perform some minimal level of regularization, but Bayesian decision making without some measure of costs and benefits does not make sense to me; the final sketch after this list illustrates the expected-utility calculation.
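To make point (b) above concrete, here is a minimal sketch of the Type S and Type M calculation in the spirit of Gelman and Carlin (2014), using a normal approximation throughout; the function name, simulation settings, and illustrative numbers are mine, not taken from that paper.

```python
import numpy as np
from scipy import stats

def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=1):
    """Power, Type S error rate, and Type M exaggeration ratio for a study
    that declares significance at level alpha, given a postulated true
    effect and standard error (normal approximation)."""
    z = stats.norm.ppf(1 - alpha / 2)
    lam = true_effect / se
    power = 1 - stats.norm.cdf(z - lam) + stats.norm.cdf(-z - lam)
    type_s = stats.norm.cdf(-z - lam) / power       # significant *and* wrong sign
    est = np.random.default_rng(seed).normal(true_effect, se, n_sims)
    sig = np.abs(est) > z * se                      # draws that reach significance
    type_m = np.mean(np.abs(est[sig])) / true_effect  # average exaggeration factor
    return power, type_s, type_m

# A plausibly small true effect measured noisily: low power, high exaggeration.
power, type_s, type_m = retrodesign(true_effect=0.1, se=0.37)
print(f"power {power:.2f}, Type S {type_s:.3f}, Type M {type_m:.1f}")
```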
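Next, a toy illustration of non-Bayesian regularization, assuming scikit-learn is available; the simulated data and the penalty value are arbitrary choices for the sketch. With many null predictors, least squares scatters apparent signal everywhere, while the lasso shrinks most coefficients to exactly zero, much as a sharp prior would.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 80                                  # many predictors relative to n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]          # only the first 5 effects are real
y = X @ beta + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)              # L1 penalty; tuning value arbitrary

# Least squares spreads apparent signal across all 80 coefficients; the
# lasso zeroes out most of the null ones, stabilizing the estimates.
print("OLS, largest |coef| among the 75 nulls:", np.abs(ols.coef_[5:]).max().round(2))
print("Lasso, number of nonzero coefficients: ", int((lasso.coef_ != 0).sum()))
```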
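The third sketch shows the mechanics of variational inference in the simplest textbook setting: mean-field coordinate ascent for a conjugate normal model with unknown mean and precision. The hyperparameter values are placeholders, and real applications would of course involve far higher-dimensional latent structure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)   # simulated data, true sd 1.5
n, xbar = x.size, x.mean()

# Conjugate model: x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0*tau)),
# tau ~ Gamma(a0, b0). Hyperparameters below are placeholder values.
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Mean-field factorization q(mu, tau) = q(mu) q(tau); coordinate-ascent updates.
mu_N = (lam0 * mu0 + n * xbar) / (lam0 + n)    # mean of q(mu), fixed across passes
E_tau = 1.0                                    # initial guess for E_q[tau]
for _ in range(50):
    lam_N = (lam0 + n) * E_tau                 # precision of q(mu)
    a_N = a0 + (n + 1) / 2
    ss = np.sum((x - mu_N) ** 2) + n / lam_N   # E_q[sum_i (x_i - mu)^2]
    b_N = b0 + 0.5 * (ss + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
    E_tau = a_N / b_N                          # update E_q[tau] for the next pass

print(f"q(mu): mean {mu_N:.3f}, sd {lam_N ** -0.5:.3f}")
print(f"q(tau): mean {E_tau:.3f} vs true precision {1 / 1.5 ** 2:.3f}")
```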
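Finally, a small Monte Carlo sketch of the decision step: posterior draws for an effect are pushed through an assumed cost-benefit function, and we choose the action with the higher expected utility. The posterior, costs, and benefits here are invented placeholders, not numbers from any real analysis.

```python
import numpy as np

# Stand-in for posterior draws of a treatment effect from a fitted Bayesian
# model; the location, scale, and payoff figures below are made up.
rng = np.random.default_rng(0)
effect_draws = rng.normal(loc=0.3, scale=0.5, size=10_000)

def utility_adopt(effect):
    """Assumed payoff: 100 units of benefit per unit of effect, minus a
    fixed adoption cost of 20. Not adopting yields 0."""
    return 100 * effect - 20

# Expected utility averages the payoff over the full posterior, so the
# decision reflects the whole distribution, not just a point estimate.
eu_adopt = utility_adopt(effect_draws).mean()
decision = "adopt" if eu_adopt > 0 else "do not adopt"
print(f"E[utility | adopt] = {eu_adopt:.1f} -> {decision}")

# By contrast, a significance rule would ask only whether the posterior
# interval excludes zero, ignoring the costs and benefits entirely.
```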
In summary, I think Bayesian methods are helpful in psychology and many other applied fields. Adding prior information can be crucial in constructing inferences that make sense; even weak prior distributions can usefully regularize; in any case, Bayesian inference can handle high-dimensional uncertainty; and Bayesian posterior probabilities can be mapped into decisions, in which case it makes sense to check model fit and carefully examine the assumed cost or utility functions.
To varying degrees, many of the benefits of Bayesian inference can be reaped using existing non-Bayesian approaches. This is not meant as a disparagement of Bayesian inference; the statement could just as well be flipped around to read that many of the benefits of various popular statistical methods can be reconstructed using the Bayesian calculus, in an implementation that for many applications is modular and transparent.
References
Efron, B., & Hastie, T. (2016). Computer Age Statistical Inference. Cambridge University Press.
Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9, 641-651.
Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics (with discussion and rejoinder). Journal of the Royal Statistical Society A, 180, 967-1033.
Gelman, A., & Weakliem, D. (2009). Of beauty, sex, and power: Too little attention has been paid to the statistical challenges in estimating small effects. American Scientist, 97, 310-316.
McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2017). Abandon statistical significance.