#goCRPI: Bayes battling base-rate neglect in medical diagnosis

You are an intern in the premier hospital in Tierra del Fuego and you are seeing about 240 patients daily, who always suffer from one of two possible diseases (things are a little different in Tierra del Fuego), namely meowism or barkosis. The tricky thing is that the same symptoms are associated with both diseases, albeit to very different extents. Fortunately, a painless swab can confirm your initial diagnosis but that takes time and you would like to prescribe the appropriate remedy on the spot, to alleviate the patient’s suffering. So you had better learn how to do the diagnosis even before the lab results come in.

During the morning of your shift you’ve examined dozens of patients, and a partial record of their symptoms is this (the diagnosis being confirmed by the swab later that day):

| Symptoms                | Confirmed diagnosis |
|-------------------------|---------------------|
| Fever, Swelling, Shiver | meowism             |
| Rash, Shiver            | barkosis            |
| Fever, Shiver, Rash     | meowism             |

Now suppose that in 60% of all patients for whom meowism turns out to be the correct diagnosis, a fever has been present, whereas only 20% of all barkosis sufferers ran a fever.

Are you justified in concluding that fever is particularly diagnostic of meowism?

Perhaps surprisingly, the answer is no, even though participants in experiments reliably—and erroneously—conclude that Fever is a highly diagnostic cue.

How can fever not be diagnostic given that it occurs three times as often with meowism as with barkosis? The answer lies in an additional fact: barkosis is far more common than meowism, with 75% of all patients suffering from barkosis compared to only 25% from meowism. So at the end of your shift, once you’ve seen all 240 patients, you will have encountered an equal number, namely 36, who have fever and meowism (60% of the 60 meowism patients) and who have fever and barkosis (20% of the 180 barkosis patients). In other words, knowing that someone has a fever tells you nothing in reality: you might as well toss a coin; your probability of making a correct diagnosis would be the same.
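To see the arithmetic for yourself, here is a small sketch in Python, using only the made-up numbers from the example above (240 patients, base rates of 25% and 75%, fever in 60% of meowism and 20% of barkosis patients):

```python
# Sanity-check the Tierra del Fuego example with Bayes' theorem.
total = 240
prior = {"meowism": 0.25, "barkosis": 0.75}    # base rates of the two diseases
p_fever = {"meowism": 0.60, "barkosis": 0.20}  # probability of fever given each disease

# Expected number of feverish patients with each disease:
fever_count = {d: total * prior[d] * p_fever[d] for d in prior}
print(fever_count)  # {'meowism': 36.0, 'barkosis': 36.0}

# Posterior probability of meowism given fever (Bayes' theorem):
posterior_meow = fever_count["meowism"] / sum(fever_count.values())
print(posterior_meow)  # 0.5 -- a coin toss, despite the 3:1 likelihood ratio
```

The 3:1 advantage of meowism in the likelihoods is exactly cancelled by its 1:3 disadvantage in the base rates, which is why the posterior lands on 0.5.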

And yet most people are quite certain that a fever indicates meowism after they participate in an experiment that mimics this diagnostic situation.

This phenomenon has been widely replicated in many circumstances and is known as base-rate neglect: people neglect to consider the fact that, in the absence of any symptoms, a person is far more likely to suffer from barkosis than from meowism.

Far from being an esoteric phenomenon that is only studied in the laboratory and of interest to cognitive scientists, base-rate neglect is of real concern to medical practitioners and those who teach them. Theodore E. Woodward, the famous American medical researcher and diagnostician, is credited with coining the celebrated aphorism “When you hear hoofbeats behind you, don’t expect to see a zebra.”

Expressed more formally, using the famous Bayes’ theorem, the probability that a disease is present after all the symptoms have been considered is a combination of the prior probability of that disease (is it a horse or a zebra?) and the likelihood of those symptoms under each of the candidate diagnoses.
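In symbols, for the simple two-disease case from the example above, with a disease $D$ and a set of symptoms $S$, Bayes’ theorem reads:

```latex
P(D \mid S) \;=\; \frac{P(S \mid D)\, P(D)}{P(S \mid D)\, P(D) \;+\; P(S \mid \neg D)\, P(\neg D)}
```

The first factor in each product is the likelihood of the symptoms and the second is the prior, i.e. the base rate. In the fever example, the 3:1 likelihood ratio is exactly offset by the 1:3 ratio of priors, leaving a posterior of one half.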

This is easily said and understood mathematically, and sometimes people are exquisitely tuned to Bayes’ theorem even under rather exotic circumstances. But do medical practitioners conform to Bayes’ theorem? Do they see zebras or horses when the hooves are bearing down on them?

One of the articles in the first issue of the Psychonomic Society’s newest journal, Cognitive Research: Principles and Implications, investigated this issue and asked whether doctors are appropriately informed by base rates in their medical decision making.

Researchers Ben Rottman, Mica Prochaska, and Roderick Deaño presented residents (in a Chicago-area hospital, not Tierra del Fuego) with 5 vignettes that outlined the cases of hypothetical patients admitted to the general medicine service.

One variable of interest was the effect of seniority or experience (measured by residency year) on prevalence judgments concerning potential diseases, and how that prevalence judgment would be associated with diagnostic judgments.

To illustrate, the residents might be confronted with this vignette:

“A 61-year-old woman was admitted with acute-onset confusion and bilateral knee pain. Four days before admission, she had been examined for evaluation of frontal sinus headaches, and a viral syndrome had been diagnosed. Her medical history, including drug allergies, was unremarkable. On physical examination, the patient was delirious and febrile (temperature, 39.3° C). Her pulse rate was 90/min, blood pressure was 130/70 mm Hg, and a grade 2/6 apical pansystolic cardiac murmur was noted. The patient’s knee joints were warm and painful, with small bilateral effusions (more extensive on the right than on the left). A homonymous left-sided visual field deficit was present, and she displayed fluent aphasic speech errors. No other neurologic signs were evident, and the rest of the physical examination findings were unremarkable.”

So there.

Now, is this patient suffering (a) Transient Ischemic Attack or Stroke; (b) Encephalitis; (c) Connective Tissue Disease; (d) Infective Endocarditis; (e) Bacteremia; or (f) none of the above?

After reading each vignette, the participants in the study by Rottman and colleagues judged the posterior probability of each potential diagnosis, subject to the constraint that the probabilities had to sum to 100% across the available options.

Having thus “diagnosed” the 5 “patients”, participants next indicated the prevalence of each diagnosis. That is, participants would rate the percentage of patients presenting to the hospital who would suffer from encephalitis or bacteremia and so on, for the full set of 21 options used across the 5 vignettes.

The results are readily summarized: First, there was a statistically significant association between residents’ prevalence judgments and their diagnostic likelihood judgments. In other words, residents did not ignore base rates: if a disease was thought to be more prevalent, it was also more likely to occur as a diagnosis.

Second, as the experience of the participants increased across the 3 years of residency, their prevalence estimates became more precise, as reflected in the reduced standard deviations of their estimates. Thus, whereas first-year residents might disagree considerably amongst themselves about whether 1% or 10% of patients suffered from bacteremia, by the third year of residency they might all agree on 5-6% (or whatever value they settled on).

Third, even though the precision of prevalence estimates increased with experience, the mean deviation of those estimates from the actual base rates of diseases changed little. In other words, the average accuracy of residents did not improve, but they all moved closer to the average as they gathered more experience.

Finally, participants’ prevalence judgments were highly correlated with the actual prevalence of the diseases, with individual correlation coefficients ranging from .41 to .82 (mean around .60).

Rottman and colleagues conclude that residents are sensitive to base rates in their diagnostic decision making, a finding that they consider to be “comforting”. Given that diagnostic errors are a major contributor to poor patient outcomes, it is indeed comforting to know that residents do not fully neglect base rates and are more likely to think of a horse than a zebra when they hear hoofbeats.

One zebra in the ointment is that some residents did not have very accurate knowledge of the base rates, which could lead to errors in diagnosis. Some residents may think that zebras are no less common than horses.

Article focused on in this post:

Rottman, B. M., Prochaska, M. T., & Deaño, R. C. (2016). Bayesian reasoning in residents’ preliminary diagnoses. Cognitive Research: Principles and Implications. DOI: 10.1186/s41235-016-0005-8.
