© Sebastian
I'm very interested in how doctors think. How do we use the information gained from talking to and examining a patient to reach a reasonable list of likely diagnoses (a so-called "differential")? When we order a test, what specifically are we looking for, and how will we react to the result that comes back? More cynically, I'm curious about the extent to which we understand what the test result actually means. And what are the odds that we will make a correct decision based on the answer we get back?

I think that anyone who has even a partial understanding of what doctors do understands that the practice of medicine, although based on scientific knowledge, isn't a science. Rather it is an art form. And as with all art forms, there are those who excel, and those who plod along, occasionally producing something nice or useful. Most people are probably aware of the fact that if you go to five different doctors with a problem, there is a significant probability that you will get five different answers. Medicine is so complex, with so many different variables to consider, and doctors themselves are so varied in terms of how they think and what they know, that the end result of any one consultation will often vary wildly.

One of the things that always needs to be estimated in any individual consultation is probability. What is the probability that the breast lump is cancer? What is the probability that the fever is due to a serious bacterial infection? When faced with these questions, I think most doctors are more like an experienced chess player than a robot. They act on a feeling, not on a conscious weighing of probabilities. Doctors with a nervous disposition therefore order more tests and prescribe more antibiotics, while those with a more relaxed disposition order fewer tests and prescribe fewer antibiotics.

But how good is the average doctor?

That is what a study recently published in JAMA Internal Medicine sought to find out. The study was conducted in the United States, and funded by the National Institutes of Health. 492 physicians working in primary care in different parts of the United States filled in a survey, in which they had to estimate the probability of disease in four different common clinical situations, both before and after a commonly used test.

The situations were mammography for breast cancer, x-ray for pneumonia, urine culture for urinary tract infection, and cardiac stress testing for angina. For each scenario, the physicians were provided with a vignette detailing the situation and providing information on the age, gender, and underlying risk factors of the patient. Based on this they were asked to estimate the probability of disease before the test and then after the test, in both a situation where the test came back positive and one where the test came back negative. Here's an example from the survey:

Ms. Smith, a previously healthy 35-year-old woman who smokes tobacco presents with five days of fatigue, productive cough, worsening shortness of breath, fevers to 102 degrees Fahrenheit (38.9 degrees centigrade) and decreased breath sounds in the lower right field. She has a heart rate of 105 but otherwise vital signs are normal. She has no particular preference for testing and wants your advice.
How likely is it that Ms. Smith has pneumonia based on this information? ___%
Ms. Smith's chest X-ray is consistent with pneumonia. How likely is she to have pneumonia? ___%
Ms. Smith's chest X-ray is negative. How likely is she to have pneumonia? ___%

The average age of the participants was 32 years, and they had been in practice for an average of three years. In other words, these were mostly young doctors who had recently graduated medical school. It is reasonable to think that they would do better on this type of test than older doctors, since what they were taught in medical school is still relatively fresh in their memories and is also more updated and correct. Additionally, medical school today emphasises probabilistic thinking and concepts like sensitivity and specificity far more than it did in the past.

So, what were the results?

In the pneumonia scenario, the doctors overestimated the pre-test probability of pneumonia by 78%. In other words they thought the likelihood that the patient had pneumonia was almost double what it actually was. Not good. Unfortunately, that was their best performance. When it came to angina, they overestimated the pre-test probability by 148%. When it came to breast cancer, they overestimated the pre-test probability by 976% (i.e. they thought it was ten times more likely than it actually was). And when it came to the urinary tract infection scenario, they overestimated the pre-test probability by 4,489%! (i.e. they thought it was 45 times more likely than it actually was).
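To make those percentages easier to compare, here is a minimal sketch that converts "overestimated by X%" into "X times the true value". It assumes the percentages are relative overestimates, i.e. (estimate - truth) / truth, which is how the paragraph above reads; the function name is made up for illustration.

```python
def fold_from_percent_over(percent_over: float) -> float:
    """Convert 'overestimated by X%' into a multiple of the true value.

    Assumes a relative overestimate: (estimate - truth) / truth * 100.
    """
    return 1 + percent_over / 100


# Pre-test overestimates quoted in the text, in percent.
overestimates = {
    "pneumonia": 78,
    "angina": 148,
    "breast cancer": 976,
    "urinary tract infection": 4489,
}

for scenario, percent_over in overestimates.items():
    fold = fold_from_percent_over(percent_over)
    print(f"{scenario}: about {fold:.1f} times the true probability")
```

Under that assumption the multiples come out at roughly 1.8, 2.5, 10.8 and 45.9, in line with the approximate figures quoted above.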

Doh! What are doctors being taught in medical school these days?

What I think is particularly interesting here is that the error was always in the same direction - in each of the four scenarios the doctors thought that the disease was more likely than it is in reality. If this reflects real world outcomes, then that would mean that doctors probably engage in an enormous amount of over-treatment. Obviously, if you think a patient likely has a urinary tract infection, you're going to prescribe an antibiotic. And if you think a patient likely has angina, you're going to prescribe a nitrate. You might even refer the patient for some kind of interventional procedure.

To be fair, this study was conducted in the overly litigious United States. Doctors who know that they are likely to face lawsuits if they miss a diagnosis are probably going to over-diagnose and over-treat. But my personal experience tells me this is not just a US-based problem. I've seen plenty of patients here in Sweden with asymptomatic colonization of their urinary tract prescribed unnecessary antibiotics, to take just one example. I think the over-estimation has more to do with cognitive bias than with fear of litigation. Once you anchor on a diagnosis, say pneumonia in someone with a fever and a cough, you will almost certainly overestimate the probability of that diagnosis.

Let's move on. When it comes to how much a test changes the estimated probability, the doctors overestimated the effect of a positive lung x-ray by 92%, of a positive mammogram by 90%, and of a positive cardiac stress test by 804%! They were relatively on the mark, however, when it came to estimating the impact of a positive urine culture, only overestimating by 10%.

When it comes to how much a negative test changes the estimation of probability, the doctors actually did ok, being close to the mark for the chest x-ray, urine culture, and cardiac stress test, but wildly underestimating the predictive value of a negative mammogram (in other words, they thought breast cancer was far more likely than it actually was after getting back a negative mammogram, so again, they overestimated the probability of disease).

What can we conclude from this? Doctors have a pretty poor understanding of how the tests they use influence the probability of disease, and they heavily overestimate the likelihood of disease after a positive test. They are however generally better at understanding the impact of a negative test than they are at understanding the impact of a positive test.

Finally, the survey asked the doctors to consider a hypothetical scenario in which 1 in 1,000 people has a certain disease, and estimate the probability of disease after a positive and negative result for a test with a sensitivity of 100% and a specificity of 95%. Sensitivity is the probability that a person with the disease will have a positive test result. Specificity is the probability that a person without the disease will have a negative test result.

Regular readers of this blog will have no problem figuring this out. If you test 1,000 people, you will get one true positive (since the sensitivity is 100% you will catch every single positive case) and roughly 50 false positives (a specificity of 95% means five false positives per 100 healthy people tested). The probability that any one person with a positive test actually has the disease will thus be roughly 2% (1/51). So what did the doctors answer?

The average doctor in the study thought that the probability of a person with a positive test actually having the disease was 95%. In other words, they overestimated the probability by 4,750%!

Apart from that, they thought that a person with a negative test still had a 3% probability of disease, even though the sensitivity was listed as 100% (which means that the test never fails to catch anyone with the disease). Oops. I should add that there were no meaningful differences in how correct the answers were between attendings (more senior doctors) and residents (more junior doctors).
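For anyone who wants to check the arithmetic for this hypothetical scenario, here is a minimal sketch using Bayes' theorem. The function name and parameter names are my own for illustration; the inputs are the ones given in the survey question (a prevalence of 1 in 1,000, a sensitivity of 100% and a specificity of 95%). It does not guard against degenerate inputs where a denominator would be zero.

```python
def post_test_probabilities(prevalence, sensitivity, specificity):
    """Return (P(disease | positive test), P(disease | negative test)).

    All three arguments are probabilities between 0 and 1.
    """
    # Split the population into the four possible outcomes.
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity

    # Share of positive results that are real cases.
    after_positive = true_pos / (true_pos + false_pos)
    # Share of negative results that are missed cases.
    after_negative = false_neg / (false_neg + true_neg)
    return after_positive, after_negative


after_positive, after_negative = post_test_probabilities(
    prevalence=1 / 1000, sensitivity=1.0, specificity=0.95
)
print(f"Probability of disease after a positive test: {after_positive:.1%}")
print(f"Probability of disease after a negative test: {after_negative:.1%}")
```

With these inputs the result is about 2% after a positive test and 0% after a negative one, compared with the 95% and 3% that the doctors estimated.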

What can we conclude?

Doctors suck at estimating the probability of common conditions in scenarios they face on a daily basis, are not able to correctly interpret the tests they use, and don't understand even very basic diagnostic testing concepts like sensitivity and specificity. It's kind of like a pilot not being able to read an altimeter. Be afraid. Be very afraid.

Medical schools should be thinking long and hard about the implications of this study. What it tells me is that medical education needs a massive overhaul, on par with the one that happened a hundred years ago after the Flexner report. We don't send pilots up into the air without making sure they have a complete understanding of the tools they use. Yet that is clearly what we are doing when it comes to medicine. Admittedly the practice of medicine is much more complex than flying a plane, but I don't think that changes the fundamental point.