Say you are about to hire someone. During the hiring process, you might ask them to pass a drug test. Say 1% of the population uses the drugs the test screens for, and that the test catches drug users 80% of the time. Also, the test has a 9.6% false positive rate. Now your potential hire has just failed the drug test. What are the chances they were actually on drugs? (If you know the math, can you ballpark the result without working it out? If you don’t know the math, what’s your gut feeling? Take a moment to guess.)
Most people get this sort of question horribly wrong (see this), yet similar situations happen all the time. Bayes’ Theorem is, very loosely, the tool that lets us turn observations of random events into probabilities of something we can’t directly know. While you could always just apply the theorem and work out the numbers, it would be much nicer to draw a picture we could look at. This is what I will show you. Let’s start by drawing a couple of lines:
The left vertical line represents the false positive rate of the test, while the right vertical line represents what I’m calling the coverage of the test: the fraction of drug users who are actually caught when the test is performed. The blue line represents our particular test: the higher its endpoints are, the higher the probability of a false positive (left endpoint) or a “true positive” (right endpoint). We now add information about our population. We are assuming 1% of all people subject to the test use drugs, so we draw a vertical line 1% of the way from the left line to the right line. (Note: I’m grossly exaggerating the distance here to make the drawing clearer.) We now connect the point where this vertical line meets the blue line to the lower-left corner:
We are now almost ready to determine the probability that your potential employee was actually on drugs. We take note of the angle that the red line makes with the blue line, and draw a new line that touches the right endpoint of the blue line. As you might now have guessed, the chance the person was on drugs is exactly the ratio of the horizontal segment cut off by the red line to the length of the whole horizontal line:
Notice that even with a grossly exaggerated proportion of the population doing drugs (the vertical dotted line is much farther right than 1% of the way), the actual chance that the person caught by the test was doing drugs is still well below 25%. Did you guess that? Most people guess a much higher value.
There are some cool things about this drawing. First, you can push the lines up or down in your head, and think about what happens to the graph (this is begging for an interactive Java applet). While it is intuitive that a lower rate of false positives will increase the chance that the test caught a real drug user, it is much less obvious that the proportion of the population that actually uses drugs has any influence on what a positive result means. To see why this is important, think of situations where the event we’re looking for is rare, like cancer screening (or even worse, DNA testing).
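To put rough numbers on how much the base rate matters, here is a minimal Python sketch that keeps the test from the example fixed (80% coverage, 9.6% false positives) and only moves the proportion of users in the population. The function name `posterior` and the list of prevalences are my own choices for illustration, not something from the drawing.

```python
def posterior(prevalence, coverage=0.80, false_positive_rate=0.096):
    """Chance that someone who tested positive is actually a user (Bayes' theorem)."""
    true_positives = coverage * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same test, different populations (the prevalences are illustrative assumptions).
for prevalence in (0.001, 0.01, 0.10, 0.50):
    print(f"prevalence {prevalence:6.1%} -> P(user | positive) = {posterior(prevalence):.1%}")
```

With the very same test, a positive result means a user less than 1% of the time when only 0.1% of people use drugs, and close to 90% of the time when half of them do.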
This is why I really like visualization. The drawing makes it pretty obvious what’s going on, and it also gives us a way to explore the space – knock the lines around a little, and you can pretty quickly see the result, without having to work out the numbers.
What’s going on behind the scenes? Write $D$ for “uses drugs” and $+$ for “tests positive”. If you know Bayes’ Theorem, the height of the blue line at the intersection with the vertical is exactly $P(+)$, the cotangent of the angle is $P(D)/P(+)$, and since the vertical line on the right side is $P(+ \mid D)$, the horizontal segment is $P(+ \mid D)\,P(D)/P(+) = P(D \mid +)$. Kinda neat.
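If you want to convince yourself that the drawing and the formula agree, the construction can be replayed with coordinates. This is only a sketch of my reading of the picture: I am assuming the drawing has width 1, the lower-left corner is the origin, and the angle is the one the red line makes with the horizontal. The function names are made up for this example.

```python
def posterior_from_drawing(prevalence, coverage, false_positive_rate):
    # Blue line: from (0, false_positive_rate) to (1, coverage).
    # Its height above x = prevalence is the overall chance of a positive test.
    height = false_positive_rate + prevalence * (coverage - false_positive_rate)
    # Red line: from the lower-left corner (0, 0) to (prevalence, height).
    # Cotangent of its angle with the horizontal is run over rise.
    cotangent = prevalence / height
    # A line with that slope needs this much horizontal run to rise by `coverage`,
    # the height of the right vertical line.
    return coverage * cotangent


def posterior_from_bayes(prevalence, coverage, false_positive_rate):
    # Textbook form: P(D | +) = P(+ | D) P(D) / P(+).
    p_positive = coverage * prevalence + false_positive_rate * (1.0 - prevalence)
    return coverage * prevalence / p_positive


geometric = posterior_from_drawing(0.01, 0.80, 0.096)
algebraic = posterior_from_bayes(0.01, 0.80, 0.096)
print(f"drawing: {geometric:.4f}   formula: {algebraic:.4f}")
```

Both routes give roughly 0.078 for the drug test above, since the height of the blue line is just the two terms of $P(+)$ written as a linear interpolation between the endpoints.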
What if you actually want a real number, like 8%, which is roughly the number you should get in the above case? For the Bayes’ Theorem case, a really cute line drawing with carefully designed scales (which was published in the New England Journal of Medicine!) can solve exactly the problem we’re dealing with here. In general, we end up with nomograms, or nomographs, which are insanely cool and which I hope to write about soon.