Murky Research

No computers today, but some interesting - and important - math. (And, happy Canada Day, Canadians!)

"Car Talk" is a popular weekly phone-in program that has been on National Public Radio for several decades now, in which Bostonian brothers Tom and Ray crack wise and diagnose car (and relationship) problems. On many programs they feature a "puzzler". Here's the puzzler from a couple of weeks ago, which I reprint in full:

Tim and Jethro were happy to have their jobs at the new self-serve gas station in town. And, since the Farmer's Almanac had predicted this to be the coldest winter since the last ice age, they were happy to be working indoors, while the customers pumped their own gas.

This station was so modern that it had a video camera for each of the pumps, and a TV monitor that would show the rear of everyone's vehicles as soon as they pulled up to the pumps.

When the boredom of their jobs finally set in, Tim and Jethro began playing a little game. The game involved trying to figure out which customers had pulled up to a pump with the fuel door on the wrong side-- that is, facing away from the pump.

Now, they couldn't see the cars pull in to the gas station. The video cameras were only aimed at the back of the vehicles. So, there was no time during which they could see the side of a vehicle where the fuel door was located. They could only see the vehicle after it was in position to refuel.

They had to make their bets before the driver shut off the key and exited the vehicle-- before he dope slapped himself for pulling in on the wrong side.

Jethro was correct 99 percent of the time. Tim was correct about 50 percent of the time, because he was just guessing.

What did Jethro know that enabled him to tell when a driver had pulled up to the pump with the fuel door facing the wrong way?

When I heard that puzzler, the answer was immediately obvious to me. Imagine my surprise when the answer that was immediately obvious to me was not the answer Ray gave this past Saturday! The answer given by Ray was that in almost all cars, the tailpipe is on the opposite side of the car from the fuel door. Jethro knows that if a car pulls up with its tailpipe on the same side as the pump, then the driver is on the wrong side. Jethro is 99% accurate because almost all cars you actually see on the road follow this pattern; very few cars have a tailpipe in the middle, have two tailpipes, have the fuel door in the middle, have the tailpipe on the same side as the door, or have some other anomaly.

Now, I'm not willing to go as far as Tommy sometimes does and call BOOOOOOOOOOOGUS! on this one; I believe them that this is a 99% reliable heuristic for deducing the side the fuel door is on, and that if Jethro knew that, then Jethro could consistently defeat Tim. (I note that we are not given the parameters of how Tim is guessing, but we have reason to assume that Tim is guessing by some process akin to flipping a fair coin.) Clearly, when Jethro and Tim play this game, about half the games are going to be ties (because Tim is guessing right at random), but of the non-ties, Jethro is going to win most of the time. I'm not disputing that.

However, there is an extremely important aspect of this analysis which has been ignored. We know from experience that the percentage of drivers who pull up to the wrong side is low. The only times I have pulled up to the wrong side of the pump and had to back out have been in a borrowed or rented car (and many cars now have an arrow on the fuel gauge telling you which side the fuel door is on). The vast majority of customers will pull up to the correct side. This additional fact, not given in the statement of the puzzler but reasonably assumed, changes the analysis.

Let's call a car which pulls up to the wrong side a "positive" (for reasons which will become apparent later) and a car which pulls up to the correct side a "negative". Jethro and Tim each make a prediction of whether a given car is going to be a positive or a negative. Let's call a car whose fuel door side Jethro can correctly predict a "normal" car, and one whose side he cannot predict an "unusual" car.

Let's assume that the percentage of "positive" drivers is 1%. Notice that this is the same as the percentage of cars that Jethro's heuristic predicts incorrectly. I've deliberately made these percentages about the same, but really all that matters is that they are both small. We'll do a bit of fiddling with the numbers later to see what happens when we change them. But for now, just assume that it is a coincidence that the rate of boneheaded drivers is roughly the same as the rate of cars whose fuel door side Jethro cannot correctly predict.

Given that reasonable assumption, Jethro does not need to have his fancy heuristic in the first place! Suppose we replace Jethro with Bob, who always bets negative, regardless of the position of the tailpipe. This strategy is 99% accurate because 99% of the drivers are negatives!

That was the solution that was immediately obvious to me: deny the premise of the question. You can defeat Tim's coin-flipping random strategy by simply observing that negatives vastly outnumber positives; it is reasonable to always bet on negative.

Suppose Bob and Tim play this game a million times (it's a long winter in Boston) and Bob uses the "always negative" strategy, which, as we know, will be accurate 99% of the time solely on the basis of the distribution of negatives vs. positives. On average there will be 990000 negatives and 10000 positives. Bob will predict the negatives with 100% accuracy and Tim will predict them with 50% accuracy. So for those 990000 cases, Bob wins 495000 times, ties 495000 times, and never loses. Bob will predict the positives with 0% accuracy, and Tim will predict them with 50% accuracy. So for those 10000 cases, Bob never wins, ties 5000 times, and loses 5000 times. Of the million games, Bob wins 495000 times, ties 500000 times, and loses 5000 times. As we'd naively expect, Bob wins 99% of the games which are not ties.
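
The Bob-versus-Tim tally is just expected-value arithmetic, so it can be checked mechanically. Here is a minimal Python sketch; the function name `bob_vs_tim` is purely illustrative, and it uses `fractions.Fraction` so that the counts come out exact rather than as floating-point approximations.

```python
from fractions import Fraction

def bob_vs_tim(n_cars, p_positive):
    """Expected (wins, ties, losses) for Bob, who always bets negative,
    against Tim, who flips a fair coin."""
    half = Fraction(1, 2)
    negatives = n_cars * (1 - p_positive)
    positives = n_cars * p_positive
    # Negatives: Bob is always right and Tim is right half the time,
    # so Bob wins when Tim is wrong and ties when Tim is right.
    wins = negatives * half
    ties = negatives * half
    # Positives: Bob is always wrong and Tim is right half the time,
    # so Bob loses when Tim is right and ties when Tim is also wrong.
    ties += positives * half
    losses = positives * half
    return wins, ties, losses

wins, ties, losses = bob_vs_tim(1_000_000, Fraction(1, 100))
print(wins, ties, losses)      # 495000 500000 5000
print(wins / (wins + losses))  # 99/100
```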

Now suppose Jethro and Tim play this game a million times and Jethro uses his fancy tailpipe strategy, which is 99% accurate on the basis not of the distribution of negatives but of his ability to detect normals. The analysis appears to be just the same as before: There will be 990000 normals and 10000 unusuals. Jethro will predict the normals with 100% accuracy, the unusuals with 0% accuracy, blah blah blah, and will win 495000 times, tie 500000 times and lose 5000 times, same as Bob.
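
To see that the tailpipe strategy and the always-negative strategy really do produce the same tallies against Tim, here is a small Monte Carlo sketch in Python. It assumes, as the analysis implicitly does, that whether a car is "normal" is independent of whether its driver pulls up on the wrong side, and it models Jethro's heuristic only by whether it works on a given car, not by tailpipe geometry. The function `play`, the seed, and the choice of 200000 games (rather than a million, to keep the run short) are all illustrative.

```python
import random

def play(n_cars, p_positive, p_normal, rng):
    """Simulate n_cars games; tally Bob's and Jethro's results against Tim."""
    tallies = {"Bob": {"win": 0, "tie": 0, "loss": 0},
               "Jethro": {"win": 0, "tie": 0, "loss": 0}}
    for _ in range(n_cars):
        positive = rng.random() < p_positive          # driver pulled up on the wrong side
        normal = rng.random() < p_normal              # tailpipe heuristic works on this car
        tim_right = (rng.random() < 0.5) == positive  # Tim's coin-flip guess
        guesses = {
            "Bob": False,                                    # always bets negative
            "Jethro": positive if normal else not positive,  # right exactly on normals
        }
        for name, guess in guesses.items():
            right = guess == positive
            if right == tim_right:
                outcome = "tie"   # both right or both wrong
            elif right:
                outcome = "win"
            else:
                outcome = "loss"
            tallies[name][outcome] += 1
    return tallies

results = play(200_000, 0.01, 0.99, random.Random(2010))
for name, tally in results.items():
    decided = tally["win"] + tally["loss"]
    print(name, tally, round(tally["win"] / decided, 3))
```

Both tallies should land close to 99000 wins, 100000 ties and 1000 losses, with each player winning about 99% of the games that are not ties; the exact counts vary with the seed.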

This demonstrates my point that Jethro doesn't need his fancy-pants strategy to beat Tim. Bob and Jethro both beat Tim exactly the same number of times, on average, with their respective strategies. But we can look deeper at Jethro's strategy:

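Of those million trials, 99% of them will be normals. Of those normals, 99% of them will be negatives, and likewise 99% of the unusuals, assuming that a driver's boneheadedness has nothing to do with where the tailpipe is. Working out the percentages, on average we should see: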

980100 negative normals - Jethro predicts negative, correctly
9900 positive normals - Jethro predicts positive, correctly
9900 negative unusuals - Jethro predicts positive, incorrectly
100 positive unusuals - Jethro predicts negative, incorrectly.
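
The four counts above come from multiplying the two 99%/1% splits together. A short Python sketch of that bookkeeping follows; `jethro_table` is a hypothetical helper name, and it assumes the same independence between a car's side and its type that the counts rely on.

```python
from fractions import Fraction

def jethro_table(n_cars, p_positive, p_normal):
    """Expected count of each (side, car type) combination, assuming
    the two are independent."""
    counts = {}
    for side, p_side in (("negative", 1 - p_positive), ("positive", p_positive)):
        for kind, p_kind in (("normal", p_normal), ("unusual", 1 - p_normal)):
            counts[(side, kind)] = n_cars * p_side * p_kind
    return counts

table = jethro_table(1_000_000, Fraction(1, 100), Fraction(99, 100))
for key, count in table.items():
    print(key, count)

# Jethro says "positive" for positive normals (right) and negative unusuals (wrong).
said_positive = table[("positive", "normal")] + table[("negative", "unusual")]
print(said_positive)                                   # 19800
print(table[("negative", "unusual")] / said_positive)  # 1/2
```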

Holy smackers! Jethro predicts positive 19800 times and is wrong 50% of the time, even with an overall 99% accurate heuristic! Considering only those cases where Jethro says positive, he is on average no more accurate than Tim, who is flipping a coin.

Now suppose instead of yokels trying to predict whether a driver is boneheaded, we have three doctors each trying to predict whether you have Tappet's Disease. Suppose further that 99% of the population is negative for Tappet's Disease: they do not have it. Dr. Jethro has a test for Tappet's Disease which is 99% accurate. (People for whom the test works are "normals", and 99% of people are normals.) Dr. Bob doesn't even bother to diagnose you; he just says "you're negative" every time. Dr. Tim flips a coin.

Suppose you go to all three doctors and they all say "you're negative". Remember, Dr. Bob and Dr. Jethro are both accurate 99% of the time, but Dr. Tim is only accurate 50% of the time. Clearly you have learned absolutely nothing from Dr. Bob, who didn't even look at you; the fact that he is 99% accurate tells you nothing about you. Clearly you have learned absolutely nothing from Dr. Tim; he flipped a coin right in front of you. But of every 980200 patients to whom Dr. Jethro says "negative", he is correct 980100 times and incorrect 100 times. Dr. Jethro's 99% accurate test is actually 99.99% accurate when he says negative.

Put another way: the chance that Dr. Bob or Dr. Tim is wrong when predicting negative is 1 in 100. The chance that Dr. Jethro is wrong when he predicts negative is about 1 in 10000, a hundred times smaller. Dr. Jethro is taking advantage of the low incidence of positives and the accuracy of his test in a way that the other two are not. Dr. Jethro is way, way more reliable than either of the other two, provided that your result is negative.
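
These "wrong when predicting negative" figures follow from Bayes' theorem. The sketch below uses an illustrative helper (the name `wrong_given_negative` is made up for this example) that describes each doctor only by how often he says "negative" to patients who are positive and to patients who are negative.

```python
from fractions import Fraction

def wrong_given_negative(p_positive, says_neg_if_pos, says_neg_if_neg):
    """P(patient is actually positive | doctor says negative), by Bayes' theorem."""
    false_neg = p_positive * says_neg_if_pos
    true_neg = (1 - p_positive) * says_neg_if_neg
    return false_neg / (false_neg + true_neg)

p = Fraction(1, 100)     # 1% of patients are positive
half = Fraction(1, 2)
acc = Fraction(99, 100)  # Dr. Jethro's test is right for 99% of people

print(wrong_given_negative(p, 1, 1))          # Dr. Bob, prints 1/100
print(wrong_given_negative(p, half, half))    # Dr. Tim, prints 1/100
print(wrong_given_negative(p, 1 - acc, acc))  # Dr. Jethro, prints 1/9802
```

The 1/9802 is the "about 1 in 10000" figure: 100 wrong out of 980200 negative calls.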

But what if the result had been positive? (Obviously Dr. Bob would not say positive, so let's ignore him.) Of Dr. Jethro and Dr. Tim, which should you trust? If Dr. Tim says positive, then there is a 99 in 100 chance that Dr. Tim is wrong. If Dr. Jethro says positive, then there is a 50 in 100 chance that Dr. Jethro is wrong. Dr. Jethro is clearly still the winner here, but it is deeply counterintuitive to people that a test which is accurate 99% of the time overall can be wrong half the time when it says positive.
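
The positive case is the same Bayes' theorem calculation with the roles flipped. Again a sketch with a hypothetical helper name, `wrong_given_positive`:

```python
from fractions import Fraction

def wrong_given_positive(p_positive, says_pos_if_pos, says_pos_if_neg):
    """P(patient is actually negative | doctor says positive), by Bayes' theorem."""
    false_pos = (1 - p_positive) * says_pos_if_neg
    true_pos = p_positive * says_pos_if_pos
    return false_pos / (false_pos + true_pos)

p = Fraction(1, 100)
half = Fraction(1, 2)
acc = Fraction(99, 100)

print(wrong_given_positive(p, half, half))    # Dr. Tim, prints 99/100
print(wrong_given_positive(p, acc, 1 - acc))  # Dr. Jethro, prints 1/2
```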

And of course, the rarer the disease is, the more likely it is that a negative result is correct; most people don't have the disease, so a negative result is likely to be right. But the flip side is that the rarer a disease is, the more likely it is that a positive is a false positive: an artefact of flaws in the test.

Imagine if only one in ten thousand drivers pulled up to the wrong side. Jethro's 99% accurate heuristic would now be worse than Bob's "always guess negative" strategy because Jethro gets so many false positives.
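
A quick Python check of that claim, assuming as before that Jethro's heuristic works on 99% of cars regardless of which side the driver chooses; `compare` is an illustrative name.

```python
from fractions import Fraction

def compare(p_positive, accuracy):
    """Overall accuracy of Bob and Jethro, and the share of Jethro's
    "positive" calls that are wrong, at a given rate of wrong-side drivers."""
    bob_accuracy = 1 - p_positive   # Bob is right on every negative
    jethro_accuracy = accuracy      # right on every normal car, whatever the side
    false_pos = (1 - p_positive) * (1 - accuracy)
    true_pos = p_positive * accuracy
    return bob_accuracy, jethro_accuracy, false_pos / (false_pos + true_pos)

bob_acc, jethro_acc, wrong_pos = compare(Fraction(1, 10_000), Fraction(99, 100))
print(float(bob_acc), float(jethro_acc))  # 0.9999 0.99
print(wrong_pos)                          # 101/102
```

At that incidence Bob is right 99.99% of the time against Jethro's 99%, and about 99 of every 100 cars Jethro flags as positive are false alarms.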

This is a serious problem in medicine! The worst false outcome is a false negative - that is, the test says you do not have the condition when really you do. That's why Dr. Bob's strategy is completely unacceptable; all his false results are false negatives. But as we've seen, the mathematics of the situation means that given a Dr. Jethro with a reasonably accurate test for a condition with low incidence, false negatives are very rare.

But false positives cause unnecessary, potentially harmful or expensive treatment, not to mention unnecessary anxiety. The mathematics of the situation is that false positives make up an extremely high percentage of positives when the inaccuracy of the test and the rarity of the disease are close to each other. Tests for rare conditions have to be incredibly accurate for the share of positive results that are false to be low. The inaccuracy of the test has to be orders of magnitude less than the rarity of the condition.
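
One way to make "orders of magnitude less" concrete is to solve for the largest error rate that keeps false positives below some target share. This sketch assumes the same symmetric model used throughout (the test errs with the same probability on positives and negatives); both function names and the 10% target are illustrative choices.

```python
from fractions import Fraction

def false_share(p_positive, error_rate):
    """Share of positive results that are false when the test errs with
    the same probability on positive and negative cases."""
    false_pos = (1 - p_positive) * error_rate
    true_pos = p_positive * (1 - error_rate)
    return false_pos / (false_pos + true_pos)

def max_error_rate(p_positive, max_false):
    """Largest error rate that keeps false_share(p_positive, e) <= max_false,
    found by solving false_share(p, e) == max_false for e."""
    p, t = p_positive, max_false
    return t * p / ((1 - p) * (1 - t) + t * p)

for p in (Fraction(1, 100), Fraction(1, 10_000)):
    e = max_error_rate(p, Fraction(1, 10))
    print(p, e, false_share(p, e))
# 1/100 1/892 1/10
# 1/10000 1/89992 1/10
```

Even to keep false alarms down to one positive result in ten, the error rate has to be roughly a ninth of the incidence; asking for fewer false alarms pushes it lower still.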

This sort of probability analysis is based on Bayes' Theorem, and it has many fascinating implications beyond just this quick sketch. It has implications in law and in spam detection; it comes up all over the place.
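
For reference, here is Dr. Jethro's positive-result calculation written out as Bayes' theorem, using the same 1% incidence and 99% accuracy as above ("pos" and "neg" are shorthand for having and not having the disease):

```latex
P(\text{pos} \mid \text{says pos})
  = \frac{P(\text{says pos} \mid \text{pos})\,P(\text{pos})}
         {P(\text{says pos} \mid \text{pos})\,P(\text{pos}) + P(\text{says pos} \mid \text{neg})\,P(\text{neg})}
  = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99}
  = \frac{1}{2}
```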

If Tom and Ray have any comments on this critique of their puzzler, I'd be happy to hear them.

Comments

  • Anonymous
    July 01, 2010
    The consequences of false positives and false negatives are rarely the same. This may lead to a less accurate test being preferred over a more accurate one, as long as its errors are of the less undesirable kind. For instance, a spam test should never have false positives (which would mean that legitimate mail gets marked as spam), while diagnostic medicine, as you correctly stated, usually considers a false positive the lesser evil (especially so when a disease is highly transmissible: in extreme cases it might be preferable to just pull a "reverse Bob" and treat everybody). It's when the evaluation of the potential damage is subjective that things get really interesting...

  • Anonymous
    July 01, 2010
    I think your analysis of the puzzler is spot on. With regard to your comments on the subject of medical tests, I would say that doctors have a responsibility to educate their patients about what the results of a test actually mean. Statistics and Bayesian theory are not subjects well understood by most people (although wouldn't it be nice if they were). As such, doctors have an ethical responsibility to help their patients understand the true implications of a test's statistical error. Doctors should strive to reduce the suffering of their patients, and that includes the mental anguish from learning the results of such tests. There's another consideration here as well: economics. The accuracy of a medical test needs to improve significantly (relative to its cost) to be a worthwhile alternative to an existing but cheaper test. Based on your own example, if the test was not 99% accurate but 99.3% accurate, its false positive rate is still 44% (as opposed to 50%). At 99.9%, however, the likelihood of a false positive is now <10%, for the same prior probability of disease in the population at large of 1%. If the slightly more accurate test is significantly more expensive (as they often are), it's questionable whether its broad application is the most effective use of scarce medical dollars. Achieving an appropriate balance between the cost of medical tests and their effectiveness is fraught with difficulty. As individuals, we all want the best, most advanced, most accurate medical procedures applied to us when we are concerned for our health, or the health of someone we love. But as a society, always opting for such an approach results in escalating costs for both insurance and medical treatment, in effect undermining the very thing we desire.

  • Anonymous
    July 01, 2010
    >The inaccuracy of the test has to be orders of magnitude less than the rarity of the condition. It seems to me that in medicine, in particular, it's much easier to achieve than it sounds, really: how many people are tested for plague nowadays? Or, say, for leprosy? People only rush to do the tests when the condition becomes dangerously common (or is made to seem so, usually by sensationalists in the media). And most medical tests nowadays are reasonably accurate (although again: the only test I can imagine having 99.9% accuracy is testing whether the patient is dead or alive). So, if the condition is really rare, there won't be too many people running tests for it and, consequently, being hurt by the inaccuracy of those tests; and if it's not so rare, then the statement about the inaccuracy of the test being orders of magnitude less than the rarity of the condition becomes true.

  • Anonymous
    July 01, 2010
    @Dean Harding You are correct about Australia, yet I'd say 99% of people are either unaware of this or unwilling to attempt it. Also, some stations make it mighty hard to do by not having hoses that extend far enough (I drive a Fairmont, so it's a bit of a stretch without an extendable hose of some sort). I'm probably the only person I know who actually does this. JB

  • Anonymous
    July 01, 2010
    I think Denis raises an interesting point. All cars will come into a petrol station to refuel. But for diseases that are rare, it's certainly the case that not all people will come in for a test. People will have the test done because of some suspicion - family history, symptoms, etc. So if 1 in 10,000 have the disease, you will likely see a much more frequent rate of disease in the sample of people actually having tests. Interesting post.

  • Anonymous
    July 02, 2010
    This post reminds me of the lotto prediction service that I considered starting once. The idea was that for the low, low price of only $1, I would predict whether your lotto ticket was going to hit the jackpot -- 100% accuracy guaranteed or your money back!

  • Anonymous
    July 02, 2010
    Interesting. Maybe you'd be interested in the following story I read in a German book last year (Denken Sie selbst, sonst tun es andere für Sie; in English, something like: Think for yourself, or someone else will do it for you). Say you have 100 women who take the birth control pill (about 99.8% reliable). All of them take a pregnancy test (accurate at the same rate, about 99.8%). If the test says yes, pregnant, the answer is correct in only 50% of the cases. I'm not able to explain it as nicely as Eric, but his story reminded me of this case, which should have the same mathematical background. Regards.

  • Anonymous
    July 03, 2010
    I argue that many tests (especially medical ones) should not be reduced completely to binary logic. Their results are much more fuzzy (see Fuzzy Logic); they have confidence rates. How many times have you seen a doctor ask for another test because the result of the last one was inconclusive? Of course, for some automated tests (like spam detection), a threshold has to be set, and a fuzzy value has to be reduced to a binary one. Nevertheless, I think some spam detection software flags certain email messages as suspected spam.

  • Anonymous
    July 04, 2010
    @Gabe I once had a similar idea. I would guarantee a 50% rate of return on anything they would spend on lotto / poker / whatever. A far better average rate of return than a slot machine will give you, but people didn't find it quite as fun to play.

  • Anonymous
    July 04, 2010
    Wonderful intro to Bayesian reasoning here: yudkowsky.net/.../bayes (Seems to have been given a facelift since I first read it.)

  • Anonymous
    July 06, 2010
    I always go to the pump with the shortest queue (shortest may be zero), irrespective of the side. If two shortest queues are equal, and they offer a choice of sides, I will prefer a pump on the filler cap side. But, all in all, there is no /wrong/ side

  • Anonymous
    July 07, 2010
    I briefly considered posting under some name such as "Captain Pedantic", but that would be cowardly. And really, in a post that is fundamentally about counting, you should get the arithmetic right. "Bob wins 445000 times, ties 500000 times, and loses 5000 times." The snag about this is that 445,000 + 500,000 + 5,000 = 950,000 < 1,000,000. The mistake crept in when you attempted to divide 990,000 by 2 and got 445,000 instead of 495,000. The analogous error is repeated for Jethro. I notice that you have avoided the use of a separator character such as a comma or point in long numbers (trying not to offend or confuse your heterogeneous readership?) and I suggest that this may have promoted your mistake.

    It was a typographical error. I've fixed it. - Eric

  • Anonymous
    July 07, 2010
    Ironically, Car Talk had a puzzler on this very same issue: www.cartalk.com/.../answer.html

  • Anonymous
    July 08, 2010
    They could easily tweak the puzzle to make sense with their reasoning if they rephrased it to limit the results to the set of those in which the driver drove up on the wrong side, i.e. "In all the situations in which a driver pulled up on the wrong side, Tim called it 50% of the time and Jethro called it 99% of the time."  That should resolve the hangup.

  • Anonymous
    July 15, 2010
    If you like this sort of post, I recommend "The Drunkard's Walk" by Leonard Mlodinow. It explains numerous statistical "gotchas":

      • Let's Make a Deal puzzle
      • Lady with twins and one is X and Y
      • The one from the post (I believe it's the prosecutor's fallacy)
      • And the gambler's fallacy (it's gotta land on black, it's been red five times now... it's due!)