In All Probability

As a teenager, I had the good fortune to meet Tom Körner. He had written a book, The Pleasures of Counting, that opens with a story about John Snow.

Snow was a 19th-century English physician who painstakingly collected and analyzed vast amounts of data to convincingly argue that cholera spreads through contaminated drinking water, and not, as was once widely believed, from some kind of air pollution.

This struck me as a powerful example of how mathematics, and in particular, statistics, can impact our lives. How many more would have succumbed, had the true cause remained hidden?

Yet today, probability and statistics seem much maligned. Statistics are worse than "damned lies"; they’re "pliable"; one can "prove anything by statistics except the truth"; they are the means to produce "unreliable facts from reliable figures".

How did this happen? I’m sure these famous quotes were composed mainly in jest, and perhaps referred to shady accounting more than actual calculation. But these days, even the mathematics itself seems suspect:

Statistics is indeed a troubled subject. It turns out some guy named R. A. Fisher is to blame. Fisher had a tragic combination of gifts and flaws that led to today’s erroneous orthodox statistics. (Despite an ever-growing mountain of evidence, Fisher steadfastly refused to believe smoking causes lung cancer. How good could his methods be?)

My undergrad introductory course on probability and statistics followed Fisher’s dogma. As a result, the methods it taught seemed to me more like black magic than mathematics. But I convinced myself that the lecturer only seemed to be teaching superstition because my understanding was too shallow, and I concluded I must have a poor intuition for the subject.

Years later, determined to conquer my weakness in this area, I went back to my textbook. And some other books. I discovered the shocking truth: my textbook was wrong. For once, a crazy conspiracy theory was true, and They really were corrupting us all with Their false mathematics.

Disclaimer

I heard from Fred Ross that this link got posted to Hacker News. I’d like to remind visitors that these notes are meant for my personal use; I’m happy if others read them, but be aware the material is derived from what little I’ve read.

Fred Ross also supplied the following summary which includes suggestions for further reading:

The underlying theory that justifies most inference (Bayesian, minimax, etc.) is decision theory, which is a subset of the theory of games. Savage’s book on the foundations of statistics has a very nice discussion of why this should be. I learned it from Kiefer’s book, which is the only book I know of that starts there. Lehmann and Casella both get to it later in their books.

The justification for the p-value is actually the Neyman-Pearson theory of hypothesis testing. The p-value is the critical value of alpha in that framework. I wrote a couple of expository articles for clinicians going through this if you’re interested.

Jaynes was a wonderful thinker, but be aware that a lot of the rational actor theory breaks down when you don’t have a single utility function. That is true of using classes of prior (see the material towards the end of Berger), or in sequential decision problems (look at prospect theory in psychology, where the overall strategy may have a single utility function, but local decisions along the way can’t be described with one). So the claims in the middle of the 20th century for naturalness of Bayesian reasoning haven’t held up well.
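
To make the p-value/alpha relationship in Fred’s summary concrete, here is a minimal sketch of my own (the one-sided z-test, the function names, and the sample numbers are illustrative assumptions, not anything from Fred’s articles): for a z-test with known variance, the p-value turns out to be the smallest significance level alpha at which the Neyman-Pearson rule rejects the null hypothesis.

# Illustration only: p-value as the critical alpha in a one-sided z-test.
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_statistic(xbar, mu0, sigma, n):
    # Test statistic for H0: mu = mu0 versus H1: mu > mu0, with known sigma.
    return (xbar - mu0) / (sigma / n ** 0.5)

def p_value(z):
    # Observed tail probability P(Z >= z) under H0.
    return 1 - std_normal.cdf(z)

def neyman_pearson_reject(z, alpha):
    # Neyman-Pearson rule: reject H0 when z reaches the critical value z_{1-alpha}.
    return z >= std_normal.inv_cdf(1 - alpha)

if __name__ == "__main__":
    # Hypothetical data: sample mean 10.4 from n = 25 draws, sigma = 1, mu0 = 10.
    z = z_statistic(10.4, 10.0, 1.0, 25)
    print(f"z = {z:.3f}, p-value = {p_value(z):.4f}")
    # Sweeping alpha shows the rejection boundary sits exactly at the p-value.
    for alpha in (0.01, 0.02, 0.05, 0.10):
        print(f"alpha = {alpha:.2f}: reject = {neyman_pearson_reject(z, alpha)}")

With this hypothetical sample, z = 2 and the p-value is about 0.023; the rule rejects at alpha = 0.05 and 0.10 but not at 0.01 or 0.02, so the decision flips precisely at the p-value, which is the sense in which it is the critical alpha.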


Ben Lynn blynn@cs.stanford.edu 💡