Answers to Exercises
Exercise 2.1
I believe this question is asking if we can write:
\[ p(C|A+B) = f(p(C|A), p(C|B), p(C|AB)) \]
for some function \(f\). There is probably an easy proof that this cannot be done, though perhaps not with only the material given in the book, because it’s missing some Boolean identities(!), notably the absorption identities.
In any case, we exhibit a counterexample to show that there is no way to consistently extend existing theory to obtain such an equation. Suppose \(C = A\), \(p(A|B) = 0.01\) and \(p(A|\bar{B}) = 1\). Then we have:
\[ p(A|A+B) = f(1, 0.01, 1) \]
This implies \(p(A|A+B)\) is constant whether \(B\) is likely or not, which is absurd.
For example, \(B\) is "it’s raining" and \(A\) is "we visit the park", that is, if it’s sunny we’ll definitely visit the park, but if it rains, then we only go with probability 0.01.
If it hardly ever rains, then on being told “it’s raining or we’re at the park”, I’ll believe it’s likely we’re at the park.
If it hardly ever stops raining, then on being told the same thing, I’ll believe it’s unlikely we’re at the park.
As \(p(A|B)\) approaches 0 and the climate approaches either extreme, we would be saying that \(f(1, 0, 1)\) approaches 0 and also approaches 1.
[Jaynes writes terms such as \(p(A|BX)\), where \(X\) can be said to represent all our background knowledge. I’ve been omitting \(X\) for brevity, but this may have made the above less clear. In the above example, \(X\) represents "it hardly rains" or "it hardly stops raining".]
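To put numbers on the counterexample, here is a standalone sketch (with \(\beta = p(B)\), the prior probability of rain, a symbol I introduce just for this check):

-- p(A|A+B) as a function of beta = p(B), with p(A|B) = 0.01, p(A|~B) = 1.
-- The arguments of f stay fixed at (1, 0.01, 1) throughout.
pAGivenAorB :: Double -> Double
pAGivenAorB beta = pA / pAorB
  where
  pAB   = 0.01 * beta       -- p(AB)  = p(A|B) p(B)
  pA    = pAB + (1 - beta)  -- p(A)   = p(A|B) p(B) + p(A|~B) p(~B)
  pAorB = pA + beta - pAB   -- p(A+B) = p(A) + p(B) - p(AB)

main :: IO ()
main = mapM_ (print . pAGivenAorB) [0.001, 0.999]
-- prints roughly 0.999 (it hardly rains) and 0.011 (it hardly ever stops)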
Exercise 2.2
Take the product rule:
\[ p(AB|C) = p(A|C) p(B|AC) \]
Substitute \(B=C, C=X, A=A_1+…+A_n\):
\[ p((A_1+…+A_n)C|X) = p(A_1+…+A_n|X) p(C|(A_1+…+A_n)X) \]
Repeated applications of the sum rule and mutual exclusivity gives:
\[ p(A_1+…+A_n|X) = p(A_1|X) + … + p(A_n|X) \]
and:
\[ p((A_1+…+A_n)C|X) = p(A_1C|X) + … + p(A_nC|X) \]
Therefore:
\[ \sum_i p(A_iC|X) = \sum_i p(A_i|X) p(C|(A_1+…+A_n)X) \]
Applying the product rule \(n\) times on the left-hand side yields:
\[ \sum_i p(A_i|X) p(C|A_iX) = \sum_i p(A_i|X) p(C|(A_1+…+A_n)X) \]
Dividing both sides by \(\sum_i p(A_i|X)\) then exhibits \(p(C|(A_1+…+A_n)X)\) as the weighted average:
\[ p(C|(A_1+…+A_n)X) = \frac{\sum_i p(A_i|X) p(C|A_iX)}{\sum_i p(A_i|X)} \]
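As a sanity check on the identity, here is a standalone sketch (the toy space, weights, and names are my own invention) that builds a small finite distribution, takes three mutually exclusive events, and compares both sides:

import Data.List (intersect)

-- A tiny finite probability space: outcomes 0..11 with unnormalised weights.
outcomes :: [Int]
outcomes = [0 .. 11]

weight :: Int -> Double
weight i = fromIntegral (i + 1)

p :: [Int] -> Double
p e = sum (map weight e) / sum (map weight outcomes)

pGiven :: [Int] -> [Int] -> Double   -- p(E|F)
pGiven e f = p (e `intersect` f) / p f

main :: IO ()
main = do
  let as  = [[0 .. 3], [4 .. 7], [8, 9]]   -- mutually exclusive A_1, A_2, A_3
      c   = filter even outcomes           -- an arbitrary C
      lhs = pGiven c (concat as)           -- p(C|A_1+A_2+A_3); concat is the union since the A_i are disjoint
      rhs = sum [p a * pGiven c a | a <- as] / sum (map p as)
  print (lhs, rhs)                         -- the two agree (up to rounding)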
Exercise 2.3
From the product rule, writing \(a = P(A|C)\) and \(b = P(B|C)\):
\[ P(AB|C) = P(A|C) P(B|AC) = a P(B|AC) \le a \]
since \(P(B|AC) \le 1\).
From the sum rule:
\[ P(A+B|C) = b + a - P(AB|C) \ge b + a - a = b \]
since \(P(AB|C) \le a\) from above.
Let us elide the "\(|C\)" below.
If \(a+b>1\) then:
\[ P(A+B) = a + b - P(AB) > 1 - P(AB) \]
In other words, \(P(AB) > 1 - P(A+B)\). Since \(P(A+B) \le 1\), we have \(P(AB) > 0\).
If \(a+b<1\) then \(P(A+B) < 1 - P(AB) \le 1\) since \(P(AB) \ge 0\).
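All four claims are easy to spot-check by machine. The standalone sketch below sweeps exact rational joint distributions over \(A\) and \(B\) (eliding \(C\) as above; the grid is my own choice) and confirms each one:

import Data.Ratio ((%))

-- Sweep exact joint probabilities P(AB), P(A ~B), P(~A B) over a grid
-- and check every inequality derived above; prints True.
main :: IO ()
main = print . and $
  [ pab <= a && aOrB >= b
      && (a + b <= 1 || pab > 0)   -- a+b > 1 implies P(AB) > 0
      && (a + b >= 1 || aOrB < 1)  -- a+b < 1 implies P(A+B) < 1
  | pab <- grid, pa' <- grid, pb' <- grid
  , pab + pa' + pb' <= 1           -- a valid joint distribution
  , let a    = pab + pa'           -- P(A)  = P(AB) + P(A ~B)
        b    = pab + pb'           -- P(B)  = P(AB) + P(~A B)
        aOrB = pab + pa' + pb'     -- P(A+B), by the sum rule
  ]
  where
  grid = [0, 1 % 20 .. 1] :: [Rational]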
Exercise 3.1
The hint says it all. A multiplicity factor of \(n!\) would count e.g. \(R_1 R_2 W_1 W_2 W_3\) and \(R_2 R_1 W_3 W_1 W_2\) as distinct cases. To correct for this, we must divide by the number of permutations of the \(r\) red balls, and also by the number of permutations of the \(w\) white balls, which leads to \(n!/(r!w!) = {n\choose r}\).
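To see the correction concretely, this standalone sketch (a tiny example of my own) generates all \(5!\) orderings of two labelled red and three labelled white balls, then counts the distinct colour sequences that remain once the labels are erased:

import Data.List (nub, permutations)

-- Two labelled red balls and three labelled white balls.
balls :: [(Char, Int)]
balls = [('R', 1), ('R', 2), ('W', 1), ('W', 2), ('W', 3)]

main :: IO ()
main = do
  let orderings = permutations balls      -- all 5! = 120 labelled orderings
      colours   = map (map fst) orderings -- erase the labels
  print . length $ nub colours            -- 10 = 5!/(2!3!) = C(5,2)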
Exercise 3.2
For the general case, we consider compositions of \(m\) into \(k\) parts, that is, all tuples of positive integers \((m_1, …, m_k)\) such that \(m_1 + … + m_k = m\). Then the probability that we have at least one ball of each colour after \(m\) draws is:
\[ \frac{1}{N\choose m} \sum_{\{m_i\}} \prod_i {N_i \choose m_i} \]
where the sum is over all compositions of \(m\) into \(k\) parts.
Alternatively, we can use inclusion-exclusion: we start with all subsets of size \(m\), then for each colour, subtract those missing that colour, then for each pair of colours, add those missing those two colours, and so on.
\[ \frac{ {N\choose m} - \sum_i {{N - N_i} \choose m} + \sum_{i,j} {{N - N_i - N_j} \choose m} - \sum_{i,j,k} {{N - N_i - N_j - N_k} \choose m} + …}{N\choose m} \]
where the indices of each summation must be distinct.
We may also consider ordered draws instead, that is, multiply each binomial coefficient above by \(m!\), which gives a formula involving falling factorials instead of binomial coefficients:
\[ \frac{1}{N^{\underline{m}}}( N^{\underline{m}} - \sum_i (N-N_i)^{\underline{m}} + \sum_{i,j} (N-N_i-N_j)^{\underline{m}} - \sum_{i,j,k} (N-N_i-N_j-N_k)^{\underline{m}} + …) \]
Here, \(n^\underline{k}\) denotes the \(k\)th falling factorial of \(n\).
If the \(N_i\) are all equal, so that each \(N_i = N/k\), then the above simplifies to:
\[ \frac{1}{N^{\underline{m}}} \sum_{i=0}^{k} (-1)^i {k\choose i} (N - i N/k)^{\underline{m}} \]
For \(N = 50\), \(k = 5\), this first exceeds 90% at \(m = 15\).
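This is quick to confirm with the simplified formula; in the standalone sketch below, the helper names are mine:

-- Falling factorial: n (n - 1) ... (n - m + 1).
ff :: Double -> Int -> Double
ff n m = product [n - fromIntegral j | j <- [0 .. m - 1]]

choose :: Int -> Int -> Double
choose n k = product [fromIntegral (n - j) / fromIntegral (k - j) | j <- [0 .. k - 1]]

-- Probability that m draws from nBalls balls in k equal colour classes
-- contain every colour, via the simplified inclusion-exclusion sum above.
allColours :: Int -> Int -> Int -> Double
allColours nBalls k m = sum terms / ff n m
  where
  n = fromIntegral nBalls
  terms = [ (-1) ^ i * choose k i * ff (n - fromIntegral i * n / fromIntegral k) m
          | i <- [0 .. k] ]

main :: IO ()
main = mapM_ (\m -> print (m, allColours 50 5 m)) [14, 15]
-- prints roughly (14, 0.877) and (15, 0.911)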
Exercise 3.4
A well-known result about derangements.
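For reference, here is a minimal standalone sketch of the standard derangement computation (the recurrence is classical; the framing as code is mine): the subfactorial \(!n\) counts permutations with no fixed point, and \(!n/n!\) converges rapidly to \(1/e\).

-- Subfactorial: the number of derangements of n items,
-- via the classical recurrence !n = (n - 1)(!(n-1) + !(n-2)).
subfactorial :: Int -> Integer
subfactorial 0 = 1
subfactorial 1 = 0
subfactorial n = fromIntegral (n - 1) * (subfactorial (n - 1) + subfactorial (n - 2))

factorial :: Int -> Integer
factorial n = product [1 .. fromIntegral n]

main :: IO ()
main = mapM_ report [1 .. 10]
  where
  report n = print (n, fromIntegral (subfactorial n) / fromIntegral (factorial n) :: Double)
-- the ratios !n/n! approach 1/e ~ 0.3679 rapidly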
Exercise 3.5
Inclusion-exclusion gives:
\[ \frac{1}{M^N} \sum_i (-1)^i {M\choose i} (M-i)^N \]
(We count all possibilities, then for each urn, subtract the number of cases where that urn is empty, then for each pair of urns, add the number of cases where those two urns are empty, and so on.)
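The standalone sketch below (toy sizes chosen by me) evaluates this formula and cross-checks it by enumerating all \(M^N\) assignments:

import Control.Monad (replicateM)
import Data.List (nub)

-- Probability that no urn is empty when n balls go into m urns,
-- via the inclusion-exclusion sum above.
noneEmpty :: Int -> Int -> Double
noneEmpty m n = sum [ (-1) ^ i * choose m i * fromIntegral (m - i) ^ n
                    | i <- [0 .. m] ] / fromIntegral m ^ n
  where
  choose a b = product [fromIntegral (a - j) / fromIntegral (b - j) | j <- [0 .. b - 1]]

-- Brute force: enumerate all m^n assignments of balls to urns.
noneEmptyBrute :: Int -> Int -> Double
noneEmptyBrute m n = fromIntegral good / fromIntegral (m ^ n)
  where
  good = length [ () | urns <- replicateM n [1 .. m], length (nub urns) == m ]

main :: IO ()
main = print (noneEmpty 3 5, noneEmptyBrute 3 5)  -- both 150/243 ~ 0.617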
HYPERGEO.BAS

This Haskell program produces Tables 3.1, 3.2, and 3.3.

import Data.List (intercalate)

-- ch n k computes the binomial coefficient "n choose k", allowing real n.
ch :: Double -> Int -> Double
ch _ 0 = 1
ch n k = n * ch (n - 1) (k - 1) / fromIntegral k

-- table nn mm n tabulates the hypergeometric probability h(r|nn,mm,n)
-- and its cumulative sum for r = 0..10.
table :: Double -> Double -> Int -> String
table nn mm n = unlines $ caption : fst (foldl next (id, 0) [0..10]) []
  where
  caption = intercalate ", " [ "N = " ++ show nn
                             , "M = " ++ show mm
                             , "n = " ++ show n ]
  next (f, t) r = (f . (unwords [show r, show h, show (t + h)] :), t + h)
    where h = ch mm r * ch (nn - mm) (n - r) / ch nn n

main :: IO ()
main = do
  putStrLn $ table 100 10 50  -- Table 3.1.
  putStrLn $ table 100 50 10  -- Table 3.2; identical to Table 3.1.
  putStrLn $ table 99 50 10   -- Table 3.3.