Efficient Binomial Coefficients

For a natural number n, consider the set:

\[ \{ (i \text{ OR } j, i \text{ AND } j) | i \in [0..n], j \in [0..n] \} \]

What does this describe? Let’s plot it:

f n = [(i .|. j, i .&. j) | i <- [0..n], j <- [0..n]]
plot n = unlines [[bool ' ' '*' $ (r, c) `elem` f n |
  c <- [0..n]] | r <- [0..n]]

putStr $ plot (20 :: Int)

As a kid I came across a BASIC programming book that mentioned generating art from Pascal’s triangle. The above precisely matches the picture created by marking all odd numbers in Pascal’s triangle:

choose n 0 = 1
choose n k = n * choose (n - 1) (k - 1) `div` k

plot' n = unlines [[(" *"!!) $ (`mod` 2) $ choose r c |
   c <- [0..n]] | r <- [0..n]]

plot (20 :: Int) == plot' (20 :: Int)

[For memory, I once tried to plot one pixel per number. I was unaware of modular arithmetic at the time, so integer overflow quickly disfigured the output!]

This startling equivalence can be stated as follows: the number \(n\choose k\) is even if and only if the subtraction \(n - k\) in binary requires at least one borrow. More precisely, if we write:

\[ {n\choose k} = 2^r s \]

with \(s\) odd, then it turns out \(r\) is exactly the number of borrows in the subtraction \(n - k\) in binary.

More generally, the largest power of a prime \(p\) dividing \({n\choose k}\) (the \(p\)-adic valuation) is equal to the number of borrows in the subtraction \(n - k\) in base \(p\).

Let’s prove it. The power of \(p\) in \(n!\) is:

pfac p n = sum [n `div` q | q <- takeWhile (<= n) $ iterate (p*) p]

-- The highest power of 3 dividing 10! is 3^4.
pfac 3 10

Thus the power of \(p\) in:

\[ {n \choose k} = \frac{n!}{k!(n - k)!} \]

is given by:

f p n k = sum [n `div` q - k `div` q - (n - k) `div` q
  | q <- takeWhile (<= n) $ iterate (p*) p]

-- 10 choose 5 = 252 = 2^2 * 3^2 * 7.
f 2 10 5
f 3 10 5
f 5 10 5
f 7 10 5

Next we use:

(n `div` p) `div` p^i == n `div` p^(i+1)

to write a recursive definition of f:

f p n 0 = 0
f p n k | cn >= ck  =     f p n'       k'
        | bn /= bk  = 1 + f p n'       k'
        | bn /= 0   = 1 + f p (n' - 1) k'
        | otherwise = 1 + f p n'       (k' + 1)
        where
          (n', cn) = n  `divMod` p
          bn       = n' `mod` p
          (k', ck) = k  `divMod` p
          bk       = k' `mod` p
f 2 10 5
f 3 10 5
f 5 10 5
f 7 10 5

The number of borrows in base p satisfies the same recursive relations, proving the result.

We can handle a few special cases to speed up the the function:

f p n 0 = 0
f p n k | k > n - k = f p n (n - k)
        | p > n - k = 1
        | 2 * p > n = 0
        -- In this case, there are at most 2 non-zero digits:
        | p * p > n = fromEnum $ cn < ck
        | cn >= ck  =     f p n'       k'
        | bn /= bk  = 1 + f p n'       k'
        | bn /= 0   = 1 + f p (n' - 1) k'
        | otherwise = 1 + f p n'       (k' + 1)
        where (n', cn) = n  `divMod` p
              bn       = n' `mod` p
              (k', ck) = k  `divMod` p
              bk       = k' `mod` p
f 2 10 5
f 3 10 5
f 5 10 5
f 7 10 5

In Goetgheluck’s article, we have a loop instead of a recursion, and the special cases are only checked before entering the loop. Hopefully this difference is negligible.

We now know how to figure out the exponents of each prime, but we still need the primes themselves to compute the answers. For this demo, we use a naive but succinct algorithm to find all primes:

primes = sieve [2..] where sieve (p:x) = p : sieve [n | n <- x, n `mod` p /= 0]
choose n k = product $ map (\p -> p^f p n k) $ takeWhile (<= n) primes
choose 1000 353

On my laptop, the naive algorithm still wins for small examples like the above, which were considered large examples when the original article was published! We need bigger inputs to enjoy the fruits of our labour. The following program, which uses the primes package, took under 2 seconds to compute choose 1000000 353000, while the naive algorithm took over 15 seconds.

import Data.Numbers.Primes

f p n 0 = 0
f p n k | k > n - k = f p n (n - k)
        | p > n - k = 1
        | 2 * p > n = 0
        | p * p > n = fromEnum $ cn < ck
        | cn >= ck  =     f p n'       k'
        | bn /= bk  = 1 + f p n'       k'
        | bn /= 0   = 1 + f p (n' - 1) k'
        | otherwise = 1 + f p n'       (k' + 1)
        where (n', cn) = n  `divMod` p
              bn       = n' `mod` p
              (k', ck) = k  `divMod` p
              bk       = k' `mod` p

choose n k = product $ map (\p -> p^f p n k) $ takeWhile (<= n) primes

main = print $ choose 1000000 353000

The exact-combinatorics package contains a benchmarked, tested, and better-optimized implementation of this algorithm.


Ben Lynn blynn@cs.stanford.edu 💡