Perfect Codes

An interactive tour of Chapter 1 of David MacKay's Information Theory, Inference, and Learning Algorithms.

A picture is worth 1728 bits

For this demonstration, we need a list of bits. Following MacKay, we choose one that represents an image. We pick a 72×24 image of what should perhaps really be called the Hubbard or Douady set:

-- Based on https://sametwice.com/4_line_mandelbrot.
import Control.Arrow (first)
import Control.Monad (replicateM)
import Data.Bits
import Data.List (transpose)
import Data.Word
prec :: Int
prec = 16384
infixl 7 #
-- Fixed-point arithmetic: integers scaled by prec.
x # y = x * y `div` prec
sqAdd (x, y) (a, b) = (a#a - b#b + x, 2*(a#b) + y)
norm (x, y) = x#x + y#y
-- In the set if 30 iterations pass without the norm exceeding 4.
douady p = null . dropWhile (\z -> norm z < 4*prec) . take 30 $
  iterate (sqAdd p) (0, 0)
orig = [fromEnum $ douady (616*x - 2*prec, 1502*y - 18022)
  | y <- [0..23], x <- [0..71]]

chunksOf i ls = map (take i) (go ls) where
  go [] = []
  go l  = l : go (drop i l)
putPic = putStr . unlines . map (concatMap show) . chunksOf 72

putPic orig

Make some noise

Imagine the above image was taken by a deep space satellite and sent back to Earth. The signal suffers greatly from interference due to the vast distance, which we simulate with the help of a PCG pseudo-random number generator:

data PCG = PCG Word64 Word64
pcg a b = PCG (a + b') b' where b' = 2*b + 1

next :: PCG -> (Word, PCG)
next (PCG x inc) = (r, PCG x' inc) where
  -- Advance the state with a linear congruential step.
  x' = 6364136223846793005*x + inc
  -- Output: xorshift the old state, then apply a data-dependent rotation.
  r = (fromIntegral (x `xor` (x `shiftR` 18) `shiftR` 27) :: Word)
    `rotateR` (fromIntegral $ x `shiftR` 59)

fromPCG p = map (fromIntegral . fst) $ tail $ iterate (next . snd) (undefined, p)

rnds = tail $ fromPCG $ pcg 42 54

take 10 rnds

We use the list of generated numbers to disrupt communication. Our model is a binary symmetric channel: the following noise function generates a list of bits, each of which is one with probability f. Zero means we leave the corresponding bit in the signal alone, otherwise we flip it.

noise f = map (\x -> fromEnum $ fromIntegral (x `mod` 32768) <= f * 32768)

take 20 $ noise 0.1 rnds

Thus if about one-tenth of the bits are flipped, we might see:

(#+) = zipWith $ (fromEnum .) . (/=)  -- Pointwise XOR of two bit lists.
send msg f = msg #+ noise f rnds

putPic $ send orig 0.1

Replace 0.1 with 0.4 and run the code again (press Ctrl + Enter or click the play button), and the image is almost unrecognizable. Increasing f to 0.5 produces pure noise, as any received bit has the same chance of representing 0 or 1. As f increases further and approaches 1, the image becomes clearer, except we now see the dual image where every bit is flipped.
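
For instance, at a noise level of 0.95, nearly every bit arrives flipped and the dual image emerges:

putPic $ send orig 0.95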

This last observation reminds me of those logic puzzles where one person always lies. A reliably unreliable source is as useful as a truthteller: just pretend they have a habit of adding "…NOT!" at the end of every utterance.

Majority rules

How can we deal with noise? What’s the first solution that comes to mind?

Restaurant staff solve this problem by repeating your order back to confirm it. However, this brings to mind Segal's Law: if one copy says 0 and the other says 1, which is correct? In a restaurant, the customer can immediately reply to disambiguate, but in general, round-trip communication can be expensive or impossible. For us, error detection is insufficient: we want error correction.

The easiest solution is to repeat each bit 3 times, and if there’s disagreement, we believe the majority, a trick popularized by the science fiction story The Minority Report.

This is a repetition code that we denote by \(R_3\). Repetition codes are as simple as you’d expect: they generalize to any length; they correct errors in a word so long as under half the bits are corrupt; if exactly half the bits of a codeword are flipped then we can detect but not correct the error.

rep n = (replicate n =<<)
unrep n = map majority . chunksOf n where
  majority = fromEnum . (> div n 2) . sum

rep 3 [1,0,1,1]
unrep 3 $ rep 3 [1,0,1,1]

Every binary repetition code of length \(2e + 1\) has two codewords: all 0s and all 1s. Every word differs from one of the codewords by at most \(e\) bits.
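
For example, \(R_5\) shrugs off two flipped bits per codeword; the error pattern below is one I picked by hand:

unrep 5 $ rep 5 [1,0] #+ [0,1,1,0,0, 1,0,0,0,1]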

The following simulates using \(R_3\) to send our image across a binary symmetric channel that flips bits with probability 0.1:

sendR n xs f = unrep n $ send (rep n xs) f

putPic $ sendR 3 orig 0.1

Off By One

Given 4 bits to encode, the (7, 4) Hamming code adds 3 parity bits, which hold the parity of various subsets of the 4 bits. We’ll soon see how the magic numbers 7 and 4 are determined from the number of parity bits via \(7 = 2^3 - 1\) and \(4 = 2^3 - 1 - 3\).

We represent the encoding operation with a matrix \( \newcommand{\GT}{\mathbf{G}^\top} \newcommand{\H}{\mathbf{H}} \newcommand{\w}{\mathbf{w}} \GT\) over \(\mathbb{Z}_2\). Like MacKay, we prefer to work with the transpose of the generator matrix \(\mathbf{G}\), partly so we can get away with writing just one simple matrix-vector multiplication routine. Given a vector \(\mathbf{v}\) of 4 bits, its corresponding codeword is \(\GT \mathbf{v}\).

identityMatrix n = take n $ map (take n) $ iterate (0:) $ 1 : repeat 0
parityMatrix n = transpose $ filter ((> 1) . sum) $ replicateM n [0,1]

gt n = identityMatrix (2^n - n - 1) ++ parityMatrix n

mv m v = dot v <$> m where dot = (((`mod` 2) . sum) .) . zipWith (*)
putMat = putStr . unlines . map (unwords . map show)

putMat $ gt 3

hamming74 = mv (gt 3)
hamming74 [1,0,1,1]

The parity checks seem strange at first: we compute the parity of 3 of the 3-subsets of bits. Why not the fourth one? And how do we choose which one to omit?

It turns out it doesn't matter which one we leave out. All that matters is that the 4 column vectors in the bottom 3 rows of \(\GT\) correspond to the possible ways to pick at least two objects from a set of size 3. This leads to the constants 7 and 4.
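
We can list those column vectors directly; there are \({3 \choose 2} + {3 \choose 3} = 4\) of them:

filter ((> 1) . sum) $ replicateM 3 [0,1]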

The parity-check matrix \(\H\) resembles \(\GT\):

h n = zipWith (++) (parityMatrix n) $ identityMatrix n

pcheck74 = mv (h 3)

Exercise: prove \(\H \w = 0\) exactly when \(\w\) is a codeword.

pcheck74 $ hamming74 [1,0,1,1]
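
We can at least confirm one direction exhaustively: all 16 codewords pass the parity check.

all ((== [0,0,0]) . pcheck74 . hamming74) $ replicateM 4 [0,1]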

Furthermore, if \(\w\) contains exactly one 1, then \(\w\) is one of 7 possibilities, and the syndrome \(\H \w\) is one of the 7 nonzero 3-bit vectors. Setting distinct bits leads to distinct syndromes:

pcheck74 [1,0,0,0,0,0,0]
pcheck74 [0,0,0,1,0,0,0]
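
Indeed, the 7 unit vectors yield the 7 distinct nonzero syndromes, namely the columns of \(\H\):

mv (h 3) <$> identityMatrix 7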

We decode and correct a 7-bit word \(\w\) as follows. If \(\H \w = 0\) then \(\w\) is a valid codeword. If not, then by linearity a lookup table mapping each nonzero syndrome to a weight-1 vector tells us which bit of \(\w\) to flip to produce a \(\w'\) satisfying \(\H \w' = 0\), implying \(\w'\) is a codeword. Then we return the first 4 bits.

ham n = concatMap (mv $ gt n) . chunksOf (2^n - n - 1)
unham n = concatMap (take (nn - n) . correct) . chunksOf nn where
  nn = 2^n - 1
  syns = zip (mv (h n) <$> identityMatrix nn) $ identityMatrix nn
  correct w = maybe w (w #+) $ lookup (mv (h n) w) syns

sendH xs f = unham 3 $ send (ham 3 xs) f

putPic $ sendH orig 0.1

As hinted above, for any given number \(n\) of parity bits, we can construct a \((2^n - 1, 2^n - 1 - n)\) Hamming code, that is:

  • There are \(n\) parity bits.

  • Codewords are \(2^n - 1\) bits long.

  • A codeword represents \(2^n - 1 - n\) bits.

  • Any word is at most one bit-flip away from a codeword.

When \(n = 1\), we get the (1, 0) Hamming code, a degenerate case. Each 1-bit word always decodes as the codeword 0, thus we’re communicating in unary; the only information we really send is the length of a message.

When \(n = 2\), we get the (3, 1) Hamming code, which is a fancy way of saying \(R_3\).
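
Our code agrees: with \(n = 2\), the Hamming encoder is exactly triple repetition.

ham 2 [1,0,1,1] == rep 3 [1,0,1,1]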

Perfect Code

A binary perfect code of length \(n\) is a code where for some \(e\), every \(n\)-bit word differs from exactly one codeword by at most \(e\) bits. This implies different codewords differ by at least \(2e + 1\) bits.

Such a code is perfect in the sense that it is guaranteed to correct up to \(e\) flipped bits. With other codes, a corrupt word may be equally close to distinct codewords. There may be no clear winner.

A little thought shows a necessary condition for a perfect code with \(W\) codewords: the Hamming balls of radius \(e\) around the codewords partition the \(2^n\) possible words, so:

\[ W \sum_{i=0}^e {n \choose i} = 2^n \]

Thus \(W = 2^k\) for some \(k\) and we may write:

\[ \sum_{i=0}^e {n \choose i} = 2^{n-k} \]
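
For instance, the (7, 4) Hamming code has \(n = 7\), \(k = 4\), \(e = 1\), and indeed \({7 \choose 0} + {7 \choose 1} = 8 = 2^{7-4}\).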

The first few rows of Pascal’s triangle are:

pascal = iterate (\xs -> zipWith (+) (0:xs) (xs++[0])) [1]

putMat $ take 8 pascal

We only need half of each row, because \(2e + 1 \le n\). The relevant cumulative sums:

cumu = zipWith take ((`div`2) <$> [1..]) $ scanl1 (+) <$> pascal
putMat $ take 8 cumu

We see some nontrivial powers of 2, and it turns out they correspond to perfect codes, but unfortunately we’ve seen them all before. Since Hamming codes correct up to 1 bit, the powers of 2 in the second position like 4 and 8 are the Hamming codes. And the powers of 2 in the last position like 16 and 64 (and also 4) correspond to repetition codes.

Thus we compute the cumulative sums that correspond to codes that can correct at least 2 errors but fewer than \((n - 1) / 2\) errors.

cumu' = map (drop 2) $ zipWith take ((`div`2) <$> [2..]) $
  scanl1 (+) <$> drop 2 pascal

putMat $ take 12 cumu'

Golay searched further, and found an interesting linear code. We retrace his footsteps. We label each sum with its row and column, and detect powers of 2 using a well-known bit-twiddling trick that works on numbers that fit into machine words:

searchMe = zipWith (map . first . (,)) [2..] $ zip [2..] <$> cumu'  -- ((n, e), sum)

isPow2 n = w .&. (w - 1) == 0 where w = fromInteger n :: Word

filter (isPow2 . snd) $ concat $ take 100 searchMe

Thus in roughly 100 rows, we have two candidates. One is:

\[ {90 \choose 0} + {90 \choose 1} + {90 \choose 2} = 2^{12} \]

Golay starts by supposing a perfect linear code exists with \(n = 90\) and \(e = 2\). Consider the 90-bit words containing exactly one 1, which represent the 90 possible 1-bit errors. For some \(r\), exactly \(r\) of these 90 words have a syndrome containing an odd number of 1s, and the other \(90 - r\) have syndromes with an even number of 1s. By linearity, if a word is the sum of one of the \(r\) words and one of the other \(90 - r\) words, then its syndrome must also contain an odd number of 1s. Moreover, this exactly characterizes the syndromes with an odd number of 1s that correspond to 2-bit errors.

Half of the \(2^{12}\) possible syndromes contain an odd number of 1s, hence \(r + r(90 - r) = 2^{11}\), that is, \(r^2 - 91r + 2048 = 0\). The discriminant \(91^2 - 4 \cdot 2048 = 89\) is not a perfect square, so there is no integer solution: a 90-bit perfect linear code correcting up to 2 bits cannot exist.
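
We can also let Haskell check every conceivable \(r\):

[r | r <- [0..90], r + r*(90 - r) == 2^11]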

What about the other candidate?

\[ {23 \choose 0} + {23 \choose 1} + {23 \choose 2} + {23 \choose 3} = 2^{11} \]

This corresponds to a curious creature known as \(G_{23}\), the binary Golay code, and it is mysteriously connected to other areas of mathematics:

  • The icosahedron.

  • Quadratic residues modulo 23.

  • The (7,4) Hamming code plus an extra bit that records the parity of the other 7: the extended (8,4) Hamming code.

  • The Mathieu groups \(M_{23}\) and \(M_{24}\), which are themselves members of a zoo known as the sporadic groups.

  • The Mogul variant of a mathematical coin-turning game due to Lenstra (called "Turning Turtles" by Berlekamp, Conway, and Guy, Winning Ways, Chapter 14): two players and a row of 24 coins; on your turn you must turn one coin from heads to tails, optionally flipping up to 6 of the coins to its right; if you cannot move, you lose.

It seems easiest to construct the generator matrix of the binary Golay code by taking advantage of its cyclic nature: we repeatedly shift a generator codeword, whose bits are the coefficients of a degree-11 factor of \(x^{23} - 1\) over \(GF(2)\), adding a copy of it when necessary to cancel out 1s in unwanted places.

gen = replicate 11 0 ++ [1,1,0,0,0,1,1,1,0,1,0,1]
ro x = tail x ++ [head x]  -- Rotate left by one.
zero11 v = if v!!11 == 1 then gen #+ v else v  -- Cancel an unwanted 1.
g23GT = transpose $ reverse $ take 12 $ iterate (zero11 . ro) gen
putMat g23GT
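
If the construction succeeded, the top 12 rows form the identity, so the first 12 bits of each codeword spell out the message:

take 12 g23GT == identityMatrix 12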

Coding Contest

Time for a showdown. We simulate a binary symmetric channel with a noise level of 5%, and compare the codes \(R_3 = H(3, 1)\), \(H(7, 4)\), and \(G_{23}\):

g23H = zipWith (++) (drop 12 g23GT) (identityMatrix 11)
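
As a sanity check, each column of our Golay \(\GT\) is a basis codeword, and each should have syndrome zero:

all (== replicate 11 0) $ mv g23H <$> transpose g23GT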

-- All n-bit words containing exactly k ones.
wt n 0 = [replicate n 0]
wt 0 _ = []
wt n k = ((1:) <$> wt (n - 1) (k - 1)) ++ ((0:) <$> wt (n - 1) k)

-- Convert a bit list (least significant bit first) to an integer key,
-- and table the syndromes of all error patterns of weight at most 3.
unbits = sum . zipWith (*) (iterate (2*) 1)
tab23 = zip (unbits . mv g23H <$> errs) errs where
  errs = wt 23 =<< [0..3]

golay = concatMap (mv g23GT) . chunksOf 12
unGolay = concatMap (take 12 . correct) . chunksOf 23 where
  correct w = maybe w (w #+) $ lookup (unbits $ mv g23H w) tab23

sendG xs f = unGolay $ send (golay xs) f

putStrLn "repetition-3:"
putPic $ sendR 3 orig 0.05
putStrLn "Hamming(7, 4):"
putPic $ sendH orig 0.05
putStrLn "Golay:"
putPic $ sendG orig 0.05

Eyeballing it, we see \(R_3\) and \(G_{23}\) seem equally effective, while the Hamming code is noticeably worse, which is to be expected since it can only correct up to one bad bit per 7 bits. However, the rates of \(R_3, H(7,4), G_{23}\) are \(1/3, 4/7, 12/23\) respectively. The first rate is substantially lower than the other two, which differ by less than 0.05, so I’d say Golay wins this round.
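
For the record, the three rates as decimals:

[1/3, 4/7, 12/23] :: [Double]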

The binary Golay code turns out to suit several real-life situations. The extended binary Golay code \(G_{24}\) (we add one parity bit to enable the detection, but not the correction, of 4-bit errors) is used in deep space and radio communication.
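
Here is a minimal sketch of the extension, assuming we simply append the overall parity of each 23-bit codeword; extend and golay24 are names invented here, not MacKay's:

extend w = w ++ [sum w `mod` 2]  -- Make the total number of 1s even.
golay24 = concatMap (extend . mv g23GT) . chunksOf 12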

Perfect is the enemy of good

It turns out we have already described all perfect codes over \(GF(2)\) (binary codes). There are two families, repetition and Hamming, and one outlier hermit, the Golay code. All have extended variants.

What if we consider other fields? Over \(GF(q)\) our constraint generalizes to:

\[ \sum_{i=0}^e {n \choose i}(q - 1)^i = q^{n-k} \]

It so happens that:

\[ {11\choose 0} + {11\choose 1} \cdot 2 + {11\choose 2}\cdot 2^2 = 3^5 \]

and it so happens that there exists a corresponding perfect linear code. It’s called the ternary Golay code, though it was first published by Finnish football pool enthusiast Juhani Virtakallio. As the equation says, this code has a blocklength of 11 and encodes 6 trits (ternary digits) at a time by adding 5 parity checks. It has connections with the Mathieu group \(M_{11}\).
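
Haskell confirms the arithmetic, reusing our Pascal's triangle:

sum (zipWith (*) (take 3 $ pascal !! 11) (iterate (2*) 1)) == 3^5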

Other than the above, there are no perfect codes with new parameters. Over any finite field. Linear or non-linear. (Non-linear perfect codes do exist, but only with the same parameters as some Hamming code.) See Aimo Tietäväinen, On the Nonexistence of Perfect Codes Over Finite Fields.

This is a pity, because as the blocklength increases, the rate of repetition codes approaches 0, and the rate of Hamming codes approaches 1 while still correcting only a single error per block. Shannon's noisy-channel coding theorem says that every noisy channel has a capacity \(C\) such that for any given error probability \(\epsilon > 0\) and rate \(R < C\), there exists a code of some blocklength \(N\) with rate at least \(R\) and probability of block error less than \(\epsilon\).

For example, a binary symmetric channel with noise level \(f\) has capacity:

\[ 1 + f \lg f + (1 - f) \lg (1 - f) \]
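
We can compute this in Haskell (capacity is a name coined here):

capacity f = 1 + f * logBase 2 f + (1 - f) * logBase 2 (1 - f)
capacity 0.1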

When \(f = 0.1\), we have \(C = 0.531…\), so in theory, when the noise level is 10%, there exists a code with a really large blocklength that doesn't even double the size of the message, yet decodes correctly with really high probability. But whatever this code is, it's neither repetition nor Hamming, thus it cannot be perfect. In general, we must work much harder to find good codes.


Ben Lynn blynn@cs.stanford.edu 💡