Perfect Codes

An interactive tour of Chapter 1 of David MacKay's Information Theory, Inference, and Learning Algorithms.

A picture is worth 1728 bits

For this demonstration, we need a list of bits. Following MacKay, we choose one that represents an image. We pick a 72×24 image of what should perhaps really be called the Hubbard or Douady set:

-- Based on https://sametwice.com/4_line_mandelbrot.
import Control.Arrow (first)
import Control.Monad (replicateM)
import Data.Bits
import Data.List (transpose)
import Data.Word
prec :: Int
prec = 16384
infixl 7 #
-- Fixed-point arithmetic: integers scaled by prec.
x # y = x * y `div` prec
sqAdd (x, y) (a, b) = (a#a - b#b + x, 2*(a#b) + y)
norm (x, y) = x#x + y#y
-- In the set if 30 iterations pass without the norm exceeding 4.
douady p = null . dropWhile (\z -> norm z < 4*prec) . take 30 $
  iterate (sqAdd p) (0, 0)
orig = [fromEnum $ douady (616*x - 2*prec, 1502*y - 18022)
  | y <- [0..23], x <- [0..71]]

chunksOf i ls = map (take i) (go ls) where
  go [] = []
  go l  = l : go (drop i l)
putPic = putStr . unlines . map (concatMap show) . chunksOf 72

putPic orig

Make some noise

Imagine the above image was taken by a deep space satellite and sent back to Earth. The signal suffers greatly from interference due to the vast distance, which we simulate with the help of a PCG pseudo-random number generator:

data PCG = PCG Word64 Word64
pcg a b = PCG (a + b') b' where b' = 2*b + 1

next :: PCG -> (Word, PCG)
next (PCG x inc) = (r, PCG x' inc) where
  -- Advance the state with a linear congruential step.
  x' = 6364136223846793005*x + inc
  -- Output: xorshift the old state, then apply a data-dependent rotation.
  r = (fromIntegral (x `xor` (x `shiftR` 18) `shiftR` 27) :: Word)
    `rotateR` (fromIntegral $ x `shiftR` 59)

fromPCG p = map (fromIntegral . fst) $ tail $ iterate (next . snd) (undefined, p)

rnds = tail $ fromPCG $ pcg 42 54

take 10 rnds

We use the list of generated numbers to disrupt communication. Our model is a binary symmetric channel: the following noise function generates a list of bits, each of which is one with probability f. Zero means we leave the corresponding bit in the signal alone, otherwise we flip it.

noise f = map (\x -> fromEnum $ fromIntegral (x `mod` 32768) <= f * 32768)

take 20 $ noise 0.1 rnds

Thus if about one-tenth of the bits are flipped, we might see:

(#+) = zipWith $ (fromEnum .) . (/=)  -- Pointwise XOR of two bit lists.
send msg f = msg #+ noise f rnds

putPic $ send orig 0.1

Replace 0.1 with 0.4 and run the code again (press Ctrl + Enter or click the play button), and the image is almost unrecognizable. Increasing f to 0.5 produces pure noise, as any received bit has the same chance of representing 0 or 1. As f increases further and approaches 1, the image becomes clearer, except we now see the dual image where every bit is flipped.
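
For instance, at a noise level of 0.95, nearly every bit arrives flipped and the dual image emerges:

putPic $ send orig 0.95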

This last observation reminds me of those logic puzzles where one person always lies. A reliably unreliable source is as useful as a truthteller: just pretend they have a habit of adding "…NOT!" at the end of every utterance.

Majority rules

How can we deal with noise? What’s the first solution that comes to mind?

Restaurant staff solve this problem by repeating your order back to confirm it. However, this brings to mind Segal's Law: if one copy says 0 and the other says 1, which is correct? In a restaurant, the customer can immediately reply to disambiguate, but in general, round-trip communication can be expensive or impossible. For us, error detection is insufficient: we want error correction.

The easiest solution is to repeat each bit 3 times, and if there’s disagreement, we believe the majority, a trick popularized by the science fiction story The Minority Report.

This is a repetition code that we denote by \(R_3\). Repetition codes are as simple as you’d expect: they generalize to any length; they correct errors in a word so long as under half the bits are corrupt; if exactly half the bits of a codeword are flipped then we can detect but not correct the error.

rep n = (replicate n =<<)
unrep n = map majority . chunksOf n where
  majority = fromEnum . (> div n 2) . sum

rep 3 [1,0,1,1]
unrep 3 $ rep 3 [1,0,1,1]

Every binary repetition code of length \(2e + 1\) has two codewords: all 0s and all 1s. Every word differs from one of the codewords by at most \(e\) bits.
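
For example, \(R_5\) shrugs off two flipped bits per codeword; the error pattern below is one I picked by hand:

unrep 5 $ rep 5 [1,0] #+ [0,1,1,0,0, 1,0,0,0,1]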

The following simulates using \(R_3\) to send our image across a binary symmetric channel that flips bits with probability 0.1:

sendR n xs f = unrep n $ send (rep n xs) f

putPic $ sendR 3 orig 0.1

Off By One

Given 4 bits to encode, the (7, 4) Hamming code adds 3 parity bits, which hold the parity of various subsets of the 4 bits. We’ll soon see how the magic numbers 7 and 4 are determined from the number of parity bits via \(7 = 2^3 - 1\) and \(4 = 2^3 - 1 - 3\).

We represent the encoding operation with a matrix \( \newcommand{\GT}{\mathbf{G}^\top} \newcommand{\H}{\mathbf{H}} \newcommand{\w}{\mathbf{w}} \GT\) over \(\mathbb{Z}_2\). Like MacKay, we prefer to work with the transpose of the generator matrix \(\mathbf{G}\), partly so we can get away with writing just one simple matrix-vector multiplication routine. Given a vector \(\mathbf{v}\) of 4 bits, its corresponding codeword is \(\GT \mathbf{v}\).

identityMatrix n = take n $ map (take n) $ iterate (0:) $ 1 : repeat 0
parityMatrix n = transpose $ filter ((> 1) . sum) $ replicateM n [0,1]

gt n = identityMatrix (2^n - n - 1) ++ parityMatrix n

mv m v = dot v <$> m where dot = (((`mod` 2) . sum) .) . zipWith (*)
putMat = putStr . unlines . map (unwords . map show)

putMat $ gt 3

hamming74 = mv (gt 3)
hamming74 [1,0,1,1]

The parity checks seem strange at first: we compute the parity of 3 of the 3-subsets of bits. Why not the fourth one? And how do we choose which one to omit?

It turns out it doesn't matter which one we leave out. All that matters is that the 4 column vectors in the bottom 3 rows of \(\GT\) correspond to the possible ways to pick at least two objects from a set of size 3. This leads to the constants 7 and 4.
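
We can list those column vectors directly; there are \({3 \choose 2} + {3 \choose 3} = 4\) of them:

filter ((> 1) . sum) $ replicateM 3 [0,1]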

The parity-check matrix \(\H\) resembles \(\GT\):

h n = zipWith (++) (parityMatrix n) $ identityMatrix n

pcheck74 = mv (h 3)

Exercise: prove \(\H \w = 0\) exactly when \(\w\) is a codeword.

pcheck74 $ hamming74 [1,0,1,1]
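
We can at least confirm one direction exhaustively: all 16 codewords pass the parity check.

all ((== [0,0,0]) . pcheck74 . hamming74) $ replicateM 4 [0,1]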

Furthermore, if \(\w\) contains exactly one 1, then \(\w\) is one of 7 possibilities, and the syndrome \(\H \w\) is one of the 7 nonzero 3-bit vectors. Setting distinct bits leads to distinct syndromes:

pcheck74 [1,0,0,0,0,0,0]
pcheck74 [0,0,0,1,0,0,0]
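
Indeed, the 7 unit vectors yield the 7 distinct nonzero syndromes, namely the columns of \(\H\):

mv (h 3) <$> identityMatrix 7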

We decode and correct a 7-bit word \(\w\) as follows. If \(\H \w = 0\) then \(\w\) is a valid codeword. If not, then by linearity a lookup table mapping each nonzero syndrome to a weight-1 vector tells us which bit of \(\w\) to flip to produce a \(\w'\) satisfying \(\H \w' = 0\), implying \(\w'\) is a codeword. Then we return the first 4 bits.

ham n = concatMap (mv $ gt n) . chunksOf (2^n - n - 1)
unham n = concatMap (take (nn - n) . correct) . chunksOf nn where
  nn = 2^n - 1
  syns = zip (mv (h n) <$> identityMatrix nn) $ identityMatrix nn
  correct w = maybe w (w #+) $ lookup (mv (h n) w) syns

sendH xs f = unham 3 $ send (ham 3 xs) f

putPic $ sendH orig 0.1

As hinted above, for any given number \(n\) of parity bits, we can construct a \((2^n - 1, 2^n - 1 - n)\) Hamming code, that is:

  • There are \(n\) parity bits.

  • Codewords are \(2^n - 1\) bits long.

  • A codeword represents \(2^n - 1 - n\) bits.

  • Any word is at most one bit-flip away from a codeword.

When \(n = 1\), we get the (1, 0) Hamming code, a degenerate case. Each 1-bit word always decodes as the codeword 0, thus we’re communicating in unary; the only information we really send is the length of a message.

When \(n = 2\), we get the (3, 1) Hamming code, which is a fancy way of saying \(R_3\).
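
Our code agrees: with \(n = 2\), the Hamming encoder is exactly triple repetition.

ham 2 [1,0,1,1] == rep 3 [1,0,1,1]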

Perfect Code

A binary perfect code of length \(n\) is a code where for some \(e\), every \(n\)-bit word differs from exactly one codeword by at most \(e\) bits. This implies different codewords differ by at least \(2e + 1\) bits.

Such a code is perfect in the sense that it is guaranteed to correct up to \(e\) flipped bits. With other codes, a corrupt word may be equally close to distinct codewords. There may be no clear winner.

A little thought shows a necessary condition for a perfect code with \(W\) codewords: the Hamming balls of radius \(e\) around the codewords partition the \(2^n\) possible words, so:

\[ W \sum_{i=0}^e {n \choose i} = 2^n \]

Thus \(W = 2^k\) for some \(k\) and we may write:

\[ \sum_{i=0}^e {n \choose i} = 2^{n-k} \]
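
For instance, the (7, 4) Hamming code has \(n = 7\), \(k = 4\), \(e = 1\), and indeed \({7 \choose 0} + {7 \choose 1} = 8 = 2^{7-4}\).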

The first few rows of Pascal’s triangle are:

pascal = iterate (\xs -> zipWith (+) (0:xs) (xs++[0])) [1]

putMat $ take 8 pascal

We only need half of each row, because \(2e + 1 \le n\). The relevant cumulative sums:

cumu = zipWith take ((`div`2) <$> [1..]) $ scanl1 (+) <$> pascal
putMat $ take 8 cumu

We see some nontrivial powers of 2, and it turns out they correspond to perfect codes, but unfortunately we’ve seen them all before. Since Hamming codes correct up to 1 bit, the powers of 2 in the second position like 4 and 8 are the Hamming codes. And the powers of 2 in the last position like 16 and 64 (and also 4) correspond to repetition codes.

Thus we compute the cumulative sums that correspond to codes that can correct at least 2 errors but fewer than \((n - 1) / 2\) errors.

cumu' = map (drop 2) $ zipWith take ((`div`2) <$> [2..]) $
  scanl1 (+) <$> drop 2 pascal

putMat $ take 12 cumu'

Golay searched further, and found an interesting linear code. We retrace his footsteps. We label each sum with its row and column, and detect powers of 2 using a well-known bit-twiddling trick that works on numbers that fit into machine words:

searchMe = zipWith (map . first . (,)) [2..] $ zip [2..] <$> cumu'  -- ((n, e), sum)

isPow2 n = w .&. (w - 1) == 0 where w = fromInteger n :: Word

filter (isPow2 . snd) $ concat $ take 100 searchMe

Thus in roughly 100 rows, we have two candidates. One is:

\[ {90 \choose 0} + {90 \choose 1} + {90 \choose 2} = 2^{12} \]

Golay starts by supposing a perfect linear code exists with \(n = 90\) and \(e = 2\). Consider the 90-bit words containing exactly one 1, which represent the 90 possible 1-bit errors. For some \(r\), exactly \(r\) of these 90 words have a syndrome containing an odd number of 1s, and the other \(90 - r\) have syndromes with an even number of 1s. By linearity, if a word is the sum of one of the \(r\) words and one of the other \(90 - r\) words, then its syndrome must also contain an odd number of 1s. Moreover, this exactly characterizes the syndromes with an odd number of 1s that correspond to 2-bit errors.

Half of the \(2^{12}\) possible syndromes contain an odd number of 1s, hence \(r + r(90 - r) = 2^{11}\), that is, \(r^2 - 91r + 2048 = 0\). The discriminant \(91^2 - 4 \cdot 2048 = 89\) is not a perfect square, so there is no integer solution: a 90-bit perfect linear code correcting up to 2 bits cannot exist.
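
We can also let Haskell check every conceivable \(r\):

[r | r <- [0..90], r + r*(90 - r) == 2^11]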

What about the other candidate?

\[ {23 \choose 0} + {23 \choose 1} + {23 \choose 2} + {23 \choose 3} = 2^{11} \]

This corresponds to a curious creature known as \(G_{23}\), the binary Golay code, and it is mysteriously connected to other areas of mathematics:

  • The icosahedron.

  • Quadratic residues modulo 23.

  • The (7,4) Hamming code plus an extra bit that records the parity of the other 7: the extended (8,4) Hamming code.

  • The Mathieu groups \(M_{23}\) and \(M_{24}\), which are themselves members of a zoo known as the sporadic groups.

  • The Mogul variant of a mathematical coin-turning game due to Lenstra (called "Turning Turtles" by Berlekamp, Conway, and Guy, Winning Ways, Chapter 14): two players and a row of 24 coins; on your turn you must turn one coin from heads to tails, optionally flipping up to 6 of the coins to its right; if you cannot move, you lose.

It seems easiest to construct the generator matrix of the binary Golay code by taking advantage of its cyclic nature: we repeatedly shift a generator codeword, whose bits are the coefficients of a degree-11 factor of \(x^{23} - 1\) over \(GF(2)\), adding a copy of it when necessary to cancel out 1s in unwanted places.

gen = replicate 11 0 ++ [1,1,0,0,0,1,1,1,0,1,0,1]
ro x = tail x ++ [head x]  -- Rotate left by one.
zero11 v = if v!!11 == 1 then gen #+ v else v  -- Cancel an unwanted 1.
g23GT = transpose $ reverse $ take 12 $ iterate (zero11 . ro) gen
putMat g23GT
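
If the construction succeeded, the top 12 rows form the identity, so the first 12 bits of each codeword spell out the message:

take 12 g23GT == identityMatrix 12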

Coding Contest

Time for a showdown. We simulate a binary symmetric channel with a noise level of 5%, and compare the codes \(R_3 = H(3, 1)\), \(H(7, 4)\), and \(G_{23}\):

g23H = zipWith (++) (drop 12 g23GT) (identityMatrix 11)
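
As a sanity check, each column of our Golay \(\GT\) is a basis codeword, and each should have syndrome zero:

all (== replicate 11 0) $ mv g23H <$> transpose g23GT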

-- All n-bit words containing exactly k ones.
wt n 0 = [replicate n 0]
wt 0 _ = []
wt n k = ((1:) <$> wt (n - 1) (k - 1)) ++ ((0:) <$> wt (n - 1) k)

-- Convert a bit list (least significant bit first) to an integer key,
-- and table the syndromes of all error patterns of weight at most 3.
unbits = sum . zipWith (*) (iterate (2*) 1)
tab23 = zip (unbits . mv g23H <$> errs) errs where
  errs = wt 23 =<< [0..3]

golay = concatMap (mv g23GT) . chunksOf 12
unGolay = concatMap (take 12 . correct) . chunksOf 23 where
  correct w = maybe w (w #+) $ lookup (unbits $ mv g23H w) tab23

sendG xs f = unGolay $ send (golay xs) f

putStrLn "repetition-3:"
putPic $ sendR 3 orig 0.05
putStrLn "Hamming(7, 4):"
putPic $ sendH orig 0.05
putStrLn "Golay:"
putPic $ sendG orig 0.05

Eyeballing it, we see \(R_3\) and \(G_{23}\) seem equally effective, while the Hamming code is noticeably worse, which is to be expected since it can only correct up to one bad bit per 7 bits. However, the rates of \(R_3, H(7,4), G_{23}\) are \(1/3, 4/7, 12/23\) respectively. The first rate is substantially lower than the other two, which differ by less than 0.05, so I’d say Golay wins this round.
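
For the record, the three rates as decimals:

[1/3, 4/7, 12/23] :: [Double]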

The binary Golay code turns out to suit several real-life situations. The extended binary Golay code \(G_{24}\) (we add one parity bit to enable the detection, but not the correction, of 4-bit errors) is used in deep space and radio communication.
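
Here is a minimal sketch of the extension, assuming we simply append the overall parity of each 23-bit codeword; extend and golay24 are names invented here, not MacKay's:

extend w = w ++ [sum w `mod` 2]  -- Make the total number of 1s even.
golay24 = concatMap (extend . mv g23GT) . chunksOf 12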

Perfect is the enemy of good

It turns out we have already described all perfect codes over \(GF(2)\) (binary codes). There are two families, repetition and Hamming, and one outlier hermit, the Golay code. All have extended variants.

What if we consider other fields? Over \(GF(q)\) our constraint generalizes to:

\[ \sum_{i=0}^e {n \choose i}(q - 1)^i = q^{n-k} \]

It so happens that:

\[ {11\choose 0} + {11\choose 1} \cdot 2 + {11\choose 2}\cdot 2^2 = 3^5 \]

and it so happens that there exists a corresponding perfect linear code. It’s called the ternary Golay code, though it was first published by Finnish football pool enthusiast Juhani Virtakallio. As the equation says, this code has a blocklength of 11 and encodes 6 trits (ternary digits) at a time by adding 5 parity checks. It has connections with the Mathieu group \(M_{11}\).
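
Haskell confirms the arithmetic, reusing our Pascal's triangle:

sum (zipWith (*) (take 3 $ pascal !! 11) (iterate (2*) 1)) == 3^5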

Other than the above, there are no perfect codes with new parameters. Over any finite field. Linear or non-linear. (Non-linear perfect codes do exist, but only with the same parameters as some Hamming code.) See Aimo Tietäväinen, On the Nonexistence of Perfect Codes Over Finite Fields.

This is a pity, because as the blocklength increases, the rate of repetition codes approaches 0, and the rate of Hamming codes approaches 1 while still correcting only a single error per block. Shannon's noisy-channel coding theorem says that every noisy channel has a capacity \(C\) such that for any given error probability \(\epsilon > 0\) and rate \(R < C\), there exists a code of some blocklength \(N\) with rate at least \(R\) and probability of block error less than \(\epsilon\).

For example, a binary symmetric channel with noise level \(f\) has capacity:

\[ 1 + f \lg f + (1 - f) \lg (1 - f) \]
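
We can compute this in Haskell (capacity is a name coined here):

capacity f = 1 + f * logBase 2 f + (1 - f) * logBase 2 (1 - f)
capacity 0.1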

When \(f = 0.1\), we have \(C = 0.531…\), so in theory, when the noise level is 10%, there exists a code with a really large blocklength that doesn't even double the size of the message, yet decodes correctly with really high probability. But whatever this code is, it's neither repetition nor Hamming, thus it cannot be perfect. In general, we must work much harder to find good codes.


Ben Lynn blynn@cs.stanford.edu 💡