$ wget https://crypto.stanford.edu/~blynn/haskell/enigma.lhs $ ghci enigma.lhs

# From Atbash to Enigma

2500 years of cryptography in a few lines of Haskell, from ancient ciphers to the infamous Enigma machine:

Ringstellung: Grundstellung:

Input:

Output:

Our code can be run by downloading the source of this page:

Preliminary administrivia:

```
module Enigma where
import Data.Bool
import Data.Char
import Data.List
import Data.Maybe
```

## Writing Wrong

The
*substitution cipher*
is one of the oldest encyption schemes. The idea is so simple that I
inadvertently invented it myself long ago: my handwriting is so bad that
readers often confuse some of my letters for others! Substitution ciphers just
do this deliberately.

We work with the uppercase Latin alphabet. Our code leaves other characters untouched.

`abc = ['A'..'Z']`

A substitution cipher is defined by a permutation \(\pi\) of the alphabet.
The encryption of a letter \(x\) is \(\pi(x)\). We represent permutations with
*words*, that is, we write the permutation \(\pi\) as \(p_A p_B … p_Z\) where
\(p_x = \pi(x)\).

Given a permutation `p`

, we can write functions to encrypt and decrypt
with the corresponding substitution cipher as follows:

```
sub p x = fromMaybe x $ lookup x $ zip abc p
unsub p x = fromMaybe x $ lookup x $ zip p abc
```

The *keyword cipher* is a folk method for constructing a permutation.
We take a keyword and append the alphabet, removing any duplicate letters.
To guard against bad inputs we filter so that only uppercase letters are kept:

```
instaperm k = nub $ filter isUpper k ++ abc
prop_permExample = instaperm "SWORDFISH" == "SWORDFIHABCEGJKLMNPQTUVXYZ"
prop_keywordExample =
(sub (instaperm "SWORDFISH") <$> "SCOLD HIRE") == "POKER HAND"
prop_unsubUndoesSub k =
(unsub (instaperm k) . sub (instaperm k) <$> abc) == abc
```

The *Atbash cipher* for the Latin
alphabet is the substitution cipher using the permutation whose word
representation is the reverse of the alphabet:

```
atbash = sub $ reverse abc
prop_atbashExample = map atbash "SLIM GIRL" == "HORN TRIO"
```

We chose our examples carefully. Few English words encrypt to other English words. [Exercise: write code to find them all from a given dictionary.]

A *Caesar cipher* or *Caesar
shift* is a substitution cipher where each letter is replaced by the letter
that is a fixed number of positions ahead, wrapping around the alphabet if
necessary. For example, if A maps to D, then B maps to E, C to F, and so on.

We represent a Caesar shift by the letter to which A is mapped. This letter is the secret key.

```
shift k = sub $ dropWhile (/= k) $ cycle abc
unshift k = unsub $ dropWhile (/= k) $ cycle abc
prop_rot13Example = (shift 'N' <$> "ABJURER") == "NOWHERE"
```

Shifting by one letter comes in handy, so we give this special case a short name:

```
bump = shift 'B'
prop_unshiftAtbash = and
[unshift k c == bump (shift (atbash k) c) | k <- abc, c <- abc]
```

`prop_2001 = map bump "HAL" == "IBM"`

## Polyalphabetic Substitution

The
*VigenĂ¨re cipher* is a
repeated sequence of Caesar shifts. We repeat a given key \(k_1…k_n\) until
it is as long as the message, and shift each plaintext letter by the
corresponding letter in the extended key:

```
vigenere = zipWith shift . cycle
unvigenere = zipWith unshift . cycle
prop_vigenereExample = vigenere "LEMON" "ATTACKATDAWN" == "LXFOPVEFRNHR"
```

Our code is inconsistent. Earlier, our functions worked on one character at a
time, but now they expect entire strings. This might seem unavoidable because
of *polyalphabetic substitution*, that is, because the permutation used depends
on the position of the character in our plaintext.

However, we can still write an encryption function that operates on a single
character at a time as long as we hang on to some extra state, namely, which
letter of the keyword to use next. And `Data.List.mapAccumL`

is tailor-made for
calling such a function in order to encrypt a string:

```
vigenereChar (k:ks) x = (ks, shift k x)
vigenere' ks = snd . mapAccumL vigenereChar (cycle ks)
```

According to Wikipedia, this cipher is poorly named. Apparently, Giovan Battista Bellaso originally described this cipher in 1553, and Blaise de VigenĂ¨re in fact described the autokey cipher in 1586:

```
autokey ks xs = zipWith shift xs $ ks ++ xs
unautokey ks xs = m where m = zipWith unshift (ks ++ m) xs
prop_autokeyExample = autokey "QUEENLY" "ATTACKATDAWN" == "QNXEPVYTWTWP"
prop_unautokeyExample = unautokey "QUEENLY" "QNXEPVYTWTWP" == "ATTACKATDAWN"
```

## One-rotor Enigma

Enigma machines contain *rotors*, or *wheels*. Each rotor is a hard-wired
substitution cipher. Literally: there are wires fixed in position which
represent a particular permutation of the alphabet.

Let’s take a well-known rotor:

`pI = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"`

On the first keypress, the rotor rotates by one letter, so that the wire from A to E now goes from Z to D, the wire from B to K now goes from A to J, and so on. Current flows from the letter that was typed to the encryption of that letter through the wire that connects them.

This repeats for subsequent letters: the rotor rotates by one letter, then electric current flows from the letter that was struck to determine its encryption. Thus encryption for one wheel can be described as follows:

```
oneRotorEnigma = unvigenere abc . map (sub pI) . vigenere abc
prop_oneRotorEnigmaExample = oneRotorEnigma "AAAAA" == "EJKCH"
```

We use letters to denote how far a wheel has rotated, from A for 0 rotations to Z for 25 rotations.

Above, we really should have used `bump abc`

instead of `abc`

because the rotor
turns before the first letter is enciphered. On a real machine, if we started
with the wheel in the A position (no rotations), we would get "JKCHB"; we would
get "EJKCH" by starting in the Z position. However, this is a trivial
off-by-one issue that we’ll fix later.

Assuming the adversary obtains a copy of the wheels, the secret key is the choice of wheel and its initial position.

## Early Enigma

The earliest Enigma machines consisted of three rotors connected in series. The
first rotor rotated by one letter for each keypress. The second rotor rotated
when the first rotor completed a full revolution, that is, every 26 keypresses.
Similarly, the third rotor rotated when the second rotor completed a full
revolution, that is, every 26^{2} keypresses:

```
pII = "AJDKSIRUXBLHWTMCQGZNPYFVOE"
pIII = "BDFHJLCPRTXVZNYEIWGAKMUSQO"
abcDup n = concatMap (replicate n) abc
rotor1 f = unvigenere abc . f . vigenere abc
rotor2 f = unvigenere (abcDup 26) . f . vigenere (abcDup 26)
rotor3 f = unvigenere (abcDup $ 26^2) . f . vigenere (abcDup $ 26^2)
almostEarlyEnigma = rotor3 f3 . rotor2 f2 . rotor1 f1
where [f1, f2, f3] = map . sub <$> [pI, pII, pIII]
```

Actually, the above is slightly incorrect for a couple of reasons we shall explore later (excluding the off-by-one bug described above), but suffices for a first approximation.

Assuming the adversary obtains a copy of the wheels, the secret key is the choices of the wheels, their ordering, and their initial positions.

## Self-inverse Engima

From a user’s perspective, a drawback of the original Enigma machine is that
decryption is different to encryption. If the encryption function were its own
inverse (an
*involution*), then the
machine would be simpler to use: whether encrypting or decrypting, just set up
the rotors and type away.

Thus a *reflector* was introduced. This is a hard-wired substitution cipher
that is its own inverse. For example, if A maps to Y, then Y must map to A.
For reasons we will explain below, the reflector must map each letter to a
distinct letter. For example, despite being self-inverse, the identity
permutation is an invalid reflector.

In mathematical terms, the permutation is a product of thirteen 2-cycles.

```
reflectorB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"
prop_reflectorNoFixed = and $ zipWith (/=) abc $ map (sub reflectorB) abc
prop_reflectorSelfInverse = iterate (map (sub reflectorB)) abc!!2 == abc
```

Then on each keypress, after the rotors have turned, the electrical current is sent through the wheels in one direction to the reflector and then sent back through the wheels in the opposite direction. This why the reflector permutation must have no fixed points: current can only travel in one direction on a wire.

The reflector makes the Enigma cipher its own inverse: the letter on one end of the path taken by the current swaps with the letter on the other end. The encryption of a letter must be a different letter, a weakness gleefully exploited by cryptanalysts.

In mathematical terms, two permutations have the same cycle structure if and only if they are conjugates, though here we only need one direction of this fact.

Because of the physical layout of the machine, if we use the "default" setting and place the wheels I, II, and III from left to right, then the current goes through wheel III, then II, then I, then the reflector, then back through I, then II, then III:

```
almostEnigma = rotor1 b1 . rotor2 b2 . rotor3 b3 .
reflect . rotor3 f3 . rotor2 f2 . rotor1 f1
where
[f1, f2, f3] = map . sub <$> [pIII, pII, pI]
[b1, b2, b3] = map . unsub <$> [pIII, pII, pI]
reflect = map $ sub reflectorB
prop_almostEnigmaAAAAAA = almostEnigma "AAAAAA" == "UBDZGO"
prop_almostEnigmaSelfInverse s = (almostEnigma . almostEnigma) s == s
```

Our code almost simulates the Enigma I with its wheels set to AAZ (not AAA due to the off-by-one bug).

## Dubstep

We’ve almost recreated an Enigma machine. Some differences stem from mechanical engineering.

In a computer program, turning the second rotor for every 26 turns of the first rotors might be implemented with a counter. With physical rotors, an elegant solution is to carve a notch in the wheel that causes the next wheel to turn. This notch is always in the same place, so we do not always reach it on the 26th keystroke. Instead, the first time we reach the notch depends on the starting position of the rotor (and afterwards we reach the notch again every 26 turns).

A related problem caused by the notches is the *double stepping anomaly*.
We ignore the mechanical
details, and just state its effects. If we reach the notch on the middle
wheel, then the middle wheel turns when the right wheel turns. Hence the name
"double stepping": after a keystroke causes the middle wheel to turn to its
notch, the next keystroke causes it to turn again, past its notch (which in
turn will cause the left wheel to turn).

It’s as if we had a malfunctioning 3-digit counter:

... 0 8 7 0 8 8 0 8 9 0 9 0 1 0 1 1 0 2 ...

Some Enigma variants feature rotors with multiple notches.

## Round and Round

One more subtlety. We have been using letters to indicate how far a wheel has
rotated. These are called the *indicator settings* or the *Grundstellung*. On
Enigma machines, each rotor is labeled with the letters of the alphabet on an
index ring and the indicator settings appear in a row of little windows.

It turns out we can also rotate the wiring relative to the index ring, and we
also denote the extent of such a rotation with a letter. These are called the
*ring settings* or the *Ringstellung*. Grundstellung and Ringstellung
rotations are measured in opposite directions.

We can account for the ring settings by modifying our shift offset. However, the notch is a feature of the index ring, and not the wiring. The propagation of rotation to the next wheel always occurs for the same letter in the indicator window, but the wiring inside the current wheel may be in a different position. Our code must handle this correctly.

## Enigma Variations

To make life harder for codebreakers, the German military augmented commercial
Enigma with an aftermarket part known as the *Steckerbrett* or the
*plugboard*. This allowed users to plug in cables to swap up to 13 pairs of
letters just before a signal enters the rotors and just after it leaves.

In other words, it is a self-inverse permutation applied before and after the standard Enigma cipher. Unlike the reflector, the Steckerbrett may have fixed points, that is, it can be any self-inverse permutation.

All these extra details encourage us to introduce a data structure to hold the state of an Enigma machine:

```
data Enigma = Enigma
{ rotors :: [(String, String)]
, reflector :: String
, grundstellung :: String
, ringstellung :: String
, steckerbrett :: String
} deriving (Eq, Show)
```

The `rotors`

list holds descriptions of the rotors from left to right on
a physical machine; the electrical signal from a keystroke enters the
rightmost wheel first. Each rotor is a pair of strings: the first string
is a permutation of the alphabet as a word, and the second describes all
notches on the rotor.

Here are components from real Enigma machines:

```
wI = ("EKMFLGDQVZNTOWYHXUSPAIBRCJ", "Q")
wII = ("AJDKSIRUXBLHWTMCQGZNPYFVOE", "E")
wIII = ("BDFHJLCPRTXVZNYEIWGAKMUSQO", "V")
wIV = ("ESOVPZJAYQUIRHXLNFTGKDCMWB", "J")
wV = ("VZBRGITYUPSDNHLXAWMJQOFECK", "Z")
ukwA = "EJMZALYXVBWFCRQUONTSPIKHGD"
ukwB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"
ukwC = "FVPJIAOYEDRZXWGCTKUQSBNMHL"
defaultEnigma = Enigma
{ rotors = [wI, wII, wIII]
, reflector = ukwB
, grundstellung = "AAA"
, ringstellung = "AAA"
, steckerbrett = abc
}
```

UKW stands for Umkehrwalze, the German term for the reflector. The secret key consists of the ring settings, indicator settings, reflector choice, rotor choices, rotor order, and plugboard cables.

One iteration of the turning of the wheels can be described as follows:

```
turn m = m { grundstellung =
[ bool g1 (bump g1) $ g2 `elem` n2
, bool g2 (bump g2) $ g2 `elem` n2 || g3 `elem` n3
, bump g3
]} where
[g1, g2, g3] = grundstellung m
[n1, n2, n3] = snd <$> rotors m
```

The `zap`

function follows the current through the wires when a key is
struck on an Enigma machine `m`

to find its encryption:

```
conjugateSub p k = unshift k . sub p . shift k
rotorSubs m = zipWith conjugateSub (fst <$> rotors m) $
zipWith unshift (ringstellung m) $ grundstellung m
zap m = st . unsub p . sub (reflector m) . sub p . st where
p = foldr1 (.) (rotorSubs m) <$> abc
st = sub $ steckerbrett m
```

It remains to write wrappers. For an uppercase letter, the `enigmaChar`

function advances the machine one iteration then finds the encryption of the
letter. Otherwise we just leave the machine alone and return the input
character unchanged. The `enigma`

function passes this function to `mapAccumL`

to encrypt strings.

```
enigmaChar m k = bool (m, k) (m', zap m' k) $ isUpper k where m' = turn m
enigma m = snd . mapAccumL enigmaChar m
prop_enigmaExample =
enigma (defaultEnigma { ringstellung = "BBB" }) "AAAAA" == "EWTYX"
```

The top of this webpage features a simulation of an Enigma machine with rotors I, II, and III from left to right, the B reflector, and no plugboard. The ring and indicator settings are initially both AAA, but these can be adjusted by dragging the black rings or boxes on the wheels, or by typing in the text areas.

I built it on top of the code throughout this article, also with Haskell, and compiled it to JavaScript with Haste. Hopefully, my simulation agrees with other online Enigma simulations:

*blynn@cs.stanford.edu*💡