data Adt a b c = Foo a | Bar | Baz b c

# The Scott encoding

Encoding data as functions may seem strange at first, but after playing with combinators for a while, the Scott encoding suggests itself.

Take an algebraic data type:

Suppose we come across a value `x`

of type `Adt`

. What can we do with it?

The only nontrivial course of action is to scrutinize it with a case statement:

case x of Foo a -> f a Bar -> g Baz b c -> h b c

What’s the quickest way to turn this expression into a closed lambda calculus term?

One idea that springs to mind is to delete the keywords `case`

and `of`

and
replace each data constructor in a pattern with `\`

. Then we get:

x (\a -> f a) (\ -> g) (\b c -> h b c)

Here, a lambda abstraction with no variables has the obvious meaning, namely,
`\ → g`

is the same as `g`

.

Then the Scott encoding of the value `x`

of type `Adt`

is whatever it takes to
make the above work like the original case expression.
For example, the value `Foo 42`

must be the function:

\f _ _ -> f 42

the value `Bar`

must be:

\_ g _ -> g

and the value `Baz "qux" 9000`

must be:

\_ _ h -> h "qux" 9000

More generally:

Foo a = \f _ _ -> f a Bar = \_ g _ -> g Baz a b = \_ _ h -> h a b

## Booleans, Numbers, Lists

We would like to define booleans as:

data Bool = False | True

Since `False`

appears before `True`

, a compiler seeing this would naturally
index them with the numbers 0 and 1 respectively, which matches common
practice. It matches Haskell, too, as can be seen in the functions: ```
fromEnum
toEnum minBound maxBound succ pred
```

.

Unfortunately, for the time being, we must define:

data Bool = True | False

so our compiler produces the Scott encodings:

True = \x _ -> x False = \_ y -> y

This is because long before computers were commonplace, Alonzo Church devised this particular encoding of booleans, which subsequently became standard. We bow down to this convention to avoid confusion.

Peano numbers are defined by the following algebraic data type:

data Peano = Zero | Succ Peano

The Scott encoding is:

Zero = \f _ -> f Succ n = \_ g -> g n

Unlike Church numerals, the predecessor function (where we define the predecessor of zero to be zero) is easy to write down:

predecessor n = case n of Zero -> Zero Succ n -> n

Using the Scott encoding:

predecessor n = n Zero (\n -> n)

Pure lambda calculus has an undeserved reputation for sloth. Perhaps one of the misconceptions is arithmetic must be performed in unary. Not so! We may define numbers in binary instead of unary:

data Binary = End | Nil Binary | One Binary

But we may as well use a list of booleans instead. A list is defined by:

data [a] = [] | a : [a]

yielding the Scott encodings:

[] = \f _ -> f (:) a as = \_ g -> g a as

Here’s a starter pack of functions for numbers encoded in binary:

```
type Binary = [Bool]
suc :: Binary -> Binary
suc bs = case bs of
[] -> [True]
False : bt -> True : bt
True : bt -> False : suc bt
addC :: Binary -> Binary -> Bool -> Binary
addC as bs c = case as of
[] -> if c then suc bs else bs
(a:at) -> case bs of
[] -> if c then suc as else as
(b:bt) -> ((a /= b) /= c) : addC at bt (a && b || c && (a || b))
add :: Binary -> Binary -> Binary
add as bs = addC as bs False
strip :: Binary -> Binary
strip bs = case bs of
[] -> []
True : bt -> True : strip bt
False : bt -> case strip bt of
[] -> []
bt' -> False : bt'
pre' bs = case bs of
[] -> []
True : bs' -> False : bs'
False : bs' -> True : pre' bs'
pre :: Binary -> Binary
pre bs = strip (pre' bs)
decode' n acc bs = case bs of
[] -> acc
False : bt -> decode' (2*n) acc bt
True : bt -> decode' (2*n) (acc + n) bt
decode :: Binary -> Int
decode = decode' 1 0
encode :: Int -> Binary
encode n = if n == 0 then [] else case divMod n 2 of
(n', 0) -> False : encode n'
(n', 1) -> True : encode n'
```

We have no need for these since we’ll extend CL to support the native integer types of the underlying machine. But we stress again pure lambda calculus numerals are not condemned to be unary!

*blynn@cs.stanford.edu*💡