{- For GHC, uncomment the following, and comment the next block.
{-# LANGUAGE LambdaCase, BlockArguments #-}
import Control.Applicative
import Control.Arrow
-}
module Main where
import Base
import System
demo f = interact $ either id (show . snd . f . deBruijn) . parseLC
foreign export ccall "plainMain" plainMain
plainMain = demo plain
foreign export ccall "optimizeKMain" optimizeKMain
optimizeKMain = demo optK
foreign export ccall "optimizeEtaMain" optimizeEtaMain
optimizeEtaMain = demo optEta
foreign export ccall "rawBulkMain" rawBulkMain
rawBulkMain = demo $ bulkPlain bulk
foreign export ccall "optimizeBulkMain" optimizeBulkMain
optimizeBulkMain = demo $ bulkOpt bulk
foreign export ccall "linBulkMain" linBulkMain
linBulkMain = demo $ bulkOpt breakBulkLinear
foreign export ccall "logBulkMain" logBulkMain
logBulkMain = demo $ bulkOpt breakBulkLog
foreign export ccall "linearbbMain" linearbbMain
linearbbMain = interact $ either id (show . uncurry breakBulkLinear) . parseBulk
foreign export ccall "logbbMain" logbbMain
logbbMain = interact $ either id (show . uncurry breakBulkLog) . parseBulk
data Charser a = Charser { getCharser :: String -> Either String (a, String) }
instance Functor Charser where fmap f (Charser x) = Charser $ fmap (first f) . x
instance Applicative Charser where
pure a = Charser \s -> Right (a, s)
f <*> x = Charser \s -> do
(fun, t) <- getCharser f s
(arg, u) <- getCharser x t
pure (fun arg, u)
instance Monad Charser where
Charser f >>= g = Charser $ (good =<<) . f
where good (r, t) = getCharser (g r) t
return = pure
instance Alternative Charser where
empty = Charser \_ -> Left ""
(<|>) x y = Charser \s -> either (const $ getCharser y s) Right $ getCharser x s
sat f = Charser \case
h:t | f h -> Right (h, t)
_ -> Left "unsat"
eof = Charser \case
[] -> Right ((), "")
_ -> Left "want EOF"
char :: Char -> Charser Char
char = sat . (==)
string :: String -> Charser String
string s = mapM char s
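-- A few sanity checks of these combinators (my own illustrative examples):
--   getCharser (char 'a') "abc"              ==  Right ('a', "bc")
--   getCharser (string "ab") "abc"           ==  Right ("ab", "c")
--   getCharser (char 'a' <|> char 'b') "bc"  ==  Right ('b', "c")
--   getCharser eof "leftover"                ==  Left "want EOF"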
isDigit c = '0' <= c && c <= '9'
isAlphaNum c
| 'A' <= c && c <= 'Z' = True
| 'a' <= c && c <= 'z' = True
| isDigit c = True
| otherwise = False
lcTerm = lam <|> app where
lam = flip (foldr Lam) <$> (lam0 *> some var <* lam1) <*> lcTerm
lam0 = str "\\" <|> str "\955"
lam1 = str "->" <|> str "."
app = foldl1 App <$> some (Var <$> var <|> str "(" *> lcTerm <* str ")")
var = some (sat isAlphaNum) <* whitespace
str = (<* whitespace) . string
whitespace = many $ sat $ (`elem` " \n")
parseLC = fmap fst . getCharser (whitespace *> lcTerm <* eof)
parseBulk = fmap fst . getCharser do
whitespace
c <- sat (`elem` "BCS")
whitespace
n <- foldl (\n c -> n*10 + fromEnum c - fromEnum '0') 0 <$> some (sat isDigit)
eof
pure ([c], n)
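-- Illustrative runs of the two parser entry points:
--   parseLC "\\x y -> x"   yields   Right (Lam "x" (Lam "y" (Var "x")))
--   parseBulk "B100"       yields   Right ("B", 100)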
instance Show Term where
show (Lam s t) = "\955" ++ s ++ showB t where
showB (Lam x y) = " " ++ x ++ showB y
showB expr = "." ++ show expr
show (Var s) = s
show (App x y) = showL x ++ showR y where
showL (Lam _ _) = "(" ++ show x ++ ")"
showL _ = show x
showR (Var s) = ' ':s
showR _ = "(" ++ show y ++ ")"
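-- The printer drops parentheses wherever it can; for example:
--   show (Lam "x" (Lam "y" (Var "x")))              ==  "\955x y.x"   -- λx y.x
--   show (App (Var "f") (App (Var "g") (Var "x")))  ==  "f(g x)"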
Kiselyov Combinator Translation
Oleg Kiselyov’s algorithm for compiling lambda calculus to combinators seems unreasonably effective in practice.
A Special Bracket Abstraction
The heart of the algorithm is a specialized bracket abstraction. Let \(d_1\) be a combinator and eta-expand it any number of times, that is, nest it in a bunch of lambda abstractions and apply it to each of the new variables, starting from the variable bound in the outermost lambda. For example, eta-expanding \(d_1\) three times yields:

\[ \lambda \lambda \lambda d_1 2 1 0 \]

when we write it with De Bruijn indices. Let \(d_2\) be a combinator, and eta-expand it any number of times. For example, eta-expanding \(d_2\) twice yields:

\[ \lambda \lambda d_2 1 0 \]

Let \(d\) be the result of applying the innermost body of one to that of the other, while sharing the same variables:

\[ d = \lambda \lambda \lambda (d_1 2 1 0) (d_2 1 0) \]
How do you eliminate the added variables? That is, can you write \(d\) just using \(d_1, d_2\) and some fixed set of combinators?
We’ll have more to say about eta-expansion in our tour of Böhm’s Theorem, but we do want to mention a subtlety here. The eta-expansion of a lambda term \(F\) is defined to be \(\lambda F 0\). Eta-expanding again results in \(\lambda (\lambda F 0) 0\), which reduces to \(\lambda F 0\), getting us nowhere. On the other hand, eta-expanding just the body of the term, namely \(F 0\), results in the less trivial \(\lambda \lambda F 1 0\).
So when we say we eta-expand a term multiple times, we mean we expand the part after the lambdas (the body of a Böhm tree node), and not the entire term.
Solution
We choose the combinators \(B, R, S\):

\[ B x y z = x (y z) \qquad R x y z = y z x \qquad S x y z = x z (y z) \]
(Schönfinkel would have written \(Z, TT, S\); in Haskell, these are (.), flip flip, and the Reader instance of ap or (<*>).)
Then the following solves our example puzzle:

\[ d = R d_2 (B S (B (B S) d_1)) \]
What’s the secret to deriving this? We could run a standard bracket abstraction algorithm a few times, but better to take advantage of the eta-expansions.
First, some notation. Let \(n\models d\) denote the term \(d\) eta-expanded \(n\) times. For example, \(3 \models d = \lambda \lambda \lambda d 2 1 0\).
Then we may restate our problem as follows. Given \(n_1 \models d_1\) and \(n_2 \models d_2\), our goal is to find a combinator term equivalent to:

\[ \lambda \cdots \lambda (d_1 (n_1{-}1) \cdots 1 0) (d_2 (n_2{-}1) \cdots 1 0) \]

where there are \(\max(n_1, n_2)\) lambda abstractions. (This equation would be more compact if we started our De Bruijn indices from 1, but I refuse to budge from 0-indexing!)
The trick is to use induction. Define the operation \(\amalg\) by:

\[
\begin{aligned}
(0 \models d_1) \amalg (0 \models d_2) &= d_1 d_2 \\
(0 \models d_1) \amalg (n \models d_2) &= (0 \models B d_1) \amalg ((n{-}1) \models d_2) \\
(n \models d_1) \amalg (0 \models d_2) &= (0 \models R d_2) \amalg ((n{-}1) \models d_1) \\
(n_1 \models d_1) \amalg (n_2 \models d_2) &= ((n_1{-}1) \models (0 \models S) \amalg ((n_1{-}1) \models d_1)) \amalg ((n_2{-}1) \models d_2)
\end{aligned}
\]

where the first matching case applies. We can mechanically verify that \(d = (n_1 \models d_1) \amalg (n_2 \models d_2)\) is a solution.
Combinators
Adding a little code turns the above bracket abstraction for variable-sharing eta-expansions into an algorithm for translating any lambda calculus term to combinators.
Some standard combinators:

\[ I x = x \qquad K x y = x \qquad C x y z = x z y \]
(Schönfinkel would have written \(I, C, T\); in Haskell, these are id, const, and flip.)
Now, given a lambda term, we rewrite it in De Bruijn form, where each index is written using a unary representation known as the Peano numbers: index 0 is written \(z\), and index \(n + 1\) is written \(s n\). For example, the standard combinators we just mentioned are:

\[ I = \lambda z \qquad K = \lambda \lambda s z \qquad C = \lambda \lambda \lambda (s s z) z (s z) \]
We rewrite our De Bruijn conversion code to use Peano numbers:
data Term = Var String | App Term Term | Lam String Term
data Peano = S Peano | Z
data DB = N Peano | L DB | A DB DB | Free String
index x xs = lookup x $ zip xs $ iterate S Z
deBruijn = go [] where
go binds = \case
Var x -> maybe (Free x) N $ index x binds
Lam x t -> L $ go (x:binds) t
App t u -> A (go binds t) (go binds u)
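For example, \(\lambda x y . x\) becomes \(\lambda\lambda(sz)\); an illustrative run:

-- deBruijn (Lam "x" (Lam "y" (Var "x")))   ==   L (L (N (S Z)))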
Define:

\[
\begin{aligned}
\ceil{z} &= 1 \models I \\
\ceil{s e} &= (n{+}1) \models (0 \models K) \amalg (n \models d) \\
\ceil{\lambda e} &= \begin{cases} 0 \models K d & \text{if } \ceil{e} = 0 \models d \\ n \models d & \text{if } \ceil{e} = (n{+}1) \models d \end{cases} \\
\ceil{e_1 e_2} &= \max(n_1, n_2) \models (n_1 \models d_1) \amalg (n_2 \models d_2)
\end{aligned}
\]

where we have recursively computed:

\[ n \models d = \ceil{e} \qquad n_1 \models d_1 = \ceil{e_1} \qquad n_2 \models d_2 = \ceil{e_2} \]
If \(e\) is a closed lambda term and \(n \models d = \ceil{e}\), then \(n = 0\) and the combinator \(d\) is the bracket abstraction of \(e\). More generally, for any (possibly open) lambda term \(e\), adding \(n\) abstractions to \(e\) will close it, and \(d\) is the bracket abstraction of the result.
We’ll wave our hands to explain the algorithm and hope informality helps us get away with less theory. See Kiselyov for details.
The first rule says \(z\) translates to \(1\models I\), namely a thing that becomes the \(I\) combinator after adding one lambda. Indeed, if we apply the third rule, we find \(\ceil{\lambda z} = 0\models I\), as expected.
The second rule uses \(K\) to shield \(d\) and its \(n\) arguments while destroying the argument at position \(n + 1\).
For example:

\[ \ceil{s z} = 2 \models (0 \models K) \amalg (1 \models I) = 2 \models B K I \]

As expected, we need 2 lambdas to close the term \(sz\), and \(BKI ab = a = (\lambda\lambda 1) ab\).
The third rule has two cases. If \(d\) is already a closed term, then the \(K\) combinator ignores the new variable and simply returns \(d\). Otherwise, we decrement the count indicating how many more lambdas we need to close the term.
The last rule uses the specialized bracket abstraction, which captures the idea of applying a possibly open term to another. We see why we want to share variables: when we eventually close the term with lambda abstractions, the indices in both sides of the application refer to the same set of variables.
In our code, the tuple (n, d) represents \(n \models d\).
data CL = Com String | CL :@ CL
instance Show CL where
showsPrec p = \case
Com s -> (if p > 0 then (' ':) else id) . (s++)
t :@ u -> showParen (p > 0) $ showsPrec 0 t . showsPrec 1 u
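-- Spaces separate atoms, so bulk names like "B2" stay unambiguous, and only
-- arguments that are themselves applications get parentheses. For example:
--   show (Com "B" :@ Com "K" :@ Com "I")    ==  "B K I"
--   show (Com "B" :@ (Com "K" :@ Com "I"))  ==  "B(K I)"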
convert (#) = \case
N Z -> (1, Com "I")
N (S e) -> (n + 1, (0, Com "K") # t) where t@(n, _) = rec $ N e
L e -> case rec e of
(0, d) -> (0, Com "K" :@ d)
(n, d) -> (n - 1, d)
A e1 e2 -> (max n1 n2, t1 # t2) where
t1@(n1, _) = rec e1
t2@(n2, _) = rec e2
Free s -> (0, Com s)
where rec = convert (#)
plain = convert (#) where
(0 , d1) # (0 , d2) = d1 :@ d2
(0 , d1) # (n , d2) = (0, Com "B" :@ d1) # (n - 1, d2)
(n , d1) # (0 , d2) = (0, Com "R" :@ d2) # (n - 1, d1)
(n1, d1) # (n2, d2) = (n1 - 1, (0, Com "S") # (n1 - 1, d1)) # (n2 - 1, d2)
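As a sanity check, the plain algorithm on \(\lambda x y.x\) reproduces the worked example above; an illustrative run:

-- plain (deBruijn (Lam "x" (Lam "y" (Var "x"))))   -- (0, B K I)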
I have a poor intuitive grasp of why Kiselyov’s algorithm seems to beat traditional methods, but it’s clear the algorithm sticks out from the rest of the pack. Roughly speaking, all of them start from the innermost lambdas and work their way outwards, but while traditional algorithms treat locally unbound variables as opaque constants, Kiselyov’s algorithm meticulously records information about each unbound variable: the left of a \(\models\) is the number of lambdas needed to close the variable on the right. Perhaps this industriousness leads to better performance.
Lazy Weakening
Consider the De Bruijn lambda term:

\[ \lambda \lambda (s z) (s z) \]

The second variable is unused, so we could derive the following combinator:

\[ B K (S I I) \]

However, recall:

\[ \ceil{s z} = 2 \models B K I \]

which leads to the verbose:

\[ S (B S (B K I)) (B K I) \]
A better strategy is to refrain from immediately converting terms like \(sz\) to combinators. Instead, on encountering \(s\), we leave a sort of IOU that we eventually redeem with a combinator. If both sides of an application turn out to ignore the same variable, we merge the IOUs into one, resulting in savings.
To this end, we replace the number \(n\) in \(n \models d\) with a list of \(n\) booleans, which we often denote by \(\Gamma\). We use cons lists, and denote cons by \((:)\), and the empty list by \(\emptyset\).
The list item at index \(k\) is True when the variable with De Bruijn index \(k\) appears at least once in the term, in which case we have already generated combinators to get at it. Otherwise it is unused and marked False, and is only cashed in for a \(K\) combinator when eventually lambda-abstracted, provided it has never been upgraded to True because the other branch of an application uses the corresponding variable.
With boolean lists, the \(\amalg\) operation becomes:

\[
\begin{aligned}
(\emptyset \models d_1) \amalg (\emptyset \models d_2) &= d_1 d_2 \\
(\emptyset \models d_1) \amalg (T:\Gamma_2 \models d_2) &= (\emptyset \models B d_1) \amalg (\Gamma_2 \models d_2) \\
(T:\Gamma_1 \models d_1) \amalg (\emptyset \models d_2) &= (\emptyset \models R d_2) \amalg (\Gamma_1 \models d_1) \\
(T:\Gamma_1 \models d_1) \amalg (T:\Gamma_2 \models d_2) &= (\Gamma_1 \models (\emptyset \models S) \amalg (\Gamma_1 \models d_1)) \amalg (\Gamma_2 \models d_2) \\
(\emptyset \models d_1) \amalg (F:\Gamma_2 \models d_2) &= (\emptyset \models d_1) \amalg (\Gamma_2 \models d_2) \\
(F:\Gamma_1 \models d_1) \amalg (\emptyset \models d_2) &= (\Gamma_1 \models d_1) \amalg (\emptyset \models d_2) \\
(F:\Gamma_1 \models d_1) \amalg (T:\Gamma_2 \models d_2) &= (\Gamma_1 \models (\emptyset \models B) \amalg (\Gamma_1 \models d_1)) \amalg (\Gamma_2 \models d_2) \\
(T:\Gamma_1 \models d_1) \amalg (F:\Gamma_2 \models d_2) &= (\Gamma_1 \models (\emptyset \models C) \amalg (\Gamma_1 \models d_1)) \amalg (\Gamma_2 \models d_2) \\
(F:\Gamma_1 \models d_1) \amalg (F:\Gamma_2 \models d_2) &= (\Gamma_1 \models d_1) \amalg (\Gamma_2 \models d_2)
\end{aligned}
\]

The first 4 cases are the same as before. The remaining cases prepare the ground for lazy weakening. They also complicate the connection with the specialized bracket abstraction:

\[ \ceil{e_1 e_2} = (\Gamma_1 \vee \Gamma_2) \models (\Gamma_1 \models d_1) \amalg (\Gamma_2 \models d_2) \]

where \(\vee\) means we OR together the corresponding booleans of each input list, and if one list is longer than the other, we append the leftovers.
convertBool (#) = \case
N Z -> (True:[], Com "I")
N (S e) -> (False:g, d) where (g, d) = rec $ N e
L e -> case rec e of
([], d) -> ([], Com "K" :@ d)
(False:g, d) -> (g, ([], Com "K") # (g, d))
(True:g, d) -> (g, d)
A e1 e2 -> (zipWithDefault False (||) g1 g2, t1 # t2) where
t1@(g1, _) = rec e1
t2@(g2, _) = rec e2
Free s -> ([], Com s)
where rec = convertBool (#)
optK = convertBool (#) where
([], d1) # ([], d2) = d1 :@ d2
([], d1) # (True:g2, d2) = ([], Com "B" :@ d1) # (g2, d2)
([], d1) # (False:g2, d2) = ([], d1) # (g2, d2)
(True:g1, d1) # ([], d2) = ([], Com "R" :@ d2) # (g1, d1)
(False:g1, d1) # ([], d2) = (g1, d1) # ([], d2)
(True:g1, d1) # (True:g2, d2) = (g1, ([], Com "S") # (g1, d1)) # (g2, d2)
(False:g1, d1) # (True:g2, d2) = (g1, ([], Com "B") # (g1, d1)) # (g2, d2)
(True:g1, d1) # (False:g2, d2) = (g1, ([], Com "C") # (g1, d1)) # (g2, d2)
(False:g1, d1) # (False:g2, d2) = (g1, d1) # (g2, d2)
zipWithDefault d f [] ys = f d <$> ys
zipWithDefault d f xs [] = flip f d <$> xs
zipWithDefault d f (x:xt) (y:yt) = f x y : zipWithDefault d f xt yt
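To see the savings, compare the two algorithms on \(\lambda\lambda(sz)(sz)\) from above; working through the rules by hand, I get:

-- t = Lam "x" (Lam "y" (App (Var "x") (Var "x")))   -- λλ(sz)(sz)
-- snd (plain (deBruijn t))   -- S (B S (B K I)) (B K I): two Ks, redeemed eagerly
-- snd (optK  (deBruijn t))   -- B K (S I I): the two IOUs merge into a single K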
This corresponds with Kiselyov’s OCaml implementation from Section 4, but we’re more lax with types, and we use a standard list of booleans rather than define a dedicated type. Kiselyov’s C can be thought of as the empty list, while N and W add True and False to an existing list. It’s like Peano arithmetic with two kinds of successors.
To compute the OR of two lists, we clumsily traverse two lists with zipWithDefault even though the recursive calls to (#) have already examined the same booleans. We sacrifice elegance so that (#) corresponds to the \(\amalg\) operation. Kiselyov instead builds the output list while recursing.
As Kiselyov notes, lazy weakening is precisely the famous K-optimization of textbook bracket abstraction. It fits snugly here, whereas textbook bracket abstraction algorithms awkwardly require extra passes to detect the absence of a variable in a subtree for each lambda.
The Eta Optimization
The above postpones the conversion of \(s\) to a \(K\) combinator. Section 4.1 of Kiselyov describes how to postpone the conversion of \(z\) to an \(I\) combinator. The V that appears in his code is an IOU that is either eventually redeemed for an \(I\) combinator or optimized away.
We take a blunter approach. During the computation of \(\amalg\), we simply match on \(I\) combinators that can be optimized. Loosely speaking, we examine the result of a bracket abstraction to determine our course of action, rather than pass around what we need to know on the left of a \(\models\).
This may be messy, but on the other hand, we have no need to extend the list of booleans to a more complex data type, and we simply omit the (N e, V) and (V, N e) cases where no optimizations apply.
Here, we prefer the combinator \(T\) given by \(Tab = ba\) to the \(CI\) in the paper.
optEta = convertBool (#) where
([], d1) # ([], d2) = d1 :@ d2
([], d1) # (True:[], Com "I") = d1
([], d1) # (True:g2, d2) = ([], Com "B" :@ d1) # (g2, d2)
([], d1) # (False:g2, d2) = ([], d1) # (g2, d2)
(True:[], Com "I") # ([], d2) = Com "T" :@ d2
(True:[], Com "I") # (False:g2, d2) = ([], Com "T") # (g2, d2)
(True:g1, d1) # ([], d2) = ([], Com "R" :@ d2) # (g1, d1)
(True:g1, d1) # (True:g2, d2) = (g1, ([], Com "S") # (g1, d1)) # (g2, d2)
(True:g1, d1) # (False:g2, d2) = (g1, ([], Com "C") # (g1, d1)) # (g2, d2)
(False:g1, d1) # ([], d2) = (g1, d1) # ([], d2)
(False:g1, d1) # (True:[], Com "I") = d1
(False:g1, d1) # (True:g2, d2) = (g1, ([], Com "B") # (g1, d1)) # (g2, d2)
(False:g1, d1) # (False:g2, d2) = (g1, d1) # (g2, d2)
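For example, optK still yields \(BKI\) for \(\lambda\lambda(sz)\), because the \(I\) of the inner variable is redeemed before the \(K\) arrives; the new cases collapse it. My worked results:

-- t = Lam "x" (Lam "y" (Var "x"))   -- λλ(sz)
-- snd (optK   (deBruijn t))   -- B K I
-- snd (optEta (deBruijn t))   -- K, via the case ([], d1) # (True:[], Com "I") = d1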
Bulk combinators
If we allow families of certain bulk combinators, we get a linear-time specialized bracket abstraction, which in turn yields a linear-time translation algorithm from lambda terms to combinators.
Define:

\[
\begin{aligned}
B_n x y z_1 \cdots z_n &= x (y z_1 \cdots z_n) \\
C_n x y z_1 \cdots z_n &= x z_1 \cdots z_n y \\
S_n x y z_1 \cdots z_n &= x z_1 \cdots z_n (y z_1 \cdots z_n)
\end{aligned}
\]

so that \(B_1 = B\), \(C_1 = C\), and \(S_1 = S\).
These perfectly suit shared-eta bracket abstraction:
bulkPlain bulk = convert (#) where
(a, x) # (b, y) = case (a, b) of
(0, 0) -> x :@ y
(0, n) -> bulk "B" n :@ x :@ y
(n, 0) -> bulk "C" n :@ x :@ y
(n, m) | n == m -> bulk "S" n :@ x :@ y
| n < m -> bulk "B" (m - n) :@ (bulk "S" n :@ x) :@ y
| otherwise -> bulk "C" (n - m) :@ (bulk "B" (n - m) :@ bulk "S" m :@ x) :@ y
bulk c 1 = Com c
bulk c n = Com (c ++ show n)
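The bulk helper names a family member by appending its index, keeping the plain names for \(n = 1\); for instance:

-- bulk "B" 1  ==  Com "B"
-- bulk "S" 3  ==  Com "S3"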
We write a version with K-optimization and eta-optimization, the latter generalized to bulk \(B_n\) combinators. Again we replace counts with lists of booleans indicating whether a variable is used in a term. To exploit bulk combinators, we look for repetitions of the same booleans.
This time, our (##) function returns the list of booleans along with the combinator term, so the caller need not compute it.
bulkOpt bulk = \case
N Z -> (True:[], Com "I")
N (S e) -> first (False:) $ rec $ N e
L e -> case rec e of
([], d) -> ([], Com "K" :@ d)
(False:g, d) -> ([], Com "K") ## (g, d)
(True:g, d) -> (g, d)
A e1 e2 -> rec e1 ## rec e2
Free s -> ([], Com s)
where
rec = bulkOpt bulk
([], d1) ## ([], d2) = ([], d1 :@ d2)
([], d1) ## ([True], Com "I") = ([True], d1)
([], d1) ## (g2, Com "I") | and g2 = (g2, bulk "B" (length g2 - 1) :@ d1)
([], d1) ## (g2@(h:_), d2) = first (pre++) $ ([], fun1 d1) ## (post, d2)
where
fun1 = case h of
True -> (bulk "B" (length pre) :@)
False -> id
(pre, post) = span (h ==) g2
([True], Com "I") ## ([], d2) = ([True], Com "T" :@ d2)
(g1@(h:_), d1) ## ([], d2) = first (pre++) $ case h of
True -> ([], Com "C" :@ bulk "C" (length pre) :@ d2) ## (post, d1)
False -> (post, d1) ## ([], d2)
where
(pre, post) = span (h ==) g1
([True], Com "I") ## (False:g2, d2) = first (True:) $ ([], Com "T") ## (g2, d2)
(False:g1, d1) ## ([True], Com "I") = (True:g1, d1)
(g1, d1) ## (g2, Com "I") | and g2, let n = length g2, all not $ take n g1 =
first (g2++) $ ([], bulk "B" $ n - 1) ## (drop n g1, d1)
(g1, d1) ## (g2, d2) = pre $ fun1 (drop count g1, d1) ## (drop count g2, d2)
where
(h, count) = headGroup $ zip g1 g2
fun1 = case h of
(False, False) -> id
(False, True) -> apply "B"
(True, False) -> apply "C"
(True, True) -> apply "S"
pre = first (replicate count (uncurry (||) h) ++)
apply s = (([], bulk s count) ##)
headGroup (h:t) = (h, 1 + length (takeWhile (== h) t))
Classic bracket abstraction algorithms blow up quadratically on lambda terms of the form:
In contrast, our implementation finds:
No bulk combinators
What if we disallow families of combinators?
If we allow sharing (memoization), then define the following cousins of \(B, C, S\) (Smullyan might call them "once removed"):

\[
\begin{aligned}
B' k x y z &= k x (y z) \\
C' k x y z &= k (x z) y \\
S' k x y z &= k (x z) (y z)
\end{aligned}
\]
The eta-optimized translation yields:

\[ B' = B B \qquad C' = B (B C) B \qquad S' = B (B S) B \]

We have:

\[ B_{n+1} = B' B_n \qquad C_{n+1} = C' C_n \qquad S_{n+1} = S' S_n \]
We can use these recurrences to generate each bulk combinator we need exactly once, again resulting in a linear-time translation:
breakBulkLinear "B" n = iterate (comB' :@) (Com "B") !! (n - 1)
breakBulkLinear "C" n = iterate (comC' :@) (Com "C") !! (n - 1)
breakBulkLinear "S" n = iterate (comS' :@) (Com "S") !! (n - 1)
comB' = Com "B":@ Com "B"
comC' = Com "B":@ (Com "B" :@ Com "C"):@ Com "B"
comS' = Com "B":@ (Com "B" :@ Com "S"):@ Com "B"
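Unfolding the recurrence once is an easy check that these definitions do what we want:

-- breakBulkLinear "B" 2   -- comB' :@ Com "B", that is, B B B
-- check: B B B x y z w = B (B x) y z w = B x (y z) w = x (y z w) = B2 x y z w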
Our interactive demo supports this translation, but shows the combinator term expanded in full, with no sharing.
No sharing
What if we also disallow sharing?
We generalize \(C'\) and \(S'\) from the paper:

\[
\begin{aligned}
C'_n k x y z_1 \cdots z_n &= k (x z_1 \cdots z_n) y \\
S'_n k x y z_1 \cdots z_n &= k (x z_1 \cdots z_n) (y z_1 \cdots z_n)
\end{aligned}
\]

One can check:

\[ S'_{m+n} = S'_m \circ S'_n \qquad S_n = S'_n I \]

and similarly for the other families. This suggests breaking bulk combinators into standard combinators in a manner akin to exponentiation by squaring. To compute \(S_{50}\) for example, define the following combinators:

\[ b_0 x = x \circ x \qquad b_1 x = S' \circ x \circ x \]
In combinators: \(b_0 = SBI, b_1 = B(B(B(BS)B))(SBI)\).
Then chain together \(b_0\) or \(b_1\) according to the bits of 50 written in binary (110010), and apply to \(I\):

\[ S_{50} = b_0 (b_1 (b_0 (b_0 (b_1 S')))) I \]
The \(B_n\) case is simpler because there’s no need for \(B'\).
bits n = r:if q == 0 then [] else bits q where (q, r) = divMod n 2
breakBulkLog c 1 = Com c
breakBulkLog "B" n = foldr (:@) (Com "B") $ map (bs!!) $ init $ bits n where
bs = [sbi, Com "B" :@ (Com "B" :@ Com "B") :@ sbi]
breakBulkLog c n = (foldr (:@) (prime c) $ map (bs!!) $ init $ bits n) :@ Com "I" where
bs = [sbi, Com "B" :@ (Com "B" :@ prime c) :@ sbi]
prime c = Com "B" :@ (Com "B" :@ Com c) :@ Com "B"
sbi = Com "S" :@ Com "B" :@ Com "I"
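For a small case, take \(S_2\): bits 2 = [0,1], the leading bit is dropped, and the chain is a single \(b_0\) wrapped around \(S'\). As I work it out:

-- breakBulkLog "S" 2   -- sbi :@ prime "S" :@ Com "I"
-- check: sbi x = B x x, so this is (S' . S') I = S'(S' I) = S' S = S2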
Thus we can break down bulk combinators into standard combinators without sharing if we pay a factor logarithmic in the input size.
The savings are more easily seen with the following demo, which breaks down a single bulk combinator.
Return of the Combinators?
Combinators were abandoned by compiler authors long ago: how do you optimize a combinator term? It seems we must simulate it in order to understand it enough to tinker with it, but doesn’t this simulation convert it back to a lambda term? See Peyton Jones, The Implementation of Functional Programming Languages, section 16.5.
Kiselyov suggests combinators may be worth considering again. Unlike existing approaches, his algorithm understands lambda terms, so it might be suitable for optimizations that are conventionally performed after generating combinators, thus avoiding simulations. At any rate, on real-world examples, the eta-optimized variant seems to produce reasonable results.
Our toy compiler for various minimalist combinatory logic and lambda calculus languages features Kiselyov’s algorithm.
So does my award-winning Haskell compiler, which built the wasm binaries running the widgets on this page.