Crazy L
Let’s build a better compiler based on combinators. This time, we’ll produce WebAssembly for a family of languages related to Lazy K, which in turn is a union of minimalist languages based on combinator calculus.
Nat
Lazy K
Fussy K
Crazy L
intermediate form:
wasm:
Input: Output:
Vanishingly Small
The syntax of SKI combinator calculus is already terse, but we can pare it down further.
For starters, we can use Polish notation to replace pairs of parentheses with a single symbol. The Unlambda language chooses the backquote, while Iota chooses the asterisk. (Thus obtaining languages that are even more serious about prefix notation than Lisp.)
We squeeze the syntax harder by playing with combinators.
In Iota, we define the combinator \(\iota = \lambda x . x S K\). It’s handy to define \(V = \lambda x y z . z x y\), which Smullyan calls the Vireo, or the pairing bird, so we can write \(\iota = V S K\). Then
from which we deduce \(\iota (\iota \iota) = S K\), \(\iota (\iota (\iota \iota)) = K\), and \(\iota (\iota (\iota (\iota \iota))) = S\).
The Jot language is another two-symbol language with an interesting property: any string of 0s and 1s is a valid program:
Here \([F]\) represents the decoding of a string \(F\) of 0s and 1s. In particular, the empty string is a valid program: it represents \(I\), the identity combinator.
Incidentally, the description of Jot on Wikipedia seems erroneous (as of May 2017). I get the impression that \(\iota = \lambda w.w S K\) is confused with \(\lambda w.S(K w)\), so while \(w0\) indeed denotes \(\iota w\), \(w1\) actually denotes \(S(K w)\).
Also, in general, \(0^* w\) differs from \(I w\), which is only a minor issue for Gödel numbering: like floating point numbers, we can tacitly assume the presence of a leading 1 bit. All the same, there must be some reason for decoding from the end of the string instead of the beginning, and it would be nice if leading zeroes could be omitted without changing a program’s meaning.
In sum, we can express SK terms in various languages as follows:
dumpIota = dumpWith '*' "*i*i*ii" "*i*i*i*ii" dumpJot = dumpWith '1' "11100" "11111000" dumpUnlambda = dumpWith '`' "k" "s" dumpWith apCh kStr sStr = fix $ \f -> \case x :@ y -> apCh:f x ++ f y Var "K" -> kStr Var "S" -> sStr _ -> error "SK terms only"
Lazy K
Lazy K combines the syntaxes of SKI
combinator calculus, Unlambda, Iota, and Jot, which amazingly coexist mostly in
peace. The only exception is the i
program, which Lazy K interprets as the
identity combinator rather than the iota combinator.
Lazy K expects the first and only argument of the given program to be a list, in the form of nested Church-encoded pairs. The end of a finite list is represented by an infinite list where every element is (the Church encoding of) 256. For example, the string "AB" would be represented as:
V 65 (V 66 (V 256 (V 256 (...))))
where, as before, Vxyz = zxy
, and the numbers are Church-encoded. The
reference interpreter treats any number above 256 as 256.
This is an unfortunate choice. Lambdas and combinators hail from a beautiful mathematical world, which Lazy K has polluted with some constant or other. Obviously, the constant 256 was chosen to suit certain real-life situations, but why constrain ourselves so early in the design process?
Better to represent the end of the list out-of-band. Then we could operate on lists of arbitrary natural numbers, as well as the case when the input is a list of 8-bit bytes. When it’s time to write interpreters and compilers, we may impose limits due to the messiness of the real world, but languages themselves ought to be neat.
Fussy K
The reference implementation of Lazy K is sloppy with respect to the output.
Ideally, it should look for V 256 x
in the output list for any value of x
,
at which point the program should terminate, but instead, the current item of
the list is tested by applying it to the K combinator, and if this returns 256
then the program halts. Indeed, the documentation explicitly mentions that K
256
is a valid end-of-output marker.
However V 256 x
behaves differently to K 256
. For example
V 256 x (KI) = x
while K 256 (KI) = 256
. This complicates our
implementation.
We tie up this loose end by defining Fussy K to be the Lazy K language as it
is specified, that is, the output list must be terminated with a 256 in the
first argument of a V combinator; K 256
will not do.
Crazy L
Let’s design a cleaner Lazy K, and add a few features.
For the input encoding, instead of pairs, we use the right fold representation of lists.
List manipulations become elegant. With types, we could readily prove certain programs terminate on finite inputs, and other theorems. Also for finite inputs, we could choose any evaluation order when running our program. Nonetheless, we’ll stick with lazy evaluation so we can also handle infinite inputs.
We write our interpreter and compiler to expect right fold encodings, and
use the following shim to convert a list x
to Lazy K’s input encoding:
\x.xV(Y(\f.V 256 f))
where Y is the Y combinator and 256 is the Church encoding of 256.
We add support for lambda abstractions and top-level definitions, where all
variables must be single characters other than skiSICKB
.
We name our language Crazy L.
Parsing
We catch up with an old friend: an AST for lambda calculus terms. Once again, we wish to eliminate all the lambda abstractions, leaving only variables and applications.
{-# LANGUAGE CPP #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE RecordWildCards #-} {-# LANGUAGE FlexibleInstances #-} #ifdef __HASTE__ {-# LANGUAGE OverloadedStrings #-} import Haste.DOM import Haste.Events import Haste.Foreign import Data.Bool import Data.IORef #else import Data.Function (fix) import System.Console.Haskeline import System.Environment import System.IO import Test.HUnit import Criterion.Main hiding (env) #endif import Control.Monad import Data.Char import Data.List import qualified Data.Map as M import Data.Map (Map, (!)) import Text.Parsec infixl 5 :@ data Term = Var String | Term :@ Term | Lam String Term
We define a few combinators. Some of their names are a nod to Smullyan’s "To Mock a Mockingbird".
consBird, succBird, vireo, skk, vsk, forever256 :: Term consBird = mustParse "((BS)((B(BB))(CI)))" succBird = Var "S" :@ Var "B" vireo = Var "B" :@ Var "C" :@ (Var "C" :@ Var "I") skk = Var "I" vsk = vireo :@ Var "S" :@ Var "K" forever256 = mustParse "SII(B(BC(CI)(SII(SII(SBI))))(SII))"
Our parser closely follows the grammar specified in the description of Lazy K. Differences:
-
We support lambda abstractions, e.g:
\x.x
-
With
=
, we can assign terms to any letter except those in"skiSICKB"
, e.g:c=\htcn.ch(tcn)
. The lettersB
andC
are reserved for the B and C combinators.
Definitions may only use the core language and previously defined letters. In particular, recursive functions must be defined via the Y combinator.
We expect a term at the top-level, which we consider to be a definition of
the special symbol main
.
Comments begin with #
and all whitespace except newlines are ignored.
Later definitions override earlier ones. In particular, because we support Jot, any trailing newlines (possibly with comments) are significant and change the program to be simply the I combinator.
top :: Parsec String () (String, Term) top = (try super <|> (,) "main" <$> ccexpr) <* eof where super = (,) <$> var <*> (char '=' >> ccexpr) ccexpr = option skk $ foldl1 (:@) <$> many1 expr expr = const skk <$> char 'i' <|> expr' iotaexpr = const vsk <$> char 'i' <|> expr' expr' = jotRev . reverse <$> many1 (oneOf "01") <|> const skk <$> char 'I' <|> Var . pure . toUpper <$> oneOf "ks" <|> Var . pure <$> letter <|> between (char '(') (char ')') ccexpr <|> (char '`' >> (:@) <$> expr <*> expr) <|> (char '*' >> (:@) <$> iotaexpr <*> iotaexpr) <|> flip (foldr Lam) <$> between (char '\\' <|> char '\955') (char '.') (many1 var) <*> ccexpr var = lookAhead (noneOf "skiSICKB") >> pure <$> letter jotRev [] = skk jotRev ('0':js) = jotRev js :@ Var "S" :@ Var "K" jotRev ('1':js) = Var "S" :@ (Var "K" :@ jotRev js) jotRev _ = error "bad Jot term" parseLine :: String -> Either ParseError (String, Term) parseLine = parse top "" . filter (not . isSpace) . takeWhile (/= '#') mustParse :: String -> Term mustParse = either undefined snd . parseLine
Since a definition may only use previously defined symbols, we can substitute
letters for their definitions terms as we parse a program line by line, and
keep our terms fully expanded and bracket abstracted in env
.
sub :: [String] -> [(String, Term)] -> Term -> Either String Term sub bvs env = \case x :@ y -> (:@) <$> sub bvs env x <*> sub bvs env y Var s | elem s bvs -> Right $ Var s | [c] <- s, elem c combs -> Right $ Var s | Just t <- lookup s env -> Right t | otherwise -> Left $ s <> " is free" Lam s t -> Lam s <$> sub (s:bvs) env t parseEnv :: [(String, Term)] -> String -> Either String [(String, Term)] parseEnv env ln = case parseLine ln of Left e -> Left $ show e Right (s, t) -> case sub [] env t of Right u -> Right $ (s, u):env Left e -> Left e parseProgram :: String -> Either String Term parseProgram program = case foldM parseEnv [] $ lines program of Left err -> Left err Right env -> maybe (Left "missing main") Right $ lookup "main" env
A program is interpreted according to which language variant we’ve chosen. Lazy K, Fussy K, and Crazy L all take streams of bytes as input and all produce streams of bytes as output. To this, we add the Nat language, we expects no input and just outputs a Church-encoded number.
data Lang = LazyK | FussyK | CrazyL | Nat
De Bruijn indices
Recall 'De Bruijn indices' arise when we replace each variable
with an integer representing the number of Lam
nodes we encounter as we
travel up the parse tree before reaching the Lam
that introduced it. (Wasm
branch labels are similar.)
For example,
becomes:
Also, \(S = \lambda\lambda\lambda 2 0(1 0)\) and \(K = \lambda\lambda 1\).
This time, we employ a tagless final representation for De Bruijn terms:
infixl 5 # class Deb repr where ze :: repr su :: repr -> repr lam :: repr -> repr (#) :: repr -> repr -> repr prim :: String -> repr
We declare an instance so we can display De Bruijn terms for debugging and testing.
data Out = Out { unOut :: String } instance Deb Out where ze = Out "Z" su e = Out $ "S(" <> unOut e <> ")" lam e = Out $ "^" <> unOut e e1 # e2 = Out $ unOut e1 <> unOut e2 prim s = Out s
Sick B
This time, we need the B and C combinators ((.)
and flip
in Haskell) as
well as the S and K combinators. We also directly implement I combinators,
rather than making do with SKK.
A straightforward recursion computes the De Bruijn indices.
toDeb :: Deb repr => [String] -> Term -> repr toDeb env = \case Var s -> case elemIndex s env of Nothing -> case s of "S" -> lam $ lam $ lam $ su(su ze) # ze # (su ze # ze) "B" -> lam $ lam $ lam $ su(su ze) # (su ze # ze) "C" -> lam $ lam $ lam $ su(su ze) # ze # su ze "K" -> lam $ lam $ su ze "I" -> lam ze _ -> prim s Just n -> iterate su ze !! n Lam s t -> lam $ toDeb (s:env) t x :@ y -> toDeb env x # toDeb env y
We translate to combinators with an algorithm due to Kiselyov. Again we use a tagless final representation for the output.
infixl 5 ## class SickB repr where kV :: String -> repr (##) :: repr -> repr -> repr instance SickB (Bool -> Out) where kV s _ = Out s (e1 ## e2) False = Out $ unOut (e1 False) <> unOut (e2 True) (e1 ## e2) _ = Out $ "(" <> unOut (e1 False) <> unOut (e2 True) <> ")"
We introduce the Babs
data type because the algorithm distinguishes between
closed terms and 3 kinds of unclosed terms. We return a closed term,
but along the way we manipulate open terms.
Kiselyov’s code omits the (V, V)
case because it applies to a simply typed
algebra. We allow untyped terms.
data Babs repr = C {unC :: repr} | N (Babs repr) | W (Babs repr) | V instance SickB repr => Deb (Babs repr) where prim s = C (kV s) ze = V su = W l # r = case (l, r) of (W e, V) -> N e (V, W e) -> N $ C (c ## i) # e (N e, V) -> N $ C s # e # C i (V, N e) -> N $ C (s ## i) # e (C d, V) -> N $ C d (V, C d) -> N $ C $ c ## i ## d (V, V) -> N $ C $ s ## i ## i (W e1, W e2) -> W $ e1 # e2 (W e, C d) -> W $ e # C d (C d, W e) -> W $ C d # e (W e1, N e2) -> N $ C b # e1 # e2 (N e1, W e2) -> N $ C c # e1 # e2 (N e1, N e2) -> N $ C s # e1 # e2 (C d, N e) -> N $ C (b ## d) # e -- Reasonable alternative: -- (N e, C d) -> N $ C c # e # C d (N e, C d) -> N $ C (c ## c ## d) # e (C d1, C d2) -> C $ d1 ## d2 where [s,i,c,b] = kV . pure <$> "SICB" lam = \case V -> C i C d -> C $ k ## d N e -> e W e -> C k # e where [i,k] = kV . pure <$> "IK" showBabs :: Term -> String showBabs t = unOut $ unC (toDeb [] t) False
Interpreter
We build an interpreter to guide our compiler design.
We envision a machine with 4 registers (you can tell I grew up on x86 assembly):
-
IP: holds top-most stack item; dedicating a register to this reduces memory reads and writes.
-
SP: stack pointer; grows downwards from the top of memory.
-
HP: heap pointer; grows upwards from the bottom of memory.
-
AX: accumulator
We arbitrarily decide our wasm instances will request 64 pages of memory.
pageCount :: Int pageCount = 64 maxSP :: Int maxSP = pageCount * 65536 data VM = VM { ip, hp, sp :: Int , ax :: Int , mem :: Map Int Int , input :: String , lang :: Lang }
In addition to the standard SICKB combinators, we add special combinators for practical reasons.
combs :: [Char] combs = "SICKB0+<>."
The heap is organized as an array of 8-byte entries, each consisting of two
4-byte combinators x
and y
. The meaning of such an entry is xy
.
A negative 4-byte value represents one of the primitive combinators. Otherwise it is the address of another 8-byte entry in the heap.
This encoding scheme means if a term consists of a single primitive combinator, such as K, then we must represent it as IK since at minimum a cell holds two combinators.
instance SickB (Int -> [Int]) where kV s _ = [enCom s] (e1 ## e2) n = n:h1:h2:t1++t2 where (h1:t1) = e1 (n + 8) (h2:t2) = e2 (n + 8 + wlen t1) wlen :: [a] -> Int wlen = (4*) . length enCom :: String -> Int enCom [c] | Just n <- elemIndex c combs = -n - 1 enCom s = error $ show s encAt :: Int -> Term -> [Int] encAt n t = tail $ unC (toDeb [] t) n dump :: VM -> String dump VM{..} = unlines $ take 50 . f <$> ps where f a | a < 0 = pure $ combs!!(-a - 1) f a = "(" ++ f (de a) ++ f (de $ a + 4) ++ ")" ps = ip:[de $ 4 + de p | p <- [sp, sp + 4..maxSP - 4]] de k | Just v <- M.lookup k mem = v | otherwise = error $ "bad deref: " ++ show k
We place the Church encoded integers from [0..256] in linear memory starting from 0. Each takes one 8-byte cell, so that the Church encoding of 'n' lies at address '8n' in memory.
Our input handler uses these to quickly map a number up to 256 to its Church encoding. Larger input numbers are unsupported. In principle, we could generate encodings for them on demand, but if we really wanted big numbers we’d use a more efficient encoding, or add a primitive integer type.
Zero is represented by SK, and 'n + 1' is represented by 'm n' where 'm' is
the combinator that computes the successor of a Church number. We place the
definition of 'm' just after the Church numbers, that is, at memory address
8*257
.
gen :: [Int] gen = enCom "S" : enCom "K" : -- Zero concat [[m, 8*n] | n <- [0..255]] ++ -- [1..256] encAt m succBird -- Successor combinator. where m = 8*257
We encode a program immediately after the above. We add our special combinators differently for each language so that the term will behave accordingly, which we explain later.
encodeTerm :: Lang -> Term -> [Int] encodeTerm lang t = (gen ++) $ encAt (wlen gen) $ case lang of Nat -> t :@ Var "+" :@ Var "0" LazyK -> t :@ ugh :@ Var ">" :@ Var "+" :@ Var "0" FussyK -> t :@ ugh :@ Var ">" CrazyL -> t :@ inp :@ Var ">" :@ Var "." where inp = Var "<" :@ consBird ugh = inp :@ vireo :@ forever256
The IP register points to our term, which is just after the Church-encoded integers. The HP register points to the free heap, which begins just after our program. The SP register points to the top of memory, as the stack is initially empty.
sim :: Lang -> String -> Term -> String sim mode inp e = exec VM { ip = wlen gen , sp = maxSP , hp = wlen bs , ax = 0 , mem = M.fromList $ zip [0,4..] bs , input = inp , lang = mode } where bs = encodeTerm mode e
Lazy evaluation is important for the S, B, and C combinators, that is, we must memoize so future evaluations avoid recomputing the same reduction. Without this, even simple programs may be too slow.
We also memoize the result of the K combinator, but this is less vital.
There may be some memoization possible with the I combinator, but it’s practically a tag.
The upd
function updates the heap entry that the top of the stack refers to,
as well as the IP register. It powers memoization and lazy input.
upd :: Int -> Int -> VM -> VM upd a b vm@VM{..} = setIP a $ vm { mem = M.insert (mem!sp) a $ M.insert ((mem!sp) + 4) b mem } exec :: VM -> String exec vm@VM{..} | ip < 0 = case combs!!(-ip - 1) of 'S' -> rec $ upd hp (hp + 8) . pop 2 . putHP [arg 0, arg 2, arg 1, arg 2, hp, hp + 8] 'I' -> rec $ setIP (arg 0) . pop 1 'C' -> rec $ putHP [arg 0, arg 2] . upd hp (arg 1) . pop 2 -- Unmemoized: 'K' -> rec $ setIP (arg 0) . pop 2 'K' -> rec $ upd (enCom "I") (arg 0) . pop 1 'B' -> rec $ putHP [arg 1, arg 2] . upd (arg 0) hp . pop 2
The (+)
combinator acts like I except it also increments AX.
The (<)
combinator is always applied to the combinator equivalent to
\(x =\lambda h t c n . c h (t c n)\), the cons combinator for
right-fold-encoded lists. When we reduce it, we know the top entry of the
stack is (<)x
. We replace the entry with xn(<x)
where n
is the
Church encoding of the next byte of input, or SK
if there is no more input.
The meaning of the 0
combinator depends on the language. For Nat, it
outputs AX and terminates. Thus given a Nat program t
, we recover its
output by reducing t(+)0
, since its output is Church-encoded. As we control
where (+)
and 0
are injected, we get away with giving side effects to
these two combinators.
The (>)
combinator is \xy.x(+)(0y)
. We tweak how 0
works for the other
languages, so that it turns the first argument (which should be a Church
number) into a byte to emit before recursing on y
.
Lazy K is fiddly because we must handle the case when it skips over our (>)
combinator, such as for the program K(K(256))
. (Normally we supply an
argument to 0
for non-Nat programs to ensure IP == when we evaluate
0
, but we can skip it for Lazy K because we only reach it when the program
terminates.)
'0' -> case lang of Nat -> show ax CrazyL -> chr ax : rec (setIP (arg 0) . setAX 0 . pop 1) FussyK -> if ax == 256 then "" else chr ax : rec (upd (arg 0) (enCom ">") . setAX 0) LazyK -> if ax >= 256 then "" else chr ax : rec (upd (hp + 8) (enCom "0") . putHP [arg 0, enCom ">", hp, enCom "+"] . setAX 0) -- I combinator with side effect. '+' -> rec $ setIP (arg 0) . pop 1 . setAX (ax + 1) '>' -> rec $ upd hp (hp + 8) . pop 1 . putHP [arg 0, enCom "+", enCom "0", arg 1] '.' -> "" -- Lazy input. If we reach here, then IP == [[SP]]. '<' -> case input of (h:t) | ord h <= 256 -> exec $ putHP [arg 0, ord h * 8, enCom "<", arg 0] $ upd hp (hp + 8) vm { input = t } | otherwise -> error "no support for integers > 256" _ -> rec $ upd (enCom "S") (enCom "K") _ -> error $ "bad combinator\n" ++ dump vm where rec f = exec $ f vm arg n = mem ! (mem ! (sp + n * 4) + 4) setAX a v = v {ax = a} exec vm@VM{..} = exec $ checkOverflow $ vm { sp = sp - 4, mem = M.insert (sp - 4) ip mem, ip = mem ! ip } pop :: Int -> VM -> VM pop n vm@VM{..} = vm { sp = sp + 4*n } setIP :: Int -> VM -> VM setIP a v = v {ip = a} putHP :: [Int] -> VM -> VM putHP as vm@VM{..} = checkOverflow $ vm { mem = M.union (M.fromList $ zip [hp, hp + 4..] as) mem, hp = hp + wlen as } checkOverflow :: VM -> VM checkOverflow vm@VM{..} | hp >= sp = error "overflow" | otherwise = vm
Compiler
We have a three import functions this time:
-
We ouptut bytes via
f
. -
We call
g
to get the next input byte. This function should return a negative number if there is no more input. -
We call
h
to output a 32-bit number.
leb128 :: Int -> [Int] leb128 n | n < 64 = [n] | n < 128 = [128 + n, 0] | otherwise = 128 + (n `mod` 128) : leb128 (n `div` 128) i32 :: Int i32 = 0x7f i32const :: Int i32const = 0x41 compile :: Lang -> Term -> [Int] compile mode e = concat [ [0, 0x61, 0x73, 0x6d, 1, 0, 0, 0] -- Magic string, version. -- Type section. , sect 1 [encSig [i32] [], encSig [] [], encSig [] [i32]] -- Import section. , sect 2 [ -- [0, 0] = external_kind Function, type index 0. encStr "i" ++ encStr "f" ++ [0, 0], -- [0, 2] = external_kind Function, type index 2. encStr "i" ++ encStr "g" ++ [0, 2], encStr "i" ++ encStr "h" ++ [0, 0]] -- Function section. [1] = Type index. , sect 3 [[1]] -- Memory section. 0 = no-maximum , sect 5 [[0, pageCount]] -- Export section. -- [0, 3] = external_kind Function, function index 3. , sect 7 [encStr "e" ++ [0, 3]] -- Code section. , sect 10 [lenc $ codeSection mode $ length heap] -- Data section. , sect 11 [[0, i32const, 0, 0xb] ++ lenc heap]] where heap = encodeTerm mode e >>= quad sect t xs = t : lenc (leb128 (length xs) ++ concat xs) -- 0x60 = Function type. encSig ins outs = 0x60 : lenc ins ++ lenc outs encStr s = lenc $ ord <$> s lenc xs = leb128 (length xs) ++ xs quad n | n < 0 = [256 + n, 255, 255, 255] | otherwise = take 4 $ byteMe n byteMe n | n < 256 = n : repeat 0 | otherwise = n `mod` 256 : byteMe (n `div` 256)
We translate our interpreter into WebAssembly for our compiler.
Our asmCase
helper deals with the branch numbers for each case in the
br_table
.
codeSection :: Lang -> Int -> [Int] codeSection mode heapEnd = [1, 4, i32, i32const] ++ leb128 (wlen gen) ++ [setlocal, ip, i32const] ++ leb128 maxSP ++ [setlocal, sp, i32const] ++ leb128 heapEnd ++ [setlocal, hp, 3, 0x40] -- loop ++ concat (replicate (ccount + 1) [2, 0x40]) -- blocks ++ [i32const, 128 - 1, getlocal, ip, i32sub, -- -1 - IP br_table] ++ (ccount:[0..ccount]) -- br_table ++ [0xb] ++ concat (zipWith asmCase [0..] combs)
Function application walks down the tree to find the combinator to run next, and builds up a spine on the stack as it goes.
-- Application is the default case. -- SP = SP - 4 -- [SP] = IP ++ [getlocal, sp, i32const, 4, i32sub, teelocal, sp, getlocal, ip, i32store, 2, 0, -- IP = [IP] getlocal, ip, i32load, 2, 0, setlocal, ip, br, 0, -- br loop 0xb, -- end loop 0xb] -- end function where br = 0xc br_if = 0xd br_table = 0xe getlocal = 0x20 setlocal = 0x21 teelocal = 0x22 i32load = 0x28 i32store = 0x36 i32ge_s = 0x4e i32ge_u = 0x4f i32add = 0x6a i32sub = 0x6b i32mul = 0x6c ip = 0 -- instruction pointer, can also hold instructions sp = 1 -- stack pointer hp = 2 -- heap pointer ax = 3 -- accumulator ccount = length combs asmCase combIndex combName = let loopLabel = ccount - combIndex exitLabel = loopLabel + 1 loop = [br, loopLabel] asmCom c = [i32const, 128 + enCom c] asmIP ops = ops ++ [setlocal, ip] asmArg n = [getlocal, sp, i32load, 2, 4*n, i32load, 2, 4] asmPop 0 = [] asmPop n = [getlocal, sp, i32const, 4*n, i32add, setlocal, sp] withHeap xs body = concat (zipWith hAlloc xs [0..]) ++ body ++ [getlocal, hp, i32const, 4*length xs, i32add, setlocal, hp] hAlloc x n = [getlocal, hp] ++ x ++ [i32store, 2, 4*n] hNew 0 = [getlocal, hp] hNew n = [getlocal, hp, i32const, 8*n, i32add] updatePop n x y = concat [ [getlocal, sp, i32load, 2, 4*n], x, [teelocal, ip, i32store, 2, 0] , [getlocal, sp, i32load, 2, 4*n], y, [i32store, 2, 4] , asmPop n ] in (++ [0xb]) $ case combName of
The following is similar to the exec
function of our interpreter.
'0' -> case mode of Nat -> [getlocal, ax, 0x10, 2, br, exitLabel] -- Print AX. LazyK -> [getlocal, ax, i32const, 128, 2, i32ge_u, -- AX >= 256? br_if, exitLabel, -- br_if exit getlocal, ax, 0x10, 0, -- else output AX -- AX = 0 i32const, 0, setlocal, ax ] ++ withHeap [asmArg 0, asmCom ">", hNew 0, asmCom "+", asmCom "0", asmCom "."] (updatePop 0 (hNew 1) (hNew 2)) ++ loop FussyK -> [getlocal, ax, i32const, 128, 2, i32ge_u, -- AX >= 256? br_if, exitLabel, -- br_if exit getlocal, ax, 0x10, 0, -- else output AX -- AX = 0 i32const, 0, setlocal, ax ] ++ updatePop 0 (asmArg 0) (asmCom ">") ++ loop CrazyL -> concat [ [getlocal, ax, 0x10, 0, i32const, 0, setlocal, ax] , asmIP (asmArg 0), asmPop 1, loop] '+' -> concat [ [getlocal, ax, i32const, 1, i32add, setlocal, ax] -- AX = AX + 1 , asmIP (asmArg 0) , asmPop 1 , loop ] 'K' -> updatePop 1 (asmCom "I") (asmArg 0) ++ loop 'S' -> withHeap (asmArg <$> [0, 2, 1, 2]) (updatePop 2 (hNew 0) (hNew 1)) ++ loop '>' -> withHeap [asmArg 0, asmCom "+", asmCom "0", asmArg 1] (updatePop 1 (hNew 0) (hNew 1)) ++ loop '.' -> [br, exitLabel] -- br exit 'I' -> concat [asmIP $ asmArg 0, asmPop 1, loop] '<' -> concat [ [0x10, 1, teelocal, ip] -- Get next character in IP. , [i32const, 0, i32ge_s, 4, 0x40] -- if >= 0 , withHeap [asmArg 0, [getlocal, ip, i32const, 8, i32mul], asmCom "<", asmArg 0] (updatePop 0 (hNew 0) (hNew 1)) , [5] -- else , updatePop 0 (asmCom "S") (asmCom "K") , [0xb] -- end if , loop ] 'B' -> withHeap [asmArg 1, asmArg 2] (updatePop 2 (asmArg 0) (hNew 0)) ++ loop 'C' -> withHeap [asmArg 0, asmArg 2] (updatePop 2 (hNew 0) (asmArg 1)) ++ loop e -> error $ "bad combinator: " ++ [e]
Web UI
We conclude by connecting buttons and textboxes with code.
#ifdef __HASTE__ (<>) = (++) main :: IO () main = withElems ["source", "input", "output", "sk", "asm", "compB", "runB"] $ \[sEl, iEl, oEl, skEl, aEl, compB, runB] -> do inp <- newIORef [] bin <- newIORef [] let putCh :: Int -> IO () putCh c = do v <- getProp oEl "value" setProp oEl "value" $ v ++ [chr c] putInt :: Int -> IO () putInt n = setProp oEl "value" $ show n getCh :: IO Int getCh = do s <- readIORef inp case s of [] -> pure (-1) (h:t) -> const (ord h) <$> writeIORef inp t export "putChar" putCh export "putInt" putInt export "getChar" getCh let setupDemo mode name s = do Just b <- elemById $ name ++ "B" Just d <- elemById $ name ++ "Demo" Just r <- elemById $ mode ++ "Radio" void $ b `onEvent` Click $ const $ do setProp sEl "value" =<< getProp d "value" setProp r "checked" "true" setProp iEl "value" s setProp oEl "value" "" setupDemo "nat" "nat" "" setupDemo "lazyk" "lazyk" "gateman" setupDemo "fussyk" "fussyk" "(ignored)" setupDemo "crazyl" "crazyl" "length" setupDemo "crazyl" "rev" "stressed" setupDemo "crazyl" "sort" "froetf" void $ compB `onEvent` Click $ const $ do setProp skEl "value" "" setProp aEl "value" "" writeIORef bin [] s <- getProp sEl "value" case parseProgram s of Left err -> setProp skEl "value" $ "error: " ++ show err Right sk -> do let f name = do Just el <- elemById $ name ++ "Radio" bool "" name . ("true" ==) <$> getProp el "checked" lang <- concat <$> mapM f ["nat", "lazyk", "fussyk", "crazyl"] let asm = compile (findLang lang) sk setProp skEl "value" $ showBabs sk setProp aEl "value" $ show asm writeIORef bin asm void $ runB `onEvent` Click $ const $ do setProp oEl "value" "" s <- getProp iEl "value" writeIORef inp s asm <- readIORef bin ffi "runWasmInts" asm :: IO () findLang :: String -> Lang findLang "nat" = Nat findLang "fussyk" = FussyK findLang "crazyl" = CrazyL findLang "lazyk" = LazyK findLang _ = undefined #endif
Testing
We test our code with HUnit on known Lazy K examples:
#ifndef __HASTE__ mustParseProgram :: String -> Term mustParseProgram = either (error "bad program") id . parseProgram tests :: Test tests = TestList [ "revK" ~: "diaper" ~?= runSim LazyK "repaid" rev , "revL" ~: "stressed" ~?= runSim CrazyL "desserts" "\\l.l(\\htx.t(\\cn.ch(xcn)))i(sk)" , "empty1" ~: "Hello, World!" ~?= runSim LazyK "Hello, World!" "\n" , "empty2" ~: "" ~?= runSim LazyK "" "\n" , "empty3" ~: "Hello, World!" ~?= runSim CrazyL "Hello, World!" "\n" , "kk256" ~: "" ~?= runSim LazyK "whatever" kk256 , "5!" ~: "120" ~?= runSim Nat "" (unlines [ "Y=(\\z.zz)(\\z.\\f.f(zzf))" , "P=\\nfx.n(\\gh.h(gf))(\\u.x)(\\u.u)" , "M=\\mnf.m(nf)" , "z=\\n.n(\\x.sk)k" , "Y(\\fn.zn(\\fx.fx)(Mn(f(Pn))))(\\fx.f(f(f(f(fx)))))" ]) , "primes" ~: let s = runSim FussyK "" pri in assertBool s $ "2 3 5 7 11 13" `isPrefixOf` s ] where kk256 = "k(k(sii(sii(sBi))))" runSim lang inp = sim lang inp . mustParseProgram rev = concat [ "1111100011111111100000111111111000001111111000111100111111000111111", "1000111100111110001111111000111100111001111111000111100111111111000", "1111111110000011111111100000111111110001111111110000011111111100000", "1111111000111111100011110011111000111001111111110000011111110001111", "0011111100011111111100000111001110011111110001111001111110001111001", "1111100011111110001111111000111111111000001111001110011110011111110", "0011110011111100011111111100000111001111111000111100111111000111100", "1111110001111001110011111110001111111000111100111110001111111000111", "1001111110001111001111100011111110001111111000111100111110001111111", "0001111001110011111110001111001111100011111110001111001110011111110", "0011111111100000111111111000001111001111111000111100111111000111111", "1000111100111110001111111000111100111111000111111111000001111111100", "0111110001111110001111111110000011110011100111111100011110011100111", "0011110011110011111110001111111110000011110011110011111111100000111", "1001111111100011111111100000111111111000001111111100011111111100000", "1111111110000011111110001111111000111100111110001110011111111100000"] pri :: String pri = concat [ "K", "(SII(S(K(S(S(K(SII(S(S(KS)(S(K(S(KS)))(S(K(S(S(KS)(SS(S(S(KS)K))(KK)))))", "(S(S(KS)(S(KK)(S(KS)(S(S(KS)(S(KK)(S(KS)(S(S(KS)(S(KK)(SII)))", "(K(SI(KK)))))))(K(S(K(S(S(KS)(S(K(SI))(S(KK)(S(K(S(S(KS)K)(S(S(KS)K)I)", "(S(SII)I(S(S(KS)K)I)(S(S(KS)K)))))(SI(K(KI)))))))))(S(KK)K)))))))(K(S(KK)", "(S(SI(K(S(S(S(S(SSK(SI(K(KI))))(K(S(S(KS)K)I(S(S(KS)K)(S(S(KS)K)I))", "(S(K(S(SI(K(KI)))))K)(KK))))(KK))(S(S(KS)(S(K(SI))(S(KK)(S(K(S(S(KS)K)))", "(SI(KK))))))(K(K(KI)))))(S(S(KS)(S(K(SI))(SS(SI)(KK))))(S(KK)", "(S(K(S(S(KS)K)))(SI(K(KI)))))))))(K(K(KI))))))))))(K(KI)))))(SI(KK)))))", "(S(K(S(K(S(K(S(SI(K(S(K(S(S(KS)K)I))(S(SII)I(S(S(KS)K)I)))))))K))))", "(S(S(KS)(S(KK)(SII)))(K(SI(K(KI)))))))(SII(S(K(S(S(KS)(S(K(S(S(SI(KK))", "(KI))))(SS(S(S(KS)(S(KK)(S(KS)(S(K(SI))K)))))(KK))))))(S(S(KS)", "(S(K(S(KS)))(S(K(S(KK)))(S(S(KS)(S(KK)(SII)))(K(S(S(KS)K)))))))(K(S(S(KS)", "(S(K(S(S(SI(KK))(KI))))(S(KK)(S(K(SII(S(K(S(S(KS)(S(K(S(K(S(S(KS)(S(KK)", "(S(KS)(S(K(SI))K))))(KK)))))(S(S(KS)(S(KK)(S(K(SI(KK)))(SI(KK)))))", "(K(SI(KK))))))))(S(S(KS)(S(K(S(KS)))(S(K(S(KK)))(S(S(KS)(S(KK)(SII)))", "(K(SI(K(KI))))))))(K(K(SI(K(KI)))))))))(S(K(SII))(S(K(S(K(SI(K(KI))))))", "(S(S(KS)(S(KK)(SI(K(S(K(S(SI(K(KI)))))K)))))(K(S(K(S(SI(KK))))", "(S(KK)(SII)))))))))))(K(SI(K(KI))))))))(S(S(KS)K)I)", "(SII(S(K(S(K(S(SI(K(KI)))))K))(SII)))))"]
Command-line UI
A REPL glues the above together. If no command-line arguments are given, then we print bracket abstractions for each line of the program.
main :: IO () main = do hSetBuffering stdout NoBuffering as <- getArgs let f lang = runInputT defaultSettings $ repl lang inArg [] inArg = case as of (_:a:_) -> a _ -> "" repl lang inp = fix $ \rec env -> do getInputLine "> " >>= \case Nothing -> outputStrLn "" Just ln -> case parseEnv env ln of Left err -> outputStrLn err >> rec env Right env'@((s, t):_) -> do if s == "main" then do outputStrLn $ lang inp t rec env else do outputStrLn $ s ++ "=" ++ showBabs t rec env' _ -> error "unreachable" if null as then f $ const showBabs else case head as of "n" -> f $ sim Nat "lazyk" -> f $ sim LazyK "k" -> f $ sim FussyK "l" -> f $ sim CrazyL "test" -> void $ runTestTT tests "pri" -> putStrLn $ take 70 $ sim FussyK "" $ mustParse pri "bm" -> defaultMain $ pure $ bench "pri" $ whnf (\t -> "2 3 5 7 11 13" `isPrefixOf` sim LazyK t (mustParse pri)) "" "wasm" -> print $ compile CrazyL $ mustParseProgram $ unlines [ "c=\\htcn.ch(tcn)" , "\\l.l(\\htx.t(chx))i(sk)" ] bad -> putStrLn $ "bad command: " ++ bad #endif