Client-side compiler

Our next goal is a browser-based edition of our compiler.

Crossly

We first add a few features related to WebAssembly. Firstly, the wasm command compiles Haskell to C intended to be compiled to wasm. The resulting binary assumes the host environment supplies a few functions such as env.getchar. Secondly, the warts command prints the RTS generated by the compiler, also as C code intended to be compiled to wasm. We later use this to build compilers that can go directly from Haskell to wasm.

We do some housekeeping. Given a Neat, type inference had produced a tuple of a particular type that contained the data needed by the next phase. We change it to produce a new Neat with an updated typedAsts field, so there’s one fewer data type to occupy our thoughts and APIs. We no longer need to pick out specific fields to pass to the next phase, as we simply pass everything.

We take a first stab at top-level type declarations. We treat them similarly to default typeclass methods, in that during type inference, we trust the symbol has its annotated type, and only afterwards that we verify the annotated type matches the inferred type. It’s more complex because we must process an entire strongly connected component at a time.

Adding modules has made a mess of our various functions for looking up data constructors, top-level variables, typeclasses, and so on. We reorganize them a little to standardize the logic for searching through the list of imports. This makes it easier to add support for lists of export symbols. We also replace the giant global operator precedence table with local tables.

We experiment with hash consing which reduces heap usage by maximizing sharing. However, it may cost too much, as our compiler has grown appreciably slower.

-- Module exports.
module Ast where
import Base
import Map

data Type = TC String | TV String | TAp Type Type deriving Eq
arr a b = TAp (TAp (TC "->") a) b
data Extra = Basic String | Const Int | ChrCon Char | StrCon String | Link String String Qual
data Pat = PatLit Ast | PatVar String (Maybe Pat) | PatCon String [Pat]
data Ast = E Extra | V String | A Ast Ast | L String Ast | Pa [([Pat], Ast)] | Proof Pred
data Constr = Constr String [(String, Type)]
data ModExport = ExportVar String | ExportCon String [String]
data Pred = Pred String Type deriving Eq
data Qual = Qual [Pred] Type

instance Show Type where
  showsPrec _ = \case
    TC s -> (s++)
    TV s -> (s++)
    TAp (TAp (TC "->") a) b -> showParen True $ shows a . (" -> "++) . shows b
    TAp a b -> showParen True $ shows a . (' ':) . shows b
instance Show Pred where
  showsPrec _ (Pred s t) = (s++) . (' ':) . shows t . (" => "++)
instance Show Qual where
  showsPrec _ (Qual ps t) = foldr (.) id (map shows ps) . shows t
instance Show Extra where
  showsPrec _ = \case
    Basic s -> (s++)
    Const i -> shows i
    ChrCon c -> shows c
    StrCon s -> shows s
    Link im s _ -> (im++) . ('.':) . (s++)
instance Show Pat where
  showsPrec _ = \case
    PatLit e -> shows e
    PatVar s mp -> (s++) . maybe id ((('@':) .) . shows) mp
    PatCon s ps -> (s++) . foldr (.) id (((' ':) .) . shows <$> ps)

showVar s@(h:_) = showParen (elem h ":!#$%&*+./<=>?@\\^|-~") (s++)

instance Show Ast where
  showsPrec prec = \case
    E e -> shows e
    V s -> showVar s
    A x y -> showParen (1 <= prec) $ shows x . (' ':) . showsPrec 1 y
    L s t -> showParen True $ ('\\':) . (s++) . (" -> "++) . shows t
    Pa vsts -> ("\\cases{"++) . foldr (.) id (intersperse (';':) $ map (\(vs, t) -> foldr (.) id (intersperse (' ':) $ map (showParen True . shows) vs) . (" -> "++) . shows t) vsts) . ('}':)
    Proof p -> ("{Proof "++) . shows p . ("}"++)

data Instance = Instance
  -- Type, e.g. Int for Eq Int.
  Type
  -- Dictionary name, e.g. "{Eq Int}"
  String
  -- Context.
  [Pred]
  -- Method definitions
  (Map String Ast)

data Assoc = NAssoc | LAssoc | RAssoc deriving Eq

data Neat = Neat
  { typeclasses :: Map String [String]
  , instances :: Map String [Instance]
  , topDefs :: [(String, Ast)]
  , topDecls :: Map String Qual
  -- | Typed ASTs, ready for compilation, including ADTs and methods,
  -- e.g. (==), (Eq a => a -> a -> Bool, select-==)
  , typedAsts :: Map String (Qual, Ast)
  , dataCons :: Map String [Constr]
  , type2Cons :: Map String [String]
  , ffiImports :: Map String Type
  , ffiExports :: Map String String
  , moduleImports :: [String]
  , moduleExports :: Maybe [String]
  , opFixity :: Map String (Int, Assoc)
  }

neatEmpty = Neat Tip Tip [] Tip Tip Tip Tip Tip Tip [] Nothing Tip

patVars = \case
  PatLit _ -> []
  PatVar s m -> s : maybe [] patVars m
  PatCon _ args -> concat $ patVars <$> args

typeVars = \case
  TC _ -> []
  TV v -> [v]
  TAp x y -> typeVars x `union` typeVars y

depthFirstSearch = (foldl .) \relation st@(visited, sequence) vertex ->
  if vertex `elem` visited then st else second (vertex:)
    $ depthFirstSearch relation (vertex:visited, sequence) (relation vertex)

spanningSearch   = (foldl .) \relation st@(visited, setSequence) vertex ->
  if vertex `elem` visited then st else second ((:setSequence) . (vertex:))
    $ depthFirstSearch relation (vertex:visited, []) (relation vertex)

scc ins outs = spanning . depthFirst where
  depthFirst = snd . depthFirstSearch outs ([], [])
  spanning   = snd . spanningSearch   ins  ([], [])
-- Separate fixity phase.
-- Export lists.
module Parser where
import Base
import Ast
import Map

-- Parser.
data ParserState = ParserState
  { readme :: [(Char, (Int, Int))]
  , landin :: String
  , indents :: [Int]
  }
data Parser a = Parser { getParser :: ParserState -> Either String (a, ParserState) }
instance Functor Parser where fmap f x = pure f <*> x
instance Applicative Parser where
  pure x = Parser \inp -> Right (x, inp)
  (Parser f) <*> (Parser x) = Parser \inp -> do
    (fun, t) <- f inp
    (arg, u) <- x t
    pure (fun arg, u)
instance Monad Parser where
  return = pure
  (Parser x) >>= f = Parser \inp -> do
    (a, t) <- x inp
    getParser (f a) t
instance Alternative Parser where
  empty = bad ""
  x <|> y = Parser \inp -> either (const $ getParser y inp) Right $ getParser x inp

notFollowedBy p = do
  saved <- Parser \pasta -> Right (pasta, pasta)
  ret <- p *> pure (bad "") <|> pure (pure ())
  Parser \_ -> Right ((), saved)
  ret

parse f str = getParser f $ ParserState (rowcol str (1, 1)) [] [] where
  rowcol s rc = case s of
    [] -> []
    h:t -> (h, rc) : rowcol t (advanceRC (ord h) rc)
  advanceRC n (r, c)
    | n `elem` [10, 11, 12, 13] = (r + 1, 1)
    | n == 9 = (r, (c + 8)`mod`8)
    | True = (r, c + 1)

indentOf pasta = case readme pasta of
  [] -> 1
  (_, (_, c)):_ -> c

ins c pasta = pasta { landin = c:landin pasta }

angle n pasta = case indents pasta of
  m:ms | m == n -> ins ';' pasta
       | n + 1 <= m -> ins '}' $ angle n pasta { indents = ms }
  _ -> pasta

curly n pasta = case indents pasta of
  m:ms | m + 1 <= n -> ins '{' pasta { indents = n:m:ms }
  [] | 1 <= n -> ins '{' pasta { indents = [n] }
  _ -> ins '{' . ins '}' $ angle n pasta

sat f = Parser \pasta -> case landin pasta of
  c:t -> if f c then Right (c, pasta { landin = t }) else Left "unsat"
  [] -> case readme pasta of
    [] -> case indents pasta of
      [] -> Left "EOF"
      m:ms | m /= 0 && f '}' -> Right ('}', pasta { indents = ms })
      _ -> Left "unsat"
    (h, _):t | f h -> let
      p' = pasta { readme = t }
      in case h of
        '}' -> case indents pasta of
          0:ms -> Right (h, p' { indents = ms })
          _ -> Left "unsat"
        '{' -> Right (h, p' { indents = 0:indents p' })
        _ -> Right (h, p')
    _ -> Left "unsat"

char c = sat (c ==)

rawSat f = Parser \pasta -> case readme pasta of
  [] -> Left "EOF"
  (h, _):t -> if f h then Right (h, pasta { readme = t }) else Left "unsat"

eof = Parser \pasta -> case pasta of
  ParserState [] [] _ -> Right ((), pasta)
  _ -> badpos pasta "want eof"

comment = rawSat ('-' ==) *> some (rawSat ('-' ==)) *>
  (rawSat isNewline <|> rawSat (not . isSymbol) *> many (rawSat $ not . isNewline) *> rawSat isNewline) *> pure True
spaces = isNewline <$> rawSat isSpace
whitespace = do
  offside <- or <$> many (spaces <|> comment)
  Parser \pasta -> Right ((), if offside then angle (indentOf pasta) pasta else pasta)

hexValue d
  | d <= '9' = ord d - ord '0'
  | d <= 'F' = 10 + ord d - ord 'A'
  | d <= 'f' = 10 + ord d - ord 'a'
isNewline c = ord c `elem` [10, 11, 12, 13]
isSymbol = (`elem` "!#$%&*+./<=>?@\\^|-~:")
isSmall c = c <= 'z' && 'a' <= c || c == '_'
small = sat isSmall
large = sat \x -> (x <= 'Z') && ('A' <= x)
hexit = sat \x -> (x <= '9') && ('0' <= x)
  || (x <= 'F') && ('A' <= x)
  || (x <= 'f') && ('a' <= x)
digit = sat \x -> (x <= '9') && ('0' <= x)
decimal = foldl (\n d -> 10*n + ord d - ord '0') 0 <$> some digit
hexadecimal = foldl (\n d -> 16*n + hexValue d) 0 <$> some hexit
nameTailChar = small <|> large <|> digit <|> char '\''
nameTailed p = liftA2 (:) p $ many nameTailChar

escape = char '\\' *> (sat (`elem` "'\"\\") <|> char 'n' *> pure '\n' <|> char '0' *> pure '\0' <|> char 'x' *> (chr <$> hexadecimal))
tokOne delim = escape <|> rawSat (delim /=)

charSeq = mapM char
tokChar = between (char '\'') (char '\'') (tokOne '\'')
quoteStr = between (char '"') (char '"') $ many $ many (charSeq "\\&") *> tokOne '"'
quasiquoteStr = charSeq "[r|" *> quasiquoteBody
quasiquoteBody = charSeq "|]" *> pure [] <|> (:) <$> rawSat (const True) <*> quasiquoteBody
tokStr = quoteStr <|> quasiquoteStr
integer = char '0' *> (char 'x' <|> char 'X') *> hexadecimal <|> decimal
literal = lexeme . fmap E $ Const <$> integer <|> ChrCon <$> tokChar <|> StrCon <$> tokStr
varish = lexeme $ nameTailed small
bad s = Parser \pasta -> badpos pasta s
badpos pasta s = Left $ loc $ ": " ++ s where
  loc = case readme pasta of
    [] -> ("EOF"++)
    (_, (r, c)):_ -> ("row "++) . shows r . (" col "++) . shows c
varId = do
  s <- varish
  if elem s
    ["export", "case", "class", "data", "default", "deriving", "do", "else", "foreign", "if", "import", "in", "infix", "infixl", "infixr", "instance", "let", "module", "newtype", "of", "then", "type", "where", "_"]
    then bad $ "reserved: " ++ s else pure s
varSymish = lexeme $ (:) <$> sat (\c -> isSymbol c && c /= ':') <*> many (sat isSymbol)
varSym = lexeme $ do
  s <- varSymish
  if elem s ["..", "=", "\\", "|", "<-", "->", "@", "~", "=>"] then bad $ "reserved: " ++ s else pure s

conId = lexeme $ nameTailed large
conSymish = lexeme $ liftA2 (:) (char ':') $ many $ sat isSymbol
conSym = do
  s <- conSymish
  if elem s [":", "::"] then bad $ "reserved: " ++ s else pure s
special c = lexeme $ sat (c ==)
comma = special ','
semicolon = special ';'
lParen = special '('
rParen = special ')'
lBrace = special '{'
rBrace = special '}'
lSquare = special '['
rSquare = special ']'

backquoted = between (char '`' *> whitespace) (char '`' *> whitespace)

lexeme f = f <* whitespace

lexemePrelude = whitespace *>
  Parser \pasta -> case getParser (res "module" <|> (:[]) <$> char '{') pasta of
    Left _ -> Right ((), curly (indentOf pasta) pasta)
    Right _ -> Right ((), pasta)

curlyCheck f = do
  Parser \pasta -> Right ((), pasta { indents = 0:indents pasta })
  r <- f
  Parser \pasta -> let pasta' = pasta { indents = tail $ indents pasta } in case readme pasta of
    []              -> Right ((), curly 0 pasta')
    ('{', _):_      -> Right ((), pasta')
    (_, (_, col)):_ -> Right ((), curly col pasta')
  pure r

conOf (Constr s _) = s
specialCase (h:_) = '|':conOf h
mkCase t cs = insert (specialCase cs)
  ( Qual [] $ arr t $ foldr arr (TV "case") $ map (\(Constr _ sts) -> foldr arr (TV "case") $ snd <$> sts) cs
  , E $ Basic "I")
mkStrs = snd . foldl (\(s, l) u -> ('@':s, s:l)) ("@", [])
scottEncode _ ":" _ = E $ Basic "CONS"
scottEncode vs s ts = foldr L (foldl (\a b -> A a (V b)) (V s) ts) (ts ++ vs)
scottConstr t cs (Constr s sts) = foldr (.) (insertWith (error $ "constructor conflict: " ++ s) s
  (Qual [] $ foldr arr t ts , scottEncode (map conOf cs) s $ mkStrs ts)
  ) [insertWith (error $ "field conflict: " ++ field) field (Qual [] $ t `arr` ft, L s $ foldl A (V s) $ inj $ proj field) | (field, ft) <- sts, field /= ""]
  where
  ts = snd <$> sts
  proj fd = foldr L (V fd) $ fst <$> sts
  inj x = map (\(Constr s' _) -> if s' == s then x else V "undefined") cs
mkAdtDefs t cs = foldr (.) (mkCase t cs) $ scottConstr t cs <$> cs

mkFFIHelper n t acc = case t of
  TC s -> acc
  TAp (TC "IO") _ -> acc
  TAp (TAp (TC "->") x) y -> L (show n) $ mkFFIHelper (n + 1) y $ A (V $ show n) acc

updateDcs cs dcs = foldr (\(Constr s _) m -> insert s cs m) dcs cs
addAdt t cs ders neat = foldr derive neat' ders where
  neat' = neat
    { typedAsts = mkAdtDefs t cs $ typedAsts neat
    , dataCons = updateDcs cs $ dataCons neat
    , type2Cons = insert (typeName t) (concatMap cnames cs) $ type2Cons neat
    }
  typeName = \case
    TAp x _ -> typeName x
    TC c -> c
  cnames (Constr s sts) = s : concatMap (\(s, _) -> if s == "" then [] else [s]) sts
  derive "Eq" = addInstance "Eq" (mkPreds "Eq") t
    [("==", Pa $ map eqCase cs
    )]
  derive "Show" = addInstance "Show" (mkPreds "Show") t
    [("showsPrec", L "prec" $ Pa $ map showCase cs
    )]
  derive der = error $ "bad deriving: " ++ der
  prec0 = A (V "ord") (E $ ChrCon '\0')
  showCase (Constr con args) = let as = show <$> [1..length args]
    in ([PatCon con $ mkPatVar "" <$> as], case args of
      [] -> A (V "++") (E $ StrCon con)
      _ -> case con of
        ':':_ -> A (A (V "showParen") $ V "True") $ foldr1
          (\f g -> A (A (V ".") f) g)
          [ A (A (V "showsPrec") prec0) (V "1")
          , A (V "++") (E $ StrCon $ ' ':con++" ")
          , A (A (V "showsPrec") prec0) (V "2")
          ]
        _ -> A (A (V "showParen") $ A (A (V "<=") prec0) $ V "prec")
          $ A (A (V ".") $ A (V "++") (E $ StrCon con))
          $ foldr (\f g -> A (A (V ".") f) g) (L "x" $ V "x")
          $ map (\a -> A (A (V ".") (A (V ":") (E $ ChrCon ' '))) $ A (A (V "showsPrec") prec0) (V a)) as
      )
  mkPreds classId = Pred classId . TV <$> typeVars t
  mkPatVar pre s = PatVar (pre ++ s) Nothing
  eqCase (Constr con args) = let as = show <$> [1..length args]
    in ([PatCon con $ mkPatVar "l" <$> as], Pa
      [ ([PatCon con $ mkPatVar "r" <$> as], foldr (\x y -> (A (A (V "&&") x) y)) (V "True")
         $ map (\n -> A (A (V "==") (V $ "l" ++ n)) (V $ "r" ++ n)) as)
      , ([PatVar "_" Nothing], V "False")])

addClass classId v (sigs, defs) neat = if not $ member classId $ typeclasses neat then neat
  { typeclasses = insert classId (keys sigs) $ typeclasses neat
  , typedAsts = selectors $ typedAsts neat
  , topDefs = defaults ++ topDefs neat
  } else error $ "duplicate class: " ++ classId
  where
  vars = take (size sigs) $ show <$> [0..]
  selectors = foldr (.) id $ zipWith (\var (s, Qual ps t) -> insertWith (error $ "method conflict: " ++ s) s (Qual (Pred classId v:ps) t,
    L "@" $ A (V "@") $ foldr L (V var) vars)) vars $ toAscList sigs
  defaults = map (\(s, t) -> if member s sigs then ("{default}" ++ s, t) else error $ "bad default method: " ++ s) $ toAscList defs

addInstance classId ps ty ds neat = neat
  { instances = insertWith (++) classId [Instance ty name ps (fromList ds)] $ instances neat
  } where
  name = '{':classId ++ (' ':shows ty "}")

addTopDecl (s, t) neat = neat { topDecls = insert s t $ topDecls neat }

addForeignImport foreignname ourname t neat = neat
  { typedAsts = insertWith (error $ "import conflict: " ++ ourname) ourname (Qual [] t, mkFFIHelper 0 t $ A (E $ Basic "F") $ A (E $ Basic "NUM") $ E $ Link "{foreign}" foreignname $ Qual [] t) $ typedAsts neat
  , ffiImports = insertWith (error $ "duplicate import: " ++ foreignname) foreignname t $ ffiImports neat
  }
addForeignExport e f neat = neat { ffiExports = insertWith (error $ "duplicate export: " ++ e) e f $ ffiExports neat }
addDefs ds neat = neat { topDefs = ds ++ topDefs neat }
addImport im neat = neat { moduleImports = im:moduleImports neat }
addFixities os prec neat = neat { opFixity = foldr (\o tab -> insert o prec tab) (opFixity neat) os }

parseErrorRule = Parser \pasta -> case indents pasta of
  m:ms | m /= 0 -> Right ('}', pasta { indents = ms })
  _ -> badpos pasta "missing }"

res w@(h:_) = reservedSeq *> pure w <|> bad ("want \"" ++ w ++ "\"") where
  reservedSeq = if elem w ["let", "where", "do", "of"]
    then curlyCheck $ lexeme $ charSeq w *> notFollowedBy nameTailChar
    else lexeme $ charSeq w *> notFollowedBy (if isSmall h then nameTailChar else sat isSymbol)

paren = between lParen rParen
braceSep f = between lBrace (rBrace <|> parseErrorRule) $ foldr ($) [] <$> sepBy ((:) <$> f <|> pure id) semicolon

joinIsFail t = A (L "join#" t) (V "fail#")

addLets ls x = L "let" $ foldr L (L "in" bodies) $ fst <$> ls where
  bodies = foldr A x $ joinIsFail . snd <$> ls

qconop = conSym <|> res ":" <|> backquoted conId

qconsym = conSym <|> res ":"

op = qconsym <|> varSym <|> backquoted (conId <|> varId)
con = conId <|> paren qconsym
var = varId <|> paren varSym

tycon = do
  s <- conId
  pure $ if s == "String" then TAp (TC "[]") (TC "Char") else TC s

aType =
  lParen *>
    (   rParen *> pure (TC "()")
    <|> (foldr1 (TAp . TAp (TC ",")) <$> sepBy1 _type comma) <* rParen)
  <|> tycon
  <|> TV <$> varId
  <|> (lSquare *> (rSquare *> pure (TC "[]") <|> TAp (TC "[]") <$> (_type <* rSquare)))
bType = foldl1 TAp <$> some aType
_type = foldr1 arr <$> sepBy bType (res "->")

fixityDecl w a = do
  res w
  n <- lexeme integer
  os <- sepBy op comma
  pure $ addFixities os (n, a)

fixity = fixityDecl "infix" NAssoc <|> fixityDecl "infixl" LAssoc <|> fixityDecl "infixr" RAssoc

cDecls = first fromList . second fromList . foldr ($) ([], []) <$> braceSep cDecl
cDecl = first . (:) <$> genDecl <|> second . (++) <$> defSemi

genDecl = (,) <$> var <* res "::" <*> (Qual <$> (scontext <* res "=>" <|> pure []) <*> _type)

classDecl = res "class" *> (addClass <$> conId <*> (TV <$> varId) <*> (res "where" *> cDecls))

simpleClass = Pred <$> conId <*> _type
scontext = (:[]) <$> simpleClass <|> paren (sepBy simpleClass comma)

instDecl = res "instance" *>
  ((\ps cl ty defs -> addInstance cl ps ty defs) <$>
  (scontext <* res "=>" <|> pure [])
    <*> conId <*> _type <*> (res "where" *> braceDef))

letin = addLets <$> between (res "let") (res "in") braceDef <*> expr
ifthenelse = (\a b c -> A (A (A (V "if") a) b) c) <$>
  (res "if" *> expr) <*> (res "then" *> expr) <*> (res "else" *> expr)
listify = foldr (\h t -> A (A (V ":") h) t) (V "[]")

alts = joinIsFail . Pa <$> braceSep ((\x y -> ([x], y)) <$> pat <*> gdSep "->")
cas = flip A <$> between (res "case") (res "of") expr <*> alts
lamCase = curlyCheck (res "case") *> alts

nalts = joinIsFail . Pa <$> braceSep ((,) <$> many apat <*> gdSep "->")
lamCases = curlyCheck (res "cases") *> nalts

lam = res "\\" *> (lamCase <|> lamCases <|> liftA2 onePat (some apat) (res "->" *> expr))

flipPairize y x = A (A (V ",") x) y
moreCommas = foldr1 (A . A (V ",")) <$> sepBy1 expr comma
thenComma = comma *> ((flipPairize <$> moreCommas) <|> pure (A (V ",")))
parenExpr = (&) <$> expr <*> (((\v a -> A (V v) a) <$> op) <|> thenComma <|> pure id)
rightSect = ((\v a -> L "@" $ A (A (V v) $ V "@") a) <$> (op <|> (:"") <$> comma)) <*> expr
section = lParen *> (parenExpr <* rParen <|> rightSect <* rParen <|> rParen *> pure (V "()"))

maybePureUnit = maybe (V "pure" `A` V "()") id
stmt = (\p x -> Just . A (V ">>=" `A` x) . onePat [p] . maybePureUnit) <$> pat <*> (res "<-" *> expr)
  <|> (\x -> Just . maybe x (\y -> (V ">>=" `A` x) `A` (L "_" y))) <$> expr
  <|> (\ds -> Just . addLets ds . maybePureUnit) <$> (res "let" *> braceDef)
doblock = res "do" *> (maybePureUnit . foldr ($) Nothing <$> braceSep stmt)

compQual =
  (\p xs e -> A (A (V "concatMap") $ onePat [p] e) xs)
    <$> pat <*> (res "<-" *> expr)
  <|> (\b e -> A (A (A (V "if") b) e) $ V "[]") <$> expr
  <|> addLets <$> (res "let" *> braceDef)

sqExpr = between lSquare rSquare $
  ((&) <$> expr <*>
    (   res ".." *>
      (   (\hi lo -> (A (A (V "enumFromTo") lo) hi)) <$> expr
      <|> pure (A (V "enumFrom"))
      )
    <|> res "|" *>
      ((. A (V "pure")) . foldr (.) id <$> sepBy1 compQual comma)
    <|> (\t h -> listify (h:t)) <$> many (comma *> expr)
    )
  )
  <|> pure (V "[]")

fbind = A <$> (V <$> var) <*> (res "=" *> expr)

fBinds v = (do
    fbs <- between lBrace rBrace $ sepBy1 fbind comma
    pure $ A (E $ Basic "{=") $ foldr A (E $ Basic "=}") $ v:fbs
  ) <|> pure v

atom = ifthenelse <|> doblock <|> letin <|> sqExpr <|> section
  <|> cas <|> lam <|> (paren comma *> pure (V ","))
  <|> V <$> (con <|> var) <|> literal
  >>= fBinds

aexp = foldl1 A <$> some atom

chain a = \case
  [] -> a
  A f b:rest -> case rest of
    [] -> A (A f a) b
    _ -> L "(" $ A (A (A f a) b) $ foldr A (V ")") rest
  _ -> error "unreachable"
expr = chain <$> aexp <*> many (A <$> (V <$> op) <*> aexp)

gcon = conId <|> paren (qconsym <|> (:"") <$> comma) <|> lSquare *> rSquare *> pure "[]"

apat = PatVar <$> var <*> (res "@" *> (Just <$> apat) <|> pure Nothing)
  <|> flip PatVar Nothing <$> (res "_" *> pure "_")
  <|> flip PatCon [] <$> gcon
  <|> PatLit <$> literal
  <|> foldr (\h t -> PatCon ":" [h, t]) (PatCon "[]" [])
    <$> between lSquare rSquare (sepBy pat comma)
  <|> paren (foldr1 pairPat <$> sepBy1 pat comma <|> pure (PatCon "()" []))
  where pairPat x y = PatCon "," [x, y]

patChain a = \case
  [] -> a
  PatCon f [b]:rest -> case rest of
    [] -> PatCon f [a, b]
    _ -> PatCon "{+" $ PatCon f [a, b] : rest
  _ -> error "unreachable"
patAtom = PatCon <$> gcon <*> many apat <|> apat
pat = patChain <$> patAtom <*> many (PatCon <$> qconop <*> ((:[]) <$> patAtom))

maybeWhere p = (&) <$> p <*> (res "where" *> (addLets <$> braceDef) <|> pure id)

gdSep s = maybeWhere $ res s *> expr <|> foldr ($) (V "join#") <$> some (between (res "|") (res s) guards <*> expr)
guards = foldr1 (\f g -> \yes no -> f (g yes no) no) <$> sepBy1 guard comma
guard = guardPat <$> pat <*> (res "<-" *> expr) <|> guardExpr <$> expr
  <|> guardLets <$> (res "let" *> braceDef)
guardExpr x yes no = case x of
  V "True" -> yes
  _ -> A (A (A (V "if") x) yes) no
guardPat p x yes no = A (Pa [([p], yes), ([PatVar "_" Nothing], no)]) x
guardLets defs yes no = addLets defs yes

onePat vs x = joinIsFail $ Pa [(vs, x)]
defOnePat vs x = Pa [(vs, x)]
opDef x f y rhs = [(f, defOnePat [x, y] rhs)]
leftyPat p expr = case pvars of
  [] -> []
  (h:t) -> let gen = '@':h in
    (gen, expr):map (\v -> (v, A (Pa [([p], V v)]) $ V gen)) pvars
  where
  pvars = filter (/= "_") $ patVars p
def = liftA2 (\l r -> [(l, r)]) var (liftA2 defOnePat (many apat) $ gdSep "=")
  <|> (pat >>= \x -> opDef x <$> varSym <*> pat <*> gdSep "=" <|> leftyPat x <$> gdSep "=")
coalesce = \case
  [] -> []
  h@(s, x):t -> case t of
    [] -> [h]
    (s', x'):t' -> let
      f (Pa vsts) (Pa vsts') = Pa $ vsts ++ vsts'
      f _ _ = error "bad multidef"
      in if s == s' then coalesce $ (s, f x x'):t' else h:coalesce t
defSemi = coalesce . concat <$> sepBy1 def (some semicolon)
braceDef = concat <$> braceSep defSemi

simpleType c vs = foldl TAp (TC c) (map TV vs)
conop = conSym <|> backquoted conId
fieldDecl = (\vs t -> map (, t) vs) <$> sepBy1 var comma <*> (res "::" *> _type)
constr = (\x c y -> Constr c [("", x), ("", y)]) <$> bType <*> conop <*> bType
  <|> Constr <$> conId <*>
    (   concat <$> between lBrace rBrace (fieldDecl `sepBy` comma)
    <|> map ("",) <$> many aType)
dclass = conId
_deriving = (res "deriving" *> ((:[]) <$> dclass <|> paren (dclass `sepBy` comma))) <|> pure []
adt = addAdt <$> between (res "data") (res "=") (simpleType <$> conId <*> many varId) <*> sepBy constr (res "|") <*> _deriving

impDecl = addImport <$> (res "import" *> conId)

tops = braceSep
  $   adt
  <|> classDecl
  <|> addTopDecl <$> genDecl
  <|> instDecl
  <|> res "foreign" *>
    (   res "import" *> var *> (addForeignImport <$> lexeme tokStr <*> var <*> (res "::" *> _type))
    <|> res "export" *> var *> (addForeignExport <$> lexeme tokStr <*> var)
    )
  <|> addDefs <$> defSemi
  <|> fixity
  <|> impDecl

export_ = ExportVar <$> varId <|> ExportCon <$> conId <*>
  (   paren ((:[]) <$> res ".." <|> sepBy (var <|> con) comma)
  <|> pure []
  )
exports = Just <$> paren (export_ `sepBy` comma)
  <|> pure Nothing

haskell = between lexemePrelude eof $ some $ mayModule tops

mayModule p = res "module" *> ((,) <$> conId <*> ((,) <$> exports <* res "where" <*> p))
  <|> ("Main",) . (Nothing,) <$> p

parseProgram s = fmap fst $ parse haskell s
-- Export lists.
-- Detect missing instances.
-- Top-level type annotations.
module Typer where

import Base
import Map
import Ast
import Parser
import Unify

-- Pattern compiler.
rewritePats searcher = \case
  [] -> pure $ V "join#"
  vsxs@((as0, _):_) -> case as0 of
    [] -> pure $ foldr1 (A . L "join#") $ snd <$> vsxs
    _ -> do
      let k = length as0
      n <- get
      put $ n + k
      let vs = take k $ (`shows` "#") <$> [n..]
      cs <- flip mapM vsxs \(a:at, x) -> (a,) <$> foldM (\b (p, v) -> rewriteCase searcher v Tip [(p, b)]) x (zip at $ tail vs)
      flip (foldr L) vs <$> rewriteCase searcher (head vs) Tip cs

patEq lit b x y = A (L "join#" $ A (A (A (V "if") (A (A (V "==") lit) b)) x) $ V "join#") y

rewriteCase searcher caseVar tab = \case
  [] -> flush $ V "join#"
  ((v, x):rest) -> go v x rest
  where
  rec = rewriteCase searcher caseVar
  go v x rest = case v of
    PatLit lit -> flush =<< patEq lit (V caseVar) x <$> rec Tip rest
    PatVar s m -> let x' = fill s (V caseVar) x in case m of
      Nothing -> flush =<< A (L "join#" x') <$> rec Tip rest
      Just v' -> go v' x' rest
    PatCon con args -> rec (insertWith (flip (.)) con ((args, x):) tab) rest
  flush onFail = case toAscList tab of
    [] -> pure onFail
    -- TODO: Check rest of `tab` lies in cs.
    (firstC, _):_ -> do
      let cs = either error id $ findCon searcher firstC
      jumpTable <- mapM (\(Constr s ts) -> case mlookup s tab of
          Nothing -> pure $ foldr L (V "join#") $ const "_" <$> ts
          Just f -> rewritePats searcher $ f []
        ) cs
      pure $ A (L "join#" $ foldl A (A (V $ specialCase cs) $ V caseVar) jumpTable) onFail

resolveFieldBinds searcher t = go t where
  go t = case t of
    E _ -> t
    V _ -> t
    A (E (Basic "{=")) (A rawExpr fbsAst) -> let
      expr = go rawExpr
      fromAst t = case t of
        A (A (V f) body) rest -> (f, go body):fromAst rest
        E (Basic "=}") -> []
      fbs@((firstField, _):_) = fromAst fbsAst
      (con, fields) = findField searcher firstField
      cs = either error id $ findCon searcher con
      newValue = foldl A (V con) [maybe (V $ "[old]"++f) id $ lookup f fbs | (f, _) <- fields]
      initValue = foldl A expr [maybe (V "undefined") id $ lookup f fbs | (f, _) <- fields]
      updater = foldr L newValue $ ("[old]"++) . fst <$> fields
      inj x = map (\(Constr con' _) -> if con' == con then x else V "undefined") cs
      allPresent = all (`elem` (fst <$> fields)) $ fst <$> fbs
      isCon = case expr of
        V (h:_) -> 'A' <= h && h <= 'Z'
        _ -> False
      in if allPresent
        then if isCon then initValue else foldl A (A (V $ specialCase cs) expr) $ inj updater
        else error "bad fields in update"
    A x y -> A (go x) (go y)
    L s x -> L s $ go x

fixFixity findPrec t = case t of
  E _ -> pure t
  V _ -> pure t
  A x y -> A <$> go x <*> go y
  L s b
    | s == "(" -> infixer findPrec =<< go b
    | True -> L s <$> go b
  Pa vsxs -> Pa <$> mapM (\(ps, a) -> (,) <$> mapM pgo ps <*> go a) vsxs
  where
  go = fixFixity findPrec
  pgo = patFixFixity findPrec

data OpTree = OpLeaf Ast | OpNode String Ast OpTree

infixer findPrec (A (A (A (V s) x) y) t) = go (OpNode s x (OpLeaf y)) t
  where
  go acc = \case
    A (A (V s) z) rest -> do
      acc' <- ins s z acc
      go acc' rest
    V ")" -> pure $ decode acc
    _ -> error "unreachable"
  ins s z t = case t of
    OpNode s' x y -> isStronger findPrec s s' >>= \case
      True -> OpNode s' x <$> ins s z y
      False -> pure $ OpNode s (decode t) (OpLeaf z)
    OpLeaf x -> pure $ OpNode s x (OpLeaf z)
  decode = \case
    OpNode f x y -> A (A (V f) x) (decode y)
    OpLeaf x -> x

isStronger findPrec s s' = if prec <= prec'
  then if prec == prec'
    then if assoc == assoc'
      then case assoc of
        LAssoc -> pure False
        RAssoc -> pure True
        NAssoc -> Left $ "adjacent NAssoc: " ++ s ++ " vs " ++ s'
      else Left $ "assoc mismatch: " ++ s ++ " vs " ++ s'
    else pure False
  else pure True
  where
  (prec, assoc) = findPrec s
  (prec', assoc') = findPrec s'

patFixFixity findPrec p = case p of
  PatLit _ -> pure p
  PatVar s m -> PatVar s <$> maybe (pure Nothing) (fmap Just . go) m
  PatCon "{+" args -> patFixer findPrec args
  PatCon con args -> PatCon con <$> mapM go args
  where
  go = patFixFixity findPrec

data PopTree = PopLeaf Pat | PopNode String Pat PopTree

patFixer findPrec (PatCon f [a, b]:rest) = go seed rest where
  seed = PopNode f a (PopLeaf b)
  go acc = \case
    [] -> pure $ decode acc
    PatCon s [z]:rest -> do
      acc' <- ins s z acc
      go acc' rest
  ins s z t = case t of
    PopNode s' x y -> isStronger findPrec s s' >>= \case
      True -> PopNode s' x <$> ins s z y
      False -> pure $ PopNode s (decode t) (PopLeaf z)
    PopLeaf x -> pure $ PopNode s x (PopLeaf z)
  decode = \case
    PopNode f x y -> PatCon f [x, decode y]
    PopLeaf x -> x

secondM f (a, b) = (a,) <$> f b
patternCompile searcher t = astLink searcher $ resolveFieldBinds searcher $ evalState (go $ either error id $ fixFixity (findPrec searcher) t) 0 where
  go t = case t of
    E _ -> pure t
    V _ -> pure t
    A x y -> liftA2 A (go x) (go y)
    L s x -> L s <$> go x
    Pa vsxs -> mapM (secondM go) vsxs >>= rewritePats searcher

-- Type inference.
instantiate' t n tab = case t of
  TC s -> ((t, n), tab)
  TV s -> case lookup s tab of
    Nothing -> let va = TV $ show n in ((va, n + 1), (s, va):tab)
    Just v -> ((v, n), tab)
  TAp x y -> let
    ((t1, n1), tab1) = instantiate' x n tab
    ((t2, n2), tab2) = instantiate' y n1 tab1
    in ((TAp t1 t2, n2), tab2)

instantiatePred (Pred s t) ((out, n), tab) = first (first ((:out) . Pred s)) (instantiate' t n tab)

instantiate (Qual ps t) n = first (Qual ps1) $ fst $ instantiate' t n1 tab where
  ((ps1, n1), tab) = foldr instantiatePred (([], n), []) ps

proofApply sub a = case a of
  Proof (Pred cl ty) -> Proof (Pred cl $ apply sub ty)
  A x y -> A (proofApply sub x) (proofApply sub y)
  L s t -> L s $ proofApply sub t
  _ -> a

typeAstSub sub (t, a) = (apply sub t, proofApply sub a)

maybeFix s x = if go x then A (E $ Link "#" "fix" $ Qual [] $ arr (arr (TV "a") (TV "a")) (TV "a")) (L s x) else x where
  go = \case
    V v -> s == v
    A x y -> go x || go y
    L v x -> s /= v && go x
    _ -> False

nonemptyTails [] = []
nonemptyTails xs@(x:xt) = xs : nonemptyTails xt

fv f bound = \case
  V s | not (elem s bound) && f s -> [s]
  A x y -> fv f bound x `union` fv f bound y
  L s t -> fv f (s:bound) t
  _ -> []

fill s a t = case t of
  E _ -> t
  V v -> if s == v then a else t
  A x y -> A (fill s a x) (fill s a y)
  L v u -> if s == v then t else L v $ fill s a u

simulFill tab = go [] where
  go bnd t = case t of
    V s | not $ elem s bnd -> maybe t id $ lookup s tab
    A x y -> A (go bnd x) (go bnd y)
    L s t' -> L s $ go (s:bnd) t'
    _ -> t

triangulate vs defs x = foldr triangle x components where
  tab = zip vs defs
  ios = foldr (\(s, t) (ins, outs) -> let dsts = fv (`elem` vs) [] t in
    (foldr (\dst -> insertWith union dst [s]) ins dsts, insertWith union s dsts outs))
    (Tip, Tip) tab
  components = scc (\k -> maybe [] id $ mlookup k $ fst ios) (\k -> maybe [] id $ mlookup k $ snd ios) vs
  triangle names expr = let
    tnames = nonemptyTails names
    appem vs = foldl1 A $ V <$> vs
    suball x = simulFill (zip (init names) $ appem <$> init tnames) x
    redef tns x = foldr L (suball x) tns
    in foldr (\(x:xt) t -> A (L x t) $ maybeFix x $ redef xt $ maybe (error $ "oops: " ++ x) id $ lookup x tab) (suball expr) tnames

decodeLets x = decodeVars id x where
  decodeVars f = \case
    L "in" t -> decodeBodies id vs t
    L v t -> decodeVars (f . (v:)) t
    where
    vs = f []
    decodeBodies g [] x = ((vs, g []), x)
    decodeBodies g (_:t) (A x y) = decodeBodies (g . (x:)) t y

infer msg typed loc ast csn@(cs, n) = case ast of
  E x -> Right $ case x of
    Basic bug -> error bug
    Const n -> ((TC "Int", ast), csn)
    ChrCon _ -> ((TC "Char", ast), csn)
    StrCon _ -> ((TAp (TC "[]") (TC "Char"), ast), csn)
    Link im s q -> insta q
  V s -> maybe (Left $ "undefined: " ++ s) Right
    $ either (\t -> ((t, ast), csn)) insta <$> lookup s loc
    <|> insta . fst <$> mlookup s typed
  A x y -> rec loc x (cs, n + 1) >>=
    \((tx, ax), csn1) -> rec loc y csn1 >>=
    \((ty, ay), (cs2, n2)) -> unifyMsg msg tx (arr ty va) cs2 >>=
    \cs -> Right ((va, A ax ay), (cs, n2))
  L "let" lets -> do
    let ((vars, defs), x) = decodeLets lets
    rec loc (triangulate vars defs x) csn
  L s x -> first (\(t, a) -> (arr va t, L s a)) <$> rec ((s, Left va):loc) x (cs, n + 1)
  where
  rec = infer msg typed
  va = TV $ show n
  insta ty = ((ty1, foldl A ast (map Proof preds)), (cs, n1))
    where (Qual preds ty1, n1) = instantiate ty n

findInstance searcher qn@(q, n) p@(Pred cl ty) insts = case insts of
  []  -> case ty of
    TV _ -> let v = '*':show n in Right (((p, v):q, n + 1), V v)
    _ -> Left $ "no instance: " ++ show p
  (modName, Instance h name ps _):rest -> case match h ty of
    Nothing -> findInstance searcher qn p rest
    Just subs -> foldM (\(qn1, t) (Pred cl1 ty1) -> second (A t)
      <$> findProof searcher (Pred cl1 $ apply subs ty1) qn1) (qn, if modName == "" then V name else E $ Link modName name undefined) ps

findProof searcher pred@(Pred classId t) psn@(ps, n) = case lookup pred ps of
  Nothing -> findInstance searcher psn pred $ findInstances searcher classId
  Just s -> Right (psn, V s)

prove searcher psn a = case a of
  Proof pred -> findProof searcher pred psn
  A x y -> prove searcher psn x >>= \(psn1, x1) ->
    second (A x1) <$> prove searcher psn1 y
  L s t -> second (L s) <$> prove searcher psn t
  _ -> Right (psn, a)

data Dep a = Dep ([String] -> Either String ([String], a))
instance Functor Dep where
  fmap f = \(Dep mf) -> Dep \g -> do
    (g', x) <- mf g
    pure (g', f x)
instance Applicative Dep where
  pure x = Dep \g -> Right (g, x)
  (Dep mf) <*> (Dep mx) = Dep \g -> do
    (g', f) <- mf g
    (g'', x) <- mx g'
    pure (g'', f x)
addDep s = Dep \deps -> Right (if s `elem` deps then deps else s : deps, ())
badDep s = Dep $ const $ Left s
runDep (Dep f) = f []

unifyMsg s a b c = either (Left . (s++) . (": "++)) Right $ unify a b c

app01 s x y = maybe (A (L s x) y) snd $ go x where
  go expr = case expr of
    V v -> Just $ if s == v then (True, y) else (False, expr)
    A l r -> do
      (a, l') <- go l
      (b, r') <- go r
      if a && b then Nothing else pure (a || b, A l' r')
    L v t -> if v == s then Just (False, expr) else second (L v) <$> go t
    _ -> Just (False, expr)

optiApp t = case t of
  A x y -> let
    x' = optiApp x
    y' = optiApp y
    in case x' of
      L s v -> app01 s v y'
      _ -> A x' y'
  L s x -> L s (optiApp x)
  _ -> t

inferno searcher decls typed defmap syms = let
  anno s = maybe (Left $ TV $ ' ':s) Right $ mlookup s decls
  loc = zip syms $ anno <$> syms
  principal ((acc, preds), (subs, n)) s = do
    expr <- maybe (Left $ "missing: " ++ s) Right (mlookup s defmap)
    ((t, a), (ms, n)) <- infer s typed loc expr (subs, n)
    case mlookup s decls of
      Nothing -> do
        soln <- unifyMsg s (TV (' ':s)) t ms
        Right (((s, (t, a)):acc, preds), (soln, n))
      Just qAnno -> do
        let (Qual pAnno tAnno, n1) = instantiate qAnno n
        soln <- maybe (Left $ s ++ ": match failed: " ++ show qAnno ++ " vs " ++ show (apply ms t)) Right $ match (apply ms t) tAnno
        Right (((s, (t, a)):acc, pAnno ++ preds), (soln @@ ms, n1))
  gatherPreds (acc, psn) (s, (t, a)) = do
    (psn, a) <- prove searcher psn $ optiApp a
    pure ((s, (t, a)):acc, psn)
  in do
    ((stas, preds), (soln, _)) <- foldM principal (([], []), (Tip, 0)) syms
    let ps = zip preds $ ("anno*"++) . show  <$> [0..]
    (stas, (ps, _)) <- foldM gatherPreds ([], (ps, 0)) $ second (typeAstSub soln) <$> stas
    let
      preds = fst <$> ps
      dicts = snd <$> ps
      tab = map (\s -> (s, foldl A (V s) $ V <$> dicts)) syms
      applyDicts (s, (t, a)) = (s, (Qual preds t, foldr L (simulFill tab a) dicts))
    pure $ map applyDicts stas

inferDefs searcher defs decls typed = do
  let
    insertUnique m (s, (_, t)) = case mlookup s m of
      Nothing -> Right $ insert s t m
      _ -> Left $ "duplicate: " ++ s
    addEdges (sym, (deps, _)) (ins, outs) = (foldr (\dep es -> insertWith union dep [sym] es) ins deps, insertWith union sym deps outs)
    graph = foldr addEdges (Tip, Tip) defs
  defmap <- foldM insertUnique Tip defs
  let
    ins k = maybe [] id $ mlookup k $ fst graph
    outs k = maybe [] id $ mlookup k $ snd graph
    inferComponent typed syms = foldr (uncurry insert) typed <$> inferno searcher decls typed defmap syms
  foldM inferComponent typed $ scc ins outs $ keys defmap

dictVars ps n = (zip ps $ map (('*':) . show) [n..], n + length ps)

inferTypeclasses searcher iMap typed = foldM inferInstance typed [(classId, inst) | (classId, insts) <- toAscList iMap, inst <- insts] where
  inferInstance typed (classId, Instance ty name ps idefs) = let
    dvs = map snd $ fst $ dictVars ps 0
    perMethod s = do
      let rawExpr = maybe (V $ "{default}" ++ s) id $ mlookup s idefs
      expr <- snd <$> patternCompile searcher rawExpr
      (ta, (sub, n)) <- either (Left . (name++) . (" "++) . (s++) . (": "++)) Right
        $ infer s typed [] expr (Tip, 0)
      qc <- typeOfMethod searcher s
      let
        (tx, ax) = typeAstSub sub ta
-- e.g. qc = Eq a => a -> a -> Bool
-- We instantiate: Eq a1 => a1 -> a1 -> Bool.
        (Qual [Pred _ headT] tc, n1) = instantiate qc n
-- Mix the predicates `ps` with the type of `headT`, applying a
-- substitution such as (a1, [a]) so the variable names match.
-- e.g. Eq a => [a] -> [a] -> Bool
        Just subc = match headT ty
        (Qual ps2 t2, n2) = instantiate (Qual ps $ apply subc tc) n1
      case match tx t2 of
        Nothing -> Left "class/instance type conflict"
        Just subx -> do
          ((ps3, _), tr) <- prove searcher (dictVars ps2 0) $ optiApp $ proofApply subx ax
          if length ps2 /= length ps3
            then Left $ ("want context: "++) . (foldr (.) id $ shows . fst <$> ps3) $ name
            else pure tr
    in do
      ms <- mapM perMethod $ findSigs searcher classId
      pure $ insert name (Qual [] $ TC "DICTIONARY", flip (foldr L) dvs $ L "@" $ foldl A (V "@") ms) typed

primAdts =
  [ (TC "()", [Constr "()" []])
  , (TC "Bool", [Constr "True" [], Constr "False" []])
  , (TAp (TC "[]") (TV "a"), [Constr "[]" [], Constr ":" $ map ("",) [TV "a", TAp (TC "[]") (TV "a")]])
  , (TAp (TAp (TC ",") (TV "a")) (TV "b"), [Constr "," $ map ("",) [TV "a", TV "b"]])
  ]

prims = let
  ro = E . Basic
  dyad s = TC s `arr` (TC s `arr` TC s)
  wordy = foldr arr (TAp (TAp (TC ",") (TC "Word")) (TC "Word")) [TC "Word", TC "Word", TC "Word", TC "Word"]
  bin s = A (ro "Q") (ro s)
  in map (second (first $ Qual [])) $
    [ ("doubleFromInt", (arr (TC "Int") (TC "Double"), A (ro "T") (ro "FLO")))
    , ("intFromDouble", (arr (TC "Double") (TC "Int"), A (ro "T") (ro "OLF")))
    , ("doubleFromWord", (arr (TC "Word") (TC "Double"), A (ro "T") (ro "FLW")))
    , ("doubleAdd", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FADD")))
    , ("doubleSub", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FSUB")))
    , ("doubleMul", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FMUL")))
    , ("doubleDiv", (arr (TC "Double") (arr (TC "Double") (TC "Double")), A (ro "Q") (ro "FDIV")))
    , ("doubleEq", (arr (TC "Double") (arr (TC "Double") (TC "Bool")), bin "FEQ"))
    , ("doubleLE", (arr (TC "Double") (arr (TC "Double") (TC "Bool")), bin "FLE"))
    , ("rawDouble", (arr (TC "Double") $ arr (arr (TC "Word") $ arr (TC "Word") $ TV "a") $ TV "a", A (ro "T") (ro "PAIR64")))
    , ("intEq", (arr (TC "Int") (arr (TC "Int") (TC "Bool")), bin "EQ"))
    , ("intLE", (arr (TC "Int") (arr (TC "Int") (TC "Bool")), bin "LE"))
    , ("wordLE", (arr (TC "Word") (arr (TC "Word") (TC "Bool")), bin "U_LE"))
    , ("wordEq", (arr (TC "Word") (arr (TC "Word") (TC "Bool")), bin "EQ"))

    , ("charEq", (arr (TC "Char") (arr (TC "Char") (TC "Bool")), bin "EQ"))
    , ("charLE", (arr (TC "Char") (arr (TC "Char") (TC "Bool")), bin "LE"))
    , ("fix", (arr (arr (TV "a") (TV "a")) (TV "a"), ro "Y"))
    , ("if", (arr (TC "Bool") $ arr (TV "a") $ arr (TV "a") (TV "a"), ro "I"))
    , ("intFromWord", (arr (TC "Word") (TC "Int"), ro "I"))
    , ("wordFromInt", (arr (TC "Int") (TC "Word"), ro "I"))
    , ("chr", (arr (TC "Int") (TC "Char"), ro "I"))
    , ("ord", (arr (TC "Char") (TC "Int"), ro "I"))
    , ("ioBind", (arr (TAp (TC "IO") (TV "a")) (arr (arr (TV "a") (TAp (TC "IO") (TV "b"))) (TAp (TC "IO") (TV "b"))), ro "C"))
    , ("ioPure", (arr (TV "a") (TAp (TC "IO") (TV "a")), ro "V"))
    , ("primitiveError", (arr (TAp (TC "[]") (TC "Char")) (TV "a"), ro "ERR"))
    , ("newIORef", (arr (TV "a") (TAp (TC "IO") (TAp (TC "IORef") (TV "a"))), ro "NEWREF"))
    , ("readIORef", (arr (TAp (TC "IORef") (TV "a")) (TAp (TC "IO") (TV "a")),
      A (ro "T") (ro "READREF")))
    , ("writeIORef", (arr (TAp (TC "IORef") (TV "a")) (arr (TV "a") (TAp (TC "IO") (TC "()"))),
      A (A (ro "R") (ro "WRITEREF")) (ro "B")))
    , ("exitSuccess", (TAp (TC "IO") (TV "a"), ro "END"))
    , ("unsafePerformIO", (arr (TAp (TC "IO") (TV "a")) (TV "a"), A (A (ro "C") (A (ro "T") (ro "END"))) (ro "K")))
    , ("fail#", (TV "a", A (V "unsafePerformIO") (V "exitSuccess")))
    , ("join#", (TV "a", A (V "unsafePerformIO") (V "exitSuccess")))
    , ("word64Add", (wordy, A (ro "QQ") (ro "DADD")))
    , ("word64Sub", (wordy, A (ro "QQ") (ro "DSUB")))
    , ("word64Mul", (wordy, A (ro "QQ") (ro "DMUL")))
    , ("word64Div", (wordy, A (ro "QQ") (ro "DDIV")))
    , ("word64Mod", (wordy, A (ro "QQ") (ro "DMOD")))
    ]
    ++ map (\(s, v) -> (s, (dyad "Int", bin v)))
      [ ("intAdd", "ADD")
      , ("intSub", "SUB")
      , ("intMul", "MUL")
      , ("intDiv", "DIV")
      , ("intMod", "MOD")
      , ("intQuot", "QUOT")
      , ("intRem", "REM")
      , ("intXor", "XOR")
      , ("intAnd", "AND")
      , ("intOr", "OR")
      ]
    ++ map (\(s, v) -> (s, (dyad "Word", bin v)))
      [ ("wordAdd", "ADD")
      , ("wordSub", "SUB")
      , ("wordMul", "MUL")
      , ("wordDiv", "U_DIV")
      , ("wordMod", "U_MOD")
      , ("wordQuot", "U_DIV")
      , ("wordRem", "U_MOD")
      ]

tabulateModules mods = foldM ins Tip mods where
  ins tab (k, (mexs, prog)) = case mlookup k tab of
    Nothing -> do
      v <- maybe pure processExports mexs (foldr ($) neatEmpty{moduleImports = ["#"]} prog)
      pure $ insert k v tab
    Just _ -> Left $ "duplicate module: " ++ k
  processExports exs neat = do
    mes <- Just . concat <$> mapM (processExport neat) exs
    pure neat { moduleExports = mes }
  processExport neat = \case
    ExportVar v -> case lookup v $ topDefs neat of
      Nothing -> Left $ "bad export " ++ v
      Just _ -> Right [v]
    ExportCon c ns -> case mlookup c $ type2Cons neat of
      Just cnames
        | ns == [".."] -> Right cnames
        | null delta -> Right ns
        | True -> Left $ "bad exports: " ++ show delta
        where delta = [n | n <- ns, not $ elem n cnames]
      Nothing -> case mlookup c $ typeclasses neat of
        Nothing -> Left $ "bad export " ++ c
        Just methodNames
          | ns == [".."] -> Right methodNames
          | null delta -> Right ns
          | True -> Left $ "bad exports: " ++ show delta
          where delta = [n | n <- ns, not $ elem n methodNames]

data Searcher = Searcher
  { astLink :: Ast -> Either String ([String], Ast)
  , findPrec :: String -> (Int, Assoc)
  , findCon :: String -> Either String [Constr]
  , findField :: String -> (String, [(String, Type)])
  , typeOfMethod :: String -> Either String Qual
  , findSigs :: String -> [String]
  , findInstances :: String -> [(String, Instance)]
  }

isExportOf s neat = case moduleExports neat of
  Nothing -> True
  Just es -> elem s es

findAmong fun viz s = case concat $ maybe [] (:[]) . mlookup s . fun <$> viz s of
  [] -> Left $ "missing: " ++ s
  [unique] -> Right unique
  _ -> Left $ "ambiguous: " ++ s

slowUnionWith f x y = foldr go x $ toAscList y where go (k, v) m = insertWith f k v m

searcherNew tab neat = Searcher
  { astLink = astLink'
  , findPrec = \s -> if s == ":" then (5, RAssoc) else either (const defPrec) id $ findAmong opFixity visible s
  , findCon = findAmong dataCons visible
  , findField = findField'
  , typeOfMethod = fmap fst . findAmong typedAsts visible
  , findSigs = \s -> case mlookup s mergedSigs of
    Nothing -> error $ "missing class: " ++ s
    Just [sigs] -> sigs
    _ -> error $ "ambiguous class: " ++ s
  , findInstances = maybe [] id . (`mlookup` mergedInstances)
  }
  where
  mergedSigs = foldr (slowUnionWith (++)) Tip $ map (fmap (:[]) . typeclasses) $ neat : map (tab !) imps
  mergedInstances = foldr (slowUnionWith (++)) Tip [fmap (map (im,)) $ instances x | (im, x) <- ("", neat) : map (\im -> (im, tab ! im)) imps]
  defPrec = (9, LAssoc)
  findImportSym s = concat [maybe [] (\(t, _) -> [(im, t)]) $ mlookup s $ typedAsts n | (im, n) <- importedNeats s]
  importedNeats s@(h:_) = [(im, n) | im <- imps, let n = tab ! im, h == '{' || isExportOf s n]
  visible s = neat : (snd <$> importedNeats s)
  classes im = typeclasses $ if im == "" then neat else tab ! im
  findField' f = case [(con, fields) | dc <- dataCons <$> visible f, (_, cons) <- toAscList dc, Constr con fields <- cons, (f', _) <- fields, f == f'] of
    [] -> error $ "no such field: " ++ f
    h:_ -> h
  imps = moduleImports neat
  defs = fromList $ topDefs neat
  astLink' ast = runDep $ go [] ast where
    go bound ast = case ast of
      V s
        | elem s bound -> pure ast
        | member s defs -> unlessAmbiguous s $ addDep s *> pure ast
        | member s $ typedAsts neat -> unlessAmbiguous s $ pure ast
        | True -> case findImportSym s of
          [] -> badDep $ "missing: " ++ s
          [(im, t)] -> pure $ E $ Link im s t
          _ -> badDep $ "ambiguous: " ++ s
      A x y -> A <$> go bound x <*> go bound y
      L s t -> L s <$> go (s:bound) t
      _ -> pure ast
    unlessAmbiguous s f = case findImportSym s of
      [] -> f
      _ -> badDep $ "ambiguous: " ++ s

inferModule tab acc name = case mlookup name acc of
  Nothing -> do
    let
      neat = tab ! name
      imps = moduleImports neat
      typed = typedAsts neat
    acc' <- foldM (inferModule tab) acc imps
    let searcher = searcherNew acc' neat
    depdefs <- mapM (\(s, t) -> (s,) <$> patternCompile searcher t) $ topDefs neat
    typed <- inferDefs searcher depdefs (topDecls neat) typed
    typed <- inferTypeclasses searcher (instances neat) typed
    Right $ insert name neat { typedAsts = typed } acc'
  Just _ -> Right acc

untangle s = do
  tab <- insert "#" neatPrim <$> (parseProgram s >>= tabulateModules)
  foldM (inferModule tab) Tip $ keys tab

neatPrim = foldr (\(a, b) -> addAdt a b []) neatEmpty { typedAsts = fromList prims } primAdts
-- Export lists.
module Main where
import Base
import Map
import Ast
import RTS
import Typer
import Kiselyov
import System

hide_prelude_here' = hide_prelude_here'

dumpWith dumper s = case untangle s of
  Left err -> err
  Right tab -> foldr ($) [] $ map (\(name, neat) -> ("module "++) . (name++) . ('\n':) . (foldr (.) id $ dumper neat)) $ toAscList tab

dumpLambs neat = map (\(s, t) -> (s++) . (" = "++) . shows t . ('\n':)) $ second snd <$> toAscList (typedAsts neat)

dumpTypes neat = map (\(s, q) -> (s++) . (" :: "++) . shows q . ('\n':)) $ second fst <$> toAscList (typedAsts neat)

dumpRawCombs neat = map go combs where
  rawCombs = optim . nolam . snd <$> typedAsts neat
  combs = toAscList $ rawCombs
  go (s, t) = (s++) . (" = "++) . shows t . (";\n"++)

dumpCombs neat = map go combs where
  rawCombs = optim . nolam . snd <$> typedAsts neat
  combs = toAscList $ rewriteCombs rawCombs <$> rawCombs
  go (s, t) = (s++) . (" = "++) . shows t . (";\n"++)

main = getArgs >>= \case
  "comb":_ -> interact $ dumpWith dumpCombs
  "rawcomb":_ -> interact $ dumpWith dumpRawCombs
  "lamb":_ -> interact $ dumpWith dumpLambs
  "type":_ -> interact $ dumpWith dumpTypes
  "wasm":opts -> interact \s -> either id id $ untangle s >>= compileWith "1<<22" libcWasm ("no-main":opts)
  "warts":opts -> interact $ either id (warts opts) . untangle
  _ -> interact \s -> either id id $ untangle s >>= compile

Precisely

Before proceeding, we crudely implement arbitrary-precision integers to enable some cool demos. It also exercises our code that handles typeclasses.

Unlike standard Haskell, we add a Ring typeclass rather than Num. The (+) and (*) operators should be reserved for rings, and there are uses for rings that are not also instances of the Num typeclass. For example, Gaussian integers are great for indexing a 2D rectangular board, especially when we want to rotate by a right angle (multiplication by i) or talk about the cardinal directions (which correspond the units).

The integers are the initial ring, so we treat the integer constant n as fromInteger n; if this results in ambiguity, then we drop fromInteger.

Thus the laws that we know to be true in our bones, such as a*(b + c) = a*b
a*c
, will never lead us astray. We must explicitly write fromIntegral to, say, map a Word32 to a Word64. Other languages convert silently, and wind up defying our algebraic intuition.

To represent an integer, we use a list of Word32 numbers, plus a boolean to represent its sign. For GHC compatibility we call the function integerSignList instead of pattern matching on Integer values.

We implement schoolbook algorithms for basic arithmetic, which is straightforward except for division. I realized that when doing long division by hand, to find the next digit of the divisor, I pick something that seems reasonable via a method that seems partly subconscious! How can we possibly code this?

Luckily, there is a simple algorithm that makes good guesses. See Knuth, The Art of Computer Programming.

We rename div and mod to quot and rem, then introduce wrappers for div and mod. Now our divisions behave correctly, though it is sad that div and mod need more instructions. (FORTRAN set an unfortunate precedent of truncating division to zero, ultimately forcing languages like C and WebAssembly and even hardware to conform.)

Our treatment of integer literals causes a bootstrapping issue. Suppose a literal "0" is to be converted to an Int. Then our compiler applies the Int edition of fromInteger to the Integer 0, which involves a call to mpView, whose implementation needs the Int 0. If we simply code this as 0, then we wind up with a circular definition, because our compiler would insert another fromInteger call. We work around this with a definition that bypasses overloading by returning ord '\0'.

-- Assumes overloaded integer literals, lexical negation.
-- Bits.
module Base where

infixr 9 .
infixr 8 ^
infixl 7 * , /, `div` , `mod` , `quot`, `rem`
infixr 6 <>
infixl 6 + , -
infixr 5 ++
infixl 4 <*> , <$> , <* , *>
infix 4 == , /= , <= , < , >= , >
infixl 3 && , <|>
infixl 2 ||
infixl 1 >> , >>=
infixr 1 =<<
infixr 0 $

class Semigroup a where
  (<>) :: a -> a -> a
class Monoid a where
  mempty :: a
  mconcat :: [a] -> a
  mconcat = foldr (<>) mempty
instance Monoid [a] where
  mempty = []
instance Semigroup [a] where
  (<>) = (++)
class Functor f where fmap :: (a -> b) -> f a -> f b
class Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
class Monad m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
(<$>) = fmap
liftA2 f x y = f <$> x <*> y
(>>) f g = f >>= \_ -> g
(=<<) = flip (>>=)
class Eq a where (==) :: a -> a -> Bool
instance Eq () where () == () = True
instance Eq Bool where
  True == True = True
  False == False = True
  _ == _ = False
instance (Eq a, Eq b) => Eq (a, b) where
  (a1, b1) == (a2, b2) = a1 == a2 && b1 == b2
instance Eq a => Eq [a] where
  xs == ys = case xs of
    [] -> case ys of
      [] -> True
      _ -> False
    x:xt -> case ys of
      [] -> False
      y:yt -> x == y && xt == yt
instance Eq Int where (==) = intEq
instance Eq Char where (==) = charEq
($) f x = f x
id x = x
const x y = x
flip f x y = f y x
(&) x f = f x
class Ord a where
  (<=) :: a -> a -> Bool
  x <= y = case compare x y of
    LT -> True
    EQ -> True
    GT -> False
  compare :: a -> a -> Ordering
  compare x y = if x <= y then if y <= x then EQ else LT else GT
instance Ord Int where (<=) = intLE
instance Ord Char where (<=) = charLE
data Ordering = LT | GT | EQ deriving (Eq, Show)
instance Ord a => Ord [a] where
  xs <= ys = case xs of
    [] -> True
    x:xt -> case ys of
      [] -> False
      y:yt -> if x <= y then if y <= x then xt <= yt else True else False
  compare xs ys = case xs of
    [] -> case ys of
      [] -> EQ
      _ -> LT
    x:xt -> case ys of
      [] -> GT
      y:yt -> if x <= y then if y <= x then compare xt yt else LT else GT
data Maybe a = Nothing | Just a deriving (Eq, Show)
data Either a b = Left a | Right b deriving (Eq, Show)
fst (x, y) = x
snd (x, y) = y
uncurry f (x, y) = f x y
first f (x, y) = (f x, y)
second f (x, y) = (x, f y)
bool a b c = if c then b else a
not a = if a then False else True
x /= y = not $ x == y
(.) f g x = f (g x)
(||) f g = if f then True else g
(&&) f g = if f then g else False
take :: Int -> [a] -> [a]
take 0 xs = []
take _ [] = []
take n (h:t) = h : take (n - 1) t
drop :: Int -> [a] -> [a]
drop n xs     | n <= 0 = xs
drop _ []              = []
drop n (_:xs)          = drop (n-1) xs
splitAt n xs = (take n xs, drop n xs)
maybe n j m = case m of Nothing -> n; Just x -> j x
instance Functor Maybe where fmap f = maybe Nothing (Just . f)
instance Applicative Maybe where pure = Just ; mf <*> mx = maybe Nothing (\f -> maybe Nothing (Just . f) mx) mf
instance Monad Maybe where return = Just ; mf >>= mg = maybe Nothing mg mf
instance Alternative Maybe where empty = Nothing ; x <|> y = maybe y Just x
foldr c n = \case [] -> n; h:t -> c h $ foldr c n t
length :: [a] -> Int
length = foldr (\_ n -> n + 1) 0
mapM f = foldr (\a rest -> liftA2 (:) (f a) rest) (pure [])
mapM_ f = foldr ((>>) . f) (pure ())
forM = flip mapM
sequence = mapM id
replicateM = (sequence .) . replicate
foldM f z0 xs = foldr (\x k z -> f z x >>= k) pure xs z0
when x y = if x then y else pure ()
unless x y = if x then pure () else y
error = primitiveError
undefined = error "undefined"
foldr1 c l@(h:t) = maybe undefined id $ foldr (\x m -> Just $ maybe x (c x) m) Nothing l
foldl f a bs = foldr (\b g x -> g (f x b)) (\x -> x) bs a
foldl1 f (h:t) = foldl f h t
scanl f q ls = q : (case ls of
  []   -> []
  x:xs -> scanl f (f q x) xs)
scanl1 f (x:xs) =  scanl f x xs
scanl1 _ []     =  []

elem k xs = foldr (\x t -> x == k || t) False xs
find f xs = foldr (\x t -> if f x then Just x else t) Nothing xs
(++) = flip (foldr (:))
concat = foldr (++) []
map = flip (foldr . ((:) .)) []
head (h:_) = h
tail (_:t) = t
(!!) :: [a] -> Int -> a
xs!!0 = head xs
xs!!n = tail xs!!(n - 1)
replicate :: Int -> a -> [a]
replicate 0 _ = []
replicate n x = x : replicate (n - 1) x
repeat x = x : repeat x
cycle = concat . repeat
null [] = True
null _ = False
reverse = foldl (flip (:)) []
dropWhile _ [] = []
dropWhile p xs@(x:xt)
  | p x  = dropWhile p xt
  | True = xs
span _ [] = ([], [])
span p xs@(x:xt)
  | p x  = first (x:) $ span p xt
  | True = ([],xs)
break p = span (not . p)
isSpace c = elem (ord c) [32, 9, 10, 11, 12, 13, 160]
words s = case dropWhile isSpace s of
  "" -> []
  s' -> w : words s'' where (w, s'') = break isSpace s'
lines "" =  []
lines s | (l, s') <- break (== '\n') s = l : case s' of
  [] -> []
  _:s'' -> lines s''
instance Functor [] where fmap = map
instance Applicative [] where pure = (:[]); f <*> x = concatMap (<$> x) f
instance Monad [] where return = (:[]); (>>=) = flip concatMap
instance Alternative [] where empty = [] ; (<|>) = (++)
concatMap = (concat .) . map
lookup s = foldr (\(k, v) t -> if s == k then Just v else t) Nothing
filter p = foldr (\x -> bool id (x:) $ p x) []
filterM p = foldr (\x -> liftA2 (bool id (x:)) $ p x) $ pure []
union xs ys = foldr (\y acc -> (if elem y acc then id else (y:)) acc) xs ys
intersect xs ys = filter (\x -> maybe False (\_ -> True) $ find (x ==) ys) xs
xs \\ ys = filter (not . (`elem` ys)) xs
last (x:xt) = go x xt where go x xt = case xt of [] -> x; y:yt -> go y yt
init (x:xt) = case xt of [] -> []; _ -> x : init xt
intercalate sep = \case [] -> []; x:xt -> x ++ concatMap (sep ++) xt
intersperse sep = \case [] -> []; x:xt -> x : foldr ($) [] (((sep:) .) . (:) <$> xt)
all f = and . map f
any f = or . map f
and = foldr (&&) True
or = foldr (||) False
zipWith f xs ys = case xs of [] -> []; x:xt -> case ys of [] -> []; y:yt -> f x y : zipWith f xt yt
zip = zipWith (,)
unzip [] = ([], [])
unzip ((a, b):rest) = (a:at, b:bt) where (at, bt) = unzip rest
transpose []             = []
transpose ([]     : xss) = transpose xss
transpose ((x:xs) : xss) = (x : [h | (h:_) <- xss]) : transpose (xs : [ t | (_:t) <- xss])
data State s a = State (s -> (a, s))
runState (State f) = f
instance Functor (State s) where fmap f = \(State h) -> State (first f . h)
instance Applicative (State s) where
  pure a = State (a,)
  (State f) <*> (State x) = State \s -> let (g, s') = f s in first g $ x s'
instance Monad (State s) where
  return a = State (a,)
  (State h) >>= f = State $ uncurry (runState . f) . h
evalState m s = fst $ runState m s
get = State \s -> (s, s)
put n = State \s -> ((), n)
either l r e = case e of Left x -> l x; Right x -> r x
instance Functor (Either a) where fmap f e = either Left (Right . f) e
instance Applicative (Either a) where
  pure = Right
  ef <*> ex = case ef of
    Left s -> Left s
    Right f -> either Left (Right . f) ex
instance Monad (Either a) where
  return = Right
  ex >>= f = either Left f ex
class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
asum = foldr (<|>) empty
(*>) = liftA2 \x y -> y
(<*) = liftA2 \x y -> x
many p = liftA2 (:) p (many p) <|> pure []
some p = liftA2 (:) p (many p)
sepBy1 p sep = liftA2 (:) p (many (sep *> p))
sepBy p sep = sepBy1 p sep <|> pure []
between x y p = x *> (p <* y)

showParen b f = if b then ('(':) . f . (')':) else f

iterate f x = x : iterate f (f x)
takeWhile _ [] = []
takeWhile p xs@(x:xt)
  | p x  = x : takeWhile p xt
  | True = []

a ^ b = case b of
  0 -> 1
  1 -> a
  _ -> case r of
    0 -> h2
    1 -> h2*a
  where
  (q, r) = divMod b 2
  h = a^q
  h2 = h*h

class Enum a where
  succ           :: a -> a
  pred           :: a -> a
  toEnum         :: Int -> a
  fromEnum       :: a -> Int
  enumFrom       :: a -> [a]
  enumFromTo     :: a -> a -> [a]
  enumFromThen   :: a -> a -> [a]
  enumFromThenTo :: a -> a -> a -> [a]
  succ = toEnum . (+ 1) . fromEnum
  pred = toEnum . (- 1) . fromEnum
  enumFrom x = map toEnum [fromEnum x ..]
  enumFromThen x y = map toEnum [fromEnum x, fromEnum y ..]
  enumFromTo x y = map toEnum [fromEnum x .. fromEnum y]
  enumFromThenTo x1 x2 y = map toEnum [fromEnum x1, fromEnum x2 .. fromEnum y]

instance Enum Int where
  succ = (+ 1)
  pred = (- 1)
  toEnum = id
  fromEnum = id
  enumFrom = iterate succ
  enumFromThen x y = iterate (+(y - x)) x
  enumFromTo lo hi = takeWhile (<= hi) $ enumFrom lo
  enumFromThenTo x y lim = takeWhile ((if y < x then (<=) else (>=)) lim) $ enumFromThen x y
instance Enum Bool where
  toEnum 0 = False
  toEnum 1 = True
  fromEnum False = 0
  fromEnum True = 1
instance Enum Char where
  toEnum = chr
  fromEnum = ord
instance Enum Word where
  toEnum = wordFromInt
  fromEnum = intFromWord
instance Enum Rational where
  toEnum = (% 1) . toEnum
  fromEnum (a :% b) = fromEnum $ a `div` b
  succ = (+ 1)
  pred = (- 1)
instance Enum Double where
  toEnum = doubleFromInt
  fromEnum = intFromDouble
  succ = (+ 1)
  pred = (- 1)

fromIntegral = fromInteger . toInteger

class Ring a where
  (+) :: a -> a -> a
  (-) :: a -> a -> a
  (*) :: a -> a -> a
  fromInteger :: Integer -> a
  negate :: a -> a
  negate = (0 -)

class Field a where
  recip :: a -> a
  recip = (1 /)
  (/) :: a -> a -> a
  a / b = a * recip b
  fromRational :: Rational -> a
  fromRational (a:%b) = fromInteger a / fromInteger b

class Integral a where
  div :: a -> a -> a
  mod :: a -> a -> a
  quot :: a -> a -> a
  rem :: a -> a -> a
  toInteger :: a -> Integer
  divMod :: a -> a -> (a, a)
  divMod a b = (q, a - b*q) where q = div a b
  quotRem :: a -> a -> (a, a)
  quotRem a b = (q, a - b*q) where q = quot a b

instance Ring Int where
  (+) = intAdd
  (-) = intSub
  (*) = intMul
  fromInteger = intFromWord . fromInteger
instance Integral Int where
  div = intDiv
  mod = intMod
  quot = intQuot
  rem = intRem
  toInteger x
    | 0 <= x = Integer True $ if x == 0 then [] else [wordFromInt x]
    | True = Integer False [wordFromInt -x]
zeroWord = wordFromInt $ ord '\0'
instance Ring Word where
  (+) = wordAdd
  (-) = wordSub
  (*) = wordMul
  fromInteger (Integer xsgn xs) = (if xsgn then id else wordSub zeroWord) case xs of
    [] -> zeroWord
    (x:_) -> x
instance Integral Word where
  div = wordDiv
  mod = wordMod
  quot = wordQuot
  rem = wordRem
  toInteger x = Integer True $ if x == 0 then [] else [x]
instance Eq Word where (==) = wordEq
instance Ord Word where (<=) = wordLE

data Word64 = Word64 Word Word deriving Eq
instance Ring Word64 where
  Word64 a b + Word64 c d = uncurry Word64 $ word64Add a b c d
  Word64 a b - Word64 c d = uncurry Word64 $ word64Sub a b c d
  Word64 a b * Word64 c d = uncurry Word64 $ word64Mul a b c d
  fromInteger (Integer xsgn xs) = if xsgn then Word64 x y else uncurry Word64 $ word64Sub 0 0 x y where
    (x, xt) = mpView xs
    (y, _) = mpView xt
instance Ord Word64 where
  Word64 a b <= Word64 c d
    | b == d = a <= c
    | True = b <= d
instance Integral Word64 where
  div (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Div a b c d
  mod (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Mod a b c d
  quot (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Div a b c d
  rem (Word64 a b) (Word64 c d) = uncurry Word64 $ word64Mod a b c d
  toInteger (Word64 a b) = Integer True [a, b]

-- Multiprecision arithmetic.
data Integer = Integer Bool [Word] deriving Eq
instance Ring Integer where
  Integer xsgn xs + Integer ysgn ys
    | xsgn == ysgn = Integer xsgn $ mpAdd xs ys
    | True = case mpCompare xs ys of
      LT -> mpCanon ysgn $ mpSub ys xs
      EQ -> Integer True []
      _ -> mpCanon xsgn $ mpSub xs ys
  Integer xsgn xs - Integer ysgn ys
    | xsgn /= ysgn = Integer xsgn $ mpAdd xs ys
    | True = case mpCompare xs ys of
      LT -> mpCanon (not ysgn) $ mpSub ys xs
      EQ -> Integer True []
      _ -> mpCanon xsgn $ mpSub xs ys
  Integer xsgn xs * Integer ysgn ys
    | null xs || null ys = Integer True []
    | True = Integer (xsgn == ysgn) $ mpMul xs ys
  fromInteger = id
instance Integral Integer where
  div (Integer xsgn xs) (Integer ysgn ys) = if xsgn == ysgn
    then Integer True qs
    else case rs of
      [] -> mpCanon0 False qs
      _  -> mpCanon0 False $ mpAdd qs [1]
    where (qs, rs) = mpDivMod xs ys
  mod (Integer xsgn xs) (Integer ysgn ys) = if xsgn == ysgn
    then mpCanon0 xsgn rs
    else mpCanon ysgn $ mpSub ys rs
    where rs = snd $ mpDivMod xs ys
  quot (Integer xsgn xs) (Integer ysgn ys) = mpCanon0 (xsgn == ysgn) $ fst $ mpDivMod xs ys
  rem (Integer xsgn xs) (Integer ysgn ys) = mpCanon0 xsgn $ snd $ mpDivMod xs ys
  toInteger = id
instance Ord Integer where
  compare (Integer xsgn xs) (Integer ysgn ys)
    | xsgn = if ysgn then mpCompare xs ys else GT
    | True = if ysgn then LT else mpCompare ys xs
instance Enum Integer where
  succ = (+ Integer True [1])
  pred = (+ Integer False [1])
  toEnum = toInteger
  fromEnum = fromInteger
  enumFrom = iterate succ
  enumFromTo lo hi = takeWhile (<= hi) $ enumFrom lo

mpView [] = (0, [])
mpView (x:xt) = (x, xt)

mpCanon sgn xs = mpCanon0 sgn $ reverse $ dropWhile (0 ==) $ reverse xs
mpCanon0 sgn xs = case xs of
  [] -> Integer True []
  _ -> Integer sgn xs

mpCompare [] [] = EQ
mpCompare [] _  = LT
mpCompare _  [] = GT
mpCompare (x:xt) (y:yt) = case mpCompare xt yt of
  EQ -> compare x y
  o -> o

mpAdc [] [] c = ([], c)
mpAdc xs ys c = first (lo:) $ mpAdc xt yt hi where
  (x, xt) = mpView xs
  (y, yt) = mpView ys
  (lo,hi) = uncurry (word64Add c 0) $ word64Add x 0 y 0

mpAdd xs ys | c == 0 = zs
            | True = zs ++ [c]
  where (zs, c) = mpAdc xs ys 0

mpSub xs ys = fst $ mpSbb xs ys 0

mpSbb xs ys b = go xs ys b where
  go [] [] b = ([], b)
  go xs ys b = first (lo:) $ go xt yt $ 1 - hi where
    (x, xt) = mpView xs
    (y, yt) = mpView ys
    (lo,hi) = uncurry word64Sub (word64Sub x 1 y 0) b 0

mpMulWord _ []     c = if c == 0 then [] else [c]
mpMulWord x (y:yt) c = lo:mpMulWord x yt hi where
  (lo, hi) = uncurry (word64Add c 0) $ word64Mul x 0 y 0

mpMul [] _ = []
mpMul (x:xt) ys = case mpMulWord x ys 0 of
  [] -> []
  z:zs -> z:mpAdd zs (mpMul xt ys)

mpDivModWord xs y = first (reverse . dropWhile (0 ==)) $ go 0 $ reverse xs where
  go r [] = ([], r)
  go n (x:xt) = first (q:) $ go r xt where
    q = fst $ word64Div x n y 0
    r = x - q*y  -- Only lower bits matter.

mpDivMod xs ys = first (reverse . dropWhile (== 0)) $ go us where
  s = mpDivScale $ last ys
  us = mpMulWord s (xs ++ [0]) 0
  vs = mpMulWord s ys 0
  (v1:vt) = reverse vs
  vlen = length vs
  go us | ulen <= vlen = ([], fst $ mpDivModWord us s)
        | True = first (q:) $ go $ lsbs ++ init ds
    where
    ulen = length us
    (u0:u1:ut) = reverse us
    (lsbs, msbs) = splitAt (ulen - vlen - 1) us
    (ql, qh) = word64Div u1 u0 v1 0
    q0 = if 1 <= qh then -1 else ql
    (q, ds) = foldr const undefined [(q, ds) | q <- iterate (- 1) q0, let (ds, bor) = mpSbb msbs (mpMulWord q vs 0) 0, bor == 0]

mpDivScale n
  | n1 == 0 = 1
  | otherwise = fst $ word64Div 0 1 n1 0
  where n1 = succ n

mpBase _ [] = ('0':)
mpBase b xs = go xs where
  go [] = id
  go xs = go q . shows r where (q, r) = mpDivModWord xs b

instance Show Integer where showsPrec _ (Integer xsgn xs) = (if xsgn then id else ('-':)) . mpBase 10 xs

instance (Ord a, Ord b) => Ord (a, b) where
  (a1, b1) <= (a2, b2) = a1 <= a2 && (not (a2 <= a1) || b1 <= b2)

a < b = a <= b && a /= b
a > b = b <= a && a /= b
(>=) = flip(<=)

instance Applicative IO where pure = ioPure ; (<*>) f x = ioBind f \g -> ioBind x \y -> ioPure (g y)
instance Monad IO where return = ioPure ; (>>=) = ioBind
instance Functor IO where fmap f x = ioPure f <*> x
class Show a where
  showsPrec :: Int -> a -> String -> String
  showsPrec _ x = (show x++)
  show :: a -> String
  show x = shows x ""
  showList :: [a] -> String -> String
  showList = showList__ shows
shows = showsPrec 0
showList__ _     []     s = "[]" ++ s
showList__ showx (x:xs) s = '[' : showx x (showl xs)
  where
    showl []     = ']' : s
    showl (y:ys) = ',' : showx y (showl ys)
showInt__ n
  | 0 == n = id
  | True = showInt__ (n`div`10) . (chr (48+n`mod`10):)
instance Show () where show () = "()"
instance Show Bool where
  show True = "True"
  show False = "False"
instance Show a => Show [a] where showsPrec _ = showList
instance Show Int where
  showsPrec _ n
    | 0 == n = ('0':)
    | 1 <= n = showInt__ n
    | 2 * n == 0 = ("-2147483648"++)
    | True = ('-':) . showInt__ (0 - n)
showWord_ n
  | 0 == n = id
  | True = showWord_ (n`div`10) . (chr (48+(intFromWord $ n`mod`10)):)
instance Show Word where
  showsPrec _ n
    | 0 == n = ('0':)
    | True = showWord_ n
instance Show Word64 where
  showsPrec p (Word64 x y) = showsPrec p $ Integer True [x, y]
showLitChar__ '\n' = ("\\n"++)
showLitChar__ '\\' = ("\\\\"++)
showLitChar__ c
  | n < 32 || n > 127 = ('\\':) . protectDecEsc (shows n)
  | otherwise = (c:)
  where n = ord c
protectDecEsc f s
  | (c:_) <- s, '0' <= c, c <= '9' = f $ ("\\&"++) s
  | otherwise = f s
instance Show Char where
  showsPrec _ '\'' = ("'\\''"++)
  showsPrec _ c = ('\'':) . showLitChar__ c . ('\'':)
  showList s = ('"':) . foldr (.) id (map go s) . ('"':) where
    go '"' = ("\\\""++)
    go c = showLitChar__ c
instance (Show a, Show b) => Show (a, b) where
  showsPrec _ (a, b) = showParen True $ shows a . (',':) . shows b

integerSignList (Integer xsgn xs) f = f xsgn xs

unwords [] = ""
unwords ws = foldr1 (\w s -> w ++ ' ':s) ws
unlines = concatMap (++"\n")
abs x = if 0 <= x then x else -x
signum x | 0 == x = 0
         | 0 <= x = 1
         | otherwise = -1
otherwise = True
sum = foldr (+) 0
product = foldr (*) 1

max a b = if a <= b then b else a
min a b = if a <= b then a else b

gcd x y = gcd' (abs x) (abs y) where
  gcd' a 0 = a
  gcd' a b = gcd' b (a `rem` b)
lcm = \cases
  _ 0 -> 0
  0 _ -> 0
  x y -> abs ((x `quot` (gcd x y)) * y)

readNatural = foldl (\n d -> toInteger 10*n + toInteger (ord d - ord '0')) (toInteger 0)
readInteger ('-':t) = -(readNatural t)
readInteger s = readNatural s

infixl 7 %
data Rational = Integer :% Integer deriving Eq
numerator (p :% _) = p
denominator (_ :% q) = q
x % y = reduce_ (x * signum y) (abs y)
reduce_ x y = (x `quot` d) :% (y `quot` d) where d = gcd x y
instance Ord Rational where (a :% b) <= (c :% d) = a*d <= b*c
instance Ring Rational where
  (a :% b) + (c :% d) = reduce_ (a*d + b*c) (b*d)
  (a :% b) - (c :% d) = reduce_ (a*d - b*c) (b*d)
  (a :% b) * (c :% d) = reduce_ (a*c) (b*d)
  fromInteger n = n :% 1
instance Show Rational where
  showsPrec _ (a :% b) = shows a . (" % "++) . shows b
instance Field Rational where recip (x :% y) = (y * signum x) :% abs x

instance Ring Double where
  (+) = doubleAdd
  (-) = doubleSub
  (*) = doubleMul
  fromInteger n = integerSignList n \sgn ws -> case reverse ws of
    [] -> doubleFromInt 0
    x:xt -> (if sgn then id else negate) let
      dx = doubleFromWord x
      sh = foldr1 (*) $ replicate 32 $ doubleFromInt 2
      in case xt of
        [] -> dx
        y:yt -> foldr (const (sh*)) (dx*sh + doubleFromWord y) yt
instance Eq Double where (==) = doubleEq
instance Ord Double where (<=) = doubleLE
instance Show Double where
  showsPrec _ d = case compare d 0 of
   EQ -> ('0':)
   LT -> ('-':) . shows -d
   GT -> go where
    tens = iterate (10*) 1
    tenths = iterate (0.1*) 1
    (as, bs) = if d >= 1 then span (<= d) tens else span (>= d) tenths
    norm = if d >= 1 then d / last as else d / head bs
    dig = intFromDouble norm
    go = shows dig . ('.':) . (tail (show $ 1000000 + intFromDouble (1000000 * (norm - doubleFromInt dig)))++) . showsE (length as)
    showsE e = if e == 1 && d >= 1 then id
      else ('e':) . shows (if d >= 1 then e - 1 else 0 - e)
instance Field Double where (/) = doubleDiv

floor = intFromDouble . doubleFloor
ceiling x = 0 - floor (0 - x)

class Bits a where
  xor :: a -> a -> a
  (.&.) :: a -> a -> a
  (.|.) :: a -> a -> a
  shiftL :: a -> Int -> a
  shiftR :: a -> Int -> a
  rotateL :: a -> Int -> a
  rotateR :: a -> Int -> a
  complement :: a -> a
  complement x = -1 - x

instance Bits Int where
  xor = intXor
  (.&.) = intAnd
  (.|.) = intOr
  shiftL n i = intShl n $ fromIntegral i
  shiftR n i = intShr n $ fromIntegral i
  rotateL n i = intOr (intShr n $ 32 - u) (intShl n u) where u = fromIntegral i
  rotateR n i = intOr (intShr n u) (intShl n $ 32 - u) where u = fromIntegral i

instance Bits Word where
  xor = wordXor
  (.&.) = wordAnd
  (.|.) = wordOr
  shiftL n i = wordShl n $ fromIntegral i
  shiftR n i = wordShr n $ fromIntegral i
  rotateL n i = wordOr (wordShr n $ 32 - u) (wordShl n u) where u = fromIntegral i
  rotateR n i = wordOr (wordShr n u) (wordShl n $ 32 - u) where u = fromIntegral i

instance Bits Word64 where
  xor (Word64 a b) (Word64 c d) = Word64 (wordXor a c) (wordXor b d)
  (Word64 a b) .&. (Word64 c d) = Word64 (wordAnd a c) (wordAnd b d)
  (Word64 a b) .|. (Word64 c d) = Word64 (wordOr a c) (wordOr b d)
  shiftL (Word64 a b) i
    | u >= 32 = Word64 0 $ wordShl a $ u - 32
    | otherwise = Word64 (wordShl a u) (wordShr a (32 - u) + wordShl b u)
    where u = fromIntegral i
  shiftR (Word64 a b) i
    | u >= 32 = Word64 (wordShr b $ u - 32) 0
    | otherwise = Word64 (wordShr a u + wordShl b (32 - u)) (wordShr b u)
    where u = fromIntegral i
  rotateL (Word64 a b) i = let n = wordFromInt i in
    uncurry Word64 (word64Shl a b n 0) .|.
    uncurry Word64 (word64Shr a b (64 - n) 0)
  rotateR (Word64 a b) i
    | u >= 32 = small b a $ u - 32
    | otherwise = small a b u
    where
    u = fromIntegral i
    small a b u = Word64 (wordShr a u + wordShl b (32 - u)) (wordShr b u + wordShl a (32 - u))

Wasm to Haskell?

It’d be nice to write wasm ourselves, but for expedience, we rely on Clang to compile our runtime system to a wasm binary which we manipulate. WebAssembly turns out to be pleasantly malleable. A binary breaks up into independent sections, and we can add, delete, or modify sections before stitching them together again.

We write a tool that just does enough wasm parsing to print the sections of a wasm file we want in the form of a Haskell module, and run this on the output of crossly warts to create WartsBytes.hs.

module Main where

import Base
import System

data Charser a = Charser
  { getCharser :: String -> Either String (a, String) }

instance Functor Charser where fmap f (Charser x) = Charser $ fmap (first f) . x
instance Applicative Charser where
  pure a = Charser $ \s -> Right (a, s)
  f <*> x = Charser \inp -> do
    (fun, t) <- getCharser f inp
    (arg, u) <- getCharser x t
    pure (fun arg, u)
instance Monad Charser where
  Charser f >>= g = Charser $ (good =<<) . f
    where good (r, t) = getCharser (g r) t
  return = pure

bad :: String -> Charser a
bad = Charser . const . Left

headerAndVersion :: String
headerAndVersion = "\0asm\x1\0\0\0"

eof :: Charser Bool
eof = Charser \s -> Right (null s, s)

next :: Charser Int
next = Charser \case
  [] -> Left "unexpected EOF"
  h:t -> Right (ord h, t)

sat f = Charser \case
  h:t | f h -> Right (h, t)
  _ -> Left "unsat"

remainder :: Charser String
remainder = Charser \s -> Right (s, "")

varuint7 = next
varuint32 = varuint

varuint :: Charser Int
varuint = unleb 1 0
-- varuint = fromIntegral <$> unleb 1 0

-- unleb :: Integer -> Integer -> Charser Integer
unleb m acc = do
  -- d <- fromIntegral <$> next
  d <- next
  if d > 127 then unleb (m * 128) $ (d - 128) * m + acc else pure $ d*m + acc

sections = eof >>= \b -> if b then pure [] else do
  n <- varuint7
  s <- vec (chr <$> next)
  ((n, s):) <$> sections

wasm = do
  s <- replicateM 8 (chr <$> next)
  if s /= headerAndVersion then bad "bad header or version" else sections

hexDigit n | n < 10 = chr $ n + ord '0'
           | True   = chr $ n - 10 + ord 'a'

xxd = \case
  "" -> ""
  h:t -> let n = ord h in hexDigit (div n 16) : hexDigit (mod n 16) : xxd t

replicateM = (mapM id .) . replicate
vec f = varuint >>= (`replicateM` f)

search00type xs = do
  fts <- maybe (Left "missing section 1") Right $ lookup 1 xs
  ios <- fst <$> getCharser go fts
  maybe (Left "missing (0, 0) functype") Right $ lookup (0, 0) $ zip ios [0..]
  where
  go = vec $ do
    sat (== '\x60')
    inCount <- varuint
    replicateM inCount next
    outCount <- varuint
    replicateM outCount next
    pure (inCount, outCount)

searchExport needle xs = do
  exs <- maybe (Left "missing section 7") Right $ lookup 7 xs
  maybe (Left "not found") Right =<< asum . fst <$> getCharser go exs
  where
  go = vec $ do
    s <- vec $ chr <$> next
    next
    n <- varuint
    pure $ if s == needle then Just n else Nothing

allFunCount xs = do
  impCount <- maybe (Right 0) countImps $ lookup 2 xs
  funCount <- maybe (Right 0) countFuns $ lookup 3 xs
  pure $ impCount + funCount
  where
  countImps imps = length . fst <$> getCharser goImps imps
  goImps = vec $ do
    vec next
    vec next
    sat (== '\0')
    varuint
    pure ()
  countFuns funs = fst <$> getCharser varuint funs

main = do
  s <- getContents
  case getCharser wasm s of
    Left e -> putStrLn $ "parse error: " ++ e
    Right (xs, []) -> do
      putStr "module WartsBytes where\nimport Base\nwartsBytes = "
      print $ second xxd <$> filter (not . (`elem` [0, 6]) . fst) xs
      putStrLn $ either error (("allFunCount = "++) . show) $ allFunCount xs
      putStrLn $ either error (("funType00Idx = "++) . show) $ search00type xs
      putStrLn $ either error (("reduceFunIdx = "++) . show) $ searchExport "reduce" xs
    _ -> error "unreachable"

The RTS includes generate code that wraps foreign imports. It can only be used with programs that declare the same foreign imports.

Webby

We build a compiler that goes directly from Haskell to wasm. The code does little more than dumping the runtime system wasm binary along with bytes describing the initial contents of the heap.

For now we look for the main function in the Main module and export it as the go function; export declarations are ignored.

-- In-browser compiler.
module Main where

import Base
import Ast
import Map
import Typer
import RTS
import System
import WartsBytes

data StrLen = StrLen { _str :: String -> String, _len :: Int }
instance Monoid StrLen where
  mempty = StrLen id 0
instance Semigroup StrLen where
  (StrLen s1 n1) <> (StrLen s2 n2) = StrLen (s1 . s2) (n1 + n2)

hexValue d
  | d <= '9' = ord d - ord '0'
  | d <= 'F' = 10 + ord d - ord 'A'
  | d <= 'f' = 10 + ord d - ord 'a'

unxxd s = StrLen (go s) (length s `div` 2) where
  go s = case s of
    [] -> id
    (d1:d0:rest) -> ((chr $ hexValue d1 * 16 + hexValue d0):) . go rest

main = do
  putStr "DEBUG\n"
  interact $ either id id . toWasm

toWasm s = do
  objList <- toAscList <$> objectify s
  let
    ffis = foldr (\(k, v) m -> insertWith (error $ "duplicate import: " ++ k) k v m) Tip $ concatMap (toAscList . ffiImports . _neat . snd) objList
    ffiMap = fromList $ zip (keys ffis) [0..]
    offobjs = snd $ foldr (\(name, obj) (n, m) -> (n + length (_mem obj), insert name (n, obj) m)) (0, Tip) objList
    (ffeMap, memFun) = foldr (layout offobjs ffiMap) (Tip, id) objList
    ffes = toAscList $ fst <$> ffeMap
    mem = memFun []
    go (n, x)
      -- Function section: for each export, declare a function of type () -> ()..
      | n == 3  = leb n <> extendSection x (replicate (length ffes) $ unxxd "01")
      -- Export section: add each export.
      | n == 7  = leb n <> extendSection x (zipWith encodeExport (fst <$> ffes) [allFunCount..])
      -- Code section: for nth export, define a function calling rts_reduce([512 + 4*n])
      | n == 10 = leb n <> extendSection x (callRoot <$> [0..length ffes - 1])
      | True    = leb n <> leb (_len s) <> s where s = unxxd x
    roots = encodeData rootBase $ littleEndian $ (snd <$> ffes) ++ [0]
    prog = encodeData heapBase $ littleEndian $ length mem : mem
  pure $  _str (unxxd "0061736d01000000" <> mconcat (map go wartsBytes)
-- Data section:
--   512 : null-terminated roots array
--   1048576 - 4: hp
--   1048576: initial heap contents
    <> leb 11 <> extendSection "00" [roots, prog]) ""

extendSection x xs = encodeSection (k + length xs) $ unxxd s <> mconcat xs
  where (k, s) = splitLeb x
splitLeb x = go x 1 0 where
  go (d1:d0:t) m acc
    | n < 128 = (acc + n*m, t)
    | True = go t (m*128) $ acc + (n - 128)*m
    where
    n = hexValue d1 * 16 + hexValue d0
encodeExport s n = encodeString s <> unxxd "00" <> leb n
encodeString s = let n = length s in leb n <> StrLen (s++) n

littleEndian ns = StrLen (foldr (.) id $ go 4 <$> ns) $ 4 * length ns where
  go k n
    | k == 0 = id
    | True = (chr (n `mod` 256):) . go (k - 1) (n `div` 256)

rootBase = unxxd "004180040b"  -- sleb 512 = 8004
heapBase = unxxd "0041fcff3f0b"  -- sleb (1048576 - 4) = fcff3f

-- 0 locals.
-- i32.const 0; i32.load 512 + 4*n; call $REDUCE; end;
callRoot n = leb (_len s) <> s where
  s = unxxd "0041002802" <> slebPos (512 + 4*n) <> unxxd "10"
   <> leb reduceFunIdx <> unxxd "0b"

encodeData addr s = addr <> leb (_len s) <> s
encodeSection k s = let n = leb k in leb (_len n + _len s) <> n <> s

leb n
  | n <= 127 = StrLen (chr n:) 1
  | True = StrLen (chr (128 + n `mod` 128):) 1 <> leb (n `div` 128)

slebPos n
  | n <= 63 = StrLen (chr n:) 1
  | True = StrLen (chr (128 + n `mod` 128):) 1 <> slebPos (n `div` 128)

webby.wasm

To build a browser-based compiler, we essentially run the previous compiler on itself, thus producing a wasm binary that can translate Haskell directly into wasm.

One change is needed: we swap System.hs for SystemWasm.hs. The Linux version declares foreign imports for those in our runtime system for Linux. The wasm version declares foreign imports for functions provided by the host environment.

-- For wasm environments providing Haskell-like getchar, putchar, eof.
module System where

import Base

foreign import ccall "putchar" putChar :: Char -> IO ()
foreign import ccall "getchar" getChar :: IO Char
foreign import ccall "eof" isEOFInt :: IO Int

isEOF = (0 /=) <$> isEOFInt
putStr = mapM_ putChar
putStrLn = (>> putChar '\n') . putStr
print = putStrLn . show
getContents = isEOF >>= \b -> if b then pure [] else getChar >>= \c -> (c:) <$> getContents
interact f = getContents >>= putStr . f

Messy

By now, we’re painfully aware of the effects of rash decisions made while writing our earlier compilers. Compilation time has grown rapidly. Parts of the code are difficult to extend.

I’m cleaning up the precisely compiler first, and intend to backport the changes later, so the journey will look smoother. Until then, the evolution of our compiler will appear jumpy.

Examples:

  • Rewriting let expressions as lambda terms should happen after type-checking, so that type annotations are easier to handle.

  • Some library-like functions like overFree should be scrapped because they’re only used once.

  • I renamed beta to fill as it really fills holes in a context, that is, it substitutes without caring about variable capture. We get away with this because we unshadow variables during type-checking except for join point variables.

Introducing features in a different order also smooths the journey. For example, I originally tried hash consing just before the virtually compiler, which slowed compilation and whose code changed markedly as syntax features were added. I pushed it to a far later compiler for faster bootstrapping and more stable code. Another example is the Show typeclass, which I originally added in a surprisingly recent compiler.

Built-in primitives started in their own # module, then were replaced by code that pre-defined them for every module. But I think I was right the first time, and switched it back. Hopefully there are no traces of my misadventure left.

At the same time, I’m introducing more chaos as I forge ahead with new experiments, especially those involving the web. The above only chronicles the start of adventures in the browser. A fuller account:

  • The crossly wasm command modifies the output C so Clang can compile it to wasm: it adds a minimal bump malloc implementation, and uses attributes to declare exports.

  • The crossly warts command also modifies the output C so Clang can compile to wasm, though it avoids malloc so it can prepopulate the heap. At the time, I didn’t want to allocate a heap and then copy the initial state. It only generates code for FFIs and the RTS, and assumes a tool like webby will supply the initial heap contents.

  • The Imp compiler is like webby but with an extra hack to request the source of dependent import modules.

  • The reply family of REPLs introduced more complications. To avoid compiling Base every time, we define a CBOR-based object file format that contains the bytecode for each definition, along with symbols, types, typeclasses, fixities, type aliases, and so on.

  • For the REPL, we add a scratchpad buffer where the interpreter writes new bytecode. For definitions, we write the number of new symbols and call vmgcroot() to add roots so they survive GC. There’s also a tagging mechanism, where the least significant bit indicates the rest of the word is the index of a previous definition. We do all this because GC could occur while compiling new definitions; the scratchpad is untouched by GC.

  • Related are introspect.c, which lets us examine the heap from the REPL, and a reduction function that is aware of "gas": a feature that allows us to run a program for a limited number of steps, and resume execution later if it has yet to finish.

Designing the object file format leads us to take a closer look at the many phases of compilation:

  1. During parsing, we compile data declarations, generating ASTs for their Scott-encoding and lookup tables for field names. We also genreate instances for deriving clauses.

  2. During parsing, we generate selector code for each typeclass declaration.

  3. During parsing, we compile each FFI import to an F combinator, adding a lambda abstraction for each argument.

  4. Immediately after parsing, we find all exported symbols. The sole purpose of the type2Cons field is to find all exported constructors in a declaration like Foo(..).

  5. We apply type aliases to canonicalize all type names.

  6. We topologically sort the modules, giving us an order which we can compile them. Recall our compilers expect the input to be the concatenation of all source modules, except for imp.wasm, which has a way of requesting more modules.

  7. We rewrite expressions according to fixity declarations, and rewrite pattern matches as lambda terms.

  8. At last, we compile top-level definitions to combinators, reconciling them with any type annotations. Life was simpler with our early compilers, where this was the only task we performed.

  9. It is only now we can compile typeclass instance definitions, as they may depend on top-level definitions. (In the previous step, there are dependencies going the other way, but we know the purported types of all instances, which is the only information needed.)

Our data structures freely mix information collected during parsing with information generated by compilation.


Ben Lynn blynn@cs.stanford.edu 💡