Laziness in Action

Let’s write an program that visualizes computation for a Haskell-like language. For example, the input:

sq x = x * x;
main = sq (sq 3)

begins as the graph:

The internal nodes depict function application: we apply the function of the left child to the right child. Lazy evaluation means we expand the outermost sq first. Eager evaluation would expand the innermost sq first.

The outermost function is now the built-in multiplication function, which forces evaluation of its arguments. Here, both arguments are the same, so we wind up expanding sq 3 once and using it twice. This requires a pure language: no such optimization is possible in a language where any statement may have side effects.

We perform the bottommost multiplication:

Then the remaining multiplication yields the final answer: 81.

Slideshows of other programs (with stack nodes highlighted in red):

id example; our interpreter only computes a reused subexpression once.
the wrong way to compute Fibonacci numbers
straightfoward factorial recursion: lazy evaluation causes the expression to grow until the last minute.
factorial recursion with tail call: lazy evaluation still causes the expression to grow until the last minute.
factorial recursion, forcing evalution with case statement: at last, the expression stays the same size.

How to be lazy

See "Implementing Functional Languages: a tutorial" by Simon L Peyton Jones and David R Lester. Some of the above examples were taken from this free online book, and we will use its terminology. The talk "Into The Core" is also illuminating.

Our goal is a self-contained program that can lazily interpret recursive functions operating on integers. On the one hand, we want to get something working quickly, and on the other hand, we want something interesting. Recursion on integers means we can test our program on fun computations such as the Fibonacci numbers, yet our program will still be relatively small, because we’re only dealing with integers.

Our journey naturally meanders through two gems of functional programming: parser combinators, which we explore separately, and inductive data structures for graphs. Apart from these two, we’ll stay within the base Haskell system:

import Control.Monad
import Data.Char
import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.PatriciaTree
import Data.Graph.Inductive.Query.DFS (reachable)
import Data.List
import qualified Data.Map as M
import Data.Tree
import Text.ParserCombinators.Parsec

Parser

The following parses a Core-like language featuring integers, infix operators of different precedences and associativities, and case statements. For simplicity, we parse expressions into Tree String structures: this makes it easier to reuse this parser in other projects.

program :: Parser [([String], Tree String)]
program = spaces >> sc `sepBy` want ";" <* eof where
  sc = do
    lhs <- many1 var
    want "="
    rhs <- expr
    return (lhs, rhs)
  expr = caseExpr <|>
    molecule `chainr1` chop "^" `chainl1` chop "*/" `chainl1` chop "+-"
  caseExpr = do
    x <- between (want "case") (want "of") expr
    as <- alt `sepBy` want ";"
    return $ Node "case" (x:as)
  alt = do
    p <- lif num <|> lif var
    want "->"
    x <- expr
    return $ Node "->" [p, x]
  chop :: String -> Parser (Tree String -> Tree String -> Tree String)
  chop xs = try $ do
    s <- foldl1' (<|>) $ map (want  . pure) xs
    return $ \x y -> Node "" [Node "" [Node s [], x], y]
  molecule = foldl1' (\x y -> Node "" [x, y]) <$> many1 atom
  atom = between (want "(") (want ")") expr <|> lif var <|> lif num
  var = try $ do
    s@(h:_) <- tok
    when (not (isLetter h) || s `elem` words "case of _") $ fail ""
    return s
  num = try $ do
    s <- tok
    unless (all isDigit s) $ fail ""
    return s
  lif = (flip Node [] <$>)
  want t = try $ do
    s <- tok
    unless (s == t) $ fail $ "expected " ++ t
    return s
  tok = do
    s <- many1 (alphaNum <|> char '_') <|> many1 (oneOf "+-/*^><=") <|>
         string ";" <|> string "(" <|> string ")"
    spaces
    return s

Inductive Graphs

While data structures like lists and trees have been viewed recursively for years, graph algorithms are traditionally imperative. For example, descriptions of graph traversal may describe marking a node to prevent revisiting it.

Instead, if we define graphs with an inductive data type, we can write graph algorithms in a functional style.

Using such a library leads to succinct code for modifying the graph. Instead of introducing indirection nodes to save work when the id function is present, we can simply reroute edges. This is probably slower, but our goal is visualization, not an industrial-strength compiler.

We define the Snapshot type to describe the interpeter state. It consists of a graph and stack of nodes. Each node holds a function that describes how it changes a Snapshot, along with a string. We use the string for printing the node, and also for storing integer values. The stack of nodes holds the spine, and also temporarily holds the results of expressions.

The details are a little finicky. We label all the edges to distinguish left children from right children. Nodes are referenced by integers, and we must devise a numbering scheme. A self-loop trick ensures the stack nodes work correctly when dereferencing variables introduced in case alternatives. We could easily remove self-loops from our slideshows, but again, we want to keep our code short.

Since we want to show the changes to the graph over time, functions operate on lists of snapshots, so older states are preserved for display later.

We show expanding a supercombinator (top-level definition) in a single step. Finer resolution, such as stepping through case expressions, is only a bit more work, but we want to keep our code short. We skip garbage collection for the same reason.

data Funk = Funk String ([Snapshot] -> [Snapshot])
instance Show Funk where show (Funk s _) = s
type Prograph = Gr Funk Int
type Snapshot = (Prograph, [Int])

nsuc i g n = fst $ head $ filter ((== i) . snd) $ lsuc g n

intern i xs = foldl' (\m (id, n) -> M.insert id n m) M.empty $ zip xs [i..]

nip ((g, x:_:sp):rest) = (g, x:sp):rest

run :: [([String], Tree String)] -> [Snapshot]
run t = do
  let
    scs = intern 0 $ head . fst <$> t
    funky s = Funk s f where
      f | null s = \h@((g, n:_):_) -> whnf h $ nsuc 0 g n
        | all isDigit s = id
        | Just n <- M.lookup s scs = graft $ t!!n
        | otherwise = case s of
          "+" -> binOp (+)
          "*" -> binOp (*)
          "-" -> binOp (-)
          "^" -> binOp (^)
          "div" -> binOp div
          "mod" -> binOp mod
          "min" -> binOp min
          "max" -> binOp max
    bloom h@((g, sp):_) (vars, Node s kids)
      | s == "case" = let
        h1@((gc, nc:fn:ns):_) = step $ bloom h (vars, head kids)
        findAlt (Node "->" [Node p@(c:_) [], e]:as)
          | isLetter c = nip $
            bloom ((insEdge (nc, nc, 1) gc, fn:nc:ns):h1) (p:vars, e)
          | p == show (intAt gc nc) = bloom ((gc, fn:ns):h1) (vars, e)
          | otherwise = findAlt as
        in findAlt $ tail kids
      | Just n <- elemIndex s vars = (g, nsuc 1 g (sp!!(n + 1)):sp):tail h
      | otherwise = let
        ((gg, ns):hh, _) =
          foldl' f ((([], n, funky s, []) & g, sp):tail h, 0) kids
        n = if isEmpty g then 0 else 1 + snd (nodeRange g)
        f (h1, i) kid = ((insEdge (n, m, i) ga, sp):h2, i + 1) where
          (ga, m:sp):h2 = bloom h1 (vars, kid)
        in (gg, n:ns):h

    graft (vars, t) h@((g, spine):_) = let
      h1@((g1, sp@(n:_)):_) = bloom h (tail vars, t)
      (Just (ins, _, _, _), g2) = match (sp!!length vars) g1
      g3 = insEdges ((\(lbl, src) -> (src, n, lbl)) <$> ins) g2
      in step $ (g3, n : drop (length vars + 1) sp):h

  step [(mkGraph [(0, funky "main")] [], [0])]

step h@((g, n:_):_) = f h where Just (Funk _ f) = lab g n
whnf ((g, sp):rest) n = step ((g, n:sp):rest)

intAt g n = read s :: Integer where Just (Funk s _) = lab g n

binOp f h0@((g, _:x:_):_) = (gz, y2:rest):h2 where
  h1@((gx, _:_:_:y1:_):_) = whnf h0 $ nsuc 1 g x
  h2@((gy, yv:xv:_:_:y2:rest):_) = whnf h1 $ nsuc 1 gx y1
  z = f (intAt gy xv) (intAt gy yv)
  (Just (ins, _, _, _), g1) = match y2 gy
  gz = (ins, y2, Funk (show z) id, []) & g1

Output

The output is a slideshow in Asciidoc format that uses the graphviz plugin to draw graphs. The HTML slides above were generated by running:

asciidoc --backend=slidy OUTPUT_FILE

on the output of our program, which generates a DOT description of a graph:

dotty (g, sp) = unlines [
   "== " ++ show (length sub) ++ " nodes ==",
   "",
   "[\"graphviz\"]", "----------------", "digraph program {",
   "node [fontname=\"Inconsolata\", shape=box]", "ordering=out",
   unlines $ dump <$> sub,
   "}", "----------------"]
   where
     sub = reachable (last sp) g
     dump n = let Just s = lab g n in concat [show n,
       if null (show s) then "[label=\"\", shape=circle, width=0.4, height=0.4" ++ co ++ "]" else "[label=\"" ++ show s ++ "\"" ++ co ++ "]",
       "\n", unlines $ dumpEdge n . fst <$> sortOn snd (lsuc g n)]
       where co | n `elem` sp = ", style=filled, fillcolor=red"
                | otherwise   = ""
     dumpEdge n t = show n ++ " -> " ++ show t

main = do
  s <- getContents
  case parse program "" s of
    Left err -> putStrLn $ "parse error: " ++ show err
    Right t -> do
      putStr $ unlines ["= Graph Reduction =", "",
        "----------------", s, "----------------"]
      putStr $ unlines $ map dotty $ reverse $ run t

Ben Lynn blynn@cs.stanford.edu 💡