= Equational Reasoning = Let's take a well-known programming problem, and solve it with generous helpings of _equational reasoning_. This just means if we know _x_ = _y_, then anywhere we see _x_ we may replace it by _y_ and vice versa. This sounds simple and natural, but fails in many languages due to side effects. In contrast, much of Haskell is pure, so we greatly benefit from equational reasoning. Define a binary tree data structure that stores items of type `a` in its internal nodes, and stores nothing in its leaves:
[ ]:
data Tree a = N | B a (Tree a) (Tree a) deriving Show
An inorder traversal produces a list as follows:
[ ]:
inorder :: Tree a -> [a] inorder = \case N -> [] B x l r -> inorder l <> [x] <> inorder r
A preorder traversal produces a list as follows:
[ ]:
preorder :: Tree a -> [a] preorder = \case N -> [] B x l r -> [x] <> preorder l <> preorder r
Examples:
[ ]:
leaf c = B c N N tree1 = B 'b' (leaf 'a') (leaf 'c') tree2 = B 'b' (B 'x' (leaf 'a') N) (B 'y' N (leaf 'c')) [inorder, preorder] <*> [tree1] [inorder, preorder] <*> [tree2]
*Puzzle*: Recover a binary tree of distinct elements from its inorder and preorder traversals. In other words, define a function: ---------------------------------------------------------------- f :: Eq a => [a] -> [a] -> Tree a ---------------------------------------------------------------- that satisfies, for all `t` of type `Tree a`: ---------------------------------------------------------------- f (preorder t) (inorder t) == t ---------------------------------------------------------------- In the old days, to solve this, I might have racked my brain, trying to bring to the surface what I know about binary trees. I might have studied other puzzles about binary trees. I might have tried small examples to see if they cast light on a general solution. These are good strategies, but then I read _Pearls of Functional Algorithm Design_, where https://en.wikipedia.org/wiki/Richard_Bird_(computer_scientist)[Richard Bird] aims "to see to what extent algorithm design can be cast in a familiar mathematical tradition of calculating a result by using well-established mathematical principles, theorems and laws." Of course! I calculate 123 + 456 + 789 by robotically following standard recipes, reserving my intellect for greater challenges. Why not do the same when programming? Be smart by being stupid! Thus we begin with structural induction on `t`, which by definition splits into two cases. Case `t == N`: By definition: ---------------------------------------------------------------- preorder N = [] inorder N = [] ---------------------------------------------------------------- thus equational reasoning implies: ---------------------------------------------------------------- f [] [] = N ---------------------------------------------------------------- Case `t == B x l r`: ---------------------------------------------------------------- f (preorder (B x l r)) (inorder (B x l r)) == B x l r ---------------------------------------------------------------- We rewrite the left-hand side using the definitions of `preorder` and `inorder`: ---------------------------------------------------------------- f (x : preorder l <> preorder r) (inorder l <> [x] <> inorder r) ---------------------------------------------------------------- where we apply the identity `[a] <> bs == a : bs`. By inductive assumption, the right-hand side is: ---------------------------------------------------------------- B x (f (preorder l) (inorder l)) (f (preorder r) (inorder r)) ---------------------------------------------------------------- Let's write: ---------------------------------------------------------------- plpr = preorder l <> preorder r ilxir = inorder l <> [x] <> inorder r ---------------------------------------------------------------- to work towards a definition of `f`: ---------------------------------------------------------------- f (x : plpr) ilxir == B x (f (preorder l) (inorder l)) (f (preorder r) (inorder r)) ---------------------------------------------------------------- A decent start, but we must find some way to write various terms on the right-hand side in terms of `x`, `plpr` and `ilxir`. Sadly, at this point, we need to think. Since the elements are distinct, we can show `+++not (x `elem` inorder l)+++`, hence: ---------------------------------------------------------------- (inorder l, x:inorder r) == break (==x) ilxir ---------------------------------------------------------------- The `splitAt` function has the property: ---------------------------------------------------------------- (x, y) == splitAt (length x) (x <> y) ---------------------------------------------------------------- By induction, we can show for all `x`: ---------------------------------------------------------------- length (preorder x) == length (inorder x) ---------------------------------------------------------------- Altogether, we have calculated:
[ ]:
f :: Eq a => [a] -> [a] -> Tree a f [] [] = N f (x : plpr) ilxir = B x (f pl il) (f pr ir) where (il, _:ir) = break (== x) ilxir (pl, pr) = splitAt (length il) plpr
Solved! But we had to turn off autopilot for the last stretch.
[ ]:
f "abc" "bac" f "axbyc" "bxayc"
Nevertheless, blindly attempting structural induction and expanding definitions gave us a good head start, and although we had to work for our meal in the end, by sticking to equational reasoning: * We proved our code is correct. * We proved a solution exists and must be unique. * We can pinpoint the step that needs the elements to be distinct. My old self would likely written a solution in C that was "obviously correct" and have trouble with these tasks. Equational reasoning is possible in some parts of C such as arithmetic expressions, but it covers so little of the language that we must resort to other methods to show correctness. This exercise naturally leads to: *Puzzle*: Write a program that automates calculations like the above. [Originally I had planned to hold off on this article until after I had solved this!] The structural induction and expansion of definitions seem easy to automate, and also the identity `[a] <> bs == a : bs`. The `splitAt` property we used could conceivably be built-in. But how would a program figure out `break` and its connection with the condition that the elements are distinct? And how would it discover the lengths of the two different traversals must always match? Why do natural numbers arise in a problem about lists and trees? Is it because of how I think, or is it fundamental to the problem? Perhaps Peano-encoding numbers as lists would make certain connections easier to see, that is, we represent the natural `n` with `replicate n ()`, in which case `length = map (const ())`, and so on.