Over the centuries, mathematicians have developed notation so eloquent that it
is partly responsible for their progress. To quote George Boole: “That
language is an instrument of human reason, and not merely a medium for the
expression of thought, is a truth generally admitted.”
Unfortunately, elegance is often lost in translation to a computer program. A
short formula expands into several nested loops. Simple conditions become bulky
if-then-else blocks. A few symbols that represent an infinite object may have
no direct analogue.
However, if the target language is Haskell, we can retain much of the original
mathematical flavour. It’s one of the reasons why Haskell is better for rapid
prototyping than the usual suspects, such as Python.
Gray code
In Volume 4 of The Art of Computer Programming, Knuth mathematically
introduces Gray code as follows: “if Γ_n stands for the Gray binary
sequence of n-bit strings, we can define Γ_n recursively by the two rules
Γ_0 = ϵ; Γ_{n+1} = 0Γ_n, 1Γ_n^R”.
This translates eerily well to Haskell:
g 0 = [""]
g n = map ('0':) (g (n-1)) ++ map ('1':) (reverse (g (n-1)))
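For example, g 3 yields ["000","001","011","010","110","111","101","100"], the
3-bit Gray code: consecutive strings differ in exactly one bit.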
Why is it so easy to write Haskell code to resemble a mathematical explanation?
At a lower level, Haskell is well-endowed with tools for sequences (except
they’re called lists): ++ is concatenation, reverse reverses a list,
map applies a function to each element of a list, : prepends an element
to a list, and so on.
At a higher level, pattern matching lets us write out the special cases and
the general cases just as a mathematician would.
As a bonus, type inference means we can skip the ceremony and go straight to
defining the sequence. In practice, though, types are sometimes worth declaring
explicitly for safety and clarity; for the definition above, an explicit type
would prevent attempts to feed fractions to g.
As one might expect, type declarations also resemble their mathematical counterparts:
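g :: Int -> [String]
Here Int stands in for n, and [String] for a sequence of bit strings.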
Nested parentheses
Consider enumerating nested parentheses (say 5 pairs) lexicographically, where
) is considered smaller than (. Knuth presents a supernatural algorithm
before revealing how it works by describing its mathematical underpinnings.
Let’s cut to the chase:
Let A_{pq} be the sequence of all strings α that contain p left
parentheses and q ≥ p right parentheses, where (^(q−p) α (that is, α preceded
by q−p left parentheses) is properly nested, listed in lexicographic
order… A_{pq} obeys the recursive rules
A_{pq} = )A_{p(q−1)}, (A_{(p−1)q}, if 0 ≤ p ≤ q ≠ 0; A_{00} = ϵ;
also A_{pq} is empty if p < 0 or p > q.
Again, in Haskell, the resemblance is uncanny:
a :: Int -> Int -> [String]
a 0 0 = [""]
a p q | p < 0 || p > q = []
      | otherwise      = map (')':) (a p (q-1)) ++ map ('(':) (a (p-1) q)
(So for example a 5 5 lists all ways of nesting 5 pairs of parentheses in lexicographic order.)
By the way, “otherwise” is not a keyword; it is simply defined as True in the
Prelude. It’s one of many little niceties.
Another little nicety is the “where” clause:
heron a b c = sqrt(s * (s-a) * (s-b) * (s-c)) where s = (a+b+c)/2
Other languages force us to at least declare s before using it. Of course, if
preferred, we can respect this more conventional order in Haskell:
heron a b c = let s = (a+b+c)/2 in sqrt(s * (s-a) * (s-b) * (s-c))
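Either way, heron 3 4 5 returns 6.0, the area of the 3-4-5 right triangle.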
Taming infinity
Squares: Consider the sequence of positive squares in ascending order.
Mathematically we might write ⟨x² | x = 1, 2, …⟩. Thanks to list
comprehension, Haskell code looks similar:
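One way to write it, naming the list squares, is:
squares = [x^2 | x <- [1..]]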
Although this is an infinite sequence, it does not require infinite memory. In
fact, thanks to lazy evaluation, it requires almost no memory until an element
of the sequence is required for some calculation, in which case Haskell
computes the minimum possible to find it.
For example, we can sum the first 10 squares:
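sum (take 10 squares)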
Haskell computes only 10 elements of the sequence
(before summing them to get 385).
Continued fractions: The continued fraction expansion of e is [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, … ].
In Haskell, we can write:
e = 2 : f 2 where f x = 1 : x : 1 : f (x+2)
Again, although this is an infinite sequence, it takes almost no memory until
an element of the sequence is required. Type e !! 23 and Haskell recurses just
enough to return the element at index 23, namely 16.
The recursive nature of continued fractions plays to Haskell’s strengths:
-- Evaluate the continued fraction given by the first n terms of the list.
f 1 (x:_)  = fromIntegral x
f n (x:xs) = fromIntegral x + 1 / f (n-1) xs
Then f 32 e returns the 32nd convergent.
Fibonacci numbers: Self-reference and laziness combine in the famous one-liner
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
which is Haskell’s answer to the dynamic-programming Fibonacci generators of
other languages: each element is computed once, from the two before it.
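For example, take 10 fibs yields [0,1,1,2,3,5,8,13,21,34].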
Serious power
In Power Series, Power Serious, Doug McIlroy constructs a simple yet powerful
system for manipulating power series using Haskell’s operator overloading, lazy
evaluation, and first-class functions. Among the many jaw-dropping highlights
is this mutually recursive definition:
sinx = integral cosx
cosx = 1 - (integral sinx)
which amazingly generates the power series of both trigonometric functions.
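McIlroy overloads the numeric operators so that those two lines work on series
verbatim. Here is a bare-bones sketch of the same idea without the overloading
(the representation, the helper integral, and the hand-expanded cosx are ours,
not McIlroy’s):
type Series = [Rational]
-- Termwise integration: ∫ (a0 + a1 x + a2 x² + …) = 0 + a0 x + (a1/2) x² + …
integral :: Series -> Series
integral fs = 0 : zipWith (/) fs [1..]
-- sin = ∫ cos and cos = 1 - ∫ sin; since integral always begins with 0,
-- "1 - integral sinx" is 1 followed by the negated remaining coefficients.
sinx, cosx :: Series
sinx = integral cosx
cosx = 1 : map negate (tail (integral sinx))
For instance, take 6 sinx yields the exact coefficients [0, 1, 0, -1/6, 0,
1/120], i.e. x − x³/6 + x⁵/120 − …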
McIlroy also identifies lazy evaluation with stream programming and coroutines;
this is particularly poignant as he is the principal inventor of Unix pipes.
Intellectual snobbery
A rich seam of theory lies beneath the surface of Haskell. One can ignore it
and still appreciate Haskell. It’s better, though, to let curiosity get the
better of you and read about the underlying computer science.
For starters, it’s simply fun to talk the talk. Lambda calculus. Combinators.
Point-free programming. Hindley-Milner type inference, or Algorithm W (though
Wikipedia says it should really be Damas-Milner). Functors. Monads. Each term
reeks of erudition.
More importantly, you will gain knowledge beyond that of supposed experts, if
I’m any guide. Although I hold an advanced degree in computer science, most of
the material was new to me. In my defence, my specialty was cryptography, far
removed from icky real-world matters such as programming languages, but I feel
embarrassed nonetheless. So learn the theory, either to outsmart academics, or
to prevent others from outsmarting you!
Purity
I once thought purity was pointless because truly pure languages are
impractical. At some point a computer must communicate with a human or another
computer, yet I/O is impure.
In fact, initially I thought the situation was worse: without variables or
anything else resembling a tape cell of a Turing machine, how could a pure
function possibly be as powerful?
This is where the aforementioned theory comes in handy. It turns out you
can cleverly encode the integers as strange-looking pure functions, and feed
these into other pure functions to perform operations on them. A clever trick
involving a fixed-point operator, such as the Y combinator, allows recursion,
and this leads to a proof that pure functions and Turing machines can compute
the same functions.
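To make the encoding concrete, here is a sketch of one such scheme, Church
numerals, where a number n is represented by the pure function that applies
its argument n times (the fixed-point part is omitted, since typing the Y
combinator directly needs extra machinery):
type Church a = (a -> a) -> a -> a
zero, one, two :: Church a
zero _ x = x          -- apply f zero times
one  f x = f x        -- apply f once
two  f x = f (f x)    -- apply f twice
-- Successor and addition are themselves pure functions on these encodings.
suc :: Church a -> Church a
suc n f x = f (n f x)
add :: Church a -> Church a -> Church a
add m n f x = m f (n f x)
-- Decode by counting applications of (+1) starting from 0.
toInt :: Church Int -> Int
toInt n = n (+ 1) 0
For instance, toInt (add two (suc two)) evaluates to 5.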
Thus pure functions can be equally powerful, so apart from little real-world
inconveniences like I/O, a pure language ought to be practical. Good notation
for pure functions may be worth the hassle: lack of side effects allows deeper
analysis and optimization.
Of course, those inconveniences must be supported, hence we require some
impurity. However, like dangerous chemicals, improperly handled impurities can
contaminate a functional language. Haskell is particularly good at cordoning
off the impure parts of a program.
There are times we must resort to C or lower. For example, the program below,
also described by Knuth, prints all ways of nesting 5 pairs of parentheses in
lexicographic order.
#include <stdio.h>

// Print all nestings of n pairs of parentheses in lexicographic order,
// modifying a single buffer in place.
int main() {
  int n = 5;
  char s[2*n+2], *end = s + 2*n-1, *m = end;
  *s = end[1] = ')', end[2] = 0;
  for(char *c = s+1; c <= end; *c++ = '(', *c++ = ')');
  for(;;) {
    puts(s+1);
    *m = ')';
    if (')' == *--m && (*m = '(')) continue;
    char *j = m;
    for(char *k = end; '(' == *j; *j-- = ')', *k = '(', k-=2);
    if (s == j) break;
    *j = '(', m = end;
  }
  return 0;
}
Contrast this to the Haskell version. The C version modifies a buffer in place
on the stack. There are no function calls, apart from the I/O. No boundary
checks, no garbage collection, no memory allocation, no runtime system. It is
hard to imagine how a Haskell compiler could generate code this fast. Indeed,
it is a nontrivial task for a human! We’ll never escape books like Knuth’s, nor
languages like C.
Lazy evaluation
Lazy evaluation is a double-edged sword. In theory, it beats strict evaluation
because, roughly speaking, a computation that can halt will halt. It is why we
can play with infinite sets, so long as we only compute on finite subsets. In
contrast, strict evaluation cannot get started because it tries to instantiate
an infinite set before operating on it. Laziness also meshes well with purity.
However, lazy evaluation complicates profiling and optimization. Consider
computing the mean of some numbers. In low-level languages, we can easily do
this in constant space in a single linear loop. In contrast, the most natural
Haskell code for this calculation uses at least linear space and squanders most
of the CPU on garbage collection because of lazy evaluation. There are ways to
force strict evaluation, but such optimization makes Haskell uglier and
requires expertise.
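As a sketch of the trade-off (both function names are ours): the obvious
definition walks the list twice, so the whole list must be kept alive between
the passes, whereas a single strict left fold consumes it in constant space.
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')
-- Natural but space-hungry: sum and length each traverse the entire list.
meanNaive :: [Double] -> Double
meanNaive xs = sum xs / fromIntegral (length xs)
-- Forcing strictness: one strict fold carries the running total and count,
-- so each element can be discarded as soon as it has been added.
meanStrict :: [Double] -> Double
meanStrict xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (!s, !n) x = (s + x, n + 1)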
Unix pipes are more-or-less equivalent to lazy evaluation, as are the channels
found in the Go language, which are essentially statically typed pipes. Compare
Rob Pike’s
Squinting at Power
Series, a channel-based approach to power series, to McIlroy’s Haskell
version.
Monads
Though they caution against using macros throughout
The Practice of Programming, in section 9.6, Kernighan and Pike give
examples of code that become clearer with textual substitution.
Sometimes the problem is that the code is repetitive, but the language lacks
the expressive power to factor out the common part. We might lack an “if the
condition fails then skip the rest of the if blocks” statement, or a “start a
timer, execute the given statement 10 times, then stop the timer” construct.
For these situations, Haskell has monads. Like macros and goto, monads can
eliminate repetitive code, except they can do so cleanly. Monads are
to noninvasive surgery as macros are to butchery.
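As a small illustration (the functions below are invented for the example, not
taken from Kernighan and Pike): in the Maybe monad, a step that fails makes
the rest of the do block skip automatically, which is exactly the “if the
condition fails then skip the rest” behaviour, with no nested ifs in sight.
import Text.Read (readMaybe)
-- Each stage may fail; Nothing short-circuits everything after it.
parsePositive :: String -> Maybe Int
parsePositive s = do
  n <- readMaybe s                      -- Nothing if s is not an integer
  if n > 0 then Just n else Nothing
halveEven :: Int -> Maybe Int
halveEven n = if even n then Just (n `div` 2) else Nothing
pipeline :: String -> Maybe Int
pipeline s = do
  n <- parsePositive s                  -- on failure, the rest is skipped
  m <- halveEven n
  return (m + 1)
Here pipeline "10" is Just 6, while pipeline "7" and pipeline "oops" are both
Nothing, and pipeline itself never mentions failure.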