Over the centuries, mathematicians have developed notation so eloquent that it is partly responsible for their progress. To quote George Boole: “That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted.”

Unfortunately, elegance is often lost in translation to a computer program. A short formula expands into several nested loops. Simple conditions become bulky if-then-else blocks. A few symbols that represent an infinite object may have no direct analogue.

However, if the target language is Haskell, we can retain much of the original mathematical flavour. Along with other features, this can make Haskell better for rapid prototyping than the usual suspects, such as Python.

For example, Jeremy Gibbons shows off incredible Haskell snippets that compute digits of Pi (and also quotes a cute C program for the first 15000 digits). Though not the main aim, the paper gives a good overview of Haskell.

In Volume 4 of *The Art of
Computer Programming*, Knuth mathematically
introduces Gray code as follows: “if Γ_{n} stands for the Gray binary
sequence of *n*-bit
strings, we can define Γ_{n} recursively by the two
rules Γ_{0} = ϵ; Γ_{n+1} = 0Γ_{n} , 1Γ_{n}^{R}”.

This translates eerily well to Haskell:

g 0 = [""] g n = (map ('0':)) (g (n-1)) ++ (map ('1':)) (reverse (g (n-1)))

Why is it so easy to write Haskell code to resemble a mathematical explanation?

At a lower level, Haskell is well-endowed with tools for
sequences (except they’re called *lists*): `++`

is concatenation, `reverse`

reverses a list, `map`

applies a function to each element of a
list, `:`

prepends an element to a
list, and so on.

At a higher level, pattern matching lets us write out the special cases and the general cases just as a mathematician would.

As a bonus, type inference means we can skip the ceremony
and go straight to defining the sequence. However, in
practice, types should sometimes be declared explicitly for
safety and clarity. For example, in the above it would
prevent attempts to feed fractions to `g`

.

As one might expect, type declarations also resemble their mathematical counterparts:

g :: Int -> [String]

Consider enumerating nested parentheses (say 5 pairs)
lexicographically, where `)`

is
considered smaller than `(`

. Knuth
presents a supernatural algorithm before revealing how it
works by describing its mathematical underpinnings. Let’s cut
to the chase:

Let

Abe the sequence of all strings_{pq}αthat containpleft parentheses andq≥pright parentheses, where`(`

^{q-p}αis properly nested, listed in lexicographic order…Aobeys the recursive rules_{pq}

A=_{pq}`)`

A_{p(q-1)},`(`

A_{(p-1)q}, if 0 ≤p≤q≠ 0;A_{00}= ϵ;also

Ais empty if_{pq}p< 0 orp>q.

Again, in Haskell, the resemblance is uncanny:

a :: Int -> Int -> [String] a 0 0 = [""] a p q | p < 0 || p > q = [] | otherwise = (map (')': ) (a p (q-1))) ++ (map ('(': ) (a (p-1) q))

(So for example `a 5 5`

lists
all ways of nesting 5 pairs of parentheses in lexicographic
order.)

By the way, “otherwise” is simply syntactic sugar for “True”; it’s one of many little niceties.

Another little nicety is the “where” clause:

heron a b c = sqrt(s * (s-a) * (s-b) * (s-c)) where s = (a+b+c)/2

Compare this with the phrasing of the Wikipedia entry on Heron’s formula.

Other languages force us to at least declare *s* before using it. Of course, if
preferred, we can respect this more conventional order in
Haskell:

heron a b c = let s = (a+b+c)/2 in sqrt(s * (s-a) * (s-b) * (s-c))

**Squares**:
Consider the sequence of positive squares in ascending order.
Mathematically we might write ⟨x^{2} | x = 1, 2, …⟩.
Thanks to *list
comprehension*, Haskell code looks similar:

sq = [ x*x | x <- [1..]]

Although this is an infinite sequence, it does not require infinite memory. In fact, thanks to lazy evaluation, it requires almost no memory until an element of the sequence is required for some calculation, in which case Haskell computes the minimum possible to find it.

For example, we can sum the first 10 squares:

sum (take 10 sq)

Haskell computes only 10 elements of the sequence (before summing them to get 385).

**Continued
fractions**: The continued fraction expansion
of *e* is [2; 1, 2, 1,
1, 4, 1, 1, 6, 1, … ].

In Haskell, we can write:

e = 2 : f 2 where f x = 1 : x : 1 : f (x+2)

Again, although this is an infinite sequence, it takes
almost no memory until an element of the sequence is
required. Type `e !! 23`

and
Haskell recurses just enough to return the 23rd element of
the sequence (16).

The recursive nature of continued fractions plays to Haskell’s strengths:

f 1 (x:xs) = fromIntegral x f n (x:xs) = fromIntegral x + 1 / (f (n-1) xs)

Then `f 32 e`

returns the 32nd
convergent.

**Fibonacci
sequence**:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

This famous one-liner is Haskell’s answer to a top-down dynamic programming Fibonacci number generator of other languages.

In *Power series, power serious*, Doug
McIlroy constructs a simple yet powerful system for
manipulating power series by utilizing Haskell’s operator
overloading, lazy evaluation, and first-class functions.
Among the many jaw-dropping highlights is the mutually
recursive:

sinx = integral cosx cosx = 1 - (integral sinx)

which amazingly generates the power series of both trigonometric functions.

McIlroy also identifies lazy evaluation with stream programming and coroutines; this is particularly poignant as he is the principal inventor of Unix pipes.

That Haskell has his approval should alone encourage others to take a close look, because McIlroy’s prowess with computers is the stuff of legend!

See also *Enumerating the strings of
regular languages*, another McIlroy Haskell
adventure.

A rich seam of theory lies beneath the surface of Haskell. One can ignore it and still appreciate Haskell. However, better to let curiosity get the better of you and read about the underlying computer science.

For starters, it’s simply fun to talk the talk. Lambda calculus. Combinators. Point-free programming. Hindley-Milner type inference, or Algorithm W (though Wikipedia says it should really be Damas-Milner). Functors. Monads. Each term reeks of erudition.

More importantly, you will gain knowledge beyond that of supposed experts, if I’m any guide. Despite holding an advanced degree in computer science, most of the material was new to me. In my defence, my specialty was cryptography, far removed from icky real-world matters such as programming languages, but I feel embarrassed nonetheless. So learn the theory, either to outsmart academics, or to prevent others outsmarting you!

I once thought purity was pointless because truly pure languages are impractical. At some point a computer must communicate with a human or another computer, yet I/O is impure.

In fact, initially I thought the situation was worse: without variables or anything else resembling a tape cell of a Turing machine, how could a pure function possibly be as powerful?

This is where the aforementioned theory comes in handy. It
turns out you can cleverly encode the integers as
strange-looking pure functions, and feed these into other
pure functions to perform operations on them. A clever trick
involving a *fixed-point
operator*, such as the *Y combinator*, allows recursion,
and this leads to a proof that pure functions and Turning
machines can compute the same functions.

Thus pure functions can be equally powerful, so apart from little real-world inconveniences like I/O, a pure language ought to be practical. Good notation for pure functions may be worth the hassle: lack of side effects allows deeper analysis and optimization.

Of course, those inconveniences must be supported, hence we require some impurity. However, like dangerous chemicals, improperly handled impurities can contaminate a functional language.

Amongst functional languages, Haskell is particularly good at cordoning off the dirty parts.

Despite its terse, powerful notation, Haskell is a strongly-typed compiled language. It outperforms popular scripting languages, and some compiled languages.

Though when speed matters, one should resort to C or lower. For example, the program below, also described by Knuth, prints all ways of nesting 5 pairs of parentheses in lexicographic order.

#include <stdio.h> int main() { int n = 5; char s[2*n+2], *end = s + 2*n-1, *m = end; *s = end[1] = ')', end[2] = 0; for(char *c = s+1; c <= end; *c++ = '(', *c++ = ')'); for(;; ) { puts(s+1); *m = ')'; if (')' == *--m && (*m = '(')) continue; char *j = m; for(char *k = end; '(' == *j; *j-- = ')', *k = '(', k-=2); if (s == j) break; *j = '(', m = end; } return 0; }

Contrast this to the Haskell version. The C version modifies a buffer in place on the stack. There are no function calls, apart from the I/O. No boundary checks, no garbage collection, no memory allocation. It is hard to imagine how a Haskell compiler could generate code this fast. Indeed, it is a nontrivial task for a human!

We’ll never escape books like Knuth’s, nor languages like C. It’s nice to have cute mathematical descriptions, and clever languages in which to implement them, but eventually somebody must mess with the details to discover low-level shortcuts.

Lazy evaluation is a double-edged sword. In theory, it beats strict evaluation because, roughly speaking, a computation that can halt will halt. It is why we can play with infinite sets, so long as we only compute on finite subsets. In contrast, strict evaluation cannot get started because it tries to instantiate an infinite set before operating on it. Laziness meshes well with purity.

However, lazy evaluation complicates profiling and optimization. Consider computing the mean of some numbers. In low-level languages, we can easily do this in constant space in a single linear loop. In contrast, the most natural Haskell code for this calculation uses at least linear space and squanders most of the CPU on garbage collection because of lazy evaluation. There are ways to force strict evaluations, but such optimization makes Haskell uglier and requires expertise.

In his 2003 talk, *Wearing the hair shirt: a retrospective on
Haskell*, Simon Peyton Jones asserts that
laziness does not matter; rather, it is purity that is
important.

So how should a language support lazy evaluation? I don’t
know the best answer, but I do know of one notation that has
worked well for decades: UNIX pipes. Among others, the Go language
provides typed pipes in the form of channels. It is too early
to say for sure, but perhaps this is the right compromise.
Compare Rob Pike’s *Squinting at Power Series*, a
channel-based approach to power series, to McIlroy’s Haskell
version.

Though they caution against using macros throughout
*The Practice of
Programming*, in section 9.6, Kernighan and Pike
give examples of code that become clearer with textual
substitution.

Similarly, the goto statement is widely frowned upon, yet Knuth shows how goto makes some code clearer or faster.

Sometimes, the problem is that the code is repetitive, but the language lacks expressive power to factor out the common part. We might lack a “if the condition fails then skip the rest of the if blocks” statement, or a “start a timer, execute the given statement 10 times, then stop the timer” construct.

For these situations, Haskell has monads. Like macros and goto, monads can eliminate repetitive code, except they can do so cleanly. Monads are to noninvasive surgery as macros are to butchery.

Aspect-oriented programming appears related, though I have not investigated this.