Warts and All
Even the relatively young field of computer science is haunted by entrenched traditions.
Take TAB characters in Makefiles. Why was this strange convention never changed? In the original author’s words: "I didn’t want to screw up my embedded base".
Or consider the artificial divide between statements and expressions that still plagues many programming languages. This choice was natural for FORTRAN, partly because of the limitations of punch-cards, but in modern times, such segregation is counterproductive.
GHC takes the righteous path. The language atones for historical accidents, becoming clearer and more logical over time, and although code will mostly be backwards compatible, breakages are possible. More accurately, it’s not the language itself but the core libraries that are changing.
AMP
The Functor-Applicative-Monad Proposal (AMP) was implemented in GHC version 7.10.
Roughly speaking, a parameterized type m is a functor if we can meaningfully define the function:

(<$>) :: (a -> b) -> m a -> m b  -- Also known as: fmap, liftM, liftA.
In addition, m is an applicative if we can break down this function into two pieces:

pure  :: a -> m a                  -- Also known as: return.
(<*>) :: m (a -> b) -> m a -> m b  -- (<$>) = (<*>) . pure
In addition, m is a monad if we can work with functions of type a -> m b:
(>>=) :: m a -> (a -> m b) -> m b
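To make the three interfaces concrete, here is a small example of my own using Maybe, which is a functor, an applicative, and a monad:

```haskell
main :: IO ()
main = do
  -- Functor: map a plain function over a wrapped value.
  print $ (+ 1) <$> Just 2                 -- Just 3
  -- Applicative: wrap the function with pure, then apply with (<*>).
  print $ pure (+) <*> Just 2 <*> Just 3   -- Just 5
  -- Monad: feed the wrapped value to a function returning a wrapped value.
  print $ Just 2 >>= (\x -> Just (x + 3))  -- Just 5
```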
Despite being the middle child, applicatives were added to Haskell after monads. And because the family resemblance of functors, applicatives, and monads was once underappreciated, different names arose for the same functions.
Synonyms can be useful, but I find names like >> and *> equally illuminating. And chaining <*> beats functions like liftM5: why can’t the compiler figure out how many arguments should be lifted?
Moreover, much of the time, functors are applicatives, and applicatives are monads (exercise: think of counterexamples). If they’re usually the same, why not use the same notation?
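For instance, chaining <*> subsumes the entire liftM2, liftM3, … family. A quick sketch of my own over Maybe:

```haskell
import Control.Monad (liftM3)

main :: IO ()
main = do
  -- Fixed-arity lifting: a differently named function for each argument count.
  print $ liftM3 (,,) (Just 1) (Just 2) (Just 3)
  -- Applicative chaining handles any arity with the same two operators.
  print $ (,,) <$> Just 1 <*> Just 2 <*> Just 3
```

Both lines print Just (1,2,3); the second style needs no new name per arity.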
Apart from sleeping more soundly knowing this wart has been removed, I mostly look forward to removing verbose Control.Applicative imports.
FTP
The Foldable/Traversable in Prelude proposal is sometimes called The Burning Bridges Proposal by its friends. It also found its way to GHC 7.10.
Haskell was designed to make lists easy to use. This is a good choice, as lists are simple to understand and implement, flexible, and efficient in certain circumstances.
However, we often need data structures such as Data.Sequence, or Data.Tree, which share some features with lists. Since Prelude and Data.List hogged all the good function names like "sum", "length", "foldl", "mapM", we’re forced to qualify our imports:
import qualified Data.Sequence as S
import qualified Data.Foldable as F

main = do
  print $ length [1..10]
  print $ S.length $ S.fromList [1..10]
  print $ sum [1..10]
  print $ F.sum $ S.fromList [1..10]
FTP fixes this by precisely nailing down the common properties of these data structures with type classes, and redefining the good function names so that they automatically work on all appropriate data structures:
import Data.Sequence (fromList)

main = do
  print $ length [1..10]
  print $ length $ fromList [1..10]
  print $ sum [1..10]
  print $ sum $ fromList [1..10]
One might say even C++11 beat Haskell to the punch in this regard, as the same range-based for loop (with auto for type inference) works on lists, vectors, sets, and so on.
At long last, GHC will end the tyranny of Data.List.
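To see how FTP pulls this off from the other side, here is a sketch of my own: define one Foldable instance for a homemade data structure, and the good function names work on it automatically:

```haskell
import Data.Foldable (toList)

-- A bare-bones binary tree, not one of the library types above.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- One foldr definition (an in-order traversal) unlocks
-- sum, length, maximum, toList, and friends.
instance Foldable Tree where
  foldr _ z Leaf         = z
  foldr f z (Node l x r) = foldr f (f x (foldr f z r)) l

main :: IO ()
main = do
  let t = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)
  print $ sum t     -- 6
  print $ length t  -- 3
  print $ toList t  -- [1,2,3]
```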
In addition, the OverloadedLists GHC extension liberates the square bracket notation:
{-# LANGUAGE OverloadedLists #-}
import Data.Set (Set)

s :: Set Char
s = ['0'..'9']
Semigroup and Monoid
A monoid is a semigroup with an identity. Enough code has been written that we now know it is worth teaching the compiler this fact, just as it was worth making Applicative a superclass of Monad.
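A sketch of what this buys us, using a MinCap type of my own invention (minimum under a cap): once the compiler knows every Monoid is a Semigroup, mappend comes for free from <>, and an instance only has to supply mempty:

```haskell
-- Minimum under a cap: <> takes the smaller value, the cap is the identity.
newtype MinCap = MinCap Int deriving (Show, Eq)

instance Semigroup MinCap where
  MinCap a <> MinCap b = MinCap (min a b)

-- Since Semigroup became a superclass of Monoid,
-- mappend defaults to (<>); only the identity is left to define.
instance Monoid MinCap where
  mempty = MinCap 100

main :: IO ()
main = do
  print $ MinCap 42 <> MinCap 7          -- MinCap 7
  print $ mconcat [MinCap 42, MinCap 7]  -- MinCap 7
  print (mempty :: MinCap)               -- MinCap 100
```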
Workarounds
Sadly, even in Haskell, some wounds are likely too deep to heal. We can paper over the cracks, for a cost.
Working around a few of the problems is simply a matter of style, which naturally changes as one gains more Haskell experience. For example, nowadays I am less afflicted by Boolean blindness, because over time I grew more comfortable with pattern matching.
In contrast, fixes such as replacing the standard prelude require at least two lines. If we cared enough about this stuff, we would have to add boilerplate to every program we write.
My code hardly leaves the prototype stage, so to avoid imports and language extensions, I tend to follow outdated practices. I use String to avoid imports and the OverloadedStrings extension. I use partial functions because their names are so succinct. I use a list of two elements instead of a tuple so I can apply the same function to both elements with map, which I find handier than importing Control.Arrow and calling join (***).
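For the record, here is my own sketch of the two styles side by side: the list trick with map against the proper pair version, where join specializes (***) to apply one function to both components:

```haskell
import Control.Arrow ((***))
import Control.Monad (join)

main :: IO ()
main = do
  -- The outdated trick: a two-element list stands in for a pair.
  print $ map (* 2) [3, 4]          -- [6,8]
  -- The tuple version: join (***) f = f *** f, needing two imports.
  print $ join (***) (* 2) (3, 4)   -- (6,8)
```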
However, part of me detests these transgressions, especially when they cause crashes. I’m tempted to rig a preprocessor that inserts boilerplate for me so I can write in a more modern style without the clutter.
A Sinister Fold
One workaround is forced upon us even in prototype code due to performance issues.
The function foldl is lazy, which means:

foldl1 (+) [1..5]

expands to (((1 + 2) + 3) + 4) + 5 in memory before being evaluated. On longer lists, this is unbearably slow, and we risk stack overflow.
The solution is to call the eager variant foldl1' instead:

import Data.List
foldl1' (+) [1..5]
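A quick demonstration of my own on a list long enough to matter (whether the lazy version actually overflows depends on GHC’s stack settings and optimization level):

```haskell
import Data.List (foldl')

main :: IO ()
main = do
  -- Eager: each partial sum is forced immediately; runs in constant space.
  print $ foldl' (+) 0 [1 .. 10 ^ 6]  -- 500000500000
  -- Lazy: builds a million-deep chain of thunks before evaluating anything;
  -- the same answer, but at the cost of all that memory.
  print $ foldl (+) 0 [1 .. 10 ^ 6]   -- 500000500000
```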
So what’s the point of foldl in the first place? We might think a lazy foldl suits some problems because the lazy foldr can be quite dexterous:
> foldr1 (&&) $ repeat False  -- An infinite list.
False
However, by definition, foldl cannot short-circuit on infinite lists like foldr can: it must expand the input list entirely before getting started. We, too, are back where we started: why would we ever prefer the lazy foldl to its eager variant foldl'?
As far as I can tell, a lazy foldl is only applicable to cases such as:
let f x y = if y == 5 then 42 else x in foldl1 f [1..5]
In other words, even though the expression is expanded in memory, at least the evaluation can short-circuit on the first invocation of f: Haskell notices y == 5, so simply returns 42 and skips evaluating the rest of the expression. We can see this by unsafely inserting impure code:
> import Debug.Trace
> import Data.List
> let f x y = if y == 5 then 42 else x
> foldl1 (\x y -> traceShow y $ f x y) [1..5]
5
42
> foldl1' (\x y -> traceShow y $ f x y) [1..5]
2
3
4
5
42
Even so:

- We may still be better off with eager evaluation. Needlessly evaluating a function may be cheaper than constructing a huge expression on the stack.
- In the above, we achieve a better short-circuit with foldr1 (flip f) [5,4..1]: not only do we stop evaluation after the first invocation of f, but we also stop expanding the expression.
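Tracing confirms the second point; this is my own check, reusing f from above:

```haskell
import Debug.Trace (traceShow)

f :: Int -> Int -> Int
f x y = if y == 5 then 42 else x

main :: IO ()
main =
  -- Reversing the list and flipping f lets foldr1 see y == 5 immediately:
  -- a single traced invocation, and no pile-up of unevaluated thunks.
  print $ foldr1 (flip (\x y -> traceShow y (f x y))) [5,4..1]
```

Only one trace message appears before the result 42.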
We can guess why the default foldl is lazy. Its authors probably desired consistency: everything else (like foldr) is lazy, so surely foldl should be lazy too. However, I believe the foldl and foldl' functions should be swapped, that is, the left fold ought to be eager by default. Functions like maximum should also be eager by default.
This would spare Haskell novices from learning about the intricacies of lazy evaluation when all we want to do is write working code.
As for the few times a lazy left fold really is warranted: only Haskell experts would care, so they’re the ones who should be forced to import some module or other, and redecorate their code with extra punctuation.
If the two functions were swapped today, just about all existing Haskell programs would continue to work, except they’d be faster, and the rare breakage would be easy to fix.
For now, we have to put up with this wart of the Haskell Prelude.
If we know a list is short, we can get away with foldl, maximum, and so on. Otherwise, we must import Data.List and append apostrophes to every foldl and foldl1, and also replace functions like maximum with foldl1' max.
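For example, my own illustration of that last substitution:

```haskell
import Data.List (foldl1')

main :: IO ()
main =
  -- An eager left fold over max, standing in for maximum on long lists.
  print $ foldl1' max [3, 1, 4, 1, 5, 9, 2, 6]  -- 9
```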