The land of the point-free

What is an injective function?

The first time around, I was taught a function \(f : X \to Y\) is injective or one-to-one if for all \(a, b \in X\):

\[ f(a) = f(b) \implies a = b \]

Can we rewrite this definition without mentioning set elements? This might sound hopeless: how can we speak of injectivity if we’re forbidden to name individual elements?

Here’s a hint: consider function composition instead of function application, and consider pairs of functions instead of pairs of elements. From this, we might try defining a function \(f : X \to Y\) to be injective if all for functions \(g, h:1 \to X\):

\[ f \cdot g = f \cdot h \implies g = h \]

where \(1\) denotes a singleton set. The idea is to select elements of \(X\) with \(g\) and \(h\). We cannot name \(a\) and \(b\) directly, but we sure can name functions that map to them!

However, we’re cheating: to talk of singleton sets is to count elements of sets. We’ll soon find a way around this, but for now, we clean up by observing that everything works even if we relax the condition that \(g\) and \(h\) map from a singleton set. That is, \(f : X \to Y\) is injective if for all sets \(Z\) and all functions \(g, h:Z \to X\):

\[ f \cdot g = f \cdot h \implies g = h \]

This high-level element-free approach is fun to visualize with arrows and circles:

YfXghZ

In contrast, the diagram would be less pleasant if we had to depict elements of sets as well. We’d probably have to enlarge the circles and add big dots to represent individual elements.

Less is more

Even programmers with scant mathematical training appreciate working at a higher level. Details are important, but we benefit from abstracting them away. Nobody wants to write complex software in assembly language. As the most famous poem by Wang Zhihuan puts it: to see a thousand miles, go up another level.

Above, our original definition involved function application and set elements. The high-level version replaces these with just function composition. Besides prettier diagrams, this beats the low-level version in at least 3 ways:

  • Suppose we wish to prove a theorem. Since the high-level view involves fewer concepts, proofs are easier.

  • We instantly gain a dual by reversing the arrows, that is, we swap the sources and destinations of functions, and flip compositions.

  • We can generalize to any setting where composition of function-like things makes sense. That is, there need not be any notion of set elements or function application. All we need is composition.

We illustrate the first point by showing the composition of two injective functions is injective. Suppose \(f, f'\) are injective. Then given \(f \cdot f' \cdot g = f \cdot f' \cdot h\), we wish to show \(g = h\). The only obvious first step is to use our definition and the injectivity of \(f\) to obtain \(f' \cdot g = f' \cdot h\). Then doing the same for the injectivity of \(f'\) yields \(g = h\). Done!

As for the second point, the dual of an injective function is a function \(f : Y \to X\) such that for all \(g, h: X \to Z\):

\[ g \cdot f = h \cdot f \implies g = h \]

This characterizes surjective or onto functions. Furthermore, mechanically taking the dual of our proof shows the composition of surjections is surjective. Could you get all this by tweaking the conventional low-level definition of injections?

YfXghZ

For the last point, we give an example of a function-like thing. Instead of sets, let \(X, Y, Z\) be elements of a poset, and let \(f:X \to Y\) indicate \(X \le Y\). In this setting, there is at most one “function” between any two “sets”, and composition is well-behaved because the poset relation is transitive. Nothing corresponds to set elements or function application, yet our high-level definitions still make sense.

Category Theory

To avoid scare quotes, we say arrow for a generalized function, and object for a generalized set. We also say monic and epic arrows for generalized injections and surjections. A rich field known as category theory develops from merely stipulating that:

  1. An identity arrow exists for each object:

    YidYfXidX
    \[ f \cdot id_X = f = id_Y \cdot f \]
  2. Composition of arrows is associative:

    ZhYgXfW
    \[ h \cdot (g \cdot f) = (h \cdot g) \cdot f \]

The first axiom often leads to reflection laws. For example, identity arrows are monic and epic.

The second axiom often leads to fusion laws. For example, the composition of two monic arrows is monic, and similarly for epic arrows.

[I’m not sure what the official definitions of reflection and fusion are, but this seems to fit what I’ve observed.]

Something from nothing

The empty set is the set with no elements. Can we also rewrite this definition without mentioning elements?

Yes! The empty set is the set \(X\) such that for every set \(Y\), there exists a unique function \(f:X \to Y\) (the empty function).

YfX

Sometimes this is written \(\exists ! f\), but instead I’m trying to save room by putting exclamation marks in the arrow shaft in the diagram.

How about the dual? That is, what is the set \(X\) where for each set \(Y\) there exists a unique function \(f:Y \to X\)? A little thought shows \(X\) must be a singleton set.

YfX

These definitions of the empty set and singleton sets generalize to what are called the initial object and the terminal object in category theory. That is, so long as we have objects and arrows, we can define objects akin to the empty set and singleton sets.

In a poset, the initial object, if it exists, is the minimum, and the terminal object, if it exists, is the maximum.

The axioms of category theory quickly lead to some laws. The arrow from the initial object to itself is the identity arrow. Composing an arrow from the initial object with another arrow gives a unique arrow from the initial object to the new destination. We have similar results for terminal objects.

As an aside, now that we can define singleton sets without mentioning elements, we could fix our first attempt at defining injectivity. We could require \(Z\) to be a terminal object. However, in the dual, \(Z\) becomes an initial object, and we find the only valid choice for \(f\) is the identity function on the empty set. Thus this version is over-engineered: by adding an unnecessary condition, we have made the dual boring.

Products

For our next trick, consider the Cartesian product of two sets \(A, B\), which is typically defined to be the set

\[ A \times B = \{ (a, b) | a \in A, b \in B \} \]

How can we rewrite this definition without mentioning set elements?

We start by thinking about the API. The projection functions are how programmers interact with the product without knowing its implementation details. In C++, the first and second members of std::pair hide the dirty work done by the compiler. In Haskell, the fst and snd functions play this role. Theoreticians prefer the names outl and outr.

Next, for inspiration, consider the product type (Int, String) in Haskell. For any type c, given functions f :: c -> Int and g :: c -> String, we can combine them into a function h :: c -> (Int, String):

h :: (c -> Int) -> (c -> String) -> (c -> (Int, String))
h f g = \c -> (f c, g c)

-- Fancier definition using the Arrow monad: h = (&&&)

prop_product = h length reverse "foo" == (3, "oof")

Furthermore, we can recover the original functions with fst and snd:

f' = fst . h
g' = snd . h

This guides our approach. If \(X\) denotes the product of \(A\) and \(B\) then roughly speaking, two functions \(f:C \to A\) and \(g:C \to B\) should be equivalent to a single function \(h:C \to X\) because we can combine them. Also, this function \(h\) should be unique, because \(X\) should contain no superfluous information: \(X\) should contain all combinations of the elements of \(A\) and \(B\), and nothing more.

More formally, the product of sets \(A\) and \(B\) is a set \(X\) and two functions \(outl : X \to A\) and \(outr : X \to B\) such that for any functions \(f:C \to A\) and \(g:C \to B\), there exists a unique \(h : C \to X\) such that \(outl \cdot h = f\) and \(outr \cdot h = g\).

BgoutrhCXfoutlA

We write \(\langle f,g \rangle\) for the function \(h\) above. We have the reflection law:

\[ \langle outl, outr \rangle = id \]

and the fusion law:

\[ \langle f, g \rangle \cdot m = \langle f \cdot m, g \cdot m \rangle \]

Their proofs are left as an exercise. Since we’re working at a high level, it’s a matter of applying a few definitions in the right order.

The set \(X\) is not unique, but we justify calling it the product because the definition implies there is a unique isomorphism between any two products, that is, the product is unique up to unique isomorphism.

Our original definition yields a unique product set, but why single out one particular implementation? A compiler can choose how best to lay out a pair of elements in memory. For efficiency, it may use different schemes for different cases, and convert between them automatically. At a high level, they’re all the same.

Co-products

We flip the arrows to reveal our bonus dual idea. The definition now reads: the co-product of sets \(A\) and \(B\) is a set \(X\) and two functions \(inl : A \to X\) and \(inr : B \to X\) such that for any functions \(f:A \to C\) and \(g:B \to C\), there exists a unique \(h : X \to C\) such that \(h \cdot inl = f\) and \(h \cdot inr = g\).

BginrhCXfinlA

We write \([f,g]\) for the function \(h\) above. We have the reflection law:

\[ [inl, inr] = id \]

and the fusion law:

\[ m \cdot [f, g] = [m \cdot f, m \cdot g] \]

which again are exercises in applying definitions. Alternatively, we can take the dual of the proofs of the corresponding laws for products.

A moment’s thought shows \(X\) is the direct sum of \(A\) and \(B\), which we write \(A + B\). This is clear if we map \(inl\) and \(inr\) to Left and Right, and \(X\) to Either A B in Haskell:

import Data.Char

h :: (a -> c) -> (b -> c) -> Either a b -> c
h f _ (Left a)  = f a
h _ g (Right b) = g b

-- In fact, the Prelude already defines this function: h = either

prop_sumL = h chr head (Left 65)     == 'A'
prop_sumR = h chr head (Right "foo") == 'f'

So from a high-level standpoint, it would be odd to talk about product types without mentioning sum types. They’re two sides of the same coin.

For posets, the product and co-product correspond to the meet and join. Would you think of one without the other?

In practice, sum types are handy and technically feasible, for despite being relatively close to assembly language, C manages to support both products (struct) and sums (union), albeit in an unsafe manner. Also, languages such as ML and Haskell prove it is possible to design usable syntax for sum types.

Thus both abstract and concrete evidence suggest that a programming language that supports product types should also support sum types.

Indeed, we experience friction when this principle is violated. The Go language has excellent notation for product types but conspicuously lacks sum types. Programmers clumsily compensate by using product types in lieu of their dual. A function that returns an error or an integer instead returns an error and an integer, with the unenforceable understanding that only one of them should be taken seriously.

In fact, the reality is even messier. Go perpetuates what Tony Hoare calls his billion-dollar mistake, which means that certain types are actually sum types where we have no control over the one of the summands.

Let me draw you a diagram

The following code generates the above diagrams, using the Diagrams domain-specific language; see also Functional Geometry by Peter Henderson.

{-# LANGUAGE NoMonomorphismRestriction #-}

import Data.Text (empty)
import Diagrams.Prelude
import Diagrams.Backend.SVG

arr = with & headLength .~ 8

uni = arr & shaftStyle %~ dashingO [5,3,1,6] 0

lbl :: String -> Diagram B
lbl s = text s # stylize

stylize :: Diagram B -> Diagram B
stylize = font "sans" . fontSizeO 14

mkObj :: String -> Diagram B
mkObj s = lbl s <> unitCircle # named s

catId :: Diagram B
catId = (strutY 0.5 ===) $ hcat
  [ mkObj "X"
    <> (text "id" <> text "X" # scale 0.5 # translate (r2 (0.5, -0.2)))
    # stylize # translateY 2.2
  , translateY 0.5 (lbl "f") <> strutX 4
  , mkObj "Y"
    <> (text "id" <> text "Y" # scale 0.5 # translate (r2 (0.5, -0.2)))
    # stylize # translateY 2.2
  ]
  # connectPerim' (arr & arrowShaft .~ arc xDir (-5/8 @@ turn))
    "X" "X" (5/16 @@ turn) (3/16 @@ turn)
  # connectPerim' (arr & arrowShaft .~ arc xDir (-5/8 @@ turn))
    "Y" "Y" (5/16 @@ turn) (3/16 @@ turn)
  # connectOutside' arr "X" "Y"

catCompose :: Diagram B
catCompose = hcat
  [ mkObj "W"
  , translateY 0.5 (lbl "f") <> strutX 4
  , mkObj "X"
  , translateY 0.5 (lbl "g") <> strutX 4
  , mkObj "Y"
  , translateY 0.5 (lbl "h") <> strutX 4
  , mkObj "Z"
  ]
  # connectOutside' arr "W" "X"
  # connectOutside' arr "X" "Y"
  # connectOutside' arr "Y" "Z"

zxy :: Diagram B
zxy = hcat
  [ z
  , translateY (-2) (lbl "h" <> unitSquare # lw none)
    <> translateY 2 (lbl "g" <> unitSquare # lw none) <> strutX 4
  , x
  , translateY 0.5 (lbl "f") <> strutX 4
  , y
  ] where [x, y, z] = mkObj <$> ["X", "Y", "Z"]

injection :: Diagram B
injection = zxy
  # connectOutside' arr "X" "Y"
  # connectPerim' (arr & arrowShaft .~ arc xDir (-1/6 @@ turn))
    "Z" "X" (1/8 @@ turn) (3/8 @@ turn)
  # connectPerim' (arr & arrowShaft .~ arc xDir (1/6 @@ turn))
    "Z" "X" (-1/8 @@ turn) (-3/8 @@ turn)

surjection :: Diagram B
surjection = zxy
  # connectOutside' arr "Y" "X"
  # connectPerim' (arr & arrowShaft .~ arc xDir (1/6 @@ turn))
    "X" "Z" (3/8 @@ turn) (1/8 @@ turn)
  # connectPerim' (arr & arrowShaft .~ arc xDir (-1/6 @@ turn))
    "X" "Z" (-3/8 @@ turn) (-1/8 @@ turn)

xy :: Diagram B
xy = hcat
  [ x
  , translateY 0.5 (lbl "f") <> strutX 4
  , y
  ] where [x, y] = mkObj <$> ["X", "Y"]

initial :: Diagram B
initial = xy # connectOutside' uni "X" "Y"

terminal :: Diagram B
terminal = xy # connectOutside' uni "Y" "X"

axbc :: String -> String -> Diagram B
axbc l r = hcat
  [ a
  , translateY 0.5 (lbl l) <> translateY (-3) (lbl "f") <> strutX 4
  , x <> translateY (-4) c <> lbl "h" # translateY (-2) # translateX 0.5
  , translateY 0.5 (lbl r) <> translateY (-3) (lbl "g") <> strutX 4
  , b
  ] where [a, x, b, c] = mkObj <$> ["A", "X", "B", "C"]

catProduct :: Diagram B
catProduct = axbc "outl" "outr"
  # connectOutside' arr "X" "A"
  # connectOutside' arr "X" "B"
  # connectOutside' arr "C" "A"
  # connectOutside' arr "C" "B"
  # connectOutside' uni "C" "X"

catSum :: Diagram B
catSum = axbc "inl" "inr"
  # connectOutside' arr "A" "X"
  # connectOutside' arr "B" "X"
  # connectOutside' arr "A" "C"
  # connectOutside' arr "B" "C"
  # connectOutside' uni "X" "C"

main :: IO ()
main = do
  let
    gen name dia = renderSVG' (name ++ ".svg")
      -- Is there a default SVGOptions? I only want to set the size and
      -- suppress the DOCTYPE declaration.
      (SVGOptions (mkWidth $ width dia * w) Nothing empty [] False)
      (dia & lineWidth 1 & pad 1.1)
    w = 300 / width injection
  gen "id"         catId
  gen "compose"    catCompose
  gen "injection"  injection
  gen "surjection" surjection
  gen "initial"    initial
  gen "terminal"   terminal
  gen "product"    catProduct
  gen "coproduct"  catSum

We scale diagrams so the sizes of their elements (such as circles) have the same size. We start by setting the width of the injection diagram to 300 pixels and computing the other widths accordingly. Since the diagrams have different dimensions, normalized sizes look inconsistent. Instead, we use pixels to measure the dimensions of arrow heads, fonts, line widths, and the like.

When drawing the dual, sometimes we do more than swap the sources and destinations. The curved arrows must arc in the opposite direction, and their entry and exit angles must be mirrored.


Ben Lynn blynn@cs.stanford.edu 💡