{-# LANGUAGE LambdaCase, TupleSections #-}
import Control.Applicative ((<|>))
import Data.Bifunctor (first)
import Data.List (partition)
import Data.Ratio (denominator, numerator)

pretty = mapM_ ((>> putChar '\n') . mapM_ go) where
  go x = do
    let d = denominator x
    let str = show (numerator x) ++ (if d == 1 then "" else '/':show d)
    putStr $ replicate (8 - length str) ' '
    putStr str
rice =
  [ [3, 2, 1, 39]
  , [2, 3, 1, 34]
  , [1, 2, 3, 26]
  ]

anotherExample =
  [ [0, 0, 0, 1, 2, 3]
  , [4, 5, 6, 7, 8, 9]
  , [2, 7, 1, 8, 2, 8]
  , [0, 1, 0, 1, 0, 1]
  , [3, 1, 4, 1, 5, 9]
  ]
pretty rice
putChar '\n'
pretty anotherExample
What is the matrix?
Linear algebra begins with puzzles that humanity conquered millennia ago. The following appears in Chapter 8 of The Nine Chapters on the Mathematical Art (九章算術), written thousands of years ago:
今有上禾三秉,中禾二秉,下禾一秉,實三十九斗;上禾二秉,中禾三秉,下禾一秉,實三十四斗;上禾一秉,中禾二秉,下禾三秉,實二十六斗。問上、中、下禾實一秉各幾何?
We translate this problem as a gamer might. There are rice plants of varying quality: rare, uncommon, and common.
- 3 rare plants, 2 uncommon plants, and 1 common plant yield 39 units of rice.
- 2 rare plants, 3 uncommon plants, and 1 common plant yield 34 units of rice.
- 1 rare plant, 2 uncommon plants, and 3 common plants yield 26 units of rice.
What is the yield of individual rare, uncommon, and common rice plants?
Thanks to algebraic notation, today we’d write this as a system of linear equations:
\[ \begin{align} 3x + 2y + z &= 39 \\ 2x + 3y + z &= 34 \\ x + 2y + 3z &= 26 \end{align} \]
I first learned to solve linear systems informally. We start by trying to eliminate variables from equations. We pick one of the equations and scale it so that the coefficient of \(x\) matches another equation. For example, multiplying the last equation by 2 yields:
\[ 2x + 4y + 6z = 52 \]
and subtracting this from the second equation eliminates \(x\):
\[ -y - 5z = -18 \]
Multiplying the last equation by 3 and subtracting from the first equation yields another equation that only involves \(y\) and \(z\):
\[ -4y - 8z = -39 \]
We repeat this trick to eliminate \(y\) from one of these equations: we multiply one equation by 4 and subtract from the other:
\[ 12z = 33 \]
which reveals the solution for \(z\). We substitute this value for \(z\) into one of the equations lacking \(x\) to find the solution for \(y\), then substitute the values for \(y\) and \(z\) into one of the original equations to reveal \(x\). By the way, substitution is really a special case of elimination.
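Carrying out those substitutions on our example:

\[ \begin{align} 12z = 33 &\implies z = \frac{11}{4} \\ -y - 5 \cdot \frac{11}{4} = -18 &\implies y = \frac{17}{4} \\ 3x + 2 \cdot \frac{17}{4} + \frac{11}{4} = 39 &\implies x = \frac{37}{4} \end{align} \]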
Ad hoc forward elimination and back substitution served me well in my childhood, but occasionally, while attempting to eliminate a variable, I’d crash into a useless equality like \(0 = 0\). I’d sigh, then manipulate the equations in a different order, and since my homework only involved small systems of equations, sooner or later I’d find the solution.
I regret I didn’t stop to think how dead ends arose. I also regret I didn’t write a program to automate the above.
Time to Pivot
Had I written code, I think I would have used a two-dimensional array to represent the equations, as I programmed in languages that encouraged the use of arrays, perhaps overly so. Programming would also force me to proceed systematically, and I believe I would have discovered why my hand calculations sometimes hit a dead end: I was essentially subtracting an equation from itself!
In other words, I might have stumbled upon the content of Chapter 8. It is titled 方程 ("Rectangular Grids") because it teaches us to write the equations with a matrix of numbers:
\[ \left[ \begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 2 & 3 & 1 & 34 \\ 1 & 2 & 3 & 26 \end{array} \right] \]
With this data structure, forward elimination and back substitution can be seen as compositions of elementary row operations:
- Swapping two rows.
- Scaling a row by a nonzero number.
- Adding a multiple of one row to another.
Keeping the numbers organized prevents the accidental subtraction of an equation from itself. Each of these operations can be undone, so they preserve data faithfully. A far cry from my childhood self, who scribbled one equation after another down the page, quickly losing track of the relationships between them.
This scheme was invented independently multiple times, though The Nine Chapters is the only known ancient record. Much later, Newton published the most influential guide to solving simultaneous linear equations. Many others contributed improvements, including Gauss, whose innovations in notation led to the most prevalent name for the algorithm: Gaussian elimination.
Better late than never. I hereby check off a decades-old entry on my to-do list by implementing Gaussian elimination.
Since we’re using Haskell, instead of a two-dimensional array, we represent a matrix as a list of lists. We begin with the pretty-printing routine pretty shown at the top, along with the example matrices rice and anotherExample.
To eliminate a variable from all but one of the equations, we pick one of the \(n\) variables of one of the \(n\) equations, which we call the pivot, then subtract appropriate multiples of the pivot row from the other rows.
Our code chooses the pivot by looking for a row whose head is nonzero, and assumes such a row exists. We normalize the pivot row before elimination; that is, we scale the entire row so that the pivot becomes 1. Normalization is unnecessary, but we must divide by the pivot at some point anyway, and it also simplifies our back substitution code.
elimHead xs = (1:ps) : as <|> (0:) . elim <$> bs where
  (as, (pivot:pivotTail):bs) = span ((== 0) . head) xs
  ps = (/pivot) <$> pivotTail
  elim (h:t) = zipWith (-) t $ (h*) <$> ps
For example:
pretty $ elimHead rice
putChar '\n'
pretty $ elimHead anotherExample
We complete forward elimination by recursing on the remaining \(n - 1\) variables of the other \(n - 1\) equations. The recursion bottoms out when we have one variable left.
Thus we write a version of our previous function that recurses instead of returning immediately. We take this opportunity to optimize slightly by partitioning off all rows starting with 0 to avoid redundant elim calls.
forwardElim [] = []
forwardElim xs = (1:ps) : rest where
  (zs, (pivot:pivotTail):nzs) = partition ((== 0) . head) xs
  ps = (/pivot) <$> pivotTail
  rest = map (0:) $ forwardElim $ tail <$> zs <|> elim <$> nzs
  elim (h:t) = zipWith (-) t $ map (h*) ps

pretty $ forwardElim rice
putChar '\n'
pretty $ forwardElim anotherExample
Now for back substitution. Forward elimination finishes with a one-variable equation, which tells us what value the variable must take. We work backwards inductively: at each step we use the values for \(k - 1\) of the variables to figure out the value of the \(k\)th variable.
backSub [[1, a]] = [a]
backSub ((1:xs):rest) = (last xs - sum (zipWith (*) xs as)) : as where
  as = backSub $ tail <$> rest
Let’s test it on the problem from The Nine Chapters:
backSub $ forwardElim rice
If we were using floating-point numbers instead of rationals, then it turns out that for numerical stability we should pivot on the row whose first element is largest in magnitude.
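Here is a minimal sketch of that strategy, known as partial pivoting; the helper partialPivot is hypothetical and not used elsewhere. To pivot this way throughout, one would apply it to the remaining rows at every level of the recursion, not just once at the top.

-- Hypothetical helper: bring the row whose head has the largest magnitude to
-- the front, so elimination would pivot on it. Assumes a nonempty matrix.
partialPivot xs = best : before ++ after where
  i = snd $ maximum [(abs (head row), j) | (row, j) <- zip xs [0..]]
  (before, best:after) = splitAt i xs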
Zero, one, infinity
My homework focused on simultaneous equations with unique solutions. However, in general, there are two other possibilities. If the equations are inconsistent, then there is no solution. And if the equations are under-determined, then there are infinitely many solutions, because we may assign any desired value to at least one of the variables and still derive a solution.
We generalize our forward-elimination code by moving on to the next column if all leading entries are zero, and also by terminating if there are no more columns.
We say the resulting matrix is in row echelon form.
Our code is written so that the first nonzero entry of a row, namely, its leading entry, is always 1, though this is not a requirement for row echelon form.
echelon = \case
  [] -> []
  xs@([]:_) -> xs
  xs | (zs, nzs) <- partition ((== 0) . head) xs -> case nzs of
    [] -> map (0:) $ echelon $ tail <$> zs
    (pivot:pivotTail):nzt -> (1:pt) : rest
      where
        pt = map (/pivot) pivotTail
        rest = map (0:) $ echelon $ tail <$> zs <|> elim <$> nzt
        elim (h:t) = zipWith (-) t $ map (h*) pt
pretty $ echelon rice
As unique solutions are no longer guaranteed, back substitution can no longer be seen as chaining together steps that determine one variable after another; the chain may now be broken.
We just do our best: for each row, we subtract appropriate multiples of other rows so that its leading 1 is the only nonzero entry in its column. Our code starts from the last row and works upwards, so we sandwich the core logic between reverse calls.
We say the resulting matrix is in reduced row echelon form.
reduce = reverse . reduceReverse . reverse

reduceReverse [] = []
reduceReverse (row:rest) = row : (go <$> reduceReverse rest) where
  k = length $ takeWhile (0 ==) row
  go other
    | k < length other = zipWith (-) other $ map ((other!!k)*) row
    | otherwise = other
From here we can read off any solutions.
If the last nonzero row is \(\begin{matrix} 0 & \cdots & 0 & 1 \end{matrix}\), then the equations imply \(0 = 1\), that is, there is no solution:
inconsistent = [[1, 2, 3, 4], [1, 2, 3, 44]]
pretty $ reduce $ echelon inconsistent
Otherwise, if the matrix contains at least one row of the form:
\[ \begin{matrix} 0 & … & 0 & 1 & a_1 & … & a_k & b \end{matrix} \]
where at least one of \(a_1, …, a_k\) is nonzero, then each variable corresponding to a nonzero \(a_i\) can take any desired value \(v_i\). Write \(x\) for the variable corresponding to this row. Then:
\[x = b - \sum_{a_i \ne 0} a_i v_i\]
underdetermined = [[1, 2, 3, 4], [0, 2, 0, 44], [2, 4, 6, 8]]
pretty $ reduce $ echelon underdetermined
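Working this example out by hand, the reduced row echelon form should be:

\[ \left[ \begin{array}{ccc|c} 1 & 0 & 3 & -40 \\ 0 & 1 & 0 & 22 \\ 0 & 0 & 0 & 0 \end{array} \right] \]

The second row says \(y = 22\); the first row has the form above with \(a_2 = 3\) nonzero, so \(z\) may take any value \(v\), whereupon \(x = -40 - 3v\).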
Otherwise there is a unique solution. If we remove any zero rows and the last column, we’re left with a square matrix that is zero everywhere except for the main diagonal of ones, where the main diagonal goes from the top left to the bottom right; this square matrix is the identity matrix of size \(n\), where \(n\) is the number of ones.
pretty $ reduce $ echelon rice
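Carrying out the reduction by hand, we expect:

\[ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & \frac{37}{4} \\ 0 & 1 & 0 & \frac{17}{4} \\ 0 & 0 & 1 & \frac{11}{4} \end{array} \right] \]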
The last column holds the values we must assign to the variables. The above matrix says:
\[ x = \frac{37}{4}, y = \frac{17}{4}, z = \frac{11}{4} \]
Zero to one
In practice, we may refuse to take "no solutions" for an answer.
Consider the following trivial example: suppose we wish to determine the weight \(x\) of a loaf of bread from a certain baker. We find one loaf weighs 499 grams:
\[ x = 499 \]
And another weighs 501 grams:
\[ x = 501 \]
Instead of saying no solution exists, most people would say a loaf of bread weighs about 500 grams.
Or suppose in the rice problem from The Nine Chapters, we’re given a fourth measurement which finds one plant of each quality together yield 16 units of rice:
\[ x + y + z = 16 \]
The values of \(x, y, z\) we found before imply the right-hand side should be larger than 16 by 0.25. Rather than say there are no solutions, intuitively, we ought to accept that errors have crept in somewhere. With no way of knowing exactly where the errors are, we should nudge the values we found so that they in some sense disagree as little as possible with each measurement.
Under some assumptions, the errors can be modeled with the normal distribution, which turns out to imply that we should use the least squares criterion to find the best solution: we choose the values that minimize the sum of the squares of the discrepancies on the right-hand sides. There is a unique choice that achieves this minimum.
In the one-dimensional case, this is the mean value.
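For the bread example, the least squares criterion minimizes \((x - 499)^2 + (x - 501)^2\); setting the derivative to zero gives

\[ 2(x - 499) + 2(x - 501) = 0 \implies x = 500, \]

the mean of the two measurements.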
Same LHS, Different RHS
Suppose we change the right-hand sides of \(n\) linear equations, but leave the left-hand sides unchanged, that is, the coefficients of each variable remain the same.
Naturally, we could rerun our program. But this seems wasteful as we repeat many operations. Can we reduce the amount of work by saving data from our first run?
Reviewing our code suggests the following caching plan:
- The partition calls may cause reordering of the rows. We record the resulting permutation and pre-apply it to the equations.
- We record all the multipliers found during forward elimination. For row \(k\), we record how much we scaled it before adding it to each of rows \(k+1, …, n\).
- If we opt to normalize, then we record the normalization scalars.
- We record all coefficients just before we perform back substitution, that is, the row echelon form of the matrix except for the last column.
This scheme is known as the LU decomposition. The four caches are called P, L, D, and U, respectively, which stand for permutation, lower triangular, diagonal, and upper triangular. These terms make sense once we learn about matrix multiplication and elementary matrices, though already we know that U is a matrix in row echelon form, so its nonzero numbers can only appear on the main diagonal or above the main diagonal, that is, U is upper triangular.
Then given a new set of values on the right-hand side, we permute them according to P, scale and add them to one another according to L, then back-substitute according to U. If a D cache is present, then we normalize just before back substitution.
We rewrite forwardElim to record information as we perform forward
elimination. We tag each row with an index, so we can later determine how they
were permuted. We also attach a list of scalars to each row: every time we add
a scaled row to another row, we append the multiplier used.
To simplify, we assume a unique solution, that is, there are \(n\) equations with \(n\) unknowns, with no redundant equations. Generalizing the code is left as an exercise.
We skip normalization; implementing a D cache is left as another exercise.
The plu function reorganizes the information recorded by forwardElim,
and prettyPlu displays them in a less unfriendly manner.
forwardElim' [] = []
forwardElim' xs = p : rest where
  (zs, p@((pivot:pivotTail), _):nzs) = partition ((== 0) . head . fst) xs
  rest = first (0:) <$> forwardElim' (elim <$> (zs ++ nzs))
  elim (h:t, (idx, ks)) = let k = h*(1/pivot) in
    (zipWith (-) t $ map (k*) pivotTail, (idx, k:ks))

plu m = (p, reverse <$> l, u) where
  (u, info) = unzip $ forwardElim' $ zip m $ (, []) <$> [0..]
  (p, l) = unzip info
  n = length u

prettyPlu m = let
    (p, l, u) = plu m
    n = length p
  in do
    putStrLn $ "P: " ++ show p
    putStrLn "L:"
    pretty l
    putStrLn "U:"
    pretty u
These functions assume the input represents the left-hand side of the equations.
pretty $ init <$> rice
prettyPlu $ init <$> rice

pretty $ init <$> anotherExample
prettyPlu $ init <$> anotherExample
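To close the loop, here is a sketch of how the cached P, L, and U might be reused to solve for a fresh right-hand side; the name luSolve and this code are mine, not part of the development above, and it assumes a square system with a unique solution, as before.

-- Hypothetical sketch: reuse the caches from plu for a new right-hand side b.
-- Assumes n equations in n unknowns with a unique solution.
luSolve (p, l, u) b = backSubU u c where
  -- P: reorder the new right-hand side into pivot order.
  b' = map (b !!) p
  -- L: replay the recorded multipliers on the right-hand side.
  c = foldl step [] (zip b' l)
  step acc (bi, ks) = acc ++ [bi - sum (zipWith (*) ks acc)]
  -- U: back substitution, dividing by the unnormalized pivot on the diagonal.
  backSubU [] [] = []
  backSubU ((piv:xs):rest) (ci:cs) = ((ci - sum (zipWith (*) xs as)) / piv) : as
    where as = backSubU (tail <$> rest) cs

luSolve (plu (init <$> rice)) (last <$> rice)

This should reproduce the solution \(\frac{37}{4}, \frac{17}{4}, \frac{11}{4}\) found earlier, without repeating forward elimination on the coefficients.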