{-# LANGUAGE LambdaCase, TupleSections #-}
import Control.Applicative ((<|>))
import Data.Bifunctor (first)
import Data.List (partition)
import Data.Ratio (denominator, numerator)

pretty = mapM_ ((>> putChar '\n') . mapM_ go) where
  go x = do
    let d = denominator x
    let str = show (numerator x) ++ (if d == 1 then "" else '/':show d)
    putStr $ replicate (8 - length str) ' '
    putStr str
rice =
  [ [3, 2, 1, 39]
  , [2, 3, 1, 34]
  , [1, 2, 3, 26]
  ]

anotherExample =
  [ [0, 0, 0, 1, 2, 3]
  , [4, 5, 6, 7, 8, 9]
  , [2, 7, 1, 8, 2, 8]
  , [0, 1, 0, 1, 0, 1]
  , [3, 1, 4, 1, 5, 9]
  ]
pretty rice
putChar '\n'
pretty anotherExample
What is the matrix?
Linear algebra begins with puzzles that humanity conquered millennia ago. The following appears in Chapter 8 of The Nine Chapters on the Mathematical Art (九章算術), written thousands of years ago:
今有上禾三秉,中禾二秉,下禾一秉,實三十九斗;上禾二秉,中禾三秉,下禾一秉,實三十四斗;上禾一秉,中禾二秉,下禾三秉,實二十六斗。問上、中、下禾實一秉各幾何?
We translate this problem as a gamer might. There are rice plants of varying quality: rare, uncommon, and common.
- 3 rare plants, 2 uncommon plants, and 1 common plant yield 39 units of rice.
- 2 rare plants, 3 uncommon plants, and 1 common plant yield 34 units of rice.
- 1 rare plant, 2 uncommon plants, and 3 common plants yield 26 units of rice.
What is the yield of individual rare, uncommon, and common rice plants?
Thanks to algebraic notation, today we’d write this as a system of linear equations:
\[ \begin{align} 3x + 2y + z &= 39 \\ 2x + 3y + z &= 34 \\ x + 2y + 3z &= 26 \end{align} \]
I first learned to solve linear systems informally. We start by trying to eliminate variables from equations. We pick one of the equations and scale it so that the coefficient of \(x\) matches another equation. For example, multiplying the last equation by 2 yields:
\[ 2x + 4y + 6z = 52 \]
and subtracting this from the second equation eliminates \(x\):
\[ -y - 5z = -18 \]
Multiplying the last equation by 3 and subtracting from the first equation yields another equation that only involves \(y\) and \(z\):
\[ -4y - 8z = -39 \]
We repeat this trick to eliminate \(y\) from one of these equations: we multiply one equation by 4 and subtract from the other:
\[ 12z = 33 \]
which reveals the solution for \(z\). We substitute this value for \(z\) into one of the equations lacking \(x\) to find the solution for \(y\), then substitute the values for \(y\) and \(z\) into one of the original equations to reveal \(x\). By the way, substitution is really a special case of elimination.
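Carrying out those substitutions on our example:

\[ \begin{align} 12z = 33 &\implies z = \frac{11}{4} \\ -y - 5 \cdot \frac{11}{4} = -18 &\implies y = \frac{17}{4} \\ 3x + 2 \cdot \frac{17}{4} + \frac{11}{4} = 39 &\implies x = \frac{37}{4} \end{align} \]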
Ad hoc forward elimination and back substitution served me well in my childhood, but occasionally, while attempting to eliminate a variable, I’d crash into a useless equality like \(0 = 0\). I’d sigh, then manipulate the equations in a different order, and since my homework only involved small systems of equations, sooner or later I’d find the solution.
I regret I didn’t stop to think how dead ends arose. I also regret I didn’t write a program to automate the above.
Time to Pivot
Had I written code, I think I would have used a two-dimensional array to represent the equations, as I programmed in languages that encouraged the use of arrays, perhaps overly so. Programming would also force me to proceed systematically, and I believe I would have discovered why my hand calculations sometimes hit a dead end: I was essentially subtracting an equation from itself!
In other words, I might have stumbled upon the content of Chapter 8. It is titled 方程 ("Rectangular Grids") because it teaches us to write the equations with a matrix of numbers:
\[ \left[ \begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 2 & 3 & 1 & 34 \\ 1 & 2 & 3 & 26 \end{array} \right] \]
With this data structure, forward elimination and back substitution can be seen as compositions of elementary row operations:
- Swapping two rows.
- Scaling a row by a nonzero number.
- Adding a multiple of one row to another.
Keeping the numbers organized prevents the accidental subtraction of an equation from itself. Each of these operations can be undone, so they preserve data faithfully. A far cry from my childhood self, who scribbled one equation after another down the page, quickly losing track of the relationships between them.
This scheme was invented independently multiple times, though The Nine Chapters is the only known ancient record. Much later, Newton published the most influential guide to solving simultaneous linear equations. Many others contributed improvements, including Gauss, whose innovations in notation led to the most prevalent name for the algorithm: Gaussian elimination.
Better late than never. I hereby check off a decades-old entry on my to-do list by implementing Gaussian elimination.
Since we’re using Haskell, instead of a two-dimensional array, we represent a matrix as a list of lists. We begin with the pretty-printing routine pretty shown at the top, along with the example matrices rice and anotherExample.
To eliminate a variable from all but one of the equations, we pick one of the \(n\) variables of one of the \(n\) equations, which we call the pivot, then subtract appropriate multiples of the pivot row from the other rows.
Our code chooses the pivot by looking for a row whose head is nonzero, and assumes such a row exists. We normalize the pivot row before elimination; that is, we scale the entire row so that the pivot becomes 1. Normalization is unnecessary, but we must divide by the pivot at some point anyway, and it also simplifies our back substitution code.
elimHead xs = (1:ps) : as <|> (0:) . elim <$> bs where
  (as, (pivot:pivotTail):bs) = span ((== 0) . head) xs
  ps = (/pivot) <$> pivotTail
  elim (h:t) = zipWith (-) t $ (h*) <$> ps
For example:
pretty $ elimHead rice
putChar '\n'
pretty $ elimHead anotherExample
We complete forward elimination by recursing on the remaining \(n - 1\) variables of the other \(n - 1\) equations. The recursion bottoms out when we have one variable left.
Thus we write a version of our previous function that recurses instead of returning immediately. We take this opportunity to optimize slightly by partitioning off all rows starting with 0 to avoid redundant elim calls.
forwardElim [] = []
forwardElim xs = (1:ps) : rest where
  (zs, (pivot:pivotTail):nzs) = partition ((== 0) . head) xs
  ps = (/pivot) <$> pivotTail
  rest = map (0:) $ forwardElim $ tail <$> zs <|> elim <$> nzs
  elim (h:t) = zipWith (-) t $ map (h*) ps

pretty $ forwardElim rice
putChar '\n'
pretty $ forwardElim anotherExample
Now for back substitution. Forward elimination finishes with a one-variable equation, which tells us what value the variable must take. We work backwards inductively: at each step we use the values for \(k - 1\) of the variables to figure out the value of the \(k\)th variable.
backSub [[1, a]] = [a]
backSub ((1:xs):rest) = (last xs - sum (zipWith (*) xs as)) : as where
  as = backSub $ tail <$> rest
Let’s test it on the problem from The Nine Chapters:
backSub $ forwardElim rice
If we were using floating-point numbers instead of rationals, then it turns out that for numerical stability we should pivot on the row whose first element is largest in magnitude.
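Here is a minimal sketch of that strategy, known as partial pivoting; the helper partialPivot is hypothetical and not used elsewhere. To pivot this way throughout, one would apply it to the remaining rows at every level of the recursion, not just once at the top.

-- Hypothetical helper: bring the row whose head has the largest magnitude to
-- the front, so elimination would pivot on it. Assumes a nonempty matrix.
partialPivot xs = best : before ++ after where
  i = snd $ maximum [(abs (head row), j) | (row, j) <- zip xs [0..]]
  (before, best:after) = splitAt i xs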
Zero, one, infinity
My homework focused on simultaneous equations with unique solutions. However, in general, there are two other possibilities. If the equations are inconsistent, then there is no solution. And if the equations are under-determined, then there are infinitely many solutions, because we may assign any desired value to at least one of the variables and still derive a solution.
We generalize our forward-elimination code by moving on to the next column if all leading entries are zero, and also by terminating if there are no more columns.
We say the resulting matrix is in row echelon form.
Our code is written so that the first nonzero entry of a row, namely, its leading entry, is always 1, though this is not a requirement for row echelon form.
echelon = \case
  [] -> []
  xs@([]:_) -> xs
  xs | (zs, nzs) <- partition ((== 0) . head) xs -> case nzs of
    [] -> map (0:) $ echelon $ tail <$> zs
    (pivot:pivotTail):nzt -> (1:pt) : rest
      where
        pt = map (/pivot) pivotTail
        rest = map (0:) $ echelon $ tail <$> zs <|> elim <$> nzt
        elim (h:t) = zipWith (-) t $ map (h*) pt
pretty $ echelon rice
As unique solutions are no longer guaranteed, back substitution can no longer be seen as chaining together steps that determine one variable after another; the chain may now be broken.
We just do our best: for each row, we subtract appropriate multiples of other rows so that its leading 1 is the only nonzero entry in its column. Our code starts from the last row and works upwards, so we sandwich the core logic between reverse calls.
We say the resulting matrix is in reduced row echelon form.
reduce = reverse . reduceReverse . reverse

reduceReverse [] = []
reduceReverse (row:rest) = row : (go <$> reduceReverse rest) where
  k = length $ takeWhile (0 ==) row
  go other
    | k < length other = zipWith (-) other $ map ((other!!k)*) row
    | otherwise = other
From here we can read off any solutions.
If the last nonzero row is \(\begin{matrix} 0 & \cdots & 0 & 1 \end{matrix}\), then the equations imply \(0 = 1\), that is, there is no solution:
inconsistent = [[1, 2, 3, 4], [1, 2, 3, 44]]
pretty $ reduce $ echelon inconsistent
Otherwise, if the matrix contains at least one row of the form:
\[ \begin{matrix} 0 & … & 0 & 1 & a_1 & … & a_k & b \end{matrix} \]
where at least one of \(a_1, …, a_k\) is nonzero, then each variable corresponding to a nonzero \(a_i\) can take any desired value \(v_i\). Write \(x\) for the variable corresponding to this row. Then:
\[x = b - \sum_{a_i \ne 0} a_i v_i\]
underdetermined = [[1, 2, 3, 4], [0, 2, 0, 44], [2, 4, 6, 8]]
pretty $ reduce $ echelon underdetermined
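Working this example out by hand, the reduced row echelon form should be:

\[ \left[ \begin{array}{ccc|c} 1 & 0 & 3 & -40 \\ 0 & 1 & 0 & 22 \\ 0 & 0 & 0 & 0 \end{array} \right] \]

The second row says \(y = 22\); the first row has the form above with \(a_2 = 3\) nonzero, so \(z\) may take any value \(v\), whereupon \(x = -40 - 3v\).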
Otherwise there is a unique solution. If we remove any zero rows and the last column, we’re left with a square matrix that is zero everywhere except for the main diagonal of ones, where the main diagonal goes from the top left to the bottom right; this square matrix is the identity matrix of size \(n\), where \(n\) is the number of ones.
pretty $ reduce $ echelon rice
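Carrying out the reduction by hand, we expect:

\[ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & \frac{37}{4} \\ 0 & 1 & 0 & \frac{17}{4} \\ 0 & 0 & 1 & \frac{11}{4} \end{array} \right] \]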
The last column holds the values we must assign to the variables. The above matrix says:
\[ x = \frac{37}{4}, y = \frac{17}{4}, z = \frac{11}{4} \]
Zero to one
In practice, we may refuse to take "no solutions" for an answer.
Consider the following trivial example: suppose we wish to determine the weight \(x\) of a loaf of bread from a certain baker. We find one loaf weighs 499 grams:
\[ x = 499 \]
And another weighs 501 grams:
\[ x = 501 \]
Instead of saying no solution exists, most people would say a loaf of bread weighs about 500 grams.
Or suppose in the rice problem from The Nine Chapters, we’re given a fourth measurement which finds one plant of each quality together yield 16 units of rice:
\[ x + y + z = 16 \]
The values of \(x, y, z\) we found before imply the right-hand side should be larger than 16 by 0.25. Rather than say there are no solutions, intuitively, we ought to accept that errors have crept in somewhere. With no way of knowing exactly where the errors are, we should nudge the values we found so that they in some sense disagree as little as possible with each measurement.
Under some assumptions, the errors can be modeled with the normal distribution, which turns out to imply that we should use the least squares criterion to find the best solution: we choose the values that minimize the sum of the squares of the discrepancies on the right-hand sides. There is a unique choice that achieves this minimum.
In the one-dimensional case, this is the mean value.
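For the bread example, the least squares criterion minimizes \((x - 499)^2 + (x - 501)^2\); setting the derivative to zero gives

\[ 2(x - 499) + 2(x - 501) = 0 \implies x = 500, \]

the mean of the two measurements.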
Same LHS, Different RHS
Suppose we change the right-hand sides of \(n\) linear equations, but leave the left-hand sides unchanged, that is, the coefficients of each variable remain the same.
Naturally, we could rerun our program. But this seems wasteful as we repeat many operations. Can we reduce the amount of work by saving data from our first run?
Reviewing our code suggests the following caching plan:
- The partition calls may cause reordering of the rows. We record the resulting permutation and pre-apply it to the equations.
- We record all the multipliers found during forward elimination. For row \(k\), we record how much we scaled it before adding it to each of rows \(k+1, …, n\).
- If we opt to normalize, then we record the normalization scalars.
- We record all coefficients just before we perform back substitution, that is, the row echelon form of the matrix except for the last column.
This scheme is known as the LU decomposition. The four caches are called P, L, D, and U, respectively, which stand for permutation, lower triangular, diagonal, and upper triangular. These terms make sense once we learn about matrix multiplication and elementary matrices, though already we know that U is a matrix in row echelon form, so its nonzero numbers can only appear on the main diagonal or above the main diagonal, that is, U is upper triangular.
Then given a new set of values on the right-hand side, we permute them according to P, scale and add them to one another according to L, then back-substitute according to U. If a D cache is present, then we normalize just before back substitution.
We rewrite forwardElim to record information as we perform forward
elimination. We tag each row with an index, so we can later determine how they
were permuted. We also attach a list of scalars to each row: every time we add
a scaled row to another row, we append the multiplier used.
To simplify, we assume a unique solution, that is, there are \(n\) equations with \(n\) unknowns, with no redundant equations. Generalizing the code is left as an exercise.
We skip normalization; implementing a D cache is left as another exercise.
The plu function reorganizes the information recorded by forwardElim,
and prettyPlu displays them in a less unfriendly manner.
forwardElim' [] = []
forwardElim' xs = p : rest where
  (zs, p@((pivot:pivotTail), _):nzs) = partition ((== 0) . head . fst) xs
  rest = first (0:) <$> forwardElim' (elim <$> (zs ++ nzs))
  elim (h:t, (idx, ks)) = let k = h*(1/pivot) in
    (zipWith (-) t $ map (k*) pivotTail, (idx, k:ks))

plu m = (p, reverse <$> l, u) where
  (u, info) = unzip $ forwardElim' $ zip m $ (, []) <$> [0..]
  (p, l) = unzip info
  n = length u

prettyPlu m = let
    (p, l, u) = plu m
    n = length p
  in do
    putStrLn $ "P: " ++ show p
    putStrLn "L:"
    pretty l
    putStrLn "U:"
    pretty u
These functions assume the input represents the left-hand side of the equations.
pretty $ init <$> rice
prettyPlu $ init <$> rice

pretty $ init <$> anotherExample
prettyPlu $ init <$> anotherExample
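To close the loop, here is a sketch of how the cached P, L, and U might be reused to solve for a fresh right-hand side; the name luSolve and this code are mine, not part of the development above, and it assumes a square system with a unique solution, as before.

-- Hypothetical sketch: reuse the caches from plu for a new right-hand side b.
-- Assumes n equations in n unknowns with a unique solution.
luSolve (p, l, u) b = backSubU u c where
  -- P: reorder the new right-hand side into pivot order.
  b' = map (b !!) p
  -- L: replay the recorded multipliers on the right-hand side.
  c = foldl step [] (zip b' l)
  step acc (bi, ks) = acc ++ [bi - sum (zipWith (*) ks acc)]
  -- U: back substitution, dividing by the unnormalized pivot on the diagonal.
  backSubU [] [] = []
  backSubU ((piv:xs):rest) (ci:cs) = ((ci - sum (zipWith (*) xs as)) / piv) : as
    where as = backSubU (tail <$> rest) cs

luSolve (plu (init <$> rice)) (last <$> rice)

This should reproduce the solution \(\frac{37}{4}, \frac{17}{4}, \frac{11}{4}\) found earlier, without repeating forward elimination on the coefficients.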