Algebraic Thinking

We’ve explored the linear in linear algebra. Now for the algebra.

Recall a function \( \newcommand{\A}{\mathbf{A}} \newcommand{\u}{\mathbf{u}} \newcommand{\v}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \newcommand{\vv}[1]{\mathbf{#1}} \newcommand{\R}{\mathbb{R}} \DeclareMathOperator{\im}{im} \DeclareMathOperator{\End}{End} f: \R^n → \R^m \) is linear if:

\[ \begin{align} f(\v) + f(\w) &= f(\v + \w) \\ f (a \v) &= a f(\v) \end{align} \]

These equations make sense whenever the domain and range are equipped with addition and scalar multiplication, suggesting that we can generalize linearity beyond tuples of reals.

While we could develop theory using any binary operation for addition and any notion of scalar multiplication, experience has shown it is fruitful to set the following guidelines.

Firstly, we allow any field \(K\) in place of \(\mathbb{R}\). Considering rings instead of fields leads to modules, but that’s a story for another day. For our applications, we only need \(K = \mathbb{Q}\), \(\mathbb{R}\), or \(\mathbb{C}\).

Next, let \(V\) be the domain. If we’re using additive notation, then \(V\) better be an abelian group, that is, addition should be commutative, associative, and possess an identity and inverses. In symbols, for all \(\u, \v, \w \in V\):

\[ \begin{align} \v + \w &= \w + \v \\ (\u + \v) + \w &= \u + (\v + \w) \\ \v + 0 &= \v \\ \exists (-\v) . \v + (-\v) &= 0 \end{align} \]

Lastly, we nail down the rules for scalar multiplication. In group theory, if \(n \in \mathbb{Z}\), then \(n \v\) represents \(\v + …​ + \v\) where there are \(n\) copies of \(\v\). This integer scaling behaves nicely: when \(n = 1\) we get \(\v\); it associates with integer multiplication; it distributes over integer addition and group addition.

We require these properties to hold when we replace \(\mathbb{Z}\) with \(K\), that is, for all \(a, b \in K\) and for all \(\v, \w \in V\):

\[ \begin{align} 1 \v &= \v \\ a (b \v) &= (a b) \v \\ a (\v + \w) &= a \v + a \w \\ (a + b)\v &= a\v + b\v \\ \end{align} \]

Any set \(V\) satisfying the above conditions is a vector space over \(K\), and an element of \(V\) is a vector. If \(W\) is also a vector space over the same field \(K\), then linear maps from \(V\) to \(W\) make sense.

In sum, vector spaces are like free abelian groups, except instead of integer coefficients our scalars hail from some field \(K\). Some of the theorems are similar (compare the rank of a free abelian group with the dimension of a vector space), though the proofs have a different flavour since some integer tricks only apply to abelian groups.
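To a functional programmer, these axioms suggest an interface. Here is a minimal sketch, assuming made-up names (VectorSpace, vzero, vadd, scale) rather than anything from a standard library; the axioms become laws the programmer must respect:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
-- A vector space over the scalar type k. The laws (commutativity,
-- associativity, distributivity, and so on) are not enforced by the compiler.
class VectorSpace k v | v -> k where
  vzero :: v
  vadd  :: v -> v -> v
  scale :: k -> v -> v

-- Pairs of reals form a vector space over the reals.
instance VectorSpace Double (Double, Double) where
  vzero = (0, 0)
  vadd (a, b) (c, d) = (a + c, b + d)
  scale k (a, b) = (k * a, k * b)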

Beyond Tuples

So far, we have only looked at tuples of field elements. Now that we have loosened up and defined a vector space abstractly, can we find other kinds of mathematical objects that are vector spaces?

Many of our results so far implicitly depend on properties of tuples of numbers. Abstraction requires us to be more explicit: exactly which properties of tuples do we use? Thus we’re led to the following definitions.

The span of a set of vectors \(\{\v_1,…​,\v_n\}\) is denoted by angle brackets and consists of all linear combinations of the elements:

\[ \langle \v_1,…,\v_n \rangle = \left\{ \sum_{i=1}^n a_i \v_i \;\middle|\; a_1, …, a_n \in K \right\} \]
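For tuples, linear combinations are easy to compute. A quick Haskell sketch, where linComb is a made-up helper name and vectors are represented as lists of numbers:

-- Assumes the two lists are nonempty and of equal length.
-- For example, linComb [2, 3] [[1, 0], [1, 1]] == [5, 3].
linComb as vs = foldr1 (zipWith (+)) $ zipWith (\a v -> map (a *) v) as vs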

A set of vectors \(\{\v_1,…​,\v_n\}\) is linearly independent if every nontrivial linear combination of them is nonzero:

\[ \sum_{i=1}^n a_i \v_i = 0 \implies \forall i . a_i = 0 \]
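For example, \((1, 0)\) and \((1, 1)\) are linearly independent: \(a(1,0) + b(1,1) = (a + b, b)\) is zero only when \(b = 0\) and then \(a = 0\). By contrast, \(\{(1,0), (2,0)\}\) is linearly dependent, since \(2(1,0) - (2,0) = 0\).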

Steinitz exchange lemma: Let \(V\) be a vector space. Suppose the vectors of the set \(S = \{\v_1,…,\v_n\}\) span \(V\), and suppose the vectors \(\w_1,…,\w_m \in V\) are linearly independent. Then \(n \ge m\).

Proof: Since \(S\) spans \(V\), the vector \(\w_1\) is some linear combination of the vectors of \(S\), and the combination is nontrivial because \(\w_1 \ne 0\) (a linearly independent set cannot contain the zero vector). Relabel so that the coefficient of \(\v_1\) is nonzero:

\[ \w_1 = a \v_1 + …​ \]

where \(a \ne 0\). Then \(\v_1\) is some linear combination of \(\w_1, \v_2, …​, \v_n\):

\[ \v_1 = a^{-1} \w_1 + …​ \]

hence the set \(S' = \{\w_1, \v_2, …​, \v_n\}\) also spans \(V\). Here we witness the utility of taking the reciprocal of a coefficient, a superpower unavailable in abelian groups.

If \(m = 1\) then we have shown \(n \ge m\). Otherwise write \(\w_2\) as a linear combination of the vectors of \(S'\). For some \(i \ge 2\), the coefficient of \(\v_i\) must be nonzero, for otherwise \(\w_2\) would be a multiple of \(\w_1\), contradicting linear independence. Relabel so that \(\v_2\) has a nonzero coefficient, then like before, the set \(S'' = \{\w_1, \w_2, \v_3, …, \v_n\}\) spans \(V\).

We iterate this argument until we have replaced \(m\) of the elements of \(S\) with \(\w_1, …​, \w_m\), implying that \(n \ge m\). QED.

A basis of a vector space \(V\) is a linearly independent set of vectors that spans \(V\).

The known is finite

With the above definitions, the set of polynomials over some field can be considered a vector space. One possible basis is \(\{1, x, x^2, …​\}\).

Let \(D\) denote differentiation and \(S\) denote integration where the choice of constant is zero. Then both \(S\) and \(D\) are linear maps, and we have \(D S = \vv{I}\). Yet \(S D \ne \vv{I}\). This odd behaviour is impossible with matrices arising from solving linear systems.
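To make this concrete, here is a small Haskell sketch (my own code; Poly, diffP, and integP are made-up names), representing a polynomial by its list of coefficients, lowest degree first:

type Poly = [Rational]

-- D: differentiate term by term: a0 + a1 x + a2 x^2 + … becomes a1 + 2 a2 x + …
diffP :: Poly -> Poly
diffP [] = []
diffP (_:cs) = zipWith (*) [1..] cs

-- S: integrate term by term, choosing the constant of integration to be zero.
integP :: Poly -> Poly
integP cs = 0 : zipWith (/) cs [1..]

Here diffP (integP p) returns p for every p, but integP (diffP [5, 3]) drops the constant term, returning the coefficients of \(3x\) alone.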

We dodge quirks like this by restricting ourselves to finite-dimensional vector spaces: a vector space is finite-dimensional if a finite basis exists. Then the Steinitz exchange lemma implies all bases of a finite-dimensional vector space \(V\) have the same size, which we call the dimension of \(V\) and denote \(\dim V\).

In other words, we simplify by banning infinite-dimensional vector spaces. These days, such beasts are considered part of functional analysis, rather than linear algebra.

Since a finite-dimensional vector space \(V\) over a field \(K\) has some basis \(\v_1,…​,\v_n\), where \(n = \dim V\), every element of \(V\) can be written uniquely as a linear combination of these \(n\) vectors, and hence can be represented with an element of \(K^n\), that is, a tuple of elements of \(K\).

To answer an earlier question, there are indeed exotic mathematical objects that are vector spaces, such as polynomials. But we declare them to be beyond our scope, so we’re back to tuples.

Deja Vu

Why bother with all this song and dance? We’ve already been studying tuples of field elements!

Getting back to where we started is in fact a strength of abstract algebra. The goal is to discover the minimal set of rules; the core mathematical truth. It’s a good sign when a handful of simple principles lead us to familiar territory.

For example, Galois invented group theory when he studied permutations. Today, we define groups to be sets equipped with an associative binary operation with identity and inverse. By Cayley’s Theorem, a group is really a bunch of permutations anyway! Nonetheless, the abstract algebraic viewpoint is invaluable for studying groups, as permutations are often distracting details.

Likewise for linear algebra. Matrices can be distracting. We already saw this with matrix multiplication, where proving associativity from the multiplication formula directly is painful, yet is immediate once we move to a higher level and think of a matrix as a function.

We could all use a little change

Abstraction opens our eyes to what a basis can be. Earlier, we assumed matrices used coordinates with respect to the basis that Fermat and Descartes would approve of, where the axes are perpendicular to each other and each basis vector has unit length. While this standard basis occupies a special place in our hearts, abstraction pinpoints what we really care about: linearly independent vectors that span the space.

This frees our thinking, encouraging us to choose a basis that suits the problem at hand. Indeed, we shall see a big part of linear algebra is about picking a basis so that a given linear transformation can be represented with a simple matrix.

Without abstraction, coordinates with respect to the standard basis pervade the discussion, obscuring the underlying ideas. (Exercise: which linear operators have the same matrix representations no matter which basis we choose?)

For example, consider the 2D linear transformation \(T\) that:

  • maps \( (1,0) \) to \( (2,0) \) (horizontally scales by 2)

  • maps \( (1,1) \) to \( (-1,-1) \) (reflects the \(y = x\) diagonal)

A nice basis for this transformation is \( \{ (1, 0), (1, 1) \} \), as we can immediately write down the matrix representing it:

\[ A = \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix} \]

which is easy to apply. For instance, to apply \(A\) to \( (6, 7) \) we double the first coordinate and negate the second: \( (12, -7) \).

Although we can think using any basis we like, in practice, we need the standard basis. For example, in computer graphics, objects are described with Cartesian coordinates, and plotting points requires screen coordinates.

Converting back to standard coordinates is easy. For example, \( (6,7) \) becomes:

\[ 6 (1, 0) + 7 (1, 1) = (13, 7) \]

and \( (12, -7) \) becomes:

\[ 12 (1, 0 ) - 7 (1, 1) = (5, -7) .\]

What about the other way?

Let \(P : \R^2 → \R^2 \) be the function that converts standard coordinates to coordinates with respect to our specially chosen basis. This change of basis is an invertible linear transformation (exercise), and its inverse is simply the basis vectors written as columns (exercise):

\[ P^{-1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

The inverse of this matrix is:

\[ P = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \]
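For 2×2 matrices there is a closed form for the inverse, which is how the matrix above can be found by hand. A tiny Haskell sketch (inv2 is my own helper; it assumes the determinant is nonzero):

-- Inverse of a 2x2 matrix via the adjugate, assuming a*d - b*c is nonzero.
inv2 [[a, b], [c, d]] = [[d / det, -b / det], [-c / det, a / det]]
  where det = a * d - b * c

Applying inv2 to the matrix \(P^{-1}\) above recovers \(P\).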

Then to find the matrix representing \(T\) with respect to the standard basis, we multiply the matrices corresponding to the following steps:

  1. Convert standard coordinates to special coordinates with \(P\).

  2. Apply \(A\), the matrix that is nice and simple because of our carefully chosen basis.

  3. Convert special coordinates to standard coordinates with \(P^{-1}\).

That is, we want \( P^{-1} A P \).

import Data.List (transpose)
dot xs ys = sum $ zipWith (*) xs ys
-- Each row of the product is the dot product of a row of A with each column of B.
matrixMul aRows bRows = go <$> aRows where
  go row = dot row <$> transpose bRows

-- The matrix of T in standard coordinates: B = P^{-1} A P.
b = foldr1 matrixMul [pinv, a, p] where
  pinv = [[1,1], [0,1]]
  p = [[1,-1], [0,1]]
  a = [[2,0], [0,-1]]

b

We confirm this matrix works as advertised on a couple of points:

matrixMul b [[1], [0]]
matrixMul b [[1], [1]]
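If all went well, these evaluate to [[2],[0]] and [[-1],[-1]], matching \(T(1,0) = (2,0)\) and \(T(1,1) = (-1,-1)\).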

Algebraically, we just have a product of three matrices. But each matrix serves a different geometric role. The change-of-basis matrix \(P\) is a passive transformation, and so is its inverse \(P^{-1}\), while the scaling and reflection matrix \(A\) is an active transformation; it does the actual work.

Two matrices \(A, B\) are called similar if there exists an invertible matrix \(P\) such that:

\[ B = P^{-1} A P. \]
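For instance, the matrices \(A\) and \(B = P^{-1} A P\) of our example are similar: they represent the same transformation \(T\), just with respect to different bases.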

Give us a ring

Consider the linear operators on some vector space \(V\), also called the endomorphisms of \(V\) and denoted \(\End V\). Then \(\End V\) is a vector space, with addition and scalar multiplication defined in a way that is natural to a functional programmer familiar with the reader monad:

\[ \begin{align} f + g & : \v \mapsto f(\v) + g(\v) \\ a f & : \v \mapsto a f(\v) \\ \end{align} \]

import Control.Applicative (liftA2)
-- Pointwise scaling and addition of functions, via the reader applicative.
endoScale a = ((a*) .)
endoAdd = liftA2 (+)

But wait, there’s more! Define the product of two linear operators \(f, g : V → V\) by:

\[ f g = f \circ g \]

-- Multiplication of linear operators is function composition.
endoMul = (.)
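A quick usage sketch (my own example): viewing \(\R\) as a one-dimensional vector space over itself, the maps (2*) and (3*) are linear operators on it, and we can combine them:

endoAdd (2*) (3*) 10    -- 2*10 + 3*10 = 50
endoMul (2*) (3*) 10    -- 2*(3*10) = 60
endoScale 4 (3*) 10     -- 4*(3*10) = 120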

Then \(\End V\) is a ring. (Exercise: which functions play the role of zero and one?) Thus we can apply all our knowledge about rings to linear operators.

As always, we can switch gears at any time and think about matrices again: for all \(n\), we find the \(n\times n\) matrices over \(K\) have a ring structure. This matrix ring is denoted \({\rm Mat}_n(K)\) or \({\rm M}_n(K)\) or \(K^{n\times n}\).

Subspaces

In life, understanding the parts helps us understand the whole. In algebra, this heuristic becomes delightfully recursive, as we seek substructures that are miniature versions of the original. When studying sets, we study subsets. When studying groups, we study subgroups. And when studying vector spaces, we study subspaces.

Let \(V\) be a vector space over a field \(K\). Then a nonempty subset \(U\) of \(V\) is a subspace of \(V\) if for all \(a, b \in K\):

\[ \v, \w \in U \implies a\v + b\w \in U \]
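For example, the \(x\)-axis \(\{(x, 0) : x \in \R\}\) is a subspace of \(\R^2\): any linear combination of vectors with zero second coordinate again has zero second coordinate.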

Theorem: Let \(U\) be a subspace of \(V\). Then \(\dim U \le \dim V\).

Proof: We start from the empty set and extend it to a basis by repeatedly adding vectors.

Let \(n = \dim V\). If \(U = \{ 0 \}\) then \(U\) is generated by the empty set, thus \(\dim U = 0 \le n\).

Otherwise there exists some nonzero vector \(\u_1 \in U\), which also lies in \(V\) so \(n \ge 1\). If \(U = \langle \u_1 \rangle\) then \(\dim U = 1 \le n.\)

Otherwise we iterate: there exists some \(\u_2 \in U \setminus \langle \u_1 \rangle\); since \(\u_2\) lies outside \(\langle \u_1 \rangle\), the vectors \(\u_1, \u_2\) are linearly independent. If \(U = \langle \u_1, \u_2 \rangle\) then \(\dim U = 2 \le n\).

By the Steinitz exchange lemma, \(V\) contains at most \(n\) linearly independent vectors, so this process halts after at most \(n\) steps, showing \(\dim U \le n\). QED.

More generally, instead of the empty set, we can start from some set of linearly independent vectors and extend it to a basis, a useful trick we shall need later.

Let \(U, V\) be subspaces of a vector space. Their intersection \(U \cap V\) is a subspace (exercise). Their union might not be a subspace. Instead we define their sum \(U + V\) as the smallest subspace containing their union, namely:

\[ U + V = \{ \u + \v : \u \in U, \v \in V \} \]

The subspace \(U \cap V\) must contain the zero vector. If it only contains the zero vector, namely \(U \cap V = \{0\}\), then we say \(U\) and \(V\) are disjoint.

Divide et impera

Sometimes, the whole is precisely the (direct) sum of its parts. In such cases, we can completely understand a big thing by studying a bunch of small things.

Let \(U, V\) be two subspaces of \(W\). Suppose every \(\w \in W\) can be uniquely written in the form:

\[ \w = \u + \v \]

where \(\u \in U, \v \in V\). Then we write:

\[ W = U \oplus V \]

and say \(W\) is the direct sum of \(U\) and \(V\). (Without the uniqueness condition, we just have \(U + V\).)

For example, we can rephrase an earlier sentence as follows. If \(U\) is a subspace of \(W\), and we extend some basis of \(U\) to a basis of \(W\) with the vectors \(\v_1, …, \v_m\), then \(W = U \oplus \langle \v_1, …, \v_m \rangle\).

Suppose \(W = U \oplus V\). Let \(T_U, T_V\) be linear operators on \(U, V\) respectively. Then their direct sum \(T = T_U \oplus T_V\) is the map given by:

\[ \w = \u + \v \mapsto T_U \u + T_V \v \]

where \(\u \in U, \v \in V, \w \in W\). This is well-defined since the decomposition \(\w = \u + \v\) is unique. We say \(T\) is reduced by \(U \oplus V\). Exercise: show that we can find a basis of \(W\) such that the matrix representing \(T\) has the form:

\[ \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \]

where \(A, B\) are square matrices with side lengths \(\dim U, \dim V\) respectively, and each \(0\) represents a zero matrix.

Reduction is a divide-and-conquer strategy: we handle a linear map on a big space by breaking it down into linear maps on smaller spaces.

Let \(T : W → W\) be a linear operator. A subspace \(V \subseteq W\) is invariant under \(T\) if \(T V \subseteq V\). Exercise: show we can find a basis of \(W\) such that the matrix representing \(T\) has the form:

\[ \begin{bmatrix} A & C \\ 0 & B \end{bmatrix} \]

where \(A, B\) are square matrices with side lengths \(\dim V, \dim W - \dim V\), and \(C\) is some matrix, and \(0\) is a zero matrix.

We see \(T\) is reduced by \(U \oplus V\) if and only if both \(U\) and \(V\) are invariant under \(T\).

Rank-Nullity Theorem

Theorem: Let \(T : V → W\) be a linear transformation. Then:

\[ \dim \im(T) + \dim\ker(T) = \dim V \]

The two terms on the left-hand side are called the rank and nullity of \(T\), respectively.
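For example, the map \(T : \R^3 → \R^2\) given by \(T(x, y, z) = (x, y)\) has image \(\R^2\) and kernel \(\langle (0, 0, 1) \rangle\), so its rank is 2 and its nullity is 1, and indeed \(2 + 1 = \dim \R^3\).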

Proof: We can easily check \(\im(T)\) and \(\ker(T)\) are vector spaces.

Let \(\v_1,…,\v_k\) be a basis of \(\ker(T)\). Extend it to a basis of \(V\) with vectors \(\w_1, …, \w_m\), so that \(k + m = \dim V\). By linearity:

\[ \langle T \v_1, …​, T \v_k, T\w_1, …​, T\w_m \rangle = \im(T) \]

By the definition of \(\ker(T)\):

\[ \langle T\w_1, …​, T\w_m \rangle = \im(T) \]

Thus \(\dim \im(T) \le m \). We prove we must have equality by showing these vectors are also linearly independent. Suppose some linear combination of them is zero, that is:

\[ \sum_{i=1}^m a_i T\w_i = 0 \]

By linearity:

\[ \sum_{i=1}^m a_i \w_i \in \ker(T) \]

Since \(\ker(T) = \langle \v_1, …, \v_k \rangle\), we can write

\[ \sum_{i=1}^m a_i \w_i = \sum_{j=1}^k b_j \v_j \]

for some \(b_j \in K\). But \(\v_1, …, \v_k, \w_1, …, \w_m\) form a basis of \(V\), so the relation \(\sum_i a_i \w_i - \sum_j b_j \v_j = 0\) forces every coefficient to be zero; in particular \(a_i = 0\) for all \(i\). Hence the vectors \(T\w_1, …, T\w_m\) are linearly independent, \(\dim \im(T) = m\), and \(\dim \im(T) + \dim\ker(T) = m + k = \dim V\). QED.


Ben Lynn blynn@cs.stanford.edu