Jordan Canonical Speedrun

Let \( \DeclareMathOperator{\adj}{adj} \DeclareMathOperator{\det}{det} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\dim}{dim} \DeclareMathOperator{\im}{im} \DeclareMathOperator{\char}{char} \newcommand{\A}{\mathbf{A}} \newcommand{\u}{\mathbf{u}} \newcommand{\v}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \newcommand{\R}{\mathbb{R}} \newcommand{\vv}[1]{\mathbf{#1}} T\) be a linear operator on an \(n\)-dimensional vector space \(V\) over \(\mathbb{C}\). We shall show there exists a basis such that the matrix representing \(T\) takes on a special form known as the Jordan normal form, or the Jordan canonical form.

Many authors prove its existence, and in particular, the existence of eigenvalues, after establishing results involving a mysterious figure known as the determinant. Axler, Linear Algebra Done Right, notes this is a "tortuous path", because "determinants are difficult, nonintuitive and often defined without motivation".

Instead, we provide a direct (linear, if you will) proof.

jsEval "curl_module('Matrix.ob')"
import Matrix

Eigenvalues and eigenvectors

If \(\w\) is a nonzero vector such that:

\[ T\w = \lambda \w \]

for some \(\lambda\), then we call \(\w\) an eigenvector, and we call \(\lambda\) its corresponding eigenvalue. The "eigen" prefix also exists in non-mathematical English; see Eigengrau ("intrinsic grey").

Eigenvalues and eigenvectors are intrinsic to a linear operator, and they are analogous to the fixed points of a function: a linear operator preserves certain vectors (eigenvectors) up to multiplication by certain scalars (their corresponding eigenvalues). In some applications, the eigenvalue is 1, so the corresponding eigenvectors are precisely the fixpoints.

In functional programming, the Kleene fixed-point theorem finds the least fixpoint of a function \(f\) by repeatedly applying \(f\) to a certain value. We apply a similar strategy to linear operators.

Let \(\v\) be any nonzero vector, and consider the vectors:

\[ \v, T\v, …, T^n\v \]

These can be viewed as the first \(n + 1\) states of a discrete linear dynamical system with transition matrix \(T\) and initial state \(\v\). Their span is called the order-\((n+1)\) Krylov subspace generated by \(T\) and \(\v\).
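For concreteness, here is a minimal sketch that generates these states, assuming a matrix is given as a list of rows; matVec and krylov are hypothetical helpers, not part of the Matrix module:

-- Apply a matrix (list of rows) to a column vector.
matVec m v = map (sum . zipWith (*) v) m

-- The first n+1 states of the dynamical system: v, Tv, ..., T^n v.
krylov t v n = take (n + 1) $ iterate (matVec t) v

-- E.g. krylov [[0,1],[1,1]] [1,0] 2 == [[1,0],[0,1],[1,1]]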

Any subspace of \(V\) is at most \(n\)-dimensional, so these \(n+1\) vectors must be linearly dependent:

\[ a_0 \v + a_1 T\v + …​ + a_n T^n\v = 0 \]

where not all \(a_k\) are zero. Moreover, some \(a_k\) with \(k \ge 1\) must be nonzero, for otherwise \(a_0 \v = 0\) would force \(a_0 = 0\) as well. Since \(\mathbb{C}\) is algebraically closed, we have:

\[ a_0 + a_1 x + …​ + a_n x^n = c(x - \lambda_1) …​ (x - \lambda_m) \]

where \(m \ge 1\) is the highest power of \(x\) with a nonzero coefficient and \(c \ne 0\). Dividing by \(c\) and applying the corresponding operator to \(\v\):

\[ (T - \lambda_1) …​ (T - \lambda_m) \v = 0 \]

The product maps the nonzero vector \(\v\) to zero, so at least one of the factors is non-invertible. Label it \(T - \lambda\). Its kernel (or nullspace) \(\ker(T - \lambda)\) is nontrivial and is called the eigenspace associated with \(\lambda\), and any nonzero vector \(\w\) of this eigenspace satisfies:

\[ T\w = \lambda \w \]

In other words, given any linear operator on a nonzero space, we can find an eigenvalue.
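To see the recipe in action on a tiny example: take \(T(x, y) = (y, x)\) and \(\v = (1, 0)\). Then \(T^2\v = \v\), that is, \((T - 1)(T + 1)\v = 0\), and the factor \(T + 1\) hands us an eigenvector for eigenvalue 1. A sketch using the hypothetical matVec helper from above (tEx, vEx, wEx are names of my own):

tEx = [[0,1],[1,0]]  -- T(x,y) = (y,x) as a matrix
vEx = [1,0]

-- (T + 1)v survives, and (T - 1) kills it, so it is an eigenvector for 1.
wEx = zipWith (+) (matVec tEx vEx) vEx

matVec tEx wEx == wEx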

This proof shows off the power of algebra. We mostly avoid thinking about linear operators and instead mechanically manipulate symbols according to the ring axioms.

Nilpotence

A linear operator \(N\) is nilpotent if \(N^q = 0\) for some positive integer \(q\). The smallest \(q\) for which this holds is the index of nilpotence.

The zero operator on a space of any dimension is nilpotent with index 1, and no nonzero operator has index 1.

The simplest nontrivial example is the operator on a 2-dimensional space that maps:

\[ (x,y) \mapsto (y,0) \]

which clearly has index 2. We can represent it with the matrix:

\[ \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ \end{bmatrix} \]

We can extend this idea to construct a nilpotent operator of any desired index \(q\):

\[ (x_1, …​, x_{q-1}, x_q) \mapsto (x_2, …​, x_q, 0) \]

This is like repeatedly shifting a \(q\)-bit word by one bit:

  • There exist vectors which remain nonzero after \(q - 1\) iterations.

  • We always reach zero after \(q\) iterations.
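We can watch both properties with a quick list-based sketch of the shift; shift is a hypothetical helper on plain lists, not part of the Matrix module:

-- (x1, ..., xq) -> (x2, ..., xq, 0) on plain lists.
shift xs = tail xs ++ [0]

-- Nonzero after q - 1 = 3 iterations, zero after q = 4:
-- [[1,1,1,1],[1,1,1,0],[1,1,0,0],[1,0,0,0],[0,0,0,0]]
take 5 $ iterate shift [1,1,1,1]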

The \(q\times q\) matrix representing this operator is:

\[ \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots \\ & & & 1 \\ 0 & & & 0 \\ \end{bmatrix} \]

-- The q×q matrix with 1s on the superdiagonal and 0s elsewhere.
nilBlock q = take q $ take q <$> iterate (0:) (0 : 1 : repeat 0)

All entries are zero except for those on the superdiagonal, the name given to the entries directly above the main diagonal. Each entry on the superdiagonal is 1.

The direct sum of nilpotent operators is nilpotent. Thus if we build \(N\) by direct-summing various sizes of the above example, we can describe \(N\) with a list of positive integers \(q_1,…,q_m\) that indicate the dimensions of each subspace in the direct sum.

-- Block-diagonal direct sum of two square matrices.
directSum a b = (++ zb) <$> a <|> (za ++) <$> b where
  za = replicate (length a) 0
  zb = replicate (length b) 0

nilpotent = foldr1 directSum . map nilBlock

Matrix $ nilpotent [1, 1, 1, 1]

Matrix $ nilpotent [4, 2, 1]
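As a sanity check, we can compute the index of nilpotence directly. The helpers below are a sketch of my own (matMul and nilIndex are hypothetical names, not part of the Matrix module), and nilIndex loops forever on a non-nilpotent input:

-- Multiply two matrices given as lists of rows.
matMul a b = [[sum $ zipWith (*) row col | col <- cols] | row <- a]
  where cols = foldr (zipWith (:)) (repeat []) b  -- transpose of b

-- Count the powers N, N^2, ... preceding the first zero power.
nilIndex m = 1 + length (takeWhile (any (any (/= 0))) (iterate (matMul m) m))

nilIndex (nilpotent [4, 2, 1])  -- 4: the largest block size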

There is no other way

We now prove there are no other nilpotent operators: any nilpotent operator has a basis such that the matrix representing the operator takes the above form, namely, the direct sum of matrices whose superdiagonals consist entirely of 1s, and are zero everywhere else.

Let \(N\) be a nilpotent operator of index \(q\) over a vector space \(V\), and let \(\v\) be a vector such that \(N^{q-1} \v \ne 0\). Let

\[ S = \langle \v, N \v, …​, N^{q-1} \v \rangle \]

(the order-\(q\) Krylov subspace generated by \(N\) and \(\v\)). These \(q\) vectors are linearly independent (exercise).

We’re off to a good start. If we choose a basis whose first \(q\) vectors are \(N^{q-1}\v, …, N\v, \v\), in that order, then the first \(q\) columns of the matrix representing \(N\) are:

\[ \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots \\ & & & 1 \\ 0 & & & 0 \\ \vdots & & & \vdots \\ 0 & & & 0 \\ \end{bmatrix} \]

Next, we seek a subspace \(T\) such that \(V = S \oplus T\) with \(T\) invariant under \(N\); that is, we want the decomposition \(S \oplus T\) to reduce \(N\).

The base case \(q = 1\) is trivial. Assume we can do this for nilpotence index \(q - 1\). The space \(\im N\) is invariant under \(N\), and \(N\) restricted to it has nilpotence index \(q - 1\). Let \(S'\) be the span of all but the first of the above vectors:

\[ S' = \langle N \v, …​, N^{q-1} \v \rangle \]

By inductive assumption \(\im N = S' \oplus T' \) for some subspace \(T'\) invariant under \(N\).

We obtained \(T'\) by applying \(N\) to the whole space, so we might guess undoing \(N\) on \(T'\) will finish off the proof. Let \(U\) be the preimage of \(T'\) under \(N\), that is:

\[ U = N^{-1}(T') = \{ \w \mid \w \in V, N \w \in T' \} \]

For any \(\vv{x} \in V\), we have \(N\vv{x} = \vv{s}' + \vv{t}' \) for some \(\vv{s}' \in S', \vv{t}' \in T'\). Since \(\vv{s}'\) is a linear combination of \(N\v, …​, N^{q-1}\v\), we have \(\vv{s}' = N \vv{s}\) for some \(\vv{s} \in S\). Thus:

\[ \vv{t}' = N(\vv{x} - \vv{s}) \]

Thus by definition \(\vv{x} - \vv{s} \in U\), hence \(\vv{x}\) is the sum of a vector in \(U\) and a vector in \(S\). Therefore \(V = S + U\).

Unfortunately, this sum is not direct; for starters, we have \(N^{q-1} \v \in S \cap U\). We need to whittle down \(U\) carefully. To this end, we first show \(S\) and \(T'\) are disjoint.

Suppose \(\vv{x} \in S \cap T'\). Then \(\vv{x} \in S\) implies \(N\vv{x} \in S'\), and since \(T'\) is invariant under \(N\), we find \(N \vv{x} \in T'\). But \(S'\) and \(T'\) are disjoint, hence \(N \vv{x} = 0\), which in turn implies \(\vv{x}\) is some scalar multiple of \(N^{q-1}\v\) (exercise), so \(\vv{x}\in S'\). Again, since \(S'\) and \(T'\) are disjoint we must have \( \vv{x} = 0 \), that is, as desired, \(S \cap T' = 0\).

Since \(T'\) and \(S \cap U\) are disjoint subspaces of \(U\), we can take any subspace \(R\) satisfying:

\[ R \oplus T' \oplus (S \cap U) = U \]

Let \(T = R \oplus T'\). Then \(V = S \oplus T\) (checking that this last sum is direct is a worthwhile exercise), and \(T\) is invariant under \(N\): both \(R\) and \(T'\) lie in \(U\), so \(N(T) \subseteq N(U) = T' \subseteq T\).

We also prove uniqueness. Suppose we have \(V = S_1 \oplus T_1 = S_2 \oplus T_2\), where \(S_1\) and \(S_2\) are \(q\)-dimensional Krylov subspaces on which \(N\) has index \(q\), and each of the subspaces \(S_1, S_2, T_1, T_2\) is invariant under \(N\). Let the index of nilpotence of \(N\) on the spaces \(T_1, T_2\) be \(r_1, r_2\) respectively. Then by invariance:

\[ N^{r_1} V = N^{r_1} S_1 \oplus N^{r_1} T_1 = N^{r_1} S_2 \oplus N^{r_1} T_2 \]

Taking dimensions yields:

\[ (q - r_1) + 0 = (q - r_1) + \dim N^{r_1} T_2 \]

so \(\dim N^{r_1} T_2 = 0\), that is, \(N^{r_1} T_2 = 0\), giving \(r_2 \le r_1\). Swapping the roles of the two decompositions gives \(r_1 \le r_2\). Therefore \(r_1 = r_2\).

Thus given any nilpotent operator \(N\), we can repeatedly decompose spaces to find \(m\) vectors \(\v_1, …​, \v_m\) and positive integers \(q_1 \ge …​ \ge q_m\) such that:

\[ \begin{align} &\v_1, N\v_1, …​, N^{q_1 - 1}\v_1, \\ &\v_2, N\v_2, …​, N^{q_2 - 1}\v_2, \\ &\vdots \\ &\v_m, N\v_m, …​, N^{q_m - 1}\v_m \\ \end{align} \]

is a basis of \(V\), and \(N^{q_1}\v_1 = …​ = N^{q_m}\v_m = 0\).

Moreover, the integers \(m, q_1, …​, q_m\) are isomorphism invariants for \(N\), that is, we see the same integers from a nilpotent operator \(N'\) if and only if \(N' = T^{-1} N T\) for some isomorphism \(T\).

With a suitable ordering of this basis, the matrix representing \(N\) takes the form we described. QED

It may help to walk through an example. Consider \(N : \R^4 \to \R^4 \) given by:

\[ (a, b, c, d) \mapsto (0, a, 0, c) \]

This has nilpotence index 2. Choose \(\v = (1,0,0,0)\). Then:

  • \(N \v = (0,1,0,0)\)

  • \(S\) is the space \((a, b, 0, 0)\)

  • \(S'\) is the space \((0, b, 0, 0)\)

  • \(\im N\) is the space \((0, b, 0, d)\)

  • a possible \(T'\) is the space \((0,2d,0,d)\); this is the trivial case because \(N (\im N) = \{0\}\), so we can pick any \(T'\) such that \(\im N = S' \oplus T'\)

  • \(U\) is the space \((2c,b,c,d)\)

  • \(S \cap U\) is the space \((0, b, 0, 0)\)

  • a possible \(R\) is the space \((2c, 0, c, 0)\), though we could gratuitously replace the zeros with arbitrary multiples of \(c\)

  • \(T = R \oplus T'\) is the space \((2c, 2d, c, d)\)

where "space" is short for "space of vectors of the form". As promised, \(N\) is reduced by \(S \oplus T\).

Nilpotent plus invertible

Let \(T\) be a linear operator. We prove \(T\) is the direct sum of a nilpotent operator and an invertible operator.

Consider the kernels of the powers of \(T\):

\[ \ker(T) \subseteq \ker(T^2) \subseteq …​ \]

Finite-dimensionality implies these inclusions cannot always be strict, so let \(q\) be the smallest integer for which \(\ker(T^q) = \ker(T^{q+1})\). Then \(\ker(T^{q+k}) = \ker(T^q)\) by induction (exercise).
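We can watch the kernels stabilize on the operator n4 from the worked example, using a hypothetical rank helper (by rank-nullity, the kernel dimension of an \(n\)-dimensional operator is \(n\) minus its rank):

-- Fraction-free Gaussian elimination: the rank of a list of rows.
rank rows
  | null rows || null (head rows) = 0
  | otherwise = case break ((/= 0) . head) rows of
      (zs, [])     -> rank (map tail zs)  -- first column is all zeros
      (zs, r : rs) ->
        let elim s = tail (zipWith (\x y -> head r * x - head s * y) s r)
        in  1 + rank (map elim (zs ++ rs))

-- Kernel dimensions of n4^1, n4^2, n4^3: [2,4,4], stabilizing at q = 2.
[4 - rank (foldr1 matMul (replicate k n4)) | k <- [1, 2, 3]]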

Now suppose \(\v \in \ker(T^q) \cap \im(T^q)\). Then \(T^q \v = 0\) and \(\v = T^q \w\) for some \(\w\), hence \(T^{2q} \w = 0\). But \(\ker(T^q) = \ker(T^{2q})\) thus \(\v = T^q \w = 0\), that is, the spaces are disjoint. The rank-nullity theorem implies:

\[ V = \ker (T^q) \oplus \im (T^q) \]

By definition \(T\) is nilpotent on \(\ker(T^q)\) with index \(q\).

Lastly, suppose \(\v = T^q \w\) for some \(\w\) and \(T \v = 0\). Then \(T^{q+1} \w = 0\), so \(\w \in \ker(T^{q+1}) = \ker(T^q)\), hence \(\v = T^q \w = 0\). Thus \(T\) is injective on \(\im (T^q)\), and an injective operator on a finite-dimensional space is invertible.

Jordan normal form

Let \(T\) be a linear operator. If the space is zero-dimensional, then there is nothing to do. Otherwise, let \(\lambda_1\) be an eigenvalue of \(T\), and write \(T - \lambda_1\) as a direct sum of a nilpotent operator and an invertible operator:

\[T - \lambda_1 = N_1 \oplus (T_2 - \lambda_1) \]

where \(N_1\) is nilpotent and \(T_2 - \lambda_1\) is invertible. As the eigenspace must be nonzero, the second subspace is strictly smaller than the entire space. We have:

\[T = (N_1 + \lambda_1) \oplus T_2 \]

If \(T_2\) acts on a nonzero space then it has an eigenvalue \(\lambda_2\), and again we apply the theorem:

\[T = (N_1 + \lambda_1) \oplus (N_2 + \lambda_2) \oplus T_3 \]

where \(N_2\) is nilpotent and \(T_3 - \lambda_2\) is invertible. This process must eventually terminate for finite-dimensional spaces, since each step moves to a strictly smaller subspace:

\[T = (N_1 + \lambda_1) \oplus …​ \oplus (N_k + \lambda_k) \]

where each of \(N_1, …, N_k\) is a nilpotent operator on a nonzero subspace, and \(\lambda_1 ,…, \lambda_k\) are the eigenvalues of \(T\). It turns out \(\lambda_1, …, \lambda_k\) must be distinct, but we have no use for this fact, for even if they weren’t, we could merge identical eigenvalues because the direct sum of nilpotent operators is nilpotent.

Thus with a suitable basis the matrix representing \(T\) is the direct sum of Jordan blocks, where a Jordan block is a square matrix such that:

  • Each diagonal entry is the same value \(\lambda\).

  • Each superdiagonal entry is 1.

  • All other entries are 0.

This is called Jordan normal form. Look ma, no determinants!

-- The n×n matrix with a on the diagonal and 1s on the superdiagonal.
jordanBlock (a, n) = take n $ take n <$> iterate (0:) (a : 1 : repeat 0)

jordan = foldr1 directSum . map jordanBlock

Matrix $ jordan [(6, 7)]
Matrix $ jordan [(6, 3), (7, 2), (7, 1), (7, 1)]

Some define Jordan normal form using the subdiagonal instead of the superdiagonal, namely, the entries directly below the main diagonal.
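For instance, transposing a block moves the 1s to the subdiagonal; transposeM is a hypothetical helper, assuming square input:

-- Transpose a square matrix given as a list of rows.
transposeM = foldr (zipWith (:)) (repeat [])

Matrix $ transposeM (jordanBlock (6, 3))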

We could follow the above to find the Jordan normal form of the transition matrix of the rabbits example from a previous section, but it is better to wait until we’ve encountered the characteristic equation.


Ben Lynn blynn@cs.stanford.edu 💡