Orthogonality

Humans have an innate sense of three dimensions of space, so the concepts of length and angle are obvious to us. But how do we generalize lengths and angles to higher dimensions?

One approach is the dot product. For up to three dimensions of space, we can visualize that:

\[ \newcommand{\u}{\mathbf{u}} \newcommand{\v}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \newcommand{\vv}[1]{\mathbf{#1}} \newcommand{\R}{\mathbb{R}} \v \cdot \w = |\v| |\w| \cos \theta \]

The dot product means we can discuss lengths and angles with algebra alone. We are no longer limited to three dimensions, because the dot product generalizes easily to any finite-dimensional vector space: simply multiply corresponding coordinates of the two input vectors and sum the products.
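As a quick sketch of the coordinate formula (this standalone `dot` is our own; the interactive examples later in this article import one from a `Matrix` module):

```haskell
-- Dot product: multiply corresponding coordinates and sum the products.
dot :: Num a => [a] -> [a] -> a
dot v w = sum (zipWith (*) v w)

main :: IO ()
main = print (dot [1, 2, 3] [4, 5, 6])  -- 1*4 + 2*5 + 3*6 = 32
```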

We can generalize further, and do without coordinates completely. Define an inner product on a real vector space \(V\) to be a function \( ( \cdot ) : V \times V \to \mathbb{R} \) satisfying:

  1. Symmetry: \[ \v \cdot \w = \w \cdot \v \]

  2. Linearity in first argument: \[ (a_1 \v_1 + a_2 \v_2) \cdot \w = a_1 (\v_1 \cdot \w) + a_2 (\v_2 \cdot \w) \]

  3. Positive definiteness: \[ \v \ne 0 \implies \v \cdot \v \gt 0 \]

Experience has shown that these are the only properties that matter. A symmetric, bilinear, positive-definite function is enough to derive results that match our understanding of three-dimensional space. Inner products are so general that they even allow us to talk about lengths and angles in some infinite-dimensional vector spaces.

The norm or length of a vector \(\v\) is:

\[ \lVert \v \rVert = \sqrt{\v \cdot \v} \]

Two vectors \(\v, \w\) are orthogonal if \(\v \cdot \w = 0\). An orthogonal set is a set of vectors such that any two are orthogonal. An orthonormal set is an orthogonal set where each vector has unit length.

From human familiarity with three dimensions, it seems any orthogonal set of nonzero vectors must be linearly independent, and if it is not a basis, then we can add more vectors until it is an orthogonal basis of our space. However, since we have committed to an algebraic definition of orthogonality, we must also prove these seemingly basic facts with algebra alone.

Let \( \{\v_1, …, \v_n \} \) be an orthogonal set of nonzero vectors. The equations are nicer if we normalize, so let:

\[ \u_i = \frac{\v_i}{\lVert \v_i \rVert} \]

These vectors span the same space, but are now orthonormal. Let \( \v \) be some vector in their common span, so \(\v = \sum_{i=1}^n a_i \u_i, \) where each \(a_i\) is some scalar.

Then by linearity, and because the \(\u_i\)'s are orthonormal, we find for all \(k\):

\[ \u_k \cdot \v = \u_k \cdot \left( \sum_{i=1}^n a_i \u_i \right) = \sum_{i=1}^n a_i (\u_k \cdot \u_i) = a_k \]

Thus each \(a_k\) is uniquely determined, implying that orthogonal sets of nonzero vectors are linearly independent, and the inner product \(\u_i \cdot \v\) can be thought of as returning a coordinate of \(\v\) with respect to the \(\u_i\):

\[ \v = \sum_{i=1}^n (\u_i \cdot \v) \u_i \]
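To make this concrete, here is a small worked example in \(\R^2\) with the usual dot product (the particular vectors are our own choice). Take:

\[ \u_1 = \tfrac{1}{\sqrt{2}}(1,1), \qquad \u_2 = \tfrac{1}{\sqrt{2}}(1,-1), \qquad \v = (3,1) \]

Then \( \u_1 \cdot \v = \frac{3+1}{\sqrt{2}} = 2\sqrt{2} \) and \( \u_2 \cdot \v = \frac{3-1}{\sqrt{2}} = \sqrt{2} \), and indeed:

\[ (\u_1 \cdot \v) \u_1 + (\u_2 \cdot \v) \u_2 = 2\sqrt{2}\,\u_1 + \sqrt{2}\,\u_2 = (2,2) + (1,-1) = (3,1) = \v \]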

Now let \(\v\) be a vector outside the span of the \(\u_i\), so that the above equality fails, and consider the difference:

\[ \w = \v - \sum_{i=1}^n (\u_i \cdot \v) \u_i \]

When \(n = 1\), the vector \( (\u_1 \cdot \v) \u_1 \) is sometimes called the vector projection of \(\v\) on \( \u_1 \), and \( \w \) the vector rejection of \(\v\) from \( \u_1 \).

These terms generalize to higher \(n\), and we can call \( \sum_{i=1}^n (\u_i \cdot \v) \u_i \) the vector projection of \(\v\) on the subspace generated by the \(\u_i\) and \(\w\) the vector rejection of \(\v\) from the subspace generated by the \(\u_i\).

To make a proof below easier to follow, we remind the reader that the unit vectors \(\u_i\) and norms are mostly a notational convenience, and we can rewrite the equation using only the \(\v_i\) and inner products:

\[ \w = \v - \sum_{i=1}^n \frac{\v_i \cdot \v}{\v_i \cdot \v_i} \v_i \]
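As a sketch in code (with list-based helpers of our own, not the `Matrix` module used later in this article), the rejection formula can be written:

```haskell
dot :: [Double] -> [Double] -> Double
dot v w = sum (zipWith (*) v w)

-- Vector rejection of v from the span of the orthogonal, nonzero vs:
-- subtract each term (v_i . v / v_i . v_i) v_i from v.
reject :: [Double] -> [[Double]] -> [Double]
reject v vs = foldl sub v vs
  where
    sub u vi = zipWith (-) u ((c *) <$> vi)
      where c = dot vi v / dot vi vi  -- coefficient (v_i . v) / (v_i . v_i)

main :: IO ()
main = print (reject [3, 4, 5] [[1, 0, 0], [0, 1, 0]])  -- [0.0,0.0,5.0]
```

Rejecting \((3,4,5)\) from the span of the first two standard basis vectors leaves only the third component, as expected.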

Anyway, as before, for any \(k\):

\[ \u_k \cdot \w = \u_k \cdot \v - \u_k \cdot \left(\sum_{i=1}^n (\u_i \cdot \v) \u_i\right) = \u_k \cdot \v - \u_k \cdot \v = 0 \]

Thus \(\w\) is orthogonal to \(\u_1,…​,\u_n\), and by construction, the span of \( \{\w, \u_1,…​, \u_n \} \) is the same as the span of \( \{\v, \u_1,…​, \u_n \} \).

This yields a non-constructive inductive proof that any \(n\)-dimensional vector space has an orthogonal basis (which can trivially be made orthonormal).

We start from the empty set \(B_0\). For \(k \ge 0 \), if \(B_k\) does not span the entire space, let \(\v\) be some vector lying outside the span, and use the above equation to compute a vector \(\w\) orthogonal to each vector of \(B_k\); since \(\v\) lies outside the span, \(\w \ne 0\). Define \(B_{k+1} = B_k \cup \{ \w \}. \) This process must terminate in exactly \(n\) steps; the set \(B_n\) is an orthogonal basis of the entire space.

Did you spot the non-constructive step? It was when we magically plucked some vector \(\v\) not contained in the span. We shall see that a constructive version of this result is invaluable, for both theory and practice. This is trivial when we’re given some basis of the space: we simply pick successive basis vectors for our \(\v\).

We walk through the details of this important algorithm, known as Gram-Schmidt orthogonalization.

Gram-Schmidt

Let \(\v_1, …​, \v_n\) be a basis of a vector space \(V\).

Define \(\w_1 = \v_1\), and for \(k \ge 1\) define:

\[ \w_{k+1} = \v_{k+1} - \sum_{i=1}^k \frac{\w_i \cdot \v_{k+1}}{\w_i \cdot \w_i} \w_i \]

Then \(\w_1,…​,\w_n\) is the Gram-Schmidt orthogonalization of \(\v_1,…​,\v_n,\) that is, the \(\w_i\) are an orthogonal set of nonzero vectors that span \(V\).

jsEval "curl_module('Matrix.ob')"
import Matrix
gramSchmidt = go [] where
  go acc = \case
    [] -> reverse acc
    -- Subtract from v its projection on each vector found so far.
    v:vt -> go (foldr (zipWith (+)) v (muMul v <$> acc):acc) vt
  muMul v w = (-(mu v w) *) <$> w  -- the negated projection of v on w

mu v w = dot v w / dot w w  -- the Gram-Schmidt coefficient (v . w) / (w . w)

For example:

rice =
  [ [3, 2, 1]
  , [2, 3, 1]
  , [1, 2, 3]
  ]

gramSchmidt rice

Let’s perform some sanity checks. The first vectors of the input and output should match:

g = gramSchmidt rice

head g == head rice

The output vectors should be pairwise orthogonal:

[dot (g!!i) (g!!j) | let n = length g, i <- [0..n-1], j <- [i+1..n-1]]

By convention, we tend to write vectors as column vectors, which just means we sprinkle transpose here and there throughout our code.

Let \(G\) be the matrix of the Gram-Schmidt orthogonalization of \(\v_1, …​, \v_n\), written as column vectors. The following computes the matrix \(H\) satisfying \(G H = A\), where the columns of \(A\) are \(\v_1, …​, \v_n\).

unGramSchmidt vs = transpose $ go <$> vs where
  go v = mu v <$> ws
  ws = gramSchmidt vs

h = unGramSchmidt rice

Let’s check it works as advertised:

matrixMul (transpose g) h == transpose rice

Next, we check by eye that every diagonal entry of \(H\) is 1, and that it is upper triangular:

Matrix h

This implies that the \(k\)th column vector of \(G\) can be written in terms of the first \(k\) columns of \(A\), and furthermore the \(k\)th coefficient is 1, thus each step really did take \(\v_k\) and subtract some linear combination of the previous entries. In particular the first vector is preserved.

Normalizing an orthogonal basis produces an orthonormal basis, which requires taking square roots.

normalize v = (recip (sqrt $ dot v v) *) <$> v

orthonorm = map normalize . gramSchmidt

orthonorm rice

Let \(Q\) be the matrix resulting from normalizing each column vector of \(G\), that is, we compute the Gram-Schmidt orthogonalization of \(\v_1, …​, \v_n\) and normalize each vector.

Like before, we can find an upper triangular matrix \(R\) such that \(QR = A\), where the columns of \(A\) are \(\v_1, …​, \v_n\).

qr vs = (transpose g, matrixMul g vs) where g = orthonorm $ transpose vs

(q, r) = qr rice

Matrix $ q `matrixMul` r
Matrix q
Matrix r

This is known as the QR decomposition of a matrix.
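A useful sanity check on any claimed \(QR\) output: \(Q^\top Q\) should be the identity, up to floating-point error, because the columns of \(Q\) are orthonormal. A standalone sketch with our own list helpers (not this article's `Matrix` module):

```haskell
import Data.List (transpose)

dot :: [Double] -> [Double] -> Double
dot v w = sum (zipWith (*) v w)

matrixMul :: [[Double]] -> [[Double]] -> [[Double]]
matrixMul a b = [[dot row col | col <- transpose b] | row <- a]

-- True when the Gram matrix Q^T Q equals the identity, up to tolerance.
isOrthonormal :: [[Double]] -> Bool
isOrthonormal q = and (zipWith rowClose gram identity)
  where
    gram = matrixMul (transpose q) q
    n = length (head q)
    identity = [[if i == j then 1 else 0 | j <- [1..n]] | i <- [1..n]]
    rowClose r s = and (zipWith (\x y -> abs (x - y) < 1e-9) r s)

main :: IO ()
main = print (isOrthonormal [[0.6, -0.8], [0.8, 0.6]])  -- a rotation: True
```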

Complex Inner Product Spaces

Generalizing the underlying field to the complex numbers runs into a hiccup. Consider one-dimensional space, that is, just \(\mathbb{C}\). We want:

\[ |z|^2 = z \cdot z \]

and we want our definition to work for the classic norm we first learned in school, that is, there should be some choice of (\(\cdot\)) so that:

\[ |a + bi|^2 = (a + bi) \cdot (a + bi) = a^2 + b^2 \]

If we say (\(\cdot\)) is ordinary multiplication (which for the reals, yields the classic norm), we run into \( |i|^2 = i \cdot i = i \times i = -1 \).

We can get rid of the pesky minus sign by conjugating the second argument before multiplication:

\[ |i|^2 = i \cdot i = i \times \bar{i} = i \times (-i) = 1 \]

and this fix works for all complex numbers since \(|z|^2 = z \bar{z}\).
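In Haskell's `Data.Complex`, this is exactly the computation \(z \bar z\) (a small sketch of our own):

```haskell
import Data.Complex

-- |z|^2 = z * conjugate z; the imaginary part always cancels to zero.
normSq :: Complex Double -> Double
normSq z = realPart (z * conjugate z)

main :: IO ()
main = do
  print (normSq (0 :+ 1))  -- |i|^2 = 1.0
  print (normSq (3 :+ 4))  -- |3+4i|^2 = 9 + 16 = 25.0
```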

But this implies (\(\cdot\)) is no longer necessarily symmetric: we treat the second argument differently to the first. There’s nothing we can do about this. We must accept complex numbers are more complex, and slightly loosen our first requirement.

Define an inner product on a complex vector space \(V\) to be a function \( ( \cdot ) : V \times V → \mathbb{C} \) satisfying:

  1. Conjugate symmetry: \[ \v \cdot \w = \overline{\w \cdot \v} \]

  2. Linearity in the first argument: \[ (a_1 \v_1 + a_2 \v_2) \cdot \w = a_1 (\v_1 \cdot \w) + a_2 (\v_2 \cdot \w) \]

  3. Positive definiteness: \[ \v \ne 0 \implies \v \cdot \v \gt 0 \] (note that conjugate symmetry already forces \(\v \cdot \v\) to be real)
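A sketch of such an inner product on \(\mathbb{C}^n\), conjugating the second argument coordinate-wise (the helper name `hdot` is our own):

```haskell
import Data.Complex

-- Hermitian dot product: conjugate the second argument's coordinates.
hdot :: [Complex Double] -> [Complex Double] -> Complex Double
hdot v w = sum (zipWith (\x y -> x * conjugate y) v w)

main :: IO ()
main = do
  let v = [1 :+ 0, 0 :+ 1]  -- (1, i)
      w = [1 :+ 0, 1 :+ 0]  -- (1, 1)
  print (hdot v w)  -- 1 + i
  print (hdot w v)  -- 1 - i: the conjugate, as conjugate symmetry demands
  print (hdot v v)  -- 2: real and positive, as positive definiteness demands
```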

It might seem the only use for these definitions is mathematical fun and games, because we typically model the space around us with \(\mathbb{R}^3\), with imaginary numbers left to the imagination. But it turns out physicists describe reality with complex inner product spaces.


Ben Lynn blynn@cs.stanford.edu 💡