Import:
= The Shortest Vector Problem =

Pick a polynomial with rational coefficients that has a real root. Enter the
root and the degree of the polynomial, and click the button. Our program will
try to guess your polynomial!

+++<div id="magic"></div><br><div id="out"></div>+++

(The answer for the default values should be +\(x^3 + 2x^2 + 3x + 4\)+.)

How does this work? See László Lovász, _An Algorithmic Theory of Numbers,
Graphs, and Convexity_. We use the LLL algorithm, which finds an approximate
solution to the _shortest vector problem_.

*Shortest Vector Problem*: Given a basis \( \newcommand{\vv}[1]{\mathbf{#1}}
\vv{b}_1,...,\vv{b}_n\) of \(\mathbb{R}^n\), find a non-zero element of the
additive subgroup \(L\) generated by the basis (known as a _lattice_) with the
smallest norm. That is, if we let \(\vv{B}\) denote the matrix of the basis
vectors, then:

\[ L = \{ \vv{B} \vv{a} \mid \vv{a} \in \mathbb{Z}^n \} \]

and we wish to find a nonzero element \(\vv{v} \in L\) such that
\( \lVert \vv{v}\rVert \) is minimal.

By norm, we mean the Euclidean norm, that is,

\[ \lVert \vv{v}\rVert = \sqrt{v_1^2 + ... + v_n^2} \]

where the \(v_i\) are the coordinates of \(\vv{v}\) with respect to some
orthonormal basis.

This problem is hard (NP-hard; Lovász suspects it is NP-complete). Indeed, how
do we even prove a shortest vector exists in the first place? Perhaps we are
doomed to only find vectors whose norms are arbitrarily close to, but strictly
greater than, some infimum?

== Easy does it ==

As Frege wrote: "When a problem appears unsolvable in its full generality, one
should temporarily restrict it; perhaps it can then be conquered by a gradual
advance."
Accordingly, we first consider orthogonal lattices.
Let \(\vv{b}_1,...,\vv{b}_n\) be an orthogonal basis sorted in order of
increasing norm. Then \(\vv{b}_1\) is a shortest vector.

More rigorously, consider any nonzero lattice vector \(\sum_i a_i \vv{b}_i\)
with \(a_1,...,a_n \in \mathbb{Z}\), and pick \(k\) such that \(a_k \ne 0\).
By orthogonality (Pythagoras):

\[ \lVert \sum_i a_i \vv{b}_i \rVert^2
= \sum_i \lVert a_i \vv{b}_i \rVert^2
\ge \lVert a_k \vv{b}_k \rVert^2 \]

And since \(a_k\) is a nonzero integer:

\[ \lVert \vv{b}_1 \rVert \le \lVert \vv{b}_k \rVert \le \lVert a_k \vv{b}_k \rVert \]

Thus \(\vv{b}_1\) is a shortest vector.
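
In code, this is a one-liner (a minimal sketch; the function and sample basis
below are our own illustration, and the input basis is assumed orthogonal):

[ ]:
-- For an orthogonal basis, a basis vector of least squared norm is a
-- shortest nonzero lattice vector.
shortestOrthogonal bs = snd $ minimum [(sum $ zipWith (*) v v, v) | v <- bs]

shortestOrthogonal [[0,3,0],[2,0,0],[0,0,5]]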

== Two's company ==

There is little to be said in zero or one dimensions. Two dimensions is a big
step up in difficulty.

Inspired by the orthogonal case, we strive to make the basis as orthogonal as
possible: we find some integer \(r\) so that the angle between \(\vv{b}_1\) and:

\[ \vv{b}'_2 = \vv{b}_2 + r \vv{b}_1 \]

is as close as possible to a right angle. Observe that the basis
\(\vv{b}_1, \vv{b}'_2\)
generates the same lattice as \(\vv{b}_1, \vv{b}_2\). Hopefully, we
can then extend our reasoning above enough to find a shortest vector.

It's easiest to explain if we first rotate and scale so that one basis vector
is \(\vv{b}_1 = (1\enspace 0)\). This transforms our task to adding some
integer to the first coordinate of \(\vv{b}_2\) to bring it as close as
possible to zero. We simply subtract an integer closest to the first
coordinate of \(\vv{b}_2\). Let \(\vv{b}'_2 = (x\enspace y)\) be the result.
Then we have:

\[-\frac{1}{2} \le x \le \frac{1}{2}\]

We could insist on strictness in one of the inequalities, but this has no
effect on our arguments. For any integer \(n\):

\[ \lVert n \vv{b}_1 + \vv{b}'_2 \rVert^2 = (n + x)^2 + y^2 \ge x^2 + y^2 = \lVert \vv{b}'_2 \rVert^2 \]

thus \(\vv{b}'_2\) is the shortest vector of this form.

Suppose:

\[ \lVert \vv{b}_1 \rVert \le \lVert \vv{b}'_2 \rVert \]

that is:

\[ 1 \le \sqrt{x^2 + y^2} \]

Since \(x^2 \le \frac{1}{4}\) we must have \(y^2 \ge \frac{3}{4} \).

Thus for any integers \(r, s\) such that \(|s| \ge 2\):

\[ \lVert r \vv{b}_1 + s \vv{b}'_2 \rVert^2 \ge s^2 y^2 \ge 3 \gt 1 = \lVert \vv{b}_1 \rVert^2 \]

Therefore, if \( \lVert \vv{b}_1 \rVert \le \lVert \vv{b}'_2 \rVert \), then
\(\vv{b}_1\) is a shortest vector: lattice vectors \(r \vv{b}_1 + s \vv{b}'_2\)
with \(s = 0\) are integer multiples of \(\vv{b}_1\), those with \(|s| = 1\) are
at least as long as \(\vv{b}'_2\) by the above, and those with \(|s| \ge 2\) are
longer still. Undoing the scaling and rotation yields our original \(\vv{b}_1\).

But what if \( \lVert \vv{b}_1 \rVert \gt \lVert \vv{b}'_2 \rVert \)? In this
case, we rename \( \vv{b}'_2, \vv{b}_1 \) to \( \vv{b}_1, \vv{b}_2 \)
respectively, undo the scaling and rotation, and go back to the first step.

This shortest vector algorithm for two dimensions resembles Euclid's algorithm,
and it is known as Lagrange's algorithm, Gauss's algorithm, or the
Lagrange-Gauss algorithm. It seems precedence should go to
https://arxiv.org/pdf/2106.14665[Lagrange, who wrote about it in 1773]; he
viewed it as a problem about quadratic forms.

Lagrange's algorithm is a few lines.

[ ]:
-- Round a rational to a nearest integer; ties round down.
nearestZ (a :% b) = if 2*r > b then q + 1 else q where
  (q, r) = divMod a b

dot = (sum .) . zipWith (*)  -- dot product
quadrance v = dot v v        -- squared Euclidean norm

-- Subtract the nearest integer multiple of u from v, shrinking the
-- component of v along u to at most half the length of u.
lagrangeReduce [u, v] = [u, zipWith (-) v $ (m*) <$> u] where
  m = nearestZ $ dot v u % dot u u

lagrange b
  | quadrance u <= quadrance v = u
  | otherwise = lagrange [v, u]
  where
  [u, v] = lagrangeReduce b

lagrange [[123,456],[123,455]]
lagrange [[123,456],[60,240]]
However, the complexity of this algorithm is not obvious, and indeed, how do we
know it even terminates? It turns out to be a polynomial-time algorithm:

 * https://prof.msoltys.com/wp-content/uploads/2015/04/soltys-gauss11.pdf[a proof by Michael Soltys].
 * https://perso.ens-lyon.fr/damien.stehle/downloads/lowdim-final.pdf[Nguyen and
Stehlé walk through two proofs]; they also generalize Lagrange's algorithm to
efficiently solve the cases \(n = 3, 4\).

== Close enough ==

As the number of dimensions \(n\) increases, our troubles grow exponentially,
though it could be worse. We should be grateful that
https://cseweb.ucsd.edu/~daniele/LatticeLinks/SVP.html[exponential-time
algorithms exist].

Nevertheless, we can still obtain useful results in polynomial time. Given a
basis of \(\mathbb{Q}^n\), the LLL algorithm, a sort of sloppy Euclid's
algorithm for more than two things, returns a lattice vector at most
+\(2^{(n-1)/2}\)+ times as long as a shortest vector. Moreover, for a fixed
dimension, it implies a polynomial-time algorithm for finding a shortest
vector.

Again, we strive for orthogonality. We can convert any basis
\(\vv{b}_1,...,\vv{b}_n\) of \(\mathbb{R}^n\) into an orthogonal basis
+\(\vv{b}^*_1,...,\vv{b}^*_n\)+ inductively. Start with:

+\[\vv{b}^*_1 = \vv{b}_1\]+

then iteratively set +\(\vv{b}^*_k\)+ to be the rejection of \(\vv{b}_k\) on
the subspace generated by the basis vectors found so far, that is, the part of
\(\vv{b}_k\) perpendicular to each of +\(\vv{b}^*_1,...,\vv{b}^*_{k-1}\)+
which we can compute by subtracting away projections:

+\[
\vv{b}^*_k =
\vv{b}_k - \sum_{i=1}^{k-1} \frac{\vv{b}_k \cdot \vv{b}^*_i}{\vv{b}^*_i \cdot \vv{b}^*_i } \vv{b}^*_i
\]+

The result is known as the
https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process[_Gram-Schmidt
orthogonalization_] of the original basis. Although the \(\vv{b}^*_i\) generate
a different lattice in general, the LLL algorithm uses the Gram-Schmidt
orthogonalization to find a lower bound on the length of a shortest vector, and
also to define an \(n\)-dimensional analogue of a basis that is as orthogonal
as possible.

[ ]:
-- Map the basis to rationals, then orthogonalize: each new vector is the
-- current one minus its projections onto the vectors already processed.
gramSchmidt = go [] . map (map (% 1)) where
  go acc = \case
    [] -> reverse acc
    h:t -> go (foldr (zipWith (+)) h (muMul h <$> acc):acc) t
  muMul w v = (mu w v *) <$> v

-- Negated Gram-Schmidt coefficient of w with respect to v.
mu w v = -(dot w v / dot v v)
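As a quick check, the vectors returned should be pairwise orthogonal (the
sample basis is arbitrary):

[ ]:
-- All dot products of distinct Gram-Schmidt vectors should vanish.
gs = gramSchmidt [[1,2,3],[100,101,102],[55,-10,-20]]
[dot u v | (i, u) <- zip [0..] gs, (j, v) <- zip [0..] gs, i < j]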
*Lemma*: Let \(\vv{v}\) be a nonzero vector in the lattice
generated by \(\vv{b}_1,...,\vv{b}_n\). Then:

\[ \lVert \vv{v} \rVert \ge \min\{\lVert \vv{b}^*_1 \rVert,...,\lVert \vv{b}^*_n \rVert \} \]

In particular, this bound applies to the shortest vector.

*Proof*: Let:

+\[ \vv{v} = \sum_{i=1}^k a_i \vv{b}_i
= \sum_{i=1}^{k-1} a_i \vv{b}_i
+ a_k \vv{b}_k
\]+

where \(a_k \ne 0\).

With respect to the Gram-Schmidt orthogonalization of the basis:

+\[ \vv{v} = \sum_{i=1}^{k-1} a^*_i \vv{b}^*_i + a^*_k \vv{b}^*_k
\]+

Taking the inner product with \(\vv{b}^*_k\) in both equations implies:

\[ a_k \vv{b}_k \cdot \vv{b}^*_k = a^*_k \vv{b}^*_k \cdot \vv{b}^*_k \]

since by construction \(\vv{b}^*_k\) is perpendicular to the first \(k - 1\)
vectors of either basis. Also by construction \( \vv{b}_k \cdot
\vv{b}^*_k = \vv{b}^*_k \cdot \vv{b}^*_k \) hence:

\[ a^*_k = a_k \]

and in particular a nonzero integer. Thus:

+\[ \lVert\vv{v}\rVert^2 = \sum_{i=1}^{k} |a^*_i|^2 \lVert\vv{b}^*_i\rVert^2
\ge |a_k|^2 \lVert\vv{b}^*_k\rVert^2 \ge \lVert\vv{b}^*_k\rVert^2
\]+

which implies the lemma.
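
For instance, a spot check with the basis from earlier (here
\(\vv{b}_2 - \vv{b}_3\) stands in for an arbitrary nonzero lattice vector):

[ ]:
-- The least squared Gram-Schmidt norm lower-bounds the squared norm of
-- any nonzero lattice vector, such as b2 - b3.
minimum $ quadrance <$> gramSchmidt [[1,2,3],[100,101,102],[55,-10,-20]]
quadrance $ map (% 1) $ zipWith (-) [100,101,102] [55,-10,-20]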

Write:

+\[ \mu_{ji} = - \frac{\vv{b}_j \cdot \vv{b}^*_i}{\vv{b}^*_i \cdot \vv{b}^*_i }
\]+

for the coefficients in the Gram-Schmidt orthogonalization. If all of them
satisfy:

\[ |\mu_{ji}| \le \frac{1}{2} \]

then we call \(\vv{b}_1,...,\vv{b}_n\) a _weakly reduced_ basis.

We can convert any lattice basis to a weakly reduced basis with the same
Gram-Schmidt orthogonalization by repeating the following up to
\( \binom{n}{2} \) times: choose \(j\) as large as possible such that
\(|\mu_{kj}| \gt \frac{1}{2} \) for some \(k\). Then replace \(\vv{b}_k\) with
\(\vv{b}_k + m \vv{b}_j\), where \(m\) is an integer closest to \(\mu_{kj}\)
(recall our \(\mu_{kj}\) carries a minus sign, so this cancels the offending
component).

Our weak reduction routine expects a lattice basis along with its Gram-Schmidt
orthogonalization. The return value also takes this form, as this will be
easier to work with later. For testing, we add a wrapper that hides
Gram-Schmidt business from the caller.

[ ]:
-- Walk the basis from last vector to first: each basis vector we pass
-- tweaks every later vector so its mu coefficient lands in [-1/2, 1/2].
lllWeakReduce' = go [] . reverse where
  go as = \case
    [] -> as
    (h@(b, bo):bs) -> go (h:map (first tweak) as) bs where
      tweak a = zipWith (+) a $ (m*) <$> b where
        m = nearestZ $ -(dot (map (% 1) a) bo / dot bo bo)
lllWeakReduce b = map fst $ lllWeakReduce' $ zip b $ gramSchmidt b

lllWeakReduce [[1,2,3],[100,101,102], [55,-10,-20]]
This generalizes the orthogonal-as-possible basis \(\vv{b}_1, \vv{b}'_2\) from
our study of the two-dimensional case.

Back then, we wanted the first vector to be no longer than the second, and if
this wasn't the case, then we interchanged the vectors and repeated the steps.
This time, we merely desire that:

+\[ \lVert \vv{b}^*_i \rVert^2 \le \frac{4}{3} \lVert \vv{b}^*_{i+1} + \mu_{(i+1),i}\vv{b}^*_i \rVert^2 \]+

for all \(i \in [1..n-1]\). A _reduced_ basis is a weakly reduced basis
satisfying this condition, known as the _Lovász condition_.

As you might have guessed, we can turn any basis into a reduced basis by
first weakly reducing it, then, if necessary, interchanging any \(\vv{b}_i,
\vv{b}_{i+1}\) violating this inequality and going back to the first step.

Like our weak reduction function, our reduction function also expects the
Gram-Schmidt orthogonalization along with the lattice basis. With a couple of
vector additions, we update the Gram-Schmidt basis after an interchange. Both
weak reduction and reduction need the Gram-Schmidt basis, so we maintain it and
pass it around rather than recompute it every time.

[ ]:
-- Scan a weakly reduced basis; on a Lovász violation, swap the offending
-- pair, patch their Gram-Schmidt vectors, and restart from weak reduction.
lll' = go [] . lllWeakReduce' where
  go acc = \case
    [] -> fst <$> reverse acc
    h@(b, bo):rest -> case acc of
      [] -> go [h] rest
      (a, ao):at -> let
        rhs = (mu (map (% 1) b) ao *) <$> ao
        bo' = zipWith (-) bo rhs  -- Gram-Schmidt vector of b after the swap
        ao' = zipWith (+) ao $ (mu (map (% 1) a) bo' *) <$> bo'
        in if 3 * quadrance ao <= 4 * quadrance (zipWith (+) bo rhs)
          then go (h:acc) rest
          else lll' $ reverse at ++ ((b, bo') : (a, ao') : rest)
lll b = lll' $ zip b $ gramSchmidt b

lll [[1,2,3,4],[-5,6,7,8],[9,-10,11,-12],[13,-14,-15,-16]]
As you might have also guessed, the 4/3 factor is important for proving that
this algorithm terminates in polynomial time for rational inputs, which we will
do soon, and also for estimating how close we are to the true answer.

+\[
\begin{align}
\lVert \vv{b}^*_i \rVert^2
   & \le \frac{4}{3} \lVert \vv{b}^*_{i+1} + \mu_{(i+1),i}\vv{b}^*_i \rVert^2
\\ & = \frac{4}{3} \lVert \vv{b}^*_{i+1} \rVert^2 + \frac{4}{3} \mu_{(i+1),i}^2 \lVert \vv{b}^*_i \rVert^2
\\ & \le \frac{4}{3} \lVert \vv{b}^*_{i+1} \rVert^2 + \frac{1}{3} \lVert \vv{b}^*_i \rVert^2
\end{align}
\]+

where the last step is a consequence of the basis being weakly reduced.
Then:

+\[
\lVert \vv{b}^*_{i+1} \rVert^2 \ge \frac{1}{2} \lVert \vv{b}^*_i \rVert^2
\]+

By induction:

+\[
\lVert \vv{b}_1 \rVert^2 \le 2^{n-1} \min_i \lVert \vv{b}^*_i \rVert^2
\]+

Thus by the lemma, for any nonzero vector \(\vv{v}\) on the lattice, including
a shortest vector:

+\[
\lVert \vv{b}_1 \rVert \le 2^{(n-1)/2} \lVert \vv{v} \rVert
\]+

Now we address the running time when given a rational basis.
Define:

\[
D(\vv{b}_1, ..., \vv{b}_n) = \prod_{i=1}^n \lVert \vv{b}^*_i \rVert^{n-i}
\]

Our weak reduction step above leaves the Gram-Schmidt orthogonalization
unchanged, so also leaves this quantity unchanged.

Our reduction step shrinks this quantity by a factor of at least
\(2/\sqrt{3}\).
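
To see this in action, here is a sketch (the helper potential2 below is our
own; we square \(D\) to stay within the rationals) comparing the quantity
before and after running the algorithm:

[ ]:
-- D^2 = product of squared Gram-Schmidt norms raised to n-1, n-2, ..., 0.
potential2 b = product $ zipWith (^) (quadrance <$> gramSchmidt b) [n-1,n-2..0]
  where n = length b

potential2 [[1,2,3,4],[-5,6,7,8],[9,-10,11,-12],[13,-14,-15,-16]]
potential2 $ lll [[1,2,3,4],[-5,6,7,8],[9,-10,11,-12],[13,-14,-15,-16]]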

We can scale up the input vectors so all the coordinates are integers, so that
we exclusively work with matrices over the integers during the algorithm. It is
known that the https://en.wikipedia.org/wiki/Gram_matrix#Gram_determinant[Gram
determinant] of a set of vectors is the square of the volume of the
parallelotope they form, hence:

\[
\prod_{i=1}^k \lVert \vv{b}^*_i \rVert^2 =
\begin{vmatrix}
\vv{b}_1 \cdot \vv{b}_1 & ... & \vv{b}_1 \cdot \vv{b}_k \\
\vdots & \ddots & \vdots \\
\vv{b}_k \cdot \vv{b}_1 & ... & \vv{b}_k \cdot \vv{b}_k \\
\end{vmatrix}
\ge 1
\]

where the inequality is justified by noting every entry of the determinant is
an integer, and the volume is nonzero because the \(\vv{b}_i\) are linearly
independent, so the determinant is a positive integer. This implies:

\[
D(\vv{b}_1, ..., \vv{b}_n) \ge 1
\]

Lastly, if the input is the matrix \(\vv{A} = (\vv{a}_1, ..., \vv{a}_n)\), then
we have the crude bound:

\[
D(\vv{a}_1, ..., \vv{a}_n) \le (\max_i \lVert \vv{a}_i \rVert)^\binom{n}{2}
\]

Therefore, after \(m\) steps:

\[
1 \le D(\vv{b}_1,...,\vv{b}_n) \le \left( \frac{\sqrt{3}}{2} \right)^m
(\max_i \lVert \vv{a}_i \rVert)^\binom{n}{2}
\]

Rearranging, and since \( 1 / \lg (2/\sqrt{3}) = 4.8188... \lt 5 \):

\[
m \le 5 \binom{n}{2}\max_i \lg \lVert \vv{a}_i \rVert
\le 5 \binom{n}{2} \langle \vv{A} \rangle
\]

where \(\langle \vv{A} \rangle\) denotes the number of bits needed to represent
\(\vv{A}\). In other words, the LLL algorithm terminates after a polynomial
number of steps. (Lovász has 6 here; I may have blundered along the way.)

It remains to show the arithmetic operations stay within polynomial bounds. We
skip this, though we mention that it appears this is the only part of the proof
that depends on \( |\mu_{ji}| \le 1/2 \) for all valid indices; above, we only
need this inequality when \(j = i + 1.\)

== Get Shorty ==

Let \(\vv{b}_1, ..., \vv{b}_n\) be a reduced basis of \(\mathbb{R}^n\),
and \(\vv{v} = \sum_i a_i \vv{b}_i\) be a vector on the lattice it generates.
Then it can be shown for all \(i\):

\[
|a_i| \lt 3^n \lVert \vv{v} \rVert / \lVert \vv{b}_1 \rVert
\]

Thus if we view \(n\) as a fixed constant, then we can find a shortest vector
by trying all possible +\(a_i \in [-3^n..3^n] \)+. This also implies a shortest
vector must exist.

[ ]:
-- Brute force: LLL-reduce, then try every coefficient vector in the box
-- [-3^n..3^n]^n, keeping a nonzero lattice vector of least quadrance.
brutalSVP = go . lll where
  go b = snd $ foldr1 min $ filter ((/= 0) . fst) $ zip (quadrance <$> vs) vs where
    n = length b
    vs = lincom b <$> replicateM n [-3^n..3^n]

-- Integer linear combination of basis vectors.
lincom b as = foldr1 (zipWith (+)) $ zipWith scale b as where
  scale v a = map (a*) v
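Brute force is hopeless beyond low dimensions, but in two dimensions we can
afford a quick sanity check against Lagrange's algorithm; both lines should
print a shortest vector, possibly differing in sign:

[ ]:
brutalSVP [[123,456],[123,455]]
lagrange [[123,456],[123,455]]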
== Relationship Advice ==

Let \(\vv{x} \in \mathbb{R}^n\) be nonzero. An integer relation between the
coordinates of \(\vv{x}\) exists if for some nonzero \(\vv{a} \in
\mathbb{Z}^n\) we have:

\[ \vv{a} \cdot \vv{x} = 0 \]

We can find a candidate \(\vv{a}\) from any rational approximation of
\(\vv{x}\) by running the LLL algorithm on the matrix:

\[
\pmatrix{
Qx_1 & Qx_2 & ... & Qx_n \\
1   & 0   & ... & 0 \\
0   & 1   & ... & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0   & 0   & ... & 1
}
\]

Here, we only have \(n\) column vectors, but each vector has \(n + 1\)
coordinates. This is no problem because the LLL algorithm works just fine even
when the number of vectors is less than the number of dimensions. We can
imagine filling in the empty spots at the end of the list with phantom vectors
that satisfy the Lovász condition.
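
Here is a minimal sketch of this construction (the function intRel and the
fixed scale \(Q = 10^6\) are our own choices, reusing the cells above). Since
\(1/3 = 4/12\), we expect a short vector encoding that relation, up to sign:

[ ]:
-- Candidate integer relation: run LLL on the columns (Q x_i, e_i), then
-- drop the first coordinate of the shortest reduced basis vector.
intRel xs = tail $ head $ lll $ zipWith (:) scaled im where
  scaled = map (nearestZ . (1000000*)) xs
  n = length xs
  im = map (take n) $ take n $ iterate (0:) $ 1 : repeat 0

intRel [1 % 2, 1 % 3, 1 % 12]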

We choose \(Q\) large enough so we can bound \( |\vv{a} \cdot \vv{x}| \).
Lovász leaves the details as an exercise. I took a stab at it: we can add
another vector:

\[\vv{y} = \pmatrix{M^2 n \\ 0 \\ \vdots \\ 0}\]

where \(M\) is the largest of the \(Qx_i\), which exceeds 1 for sufficiently
large \(Q\). Then the vector \(\vv{b}_1\) returned by the LLL algorithm cannot
involve any multiples of \(\vv{y}\), because it is shorter than any of the
original column vectors. Roughly speaking, if we assign \(\vv{y}\) a nonzero
coefficient and try to make up for it, then one of the other coefficients must
be at least \(M\), which leads to a linear combination that must be longer than
the original vectors. The determinant of the lattice is \(M^2 n\), which
we can use to estimate bounds.

Lovász is more explicit in the case of an approximation \(\alpha\) to a real
root \(r\) of a polynomial. We set:

\[x_1 = 1, x_2 = r, ..., x_n = r^d\]

where \(d\) is the degree of the minimal polynomial, so \(n = d + 1\). Then it
can be shown that if \(Q\) is large enough and \(\alpha\) is close enough to
\(r\), then the LLL algorithm returns the coefficients of the minimal
polynomial.

In particular, it can be shown that taking +\(Q = 2^{3k^3}\)+ suffices,
where \(k\) is the input size of the polynomial, provided that
+\(|r - \alpha| < 2^{-4k^3}\)+. (The code below simply fixes \(Q = 10^6\),
which is enough for the widget's default example.)

[ ]:
-- Guess a minimal polynomial: run LLL on the columns (Q r^i, e_i) with
-- Q = 10^6; the head of the reduced basis encodes the coefficients.
minpoly n r = head $ lll $ zipWith (:) scaled im where
  scaled = map (nearestZ . (1000000*)) $ iterate (r*) 1
  im = map (take n) $ take n $ iterate (0:) $ 1 : repeat 0
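For example, feeding it a rational approximation of the default root from the
widget should recover the coefficients of \(x^3 + 2x^2 + 3x + 4\), constant
term first and possibly negated, prefixed by a small rounding residue:

[ ]:
minpoly 4 ((-1650629) % 1000000)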
Lastly, we write a widget to show off this function:

[ ]:
showsXPow k
  | k == 0 = id
  | k == 1 = ('x':)
  | otherwise = ("x<sup>"++) . shows k . ("</sup>"++)

showTerm n k
  | n == 0 = id
  | n == 1 = ('+':) . showsXPow k
  | n == -1 = ('-':) . showsXPow k
  | n < 0 = shows n . showsXPow k
  | otherwise = ('+':) . shows n . showsXPow k

magic = do
  (a, b) <- break (== '.') <$> jsEval "root.value;"
  deg <- fromIntegral . readInteger <$> jsEval "degree.value;"
  let
    r = case b of
      [] -> readInteger a % 1
      _:d -> let
        w = readInteger a
        tens = 10^length d in (w * tens + signum w * readInteger d) % tens
    mp = tail $ minpoly (deg + 1) r
    s = foldr ($) "" $ reverse $ zipWith showTerm mp [0..deg]
  jsEval $ "out.innerHTML = '"++s++"';"

jsEval [r|
magic.innerHTML="Root: <input id='root' type='text' size='8' value='-1.650629'> Degree: <input id='degree' type='text' size='2' value='3'> <button id='magic_button'>Polynomial</button>";
magic_button.addEventListener("click", (ev) => {
  repl.run("chat", ["Main"], "magic");
});
|]
_Computing in Science & Engineering_ named integer relation detection as one of
the https://www.computer.org/csdl/magazine/cs/2000/01/c1022/13rRUxBJhBm[top 10
algorithms of the 20th century]. They credit the 1979 algorithm of Ferguson and
Forcade, though it seems
https://en.wikipedia.org/wiki/Integer_relation_algorithm[the LLL algorithm was
the first to be published with complete proofs]. Ferguson and others improved
the original approach, and now
https://www.davidhbailey.com/dhbpapers/pslq-comp-alg.pdf[PSLQ is the most
widely used algorithm for integer relation detection].

Ben Lynn blynn@cs.stanford.edu 💡