Quicksort

Quicksort looks nice in Haskell:

import Data.List (partition)

qsort = \case
  [] -> []
  x:xs -> qsort as ++ (x : qsort bs) where (as, bs) = partition (< x) xs

qsort [2,7,1,8]

The input list should be shuffled first, as this code performs poorly on lists that are already sorted or almost sorted, as well as on their reversals. However, the lack of shuffling works in our favour, because we’ll use permutations to analyze the behaviour of quicksort.

For simplicity, assume the list items are the naturals \([1..n]\). This means we’re ignoring cases involving duplicate values, which could be handled with a little more code, as sketched below; in any case, the results are similar provided duplicates are few.
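
For example, here’s a sketch of a three-way split that groups keys equal to the pivot with the pivot itself; we won’t need it below.

qsort3 = \case
  [] -> []
  x:xs -> qsort3 as ++ (x : es) ++ qsort3 bs
    where
    (as, rest) = partition (< x) xs
    -- Keys equal to the pivot need no further sorting.
    (es, bs) = partition (== x) rest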

The following code traces the comparisons that result from a given permutation:

qtrace xs = go [xs] where
  go = \case
    [] -> []
    xs:xss -> case xs of
      [] -> go xss
      -- The head acts as the pivot: record a comparison against each
      -- remaining element, then push both halves onto the worklist.
      x:xt -> map (x,) xt ++ go (as:bs:xss)
        where (as, bs) = partition (< x) xt

Sorting [1,2,3] takes 2 or 3 comparisons depending on whether our first pivot choice was the middle value:

qtrace [1,2,3]
qtrace [2,1,3]

With 7 items, we need between 10 and 21 comparisons:

qtrace [4,2,1,3,6,5,7]
qtrace [7,6,5,4,3,2,1]
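
Counting the traced pairs confirms the extremes:

length . qtrace <$> [[4,2,1,3,6,5,7], [7,6,5,4,3,2,1]]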

In practice, instead of shuffling the input list, we can choose each pivot at random, which is less work: the input is never rearranged up front, and we only spend randomness on the sublists we actually partition.
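
Here’s a sketch of that idea, assuming System.Random is available; the rest of this post sticks with the pure version above.

import System.Random (randomRIO)

qsortR :: Ord a => [a] -> IO [a]
qsortR [] = pure []
qsortR xs = do
  -- Pick a random pivot index instead of always taking the head.
  i <- randomRIO (0, length xs - 1)
  let x = xs !! i
      (as, bs) = partition (< x) (take i xs ++ drop (i + 1) xs)
  (++) <$> qsortR as <*> ((x :) <$> qsortR bs)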

What did you expect?

Let \(a_n\) be the expected number of comparisons when quicksorting \(n\) distinct items. When \(n = 0\) there is nothing to be done. Otherwise each \(k \in [1..n]\) has a \(1/n\) probability of being chosen as the first pivot. We compare the pivot against the other \(n-1\) numbers, and recurse on both sides. Hence:

\[ \begin{align} a_0 &= 0 \\ a_n &= n-1 + \frac{2}{n} \sum a_k \mid k \in [0..n-1] \end{align} \]

brute = \case
  0 -> 0
  n -> n - 1 + 2 * sum [brute k | k <- [0..n-1]] / n

brute <$> [0..5]

We could memoize to reduce the exponential complexity to polynomial, but we can do even better.
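
For completeness, here’s the memoized version as a quick sketch: a lazy self-referential list turns the exponential recursion into quadratic work.

-- memoA!!n is the expected number of comparisons for n items.
memoA = 0 : [fromIntegral n - 1 + 2 * sum (take n memoA) / fromIntegral n | n <- [1..]]

take 6 memoA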

Two numbers \(i < j\) are compared if and only if the first of \([i..j]\) to be chosen as a pivot is either \(i\) or \(j\), which occurs with probability:

\[ \frac{2}{j - i + 1} \]

By linearity of expectation:

\[ a_n = \sum_{1\le i < j \le n} \frac{2}{j - i + 1} = 2\left( \frac{n-1}{2} + \frac{n-2}{3} + \dots + \frac{1}{n} \right) \]
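
Evaluating this double sum directly reproduces the output of brute:

pairSum n = sum [2 / (j - i + 1) | i <- [1..n-1], j <- [i+1..n]]

pairSum <$> [0..5]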

We have:

\[ \begin{align} a_n &= 2 \sum \frac{n-k+1}{k} \mid k \in [2..n] \\ &= 2 \sum \frac{n+1}{k} - 1 \mid k \in [2..n] \\ &= 2 \left( \sum \frac{n+1}{k} - 1 \mid k \in [1..n] \right) - 2 n \\ &= 2 (n+1) H_n - 4 n \end{align} \]

where \(H_n\) is the \(n\)th harmonic number:

\[ H_n = \sum \frac{1}{k} \mid k \in [1..n] \]

harmonic n = sum [1 / k | k <- [1..n]]

a n = 2 * (n + 1) * harmonic n - 4 * n

a <$> [1..10] :: [Rational]
a <$> [1..10] :: [Double]
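
Averaging the comparison counts over all permutations agrees with the formula:

import Data.List (permutations)

avgComps n = fromIntegral (sum $ length . qtrace <$> permutations [1..n])
           / fromIntegral (product [1..n])

(avgComps 5, a 5) :: (Double, Double)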

Exercise: show that the generating function for this sequence is:

\[ \frac{2}{(1-x)^2} \left(\left(\int \frac{dx}{1-x}\right) - x\right) \]

We reuse code from our generating function notes to compute the first few coefficients:

jsEval "curl_module('../haskell/GenFun.ob')"
import GenFun

take 12 . toList $ (2/(1 - x 1)^2) * (int (1/(1 - x 1)) - x 1)

The asymptotics follow from a well-known estimate of the harmonic numbers:

\[ H_n = \log n + \gamma + \frac{1}{2n} - \epsilon_n \]

where \(\log\) is the natural logarithm, \(\gamma = 0.5772\dots\) is the Euler-Mascheroni constant, and \( 0 \le \epsilon_n \le 1/(8n^2) \). This estimate implies the expected number of comparisons performed by quicksort is at most \(2 n \log n\) for large \(n\), which is \(O(n \log n)\).

For fun let’s derive a tighter approximation.

\[ \begin{align} a_n &= 2 (n+1) H_n - 4 n \\ &= 2 (n+1) (\log n + \gamma - \epsilon_n) + 1 + \frac{1}{n} - 4 n \end{align} \]

To evaluate this we need a logarithm, which we build from scratch: a Machin-like identity assembles \(\log 2\) from the Taylor series of \(\mathrm{acoth}\), and for other arguments we scale into \((1/2, 1]\) by powers of 2 before summing a series for \(-\log(1 - x)\).

acoth k = sum $ take 16 $ zipWith (*) denoms $ iterate ((r*r)*) r
  where
  r = 1/k
  -- acoth k = 1/k + 1/(3k^3) + 1/(5k^5) + ...
  denoms = recip <$> iterate (2+) 1.0

log2 = 18*acoth 26 - 2*acoth 4801 + 8*acoth 8749

log = go 0 where
  go m x
    | x > 1 = go (succ m) (x / 2)       -- log x = m*log 2 + log (x/2^m)
    | 2 * x <= 1 = go (pred m) (x * 2)
    | otherwise = m * log2 - tay (1 - x)
  -- tay x = x + x^2/2 + x^3/3 + ... = -log (1 - x)
  tay x = sum $ take 20 $ zipWith (/) (iterate (x*) x) [1..]
gamma = 0.57721566490153286

ish n = 2*(n+1)*(log n + gamma) + 1 + 1/n - 4*n

simple n = 2*n*log n
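
How much better? Compare the exact values against both approximations:

[(a n, ish n, simple n) | n <- [10, 100, 1000]] :: [(Double, Double, Double)]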

So far so good, but what if quicksort performs far better than expected half the time, but far worse otherwise? Such unpredictability may disqualify it from many applications. We’d like the number of comparisons to rarely stray far from its expected value.

This is Problem 4.14 from Motwani and Raghavan’s Randomized Algorithms, namely, to show that quicksort "runs in time \(O(n \log n)\) with high probability". It appears immediately after the section that explains Chernoff-like bounds that work even when dependencies exist.

Martingales and Tail Inequalities

Consider the random variable whose distribution maps each value to its probability as follows:

\[ E[X | Y = y] \mapsto \Pr[Y = y] \]

In the literature, this random variable is denoted \(E[X|Y]\), but despite its brevity and cute appearance, we avoid it because of a type mismatch with our fundamental definitions, where only propositions may appear on the right-hand side of a vertical bar. Instead, we’ll write:

\[ y \mapsto E[X | Y = y] \]

It’s clear that:

\[ E[ y \mapsto E[X | Y = y] ] = E[X] \]

while in conventional notation, \(E[E[X|Y]] = E[X]\) seems mysterious.
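
For example, flip two fair coins, and let \(X\) count the heads while \(Y\) records the first flip. Then \(E[X | Y = y] = y + 1/2\), and averaging over \(y \in \{0, 1\}\) recovers \(E[X] = 1\).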

A sequence of real-valued random variables \(Z_0, Z_1, \dots\) is a martingale if:

\[ t \mapsto E[Z_{n+1} | (Z_n, \dots, Z_0) = t] = Z_n \]

where \(t\) is an appropriate tuple.
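
For example, a gambler’s fortune after a sequence of fair bets is a martingale: given the history so far, the expected fortune after the next bet equals the current fortune.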

Doob noted that given any sequence of random variables \(X_0, X_1, \dots\), and any other random variable \(Q\), we can construct a martingale \(Z_0, Z_1, \dots\) simply by defining:

\[ Z_n = t \mapsto E[Q | (X_0, \dots, X_n) = t] \]

Define:

\[ Y_k = Z_k - Z_{k-1} \]

Then if \(a_k \le Y_k \le b_k\), we have the Hoeffding-Azuma inequality:

\[ \Pr[Y_1 + \dots + Y_n \ge x] \le \exp\left( -\frac{2x^2}{(b_1 - a_1)^2 + \dots + (b_n - a_n)^2} \right) \]

There is a two-sided form which bounds the absolute value of the sum of the \(Y_k\) at the cost of doubling the right-hand side, but we only care when quicksort performs far more comparisons than expected; it’s nice when it performs far fewer!

Do the obvious?

The placement of Problem 4.14 suggests we can make some obvious substitutions into the above formulas to derive an exponentially vanishing bound for running times that exceed \(O(n \log n)\).

Let \(Q\) be the number of comparisons, and \(X_k\) be the \(k\)th pivot choice. We need at most \(n\) pivots.

For a set of \(t\) values, the worst pivot choice is the minimum or maximum value, which leaves \(t - 1\) values still to sort; that is, we perform \(t - 1\) comparisons for almost no reason. Thus by the Hoeffding-Azuma inequality:

\[ \begin{align} \Pr[Z_n - Z_0 \ge x] &\le \exp(-2x^2 / (1^2 + \dots + (n-1)^2)) \\ &\le \exp(-6x^2 / n^3) \end{align} \]

where we have used:

\[ 1^2 + \dots + n^2 = n(n+1)(2n + 1) / 6 \]

Well, that didn’t work. The bound only bites once \(x\) is on the order of \(n^{3/2}\): taking \(x = c n \log n\) sends the exponent \(-6 c^2 (\log n)^2 / n\) to zero. We’ve shown it is unlikely quicksort does worse than \( O(n^{3/2}) \), which beats \(O(n^2)\), but our goal was \(O(n \log n) \). I don’t know of an easy way to reach it.

McDiarmid and Hayward ("Large deviations for Quicksort") establish decent bounds with a derivation that takes many pages.

"Proof" by eyeball

Let \(f_n(x)\) be the polynomial representing the distribution of the number of comparisons required to quicksort all permutations of \(n\) items, where the coefficient of \(x^k\) is the number of inputs that require \(k\) comparisons. Then:

\[ \begin{align} f_0 &= 1 \\ f_{n+1} &= x^n \sum {n \choose k} f_k f_{n-k} \mid k \in [0..n] \end{align} \]

We can translate this directly to code:

choose n k
  | k > n = 0
  | k == 0 = 1
  -- Multiplicative formula; the division at each step is exact.
  | otherwise = n * choose (n - 1) (k - 1) `div` k

f = \case
  0 -> 1
  n1 | n <- pred n1 -> x 1^n * sum [fromIntegral (choose n k) * f k * f (n - k) | k <- [0..n]]

toList $ f 5

However, it’s faster if we compute Pascal’s triangle with additions only, and reuse computed results:

fs = fst <$> iterate go ([1], [1]) where
  -- ps holds f_n, ..., f_0 (newest first); pascal holds row n of Pascal's triangle.
  go (ps, pascal) = (ps', pascal')
    where
    ps' = x 1^n * (sum $ zipWith (*) (fromIntegral <$> pascal) $ zipWith (*) ps (reverse ps)) : ps
    n = length ps - 1
    pascal' = 1 : zipWith (+) pascal (tail pascal) ++ [1]

Our code is still inefficient as we multiply polynomials with a naive algorithm, but it’s good enough for our purposes.
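
As a quick sanity check, both definitions produce the same coefficients:

(toList $ f 5, toList $ head $ fs!!5)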

Let’s check the \(n=3\) case. We only need 2 comparisons if we pick the middle element as the first pivot, and there are 2 ways this can happen. Otherwise we need 3 comparisons, and there are 4 ways this can happen.

toList $ head $ fs!!3

We hack together a routine to visualize a distribution as an SVG diagram:

draw ns = jsEval $ "runme_out.insertAdjacentHTML('beforeend',`" ++ svg ++ "`);"
  where
  ds = fromIntegral <$> ns :: [Double]
  m = foldr1 max ds
  scaled = (/m) <$> ds
  rect x y = concat
    [ "<rect x='", show x, ".2' y='", show $ 1 - y, "' width='0.6' height='", show y, "' />"
    ]
  svg = concat $
    [ "<svg style='background-color:#dfdfdf;' "
    , " width=640"
    , " height=240"
    , " viewBox='", unwords (map show [0,0,length scaled,1]), "'"
    , " preserveAspectRatio='none'"
    , ">"
    ] ++ zipWith rect [0..] scaled ++
    [ "</svg>"
    ]

Again, we check the \(n = 3\) case:

draw $ toList $ head $ fs!!3

By \(n = 7\) we see that higher values are unlikely:

draw $ toList $ head $ fs!!7
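
As another check, the mean of the exact distribution matches our earlier formula; this assumes the coefficients are a Fractional type such as Rational.

meanComps n = sum (zipWith (\k d -> fromIntegral k * d) [0 :: Int ..] ds) / sum ds
  where ds = toList $ head $ fs!!n

(meanComps 7, a 7)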

For \(n = 20\) the tail is invisible; our code plots every value up to the worst case, but it looks like we just specified a large right margin:

draw $ toList $ head $ fs!!20

Trying other values yields the same shape. Thus by non-mathematical induction, quicksort performs \(O(n\log n)\) comparisons with high probability!


Ben Lynn blynn@cs.stanford.edu 💡