Working sets

Key points:

* The problem: manage memory effectively to maximize CPU utilization. Select the highest degree of multiprogramming that does not cause excessive thrashing, and dole out memory to each running process.
* The paper includes a history of research in this area. It focuses on the author's Working Set (WS) policy and argues that it is nearly optimal.
* The `working set' at time t is the set of pages that have been referenced at least once since time t-a, for some tuning parameter a. The WS policy gives a program just enough memory to contain its `working set', and leaves exactly the pages in the `working set' resident. (A small simulation sketch of this appears after these notes.)
* Measuring per-process demand for memory: collect traces of a program's memory references, measure the performance of the optimal lookahead memory management policy (which unfortunately cannot be implemented because it looks into the future!), and use that as an upper bound on the performance achievable by any implementable policy.
* Theoretical models of program behavior and memory usage: locality; phase-transition models (a program undergoes long phases of very good locality, separated by short transitions of random memory accesses); measurements (2% of time is spent in transitions, 50% of page faults occur in transitions, fault rates are 100 to 1000 times higher in transitions than in phases); queuing models; lots of boring theory and formulas.
* Implementing optimal memory management: the ``L = S criterion'' (heuristic: increase the degree of multiprogramming until the mean time between page faults is about equal to the mean time to service a fault); the ``50 percent criterion'' (heuristic: increase the degree of multiprogramming until the swap device is about 50% utilized; closely related to the ``L = S criterion''); a global manager (adaptively sets the degree of multiprogramming at the optimal level suggested by the heuristics); local managers (a per-process manager monitors the program's `working set' and implements WS). Empirically, WS keeps space-time products near optimal (better than CLOCK, LRU, and PFF; within 5%--30% of the optimal lookahead policy).
* Hardware device supporting WS: each page has a counter and a process identifier; a page reference resets the counter to 0; a clock interrupt increments every counter belonging to the current process; when a page's counter overflows, the page is considered removed from the `working set' and ought to be paged out. (A toy model of this mechanism also appears after these notes.)

Possible criticisms and areas for discussion:

* The problem was motivated in the era of batch programming, when there were plenty of asynchronous jobs ready to run. Is this problem -- and controlling the degree of multiprogramming -- even relevant anymore? In other words, does thrashing ever really happen today, and does it ever happen because of memory contention among many programs?
* Lots of queuing theory, formal models, and derived formulas. More measurements of real performance on common applications are needed to validate them. Is memory management really a bottleneck?
* Viewing main memory as a first-level cache for the swap device, Denning concentrates solely on reducing miss rates by improving block replacement strategies and per-process cache sizes. It would be interesting to see the effect of WS on hit times (or miss times): is the extra complexity of managing WS worth the reduced miss rates, or does the complexity itself become a bottleneck? What about other strategies for reducing access times?
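
To make the `working set' definition concrete, here is a minimal simulation sketch in Python. The toy trace, the choices of the window parameter a, and all function names are made up for illustration and are not from the paper; the sketch just computes W(t, a) as the pages touched in the last a references and counts the faults the WS policy would incur.

    def working_set(trace, t, a):
        """Return W(t, a): pages referenced at least once in the last a references."""
        start = max(0, t - a + 1)
        return set(trace[start:t + 1])

    def simulate_ws(trace, a):
        """Count page faults and the mean resident-set size under the WS policy."""
        resident = set()
        faults = 0
        total_size = 0
        for t, page in enumerate(trace):
            if page not in resident:
                faults += 1
            # The WS policy keeps exactly the current working set resident.
            resident = working_set(trace, t, a)
            total_size += len(resident)
        return faults, total_size / len(trace)

    # Toy trace: two phases of good locality separated by a short random transition.
    trace = [1, 2, 1, 2, 1, 2, 7, 9, 4, 5, 4, 5, 4, 5]
    for a in (2, 4, 8):
        faults, mean_size = simulate_ws(trace, a)
        print(f"a={a}: {faults} faults, mean working-set size {mean_size:.2f}")

On a trace like this, a larger a trades a larger mean resident set for fewer faults, which is exactly the space-time tradeoff the window parameter tunes.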
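
The hardware device from the last key point can also be sketched in a few lines. This is a toy model under assumed parameters (a 2-bit counter width, a dictionary standing in for per-page-frame registers, a single process); the real device would keep one counter and process identifier per page frame.

    COUNTER_BITS = 2               # assumed counter width; overflow at 2**COUNTER_BITS
    OVERFLOW = 2 ** COUNTER_BITS

    class WSHardware:
        """Toy model of the per-page counter hardware described in the notes above."""
        def __init__(self):
            self.counters = {}     # page -> counter value (resident pages only)

        def reference(self, page):
            self.counters[page] = 0          # a reference resets the page's counter

        def clock_tick(self):
            """Increment every counter of the current process; return pages that age out."""
            evicted = []
            for page in list(self.counters):
                self.counters[page] += 1
                if self.counters[page] >= OVERFLOW:
                    evicted.append(page)     # counter overflowed: page leaves the working set
                    del self.counters[page]
            return evicted

        def working_set(self):
            return set(self.counters)

    # Example: page 1 is re-referenced and stays resident; page 2 goes idle and ages out.
    hw = WSHardware()
    hw.reference(1)
    hw.reference(2)
    for tick in range(OVERFLOW + 1):
        hw.reference(1)                      # page 1 stays hot
        dropped = hw.clock_tick()
        if dropped:
            print(f"tick {tick}: pages {dropped} leave the working set")
    print("working set:", hw.working_set())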