Back to index

Threads and Input/Output in the Synthesis Kernel

Henry Massalin and Calton Pu

One-line summary: Repeated applications of Amdahl's law (optimize the common
case) leads to a fast system.

Overview/Main Points

   * Principle of Frugality: use the least powerful solution to a given
     problem. (Occam's razor for OS.)
   * Quaspace: quasi-address space - threads of execution live in these.
     They are subspaces of the physical address space, protection done by
     kernel with MMU. No virtual memory!
   * Quaject: quasi-object - encapsulation of procedures and data. Threads,
     hardware abstractions, device servers, etc. Composed of monitors,
     queues, schedulers, switches, pumps, and guages, connected with
     "quaject interfacer" and created with &quo;tquaject creator"
     (allocation, factorization, and optimization).
   * Kernel code synthesis: kernel code is dynamically synthesized on a
     per-thread basis to be specialized and thus short and fast.
   * Synchronization: Synthesis kernel is highly concurrent, so optimize the
     common case of synchronization.
        o Code Isolation - synthesize code for manipulating parts of data
          structures; each thread has its own code for updating its own
          thread-table entries, thus no synchronization is needed.
        o Procedure Chaining - serializes execution of conflicting threads
          by changing return addresses on stack. Eg - signal arriving in
          middle of interrupt handler. (Q - how do you know when something
          is potentially conflicting?)
        o Optimistic Synchronization - similar to RAS, but with actual
          rollback instead of atomic commit. Make changes assuming no
          synchronization needed, if detect conflict at the end (how??) then
          rollback the changes (how??) and retry (then what about
          starvation/perpetual conflict??)
   * Optimistic queues - optimize for specific context.
        o Single-producer/single-consumer: synchronization only needed if
          queue is empty/full.
        o Multiple-producer/single-consumer: need synchronization across
          producers only. Atomic multiple-insertion done by reserving space
          in queue atomically, then filling in with data and setting a valid
          flag for consumer.
        o Single-producer/multiple-consumer: similar
        o Multiple-producer/multiple-consumer: need full synchronization, so
          do it optimistically.
   * Threads: threads carry with them synthesized kernel code - custom I/O
     (open), interrupt handlers, error trap and signal handlers. User
     threads actually execute in the kernel on kernel calls - quaspaces used
     to isolate them. Waiting threads are procedure chained to the
     appropriate event code (such as tty driver queue) instead of having a
     general blocked code that needs traversal. (Interaction with
     priorities??)

     Context switches - don't save floating point registers unless thread
     actually uses them. (Illegal instruction traps to detect this, restart
     context switch code in this case.) Lack of a thread dispatcher - chain
     together threads directly in a circular queue, using procedure chaining
     on thread's synthesized context switch procedure. Vector tables
     associated with threads for interrupt handlers.

     Signals and interrupts - alter threads TTE to make thread jump to
     signal handler then reactived. Error traps - must be synchronous, so
     actually have error trap handler copy a kernel stack frame onto the
     user stack, modify return address of kernel stack to point to user
     error signal procedure, and execute a return from exception.

     Scheduling is done by changing the CPU quantum assigned to a thread,
     and reordering the ready queu (the circular TTE queue).

   * I/O:Data flows from hardware devices to quaspaces along a logical
     channel called a "stream". Multiple stages of device servers and
     filters are connected with this abstraction, which is a
     highly-optimized producer/consumer using optimistic queues.
        o passive consumer, active producer (or vice/versa): procedure
          calls, no synchronization necessary.
        o both passive: need pump quaject
        o both active: optimistic queue to mediate.
     Signals and interrupt handling - synthesized interrupt handlers that
     run in context of currently executing thread (!). Signals are delayed
     using procedure chaining.
   * Measurements: implemented SunOS emulator on top of synthesis, compared
     real SunOS with emulated SunOS. Synthesis was much faster (big
     surprise). Synthesis had a much smaller kernel (another big surprise)
     except when many threads were running.

Relevance

This paper proves that Amdahl's law can be taken to an extreme.

Flaws

   * Kernel optimized for low-memory, few processes - not relevant for
     modern workstations or PCs.
   * Comparison with SunOS unfair, as SunOS can do much more than Synthesis
     (eg. virtual memory....)
   * Kernel written in 680x0 assembly as "fast prototyping language" - poor
     methodology, not at all a prototyping language. Makes for a very
     delicate kernel.
   * This paper was written in a world unto itself - look at references
     section.

  ------------------------------------------------------------------------
Back to index