Back to index Threads and Input/Output in the Synthesis Kernel Henry Massalin and Calton Pu One-line summary: Repeated applications of Amdahl's law (optimize the common case) leads to a fast system. Overview/Main Points * Principle of Frugality: use the least powerful solution to a given problem. (Occam's razor for OS.) * Quaspace: quasi-address space - threads of execution live in these. They are subspaces of the physical address space, protection done by kernel with MMU. No virtual memory! * Quaject: quasi-object - encapsulation of procedures and data. Threads, hardware abstractions, device servers, etc. Composed of monitors, queues, schedulers, switches, pumps, and guages, connected with "quaject interfacer" and created with &quo;tquaject creator" (allocation, factorization, and optimization). * Kernel code synthesis: kernel code is dynamically synthesized on a per-thread basis to be specialized and thus short and fast. * Synchronization: Synthesis kernel is highly concurrent, so optimize the common case of synchronization. o Code Isolation - synthesize code for manipulating parts of data structures; each thread has its own code for updating its own thread-table entries, thus no synchronization is needed. o Procedure Chaining - serializes execution of conflicting threads by changing return addresses on stack. Eg - signal arriving in middle of interrupt handler. (Q - how do you know when something is potentially conflicting?) o Optimistic Synchronization - similar to RAS, but with actual rollback instead of atomic commit. Make changes assuming no synchronization needed, if detect conflict at the end (how??) then rollback the changes (how??) and retry (then what about starvation/perpetual conflict??) * Optimistic queues - optimize for specific context. o Single-producer/single-consumer: synchronization only needed if queue is empty/full. o Multiple-producer/single-consumer: need synchronization across producers only. Atomic multiple-insertion done by reserving space in queue atomically, then filling in with data and setting a valid flag for consumer. o Single-producer/multiple-consumer: similar o Multiple-producer/multiple-consumer: need full synchronization, so do it optimistically. * Threads: threads carry with them synthesized kernel code - custom I/O (open), interrupt handlers, error trap and signal handlers. User threads actually execute in the kernel on kernel calls - quaspaces used to isolate them. Waiting threads are procedure chained to the appropriate event code (such as tty driver queue) instead of having a general blocked code that needs traversal. (Interaction with priorities??) Context switches - don't save floating point registers unless thread actually uses them. (Illegal instruction traps to detect this, restart context switch code in this case.) Lack of a thread dispatcher - chain together threads directly in a circular queue, using procedure chaining on thread's synthesized context switch procedure. Vector tables associated with threads for interrupt handlers. Signals and interrupts - alter threads TTE to make thread jump to signal handler then reactived. Error traps - must be synchronous, so actually have error trap handler copy a kernel stack frame onto the user stack, modify return address of kernel stack to point to user error signal procedure, and execute a return from exception. Scheduling is done by changing the CPU quantum assigned to a thread, and reordering the ready queu (the circular TTE queue). * I/O:Data flows from hardware devices to quaspaces along a logical channel called a "stream". Multiple stages of device servers and filters are connected with this abstraction, which is a highly-optimized producer/consumer using optimistic queues. o passive consumer, active producer (or vice/versa): procedure calls, no synchronization necessary. o both passive: need pump quaject o both active: optimistic queue to mediate. Signals and interrupt handling - synthesized interrupt handlers that run in context of currently executing thread (!). Signals are delayed using procedure chaining. * Measurements: implemented SunOS emulator on top of synthesis, compared real SunOS with emulated SunOS. Synthesis was much faster (big surprise). Synthesis had a much smaller kernel (another big surprise) except when many threads were running. Relevance This paper proves that Amdahl's law can be taken to an extreme. Flaws * Kernel optimized for low-memory, few processes - not relevant for modern workstations or PCs. * Comparison with SunOS unfair, as SunOS can do much more than Synthesis (eg. virtual memory....) * Kernel written in 680x0 assembly as "fast prototyping language" - poor methodology, not at all a prototyping language. Makes for a very delicate kernel. * This paper was written in a world unto itself - look at references section. ------------------------------------------------------------------------ Back to index