Unix Implementation

Ken Thompson

Overview

   * provide a kernel with the least common denominator of what users would
     want the system to do for them
   * everything outside of the kernel can be changed by users
   * value simplicity over efficiency

Processes

   * user processes request system functions by performing a system call, a
     subroutine call which turns the user processes into system processes
   * each system process has its own stack, for protection purposes
   * user processes get their code from shared, read-only text segments;
     this allows sharing of text pages in memory, as well as obviating the
     need to swap these pages out (the originals are already on disk)
   * user processes have a private, read-write data segment, which consists
     of an automatically managed stack, and a manually managed heap;
     processes also have an unaccessible (except to the kernel) system data
     segment, which stores things like file descriptor tables, etc.
   * a process table has one entry per process, each entry containing, among
     other things, pointers to the text table (which points to the text
     segment) and to the data segments
   * a process is created by having another process do a fork; the new child
     is an exact copy of the old parent, except for the value of one
     register, which the two processes can use to differentiate themselves
   * a parent can wait for the termination of any of its children
     (rendezvous)
   * new programs are executed by having a process exec the file containing
     the program; a new process is not created, but rather, the old process
     simply begins executing the new program; it will not resume the old
     program when the new one terminates (like goto)
   * programs in primary memory are swapped (not paged) to and from primary
     memory: either the whole program is in primary memory, or none of it is
     (modulo shared text segments)
   * the swapper always stays in memory, and decides what processes to swap
     in and out, depending mostly on how long they have been in/out of
     primary memory
   * processes wait on events; other processes that signal these events
     cause all processes waiting on them to wake up
   * all processes except one are waiting on an event at any time; when the
     running process waits, another process is chosen based on a dynamic
     priority system (system processes have higher priority than user
     processes; user process priorities dynamically change with the amount
     of service they receive)

I/O

   * two kinds of I/O devices: block and character: block devices appear to
     be files of randomly-accessed 512-byte blocks; character devices can be
     almost anything else
   * block I/O is done via a block cache; writes are always asynchronous,
     unless all dirty blocks are explicitly flushed to disk (which happens
     periodically, automatically); this can lead to inconsistencies,
     especially since data may be written to disk in a different order than
     they were written by the application
   * character I/O to character-at-a-time devices like terminals and paper
     tape readers or writers are handled via character queues (FIFOs);
     terminals also have a special canonical input queue which contains
     line-at-a-time input, the system having parsed control characters such
     as erase and kill

File System

   * each file is an array of bytes, divided into a sequence of 512-byte
     blocks
   * the space on the disk is divided into space for inodes and for data
     blocks; directories are special files that are maintained by the system
     that map filenames (14 bytes long) to inumbers (the index of an inode
     in the inode list); a data file may appear in more than one directory,
     possibly under a different name, simply by listing its inumber in each
     one (the device number for a disk and the inumber for a file on the
     disk are sufficient to uniquely identify a file); the root directory is
     located at a well-known block (2)
   * free data block numbers are kept in a linked list; each block in the
     list contains the number of the next block in the list, and up to 50
     numbers of free data blocks
   * each inode describes the location of the data blocks for a file by
     having a list of 13 block numbers: the first 10 point to the first 10
     blocks of the file; the next block points to an indirect block which
     contains a list of 128 more block numbers; the 12th block can point to
     a double indirect block, which contains 128 indirect block numbers,
     each of which points to 128 data blocks; if necessary, the 13th block
     points to a triple indirect block, which contains 128 double indirect
     block numbers, each of which points to 128 indirect block numbers, each
     of which points to 128 data blocks; thus the maximum file size is about
     1GB.
   * user processes reference files by their pathnames; these pathnames are
     converted to inumbers, and thence to inodes, by traversing the
     directory tree
   * each user process can be simultaneously referencing between 10 and 50
     files; this is managed by storing pointers to a system-wide open file
     table in the system data segment of the process; the open file table
     itself contains I/O pointers and pointers into the active inode table
     (I/O pointers have to be kept in a separate table, because sometimes
     two processes which both have the same file open share the I/O pointer,
     and sometimes they don't)
   * multiple disk devices can appear to be part of the same directory
     hierarchy by mounting one filesystem at any leaf of another's hierarchy
     (note: today, this has changed to any directory, not any leaf); in this
     way, what appears to be one hierarchy can actually be composed of many
     physical devices