Unix Implementation Ken Thompson Overview * provide a kernel with the least common denominator of what users would want the system to do for them * everything outside of the kernel can be changed by users * value simplicity over efficiency Processes * user processes request system functions by performing a system call, a subroutine call which turns the user processes into system processes * each system process has its own stack, for protection purposes * user processes get their code from shared, read-only text segments; this allows sharing of text pages in memory, as well as obviating the need to swap these pages out (the originals are already on disk) * user processes have a private, read-write data segment, which consists of an automatically managed stack, and a manually managed heap; processes also have an unaccessible (except to the kernel) system data segment, which stores things like file descriptor tables, etc. * a process table has one entry per process, each entry containing, among other things, pointers to the text table (which points to the text segment) and to the data segments * a process is created by having another process do a fork; the new child is an exact copy of the old parent, except for the value of one register, which the two processes can use to differentiate themselves * a parent can wait for the termination of any of its children (rendezvous) * new programs are executed by having a process exec the file containing the program; a new process is not created, but rather, the old process simply begins executing the new program; it will not resume the old program when the new one terminates (like goto) * programs in primary memory are swapped (not paged) to and from primary memory: either the whole program is in primary memory, or none of it is (modulo shared text segments) * the swapper always stays in memory, and decides what processes to swap in and out, depending mostly on how long they have been in/out of primary memory * processes wait on events; other processes that signal these events cause all processes waiting on them to wake up * all processes except one are waiting on an event at any time; when the running process waits, another process is chosen based on a dynamic priority system (system processes have higher priority than user processes; user process priorities dynamically change with the amount of service they receive) I/O * two kinds of I/O devices: block and character: block devices appear to be files of randomly-accessed 512-byte blocks; character devices can be almost anything else * block I/O is done via a block cache; writes are always asynchronous, unless all dirty blocks are explicitly flushed to disk (which happens periodically, automatically); this can lead to inconsistencies, especially since data may be written to disk in a different order than they were written by the application * character I/O to character-at-a-time devices like terminals and paper tape readers or writers are handled via character queues (FIFOs); terminals also have a special canonical input queue which contains line-at-a-time input, the system having parsed control characters such as erase and kill File System * each file is an array of bytes, divided into a sequence of 512-byte blocks * the space on the disk is divided into space for inodes and for data blocks; directories are special files that are maintained by the system that map filenames (14 bytes long) to inumbers (the index of an inode in the inode list); a data file may appear in more than one directory, possibly under a different name, simply by listing its inumber in each one (the device number for a disk and the inumber for a file on the disk are sufficient to uniquely identify a file); the root directory is located at a well-known block (2) * free data block numbers are kept in a linked list; each block in the list contains the number of the next block in the list, and up to 50 numbers of free data blocks * each inode describes the location of the data blocks for a file by having a list of 13 block numbers: the first 10 point to the first 10 blocks of the file; the next block points to an indirect block which contains a list of 128 more block numbers; the 12th block can point to a double indirect block, which contains 128 indirect block numbers, each of which points to 128 data blocks; if necessary, the 13th block points to a triple indirect block, which contains 128 double indirect block numbers, each of which points to 128 indirect block numbers, each of which points to 128 data blocks; thus the maximum file size is about 1GB. * user processes reference files by their pathnames; these pathnames are converted to inumbers, and thence to inodes, by traversing the directory tree * each user process can be simultaneously referencing between 10 and 50 files; this is managed by storing pointers to a system-wide open file table in the system data segment of the process; the open file table itself contains I/O pointers and pointers into the active inode table (I/O pointers have to be kept in a separate table, because sometimes two processes which both have the same file open share the I/O pointer, and sometimes they don't) * multiple disk devices can appear to be part of the same directory hierarchy by mounting one filesystem at any leaf of another's hierarchy (note: today, this has changed to any directory, not any leaf); in this way, what appears to be one hierarchy can actually be composed of many physical devices