Transparent Process Migration: Design Alternatives and the Sprite Implementation --- Douglis and Ousterhout, 1991

Summary: the paper describes process migration in the Sprite operating system. It covers the problems and solutions involved in stopping a process on one machine and transferring it to another, transparently to the user.

- most often, migration happens as part of the exec system call, when a compute-intensive program is about to start;
- the second most common case is eviction: when a user returns to their workstation, the foreign processes running there must be preempted and evicted;
- objectives: a remote process has exactly the same access to virtual memory, devices, files, and other resources that it would have on its original machine;
- also, the process appears to the user to still be executing on the home machine;
- four factors were taken into account during the design of Sprite: transparency, residual dependencies, performance, and complexity;
- if a migration mechanism leaves residual dependencies, the source machine must continue to provide some services after the process has migrated;
- complexity is an issue because migration affects all parts of an OS;
- they emphasized transparency and performance, but accepted residual dependencies in some situations;
- the system helps to identify idle hosts, but does not automatically migrate processes;
- four aspects of the Sprite environment: idle hosts are plentiful, users own their workstations (so good interactive response must be kept), Sprite uses kernel calls (not message passing), and Sprite already provides network support;
- Sprite provides remote access to files and devices and has a single network-wide space of process identifiers;
- motivation for Sprite: rsh falls short.
  It has no transparency, no eviction mechanism, poor performance, and no automatic host selection;
- justifications for process migration:
    - additional flexibility;
    - migration is only slightly more complicated than transparent remote invocation;
- the main problem of process migration is managing the process state. The main state components are:
    - virtual memory: likely to be the largest amount of state, in bytes;
    - open files: if the process is manipulating files or devices, state will be present in both the application address space and kernel space. The state includes the file descriptor, the current access position, and possibly cached file blocks;
    - message channels: if the OS is message-based, this state would exist instead of open files;
    - execution state: what is saved during a context switch (register values and condition codes);
    - other kernel state: process identifier, user identifier, current working directory, etc. (I believe this lives in the process control block);
- the goal is to keep this state intact and usable after the process migrates;
- for each state component, the system may use one of three strategies: transfer the state, arrange for forwarding, or ignore the state and sacrifice transparency;
- when forwarding is used, the system leaves the state where it is and forwards operations back and forth between the state and the process.
  Example: I/O devices cannot be transferred;
- an approach based solely on forwarding would not work, because some services must be provided by the machine where the process runs (allocating VM, for example), and because of the high communication cost;
- message-based systems make migration somewhat easier than kernel-call-based systems, because some state that a kernel-call-based system would keep in the kernel is instead held in the process's address space, and so is implicitly transferred with the VM data;
- possibilities for migrating VM state:
    - send the process's entire memory image (used in Charlotte and LOCUS);
    - the V System used pre-copying, allowing a process to continue running while its address space is transferred;
    - Accent uses lazy copying, transferring pages only when they are needed. This leaves residual dependencies;
    - Sprite uses a different form of lazy copying. Backing storage for virtual memory is implemented using ordinary files, which are stored in the network file system. (NOTE: after a page fault, possibly two page transfers cross the network: one to flush the page from the source and one to fetch it on the target!!);
- in most cases, no disk operations are needed, as the servers that hold the backing files cache pages in memory (how big are these caches??);
- shared writable virtual memory is not allowed, because it complicates process migration;
- migrating open files could be done by forwarding, but this performs very badly, since there are many file-related system calls. So, Sprite transfers the open-file state;
- during migration, dirty blocks are flushed back to the file server;
- if an access position becomes shared between machines, then neither machine stores it. All accesses are handled by the file server;
- the last piece of state is the process control block (PCB). The home machine for a process must assist in some operations on the process, so it always maintains a PCB. The current machine also has a PCB for it.
  The PCB on the home machine serves primarily to locate the process, and most of its fields are unused;
- an example of a residual dependency occurs when a process forks on a remote machine (the child's process ID must be generated on the home machine);
- in Sprite, every machine has a "load monitor" that sends information to a central migration server, which keeps a list of idle computers;
- when eviction occurs, foreign processes are migrated back to their home machines;

Results:
--------
- the results show that process migration works fine, but the benefit depends strongly on the nature of the application. For example, pmake (parallel make) forks processes that are too small to be worth migrating (each process compiles one source file);
- exec-migration occurred in 86% of the cases;

Problems:
--------
- forwarding seems to be inevitable in process migration. Also, the benefit of migration depends on the application's degree of parallelism (it must use processes; threads wouldn't work);
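
Sketch (mine, not from the paper): the Sprite virtual-memory strategy noted above, where the source flushes dirty pages to backing files on the file server and the target faults pages in on demand, can be illustrated with a toy simulation. All class and function names here are hypothetical and are not Sprite kernel interfaces; the model assumes every page was written at least once and ignores the server-side cache.

```python
from dataclasses import dataclass, field


@dataclass
class FileServer:
    """Toy stand-in for the network file server holding backing files."""
    pages: dict = field(default_factory=dict)  # (pid, page_no) -> bytes
    transfers: int = 0                         # pages moved over the network

    def write_page(self, pid, page_no, data):
        self.pages[(pid, page_no)] = data
        self.transfers += 1

    def read_page(self, pid, page_no):
        self.transfers += 1
        return self.pages[(pid, page_no)]


@dataclass
class Host:
    """Toy workstation with a set of resident pages (hypothetical model)."""
    name: str
    server: FileServer
    resident: dict = field(default_factory=dict)  # (pid, page_no) -> bytes
    dirty: set = field(default_factory=set)

    def write(self, pid, page_no, data):
        self.resident[(pid, page_no)] = data
        self.dirty.add((pid, page_no))

    def read(self, pid, page_no):
        key = (pid, page_no)
        if key not in self.resident:  # page fault: fetch from the server
            self.resident[key] = self.server.read_page(pid, page_no)
        return self.resident[key]

    def flush_and_drop(self, pid):
        """Source side: flush the process's dirty pages, then forget them."""
        for key in sorted(k for k in self.dirty if k[0] == pid):
            self.server.write_page(pid, key[1], self.resident[key])
        self.dirty = {k for k in self.dirty if k[0] != pid}
        self.resident = {k: v for k, v in self.resident.items() if k[0] != pid}


def migrate(pid, source, target):
    """No page is sent to the target directly; it faults pages in later."""
    source.flush_and_drop(pid)
    return target


server = FileServer()
a, b = Host("a", server), Host("b", server)
a.write(1, 0, b"code")
a.write(1, 1, b"heap")
current = migrate(1, a, b)
first_read = current.read(1, 0)  # faults the page in from the file server
```

In this model a dirty page that the target later touches crosses the network twice (source to server, then server to target), which is the cost flagged in the NOTE above; a page the target never touches is flushed once and never fetched. The source keeps no pages afterwards, so no residual dependency on it remains for VM.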