Transparent Process Migration: Design Alternatives and the Sprite
Implementation
--- Douglis and Ousterhout, 1991


Summary: the paper describes process migration in the Sprite
operating system.  It covers the problems and solutions involved in
stopping a process on one machine and transferring it to another,
transparently to the user.

- most often, migration happens as part of the exec system call when
  a resource-intensive program is about to start;

- the second most common case is eviction: when a user returns to his
  workstation, foreign processes currently running there need to be
  preempted and evicted;

- objectives: a remote process has exactly the same access to virtual
  memory, devices, files, and other resources that it would have if
  it were running on its home machine;

- also, the process appears to the user to be still executing on the
  home machine;

- four factors were taken into account during the design of Sprite:
  transparency, residual dependencies, performance, and complexity;

- if a migration mechanism leaves residual dependencies, the source
  machine must continue to provide some services after the process has 
  migrated;

- complexity is an issue because migration affects all the parts of an OS;

- they emphasized transparency and performance, but accepted residual
  dependencies in some situations;

- the system helps to identify idle hosts, but does not automatically
  migrate processes;

- four aspects of the Sprite environment shaped the design: idle hosts
  are plentiful, users own their workstations (and expect good
  interactive response), Sprite uses kernel calls (not message
  passing), and Sprite already provides network support;

- Sprite provides remote access to files and devices and has a single
  network-wide space of process identifiers;

- Motivation for migration in Sprite: rsh is inadequate.  It has no
  transparency, no eviction mechanism, poor performance, and no
  automatic host selection;

- justifications for process migration: 

	- additional flexibility
	- migration is only slightly more complicated than transparent
	  remote invocation;

- the main problem of process migration is managing the process state.
  The state components are:
	
	- Virtual memory: likely to be the greatest amount of state in
	  bytes;

	- Open files: if process is manipulating files or devices,
	  state will be present in both application address space and
	  kernel space.  The state includes the file descriptor,
	  current access position, and possibly cached file blocks;

	- Message channels: if the OS is message-based, this state
	  would exist instead of open files;

	- Execution state: saved during context switch (register
	  values and condition codes);

	- Other kernel state: process identifier, user identifier,
	  current working directory, etc.  I believe this is in the
	  process control block;

- the problem is to maintain the process state after it migrates;

- for each state component, the system may decide to use one of three
  possible strategies: transfer the state, arrange for forwarding, or
  ignore the state and sacrifice transparency;

- when forwarding is used, the system leaves the state where it is and
  forwards operations back and forth between the state and the
  process.  Ex: I/O devices cannot be transferred, so operations on
  them must be forwarded;

- an approach based solely on forwarding would not work, because some
  services must be provided by the machine where the process is
  running (allocating VM, for example), and because of the high
  communication cost;
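The three per-component strategies can be sketched as a small lookup
table.  This is only an illustration in Python, not Sprite code: the
Strategy enum, the PLAN table, and migrate_component are hypothetical
names, and the assignments simply restate choices described in these
notes (VM and open files are transferred, I/O devices are forwarded).

```python
from enum import Enum


class Strategy(Enum):
    TRANSFER = "transfer"   # move the state to the target machine
    FORWARD = "forward"     # leave the state in place, relay operations to it
    IGNORE = "ignore"       # drop the state and give up transparency


# Hypothetical plan restating choices described in these notes.
PLAN = {
    "virtual_memory": Strategy.TRANSFER,
    "open_files": Strategy.TRANSFER,
    "io_device": Strategy.FORWARD,   # devices cannot be moved
}


def migrate_component(name, plan=PLAN):
    """Describe what happens to one state component during migration."""
    # Assumption of this sketch: anything not listed in the plan is ignored.
    strategy = plan.get(name, Strategy.IGNORE)
    if strategy is Strategy.TRANSFER:
        return f"{name}: copy state to target"
    if strategy is Strategy.FORWARD:
        return f"{name}: keep where it is, relay operations"
    return f"{name}: not migrated, transparency lost"
```

Every entry marked FORWARD is a residual dependency: the machine
holding that state must keep serving requests after the process has
left.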

- message-based systems make migration somewhat easier than
  kernel-call-based systems because some of the state that a
  kernel-call-based system keeps in the kernel is instead kept in the
  process's address space, and so is implicitly transferred with the
  VM data;

- Possibilities to migrate VM state:

	- send the process's entire memory image (used in Charlotte
	  and LOCUS);

	- V System used pre-copying, allowing a process to continue
	  while its address space is transferred;

	- Accent uses lazy copying, transferring pages only when they
	  are needed.  This leaves residual dependencies;
	
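	- Sprite uses a different form of lazy copying.  Backing
	  storage for virtual memory is implemented using ordinary
	  files, which are stored in the network file system.  On
	  migration the source flushes dirty pages to the backing
	  file, and the target loads pages from the file server on
	  demand.
	  (NOTE: after every page fault, possibly two pages are
	  transferred over the net!!);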
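A toy model of the Sprite-style scheme, to make the page traffic
concrete.  This is a sketch under simplifying assumptions, not the
Sprite implementation: FileServer, Machine, and their methods are
hypothetical names, backing files are plain dictionaries, and every
page read or written through the server counts as one network
transfer.

```python
class FileServer:
    """Holds backing files: one dict of page_number -> bytes per process."""

    def __init__(self):
        self.backing = {}
        self.net_transfers = 0   # pages moved over the network

    def write_page(self, pid, page_no, data):
        self.backing.setdefault(pid, {})[page_no] = data
        self.net_transfers += 1

    def read_page(self, pid, page_no):
        self.net_transfers += 1
        return self.backing[pid][page_no]


class Machine:
    def __init__(self, server):
        self.server = server
        self.resident = {}   # pid -> {page_no: data} held in local memory
        self.dirty = {}      # pid -> set of modified page numbers

    def write(self, pid, page_no, data):
        self.resident.setdefault(pid, {})[page_no] = data
        self.dirty.setdefault(pid, set()).add(page_no)

    def read(self, pid, page_no):
        pages = self.resident.setdefault(pid, {})
        if page_no not in pages:   # page fault: fetch from the backing file
            pages[page_no] = self.server.read_page(pid, page_no)
        return pages[page_no]

    def migrate_out(self, pid):
        """Flush dirty pages to the backing file, then drop local state."""
        pages = self.resident.pop(pid, {})
        for page_no in sorted(self.dirty.pop(pid, set())):
            self.server.write_page(pid, page_no, pages[page_no])


server = FileServer()
source, target = Machine(server), Machine(server)
source.write(1, 0, b"code")
source.write(1, 1, b"heap")
source.migrate_out(1)        # both dirty pages are flushed to the server
first = target.read(1, 1)    # page fault on the target fetches one page
```

In this model the source keeps nothing after migrate_out, so the
target depends only on the file server.  The cost is that a dirty
page the process touches again crosses the network twice: once on
the flush and once on the fault.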

- in most cases, no disk operations are needed, as the VM servers use
  memory to cache pages (how big are these??)

- shared writable virtual memory is not allowed, because it
  complicates process migration;

- migrating open files could be done by forwarding, but this has
  really bad performance, since there are many file-related system calls. 
  So, Sprite transfers the open-file state;

- during migration, dirty blocks are flushed back to the file server;

- if an access position becomes shared between machines, then neither
  machine stores it.  All accesses are handled by the file server;

- the last piece of state is the process control block (PCB). The home
  machine for a process must assist in some operations on the process,
  so it always maintains a PCB.  The machine where the process is
  currently running has a PCB for it as well.  The PCB on the home
  machine serves primarily to locate the process, and most of its
  fields are unused;

- an example of a residual dependency: when a process forks on a
  remote machine, the child's process ID must be generated on the
  home machine;

- in Sprite, every machine has a "load monitor" that sends information
  to a central migration server, which keeps a list of idle computers;

- when eviction occurs, foreign processes are migrated back to the
  home machine;
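Host selection and eviction can be sketched as follows.  This is a
minimal model under stated assumptions, not Sprite's interface:
MigrationServer, Process, migrate, and evict are hypothetical names,
and the host names are made up.  Migration is requested explicitly by
the caller, matching the note above that the system does not migrate
processes automatically.

```python
class MigrationServer:
    """Central server that tracks which hosts are idle (hypothetical API)."""

    def __init__(self):
        self.idle = set()

    def report(self, host, is_idle):
        # Called by each host's load monitor.
        if is_idle:
            self.idle.add(host)
        else:
            self.idle.discard(host)

    def request_idle_host(self):
        # Hand out one idle host, or None if there is none.
        return self.idle.pop() if self.idle else None


class Process:
    def __init__(self, pid, home):
        self.pid, self.home, self.current = pid, home, home


def migrate(proc, server):
    """Move proc to an idle host if one exists; return True on success."""
    host = server.request_idle_host()
    if host is None:
        return False
    proc.current = host
    return True


def evict(host, processes, server):
    """The owner of `host` is back: send foreign processes home."""
    server.report(host, is_idle=False)
    evicted = []
    for proc in processes:
        if proc.current == host and proc.home != host:
            proc.current = proc.home
            evicted.append(proc.pid)
    return evicted


server = MigrationServer()
server.report("ws2", is_idle=True)   # load monitor on ws2 reports idle
job = Process(pid=7, home="ws1")
moved = migrate(job, server)         # job now runs on ws2
back = evict("ws2", [job], server)   # user returns to ws2; job goes home
```

Sending evicted processes to their home machine keeps the policy
simple: the home machine is always a valid destination, so eviction
never has to wait for another idle host.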


Results:
--------

- the results show that process migration works well, but the benefit
  depends strongly on the nature of the application.  For example,
  pmake (a parallel make) forks processes that are too small to be
  migrated (each process compiles one source file);

- exec-migration occurred in 86% of the cases;


Problems:
---------

- forwarding seems to be inevitable in process migration.  Also, the
  performance gain depends on the application's level of parallelism
  (using processes; threads wouldn't work);