Disconnected Operation in the Coda File System --- Kistler et al., 1991.
Summary by Petros Maniatis

Main goal: highly available operation for a network file system, even in the face of disconnection. Workload is balanced across servers at the granularity of subtrees (volumes). High availability is achieved through:

* replication
* disconnected use

Caching is done on whole files. This simplifies disconnected operation and allows for more straightforward failure modes. Most functionality is placed at the clients, to promote scalability. The contents of the cache, in the face of disconnection, can be influenced by the user, much as a user copies interesting files onto a laptop before leaving a network access point and then merges them back into the file repository on reconnection.

When disconnected, the pessimistic approach would be to enforce consistency by requiring a priori acquisition of a shared or exclusive lock for access. In the case of involuntary disconnection, this is unrealistic. The optimistic approach, which is the one Coda takes, allows all updates and hopes that no conflicts will ensue; when conflicts do arise, they are resolved upon reconnection.

System calls are intercepted through v-nodes and serviced by a user-level cache daemon (Venus). Venus can be in any one of three states:

* When connected, Venus is hoarding (i.e., accumulating files in its cache). It has to deal with keeping its cached file versions up to date (since others might also be modifying cached files), with cache space management, and with keeping around certain files in which the user has expressed explicit interest.

* When disconnected, Venus emulates normal operation by using its cache to satisfy file requests. It also fakes some of the functionality that normally only servers provide (e.g., assigning new file identifiers). The results of this usurped functionality have to be revalidated by real servers upon reconnection. Cache misses can either block until reconnection or return an error code. Mutating operations are logged so that they can be replayed upon reconnection (a sketch of such a log appears at the end of this summary). A caching intricacy during disconnection is that the cache has to survive crashes, since it is what the user perceives as the emulation of the servers. Therefore, Venus keeps its meta-data in recoverable virtual memory, which provides transaction semantics and recovery across failures.

* When reconnected, Venus briefly goes through the reintegration phase, where it deals with update conflicts. The entire log of mutating operations, as assembled during disconnection, is shipped to the servers. There it is replayed inside a single transaction and applied to all referenced files. If reintegration succeeds (i.e., there are no conflicts), the log is purged and Venus returns to hoarding. Otherwise, a user-level utility can be used to selectively replay the log.

The only conflicts of interest are write/write conflicts. Each file has a version number (storeid) associated with it. If the version number recorded in a log entry is compatible with that of the server's copy of the file, there is no conflict. In the case of directories, some conflicts (e.g., independent insertions of different names) can sometimes be resolved automatically.
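
To make the emulation-phase logging concrete, here is a minimal Python sketch of a log of mutating operations kept by a disconnected client and replayed at reintegration. The names (LogEntry, ClientModifyLog, new_temporary_fid) and the flat data layout are illustrative assumptions, not Coda's actual data structures; in the real system the log and other meta-data live in recoverable virtual memory.

    # Hypothetical sketch of the log Venus keeps while emulating the servers.
    # All names are illustrative, not Coda's actual code.
    import itertools
    import time
    from dataclasses import dataclass, field

    @dataclass
    class LogEntry:
        op: str              # e.g. "store", "mkdir", "rename"
        fid: str             # identifier of the object the operation touches
        store_id: str        # version (storeid) of the cached copy when logged
        args: tuple = ()     # operation-specific arguments
        stamp: float = field(default_factory=time.time)

    class ClientModifyLog:
        """Log of mutating operations, replayed on reintegration."""
        _temp_fids = itertools.count(1)

        def __init__(self):
            self.entries = []

        def log(self, op, fid, store_id, *args):
            self.entries.append(LogEntry(op, fid, store_id, args))

        def new_temporary_fid(self):
            # While disconnected, Venus cannot ask a server for a real file ID,
            # so it hands out temporary ones, remapped during reintegration.
            return "temp-%d" % next(self._temp_fids)

        def purge(self):
            # Called after a successful reintegration.
            self.entries.clear()

    # Disconnected use: create a file under a temporary fid, then overwrite an
    # already-cached file whose cached storeid is "s41".
    cml = ClientModifyLog()
    fid = cml.new_temporary_fid()
    cml.log("create", fid, "none", "report.tex")
    cml.log("store", "fid-7", "s41", b"new contents")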
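
Continuing the sketch above, the following hypothetical server-side routine shows the certification step described in the last paragraph: each log entry is checked against the server's current storeid, and the whole log is either applied as one all-or-nothing transaction or rejected on the first write/write conflict. The directory-specific resolution mentioned above is omitted for brevity, and the in-memory server_state dictionary is an assumption made purely for illustration.

    # Hypothetical server-side reintegration: certify every log entry against
    # the server's current storeid, then apply the whole log atomically.
    import copy
    import uuid

    def certify(server_obj, entry):
        # An entry certifies if the server copy has not changed since the
        # client cached it, i.e. the storeids still match.
        return server_obj["store_id"] == entry.store_id

    def reintegrate(server_state, log_entries):
        """server_state: dict fid -> {"store_id": str, "data": object}.
        Returns ("ok", None) or ("conflict", offending_fid)."""
        staged = copy.deepcopy(server_state)      # tentative replay
        for entry in log_entries:
            obj = staged.get(entry.fid)
            if obj is not None and not certify(obj, entry):
                # Write/write conflict: another client updated this object
                # during the disconnection.  Abort the whole transaction.
                # (The paper's directory resolution is omitted here.)
                return ("conflict", entry.fid)
            # Apply the mutation and stamp the object with a fresh storeid.
            staged[entry.fid] = {"store_id": uuid.uuid4().hex,
                                 "data": entry.args}
        server_state.clear()
        server_state.update(staged)               # commit: all or nothing
        return ("ok", None)

    # Replaying the log from the previous sketch against a server that still
    # holds storeid "s41" for "fid-7" succeeds; had another client changed that
    # object in the meantime, reintegrate() would report a conflict on "fid-7".
    server = {"fid-7": {"store_id": "s41", "data": b"old contents"}}
    status, bad_fid = reintegrate(server, cml.entries)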