Chapter 14: Replication
---

- motivations for replication: performance, availability, and fault-tolerance;
- replication transparency is desirable;
- each logical object is represented by multiple distributed replicas; each replica is handled by a replica manager;
- the front end (FE) hides the location of the replicas from the clients;
- message ordering guarantees: FIFO ordering, causal ordering, total ordering;
- group membership service has 4 main tasks:
  - providing an interface for group membership changes;
  - implementing a failure detector;
  - notifying members of group membership changes;
  - performing group address expansion;
- group management services can be primary-partition or partitionable:
  - primary-partition: only one subgroup (majority) survives a partition;
  - partitionable: all subgroups may continue working independently;
- passive (primary-backup) replication: one primary server and other slaves or backups; the primary always handles FE requests (used, for example, in the Harp distributed filesystem);
- active replication: the FE multicasts requests to the group of replica managers. The FE uses totally ordered, reliable multicast; this system achieves sequential consistency (SC);

The gossip architecture
-----------------------

- two guarantees:
  - each client obtains a consistent service over time. Replica managers only ever provide a client with data that reflects at least the updates that the client has observed so far, even though clients may communicate with different replica managers over time;
  - relaxed consistency between replicas: all replica managers eventually receive all updates, and a total order exists.
    It can be made to satisfy SC, but its use focuses on weaker consistency models;
- the replica manager that receives a request does not process it until it can apply the request according to the required ordering constraints (this is a kind of lazy, on-demand update);
- it seems from the description that a client always uses the same FE;

Bayou system
------------

- data replication for high availability with weaker guarantees than sequential consistency;
- Bayou replica managers cope with variable connectivity by exchanging updates in pairs;
- the state that Bayou replicates is held in the form of a database, supporting queries and updates; Bayou may undo and redo updates to the database as execution proceeds;
- the Bayou guarantee is that, eventually, every replica manager receives the same set of updates and applies them so that all replicas reach the same result;
- updates are marked as tentative when they are first applied to a database. The system arranges that tentative updates are eventually placed in a canonical order and marked as committed;
- while updates are tentative, the system may undo and reapply them as it produces a consistent state;
- total ordering can be achieved by selecting a primary server, which serializes the updates;
- every Bayou update contains a dependency check and a merge procedure, which are domain-specific; a replica manager calls the dependency check procedure before applying the update; if the check indicates a conflict, Bayou calls the merge procedure;
- in Bayou, replication is not transparent to the application; the system achieves what the book calls eventual sequential consistency;
- disadvantages:
  - the programmer needs to supply dependency checks and merge procedures;
  - the result after a conflict is resolved may not agree with what the user saw before the conflict (appointment schedule example in the book);

Coda File System
----------------

- descendant of AFS (Andrew File System);
- AFS limited replication to read-only filesystems;
- the appearance of mobile users creates the need for constant data availability;
- Coda relies on the replication of file volumes to achieve a higher throughput of file access operations and a greater degree of fault tolerance;
- Coda relies on an extension of the mechanism used in AFS for caching copies of files at client computers to enable disconnected operation;
- Coda is like Bayou, in that it uses an optimistic strategy;
- unlike Bayou, the dependency check is not application-specific;
- on a close call, copies of modified files are broadcast in parallel to all the servers in the available volume storage group (AVSG);
- disconnected operation is said to occur when the AVSG is empty;
- to detect conflicts and keep the state of files, each file version contains a Coda version vector (CVV), which is a vector timestamp with an entry for each server in the relevant VSG;
- the purpose of the CVV is to provide sufficient information about the update history of each file replica to enable potential conflicts to be detected and submitted for manual intervention and for stale replicas to be updated automatically;
- Coda does not, in general, resolve conflicts automatically;
- statistics (Mary's papers, I think) show that for most files accessed by users, no conflict is possible (the files are not shared);
- when a file fetch is completed as a result of an open on an uncached file, a callback promise is established at the preferred server (the one with the most up-to-date version of the file in the AVSG);
- the preferred server contacts the client if a modification to the file (visible to the server) occurs while the client is working on it;
- a frequently sent probe is used to determine the current AVSG;

Transactions with replicated data
---------------------------------

- lazy vs. eager approaches to update propagation.
  Lazy propagation is usually used in primary copy replication systems, while the eager approach is used to guarantee serialized access when different clients use different replica managers;
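
The difference between the two propagation approaches can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the names `Replica`, `EagerPrimary`, and `LazyPrimary` are hypothetical, and there is no networking, failure handling, or locking. The point is only when the backups see an update relative to the acknowledgement sent to the client.

```python
class Replica:
    """Hypothetical replica manager holding a simple key-value store."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value


class EagerPrimary:
    """Eager propagation: every replica is updated before the client is acknowledged."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for replica in self.replicas:
            replica.apply(key, value)
        return "ack"  # all replicas already agree at this point


class LazyPrimary:
    """Lazy propagation: only the primary is updated before the acknowledgement."""

    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = backups
        self.pending = []  # updates not yet sent to backups, in primary order

    def write(self, key, value):
        self.primary.apply(key, value)
        self.pending.append((key, value))
        return "ack"  # backups may still be stale here

    def flush(self):
        # Propagate queued updates to every backup in the order the primary applied them.
        for key, value in self.pending:
            for backup in self.backups:
                backup.apply(key, value)
        self.pending.clear()
```

With `LazyPrimary`, a read served by a backup between `write` and `flush` returns stale data, which is why the lazy approach suits systems where clients go through the primary. With `EagerPrimary`, any replica can serve the read after the acknowledgement, at the cost of a slower write.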