Chapter 16: Distributed Shared Memory
---

- DSM is used as an abstraction for sharing data between computers that do not share physical memory;
- allows a shared-memory programming model;
- the amount of communication is strongly related to the consistency model adopted by a DSM system;
- the main point of DSM is that it spares the programmer the concerns of message passing when writing applications;
- less appropriate for client-server communication;
- Message passing (MP) is good because:
  - architectural differences are not a problem, because marshalling takes care of them;
  - processes are more protected from one another;
  - simpler to implement;
  - easier synchronization;
  - communication is explicit (can be good and bad);
- DSM is good because:
  - no marshalling/unmarshalling necessary;
  - known programming paradigm;
  - communication is not explicit (again, good and bad);
- experiments [Munin] show that parallel programs developed for DSM can be made to perform as well as functionally equivalent programs written for message-passing platforms on the same hardware (at least with a small number of computers);
- the performance of DSM depends on many factors, particularly the pattern of data sharing;
- variable marshalling is needed when MP is used;
- MP gives more protection and takes care of different data representations between computers;
- processes that use DSM may have non-overlapping lifetimes (results stay in memory), which is not the case with MP; this makes DSM persistent over time;
- DSM may be provided in hardware (NUMA), over paged virtual memory (Ivy, Munin, and others), or as middleware (JavaSpaces, TSpaces);
- in Ivy (which uses VM), the DSM is a region of virtual memory occupying the same address range in the address space of every participating process; this approach depends on homogeneous computers, with the same data and paging formats;
- the middleware approach is quite different because it is not intended to reuse existing shared-memory code;
  * interesting to see that microkernel architectures make it easier to implement DSM, as page-fault handlers are implemented in user space and can communicate as they wish in order to handle a fault;
- a DSM system may be byte-oriented, object-oriented, or based on immutable data. Linda (and its derivatives JavaSpaces and TSpaces) uses the last approach, viewing objects as immutable tuples: an update is done by first issuing a "take" and then a "write" (see the sketch below);
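  * a minimal Python sketch of this take-then-write update on an immutable tuple space; TupleSpace, write, take, read and the wildcard matching below are illustrative stand-ins, not the actual Linda or JavaSpaces API:

    import threading

    class TupleSpace:
        def __init__(self):
            self._tuples = []
            self._cond = threading.Condition()

        def write(self, tup):
            """Add an immutable tuple to the space (Linda's 'out')."""
            with self._cond:
                self._tuples.append(tuple(tup))
                self._cond.notify_all()

        def take(self, template):
            """Remove and return a tuple matching the template (Linda's 'in').
            None fields act as wildcards; blocks until a match exists."""
            with self._cond:
                while True:
                    for t in self._tuples:
                        if self._matches(t, template):
                            self._tuples.remove(t)
                            return t
                    self._cond.wait()

        def read(self, template):
            """Return (without removing) a matching tuple (Linda's 'rd')."""
            with self._cond:
                while True:
                    for t in self._tuples:
                        if self._matches(t, template):
                            return t
                    self._cond.wait()

        @staticmethod
        def _matches(t, template):
            return len(t) == len(template) and all(
                f is None or f == v for f, v in zip(template, t))

    # tuples are immutable, so "incrementing" the shared counter means
    # taking the old tuple and writing a new one
    space = TupleSpace()
    space.write(("count", 0))
    old = space.take(("count", None))     # remove the current tuple
    space.write(("count", old[1] + 1))    # write a new immutable tuple

  * this only shows why immutability forces updates to be take+write; a real tuple space would also match on field types and be accessible remotely;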
- the main consistency models that can be practically realized in DSM implementations are sequential consistency and models that are based on weak consistency;
- perfect consistency is only possible with a total ordering mechanism; this model represents the concept of linearizability, also called atomic consistency;
- a DSM is linearizable if for any execution there is some interleaving of the operations issued by all the clients that satisfies:
  - L1: the interleaved sequence of operations meets the specification of a (single) correct copy of the objects;
  - L2: the order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution;
- specialized to memory reads and writes (R(x)a denotes a read of location x that returns a; W(x)a denotes a write of a to x), L1 can be written as:
  - L1': the interleaved sequence of operations is such that if R(x)a occurs in the sequence, then either the last write operation that occurs before it in the interleaved sequence is W(x)a, or no write operation occurs before it and a is the initial value of x;
- linearizability is too strict for most practical purposes;
- sequential consistency (SC, due to Lamport) is the model most used in practice;
- a DSM is sequentially consistent if for any execution there is some interleaving of the operations issued by all the clients that satisfies:
  - SC1: the interleaved sequence of operations is such that if R(x)a occurs in the sequence, then either the last write operation that occurs before it in the sequence is W(x)a, or no write operation occurs before it and a is the initial value of x;
  - SC2: the order of operations in the interleaving is consistent with the program order in which each individual client executes them;
- SC1 is equivalent to L1. SC2 uses program order rather than temporal order (as in L2), which is what makes SC feasible to implement;
- note that memory operations upon the entire DSM have to be taken into account to satisfy the conditions of SC, and not just the operations on each individual memory location;
  * it seemed to me that SC was a direct application of Lamport's "happens-before" relation, but that is claimed to be Causal Consistency (CC)... What is the difference then?
- SC is costly to implement;
- coherence is a weaker model: SC applied on a location-by-location basis;
- update options: write-update (WU) and write-invalidate (WI);
  - WU: updates are made locally and multicast to the other replica managers, which immediately update their copies (multiple-readers/multiple-writers sharing);
  - WI: multiple-readers/single-writer sharing. When a process wants to write to a local copy, a multicast message is sent to invalidate all other copies (a sketch of the idea follows below);
- granularity of sharing affects performance, e.g., if a page is the basic unit, 8 KB may be sent because a single byte was updated!
  * false sharing: process A reads location 1 while B writes location 2, both in the same page. The bigger the granularity, the bigger the chances of false sharing;
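  * a minimal Python sketch of the write-invalidate idea for a single shared page (single process, no real networking); PageManager, read_fault and write_fault are illustrative names, not any particular system's API:

    class Node:
        def __init__(self, name):
            self.name = name
            self.copy = None          # local copy of the page (None = invalid)

        def invalidate(self):
            self.copy = None          # drop the now-stale local copy

    class PageManager:
        """Tracks which nodes hold a valid copy and who currently owns the page."""
        def __init__(self, initial):
            self.value = initial
            self.holders = set()      # nodes with a valid read copy
            self.owner = None         # single writer, if any

        def read_fault(self, node):
            node.copy = self.value    # ship the page; node becomes a reader
            self.holders.add(node)

        def write_fault(self, node, new_value):
            # "multicast" an invalidation to every other copy holder
            for h in self.holders - {node}:
                h.invalidate()
            self.holders = {node}     # node now holds the single valid copy...
            self.owner = node         # ...and ownership is transferred to it
            node.copy = new_value
            self.value = new_value

    a, b, mgr = Node("A"), Node("B"), PageManager(initial=0)
    mgr.read_fault(a)                 # A and B read: multiple-readers sharing
    mgr.read_fault(b)
    mgr.write_fault(a, 42)            # A writes: B's copy is invalidated
    assert b.copy is None and a.copy == 42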
- too much communication overhead leads to thrashing;
- Munin uses WU on a page-based basis, buffering multiple writes to the same page (it makes a copy of the page before the local changes occur);
- in WI, sequence numbers may be used to prevent stale copies from being taken in write mode;
- in WI, ownership of a page is transferred when a write fault occurs;
- Release Consistency (RC): reduces DSM overhead by exploiting the fact that programmers use synchronization objects;
- RC satisfies these requirements:
  - RC1: before an ordinary read or write operation occurs, all previous acquire accesses must be performed;
  - RC2: before a release operation is allowed to perform, all previous ordinary read and write operations must be performed;
  - RC3: acquire and release operations are sequentially consistent with respect to one another;
- the DSM runtime can only enforce RC if it is aware of synchronization accesses;
- in RC, a process need not be blocked while it makes updates inside the critical section; updates are propagated only when the lock is released (see the sketch after this list);
- RC can be used with either WI or WU;
- RC is implemented in the V-kernel;
- note that updates may only be done when a process acquires the lock (processes acquire it even for reading!);
- Munin divides data items into the following categories:
  - read-only;
  - migratory: both read and write permissions are given at the same time;
  - write-shared: used by the programmer when processes access different parts of an array, for example. After the accesses, only the differences are propagated;
  - producer-consumer: shared by a fixed set of processes (only one of which writes);
  - reduction: data always follows the pattern lock, read, write, unlock. Items have a fixed owner, to which updates are propagated;
  - result: updates done by "slave" processes are propagated just to the "master";
  - conventional: conventional RC (?);
- uniform vs. hybrid models: hybrid models differentiate between ordinary and synchronization memory accesses;
- other uniform consistency models:
  - causal consistency;
  - processor consistency;
  - pipelined RAM;
- other hybrid models:
  - entry consistency: a lock may be held by several readers;
  - scope consistency: automatic (compiler-generated) EC;
  - weak consistency: does not differentiate between acquire and release synchronization accesses;
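  * a minimal Python sketch of the RC idea of buffering ordinary writes under a lock and propagating them only at release time (eager propagation, roughly in the spirit of RC1-RC3); Replica and RCLock are illustrative assumptions, not Munin's or any real system's implementation:

    class Replica:
        def __init__(self):
            self.store = {}

        def apply(self, updates):
            self.store.update(updates)

    class RCLock:
        def __init__(self, local, remote_replicas):
            self.local = local
            self.remotes = remote_replicas
            self.buffer = {}

        def acquire(self):
            # RC1: ordinary accesses only happen after the acquire; in a real
            # system this is also where the latest remote updates would be
            # pulled in before the critical section starts
            self.buffer = {}

        def write(self, var, value):
            # ordinary write: performed locally, buffered, not yet visible
            # to other replicas (the writer is not blocked)
            self.local.store[var] = value
            self.buffer[var] = value

        def release(self):
            # RC2: all buffered ordinary writes are propagated before the
            # release completes, so the next acquirer sees a consistent state
            for r in self.remotes:
                r.apply(self.buffer)
            self.buffer = {}

    local, other = Replica(), Replica()
    lock = RCLock(local, [other])
    lock.acquire()
    lock.write("x", 1)                # visible locally only
    lock.write("y", 2)
    assert "x" not in other.store     # nothing propagated yet
    lock.release()
    assert other.store == {"x": 1, "y": 2}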