Chapter 2: Parallel Programs
---

- task: an arbitrarily defined piece of the work done by the program;
  - fine-grained (little work) vs. coarse-grained tasks;
- a process is an abstract entity that performs tasks;
- creating a parallel program:
  - decomposition into tasks;
  - assignment of tasks to processes;
  - orchestration of the necessary data access, communication, and synchronization among processes;
  - mapping, or binding, of processes to processors;
- together, decomposition and assignment are called partitioning;
- decomposition:
  - major goal: expose enough concurrency to keep the processes busy at all times, yet not so much that the overhead of managing the tasks becomes substantial compared to the useful work done;
  - a concurrency profile plots the available concurrency (with an unlimited number of processors) as a function of time. Therefore, the maximum speedup can be bounded as

                   Area under concurrency profile
    speedup <= ----------------------------------------
               Horizontal extent of concurrency profile

- assignment:
  - should balance the workload among processes, reduce the amount of interprocess communication, and reduce run-time overhead;
  - decomposition and assignment are the major algorithmic steps in parallelization;
- orchestration:
  - processes need mechanisms to name and access data, to exchange data, and to synchronize with one another;
- mapping:
  - a program can bind, or pin, processes to processors to ensure that they do not migrate during execution;
  - in most OSs, the user may ask the system to preserve certain allocation properties, giving the user program some control over the mapping, but the OS is allowed to change the mapping dynamically for efficiency;
- orchestration and mapping are architecture-dependent, which is not true of decomposition and assignment;
- in SIMD machines, the work division is done automatically in hardware, while in shared-memory architectures there are explicit operations, like process creation, work partitioning, and synchronization;
  - this is even more explicit when message passing is used, as each process allocates its own substructure, and synchronization is also more explicit, done by sending/receiving messages.
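The concurrency-profile bound above can be checked numerically. A minimal sketch, assuming a hypothetical discretized profile where `profile[t]` gives the number of tasks runnable during unit time step `t` (the profile values are invented for illustration):

```python
# Hypothetical concurrency profile: available concurrency at each unit
# time step, assuming an unlimited number of processors.
profile = [1, 4, 8, 8, 4, 2, 1]

# Area under the profile = total work performed (in task-time units).
area = sum(profile)

# Horizontal extent = elapsed time with unlimited processors
# (one unit per time step).
extent = len(profile)

# Upper bound on achievable speedup over the sequential execution.
max_speedup = area / extent
print(max_speedup)  # 28 units of work / 7 time steps = 4.0
```

The bound captures the intuition that serial phases (steps where the profile is 1) limit speedup regardless of how many processors are available, in the spirit of Amdahl's law.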
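The explicit message-passing style described above can be sketched with Python's `multiprocessing` module (the notes themselves are language-agnostic; the two-chunk decomposition and the sum reduction here are invented for illustration). Each process receives its own substructure of the data, and communication and synchronization happen only through explicit send/receive operations over a pipe:

```python
import multiprocessing as mp

def worker(conn, chunk):
    # Each process operates on its own substructure of the data and
    # communicates explicitly by sending a message with its partial result.
    conn.send(sum(chunk))
    conn.close()

if __name__ == "__main__":
    data = list(range(8))
    # Decomposition and assignment: split the work into two
    # coarse-grained tasks, one per process.
    chunks = [data[:4], data[4:]]
    parents, procs = [], []
    for chunk in chunks:
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(child, chunk))
        p.start()  # explicit process creation
        parents.append(parent)
        procs.append(p)
    # Orchestration: receiving the messages is also the synchronization
    # point -- recv() blocks until each worker has sent its result.
    total = sum(conn.recv() for conn in parents)
    for p in procs:
        p.join()
    print(total)  # 0 + 1 + ... + 7 = 28
```

Note that there is no shared state: the reduction is performed by the parent after explicitly receiving each partial sum, mirroring how message-passing programs combine per-process results.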