Chapter 2: Parallel Programs
---

- task: an arbitrarily defined piece of the work done by the program;
  - fine-grained (little work) vs. coarse-grained tasks;
- a process is an abstract entity that performs tasks;
- creating a parallel program:
  - decomposition into tasks;
  - assignment of tasks to processes;
  - orchestration of the necessary data access, communication, and synchronization among processes;
  - mapping, or binding, of processes to processors;
- together, decomposition and assignment are called partitioning;
- decomposition:
  - major goal: expose enough concurrency to keep the processes busy at all times, yet not so much that the overhead of managing the tasks becomes substantial compared to the useful work done;
  - a concurrency profile plots the available concurrency (with an unlimited number of processors) as a function of time. Therefore, the maximum speedup can be bounded as

                   Area under concurrency profile
    speedup <= ----------------------------------------
               Horizontal extent of concurrency profile

- assignment:
  - should balance the workload among processes, reduce the amount of interprocess communication, and reduce run-time overhead;
  - decomposition and assignment are the major algorithmic steps in parallelization;
- orchestration:
  - processes need mechanisms to name and access data, to exchange data, and to synchronize with one another;
- mapping:
  - a program can bind, or pin, processes to processors to ensure that they do not migrate during execution;
  - in most OSs, the user may ask the system to preserve certain allocation properties, giving the user program some control over the mapping, but the OS is allowed to change the mapping dynamically for efficiency;
- orchestration and mapping are architecture-dependent, which is not true of decomposition and assignment;
- in SIMD machines, the work division is done automatically in hardware, while in shared-memory architectures there are explicit operations, like process creation, work partitioning, and synchronization;
  - this is even more explicit when message passing is used, as each process allocates its own substructure, and synchronization is also more explicit, done by sending/receiving messages.
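The concurrency-profile bound above can be checked numerically. A minimal sketch, assuming a hypothetical discretized profile where `profile[t]` gives the number of tasks runnable during unit time step `t` (the profile values are invented for illustration):

```python
# Hypothetical concurrency profile: available concurrency at each unit
# time step, assuming an unlimited number of processors.
profile = [1, 4, 8, 8, 4, 2, 1]

# Area under the profile = total work performed (in task-time units).
area = sum(profile)

# Horizontal extent = elapsed time with unlimited processors
# (one unit per time step).
extent = len(profile)

# Upper bound on achievable speedup over the sequential execution.
max_speedup = area / extent
print(max_speedup)  # 28 units of work / 7 time steps = 4.0
```

The bound captures the intuition that serial phases (steps where the profile is 1) limit speedup regardless of how many processors are available, in the spirit of Amdahl's law.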
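The explicit message-passing style described above can be sketched with Python's `multiprocessing` module (the notes themselves are language-agnostic; the two-chunk decomposition and the sum reduction here are invented for illustration). Each process receives its own substructure of the data, and communication and synchronization happen only through explicit send/receive operations over a pipe:

```python
import multiprocessing as mp

def worker(conn, chunk):
    # Each process operates on its own substructure of the data and
    # communicates explicitly by sending a message with its partial result.
    conn.send(sum(chunk))
    conn.close()

if __name__ == "__main__":
    data = list(range(8))
    # Decomposition and assignment: split the work into two
    # coarse-grained tasks, one per process.
    chunks = [data[:4], data[4:]]
    parents, procs = [], []
    for chunk in chunks:
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(child, chunk))
        p.start()  # explicit process creation
        parents.append(parent)
        procs.append(p)
    # Orchestration: receiving the messages is also the synchronization
    # point -- recv() blocks until each worker has sent its result.
    total = sum(conn.recv() for conn in parents)
    for p in procs:
        p.join()
    print(total)  # 0 + 1 + ... + 7 = 28
```

Note that there is no shared state: the reduction is performed by the parent after explicitly receiving each partial sum, mirroring how message-passing programs combine per-process results.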