Chapter 4: Workload-Driven Evaluation
---

- for parallel machines, we should measure two performance
  characteristics (both are important):

	- absolute performance;
	- performance improvement due to parallelism;

- superlinear speedup can occur if the problem is "too big" for a
  small number of processors: as the problem is divided among more
  processors, each one's share fits better in its memory and cache,
  so there are fewer page faults and cache misses;
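
  A minimal sketch of how superlinear points could be spotted in
  measured data. The function name and the timings are hypothetical,
  chosen only for illustration; speedup here is the fixed-problem
  ratio Time(1) / Time(p), and a point counts as superlinear when
  that ratio exceeds p.

```python
def speedup_report(times):
    """Summarize speedup for a fixed-size problem.

    times maps processor count p -> measured wall-clock time
    (any consistent unit) and must contain an entry for p = 1.
    """
    t1 = times[1]
    report = {}
    for p in sorted(times):
        speedup = t1 / times[p]
        report[p] = {
            "speedup": speedup,
            "efficiency": speedup / p,      # 1.0 means linear speedup
            "superlinear": speedup > p,     # more than p-fold faster
        }
    return report


# Hypothetical timings (seconds), not measurements from any real machine.
times = {1: 100.0, 2: 48.0, 4: 20.0, 8: 11.0, 16: 7.0}

if __name__ == "__main__":
    for p, row in speedup_report(times).items():
        flag = "superlinear" if row["superlinear"] else ""
        print(f"p={p:2d}  speedup={row['speedup']:6.2f}  "
              f"efficiency={row['efficiency']:5.2f}  {flag}")
```

  With these made-up numbers the mid-range processor counts are
  flagged, which is the pattern expected when the per-processor data
  starts fitting in cache or memory.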

- user-oriented scaling (scale a parameter the user cares about), e.g.:

	- number of particles per process in Barnes-Hut;
	- number of rows of a matrix;

- resource-oriented scaling (scale under a resource constraint), e.g.:

	- execution time;
	- total amount of memory;

- resource-oriented models:

	- problem-constrained scaling (PC);
	- time-constrained scaling (TC);
	- memory-constrained scaling (MC);

- PC scaling: problem size is kept fixed;

- TC scaling: time to complete execution is held fixed;

- MC scaling: amount of memory per process is fixed;
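
  To make the three models concrete, here is a small sketch for a
  hypothetical dense-matrix kernel. The assumptions are mine, not
  stated in the notes: memory grows as n^2, work grows as n^3, and
  speedup is ideal (no parallelism overhead). Under those assumptions
  PC keeps n fixed, TC keeps n^3 / p constant, and MC keeps n^2 / p
  constant.

```python
def scaled_n(n1, p, model):
    """Matrix dimension n on p processors, given dimension n1 on 1 processor.

    Assumes (for illustration only) memory ~ n**2 and work ~ n**3,
    with ideal speedup.
    """
    if model == "PC":                   # problem size held fixed
        return float(n1)
    if model == "TC":                   # n**3 / p held constant
        return n1 * p ** (1.0 / 3.0)
    if model == "MC":                   # n**2 / p held constant
        return n1 * p ** 0.5
    raise ValueError(f"unknown scaling model: {model}")


def relative_time(n1, p, model):
    """Execution time on p processors divided by Time(1), ideal speedup."""
    n = scaled_n(n1, p, model)
    return (n / n1) ** 3 / p


if __name__ == "__main__":
    for model in ("PC", "TC", "MC"):
        for p in (1, 16, 64):
            print(f"{model}  p={p:3d}  n={scaled_n(1000, p, model):8.1f}  "
                  f"time/Time(1)={relative_time(1000, p, model):6.3f}")
```

  Under these assumptions the relative time is 1/p for PC, 1 for TC,
  and sqrt(p) for MC, so for this kind of kernel MC scaling makes
  runs longer as the machine grows.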

- speedup can always be calculated as:

		     Work(p)   Time(1)
	Speedup(p) = ------- x -------
		     Work(1)   Time(p)

  then, for each scaling model we cancel the terms that are held
  fixed as the number of processors increases.  For example, under
  TC, Time(p) = Time(1), so:

		     Work(p)
	Speedup(p) = -------
		     Work(1)

* as long as memory or cache capacity effects do not dominate, we
* should expect the lowest parallelism overhead and highest speedup
* under MC scaling and the next under TC scaling.  We should expect
* speedup to degrade quite quickly under PC scaling, at least once the
* overheads become significant relative to useful work.  Unless it is
* known that a particular scaling model is the right one for an
* application, or is particularly inappropriate, it is useful to
* evaluate a machine under all three scaling models;

- usual classes of microbenchmarks:

	- processing;
	- local memory;
	- I/O;
	- communication;
	- synchronization;

- workloads:

	- kernels;
	- complete applications;
	- multiprogrammed workloads;

- optimizations:
	
	- algorithmic;
	- data structuring;
	- data layout, distribution, and alignment;
	- orchestration of communication and synchronization;

- workloads should be chosen to represent a wide range of
  applications, behavioral patterns, and levels of optimization;