Chapter 4: Workload-Driven Evaluation
---

- for parallel machines, we should measure two performance characteristics (both are important):
  - absolute performance;
  - performance improvement due to parallelism;

- superlinear speedup can occur if the problem is "too big" for a small number of processors; as the problem is divided into more subproblems, each piece fits better in memory and cache, so we have fewer page faults and cache misses, creating superlinear speedups;

- user-oriented scaling:
  - number of particles per process in Barnes-Hut;
  - number of rows of a matrix;

- resource-oriented scaling:
  - execution time;
  - total amount of memory;

- resource-oriented models:
  - problem-constrained scaling (PC);
  - time-constrained scaling (TC);
  - memory-constrained scaling (MC);

- PC scaling: problem size is kept fixed;
- TC scaling: time to complete execution is held fixed;
- MC scaling: amount of memory per process is fixed;

- speedup can always be calculated as:

                 Work(p)   Time(1)
    Speedup(p) = ------- x -------
                 Work(1)   Time(p)

  then, for each scaling model, we cancel the terms that are kept fixed
  as the number of processors increases. For example, for TC
  (Time(p) = Time(1)):

                 Work(p)
    Speedup(p) = -------
                 Work(1)

  * as long as memory or cache capacity effects do not dominate, we
  * should expect the lowest parallelism overhead and highest speedup
  * under MC scaling and the next under TC scaling. We should expect
  * speedup to degrade quite quickly under PC scaling, at least once the
  * overheads become significant relative to useful work. Unless it is
  * known that a particular scaling model is the right one for an
  * application, or is particularly inappropriate, it is useful to
  * evaluate a machine under all three scaling models;

- usual classes of microbenchmarks (MB):
  - processing MB;
  - local memory MB;
  - I/O MB;
  - communication MB;
  - synchronization MB;

- workloads:
  - kernels;
  - complete applications;
  - multiprogrammed workloads;

- optimizations:
  - algorithmic;
  - data structuring;
  - data layout, distribution, and alignment;
  - orchestration of communication and synchronization;

- workloads should be chosen to represent a wide range of applications, behavioral patterns, and levels of optimization;
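
The speedup formula from the scaling-model notes above can be tried out with a minimal Python sketch. The function names (`speedup`, `speedup_pc`, `speedup_tc`) and the sample numbers are illustrative assumptions, not taken from the chapter. The sketch only shows how the general formula reduces once the fixed term cancels: under PC the work ratio is 1, under TC the time ratio is 1, and under MC neither term is fixed, so the general form is used directly.

```python
def speedup(work_1, time_1, work_p, time_p):
    """General form: Speedup(p) = (Work(p) / Work(1)) * (Time(1) / Time(p))."""
    if min(work_1, time_1, work_p, time_p) <= 0:
        raise ValueError("work and time values must be positive")
    return (work_p / work_1) * (time_1 / time_p)


def speedup_pc(time_1, time_p):
    """Problem-constrained: Work(p) == Work(1), so the work ratio cancels."""
    return speedup(1.0, time_1, 1.0, time_p)


def speedup_tc(work_1, work_p):
    """Time-constrained: Time(p) == Time(1), so the time ratio cancels."""
    return speedup(work_1, 1.0, work_p, 1.0)


if __name__ == "__main__":
    # Hypothetical measurements, chosen only to exercise the formulas.
    print("PC:", speedup_pc(time_1=100.0, time_p=12.5))
    print("TC:", speedup_tc(work_1=1000.0, work_p=6000.0))
    # MC: both work and time change, so no term cancels.
    print("MC:", speedup(work_1=1000.0, time_1=100.0,
                         work_p=16000.0, time_p=200.0))
```

Work can be counted in any consistent unit (operations, particles, matrix rows), since only the ratio Work(p) / Work(1) enters the formula.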