Chapter 2: Instruction Set Principles and Examples
---

- different instruction set architectures:

	- stack;

	- accumulator;

	- register-memory;

	- register-register (load-store);

- addressing modes:

	- Register				Add R4, R3
	
	- Immediate				Add R4, #3

	- Register deferrred or indirect	Add R4, (R1)

	- Displacement				Add R4, 100(R1)

	- Indexed				Add R3, (R1 + R2)

	- Direct or absolute			Add R1, (1001)

	- Memory indirect or Memory deferred	Add R1, @(R3)

	- Autoincrement				Add R1, (R2)+

	- Audodecrement 			Add R1, -(R2)

	- Scaled				Add R1, 100(R2)[R3]

- addressing modes can reduce instruction count (IC) but also increase
  average CPI;

- based on benchmarks, 12 bits of displacement would capture 75% of
  the full 32-bit displacements and 16 bits should capture about 99%;

- immediate fields with 8 bits would capture 50-70% of the cases for
  the VAX (which supports 32-bit immediates), while 16 bits would
  capture 75-80%;

- because of their popularity in the benchmarks, H&P suggest that new
  architectures implement at least the following addressing modes:
  displacement, immediate, and register deferred;

- conditional branches represent 81-87% of the control flow
  instructions in the benchmarks, the other 13-19% representing
  call/returns and jumps;

- register indirect jumps are useful for case/switch statements,
  dynamically shared libraries, and virtual functions such as provided
  in C++;

- three ways to specify branch conditions:

	- condition codes (CCs);

	- condition register;

	- compare and branch;

- branch displacement values are usually small, with most of them
  being 4 to 7 instructions (2-3 bits) away from the PC;  figure 2.13
  suggests that values that need more than 10 bits almost never occur in
  practice;

- commom operand types include character (1 byte), half word (16
  bits), word (32 bits), single-precision floating point (32 bits),
  and double-precision floating point (64 bits);  some architectures 
  provide packed decimal, or binary-coded decimal (BCD);

- the Alpha, for example, removed operations and data transfers for
  data smaller than 32 bits; multiple instructions are needed to read
  or write bytes or half words;

- support these simple instructions, since they will dominate the
  number of isntructions executed: load, store, add, subtract, move
  register-register, and, shift, compare equal, compare not equal,
  branch, jump, call, and return;

- the architect must balance several competing forces when encoding
  the instruction set:

	- the desire to have as many registers and addressing modes as
	  possible;

	- the impact of the size of the register and addressing mode
	  fields on the average instruction size and hence on the
	  average program size;

	- a desire to have instructions encode into lengths that will
	  be easy to handle in the implementation;  as a minimum, the
	  architect wants the instructions to be in multiples of bytes, 
	  rather than an arbitrary length;  many architects have 
	  preferred to use fixed-size instructions to gain 
  	  implementation benefits while sacrificing a little the average 
	  code size;

- three basic variations in instruction encoding:

	- variable: VAX (between 1 and 53 bytes and up to 6 memory
	  references!);

	- fixed: DLX, MIPS, PowerPC, SPARC;

	- hybrid: IBM 360/70, Intel 80x86;

- the architect more interested in code size will pick variable
  encoding, and the one more interested in performance than code size
  will pick fixed encoding;

- current compilers typically consist of two to four phases, with more
  highly optimized compilers having more passes;

- optimizations performed by modern compilers can be classified by the
  style of the transformation:

	- high-level optimizations: often done at the source code;

	- local optimizations: optimize code only within a
	  straight-line code fragment (basic block);

	- global optimizations: extend the optimizations across
	  branches and optimizes loops;

	- register allocation;

	- machine-dependent optimizations;

- register allocation is the most important optimization, usually
  employing graph coloring heuristics;  graph coloring works best when
  there are at least 16 general purpose registers;

- most common optimizations in FORTRAN programs analyzed:

	- constant propagation (local, 22%);

	- common subexpression elimination (local, 18%): replace two
	  instances of the same computation by single copy;

	- code motion (global, 16%): remove code from a loop that
	  computes the same result in each iteration;

	- global common subexpression elimination (global, 13%): same
	  as local, but this one crosses branches;

	- copy propagation (global, 11%): replace all instances of a
	  variable A that has been assigned X (A := X;) with X;

- register allocation is much more efficient for stack-allocated
  objectgs than for global variables, and register allocation is 
  essentially impossible for heap-allocated objects because they are
  accessed with pointers;

- aliased variables cannot be put into registers;

* to guarantee correctness, a compiler must be conservative; many
* compilers will not allocate ANY local variables of a procedure in a
* register when there is a pointer that may refer to ONE of the local
* variables;

- how to help the compiler writer folks:

	- regularity: whenever possible, operations, data types, and
	  addressing modes should be orthogonal;

	- provide primitives, not solutions: special features that
	  match a language construct are often unusable;

	- simplify tradeoffs among alternatives;

	- provide instructions that bind the quantities known at
	  compile time as constants;


- DLX instruction set
  -------------------

- the only data addressing modes are immediate and displacement, both
  with 16-bit fields.  Register deferred is accomplished simply by
  placing 0 in the displacement field, and absolute addressing is
  achieved by using displacement and R0 as the base register;

- since DLX has just two addressing modes, these can be encoded into
  the opcode;

- I-type instruction: encodes loads, stores, immediates, conditional
  branches, jump register, jump and link register;

	- opcode: 	6 bits;
	- rs1: 		5 bits;
	- rd:		5 bits;
	- immediate:	16 bits;

- R-type instruction: encodes ALU operations, read-write special
  registers and moves;

	- opcode: 	6 bits;
	- rs1: 		5 bits;
	- rs2:		5 bits;
	- rd:		5 bits;
	- func:		11 bits;

- J-type instructions: encodes jump, jump and link, trap, and return
  from exception;

	- opcode:	6 bits;
	- offset:	26 bits (added to PC);

- compare instructions place 1 in the destination register if the
  condition is true and 0 otherwise;

- jump and link instructions are used for procedure calls and place
  the return address in R31;


- pitfall: designing a "high-level" instruction set feature
  specifically oriented to supporting a high-level language structure;

	- one example is the VAX CALLS instruction, which does a lot
	  of stuff and always uses the stack to pass arguments;

- fallacy: there is such a thing as a typical program;

- fallacy: an architecture with flaws cannot be successful. "The 80x86
  provides a dramatic example: the architecture is one only its
  creators could love."