Chapter 2: Instruction Set Principles and Examples --- - different instruction set architectures: - stack; - accumulator; - register-memory; - register-register (load-store); - addressing modes: - Register Add R4, R3 - Immediate Add R4, #3 - Register deferrred or indirect Add R4, (R1) - Displacement Add R4, 100(R1) - Indexed Add R3, (R1 + R2) - Direct or absolute Add R1, (1001) - Memory indirect or Memory deferred Add R1, @(R3) - Autoincrement Add R1, (R2)+ - Audodecrement Add R1, -(R2) - Scaled Add R1, 100(R2)[R3] - addressing modes can reduce instruction count (IC) but also increase average CPI; - based on benchmarks, 12 bits of displacement would capture 75% of the full 32-bit displacements and 16 bits should capture about 99%; - immediate fields with 8 bits would capture 50-70% of the cases for the VAX (which supports 32-bit immediates), while 16 bits would capture 75-80%; - because of their popularity in the benchmarks, H&P suggest that new architectures implement at least the following addressing modes: displacement, immediate, and register deferred; - conditional branches represent 81-87% of the control flow instructions in the benchmarks, the other 13-19% representing call/returns and jumps; - register indirect jumps are useful for case/switch statements, dynamically shared libraries, and virtual functions such as provided in C++; - three ways to specify branch conditions: - condition codes (CCs); - condition register; - compare and branch; - branch displacement values are usually small, with most of them being 4 to 7 instructions (2-3 bits) away from the PC; figure 2.13 suggests that values that need more than 10 bits almost never occur in practice; - commom operand types include character (1 byte), half word (16 bits), word (32 bits), single-precision floating point (32 bits), and double-precision floating point (64 bits); some architectures provide packed decimal, or binary-coded decimal (BCD); - the Alpha, for example, removed operations and data transfers for data smaller than 32 bits; multiple instructions are needed to read or write bytes or half words; - support these simple instructions, since they will dominate the number of isntructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, and return; - the architect must balance several competing forces when encoding the instruction set: - the desire to have as many registers and addressing modes as possible; - the impact of the size of the register and addressing mode fields on the average instruction size and hence on the average program size; - a desire to have instructions encode into lengths that will be easy to handle in the implementation; as a minimum, the architect wants the instructions to be in multiples of bytes, rather than an arbitrary length; many architects have preferred to use fixed-size instructions to gain implementation benefits while sacrificing a little the average code size; - three basic variations in instruction encoding: - variable: VAX (between 1 and 53 bytes and up to 6 memory references!); - fixed: DLX, MIPS, PowerPC, SPARC; - hybrid: IBM 360/70, Intel 80x86; - the architect more interested in code size will pick variable encoding, and the one more interested in performance than code size will pick fixed encoding; - current compilers typically consist of two to four phases, with more highly optimized compilers having more passes; - optimizations performed by modern compilers can be classified by the style of the transformation: - high-level optimizations: often done at the source code; - local optimizations: optimize code only within a straight-line code fragment (basic block); - global optimizations: extend the optimizations across branches and optimizes loops; - register allocation; - machine-dependent optimizations; - register allocation is the most important optimization, usually employing graph coloring heuristics; graph coloring works best when there are at least 16 general purpose registers; - most common optimizations in FORTRAN programs analyzed: - constant propagation (local, 22%); - common subexpression elimination (local, 18%): replace two instances of the same computation by single copy; - code motion (global, 16%): remove code from a loop that computes the same result in each iteration; - global common subexpression elimination (global, 13%): same as local, but this one crosses branches; - copy propagation (global, 11%): replace all instances of a variable A that has been assigned X (A := X;) with X; - register allocation is much more efficient for stack-allocated objectgs than for global variables, and register allocation is essentially impossible for heap-allocated objects because they are accessed with pointers; - aliased variables cannot be put into registers; * to guarantee correctness, a compiler must be conservative; many * compilers will not allocate ANY local variables of a procedure in a * register when there is a pointer that may refer to ONE of the local * variables; - how to help the compiler writer folks: - regularity: whenever possible, operations, data types, and addressing modes should be orthogonal; - provide primitives, not solutions: special features that match a language construct are often unusable; - simplify tradeoffs among alternatives; - provide instructions that bind the quantities known at compile time as constants; - DLX instruction set ------------------- - the only data addressing modes are immediate and displacement, both with 16-bit fields. Register deferred is accomplished simply by placing 0 in the displacement field, and absolute addressing is achieved by using displacement and R0 as the base register; - since DLX has just two addressing modes, these can be encoded into the opcode; - I-type instruction: encodes loads, stores, immediates, conditional branches, jump register, jump and link register; - opcode: 6 bits; - rs1: 5 bits; - rd: 5 bits; - immediate: 16 bits; - R-type instruction: encodes ALU operations, read-write special registers and moves; - opcode: 6 bits; - rs1: 5 bits; - rs2: 5 bits; - rd: 5 bits; - func: 11 bits; - J-type instructions: encodes jump, jump and link, trap, and return from exception; - opcode: 6 bits; - offset: 26 bits (added to PC); - compare instructions place 1 in the destination register if the condition is true and 0 otherwise; - jump and link instructions are used for procedure calls and place the return address in R31; - pitfall: designing a "high-level" instruction set feature specifically oriented to supporting a high-level language structure; - one example is the VAX CALLS instruction, which does a lot of stuff and always uses the stack to pass arguments; - fallacy: there is such a thing as a typical program; - fallacy: an architecture with flaws cannot be successful. "The 80x86 provides a dramatic example: the architecture is one only its creators could love."