CS61C Spring 1999

Section 23 TA: Gek Siong Low (cs61c-tb)

Week 14 Discussion notes

Pipelining


It looks like you are only responsible for Chapter 6.1. If you look at Ch 6.2 and beyond you'll see a lot of horrible diagrams. Read them if you want, you don't need to know them. It is useful to read some of the elaborations in those sections though.

Why pipelining?

In a nutshell, pipelining improves throughput. More instructions per unit time can be executed. Each instruction can be divided into several stages which can then be interleaved (remember the washer/dryer analogy). The actual speedup depends on the mix of instructions, which may introduce "bubbles" into the pipeline. If we assume that the number of instructions is very large, and that the number of bubbles are pretty much insignificant, the speedup is approximately the number of pipeline stages.

The bottleneck in a pipeline is the slowest stage in the pipeline.

Stages

In summary, here are the 5 pipeline stages for the MIPS architecture:

1. IFetch : Instruction fetch, includes incrementing the PC
2. Decode : Instruction is decoded and registers read
3. Execute (ALU stage):
a) If it's a load/store (memory reference), calculate address
b) If it's a arithmetic operation, perform the operation
c) If it's a branch, perform the comparison

4. Memory access :
a) If it's a load, read data from memory
b) If it's a store, write data to memory
c) If it's a branch, change the PC if stage 3 result is true

5. Write Back : Write data to register

Note that for load, the value is not in the register yet at the end of stage 4..

Why do we have "left half write, right half read"?

It is not just so that the diagrams look pretty. The stage is really divided into these two sub-stages. We see why this is needed when we get to forwarding. Data that is written to a register in the first half of the stage is available to some instruction that reads from it in the second half of another stage. (see slide 27 in lecture notes and Fig 6.37 on pg 481 in the COD). This scheme avoids the need for an extra "bubble".

Shading the stages

When a stage/half-stage is shaded, it means that the stage/half-stage is used by the instruction. In the lecture notes examples, the ALU stage is not shaded. It should be shaded. Almost all instructions use the ALU stage.

Why is MIPS so good for pipelining?

MIPS is designed to be pipelined from the ground up, unlike Intel, so the pipelining scheme is clean and simple. Intel do use pipelines, but they are very complicated and there are more than the 5 stages MIPS use, partly due to the fact that Intel is CISC architecture.

The following properties makes MIPS suitable for pipelining:
1. All instruction are of the same length, so that instruction fetch is just one simple step. If the instructions are of variable length (as in Intel), the instruction fetch will include first fetching the opcode, determining which instruction it is, and then fetching the appropriate number of bytes for that particular opcode/instruction.

2. Few instruction formats. The instructions all have similar structures, and they easily fit themselves into the same few pipeline stages.

3. Only loads/stores access memory, and they don't require any additional arithmetic operations. This allows the ALU stage to take on multiple tasks, so for load/stores the ALU stage is used to calculate memory addresses. Otherwise, we may need to have a special stage just for calculating memory addresses.

4. Single memory accesses. No instruction accesses memory twice. For example, the Intel instruction set has some instructions that operate directly on memory references, so we would have to read two values from memory, operate on them, and store the result back into memory. Obviously this would require more pipeline stages, and increase the complexity.

In general, the more pipeline stages we have the more complex it becomes because of the increase in possible combinations of hazards. MIPS is very well designed as it has only 5 pipeline stages.

Hazards

There are several types of hazards (i.e. literally, "dangerous situations" - this is not some special computer jargon with a special meaning):

1. Structural Hazards

This is an impossible combination of instructions. On page 441 in the COD, the "two memories" refer to having two separate paths to the memory for instruction fetch and for memory access. If we have only one path to be shared by both instruction fetch and memory access, then the instruction fetch stage and the memory access stage may "collide", since we cannot do both at the same time. Hence the word "structural". It arises from the physical design of the system.

2. Control Hazards

Control hazards are due to branches. We do not know in advance the outcome of a branch instruction, so we do not know what the next instruction will be, and that interferes with the smooth flow of the pipeline.

There are several possible solutions:
a) Wait/Stall - a bubble is simply inserted into the pipeline. We assume that we have added enough hardware to do the comparison, address calculation, PC updating in a short enough time such that we just need to insert one bubble. Otherwise we'll need more bubbles. This is a slow (and lazy) solution.

b) Branch prediction and back up - We guess the outcome of the branch instruction and start the execution of the next instruction that we think will be executed in the next pipeline slot. If we are wrong about our guess, we'll have to scrap that instruction and undo what has been executed so far. Obviously, we'll need some fancy hardware to do this. If we predict correctly most of the time, we stand to gain a lot of speed. Many methods can be used to guess the outcome of the branch, e.g. one way is keep a history of some previous outcomes.

c) Delayed branches - This looks pretty arcane but it is actually a very simple idea and it is what the MIPS designers chose to implement. Basically we find a "safe" instruction from before the branch to insert into the slot right after the branch, and we complete the execution of that instruction. There are ways for the compiler/assembler to determine which instructions are safe, and you'll learn this in CS164, but for this class it will suffice to just pick them by observation. Just think of it as "the instruction right after the branch will be executed". Delayed branches actually provides pretty good performance, since the compiler can often find an instruction to fill into the delay slot. The fact that there are only 5 pipeline stages in MIPS so that we have only one delay slot to fill in helps too.

3. Data Hazards

Data hazards arise from data dependencies between instructions/stages. When one stage needs some data that is dependent on the completion of another stage in another instruction, then we must wait for that other stage to complete first.

Forwarding hardware ensures that data output from one stage is available to the other stages thereafter so that we don't have to wait for entire instruction to complete before we can get the data we want.

If data dependencies are backward in time, then we must insert bubbles (stall) until the data dependencies are forward in time. This is just common sense.

We can reschedule (reorder) instructions to avoid load hazards. This is somewhat similar to delayed branches. We know that the instruction that immediately follows a load instruction must stall if it requires the result of the load instruction. Similarly as in the delayed branch case, we can pick safe instructions to put after the load instruction so that we do not have to stall. Again, there are methods to do the reordering (it involves drawing a dependency graph as the first step) but for this class the examples you'll get should be easily done by observation.


Written by : Gek Siong Low (cs61c-tb), Spring 1999