CS61C Spring 1999
Section 23 TA: Gek Siong Low (cs61c-tb)
Week 12 Discussion notes
Computer Arithmetic
There isn't much to cover for the datapath, and anyway we'll see it again when we get to pipelining, so for now I'll cover the computer arithmetic stuff.
Hardware Building Blocks and ALU
In the textbook and lecture notes we see some pretty complicated-looking diagrams. The way to approach them is to remember that we are building an ALU that performs several operations, and each new operation is added block by block, so don't try to understand the last diagram all at once.
The most important component is the multiplexor which selects which operation to perform. As we add more operations to the ALU, our multiplexor gets larger. Initially we start with only the AND and OR operations. The diagram for that is fairly straightforward.
The full adder
The next component we add is the full adder. The full adder takes in 3 inputs: the 2 operands, and the carry in from the previous add, and has 2 outputs: the sum and a carry out. The first thing we do is draw the truth table for the full adder. I'll just reproduce it here:
Full Adder

     A  B  CarryIn | CarryOut  Sum
    ---------------+--------------
     0  0     0    |    0       0
     0  0     1    |    0       1
     0  1     0    |    0       1
     0  1     1    |    1       0   <-
     1  0     0    |    0       1
     1  0     1    |    1       0   <-
     1  1     0    |    1       0   <-
     1  1     1    |    1       1   <-

To derive the circuit diagram for this operation, we first look at the rows for which the result is 1. So for the carry out part of the operation, we have the 4 rows indicated above. Using the sum-of-products method described in Appendix B (and simplifying the four product terms), we get the following Boolean equation:

    CarryOut = (A AND B) OR (A AND CarryIn) OR (B AND CarryIn)
From this we get the diagram in the lecture notes, which consists of 3 AND gates and a 3-input OR gate (or 2 OR gates if you prefer). You have to do the other half of the circuit (Sum) for your homework.
To get a 32-bit adder, we simply string 32 full adders together, connecting each carry out to the next carry in. The first carry in is zero.
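The full adder and the 32-bit chain can be simulated in software. Here is a minimal Python sketch (just an illustration of the logic, not the hardware; the helper names are made up for this sketch):

```python
# One-bit full adder plus a 32-bit ripple-carry adder built by chaining it,
# with each carry out feeding the next carry in.

def full_adder(a, b, carry_in):
    """Implements the full adder truth table: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in                                   # sum is XOR of all three inputs
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # the sum-of-products equation
    return s, carry_out

def ripple_add(a_bits, b_bits, carry_in=0):
    """Add two numbers given as equal-length bit lists, LSB first."""
    result = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):   # carry ripples from bit 0 upward
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

def to_bits(n, width=32):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

bits, carry_out = ripple_add(to_bits(7), to_bits(9))
print(from_bits(bits))   # 16
```

Note how slow this would be in hardware: bit 31 cannot finish until the carry has rippled through all 31 earlier full adders.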
Subtraction
We have to modify the circuit a little to provide support for subtraction, but we don't have to do much because we are using two's complement representation (recall that's one of the advantages of using two's complement). We simply need to convert the second operand to its two's complement equivalent and proceed to add them together. Only two more components are needed: a NOT gate for each second input (b) to flip the bits and a multiplexor (called BInvert in the textbook) to select whether to flip or not. For the add 1 we just set the initial carry in to 1 (because we need to add one to get the two's complement), so we'll just connect the BInvert line directly to the first CarryIn. Easy!
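A tiny Python sketch of the same trick (illustrative only), using XOR with all-ones to model the BInvert inversion and setting the initial carry in to 1:

```python
# Subtraction via two's complement: a - b = a + NOT(b) + 1.
# Flipping b's bits and making the first CarryIn a 1 does the "+1" for free.

MASK = 0xFFFFFFFF  # a 32-bit word

def subtract(a, b):
    b_inverted = b ^ MASK                 # BInvert: flip every bit of the second operand
    return (a + b_inverted + 1) & MASK    # the "+1" is the CarryIn of bit 0

print(subtract(10, 3))        # 7
print(hex(subtract(3, 10)))   # 0xfffffff9, i.e. -7 in two's complement
```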
Other operations: slt, bne
The textbook mentions two more operations. For slt, the idea is that to know if a<b we first perform (a-b) and look at the MSB (most significant bit) of the subtraction result, which is the sign bit. Since the result of an slt operation is just that sign bit, we add a new Less input to each bit of the ALU and feed the sign bit from bit 31 back to the Less input of bit 0. All the other Less inputs are zero.
For bne, we just want to know whether the subtraction result is zero: the output is 1 if the subtraction turns out to be 0. To do that we simply OR all the bits of the subtraction result together and then NOT it. By De Morgan's Law this is the same as ANDing the inverted bits together. So if all the bits are 0, the output will be 1.
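Both operations can be sketched in Python (illustrative only; note that the simple sign-bit test for slt can be fooled by overflow, a case the basic circuit here doesn't handle either):

```python
# slt via the sign bit of (a-b), and the Zero output via a NOR of all
# result bits, both built on two's complement subtraction.

MASK = 0xFFFFFFFF

def subtract(a, b):
    return (a + (b ^ MASK) + 1) & MASK    # a + NOT(b) + 1

def slt(a, b):
    diff = subtract(a, b)
    return (diff >> 31) & 1               # MSB (sign bit) of a-b: 1 means a < b

def is_zero(value):
    # Zero output: NOR of all 32 result bits -- 1 only when every bit is 0
    return 0 if value & MASK else 1

print(slt(3, 10))               # 1
print(slt(10, 3))               # 0
print(is_zero(subtract(5, 5)))  # 1
```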
Since the professor didn't cover fast carry, I guess it's not that important. The problem with the method described above is that you have to wait for the carry bit to propagate up the chain, which is slow. Fast carry (carry lookahead) is an attempt to speed up calculations by computing the carry-in at each position in advance, in parallel, instead of waiting for the carry bit to ripple there.
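To make the parallel-carry idea concrete, here is a minimal Python sketch (my own illustration, since the lecture didn't cover it) of a 4-bit carry-lookahead slice: each carry is computed directly from "generate" (a AND b) and "propagate" (a XOR b) signals rather than waiting on the lower bits.

```python
# 4-bit carry lookahead: every carry is a direct Boolean function of the
# inputs and c0, so all four can be computed at once instead of rippling.

def lookahead_carries(a_bits, b_bits, c0):
    g = [a & b for a, b in zip(a_bits, b_bits)]   # bit i generates a carry by itself
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # bit i propagates an incoming carry
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

# 1111 + 0001 (bits LSB first): every bit propagates, a carry pops out the top
print(lookahead_carries([1, 1, 1, 1], [1, 0, 0, 0], 0))   # [1, 1, 1, 1]
```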
Multiplication and Division
What's all that version 1,2,3 stuff? We didn't really talk about the 2nd and 3rd versions in detail,
but since they are in the lecture notes, I'll try to explain the concepts behind them. Basically,
the three versions develop in this sequence:
Version 1: Our good old pencil and paper technique, shouldn't be hard to understand
Version 2: Noticing that we don't really need a 64-bit ALU to multiply/divide 32-bit numbers, so
we do things differently. Hard to understand.
Version 3: Noticing that we can share space because as one value gets shorter in length, another
increases in length. Easier than Version 2.
Multiplication
Version 1
Multiplying two numbers is easy, it's a series of shift and adds. I'll reproduce the example in the lecture notes here:
Multiplicand        1000  (m bits)
Multiplier        x 1001  (n bits)
                  ------
                    1000
                   0000
                  0000
                 1000
                --------
Product         01001000  (m+n bits)
Algorithm:
1. If rightmost bit of multiplier==1
Add multiplicand to Product register
2. Shift multiplicand left by 1 bit.
3. Shift multiplier right by 1 bit.
4. Do this 32 times because the multiplier is 32 bits wide. (must shift everything out)
Because of the add in step 1 and the left shift in step 2, the product register, the multiplicand register, and the ALU all have to be 64 bits wide.
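The four steps above translate directly into code. Here is a minimal Python sketch (an illustration, not the hardware; the wide registers are modeled with masked integers):

```python
# Multiply Version 1: 64-bit product and multiplicand registers, 32 iterations.

def multiply_v1(multiplicand, multiplier):
    product = 0
    multiplicand &= 0xFFFFFFFF
    multiplier &= 0xFFFFFFFF
    for _ in range(32):
        if multiplier & 1:                                        # step 1: test rightmost bit
            product = (product + multiplicand) & 0xFFFFFFFFFFFFFFFF
        multiplicand = (multiplicand << 1) & 0xFFFFFFFFFFFFFFFF   # step 2: multiplicand left
        multiplier >>= 1                                          # step 3: multiplier right
    return product

print(multiply_v1(0b1000, 0b1001))   # 72, i.e. 0b01001000
```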
Version 2
Now here's the hard-to-understand part: we don't need a 64-bit ALU at all. The obvious observation is that the multiplicand is only 32 bits wide but we are using 64 bits for it. The algorithm is not as obvious: why are we shifting the product to the right?
Look at the example above, and concentrate on the intermediate steps in between. If we ignore the bits on the right (the blank spaces), we see that each intermediate result needs only 32 bits, as long as we don't store the bits where the blank spaces are. Furthermore, at each step along the way we confirm one more bit from the right, and thus those bits don't need to be kept in temporary storage -- the rightmost bits are already permanent.
The idea is to place the temporary part of the result in the left half of the product register (which is still 64 bits wide) and gradually shift the rightmost bits into the right half as we confirm them bit by bit. We do not have to shift the multiplicand to the left anymore, so we need only 32 bits for that now. And since the intermediate storage need only be 32 bits wide, we can do it with a 32-bit ALU!
So here's the version 2 algorithm:
1. If rightmost bit of multiplier==1
Add multiplicand to left half of product register
2. Shift product register right by 1 bit.
3. Shift multiplier register right by 1 bit.
4. Do this 32 times.
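As a sketch (illustrative Python again, with the register halves modeled by integer arithmetic), Version 2 adds into the left half of the 64-bit product register and shifts the whole register right:

```python
# Multiply Version 2: 32-bit multiplicand, adds land in the LEFT half of the
# 64-bit product register, then the whole register shifts right, moving each
# confirmed bit into the right half.

def multiply_v2(multiplicand, multiplier):
    product = 0                               # the 64-bit product register
    for _ in range(32):
        if multiplier & 1:                    # step 1: test rightmost bit
            product += multiplicand << 32     # add into the left half (32-bit add + carry)
        product >>= 1                         # step 2: shift product right
        multiplier >>= 1                      # step 3: shift multiplier right
    return product

print(multiply_v2(0b1000, 0b1001))   # 72
```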
Version 3
This is easy to see. The right half of the product register in version 2 is empty until we gradually start shifting the confirmed bits in, and that space matches exactly the space required for the multiplier at each step (remember we are shifting it to the right too). So the idea here is just to share the space in the product register, thereby eliminating the need for a multiplier register. As can be seen below, we've eliminated a shift right from the algorithm too.
Version 3 algorithm:
1. If rightmost bit of multiplier==1
Add multiplicand to left half of product register
2. Shift product register right by 1 bit. (multiplier is stored in the same space)
3. Do this 32 times.
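A sketch of Version 3 in the same illustrative style: the multiplier is loaded into the right half of the product register up front, so a single right shift serves both purposes:

```python
# Multiply Version 3: the multiplier shares the right half of the 64-bit
# product register, so the product's right shift also shifts the multiplier.

def multiply_v3(multiplicand, multiplier):
    product = multiplier & 0xFFFFFFFF         # multiplier starts in the right half
    for _ in range(32):
        if product & 1:                       # the rightmost bit IS the multiplier bit
            product += multiplicand << 32     # add into the left half
        product >>= 1                         # one shift does double duty
    return product

print(multiply_v3(0b1000, 0b1001))   # 72
```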
Division
Version 1
This is essentially the same as pencil-and-paper long division. The dividend is gradually reduced and the quotient gradually built along the way. Here's the example from the lecture notes:
1001 Quotient
+-------
Divisor 1000 |1001010 Dividend
-1000
----
10
101
1010
-1000
----
10 Remainder
Algorithm:
0. a) Set Remainder register to Dividend
b) Quotient=0 (32 bits)
c) Divisor is placed in left half of 64-bit Divisor register
1. Remainder register = Remainder register - Divisor
2. If Remainder>=0 (okay)
Quotient = (Quotient << 1) + 1
else (oops, subtracted too much)
Remainder register = Remainder register + Divisor (restore it)
Quotient = (Quotient << 1) + 0
3. Divisor = Divisor >> 1
4. Repeat steps 1 to 3, 33 times.
From step 1, we see that we'll need a 64-bit ALU here too. Because the Divisor is placed in a 64-bit register we have to make the Remainder register and the ALU 64-bits wide.
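The algorithm above can be sketched in Python (illustrative only; the wide registers are modeled with plain integers):

```python
# Divide Version 1: restoring division with a 64-bit Remainder register and
# a 64-bit Divisor register whose left half holds the divisor.

def divide_v1(dividend, divisor):
    remainder = dividend                 # step 0a
    quotient = 0                         # step 0b
    divisor_reg = divisor << 32          # step 0c: divisor in left half
    for _ in range(33):                  # step 4: 33 iterations
        remainder -= divisor_reg         # step 1
        if remainder >= 0:               # step 2 (okay)
            quotient = (quotient << 1) + 1
        else:                            # oops, subtracted too much
            remainder += divisor_reg     # restore it
            quotient = quotient << 1
        divisor_reg >>= 1                # step 3
    return quotient, remainder

print(divide_v1(0b1001010, 0b1000))   # (9, 2): 74 / 8 = 9 remainder 2
```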
Version 2
Notice that we don't need to touch the rightmost bits of the Remainder register until much later, and that all the subtraction activity occurs in the left side of the Remainder. This provides the idea for Version 2. Suppose we keep the Divisor at 32 bits and place the remainder in the left half of the Remainder register. The subtraction takes place in the left half. After each subtraction, the leftmost bits of the remainder are going to be zero anyway, so we don't need them and we can shift them out. Thus we'll shift the Remainder left instead of shifting the Divisor right, and everything is still correct.
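Here is an illustrative Python sketch of the Version 2 idea: the Divisor stays 32 bits and the Remainder register shifts left instead. (It checks "would the subtraction go negative?" by comparing first, which is equivalent to Version 1's subtract-and-restore.)

```python
# Divide Version 2: the dividend sits in the 64-bit Remainder register, which
# shifts LEFT each step; the 32-bit divisor never moves, so a 32-bit ALU
# operating on the left half is enough.

def divide_v2(dividend, divisor):
    remainder_reg = dividend                  # dividend starts in the right half
    quotient = 0                              # separate 32-bit Quotient register
    for _ in range(32):
        remainder_reg <<= 1                   # shift the Remainder left, not the Divisor right
        if (remainder_reg >> 32) >= divisor:  # compare-first stands in for subtract/restore
            remainder_reg -= divisor << 32    # subtract in the left half only
            quotient = (quotient << 1) | 1
        else:
            quotient = quotient << 1          # nothing subtracted, nothing to restore
    return quotient, remainder_reg >> 32

print(divide_v2(0b1001010, 0b1000))   # (9, 2)
```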
Version 3
This is the same idea as for multiplication. Once we've implemented version 2, we observe that the right side of the Remainder register is now left unused. So why not place the Quotient in there, since both the Quotient and Remainder shift left at every step?
Here's the final algorithm:
0. a) Set Remainder register = Dividend (the dividend starts in the right
      half; the left half is zero)
   b) The Quotient is built up in the right half, one bit per step, as the
      dividend bits shift out
1. Shift Remainder register left by 1 bit. (done once, before the loop --
   this is an optimization)
2. Left half of Remainder register = Left half of Remainder register - Divisor
3. If Remainder>=0
      Shift Remainder register left by 1 bit, making the new rightmost bit
      a 1 (build the Quotient)
   else
      Left half of Remainder register = Left half of Remainder register
      + Divisor (restore it), then shift Remainder register left by 1 bit,
      making the new rightmost bit a 0
4. Repeat steps 2 and 3, 32 times.
5. Shift left half of Remainder register right by 1 bit. (the remainder half
   picked up one extra shift)
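A sketch of Version 3 on the shared 64-bit register (illustrative Python; the whole-register shift happens once up front, the loop then subtracts and shifts 32 times, and the remainder half gets one final right shift):

```python
# Divide Version 3: restoring division in one shared 64-bit register.
# Quotient bits shift in from the right as the remainder develops on the left.

def divide_v3(dividend, divisor):
    reg = dividend & 0xFFFFFFFF          # dividend starts in the right half
    reg <<= 1                            # one-time initial left shift
    for _ in range(32):
        left = reg >> 32                 # left half: the developing remainder
        right = reg & 0xFFFFFFFF         # right half: dividend bits / quotient bits
        if left >= divisor:              # subtraction would stay >= 0
            reg = ((((left - divisor) << 32) | right) << 1) | 1   # shift in a quotient 1
        else:                            # restore is implicit (we never subtracted)
            reg = reg << 1               # shift in a quotient 0
    quotient = reg & 0xFFFFFFFF
    remainder = (reg >> 32) >> 1         # undo the remainder's extra left shift
    return quotient, remainder

print(divide_v3(0b1001010, 0b1000))   # (9, 2)
```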
Multiply/Divide in MIPS
In MIPS, the operands are 32 bits wide, and the 64-bit result is placed in two 32-bit registers (Hi and Lo), which matches the 64-bit product and remainder registers of the version 3 algorithms described above.
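For example, after a multiply the upper half of the 64-bit product goes to Hi and the lower half to Lo (read with mfhi and mflo). A small Python sketch of the unsigned case (modeling multu; the signed mult differs only in sign extension):

```python
# How a 64-bit unsigned product maps onto MIPS's two 32-bit result registers.

def mips_multu(a, b):
    product = (a * b) & 0xFFFFFFFFFFFFFFFF  # full 64-bit product
    hi = product >> 32                      # upper 32 bits -> Hi register
    lo = product & 0xFFFFFFFF               # lower 32 bits -> Lo register
    return hi, lo

print(mips_multu(0x10000, 0x10000))   # (1, 0): 2^32 needs the Hi register
```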
Written by : Gek Siong Low (cs61c-tb), Spring 1999