CSC3050 Computer Architecture Arithematic and Logic Unit Prof. Yeh-Ching Chung School of Science and Engineering Chinese University of Hong Kong, Shenzhen National Tsing Hua University copyright OIA 1 Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing with overflow Floating-point real numbers Representation and operations National Tsing Hua University copyright OIA 2 MIPS Arithmetic and Logic Unit (ALU) Must support the Arithmetic/Logic operati ons of the ISA zero add, addi, addiu, addu

sub, subu mult, multu, div, divu 32 A ALU sqrt and, andi, nor, or, ori, xor, xori beq, bne, slt, slti, sltiu, sltu With special handling for sign extend addi, addiu, slti, sltiu zero extend andi, ori, xori overflow 32 result 32 B 4 m (operation) overflow detection add, addi, sub

National Tsing Hua University copyright OIA 3 MIPS Arithmetic and Logic Instructions R format op rs rt I format op rs rt rd shamt funct immediate R-format op funct

I-format op funct op add 000000 100000 slt 000000 101010 addi 001000 addu 000000 100001 sltu 000000 101011 addiu

001001 sub 000000 100010 slti 001010 subu 000000 100011 sltiu 001011 and 000000 100100 andi 001100 or 000000 100101

ori 001101 xor 000000 100110 xori 001110 nor 000000 100111 lui 001111 National Tsing Hua University copyright OIA 4 Design MIPS ALU Requirements: must support the following arit hmetic and logic operations add, sub: Twos complement adder/subtractor with

overflow detection and, or, nor : Logical AND, logical OR, logical NOR slt (set on less than): Twos complement adder with inverter, check sign bit of result National Tsing Hua University copyright OIA 5 Design Approach Design trick 1: Divide and conquer Break the problem into simpler problems, solve them and glue together the solution Design trick 2: Solve part of the problem and ext end Design trick 3: Take pieces you know (or can ima gine) and try to put them together National Tsing Hua University copyright OIA 6 Function Specification ALUop A 4

32 Zero ALU Result 32 Overflow B 32 CarryOut National Tsing Hua University copyright OIA ALU Control (ALUop) Function 0000 and 0001 or 0010 add 0110 subtract 0111 set-on-less-than 1100

nor 7 The Diagram of a 32-Bit ALU 32 A B 32 ALUop 4 A a31 32 ALU B Zero Result 32

Overflow ALU31 c31 a0 b31 s31 m b0 ALU0 co cin s0 4 m cin ALUop 32

CarryOut Overflow Zero 32 National Tsing Hua University copyright OIA Result 8 Function Specification ALUop 4 A 32 Zero ALU B 32 Result Overflow

32 CarryOut National Tsing Hua University copyright OIA ALU Control (ALUop) Function k 0000 and 0001 or 0010 add 0110 subtract 0111 set-on-less-than 1100 nor 9 A 1-bit ALU And, Or, and Add Operations CarryIn Operation and

A 0 or Result 1 Mux B 1-bit Full Adder add 2 CarryOut National Tsing Hua University copyright OIA 10 A 4-bit ALU And, Or, and Add Operations 4-bit ALU

1-bit ALU CarryIn A0 A B0 Mux B 1-bit Full Adder CarryOut National Tsing Hua University copyright OIA Operation CarryIn0 Operation Result 1-bit

ALU Result0 CarryIn1 A1 1-bit ALU B1 CarryIn2 A2 1-bit ALU B2 CarryOut0 CarryIn3 A3 1-bit ALU B3 CarryOut2 Result1 CarryOut1 Result2 Result3

CarryOut3 11 Function Specification ALUop A 4 32 Zero ALU Result 32 Overflow B 32 CarryOut National Tsing Hua University copyright OIA ALU Control (ALUop)

0000 0001 0010 0110 0111 1100 Function and or add subtract set-on-less-than nor 12 Subtraction Operation 2s complement: Take inverse of every bit and add 1 (at c in of first stage) A + B + 1 = A + (B + 1) = A + (-B) = A - B Bit-wise inverse of B is B Subtract (Bnegate) CarryIn Operation A

National Tsing Hua University copyright OIA ALU B 1 Mux Sel 0 B Result CarryOut 13 Revised Diagram LSB and MSB need to do a little extra A a31 ?

32 B 32 a0 b31 b0 ALU0 ALU31 c31 cin co s31 s0 cin 4 ALUop Supply a 1 on

subtraction 32 Overflow Zero National Tsing Hua University copyright OIA Result Combining the CarryIn and Bnegate 14 Function Specification ALUop A 4 32 Zero ALU Result 32

Overflow B 32 CarryOut National Tsing Hua University copyright OIA ALU Control (ALUop) Function 0000 and 0001 or 0010 add 0110 subtract 0111 set-on-less-than 1100 nor 15 Nor Operation A nor B = (not A) and (not B) Ainvert a

Operation 2 CarryIn 0 1 0 Bnegate 1 b ALUop Result 0 1 2 CarryOut National Tsing Hua University copyright OIA 16 Function Specification

ALUop A 4 32 Zero ALU Result 32 Overflow B 32 CarryOut National Tsing Hua University copyright OIA ALU Control (ALUop) 0000 0001 0010 0110 0111

1100 Function and or add subtract set-on-less-than nor 17 Set on Less Than (1) slt rd, rs, rt (if (rs < rt) rd = 1; else rd = 0;) 1-bit in ALU (for bits 1-30) Ainvert Operation CarryIn 0 1 0 a

1 Bnegate b 0 1 2 Less (0:bits 1-30) National Tsing Hua University copyright OIA ALUop Result 3 CarryOut 18 Set on Less Than (2) Sign bit in ALU (bit 31) Ainvert a

CarryIn Operation 0 1 Bnegate 0 1 Result b 0 1 2 3 Less Set Overflow detection National Tsing Hua University copyright OIA Overflow 19

Set on Less Than (3) Bit 0 in ALU Ainvert CarryInOperation 0 1 0 a 1 Bnegate b ALUop Result 0 1 2 3

Set CarryOut National Tsing Hua University copyright OIA 20 (Simplified) 1-bit MIPS ALU and, or, nor, add, sub, slt National Tsing Hua University copyright OIA 21 (Simplified) 32-bit ALU 1-bit ALU National Tsing Hua University copyright OIA 22 Overflow National Tsing Hua University copyright OIA 23 Overflow Detection National Tsing Hua University copyright OIA

24 Overflow Detection Logic Overflow = CarryIn[N-1] XOR CarryOut[N-1] CarryIn0 A0 B0 A1 B1 A2 B2 1-bit Result0 ALU CarryOut0 CarryIn1 1-bit Result1 ALU CarryOut1 CarryIn2 1-bit Result2 ALU CarryIn3 A3 1-bit

ALU B3 X Y X XOR Y 0 0 1 1 0 1 0 1 0 1 1 0 Overflow Result3 CarryOut3 National Tsing Hua University copyright OIA

25 Dealing with Overflow Some languages (e.g., C) ignore overflow Use MIPS addu, addui, subu instructions Other languages (e.g., Ada, Fortran) require rai sing an exception Use MIPS add, addi, sub instructions On overflow, invoke exception handler Save PC in exception program counter (EPC) register Jump to predefined handler address mfc0 (move from coprocessor reg) instruction can retrieve (copy) EPC value (to a general purpose register), to return after correctiv e action (by jump register instruction) National Tsing Hua University copyright OIA 26 Zero Detection Logic Zero Detection Logic is a one BIG NOR gate (support conditional jump) CarryIn0 A0 B0 A1 B1 A2 B2

A3 B3 Result0 1-bit ALU CarryIn1 CarryOut0 Result1 1-bit ALU CarryIn2 CarryOut1 Result2 1-bit Zero ALU CarryIn3 CarryOut2 Result3 1-bit ALU CarryOut3 National Tsing Hua University copyright OIA 27 Ripple Carry Adder Carry bit may have to propagate from LSB to MSB => worst case delay: N-st

age delay CarryIn0 A0 1-bit Result0 ALU B0 CarryOut0 CarryIn1 A1 1-bit Result1 ALU B1 CarryOut1 CarryIn2 A2 1-bit Result2 ALU B2 CarryOut2 CarryIn3 A3 1-bit Result3 ALU B3

CarryIn A B Design Trick: look for parallelism and throw hardware at it CarryOut CarryOut3 National Tsing Hua University copyright OIA 28 Carry-Lookahead Adder Carry-lookahead adder a0 b0 s0 c0 s0 = a0 b0 c0 c1 = a0b0 + a0c0 + b0c0 c1

c1 = a0b0 + (a0 + b0)c0 = g0 + p0c0 c2 = a1b1 + (a1 + b1)c1 = g1 + p1c1 = g1 + p1g0 + p1p0c0 National Tsing Hua University copyright OIA c3= a2b2 + (a2 + b2)c2 = g2 + p2c2 = g2 + p2g1 + p2p1g0+p2p1p0c0 29 Critical Path Delay Ripple-Carry Adder Delay = 2n + 1 National Tsing Hua University copyright OIA Carry-Lookahead Adder Delay = 4 30 Multiplication More complicated than addition Can be accomplished via shifting and adding 0010

1011 0010 00100 000000 0010000 00010110 (multiplicand) (multiplier) (partial product array) (product) Double precision product is produced More time and more area is required National Tsing Hua University copyright OIA 31 Multiplication Hardware (1st Version) National Tsing Hua University copyright OIA 32 Multiplication Algorithm National Tsing Hua University copyright OIA 33

Multiplication Hardware (2nd Version) 32-bit Multiplicand register, 32 -bit ALU, 64-bit Product registe r (HI & LO in MIPS), (0-bit Multiplier register) Multiplicand 32 bits 32-bit ALU Product Shift right Write Control test 64 bits National Tsing Hua University copyright OIA 34 Multiplication Hardware (2nd Version) Start 0010 x 0011 Multiplicand 0010

0011 0010 0001 Product0 = 1 Product 0000 0010 0011 0001 0011 0001 0001 0010 1000 0010 0000 1100 0010 National Tsing Hua University 0000 copyright OIA Product0 = 0 1. Test Product0 1a. Add multiplicand to left half of product and place the result in left half of Product register

2. Shift Product register right 1 bit 32nd repetition? No: < 32 repetitions Yes: 32 repetitions Done 35 MIPS Multiplication Instruction Two 32-bit registers for product HI: most-significant 32 bits LO: least-significant 32-bits Instructions mult rs, rt / multu rs, rt 64-bit product in HI/LO mfhi rd / mflo rd Move from HI/LO to rd Can test HI value to see if product overflows 32 bits mul rd, rs, rt Least-significant 32 bits of product rd National Tsing Hua University copyright OIA 36 Divide: Paper & Pencil

National Tsing Hua University copyright OIA 37 Divide Hardware - Version 1 (1) 64-bit Divisor register (initialized with 32-bit divisor in left half), 64-b it ALU, 64-bit Remainder register (initialized with 64-bit dividend), 3 2-bit Quotient register Shift Right Divisor 64 bits Quotient 64-bit ALU Remainder Shift Left 32 bits Write Control 64 bits National Tsing Hua University copyright OIA

38 Divide Hardware - Version 1 (2) 0111 / 0010 Quot. Divisor Rem. 0000 00100000 00000111 11100111 00000111 0000 00010000 00000111 11110111 00000111 0000 00001000 00000111 11111111 00000111 0000 00000100 00000111 00000011 0001 00000011 0001 00000010 00000011 00000001 0011 00000001 0011 00000001 00000001 National Tsing Hua University copyright OIA Start: Place Dividend in Remainder 1. Subtract Divisor register from

Remainder register, and place the result in Remainder register Remainder 0 Test Remainder 2a. Shift Quotient register to left, setting new rightmost bit to 1 Remainder < 0 2b. Restore original value by adding Divisor to Remainder, place sum in Remainder, shift Quotient to the left, setting new least significant bit to 0 3. Shift Divisor register right 1 bit 33rd repetition? No: < 33 repetitions Yes: 33 repetitions Done 39 Observations - Version 1

Half of the bits in divisor register always 0 => 1/2 of 64-bit adder is wasted => 1/2 of divisor is wasted Instead of shifting divisor to right, shift remainder to left? 1st step cannot produce a 1 in quotient bit => switch order to shift first and then subtract => save 1 iteration Eliminate Quotient register by combining with Remainder register as shifted left National Tsing Hua University copyright OIA 40 Divide Hardware - Version 2 (1) 32-bit Divisor register, 32 -bit ALU, 64-bit Remain der register, (0-bit Quotient register) Divisor 32 bits 32-bit ALU Shift Left Remainder (Quotient) 64 bits

National Tsing Hua University copyright OIA Control Write 41 Divide Hardware - Version 2 (2) Start: Place Dividend in Remainder 0111 / 0010 Step 0 1.1 1.2 1.3b 2.2 2.3b 3.2 3.3a 4.2 4.3a Remainder Div. 0000 0111 0010 0000 1110 1110 1110 0001 1100 1111 1100

0011 1000 0001 1000 0011 0001 0001 0001 0010 0011 0001 0011 1. Shift Remainder register left 1 bit 2. Subtract Divisor register from the left half of Remainder register, and place the result in the left half of Remainder register Remainder 0 3a. Shift Remainder to left, setting new rightmost bit to 1 Test Remainder Remainder < 0 3b. Restore original value by adding Divisor to left half of Remainder, and place sum in left half of Remainder. Also shift Remainder to left, setting the new least significant bit to 0 32nd No: < 32 repetitions

repetition? Yes: 32 repetitions Done. Shift left half of Remainder right 1 bit National Tsing Hua University copyright OIA 42 MIPS Division Instruction Instruction div $t1, $t2 # t1 / t2 Quotient stored in Lo, remainder in Hi mflo $t3 #copy quotient to t3 mfhi $t4 #copy remainder to t4 3-step process Unsigned division: divu $t1, $t2 # t1 / t2 Just like div, except now interpret t1, t2 as unsigned integers instead of signed Answers are also unsigned, use mfhi, mflo to access No overflow or divide-by-0 checking Software must perform checks if required National Tsing Hua University copyright OIA 43 Signed Divide

Remember signs, make positive, complement quotient and remainder if necessary Let Dividend and Remainder have same sign a nd negate Quotient if Divisor sign & Dividend s ign disagree, e.g., -7 2 = -3, remainder = -1 -7 - 2 = 3, remainder = -1 Satisfy Dividend =Quotient x Divisor + Remainder National Tsing Hua University copyright OIA 44 Observations: Multiply and Divide Same hardware as multiply: Just need ALU to add or subtract, and 64-bit reg ister to shift left (multiply: shift right) Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide Multiplicand Divisor 32 bits 32 bits 32-bit ALU

32-bit ALU Shift Left Product Shift right Write 64 bits National Tsing Hua University copyright OIA Control test Remainder (Quotient) 64 bits Control Write 45 Multiply/Divide Hardware 32-bit Multiplicand/Divisor register, 32 -bit ALU, 64-bit Product/Remai nder register, (0-bit Multiplier/Quotient register)

Multiplicand/ Divisor 32 bits 32-bit ALU Shift Right Product/ Remainder Multiplier/ Quotient 64 bits National Tsing Hua University copyright OIA Shift Left Control Write 46 Floating-Point Numbers Representation for non-integral numbers Include very small and very large numbers Like scientific notation 2.34 1056 +0.002 104

+987.02 109 (normalized) (not normalized) In binary 1.xxxxxxx2 2yyyy Types float and double in C National Tsing Hua University copyright OIA 47 Floating-Point Standard Defined by IEEE 754-1985 standard Developed in response to divergence of representations Portability issues for scientific code Now almost universally adopted Two representations Single precision (32-bit) Double precision (64-bit) National Tsing Hua University copyright OIA 48 IEEE Floating-Point Format 1 bit

8 bits 23 bits s exponent x = (1)s (1+fraction) 2(exponent127) Single-Precision 1 bit 11 bits s exponent fraction/mantissa exponent fraction Value represented 0000 0000

0 0 1111 1111 0 1111 1111 non-zero Double-Precision National Tsing Hua University copyright OIA NaN 52 bits fraction/mantissa x = (1)s (1+fraction) 2(exponent1023) 49 Exercise L04-2

Ex-1: What is the IEEE single precision number 0x40C0000 0 representing in decimal? National Tsing Hua University copyright OIA 50 Exercise L04-3 Ex-2: How to represent 0.5 in IEEE single precision binary floating-point format? National Tsing Hua University copyright OIA 51 Floating-Point Addition (1) Consider a 4-digit decimal example 9.999 101 + 1.610 101 1. Align decimal points Shift number with smaller exponent 9.999 101 + 0.016 101 2. Add significands 9.999 101 + 0.016 101 = 10.015 101

3. Normalize result & check for over/underflow 1.0015 102 4. Round and renormalize if necessary 1.002 102 National Tsing Hua University copyright OIA 52 Floating-Point Addition (2) Now consider a 4-digit binary example 1.0002 21 + 1.1102 22 (0.5 + 0.4375) 1. Align binary points Shift number with smaller exponent 1.0002 21 + 0.1112 21 2. Add significands 1.0002 21 + 0.1112 21 = 0.0012 21 3. Normalize result & check for over/underflow 1.0002 24, with no over/underflow 4. Round and renormalize if necessary 1.0002 24 (no change) = 0.0625 National Tsing Hua University copyright OIA 53

FP Adder Hardware (1) Much more complex than integer adder Doing it in one clock cycle would take too long Much longer than integer operations Slower clock would penalize all instructions FP adder usually takes several cycles Can be pipelined National Tsing Hua University copyright OIA 54 FP Adder Hardware (2) Step 1 Step 2 Step 3 Step 4 National Tsing Hua University copyright OIA 55 FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder But uses a multiplier for significands instead of an adder

FP arithmetic hardware usually does Addition, subtraction, multiplication, division, reciprocal, square-ro ot FP integer conversion Operations usually takes several cycles Can be pipelined National Tsing Hua University copyright OIA 56 FP Instructions in MIPS (1) FP hardware is coprocessor 1 Adjunct processor that extends the ISA Separate FP registers 32 single-precision: $f0, $f1, $f31 Paired for double-precision: $f0/$f1, $f2/$f3, FP instructions operate only on FP registers Programs generally dont do integer ops on FP data, or vice versa More registers with minimal code-size impact FP load and store instructions lwc1, ldc1, swc1, sdc1 e.g., ldc1 $f8, 32($sp) National Tsing Hua University copyright OIA

57 FP Instructions in MIPS (2) Single-precision arithmetic add.s, sub.s, mul.s, div.s e.g., add.s $f0, $f1, $f6 Double-precision arithmetic add.d, sub.d, mul.d, div.d e.g., mul.d $f4, $f4, $f6 Single- and double-precision comparison c.xx.s, c.xx.d (xx is eq, lt, le, gt, ge) Sets or clears FP condition-code bit e.g. c.lt.s $f3, $f4 Branch on FP condition code true or false bc1t, bc1f e.g., bc1t TargetLabel National Tsing Hua University copyright OIA 58