

# Lecture 2: MIPS Processor Example

## Outline

- Design Partitioning
- MIPS Processor Example
  - Architecture
  - Microarchitecture
  - Logic Design
  - Circuit Design
  - Physical Design
- Fabrication, Packaging, Testing



# **Coping with Complexity**

#### How to design System-on-Chip?

- Many millions (even billions!) of transistors
- Tens to hundreds of engineers
- Structured Design
- Design Partitioning

# **Structured Design**

#### Hierarchy: Divide and Conquer

- Recursively divide system into modules

#### Regularity

- Reuse modules wherever possible
- Ex: Standard cell library

#### Modularity: well-formed interfaces

- Allows modules to be treated as black boxes

#### Locality

- Physical and temporal

# **Design Partitioning**

#### Architecture: User's perspective, what does it do?

- Instruction set, registers
  - MIPS, x86, Alpha, PIC, ARM, ...

#### Microarchitecture

- Single cycle, multicycle, pipelined, superscalar?

#### Logic: how are functional blocks constructed?

- Ripple carry, carry lookahead, carry select adders

#### Circuit: how are transistors used?

- Complementary CMOS, pass transistors, domino

#### Physical: chip layout

- Datapaths, memories, random logic



# **MIPS Architecture**

- Example: subset of MIPS processor architecture
  - Drawn from Patterson & Hennessy
- MIPS is a 32-bit architecture with 32 registers
  - Consider 8-bit subset using 8-bit datapath
  - Only implement 8 registers (\$0 through \$7)
  - \$0 hardwired to 00000000
  - 8-bit program counter
- You'll build this processor in the labs
  - Illustrate the key concepts in VLSI design

## **Instruction Set**

**Table 1.7** MIPS instruction set (subset supported)

| Instruction        | Function                                                  | Encoding | op     | funct  |
|--------------------|-----------------------------------------------------------|----------|--------|--------|
| add \$1, \$2, \$3  | addition: \$1 ← \$2 + \$3                                 | R        | 000000 | 100000 |
| sub \$1, \$2, \$3  | subtraction: \$1 ← \$2 - \$3                              | R        | 000000 | 100010 |
| and \$1, \$2, \$3  | bitwise and: \$1 ← \$2 and \$3                            | R        | 000000 | 100100 |
| or \$1, \$2, \$3   | bitwise or: \$1 ← \$2 or \$3                              | R        | 000000 | 100101 |
| slt \$1, \$2, \$3  | set less than: \$1 ← 1 if \$2 < \$3<br>\$1 ← 0 otherwise | R        | 000000 | 101010 |
| addi \$1, \$2, imm | add immediate: \$1 ← \$2 + imm                            | I        | 001000 | n/a    |
| beq \$1, \$2, imm  | branch if equal: PC ← PC + imm <sup>a</sup>               | I        | 000100 | n/a    |
| j destination      | jump: PC ← destination <sup>a</sup>                       | J        | 000010 | n/a    |
| lb \$1, imm(\$2)   | load byte: \$1 ← mem[\$2 + imm]                           | I        | 100000 | n/a    |
| sb \$1, imm(\$2)   | store byte: mem[\$2 + imm] ← \$1                          | I        | 101000 | n/a    |

2: MIPS Processor Example

CMOS VLSI Design 4th Ed.

# **Instruction Encoding**

#### 32-bit instruction encoding

- Requires four cycles to fetch on 8-bit datapath
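Each 32-bit instruction packs its fields into one of three formats (R, I, J). The C sketch below shows that packing, assuming the standard Patterson & Hennessy field layout (R: op, rs, rt, rd, shamt, funct; I: op, rs, rt, 16-bit immediate; J: op, 26-bit address). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* R-type: op(6)=000000 rs(5) rt(5) rd(5) shamt(5)=0 funct(6).
   Arguments are in field order (rs, rt, rd), not assembly order (add rd, rs, rt). */
uint32_t encode_r(unsigned rs, unsigned rt, unsigned rd, unsigned funct) {
    assert(rs < 32 && rt < 32 && rd < 32 && funct < 64);
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) | ((uint32_t)rd << 11) | funct;
}

/* I-type: op(6) rs(5) rt(5) imm(16); negative immediates use 16-bit two's complement. */
uint32_t encode_i(unsigned op, unsigned rs, unsigned rt, int imm) {
    assert(op < 64 && rs < 32 && rt < 32 && imm >= -32768 && imm <= 32767);
    return ((uint32_t)op << 26) | ((uint32_t)rs << 21) | ((uint32_t)rt << 16)
         | ((uint32_t)imm & 0xFFFFu);
}

/* J-type: op(6) addr(26). */
uint32_t encode_j(unsigned op, uint32_t addr) {
    assert(op < 64 && addr < (1u << 26));
    return ((uint32_t)op << 26) | addr;
}
```

For example, `encode_i(0x08, 0, 3, 8)` packs `addi $3, $0, 8` into `0x20030008`, and `encode_r(4, 5, 4, 0x20)` packs `add $4, $4, $5` into `0x00852020`.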



# Fibonacci (C)

- $f_0 = 1$; $f_{-1} = -1$
- $f_n = f_{n-1} + f_{n-2}$
- $f = 1, 1, 2, 3, 5, 8, 13, \ldots$

```
int fib(void)
{
  int n = 8;           /* compute nth Fibonacci number */
  int f1 = 1, f2 = -1; /* last two Fibonacci numbers */
  while (n != 0) {     /* count down to n = 0 */
    f1 = f1 + f2;
    f2 = f1 - f2;
    n = n - 1;
  }
  return f1;
}
```
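Because the datapath is only 8 bits wide, every register holds its value modulo 256. The sketch below (the name `fib8` and the parameter `n` are hypothetical) runs the same loop as `fib()` with `uint8_t` variables, showing that starting from f2 = -1 (stored as 0xFF) still works under 8-bit wraparound, and where results stop fitting.

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as fib(), with every variable held in 8 bits.
   Additions and subtractions wrap modulo 256, as on an 8-bit datapath. */
uint8_t fib8(uint8_t n) {
    uint8_t f1 = 1, f2 = 0xFF;     /* f2 = -1 in 8-bit two's complement */
    while (n != 0) {               /* count down to n = 0 */
        f1 = (uint8_t)(f1 + f2);
        f2 = (uint8_t)(f1 - f2);   /* recovers the previous f1 */
        n = (uint8_t)(n - 1);
    }
    return f1;
}
```

`fib8(8)` returns 13, the value the assembly program stores; `fib8(15)` returns 121 because the true value, 377, exceeds 255 and wraps.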

# Fibonacci (Assembly)

- 1<sup>st</sup> statement: n = 8
- How do we translate this to assembly?

```
# fib.asm
# Register usage: $3: n $4: f1 $5: f2
# return value written to address 255
fib: addi $3, $0, 8  # initialize n=8
        addi $4, $0, 1  # initialize f1 = 1
        addi $5, $0, -1  # initialize f2 = -1
loop: beq $3, $0, end  # Done with loop if n = 0
        add $4, $4, $5  # f1 = f1 + f2
        sub $5, $4, $5  # f2 = f1 - f2
        addi $3, $3, -1  # n = n - 1
        j loop  # repeat until done
end: sb $4, 255($0)  # store result in address 255
```


# Fibonacci (Binary)

- 1<sup>st</sup> statement: addi \$3, \$0, 8
- How do we translate this to machine language?
  - Hint: use the instruction formats and the op/funct codes from the instruction set table
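Working the first statement by hand, assuming the standard I-type layout (op in bits 31:26, rs in 25:21, rt in 20:16, immediate in 15:0) and the addi op code 001000 (decimal 8):

$$
\begin{aligned}
\text{word} &= \text{op} \cdot 2^{26} + \text{rs} \cdot 2^{21} + \text{rt} \cdot 2^{16} + \text{imm} \\
&= 8 \cdot 2^{26} + 0 \cdot 2^{21} + 3 \cdot 2^{16} + 8 \\
&= \text{0x20000000} + \text{0x00030000} + \text{0x00000008} = \text{0x20030008}
\end{aligned}
$$

A negative immediate such as -1 is first written in 16-bit two's complement (0xFFFF) before it is placed in bits 15:0.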




# Fibonacci (Binary)

#### Machine language program

| Instruction        | Binary Encoding                        | Hexadecimal Encoding |
|--------------------|----------------------------------------|----------------------|
| addi \$3, \$0, 8   | 001000 00000 00011 0000000000001000    | 20030008             |
| addi \$4, \$0, 1   | 001000 00000 00100 0000000000000001    | 20040001             |
| addi \$5, \$0, -1  | 001000 00000 00101 1111111111111111    | 2005ffff             |
| beq \$3, \$0, end  | 000100 00011 00000 0000000000000101    | 10600005             |
| add \$4, \$4, \$5  | 000000 00100 00101 00100 00000 100000  | 00852020             |
| sub \$5, \$4, \$5  | 000000 00100 00101 00101 00000 100010  | 00852822             |
| addi \$3, \$3, -1  | 001000 00011 00011 1111111111111111    | 2063ffff             |
| j loop             | 000010 00000000000000000000000011      | 08000003             |
| sb \$4, 255(\$0)   | 101000 00000 00100 0000000011111111    | a00400ff             |


# Instruction Execution

- Instruction execution generally flows from left to right through the datapath.
- The program counter (PC) specifies the address of the instruction. The instruction is loaded 1 byte at a time over four cycles from an off-chip memory into the 32-bit instruction register (IR).
- The op field (bits 31:26 of the instruction) is sent to the controller, which sequences the datapath through the correct operations to execute the instruction.
- The controller contains a finite state machine (FSM) that generates multiplexer select signals and register enables to sequence the datapath.
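The fetch, decode, and execute flow above can be sketched as a C behavioral model. This is not the lab's datapath or controller: it executes a whole instruction per call instead of sequencing an FSM, and it assumes big-endian byte order in memory, a branch target of (address of the beq) + 4 × imm, and a jump target of 4 × destination. The branch and jump rules are assumptions chosen because they reproduce the machine language table (beq at word 3 with imm = 5 lands on `end` at word 8; `j loop` encodes 3). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 8-bit MIPS subset: eight 8-bit registers,
   an 8-bit PC, and one 256-byte memory shared by code and data. */
typedef struct {
    uint8_t reg[8];    /* $0..$7; $0 is forced back to 0 after every instruction */
    uint8_t pc;        /* byte address of the next instruction */
    uint8_t mem[256];
} Mips8;

/* Clear the machine and store 32-bit words from address 0,
   most significant byte first (assumed byte order). */
void mips8_load(Mips8 *m, const uint32_t *words, int count) {
    assert(count >= 0 && count * 4 <= 256);
    memset(m, 0, sizeof *m);
    for (int i = 0; i < count; i++)
        for (int k = 0; k < 4; k++)
            m->mem[4 * i + k] = (uint8_t)(words[i] >> (24 - 8 * k));
}

/* Fetch one instruction a byte at a time into IR, then decode and execute it. */
void mips8_step(Mips8 *m) {
    uint8_t pc0 = m->pc;
    uint32_t ir = 0;
    for (int k = 0; k < 4; k++)                 /* four byte-wide fetch cycles */
        ir = (ir << 8) | m->mem[(uint8_t)(pc0 + k)];
    m->pc = (uint8_t)(pc0 + 4);

    unsigned op = ir >> 26, funct = ir & 0x3F;
    unsigned rs = (ir >> 21) & 7, rt = (ir >> 16) & 7, rd = (ir >> 11) & 7; /* only $0-$7 */
    uint8_t imm = (uint8_t)ir;                  /* low byte of the 16-bit immediate */
    uint8_t a = m->reg[rs], b = m->reg[rt];

    switch (op) {
    case 0x00:                                  /* R-type: funct selects the ALU operation */
        if (funct == 0x20)      m->reg[rd] = (uint8_t)(a + b);      /* add */
        else if (funct == 0x22) m->reg[rd] = (uint8_t)(a - b);      /* sub */
        else if (funct == 0x24) m->reg[rd] = a & b;                 /* and */
        else if (funct == 0x25) m->reg[rd] = a | b;                 /* or  */
        else if (funct == 0x2A) m->reg[rd] = (int8_t)a < (int8_t)b; /* slt */
        break;
    case 0x08: m->reg[rt] = (uint8_t)(a + imm); break;              /* addi */
    case 0x04: if (a == b) m->pc = (uint8_t)(pc0 + 4 * imm); break; /* beq, assumed target rule */
    case 0x02: m->pc = (uint8_t)(4 * (ir & 0x3FFFFFF)); break;      /* j, assumed target rule */
    case 0x20: m->reg[rt] = m->mem[(uint8_t)(a + imm)]; break;      /* lb */
    case 0x28: m->mem[(uint8_t)(a + imm)] = b; break;               /* sb */
    }
    m->reg[0] = 0;                              /* $0 is hardwired to zero */
}

/* Step until the PC leaves the first prog_bytes of memory; returns the step count. */
int mips8_run(Mips8 *m, int prog_bytes, int max_steps) {
    int steps = 0;
    while (m->pc < prog_bytes && steps < max_steps) {
        mips8_step(m);
        steps++;
    }
    return steps;
}
```

Loading the nine hexadecimal words from the machine language table and calling `mips8_run(&m, 36, 1000)` should execute 45 instructions and leave 13 in `m.mem[255]`, the same value `fib()` returns.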

## **Multicycle Controller**
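As a rough illustration of how a multicycle controller sequences the datapath, here is a C sketch of a next-state function. The state names, the split into states, and the resulting cycle counts are hypothetical, loosely patterned on the multicycle design in Patterson & Hennessy; the lab's FSM may use different states.

```c
#include <assert.h>

/* Hypothetical states for a multicycle controller of the 8-bit subset. */
typedef enum {
    FETCH1, FETCH2, FETCH3, FETCH4,  /* load IR one byte per cycle */
    DECODE,                          /* read registers, inspect op */
    MEMADR, LBRD, LBWR, SBWR,        /* address calc, load read, load write-back, store */
    RTYPEEX, RTYPEWR,                /* ALU operation, then register write */
    ADDIEX, ADDIWR,                  /* add immediate, then register write */
    BEQEX, JEX                       /* branch compare and PC update, jump */
} State;

/* op field values of the supported instructions */
enum { OP_RTYPE = 0x00, OP_J = 0x02, OP_BEQ = 0x04, OP_ADDI = 0x08,
       OP_LB = 0x20, OP_SB = 0x28 };

/* Next-state function: only DECODE and MEMADR depend on the op field. */
State next_state(State s, unsigned op) {
    switch (s) {
    case FETCH1: return FETCH2;
    case FETCH2: return FETCH3;
    case FETCH3: return FETCH4;
    case FETCH4: return DECODE;
    case DECODE:
        switch (op) {
        case OP_LB: case OP_SB: return MEMADR;
        case OP_RTYPE:          return RTYPEEX;
        case OP_ADDI:           return ADDIEX;
        case OP_BEQ:            return BEQEX;
        case OP_J:              return JEX;
        default:                return FETCH1;  /* unsupported op: fetch next */
        }
    case MEMADR:  return op == OP_LB ? LBRD : SBWR;
    case LBRD:    return LBWR;
    case RTYPEEX: return RTYPEWR;
    case ADDIEX:  return ADDIWR;
    default:      return FETCH1;  /* LBWR, SBWR, RTYPEWR, ADDIWR, BEQEX, JEX */
    }
}

/* Clock cycles one instruction takes, from FETCH1 back to FETCH1. */
int cycles_for(unsigned op) {
    State s = FETCH1;
    int n = 0;
    do {
        s = next_state(s, op);
        n++;
        assert(n < 16);  /* guard against a missing return-to-FETCH1 edge */
    } while (s != FETCH1);
    return n;
}
```

Under these assumptions, one pass through the Fibonacci loop (beq, add, sub, addi, j) takes 6 + 7 + 7 + 7 + 6 = 33 cycles.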



# Logic Design

- Start at top level
  - Hierarchically decompose MIPS into units

Top-level interface







# HDLs

- Hardware Description Languages
  - Widely used in logic design
  - Verilog and VHDL
  - Describe hardware using code
    - Document logic functions
    - Simulate logic before building
    - Synthesize code into gates and layout
      - Requires a library of standard cells



# **Circuit Design**

- How should logic be implemented?
  - NANDs and NORs vs. ANDs and ORs?
  - Fan-in and fan-out?
  - How wide should transistors be?
- These choices affect speed, area, power
- Logic synthesis makes these choices for you
  - Good enough for many applications
  - Hand-crafted circuits are still better





## **Transistor-Level Netlist**



## **SPICE Netlist**

```
.SUBCKT CARRY A B C COUT VDD GND
MN1 I1 A GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN2 I1 B GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN3 CN C I1 GND NMOS W=1U L=0.18U AD=0.5P AS=0.5P
MN4 I2 B GND GND NMOS W=1U L=0.18U AD=0.15P AS=0.5P
MN5 CN A I2 GND NMOS W=1U L=0.18U AD=0.5P AS=0.15P
MP1 I3 A VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP2 I3 B VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP3 CN C I3 VDD PMOS W=2U L=0.18U AD=1P AS=1P
MP4 I4 B VDD VDD PMOS W=2U L=0.18U AD=0.3P AS=1P
MP5 CN A I4 VDD PMOS W=2U L=0.18U AD=1P AS=0.3P
MN6 COUT CN GND GND NMOS W=2U L=0.18U AD=1P AS=1P
MP6 COUT CN VDD VDD PMOS W=4U L=0.18U AD=2P AS=2P
CI1 I1 GND 2FF
CI3 I3 GND 3FF
CA A GND 4FF
CB B GND 4FF
CC C GND 2FF
CCN CN GND 4FF
CCOUT COUT GND 2FF
.ENDS
```
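In the spirit of simulating logic before building it, the C sketch below reads the CARRY netlist at switch level: each NMOS is treated as an ideal switch that conducts when its gate is 1, and each PMOS when its gate is 0. That idealization ignores the sizing, capacitance, and timing that SPICE models. It checks that the pull-up and pull-down networks on node CN are complementary and that COUT is the majority (carry) function of A, B, and C. Function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Pull-down from CN to GND: (MN1 || MN2) in series with MN3, plus MN4 and MN5 in series. */
bool cn_pulldown(bool a, bool b, bool c) {
    return ((a || b) && c) || (b && a);
}

/* Pull-up from VDD to CN: (MP1 || MP2) in series with MP3, plus MP4 and MP5 in series. */
bool cn_pullup(bool a, bool b, bool c) {
    return ((!a || !b) && !c) || (!b && !a);
}

/* Evaluate the gate: CN is driven by exactly one network; MN6/MP6 invert CN to COUT. */
bool carry_out(bool a, bool b, bool c) {
    bool up = cn_pullup(a, b, c);
    bool down = cn_pulldown(a, b, c);
    assert(up != down);   /* complementary CMOS: never both on, never both off */
    bool cn = up;         /* CN is high only when pulled up */
    return !cn;           /* output inverter */
}
```

For example, `carry_out(1, 1, 0)` returns true and `carry_out(1, 0, 0)` returns false, matching the carry output of a full adder.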


# **Physical Design**

- Floorplan
- Standard cells
  - Place & route
- Datapaths
  - Slice planning
  - Area estimation

## **MIPS Floorplan**





## **Standard Cells**

- Uniform cell height
- Uniform well height
- M1 $V_{DD}$ and GND rails
- M2 access to I/Os
- Well / substrate taps
- Exploits regularity






# **Pitch Matching**

- Synthesized controller area is mostly wires
  - Design is smaller if wires run through/over cells
  - Smaller = faster, lower power as well!
- Design snap-together cells for datapaths and arrays
  - Plan wires into cells
  - Connect by abutment
    - Exploits locality
    - Takes lots of effort

| A | A | A | A | B |
|---|---|---|---|---|
| A | A | A | A | B |
| A | A | A | A | B |
| A | A | A | A | B |
| C | C | C | C | D |



## **Slice Plans**

#### Slice plan for bitslice

- Cell ordering, dimensions, wiring tracks
- Arrange cells for wiring locality



## Area Estimation

Need area estimates to make floorplan

- Compare to another block you already designed
- Or estimate from transistor counts
- Budget room for large wiring tracks
- Your mileage may vary; derate by 2x for class.

| Element                              | Area                                                                    |
|--------------------------------------|-------------------------------------------------------------------------|
| random logic (2-level metal process) | 1000–1500 $\lambda^2$ / transistor                                      |
| datapath                             | 250–750 $\lambda^2$ / transistor, or $6WL + 360$ $\lambda^2$ / transistor |
| SRAM                                 | 1000 $\lambda^2$ / bit                                                  |
| DRAM (in a DRAM process)             | 100 $\lambda^2$ / bit                                                   |
| ROM                                  | 100 $\lambda^2$ / bit                                                   |
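A minimal sketch of how the table's rule-of-thumb densities turn into a floorplan budget. The helper names and the example block sizes are hypothetical, and $W$ and $L$ in the datapath formula are assumed to be in units of $\lambda$.

```c
#include <assert.h>

/* Area range in lambda^2 (hypothetical helper type). */
typedef struct { long long lo, hi; } AreaRange;

/* Random logic: 1000-1500 lambda^2 per transistor (2-level metal process). */
AreaRange random_logic_area(long long transistors) {
    AreaRange r = { 1000 * transistors, 1500 * transistors };
    return r;
}

/* Datapath, per transistor: 6WL + 360 lambda^2, with W and L assumed in lambda. */
long long datapath_transistor_area(long long w, long long l) {
    return 6 * w * l + 360;
}

/* SRAM: about 1000 lambda^2 per bit. */
long long sram_area(long long bits) {
    return 1000 * bits;
}

/* Add an SRAM to a logic estimate and apply the 2x class derating. */
AreaRange derated_total(AreaRange logic, long long sram) {
    AreaRange t = { 2 * (logic.lo + sram), 2 * (logic.hi + sram) };
    assert(t.lo <= t.hi);
    return t;
}
```

For a hypothetical block with 10,000 random-logic transistors and a 2,048-bit SRAM, `derated_total(random_logic_area(10000), sram_area(2048))` gives 24,096,000 to 34,096,000 $\lambda^2$. A datapath transistor with $W = 4$ and $L = 2$ comes to 408 $\lambda^2$, inside the table's 250–750 range.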


# **Design Verification**

- Fabrication is slow & expensive
  - MOSIS 0.6µm: \$1000, 3 months
  - 65 nm: \$3M, 1 month
- Debugging chips is very hard
  - Limited visibility into operation
- Prove design is right before building!
  - Logic simulation
  - Ckt. simulation / formal verification
  - Layout vs. schematic comparison
  - Design & electrical rule checks

Verification is > 50% of effort on most chips!





# **Fabrication & Packaging**

- Tapeout final layout
- Fabrication
  - 6, 8, 12" wafers
  - Optimized for throughput, not latency (10 weeks!)
  - Cut into individual dice
- Packaging
  - Bond gold wires from die I/O pads to package




# Testing

- Test that chip operates
  - Design errors
  - Manufacturing errors
- A single dust particle or wafer defect kills a die
  - Yields from 90% to < 10%
  - Depends on die size, maturity of process
  - Test each part before shipping to customer
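To see why yield depends so strongly on die size, one common first-order approximation (not given on these slides) is the Poisson yield model, where $A$ is die area and $D$ is the density of killer defects. With a hypothetical $D = 1$ defect/cm²:

$$
Y = e^{-AD}, \qquad
A = 0.1\ \text{cm}^2 \Rightarrow Y = e^{-0.1} \approx 0.90, \qquad
A = 2.5\ \text{cm}^2 \Rightarrow Y = e^{-2.5} \approx 0.08
$$

A 25x larger die on the same process drops from roughly 90% to under 10% good parts, spanning the range quoted above.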



## **MIPS R3000 Processor**

- 32-bit 2<sup>nd</sup> generation commercial processor (1988)
- Led by John Hennessy (Stanford, MIPS Founder)
- 32-64 KB Caches
- 1.2 μm process
- 111K Transistors
- Up to 12-40 MHz
- 66 mm<sup>2</sup> die
- 145 I/O Pins
- V<sub>DD</sub> = 5 V
- 4 Watts
- SGI Workstations



http://gecko54000.free.fr/?documentations=1988\_MIPS\_R3000
