

#### Week\_1: Introduction

NEIL H. E. WESTE DAVID MONEY HARRIS

#### Introduction

- Integrated circuits: many transistors on one chip.
- Very Large Scale Integration (VLSI): bucketloads!
- Complementary Metal Oxide Semiconductor (CMOS)
  - Fast, cheap, low power transistors
- Today: How to build your own simple CMOS chip
  - CMOS transistors
  - Building logic gates from transistors
  - Transistor layout and fabrication
- Rest of the course: How to build a good CMOS chip

### **Silicon Lattice**

- Transistors are built on a silicon substrate
- Silicon is a Group IV material
- Forms crystal lattice with bonds to four neighbors



**0: Introduction** 

CMOS VLSI Design 4th Ed.

#### Dopants

- Silicon is a semiconductor
- Pure silicon has no free carriers and conducts poorly
- Adding dopants increases the conductivity
  - Group V: extra electron (n-type)
  - Group III: missing electron, called hole (p-type)




#### **p-n Junctions**

- A junction between p-type and n-type semiconductor forms a diode.
- Current flows only in one direction

(Figure: diode with anode and cathode labeled.)



# nMOS Transistor

- Four terminals: gate, source, drain, body
- Gate–oxide–body stack looks like a capacitor
  - Gate and body are conductors
  - SiO<sub>2</sub> (oxide) is a very good insulator
  - Called metal oxide semiconductor (MOS) capacitor
  - Even though gate is no longer made of metal\*



\* Metal gates are returning today!

# **nMOS** Operation

- Body is usually tied to ground (0 V)
- When the gate is at a low voltage:
  - P-type body is at low voltage
  - Source-body and drain-body diodes are OFF
  - No current flows, transistor is OFF




## nMOS Operation Cont.

- When the gate is at a high voltage:
  - Positive charge on gate of MOS capacitor
  - Negative charge attracted to body
  - Inverts a channel under gate to n-type
  - Now current can flow through n-type silicon from source through channel to drain, transistor is ON




# pMOS Transistor

- Similar, but doping and voltages reversed
  - Body tied to high voltage (V<sub>DD</sub>)
  - Gate low: transistor ON
  - Gate high: transistor OFF
  - Bubble indicates inverted behavior




# **Power Supply Voltage**

- GND = 0 V
- In 1980s, V<sub>DD</sub> = 5 V
- V<sub>DD</sub> has decreased in modern processes
  - High V<sub>DD</sub> would damage modern tiny transistors
  - Lower V<sub>DD</sub> saves power

## **Transistors as Switches**

- We can view MOS transistors as electrically controlled switches
- Voltage at gate controls path from source to drain






#### **CMOS NAND Gate**










#### **CMOS NOR Gate**



# **3-input NAND Gate**

- Y pulls low if ALL inputs are 1
- Y pulls high if ANY input is 0



# **CMOS Fabrication**

- CMOS transistors are fabricated on silicon wafer
- Lithography process similar to printing press
- On each step, different materials are deposited or etched
- Easiest to understand by viewing both top and cross-section of wafer in a simplified manufacturing process



# Well and Substrate Taps

- Substrate must be tied to GND and n-well to V<sub>DD</sub>
- Metal to lightly-doped semiconductor forms a poor connection called a Schottky diode
- Use heavily doped well and substrate contacts / taps





#### **Detailed Mask Views**

- Six masks
  - n-well
  - Polysilicon
  - n+ diffusion
  - p+ diffusion
  - Contact
  - Metal






#### Week\_2: Introduction

#### Fabrication

- Chips are built in huge factories called fabs
- Contain clean rooms as large as football fields



Courtesy of International Business Machines Corporation. Unauthorized use not permitted.



# **Fabrication Steps**

- Start with blank wafer
- Build inverter from the bottom up
- First step will be to form the n-well
  - Cover wafer with protective layer of SiO<sub>2</sub> (oxide)
  - Remove layer where n-well should be built
  - Implant or diffuse n dopants into exposed wafer
  - Strip off SiO<sub>2</sub>

(Figure: cross-section of the blank wafer, labeled p substrate.)




#### Photoresist

- Spin on photoresist
  - Photoresist is a light-sensitive organic polymer
  - Softens where exposed to light



# Lithography

- Expose photoresist through n-well mask
- Strip off exposed photoresist



#### **Etch**

- Etch oxide with hydrofluoric acid (HF)
  - Seeps through skin and eats bone; nasty stuff!
- Only attacks oxide where resist has been exposed

# **Strip Photoresist**

- Strip off remaining photoresist
  - Use mixture of acids called piranha etch
- Necessary so resist doesn't melt in next step



## n-well

- n-well is formed with diffusion or ion implantation
- Diffusion
  - Place wafer in furnace with arsenic gas
  - Heat until As atoms diffuse into exposed Si
- Ion Implantation
  - Blast wafer with beam of As ions
  - Ions blocked by SiO<sub>2</sub>, only enter exposed Si



# **Strip Oxide**

- Strip off the remaining oxide using HF
- Back to bare wafer with n-well
- Subsequent steps involve similar series of steps

(Figure: cross-section of the bare wafer with the n well formed in the p substrate.)

# Polysilicon

- Deposit very thin layer of gate oxide
  - < 20 Å (6-7 atomic layers)
- Chemical Vapor Deposition (CVD) of silicon layer
  - Place wafer in furnace with Silane gas  $(SiH_4)$
  - Forms many small crystals called polysilicon
  - Heavily doped to be good conductor





# **Self-Aligned Process**

- Use oxide and masking to expose where n+ dopants should be diffused or implanted
- N-diffusion forms nMOS source, drain, and n-well contact



#### **N-diffusion**

- Pattern oxide and form n+ regions
- Self-aligned process where gate blocks diffusion
- Polysilicon is better than metal for self-aligned gates because it doesn't melt during later processing



## N-diffusion cont.

- Historically dopants were diffused
- Usually ion implantation today
- But regions are still called diffusion






#### Contacts

- Now we need to wire together the devices
- Cover chip with thick field oxide
- Etch oxide where contact cuts are needed





## Layout

- Chips are specified with set of masks
- Minimum dimensions of masks determine transistor size (and hence speed, cost, and power)
- Feature size *f* = distance between source and drain
  - Set by minimum width of polysilicon
- Feature size improves 30% every 3 years or so
- Normalize for feature size when describing design rules
- Express rules in terms of $\lambda = f/2$
  - E.g. $\lambda$ = 0.3 $\mu m$ in 0.6 $\mu m$ process

# **Simplified Design Rules**

#### Conservative rules to get you started




## **Inverter Layout**

- Transistor dimensions specified as Width / Length
  - Minimum size is 4 $\lambda$ / 2 $\lambda$, sometimes called 1 unit
  - In *f* = 0.6 $\mu m$ process, this is 1.2 $\mu m$ wide, 0.6 $\mu m$ long
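As a quick check of the $\lambda$ arithmetic, the sketch below converts $\lambda$-based dimensions to micrometers using $\lambda = f/2$. The function name `lambda_to_um` is hypothetical and chosen only for illustration.

```python
def lambda_to_um(n_lambda: float, feature_size_um: float) -> float:
    """Convert a dimension given in units of lambda to micrometers.

    Uses the convention lambda = f / 2, where f is the feature size
    (minimum polysilicon width).
    """
    lam = feature_size_um / 2.0
    return n_lambda * lam


# Minimum-size transistor (4 lambda wide, 2 lambda long) in a 0.6 um process
width_um = lambda_to_um(4, 0.6)
length_um = lambda_to_um(2, 0.6)
print(f"W = {width_um:.1f} um, L = {length_um:.1f} um")
```

The same helper can be reused for any rule expressed in $\lambda$, which is the point of normalizing design rules to the feature size.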




### Summary

- MOS transistors are stacks of gate, oxide, silicon
- Act as electrically controlled switches
- Build logic gates out of switches
- Draw masks to specify layout of transistors
- Now you know everything necessary to start designing schematics and layout for a simple chip!



## Lecture\_3: Circuits & Layout


#### Outline

- A Brief History
- CMOS Gate Design
- Pass Transistors
- CMOS Latches & Flip-Flops
- Standard Cell Layouts
- Stick Diagrams

## **A Brief History**

- 1958: First integrated circuit
  - Flip-flop using two transistors
  - Built by Jack Kilby at Texas Instruments
- 2010
  - Intel Core i7 µprocessor
    - 2.3 billion transistors
  - 64 Gb Flash memory
    - > 16 billion transistors



**Courtesy Texas Instruments** 



1: Circuits & Layout

#### **Growth Rate**

- 53% compound annual growth rate over 50 years
  - No other technology has grown so fast so long
- Driven by miniaturization of transistors
  - Smaller is cheaper, faster, lower in power!
  - Revolutionary effects on society



Electronics Magazine




## Invention of the Transistor

- Vacuum tubes ruled in first half of 20<sup>th</sup> century
  - Large, expensive, power-hungry, unreliable
- 1947: first point contact transistor
  - John Bardeen and Walter Brattain at Bell Labs
  - See *Crystal Fire* by Riordan, Hoddeson




AT&T Archives. Reprinted with permission.

## **Transistor Types**

- Bipolar transistors
  - npn or pnp silicon structure
  - Small current into very thin base layer controls large currents between emitter and collector
  - Base currents limit integration density
- Metal Oxide Semiconductor Field Effect Transistors
  - nMOS and pMOS MOSFETs
  - Voltage applied to insulated gate controls current between source and drain
  - Low power allows very high integration

## **MOS Integrated Circuits**

- 1970s processes usually had only nMOS transistors
  - Inexpensive, but consume power while idle
  - Examples pictured: Intel 1101 256-bit SRAM, Intel 4004 4-bit μProc
- 1980s-present: CMOS processes for low idle power


#### Moore's Law: Then

- 1965: Gordon Moore plotted the number of transistors on each chip
  - Fit straight line on semilog scale
  - Transistor counts have doubled every 26 months



#### **Integration Levels**

- SSI: 10 gates
- **MSI**: 1000 gates
- **LSI**: 10,000 gates
- VLSI: > 10k gates


#### And Now...








## **Complementary CMOS**

- Complementary CMOS logic gates
  - nMOS pull-down network
  - pMOS pull-up network
  - a.k.a. static CMOS



|               | Pull-up OFF | Pull-up ON  |
|---------------|-------------|-------------|
| Pull-down OFF | Z (float)   | 1           |
| Pull-down ON  | 0           | X (crowbar) |

#### **Series and Parallel**

- nMOS: 1 = ON
- pMOS: 0 = ON
- *Series*: both must be ON
- *Parallel*: either can be ON



## **Conduction Complement**

- Complementary CMOS gates always produce 0 or 1
- Ex: NAND gate
  - Series nMOS: Y=0 when both inputs are 1
  - Thus Y=1 when either input is 0
  - Requires parallel pMOS
- Rule of Conduction Complements
  - Pull-up network is a complement of pull-down
  - Parallel -> series, series -> parallel
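The pull-up / pull-down table and the conduction-complement rule can be checked with a small switch-level sketch. This is only an illustrative model (ideal switches, no timing); the function names are hypothetical.

```python
def network_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Gate output given the state of its pull-up and pull-down networks."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: both networks conduct and fight
    if pull_up_on:
        return "1"
    if pull_down_on:
        return "0"
    return "Z"      # neither network conducts: output floats


def nand2(a: int, b: int) -> str:
    # nMOS conducts on 1; the two nMOS are in series, so both must be ON.
    pull_down_on = (a == 1) and (b == 1)
    # pMOS conducts on 0; the two pMOS are in parallel, so either may be ON.
    pull_up_on = (a == 0) or (b == 0)
    return network_output(pull_up_on, pull_down_on)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand2(a, b))
```

Because the parallel pMOS network is the complement of the series nMOS network, exactly one network conducts for every input combination, so the output is never Z or X.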

### **Compound Gates**

- Compound gates can do any inverting function
- Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)






# Signal Strength

- *Strength* of signal
  - How close it approximates ideal voltage source
- V<sub>DD</sub> and GND rails are strongest 1 and 0
- nMOS pass strong 0
  - But degraded or weak 1
- pMOS pass strong 1
  - But degraded or weak 0
- Thus, nMOS are best for pull-down network

#### **Pass Transistors**

- Transistors can be used as switches




#### **Transmission Gates**

- Pass transistors produce degraded outputs
- *Transmission gates* pass both 0 and 1 well






#### **Tristates**

- *Tristate buffer* produces Z when not enabled

| EN | A | Y |
|----|---|---|
| 0  | 0 | Z |
| 0  | 1 | Z |
| 1  | 0 | 0 |
| 1  | 1 | 1 |

## **Nonrestoring Tristate**

- Transmission gate acts as tristate buffer
  - Only two transistors
  - But nonrestoring
    - Noise on A is passed on to Y




#### **Tristate Inverter**

- Tristate inverter produces restored output
  - Violates conduction complement rule
  - Because we want a Z output




### Multiplexers

- 2:1 multiplexer chooses between two inputs

| S | D1 | D0 | Y |
|---|----|----|---|
| 0 | X  | 0  | 0 |
| 0 | X  | 1  | 1 |
| 1 | 0  | X  | 0 |
| 1 | 1  | X  | 1 |




## **Gate-Level Mux Design**

- $Y = SD_1 + \overline{S}D_0$ (too many transistors)
- How many transistors are needed?



### **Transmission Gate Mux**

- Nonrestoring mux uses two transmission gates
  - Only 4 transistors



## **Inverting Mux**

- Inverting multiplexer
  - Use compound AOI22
  - Or pair of tristate inverters
  - Essentially the same thing
- Noninverting multiplexer adds an inverter




## 4:1 Multiplexer

- 4:1 mux chooses one of 4 inputs using two selects
  - Two levels of 2:1 muxes
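The two-level construction can be sketched behaviorally as follows. This is a logic-level illustration only (it says nothing about transistor count or restoring behavior), and the function names `mux2` and `mux4` are hypothetical.

```python
def mux2(s: int, d1: int, d0: int) -> int:
    """2:1 multiplexer: Y = S*D1 + (not S)*D0."""
    return (s & d1) | ((1 - s) & d0)


def mux4(s1: int, s0: int, d3: int, d2: int, d1: int, d0: int) -> int:
    """4:1 multiplexer built from two levels of 2:1 muxes."""
    low = mux2(s0, d1, d0)      # first level: choose within the pair (D1, D0)
    high = mux2(s0, d3, d2)     # first level: choose within the pair (D3, D2)
    return mux2(s1, high, low)  # second level: choose between the two pairs


# Select input D2 (S1 S0 = 1 0) from the data word D3..D0 = 0 1 0 0
print(mux4(1, 0, 0, 1, 0, 0))
```

The low select bit S0 drives both first-level muxes, and the high select bit S1 drives the second level.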








## Lecture\_4 Circuits & Layout

#### **D** Latch

- When CLK = 1, latch is *transparent*
  - D flows through to Q like a buffer
- When CLK = 0, the latch is *opaque*
  - Q holds its old value independent of D
- a.k.a. transparent latch or level-sensitive latch



#### **D** Latch Design

- Multiplexer chooses D or old Q




#### **D** Latch Operation



### **D** Flip-flop

- When CLK rises, D is copied to Q
- At all other times, Q holds its value
- a.k.a. positive edge-triggered flip-flop, master-slave flip-flop
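A behavioral sketch of the latch and flip-flop descriptions above is given below. It assumes the usual master-slave structure (a latch that is transparent while CLK = 0 followed by one that is transparent while CLK = 1); the class names are hypothetical, and timing effects such as setup and hold are ignored.

```python
class DLatch:
    """Level-sensitive latch: transparent when clk = 1, opaque when clk = 0."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, clk: int, d: int) -> int:
        if clk == 1:
            self.q = d  # transparent: D flows through to Q
        return self.q   # opaque: Q holds its old value


class DFlipFlop:
    """Positive edge-triggered flip-flop modeled as two latches (master-slave)."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, clk: int, d: int) -> int:
        # Master is transparent while clk = 0; slave is transparent while clk = 1.
        m = self.master.update(1 - clk, d)
        return self.slave.update(clk, m)


ff = DFlipFlop()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (clk, d) pairs
print([ff.update(clk, d) for clk, d in stimulus])
```

In the stimulus above, D changes while CLK is high on the third step; the flip-flop ignores it and only picks up the new value at the next rising edge.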





#### **D** Flip-flop Operation








#### **Race Condition**

- Back-to-back flops can malfunction from clock skew
  - Second flip-flop fires late
  - Sees first flip-flop change and captures its result
  - Called hold-time failure or race condition



## Nonoverlapping Clocks

- Nonoverlapping clocks can prevent races
  - As long as nonoverlap exceeds clock skew
- We will use them in this class for safe design
  - Industry manages skew more carefully instead




#### Gate Layout

- Layout can be very time consuming
  - Design gates to fit together nicely
  - Build a library of standard cells
- Standard cell design methodology
  - V<sub>DD</sub> and GND should abut (standard height)
  - Adjacent gates should satisfy design rules
  - nMOS at bottom and pMOS at top
  - All gates include well and substrate contacts

#### **Example: Inverter**




#### **Example: NAND3**

- Horizontal n-diffusion and p-diffusion strips
- Vertical polysilicon gates
- Metal1 V<sub>DD</sub> rail at top
- Metal1 GND rail at bottom
- 32 $\lambda$ by 40 $\lambda$




#### **Stick Diagrams**

- Stick diagrams help plan layout quickly
  - Need not be to scale
  - Draw with color pencils or dry-erase markers



## Wiring Tracks

- A *wiring track* is the space required for a wire
  - 4 λ width, 4 λ spacing from neighbor = 8 λ pitch
- Transistors also consume one wiring track



## Well spacing

- Wells must surround transistors by 6 $\lambda$
  - Implies 12 $\lambda$ between opposite transistor flavors
  - Leaves room for one wire track




#### **Area Estimation**

- Estimate area by counting wiring tracks
  - Multiply by 8 to express in $\lambda$
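The track-counting estimate is simple enough to sketch in a few lines. The track counts used below (4 vertical by 5 horizontal) are an assumption chosen to match the 32 $\lambda$ by 40 $\lambda$ NAND3 quoted earlier; the function name is hypothetical.

```python
TRACK_PITCH_LAMBDA = 8  # 4 lambda wire width + 4 lambda spacing


def cell_size_lambda(vertical_tracks: int, horizontal_tracks: int) -> tuple:
    """Estimate cell (width, height) in lambda from wiring-track counts."""
    width = vertical_tracks * TRACK_PITCH_LAMBDA
    height = horizontal_tracks * TRACK_PITCH_LAMBDA
    return width, height


# Assumed track counts for a NAND3 cell: 4 vertical, 5 horizontal
width, height = cell_size_lambda(4, 5)
print(f"{width} lambda by {height} lambda, area = {width * height} lambda^2")
```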




#### **Euler's Path**

#### Standard Cell Layout Methodology – 1990s



#### **Euler's Path**

#### Two Versions of C • (A + B)




#### **Euler's Path**

#### **Stick Diagrams**





#### OAI22 Logic Graph





#### Example: x = ab+cd





(a) Logic graphs for (ab+cd)

(b) Euler Paths {a b c d}



(c) stick diagram for ordering {a b c d}

© Digital Integrated Circuits<sup>2nd</sup>

24 Combinational Circuits



#### Lecture\_5: Logical Effort


#### Outline

- Logical Effort
- Delay in a Logic Gate
- Multistage Logic Networks
- Choosing the Best Number of Stages
- Example
- Summary

#### Introduction

- Chip designers face a bewildering array of choices
  - What is the best circuit topology for a function?
  - How many stages of logic give least delay?
  - How wide should the transistors be?
- Logical effort is a method to make these decisions
  - Uses a simple model of delay
  - Allows back-of-the-envelope calculations
  - Helps make rapid comparisons between alternatives
  - Emphasizes remarkable symmetries



#### Example

- Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file.
- Decoder specifications:
  - 16 word register file
  - Each word is 32 bits wide
  - Each bit presents load of 3 unit-sized transistors
  - True and complementary address inputs A[3:0]
  - Each input may drive 10 unit-sized transistors
- Ben needs to decide:
  - How many stages to use?
  - How large should each gate be?
  - How fast can decoder operate?

(Figure: A[3:0] drives a 4:16 decoder whose 16 outputs select words in a 16-word, 32-bit register file.)

# Alternative Logic Structures



## Delay in a Logic Gate

- Express delays in process-independent unit
- Delay has two components: *d* = *f* + *p*
  - *f*: *effort delay* = *gh* (a.k.a. stage effort)
    - Again, has two components
  - *g*: *logical effort*
    - Measures relative ability of gate to deliver current
    - $g \equiv 1$ for inverter
  - *h*: *electrical effort* = $C_{out} / C_{in}$
    - Ratio of output to input capacitance
    - Sometimes called fanout
  - *p*: *parasitic delay*
    - Represents delay of gate driving no load
    - Set by internal parasitic capacitance
    - Roughly equal to the fan-in for NAND and NOR gates, in units of $p_{inv}$

$$d = \frac{d_{abs}}{\tau}, \qquad \tau = 3RC$$

- $\tau \approx$ 3 ps in 65 nm process
- $\tau \approx$ 60 ps in 0.6 $\mu$m process

$$\frac{C_{gatenorm}}{C} = \frac{C_{gate}}{3}$$

#### **Delay Plots**

$$d = f + p = gh + p$$



6: Logical Effort


## **Computing Logical Effort**

- DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current.
- Measure from delay vs. fanout plots
- Or estimate by counting transistor widths




#### **Catalog of Gates**

- Logical effort of common gates

| Gate type      | 1 input | 2 inputs | 3 inputs | 4 inputs     | n inputs |
|----------------|---------|----------|----------|--------------|----------|
| Inverter       | 1       |          |          |              |          |
| NAND           |         | 4/3      | 5/3      | 6/3          | (n+2)/3  |
| NOR            |         | 5/3      | 7/3      | 9/3          | (2n+1)/3 |
| Tristate / mux | 2       | 2        | 2        | 2            | 2        |
| XOR, XNOR      |         | 4, 4     | 6, 12, 6 | 8, 16, 16, 8 |          |


## **Catalog of Gates**

- Parasitic delay of common gates
  - In multiples of $p_{inv}$ ($\approx$ 1)

| Gate type      | 1 input | 2 inputs | 3 inputs | 4 inputs | n inputs |
|----------------|---------|----------|----------|----------|----------|
| Inverter       | 1       |          |          |          |          |
| NAND           |         | 2        | 3        | 4        | n        |
| NOR            |         | 2        | 3        | 4        | n        |
| Tristate / mux | 2       | 4        | 6        | 8        | 2n       |
| XOR, XNOR      |         | 4        | 6        | 8        |          |

# **Example: Ring Oscillator**

- Estimate the frequency of an N-stage ring oscillator
  - Logical Effort: g =
  - Electrical Effort: h =
  - Parasitic Delay: p =
  - Stage Delay: d =
  - Frequency: $f_{osc}$ =
- 31 stage ring oscillator in 0.6 $\mu$m process has frequency of ~ 200 MHz

$$f_{osc} = \frac{1}{4 N t_{inv}} \text{ Hz}$$

#### **Example: FO4 Inverter**

- Estimate the delay of a fanout-of-4 (FO4) inverter
  - Logical Effort: g =
  - Electrical Effort: h =
  - Parasitic Delay: p =
  - Stage Delay: d =
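One way to work the two exercises above is to plug values into d = gh + p. The sketch below assumes $p_{inv} = 1$, uses the $\tau$ values quoted for the 65 nm and 0.6 $\mu$m processes, and assumes each ring-oscillator inverter drives one identical inverter (h = 1); the function names are hypothetical.

```python
def stage_delay(g: float, h: float, p: float) -> float:
    """Normalized stage delay d = g*h + p, in units of tau."""
    return g * h + p


TAU_PS = {"65 nm": 3.0, "0.6 um": 60.0}  # tau values quoted in the slides

# Fanout-of-4 inverter: g = 1, h = 4, p = p_inv (assumed to be 1)
d_fo4 = stage_delay(g=1.0, h=4.0, p=1.0)
for process, tau in TAU_PS.items():
    print(f"FO4 delay in {process}: {d_fo4 * tau:.0f} ps")


def ring_oscillator_freq_hz(n_stages: int, tau_ps: float, p_inv: float = 1.0) -> float:
    """Estimate oscillation frequency of an N-stage inverter ring."""
    d = stage_delay(g=1.0, h=1.0, p=p_inv)  # d = 2 when p_inv = 1
    # An edge must travel around the ring twice to complete one period.
    period_ps = 2 * n_stages * d * tau_ps
    return 1.0 / (period_ps * 1e-12)


f_osc = ring_oscillator_freq_hz(31, TAU_PS["0.6 um"])
print(f"31-stage ring oscillator: {f_osc / 1e6:.0f} MHz")
```

With $\tau$ = 60 ps this simple model lands at roughly 134 MHz, the same order of magnitude as the ~200 MHz figure quoted above; the result scales inversely with whatever $\tau$ is assumed.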


## Multistage Logic Networks

- Logical effort generalizes to multistage networks
- Path Logical Effort: $G = \prod g_i$
- Path Electrical Effort: $H = \frac{C_{\text{out-path}}}{C_{\text{in-path}}}$
- Path Effort: $F = \prod f_i = \prod g_i h_i$
- Can we write $F = GH$?

### Paths that Branch

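- No! Consider paths that branch:
  - Each stage's load includes off-path capacitance, so the product of the stage electrical efforts exceeds $H$
  - The branching effort $b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}$ accounts for this, giving $F = GBH$ with $B = \prod b_i$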





## **Multistage Delays**

Path Effort Delay

$$D_F = \sum f_i$$

Path Parasitic Delay

$$P = \sum p_i$$

Path Delay

$$D = \sum d_i = D_F + P$$


## **Designing Fast Circuits**

$$D = \sum d_i = D_F + P$$

- Delay is smallest when each stage bears same effort

$$\hat{f} = g_i h_i = F^{\frac{1}{N}}$$

- Thus minimum delay of N stage path is

$$D = N F^{\frac{1}{N}} + P$$

- This is a key result of logical effort
  - Find fastest possible delay
  - Doesn't require calculating gate sizes

#### **Gate Sizes**

- How wide should the gates be for least delay?

$$\hat{f} = gh = g \frac{C_{out}}{C_{in}} \quad \Rightarrow \quad C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$$

- Working backward, apply capacitance transformation to find input capacitance of each gate given the load it drives.
- Check work by verifying input cap spec is met.



#### Lecture\_6: Logical Effort







## **Example: 3-stage path**



- y =
- x =
- Branching enters each stage's electrical effort, e.g. $h = 2y/x$; for the first stage, $\frac{3x}{C_{in}} \cdot \frac{4}{3} = 5$ gives $C_{in} = 8$
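The backward sizing procedure can be sketched in code. The path below is an assumption, since the figure is not reproduced in these notes: a NAND2 driving three copies of a NAND3, each NAND3 driving two copies of a NOR2, a path input capacitance of 8, and an output load of 45. These values are chosen to be consistent with the annotation above ($\hat{f} = 5$, $C_{in} = 8$).

```python
import math

# Assumed path, input to output: (name, logical effort g, branching b, parasitic p)
gates = [
    ("NAND2", 4 / 3, 3, 2),
    ("NAND3", 5 / 3, 2, 3),
    ("NOR2", 5 / 3, 1, 2),
]
c_in_path, c_load = 8.0, 45.0

G = math.prod(g for _, g, _, _ in gates)   # path logical effort
B = math.prod(b for _, _, b, _ in gates)   # path branching effort
H = c_load / c_in_path                     # path electrical effort
F = G * B * H                              # path effort
N = len(gates)
f_hat = F ** (1 / N)                       # best stage effort
P = sum(p for *_, p in gates)              # path parasitic delay
D = N * f_hat + P                          # minimum path delay, in units of tau

# Work backward from the load: C_in = g * (b * C_out_on_path) / f_hat
sizes = {}
c_out = c_load
for name, g, b, _ in reversed(gates):
    c_in = g * b * c_out / f_hat
    sizes[name] = c_in
    c_out = c_in

print(f"F = {F:.1f}, f_hat = {f_hat:.2f}, D = {D:.1f} tau")
for name, c in sizes.items():
    print(f"{name}: C_in = {c:.1f}")
```

The final check is that the computed input capacitance of the first gate matches the path's input capacitance specification.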




**Derivation**

- Consider adding inverters to end of path
  - How many give least delay?
- Logic block: $n_1$ stages with path effort $F$, followed by $N - n_1$ extra inverters

$$D = NF^{\frac{1}{N}} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$$

$$\frac{\partial D}{\partial N} = -F^{\frac{1}{N}} \ln F^{\frac{1}{N}} + F^{\frac{1}{N}} + p_{inv} = 0$$

- Define best stage effort $\rho = F^{\frac{1}{N}}$


## **Best Stage Effort**

- $p_{inv} + \rho (1 - \ln \rho) = 0$ has no closed-form solution
- Neglecting parasitics ($p_{inv} = 0$), we find $\rho = 2.718$ (e)
- For $p_{inv} = 1$, solve numerically for $\rho = 3.59$
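The numerical solution can be reproduced with a few lines of bisection. This is a minimal sketch; the bracket [2, 10] is an assumption that holds for the two $p_{inv}$ values considered here, and the function name is hypothetical.

```python
import math


def best_stage_effort(p_inv: float, lo: float = 2.0, hi: float = 10.0) -> float:
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection.

    The left-hand side decreases monotonically for rho > 1, so a sign
    change between `lo` and `hi` brackets a single root.
    """
    def g(rho: float) -> float:
        return p_inv + rho * (1.0 - math.log(rho))

    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
    return 0.5 * (lo + hi)


print(f"p_inv = 0: rho = {best_stage_effort(0.0):.3f}")
print(f"p_inv = 1: rho = {best_stage_effort(1.0):.2f}")
```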

# Sensitivity Analysis

- How sensitive is delay to using exactly the best number of stages?



## Example, Revisited

- Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file.
- Decoder specifications:
  - 16 word register file
  - Each word is 32 bits wide
  - Each bit presents load of 3 unit-sized transistors
  - True and complementary address inputs A[3:0]
  - Each input may drive 10 unit-sized transistors
- Ben needs to decide:
  - How many stages to use?
  - How large should each gate be?
  - How fast can decoder operate?






## Number of Stages

- Decoder effort is mainly electrical and branching
  - Electrical Effort: H = (32 × 3) / 10 = 9.6
  - Branching Effort: B = 8
- If we neglect logical effort (assume G = 1)
  - Path Effort: F = GBH = 76.8
- Number of Stages: N = $\log_4 F \approx 3.1$
- Try a 3-stage design

#### Gate Sizes & Delay




#### G,H and B Calculations

- G = 1 (INV10) × 6/3 (NAND4) × 1 (INVz) = 6/3 = 2
- H = 3 × 32 / 10 = 9.6
- B: each input is connected to 8 words, because the input variables A[3:0] and their complements are both available.
  - So path branching is (1 + 7) / 1: one ON path and seven OFF paths
  - So B = 8
- Then F = GBH = 2 × 8 × 9.6 = 153.6 ≈ 154



## Comparison

- Compare many alternatives with a spreadsheet
- D = N(76.8 G)<sup>1/N</sup> + P

| Design                      | N | G    | P | D    |
|-----------------------------|---|------|---|------|
| NOR4                        | 1 | 3    | 4 | 234  |
| NAND4-INV                   | 2 | 2    | 5 | 29.8 |
| NAND2-NOR2                  | 2 | 20/9 | 4 | 30.1 |
| INV-NAND4-INV               | 3 | 2    | 6 | 22.1 |
| NAND4-INV-INV-INV           | 4 | 2    | 7 | 21.1 |
| NAND2-NOR2-INV-INV          | 4 | 20/9 | 6 | 20.5 |
| NAND2-INV-NAND2-INV         | 4 | 16/9 | 6 | 19.7 |
| INV-NAND2-INV-NAND2-INV     | 5 | 16/9 | 7 | 20.4 |
| NAND2-INV-NAND2-INV-INV-INV | 6 | 16/9 | 8 | 21.6 |
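The spreadsheet comparison can equally be scripted. The sketch below recomputes D for each row from its N, G, and P using D = N(76.8 G)<sup>1/N</sup> + P; the variable and function names are hypothetical.

```python
BH = 8 * 9.6  # branching effort times electrical effort = 76.8

designs = [
    # (design, N, G, P) taken from the comparison table
    ("NOR4", 1, 3, 4),
    ("NAND4-INV", 2, 2, 5),
    ("NAND2-NOR2", 2, 20 / 9, 4),
    ("INV-NAND4-INV", 3, 2, 6),
    ("NAND4-INV-INV-INV", 4, 2, 7),  # N = 4: three inverters follow the NAND4
    ("NAND2-NOR2-INV-INV", 4, 20 / 9, 6),
    ("NAND2-INV-NAND2-INV", 4, 16 / 9, 6),
    ("INV-NAND2-INV-NAND2-INV", 5, 16 / 9, 7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, 16 / 9, 8),
]


def path_delay(n: int, g: float, p: float) -> float:
    """Minimum path delay D = N * F^(1/N) + P with F = G * B * H."""
    return n * (BH * g) ** (1.0 / n) + p


results = {name: path_delay(n, g, p) for name, n, g, p in designs}
for name, d in results.items():
    print(f"{name:30s} D = {d:6.1f}")

best = min(results, key=results.get)
print("Fastest:", best)
```

Scripting the table makes it easy to try further topologies by appending rows with their stage count, path logical effort, and parasitic delay.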


#### **Review of Definitions**

| Term              | Stage                                                                     | Path                                                      |
|-------------------|---------------------------------------------------------------------------|-----------------------------------------------------------|
| number of stages  | 1                                                                         | N                                                         |
| logical effort    | g                                                                         | $G = \prod g_i$                                           |
| electrical effort | $h = \frac{C_{\text{out}}}{C_{\text{in}}}$                                | $H = \frac{C_{\text{out-path}}}{C_{\text{in-path}}}$      |
| branching effort  | $b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}$ | $B = \prod b_i$                                           |
| effort            | f = gh                                                                    | F = GBH                                                   |
| effort delay      | f                                                                         | $D_F = \sum f_i$                                          |
| parasitic delay   | p                                                                         | $P = \sum p_i$                                            |
| delay             | d = f + p                                                                 | $D = \sum d_i = D_F + P$                                  |

## Method of Logical Effort

- 1) Compute path effort: $F = GBH$
- 2) Estimate best number of stages: $N = \log_4 F$
- 3) Sketch path with N stages
- 4) Estimate least delay: $D = NF^{\frac{1}{N}} + P$
- 5) Determine best stage effort: $\hat{f} = F^{\frac{1}{N}}$
- 6) Find gate sizes: $C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$



## Limits of Logical Effort

- Chicken and egg problem
  - Need path to compute G
  - But don't know number of stages without G
- Simplistic delay model
  - Neglects input rise time effects
- Interconnect
  - Iteration required in designs with wire
- Maximum speed only
  - Not minimum area/power for constrained delay

## Summary

- Logical effort is useful for thinking of delay in circuits
  - Numeric logical effort characterizes gates
  - NANDs are faster than NORs in CMOS
  - Paths are fastest when effort delays are ~4
  - Path delay is weakly sensitive to stages, sizes
  - But using fewer stages doesn't mean faster paths
  - Delay of path is about log<sub>4</sub>F FO4 inverter delays
  - Inverters and NAND2 best for driving large caps
- Provides language for discussing fast circuits
  - But requires practice to master



#### Lecture\_7: Power


#### Outline

- Power and Energy
- Dynamic Power
- Static Power





7: Power

#### **Power in Circuit Elements**

$$P_{VDD}\left(t\right) = I_{DD}\left(t\right)V_{DD}$$



$$P_{R}\left(t\right) = \frac{V_{R}^{2}\left(t\right)}{R} = I_{R}^{2}\left(t\right)R$$

$$E_{C} = \int_{0}^{\infty} I(t)V(t)\,dt = \int_{0}^{\infty} C \frac{dV}{dt}V(t)\,dt = C \int_{0}^{V_{C}} V\,dV = \frac{1}{2}CV_{C}^{2}$$

$$I_{C} = C \frac{dV}{dt}$$


#### **Charging a Capacitor**

- When the gate output rises
  - Energy stored in capacitor is

$$E_C = \frac{1}{2} C_L V_{DD}^2$$

- But energy drawn from the supply is

$$E_{VDD} = \int_{0}^{\infty} I(t) V_{DD} dt = \int_{0}^{\infty} C_L \frac{dV}{dt} V_{DD} dt$$
$$= C_L V_{DD} \int_{0}^{V_{DD}} dV = C_L V_{DD}^2$$



- Half the energy from  $V_{DD}$  is dissipated in the pMOS transistor as heat, other half stored in capacitor
- When the gate output falls
  - Energy in capacitor is dumped to GND
  - Dissipated as heat in the nMOS transistor

#### **Switching Waveforms**

- Example: $V_{DD} = 1.0$ V, $C_{L} = 150$ fF, f = 1 GHz



#### **Switching Power**

$$P_{\text{switching}} = \frac{1}{T} \int_{0}^{T} i_{DD}(t) V_{DD} dt$$
$$= \frac{V_{DD}}{T} \int_{0}^{T} i_{DD}(t) dt$$
$$= \frac{V_{DD}}{T} \left[ T f_{\text{sw}} C V_{DD} \right]$$
$$= C V_{DD}^{2} f_{\text{sw}}$$




## **Activity Factor**

- Suppose the system clock frequency = f
- Let $f_{sw} = \alpha f$, where $\alpha$ = *activity factor*
  - If the signal is a clock, $\alpha$ = 1
  - If the signal switches once per cycle, $\alpha = \frac{1}{2}$

#### Dynamic power:

$$P_{\rm switching} = \alpha C V_{DD}^2 f$$
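As a numeric illustration of the energy and power formulas, the sketch below uses the values from the switching-waveform example ($V_{DD}$ = 1.0 V, $C_L$ = 150 fF, f = 1 GHz) and assumes the node switches at the clock rate ($\alpha$ = 1). The function name is hypothetical.

```python
def switching_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Switching power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_farads * vdd ** 2 * f_hz


C_L = 150e-15   # load capacitance in farads
VDD = 1.0       # supply voltage in volts
FREQ = 1e9      # clock frequency in hertz

e_supply = C_L * VDD ** 2        # energy drawn from VDD per rising transition
e_stored = 0.5 * C_L * VDD ** 2  # half is stored on the capacitor
e_pmos = e_supply - e_stored     # the other half is dissipated in the pMOS

p_switching = switching_power(alpha=1.0, c_farads=C_L, vdd=VDD, f_hz=FREQ)
print(f"Energy from supply per transition: {e_supply * 1e15:.0f} fJ")
print(f"Switching power: {p_switching * 1e6:.0f} uW")
```

For a data signal rather than a clock, the same function applies with a smaller $\alpha$.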

## **Short Circuit Current**

- When transistors switch, both nMOS and pMOS networks may be momentarily ON at once
- Leads to a blip of "short circuit" current.
- < 10% of dynamic power if rise/fall times are comparable for input and output
- We will generally ignore this component

#### **Power Dissipation Sources**

- $P_{total} = P_{dynamic} + P_{static}$
- Dynamic power: $P_{dynamic} = P_{switching} + P_{shortcircuit}$
  - Switching load capacitances
  - Short-circuit current
- Static power: $P_{static} = (I_{sub} + I_{gate} + I_{junct} + I_{contention})V_{DD}$
  - Subthreshold leakage
  - Gate leakage
  - Junction leakage
  - Contention current

## **Dynamic Power Example**

- 1 billion transistor chip
  - 50M logic transistors
    - Average width: 12  $\lambda$
    - Activity factor = 0.1
  - 950M memory transistors
    - Average width: 4  $\lambda$
    - Activity factor = 0.02
  - 1.0 V 65 nm process, $L_{eff}$ = 50 nm
  - C = 1 fF/µm (gate) + 0.8 fF/µm (diffusion)
- Estimate dynamic power consumption @ 1 GHz. Neglect wire capacitance and short-circuit current.

#### Solution

- Feature size $f = 50$ nm, so $\lambda = 25$ nm $= 0.025\ \mu m$

$$C_{logic} = (50 \times 10^6)(12 \times 0.025\ \mu m)(1.0 + 0.8) \left(\frac{fF}{\mu m}\right) = 27\ nF$$

$$C_{mem} = (950 \times 10^6)(4 \times 0.025\ \mu m)(1.0 + 0.8) \left(\frac{fF}{\mu m}\right) = 171\ nF$$

$$P_{dynamic} = \left[0.1C_{logic} + 0.02C_{mem}\right](1.0\ V)^2(1.0\ GHz) = 6.1\ W$$
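The same estimate can be reproduced in a few lines, which makes the unit handling explicit (capacitance per unit width is in fF/$\mu$m). This is a sketch of the arithmetic only; the constant and function names are hypothetical.

```python
LAMBDA_UM = 0.025                 # lambda = 25 nm, expressed in micrometers
C_PER_UM = (1.0 + 0.8) * 1e-15    # gate + diffusion capacitance, farads per um of width
VDD = 1.0                         # volts
FREQ = 1e9                        # hertz


def block_capacitance(n_transistors: float, avg_width_lambda: float) -> float:
    """Total switched capacitance of a block, in farads."""
    total_width_um = n_transistors * avg_width_lambda * LAMBDA_UM
    return total_width_um * C_PER_UM


c_logic = block_capacitance(50e6, 12)   # 50M logic transistors, 12 lambda wide
c_mem = block_capacitance(950e6, 4)     # 950M memory transistors, 4 lambda wide

# Weight each block by its activity factor (0.1 for logic, 0.02 for memory)
p_dynamic = (0.1 * c_logic + 0.02 * c_mem) * VDD ** 2 * FREQ

print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_mem * 1e9:.0f} nF")
print(f"P_dynamic = {p_dynamic:.1f} W")
```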

#### **Dynamic Power Reduction**

$$P_{\text{switching}} = \alpha C V_{DD}^{2} f$$

- Try to minimize:
  - Activity factor
  - Capacitance
  - Supply voltage
  - Frequency

#### **Activity Factor Estimation**

- Let $P_i$ = Prob(node i = 1)
  - $\overline{P_i} = 1 - P_i$ = Prob(node i = 0)
- $\alpha_i = \overline{P_i} \times P_i$
- Completely random data has P = 0.5 and $\alpha$ = 0.25
- Data is often not completely random
  - Structured data, e.g. upper bits of 64-bit unsigned integer representing bank account balances are usually 0
- Data propagating through ANDs and ORs has lower activity factor
  - Depends on design, but typically $\alpha \approx 0.1$

#### **Switching Probability**

| Gate  | $P_Y$                                       |
|-------|---------------------------------------------|
| AND2  | $P_A P_B$                                   |
| AND3  | $P_A P_B P_C$                               |
| OR2   | $1 - \overline{P}_A \overline{P}_B$         |
| NAND2 | $1 - P_A P_B$                               |
| NOR2  | $\overline{P}_A \overline{P}_B$             |
| XOR2  | $P_A \overline{P}_B + \overline{P}_A P_B$   |
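The table entries combine with $\alpha = \overline{P} \times P$ to estimate how activity changes as data passes through a gate. The sketch below assumes the gate inputs are statistically independent, which is what the product formulas in the table imply; the helper names are illustrative.

```python
from math import prod


def p_and(*p_in):
    """Probability that an AND gate output is 1, given independent input probabilities."""
    return prod(p_in)


def p_or(*p_in):
    """Probability that an OR gate output is 1."""
    return 1 - prod(1 - p for p in p_in)


def p_nand(*p_in):
    return 1 - prod(p_in)


def p_nor(*p_in):
    return prod(1 - p for p in p_in)


def p_xor2(p_a, p_b):
    return p_a * (1 - p_b) + (1 - p_a) * p_b


def activity(p):
    """Activity factor alpha = P * (1 - P) for a node with one-probability P."""
    return p * (1 - p)


# Random inputs (P = 0.5) feeding each gate type.
for name, p_y in [("AND2", p_and(0.5, 0.5)), ("OR2", p_or(0.5, 0.5)),
                  ("NAND2", p_nand(0.5, 0.5)), ("NOR2", p_nor(0.5, 0.5)),
                  ("XOR2", p_xor2(0.5, 0.5))]:
    print(f"{name}: P_Y = {p_y:.4f}, alpha = {activity(p_y):.4f}")
```

With random inputs, the AND and OR outputs have a lower activity factor than the inputs, while XOR preserves $\alpha = 0.25$.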





#### **Clock Gating**

- The best way to reduce the activity is to turn off the clock to registers in unused blocks
  - Saves clock activity ( $\alpha$  = 1)
  - Eliminates all switching activity in the block
  - Requires determining if block will be used



#### Lecture\_8: Power


#### Capacitance

- Gate capacitance
  - Fewer stages of logic
  - Small gate sizes
- Wire capacitance
  - Good floorplanning to keep communicating blocks close to each other
  - Drive long wires with inverters or buffers rather than complex gates

## Voltage / Frequency

- Run each block at the lowest possible voltage and frequency that meets performance requirements
- Voltage Domains
  - Provide separate supplies to different blocks
  - Level converters required when crossing from low to high  $V_{DD}$  domains



- Dynamic Voltage Scaling
  - Adjust V<sub>DD</sub> and f according to workload



#### **Static Power**

- Static power is consumed even when chip is quiescent.
  - Leakage draws power from nominally OFF devices
  - Ratioed circuits burn power in fight between ON transistors

# **Static Power Example**

- Revisit power estimation for 1 billion transistor chip
- Estimate static power consumption
  - Subthreshold leakage
    - Normal $V_t$: 100 nA/µm
    - High $V_t$: 10 nA/µm
    - High $V_t$ used in all memories and in 95% of logic gates
  - Gate leakage 5 nA/µm
  - Junction leakage negligible

#### Solution

$$W_{\text{normal-V}_{t}} = (50 \times 10^{6})(12\lambda)(0.025\mu\text{m}/\lambda)(0.05) = 0.75 \times 10^{6} \mu\text{m}$$

$$W_{\text{high-V}_{t}} = [(50 \times 10^{6})(12\lambda)(0.95) + (950 \times 10^{6})(4\lambda)](0.025\mu\text{m}/\lambda) = 109.25 \times 10^{6} \mu\text{m}$$

$$I_{sub} = [W_{\text{normal-V}_{t}} \times 100 \text{ nA}/\mu\text{m} + W_{\text{high-V}_{t}} \times 10 \text{ nA}/\mu\text{m}]/2 = 584 \text{ mA}$$

$$I_{gate} = [(W_{\text{normal-V}_{t}} + W_{\text{high-V}_{t}}) \times 5 \text{ nA}/\mu\text{m}]/2 = 275 \text{ mA}$$

$$P_{static} = (584 \text{ mA} + 275 \text{ mA})(1.0 \text{ V}) = 859 \text{ mW}$$
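The static power solution can be checked the same way. In this sketch the factor of 1/2 in the current equations is read as "about half the transistors contribute at any time"; that reading is an assumption about the slide's intent, and the variable names are illustrative.

```python
# Static power estimate for the 1-billion-transistor example.
LAMBDA_UM = 0.025          # lambda in micrometers
VDD = 1.0                  # volts
CONTRIBUTING = 0.5         # assumed fraction of transistors leaking at a given time

w_logic = 50e6 * 12 * LAMBDA_UM      # total logic transistor width, um
w_mem = 950e6 * 4 * LAMBDA_UM        # total memory transistor width, um

w_normal_vt = 0.05 * w_logic                 # 5% of logic uses normal Vt
w_high_vt = 0.95 * w_logic + w_mem           # the rest, plus all memory, uses high Vt

# Leakage per um of width, in amperes.
I_SUB_NORMAL = 100e-9
I_SUB_HIGH = 10e-9
I_GATE = 5e-9

i_sub = (w_normal_vt * I_SUB_NORMAL + w_high_vt * I_SUB_HIGH) * CONTRIBUTING
i_gate = (w_normal_vt + w_high_vt) * I_GATE * CONTRIBUTING
p_static = (i_sub + i_gate) * VDD

print(f"I_sub  = {i_sub * 1e3:.0f} mA")
print(f"I_gate = {i_gate * 1e3:.0f} mA")
print(f"P_stat = {p_static * 1e3:.0f} mW")
```

The small fraction of normal-$V_t$ width matters disproportionately because its leakage per micron is ten times higher.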


#### Subthreshold Leakage

- For $V_{ds} > 50 \text{ mV}$

$$I_{sub} \approx I_{off} 10^{\frac{V_{gs} + \eta(V_{ds} - V_{DD}) - k_{\gamma}V_{sb}}{S}}$$

- $I_{off}$ = leakage at $V_{gs} = 0$, $V_{ds} = V_{DD}$

Typical values in 65 nm:

- $I_{off} = 100 \text{ nA/}\mu\text{m}$ @ $V_t = 0.3 \text{ V}$
- $I_{off} = 10 \text{ nA/}\mu\text{m}$ @ $V_t = 0.4 \text{ V}$
- $I_{off} = 1 \text{ nA/}\mu\text{m}$ @ $V_t = 0.5 \text{ V}$
- $\eta = 0.1$
- $k_{\gamma} = 0.1$
- $S = 100$ mV/decade
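To get a feel for the exponential dependence, the leakage expression can be evaluated directly. This is a small sketch using the typical 65 nm values listed above as defaults; the function name and argument order are illustrative, and the expression is only meant for $V_{ds} > 50$ mV.

```python
def i_sub(i_off, v_gs, v_ds, v_sb, v_dd=1.0, eta=0.1, k_gamma=0.1, s=0.1):
    """Subthreshold leakage, in the same units as i_off.

    Voltages are in volts and s is the subthreshold slope in V/decade.
    """
    exponent = (v_gs + eta * (v_ds - v_dd) - k_gamma * v_sb) / s
    return i_off * 10 ** exponent


I_OFF = 100.0  # nA/um, the Vt = 0.3 V value quoted above

nominal = i_sub(I_OFF, v_gs=0.0, v_ds=1.0, v_sb=0.0)        # definition of I_off
neg_gate = i_sub(I_OFF, v_gs=-0.1, v_ds=1.0, v_sb=0.0)      # gate 100 mV below source
body_bias = i_sub(I_OFF, v_gs=0.0, v_ds=1.0, v_sb=0.5)      # 0.5 V reverse body bias

print(f"nominal:        {nominal:.1f} nA/um")
print(f"Vgs = -0.1 V:   {neg_gate:.1f} nA/um")
print(f"Vsb = +0.5 V:   {body_bias:.1f} nA/um")
```

Each 100 mV of negative $V_{gs}$ buys one decade of leakage reduction, while the body and drain terms are scaled by $k_{\gamma}$ and $\eta$ and therefore act more weakly.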


#### **Stack Effect**

- Series OFF transistors have less leakage
  - $V_x > 0$, so N2 has negative $V_{gs}$

$$I_{sub} = \underbrace{I_{off} 10^{\frac{\eta(V_x - V_{DD})}{S}}}_{N1} = \underbrace{I_{off} 10^{\frac{-V_x + \eta((V_{DD} - V_x) - V_{DD}) - k_\gamma V_x}{S}}}_{N2}$$



$$V_{x} = \frac{\eta V_{DD}}{1 + 2\eta + k_{\gamma}}$$
$$I_{sub} = I_{off} 10^{\frac{-\eta V_{DD} \left(\frac{1 + \eta + k_{\gamma}}{1 + 2\eta + k_{\gamma}}\right)}{S}} \approx I_{off} 10^{\frac{-\eta V_{DD}}{S}}$$

- Leakage through 2-stack reduces ~10x
- Leakage through 3-stack reduces further
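The intermediate voltage and the resulting reduction can be evaluated numerically. The sketch below plugs in the typical 65 nm values from the Subthreshold Leakage slide with $V_{DD} = 1.0$ V (an assumed supply), and checks that the currents through N1 and N2 agree at the computed $V_x$; all names are illustrative.

```python
def i_sub(i_off, v_gs, v_ds, v_sb, v_dd, eta, k_gamma, s):
    """Subthreshold leakage model used in the stack-effect derivation."""
    return i_off * 10 ** ((v_gs + eta * (v_ds - v_dd) - k_gamma * v_sb) / s)


V_DD, ETA, K_GAMMA, S = 1.0, 0.1, 0.1, 0.1   # volts, unitless, unitless, V/decade
I_OFF = 1.0                                   # normalized leakage of a single OFF transistor

# Intermediate node voltage between the two series OFF transistors.
v_x = ETA * V_DD / (1 + 2 * ETA + K_GAMMA)

# N1 (bottom): source and body at ground, drain at Vx.
i_n1 = i_sub(I_OFF, 0.0, v_x, 0.0, V_DD, ETA, K_GAMMA, S)
# N2 (top): gate at 0 V, source at Vx, drain at VDD, body at ground.
i_n2 = i_sub(I_OFF, -v_x, V_DD - v_x, v_x, V_DD, ETA, K_GAMMA, S)

print(f"Vx = {v_x * 1e3:.0f} mV")
print(f"I_N1 = {i_n1:.4f}, I_N2 = {i_n2:.4f} (relative to I_off)")
print(f"reduction = {1 / i_n1:.1f}x")
```

With these values the exact expression gives a little under 10x, which is consistent with the approximate "~10x" quoted for a 2-stack.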


#### Leakage Control

- Leakage and delay trade off
  - Aim for low leakage in sleep and low delay in active mode
- To reduce leakage:
  - Increase $V_t$: multiple $V_t$
    - Use low $V_t$ only in critical circuits
  - Increase $V_s$: *stack effect*
    - Input vector control in sleep
  - Decrease $V_b$
    - Reverse body bias in sleep
    - Or forward body bias in active mode

#### Gate Leakage

- Extremely strong function of t<sub>ox</sub> and V<sub>gs</sub>
  - Negligible for older processes
  - Approaches subthreshold leakage at 65 nm and below in some processes
- An order of magnitude less for pMOS than nMOS
- Control leakage in the process using $t_{ox}$ > 10.5 Å
  - High-k gate dielectrics help
  - Some processes provide multiple t<sub>ox</sub>
    - e.g. thicker oxide for 3.3 V I/O transistors
- Control leakage in circuits by limiting V<sub>DD</sub>

#### NAND3 Leakage Example

- 100 nm process
  - $I_{gn} = 6.3 \text{ nA}$, $I_{gp} = 0$
  - $I_{offn} = 5.63 \text{ nA}$, $I_{offp} = 9.3 \text{ nA}$

| Input State (ABC) | I<sub>sub</sub> (nA) | I<sub>gate</sub> (nA) | I<sub>total</sub> (nA) | V<sub>x</sub>                    | V<sub>z</sub>                    |
|-------------------|----------------------|-----------------------|------------------------|----------------------------------|----------------------------------|
| 000               | 0.4                  | 0                     | 0.4                    | stack effect                     | stack effect                     |
| 001               | 0.7                  | 0                     | 0.7                    | stack effect                     | V<sub>DD</sub> – V<sub>t</sub>   |
| 010               | 0.7                  | 1.3                   | 2.0                    | intermediate                     | intermediate                     |
| 011               | 3.8                  | 0                     | 3.8                    | V<sub>DD</sub> – V<sub>t</sub>   | V<sub>DD</sub> – V<sub>t</sub>   |
| 100               | 0.7                  | 6.3                   | 7.0                    | 0                                | stack effect                     |
| 101               | 3.8                  | 6.3                   | 10.1                   | 0                                | V<sub>DD</sub> – V<sub>t</sub>   |
| 110               | 5.6                  | 12.6                  | 18.2                   | 0                                | 0                                |
| 111               | 28                   | 18.9                  | 46.9                   | 0                                | 0                                |
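The table is what input vector control in sleep (mentioned under Leakage Control) would consult: pick the input pattern with the least total leakage. A minimal sketch, using only the tabulated values in nA:

```python
# (I_sub, I_gate) in nA for each NAND3 input state ABC, copied from the table.
leakage_na = {
    "000": (0.4, 0.0),
    "001": (0.7, 0.0),
    "010": (0.7, 1.3),
    "011": (3.8, 0.0),
    "100": (0.7, 6.3),
    "101": (3.8, 6.3),
    "110": (5.6, 12.6),
    "111": (28.0, 18.9),
}

# Total leakage is the sum of subthreshold and gate components.
totals = {state: i_sub + i_gate for state, (i_sub, i_gate) in leakage_na.items()}

best = min(totals, key=totals.get)     # lowest-leakage sleep vector
worst = max(totals, key=totals.get)    # highest-leakage vector

for state, total in totals.items():
    print(f"ABC = {state}: I_total = {total:.1f} nA")
print(f"best sleep vector: {best}, worst: {worst}, "
      f"ratio = {totals[worst] / totals[best]:.0f}x")
```

For this gate the spread between the best and worst input vector is about two orders of magnitude.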


#### **Junction Leakage**

- From reverse-biased p-n junctions
  - Between diffusion and substrate or well
- Ordinary diode leakage is negligible
- Band-to-band tunneling (BTBT) can be significant
  - Especially in high-$V_t$ transistors where other leakage is small
  - Worst at $V_{db} = V_{DD}$
- Gate-induced drain leakage (GIDL) exacerbates
  - Worst for $V_{gd} = -V_{DD}$ (or more negative)

#### **Power Gating**

- Turn OFF power to blocks when they are idle to save leakage
  - Use virtual  $V_{DD}$  ( $V_{DDV}$ )
  - Gate outputs to prevent invalid logic levels to next block



- Voltage drop across sleep transistor degrades performance during normal operation
  - Size the transistor wide enough to minimize impact
- Switching wide sleep transistor costs dynamic power
  - Only justified when circuit sleeps long enough




#### Lecture\_9: **Combinational Circuit Design**

## Outline

- Bubble Pushing
- Compound Gates
- Logical Effort Example
- Input Ordering
- Asymmetric Gates
- Skewed Gates
- Best P/N ratio

#### Example 1

Endmodule

1) Sketch a design using AND, OR, and NOT gates.



**10: Combinational Circuits** 

## Example 2

2) Sketch a design using NAND, NOR, and NOT gates. Assume ~S is available.



# **Bubble Pushing**

- Start with network of AND / OR gates
- Convert to NAND / NOR + inverters
- Push bubbles around to simplify logic
  - Remember DeMorgan's Law
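A quick exhaustive check illustrates why bubble pushing is safe. The sketch below assumes the running example is a 2:1 multiplexer, $Y = D_0\overline{S} + D_1 S$ (the examples mention a multiplexer and an available ~S, but the signal names `d0`, `d1`, `s` here are hypothetical). It compares the AND/OR form with the NAND/NAND form obtained by pushing bubbles from the OR inputs back onto the AND outputs.

```python
from itertools import product


def nand(a, b):
    """2-input NAND on 0/1 integers."""
    return 1 - (a & b)


def mux_and_or(s, d0, d1):
    """AND/OR/NOT form: Y = D0*~S + D1*S."""
    return (d0 & (1 - s)) | (d1 & s)


def mux_nand_nand(s, d0, d1):
    """After bubble pushing: an OR with inverted inputs is a NAND (DeMorgan)."""
    return nand(nand(d0, 1 - s), nand(d1, s))


# Exhaustively compare the two implementations over all 8 input patterns.
mismatches = [(s, d0, d1)
              for s, d0, d1 in product((0, 1), repeat=3)
              if mux_and_or(s, d0, d1) != mux_nand_nand(s, d0, d1)]
print("equivalent" if not mismatches else f"mismatches: {mismatches}")
```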


## Example 3

3) Sketch a design using one compound gate and one NOT gate. Assume ~S is available.



#### **Compound Gates**

#### Logical Effort of compound gates




## **Example 4**

The multiplexer has a maximum input capacitance of 16 units on each input. It must drive a load of 160 units. Estimate the delay of the two designs.




## Input Order

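Our parasitic delay model was too simple

- Calculate parasitic delay for Y falling
  - If A arrives latest?
  - If B arrives latest?

If A arrives latest, the internal node is already discharged, so only the output capacitance matters (delay normalized to $3RC$):

$$t_{pd} = \frac{6C\left(\frac{R}{2} + \frac{R}{2}\right)}{3RC} = 2$$

If B arrives latest, the internal node must also be discharged:

$$t_{pd} = \frac{2C \cdot \frac{R}{2} + 6C\left(\frac{R}{2} + \frac{R}{2}\right)}{3RC} = \frac{7}{3} \approx 2.33$$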
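These two cases are Elmore delay sums over the pulldown stack. The sketch below reproduces them for a NAND2 whose series nMOS transistors each have resistance $R/2$, with $2C$ on the internal node and $6C$ on the output; the list-based helper is an illustrative construction, not part of the slides.

```python
from fractions import Fraction


def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor between node i-1 (or ground for i = 0) and node i;
    capacitances[i] is the capacitance on node i. Each capacitor sees the total
    resistance between itself and ground.
    """
    delay = 0
    r_to_ground = 0
    for r, c in zip(resistances, capacitances):
        r_to_ground += r
        delay += r_to_ground * c
    return delay


R = Fraction(1)
C = Fraction(1)
TAU = 3 * R * C                      # normalization used for parasitic delay

stack_r = [R / 2, R / 2]             # outer (B) transistor, then inner (A) transistor

# A arrives latest: the internal node is already at 0, so it contributes no delay.
p_a_latest = elmore_delay(stack_r, [0 * C, 6 * C]) / TAU
# B arrives latest: both the internal node (2C) and the output (6C) must discharge.
p_b_latest = elmore_delay(stack_r, [2 * C, 6 * C]) / TAU

print(f"A latest: {p_a_latest} = {float(p_a_latest):.2f}")
print(f"B latest: {p_b_latest} = {float(p_b_latest):.2f}")
```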




## **Inner & Outer Inputs**

- Inner input is closest to output (A)
- Outer input is closest to rail (B)
- If input arrival time is known
  - Connect latest input to inner terminal



# **Asymmetric Gates**

Example: a buffer with reset. When reset is asserted, Y = 0. Reset is needed only infrequently, so A is the most critical input and an asymmetric gate is appropriate.

- Make A the inner input
- Use less gate capacitance on A
- Give reset a wider nMOS (less R)
- Give reset a narrower pMOS (less C)
- Keep the series nMOS resistance at unity: $R/4 + R/(4/3) = R$, so $g_A = (2 + 4/3)/3 = 10/9$
- As the reset nMOS width gets larger, $g_A$ approaches unity
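The trend in the last bullet can be tabulated. This sketch assumes the pMOS on input A stays at width 2 and that the two series nMOS widths are chosen so the stack resistance equals that of a unit nMOS; the function name is illustrative.

```python
from fractions import Fraction


def asymmetric_nand2(w_reset_nmos, w_pmos_a=Fraction(2)):
    """Return (nMOS width on A, logical effort g_A) for a given reset nMOS width.

    Series resistance constraint: 1/w_a + 1/w_reset = 1 (unit pulldown resistance).
    """
    w_reset_nmos = Fraction(w_reset_nmos)
    if w_reset_nmos <= 1:
        raise ValueError("reset nMOS must be wider than 1 to leave room for A")
    w_a = 1 / (1 - 1 / w_reset_nmos)
    g_a = (w_pmos_a + w_a) / 3        # input capacitance relative to a unit inverter (3)
    return w_a, g_a


for w_reset in (2, 4, 8, 100):
    w_a, g_a = asymmetric_nand2(w_reset)
    print(f"reset nMOS = {w_reset:>3}: A nMOS = {w_a}, g_A = {g_a} ~ {float(g_a):.3f}")
```

A reset nMOS of width 2 recovers the symmetric NAND2 ($g_A = 4/3$); width 4 gives the $10/9$ quoted above.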



# **Asymmetric Gates**

- Asymmetric gates favor one input over another
- Ex: suppose input A of a NAND gate is most critical
  - Use smaller transistor on A (less capacitance)
  - Boost size of noncritical input
  - So total resistance is same

- $g_A =$
- $g_B =$
- $g_{total} = g_A + g_B =$
- Asymmetric gate approaches g = 1 on critical input
- But total logical effort goes up



## **Skewed Gates**

- Skewed gates favor one edge over another
- Ex: suppose rising output of inverter is most critical
  - Downsize noncritical nMOS transistor



- Calculate logical effort by comparing to unskewed inverter with same effective resistance on that edge.
  - $g_u =$
  - $g_d =$


# HI- and LO-Skew

- Def: Logical effort of a skewed gate for a particular transition is the ratio of the input capacitance of that gate to the input capacitance of an unskewed inverter delivering the same output current for the same transition.
- Skewed gates reduce size of noncritical transistors
  - HI-skew gates favor rising output (small nMOS)
  - LO-skew gates favor falling output (small pMOS)
- Logical effort is smaller for favored direction
- But larger for the other direction

# HI- and LO-Skew

In calculating $g_u$ of a complex gate:

Draw the unskewed inverter (2:1) whose pull-up resistance is equal to the equivalent resistance of the pull-up network of the skewed gate.

Then $g_u = \frac{\text{input capacitance of the skewed gate}}{\text{input capacitance of the unskewed inverter}}$

In calculating $g_d$ of a complex gate:

Draw the unskewed inverter (2:1) whose pull-down resistance is equal to the equivalent resistance of the pull-down network of the skewed gate.

Then $g_d = \frac{\text{input capacitance of the skewed gate}}{\text{input capacitance of the unskewed inverter}}$

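The procedure can be applied mechanically to an inverter. In the sketch below the transistor sizes are assumptions chosen for illustration (HI-skew: pMOS 2, nMOS 1/2; LO-skew: pMOS 1, nMOS 1), and $\mu = 2$ so that the unskewed reference inverter is 2:1.

```python
from fractions import Fraction

MU = 2  # pMOS must be MU times wider than nMOS for equal resistance


def skewed_inverter_effort(w_p, w_n):
    """Return (g_u, g_d, g_avg) for an inverter with pMOS width w_p and nMOS width w_n."""
    w_p, w_n = Fraction(w_p), Fraction(w_n)
    c_in = w_p + w_n                       # input capacitance of the skewed gate

    # Unskewed inverter with the same pull-up resistance keeps w_p and uses nMOS = w_p / MU.
    c_ref_up = w_p + w_p / MU
    # Unskewed inverter with the same pull-down resistance keeps w_n and uses pMOS = MU * w_n.
    c_ref_down = MU * w_n + w_n

    g_u = c_in / c_ref_up
    g_d = c_in / c_ref_down
    return g_u, g_d, (g_u + g_d) / 2


for label, (w_p, w_n) in {"unskewed": (2, 1),
                          "HI-skew": (2, Fraction(1, 2)),
                          "LO-skew": (1, 1)}.items():
    g_u, g_d, g_avg = skewed_inverter_effort(w_p, w_n)
    print(f"{label:>8}: g_u = {g_u}, g_d = {g_d}, g_avg = {g_avg}")
```

For the assumed HI-skew sizing the favored rising edge has effort below 1 while the falling edge pays for it, in line with the statement that logical effort is smaller for the favored direction and larger for the other.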

## **Calculations of $g_u$ and $g_d$**

#### Inverters

## **Calculations of $g_u$ and $g_d$**

#### NAND gates

## **Calculations of $g_u$ and $g_d$**

#### NOR gates

[Figure panels: Unskewed, HI-skewed, LO-skewed, Equal rise time, Equal fall time]


# **Catalog of Skewed Gates**




# Asymmetric Skew

- Combine asymmetric and skewed gates
  - Downsize noncritical transistor on unimportant input
  - Reduces parasitic delay for critical input






## **Best P/N Ratio**

- We have selected P/N ratio for unit rise and fall resistance ($\mu = 2$ to 3 for an inverter), where $\mu = \frac{\mu_n}{\mu_p}$; here $\mu = 2$.

- Alternative: choose ratio for least average delay

#### Ex: inverter

- Delay driving identical inverter
- $t_{pdf} = 2C(P+1) \cdot R$
- $t_{pdr} = 2C(P+1) \cdot R(\mu / P)$
- $t_{pd} = \frac{1}{2}(t_{pdf} + t_{pdr}) = \frac{1}{2}[2CR(P+1)(1+\mu/P)] = (P+1+\mu+\mu/P)CR$
- $dt_{pd} / dP = (1 - \mu/P^2)CR = 0$
- Least delay for  $P = \sqrt{\mu}$
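The size of the benefit is easy to quantify from the expression for $t_{pd}$. A minimal sketch, taking $\mu = 2$ and expressing delay in units of $CR$:

```python
import math


def t_pd(p, mu):
    """Average inverter delay in units of C*R for pMOS/nMOS width ratio p."""
    return p + 1 + mu + mu / p


MU = 2.0
p_equal = MU                 # ratio chosen for equal rise and fall resistance
p_opt = math.sqrt(MU)        # ratio that minimizes the average delay

d_equal = t_pd(p_equal, MU)
d_opt = t_pd(p_opt, MU)

print(f"P = {p_equal:.3f}: t_pd = {d_equal:.3f} CR")
print(f"P = {p_opt:.3f}: t_pd = {d_opt:.3f} CR")
print(f"improvement = {100 * (d_equal - d_opt) / d_equal:.1f}%")
```

The average delay improves by only a few percent, but the pMOS is about 30% narrower, which is where the area and power savings come from.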




## **P/N Ratios**

- In general, the best P/N ratio is the square root of the ratio that gives equal rise and fall delay.
  - Only improves average delay slightly for inverters
  - But significantly decreases area and power




## **Observations**

- For speed:
  - NAND vs. NOR
  - Many simple stages vs. fewer high fan-in stages
  - Latest-arriving input
- For area and power:
  - Many simple stages vs. fewer high fan-in stages




# Lecture\_10: **Circuit Families**

### Outline

- Pseudo-nMOS Logic
- Dynamic Logic
- Pass Transistor Logic

## Introduction

- What makes a circuit fast?
  - I = C dV/dt  $\rightarrow t_{pd} \propto (C/I) \Delta V$
  - low capacitance
  - high current
  - small swing
- Logical effort is proportional to C/I
- pMOS are the enemy!
  - High capacitance for a given current
- Can we take the pMOS capacitance off the input?
- Various circuit families try to do this...



#### **Ratioed circuits: nMOS Technology**

- nMOS-only technology.
- Popular from 1970 to about 1980, before CMOS.
- When the pulldown network is off, the static load (resistor or transistor) pulls the output high.
- When the pulldown network is on, it fights the always-on static load.
- An enhancement-mode nMOS load requires an additional supply $V_{GG}$ for a strong $V_{OH}$; use a depletion-mode nMOS load instead.



#### **Pseudo-nMOS**

- In CMOS, use a pMOS that is always ON
  - *Ratio* issue
  - Make the pMOS about $\frac{1}{4}$ the effective strength of the pulldown network, e.g. $P = (2 \times 16)/4 = 8$

**10: Circuit Families** 

#### **Pseudo-nMOS**



We need the net current discharging the load capacitor to equal the current $I$ of a unit-sized inverter. Let $m$ be the nMOS size required to do so, keeping the pMOS at $\frac{1}{4}$ the strength of the nMOS.

$$mI - \frac{mI}{4} = I \quad \Rightarrow \quad m = \frac{4}{3}$$

The pMOS width then follows with $\mu = 2$:

$$\mu \cdot \frac{4}{3} \cdot \frac{1}{4} = \frac{2}{3}$$



#### **Pseudo-nMOS Gates**

- Design for unit current on output to compare with unit inverter.
- pMOS fights nMOS

|                        | Inverter | NAND2 | NOR2 |
|------------------------|----------|-------|------|
| pMOS load width        | 2/3      | 2/3   | 2/3  |
| nMOS width (per input) | 4/3      | 8/3   | 4/3  |
| $g_u$                  | 4/3      | 8/3   | 4/3  |
| $g_d$                  | 4/9      | 8/9   | 4/9  |
| $g_{avg}$              | 8/9      | 16/9  | 8/9  |
| $p_u$                  | 6/3      | 10/3  | 10/3 |
| $p_d$                  | 6/9      | 10/9  | 10/9 |
| $p_{avg}$              | 12/9     | 20/9  | 20/9 |

#### **Pseudo-nMOS Gates**

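Calculate $g_{avg}$ and $p_{avg}$ for a k-input pseudo-nMOS NOR gate

$$g_u = \frac{4/3}{1} = \frac{4}{3}$$

$$g_d = \frac{4/3}{3} = \frac{4}{9}$$

$$g_{avg} = \frac{1}{2}\left(\frac{4}{3} + \frac{4}{9}\right) = \frac{8}{9} \quad \text{(independent of } k\text{)}$$

$$p_u = \frac{2/3 + k \times 4/3}{1}$$

$$p_d = \frac{2/3 + k \times 4/3}{3}$$

$$p_{avg} = \frac{1}{2}\left[\frac{2}{3} + \frac{4k}{3} + \frac{2}{9} + \frac{4k}{9}\right] = \frac{4}{9} + \frac{8k}{9}$$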
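The same calculation can be written as a function of the fan-in $k$. In this sketch the divisors 1 and 3 are read as the input capacitances of the reference inverters that deliver the same rising current ($I/3$) and falling current ($I$) respectively; that reading, like the function name, is an interpretation rather than something stated on the slide.

```python
from fractions import Fraction

W_NMOS = Fraction(4, 3)    # each pulldown nMOS
W_PMOS = Fraction(2, 3)    # always-ON pMOS load
C_REF_UP = 1               # reference inverter input capacitance for the rising edge
C_REF_DOWN = 3             # reference inverter input capacitance for the falling edge


def pseudo_nmos_nor(k):
    """Return (g_avg, p_avg) for a k-input pseudo-nMOS NOR gate."""
    c_in = W_NMOS                        # each input drives a single nMOS
    c_out = W_PMOS + k * W_NMOS          # diffusion capacitance on the output node

    g_u, g_d = c_in / C_REF_UP, c_in / C_REF_DOWN
    p_u, p_d = c_out / C_REF_UP, c_out / C_REF_DOWN
    return (g_u + g_d) / 2, (p_u + p_d) / 2


for k in (1, 2, 4, 8):
    g_avg, p_avg = pseudo_nmos_nor(k)
    print(f"k = {k}: g_avg = {g_avg}, p_avg = {p_avg}")
```

The logical effort stays at 8/9 regardless of fan-in, which is why pseudo-nMOS is attractive for wide NORs; only the parasitic delay grows with $k$.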




# **Pseudo-nMOS Design**

Since the unit-sized inverter has an input capacitance of 3 units, the nMOS transistors of the NOR gate should be sized $\sqrt{8H}$, and the pMOS of the NOR gate should be sized $2 \cdot (\sqrt{8H})/4$, which makes it one fourth the strength of the nMOS.



## **Pseudo-nMOS Power**

- Pseudo-nMOS draws power whenever Y = 0
  - Called static power $P = I_{DD}V_{DD}$
  - A few mA / gate × 1M gates would be a problem
  - Explains why nMOS went extinct
- Use pseudo-nMOS sparingly for wide NORs
- Turn off pMOS when not in use



## Pseudo nMOS ROM






# **Ratio Example**

- The chip contains a 32 word x 48 bit ROM
  - Uses pseudo-nMOS decoder and bitline pullups
  - On average, one wordline and 24 bitlines are high
- Find static power drawn by the ROM
  - $I_{on-p} = 36\ \mu\text{A}$, $V_{DD} = 1.0\ \text{V}$

Solution:

$$P_{\text{pull-up}} =$$

$$P_{\text{static}} =$$
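One way to work the two quantities is sketched below. It assumes one pseudo-nMOS pull-up per wordline and per bitline, and that a pull-up draws $I_{on-p}$ only while its output is low (pseudo-nMOS draws power whenever Y = 0); the variable names are illustrative.

```python
# Static power of the 32-word x 48-bit pseudo-nMOS ROM.
I_ON_P = 36e-6          # pull-up ON current, amperes
VDD = 1.0               # volts

WORDLINES, BITLINES = 32, 48
wordlines_high, bitlines_high = 1, 24      # average state given in the example

# Only lines held low have their pulldown fighting the always-ON pMOS.
lines_low = (WORDLINES - wordlines_high) + (BITLINES - bitlines_high)

p_pullup = VDD * I_ON_P                    # power of one conducting pull-up
p_static = lines_low * p_pullup

print(f"lines drawing current: {lines_low}")
print(f"P_pull-up = {p_pullup * 1e6:.0f} uW")
print(f"P_static  = {p_static * 1e3:.2f} mW")
```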




#### Lecture\_11: Circuit Families




# **Dynamic Logic**

- Dynamic gates use a clocked pMOS pullup
- Two modes: *precharge* and *evaluate*



## The Foot

- What if pulldown network is ON during precharge?
- Use series evaluation transistor to prevent fight.



# Logical Effort






# **Monotonicity Woes**

- Dynamic gates require monotonically rising inputs during evaluation
- But dynamic gates produce monotonically falling outputs during evaluation
- Illegal for one dynamic gate to drive another!
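The failure can be seen in a toy behavioral model. This is only a sketch: it treats a dynamic gate's output as a bit that can be discharged during evaluation but restored only by precharge, and it models finite gate delay by letting the second gate sample its input before the first gate has finished discharging. The class and signal names are hypothetical.

```python
class DynamicGate:
    """Toy model of a dynamic gate output node."""

    def __init__(self):
        self.y = 1

    def precharge(self):
        self.y = 1                     # clocked pMOS pulls the output high

    def evaluate(self, pulldown_on):
        # During evaluation the output can only fall; nothing pulls it back up.
        if pulldown_on:
            self.y = 0
        return self.y


# Two cascaded dynamic inverters: A -> X -> Y, with A = 1.
a = 1
x_gate, y_gate = DynamicGate(), DynamicGate()
x_gate.precharge()
y_gate.precharge()

# Evaluation starts: X is still high from precharge, so the second gate discharges.
y_gate.evaluate(pulldown_on=bool(x_gate.y))
# A moment later X falls, as it should for A = 1.
x_gate.evaluate(pulldown_on=bool(a))
# The second gate now sees X = 0, but its output cannot recover until the next precharge.
y_gate.evaluate(pulldown_on=bool(x_gate.y))

ideal_y = 1 - (1 - a)                  # two ideal inverters in series
print(f"X = {x_gate.y}, Y = {y_gate.y}, ideal Y = {ideal_y}")
```

The falling transition on X is exactly the non-monotonic input that the second dynamic gate cannot tolerate.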




## **Domino Gates**

- Follow dynamic stage with inverting static gate
  - Dynamic / static pair is called domino gate
  - Produces monotonic outputs



# **Domino Optimizations**

- Each domino gate triggers next one, like a string of dominos toppling over
- Gates evaluate sequentially but precharge in parallel
- Thus evaluation is more critical than precharge
- HI-skewed static stages can perform logic






#### **Domino and Compound Domino**






# **Dual-Rail Domino**

- Domino only performs noninverting functions:
  - AND, OR but not NAND, NOR, or XOR
- Dual-rail domino solves this problem
  - Takes true and complementary inputs
  - Produces true and complementary outputs

| sig_h | sig_l | Meaning    |
|-------|-------|------------|
| 0     | 0     | Precharged |
| 0     | 1     | '0'        |
| 1     | 0     | '1'        |
| 1     | 1     | invalid    |


# **Example: AND/NAND**

- Given A\_h, A\_l, B\_h, B\_l
- Compute $Y_h = AB$, $Y_l = \overline{AB}$
- Pulldown networks are conduction complements
- More area, wiring and power
- Perform inverting and noninverting logic
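The dual-rail AND/NAND can be modeled at the logic level as follows. This is a sketch, not a circuit description: it assumes the $Y_h$ pulldown is the series combination of A\_h and B\_h and the $Y_l$ pulldown is the parallel combination of A\_l and B\_l (the conduction complement), and the helper names `encode` and `dual_rail_and` are introduced here for illustration.

```python
def encode(bit):
    """Dual-rail encoding of a valid logic value as (sig_h, sig_l)."""
    return (1, 0) if bit else (0, 1)


def dual_rail_and(a, b):
    """Evaluate-phase outputs (Y_h, Y_l) for dual-rail inputs a and b."""
    a_h, a_l = a
    b_h, b_l = b
    y_h = a_h & b_h   # series pulldown: Y_h = A AND B
    y_l = a_l | b_l   # parallel pulldown: Y_l = NOT(A AND B)
    return y_h, y_l


for a in (0, 1):
    for b in (0, 1):
        print(a, b, dual_rail_and(encode(a), encode(b)))

# Precharged inputs (0, 0) leave both outputs low, i.e. still precharged.
print(dual_rail_and((0, 0), (0, 0)))
```

For valid inputs exactly one output rail rises, so the pair never reaches the invalid (1, 1) state.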




### **Pass Transistor Circuits**

- Use pass transistors like switches to do logic
- Inputs drive diffusion terminals as well as gates
- CMOS + Transmission Gates:
  - 2-input multiplexer
  - Gates should be restoring






#### Complementary Pass-transistor Logic (CPL)

- Dual-rail form of pass transistor logic
- Avoids need for ratioed feedback
- Optional cross-coupling for rail-to-rail swing




#### **Pass Transistor Summary**

- Researchers investigated pass transistor logic for general purpose applications in the 1990s
  - Benefits over static CMOS were small or negative
  - No longer generally used
- However, pass transistors still have a niche in special circuits such as memories where they offer small size and the threshold drops can be managed

### Lecture 12: Adders


#### Outline

- Single-bit Addition
- Carry-Ripple Adder
- Carry-Skip Adder
- Carry-Select Adder
- Carry-Lookahead Adder
- Carry-Increment Adder
- Tree Adder

## **Single-Bit Addition**



Full Adder

$$S = A \oplus B \oplus C$$

$$C_{out} = MAJ(A, B, C)$$

| A | B | C | C<sub>out</sub> | S |
|---|---|---|------------------|---|
| 0 | 0 | 0 | 0                | 0 |
| 0 | 0 | 1 | 0                | 1 |
| 0 | 1 | 0 | 0                | 1 |
| 0 | 1 | 1 | 1                | 0 |
| 1 | 0 | 0 | 0                | 1 |
| 1 | 0 | 1 | 1                | 0 |
| 1 | 1 | 0 | 1                | 0 |
| 1 | 1 | 1 | 1                | 1 |

#### 17: Adders



## **Single-Bit Addition**



For the full adder, note the symmetry of S and C<sub>out</sub>: inverting all the inputs inverts both outputs.
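The symmetry property can be checked by enumerating all eight input combinations. The sketch below assumes MAJ is the three-input majority function; the function name `full_adder` is illustrative.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (C_out, S) for single-bit inputs a, b, c."""
    s = a ^ b ^ c
    c_out = (a & b) | (a & c) | (b & c)   # MAJ(A, B, C)
    return c_out, s


for a, b, c in product((0, 1), repeat=3):
    c_out, s = full_adder(a, b, c)
    # The two output bits encode the arithmetic sum of the inputs.
    assert 2 * c_out + s == a + b + c
    # Inverting every input inverts both outputs.
    assert full_adder(1 - a, 1 - b, 1 - c) == (1 - c_out, 1 - s)
    print(a, b, c, c_out, s)
```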




#### Full Adder Design IV: Dynamic Logic

- Dual-rail domino
  - Very fast, but large and power hungry
  - Used in very fast multipliers




# **Carry Propagate Adders**

- N-bit adder called CPA
  - Each sum bit depends on all previous carries
  - How do we compute all these carries quickly?



# **Carry-Ripple Adder**

- Simplest design: cascade full adders
  - Critical path goes from C<sub>in</sub> to C<sub>out</sub>
  - Design full adder to have fast carry delay



#### Inversions

- Critical path passes through majority gate
  - Built from minority + inverter
  - Eliminate inverter and use inverting full adder



#### Generate, Propagate, and Kill

Truth table for full adder:

| A | B | C | G | P | K | C<sub>out</sub> | S |
|---|---|---|---|---|---|------------------|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0                | 0 |
| 0 | 0 | 1 | 0 | 0 | 1 | 0                | 1 |
| 0 | 1 | 0 | 0 | 1 | 0 | 0                | 1 |
| 0 | 1 | 1 | 0 | 1 | 0 | 1                | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0                | 1 |
| 1 | 0 | 1 | 0 | 1 | 0 | 1                | 0 |
| 1 | 1 | 0 | 1 | 0 | 0 | 1                | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 | 1                | 1 |

- Generate: $G = A \cdot B$, $C_{out} = 1$
- Propagate: $P = A \oplus B$, $C_{out} = C_{in}$
- Kill: $K = \overline{A} \cdot \overline{B}$, $C_{out} = 0$

$$S = A \oplus B \oplus C = P \oplus C$$

$$C_{out} = G + P \cdot C = G + \overline{K} \cdot C$$
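The generate, propagate, and kill identities can be verified exhaustively. This is a small sketch; the helper name `gpk` is hypothetical.

```python
from itertools import product


def gpk(a, b):
    """Return (G, P, K) for single-bit inputs a and b."""
    return a & b, a ^ b, (1 - a) & (1 - b)


for a, b, c in product((0, 1), repeat=3):
    g, p, k = gpk(a, b)
    assert g + p + k == 1                  # exactly one of G, P, K is true
    c_out, s = (a + b + c) >> 1, (a + b + c) & 1
    assert c_out == g | (p & c)            # C_out = G + P.C
    assert c_out == g | ((1 - k) & c)      # C_out = G + (not K).C
    assert s == p ^ c                      # S = P xor C
print("all identities hold")
```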


### Lecture 13: Adders




# **Group PG Signals**

Bitwise generate and propagate signals:

$$G_{i:i} = G_i = A_i \cdot B_i \quad \text{and} \quad P_{i:i} = P_i = A_i \oplus B_i$$

Generate and propagate for groups of bits spanning i:j ($i \ge k > j$):

$$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$$

$$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$$

Carry out of bit i:

$$C_i = G_i + P_i \cdot C_{i-1} = G_i + P_i \cdot G_{i-1:0} = G_{i:0}$$
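The group recurrence can be sketched as a combine operation on (G, P) pairs. The example below, with hypothetical helpers `combine` and `group_pg`, checks that any split point k with $i \ge k > j$ yields the same group signals, which is what lets an adder designer choose how groups are merged.

```python
from itertools import product


def combine(upper, lower):
    """Merge (G, P) of bits i:k with (G, P) of bits k-1:j into (G, P) of i:j."""
    g_u, p_u = upper
    g_l, p_l = lower
    return g_u | (p_u & g_l), p_u & p_l


def group_pg(pg, i, j):
    """Fold bitwise pairs pg[j..i] into the group signals (G_{i:j}, P_{i:j})."""
    acc = pg[j]
    for t in range(j + 1, i + 1):
        acc = combine(pg[t], acc)
    return acc


# Per-bit (G, P) is one of kill (0, 0), propagate (0, 1), or generate (1, 0).
for bits in product([(0, 0), (0, 1), (1, 0)], repeat=4):
    pg = list(bits)
    for k in range(1, 4):
        assert combine(group_pg(pg, 3, k), group_pg(pg, k - 1, 0)) == group_pg(pg, 3, 0)
print("split point does not matter")
```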


## Generate / Propagate

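- Equations often factored into G and P
- Generate and propagate for groups spanning i:j

$$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$$

$$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$$

- Base case

$$G_{i:i} \equiv G_i = A_i \cdot B_i \qquad G_{0:0} \equiv G_0 = C_{in}$$

$$P_{i:i} \equiv P_i = A_i \oplus B_i \qquad P_{0:0} \equiv P_0 = 0$$

- Sum: $S_i = P_i \oplus C_{i-1}$

$$S_i = P_i \oplus G_{i-1:0}$$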

### **PG Logic**




# **Carry-Ripple Revisited**

P and G signals simplify the majority function into AND-OR

- $S_i = P_i \oplus C_{i-1}$
- $C_i = G_i + P_i \cdot C_{i-1} = G_i + P_i \cdot G_{i-1:0} = G_{i:0}$
- AND-OR current bit G with previous group G ( carry-in )
- Group P not required
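The carry-ripple recurrence can be sketched bit by bit. This minimal example assumes bits are numbered 1..N with bit 0 reserved for the carry-in, so that $G_{0:0} = C_{in}$; the function name `ripple_add_pg` is illustrative.

```python
def ripple_add_pg(a, b, c_in, n):
    """Add two n-bit integers using C_i = G_i + P_i . G_{i-1:0}.

    Returns (sum mod 2**n, carry out).
    """
    g_group = c_in                              # assumed base case: G_{0:0} = C_in
    total = 0
    for i in range(1, n + 1):
        a_i = (a >> (i - 1)) & 1
        b_i = (b >> (i - 1)) & 1
        g_i, p_i = a_i & b_i, a_i ^ b_i
        total |= (p_i ^ g_group) << (i - 1)     # S_i = P_i xor C_{i-1}
        g_group = g_i | (p_i & g_group)         # C_i = G_{i:0}; group P never needed
    return total, g_group


print(ripple_add_pg(0b1011, 0b0110, 1, 4))
```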



# **Carry-Ripple PG Diagram**





# **PG Diagram Notation**






# Carry-Skip PG Diagram

- Gray: when a group generates a carry out.
- Blue: $G_{i:0}$ updated when a carry-in arrives.

For k groups of n bits each (N = nk):



$$t_{skip} = t_{pg} + 2(n-1)t_{AO} + (k-1)t_{mux} + t_{xor}$$
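The delay formula can be used to compare group sizes for a fixed word length. The sketch below assumes, purely for illustration, that every component delay equals one unit; real values depend on the process and gate sizing.

```python
def t_skip(n, k, t_pg=1.0, t_ao=1.0, t_mux=1.0, t_xor=1.0):
    """Carry-skip delay for k groups of n bits, using the formula above."""
    return t_pg + 2 * (n - 1) * t_ao + (k - 1) * t_mux + t_xor


N = 32
for n in (2, 4, 8, 16):
    k = N // n
    print(f"n={n:2d} k={k:2d} t_skip={t_skip(n, k)}")
```

Under these unit-delay assumptions, small groups are dominated by the mux chain and large groups by the ripple inside the first and last groups, so an intermediate group size gives the shortest delay.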


### Variable Group Size



### **Carry-Select Adder**

- Trick for critical paths dependent on late input X
  - Precompute two possible outputs for X = 0, 1
  - Select proper output when X arrives
- Carry-select adder precomputes n-bit sums
  - For both possible carries into n-bit group
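A behavioral sketch of carry-select addition follows. It assumes the word length is a multiple of the group size, and the function name `carry_select_add` and its parameters are hypothetical. Each group computes both candidate sums before its carry-in is known, then a mux picks one.

```python
def carry_select_add(a, b, c_in, n_bits, group_bits):
    """Add two n_bits-wide integers group by group; returns (sum, carry out)."""
    mask = (1 << group_bits) - 1
    total, carry = 0, c_in
    for low in range(0, n_bits, group_bits):
        a_g = (a >> low) & mask
        b_g = (b >> low) & mask
        sum_if_0 = a_g + b_g          # precomputed assuming carry-in = 0
        sum_if_1 = a_g + b_g + 1      # precomputed assuming carry-in = 1
        chosen = sum_if_1 if carry else sum_if_0   # mux driven by the late carry
        total |= (chosen & mask) << low
        carry = chosen >> group_bits
    return total, carry


print(carry_select_add(200, 100, 1, 8, 4))
```

In hardware the two candidate sums of all groups are computed in parallel, so only the mux selection lies on the carry path between groups.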



### **Multi-input Adders**

- Suppose we want to add k N-bit words
  - Ex: 0001 + 0111 + 1101 + 0010 = 10111
- Straightforward solution: k-1 N-bit CPAs
  - Large and slow



**18: Datapath Functional Units** 

### **Carry Save Addition**

- A full adder sums 3 inputs and produces 2 outputs
  - Carry output has twice weight of sum output
- N full adders in parallel are called a *carry-save adder*
  - Produce N sums and N carry outs



# **CSA Application**

- Use k-2 stages of CSAs
  - Keep result in carry-save redundant form
- Final CPA computes actual result
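The multi-input addition scheme can be sketched with integers standing in for N-bit words. The helper names `carry_save_add` and `add_words` are illustrative, and the order in which words enter the CSA stages is an arbitrary choice made here.

```python
def carry_save_add(x, y, z):
    """One CSA stage: a full adder per bit, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1   # carry has twice the weight of sum
    return s, c


def add_words(words):
    """Sum a list of at least two words; returns (result, CSA stages used)."""
    words = list(words)
    stages = 0
    while len(words) > 2:
        s, c = carry_save_add(*words[:3])
        words = [s, c] + words[3:]           # result stays in carry-save form
        stages += 1
    return words[0] + words[1], stages       # the final CPA resolves the carries


result, stages = add_words([0b0001, 0b0111, 0b1101, 0b0010])
print(bin(result), stages)
```

For the four-word example this uses two CSA stages (k-2 with k = 4) followed by a single carry-propagate addition.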




## Multiplication




- M x N-bit multiplication
  - Produce N M-bit partial products
  - Sum these to produce M+N-bit product

#### **General Form**

- Multiplicand: $Y = (y_{M-1}, y_{M-2}, \dots, y_1, y_0)$
- Multiplier: $X = (x_{N-1}, x_{N-2}, \dots, x_1, x_0)$
- Product:

$$P = \left(\sum_{j=0}^{M-1} y_j 2^j\right) \left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}$$

For M = N = 6:

$$\begin{array}{ccccccccccccl}
 & & & & & & y_5 & y_4 & y_3 & y_2 & y_1 & y_0 & \text{multiplicand} \\
 & & & & & & x_5 & x_4 & x_3 & x_2 & x_1 & x_0 & \text{multiplier} \\
\hline
 & & & & & & x_0y_5 & x_0y_4 & x_0y_3 & x_0y_2 & x_0y_1 & x_0y_0 & \\
 & & & & & x_1y_5 & x_1y_4 & x_1y_3 & x_1y_2 & x_1y_1 & x_1y_0 & & \\
 & & & & x_2y_5 & x_2y_4 & x_2y_3 & x_2y_2 & x_2y_1 & x_2y_0 & & & \text{partial} \\
 & & & x_3y_5 & x_3y_4 & x_3y_3 & x_3y_2 & x_3y_1 & x_3y_0 & & & & \text{products} \\
 & & x_4y_5 & x_4y_4 & x_4y_3 & x_4y_2 & x_4y_1 & x_4y_0 & & & & & \\
 & x_5y_5 & x_5y_4 & x_5y_3 & x_5y_2 & x_5y_1 & x_5y_0 & & & & & & \\
\hline
p_{11} & p_{10} & p_9 & p_8 & p_7 & p_6 & p_5 & p_4 & p_3 & p_2 & p_1 & p_0 & \text{product}
\end{array}$$
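The double sum can be evaluated directly: each term is the AND of one multiplier bit with one multiplicand bit, weighted by $2^{i+j}$. The function name `multiply_partial_products` below is illustrative, and both operands are assumed to be unsigned.

```python
def multiply_partial_products(y, x, m, n):
    """Multiply an m-bit multiplicand y by an n-bit multiplier x (both unsigned)."""
    product = 0
    for i in range(n):                 # one partial product per multiplier bit
        x_i = (x >> i) & 1
        for j in range(m):
            y_j = (y >> j) & 1
            product += (x_i & y_j) << (i + j)   # term x_i * y_j * 2^(i+j)
    return product


print(multiply_partial_products(0b1100, 0b0101, 4, 4))
```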



## **Array Multiplier**




# **Rectangular Array**

- Squash array to fit rectangular floorplan




# **Fewer Partial Products**

- Array multiplier requires N partial products
- If we looked at groups of r bits, we could form N/r partial products.
  - Faster and smaller?
  - Called radix-2<sup>r</sup> encoding
- Ex: r = 2: look at pairs of bits
  - Form partial products of 0, Y, 2Y, 3Y
  - First three are easy, but 3Y requires an adder

## **Booth Encoding**

- Replace $3y$ with $-y$ $\Rightarrow$ add $y$ to the next partial product ($y \ll 2 = 4y$)
  - The next PP has 4 times the weight
  - Adding $y$ to the next PP is actually adding $4y$ to the current PP
- But what about the next PP?
  - $0 + y = y$
  - $y + y = 2y$
  - $2y + y = 3y \Rightarrow$ requires an adder
- Replace $2y$ with $-2y$ $\Rightarrow$ add $y$ to the next partial product ($y \ll 2 = 4y$)
  - $-2y + y = -y \Rightarrow$ no adder is required
- Need to check the MSB from the previous pair of bits
  - If it is 1 (case of $2y$ or $3y$ in the previous PP), then add $y$


# **Booth Encoding**

- Instead of 3Y, try –Y, then increment next partial product to add 4Y
- Similarly, for 2Y, try –2Y + 4Y in next partial product

| Inputs            |          |                        | Partial Product | Booth Selects       |                   |                  |
|-------------------|----------|------------------------|-----------------|---------------------|-------------------|------------------|
| $x_{2i+1}$        | $x_{2i}$ | $x_{2i-1}$             | $PP_i$          | $\text{SINGLE}_i$   | $\text{DOUBLE}_i$ | $\text{NEG}_i$   |
| 0                 | 0        | 0                      | 0               | 0                   | 0                 | 0                |
| 0                 | 0        | 1                      | Y               | 1                   | 0                 | 0                |
| 0                 | 1        | 0                      |                 |                     |                   |                  |
| 0                 | 1        | 1                      |                 |                     |                   |                  |
| 1                 | 0        | 0                      |                 |                     |                   |                  |
| 1                 | 0        | 1                      |                 |                     |                   |                  |
| 1                 | 1        | 0                      |                 |                     |                   |                  |
| 1                 | 1        | 1                      |                 |                     |                   |                  |
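The multiple of Y selected by each bit triple follows from the rule described above: the pair contributes $2x_{2i+1} + x_{2i}$, the pair is reduced by 4 whenever $x_{2i+1} = 1$ (deferring 4Y to the next partial product), and $x_{2i-1} = 1$ signals a deferred Y from the previous pair. A minimal sketch, with the hypothetical helper name `booth_multiple`:

```python
from itertools import product


def booth_multiple(x_hi, x_mid, x_lo):
    """Multiple of Y chosen by the triple (x_{2i+1}, x_{2i}, x_{2i-1})."""
    # (2*x_hi + x_mid) - 4*x_hi + x_lo simplifies to the expression below.
    return x_mid + x_lo - 2 * x_hi


for triple in product((0, 1), repeat=3):
    print(triple, booth_multiple(*triple))
```

Every result lies in the set {-2, -1, 0, 1, 2}, so each partial product is formed from Y by at most a shift and a negation.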


## **Booth Hardware**

Booth encoder generates control lines for each PP

Booth selectors choose PP bits



### **Multiplication Example**

$P = Y \times X = 011001_2 \times 100111_2$
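This example can be worked with a short radix-4 Booth sketch. It assumes both operands are unsigned, that $x_{-1} = 0$, and that the multiplier is zero-extended so that one extra digit absorbs its most significant bit; the function name `booth_radix4_multiply` is illustrative, and an even multiplier width is assumed.

```python
def booth_radix4_multiply(y, x, n_bits):
    """Multiply unsigned y by unsigned n_bits-wide x; returns (product, digits)."""
    padded = x << 1                        # append x_{-1} = 0 below the LSB
    product, digits = 0, []
    for i in range(n_bits // 2 + 1):       # extra digit covers the zero-extended MSBs
        x_lo = (padded >> (2 * i)) & 1         # x_{2i-1}
        x_mid = (padded >> (2 * i + 1)) & 1    # x_{2i}
        x_hi = (padded >> (2 * i + 2)) & 1     # x_{2i+1}
        digit = x_mid + x_lo - 2 * x_hi        # one of -2, -1, 0, 1, 2
        digits.append(digit)
        product += (digit * y) << (2 * i)      # partial product weighted by 4^i
    return product, digits


product, digits = booth_radix4_multiply(0b011001, 0b100111, 6)
print(product, digits)
```

Under these assumptions the selected multiples are -Y, 2Y, -2Y, and Y, and their weighted sum is 975, matching 25 × 39.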





