Bruce Jacob

University of Maryland ECE Dept.

SLIDE 1

#### ENEE 359a Digital VLSI Design

#### System Timing: Conventions, Problems, Solutions

Prof. Bruce Jacob blj@ece.umd.edu



Credit where credit is due:

Slides contain original artwork (© Jacob 2004) as well as material taken liberally from Irwin & Vijay's CSE477 slides (PSU), Schmit & Strojwas's 18-322 slides (CMU), Dally's EE273 slides (Stanford), Wolf's slides for *Modern VLSI Design*, and/or Rabaey's slides (UCB). Asynchronous circuits: Erik Brunvand.


SLIDE 2

#### **Overview**

- Motivation: What Needs To Be Done? (and Why It Matters) (Elmore primer)
- Timing Conventions
- Synchronous & Source-Synchronous
- Self-Timed Circuits
- Dealing with Problems of Global Clocks
- Balanced Trees
- The Wonderful World of DLLs/PLLs





#### **Background: Elmore Delays**



ENEE 359a Lecture/s 16-19 System Timing















SLIDE 11

### **Eye Diagram**



Yes, there really is that much voltage noise; Yes, there really is that much timing noise ...

Life sucks; deal with it.





#### **Definitions:** Skew

- Caused by differences in signal-path characteristics
- The total timing budget must account for data-data skew and data-clock skew as well as clock-clock skew




SLIDE 13



#### **Definitions:** Jitter

- Dynamic timing displacement from nominal timing characteristics
- Magnitude and offset of timing displacement could depend on: previous signal state(s), current signal state(s), supply voltage level(s), crosstalk, variations in thermal characteristics. Perhaps even phases of the moon (not proven).




SLIDE 15



Define *when* a value is present on a line ...

How many 1's? How many 0's?



Need a convention to distinguish where one '1' ends and the next '1' begins ... conventions typically mark boundaries with TRANSITIONS:

- of the signal itself
- of an associated *clock* signal (original definition of "synchronous")

**Uncertainty in timing limits operating speed** 
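To make the "How many 1's?" question concrete, here is a toy sketch. It assumes a hypothetical waveform representation (one list entry per time step) and a hypothetical helper `sample_on_rising_edges`; the same data line yields a different bit string depending on which clock marks the boundaries.

```python
def sample_on_rising_edges(data, clk):
    """Return the value of `data` at every rising edge of `clk`.

    Both arguments are lists of 0/1 levels, one entry per time step
    (a toy waveform model, not a circuit simulation).
    """
    bits = []
    for t in range(1, len(clk)):
        if clk[t - 1] == 0 and clk[t] == 1:  # rising edge = bit boundary
            bits.append(data[t])
    return bits


# Line is high for 6 steps, low for 2, then high for 4.
data = [1] * 6 + [0] * 2 + [1] * 4

fast_clk = [0, 1] * 6        # rising edges at t = 1, 3, 5, 7, 9, 11
slow_clk = [0, 0, 1, 1] * 3  # rising edges at t = 2, 6, 10

print(sample_on_rising_edges(data, fast_clk))  # [1, 1, 1, 0, 1, 1]
print(sample_on_rising_edges(data, slow_clk))  # [1, 0, 1]
```

Same wire, five 1's under one clock and two under the other: the convention, not the signal, defines the bits.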






SLIDE 17

### **Setup Time**

#### Required time for input to be stable BEFORE CLOCK EDGE










SLIDE 20

### **Hold Time**

#### Required time for input to be stable AFTER CLOCK EDGE





## **Hold Time Violations**

Prop Delay: 1 ns

Hold Time: 2 ns



Hold time violations are caused by "short paths". They cannot be fixed by slowing down the clock!!! They are fixed by slowing down the fast paths.
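The hold check can be written as a slack computation. This is a minimal sketch: the function name and the 1.5 ns delay pad are illustrative assumptions, and the slide's 1 ns propagation delay is treated as the path's shortest delay.

```python
def hold_slack_ns(t_min_path_ns, t_hold_ns, capture_clock_late_ns=0.0):
    """Hold slack of one register-to-register path; negative means violation.

    t_min_path_ns: shortest delay from the launching clock edge to the
        capturing register's input (the "short path").
    t_hold_ns: hold time of the capturing register.
    capture_clock_late_ns: how much later the capture clock edge arrives
        than the launch edge (clock skew); it makes hold harder.

    The clock period appears nowhere in this expression, which is why a
    slower clock cannot fix a hold violation.
    """
    return t_min_path_ns - (t_hold_ns + capture_clock_late_ns)


print(hold_slack_ns(1.0, 2.0))        # -1.0: the violation on the slide
print(hold_slack_ns(1.0 + 1.5, 2.0))  # 0.5: after padding the fast path
```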




SLIDE 22

### **Basic Timing Analysis**

- Look for the *longest* path: determines clock speed
- Look for the *shortest* path: check hold time

Difficult problem, e.g. False Paths
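Both checks reduce to path searches over the combinational logic between two registers. The sketch below uses a made-up netlist (`EDGES` and its delays are hypothetical) and finds the shortest and longest paths by memoized depth-first search. It is purely structural: a longest path that no input vector can actually sensitize (a false path) is still reported, which is exactly what makes real timing analysis hard.

```python
from functools import lru_cache

# Hypothetical combinational network between two registers:
# node -> list of (fan-out node, gate/wire delay in ps).
EDGES = {
    "IN": [("A", 100), ("B", 300)],
    "A": [("C", 200), ("OUT", 150)],
    "B": [("C", 50)],
    "C": [("OUT", 400)],
    "OUT": [],
}


def path_delays(edges, src, dst):
    """Return (shortest, longest) delay from src to dst in an acyclic graph.

    Assumes every node reachable from src has a path to dst.
    """
    @lru_cache(maxsize=None)
    def walk(node):
        if node == dst:
            return (0, 0)
        lows, highs = [], []
        for nxt, delay in edges[node]:
            lo, hi = walk(nxt)
            lows.append(delay + lo)
            highs.append(delay + hi)
        return (min(lows), max(highs))

    return walk(src)


shortest, longest = path_delays(EDGES, "IN", "OUT")
print(shortest, longest)  # 250 750
```

Here the 750 ps path (IN, B, C, OUT) would set the clock period, and the 250 ps path (IN, A, OUT) is the one to check against hold time.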






SLIDE 23

#### A Tale of Two (or more) Timing Conventions

Synchronous: global clock

Synchronous: source-synchronous I ("open-loop" meaning no control loop)

Synchronous: source-synchronous II ("closed loop" meaning feedback control)

**Asynchronous: self-timed** 




SLIDE 24

### **System Building Blocks**

#### **DELAY ELEMENTS**

- Nominal delay
- Timing uncertainty, skew, jitter





#### **COMBINATIONAL LOGIC**

- Contamination delay
- Propagation delay

#### **CLOCKED STORAGE ELEMENTS**

- Align signal to a clock
- Signal waits to be sampled by clock
- Output held steady until next clock






SLIDE 25



Samples data on rising edge of CLK

Data must remain valid during an *aperture* of time during sampling

Output held steady until next CLK edge



- Output is held until a contamination delay following CLK edge
- Output has a correct value after a propagation delay following CLK edge




SLIDE 26

#### **Other Storage Elements**

#### LEVEL-SENSITIVE LATCH

- Passes data through when enable (clock) is high
- Holds data stable when enable (clock) is low





#### DUAL-EDGE-TRIGGERED FLIP-FLOP/REGISTER

- Samples data at both edges of the clock
- Internally two interleaved flip-flops (1 posedge FF + 1 negedge FF)
- Allows CLK to run at same speed as data




SLIDE 27



#### **MAXIMUM OPERATING RATE**

(i.e., signaling speed) LIMITED BY THREE FACTORS:



- Tr: Transition time (rise/fall time)
- Tu: Timing uncertainty (skew, jitter)
- Ta: Aperture time
- Tbit ≥ Tr + Tu + Ta (but not that simple ...)



### Synchronous Timing I

**GLOBAL CLOCK (Conventional) EXAMPLE** 



| Parameter              | Symbol | Nominal | Skew   | Jitter |
|------------------------|--------|---------|--------|--------|
| Bit Cell (data period) | Tbit   | 2.5 ns  |        |        |
| Transmitter Rise Time  | Tr     | 1.0 ns  |        |        |
| Cable Delay            | Twire  | 6.25 ns | 100 ps |        |
| Receiver Aperture      | Ta     | 300 ps  | 100 ps | 50 ps  |
| Transmitter Delay      |        | 500 ps  | 150 ps | 50 ps  |
| Buffer Stage Delay     | B#     | 250 ps  | 100 ps | 50 ps  |


SLIDE 29

### Synchronous Timing I

#### Sources of uncertainty:

- Skew (across multiple lines) of line delay
- Jitter of Tx, Rx, and line delay
- Skew and jitter of global clock (usually large due to high fan-out)

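For best performance, center the sampling edge on the data eye. The margin on either side of the sampling edge is

$\pm\tfrac{1}{2}\,(T_{bit} - T_r - T_a) = \pm\tfrac{1}{2}\,(2500 - 1000 - 300)\ \text{ps} = \pm 600\ \text{ps}$

Is that enough to cover $\tfrac{1}{2}\,T_u$?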




SLIDE 30

#### An Aside ...

Why do we simply add up the uncertainties? And why does each effectively count twice?






SLIDE 31

### Synchronous Timing I

#### **TIMING ANALYSIS**

- Clock skew: 100ps lines + 100ps B1 + 400ps B4
- Clock jitter: 50ps B1 + 200ps B4
- [CLKs times 2: one for xmit, one for recv]
- Transmitter: 150ps skew, 50ps jitter
- Receiver: 100ps skew, 50ps jitter
- Data cable: 100ps skew

TOTAL: 1550ps skew, 600ps jitter (BAD)
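The totals above, and the comparison against the eye-centering margin, can be reproduced with plain arithmetic. This sketch follows the slide in simply adding worst-case contributions; it also assumes that the uncertainty Tu is total skew plus total jitter.

```python
# Synchronous Timing I example, all values in ps (from the table above).
T_BIT, T_R, T_A = 2500, 1000, 300
BUF_SKEW, BUF_JITTER = 100, 50      # per buffer stage
CLK_LINE_SKEW = 100                 # clock distribution lines
DATA_CABLE_SKEW = 100

# One clock path: lines + B1 (1 buffer) + B4 (4 buffers).
# Counted twice: once for the transmit clock, once for the receive clock.
clk_skew = 2 * (CLK_LINE_SKEW + 1 * BUF_SKEW + 4 * BUF_SKEW)
clk_jitter = 2 * (1 * BUF_JITTER + 4 * BUF_JITTER)

skew = clk_skew + 150 + 100 + DATA_CABLE_SKEW  # + transmitter, receiver, cable
jitter = clk_jitter + 50 + 50                  # + transmitter, receiver
t_u = skew + jitter                            # worst case: everything adds

margin = (T_BIT - T_R - T_A) / 2   # room on each side of the sampling edge
t_bit_min = T_R + t_u + T_A        # Tbit >= Tr + Tu + Ta

print(skew, jitter)      # 1550 600
print(margin, t_u / 2)   # 600.0 1075.0
print(t_bit_min)         # 3450
```

Half of Tu (1075 ps) exceeds the 600 ps margin, so this scheme fails at a 2.5 ns bit cell; it would need a bit cell of at least 3.45 ns.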




SLIDE 32

### Synchronous Timing I

#### LIMITS TO LINE DELAY & DATA FREQUENCY



Conventional Wisdom:



For fixed line length and tight margins, this limits the bus speeds that can be used




SLIDE 33

### Synchronous Timing I

**SUMMARY** 



#### GLOBALLY <sup>\*\*</sup> SYNCHRONOUS DESIGN:

- For long wires and high speeds, only a handful of frequencies work
- Impractical to control uncertainties
- Cannot switch frequencies



### Synchronous Timing II

#### **PIPELINED TIMING: BASIC IDEA**



Delay the clock by the same amount as data ... PLUS half a bit-cell

System will work from DC to maximum theoretical frequency 1/(Tr + Tu + Ta)



Defines a new clock domain at the receiving end


SLIDE 35

### Synchronous Timing II

SOURCES OF UNCERTAINTY



#### SKEW:

- Between CLK & Data line
- Fixed differences in FF, Tx, Rx delays
- Different CLK delays to different FFs
- Aperture offset in Rx FF
- Extra offset in the delayed CLK line

#### JITTER:

- In Tx clock
- In FF, Tx, Rx delays



### Synchronous Timing II

#### **OPEN-LOOP PIPELINED EXAMPLE**




SLIDE 37

### Synchronous Timing II

#### **TIMING ANALYSIS**

- Xmit data: 150ps skew, 50ps jitter
- Xmit toggle: 150ps skew, 50ps jitter
- Receiver: 100ps skew, 50ps jitter
- Data cable: 100ps skew
- Toggle clock cable: 100ps skew

TOTAL: 600ps skew, 150ps jitter (BETTER)



### Synchronous Timing III

#### **CLOSED-LOOP PIPELINED EXAMPLE**








### Synchronous Timing III

#### COMPONENTS of CONTROL LOOP (DLL)




SLIDE 40

### Synchronous Timing III

#### **TIMING ANALYSIS**

- Xmit data: 50ps jitter
- Recv data: 50ps jitter
- Xmit toggle: 30ps skew data/toggle (0?), 50ps jitter
- Recv toggle: 20ps skew (0?), 50ps jitter
- Data cable: 100ps skew
- Toggle clock cable: 100ps skew

TOTAL: 250ps skew, 200ps jitter (GOOD!)

### Synchronous Timing II & III

LIMITS TO LINE DELAY & DATA FREQUENCY

#### None.

The only limit on bus frequency is the rate at which you can successfully transmit & receive data (e.g., Taperture + Tuncertainty + Ttransmit)
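With the pipelined schemes, the bit cell only has to cover transition, uncertainty, and aperture times. A minimal sketch, assuming the Tr = 1.0 ns and Ta = 300 ps values from the global-clock example table also apply here, and treating Tu as total skew plus total jitter. Note that line (cable) delay appears nowhere in the calculation.

```python
def min_bit_time(skew_ps, jitter_ps, t_r_ps=1000, t_a_ps=300):
    """Smallest bit cell (ps) and bit rate (Mb/s) allowed by Tbit >= Tr + Tu + Ta.

    Tu is taken as total skew + total jitter (worst-case addition).
    """
    t_bit = t_r_ps + (skew_ps + jitter_ps) + t_a_ps
    return t_bit, 1e6 / t_bit   # 1 / (t_bit in ps), expressed in Mb/s


# Totals from the two pipelined timing analyses above, in ps.
open_loop = (150 + 150 + 100 + 100 + 100, 50 + 50 + 50)  # (600, 150)
closed_loop = (30 + 20 + 100 + 100, 4 * 50)              # (250, 200)

for name, (skew, jitter) in [("open loop", open_loop), ("closed loop", closed_loop)]:
    t_bit, rate = min_bit_time(skew, jitter)
    print(f"{name}: Tbit >= {t_bit} ps, about {rate:.0f} Mb/s per line")
```

Under these assumptions the open-loop scheme allows about 488 Mb/s per line and the closed-loop scheme about 571 Mb/s, regardless of cable length.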








SLIDE 42

### **Asynchronous Timing**

**Basic Idea: no clocks** 

#### **ADVANTAGES:**

- Achieve average-case performance (good if difference between average & worst case is large)
- Consume power only when needed (i.e., only when actually processing data)
- Easy modular composition (designer focuses on local issues, not global issues)
- No clock alignment required (no expensive DLLs/PLLs)
- No clock distribution headaches (saves design time, power consumption, chip area)
- Robust in the face of parameter variations (e.g., temperature/voltage fluctuations, process variations)
- Global synchrony is a fallacy anyway! (i.e., face the problem head-on)




SLIDE 43

### **Asynchronous Timing**

#### **SELF-TIMED CIRCUITS**

**Worst-Case Delay:** sets clock period in synchronous designs (time: N \* max delay)



**Average-Case Delay:** achieved by asynchronous designs
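The gap between the two can be illustrated with a list of hypothetical per-item delays. This sketch ignores the handshake and completion-detection overhead a real self-timed stage pays, so it overstates the asynchronous advantage.

```python
# Hypothetical per-item processing times (ns) for one logic stage:
# most items finish quickly, a few hit the worst-case path.
delays = [2, 3, 2, 10, 2, 3, 2, 2, 10, 3]

# Synchronous: every item gets a full clock period sized for the worst case.
sync_time = len(delays) * max(delays)

# Self-timed: each item takes only as long as it actually needs.
async_time = sum(delays)

print(sync_time, async_time)  # 100 39
```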






SLIDE 44

### **Asynchronous Timing**

#### HANDSHAKING PROTOCOLS

#### Four-Phase / RTZ / Level Signaling

- Specific protocol determines the data-release point



#### **Two-Phase / NRTZ / Transition Signaling**
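The practical difference between the two handshakes is how many wire transitions each transfer costs. The sketch below (hypothetical function names) lists the control-wire events for n transfers; it does not model the data wires or which event lets the sender release the data.

```python
def four_phase(n):
    """Return-to-zero handshake: each transfer is req+, ack+, req-, ack-."""
    events = []
    for i in range(n):
        events += [(i, "req", 1), (i, "ack", 1), (i, "req", 0), (i, "ack", 0)]
    return events


def two_phase(n):
    """Transition signaling: every edge on req or ack is one event."""
    events, req, ack = [], 0, 0
    for i in range(n):
        req ^= 1                        # sender: "new data is valid"
        events.append((i, "req", req))
        ack ^= 1                        # receiver: "data has been taken"
        events.append((i, "ack", ack))
    return events


print(len(four_phase(3)), len(two_phase(3)))  # 12 6
print(two_phase(2))
```

Two-phase signaling halves the transitions, but the logic must react to edges in either direction rather than to levels, since the wires do not return to zero between transfers.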






SLIDE 45

### **Asynchronous Timing**

#### DATA SIGNALING

**Bundled Data:** "normal" data wires, one per bit, with an associated "valid" signal



**Dual-Rail Data:** two wires per bit, encoded (00 = no data, 01 = 0, 10 = 1, 11 = error)
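Because each dual-rail pair says both *what* the bit is and *whether* it has arrived, the receiver can detect completion from the data itself, with no separate "valid" wire. A minimal sketch using the encoding above; the function names are hypothetical and the wire pairs are modeled as Python tuples.

```python
# Dual-rail code from the slide: two wires per bit.
NO_DATA, ZERO, ONE, ERROR = (0, 0), (0, 1), (1, 0), (1, 1)


def encode(bits):
    """Drive each bit onto its wire pair."""
    return [ONE if b else ZERO for b in bits]


def decode(pairs):
    """Return the received bits, or None if some bit has not arrived yet.

    The word is complete exactly when every pair has left the 00 state.
    """
    if ERROR in pairs:
        raise ValueError("illegal dual-rail code 11")
    if NO_DATA in pairs:
        return None
    return [1 if p == ONE else 0 for p in pairs]


word = encode([1, 0, 1])
print(word)                          # [(1, 0), (0, 1), (1, 0)]
print(decode(word))                  # [1, 0, 1]
print(decode([ONE, NO_DATA, ZERO]))  # None: still waiting for bit 1
```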






SLIDE 46

### **Asynchronous Timing**

#### **GOTCHA: Glitches on output (REQ line)**

#### **One Solution — CRITICAL PATH REPLICAS**

- Match the critical-path delay through the combinational logic block
- When a "start" signal indicates to begin processing, send signal through REPLICA DELAY block
- Combinational logic block is done processing when signal exits REPLICA DELAY








SLIDE 48

### **A Tale of Three Pipelines**

**Asynchronous: self-timed** 

Synchronous: global clock I ("normal" design: unbalanced pipe)

Synchronous: global clock II (intentionally skews clock to balance pipe, uses wave pipelining)





### **Asynchronous Pipeline**

Graduates one result roughly every 23.5 time units (say ns) on average, for an effective speed of 43MHz

**Delay through pipe for single item = 33/63ns** 





### **Synchronous Pipeline I**

Clock period = 31ns

By design, graduate one result every 31ns, for an effective speed of 32MHz

**Delay through pipe for single item = 93ns** 



### **Synchronous Pipeline II**



Clock period = 21ns; intentionally skew clock to register #3 to arrive 10ns *early* 

By design, graduate one result every 21ns, for an effective speed of 48MHz




Note: This requires use of *wave-pipelining* design techniques on last combinational logic block—very hard to do (probably much easier to do asynchronous design)
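The throughput figures quoted for the three pipelines follow directly from one result per period. A small check; the three-period latency for Synchronous Pipeline I is an inference from the 93 ns figure, not stated on the slide.

```python
def mhz(period_ns):
    """Throughput in MHz for one result every `period_ns` nanoseconds."""
    return 1000.0 / period_ns


for name, period in [("asynchronous (average)", 23.5),
                     ("synchronous I", 31.0),
                     ("synchronous II (skewed clock)", 21.0)]:
    print(f"{name}: {mhz(period):.1f} MHz")

# Latency of synchronous I, assuming an item spends three clock periods in the pipe.
print(3 * 31)  # 93
```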




### **Global Clock I**



- **0**: Assume data is stable for setup time before clock edge
- **1**: Rising edge of transmitter clock
- **2**: Transmitter begins to drive data (perhaps through logic)
- **3**: Signal reaches input of receiver.
- **4**: Rising edge of receiver clock
- **5**: Receiver latches data and drives internal signal lines



SLIDE 53

### **Global Clock II: Parallel Data**



- Skew and jitter eat into the timing budget
- Luckily, uncertainty does not accumulate beyond latches




### **Clock Skew & Pipelines**



Clock edge timing depends upon position

- A clock line behaves as a distributed RC line
- Each register sees a local clock time depending on its distance from the clock source -> clock skew

- Clock skew can severely affect performance
- Note: we assumed here t<sub>setup</sub>=0
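The Elmore delay from the background section quantifies this position-dependent skew. A minimal sketch, assuming the clock line is modeled as a uniform RC ladder driven by an ideal source at one end (segment values are hypothetical). For a long line the far-end delay approaches R_total·C_total/2, so it grows with the square of line length.

```python
def elmore_delays(n_segments, r_seg, c_seg):
    """Elmore delay at each tap of a uniform RC ladder driven at one end.

    Segment k has series resistance r_seg and capacitance c_seg to ground
    at its far end. The delay at tap k is the sum, over every capacitor i,
    of C_i times the resistance shared by the paths source->i and source->k,
    i.e. r_seg * c_seg * sum_i min(i, k). The driver is assumed ideal.
    """
    n = n_segments
    return [r_seg * c_seg * sum(min(i, k) for i in range(1, n + 1))
            for k in range(1, n + 1)]


# Hypothetical clock line: 4 segments of 50 ohm and 0.1 pF (ohm * pF = ps).
taps = elmore_delays(4, 50.0, 0.1)
print([round(t, 1) for t in taps])  # [20.0, 35.0, 45.0, 50.0]
print("skew, first to last tap:", round(taps[-1] - taps[0], 1), "ps")
```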




SLIDE 57

#### **Clock Constraints in Edge-Triggered Logic**

(1) $\delta_{skew} \le t_{r,min} + t_i + t_{l,min}$

(2) $T \ge t_{r,max} + t_i + t_{l,max} - \delta_{skew}$

- Maximum Clock Skew Determined by Minimum Delay between Latches (condition 1)
- Minimum Clock Period Determined by Maximum Delay between Latches (condition 2)




SLIDE 58

#### **Positive and Negative Skew**

#### **POSITIVE SKEW:**



The skew has to satisfy (1).

If it violates (1), the circuit malfunctions regardless of the clock period. If it satisfies (1), the minimum clock period decreases by $\delta_{skew}$ (condition 2)!!!

#### **NEGATIVE SKEW:**



(1) is satisfied implicitly.

The circuit operates correctly regardless of the skew.

The minimum clock period increases by $|\delta_{skew}|$
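Conditions (1) and (2) can be checked numerically for both signs of skew. A minimal sketch with hypothetical path delays; like the slides, it takes setup and hold times as zero.

```python
def check_edge_triggered(period, skew, t_r_min, t_r_max, t_i, t_l_min, t_l_max):
    """Apply conditions (1) and (2) (setup and hold times taken as 0).

    skew > 0 means the receiving register's clock arrives later than the
    sending register's clock. All times share one unit (ps here).
    Returns (race_free, period_ok, minimum_period).
    """
    race_free = skew <= t_r_min + t_i + t_l_min   # condition (1)
    t_min = t_r_max + t_i + t_l_max - skew        # condition (2)
    return race_free, period >= t_min, t_min


# Hypothetical path: register 100-150 ps, interconnect 50 ps, logic 200-1000 ps.
PATH = dict(t_r_min=100, t_r_max=150, t_i=50, t_l_min=200, t_l_max=1000)

for skew in (0, 200, 400, -200):
    print(skew, check_edge_triggered(1300, skew, **PATH))
```

With +200 ps of skew the minimum period drops from 1200 ps to 1000 ps; at +400 ps condition (1) fails and no clock period helps; at -200 ps the circuit is race-free but the 1300 ps period is no longer long enough.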












### **Clock Tree II (I with buffers)**



- Large synchronous systems require all components (chips or registers) to be driven by the clock signal.

- Clock signal paths and buffers can introduce both skew and jitter at each stage
- Jitter and skew accumulate in larger systems: more buffering, more skew and jitter.


SLIDE 62

## DEC Alpha 21164



9.3 M Transistors, 4 metal layers, 0.55µm

**Clock Freq: 300 MHz** 

- Clock load: 3.75 nF
- Power in clock = 20W (out of 50W)
- Two-level clock distribution:
  - Single 6-stage driver at center
  - Secondary buffers drive left and right side
- Max clock skew less than 100psec
- Routing the clock in the opposite direction
- Proper timing






SLIDE 64

### **Dual Edge Clocking**



#### **SINGLE-EDGE CLOCKING**

- Only one edge of clock latches data
- Duty cycle of clock signal is not relevant
- Clock signal operates at 2X the switching rate of data
- Always a clock edge where you need one

#### **DUAL-EDGE CLOCKING**

- Both edges of clock used to latch in data
- Duty cycle & rise/fall times of clock must be symmetric
- Clock signal must be phase-shifted by 90 degrees relative to the data signal
- How do you get 90 degrees??



### Phase Locked Loop



- Given a data signal, recover the frequency and phase of the data signal and generate a local reference clock $\phi_{out}$
- Local reference clock may be a frequency multiple of the input clock
- PLL depends on the data input to provide "enough" signal transitions to lock onto; otherwise the PLL could lose lock.
- Modern processors use PLLs for frequency multiplication
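Frequency multiplication can be illustrated with a linearized, discrete-time behavioral model updated once per reference cycle. Everything here is an illustrative assumption: an ideal phase detector, a proportional-plus-integral loop filter with made-up gains `kp` and `ki`, a VCO whose frequency responds instantly to its control, a divide-by-N in the feedback path, and a clean reference clock rather than recovered data (so the transition-density issue above is not modeled).

```python
def simulate_pll(f_ref_mhz, n_mult, f_free_mhz, kp=0.5, ki=0.1, cycles=200):
    """Lock a VCO to n_mult * f_ref; return (VCO MHz, phase error in ref cycles).

    Phases are counted in cycles. The VCO output is divided by n_mult before
    the phase detector, so lock means divided phase == reference phase.
    """
    u_free = f_free_mhz / (n_mult * f_ref_mhz)  # divided VCO freq / f_ref, uncontrolled
    theta_ref = theta_div = integ = 0.0
    u = u_free
    err = 0.0
    for _ in range(cycles):
        err = theta_ref - theta_div          # phase detector
        integ += err                         # integral path of the loop filter
        u = u_free + kp * err + ki * integ   # control -> VCO frequency
        theta_ref += 1.0                     # reference advances one cycle
        theta_div += u                       # divided VCO advances u cycles
    return u * n_mult * f_ref_mhz, err


f_vco, phase_err = simulate_pll(f_ref_mhz=100.0, n_mult=4, f_free_mhz=350.0)
print(f"{f_vco:.3f} MHz")        # 400.000 MHz
print(abs(phase_err) < 1e-9)     # True
```

The integral path is what removes the static error: the free-running 350 MHz VCO ends up at exactly 4 × 100 MHz with zero phase error. With these gains the loop is stable; other values may oscillate or diverge.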




SLIDE 66

### **Voltage Controlled Oscillator**



**Ring Oscillator** 

- VCO may be designed from a ring oscillator (an odd number of inverter stages in a feedback ring), where the control voltage adjusts the delay of each stage or selects the (odd) number of stages in the ring



- VCO may be designed from a resonant oscillator where the control voltage varies the capacitance in the LC circuit.
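Both oscillator types have simple first-order frequency formulas. This sketch assumes the control voltage acts only through the per-stage delay (ring) or the tank capacitance (LC); the mapping from voltage to delay or capacitance is not modeled, and all component values are hypothetical.

```python
import math


def ring_osc_freq_hz(n_stages, t_stage_s):
    """f = 1 / (2 * N * t_d): an edge travels around the ring twice per period."""
    if n_stages % 2 == 0:
        raise ValueError("a single-ended inverter ring needs an odd number of stages")
    return 1.0 / (2 * n_stages * t_stage_s)


def lc_osc_freq_hz(l_h, c_f):
    """Resonant frequency of an LC tank: f = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2 * math.pi * math.sqrt(l_h * c_f))


# Slowing each of 5 stages from 100 ps to 125 ps, or quadrupling C,
# both lower the output frequency.
print(f"{ring_osc_freq_hz(5, 100e-12) / 1e9:.2f} GHz -> "
      f"{ring_osc_freq_hz(5, 125e-12) / 1e9:.2f} GHz")
print(f"{lc_osc_freq_hz(1e-9, 1e-12) / 1e9:.2f} GHz -> "
      f"{lc_osc_freq_hz(1e-9, 4e-12) / 1e9:.2f} GHz")
```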



### **Delay Locked Loop**



- Given a data signal and a reference clock, compare them and adjust the phase of the local clock signal by $\Delta \phi$
- Unlike a PLL, requires a reference clock
- Hence, no need to "recover" the clock signal with a VCO
- Modern DRAMs with dual-edged clocking use DLLs for phase compensation (this gets you 90 degrees)
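One common way to get the 90-degree shift (an assumption here, not necessarily the circuit behind these slides) is a delay line of four identical stages locked to one full clock period, so each stage contributes a quarter period. A behavioral sketch with a bang-bang phase detector; the function name, step size, and 400 MHz clock are hypothetical.

```python
def lock_dll(t_clk_ps, stages=4, d_init_ps=50.0, step_ps=1.0, iters=1000):
    """Bang-bang DLL: tune a per-stage delay until `stages` stages span one period.

    Each iteration, the phase detector reports whether the edge leaving the
    last stage arrives before the next reference edge (early) or not (late),
    and the control nudges every stage's delay by step_ps. The returned
    per-stage delay dithers within one step of t_clk_ps / stages.
    """
    d = d_init_ps
    for _ in range(iters):
        if stages * d < t_clk_ps:   # delayed edge early: lengthen the line
            d += step_ps
        else:                       # delayed edge late or aligned: shorten it
            d -= step_ps
    return d


T_CLK = 2500.0                      # hypothetical 400 MHz clock, in ps
d = lock_dll(T_CLK)
print(d, "ps per stage ->", round(360.0 * d / T_CLK, 1), "degrees at tap 1")
```

Because the DLL only delays the reference clock and never generates a new one, there is no VCO and no frequency to acquire, matching the points above.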



SLIDE 68

### **Zero-Skew Clock Distribution**

PLL or DLL



