### High Speed Communication Circuits and Systems, Lecture 22: Delay-Locked Loops and High Speed Circuit Highlights

Michael H. Perrott, April 28, 2004

Copyright © 2004 by Michael H. Perrott All rights reserved.

M.H. Perrott

### Recall the CDR Model (Hogge Det.) From Lecture 21



- Similar to the frequency synthesizer model, except:
  - No divider
  - Phase detector gain depends on the transition density of the input data

### Key Observation: Must Use a Type II Implementation



Integrator in H(s) forces the steady-state phase error to zero

This is important for achieving an aligned clock and for minimizing jitter
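To see why the integrator matters, here is a minimal discrete-time sketch (one update per reference cycle, phases in cycles). It assumes a unit VCO gain, a fixed free-running frequency error `freq_offset`, and hypothetical gains `kp` and `ki`. With `ki = 0` the VCO supplies the only integrator; with `ki > 0` the loop filter adds the second one.

```python
def steady_state_phase_error(kp: float, ki: float,
                             freq_offset: float = 1e-3,
                             steps: int = 5000) -> float:
    """Simulate a sampled PLL and return the final phase error in cycles.

    Illustrative assumptions: unit VCO gain, input phase fixed at 0, and a VCO
    whose free-running frequency is off by `freq_offset` cycles per update.
    """
    phase_out = 0.0  # VCO phase relative to the input (cycles)
    integ = 0.0      # integrator state inside H(s); has no effect when ki == 0
    err = 0.0
    for _ in range(steps):
        err = 0.0 - phase_out               # phase detector output
        integ += err                        # integrator in the loop filter
        control = kp * err + ki * integ     # loop filter output
        phase_out += freq_offset + control  # the VCO integrates frequency into phase
    return err


type_1 = steady_state_phase_error(kp=0.1, ki=0.0)
type_2 = steady_state_phase_error(kp=0.1, ki=0.005)
print(f"Type I error:  {type_1:.6f} cycles")  # settles at -freq_offset / kp
print(f"Type II error: {type_2:.2e} cycles")  # settles to (numerically) zero
```

The Type I loop needs a nonzero phase error to hold the control value that cancels the frequency error; in the Type II loop the integrator holds that value instead.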


### Issue: Type II System Harder to Design than Type I



#### A stabilizing zero is required

Undesired closed loop pole/zero doublet causes peaking

### **Delay Locked Loops**



#### Delay element used in place of a VCO

- No integration from voltage input to phase output
- System is Type I
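Because the delay line adds no integrator, a DLL whose loop filter is a single integrator (for example, a charge pump into a capacitor) behaves as a first-order loop. A minimal sampled sketch, assuming a hypothetical normalized loop gain `gain` and a constant target delay in fractions of a clock period, shows monotonic settling without any stabilizing zero:

```python
def dll_settling(target: float, gain: float = 0.5, steps: int = 40) -> list[float]:
    """Return the delay (fraction of a clock period) after each update.

    Illustrative first-order model: the loop filter is the only integrator,
    so delay[n+1] = delay[n] + gain * (target - delay[n]).
    """
    delay = 0.0
    history = []
    for _ in range(steps):
        err = target - delay  # phase detector output
        delay += gain * err   # integrating loop filter drives the delay line
        history.append(delay)
    return history


trace = dll_settling(target=0.3)
# The error shrinks by a factor (1 - gain) per update: no overshoot for 0 < gain < 1.
print([round(d, 4) for d in trace[:5]], trace[-1])
```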


### System Design Is Easier Than For CDR



### **Example Delay-Locked Loop Implementation**



- Assume an input clock is provided that is perfectly matched in frequency to the data sequence
  - However, phase must be adjusted to compensate for propagation delays between clock and data on the PC board
- A variable delay element is used to lock phase to appropriate value
  - Phase detector can be similar to that used in a CDR
    - Hogge, Bang-Bang, or other structures possible
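A bang-bang phase detector reports only the sign of the phase error. A minimal sketch, assuming a hypothetical 37.4 ps board skew, a 1 ps delay step, and an ideal sign detector, shows the delay walking up to the skew and then dithering by one step around it:

```python
def bang_bang_dll(skew_ps: float, step_ps: float = 1.0, cycles: int = 100) -> list[float]:
    """Delay setting (ps) after each update of an idealized bang-bang DLL."""
    delay = 0.0
    trace = []
    for _ in range(cycles):
        early = delay < skew_ps                  # sign-only (bang-bang) phase detection
        delay += step_ps if early else -step_ps  # fixed-size correction either way
        trace.append(delay)
    return trace


trace = bang_bang_dll(skew_ps=37.4)
print(trace[-4:])  # -> [37.0, 38.0, 37.0, 38.0]
```

Once locked, the residual dither is set by the step size, which is one reason the step size matters for jitter in such loops.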

### The Catch



- Delay needs to support an infinite range if the system is to be operated continuously
  - Otherwise, the loop can end up at the end of the delay element's range
    - It then can no longer accommodate temperature variations
- Methods have been developed to achieve infinite range delay elements
  - Efficient implementation of such delay elements is often the key issue for high performance designs
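The range problem can be sketched by letting the required delay drift slowly (for example, with temperature). Everything here is a hypothetical model: a first-order DLL, a drift of 0.002 clock periods per update, one delay line clamped to half a period, and one that wraps modulo a full period (a clock shifted by a full period is the same clock).

```python
def worst_phase_error(wrap: bool, drift: float = 0.002, updates: int = 1000,
                      max_delay: float = 0.5, gain: float = 0.5) -> float:
    """Worst-case phase error (clock periods) while the required delay drifts.

    Hypothetical model: the delay line either wraps modulo one clock period
    (infinite range) or is clamped to [0, max_delay] (finite range).
    """
    required = 0.0
    delay = 0.0
    worst = 0.0
    for _ in range(updates):
        required += drift                            # slow drift, e.g. with temperature
        err = (required - delay + 0.5) % 1.0 - 0.5   # phase error wrapped to [-0.5, 0.5)
        worst = max(worst, abs(err))
        delay += gain * err
        if wrap:
            delay %= 1.0                             # infinite range: wrap by a full period
        else:
            delay = min(max(delay, 0.0), max_delay)  # finite range: stuck at an end
    return worst


print(worst_phase_error(wrap=True))   # small: steady tracking lag of about drift / gain
print(worst_phase_error(wrap=False))  # large: approaches half a clock period
```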

### The Myth



- Delay-locked loop designers always point to the jitter accumulation problem of phase-locked loops
  - The implication is that delay-locked loops can achieve much lower jitter than clock and data recovery circuits
- The reality: phase-locked loops can actually achieve lower jitter than delay-locked loops
  - PLLs can clean up high frequency jitter of the input clock
  - Whether a PLL or DLL is better depends on application (and achievable VCO performance)

### **One Method of Achieving Infinite Delay**

 $\cos(2\pi f_{in}t + \Phi) = \cos(2\pi f_{in}t)\cos(\Phi) - \sin(2\pi f_{in}t)\sin(\Phi)$ 



Phase shift of a sine wave can be implemented with I/Q modulation

$$I = \cos(\Phi), \quad Q = \sin(\Phi)$$

- Note: an infinite delay range allows the DLL to adjust frequency as well as phase
  - In that case, the phase adjustment must vary continuously
  - Hard to get low jitter in practical implementations
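The I/Q identity can be checked numerically. In this sketch `phi` stands for Φ, and the 1 GHz input clock and 10 MHz offset are hypothetical values. A fixed Φ (of any size, so the range is unbounded) gives a pure phase shift, while a Φ that ramps continuously gives a frequency shift:

```python
import math

def iq_phase_shift(t: float, f_in: float, phi: float) -> float:
    """cos(2*pi*f_in*t + phi) built only from I/Q weights on cos and sin."""
    i_weight, q_weight = math.cos(phi), math.sin(phi)  # I = cos(phi), Q = sin(phi)
    w = 2 * math.pi * f_in * t
    return i_weight * math.cos(w) - q_weight * math.sin(w)

F_IN = 1.0e9   # hypothetical 1 GHz input clock
DF = 10e6      # hypothetical 10 MHz frequency offset
times = [k * 37e-12 for k in range(50)]

# A fixed phi gives a pure phase shift; phi is not limited to one period.
phi = 7.3 * math.pi
phase_dev = max(abs(iq_phase_shift(t, F_IN, phi) - math.cos(2 * math.pi * F_IN * t + phi))
                for t in times)

# A continuously ramping phase, phi(t) = 2*pi*DF*t, shifts the frequency instead.
freq_dev = max(abs(iq_phase_shift(t, F_IN, 2 * math.pi * DF * t)
                   - math.cos(2 * math.pi * (F_IN + DF) * t))
               for t in times)
print(phase_dev, freq_dev)  # both at floating-point rounding level
```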

### **Conceptual Implementation of Infinite Delay Range**



Practical designs often implement the cos(Φ) and sin(Φ) signals as phase-shifted triangle waves
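A minimal sketch of what the triangle approximation costs, assuming ideal linear I/Q combining: it computes the phase actually produced by triangle-wave I and Q weights and compares it with Φ (written `phi`). Under these assumptions the worst-case deviation comes out at about 4 degrees.

```python
import math

def tri_cos(phi: float) -> float:
    """Triangle wave with the same peaks and zero crossings as cos(phi)."""
    x = (phi + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return 1.0 - 2.0 * abs(x) / math.pi

def tri_sin(phi: float) -> float:
    """Triangle-wave stand-in for sin(phi): tri_cos delayed by 90 degrees."""
    return tri_cos(phi - math.pi / 2)

def effective_phase(phi: float) -> float:
    """Phase produced when I = tri_cos(phi) and Q = tri_sin(phi)."""
    return math.atan2(tri_sin(phi), tri_cos(phi))

errors = []
for k in range(3600):
    phi = 2 * math.pi * k / 3600
    e = effective_phase(phi) - phi
    errors.append((e + math.pi) % (2 * math.pi) - math.pi)  # compare modulo 2*pi

worst_deg = math.degrees(max(abs(e) for e in errors))
print(f"worst-case phase error: {worst_deg:.2f} degrees")
```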


### Some References on CDR's and Delay-Locked Loops

- Tom Lee et al. pioneered the infinite-range DLL approach described above
  - See T. Lee et al., "A 2.5 V CMOS Delay-Locked Loop for an 18 Mbit, 500 Megabyte/s DRAM", JSSC, Dec 1994
- Check out papers from Mark Horowitz's group at Stanford
  - Oversampling data recovery approach
    - See C-K K. Yang et al., "A 0.5-um CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery using Oversampling", JSSC, May 1998
  - Multi-level signaling
    - See Ramin Farjad-Rad et al., "A 0.3-um CMOS 8-Gb/s 4-PAM Serial Link Transceiver", JSSC, May 2000
  - Bi-directional signaling
    - See E. Yeung, "A 2.4 Gb/s/pin simultaneous bidirectional parallel link ....", JSSC, Nov 2000

# High Speed Circuit Highlights

### Examine Techniques from a Few Recent Papers

- Circuit architectures utilizing circular topologies
  - "A 40-Gb/s Clock and Data Recovery Circuit in 0.18-um CMOS Technology", Jri Lee and Behzad Razavi, JSSC, Dec. 2003
  - "Fully Integrated CMOS Power Amplifier Design Using the Distributed Active-Transformer Architecture", Ichiro Aoki, ..., Ali Hajimiri, JSSC, March 2002
  - "A Circular Standing Wave Oscillator", W. Andress, Donhee Ham, ISSCC 2004
    - Donhee will talk about this (and other things) in his guest lecture
- Low Noise, High Bandwidth Sigma-Delta Fractional-N Frequency Synthesizers
  - "A Fractional-N Frequency Synthesizer Architecture Utilizing a Mismatch Compensated PFD/DAC Structure ...", Scott Meninger, Michael Perrott, TCASII, Nov 2003

## A 40 Gb/s CDR in 0.18u CMOS! (Lee and Razavi)



#### Achieves high speed operation using interleaving

- 4 parallel 10 Gb/s detectors are fed by an 8-phase VCO
  - 4 phases used for sampling registers
  - 4 phases used for bang-bang phase detection registers

### Key challenges

- Low jitter and low mismatch between clock phases
  - We will look at this issue in detail here
- Achievement of 10 Gb/s sampling/bang-bang detection

### The Need for Low Mismatch Between Clock Phases



8 phases generated from 4 VCO clock signals and their complements

Desired spacing between clock signals is only 12.5 ps!

- Must meet setup and hold times of each 10 Gb/s sampler and phase detector register (limited by 0.18u technology)
- Mismatch and jitter on clock phases quickly eats into any margin left over after meeting setup/hold times
  - Unacceptable bit error rates can easily result
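The 12.5 ps figure follows from the interleaving arithmetic. In this sketch, the assignment of alternate phases to data samplers and bang-bang (edge) registers is an assumption consistent with the 4 + 4 split above, and `demux` is a hypothetical helper showing which bits each 10 Gb/s path sees.

```python
BIT_RATE = 40e9   # 40 Gb/s input
LANES = 4         # four interleaved 10 Gb/s paths
NUM_PHASES = 8    # 4 VCO outputs plus their complements

lane_rate = BIT_RATE / LANES               # each register runs at 10 Gb/s
clock_period_ps = 1e12 / lane_rate         # 100 ps VCO period
spacing_ps = clock_period_ps / NUM_PHASES  # 12.5 ps between adjacent phases
phases_ps = [k * spacing_ps for k in range(NUM_PHASES)]

# Assumed assignment: alternate phases sample bit centers and bit edges.
data_phases_ps = phases_ps[0::2]  # 0, 25, 50, 75 ps -> sampling registers
edge_phases_ps = phases_ps[1::2]  # 12.5, 37.5, 62.5, 87.5 ps -> bang-bang registers

def demux(bits: list[int], lanes: int = LANES) -> list[list[int]]:
    """Lane k receives bits k, k + lanes, k + 2*lanes, ..."""
    return [bits[k::lanes] for k in range(lanes)]

print(spacing_ps, data_phases_ps, edge_phases_ps)
print(demux([1, 0, 1, 1, 0, 0, 1, 0]))
```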

### A Method to Generate Clock Phases



- Use transmission delay lines to generate each phase
- Advantage over using buffers as delay elements
  - Wide bandwidth and lower noise
  - Mismatch only a function of geometry variation
    - Buffer mismatch a function of both geometry and device variation (i.e., doping variation, etc.)
- Issue: transmission line is big
  - Loss (and finite bandwidth) due to finite resistance of metal
  - Long distance between clock phase outputs undesirable
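A rough sense of "big": a minimal sketch assuming an ideal lossless TEM line in a dielectric with a hypothetical relative permittivity of 3.9 (SiO2-like). Real on-chip lines will differ, but the order of magnitude is the point.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)
EPS_R = 3.9         # assumed SiO2-like dielectric

def line_length_m(delay_s: float, eps_r: float = EPS_R) -> float:
    """Length of an ideal TEM line giving delay_s, with v = c / sqrt(eps_r)."""
    velocity = C0 / math.sqrt(eps_r)
    return velocity * delay_s

per_phase_mm = line_length_m(12.5e-12) * 1e3  # one 12.5 ps phase step
half_period_mm = line_length_m(50e-12) * 1e3  # CLK0 to CLK180 at 10 GHz
print(f"{per_phase_mm:.2f} mm per phase, {half_period_mm:.1f} mm for 180 degrees")
```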

### Realize a Lumped Parameter Version of Trans. Line



Approximate transmission line as an LC ladder network

- Allows a much more compact implementation
- Offers the same advantage of having mismatch depend only on geometry
- Issue: now that mismatch has been dealt with, how do we achieve low jitter?
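Each LC section delays a signal by about sqrt(LC) (well below the ladder cutoff) and has characteristic impedance sqrt(L/C). A minimal sizing sketch for one 12.5 ps section, assuming a hypothetical 50 Ω characteristic impedance:

```python
import math

def lc_section(delay_s: float, z0_ohm: float) -> tuple[float, float]:
    """L and C of one ladder section with delay sqrt(L*C) and impedance sqrt(L/C)."""
    inductance = z0_ohm * delay_s   # L = Z0 * td
    capacitance = delay_s / z0_ohm  # C = td / Z0
    return inductance, capacitance

L, C = lc_section(delay_s=12.5e-12, z0_ohm=50.0)  # Z0 = 50 ohm is hypothetical
cutoff_hz = 1.0 / (math.pi * math.sqrt(L * C))    # ladder cutoff: omega_c = 2 / sqrt(LC)
print(f"L = {L * 1e12:.0f} pH, C = {C * 1e15:.0f} fF, cutoff = {cutoff_hz / 1e9:.1f} GHz")
```

The cutoff lands well above the 10 GHz clock, which is what keeps the per-section delay close to sqrt(LC) at the operating frequency.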



- Can satisfy the Barkhausen criterion by inverting the output of the line and feeding it back to the input
  - Looks a bit like a ring oscillator, but with much better phase noise performance
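For the phase condition, the inversion supplies 180 degrees, so the line itself must supply the other 180 degrees: its total delay is half an oscillation period. A minimal sketch of the lowest-frequency mode, ignoring amplitude and loss:

```python
def ring_frequency_hz(sections: int, delay_per_section_s: float) -> float:
    """Lowest oscillation frequency of a delay line closed with an inversion.

    Inversion = 180 degrees, line delay = the other 180 degrees, so the
    total line delay equals half a period.
    """
    line_delay = sections * delay_per_section_s
    return 1.0 / (2.0 * line_delay)

f_osc = ring_frequency_hz(sections=4, delay_per_section_s=12.5e-12)
print(f"{f_osc / 1e9:.1f} GHz")  # 4 x 12.5 ps = 50 ps = half of a 100 ps period
```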

## Sustain Oscillation by Including Negative Resistance



- Place negative resistance at each phase to keep amplitudes identical
  - Must be careful to minimize impact on mismatch
- Issue: how do you match feedback path from CLK<sub>180</sub> to CLK<sub>0</sub> with other phases?

### **Use a Circular Geometry!**

![](_page_20_Figure_1.jpeg)

#### Note use of differential inductors, etc.


### Other Nice Nuggets in the Razavi Paper

- Phase detection using 4 bang-bang detectors
  - Clever combining of individual detectors to create an overall control voltage
  - Note: Bang-bang detection linearized by metastable behavior of registers
- Achievement of 10 Gb/s registers in 0.18u CMOS
  - Leverages a large amplitude clock signal using a tuned VCO buffer
  - Uses SCL registers with resistor loads, with the bottom current sources eliminated to leverage the large amplitude clock
- Fast XOR gate and amplifier structures

Take a look at the paper for more details: "A 40-Gb/s Clock and Data Recovery Circuit in 0.18-um CMOS Technology", Jri Lee and Behzad Razavi, JSSC, Dec. 2003

### A 2 Watt, 2.4 GHz CMOS Power Amplifier (Hajimiri et al.)

![](_page_22_Figure_1.jpeg)

- Key issue facing CMOS power amps:
  - Breakdown voltage is too low for transistors with sufficient speed
  - Example
    - 0.35u CMOS limited to about a 3V supply
    - To keep M<sub>1</sub> in saturation, assume we need V<sub>out</sub> > 0.5 V

$$\implies P_{out} \le \frac{((3.0 - 0.5)/\sqrt{2})^2}{R_L} = \frac{2.5^2}{2 \cdot 50\Omega} = 63 \text{mW}$$
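The bound above is simple arithmetic; a minimal sketch that reproduces it, assuming (as the slide does) a peak swing of 3.0 V - 0.5 V delivered as a sinusoid into a 50 Ω load:

```python
import math

def max_output_power_w(v_supply: float, v_min: float, r_load: float) -> float:
    """Max sinusoidal output power with peak swing limited to v_supply - v_min.

    Peak = v_supply - v_min; RMS = peak / sqrt(2); P = Vrms^2 / R.
    """
    v_peak = v_supply - v_min
    v_rms = v_peak / math.sqrt(2)
    return v_rms ** 2 / r_load

p = max_output_power_w(v_supply=3.0, v_min=0.5, r_load=50.0)
print(f"{p * 1e3:.1f} mW")  # 62.5 mW, i.e. the ~63 mW quoted above
```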

### Key Idea: Use a Transformer!

![](_page_23_Figure_1.jpeg)

To achieve 1 Watt at the output, we need:

$$Z_{in} \le \frac{((3.0 - 0.5)/\sqrt{2})^2}{P_{out}} = \frac{2.5^2}{2 \cdot 1 \text{ Watt}} = 3.1 \Omega$$

- We know that:  $Z_{in} = \frac{1}{n^2} R_L$
- Therefore, setting n = 4 is adequate:

$$n = 4 \Rightarrow Z_{in} = \frac{1}{4^2} 50\Omega = 3.1\Omega$$
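The same reasoning generalizes to any target power. A minimal sketch, assuming an ideal lossless 1:n transformer, the 2.5 V peak swing used above, and a 50 Ω load; the 2 W case simply applies these idealized numbers to the power level in the slide title:

```python
import math

def min_turns_ratio(p_target_w: float, v_peak: float = 2.5, r_load: float = 50.0) -> float:
    """Smallest 1:n ratio such that Z_in = R_L / n^2 supports p_target_w.

    Requires Z_in <= v_peak^2 / (2 * P_out).
    """
    z_in_max = v_peak ** 2 / (2 * p_target_w)
    return math.sqrt(r_load / z_in_max)

print(min_turns_ratio(1.0))  # 4.0 for the 1 W example above
print(min_turns_ratio(2.0))  # about 5.66 under the same idealized assumptions
```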

### A Practical Issue for High Frequency Transformers

![](_page_24_Figure_1.jpeg)

- High frequency transformers are formed by coupled inductors
  - Will typically have a net inductive impedance at the operating frequency (assuming self-resonant frequency is well above operating frequency)

Use a capacitor to resonate out the inductive component of Z<sub>in</sub> at the desired frequency
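The capacitor value follows from C = 1/(ω₀²L). A minimal sketch, assuming a simple LC resonance and a hypothetical 1 nH net input inductance at 2.4 GHz:

```python
import math

def resonating_cap_f(inductance_h: float, f0_hz: float) -> float:
    """Capacitance that resonates with inductance_h at f0_hz: C = 1 / (w0^2 * L)."""
    w0 = 2 * math.pi * f0_hz
    return 1.0 / (w0 ** 2 * inductance_h)

L_in = 1e-9  # hypothetical 1 nH net inductance at the input
C_res = resonating_cap_f(L_in, 2.4e9)
f_check = 1.0 / (2 * math.pi * math.sqrt(L_in * C_res))  # confirm the resonance
print(f"C = {C_res * 1e12:.2f} pF, resonance at {f_check / 1e9:.3f} GHz")
```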

### The Issue of Bondwires

![](_page_25_Figure_1.jpeg)

- The presence of bondwires will alter the impedance seen by the transistor
  - Would prefer to desensitize the circuit to the bondwire inductances

### The Fix

![](_page_26_Figure_1.jpeg)

- A differential topology places the bondwire nodes at incremental ground
  - Bondwire inductance now has little impact

### How Do We Implement the Transformer?

![](_page_27_Figure_1.jpeg)

#### Classical options

- Spiral 1:n transformer
  - Problem: very lossy

#### **Resonant L-match or** $\pi$ **-match transformer**

Problem: still too lossy (though better than a spiral transformer)

A novel approach by Aoki & Hajimiri: Create a distributed, active transformer

### A 1:4 Transformer Achieved Using Four 1:1 Sections

![](_page_28_Figure_1.jpeg)

- 1:1 transformers can be implemented much more efficiently than their 1:4 counterparts
  - Winding ratio is one-to-one, and integrated processes allow very close proximity between the two windings
- Cascading of the secondary windings leads to their output voltages being summed
  - Net effect is a 1:4 transformer!
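A minimal sketch of the summing, assuming ideal lossless 1:1 sections and reusing the 2.5 V peak swing and 50 Ω load from the earlier example: the series secondaries reach the 1 W target, and each section handles a quarter of the power while seeing R<sub>L</sub>/4.

```python
def series_secondaries(v_peak_per_section: float, sections: int, r_load: float) -> dict:
    """Ideal lossless 1:1 sections whose secondaries are connected in series."""
    v_out_peak = sections * v_peak_per_section  # secondary voltages add
    i_peak = v_out_peak / r_load                # same current flows in every secondary
    p_out = v_out_peak * i_peak / 2             # average power of a sinusoid
    return {
        "v_out_peak": v_out_peak,
        "p_out": p_out,
        "p_per_section": p_out / sections,                  # equal share per section
        "r_seen_per_section": v_peak_per_section / i_peak,  # equals r_load / sections
    }

result = series_secondaries(v_peak_per_section=2.5, sections=4, r_load=50.0)
print(result)
```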

### Implementation of 1:1 Transformer Sections

![](_page_29_Figure_1.jpeg)

High efficiency using slab (i.e., straight wire) inductors

Avoids inefficiency of current crowding at corners of windings

### Problem: Long Wires Required for Diff. Pair Elements

![](_page_30_Figure_1.jpeg)

- The use of slab inductors would seem to imply that an equally long return path for the current is required
  - Implies that long wires are required for connection to capacitor and differential pair transistors
    - Issue: loss and undesired inductance

### A Clever Fix: Redefine The Differential Pairs

![](_page_31_Figure_1.jpeg)

- Observation: adjacent transistors of neighboring transformer sections have opposite-polarity signals on their gates
  - Can define differential pairs to be between the sections rather than within each section
  - Short wires can now be achieved for capacitor and transistors

Issue: what do you do about the ends?

### Use a Circular Topology!

![](_page_32_Figure_1.jpeg)

Removes the end effects!

### **Other Issues to Consider**

- Efficient achievement of 50 Ohm matching at the input of the amplifier
- Efficiency calculations
- Input power distribution
- Harmonic suppression

Take a look at the paper for more details: "Fully Integrated CMOS Power Amplifier Design Using the Distributed Active-Transformer Architecture", Ichiro Aoki, Scott Kee, David Rutledge, Ali Hajimiri, JSSC, March 2002

### Wide Bandwidth, Low Noise Fractional-N Synthesizers

Fractional-N frequency synthesis

![](_page_34_Figure_2.jpeg)

- Achieves very high frequency resolution
- There is a noise/bandwidth tradeoff
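A minimal sketch of divide value dithering with a first-order accumulator. The integer divide value, accumulator width, and reference frequency are hypothetical. The average divide ratio is N + K/2<sup>B</sup>, so the frequency step is f<sub>ref</sub>/2<sup>B</sup>:

```python
def divide_sequence(n_int: int, k: int, bits: int, cycles: int) -> list[int]:
    """Divide values from a first-order accumulator (hypothetical parameters).

    The average divide ratio is n_int + k / 2**bits.
    """
    modulus = 1 << bits
    acc = 0
    seq = []
    for _ in range(cycles):
        acc += k
        carry = acc >= modulus   # overflow -> divide by n_int + 1 this cycle
        acc -= modulus * carry
        seq.append(n_int + carry)
    return seq

seq = divide_sequence(n_int=48, k=5, bits=4, cycles=16)
print(seq, sum(seq) / len(seq))  # average is exactly 48 + 5/16

f_ref = 50e6                     # hypothetical reference frequency
print(f"{f_ref / 2**20:.1f} Hz step with a 20-bit accumulator")
```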

### The Issue of Quantization Noise

![](_page_35_Figure_1.jpeg)

- Divide value dithering introduces noise
- Sigma-Delta modulation shapes noise to high frequencies
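A minimal sketch of the shaping, assuming one common choice of modulator, a second-order MASH 1-1 (the slide does not name the structure). It measures how little of the quantization noise energy falls below 5% of the sample rate; the DFT is computed directly with `cmath` to stay dependency-free, and the fractional value is arbitrary.

```python
import cmath

def mash_1_1(frac: float, n: int) -> list[int]:
    """Second-order MASH 1-1 divide-value offsets whose average is frac."""
    acc1 = acc2 = 0.0
    prev_c2 = 0
    out = []
    for _ in range(n):
        acc1 += frac
        c1 = int(acc1 >= 1.0)
        acc1 -= c1
        acc2 += acc1                   # stage 2 quantizes the stage 1 residue
        c2 = int(acc2 >= 1.0)
        acc2 -= c2
        out.append(c1 + c2 - prev_c2)  # y = c1 + (1 - z^-1) c2
        prev_c2 = c2
    return out

def low_band_fraction(x: list[float], f_max: float) -> float:
    """Fraction of total energy in DFT bins with 0 < |f| <= f_max (cycles/sample)."""
    n = len(x)
    total = n * sum(v * v for v in x)  # Parseval: energy summed over all bins
    low = 0.0
    for k in range(1, int(f_max * n) + 1):
        bin_k = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
        low += 2 * abs(bin_k) ** 2     # count the +k and -k bins
    return low / total

frac = 0.3141
y = mash_1_1(frac, 2048)
q = [v - frac for v in y]              # quantization noise of the dithered divide value
print(sum(y) / len(y))                 # close to frac
print(low_band_fraction(q, 0.05))      # small: the noise is pushed to high frequencies
```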


### Impact of $\Sigma$-$\Delta$ Quantization Noise on Synth. Output

![](_page_36_Figure_1.jpeg)

- Lowpass action of PLL dynamics suppresses the shaped $\Sigma$-$\Delta$ quantization noise

### Impact of Increasing the PLL Bandwidth

![](_page_37_Figure_1.jpeg)

Higher PLL bandwidth leads to less quantization noise suppression

There is a direct trade-off between PLL bandwidth and jitter
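The trade-off can be sketched numerically under an illustrative model only: the output phase noise from a MASH 1-1 is taken as proportional to |1 - z<sup>-1</sup>|² (residue treated as white), filtered by a first-order PLL lowpass with a hypothetical 50 MHz reference. Real loops roll off more steeply, but the integrated noise still rises with bandwidth.

```python
import math

def integrated_qnoise(bandwidth_hz: float, f_ref: float = 50e6, points: int = 20000) -> float:
    """Relative integrated output phase noise (arbitrary units).

    Assumed model: PSD ~ (2 sin(pi f / f_ref))^2 times a first-order lowpass
    1 / (1 + (f / bandwidth)^2), integrated from 0 to f_ref / 2.
    """
    df = (f_ref / 2) / points
    total = 0.0
    for i in range(points):
        f = (i + 0.5) * df  # midpoint rule
        shaped = (2 * math.sin(math.pi * f / f_ref)) ** 2
        lowpass = 1.0 / (1.0 + (f / bandwidth_hz) ** 2)
        total += shaped * lowpass * df
    return total

reference = integrated_qnoise(100e3)
for bw in (100e3, 300e3, 1e6):
    rel_db = 10 * math.log10(integrated_qnoise(bw) / reference)
    print(f"{bw / 1e3:6.0f} kHz bandwidth: {rel_db:5.1f} dB relative to 100 kHz")
```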

### Method 1 of Reducing Quantization Noise

![](_page_38_Figure_1.jpeg)

- Lower quantization step size by switching between multiple phases of the VCO output
  - Generate phases by using a ring oscillator or delay locked loop
- Issue: noise induced by mismatch between phases
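With ideal, perfectly matched phases, quantization noise power scales with the square of the step size, so dividing the step by P lowers it by 20·log10(P) dB. A minimal sketch of that scaling:

```python
import math

def qnoise_reduction_db(num_phases: int) -> float:
    """Quantization noise reduction when the divider step shrinks from one VCO
    period to 1/num_phases of a period (noise power ~ step^2, ideal phases)."""
    return 20 * math.log10(num_phases)

for p in (2, 4, 8, 16):
    print(f"{p:2d} phases: {qnoise_reduction_db(p):4.1f} dB lower quantization noise")
```

Any mismatch between the phases adds its own error, which is the issue noted above.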

### Method 2 of Reducing Quantization Noise

![](_page_39_Figure_1.jpeg)

- Use classical fractional-N approach of "phase interpolation" to cancel out quantization noise
  - Use a D/A converter matched to PFD/Charge Pump output
- Issue: limited by mismatch between gain of D/A and PFD/Charge Pump output and nonlinearity in D/A
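Gain mismatch sets a hard ceiling on cancellation: if the DAC gain is off by a relative error ε, the residual is ε times the original error, so suppression is limited to -20·log10|ε| dB. A minimal sketch, assuming a linear DAC and random stand-in error samples:

```python
import math
import random

def rms(xs: list[float]) -> float:
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def cancellation_db(gain_error: float, n: int = 10_000, seed: int = 1) -> float:
    """Noise suppression (dB) when a DAC with relative gain error cancels
    the PFD/charge pump quantization error (linear DAC assumed)."""
    rng = random.Random(seed)
    errors = [rng.uniform(-0.5, 0.5) for _ in range(n)]      # stand-in error samples
    residual = [e - (1.0 + gain_error) * e for e in errors]  # PFD output minus DAC output
    return 20 * math.log10(rms(errors) / rms(residual))

for g in (0.1, 0.01):
    print(f"{g:.0%} gain mismatch -> {cancellation_db(g):.1f} dB suppression")
```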


### **Comparison of Approaches**

![](_page_40_Figure_1.jpeg)

"Vertical" approach

![](_page_40_Figure_3.jpeg)

### Which Is Best?

- Phase shifting
  - Limited by number of phases that can be generated and their mismatch
  - Ring oscillators have poor phase noise
- Phase interpolation
  - Limited by ability to match DAC output to that of the PFD/Charge pump
  - High spurious noise can result due to DAC nonlinearity

#### Key observation

- Phase interpolation allows us to take advantage of advances in DAC design over the last 20 years
  - We can now largely overcome the above limitations!

### **Two Recent Approaches to the Cancellation Method**

- "A Wideband 2.4-GHz Delta-Sigma Fractional-N PLL With 1-Mb/s In-Loop Modulation", Sudhakar Pamarti and Ian Galton, JSSC, Jan 2004
  - Impact of DAC mismatch mitigated by using a $\Sigma$-$\Delta$ modulator rather than an accumulator to perform dithering
  - Impact of DAC nonlinearity mitigated by using mismatch noise shaping techniques
  - Overall: reliably achieves 20 dB noise suppression
- "A Fractional-N frequency synthesizer architecture utilizing a mismatch compensated PFD/DAC structure...", Scott Meninger and M.H. Perrott, TCAS II, Nov 2003
  - Utilizes a mismatch compensated PFD/DAC structure
  - Simulations show that 40 dB noise suppression is achievable!

### Key Element: A PFD/DAC Structure

![](_page_43_Figure_1.jpeg)

Leverages application of selective delays of parallel PFD outputs to realize the D/A function

- No explicit D/A required
- Delay of one VCO cycle can be easily achieved using registers clocked by the VCO
- Illustrate the idea through animation

### Apply Phase Shift to Two out of the Four PFD's

![](_page_44_Figure_1.jpeg)

#### Net horizontal level shifts to halfway point

### Apply Phase Shift to Three out of the Four PFD's

![](_page_45_Figure_1.jpeg)

#### Net horizontal level shifts up

### DAC function is self-aligned in gain to PFD output!
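A minimal charge-domain sketch of why the gain is self-aligned, assuming four identical slices that each carry a quarter of a hypothetical charge pump current and a hypothetical 400 ps VCO period. Delaying a slice by one VCO period adds charge set by the same current and the same period that define the PFD gain, so the DAC step is a fixed fraction of a full-period error whatever those values are:

```python
def pfd_dac_charge(shift_count: int, pulse_s: float, t_vco_s: float,
                   i_cp_a: float, units: int = 4) -> float:
    """Net charge from `units` parallel PFD/charge pump slices (each i_cp_a / units),
    where `shift_count` of the slices are delayed by one VCO period."""
    per_unit = i_cp_a / units
    charge = 0.0
    for u in range(units):
        width = pulse_s + (t_vco_s if u < shift_count else 0.0)
        charge += per_unit * width
    return charge

def dac_step_ratio(i_cp_a: float, t_vco_s: float = 400e-12) -> float:
    """DAC step divided by the charge of a full one-VCO-period error (ideal slices)."""
    step = (pfd_dac_charge(1, 0.0, t_vco_s, i_cp_a)
            - pfd_dac_charge(0, 0.0, t_vco_s, i_cp_a))
    full_cycle = i_cp_a * t_vco_s
    return step / full_cycle

for i_cp in (50e-6, 200e-6):
    print(i_cp, dac_step_ratio(i_cp))  # ratio is 0.25 for either current
```

Shifting two of four slices gives half of a full-period step, matching the halfway point above. Mismatch between the unit slice currents is what remains, which the next slide addresses.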

## Actual PFD/DAC Implementation

![](_page_46_Figure_1.jpeg)

- A current DAC is used, but is self-aligned to PFD output using the phase shifting method just discussed
- Nonlinearity of the DAC is removed using mismatch noise shaping techniques
- Note: approach overcomes mismatch limitations of prior art: Y. Dufour, "... Fractional Division Charge Compensation ...", US Patent 6,130,561
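The slides do not say which mismatch noise shaping scheme is used. Data-weighted averaging (DWA) is one widely used first-order example, shown here only as an illustration with hypothetical element weights and random stand-in codes. Rotating through the unit elements keeps the running sum of mismatch error bounded, while a fixed selection lets it grow:

```python
import random

def dwa_select(codes: list[int], units: int = 4) -> list[list[int]]:
    """Data-weighted averaging: each code uses the next `code` unit elements,
    continuing cyclically from where the previous code stopped."""
    ptr = 0
    selections = []
    for code in codes:
        selections.append([(ptr + j) % units for j in range(code)])
        ptr = (ptr + code) % units
    return selections

def fixed_select(codes: list[int]) -> list[list[int]]:
    """Naive selection: code c always uses elements 0 .. c-1."""
    return [list(range(code)) for code in codes]

def cumulative_error(codes: list[int], selections: list[list[int]],
                     weights: list[float]) -> list[float]:
    """Running sum of (actual DAC output - ideal code), with unit weight 1.0."""
    total, out = 0.0, []
    for code, sel in zip(codes, selections):
        total += sum(weights[u] for u in sel) - code
        out.append(total)
    return out

weights = [1.02, 0.99, 0.985, 1.005]  # hypothetical mismatched unit elements (mean 1)
rng = random.Random(7)
codes = [rng.randint(0, 4) for _ in range(2000)]  # stand-in DAC codes

dwa = cumulative_error(codes, dwa_select(codes), weights)
naive = cumulative_error(codes, fixed_select(codes), weights)
print(max(abs(e) for e in dwa), abs(naive[-1]))
```

A bounded running error means the mismatch error sequence is first-order highpass shaped, so the PLL lowpass can suppress it in the same way as the shaped quantization noise.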


### Goal: GSM Level Noise Performance with 1 MHz Bandwidth!

### CppSim simulations verify this is possible with only a 6-bit DAC!

![](_page_47_Figure_2.jpeg)

- Left: <u>Calculated</u> Performance (PLL Design Assistant)
- Right: <u>Simulated</u> Performance (CppSim)


### **Other Issues to Consider**

- Additional nonidealities must be dealt with
  - Timing mismatch
  - Impact of shape of horizontal cancellation waveforms
  - Impact of both DAC element and timing mismatch sources on achievable spurious performance
- Note: detailed analytical examination of the above items is difficult
  - CppSim is an invaluable tool for exploring such issues

#### Take a look at the paper for more details: "A Fractional-N Frequency Synthesizer Architecture Utilizing a Mismatch Compensated PFD/DAC Structure ...", Scott Meninger, Michael Perrott, TCASII, Nov 2003