Low-Power Subsampling All-Digital Clock and Data Recovery Techniques for Multi-Gigabit Passive Optical Networks

Marijn Verbeke

Supervisors: prof. dr. ir. G. Torfs, prof. dr. ir. P. Rombouts

Dissertation submitted to obtain the degree of Doctor of Engineering Science: Electrical Engineering

Ghent University, Department of Information Technology (head of department: prof. dr. ir. B. Dhoedt), Faculty of Engineering and Architecture. Academic year 2017-2018

ISBN 978-94-6355-088-8, NUR 959, legal deposit: D/2018/10.500/6

Ghent University, Faculty of Engineering and Architecture

Department of Information Technology, iGent Tower, Tech Lane Ghent Science Park 15, B-9052 Ghent, Belgium

Tel.: +32 9 264 33 40

#### Low-Power Subsampling All-Digital Clock and Data Recovery Techniques for Multi-Gigabit Passive Optical Networks

Marijn Verbeke

#### Members of the examination board:

| Affiliation                |
|----------------------------|
| Ghent University           |
| Ghent University           |
| KU Leuven                  |
| Vrije Universiteit Brussel |

Dissertation submitted to obtain the degree of Doctor of Electrical Engineering. Academic year 2017-2018

This work was supported by the Agency for Innovation by Science and Technology in Flanders (IWT).

#### Acknowledgements

Dear friends and family, dear colleagues and professors,

After four years of PhD research, here I stand with my "book" in my hands. Looking back on those four years, I can only say that the time has flown by. This is thanks to my great colleagues and the wonderful atmosphere in the 'design' lab, where I carried out my research. In particular, I would like to thank Michael, *Manolo*, Haolin, Joris, Hannes and *Hubert* for the many pleasant coffee and chocolate-milk breaks. I already look back with nostalgia on the tasty sausage rolls (and other healthy food), the game and movie nights and, of course, the crazy bachelor weekend.

The main reason I decided to pursue a PhD was, first and foremost, to broaden my knowledge of electronics and chip design. Thanks to the professional guidance of my supervisors Guy and Pieter, I feel that I have more than achieved that goal. Our weekly meetings (in which, besides the technical aspects, there was also time for small talk and other interesting life lessons) and the opportunity to drop by whenever I got stuck were indispensable. Moreover, the internal review processes for publications were very thorough and intensive, but the result paid off. I therefore feel that I drew the winning ticket with you as my supervisors.

I would also like to thank Johan and Scott for letting me start in the INTEC design group, for leading the lab and for the help they offered. I also wish to thank the former head of department, em. prof. dr. ir. Daniël De Zutter, and the current head of department, prof. dr. ir. Bart Dhoedt, for the facilities provided and for the well-organized support within the department. I would also like to thank Arno for always sharing his knowledge and experience of *clock and data recovery* and digital design with me. Jean, thank you for the necessary technical support, and Mike, thank you for the indispensable administrative help.

Besides the 'hard' work, there is of course much more to experience, and I am happy to share it with a fantastic circle of friends. To my friends from secondary school, Bert, Bruno, Eva, Jochen, Jonas, Stefanie and Ward: thank you for being part of my life for more than 10 years now. Frederick, Lien and Michiel, thank you for our pleasant monthly get-togethers. I would like to thank my friends from electrical engineering, Dries, Michaël, Sander, Stan, Bernard, Jens, Jeroen, Matthias, Thibault, Tim and Willeke, for the many fun and funny moments during our busy student years at the university. Despite our busy schedules, we still try to keep in touch. And thank you to the Boombal friends, in particular Ester, David, Mieke and Martijn, for all the dancing fun and the pleasant times together.

Of course, I would like to thank my parents and my sister. Dad, thank you for teaching me to do odd jobs around the house and for always keeping both my feet on the ground. Mom, thank you for all your love and care, and for the built-in little voice that sounds the alarm at the slightest possible threat. Thank you both for always being there for me and for everything you do and have done for me. I know that I can always turn to you for good advice. I would also like to thank my formidable opponent in karate and in reciting movie quotes, whom I am very proud to call my sister. Thank you, Eef, for all the pleasant study hours at the kitchen table and so much more.

The last thank-you goes to my dear girlfriend Jolien. When we had only just started dating, my most stressful tape-out was approaching. Thank you for showing me even then that I could count on your support, and for motivating me to keep going. Thank you for always fixing my out-of-bed look when I leave for work, and for always luring me home with your delicious cooking. I still find it incredible that someone who is 'a little' chaotic can be such a strong anchor in my life.

Ghent, January 2018

Marijn Verbeke

# Table of Contents

| No.      | Title                                                 |
|----------|-------------------------------------------------------|
|          | Acknowledgements                                      |
|          | Dutch Summary                                         |
|          | English Summary                                       |
|          | List of Publications                                  |
| Part I   | Introduction to Internet Communication and CDRs       |
| 1        | Introduction                                          |
| 1.1      | Evolution of Data Consumption                         |
| 1.1.1    | Internet Traffic                                      |
| 1.1.2    | Power Consumption                                     |
| 1.2      | Networks Today                                        |
| 1.2.1    | Optical Access Networks                               |
| 1.2.2    | Passive Optical Network                               |
| 1.2.3    | Optical Receiver                                      |
| 1.3      | Objective of this Work                                |
| 1.4      | Overview of the Dissertation                          |
| 2        | Multi-Gigabit Clock and Data Recovery                 |
| 2.1      | Introduction to CDRs                                  |
| 2.2      | Jitter and Wander                                     |
| 2.2.1    | Jitter Specifications                                 |
| 2.3      | CDR Types                                             |
| 2.3.1    | Oversampling without Feedback Phase Tracking          |
| 2.3.2    | Phase Alignment without Feedback Phase Tracking       |
| 2.3.3    | Feedback Phase Tracking                               |
| 2.4      | PLL-Based CDR Structure                               |
| 2.5      | Evolution to Digital CDR                              |
| 2.5.1    | All-Digital CDR Structure                             |
| 2.5.2    | Advantages                                            |
| 2.5.3    | Challenges                                            |
| 2.6      | Next-Generation (All-Digital) Clock and Data Recovery |
| Part II  | Analysis, Design and Implementation                   |
| 3        | Clock and Data Recovery Analysis                      |
| 3.1      | CDR Phase Domain Model                                |
| 3.2      | Describing Functions: Pseudo-Linear Model             |
| 3.2.1    | Random-Input Describing Function                      |
| 3.2.2    | Limit Cycles                                          |
| 3.2.3    | Gaussian-plus-Sinusoid-Input Describing Function      |
| 3.3      | Stability in Charge Pump CDRs                         |
| 3.3.1    | System Relations                                      |
| 3.3.2    | Algorithm                                             |
| 3.3.3    | Application of the Algorithm                          |
| 3.3.4    | Simulation Results                                    |
| 3.3.5    | Influence of the CDR Design Parameters                |
| 3.3.6    | Further Analytical Approximations                     |
| 3.4      | Jitter Analysis in Charge Pump CDRs                   |
| 3.4.1    | Jitter Transfer and Jitter Generation                 |
| 3.4.2    | Jitter Tolerance                                      |
| 3.5      | AD-CDR Phase Domain Jitter Analysis                   |
| 3.5.1    | Sampled-Data Mixed-Signal AD-CDR Model                |
| 3.5.2    | Aliasing                                              |
| 3.5.3    | Discrete-Time Multi-Rate Modeling of AD-CDR           |
| 3.5.4    | LTV Analysis of Subsampled AD-CDR                     |
| 3.6      | CID in Subsampled AD-CDR                              |
| 3.6.1    | Idle Time                                             |
| 3.6.2    | Phase Drift                                           |
| 3.7      | Simulations of Subsampled AD-CDR                      |
| 3.7.1    | Model                                                 |
| 3.7.2    | Phase Noise Simulations                               |
| 3.7.3    | Robustness Against CID                                |
| 3.8      | Discussion                                            |
| 4        | AD-CDR Architecture and Design                        |
| 4.1      | System Architecture                                   |
| 4.2      | Bang-Bang Phase Detector                              |
| 4.2.1    | Comparison of Alexander and Inverse Alexander PD      |
| 4.2.2    | PD Characteristics                                    |
| 4.2.3    | Performance                                           |
| 4.3      | Digitally Controlled Oscillator                       |
| 4.4      |                                                       |
| 4.5      |                                                       |
| 4.6      |                                                       |
| 5        | Circuit Implementation                                |
| 5.1      |                                                       |
| 5.2      |                                                       |
| 5.2.1    |                                                       |
| 5.2.2    |                                                       |
| 5.2.3    |                                                       |
| 5.2.4    |                                                       |
| 5.3      |                                                       |
| 5.4      |                                                       |
| 5.5      |                                                       |
| 5.6      |                                                       |
| Part III | Results and Conclusions                               |
| 6        | Exp                                                   |
| 6.1      |                                                       |
| 6.1.1    | Electrical Test Setup                                 |
| 6.1.2    | Optical Test Setup                                    |
| 6.2      | Electrical Tests in Continuous Mode                   |
| 6.2.1    |                                                       |
| 6.2.2    | Digitally Controlled Oscillator Operation             |
| 6.2.3    | Phase Detector Operation                              |
| 6.2.4    | Comparison Conventional and Inverse Alexander PD      |
| 6.2.5    | All-Digital Clock and Data Recovery Operation         |
| 6.2.6    | Describing Function Stability Verification            |
| 6.3      | Electrical Setup Tests in Burst Mode                  |
| 6.3.1    | Frame Structure                                       |
| 6.3.2    | Settling Time                                         |
| 6.4      | Optical Setup Tests in Continuous Mode                |
| 6.5      |                                                       |
| 7        | Conclusion and Future Work                            |
| 7.1      | Conclusion                                            |
| 7.2      | Future Work                                           |
| 7.2.1    | Possible Improvements                                 |
| 7.2.2    | Additional Functionalities                            |
| 7.2.3    | Higher Data Rates                                     |
| Part IV  | Appendix                                              |
| A        | LTV Analysis Calculations                             |

# List of Figures

| 1.1  | The number of internet users from 2001 until 2017 [3]                   | 4  |
|------|-------------------------------------------------------------------------|----|
| 1.2  | The future global internet traffic by application [2]                   | 4  |
| 1.3  | The evolution of the Netflix subscribers [4]                            | 6  |
| 1.4  | The current and future video requirements [2]                           | 6  |
| 1.5  | The evolution of Facebook platform Medium Attachment                    | _  |
|      | Units (MAUs) [4]                                                        | 7  |
| 1.6  | The Compound Annual Growth Rate (CAGR) of the elec-                     |    |
|      | tricity consumption in ICT compared to the total worldwide              |    |
|      | electricity consumption. Networks is the fastest growing category [12]. | 8  |
| 1.7  | The modern telecommunication network hierarchy [14].                    | 9  |
| 1.8  | A Passive Optical Network [14].                                         | 12 |
| 1.9  | The communication in PON: (a) downstream and (b) up-                    |    |
| 117  | stream.                                                                 | 13 |
| 1.10 |                                                                         | 14 |
| 2.1  | An eye diagram at the input of the decision circuit with ISI,           |    |
|      | noise and jitter.                                                       | 22 |
| 2.2  | A basic block diagram of a Clock and Data Recovery circuit.             | 24 |
| 2.3  | Jitter: unwanted phase variations of a signal                           | 25 |
| 2.4  | The jitter tolerance mask for SDH STM-256 [9]                           | 27 |
| 2.5  | A block diagram of a PLL-based CDR circuit                              | 31 |
| 2.6  | A charge pump PLL-based CDR circuit.                                    | 31 |
| 2.7  | An All-Digital CDR.                                                     | 33 |
| 2.8  | The proposed next-generation All-Digital CDR                            | 37 |
| 3.1  | A charge pump Phase Locked Loop-based Clock and Data                    |    |
|      | Recovery circuit.                                                       | 49 |
| 3.2  | The behavioral model of a CDR with a BB-PD.                             | 49 |

| 3.3  | A time domain example of the describing function model<br>for a non-linearity. (a) The characteristic of a comparator<br>(a non-linearity). (b) The describing function characteristic<br>according to the original approach in [8]. (c) The definition<br>of the linearization error $\phi_q$ , which is included in [2] and in<br>our pseudo-linear analysis | 52 |
|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 3.4  | The RIDF model of a CDR with a BB-PD                                                                                                                                                                                                                                                                                                                           | 53 |
| 3.5  | The GSIDF model of the non-linearity of a BB-PD                                                                                                                                                                                                                                                                                                                | 54 |
| 3.6  | $K_s$ according to Eq. (3.15) as a function of the amplitude $A_e$ and the RMS jitter $\sigma_e$ at the input of the non-linearity.<br>( $\alpha = 0.5$ )                                                                                                                                                                                                      | 56 |
| 3.7  | The GSIDF model of a CDR with a BB-PD for (a) the sinusoidal component and (b) the random Gaussian component (identical to the RIDF model in Fig. 3.4).                                                                                                                                                                                                        | 57 |
| 3.8  | The altered GSIDF model of a CDR with a BB-PD for<br>the random Gaussian component (equivalent to the RIDF<br>model in Fig. 3.4)                                                                                                                                                                                                                               | 59 |
| 3.9  | The limit cycle amplitude $A_e$ as a function of the RMS in-<br>put jitter $\sigma_{in}$ . The simulation results where performed with:<br>$f_{data} = 10 \text{ GHz}, \ \omega_z = 2\pi \cdot 300 \text{ kHz}, \ \omega_0 = 2\pi \cdot 3 \text{ MHz}, \ \omega_p = 2\pi \cdot 30 \text{ MHz}$ and $T_d = 3 \text{ ns.} \dots \dots \dots \dots \dots \dots$   | 61 |
| 3.10 | The power spectrum $S_{\phi_{out}}$ of the same CDR as in Fig. 3.9 for an input noise level $\sigma_{in} = \sqrt{2} \cdot \sigma_{in,th}$ .                                                                                                                                                                                                                    | 62 |
| 3.11 | The power spectrum $S_{\phi_{out}}$ of the same Clock and Data Re-<br>covery (CDR) as in Fig. 3.9 for an input noise level $\sigma_{in} = \frac{\sigma_{in,th}}{\sqrt{2}}$ . The simulation results are compared to the pre-<br>diction where the CDR does not contain any limit cycles:<br>i.e. the RIDF and to the prediction where a limit cycle is         |    |
|      | present in the CDR: i.e. the GSIDF                                                                                                                                                                                                                                                                                                                             | 62 |
| 3.12 | The power spectrum $S_{\phi_{out}}$ of the same CDR as in Fig. 3.9 for an input noise level $\sigma_{in} = \sigma_{in,th}$ .                                                                                                                                                                                                                                   | 63 |
| 3.13 | The worst-case limit cycle amplitude $A_{e,max}$ as a function<br>of the gain $\omega_0$ for different pole frequencies $\omega_p$ and delays<br>$T_d$ . The corresponding calculated results (solid lines) and<br>simulation results (markers) are represented with the same                                                                                  |    |
|      | color                                                                                                                                                                                                                                                                                                                                                          | 66 |

| 3.14 | The threshold RMS input jitter $\sigma_{in,th}$ as a function of the gain $\omega_0$ for different pole frequencies $\omega_p$ and delays $T_d$ .<br>The corresponding calculated results (solid lines) and simulation results (markers) are represented with the same color. | 67       |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
| 3.15 | A scatter plot of simulated $A_{e,max}$ as a function of the approximation according to Eq. (3.27) for different values of $\omega_0, \omega_p, T_d$ and $\alpha$ .                                                                                                           | 70       |
| 3.16 | A scatter plot of simulated $\sigma_{in,th}$ as a function of the approximation according to Eq. (3.31) for different values of $\omega_0, \omega_p, T_d$ and $\alpha$ .                                                                                                      | 70       |
| 3.17 | The GSIDF model of a CDR with a BB-PD for the jitter tolerance measurements with the block diagram for (a) the sinusoidal component and (b) the random Gaussian com-                                                                                                          |          |
|      | ponent (identical to the RIDF model in Fig. 3.4).                                                                                                                                                                                                                             | 74       |
| 3.18 | A block diagram of the proposed next-generation All-Digital                                                                                                                                                                                                                   |          |
|      | CDR                                                                                                                                                                                                                                                                           | 76       |
| 3.19 | Behavioral models of the proposed AD-CDR: (a) sampled-<br>data (mixed-type) discrete/continuous-time model. (b) discrete<br>time multirate model.                                                                                                                             | e-<br>78 |
| 2 20 | The phase noise through the AD-CDR: a transformed phase                                                                                                                                                                                                                       | 70       |
| 3.20 | domain model of the input branch                                                                                                                                                                                                                                              | 79       |
| 3.21 | The simulation model of the proposed AD-CDR. The red and green color indicate the $f_{data}$ -rate and $f_{dig}$ -rate operation, respectively.                                                                                                                               | 87       |
| 3.22 | Details of the simulation model of the proposed AD-CDR<br>in Fig. 3.21 with (a) the BB-PD building block and (b) the<br>DLF building block. The red and green color indicate the                                                                                              |          |
|      | $f_{data}$ -rate and $f_{dig}$ -rate operation, respectively.                                                                                                                                                                                                                 | 88       |
| 3.23 | Phase noise simulations with the different noise contribu-<br>tions derived from the LTV analysis with subsample fac-                                                                                                                                                         |          |
|      | tors: (a) $N = 16$ and (b) $N = 32$                                                                                                                                                                                                                                           | 90       |
| 3.24 | An example of the simulation results for the case where the subsampled PD output consists of $l = 100$ idle values (= 64                                                                                                                                                      |          |
|      | ns idle time )                                                                                                                                                                                                                                                                | 91       |
| 4.1  | The system diagram of the AD-CDR                                                                                                                                                                                                                                              | 98       |
| 4.2  | (a) The conventional Alexander PD and (b) the Inverse Alexan-                                                                                                                                                                                                                 | -        |
|      | der PD circuit.                                                                                                                                                                                                                                                               | 100      |

| 4.3  | Waveforms for the locking behavior of the Alexander PD :<br>(a) Ideal locking condition with phase difference $\Delta \phi = 0.5$<br>UI; (b) <i>Early</i> condition; (c) <i>Late</i> condition                                                                                                                                                              | 100      |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
| 4.4  | Waveforms for the locking behavior of the Inverse Alexan-<br>der PD : (a) Ideal locking condition with phase difference<br>$\Delta \phi = 0$ UI; (b) <i>Early</i> condition; (c) <i>Late</i> condition                                                                                                                                                      | 101      |
| 4.5  | Simplified (single pulse) PD output characteristics at full rate operation for the case of ideal waveforms: (a) the Alexander PD , (b) the Inverse Alexander                                                                                                                                                                                                | 102      |
| 4.6  | Simplified (single pulse) PD output characteristics at full rate operation for the case of duty-cycle distortion: (a) the Alexander PD and (b) the Inverse Alexander PD                                                                                                                                                                                     | 102      |
| 4.7  | PD waveforms for data with duty-cycle distortion corresponding to the anomalous cases (a) Alexander <i>Early</i> immediately followed by <i>Late</i> (most relevant for conventional Alexander), and (b) Simultaneous <i>Early</i> and <i>Late</i> (most relevant for Inverse Alexander).                                                                   | 103      |
| 4.8  | Simplified (single pulse) PD output characteristics at sub-<br>sampled rate operation: (a) the Alexander PD for the case<br>of duty-cycle distortion and (b) the Inverse Alexander PD<br>for the case of duty-cycle distortion.                                                                                                                             | 104      |
| 4.9  | Simulink simulations of the locking behavior in the case<br>of a pronounced duty-cycle distortion with (a) the eye dia-<br>gram of input data, (b) the persistence view of the recovered<br>clock of a CDR with a conventional Phase Detector, and (c)<br>the persistence view of the recovered clock of a CDR with<br>an Inverse Alexander Phase Detector. | 105      |
| 4.10 | The BER performance: (a) no subsampling; (b) subsample factor = 4                                                                                                                                                                                                                                                                                           | 107      |
| 4.11 | Waveforms of a '1010' data sequence and the 8 clock phases when the AD-CDR is <i>Early</i> . The red clock phases correspond to edge-related samples and the black to data-related samples (as in Fig. 4.4).                                                                                                                                                | 108      |
| 5.1  | The block diagram of the AD-CDR implementation (speeds are indicated for $25 \text{ Gb/s}$ operation). Red is used for edge-related samples and black for data-related samples (as in Fig. 4.4). | 114 |
| 5.2  | A detail of the full custom part of the <i>BB-PD</i> & <i>Subsampling</i>, which contains 6 samplers, a retiming block and a subsampling block (speeds are indicated for $25 \mathrm{Gb/s}$ operation) | 116 |
| 5.3  | … sense amplifier input and a slower regenerative latch. | 117 |
| 5.4  | The retiming circuit consisting of an array of retiming type I (positive edge triggered) flip-flops and an array of type II (negative edge triggered) flip-flops. Red is used for edge-related samples and black for data-related samples (as in Fig. 4.4). | 118 |
| 5.5  | The flip-flops used in the retiming circuit: (a) type I (positive edge triggered) dynamic flip-flop and (b) type II (negative edge triggered) dynamic flip-flop | 119 |
| 5.6  | The subsampling circuit. | 120 |
| 5.7  | The digital phase detection logic | 121 |
| 5.8  | The digital loop filter implementation. | 122 |
| 5.9  | The DCO structure: (a) ring oscillator and (b) delay cell | 123 |
| 5.10 | The complete AD-CDR ASIC: (a) the drawn layout, (b) a photo of the manufactured ASIC and (c) the legend | 125 |
| 5.11 | A photo of the fabricated AD-CDR core together with an annotated layout view. | 126 |
| 6.1  | The AD-CDR testboard. | 132 |
| 6.2  | A photo of the implemented chip wire bonded on a high-speed PCB | 132 |
| 6.3  | The electrical test setup with AD-CDR | 133 |
| 6.4  | The optical test setup with AD-CDR. | 135 |
| 6.5  | The free-running frequency of the DCO: (a) an overview of the complete frequency range and (b) a detail around $6.25 \text{ GHz}$ | 137 |
| 6.6  | The gain of the DCO $K_{dco}$ at $6.25 \text{ GHz}$. | 138 |
| 6.7  | The supply sensitivity at $6.25 \text{ GHz}$. | |
| 6.8  | The sensitivity of the PD with PRBS7 input data at $25 \text{ Gb/s}$ | 140 |
| 6.9  | The measured BER for the conventional and the Inverse Alexander phase detector with a PRBS7 input data sequence at 25 Gb/s: (a) with a subsample factor $N = 16$ and (b) with a subsample factor $N = 32$. (Digital Loop Filter settings: $K_p = 5$ and $K_i = 2^{-7}$) | 141 |
| 6.10 | The phase noise of the recovered clock with a PRBS31 input data sequence at 25 Gb/s: Comparison between Alexander and Inverse Alexander PD for different subsample factors (i.e. $N = 16$ and $N = 32$). (Digital Loop Filter settings: $K_p = 5$ and $K_i = 2^{-7}$) | 142 |
| 6.11 | The phase noise of the recovered clock with a PRBS31 input data sequence at 25 Gb/s: Sweep $K_p$ | 143 |
| 6.12 | | 144 |
| 6.13 | Persistence plots of (a) the recovered (differential) clock (jitter < 1.5 ps<sub>rms</sub>) and (b) the recovered data (jitter $\approx 3.71$ ps<sub>rms</sub>) | 145 |
| 6.14 | The jitter tolerance with a PRBS7 input data sequence at $25 \text{ Gb/s}$: (a) Sweep $K_p$ and (b) Sweep $K_i$. | 146 |
| 6.15 | A packet in electrical burst mode measurements with a short gap (10.25 ns). | 151 |
| 6.16 | A packet in electrical burst mode measurements with a long gap (41 ns). | 151 |
| 6.17 | | 152 |
| 6.18 | A captured 6.25 Gb/s output stream in electrical burst mode measurements. | 152 |
| 6.19 | The AD-CDR is always in lock after 35 ns for 2 million packets with setting $K_p = 7$ and $K_i = 2^{-9}$. | 153 |
| 6.20 | The eye diagram of the input signal of the CDR (PRBS9 @ $25 \text{ Gb/s}$). | 154 |
| 6.21 | The eye diagram of one of the quarter-rate outputs of the CDR (@ $6.25 \text{ Gb/s}$). | 155 |
| 6.22 | The phase noise of the quarter-rate recovered clock of the AD-CDR for a PRBS9 input data sequence at 25 Gb/s ($K_p = 5$, $K_i = 2^{-7}$) | 156 |
| 6.23 | The BER as a function of the voltage swing at the input of the AD-CDR. | 156 |
| 6.24 | A packet in optical burst mode measurements with a short gap $(10.25 \text{ ns})$. | 158 |
| 6.25 | A packet in optical burst mode measurements with a long gap (41 ns) | 158 |
| 6.26 | A captured $6.25\mathrm{Gb/s}$ output stream in optical burst mode measurements | 159 |
| 6.27 | The AD-CDR is always in lock after 37.5 ns for 2 million packets with setting $K_p = 5$ and $K_i = 2^{-7}$. | 159 |

# List of Tables

| 1.1 | An Italian network forecast (2015-2020): device density and energy requirements in the Business-As-Usual (BAU) case [15] | 10  |
|-----|--------------------------------------------------------------------------------------------------------------------------------|-----|
| 1.2 | The advantages of optical fiber [16]                                                                                           | 11  |
| 3.1 | The model parameters of the complete linearized discrete-time multirate AD-CDR model shown in Fig. 3.19(b)                     | 81  |
| 5.1 | The device sizes of the sense amplifier flip-flop shown in Fig. 5.3                                                            | 117 |
| 5.2 | The device sizes of the dynamic flip-flops shown in Fig. 5.5.                                                                  | 119 |
| 6.1 | The comparison of digital CDRs                                                                                                 | 147 |

# Glossary

### A

| AD-CDR | All-Digital Clock and Data Recovery     |
|--------|-----------------------------------------|
| AD-PLL | All-Digital Phase Locked Loop           |
| ASIC   | Application Specific Integrated Circuit |

### B

| BAU   | Business-As-Usual        |
|-------|--------------------------|
| BB-PD | Bang-Bang Phase Detector |
| BER   | Bit-Error Rate           |
| BiPON | Bit-interleaving PON     |
| BM-Rx | Burst Mode Receiver      |

### C

| CAGR | Compound Annual Growth Rate  |
|------|------------------------------|
| CBI  | Cascaded BiPON               |
| CDR  | Clock and Data Recovery      |
| CID  | Consecutive Identical Digits |

### D

| DCO   | Digitally Controlled Oscillator           |
|-------|-------------------------------------------|
| DLF   | Digital Loop Filter                       |
| DLL   | Delay Locked Loop                         |
| DPRBS | Differentiated Pseudo Random Bit Sequence |
| DUT   | Device Under Test                         |

### F

| FTTC | Fiber-To-The-Curb     |
|------|-----------------------|
| FTTH | Fiber-To-The-Home     |
| FTTP | Fiber-To-The-Premises |

### G

| GSIDF | Gaussian-plus-Sinusoid-Input Describing Function |
|-------|--------------------------------------------------|
| GVCO  | Gated Voltage Controlled Oscillator              |

### I

| ICT   | Information and Communications Technology                                       |
|-------|---------------------------------------------------------------------------------|
| IL    | Injection Locked                                                                |
| ISI   | InterSymbol Interference                                                        |
| ITU-T | International Telecommunication Union, Telecommunication Standardization Sector |

### L

| LSB | Least Significant Bit |
|-----|-----------------------|
| LTI | Linear Time-Invariant |
| LTV | Linear Time-Variant   |

### M

| MAUs | Medium Attachment Units |
|------|-------------------------|
| MZM  | Mach-Zehnder Modulator  |


### N

| NG-EPON | Next-Generation Ethernet PON |
|---------|------------------------------|
| NRZ     | Non-Return-to-Zero           |

### O

| OLT | Optical Line Terminal |
|-----|-----------------------|
| ONU | Optical Network Unit  |

### P

| PAM-4 | 4-level Pulse-Amplitude Modulation |
|-------|------------------------------------|
| PCB   | Printed Circuit Board              |
| PD    | Phase Detector                     |
| PI    | Phase Interpolator                 |
| PLL   | Phase Locked Loop                  |
| PON   | Passive Optical Network            |
| PRBS  | Pseudo Random Bit Sequence         |

### R


### S

| SDH  | Synchronous Digital Hierarchy        |
|------|--------------------------------------|
| SIDF | Sinusoidal-Input Describing Function |
| SNR  | Signal-to-Noise Ratio                |
| STM  | Synchronous Transport Module         |

### T

| TDC | Time-to-Digital Converter |
|-----|---------------------------|
| TIA | TransImpedance Amplifier  |

### U

| UHD | Ultra-High-Definition |
|-----|-----------------------|
| UI  | Unit Interval         |

### V

| VCO | Voltage Controlled Oscillator |
|-----|-------------------------------|

### Z

| ZOH | Zero-Order Hold |
|-----|-----------------|

# Nederlandstalige Samenvatting (Dutch Summary)

In recent years, data traffic has grown exponentially, and the end of this growth is predicted to be nowhere in sight. Driven by a growing supply of new online applications in sectors such as entertainment, commerce, industry and health care, ever more bandwidth is demanded and ever higher requirements are placed on the quality of the network and ICT infrastructure. In particular, the explosive growth of online video and cloud services requires higher data rates.

In the current internet architecture, end users are connected to the public network through the access network of the local internet service provider. Today, new passive optical networks (PONs) are being deployed. By using optical fibers, much higher data rates can be offered at a fraction of the power consumption. Nevertheless, research has shown that communication networks account for a significant and growing share of the total global power consumption, and there is increasing awareness that this has a negative impact on the environment.

This led to the founding of the GreenTouch consortium in 2010. This body focuses mainly on the demand for ever higher data rates, but also pays attention to their ecological and economic impact. Its mission is to demonstrate that the energy efficiency of communication networks can be improved by a factor of 1000 by 2020, compared to the GreenTouch-defined reference network, which was built with the most energy-efficient equipment available in 2010.

Clock and data recovery (CDR) circuits are an important component of an optical receiver in a passive optical access network. These CDR circuits are currently implemented with bulky and power-inefficient building blocks and therefore leave much room for improvement. This dissertation presents research on low-power subsampling techniques for all-digital clock and data recovery. These techniques answer the various challenges that next-generation networks will face. To support these techniques and this thesis, a prototype of a 25 Gb/s all-digital clock and data recovery (AD-CDR) circuit was implemented in an advanced CMOS technology (40 nm).

Thanks to the digital architecture, the active chip area could be kept very compact: it measures only $0.050 \text{ mm}^2$, which is considerably smaller than in other comparable work. The power efficiency of the CDR core is 1.8 pJ/b, which also outperforms state-of-the-art CDR systems. Moreover, the AD-CDR is highly adaptable: the characteristics of the loop filter can be adjusted to meet multiple jitter tolerance specifications. In addition, the operating range can be adjusted from 12.5 Gb/s to 25 Gb/s. This is the widest operating range of any digital CDR that does not use a high-quality, multi-gigahertz reference clock. Because the frequency adaptation is truly digital, the power consumption scales proportionally with the data rate. As a result, excellent power efficiency is achieved over the entire operating range: the power consumption is 46 mW at 25 Gb/s, while at 12.5 Gb/s it is only 23 mW.

Furthermore, the AD-CDR is also suited to receiving packet-based (burst-mode) data. Packet-based operation of the CDR is made possible because the frequency remains constant between packets and because the loop filter parameters can be adjusted to obtain a larger bandwidth. These properties enable short settling times. Consequently, the AD-CDR needs neither a highly accurate reference clock nor a start signal indicating when packets are received. The digitally controlled oscillator (DCO) only needs to be calibrated once, so that its oscillation frequency approximates the data rate. This greatly simplifies the integration of the CDR into a system.

The dissertation contains seven chapters and an appendix: Chapter 1 describes the impact of the growing demand for higher data rates combined with lower power consumption in communication networks. Next, the current core-metro-access network architecture is presented. Typical figures show why the access network accounts for the lion's share of the total power consumption, which is mainly due to the enormous number of devices in the network. For the access network, topics such as the evolution to optical access networks and the concept of passive optical networks are discussed.

Chapter 2 introduces CDR circuits and stresses their importance. In addition, the performance metrics and a brief overview of the different CDR types are given. It is shown that a CDR based on a phase-locked loop (PLL) is the most favorable type for high-speed optical communication. Although this type still has some drawbacks, these can be eliminated by applying digital PLL techniques. In practice, however, these techniques are rarely applied in a CDR, because several challenges still prevent digital techniques from reaching their full potential. These challenges are mapped out and possible solutions are proposed. This ultimately leads to the next generation of high-speed, low-power clock and data recovery, which will be digital: in other words, an all-digital clock and data recovery.

In Chapter 3, the non-linear behavior of the CDR is investigated using describing function techniques in the phase domain. First, the stability and the phase noise of an analog charge-pump CDR are discussed. Next, the phase model is extended to form a representative model for an all-digital clock and data recovery (AD-CDR). This model allows the total phase noise and the robustness against long idle sequences to be investigated. The chapter concludes with a discussion of the simulation results.

An overview of the design of the proposed AD-CDR circuit is given in Chapter 4. The chapter starts with the system architecture, followed by an in-depth study of the most important building blocks. This includes an extensive comparative study between the conventional and the recently proposed Inverse Alexander phase detector (PD).

Chapter 5 discusses the application-specific integrated circuit (ASIC) implementation of an AD-CDR in a 40 nm low-power CMOS technology. The description starts with the top-level organization, after which the implementation of each underlying building block is discussed in detail.

Measurements were performed to demonstrate the correct operation and the low energy consumption. These are discussed in Chapter 6.

The final chapter (Chapter 7) gives an overview of the main conclusions of the research.

Finally, the calculations of the linear time-variant (LTV) analysis of the all-digital clock and data recovery model are included in Appendix A.


# English Summary

During the last couple of years, data traffic has been rising exponentially, and this growth is not predicted to end anytime soon. It is driven by new broadband applications in the fields of entertainment, commerce, industry, health care and social interaction, which demand ever higher data rates and a higher quality of the network and Information and Communications Technology (ICT) infrastructure. In addition, high-definition video streaming and cloud services will continue to push the demand for bandwidth.

In the current architecture of the internet, end users connect to the public network through the access network of an internet service provider. Today, this access network uses Passive Optical Network (PON) technologies because optical fiber is highly energy efficient at high data rates. Still, research has shown that communication networks account for a significant and growing share of the total global power consumption. Over the past few years, awareness of the negative environmental impact of this massive power consumption has therefore grown.

This led to the foundation of the GreenTouch consortium in 2010, which focuses on the problem of increasing data rates while reducing the economic and environmental impact. Its mission is to show that the energy efficiency of communication networks could be improved by a factor of 1000 by 2020, compared to the GreenTouch-defined baseline network, which was built using the most energy-efficient equipment available in 2010.

Clock and Data Recovery (CDR) circuits are an important part of an optical receiver in a PON access network. These CDR circuits are currently implemented with bulky and power-hungry analog sub-blocks and thus leave a lot of room for improvement. In this dissertation, low-power subsampling All-Digital Clock and Data Recovery (AD-CDR) techniques are presented as an answer to the various challenges that next-generation networks are facing. To demonstrate this, a 25 Gb/s Phase Locked Loop (PLL)-based AD-CDR circuit prototype was implemented in an advanced CMOS technology (40 nm).

Thanks to the highly digital architecture, the active die area is very compact: it occupies only $0.050 \text{ mm}^2$, which is significantly smaller than competing work. The power efficiency of the CDR core is 1.8 pJ/b, which also improves on the state of the art. Additionally, the AD-CDR is highly adaptable: the characteristics of the loop filter can be tuned to satisfy multiple jitter tolerance specifications. Moreover, the operating range can be varied from 12.5 Gb/s to 25 Gb/s, which is the broadest operating range of any digital CDR that does not use a high-quality, multi-gigahertz reference clock. Because the frequency adaptation is truly digital, the power consumption scales proportionally with the data rate, and hence an excellent power efficiency is maintained over the entire operating range: e.g. the power consumption is 46 mW at 25 Gb/s and 23 mW at 12.5 Gb/s.
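As a quick consistency check on the figures quoted above, energy per bit is simply power divided by bit rate, and conveniently mW divided by Gb/s comes out directly in pJ/b. The sketch below also computes the DCO frequency under the assumption of a quarter-rate architecture (DCO at one quarter of the bit rate), which is suggested by the $6.25 \text{ GHz}$ DCO measurements and the quarter-rate $6.25 \text{ Gb/s}$ outputs in the list of figures. The helper names `energy_per_bit_pj` and `dco_frequency_ghz` are hypothetical.

```python
def energy_per_bit_pj(power_mw: float, bit_rate_gbps: float) -> float:
    """Energy per bit in pJ/b.

    mW / (Gb/s) = 1e-3 J/s / 1e9 b/s = 1e-12 J/b, so the ratio is already in pJ/b.
    """
    if bit_rate_gbps <= 0:
        raise ValueError("bit rate must be positive")
    return power_mw / bit_rate_gbps


def dco_frequency_ghz(bit_rate_gbps: float, rate_division: int = 4) -> float:
    """DCO frequency of a sub-rate CDR (assumption: quarter rate, i.e. division by 4)."""
    return bit_rate_gbps / rate_division


# Operating points quoted in the summary: (bit rate in Gb/s, power in mW)
for rate, power in [(25.0, 46.0), (12.5, 23.0)]:
    print(f"{rate:5.1f} Gb/s: {energy_per_bit_pj(power, rate):.2f} pJ/b, "
          f"DCO at {dco_frequency_ghz(rate):.3f} GHz")
```

Both operating points give about 1.84 pJ/b, which matches the quoted 1.8 pJ/b and illustrates why proportional power scaling keeps the efficiency constant across the operating range.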

Furthermore, the AD-CDR is also able to capture burst-mode data. Burst-mode operation of the CDR is made possible by the absence of frequency drift between bursts and by the possibility to adapt the Digital Loop Filter (DLF) parameters to obtain a large loop bandwidth. Together, these features enable short settling times. As a result, the AD-CDR requires neither a high-accuracy reference clock nor a start-of-burst signal. Only the Digitally Controlled Oscillator (DCO) needs a one-time calibration to ensure that its frequency is in the vicinity of the line rate. This significantly simplifies the integration of the component in a system.
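The burst-mode argument above rests on adapting the DLF parameters. The $K_p$ and $K_i$ settings in the figure captions suggest a proportional-integral (PI) loop filter; assuming that structure, a minimal behavioral sketch is shown here. It takes bang-bang phase detector decisions (+1 for *Early*, -1 for *Late*, 0 when there is no data transition, a sign convention that is an assumption) and returns the filter output that would tune the DCO relative to its calibrated free-running setting. The function `pi_loop_filter` is hypothetical and is not a model of the circuit in Fig. 5.8.

```python
from typing import Iterable, List


def pi_loop_filter(pd_decisions: Iterable[int], kp: float, ki: float) -> List[float]:
    """Hypothetical proportional-integral digital loop filter (behavioral sketch).

    pd_decisions: +1 (Early), -1 (Late) or 0 (no transition) per update.
    Returns the filter output per update, i.e. the DCO tuning value relative to
    its calibrated free-running setting (assumed interpretation).
    """
    integrator = 0.0
    outputs: List[float] = []
    for e in pd_decisions:
        integrator += ki * e                 # integral path: accumulates a frequency offset
        outputs.append(kp * e + integrator)  # proportional path: immediate phase correction
    return outputs


# Compare the two settings used in the measurement captions on the same decisions.
decisions = [1, 1, -1, 0]
for kp, ki in [(5, 2**-7), (7, 2**-9)]:
    print(f"Kp={kp}, Ki={ki}:", pi_loop_filter(decisions, kp, ki))
```

A larger $K_p$ produces a larger correction per phase-detector decision, which is typically the main knob for widening the bandwidth of a bang-bang loop and thus shortening the settling time after a burst starts.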

The dissertation is composed of seven chapters and one appendix: Chapter 1 discusses the impact of increasing data rates and the desire to reduce power consumption in communication networks. Subsequently, the current core-metro-access architecture is presented, and typical numbers are given to show why the access tier accounts for the lion's share of the total power consumption, owing to the vast number of devices in the network. The access network is discussed, including the evolution to all-optical access networks and the concept of PONs.

Chapter 2 introduces CDR circuits and highlights their importance. Additionally, the performance measures and a brief overview of different CDR types are given. A PLL-based CDR proves to be the most favorable type for high-speed optical communication systems. Although this type still has some drawbacks, they can be overcome by using digital PLL techniques. However, in practice these techniques are rarely implemented in a CDR because some challenges still prevent the digital PLL techniques from reaching their full potential. These challenges are identified and solutions are proposed. This leads to the next generation of high-speed and low-power Clock and Data Recovery circuits, which will be digital, i.e. All-Digital Clock and Data Recovery (AD-CDR) circuits.

In Chapter 3, the non-linear operation of the CDR is investigated using describing function techniques in the phase domain. First, the stability and the phase noise are discussed for the case of an analog charge pump CDR. Next, the phase domain model is extended to the case of the proposed AD-CDR. The phase noise and the robustness against long idle sequences are investigated. Finally, simulation results are discussed.
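Describing-function analysis of a bang-bang CDR commonly treats the phase detector as a relay (sign) nonlinearity. As a minimal illustration, assuming an ideal relay with output $\pm 1$, the sketch evaluates two textbook gains: the sinusoidal-input describing function $N(A) = 4/(\pi A)$, checked numerically against the Fourier fundamental of the relay output, and the effective small-signal gain $\sqrt{2/\pi}/\sigma$ of a relay driven by zero-mean Gaussian jitter with standard deviation $\sigma$. These are generic results, not the combined Gaussian-plus-sinusoid (GSIDF) model of Chapter 3, and the function names are hypothetical.

```python
import math


def sidf_relay(amplitude: float, output_level: float = 1.0) -> float:
    """Closed-form sinusoidal-input describing function of an ideal relay."""
    return 4.0 * output_level / (math.pi * amplitude)


def sidf_relay_numeric(amplitude: float, output_level: float = 1.0, n: int = 100_000) -> float:
    """Fundamental Fourier coefficient of the relay output divided by the input amplitude.

    b1 = (1/pi) * integral over [0, 2*pi] of y(theta) * sin(theta) d(theta),
    evaluated with the midpoint rule (step 2*pi/n), so b1 = (2/n) * sum.
    """
    acc = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        y = output_level if amplitude * math.sin(theta) >= 0 else -output_level
        acc += y * math.sin(theta)
    b1 = 2.0 * acc / n
    return b1 / amplitude


def gaussian_gain_relay(sigma: float, output_level: float = 1.0) -> float:
    """Slope at the origin of E[sign(x + n)] for n ~ N(0, sigma^2), scaled by the output level."""
    return output_level * math.sqrt(2.0 / math.pi) / sigma


print(sidf_relay(2.0), sidf_relay_numeric(2.0), gaussian_gain_relay(1.0))
```

The gain falls as $1/A$ or $1/\sigma$: the larger the phase excursion or jitter, the smaller the equivalent linear gain of the bang-bang detector, which is what makes the loop dynamics amplitude dependent.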

An overview of the design of the proposed AD-CDR circuit is given in Chapter 4. It starts with the system architecture and is followed by an in-depth discussion covering the most critical building blocks. This also includes an elaborate comparison between the conventional and the newly proposed Inverse Alexander Phase Detector (PD).
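As background for this comparison, the textbook decision rule of the conventional Alexander (bang-bang) phase detector can be sketched as follows. It uses three consecutive samples: the previous data sample, the edge sample in between and the current data sample. If the edge sample still equals the previous bit, it was taken before the transition and the clock is *Early*; otherwise it is *Late*. The function names and the sign convention (+1 for *Early*) are assumptions, and the Inverse Alexander logic is deliberately not reproduced because this summary does not specify it.

```python
from typing import List


def alexander_pd(prev_data: int, edge: int, curr_data: int) -> int:
    """Textbook conventional Alexander phase detector (hypothetical helper).

    Returns +1 when the clock is Early, -1 when it is Late,
    and 0 when there is no data transition.
    """
    if prev_data == curr_data:
        return 0  # no transition: the edge sample carries no phase information
    # Taken before the transition, the edge sample still shows the old bit.
    return 1 if edge == prev_data else -1


def alexander_decisions(data: List[int], edges: List[int]) -> List[int]:
    """Apply the detector to a stream; edges[i] lies between data[i] and data[i + 1]."""
    return [alexander_pd(data[i], edges[i], data[i + 1]) for i in range(len(data) - 1)]


print(alexander_decisions([0, 1, 1, 0], [0, 1, 0]))
```

A run of identical bits yields only zeros, which is why long idle sequences (consecutive identical digits) leave the loop without phase information.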

Chapter 5 discusses the implementation of an AD-CDR Application Specific Integrated Circuit (ASIC) in a 40 nm Low Power CMOS technology. The top-down approach starts with the description of the top-level implementation. Subsequently, the implementation of each building block is covered in detail.

To demonstrate the correct operation and the low power consumption, measurements were performed; they are presented in Chapter 6.

The final chapter (Chapter 7) provides an overview of the foremost conclusions of the presented research.

Finally, the calculations of the Linear Time-Variant (LTV) analysis of the AD-CDR model are included in Appendix A.

# List of Publications

## Publications in International Journals

- **M. Verbeke**, P. Rombouts, H. Ramon, J. Verbist, J. Bauwelinck, X. Yin and G. Torfs, *A 25 Gb/s All-Digital Clock and Data Recovery Circuit for Burst Mode Applications in PONs [Invited]*, Journal of Lightwave Technology, Pre-print, DOI: 10.1109/JLT.2017.2784848, pp. 1-7, December 2017.
- **M. Verbeke**, P. Rombouts, H. Ramon, B. Moeneclaey, X. Yin, J. Bauwelinck and G. Torfs, *A 1.8-pJ/b, 12.5-25-Gb/s Wide Range All-Digital Clock and Data Recovery Circuit*, IEEE Journal of Solid-State Circuits, Pre-print, DOI: 10.1109/JSSC.2017.2755690, pp. 1-14, October 2017.
- **M. Verbeke**, P. Rombouts, X. Yin and G. Torfs, *Inverse Alexander phase detector*, Electronics Letters, vol. 52, no. 23, pp. 1908-1910, October 2016.
- A. Vyncke, G. Torfs, C. Van Praet, **M. Verbeke**, A. Duque, D. Suvakovic, H.K. Chow and X. Yin, *The 40 Gbps cascaded bit-interleaving PON [Invited]*, Optical Fiber Technology, vol. 26, part A, pp. 108-117, December 2015.
- **M. Verbeke**, P. Rombouts, A. Vyncke and G. Torfs, *Influence of Jitter on Limit Cycles in Bang-Bang Clock and Data Recovery Circuits*, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 6, pp. 1463-1471, June 2015.

#### **Publications in International Conferences**

- **M. Verbeke**, P. Rombouts, H. Ramon, G. Torfs, J. Bauwelinck and X. Yin, *A 25 Gb/s all-digital clock and data recovery circuit for burst mode applications in PONs [Highly ranked paper]*, the 43rd European Conference on Optical Communication (ECOC 2017), Gothenburg, Sweden, September 2017.

- **M. Verbeke**, P. Rombouts, A. Vyncke and G. Torfs, *Influence of Jitter on Limit Cycles in Bang-Bang Clock and Data Recovery Circuits [TCAS Special]*, IEEE International Symposium on Circuits and Systems (ISCAS 2016), Montreal, Canada, May 2016.
- A. Vyncke, G. Torfs, **M. Verbeke**, C. Van Praet, H. Chow, D. Suvakovic, A. Duque, and X. Yin, *A Low Power 40 Gbit/s Cascaded Extension to Bit-interleaving Optical Networks Enabling Next-generation Metro/access Connectivity*, 20th Annual Symposium of the IEEE Photonics Society Benelux Chapter, Brussels, Belgium, November 2015.
- A. Vyncke, G. Torfs, **M. Verbeke**, C. Van Praet, H. Chow, D. Suvakovic, and A. Duque, *Voltage controlled oscillators for 40 Gbit/s cascaded bit-interleaving PON*, Advances in Wireless and Optical Communications (RTUWO 2015), Riga, Latvia, November 2015.
- A. Vyncke, G. Torfs, **M. Verbeke**, and X. Yin, *An 8-phase 10 GHz Voltage Controlled Ring Oscillator for 40 Gbit/s BiPON Clock-and-data Recovery*, 11th Conference on PhD Research in Microelectronics and Electronics (IEEE PRIME 2015), Glasgow, United Kingdom, July 2015.
- A. Vyncke, G. Torfs, **M. Verbeke**, C. Van Praet, H. Chow, D. Suvakovic, A. Duque, and X. Yin, *CBI-PON: a Low Power Solution Offering Flexible Bandwidth Allocation for 40 Gbit/s Next Generation Metro/access Networks*, IEICE Information and Communication Technology Forum 2015 (ICTF 2015), Manchester, United Kingdom, June 2015.
- X. Yin, H. Chow, A. Vyncke, D. Suvakovic, G. Torfs, A. Duque, D. Van Veen, **M. Verbeke**, T. Ayan, and P. Vetter, *CBI: a Scalable Energy-efficient Protocol for Metro/access Networks [Invited]*, 2014 IEEE Online Conference on Green Communications (OnlineGreenComm), November 2014.
- G. Torfs, X. Yin, A. Vyncke, **M. Verbeke**, and J. Bauwelinck, *Solutions for a Single Carrier 40 Gbit/s Downstream Long-reach Passive Optical Network*, 2014 16th International Telecommunications Network Strategy and Planning Symposium (Networks 2014), September 2014.

#### **Publications in National Conferences**

- **M. Verbeke**, *Fast determination of instability in a non-linear Clock and Data Recovery circuit*, 16th FEA PhD symposium, December 9, 2015, Ghent, Belgium.

#### **Chapters in Books**

- A. Vyncke, G. Torfs, **M. Verbeke**, C. Van Praet, H. Chow, D. Suvakovic, A. Duque, and X. Yin, *Design and measurement of VCOs for 40 Gbit/s Cascaded Bit-Interleaving PON*, 1st International IEEE Conference on Advances in Wireless and Optical Communications 2015, Riga, Latvia, November 2015. Riga: RTU Press, 2015, pp. 91-104. ISBN 978-9934-10-758-0

### Part I

## Introduction to Internet Communication and CDRs

# Introduction

#### **1.1 Evolution of Data Consumption**

#### **1.1.1 Internet Traffic**

In 1995, less than 1% of the world population had an internet connection. Since then, the number of internet users has increased tremendously: from 1999 to 2013, it increased tenfold, passing the first billion in 2005. Today, there are over 3 billion internet users (Fig. 1.1), which corresponds to around 40% of the world population [1].

Over the last decades, not only the number of internet users, but also the data traffic and the internet access speed have increased enormously, and they are still growing. The global average broadband speed is expected to nearly double from 2016 to 2021 (from 27.5 Mb/s to 53.0 Mb/s) [2]. Broadband speed is a crucial enabler of internet traffic, because speed improvements lead to increased consumption and to the use of high-bandwidth content and applications.

Fig. 1.2 shows a prediction of the evolution of global internet traffic by application. Video generates, and will continue to generate, the largest share of global internet traffic. The enormous demand for video in today's homes is illustrated by the evolution of the number of Netflix subscribers (Fig. 1.3). In addition, new bandwidth-intensive functionalities such as virtual reality, augmented reality, immersive video and video surveillance are emerging. These applications consume a lot of bandwidth and can have significant implications for network design. For example, traffic associated with virtual and augmented reality applications is anticipated to grow 20-fold by 2021, while video surveillance traffic is expected to grow 15-fold by 2021 [2].

Figure 1.1: The number of internet users from 2001 until 2017 [3].

Figure 1.2: The future global internet traffic by application [2].

This growth in traffic is further amplified by the significant bandwidth requirements of future video applications, such as Ultra-High-Definition (UHD) streaming, 8K wall TV and UHD virtual reality. Fig. 1.4 explores a scenario with such future video applications: today's bandwidth needs are only a tiny fraction of future needs [2]. For example, the bit rate of an 8K wall TV, at about 100 Mb/s, is only one fifth of the bit rate needed for UHD virtual reality (VR). It is estimated that by 2021 more than half (56 %) of the installed flat-panel TV sets will be UHD, compared to 15 % in 2016 [2]. In total, all forms of IP video, which include internet video, IP VoD, video files exchanged through file sharing, video-streamed gaming and video conferencing, will continue to account for 80 to 90 % of the total internet traffic [2].

Besides internet video, there are some emerging contributors to future data consumption (e.g. gaming, file sharing and web/data) that do not account for a large relative share of future global internet traffic (Fig. 1.2). However, the absolute amount of internet traffic they produce is increasing rapidly.

For the case of internet gaming, the traffic will grow nearly tenfold between 2016 and 2021. Gaming on demand and streaming gaming platforms have been in development for several years, with many newly released in the last couple of years. While graphical processing is performed locally on the gamer's computer or console for traditional gaming, the game graphics for cloud gaming are produced on a remote server and transmitted over the network to the gamer. As cloud gaming becomes popular, gaming could have an increasing impact on the future internet traffic [2].

As social networking is one of the most popular ways for online users to spend their time, the number of social network users is naturally increasing as well. Fig. 1.5 illustrates the spectacular growth in Monthly Active Users (MAUs) of the Facebook platforms Instagram, Facebook, WhatsApp and Facebook Messenger since their launch. Although Facebook is the absolute market leader in terms of monthly active users, other social networks have thrived nonetheless. Some social networks, such as LinkedIn, have specialized in professional networking, whereas others, such as the Chinese-language Qzone or Renren, support huge local audiences [5].



Figure 1.3: The evolution of the number of Netflix subscribers and of the quarterly revenue, 1999-2017 [4].



Figure 1.4: The current and future video requirements [2].

Currently, there are more than 1.6 billion social network users worldwide (about 64% of internet users) and these figures are expected to grow [5]. A wide selection of social networks also heavily relies on user-generated content<sup>1</sup>, increasing the need for more bandwidth.



Figure 1.5: The evolution of the Facebook platform Monthly Active Users (MAUs) as a function of the number of months since launch [4].

#### 1.1.2 Power Consumption

Along with this evolution of internet traffic, the cost and especially the power consumption of the enabling electronic circuits are important aspects. The global Information and Communications Technology (ICT) industry accounts for approximately 2% of global carbon dioxide (CO<sub>2</sub>) emissions, the same share as the global airline industry [6, 7]. Additionally, the share of ICT in the total worldwide electricity consumption has increased from about 3.9% in 2007 to 4.6% in 2012 [7].

Fig. 1.6 highlights the importance of networks in the electricity consumption of ICT: the total worldwide electricity consumption of communication networks has increased from 219 TWh per year in 2007 to 354 TWh per year in 2012 [8]. This corresponds to an annual growth of about 10%. Comparing this to the total worldwide electricity consumption [9] shows that the share of networks is becoming increasingly important: whereas communication networks consumed only about 1.3% of the worldwide electricity in 2007, their relative contribution had increased to 1.8% by 2012 [8]. The electricity consumption of communication networks is thus growing at a faster pace ( $\approx 10\%$ per year in the interval 2007-2011) than the overall electricity consumption ( $\approx 3\%$ per year in the interval 2007-2011) [10]. Given these results, and given that data rates and subscription numbers will most likely continue to grow in the coming years, coping with the increasing demand is both essential and extremely challenging for the industry [11]. Therefore, the energy efficiency of communication networks has received a lot of research attention in recent years, leading to advances in circuit architecture, interconnect topologies and transistor scaling [11].

<sup>1</sup>Image-heavy Tumblr, Instagram and Pinterest are focused on content creation.



Figure 1.6: The Compound Annual Growth Rate (CAGR) of the electricity consumption in ICT compared to the total worldwide electricity consumption. Networks is the fastest growing category [12].
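The growth figures quoted above can be checked with the standard compound-annual-growth-rate formula, $\text{CAGR} = (E_\text{end}/E_\text{start})^{1/n} - 1$. The minimal sketch below uses only the numbers from the text; the ≈3% annual growth of the overall electricity consumption [10] is used as an assumption to propagate the 2007 network share of 1.3% forward by five years.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values that are `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0

# Electricity consumption of communication networks [8], in TWh per year
NETWORKS_2007_TWH = 219.0
NETWORKS_2012_TWH = 354.0
YEARS = 2012 - 2007

growth = cagr(NETWORKS_2007_TWH, NETWORKS_2012_TWH, YEARS)

# Network share of worldwide electricity: start from 1.3 % in 2007 [8] and assume
# that the overall consumption grows by about 3 % per year [10]
OVERALL_GROWTH = 0.03
share_2012 = 0.013 * ((1 + growth) / (1 + OVERALL_GROWTH)) ** YEARS

print(f"Network electricity CAGR 2007-2012: {growth:.1%}")
print(f"Projected network share in 2012: {share_2012:.1%}")
```

This yields a growth of about 10.1% per year and a projected 2012 share of about 1.8%, consistent with the figures reported in [8].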

#### **1.2 Networks Today**

Modern telecommunication networks are constructed as a three-level hierarchical network (Fig. 1.7), where each level is called a tier [13]. Each tier can roughly be identified with a geographical entity. The core tier, which typically uses a mesh topology, is responsible for interconnecting continents and countries, spanning distances ranging from hundreds to thousands of kilometers.



Figure 1.7: The modern telecommunication network hierarchy [14].

The metro tier corresponds to a metropolitan area, and can roughly be seen as the area a large city covers. Metro networks consist of a ring topology interconnecting several central offices over tens to hundreds of kilometers.

The access tier is the lowest level and provides connectivity to the end-user. Access networks are designed to operate over distances of a few kilometers to tens of kilometers. Unlike the core and metro tiers, the access tier is deployed in a variety of configurations, such as bus, star or ring topologies.

Due to the hierarchical construction of a modern telecommunication network, the number of network devices deployed in the access tier far exceeds the number deployed in the core tier. Table 1.1 shows a projection of the energy consumption for 2015-2020, split into core, metro and access network and customer premises (home) equipment, assuming a business-as-usual scenario. It shows that small power consumption reductions in access network devices potentially have a much bigger impact on the complete system than large reductions in power-hungry core network devices.
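The BAU energy column of Table 1.1 follows from multiplying the power per device by the number of devices and by the number of hours in a year. The minimal sketch below assumes continuous operation (8760 h per year) and reproduces that column from the table values. It also makes the argument concrete: a modest 10% power saving per access device already saves more energy than the entire core network consumes.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h, assuming continuous operation (no leap year)

# (power per device in W, number of devices), taken from Table 1.1 [15]
segments = {
    "Home": (10, 17_500_000),
    "Access": (1_280, 27_344),
    "Metro": (6_000, 1_750),
    "Core": (10_000, 175),
}

def annual_energy_gwh(power_w: float, devices: int) -> float:
    """Yearly energy in GWh for `devices` units that each draw `power_w` continuously."""
    return power_w * devices * HOURS_PER_YEAR / 1e9  # Wh -> GWh

energy = {name: annual_energy_gwh(p, n) for name, (p, n) in segments.items()}
total = sum(energy.values())
for name, e in energy.items():
    print(f"{name:>6}: {e:7.0f} GWh/yr ({e / total:5.1%} of total)")

saving_access = 0.10 * energy["Access"]  # 10 % lower power per access device
print(f"10 % saving in access: {saving_access:.1f} GWh/yr, "
      f"entire core: {energy['Core']:.1f} GWh/yr")
```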

#### 1.2.1 Optical Access Networks

While the core and metro networks use optical fiber, access networks have historically used copper cables, because the twisted-pair telephone network and the coaxial cable TV networks were readily available when the access networks had to be deployed [14, 16]. However, as the bandwidth demand has accelerated and as electrical channel impairments become increasingly severe with rising data rates, optical interconnects have become an increasingly attractive alternative to traditional electrical wireline interconnects in access networks. The advantages of optical interconnects are summarized in Table 1.2.

| Network segment | Power consumption (W/device) | Number of devices | BAU energy (GWh/yr) |
|-----------------|------------------------------|-------------------|---------------------|
| Home            | 10                           | 17 500 000        | 1 533               |
| Access          | 1 280                        | 27 344            | 307                 |
| Metro           | 6 000                        | 1 750             | 92                  |
| Core            | 10 000                       | 175               | 15                  |

Table 1.1: An Italian network forecast (2015-2020): device density and energy requirements in the Business-As-Usual (BAU) case [15].

A Fibre-To-The-Home (FTTH) infrastructure, in which optical fiber is deployed to every subscriber's premises, is thus highly desirable [17]. Despite all its technological advantages, converting a legacy copper access network to an all-optical access network requires an enormous investment [13], which explains why this topology has not yet been deployed globally.

However, the limited distances supported by the advanced transmission techniques currently used over copper have already forced service providers to adopt an intermediate solution: Fibre-To-The-Curb (FTTC). FTTC is a hybrid fiber access solution that routes fiber to a street cabinet close to the end-users, without requiring fiber to be installed to every single end-user. In an FTTC scenario, the connection from the cabinet to the end-user, which is still copper, is known as the last mile. Converting this last mile to optical fiber is very expensive, owing to the cost of the required civil works [14].

Although this requires a considerable investment, more and more service providers recognize that it would turn their current situation of bandwidth scarcity into one of bandwidth abundance, which enables long-term growth and creates potential for additional services on the network.

#### **1.2.2 Passive Optical Network**

The legacy copper network can be replaced by an optical access network, for which several topologies are possible. The central office can be connected to each subscriber by a point-to-point fiber, but this has the drawback of a high cost due to the large number of fibers needed. To reduce the fiber count, a star topology can be used instead: a splitting point, known as the remote node, is placed close to the end subscribers and is connected to the central office.

This splitting point can be implemented using an active node that incorporates one transceiver per customer. Such a node has to be powered and maintained, and its future-proofness is not guaranteed.

| Property                | Advantage |
|-------------------------|-----------|
| Size                    | The total diameter of an optical fiber (core, cladding and protection jacket) measures about $400 \mu\text{m}$, a significant reduction from the $6 \text{mm}$ diameter of coaxial cable. This is advantageous in cramped conduits in buildings and in underground layouts. |
| Weight                  | Due to the mass density difference and the smaller size, optical fiber yields a 10 to $30\%$ weight reduction compared to copper cable. |
| Bandwidth               | Fiber has a very high bandwidth: data rates over 100 Tbit/s across a single standard single-mode fiber have been demonstrated experimentally [18]. |
| Loss                    | Optical fiber has an attenuation of less than $0.2 \mathrm{dB/km}$ at $1550 \mathrm{nm}$, enabling transmission over several tens of kilometers without amplification. |
| Electrical interference | Since light is used, electromagnetic fields have no influence on the transmission, making fiber ideal in environments where strong electromagnetic fields are present. |
| Crosstalk               | Very little light escapes an optical fiber, resulting in very good crosstalk characteristics. |
| Environmental           | Flammable or explosive environments pose no issue, since optical fiber never generates sparks. |
| Material availability   | Copper is mined and scarce, whereas silica is composed of oxygen and silicon, both of which are abundantly available. |
| Multiplexing            | A single fiber supports multiplexing of many wavelengths, increasing the potential data rate. |

Table 1.2: The advantages of optical fiber [16].



Figure 1.8: A Passive Optical Network [14].

When the splitter is implemented passively, the network is called a Passive Optical Network (PON). A PON has a point-to-multipoint topology based on single-mode fiber, in which a number of Optical Network Units (ONUs) at the subscriber side are connected to the Optical Line Terminal (OLT) at the central office through passive splitters (Fig. 1.8) [13]. The passive infrastructure benefits from low installation and maintenance costs and reduces the power consumption, since no power supplies are needed at the remote node. PONs are therefore considered to be the most energy-efficient network architecture for broadband fiber access [19]. Additionally, upgrading to higher bit rates requires new electronics in the central office and at the customer premises, but nothing needs to be upgraded in the outside plant, as the passive splitters are insensitive to the PON speed. The network can thus be flexibly upgraded as new technologies mature or new standards emerge [20]. Today, the dominant technology for optical access networks is the power-splitter-based PON [21].
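A passive splitter needs no power supply, but it does divide the optical power among the ONUs. As a rough illustration, the sketch below combines the fiber attenuation of Table 1.2 (0.2 dB/km, the 1550 nm figure, used here for simplicity) with the loss of an ideal 1:N power splitter, $10\log_{10}N$ dB. The ideal splitter (no excess or connector loss), the 20 km reach, the 1:32 split ratio and the 3 dBm launch power are assumptions chosen for illustration only, not values taken from this work.

```python
import math

FIBER_LOSS_DB_PER_KM = 0.2  # attenuation at 1550 nm (Table 1.2)

def ideal_splitter_loss_db(split_ratio: int) -> float:
    """Loss of an ideal 1:N power splitter: each output receives 1/N of the input power."""
    return 10 * math.log10(split_ratio)

def path_loss_db(length_km: float, split_ratio: int) -> float:
    """OLT-to-ONU loss: fiber attenuation plus ideal splitting loss (no excess loss)."""
    return FIBER_LOSS_DB_PER_KM * length_km + ideal_splitter_loss_db(split_ratio)

# Hypothetical example link: 20 km reach, 1:32 split, 3 dBm launch power
LAUNCH_DBM = 3.0
loss = path_loss_db(length_km=20, split_ratio=32)
print(f"Path loss: {loss:.1f} dB, received power: {LAUNCH_DBM - loss:.1f} dBm")
```

Under these assumptions, the 1:32 split (about 15 dB) dominates the roughly 4 dB of fiber loss, for a total of about 19 dB.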

The standard PON operates in "single-wavelength mode", where one wavelength is used for downstream transmission and a separate one for upstream transmission. Since the section between the OLT and the first splitter, known as the feeder section, is shared by all subscribers, a multiplexing technique is needed to allow the signals from and to different subscribers to co-exist. Time division multiplexing is the most common variant. In this scheme, a single OLT broadcasts all downstream traffic to every ONU in the link through a power splitter at the remote node (Fig. 1.9(a)). Each ONU extracts its own packets and discards all others. For the upstream communication (Fig. 1.9(b)), the OLT dynamically allocates specific time slots to active subscribers, during which an ONU can transmit data. In this way, the packets are time-interleaved at the splitter and each ONU can transmit at the full upstream bandwidth for the duration of its time slot.
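The time division multiplexing scheme described above can be sketched as a toy model. The `Frame` type, its `onu_id` field and the proportional slot allocation are hypothetical and much simpler than the dynamic bandwidth allocation defined by actual PON standards; the 125 µs cycle length is likewise an arbitrary example value.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    onu_id: int      # destination ONU (hypothetical addressing field)
    payload: bytes

def onu_receive(onu_id: int, broadcast: list[Frame]) -> list[bytes]:
    """Downstream: every ONU sees all broadcast frames and keeps only its own."""
    return [f.payload for f in broadcast if f.onu_id == onu_id]

def allocate_slots(requests: dict[int, int], cycle_us: float) -> dict[int, tuple[float, float]]:
    """Upstream (toy allocation): split one cycle into consecutive, non-overlapping
    (start, end) slots, proportional to the amount of data each ONU requests."""
    total = sum(requests.values())
    slots, t = {}, 0.0
    for onu_id, req in requests.items():
        if req == 0:
            continue  # idle ONUs get no slot
        length = cycle_us * req / total
        slots[onu_id] = (t, t + length)
        t += length
    return slots

frames = [Frame(1, b"a"), Frame(2, b"b"), Frame(1, b"c")]
print(onu_receive(1, frames))                              # [b'a', b'c']
print(allocate_slots({1: 300, 2: 100, 3: 0}, cycle_us=125.0))
# {1: (0.0, 93.75), 2: (93.75, 125.0)}
```

Because the slots never overlap, the upstream bursts arrive time-interleaved at the splitter, and during its own slot each ONU uses the full upstream line rate.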



Figure 1.9: The communication in PON: (a) downstream and (b) upstream.

Until recently, the evolution of PON has basically been a matter of increasing the data rate [21]. However, to accommodate the growing demand for access bandwidth, especially for applications beyond FTTH (e.g. business services and 5G X-hauling), both IEEE and ITU-T have recently started to investigate the roadmap for future passive optical networks [22, 23]. The IEEE P802.3ca 100G Ethernet PON Task Force [24] was founded in 2016 to specify the physical-layer parameters for 25/50/100 Gb/s Next-Generation Ethernet PON (NG-EPON), targeting a 100 Gb/s PON capacity that is expected around 2025 [22]. NG-EPON is expected to use 2 and 4 wavelengths, each carrying 25 Gb/s, to achieve data rates of up to 50 Gb/s and 100 Gb/s, respectively [23].

#### **1.2.3 Optical Receiver**

In a PON, information is transmitted on an optical carrier using a certain modulation format. The modulation format typically used with limiting receivers is Non-Return-to-Zero (NRZ), which is also used in this work. In practice, the transmitted bit stream is degraded by numerous parasitic effects in the fiber. The function of a receiver is to recover the information embedded in the received signal. This is done in three steps: re-amplification, re-shaping and re-timing [25].



Figure 1.10: A block diagram of an optical receiver.

A receiver comprises four main building blocks: a photodetector, a TransImpedance Amplifier (TIA), a limiting amplifier and a Clock and Data Recovery (CDR) circuit (Fig. 1.10). A photodetector (e.g. a PIN diode) linearly converts the incident optical power to a current, which is amplified and converted to a voltage by the TIA. The limiting amplifier sets the decision level and amplifies the incoming signal to logic levels. The TIA and the limiting amplifier thus perform the re-amplification and re-shaping [26]. Finally, the receiver needs a circuit that extracts the (digital) data and the precise timing from the degraded (analog) received waveform, i.e. that performs the re-timing. This function is called Clock and Data Recovery (CDR). From this point on, the signal is once again truly digital.
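The signal chain of Fig. 1.10 can be modeled, per bit, as a chain of simple functions. The sketch below is idealized, and all component values (photodiode responsivity, TIA transimpedance, optical power levels) are assumptions for illustration. It also sidesteps the timing problem: the waveform is evaluated exactly once per bit, whereas in a real receiver choosing these sampling instants is precisely the task of the CDR.

```python
RESPONSIVITY_A_PER_W = 0.8          # assumed PIN photodiode responsivity
TRANSIMPEDANCE_OHM = 1_000.0        # assumed TIA transimpedance gain
P_ONE_W, P_ZERO_W = 100e-6, 10e-6   # assumed optical power of an NRZ '1' and '0'

def photodiode(p_opt_w: float) -> float:
    """Linear conversion of incident optical power (W) to photocurrent (A)."""
    return RESPONSIVITY_A_PER_W * p_opt_w

def tia(i_a: float) -> float:
    """Conversion of photocurrent (A) to voltage (V)."""
    return TRANSIMPEDANCE_OHM * i_a

def limiting_amp(v: float, threshold_v: float) -> int:
    """Ideal limiter: amplify to a logic level around the decision threshold."""
    return 1 if v > threshold_v else 0

bits = [1, 0, 1, 1, 0]
voltages = [tia(photodiode(P_ONE_W if b else P_ZERO_W)) for b in bits]
threshold = (max(voltages) + min(voltages)) / 2  # decision level halfway between the levels
logic = [limiting_amp(v, threshold) for v in voltages]
print(voltages, logic)
```

With these values, the TIA output swings between about 8 mV and 80 mV; the limiting amplifier turns this into clean logic levels but, as noted, does not retime the data.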

A CDR is thus a major part of an optical receiver. This is also reflected in the power consumption: approximately 50% of the power consumption in an optical receiver is due to the CDR [27–33]. Therefore, this work will focus on the improvement and optimization of this important building block.

#### **1.3 Objective of this Work**

The previous sections of this chapter have introduced the context in which the research leading to this dissertation was conducted. First, the tremendous increase in data traffic can only be sustained by communication networks that support higher line rates. Second, the power consumption of communication networks is growing in importance and can no longer be ignored. Therefore, next-generation networks will have to adopt low-power solutions.

Furthermore, CDR circuits were introduced as part of an optical receiver in a PON access network. These CDR circuits are currently implemented with bulky and power-hungry analog sub-blocks. In this work, we investigate how most of such a CDR can be implemented in the digital domain, because innovative digital CDR techniques are key to drastically reducing the power consumption of future fiber-optic systems.

In particular, this dissertation comprises the research conducted by the author on All-Digital Clock and Data Recovery (AD-CDR) techniques, which are presented as an answer to the various challenges that next-generation networks are facing. This work covers multiple areas, ranging from top-level analysis and low-level architecture research to the design, implementation and verification of an AD-CDR Application Specific Integrated Circuit (ASIC). Compared to competing work, this AD-CDR achieves the best power efficiency and occupies the smallest area. Moreover, the developed AD-CDR techniques resulted in four first-authored publications in international journals and two first-authored presentations at international conferences.

This work was supported by the Agency for Innovation by Science and Technology in Flanders (IWT), the Hercules project VeRONICa for the chip fabrication and the Hercules project AUGE/13/01 for the measurement equipment.

Additionally, the author contributed to several projects in the domain of access networks, i.e. DISCUS [34] and GreenTouch [35]. The author's main contributions to these projects were the implementation and the verification by simulation of building blocks of the CAB-INET ASIC [14]. These contributions, however, are outside the scope of this dissertation.

#### **1.4 Overview of the Dissertation**

This chapter has described the context in which this research was performed, providing background information on the evolution of internet traffic and data consumption. Furthermore, PONs and the building blocks of an optical receiver (including the CDR circuit) have been introduced.

In Chapter 2, the CDR concept is discussed in detail, and the issues and solutions regarding its future implementation in low-power networks are highlighted. Chapter 3 describes the analysis of non-linear and digital CDR circuits. Subsequently, Chapter 4 elaborates on the design of the CDR. The implementation is described in Chapter 5 and the measurement results are given in Chapter 6. Finally, Chapter 7 concludes the dissertation with a summary of the most important results and a discussion of opportunities for future research.

#### References

- [1] Internet Live Stats, "Internet Usage & Social Media Statistics," 2017. Available on URL: http://www.internetlivestats.com/
- [2] CISCO, "The Zettabyte Era: Trends and Analysis," 2015. Available on URL: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html
- [3] International Telecommunication Union, "ICT: Facts and figures 2017," pp. 1–8, 2017. Available on URL: http://www.itu.int/en/ITU-D/Statistics/Pages/facts/default.aspx
- [4] Mary Meeker, "Internet Trends 2017," in Code Conference. Kleiner Perkins, 2017. Available on URL: http://www.kpcb.com/internet-trends
- [5] Statista, "Social Media and User-Generated Content," 2017. Available on URL: https://www.statista.com/markets/424/topic/540/social-media-user-generated-content/
- [6] Gartner, "Gartner Estimates ICT Industry Accounts for 2 Percent of Global CO<sub>2</sub> Emissions," 2007. Available on URL: http://www.gartner.com/newsroom/id/503867
- [7] Sofie Lambert, "Energieverbruik en besparingsstrategieën in telecommunicatienetwerken Energy Consumption and Energy-Saving Strategies in Telecommunication Networks," Ph.D. dissertation, Ghent University, 2016.
- [8] Sofie Lambert, Ward Van Heddeghem, Willem Vereecken, Bart Lannoo, Didier Colle, and Mario Pickavet, "Worldwide electricity consumption of communication networks," *Optics Express*, vol. 20, no. 26, pp. B513–B524, dec 2012.

- [9] Enerdata, "World Electricity Statistics - Electricity Production Data," 2017. Available on URL: https://yearbook.enerdata.net/electricity/world-electricity-production-statistics.html
- [10] P Bertoldi, B Hirl, and N Labanca, "Energy efficiency status report 2012," European Commission, Tech. Rep., 2012.
- [11] ISSCC, "Press Kit 2017," 2017. Available on URL: http://isscc.org/about-isscc/press-kit/
- [12] Ward Van Heddeghem, Sofie Lambert, Bart Lannoo, Didier Colle, Mario Pickavet, and Piet Demeester, "Trends in worldwide ICT electricity consumption from 2007 to 2012," *Computer Communications*, vol. 50, pp. 64–76, 2014.
- [13] Cedric F Lam, *Passive optical networks : principles and practice*. Elsevier/Academic Press, 2007.
- [14] Arno Vyncke, "A low power, multi-rate clock-and-data recovery circuit and MAC preprocessor for 40 Gbit/s cascaded bit-interleaving passive optical networks," Ph.D. dissertation, Ghent University, 2016.
- [15] Raffaele Bolla, Franco Davoli, Roberto Bruschi, Ken Christensen, Flavio Cucchietti, and Suresh Singh, "The potential impact of green technologies in next-generation wireline networks: Is there room for energy saving optimization?" *IEEE Communications Magazine*, vol. 49, no. 8, pp. 80–86, aug 2011.
- [16] Renato Vaernewyck, *High-speed low-power modulator driver arrays for medium-reach optical networks*, 2014.
- [17] Eileen Connolly Bull, "FTTH Handbook, Edition 6," Tech. Rep., 2014.
- [18] Dayou Qian, Ming-Fang Huang, Ezra Ip, Yue-Kai Huang, Yin Shao, Junqiang Hu, and Ting Wang, "High Capacity/Spectral Efficiency 101.7-Tb/s WDM Transmission Using PDM-128QAM-OFDM Over 165-km SSMF Within C- and L-Bands," *Journal of Lightwave Technology*, vol. 30, no. 10, pp. 1540–1548, may 2012. Available on URL: http://ieeexplore.ieee.org/document/6158565/
- [19] Raffaele Bolla, Roberto Bruschi, Franco Davoli, and Flavio Cucchietti, "Energy efficiency in the future internet: A survey of existing approaches and trends in energy-aware fixed network infrastructures," *IEEE Communications Surveys and Tutorials*, vol. 13, no. 2, pp. 223–244, 2011.

- [20] Cedric Mélange, "Burst Mode Clock and Data Recovery in Long Reach Passive Optical Networks." Ph.D. dissertation, Ghent University, 2010.
- [21] Frank J. Effenberger, "Industrial Trends and Roadmap of Access," *Journal of Lightwave Technology*, vol. 35, no. 5, pp. 1142–1146, mar 2017.
- [22] Derek Nesset, "PON Roadmap [Invited]," *Journal of Optical Communications and Networking*, vol. 9, no. 1, pp. A71–A76, jan 2017.
- [23] Curtis Knittle, "IEEE 100 Gb/s EPON," Optical Fiber Communication Conference 2016, pp. 2014–2016, 2016.
- [24] IEEE, "IEEE P802.3ca 100G-EPON Task Force," 2017. Available on URL: http://www.ieee802.org/3/ca/index.shtml
- [25] Xing-Zhi Qiu, Xin Yin, Jochen Verbrugghe, Bart Moeneclaey, Arno Vyncke, Christophe Van Praet, Guy Torfs *et al.*, "Fast Synchronization 3R Burst-Mode Receivers for Passive Optical Networks," *Journal of Lightwave Technology*, vol. 32, no. 4, pp. 644–659, feb 2014.
- [26] Jochen Verbrugghe, "Design of Event-Driven Automatic Gain Control and High-Speed Data Path for Multichannel Optical Receiver Arrays," Ph.D. dissertation, Ghent University, 2015.
- [27] Cecilia Gimeno, Carlos Sanchez-Azqueta, Erick Guerrero, Javier Aguirre, Concepcion Aldea, and Santiago Celma, "Single-chip receiver for 1.25 Gb/s over 50-m SI-POF," *IEEE Photonics Technology Letters*, vol. 27, no. 11, pp. 1220–1223, jun 2015.
- [28] Chih-Fan Liao and Shen-Iuan Liu, "40 Gb/s Transimpedance-AGC Amplifier and CDR Circuit for Broadband Data Receivers in 90 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 3, pp. 642–655, mar 2008.
- [29] Yingmei Chen, Zhigong Wang, Xiangning Fan, Hui Wang, and Wei Li, "A 38 Gb/s to 43 Gb/s monolithic optical receiver in 65 nm CMOS technology," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 12, pp. 3173–3181, dec 2013.

- [30] Alexander Rylyakov, Jonathan E. Proesel, Sergey Rylov, Benjamin G. Lee, John F. Bulzacchelli, Abhijeet Ardey, Ben Parker *et al.*, "A 25 Gb/s Burst-Mode Receiver for Low Latency Photonic Switch Networks," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 12, pp. 3120–3132, dec 2015.
- [31] Ping Chuan Chiang, Jhih Yu Jiang, Hao Wei Hung, Chin Yang Wu, Gaun Sing Chen, and Jri Lee, "4×25 Gb/s transceiver with optical front-end for 100 GbE system in 65 nm CMOS technology," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 2, pp. 573–585, feb 2015.
- [32] Sang Hyeok Chu, Woorham Bae, Gyu Seob Jeong, Sungchun Jang, Sungwoo Kim, Jiho Joo, Gyungock Kim *et al.*, "A 22 to 26.5 Gb/s Optical Receiver with All-Digital Clock and Data Recovery in a 65 nm CMOS Process," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2603–2612, nov 2015.
- [33] Samuel Palermo, Azita Emami-Neyestanak, and Mark Horowitz, "A 90nm CMOS 16Gb/s Transceiver for Optical Interconnects," in 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. IEEE, feb 2007, pp. 44–586.
- [34] FP7 Grant Agreement 318137, "The DIStributed Core for unlimited bandwidth supply for all Users and Services," 2017. Available on URL: http://www.discus-fp7.eu/
- [35] GreenTouch Consortium, "GreenTouch," 2017. Available on URL: www.greentouch.org/

# Multi-Gigabit Clock and Data Recovery

This chapter introduces Clock and Data Recovery (CDR) circuits and highlights their importance. It also summarizes the concept of jitter, which, next to power consumption, area and bandwidth, is one of the most important performance measures of a CDR. Next, a brief overview of the different CDR types is given. For each type, the advantages and limitations are discussed, allowing an objective comparison between the types. A Phase Locked Loop (PLL)-based CDR proves to be the most favorable type for high-speed optical communication systems. Although this type still has some drawbacks, they can be overcome by using digital PLL techniques. In practice, however, these techniques are rarely implemented in a CDR, because several challenges still prevent them from reaching their full potential. These challenges are identified and solutions are proposed, which leads to a next generation of high-speed, low-power digital Clock and Data Recovery circuits.

#### 2.1 Introduction to CDRs

In a fiber-optic network, data (e.g. a Non-Return-to-Zero (NRZ) signal) is transmitted through an optical fiber to a receiver without any accompanying time reference [1, 2]. Sending a time reference (e.g. a clock signal) together with the data over the same channel would severely lower the spectral efficiency, and adding an extra channel just to send a clock would be overly expensive [3–5]. Consequently, the received waveform containing the data is asynchronous.

Furthermore, the quality of the received waveform is degraded by numerous parasitic effects, which are illustrated by the eye diagram in Fig. 2.1. These parasitic effects include amplitude variations (caused by InterSymbol Interference (ISI) and noise) and timing variations (i.e. deterministic and random jitter) of the received asynchronous signal. Both types of variation make it more difficult to correctly capture the transmitted data.



Figure 2.1: An eye diagram at the input of the decision circuit with ISI, noise and jitter.

To recover the transmitted digital data from this degraded and asynchronous waveform and to enable subsequent processing, the precise timing information (i.e. a clock signal) must be extracted from the waveform. This clock is used to sample the waveform at times $t_s$. The samples are then compared to a threshold value $\gamma$: all values above this threshold are mapped to a logic '1', while values below $\gamma$ are mapped to a logic '0'. The result is a digital, synchronous signal that allows further synchronous operations with the sampling clock as time reference.
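As a minimal sketch of this sampling and decision step, the Python snippet below models an ideal rectangular NRZ waveform with additive Gaussian amplitude noise, samples it once per bit at offset `t_s` and slices each sample against the threshold `gamma`. The waveform model, the oversampling factor and the helper names `nrz_waveform` and `sample_and_slice` are illustrative assumptions; ISI and timing jitter are deliberately left out.

```python
import random


def nrz_waveform(bits, samples_per_bit, noise_std, seed=0):
    """Ideal rectangular NRZ (levels 0.0 / 1.0) plus Gaussian amplitude noise (assumed model)."""
    rng = random.Random(seed)
    wave = []
    for bit in bits:
        level = 1.0 if bit else 0.0
        wave.extend(level + rng.gauss(0.0, noise_std) for _ in range(samples_per_bit))
    return wave


def sample_and_slice(wave, samples_per_bit, t_s, gamma):
    """Take one sample per bit at offset t_s (in samples) and compare it with gamma."""
    bits = []
    for start in range(0, len(wave) - samples_per_bit + 1, samples_per_bit):
        value = wave[start + t_s]
        bits.append(1 if value > gamma else 0)  # above gamma -> '1', otherwise '0'
    return bits


tx = [1, 0, 1, 1, 0, 0, 1, 0]
rx = sample_and_slice(nrz_waveform(tx, 8, 0.05), samples_per_bit=8, t_s=4, gamma=0.5)
```

With the clock aligned to the middle of each bit (`t_s = 4` out of 8 samples per bit) and a noise standard deviation far below the 0.5 decision margin, the sliced bits reproduce the transmitted sequence; raising `noise_std` towards the decision margin makes decision errors appear.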

Ideally, the obtained output would not contain any timing variation or bit errors. However, because the sampling clock is imperfect, some minor timing variations remain. Moreover, bit errors can occur due to the stochastic nature of the amplitude noise and ISI at the input, or when the sampling instant lies close to a data edge.

To successfully retime the data and reduce the timing variation and bit errors, the generated clock must satisfy three basic conditions [1]:

- The clock signal frequency must be consistent with the data rate: e.g. for a data rate of 25 Gb/s (each bit 40 ps wide), a clock frequency of 25 GHz (= period of 40 ps) is required.
- The generated clock signal has to exhibit very low timing variations because any timing variation will add directly to the recovered data signal.
- There has to be a well-defined phase relationship between the data and the clock signal to allow optimum sampling of the bits. This optimal sampling point $t_s$ lies farthest from the preceding and following data transitions, and thus corresponds to the middle of a bit. Therefore, the generated clock signal has to follow the timing variations of the input data.

Clock recovery and data recovery are both performed by a Clock and Data Recovery circuit, whose basic block diagram is shown in Fig. 2.2. A clock recovery circuit senses the received waveform and produces a periodic clock. A flip-flop driven by this clock then retimes the data: the noisy data is sampled in order to remove the noise and any timing variation accumulated during transmission. This operation is necessary in every high-speed broadband receiver, which makes a Clock and Data Recovery circuit an essential component of the data link. For this reason, the design and performance of a CDR circuit have a significant influence on the overall operation of a data link [6].

#### 2.2 Jitter and Wander

Timing variations of the received data degrade the performance of a data link: if the timing variations become large, errors are produced and the system can become inefficient. Even small timing variations reduce the noise margin of the system and make it more sensitive to errors [1, 4, 7].



Figure 2.2: A basic block diagram of a Clock and Data Recovery circuit.

This timing variation is defined as the deviation of the zero crossing<sup>1</sup> from its ideal position in time, or alternatively, as the deviation of each period from its ideal value. Slow timing variations (< 10 Hz) are called *wander*, while higher speed variations are described as *jitter* [7].

Fig. 2.3 displays the received jittered signal, viewed at different time instants $(t_1, \ldots, t_6)$, together with the original signal. Fig. 2.3 also illustrates the corresponding jitter function, i.e. the deviation of the zero crossings from their ideal positions at these instants (indicated by the arrows). In this illustration, the jitter function is a sinusoidal wave. Its magnitude represents the deviation of the zero crossing from its ideal position, while its frequency represents the rate at which this deviation varies. Note that a jitter function is not limited to a sinusoidal wave but can be any combination of deterministic and random signals.
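The two definitions of timing variation given above (deviation of each zero crossing from its ideal position, and deviation of each period from its ideal value) can be made concrete with a short sketch. It assumes a jitter function consisting of a single sinusoid, as in Fig. 2.3; the 25 Gb/s period, the 0.1 UI amplitude and the 250 MHz jitter frequency are example values only, and the function names are hypothetical.

```python
import math


def sj_edges(n, period, amp, f_jit):
    """Zero crossings at k*period, displaced by sinusoidal jitter amp*sin(2*pi*f_jit*t)."""
    return [k * period + amp * math.sin(2 * math.pi * f_jit * k * period) for k in range(n)]


def time_interval_error(edges, period):
    """Deviation of each zero crossing from its ideal position k*period."""
    return [t - k * period for k, t in enumerate(edges)]


def period_deviation(edges, period):
    """Deviation of each measured period from its ideal value."""
    return [(t1 - t0) - period for t0, t1 in zip(edges, edges[1:])]


T = 40e-12    # 1 UI at 25 Gb/s
A = 0.1 * T   # jitter amplitude of 0.1 UI (example value)
F_J = 250e6   # jitter frequency (example value)
edges = sj_edges(1000, T, A, F_J)
tie = time_interval_error(edges, T)
dev = period_deviation(edges, T)
```

For a sinusoidal jitter function with amplitude $A$ and frequency $f_j$, the deviation of the zero crossings spans $2A$ peak-to-peak, while the deviation of an individual period is bounded by $2A\sin(\pi f_j T)$. Slow timing variations therefore hardly change individual periods, even when the zero crossings drift considerably.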

#### 2.2.1 Jitter Specifications

CDR circuits targeting optical communication standards must satisfy stringent jitter specifications. Typically, multiple contradicting specifications have to be met simultaneously. For example, the generated clock should track the jitter of the input data signal in order to capture the data at the optimum sampling point. However, this can lead to jitter building up in a network when the recovered clock is used to generate and transmit (upstream) data, so in that respect the recovered clock should be stable and exhibit low jitter. In practice, the design of a clock recovery circuit is thus a compromise that depends on the intended system or communication standard.

<sup>1</sup>The zero crossing is defined as the time instant at which the signal crosses the threshold value $\gamma$ in Fig. 2.1.



Figure 2.3: Jitter: unwanted phase variations of a signal.

Jitter performance of transmission systems is mandated by standards written by agencies such as the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) and Telcordia (formerly Bellcore). In this work, the optical communication standard Synchronous Digital Hierarchy (SDH) (ITU-T G.783 and ITU-T G.825) [8, 9] is used as a guideline. Its jitter requirements are imposed on the jitter generation, the jitter transfer (including jitter peaking), the output jitter (a combination of jitter generation and jitter transfer) and the jitter tolerance.

**Jitter generation** describes how much jitter is intrinsically generated by a system at the output when no jitter is present at the input.

**Jitter transfer** is a measure of the amount of jitter that is transferred from the input to the output of a system as a function of frequency. This input jitter is a consequence of the jitter generated by each preceding device as the desired signal traverses a network. If this jitter is amplified as it passes through the network, it could exceed the level that subsequent equipment can tolerate while still processing the data correctly.
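To illustrate jitter transfer and jitter peaking, the sketch below evaluates the magnitude of the jitter transfer function of a linearized second-order type-II PLL, $H(s) = (2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$, a common textbook model. This model is an assumption here: the natural frequency `fn` and damping `zeta` are hypothetical, and a CDR with a bang-bang PD is non-linear and only approximately described by such a transfer function.

```python
import math


def jitter_transfer_mag(f, fn, zeta):
    """|H(j*2*pi*f)| for H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2)."""
    wn = 2 * math.pi * fn
    s = 2j * math.pi * f
    return abs((2 * zeta * wn * s + wn ** 2) / (s ** 2 + 2 * zeta * wn * s + wn ** 2))


def jitter_peaking_db(fn, zeta, f_lo=1e3, f_hi=1e10, points=2000):
    """Largest |H| found on a logarithmic frequency sweep, expressed in dB."""
    peak = max(
        jitter_transfer_mag(f_lo * (f_hi / f_lo) ** (i / (points - 1)), fn, zeta)
        for i in range(points)
    )
    return 20 * math.log10(peak)
```

Well below `fn` the input jitter passes unchanged ($|H| \approx 1$), while well above it the jitter is filtered out. For $\zeta \approx 0.707$ the peaking is about 2.1 dB; increasing the damping reduces the peaking, but in this type-II model it never vanishes completely.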

**Output jitter** is a measure of the jitter present at the output of a system. It combines the intrinsically generated jitter (jitter generation) with the jitter accumulated while traversing the network (jitter transfer). The output jitter is important if the recovered clock is reused to transmit data. In this case, the jitter generation and jitter transfer have to fulfill the requirements specified above to avoid excessive jitter insertion and jitter amplification in the network. However, if the clock is only used to sample and process the received data, the constraints on the jitter generation and jitter transfer can be relaxed. The most important requirement is then that the output jitter is low enough for the setup and hold constraints of the digital processing logic to always be met.

**Jitter tolerance** describes the resilience of the device to input jitter. Typically, the jitter tolerance is measured by generating a signal with added sinusoidal jitter and applying it to the Device Under Test (DUT). At each jitter frequency, the amplitude of the jitter is increased until transmission errors are detected. Although jitter is unlikely to be sinusoidal in realistic operating conditions, this measurement method is straightforward and gives consistent results. Therefore, different systems can easily be compared and system specifications can be defined in the form of a jitter tolerance mask. For SDH Synchronous Transport Module (STM)-256, Fig. 2.4 shows the corresponding jitter tolerance mask [9].
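The measurement procedure described above (sweep the jitter frequency and raise the sinusoidal jitter amplitude until errors appear) can be mimicked in a few lines. The device under test is replaced by a hypothetical model: a linearized second-order type-II loop that makes errors once the untracked phase error $A\,|1 - H(j2\pi f)|$ exceeds an assumed eye margin of 0.4 UI. The names `dut_has_errors` and `measure_tolerance` and all parameter values are illustrative and unrelated to the STM-256 mask of Fig. 2.4.

```python
import math


def error_transfer_mag(f, fn, zeta):
    """|1 - H(j*2*pi*f)| = |s^2 / (s^2 + 2*zeta*wn*s + wn^2)| of a type-II loop."""
    wn = 2 * math.pi * fn
    s = 2j * math.pi * f
    return abs(s ** 2 / (s ** 2 + 2 * zeta * wn * s + wn ** 2))


def dut_has_errors(amp_ui, f, fn=4e6, zeta=1.0, margin_ui=0.4):
    """Hypothetical DUT: errors once the untracked phase error exceeds the eye margin."""
    return amp_ui * error_transfer_mag(f, fn, zeta) > margin_ui


def measure_tolerance(freqs, step_ui=0.01, max_amp_ui=100.0):
    """Raise the sinusoidal jitter amplitude step by step until the DUT makes errors."""
    curve = []
    for f in freqs:
        k = 1
        while k * step_ui < max_amp_ui and not dut_has_errors(k * step_ui, f):
            k += 1
        # first failing amplitude (tolerance within one step), or the sweep limit
        curve.append((f, k * step_ui))
    return curve
```

The resulting curve has the shape of a tolerance mask: large amplitudes are tolerated at low jitter frequencies, where the loop tracks the jitter, while beyond the loop bandwidth the tolerance flattens out at the assumed eye margin.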



Figure 2.4: The jitter tolerance mask for SDH STM-256 [9].

#### 2.3 CDR Types

Over the last years, a growing number of high-speed electrical serial link applications has stimulated many researchers to produce a wide variety of CDR designs. An overview of these CDR designs is given in [10, 11] and each CDR topology can be classified into one of the following three main categories:

- Oversampling without feedback phase tracking
- Phase alignment without feedback phase tracking
- Feedback phase tracking

#### 2.3.1 Oversampling without Feedback Phase Tracking

This type of CDR circuit blindly samples the incoming data stream at a multiple of the data rate. Every clock cycle, it chooses the data sample that results in the minimal Bit-Error Rate (BER). This topology requires multiple clock phases, which run at the full data rate and are generated from a reference clock. Each clock phase triggers one of the parallel samplers once every clock cycle to capture the data. The reference clock does not need to be aligned with the received data edges; however, a fixed frequency relation between the reference clock and the transmitter clock is required to provide the necessary frequency information. Once the samples of the received signal are collected, a phase detection logic circuit recovers each bit and forwards the stream to the subsequent digital circuit.

Once the samples are collected, blind oversampling CDRs perform phase detection and data recovery purely in the digital domain, and therefore this approach offers several advantages over conventional analog-type CDRs. They are immune to noise and can be easily integrated with other digital logic functions in a single chip.
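The digital phase selection of a blind oversampling CDR can be sketched as follows. The snippet assumes 3x oversampled hard-decision samples, builds a histogram of where data transitions occur within the bit period, and selects the phase roughly half a bit away from the most frequent transition position. This is one simple selection rule chosen for illustration (the functions `select_phase` and `recover_bits` are hypothetical), not a description of a particular published design; a practical implementation would have to repeat the decision over time to follow slow phase drift.

```python
def select_phase(samples, osr):
    """Pick the oversampling phase farthest from the observed data transitions.

    samples: hard decisions taken at osr times the data rate (blind oversampling).
    """
    edge_hist = [0] * osr
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_hist[i % osr] += 1  # first sample after a transition has phase i % osr
    edge_phase = max(range(osr), key=lambda p: edge_hist[p])
    return (edge_phase + osr // 2) % osr  # roughly half a bit after the typical edge


def recover_bits(samples, osr):
    """Keep one sample per bit, taken at the selected phase."""
    phase = select_phase(samples, osr)
    return samples[phase::osr]


bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
# 3x oversampled stream whose bit boundaries are offset by one sample
samples = [b for b in bits for _ in range(3)][1:]
```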

However, blind oversampling CDRs also have disadvantages. Blindly oversampling the received signal first and picking the right sample later requires a much higher hardware cost than recovering the received timing first and sampling only at that position. Furthermore, a large oversampling rate may be needed for reliable phase detection, and processing such a large data set is costly in power and area. Consequently, oversampling architectures are among the most power-hungry CDR topologies [3] and are not suitable for the low-power solution pursued in this work. Due to their fast acquisition properties, however, they are often used in Burst Mode Receivers (BM-Rxs).

Unlike the oversampling CDR circuit type, the following two CDR types continuously adjust the recovered clock phase to the center of the data eye to sample the data at a single optimal point.

#### 2.3.2 Phase Alignment without Feedback Phase Tracking

When the incoming data signal has spectral energy at the clock frequency, a synchronous clock can be obtained simply by passing the incoming data through a band-pass filter tuned to the nominal frequency. Because of bandwidth restrictions, however, the incoming data signal in most signaling formats (e.g. an NRZ signal) has no spectral energy at the clock frequency, which complicates the clock recovery. Such signals must first undergo appropriate non-linear preprocessing (e.g. using a diode) before they are applied to the resonator. Timing circuits using a filter are the oldest solution for clock recovery, but this approach is outdated in many respects, and PLL circuits perform much better [12].

Another form of phase alignment without feedback phase tracking is found in Clock and Data Recovery circuits based on a Gated Voltage Controlled Oscillator (GVCO). These circuits can instantaneously<sup>2</sup> obtain phase lock from burst-mode NRZ signals. The GVCO oscillates at approximately the data frequency, and instantaneous phase locking is guaranteed by restarting the gated oscillator every time an input transition occurs. This method has been demonstrated to be precise enough to handle input data patterns with low transition densities, containing hundreds of Consecutive Identical Digits (CID), without errors. These circuits lock very quickly onto the input signal, are small, fast and low-power, and allow a high level of integration.

However, they do not provide any jitter rejection, since the phase of the output signal tracks all phase variations of the input [12]. Jitter rejection is necessary in Passive Optical Network (PON) transceivers, as they are expected to support both downstream and upstream communication. An Optical Network Unit (ONU) typically reuses the clock extracted from the downstream data traffic for the upstream transmission, which has a significant impact on the jitter rejection requirements of the clock extraction. Therefore, due to their lack of jitter rejection, phase alignment topologies without feedback phase tracking are not acceptable for use in ONUs [3]. This also rules out gated-oscillator-based CDR techniques.

#### 2.3.3 Feedback Phase Tracking

The last category is feedback phase tracking, which includes Phase Locked Loop (PLL), Delay Locked Loop (DLL), Phase Interpolator (PI) and Injection Locked (IL) Clock and Data Recovery topologies.

Phase Interpolator (PI) CDRs typically have a narrow frequency capture range and a small bandwidth, which has a negative effect on the jitter tolerance. To alleviate these drawbacks, a high-speed tunable clock reference (with a frequency on the order of the data rate) is required. In most cases, this high-speed, high-quality tunable clock reference is provided by an external source, which increases the system cost and makes this solution undesirable [11].

Injection Locked (IL) CDRs, on the contrary, have a very wide bandwidth. However, their capture range is typically limited to a fixed local operating frequency. Although the capture range is improved in [13], this increases the design complexity and introduces new issues. Moreover, both the PI and IL-CDR architectures require (as does the oversampling CDR discussed above) multiple full-rate clock phases [11], which results in a high power consumption; they are therefore ruled out.

<sup>2</sup>within one bit

Another popular CDR is the Delay Locked Loop (DLL) CDR. Its major drawback, however, is its limited phase acquisition range: DLLs cannot withstand low-frequency jitter or a frequency offset. Therefore, this type is only applicable to source-synchronous systems [11], in which the transmitter and receiver use the same clock source. A DLL by itself is thus not suitable for our application, where frequency synthesis is required. Nevertheless, dual-loop DLL/PLL topologies that support asynchronous systems exist. These architectures combine the advantages of DLLs and PLLs, providing fast acquisition while avoiding jitter peaking, but this comes at a price: the dual-loop nature significantly complicates the system analysis and raises stability concerns [4].

The last major group is formed by the PLL CDRs. These Clock and Data Recovery circuits intrinsically have a wide frequency capture range due to their ability to correct both phase and frequency. Additionally, they have a wide bandwidth and reject input jitter. On the downside, PLL-based CDRs tend to have a poor power efficiency.

Recently, digital PLL techniques have emerged in PLL-based CDRs. These techniques have many advantages over the conventional analog PLL-based CDR, including an improved power efficiency. For these reasons, a PLL-based CDR is chosen as the underlying structure of the implemented CDR. The next sections elaborate on the operation of a conventional PLL-based CDR and on the potential of digital techniques to enhance its performance.

#### 2.4 PLL-Based CDR Structure

Fig. 2.5 shows a conceptual diagram of a PLL-based CDR circuit, which is typically used for fiber-optic communication systems. This PLL basically consists of a Phase Detector (PD), a loop filter and a controlled oscillator. The PD measures the phase difference between the input data signal  $D_{in}$  and the recovered clock *Clk* from the controlled oscillator. This error signal is low-pass filtered by the loop filter and the resulting signal drives the controlled oscillator such that the phase error is reduced. This recovered (and phase locked) clock signal is then used to sample the received data at the ideal moment (i.e. the middle of a bit). In this way, the original transmitted data is recovered from the degraded input signal.

Note that this CDR structure does not require an external clock reference.
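The feedback mechanism of Fig. 2.5 can be illustrated with a per-bit phase-domain model. The sketch assumes a linear PD, a proportional-integral loop filter with hypothetical gains `kp` and `ki`, and an oscillator that turns the filter output into a phase increment per bit; phases are expressed in unit intervals (UI), and `simulate_pll_cdr` is a hypothetical helper. It shows why such a loop can absorb a frequency offset: with an integral path the phase error settles to zero, whereas a purely proportional loop (`ki = 0`) keeps a static phase error of `freq_offset_ui / kp`.

```python
def simulate_pll_cdr(freq_offset_ui, kp, ki, n_bits):
    """Per-bit phase-domain model of a PLL-based CDR (linear PD, PI loop filter).

    freq_offset_ui: extra input phase per bit caused by a mismatch between the
    data rate and the free-running oscillator frequency (assumed constant).
    """
    data_phase = clk_phase = integ = 0.0
    errors = []
    for _ in range(n_bits):
        data_phase += freq_offset_ui
        err = data_phase - clk_phase    # phase detector output
        integ += ki * err               # integral path: learns the frequency offset
        clk_phase += kp * err + integ   # oscillator: frequency correction -> phase
        errors.append(err)
    return errors, integ
```

For example, with a frequency offset of $10^{-3}$ UI per bit, `kp = 0.05` and `ki = 1e-3`, this loop settles within a few hundred bits to zero phase error, and the integrator state converges to the frequency offset.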



Figure 2.5: A block diagram of a PLL-based CDR circuit.

Because the clock is generated by the controlled oscillator, it can be recovered from data at any rate within the frequency range of the controlled oscillator. This also allows the CDR to absorb frequency drifts in the clock recovery circuit caused by variations in temperature and supply voltage.

An analog implementation of a PLL-based CDR is the charge pump Clock and Data Recovery circuit shown in Fig. 2.6. In this conventional implementation, the loop filter is based on an analog charge pump combined with an RC filter. Depending on the phase difference between the input data and the recovered clock, the PD activates the upper or lower current source of the charge pump, which increases or decreases the control voltage of the Voltage Controlled Oscillator (VCO), respectively. Consequently, the frequency of the recovered clock is increased or decreased to reduce the phase error. If no data transition occurs, the PD does not generate any signal (*Early* or *Late*) and the VCO is not adjusted.



Figure 2.6: A charge pump PLL-based CDR circuit.

#### 2.5 Evolution to Digital CDR

Today, CDR circuits for multi-gigabit fiber-optic communication are usually implemented with purely analog PLL techniques based on charge pump loop filters. Although the need for low cost and high integration mandates that the CDRs should be implemented in a deep-submicron technology, it is hard to achieve high performance for classical analog CDRs in today's modern technologies [14]. Therefore, digital CDRs will become increasingly important for high-speed data communication.

A digital CDR eliminates the need for the large loop filter capacitor used in classical analog CDRs. Instead, it uses a compact Digital Loop Filter (DLF), which can realize large time constants without any additional cost in area. Additionally, a DLF is tolerant to process, voltage and temperature variations and is insensitive to noise. The filter is also easily scalable, portable across CMOS technologies and highly adaptive. These properties make a digital CDR the optimal choice for a high-speed receiver implemented in a deep-submicron technology, and digital CDRs have been a major area of research interest in recent years [14–24].
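As an illustration of how a digital filter obtains a long time constant without a large capacitor, the sketch below implements a first-order low-pass filter $y[n] = y[n-1] + (x[n] - y[n-1])/2^k$ using only integer additions and shifts. This particular filter (and the name `shift_lowpass`) is an illustrative assumption; the DLF of a CDR typically consists of a proportional and an integral path. The point is that the time constant, roughly $2^k$ input samples, is set by a shift amount and a few extra register bits.

```python
def shift_lowpass(samples, shift, frac_bits=16):
    """First-order IIR low-pass y += (x - y) / 2**shift built from adds and shifts.

    The state keeps frac_bits extra fractional bits so that small updates are not lost;
    the time constant is roughly 2**shift input samples.
    """
    acc = 0
    out = []
    for x in samples:
        acc += ((x << frac_bits) - acc) >> shift
        out.append(acc >> frac_bits)  # integer output, fractional bits dropped
    return out


step_response = shift_lowpass([1000] * 20000, shift=10)
```

With `shift = 10`, a step input reaches about 63% of its final value after roughly 1024 samples; doubling the time constant costs one extra shift position instead of doubling a capacitor.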

#### 2.5.1 All-Digital CDR Structure

We focus on a subset of these digital CDRs, the so-called All-Digital Clock and Data Recovery (AD-CDR) circuits. AD-CDRs are derived from the first All-Digital Phase Locked Loop (AD-PLL) introduced in [25]. PLL-based CDR circuits have the advantage over alternative digital-friendly CDRs that they intrinsically have a wide frequency capture range, due to their ability to adapt both phase and frequency [13]. Additionally, they have a wide bandwidth and the ability to reject input jitter (as discussed in Section 2.3) [11].

The overall architecture of an AD-CDR is shown in Fig. 2.7. In addition to a DLF, it comprises a digital PD and a Digitally Controlled Oscillator (DCO) [14–16, 26–30]. The digital PD determines the phase difference between edges in the input data stream ($D_{in}$) and the recovered clock (Clk) signal. When the clock is leading the input data, a logic *Early* signal is generated to decrease the frequency of the recovered clock. Alternatively, when the clock is lagging, the digital PD outputs a logic *Late* signal to increase the frequency of the recovered clock. These *Early* and *Late* signals are digital signals that are filtered by the DLF. The resulting signal controls the DCO such that the phase error is reduced.

Note that if no data transition occurs, the digital PD cannot determine whether the clock leads or lags the data and therefore does not generate any signal. Consequently, the DCO is not adjusted.
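The loop of Fig. 2.7 can be sketched behaviorally in the phase domain. In this hedged model, the digital PD is an ideal bang-bang detector (+1 for *Late*, -1 for *Early*, 0 without a transition), the DLF is a proportional-integral filter with hypothetical gains `kp` and `ki`, and the DCO converts the filter output into a phase increment per bit. Data transitions occur randomly with an assumed density of 0.5, and the input has a small frequency offset. Because `pd = 0` without a transition, the DLF state and the DCO control word are then left unchanged, matching the behavior described above. The helpers `bang_bang_pd` and `simulate_ad_cdr` are illustrative only.

```python
import random


def bang_bang_pd(phase_error_ui, transition):
    """+1 = Late (clock lags), -1 = Early (clock leads), 0 = no data transition."""
    if not transition or phase_error_ui == 0.0:
        return 0
    return 1 if phase_error_ui > 0 else -1


def simulate_ad_cdr(n_bits, freq_offset_ui, kp, ki, transition_density=0.5, seed=1):
    """Per-bit phase-domain model: BB-PD -> PI digital loop filter -> DCO."""
    rng = random.Random(seed)
    data_phase = clk_phase = integ = 0.0
    errors = []
    for _ in range(n_bits):
        data_phase += freq_offset_ui
        err = data_phase - clk_phase
        pd = bang_bang_pd(err, rng.random() < transition_density)
        integ += ki * pd                # integral path: frequency estimate (held if pd == 0)
        clk_phase += kp * pd + integ    # DCO: proportional kick + frequency control word
        errors.append(err)
    return errors, integ
```

After acquisition, the phase error keeps dithering around zero, with excursions on the order of `kp` plus the drift accumulated between transitions, while the integral path holds the frequency offset.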



Figure 2.7: An All-Digital CDR.

#### 2.5.2 Advantages

Although (All-)Digital CDRs have to compete with mature analog techniques based on established PDs combined with charge pump loop filters, digital loop concepts originating from AD-PLLs have clear advantages. In fact, all the advantages of digital PLL concepts also apply to a multi-gigabit CDR, and they even appear to be more pronounced:

- Charge pump loop filters do not scale well towards ultra-deep-submicron CMOS technologies due to the leakage and the reduced output impedance of ultra-deep-submicron transistors. Moreover, these technologies have a reduced voltage headroom, which makes it difficult to drive a VCO. Additionally, because the signal swing shrinks, noise of a similar power has a relatively larger impact on the desired signal, which reduces the noise margin. Clearly, DLFs do not have this problem.
- Being a fully analog technique, charge pump loop filters are not easily ported to new technology nodes. In contrast, porting is easy for digital filters thanks to automated synthesis. Besides, digital filtering benefits from scaling, which lowers the power dissipation in each successive technology node.
- In analog CDRs, the charge pump leakage (caused by the low output impedance of a transistor) leads to an additional undesired phenomenon: when there is a large number of CID in the input data stream, the instantaneous gain of the PD becomes zero. When this happens, the PLL is effectively in an open-loop state, and the state of the charge pump "leaks" away, which causes the frequency of the oscillator to drift. With a digital implementation of the loop filter, no such leakage can occur, which solves this problem.

This is also a great advantage for burst mode applications, where there is no phase information available between bursts. Due to the digital nature, no frequency drift occurs between bursts. Therefore, the burst mode operation of the CDR does not require a precise reference clock and the CDR is able to achieve fast settling times.

- Constraints on the PLL dynamics and jitter performance often require a loop filter with a low cut-off frequency. Implementing this with analog techniques requires very large capacitors that cannot easily be co-integrated with the rest of the PLL. This problem can be circumvented by using small charge pump currents; however, this degrades the noise margin. In contrast, very low cut-off frequencies are easily implemented in a digital loop filter.
- Typically, CDRs must have fast locking characteristics and low jitter. However, these two requirements cannot be met simultaneously with a conventional PLL design. Digital PLLs offer greater flexibility in loop bandwidth control, because the loop bandwidth is not determined by physical resistors and capacitors but only by registers and variables. Therefore, a digital CDR can easily be extended with an on-the-fly adaptation of the DLF coefficients. This way, classical trade-offs between system parameters such as PLL bandwidth and lock time can be relaxed by (digitally) detecting the state of the PLL and selecting the appropriate filter coefficients. This paves the way for an independent optimization of the different system parameters and a greatly improved overall performance.

This adaptation or switching of the filter coefficients is useful, for example, when a frequency drift occurs due to a long series of CID in the incoming data stream and the CDR has to reacquire the phase and frequency after a possible loss of lock. In this case, a loop filter with a large bandwidth is used during acquisition to speed up locking, and it is switched to a narrow bandwidth after lock is achieved to meet the phase jitter requirement.
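The coefficient switching described above can be sketched as a PI loop filter with two hypothetical coefficient sets and a simple lock detector. The detector used here, which declares lock when *Early* and *Late* decisions are balanced over a window of recent transitions, is an assumption chosen for illustration; the class name `GearShiftDLF`, the window length, the threshold and the coefficient values are placeholders. The integrator state is kept when the coefficients change, so the frequency estimate acquired with the wide bandwidth is not lost, and a loss of lock (unbalanced decisions) switches back to the wide bandwidth.

```python
from collections import deque

WIDE = (8e-3, 8e-5)    # (kp, ki) during acquisition (placeholder values)
NARROW = (1e-3, 1e-6)  # (kp, ki) after lock (placeholder values)


class GearShiftDLF:
    """PI digital loop filter that switches coefficient sets based on a lock detector."""

    def __init__(self, window=256, lock_threshold=0.1):
        self.integ = 0.0                     # frequency state, kept across switches
        self.history = deque(maxlen=window)  # recent Early/Late decisions
        self.lock_threshold = lock_threshold
        self.locked = False

    def update(self, pd):
        """pd: +1 (Late), -1 (Early) or 0 (no transition). Returns the DCO control value."""
        if pd != 0:
            self.history.append(pd)
        if len(self.history) == self.history.maxlen:
            imbalance = abs(sum(self.history)) / len(self.history)
            self.locked = imbalance < self.lock_threshold  # balanced decisions -> lock
        kp, ki = NARROW if self.locked else WIDE
        self.integ += ki * pd
        return kp * pd + self.integ
```

A practical design would add hysteresis to the lock detector so that the filter does not toggle between the two coefficient sets near the threshold.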

#### 2.5.3 Challenges

Digital PLL techniques clearly have many advantages, and they are therefore commonly used in wireless systems for frequency synthesis, RF up-conversion and direct modulation [31–33]. However, this "digital PLL" concept has not yet been extensively transferred to clock and data recovery in multi-gigabit fiber-optic communication systems. The reason is an important difference between the PLLs typically needed for radio applications and those for CDR applications: a typical PLL in a radio application has a slow reference clock and a feedback frequency divider. As a result, the PD operates at a relatively low frequency, and it is straightforward to implement the DLF at a conveniently low frequency (usually below 100 MHz). This is very different in a CDR for fiber-optic communication, where the PD has to process the high-speed input data and typically operates at the full data rate (tens of GHz). This frequency is simply too high to practically implement an automatically synthesized DLF at a reasonable power consumption.

Several challenges for the full integration of digital PLL techniques in a CDR therefore remain; they are clarified below:

##### Speed Reduction in the DLF

As mentioned, an automatically synthesized DLF cannot operate at a data rate of tens of Gb/s. In prior work, a DLF consisting of a proportional path and an integral path is therefore typically split up: the speed of the integral path is reduced by demultiplexing [14, 30] or subsampling [15, 34]. The proportional path, on the other hand, still runs at high speed, so these blocks have to be designed and laid out by hand. The digital outputs of the proportional and integral paths are then converted to the analog domain, where they are combined again [34]. This largely counteracts the advantages of digital PLL techniques discussed in Section 2.5.2.

There is only one very recent related work [16] in which the digital block is entirely synthesized. To accommodate this synthesis, the input of the digital loop is heavily demultiplexed into many parallel lanes, but this has disadvantages: a large number of parallel samplers is needed to process the high-speed data input, which in turn requires a considerable clock distribution network. Moreover, the large number of samples has to be processed by a complex signal processing block. This increases the power consumption and chip area. In this work, we instead use extensive subsampling to reduce the operating speed of the entire DLF. This enables us to push the integration to the level where the DLF is entirely synthesized without requiring complex signal processing. Subsampling the input data in a CDR circuit, however, inherently leads to a loss of jitter information, since only a fraction of the available transitions is taken into account. Therefore, the effect of subsampling has to be investigated, and the correct operation of the CDR has to be maintained.
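The subsampling step itself is simple: only one out of every $N$ bang-bang PD decisions is passed on, so the DLF can be clocked at $1/N$ of the bit rate at the cost of ignoring the other transitions. The sketch below shows this decimation and the resulting DLF clock rate; the helper names are hypothetical, the random Early/Late stream is synthetic, and the factor $N = 64$ is an arbitrary illustration, not necessarily the value used in this work.

```python
import random


def subsample(pd_stream, n, offset=0):
    """Keep one bang-bang PD decision out of every n (decimation by n)."""
    return pd_stream[offset::n]


def dlf_clock_hz(bit_rate_hz, n):
    """Clock rate at which a DLF fed by the subsampled stream must operate."""
    return bit_rate_hz / n


# Synthetic full-rate Early/Late stream: +1 Late, -1 Early, 0 no transition.
rng = random.Random(0)
full_rate = [rng.choice((-1, 0, 0, 1)) for _ in range(6400)]
decimated = subsample(full_rate, 64)
```

For instance, at 25 Gb/s and $N = 64$ the DLF runs at 390.625 MHz, but only 1/64 of the available phase decisions reach the loop, which is the loss of jitter information mentioned above.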

##### Requirement of a High-Speed, Energy-Efficient PD

The DLF is preceded by a high-speed PD that operates at the full data rate. The phase difference detected by the PD therefore has to be converted to a digital signal and reduced in speed to be compatible with the input of the (subsampled) loop filter. This PD must not disturb the correct operation of the CDR and has to be energy efficient.

There are two groups of digital PDs: (quasi-)linear and binary phase detectors. Linear analog PDs followed by a Time-to-Digital Converter (TDC) are complex building blocks that typically consume a lot of power. Therefore, we use a (binary) Bang-Bang Phase Detector (BB-PD) in this work. These phase detectors are commonly used in high-speed CDR circuits because they are simple to design, provide good phase adjustment and can work at high speeds [35]. Additionally, the output of a BB-PD is already digital and can easily be combined with the required subsampling, which makes this type of phase detector very suitable to drive a DLF.
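For reference, the decision logic of the classic (well-known) Alexander BB-PD can be written as a small function. It takes two consecutive data samples and the edge sample taken halfway between them; the function name `alexander_pd` is hypothetical, and the Inverse Alexander variant proposed in this work is not modeled here.

```python
def alexander_pd(prev_data, edge, curr_data):
    """Classic Alexander bang-bang PD decision for one bit interval.

    prev_data, curr_data: data samples of bits n-1 and n (0 or 1).
    edge: sample taken at the expected transition between them.
    Returns "early", "late" or None when there is no transition.
    """
    if prev_data == curr_data:
        return None     # no data transition: no timing information
    if edge == prev_data:
        return "early"  # edge sample still sees the old bit: clock leads the data
    return "late"       # edge sample already sees the new bit: clock lags the data
```

In hardware this reduces to two XOR gates: a transition is present when the two data samples differ, and the edge sample indicates on which side of the transition the clock edge fell.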

In this work, the newly proposed Inverse Alexander BB-PD is introduced as an improvement over the established and well-known Alexander BB-PD when subsampling is used. Unfortunately, the behavior of a CDR with a BB-PD is highly non-linear, which complicates the analysis.

##### Analysis of a Non-Linear and Subsampled System

The use of a non-linear block and subsampling complicates the analysis of the behavior of the CDR. Nevertheless, there have been several publications which predict the characteristics of a non-linear CDR such as jitter transfer, jitter tolerance and jitter generation [35–37]. However, all these papers assume that the CDR operates in its normal working area, which means that the CDR is stable and does not have a limit cycle.

In most applications, these limit cycles are undesired as they produce unwanted spurious tones or peaking in the recovered clock's output spectrum [38]. Hence, it is necessary to predict whether the CDR has a limit cycle.

In this work, a describing-function-based pseudo-linear analysis is used to examine the occurrence of limit cycles. Furthermore, the subsampling operation of an AD-CDR has not yet been analyzed in the literature and is also examined in this work. These phenomena can be analyzed in the phase domain by looking at the combined aliasing effects.
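To give an idea of what a describing-function analysis looks like, the sketch below estimates a limit cycle for a simplified continuous-time phase-domain model of a bang-bang CDR. The BB-PD is modeled as an ideal relay with describing function $N(A) = 4/(\pi A)$, and the linear part is assumed to be $G(s) = K_{dco}(K_p + K_i/s)\,e^{-s\tau}/s$, where $\tau$ lumps the loop latency. A limit cycle is predicted where $N(A)\,G(j\omega) = -1$, i.e. where the phase of $G$ crosses $-180^\circ$ and $A = 4|G(j\omega)|/\pi$. The model, the function name `limit_cycle_estimate` and the numbers are assumptions for illustration; the subsampling aliasing effects studied in this work are not included.

```python
import math


def limit_cycle_estimate(kdco_hz, kp, ki, delay_s, iterations=200):
    """Describing-function estimate of a bang-bang CDR limit cycle (simplified model).

    Loop: relay PD (output +/-1) -> PI filter (kp + ki/s) -> DCO gain kdco_hz [Hz]
    -> phase integration 1/s (phase in UI), with a loop latency delay_s.
    Returns (limit-cycle frequency in Hz, fundamental amplitude in UI).
    """
    if ki * delay_s >= (math.pi / 4) * kp:
        raise ValueError("this sketch assumes delay_s < (pi/4) * kp / ki")

    def excess_phase(w):
        # angle(G(jw)) + pi, with angle(G) = -pi/2 - atan(ki / (kp*w)) - w*delay_s
        return math.pi / 2 - math.atan(ki / (kp * w)) - w * delay_s

    lo, hi = ki / kp, math.pi / (2 * delay_s)  # excess_phase(lo) > 0 > excess_phase(hi)
    for _ in range(iterations):  # bisection on the -180 degree crossing
        mid = 0.5 * (lo + hi)
        if excess_phase(mid) > 0:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    gain = kdco_hz * math.hypot(kp, ki / w) / w  # |G(jw)|
    amplitude_ui = 4 * gain / math.pi           # |N(A) G(jw)| = 1 with N(A) = 4/(pi*A)
    return w / (2 * math.pi), amplitude_ui


f_lc, a_ui = limit_cycle_estimate(kdco_hz=1e6, kp=1.0, ki=1e5, delay_s=10e-9)
```

With an assumed 1 MHz proportional frequency step, $K_i = 10^5\,\mathrm{s^{-1}}$ and $\tau = 10$ ns, the estimate is a limit cycle at about 25 MHz (period $\approx 4\tau$) with a fundamental amplitude of about 0.008 UI. This is close to the fundamental component $8K_{dco}K_p\tau/\pi^2$ of the triangular phase oscillation expected for a delayed bang-bang loop.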

Based on the analysis and simulation results, the architecture of the next-generation AD-CDR can be set up.

# 2.6 Next-Generation (All-Digital) Clock and Data Recovery

As explained above, the primary goal of this work is to avoid an analog (charge pump-based) loop filter and to implement the loop filter in the digital domain in order to obtain an AD-CDR. First, the challenges that hinder the introduction of digital PLL techniques in today's multi-gigabit CDR circuits for fiber-optic applications have to be tackled. This allows the substitution of the analog building blocks, which will reduce the power consumption and support the scalability towards new and more digital process technologies.



Figure 2.8: The proposed next-generation All-Digital CDR.

The overall architecture of the proposed AD-CDR is shown in Fig. 2.8. It consists of an Inverse Alexander BB-PD, a subsampler, a clock divider, a DLF and a DCO. The BB-PD determines the sign of the phase difference between edges in the input data stream $(D_{in})$ and the recovered clock (Clk) signal. If the clock is leading the input data, an *Early* signal is generated to decrease the frequency of the recovered clock. Conversely, if the clock is lagging, the BB-PD outputs a *Late* signal to increase the frequency of the recovered clock. These *Early* and *Late* signals are subsampled by a factor of N and then filtered by the Digital Loop Filter (DLF). The resulting signal controls the DCO such that the phase error is reduced. Note that if no data transition occurs, the BB-PD cannot determine whether the clock leads or lags the data and therefore does not generate any signal. Consequently, the DCO is not adjusted.
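The decision rule described above can be illustrated with a minimal Python sketch. It assumes the *classic* Alexander BB-PD (not the Inverse Alexander BB-PD proposed in this work), with one data sample at the centre of each bit and one edge sample between consecutive bits. The subsampling by N is modeled, as an assumption, by simply keeping every N-th decision; the function names are hypothetical.

```python
from typing import List


def alexander_decision(prev_bit: int, edge_sample: int, curr_bit: int) -> int:
    """Classic Alexander bang-bang decision for one bit period.

    prev_bit and curr_bit are data samples taken at the centre of two
    consecutive bits; edge_sample is taken at the expected transition between
    them. Returns -1 (Early: lower the DCO frequency), +1 (Late: raise the DCO
    frequency) or 0 when there is no data transition.
    """
    if prev_bit == curr_bit:
        return 0  # no transition: the sign of the phase error is unknown
    # Edge sample still equals the old bit -> the clock edge came too early.
    return -1 if edge_sample == prev_bit else +1


def bbpd_stream(data: List[int], edges: List[int]) -> List[int]:
    """Decision per bit period; edges[k] lies between data[k] and data[k + 1]."""
    return [alexander_decision(data[k], edges[k], data[k + 1])
            for k in range(len(data) - 1)]


def subsample(decisions: List[int], n: int) -> List[int]:
    """Subsample by a factor n by keeping every n-th decision (assumed scheme)."""
    return decisions[::n]


decisions = bbpd_stream([0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 1, 0, 0])
print(decisions)                # [-1, 0, -1, 0, -1, 1]
print(subsample(decisions, 2))  # [-1, -1, -1]
```

Bit periods without a transition yield 0, so the DLF and DCO receive no update for them, matching the behavior described above.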

To demonstrate correct operation and low-power efficiency, thorough analyses were performed, and a $25 \,\mathrm{Gb/s}$ PLL-based All-Digital Clock and Data Recovery (AD-CDR) circuit prototype was designed and implemented in an advanced 40 nm CMOS technology.

# References

- [1] Behzad Razavi, Design of Integrated Circuits for Optical Communications, 2nd ed. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2012.
- [2] Eduard Säckinger, Broadband Circuits for Optical Fiber Communication. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2005.
- [3] Arno Vyncke, "A low power, multi-rate clock-and-data recovery circuit and MAC preprocessor for 40 Gbit/s cascaded bit-interleaving passive optical networks," Ph.D. dissertation, Ghent University, 2016.
- [4] Christophe Van Praet, "Techniques to reduce energy consumption in next-generation access networks," Ph.D. dissertation, Ghent University, 2014.
- [5] Cedric Mélange, "Burst Mode Clock and Data Recovery in Long Reach Passive Optical Networks," Ph.D. dissertation, Ghent University, 2010.
- [6] Behzad Razavi, "Challenges in the design high-speed clock and data recovery circuits," *IEEE Communications Magazine*, vol. 40, no. 8, pp. 94–101, aug 2002.
- [7] Agilent Technologies, "Understanding Jitter and Wander Measurements and Standards," p. 1, 2003.
- [8] ITU-T G.783, "ITU-T G.783 Characteristics of synchronous digital hierarchy (SDH) equipment functional blocks."
- [9] ITU-T G.825, "ITU-T G.825 : The control of jitter and wander within digital networks which are based on the synchronous digital hierarchy (SDH)."
- [10] S.I. Ahmed and T.A. Kwasniewski, "Overview of oversampling clock and data recovery circuits," *Canadian Conference on Electrical and Computer Engineering*, pp. 1876–1881, may 2005.

- [11] Ming-ta Hsieh and Gerald Sobelman, "Architectures for multi-gigabit wire-linked clock and data recovery," *IEEE Circuits and Systems Magazine*, vol. 8, no. 4, pp. 45–57, dec 2008.
- [12] Yves Martens, "High performance digital circuits for gigabit passive optical networks," Ph.D. dissertation, Ghent University, 2005.
- [13] Takashi Masuda, Ryota Shinoda, Jeremy Chatwin, Jacob Wysocki, Koki Uchino, Yoshifumi Miyajima, Yosuke Ueno *et al.*, "A 12 Gb/s 0.9 mW/Gb/s Wide-Bandwidth Injection-Type CDR in 28 nm CMOS With Reference-Free Frequency Capture," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 12, pp. 3204–3215, dec 2016.
- [14] Sang-Hyeok Chu, Woorham Bae, Gyu-Seob Jeong, Sungchun Jang, Sungwoo Kim, Jiho Joo, Gyungock Kim *et al.*, "A 22 to 26.5 Gb/s Optical Receiver With All-Digital Clock and Data Recovery in a 65 nm CMOS Process," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2603–2612, nov 2015.
- [15] Taeho Lee, Yong-Hun Kim, Jaehyeong Sim, Jun-Seok Park, and Lee-Sup Kim, "A 5-Gb/s 2.67-mW/Gb/s Digital Clock and Data Recovery With Hybrid Dithering Using a Time-Dithered Delta-Sigma Modulator," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 4, pp. 1450–1459, apr 2016.
- [16] Wahid Rahman, Danny Yoo, Joshua Liang, Ali Sheikholeslami, Hirotaka Tamura, Takayuki Shibasaki, and Hisakatsu Yamaguchi, "A 22.5-to-32Gb/s 3.2pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28nm CMOS," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), feb 2017, pp. 120–121.
- [17] Reza Navid, E-Hung Chen, Masum Hossain, Brian Leibowitz, Jihong Ren, Chuen-huei Adam Chou, Barry Daly *et al.*, "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 4, pp. 814–827, apr 2015.
- [18] Zheng-Hao Hong, Yao-Chia Liu, and Wei-Zen Chen, "A 3.12 pJ/bit, 19-27 Gbps Receiver With 2-Tap DFE Embedded Clock and Data Recovery," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2625–2634, nov 2015.
- [19] Guanghua Shu, Woo Seok Choi, Saurabh Saxena, Mrunmay Talegaonkar, Tejasvi Anand, Ahmed Elkholy, Amr Elshazly *et al.*, "A 4-to-10.5 Gb/s Continuous-Rate Digital Clock and Data Recovery With Automatic Frequency Acquisition," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 2, pp. 428–439, feb 2016.

- [20] Hyosup Won, Taehun Yoon, Jinho Han, Joon-Yeong Lee, Jong-Hyeok Yoon, Taeho Kim, Jeong-Sup Lee *et al.*, "A 0.87 W Transceiver IC for 100 Gigabit Ethernet in 40 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 2, pp. 399–413, feb 2015.
- [21] Guoying Wu, Deping Huang, Jingxiao Li, Ping Gui, Tianwei Liu, Shita Guo, Rui Wang *et al.*, "A 1-16 Gb/s All-Digital Clock and Data Recovery With a Wideband High-Linearity Phase Interpolator," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 7, pp. 2511–2520, jul 2016.
- [22] Joshua Liang, Ali Sheikholeslami, Hirotaka Tamura, Yuuki Ogata, and Hisakatsu Yamaguchi, "A 28Gb/s digital CDR with adaptive loop gain for optimum jitter tolerance," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), feb 2017, pp. 122–123.
- [23] Lucio Rodoni, George von Buren, Alex Huber, Martin Schmatz, and Heinz Jackel, "A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 7, pp. 1927–1941, jul 2009.
- [24] Soon-Won Kwon, Joon-Yeong Lee, Jinhee Lee, Kwangseok Han, Taeho Kim, Sangeun Lee, Jeong-Sup Lee *et al.*, "An Automatic Loop Gain Control Algorithm for Bang-Bang CDRs," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 12, pp. 2817–2828, dec 2015.
- [25] Robert Bogdan Staszewski, Khurram Muhammad, Dirk Leipold, Chih-Ming Hung, Yo-Chuol Ho, John L. Wallberg, Chan Fernando *et al.*, "All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 12, pp. 2278–2291, dec 2004.
- [26] Chi-Shuang Oulee and Rong-Jyi Yang, "A 1.25Gbps all-digital clock and data recovery circuit with binary frequency acquisition," in APC-CAS 2008 - 2008 IEEE Asia Pacific Conference on Circuits and Systems. IEEE, nov 2008, pp. 680–683.
- [27] I-Fong Chen, Rong-Jyi Yang, and Shen-Iuan Liu, "Loop latency reduction technique for all-digital clock and data recovery circuits," in 2009 IEEE Asian Solid-State Circuits Conference, nov 2009, pp. 309–312.

- [28] N. Tall, N. Dehaese, S. Bourdel, and B. Bonat, "An all-digital clock and data recovery circuit for low-to-moderate data rate applications," in 2011 18th IEEE International Conference on Electronics, Circuits, and Systems, dec 2011, pp. 37–40.
- [29] Ching-Che Chung, Duo Sheng, and Yang-Di Lin, "An all-digital clock and data recovery circuit for spread spectrum clocking applications in 65nm CMOS technology," in 2012 4th Asia Symposium on Quality Electronic Design (ASQED), jul 2012, pp. 91–94.
- [30] Heesoo Song, Deok-Soo Kim, Do-Hwan Oh, Suhwan Kim, and Deog-Kyoon Jeong, "A 1.0-4.0-Gb/s All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Proportional Gain Control," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 2, pp. 424–434, feb 2011.
- [31] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, A. Visweswaran, and J. R. Long, "All-Digital RF I/Q Modulator," *IEEE Transactions on Microwave Theory and Techniques*, vol. 60, no. 11, pp. 3513–3526, nov 2012.
- [32] Robert Bogdan Staszewski, "State-of-the-Art and Future Directions of High-Performance All-Digital Frequency Synthesis in Nanometer CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 58, no. 7, pp. 1497–1510, jul 2011.
- [33] Robert Bogdan Staszewski, Khurram Waheed, Fikret Dulger, and Oren E. Eliezer, "Spur-Free Multirate All-Digital PLL for Mobile Phones in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 12, pp. 2904–2919, dec 2011.
- [34] Mrunmay Talegaonkar, Rajesh Inti, and Pavan Kumar Hanumolu, "Digital clock and data recovery circuit design: Challenges and tradeoffs," 2011 IEEE Custom Integrated Circuits Conference (CICC), pp. 1–8, sept 2011.
- [35] Jri Lee, K.S. Kundert, and B. Razavi, "Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1571–1580, sep 2004.
- [36] Richard Walker, "Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems," in *Phase-Locking in High-Performance Systems: From Devices to Architectures*, Behzad Razavi, Ed. Wiley-IEEE Press, 2003, pp. 34–45.

- [37] Youngdon Choi, Deog-Kyoon Jeong, and Wonchan Kim, "Jitter transfer analysis of tracked oversampling techniques for multigigabit clock and data recovery," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 50, no. 11, pp. 775–783, nov 2003.
- [38] M. Zanuso, D. Tasca, S. Levantino, A. Donadel, C. Samori, and A.L. Lacaita, "Noise Analysis and Minimization in Bang-Bang Digital PLLs," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 11, pp. 835–839, nov 2009.

# Part II

Analysis, Design and Implementation

# Clock and Data Recovery Analysis

After the introduction of a Phase Locked Loop (PLL)-based Clock and Data Recovery (CDR) circuit in Section 2.4, the operation and behavior of the CDR will be studied more thoroughly. In this chapter, a behavioral phase domain model for a CDR circuit with a Bang-Bang Phase Detector (BB-PD) is introduced. The behavior of the CDR is highly non-linear, which complicates the analysis. Therefore, describing function quasi-linearization techniques are used to analyze the stability and the jitter characteristics of the CDR. Our model is also extended to a subsampled digital system to give a good representation of the operation of the complete All-Digital Clock and Data Recovery (AD-CDR). Furthermore, the number of Consecutive Identical Digits (CID) that the AD-CDR can tolerate is discussed.

# 3.1 CDR Phase Domain Model

Assuming that the CDR loop is locked, a mathematical behavioral model for the CDR can be created. That is, the input data signal  $D_{in}$  and the recovered clock signal Clk are represented by their (excess) phases  $\phi_{in}$  and  $\phi_{out}$ , respectively [1]. The relations between the voltage domain signals and the phases are given by:

$$Clk_{D_{in}}(t) = \operatorname{sign}\left[\cos(\omega_{osc}t + \phi_{in}(t))\right] \tag{3.1}$$

$$Clk(t) = \operatorname{sign}\left[\cos(\omega_{osc}t + \phi_{out}(t))\right] \tag{3.2}$$

where  $Clk_{D_{in}}(t)$  is the clock signal used to generate the input data  $D_{in}$  and Clk(t) is the recovered clock signal.

The voltage domain clock signals are represented as square waves with an oscillation frequency $\omega_{osc}$<sup>1</sup>. The corresponding phases of $Clk_{D_{in}}(t)$ and Clk(t) are given by $\phi_{in}(t)$ and $\phi_{out}(t)$, respectively. Any phase variation on the clock signal used to generate the input data $Clk_{D_{in}}$ is automatically transferred to a phase variation of the input data signal $D_{in}$. Therefore, $\phi_{in}(t)$ is also defined as the phase variation of the input data signal $D_{in}$.

The relation between a voltage domain signal and the corresponding phase  $\phi_x(t)$  is already visualized by Fig. 2.3. Here, the phase  $\phi_x(t)$  is defined as the jitter function and presented as a sine wave. However, the variation in the phase (or jitter function) is not limited to a sine wave and can be described by any deterministic or random signal, or a combination of both.

The phase signals expressed as a function of time (e.g. $\phi_{in}(t)$ and $\phi_{out}(t)$) can also be transformed into the Laplace domain (e.g. $\phi_{in}(s)$ and $\phi_{out}(s)$). The CDR can then be treated as a feedback system, where the closed-loop transfer function H(s), given by Eq. (3.3), reveals how the output phase tracks slow and fast variations of the input phase.

$$H(s) = \frac{\phi_{out}(s)}{\phi_{in}(s)} \tag{3.3}$$

The schematic of a general charge pump CDR shown in Fig. 2.6 and repeated here for convenience (Fig. 3.1) is converted to a behavioral model in the phase domain (Fig. 3.2). The BB-PD in the CDR is replaced by an ideal subtraction, a comparator, an edge detector and a Zero-Order Hold (ZOH) block. The phase of the recovered clock $\phi_{out}$ is subtracted from the phase of the incoming data $\phi_{in}$, followed by an ideal comparator. Every bit period $1/f_{data}$, the edge detector outputs a '1' if a data transition occurs and a '0' otherwise. This signal is multiplied with the output of the comparator and sent through the ZOH, resulting in a signal $\phi_u$ that only adjusts the Voltage Controlled Oscillator (VCO) when a data transition takes place. In [2], the BB-PD is modeled as a slicer with a ternary output, resulting in equivalent behavior.
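To make the behavioral model concrete, the Python sketch below steps the model of Fig. 3.2 once per bit period: ideal subtraction, ideal comparator, an edge detector that reports a transition with probability $\alpha$, and a product that is held for one bit. As a simplifying assumption that is not part of the model in the text, the linear block G(s) is replaced by a plain accumulator with a hypothetical per-update gain `k_bb` (a first-order loop without delay or noise), and the input phase is constant.

```python
import random


def simulate_bb_loop(phi_in, alpha, k_bb, n_bits, seed=0):
    """Step the phase-domain model of Fig. 3.2 once per bit period.

    Simplification (assumption): G(s) is replaced by an accumulator with
    per-update gain k_bb, i.e. a first-order loop without delay or noise.
    Returns the recovered-clock phase phi_out after every bit period.
    """
    rng = random.Random(seed)
    phi_out = 0.0
    trace = []
    for _ in range(n_bits):
        phi_e = phi_in - phi_out                  # ideal subtraction
        sign = (phi_e > 0) - (phi_e < 0)          # ideal comparator: -1, 0 or +1
        edge = 1 if rng.random() < alpha else 0   # edge detector: transition w.p. alpha
        phi_u = sign * edge                       # held for one bit period (ZOH)
        phi_out += k_bb * phi_u                   # accumulator in place of G(s)
        trace.append(phi_out)
    return trace


trace = simulate_bb_loop(phi_in=0.3, alpha=0.5, k_bb=0.01, n_bits=2000)
print(abs(trace[-1] - 0.3) <= 0.0101)  # True: phi_out dithers around phi_in
```

Once locked, the output phase toggles around the input phase in steps of `k_bb`; this bang-bang dithering is the simplest manifestation of the non-linear behavior analyzed in the rest of this chapter.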

<sup>1</sup>The recovered clock Clk(t) will have the same frequency as the clock used to generate the input data $Clk_{D_{in}}(t)$ if the CDR is locked.

Figure 3.1: A charge pump Phase Locked Loop-based Clock and Data Recovery circuit.

Figure 3.2: The behavioral model of a CDR with a BB-PD.

The combination of the charge pump, the loop filter and the VCO is equivalent to the linear block G(s):

$$G(s) = \frac{\omega_0}{s} \frac{1 + \frac{\omega_z}{s}}{1 + \frac{s}{\omega_p}} \exp\left(-sT_{d,l}\right) \tag{3.4}$$

where $\omega_z$ represents the frequency of the zero, $\omega_p$ the frequency of the pole, $\omega_0$ the overall amplification factor of the linear block and $T_{d,l}$ the delay of the signal path through the CDR. This loop delay is the sum of all gate and component delays in the loop [3] and also includes any delay introduced by re-timing and demultiplexing the data in the Phase Detector (PD) [4–6]. Note that if $\omega_0$ has a value between $\omega_z$ and $\omega_p$, $\omega_0$ also approximately represents the unity gain frequency. We assume that $\omega_0$ and $\omega_p$ are always sufficiently larger than $\omega_z$, such that the zero has little effect. For the sake of completeness, $\omega_z$, $\omega_p$ and $\omega_0$ can be expressed in terms of the component values (Fig. 3.1):

$$\omega_z = \frac{1}{RC} \tag{3.5}$$

$$\omega_p = \frac{C+C_2}{RCC_2} \tag{3.6}$$

$$\omega_0 = K_{vco} I_p \frac{RC}{C+C_2} \tag{3.7}$$

In the equations above, $K_{vco}$, $I_p$, R, C and $C_2$ respectively represent the gain of the VCO [Hz/V], the charge pump current [A], and the resistance [$\Omega$] and capacitance [F] values of the loop filter. Finally, the phase noise contributed by the VCO is modeled by $\phi_{vco}$ in Fig. 3.2.
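Eqs. (3.5)-(3.7) can be evaluated directly from the component values, as in the Python sketch below. The component values are hypothetical and are not those of the design in this work. The sketch assumes $K_{vco}$ is expressed in rad/(s·V), i.e. $2\pi$ times its value in Hz/V, so that the angular frequencies come out in rad/s.

```python
import math


def loop_parameters(k_vco, i_p, r, c, c2):
    """Evaluate Eqs. (3.5)-(3.7) for the loop filter of Fig. 3.1.

    k_vco is assumed in rad/(s*V) so that omega_0 comes out in rad/s.
    """
    omega_z = 1.0 / (r * c)                    # Eq. (3.5)
    omega_p = (c + c2) / (r * c * c2)          # Eq. (3.6)
    omega_0 = k_vco * i_p * r * c / (c + c2)   # Eq. (3.7)
    return omega_z, omega_p, omega_0


# Hypothetical component values, not taken from the design in this work.
wz, wp, w0 = loop_parameters(k_vco=2 * math.pi * 1e9, i_p=10e-6,
                             r=1e3, c=100e-12, c2=10e-12)
print(f"wz={wz:.3g} rad/s, wp={wp:.3g} rad/s, w0={w0:.3g} rad/s")
print(wz < w0 < wp)  # True: w0 then approximates the unity gain frequency
```

Since $\omega_p/\omega_z = (C+C_2)/C_2 > 1$, the pole always lies above the zero; placing $\omega_0$ between them is a design choice made through $I_p$ and $K_{vco}$.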

This model is a good approximation of the real system: it incorporates the *Early-Late* signal based on the sign of the difference $\phi_e$ between the input phase and the output phase, the update rate $f_{data}$ of the CDR, the transition density of the data, any delay introduced in the CDR and the different noise sources. Note that this model is also a good representation of a standard PLL when a data transition occurs every clock cycle.

Typically, the data period is much smaller than any time constant in the CDR. Therefore, the intrinsic sample-and-hold operation of the BB-PD, represented by the ZOH block in Fig. 3.2, can be approximated by a delay of $1/(2f_{data})$ [7]. To simplify further calculations, this delay is added to the delay of the linear block $T_{d,l}$, resulting in the total delay $T_d$:

$$T_d = T_{d,l} + \frac{1}{2f_{data}} \tag{3.8}$$

As a result, the analysis can be performed in continuous time.
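As a worked example, assuming the 25 Gb/s data rate of the prototype mentioned in Chapter 2 (so $f_{data} = 25 \times 10^{9}\,\mathrm{s^{-1}}$), the ZOH contribution to the total delay of Eq. (3.8) is:

$$\frac{1}{2 f_{data}} = \frac{1}{2 \cdot 25 \times 10^{9}\,\mathrm{s^{-1}}} = 20\,\mathrm{ps}$$

so that $T_d = T_{d,l} + 20\,\mathrm{ps}$; the value of $T_{d,l}$ depends on the implementation and is not assumed here.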

# 3.2 Describing Functions: Pseudo-Linear Model

The BB-PD is a highly non-linear block. A powerful method to analyze such a system is the describing function quasi-linearization technique [8]. Here, the input signal of the non-linearity, denoted by the phase error $\phi_e$ in Fig. 3.2, is decomposed into a sum of basic signal components: e.g. a DC bias $\phi_{e,DC}$, a sinusoid $\phi_{e,s}$ or a random Gaussian process $\phi_{e,n}$. For each component, a best-fit linear gain is determined such that the mean-squared difference between the output of the approximation and the output of the non-linearity is minimized. Of course, the more basic signal components are present in the input signal, the more complex the describing functions become.

Fig. 3.3 visualizes this describing function model in the time domain for the case where the non-linearity is a comparator (Fig. 3.3(a)). In this example, the input of the non-linearity $\phi_e$ is decomposed into a sinusoidal component $\phi_{e,s}$ and a random Gaussian component $\phi_{e,n}$. For each component there is a corresponding magnitude-dependent gain. The sum of the amplified signal components results in an approximation $\phi_{u,approx}$ of the output signal of the non-linearity. This is represented by Fig. 3.3(b), which is the original describing function model of [8]. To increase the accuracy, the linearization error $\phi_q$, which is defined as the difference between the approximated output $\phi_{u,approx}$ and the actual output $\phi_u$ (Fig. 3.3(c)), is included in our pseudo-linear model. This refinement was already introduced in [2].

# 3.2.1 Random-Input Describing Function

The simplest case for studying the behavior of the CDR corresponds to the situation where only one signal component is present at the input of the non-linearity $\phi_e$. As noise is present in every system, this signal component originates from a random Gaussian process and is denoted by $\phi_{e,n}$. It is normally distributed with zero mean and variance $\sigma_e^2$.

Figure 3.3: A time domain example of the describing function model for a non-linearity. (a) The characteristic of a comparator (a non-linearity). (b) The describing function characteristic according to the original approach in [8]. (c) The definition of the linearization error $\phi_q$, which is included in [2] and in our pseudo-linear analysis.

The non-linearity of the BB-PD can now be modeled by a single gain block (Fig. 3.4), for which the gain $K_n$ is calculated using the Random-Input Describing Function (RIDF). Furthermore, the linearization error $\phi_q$ is also added to the model in Fig. 3.4. In [2], it was already proven that in this case the linearization error can be accurately modeled by an independent noise source $\phi_q$ which is uncorrelated with $\phi_{e,n}$. The values of the gain $K_n$ and the variance of the linearization error $\phi_q$ are given by [2]:

$$K_n(\sigma_e) = \sqrt{\frac{2}{\pi}} \frac{\alpha}{\sigma_e} \tag{3.9}$$

$$\sigma_q^2 = \alpha - \frac{2}{\pi} \alpha^2 \tag{3.10}$$

where  $\sigma_e$ ,  $\sigma_q$  and  $\alpha$  respectively represent the standard deviation of  $\phi_{e,n}$  and  $\phi_q$ , and the transition density of the data.

These equations are valid if the bandwidth of the loop filter is much smaller than the data rate and the data transitions occur in a random manner (with probability  $\alpha$ )<sup>2</sup>. In our work, we assume that these conditions are also met and therefore Eqs. (3.9) and (3.10) are adequate for further analysis.

Note that the gain factor  $K_n$  is not a fixed value, but depends on the characteristics of the input signal of the non-linearity, i.e. the standard deviation  $\sigma_e$ . This is a typical property of describing functions.
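Eqs. (3.9) and (3.10), including the dependence of $K_n$ on $\sigma_e$, can be checked with a short Monte Carlo experiment. The Python sketch below draws a Gaussian phase error, lets a transition occur with probability $\alpha$ in every bit (the random-data assumption above), forms the BB-PD output as comparator times edge detector, and computes the least-squares gain and the residual power. The function name, sample count and seed are illustrative choices.

```python
import math
import random


def ridf_monte_carlo(sigma_e, alpha, n=100_000, seed=1):
    """Monte Carlo estimate of the RIDF gain K_n and of sigma_q^2.

    phi_e is zero-mean Gaussian with standard deviation sigma_e, and a data
    transition occurs independently in each bit with probability alpha.
    """
    rng = random.Random(seed)
    s_ue = s_ee = s_uu = 0.0
    for _ in range(n):
        phi_e = rng.gauss(0.0, sigma_e)
        edge = 1.0 if rng.random() < alpha else 0.0
        phi_u = edge * math.copysign(1.0, phi_e)  # comparator times edge detector
        s_ue += phi_u * phi_e
        s_ee += phi_e * phi_e
        s_uu += phi_u * phi_u
    k_n = s_ue / s_ee                    # least-squares (best-fit) linear gain
    sigma_q2 = (s_uu - k_n * s_ue) / n   # mean power of phi_u - k_n * phi_e
    return k_n, sigma_q2


alpha = 0.5
for sigma_e in (0.05, 0.1):
    k_n, sigma_q2 = ridf_monte_carlo(sigma_e, alpha)
    print(f"sigma_e={sigma_e}: K_n={k_n:.3f} "
          f"(Eq. 3.9: {math.sqrt(2 / math.pi) * alpha / sigma_e:.3f}), "
          f"sigma_q^2={sigma_q2:.4f} (Eq. 3.10: {alpha - 2 / math.pi * alpha**2:.4f})")
```

Doubling $\sigma_e$ roughly halves the estimated $K_n$, while $\sigma_q^2$ stays close to $\alpha - 2\alpha^2/\pi$, independent of $\sigma_e$.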



Figure 3.4: The RIDF model of a CDR with a BB-PD.

# 3.2.2 Limit Cycles

In some cases, an oscillation can build up and be sustained by the CDR's feedback mechanism. The characteristics of the oscillation are a property of the system and are independent of the initial conditions. Such an oscillation is called a limit cycle [8].

In the following, the higher harmonics of the limit cycle oscillation, originating from the non-linearity of the BB-PD, are neglected. This approach is justified because the linear block G(s) filters the BB-PD output harmonics such that only a negligible part of the harmonics is fed back to the input of the BB-PD. Hence, the input of the BB-PD is approximated by the sum of a random Gaussian and a sinusoidal component. Consequently, the Gaussian-plus-Sinusoid-Input Describing Function (GSIDF) has to be applied instead of the simpler RIDF in order to correctly analyze the non-linear system.

<sup>2</sup>To ensure that this assumption is satisfied, scramblers are typically used to avoid any auto-correlation of the data pattern.

# 3.2.3 Gaussian-plus-Sinusoid-Input Describing Function

When a limit cycle with a non-zero amplitude is present, the phase error  $\phi_e$  will consist of the sum of a random Gaussian component  $\phi_{e,n}$  and a sinusoidal component  $\phi_{e,s}$ .  $\phi_{e,n}$  is normally distributed with zero mean and variance  $\sigma_e^2$ , while  $\phi_{e,s}$  can be written as:

$$\phi_{e,s} = A_e \sin\left(\omega_s t\right) = A_e \sin\theta \tag{3.11}$$

with  $A_e$  the amplitude of the limit cycle,  $\omega_s$  the frequency of the limit cycle and  $\theta$  the instantaneous phase.



Figure 3.5: The GSIDF model of the non-linearity of a BB-PD.

The describing functions of the BB-PD are now determined by the GSIDF. Fig. 3.5 represents the GSIDF model for the non-linearity in Fig. 3.2. The sinusoidal component $\phi_{e,s}$ and the random Gaussian component $\phi_{e,n}$ are treated separately, each with its corresponding gain factor $K_s$ and $K_n$. These gain factors are calculated such that the linearization error $\phi_q$ between the pseudo-linear model and the actual BB-PD is minimized.

For a general non-linear element, denoted by  $\phi_u(\phi_e)$ , the describing functions can be written down as shown in [8]:

$$K_n(A_e, \sigma_e) = \frac{1}{\left(\sqrt{2\pi}\right)^3 \sigma_e^3} \int_0^{2\pi} \mathrm{d}\theta \int_{-\infty}^\infty \phi_u(\phi_e)\, \phi_{e,n} \exp\left(-\frac{\phi_{e,n}^2}{2\sigma_e^2}\right) \mathrm{d}\phi_{e,n} \tag{3.12}$$

$$K_s(A_e, \sigma_e) = \frac{2}{\left(\sqrt{2\pi}\right)^3 \sigma_e A_e} \int_0^{2\pi} \mathrm{d}\theta \int_{-\infty}^{\infty} \phi_u(\phi_e) \sin\theta \, \exp\left(-\frac{\phi_{e,n}^2}{2\sigma_e^2}\right) \mathrm{d}\phi_{e,n} \tag{3.13}$$

where  $\phi_e$  is the input of the non-linearity and is determined by the sum of a random Gaussian component  $\phi_{e,n}$  and a sinusoidal component  $\phi_{e,s}$ .

By substituting the actual BB-PD characteristics in Eq. (3.12) and Eq. (3.13), the GSIDFs become:

$$K_n(A_e, \sigma_e) = \frac{\alpha}{\sqrt{2\pi}} \frac{1}{\pi \sigma_e} \int_0^{2\pi} \exp\left(-\frac{1}{2} \left(\frac{A_e \sin \theta}{\sigma_e}\right)^2\right) \mathrm{d}\theta \tag{3.14}$$

$$K_s(A_e, \sigma_e) = \frac{\alpha}{\pi A_e} \int_0^{2\pi} \operatorname{erf}\left(\frac{A_e \sin\theta}{\sqrt{2}\sigma_e}\right) \sin(\theta) \,\mathrm{d}\theta \tag{3.15}$$

In the equations above,  $K_n$ ,  $K_s$ ,  $A_e$  and  $\sigma_e$  respectively represent the noise gain and sinusoidal gain, the amplitude of the sinusoidal component of  $\phi_e$  (limit cycle) and the standard deviation of the noise component of  $\phi_e$ .

These equations look complex, but can easily be evaluated with modern mathematical tools: e.g. the sinusoidal gain $K_s$ is plotted in Fig. 3.6 as a function of $A_e$ for increasing values of $\sigma_e$. Fig. 3.6 clearly shows that the sinusoidal gain converges to an envelope when the noise at the input of the non-linearity becomes negligible w.r.t. the sinusoidal component. For this envelope, the Gaussian-plus-sinusoid-input describing function reduces to the (single) Sinusoidal-Input Describing Function (SIDF), for which the gain is inversely proportional to the amplitude $A_e$ [8]:

$$\lim_{\sigma_e \to 0} K_s(A_e, \sigma_e) = K_{SIDF}(A_{e,max}) = \frac{4\alpha}{\pi A_{e,max}} \tag{3.16}$$
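As an illustration of how Eqs. (3.14) and (3.15) can be evaluated numerically, the Python sketch below approximates the single integral over $\theta$ with an m-point midpoint rule, using `math.erf` for Eq. (3.15). The quadrature scheme and the value of m are illustrative choices; any standard integrator would do.

```python
import math


def gsidf_gains(a_e, sigma_e, alpha, m=2000):
    """Evaluate the GSIDF gains K_n (Eq. 3.14) and K_s (Eq. 3.15).

    The integral over theta is approximated with an m-point midpoint rule.
    """
    h = 2.0 * math.pi / m
    int_n = int_s = 0.0
    for k in range(m):
        sin_t = math.sin((k + 0.5) * h)
        x = a_e * sin_t / sigma_e
        int_n += math.exp(-0.5 * x * x)
        int_s += math.erf(x / math.sqrt(2.0)) * sin_t
    k_n = alpha / math.sqrt(2.0 * math.pi) / (math.pi * sigma_e) * int_n * h
    k_s = alpha / (math.pi * a_e) * int_s * h
    return k_n, k_s


alpha = 0.5
# A_e -> 0: K_n approaches the RIDF gain of Eq. (3.9).
print(gsidf_gains(1e-9, 0.05, alpha)[0], math.sqrt(2 / math.pi) * alpha / 0.05)
# sigma_e -> 0: K_s approaches the SIDF gain 4*alpha/(pi*A_e) of Eq. (3.16).
print(gsidf_gains(0.1, 1e-9, alpha)[1], 4 * alpha / (math.pi * 0.1))
```

The two limits serve as sanity checks: without a sinusoid the GSIDF noise gain reduces to the RIDF of Eq. (3.9), and without noise the sinusoidal gain reduces to the SIDF of Eq. (3.16).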

In this analysis, the input of the BB-PD is approximated by a random Gaussian plus a sinusoidal component, and the higher harmonics of the limit cycle are omitted. However, the linearization error $\phi_q$ still contains the harmonics originating from the non-linearity of the BB-PD. To simplify further analysis, the linearization error $\phi_q$ is approximated as random Gaussian noise. This allows the linearization error $\phi_q$ to be included in the random Gaussian component of the output of the BB-PD $\phi_{u,n}$ and makes it possible to decompose the CDR into two GSIDF models: one for the sinusoidal component and one for the random Gaussian component.

The variance of the linearization error  $\sigma_q^2$  can then be determined as [2]:

$$\sigma_q^2 = \alpha - K_n^2 \sigma_e^2 - K_s^2 \frac{A_e^2}{2} \tag{3.17}$$



Figure 3.6:  $K_s$  according to Eq. (3.15) as a function of the amplitude  $A_e$  and the RMS jitter  $\sigma_e$  at the input of the non-linearity. ( $\alpha = 0.5$ )

# 3.3 Stability in Charge Pump CDRs

In most applications, instability or the occurrence of limit cycles is undesired because limit cycles produce unwanted spurious tones or peaking in the recovered clock's output spectrum [9]. Hence, it is necessary to predict whether the CDR has a limit cycle.

The numerous studies on the presence of limit cycles indicate the importance of this research topic. However, most of the previously published work focuses on digital PLLs with a BB-PD [9–14], and very little research has been conducted on the occurrence of limit cycles in charge pump CDRs.

It is important to note that the analyses applied for digital PLLs cannot be easily mapped to a charge pump CDR: firstly, only PLLs are considered, which use a clock signal as input. In most cases this clock is provided from a very clean reference. This is very different from CDR applications where the reference jitter on the input data is the dominant noise source in the loop [9].

In recent years, there has been some prior related work on limit cycles in CDRs with a BB-PD: e.g. in [15], a stability analysis of CDRs with a BB-PD is performed. But, unlike in our work, no clear distinction is made between the case with or without limit cycles.

This section discusses an extensive and quantitative analysis of the occurrence of limit cycles in CDRs with a BB-PD. For this, the describing function techniques discussed above are further exploited. The proposed analysis is able to accurately predict the occurrence of a limit cycle as well as its amplitude. This leads to the quantification of the input jitter necessary to quench a limit cycle as well as the worst-case limit cycle amplitude, as a function of the different loop parameters.

# 3.3.1 System Relations

The GSIDF model of Fig. 3.5 is incorporated into the complete model of the CDR, which results in the two block diagrams of Fig. 3.7. The phase error $\phi_e$, the output of the BB-PD $\phi_u$ and the output of the CDR $\phi_{out}$ are the sum of their random Gaussian and their sinusoidal component, i.e. at every node x we can write:

$$\phi_x = \phi_{x,n} + \phi_{x,s} \tag{3.18}$$



Figure 3.7: The GSIDF model of a CDR with a BB-PD for (a) the sinusoidal component and (b) the random Gaussian component (identical to the RIDF model in Fig. 3.4).

When this system is excited by only a random Gaussian process, any sinusoid appearing in the system would have to be caused by a limit cycle (which, as discussed above, is approximated by its fundamental sinusoidal component). The condition for self-oscillation is given by:

$$K_s(A_e, \sigma_e) = \left| \frac{1}{G(j\omega_s)} \right| \equiv K_s^* \tag{3.19}$$

with  $\omega_s$  the oscillation frequency of the limit cycle for which G(s) reaches 180° phase lag, i.e. the Barkhausen criterion.
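The Barkhausen step can be sketched numerically in Python: find the frequency at which G(jω) of Eq. (3.4), with the total delay $T_d$ of Eq. (3.8), reaches 180° phase lag, then set $K_s^* = 1/|G(j\omega_s)|$. The unwrapped phase is computed analytically, because `cmath.phase` wraps to $(-\pi, \pi]$ and would make the crossing ambiguous. Since $\arg G(j\omega) + \pi = \arctan(\omega/\omega_z) - \arctan(\omega/\omega_p) - \omega T_d$, a log-spaced scan followed by bisection locates the crossing. The scan range and the loop parameters in the example are hypothetical.

```python
import cmath
import math


def loop_gain(w, w0, wz, wp, td):
    """G(j*w) of Eq. (3.4), using the total delay T_d of Eq. (3.8)."""
    s = 1j * w
    return w0 / s * (1 + wz / s) / (1 + s / wp) * cmath.exp(-s * td)


def excess_phase(w, wz, wp, td):
    """Unwrapped arg G(j*w) + pi; it changes sign where G(j*w) reaches -180 degrees."""
    return math.atan(w / wz) - math.atan(w / wp) - w * td


def barkhausen(w0, wz, wp, td, n_grid=2000):
    """Return (omega_s, K_s*) satisfying Eq. (3.19), or None if no crossing is found."""
    w_lo, w_hi = wz / 10, 1e4 * wp                # scan range (assumption)
    ratio = (w_hi / w_lo) ** (1.0 / n_grid)
    lo = w_lo
    for _ in range(n_grid):
        hi = lo * ratio
        if excess_phase(lo, wz, wp, td) > 0 >= excess_phase(hi, wz, wp, td):
            for _ in range(100):                  # bisection inside the bracket
                mid = 0.5 * (lo + hi)
                if excess_phase(mid, wz, wp, td) > 0:
                    lo = mid
                else:
                    hi = mid
            w_s = 0.5 * (lo + hi)
            return w_s, 1.0 / abs(loop_gain(w_s, w0, wz, wp, td))
        lo = hi
    return None


# Hypothetical loop: wz = 10 Mrad/s, wp = 110 Mrad/s, w0 = 57 Mrad/s, T_d = 120 ps.
print(barkhausen(5.7e7, 1e7, 1.1e8, 120e-12))
print(barkhausen(5.7e7, 1e7, 1.1e8, 0.0))  # None: without delay -180 degrees is never crossed
```

The second call illustrates why the loop delay matters: with $\omega_z < \omega_p$ and $T_d = 0$, the phase of G(jω) stays above -180° at every finite frequency, so Eq. (3.19) has no solution.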

In addition, the random Gaussian component in the CDR must satisfy the following equations:

$$\phi_{e,n} = H_1(s) \phi_{in} + H_2(s) \phi_q \tag{3.20}$$

$$H_1(s) = \frac{1}{1 + K_n G(s)} \tag{3.21}$$

$$H_2(s) = -\frac{G(s)}{1 + K_n G(s)} \tag{3.22}$$

where  $K_n$  and  $K_s$  are given by Eq. (3.14) and Eq. (3.15), and  $\phi_q$ ,  $\phi_{in}$  and  $\phi_{e,n}$  are respectively the linearization error, the input noise and the random Gaussian component of the phase error. Finally, the variance of the phase error  $\sigma_e^2$  is calculated by integrating the power spectral density  $S_{\phi_e}$  over the noise bandwidth B:

$$\sigma_e^2 = \int_B \left( S_{\phi_{in}} |H_1(j\omega)|^2 + S_{\phi_q} |H_2(j\omega)|^2 \right) \mathrm{d}\omega \tag{3.23}$$

where $S_{\phi_{in}}$ and $S_{\phi_q}$ are the power spectral densities of the input random jitter and the linearization error. The noise bandwidth extends from DC to $f_{data}/2$, because the system only reacts to a data edge and hence implicitly incorporates a sampling operation [2]. To match the simulations (Section 3.3.4), a narrow band around the oscillation frequency was also removed, because (as outlined in Section 3.3.4) resonating noise cannot be distinguished from a limit cycle in the simulations.

Without loss of generality, the phase noise of the VCO is incorporated in the input noise  $\phi_{in}$ . This is shown in Fig. 3.8: the phase noise of the VCO  $\phi_{vco}$  can be split and transferred such that  $\phi_{vco}$  directly adds to the output and also the input of the loop. In this way, the total noise contribution at the input can be written as  $\phi_{in,eq}$  where:

$$\phi_{in,eq} = \phi_{in} - \phi_{vco}$$

Furthermore, the addition of  $\phi_{vco}$  at the output is outside the feedback loop and does not influence the limit cycle behavior.

Further on,  $\phi_{in,eq}$  is approximated as white noise, which simplifies the pseudo-linear analysis. This simplification is valid in most CDR applications, where the input jitter is the dominant source of phase noise. However, if the phase noise of the VCO is not negligible with respect to the input noise then a more accurate model for the phase noise has to be used (see e.g. [16]).



Figure 3.8: The altered GSIDF model of a CDR with a BB-PD for the random Gaussian component (equivalent to the RIDF model in Fig. 3.4).

Now, Eqs. (3.14), (3.15), (3.17), (3.19) and (3.23) are combined to constitute a system of equations, where the different parameters are recursively dependent on each other. Every realistic solution of this system with 5 equations and 5 unknowns for a given value of  $\sigma_{in}$  indicates the existence of a limit cycle.

## 3.3.2 Algorithm

Eventually, we want to solve this system of equations to obtain the amplitude of the limit cycle $A_e$ as a function of the input jitter $\sigma_{in}$. Because of the recursive dependencies, this would require several calculation iterations. Algorithm 1 circumvents this by calculating the input jitter $\sigma_{in}$ as a function of the limit cycle amplitude $A_e$ instead. First, it is assumed that a limit cycle exists, so that the GSIDF analysis is applicable.

```
Assume a limit cycle exists
Determine ω_s and K_s* that satisfy the Barkhausen criterion, Eq. (3.19)
for each value A_e(i) of A_e do
    σ_e(i)   ← invert Eq. (3.15) for A_e = A_e(i) and K_s = K_s*
    K_n(i)   ← evaluate Eq. (3.14) using σ_e(i) and A_e(i)
    σ_q(i)   ← evaluate Eq. (3.17) using σ_e(i), A_e(i), K_s* and K_n(i)
    σ_in²(i) ← evaluate Eq. (3.24) using K_s*, K_n(i), A_e(i), σ_e(i) and σ_q(i)
end
```

Algorithm 1: Calculation procedure for obtaining $K_s$, $K_n$, $A_e$, $\sigma_e$ and $\sigma_q$ which correspond to $\sigma_{in}^2$.

Subsequently, the amplification factor $K_s^*$ that causes a limit cycle is determined according to Eq. (3.19). Thereafter, the amplitude of the limit cycle $A_e$ is swept. For each value $A_e(i)$ of $A_e$, and given the gain $K_s = K_s^*$, the corresponding standard deviation $\sigma_e(i)$ of the noise component of the phase error is calculated by inverting Eq. (3.15). This inversion has to be done numerically, but is straightforward with contemporary numerical tools. Together with the given amplitude $A_e(i)$, the obtained value $\sigma_e(i)$ yields a sinusoidal gain $K_s$ equal to $K_s^*$.
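The numerical inversion of Eq. (3.15) can be illustrated with a short sketch. Because Eq. (3.15) is not repeated in this section, the sketch assumes the textbook GSIDF sinusoidal gain of an ideal relay scaled by the transition density, $K_s(A_e,\sigma_e) = \frac{2\alpha}{\pi A_e}\int_0^{\pi} \operatorname{erf}\left(\frac{A_e \sin\theta}{\sqrt{2}\,\sigma_e}\right)\sin\theta\,\mathrm{d}\theta$. This form tends to $\frac{4\alpha}{\pi A_e}$ for $\sigma_e \rightarrow 0$ and to $\sqrt{2/\pi}\,\alpha/\sigma_e$ for $A_e \rightarrow 0$, in line with Eqs. (3.16) and (3.29). Since this gain decreases monotonically with $\sigma_e$, a bisection suffices. The function names are hypothetical.

```python
import math


def k_s(a_e, sigma_e, alpha=0.5, n=2000):
    """Assumed GSIDF sinusoidal gain of an ideal relay (stand-in for Eq. (3.15))."""
    if sigma_e == 0.0:
        return 4.0 * alpha / (math.pi * a_e)  # pure-sinusoid limit, cf. Eq. (3.16)
    h = math.pi / n
    total = 0.0
    for k in range(n):  # midpoint rule over theta in (0, pi)
        s = math.sin((k + 0.5) * h)
        total += math.erf(a_e * s / (math.sqrt(2.0) * sigma_e)) * s
    return 2.0 * alpha / (math.pi * a_e) * total * h


def invert_k_s(a_e, k_s_star, alpha=0.5, lo=1e-9, hi=1e3, iters=100):
    """Return sigma_e such that k_s(a_e, sigma_e) == k_s_star, or None if impossible."""
    if k_s_star >= 4.0 * alpha / (math.pi * a_e):
        return None  # even without noise the gain is too small: no solution
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: sigma_e may span decades
        if k_s(a_e, mid, alpha) > k_s_star:
            lo = mid  # gain still too high, so more noise is needed
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

A `None` result signals that $A_e(i)$ exceeds the noiseless amplitude $4\alpha/(\pi K_s^*)$, so no value of $\sigma_e$ can produce the required gain.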

Then, for each pair $A_e(i)$ and $\sigma_e(i)$, $K_n(i)$ and $\sigma_q(i)$ follow directly from Eq. (3.14) and Eq. (3.17). Finally, Eq. (3.23) is rearranged such that the corresponding $\sigma_{in}^2(i)$ is given by Eq. (3.24):

$$\sigma_{in}^{2} = \frac{\sigma_{e}^{2} - \frac{\sigma_{q}^{2}}{B} \int_{B} |H_{2}(j\omega)|^{2} \,\mathrm{d}\omega}{\frac{1}{B} \int_{B} |H_{1}(j\omega)|^{2} \,\mathrm{d}\omega} \tag{3.24}$$

with  $H_1(j\omega)$  and  $H_2(j\omega)$  given by Eq. (3.21) and Eq. (3.22) respectively.

With the procedure described above, each value of $A_e$ is mapped to the input jitter variance $\sigma_{in}^2$ that gives rise to the determined values of $K_s$, $K_n$, $A_e$ and $\sigma_e$. Note that, for a particular value of $A_e$, this algorithm requires no iterations to calculate $\sigma_{in}^2$, $K_s$, $K_n$, $\sigma_e$ and $\sigma_q$.
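The last step of Algorithm 1 only needs the band averages of $|H_1|^2$ and $|H_2|^2$ in Eq. (3.24). The sketch below computes them numerically under two assumptions that are not taken from Eqs. (3.21)-(3.22) directly: $H_1 = 1/(1 + K_n G)$ and $|H_2| = |G/(1 + K_n G)|$ (the form of Eqs. (3.41)-(3.42)), and a loop transfer $G(j\omega) = \omega_0 e^{-j\omega T_d}/(j\omega(1 + j\omega/\omega_p))$ with the zero $\omega_z$ neglected, which is consistent with Eqs. (3.25)-(3.26). The narrow band around the oscillation frequency that is excluded in the text is not removed here.

```python
import cmath
import math


def loop_gain(w, w0, wp, td):
    """Assumed G(jw) = w0*exp(-jw*Td) / (jw*(1 + jw/wp)); the zero wz is neglected."""
    s = 1j * w
    return w0 * cmath.exp(-s * td) / (s * (1 + s / wp))


def band_averages(k_n, w0, wp, td, f_data, points=20000):
    """Return (1/B) int_B |H1|^2 dw and (1/B) int_B |H2|^2 dw, B = [0, pi*f_data] rad/s."""
    w_max = math.pi * f_data
    w_min = 1e-7 * w_max
    ratio = (w_max / w_min) ** (1.0 / (points - 1))
    ws = [w_min * ratio**k for k in range(points)]  # log-spaced grid resolves the loop

    def h_sq(w):
        g = loop_gain(w, w0, wp, td)
        den = 1 + k_n * g
        return abs(1 / den) ** 2, abs(g / den) ** 2

    vals = [h_sq(w) for w in ws]
    int1 = w_min * vals[0][0]  # segment [0, w_min] approximated by a rectangle
    int2 = w_min * vals[0][1]
    for k in range(points - 1):  # trapezoidal rule on the log grid
        dw = ws[k + 1] - ws[k]
        int1 += 0.5 * dw * (vals[k][0] + vals[k + 1][0])
        int2 += 0.5 * dw * (vals[k][1] + vals[k + 1][1])
    return int1 / w_max, int2 / w_max


def sigma_in_sq(sigma_e, sigma_q, k_n, w0, wp, td, f_data):
    """Eq. (3.24); a negative result means no consistent input jitter exists."""
    m1, m2 = band_averages(k_n, w0, wp, td, f_data)
    return (sigma_e**2 - sigma_q**2 * m2) / m1
```

The white-noise assumption enters through $S_{\phi_{in}} = \sigma_{in}^2/B$ and $S_{\phi_q} = \sigma_q^2/B$, which is why only band averages appear.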

## 3.3.3 Application of the Algorithm

Using the algorithm above, the relation between the amplitude of the limit cycle and the corresponding RMS input jitter $\sigma_{in}$ is calculated for a CDR with the following parameters: $f_{data} = 10 \text{ GHz}$, $\omega_z = 2\pi \cdot 300 \text{ kHz}$, $\omega_0 = 2\pi \cdot 3 \text{ MHz}$, $\omega_p = 2\pi \cdot 30 \text{ MHz}$ and $T_d = 3 \text{ ns}$. These parameters are of the same order as those of a charge pump CDR developed in our group for the DISCUS project [17, 18]. The large delay is due to the parallelization of the BB-PD and the demultiplexing of the data in the BB-PD. In addition, random data is assumed at the input of the BB-PD: the probability of a transition between two consecutive bits is then 0.5, which is thus equal to the transition density $\alpha$.
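The value $\alpha = 0.5$ for random data can be checked empirically by counting bit changes in a pseudo-random sequence. The seed and sequence length in this sketch are arbitrary choices.

```python
import random


def transition_density(bits):
    """Fraction of consecutive bit pairs that differ, i.e. data transitions per bit."""
    changes = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return changes / (len(bits) - 1)


rng = random.Random(1)  # fixed seed for reproducibility
bits = [rng.getrandbits(1) for _ in range(200_000)]
alpha = transition_density(bits)  # close to 0.5 for random data
```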

The calculated result is presented in Fig. 3.9. The plot shows that, in the absence of input jitter, the CDR exhibits a limit cycle with the worst-case amplitude $A_{e,max}$. In addition, Fig. 3.9 shows that above a certain value of $\sigma_{in}$ there is no corresponding solution for $A_e$: the noise is then large enough to destroy the limit cycle. For lower input noise levels, the limit cycle is stable. This transition point, i.e. the predicted input jitter level at which the CDR ceases to exhibit a limit cycle, is called the threshold RMS input jitter $\sigma_{in,th}$.



Figure 3.9: The limit cycle amplitude $A_e$ as a function of the RMS input jitter $\sigma_{in}$. The simulations were performed with: $f_{data} = 10 \text{ GHz}$, $\omega_z = 2\pi \cdot 300 \text{ kHz}$, $\omega_0 = 2\pi \cdot 3 \text{ MHz}$, $\omega_p = 2\pi \cdot 30 \text{ MHz}$ and $T_d = 3 \text{ ns}$.

## 3.3.4 Simulation Results

### Output Power Spectra

To validate the theory, several time domain simulations were performed. For a first batch of simulations, the same CDR is used as for the calculation of Fig. 3.9. The resulting power spectra of $\phi_{out}$ for several values of the RMS input jitter are given in Figs. 3.10, 3.11 and 3.12.

In Fig. 3.10, the RMS input jitter is equal to $\sqrt{2} \cdot \sigma_{in,th}$. According to the theory, no limit cycle is present in the CDR, and the prior-art RIDF prediction [2] should match the simulation. The calculated RIDF prediction (according to Section 3.4) is also shown in Fig. 3.10, and the simulation and the calculation indeed match nearly perfectly. We can thus conclude that our theory correctly predicts the absence of a limit cycle in the CDR with a BB-PD, and that in this case the RIDF theory correctly models the behavior of the CDR.

Figure 3.10: The power spectrum $S_{\phi_{out}}$ of the same CDR as in Fig. 3.9 for an input noise level $\sigma_{in} = \sqrt{2} \cdot \sigma_{in,th}$.

Figure 3.11: The power spectrum $S_{\phi_{out}}$ of the same CDR as in Fig. 3.9 for an input noise level $\sigma_{in} = \frac{\sigma_{in,th}}{\sqrt{2}}$. The simulation results are compared to the prediction assuming that no limit cycle is present (RIDF) and to the prediction assuming that a limit cycle is present (GSIDF).

Figure 3.12: The power spectrum $S_{\phi_{out}}$ of the same CDR as in Fig. 3.9 for an input noise level $\sigma_{in} = \sigma_{in,th}$.

On the other hand, in Fig. 3.11, the RMS input jitter is equal to $\frac{\sigma_{in,th}}{\sqrt{2}}$. Now, the theory indicates that a limit cycle is present. We therefore expect the Random-Input Describing Function model to be inadequate, so that the simulation and the RIDF calculation do not match. This is illustrated in Fig. 3.11, where the correspondence with the RIDF is clearly poor. In this case, however, the GSIDF prediction should be valid; it is also compared to the simulation in Fig. 3.11 and matches much better than the RIDF result. Nonetheless, there is a small discrepancy between the simulation results and the GSIDF prediction. The reason is that the GSIDF models the self-oscillation as a perfect sine wave, which corresponds to an infinitely narrow line in the spectrum. Due to the noise in the system, however, the actual self-oscillation exhibits some phase noise, which corresponds to a wider peak. This effect is well known in oscillator theory (see e.g. [16, 19]) and is neglected here.

Additionally, the simulated power spectrum shows a small peak around 100 MHz, which is the third harmonic of the limit cycle oscillation. It originates from the non-linearity of the BB-PD, which is lost in the linearized describing function model. Higher harmonics of the limit cycle, however, are strongly suppressed by the linear block $G(s)$. This is confirmed by the fact that the third harmonic is very small and higher-order harmonics are invisible. Apart from these two second-order effects, the GSIDF calculation matches the simulation almost perfectly.
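A rough, noise-free estimate shows why the harmonics are small. This sketch assumes that the averaged BB-PD output is a square wave, whose $k$-th odd harmonic has $1/k$ of the fundamental amplitude, and that the loop filter has the form $|G(j\omega)| \propto 1/(\omega\sqrt{1 + (\omega/\omega_p)^2})$ (zero $\omega_z$ neglected, consistent with Eqs. (3.25)-(3.26)). The oscillation frequency is taken from Eq. (3.25) with the parameters of Fig. 3.9. The result is only an order-of-magnitude estimate of the output harmonic relative to the fundamental, not a prediction of the simulated peak height.

```python
import math


def solve_ws(wp, td, rel_tol=1e-12):
    """Solve Eq. (3.25), pi/2 = atan(ws/wp) + ws*Td, for ws by bisection."""
    lo, hi = 0.0, math.pi / (2 * td)  # right-hand side increases with ws
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if math.atan(mid / wp) + mid * td < math.pi / 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_harmonic(k, ws, wp):
    """Output amplitude of harmonic k relative to the fundamental (rough estimate).

    Square-wave input: harmonic k has 1/k of the fundamental amplitude.
    Assumed |G(jw)| ~ 1/(w*sqrt(1 + (w/wp)^2)); the delay does not change |G|.
    """
    square_wave = 1.0 / k
    filtering = (1.0 / k) * math.sqrt(1 + (ws / wp) ** 2) / math.sqrt(1 + (k * ws / wp) ** 2)
    return square_wave * filtering


wp, td = 2 * math.pi * 30e6, 3e-9  # parameters of Fig. 3.9
ws = solve_ws(wp, td)
for k in (3, 5):
    f_k = k * ws / (2 * math.pi) / 1e6
    print(f"harmonic {k} at {f_k:.0f} MHz: {20 * math.log10(relative_harmonic(k, ws, wp)):.1f} dB")
```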

Finally, for values of $\sigma_{in}$ close to the threshold RMS input jitter $\sigma_{in,th}$, there is a transition region between a false and a correct prediction by the RIDF theory. Fig. 3.12 shows that here the RIDF prediction coincides with the GSIDF prediction, and that both closely predict the simulation results. However, the figures discussed above show that it is difficult to distinguish jitter peaking from a limit cycle in the frequency domain, which makes it challenging to determine the actual amplitude of the limit cycle from the simulated spectra. Therefore, the amplitude of the limit cycle is measured in the time domain, as described next.

### Amplitude Estimation of the Limit Cycle

Here, time domain simulations were performed with the same CDR parameters as those used for the calculations in Fig. 3.9. A random Gaussian noise source $\phi_{in}$ is applied to the input of the behavioral model, and its variance $\sigma_{in}^2$ is swept over multiple simulations. For each value of $\sigma_{in}^2$, the amplitude of the limit cycle $A_e$ is estimated from the simulation results as follows: a curve fitting algorithm matches a sine wave to the time domain simulation data, which yields the amplitude $A_e$ of the limit cycle component in the signal $\phi_e$ and the variance $\sigma_e^2$ of the noise component of the phase error. However, a data transition does not occur every clock cycle, and this influences the behavior of the limit cycle. Therefore, for each simulation, the entire set of simulated data (2 million time steps) is divided into short segments of 10 limit cycle periods each. The amplitude is estimated for each segment, and the estimates are then averaged.

However, the curve fitting algorithm has a pitfall: even if the limit cycle amplitude is zero, it will estimate a small but non-zero limit cycle amplitude, because noise power is present at the frequency where the amplitude is estimated. To detect this situation, the Signal-to-Noise Ratio (SNR) of the limit cycle is calculated as well. If this SNR is very small, the estimate is attributed to noise and rejected. An SNR of -6 dB is taken as the decision criterion: simulation results with an SNR lower than -6 dB are rejected.
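The segment-wise estimation and the SNR-based rejection can be sketched as follows. The text does not specify the curve-fitting algorithm. This sketch therefore assumes that the limit cycle frequency $\omega_s$ is known (e.g. from Eq. (3.19)), so that fitting a sine reduces to a linear least-squares fit of $a\sin(\omega_s t) + b\cos(\omega_s t)$ per segment. The SNR is taken here as the sine power $A_e^2/2$ over the residual variance. The segment length of 10 periods and the -6 dB criterion follow the text; the function names and the synthetic test signal are hypothetical.

```python
import math
import random


def fit_sine(t, x, ws):
    """Least-squares fit x ~ a*sin(ws*t) + b*cos(ws*t); returns (amplitude, residual var)."""
    s = [math.sin(ws * ti) for ti in t]
    c = [math.cos(ws * ti) for ti in t]
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(u * v for u, v in zip(s, c))
    xs = sum(u * v for u, v in zip(x, s))
    xc = sum(u * v for u, v in zip(x, c))
    det = ss * cc - sc * sc
    a = (xs * cc - xc * sc) / det  # 2x2 normal equations solved with Cramer's rule
    b = (xc * ss - xs * sc) / det
    res = [xi - a * si - b * ci for xi, si, ci in zip(x, s, c)]
    return math.hypot(a, b), sum(r * r for r in res) / len(res)


def estimate_limit_cycle(t, x, ws, periods=10, snr_min_db=-6.0):
    """Average the fit over segments of `periods` periods; None if the SNR is too low."""
    dt = t[1] - t[0]
    seg = max(3, round(periods * 2 * math.pi / (ws * dt)))  # samples per segment
    amps, variances = [], []
    for start in range(0, len(x) - seg + 1, seg):
        amp, var = fit_sine(t[start:start + seg], x[start:start + seg], ws)
        amps.append(amp)
        variances.append(var)
    a_e = sum(amps) / len(amps)
    var_e = sum(variances) / len(variances)
    snr_db = 10 * math.log10((a_e**2 / 2) / var_e)  # sine power over noise power
    return (a_e, var_e) if snr_db >= snr_min_db else None


rng = random.Random(0)
ws = 2 * math.pi * 36e6  # assumed limit cycle frequency
dt = 1e-10  # one sample per bit at 10 Gb/s
t = [k * dt for k in range(20000)]
x = [0.03 * math.sin(ws * ti + 0.4) + rng.gauss(0, 0.005) for ti in t]
result = estimate_limit_cycle(t, x, ws)  # amplitude close to 0.03, accepted
```

With pure noise, the fitted amplitude is small but non-zero, exactly the pitfall described above, and the SNR check rejects it.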

The results of the simulations are added to Fig. 3.9. Comparing the simulation results with the calculated values shows that the theory closely predicts the amplitude of the limit cycle. Furthermore, for an equal number of data points, the (numerical) procedure is about three orders of magnitude faster than the simulation approach.

## 3.3.5 Influence of the CDR Design Parameters

Now that we are able to predict the amplitude of the limit cycle, the next step is to study the influence of the different CDR design parameters. To make this study useful from a designer's point of view, the limit cycle amplitude characteristic is approximated asymptotically; this approximation is also added to Fig. 3.9. If there is a limit cycle, its amplitude is approximately the worst-case amplitude $A_{e,max}$, and if the noise is larger than the threshold RMS input jitter $\sigma_{in,th}$, there is no limit cycle. In this way, the limit cycle amplitude characteristic is reduced to two essential figures that envelope it: the worst-case amplitude $A_{e,max}$ and the threshold RMS input jitter $\sigma_{in,th}$. The influence of the CDR design parameters is examined below, first on the worst-case amplitude and then on the threshold RMS input jitter.

### Worst-case Limit Cycle Amplitude

The dependence of the worst-case limit cycle amplitude $A_{e,max}$ on the gain $\omega_0$, the pole $\omega_p$ and the total loop delay $T_d$ is investigated. As already mentioned, the zero $\omega_z$ is assumed to be sufficiently small to have little influence, so this parameter is not considered. Cases where the bandwidth of the CDR becomes significant with respect to the data rate are also excluded. This only occurs when both $\frac{1}{T_d}$ and $\omega_p$ are very large; under these conditions, the derived describing functions are no longer valid.

The calculated and simulated results are displayed in Fig. 3.13, which illustrates the effect of $\omega_0$, $\omega_p$ and $T_d$ on the worst-case limit cycle amplitude $A_{e,max}$. The plot shows that $A_{e,max}$ is proportional to the gain $\omega_0$. The pole $\omega_p$ has only a modest effect on $A_{e,max}$: a large increase of the pole frequency $\omega_p$ only causes a small decrease in $A_{e,max}$. Furthermore, although this is not obvious from the figure, there is also a linear relation between the delay $T_d$ and $A_{e,max}$ for large values of $T_d$. For small values of $T_d$, however, $A_{e,max}$ rises less than proportionally with increasing $T_d$. Fig. 3.13 also shows that the theory closely predicts the simulation results.

Now that the influence on the worst-case limit cycle amplitude has been examined, it is important to investigate whether this limit cycle prevents correct data recovery. As a reasonable upper bound on the worst-case limit cycle amplitude $A_{e,max}$ for successful data recovery, $\frac{\pi}{8}$ is chosen. This threshold is also displayed in Fig. 3.13, together with the worst-case amplitude $A_{e,max}$ of the CDR design discussed earlier (Fig. 3.9). The plots show that most CDR designs (including the design of Fig. 3.9) have a worst-case limit cycle amplitude $A_{e,max}$ that is sufficiently small to recover the input data successfully. However, an increase in the delay and in the gain of the linear block can lead to large amplitudes that severely disturb the data recovery.



Figure 3.13: The worst-case limit cycle amplitude  $A_{e,max}$  as a function of the gain  $\omega_0$  for different pole frequencies  $\omega_p$  and delays  $T_d$ . The corresponding calculated results (solid lines) and simulation results (markers) are represented with the same color.

### Minimal Input Noise to Quench a Limit Cycle

As shown in the previous section, the worst-case amplitude of a limit cycle is sufficiently small in most CDRs. However, a limit cycle causes severe jitter peaking, as demonstrated in Fig. 3.11. As a result, limit cycles should be avoided in CDR applications where the recovered clock is further used in the system. It is therefore interesting to study how much noise is needed to quench the limit cycle. The results of this study are shown in Fig. 3.14, where the threshold RMS input jitter $\sigma_{in,th}$ (as defined above) is represented as a function of the CDR parameters. Both the theoretical result (based on describing function theory) and the experimental result (obtained from simulations as described above) are shown.



Figure 3.14: The threshold RMS input jitter  $\sigma_{in,th}$  as a function of the gain  $\omega_0$  for different pole frequencies  $\omega_p$  and delays  $T_d$ . The corresponding calculated results (solid lines) and simulation results (markers) are represented with the same color.

Fig. 3.14 illustrates the effect of $\omega_0$, $\omega_p$ and $T_d$ on the threshold RMS input jitter $\sigma_{in,th}$. The threshold RMS input jitter $\sigma_{in,th}$ is directly proportional to the gain $\omega_0$, while the pole $\omega_p$ has only a modest effect. For large values of $T_d$, there is a linear relation between the total delay $T_d$ and $\sigma_{in,th}$; for small values of $T_d$, $\sigma_{in,th}$ rises less than proportionally with increasing delay.

Fig. 3.14 shows that the theory accurately predicts the simulation results. The CDR used in the previous simulations (Fig. 3.9) is also indicated in Fig. 3.14; its threshold RMS input jitter $\sigma_{in,th}$ is equal to 21 mrad. In practice, an RMS input jitter of 4 ps is not uncommon at a data rate of 10 Gb/s. This corresponds to about 250 mrad, which is more than sufficient to avoid limit cycles in the discussed CDR. In general, Fig. 3.14 shows that enough noise is present to avoid limit cycles. Only in designs where little input jitter is expected should the loop characteristics be evaluated to ensure that no unwanted or excessive limit cycles arise.
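The conversion from time-domain jitter to phase used above follows from one unit interval $1/f_{data}$ corresponding to $2\pi$ rad. A minimal sketch, with a hypothetical helper name:

```python
import math


def jitter_ps_to_rad(sigma_ps, f_data_hz):
    """RMS jitter in ps -> RMS phase in rad (one UI = 1/f_data = 2*pi rad)."""
    return 2 * math.pi * sigma_ps * 1e-12 * f_data_hz


sigma_rad = jitter_ps_to_rad(4.0, 10e9)  # about 0.251 rad, i.e. roughly 250 mrad
margin = sigma_rad / 0.021  # ratio to sigma_in,th = 21 mrad of the Fig. 3.9 CDR
```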

Note that Fig. 3.14 and the relations described above are very similar to Fig. 3.13 and the relations with respect to  $A_{e,max}$ . Intuitively, a limit cycle with a higher worst-case amplitude  $A_{e,max}$  will require more input jitter to quench the limit cycle and thus results in a higher threshold RMS input jitter  $\sigma_{in,th}$ .

## 3.3.6 Further Analytical Approximations

While the previously developed theory matches the simulation results excellently, it does not provide simple design intuition: because of the interdependencies of $\sigma_e$, $\sigma_{in}$, $\sigma_q$, $K_n$, $K_s$ and $A_e$, a system of equations still has to be solved to find the results. To overcome this, analytical approximations are made to obtain closed-form equations for both the worst-case amplitude $A_{e,max}$ and the threshold RMS input jitter $\sigma_{in,th}$.

### Worst-Case Limit Cycle Amplitude $A_{e,max}$

As shown in Fig. 3.9, the worst-case amplitude occurs for small values of the input noise level ($\sigma_{in} \rightarrow 0$). Unfortunately, this does not allow a direct simplification of the describing functions of Eqs. (3.14) and (3.15), because they depend on the noise level $\sigma_e$ at the input of the non-linear block and not on the overall input noise level $\sigma_{in}$. According to Eq. (3.23), the noise level $\sigma_e$ is a complicated function of the describing functions, the input noise level $\sigma_{in}$ and the standard deviation of the linearization error $\sigma_q$.

By taking the limit of Eq. (3.24) for small loop bandwidths, a significant simplification is obtained: the contribution of the linearization error (i.e. $\sigma_q$) is then nearly entirely filtered out. Hence, in this case, the limit $\sigma_{in} \rightarrow 0$ corresponds to $\sigma_e \rightarrow 0$, and the Gaussian-plus-Sinusoid-Input Describing Function collapses to the Sinusoidal-Input Describing Function given by Eq. (3.16) [8].

According to the Barkhausen criterion (Eq. (3.19)), this gain  $K_s$  has to be equal to  $K_s^*$  for a limit cycle to occur. Hence, the oscillation frequency  $\omega_s$  and  $K_s^*$  are described by the following relations:

$$\frac{\pi}{2} = \operatorname{atan}\left(\frac{\omega_s}{\omega_p}\right) + \omega_s T_d \tag{3.25}$$

$$K_s^* = \frac{\omega_s}{\omega_0} \sqrt{1 + \left(\frac{\omega_s}{\omega_p}\right)^2} \tag{3.26}$$

Eq. (3.25) defines the oscillation frequency $\omega_s$ implicitly and has to be solved numerically, which is straightforward because its right-hand side increases monotonically with $\omega_s$. Note that $K_s^*$ is fixed and only depends on $\omega_0$, $\omega_p$ and $T_d$.
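Because the right-hand side of Eq. (3.25) rises monotonically from 0 and exceeds $\pi/2$ at $\omega_s = \pi/(2T_d)$, a bisection on that interval always finds the root. The sketch below does this and then evaluates Eq. (3.26); the parameter values are those of Fig. 3.9, and the function names are hypothetical.

```python
import math


def solve_ws(wp, td, rel_tol=1e-12):
    """Solve Eq. (3.25), pi/2 = atan(ws/wp) + ws*Td, for ws by bisection."""
    lo, hi = 0.0, math.pi / (2 * td)  # root lies below pi/(2*Td) for any finite wp
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if math.atan(mid / wp) + mid * td < math.pi / 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_s_star(w0, wp, td):
    """Return (ws, K_s*) with K_s* from Eq. (3.26)."""
    ws = solve_ws(wp, td)
    return ws, ws / w0 * math.sqrt(1 + (ws / wp) ** 2)


ws, ks = k_s_star(2 * math.pi * 3e6, 2 * math.pi * 30e6, 3e-9)  # Fig. 3.9 parameters
```

For $\omega_p \rightarrow \infty$, the root tends to $\pi/(2T_d)$, which is the case used in Eq. (3.28).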

Substituting Eq. (3.26) in Eq. (3.16) yields the maximum amplitude of the limit cycle $A_{e,max}$:

$$A_{e,max} \approx \frac{4\alpha}{\pi} \frac{\omega_0}{\omega_s} \frac{1}{\sqrt{1 + \left(\frac{\omega_s}{\omega_p}\right)^2}} \tag{3.27}$$

If the pole  $\omega_p$  is at a sufficiently high frequency relative to  $\omega_s$ , this equation can be further simplified:

$$A_{e,max} \approx \frac{4\alpha}{\pi} \frac{\omega_0}{\omega_s} \approx \frac{8\alpha}{\pi^2} T_d \,\omega_0 \tag{3.28}$$

From Eq. (3.27) and Eq. (3.28), it is clear that $A_{e,max}$ is proportional to $\omega_0$ and, if $\omega_p$ is at a sufficiently high frequency, depends linearly on $T_d$. This corresponds well with the results of Section 3.3.5.

To determine the accuracy of these approximations, Fig. 3.15 shows a scatter plot of the approximation of $A_{e,max}$ (Eq. (3.27)) versus the simulated $A_{e,max}$ obtained from many simulation runs. Here, $\omega_0$, $\omega_p$ and $T_d$ were varied over the same range as in Figs. 3.13 and 3.14, and three values of the transition density $\alpha$ were considered. The figure shows that the approximate expression matches the simulations very well.

Additionally, the difference between the exact GSIDF calculation and the approximation of Eq. (3.27) was evaluated for the same range of parameter values as the simulations. The differences were within 0.1%. Hence, neglecting the contribution of the linearization noise level $\sigma_q$ is sufficiently accurate in this asymptotic case where the noise level goes to zero.

### Minimal Input Noise to Quench a Limit Cycle

To find a simple approximation for the threshold RMS input jitter $\sigma_{in,th}$, we start from the observation that it corresponds to the case where the amplitude of the limit cycle $A_e$ goes to zero. In this case, the GSIDF for the sinusoidal gain (Eq. (3.15)) reduces to:

$$K_s(A_e, \sigma_e)\big|_{\sigma_{in} = \sigma_{in,th}} = \lim_{A_e \to 0} K_s(A_e, \sigma_e) = \sqrt{\frac{2}{\pi}} \frac{\alpha}{\sigma_e} \tag{3.29}$$

Figure 3.15: A scatter plot of simulated $A_{e,max}$ as a function of the approximation according to Eq. (3.27) for different values of $\omega_0$, $\omega_p$, $T_d$ and $\alpha$.

Figure 3.16: A scatter plot of simulated $\sigma_{in,th}$ as a function of the approximation according to Eq. (3.31) for different values of $\omega_0$, $\omega_p$, $T_d$ and $\alpha$.

Again, we face the problem that $\sigma_e$ is a complicated function of $\sigma_{in}$, $K_n$ and $\sigma_q$. To overcome this, the limit of Eq. (3.24) for a small loop bandwidth is taken once more: the band average of $|H_1|^2$ then tends to one and the $\sigma_q$ term vanishes, which results in $\sigma_e \rightarrow \sigma_{in}$.

By combining this approximation and Eq. (3.29) with the Barkhausen criterion ($K_s = K_s^*$), we obtain an explicit equation for $\sigma_{in,th}$:

$$\sigma_{in,th} = \sqrt{\frac{2}{\pi}} \frac{\alpha}{K_s^*} \tag{3.30}$$

By combining Eq. (3.27) with Eq. (3.30), we can cast this in the following form:

$$\sigma_{in,th} \approx \frac{1}{2} \sqrt{\frac{\pi}{2}} A_{e,max} \tag{3.31}$$

which clearly indicates the relation between  $A_{e,max}$  and  $\sigma_{in,th}$ .

Together with Eq. (3.25) and Eq. (3.27), this equation provides a very simple and fast way to assess the possibility of limit cycles in a CDR with a BB-PD. This fast assessment is illustrated in Section 6.2.6.
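The fast assessment can be written in a few lines: solve Eq. (3.25) for $\omega_s$, then evaluate Eqs. (3.26), (3.27) and (3.30). This sketch uses the parameters of Fig. 3.9 with $\alpha = 0.5$ and hypothetical function names. It gives $\sigma_{in,th}$ close to the 21 mrad read from Fig. 3.14.

```python
import math


def solve_ws(wp, td, rel_tol=1e-12):
    """Solve Eq. (3.25) for ws by bisection (its right-hand side is monotonic in ws)."""
    lo, hi = 0.0, math.pi / (2 * td)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if math.atan(mid / wp) + mid * td < math.pi / 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limit_cycle_assessment(w0, wp, td, alpha=0.5):
    """Return (ws, A_e,max from Eq. (3.27), sigma_in,th from Eq. (3.30))."""
    ws = solve_ws(wp, td)
    ks_star = ws / w0 * math.sqrt(1 + (ws / wp) ** 2)  # Eq. (3.26)
    a_max = 4 * alpha / (math.pi * ks_star)  # Eq. (3.27)
    sigma_th = math.sqrt(2 / math.pi) * alpha / ks_star  # Eq. (3.30)
    return ws, a_max, sigma_th


w0, wp, td = 2 * math.pi * 3e6, 2 * math.pi * 30e6, 3e-9  # CDR of Fig. 3.9
ws, a_max, sigma_th = limit_cycle_assessment(w0, wp, td)
a_max_simple = 8 * 0.5 / math.pi**2 * td * w0  # Eq. (3.28), valid only for large wp
```

For these parameters, $\omega_p$ is not far above $\omega_s$, so Eq. (3.28) gives a noticeably smaller value than Eq. (3.27); the full expression is the safer choice here.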

Analogous to Fig. 3.15, Fig. 3.16 shows a scatter plot of the approximation of Eq. (3.31) versus the entire batch of simulation results. The analytical approximation agrees well with the simulations.

Again, the difference between the exact GSIDF calculation and the approximation of Eq. (3.31) was evaluated for the same range of parameter values as the simulations. The differences remained within 2.5%. Hence, this approximation is also sufficiently accurate in the asymptotic case where the limit cycle amplitude goes to zero.

# 3.4 Jitter Analysis in Charge Pump CDRs

Now that the stability of the CDR has been assessed, we can rely on several publications that assume the CDR operates in its normal operating region<sup>3</sup> to predict the key characteristics of a CDR: the jitter transfer, jitter generation and jitter tolerance [2, 20-22].

An important work is [2], where describing functions are used to predict the jitter characteristics. For the sake of completeness, the most important aspects of [2] are summarized in this section.

## 3.4.1 Jitter Transfer and Jitter Generation

To analyze the jitter transfer and jitter generation of the CDR, only random Gaussian noise is applied to the input of the CDR model (Fig. 3.2). Furthermore, we assume that the CDR is stable and no limit cycle is present. The input of the non-linearity can therefore only contain a random component, and the Random-Input Describing Function (Section 3.2.1) can be used to model the non-linearity of the BB-PD.

The system relations are extracted from the RIDF model shown in Fig. 3.4 and the variance of the phase error  $\sigma_e^2$  is given by [2]:

$$\sigma_e^2 = \int_B \left( S_{\phi_{in}} |H_1(j\omega)|^2 + S_{\phi_{vco}} |H_1(j\omega)|^2 + S_{\phi_q} |H_2(j\omega)|^2 \right) \mathrm{d}\omega \tag{3.32}$$

$$|H_1(s)| = \left|\frac{1}{1 + K_n G(s)}\right| \tag{3.33}$$

$$|H_2(s)| = \left|\frac{G(s)}{1 + K_n G(s)}\right| \tag{3.34}$$

where $S_{\phi_{in}}$, $S_{\phi_{vco}}$ and $S_{\phi_q}$ are the power spectral densities of the input random jitter, the VCO's phase noise and the linearization error, respectively. The noise bandwidth $B$ extends from DC to $f_{data}/2$, because the system only reacts to data edges and hence implicitly incorporates a sampling operation [2]. The gain $K_n$, the variance of the linearization error $\phi_q$ and the linear transfer function $G(s)$ are given by Eqs. (3.9), (3.10) and (3.4), respectively.

The RIDF gain $K_n$ and the phase error variance $\sigma_e^2$ can be calculated from the system of equations given by Eqs. (3.9)-(3.10) and Eqs. (3.32)-(3.34), when the power spectral densities of the input random jitter $S_{\phi_{in}}$ and of the VCO's phase noise $S_{\phi_{vco}}$, and the transition density $\alpha$ are given [2].

<sup>3</sup>That is, the CDR is stable and no limit cycle is present.

Subsequently, the solution can be used to determine the power spectral density of the output phase  $S_{\phi_{out}}$  [2]:

$$S_{\phi_{out}} = S_{\phi_{in}} \underbrace{K_n^2 |H_2(j\omega)|^2}_{\text{Jitter transfer}} + \underbrace{S_{\phi_{vco}} |H_1(j\omega)|^2 + S_{\phi_q} |H_2(j\omega)|^2}_{\text{Jitter generation}} \tag{3.35}$$

where the first term on the right-hand side in Eq. (3.35) corresponds to the input phase noise transferred to the output, while the rest corresponds to the phase noise generated by the internal circuits. This includes the contribution from the phase noise from the VCO and the contribution due to the linearization error of the BB-PD. The transfer functions  $H_1$  and  $H_2$  used in Eq. (3.35) are given by Eqs. (3.33)-(3.34).
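The jitter transfer term $K_n^2 |H_2(j\omega)|^2$ of Eq. (3.35) can be evaluated directly once $K_n$ is known. Since Eq. (3.4) is not repeated here, this sketch assumes $G(j\omega) = \omega_0 e^{-j\omega T_d}/(j\omega(1 + j\omega/\omega_p))$ with the zero neglected, uses the loop parameters of Fig. 3.9, and takes a hypothetical value $K_n = 5$ instead of solving Eqs. (3.9)-(3.10).

```python
import cmath
import math


def loop_gain(w, w0, wp, td):
    """Assumed G(jw) = w0*exp(-jw*Td) / (jw*(1 + jw/wp)); the zero wz is neglected."""
    s = 1j * w
    return w0 * cmath.exp(-s * td) / (s * (1 + s / wp))


def jitter_transfer_db(f_hz, k_n, w0, wp, td):
    """Jitter transfer K_n^2 |H_2(jw)|^2 of Eq. (3.35), in dB."""
    g = loop_gain(2 * math.pi * f_hz, w0, wp, td)
    h2 = g / (1 + k_n * g)  # H_2 as in Eq. (3.34)
    return 10 * math.log10(abs(k_n * h2) ** 2)


w0, wp, td = 2 * math.pi * 3e6, 2 * math.pi * 30e6, 3e-9
k_n = 5.0  # hypothetical RIDF gain, normally obtained from Eqs. (3.9)-(3.10)
for f in (1e5, 1e6, 1e7, 1e8):
    print(f"{f:8.0e} Hz: {jitter_transfer_db(f, k_n, w0, wp, td):7.2f} dB")
```

Well inside the loop bandwidth the input jitter passes unattenuated (about 0 dB), while far above it the transfer rolls off.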

## 3.4.2 Jitter Tolerance

The jitter tolerance is an important metric that describes how much jitter the CDR can tolerate. It is measured by applying sinusoidal jitter to the CDR and determining the maximum amplitude that does not cause the Bit-Error Rate (BER) to exceed a target value (typically $10^{-12}$) [2].

Due to this test procedure, a sinusoidal component $\phi_{e,s}$ is present at the input of the non-linearity in addition to a random Gaussian noise component $\phi_{e,n}$. This means that the GSIDF model described in Section 3.2.3 is required, with two different linearized gains: one for the random Gaussian component, $K_n$, given by Eq. (3.14), and one for the sinusoidal component, $K_s$, given by Eq. (3.15). Note that in this case the sinusoid is not caused by any instability or limit cycle, but originates from the sinusoidal jitter applied at the input. The sinusoidal components $\phi_{in,s}$ and $\phi_{e,s}$ can therefore be written as:

$$\phi_{in,s} = A_{in} \sin\left(\omega_{in} t + \psi_{in}\right) \tag{3.36}$$

$$\phi_{e,s} = A_e \sin(\omega_{in}t + \psi_e) = A_e \sin\theta \tag{3.37}$$

with $A_{in}$ and $A_e$ the amplitudes of the sinusoidal components of the input phase $\phi_{in}$ and the phase error $\phi_e$, respectively. $\omega_{in}$ represents the frequency of the applied sinusoidal jitter and $\theta$ the instantaneous phase of $\phi_{e,s}$, while $\psi_{in}$ and $\psi_e$ describe the initial phases of the sinusoidal jitter $\phi_{in,s}$ and of $\phi_{e,s}$, respectively.

The GSIDF model of Fig. 3.5 is incorporated in the complete CDR model for the jitter tolerance measurements, which results in the two block diagrams of Fig. 3.17. Similar to Section 3.3.1, the input phase $\phi_{in}$, the phase error $\phi_e$, the output of the BB-PD $\phi_u$ and the output of the CDR $\phi_{out}$ are the sum of their random Gaussian and their sinusoidal component, i.e. at every node $x$ we can write:

$$\phi_x = \phi_{x,s} + \phi_{x,n} \tag{3.38}$$

Figure 3.17: The GSIDF model of a CDR with a BB-PD for the jitter tolerance measurements with the block diagram for (a) the sinusoidal component and (b) the random Gaussian component (identical to the RIDF model in Fig. 3.4).

The relation between the amplitude of the sinusoidal component of the input phase  $\phi_{in}$  and of the sinusoidal component of the phase error  $\phi_e$  can be derived from the corresponding transfer function, evaluated at the applied sinusoidal jitter frequency  $\omega_{in}$  [2]:

$$A_e = \frac{A_{in}}{|1 + K_s G(j\omega_{in})|} \tag{3.39}$$

The standard deviation $\sigma_e$ of the random component of the phase error is obtained by integrating its power spectral density over the noise bandwidth $B$ [2]:

$$\sigma_e^2 = \int_B \left( S_{\phi_{in}} |H_1(j\omega)|^2 + S_{\phi_{vco}} |H_1(j\omega)|^2 + S_{\phi_q} |H_2(j\omega)|^2 \right) \mathrm{d}\omega \tag{3.40}$$

$$H_1(s) = \frac{1}{1 + K_n G(s)} \tag{3.41}$$

$$H_2(s) = -\frac{G(s)}{1 + K_n G(s)} \tag{3.42}$$

where the gain for the random Gaussian component $K_n$, the gain for the sinusoidal component $K_s$ and the standard deviation of the linearization error $\sigma_q$ are given by Eqs. (3.14), (3.15) and (3.17), respectively.

Unfortunately, there are no closed-form formulas for the calculation of  $\sigma_e$  and  $A_e$ . Instead, the solution to the set of equations can be found via iteration. The procedure is described in Algorithm 2 [2].

Find initial values for $K_n$ and $\sigma_e$, assuming that there is only random jitter

Derive initial values for $K_s$ and $A_e$: use $K_s = K_n$ and Eq. (3.39)

**while** the values for $K_s$, $K_n$, $A_e$, $\sigma_e$ and $\sigma_q$ have not converged **do**

$K_n$ and $K_s \leftarrow$ evaluate Eqs. (3.14) and (3.15) using $A_e$ and $\sigma_e$

$\sigma_q \leftarrow$ evaluate Eq. (3.17) using $K_s$, $K_n$, $A_e$ and $\sigma_e$

$A_e$ and $\sigma_e \leftarrow$ evaluate Eqs. (3.39) and (3.40) using $K_s$, $K_n$ and $\sigma_q$

**end**

Algorithm 2: Jitter tolerance calculation procedure [2].

The solution for $A_e$ and $\sigma_e$ from Algorithm 2 is used to determine the probability density function of the phase error $\phi_e (= \phi_{e,s} + \phi_{e,n})$. Since the sinusoidal and the random jitter components are independent of each other, the joint probability density function is given by the convolution of the probability density functions of the random Gaussian component $\phi_{e,n}$ and the sinusoidal component $\phi_{e,s}$ [2]:

$$f_n(\phi_{e,n}) = \frac{1}{\sqrt{2\pi\sigma_e^2}} \exp\left(-\frac{\phi_{e,n}^2}{2\sigma_e^2}\right) \tag{3.43}$$

$$f_s(\phi_{e,s}) = \frac{1}{\pi \sqrt{A_e^2 - \phi_{e,s}^2}}, \quad |\phi_{e,s}| < A_e \tag{3.44}$$

By substituting the calculated values of $A_e$ and $\sigma_e$ in the joint probability density function, the BER can be calculated as the probability that the phase error $\phi_e$ exceeds a prescribed margin. In this way, the maximum amplitude $A_{in}$ of the applied sinusoidal jitter that achieves a target BER can be determined. Repeating this for different jitter frequencies $\omega_{in}$ yields the jitter tolerance [2].
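Instead of convolving Eqs. (3.43) and (3.44) explicitly, the exceedance probability can equivalently be obtained by averaging the Gaussian tail over the uniformly distributed phase $\theta$ of the sinusoidal component. The sketch assumes a symmetric decision margin $\pm\phi_m$ on $\phi_e$; the actual margin is not specified here, so it is a free parameter. The helper `max_a_e` keeps $\sigma_e$ fixed, whereas in the full procedure $\sigma_e$ changes with $A_{in}$ through Algorithm 2, and $A_{in}$ then follows from Eq. (3.39). All names are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def ber(a_e, sigma_e, margin, n=4096):
    """P(|phi_e| > margin) for phi_e = A_e*sin(theta) + N(0, sigma_e^2), theta uniform."""
    total = 0.0
    for k in range(n):  # average over the phase of the sinusoidal component
        s = a_e * math.sin(2 * math.pi * (k + 0.5) / n)
        total += q_func((margin - s) / sigma_e) + q_func((margin + s) / sigma_e)
    return total / n


def max_a_e(sigma_e, margin, target=1e-12):
    """Largest A_e with ber <= target (bisection; the BER increases with A_e)."""
    lo, hi = 0.0, margin
    if ber(lo, sigma_e, margin) > target:
        return 0.0  # the random jitter alone already violates the target
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ber(mid, sigma_e, margin) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```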

# 3.5 AD-CDR Phase Domain Jitter Analysis

#### 3.5.1 Sampled-Data Mixed-Signal AD-CDR Model

The block diagram of the proposed All-Digital Clock and Data Recovery (AD-CDR), shown in Fig. 2.8 and repeated here for convenience (Fig. 3.18), is converted into the elaborate sampled-data (mixed-signal) discrete/continuous-time AD-CDR model shown in Fig. 3.19(a).



Figure 3.18: A block diagram of the proposed next-generation All-Digital CDR.

Every clock period $T_{clk}$, the Bang-Bang Phase Detector (BB-PD) compares the phase of the input data $\phi_{in}$ with the phase of the recovered clock $\phi_{out}$ and determines whether the recovered clock is *Early* or *Late*. This subblock is modeled in Fig. 3.19(a) by two samplers, an ideal subtraction and an edge detector. First, both the input and output phase are sampled and subtracted from each other. Similar to the phase domain model described in Section 3.1, the phase error $\phi_{e,n}$ is then sent through an ideal comparator. Every period $1/f_{data}$, the edge detector outputs a '1' when a data transition occurs and a '0' otherwise. This signal is multiplied with the output of the comparator. The result is a signal $\phi_{u,n}$ that only adjusts the control signal when a data transition takes place.
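The BB-PD model described above reduces to a sign function gated by the edge detector. The sketch below is a minimal Python version of that phase-domain behavior. Two choices are assumptions, not taken from the circuit:

- The sign convention: $+1$ when the data phase leads the clock phase, with a zero error mapped to $+1$. How $\pm 1$ maps onto *Early*/*Late* is also assumed.
- The function name `bang_bang_pd`, which is hypothetical.

```python
from typing import List, Sequence


def bang_bang_pd(phi_in: Sequence[float], phi_out: Sequence[float],
                 transitions: Sequence[bool]) -> List[int]:
    """Phase-domain BB-PD: one Early/Late decision per data period 1/f_data."""
    out = []
    for p_in, p_out, edge in zip(phi_in, phi_out, transitions):
        phi_e = p_in - p_out                  # sampled phase error
        decision = 1 if phi_e >= 0.0 else -1  # ideal comparator (zero mapped to +1)
        out.append(decision * int(edge))      # edge detector: 0 when no transition
    return out
```

For example, `bang_bang_pd([0.1, -0.2, 0.3], [0.0, 0.0, 0.0], [True, True, False])` returns `[1, -1, 0]`. The third decision is suppressed because no data transition occurred.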

The Digital Loop Filter (DLF) operates at a reduced speed: the digital clock frequency $f_{dig} = 1/T_{dig}$ is 1/N-th of the recovered clock frequency $f_{clk} = 1/T_{clk}$. Therefore, the output of the BB-PD is first subsampled by a factor of N before it is processed by the DLF. The output of the DLF is a *digital signal* that controls the Digitally Controlled Oscillator (DCO). Fig. 3.19(a) models the subsampling and DLF behavior in four steps:

- the *Early/Late* signal is sampled every digital clock period $T_{dig}$;
- it is processed by a (discrete-time) transfer function $H_{DLF}$;
- the output of the DLF is quantized;
- a continuous-time signal is created through a zero-order hold (ZOH).

The last block is the Digitally Controlled Oscillator (DCO), which changes the frequency of the recovered clock signal according to the output signal of the DLF. This block is modeled by the (continuous-time) transfer function  $H_{dco}$  together with the input-referred phase noise of the DCO  $\phi_{dco}$ .

#### 3.5.2 Aliasing

The discussion above highlights that the proposed AD-CDR is a multirate system which requires subsampling and interpolation operations. This subsampling operation results in spectral aliasing effects that may degrade the performance. Such effects become apparent at the AD-CDR output as increased close-in phase noise power spectral density due to the folding of high-rate noise.

To explain this, we start from the phase domain diagram of Fig. 3.19(a), which shows a model of our AD-CDR with two noise sources. $\phi_{DCO}$ stands for the input-referred DCO phase noise. $\phi_{in}$ represents the combined effect of the input-referred noise of the BB-PD (essentially originating from the input samplers) and the noise in the input signal. For the in-band phase noise, the contribution of $\phi_{DCO}$ can be neglected: when referred to the input, it is attenuated by the loop filter, which has a high gain in the passband. Hence, we focus on $\phi_{in}$. To understand what happens with this component, we examine the input part of the AD-CDR.

The BB-PD performs two functions: sampling and quantization of the phase error. The sampling is explicitly shown in the figure by the sampling switch. We can now transform the input branch of the CDR by moving both samplers $T_{clk}$ and $T_{dig}$ in front of the subtraction block at the input, as shown in Fig. 3.20. Here, both $\phi_{in}$ and the output phase $\phi_{out}$ are sampled by $T_{clk}$ and then subsampled by $T_{dig}$.

Due to the low-pass operation of the loop, $\phi_{out}$ has a low-pass spectrum with a bandwidth much lower than the sampling frequency after subsampling. Depending on the loop filter settings, the bandwidth of $\phi_{out}$ will be in the range of 10-100 MHz. The residual sampling frequency after subsampling by, e.g., a factor N = 16 is 1.56 GHz. The sampling and subsampling operations therefore do not affect the spectrum of $\phi_{out}$.

However, $\phi_{in}$ is a wideband noise signal with a bandwidth that may be much larger than $f_{clk}$. For example, the noise component coming from an input sampler has the same bandwidth as the sampler. That bandwidth must necessarily be well above $f_{clk}$ to sample the input signal successfully. Hence, in the first sampling step, all the noise energy aliases to the first Nyquist band (from DC to $f_{clk}/2$). This is also illustrated in Fig. 3.20. When this signal is subsequently subsampled by a factor N, another aliasing step occurs, in which all the noise energy aliases again into the band from DC to $f_{clk}/(2N)$. The same noise energy is now contained in an N times smaller bandwidth, so the effective input noise spectral density is increased by a factor N.

Figure 3.20: The phase noise through the AD-CDR: a transformed phase domain model of the input branch
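The factor-N increase can be illustrated numerically with a short Python sketch. It assumes that the first aliasing step has already happened, so the $f_{clk}$-rate noise sequence is white with a flat spectrum over $[0, f_{clk}/2]$. The unit variance, the seed and the helper name `one_sided_psd_level` are arbitrary illustrative choices. Subsampling keeps the variance, that is, the noise energy, but spreads it over an N times smaller band.

```python
import random
import statistics


def one_sided_psd_level(samples, fs):
    """Flat one-sided PSD of a white sequence: its variance spread over [0, fs/2]."""
    return statistics.pvariance(samples) / (fs / 2.0)


rng = random.Random(0)
f_clk, n_sub = 25e9, 16
x = [rng.gauss(0.0, 1.0) for _ in range(n_sub * 20_000)]  # white noise sampled at f_clk
x_sub = x[::n_sub]                                        # subsampling: f_dig = f_clk / N

psd_clk = one_sided_psd_level(x, f_clk)
psd_dig = one_sided_psd_level(x_sub, f_clk / n_sub)
print(f"PSD ratio after subsampling: {psd_dig / psd_clk:.2f} (expected about {n_sub})")
```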

#### 3.5.3 Discrete-Time Multi-Rate Modeling of AD-CDR

The sampled-data (mixed-signal) discrete/continuous-time model is transformed into the discrete-time multirate model shown in Fig. 3.19(b). Its advantage over the continuous-time sampled-data model is that it leads to compact analytical formulations.

In Fig. 3.19(b), sampled continuous-time signals are indicated by the superscript \* (i.e., the star operator [23]). For example, the signal $\phi_{in}^*(t)$ denotes the sequence of samples (or numbers) that is related to the continuous-time signal $\phi_{in}(t)$ [23]. Correspondingly, $\phi_{in}^*(s)$ denotes the periodic expansion of the spectrum of $\phi_{in}(s)$. In other words, $\phi_{in}^*(s)$ is the $\mathcal{Z}$-transform $\phi_{in}(z)$ evaluated at $z = \exp(sT)$. Here, $\phi_{in}(z)$ is the $\mathcal{Z}$-transform of the discrete signal $\phi_{in}(nT)$, which is obtained by sampling $\phi_{in}(t)$ every period $T$ [23].
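For reference, the periodic expansion mentioned above is the standard star-transform relation. It is written here under the assumption that $\phi_{in}(t)$ is zero for $t < 0$ and has no jump at $t = 0$; otherwise, a correction term $\phi_{in}(0^+)/2$ appears:

$$\phi_{in}^*(s) = \sum_{n=0}^{\infty} \phi_{in}(nT)\, e^{-snT} = \phi_{in}(z)\Big|_{z = e^{sT}} = \frac{1}{T}\sum_{k=-\infty}^{\infty} \phi_{in}\!\left(s + jk\frac{2\pi}{T}\right)$$

The right-hand sum shows explicitly how every spectral component of $\phi_{in}(s)$ is repeated at multiples of the sampling frequency $2\pi/T$.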

Here, we assume that only random jitter is applied at the input of the AD-CDR and that the AD-CDR does not contain any limit cycles. Therefore, no sinusoidal component is present at the input of the non-linearity and the non-linearity can be modeled by the RIDF (discussed in Section 3.2.1).

The continuous-time transfer function of the DCO, together with the ZOH discrete/continuous-time interface, is converted into a discrete-time $f_{clk}$-rate transfer function using the impulse-invariant method. This means mathematically sampling the continuous-time phase impulse response of the DCO at the $f_{clk}$ rate, to capture its evolution at the rising edges of $f_{clk}$, and then taking the $\mathcal{Z}$-transform [24]:

$$\mathcal{Z}\left\{\mathcal{L}^{-1}\left\{H_N(s)\frac{K_{DCO}}{s}\right\}\Big|_{t=nT_{clk}}\right\} = H_N(z)\frac{T_{clk}K_{DCO}}{z-1} \tag{3.45}$$

where the s-domain transfer function of the ZOH interpolation filter is given by:

$$H_N(s) = \frac{1 - e^{-sT_{dig}}}{s} \tag{3.46}$$

and $H_N(z)$ represents the $f_{clk}$-rate z-domain transfer function of the discrete-time ZOH interpolation filter:

$$H_N(z) = \frac{1 - z^{-N}}{1 - z^{-1}} \tag{3.47}$$
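Eq. (3.45) can be checked numerically. The Python sketch below samples the continuous-time response $\mathcal{L}^{-1}\{H_N(s)K_{DCO}/s\}$, which is a ramp minus a ramp delayed by $T_{dig} = N T_{clk}$, at $t = nT_{clk}$. It compares the result with the impulse response of $H_N(z)\,T_{clk}K_{DCO}/(z-1)$. Both equal $K_{DCO}T_{clk}\min(n, N)$. The function names and the values of $N$, $T_{clk}$ and $K_{DCO}$ are illustrative assumptions.

```python
from typing import List


def sampled_continuous(n_max: int, n_sub: int, t_clk: float, k_dco: float) -> List[float]:
    """Left side of Eq. (3.45): inverse Laplace of H_N(s)*K_DCO/s sampled at t = n*T_clk.

    With T_dig = N*T_clk, H_N(s)*K/s = K*(1 - exp(-s*T_dig))/s^2, i.e. a ramp minus a
    ramp delayed by T_dig.
    """
    ramp = lambda t: t if t > 0.0 else 0.0
    t_dig = n_sub * t_clk
    return [k_dco * (ramp(n * t_clk) - ramp(n * t_clk - t_dig)) for n in range(n_max)]


def discrete_response(n_max: int, n_sub: int, t_clk: float, k_dco: float) -> List[float]:
    """Right side of Eq. (3.45): impulse response of H_N(z)*T_clk*K_DCO/(z - 1)."""
    # H_N(z) = sum_{k<N} z^{-k}: the unit impulse becomes a box of N ones
    box = [1.0 if n < n_sub else 0.0 for n in range(n_max)]
    out, acc = [], 0.0
    for n in range(n_max):
        out.append(acc)                 # 1/(z - 1) = z^{-1}/(1 - z^{-1}): delayed accumulator
        acc += t_clk * k_dco * box[n]
    return out
```

The one-sample delay in `discrete_response` comes from the $1/(z-1)$ factor. Without it, the two sequences would be offset by one $f_{clk}$ period.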

Furthermore, the model of the subsampling and the DLF is adjusted in three ways:

- The sampling by $T_{dig}$ at the input of the DLF is replaced by a decimation block $\downarrow N$.
- An additional interpolation block $\uparrow N$ is placed in front of the ZOH block. This block allows us to define an $f_{dig}$ domain for the DLF without altering the system relations.
- The quantizer is omitted from the model, because its effect on the control signal of the DCO is negligible compared to the strong quantization behavior of the BB-PD.

The complete linearized discrete-time multirate AD-CDR model serves as the basis for the Linear Time-Variant (LTV) analysis of the AD-CDR. This analysis captures the spectral aliasing effects associated with the multirate operation of the AD-CDR. It accounts for the coexistence of the two clock domains ($f_{clk}$ and $f_{dig}$) as well as the primary noise sources per clock domain: the phase noise of the input data $\phi_{in}^*$, the linearization error of the BB-PD $\phi_q^*$ and the input-referred phase noise of the DCO $\phi_{dco}^*$. As a summary, all AD-CDR model parameters in Fig. 3.19(b) are listed in Table 3.1.

| Symbol | Definition |
|--------|------------|
| $\omega$ | The discrete-time angular frequency that defines the $f_{clk}$ clock domain, $\omega \in [-\pi,\pi]$. |
| $\omega_{dig}$ | The discrete-time angular frequency that defines the $f_{dig}$ clock domain, $\omega_{dig} = N\omega$. |
| $\phi_{in}^*(\omega)$ | The spectrum of the $f_{clk}$-rate sampled input phase. |
| $\phi_e^*(\omega)$ | The spectrum of the $f_{clk}$-rate sampled phase error. |
| $\phi_q^*(\omega)$ | The spectrum of the $f_{clk}$-rate sampled linearization error of the BB-PD. The standard deviation $\sigma_q$ is defined by Eq. (3.10). |
| $\widehat{TW}(\omega)$ | The spectrum of the $f_{clk}$-rate sampled output signal of the BB-PD. |
| $\widetilde{TW}(\omega_{dig})$ | The spectrum of the $f_{dig}$-rate downsampled output signal of the BB-PD. |
| $\phi^*_{dco}(\omega)$ | The spectrum of the input-referred $f_{clk}$-rate sampled phase noise of the DCO. |
| $\phi_{out,n}^*(\omega)$ | The spectrum of the $f_{clk}$-rate sampled output phase of the AD-CDR. |
| $K_n$ | The RIDF gain of the BB-PD, defined by Eq. (3.9). |
| $H_{DLF}(z)$ | The $f_{dig}$-rate signal transfer function of the DLF, defined by $H_{DLF}(z) = K_p \cdot z^{-D_{K_p}} + K_i \cdot \frac{z^{-D_{K_i}}}{1 - z^{-1}}$ (3.48), where $K_p$ and $K_i$ are the gains of the proportional and integral path, respectively, and $D_{K_p}$ and $D_{K_i}$ are the corresponding delays. |
| $H_N(z)$ | The $f_{clk}$-rate transfer function of the ZOH filter that interpolates the $f_{dig}$-rate output of the DLF, defined by Eq. (3.47). |
| $H_{DCO}(z)$ | The $f_{clk}$-rate transfer function of the DCO, defined by Eq. (3.45). |

Table 3.1: The model parameters of the complete linearized discrete-time multirate AD-CDR model shown in Fig. 3.19(b).
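In the time domain, Eq. (3.48) corresponds to a delayed proportional term plus a delayed accumulator: $y[m] = K_p\,x[m-D_{K_p}] + w[m]$ with $w[m] = w[m-1] + K_i\,x[m-D_{K_i}]$. The Python sketch below implements that difference equation at the $f_{dig}$ rate. Using FIFO queues as the delay lines is an implementation choice, and the function name `dlf` is hypothetical.

```python
from collections import deque
from typing import Iterable, List


def dlf(x: Iterable[float], kp: float, ki: float, d_kp: int, d_ki: int) -> List[float]:
    """f_dig-rate DLF of Eq. (3.48): y[m] = Kp*x[m-D_Kp] + w[m], w[m] = w[m-1] + Ki*x[m-D_Ki]."""
    p_delay = deque([0.0] * d_kp)   # FIFO realizing z^{-D_Kp}
    i_delay = deque([0.0] * d_ki)   # FIFO realizing z^{-D_Ki}
    w, y = 0.0, []
    for sample in x:
        p_delay.append(sample)
        i_delay.append(sample)
        w += ki * i_delay.popleft()            # integral path: Ki z^{-D_Ki} / (1 - z^{-1})
        y.append(kp * p_delay.popleft() + w)   # proportional path: Kp z^{-D_Kp}
    return y
```

For a unit impulse, the proportional path contributes a single pulse of height $K_p$ after $D_{K_p}$ cycles. The integral path contributes a step of height $K_i$ from cycle $D_{K_i}$ onwards.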

#### 3.5.4 LTV Analysis of Subsampled AD-CDR

The discrete-time multirate model of Fig. 3.19(b) has two major issues that complicate the jitter analysis:

- The BB-PD is highly non-linear, and describing functions are needed to allow a pseudo-linear analysis.
- An LTV analysis is required to capture all spectral aliasing effects associated with the actual digital multirate operation [24].

The issue of non-linearity has already been addressed in Sections 3.1-3.4. To deal with the issue of subsampling, our jitter analysis of the AD-CDR is based on the LTV analysis of an All-Digital Phase-Locked Loop (AD-PLL) discussed in [24].

It is, however, impractical to solve both issues analytically at the same time, because this would severely increase the complexity and calculation time. Therefore, the describing function gain $K_n$ of the RIDF model (for the non-linearity) is assumed to be fixed and known when the analysis of the multirate AD-CDR system is performed.

The describing function gain  $K_n$  will be estimated from simulation results and combined with the Linear Time-Variant analysis described below. As a result, the jitter contributions from different noise sources in the AD-CDR can be determined.

Similar to [24], a frequency-domain approach is followed for the LTV analysis of the AD-CDR, using the discrete-time multirate model of Fig. 3.19(b). The full calculation procedure is given in Appendix A; the results are summarized here.

From the LTV analysis, the power spectral density of the AD-CDR output phase for uncorrelated noise sources is obtained as:

$$S_{\phi_{out,n}^*}(\omega) \simeq S_{\phi_{out,lti}^*}(\omega) - |H_{alias}(\omega)|^2 \sum_{n=1}^{N-1} \left| 1 + T\left(\omega - n\frac{2\pi}{N}\right) \right|^2 S_{\phi_{out,lti}^*}\left(\omega - n\frac{2\pi}{N}\right) \tag{3.49}$$

where  $S_{\phi_{out,lti}^*}(\omega)$  is the output phase spectrum that is predicted after a traditional Linear Time-Invariant (LTI) analysis:

$$S_{\phi_{out,lti}^*}(\omega) = |H_{n,dco}(\omega)|^2 S_{\phi_{DCO}^*}(\omega) + N \left| \frac{H_{alias}(\omega)}{K_n} \right|^2 S_{\phi_q^*}(\omega) + N |H_{alias}(\omega)|^2 S_{\phi_{in}^*}(\omega) \tag{3.50}$$

The different transfer functions in these equations are defined as:

$$T(\omega) = H_{DCO}(\omega)H_N(\omega)H_{DLF}(N\omega)\frac{K_n}{N} \tag{3.51}$$

$$H_{n,dco}(\omega) = \frac{H_{DCO}(\omega)}{1 + T(\omega)} \tag{3.52}$$

$$H_{alias}(\omega) = \frac{T(\omega)}{1 + T(\omega)} \tag{3.53}$$

Eqs. (3.49)–(3.53) describe the discrete-time multi-rate model of the proposed subsampling AD-CDR. In Section 3.7.2, these equations are used to predict the contributions from the different noise sources to the output and the effect of subsampling in the AD-CDR.
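As an illustration, the Python sketch below evaluates the transfer functions of Eqs. (3.51)-(3.53) and the LTI spectrum of Eq. (3.50) on the unit circle. Several choices are assumptions:

- Phases are in radians, and $K_{DCO}$ is converted from Hz/LSB to rad/s per LSB.
- The parameter defaults follow Section 3.7.2, except that $D_{K_i} = 10$ is assumed.
- $K_n = 10\ \text{rad}^{-1}$ is a placeholder; in the text, $K_n$ is estimated from simulation.
- The function names are hypothetical.

The integral-path pole of $H_{DLF}(z^N)$ at $z^N = 1$ coincides with the zeros of $H_N(z)$, so the product $H_N(z)H_{DLF}(z^N)$ is formed analytically to avoid a numerical $0 \cdot \infty$.

```python
import cmath
import math


def loop_functions(omega, n_sub=16, f_clk=25e9, k_dco_hz=1.8e6, kp=5.0,
                   ki=2 ** -7, d_kp=3, d_ki=10, k_n=10.0):
    """T, H_alias and H_n,dco of Eqs. (3.51)-(3.53) at omega (rad/sample, f_clk rate).

    omega must not be a multiple of 2*pi (the DCO integrator has a pole at z = 1).
    """
    z = cmath.exp(1j * omega)
    t_clk = 1.0 / f_clk
    k_dco = 2.0 * math.pi * k_dco_hz                # rad/s per LSB (phase in rad)
    h_dco = t_clk * k_dco / (z - 1.0)               # H_DCO(z) from Eq. (3.45)
    h_n = sum(z ** (-k) for k in range(n_sub))      # H_N(z) of Eq. (3.47) as a sum
    # H_N(z) * H_DLF(z^N) using Eq. (3.48); the integral-path pole at z^N = 1 is
    # cancelled analytically via H_N(z) / (1 - z^-N) = 1 / (1 - z^-1).
    h_n_dlf = (kp * h_n * z ** (-n_sub * d_kp)
               + ki * z ** (-n_sub * d_ki) / (1.0 - 1.0 / z))
    t = h_dco * h_n_dlf * k_n / n_sub               # Eq. (3.51)
    return t, t / (1.0 + t), h_dco / (1.0 + t)      # T, H_alias (3.53), H_n,dco (3.52)


def s_out_lti(omega, s_dco, s_q, s_in, n_sub=16, k_n=10.0, **loop_params):
    """Eq. (3.50) for given PSD values of the three noise sources at omega."""
    _, h_alias, h_n_dco = loop_functions(omega, n_sub=n_sub, k_n=k_n, **loop_params)
    return (abs(h_n_dco) ** 2 * s_dco
            + n_sub * abs(h_alias / k_n) ** 2 * s_q
            + n_sub * abs(h_alias) ** 2 * s_in)
```

At low offset frequencies, $|H_{alias}| \to 1$, so the input-referred noise appears at the output multiplied by $N$. This is the folding gain discussed in Section 3.5.2.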

# 3.6 CID in Subsampled AD-CDR

#### 3.6.1 Idle Time

The AD-CDR should be able to deal with data sequences for which the BB-PD receives no data edges for many clock cycles, and hence generates neither *Early* nor *Late* signals. Without subsampling, this occurs when the input data contains a long sequence of CIDs. In that case, the output of the phase detector is stuck at zero and the feedback is broken, so the CDR temporarily operates in open loop. The oscillator then runs freely, and any frequency difference between the input data rate and the recovered clock frequency causes a linear increase or decrease of the phase difference over time. In a prolonged open-loop situation, this phase drift will exceed a unit interval. The AD-CDR then loses lock, and the CDR operation is disrupted. For an input sequence of $k$ CIDs, the idle time of the CDR without subsampling is given by:

$$T_{idle} = \frac{k}{f_{data}} \tag{3.54}$$

where  $f_{\text{data}}$  corresponds to the input data rate.

With subsampling, the loop filter operates at a lower frequency, and the total idle time $T_{idle}$ corresponding to a CID input sequence of length $k$ becomes:

$$T_{idle} = \frac{N \cdot \left\lceil \frac{k}{N} \right\rceil}{f_{data}} \tag{3.55}$$

This means that the idle time due to $k$ CID input bits with subsampling is almost equal to that without subsampling: it is at most $N-1$ bit periods longer.

However, regardless of the CIDs in the full-rate input data, the phase detector output can, after subsampling, contain a long idle sequence of length $l$ without any *Early* or *Late* pulse. This occurs, for example, when the popular Pseudo Random Bit Sequence of length $2^{31} - 1$ (PRBS31) is applied.

A PRBS31 overly stresses the robustness against long idle sequences. Normally, bit shuffling or line coding is incorporated to avoid long idle sequences, so these sequences may not represent a realistic usage scenario. Nevertheless, we decided that our CDR should be fully functional with this test sequence. A PRBS31 sequence contains runs of up to $k = 31$ consecutive identical digits. Without subsampling, this corresponds to an idle time of:

$$T_{idle} = \frac{k}{f_{data}} = \frac{31 \text{ bit}}{f_{data}} \tag{3.56}$$

Moreover, PRBS sequences have a useful property. If a PRBS is subsampled by a power of two, such as the subsample factors N = 16 and N = 32 used in this work, the result is the same PRBS with another seed, which only amounts to a time delay. Thus, a PRBS31 sequence that is subsampled by N results in the same PRBS31 sequence, but now running at $\frac{f_{data}}{N}$. Again, this PRBS31 contains at most 31 consecutive identical bits.

In our AD-CDR, however, it is not the PRBS itself that is subsampled, but the PD output. This PD output is derived from the edge information in the PRBS. That edge information corresponds to a differentiated version of the PRBS, which we denote as a Differentiated Pseudo Random Bit Sequence (DPRBS). The DPRBS31 has the same generator polynomial as the PRBS31. As a result, subsampling the DPRBS31 also yields the same PRBS31 sequence, but delayed.
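Enumerating a full PRBS31 period ($2^{31}-1$ bits) is impractical in a short sketch. The Python example below therefore checks the same properties on a PRBS7 with polynomial $x^7 + x^6 + 1$ as a small stand-in. It checks that decimation by powers of two and differentiation (XOR of consecutive bits) both return a cyclic shift of the original sequence. It also checks that the longest run of identical bits equals the register length. All function names are hypothetical.

```python
from typing import List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """PRBS7 bits from s[m] = s[m-1] XOR s[m-7] (polynomial x^7 + x^6 + 1)."""
    s = [(seed >> i) & 1 for i in range(7)]   # any non-zero 7-bit seed
    while len(s) < n_bits:
        s.append(s[-1] ^ s[-7])
    return s[:n_bits]


def decimate(seq: List[int], n: int) -> List[int]:
    """Keep every n-th bit of one (cyclic) period."""
    return [seq[(n * i) % len(seq)] for i in range(len(seq))]


def differentiate(seq: List[int]) -> List[int]:
    """Edge information: 1 where a bit differs from the previous one (cyclic)."""
    return [seq[i] ^ seq[i - 1] for i in range(len(seq))]


def is_cyclic_shift(a: List[int], b: List[int]) -> bool:
    """True if a equals b rotated by some number of positions."""
    return len(a) == len(b) and any(a == b[k:] + b[:k] for k in range(len(b)))


def longest_run(seq: List[int]) -> int:
    """Longest run of identical bits within one cyclic period."""
    best = cur = 1
    doubled = seq + seq                       # covers runs that wrap around
    for prev, bit in zip(doubled, doubled[1:]):
        cur = cur + 1 if bit == prev else 1
        best = max(best, cur)
    return min(best, len(seq))


PERIOD = 2 ** 7 - 1
s = prbs7(PERIOD)
```

With these helpers, `is_cyclic_shift(decimate(differentiate(s), 16), s)` holds. This is the PRBS7 analogue of subsampling the DPRBS31 by N = 16.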

The subsampled output of the PD will therefore contain a run of $l = 31$ consecutive idle ('stuck at 0') values. This corresponds to an idle time of:

$$T_{idle,subsampled} = \frac{N \cdot l}{f_{data}} = N \frac{31 \text{ bit}}{f_{data}} \tag{3.57}$$

Hence, in this test scenario, the idle time of the subsampled CDR is N times larger than for the case without subsampling.
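The idle times of Eqs. (3.54), (3.55) and (3.57) can be compared in a short Python sketch. It uses the 25 Gb/s data rate and N = 16 from Section 3.7; the function names are hypothetical.

```python
import math


def t_idle_no_subsampling(k: int, f_data: float) -> float:
    """Eq. (3.54): idle time caused by k CIDs without subsampling."""
    return k / f_data


def t_idle_cid_subsampled(k: int, n_sub: int, f_data: float) -> float:
    """Eq. (3.55): idle time caused by the same k CIDs with subsample factor N."""
    return n_sub * math.ceil(k / n_sub) / f_data


def t_idle_pd_subsampled(l_idle: int, n_sub: int, f_data: float) -> float:
    """Eq. (3.57): idle time for l idle values in the subsampled PD output."""
    return n_sub * l_idle / f_data


F_DATA, N_SUB = 25e9, 16          # 25 Gb/s and N = 16, as used in Section 3.7
for label, t in [("k = 31, no subsampling", t_idle_no_subsampling(31, F_DATA)),
                 ("k = 31, N = 16", t_idle_cid_subsampled(31, N_SUB, F_DATA)),
                 ("l = 31, N = 16", t_idle_pd_subsampled(31, N_SUB, F_DATA)),
                 ("l = 100, N = 16", t_idle_pd_subsampled(100, N_SUB, F_DATA))]:
    print(f"{label}: {t * 1e9:.2f} ns")
```

The four cases give 1.24 ns, 1.28 ns, 19.84 ns and 64 ns. The last value is the idle time used in the CID simulations of Section 3.7.3.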

#### 3.6.2 Phase Drift

There are three components that cause a phase drift during the idle time of the AD-CDR: the finite DCO resolution, the accumulated noise in the DLF and the phase noise of the DCO.

Let us first focus on the systematic phase drift between the DCO output and the ideal clock corresponding to the input data which occurs during an idle sequence. This phase drift  $\Delta \phi$  is due to the frequency difference  $\Delta f$  between the DCO frequency and the frequency of the input data. Any frequency difference will lead to a linear increase or decrease of the phase difference over time, and hence for a certain idle time:

$$\Delta \phi = 2\pi \Delta f T_{idle} \tag{3.58}$$

To maintain correct operation under all circumstances, the absolute value of the phase drift should therefore be less than $\pi$, which leads to:

$$|\Delta f_{worst\,case}| \le \frac{1}{2T_{idle}} \tag{3.59}$$

Clearly, to tolerate a long idle sequence, the DCO must have a sufficiently high resolution such that the quantization error is small. The DCO frequency will then be closer to the desired input data frequency. When the loop temporarily opens due to an idle sequence, the corresponding phase drift will remain acceptable.
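As a worked example of Eqs. (3.58) and (3.59), the Python sketch below uses the 64 ns idle time of Section 3.7.3 and the 1.8 MHz/LSB DCO gain of Section 3.7.2. It assumes a residual frequency error of 0.5 LSB, the worst case used in those simulations. The function names are hypothetical.

```python
import math


def phase_drift(delta_f: float, t_idle: float) -> float:
    """Eq. (3.58): systematic phase drift (rad) over an idle time."""
    return 2.0 * math.pi * delta_f * t_idle


def max_frequency_error(t_idle: float) -> float:
    """Eq. (3.59): largest |delta_f| (Hz) for which |delta_phi| stays below pi."""
    return 1.0 / (2.0 * t_idle)


T_IDLE = 64e-9       # l = 100 idle values at f_dig = 1.5625 GHz (Section 3.7.3)
K_DCO = 1.8e6        # DCO gain in Hz/LSB (Section 3.7.2)

df_max = max_frequency_error(T_IDLE)             # 7.8125 MHz
drift = phase_drift(0.5 * K_DCO, T_IDLE)         # worst-case offset of 0.5 LSB
print(f"|df| limit: {df_max / 1e6:.4f} MHz, drift at 0.5 LSB: {drift:.3f} rad")
```

Under these assumptions, the systematic drift is about 0.36 rad, well inside the $\pm\pi/2$ error-free range quoted in Section 3.7.3. The DLF noise and the DCO random walk discussed next add to this figure.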

In addition, all the noise contributions accumulated in the integrating branch of the DLF can contribute to the frequency error when the AD-CDR receives a long CID sequence.

The last effect that lowers the maximum tolerable idle sequence is the random walk of the phase of the recovered clock during open-loop operation [25]. This random phase drift has the same effect as the systematic phase drift. Lowering the phase noise of the DCO reduces this random walk.
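The size of this random walk can be estimated with a Python sketch under the following assumptions:

- The free-running DCO phase noise follows a pure $1/f^2$ slope (white frequency noise), anchored at the $-95\,\mathrm{dBc/Hz}$ at 10 MHz setting of Section 3.7.2.
- $\mathcal{L}(f) = S_\phi(f)/2$, with $S_\phi$ the one-sided phase noise spectrum.

With $S_\phi(f) = h/f^2$, the variance of the phase change over $T_{idle}$ is $4\int_0^\infty S_\phi(f)\sin^2(\pi f T_{idle})\,df = 2\pi^2 h T_{idle}$. The function name is hypothetical.

```python
import math


def random_walk_phase_std(l_dbc_hz: float, f_offset: float, t_idle: float) -> float:
    """Std. dev. (rad) of the free-running phase drift accumulated over t_idle.

    Assumes pure 1/f^2 phase noise: L(f) = L(f_offset) * (f_offset / f)^2, so that
    S_phi(f) = 2 L(f) = h / f^2 (one-sided) and Var = 4 * int S_phi sin^2(pi f t) df
    = 2 * pi^2 * h * t_idle.
    """
    h = 2.0 * 10.0 ** (l_dbc_hz / 10.0) * f_offset ** 2
    return math.sqrt(2.0 * math.pi ** 2 * h * t_idle)


sigma = random_walk_phase_std(-95.0, 10e6, 64e-9)   # DCO setting of Section 3.7.2
print(f"random-walk phase std after 64 ns: {sigma:.3f} rad")
```

Under these assumptions, the standard deviation of the drift grows with the square root of the idle time. An N times longer idle sequence therefore increases it by a factor $\sqrt{N}$.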

In this work, the trade-off between the subsample factor, the phase noise characteristic and the DCO frequency resolution was primarily studied through simulation (see Section 3.7).

# 3.7 Simulations of Subsampled AD-CDR

#### 3.7.1 Model

To validate the theory of the LTV analysis of the AD-CDR and to ensure the correct operation of the AD-CDR with long CID sequences, several simulations were performed.

The simulations are based on the phase domain model of Fig. 3.19(a) and have been performed using Simulink/MATLAB. Fig. 3.21 gives an overview of the complete testbench, while Figs. 3.22(a) and (b) show the details of the BB-PD block and the Digital Loop Filter block. In these figures, red indicates the blocks that operate at the data rate $f_{data}$, while green represents the signals and blocks running at the subsampled rate $f_{dig}$.

As shown in Fig. 3.21, the input of the AD-CDR is generated as the sum of a random jitter source (random number) and a frequency offset (ramp). For the jitter analysis, only the random jitter source is used. For the CID robustness simulations, the random jitter source is combined with the frequency offset to mimic the worst-case situation.

The block diagram of the BB-PD (Fig. 3.22(a)) shows that every data period $T_{data}$ (= 1/(25 GHz) = 40 ps), the phase at the input ('data') is compared to the phase of the recovered clock ('RCLK'). A uniform random process in the interval [0, 1], generated by 'random occurrence of data transitions', is used to simulate the probability of a data transition. If this random process outputs a value lower than 0.5, the model behaves as if a data transition has occurred, and the output of the BB-PD is equal to the sign of the phase error. Otherwise, the output of the phase detector is set to zero. Therefore, only 50% of the *Early/Late* signals are nonzero, as would be the case in a real CDR due to the 50% probability of receiving a data transition. Additional functionality is added to the block diagram of the BB-PD to simulate long CID sequences: in this test case, the output of the BB-PD is fixed to zero for a set length *l*.

The *Early/Late* signals are sent to the digital loop filter shown in Fig. 3.22(b). This digital loop filter in the simulation model operates at 1/N times the data rate. Consequently, the *Early/Late* signal is subsampled by a factor N before it is further processed. The digital loop filter contains a proportional path and an integral path with a quantizer and saturation block. Both paths are added and constitute the driving signal of the DCO.

To incorporate non-idealities, noise is added to the control signal of the DCO. The resulting signal is amplified by the gain of the DCO $K_{dco}$ and integrated by the DCO operation. The output phase of the recovered clock is fed back to the phase detector (Fig. 3.21).
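The Simulink testbench itself is not reproduced here. As a rough Python analogue, the sketch below steps a phase-domain model of the same loop once per data period. The loop consists of a random-transition BB-PD, decimation by N, a DLF with delayed proportional and integral paths, a ZOH and an integrating DCO. It can optionally force an idle run of `idle_len` subsampled values. The sketch makes several assumptions:

- Phases are in radians, and the integral path has no quantizer or saturation.
- No DCO control noise is added.
- The integral-path delay is $D_{K_i} = 10$, and the default frequency offset is 0.9 MHz (0.5 LSB).
- The name `simulate_adcdr` and its parameters are hypothetical.

```python
import math
import random
from collections import deque
from typing import List, Optional


def simulate_adcdr(n_data: int = 100_000, n_sub: int = 16, f_data: float = 25e9,
                   k_dco: float = 1.8e6, kp: float = 5.0, ki: float = 2 ** -7,
                   d_kp: int = 3, d_ki: int = 10, sigma_in: float = 0.06,
                   f_offset: float = 0.9e6, p_transition: float = 0.5,
                   idle_start: Optional[int] = None, idle_len: int = 0,
                   seed: int = 1) -> List[float]:
    """Phase-domain sketch of the subsampled AD-CDR; returns phi_e (rad) per data period.

    idle_start/idle_len are in f_dig cycles and force the BB-PD output to zero,
    mimicking an idle sequence of length l = idle_len after subsampling.
    """
    rng = random.Random(seed)
    t_data = 1.0 / f_data
    rad_per_lsb = 2.0 * math.pi * k_dco * t_data   # DCO phase step per data period per LSB
    p_line = deque([0.0] * d_kp)                    # z^{-D_Kp} delay line
    i_line = deque([0.0] * d_ki)                    # z^{-D_Ki} delay line
    acc = ctrl = phi_out = 0.0
    errors = []
    for n in range(n_data):
        # input phase: Gaussian jitter plus the ramp caused by a frequency offset
        phi_in = rng.gauss(0.0, sigma_in) + 2.0 * math.pi * f_offset * n * t_data
        phi_e = phi_in - phi_out
        errors.append(phi_e)
        m = n // n_sub                               # index of the current f_dig cycle
        idle = idle_start is not None and idle_start <= m < idle_start + idle_len
        edge = rng.random() < p_transition and not idle
        pd = (1.0 if phi_e >= 0.0 else -1.0) if edge else 0.0
        if n % n_sub == 0:                           # decimation: keep every N-th PD output
            p_line.append(pd)
            i_line.append(pd)
            acc += ki * i_line.popleft()             # integral path accumulator
            ctrl = kp * p_line.popleft() + acc       # ZOH: held for N data periods
        phi_out += rad_per_lsb * ctrl                # DCO integrates its frequency deviation
    return errors
```

For instance, `simulate_adcdr(idle_start=4000, idle_len=100)` inserts an idle run of $l = 100$ subsampled values, analogous to the test of Section 3.7.3. One can then check whether the settled phase error stays within $\pm\pi/2$. This sketch is not a reproduction of Fig. 3.24.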



Figure 3.21: The simulation model of the proposed AD-CDR. The red and green color indicate the  $f_{data}$ -rate and  $f_{dig}$ -rate operation, respectively.



Figure 3.22: Details of the simulation model of the proposed AD-CDR in Fig. 3.21 with (a) the BB-PD building block and (b) the DLF building block. The red and green color indicate the  $f_{data}$ -rate and  $f_{dig}$ -rate operation, respectively.

#### 3.7.2 Phase Noise Simulations

First, we present the phase noise simulations with the Simulink behavioral model together with the results of the LTV analysis. The simulations discussed here use component values extracted from the measurements discussed in Chapter 6. That chapter also compares the simulated phase noise to the measurement results.

The standard deviation of the input noise is set to 60 mrad. The input-referred noise of the DCO is set such that the free-running DCO has a phase noise of $-95\,\mathrm{dBc/Hz}$ at 10 MHz. The gains of the proportional path $K_p$ and of the integral path $K_i$ are set to 5 and $2^{-7}$, respectively. The delays in the proportional and integral path are $D_{K_p} = 3$ and $D_{K_i} = 10$, respectively.<sup>4</sup> Finally, the gain of the DCO $K_{dco}$ is set to 1.8 MHz/LSB.

From the simulation results, the standard deviation of the phase error $\sigma_e$ is calculated numerically. Using Eq. (3.9), the describing function gain $K_n$ is estimated. This value is then plugged into the LTV analysis to determine the contributions of the different noise sources in the AD-CDR.

Figs. 3.23(a) and (b) show the simulation results for the presented AD-CDR with subsample factors N = 16 and N = 32, respectively. The figures show the simulated spectrum of the output phase together with four contributions: the DCO noise, the input noise, the linearization error of the BB-PD and aliasing (LTV). Clearly, the LTI contribution, which comprises the DCO, input and BB-PD noise components, completely determines the output phase noise of the AD-CDR. Furthermore, the total output noise calculated with the LTV analysis closely matches the simulated result. Note that the steep downward spikes in the calculated total output noise are numerical glitches. Comparing both figures also shows that increasing the subsample factor raises the in-band noise of the AD-CDR.

#### 3.7.3 Robustness Against CID

We test whether the CDR can tolerate idle sequences (after subsampling) of length $l = 100$. For this test, a 'Pulse Generator' switches the phase detector from normal operation to an operation where its output is fixed to zero for *l* bits (Fig. 3.22(a)). Additionally, the CDR is tested in a worst-case scenario, where the frequency offset equals its maximum value of $0.5 \times$ the Least Significant Bit (LSB) of the DCO.

Figure 3.23: Phase noise simulations with the different noise contributions derived from the LTV analysis with subsample factors: (a) N = 16 and (b) N = 32.

Figure 3.24: An example of the simulation results for the case where the subsampled PD output consists of l = 100 idle values (= 64 ns idle time).

<sup>4</sup>Note that here, an extra delay was added to the latency of the DLF to include and simulate the total propagation delay in the AD-CDR circuit.

The phase difference between the DCO and the input is observed and should remain within the range given by the bathtub curve of the phase detector (Chapter 6). Fig. 3.24 gives an example of the simulation results in which the subsampled PD output consists of $l = 100$ idle values, so the output of the phase detector (*Early/Late*) remains zero. This idle sequence lasts:

$$T_{idle} = \frac{100 \text{ bits}}{1.5625 \cdot 10^9 \text{ bits/s}} = 64 \text{ ns}$$

Fig. 3.24 also plots the phase errors during the idle sequences. These phase errors do not exceed $\pm \pi \cdot \frac{20\,\text{ps}}{40\,\text{ps}} = \pm\frac{\pi}{2}$, the range in which the phase detectors operate error-free. Therefore, the CDR operates error-free and is able to withstand this idle sequence.

Our simulation results thus confirm the ability to withstand an idle sequence of $l = 100$ subsampled bits. If this subsampled idle sequence originated from a full-rate CID sequence, it would correspond to $k = N \cdot l = 1600$ identical input bits.

# 3.8 Discussion

One of the major challenges for the design and implementation of an AD-CDR (see Section 2.5.3) is the complicated analysis of the non-linear and subsampled system.

This chapter addresses this challenge by first investigating the stability of a Bang-Bang CDR. Until now, authors have always assumed that there is enough noise in a Bang-Bang CDR to prevent limit cycles. In this work, a pseudo-linear analysis based on describing functions is used to investigate this assumption. The analysis allows one to calculate the worst-case amplitude of a limit cycle. It also determines the minimum amount of noise necessary to avoid limit cycling as a function of the different CDR loop parameters. For this purpose, the simple analytical approximations of Eqs. (3.25), (3.28) and (3.31) were derived, which can be used for a fast assessment of the limit cycle sensitivity.

Based on our analysis, it appears that in most CDR systems, there is sufficient noise present to avoid limit cycling. Even in the case that the input jitter level is too small to avoid limit cycling, it is still likely that the amplitude of the limit cycle will be small enough to allow a correct data recovery operation. However, in this case the recovered clock will contain significant jitter peaking, which may be unacceptable. The most dangerous situation occurs when the CDR loop filter has a large delay and a high linear gain.

Secondly, the analysis method from [2], which predicts the key jitter characteristics of a CDR, is summarized. This analysis is extended in this work to incorporate the subsampling operation of an AD-CDR. An LTV analysis is used to capture the spectral aliasing effects that linear time-invariant models such as in [2] do not capture. Such an LTV analysis has not previously been reported in the literature for an (All-Digital) CDR. Here, it is used to determine the effect of subsampling and the contributions of the different noise sources to the output phase noise of the AD-CDR.

Finally, the robustness against long CID sequences is investigated and verified using a Simulink model. The simulation results show that the AD-CDR is able to withstand an idle sequence of l = 100 subsampled bits.

After this analysis and prediction of the behavior of the AD-CDR, we can proceed with the discussion of the design of the AD-CDR.

# References

- [1] Behzad Razavi, Design of Integrated Circuits for Optical Communications, 2nd ed. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2012.
- [2] Myeong-Jae Park and Jaeha Kim, "Pseudo-Linear Analysis of Bang-Bang Controlled Timing Circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 6, pp. 1381–1394, jun 2013.
- [3] Byungjin Chun and Michael P. Kennedy, "Statistical Properties of First-Order Bang-Bang PLL With Nonzero Loop Delay," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 55, no. 10, pp. 1016–1020, oct 2008.
- [4] B. Razavi, "Challenges in the design high-speed clock and data recovery circuits," *IEEE Communications Magazine*, vol. 40, no. 8, pp. 94–101, aug 2002.
- [5] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with a half-rate binary phase/frequency detector," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 1, pp. 13–21, jan 2003.
- [6] J. Lee and B. Razavi, "A 40-Gb/s clock and data recovery circuit in 0.18-μm CMOS technology," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 12, pp. 2181–2190, dec 2003.
- [7] William F. Egan, Advanced Frequency Synthesis by Phase Lock. Hoboken, NJ, USA: John Wiley & Sons, Inc., jul 2011.
- [8] Arthur Gelb and Wallace Vander Velde, Multiple Input Describing Functions and Nonlinear System Design. New York: McGraw-Hill, 1968.
- [9] M. Zanuso, D. Tasca, S. Levantino, A. Donadel, C. Samori, and A.L. Lacaita, "Noise Analysis and Minimization in Bang-Bang Digital PLLs," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 11, pp. 835–839, nov 2009.

- [10] N. Da Dalt, "A Design-Oriented Study of the Nonlinear Dynamics of Digital Bang-Bang PLLs," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 52, no. 1, pp. 21–31, jan 2005.
- [11] Raymond Flynn and Orla Feely, "Limit cycles in Digital Bang-Bang PLLs," in 2007 18th European Conference on Circuit Theory and Design, aug 2007, pp. 731–734.
- [12] N. Da Dalt, "Linearized Analysis of a Digital Bang-Bang PLL and Its Validity Limits Applied to Jitter Transfer and Jitter Generation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, no. 11, pp. 3663–3675, dec 2008.
- [13] Dan Liu, Philipp Basedau, Markus Helfenstein, James Wei, Thomas Burger, and Yangjian Chen, "A Frequency-Based Model for Limit Cycle and Spur Predictions in Bang-Bang All Digital PLL," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 59, no. 6, pp. 1205–1214, jun 2012.
- [14] Giovanni Marucci, Salvatore Levantino, Paolo Maffezzoni, and Carlo Samori, "Analysis and Design of Low-Jitter Digital Bang-Bang Phase-Locked Loops," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 1, pp. 26–36, jan 2014.
- [15] Jae-Yong Ihm, "Stability Analysis of Bang-Bang Phase-Locked Loops for Clock and Data Recovery Systems," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 60, no. 1, pp. 1–5, jan 2013.
- [16] A. Demir, A. Mehrotra, and J. Roychowdhury, "Phase noise in oscillators: a unifying theory and numerical methods for characterization," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, vol. 47, no. 5, pp. 655–674, may 2000.
- [17] Arno Vyncke, "A low power, multi-rate clock-and-data recovery circuit and MAC preprocessor for 40 Gbit/s cascaded bit-interleaving passive optical networks," Ph.D. dissertation, Ghent University, 2016.
- [18] FP7 Grant Agreement 318137, "The DIStributed Core for unlimited bandwidth supply for all Users and Services," 2017. Available on URL: http://www.discus-fp7.eu/
- [19] T.H. Lee and A. Hajimiri, "Oscillator phase noise: a tutorial," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 3, pp. 326–336, mar 2000.

- [20] Richard Walker, "Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems," in *Phase-Locking in High-Performance Systems: From Devices to Architectures*, Behzad Razavi, Ed. Wiley-IEEE Press, 2003, pp. 34–45.
- [21] Jri Lee, K.S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1571–1580, sep 2004.
- [22] Youngdon Choi, Deog-Kyoon Jeong, and Wonchan Kim, "Jitter transfer analysis of tracked oversampling techniques for multigigabit clock and data recovery," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 50, no. 11, pp. 775–783, nov 2003.
- [23] Jeroen De Maeyer, "Efficient architectures for A/D-converters in discrete and continuous time," Ph.D. dissertation, Ghent University, 2006.
- [24] Ioannis L. Syllaios and Poras T. Balsara, "Linear Time-Variant Modeling and Analysis of All-Digital Phase-Locked Loops," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 59, no. 11, pp. 2495–2506, nov 2012.
- [25] A.A. Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 8, pp. 1803–1816, aug 2006.

# 4 AD-CDR Architecture and Design

This chapter gives an overview of the design of the proposed All-Digital Clock and Data Recovery (AD-CDR) circuit. It starts with the system architecture, followed by an in-depth discussion of the most critical building blocks. This includes an elaborate comparison between the conventional Alexander Phase Detector (PD) and the newly proposed Inverse Alexander PD.

# 4.1 System Architecture

The architecture of the AD-CDR is shown in Fig. 4.1. As discussed in Section 2.6, it comprises a Bang-Bang Phase Detector (BB-PD), a subsampling block, a Digital Loop Filter (DLF) and a Digitally Controlled Oscillator (DCO).

To relax the circuit requirements, the DCO operates at quarter rate and outputs 8 equally spaced clock phases. These clock phases are sent to the BB-PD, where they are used to sample the data. For a 25 Gb/s data input, a Bang-Bang Phase Detector is the preferred choice, because this type of PD offers a simple design and good phase adjustment, and can operate at high speeds [1]. Additionally, a BB-PD has the advantage that its output is already digital, which makes it well suited to drive the Digital Loop Filter. However, the operating frequency of the PD is too high to allow direct synthesis of the loop filter [2]. To reduce its operating frequency, the phase information is subsampled by a factor N = 16, indicated with $\downarrow N$ in Fig. 4.1. This means that the output of the BB-PD is sent to the DLF only once every N data periods. The subsampled signal is filtered by the DLF, and the resulting signal controls the DCO such that the phase error is reduced. Furthermore, because the input data is sampled with these quarter-rate clock signals, the output data is automatically parallelized, which simplifies further processing.
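A minimal Python sketch of the $\downarrow N$ block, assuming (for illustration only) that the BB-PD produces one decision per data period, encoded as +1 (*Early*), -1 (*Late*) or 0 (no data edge). The function name `subsample` and its `offset` parameter are hypothetical. Decisions that fall between the kept positions never reach the DLF:

```python
def subsample(pd_stream, n=16, offset=0):
    """Forward one out of every n phase-detector decisions to the DLF.

    pd_stream: one decision per data period, +1 (Early), -1 (Late) or 0 (no edge).
    offset:    which of the n positions is kept (unrelated to the data pattern).
    """
    return pd_stream[offset::n]


stream = [0] * 64
stream[3] = +1   # Early decision in data period 3: dropped for offset 0
stream[16] = -1  # Late decision in data period 16: kept for offset 0
print(subsample(stream))  # [0, -1, 0, 0]
```

Because the kept position is not correlated with the data, which decisions survive depends only on `offset`, which is why the loop must tolerate long stretches without usable phase information.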



Figure 4.1: The system diagram of the AD-CDR.

# 4.2 Bang-Bang Phase Detector

In recent years, several new BB-PD topologies have been proposed [3, 4]. However, these topologies either have a high complexity or do not perform the data retiming and the phase detection in the same circuit. The latter requires an explicit decision circuit and intrinsically introduces skew between the two circuits, which is unfavorable for high-speed data communication applications [5]. Therefore, the Alexander PD topology [6], including variants such as the half-rate, quarter-rate, multilevel and majority-voting versions, is the most commonly used PD in high-speed designs with data rates above 10 Gb/s.

In this section, the Inverse Alexander PD is introduced. It improves on the well-established conventional Alexander PD without increasing the circuit complexity. This is verified by comparing the Bit-Error Rate (BER) performance of both PDs for several levels of duty-cycle distortion and several subsample factors.

#### 4.2.1 Comparison of Alexander and Inverse Alexander PD

#### **The Conventional Alexander PD**

The conventional Alexander phase detection is based on three successive data samples taken at twice the data rate. In a typical full-rate Clock and Data Recovery (CDR) circuit, this is done by sampling the data on both the rising and the falling edges of the full-rate recovered clock *Clk*. By comparing the three sampled values, the PD detects whether a data edge has occurred and whether this data edge occurs before or after the corresponding clock edge. A typical implementation, shown in Fig. 4.2(a), consists of 2 wideband input data samplers and 2 delaying flip-flops. The actual phase detection uses the 3 successive samples available at nodes $S_0$, $S_1$ and $S_2$.

To understand the operation, 3 possible waveforms are considered in Fig. 4.3. First, the ideal locking condition is shown in Fig. 4.3(a). In this case, the value of sample $S_1$ is undefined and, in practice, noise causes the PD to randomly produce an *Early* or a *Late* pulse. Fig. 4.3(b) shows the case where the clock edge leads the data edge (*Early*) and Fig. 4.3(c) shows the case where the clock edge lags the data edge (*Late*). In the absence of data transitions (not shown in the figure), the three samples $S_0$, $S_1$ and $S_2$ are equal and the XOR gates (Fig. 4.2(a)) set both the *Early* and the *Late* signals to zero. These relations are summarized as [5]:

| Output  | Condition                                   | Action                     |
|---------|---------------------------------------------|----------------------------|
| *Early* | $S_0 \oplus S_1 = 0$, $S_1 \oplus S_2 = 1$  | Clk frequency $\downarrow$ |
| *Late*  | $S_0 \oplus S_1 = 1$, $S_1 \oplus S_2 = 0$  | Clk frequency $\uparrow$   |
| Others  | $S_0 \oplus S_1 = S_1 \oplus S_2$           | Do not adjust clk          |

Fig. 4.3(a) shows that once the CDR has settled, the samples  $S_0$  and  $S_2$  correspond to two successive data output  $(D_{out})$  samples, while sample  $S_1$  occurs at the transition of the data.
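The decision table of the conventional Alexander PD maps directly onto two XOR gates. The Python sketch below assumes binary samples (0/1); the helper names `alexander_pd` and `loop_action`, and the sign convention of `loop_action` (+1 meaning "decrease the clock frequency"), are hypothetical choices for illustration:

```python
def alexander_pd(s0, s1, s2):
    """Conventional Alexander PD (Fig. 4.2(a)).

    s0 and s2 are data samples (rising clock edges), s1 is the edge sample
    (falling clock edge). Returns the raw (early, late) XOR outputs.
    """
    early = s1 ^ s2  # set whenever S1 != S2: the transition lies after the edge sample
    late = s0 ^ s1   # set whenever S0 != S1: the transition lies before the edge sample
    return early, late


def loop_action(early, late):
    """+1: decrease clock frequency, -1: increase it, 0: do not adjust."""
    return early - late  # neither or both active -> 0 (the 'Others' row)


# Clock early (Fig. 4.3(b)), clock late (Fig. 4.3(c)), no data transition
for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 1)]:
    e, l = alexander_pd(*samples)
    print(samples, (e, l), loop_action(e, l))
```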

#### The Inverse Alexander PD

The proposed Inverse Alexander PD is shown in Fig. 4.2(b). It has the same schematic as the Alexander PD, but the *Early* and *Late* signals are interchanged, which inverts the sign of the CDR loop:

| Output  | Condition                                   | Action                     |
|---------|---------------------------------------------|----------------------------|
| *Early* | $S_0 \oplus S_1 = 1$, $S_1 \oplus S_2 = 0$  | Clk frequency $\downarrow$ |
| *Late*  | $S_0 \oplus S_1 = 0$, $S_1 \oplus S_2 = 1$  | Clk frequency $\uparrow$   |
| Others  | $S_0 \oplus S_1 = S_1 \oplus S_2$           | Do not adjust clk          |






Figure 4.2: (a) The conventional Alexander PD and (b) the Inverse Alexander PD circuit.



Figure 4.3: Waveforms for the locking behavior of the Alexander PD: (a) Ideal locking condition with phase difference $\Delta \phi = 0.5$ UI; (b) *Early* condition; (c) *Late* condition.

The inversion of the loop sign causes the CDR to settle to a different equilibrium point. As shown in Fig. 4.4(a), the Inverse Alexander PD aligns the rising edges of the clock signal with the data edges. If the rising edge of the clock leads (is *Early*), the first sample, $S_0$, differs from the last two and the clock frequency must decrease (Fig. 4.4(b)). Conversely, if the rising edge of the clock lags (is *Late*), the last sample, $S_2$, differs from the first two and the clock frequency must increase (Fig. 4.4(c)). In lock, the middle sample, $S_1$, corresponds to the data sample $D_{out}$, while the other sample moments, $S_0$ and $S_2$, occur at the data transitions.



Figure 4.4: Waveforms for the locking behavior of the Inverse Alexander PD: (a) Ideal locking condition with phase difference $\Delta \phi = 0$ UI; (b) *Early* condition; (c) *Late* condition.
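Because the Inverse Alexander PD reuses the same two XOR gates and only swaps their outputs, its decision table can be sketched in the same way. The Python snippet below assumes binary samples; the function name `inverse_alexander_pd` is hypothetical. It enumerates all eight sample combinations:

```python
from itertools import product


def inverse_alexander_pd(s0, s1, s2):
    """Inverse Alexander PD (Fig. 4.2(b)): same XOR gates, outputs interchanged.

    In lock, s1 is the data sample and s0, s2 fall on the data transitions.
    """
    early = s0 ^ s1  # S0 differs from S1: rising clock edge leads the data edge
    late = s1 ^ s2   # S2 differs from S1: rising clock edge lags the data edge
    return early, late


for samples in product((0, 1), repeat=3):
    print(samples, inverse_alexander_pd(*samples))
```

The combination (0, 1, 0) activates both outputs at once; this is the state that becomes important under duty-cycle distortion.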

#### 4.2.2 PD Characteristics

#### **Full-Rate Operation**

The output characteristics of both the conventional and the Inverse Alexander PD are shown in Figs. 4.5(a)-(b), assuming ideal waveforms (as in Figs. 4.3 and 4.4). If a data edge occurs, either a 1-bit *Early* or a 1-bit *Late* pulse is generated, which for both PDs results in the well-known bang-bang action. Each PD has only one stable locking point, which corresponds to a phase shift of half a Unit Interval (UI) for the conventional PD and to zero phase shift for the Inverse Alexander PD (also indicated in the figure).

However, in practice the waveforms are not ideal and several imperfections occur. For example, if the recovered clock operates at quarter rate (as is the case for the prototype, see Section 4.3), imperfections inevitably affect the sampling times of the samples $S_0$-$S_2$ (Figs. 4.3 and 4.4). The input data waveform will also be imperfect, exhibiting e.g. pulse-width jitter and unequal rise and fall times. These effects translate into duty-cycle distortion, which affects the behavior of both PDs.



Figure 4.5: Simplified (single pulse) PD output characteristics at full rate operation for the case of ideal waveforms: (a) the Alexander PD and (b) the Inverse Alexander PD.



Figure 4.6: Simplified (single pulse) PD output characteristics at full rate operation for the case of duty-cycle distortion: (a) the Alexander PD and (b) the Inverse Alexander PD.

Duty-cycle distortion means that the duration of a logic-'0' differs from the duration of a logic-'1' [7]. The notations $T_0$ and $T_1$ represent the durations of a single logic-'0' and a single logic-'1', respectively, affected by duty-cycle distortion; the sum of $T_0$ and $T_1$ always equals 2 UI. When $T_1$ equals 1 UI, there is no duty-cycle distortion, and when $T_1 < 0.5$ UI, the duty-cycle distortion is too large for any useful operation of the CDR. The reciprocal case, $T_1 > T_0$, is analogous.

To examine the influence of duty-cycle distortion, the output characteristics of both PDs are determined for the artificial case of a data stream with a single logic-'1' data pulse, i.e. two consecutive data transitions, and shown in Figs. 4.6(a)-(b). Apart from the normal *Early* and *Late* cases, two anomalous states occur in this case. The first anomalous case, shown in Fig. 4.7(a), occurs around the locking point of the conventional PD: an *Early* pulse is immediately followed by a *Late* pulse. If the PD is operated at full speed, this pair is filtered by the low-pass loop filter and essentially translates into a net null action. The second anomalous case is shown in Fig. 4.7(b) and is most relevant for the Inverse Alexander variant: the *Early* and *Late* signals are simultaneously active. This is normally an illegal state, but in practice most loop filters (e.g. the popular charge pump [5] and also the DLF used in the prototype) interpret it as a net null action.

Both anomalous cases occur for phase errors near the equilibrium locking point and broaden the locking point into a locking region, as illustrated in Figs. 4.6(a)-(b). For the conventional Alexander PD, the locking region corresponds to the *Early immediately followed by Late* case, whereas for the Inverse Alexander PD it corresponds to the *simultaneous both Early and Late* case. Despite this difference, both cases are almost equivalent when the PDs are operated at full rate.



Figure 4.7: PD waveforms for data with duty-cycle distortion corresponding to the anomalous cases (a) Alexander *Early* immediately followed by *Late* (most relevant for conventional Alexander), and (b) Simultaneous *Early* and *Late* (most relevant for Inverse Alexander).
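The single-pulse characteristics can be reproduced with a small behavioral model. The Python sketch below is only an illustration under simplifying assumptions that are not taken from the circuit: a full-rate clock with a 1-UI period and a 50 % duty cycle, rising edges at θ + k (θ in UI is a hypothetical phase parameter whose origin need not match $\Delta\phi$ in the figures), ideal zero-aperture samplers, and a single logic-'1' pulse of width $T_1$ starting at t = 0. Both PDs share the same two XOR gates; only the assignment of the outputs to *Early* and *Late* differs. The function names are hypothetical.

```python
def data(t, t1):
    """Single logic-'1' pulse of width t1 (UI) starting at t = 0; logic-'0' elsewhere."""
    return 1 if 0.0 <= t < t1 else 0


def pd_pulses(theta, t1, inverse, span=4):
    """(early, late) pulses generated around the single data pulse.

    Rising clock edges at theta + k give S0 and S2, the falling edge in between
    gives S1 (same sampler arrangement for both PDs).
    """
    pulses = []
    for k in range(-span, span):
        a = theta + k
        s0, s1, s2 = data(a, t1), data(a + 0.5, t1), data(a + 1.0, t1)
        x01, x12 = s0 ^ s1, s1 ^ s2
        # Conventional: Early = S1^S2, Late = S0^S1. Inverse: interchanged.
        early, late = (x01, x12) if inverse else (x12, x01)
        if early or late:
            pulses.append((early, late))
    return pulses


T1 = 0.8  # duty-cycle distorted logic-'1' (UI)
print(pd_pulses(0.4, T1, inverse=False))   # conventional, inside its locking region
print(pd_pulses(-0.1, T1, inverse=True))   # inverse, inside its locking region
```

With $T_1 = 0.8$ UI, θ = 0.4 UI yields an *Early* pulse immediately followed by a *Late* pulse for the conventional PD, whereas θ = -0.1 UI gives the *simultaneous both Early and Late* state for the Inverse Alexander PD. In this simplified model, both locking regions are $1 - T_1$ wide.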

#### **Subsampled Operation**

When the PD is subsampled, only one out of *N* PD output values is used. For ideal waveforms, the PD characteristics are still those of Figs. 4.5(a)-(b). With duty-cycle distortion, however, only the *simultaneous both Early and Late* case remains unchanged. The *Early immediately followed by Late* case is altered, because one of the 2 successive pulses is lost; since the data is not correlated with the subsampling process, either *Early* or *Late* is selected at random, as shown in Fig. 4.8. As a result, a significant amount of excess random jitter is injected into the loop, which increases the probability of bit errors. This problem occurs in the locking region of the conventional Alexander PD, but not for the Inverse Alexander PD. For this reason, the Inverse Alexander PD is expected to perform much better when the PD is subsampled.

Figure 4.8: Simplified (single pulse) PD output characteristics at subsampled rate operation for the case of duty-cycle distortion: (a) the Alexander PD and (b) the Inverse Alexander PD.
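The effect of subsampling on the two anomalous states can be illustrated with a toy Python model. For illustration only, it assumes a periodic PD output pattern of length N containing one single-pulse event, counts *Early* as +1 and *Late* as -1, and treats simultaneous activity as a null action (as the text states for the prototype DLF). The pattern contents and function names are hypothetical. Evaluating every possible subsampling offset shows which net action the DLF can receive:

```python
N = 4  # subsample factor, as in the BER comparison


def net_action(early, late):
    """+1 for Early, -1 for Late, 0 for neither or both (null action)."""
    return early - late


# Locking region, one single-pulse event per N PD outputs (hypothetical patterns)
conventional = [(1, 0), (0, 1)] + [(0, 0)] * (N - 2)  # Early immediately followed by Late
inverse = [(1, 1)] + [(0, 0)] * (N - 1)               # simultaneous both Early and Late


def full_rate_action(pattern):
    """Without subsampling, the loop filter integrates every PD output."""
    return sum(net_action(e, l) for e, l in pattern)


def subsampled_actions(pattern):
    """With subsampling, one output per period is kept; list every possible offset."""
    return [net_action(*pattern[offset]) for offset in range(len(pattern))]


for name, pattern in (("conventional", conventional), ("inverse", inverse)):
    print(name, full_rate_action(pattern), subsampled_actions(pattern))
```

At full rate both patterns give a net action of 0. After subsampling, the conventional PD delivers +1, -1 or 0 depending on an offset that is uncorrelated with the data, while the Inverse Alexander PD still delivers 0 for every offset.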

#### 4.2.3 Performance

To study the performance, simulations were performed. A CDR with an Alexander PD is compared with a CDR containing an Inverse Alexander PD. For both CDRs the loop parameters are equal and the locking behavior and the BER performance are discussed.

#### **Locking Behavior**

The simulated locking behavior is illustrated in Fig. 4.9. Here, random data with duty-cycle distortion was applied to a CDR with the conventional PD and to an identical CDR with the Inverse Alexander PD. The corresponding simulated input data eye diagram is shown in Fig. 4.9(a). This eye diagram shows that the duty-cycle-distorted input data contains 4 possible data transition instants. Two instants correspond to a '101' pattern in which the logic-'0' lasts longer than 1 UI, while the other two originate from a '010' pattern with a pulse width smaller than 1 UI. The corresponding infinite-persistence waveforms of the recovered clocks of both CDRs are shown in Figs. 4.9(b)-(c). Two observations can be made from these waveforms. First, the recovered clocks of the conventional Alexander CDR and the Inverse Alexander CDR have a phase difference of about 0.5 UI: the falling edges of the conventional recovered clock and the rising edges of the Inverse Alexander recovered clock are aligned to the middle of the falling and rising edges of the duty-cycle-distorted input signal. Hence, the two recovered clocks are almost each other's inverse. Second, the recovered clock of the Inverse Alexander CDR clearly exhibits much less jitter than the conventional one.

Figure 4.9: Simulink simulations of the locking behavior in the case of a pronounced duty-cycle distortion with (a) the eye diagram of input data, (b) the persistence view of the recovered clock of a CDR with a conventional Phase Detector, and (c) the persistence view of the recovered clock of a CDR with an Inverse Alexander Phase Detector.

#### BER

Without subsampling, the BER curves of both systems are expected to be equal, because the characteristics of both PDs (Fig. 4.5) are identical apart from the phase shift in $\Delta\phi$. Hence, both systems react in a similar way and their BER curves coincide. This is confirmed by the simulations shown in Fig. 4.10(a). Additionally, Fig. 4.10(a) illustrates that the BER degrades as the duty-cycle distortion increases, because the shorter logic-'1' levels become more susceptible to jitter. As mentioned above, the cases $T_1 = 0.8$ UI and $T_1 = 1.2$ UI are analogous and result in the same BER curves; the curves for $T_1 = 1.2$ UI are therefore omitted from the figures.

Fig. 4.10(b) illustrates the BER for a CDR with a subsample factor of 4. In this case, the Inverse Alexander PD performs better than the Alexander PD. This is because subsampling changes the output characteristic of the PDs, especially the *Early immediately followed by Late* zone, which resulted in a net zero action without subsampling. Due to the subsampling, only one of the two subsequent *Early* and *Late* signals is sampled, so that they no longer cancel each other out quickly, as they did without subsampling. This undesirably adjusts the frequency and causes fluctuations in the phase difference. For the Alexander PD, this results in a worse BER, because its locking region lies in the *Early immediately followed by Late* zone. For the Inverse Alexander PD, the BER also degrades due to the lower update rate of the error signal. However, this degradation is less severe, because the output characteristic in the locking region of the Inverse Alexander PD (i.e. the *simultaneous both Early and Late* zone) remains the same when subsampling is applied. Fig. 4.10(b) shows that the Inverse Alexander PD consistently outperforms the conventional Alexander PD, and that the difference becomes more pronounced for high levels of duty-cycle distortion and/or low levels of jitter. For example, for a typical case with 0.05 UI RMS input jitter and a subsample factor of 4, the BER of the Inverse Alexander PD is 20 times lower than that of the Alexander PD.

#### 4.3 Digitally Controlled Oscillator

For the DCO, a quarter-rate architecture [8] is used. This means that the DCO operates at one fourth of the data rate and provides the required sample-time resolution in the form of 8 uniformly-phase-shifted clock phases. This can conveniently be realized with a 4-stage differential ring oscillator and significantly relaxes the requirements on the clock buffers and BB-PD circuitry. For a 25 Gb/s data input, the DCO frequency is thus 6.25 GHz. The operation of the quarter-rate topology is illustrated in Fig. 4.11, which shows the waveforms of a '1010...' input data sequence and of the 8 clock phases for the case that the AD-CDR is *Early*.

Figure 4.10: The BER performance: (a) no subsampling; (b) subsample factor = 4.

The input data is sampled using the quarter-rate clock phases. In the ideal locking condition, the even clock phases are perfectly aligned with the data edges, while the odd clock phases lie in the middle of the data symbols, which is the ideal sampling moment. Per clock period, there are 4 sets of three consecutive samples, and each set can be used by the Inverse Alexander PD to generate an Early/Late signal. In the design, only 1 out of these 4 Early/Late signals is used, i.e. only clock phases $Clk_0$, $Clk_1$ and $Clk_2$ are used to gather the phase information (Early/Late). All data must still be recovered, which is done by sampling the input data with the odd clock phases. The net result is that clock phases $Clk_4$ and $Clk_6$ are not used and that the phase information is already subsampled by a factor of 4 in the PD.



Figure 4.11: Waveforms of a '1010' data sequence and the 8 clock phases when the AD-CDR is *Early*. The red clock phases correspond to edge-related samples and the black to data-related samples (as in Fig. 4.4).

Thanks to this quarter-rate topology, the outputs of the BB-PD are automatically parallelized. The quarter-rate operation thus significantly relaxes the requirements on the clock buffers and BB-PD circuitry and simplifies further processing.
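The phase bookkeeping of the quarter-rate scheme can be summarized in a few lines of Python. This sketch assumes ideal lock with the Inverse Alexander PD and perfectly uniform phase spacing; the constant and function names are hypothetical. It lists the offset of each DCO phase within one 4-UI clock period and its role:

```python
BIT_RATE = 25e9        # 25 Gb/s input data
UI = 1.0 / BIT_RATE    # unit interval in seconds (40 ps)
F_DCO = BIT_RATE / 4   # quarter-rate DCO: 6.25 GHz
N_PHASES = 8           # 8 uniformly spaced DCO phases


def clock_phase_roles():
    """Return (name, offset in UI, role) for each DCO phase in ideal lock."""
    step_ui = 4 / N_PHASES  # one DCO period spans 4 UI -> 0.5 UI between phases
    roles = []
    for k in range(N_PHASES):
        if k % 2 == 1:
            role = "data"    # odd phases sample the centre of a symbol
        elif k in (0, 2):
            role = "edge"    # Clk0 and Clk2 give S0 and S2; Clk1 gives S1
        else:
            role = "unused"  # Clk4 and Clk6 are not used
        roles.append((f"Clk{k}", k * step_ui, role))
    return roles


for name, offset, role in clock_phase_roles():
    print(f"{name}: {offset:.1f} UI ({offset * UI * 1e12:.0f} ps) -> {role}")
```

Only the triple (Clk0, Clk1, Clk2) feeds the PD, which is where the factor-4 subsampling inside the PD comes from; the four odd phases still deliver every data bit.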

#### 4.4 Digital Loop Filter

A typical DLF consists of a proportional and integral path and can be described by the discrete-time transfer function  $H_{DLF}(z)$  given by:

$$H_{DLF}(z) = K_p \cdot z^{-D_{K_p}} + K_i \cdot \frac{z^{-D_{K_i}}}{1 - z^{-1}}$$
(4.1)

where $K_p$ and $K_i$ are the gains of the proportional and integral paths, respectively, and $D_{K_p}$ and $D_{K_i}$ are the corresponding delays. In the implemented DLF, both the proportional and the integral gain settings can be adapted, while the delays are hard-wired. The delays in the proportional and integral paths are a consequence of the implementation discussed in Chapter 5 and equal $D_{K_p} = 2$ and $D_{K_i} = 9$ digital clock cycles, respectively. The delay in the proportional path in particular must be limited to avoid stability issues, but with the jitter expected in the CDR loop, a delay of $D_{K_p} = 2$ is low enough to ensure stability [9].

Note that this DLF directly follows the subsampling block (Fig. 4.1) to allow automatic synthesis of the entire DLF. Consequently, the proportional and integral paths are equally affected by the subsampling.
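Equation (4.1) corresponds to a simple difference equation: the proportional path outputs $K_p\,x[n - D_{K_p}]$, and the integral path accumulates $K_i\,x[n - D_{K_i}]$, since $K_i z^{-D_{K_i}}/(1 - z^{-1})$ implies $i[n] = i[n-1] + K_i\,x[n - D_{K_i}]$. The Python class below is a behavioral sketch of that equation only, not the synthesized implementation of Chapter 5. The class name and the example gains `kp=4`, `ki=1` are hypothetical, while the default delays match $D_{K_p} = 2$ and $D_{K_i} = 9$. Inputs are subsampled PD decisions (+1 Early, -1 Late, 0 no information); the mapping of the output sign to a DCO frequency change is left open.

```python
from collections import deque


class DigitalLoopFilter:
    """Behavioral PI filter: H(z) = Kp z^-DKp + Ki z^-DKi / (1 - z^-1)."""

    def __init__(self, kp, ki, d_kp=2, d_ki=9):
        self.kp, self.ki = kp, ki
        self.d_kp, self.d_ki = d_kp, d_ki
        depth = max(d_kp, d_ki) + 1
        # Past inputs; after append, self.x[-1] is x[n] and self.x[-1 - d] is x[n - d]
        self.x = deque([0] * depth, maxlen=depth)
        self.acc = 0  # state of the integral path, i[n-1]

    def step(self, pd_out):
        """Process one subsampled PD decision and return the DCO control word."""
        self.x.append(pd_out)
        x_kp = self.x[-1 - self.d_kp]  # x[n - DKp]
        x_ki = self.x[-1 - self.d_ki]  # x[n - DKi]
        self.acc += self.ki * x_ki     # i[n] = i[n-1] + Ki x[n - DKi]
        return self.kp * x_kp + self.acc


dlf = DigitalLoopFilter(kp=4, ki=1)
impulse_response = [dlf.step(1 if n == 0 else 0) for n in range(12)]
print(impulse_response)  # [0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

Feeding a single +1 shows the two paths separately: the proportional response appears after 2 cycles and vanishes, while the integral path responds after 9 cycles and holds its value.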

#### 4.5 Subsampling

In the 40 nm Low Power CMOS process used in this work, the clock frequency should not exceed 1.75 GHz to enable an automated design (synthesis, place and route) of the DLF. Even with the subsampling by a factor of 4 that already occurs in the PD, the operating frequency at the output of the BB-PD is therefore still too high: if the CDR operates at 25 Gb/s, for example, the output of the BB-PD operates at 6.25 GHz. Hence, this operating frequency has to be reduced further to facilitate the implementation of the DLF, and the output of the BB-PD is additionally subsampled by a factor of 4. The subsampling is thus realized in two steps: first, it is incorporated in the quarter-rate PD; second, additional subsampling is applied before the data is processed by the DLF. Overall, the DLF receives an output of the PD only once every N (= 16) data periods, which at 25 Gb/s corresponds to a DLF clock frequency of about 1.56 GHz. In Fig. 4.1, the subsampling corresponds to the block '$\downarrow N$'.
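The two-step subsampling budget is simple arithmetic. The Python sketch below restates it with the numbers from the text (25 Gb/s input, a factor 4 in the PD, an additional factor 4, and the 1.75 GHz limit for automated design); the function and constant names are hypothetical:

```python
F_MAX_SYNTH_GHZ = 1.75  # clock limit for automated DLF design in the 40 nm LP process


def subsampling_chain(bit_rate_gbps, pd_factor=4, extra_factor=4):
    """Return (PD output rate, DLF clock rate) in GHz for the two-step subsampling."""
    pd_rate = bit_rate_gbps / pd_factor  # step 1: quarter-rate PD keeps 1 of 4 Early/Late
    dlf_rate = pd_rate / extra_factor    # step 2: additional subsampling before the DLF
    return pd_rate, dlf_rate


pd_rate, dlf_rate = subsampling_chain(25.0)
N = 4 * 4
print(f"PD output: {pd_rate} GHz, DLF clock: {dlf_rate} GHz, N = {N}")
print("PD output synthesizable:", pd_rate <= F_MAX_SYNTH_GHZ)
print("DLF clock synthesizable:", dlf_rate <= F_MAX_SYNTH_GHZ)
```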

Although a higher subsample factor would further reduce the area and power of the DLF, it does not lead to the best overall power efficiency. This is because the CDR must be able to handle data sequences in which the BB-PD receives no data edges (and hence generates neither *Early* nor *Late* signals) for many clock cycles.

As discussed in Section 3.6, adequate robustness to these long idle sequences can be maintained for increasing values of the subsample factor N if the DCO phase noise and resolution are improved accordingly. There is thus a trade-off in the choice of N: increasing N decreases the power consumption of the DLF but increases the power consumption required in the DCO. The behavioral simulations (Section 3.7.3) indicate that N = 16 is an adequate compromise. According to simulation, with this setting, the circuit should tolerate input data streams that, after PD subsampling, have an idle (subsampled) sequence length l of over 100.

#### 4.6 Discussion

This chapter presents the architecture of the proposed AD-CDR. As discussed in Section 2.5.3, the major challenge in implementing an AD-CDR is reducing the operating speed of the automatically synthesized DLF. In this work, this is achieved by subsampling the phase information by a factor N = 16.

The challenge of a high-speed, energy-efficient phase detector (Section 2.5.3) is addressed in this chapter by the newly proposed Inverse Alexander phase detector. With minimal effort (inverting the sign of the Alexander phase detector), all the advantages of the conventional Alexander phase detector are maintained, while the BER improves significantly when the CDR uses subsampling.

Finally, a DCO completes the AD-CDR architecture. The DCO in our AD-CDR uses a quarter-rate architecture: it operates at one fourth of the data rate and provides the required sample-time resolution in the form of 8 uniformly-phase-shifted clock phases. For a 25 Gb/s data input, the DCO frequency is thus 6.25 GHz. The outputs of the BB-PD, i.e. the Early/Late signals and the recovered data, are automatically parallelized. The quarter-rate operation therefore significantly relaxes the requirements on the clock buffers and BB-PD circuitry and simplifies further processing.

#### References

- [1] Jri Lee, K.S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1571–1580, sep 2004.
- [2] Mrunmay Talegaonkar, Rajesh Inti, and Pavan Kumar Hanumolu, "Digital clock and data recovery circuit design: Challenges and tradeoffs," in *2011 IEEE Custom Integrated Circuits Conference (CICC)*, 2011, pp. 1–8.
- [3] David Rennie and Manoj Sachdev, "A Novel Tri-State Binary Phase Detector," in 2007 IEEE International Symposium on Circuits and Systems, may 2007, pp. 185–188.
- [4] Fei Yuan, "A new current-integrating bang-bang phase detector for clock and data recovery," in *Midwest Symposium on Circuits and Systems*, aug 2008, pp. 898–901.
- [5] Behzad Razavi, Design of Integrated Circuits for Optical Communications, 2nd ed. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2012.
- [6] J.D.H. Alexander, "Clock recovery from random binary signals," *Electronics Letters*, vol. 11, no. 22, pp. 541–542, 1975.
- [7] Mihai Marcu, Sriram Durbha, and Sanjeev Gupta, "Duty-cycle distortion and specifications for jitter test-signal generation," in *2008 IEEE International Symposium on Electromagnetic Compatibility*, aug 2008, pp. 1–4.
- [8] Arno Vyncke, Guy Torfs, Chris Van Praet, Marijn Verbeke, Alex Duque, Dusan Suvakovic, Hungkei Chow *et al.*, "The 40Gbps cascaded bit-interleaving PON," *Optical Fiber Technology*, vol. 26, pp. 108–117, 2015.

- [9] Marijn Verbeke, Pieter Rombouts, Arno Vyncke, and Guy Torfs, "Influence of Jitter on Limit Cycles in Bang-Bang Clock and Data Recovery Circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 6, pp. 1463–1471, jun 2015.

# 5 Circuit Implementation

After the system analysis in Chapter 3 and the architecture and design in Chapter 4, this chapter discusses the implementation of the All-Digital Clock and Data Recovery (AD-CDR) circuit. A top-down approach is followed, starting with the description of the top-level implementation. Subsequently, the implementation of each building block is described in detail, i.e. the Bang-Bang Phase Detector (BB-PD) & subsampling, the Digital Loop Filter (DLF), the Digitally Controlled Oscillator (DCO) and the calibration.

#### 5.1 Top-Level Implementation

The top-level implementation of the All-Digital Clock and Data Recovery circuit is shown in Fig. 5.1. The physical partitioning maximizes the use of automated digital design tools. Therefore, part of the BB-PD is moved after the subsampling so that it can also be synthesized automatically. As a result, the BB-PD and the subsampling block are intertwined. The implementation consists of 6 high-speed samplers followed by a retiming block, a subsampling block and the (automatically synthesized) *phase detection logic*. Additionally, the AD-CDR comprises an automatically synthesized digital loop filter, a clock divider and a DCO.

The 6 high-speed samplers are each driven by their own 6.25 GHz clock phase from the DCO. Of these 6 samplers, 4 sample the data, while the other 2 sample the edges. As mentioned in Section 4.3, 2 out of the 8 uniformly-phase-shifted DCO clock phases are not used.

Figure 5.1: The block diagram of the AD-CDR implementation (speeds are indicated for 25 Gb/s operation). Red is used for edge-related samples and black for data-related samples (as in Fig. 4.4).

In the retiming block, all collected samples (i.e. 4 data samples and 2 edge samples) are aligned to a single clock phase. The retimed data samples constitute the recovered data (the actual AD-CDR output), while the phase information, which is used to adjust the AD-CDR and reduce the phase error, is subsampled to 1.56 Gb/s. This phase information is sent to the synthesized digital block (running at 1.56 GHz), where the *phase detection logic* first calculates the *Early* and *Late* signals. These are then further processed by the DLF, which controls the quarter-rate DCO.

#### 5.2 BB-PD and Subsampling

The implementation of the BB-PD & Subsampling comprises two parts: a full custom designed block and the automatically synthesized *phase detection logic*. A more detailed view of the full custom block consisting of the high-speed samplers, the retiming block and the subsampling block is given in Fig. 5.2.

#### 5.2.1 Sampler

First, the incoming 25 Gb/s data is sampled by a high-speed sampler, which is implemented as a sense amplifier flip-flop [1–6]. The sense amplifier flip-flop has a fast sense amplifier input stage followed by a slower regenerative latch (Fig. 5.3). The fast sense amplifier captures the data in a very short time window, so that the data can be sampled correctly with a quarter-rate clock phase. The regenerative dynamic latch then holds the data.

This makes the sense amplifier flip-flop an ideal choice for a subsampling stage, which needs to capture the high-speed input data very quickly but has relaxed requirements on the clock-to-output delay. Consequently, the timing requirements of the subsequent digital gates, which process the sampled data at a reduced speed, are relaxed as well. The device sizes of the implemented sense amplifier flip-flop shown in Fig. 5.3 are summarized in Table 5.1.

#### 5.2.2 Retiming

The 6 sampled data signals (4 corresponding to actual data samples and 2 corresponding to edge samples) are sent to the retiming block, which aligns the samples to one clock phase.

Figure 5.2: A detail of the full custom part of the BB-PD & Subsampling, which contains 6 samplers, a retiming block and a subsampling block (speeds are indicated for $25~\mathrm{Gb/s}$ operation).

Figure 5.3: The sampler circuit: sense amplifier flip-flop with a fast sense amplifier input and a slower regenerative latch.

| Transistor     | L     | W      |
|----------------|-------|--------|
| M0             | 40 nm | 8.4 µm |
| M1 - M4        | 40 nm | 2.4 µm |
| M5 - M6        | 40 nm | 4.8 µm |
| M7 - M10       | 40 nm | 3.6 µm |
| M11 - M12      | 40 nm | 1.2 µm |
| M13            | 40 nm | 0.6 µm |
| M14            | 40 nm | 1.2 µm |
| M15 - M16      | 40 nm | 1.8 µm |
| Inverter: PMOS | 40 nm | 2.4 µm |
| Inverter: NMOS | 40 nm | 1.2 µm |

Table 5.1: The device sizes of the sense amplifier flip-flop shown in Fig. 5.3.

For this, two types of dynamic flip-flops, clocked on opposite clock edges, are used. The sampled input data from clock phases zero to three is retimed by an array of positive edge triggered dynamic flip-flops of type I (Fig. 5.4). This is a standard dynamic flip-flop, shown in Fig. 5.5(a). Three of these retimed samples, which contain the information of two edges ($Edge_0$, $Edge_1$) and one intermediate data symbol ($D_{out0}$), are used for the phase alignment, but they first have to be subsampled (see Section 5.2.3).



Figure 5.4: The retiming circuit consisting of an array of type I (positive edge triggered) retiming flip-flops and an array of type II (negative edge triggered) flip-flops. Red is used for edge-related samples and black for data-related samples (as in Fig. 4.4).

To relax the timing requirements of the flip-flops, the sampled input data from clock phases five and seven is retimed by an array of type II (negative edge triggered) dynamic flip-flops (Fig. 5.4). This type is clocked on the opposite clock edge compared to type I, but incorporates an additional half-clock-cycle delay (Fig. 5.5(b)) so that all samples are retimed to the same clock edge.

The device sizes of the dynamic flip-flops shown in Fig. 5.5 are summarized in Table 5.2.



Figure 5.5: The flip-flops used in the retiming circuit: (a) type I (positive edge triggered) dynamic flip-flop and (b) type II (negative edge triggered) dynamic flip-flop.

| Transistor  | L     | W      |
|-------------|-------|--------|
| M1, M5, M9  | 40 nm | 0.6 µm |
| M2, M6, M10 | 40 nm | 0.6 µm |
| M3, M7, M11 | 40 nm | 1.2 µm |
| M4, M8, M12 | 40 nm | 0.6 µm |

Table 5.2: The device sizes of the dynamic flip-flops shown in Fig. 5.5.

#### 5.2.3 Subsampling

Before the phase alignment information can be sent to the digital block, it has to be subsampled by a factor of 4 (Fig. 5.2). This subsampling is performed in two steps (Fig. 5.6). In each step, the clock frequency is first divided by two and then applied as the clock signal to an array of three type I dynamic flip-flops. Because the input data of these flip-flops runs at twice the rate of their clock, the data is subsampled by a factor of 2. Overall, the input data is thus subsampled by a factor of 4 and the clock signal is divided by 4. This divided clock is used as the clock signal for the digital block.
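The two-step scheme can be sketched behaviourally at the level of clock edges. In this minimal Python model, which is an abstraction and not a description of the actual circuit timing, each step keeps every other rising edge of its input clock (the divide-by-two), and the array of three flip-flops only captures the phase-information word present on the surviving edges. The `(Edge_0, D_out0, Edge_1)` tuple layout and the function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Word = Tuple[int, int, int]  # assumed layout: (Edge_0, D_out0, Edge_1) after retiming

def subsample_stage(edges: List[int]) -> List[int]:
    """One subsampling step: the divide-by-two keeps every other rising edge.

    The array of three type I flip-flops captures its input only on the
    edges that survive, so half of the incoming words are never sampled.
    """
    return edges[::2]

def subsample(words: Sequence[Word], f_clk: float, n_stages: int = 2):
    """Apply n_stages subsampling steps.

    words[t] is the retimed phase information present at full-rate clock edge t.
    Returns the captured words, the edge indices used and the divided clock frequency.
    """
    edges = list(range(len(words)))
    for _ in range(n_stages):
        edges = subsample_stage(edges)
        f_clk /= 2
    return [words[t] for t in edges], edges, f_clk

# Arbitrary test words: bit pattern of the edge index
words = [((t >> 0) & 1, (t >> 1) & 1, (t >> 2) & 1) for t in range(16)]
captured, edges, f_div = subsample(words, 6.25e9)
print(edges)        # [0, 4, 8, 12]
print(f_div / 1e9)  # 1.5625
```

Two cascaded divide-by-two steps keep every fourth word, which matches the overall subsampling factor of 4 and the 1.56 GHz clock of the digital block.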



Figure 5.6: The subsampling circuit.

#### 5.2.4 Digital Phase Detection Logic

Besides the full custom blocks, the BB-PD & Subsampling also comprises the synthesized *digital phase detection logic*. This part is automatically generated from a Verilog description, which corresponds to the schematic shown in Fig. 5.7. It compares consecutive samples and determines whether the clock leads or lags the data, according to the Inverse Alexander operation [7].
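For reference, the textbook decision rule of the *conventional* Alexander detector on a data-edge-data triplet can be sketched as follows. This is not the Verilog of Fig. 5.7, and the Inverse Alexander operation of [7] is not modelled here; according to Section 6.2.4, the prototype switches between the two detectors by changing the sign of the control loop in the DLF. The function names and the +1 (Late) / -1 (Early) convention are assumptions chosen to match the signs used by the DLF below.

```python
from typing import List, Tuple

def alexander_decision(d_prev: int, edge: int, d_next: int) -> Tuple[bool, bool]:
    """Conventional Alexander rule for one data-edge-data triplet.

    Returns (early, late). Without a data transition the triplet carries
    no phase information and both outputs stay low.
    """
    if d_prev == d_next:
        return False, False
    early = edge == d_prev  # edge sampled before the transition: clock leads
    late = edge == d_next   # edge sampled after the transition: clock lags
    return early, late

def net_vote(data: List[int], edges: List[int]) -> int:
    """Sum of +1 (Late) and -1 (Early); edges[i] lies between data[i] and data[i + 1]."""
    vote = 0
    for i, e in enumerate(edges):
        early, late = alexander_decision(data[i], e, data[i + 1])
        vote += int(late) - int(early)
    return vote

print(alexander_decision(0, 0, 1))            # (True, False): clock early
print(net_vote([0, 1, 0, 0], [0, 1, 0]))      # two Early votes, one idle triplet
```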

#### 5.3 Digital Loop Filter

The implementation of the automatically generated DLF is shown in Fig. 5.8. The DLF receives an Early/Late signal from the phase detection logic and this signal is then processed by a proportional and an integral path. The proportional path directly amplifies the Early/Late signals with  $-K_p$  and  $K_p$ , respectively. To maintain the stability of the AD-CDR, the delay in this path is minimized and the implementation is made as simple as possible.



Figure 5.7: The digital phase detection logic.

To achieve this, $K_p$ is always an integer and the output is a 7-bit thermometer code. The proportional path can then simply be implemented by selecting or deselecting $K_p$ of the thermometer-coded output bits. These bits directly drive the fine-tuning input of the DCO (see Section 5.4). This configuration allows the gain $K_p$ to be set between 0 and 7.
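A minimal sketch of this thermometer-coded proportional path is shown below. The text only states that the output is a 7-bit thermometer code and that the DCO is driven by 7+7+31 bits; the split into one hypothetical "up" word for Late and one "down" word for Early is an assumption (one possible reading of the two 7-bit terms), as are all names used.

```python
from typing import List, Tuple

WIDTH = 7  # width of the thermometer-coded proportional output (from the text)

def thermometer(n: int, width: int = WIDTH) -> List[int]:
    """Thermometer code with the n lowest bits set."""
    if not 0 <= n <= width:
        raise ValueError("n out of range")
    return [1] * n + [0] * (width - n)

def proportional_words(early: bool, late: bool, kp: int) -> Tuple[List[int], List[int]]:
    """Hypothetical up/down split of the proportional path.

    Late selects kp bits of the 'up' word and Early selects kp bits of the
    'down' word; every selected bit switches one unit varactor.
    """
    up = thermometer(kp if late else 0)
    down = thermometer(kp if early else 0)
    return up, down

def net_units(up: List[int], down: List[int]) -> int:
    """Net number of unit varactors switched by the proportional path (+Kp or -Kp)."""
    return sum(up) - sum(down)

print(net_units(*proportional_words(early=False, late=True, kp=5)))   # +Kp
print(net_units(*proportional_words(early=True, late=False, kp=5)))   # -Kp
```

Because every selected bit corresponds to one unit varactor, no multiplier is needed: the gain is applied simply by choosing how many bits are asserted, which keeps the delay of this path minimal.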

The integral path of the DLF is implemented as a multi-rate architecture: a Clk/2 domain is created to reduce the clock speed, which facilitates the implementation of the accumulator. Therefore, the Early/Late signal is demultiplexed by a factor of 2. The internal accumulator has a high resolution of 16 bits. This allows a broad range of integral gains $K_i$, which can be set to integer powers of 2. However, to avoid a bulky DCO design, only the 5 most significant bits of this 16-bit word are converted to a 31-bit thermometer-coded word, which drives the DCO. In contrast to binary-weighted coding, this thermometer coding increases the robustness against parasitic effects and reduces glitches when switching between states. In total, the DCO is controlled (in standard operation) by 45 (=7+7+31) bits, each driving a unit varactor, which corresponds to a resolution of 5.5 binary-weighted bits.
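The integral path can be sketched behaviourally as follows. The 16-bit accumulator, the demultiplexing by 2, the 5 forwarded MSBs and the 31-bit thermometer word come from the text. The interpretation of $K_i = 2^{-k}$ (one Early/Late decision moves the 5-bit output by $2^{-k}$ LSB), the mid-scale start value and the saturation behaviour are assumptions of this sketch.

```python
import math
from typing import List

ACC_BITS = 16                        # accumulator width (from the text)
OUT_BITS = 5                         # MSBs forwarded to the DCO (from the text)
SHIFT = ACC_BITS - OUT_BITS          # 11 discarded LSBs
THERMO_WIDTH = (1 << OUT_BITS) - 1   # 31-bit thermometer word

def thermometer(n: int, width: int) -> List[int]:
    return [1] * n + [0] * (width - n)

class IntegralPath:
    """Behavioural sketch of the multi-rate integral path (not the synthesized RTL)."""

    def __init__(self, k: int):
        # Assumption: Ki = 2**-k moves the 5-bit output by 2**-k LSB per decision.
        if not 0 <= k <= SHIFT:
            raise ValueError("k must lie in 0..11 in this sketch")
        self.step = 1 << (SHIFT - k)
        self.acc = 1 << (ACC_BITS - 1)  # hypothetical mid-scale start

    def update_pair(self, d0: int, d1: int) -> List[int]:
        """One Clk/2 cycle with two demultiplexed decisions (+1 Late, -1 Early, 0 none)."""
        self.acc += (d0 + d1) * self.step
        self.acc = min(max(self.acc, 0), (1 << ACC_BITS) - 1)  # saturate
        return thermometer(self.acc >> SHIFT, THERMO_WIDTH)     # 5 MSBs -> 31 bits

# 45 unit varactors expressed as an equivalent binary resolution
resolution_bits = math.log2(7 + 7 + 31)
print(f"{resolution_bits:.2f} bits")
```

With $k = 7$ ($K_i = 2^{-7}$, the setting used in Chapter 6 under this interpretation), 128 consecutive Late decisions are needed to move the 31-bit word by one unit, and $\log_2 45 \approx 5.49$ reproduces the 5.5-bit figure quoted above.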

Furthermore, Fig. 5.8 shows some signals that are not used in normal operation. First, there is a '*from FD*' signal, which is used in the calibration process of the DCO (see Section 5.5) and which can be activated by the control signal '*Calibration*'. Second, there is a signal that applies '*a fixed DCO setting*', which is only used for debug purposes and makes it possible to characterize the DCO separately. This signal is activated by the control signal '*DCO Characterization*'.



#### 5.4 Digitally Controlled Oscillator

To generate the 8 uniformly-phase-shifted clock phases for the aggregated 25 Gb/s BB-PD operation, the DCO is implemented as a 4-stage ring oscillator with differential delay cells (Fig. 5.9) [8].



Figure 5.9: The DCO structure: (a) ring oscillator and (b) delay cell.

The delay cell is shown in Fig. 5.9(b). It can be tuned via the tail bias current or via the load network. For the load, a coarse tuning and a fine tuning are distinguished. The coarse tuning has a 6-bit resolution, is only used during calibration of the DCO (see Section 5.5), and is implemented by switching binary-weighted resistors on or off.

The fine tuning is done by tuning the load varactors. During normal AD-CDR operation only this fine tuning is used. It is implemented as follows: the thermometer-coded words from the DLF (see Fig. 5.8) switch unit varactors on/off. To reduce the area of the ring oscillator and achieve a good resolution, the varactor units are distributed equally over the 4 delay cells. Per Least Significant Bit (LSB) of the fine tuning word, only one varactor is switched. However, the clock phases of the DCO have to be kept equally spaced as much as possible. Therefore, the on/off switching of the varactors is sequenced across the different delay cells: 1. toggle a varactor in the first delay cell, 2. toggle a varactor in the third delay cell, 3. toggle a varactor in the second delay cell, 4. toggle a varactor in the fourth delay cell, etc.
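The interleaved switching order can be illustrated with a small Python sketch. It treats the fine-tuning setting as a single count of switched-on unit varactors, which is a simplification, and uses the 1st, 3rd, 2nd, 4th delay-cell order given above; the function names are hypothetical.

```python
from typing import List

N_CELLS = 4
# Toggle order from the text: 1st, 3rd, 2nd, 4th delay cell (0-based indices).
TOGGLE_ORDER = (0, 2, 1, 3)

def varactors_per_cell(n_on: int) -> List[int]:
    """Switched unit varactors in each delay cell for a fine-tuning value n_on."""
    if n_on < 0:
        raise ValueError("n_on must be non-negative")
    counts = [0] * N_CELLS
    for k in range(n_on):
        counts[TOGGLE_ORDER[k % N_CELLS]] += 1
    return counts

def imbalance(n_on: int) -> int:
    """Largest difference in switched varactors between any two delay cells."""
    counts = varactors_per_cell(n_on)
    return max(counts) - min(counts)

for n in range(1, 7):
    print(n, varactors_per_cell(n))
```

Whatever the setting, the number of switched varactors in any two delay cells differs by at most one, so the delays of the 4 stages, and hence the spacing of the clock phases, stay as uniform as the unit step allows.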

The tuning mechanism via the tail bias current is in principle not needed, because according to simulation the entire operating range can be sufficiently covered with the load tuning alone. However, this tuning was added to achieve greater robustness against process variations, so that the entire intended frequency range is covered even under unforeseen process conditions. A 4-bit current control was implemented on the chip.

#### 5.5 Calibration of the DCO

Before normal AD-CDR operation, where only the fine-tuning of the DCO is adapted, the DCO frequency should first be adjusted to within about  $\pm$  30 MHz of the correct quarter-rate frequency of the data rate (e.g. 6.25 GHz for 25 Gb/s input data). For this, a coarse tuning of the DCO is performed in a calibration cycle at startup. This is done through an automatic frequency control loop which is based on an external reference clock and counters [9].

The frequency control loop counts the number of clock cycles of the digital clock and of the external reference clock. These counts are compared with SPI-configured registers, and the coarse settings are then gradually adjusted. This procedure is repeated until the DCO frequency lies within about $\pm 30$ MHz of the desired frequency.
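The counter-based coarse calibration can be sketched as a simple up/down search. Only the counting principle, the 6-bit coarse word, the divided-by-4 digital clock and the $\pm 30$ MHz target come from the text. The 100 MHz reference clock, the 1024-cycle counting window and the linear DCO model `dco_frequency` are hypothetical stand-ins, and the current setting is ignored.

```python
# Hypothetical numbers: reference clock, window length and DCO model are NOT from the text.
F_REF = 100_000_000   # external reference clock [Hz] (assumed)
N_REF = 1024          # reference cycles per counting window (assumed)
COARSE_MAX = 63       # 6-bit coarse resistor word

def dco_frequency(coarse: int) -> int:
    """Hypothetical monotonic DCO model [Hz]; a stand-in for the real silicon."""
    return 5_000_000_000 + 40_000_000 * coarse

def count_cycles(f_dco: int) -> int:
    """Digital-clock (DCO/4) cycles counted during N_REF reference cycles."""
    return (f_dco * N_REF) // (4 * F_REF)

def calibrate(target_hz: int, tol_hz: int = 30_000_000, code: int = 0, max_iter: int = 64) -> int:
    target = count_cycles(target_hz)  # value an SPI register would hold
    tol = count_cycles(tol_hz)        # +/- tolerance expressed in counts
    for _ in range(max_iter):
        err = count_cycles(dco_frequency(code)) - target
        if abs(err) <= tol:
            return code               # close enough: hand over to the CDR loop
        code += -1 if err > 0 else 1  # too fast: step down; too slow: step up
        code = min(max(code, 0), COARSE_MAX)
    raise RuntimeError("coarse calibration did not converge")

code = calibrate(6_250_000_000)
print(code, dco_frequency(code))
```

Integer counting keeps the comparison exact; the frequency resolution of the check is set by the window length, here $4 F_{ref} / N_{ref} \approx 0.39$ MHz of DCO frequency per count under the assumed values.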

The circuit is incorporated in the synthesized digital block. The power overhead of this calibration procedure is negligible: the synthesized circuit consists only of simple counters and comparators and consumes approximately 0.75 mW.

#### 5.6 AD-CDR ASIC Layout

The AD-CDR is fabricated in a 40 nm Low Power CMOS technology. The low-power flavor is not favorable for a high-speed circuit, but was selected based on the available tape-outs. The drawn layout is shown in Fig. 5.10(a), while a photograph of the (pad-limited) manufactured AD-CDR Application Specific Integrated Circuit (ASIC) is shown in Fig. 5.10(b). The legend for the annotations in both figures is shown in Fig. 5.10(c). The complete chip measures about $1.85 \text{ mm} \times 1.85 \text{ mm}$.

A zoomed-in photo of the AD-CDR core together with an annotated layout view is shown in Fig. 5.11. The core area of the AD-CDR is only $0.050 \text{ mm}^2$.



Figure 5.10: The complete AD-CDR ASIC: (a) the drawn layout, (b) a photo of the manufactured ASIC and (c) the legend.



### References

- [1] B. Nikolic, V.G. Oklobdzija, V. Stojanovic, Wenyan Jia, James Kar-Shing Chiu, and M. Ming-Tak Leung, "Improved sense-amplifier-based flip-flop: design and measurements," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 6, pp. 876–884, jun 2000.
- [2] A.G.M. Strollo, D. De Caro, E. Napoli, and N. Petra, "A novel high-speed sense-amplifier-based flip-flop," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 13, no. 11, pp. 1266–1274, nov 2005.
- [3] HeungJun Jeon and Yong-Bin Kim, "A CMOS low-power low-offset and high-speed fully dynamic latched comparator," in 23rd IEEE International SOC Conference, sep 2010, pp. 285–288.
- [4] Massimo Brandolini, Young J. Shin, Karthik Raviprakash, Tao Wang, Rong Wu, Hemasundar Mohan Geddada, Yen-Jen Ko *et al.*, "A 5 GS/s 150 mW 10 b SHA-Less Pipelined/SAR Hybrid ADC for Direct-Sampling Systems in 28 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 12, pp. 2922–2934, dec 2015.
- [5] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, "A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture," *IEEE Journal of Solid-State Circuits*, vol. 28, no. 4, pp. 523–527, apr 1993.
- [6] Schekeb Fateh, Philipp Schonle, Luca Bettini, Giovanni Rovere, Luca Benini, and Qiuting Huang, "A Reconfigurable 5-to-14 bit SAR ADC for Battery-Powered Medical Instrumentation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 11, pp. 2685–2694, nov 2015.
- [7] M. Verbeke, P. Rombouts, X. Yin, and G. Torfs, "Inverse Alexander phase detector," *Electronics Letters*, vol. 52, no. 23, pp. 1908–1910, nov 2016.
- [8] Asad Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 8, pp. 1803–1816, aug 2006.
- [9] Sang-Hyeok Chu, Woorham Bae, Gyu-Seob Jeong, Sungchun Jang, Sungwoo Kim, Jiho Joo, Gyungock Kim *et al.*, "A 22 to 26.5 Gb/s Optical Receiver With All-Digital Clock and Data Recovery in a 65 nm CMOS Process," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2603–2612, nov 2015.

## Part III

# **Results and Conclusions**

# Experimental Results

This chapter presents the results of the measurements performed on the All-Digital Clock and Data Recovery (AD-CDR) Application Specific Integrated Circuit (ASIC). It starts by discussing the electrical and optical experimental test setups. Subsequently, it gives the measurement results of the AD-CDR ASIC in continuous and burst mode operation. Additionally, the measured performance of the conventional and the Inverse Alexander Phase Detector (PD) is compared.

#### 6.1 Measurement Setups

To test the fabricated AD-CDR, a dedicated high-speed Printed Circuit Board (PCB) (Fig. 6.1) was developed to provide the required inputs to the chip and to observe the necessary outputs. The test board was designed such that the AD-CDR ASIC could be wire bonded directly to the PCB (Fig. 6.2), avoiding the interconnection parasitics of a standard package, which would degrade the signal quality. Additionally, the transmission lines on the PCB are matched to the 50 $\Omega$ input impedance of the input buffers of the AD-CDR.

Unfortunately, the received chip samples (all from the same wafer) were apparently from a slow process corner. This forced us to increase the Digitally Controlled Oscillator (DCO) supply voltage to 1.15 V (instead of the nominal value of 1.1 V). For the Bang-Bang Phase Detector (BB-PD) and synthesized logic, we had to increase the supply voltage to 1.25 V. All the measurements reported in this chapter were done with these increased supply voltages.

Figure 6.1: The AD-CDR testboard.

Figure 6.2: A photo of the implemented chip wire bonded on a high-speed PCB.

#### 6.1.1 Electrical Test Setup

For all electrical tests, except the measurement of the samplers' sensitivity, a clock generator and a bit pattern generator were used (Fig. 6.3). The clock generator creates a 25 GHz clock signal that provides the necessary timing information to the bit pattern generator. Additionally, this clock generator can add sinusoidal jitter to the generated clock signal in order to perform jitter tolerance measurements. The bit pattern generator outputs a 25 Gb/s data signal with a maximum voltage swing of $630 \,\mathrm{mV_{pp}}$. This bit pattern generator is connected directly through the PCB to the input of the AD-CDR.

For the samplers' sensitivity measurement (see Section 6.2.3), an arbitrary waveform generator (Keysight M8195A) was used to generate the input data, because this instrument allows a finer control of the delay between the input data and the externally applied clock phases<sup>1</sup>.



Figure 6.3: The electrical test setup with AD-CDR.

<sup>1</sup>These externally applied clock phases are test signals: they are only necessary for the measurements of the samplers' sensitivity and are not used during normal operation.

As shown in Fig. 6.3, the recovered clock of the AD-CDR was measured by connecting the output directly to a sampling scope or spectrum analyzer. The recovered data was recorded by connecting the output to an error analyzer or sampling scope.

#### 6.1.2 Optical Test Setup

The experimental setup shown in Fig. 6.4 is used to perform optical measurements. In this setup, a clock generator creates a 25 GHz clock signal that provides the necessary timing information for a bit pattern generator. This pattern generator outputs a 25 Gb/s data signal with a voltage swing that varies from $300 \text{ mV}_{\text{pp}}$ to $630 \text{ mV}_{\text{pp}}$. Either a Pseudo Random Bit Sequence (PRBS) with a length of $2^9 - 1$ (PRBS9) or a user-defined (burst) packet is generated and applied to a 40 Gb/s LiNbO$_3$ Mach-Zehnder Modulator (MZM). This MZM modulates the light of a laser with the generated data. The laser operates at a wavelength of 1550 nm and the output power is set to 6 dBm.

To mimic realistic Passive Optical Network (PON) applications for the Clock and Data Recovery (CDR), a 40 Gb/s optical receiver with an RF bandwidth of 30 GHz is connected back-to-back with the modulator. This receiver comprises a PIN photodiode and a TransImpedance Amplifier (TIA), and converts the optical signal back to an electrical data stream. This data stream is first amplified before it is applied to the AD-CDR. The amplifier has a gain of 11 dB and a bandwidth of 67 GHz and is required to increase the output swing of the optical receiver to about 370 mV<sub>pp</sub>, so that the CDR can operate correctly. Subsequently, our AD-CDR recovers the timing information and the data from the input signal. The recovered clock is observed with a spectrum analyzer, while the recovered data is recorded by an error analyzer for the Bit-Error Rate (BER) measurements, by a sampling scope for the eye diagram measurements and by a real-time scope for the settling time measurements.

Please note that in a complete receiver, an adaptive gain control block is typically incorporated in the TIA to keep the input swing of the CDR constant during operation [1]. In our measurement setup, this was not needed because all data is generated by one source.





#### 6.2 Electrical Tests in Continuous Mode

#### 6.2.1 Functional Tests

First, basic functional tests were performed on our prototype at 3 different data rates: 25 Gb/s, 20 Gb/s and 12.5 Gb/s. For this, a $2^{31}-1$ Pseudo Random Bit Sequence (PRBS31) was applied to the input of our AD-CDR. Note that with this PRBS31 test sequence, the PD output, after the 16-times subsampling built into our circuit, contains idle patterns with a length $l$ equal to 31 (see Section 3.6). Correct recovery of a PRBS31 sequence therefore proves the robustness against very long idle sequences.
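A property of maximal-length sequences underlies this statement: decimating a PRBS by a power of two (here 16, as in the circuit) yields a cyclically shifted copy of the same sequence, so its longest run of identical symbols, $n$ for a PRBS of order $n$, survives the subsampling. The sketch below checks this on the shorter PRBS7 to keep it fast; the LFSR uses taps at stages 7 and 6 (the common PRBS7 polynomial $x^7 + x^6 + 1$), and the all-ones seed and helper names are choices of this sketch, not of the measurement equipment.

```python
from typing import List

def prbs(order: int, tap: int, length: int) -> List[int]:
    """Fibonacci LFSR with taps at stages `order` and `tap` (all-ones seed)."""
    mask = (1 << order) - 1
    state = mask
    bits = []
    for _ in range(length):
        new = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        bits.append(new)
        state = ((state << 1) | new) & mask
    return bits

def longest_run(bits: List[int]) -> int:
    """Longest run of identical symbols in a periodic (cyclic) sequence."""
    n = len(bits)
    starts = [i for i in range(n) if bits[i] != bits[i - 1]]
    if not starts:
        return n
    s = starts[0]
    rotated = bits[s:] + bits[:s]  # start at a run boundary so no run wraps around
    best = cur = 1
    for a, b in zip(rotated, rotated[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

def as_str(bits: List[int]) -> str:
    return "".join(map(str, bits))

ORDER, TAP, FACTOR = 7, 6, 16  # PRBS7 stands in for PRBS31; FACTOR as in the circuit
period = (1 << ORDER) - 1      # 127
seq = prbs(ORDER, TAP, period)
decimated = [seq[(FACTOR * i) % period] for i in range(period)]

print(as_str(decimated) in as_str(seq + seq))    # decimated copy is a cyclic shift
print(longest_run(seq), longest_run(decimated))
```

The same argument applies to PRBS31, whose longest run of identical bits is 31, which is why the subsampled phase information still exhibits idle sequences of that length.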

At  $25 \,\mathrm{Gb/s}$ , the CDR core without input and output buffers has a power consumption of  $46 \,\mathrm{mW}$  of which  $11 \,\mathrm{mW}$  is dissipated by the samplers, retiming block and subsampling block,  $4 \,\mathrm{mW}$  is consumed by the digital block and  $31 \,\mathrm{mW}$  is used for the DCO. The power dissipation at  $20 \,\mathrm{Gb/s}$  and  $12.5 \,\mathrm{Gb/s}$  is respectively  $38 \,\mathrm{mW}$  and  $23 \,\mathrm{mW}$ .
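The power figures translate directly into energy per bit, the usual figure of merit for CDRs. The short sketch below only uses the powers and data rates listed above; the dictionary layout is illustrative.

```python
# Measured core power (mW) versus data rate (Gb/s), as listed in the text
measurements = {25.0: 46.0, 20.0: 38.0, 12.5: 23.0}

def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps

for rate, power in measurements.items():
    print(f"{rate:5.1f} Gb/s: {energy_per_bit_pj(power, rate):.2f} pJ/bit")

# Breakdown at 25 Gb/s (mW)
breakdown_25g = {"samplers/retiming/subsampling": 11.0, "digital": 4.0, "DCO": 31.0}
dco_share = breakdown_25g["DCO"] / sum(breakdown_25g.values())
print(f"DCO share at 25 Gb/s: {dco_share:.0%}")
```

The resulting 1.84 pJ/bit at 25 Gb/s matches the 1.8 listed for this work in the comparison table, and roughly two thirds of the power is spent in the DCO.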

Next, a batch of BER measurements was performed. The full data stream is available as 4 parallel channels at quarter rate, but due to equipment limitations, the BER could only be measured on 1 of the 4 channels at a time. All measurements reported below were done in this configuration.

In a typical measurement, the AD-CDR was operated over a time span of 15 minutes and the bit errors within this time frame were collected. These measurements consistently resulted in error-free operation of the AD-CDR at 20 Gb/s and 12.5 Gb/s. At 25 Gb/s, a BER of $3.5 \cdot 10^{-13}$ was measured, well within the error-correction capabilities of most applications [2].
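To put these numbers in perspective, the sketch below estimates how many bits one 15-minute run checks on a single quarter-rate channel and what BER bound an error-free run supports. The zero-error bound $-\ln(1-\mathrm{CL})/N$ is the standard Poisson approximation, not a figure reported in the text, and the assumption that every run lasted exactly 15 minutes is ours.

```python
import math

def bits_checked(line_rate: float, minutes: float = 15.0, channels: int = 4) -> float:
    """Bits compared by the error analyzer on one quarter-rate channel."""
    return line_rate / channels * minutes * 60.0

def ber_upper_bound(n_bits: float, confidence: float = 0.95) -> float:
    """Zero-error BER bound, -ln(1 - CL) / N, from the Poisson approximation."""
    return -math.log(1.0 - confidence) / n_bits

for rate in (25e9, 20e9, 12.5e9):
    n = bits_checked(rate)
    print(f"{rate / 1e9:5.1f} Gb/s: {n:.3e} bits, BER < {ber_upper_bound(n):.1e} (95 % CL, if error-free)")

expected_errors_25g = 3.5e-13 * bits_checked(25e9)  # roughly 2 errors in one run
print(f"errors implied at 25 Gb/s: {expected_errors_25g:.1f}")
```

Under these assumptions, the 25 Gb/s figure corresponds to only about two errors in one run, and an error-free 15-minute run at 20 Gb/s bounds the BER below roughly $7 \cdot 10^{-13}$ at 95 % confidence; longer runs would be needed to claim lower values.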

In the remainder of this section, the performance of the DCO, the PD (including the experimental comparison of the conventional and Inverse Alexander PD) and the AD-CDR are discussed in more detail.

#### 6.2.2 Digitally Controlled Oscillator Operation

The DCO can be driven independently of the other blocks. This allows the DCO to be characterized for different current, coarse tuning and fine tuning settings.

In Fig. 6.5 the DCO frequency characteristic is shown. The x-axis represents the 6-bit resistor coarse tuning word concatenated with the 5-bit integral path fine tuning word, which results in 2048 possible configurations. The measurement was repeated over multiple current settings, ranging from current setting '2' to '15' (for the lowest current settings the results were not meaningful). Fig. 6.5 demonstrates that the DCO covers a frequency range from 2.73 GHz to 8.95 GHz, which corresponds to a data rate range from 10.92 Gb/s to 35.8 Gb/s.

Figure 6.5: The free running frequency of the DCO with (a) an overview of the complete frequency range and (b) a detail around 6.25 GHz.
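The x-axis mapping and the frequency-to-data-rate conversion can be written out as follows. That the coarse word occupies the upper bits of the concatenated index is an assumption of this sketch, as is the function name; the range values are the measured ones quoted above.

```python
COARSE_BITS, FINE_BITS = 6, 5

def tuning_index(coarse: int, fine: int) -> int:
    """Hypothetical x-axis position: coarse word (MSBs) concatenated with fine word (LSBs)."""
    if not (0 <= coarse < 1 << COARSE_BITS and 0 <= fine < 1 << FINE_BITS):
        raise ValueError("tuning word out of range")
    return (coarse << FINE_BITS) | fine

n_configs = 1 << (COARSE_BITS + FINE_BITS)  # 2048 configurations per current setting

f_min_ghz, f_max_ghz = 2.73, 8.95                 # measured DCO range
rate_min, rate_max = 4 * f_min_ghz, 4 * f_max_ghz  # quarter rate: 4 bits per DCO cycle
print(n_configs, f"{rate_min:.2f}-{rate_max:.1f} Gb/s")
```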

A detail of the characteristic around 6.25 GHz, which is the quarter-rate oscillation frequency for 25 Gb/s input data, is shown in Fig. 6.5(b). In this figure the influence of the different settings is more visible: each color/symbol corresponds to a different current setting. Different line segments of the same color have a different coarse tuning value, and all frequency points within one line segment have a different fine tuning value.

The DCO was designed such that, for every coarse transition, the output frequency ranges of the two adjacent settings overlap. If we now focus on, e.g., the rightmost (dark blue) current setting, we note that this is the case for some coarse transitions. However, for other coarse transitions there is an undesired frequency gap. This means that, for a fixed current setting, some oscillation frequencies cannot be generated by changing only the coarse and fine tuning settings. This issue arises from underestimated parasitics. Fortunately, this problem was anticipated and can be circumvented by using the coarse current tuning, so that the desired frequency range is still completely covered.



Figure 6.6: The gain of the DCO $K_{DCO}$ at 6.25 GHz.

The measured DCO gain $K_{DCO}$ at 6.25 GHz for the different current settings is shown in Fig. 6.6. $K_{DCO}$ is about 1.7 MHz per Least Significant Bit (LSB) for high current settings and increases to 2.3 MHz per LSB for lower current settings. This means that the DCO quantization step is quite coarse. The measurements reported below are performed with current setting 12.



Figure 6.7: The supply sensitivity at 6.25 GHz.

The DCO supply sensitivity at 6.25 GHz is shown in Fig. 6.7; it equals 3.3 GHz/V. Due to this high supply sensitivity, the phase noise of the DCO is degraded: e.g. at a frequency offset of 10 MHz from the carrier, the measured phase noise is -95 dBc/Hz (see the dotted line in Fig. 6.11), whereas post-layout simulation predicted only -110 dBc/Hz at the same offset. We attribute this deterioration to supply noise, which the high supply sensitivity converts into excessive phase noise.
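Relating the measured gain and supply sensitivity to each other makes both observations concrete. This short sketch only uses the measured values quoted in this section; the variable names are illustrative.

```python
F0 = 6.25e9                   # quarter-rate frequency for 25 Gb/s [Hz]
K_DCO_RANGE = (1.7e6, 2.3e6)  # measured DCO gain [Hz/LSB], high and low current
K_SUPPLY = 3.3e9              # measured supply sensitivity [Hz/V]

for k_dco in K_DCO_RANGE:
    print(f"{k_dco / 1e6:.1f} MHz/LSB = {k_dco / F0 * 1e6:.0f} ppm of the carrier")

# Supply change that shifts the DCO by as much as one fine-tuning LSB
v_per_lsb = K_DCO_RANGE[0] / K_SUPPLY
print(f"{v_per_lsb * 1e3:.2f} mV of supply change ~ one 1.7 MHz LSB")
```

One LSB corresponds to roughly 270 to 370 ppm of the carrier, and a supply variation of only about half a millivolt already shifts the frequency by a full fine-tuning step, which illustrates why supply noise dominates the measured phase noise.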

#### 6.2.3 Phase Detector Operation

To determine the performance of the PD, the sensitivity of the samplers is measured. This sensitivity is defined as the time span in which the input data is sampled correctly by the samplers. The measurement is performed by applying an external quarter-rate clock signal together with the input data to the AD-CDR. For this measurement, a $2^{7}-1$ Pseudo Random Bit Sequence (PRBS7) at 25 Gb/s with a rise time of 0.25 UI is applied. The internal DCO is bypassed such that the data is sampled by the external clock. By sweeping the time difference between the external clock and the input data, the BER was determined for each time difference; the resulting bathtub curve is shown in Fig. 6.8. The bathtub curve indicates that the BER stays below $10^{-12}$ over a time span of 18.8 ps out of a bit period of 40 ps.



Figure 6.8: The sensitivity of the PD with a PRBS7 input data at  $25 \,\mathrm{Gb/s}$ .

#### 6.2.4 Comparison Conventional and Inverse Alexander PD

To facilitate the experimental comparison between the conventional and Inverse Alexander PD, our prototype circuit was designed such that it can be configured to operate with either the conventional or the Inverse Alexander PD. This is done by switching the sign of the control loop of the CDR in the Digital Loop Filter (DLF). Furthermore, the subsample factor N can be set to 16 (the nominal case) or to 32 (a test mode). For these cases, comparative BER measurements were performed. A 25 Gb/s PRBS7 was applied to the CDR and jitter was intentionally added to the input data stream. For the jitter, Gaussian pseudo-white noise with a bandwidth of 80 MHz (the equipment limit) was used. The jitter level was varied and the CDR was operated over a long time until a sufficient number of bit errors was collected to obtain a reasonably accurate estimate of the bit error rate. The results are summarized in Fig. 6.9. When interpreting the curves, note that at high jitter levels the CDR occasionally loses synchronism (due to cycle slips). This happened in each of the considered configurations but, as the figure shows, much earlier for the conventional PD than for the Inverse Alexander PD.

From Fig. 6.9, we can conclude that the BER performance of both the conventional and the Inverse Alexander PD degrades when the subsample factor increases from N = 16 (nominal value) to N = 32 (test case). For N = 32, the conventional PD was in fact not functional at all. The figure also shows that, because of subsampling and non-idealities, the Inverse Alexander PD greatly outperforms the conventional Alexander PD: when the BER is compared at the same jitter level, the improvement cannot be quantified exactly but is definitely above a factor of $10^5$. When the jitter levels at which a given BER occurs are compared, the improvement is about a factor of 1.9.

Figure 6.9: The measured BER for the conventional and the Inverse Alexander phase detector with a PRBS7 input data sequence at 25 Gb/s: (a) with a subsample factor N = 16 and (b) with a subsample factor N = 32. (Digital Loop Filter settings: $K_p = 5$ and $K_i = 2^{-7}$).

Moreover, the phase noise of the recovered clock is compared between the conventional and Inverse Alexander phase detector for different subsample factors (Fig. 6.10). In all cases, a PRBS31 data sequence at 25 Gb/s was applied to the input of the CDR and the digital loop filter parameters were held constant. As predicted in Section 4.2.1, the Inverse Alexander phase detector introduces less noise, which leads to lower phase noise than the conventional Alexander phase detector for the same subsample factor. However, when the subsample factor is doubled, additional aliasing occurs, which increases the in-band phase noise by approximately 3 dB for both the conventional and the Inverse Alexander phase detector (Section 3.5.2).
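The approximately 3 dB figure is what one obtains if the aliased in-band noise power is assumed, as a simplification for this note, to scale linearly with the subsample factor $N$. Doubling $N$ then doubles that noise power:

$$
\Delta\mathcal{L} = 10\log_{10}\!\left(\frac{2N}{N}\right)\,\mathrm{dB} = 10\log_{10}2\,\mathrm{dB} \approx 3.01\,\mathrm{dB}
$$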



Figure 6.10: The phase noise of the recovered clock with a PRBS31 input data sequence at 25 Gb/s: Comparison between Alexander and Inverse Alexander PD for different subsample factors (i.e. N = 16 and N = 32). (Digital Loop Filter settings:  $K_p = 5$  and  $K_i = 2^{-7}$ ).

#### 6.2.5 All-Digital Clock and Data Recovery Operation

For the final continuous mode electrical AD-CDR operation measurements, the standard operation mode (with Inverse Alexander PD and subsample factor N=16) was again selected.



Figure 6.11: The phase noise of the recovered clock with a PRBS31 input data sequence at 25 Gb/s: Sweep  $K_p$ .

The closed-loop phase noise of the recovered clock for different gain settings is shown in Fig. 6.11, next to the phase noise of the free-running oscillator. Here, a PRBS31 data sequence at 25 Gb/s is applied to the input of the AD-CDR and the phase noise of the quarter-rate recovered clock is captured. The figure shows that increasing the proportional gain $K_p$ increases the bandwidth of the AD-CDR. This directly reduces the settling time of the AD-CDR during burst mode operation, but at the cost of less jitter rejection. Fig. 6.11 also shows that peaking starts to occur when the ratio of the proportional gain $K_p$ to the integral gain $K_i$ decreases. Furthermore, outside the loop bandwidth, the phase noise of the closed-loop system approaches the phase noise of the free-running clock.

The phase noise measurement of the recovered clock with DLF settings $K_p = 5$ and $K_i = 2^{-7}$ is compared in Fig. 6.12 to the corresponding simulation and Linear Time-Variant (LTV) analysis result (discussed in Section 3.7.2). The figure shows that, although the AD-CDR is a very complex system, the phase-domain simulation and calculation match the measurement well. Some discrepancies remain, because the phase-domain model is only an approximation of the voltage-domain AD-CDR.



Figure 6.12: The comparison of the measured phase noise of the recovered clock with the simulated and calculated result of Section 3.7.2. (Digital Loop Filter settings:  $K_p = 5$  and  $K_i = 2^{-7}$ ).

In the time domain, the closed-loop phase noise results in 1.455 ps RMS jitter on the recovered clock, as shown in Fig. 6.13(a). The corresponding measured eye diagram of the recovered data is depicted in Fig. 6.13(b); the RMS jitter of the recovered data is approximately 3.71 ps.

The capture range of the AD-CDR was also measured and is equal to 248 MHz. This corresponds to the tuning range in normal operation and is sufficiently large to allow correct operation from an initial calibration that aligns the DCO frequency within  $\pm$  30 MHz of the desired quarter-rate frequency.

Moreover, the jitter tolerance of the AD-CDR is shown in Figs. 6.14(a) and (b) for different proportional gains $K_p$ and integral gains $K_i$, respectively. In both figures, the Synchronous Digital Hierarchy (SDH) Synchronous Transport Module (STM)-256 jitter tolerance mask and the jitter tolerance of [8] and [9] are added for comparison. These jitter tolerance curves are measured by applying a PRBS7 input data sequence at 25 Gb/s with sinusoidal jitter. Each measurement point is obtained by increasing the jitter level until the BER exceeds $10^{-12}$. As shown in the figures, the jitter tolerance can be tuned by adapting the digital loop parameters.





Figure 6.13: Persistence plots of (a) the recovered (differential) clock (jitter < 1.5 ps<sub>rms</sub>) and (b) the recovered data (jitter  $\approx 3.71 \text{ ps}_{rms}$ ).



Figure 6.14: The jitter tolerance with a PRBS7 input data sequence at 25 Gb/s: (a) Sweep  $K_p$  and (b) Sweep  $K_i$ .

| Ref.      | CMOS technology [nm] | Data rate [Gb/s] | Relative frequency range [%] | Type              | Oscillator type | Power [mW] | Power eff. [pJ/bit] | Area [mm<sup>2</sup>] | Phase noise @ 1 MHz [dBc/Hz] | Jitter RCLK [ps<sub>rms</sub>] | Jitter tolerance @ 10 MHz [UI<sub>pp</sub>] | Satisfies STM-256 jitter tolerance mask | Reference clk [GHz] | Demux ratio | Equalization |
|-----------|----------------------|------------------|------------------------------|-------------------|-----------------|------------|---------------------|-----------------------|------------------------------|--------------------------------|---------------------------------------------|-----------------------------------------|---------------------|-------------|--------------|
| This work | 40                   | 12.5-25          | 50                           | AD-CDR            | Ring-DCO        | 46         | 1.8                 | 0.050                 | -105                         | 1.46                           | 0.6                                         | Yes                                     | No                  | 1:4         | No           |
| [9]       | 28                   | 22.5-32          | 29.7                         | AD-CDR            | DAC + Ring-VCO  | 102        | 3.2                 | 0.52                  | -                            | -                              | 0.35                                        | Yes                                     | No                  | 1:32        | CTLE, 1-tap DFE |
| [8]       | 65                   | 22-26.5          | 17                           | AD-CDR            | LC-QDCO         | 218        | 8.2                 | 0.46                  | -115                         | 1.28                           | 0.16                                        | No                                      | No                  | 1:64        | No           |
| [7]       | 90                   | 6-44             | 86.4                         | Digital CDR       | PI              | 230        | 5.7                 | 0.2                   | -                            | 0.249                          | 0.35                                        | No                                      | 6-11                | 1:16        | No           |
| [6]       | 28                   | 28               | -                            | Digital CDR       | PI              | 107        | 3.8                 | 0.52                  | -                            | -                              | 0.3                                         | No                                      | 14                  | 1:4         | CTLE         |
| [5]       | 65                   | 1-16             | 93.8                         | Digital CDR (\*\*) | PI             | 89         | 5.5                 | 0.088                 | -                            | -                              | 0.4                                         | No                                      | 8-16                | 1:16        | CTLE         |
| [4]       | 40                   | 19-27            | 29.6                         | Digital CDR       | QR-VCO + PI     | 85 (\*)    | 3.1                 | 0.09                  | -96                          | 1.66                           | 0.5                                         | No                                      | 0.1                 | 1:2         | CTLE, 2-tap DFE |
| [3]       | 28                   | 40               | -                            | Digital CDR       | LC-VCO + PI     | 927 (\*)   | 23.2                | 0.81                  | -105                         | 0.170                          | 0.3                                         | No                                      | -                   | 1:64        | CTLE, 17-tap DFE, 2-tap transversal filter, 3-tap sampled FFE |

(\*) Power consumption of the complete receiver.
(\*\*) In [5], the design is described as an AD-CDR. However, according to our definition of a (PLL-based) AD-CDR, it is a digital CDR.

Table 6.1: The comparison of digital CDRs.

For example, the jitter tolerance can easily be set such that it satisfies the STM-256 mask and exceeds the jitter tolerance of [8] and [9]. Note that at the lower jitter frequencies, the jitter tolerance is better than indicated in the figures, since even the highest jitter amplitude our equipment can generate still results in a BER below $10^{-12}$.

All continuous-mode electrical test measurements of the AD-CDR are summarized in Table 6.1, which also compares the design with state-of-the-art digital CDRs. This summary shows that our design occupies the smallest area and has the highest power efficiency. Although the performance of the DCO is modest, and the phase noise and the jitter of the recovered clock are higher than in prior work, only our work and [9] satisfy the STM-256 jitter tolerance mask, as shown in Fig. 6.14. Finally, apart from [5] and [7], which have the unattractive requirement of a tunable, high-quality, multi-gigahertz reference clock, our design has the highest relative frequency range of the digital CDRs in Table 6.1.
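
Table 6.1 does not state how the relative frequency range is defined. The listed values appear consistent with $(f_{max} - f_{min})/f_{max}$, e.g. $(25 - 12.5)/25 = 50\,\%$ for this work. The short Python check below recomputes the entries that have a data-rate range under that assumed definition; the definition is our inference from the table, not a statement from the text.

```python
def relative_range_pct(f_min: float, f_max: float) -> float:
    """Assumed definition: tuning span normalized to the highest data rate."""
    return 100.0 * (f_max - f_min) / f_max

# (min rate, max rate) in Gb/s and the value listed in Table 6.1
table = {
    "This work": (12.5, 25.0, 50.0),
    "[9]": (22.5, 32.0, 29.7),
    "[8]": (22.0, 26.5, 17.0),
    "[7]": (6.0, 44.0, 86.4),
    "[5]": (1.0, 16.0, 93.8),
    "[4]": (19.0, 27.0, 29.6),
}
for ref, (f_lo, f_hi, listed) in table.items():
    print(f"{ref:10s} computed {relative_range_pct(f_lo, f_hi):5.1f} %, listed {listed} %")
```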

# 6.2.6 Describing Function Stability Verification

To evaluate the stability of the designed AD-CDR, the amount of noise at the input of the CDR that is needed to quench a limit cycle is determined. That is, the value of the threshold RMS input jitter  $\sigma_{in,th}$  (Section 3.3) is calculated and is compared to the applied noise at the input of the AD-CDR, which is extracted from the phase noise measurement.

First, the equations derived in Section 3.3 are adapted to the case of the Digital Loop Filter (Eq. (4.1)) used in this design. The discrete-time transfer function of the DLF is converted to a continuous-time transfer function using the Impulse-Invariant transformation method:

$$H_{DLF}(s) = \mathscr{L} \left\{ \mathcal{Z}^{-1} \left\{ H_{DLF}(z) \right\} \right\}$$
$$= \mathscr{L} \left\{ \mathcal{Z}^{-1} \left\{ K_p \, z^{-D_{K_p}} + K_i \, \frac{z^{-D_{K_i}}}{1 - z^{-1}} \right\} \right\}$$
$$\approx K_p \, \exp\left( -sD_{K_p}T_{dig} \right) + \frac{K_i}{sT_{dig}} \, \exp\left( -s\left( D_{K_i} - 0.5 \right) T_{dig} \right)$$
(6.1)

where  $K_p$  and  $K_i$  are the respective gains of the proportional path and integral path, and  $D_{K_p} = 2$  and  $D_{K_i} = 9$  are the corresponding delays.  $T_{dig} = \frac{1}{f_{dig}} = \frac{1}{1.5625 \text{ GHz}}$  represents the sampling period of the DLF.
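
As a sanity check of the approximation in Eq. (6.1), the Python sketch below evaluates the exact discrete-time DLF response on the unit circle, $z = e^{j\omega T_{dig}}$, and compares it with the continuous-time approximation, using the values quoted in this section ($K_p = 5$, $K_i = 2^{-7}$, $D_{K_p} = 2$, $D_{K_i} = 9$, $T_{dig} = 1/1.5625$ GHz). The evaluation frequencies are our choice; 30 MHz is picked only because it is close to the loop bandwidth reported for these settings in Fig. 6.11. Since the delay terms map exactly, the only error comes from approximating the accumulator $1/(1 - z^{-1})$, and it grows as $\omega T_{dig}$ approaches $\pi$.

```python
import cmath
import math

T_DIG = 1 / 1.5625e9  # DLF sampling period [s]
KP, KI = 5, 2 ** -7   # proportional and integral gains
D_KP, D_KI = 2, 9     # path delays in DLF clock cycles

def h_dlf_exact(w: float) -> complex:
    """H_DLF(z) = Kp z^-Dkp + Ki z^-Dki / (1 - z^-1) at z = exp(j w T_dig)."""
    z = cmath.exp(1j * w * T_DIG)
    return KP * z ** -D_KP + KI * z ** -D_KI / (1 - 1 / z)

def h_dlf_approx(w: float) -> complex:
    """Continuous-time approximation of Eq. (6.1) evaluated at s = j w."""
    s = 1j * w
    return (KP * cmath.exp(-s * D_KP * T_DIG)
            + KI / (s * T_DIG) * cmath.exp(-s * (D_KI - 0.5) * T_DIG))

errors = {}
for f in (1e6, 30e6, 300e6):
    w = 2 * math.pi * f
    errors[f] = abs(h_dlf_exact(w) - h_dlf_approx(w)) / abs(h_dlf_exact(w))
    print(f"{f / 1e6:6.0f} MHz: relative error {errors[f]:.1e}")
```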

The approximated continuous-time transfer function of the DLF is then combined with the transfer function of the DCO $H_{dco}(s)$ to form the transfer function of the linear block G(s). Note that for DLF settings $K_p = 5$ and $K_i = 2^{-7}$, and DCO gain $K_{dco} = 2\pi \cdot 1.8$ MHz/LSB, the zero of $H_{DLF}$ is assumed to lie well below the unity-gain frequency. As a result, the zero has little effect and can be neglected in the following calculations. The transfer function of the linear block G(s) is then given by:

$$G(s) = H_{DLF}(s) H_{dco}(s)$$

$$\approx K_p \exp\left(-sD_{K_p}T_{dig}\right) \frac{K_{dco}}{s}$$
(6.2)
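
The assumption that the zero of $H_{DLF}$ can be neglected is easy to check numerically. Ignoring the loop delays (our simplification), $H_{DLF}(s) \approx K_p + K_i/(sT_{dig})$, which places the zero at $\omega_z = K_i/(K_p T_{dig})$. The Python sketch below evaluates this for the settings in the text and compares it with the $\approx 30$ MHz loop bandwidth read from Fig. 6.11 for the same settings.

```python
import math

T_DIG = 1 / 1.5625e9  # DLF sampling period [s]
KP, KI = 5, 2 ** -7   # DLF settings used in the text
F_C = 30e6            # approximate loop bandwidth from Fig. 6.11 [Hz]

# Delay-free PI filter: Kp + Ki/(s*T) = Kp * (1 + w_z/s), w_z = Ki/(Kp*T)
w_zero = KI / (KP * T_DIG)
f_zero = w_zero / (2 * math.pi)

print(f"zero at {f_zero / 1e3:.0f} kHz, about {F_C / f_zero:.0f} times below f_c")
```

The zero lands in the sub-megahertz range, almost two decades below the loop bandwidth, which supports dropping it from G(s).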

Now, the oscillation frequency $\omega_s$, the maximum amplitude of the limit cycle $A_{e,max}$ and the threshold RMS input jitter $\sigma_{in,th}$ can be calculated using Eqs. (3.25), (3.27) and (3.31), respectively. Substituting the transfer function of the linear block G(s) (Eq. (6.2)) into these three equations yields:

$$\omega_s = \frac{\pi}{2} \frac{1}{D_{K_p} T_{dig}} = 1.2 \,\text{Grad/s}$$
(6.3)

$$A_{e,max} = \frac{4\alpha}{\pi} \frac{K_p K_{dco}}{\omega_s} = 30 \,\mathrm{mrad} \tag{6.4}$$

$$\sigma_{in,th} = \frac{1}{2} \sqrt{\frac{\pi}{2}} A_{e,max} = 19 \operatorname{mrad}$$
(6.5)

These results show that if the applied input noise exceeds 19 mrad, no limit cycle will occur. Conversely, if the input noise is smaller, a limit cycle with an oscillation frequency of $1.2 \,\mathrm{Grad/s}$ and a very small amplitude of at most $30 \,\mathrm{mrad}$ will occur.
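
The numbers in Eqs. (6.3) to (6.5) can be reproduced with a few lines of Python. The constant $\alpha$ is defined in Section 3.3 and its value is not restated here; $\alpha = 0.5$ is our assumption, chosen because it reproduces both Eq. (6.4) and the later input-noise estimate of Eq. (6.8). With it, the sketch gives $A_{e,max} \approx 29$ mrad and $\sigma_{in,th} \approx 18$ mrad; the quoted 19 mrad follows when the rounded value of 30 mrad is used in Eq. (6.5).

```python
import math

T_DIG = 1 / 1.5625e9         # DLF sampling period [s]
D_KP = 2                     # proportional-path delay [cycles]
KP = 5                       # proportional gain
K_DCO = 2 * math.pi * 1.8e6  # DCO gain [rad/s per LSB]
ALPHA = 0.5                  # ASSUMPTION: alpha from Section 3.3, value not given here

w_s = (math.pi / 2) / (D_KP * T_DIG)                   # Eq. (6.3)
a_e_max = 4 * ALPHA / math.pi * KP * K_DCO / w_s       # Eq. (6.4)
sigma_in_th = 0.5 * math.sqrt(math.pi / 2) * a_e_max  # Eq. (6.5)

print(f"w_s = {w_s / 1e9:.2f} Grad/s")
print(f"A_e,max = {a_e_max * 1e3:.1f} mrad")
print(f"sigma_in,th = {sigma_in_th * 1e3:.1f} mrad")
```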

Whether a limit cycle actually occurs is verified by calculating the applied input noise from the phase noise measurements. This is done by first determining the describing function gain $K_n$ of the BB-PD. In Fig. 6.11, the bandwidth $\omega_c$ with DLF settings $K_p = 5$ and $K_i = 2^{-7}$ is approximately $2\pi \cdot 30$ MHz.

The linearized closed loop transfer function H of the Linear Time-Invariant (LTI) model (Fig. 3.4) evaluated at frequency  $\omega_c$  is given by:

$$H \Big|_{\omega = \omega_c} = \frac{T}{1+T} \Big|_{\omega = \omega_c} = \frac{1}{2}$$

$$\Leftrightarrow T \Big|_{\omega = \omega_c} = K_n K_p \frac{K_{dco}}{\omega_c} = 1$$

$$\Leftrightarrow K_n = \frac{1}{K_p} \frac{\omega_c}{K_{dco}} = 3.3$$
(6.7)

where H, T and  $K_{dco}$  represent the closed loop transfer function, the loop gain and the gain of the DCO, respectively.

Using Eq. (3.29), the applied input noise can be calculated as:

$$\sigma_{in} \approx \sqrt{\frac{2}{\pi}} \frac{\alpha}{K_n} = 0.12 \,\mathrm{rad}$$
 (6.8)

The calculated input noise is more than six times larger than the threshold RMS input jitter $\sigma_{in,th}$. Therefore, there is sufficient noise present at the input to quench a limit cycle in the AD-CDR, resulting in a stable AD-CDR circuit.
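
This comparison can likewise be reproduced numerically. The Python sketch below follows Eqs. (6.7) and (6.8) with $\omega_c \approx 2\pi \cdot 30$ MHz from Fig. 6.11 and the 19 mrad threshold of Eq. (6.5), and again assumes $\alpha = 0.5$ (our assumption; $\alpha$ is defined in Section 3.3). The printed ratio shows the margin between the applied input noise and the threshold.

```python
import math

KP = 5                       # proportional gain
K_DCO = 2 * math.pi * 1.8e6  # DCO gain [rad/s per LSB]
W_C = 2 * math.pi * 30e6     # loop bandwidth from Fig. 6.11 [rad/s]
ALPHA = 0.5                  # ASSUMPTION: alpha from Section 3.3
SIGMA_IN_TH = 0.019          # threshold RMS input jitter, Eq. (6.5) [rad]

k_n = W_C / (KP * K_DCO)                          # Eq. (6.7)
sigma_in = math.sqrt(2 / math.pi) * ALPHA / k_n   # Eq. (6.8)
ratio = sigma_in / SIGMA_IN_TH

print(f"K_n = {k_n:.2f}, sigma_in = {sigma_in:.3f} rad, ratio = {ratio:.1f}")
print("limit cycle quenched" if sigma_in > SIGMA_IN_TH else "limit cycle possible")
```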

# 6.3 Electrical Setup Tests in Burst Mode

To evaluate the burst mode performance of the AD-CDR, 25 Gb/s packets starting with a "1010..." preamble are used. Because all the packets are generated from the same source (Fig. 6.3), a sufficiently long gap is required between two consecutive packets to ensure the AD-CDR is no longer in lock with the generator.

This is illustrated in Fig. 6.15 and Fig. 6.16, which show two packets separated by a gap of 10 ns and 41 ns, respectively. The top waveform displays an instantaneous sampled output stream, while the bottom row shows a persistence-mode view of the output, which superimposes multiple waveforms. With a gap of 10 ns the CDR clearly remains in lock, while after 41 ns it is out of lock. To ensure random phases of the incoming data with respect to the DCO of the CDR, a gap of $5.2 \,\mu\text{s}$ was used during the measurements. This long gap is only needed to stress the device during the experiments; in practice, the gap can be made arbitrarily small.

# 6.3.1 Frame Structure

In order to measure the settling time of the AD-CDR, a long preamble sequence was added in front of the packet. Because the input data is internally demultiplexed into four quarter-rate streams, the settling can easily be observed at the output of the device using a real-time oscilloscope: when the "1010..." preamble is demultiplexed by four, each output should stay either low or high. If any transition occurs during this preamble after settling, an error has occurred. The number of transitions at the beginning of the packet indicates how many packets are received: because the phase of the incoming packet is distributed randomly, there is an equal chance of receiving a 1 or a 0.

Figure 6.15: A packet in electrical burst mode measurements with a short gap (10.25 ns).

Figure 6.16: A packet in electrical burst mode measurements with a long gap (41 ns).

Figure 6.17: The frame structure: the outputs will stay either low or high during the preamble time when the CDR is settled.

Figure 6.18: A captured 6.25 Gb/s output stream in electrical burst mode measurements.
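
Why the preamble makes settling visible can be illustrated with a short Python sketch. Round-robin demultiplexing by four (each output takes every fourth bit; this ordering is our assumption about the demultiplexer) turns the "1010..." preamble into four constant streams. Which outputs carry 1s and which carry 0s depends only on the lock phase, and any transition on an output reveals a slipped or wrong bit.

```python
def demux(bits, n=4):
    """Round-robin demultiplex a serial bit list into n parallel streams."""
    return [bits[k::n] for k in range(n)]

def has_transition(stream):
    """True if the stream is not constant."""
    return any(a != b for a, b in zip(stream, stream[1:]))

preamble = [1, 0] * 32  # "1010..." preamble at the full data rate

for shift in range(4):  # the lock phase is random from packet to packet
    lanes = demux(preamble[shift:])
    print(shift, [lane[0] for lane in lanes],
          any(has_transition(lane) for lane in lanes))

# A single slipped bit (e.g. during settling) shows up as a transition.
corrupted = preamble[:8] + preamble[9:]
print(any(has_transition(lane) for lane in demux(corrupted)))
```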

The packet structure and the demultiplexed output are schematically illustrated in Fig. 6.17. The packet consists of a $2^{14}$ bit ($\approx 16$ kbit) preamble, a 16 bit long delimiter used to align the 4 output data streams and a $2^{20}$ bit ($\approx 1$ Mbit) payload. The gap between two packets is $2^{17}$ bit ($\approx 131$ kbit), which corresponds to 5.2 µs at 25 Gb/s. A captured output packet is shown in Fig. 6.18. The long preamble was only used to verify that no errors occur during burst mode operation after settling. In practice, the preamble length can be limited to the worst-case settling time.
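
The field lengths translate into durations at 25 Gb/s as in the Python sketch below, which simply divides each bit count by the line rate; the field names are ours.

```python
BIT_RATE = 25e9  # line rate [b/s]

# Field lengths of the test frame, in bits (from the frame description)
FRAME = {
    "preamble": 2 ** 14,
    "delimiter": 16,
    "payload": 2 ** 20,
    "gap": 2 ** 17,
}

def duration_ns(n_bits: int, bit_rate: float = BIT_RATE) -> float:
    """Duration of n_bits at the given bit rate, in nanoseconds."""
    return n_bits / bit_rate * 1e9

for name, n_bits in FRAME.items():
    print(f"{name:9s} {n_bits:8d} bit {duration_ns(n_bits):10.2f} ns")
```

The preamble alone lasts about 655 ns, more than an order of magnitude longer than the settling times reported later in this chapter, which is consistent with the remark that it was deliberately oversized.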

# 6.3.2 Settling Time

The AD-CDR aligns the phase of its recovered clock using a wide-band PLL structure. Because this is a closed-loop system, the settling time is strongly related to the loop bandwidth. The settling time also depends on the relative phase of the incoming data stream and on the phase noise generated by the DCO. As a result, the settling time has both a deterministic and a stochastic component.

The settling time of the AD-CDR is measured by recording when a transition occurs in the subsampled preamble at the output of the AD-CDR. Fig. 6.19 shows a maximum observed settling time of 35 ns over 2 million transmitted packets.
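
One way to turn a captured quarter-rate output into a settling-time sample is to locate the last transition within the preamble, as in the Python sketch below. It assumes that the capture starts at the first bit of the packet and that the output runs at 6.25 Gb/s (one bit every 160 ps); the function name and the input format are illustrative, not the actual post-processing used for Fig. 6.19.

```python
LANE_RATE = 6.25e9  # quarter-rate output bit rate [b/s]

def settling_time(preamble_lane, lane_rate=LANE_RATE):
    """Time [s] after which one output stays constant during the preamble.

    `preamble_lane` holds the bits captured on one quarter-rate output,
    starting at the first bit of the packet. Returns 0.0 if no transition
    occurs, i.e. the CDR happened to start in the right phase.
    """
    last = 0
    for i in range(1, len(preamble_lane)):
        if preamble_lane[i] != preamble_lane[i - 1]:
            last = i
    return last / lane_rate

# Hypothetical capture: the output toggles twice before settling to 1.
capture = [1, 0, 1] + [1] * 100
print(f"{settling_time(capture) * 1e12:.0f} ps")
```

Repeating this over many packets and taking the maximum gives a worst-case figure of the kind plotted in Fig. 6.19.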



Figure 6.19: The AD-CDR is always in lock after 35 ns for 2 million packets with setting  $K_p = 7$  and  $K_i = 2^{-9}$ .

# 6.4 Optical Setup Tests in Continuous Mode

# 6.4.1 Functional Tests

For the optical setup tests, the power consumption and error-free operation were evaluated in continuous mode. A $25 \,\mathrm{Gb/s}$ PRBS9 input sequence was generated by the bit pattern generator. The corresponding eye diagram at the input of the AD-CDR is shown in Fig. 6.20; it has an amplitude of $370 \,\mathrm{mV_{pp}}$ and an RMS jitter of 2.6 ps. The conversion to and from the optical domain introduces additional noise and jitter, which degrade the desired signal.

Repeated measurements consistently showed that our AD-CDR operates error-free for more than 15 min while consuming only 46 mW. Error-free operation was verified by analyzing one of the quarter-rate outputs with a BER tester. The eye diagram of the CDR's output is shown in Fig. 6.21; the jitter at the output of the CDR is $3.73 \text{ ps}_{\text{RMS}}$.
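
Error-free operation over a finite time only bounds the BER. A common rule of thumb, assuming independent bit errors, is that observing zero errors in $N$ bits bounds the BER by $-\ln(1 - CL)/N$ at confidence level $CL$. The Python sketch below applies it to one 6.25 Gb/s quarter-rate output monitored for 15 min; the 95% confidence level is our choice, not a figure from the measurements.

```python
import math

def ber_upper_bound(n_bits: float, confidence: float = 0.95) -> float:
    """Upper BER bound after receiving n_bits with zero errors.

    Zero-error Poisson approximation: BER <= -ln(1 - CL) / N,
    valid when bit errors are independent.
    """
    return -math.log(1.0 - confidence) / n_bits

n_bits = 6.25e9 * 15 * 60  # one quarter-rate output observed for 15 min
print(f"BER < {ber_upper_bound(n_bits):.1e} at 95 % confidence")
```

Under these assumptions, 15 min without errors on a single output is enough to claim a BER below $10^{-12}$ for that output.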



Figure 6.20: The eye diagram of the input signal of the CDR (PRBS9 @  $25 \,\mathrm{Gb/s}$ ).

### 6.4.2 Phase Noise

Next, the phase noise of the quarter-rate recovered clock was measured; it is depicted in Fig. 6.22. For this measurement, the proportional gain $K_p$ and the integral gain $K_i$ of the digital loop filter were set to 5 and $2^{-7}$, respectively.

Figure 6.21: The eye diagram of one of the quarter-rate outputs of the CDR (@ $6.25 \,\mathrm{Gb/s}$).

A higher proportional gain would further increase the bandwidth, which leads to a faster settling time. However, the input jitter would then be less suppressed, which leads to bit errors. The integral gain is set sufficiently smaller than the proportional gain to avoid instability. It cannot be too small either, because a sufficiently high integral gain is needed to reduce any frequency error to zero. Fig. 6.22 shows that, with these settings, the CDR has a large 3 dB loop bandwidth ($\approx 75$ MHz), which directly reduces the settling time of the AD-CDR during burst mode operation. The figure also shows that many spurs are present in the phase noise of the DCO. These spurs originate from the finite length of the PRBS9 sequence, and the frequency of the fundamental spur is given by:

$$f_{spur} = \frac{f_{data}}{N} \cdot \frac{1}{\mathrm{length}(\mathrm{PRBS9})}$$
$$= \frac{25 \,\text{GHz}}{16} \cdot \frac{1}{511} \approx 3.058 \,\text{MHz}$$
(6.9)

where $f_{spur}$ is the frequency offset of the spur, $f_{data}$ is the clock frequency of the input data, $N$ is the subsample factor and length(PRBS9) is the period of the PRBS9 sequence (511 bits).
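
The $1/511$ factor in Eq. (6.9) relies on the subsampled PRBS9 still repeating every 511 samples, which holds because 511 and the subsample factor 16 share no common factor. The Python sketch below checks this numerically with a PRBS9 generator built as a 9-stage linear feedback shift register with feedback from stages 9 and 5 (the usual PRBS9 structure, polynomial $x^9 + x^5 + 1$); the seed is arbitrary.

```python
def prbs9(n_bits, seed=0x1FF):
    """PRBS9 from a 9-stage Fibonacci LFSR with taps at stages 9 and 5."""
    state = seed & 0x1FF  # any nonzero 9-bit seed
    out = []
    for _ in range(n_bits):
        new = ((state >> 8) ^ (state >> 4)) & 1  # stage 9 XOR stage 5
        out.append(new)
        state = ((state << 1) | new) & 0x1FF
    return out

def period(seq, max_p):
    """Smallest p <= max_p such that seq repeats with period p, else None."""
    for p in range(1, max_p + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None

N, L = 16, 511           # subsample factor, PRBS9 length
bits = prbs9(2 * L * N)  # long enough for two periods after subsampling
sub = bits[::N]          # every 16th bit, as seen by the DLF

f_spur = 25e9 / N / L    # Eq. (6.9)
print(period(bits[:2 * L], L), period(sub, L), f"{f_spur / 1e6:.3f} MHz")
```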



Figure 6.22: The phase noise of the quarter-rate recovered clock of the AD-CDR for a PRBS9 input data sequence at 25 Gb/s ( $K_p = 5$ ,  $K_i = 2^{-7}$ ).



Figure 6.23: The BER as a function of the voltage swing at the input of the AD-CDR.

### 6.4.3 Input Sensitivity

The input sensitivity of the CDR was also determined. Fig. 6.23 shows the measured bit error rate as the amplitude at the input of the CDR is varied. A BER lower than $10^{-12}$ is reached when the input amplitude of the CDR is larger than $300 \text{ mV}_{pp}$. For all subsequent measurements, the signal amplitude at the input of the CDR was set to $370 \text{ mV}_{pp}$.

# 6.5 Optical Link Tests in Burst Mode

As in the electrical burst mode tests, a sufficiently long gap is required between two consecutive packets to ensure that the CDR is no longer in lock with the generator.

Fig. 6.24 shows two packets separated by a gap of 10 ns, while the two packets in Fig. 6.25 are separated by a gap of 41 ns. As in the electrical burst mode tests, the CDR remains in lock with a gap of 10 ns, while a four times larger gap brings the CDR out of lock. For the remainder of the measurements, the gap is increased by another factor of four, to 164 ns, to ensure random phases of the incoming data with respect to the DCO of the CDR. This gap is smaller than in the electrical burst mode measurements and cannot be increased further because of the commercially available AC-coupled amplifier in our setup: any longer gap would cause the DC level at the input of the CDR to drift during the idle phase.

For the measurement of the settling time of the AD-CDR in the optical setup, the packet structure of the electrical burst mode tests is reused (Fig. 6.17). Thanks to the demultiplexing, this packet structure makes it easy to observe the settling at the output of the device using a real-time oscilloscope. A captured output packet with the optical link setup is shown in Fig. 6.26.

The settling time of the AD-CDR is measured by recording when a transition occurs in the subsampled preamble at the output of the AD-CDR (Fig. 6.27). The CDR nearly always locks onto the data with a very short settling time: 99.9% of the transitions occur within 20 ns. Over 2 million captured packets, the worst-case settling time was 37.5 ns.



Figure 6.24: A packet in optical burst mode measurements with a short gap (10.25 ns).



Figure 6.25: A packet in optical burst mode measurements with a long gap (41 ns).



Figure 6.26: A captured  $6.25\,\mathrm{Gb/s}$  output stream in optical burst mode measurements.



Figure 6.27: The AD-CDR is always in lock after 37.5 ns for 2 million packets with setting  $K_p = 5$  and  $K_i = 2^{-7}$ .

# References

- [1] Bart Moeneclaey, Jochen Verbrugghe, Elad Mentovich, Paraskevas Bakopoulos, Johan Bauwelinck, and Xin Yin, "A 64 Gb/s PAM-4 Transimpedance Amplifier for Optical Links," in *Optical Fiber Communication Conference*, 2017, pp. 1–3.
- [2] X Yin, M Verplaetse, R Lin, J Van Kerrebrouck, O Ozolins, T De Keulenaer, X Pang *et al.*, "First Demonstration of Real-Time 100 Gbit/s 3-Level Duobinary Transmission for Optical Interconnects," in *ECOC* 2016 - Post Deadline Paper; 42nd European Conference on Optical Communication, sep 2016, pp. 1–3.
- [3] Reza Navid, E-Hung Chen, Masum Hossain, Brian Leibowitz, Jihong Ren, Chuen-huei Adam Chou, Barry Daly *et al.*, "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 4, pp. 814–827, apr 2015.
- [4] Zheng-Hao Hong, Yao-Chia Liu, and Wei-Zen Chen, "A 3.12 pJ/bit, 19-27 Gbps Receiver With 2-Tap DFE Embedded Clock and Data Recovery," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2625–2634, nov 2015.
- [5] Guoying Wu, Deping Huang, Jingxiao Li, Ping Gui, Tianwei Liu, Shita Guo, Rui Wang *et al.*, "A 1-16 Gb/s All-Digital Clock and Data Recovery With a Wideband High-Linearity Phase Interpolator," pp. 2511–2520, jul 2016.
- [6] Joshua Liang, Ali Sheikholeslami, Hirotaka Tamura, Yuuki Ogata, and Hisakatsu Yamaguchi, "A 28Gb/s digital CDR with adaptive loop gain for optimum jitter tolerance," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), feb 2017, pp. 122–123.
- [7] Lucio Rodoni, George von Buren, Alex Huber, Martin Schmatz, and Heinz Jackel, "A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 7, pp. 1927–1941, jul 2009.

- [8] Sang-Hyeok Chu, Woorham Bae, Gyu-Seob Jeong, Sungchun Jang, Sungwoo Kim, Jiho Joo, Gyungock Kim *et al.*, "A 22 to 26.5 Gb/s Optical Receiver With All-Digital Clock and Data Recovery in a 65 nm CMOS Process," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2603–2612, nov 2015.
- [9] Wahid Rahman, Danny Yoo, Joshua Liang, Ali Sheikholeslami, Hirotaka Tamura, Takayuki Shibasaki, and Hisakatsu Yamaguchi, "A 22.5-to-32Gb/s 3.2pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28nm CMOS," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), feb 2017, pp. 120–121.

# Conclusion and Future Work

# 7.1 Conclusion

This dissertation presents low-power subsampling All-Digital Clock and Data Recovery (AD-CDR) techniques for multi-gigabit Passive Optical Networks (PONs) that address the various challenges of future multi-gigabit PONs. These techniques are the result of research aimed at replacing the bulky and power-hungry charge-pump loop filter by a Digital Loop Filter (DLF), which offers many advantages in a Clock and Data Recovery circuit.

The state-of-the-art Clock and Data Recovery (CDR) circuits are introduced in Chapter 2 and the three main challenges that prevent the full integration of digital Phase Locked Loop (PLL) techniques in a CDR are identified.

The first challenge is that an automatically synthesized, placed and routed implementation of the DLF requires a reduction of the operating speed of the DLF. In this work, this challenge is tackled by using subsampling instead of demultiplexing. This avoids the need for a large number of high-speed parallel samplers and a complex signal processing block to process the high-speed input data. Consequently, the power consumption and chip area of the CDR are reduced.

Second, due to subsampling, the CDR loses phase information that is needed to adjust the recovered clock such that the phase error is reduced. To maintain correct operation of the CDR, the Inverse Alexander Bang-Bang Phase Detector (BB-PD), an improved BB-PD, was proposed by inverting the sign of the Alexander BB-PD. With this minimal modification, the Inverse Alexander Phase Detector (PD) retains all the advantages of an Alexander PD while improving the Bit-Error Rate (BER) in simulation by a factor of 10 to 20 when the CDR uses subsampling (N = 4) in the PD (Chapter 4). The improvement of the Inverse Alexander PD over the conventional Alexander PD is also confirmed by the measurements with a subsample factor N = 16 (Chapter 6): at the same jitter level, the BER improves by more than a factor of $10^5$, or alternatively, for the same BER, a 1.9 times higher jitter level can be tolerated.

The third challenge is the non-linear operation of a BB-PD, which greatly complicates the analysis of the CDR. Therefore, first the stability and then the influence of noise on the BB-CDR operation are investigated using describing function techniques (Chapter 3). The results of this mathematical method match time-domain simulations very well. In particular, the occurrence and the amplitude of a limit cycle are determined as a function of the input noise level. Our analysis makes it possible to calculate the worst-case amplitude of a limit cycle and to determine the minimal amount of noise necessary to avoid limit cycling as a function of the different CDR loop parameters. For this purpose, the simple analytical approximations of Eqs. (3.25), (3.28) and (3.31) were derived, which can be used for a fast assessment of the limit cycle sensitivity of a BB-CDR. Based on our analysis, it appears that in most CDR systems there is sufficient noise present to avoid limit cycling. Even when the input jitter level is too small to avoid limit cycling, the amplitude of the limit cycle is still likely to be small enough to allow correct data recovery. However, in this case the recovered clock will contain significant jitter peaking, which may be unacceptable. The most dangerous situation occurs when the CDR loop filter has a large delay and a high linear gain.

Chapter 3 is further extended with the Linear Time-Variant (LTV) analysis and simulation of the complete, subsampled AD-CDR system. By combining the describing function gain extracted from the simulation results with the LTV analysis, a breakdown of the contributions of the different noise sources in the system is obtained. The LTV analysis closely matches the simulation results and gives considerable insight into the behavior of the AD-CDR. The results show, for example, that the contribution of the LTV component is negligible in our design. Although the phase domain model is still an approximation of the behavior of the voltage domain signals, the simulation results show good agreement with the measurement results.

The combination of the solutions to these challenges resulted in an AD-CDR Application Specific Integrated Circuit (ASIC) implemented in a 40 nm Low Power CMOS technology (Chapter 5). It can operate over a very wide range of data rates (from 12.5 Gb/s to 25 Gb/s). The CDR takes in the high-speed data, recovers a quarter-rate clock and demultiplexes the recovered data into 4 parallel data streams. A ring oscillator generates 8 equally spaced quarter-rate clock phases and provides the necessary timing resolution for an Inverse Alexander phase detector, which captures the recovered data and sends *Early/Late* signals to the automatically synthesized digital loop filter.

A key enabling element of the presented design is the use of extensive subsampling together with the Inverse Alexander phase detector, which reduces the operating speed of the synthesized digital logic while still guaranteeing correct operation of the CDR. Avoiding parallel structures simplifies the design, reduces the active die area and decreases the power consumption. The resulting AD-CDR core consumes only 46 mW at 25 Gb/s and 23 mW at 12.5 Gb/s and has an area of $0.050 \text{ mm}^2$. Compared to the state-of-the-art, this design has the best power efficiency and occupies the smallest area. The implemented CDR is also highly tunable, which results in the highest relative operating frequency range among digital CDRs that do not need a tunable multi-gigahertz reference clock. Additionally, the SDH STM-256 jitter tolerance specifications are satisfied.

Thanks to the large bandwidth that can be obtained by adjusting the DLF, this AD-CDR is also very suitable for burst mode operation. As a result, our design is the first 25 Gb/s AD-CDR circuit working in burst mode. Over 2 million captured packets, a settling time of 35 ns or less was obtained in the electrical setup, while the worst-case settling time in the optical setup was 37.5 ns. Note that our CDR requires neither a reference clock nor a start-of-burst signal.

# 7.2 Future Work

# 7.2.1 Possible Improvements

A first step to further improve the power efficiency and operation of the AD-CDR would be to explore new implementations or topologies for the Digitally Controlled Oscillator (DCO). The performance of the current DCO is modest, and the phase noise and jitter of the recovered clock are higher than in prior work. Therefore, more research is required to determine the fundamental trade-offs between power consumption, jitter generation, resolution and number of phases in ring oscillators. The different jitter-generating mechanisms need to be analyzed, and methods to minimize them should be proposed. This way, an improved multi-phase ring oscillator can be designed for use in an AD-CDR.

Alternatively, other DCO topologies should also be investigated. LC oscillators typically have a much lower power consumption and generate less phase noise than ring oscillators. Therefore, it is suggested to investigate the effects on the AD-CDR performance when an LC oscillator is used as the DCO.

# 7.2.2 Additional Functionalities

Due to the non-linearity of the BB-PD in the AD-CDR, the loop gain and consequently the bandwidth depend on the amount of jitter present at the input of the phase detector (as discussed in Chapter 3). In the literature [1–3], adaptive gain control is used to compensate for the bandwidth variations caused by fluctuations of the input jitter level. This technique also compensates for the bandwidth variations caused by changes of the loop parameters that originate from process, voltage and temperature variations. Further research could be conducted to determine the effectiveness of this adaptive gain control in an AD-CDR.

Furthermore, it is recommended that further research be undertaken in the area of adaptive bandwidth control during burst mode operation. This way, classical trade-offs between system parameters such as bandwidth and lock time can be relaxed by (digitally) detecting the state of the AD-CDR and selecting the appropriate filter coefficients. The AD-CDR can easily be extended with on-the-fly adaptation of the DLF's coefficients. This paves the way for independent optimization of the different system parameters, with greatly improved overall performance in burst mode operation.

Future research might also explore the implementation of the BiPON or even the CBi-PON protocol [4, 5] in an AD-CDR to reduce the power consumption of the optical network at a hierarchical level.

### 7.2.3 Higher Data Rates

Future generations will undoubtedly need to support ever higher data rates. The limiting factor will again be the operating speed of the DLF. One way to cope with this is to move to smaller CMOS technology nodes, which can operate at higher speeds.

Alternatively, demultiplexing or additional subsampling should be used in the DLF to reduce the operating speed. However, care should be taken such that a correct operation is maintained when long Consecutive Identical Digits (CID) sequences are present.

A greater focus on multilevel modulation formats (e.g. 4-level Pulse-Amplitude Modulation (PAM-4)) could produce interesting findings. By extending the phase detector such that more logic levels and their transitions can be detected, higher data rates can be achieved for the same baud rate.

# References

- [1] Hyung-Joon Jeon, Raghavendra Kulkarni, Yung-Chung Lo, Jusung Kim, and Jose Silva-Martinez, "A Bang-Bang Clock and Data Recovery Using Mixed Mode Adaptive Loop Gain Strategy," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 6, pp. 1398–1415, jun 2013.
- [2] Archit Joshi and Gagan Midha, "Bandwidth Compensation Technique for Digital PLL," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 11, pp. 1044–1048, nov 2016.
- [3] Heesoo Song, Deok-Soo Kim, Do-Hwan Oh, Suhwan Kim, and Deog-Kyoon Jeong, "A 1.0–4.0-Gb/s All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Proportional Gain Control," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 2, pp. 424–434, feb 2011.
- [4] Arno Vyncke, "A low power, multi-rate clock-and-data recovery circuit and MAC preprocessor for 40 Gbit/s cascaded bit-interleaving passive optical networks," Ph.D. dissertation, Ghent University, 2016.
- [5] Christophe Van Praet, "Techniques to reduce energy consumption in next-generation access networks," Ph.D. dissertation, Ghent University, 2014.

Part IV Appendix

# LTV Analysis Calculations

This section describes the calculations of the Linear Time-Variant (LTV) analysis of the All-Digital Clock and Data Recovery (AD-CDR) model shown in Fig. 3.19(b). All symbols are defined in Table 3.1.

The output phase of the AD-CDR is given as follows:

$$\phi_{out,n}^{*}(\omega) = H_{DCO}(\omega) \left[ \phi_{dco}^{*}(\omega) + H_{N}(\omega) \left( H_{DLF}(N\omega) \widetilde{TW}(N\omega) \right) \right] \quad (A.1)$$

As shown in Fig. 3.19(b), $\widetilde{TW}$ is the effectively subsampled version of $\widehat{TW}$, and their spectra are related by [1, 2]:

$$\widetilde{TW}(N\omega) = \frac{1}{N} \sum_{n=0}^{N-1} \widehat{TW}\left(\omega - n\frac{2\pi}{N}\right)$$
(A.2)

$$\widehat{TW}(\omega) = K_n \left( \phi_{in}^*(\omega) - \phi_{out,n}^*(\omega) \right) + \phi_q^*(\omega)$$
(A.3)
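
Eq. (A.2) is the standard spectral relation for keeping every $N$-th sample of a sequence. Its discrete (DFT) counterpart can be checked numerically: for a record of length $L = NM$, the $M$-point DFT of the subsampled sequence equals $1/N$ times the sum of the $N$ shifted copies $X[k + nM]$ of the $L$-point DFT, which are the samples of the spectrum at the image frequencies $\omega - n\frac{2\pi}{N}$. The Python sketch below verifies this for an arbitrary test sequence; the sequence and the sizes are illustrative only.

```python
import cmath

def dft(x):
    """Naive DFT: X[k] = sum_l x[l] * exp(-j 2 pi k l / L)."""
    L = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / L) for l in range(L))
            for k in range(L)]

N, M = 4, 8                                 # subsample factor, subsampled length
L = N * M
x = [((7 * l) % 11) - 5 for l in range(L)]  # arbitrary test sequence
X = dft(x)
Y = dft(x[::N])                             # spectrum of the subsampled sequence

# Discrete form of Eq. (A.2): Y[k] = (1/N) * sum_n X[k + n*M]
aliased = [sum(X[(k + n * M) % L] for n in range(N)) / N for k in range(M)]
max_err = max(abs(a - b) for a, b in zip(Y, aliased))
print(f"max deviation: {max_err:.1e}")
```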

After placing Eqs. (A.2) and (A.3) in Eq. (A.1),  $\phi^*_{out,n}$  becomes:

$$\phi_{out,n}^{*}(\omega) = H_{DCO}(\omega)\phi_{dco}^{*}(\omega) + H_{DCO}(\omega)H_{N}(\omega)H_{DLF}(N\omega)\frac{1}{N}\sum_{n=0}^{N-1}\phi_{q}^{*}\left(\omega-n\frac{2\pi}{N}\right) + H_{DCO}(\omega)H_{N}(\omega)H_{DLF}(N\omega)\frac{K_{n}}{N}\sum_{n=0}^{N-1}\phi_{in}^{*}\left(\omega-n\frac{2\pi}{N}\right) - H_{DCO}(\omega)H_{N}(\omega)H_{DLF}(N\omega)\frac{K_{n}}{N}\sum_{n=0}^{N-1}\phi_{out,n}^{*}\left(\omega-n\frac{2\pi}{N}\right)$$
(A.4)

The sum in the last term of Eq. (A.4) can be split into the  $n = 0$  term and the remaining  $N-1$  images:

$$\sum_{n=0}^{N-1} \phi_{out,n}^* \left( \omega - n \frac{2\pi}{N} \right) = \phi_{out,n}^* (\omega) + \sum_{n=1}^{N-1} \phi_{out,n}^* \left( \omega - n \frac{2\pi}{N} \right)$$
(A.5)

In addition, the following quantities are defined:

$$\Sigma_{\phi_{out,n}^*}(\omega) = \sum_{n=1}^{N-1} \phi_{out,n}^*\left(\omega - n\frac{2\pi}{N}\right)$$
(A.6)

$$T(\omega) = H_{DCO}(\omega)H_N(\omega)H_{DLF}(N\omega)\frac{K_n}{N}$$
(A.7)

$$H_{n,dco}(\omega) = \frac{H_{DCO}(\omega)}{1 + T(\omega)}$$
(A.8)

$$H_{alias}(\omega) = \frac{T(\omega)}{1 + T(\omega)}$$
(A.9)

where  $T(\omega)$  is the loop gain of the AD-CDR,  $H_{n,dco}(\omega)$  is the high-pass transfer function seen by the Digitally Controlled Oscillator (DCO) phase noise, and  $H_{alias}(\omega)$  is the low-pass transfer function seen both by the sum of the images of the output phase spectrum,  $\Sigma_{\phi_{out,n}^*}(\omega)$, and by the sum of the images of the input phase spectrum.
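To make the roles of Eqs. (A.7)–(A.9) concrete, the Python sketch below evaluates  $T(\omega)$,  $H_{n,dco}(\omega)$  and  $H_{alias}(\omega)$  for one hypothetical set of loop components: a phase-accumulator DCO, an N-sample hold for  $H_N$  and a proportional-integral (PI) loop filter for  $H_{DLF}$, with arbitrary gains. These models and values are assumptions for illustration only and are not taken from Table 3.1. The sketch shows that  $H_{alias}$  tends to 1 where the loop gain is large and to 0 where it is small, and that  $H_{n,dco}(\omega)$  is  $H_{DCO}(\omega)$  multiplied by the high-pass sensitivity  $1/(1+T(\omega)) = 1 - H_{alias}(\omega)$.

```python
import numpy as np

# Hypothetical loop parameters (illustrative only, not from Table 3.1).
N = 8                # subsampling ratio between the DCO rate and the DLF rate
K_N = 1.0            # linearized phase-detector gain K_n
K_DCO = 1.0          # normalized DCO gain
KP, KI = 1e-2, 8e-5  # proportional and integral DLF gains

def h_dco(w):
    """Assumed DCO model: phase accumulator K_DCO * e^{-jw} / (1 - e^{-jw})."""
    z1 = np.exp(-1j * w)
    return K_DCO * z1 / (1 - z1)

def h_hold(w):
    """Assumed H_N: N-sample hold, sum_{k=0}^{N-1} e^{-jwk} in closed form."""
    return (1 - np.exp(-1j * N * w)) / (1 - np.exp(-1j * w))

def h_dlf(w_low):
    """Assumed PI loop filter, evaluated at the low-rate frequency N*w."""
    return KP + KI / (1 - np.exp(-1j * w_low))

def loop_functions(w):
    """Return T, H_n,dco and H_alias of Eqs. (A.7)-(A.9) at frequencies w."""
    T = h_dco(w) * h_hold(w) * h_dlf(N * w) * K_N / N   # Eq. (A.7)
    H_ndco = h_dco(w) / (1 + T)                           # Eq. (A.8)
    H_alias = T / (1 + T)                                 # Eq. (A.9)
    return T, H_ndco, H_alias

w = np.array([1e-4, 1e-2, 3.0])    # rad/sample at the DCO rate; avoids w = 0
T, H_ndco, H_alias = loop_functions(w)
print(np.abs(H_alias))             # near 1 at w = 1e-4, near 0 at w = 3.0
print(np.abs(H_ndco / h_dco(w)))   # |1/(1+T)|: the complementary high-pass shape
```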

Using Eqs. (A.4)–(A.9),  $\phi_{out,n}^*$  can now be written in the compact form:

$$\phi_{out,n}^*(\omega) = \phi_{out,lti}^*(\omega) - H_{alias}(\omega) \Sigma_{\phi_{out,n}^*}(\omega)$$
(A.10)

where  $\phi_{out,lti}^{*}$  is given by:

$$\phi_{out,lti}^{*}(\omega) = H_{n,dco}(\omega) \phi_{dco}^{*}(\omega) + \frac{1}{K_n} H_{alias}(\omega) \sum_{n=0}^{N-1} \phi_q^{*} \left(\omega - n\frac{2\pi}{N}\right) + H_{alias}(\omega) \sum_{n=0}^{N-1} \phi_{in}^{*} \left(\omega - n\frac{2\pi}{N}\right)$$
(A.11)

This term is the output phase spectrum predicted by a conventional Linear Time-Invariant (LTI) analysis of the discrete-time multirate AD-CDR model in Fig. 3.19(b).
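The step from Eq. (A.4) to Eqs. (A.10) and (A.11) is purely algebraic. Using Eqs. (A.5)–(A.7), Eq. (A.4) can be rewritten in terms of  $T(\omega)$, after which the  $\phi_{out,n}^*(\omega)$  term is collected on the left-hand side:

$$\begin{aligned}
\phi_{out,n}^{*}(\omega) &= H_{DCO}(\omega)\phi_{dco}^{*}(\omega) + \frac{T(\omega)}{K_n}\sum_{n=0}^{N-1}\phi_{q}^{*}\left(\omega-n\frac{2\pi}{N}\right) + T(\omega)\sum_{n=0}^{N-1}\phi_{in}^{*}\left(\omega-n\frac{2\pi}{N}\right) - T(\omega)\left[\phi_{out,n}^{*}(\omega) + \Sigma_{\phi_{out,n}^{*}}(\omega)\right] \\
\left[1+T(\omega)\right]\phi_{out,n}^{*}(\omega) &= H_{DCO}(\omega)\phi_{dco}^{*}(\omega) + \frac{T(\omega)}{K_n}\sum_{n=0}^{N-1}\phi_{q}^{*}\left(\omega-n\frac{2\pi}{N}\right) + T(\omega)\sum_{n=0}^{N-1}\phi_{in}^{*}\left(\omega-n\frac{2\pi}{N}\right) - T(\omega)\,\Sigma_{\phi_{out,n}^{*}}(\omega)
\end{aligned}$$

Dividing both sides by  $1+T(\omega)$  and applying Eqs. (A.8) and (A.9) gives Eq. (A.10), where the first three terms on the right-hand side form  $\phi_{out,lti}^{*}(\omega)$  of Eq. (A.11).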

Solving Eq. (A.10) with a procedure similar to the one followed in [3] results in:

$$\phi_{out,n}^{*}(\omega) \simeq \phi_{out,lti}^{*}(\omega) - H_{alias}(\omega) \sum_{n=1}^{N-1} A\left(\omega - n\frac{2\pi}{N}\right) \phi_{out,lti}^{*}\left(\omega - n\frac{2\pi}{N}\right)$$
(A.12)

with

$$A(\omega) = \frac{1}{1 - H_{alias}(\omega)} = 1 + T(\omega)$$
(A.13)
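Because Eq. (A.10) couples each frequency  $\omega$  only to its  $N-1$  images  $\omega - n\frac{2\pi}{N}$, it can also be solved exactly on a uniform frequency grid, with one small  $N \times N$  linear system per group of images. The Python sketch below does this and checks the residual of Eq. (A.10); such a numerical reference could be used to gauge an approximation like Eq. (A.12). This is an alternative numerical approach, not the procedure of [3]. The first-order loop gain  $T(\omega) = G e^{-j\omega}/(1-e^{-j\omega})$, the value  $G = 0.02$, the random  $\phi_{out,lti}^*$  and the function names are all hypothetical.

```python
import numpy as np

def solve_a10_exact(phi_lti: np.ndarray, H_alias: np.ndarray, N: int) -> np.ndarray:
    """Solve Eq. (A.10) on a uniform grid of M = N*L frequency bins.

    A shift of 2*pi/N equals a shift of L bins, so bin b couples only to
    bins b + L, b + 2L, ...; each such group is an N x N linear system."""
    M = len(phi_lti)
    if M % N != 0:
        raise ValueError("grid size must be a multiple of N")
    L = M // N
    phi_out = np.empty(M, dtype=complex)
    for b in range(L):
        idx = b + L * np.arange(N)             # bin b and its N-1 images
        h = H_alias[idx]
        # Row i: phi_out[i] + h[i] * sum_{j != i} phi_out[j] = phi_lti[i]
        system = np.diag(1 - h) + np.outer(h, np.ones(N))
        phi_out[idx] = np.linalg.solve(system, phi_lti[idx])
    return phi_out

def a10_residual(phi_out, phi_lti, H_alias, N):
    """Max |LHS - RHS| of Eq. (A.10), with Sigma as in Eq. (A.6)."""
    L = len(phi_out) // N
    # np.roll(x, n*L)[k] == x[k - n*L], i.e. the value at w - n*2*pi/N.
    sigma = sum(np.roll(phi_out, n * L) for n in range(1, N))
    return float(np.max(np.abs(phi_out - (phi_lti - H_alias * sigma))))

# Hypothetical example: N = 4 and a first-order loop gain T(w).
N, L = 4, 64
w = 2 * np.pi * (np.arange(N * L) + 0.5) / (N * L)  # half-bin offset avoids w = 0
G = 0.02
T = G * np.exp(-1j * w) / (1 - np.exp(-1j * w))
H_alias = T / (1 + T)                                 # Eq. (A.9)
rng = np.random.default_rng(1)
phi_lti = rng.standard_normal(N * L) + 1j * rng.standard_normal(N * L)
phi_out = solve_a10_exact(phi_lti, H_alias, N)
print(a10_residual(phi_out, phi_lti, H_alias, N))     # round-off level
```

With  $N = 1$  there are no images, so the solver simply returns  $\phi_{out,lti}^*$, which matches the single-rate LTI case.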

Based on Eq. (A.12), and assuming uncorrelated noise sources, the power spectral density (PSD) of the AD-CDR output phase is obtained as:

$$S_{\phi_{out,n}^*}(\omega) \simeq S_{\phi_{out,lti}^*}(\omega) - |H_{alias}(\omega)|^2 \sum_{n=1}^{N-1} \left| A\left(\omega - n\frac{2\pi}{N}\right) \right|^2 S_{\phi_{out,lti}^*}\left(\omega - n\frac{2\pi}{N}\right)$$
(A.14)

where

$$S_{\phi_{out,lti}^*}(\omega) = |H_{n,dco}(\omega)|^2 S_{\phi_{dco}^*}(\omega) + N \left| \frac{H_{alias}(\omega)}{K_n} \right|^2 S_{\phi_q^*}(\omega) + N |H_{alias}(\omega)|^2 S_{\phi_{in}^*}(\omega)$$
(A.15)
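As an illustration of Eq. (A.15), the Python sketch below evaluates the three LTI noise contributions on a frequency grid. It assumes, hypothetically, white spectra with arbitrary levels, a phase-accumulator  $H_{DCO}(\omega)$  and a loop gain lumped as  $T(\omega) = G\,H_{DCO}(\omega)$, that is,  $H_N(\omega)H_{DLF}(N\omega)K_n/N$  replaced by a constant  $G$; none of these values come from the thesis. The factor  $N$  in the  $\phi_q^*$  and  $\phi_{in}^*$  terms follows Eq. (A.15) as written, so doubling  $N$  at a fixed  $T(\omega)$  doubles those two contributions.

```python
import numpy as np

def s_out_lti(H_dco, T, K_n, N, S_dco, S_q, S_in):
    """Evaluate Eq. (A.15), the LTI part of the output phase PSD.

    H_dco and T are complex arrays on a common frequency grid; the
    spectra S_* may be arrays on that grid or scalars (white spectra)."""
    H_ndco = H_dco / (1 + T)                       # Eq. (A.8)
    H_alias = T / (1 + T)                          # Eq. (A.9)
    dco_term = np.abs(H_ndco) ** 2 * S_dco         # DCO phase noise contribution
    q_term = N * np.abs(H_alias / K_n) ** 2 * S_q  # phi_q contribution
    in_term = N * np.abs(H_alias) ** 2 * S_in      # input phase contribution
    return dco_term + q_term + in_term, (dco_term, q_term, in_term)

# Hypothetical example values (not from the thesis).
w = np.linspace(1e-3, np.pi, 512)                 # avoids w = 0, where H_dco diverges
H_dco = np.exp(-1j * w) / (1 - np.exp(-1j * w))   # phase-accumulator DCO
G = 0.02                                          # lumped H_N * H_DLF * K_n / N
T = G * H_dco
total, (dco, quant, inp) = s_out_lti(H_dco, T, K_n=10.0, N=8,
                                     S_dco=1e-4, S_q=1e-2, S_in=1e-3)
print(total[0], total[-1])   # PSD at the lowest and highest grid frequency
```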

# References

- [1] R.W. Schafer and L.R. Rabiner, "A digital signal processing approach to interpolation," *Proceedings of the IEEE*, vol. 61, no. 6, pp. 692–702, 1973.
- [2] John G. Proakis and Dimitris G. Manolakis, *Digital Signal Processing: Principles, Algorithms, and Applications*. Prentice Hall, 1996.
- [3] Ioannis L. Syllaios and Poras T. Balsara, "Linear Time-Variant Modeling and Analysis of All-Digital Phase-Locked Loops," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 59, no. 11, pp. 2495–2506, Nov. 2012.