# **Comparison of Feedback and Feedforward Clock Recoveries for Ultra-Fast Synchronization in Passive Optical Networks**

Patrick Matalla<sup>\*</sup>, Christian Koos, and Sebastian Randel<sup>\*\*</sup>

Institute of Photonics and Quantum Electronics (IPQ), Karlsruhe Institute of Technology (KIT), Engesserstr. 5, 76131 Karlsruhe, Germany

\*patrick.matalla@kit.edu, \*\*sebastian.randel@kit.edu

**Abstract:** We compare digital non-data-aided feedback and feedforward clock recoveries for burst-mode operation in high-speed PONs. For 56 GBd NRZ PAM2 and PAM4, we demonstrate that a clock frequency offset of 20 ppm can be synchronized within 40.96 ns. © 2023 The Author(s)

## 1. Introduction

With the standardization of the high-speed 50G passive optical network (PON), analog-to-digital converters and digital signal processing (DSP) are introduced to PONs. This allows replacement of analog clock and data recovery (CDR) by digital clock recovery and adaptive equalization [1]. A non-data-aided digital clock recovery can be used to either shorten or completely avoid the preambles currently used for synchronization in PONs [2]. It can be implemented either in a feedback (FB) architecture based on a phase-locked loop (PLL) or in a feedforward (FF) architecture. Provided that relatively stable oscillators are used and data transmission is continuous, FB structures result in stable phase tracking and are therefore frequently used in optical communications, e.g., in long-haul point-to-point systems. Today, FB structures are applied because of their low complexity [3, 4]. However, due to their relatively long acquisition time, such control loops might not meet the stringent requirements of fast synchronization in burst-switched systems. In this case, FF schemes can be beneficial due to their instantaneous timing estimation and their improved high-frequency jitter performance especially when using low-cost oscillators. In [5,6], the squaring timing recovery algorithm proposed by Oerder and Meyr [7] is used, which requires more than twofold oversampling, and hence is considered too computationally complex in comparison to FB algorithms. In [8, 9], we investigated the real-time implementation, hardware complexity, and performance of FF algorithms, which require only an oversampling ratio sufficient to resolve the signal's bandwidth, and showed the potential of the FF architecture in burst-switched systems.

In this work, we discuss the differences and limitations of FB and FF clock recovery architectures for nanosecond-scale synchronization in high-speed burst-mode PONs. In particular, we compare the well-known Gardner algorithm [10] (FB) with the algorithm by Zhu et al. [11] (FF) in terms of their synchronization speed and stability for burst-mode transmission and clock frequency offsets of up to 20 ppm. We demonstrate synchronization of two optical network units (ONUs) at the optical line terminal (OLT) within 40.96 ns. To do so, we modulate 56-GBd non-return-to-zero (NRZ) two-level and four-level pulse amplitude modulation (PAM2/PAM4) signals and transmit them in the C-band over 2.2 km of single-mode fiber (SMF) with  $D \approx 15.5 \text{ ps/nm/km}$ .

## 2. Feedback and Feedforward Clock Recovery

The digital algorithm structure for FB and FF is depicted in Fig. 1, left. Both architectures use a timing corrector, usually consisting of an elastic buffer and an interpolator to correct for integer and fractional sampling offsets, respectively. For the PLL, the corrected signal is then fed into a timing error detector, which generates an error signal  $\varepsilon$  proportional to the sampling offset  $\tau$ . The loop filter, usually a proportional-integral (PI) filter, and the numerically controlled oscillator (NCO) form a second-order PLL, which recursively derives a timing estimate  $\hat{\tau}$ . In FF schemes, a timing estimator directly estimates the timing phase  $\hat{\tau}$  from the signal. Afterwards, the estimated phase is unwrapped at phase jumps of  $2\pi$ . To apply this sampling phase to the associated samples, the signal is delayed in a buffer and then corrected in the timing corrector.
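The unwrapping step in the FF path can be illustrated with a minimal sketch. It assumes that the timing estimates are expressed in symbol durations, so that a phase jump of $2\pi$ corresponds to a jump of one symbol; the function name `unwrap_timing` and the example values are hypothetical and not taken from the implementation used in this work.

```python
def unwrap_timing(estimates, period=1.0):
    """Remove jumps of one period between consecutive wrapped timing estimates.

    estimates: wrapped timing estimates, assumed to lie in [-period/2, period/2).
    Returns a list in which jumps larger than half a period are compensated.
    """
    unwrapped = []
    offset = 0.0   # accumulated number of periods added so far
    previous = None
    for estimate in estimates:
        if previous is not None:
            jump = estimate - previous
            if jump > period / 2:      # wrapped downwards: subtract one period
                offset -= period
            elif jump < -period / 2:   # wrapped upwards: add one period
                offset += period
        unwrapped.append(estimate + offset)
        previous = estimate
    return unwrapped


# A slowly increasing sampling offset that wraps from +0.45 to -0.4
print(unwrap_timing([0.3, 0.45, -0.4, -0.25]))
```

With a clock frequency offset, the wrapped estimate drifts steadily and wraps once per accumulated symbol of drift, which is why the timing corrector needs the unwrapped value to separate integer and fractional sampling offsets.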



Fig. 1. System architectures (left) and step responses (right) for feedforward and feedback schemes.

#### W2A.36

We compare two time-domain clock recovery algorithms not requiring a Fourier transform, which might be too computationally complex for PONs. The timing error detector proposed by Gardner [10] evaluates the received and sampled signal  $\{x_k\}$  by comparing adjacent samples around a center sample  $x_{2k+1}$  at twofold oversampling. The difference of the adjacent samples is weighted by the center sample and then averaged over a block of *N* samples to derive an error signal  $\varepsilon$  as

$$\varepsilon = \sum_{k=0}^{N/2-1} x_{2k+1} \left( x_{2k} - x_{2k+2} \right). \tag{1}$$
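As an illustration of Eq. (1), the following sketch evaluates the block error signal in plain Python. The helper `alternating_tone` is a hypothetical, noise-free test signal (band-limited alternating symbols), chosen only because the error signal can then be checked by hand; it is not the signal used in the simulations or experiments.

```python
import math


def gardner_error(x):
    """Block error signal of Eq. (1) for samples x taken at two samples per symbol.

    x must hold N + 1 samples (N even) so that x[2k + 2] exists for the last k.
    """
    n = len(x) - 1
    return sum(x[2 * k + 1] * (x[2 * k] - x[2 * k + 2]) for k in range(n // 2))


def alternating_tone(n, tau):
    """Hypothetical test signal: alternating +1/-1 symbols with a cosine shape,
    sampled twice per symbol with a sampling offset tau (in symbol durations)."""
    return [math.cos(math.pi * (k / 2 + tau)) for k in range(n + 1)]


for tau in (-0.1, 0.0, 0.1):
    print(tau, gardner_error(alternating_tone(256, tau)))
```

For this particular test tone, the sum evaluates to $-(N/2)\sin(2\pi\tau)$: the error vanishes at the ideal sampling instant and changes sign with the sampling offset, which is the property the loop filter relies on.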

The Zhu algorithm squares the signal  $\{x_k\}$ . Hence, the signal is no longer zero mean and clock tones at integer multiples of the symbol rate are generated. The phase of the resulting tone at symbol rate is proportional to the sampling offset  $\tau$  and can be determined from the argument of the corresponding Fourier coefficient. The Zhu algorithm estimates the timing offset by shifting the symbol-rate frequency to baseband and limiting the bandwidth by a lowpass filter  $\{h_p\}_{p=0}^{P-1}$  of length *P* prior to the squaring operation, i.e.,

$$\hat{\tau} = \frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{N-1} \left( \sum_{p=0}^{P-1} x_{k-p} e^{-j\pi \frac{k-p}{2}} h_p \right)^2 \right\}. \tag{2}$$
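Eq. (2) can be sketched in a few lines of plain Python. This is a direct, unoptimized transcription under two assumptions that are not part of the text: the outer sum starts at $k = P-1$ so that no samples before the block are needed, and the test signal is a hypothetical noise-free alternating tone for which the result can be verified analytically.

```python
import cmath
import math


def zhu_estimate(x, h):
    """Timing estimate of Eq. (2), in symbol durations, from samples x at two
    samples per symbol and lowpass taps h.

    Assumption: the outer sum starts at k = len(h) - 1 (filter warm-up skipped).
    """
    taps = len(h)
    acc = 0j
    for k in range(taps - 1, len(x)):
        # Shift by a quarter of the sampling rate, then lowpass filter
        filtered = sum(
            x[k - p] * cmath.exp(-1j * math.pi * (k - p) / 2) * h[p]
            for p in range(taps)
        )
        acc += filtered ** 2  # squaring moves the clock tone to DC
    return cmath.phase(acc) / (2 * math.pi)


def alternating_tone(n, tau):
    """Hypothetical test signal: alternating symbols sampled with offset tau."""
    return [math.cos(math.pi * (k / 2 + tau)) for k in range(n)]


moving_average = [0.25] * 4  # 4-tap MA lowpass
print(zhu_estimate(alternating_tone(256, 0.2), moving_average))
```

For the test tone, the 4-tap moving average cancels the image term exactly, so the estimate reproduces the sampling offset for $|\tau| < 0.5$. Because the result is taken from an argument, it wraps at half a symbol duration in either direction.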

Figure 1, right, shows the simulated step responses of the respective clock recovery schemes applied to a 50 GBd PAM2 signal with a signal-to-noise ratio of 15 dB filtered by a 5th-order Bessel lowpass filter with a 3-dB bandwidth of 35 GHz. For the FB and FF clock recoveries, a block length of N = 256 is chosen (a typical parallelization factor in real-time systems, i.e., one clock phase estimate per processing clock cycle) and a constant clock phase of 0.5 and -0.5 times the symbol duration is set, respectively. For the FF structure, the lowpass is implemented as a 4-tap moving average (MA) filter and an additional 8-tap MA filter is applied prior to taking the argument, hence restricting the phase tracking speed to a 3-dB bandwidth, normalized to the timing estimate update rate, of about  $B_L = 1/(2 \cdot 8) = 6.25e-2$ . Due to the additional averaging, full synchronization for the FF method is achieved after  $10 \text{ ps} \cdot 256 \cdot 8 = 20.48 \text{ ns}$ . The FB control loop is implemented as a second-order PI loop with a normalized bandwidth of  $B_L = 5e-3$  and a damping factor of  $\zeta = 0.707$  [12]. In contrast to the FF architecture, the loop stability in an FB scheme suffers from an inner-loop delay D, introduced by pipelining, filtering, or other inner-loop mechanisms, which can become very large in optical systems [13]. The consequence becomes apparent in Fig. 1, right. For a feedback delay of D = 1 clock cycle, the PLL quickly converges to  $\tau = 0.5$ . However, the control loop requires more than 1,000 timing estimates to finally converge when a feedback delay of 50 clock cycles is introduced. In a 50-GBd system with one estimate per block of 256 samples, this corresponds to more than  $1000 \cdot 256 \cdot 10 \text{ ps} = 2.56$  µs and therefore inherently limits the synchronization speed of FB architectures.
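The effect of the inner-loop delay can be reproduced qualitatively with a strongly simplified loop model. The sketch below is not the simulation behind Fig. 1: it assumes a noise-free, linear timing error detector with unity gain, one update per timing estimate, and a commonly used textbook mapping from $B_L$ and $\zeta$ to the proportional and integral gains. All function names are hypothetical.

```python
def pi_loop_gains(bl, zeta):
    """Proportional and integral gains of a second-order PI loop.

    Assumption: textbook mapping for unity detector and NCO gain, with the
    loop bandwidth bl normalized to the update rate.
    """
    theta = bl / (zeta + 1.0 / (4.0 * zeta))
    denom = 1.0 + 2.0 * zeta * theta + theta * theta
    return 4.0 * zeta * theta / denom, 4.0 * theta * theta / denom


def step_response(delay, n_updates, step=0.5, bl=5e-3, zeta=0.707):
    """Timing estimate over time for a timing step; delay = 1 is the minimal loop."""
    k1, k2 = pi_loop_gains(bl, zeta)
    estimate = [0.0] * (n_updates + 1)
    integrator = 0.0
    for n in range(n_updates):
        # The detector only sees the estimate from (delay - 1) updates ago
        seen = estimate[n - (delay - 1)] if n >= delay - 1 else 0.0
        error = step - seen
        integrator += k2 * error
        estimate[n + 1] = estimate[n] + k1 * error + integrator
    return estimate


def settling_index(estimate, target, tolerance):
    """First index after which the estimate stays within the tolerance."""
    last_outside = -1
    for n, value in enumerate(estimate):
        if abs(value - target) > tolerance:
            last_outside = n
    return last_outside + 1


for d in (1, 50):
    response = step_response(d, 5000)
    print(d, settling_index(response, 0.5, 0.005))
```

In this model, the additional delay reduces the phase margin of the loop, so the response rings for longer before it settles. The absolute settling indices depend on the modeling assumptions and should not be read as a reproduction of the curves in Fig. 1.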

### 3. Experimental Results

Figure 2 illustrates the setup for a 56-GBd burst-mode transmission of loud and silent bursts. In the ONU, a distributed-feedback (DFB) laser at 1540 nm is modulated by an electro-absorption modulator (EAM), which is driven from an arbitrary waveform generator (AWG) with a 28 GHz 6-dB bandwidth running at 56 GSa/s (Keysight USPA). The digital signal is a 56 GBd NRZ PAM2 and PAM4 sequence at ONU1 and ONU2, respectively. The available memory of the AWG allows generating bursts with a length of 4.5812 µs (128,272 symbols), followed by a pause of 4.7812 µs. The signal of ONU1 is transmitted over a 2.2-km SMF and is combined with the signal from ONU2 in an optical coupler. A variable optical attenuator (VOA) then sets the received optical power in front of the OLT. The receiver amplifies the signal using a semiconductor optical amplifier (SOA) followed by a 3-nm-wide bandpass filter. The optical signal is then detected by a 40-GHz PIN photodiode followed by an electrical amplifier and captured by a 33-GHz real-time oscilloscope sampling at 80 GSa/s. Finally, offline DSP is applied to resample the signal to twofold oversampling, which is required for the clock recovery algorithms presented. After clock recovery, a 21-tap FF linear equalizer [14] is applied, and hard decision and bit-error-ratio testing are performed. Figure 2, right, shows the receiver sensitivity for both modulation formats in optical back-to-back (BtB) and after 2.2 km transmission. For PAM4 modulation, the nonlinear distortions by the SOA become apparent for high received optical powers, hence limiting the dynamic range. Further improvement of the dynamic range can be achieved by more sensitive optical receivers, e.g., photodiodes with transimpedance amplifiers or avalanche photodiodes, or by nonlinear equalizers.

In Fig. 3, we evaluate the clock recovery performance for both algorithm architectures. The FF scheme is implemented with a 16-tap MA prior to taking the argument. In the first row of Fig. 3, the block length is set to N = 256 samples and the clock frequency offset is set to 1 ppm. The FB control loop is implemented with  $B_L = 5e-2$  to allow a faster convergence, a damping factor of  $\zeta = 0.707$ , and a feedback delay of D = 1. It can clearly be seen that the Zhu algorithm delivers an instantaneous timing estimation, while the Gardner algorithm oscillates towards the correct clock phase at the beginning of the bursts. Considering the settling time of the 16-tap MA filter, the total synchronization time of the Zhu algorithm amounts to  $10 \text{ ps} \cdot 256 \cdot 16 = 40.96 \text{ ns}$ . A further reduction can be achieved by using overlapping blocks of samples. When increasing the block length to N = 2048 (second row of Fig. 3), the synchronization speed is reduced by a factor of eight, which can be seen from the smoother transitions of the estimated clock phase for the FF algorithm. For the Gardner algorithm, the PLL becomes unstable and cannot follow the clock phase anymore. Finally, the clock frequency offset is set to 20 ppm, N = 256 is chosen, and the feedback delay is set to a more realistic value of D = 50, requiring a reduced loop bandwidth of  $B_L = 5e-3$  to achieve stable operation (third row of Fig. 3). While the FF synchronization can still track the fast clock drift, the FB scheme cannot follow the phase change anymore.

Fig. 2. Experimental setup for PON burst-mode upstream (left) with receiver sensitivity (right).
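The settling-time arithmetic for the FF scheme is simple enough to check directly. The sketch below only multiplies the sample period, the block length, and the number of MA taps; the 10 ps sample period is the value used in the product stated in the text, and the function name `ff_sync_time` is hypothetical.

```python
def ff_sync_time(sample_period_s, block_len, ma_taps):
    """Time until the MA filter over the block estimates is completely filled."""
    return sample_period_s * block_len * ma_taps


SAMPLE_PERIOD = 10e-12  # 10 ps, as used in the text

for block_len in (256, 2048):
    t = ff_sync_time(SAMPLE_PERIOD, block_len, 16)
    print(f"N = {block_len}: {t * 1e9:.2f} ns")
```

With non-overlapping blocks, the result scales linearly with the block length, so N = 2048 takes eight times as long as N = 256 for the same 16-tap MA filter.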



Fig. 3. Received burst-mode signal after resampling and estimated sampling offset over time.

## 4. Conclusion

We show that feedback delays inherently limit feedback synchronization speeds to the order of microseconds. On the other hand, feedforward structures can allow ultra-fast synchronization within tens of nanoseconds. We experimentally demonstrate synchronization for a 56-GBd burst-mode signal within 40.96 ns using feedforward clock recovery. This makes feedforward architectures a potential candidate for digital clock recovery in future high-speed PONs.

Acknowledgements: This work was funded by the Bundesministerium für Bildung und Forschung in the KIGLIS project (Grant 16KIS1228), the Open6GHub project (Grant 16KISK010), and the AI-NET-ANTILLAS project (Grant 16KIS1316).

## References

- 1. R. Bonk et al., "50G-PON: The First ITU-T Higher-Speed PON System," in IEEE Comm. Mag., 60, 48-54, 2022.
- 2. J. Zhang *et al.*, "Efficient preamble design and digital signal processing in upstream burst-mode detection of 100G TDM coherent-PON," in *J. of Opt. Commun. Netw.*, **13**, A135-A143, 2021.
- 3. D. van Veen, V. Houtsma, "Real-time validation of downstream 50G/25G and 50G/100G flexible rate PON based on Miller encoding, NRZ, and PAM4 modulation," in J. of Opt. Commun. Netw., 15, C147-C154, 2023.
- 4. N. Kaneda et al., "DSP for 50G/100G Hybrid Modulated TDM-PON," in ECOC, 2020.
- 5. J. Zhang et al., "Demo. of all-digital burst CDR for sym. 50 Gb/s/λ PON based on low-bandw. opt.," in Opt. Commun., 516, 128266, 2022.
- 6. J. Zhang et al., "Real-Time FPGA Demo. of PAM4 Burst-Mode All-Digital CDR for Single Wavelength 50G PON Appl.," in OFC, 2018.
- 7. M. Oerder, H. Meyr, "Digital Filter & Square Timing Recovery," in IEEE Trans. on Commun., 36, 605-612, 1988.
- 8. P. Matalla et al., "Hardware Comparison of FF Clock Recovery Algorithms for Opt. Communications," in OFC, 2021.
- 9. P. Matalla et al., "Real-Time Feedforward Clock Recovery for Optical Burst-Mode Transmission," in OFC, 2022.
- 10. F. Gardner, "A BPSK/QPSK Timing-Error Detector for Sampled Receivers," in IEEE Trans. on Commun., 34, 423-429, 1986.
- 11. W.-P. Zhu *et al.*, "FF Symbol Timing Recovery Scheme Using Two Samples Per Symbol," in *IEEE Trans. Circ. Syst.*, **52**, 2490-2500, 2005.
- 12. F. Gardner, "Phaselock Techniques," in *John Wiley & Sons, Inc.*, 3rd ed., 2005.
- 13. L. Barletta et al., "Impact of Loop Delay on the Perf. of Gardner Timing Recovery," in IEEE Photon. Technol. Lett., 25, 1797-1800, 2013.
- 14. R. Bonk et al., "Perspectives on & the road towards 100 Gb/s TDM PON with IM-DD," in J. Opt. Comm. Netw., 15, 518-526, 2023.