# Real-Time Feedforward Clock Recovery for Optical Burst-Mode Transmission

Patrick Matalla\*, Md Salek Mahmud, Christoph Füllner, Wolfgang Freude, Christian Koos and Sebastian Randel\*\*

Institute of Photonics and Quantum Electronics (IPQ), Karlsruhe Institute of Technology (KIT), Engesserstr. 5, 76131 Karlsruhe, Germany \* patrick.matalla@kit.edu, \*\* sebastian.randel@kit.edu

**Abstract:** We compare three feedforward non-data-aided clock recovery algorithms suitable for burst-mode reception in PONs and datacenters. Our experimental setup allows real-time OOK transmission at 3 Gbit/s. The tolerable clock frequency mismatch is 475 ppm. © 2022 The Author(s)

#### 1. Introduction

Clock recovery is an essential building block in communication systems: it extracts the sampling clock information from the received signal to correct the timing phase offset between transmitter and receiver. Timing synchronization can be achieved either in a feedback architecture based on a phase-locked loop (PLL) or in a feedforward (FF) architecture. Provided that relatively stable sampling oscillators are used and data transmission is continuous, feedback structures are known for their stable phase tracking. For this reason, this architecture is frequently used, e.g., in long-haul point-to-point systems. However, due to their relatively long acquisition time, such control loops do not meet the requirement of fast synchronization in burst-switched systems such as passive optical networks (PONs) [1] or data centres [2]. In those systems, FF architectures are preferred because they provide an instantaneous timing estimate. Furthermore, their improved high-frequency jitter performance allows the use of low-cost oscillators [3]. In FF schemes, the received signal is split into two paths. The signal in the first path is used to determine a timing estimate, which subsequently corrects the sampling offset of the delayed signal in the second path by means of a digital delay element, usually a Lagrange interpolator. In PONs, the timing estimate is typically derived from a known preamble. Recently, the authors of [4] showed that a non-data-aided FF algorithm allows the preamble to be shortened or even omitted. A prominent example of such an algorithm is the square timing recovery algorithm proposed by Oerder and Meyr [5–7]. However, since this algorithm requires an oversampling factor larger than two, it is often considered too complex for high-speed optical systems, in which massive parallelization of the signal processing is required.
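The digital delay element of an FF scheme can be illustrated with a cubic (four-tap) Lagrange fractional-delay interpolator. The Python sketch below is only an illustration under stated assumptions and is not the receiver's implementation. The interpolator order is our choice. The function names `lagrange_cubic` and `delay_signal` are hypothetical. Output samples whose four taps would fall outside the input block are simply skipped.

```python
import math


def lagrange_cubic(x, m, mu):
    """Cubic Lagrange interpolation through x[m-1], x[m], x[m+1], x[m+2],
    evaluated at the fractional position m + mu (0 <= mu < 1)."""
    c_m1 = -mu * (mu - 1.0) * (mu - 2.0) / 6.0
    c_0 = (mu + 1.0) * (mu - 1.0) * (mu - 2.0) / 2.0
    c_1 = -(mu + 1.0) * mu * (mu - 2.0) / 2.0
    c_2 = (mu + 1.0) * mu * (mu - 1.0) / 6.0
    return c_m1 * x[m - 1] + c_0 * x[m] + c_1 * x[m + 1] + c_2 * x[m + 2]


def delay_signal(x, delay):
    """Return (k, y_k) pairs with y_k ~ x(k - delay); delay is given in samples.
    Indices whose four interpolator taps fall outside x are skipped."""
    out = []
    for k in range(len(x)):
        pos = k - delay
        m = math.floor(pos)   # integer part of the sampling position
        mu = pos - m          # fractional part, 0 <= mu < 1
        if m - 1 < 0 or m + 2 >= len(x):
            continue
        out.append((k, lagrange_cubic(x, m, mu)))
    return out


def cubic(t):
    return t ** 3 - 2.0 * t


samples = [cubic(n) for n in range(12)]
for k, y in delay_signal(samples, 1.25):
    print(k, y, cubic(k - 1.25))
```

The two printed columns should agree up to rounding. A cubic Lagrange interpolator reproduces polynomials of degree up to three exactly.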

In this work, we compare the well-known Oerder and Meyr (OEM) algorithm with the algorithm proposed by Zhu [3,8] and the one proposed by Barton and Al-Jalili (BAJ) [9]. In a real-time OOK transmission at 3 Gbit/s, we experimentally determine the maximum tolerable clock frequency mismatch. Furthermore, we demonstrate that all three FF algorithms provide instantaneous timing acquisition in a burst-mode system without requiring a preamble.

#### 2. Feedforward Clock Recovery Algorithms

A time delay $\tau$ of the real-valued received signal $x(t)$ in the time domain results in a linear phase shift in the frequency domain. The algorithms discussed in this work extract the timing information from this linear phase shift in the cyclostationary spectrum. The OEM algorithm calculates the timing estimate $\hat{\tau}$, normalized to the symbol period, from a block of $L$ symbols at an oversampling ratio of $N > 2$. In this work, we use $N = 4$. After squaring, the signal $x_k$ is no longer zero-mean, and spectral components at integer multiples of the symbol rate $f_{sym}$ are generated. The phase of the resulting spectral line at $f_{sym}$ is given by the argument of the corresponding Fourier coefficient. This line lies at the frequency bin with index $m = L$ of the $NL$-point DFT. Since this phase equals $-2\pi\tau$, the estimated clock phase is as follows, where the last step uses $m = L$ and $N = 4$:

$$\hat{\tau} = -\frac{1}{2\pi} \arg\left\{ \sum_{k=0}^{NL-1} x_k^2 \, e^{-j2\pi k \frac{m}{NL}} \right\} = -\frac{1}{2\pi} \arg\left\{ \sum_{k=0}^{NL-1} x_k^2 \, e^{-j\pi \frac{k}{2}} \right\} . \tag{1}$$
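Eq. (1) can be evaluated in a few lines. The pure-Python sketch below is illustrative only and rests on several assumptions. The received signal is zero-mean with $\pm 1$ symbols, e.g., OOK after DC removal. The pulses are raised-cosine. The sketch uses $N = 4$ samples per symbol and the delay convention $x(t) = s(t - \tau T)$. The helper names `rc_pulse`, `make_signal` and `oem_estimate` are hypothetical and not part of the hardware implementation.

```python
import cmath
import math
import random


def rc_pulse(t, beta):
    """Raised-cosine pulse; t is given in symbol periods."""
    if abs(t) < 1e-12:
        return 1.0
    if abs(abs(2.0 * beta * t) - 1.0) < 1e-9:  # removable singularity at |t| = 1/(2*beta)
        u = 1.0 / (2.0 * beta)
        return math.pi / 4.0 * math.sin(math.pi * u) / (math.pi * u)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def make_signal(num_symbols, sps, beta, tau, seed=1, span=16):
    """Random +/-1 symbols, raised-cosine shaped (pulses truncated to +/-span
    symbols) and delayed by tau symbol periods (|tau| < 1)."""
    rng = random.Random(seed)
    offset = span + 1  # guard symbols before and after the observed block
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols + 2 * offset)]
    x = []
    for k in range(num_symbols * sps):
        t = k / sps - tau  # sampling instant on the symbol grid
        c = math.floor(t)
        x.append(sum(symbols[i + offset] * rc_pulse(t - i, beta)
                     for i in range(c - span, c + span + 1)))
    return x


def oem_estimate(x, sps=4):
    """Eq. (1): phase of the squared signal's spectral line at f_sym (DFT bin m = L).
    Returns the timing estimate in symbol periods, wrapped to [-0.5, 0.5]."""
    tone = sum(xk * xk * cmath.exp(-2j * math.pi * k / sps) for k, xk in enumerate(x))
    return -cmath.phase(tone) / (2.0 * math.pi)


x = make_signal(num_symbols=512, sps=4, beta=0.5, tau=0.1)
print(f"{oem_estimate(x):.3f}")
```

With these settings, the printed value should lie close to the applied delay of 0.1 symbol periods. The estimate is only defined modulo one symbol period, which is why the hardware follows it with an unwrap stage.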

Zhu extended the OEM scheme so that it can be applied at twofold oversampling ($N = 2$). Prior to the squaring operation, the received signal $x_k$ is multiplied by a complex exponential at half the symbol rate, which shifts its spectrum by $-f_{sym}/2$. The result is then filtered with a lowpass filter $h$ of length $P$, i.e.,

$$\hat{\tau} = -\frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{NL-1} \left( \sum_{p=0}^{P-1} x_{k-p} \,\mathrm{e}^{-\mathrm{j}\,2\pi(k-p)/4} \,h_p \right)^2 \right\} \,. \tag{2}$$
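A corresponding sketch of Eq. (2) at $N = 2$ samples per symbol follows. It is a hedged illustration and not the authors' hardware design. Without the filter, the sum reduces to $\sum_k x_k^2 (-1)^k$, which is real and carries no timing information. This explains why the plain OEM scheme fails at twofold oversampling. The three-tap lowpass filter $h = [1/4, 1/2, 1/4]$ used here is a hypothetical choice. Its zero at half the sampling rate suppresses the mirrored band edge responsible for that cancellation. The first $P - 1$ outputs are skipped instead of using samples from a previous block. The helper names and test-signal assumptions are the same as in the OEM sketch: $\pm 1$ symbols, raised-cosine pulses and $x(t) = s(t - \tau T)$.

```python
import cmath
import math
import random


def rc_pulse(t, beta):
    """Raised-cosine pulse; t is given in symbol periods."""
    if abs(t) < 1e-12:
        return 1.0
    if abs(abs(2.0 * beta * t) - 1.0) < 1e-9:  # removable singularity at |t| = 1/(2*beta)
        u = 1.0 / (2.0 * beta)
        return math.pi / 4.0 * math.sin(math.pi * u) / (math.pi * u)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def make_signal(num_symbols, sps, beta, tau, seed=1, span=16):
    """Random +/-1 symbols, raised-cosine shaped and delayed by tau symbol periods (|tau| < 1)."""
    rng = random.Random(seed)
    offset = span + 1  # guard symbols before and after the observed block
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols + 2 * offset)]
    x = []
    for k in range(num_symbols * sps):
        t = k / sps - tau
        c = math.floor(t)
        x.append(sum(symbols[i + offset] * rc_pulse(t - i, beta)
                     for i in range(c - span, c + span + 1)))
    return x


def zhu_estimate(x, h=(0.25, 0.5, 0.25)):
    """Eq. (2) at two samples per symbol; returns tau_hat in symbol periods."""
    P = len(h)
    # Frequency shift by -f_sym/2, i.e. exp(-j*2*pi*k/4) at two samples per symbol.
    shifted = [xk * cmath.exp(-2j * math.pi * k / 4) for k, xk in enumerate(x)]
    acc = 0j
    for k in range(P - 1, len(x)):  # only outputs whose filter taps lie inside the block
        y = sum(shifted[k - p] * h[p] for p in range(P))  # FIR lowpass
        acc += y * y                                       # square and accumulate
    return -cmath.phase(acc) / (2.0 * math.pi)


x = make_signal(num_symbols=512, sps=2, beta=0.5, tau=0.1)
print(f"{zhu_estimate(x):.3f}")
```

The printed estimate should again be close to 0.1 symbol periods. Setting `h=(1.0,)` reproduces the degenerate OEM case, in which the sum is real and the estimate collapses to 0 or $\pm 0.5$.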

In contrast, the BAJ algorithm exploits the spectral redundancy of the cyclostationary signal directly in the frequency domain. First, an $N$-point FFT of the received signal with roll-off factor $\beta$ is computed. Here, $N$ denotes the FFT length rather than the oversampling ratio. The summation limits in Eq. (3) correspond to twofold oversampling, for which bin $N/2$ lies at $f_{sym}$. Afterwards, the clock tone is generated by multiplying the lower sideband $X_n$ with the complex conjugate of the upper sideband $X_{n+N/2}$. The phase of the resulting clock tone is proportional to the timing offset and can be extracted from

$$\hat{\tau} = \frac{1}{2\pi} \arg \left\{ \sum_{n=(1-\beta)N/4}^{(1+\beta)N/4-1} X_n X_{n+\frac{N}{2}}^* \right\} . \tag{3}$$
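The clock tone inside Eq. (3) can be sketched with a small radix-2 FFT in plain Python. The sketch assumes the standard DFT definition $X_n = \sum_k x_k e^{-j2\pi kn/N}$ and a delay defined by $x(t) = s(t - \tau T)$. Under these assumptions, the phase of $\sum_n X_n X_{n+N/2}^*$ equals $-2\pi\tau$, the same relation as for the spectral line in Eq. (1). The sketch therefore converts the phase with $-1/(2\pi)$. The $+1/(2\pi)$ printed in Eq. (3) corresponds to the opposite sign convention for $\tau$, so the sign should be matched to the convention expected by the position calculator. A single FFT over the whole test block is used, whereas the hardware FFT size may differ. The helper names are hypothetical, and the test signal follows the same assumptions as the previous sketches.

```python
import cmath
import math
import random


def rc_pulse(t, beta):
    """Raised-cosine pulse; t is given in symbol periods."""
    if abs(t) < 1e-12:
        return 1.0
    if abs(abs(2.0 * beta * t) - 1.0) < 1e-9:  # removable singularity at |t| = 1/(2*beta)
        u = 1.0 / (2.0 * beta)
        return math.pi / 4.0 * math.sin(math.pi * u) / (math.pi * u)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)


def make_signal(num_symbols, sps, beta, tau, seed=1, span=16):
    """Random +/-1 symbols, raised-cosine shaped and delayed by tau symbol periods (|tau| < 1)."""
    rng = random.Random(seed)
    offset = span + 1  # guard symbols before and after the observed block
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols + 2 * offset)]
    x = []
    for k in range(num_symbols * sps):
        t = k / sps - tau
        c = math.floor(t)
        x.append(sum(symbols[i + offset] * rc_pulse(t - i, beta)
                     for i in range(c - span, c + span + 1)))
    return x


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)]
            + [even[k] - twiddled[k] for k in range(n // 2)])


def baj_clock_tone(x, beta):
    """Sum of X_n * conj(X_{n+N/2}) over the lower sideband, as inside Eq. (3).
    x holds N samples at two samples per symbol; N must be a power of two."""
    N = len(x)
    X = fft(x)
    n_start = round((1 - beta) * N / 4)
    n_stop = round((1 + beta) * N / 4)  # exclusive: the sum ends at (1+beta)N/4 - 1
    return sum(X[n] * X[n + N // 2].conjugate() for n in range(n_start, n_stop))


x = make_signal(num_symbols=512, sps=2, beta=0.5, tau=0.1)
tone = baj_clock_tone(x, beta=0.5)
tau_hat = -cmath.phase(tone) / (2.0 * math.pi)  # same sign convention as Eq. (1)
print(f"{tau_hat:.3f}")
```

For the block above, `tau_hat` should lie close to 0.1. Restricting the sum to the roll-off region matters. Summed over all $n$, the same product reduces to $N\sum_k x_k^2(-1)^k$, which is real and carries no timing information.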

#### 3. Hardware Implementation

Our real-time system is designed for 3 GBd OOK (i.e., 3 Gbit/s) and is similar to the implementation presented in [10]. It features a sampling rate of 4 GSa/s, a roll-off factor of $\beta = 0.2$ and a 250 MHz FPGA clock. Figure 1(a) provides a block diagram of the three clock recovery algorithms. In the summation blocks, the parallel samples are summed up to a single value per clock cycle. This reduces the update rate to 250 MHz for the Zhu and OEM algorithms and to 62.5 MHz for the BAJ algorithm. To allow a fair comparison, the lengths of the subsequent moving average (MA) filters are chosen such that, for all three realizations, the first null of the sinc-shaped frequency response lies at around 2 MHz. The unwrap function extends the phase information over 23 unit intervals. Afterwards, a position calculator determines the integer delay for the elastic buffer and the fractional delay for the Lagrange interpolator. As shown in Fig. 1(b), the three clock recovery algorithms require similar hardware resources. Only the OEM structure requires a more complex resampling stage to realize fourfold oversampling, although its Lagrange interpolator can still run at twofold oversampling. Nevertheless, the Zhu algorithm requires fewer resources. If the FFT is included, the BAJ algorithm utilizes the most hardware. However, the Fourier transform might be reused for equalization.
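The post-processing chain described above (MA filter, unwrap, position calculator) can be sketched in Python. Several assumptions apply. The MA lengths 128 (at 250 MHz) and 32 (at 62.5 MHz) are hypothetical values, chosen only because they place the first null of an $M$-tap MA at $f_u/M \approx 2$ MHz. The lengths used in the hardware are not given in this paper. The unwrap below is not limited to 23 unit intervals as in the hardware. The position calculator assumes a twofold-oversampled interpolator input. All function names are hypothetical.

```python
import math


def ma_first_null(update_rate_hz, length):
    """First zero of the sinc-shaped response of a length-tap moving average."""
    return update_rate_hz / length


def ma_magnitude(f_hz, update_rate_hz, length):
    """|H(f)| = |sin(pi f M / f_u) / (M sin(pi f / f_u))| of an M-tap moving average."""
    x = math.pi * f_hz / update_rate_hz
    if abs(math.sin(x)) < 1e-15:  # f is a multiple of the update rate: |H| = 1
        return 1.0
    return abs(math.sin(length * x) / (length * math.sin(x)))


def unwrap_ui(estimates):
    """Remove 1-UI jumps from timing estimates wrapped to about [-0.5, 0.5]."""
    out = [estimates[0]]
    for e in estimates[1:]:
        step = e - out[-1]
        step -= round(step)  # choose the step of smallest magnitude
        out.append(out[-1] + step)
    return out


def split_delay(tau_ui, sps=2):
    """Position calculator: integer part (elastic buffer) and fractional part
    (Lagrange interpolator) of the delay, both in samples."""
    delay = tau_ui * sps
    integer = math.floor(delay)
    return integer, delay - integer


print(ma_first_null(250e6, 128), ma_first_null(62.5e6, 32))  # prints 1953125.0 1953125.0
taus = unwrap_ui([0.4, -0.45, -0.3])  # about [0.4, 0.55, 0.7]
print([split_delay(t) for t in taus])
```

Choosing power-of-two MA lengths keeps the division in the averager a simple bit shift. This is a common FPGA design choice but an assumption here.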



Fig. 1. (a) DSP architectures of the three clock recovery algorithms. (b) Hardware utilization of the Xilinx RFSoC ZCU111 System-on-Chip (SoC).

#### 4. Experimental Results

The experimental setup of the optical back-to-back system is shown in Fig. 2. A PRBS15 sequence is generated in the transmit SoC, pulse-shaped and converted to an analog voltage signal. The common-mode voltage of the DAC output is removed by DC blocks (DC-B). The signal is then lowpass filtered (LPF) to suppress spectral images and converted to a single-ended signal using a balun. It then modulates an optical carrier at a wavelength of 1550 nm with an optical power of 10 dBm, generated by an external cavity laser (ECL). The optical signal is detected by a photodiode with a transimpedance amplifier, which is differentially connected to the ADC of the receiver SoC. The transmitter-side oscillator is integrated on the FPGA board and runs at about 3932.145 MHz. A signal generator was used as an external clock source to sweep the clock frequency from 3932.145 MHz to 3935.045 MHz, i.e., to vary the clock mismatch from 0 MHz to 2.9 MHz.

Fig. 2. Setup for a 3 Gbit/s optical back-to-back transmission. BERT: BER tester.

Fig. 3 shows that the BAJ algorithm is significantly more resilient to large clock frequency offsets, tolerating up to 475 ppm. The other two methods already degrade at 275 ppm. The sudden performance degradation at a clock mismatch of around 2 MHz is caused by the MA lowpass filtering of the timing estimates. Furthermore, only a slight penalty is observed between electrical and optical back-to-back transmission. Since the elastic buffer over- or underflows every 23 unit intervals, one clock cycle (i.e., 12 bits) has to be skipped or repeated. This results in an error floor. On this basis, a lower limit of the BER can be computed from the frequency offset, as depicted in Fig. 3. The BAJ algorithm closely approaches this limit for clock mismatches of up to 1.75 MHz.

Fig. 3. Experimentally determined BER versus clock mismatch in MHz and ppm. An FEC limit of 0.38 % is used.

Finally, fast synchronization is demonstrated in a burst-mode receiver. For this purpose, a burst pattern of 125 µs on and 125 µs off is generated. Fig. 4 shows the clock phase estimates and the burst signal over time. First, consider the timing estimate during the pause interval. Here, the clock synchronization operates on noise, which has a random phase in the frequency domain. The lowpass filters of the respective synchronizers average over this noise, resulting in an irregular timing estimate without practical significance. As soon as the burst starts, the synchronizers respond instantaneously. In a feedforward scheme, the settling time of the MA filters can be compensated by adjusting the delay line of the signal path. Fig. 4 also shows that clock phase estimation remains possible despite the RC transients [11] of the burst.



Fig. 4. Received burst-mode signal after the receive filter and timing estimation over time.
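Fig. 3 reports the clock mismatch both in MHz and in ppm. The two are related through the nominal transmitter oscillator frequency of about 3932.145 MHz, as the small Python sketch below illustrates. The function names are hypothetical. The drift helper additionally assumes that the 3 GBd symbol rate scales with the transmitter oscillator. Under that assumption, a relative mismatch $\delta$ makes the sampling phase slide by $\delta \cdot f_{sym}$ unit intervals per second.

```python
F_TX_HZ = 3932.145e6  # transmitter oscillator frequency quoted in Section 4


def mismatch_ppm(f_nominal_hz, f_actual_hz):
    """Relative clock frequency mismatch in parts per million."""
    return (f_actual_hz - f_nominal_hz) / f_nominal_hz * 1e6


def mismatch_hz(ppm, f_nominal_hz):
    """Absolute frequency offset corresponding to a mismatch given in ppm."""
    return ppm * 1e-6 * f_nominal_hz


def timing_drift_ui_per_s(ppm, symbol_rate_baud):
    """Sampling-phase drift in unit intervals per second (assumes the symbol
    rate scales with the transmitter oscillator)."""
    return ppm * 1e-6 * symbol_rate_baud


print(f"{mismatch_ppm(F_TX_HZ, 3935.045e6):.1f} ppm")  # prints 737.5 ppm
print(f"{mismatch_hz(475, F_TX_HZ) / 1e6:.3f} MHz")    # prints 1.868 MHz
print(f"{mismatch_hz(275, F_TX_HZ) / 1e6:.3f} MHz")    # prints 1.081 MHz
print(f"{timing_drift_ui_per_s(475, 3e9):.0f} UI/s")   # prints 1425000 UI/s
```

For example, the full 2.9 MHz sweep corresponds to about 737.5 ppm. Under the stated assumption, the 475 ppm tolerated by the BAJ algorithm corresponds to a timing drift of roughly $1.4 \times 10^6$ unit intervals per second at 3 GBd.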

#### 5. Conclusion

With the Zhu and BAJ techniques, we demonstrate two feedforward algorithms that provide fast clock synchronization. They tolerate clock frequency mismatches of up to 275 ppm (Zhu) and 475 ppm (BAJ) at a hardware cost comparable to that of feedback algorithms. They are therefore potential candidates for future access and data center networks and offer an efficient alternative to the popular OEM algorithm.

#### 6. Acknowledgements

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, projects 40318716262 and 359861158). It was also funded by the Bundesministerium für Bildung und Forschung (BMBF, German Federal Ministry of Education and Research; funding project KIGLIS, grant 16KIS1228).

#### References

- 1. N. Iiyama et al., "Real-Time Demonstration of 20-Gb/s QPSK Burst-Mode Digital Coherent Reception for PON Upstream under Clock Frequency Mismatch of 1.0 MHz," in *OFC*, 2020.
- 2. K. A. Clark et al., "Synchronous Sub-Nanosecond Clock and Data Recovery for Optically-Switched Data Centres using Clock Phase Caching," in *Nature Electronics*, Vol. 3, pp. 426-433, 2020.
- 3. C. Fludger et al., "Jitter Tolerant Clock Recovery for Coherent Optical Receivers," in *OFC*, 2013.
- 4. J. Zhang et al., "Efficient preamble design and digital signal processing in upstream burst-mode detection of 100G TDM coherent-PON," in *Journal of Optical Communications and Networking*, Vol. 13, no. 2, 2021.
- 5. J. Zhang et al., "Real-Time FPGA Demo. of PAM-4 BM All-Digital CDR for Single λ 50G PON Appl.," in *OFC*, 2018.
- 6. X. Zhou, "Efficient Clock and Carrier Recovery Algorithms for Single-Carrier Coherent Optical Systems: A Systematic Review on Challenges and Recent Progress," in *IEEE Signal Processing Magazine*, Vol. 31, no. 2, 2014.
- 7. M. Oerder and H. Meyr, "Digital Filter and Square Timing Recovery," in *IEEE Transactions on Communications*, Vol. 36, no. 5, 1988.
- 8. W.-P. Zhu et al., "Feedforward Symbol Timing Recovery Scheme Using Two Samples Per Symbol," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, Vol. 52, pp. 2490-2500, 2005.
- 9. S. K. Barton and Y. O. Al-Jalili, "A Symbol Timing Recovery Scheme Based on Spectral Redundancy," in *IEE Colloquium on Advanced Modulation and Coding Techniques for Satellite Communications*, 1992.
- 10. P. Matalla et al., "Hardware Comparison of FF Clock Recovery Algorithms for Opt. Communications," in *OFC*, 2021.
- 11. D. van Veen and V. E. Houtsma, "50 Gbps Low Complex BM Coherent Detection for TDM PON," in *ECOC*, 2018.

Disclaimer: Preliminary paper, subject to publisher revision.