# Real-Time Demonstration of 20-Gb/s QPSK Burst-Mode Digital Coherent Reception for PON Upstream under Clock Frequency Mismatch of 1.0 MHz

Noriko Iiyama, Masamichi Fujiwara, Takuya Kanai, Hiro Suzuki, Jun-ichi Kani and Jun Terada

NTT Access Network Service Systems Laboratories, NTT Corporation, 1-1 Hikarinooka, Yokosuka, Kanagawa, 239-0847 Japan

Noriko.iiyama.sk@hco.ntt.co.jp

**Abstract:** We demonstrate real-time burst-mode coherent reception of 10-Gsymbol/s QPSK signals under a 1.0-MHz clock frequency difference between Tx and Rx. Our proposed sampling recovery scheme enables a dynamic range of 26.5 dB at a BER of $10^{-3}$.

**OCIS codes:** (060.4510) Optical Communication; (060.1660) Coherent Communications.

### 1. Introduction

With the aim of realizing higher-speed mobile fronthaul/backhaul through improved capacity, more flexible user accommodation through extended reach, and so on, enhancements to digital coherent (DC) technology have been studied to meet requirements specific to passive optical networks (PONs) that commercial DC receivers do not cover [1]. A technical challenge for DC PON is realizing adequate sampling synchronization in the burst-mode (BM) phase shift keying (PSK) Rx. BM clock data recovery (CDR) designed for currently deployed PONs inherently works only for non-return-to-zero (NRZ) signals [2]. Although sampling synchronization schemes for DC Rx in continuous systems such as long-haul links are well established, their representative techniques, such as a timing error detector that searches for zero-cross points and a feedback loop that adjusts the sampling rate of the analog-to-digital converters (ADCs) in Rx to match the Tx clock frequency, $f_{c,Tx}$, are not suitable for BM systems because their response is too slow [3]. Previous work on BM DC Rx with clock frequency mismatch, $\Delta f_c$, can be summarized as interpolating the input samples and selecting the best of them [4]. Since the best sample is chosen from the original samples and the additional ones obtained by interpolation within a symbol duration, a timing error that crosses the symbol border within a burst frame cannot be compensated, which results in a very limited $\Delta f_c$ tolerance and/or possible burst frame length. Also, since down-sampling to 1 sample/symbol is performed at the same time as the best sample is selected, the single-tap constant modulus algorithm (CMA) filter that follows the sampling recovery cannot suppress the sampling-timing dependency, which induces large performance differences between the best case and the worst case of the sampling timing [5]. The system described has, in the worst case, a 2.8-ppm tolerance against $\Delta f_c$.

This paper proposes a new sampling recovery scheme for BM DC Rx. The scheme achieves rapid response, strong $\Delta f_c$ tolerance, and sampling-timing independence by employing two key techniques: the timing error is detected by the maximum amplitude method (MAM), and sampling recovery is done by directly manipulating the number of incoming samples so that the symbol rate fits the sampling rate of the ADCs, which is based on the clock frequency in Rx, $f_{c,Rx}$ [3]. The former finds the timing error faster than the zero-cross point method and thus increases the response speed. The latter can compensate a timing error that crosses the symbol border and thus increases the $\Delta f_c$ tolerance and the possible burst frame length. The proposed sampling recovery maintains the number of samples per symbol and thus allows the following 9-tap CMA filter to make the system sampling-timing-independent. We experimentally confirm the feasibility of a real-time BM DC Rx that implements the proposed sampling recovery scheme for 20-Gb/s quadrature phase shift keying (QPSK) signals with a mismatch of 1.0 MHz between $f_{c,Tx}$ and $f_{c,Rx}$, which corresponds to 100 ppm, and report its performance.

### 2. Experimental setup and sampling recovery

The setup of our experiment is depicted in Fig. 1(a). We used two independent synthesizers to emulate the difference between $f_{c,Tx}$ and $f_{c,Rx}$. The one for Tx is set to 9.999 GHz and the other, for Rx, is set to 15.000 GHz. The Tx synthesizer directly provides its clock to the frame generator, which generates two 9.999-Gb/s frame pulses. The frame consists of a 3.84-µs preamble, a 325.12-µs payload, and a 1.28-µs end-of-burst (EOB) period; it occupies 330.24 µs in total. The payload is a pseudo-random bit sequence (PRBS) of $2^7 - 1$. The interval between frames is set to 330.70 µs, assuming a frame from another ONU; a 0.23-µs guard time is set between the frames. These frame pulses are input to the IQ modulator to generate a QPSK signal. The frame generator also provides two gate signals: one is for the SOA, which works as a shutter to generate the burst signal, and the other is for the error detector (ED) on the Rx side so that only the errors in the payload are counted. The local oscillator (LO) power is adjusted manually using a variable optical attenuator (VOA) to keep the output power of the DC Rx approximately constant in this experiment, although we have reported methods that automatically keep the DC Rx output power constant burst by burst [1][6]. Both lasers, in Tx and Rx, are external cavity lasers (ECLs) of the same model with a linewidth of 15 kHz. Polarizations are also controlled manually because this system handles only a single polarization, although signal processing that enables polarization diversity has been established. The clock of the Rx synthesizer is provided to the two 30-GSa/s ADCs, the FPGAs, and the ED via the clock distributor & divider, which distributes the input clock and divides it to meet the input clock specification of each device. Since the pulses generated by Tx are two 9.999-Gb/s frames and Rx is designed to demodulate 10-Gsymbol/s QPSK signals, this setup replicates the condition wherein there is a $\Delta f_c$ of 1.0 MHz, which corresponds to 100 ppm.

Fig. 1. (a) Experimental setup; PC = polarization controller, (b) Signal processing blocks, (c) Sampling regeneration in case 1, $f_{c,Tx} < f_{c,Rx}$, (d) case 2, $f_{c,Tx} > f_{c,Rx}$, (e) Behavior of A(k,m) when the difference of the $f_c$s is constant, (f) Estimation of buffer size
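As a quick consistency check of the figures quoted for the setup, the mismatch and the frame timing can be worked out as follows. The symbol $f_{sym,Rx}$ (the nominal 10.000-GHz symbol clock that the Rx is designed for) is notation introduced here for illustration, and reading the 330.70-µs interval as one frame slot plus a guard time on each side is an interpretation rather than something stated explicitly.

$$
\Delta f_c = f_{sym,Rx} - f_{c,Tx} = 10.000\ \text{GHz} - 9.999\ \text{GHz} = 1.0\ \text{MHz}, \qquad \frac{\Delta f_c}{f_{sym,Rx}} = 10^{-4} = 100\ \text{ppm}
$$

$$
3.84\ \mu\text{s} + 325.12\ \mu\text{s} + 1.28\ \mu\text{s} = 330.24\ \mu\text{s}, \qquad 330.24\ \mu\text{s} + 2 \times 0.23\ \mu\text{s} = 330.70\ \mu\text{s}
$$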

The real-time signal processing performed in the FPGA is shown in Fig. 1(b). The key feature is the second block, sampling recovery, which consists of timing error detection with MAM and sampling sequence regeneration. Since the Rx is designed to demodulate 10-Gsymbol/s signals and the sampling rate of the ADCs is 30 GSa/s, which corresponds to threefold oversampling, the ADCs always output three sampled values of the input signal within 100 ps, which is one symbol period *T* based on $f_{c,Rx}$. We did not use interpolation in this experiment. Hereafter, we write the sampled values of the input signals as u(k,m), where m = 0, 1, 2 is the order of the sample within *T* and k is the symbol index, so that the sampling time is $t_{k,m} = kT + (m/3)T$. The timing error detector with MAM uses the accumulated value $\sigma(m) = \sum_{k=a}^{b} A(k,m)$, where A(k,m) is the squared envelope of u(k,m), $|u(k,m)|^2$, as an indicator of when a timing error occurs. Searching for *n* such that $\sigma(n) = \max_{0 \le m \le 2} \sigma(m)$ corresponds to searching for the eye opening in the case of NRZ. Since *n* should always be the same if $f_{c,Tx}$ and $f_{c,Rx}$ are identical, the timing error can be detected by tracking the change in *n* as the window used to calculate $\sigma(m)$, whose length is *b*-*a*, is slid over the signal.
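The MAM detection described above can be sketched in a few lines of Python. This is a minimal illustration, not the FPGA implementation: the function names, the window and hop sizes, and the toy burst (a QPSK signal whose envelope peaks at one of the three sample phases) are all assumptions made for the example.

```python
import cmath

def mam_phase_index(u, a, b):
    """Return n, the sample phase m that maximizes sigma(m) over symbols a..b."""
    sigma = [0.0, 0.0, 0.0]
    for k in range(a, b + 1):
        for m in range(3):
            sigma[m] += abs(u[k][m]) ** 2   # A(k, m) = |u(k, m)|^2
    return max(range(3), key=lambda m: sigma[m])

def track_phase(u, window, hop):
    """Slide a window of `window` symbols in steps of `hop`; list (a, n) pairs."""
    return [(a, mam_phase_index(u, a, a + window - 1))
            for a in range(0, len(u) - window + 1, hop)]

def synthetic_burst(peaks, symbols_per_segment):
    """Toy QPSK burst (assumed shape): the envelope peaks at phase p per segment."""
    u = []
    for p in peaks:
        for _ in range(symbols_per_segment):
            k = len(u)
            qpsk = cmath.exp(1j * (cmath.pi / 4 + cmath.pi / 2 * (k % 4)))
            u.append([qpsk * (1.0 if m == p else 0.6) for m in range(3)])
    return u

# The peak sits at m = 1 for 100 symbols, then slips to m = 0.
u = synthetic_burst([1, 0], 100)
print(track_phase(u, window=50, hop=50))
```

A change in the reported *n* between consecutive window positions is what flags a timing error; in this toy burst the change from 1 to 0 appears at the third window.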

In the next process, sampling sequence regeneration, the number of samples in the sampled sequence is directly manipulated by copying and inserting, or eliminating, a sample so that *n* always appears once in every three samples. This behavior is illustrated in Figs. 1(c) and (d) for two cases: case 1 is $f_{c,Tx} < f_{c,Rx}$ and case 2 is $f_{c,Tx} > f_{c,Rx}$. In case 1, *n* appears with a shorter period than it should, as indicated in the upper figure of Fig. 1(c). Then *n* changes as $0\rightarrow 2$, $1\rightarrow 0$ or $2\rightarrow 1$, and the preceding timing error detection part detects the change. Such a change is recognized as a "shortage" in the number of samples within *T*, which triggers a manipulation to increase the number accordingly: a sample is copied and inserted to create a new sample, and the following samples are delayed accordingly, as shown in the middle figure of Fig. 1(c). This manipulation makes *n* appear in every three samples, as in the lower figure of Fig. 1(c). Our experiment corresponds to case 1, so this operation is employed. In case 2, *n* changes as $0\rightarrow 1$, $1\rightarrow 2$ or $2\rightarrow 0$, and this is recognized as a "surplus" in the number of samples, as shown in the upper figure of Fig. 1(d). Then the reverse of the case-1 manipulation is performed: a sample is eliminated and the following samples are advanced accordingly. The result in the lower figure of Fig. 1(d) is the same as that in Fig. 1(c). These manipulations actually change the number of samples, making it possible to compensate $\Delta f_c$ beyond a symbol border and thus allowing a large $\Delta f_c$ and a long burst frame length.

Fig. 2. (a) Behavior of buffer overflow, (b) Real-time BER results, (c) Transition of BER at the received power of -37 dBm
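The insert/eliminate manipulation can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the names `classify`, `regenerate`, and `peak_phases` are hypothetical, samples are represented by their envelope only (1.0 marks the peak), and the exact position at which a sample is duplicated or dropped is chosen for the example because the text does not specify it.

```python
def classify(n_prev, n_new):
    """Map a change in n to the manipulation it triggers (None if n is unchanged)."""
    step = (n_new - n_prev) % 3
    if step == 2:          # 0->2, 1->0, 2->1: peak arrives one sample early
        return "shortage"
    if step == 1:          # 0->1, 1->2, 2->0: peak arrives one sample late
        return "surplus"
    return None

def regenerate(samples, events):
    """Apply (index, action) events to a flat sample stream (3 samples/symbol)."""
    actions = dict(events)
    out = []
    for i, s in enumerate(samples):
        action = actions.get(i)
        if action == "surplus":
            continue       # eliminate this sample; later samples advance
        out.append(s)
        if action == "shortage":
            out.append(s)  # copy and insert; later samples are delayed
    return out

def peak_phases(samples):
    """Phase (index mod 3) at which each envelope peak appears."""
    return [i % 3 for i, s in enumerate(samples) if s == 1.0]

# Case 1 in the text: n slips from 1 to 0 after four symbols.
case1 = [0.5, 1.0, 0.5] * 4 + [1.0, 0.5, 0.5] * 4
fixed1 = regenerate(case1, [(11, classify(1, 0))])

# Case 2 in the text: n slips from 1 to 2 after four symbols.
case2 = [0.5, 1.0, 0.5] * 4 + [0.5, 0.5, 1.0] * 4
fixed2 = regenerate(case2, [(12, classify(1, 2))])

print(peak_phases(case1), peak_phases(fixed1))
print(peak_phases(case2), peak_phases(fixed2))
```

In both toy cases the peak returns to the same phase in every group of three samples after regeneration, and the stream length changes by exactly one sample, which is how the timing error can be followed across symbol borders.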

Figure 1(e) illustrates the ideal behavior of A(k, m) when the difference between $f_{c,Tx}$ and $f_{c,Rx}$ is constant. The three lines correspond to m = 0, 1, 2. Each time *n* changes, the processing of Fig. 1(c) or (d) is repeated, which requires a buffer of a certain size to store the sample sequence. The number of samples that need to be stored in the buffer can be estimated from $\Delta f_c$ and the burst frame length, as shown in Fig. 1(f). These results can be derived by simple arithmetic and indicate that approximately 990 samples need to be stored in the delay direction under the conditions of our experiment, namely a $\Delta f_c$ of 100 ppm and a burst frame length of 330 µs. We implemented a buffer that can accommodate 2560 samples in the delay direction; this corresponds to a frame length of 853 µs. Note that this implementation is applicable only to BM systems because the buffer size would need to be infinite for a continuous system.
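The buffer figures quoted above can be reproduced with the following arithmetic sketch. It assumes that one sample is inserted each time the accumulated timing slip reaches one ADC sampling interval at 30 GSa/s; the function names are hypothetical and the model is a plausible reading of the "simple arithmetic", not a reproduction of Fig. 1(f).

```python
def drift_samples(delta_ppm, duration_us, sample_rate_gsa=30.0):
    """Samples of accumulated timing slip after `duration_us` at `delta_ppm`."""
    slip_seconds = delta_ppm * 1e-6 * duration_us * 1e-6
    return slip_seconds * sample_rate_gsa * 1e9

def max_duration_us(buffer_samples, delta_ppm, sample_rate_gsa=30.0):
    """Longest burst (in microseconds) that a buffer of `buffer_samples` can absorb."""
    samples_per_second = delta_ppm * 1e-6 * sample_rate_gsa * 1e9
    return buffer_samples / samples_per_second * 1e6

print(round(drift_samples(100, 330.24), 1))   # 990.7 samples for the 330.24-us frame
print(round(max_duration_us(2560, 100), 1))   # 853.3 us for the 2560-sample buffer
```

Under this model the slip rate at 100 ppm is 3 samples per microsecond, so the required depth grows linearly with the burst length.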

### 3. Results and Discussion

First, we measured the time until the buffer overflows, using the monitors of the FPGA, to confirm the validity of the estimation in Fig. 1(f). The result, Fig. 2(a), shows that the buffer overflowed 852 µs after the frame was detected, which is in good agreement with the estimation. Figure 2(b) shows the bit error rate (BER) measured in real time by the ED versus the optical input power to the coherent Rx in Fig. 1. We verified that the 9.999-Gsymbol/s QPSK signal was successfully received by the BM DC Rx designed to demodulate 10.000-Gsymbol/s signals and that the BER was held below $10^{-3}$ at a received power of -37.5 dBm. We increased the received power to -11 dBm in 1-dB steps and confirmed that the dynamic range was at least 26.5 dB. We also evaluated the response speed of the sampling recovery by capturing whole-frame data using a digital sampling oscilloscope (DSO) and observing the BER transition off-line. The result is shown in Fig. 2(c). The BER falls below $10^{-3}$ within ~22 µs and settles within ~60 µs. This result agrees with the bit count displayed on the ED, which is the number of bits used to count errors after synchronization. Fig. 2(c) also confirms again that the buffer did not overflow and worked well up to the end of the frame. Throughout the experiment, there was no discernible difference in performance between the best case and the worst case of the sampling timing. This is thanks to the 9-tap CMA filter after the sampling recovery, which suppresses the problems raised by the sampling timing.
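For readers unfamiliar with the CMA filter mentioned above, the following Python sketch shows a textbook single-polarization CMA tap update. It is not the authors' FPGA implementation: the T/3 tap spacing (9 taps spanning 3 symbols), the center-spike initialization, the step size `mu`, the unit target radius, and the function names are all assumptions made for illustration.

```python
import cmath

def cma_step(w, x, mu, radius=1.0):
    """One CMA update: returns (y, new_w) for tap vector w and input window x."""
    y = sum(wi * xi for wi, xi in zip(w, x))       # filter output y = w^T x
    e = radius - abs(y) ** 2                       # constant-modulus error
    new_w = [wi + mu * e * y * xi.conjugate() for wi, xi in zip(w, x)]
    return y, new_w

def cma_filter(samples, n_taps=9, sps=3, mu=1e-3):
    """Run the filter over a 3-sample/symbol stream, one output per symbol."""
    w = [0j] * n_taps
    w[n_taps // 2] = 1 + 0j                        # center-spike initialization
    out = []
    for start in range(0, len(samples) - n_taps + 1, sps):
        y, w = cma_step(w, samples[start:start + n_taps], mu)
        out.append(y)
    return out, w

# Toy input: 20 unit-modulus QPSK symbols, each held for three samples.
symbols = [cmath.exp(1j * (cmath.pi / 4 + cmath.pi / 2 * (k % 4))) for k in range(20)]
samples = [s for s in symbols for _ in range(3)]
outputs, taps = cma_filter(samples)
print(len(outputs), max(abs(abs(y) - 1.0) for y in outputs))
```

Because the update depends only on the output modulus, the taps can adapt to whichever sampling phase the preceding stage delivers, which is why keeping several samples per symbol ahead of the filter matters.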

### 4. Conclusion

We experimentally verified a 20-Gb/s QPSK real-time BM DC Rx under a clock frequency mismatch of 1.0 MHz between Tx and Rx. Our burst-mode sampling recovery scheme combines a timing error detector based on MAM with sampling sequence regeneration that directly manipulates the number of samples so that the symbol rate of the incoming signals is synchronized to the sampling rate of the ADCs in Rx. A real-time demonstration showed that, although the clock frequency mismatch induces some penalty, the system maintains a sensitivity of -37.5 dBm and a dynamic range of 26.5 dB at the BER criterion of $10^{-3}$. This BM DC receiver meets the power budgets specified in most PON standards, such as 10G-EPON and NG-PON2, and also leaves some room to improve the system by extending the distance between the OLT and the ONUs.

### References

[1] R. Koma *et al.*, "Demonstration of real-time burst-mode digital coherent reception with wide dynamic range in DSP-based PON upstream," JLT, Vol. 35, No. 8, 2017, pp. 1392-1398.

[2] J. Terada *et al.*, "A 10.3125Gb/s Burst-Mode CDR Circuit using a ΔΣ DAC," ISSCC 2008, 11.4.

[3] S. Sampei *et al.*, "Improvement of delay spread immunity by using symbol timing synchronization based on maximum likelihood estimation for 16QAM/TDMA diversity receivers," Electron. Lett., Vol. 29, No. 22, 1993, pp. 1917-1918.

[4] F. Vacondio *et al.*, "Experimental demonstration of a PDM QPSK real-time burst mode coherent receiver in a packet switched network," ECOC 2012, Tu.3.A.1.

[5] K. Kikuchi *et al.*, "Clock recovering characteristics of adaptive finite-impulse-response filters in digital coherent optical receivers," Opt. Express, Vol. 19, No. 6, 2011, pp. 5611-5619.

[6] S-Y. Kim *et al.*, "OLT receiver for power normalization of burst OFDM signals enabling OFDM/TDMA-PON," PTL, Vol. 26, No. 24, 2014, pp. 2469-2472.