# Rate-Adaptive Concatenated Polar-Staircase Codes for Data Center Interconnects

Tayyab Mehmood<sup>1</sup>, Metodi P. Yankov<sup>1</sup>, Anders Fisker<sup>2</sup>, Kim Gormsen<sup>2</sup> and Søren Forchhammer<sup>1</sup>

<sup>1</sup>DTU Fotonik, Technical University of Denmark, 2800 Kongens Lyngby, Denmark <sup>2</sup>Zeuxion, 3500 Værløse, Denmark tayy@fotonik.dtu.dk

**Abstract:** A rate-adaptive concatenated code, consisting of an outer staircase code and an inner polar code, is proposed. Short-blocklength inner polar codes offer rate-adaptivity and more than 0.35 dB gain compared to the 400ZR data-center-interconnect error-correcting code. © 2020 The Author(s)

# 1. Introduction

The growing demand for internet applications causes an exponential increase in data center (DC) traffic, not only in client-server traffic but also in the traffic exchanged between DCs. Near-capacity-achieving coded modulation (CM) schemes based on codes such as low-density parity-check (LDPC) codes [1] and polar codes [2], combined with advanced modulation formats, contribute to increasing the transmission distance and rates for client-server (long-haul) traffic. The pursuit of high coding gain (CG) and rate-adaptivity has, however, led to a significant increase in power consumption. Compared to high-performance soft-decision (SD) codes, hard-decision (HD) forward error correction (FEC) solutions such as turbo product codes (TPC) [3] and staircase codes [4] require decoders of lower complexity. An implementation comparison shows that an HD decoder [5] consumes roughly an order of magnitude less power than an SD decoder [6], at the expense of a $1 \sim 2$ dB loss. Additionally, iterative-decoding and sparse-graph-based SD FEC solutions are known to have an error floor above a bit-error-rate (BER) of $10^{-15}$, which can be mitigated at the cost of increased processing power, either by an outer concatenated code [7] or by post-processing [8] schemes.

The concatenated FEC (C-FEC) solution [7] for the optical transport network (OTN) is composed of a high-performance but complex inner code and a comparatively simple outer code that removes the error floor. To reduce the complexity, a concatenated LDPC-staircase code was proposed in [9]. It relies on the outer HD code to meet the low BER requirement, which allows a less complex, large-blocklength inner LDPC code whose only task is to reduce the BER to below the threshold of the outer code.

Driven by the market needs of future data-center interconnects (DCIs) and optical metro networks, the Optical Internetworking Forum (OIF) proposed a multi-vendor interoperable implementation agreement for 400G pluggable coherent optical modules called 400ZR [10], with low power consumption and a $3 \sim 4\times$ cost-per-bit reduction. The concatenated Hamming-staircase code (CHSC) proposed by Inphi [11] is adopted as the 400ZR FEC algorithm because of its well-balanced FEC performance and low complexity (power consumption), which are essential for 400G DCI applications. In [12], the performance of the 400ZR CHSC was investigated down to a BER of $10^{-15}$ using FPGAs.

Rate-adaptivity is not addressed in OIF 400ZR but, as an enhancement, it can be achieved either by using a set of FEC codes of different code rates [1] or by shortening or puncturing, e.g. shortening the Hamming or staircase [13] codes. Considering the limited chip area, DCIs cannot accommodate multiple FEC codes, while shortening and puncturing come at the cost of performance degradation [13]. Hence, for future DCIs, it is highly desirable to have a low-power, high-performance and/or rate-adaptive (RA) C-FEC scheme with an error floor below $10^{-15}$.

In this paper, we propose a low-power concatenated polar-staircase code (CPSC) that outperforms the CHSC adopted by the OIF. The proposed inner polar code offers improved performance and a high degree of rate-adaptivity, with a relatively small gap to capacity (GTC) regardless of the rate, without increasing the complexity.

# 2. The Inner-Code Design

Polar codes are based on the "channel polarization" method, where bit-channels near the absorbing boundaries are either perfect (very high reliability) or useless (very low reliability) channels [2]. The channels between perfect and useless channels are called mediocre channels. In the proposed inner code, the channel reliabilities (at a given design-SNR) are calculated by employing the Bhattacharyya-parameter-based construction [2]. For block length N and K information bits, the polar-transformed encoded vector can be defined as $U^N = X^N F_2^{\otimes n}$, where $F_2^{\otimes n}$ is the $n^{th}$ Kronecker power of the kernel matrix $F_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, and the transformed vector $U^N$ can only be of size $N = 2^n$, $n \ge 1$. The K information bits are assigned to the most reliable bit positions and the remaining N - K least reliable bit positions are set to be *frozen*. The rate of the inner polar code is defined as $R_{pc} = \frac{K}{N}$. Rate-adaptivity can be achieved by changing the number of information bits K selected according to the channel reliabilities.
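The construction and encoding steps described above can be sketched as follows. This is a minimal illustration, not the implementation used for the reported results. It assumes the common heuristic of initializing the Bhattacharyya recursion with $z_0 = e^{-\text{SNR}}$ (design-SNR in linear scale), uses the recursion $z^- = 2z - z^2$, $z^+ = z^2$ (exact for the binary erasure channel, an approximation otherwise), and performs non-systematic encoding for brevity, whereas the proposed scheme uses a systematic encoder. All function names and the 0 dB design-SNR are hypothetical.

```python
import math


def bhattacharyya_parameters(n, design_snr_db):
    """Bhattacharyya parameters of the N = 2**n bit-channels (smaller = more reliable)."""
    # Assumption: heuristic initial value exp(-design SNR), SNR in linear scale.
    z = [math.exp(-(10 ** (design_snr_db / 10)))]
    for _ in range(n):
        nxt = []
        for v in z:
            nxt.append(2 * v - v * v)  # degraded ("minus") bit-channel
            nxt.append(v * v)          # upgraded ("plus") bit-channel
        z = nxt
    return z


def select_information_set(z, k):
    """Indices of the k most reliable bit-channels; the others are frozen."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(order[:k])


def polar_transform(x):
    """Multiply the bit vector x by the n-th Kronecker power of F_2 over GF(2)."""
    u = list(x)
    step = 1
    while step < len(u):
        for start in range(0, len(u), 2 * step):
            for j in range(start, start + step):
                u[j] ^= u[j + step]
        step *= 2
    return u


def polar_encode(info_bits, info_set, n_bits):
    """Non-systematic encoding: information bits on info_set, zeros on frozen positions."""
    x = [0] * n_bits
    for pos, bit in zip(info_set, info_bits):
        x[pos] = bit
    return polar_transform(x)


# Rate adaptation with one fixed reliability ordering: only K changes.
z = bhattacharyya_parameters(3, 0.0)  # N = 8, hypothetical 0 dB design-SNR
for k in (4, 5, 6):
    print(k, select_information_set(z, k))
```

Because the reliability ordering is computed once for a fixed design-SNR, the information sets for increasing K are nested, which is what allows the rate to be changed without redesigning the code.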

# 3. The Outer-Code Design

Fig. 1: (a) System model of concatenated polar-staircase coding and modulation over AWGN channel (b) Gap to capacity performance with uniform 16 QAM at different data-rates.

Staircase codes (SC) are a class of hard-decision codes that combine ideas from linear block codes and recursive convolutional codes. This combination results in a continuous product-code-like structure with close-to-capacity performance [4]. A binary primitive (n,k,r,t) BCH code in systematic form is typically chosen as the component code. Here, n, k, r, and t are the block length, information dimension, number of parity bits and error-correcting capability of the component code, respectively. An ITU-T G.709 compatible staircase code comprises the semi-infinite concatenation of $m \times m$ matrices, as shown in Fig. 1a. Decoding is performed in an iterative sliding-window fashion. The number of binary operations per information bit required for a sliding-window decoder can be upper bounded as $I_{sc}(W-1)t/(m-r)$. Here $I_{sc}$ and W are the number of iterations and the window size of the sliding-window decoder, respectively [9]. The rate of the SC can be defined as $R_{sc} = (1 - \frac{r}{m})$. The threshold $P_{sc}$ of the SC is defined as the maximum input BER for which the SC decoder can achieve an output BER of $10^{-15}$.
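As a quick numerical illustration of the two expressions above, the following sketch evaluates the rate $R_{sc} = 1 - r/m$ and the sliding-window bound $I_{sc}(W-1)t/(m-r)$ with exact fractions. The function names are hypothetical. The values m = 510 and t = 3 are assumptions made only for this example (m is chosen because, with r = 1022 - 990 = 32, it reproduces the rate 239/255 quoted in the simulation section); $I_{sc} = 14$ and W = 5 are the values listed there.

```python
from fractions import Fraction


def staircase_rate(m, r):
    """Rate of the staircase code, R_sc = 1 - r/m."""
    return 1 - Fraction(r, m)


def sliding_window_ops_bound(i_sc, w, t, m, r):
    """Evaluate the bound I_sc * (W - 1) * t / (m - r) from the text."""
    return Fraction(i_sc * (w - 1) * t, m - r)


# Assumed illustrative parameters: m = 510 and t = 3 are not stated in the text.
m, r, t = 510, 32, 3
print(staircase_rate(m, r))                          # 239/255
print(float(sliding_window_ops_bound(14, 5, t, m, r)))
```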

# 4. Rate-adaptive Concatenated Code Structure

The proposed CPSC consists of an outer staircase code (ITU-T G.709) with the extended BCH (1022,990) code as a component code, as in 400ZR [10]. The m(m-r) information bits from the source are encoded, and the generated $m \cdot r$ parity bits are added to the SC block in the encoding process. The permutation $\pi_1$ is applied to the output of the outer encoder, as depicted in Fig. 1a. The interleaved bits are divided into *T* sub-blocks, and each sub-block is encoded by the (N, K) systematic polar encoder. The number of sub-blocks is $T = \frac{J \cdot m^2}{K}$, where *J* is the number of SC blocks transmitted. A cyclic redundancy check (CRC) is computed over the *K* information bits, and the CRC overhead is included in the simulation results. The *K* information bits are encoded by the polar encoder, and the resulting transformed vector of length *N* is first interleaved by $\pi_2$ and then labeled and modulated to 16 QAM constellation points by a mapper $\Phi$. We assume that $\Phi$ is responsible for both binary reflected Gray labeling and modulation. As shown in Fig. 1a, in the simulations the modulated symbols are transmitted over an additive white Gaussian noise (AWGN) channel.
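The bookkeeping in this paragraph can be sketched as below. The sketch computes the number of polar sub-blocks $T = J \cdot m^2 / K$ and the resulting information rate per 16 QAM symbol. It rests on two assumptions that the text does not confirm: the last sub-block is padded when K does not divide $J \cdot m^2$, and the data-rate is taken as $4 \cdot R_{sc} \cdot K/N$, with the CRC bits occupying additional non-frozen positions so that they do not appear in K. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def num_polar_subblocks(j_blocks, m, k_info):
    """T = J * m^2 / K, rounded up (assumes the last sub-block is padded)."""
    total_bits = j_blocks * m * m
    return -(-total_bits // k_info)  # integer ceiling division


def net_bits_per_symbol(k_info, n_code, r_sc, bits_per_symbol=4):
    """Assumed data-rate definition: bits/symbol * R_sc * K / N (16 QAM -> 4 bits)."""
    return bits_per_symbol * r_sc * Fraction(k_info, n_code)


# Toy values for the sub-block count, not parameters of the proposed scheme.
print(num_polar_subblocks(j_blocks=2, m=8, k_info=16))

# Hypothetical K = 896 with N = 1024 and the outer rate 239/255.
print(float(net_bits_per_symbol(896, 1024, Fraction(239, 255))))
```

Under these assumptions, lowering K reduces the data-rate in steps of $4 R_{sc}/N$ bits per symbol, which indicates how fine the rate granularity becomes for N = 1024 or N = 8192.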

At the receiver side, demodulation and de-mapping are performed by the de-mapper $\Phi^{-1}$. The resulting log-likelihood ratios (LLRs) are first de-interleaved by $\pi_2^{-1}$ and then decoded by the CRC-aided successive cancellation list (CRC-SCL) polar decoder down to the threshold $P_{sc}$. CRC-SCL polar decoders at short blocklengths require less area and power than LDPC decoders at comparable error-correction performance [14]. The $J \times m^2$ decoded bits are de-interleaved by $\pi_1^{-1}$ and then fed to the sliding-window decoder to achieve an output BER of $10^{-15}$ or below.

To get a single C-FEC code for rate-adaptivity with the inner polar codes, we first calculate the GTC values for each data-rate over a *range* of optimized design-SNR values. The capacity of the AWGN channel is here taken as the achievable rate for uniform 16 QAM. The minimum GTC value for each data-rate is shown by dashed lines in Fig. 1b. All the curves of Fig. 1b and Fig. 2a were obtained at the SNR at which BER $= 10^{-6}$. For the rate-adaptive (RA) CPSC scheme, a *single* design-SNR value is chosen at which the average GTC value over a set of data-rates is minimum, as shown by the solid yellow and green lines in Fig. 1b. We then fixed all the parameters of the proposed CPSC scheme, such as the design-SNR, CRC length, and list size of the inner code and $R_{sc}$, $I_{sc}$, and W of the outer code. After fixing the C-FEC parameters, K can be tuned according to the channel conditions to obtain the same $P_{sc}$ at the output of the inner decoder for the whole range of rates.
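The selection rule described above (one design-SNR minimizing the average GTC over the set of data-rates, compared with the per-rate optimum) can be sketched as follows. The function names are hypothetical, and the table contains placeholder numbers invented purely to make the example runnable. They are not measured GTC values from Fig. 1b.

```python
def pick_single_design_snr(gtc_table):
    """Return the design-SNR whose mean GTC over all data-rates is smallest.

    gtc_table maps design-SNR (dB) -> {data-rate (bits/symbol): GTC (dB)}.
    """
    def mean_gtc(design_snr):
        row = gtc_table[design_snr]
        return sum(row.values()) / len(row)

    return min(gtc_table, key=mean_gtc)


def per_rate_optimum(gtc_table):
    """Return, for each data-rate, the design-SNR with the smallest GTC."""
    rates = list(next(iter(gtc_table.values())))
    return {r: min(gtc_table, key=lambda s: gtc_table[s][r]) for r in rates}


# Placeholder values only (NOT results from the paper).
placeholder_gtc = {
    0.0: {2.78: 1.9, 3.13: 2.3, 3.48: 2.9},
    2.0: {2.78: 2.0, 3.13: 2.0, 3.48: 2.4},
    4.0: {2.78: 2.4, 3.13: 2.1, 3.48: 2.2},
}

print(pick_single_design_snr(placeholder_gtc))
print(per_rate_optimum(placeholder_gtc))
```

With the placeholder table, each data-rate prefers a different design-SNR, while the single value selected for the RA scheme is the one that is best on average; the difference between the two corresponds to the gap between the dashed and solid curves discussed in the text.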

# 5. Simulation Results

We present examples of RA CPSC designs with polar code lengths N = 1024 and N = 8192, representing low and high list-decoding complexity. Each concatenated code design is validated for data-rates from 3.48 to 2.78 bits/QAM symbol, as shown in Fig. 1b. CPSC is evaluated by simulating J = 100 SC blocks ($J \times m^2 = 2.62 \times 10^7$ bits) over the AWGN channel. The following parameters were utilized: CRC-length = 6, list-size = 32, $R_{sc} = \frac{239}{255}$, W = 5 and $I_{sc} = 14$. The outer code parameters such as $n, k, R_{sc}, W$ and $I_{sc}$ are the same as in [11].

For comparison, we include results from the OIF-adopted C-FEC code [11] and the off-the-shelf DVB-S2 LDPC-BCH code [1]. CHSC and CPSC have an error floor around $4 \times 10^{-21}$ [4], while DVB-S2 suffers from an error floor around $10^{-11}$ [1]. It can be seen from Fig. 2a that the proposed scheme with polar block lengths N = 1024 and N = 8192 gives 0.31 dB and 0.35 dB gain over CHSC, respectively, and at the same time provides a high degree of flexibility. It can be seen from Fig. 2b that the BER performance of CPSC with N = 8192 is 0.35 dB better than that of the Inphi CHSC [11]. DVB-S2 requires a set of FEC codes to achieve rate-adaptivity, and extrapolation suggests that the performance of the proposed scheme will be better than that of the DVB-S2 codes at a BER of $10^{-15}$. The proposed RA CPSC scheme can directly be extended to higher-order modulation formats and other block lengths of size $2^n$.



Fig. 2: (a) Achieved data rates for different channel SNR values at BER of  $10^{-6}$  (b) BER performance for proposed CPSC (green squares), Inphi CHSC (blue diamonds) and DVB-S2 codes (red circles) at data-rates, DR.

# 6. Conclusion

In this paper, a concatenated polar-staircase coding technique is proposed. The short-blocklength inner polar code is used to achieve efficient error reduction and rate-adaptivity. The proposed scheme uses a single C-FEC code, and the only adjustable parameter in the system is the number of information bits of the inner code. The CPSC scheme provides 0.35 dB of performance improvement compared to the 400ZR FEC code and allows one to select any operating point with fine granularity and without performance penalty.

# Acknowledgements

This work was supported by the IFD INCOM project and DNRF Research CoE SPOC, ref. DNRF123.

# References

- 1. A. Morello et al., "DVB-S2: The second generation standard for satellite broad-band services," Proc. of the IEEE, **94**, 1, 210–227, 2006.
- 2. E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, **55**, 7, 3051–3073, 2009.
- 3. J. D. Andersen et al., "A configurable FPGA FEC unit for Tb/s optical communication," IEEE ICC, 1-6, 2017.
- 4. B. P. Smith et al., "Staircase codes: FEC for 100 Gb/s OTN," IEEE JLT, 30, 1, 110–117, 2011.
- 5. Y. Lee et al., "A 2.74-pJ/bit, 17.7-Gb/s iterative concatenated-BCH decoder in 65-nm CMOS for NAND flash memory," IEEE J. Solid-State Circuits, **48**, 10, 2531–2540, 2013.
- 6. T. C. Ou et al., "27.6 An 821MHz 7.9 Gb/s 7.3 pJ/b/iteration charge-recovery LDPC decoder," IEEE ISSCC, 2014.
- 7. K. Sugihara et al., "A spatially-coupled type LDPC code with an NCG of 12 dB for optical transmission beyond 100 Gb/s," OFC, OM2B-4, OSA, 2013.
- 8. D. Chang et al., "FPGA verification of a single QC-LDPC code for 100 Gb/s optical systems without error floor down to BER of 10<sup>-15</sup>," OFC, OTuN2, OSA, 2011.
- 9. M. Barakatain et al., "Low-complexity concatenated LDPC-staircase codes," IEEE JLT, 36, 12, 2443–2449, 2018.
- 10. Optical Internetworking Forum, "Implementation Agreement 400ZR," 0.10-Draft, 2018.
- 11. B. P. Smith et al., "Leveraging 400G ZR FEC Technology," IEEE 802.3 Beyond 10 km Opt. PHYs Study Group, 2017.
- 12. Y. Cai et al., "FPGA Investigation on Error-Flare Performance of a Concatenated Staircase and Hamming FEC Code for 400G Inter-Data Center Interconnect," IEEE JLT, **37**, 1, 188-195, 2019.
- 13. L. Schmalen et al., "Distributed rate-adaptive staircase codes for connectionless optical metro networks," OFC, W1J.2 OSA, 2017.
- 14. F. Ercan et al., "On error-correction performance and implementation of polar code list decoders for 5G," IEEE Allerton, 443–449, 2017.