# Investigation of Concatenated KP4 FEC with Single-Parity-Check Codes for Short-Reach IM/DD Systems

Tom Wettlin<sup>\*</sup>, Stefano Calabrò, Nebojša Stojanović, Youxi Lin, Talha Rahman

Huawei Technologies Duesseldorf GmbH, Munich Research Center, Munich 80992, Germany \*tom.ionas.wettlin@huawei.com

**Abstract:** We investigate the concatenation of KP4 FEC with short single-parity-check codes. This represents an intermediate solution in terms of performance, complexity and latency between standalone KP4 FEC and concatenated schemes based on stronger soft-decision codes. © 2023 The Author(s)

## 1. Introduction

For cost-sensitive datacenter networks, intensity modulation and direct detection (IM/DD) systems are the simplest and most efficient solution. With growing data rates, more advanced digital signal processing (DSP) is required. Inside the optical module, the digital chip with its integrated optics already accounts for a large part of the power consumption, and advanced DSP algorithms increase this share further. Linear drive pluggable optics (LPOs) aim to reduce the power consumption by driving the optics directly from the serializer/deserializer (SerDes) of the switch ASIC, thus removing the DSP chip from the optical module. For future 200 Gb/s/$\lambda$ LPO, the DSP inside the switch ASIC needs to overcome significant impairments including severe bandwidth limitations, making maximum likelihood sequence estimation (MLSE) based on the Viterbi algorithm (VA) a plausible option. In terms of forward error correction (FEC), KP4 is presumably too weak to support 200 Gb/s without DSP in the pluggable module. Instead of replacing KP4 with a stronger Reed-Solomon code, combining it with a simple inner code in next-generation SerDes devices costs less power and latency and allows re-use of existing cores. In this scenario, an inner FEC with extremely low latency is required to also support high-performance computing and artificial intelligence clusters. Since soft-decision (SD) FEC codes require complex pre-processing such as soft-output Viterbi algorithm (SOVA) [1] or BCJR [2] detection, simpler solutions are of interest.

In this paper, we investigate the decoding of very short (KP4-matched) symbol-wise single-parity-check (SPC) codes after simple VA-based hard-output MLSE [3]. This low-complexity and low-latency solution is compared to conventional SD decoding of the SPC codes after BCJR detection and to the option of using only KP4 FEC. Experimental results for PAM-4 transmission at around 200 Gb/s show a gain of the investigated VA-SPC approach over KP4 at moderate bandwidth limitations. An additional gain of approx. 1 dB can be achieved if the more complex BCJR detector is applied at the receiver.

## 2. SPC Codes based on Hard-Decision MLSE

The optimal soft-decision decoding algorithm for SPC codes is the Wagner decoder [4]. This scheme checks the parity based on bit-wise hard decisions on the input sequence and, if the parity is violated, corrects the least reliable bit according to the log-likelihood ratios (LLRs). If the SerDes receiver uses a VA-MLSE, no soft information is available to evaluate the reliability of the input bits. However, it is possible to run the Wagner decoder without full soft information [3]. For this approach, we extend the bit-wise Wagner decoder to symbol-wise SPC codes, where a single parity-check PAM-4 symbol is appended to bring the modulo-4 parity of the PAM-4 codeword to zero. The block diagram of the decoder after VA-MLSE is depicted in Fig. 1 a). A typical receiver DSP stack consisting of a feed-forward equalizer (FFE), a noise-whitening filter $G(D) = 1 + \alpha D$ and MLSE with VA is used.

Fig. 1. a) block diagram of the VA-SPC approach and b) exemplary error correlation filters for $n_e \leq 3$.

[Block diagram: PAM-4 source → DAC ($f_s = f_{sym}$, 45 GHz) → TOSA (50 GHz) → VOA → PDFA (8 dBm) → PD (70 GHz) → DSO (256 GS/s, 113 GHz)]

Fig. 2. Experimental setup. The 3-dB bandwidth is indicated below each component.

The hard decision after the VA can be separated into the transmitted symbols $x(k)$ and an error sequence $e(k)$. Information about this error sequence can be obtained by filtering the VA output with the same post-filter response $G(D)$ and subtracting the input to the Viterbi processor. The resulting sequence corresponds to the error sequence convolved with the post-filter response and corrupted by uncorrelated additive noise. To isolate the information about symbol errors, the sequence needs to be convolved with a proper error correlation filter. This filter is obtained by multiplying the error event with the matched filter response $G(D^{-1})$. Note that, in principle, we would need a filter bank with a matched filter for each possible error pattern. However, in bandwidth-limited systems operating at practical signal-to-noise ratios, it suffices to consider error patterns with alternating signs and unit amplitude of the type $\pm(1, -1, 1, \ldots)$. As a consequence, we use only one matched filter for each considered burst length. The correlation filter response as a function of the burst length $n_e$ is

$$H_{\rm corr}(n_e) = \alpha + \left[\sum_{k=1}^{n_e-1} (-1)^{k-1} \cdot (1-\alpha) D^k\right] + (-1)^{n_e-1} D^{n_e} \tag{1}$$

for $n_e > 1$ and corresponds to the matched filter $H_{\rm corr}(1) = \alpha + D$ for single errors. If a parity violation is detected, the correlation sequences indicate the most likely error positions and the sign of the error. In bandwidth-limited systems, it is beneficial to introduce an interleaver between the channel detector and the decoder to break the correlation of the errors. To allow a combination of this approach with interleaving, we reconstruct the soft information for each symbol using the output of the matched filter bank. To make the correlation sequences comparable, we normalize them with respect to their maximum value.
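As a hedged illustration of Eq. (1), the following Python sketch builds the taps of $H_{\rm corr}(n_e)$ and checks that they equal the time-reversed convolution of an alternating unit-amplitude burst with $G(D)$, up to a sign of $(-1)^{n_e-1}$. The function names, the value $\alpha = 0.6$ and the noise-free toy residual are assumptions chosen for illustration; they are not taken from the experiment.

```python
import numpy as np

def h_corr(n_e: int, alpha: float) -> np.ndarray:
    """Taps of H_corr(n_e) from Eq. (1); index k holds the coefficient of D^k."""
    taps = np.zeros(n_e + 1)
    taps[0] = alpha
    for k in range(1, n_e):
        taps[k] = (-1) ** (k - 1) * (1.0 - alpha)
    taps[n_e] = (-1) ** (n_e - 1)
    return taps

def matched_reference(n_e: int, alpha: float) -> np.ndarray:
    """Time-reversed (alternating burst * G), with sign (-1)^(n_e-1)."""
    burst = np.array([(-1.0) ** j for j in range(n_e)])  # (1, -1, 1, ...)
    shaped = np.convolve(burst, [1.0, alpha])            # burst seen through G(D)
    return (-1) ** (n_e - 1) * shaped[::-1]

alpha = 0.6  # hypothetical post-filter coefficient
for n_e in range(1, 7):
    assert np.allclose(h_corr(n_e, alpha), matched_reference(n_e, alpha))

# Noise-free toy case: one symbol error of amplitude +1 at index 7.
e = np.zeros(20)
e[7] = 1.0
residual = np.convolve(e, [1.0, alpha])[:20]           # error sequence through G(D)
corr = np.convolve(residual, h_corr(1, alpha))[:20]    # single-error correlation
peak = int(np.argmax(corr))
print(peak, corr[peak])
```

In this toy case the correlation peak appears one symbol after the error position, which reflects the delay of the causal filter, and its value equals the energy $1 + \alpha^2$ of the shaped single error.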

The matched filters for the error correlation add complexity compared to a system using Viterbi processing without an inner FEC code. However, since only two distinct non-trivial coefficients, $\alpha$ and $1-\alpha$, appear in the correlation filter responses (Eq. 1), only two multiplications per symbol are required for any $n_e$. A corresponding realization for $n_e \leq 3$ is depicted in Fig. 1 b). On top of that, $2 \cdot n_e + 1$ additions per symbol are required and the selection of the highest correlation value requires $n_e$ 2-to-1 multiplexers per symbol. For limited burst lengths, this is significantly less complex than the soft-output computation using BCJR or SOVA [5].
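The sharing of the two multiplications can be sketched as follows. This is a minimal software model, not the hardware structure of Fig. 1 b): the function name `correlation_bank`, the zero-padded `delay` helper and the test values are assumptions for illustration. The two products $\alpha \cdot r(k)$ and $(1-\alpha) \cdot r(k)$ are computed once per sample and reused for every burst length; all remaining operations are delays, sign changes and additions.

```python
import numpy as np

def correlation_bank(r, alpha, n_max):
    """Outputs of H_corr(1..n_max) applied to r, using two products per sample."""
    r = np.asarray(r, dtype=float)
    a = alpha * r            # first multiplication per sample
    b = (1.0 - alpha) * r    # second multiplication per sample

    def delay(x, d):
        # Shift x by d samples, filling with zeros (assumes d < len(x)).
        return np.concatenate([np.zeros(d), x[:len(x) - d]])

    outputs = {}
    partial = a.copy()       # alpha*r(k) plus the alternating (1-alpha) terms so far
    for n_e in range(1, n_max + 1):
        sign = (-1.0) ** (n_e - 1)
        outputs[n_e] = partial + sign * delay(r, n_e)   # last tap is +/-1: no multiplier
        partial = partial + sign * delay(b, n_e)        # extend the sum for n_e + 1
    return outputs

rng = np.random.default_rng(1)
r = rng.standard_normal(50)          # stand-in for the residual sequence
bank = correlation_bank(r, 0.6, 3)
```

Each output can be compared against a direct FIR evaluation of Eq. (1), for example `np.convolve(r, [0.6, 0.4, -0.4, 1.0])[:50]` for $n_e = 3$.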

In the following, the VA-SPC scheme is investigated in combination with a symbol-wise SPC(16,15) code.
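A symbol-wise SPC(16,15) code over PAM-4 can be sketched as below, assuming the four levels are labeled 0 to 3. The correction routine is a deliberately simplified, Wagner-style stand-in and not the decoder of Fig. 1 a): it handles only a single error of one level, takes the error sign from the modulo-4 syndrome, and picks the position from a per-symbol `suspicion` score, which is a hypothetical input standing in for the normalized correlation sequences.

```python
import numpy as np

M = 4  # PAM-4 levels labeled 0..3 (assumed labeling)

def spc_encode(info):
    """Append one parity symbol so that the symbol sum is 0 modulo 4."""
    info = np.asarray(info, dtype=int)
    parity = (-info.sum()) % M
    return np.append(info, parity)

def syndrome(word):
    """Modulo-4 parity of a received word; 0 means no violation."""
    return int(np.sum(word) % M)

def correct_single_error(hard, suspicion):
    """Undo one +/-1 level error at the most suspicious admissible position."""
    hard = np.array(hard, dtype=int)
    s = syndrome(hard)
    if s in (0, 2):
        return hard  # no violation, or not explainable by a single +/-1 error
    step = 1 if s == 1 else -1  # s == 1: one decision is one level too high
    # Only positions whose corrected value stays inside the alphabet qualify.
    candidates = np.where((hard - step >= 0) & (hard - step <= M - 1))[0]
    if candidates.size == 0:
        return hard
    pos = candidates[np.argmax(np.asarray(suspicion)[candidates])]
    hard[pos] -= step
    return hard

info = np.arange(15) % M
codeword = spc_encode(info)           # 16 symbols, syndrome 0
received = codeword.copy()
received[5] += 1                      # inject a +1 level error
score = np.zeros(16)
score[5] = 1.0                        # pretend the correlator flags index 5
decoded = correct_single_error(received, score)
```

Note that an even-length alternating burst such as $(+1, -1)$ inside one codeword leaves the modulo-4 parity unchanged, which is one reason why the interplay with the interleaver matters.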

## 3. Experimental Investigations

To evaluate the performance of the VA-based SPC codes, experimental investigations were carried out using the setup in Fig. 2. A PAM-4 sequence at 1 sample per symbol (sps) is generated and converted to an analog signal. This signal is fed into a transmitter optical sub-assembly (TOSA) integrating a driver amplifier and an electroabsorption modulated laser. The power of the resulting optical O-band signal ($\lambda \approx 1310$ nm) is controlled by a variable optical attenuator. To compensate for the lack of electrical amplification after the photodiode (PD), a praseodymium-doped fiber amplifier (PDFA) is used to provide a constant input power of 8 dBm. After detection with the PD, a digital storage oscilloscope at 256 GS/s is used to obtain a digital signal. In the receiver offline DSP, the signal is resampled to 2 samples per symbol, timing recovery is performed and adaptive linear feed-forward equalization (FFE) with 141 taps is done at 1 sps. To overcome noise enhancement, a post-filter with 1 memory tap (whose coefficient is estimated by the Burg algorithm [6]) is applied. The resulting inter-symbol interference is removed by BCJR or VA-based MLSE depending on the FEC. Before each FEC code considered in the following, a row-column interleaver with a length of 2048 symbols is applied. For the conventional SPC code, bit-wise LLRs are used and the procedure described by Schmalen et al. [7] is utilized to obtain the post-FEC performance based on uncoded offline data.
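For a single memory tap, the Burg estimate reduces to one closed-form reflection coefficient. The sketch below shows this first-order case under the assumption that the coefficient is estimated from a noise sequence (for example the equalizer error signal) and used directly as $\alpha$ in $G(D) = 1 + \alpha D$; how the noise sequence is obtained in the experiment is not specified here, and the synthetic test noise and the name `burg_one_tap` are illustrative.

```python
import numpy as np

def burg_one_tap(noise):
    """First-order Burg estimate k, so that 1 + k*D whitens the input noise."""
    x = np.asarray(noise, dtype=float)
    f = x[1:]     # order-0 forward prediction errors
    b = x[:-1]    # order-0 backward prediction errors, delayed by one sample
    return -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))

# Synthetic colored noise: x(n) + 0.5*x(n-1) = w(n), so 1 + 0.5*D whitens it.
rng = np.random.default_rng(0)
white = rng.standard_normal(50_000)
true_alpha = 0.5  # hypothetical value for the demonstration
colored = np.empty_like(white)
colored[0] = white[0]
for n in range(1, len(white)):
    colored[n] = white[n] - true_alpha * colored[n - 1]

alpha_hat = burg_one_tap(colored)
print(alpha_hat)
```

On this synthetic sequence the estimate lands close to the coefficient used to generate the noise, with the remaining deviation due to the finite record length.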

Fig. 3 a) shows the symbol error ratio (SER) with respect to KP4 symbols (10 bits each) as a function of the received optical power (ROP) for 106/112 GBd PAM-4, illustrating the impact of hard-decision-based SPC codes. We assume sufficient random symbol interleaving before KP4 and a KP4-SER threshold of $2.2 \cdot 10^{-3}$. This threshold can be calculated from the bit error ratio threshold of $2.2 \cdot 10^{-4}$, as shown e.g. by Lentner et al. [8], and is a more accurate indicator of the KP4 performance. Considering the overhead of the applied SPC(16,15) code, 112 GBd with and 106 GBd without inner code correspond to approximately the same net rate.
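The two numerical statements above can be checked with a few lines of arithmetic. The SER conversion below assumes independent bit errors within a 10-bit KP4 symbol, which is a simplifying assumption of this sketch and not necessarily the derivation used in [8]; the rates are counted after removing the SPC parity symbol only, without accounting for the KP4 overhead.

```python
# KP4 symbol error ratio threshold from the bit error ratio threshold,
# assuming independent bit errors within one 10-bit KP4 symbol.
ber_threshold = 2.2e-4
bits_per_kp4_symbol = 10
ser_threshold = 1.0 - (1.0 - ber_threshold) ** bits_per_kp4_symbol

# Bit rate after removing the SPC(16,15) parity symbol (2 bits per PAM-4 symbol).
spc_n, spc_k = 16, 15
bits_per_pam4_symbol = 2
rate_with_spc = 112.0 * spc_k / spc_n * bits_per_pam4_symbol   # Gb/s at 112 GBd
rate_without_spc = 106.0 * bits_per_pam4_symbol                # Gb/s at 106 GBd

print(f"SER threshold: {ser_threshold:.3e}")
print(f"with SPC: {rate_with_spc} Gb/s, without SPC: {rate_without_spc} Gb/s")
```

Under these assumptions the SER threshold comes out within rounding of $2.2 \cdot 10^{-3}$, and the two configurations differ by about 1 % in rate.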



Fig. 3. Experimental results for a) different VA-SPC realizations and b) different net bitrates ( $n_e \le 6$  for VA-SPC).

The inset of Fig. 3 a) shows the probability of error events with a certain length in the sequence after MLSE. While most errors occur as single errors, bursts of two or three consecutive symbols also appear frequently. The longest observed error bursts span five symbols. Accordingly, VA-SPC decoders considering error bursts up to length 5 are applied. The approach considering only single errors results in a worse performance than the reference without inner code. Considering up to $n_e = 2$ errors improves the performance and leads to a small gain over the case without inner FEC at the KP4 FEC threshold. As expected from the error burst distribution, the consideration of error bursts of length three brings a noticeable performance gain, while only minor gains can be achieved by adding correlation sequences for error bursts of four and five symbols.
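A burst-length statistic of this kind can be obtained from offline data by counting runs of consecutive symbol errors. The sketch below assumes that the transmitted and detected symbol sequences are aligned; the function name and the toy sequences are illustrative and do not reproduce the measured data.

```python
from collections import Counter

import numpy as np

def burst_length_histogram(tx, rx):
    """Count error events by length, where an event is a run of consecutive symbol errors."""
    errors = np.asarray(tx) != np.asarray(rx)
    counts = Counter()
    run = 0
    for is_error in errors:
        if is_error:
            run += 1
        elif run:
            counts[run] += 1   # a run just ended
            run = 0
    if run:
        counts[run] += 1       # run reaching the end of the sequence
    return dict(counts)

tx = np.zeros(12, dtype=int)
rx = tx.copy()
rx[[1, 4, 5, 8, 9, 10]] = 1    # one single error, one burst of 2, one burst of 3
hist = burst_length_histogram(tx, rx)
print(hist)
```

Dividing each count by the total number of events gives the relative frequency of each burst length.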

For a better assessment, the investigated scheme is compared to inner FEC codes with conventional SD decoding and to the case without inner code. The bitrate is swept to emulate different bandwidth limitations with the fixed setup. The corresponding results are shown in Fig. 3 b). For the bit-wise SPC(16,15) code, the BCJR algorithm is used instead of VA-based MLSE. For conventional SPC and VA-SPC, the same symbol rates are used due to the identical overhead, while the reduced overhead for the case without inner code is taken into account by transmitting at correspondingly lower symbol rates. For the hard-decision MLSE-based SPC approach, error bursts with up to $n_e = 6$ errors are considered, which is sufficient to cover all observed error events even at the highest rates. For rates up to 210 Gb/s, VA-SPC outperforms the case of MLSE without inner FEC by up to 1 dB, while a clear penalty is visible at 220 Gb/s. At this point, the increased penalty caused by bandwidth limitations can no longer be compensated by the symbol-based SPC. Conventional SD decoding achieves an additional gain of approx. 1 dB at 200 Gb/s.

According to these results, short symbol-wise SPC codes already provide a gain of about 1 dB with VA-MLSE. If a more complex BCJR-MLSE can be afforded, the achievable gain over standalone KP4 FEC grows to 2 dB.

## 4. Conclusions

We investigated the concatenation of short symbol-wise SPC codes with KP4 FEC and demonstrated a gain of 1 dB over KP4-only FEC with VA-MLSE and of 2 dB if BCJR-MLSE is used. Our results show that SPC codes could be considered for integration in future 200G+ SerDes to support the LPO use case also in latency-critical applications such as HPC and AI clusters.

## References

- 1. J. Hagenauer et al., "A Viterbi algorithm with soft-decision outputs and its applications," in 1989 IEEE Global Telecommunications Conference and Exhibition (1989).
- 2. L. Bahl et al., "Optimal decoding of linear codes for minimizing symbol error rate (Corresp.)," in IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284-287 (1974).
- 3. T. Conway, "A new target response with parity coding for high density magnetic recording channels," in IEEE Transactions on Magnetics, vol. 34, no. 4, pp. 2382-2386 (1998).
- 4. R.A. Silverman et al., "Coding for Constant-Data-Rate Systems-Part I. A New Error-Correcting Code," in Proceedings of the IRE, vol. 42, no. 9, pp. 1428-1435 (1954).
- 5. P. Robertson et al., "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proceedings IEEE International Conference on Communications ICC, vol. 2, pp. 1009-1013 (1995).
- 6. J.P. Burg, "Maximum entropy spectral analysis," Ph.D. dissertation, Stanford Univ. (1975).
- 7. L. Schmalen et al., "A Generic Tool for Assessing the Soft-FEC Performance in Optical Transmission Experiments," in IEEE Photonics Technology Letters, vol. 24, no. 1, pp. 40-42 (2012).
- 8. D. Lentner et al., "Concatenated Forward Error Correction With KP4 and Single Parity Check Codes," in Journal of Lightwave Technology, vol. 41, no. 17, pp. 5641-5652 (2023).