# Low-Complexity Architecture for Soft-Output Trellis-Based Detection in High-Speed Data Center Applications

Kaiquan Wu, Gabriele Liga, Jamal Riani\*, and Alex Alvarado

Eindhoven University of Technology, 5600MB Eindhoven, The Netherlands, k.wu@tue.nl; \*Marvell Technology, Santa Clara, CA, USA

**Abstract** An architecture enabled by DFE is proposed to achieve reduced-state trellis-based algorithms for data center applications. Performance evaluation using PAM-4 IM/DD experimental data shows that the penalty due to state pruning is below 0.5 dB, while a complexity reduction up to 57% is achieved.

## Introduction

Non-binary pulse amplitude modulation (PAM) and soft decision (SD)-forward error correction (FEC) are promising techniques for boosting the speed of intensity-modulation (IM)/direct-detection (DD) links<sup>[1],[2]</sup> in applications such as intra- and inter-data center interconnects. Recently, an SD concatenated code has been adopted as the optical interconnect FEC in 200 Gb/s PAM-4 serial intra-data center applications<sup>[3]</sup>. To take full advantage of SD-FEC, accurate computation of bit-wise log-likelihood ratios (LLRs) is crucial. However, inter-symbol interference (ISI) caused by bandwidth-limited components limits the LLR accuracy under the additive white Gaussian noise (AWGN) assumption<sup>[4]</sup>.

Traditional approaches to mitigating ISI are filters and trellis-based algorithms targeting symbol-wise maximum a posteriori (MAP) detection. Within the former category, we find decision-feedback equalization (DFE). DFE is easy to implement, but it suffers from error propagation. The resulting burst errors can significantly penalize the performance of SD-FEC. Therefore, it is important to mitigate this burst error penalty<sup>[5]</sup>. Bit interleaving or precoding based on differential and modulo operations<sup>[5],[6]</sup> are commonly used to reduce the penalty. However, interleaving might cause high latency, and precoding prevents accurate LLR computation due to the modulo errors<sup>[7]</sup>. The trellis-based solutions include Log-MAP (LM), Max-Log-MAP (MLM)<sup>[8]</sup> and the soft-output Viterbi algorithm (SOVA)<sup>[9]-[11]</sup>. These algorithms provide (near-)optimum performance, but with very high complexity. Various state pruning methods have been proposed for the Viterbi algorithm (VA)<sup>[12]-[15]</sup>; however, these do not provide soft outputs for SD-FEC.

In this paper, we propose to combine DFE with trellis-based algorithms, based on a trellis characterizing DFE burst errors<sup>[16]</sup>. In this proposed architecture, instead of the *M*-ary PAM symbols, only 3 typical DFE symbol errors need to be tracked. The error information is then used to generate LLRs in a modified demapper. The proposed architecture reaches a tradeoff: (i) the complexity of trellis-based algorithms is reduced and (ii) the burst error penalty is mitigated. Compared to our previous work<sup>[16]</sup>, here we manage to apply the trellis in<sup>[16]</sup> to practical soft-output algorithms (i.e., MLM and SOVA) and investigate their performances in experiments.

## From the PR Channel to DFE-3 States

We consider a one-tap partial response (PR) channel for modeling the ISI in IM/DD links<sup>[17]</sup>. At time *i*, given the transmitted symbol  $x_i$ , the received symbol is  $y_i = x_i + hx_{i-1} + n_i$ , where *h* is the tap coefficient and  $n_i$  is AWGN. The set of PAM-*M* symbols is defined as  $\mathcal{X} \triangleq \{\pm 1\Delta, \pm 3\Delta, \dots, \pm (M-1)\Delta\}$ .
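The channel model above can be sketched in a few lines of NumPy. This is only an illustration: the function names `pam_alphabet` and `pr_channel`, the zero initial symbol $x_0 = 0$, and the noise level in the usage lines are assumptions made for the example and are not taken from the experiments.

```python
import numpy as np

def pam_alphabet(M, delta=1.0):
    """PAM-M symbols {±1Δ, ±3Δ, ..., ±(M-1)Δ}, in ascending order."""
    return delta * np.arange(-(M - 1), M, 2, dtype=float)

def pr_channel(x, h, sigma, rng):
    """One-tap PR channel: y_i = x_i + h * x_{i-1} + n_i.

    Assumption: the symbol before the first one is 0 (x_0 = 0).
    """
    x = np.asarray(x, dtype=float)
    x_prev = np.concatenate(([0.0], x[:-1]))   # delayed copy of x
    n = sigma * rng.standard_normal(x.shape)   # AWGN samples
    return x + h * x_prev + n

# Usage: a short random PAM-4 burst through the channel.
rng = np.random.default_rng(0)
symbols = pam_alphabet(4)
x = rng.choice(symbols, size=8)
y = pr_channel(x, h=0.66, sigma=0.1, rng=rng)  # sigma is an arbitrary choice
```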

Given received symbols  $y = [y_1, y_2, \ldots, y_N]$ , the symbol-wise MAP rule computes the a posteriori probability (APP)  $P(x_i = x|y)$ , which is later converted into LLRs for decoding. Traditional trellis-based algorithms characterize all possible ISI states (*M* for the one-tap PR channel) and their transitions. By unfolding the *M* states in time, a fully-connected trellis is obtained, where each path represents a possible transmitted symbol sequence  $x = [x_1, x_2, \ldots, x_N]$ . An illustration of the PAM-4 trellis is shown in Fig. 1 (a).

Fig. 1: (a) the PAM-4 trellis; (b) the DFE-3 trellis; (c) shows an example of (b) given certain HD input on top of (a).

DFE removes ISI from the current received symbol by using previous hard decisions (HDs). Given the HD $\hat{x}_{i-1}$, the equalized symbol is $\overline{y}_i = y_i - h\hat{x}_{i-1}$, which determines the next HD $\hat{x}_i$. With the equalized symbols $\overline{y} = [\overline{y}_1, \overline{y}_2, \dots, \overline{y}_N]$, an AWGN demapper computes their LLRs as if all ISI has been removed. However, this assumption is not always true, since error propagation can occur if an HD is incorrect. We define the error as $e_{i-1} \triangleq x_{i-1} - \hat{x}_{i-1}$, where $e_{i-1} \neq 0$ leads to a biased observation $\overline{y}_i$, causing $\hat{x}_i$ to be more prone to errors, i.e., $\overline{y}_i = x_i + n_i + he_{i-1}$.
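The DFE recursion and the bias term $he_{i-1}$ can be illustrated with a minimal sketch. The function name `dfe_one_tap`, the nearest-symbol slicer, and the initial decision `x_hat_init` are assumptions for this example; the slicer used in the experiments is not specified here.

```python
import numpy as np

def dfe_one_tap(y, h, symbols, x_hat_init=0.0):
    """One-tap DFE: y_bar_i = y_i - h * x_hat_{i-1}, then slice y_bar_i.

    Assumptions: nearest-symbol hard decisions and a known initial decision.
    """
    symbols = np.asarray(symbols, dtype=float)
    y_bar = np.empty(len(y))
    x_hat = np.empty(len(y))
    prev = x_hat_init
    for i, y_i in enumerate(y):
        y_bar[i] = y_i - h * prev                        # remove estimated ISI
        x_hat[i] = symbols[np.argmin(np.abs(symbols - y_bar[i]))]
        prev = x_hat[i]                                  # feed the HD back
    return y_bar, x_hat

# Noiseless check: with correct decisions, the equalized symbols equal x.
symbols = [-3.0, -1.0, 1.0, 3.0]
x = np.array([1.0, -3.0, 3.0, -1.0])
y = x + 0.5 * np.concatenate(([0.0], x[:-1]))
y_bar, x_hat = dfe_one_tap(y, h=0.5, symbols=symbols)

# A wrong previous decision (x_0 = +1 but x_hat_0 = -1, so e_0 = +2) shifts
# the next equalized symbol by h * e_0: here 1 + 0.5 * 2 = 2 instead of 1.
y_biased, _ = dfe_one_tap(np.array([1.0 + 0.5 * 1.0]), h=0.5,
                          symbols=symbols, x_hat_init=-1.0)
```

In the second call the biased observation sits halfway between $+1\Delta$ and $+3\Delta$, which shows how a single wrong decision makes the next decision more error-prone.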

For PAM-*M*, there are $(2M-1)$ possible errors. At relatively high signal-to-noise ratios (SNRs) and small *h*, most errors occur between adjacent symbols. Therefore, we consider a reduced set of errors $e_{i-1} \in \{0, \pm d\}$, where $d = 2\Delta$. These 3 errors lead to different *biased states* $s_i$ for $\overline{y}_i$, which are written as

$$s_{i} \triangleq \begin{cases} l, & \text{if } e_{i-1} = -d \\ c, & \text{if } e_{i-1} = 0 \\ r, & \text{if } e_{i-1} = +d \end{cases}$$

where  $s_i \in S \triangleq \{l, c, r\}$ , and we call them "DFE-3 states". The corresponding trellis is depicted in Fig. 1 (b). DFE error propagation is indicated by the red edges between l and r. The red edges being dotted/dashed implies that the transitions to l and r might have zero state transition probability and thus cause disconnections.

The proposed DFE-3 trellis in Fig. 1 (b) can be viewed as a PAM-*M* trellis after state pruning, made possible by the DFE pre-processing. Fig. 1 (c) shows an example of the DFE-3 trellis on top of the PAM-4 trellis, given a certain HD input $\hat{x}$. The center state *c* coincides with the HD, and the disconnection to *l* takes place when *c* happens to be the leftmost symbol $-3\Delta$. In general, the DFE-3 trellis leads to a simpler analysis than the PAM-*M* trellis, because (i) the number of states decreases from *M* to 3 and (ii) some of the transitions are disconnected and can thus be ignored.
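The disconnections can be made concrete with a small sketch that lists which DFE-3 states are reachable for a given previous HD. It follows one reading of the definitions above: since $e_{i-1} = x_{i-1} - \hat{x}_{i-1}$, a state is feasible only if the implied symbol $\hat{x}_{i-1} + e_{i-1}$ lies inside the PAM alphabet. The helper name `feasible_states` is hypothetical.

```python
def feasible_states(x_hat_prev, M, delta=1.0):
    """Map each reachable DFE-3 state to the transmitted symbol it implies.

    A state is dropped when x_hat_prev + e falls outside the PAM-M alphabet,
    which corresponds to a zero state transition probability.
    """
    d = 2.0 * delta
    errors = {"l": -d, "c": 0.0, "r": +d}     # e_{i-1} for each state
    x_max = (M - 1) * delta                   # outermost symbol amplitude
    return {s: x_hat_prev + e
            for s, e in errors.items()
            if abs(x_hat_prev + e) <= x_max + 1e-9}

# PAM-4 examples (delta = 1):
left_edge = feasible_states(-3.0, 4)   # state l is disconnected
inner = feasible_states(1.0, 4)        # all three states are reachable
right_edge = feasible_states(3.0, 4)   # state r is disconnected
```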

## Detection and Demapping with DFE-3 States

In the proposed architecture, the DFE-3 trellis is used to obtain the state APP  $P(s_i = s' | \overline{y})$ . The key to this APP is computing the state transition probability, as has been explained in<sup>[16]</sup>. Here we compute this probability in the log domain to ensure compatibility with LM, MLM and SOVA.

Once $P(s_i = s' | \overline{y})$ are calculated, they are used in a state-based demapper to account for the bias imposed on $\overline{y}_i$. The DFE-3 state LLR is defined as

$$L_{i,j}^{S} \triangleq \log \frac{\sum\limits_{s' \in \mathcal{S}} \sum\limits_{x \in \mathcal{X}_{j}^{1}} p(\overline{y}_{i}|s_{i}=s',x_{i}=x)P(s_{i}=s'|\overline{y})}{\sum\limits_{s' \in \mathcal{S}} \sum\limits_{x \in \mathcal{X}_{j}^{0}} p(\overline{y}_{i}|s_{i}=s',x_{i}=x)P(s_{i}=s'|\overline{y})}$$

where $\mathcal{X}_{j}^{b} \subset \mathcal{X}$ is the set of symbols labeled by bit $b \in \{0, 1\}$ at bit position *j*. The LLRs are calculated by using the Jacobian logarithm for LM, or the Max-Log approximation for MLM and SOVA. Note that in<sup>[16]</sup>, the state LLR is instead computed with $P(s_i = s' | \overline{y}')$, where $\overline{y}' = [\overline{y}_1, \overline{y}_2, ..., \overline{y}_{i-1}]$.
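The state LLR above can be sketched as follows, assuming the Gaussian likelihood implied by $\overline{y}_i = x_i + n_i + he_{i-1}$, i.e., a mean of $x + he(s')$ for state $s'$. The function name `state_llrs`, the Gray labeling table, and the parameter values in the usage lines are assumptions for illustration; the bit labeling is not specified in the text. The state APPs are taken as given in the log domain.

```python
import numpy as np

# Assumed Gray labeling for PAM-4; rows follow the symbols -3, -1, +1, +3.
GRAY_PAM4 = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

def state_llrs(y_bar_i, log_app, h, sigma, symbols, labels, d=2.0,
               max_log=False):
    """Bit-wise DFE-3 state LLRs for one equalized symbol.

    log_app holds log P(s_i = s' | y_bar) in the order (l, c, r).
    max_log=False uses the Jacobian logarithm, True the Max-Log approximation.
    """
    symbols = np.asarray(symbols, dtype=float)
    bias = h * np.array([-d, 0.0, +d])            # h * e_{i-1} for l, c, r
    mean = symbols[None, :] + bias[:, None]       # shape (3, M)
    # Log of Gaussian likelihood times state APP; the common normalization
    # constant cancels between numerator and denominator.
    metric = (-(y_bar_i - mean) ** 2 / (2.0 * sigma ** 2)
              + np.asarray(log_app, dtype=float)[:, None])
    combine = np.max if max_log else np.logaddexp.reduce
    llrs = []
    for j in range(labels.shape[1]):
        ones = metric[:, labels[:, j] == 1].ravel()
        zeros = metric[:, labels[:, j] == 0].ravel()
        llrs.append(combine(ones) - combine(zeros))
    return np.array(llrs)

# Same observation, two different state beliefs (h = 0.66, so h * d = 1.32).
pam4 = [-3.0, -1.0, 1.0, 3.0]
in_c = state_llrs(2.32, [-50.0, 0.0, -50.0], 0.66, 0.5, pam4, GRAY_PAM4)
in_r = state_llrs(2.32, [-50.0, -50.0, 0.0], 0.66, 0.5, pam4, GRAY_PAM4)
```

With the state almost surely *c*, the observation 2.32 is closest to $+3\Delta$, so the second bit leans towards 0 under the assumed labeling. With the state almost surely *r*, the same observation is explained by $+1\Delta$ plus the bias 1.32, and the second bit flips sign.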

## Experimental Results and Discussions

For baseline comparisons, the following schemes are considered: (i) DFE; (ii) DFE with random bit-interleaving within each codeword; (iii) DFE and precoding; (iv) PAM-4 trellis-based algorithms (LM, MLM, and SOVA). Regarding scheme (iii), after de-precoding the HDs, the demapper uses a numerically optimized lookup table to map the output into LLRs<sup>[18]</sup>. The proposed DFE-3 trellis-based algorithm requires DFE and the state-based demapper. The traceback length of SOVA is set to 10 regardless of the trellis in use. The LDPC code from IEEE 802.3ca with blocklength 17664 bits and rate 0.83 is employed. The decoder performs belief propagation (BP) with 6 iterations.

The performances of the aforementioned schemes are evaluated offline by using the recorded data from PAM-4 IM/DD experiments. The offline processing and experimental setup are depicted in Fig. 2. In the offline generation,  $2^{15}$  bits from a pseudo-random bit sequence (PRBS) are mapped to symbols for transmission.

The experiments target 100G serial links with a rate of 53.125 GBaud. The memory size of the DSP chip is 16 Kbytes. In addition, the testbed suffers from high insertion loss caused by lossy/long electrical traces (e.g., between DAC and driver) and bandwidth limitation. After transmission, noise whitening and feedforward equalization (FFE) are performed. By treating the effective channel as a PR channel, *h* is estimated to be 0.66. The discrepancy between the PR and the real channel is due to the residual ISI after FFE.



Fig. 2: Block diagrams of PAM-4 IM/DD experiment and offline processing. The dashed pink blocks indicate optional DSPs, depending on the considered schemes. ECL: external cavity laser; DAC: digital-to-analog converter; MZM: Mach-Zehnder modulator; SSMF: standard single-mode fiber; VOA: variable optical attenuator; PD: photo diode; OSC: oscilloscope.

The offline evaluation of the pre/post-FEC BER is based on an emulation approach<sup>[19],[20]</sup>. In the upper left corner of Fig. 2, scrambling (achieved by XOR operations) and re-encoding are conducted to convert the PRBS-generated codeword into a real codeword. To sweep different SNR values, noise loading is used. SNR is defined as  $(1 + h^2)/\sigma^2$  with unitary signal power.
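For noise loading, the SNR definition above can be inverted to obtain the total noise standard deviation for a target SNR. The sketch below assumes the SNR is swept in dB and that the symbols are normalized to unit power; the helper name `noise_std_from_snr_db` is hypothetical.

```python
import math

def noise_std_from_snr_db(snr_db, h):
    """Invert SNR = (1 + h^2) / sigma^2 (unit signal power) for sigma."""
    snr_lin = 10.0 ** (snr_db / 10.0)          # dB to linear scale
    return math.sqrt((1.0 + h * h) / snr_lin)

# Example with the estimated tap h = 0.66 and an arbitrary 20 dB target.
sigma = noise_std_from_snr_db(20.0, 0.66)
```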

Fig. 3 (a) shows the pre-FEC BER vs. SNR performance. Precoding offers the best performance in correcting burst errors, followed by trellis-based algorithms. Using the PAM-4 trellis corrects more errors than the DFE-3 trellis. Surprisingly, LM exhibits worse performance than MLM and SOVA for both trellises. We conjecture that LM is more sensitive than MLM and SOVA to the presence of residual ISI, because LM follows more strictly the AWGN assumption on the additive noise. The fact that LM is more complex, but ends up with worse performance, makes LM less attractive in real IM/DD systems than MLM and SOVA. DFE, with or without interleaving, shows the worst pre-FEC BER. Since interleaving only breaks the correlation between errors, both red curves overlap.

Fig. 3 (b) shows end-to-end results. It can be seen that interleaving slightly improves the performance of DFE. Although precoding provides good pre-FEC BER, as shown in Fig. 3 (a), its post-FEC BER is limited due to the use of discrete LLRs. DFE alone or with interleaving/precoding can never reach a BER of  $10^{-6}$ . On the other hand, Fig. 3 (b) shows the excellent performances of trellis-based solutions, where the smallest gap between the DFE-3 and the PAM-4 clusters is 0.39 dB. In particular, the DFE-3 trellis causes loss within 0.5 dB for SOVA.

Fig. 3: PAM-4 experimental results: (a) Pre-FEC BER vs. SNR, and (b) Post-FEC BER vs. SNR.

Here we present a rough complexity analysis of the processing on the trellis. The number of states, and the number of branches per state, are both *M* for the PAM-*M* trellis. By contrast, these numbers are both 3 for the DFE-3 trellis. Hence, the ratio of the PAM-*M* and the DFE-3 trellis complexity is roughly proportional to $M^2/9$. Specifically, we count the number of add, compare, and select (ACS) operations, which are extensively used in MLM and SOVA. Compared to the PAM-4 trellis, using the DFE-3 trellis reduces the number of ACS operations by 44% and 57% in MLM and SOVA, respectively.
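The $M^2/9$ ratio can be checked with simple arithmetic on branch counts per trellis section. This sketch counts branches only, which is an assumption made here: it does not reproduce the per-algorithm ACS counts, whose breakdown is not given in the text. The helper name `branch_counts` is hypothetical.

```python
def branch_counts(M):
    """Branches per trellis section and the relative saving of DFE-3."""
    pam = M * M        # M states, M branches per state
    dfe3 = 3 * 3       # 3 states, 3 branches per state
    return pam, dfe3, 1.0 - dfe3 / pam

pam4 = branch_counts(4)   # (16, 9, 0.4375)
pam8 = branch_counts(8)   # (64, 9, 0.859375)
```

For PAM-4, the branch count alone gives a saving of about 44%, in line with the MLM figure quoted above. The larger SOVA figure presumably reflects additional operations that this simple count does not capture.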

## Conclusions

In this paper, we experimentally investigate the performances of several soft-output trellis-based algorithms. The proposed architecture, enabled by the DFE-3 trellis, trades a performance penalty of less than 0.5 dB for a complexity reduction of up to 57%. It is believed to be of interest to future data center applications. For instance, with the deployment of PAM-8, the complexity reduction can reach 95% for SOVA.

## References

- [1] V. Gaudet, "A survey and tutorial on contemporary aspects of multiple-valued logic and its application to microelectronic circuits", *IEEE J. Emerg. Sel. Topics in Circuits and Sys.*, vol. 6, no. 1, pp. 5–12, Mar. 2016. DOI: 10.1109/jetcas.2016.2528041.
- [2] R. Nagarajan, I. Lyubomirsky, and O. Agazzi, "Low power DSP-based transceivers for data center optical fiber communications (Invited tutorial)", *J. Lightw. Technol.*, vol. 39, no. 16, pp. 5221–5231, Aug. 2021.
- [3] L. Patra, A. Farhood, R. Radhamohan, W. Bliss, S. Ramesh, and D. Cassan. "IEEE P802.3dj task force: FEC baseline proposal for 200Gb/s per lane IM-DD optical PMDs". (Mar. 2023), [Online]. Available: https://www.ieee802.org/3/dj/public/23\_03/patra\_3dj\_01b\_2303.pdf.
- [4] M. Schaedler, G. Böcherer, F. Pittalà, *et al.*, "Recurrent neural network soft-demapping for nonlinear ISI in 800Gbit/s DWDM coherent optical transmissions", *J. Lightw. Technol.*, vol. 39, no. 16, pp. 5278–5286, Aug. 2021.
- [5] J. Riani, B. Smith, A. Farhood, and L. Patra. "IEEE P802.3dj task force: Impact of burst errors on concatenated FEC scheme". (Mar. 2023), [Online]. Available: https://www.ieee802.org/3/dj/public/23\_03/riani\_3dj\_01a\_2303.pdf.
- [6] M. Russel and J. W. M. Bergmans, "A technique to reduce error propagation in M-ary decision feedback equalization", *IEEE Trans. Commun.*, vol. 43, no. 12, p. 2878, Dec. 1995.
- [7] M. Jana, A. Medra, L. Lampe, and J. Mitra, "Pre-equalized faster-than-Nyquist transmission", *IEEE Trans. Commun.*, vol. 65, no. 10, pp. 4406–4418, Oct. 2017.
- [8] P. Robertson, P. Hoeher, and E. Villebrun, "Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding", *Eur. Trans. Telecommun.*, vol. 8, no. 2, pp. 119–125, Mar. 1997.
- [9] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications", in *Proc. IEEE Global Telecommun. Conf. Exhib. 'Commun. Technol. Beyond*', Dallas, TX, USA, Nov. 1989. DOI: 10.1109/glocom.1989.64230.
- [10] L. Gong, W. Xiaofu, and Y. Xiaoxin, "On SOVA for nonbinary codes", *IEEE Commun. Lett.*, vol. 3, no. 12, pp. 335–337, Dec. 1999. DOI: 10.1109/4234.809527.
- [11] M. P. Fossorier, F. Burkert, S. Lin, and J. Hagenauer, "On the equivalence between SOVA and max-log-MAP decodings", *IEEE Commun. Lett.*, vol. 2, no. 5, pp. 137–139, May 1998.
- [12] M. V. Eyuboglu and S. U. Qureshi, "Reduced-state sequence estimation with set partitioning and decision feedback", *IEEE Trans. Commun.*, vol. 36, no. 1, pp. 13–20, Jan. 1988.
- [13] M. Visintin, P. Poggiolini, and G. Bosco, "Long-haul optically uncompensated IMDD transmission with MLSE using the M-method", *IEEE Photon. Technol. Lett.*, vol. 19, no. 16, pp. 1230–1232, Aug. 2007.
- [14] Y. Yu, Y. Che, T. Bo, D. Kim, and H. Kim, "Reduced-state MLSE for an IM/DD system using PAM modulation", *Opt. Exp.*, vol. 28, no. 26, pp. 38505–38515, Dec. 2020.
- [15] J. Zhou, J. Zhang, X. Zhao, *et al.*, "Advanced nonlinearity equalizer with TC-NL-MLSE for transmitting beyond 200-Gb/s PAM-8 in IM/DD systems", *Opt. Exp.*, vol. 30, no. 21, pp. 37416–37425, Oct. 2022.
- [16] K. Wu, G. Liga, J. Lee, L. Paulissen, J. Riani, and A. Alvarado, "DFE state-tracking demapper for soft-input FEC in 800G data center interconnects", in *Proc. Eur. Conf. Opt. Commun.*, Basel, Switzerland, Sep. 2022.
- [17] A. Mahadevan, Y. Lefevre, W. Lanneer, et al., "Impact of DFE on soft-input LDPC decoding for 50G PON", in *Proc. Opt. Fiber Commun. Conf.*, San Francisco, CA, USA, Mar. 2021. DOI: 10.1364/ofc.2021.m3g.5.
- [18] L. Szczecinski and A. Alvarado, *Bit-interleaved coded modulation: fundamentals, analysis and design.* John Wiley & Sons, Feb. 2015. DOI: 10.1002/9781118694077.
- [19] L. Schmalen, F. Buchali, A. Leven, and S. ten Brink, "A generic tool for assessing the soft-FEC performance in optical transmission experiments", *IEEE Photon. Technol. Lett.*, vol. 24, no. 1, pp. 40–42, Jan. 2012.
- [20] N. Stojanovic, Y. Zhao, D. Chang, Z. Xiao, and F. Yu, "Reusing common uncoded experimental data in performance estimation of different FEC codes", *IEEE Photon. Technol. Lett.*, vol. 25, no. 24, pp. 2494–2497, Dec. 2013.