# Multiplication-Free Equalization Schemes for 244-Gbps PAM-4 Transmission

## Fei Xie<sup>1</sup>, Xiaoqian Huang<sup>1</sup>, Shuangyue Liu<sup>2</sup>, Du Tang<sup>1</sup>, Zhengkang Wang<sup>1</sup>, and Yaojun Qiao<sup>1,\*</sup>

<sup>1</sup> State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing 100876, China

<sup>2</sup> Department of Mobile Communications and Terminal Research, Research Institute of China Telecommunication, Guangzhou, China

\*aiao@bupt.edu.cn

**Abstract:** We propose a multiplication-free equalization scheme using cluster-assisting lookup tables (CLUT). Results demonstrate a table-size reduction of 11 orders of magnitude compared to traditional LUTs, with a penalty of only 0.2 dB. © 2023 The Author(s)

## 1. Introduction

Driven by bandwidth-intensive services such as 4K/8K high-definition video and edge computing in the 5G era, and potentially the 6G era, the demand for data traffic in data center interconnects (DCIs) is increasing rapidly. While coherent technology is available in long-haul transmission systems, 4-level pulse amplitude modulation (PAM-4) in intensity modulation and direct detection (IM/DD) systems continues to be the dominant solution for 400-Gbps intra-DCI owing to its low cost and high power efficiency [1,2]. However, with next-generation Ethernet expected to reach 800 Gbps or even 1.6 Tbps for intra-DCI [3,4], over 200 Gbps per lane will be necessary.

To achieve high-speed optical transmission, one solution is to use a higher modulation order such as PAM-8, which makes the signal sensitive to device nonlinearity and requires an additional equalizer structure in digital signal processing (DSP) to compensate for nonlinear distortion [5]. The other solution is to continue using PAM-4, for which higher rates inevitably lead to more severe channel distortion, including a longer memory length of inter-symbol interference (ISI). To address this issue, some cost-effective approaches have been reported, including maximum likelihood sequence estimation (MLSE) and log-maximum a posteriori estimation with a fixed number of surviving states (fixed-state Log-MAP, hereinafter referred to as MAP) decoding [6]. The computational parts of equalizers based on MLSE or MAP are the channel-shortening filter (CSF) and the decoder. Lookup tables (LUTs) have been shown to be a substitute for multiplication in the decoder [7]. However, the CSF at the receiver cannot be replaced by a LUT, because the received samples take many quantized levels rather than a small set of symbol levels. Moreover, the table size in these algorithms increases exponentially with the ISI length. In our previous work, an efficient equalizer was proposed to shorten the ISI, which performs similarly to MAP with far fewer multipliers [8]. In this work, we further reduce the complexity of the equalizer to the point where it is multiplication-free.

In this paper, we first deploy the CSF of the MAP-based equalizer at the transmitter to provide a basis for the use of LUTs. Additionally, we use cluster-assisting lookup tables (CLUT), streamlined by the k-means clustering algorithm, in place of traditional LUTs. The proposed multiplication-free scheme is evaluated in a C-band 244-Gbps 1-km PAM-4 simulation system with 20% forward error correction (FEC) applied. Compared to traditional LUTs, the CLUT reduces the table size by 11 orders of magnitude with only a 0.2-dB penalty under 20% FEC.

## 2. Multiplication-free equalization schemes

The use of LUTs as an alternative to multiplication in equalization algorithms has been widely verified. A CLUT serves the same purpose as a LUT, but its size is much smaller. Figure 1 shows a block diagram of the LUT and CLUT. For a PAM-M signal, when the original finite impulse response (FIR) filter has $L$ taps, the size of its LUT with binary addresses is $(M^L) \times (L\log_2 M)$, because all combinations of $L$ symbols must be stored. Therefore, the size of the LUT increases exponentially with the number of taps. In contrast, the CLUT, which is built from the filter simplified by the k-means clustering algorithm, is significantly smaller. Specifically, the size of the CLUT with binary addresses is $\left(\prod_{i=0}^{L_c-1}[(M-1)N(i)+1]\right) \times \left(\sum_{i=0}^{L_c-1}\log_2[(M-1)N(i)+1]\right)$, where $L_c$ is the number of clusters and $N(i)$ is the number of members (taps) in the $i$-th cluster.
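The two size expressions above can be evaluated directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and rounding each address field up to a whole number of bits is an assumption (the expressions in the text use $\log_2$ without rounding). One way to read the factor $(M-1)N(i)+1$ is that taps in a cluster share a coefficient, so only the sum of the $N(i)$ symbols matters, and that sum takes $(M-1)N(i)+1$ distinct values.

```python
import math


def lut_size(M, L):
    """Full LUT for an L-tap FIR on PAM-M symbols.

    Returns (number of addresses, address length in bits).
    """
    num_addresses = M ** L
    # Assumption: each symbol field is rounded up to whole bits.
    address_bits = L * math.ceil(math.log2(M))
    return num_addresses, address_bits


def clut_size(M, cluster_sizes):
    """CLUT for a filter whose taps are grouped into clusters.

    cluster_sizes[i] plays the role of N(i) in the text.
    Returns (number of addresses, address length in bits).
    """
    num_addresses = 1
    address_bits = 0
    for n in cluster_sizes:
        levels = (M - 1) * n + 1  # distinct values per cluster
        num_addresses *= levels
        # Assumption: each cluster field is rounded up to whole bits.
        address_bits += math.ceil(math.log2(levels))
    return num_addresses, address_bits


print(lut_size(4, 30))            # 30-tap PAM-4 LUT
print(lut_size(4, 10))            # 10-tap PAM-4 LUT
print(clut_size(4, [1, 1, 8]))    # hypothetical 3-cluster split of 10 taps
```

For PAM-4, `lut_size(4, 30)` gives $4^{30} \approx 1.15 \times 10^{18}$ addresses of 60 bits and `lut_size(4, 10)` gives $4^{10} \approx 1.05 \times 10^{6}$ addresses of 20 bits. The split `[1, 1, 8]` is purely illustrative; the actual cluster memberships are not stated in the text.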

To ensure high performance, we choose the MAP-based equalizer, which has three parts: a CSF, a desired impulse response filter (DIRF), and a MAP decoder. The CSF is an FIR filter whose role is to reduce the complexity of the MAP decoder by shortening the inter-symbol interference (ISI) length of the received signal. The DIRF is then designed to emulate the residual ISI response. Finally, the residual ISI estimated by the DIRF is removed by the MAP decoder. However, this scheme requires multiplications, and its CSF cannot be replaced with a LUT.

The traditional CSF at the receiver cannot be replaced by a LUT with acceptable complexity, because its input consists of received samples that take many levels due to the channel and quantization, rather than a small set of symbol levels. To this end, we place the CSF at the transmitter so that the inputs of both the CSF and the DIRF contain only signal symbols. We then use two LUTs to play the roles of the original CSF and DIRF, respectively. As a result, the whole equalization process contains no multiplication.
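A minimal sketch of how a symbol-driven filter can be evaluated through a clustered table is given below. It is an illustration under stated assumptions, not the authors' implementation: the tap values are made up, the PAM-4 levels are taken as {-3, -1, 1, 3}, the k-means routine is a plain Lloyd iteration with a simple deterministic start, and addressing the table by the tuple of per-cluster symbol sums is inferred from the CLUT size expression rather than stated in the text. All function names are hypothetical.

```python
import itertools

PAM4 = (-3, -1, 1, 3)  # assumed PAM-4 symbol levels


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars. Returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: centroids spread evenly over the sorted values.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = sum(members) / len(members)
    return labels, centroids


def build_clut(labels, centroids):
    """Offline step: store the output for every tuple of per-cluster sums."""
    sizes = [labels.count(j) for j in range(len(centroids))]
    # n PAM-4 symbols sum to one of 3n+1 values: -3n, -3n+2, ..., 3n.
    sum_ranges = [range(-3 * n, 3 * n + 1, 2) for n in sizes]
    return {
        sums: sum(c * s for c, s in zip(centroids, sums))
        for sums in itertools.product(*sum_ranges)
    }


def clut_output(window, labels, table, num_clusters):
    """Run-time step: additions form the address, then one table lookup."""
    sums = [0] * num_clusters
    for symbol, lab in zip(window, labels):
        sums[lab] += symbol
    return table[tuple(sums)]


def clustered_fir_output(window, labels, centroids):
    """Reference with explicit multiplications, for comparison only."""
    return sum(centroids[lab] * symbol for symbol, lab in zip(window, labels))


taps = [0.02, 0.05, 1.0, 0.48, 0.07, 0.01]  # hypothetical 6-tap filter
labels, centroids = kmeans_1d(taps, k=3)
table = build_clut(labels, centroids)
window = [3, -1, 1, -3, 1, 3]  # six PAM-4 symbols aligned with the taps
print(clut_output(window, labels, table, 3))
print(clustered_fir_output(window, labels, centroids))
```

The multiplications appear only when the table is built offline. At run time, the address is formed by additions and the output is read from the table, which is feasible here because the filter input is a symbol sequence rather than quantized received samples.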



Fig. 1: Block diagram of LUT and CLUT.

Fig. 2: Simulation setup for 1-km transmission.

## 3. Simulation setup

We simulate and evaluate the performance of the 244-Gbps PAM-4 system with different equalization schemes. Figure 2 shows the simulation setup and DSP block diagram. At the transmitter, a pseudo-random bit sequence (PRBS) is first mapped into PAM-4 symbols and filtered by the CSF or CLUT to shorten the ISI length. The preprocessed symbols are then resampled to 2 samples per symbol (SPS) for raised-cosine pulse shaping with a roll-off factor of 0.1. A 5-bit digital-to-analog converter (DAC) generates the analog signal, which is amplified by a driver to a level that avoids modulation nonlinearity and then fed into a chirp-free Mach-Zehnder modulator (MZM) with an extinction ratio of 35 dB. The MZM modulates the signal onto an optical carrier generated by a continuous-wave laser (CWL) at 1550 nm. The modulated optical signal is launched into 1 km of standard single-mode fiber (SSMF), where the fiber dispersion and attenuation parameters are 16.8 ps/nm/km and 0.2 dB/km, respectively. After fiber transmission, the optical signal is first amplified by an erbium-doped fiber amplifier (EDFA) with a noise figure of 4.5 dB and then passed through an optical bandpass filter (OBPF) to filter out the amplified spontaneous emission noise. At the receiver, a variable optical attenuator (VOA) adjusts the received optical power (ROP). The received signal is then detected by a PIN photodiode (PD) with a responsivity of 1.0 A/W and a thermal noise density of $1 \times 10^{-12}\,\mathrm{A}/\sqrt{\mathrm{Hz}}$, which performs the photoelectric conversion. A 5-bit analog-to-digital converter (ADC) generates the digital signal at the receiver. Finally, the offline DSP procedures are performed, including resampling to 1 SPS, matched filtering, DIR-based or CLUT-based MAP, PAM-4 demapping, and BER calculation. The tap numbers of the original CSF and DIRF are set to 30 and 10, respectively, which were chosen to ensure optimal performance.
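A few derived figures follow from the parameters above by simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that 244 Gbps is the gross line rate including FEC overhead and that the 20% overhead is defined relative to the payload. Neither assumption is stated explicitly in the text.

```python
import math


def lane_figures(line_rate_gbps=244.0, pam_order=4, fec_overhead=0.20,
                 fiber_km=1.0, dispersion_ps_nm_km=16.8, loss_db_km=0.2,
                 converter_bits=5):
    """Derive basic link figures from the stated simulation parameters."""
    bits_per_symbol = math.log2(pam_order)
    return {
        # Symbol rate for PAM-M at the given line rate.
        "symbol_rate_gbaud": line_rate_gbps / bits_per_symbol,
        # Assumption: line rate = net rate * (1 + overhead).
        "net_rate_gbps": line_rate_gbps / (1.0 + fec_overhead),
        "accumulated_dispersion_ps_nm": dispersion_ps_nm_km * fiber_km,
        "fiber_loss_db": loss_db_km * fiber_km,
        # Number of quantization levels of the DAC/ADC.
        "converter_levels": 2 ** converter_bits,
    }


for name, value in lane_figures().items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the symbol rate is 122 GBaud and the net rate is about 203 Gbps, which is consistent with the "over 200 Gbps per lane" target mentioned in the introduction.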

## 4. Results and discussions

Figure 3(a) shows the bit error rate (BER) versus received optical power (ROP) for different cluster numbers of the CSF with the original DIRF. The cluster number of the CSF greatly affects the BER performance of the system, because the CSF shortens the ISI length of the channel. As the number of clusters increases, the BER performance improves, but beyond 9 clusters there is no further improvement. Considering both performance and complexity, we use a 9-cluster CLUT to replace the CSF. Figure 3(b) shows the BER versus ROP for the 9-cluster CSF and different cluster numbers of the DIRF. The cluster number of the DIRF has little impact on the BER performance of the system, and beyond 3 clusters the BER performance does not improve. Therefore, we use a 3-cluster CLUT to replace the DIRF.

W1H.2



Fig. 3: BER versus ROP: (a) under different cluster numbers of CSF and the 10-tap DIRF; (b) under 9-cluster CSF and different cluster numbers of DIRF.

Figure 4 compares the BER versus ROP of the CLUT scheme and the original scheme. The CLUT scheme degrades the receiver sensitivity by 0.2 dB at the 20% FEC threshold. However, the CLUT makes the whole signal equalization process completely free of multiplication. Table 1 lists the required table sizes under the different schemes. The use of CLUT reduces the size of the traditional LUT by 11 and 4 orders of magnitude for the CSF and DIRF, respectively.



Table 1: Complexity comparison

| Part                              | CSF                   | CSF                | DIRF               | DIRF               |
|-----------------------------------|-----------------------|--------------------|--------------------|--------------------|
| Method                            | LUT                   | CLUT               | LUT                | CLUT               |
| Parameter                         | 30-tap                | 9-cluster          | 10-tap             | 3-cluster          |
| Addr. number<sup>1</sup>          | $1.15 \times 10^{18}$ | $8.52 \times 10^7$ | $1.05 \times 10^6$ | $4.00 \times 10^2$ |
| Addr. length<sup>2</sup> (bits)   | 60                    | 27                 | 20                 | 9                  |

<sup>1</sup> Number of stored addresses (table size).

<sup>2</sup> Length of one stored address in binary.

Fig. 4: BER versus ROP comparison between the CLUT scheme and the original scheme.

## 5. Conclusion

In this paper, a new multiplication-free equalization scheme based on CLUT has been investigated to replace traditional multiplication for high-speed PAM-4 transmission. The use of CLUT reduces the size of the traditional LUT by 11 orders of magnitude with only a 0.2-dB power penalty at 20% FEC, which helps to achieve low-complexity, low-latency DSP. The simulation results imply that an advanced equalization scheme based on LUTs is a promising approach for transmission at 244 Gbps per lane and beyond, and for future 800 GbE or 1.6 TbE interconnects.

## 6. Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 62271080 and by the Fund of the State Key Laboratory of IPOC (BUPT) (IPOC2022ZT06).

## References

1. Kobayashi et al., Proc. OFC, Th4B.2 (2019)
2. R. Nagarajan et al., Proc. ECOC, Tu.A.7.1 (2023)
3. P. Ossieur et al., Proc. ECOC, We.B.6.1 (2023)
4. D. Che et al., Proc. OFC, M3H.5 (2021)
5. A. Masuda et al., Proc. OFC, W4I.6 (2019)
6. S. Liu et al., PJ 13(3), 1-10 (2021)
7. Z. Chen A et al., OE 30(7), 11275-11287 (2022)
8. F. Xie et al., OE 31(12), 18888-18897 (2023)