A Low-Complexity Sorting Network for a Fast List Polar Decoder

Fast list decoding algorithms for polar codes have been proposed to achieve low latency and high error correction performance. In the Rate-1 and single parity check (SPC) nodes of fast list decoding, a high-complexity sorter is required if multiple bits are to be decoded simultaneously in a single clock cycle. This paper proposes a partitioned sorting network (PSN) that reduces the complexity of the metric sorter in Rate-1 and SPC decoders for fast list decoding. The proposed PSN consists of two sorting networks. For the first sorting network, we analyze the order of the path metrics (PMs) and reduce the number of input candidate paths. For the second sorting network, we analyze the cumulative results of path selection and identify the most frequently selected paths. The proposed PSN has up to 90% fewer compare-and-swap units (CASUs) and up to 137% higher operating frequencies than existing sorting networks for Rate-1 and SPC decoders with $L=8$.


I. INTRODUCTION
Polar codes [1] were the first error correction codes proven to achieve the channel capacity of a binary-input discrete memoryless channel. Polar codes have been adopted as the basis of the coding scheme for control channels in the 5G New Radio standard. However, the basic decoding algorithm for polar codes, called successive cancellation (SC) decoding, has two significant disadvantages. The first is the long latency due to the bit-sequential decoding process [2]. The second is lower error correction performance than other error-correcting codes, such as low-density parity-check (LDPC) and turbo codes, at the same code length.
Several decoding algorithms [3], [4], [5] have been proposed to overcome the first disadvantage, i.e., the long latency. The 2-b SC decoding algorithm in [3] can decode 2 bits simultaneously in the last stage of the polar decoding tree. In [4], the simplified SC (SSC) decoding algorithm reduces the number of decoding clock cycles by using multibit decisions for Rate-0 and Rate-1 nodes. Furthermore, [5] proposed a fast SSC (FSSC) algorithm using two other node types: repetition (REP) nodes, which have a single information bit in the last bit position, and single parity check (SPC) nodes, which have a single frozen bit in the first bit position.
The associate editor coordinating the review of this manuscript and approving it for publication was Oussama Habachi.
Several algorithms with high error correction performance [6], [7] have been proposed to address the second disadvantage of the SC algorithm. In particular, the SC list (SCL) decoding algorithm [6] considers L decoding paths, where L is the list size, and each path is independently decoded. To prevent the number of paths from continuously growing, a path metric (PM) operation is performed for path pruning. Meanwhile, the list size L can be increased to improve the error correction performance of the SCL decoding algorithm [6]. However, the complexity of the metric sorter sharply increases with increasing L in hardware implementation.
A list-fast-SSC decoding algorithm that limits the number of candidate paths to M is proposed in [14], where M is the number of expanded candidate paths. Furthermore, [15] proposed a pruning algorithm that can reduce the M × L candidates according to M, together with its sorting network. The sorting network proposed in [15] is based on 8L-to-L path selection for list-fast-SSC decoding. However, for the Rate-1 and SPC nodes in list-fast-SSC decoding [14], the error correction performance of SCL decoding is not guaranteed.
The SSC list (SSCL) and SSCL-SPC decoding algorithms [16], [17] achieve low latency and high error correction performance by combining the merits of the FSSC algorithm [5] and the SCL algorithm [6]. Furthermore, building on [16] and [17], the fast SSCL (FSSCL) and FSSCL-SPC algorithms [18], [19] effectively reduce the number of time steps by considering only the bits with low reliability, rather than all bits, in Rate-1 and SPC nodes.
Reference [20] proposed a minimum-combinations set to speed up the splitting process of the Rate-1 decoder for fast list decoding. The minimum-combinations set optimizes the candidate paths in Rate-1 nodes by excluding unnecessary candidate paths that will never be selected. Fast list decoding algorithms with a minimum-combinations set [20] require a single time step to decode a Rate-1 node.
In Rate-1 and SPC decoders for fast list decoding, $2^{\tau}L$-to-$L$ sorters are needed to decode $\tau$ bits simultaneously in a single time step. However, this dramatically increases the hardware complexity and the propagation delay of the metric sorter. Therefore, an improved metric sorter is needed for a faster decoder. This paper proposes a PSN, in which two kinds of sorting networks are used, for the Rate-1 and SPC decoders of fast list decoding. Based on [20], the first sorting network reduces the number of candidate paths in Rate-1 and SPC nodes. The second sorting network is designed through an analysis of path selection ratios. Finally, the proposed sorting network is implemented in hardware and compared with sorters for SCL-based decoders. The proposed PSN reduces the number of CASUs by up to 90% and increases the operating frequency by up to 137% compared to existing sorting networks for $L = 8$.
The rest of this paper is organized as follows. Section II introduces polar codes and their decoding algorithms. Section III proposes a novel path selection method and a simplified sorting network. Section IV presents the hardware implementation results and comparisons, and Section V concludes the paper.

A. POLAR CODES AND SC DECODING
A polar code [1] of code rate $R = K/N$ is represented by $PC(N, K)$, where $N$ and $K$ are the code length and the number of information bits, respectively. $PC(N, K)$ is constructed using a generator matrix $G_N = F_2^{\otimes n}$, where $n = \log_2 N$ and $F_2^{\otimes n}$ is the $n$-th Kronecker power of the matrix $F_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. The codeword $x$ obtained through the encoding process is expressed as $x = u \cdot G_N$. The SC decoding algorithm [1] can be represented through a decoding tree. Fig. 1 shows an SC decoding tree when $N = 8$ and $R = 1/2$. In Fig. 1, $s$ represents the stage, and the node length $N_v$ of each node is $2^s$. There are two kinds of messages, $\alpha$ and $\beta$, in the decoding tree. $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_{N_v}\}$ are LLR messages, which are passed from parent nodes to child nodes. The LLR messages for the child nodes are computed as given in (1) and (2). $\beta = \{\beta_1, \beta_2, \ldots, \beta_{N_v}\}$ are bit estimates and are determined by a partial sum operation of pre-decoded bits. $\beta_i$ is passed from child nodes to parent nodes and is computed as given in (3).
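As an illustration of the encoding step $x = u \cdot G_N$, the following is a minimal Python sketch that builds $G_N$ as a Kronecker power and encodes over GF(2). The function names `kron`, `polar_generator`, and `polar_encode` are chosen for this example only and do not appear in the paper; bit-reversal permutations and frozen-bit placement are not modeled.

```python
from typing import List

Matrix = List[List[int]]

# Kernel matrix F_2 = [[1, 0], [1, 1]] from the text.
F2: Matrix = [[1, 0], [1, 1]]

def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two integer matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def polar_generator(n: int) -> Matrix:
    """Return G_N, the n-th Kronecker power of F_2 (N = 2^n)."""
    g: Matrix = [[1]]
    for _ in range(n):
        g = kron(g, F2)
    return g

def polar_encode(u: List[int]) -> List[int]:
    """Compute x = u * G_N over GF(2); len(u) must be a power of two."""
    size = len(u)
    n = size.bit_length() - 1
    if size < 1 or (1 << n) != size:
        raise ValueError("input length must be a power of two")
    g = polar_generator(n)
    return [sum(u[i] * g[i][j] for i in range(size)) % 2 for j in range(size)]
```

Because $F_2$ is its own inverse over GF(2), applying `polar_encode` twice returns the original vector, which gives a quick sanity check of the construction.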
At a leaf node, a hard decision on the $i$-th bit, $\hat{u}_i$, is determined as given in (4), where $A^C$ is the set of frozen bits.
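For reference, the message updates and the leaf decision of SC decoding are commonly written in the hardware-friendly (min-sum) form below. This is a sketch of the standard formulation under the notation of this section; the superscripts "left" and "right", used here to label the two child nodes, are an assumption of this example, and the exact form of (1) to (4) may differ.

```latex
\begin{align}
\alpha_i^{\text{left}} &= \operatorname{sgn}(\alpha_i)\,\operatorname{sgn}(\alpha_{i+N_v/2})\,
  \min\bigl(|\alpha_i|,\,|\alpha_{i+N_v/2}|\bigr), \\
\alpha_i^{\text{right}} &= \alpha_{i+N_v/2} + \bigl(1 - 2\beta_i^{\text{left}}\bigr)\,\alpha_i, \\
\beta_i &=
  \begin{cases}
    \beta_i^{\text{left}} \oplus \beta_i^{\text{right}}, & i \le N_v/2, \\
    \beta_{i-N_v/2}^{\text{right}}, & \text{otherwise},
  \end{cases} \\
\hat{u}_i &=
  \begin{cases}
    0, & i \in A^C \text{ or } \alpha_i \ge 0, \\
    1, & \text{otherwise}.
  \end{cases}
\end{align}
```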
VOLUME 11, 2023
B. SCL DECODING
SCL decoding [6] was proposed to improve the performance of SC decoding by considering both possible decisions for each information bit. Accordingly, each candidate codeword, called a path, is duplicated every time an information bit is decoded. However, each path requires an individual SC decoding core, and it is unrealistic to decode all $2^K$ possible paths simultaneously. To solve this problem, the SCL decoding algorithm limits its consideration to only the $L$ most reliable paths; thus, a path selection process is needed for pruning the number of paths from $2L$ to $L$. In SCL decoding [6], every path has a PM. The PM is a non-negative real number used in the path selection process. Each path is duplicated in the SCL decoder: one copy takes the bit decision that agrees with the sign of the LLR, and the other copy takes the opposite decision. The two copies inherit the same PM, but the absolute value of the corresponding log-likelihood ratio (LLR) is added as a penalty to the copy with the opposite decision. All frozen bits are decoded as 0, and paths are not duplicated for frozen bits. In an LLR-based SCL decoder [8], the PM is calculated as given in (5), where $PM_{i-1}^l$ is the PM of the $l$-th path before the $i$-th bit is decoded, $PM_i^l$ is the PM of the $l$-th path after the $i$-th bit is decoded, $\alpha_i^l$ is the $i$-th LLR of the node on the $l$-th path, and $\hat{u}_i^l$ is the bit decision for $\alpha_i^l$.
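The path duplication and the $2L$-to-$L$ pruning described above can be sketched in a few lines of Python. This is a minimal illustration under the stated PM rule (no penalty for the decision that agrees with the LLR sign, a penalty of $|\alpha|$ for the opposite decision); the function names and the tuple layout are assumptions of this example, and a software `sorted` call stands in for the hardware metric sorter.

```python
from typing import List, Tuple

# One child path: (path metric, parent path index, bit decision).
Child = Tuple[float, int, int]

def split_paths(pms: List[float], llrs: List[float]) -> List[Child]:
    """Duplicate each of the L paths for one information bit, giving 2L children."""
    children: List[Child] = []
    for l, (pm, llr) in enumerate(zip(pms, llrs)):
        likely_bit = 0 if llr >= 0 else 1
        # The decision that agrees with the LLR sign keeps the parent's PM.
        children.append((pm, l, likely_bit))
        # The opposite decision is penalized by the LLR magnitude.
        children.append((pm + abs(llr), l, 1 - likely_bit))
    return children

def prune(children: List[Child], list_size: int) -> List[Child]:
    """Keep the list_size children with the smallest PMs (2L-to-L selection)."""
    return sorted(children, key=lambda c: c[0])[:list_size]
```

For example, with `pms = [0.0, 1.0]` and `llrs = [2.0, -0.5]`, the two survivors for $L = 2$ are the unpenalized children of both parent paths.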

C. FAST LIST DECODING
The FSSCL decoding algorithm [18], which is based on the FSSC [5] and SCL [8] decoding algorithms, achieves lower latency than SCL decoding with similar error correction performance. Various special nodes are distinguished by their arrangements of information bits and frozen bits: Rate-0, Rate-1, REP, and SPC nodes [5].
1) Rate-0 Node: Path splitting is not performed when a Rate-0 node is decoded because it has no information bits. Therefore, the number of paths remains the same. In Rate-0 nodes, the PM is calculated as given in (6).
2) REP Node: The number of candidate paths increases to $2L$ because an REP node has only one information bit. In REP nodes, the PM is calculated in accordance with the corresponding bit estimate $\beta_{N_v}^l$ as given in (7).
The numbers of information bits for SPC and Rate-1 nodes are $N_v - 1$ and $N_v$, respectively. Thus, the numbers of candidate paths branching from each path are at most $2^{N_v-1}$ and $2^{N_v}$ in SPC and Rate-1 nodes, respectively. However, [18] and [19] proved that for SPC and Rate-1 nodes, it is necessary to consider at least the $\min(L, N_v)$ and $\min(L-1, N_v)$ least reliable bits, respectively, to ensure the error correction performance of the SCL decoder.
3) SPC Node: The single parity of the $l$-th path, $\gamma^l$, is calculated as shown in (8), and the PMs are initialized as shown in (9).
Here, $i_{\min}$ is the index of the least reliable bit in the SPC node, and $PM_{-1}^l$ is the PM of the $l$-th path in the previous decoding step. Using (8) and (9), the PMs of the candidate paths are calculated as shown in (10).
When the PM computation is finished, the least reliable bit is set by the even-parity constraint as shown in (11).
4) Rate-1 Node: To decode a Rate-1 node, the PMs of the candidate paths are determined as shown in (12).
In [22], a parallel PM computing method is proposed to decode Rate-1 nodes in a single time step. With the parallel PM computing method [22], there are $2^{\min(L-1, N_v)}L$ paths that may be considered as candidates in a Rate-1 node. Thus, a $2^{\min(L-1, N_v)}L$-to-$L$ sorter is needed to compute the PMs of a Rate-1 node in a single time step. This method can also be applied to the PM calculation (10) of SPC nodes. An SPC decoder to which the parallel PM computing method is applied requires a $2^{\min(L-1, N_v-1)}L$-to-$L$ sorter.
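The node types and the resulting sorter sizes can be made concrete with a short Python sketch. The function names and the boolean frozen-bit pattern (`True` for a frozen bit) are assumptions of this example; a length-2 node with one leading frozen bit matches both the REP and SPC patterns and is reported as REP here.

```python
from typing import List

def classify_node(frozen: List[bool]) -> str:
    """Classify a node from its frozen-bit pattern (True = frozen bit)."""
    if all(frozen):
        return "Rate-0"
    if not any(frozen):
        return "Rate-1"
    if all(frozen[:-1]) and not frozen[-1]:
        return "REP"      # single information bit in the last position
    if frozen[0] and not any(frozen[1:]):
        return "SPC"      # single frozen bit in the first position
    return "other"

def sorter_inputs(node_type: str, list_size: int, node_len: int) -> int:
    """Number of candidate paths entering the metric sorter in one time step."""
    if node_type == "Rate-1":
        tau = min(list_size - 1, node_len)
    elif node_type == "SPC":
        tau = min(list_size - 1, node_len - 1)
    else:
        raise ValueError("only Rate-1 and SPC nodes are covered by this sketch")
    return (2 ** tau) * list_size
```

With $L = 8$ and $N_v = 8$, both node types lead to a 1024-to-8 sorter, which illustrates why the number of sorter inputs dominates the design.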

III. PROPOSED SORTING NETWORK
This section proposes methods to improve the PM sorting network for Rate-1 and SPC decoders in fast list decoding. The PM sorting network of Rate-1 and SPC decoders must consider a much larger number of path candidates than that of an SCL decoder. Therefore, the PM sorter of Rate-1 and SPC decoders can become the critical path and decrease the operating frequency of the overall decoder. In addition, the area of the metric sorter and its computational complexity increase drastically. To alleviate this problem, we propose a PSN that takes into account the PMs provided as input to the sorter and the path selection ratio. Rate-1 and SPC decoders using the proposed PM sorting network show an error correction performance similar to that of a conventional SCL decoder.

A. PARTITIONED SORTING NETWORK
Since a Rate-0 node has no information bits, path splitting is not performed. A REP node has only one information bit and requires only $2L$ paths. Thus, metric sorters for Rate-0 nodes and REP nodes can be implemented using the existing methods [9], [10], [11], [12], [13]. However, when the Rate-1 and SPC nodes in fast list decoding are decoded, the maximum number of candidate paths for these two special types of nodes is $2^{\tau}L$, where $\tau$ is the number of bits to be split in the nodes. For this reason, an effective $2^{\tau}L$-to-$L$ sorter is needed. In hardware implementation, the propagation delay of a sorter is directly affected by the number of sorter inputs. Therefore, we propose methods to efficiently reduce the number of sorter inputs. Fig. 2 shows the PMs of the candidate paths in Rate-1 and SPC nodes with $L = 4$. $PM_{-1}^l$ is the PM of the $l$-th path in the previous decoding step, and $BM_i^l$ is the branch metric (BM) of each node. The BM is the penalty added to the low-reliability path in (10) and (12). The PM of each candidate path (CP) is obtained by adding the corresponding BM to $PM_{-1}^l$. Suppose that the BMs are sorted in ascending order. The $l$-th column can be regarded as the CPs derived from the $l$-th path. In one column, the maximum number of candidate paths that can be selected is $L$. Among the candidate paths in the first column, only the $L$ smallest can be selected, and the same holds for the other columns. Therefore, if the BMs from each path are sorted, Rate-1 and SPC decoders for fast list decoding suffer no loss in error correction performance even when considering only $L^2$ candidate paths. However, in practice, the BMs from a path are not sorted in ascending order. Taking this into account, we propose a PSN that additionally sorts the CPs from each path. The proposed sorting network is shown in Fig. 3. The CPs branching out from each path are arranged by Sorter I. The $m_i^l$ are the PMs pre-sorted by Sorter I, which are used as the input to Sorter II.
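The argument that $L^2$ pre-sorted candidates are sufficient can be checked numerically. The Python sketch below compares a single $2^{\tau}L$-to-$L$ selection with a two-stage selection that first keeps the $L$ smallest CPs of each path and then selects $L$ out of the resulting $L^2$ values. It is only a behavioral model: software sorting stands in for Sorter I and Sorter II, each CP is modeled as the previous PM plus the sum of a subset of non-negative penalties, and all function names are hypothetical.

```python
import itertools
import random
from typing import List

def candidate_pms(pm_prev: int, penalties: List[int]) -> List[int]:
    """All 2^tau candidate PMs of one path: pm_prev plus every subset sum."""
    out = []
    for r in range(len(penalties) + 1):
        for subset in itertools.combinations(penalties, r):
            out.append(pm_prev + sum(subset))
    return out

def full_select(pms_prev: List[int], penalties: List[List[int]], L: int) -> List[int]:
    """Reference: one large sorter over all 2^tau * L candidates."""
    everything = [c for pm, pen in zip(pms_prev, penalties)
                  for c in candidate_pms(pm, pen)]
    return sorted(everything)[:L]

def partitioned_select(pms_prev: List[int], penalties: List[List[int]], L: int) -> List[int]:
    """Two-stage selection: per-path top-L (Sorter I), then L out of L^2 (Sorter II)."""
    columns = [sorted(candidate_pms(pm, pen))[:L]
               for pm, pen in zip(pms_prev, penalties)]
    return sorted(c for col in columns for c in col)[:L]
```

Both functions return the same $L$ metrics for any non-negative penalties, since a candidate that ranks among the global $L$ smallest must also rank among the $L$ smallest of its own path.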

B. CANDIDATE PATH EXCLUSION FOR SORTER I
In Sorter I, $2^{\tau}$ CPs are sorted to output $L$ candidate paths. However, some PMs among these $2^{\tau}$ candidates are never selected. Therefore, PMs that cannot be chosen should be excluded from the candidate paths. In [20], a minimum-combinations set is proposed for reducing the number of time steps in Rate-1 nodes. Based on [20], only the candidate paths essential to the sorter input can be extracted based on the order of the PMs that branch out from one path. Let $\alpha^*$ denote the node LLRs $\alpha^l$ sorted in ascending order of $|\alpha^l|$, let $a_i^* = |\alpha_i^*|$, and let $\beta^*$ denote the bit estimates sorted in the same order as $\alpha^*$. In fast list decoding, using (10) and (12), the PMs of the candidate paths in Rate-1 and SPC nodes can be expressed as $CP(X) = PM_{-1} + \sum_{i \in X} p_i$, where $X \subset \{1, 2, \cdots, \tau\}$ is the set of bits which do not satisfy $1 - 2\beta_i^* = \operatorname{sgn}(\alpha_i^*)$. The number of arguments of $CP(\cdot)$ equals the number of terms $p_i$ that are added; for example, $CP(i, j) = PM_{-1} + p_i + p_j$. In a Rate-1 node, $PM_{-1} = PM_{-1}^l$ and $p_i = a_i^*$. In an SPC node, when $\gamma = 1$, $PM_{-1} = PM_{-1}^l + a_{\min}^*$ and $p_i = a_i^* - a_{\min}^*$, and when $\gamma = 0$, $PM_{-1} = PM_{-1}^l$ and $p_i = a_i^* + a_{\min}^*$. In addition, $CP(0) = PM_{-1}$. According to the nature of FSSCL-SPC decoding, the following properties are obtained.
ii) There is no apparent order between CP(i, j) and CP(k) (i < j < k).
The number of PMs smaller than $CP(X)$ is represented by $NUM_{small}(CP(X))$. If $NUM_{small}(CP(X))$ is greater than or equal to $L$, then $CP(X)$ cannot be selected as one of the $L$ paths and can be excluded from the input of Sorter I. In addition, by iii), $CP(0)$, $CP(1)$, and $CP(2)$ can bypass the comparisons in Sorter I, because they are always the three smallest CPs. Therefore, Sorter I outputs $L$ sorted paths. Table 1 shows the criteria for excluding candidate paths. As shown in Table 1, since $NUM_{small}(CP(4)) = 4$, $CP(4)$ is not considered as a candidate path when $L = 4$.
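One way to reproduce the exclusion criterion is to count, for each index set $X$, how many other index sets are guaranteed to give a CP that is not larger for every sorted, non-negative sequence $p_1 \le p_2 \le \cdots \le p_{\tau}$. The Python sketch below uses an element-wise dominance test for this guarantee. The test is a formalization assumed for this example, not necessarily the procedure used to build Table 1, although it agrees with $NUM_{small}(CP(4)) = 4$; all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Subset = Tuple[int, ...]

def all_subsets(tau: int) -> List[Subset]:
    """All index sets X of {1, ..., tau}, ordered by size and then lexicographically."""
    return [s for r in range(tau + 1) for s in combinations(range(1, tau + 1), r)]

def guaranteed_not_larger(y: Subset, x: Subset) -> bool:
    """True if CP(y) <= CP(x) for every sorted non-negative p_1 <= ... <= p_tau."""
    ys, xs = sorted(y, reverse=True), sorted(x, reverse=True)
    # Each index of y is matched with an index of x that is at least as large.
    return len(ys) <= len(xs) and all(a <= b for a, b in zip(ys, xs))

def num_small(x: Subset, tau: int) -> int:
    """Number of other CPs guaranteed not to exceed CP(x)."""
    x = tuple(sorted(x))
    return sum(1 for y in all_subsets(tau)
               if y != x and guaranteed_not_larger(y, x))

def sorter1_inputs(tau: int, list_size: int) -> List[Subset]:
    """Index sets that can still be among the list_size smallest CPs."""
    return [x for x in all_subsets(tau) if num_small(x, tau) < list_size]
```

For $\tau = 3$ and $L = 4$, this leaves five index sets: the empty set, $\{1\}$, $\{2\}$, $\{3\}$, and $\{1, 2\}$, instead of all eight.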

C. CANDIDATE PATH SELECTION FOR SORTER II
Since the PMs are pre-arranged by Sorter I, there are $L^2$ candidate paths at the input of Sorter II. For a more efficient sorting network, the number of inputs of Sorter II should be reduced. Therefore, we analyze the selection ratios of the candidate paths. In Figs. 4 and 5, specific paths account for most of the path selections, and there is a clear distinction between frequently chosen and infrequently chosen paths. In addition, although the selection frequency varies somewhat with the signal-to-noise ratio (SNR), the SNR does not significantly affect the ranking of the paths in terms of path selection frequency. Therefore, through efficient path metric sorting, the Rate-1 and SPC decoders achieve error correction performance close to that of the SCL decoder without considering all $L^2$ paths in Sorter II.
The existing list-fast-SSC decoding methods [14], [15] use a fixed $M$, where $M$ is the number of candidate paths. In contrast, we set a different value of $M$ in Sorter II for each path based on our analysis of candidate paths. Accordingly, the proposed method requires setting the parameters $M_l$, which denote the number of candidates $m_{(1,L)}^l$ taken from the $l$-th path. Let $P$ represent the total number of candidate paths $m_{(1,L)}^{(1,L)}$ provided to Sorter II, as shown in (15).
To reduce the number of inputs of Sorter II, we can prune the candidate paths that are selected less frequently than the significant candidate paths. With this method, $P$ can be reduced while guaranteeing performance, and it can be represented in terms of $M_l$. For example, in Fig. 4, $M_l$ can be determined based on the selection frequency in descending order for $L = 4$, as shown in (16).
conventional FSSCL-SPC. First, the simulations were performed using binary phase-shift keying (BPSK) modulation and an additive white Gaussian noise (AWGN) channel. In addition, a cyclic redundancy check (CRC-16) code was used.
Since fast list decoding with the candidate path exclusion for Sorter I selects the same $L$ paths as the existing FSSCL-SPC, there is no error correction performance degradation. Fig. 7 compares the FER performances achieved with different values of $M_l$ and $P$ in Sorter II. In addition, Table 2 shows the $M_l$ values for the various $L$ and $P$ values in Fig. 7. Since $m_{(1,L)}^{(1,L)}$ are pre-sorted by Sorter I, the value of $P$ in Sorter II is at most $L^2$. Therefore, the reduction ratio of the number of candidates can be expressed as $1 - P/L^2$. As shown in Fig. 7, the FER performance degrades as the reduction ratio increases. However, the FER performance with $P = L^2/2$ is still similar to that of the conventional FSSCL-SPC. Therefore, in this paper, we set a reduction ratio of 50% as the threshold that guarantees the error correction performance. Accordingly, the proposed method can reduce the number of candidate paths by 50%, and the corresponding settings can be generalized as expressed in (17).
Fig. 8 shows the results of FER comparisons among the SCL algorithm [6], the conventional FSSCL-SPC algorithm [19], and fast list decoding with the proposed PSN. Fast list decoding with the proposed PSN achieves an FER performance similar to that of the SCL [6] and conventional FSSCL-SPC [19] algorithms. The FER performance of fast list decoding using the proposed method is degraded by less than 0.1 dB at an FER of $10^{-4}$ for $L = 4$. Similarly, the degradation is less than 0.05 dB at an FER of $10^{-4}$ for $L = 8$.
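The effect of the per-path limits $M_l$ can be sketched in Python as follows. The total number of Sorter II inputs is taken as $P = \sum_l M_l$ and the reduction ratio as $1 - P/L^2$. The setting $M = (4, 2, 1, 1)$ for $L = 4$ corresponds to the inputs $m_{(1,4)}^1$, $m_{(1,2)}^2$, and $m_1^{(3,4)}$ mentioned for the proposed Sorter II; the function names and the example metrics are hypothetical, and the columns are assumed to be already sorted by Sorter I.

```python
from typing import List

def total_candidates(m: List[int]) -> int:
    """P, the number of Sorter II inputs, as the sum of the per-path limits M_l."""
    return sum(m)

def reduction_ratio(m: List[int], list_size: int) -> float:
    """Fraction of the L^2 pre-sorted candidates that is pruned: 1 - P / L^2."""
    return 1 - total_candidates(m) / (list_size * list_size)

def pruned_select(sorted_columns: List[List[int]], m: List[int], list_size: int) -> List[int]:
    """Keep the first M_l pre-sorted candidates of path l, then pick the L smallest."""
    kept = [c for col, limit in zip(sorted_columns, m) for c in col[:limit]]
    return sorted(kept)[:list_size]
```

Unlike the candidate exclusion for Sorter I, this pruning is not exact in general: a candidate beyond the first $M_l$ entries of a column can occasionally belong to the true $L$ best paths, which is the source of the small FER degradation discussed above.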

E. PROPOSED SORTING NETWORK
We propose a PSN to reduce the sorting complexity of the Rate-1 and SPC decoders for fast list decoding. The PSN contains two sorters, Sorter I and Sorter II. Fig. 9 shows the proposed Sorter I for $L = 8$ and $N_v \geq 8$. Each horizontal line represents a PM with 8-bit quantization, and each vertical line represents a CASU. If the upper input of a CASU is larger than the lower input, the CASU swaps the two inputs.
To design a more efficient Sorter I, the number of CASUs should be minimized. By the properties in Section III-B, the inputs of Sorter I include CPs with a pre-determined order and CPs without a pre-determined order. CASUs are required only for CPs whose order is not pre-determined. In addition, $CP(0)$, $CP(1)$, and $CP(2)$ are always the three smallest CPs, and there is no need to compare them. Based on this, we designed Sorter I with a minimized number of CASUs. Fig. 9 shows the proposed Sorter I for the PSN.
The proposed metric Sorter II is based on a bitonic sorter. Fig. 10 shows the general bitonic sorter (GBS) [21] and the PBS [9] for a conventional SCL decoder when $L = 4$. The red-dotted CASUs can be removed based on the properties of the PMs in the SCL decoder, as expressed in (18) and (19). These properties enable the selection of the $L$ most reliable paths without degradation in error correction performance [9]. We applied the CASU-removal method of the PBS for conventional SCL decoders to Sorter II of the Rate-1 and SPC decoders for fast list decoding.
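For readers unfamiliar with bitonic sorters, the following Python sketch generates a generic bitonic sorting network as stages of CASU index pairs and applies it to a list of metrics. It is a plain full sorter in the variant where every CASU places the smaller value on the upper line, not the pruned PBS or the proposed Sorter II shown in the figures; the function names are chosen for this example.

```python
from typing import List, Tuple

Stage = List[Tuple[int, int]]

def bitonic_network(n: int) -> List[Stage]:
    """CASU stages of an n-input bitonic sorter (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages: List[Stage] = []
    k = 2
    while k <= n:
        # First stage of each merge: compare line t with its mirror in a block of k.
        stages.append([(base + t, base + k - 1 - t)
                       for base in range(0, n, k) for t in range(k // 2)])
        # Remaining stages: half-cleaners with distance j = k/4, k/8, ..., 1.
        j = k // 4
        while j >= 1:
            stages.append([(i, i + j) for i in range(n) if i & j == 0])
            j //= 2
        k *= 2
    return stages

def casu(upper: int, lower: int) -> Tuple[int, int]:
    """Compare-and-swap unit: swap when the upper input is larger than the lower."""
    return (upper, lower) if upper <= lower else (lower, upper)

def run_network(values: List[int], stages: List[Stage]) -> List[int]:
    """Apply every CASU of every stage; CASUs within a stage act on disjoint lines."""
    v = list(values)
    for stage in stages:
        for i, j in stage:
            v[i], v[j] = casu(v[i], v[j])
    return v
```

For 8 inputs, the network has 6 stages of 4 CASUs each; removing CASUs whose outcome is known in advance, as the PBS does, shortens this structure without changing the selected paths.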
According to Section III-C, the proposed Sorter II considers only $L^2/2$ paths, which represent the majority of the results of path selection. The proposed Sorter II can be generalized according to (17). As a result, the $L^2$ candidate paths are reduced to $L^2/2$. Therefore, we call this sorter the half-input bitonic sorter (HIBS). For $L = 4$, the input paths are simplified to $m_{(1,4)}^1$, $m_{(1,2)}^2$, and $m_1^{(3,4)}$ by (17) with the proposed path selection method. As a result, the $L^2 = 16$ candidate paths are reduced to $L^2/2 = 8$. Fig. 11 shows the proposed HIBS for Sorter II when $L = 4$. In the proposed HIBS, the input PMs $m_{(1,L)}^{(1,L)}$ of Sorter II are pre-sorted by Sorter I. Therefore, (19) is satisfied, and comparisons among the $m_{(1,L)}^l$ of the same path are unnecessary. However, in the PM initialization equation for SPC nodes (9), the parity $\gamma^l$ has a different value for each path. Therefore, the PM property expressed in (18) is not satisfied in an SPC decoder. For this reason, in Fig. 11, a comparison between $m_1^3$ and $m_1^4$ is needed.

F. COMPLEXITY ANALYSIS
The complexity of a sorter is expressed in terms of the numbers of CASUs and CASU stages. For a fair comparison with the proposed PSN, we adapted the GBS [21] and PBS [9] for an SCL decoder to fast list decoding. Furthermore, to compare various cases, three sorters were used for Sorter II of the PSN: GBS, PBS, and HIBS.
Sorting networks for fast list decoding are designed based on GBS [21]. The complexities of these sorting networks are generalized from the complexity of GBS for the SCL algorithm, which is described in (20), where $S_{SCL}^{GBS}(L)$ is the number of CASU stages in a $2L$-input GBS and $C_{SCL}^{GBS}(L)$ is the number of CASUs in a $2L$-input GBS.
However, the number of inputs to the sorter for fast list decoding is determined not only by L but also by τ. The complexity of a GBS for the FSSCL-SPC can be generalized as expressed in (21).
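To give a feel for how the sorter size scales with $\tau$, the Python sketch below evaluates the well-known stage and CASU counts of a full bitonic sorting network with $n = 2^m$ inputs, namely $m(m+1)/2$ stages with $n/2$ CASUs per stage. These counts are an assumption of this example and may differ from (20) and (21) if those expressions count only the units needed to output the $L$ smallest PMs; the function names are hypothetical.

```python
from typing import Tuple

def bitonic_complexity(num_inputs: int) -> Tuple[int, int]:
    """(CASU stages, CASUs) of a full bitonic sorter; num_inputs must be a power of two."""
    m = num_inputs.bit_length() - 1
    if num_inputs < 1 or (1 << m) != num_inputs:
        raise ValueError("num_inputs must be a power of two")
    stages = m * (m + 1) // 2
    casus = (num_inputs // 2) * stages
    return stages, casus

def gbs_scl(list_size: int) -> Tuple[int, int]:
    """Full bitonic sorter with 2L inputs, as in a conventional SCL decoder."""
    return bitonic_complexity(2 * list_size)

def gbs_fast_list(list_size: int, tau: int) -> Tuple[int, int]:
    """Full bitonic sorter with 2^tau * L inputs, as needed for tau bits per time step."""
    return bitonic_complexity((2 ** tau) * list_size)
```

For $L = 4$, moving from $2L = 8$ inputs to $2^{\tau}L = 32$ inputs with $\tau = 3$ raises the count from 6 stages and 24 CASUs to 15 stages and 240 CASUs under this model.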
Based on (21), the complexity of PBS for the conventional FSSCL-SPC can be calculated. Thus, the complexity of the $2^{\tau}L$-input PBS for fast list decoding can be expressed as shown in (22).
The modified bitonic sorting network for fast list decoding is shown in Fig. 12.
The proposed PSN consists of two sorters: Sorter I and Sorter II. Table 3 shows the numbers of CASUs and CASU stages for Sorter I designed based on Section III-B. Additionally, the PSN contains $L$ instances of Sorter I. For Sorter II, since (21) and (22) are for $2^{\tau}L$ inputs, new equations for the complexities of GBS, PBS, and HIBS are needed. These complexities are calculated by using (21). The complexity of GBS and PBS for Sorter II is computed as shown in (23). Fig. 13 shows the GBS and PBS for Sorter II when $L = 4$. Likewise, the complexity of the proposed HIBS for Sorter II depicted in Fig. 11 is calculated as shown in (25).
for Rate-1 and SPC decoders for fast list decoding. The SBS [9], PMS [11], and PBE [12], [13] can be used for the PM operation of SCL decoders, whose PMs satisfy (18) and (19). However, the PMs of fast list decoding do not satisfy (18) and (19). Therefore, these sorting networks [9], [11], [12], [13] are not suitable for fast list decoding and are not included in this comparison.

In the proposed sorting network design $PSN_{HIBS}$, the number of CASU stages for $L = 8$ is reduced by approximately 40% compared to $GBS_{fast-list}$ [21] and $PBS_{fast-list}$ [9]. In addition, the number of CASUs for $L = 8$ is reduced by approximately 90% compared to $GBS_{fast-list}$ and by approximately 87% compared to $PBS_{fast-list}$. Even in the cases of $L = 4$ and 8, the proposed sorting network shows competitive results.
Table 5 shows the implementation results regarding the equivalent gate count (EGC) and the maximum frequency for the sorting networks in Table 4. The sorting networks were synthesized using a standard cell library for Samsung 28 nm technology, and the EGC was calculated based on 2-input NAND gates. The results in Table 5 may not be proportional to those in Table 4 due to the various optimization methods applied in ASIC implementation. Table 5 shows that the proposed sorting networks have higher maximum frequencies and lower EGCs than the others. For $L = 8$, the proposed design $PSN_{HIBS}$ can operate at approximately 137% and 115% higher frequencies than $GBS_{fast-list}$ and $PBS_{fast-list}$, respectively. Furthermore, for $L = 8$, $PSN_{HIBS}$ requires about 90% and 86% lower EGCs than $GBS_{fast-list}$ and $PBS_{fast-list}$, respectively. Among the proposed PSNs, $PSN_{HIBS}$ has 61% and 20% higher frequencies than $PSN_{GBS}$ and $PSN_{PBS}$, respectively, for $L = 8$. $PSN_{HIBS}$ requires 68% and 54% lower EGCs than $PSN_{GBS}$ and $PSN_{PBS}$, respectively, for $L = 8$. The designs were not evaluated for $L = 16$ since they are impractical to implement in this case due to the exponential increase in the EGC.

V. CONCLUSION
This paper proposes the PSN, which is efficient for Rate-1 and SPC decoders. First, we analyzed the order of the PMs. In fast list decoding, some paths are never selected in Rate-1 and SPC nodes. Based on this, we proposed a candidate path exclusion method for each list size. Second, the selection ratios of the candidate paths were analyzed. It was shown that specific paths among all $L^2$ possible candidates account for the majority of the selected paths. In addition, we compared settings of the parameters $M_l$ to minimize the FER performance degradation and determined that the number of inputs to the sorting network could be reduced by 50%.
Finally, we proposed low-complexity sorting networks for the Rate-1 and SPC decoders of fast list decoding. The proposed sorting networks can guarantee error correction performance comparable to that of the SCL decoding algorithm. Furthermore, complexity comparisons based on the number of CASUs and ASIC synthesis results showed that the proposed sorting network design reduces the number of CASUs by up to 90% and permits up to 137% higher operating frequencies compared with existing sorting networks for fast list decoders. Therefore, the proposed method can be effectively applied to the hardware implementation of fast list decoders with $L \leq 8$.
YONGJE LEE (Graduate Student Member, IEEE) received the B.S. degree in electrical and electronics engineering from Ajou University, Suwon, South Korea, in 2021, where he is currently pursuing the M.S. degree in electrical and electronic engineering. His current research interests include signal processing, channel coding, deep learning, and SoC design.
JAE HONG ROH received the B.S. and M.S. degrees in electrical and electronics engineering from Ajou University, Suwon, South Korea, in 2020 and 2022, respectively. He is currently working with LX Semicon, Seoul, South Korea. His current research interests include signal processing, channel coding, cryptography, and SoC design.
USEOK LEE (Member, IEEE) received the B.S. degree in electrical and electronics engineering from Ajou University, Suwon, South Korea, in 2019, where he is currently pursuing the combined M.S. and Ph.D. degrees in electrical and electronic engineering. His current research interests include signal processing, channel coding, cryptography, and SoC design.
MYUNG HOON SUNWOO (Fellow, IEEE) received the B.S. degree from Sogang University, in 1980, the M.S. degree in EE from KAIST, in 1982, and the Ph.D. degree in ECE from the University of Texas at Austin, Austin, TX, USA, in 1990. He worked at ETRI, Daejeon, South Korea, from 1982 to 1985; and at Digital Signal Processor Operations, Motorola, from 1990 to 1992. Since 1992, he has been with the School of Electrical and Computer Engineering, Ajou University, Suwon, South Korea, where he is currently a Professor. He has authored over 470 articles, holds more than 120 patents, and has won more than 55 awards. His current research interests include artificial intelligence circuits and systems, low-power algorithms and architectures, medical imaging diagnosis, and deep-learning-based channel coding.