Branch-Assisted Sign-Flipping Belief Propagation Decoding for Topological Quantum Codes Based on Hypergraph Product Structure

Topological codes, a class of quantum error correction codes, have been adopted in current quantum computers due to their local qubit layout and high threshold. With nearly linear complexity, syndrome-based belief propagation (BP) is a natural decoding candidate for topological codes. However, such highly degenerate codes admit multiple low-weight errors with identical syndromes, which the BP decoder cannot distinguish, resulting in degraded performance. In this article, we propose a branch-assisted sign-flipping belief propagation (BSFBP) decoding method for topological codes based on the hypergraph product structure. In our algorithm, we introduce criteria for entering a new decoding path branched from BP, combined with a syndrome residual obtained from a syndrome-pruning process. A sign-flipping process is also conducted to perturb the log-likelihood ratios of selected variable nodes, which provides diversity in the syndrome residual. Simulation results show that the proposed BSFBP decoding is able to outperform BP decoding by about two orders of magnitude.


I. INTRODUCTION
As transistors shrink in size, quantum effects become increasingly important. For example, as gate lengths shorten, the quantum tunneling effect degrades transistor performance. Thus, quantum computers have become a potential route to sustain Moore's law. In a quantum computer, the stored information is disturbed by noise. Thus, quantum error correction (QEC) codes are necessary for the information-processing pipeline.
The first famous QEC code was the 9-qubit Shor code, which is able to correct both a single bit-flip error and a single phase-flip error [1]. Later, a widely used type of QEC code, called the Calderbank-Shor-Steane (CSS) code, was constructed from two related classical codes satisfying certain constraints [2], [3]. Since then, many QEC codes have been constructed using existing classical codes, such as quantum low-density parity-check (QLDPC) codes [4] and quantum expander codes [5].
Another type of code is the topological code, such as the toric code [6] and surface codes [7], [8], which are defined on a spin-lattice. Because of the locality of the stabilizer measurements for topological codes, they have become beneficial to fault-tolerant large-scale quantum implementation [9] and have been used in ongoing research [10], [11]. In fact, almost all the QEC codes, from the earliest Shor code to current topological codes, are classified as CSS codes. A subtype of CSS construction, called hypergraph product (HGP) construction, can also be used to construct topological codes [12]. Furthermore, HGP construction preserves the merits of topological codes and has better minimum-distance growth compared to other constructions, such as hyperbolic construction [13]. Thus, our article will mainly focus on such topological codes.
Since QEC codes can be regarded as a classical binary code [14] or as a classical 4-ary additive code with some constraints [15], many decoding algorithms for classical codes can be applied. One of the main decoding algorithms that have nearly linear complexity is belief propagation (BP) [16]. BP decoding, modified with the syndrome as its input due to the quantum state collapse, has been used in QLDPC codes, resulting in remarkable performance [17], [18], [19], [20].
In addition, binary BP decoding can be used as an auxiliary to yield the information of qubits for other decoding algorithms. In this way, an output of binary BP decoding can be a reliable input to the main decoders. For example, Grospellier et al. [21] use the soft output from binary BP decoding for the small-set-flip (SSF) decoding, where regular quantum expander codes are considered [5]. The SSF decoder then searches all the supports of the generators in the stabilizer group and updates the estimated error with the one that reduces the weight of the corresponding syndrome [22]. Recently, inspired by ordered-statistics decoding (OSD) for classical linear block codes [23], Panteleev and Kalachev [24] proposed a quantum-version OSD using QLDPC codes, including OSD with order 0 (OSD-0) as well as a higher order version. A modified higher order OSD has been proposed via a combination sweep method (OSD-CS) [25], and the authors also show that the quantum-version OSD can be used for topological codes. However, BP decoding concatenated with OSD results in additional complexity and latency compared to pure BP decoding, which is a disadvantage for practical applications [26].
Decoders intended for topological codes are typically based on the minimum weight perfect matching (MWPM) method [6], [27], which has a worst-case complexity of O(N³). Recently, some decoders based on MWPM have been proposed especially for topological codes, which reduce the complexity to approximately O(N log N) [28], [29]. Although it has been shown that MWPM provides outstanding performance for topological codes, the requirement of pair-like syndromes limits its universality [30]. For example, to date, the MWPM decoder cannot be applied directly to some higher dimensional topological codes whose parity-check matrices have column weights larger than 2 [31]. Instead, BP decoding, which has a complexity that is approximately linear in the code length, is a more general decoder for different types of codes; thus, it is considered here.
However, due to the highly degenerate nature of topological codes, BP will suffer from performance degradation. Degeneracy occurs when the codes contain many low-weight stabilizers compared to their minimum distance [32], [33], [34]. For the highly degenerate topological code, we find that its parity-check matrix will include lower row and column weights, leading to multiple low-weight errors, which have syndromes identical to the measured syndrome and will cause the BP decoding to fail. Since the difference in performance between using nonbinary BP and binary BP decoding is not significant for topological codes [32], we will focus on improving the performance of binary BP decoding under the consideration of complexity.
In this article, we propose a decoding algorithm termed branch-assisted sign-flipping belief propagation (BSFBP) decoding. In our algorithm, criteria are first proposed to identify the specific estimated syndrome. Then, the syndrome residual is obtained from the proposed syndrome-pruning process and is viewed as the syndrome input for the new decoding path branched from the syndrome-based BP. This branch-assisted process is able to mitigate the effect of those indistinguishable error patterns that would cause the syndrome-based BP to fail. We also propose a sign-flipping (SF) process to disturb the log-likelihood ratio (LLR) of the selected variable node. In this way, the process is able to provide a new decoding path that has diverse estimated syndromes, thereby improving the performance. BSFBP is able to outperform BP decoding by about two orders of magnitude. Our proposed decoding method can also be an auxiliary for OSD, providing a significant performance gain, which can compete with the MWPM for the considered 2-D topological codes. We also show that our BSFBP can be applied to those codes where the MWPM is limited, such as 3-D topological codes based on the homological product [35] and other types of codes based on HGP structures.
The rest of this article is organized as follows. In Section II, the background to QEC codes and binary BP decoding is introduced. Section III provides a detailed description of our proposed algorithm for topological codes. Our simulation results are shown in Section IV. Finally, Section V concludes this article.

II. PRELIMINARIES
Topological quantum codes, such as the toric code [6] and surface code [7], [8], are constructed on a lattice. The qubits of topological codes are located on the edges, while stabilizer operators are associated with the vertices and the plaquettes of the lattice. Thanks to their tolerance to local errors, they have been used for fault-tolerant quantum computing [9]. An HGP structure can also be used to construct a class of topological codes that have growing minimum distance [12], which will be described as follows.

A. QEC CODES IN HGP STRUCTURES
Pauli matrices are useful for describing the operations acting on a single qubit in quantum computation [36]. They are defined as follows:

I = ( 1 0 ; 0 1 ), X = ( 0 1 ; 1 0 ), Y = ( 0 −i ; i 0 ), Z = ( 1 0 ; 0 −1 ).
The stabilizer formalism is useful for describing QEC codes [37]. Given G n , an [[n, k, d]] quantum code is able to encode k qubits to an n-qubit codeword with a minimum weight of Pauli operations d, where the weight is the number of qubits operated on by a nonidentity Pauli matrix. The double bracket is used for quantum codes to distinguish them from the conventional [n, k, d] notation for classical codes. An [[n, k, d]] stabilizer code C S is defined by its stabilizer group S ⊆ G n , where −I ∉ S and all the elements commute mutually. A codeword |ψ⟩ ∈ C S is the +1 eigenstate of all the stabilizers, i.e., P|ψ⟩ = |ψ⟩ for all P ∈ S. In addition, S can be characterized by n − k generators g i ∈ G n , 0 ≤ i < n − k. That is, every element P ∈ S can be represented as a product of g 0 , . . ., g n−k−1 . Hence, by the isomorphism between the Pauli group and binary vectors, i.e., I → (0 0), X → (1 0), Z → (0 1), Y → (1 1), these n − k generators g i together can be represented in terms of a parity-check matrix H S for C S , where each row i of H S is the binary representation of g i [38]. For example, a generator XZZXI with n = 5 can be represented as a 2n-binary vector (1 0 0 1 0 0 1 1 0 0), where the 1s in the leftmost and rightmost n bits correspond to the qubits on which X and Z operate, respectively [14].
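The Pauli-to-binary isomorphism above can be sketched with a short helper; the function name is illustrative, not from the paper:

```python
# Sketch of the Pauli-to-binary isomorphism described above:
# I -> (0|0), X -> (1|0), Z -> (0|1), Y -> (1|1).  The left half of the
# 2n-bit vector marks the X-type support, the right half the Z-type support.

def pauli_to_binary(pauli_string):
    """Map an n-qubit Pauli string to its 2n-bit binary representation."""
    x_part = [1 if c in "XY" else 0 for c in pauli_string]
    z_part = [1 if c in "ZY" else 0 for c in pauli_string]
    return x_part + z_part

# The generator XZZXI from the text maps to (1 0 0 1 0 | 0 1 1 0 0).
print(pauli_to_binary("XZZXI"))  # [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
```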
An [[n, k X − k Z , d]] CSS code [2], [3] is a kind of stabilizer code, which is constructed using two classical codes C X [n, k X , d X ] and C Z [n, k Z , d Z ], where d ≥ min{d X , d Z }, C Z ⊂ C X , and both C X and C ⊥ Z , the dual of C Z , are able to correct up to t-bit errors, i.e., both d X , d Z ≥ 2t + 1. Given the parity-check matrix H X of C X and H Z of C ⊥ Z , the CSS code constructed from C X and C Z has the parity-check matrix of the form

H = ( H X 0 ; 0 H Z )    (3)

where H X · H T Z = 0. The HGP code [12] is also a class of CSS codes. Given two classical binary codes C 1 [n 1 , k 1 , d 1 ] and C 2 [n 2 , k 2 , d 2 ] with m 1 × n 1 and m 2 × n 2 parity-check matrices H 1 and H 2 , respectively, the two parity-check matrices of the HGP construction are

H X = ( H 1 ⊗ I n 2 | I m 1 ⊗ H T 2 ),  H Z = ( I n 1 ⊗ H 2 | H T 1 ⊗ I m 2 )

where I k denotes the k × k identity matrix. It can be easily verified that the HGP code is a CSS code since H X · H T Z = 0. In this article, we consider the symmetric HGP for which H 1 = H 2 . The HGP structure can also be used to construct topological codes that have a minimum distance proportional to n^{1/2}, where n is the length of the quantum code. Figs. 1 and 2 show H Z for the [[162,2,9]] toric code as well as for the [[113,1,8]] surface code with the parity-check matrices H b shown in Fig. 3(a) and (b), respectively. Blanks in these figures represent zeros.
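The HGP construction above can be sketched in a few lines; the seed matrix below is a toy circulant example chosen for illustration, and the orientation of the Kronecker products follows the symmetric form used later in Section III-A:

```python
# Minimal sketch of the HGP construction, assuming the form
# H_X = (H_1 (x) I_{n_2} | I_{m_1} (x) H_2^T) and
# H_Z = (I_{n_1} (x) H_2 | H_1^T (x) I_{m_2}).
import numpy as np

def hgp(H1, H2):
    m1, n1 = H1.shape
    m2, n2 = H2.shape
    HX = np.hstack([np.kron(H1, np.eye(n2, dtype=int)),
                    np.kron(np.eye(m1, dtype=int), H2.T)])
    HZ = np.hstack([np.kron(np.eye(n1, dtype=int), H2),
                    np.kron(H1.T, np.eye(m2, dtype=int))])
    return HX % 2, HZ % 2

# Toy seed: the circulant check matrix of a length-3 ring code, which yields
# a small toric-like code under the symmetric product H1 = H2 = Hb.
Hb = np.array([[1, 1, 0],
               [0, 1, 1],
               [1, 0, 1]])
HX, HZ = hgp(Hb, Hb)
# The CSS condition H_X . H_Z^T = 0 (mod 2) holds by construction.
assert not np.any((HX @ HZ.T) % 2)
print(HZ.shape)  # (9, 18)
```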
As mentioned in Section I, topological codes are highly degenerate, which means that the weights of their stabilizers, i.e., the row weights of their parity-check matrices, are low. For example, Fig. 1 shows that the average row and column weights are 4 and 2, respectively, for H Z of the [[162,2,9]] toric code, while for H Z of the [[113,1,8]] surface code, the average row and column weights are lower than 4 and 2, respectively. This low average column weight will have an adverse impact on the syndrome-based BP decoding algorithm.

B. BINARY BP DECODING
BP is an efficient iterative decoding algorithm for classical LDPC codes [39], [40] and can also be applied to QEC codes [14]. For classical codes, the information about the transmitted bits from the noisy channel is measured and treated as the intrinsic information for the BP decoding so as to estimate the transmitted bits. Unlike decoding classical codes, the transmitted quantum information cannot be measured directly, since the measurement collapses a qubit to a definite classical state (0 or 1). Thus, there is no intrinsic information from the channel for quantum codes. Instead, only the syndrome can be obtained, from measurements on the auxiliary qubits [36]. Thus, syndrome-based BP decoding is used for QEC codes [41]. In this article, we will focus on syndrome-based binary BP decoding, which we refer to as BP for simplicity.
In this article, we consider the discrete error model, where errors disturb each qubit independently and their effects can be described by the Pauli matrices, i.e., for a single qubit, the error operating on it can be described by X (bit flip), Z (phase flip), and Y (both). Furthermore, since Y can be decomposed into X and Z, under the consideration that X and Z are uncorrelated, the noise model can be further viewed as two independent channels: bit-flip and phase-flip channels [36].
The noise model containing independent bit-flip and phase-flip channels can be regarded as two independent binary symmetric channels (BSCs) with crossover probability p [14]. For example, based on the isomorphism between the Pauli operators and binary vectors, an n-tuple error e, whose elements are Pauli matrices, can be transformed into a 2n-bit vector (e x e z ), where e x and e z are two n-bit vectors that can be viewed as errors from two independent BSCs, respectively. Then, for a CSS code with the form (3), we obtain the syndrome s = (s x s z ), where s x = H Z · e x and s z = H X · e z . Then, s x and s z are independently fed to the BP decoding algorithm. Eventually, the logical error rate P css for the CSS code is P X + P Z − (P X · P Z ), where P X and P Z denote the classical logical error rates for codes C X and C Z , respectively.
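The two-channel split above can be sketched as follows; the check matrices here are toy examples for illustration only (not a valid CSS pair), and the function name is an assumption:

```python
# Sketch of the two-BSC view described above: the X-part and Z-part of an
# error are handled separately, each with its own syndrome.
import numpy as np

HZ = np.array([[1, 1, 0],
               [0, 1, 1]])
HX = np.array([[1, 1, 1]])

e_x = np.array([1, 0, 0])   # bit-flip part of the error
e_z = np.array([0, 1, 0])   # phase-flip part of the error
s_x = (HZ @ e_x) % 2        # syndrome fed to the decoder for H_Z
s_z = (HX @ e_z) % 2        # syndrome fed to the decoder for H_X

# Combined logical error rate for the CSS code, as in the text:
def p_css(p_x, p_z):
    return p_x + p_z - p_x * p_z

print(s_x, s_z, p_css(0.1, 0.2))
```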
In this article, since we consider the symmetric HGP structure for which H 1 = H 2 and the transmission is over two independent BSCs with an identical p, the decoding performance for H Z and H X is nearly identical. Thus, we can focus on only one type of error and one parity-check matrix, for instance, e x from the bit-flip channel and H Z . In the following description, we will omit the subscripts of e x and s x , which are denoted as e and s, respectively.
Given an M × N matrix H Z from the HGP structure, and the measured syndrome s ∈ F M 2 as the input, where M = N − K, BP decoding can be conducted on the Tanner graph of H Z , where each error location corresponds to a variable node and each syndrome location corresponds to a check node. Define j and i as the indices of the variable node and the check node, where 0 ≤ j < N and 0 ≤ i < M, respectively. The index set of variable nodes neighboring the ith check node is defined as N(i) = { j | the (i, j)th entry of H Z equals 1}, and the index set N(j) of check nodes neighboring the jth variable node is defined similarly. Let L^k_j be the LLR for the jth variable node at the kth iteration. Q^k_ij denotes the message from the jth variable node to the ith check node at the kth iteration, and R^k_ij denotes the message from the ith check node to the jth variable node at the kth iteration. The maximum number of iterations of BP decoding is denoted as I max . To reduce the computational complexity, we use normalized min-sum BP decoding [42], [43], which is shown as follows.
1) Initialization: Set k = 0. Since there is no intrinsic information from the channel, we initialize each variable node j with the prior LLR, i.e., L^0_j = ln((1 − p)/p), and set Q^0_ij = L^0_j for every neighboring check node i.
2) Vertical message exchange: for k > 0

Q^k_ij = L^0_j + Σ_{i′∈N(j)\{i}} R^{k−1}_{i′j}.

3) Horizontal message exchange:

R^k_ij = (−1)^{s_i} · β · ( Π_{j′∈N(i)\{j}} sgn(Q^k_{ij′}) ) · min_{j′∈N(i)\{j}} |Q^k_{ij′}|    (7)

where s i denotes the ith element in the measured syndrome s and β = 1 − 2^{−k} denotes a scaling factor [44].
4) LLR update:

L^{k+1}_j = L^0_j + Σ_{i∈N(j)} R^k_ij.    (8)

5) Hard decision:

ê^k_j = 1 if L^{k+1}_j < 0, and ê^k_j = 0 otherwise

where ê^k_j denotes the jth element of the estimated error ê^k for the kth iteration. Then, the estimated syndrome at the kth iteration is ŝ^k = ê^k · H^T_Z. a) If the decoding is able to converge to the measured syndrome, i.e., ŝ^k = s, declare a success. b) Otherwise, set k = k + 1 and repeat from Step 2. If k = I max , then declare a failure.
Note that for BP decoding on classical codes, to compute R^k_ij, we do not need the factor (−1)^{s_i} in (7) since the prior LLR L^0_j contains the transmitted information from the channel. When using syndrome-based BP decoding on quantum codes, we cannot directly measure the transmitted information. Thus, we set L^0_j for each variable node using the given crossover probability p of the BSC. To compute R^k_ij, the ith element s i of the measured syndrome is used to update the LLR for the jth variable node. Eventually, the jth element of the estimated error is obtained by summing the prior LLR together with all the messages sent to the jth variable node.
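The decoding procedure above can be sketched as a small, self-contained routine. This is a minimal illustration, not the authors' implementation: it uses a fixed scaling factor β instead of the iteration-dependent β = 1 − 2^{−k}, dense message arrays for clarity, and a toy classical check matrix for the demonstration:

```python
# Sketch of syndrome-based normalized min-sum BP, per the steps in the text.
import numpy as np

def syndrome_bp(H, s, p, max_iter=50, beta=0.75):
    """Decode a syndrome s for check matrix H over a BSC(p). Returns (e_hat, ok)."""
    M, N = H.shape
    L0 = np.log((1 - p) / p)
    Q = H * L0              # variable-to-check messages on Tanner-graph edges
    R = np.zeros((M, N))    # check-to-variable messages
    sign_s = (-1.0) ** s    # syndrome sign factor from (7)
    e_hat = np.zeros(N, dtype=int)
    for _ in range(max_iter):
        # Horizontal step: min-sum check-to-variable update with syndrome sign.
        for i in range(M):
            idx = np.flatnonzero(H[i])
            q = Q[i, idx]
            sgn = np.where(q >= 0, 1.0, -1.0)
            absq = np.abs(q)
            t1 = int(np.argmin(absq))
            m1 = absq[t1]
            m2 = np.min(np.delete(absq, t1)) if len(idx) > 1 else m1
            prod_sign = np.prod(sgn)
            for t, j in enumerate(idx):
                R[i, j] = sign_s[i] * beta * (prod_sign * sgn[t]) * (m2 if t == t1 else m1)
        # LLR update (8) and vertical step: extrinsic variable-to-check messages.
        L = L0 + R.sum(axis=0)
        for j in range(N):
            for i in np.flatnonzero(H[:, j]):
                Q[i, j] = L[j] - R[i, j]
        # Hard decision and convergence check against the measured syndrome.
        e_hat = (L < 0).astype(int)
        if np.array_equal((H @ e_hat) % 2, s % 2):
            return e_hat, True
    return e_hat, False

# Toy check: syndrome of the error (1, 0, 0) for a small parity-check matrix.
H = np.array([[1, 1, 0],
              [0, 1, 1]])
e_hat, ok = syndrome_bp(H, np.array([1, 0]), p=0.1)
print(ok, e_hat)
```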

III. PROPOSED ALGORITHM
In this section, we will first show the structure of the parity-check matrix in HGP form. From this structure, we find that, unlike weight-1 errors, for low-weight errors of weight 2 or larger, another error may exist with the same weight and a syndrome identical to that of the given error. Thus, the BP decoding algorithm cannot converge to the measured syndrome. To improve BP decoding on topological codes, we propose a BSFBP decoding approach inspired by observations of the BP decoding process for those errors. Although the proposed decoder is designed based on 2-D topological codes, it can also be applied to the 3-D versions, which we will show in Section IV.

A. HGP STRUCTURE FOR TOPOLOGICAL CODES
Although the local nature of topological codes constructed in the HGP form is beneficial to real implementation, it also leads to low row and column weights of the parity-check matrix, which causes ambiguity between different errors of the same weight.
Considering H Z constructed based on an m b × n b parity-check matrix H b using the symmetric HGP structure, i.e.,

H Z = ( I_{n_b} ⊗ H b | H^T_b ⊗ I_{m_b} )

we denote the first n_b² columns as section S 1 , which corresponds to I ⊗ H b , and the final m_b² columns as section S 2 , which corresponds to H^T_b ⊗ I. For example, in Fig. 1, the first 81 columns from left to right together represent section S 1 of H Z for the [[162,2,9]] toric code and the remaining 81 columns together represent section S 2 . Furthermore, S 1 can be divided into n b H b -type block columns, where each H b -type block column contains n b columns. Likewise, S 2 can be divided into m b I-type block columns, where each I-type block column contains m b columns. For the [[113,1,8]] surface code shown in Fig. 2, all the representations are the same as for the toric code, except H b , as shown in Fig. 3(b). Now, we will demonstrate the correspondence between the error and the syndrome by considering weight-1 and weight-2 errors. In Statement 1, we show that each weight-1 error yields a unique syndrome. We denote the set of the nonzero element locations of a given vector v as I v .
Statement 1: For topological codes based on the symmetric HGP structure, given an arbitrary weight-1 error e with syndrome s, there is no other weight-1 error having the identical syndrome s.
Proof: Suppose an error e with I e = { j}, where j is located in S 1 . Its corresponding syndrome s is the jth column of H Z . Since no two columns of H Z are identical, no other weight-1 error in S 1 or S 2 has a corresponding syndrome identical to s. The case where the error has its nonzero element located in S 2 is similar. That is, given a weight-1 error, it corresponds to a unique syndrome.
In contrast, given a weight-2 error, another weight-2 error might exist where these two errors have an identical syndrome. For example, a special case is given as follows.
Statement 2: Given a weight-2 error e where I e = { j, j + 1}, with j and j + 1 in the same block column in S 2 , another weight-2 error exists that has a syndrome identical to that of e.
Proof: For a toric code, where m b = n b , the weight-2 error e will yield a corresponding syndrome s. We are then able to identify another error e′ with I e′ = { j′, j′ + n b }, where j′ and j′ + n b are in different block columns in S 1 , whose corresponding syndrome is identical to s. For the case of larger weights, such as weight-3, we can combine a weight-1 error pattern with a weight-2 error pattern described above. Then, another weight-3 error exists with an identical syndrome based on the statements above.
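The two statements above can be checked by brute force on a small toric-like code built from a length-3 ring code (an illustrative seed matrix, not one of the codes simulated in the paper): all weight-1 syndromes are unique, while colliding weight-2 errors do exist.

```python
# Brute-force check: weight-1 syndromes are unique (Statement 1), but
# distinct weight-2 errors with identical syndromes exist (Statement 2).
import numpy as np
from itertools import combinations

Hb = np.array([[1, 1, 0],
               [0, 1, 1],
               [1, 0, 1]])                                 # length-3 ring code
HZ = np.hstack([np.kron(np.eye(3, dtype=int), Hb),         # section S1
                np.kron(Hb.T, np.eye(3, dtype=int))]) % 2  # section S2
N = HZ.shape[1]

# Weight-1 syndromes are the columns of H_Z; all are distinct.
cols = [tuple(HZ[:, j]) for j in range(N)]
assert len(set(cols)) == N

# Weight-2 errors: group supports by syndrome and look for collisions.
by_syndrome = {}
for j1, j2 in combinations(range(N), 2):
    s = tuple((HZ[:, j1] + HZ[:, j2]) % 2)
    by_syndrome.setdefault(s, []).append((j1, j2))
collisions = [v for v in by_syndrome.values() if len(v) > 1]
assert collisions  # indistinguishable weight-2 error pairs exist
print(collisions[0])
```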

B. BP DECODING FOR TOPOLOGICAL CODES
In this article, we will consider the [[162,2,9]] toric code and the [[113,1,8]] surface code, following [25], in order to demonstrate the error rate performance. For the specific topological codes mentioned above, in the weight-1 case, though no strict proof of convergence to the measured syndrome exists for BP decoding, simulations tracing all weight-1 errors show that BP decoding converges to the measured syndromes. However, when the weight is greater than or equal to 2, as described in Section III-A, for such topological codes with both low row and column weights of the parity-check matrices, many pairs of errors with identical syndromes will exist, such that the BP decoding cannot distinguish them from the syndrome and will not converge to the measured syndrome.
We will show an example of the BP decoding process for the [[162,2,9]] toric code. Given the error e where I e = {3, 12, 23, 37, 40, 55} (solid circles), its corresponding syndrome s has I s = {2, 3, 11, 12, 22, 23, 36, 37, 39, 40, 54, 55} (solid squares). Fig. 4 shows the related section of the Tanner graph for this code, where variable node v j corresponds to the jth error location and check node c i corresponds to the ith syndrome location. We will use both names interchangeably. Furthermore, an error (or syndrome) pattern is denoted as the set of locations of a fraction of the nonzero elements of the error (or syndrome). Then, it can be shown from Fig. 4 that the weight-2 error pattern {83, 84}, shown as red dashed circles, and the weight-2 error pattern {3, 12}, shown as red solid circles, yield the identical syndrome pattern {2, 3, 11, 12}. Such error patterns of identical weight yielding an identical syndrome pattern, denoted in red, are indistinguishable to the BP decoder. Otherwise, if a given syndrome pattern corresponds to a unique error pattern, the BP decoder is able to identify it. At the zeroth iteration, the sign factor (−1)^{s_i} in (7) is negative for i = 2, 3, 11, 12, ∀j ∈ N(i). Then, in (8), L^1_j = −2β ln((1−p)/p) + ln((1−p)/p) < 0 for j = 3, 12, 83, 84. Thus, the weight of the estimated syndrome ŝ^0 is less than that of the measured syndrome s, since the syndrome pattern {2, 3, 11, 12} corresponding to the indistinguishable error patterns will become zero due to the mod-2 addition if all the indistinguishable error patterns occur.
As the iterations proceed, messages pass between variable nodes and check nodes, and by the 122nd iteration, most of the locations of the nonzero elements in ê^k are identical to the locations of the nonzero elements in e, as shown in Table 1. One can see that the pattern {23, 37, 40, 55} is distinguishable, as it is fixed from the 123rd iteration on. When the maximum number of iterations is reached, BP decoding will not converge to the measured syndrome due to those indistinguishable error patterns.
To improve the BP decoding, three phenomena can be used as aids. First, we find that most of the time, at the zeroth iteration, the hard-decision output will include all the indistinguishable error patterns. Second, we find that at some iterations, the estimated syndromes are close to the measured syndrome such as the 123rd, 124th, and 126th iteration, as shown in Table 1. Last but not least, we find that those distinguishable error patterns and their corresponding syndrome patterns might be auxiliary to decoding those indistinguishable patterns at the beginning of the decoding process due to the interactive message passing between them.

C. NEW DECODING PATH BRANCHED FROM BP
Although indistinguishable error patterns may prevent the BP decoding from converging to the measured syndrome, some estimated syndromes exist that are close to the measured syndrome at certain iterations. Thus, we thought it might be feasible to perform another new path of the BP decoding, hereafter called the branch (the original path is denoted as the trunk), using the information from those estimated syndromes. We expect that the syndromes used for the branch correspond to fewer indistinguishable error patterns and are close to the measured syndrome. To achieve this goal, we introduce a syndrome-pruning process so as to obtain the syndrome residual from both the measured syndrome and a specific estimated syndrome, and view it as the input for the branch. In addition, to identify such a specific estimated syndrome, we need to design some criteria during the iterative decoding.
First, a benchmark is needed in order to compare the closeness of the estimated syndrome to the measured syndrome. Most of the time, at the zeroth iteration, the estimated error contains all the possible indistinguishable error patterns. During the decoding process, if some indistinguishable error patterns disappear at a specific iteration, the corresponding estimated syndrome will be closer to the measured syndrome than the estimated syndrome ŝ^0. Thus, it is reasonable to set ŝ^0 as the zeroth benchmark b. To identify an estimated syndrome that is closer to the measured syndrome than ŝ^0, we use the Hamming distance as the metric since the syndrome can be viewed as a binary vector. That is, the first criterion is introduced as

C.1: w(ŝ^k ⊕ s) ≤ w(b ⊕ s)

where w(c) is the Hamming weight of a vector c, and ⊕ is addition under mod 2.
To further ensure that the selected estimated syndrome is part of the measured syndrome, and thus guarantee the validity of the syndrome pruning, we introduce the second criterion:

C.2: I_{ŝ^k} ⊆ I_s

i.e., the support of the estimated syndrome is contained in that of the measured syndrome. When the estimated syndrome ŝ^k for the kth iteration, k ≠ 0, satisfies both C.1 and C.2, the syndrome-pruning process is conducted. The syndrome residual τ_s is obtained by pruning s using ŝ^k, i.e., τ_s = s ⊕ ŝ^k. For this decoding branch, τ_s is seen as the input for the BP decoding, which means that τ_s replaces s in (7). Suppose that at the ℓth iteration in the branch, the estimated syndrome corresponding to the error ê^ℓ is identical to τ_s; then we can declare that the decoding process converges to the measured syndrome since (ê^k + ê^ℓ) · H^T_Z = ŝ^k + τ_s = s. Otherwise, when the maximum number of iterations in the branch is reached, the process returns to the trunk of the BP decoding. Meanwhile, b is replaced by ŝ^k, which can not only result in a lower w(b ⊕ s) for the next branch but also cope with the case where not all indistinguishable error patterns are included at the zeroth iteration.
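The branching test and the pruning step can be sketched as below. This is a hypothetical helper, assuming C.1 compares Hamming distances against the benchmark and C.2 requires the estimated syndrome's support to lie inside that of the measured syndrome; the function name is illustrative:

```python
# Sketch of the two branching criteria and the syndrome-pruning step.
import numpy as np

def branch_input(s, s_hat, b):
    """Return the syndrome residual tau_s if s_hat qualifies, else None."""
    # C.1: s_hat is at least as close to s (Hamming distance) as benchmark b.
    c1 = np.sum(s_hat ^ s) <= np.sum(b ^ s)
    # C.2: the support of s_hat is contained in the support of s.
    c2 = np.all(s_hat <= s)
    if c1 and c2:
        return s ^ s_hat  # residual tau_s fed to the branched BP
    return None

s     = np.array([1, 1, 1, 0, 1, 1], dtype=int)   # measured syndrome
b     = np.array([0, 0, 0, 0, 0, 0], dtype=int)   # benchmark (example)
s_hat = np.array([0, 0, 1, 0, 1, 1], dtype=int)   # estimated syndrome
tau = branch_input(s, s_hat, b)
print(tau)  # [1 1 0 0 0 0]
```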
We will show how such branch-assisted belief propagation (BBP) works and how both criteria C.1 and C.2 and the syndrome-pruning process are able to obtain the syndrome residual by using Fig. 4 and Table 1. Considering BBP decoding, the zeroth estimated syndrome ŝ^0 will be recorded as the zeroth benchmark, where w(b ⊕ s) = 4. At each iteration, only the estimated syndromes satisfying C.1 will be identified. In Table 1, since w(ŝ^123 ⊕ s) = 2 ≤ 4 and ŝ^123 satisfies C.2, the estimated error ê^123 is recorded. Then, the measured syndrome is pruned using the estimated syndrome ŝ^123, and the syndrome residual τ_s, where I_{τ_s} = {2, 3}, is seen as the input to the new decoding branch. Since τ_s corresponds to a unique weight-1 error, as shown in Fig. 5, according to Statement 1 in Section III-A, the decoding algorithm is able to converge to a unique syndrome. Indeed, combining the recorded ê^123 and ê^τ, where I_{ê^τ} = {3}, it can be shown that the channel error e is found.

TABLE 2. Reduction Ratio of Unsolvable Weight-2 and -3 Errors for BBP Compared to BP
From the simulation of the considered topological codes, it can be shown that our BBP decoding algorithm converges to the measured syndrome for all weight-2 and weight-3 errors for which the BP decoding converges. Thus, we will focus on those weight-2 and weight-3 errors that the BP decoding cannot solve, i.e., for which it cannot converge to the measured syndrome. Table 2 shows that the BBP decoding process is able to reduce the number of unsolvable weight-2 errors from the BP decoding for the considered toric and surface codes by roughly 50%. In addition, when extended to the case of weight-3 errors, our BBP algorithm is able to reduce the number of unsolvable errors by about 60%. Now, we will discuss the difference in performance between the BP and BBP decoding processes under the same complexity, i.e., the same average number of iterations. Following the settings used in [25], the maximum number of iterations for the BBP decoding is set to N for both the toric codes and the surface codes. For a fair comparison, the BP and BBP decoding are compared under nearly the same average number of iterations. For the BBP decoding, the average number of iterations is computed using both the maximum iteration number I^t_max for the trunk and I^b_max for the branch. Fig. 6 shows that for the [[162,2,9]] toric code, the logical error rate when using BBP decoding converges to 3 × 10^−3 when the average number of iterations is 80, while BP decoding only converges to 1.5 × 10^−2 at the same average number of iterations. For the [[113,1,8]] surface code illustrated in Fig. 2, the logical error rate for BBP decoding begins to decrease when the average number of iterations reaches 8. Fig. 7 shows the comparison between the BP and BBP decoding for a higher crossover probability, where similar results can be observed.
In brief, the BBP algorithm introduces a new decoding path with the syndrome residual τ_s as its branch input. In this way, τ_s in (7) corresponds to an error having fewer indistinguishable patterns, and the new decoding path might eventually converge to τ_s. It is interesting to explore the introduction of a subbranch within a branch to improve the BBP decoding. However, simulation shows that the performance with an additional subbranch is similar to that with a single branch. Nevertheless, a case exists in which even the BBP decoding fails: one of the indistinguishable error patterns is a more reasonable candidate for the decoding since it has a lower weight compared to the real error patterns mentioned above. Thus, the observation at the zeroth iteration will violate the assumption of the BBP decoding, as shown in Table 3, since L^1_7 = L^1_159 = L^1_79 = L^1_160 = −β ln((1−p)/p) + ln((1−p)/p) > 0 in (8). Table 3 shows that, after sufficient iterations, the nonzero elements of the estimated error all belong to one of the indistinguishable error patterns, and even the BBP decoding fails to converge to the measured syndrome.

D. SF PROCESS INTRODUCED IN THE TRUNK
To decode such an error as described above, a heuristic method is to perturb the LLR of a bit in an indistinguishable error pattern. Another method, inspired by the third observation in Section III, is to introduce an error pattern not containing any location of the indistinguishable patterns. Then, due to the interactive message passing, there will be more diverse estimated syndromes than in the original BBP decoding, which will also provide more diverse syndrome residuals for the BBP decoding. Thus, we propose an SF process for our BBP decoding algorithm, in which we change the sign of the posterior LLR given in (8) for the selected variable node.
To decide which variable node's LLR sign should be flipped, we first use the difference between the estimated syndrome ŝ^k at the kth iteration and the measured syndrome s in the trunk. At each iteration of the decoding process, the indices i of the mismatched check nodes, i.e., ŝ^k_i ≠ s_i, are recorded. The process is shown as follows.
1) Record the index set U of the mismatched check nodes, i.e., U = {i | ŝ^k_i ≠ s_i}.
2) Select a variable node neighboring the mismatched check nodes according to one of the strategies below.
3) Flip the sign of the posterior LLR in (8) for the selected variable node.
In Step 2), we consider three different strategies to select the variable node whose LLR will be flipped. These strategies depend on the attributes of the chosen variable node. One reasonable strategy is to select the variable node involved in the largest number of mismatched check nodes, i.e., this variable node is likely to result in an estimated syndrome close to the measured syndrome. Since our SF process operates within the iterative decoding process, another strategy is to impose a disturbance on a randomly selected variable node related to a mismatched check node, which will provide diverse syndrome residuals and enhance the probability of convergence to the measured syndrome for the BBP decoding algorithm. Finally, we also consider a balanced strategy that compares the LLRs of the variable nodes neighboring the mismatched check nodes.
We now show how our BSFBP works for the example above using the so-called Global selection strategy. Compared to Table 3, Table 4 shows that at the 55th iteration, the SF process flips the LLR of variable node v_10, which results in an estimated error containing the distinguishable pattern {82}. At first glance, it seems that the flipping procedure moves the estimated syndrome away from the measured syndrome. Nevertheless, the sign of the posterior LLR of v_8 is also indirectly flipped to positive in the next iteration, and the magnitude of its LLR increases in the following iterations. At the 57th iteration, the decoder successfully identifies the error pattern {7, 159} instead of the error pattern {8}, and the syndrome pattern {7, 78} is fixed. Then, at the 58th iteration, the syndrome residual τ_s with support I_{τ_s} = {0, 8} corresponds to a weight-1 error ê_τ with support I_{ê_τ} = {0}, which can be solved by the BBP decoding algorithm according to Statement 1 in Section III-A.
Three variable node selection strategies are described as follows.
S.1 Global selection strategy: Perform a global search over all variable nodes neighboring the mismatched check nodes and select the variable node with the most neighboring check nodes whose indices belong to U, i.e., select the variable node v_j, where j ∈ N(i) for some i ∈ U, such that |N(j) ∩ U| is maximized, where |·| denotes the cardinality of a set. This strategy needs to compare at most d_{c_i} · d_{v_i} candidate variable nodes.

S.2 Reliability-based selection strategy: Select the least reliable variable node neighboring a randomly selected mismatched check node, i.e., randomly select a check node c_i with i ∈ U, and then select the variable node v_j such that |L^k_j| ≤ |L^k_{j'}| for all j' ∈ N(i), j' ≠ j. This strategy only needs to compare d_{c_i} candidate variable nodes.

S.3 Random selection strategy: Randomly select a variable node neighboring a randomly selected mismatched check node, i.e., select a variable node v_j with j ∈ N(i) for a randomly chosen i ∈ U. Its complexity is the lowest among the three strategies.
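The three strategies can be sketched as follows, assuming a dense numpy parity-check matrix and a posterior-LLR vector; all names are illustrative, and ties are broken arbitrarily rather than as the paper may specify:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def pick_global(H, U, L):
    """S.1: variable node with the most neighboring mismatched checks."""
    cand = np.flatnonzero(H[U].any(axis=0))   # neighbors of any check in U
    counts = H[U][:, cand].sum(axis=0)        # mismatched-neighbor counts
    return cand[np.argmax(counts)]

def pick_reliability(H, U, L):
    """S.2: least-reliable neighbor of a random mismatched check."""
    i = rng.choice(U)
    nbrs = np.flatnonzero(H[i])
    return nbrs[np.argmin(np.abs(L[nbrs]))]

def pick_random(H, U, L):
    """S.3: random neighbor of a random mismatched check."""
    i = rng.choice(U)
    return rng.choice(np.flatnonzero(H[i]))
```

Any of the three functions can serve as the selection callback of the SF step; S.1 scans every neighbor of every mismatched check, while S.2 and S.3 touch only one check's neighborhood, matching the complexity ordering stated above.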
We show the reduction ratio of unsolvable weight-2 and weight-3 errors for BSFBP decoding compared to BP decoding in Table 5. The table shows that BSFBP decoding reduces the number of unsolvable weight-2 and weight-3 errors for the considered toric codes by up to 99% across all selection strategies. For the considered surface codes, the reduction ratio is about 90% for both weight-2 and weight-3 errors compared to BP decoding. Overall, our BSFBP decoding algorithm converges to the measured syndrome for between 99.48% and 99.99% of weight-2 and weight-3 errors, depending on the code considered.
Figs. 9 and 10 show the performance when using the different selection strategies. They show that the relationship between strategy and performance depends on the code. For example, the performance when using S.1 and S.2 for the toric codes is almost identical, while, for the surface codes, the randomness of the selection seems to assist the decoding. A possible explanation is that surface codes have an irregular parity-check matrix. Here, a parity-check matrix is called "regular" if every row and every column has a uniform weight; otherwise, it is called "irregular," which is equivalent to saying that not all check nodes or variable nodes have equal degree. Unlike toric codes, where every variable node has two neighboring check nodes, i.e., degree 2, the considered surface codes have both degree-2 and degree-1 variable nodes, i.e., some variable nodes neighbor only one check node. Since the criterion of the global selection S.1 is to identify the variable node with the most mismatched neighboring check nodes, this selection is affected by the degree of the variable node and is prone to choosing variable nodes with higher degrees, while the random selection S.3 chooses a variable node regardless of its degree. Thus, for this case, S.3 provides slightly better performance. However, the random selection is highly dependent on the maximum number of iterations, while the global selection is more stable. For example, if the maximum number of iterations is set to 30, the performance is almost the same when using either S.1 or S.3.
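The regular/irregular distinction invoked here can be checked mechanically; a minimal sketch, assuming a dense binary parity-check matrix:

```python
import numpy as np

def classify(H):
    """Label a binary parity-check matrix 'regular' when every row shares
    one common weight and every column shares one common weight."""
    row_w = H.sum(axis=1)   # check-node degrees
    col_w = H.sum(axis=0)   # variable-node degrees
    uniform = len(set(row_w)) == 1 and len(set(col_w)) == 1
    return "regular" if uniform else "irregular"
```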
We also compare the performance between the BBP and BSFBP decoding based on Strategy S.1 under the same average number of iterations, as shown in Fig. 11. We show that the BSFBP decoding process not only reduces the logical error rate by about one order of magnitude compared to BBP decoding but also reduces the average number of iterations, which means that using the SF process is able to converge much faster.

E. SUMMARY OF BSFBP
The details of the complete BSFBP decoding process are shown in Algorithm 1. First, based on the structure of topological codes, the specific estimated syndrome is identified using the two criteria in Step 9. Then, the syndrome residual is obtained in Step 10 and used as the input to a new branch of the BSFBP decoding algorithm. To provide our decoding with diverse syndrome residuals, we also introduce a flipping element for the posterior LLR in the trunk at Step 21, based on flexible selection strategies. Note that in [17] and [18], a random perturbation and a perturbation based on the symbol of the stabilizers of QLDPC codes, respectively, are applied to the prior channel probabilities to refine the initial prior probability for nonbinary BP decoding. In contrast, our SF process aims to provide diverse syndrome residuals by perturbing the posterior LLR in binary BP decoding. In summary, our BSFBP decoding is able to mitigate the effects of those indistinguishable error patterns and improve the performance of the considered topological codes. An excerpt of Algorithm 1 is as follows:

 1: Initialization: set the messages L_{ij} using L^0_j for all j and i ∈ N(j).
 2: while k < It_max do
 3:   Vertical message exchange as (6).
 4:   Horizontal message exchange as (7).
 5:   Elementwise LLR update as (8).
 6:   Hard decision as (9) and set b = ŝ^0.
 7:   if ŝ^k = ê^k · H_Z^T = s then
 8:     Declare a decoding success.
 9:   else if ŝ^k satisfies both C.1 and C.2 then
10:     Syndrome-pruning process: τ_s = s ⊕ ŝ^k.
11:     Initialization: use τ_s as the syndrome input and set the branch iteration index ℓ = 0.
12:     while ℓ < Ib_max do
13:       Repeat (6)-(8) using the syndrome input τ_s and the iteration index ℓ.
14:       if the hard decision gives τ̂_s = τ_s then
15:         Declare a decoding success.
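The branch control flow of Algorithm 1 (trunk BP, the C.1/C.2 test, syndrome pruning, and branch BP on the residual) can be sketched as follows. Here `bp_iter` and `branch_ok` are illustrative stand-ins for the paper's message-passing updates (6)-(9) and criteria C.1/C.2, and combining the trunk and branch estimates by modulo-2 addition is an assumption of this sketch:

```python
import numpy as np

def bsfbp_branch(s, H, bp_iter, branch_ok, it_max=100, ib_max=100):
    """Control-flow sketch of the trunk plus one branch.

    bp_iter(syndrome, state) -> (e_hat, state): one BP iteration.
    branch_ok(s_hat) -> bool: tests criteria C.1 and C.2.
    """
    state = None
    for _ in range(it_max):
        e_hat, state = bp_iter(s, state)
        s_hat = (e_hat @ H.T) % 2                # estimated syndrome
        if np.array_equal(s_hat, s):
            return e_hat                         # trunk converged
        if branch_ok(s_hat):
            tau = s ^ s_hat                      # syndrome-pruning residual
            b_state = None
            for _ in range(ib_max):              # branch decoding on tau
                e_tau, b_state = bp_iter(tau, b_state)
                if np.array_equal((e_tau @ H.T) % 2, tau):
                    return (e_hat + e_tau) % 2   # combine trunk and branch
    return None                                  # decoding failure
```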

A. SIMULATION ON TOPOLOGICAL CODES
From the discussion presented in Section II, it is sufficient to simulate only one type of error. Thus, only the bit-flip channel and H Z will be considered for 2-D codes as in [25], while for 3-D codes, the phase-flip channel and H X will be considered as in [45]. In the simulation, we will show the performance of our proposed decoding algorithm compared to other methods using binary BP decoding [21], [25] based on the BSC model. For 2-D topological codes, we use variable node selection strategy S.2 in the SF process for toric codes, while for surface codes, S.3 is used. For 3-D topological codes, we use S.2 for both codes due to the lower maximum number of iterations, as discussed in Section III-D.
We show the performance of the [[162,2,9]] and the [[242,2,11]] toric codes in Figs. 12 and 13, respectively. These show that, since our BBP decoding algorithm uses the structure of topological codes, it is able to surpass the performance of conventional BP decoding. The small-set-flip (SSF) decoder can also be applied directly to regular HGP codes [22]. However, the performance gain of SSF decoding for HGP codes with code lengths below 500 is not significant, as shown in [46], and is even worse than that of BP decoding for the toric codes considered. Thus, we consider BP decoding concatenated with SSF (BP+SSF). BP+SSF uses BP decoding to yield an estimated error and then searches all supports of the M stabilizer generators, adding support vectors to the error so as to repeatedly reduce the corresponding syndrome toward zero. Consequently, most of the computation time is due to SSF. In this article, SSF only operates when BP decoding reaches a predetermined fixed number of iterations, which is set to an average number of iterations. This method can be viewed as a variant of the hybrid decoder presented in [21]. Both figures show that the proposed BSFBP decoding algorithm improves the logical error rate to close to 10^−4 when p = 0.01 and surpasses the conventional BP decoding method by about two orders of magnitude.
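The BP+SSF trigger described above (SSF runs only once BP has used up a fixed iteration budget) can be sketched as follows; `bp_decode` and `ssf_decode` are hypothetical helpers standing in for the actual decoders:

```python
import numpy as np

def bp_plus_ssf(s, bp_decode, ssf_decode, it_fixed):
    """Hybrid sketch: run BP for a fixed number of iterations; if it has
    not converged, hand the estimate and syndrome to SSF post-processing.
    Helper names are illustrative."""
    e_hat, converged = bp_decode(s, max_iter=it_fixed)
    if converged:
        return e_hat           # BP alone matched the syndrome
    return ssf_decode(s, e_hat)  # SSF refines the BP estimate
```

This mirrors why most of the computation time is attributed to SSF: the (cheaper) BP stage runs first, and SSF is invoked only on the residual hard cases.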
If we use our BSFBP decoding method as an auxiliary to OSD, the performance can be improved significantly. We consider OSD-0 and the higher order OSD using a combination sweep method (OSD-CS), as described in [25]. OSD-0 assumes that nonzero elements of an error are more likely to occur at locations with lower reliability than at locations with higher reliability. In contrast, OSD-CS also considers the case where some nonzero elements occur at locations with higher reliability. Figs. 12 and 13 show that the performance of our BSFBP decoding process concatenated with OSD-0 is almost identical to that of BP decoding concatenated with OSD-CS. When the BSFBP decoding process is concatenated with OSD-CS, it is able to reach a logical error rate of about 10^−7. We next consider the [[113,1,8]] and the [[181,1,10]] surface codes. By using the BSFBP decoding process, the performance improves by almost two orders of magnitude compared to the conventional BP decoding method. Since the BP+SSF decoding approach presented in [21] is devised for codes with a regular parity-check matrix, it is not considered for surface codes with irregular parity-check matrices. In contrast, BSFBP decoding combined with OSD-CS produces error rates below 10^−5 and 10^−4 for the [[113,1,8]] and the [[181,1,10]] surface codes, improvements of two and one orders of magnitude, respectively, compared to BP decoding concatenated with either OSD variant. Thus, BSFBP can be a better auxiliary decoder to concatenate with OSD. Our BSFBP decoding method by itself can even outperform BP decoding concatenated with OSD, which indicates that it is possible to explore other modified BP decoders with lower complexity and better performance than OSD for the surface codes.
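A minimal OSD-0 sketch in the spirit of the description above: columns are ranked by reliability, an information set of the least reliable linearly independent columns is found by GF(2) elimination, and the syndrome is solved on that set. All names are illustrative and tie-breaking details of [25] are not reproduced:

```python
import numpy as np

def osd0(H, s, L):
    """OSD-0 sketch: solve H e = s (mod 2) for an error supported on the
    least-reliable linearly independent columns of H."""
    m, n = H.shape
    order = np.argsort(np.abs(L))     # least reliable columns first
    A = np.concatenate([H[:, order], s.reshape(-1, 1)], axis=1) % 2
    pivots, r = [], 0
    for c in range(n):                # GF(2) Gaussian elimination
        rows = np.flatnonzero(A[r:, c]) + r
        if rows.size == 0:
            continue
        A[[r, rows[0]]] = A[[rows[0], r]]      # bring pivot into place
        for rr in np.flatnonzero(A[:, c]):
            if rr != r:
                A[rr] ^= A[r]                  # eliminate above and below
        pivots.append(c)
        r += 1
        if r == m:
            break
    e = np.zeros(n, dtype=int)
    for k, c in enumerate(pivots):    # free variables set to zero
        e[order[c]] = A[k, n]         # last column holds the syndrome
    return e
```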
In Figs. 16 and 17, we show the performance of the distance-11 toric code and the distance-8 surface code when using MWPM [31], respectively. The results show that, by concatenating with OSD, it is possible to compete with the MWPM decoder for the considered 2-D topological codes. Although it has been shown that MWPM provides outstanding performance for topological codes, it requires a parity-check matrix with column weights of at most 2, which limits its universality [30], [31].
Consider the 3-D topological codes based on the homological product, which can be viewed as a generalization of the HGP construction [35]. These 3-D topological codes are popular for single-shot error correction, which means that the decoding process needs to measure the noisy syndrome only once [45]. Both 3-D toric and surface codes are characterized by the X-stabilizer generators H_X, the Z-stabilizer generators H_Z, and a meta-check matrix M_c, which is used to correct syndrome noise. In this article, we consider applying our decoder to the 3-D toric and surface codes, both with a lattice size of 6, under the phase-flip channel, given a noiseless syndrome (without meta-checks). With the maximum number of iterations equal to 30, as in [47], Figs. 16 and 17 show that our BSFBP decoder can also operate on 3-D topological codes and even outperforms BP by two orders of magnitude. For these codes, direct MWPM decoding is restricted since the column weights of H_X and H_Z are larger than 2.

B. SIMULATION ON OTHER CODES
In this subsection, we further explore the performance of our proposed decoder on other codes in Fig. 18. First, the [[25,1,5]] surface code rotated by 45° is considered [48]. The results show that our BSFBP is able to provide an improvement of about two orders of magnitude compared to BP. Then, we apply our decoder to the [[256,32]] bicycle code with a maximum number of iterations equal to 12, as in [20]. The result shows that our decoder provides only a slight improvement. The reason is that the design of our BBP is based on the HGP structure; hence, only the SF process can take effect for the bicycle code. This is indicated clearly by comparison with the performance on the [[625,25,8]] QLDPC code based on the HGP structure [25], where our decoder improves performance by 1.5 orders of magnitude compared to BP.

V. CONCLUSION
In this article, we proposed a BSFBP decoding method for topological codes based on the HGP structure. Our algorithm introduced the criteria and the syndrome-pruning process used to compute the specific syndrome residual. A new decoding path using this syndrome residual as its input was then branched from BP. An SF process was also proposed on the original path of BP, which provides more diverse syndrome residuals for the decoding. For the considered topological codes, our BSFBP provided remarkable performance as a stand-alone decoder compared to BP. As an auxiliary decoder to OSD, we showed that it is possible to compete with MWPM for the considered 2-D topological codes. We also showed that our BSFBP can be applied to codes where MWPM is limited, such as 3-D topological codes and other types of codes based on the HGP structure.
Although, for 2-D topological codes, it might be difficult to devise a BP-based decoder that competes with state-of-the-art MWPM-based decoders such as those in [28] and [29] in both performance and complexity, the optimal decoder for single-shot codes such as 3-D toric and surface codes still needs to be explored. Moreover, considering fault-tolerant computation, it would be interesting to compare the time overhead of applying our decoder to single-shot codes with that of hardware-efficient decoders such as those presented in [49], [50], and [51], when using multiple rounds of syndrome measurement on codes with similar parameters.