Marker Codes Using the Decoding Based on Weighted Levenshtein Distance in the Presence of Insertions/Deletions

A random marker code is inserted periodically into the information sequence, and a novel symbol-level decoding algorithm based on the weighted Levenshtein distance (WLD) is designed to correct insertions, deletions, and substitutions in the received sequence. In this method, the branch quantities of the decoding trellis are obtained from the WLD, which is computed by dynamic programming. A simulation study demonstrates the effectiveness of the presented scheme in a practical system, especially for channels with weak synchronization problems.


I. INTRODUCTION
Loss of synchronization caused by imperfect sampling clocks can have catastrophic consequences: the received sequence has variable length and may contain a large number of substitutions. Such channels are of great interest in communication systems [1]-[5]. Channels with errors caused by loss of synchronization have memory, so techniques designed for memoryless channels can seldom be employed directly [6]-[12].
The DM construction proposed by Davey and MacKay is the most promising technique for recovering synchronization. In this scheme, a watermark code is used to correct synchronization errors, including insertions and deletions, and a non-binary low-density parity-check (LDPC) code is employed to correct the residual errors. The watermark, a sequence known to the receiver, is added modulo 2 to the LDPC codeword. At the receiver, the watermark decoder traverses the decoding trellis, compares the received sequence with the known watermark bit by bit, and outputs log-likelihood ratios (LLRs).
In the recent past, on the one hand, several modifications to the decoding algorithm of the DM construction have been made to improve performance on channels with synchronization errors. Briffa designed a symbol-level watermark decoding algorithm that takes the codebook of the LDPC code into account and significantly reduced the block error rate [13]. Subsequently, building on the scheme in [13], Jiao proposed an iterative decoding algorithm that accepts soft a priori input and further improved the synchronization-error-correcting capability of the system [14]. On the other hand, several new encoding schemes based on the DM construction have been presented. In [15], a marker code was used in the concatenated code, providing easier synchronization; a marker code can be viewed as an irregular version of the watermark. For decoding, the scheme in [15] employs the bit-level forward-backward algorithm to identify insertions and deletions.

The associate editor coordinating the review of this manuscript and approving it for publication was Qilian Liang.
In this paper, we focus on communication over the binary insertion/deletion-substitution channel model adopted in [16]. In particular, we propose inserting a marker code, as a known pattern, at regular intervals. Furthermore, since the branch quantities in the computation of the forward/backward quantities can be replaced by the weighted Levenshtein distance (WLD), which is not constrained by the inner encoding method, we present a novel forward-backward decoding algorithm that operates purely at the symbol level to correct synchronization errors. The branch quantities of the decoding trellis are computed efficiently from the WLD, which measures the distance between the transmitted and received sequences in the presence of insertions and deletions. Moreover, unlike previous algorithms, the presented algorithm can be applied flexibly in systems that employ either markers or watermarks. The rest of the paper is organized as follows. Section II introduces the system model and the encoding scheme. Section III describes the method for computing the WLD and the proposed forward-backward decoding scheme. Section IV presents simulation results that demonstrate the effectiveness of the proposed scheme. Finally, conclusions are drawn in Section V.

II. SYSTEM MODEL AND ENCODER
In this paper, we consider the binary insertion/deletion-substitution channel with random and independent bit errors, which models imperfect synchronization. This section illustrates the proposed communication system for this binary channel and describes the marker encoding method in detail.

A. SYSTEM MODEL
The proposed scheme, depicted in Fig. 1, employs the known marker code as the inner code and a binary LDPC code as the outer code. The binary information b is first encoded by the outer LDPC code into the binary codeword d of an $(N_L, K_L)$ code. The known marker code w of length N is then inserted into d at uniform intervals, producing the transmitted sequence x of length $N_c$. The sequence x is sent over the random binary insertion/deletion-substitution (IDS) channel [16].
The IDS channel model is shown in Fig. 2, where $N_c^*$ is the length of the received sequence. For each bit of the sequence, one of four events may occur: a random bit is inserted into the sequence with probability $P_i$, the transmitted bit is deleted with probability $P_d$, the bit is flipped (added modulo 2 to the bit '1') with probability $P_s$, or the bit is transmitted correctly. Here $P_i$, $P_d$, and $P_s$ denote the insertion, deletion, and substitution probabilities, respectively, and the transmission probability is $P_t = 1 - P_i - P_d$. For each block, the probability that the channel makes $N_i$ insertions, $N_d$ deletions, and $N_s$ substitutions is computed as follows [17].
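To make the event model concrete, the following is a minimal simulation sketch of a binary IDS channel. It assumes a Davey-MacKay-style reading of the description above: after an insertion the same input bit is considered again, and a substitution can only hit a bit that is actually transmitted, consistent with the factor $P_t(1 - P_s)$ in Eq. (9). The function name `ids_channel` and the event counters are illustrative, not taken from the paper.

```python
import random

def ids_channel(x, p_i, p_d, p_s, rng=None):
    """Binary insertion/deletion-substitution channel (assumed Davey-MacKay-style).

    For each input bit: with probability p_i a uniformly random bit is inserted and
    the same input bit is considered again; with probability p_d the bit is deleted;
    otherwise (p_t = 1 - p_i - p_d) it is transmitted and flipped with probability p_s.
    Returns the received bits and the number of each kind of event.
    """
    if not (0 <= p_i < 1 and 0 <= p_d and p_i + p_d <= 1 and 0 <= p_s <= 1):
        raise ValueError("invalid channel probabilities")
    rng = rng if rng is not None else random.Random()
    y, counts = [], {"ins": 0, "del": 0, "sub": 0}
    k = 0
    while k < len(x):
        u = rng.random()
        if u < p_i:                       # insertion before x[k]; x[k] is still pending
            y.append(rng.randint(0, 1))
            counts["ins"] += 1
        elif u < p_i + p_d:               # deletion of x[k]
            counts["del"] += 1
            k += 1
        else:                             # transmission, possibly with a substitution
            bit = x[k]
            if rng.random() < p_s:
                bit ^= 1
                counts["sub"] += 1
            y.append(bit)
            k += 1
    return y, counts
```

The received length always satisfies $N_c^* = N_c + N_i - N_d$, which is the drift that the inner decoder has to track.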

B. ENCODING SCHEME
The outer encoder employs binary LDPC encoding because of its excellent error-correcting performance. The LDPC codeword d is then divided into $N_L/m$ symbols of m bits each, so each symbol takes one of $q = 2^m$ values. In this paper, a pseudo-random sequence is chosen as the marker code. The inner encoder allocates λ bits of the marker code to each m-bit sub-sequence of d to generate x. The structure of x is illustrated in Fig. 3.
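The inner encoding step can be sketched as follows. This is an illustration under stated assumptions: the marker generator (`make_marker`) is hypothetical because the paper only says the marker is pseudo-random, the MSB-first bit-to-symbol mapping is assumed, and the λ marker bits are placed before each data symbol, following the notation $s_i \rightarrow (w_i, d_i)$ used in Section III; Fig. 3 may use a different ordering.

```python
import random

def make_marker(length, seed=0):
    """Hypothetical pseudo-random binary marker (the paper's generator is unspecified)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]

def symbol_values(d, m):
    """Integer value in [0, 2**m) of each m-bit group of d, MSB first (assumed)."""
    return [int("".join(map(str, d[i:i + m])), 2) for i in range(0, len(d), m)]

def insert_markers(d, marker, m, lam):
    """Attach lam marker bits to every m-bit symbol of the LDPC codeword d."""
    if len(d) % m:
        raise ValueError("len(d) must be a multiple of m")
    n_sym = len(d) // m
    if len(marker) != lam * n_sym:
        raise ValueError("marker must provide lam bits per symbol")
    x = []
    for i in range(n_sym):
        x.extend(marker[i * lam:(i + 1) * lam])  # marker bits w_i (placed first: assumption)
        x.extend(d[i * m:(i + 1) * m])           # data symbol d_i
    return x
```

For example, with m = 2 and λ = 1, `insert_markers([1, 0, 1, 1], [0, 1], 2, 1)` yields `[0, 1, 0, 1, 1, 1]`, so the transmitted length is $N_c = N_L (m + \lambda)/m$.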

III. THE PROPOSED INNER DECODING METHOD USING THE WLD
This section describes the proposed symbol-level forward-backward decoding algorithm in detail.

A. WLD
The distance between the transmitted and received sub-sequences is measured by the WLD, which is computed by dynamic programming [18]. The WLD is used to calculate the output probabilities, as shown in Subsection III-B. The WLD between two arbitrary sequences s and s′ is defined as follows.
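For readers who want to experiment, the sketch below computes a standard weighted Levenshtein distance by dynamic programming. The per-operation weights `w_ins`, `w_del`, and `w_sub` are hypothetical parameters; unit weights give the ordinary Levenshtein distance. The paper's own weights are not reproduced here, and the form of Eq. (9) suggests they enter in the log-probability domain, so their sign and scale may differ from these positive costs.

```python
def weighted_levenshtein(s, t, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum total weight of insertions, deletions and substitutions turning s into t."""
    n, k = len(s), len(t)
    # D[a][b] = minimum cost of transforming s[:a] into t[:b]
    D = [[0.0] * (k + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        D[a][0] = a * w_del                  # delete all of s[:a]
    for b in range(1, k + 1):
        D[0][b] = b * w_ins                  # insert all of t[:b]
    for a in range(1, n + 1):
        for b in range(1, k + 1):
            diag = D[a - 1][b - 1] + (0.0 if s[a - 1] == t[b - 1] else w_sub)
            D[a][b] = min(diag,                      # match or substitution
                          D[a - 1][b] + w_del,       # delete s[a-1]
                          D[a][b - 1] + w_ins)       # insert t[b-1]
    return D[n][k]
```

The table has $(|s| + 1)(|t| + 1)$ entries, so each branch evaluation costs $O((m + \lambda)^2)$ operations when both arguments are symbol-length segments.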

B. THE PROPOSED FORWARD-BACKWARD ALGORITHM
To recover synchronization, the inner decoder uses the decoding trellis to identify the insertions and deletions in the received sequence. Executing the symbol-level forward and backward passes yields LLRs, which initialize the LDPC decoder to correct the residual errors.
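Because the outer code is a binary LDPC code, symbol-level posteriors must eventually be turned into bit LLRs. A common way to do this is to marginalize over the symbol values that share a given bit. The sketch below assumes an MSB-first bit-to-symbol mapping and is not necessarily identical to Eq. (3). The function name `symbol_to_bit_llrs` is illustrative.

```python
import math

def symbol_to_bit_llrs(p_sym, m):
    """Marginalize a posterior over q = 2**m symbol values into m bit LLRs log(P(0)/P(1)).

    Bit k of symbol value a is read MSB first (an assumed mapping).
    """
    if len(p_sym) != 2 ** m:
        raise ValueError("posterior must have 2**m entries")
    tiny = 1e-300                                    # guards against log(0)
    llrs = []
    for k in range(m):
        shift = m - 1 - k
        p0 = sum(p for a, p in enumerate(p_sym) if not (a >> shift) & 1)
        p1 = sum(p for a, p in enumerate(p_sym) if (a >> shift) & 1)
        llrs.append(math.log(max(p0, tiny)) - math.log(max(p1, tiny)))
    return llrs
```

With m = 2 and posterior (0.7, 0.1, 0.1, 0.1), both bits have $P(0) = 0.8$ and $P(1) = 0.2$, so both LLRs equal $\log 4$.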
Since all symbol values are equiprobable a priori, the LLR for the i-th bit is computed as in (3).
where $0 < a \le q - 1$, $0 \le i < N_L/m$, $d_i$ is the symbol value corresponding to the m-bit string, the state $t_i$ is the drift at the i-th position with $t_i = N_i - N_d$, and $M(\cdot)$ denotes the intermediate quantity. If bit $x_i$ is not deleted, it appears in the received sequence as $x_{i+t_{i+1}}$. The maximum drift is set to $t_{\max} = 5\sqrt{N_c P_d/(1 - P_d)}$, and the number of states at each time is $T = 2t_{\max} + 1$.
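The trellis size follows directly from $t_{\max}$. The sketch below rounds $t_{\max}$ up to an integer. This rounding is an assumption, because the text does not say how a non-integer bound is handled.

```python
import math

def trellis_size(n_c, p_d):
    """Return (t_max, T) with t_max = ceil(5*sqrt(n_c*p_d/(1-p_d))) and T = 2*t_max + 1."""
    t_max = math.ceil(5 * math.sqrt(n_c * p_d / (1 - p_d)))  # rounding up is assumed
    return t_max, 2 * t_max + 1
```

For instance, with $N_c = 4002$ (the length of code B) and $P_d = 0.03$, $5\sqrt{4002 \cdot 0.03/0.97} \approx 55.6$, so `trellis_size(4002, 0.03)` returns `(56, 113)`.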
The symbol-level forward quantity, i.e., the probability that the drift $t_i$ equals τ and that the first $((m + \lambda)i + \tau - 1)$ bits output by the channel equal the corresponding received bits, is calculated as follows.
where $\mathbf{y} = (y_{(m+\lambda)i+\tau}, \ldots, y_{(m+\lambda)(i+1)+b-1})$. The branch quantities in (5) and (6) are calculated using the WLD. We first derive the log-likelihood function of the branch quantity under the condition that the marker is known to the receiver.
$$= \exp\left(\mathrm{WLD}(s_i, y_0) + (m + \lambda)\log\big(P_t(1 - P_s)\big)\right), \qquad (9)$$
where $s_i \rightarrow (w_i, d_i)$, i.e., $s_i$ is formed from the marker bits $w_i$ and the data symbol $d_i$.

The symbol-level forward and backward passes on the decoding trellis are sketched in Fig. 4. The figure uses a trellis with T = 5 as an example, and the maximum number of insertions for each state at each bit, $I_{\max}$, is set to 2. Specifically, states $1 \le j \le 5$ are considered. The path j → j → j is the maximum path for the j-th state, which produces the longest received sequence. The path j → j → j is the minimum path for the j-th state, which produces the shortest received sequence. The five-state case generalizes to $T = 2t_{\max} + 1 = 10\sqrt{N_c P_d/(1 - P_d)} + 1$ states. After one symbol-level step, the j-th state at the i-th symbol can reach states $\{j - (m + \lambda), \ldots, j + (m + \lambda) I_{\max}\}$ at the (i + 1)-th symbol.
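The following generic sketch shows how a symbol-level forward recursion can be organized over drift states using the reachable-state range and the received segment $\mathbf{y}$ defined above. Eqs. (5) and (6) are not reproduced in this section, so this is not presented as the paper's exact recursion. The branch probability is passed in as a callable, and `branch_prob_eq9` evaluates the form of Eq. (9) for an already computed WLD value. All function names are illustrative.

```python
import math

def successor_drifts(tau, m, lam, i_max, t_max):
    """Drifts reachable from drift tau after one symbol of m + lam bits.

    Uses the range {tau - (m+lam), ..., tau + (m+lam)*i_max} from the text,
    clipped to the trellis limits [-t_max, t_max].
    """
    n = m + lam
    return range(max(-t_max, tau - n), min(t_max, tau + n * i_max) + 1)

def branch_prob_eq9(wld, m, lam, p_t, p_s):
    """Branch quantity in the form of Eq. (9), given a precomputed WLD value."""
    return math.exp(wld + (m + lam) * math.log(p_t * (1 - p_s)))

def forward_pass(y, n_sym, m, lam, i_max, t_max, prior, branch):
    """Generic symbol-level forward recursion over drift states (illustrative sketch).

    prior[i][a]: a priori probability that data symbol i takes value a (0 <= a < 2**m).
    branch(i, a, seg): probability of received segment seg given symbol i equals a.
    Returns one dict {drift: normalized forward quantity} per symbol boundary.
    """
    n = m + lam
    alphas = [{0: 1.0}]                      # drift is zero at the start of the block
    for i in range(n_sym):
        nxt = {}
        for tau, a_prev in alphas[-1].items():
            if a_prev == 0.0:
                continue
            start = n * i + tau              # first received bit of symbol i
            for b in successor_drifts(tau, m, lam, i_max, t_max):
                end = n * (i + 1) + b        # one past the last received bit
                if start < 0 or end > len(y):
                    continue
                seg = y[start:end]
                total = sum(prior[i][a] * branch(i, a, seg) for a in range(2 ** m))
                nxt[b] = nxt.get(b, 0.0) + a_prev * total
        norm = sum(nxt.values()) or 1.0      # rescale each step to avoid underflow
        alphas.append({b: v / norm for b, v in nxt.items()})
    return alphas
```

In a complete decoder, `branch` would evaluate Eq. (9) with $s_i$ built from the marker bits $w_i$ and the candidate symbol value a. A matching backward pass would then be combined with these forward quantities to obtain symbol posteriors. Per-step normalization leaves the posteriors unchanged and keeps the values in floating-point range.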

IV. SIMULATION RESULTS
In this section, several examples are used to evaluate the performance of the proposed scheme with WLD-based decoding. Table 1 lists the parameters of all outer codes used in the concatenated codes, and Table 2 gives the parameters of the concatenated codes whose performance is reported in this paper. In Table 1, two irregular LDPC codes and one regular LDPC code, with different code lengths and construction methods, were selected as examples. A binary pseudo-random marker vector was created. The parameters are λ = 1, m = 2, $I_{\max} = 5$, and $P_i = P_d$. A log-domain belief-propagation algorithm was used in the LDPC decoder, with the maximum number of iterations set to 20. Since the drift is zero at the start, the forward quantities for the 0-th symbol in the first block are calculated as follows.
The backward quantities in each block are initialized with equal probabilities because the exact block boundary is unknown at the receiver. To demonstrate the efficiency of the proposed decoding scheme, a set of standard concatenated codes using binary LDPC codes was simulated. As shown in Figs. 5-7, the block error rates (BERs) of codes A-C all decrease as the insertion/deletion (I/D) probabilities decrease. Each figure plots the BER as a function of the I/D probabilities for several values of the substitution probability. Increasing $P_s$ degrades the decoding performance gradually.
First, the performance of code A for different values of $P_s$ is evaluated in Fig. 5, which shows that the performance degrades gracefully as the I/D probabilities increase. Next, we simulate the decoding performance of code B and compare it with [16] in Fig. 6. This comparison is not entirely fair, since the inner code considered in [16] is a watermark code, whereas our scheme uses a marker code. To make the comparison as fair as possible, code B in Table 2 is selected according to the parameters of the codes considered in [16], with $N_c = 4002$ and $R_c = 0.5$. Fig. 6 shows a clear improvement in error-correction performance with the proposed scheme. Finally, for code C, Fig. 7 shows three simulation curves with different substitution probabilities; the error-correction performance improves significantly as insertions and deletions become less frequent. These results indicate that the proposed scheme is effective for the given channel model.
Specifically, as shown in Fig. 7, a BER below $10^{-3}$ was achieved for $P_i = P_d = 0.03$ and $P_s = 0.001$. At these noise levels there is an average of 360 synchronization errors and 6 substitution errors per block, which shows that the proposed decoding scheme can correct a large number of synchronization errors.
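As a consistency check, assume that, to first order, the expected numbers of synchronization and substitution errors per block are $N_c(P_i + P_d)$ and $N_c P_s$. Under that assumption, the quoted averages correspond to a block of roughly 6000 transmitted bits. This length is inferred here, since the parameters of code C in Table 2 are not reproduced in this section:

$$N_c \approx \frac{360}{P_i + P_d} = \frac{360}{0.03 + 0.03} = 6000, \qquad N_c P_s \approx 6000 \times 0.001 = 6.$$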

V. CONCLUSION
In this paper, we proposed a decoding framework in which a random marker code is inserted periodically into the information sequence, and we designed a novel symbol-level decoding scheme based on the WLD for correcting insertions, deletions, and substitutions in the received sequence. Simulations demonstrate that the WLD-based forward-backward decoding algorithm can correct a considerable number of insertions and deletions. Moreover, unlike earlier forward-backward decoding algorithms, the presented algorithm can be applied flexibly in systems that use either markers or watermarks. Future work will investigate better encoding/decoding algorithms suited to the given channel model.