Low-Complexity Decoding Algorithms for Distributed Space-Time Coded Regenerative Relay Systems

We examine decoding structures for distributed space-time coded regenerative relay networks. Given the possible demodulation errors at the regenerative relays, we provide a general framework for an error aware decoder, in which the receiver exploits the demodulation error probability of the relays to improve system performance. Because the optimal Maximum Likelihood (ML) decoder has high computational complexity, we also propose two low-complexity decoders, the Max-Log decoder and the Max-Log-Sphere decoder. The computational complexities of these three decoders are also analyzed. Simulation results show that error aware decoders can greatly improve system performance without high system overhead, and that the Max-Log and Max-Log-Sphere decoders can drastically reduce the decoding complexity with negligible performance degradation.


Introduction
Relay-assisted communication is a promising strategy that exploits the spatial diversity available among a collection of distributed single-antenna terminals in both centralized and decentralized wireless networks. Most relay networks use a two-stage relaying strategy. In the first stage, a source transmits and all relays listen; in the second stage, the relays cooperate to forward the source symbols to the destination. Generally speaking, relay functions fall into two types, regenerative and nonregenerative. If the relay processes the received signal, we call it a regenerative relay, as in Decode-and-Forward (DCF) [1] and Demodulation-and-Forward (DMF) [2]. Otherwise, we call it a nonregenerative relay, as in Amplify-and-Forward (AF) [1].
It is well known that the channel between source and relay is unreliable because of fading and noise, so the relay receives an attenuated version of the source signal. The AF relaying scheme amplifies the noise. The DCF scheme, which typically uses a cyclic redundancy check (CRC), causes interruptions whenever the relay detects errors in the received message. The DMF scheme is a tradeoff between AF and DCF in relay processing: the relay always keeps a transmit link from the source, and it detects and possibly decodes the source signal [3]. Moreover, the DCF scheme can be considered a special case of DMF if we treat the null signal as one choice in the modulation constellation. Therefore, in this paper, we take DMF as the object of study for regenerative relay networks. However, the DMF relay has an important disadvantage: errors produced in the relay's Maximum Likelihood demodulation significantly degrade the effective SNR at the destination, which is called error propagation [4]. For distributed space-time coding systems in regenerative relay networks, the degradation is even more drastic [5,6]. In [3], we proposed a threshold-based scheme to minimize error propagation; it is an active mechanism built into the relays, but it incurs a large computational complexity.
In this paper, we investigate the ML decoding structure in which the destination is aware of the error probability at the relays. Since the error probability at a relay is a monotonically decreasing function of the received SNR at that relay, the destination can estimate the error probability through training sequences that are transmitted by the source and amplified by the relay. Meanwhile, each relay also transmits its own training sequence so that the relay-destination channel can be estimated [5,7]. Therefore, error aware distributed space-time decoding is reasonable. After analyzing the conditional likelihood function, we give a general framework for an error aware decoder for regenerative relay networks. Because the proposed ML decoder is composed of multiple likelihood function generators, its computational complexity is too large to be affordable in some cases. Using the max-log approximation, we provide a Max-Log decoder based on the Csiszár-Tusnády algorithm [8]. Moreover, to reduce the complexity further, we also propose a Max-Log-Sphere decoder, which combines the max-log approximation and sphere decoding. In addition, we analyze the complexities of these decoders in terms of the number of elementary operations. Finally, simulations verify the low complexity and improved performance of the proposed decoders.

System Model
We consider a wireless network with $N$ randomly placed relay nodes, indexed by $i = 1, \ldots, N$, one source node S, and one destination node D. Each node is equipped with a single antenna and operates in half-duplex mode. Denote the channel from the source to the $i$th relay as $f_i$ and the channel from the $i$th relay to the destination as $h_i$. Assume that $\{f_i\}$ and $\{h_i\}$ are independent complex Gaussian random variables with zero mean and variances $\delta_{si}^2$ and $\delta_{id}^2$, respectively. Receiver noise is assumed to be complex Gaussian with zero mean and unit variance. We assume a block fading channel model, in which the channel gain stays constant during a time block and changes from block to block [1]. We also assume that the instantaneous channel is unknown to the transmitting node but perfectly known at the receiving node. Assume that the source wishes to send the signal $s = [s_1, s_2, \ldots, s_T]^T$ to the destination, where $s_i \in A$ and $A$ is a finite constellation with average power $1/T$. Here $T$ is the signal block length. Hence, $E\{s^H s\} = 1$. Assume $s$ is in the codebook $S = \{s_1, \ldots, s_L\}$, where $L \geq 2$ is the cardinality of the codebook. For convenience of expression, we take all transmit powers to be unity.
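During the first stage, the source node transmits $s_l$, $l \in \{1, \ldots, L\}$, to all relays, and each relay tries to demodulate the received signal. Denote the demodulated symbol vector at the $i$th relay as $\hat{s}_i$, with $\hat{s}_i \in S$. By (6.5) of [9], if the source node transmits $s_l$, the $i$th relay decodes its received signal as $\hat{s}_i \neq s_l$ with probability

$$P(\hat{s}_i \mid f_i, s_l) = Q\left(\sqrt{\tfrac{1}{2}\,|f_i|^2\,\|s_l - \hat{s}_i\|^2}\right), \quad \hat{s}_i \neq s_l, \qquad (1)$$

where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt$. Obviously, the probability of decoding successfully at the $i$th relay is

$$P(\hat{s}_i = s_l \mid f_i, s_l) = 1 - \sum_{s_j \in S,\, s_j \neq s_l} P(s_j \mid f_i, s_l).$$

At the $i$th relay, the demodulated signal $\hat{s}_i$ is mapped onto a vector $F_i(\hat{s}_i)$ of size $T \times 1$ (this size is not necessary), as processed at one antenna of a colocated space-time coding transmitter. We assume that the map function $F_i$ is invertible. Therefore, there are $L$ possible transmitted vectors to the destination for the $i$th relay, because $\hat{s}_i$ could be any vector in $S$. Herein, we assume that all mapping functions $F_i$, $i = 1, \ldots, N$, are different from each other. Then, all relays transmit the mapped vectors to the destination. At the destination, the received signal is

$$Y = [F_1(\hat{s}_1), \ldots, F_N(\hat{s}_N)]\,H + N, \qquad (2)$$

where $Y = [y_1, \ldots, y_T]^T$ is the received signal, $H = [h_1, \ldots, h_N]^T$ is the relay-to-destination channel vector, and the additive term $N$ is the Gaussian white noise vector. Define a codebook $C = \{C_k\}$, $k = 1, \ldots, L^N$, whose elements $C_k = [F_1(s_{k_1}), \ldots, F_N(s_{k_N})]$ collect one candidate vector $s_{k_i} \in S$ per relay. Thus, if $C_k$ is transmitted, we can express (2) as

$$Y = C_k H + N. \qquad (3)$$

Denote the inverse function of $F_i$ as $F_i^{-1}$. Then, we have

$$P(C_k \mid F, s_l) = \prod_{i=1}^{N} P\left(F_i^{-1}(c_{k,i}) \mid f_i, s_l\right), \qquad (4)$$

where $F = [f_1, \ldots, f_N]$ and $c_{k,i}$ is the $i$th column of $C_k$. By (1), we can derive the exact value of (4). Given a $C_k$, the conditional probability density function of $Y$ is

$$P(Y \mid H, C_k) = \frac{1}{\pi^T} \exp\left(-\|Y - C_k H\|^2\right). \qquad (5)$$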
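As a rough illustration of the relay error model, the following Python sketch evaluates the pairwise demodulation error probabilities of (1) and the joint hypothesis probability of (4). The function names, the toy BPSK codebook, and the channel values are assumptions made for this example only. The probability of correct demodulation is taken as the mass left over after the pairwise error terms (clipped at zero), which is only reasonable when those terms are small.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def relay_transition_probs(f_i, codebook, l):
    """Return [P(s_hat = codebook[j] | f_i, s_l) for every j], following (1)."""
    s_l = codebook[l]
    probs = [0.0] * len(codebook)
    for j, s_j in enumerate(codebook):
        if j != l:
            dist_sq = sum(abs(a - b) ** 2 for a, b in zip(s_l, s_j))
            probs[j] = q_function(math.sqrt(0.5 * abs(f_i) ** 2 * dist_sq))
    # Assumption: correct demodulation takes the leftover mass (clipped at zero).
    probs[l] = max(1.0 - sum(probs), 0.0)
    return probs

def hypothesis_prob(k, f, codebook, l):
    """P(C_k | F, s_l) as in (4); k[i] is the codebook index decided by relay i."""
    return math.prod(relay_transition_probs(f[i], codebook, l)[k[i]]
                     for i in range(len(f)))

# Hypothetical toy setup: T = 2 BPSK symbols (average symbol power 1/T), N = 2 relays.
a = 1.0 / math.sqrt(2.0)
codebook = [(a, a), (a, -a), (-a, a), (-a, -a)]
f = [2.0 + 1.0j, 0.8 - 0.3j]

probs = relay_transition_probs(f[0], codebook, l=0)
p_both_correct = hypothesis_prob((0, 0), f, codebook, l=0)
p_relay2_wrong = hypothesis_prob((0, 1), f, codebook, l=0)
```

Because each $F_i$ is assumed invertible, a hypothesis $C_k$ can be identified with the tuple of codebook indices chosen by the relays, which is why `k` is a plain tuple here.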

Error Aware Maximum Likelihood Decoder
In this section, we provide a general Maximum Likelihood decoder for distributed space-time coded regenerative relay networks. First of all, the destination should know the channel information of the relay network. The channels from the relays to the destination, $h_i$, $i = 1, \ldots, N$, can be estimated through pilot symbols that each relay transmits before data transmission [10]. Herein, we assume that all estimators are perfectly accurate. The effect of estimation error is examined in the simulations. To let the destination know the demodulation error probability of each relay, we propose the following additional channel estimation scheme.
Step 1. The source transmits its pilot symbol to all relays through channels $f_1, \ldots, f_N$. Without demodulating, each relay maps the noisy signal to a vector, as in the scheme proposed in [11].
Step 2. Each relay transmits the vector to the destination, as in Amplify-and-Forward based distributed space-time coding [12].
Step 3. The cascaded channel between source and destination carried by the amplified pilots, that is, $f_i h_i$, $i = 1, \ldots, N$, can be estimated at the destination as in [11].
Therefore, $f_i$ can also be estimated by the above channel estimation scheme. It is difficult to analyze the effect of channel estimation error on error aware decoders, but we simulate it in Section 6. In the following, we assume that there is no channel estimation error so that we can focus on the structures of the error aware decoders. Because the signal vector set $S$ and all mapping functions $\{F_i\}$ are known at the destination as prior knowledge, (4) can be evaluated at the destination.
If the transmitted signal is $s_l$, the likelihood function is

$$P(Y \mid F, H, s_l) = \sum_{k=1}^{L^N} P(Y \mid H, C_k)\, P(C_k \mid F, s_l). \qquad (6)$$

Therefore, the error aware ML decoder is

$$\hat{s} = \arg\max_{s_l \in S} \sum_{k=1}^{L^N} P(Y \mid H, C_k)\, P(C_k \mid F, s_l), \qquad (7)$$

which is derived from (4)-(6). By (5), $P(Y \mid H, C_k)$ is independent of $s_l$, so it can be calculated first from the estimated channel $H$. Then, using the amplified pilot, $P(C_k \mid F, s_l)$ is also obtained. The ML decoder can therefore be built as in Figure 1, which shows the structure of the error aware ML decoder for regenerative distributed space-time coding. Note that there are $L$ adders and $L^N$ likelihood function generators. The complexity of the error aware ML decoder equals that of $L^N$ colocated space-time decoders, which is too large to be affordable if the signal block length, the modulation order, and the number of relays are considerably large. Reference [2] considered a piecewise-linear approximation to solve a similar problem, but in this case it is also too complicated to design an approximation function.
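The decoder structure described above can be sketched by brute force for a very small network. The Python example below sums $P(Y \mid H, C_k)\,P(C_k \mid F, s_l)$ over all relay hypotheses for each candidate $s_l$. All names, the two mapping functions, and the channel values are hypothetical choices for illustration, not the simulation setup of this paper, and the relay error model uses the same leftover-mass assumption for correct demodulation as a simplification.

```python
import itertools
import math

def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def relay_probs(f_i, codebook, l):
    """P(s_hat = codebook[j] | f_i, s_l) for all j; leftover mass goes to j == l."""
    probs = [0.0] * len(codebook)
    for j, s_j in enumerate(codebook):
        if j != l:
            d2 = sum(abs(a - b) ** 2 for a, b in zip(codebook[l], s_j))
            probs[j] = q_function(math.sqrt(0.5 * abs(f_i) ** 2 * d2))
    probs[l] = max(1.0 - sum(probs), 0.0)  # assumption, clipped at zero
    return probs

def relay_signal(k, h, codebook, maps):
    """Noiseless C_k H: relay i sends maps[i](codebook[k[i]]) through h[i]."""
    cols = [maps[i](codebook[k[i]]) for i in range(len(h))]
    return [sum(h[i] * cols[i][t] for i in range(len(h)))
            for t in range(len(cols[0]))]

def error_aware_ml(y, h, f, codebook, maps):
    """Return (l_hat, scores) with scores[l] = sum_k P(Y|H,C_k) P(C_k|F,s_l)."""
    n_relays, n_codes = len(h), len(codebook)
    hypotheses = list(itertools.product(range(n_codes), repeat=n_relays))
    # P(Y | H, C_k) does not depend on s_l, so compute it once per hypothesis.
    lik = {}
    for k in hypotheses:
        c_h = relay_signal(k, h, codebook, maps)
        resid = sum(abs(y_t - c_t) ** 2 for y_t, c_t in zip(y, c_h))
        lik[k] = math.exp(-resid) / math.pi ** len(y)
    scores = []
    for l in range(n_codes):
        p = [relay_probs(f[i], codebook, l) for i in range(n_relays)]
        scores.append(sum(lik[k] * math.prod(p[i][k[i]] for i in range(n_relays))
                          for k in hypotheses))
    l_hat = max(range(n_codes), key=scores.__getitem__)
    return l_hat, scores

# Hypothetical toy setup: T = 2 BPSK symbols, N = 2 relays.
a = 1.0 / math.sqrt(2.0)
codebook = [(a, a), (a, -a), (-a, a), (-a, -a)]
maps = [lambda s: (s[0], s[1]),     # F_1: identity
        lambda s: (-s[1], s[0])]    # F_2: 90-degree rotation of the vector
h = [2.0, 2.0j]
f = [3.0, 0.5]                      # relay 2 has a weak source-relay channel

# Source sends s_0; relay 1 is right, relay 2 wrongly forwards s_1; no receiver noise.
y = relay_signal((0, 1), h, codebook, maps)
l_hat, scores = error_aware_ml(y, h, f, codebook, maps)
```

In this toy case the decoder still selects index 0, because the weak channel `f[1]` makes an error at relay 2 a likely explanation of the received signal. The number of hypotheses grows as $L^N$, so this exhaustive form is only practical for very small $L$ and $N$.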

Optimality of Error Aware Decoder.
To show the optimality of the proposed error aware receiver, we need to analyze the difference in error performance between the error aware decoder and the nonerror aware decoder (the traditional receiver). Unfortunately, it is difficult to derive the exact error performance of either decoder. To illustrate the optimality, we make the following two points (see Figure 2).
(1) Receiver Rule. Since the error aware decoder is based on the ML rule, it is the optimal receiver [10].
(2) Signal Space Description. For clarity, let the source transmit a symbol $s \in A$ with $A = \{1, 2, 3\}$, and assume that 1 is transmitted by the source. Consider a relay network with 2 relays. Then there are $3^2$ possible symbol pairs at the relays.
Case 1 (No error happens at relays). By (6), the error aware decoder is equivalent to the nonerror aware decoder. Therefore, both have the same error performance.
Case 2 (Error happens at relays). In the situation of interest, the impact of noise is very slight; that is, the system SNR is very high, and we need only consider the closest symbols as errors. As a result, the candidates in the error aware decoder are limited to {1, 2} and {2, 1}. The error performance is a Q-function of the distance between the transmitted symbol and the received symbol in the signal space [10]. Therefore, the error performance of the nonerror aware receiver can be expressed as $Q(\sqrt{d_3^2\,\mathrm{SNR}})$. By (6), the error performance of the error aware decoder is $Q(\sqrt{d_1^2\,\mathrm{SNR}})\,Q(\sqrt{d_2^2\,\mathrm{SNR}})$. In the high SNR regime, $Q(x) \approx \exp(-x^2/2)$, so the error performance of the error aware decoder can be approximated as $\exp(-(d_1^2 + d_2^2)\,\mathrm{SNR}/2)$. Denote the angle between $d_1$ and $d_3$ as $\theta$. Because the received signal is so close to {1, 2}, $\cos\theta > 0$. By the law of cosines, we have $\exp(-(d^2 \ldots$, we can obtain the same result. This indicates that the error aware decoder outperforms the nonerror aware decoder.

Low-Complexity Error Aware Decoders
In this section, we introduce two low-complexity error aware decoders by analyzing and simplifying the structure of the ML decoder. The simplification process used here can be extended to more general cases to obtain low-complexity decoders. First, we use the max-log approximation to derive a Max-Log error aware decoder that can work with the Csiszár-Tusnády algorithm. Second, to reduce the complexity further, sphere decoding is combined with the Max-Log decoder, which gives the Max-Log-Sphere decoder.

Error Aware Max-Log Decoder.
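Because $\log(x)$ is a monotonically increasing function, the ML decoder can be rewritten as

$$\hat{s} = \arg\max_{s_l \in S} \log \sum_{k=1}^{L^N} P(Y \mid H, C_k)\, P(C_k \mid F, s_l).$$

According to (5) and the max-log approximation in [13], we derive

$$\hat{s} \approx \arg\max_{s_l \in S} \max_{k} \left\{ -\|Y - C_k H\|^2 + \lambda_{k,l} \right\}, \qquad (10)$$

where $\lambda_{k,l} = \log(P(C_k \mid F, s_l))$.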
We can see that decoding the distributed space-time code becomes a search over a two-dimensional array indexed by $(k, l)$. Intuitively, this decoder also needs $L^N$ ML detectors, like the ML decoder. The only difference is that calculating the likelihood function of each symbol vector does not need cross-computation. However, the double maximization problem can take advantage of the Csiszár-Tusnády algorithm to reduce computation [8]. The set $\{\|Y - C_k H\|^2\}$ is a set of distance measures that is mapped one-to-one to a set of probability distributions, $\{\lambda_{k,l}\}$ is a set of log-probabilities, and $\lambda_{k,l} \leq 0$. Hence (10) can be seen as seeking the vector with the minimum sum distance, namely the distance from $s_l$ to $C_k$ plus the distance from $C_k$ to $Y$. Thus, the Csiszár-Tusnády algorithm converges to the maximum element [8]. We summarize the iterative Max-Log decoder as follows (Figure 3).
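The alternating search over the $(k, l)$ table can be sketched as follows. This is a minimal coordinate-wise maximization in the spirit of the iterative Max-Log decoder, not a full implementation of the Csiszár-Tusnády procedure; the function names and the numbers in the table are hypothetical. On an arbitrary table, such an alternation is only guaranteed to stop at an entry that is maximal within its own row and column, so the sketch also includes an exhaustive scan for comparison.

```python
import math

def max_log_alternating(D, lam, k0=0, max_iter=100):
    """Alternate the two maximizations over the table D[k] + lam[k][l]."""
    n_k, n_l = len(D), len(lam[0])
    k = k0
    # Step 1: best l for the initial k.
    l = max(range(n_l), key=lambda j: D[k] + lam[k][j])
    for _ in range(max_iter):
        # Step 2: fix l and find the best k.
        k_new = max(range(n_k), key=lambda n: D[n] + lam[n][l])
        # Step 1 again: fix the new k and find the best l.
        l_new = max(range(n_l), key=lambda j: D[k_new] + lam[k_new][j])
        if (k_new, l_new) == (k, l):
            break  # no further improvement
        k, l = k_new, l_new
    return k, l, D[k] + lam[k][l]

def max_log_exhaustive(D, lam):
    """Reference answer: scan the whole (k, l) table."""
    return max(((k, l, D[k] + lam[k][l])
                for k in range(len(D)) for l in range(len(lam[0]))),
               key=lambda item: item[2])

# Hypothetical numbers: 3 relay hypotheses C_k and 2 candidate vectors s_l.
D = [-0.2, -5.0, -9.0]                           # D_k = -||Y - C_k H||^2
P = [[0.90, 0.05], [0.08, 0.90], [0.02, 0.05]]   # P[k][l] = P(C_k | F, s_l)
lam = [[math.log(p) for p in row] for row in P]

found = max_log_alternating(D, lam, k0=2)
reference = max_log_exhaustive(D, lam)
```

For this table the alternation started from `k0=2` reaches the same entry as the exhaustive scan after two passes, while evaluating only a few rows and columns instead of the full table.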

Error Aware Max-Log-Sphere Decoder.
If the length of the vector $s$ and the constellation size are sufficiently large, the Max-Log decoder is also difficult to implement. Most of the computation is spent searching the code set $C$, which has cardinality $L^N$. Reducing the decoder complexity therefore depends on how $C$ is searched.
Figure 3: Iterative Max-Log decoder.
(1) Initialization: Set the error probability set $P$. Take any element $C_k \in C$ and compute $D_k = -\|Y - C_k H\|^2$.
(2) Step 1: Find the $l$ that maximizes $D_k + \lambda_{k,l}$ for the chosen $k$, where $\lambda_{k,l} \in P$ is calculated by (4).
(3) Step 2: Fix $l$ and find the $C_n \in C$ that maximizes $D_n + \lambda_{n,l}$.

To state the Max-Log-Sphere decoder, we first find the real-valued equivalent of (3), where $R\{\cdot\}$ and $I\{\cdot\}$ denote the real part and the imaginary part. By (10), the decoder is equivalent to

$$\arg\min_{l}\,\min_{k} \left\{ \|Y - C_k H\|^2 - \lambda_{k,l} \right\}.$$

For a specific $s_l$, the decoding objective is

$$\arg\min_{k} \left\{ \|Y - (H^T \otimes I)\,\mathrm{Vec}\{C_k\}\|^2 - \lambda_{k,l} \right\}, \qquad (13)$$

where $\otimes$ is the Kronecker product and $Y$, $H$, and $C_k$ are taken in their real-valued equivalent forms. Obviously, we can use the sphere decoding method [14,15] to search for the $C_k$ that minimizes (13). Note that the squared noise norm $\|Y - C H\|^2$ is a $\chi^2$ random variable with $2N$ degrees of freedom. We choose the radius $r$ to be a linear function of the variance of the squared noise norm, where the coefficient $\alpha$ is chosen such that, with a high probability $P_{fp}$, we can find a lattice point inside the sphere; this condition is expressed through the Gamma function $\Gamma(N) = \int_0^{\infty} t^{N-1} e^{-t}\,dt$. Note that the radius is chosen based on the noise, not on the channel coefficients. As stated in [14], this has a beneficial effect on the computational complexity.
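The radius choice can be illustrated numerically. The Python sketch below assumes that the noise vector has `n_dim` independent complex Gaussian entries with zero mean and unit variance, so that its squared norm follows an Erlang (Gamma with integer shape) distribution, and it finds by bisection the squared radius that contains the noise with probability $P_{fp}$. The function names and the bisection approach are assumptions for this example; the closed-form parameterization through $\alpha$ used in [14] is not reproduced here.

```python
import math

def prob_inside(radius_sq, n_dim):
    """P(||n||^2 <= radius_sq) for n_dim i.i.d. CN(0, 1) noise entries.

    ||n||^2 is a sum of n_dim unit-mean exponentials, so its CDF is
    1 - exp(-x) * sum_{j < n_dim} x^j / j!  (regularized incomplete Gamma).
    """
    term, acc = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, n_dim):
        term *= radius_sq / j
        acc += term
    return 1.0 - math.exp(-radius_sq) * acc

def radius_for_probability(p_target, n_dim, tol=1e-10):
    """Smallest squared radius with prob_inside >= p_target, found by bisection."""
    lo, hi = 0.0, 1.0
    while prob_inside(hi, n_dim) < p_target:
        hi *= 2.0  # expand until the target probability is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_inside(mid, n_dim) < p_target:
            lo = mid
        else:
            hi = mid
    return hi

# P_fp = 0.99 is the value used in the simulations; the dimensions are examples.
r2_one = radius_for_probability(0.99, n_dim=1)
r2_two = radius_for_probability(0.99, n_dim=2)
r2_four = radius_for_probability(0.99, n_dim=4)
```

For a single complex dimension the result reduces to $-\ln(1 - P_{fp})$, which gives a quick sanity check; the required radius grows with the dimension, which is why the radius is tied to the noise statistics rather than to the channel.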
For convenience of expression, we define $H^T \otimes I = B$ and $\mathrm{Vec}\{C_k\} = X$ with size $4NT \times 1$. Therefore, searching for $X$ is equivalent to searching for $C_k$. Applying the idea of the Fincke-Pohst algorithm (see Algorithm 1), we search for the points $X$ that belong to a geometric body whose radius carries an additional weight $\delta_{k,l,M}$. Herein we allocate the additional weight $\delta_{k,l,m}$ evenly over the $N$ relays. A new necessary condition then follows for the next component. In a similar fashion, one proceeds for $x_{M-2}$, and so on, until all components of the vector $X$ are found. Note that the dominant difference between the Max-Log-Sphere decoder and the sphere decoder proposed in [14] is that the radius varies over all possible code words. In the Max-Log-Sphere decoder, for each $k$, we only check whether $X_k$ meets its own radius. If more than one $X_k$ meets the constraint (17) for $x_m$, we keep these surviving code words and go on to the next code word. If some of these code words cannot meet the new constraint, we drop them. That is to say, for each lattice point we must try all possible radii. For a specific $s_l$, the Max-Log-Sphere decoder can be summarized as in Figure 4. After terminating the decoder algorithm for $s_l$ (see Algorithm 2), select the $C_k$ that achieves the minimum distance to $Y$. Then, after running $L$ Max-Log-Sphere decoders with $l = 1, \ldots, L$, choose the $s_l$ that minimizes the distance to $Y$.
Note that the Max-Log-Sphere decoder needs an estimate of the noise variance at the receiver, whereas the Max-Log decoder, which uses only the Euclidean distance and the error probabilities, is easier to realize. Hence, the choice between the two is a tradeoff between computational complexity and ease of implementation.

Algorithm 2: Max-Log-Sphere decoder for a specific $s_l$ (steps as recovered).
(3) … go to (5), else go to (4).
(4) (Increase k) Set $k = k + 1$. If $k = L^N + 1$, terminate the algorithm; else go to (1).
(5) …, and go to (2).
(6) Solution found for $k$. Save $k$, $X_k$, and the exact distance $d_{k,l}$, and set $k = k + 1$. If $k = L^N + 1$, terminate the algorithm; else go to (1).

Computational Complexity Analysis
In this section, we analyze and compare the computational complexity of the above three decoders. We use the average number of real elementary operations, $C_p$ (including addition, subtraction, multiplication, and division), as the measure of computational complexity.

Complexity of ML Decoder.
By (5), it is easy to see that computing $P(Y \mid H, C_k)$ needs $TN + 5T + 3N + 1$ additions and $4NT + 4T$ multiplications. Similarly, from (4), computing $P(C_k \mid F, s)$ needs $N - 1$ multiplications. Therefore, there are $(L^N - 1) + L^N(TN + 5T + 3N - 1)$ additions and $L^N(4NT + 4T + N - 1)$ multiplications to obtain $P(Y \mid F, H, s_l)$. As a result, it needs

$$C_p = L\left[(L^N - 1) + L^N(5NT + 9T + 4N - 2)\right]$$

operations (additions and multiplications) to perform the ML decoder.
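To give a feel for how quickly this count grows, the following Python sketch adds up the addition and multiplication counts stated above for one likelihood $P(Y \mid F, H, s_l)$ and repeats them for each of the $L$ candidate vectors. Two points are assumptions of this example rather than statements from the text: the factor $L$ for the candidates, and the codebook size $L = |A|^T$ (every length-$T$ vector over the constellation is a codeword). Under these assumptions the four-relay QPSK case evaluates to about $1.44 \times 10^{14}$ operations, in line with the value reported in the complexity comparison.

```python
def ml_operation_count(L, N, T):
    """Elementary operations of the exhaustive error aware ML decoder.

    L: codebook size, N: number of relays, T: block length.
    """
    n_hyp = L ** N  # number of relay hypotheses C_k
    additions = (n_hyp - 1) + n_hyp * (T * N + 5 * T + 3 * N - 1)
    multiplications = n_hyp * (4 * N * T + 4 * T + N - 1)
    # Assumption: the per-likelihood cost is paid once for each of the L candidates.
    return L * (additions + multiplications)

# Assumed codebook sizes: all length-T vectors over the constellation.
bpsk_two_relays = ml_operation_count(L=2 ** 2, N=2, T=2)    # 2876
qpsk_four_relays = ml_operation_count(L=4 ** 4, N=4, T=4)   # 144036023238400
```

The jump from a few thousand operations to roughly $10^{14}$ comes almost entirely from the $L^N$ factor, which is the term the low-complexity decoders are designed to avoid.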

Complexity of Max-Log Decoder.
As stated in the system model, $L \geq 2$. That is to say, the ML decoder has higher complexity than the Max-Log decoder.

Complexity of Max-Log-Sphere Decoder.
The Max-Log-Sphere decoder for a given $s_l$ has $L^N$ radii, but each radius is assigned to searching only one possible $C_k$. According to [14,15], the probability that an arbitrary lattice point $X_k$ belongs to an $m$-dimensional sphere of radius $r_{k,l}$ around the transmitted point $X_t$ is given by an incomplete Gamma function (24), where the lattice points are vectors of the form $[x_1, \ldots, x_M]^T$ and $P_r$ is the relay transmit power, which is assumed to be unity above. The Max-Log-Sphere decoder performs a number of elementary operations per visited point that depends on the dimension $m$. Denoting (24) as $P_{k,l}$, the $C_p$ of the Max-Log-Sphere decoder is expressed in terms of $P_{k,l}$ and this per-point operation count. It is difficult to compare the Max-Log-Sphere decoder with the other decoders analytically, but in the next section we show simulation results that illustrate the differences.

Simulation Results
In this section, we provide simulation results for the proposed error aware decoders. We use the ratio of total transmit power to noise power as the system signal-to-noise ratio (SNR) indicator. Half of the total power is assigned to the source, and the other half is divided equally among all relays. We adopt the distributed linear dispersion code proposed in [12] as the coding scheme for its simplicity, where $F_i(s) = A_i s$ and $A_i$ is a random unitary matrix. For the Max-Log-Sphere decoder, we set $P_{fp} = 0.99$. All other parameters are the same as in the system model. The nonerror aware decoder is

$$\hat{s} = \arg\min_{s \in S} \|Y - C(s) H\|^2,$$

where $C(s) = [A_1 s, A_2 s, \ldots, A_N s]$.
Figure 5 shows the bit error rate (BER) performance of the different decoders when two relays are employed and the signal modulation is BPSK, that is, $T = N = 2$. In the high SNR regime, the error aware decoders achieve a gain of almost 6 dB over the nonerror aware decoder and outperform the AF scheme-based ML decoder by about 3 dB. Thus it is worthwhile to accept the slight system overhead of delivering channel estimates in order to improve system performance. Over the whole SNR range, the Max-Log decoder and the Max-Log-Sphere decoder have nearly the same performance as the ML decoder. Therefore, the degradation caused by the max-log approximation is negligible. On close inspection, the slope of the BER curve decreases. The reason is that hard-decision errors at the relays limit the system performance even when the SNR is high enough. In Figure 6, we also simulate a 4-relay network to show the BER performance of these decoders. Here, $T = N = 4$ and the modulation is QPSK. Similarly, the error aware decoders bring a power gain of about 7 dB over the nonerror aware decoder at 22 dB SNR. In contrast to Figure 5, the error aware decoders achieve a gain of only about 1.5 dB over the AF-based ML decoder. The reason is that higher-order modulation incurs more errors after decoding at the relays and enlarges error propagation, which decreases the possible gain of the error aware decoder. In this case, the differences among the three error aware decoders are even slighter. It is interesting that the slope of the BER curve does not decrease here. That is because more relays bring more error conditions and consume more power, so the SNR threshold at which the slope starts to decrease is larger than in the 2-relay BPSK system. From both figures, we can conclude that error aware decoders improve the system performance efficiently at little system cost.
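For reference, the nonerror aware baseline used in these comparisons can be sketched in a few lines of Python. The decoder simply assumes that every relay forwarded the correct vector and minimizes $\|Y - C(s) H\|^2$ with $C(s) = [A_1 s, \ldots, A_N s]$. The fixed unitary matrices, the channel values, and the small offset added to the received signal are hypothetical choices for this example; the simulations above draw $A_i$ at random.

```python
import math

def mat_vec(A, s):
    """Matrix-vector product for small lists of lists."""
    return [sum(A[r][c] * s[c] for c in range(len(s))) for r in range(len(A))]

def transmit(s, h, A):
    """Noiseless C(s) H when every relay forwards the correct vector s."""
    cols = [mat_vec(A_i, s) for A_i in A]
    return [sum(h_i * col[t] for h_i, col in zip(h, cols))
            for t in range(len(s))]

def nonerror_aware_decode(y, h, codebook, A):
    """Return (l_hat, distance) minimizing ||Y - C(s_l) H||^2 over the codebook."""
    best_l, best_dist = None, float("inf")
    for l, s in enumerate(codebook):
        c_h = transmit(s, h, A)
        dist = sum(abs(y_t - c_t) ** 2 for y_t, c_t in zip(y, c_h))
        if dist < best_dist:
            best_l, best_dist = l, dist
    return best_l, best_dist

# Hypothetical toy setup: T = N = 2, BPSK, fixed unitary dispersion matrices.
a = 1.0 / math.sqrt(2.0)
codebook = [(a, a), (a, -a), (-a, a), (-a, -a)]
A = [[[1, 0], [0, 1]],      # A_1: identity
     [[0, -1], [1, 0]]]     # A_2: 90-degree rotation
h = [1.0 + 0.5j, -0.7 + 1.2j]

# Both relays forward s_2 correctly; a small fixed offset stands in for noise.
y = [v + 0.05 for v in transmit(codebook[2], h, A)]
l_hat, dist = nonerror_aware_decode(y, h, codebook, A)
```

This decoder searches only $L$ candidates, which is why it is cheap, but it has no way to account for a relay that forwards a wrong vector.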

Performance Comparison with Ideal Receivers.
For distributed space-time coded (DSTC) relay networks, [12] proved that the maximum achievable diversity order is $\min\{N, T\}$. Reference [16] showed that the demodulate-and-forward scheme in a relay network with a direct link can achieve only half of the maximum diversity. In our simulations, the relay network with the nonerror aware decoder has even less diversity: the diversity of the nonerror aware decoder is 1 in Figure 5 and only 1.2 in Figure 6. That is because there is no direct link in our model, and a direct link does not produce demodulation errors. Adding a direct link can increase the system diversity by one, but adding one more relay brings no advantage and can even make things worse. Essentially, demodulation errors limit the growth of diversity. On the other hand, the error aware decoder has a larger diversity than the nonerror aware decoder. That is to say, the error aware decoder benefits from the available error probabilities.
Since there are still demodulation errors at the relays, the error aware decoder cannot achieve full diversity at finite SNR.

Performance Comparison with Practical Receiver.
In order to validate the practical performance of the proposed error aware decoders, we also consider a practical receiver at the destination, where channel state information is generated by a channel estimator. This means that the channel state information is imperfect and contains estimation error. We set the transmit power of the pilot symbols used for channel estimation equal to the transmit power of the data symbols. The channel estimation procedure follows the 3-step scheme described in Section 3. The channel estimators are built on the minimum mean square error (MMSE) rule [11]. The performance degradation of the low-complexity decoders is caused by the reduced search over the codebook, and both Figures 5 and 6 show that the low-complexity decoders achieve performance similar to that of the error aware ML decoder. Therefore, in the following simulations, we plot only the error aware ML decoder, rather than all three error aware decoders, for comparison with the other schemes. Figures 7 and 8 give the BER performance of the different decoders with practical receivers, where channel state information is imperfect. Clearly, the channel estimation error does not change the performance ordering among the nonerror aware decoder, the AF-based ML decoder, and the error aware ML decoder. Comparing Figures 5 and 7, we can see that the performance gain of the error aware decoder over the AF-based ML decoder decreases from 3 dB to 2.5 dB. Comparing Figures 6 and 8, we also find that the gain of the error aware decoder over the nonerror aware decoder decreases from 6 dB to 5 dB. That is to say, the uncertainty of the channel state information does degrade the performance of the proposed error aware decoder, but the degradation is limited. The error aware decoder still outperforms the nonerror aware decoder. In summary, the proposed error aware decoder works well in practical receivers.
International Journal of Distributed Sensor Networks

Complexity Comparison.
We now compare the computational complexity of the three error aware decoders in terms of the number of elementary operations. Note that the operation count of the Max-Log-Sphere decoder varies with the unitary matrices $A_i$ and the channel realization because of (13). We average the elementary operation count over 1000 channel realizations.
In Figure 9, we show the average operation count of the three decoders when 2 relays are employed. Obviously, the $C_p$ values of the ML decoder and the Max-Log decoder are independent of SNR. The Max-Log decoder has a lower complexity than the ML decoder, and the Max-Log-Sphere decoder needs far fewer operations than either of them. Of course, for the 2-relay network, the operation count of the ML decoder is trivial compared with current hardware computing rates. However, for 4 relays with QPSK modulation, it is too large to be affordable. Figure 10 gives the elementary operation count in this case. The ML decoder needs $1.4404 \times 10^{14}$ operations. The operation count of the Max-Log decoder is nearly 1% of that of the ML decoder. Notably, the Max-Log-Sphere decoder needs only 0.1% of the operations of the Max-Log decoder. Therefore, the Max-Log-Sphere decoder achieves nearly the same BER performance as the optimal ML decoder at a drastically lower computational cost. Although the Max-Log-Sphere decoder has attractive performance, the noise variance must be estimated first in order to calculate the search radius [14]. The Max-Log decoder uses only the Euclidean distance and the error probabilities; therefore, it is a good tradeoff between ease of implementation and computational complexity. Either decoder can be chosen depending on the receiver.

Conclusion
In this paper, we provide a general framework for error aware distributed space-time decoding in regenerative relay networks. Through two-stage pilot symbols, the destination can estimate not only the relay-destination channel but also the error probability at the relays. Using these estimated error probabilities, a Maximum Likelihood decoder is provided. To reduce computational complexity, the Max-Log decoder and the Max-Log-Sphere decoder are also proposed, both based on the max-log approximation. Simulations show that error aware decoders can improve performance drastically. The Max-Log-Sphere decoder achieves nearly the same performance as the ML decoder with far lower computational complexity. Without requiring noise estimation, the Max-Log decoder offers a good tradeoff between ease of implementation and computational complexity.