Advanced Receiver Design for Quadrature OFDMA Systems

Quadrature orthogonal frequency division multiple access (Q-OFDMA) systems have recently been proposed to reduce the peak-to-average power ratio (PAPR) and complexity, and to improve carrier frequency offset (CFO) robustness and frequency diversity relative to conventional OFDMA systems. However, the Q-OFDMA receiver obtains frequency diversity at the cost of noise enhancement, so that Q-OFDMA systems outperform OFDMA only in the higher signal-to-noise ratio (SNR) range. In this paper, we investigate various detection techniques for Q-OFDMA systems, namely linear zero forcing (ZF) equalization, minimum mean square error (MMSE) equalization, decision feedback equalization (DFE), and turbo joint channel estimation and detection, to mitigate the noise enhancement effect and improve the bit error ratio (BER) performance. It is shown that advanced detection schemes, for example, the DFE and the turbo receiver, can significantly improve the performance of Q-OFDMA.


Introduction
Future broadband wireless communication systems require high-data-rate transmission over severe multipath wireless channels. As an effective antimultipath multiple access scheme, orthogonal frequency division multiple access (OFDMA) is endorsed by leading standards such as HIPERLAN/2, IEEE 802.11, and IEEE 802.16, and is used for the downlink in 3GPP long-term evolution (LTE). Nevertheless, to support access by a large number of users, the number of subcarriers, N, in OFDMA systems is usually very large, which provides flexibility and high spectrum efficiency at the expense of high complexity, severe PAPR, and sensitivity to CFO in general. Alternatively, single-carrier transmission with cyclic prefix (CP) is a closely related transmission scheme, which significantly reduces PAPR and CFO sensitivity while retaining the same multipath interference mitigation property as OFDM [1,2]. As an extension of single-carrier frequency domain equalization (SC-FDE) [2] to accommodate multiuser access, single-carrier frequency division multiple access (SC-FDMA) [3] is adopted as the uplink multiple access scheme in 3GPP LTE. However, the noise enhancement and higher complexity introduced by discrete Fourier transform (DFT) spreading and inverse DFT (IDFT) despreading limit the applications of SC-FDMA. More importantly, from the viewpoint of the user equipment (UE), the usable and permitted resource blocks of subcarriers are limited; therefore, the complete FFT/IFFT computation for OFDMA and SC-FDMA demodulation is not necessary, especially under the low-power constraints of battery-driven handsets.
Quadrature OFDMA (Q-OFDMA) systems [4] overcome the aforementioned problems with improved performance and reduced complexity. Based on the concept of the layered fast Fourier transform (FFT) structure [4], an intermediate domain is introduced, and a Q-OFDMA system uses multiple small-size inverse FFTs (IFFTs) in the transmitter, which results in a loss of subcarrier orthogonality. At the receiver, the orthogonality is recovered by FFT operations.
In terms of minimizing the bit error ratio (BER), the optimum maximum likelihood (ML) detector [5] is able to utilize both the diversity and the coding gain furnished by frequency-selective fading channels. However, in most practical systems, linear equalizers (LEs) [5][6][7] and decision feedback equalizers (DFEs) [5][6][7][8][9] are used instead for complexity reasons. Turbo equalization [10][11][12][13] has been extensively studied for the case where the signal-to-noise ratio (SNR) and channel impulse response (CIR) are precisely known to the receiver. When such information is not available, or is time varying and thus needs to be tracked, the channel must be estimated. The methods in [14][15][16] attempt to perform estimation and equalization jointly, which improves the system performance at the cost of intractable complexity.
The BER performance analysis of Q-OFDMA systems in [17] reveals their essential characteristics. When a linear zero forcing (ZF) equalizer is employed, the value of P sets a tradeoff among noise enhancement, error propagation, and frequency diversity gain. When the SNR is small, Q-OFDMA systems with smaller P have better BER performance, while as the SNR increases, Q-OFDMA systems with larger P become superior. The exact SNR point where one system starts to outperform the other depends on the channel condition and modulation scheme [4]. In the special case of P = 1, the Q-OFDMA system becomes the conventional OFDMA system, which outperforms the Q-OFDMA system (1 < P < N) only in the low SNR range. This limitation can be addressed by utilizing advanced receivers, which is the motivation of this paper. When a linear minimum mean square error (MMSE) equalizer is used, for BPSK modulated signals, the Q-OFDMA system is always better than the OFDMA system with a ZF equalizer (for conventional OFDMA systems, ZF is already the maximum likelihood solution, and the MMSE equalizer cannot achieve better BER performance [18]). Other advanced equalizers, such as decision feedback and iterative equalizers, can efficiently improve the performance of the Q-OFDMA system, with complexity similar to that of linearly equalized OFDMA/SC-FDMA systems.
In this paper, we focus on analyzing the various detection techniques for Q-OFDMA systems, including ZF and MMSE LEs, DFE, and iterative equalization. The rest of this paper is organized as follows. In Section 2, Q-OFDMA system based on the layered FFT structure is presented. We present signal detection and decoding techniques for Q-OFDMA and analyze the performance in Section 3. Finally, we demonstrate the performance of Q-OFDMA systems using various detection techniques by simulations in Section 4.
The following notations will be used throughout the paper. Matrices and vectors are denoted by symbols in bold face, $x_{i,j}$ indicates the $(i,j)$th element of a matrix $\mathbf{X}$, and $x(i)$ indicates the $i$th element of a vector $\mathbf{x}$. $\mathrm{Tr}[\cdot]$ denotes the trace of a matrix, $E[\cdot]$ denotes the expectation, and $|\cdot|$ and $\hat{(\cdot)}$ denote the absolute value and estimated value, respectively. $\circledast$ and $\odot$ denote the circular convolution and element-wise product of two vectors, respectively. $(\cdot)^{-1}$, $(\cdot)^{T}$, and $(\cdot)^{H}$ represent the inverse, transpose, and Hermitian conjugate, respectively.

Q-OFDMA System Model
To compare Q-OFDMA with the well-known OFDMA and SC-FDMA systems, Figure 1 shows the differences among the core baseband modules of the three systems. At the transmitter, each user's data is first encoded, interleaved, and mapped to a certain constellation. Unlike the subchannel in conventional OFDMA systems, which is defined in the one-dimensional frequency domain, subchannels in Q-OFDMA systems are defined over a two-dimensional array in the intermediate domain [4]. This array is P × Q, where both P and Q are powers of 2, and N = PQ is equivalent to the total number of subcarriers in ordinary OFDMA systems. Thanks to the judicious use of the divide-and-conquer approach in the computation of the DFT [5], smaller IFFTs/FFTs are utilized in the transmitter/receiver of Q-OFDMA, which results in reduced complexity and PAPR.
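Given three $N$-point time-domain symbols $\mathbf{x}$, $\mathbf{h}$, and their circular convolution output $\mathbf{y} = \mathbf{x} \circledast \mathbf{h}$, their DFTs have the relationship $\tilde{\mathbf{y}} = \sqrt{N}\,\tilde{\mathbf{x}} \odot \tilde{\mathbf{h}}$. If we rearrange the frequency-domain symbols $\tilde{\mathbf{x}}$, $\tilde{\mathbf{h}}$, and $\tilde{\mathbf{y}}$ into $P \times Q$ matrices ($PQ = N$) row-wise according to the layered IFFT structure concept, the vectors $\tilde{\mathbf{x}}_q$, $\tilde{\mathbf{h}}_q$, and $\tilde{\mathbf{y}}_q$ from the $q$th column of the matrices retain the relationship
$$\tilde{\mathbf{y}}_q = \sqrt{N}\,\tilde{\mathbf{x}}_q \odot \tilde{\mathbf{h}}_q.$$
Define the intermediate-domain symbols $\{\bar{\mathbf{x}}_q, \bar{\mathbf{h}}_q, \bar{\mathbf{y}}_q\}$ as the IDFTs of $\{\tilde{\mathbf{x}}_q, \tilde{\mathbf{h}}_q, \tilde{\mathbf{y}}_q\}$, given by
$$\bar{\mathbf{x}}_q = \mathbf{F}_P^H \tilde{\mathbf{x}}_q, \qquad \bar{\mathbf{h}}_q = \mathbf{F}_P^H \tilde{\mathbf{h}}_q, \qquad \bar{\mathbf{y}}_q = \mathbf{F}_P^H \tilde{\mathbf{y}}_q,$$
where $\mathbf{F}_P^H$ is the normalized $P$-point IDFT matrix. According to the convolution property of the DFT, we get $\bar{\mathbf{y}}_q = \sqrt{Q}\,\bar{\mathbf{x}}_q \circledast \bar{\mathbf{h}}_q$, which establishes the relationship of the symbols in the intermediate domain and can be expressed in matrix form as
$$\bar{\mathbf{y}}_q = \bar{\mathbf{H}}_q \bar{\mathbf{x}}_q,$$
where the $P \times P$ circulant matrix $\bar{\mathbf{H}}_q$ represents the dispersive channel, with $[\bar{\mathbf{H}}_q]_{i,j} = \bar{h}(((i - j) \bmod P)Q + q)$, where $\bar{h}(\cdot)$ denotes the channel response in the intermediate domain.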
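The column-wise relationship above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes unitary (normalized) DFTs of both sizes, under which the intermediate-domain scaling works out to $\sqrt{Q}$, and all function names (`dft`, `circ_conv`, `column`, `max_intermediate_error`) are hypothetical helpers introduced only for illustration.

```python
import cmath
import math
import random


def dft(v, inverse=False):
    """Unitary DFT (or IDFT when inverse=True) of a list of complex numbers."""
    n = len(v)
    sign = 1 if inverse else -1
    return [
        sum(v[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        / math.sqrt(n)
        for k in range(n)
    ]


def circ_conv(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def column(v, q, P, Q):
    """q-th column of the P x Q matrix obtained by filling v row-wise."""
    return [v[p * Q + q] for p in range(P)]


def max_intermediate_error(P=4, Q=2, seed=0):
    """Largest deviation between y_q and sqrt(Q) * (x_q circ-conv h_q) over all columns q."""
    rng = random.Random(seed)
    N = P * Q
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    y = circ_conv(x, h)
    xf, hf, yf = dft(x), dft(h), dft(y)  # N-point frequency-domain symbols
    worst = 0.0
    for q in range(Q):
        # Intermediate-domain symbols: P-point unitary IDFT of each column.
        xq = dft(column(xf, q, P, Q), inverse=True)
        hq = dft(column(hf, q, P, Q), inverse=True)
        yq = dft(column(yf, q, P, Q), inverse=True)
        predicted = [math.sqrt(Q) * c for c in circ_conv(xq, hq)]
        worst = max(worst, max(abs(a - b) for a, b in zip(yq, predicted)))
    return worst


if __name__ == "__main__":
    # Should print a value at floating-point rounding level.
    print(max_intermediate_error())
```

The direct $O(N^2)$ DFT is used only to keep the sketch dependency-free; an FFT routine would replace it for realistic sizes.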
At the receiver of the Q-OFDMA system, in order to realize one-tap equalization, the weighted outputs are transformed from the intermediate domain to the frequency domain as
$$\tilde{\mathbf{y}}_q = \mathbf{D}_q \mathbf{F}_P \bar{\mathbf{x}}_q + \tilde{\mathbf{n}}_q, \tag{3}$$
where $\tilde{\mathbf{n}}_q \sim \mathcal{N}(0, N_0)$ are additive white Gaussian noise (AWGN) samples, the symbol energy of the modulation symbols $\bar{\mathbf{x}}_q$ is $E_s$, and
$$\mathbf{D}_q = \mathbf{F}_P \bar{\mathbf{H}}_q \mathbf{F}_P^H \tag{4}$$
indicates the diagonalized channel matrix. This scheme recovers the orthogonality between subcarriers in the frequency domain to allow for simple one-tap equalization, similar to that for conventional OFDMA systems. An interesting observation is that (3) resembles the result obtained in precoded OFDMA systems [18], with precoding matrix $\mathbf{F}_P$. Thus, frequency diversity can be achieved without introducing any precoder-related complexity in the transmitter, and PAPR is reduced as well.

Signal Detection
In this section, we present signal detection techniques, including ZF and MMSE equalizers, the DFE, and the turbo receiver, specifically for Q-OFDMA systems.

Low-Complexity Linear Detections.
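The simplest detection is ZF equalization, in which the subchannel signal $\bar{\mathbf{x}}_q$ is estimated as
$$\hat{\mathbf{x}}_q = \mathbf{F}_P^H \mathbf{D}_q^{-1} \tilde{\mathbf{y}}_q = \bar{\mathbf{x}}_q + \mathbf{F}_P^H \mathbf{D}_q^{-1} \tilde{\mathbf{n}}_q, \tag{5}$$
which leads to the average BER $(P_e)_{\mathrm{ZF}}$ for a Q-OFDMA system with $M$-ary QAM modulation derived in [17]. From the ZF solution and the resulting BER expression, we can see that, similar to single-carrier systems [2], any small channel coefficient $\tilde{h}_{pQ+q}$ leads to noise enhancement and error propagation in a group of $P$ subcarriers. On the other hand, frequency diversity is improved by averaging the channel power over the same group of subcarriers.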
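Another low-complexity alternative, the MMSE equalizer, can efficiently mitigate these problems. Similar to that in conventional OFDMA systems, the MMSE equalizer for Q-OFDMA incurs a marginal increase in complexity by requiring an estimate of the noise variance $\sigma_n^2$, and is given by
$$\hat{\mathbf{x}}_q = \mathbf{F}_P^H \left( \mathbf{D}_q^H \mathbf{D}_q + \frac{1}{\gamma} \mathbf{I} \right)^{-1} \mathbf{D}_q^H \tilde{\mathbf{y}}_q,$$
where $\gamma = E_s/N_0$, and $\mathbf{I}$ is an identity matrix.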
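To illustrate how the two linear detectors differ on a deeply faded subcarrier, the following minimal sketch applies per-subcarrier ZF and MMSE taps followed by a $P$-point IDFT. It assumes the diagonal model $\tilde{\mathbf{y}} = \mathbf{D}\mathbf{F}_P\bar{\mathbf{x}} + \tilde{\mathbf{n}}$ with a unitary DFT; the channel values and the helper names `one_tap` and `equalize` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(v, inverse=False):
    """Unitary DFT (or IDFT when inverse=True) of a list of complex numbers."""
    n = len(v)
    sign = 1 if inverse else -1
    return [
        sum(v[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        / math.sqrt(n)
        for k in range(n)
    ]


def one_tap(d, gamma=None):
    """Per-subcarrier taps: ZF when gamma is None, otherwise MMSE with gamma = Es/N0."""
    if gamma is None:
        return [1 / dk for dk in d]
    return [dk.conjugate() / (abs(dk) ** 2 + 1 / gamma) for dk in d]


def equalize(y, d, gamma=None):
    """Apply the one-tap equalizer in the frequency domain, then the P-point IDFT."""
    taps = one_tap(d, gamma)
    return dft([t * yk for t, yk in zip(taps, y)], inverse=True)


if __name__ == "__main__":
    x = [1, -1, 1, 1]                          # BPSK symbols in the intermediate domain
    d = [1 + 0j, 0.5j, 0.01 + 0j, -0.8 + 0j]   # hypothetical diagonal of D (one deep fade)
    y = [dk * sk for dk, sk in zip(d, dft(x))]  # noise-free y = D F_P x
    print([round(v.real, 6) for v in equalize(y, d)])
    # Tap magnitude on the faded subcarrier: ZF amplifies, MMSE stays bounded.
    print(abs(one_tap(d)[2]), abs(one_tap(d, gamma=10.0)[2]))
```

In the noise-free case the ZF output reproduces the transmitted symbols, while the tap on the weak subcarrier shows why ZF amplifies noise there and the MMSE regularization term $1/\gamma$ limits that gain.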

Decision Feedback Detection.
The class of decision-directed detectors improves the system performance at the cost of complexity. Current DFE techniques can be operated in the time domain [5], in the frequency domain [9], or with a hybrid structure [7,8], where the feedforward filter is realized in the frequency domain while the feedback filter is realized in the time domain. Similar to the time-domain DFE (TD-DFE), the hybrid-domain DFE (HD-DFE) is affected by the precursors of the intersymbol interference (ISI) and by error propagation. Since both the signal processing and the filter design are performed entirely in the frequency domain, the frequency-domain DFE (FD-DFE) requires only a quarter of the complexity of the HD-DFE, whose complexity is in turn half of that of the TD-DFE [9]. Regarding the DFE work presented in this paper, our main contribution lies in extending the general DFE concept to Q-OFDMA systems and testing its performance, rather than in proposing a new DFE structure.
Applied to the signal represented in (3), the block DFE, as shown in Figure 2, can be realized as an HD-DFE or an FD-DFE. In the block FD-DFE shown in Figure 2(b), the feedforward and feedback filters, $\mathbf{A}$ and $\mathbf{B}$, respectively, are chosen to minimize the mean square error (MSE) and whiten the noise at the input of the decision device $T(\cdot)$. Since decisions can only be fed back in a causal fashion, $\mathbf{B}$ is usually chosen to be a strictly upper or lower triangular matrix with zero diagonal entries. The matrices $\mathbf{A}$ and $\mathbf{B}$ are designed according to the MMSE criterion. When $\mathbf{B}$ is chosen to be triangular and the MSE of the block estimate before the decision device is minimized, the feedforward and feedback filters can be expressed as in [19], where we assume that the autocorrelation matrices $\mathbf{R}_x$ and $\mathbf{R}_n$ are known, the factorization in (9a) is obtained using Cholesky decomposition, $\mathbf{U}$ is an upper triangular matrix with unit diagonal, $\boldsymbol{\Lambda}$ is a diagonal matrix, and, for simplicity, the factor $\sqrt{N}$ is absorbed in $\mathbf{D}_q$.

Figure 3: The turbo receiver for Q-OFDMA systems (blocks shown include the demodulator, deinterleaver, and decoder).
Since the DFE takes into account the finite-alphabet property of the information symbols and the decision feedback filter eliminates the intersymbol interference from previously detected symbols, the performance of the DFE is usually better than that of linear detectors, especially at moderate to high SNR values, where decision errors are less likely to propagate.

Turbo Detection with Soft Interference Cancellation.
In this section, as shown in Figure 3, we propose an iterative receiver for joint estimation, equalization, and decoding in Q-OFDMA systems, based on the turbo processing principle. The estimator makes use of training symbols and the soft-decoded data information to track the channel frequency response. The equalizer uses the re-estimated channel to detect the transmitted data iteratively until a satisfactory outcome is obtained. The estimation, equalization, and decoding algorithms can be chosen according to the performance/complexity tradeoff.
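For the $p$th element of $\tilde{\mathbf{y}}$, we rewrite (3) as
$$\tilde{y}(p) = d_p \sum_{k=1}^{P} [\mathbf{F}_P]_{p,k}\, \bar{x}(k) + \tilde{n}(p), \tag{10}$$
where $d_p$ is the $p$th diagonal element of $\mathbf{D}$. From (10), we can see that the precoding matrix $\mathbf{F}_P$ breaks the orthogonal character of $\mathbf{D}$ and introduces ISI, which can be eliminated by the following turbo equalization.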
The equalizer gives the MMSE estimates $\hat{\mathbf{x}}$ of $\bar{\mathbf{x}}$ based on the received signal $\tilde{\mathbf{y}}$ and the a priori information of $\bar{\mathbf{x}}$, that is, $E(\bar{\mathbf{x}})$ and $\mathrm{Cov}(\bar{\mathbf{x}}, \bar{\mathbf{x}})$. After passing through a demapping module, the extrinsic information $L_e^E(d_n)$ for each coded bit is delivered as in (11) [11]. As we can see in Figure 3, the output of the demodulator, $L_E(d_n)$, is defined as the a posteriori log-likelihood ratio (LLR) of the coded bit $d_n$, and the output of the interleaver, $L_D(d_n)$, as the a priori LLR of $d_n$. The extrinsic information $L_e^E(d_n)$ is a function of $\hat{x}(p)$ and the a priori information about the coded bits other than the $n$th bit, that is, $L_D(d_{n'})$, $n' \neq n$, from the previous iteration. For the initial equalization stage, no a priori information is available, and hence we have $L_D(d_n) = 0$, $\forall n$. The extrinsic information $L_e^E(d_n)$, which is independent of $L_D(d_n)$, is deinterleaved and fed into the decoder as its a priori information. Based on the a priori LLR $L_E(c_n)$, the decoder provides the a posteriori LLR of each coded bit, and at the last iteration a hard decision is made on that LLR. Here, the interleaver/deinterleaver module shuffles the coded bits to decorrelate the errors introduced by the decoder/equalizer and to ensure that, locally over several iterations, the $d_n$ are independent and the $L_D(d_n)$ are true a priori information on the $d_n$, which makes iterative error correction possible.
EURASIP Journal on Wireless Communications and Networking

MMSE Criteria. To perform MMSE estimation, we require the statistics $m(p) \triangleq E[\bar{x}(p)]$ and $v(p) \triangleq \mathrm{Cov}[\bar{x}(p), \bar{x}(p)]$ of the symbols $\bar{x}(p)$, which can be computed from the a priori LLRs of the coded bits, $L_D(d_n)$. For simplicity, we assume that BPSK modulation is used in the following analysis. The soft estimates and their variances are defined as [11]
$$m(p) = \tanh\!\left(\frac{L_D(d_p)}{2}\right), \qquad v(p) = 1 - |m(p)|^2.$$
A soft interference cancellation is then performed on $\tilde{\mathbf{y}}$, and the result is fed into a linear MMSE filter $\mathbf{w}_p$ to produce the output $z(p)$, where $\mathbf{w}_p$ is chosen to minimize the MSE between the coded symbol $\bar{x}(p)$ and the filter output $z(p)$, and $\boldsymbol{\varepsilon}_p$ is a column vector whose $P$ elements are all zeros except the $p$th element, which is one. Thus, the MMSE estimate $\hat{x}(p)$ of $\bar{x}(p)$ can be given by [11]
$$\hat{x}(p) = m(p) + z(p).$$
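We apply (19) to (22) to formulate the MMSE estimate, from which the output extrinsic LLR $L_e^E(d_n)$ in (11) of the equalizer is obtained. For the initial iteration, we have $L_D(d_n) = 0$, $\forall n$, so that $m(p) = 0$ and $v(p) = 1$, $\forall p$, and the MMSE solution simplifies to the linear MMSE equalizer
$$\mathbf{w}_p = \left( \sigma_n^2 \mathbf{I} + \mathbf{D}\mathbf{D}^H \right)^{-1} \mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p,$$
with the corresponding MMSE output and LLR following accordingly. To alleviate the high complexity of computing $\mathbf{w}_p$ in every iteration, in the first several iterations we reuse the coefficient vector $\mathbf{w}_p$ from the first iteration to compute $\hat{x}(p)$ and $L_e^E(d_n)$ according to (27). In the following iterations, approximately perfect a priori LLRs, $|L_D(d_n)| \to \infty$, $\forall n$, are available, which leads to $\mathbf{m}_p = (m(1), \ldots, m(p-1), 0, m(p+1), \ldots, m(P))^T$ and $v(p) = 0$, $\forall p$. The filter $\mathbf{w}_p$ is then simplified to
$$\mathbf{w}_p = \left( \sigma_n^2 \mathbf{I} + \mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p (\mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p)^H \right)^{-1} \mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p = \frac{\mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p}{\sigma_n^2 + (\mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p)^H \mathbf{D}\mathbf{F}_P \boldsymbol{\varepsilon}_p}. \tag{28}$$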
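The two limiting cases discussed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the receiver used in the paper: it assumes BPSK with the a priori LLR defined as $\ln[P(+1)/P(-1)]$ (so that the soft symbol is $\tanh(L/2)$, as in standard turbo equalization), a unitary $\mathbf{F}_P$, zero-based indices, and hypothetical helper names. It also checks numerically that the closed form in (28) solves the linear system it is derived from.

```python
import cmath
import math


def soft_bpsk(llr):
    """Mean and variance of a BPSK symbol given its a priori LLR (assumed ln P(+1)/P(-1))."""
    mean = math.tanh(llr / 2.0)
    return mean, 1.0 - mean * mean


def steering(d, p):
    """s = D F_P e_p: p-th column (zero-based) of the unitary DFT matrix scaled by diag(D)."""
    P = len(d)
    return [d[k] * cmath.exp(-2j * math.pi * k * p / P) / math.sqrt(P) for k in range(P)]


def filter_perfect_prior(d, p, noise_var):
    """Closed form w_p = s / (sigma_n^2 + s^H s), for perfect a priori information."""
    s = steering(d, p)
    energy = sum(abs(sk) ** 2 for sk in s)
    return [sk / (noise_var + energy) for sk in s]


def residual(d, p, noise_var):
    """Max |(sigma_n^2 I + s s^H) w - s|; near zero if the closed form is consistent."""
    s = steering(d, p)
    w = filter_perfect_prior(d, p, noise_var)
    proj = sum(sk.conjugate() * wk for sk, wk in zip(s, w))  # scalar s^H w
    return max(abs(noise_var * wk + sk * proj - sk) for sk, wk in zip(s, w))


if __name__ == "__main__":
    print(soft_bpsk(0.0))    # no a priori information: zero mean, unit variance
    print(soft_bpsk(40.0))   # near-perfect a priori information
    d = [1 + 0j, 0.5j, 0.2 + 0.1j, -0.8 + 0j]  # hypothetical diagonal of D
    print(residual(d, p=1, noise_var=0.1))
```

The closed form avoids a $P \times P$ matrix inversion per symbol, which is the motivation for switching to it once the a priori information becomes reliable.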

Turbo Channel Estimation.
As a result of (3), channel estimation can be easily implemented by transmitting carefully chosen training symbols $\bar{\mathbf{x}}_{tr}$ such that each element in $\mathbf{F}_P \bar{\mathbf{x}}_{tr}$ has unit magnitude. However, estimation based on training symbols alone may not be reliable, especially when the channel is time varying and channel tracking is needed. In this section, we propose an iterative channel estimation technique in conjunction with data detection. The idea is to first use the training symbols to perform an initial estimation and then to utilize the soft data information delivered by the decoder in subsequent estimation. In the last iteration, when the decoding information from the decoder becomes reliable, advanced estimators, for example, the maximum likelihood or MMSE estimator, are employed to provide further performance improvement. From (4), we can see that $\mathbf{D}\mathbf{F}_P = \mathbf{F}_P \bar{\mathbf{H}}$, which is a frequency response of the channel. Therefore, we can use $\mathbf{H} = \mathbf{D}\mathbf{F}_P$ as the channel estimate for Q-OFDMA systems. The channel estimation method is summarized in the following steps.
(1) Initial channel estimation. The initial estimate is formed from the training symbols $x_T(p)$; the estimation error $\Delta_T(p)$ is AWGN with zero mean and variance $(\sigma_n^2 + \sigma_{\mathrm{ISI}}^2)$. Once the initial channel estimates are obtained, the detected soft data symbols are obtained by (16) for BPSK modulation.
(2) Iterative channel estimation. In this stage, data-aided LS channel estimation is utilized. Similar to the initial estimation stage, it can be shown that the estimation error $\Delta(p)$ has zero mean and variance $(\sigma_n^2 + \sigma_{\mathrm{ISI}}^2)$.
(3) Final channel estimation. In the last iteration, the decoding information from the decoder becomes very reliable, and the MMSE estimator [5] is able to provide further performance improvement.

Complexity Analysis.
Complexity is defined as the number of complex multiplications required to process each frame. FFT complexity is based on the radix-2 algorithm, for which the computational complexity of an $N$-point FFT/IFFT is $O((N/2)\log_2 N)$. Assume that user $k$ occupies $M$ subchannels in a Q-OFDMA system and, equivalently, $MP$ subcarriers in a conventional OFDMA system. With a linear equalizer, a general OFDMA receiver includes an $N$-point FFT and a one-tap equalizer, and its complexity is $(N/2)\log_2 N + MP$. For an SC-FDMA receiver (refer to Figure 1), an extra $P$-point IFFT is required on top of the OFDMA receiver, so the complexity is $(N/2)\log_2 N + MP + (P/2)\log_2 P$. For a Q-OFDMA system, the receiver includes $P$ $Q$-point FFTs, $M$ $P$-point IFFTs, $M$ $P$-point weighting operators, and $M$ one-tap equalizers. The complexity is $(N/2)\log_2 Q + MP\log_2 P + 2MP$. When the channel changes, the computational complexity of updating the linear ZF/MMSE equalizer is $O(P^3)$ for Q-OFDMA systems and $O(N^3)$ for OFDMA/SC-FDMA systems, where $N$ equals $Q$ ($Q \geq 1$) times $P$. From Table 1, we note that the Q-OFDMA receiver with a linear equalizer requires only about half of the complexity of the OFDMA receiver, whose complexity is similar to that of the SC-FDMA system.
The complexity of decision feedback detection is comparable to that of linear detectors, because the feedforward and feedback filters only involve matrix-vector multiplications. Additionally, the FD-DFE in Figure 2(b) needs an extra $P$-point FFT for the feedback filter, that is, cancellation is performed in the frequency domain. Therefore, the complexity of the Q-OFDMA receiver with FD-DFE is $(N/2)\log_2 Q + 2MP\log_2 P + 3MP$.
The complexity of the turbo receiver mainly comes from the MMSE equalizer, the MAP decoder, and the number of iterations. In each iteration, the MMSE equalizer performs three FFT operations, whose complexity is $O((P/2)\log_2 P)$ for radix-2 algorithms, and four matrix operations, whose complexity is $O(P^2)$. For the MAP decoder, the complexity of the soft output Viterbi algorithm (SOVA) with five iterations is twice that of the Viterbi algorithm, and the ratio becomes three with ten iterations [21]. Compared with the linear equalizer and the DFE, the complexity analysis is far more complicated for joint turbo estimation, equalization, and decoding. Assuming that the channel is fixed and given the MMSE equalizer, the overall complexity of the turbo receiver of the Q-OFDMA system is $(N/2)\log_2 Q + i(4MP + MP\log_2 P) - MP$, which excludes the complexity of the decoder, where $i$ denotes the number of iterations.
In our previous work, we found that larger P leads to more reduction in complexity of Q-OFDMA and lower PAPR at the transmitter, and better CFO robustness [4]. Thus in Q-OFDMA systems with turbo receiver, P should be chosen carefully within system constraints according to the complexity/performance tradeoff.
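The complexity expressions given in this section are easy to tabulate for a candidate parameter set. The sketch below simply evaluates those formulas (counting complex multiplications and excluding the decoder); the function name and the example values N = 1024, P = 16, M = 4 are hypothetical choices for illustration, not the configuration behind Table 1.

```python
import math


def receiver_complexity(N, P, M, iterations=1):
    """Complex multiplications per frame for each receiver, from the section's formulas.

    N: total subcarriers, P: subchannel size (N = P * Q), M: subchannels per user,
    iterations: number of turbo iterations (decoder complexity is excluded).
    """
    Q = N // P
    half_n = N / 2
    return {
        "OFDMA (linear)": half_n * math.log2(N) + M * P,
        "SC-FDMA (linear)": half_n * math.log2(N) + M * P + (P / 2) * math.log2(P),
        "Q-OFDMA (linear)": half_n * math.log2(Q) + M * P * math.log2(P) + 2 * M * P,
        "Q-OFDMA (FD-DFE)": half_n * math.log2(Q) + 2 * M * P * math.log2(P) + 3 * M * P,
        "Q-OFDMA (turbo)": half_n * math.log2(Q)
        + iterations * (4 * M * P + M * P * math.log2(P))
        - M * P,
    }


if __name__ == "__main__":
    for name, cost in receiver_complexity(N=1024, P=16, M=4, iterations=2).items():
        print(f"{name:20s} {cost:8.0f}")
```

Sweeping `P` over the powers of two that divide `N` gives a quick view of how the choice of P shifts the balance between the Q-point FFT stage and the per-subchannel processing.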

Simulations
In this section, we present the BER performance of Q-OFDMA systems with different receivers, including linear ZF and MMSE equalizers, the DFE, and the iterative (turbo) receiver. In OFDMA, subcarriers are first grouped into sets of Q successive subcarriers, and each subchannel occupies one subcarrier with a fixed index in each group. Distributed SC-FDMA is used in the simulation, in which the subcarriers of each user are spread over the entire signal band with a fixed index. For simplicity, system imperfections such as CFO and PAPR distortions are not introduced in the simulation. In each simulation result, the BER is averaged over a number of channel realizations. In coded systems, each user's data is encoded with a rate-1/2 convolutional code, and a rectangular interleaver is applied to the coded bits before modulation. SOVA is used for decoding. The initial channel coefficients are estimated by a matched filter scheme over two consecutive training symbols. Two types of channel models are simulated to compare system performance. One is the CM2 channel model from IEEE 802.15.3a, which is a dense non-line-of-sight multipath model with tens of significant taps. The other is the SUI3 channel model from IEEE 802.16, which is a sparse channel model with only a few taps and a small normalized delay spread. In either case, the length of the guard interval is set to 64, and any channel impulse response longer than 64 taps is truncated to 64 taps to avoid ISI. Figure 4 presents an uncoded case to illustrate a few key points about the system comparison under the CM2 channel model. All of the MMSE-equalized systems use 16-QAM modulation. The parameter N is fixed at 1024, with 16 users sharing the band and 64 subcarriers per user in all three systems. It can be noticed that when the SNR is small, noise enhancement dominates the system performance and Q-OFDMA is inferior to conventional OFDMA systems; as the SNR increases, the noise enhancement effect is relatively suppressed and the diversity improvement makes Q-OFDMA superior.
It also shows that the OFDMA performance is generally better than that of SC-FDMA with the linear MMSE receiver. We depict the simulation results in Figure 5 for uncoded systems with BPSK modulation under the CM2 channel model. From the figure, we can see that the linear MMSE equalizer can significantly improve the performance of Q-OFDMA systems by suppressing the noise enhancement effect, whereas for general OFDMA systems, it is known that the MMSE equalizer has almost the same performance as the ZF equalizer. Figure 6 shows the system performance with QPSK modulation under the CM2 channel model. From the figure, we can see that DFE detection further reduces the effect of noise enhancement and improves the system performance compared with linear detectors. The proposed iterative (turbo) receiver performs better than Q-OFDMA systems with linear and decision feedback detectors. The Q-OFDMA system with 2 iterations reaches a BER of $10^{-4}$ at an SNR of 17 dB, which is about 2 dB lower than that required by MMSE-equalized Q-OFDMA without iterative processing, and Q-OFDMA systems with more iterations achieve better performance. Figure 7 shows the BER performance for systems with 64-QAM modulation under the SUI3 channel model. The subcarriers are highly correlated owing to the very limited number of multipath components. In this case, the influence of frequency diversity is weakened, while the noise propagation in Q-OFDMA systems is more pronounced. However, the BER performance of Q-OFDMA systems with different numbers of iterations follows a trend similar to that in Figure 6.

Conclusions
In this paper, we analyze linear, decision-directed, and iterative (turbo) detection for Q-OFDMA systems to mitigate the noise enhancement effect and improve the BER performance. Furthermore, a dedicated turbo equalizer in conjunction with channel estimation for Q-OFDMA systems is proposed and evaluated. The estimation, equalization, and decoding algorithms can be chosen according to the performance/complexity tradeoff. From simulations on wireless dispersive channels, we have shown that Q-OFDMA with FD-DFE achieves improved performance. Since both the signal processing and the filter design are performed entirely in the frequency domain, the complexity of FD-DFE Q-OFDMA is similar to that of linearly equalized Q-OFDMA systems. Moreover, by reducing the interference and noise enhancement effect and increasing the reliability of the detected data, the iterative receiver for joint estimation, equalization, and decoding significantly improves the performance of the Q-OFDMA system, with complexity similar to that of linearly equalized OFDMA/SC-FDMA systems.