EURASIP Journal on Applied Signal Processing 2005:6, 872–882
© 2005 Hindawi Publishing Corporation

Iterative PDF Estimation-Based Multiuser Diversity Detection and Channel Estimation with Unknown Interference

The equivalent diversity order of a multiuser detector employing multiple receive antennas and minimum mean squared error (MMSE) processing for frequency-selective channels is decreased if it aims at suppressing unknown cochannel interference (UCCI) while detecting multiple users' signals. This is an unavoidable consequence of linear processing at the receiver. In this paper, we propose a new multiuser signal detection scheme that aims to preserve the detector's diversity order by taking into account the structure of the UCCI. We use the fact that the structure of the UCCI appears in the probability density function (PDF) of the UCCI plus noise, which can be characterized as multimodal Gaussian. A receiver based on kernel-smoothing PDF estimation is derived. The PDF estimation can be based on training symbols only (noniterative PDF estimation) or on training symbols as well as feedback from the decoder (iterative PDF estimation). It is verified through simulations that the proposed receiver significantly outperforms the conventional receiver based on covariance estimation in channels with low frequency selectivity. The iterative PDF estimation significantly outperforms the noniterative PDF estimation-based receiver with minor training overhead.


INTRODUCTION
The scarcity of frequency resources and the fact that the frequency spectrum has to be shared by multiple users in future wireless communication systems impose the need for bandwidth-efficient transceiver schemes. A large body of research has addressed the development of different techniques for multiple access, the most important examples of which are frequency-division multiple access (FDMA), time-division multiple access (TDMA), and code-division multiple access (CDMA).
The advances in the area of communications using multiple receive antennas have opened a completely new dimension for combating interference, called space-division multiple access (SDMA) [1,2]. The SDMA concept can be applied to any of the existing multiple-access schemes to further improve the system capacity both in terms of the number of supported users and in terms of supported data rates. Moreover, SDMA can be seen as bandwidth efficient by analogy to CDMA, where the orthogonality between users is maintained by their unique spatial signatures instead of unique spreading waveforms [3]. This, at least in terms of baseband signal processing, offers new possibilities of using large preexisting knowledge of CDMA.
An example of the area where a large experience is present in the research community is multiuser detection for CDMA [4]. It is well known that the maximum-likelihood sequence estimation (MLSE) technique achieves the best performance when detecting the multiple users' transmitted signals. However, its computational complexity, which increases exponentially with the number of users and memory length of the channel, is prohibitive for a practical use. Therefore, a significant amount of research has been conducted to develop suboptimal multiuser receivers [4]. In coded systems, the complexity of the optimal receiver further increases due to the fact that joint trellis diagram of all the users, their multipath channels, and their channel codes has to be taken into account [5]. In [6], low-complexity receivers that separately perform detection and decoding stages are proposed.
A breakthrough in the development of the suboptimal low-complexity receiver schemes has been initiated by the discovery of turbo codes [7]. The principle of turbo processing [8] has been shown to be extremely powerful in solving the computational complexity problem of the optimal receiver structures. It is based on the concept that different parts of the receiver perform locally optimal signal processing, conditioned on the processing results of the other receiver blocks. By iteratively performing such a processing, near-globally optimal performance can be obtained in various cases. Examples are joint equalization and decoding [9], joint multiuser detection and decoding in CDMA [10,11,12], and joint MIMO multiuser detection, equalization, channel estimation, and decoding [13]. The common structure of these receivers consists of the maximum a posteriori (MAP) block for multiuser detection and equalization, and a set of soft-input soft-output (SISO) channel decoders, which are separated by interleavers [10,11]. In [10] and [13], a low-complexity implementation of the iterative receivers is also proposed. It is based on soft interference cancellation and linear minimum mean squared error (SC/MMSE) filtering.
The SC/MMSE equalizer is robust against unknown cochannel interference (UCCI) if the covariance matrix of the UCCI is properly estimated and taken into account in the MMSE filtering [1,13]. Subspace estimation and projection [14,15] is another UCCI cancellation technique. However, to suppress UCCI, both methods unavoidably consume the degrees of freedom (DOFs) provided by spatio-temporal processing in the receiver. This is a consequence of linear signal processing at the receiver, which does not take into account the actual structure of the UCCI, and it results in a decrease of the overall diversity order of the receiver [15,16,17]. The loss of diversity is more severe in channels with low frequency selectivity due to the lack of multipath diversity.
If the signal constellation of the UCCI is known at the receiver, the channel of UCCI can be estimated, and the diversity loss of the linear MMSE receiver can be completely recovered by means of joint maximum-likelihood (ML) detection of the desired users and UCCI [17]. This in turn would require either blind channel estimation methods to estimate the channels of UCCIs or blind source separation techniques [18] to estimate jointly the channels and data sequences. However, if the UCCI's channels do not change significantly over the frame, possible states of the interference can be estimated instead of estimating the channel gains themselves. This fact has been used in Bayesian equalization in the presence of UCCI in [19]. In [20], maximum-likelihood sequence estimation (MLSE) equalization was performed in combination with UCCI-plus-noise PDF estimation to combat the impact of UCCI.
In [21], the authors derived a new receiver that can preserve the diversity gain by estimating the PDF of the UCCI plus noise. The signal processing algorithm shown in [22] is used in the first iteration, and the kernel-based PDF estimation [20,23] is applied in the following iterations. It is shown there that the proposed receiver significantly outperforms the conventional detector of [22] in low frequency-selective channels with a relatively small number of UCCIs. There, however, the receiver is restricted to noniterative PDF estimation, and it was derived only for binary phase-shift-keying (BPSK) modulation. In this paper, we generalize the receiver derivation to the multilevel phase-shift-keying (MPSK) case. Furthermore, an iterative PDF estimation technique using soft feedback is proposed for situations where only short training sequences are available. It is shown that the proposed joint iterative PDF estimation and turbo signal detection technique can significantly improve performance over the noniterative technique in such situations. We restrict the scope of the paper to multilevel PSK modulation, but it is straightforward to extend the concept to quadrature amplitude modulation (QAM). The rest of the paper is organized as follows. Section 2 describes the system model. Sections 3 and 4 present the conventional and the proposed receivers, respectively. Section 5 presents simulation results, and Section 6 concludes the paper.

SYSTEM MODEL
Figure 1 illustrates the system model. Each of the $N + N_I$ users encodes an information sequence $c_n(i)$, $n = 1, \dots, N + N_I$, $i = 1, \dots, n_0 RB$, using a convolutional encoder, with $R$, $B$, and $2^{n_0}$ being the code rate, the frame length in binary symbols, and the number of constellation points in the modulation scheme, respectively. The users indexed by $n = 1, \dots, N$ are the desired users and those indexed by $n = N + 1, \dots, N + N_I$ are UCCI.
The encoded binary sequences $d_n^{(i)}(k)$, $k = 1, \dots, B$, $i = 1, \dots, n_0$, are interleaved and $2^{n_0}$-PSK modulated, resulting in symbol sequences $b_n(k) = \mathcal{M}\{d_n^{(i)}(k),\ i = 1, \dots, n_0\} \in Q$, $k = 1, \dots, B$, where $\mathcal{M}$ is the bit-to-symbol mapping function and $Q = \{\alpha_1, \dots, \alpha_{2^{n_0}}\}$ denotes the signal constellation set of the $2^{n_0}$-PSK modulation. The symbol sequences are preceded by user-specific training sequences of length $T$ symbols. The frame structure is presented in Figure 2. The entire frame is transmitted through a frequency-selective channel with $L$ paths.
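After coherent demodulation in the receiver, the signals from each of the $M$ receive antennas are matched-filtered and sampled at the symbol rate. The sampled received signal at antenna $m$ is given by
\[ r_m(k) = \sum_{l=0}^{L-1} \sum_{n=1}^{N+N_I} h_{m,n}(l)\, b_n(k-l) + v_m(k), \tag{1} \]
where $h_{m,n}(l)$ is a baseband representation of the channel gain between the $n$th user and the $m$th antenna for the $l$th path, and $v_m(k)$ is additive white Gaussian noise (AWGN) with variance $\sigma^2$. After collecting the signals from all the antennas into the space vector $r(k) = [r_1(k), \dots, r_M(k)]^T$, we obtain
\[ r(k) = \sum_{l=0}^{L-1} H(l)\, b(k-l) + \sum_{l=0}^{L-1} H_I(l)\, b_I(k-l) + v(k), \]
where $b(k) = [b_1(k), \dots, b_N(k)]^T$, $b_I(k) = [b_{N+1}(k), \dots, b_{N+N_I}(k)]^T$, $v(k) = [v_1(k), \dots, v_M(k)]^T$, and the matrices $H(l) \in C^{M \times N}$ and $H_I(l) \in C^{M \times N_I}$ collect the gains $h_{m,n}(l)$ of the desired users and of the UCCI, respectively. In order to capture the multipath components, the window of received signal samples spanning the time frame of length $L$ at time instant $k$ is collected into the space-time vector $y(k) \in C^{LM \times 1}$ given by [13]
\[ y(k) = [r^T(k+L-1), \dots, r^T(k)]^T = H u(k) + H_I u_I(k) + n(k), \tag{4} \]
where $u(k) = [b^T(k+L-1), \dots, b^T(k), \dots, b^T(k-L+1)]^T$, $u_I(k)$ is built in the same way from $b_I(k)$, $n(k) = [v^T(k+L-1), \dots, v^T(k)]^T$, and $H \in C^{LM \times N(2L-1)}$ and $H_I \in C^{LM \times N_I(2L-1)}$ are the block-Toeplitz matrices
\[ H = \begin{bmatrix} H(0) & \cdots & H(L-1) & & 0 \\ & \ddots & & \ddots & \\ 0 & & H(0) & \cdots & H(L-1) \end{bmatrix} \]
and $H_I$, which has the same structure with $H(l)$ replaced by $H_I(l)$, with $E\{n(k)n^H(k)\} = \sigma^2 I$.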
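The stacking of the received samples into a space-time vector can be checked numerically. The following is a minimal sketch, assuming paths indexed $l = 0, \dots, L-1$ and the ordering of $u(k)$ from newest to oldest symbol; the function names `block_toeplitz` and `stack_symbols` are hypothetical and not taken from the paper.

```python
import numpy as np

def block_toeplitz(taps):
    """Build the LM x N(2L-1) space-time channel matrix.

    taps: list of L arrays H(l), each of shape (M, N).
    Block row i of the result corresponds to the sample r(k + L - 1 - i).
    """
    L = len(taps)
    M, N = taps[0].shape
    H = np.zeros((L * M, N * (2 * L - 1)), dtype=complex)
    for row in range(L):
        for l in range(L):
            col = row + l  # block column holding b(k + L - 1 - row - l)
            H[row * M:(row + 1) * M, col * N:(col + 1) * N] = taps[l]
    return H

def stack_symbols(b, k, L):
    """Return u(k) = [b(k+L-1); ...; b(k); ...; b(k-L+1)] for b of shape (N, K)."""
    return np.concatenate([b[:, k + L - 1 - j] for j in range(2 * L - 1)])
```

With this ordering, the symbol $b_1(k)$ of user 1 sits at position $(L-1)N + 1$ of $u(k)$, which is the position that the selection vector $e_1$ picks out.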

SC/MMSE multiuser detector
The conventional receiver for MIMO turbo equalization with UCCI is proposed in [22]. We highlight the main points of the receiver for convenience. Assume without loss of generality that user 1 is the user of interest. Let

First iteration
The sample vectors $y(k)$, $k = 1, \dots, T$, corresponding to the training sequences, are first directed to the channel estimator to obtain the estimate $\hat{H}$ of $H$, and then the samples are used to estimate the covariance matrix of the UCCI plus noise using the sample average given by
\[ \hat{R}_{xx} = \frac{1}{T} \sum_{k=1}^{T} \bigl(y(k) - \hat{H}u(k)\bigr)\bigl(y(k) - \hat{H}u(k)\bigr)^H. \]
In order to suppress the known and unknown CCI components as well as the ISI components of the desired signal, a linear filter with weighting vector $w_1(k)$ is applied to the signal $y(k)$, $k = T + 1, \dots, B + T$, so as to satisfy the MMSE criterion
\[ w_1(k) = \arg\min_{w} E\bigl\{ |b_1(k) - w^H y(k)|^2 \bigr\}, \]
resulting in the optimal weighting vector
\[ w_1(k) = \bigl[\hat{H}\hat{H}^H + \hat{R}_{xx}\bigr]^{-1} \hat{H} e_1, \]
where $e_1 \in R^{N(2L-1) \times 1}$ is defined as
\[ e_1 = [0_{1 \times (L-1)N},\ 1,\ 0_{1 \times (LN-1)}]^T. \]
By approximating the error at the MMSE filter output as Gaussian [13], the extrinsic probabilities to be passed to the decoder are calculated as where $z_1(k)$ is the MMSE filter output,
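As an illustration of the first-iteration processing, the sketch below estimates the UCCI-plus-noise covariance from training residuals and computes a linear MMSE weight vector. It assumes unit-energy, uncorrelated symbols, so that the symbol covariance is the identity; the function names and argument layout are hypothetical.

```python
import numpy as np

def ucci_plus_noise_covariance(Y, U, H_hat):
    """Sample-average covariance of the residual y(k) - H_hat u(k).

    Y: (LM, T) received training vectors, U: (N(2L-1), T) known symbol vectors.
    """
    X = Y - H_hat @ U  # residual: UCCI plus noise (plus channel estimation error)
    return (X @ X.conj().T) / Y.shape[1]

def mmse_weights(H_hat, R_xx, col):
    """Weights w = (H H^H + R_xx)^{-1} H e, where e selects column `col` of H_hat.

    Assumption: unit-energy, uncorrelated symbols and no soft feedback yet.
    """
    M_mat = H_hat @ H_hat.conj().T + R_xx
    return np.linalg.solve(M_mat, H_hat[:, col])
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually preferable numerically.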

Subsequent iterations
Let $u(k)$ denote the training sequence or the soft data sequence fed back from the channel decoder: where $p_2\{b_n(k) = \alpha_j\}$ denotes the extrinsic probability obtained by SISO channel decoding (see (34)). Similarly define or where $P_2\{b_n(k) = \alpha_j\}$ denotes the a posteriori probability obtained by the SISO decoding (see (35)). The samples $u(k)$, $k = 1, \dots, T + B$, are first fed to the channel estimator to reestimate the channel $H$, and then the samples are used to update the estimate of the covariance matrix $R_{xx}$: We now denote Soft cancellation of the known CCI components as well as the ISI components of the desired signal is performed, yielding After that, a linear filter with weighting vector $w_1(k)$ is applied to the signal $y_1(k)$ so as to satisfy the MMSE criterion: resulting in the optimal weighting vector where Note that (30) holds only for multilevel PSK. Note further that the total number of DOFs of the iterative linear SC/MMSE receiver after convergence is determined by the product $LM$. This number is decreased by a factor equal to the rank of the matrix $R_{xx}$ while cancelling UCCI.
The extrinsic probabilities to be passed to the decoder are calculated as in (14), where the MMSE filter output z 1 (k) is now defined as

Channel decoder
Each of the single-user channel decoders produces the maximum a posteriori (MAP) probability of each binary symbol $d_1^{(i)}(k)$: where is the deinterleaved a priori information $p_1(d_1^{(i)}(k) = \pm 1)$ obtained from the MMSE detection stage and $p_2(d_1^{(i)}(k) = \pm 1)$ is the decoder extrinsic probability. To obtain $p_1(d_1^{(i)}(k) = \pm 1)$, a symbol-to-bit probability conversion has to be made as follows:
\[ p_1\bigl(d_1^{(i)}(k) = +1\bigr) = \sum_{\beta \in B_{+1}} p_1\{b_1(k) = \beta\}, \]
where $B_{+1} = \{\beta \in Q \mid \beta = \mathcal{M}\{\delta_p,\ p = 1, \dots, n_0\};\ \delta_p \in \{+1, -1\},\ p \neq i,\ \delta_i = +1\}$, and similarly for $p_1(d_1^{(i)}(k) = -1)$. The extrinsic probabilities $p_2(d_1^{(i)}(k) = \pm 1)$ are used to make the conversions from bit-to-symbol extrinsic probabilities, yielding which are delivered to the SC/MMSE receiver through (19). Similarly, the symbol-level a posteriori probabilities are calculated as to be used in (23). After a sufficient number of iterations, when the receiver has converged, the decision on the transmitted binary information symbols $c_n(i)$ is made based on the a posteriori log-likelihood ratios for $c_n(i)$, given as for decision making.
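The two probability conversions described above can be sketched as follows. This is only an illustration: symbols are represented by their bit labels (tuples of $\pm 1$) rather than by constellation points, and the bit-to-symbol direction assumes that the interleaved bits are independent, so that a symbol probability is the product of its bit probabilities. Both function names are hypothetical.

```python
from itertools import product

def symbol_to_bit_probs(sym_probs, n0):
    """Marginalize symbol probabilities onto each of the n0 bits.

    sym_probs: dict mapping a bit label (tuple of +1/-1, length n0) to P(symbol).
    Returns a list of (P(bit = +1), P(bit = -1)) pairs, one per bit position.
    """
    out = []
    for i in range(n0):
        p_plus = sum(p for bits, p in sym_probs.items() if bits[i] == +1)
        p_minus = sum(p for bits, p in sym_probs.items() if bits[i] == -1)
        out.append((p_plus, p_minus))
    return out

def bit_to_symbol_probs(bit_probs):
    """Combine P(bit_i = +1) values into symbol probabilities.

    Assumption: bits are independent, so P(symbol) is a product over its bits.
    """
    out = {}
    for bits in product((+1, -1), repeat=len(bit_probs)):
        p = 1.0
        for i, d in enumerate(bits):
            p *= bit_probs[i] if d == +1 else 1.0 - bit_probs[i]
        out[bits] = p
    return out
```

Under the independence assumption the two conversions are consistent with each other: marginalizing the product distribution returns the original bit probabilities.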
Iterative channel estimation from [24] is applied. The detailed description is reviewed in Appendix B.

Receiver derivation
Unlike the conventional receiver described in Section 3, which uses MMSE detection after soft cancellation, the proposed receiver uses maximum-likelihood (ML) processing, by making use of the actual structure of the UCCI.

First iteration
We rewrite (4) as
\[ y(k) = h_1 b_1(k) + H_{CISI,1}\, u_{CISI,1}(k) + H_I u_I(k) + n(k) = h_1 b_1(k) + x_1(k), \]
where $h_1 = H e_1$, $H_{CISI,1} = H - [0_{LM \times (L-1)N}\ \ h_1\ \ 0_{LM \times (LN-1)}]$, $u_{CISI,1}(k) = u(k) - b_1(k)e_1$, and $x_1(k)$ denotes the total sum of the desired user's intersymbol interference (ISI), known CCI, UCCI, and noise. Since in the first iteration the soft feedback is not available, the ISI and CCI components cannot be cancelled. ML processing requires the PDF of the signal $x_1(k)$, which is multimodal Gaussian, given by
\[ p_{x_1}(x_1(k)) = \frac{1}{2^{D_{tot}}} \sum_{i=1}^{2^{D_{tot}}} \frac{1}{(\pi\sigma^2)^{LM}}\, e^{-(x_1(k) - t_{i,1})^H (x_1(k) - t_{i,1})/\sigma^2}, \tag{38} \]
where $D_{tot} = n_0[(2L - 1)(N + N_I) - N]$, and $t_{i,1}$ depends on the entries of $H_I$ and $H_{CISI,1}$ and on the signal constellation of the UCCI. The number of summation terms in (38) increases exponentially with the number of users $N$, which may be large in a practical system. In that case, the samples $x_1(k)$ become less structured and their PDF becomes more Gaussian-like. To justify the Gaussian approximation, we calculate the Kullback-Leibler (KL) distance (relative entropy) [25] between the true PDF given by (38) and the corresponding Gaussian approximation given by for several values of $N + N_I$. It can be seen from Table 1 that the KL distance decreases as $N + N_I$ increases, which means that the true PDF approaches a Gaussian. Therefore, by adopting the Gaussian assumption for $x_1(k)$, the extrinsic probability to be passed to the first user's SISO decoder can be calculated as where $C_{ML} = 1/(\pi^{LM} \det(R_{x_1x_1}))$. It can also be shown [26] that the MMSE filter given by (11) can be transformed using the matrix inversion lemma, resulting in Note that the estimates of $H$ and $R_{xx}$ are replaced by their true values in (11) to show that (42) holds. In practice, the estimate of $R_{x_1x_1}$ is obtained from (40) by using the estimates $\hat{H}$ and $\hat{R}_{xx}$ defined in Section 3. After incorporating (42) into (15), the extrinsic probability at the output of the MMSE filter, given by (14), can be represented as where This, however, is just the scaled extrinsic information of (41) obtained by using the ML detector.
Since the constants $C_{ML}$ and $C_{MMSE}$ do not have any impact on the receiver performance, in the first iteration the proposed ML receiver is exactly the same as the conventional MMSE receiver presented in Section 3.

Subsequent iterations
Starting from the second iteration, we make use of the soft feedback. Assuming that the soft cancellation in (24) is almost perfect, the ISI components of the desired user and the known CCI components can be cancelled, and the PDF of the signal $\hat{x}(k)$, given in (24), can be written as
\[ p_x(\hat{x}(k)) = \frac{1}{2^D} \sum_{i=1}^{2^D} \frac{1}{(\pi\sigma^2)^{LM}}\, e^{-(\hat{x}(k) - t_i)^H (\hat{x}(k) - t_i)/\sigma^2}, \tag{44} \]
where $D = n_0(2L - 1)N_I$, and $t_i$ depends on the matrix $H_I$ and the signal constellation of the UCCI. Assuming that the number $N_I$ of UCCIs is relatively small, the structure of the UCCI can be exploited by estimating the PDF of UCCI plus noise given by (44) and applying ML filtering. After the estimate $\hat{p}_x(\hat{x}(k))$ of $p_x(\hat{x}(k))$ is obtained, the extrinsic probability to be passed to the first user's SISO decoder can be calculated as the output of the single-user ML detector as

The PDF estimation procedure is as follows. First, the channel is reestimated based on $u_1(k)$, $k = 1, \dots, B + T$, as in Section 3. Then, the samples $\hat{x}(k)$, $k = 1, \dots, T + B$, are used to estimate the UCCI-plus-noise PDF. Note that by using the samples indexed by $k = 1, \dots, B + T$, we perform iterative PDF estimation. In the noniterative PDF estimation, only the first $T$ samples, $\hat{x}(k)$, $k = 1, \dots, T$, corresponding to the training sequence, would be used. In order to perform the PDF estimation, either a parametric [19] or a nonparametric [23] approach can be used. The former estimates the parameters $D$ and $t_i$ based on the samples $\hat{x}(k)$. These estimates are then used in (44). The nonparametric approach, on the other hand, estimates the PDF directly, where each sample $\hat{x}(k)$ contributes to the total estimate through a weighting function. For example, for an arbitrary $a = [a_1, \dots, a_{LM}]^T \in C^{LM \times 1}$, the nonparametric multidimensional kernel-based PDF estimator [23] estimates $p_x(a)$ as
\[ \hat{p}_x(a) = \frac{1}{(T + B)\,\sigma_0^{2LM}} \sum_{k=1}^{T+B} K_1\!\left(\frac{a - \hat{x}(k)}{\sigma_0}\right), \tag{46} \]
where $K_1(a) = (2\pi)^{-LM} e^{-a^H a/2}$ is a Gaussian kernel weighting function and $\sigma_0$ is a smoothing parameter. Although other kernel functions can be used [23], it will be shown that this choice gives an asymptotically unbiased and consistent PDF estimator. The estimation accuracy is controlled by the smoothing parameter $\sigma_0$. A larger value of $\sigma_0$ results in a smoother but less accurate PDF estimate, and vice versa.
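A minimal sketch of the nonparametric kernel estimator may help to make the role of the smoothing parameter concrete. It assumes the standard multivariate kernel-estimator form with a Gaussian kernel over the $2LM$ real dimensions of a complex $LM$-vector; the function name `kernel_pdf` and the array layout are hypothetical.

```python
import numpy as np

def kernel_pdf(a, samples, sigma0):
    """Gaussian-kernel estimate of the PDF at the complex point a.

    a: (LM,) complex vector, samples: (K, LM) complex residual samples.
    Each sample contributes a Gaussian bump of width sigma0 per real dimension.
    """
    K, LM = samples.shape
    d2 = np.sum(np.abs(a - samples) ** 2, axis=1)  # squared distances to samples
    kern = np.exp(-d2 / (2.0 * sigma0 ** 2)) / (2.0 * np.pi) ** LM
    return kern.sum() / (K * sigma0 ** (2 * LM))
```

Because every kernel term integrates to one, the estimate is itself a valid PDF for any positive `sigma0`. The cost of one evaluation grows linearly with the number of samples, which matches the per-symbol complexity discussed later in the paper.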
In order to find the optimal value for $\sigma_0$, one approach is to minimize the mean integrated square error (MISE) [23] between the true PDF and its estimate, as defined by
\[ \mathrm{MISE}(\hat{p}_x) = E\left\{ \int_{R^{2LM}} \bigl(\hat{p}_x(a) - p_x(a)\bigr)^2\, da \right\}, \tag{47} \]
where $da = d\Re\{a_1\}\, d\Im\{a_1\} \cdots d\Re\{a_{LM}\}\, d\Im\{a_{LM}\}$. It is shown in Appendix A that the optimal smoothing parameter $\sigma_{0,opt}$ can be lower bounded as follows: A similar result was obtained in [20] for the univariate case. It is a special case of (48) for $LM = 1$. Furthermore, the estimate of $\sigma_{0,opt}$ satisfies the sufficient conditions for consistency and asymptotic unbiasedness. These conditions are given as $\lim_{(T+B) \to \infty} \sigma_0 = 0$ and $\lim_{(T+B) \to \infty} (T + B)\sigma_0 = \infty$, and they are satisfied if the parameter $k_0 \in R$ is chosen to be $k_0 \geq 1$ [20]. Thereby, the estimator's dependence on $D$ and $t_i$ is reflected only through the constant $k_0$, since $\gamma(LM)$ is independent of these parameters. The bit error rate (BER) performance versus the parameter $k_0$, with different numbers of users and different numbers of multipaths as parameters, is shown in Figure 3. Interestingly, the optimal value of $k_0$ that minimizes BER is shown to be rather insensitive to changes in the other parameters. Moreover, it is shown in [20] that for $LM = 1$, the optimal parameter $k_0$ also does not depend on the signal-to-noise ratio [21]. From Figure 3, it can be seen that $k_0 \approx 2$ is a good choice for a wide range of situations. This indicates that, in practice, knowledge of the parameters $N_I$, $L$, and, correspondingly, $D$ is not needed. If these parameters are known to the receiver, they could be used to access a lookup table in which the optimal values of $k_0$ for different combinations of parameters are stored a priori. The same procedure is performed for the rest of the desired users to obtain the two soft estimates of $b_n(k)$ (based on the extrinsic and the a posteriori probabilities, respectively) for the next iteration.

Symmetrizing
If the UCCI signal constellation is known to the receiver, the symmetry of the constellation set can be utilized to increase the number of samples that can be used for PDF estimation. In the case of $2^{n_0}$-PSK modulation, a $2^{n_0}$-fold increase of the number of samples can be achieved by using the fact that $p(a) = p(a e^{-j2\pi k/2^{n_0}})$, $k = 1, \dots, 2^{n_0} - 1$.
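The symmetrizing step amounts to augmenting the sample set with rotated copies of every sample. A minimal sketch, assuming the samples are stored row-wise in a complex array (the function name `symmetrize` is hypothetical):

```python
import numpy as np

def symmetrize(samples, n0):
    """Augment samples with their rotations by multiples of 2*pi / 2**n0.

    samples: (K, LM) complex array. Returns a (2**n0 * K, LM) array whose first
    K rows are the original samples (rotation index 0).
    """
    m = 2 ** n0
    phases = np.exp(-2j * np.pi * np.arange(m) / m)
    return np.concatenate([samples * ph for ph in phases], axis=0)
```

For BPSK ($n_0 = 1$) this simply appends the negated samples, doubling the number of points available to the PDF estimator.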

Computational complexity
Since (45) contains sums of exponentials, it can be approximated using the Jacobian algorithm [27]. The complexity per symbol of the proposed method is roughly $O\{(T + B)LM\}$ or $O\{TLM\}$, depending on whether soft feedback is used for PDF estimation or not, respectively. The complexity of the conventional SC/MMSE receiver is $O\{L^3M^3\}$.
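Sums of exponentials of this kind are commonly evaluated in the log domain. The sketch below assumes that the "Jacobian algorithm" refers to the Jacobian logarithm, $\ln(e^a + e^b) = \max(a, b) + \ln(1 + e^{-|a-b|})$, applied recursively; this reading and the function names are assumptions, not details given in the paper.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: ln(exp(a) + exp(b)) without overflow or underflow."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def log_sum_exp(values):
    """ln(sum_i exp(values[i])) computed by applying max_star recursively."""
    acc = values[0]
    for v in values[1:]:
        acc = max_star(acc, v)
    return acc
```

Dropping the correction term `log1p(...)` gives the cheaper max-log approximation, at the cost of some accuracy when several exponents are of similar size.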

NUMERICAL EXAMPLES
The performance of the proposed receiver was tested through simulations. Training sequence lengths of $T = 100$, 20, and 10, a data sequence length of $B = 900$, and BPSK modulation were used. The channel gains for each path of each user were assumed to have equal average powers, with Rayleigh-distributed amplitudes. They are constant over each transmitted frame and change independently from frame to frame. The rate $R = 1/2$ convolutional code with the generator polynomials $(5, 7)_8$ and the MAP decoder [10] were used for all MIMO users. User-specific random interleavers were assumed. A lower-complexity least-squares (LS) channel estimation (see [24]) was used, since it is shown in [28] that the more complex MMSE channel estimation (see Appendix B) does not offer significant performance benefits unless the power ratio between the UCCI and the desired signals is very high.
In Figures 4 and 5, BER versus per-antenna E b /N 0 is presented for L = 1 and L = 2 cases, respectively. The noniterative PDF estimation is used in these examples, since long overhead (T = 100) was used. In both cases, the proposed receiver significantly outperforms the conventional one in the case where one or two out of three users are UCCI. This is the consequence of the linear processing of the conventional receiver of [22] that does not take into account the actual structure of the UCCI plus noise. Performance curve when all the users are to be detected is shown for comparison (indicated by "all known").
The performance is closer to the "all known" case for $L = 1$ (frequency-flat fading) than for $L = 2$, and for $N_I = 1$ than for $N_I = 2$. This is because the PDF of (44) becomes more scattered in the $LM$-dimensional space with increased $L$ and $N_I$. It means that fewer samples $\hat{x}$ (out of the $T$ available) effectively contribute to the estimate $\hat{p}_x(a)$ of $p_x(a)$ in (46), which decreases the PDF estimation accuracy. Increasing $M$ with fixed $T$ also reduces the estimation accuracy due to the increased dimensionality of $\hat{x}$ [23]. Its impact can, however, be compensated for in part by (48) with an appropriate choice of $k_0$. In Figure 6, the BER performance of iterative and noniterative PDF estimation is presented. The abbreviations FB, no FB, conv., and prop. stand for the iterative PDF estimation (feedback), noniterative PDF estimation (no feedback), and conventional and proposed receivers, respectively. It can be seen from Figure 6 that the proposed receiver with iterative PDF estimation and a short training sequence can achieve almost the same performance as the noniterative receiver with a long ($T = 100$) training sequence. It should be emphasized that the reduction in training overhead when using iterative PDF estimation is rather significant.

CONCLUSIONS
A kernel smoothing PDF estimation-based receiver was derived to preserve the diversity order of iterative SC/MMSE receivers for multiuser detection in frequency-selective channels in the presence of unknown cochannel interference. The PDF estimation can be based on training symbols only (noniterative PDF estimation) or on training symbols as well as feedback from the decoder (iterative PDF estimation). It was verified through simulations that the proposed receiver significantly outperforms the conventional covariance estimation in channels with low frequency-selectivity, where the degradation is more severe due to the lack of multipath diversity. In higher frequency-selectivity channels, the PDF estimation accuracy will decrease, since the UCCI-plus-noise components will be more scattered in the multidimensional data space. Fortunately, the need for diversity is less stringent therein. The proposed receiver with iterative PDF estimation can significantly outperform both the conventional and noniterative PDF estimation-based receiver with minor training overhead. Moreover, its performance has been shown to be very close to that of noniterative PDF estimation with a long overhead. Thus, the proposed receiver provides significant potential both for bandwidth-efficiency improvement and for system capacity increase in multiuser communications in flat and moderately frequency-selective channels. Potential application areas may be in cellular systems, where there are usually only a few dominant other-cell interferers or high data-rate users, which can be suppressed by the method presented here. The receiver may also serve as a basis for a random access scheme where short bursts are transmitted in an asynchronous mode. Assuming that the collisions are not too numerous, they could be handled by the proposed method.

APPENDICES
A. DERIVATION OF LOWER BOUND ON $\sigma_{0,opt}$
A reasonable approximation of (47) can be obtained by using its Taylor series expansion [23], with which where $\alpha = \int_{R^{2LM}} (\Re\{a_1\})^2 K_1(a)\, da = 1$ and From (A.1), the optimal smoothing parameter $\sigma_{0,opt}$ is found to be In general, the function $\Gamma(p_x)$ is dependent on $D$ and $t_i$, $i = 1, \dots, 2^D$. However, it is shown in [20] for the univariate case that the upper bound on $\Gamma(p_x)$ obtained using Cauchy's inequality is dependent neither on $t_i$ nor on $D$. Adopting the same approach in the sequel, we generalize the upper bound derivation to the multivariate case. We denote With (A.5) the expression for $\Gamma(p_x)$ can be rewritten as where

B. ITERATIVE CHANNEL ESTIMATION
For the purpose of channel estimation, we introduce a system model notation different from that in the main body of the paper, following [24] for convenience. Starting from (1), we collect the received signal samples at the $m$th receive antenna into the vector $q_m \in C^{(T+\Delta+L-1) \times 1}$ given by The elements of the vectors $g_m$ are used to form the matrix $H$.