Transmitter Layering for Multiuser MIMO Systems

Christian Schlegel et al.
EURASIP Journal on Wireless Communications and Networking


INTRODUCTION
Emerging wireless networks are likely to incorporate multiple-input multiple-output (MIMO) antenna systems in order to meet requirements for high transmission capacities and link availability. Since their introduction by Foschini [1], and the prototyping of Bell Labs' BLAST project [2], research into MIMO transmission has exploded, and a large body of work has been published in recent years. While the theory of MIMO transmission [3] and the potential capacity benefits are well understood, efficient transmission systems are still under intensive research. Foschini's cancellation and nulling receiver works well in theory and approaches the capacity of an uncorrelated MIMO channel, but its performance degrades rapidly in reduced-rank channels, as typically happens in outdoor wireless transmission.
Various low-complexity receivers based on linear matrix methods, such as the zero-forcing (ZF) and the minimum mean-square error (MMSE) receivers, have also received much attention. If the MIMO channel rank is limited by the number of transmit antennas, $N_t$, and if there are more than about twice as many uncorrelated receive antennas, $N_r$, it can be shown that linear methods perform very well, and indeed come close to the capacity of the channel as long as $N_t \le N_r/2$ [4]. However, this typically requires significantly more receive antennas than transmit antennas and may thus not be practical for downlink transmissions to mobile terminals, or where multiple MIMO terminals communicate concurrently with a central receiver. In such rank-deficient situations, linear systems quickly fall apart and fail to provide high channel efficiencies.
In this paper, we present a modulation/demodulation method for multiple antenna channels which works particularly well in receiver-rank dominated channels. The method is based on a recently proposed iterative demodulator/decoder for code-division multiple-access (CDMA) [5]. It relies on the use of antenna signature sequences and is computationally efficient, with most of the operations related to memory access and cancellation. The receiver follows a two-stage methodology: a first stage operates as an iterative demodulator using a partition combiner in an iterative cancellation/demodulation process, and a second stage applies conventional individual forward error control (FEC) decoders. We will show that this method can achieve the capacity of rank-reduced MIMO channels in a flexible way.
The paper is organized as follows. Section 2 presents the formal system description and the iterative low-complexity demodulation process. In Section 3, an asymptotic density evolution analysis is presented, leading to an accurate iteration equation describing the signal-to-interference progression of the different data streams in the demodulator. It is shown how higher-order modulations can be accommodated to increase spectral efficiency as more signal-to-noise ratio becomes available. This methodology can approach the capacity of the channel closely using unequal power distributions over different data streams utilizing generalized PAM constellations. A density-evolution-based analysis is used to evaluate achievable spectral efficiencies, and numerical simulation examples are given as supporting evidence. Section 4 discusses extensions to multiuser MIMO networks of transmitters, and Section 5 concludes the paper.

Transmitter
We consider a transmission system using a multiple-input multiple-output (MIMO) system with $N_t$ transmit and $N_r$ receive antennas. The input data stream is divided into $K$ data streams which are subsequently independently encoded using block FEC codes of length $L$. Each of the output binary symbols is mapped into an antipodal signal $\{\pm 1\}$. The block of modulated binary antipodal symbols for stream $k$ is denoted by $(v_{k,0}, v_{k,1}, \ldots, v_{k,L-1})$.
Each symbol $v_{k,l}$ is multiplied by a binary antenna signature sequence $\mathbf{s}_k = (s_{k,0}, s_{k,1}, \ldots, s_{k,N_t-1})^T$ whose entries are selected from the set $\{\pm 1/\sqrt{N_t}\}$ for energy normalization. These unique antenna signatures have the effect of dispersing the data streams over all $N_t$ antennas and creating a balanced load on all transmit RF chains (this dispersion can of course also be created over multiple transmission intervals, increasing the effective $N_t$). This transmitter arrangement is illustrated in Figure 1.
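Each signature sequence is divided into $M$ partitions of $N_t/M$ entries, and the symbol copies carried by different partitions are interleaved separately (see Figure 1). This creates an implicit repetition code of rate $1/M$ for each stream, which, combined with the MIMO channel, forms a concatenated structure suggestive of iterative processing; see Section 2.2.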
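As a small illustration of the signature construction, the following Python sketch draws one random binary signature and cuts it into partitions. It assumes entries of amplitude $1/\sqrt{N_t}$ (which is what unit signature energy requires) and contiguous partitions of $N_t/M$ entries; the function names `make_signature` and `split_partitions` are hypothetical and only serve this example.

```python
import math
import random


def make_signature(n_t, rng):
    """Random binary antenna signature with entries +-1/sqrt(n_t) (assumed scaling)."""
    amp = 1.0 / math.sqrt(n_t)
    return [amp * rng.choice((-1.0, 1.0)) for _ in range(n_t)]


def split_partitions(signature, m):
    """Cut a signature into m contiguous partitions (assumed layout)."""
    n_t = len(signature)
    if n_t % m != 0:
        raise ValueError("n_t must be a multiple of m")
    size = n_t // m
    return [signature[i * size:(i + 1) * size] for i in range(m)]


rng = random.Random(0)
sig = make_signature(8, rng)            # N_t = 8 transmit antennas
parts = split_partitions(sig, 4)        # M = 4 partitions of N_t / M = 2 entries
energy = sum(x * x for x in sig)        # total signature energy
part_energies = [sum(x * x for x in p) for p in parts]
```

Under these assumptions the total signature energy is one and each partition carries a fraction $1/M$ of it, which is the sense in which the partitions act like a rate-$1/M$ repetition of the symbol.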
Different partitions of the same symbol $v_{k,l}$ are transmitted at different time intervals and over different antennas, and therefore both temporal and spatial diversities are generated. This ensures a dispersion of the transmitted information and reduces the correlation between received partitions belonging to the same symbol $v_{k,l}$. Denoting the interleaved symbol of stream $k$, partition $m$, transmitted at time $l$ by $v_{k,m,l}$, the discrete signal transmitted from antenna $n$ at time $l$ is given by

$$x_{n,l} = \sum_{k=1}^{K} \sqrt{P_k}\, s_{k,n}\, v_{k,m,l}, \qquad (1)$$

where $m$ is the partition to which antenna $n$ belongs and $P_k$ is the total power (energy/symbol) of transmit stream $k$, distributed over the $N_t$ transmit antennas. Note that this basic system model describes a number of key scenarios.

(i) For higher-order modulations we group $B$ data streams together to form a $2^B$-ary PAM symbol. This is done as follows: the antenna sequence partitions $\mathbf{s}_{k,m}$ for the data streams forming the combined PAM symbol stream are chosen orthogonal, and the powers are set such that $P_{k,b} \propto 4^b$, $b = 0, \ldots, B-1$. It is easy to see that such a power distribution on binary antipodal signals generates the familiar equispaced PAM symbols. Furthermore, orthogonality of the antenna signature partitions requires that $N_t/M \ge B$, but is otherwise easy to satisfy. It is straightforward to see how the methodology could be extended over multiple transmission times to ease this constraint. Also, this orthogonality among signature sequences of the same PAM data streams is not required to make the receiver work, but it ideally eliminates the intraconstellation interference, while the interstream interference is efficiently controlled by the iterative receiver discussed below.

(ii) This model also accommodates a multiuser scenario where different MIMO terminals transmit to a central receiver (MIMO multiuser). This implies that the $K$ data streams are grouped into $U$ groups, where $U$ is the number of distinct MIMO terminals. The only difference between the basic receiver system discussed below and a MIMO multiuser receiver is that the signal asynchronicity between the different terminals needs to be accounted for at the receiver.
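The claim that binary antipodal streams with powers proportional to $4^b$ add up to equispaced PAM levels can be checked by direct enumeration. The sketch below is only an illustration; the function name `pam_levels` is hypothetical, and the proportionality constant is set to one so that the amplitudes are $\sqrt{4^b} = 2^b$.

```python
import itertools
import math


def pam_levels(num_streams):
    """All amplitudes reachable by summing antipodal streams with power 4**b."""
    amplitudes = [math.sqrt(4 ** b) for b in range(num_streams)]  # 1, 2, 4, ...
    levels = set()
    for signs in itertools.product((-1, 1), repeat=num_streams):
        levels.add(sum(s * a for s, a in zip(signs, amplitudes)))
    return sorted(levels)


levels_4pam = pam_levels(2)   # B = 2 streams
levels_8pam = pam_levels(3)   # B = 3 streams
```

For $B = 3$ the eight sign combinations give the levels $-7, -5, \ldots, 5, 7$, that is, a uniformly spaced 8-PAM constellation with spacing 2.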
In all cases, the signals in (1) are transmitted from the $N_t$ antennas over a channel whose gain matrix at time instant $l$ is $\mathbf{H}_l$, where the channel matrix entries, $h_{r,n,l}$, are the path gains between transmit antenna $n$ and receive antenna $r$ at time $l$. As is customary, we assume for now that these gains are randomly and independently distributed with power $E|h_{r,n,l}|^2 = 1$. It is also assumed that the channel changes slowly with respect to the symbol rate and is known at the receiver. The signal received by the $N_r$ antennas is then

$$\mathbf{y}_l = \mathbf{H}_l \mathbf{x}_l + \boldsymbol{\eta}_l,$$

where $\mathbf{x}_l$ collects the transmitted signals $x_{n,l}$ of (1) and $\boldsymbol{\eta}_l$ is a vector of complex AWGN samples, $\eta_{r,l} \sim \mathcal{CN}(0, 2\sigma^2)$.
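The channel model can be sketched in a few lines of plain Python. This is an illustrative sketch only: it assumes i.i.d. circularly symmetric complex Gaussian (Rayleigh) gains with unit power and complex noise of total variance $2\sigma^2$, and the helper names `complex_gaussian` and `channel_output` are hypothetical.

```python
import math
import random


def complex_gaussian(variance, rng):
    """Zero-mean circular complex Gaussian sample with E|x|^2 = variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def channel_output(x, n_r, sigma2, rng):
    """One channel use y = H x + eta with a freshly drawn Rayleigh channel H."""
    n_t = len(x)
    # Unit-power path gains h[r][n] between transmit antenna n and receive antenna r.
    h = [[complex_gaussian(1.0, rng) for _ in range(n_t)] for _ in range(n_r)]
    # Complex AWGN with variance sigma2 per real dimension (2 * sigma2 in total).
    eta = [complex_gaussian(2.0 * sigma2, rng) for _ in range(n_r)]
    y = [sum(h[r][n] * x[n] for n in range(n_t)) + eta[r] for r in range(n_r)]
    return h, y


rng = random.Random(1)
x = [1.0 / math.sqrt(4)] * 4                 # toy transmit vector for N_t = 4
h, y = channel_output(x, n_r=2, sigma2=0.0, rng=rng)   # noiseless for inspection
```

With `sigma2=0.0` the output reduces to the noiseless product of the channel matrix and the transmit vector, which makes the sketch easy to check by hand.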

Receiver
Detection and decoding are performed by adapting an efficient two-stage scheme initially proposed for multiuser CDMA demodulation [5][6][7]. The first stage is an iterative demodulator based on simple interference cancellation using soft-bit estimation and matched filtering. It aims at delivering soft symbol estimates to the external forward error correction (FEC) decoders for each data stream. The success of the FEC decoders depends on the level of residual noise and interference in each demodulated stream. When the final signal-to-noise ratio produced by the iterative demodulation stage is above the code threshold, the receiver can deliver error-free data. We will quantify the conditions for this to happen below.

Figure 2 depicts the iterative demodulation process. Omitting the time index $l$ for simplicity, the received signal at each antenna $r$ is given by

$$y_r = \sum_{n=0}^{N_t-1} h_{r,n}\, x_n + \eta_r, \qquad (3)$$

for $r \in \{0, 1, \ldots, N_r - 1\}$. To extract partition $v_{k,m}$, we filter the signals $y_r$ with all channel components in $y_r$ which contain the interleaved symbol $v_{k,m}$. The resulting matched filter outputs $z_{k,m,r}$ basically matched-filter combine all $N_t/M$ components of signal partition $v_{k,m}$ that arrive at antenna $r$. Substituting the signals $y_r$, we obtain after some manipulations the decomposition (5) of $z_{k,m,r}$ into a signal term, an interference term $I_{k,m,r}$, and a noise term $\zeta_{k,m,r}$. The interference term $I_{k,m,r}$ consists of two contributions: (i) interference from other data streams, and (ii) interference from the $k$th data stream itself. The variances of these two contributions and of the noise term, given in (8)-(10), are needed in the later development.

After deinterleaving the signal partitions $z_{k,m,r}$ in (5) into $z'_{k,m,r}$, the receiver aggregates signal partitions by summing over all antennas $r$ to obtain the aggregate matched filtered signal for partition $m$ and stream $k$:

$$z_{k,m} = \sum_{r=0}^{N_r-1} z'_{k,m,r}. \qquad (11)$$

Because of the assumption of unit-variance channel gains, $E|h_{r,n}|^2 = 1$, the signal power in (11) of a single partition is $N_r^2 P_k / M^2$. After deinterleaving (5) and aggregating (11) we obtain $M$ partitions $z_{k,m,l}$ for each symbol $v_{k,l}$. Using these we calculate $M$ soft symbol estimates $\tilde{v}_{k,m}$, which are used to remodulate the transmitted signal for interference cancellation in the received signal. In a subsequent iteration, the cancelled signal is then matched filtered again and new soft symbol estimates are derived according to the signal flow of Figure 2.
The remodulated signals for each partition $m$ and receive antenna $r$ at iteration $i$ are formed from the soft symbol estimates $\tilde{v}^{(i)}_{k,m}$, the signature entries, and the known channel gains. They are subsequently used to generate the canceled signal for data stream $k$, which in turn is processed again by the matched filters to produce $z^{(i+1)}_{k,m}$ as in (11). This process is repeated for a number of iterations, after which final soft symbol estimates of the binary symbols $v_{k,l}$ are formed and passed on to $K$ standard FEC decoders.

Soft symbol calculation and error variance
The observations $z^{(i+1)}_{k,m}$ depend explicitly on the soft-symbol estimates $\tilde{v}^{(i)}_{k,m}$ through the filtering and cancellation steps. These soft estimates in turn are computed from the matched filtered partitions $z^{(i)}_{k,m}$ of the previous iteration round, observing the extrinsic information exchange principle [4], as

$$\tilde{v}^{(i)}_{k,m} = \tanh\!\left( \frac{N_r \sqrt{P_k}}{M \sigma_i^2} \sum_{m' \ne m} z^{(i)}_{k,m'} \right), \qquad (15)$$

which is the optimal local minimum-variance estimate of $v_{k,m}$ given that $I_{k,m}$ and $\zeta_{k,m}$ combine into a Gaussian random variable with variance $\sigma_i^2$. This is easily shown to be true under some mild conditions.
The variance of the symbol estimates (15) will be required in the SNR evolution analysis in Section 3. Defining this variance at iteration round $i$ as $\sigma^2_{v,i,k} = E|v_k - \tilde{v}^{(i)}_{k,m}|^2$, and assuming that correlation between partitions is negligible due to sufficiently large interleaving (see Figure 1), it can be calculated by adapting the development in [8] for CDMA as

$$\sigma^2_{v,i,k} = E\!\left[ \left( 1 - \tanh\!\left( \mu + \sqrt{\mu}\, \xi \right) \right)^2 \right], \qquad (16)$$

where $\xi \sim \mathcal{N}(0, 1)$ and $\mu = N_r^2 (M-1) P_k / (M^2 \sigma_i^2)$. The final output signal after $I$ iterations is $z^{(I)}_{k,m}$, which is passed to the error control decoder for stream $k$. The final signal-to-noise/interference ratio of $z^{(I)}_k$ is what primarily matters.
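The symbol-estimate variance can be evaluated numerically once its form is fixed. The sketch below assumes the standard expression for soft tanh estimates of antipodal symbols in Gaussian noise, $E[(1 - \tanh(\mu + \sqrt{\mu}\,\xi))^2]$ with $\xi \sim \mathcal{N}(0,1)$, and approximates the expectation with a plain trapezoidal rule. The function name `symbol_error_variance` and the integration settings are illustrative choices, not taken from the paper.

```python
import math


def symbol_error_variance(mu, half_width=8.0, steps=4000):
    """Approximate E[(1 - tanh(mu + sqrt(mu) * xi))**2] for xi ~ N(0, 1).

    Trapezoidal rule on [-half_width, half_width]; the Gaussian tails
    outside this interval are neglected.
    """
    if mu < 0.0:
        raise ValueError("mu must be non-negative")
    root = math.sqrt(mu)
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        xi = -half_width + i * step
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        pdf = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
        total += weight * pdf * (1.0 - math.tanh(mu + root * xi)) ** 2
    return total * step


values = [symbol_error_variance(mu) for mu in (0.0, 0.5, 1.0, 4.0, 20.0)]
```

The variance equals one when $\mu = 0$ (no information about the symbol) and decays towards zero as $\mu$ grows, which is the behaviour that drives the interference cancellation forward from one iteration to the next.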

Variance evolution
The interference and noise on stream $k$ are given by (8)-(10), giving an effective per-symbol noise/interference variance $\sigma_i^2$ at the partition level in (11) at iteration $i$, bounded as in (17), which is common to all streams. The upper bound in (17) contains the self term for $k$ and $m$, which, however, becomes negligible as $K$ and $M$ grow; see (8) and (9). Using (16) in (17), we obtain (18). With the variable substitution $\tilde{\sigma}_i^2 = \sigma_i^2 M / N_r$, which normalizes the noise power per symbol and antenna, we further obtain the recursion (19). The dynamic system equation (19) is analogous to that occurring in the CDMA case discussed in [8]. We further assume, without loss of generality, that the powers are ordered as $P_1 \le P_2 \le \cdots \le P_K$. Denoting $P_k = P(k)$, and letting $K$ and $M$ become large, we use the continuous approximation (20), in which $T(u) = N_r P(u N_r)$, $u \in [0, \alpha]$, is the received power distribution over all data streams and a nondecreasing function, and in which we have introduced the parameter $\alpha = K/N_r$, called the system aspect ratio.
If the signal-to-noise ratio $T(0)/\tilde{\sigma}_\infty^2$ of the lowest-power data stream $k = 1$ at the output of the demodulator is higher than the target performance threshold $\mu_{\mathrm{FEC}}$ of the FEC code used, all streams can be decoded to target performance. This allows us to derive the convergence condition (21) from (20). In [9] it is shown that the continuous power distribution (22) allows the convergence condition (21) to hold for arbitrary aspect ratios $K/N_r$, as long as the constant $a$ in (22) satisfies $a \ge a_0 = 2 \ln 2$. As shown in [9], this constant does not depend on the system aspect ratio. Thus, for two different ratios $\alpha < \alpha'$ the corresponding power distributions $T(u)$ and $T'(u)$ coincide for $u \le \alpha$, that is, $T(u) = T'(u)$ for $u \in [0, \alpha]$. The importance of this result is that new data streams can always be added at the cost of increased average power, without affecting decodability of the existing streams. Furthermore, in [9] it is shown that the distribution (22) allows such a system to approach the capacity of the Gaussian MAC channel to within less than 1 bit of capacity.
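The qualitative behaviour of such a variance recursion can be explored with a short numerical experiment. The sketch below does not reproduce the paper's equation (19); it assumes a recursion of the generic form $s_{i+1} = \sigma^2 + \frac{1}{N_r} \sum_k T_k\, g(T_k / s_i)$, with $g(\mu) = E[(1 - \tanh(\mu + \sqrt{\mu}\,\xi))^2]$, which is the large-$M$ structure suggested by the text (noise plus residual interference weighted by the symbol-estimate variance). All names (`g`, `evolve`, `received_powers`) are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def g(mu):
    """Trapezoidal estimate of E[(1 - tanh(mu + sqrt(mu) * xi))**2], xi ~ N(0, 1)."""
    half_width, steps = 8.0, 1000
    step = 2.0 * half_width / steps
    root = math.sqrt(mu)
    total = 0.0
    for i in range(steps + 1):
        xi = -half_width + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        pdf = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
        total += weight * pdf * (1.0 - math.tanh(mu + root * xi)) ** 2
    return total * step


def evolve(received_powers, n_r, noise_var, iterations=30):
    """Iterate the assumed recursion s <- noise_var + (1/n_r) * sum_k T_k g(T_k / s)."""
    s = noise_var + sum(received_powers) / n_r   # start: nothing cancelled, g(0) = 1
    history = [s]
    for _ in range(iterations):
        s = noise_var + sum(t * g(t / s) for t in received_powers) / n_r
        history.append(s)
    return history


light = evolve([1.0] * 4, n_r=8, noise_var=0.01)    # aspect ratio 0.5, equal powers
heavy = evolve([1.0] * 24, n_r=8, noise_var=0.01)   # aspect ratio 3.0, equal powers
```

With equal powers and an aspect ratio of 0.5 the variance collapses to the noise floor within a few iterations, whereas at an aspect ratio of 3 it stalls at a value above the power of a single stream. This illustrates, under the stated assumptions, why an unequal power distribution is needed to support large aspect ratios.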

PAM modulation
Note that using the PAM modulation method proposed in Section 2.1 applied to a single MIMO link does not follow the continuous power distribution of (23), but is compatible with that distribution.
In the case of PAM modulations, the variance transfer function (19) takes the form (24), where $K_b$ is the number of data streams at modulation level $b$, and $P_0$ is the total received power of the lowest modulation level $b = 0$. Defining the signal-to-noise/interference ratio for the lowest modulation level as $\mu = P_0 / \tilde{\sigma}_i^2$ and $\alpha_b = K_b / N_r$ as the level-$b$ aspect ratio, we obtain the convergence equation (25), which must hold for $\mu \le \mu_{\mathrm{FEC}}$, where $\mu_{\mathrm{FEC}}$ is the signal-to-noise ratio threshold of the FEC code used. Figure 3 shows the right-hand side of (25) minus 1 for a signal-to-noise ratio $P_0/\sigma^2 = 20$ dB. As long as this difference does not exceed the zero threshold for $\mu < 20$ dB, convergence to full interference cancellation is possible and we say the ratio $\alpha$ is supportable. The maximal achievable spectral efficiencies are 2.08, 2.52, and 3.21 bits/dimension for 2-PAM, 4-PAM, and 8-PAM modulations, respectively.

In Figure 4, we plot the achievable spectral efficiencies using the output signal-to-noise ratio of the iterative demodulator (25), assuming ideal error control decoding following the demodulator, which delivers the highest possible rate per channel. This can be closely approached with appropriate standard error control codes. In this case, the achievable capacities are very close to ideal. A transmitter-layered PAM signal can even exceed the capacity of the same PAM modulation using orthogonal dimensions. This is because the allowable aspect ratios for the correlated channel exceed unity, the number of available orthogonal dimensions. This is most striking for 2-PAM, where the maximum achievable aspect ratio of 2.08 is more than twice the number of orthogonal dimensions. For higher PAM constellations, $\alpha_b \to 1$ rapidly from above, and we achieve a capacity equal to that of orthogonal signaling.

Note: even though the actual signals used will be two-dimensional complex equivalent, we have carried out our analysis normalized per dimension. This is permissible, since the phase offsets between different components of the same signal are phase-corrected by the matched filters, and signals from all interfering partitions affect the receiver analysis only via their interference variance, and therefore random phase offsets are not relevant.

Numerical examples
To support our theoretical findings, a system with $N_t \gg N_r$ was simulated on a Rayleigh fading MIMO channel, a situation in which a linear receiver performs very poorly. The block length used for the simulations was $L = 2000$ with 30-60 iterations, and $N_t = M = 100$, that is, the number of transmit antennas equals the number of partitions. The modulation parameters are as follows: for PAM-16, $K = 28$, $N_r = 8$, $\alpha_b = 0.875$; for PAM-8, $K = 27$, $N_r = 10$, $\alpha_b = 0.9$; for PAM-4, $K = 26$, $N_r = 12$, $\alpha_b = 1.08$; and for PAM-2, $K = 9$, $N_r = 5$, $\alpha_b = 1.8$, with $N_t = M = 20$. The gaps to the theoretical capacity values of about 25% are due to two effects: (i) a finite number of iterations requires lowering the maximal loads, and (ii) the relatively small values of the simulation parameters impose Diophantine constraints, causing a certain granularity in the partial loads $\alpha_b$. An extension of these concepts to include modulation over many time intervals, for example, by combining transmitter layering with signal spreading, will ease these constraints. Also, no attempt was made to use orthogonal sequences for the power levels of a given stream.
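The quoted partial loads can be reproduced from the listed parameters. The sketch below assumes that the $K$ binary streams are split evenly over the $B = \log_2$(constellation size) modulation levels, so that $K_b = K/B$ and $\alpha_b = K_b / N_r$; it also takes the third configuration to be 4-PAM ($B = 2$), which is what its load of 1.08 implies. The function name `level_aspect_ratio` is hypothetical.

```python
def level_aspect_ratio(num_streams, bits_per_symbol, n_r):
    """alpha_b = K_b / N_r, assuming K_b = K / B streams per modulation level."""
    if num_streams % bits_per_symbol != 0:
        raise ValueError("K must be a multiple of B")
    return (num_streams // bits_per_symbol) / n_r


# (K, B, N_r) for each simulated configuration
configs = {
    "PAM-16": (28, 4, 8),
    "PAM-8": (27, 3, 10),
    "PAM-4": (26, 2, 12),   # assumed to be 4-PAM, see the note above
    "PAM-2": (9, 1, 5),
}
ratios = {name: level_aspect_ratio(*cfg) for name, cfg in configs.items()}
```

Under this reading the four loads come out as 0.875, 0.9, about 1.083, and 1.8, matching the values quoted for the simulations and showing the granularity imposed by small integer values of $K$ and $N_r$.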

MULTI-USER MIMO SYSTEMS
Multiuser MIMO systems are, as illustrated in Figure 5, communications arrangements where spatially disjoint MIMO transmitters communicate with a central receiver which processes the different MIMO signal streams concurrently. Focussing consideration on an uplink application, we can view such an arrangement as a single large MIMO system. The capacity of this composite MIMO system forms an upper bound of what is achievable by the distributed transmitters. The receiver proposed in this paper is fully applicable to this case, since it layers each data stream separately, irrespective of data stream colocation. The only additional complexity that needs to be considered stems from the fact that the different MIMO transmissions are asynchronous with respect to each other. However, since the iterative receiver uses only basic cancellation operations, this asynchronicity can, in principle, be incorporated easily into the remodulation process in (31). For simplicity, we make the unrealistic assumption that the terminals are synchronous, noting that the actual interference levels for an asynchronous system are actually lower bounded by the synchronous case.
We consider $U$ distinct terminals, each equipped with the proposed space-time dispersion transmitter. All the terminals access a common receiver performing a two-stage layering demodulation. For simplicity, we assume that all terminals have the same parameters, that is, number of data streams $K$, block length $L$, and number of antennas $N_t$. The antenna signatures $\mathbf{s}_{u,k}$ are uniquely chosen for each of the $U \times K$ transmitted data streams. The received signal in the multiuser case is given, analogously to (3), by the superposition of the signals of all $U$ terminals plus noise. During the detection stage, the iterative algorithm calculates soft-bit estimates for each terminal, data stream, and partition, and canceled signal streams are generated from the remodulated signals for each partition $m$ and receive antenna $r$ at iteration $i$. New values $z^{(i+1)}_{u,k,m,r}$ are obtained by repeated matched filtering of the cancelled signal.
In the multiterminal case, the joint distribution of the powers $\{P_{u,k}\}$ plays the role of $\{P_k\}$ in the single-terminal case. Thus, the effective noise/interference variance per symbol of the signal partition at iteration $i$ takes the same form as in the single-terminal case, with the sum running over all terminals and data streams. Furthermore, using arguments analogous to (18)-(23), it can be shown that if the continuous approximation of the joint stream power distribution $P_{u,k}$ satisfies (23), again, any system aspect ratio $\alpha = KU/N_r$ can be achieved and the system operating point can approach the multiple access channel capacity. It can also be concluded that the introduction of new terminals does not degrade system performance as long as the powers of the new data streams fit into the supportable (geometric) power profile. From a practical point of view, unequal received powers from different terminals may actually be beneficial, as illustrated in Figure 6, where three MIMO terminals access a common receiver with received power distributions such that the second user's power is 6 dB less than that of the first, and the third user's power is 9 dB below the strongest user. The x-axis is labeled by the average $E_b/N_0$, as in Figure 4. The dashed lines are the single-MIMO-channel achievable spectral efficiencies. The solid lines are those achievable with the specific three-terminal MIMO system whose parameters are discussed below. As expected, the lower-order constellation benefits the most from different power distributions since the user power variation has its strongest impact. In fact, the advantage of higher-order PAM modulation starts to disappear, and, in situations with many different terminals at different received powers, 2-PAM will be sufficient to attain most of the channel's capacity.
Received power distributions other than the one simulated cause similarly augmented spectral efficiencies, but, of course, decodability of the lower-energy terminals needs to be assured, since the substream signal-to-noise ratio differences are preserved through the iterative demodulation process. Various rate-adaptive transmission schemes can also easily be accommodated by this iterative demodulation receiver.
The simulation parameters for the performance curves in Figure 6 are as follows: three MIMO terminals are used with relative powers 0 dB, −6 dB, and −9 dB. The MIMO channels use independent Rayleigh fading for each path. The system parameters are $N_t = M = 100$ for all cases; for PAM-16, $K = 16$, $N_r = 13$; for PAM-8, $K = 6$, $N_r = 6$; for PAM-4, $K = 8$ with $N_r = 9$; and for PAM-2, $K = 5$ with $N_r = 6$. We can see that the achievable spectral efficiencies exceed those of the single terminal case and are close to the analytical limiting values.

Figure 6: Achievable spectral efficiencies for three MIMO terminals with relative power differences of 0 dB, −6 dB, and −9 dB via simulations, plotted against the average bit energy-to-noise-power ratio.

CONCLUSIONS
We have presented and analyzed a two-stage iterative demodulation and decoding receiver of low complexity which can achieve the multiple-access capacity of the single-user or multiuser MIMO channel, given an exponential received power distribution of the different data streams. In practice, systems with simple PAM modulation formats whose decomposed binary powers follow this distribution approach the channel capacity over a wide range of operating SNRs in both the single-user MIMO and the multiuser MIMO situations, and can exceed the capacity of PAM constellations on orthogonal carriers; this can be viewed as akin to a constellation shaping gain. The transmitter operates by layering each transmitted signal over all available transmit antennas, with groups of transmit signals (partitions) fed through different interleavers before transmission to achieve spatial and temporal spreading. The receiver relies on a conceptually simple iterative demodulator which repeatedly recombines and filters partitioned received signals, followed by "off-the-shelf" standard error control decoders.