A Comparison of Scheduling Strategies for MIMO Broadcast Channel with Limited Feedback on OFDM Systems



Introduction
Next generation wireless cellular systems are expected to support high-quality multimedia services; this motivates the interest in multiantenna (MIMO) systems, where both spatial diversity and multiplexing can be used to increase the achievable throughput. In fact, it has been shown that the downlink capacity of a MIMO system with perfect channel state information (CSI) scales linearly with the number of transmit antennas [1]. Although the nonlinear dirty paper coding scheme achieves the system capacity, it has a high computational cost [2], and simpler solutions have been investigated. Linear beamforming has been shown [3] to achieve a large part of the dirty paper coding capacity; in particular, zero-forcing beamforming combined with opportunistic scheduling is widely used [3].
However, the benefits of MIMO are obtained only by a proper scheduling of transmissions, which opportunistically exploits channel conditions in order to increase throughput while ensuring quality of service (QoS). Several scheduling techniques have been proposed for MIMO single-carrier systems on flat fading channels based on various approaches, including clique search [4], maximization of the Frobenius norm of the composite channel matrix [5, 6], user channel orthogonality [7-9], single bit feedback [10], waterfilling [11], tree search [12], evolutionary algorithms [13], and greedy scheduling [14], extended to the case of limited feedback in [15]. In some cases, joint optimization of scheduling and power allocation is performed [4-6, 10, 11, 13], while in other cases only scheduling is considered [7-9, 14]. Moreover, QoS-oriented multiuser scheduling and beamforming have been investigated in [16], in order to reconcile the demand for high throughput with low packet delays. An overview of research on cross-layer scheduling for multiuser MIMO single-carrier systems is given in [17]. A problem similar to multiuser MIMO scheduling can be found in other transmission systems, such as multicarrier code- or frequency-division multiple access [18].
In frequency selective channels, single carrier modulation is often replaced by orthogonal frequency division multiplexing (OFDM) due to its efficiency in overcoming multipath fading. In fact, the combination of MIMO and 2 EURASIP Journal on Wireless Communications and Networking OFDM seems to be the technology of future wireless cellular systems, as it has been proposed for downlink in the long term evolution (LTE) release of 3GPP standard [19,20]. When MIMO OFDM is considered, scheduling becomes more complex, as the number of resources to be allocated, that is, the number of subcarriers, increases and only suboptimal approaches are viable [12]. Complexity is further increased in a frequency division duplexing system, where CSI is provided to the base station by each user (mobile terminal) through a feedback channel. In fact, due to the limited feedback rate, only a partial CSI is available at the base station and additional processing is required to compensate the channel uncertainty. Some of the scheduling techniques considered for single carrier transmissions can be extended to OFDM. For example, in [21] a scheduling algorithm has been proposed for MIMO OFDM systems which extends method [14] for single-carrier systems: the set of scheduled users on each subcarrier is built in a greedy fashion, by adding one user at a time with the aim of maximizing a weighted sum rate (WSR). In [22] this approach has been further simplified to avoid the need of computing a new beamforming matrix upon the insertion of a new candidate in the set of scheduled users. A further simplification of the scheduling is achieved by computing an estimate on their signal to interference ratio which is then used to exclude users that would not contribute to the WSR, by introducing a threshold to their signal to noise plus interference ratio.
In this paper, we first show that any scheduler maximizing a wide class of utility functions can be reduced to a scheduler maximizing the weighted sum rate, where the weights are suitably chosen according to the utility function. Then we review scheduling techniques proposed in the literature that maximize the weighted sum rate for a multiuser MIMO OFDM system with limited feedback and compare them in an LTE 3GPP scenario in terms of (i) computational complexity, (ii) memory requirements, and (iii) achievable throughput.
The rest of the paper is organized as follows. In Section 2 we describe the downlink MIMO OFDM system model. In Section 3 a general scheduling method is derived and algorithms [14, 21] are reviewed. Sections 4 and 5 present, respectively, the user selection and user preselection strategies of [22]. In Section 6 the complexity of the various strategies is investigated. In Section 7 simulation results are illustrated, and Section 8 outlines the main conclusions.
Notation. Bold upper and lower case letters denote matrices and vectors, respectively; $(\cdot)^H$ denotes the Hermitian operation (complex conjugate transpose), while $(\cdot)^T$ denotes transpose; $\|\cdot\|$ is the vector norm, and $\mathrm{E}[\cdot]$ stands for expectation.

System Model
We consider the downlink of a cellular system based on OFDM [23] with $N_C$ subcarriers. The base station has $M$ transmit antennas, while each of the $K$ users has a single antenna. Transmission is performed in time slots of $L$ OFDM symbols, and in each time slot users feed back partial CSI, which is used by the base station to schedule downlink transmissions. The transmission bandwidth is divided into $N$ resource blocks, each composed of $N_S$ subcarriers. At slot $n$, let $\mathcal{U}_c(n) = \{u_{1,c}(n), u_{2,c}(n), \ldots, u_{|\mathcal{U}_c(n)|,c}(n)\}$ be the set of $|\mathcal{U}_c(n)|$ users, $u_{i,c}(n) \in \{1, \ldots, K\}$, scheduled for downlink transmission on resource block $c \in \{1, \ldots, N\}$. We denote as stream the (user, resource block) pair $(k, c)$. Let also $\mathcal{P}(n)$ be the set of streams scheduled at slot $n$, that is,

$$\mathcal{P}(n) = \{(k, c) : k \in \mathcal{U}_c(n),\ c = 1, \ldots, N\}.$$

In our analysis we model the channel as quasi static, that is, it is considered invariant for the duration of one OFDM symbol, and it has the same frequency response on all subcarriers of each resource block. Hence, the frequency response of the MIMO channel on resource block $c$ of OFDM symbol $t$ for all $M$ transmit antennas and all $|\mathcal{U}_c(n)|$ users is described by

$$\mathbf{H}_c(t) = \left[\mathbf{h}_{u_{1,c}(n),c}(t), \ldots, \mathbf{h}_{u_{|\mathcal{U}_c(n)|,c}(n),c}(t)\right]^T,$$

where the $M \times 1$ column channel vector $\mathbf{h}_{k,c}(t)$ collects the gains between the $M$ antennas of the base station and stream $(k, c)$. In general, for OFDM symbol $t$, $\mathbf{x}_c(t, m)$ and $\mathbf{y}_c(t, m)$ are, respectively, the $M \times 1$ and $|\mathcal{U}_c(n)| \times 1$ column vectors of the transmitted and received signals on subcarrier $m$ of resource block $c$. The discrete-time complex baseband transmission model for subcarrier $m$ of resource block $c$ is given by

$$\mathbf{y}_c(t, m) = \mathbf{H}_c(t)\,\mathbf{x}_c(t, m) + \mathbf{n}_c(t, m),$$

where $\mathbf{n}_c(t, m)$ is a $|\mathcal{U}_c(n)| \times 1$ complex Gaussian noise vector with i.i.d. components having zero mean and variance $\sigma_n^2$. The transmit signal is subject to the total power constraint $\mathrm{E}\left[\sum_{c=1}^{N} \sum_{m=1}^{N_S} \|\mathbf{x}_c(t, m)\|^2\right] \le P$, where $P$ is the available power.
In order to exploit spatial diversity, the transmit signal is obtained from the $|\mathcal{U}_c(n)| \times 1$ data signal $\mathbf{d}_c(t, m)$ by applying the zero-forcing beamforming matrix $\mathbf{G}_c(nL)$, that is, where $\mathbf{p}_c(nL)$ is the power normalization vector which enforces equal stream power as given by where the expectation is taken only with respect to $\mathbf{d}_c(t, m)$, and $x_{k,c}(t, m)$ is the $k$th entry of $\mathbf{x}_c(t, m)$.

Feedback Information.
In a frequency division duplexing system, channel state information is provided through a feedback channel; therefore, we assume that matrix $\mathbf{H}_c(t)$ is not perfectly known at the base station, while user $k$ perfectly estimates the channel vectors once per time slot, that is, when $t = nL$, to obtain $\mathbf{h}_{k,c}(nL)$, $c = 1, 2, \ldots, N$. As in [24] we adopt a two-part feedback for all users at each slot. In particular, at slot $n$ user $k$ feeds back for each resource block $c$: (i) a channel direction information (CDI) $\hat{\mathbf{h}}_{k,c}$, which ideally tracks the normalized channel vector, namely

$$\bar{\mathbf{h}}_{k,c}(nL) = \frac{\mathbf{h}_{k,c}(nL)}{\|\mathbf{h}_{k,c}(nL)\|}, \tag{5}$$

and (ii) a channel quality information (CQI) $\xi_{k,c}$, based on the estimated signal-to-noise plus interference ratio (SNIR) at the receiver for $M$ orthogonal scheduled users, evaluated as in [24]. We assume that the feedback channel has a finite rate of $N_b$ bits per slot and per user and allows zero-delay error-free transmission. The base station builds the matrix

$$\hat{\mathbf{H}}_c(nL) = \left[\hat{\mathbf{h}}_{u_{1,c}(n),c}(nL), \ldots, \hat{\mathbf{h}}_{u_{|\mathcal{U}_c(n)|,c}(n),c}(nL)\right]^T \tag{7}$$

containing the unit-norm reconstructed channel vectors $\hat{\mathbf{h}}_{k,c}(nL)$. Using the partial CSI, the base station evaluates an estimate $\gamma_{k,c}(n)$ of the SNIR of stream $(k, c)$, as will be seen in Section 4. Zero-forcing beamforming with equal power distribution among streams is implemented for each resource block; hence the beamforming matrix is

$$\mathbf{G}_c(nL) = \hat{\mathbf{H}}_c^H(nL)\left(\hat{\mathbf{H}}_c(nL)\,\hat{\mathbf{H}}_c^H(nL)\right)^{-1}. \tag{8}$$

An estimate of the normalized (with respect to the bandwidth) rate achieved by stream $(k, c) \in \mathcal{P}(n)$ at slot $n$ is

$$R_{k,c}(n, \mathcal{P}(n)) = \log_2\left(1 + \gamma_{k,c}(n)\right). \tag{9}$$

Notation $R_{k,c}(n, \mathcal{P}(n))$ highlights the fact that rates achieved by different streams are mutually dependent, as (i) more streams allocated simultaneously on the same resource block yield interference, and (ii) the total power is distributed among active streams. Performance is evaluated in terms of WSR,

$$R(\mathcal{P}(n)) = \sum_{(k,c) \in \mathcal{P}(n)} w_k(n)\, R_{k,c}(n, \mathcal{P}(n)), \tag{10}$$

with $w_k(n)$ suitable weights that take into account fairness and QoS constraints.
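As a minimal sketch of how the WSR of a candidate set can be evaluated from per-stream SNIR estimates, the following Python fragment sums the weighted rates $w_k \log_2(1 + \gamma_{k,c})$ over the scheduled streams. The function name, the dictionary layout, and the numerical values are illustrative assumptions and are not taken from the system description.

```python
import math

def weighted_sum_rate(snir, weights):
    """Weighted sum rate of a set of streams.

    snir:    dict mapping stream (k, c) -> estimated SNIR (linear scale)
    weights: dict mapping user k -> scheduling weight w_k
    """
    return sum(weights[k] * math.log2(1.0 + g) for (k, c), g in snir.items())

# Toy values (assumed): user 1 on resource blocks 1 and 2, user 2 on block 1.
snir = {(1, 1): 3.0, (1, 2): 1.0, (2, 1): 7.0}
weights = {1: 1.0, 2: 0.5}
wsr = weighted_sum_rate(snir, weights)
```

In an actual scheduler the SNIR values depend on the whole scheduled set, so this function would be called again each time the set changes.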

Exhaustive Search Scheduling.
At each slot, we aim at scheduling the set of streams that maximizes the WSR. This problem can be solved by considering all $\left(\sum_{i=1}^{M} \binom{K}{i}\right)^{N}$ possible sets, where $\binom{K}{i} = K!/(i!\,(K-i)!)$, and evaluating the WSR achieved by each candidate set. Unfortunately, this exhaustive search (ES) scheduling has a high computational cost, which becomes infeasible for an increasing number of users and subcarriers. Simpler, suboptimal scheduling methods are investigated in Section 4.
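To give a feeling for how quickly the exhaustive search grows, the count of candidate sets stated above can be evaluated directly. The helper name and the parameter values below are illustrative assumptions.

```python
from math import comb

def exhaustive_search_sets(K, M, N):
    """Number of candidate stream sets: (sum_{i=1}^{M} C(K, i)) ** N."""
    per_block = sum(comb(K, i) for i in range(1, M + 1))
    return per_block ** N

small = exhaustive_search_sets(K=4, M=2, N=1)    # 4 + 6 candidate user sets
medium = exhaustive_search_sets(K=20, M=4, N=4)  # already a very large number
```

The exponent $N$ is what makes the search impractical: the per-block count is raised to the number of resource blocks.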

Maximum Utility Scheduler
In order to balance the opportunistic use of channel resources with fairness among users, we consider a multiuser scheduler. We first consider in this section general criteria for the choice of weights of the WSR and we derive the optimum maximum utility scheduler weights for a general utility function. Then we specialize the result for the maximum sum rate scheduler and the proportional fair scheduler.

General Multiuser Scheduling.
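Let $R_k(n, \mathcal{P})$ be the achievable rate associated with user $k$, that is, $R_k(n, \mathcal{P}) = \sum_{c=1}^{N} R_{k,c}(n, \mathcal{P})$. In the first slot, the average throughput achieved by user $k$ is $T_k(1) = 0$, $k = 1, 2, \ldots, K$. The estimate of the average throughput achieved by user $k$ can be updated as

$$T_k(n+1) = \begin{cases} (1 - \alpha_k)\,T_k(n) + \alpha_k R_k(n, \mathcal{P}(n)), & k \in \bigcup_{c=1}^{N} \mathcal{U}_c(n), \\ (1 - \alpha_k)\,T_k(n), & \text{otherwise}, \end{cases} \tag{11}$$

where $\bigcup_{c=1}^{N} \mathcal{U}_c(n)$ is the set of scheduled users at slot $n$. If we aim at achieving an average throughput $\rho_k$ for user $k$, we can define the normalized averaged throughput at slot $n$ as In [25], the following concave and differentiable utility function has been proposed to design schedulers where $\kappa \in [0, 1) \cup (1, \infty)$ is a fairness parameter to be chosen according to the desired scheduling policy. For example, for $\kappa \to 1$ we obtain the proportional fair scheduler (PFS). For $\kappa = 0$ we obtain the utility function of the maximum sum rate scheduler. When $\kappa \to \infty$, (13) becomes the utility function of the max-min scheduler.
In Appendix A, we derive the set of streams $\mathcal{P}(n)$ that maximizes the sum utility

$$\sum_{k=1}^{K} U_k(B_k(n)). \tag{14}$$

The solution is

$$\mathcal{P}(n) = \arg\max_{\mathcal{I} \subseteq \mathcal{C}} \sum_{(k,c) \in \mathcal{I}} w_k(n)\, R_{k,c}(n, \mathcal{I}) \tag{15}$$

with weights where $\mathcal{C} = \{(k, c) : k \in \{1, \ldots, K\},\ c \in \{1, \ldots, N\}\}$ is the set of all possible streams. Note that for $K = 1$, (16) boils down to the maximum utility scheduler of [25].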
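The throughput estimate described above can be sketched as an exponentially weighted average. This is only an illustration under the assumption that the update has the usual form $(1 - \alpha_k) T_k(n) + \alpha_k R_k(n)$, with a zero rate for users that are not scheduled; the function name and the data layout are hypothetical.

```python
def update_throughput(T, rates, alpha):
    """One slot of the average-throughput recursion (assumed form).

    T:     dict user -> current average throughput T_k(n)
    rates: dict user -> rate achieved in this slot; unscheduled users are absent
    alpha: dict user -> forgetting factor alpha_k in (0, 1]
    """
    return {k: (1.0 - alpha[k]) * T[k] + alpha[k] * rates.get(k, 0.0) for k in T}

alpha = {1: 0.5, 2: 0.5}
T1 = {1: 0.0, 2: 0.0}                           # T_k(1) = 0 in the first slot
T2 = update_throughput(T1, {1: 4.0}, alpha)     # only user 1 is scheduled
T3 = update_throughput(T2, {2: 2.0}, alpha)     # only user 2 is scheduled
```

A small forgetting factor makes the average react slowly, which corresponds to enforcing fairness over a longer time window.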

Maximum Sum Rate Scheduling.
The maximum sum rate scheduler does not consider fairness among users ($\kappa = 0$) and simply aims at maximizing the achievable sum rate (SR), providing $w_k = 1$, for $k = 1, \ldots, K$, and

$$\mathcal{P}(n) = \arg\max_{\mathcal{I} \subseteq \mathcal{C}} \sum_{(k,c) \in \mathcal{I}} R_{k,c}(n, \mathcal{I}). \tag{17}$$

Proportional Fair Scheduling.
The multiuser multicarrier proportional fair scheduling (MMPFS) algorithm [26] is an extension to the OFDM multiuser scenario of the PFS algorithm.
For MMPFS, the average throughput of user $k$ is updated as in (11) with $\alpha_k = 1/\tau$, where $\tau$ is a parameter related to the time over which fairness should be achieved. In [27] it has been shown that proportional fairness, maximizing $\sum_k \log_2 T_k(n)$, is achieved by scheduling users as We observe that for $\tau \gg 1$ we can approximate and MMPFS (18) coincides with the maximization of the WSR (15) with weights (16), $\rho_k = 1/(\tau - 1)$ and $\kappa = 1$.

Greedy Scheduling Strategies
Now that we have established that maximizing the weighted sum rate is equivalent to the maximization of a wide set of utility functions, we focus on methods to achieve this goal. In the following we investigate suboptimal solutions to problem (15) for a small number of users $K$, when the probability of having a fully loaded system is small. In fact, in this scenario power distribution has an important role in selecting the optimal user set. In Section 4.3 we will consider the case of a high number of users $K$, in which a simplification of scheduling is possible. For ease of notation we drop both the slot index $(n)$ and the OFDM symbol index $(t)$ in the remainder of the paper.

Multicarrier Greedy (MG).
In [14], a greedy scheduling algorithm has been proposed for a single-carrier flat-fading system, where users are selected one by one as long as the throughput increases. It has then been extended to an OFDM system in [21] and is denoted here multicarrier greedy (MG). The MG algorithm comprises $N_{\mathrm{step}}$ steps, and at each step we select the stream that maximizes the increase of WSR. Let $\mathcal{S}^{(i)}$ be the set of streams scheduled for transmission at step $i$ $(i = 1, \ldots, N_{\mathrm{step}})$, with the corresponding WSR $R(\mathcal{S}^{(i)})$. Initially we have $\mathcal{S}^{(0)} = \emptyset$. The stream selected at step $i + 1$ is

$$(\bar{k}, \bar{c}) = \arg\max_{(k,c) \in \mathcal{C} \setminus \mathcal{S}^{(i)}} R\left(\mathcal{S}^{(i)} \cup \{(k, c)\}\right), \tag{20}$$

and we set $\mathcal{S}^{(i+1)} = \mathcal{S}^{(i)} \cup \{(\bar{k}, \bar{c})\}$. The WSR $R(\mathcal{S}^{(i)})$ increases at each step, since stream $(\bar{k}, \bar{c})$ is inserted under the condition that

$$R\left(\mathcal{S}^{(i)} \cup \{(\bar{k}, \bar{c})\}\right) > R\left(\mathcal{S}^{(i)}\right). \tag{21}$$

When (21) does not hold, the algorithm is stopped, $N_{\mathrm{step}} = i$ and $\mathcal{P} = \mathcal{S}^{(N_{\mathrm{step}})}$. Hence, $N_{\mathrm{step}}$ is a random variable. Evaluation of the WSR in (20) for the current set of streams is based on the observation that, for an equal power allocation across users and resource blocks under constraint (4), using (3), the power $p_{k,c}$ allocated to user $k$ and resource block $c$ is determined as in [15], and the SNIR can be computed by scaling the CQI value with the newly computed transmit power, that is, where $\xi_{j,c}$ is given by (6) while $\mathbf{g}^{(i)}_{j,c}$ is the $j$th column of the beamforming matrix $\mathbf{G}^{(i)}_c$ for users scheduled at step $i$. Note that the total power $P$ has been divided by $|\mathcal{S}^{(i+1)}| = i + 1$ in order to obtain the per stream power $P_S$.
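The greedy loop described above can be sketched as follows. This is a simplified illustration, not the implementation of [14] or [21]: the WSR is passed in as a callable, and the toy WSR used in the example (equal power split, no beamforming loss) is a hypothetical stand-in for the SNIR computation of the text.

```python
import math

def multicarrier_greedy(candidates, wsr):
    """Greedy stream selection.

    candidates: iterable of streams (k, c)
    wsr:        callable mapping a frozenset of streams to its weighted sum rate
    Returns the selected streams in insertion order.
    """
    remaining = set(candidates)
    selected = []
    current = 0.0  # WSR of the empty set
    while remaining:
        # Candidate that gives the largest WSR when added to the current set.
        best = max(sorted(remaining), key=lambda s: wsr(frozenset(selected + [s])))
        best_value = wsr(frozenset(selected + [best]))
        if best_value <= current:  # the WSR no longer increases: stop
            break
        selected.append(best)
        remaining.remove(best)
        current = best_value
    return selected

# Hypothetical toy model: total power P is split equally among the streams.
gains = {(1, 1): 10.0, (2, 1): 1.0, (3, 1): 0.01}
P = 10.0

def toy_wsr(streams):
    if not streams:
        return 0.0
    share = P / len(streams)
    return sum(math.log2(1.0 + gains[s] * share) for s in streams)

selected = multicarrier_greedy(gains, toy_wsr)
```

In this toy case the third stream is rejected because the rate it adds does not compensate for the power taken away from the first two, which is the stopping behavior described in the text.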

Projection-Based Greedy (PBG).
According to the MG algorithm, the introduction of a new candidate stream (k, c) into the set S (i) at step i + 1 decreases the SNIRs (23) for two reasons: (1) the power is redistributed among all streams; (2) beamforming of streams already scheduled on the same resource block is modified.
Due to (1), it is beneficial to perform scheduling jointly among resource blocks rather than separately on each resource block. Due to (2), a new beamforming matrix must be computed for users scheduled on resource block $c$ of the candidate stream. Hence, at each step many beamformers must be designed for each resource block to test (21), and only one candidate stream is then scheduled. In order to reduce the computational complexity, the projection-based greedy (PBG) algorithm [22] assumes that the insertion of a new stream does not significantly alter the SNIR of already scheduled streams. Indeed, this assumption holds as long as the channel vector of the candidate stream is almost orthogonal to the channel vectors of previously scheduled streams. Therefore, we update the SNIR estimate of already scheduled streams as follows for $i = 2, 3, \ldots, N_{\mathrm{step}} - 1$, while for the first step we set $\gamma^{(1)}_{p,q} = \xi_{p,q}$, $(p, q) \in \mathcal{S}^{(1)}$. Furthermore, the evaluation of the SNIR for the candidate streams requires only the computation of $\|\mathbf{g}_{k,c}\|^2$ instead of the entire beamformer. In particular, if we define

$$a_{k,c}\left(\mathcal{S}^{(i)}\right) = \frac{1}{\|\mathbf{g}_{k,c}\|^2}, \tag{25}$$

from (23) we have In order to compute (25) and the corresponding SNIR (26) of the candidate stream $(k, c)$, it can be observed that its beamforming vector is obtained by the orthogonalization of $\hat{\mathbf{h}}_{k,c}$ with respect to the normalized channel vectors of already scheduled streams on the same resource block. Hence, an orthonormal basis $\mathcal{B}_c^{(i)} = \{\mathbf{b}_{j,c}\}$ is first constructed for the space generated by the channel vectors $\{\hat{\mathbf{h}}_{p,c}\}$ of streams in $\mathcal{S}^{(i)}$ on resource block $c$. Then the beamforming vector for stream $(k, c)$ would be proportional to

$$\hat{\mathbf{h}}_{k,c} - \sum_{j} \left(\mathbf{b}_{j,c}^H \hat{\mathbf{h}}_{k,c}\right) \mathbf{b}_{j,c}. \tag{27}$$

Now, by imposing $\|\hat{\mathbf{h}}_{k,c}\| = 1$ and the zero-forcing normalization of the beamformer, we have

$$a_{k,c}\left(\mathcal{S}^{(i)}\right) = 1 - \sum_{j} \left|\mathbf{b}_{j,c}^H \hat{\mathbf{h}}_{k,c}\right|^2. \tag{28}$$

By using (26) and (28), there is no need to determine a new beamformer for each candidate stream; instead, only the basis $\mathcal{B}_c^{(i)}$ needs to be updated at each step, and this requires only a few vector multiplications.
Note that the computation of $a_{k,c}$ is based on the projection of the candidate vector on the basis, hence the acronym PBG. Once all streams have been scheduled, a beamformer is computed to perform transmission.
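The projection step can be illustrated with a short pure-Python sketch. It assumes unit-norm channel vectors and evaluates the factor as one minus the energy of the candidate vector that falls in the span of the already scheduled vectors, that is, the squared norm of its orthogonal component. The function names and the example vectors are hypothetical.

```python
def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def extend_basis(basis, h):
    """Gram-Schmidt step: append the normalized part of h orthogonal to the basis."""
    r = list(h)
    for b in basis:
        coeff = inner(b, r)
        r = [ri - coeff * bi for ri, bi in zip(r, b)]
    norm = abs(inner(r, r)) ** 0.5
    if norm > 1e-12:  # skip vectors already contained in the span
        basis.append([ri / norm for ri in r])
    return basis

def projection_factor(basis, h):
    """1 - sum_j |b_j^H h|^2 for a unit-norm candidate vector h."""
    return 1.0 - sum(abs(inner(b, h)) ** 2 for b in basis)

s = 2 ** -0.5
basis = extend_basis([], [1, 0, 0])                  # first scheduled stream
a_orthogonal = projection_factor(basis, [0, 1, 0])   # no loss
a_partial = projection_factor(basis, [s, s, 0])      # half of the energy is lost
a_collinear = projection_factor(basis, [1j, 0, 0])   # candidate cannot be separated
basis = extend_basis(basis, [s, s, 0])               # second scheduled stream
a_in_span = projection_factor(basis, [0.6, 0.8, 0])
```

Only inner products with the current basis are needed for each candidate, which is why no beamformer has to be recomputed during the search.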

Greedy Scheduling Strategies in the High K Scenario.
If $K \gg M$, multiuser diversity provides $M$ orthogonal streams on each resource block with very high probability; thus the system is almost always fully loaded, that is, $|\mathcal{U}_c| = M$. In this case, both MG and PBG algorithms can be simplified by not redistributing the available power at each new insertion, and the per stream power (4) becomes

$$P_S = \frac{P}{N M}. \tag{29}$$

Scheduling can then be simplified by operating independently on each resource block.

Multicarrier Semiorthogonal User Selection Algorithm (MSUS).
The semiorthogonal user selection (SUS) scheme [9] can be easily generalized to the OFDM scenario and is here denoted as multicarrier SUS (MSUS). The generalization also includes the maximization of the WSR instead of the SR considered in [9]. MSUS proceeds by steps, now applied separately on each resource block. For resource block $c$, let $\mathcal{A}^{(1)}_c = \{1, \ldots, K\}$ be the initial set containing the indexes of all users. The scheduled stream at step 1 is characterized by having maximum CQI, that is,

$$k_1 = \arg\max_{k \in \mathcal{A}^{(1)}_c} \xi_{k,c}. \tag{30}$$

After selecting $i$ streams, the $(i+1)$th stream $k_{i+1}$ is chosen within the set

$$\mathcal{A}^{(i+1)}_c = \left\{k \in \mathcal{A}^{(i)}_c,\ k \ne k_i : \left|\hat{\mathbf{h}}_{k,c}^H \hat{\mathbf{h}}_{k_i,c}\right| \le \epsilon\right\} \tag{31}$$

as where $\epsilon$ is a design parameter that sets the maximum correlation allowed between the quantized channel vectors of the selected users. We note that in MSUS we apply $N$ single-carrier SUS in parallel, one for each resource block. Also in this case the number of steps is random, as the algorithm ends when set $\mathcal{A}^{(i)}_c$ is empty. Once users have been scheduled, the total power is equally distributed among the scheduled streams according to (4).
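A compact sketch of semiorthogonal selection on a single resource block is given below. It is an illustration under stated assumptions rather than the algorithm of [9]: users are ranked by CQI only (a WSR-oriented variant would rank by weighted rate), and the parameter names `threshold` and `max_streams` are hypothetical.

```python
def correlation(u, v):
    """Magnitude of the Hermitian inner product between two unit-norm vectors."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v)))

def semiorthogonal_selection(cqi, cdi, max_streams, threshold):
    """Per-resource-block user selection.

    cqi:         dict user -> CQI value
    cdi:         dict user -> unit-norm quantized channel vector
    max_streams: at most this many users are selected (M in the text)
    threshold:   maximum correlation allowed with each selected user
    """
    candidates = set(cqi)
    selected = []
    while candidates and len(selected) < max_streams:
        best = max(sorted(candidates), key=lambda k: cqi[k])
        selected.append(best)
        candidates.discard(best)
        # Keep only users that are nearly orthogonal to the last selected one.
        candidates = {k for k in candidates
                      if correlation(cdi[k], cdi[best]) <= threshold}
    return selected

cqi = {1: 10.0, 2: 9.0, 3: 5.0, 4: 7.0}
cdi = {1: [1.0, 0.0], 2: [0.8, 0.6], 3: [0.0, 1.0], 4: [0.6, 0.8]}
strict = semiorthogonal_selection(cqi, cdi, max_streams=2, threshold=0.3)
loose = semiorthogonal_selection(cqi, cdi, max_streams=2, threshold=0.7)
```

A stricter threshold discards users with good CQI but correlated channels, which reduces interference at the price of a smaller candidate pool.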

Preselection Methods
In the MG algorithm the WSR $R(\mathcal{S}^{(i)})$ increases at each step and, using (9) and (10), condition (21) becomes

$$\sum_{(p,q) \in \mathcal{S}^{(i+1)}} w_p \log_2\left(1 + \gamma^{(i+1)}_{p,q}\right) > \sum_{(p,q) \in \mathcal{S}^{(i)}} w_p \log_2\left(1 + \gamma^{(i)}_{p,q}\right). \tag{33}$$

From (23) we obtain that this condition is satisfied only if the SNIR of the new stream is high enough to compensate for the losses incurred by its insertion, that is, the power redistribution and the beamforming modification, as described by conditions (1) and (2) of Section 4.2. This observation suggests a further simplification of the PBG algorithm, by excluding a priori the streams whose SNIR is below a certain threshold. Indeed, since the SNIR (26) must be evaluated for each candidate stream, excluding streams that could never be inserted speeds up the scheduling procedure [22]. Note that the idea of preselecting users was first introduced in [28], by letting users feed back their CSI and rate request only if the quality of their channel is above a threshold. In contrast, we use preselection as a technique to simplify scheduling rather than to reduce the feedback rate. Moreover, in our case the preselection is based not only on the channel quality but also on the correlation with other users' channels.

Preselection PBG (PPBG).
We first observe from (28) that $a_{k,c}(\mathcal{S}^{(i)}) \le 1$ and from (25) we obtain Therefore, at step $i$ of PBG there is a minimum value of $\xi_{k,c}$ that satisfies (33), denoted $A(i + 1)$, and we consider for scheduling only streams having SNIR $\xi_{k,c} > A(i + 1)$.
As shown in Appendix B, at high SNR we have Then by considering only streams $(k, c)$ satisfying (35), we decrease the number of comparisons and SNIR updates at each step of PBG. In the high K scenario the preselection technique is not effective; in fact, as illustrated in Appendix B, $A(i) \to 0$ for $K \to \infty$, and therefore (35) is verified by all streams. We further note that $A(i)$ is an increasing function of $i$; hence, streams whose CQI is below the threshold $A(i)$ at step $i$ can be neglected also in the next steps.
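The pruning effect of an increasing threshold can be sketched as follows. The threshold values used here are hypothetical placeholders for $A(i)$, whose closed form is not reproduced in this sketch; only the filtering mechanism is illustrated.

```python
def preselect(cqi, candidates, threshold):
    """Keep only the candidate streams whose CQI exceeds the current threshold."""
    return {s for s in candidates if cqi[s] > threshold}

cqi = {(1, 1): 12.0, (2, 1): 4.0, (3, 1): 0.8, (1, 2): 2.5}
thresholds = [0.5, 1.0, 3.0]  # hypothetical increasing values of A(i)

candidates = set(cqi)
history = []
for A in thresholds:
    # Streams removed at one step are never reconsidered at later steps.
    candidates = preselect(cqi, candidates, A)
    history.append(sorted(candidates))
```

Because the threshold never decreases, the candidate set can only shrink, so the filter can be applied cumulatively instead of rescanning all streams at every step.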

Simplified Preselection PBG (S-PPBG).
A further simplification in preselection is achieved by neglecting weights $w_p$ when evaluating $A(i)$, that is, by considering
Within PBG methods, we note that this approach becomes optimal when the scheduling objective coincides with the maximization of the SR. However, for the maximization of the WSR, S-PPBG is in general suboptimal.

Complexity Analysis
We analyze the worst case complexity of the various approaches, in terms of both computational complexity and memory requirement.

Computational Complexity.
We assume that a comparison yields a computational complexity equal to $\lambda$ complex multiplications (CMUX), while the inversion of an $M \times M$ matrix performed by Gaussian elimination has complexity $M(M^2 - 1)/3$ CMUX. The beamforming and $\|\mathbf{g}_{k,c}\|^2$ evaluation have therefore complexity We first observe that all considered algorithms select one stream per step, until at most $M$ streams are allocated on each resource block; thus in general $N_{\mathrm{step}} \le N \cdot M$. At step $i$, $|\mathcal{C} \setminus \mathcal{S}^{(i)}| = K \cdot N - i$ streams are considered for insertion in $\mathcal{S}^{(i+1)}$. Furthermore, at each step, the per stream power $P_S$ is adapted, due to the insertion of a candidate stream in $\mathcal{S}^{(i+1)}$.
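The two counting rules stated above can be evaluated numerically as a quick sanity check. The helper names and the parameter values are illustrative assumptions.

```python
def inversion_cost(M):
    """CMUX for inverting an M x M matrix by Gaussian elimination: M (M^2 - 1) / 3."""
    # (M - 1) M (M + 1) is a product of three consecutive integers,
    # so the division by 3 is exact.
    return M * (M * M - 1) // 3

def candidates_at_step(K, N, i):
    """Number of streams still available for insertion at step i: K N - i."""
    return K * N - i

cost_m4 = inversion_cost(4)
left = candidates_at_step(K=10, N=12, i=3)
```

With $M = 4$ a single inversion is cheap; the cost of a scheduler is dominated by how many times such an operation is repeated over the candidate streams.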

MG Complexity.
Complexity of the MG algorithm in the low K scenario is given by where $\zeta(i - 1)$ denotes the resource block of the stream selected at step $i - 1$. The first term in (39) accounts for the selection of the stream with maximum CQI. The remaining terms account for steps 2 through $N_{\mathrm{step}}$, with (a) update of the SNIR estimate of the $(i - 1)$ already scheduled streams, (b) computation of a new beamformer for each of the $(K - |\mathcal{U}^{(i)}_{\zeta(i-1)}|)$ candidate streams on resource block $\zeta(i - 1)$, (c) evaluation of $\|\mathbf{g}_{k,\zeta(i-1)}\|^2$, (d) update of the SNIR estimates, and (e) evaluation of the WSR. Lastly, the algorithm determines the stream which maximizes the WSR at step $i$ and checks condition (21).
In the high K scenario complexity becomes since now N step = M and no power update is necessary at each step.

PBG Complexity.
Complexity of the PBG in the low K scenario is In fact, the PBG algorithm for each candidate stream on resource block ζ(i−1) (a) performs the projection of channel vector on the orthogonal basis and (b) updates the SNIR estimate. At each step, the basis is also updated according to the channel vector of last scheduled stream. At the end, the beamforming matrix is computed according to the set of scheduled streams.
In the high K scenario we have since scheduling can be performed in parallel on all resource blocks.

PPBG Complexity.
The complexity of the PPBG in the low K scenario is given by It only differs from PBG in the evaluation of A(i + 1) at each step, since it depends on the set of scheduled streams. Similarly, in the high K scenario we have

S-PPBG Complexity.
Applying the S-PPBG algorithm, we have an additional cost due to (35); on the other hand, on resource block $c$, at each step $i$ we exclude a number of streams $Q_c(i)$ from the set of possible streams. $Q_c(i)$ takes into account also the scheduled streams. Then at step $i$ we have $J_{i,c} = K - \sum_{j=1}^{i} Q_c(j)$ candidate streams on resource block $c$ and in total $J_i = |\mathcal{C} \setminus \mathcal{S}^{(i)}| = \sum_{c=1}^{N} J_{i,c}$. Complexity becomes Note that $Q_c(i)$ is a random variable depending on the channel realization. In the high K scenario we still consider power adjustment; otherwise, from (B.2), we could never exclude streams, and then S-PPBG would become PBG. Complexity of S-PPBG in the high K scenario becomes
The MSUS algorithm is equivalent to $N$ SUS algorithms working in parallel. We recall that at each step SUS considers |A (i) is the number of users excluded at step i. It is (47)

Asymptotic Complexity Analysis.
According to complexities required by various scheduling algorithms, we investigate their asymptotic behavior as a function of K. For MG we have For PBG and PPBG we have Both S-PPBG and MSUS perform the exclusion of worse streams. Let β i be the percentage of streams excluded at step

Memory Occupation.
Lastly we investigate the memory requirements of the scheduling algorithms in terms of complex location (CLS) units. We first note that all algorithms store (a) CDI and CQI of all streams, (b) the set of selected streams, and (c) the final beamformers; hence a memory occupation of $M_{\mathrm{COMM}} = N \cdot M \cdot K + K \cdot N + N \cdot M^2 + M \cdot N$ CLS is common to all algorithms. For MG we have since MG stores (a) $\gamma_{j,c}$ (or, equivalently, $\|\mathbf{g}_{j,c}\|^2$), requiring $K \cdot N$ CLS, (b) per user rates ($N \cdot M$ CLS as worst case), (c) new beamformer ($K \cdot M^2$ CLS), (d) total rate provided by each candidate ($K$ CLS), and (e) current and last final rates (2 CLS). For PBG and PPBG we have as PBG stores (a) the value $\sqrt{a_{k,c}}$, (b) total rate provided by each candidate stream ($K$ CLS), and (c) orthogonal basis ($M^2 \cdot N$ CLS).
The S-PPBG memory requirement is given by With respect to PBG, it also needs to store $A(i)$ ($M \cdot N$ CLS as worst case). Finally, for MSUS we have as MSUS stores (a) correlations between candidate streams and the last inserted stream ($N \cdot K$ CLS), (b) the value of the correlation threshold (1 CLS), and (c) the set of total rates of each candidate ($K \cdot N$ CLS as worst case).
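The memory term shared by all algorithms can be evaluated directly from the expression for $M_{\mathrm{COMM}}$. The function name is hypothetical, and the parameter values ($N = 12$, $M = 4$, $K = 400$) are an assumed example; the per-algorithm terms are not included.

```python
def common_memory(N, M, K):
    """Common memory occupation in CLS: N M K + K N + N M^2 + M N."""
    cdi = N * M * K         # CDI: one M-dimensional vector per stream
    cqi = K * N             # CQI: one value per stream
    beamformers = N * M * M # final beamformers, one M x M matrix per block
    selected = M * N        # set of selected streams
    return cdi + cqi + beamformers + selected

m_comm = common_memory(N=12, M=4, K=400)
```

The CDI storage dominates, since it grows with the product of the number of users, antennas, and resource blocks.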

Simulation Results
We compare the scheduling algorithms in terms of average sum rate (SR) and complexity requirements. All users are uniformly distributed in a cell of radius 500 m, as in [29]; we consider an average SNR of 15 dB per resource block at the cell border, and path loss is included in the channel model. We also assume a realistic MIMO channel with time, frequency, and spatial correlation among the elements of $\mathbf{H}_c(t)$. The channel is modeled as slowly time-variant, frequency selective Rayleigh fading according to the spatial channel model (SCM) [30]. According to the LTE release, we set the transmission bandwidth to 2.5 MHz, divided into $N = 12$ resource blocks and centered at the carrier frequency of 2 GHz. The base station is equipped with $M = 4$ antennas spaced by 10 wavelengths. Scheduling and beamforming are performed once per slot, and each slot is composed of 7 adjacent OFDM symbols. CSI feedback is performed with a variable number of bits using an optimized codebook, as detailed in [31].

Performance Comparison.
We first compare the SR achieved by MG with ES scheduling using as optimization criterion the maximum sum rate. For complexity reasons simulations have been limited to N = 4 resource blocks.
To simplify simulations in the ES method, results of both MG and ES in the high K scenario, K = 18N, 20N, refer to N = 1. In fact, we verified that for high K the system is fully loaded with a probability higher than 95%; in this scenario the power granted to each carrier is P/N, and user selection can then be performed independently on each carrier. We consider both the case of perfect CSI at the transmitter and the case of partial CSI obtained by feedback from the receiver, with a feedback rate of 12 bit/user/resource block/slot. We observe that partial CSI causes an SR loss of 2 up to 3.5 bit/user/resource block/slot, but it does not affect the general behavior of the two algorithms. As we can see from Figure 1, MG and ES have a very close sum rate for all K. Hence, in the following we consider MG as performance bound. Figure 2 illustrates the average SR achieved by the scheduling algorithms as a function of the number of users K in the low K scenario for a feedback rate of 12 bit/user/resource block/slot. We note a negligible loss in performance of the simplified methods. Similarly, simulations in the high K scenario show that MG, PBG, and S-PPBG achieve an SR of 16.40 bit/s/Hz, while MSUS provides 15.40 bit/s/Hz. Overall we observe that the simplified algorithms incur no appreciable SR loss for any K. This is mainly due to the fact that all scheduling methods are based on an opportunistic approach, so they all aim at selecting the best set of orthogonal users. We also note that all algorithms always select the same first stream, whose channel vector in turn determines the choice of the other streams. We underline that the average SR of S-PPBG is very close to that of PBG and MG; moreover, since S-PPBG is an approximation of PPBG, we deduce that PPBG also provides the same SR as S-PPBG. Figure 3 confirms this behavior also with a PFS.
We note also in Figure 3 that preselection applied to PBG provides slightly better performance, despite the fact that it considers a lower number of candidate sets. In fact, preselection aims at excluding from scheduling streams that would not increase the WSR and prevents the scheduler from inserting them for fairness reasons. Figure 4 reports the average SR versus the feedback rate; we observe that the simplified methods are also robust to quantization error; in fact, for all considered values of feedback rate, PBG and S-PPBG provide the same SR as MG. Note that a feedback rate of 12 bit/user/RB/slot would result in an extremely large feedback overhead for the cases with a high number of users (960 bit/RB/slot for K = 80 users), while performance decreases markedly at lower feedback rates.

Complexity Comparison.
Figure 5 shows complexity versus K. For K = 2 to 64 the low K complexity expressions are used, while from K = 128 to 1024 we use the high K complexity expressions. We first observe that the complexity ratio between the scheduling algorithms is nearly the same in both the low K and high K regimes. As expected, the complexity trend of MSUS and S-PPBG is not influenced by the value of K. From Figure 5 we note that for K = 5 to 50, with corresponding full-load probability in the range from 1% to 95%, the computational cost of MG is from 2.2 to 18.5 times the cost of PBG, with a factor increasing in K; as expected, the preselection technique further reduces complexity by a factor of 1.2 to 1.4 with respect to PBG. We note also that the complexity of S-PPBG is only 2.4 to 2.9 times the complexity of MSUS. As the complexity of PPBG is bounded between that of PBG and S-PPBG, and these two are very close, we do not show PPBG in Figure 5.
In the high K scenario, simulations confirm the analysis; in fact, for $K = 400$ we have $C_{\mathrm{MG}} = 2.61 \cdot 10^{6}$, $C_{\mathrm{PBG}} = 9.4 \cdot 10^{4}$, $C_{\mathrm{MSUS}} = 3.49 \cdot 10^{4}$, and $C_{\mathrm{S\text{-}PPBG}} = 11.9 \cdot 10^{4}$. We underline that in the high K regime S-PPBG complexity is higher than that of PBG because of the required power distribution; indeed, the simplification due to preselection does not compensate for the need to redistribute the total power. On the other hand, we note that the high complexity required by MG is mainly due to the evaluations of the beamformer at each step.
Memory requirements, investigated in Section 6, do not show large differences among the methods; for $K = 400$ the required memory locations are 35890 for MG, 29682 for PBG, 29730 for S-PPBG, and 33841 for MSUS. Hence, the simplified techniques achieve a reduction of memory requirement with respect to existing algorithms.
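The complexity figures quoted for K = 400 can be put on a common scale by normalizing them to the cheapest method. The snippet below is plain arithmetic on the values reported in the text; the variable names are illustrative.

```python
# CMUX counts reported for K = 400 in the high K scenario.
cost = {"MG": 2.61e6, "PBG": 9.4e4, "MSUS": 3.49e4, "S-PPBG": 11.9e4}

# Cost of each algorithm relative to MSUS, the cheapest one.
ratios = {name: value / cost["MSUS"] for name, value in cost.items()}
mg_over_pbg = cost["MG"] / cost["PBG"]
```

Normalized in this way, MG is the only method that is more than an order of magnitude more expensive than MSUS at this load.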

Conclusions
This paper has provided an overview of scheduling problems for multiuser downlink MIMO OFDM systems. We have first shown that scheduling according to a wide class of utility functions can be reduced to a scheduling problem aiming at maximizing the weighted sum rate of the system, under a proper choice of the weighting function. Then we have compared scheduling algorithms having as objective the maximization of the weighted sum rate, including greedy algorithms based on throughput maximization and algorithms based on the semiorthogonality among MIMO channels. Extensions to an OFDM scenario of algorithms originally devised for flat-fading single-carrier systems have been investigated. The comparison has been carried out in terms of both computational complexity and achievable throughput.
Several insights on the performance of state-of-the-art scheduling algorithms can be drawn from the numerical results. Firstly, the MG approach achieves an average sum rate which is very close to the maximum value achieved by ES, over a wide range of cell loads. When compared against MSUS, the greedy techniques gain about 1 bit/s/Hz in average sum rate in the high K scenario (16.40 versus 15.40 bit/s/Hz). Moreover, MG requires a significantly lower complexity than ES, but a much higher one than MSUS ($2.61 \cdot 10^{6}$ versus $3.49 \cdot 10^{4}$ CMUX for $K = 400$), whereas the projection-based variants PBG and S-PPBG retain the sum rate of MG at a complexity within a factor of about 3 of that of MSUS. Hence, we believe that the projection-based greedy methods provide a good trade-off between performance and complexity.
Lastly, limitations in the feedback rate have a severe impact on the performance of all scheduling approaches. Indeed, we have seen that all schedulers yield an average sum rate that increases linearly with the number of bits used to feed back the CSI, with an increase of about 1 bit/s/Hz for each additional feedback bit.

Appendices
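A. Proof of (15)
We aim at solving

$$\mathcal{P}(n) = \arg\max_{\mathcal{I} \subseteq \mathcal{C}} \sum_{k=1}^{K} U_k(B_k(n)). \tag{A.1}$$

From (12) and (11), problem (A.1) can be rewritten as

$$\mathcal{P}(n) = \arg\max_{\mathcal{I} \subseteq \mathcal{C}} \sum_{k=1}^{K} U_k\left(\frac{1}{\rho_k}\left[(1 - \alpha_k)\,T_k(n) + \delta_k(\mathcal{I})\,\alpha_k R_k(n, \mathcal{I})\right]\right),$$

where $\delta_k$ indicates that user $k$ is scheduled, that is, $\delta_k(\mathcal{I}) = 1$ if $(k, c) \in \mathcal{I}$ and $\delta_k(\mathcal{I}) = 0$ otherwise. Following the derivations of [25], we observe that for all but the scheduled users the allocated rate at slot $n$ is zero; therefore we have

$$\mathcal{P}(n) = \arg\max_{\mathcal{I} \subseteq \mathcal{C}} \sum_{k : (k,c) \in \mathcal{I}} U_k\left(\frac{1}{\rho_k}\left[(1 - \alpha_k)\,T_k(n) + \alpha_k R_k(n, \mathcal{I})\right]\right).$$

Under the assumption $(1 - \alpha_k)\,T_k(n) \gg \alpha_k R_k(n, \mathcal{I})$, the following approximation holds where $(k, c)$ is the generic candidate stream.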