Throughput-Optimal Scheduling with Low Average Delay for Cellular Broadcast Systems

While a number of scheduling policies achieve the maximum throughput region, the average delay minimization problem for cellular broadcast systems still awaits its complete solution. To this end, we introduce a scheduling policy which decomposes the cross-layer delay optimization problem into two subproblems: allocation of physical resources and user priority management. The first subproblem is translated into a weighted sum rate maximization problem that can be efficiently solved for different channel models. The solution of the second subproblem determines the weight factors in the maximization problem expressing the priorities of users. For the latter subproblem we present a so-called idle state prediction algorithm minimizing our relevant delay measure. Analytical and simulative tools are used to show that the introduced scheduling policy provides both optimal throughput and low delay performance.


Introduction
Allocating limited resources at the medium access control (MAC) layer and the physical (PHY) layer among users (from now on briefly: scheduling) is a fundamental problem in the design of next generation wireless systems. In general, a scheduling problem can be formulated as an optimization problem, where the objective is to maximize or minimize some system performance measure under PHY layer constraints as well as quality of service (QoS) constraints on the MAC layer. One important performance measure is the total system throughput, which is therefore often taken as the objective of the optimization problem. Throughput-optimal scheduling policies are those policies that can support any vector of arrival rates inside the ergodic achievable rate region [1,2]. Quite a few scheduling policies achieve this figure of merit [1,3-9].
An important observation is that even though two different scheduling policies have the same throughput performance, they might significantly differ in, for example, their packet delay performance. Hence, in a system with random packet arrivals stored temporarily in queues, an enhanced performance criterion is to keep the queue lengths as short as possible so that the average packet delay of each user is minimized. One widely applied scheduling policy is the maximum weight matching scheduling (MWMS) policy which maximizes the sum of the rates weighted by the packet queue length of each user [3,4,10-12]. It was shown in [12] that the MWMS policy is delay-optimal for multiple-access channels. However, this result is based on the polymatroidal structure of the capacity region of multiple-access channels. For broadcast channels (BCs), MWMS is not delay-optimal even with symmetry assumptions. Motivated by this fact, Seong et al. introduced in [9] another throughput-optimal scheduling policy called queue proportional scheduling (QPS) which provides superior delay and fairness properties for the BC compared to MWMS. It minimizes the maximum draining time of the queueing system without new arrivals. Based on the QPS policy, the delay region for such queueing systems can be characterized if the channel state is quasistatic [13].
A disadvantage of the approach in [9] is that the cost function is not directly related to the average packet delay, which by Little's law can be calculated as the average queue length divided by the average arrival rate. Based on this expression, a scheduling problem can be formulated as a cross-layer optimization problem containing system parameters and constraints in the PHY layer and the MAC layer. A direct solution of such a problem involves a large number of optimization variables and is intractable. The focus of this paper is to provide a new approach to this problem.
EURASIP Journal on Advances in Signal Processing
Contributions. We first analyze some general characteristics of throughput-optimal scheduling policies. It is shown that they can be generally formulated as weighted sum rate maximization problems differing only in the choice of the weight factors. Furthermore, we prove that for throughput-optimal policies the weight factors are independent of the current channel state. Hence, the cross-layer scheduling problem is decomposed into two separate optimization subproblems: (1) finding the optimal weight factors according to the queue states; (2) solving the weighted sum rate maximization problem with respect to the instantaneous channel states.
The two subproblems are coupled only by the weight factors in the maximization problem. Interestingly, it was pointed out in [14,15] that the solutions of such an optimization problem themselves exhibit a layered structure with only a limited degree of cross-layer coupling. Under some mild conditions, the complete optimization problem can be decomposed into several subproblems, and the interfaces among them are quantified as the optimization variables coordinating the subproblems. For Step 1 we introduce an iterative algorithm called the idle state prediction (ISP) algorithm to obtain the optimal weight factors. This algorithm calculates the delay-optimal weight factors under the assumption that no new arrivals occur in the future by using the ergodic achievable rate region and the current queue state. Obviously, since we assume a dynamic scenario with random arrivals, the weight factors have to be recalculated in each time slot according to the updated queue state. Once the weight factors are fixed, the actual resource allocation is determined by maximizing the weighted sum of rates according to the instantaneous channel state. Note that we do not further comment on the weighted sum rate maximization problem in Step 2, for which efficient algorithms already exist (multiple input multiple output (MIMO) [16-18], orthogonal frequency division multiplex (OFDM) [19,20]). Simulations show that the delay performance can be significantly improved by the introduced scheduling policy.
The rest of this paper is organized as follows. In Section 2 we describe the system model and define the stability and delay measurement used in this paper. The characteristics of the throughput-optimal scheduling are analyzed in Section 3. Based on these characteristics, we introduce our parameter separation concept for the design of throughput-optimal policies. In Section 4 we present our scheduling policy for both static and dynamic channels.
The scheduler is evaluated through simulations in Section 5. Finally, we conclude with Section 6.
Notations. We use boldface letters to denote vectors; normal-font letters with subscripts denote the elements of the vectors. $\|\mathbf{x}\|$ denotes the $l_1$-norm of the vector $\mathbf{x}$. The inequality between two vectors $\mathbf{x} \le \mathbf{y}$ stands for $\mathbf{x}$ being componentwise smaller than or equal to $\mathbf{y}$. Furthermore, we use $\lceil x \rceil$ to denote the smallest integer larger than or equal to $x$, and $A^c$ is the complement of a set $A$.

PHY Layer.
We consider a single-cell downlink system in which a base station simultaneously serves $M$ mobile users. The channel between the base station and each user is assumed to be constant within a time slot and to vary from one time slot to another in an i.i.d. manner. The channel state of user $m$ in the $n$th time slot is denoted as $h_m(n) \in S$, where $S$ is an arbitrary countable or uncountable set, and all channel states of the user set $\mathcal{M} := \{1, \ldots, M\}$ are collected in the vector $\mathbf{h}(n) \in S^M$. Here, the set $S$ is used to indicate that the general approach is not restricted to a specific transmission scheme. For example, in a MIMO system the channel state can be described as a matrix of complex channel gains such that $h_m(n) \in \mathbb{C}^{n_r \times n_t}$, where $n_t$ and $n_r$ are the numbers of transmit antennas at the base station and receive antennas at the mobiles, respectively. Likewise, for an OFDM system the channel state can be defined as a vector of complex channel gains on each subcarrier, $h_m(n) \in \mathbb{C}^K$, where $K$ is the number of subcarriers. In the $n$th time slot the data is transmitted through the channel at rate $\mathbf{r}(n) \in \mathbb{R}_+^M$ lying in the achievable rate region denoted as $C(\mathbf{h}(n), P)$ with given sum power budget $P$. For technical reasons we assume that the transmit rates $r_m(n)$ are uniformly bounded by some real constant $c_r > 0$.
Note that it is not relevant for the purposes of this paper in what way the achievable rate region is parameterized. It is just assumed that we are able to solve the following maximization problem:

$$\mathbf{r}(\boldsymbol{\mu}, \mathbf{h}(n)) = \arg\max_{\mathbf{r} \in C(\mathbf{h}(n), P)} \boldsymbol{\mu}^T \mathbf{r}, \tag{1}$$

where $\boldsymbol{\mu} \in \mathbb{R}_+^M$ is the vector of weight factors. The solution of (1) is a point on the convex hull of the achievable rate region $C(\mathbf{h}(n), P)$. Observe that $\boldsymbol{\mu}$ also represents the normal vector of the convex hull at the point $\mathbf{r}(\boldsymbol{\mu}, \mathbf{h}(n))$ (see Figure 1). Then, the ergodic achievable rate region $C_{\mathrm{erg}}(P)$ is given by (2), which is a convex set [19,20]. In this paper, we call the achievable rate region in time slot $n$ the instantaneous achievable rate region or just achievable rate region; it depends on the current channel state $\mathbf{h}(n)$ and the power constraint $P$. The term ergodic achievable rate region is used for the rate region defined in (2), which is averaged over all channel states.

In order to show the applicability of our approach let us provide an example. We denote by $C_{\mathrm{OFDMA}}(\mathbf{h}(n), P)$ the achievable region of an orthogonal frequency division multiple access (OFDMA) system, where each subcarrier is exclusively assigned to one user. Due to the limited number of coding and modulation schemes only certain rates are achievable. Furthermore, there is a fixed power budget on each subcarrier $k$ denoted by $p_k$, with $\sum_k p_k \le P$. The achievable OFDMA region is defined as

$$C_{\mathrm{OFDMA}}(\mathbf{h}(n), P) = \Big\{ \mathbf{r} : r_m = \sum_{k=1}^{K} \theta_{m,k}\, r_{m,k}(h_{m,k}, p_k),\ \sum_{m=1}^{M} \theta_{m,k} \le 1,\ \theta_{m,k} \in \{0, 1\} \Big\}, \tag{3}$$

where $\theta_{m,k} \in \{0, 1\}$ indicates whether user $m$ is mapped onto subcarrier $k$ and $r_{m,k}(h_{m,k}, p_k)$ is the rate of user $m$ on subcarrier $k$ with transmit power $p_k$ in one time slot (e.g., 2 bits for QPSK). The parameter $h_{m,k}$ is the reported (quantized) channel state of user $m$ on subcarrier $k$. Due to these practical constraints $C_{\mathrm{OFDMA}}(\mathbf{h}(n), P)$ is a set of discrete rate points; nevertheless, the solution of (1) achieves the points on the convex hull of $C_{\mathrm{OFDMA}}(\mathbf{h}(n), P)$ and the formulation of the ergodic achievable region $C_{\mathrm{erg}}(P)$ is still valid. A detailed description of this region can be found in [20].
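For the OFDMA example, the weighted sum rate maximization can be illustrated with a minimal sketch. It assumes that the per-subcarrier powers are fixed, that each subcarrier is given to exactly one user, and that the achievable rates are available as a table; the function name `weighted_sum_rate_ofdma` and the numbers in `rates` are hypothetical and not taken from the paper. Under these assumptions the objective separates over subcarriers, so choosing on each subcarrier the user with the largest weighted rate is optimal.

```python
def weighted_sum_rate_ofdma(mu, rates):
    """Maximize sum_m mu[m] * r_m when each subcarrier serves exactly one user.

    mu    -- nonnegative weight factors, one per user
    rates -- rates[m][k]: rate of user m on subcarrier k (hypothetical table)
    """
    num_users, num_subcarriers = len(mu), len(rates[0])
    user_rates = [0.0] * num_users
    assignment = []
    for k in range(num_subcarriers):
        # The objective is a sum over subcarriers, so a per-subcarrier
        # argmax of the weighted rate maximizes the total.
        best = max(range(num_users), key=lambda m: mu[m] * rates[m][k])
        assignment.append(best)
        user_rates[best] += rates[best][k]
    return user_rates, assignment


# Illustrative numbers only: two users, three subcarriers.
mu = [1.0, 2.0]
rates = [[4.0, 2.0, 6.0],   # user 0
         [1.0, 2.0, 2.0]]   # user 1
user_rates, assignment = weighted_sum_rate_ofdma(mu, rates)
print(user_rates, assignment)
```

Changing `mu` moves the selected rate point along the convex hull of the discrete rate set, which is the only way the queue states enter the resource allocation in the decomposition discussed later.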

MAC Layer.
Assuming that the transmission is time-slotted, data packets arrive randomly at the MAC layer and queue up in a buffer reserved for each user $m \in \mathcal{M}$. Simultaneously the data is read out from the buffers according to the system state, that is, the random channel state and the current queue lengths. Thus, the system can be modeled as a queueing system with random processes reflecting the arrival and the departure of data packets.
Denoting the queue state of the $m$th buffer in time slot $n \in \mathbb{N}$ by $q_m(n)$ and arranging all queue states in the vector $\mathbf{q}(n) \in \mathbb{R}_+^M$, the evolution of the queueing system can be written as

$$\mathbf{q}(n+1) = [\mathbf{q}(n) - \mathbf{r}(n)]^+ + \mathbf{a}(n), \tag{4}$$

where $[\cdot]^+ := \max\{\cdot, 0\}$ is taken componentwise, $\mathbf{a}(n)$ is a random vector denoting the amount of arriving packets during the $n$th time slot, and the vector $\mathbf{r}(n) \in \mathbb{R}_+^M$ is the amount of transmitted data. Without loss of generality we set the length of a time slot $T = 1$, so that $\mathbf{a}(n)$ and $\mathbf{r}(n)$ are equal to the arrival and transmit rates during time slot $n$. We assume that the size of a data packet is constant and that the arriving packets form an i.i.d. sequence of random variables over time. To simplify the notation we set the packet size to 1 without loss of generality. Further we make the technical assumption that the number of packets arriving within one time slot is uniformly bounded by a real constant $c_a > 0$. The vector of mean arrival rates is denoted as $\boldsymbol{\rho} = E\{\mathbf{a}(n)\} \in \mathbb{R}_+^M$. The transmit rate $\mathbf{r}(n)$ is determined by the applied scheduling policy. In this paper, we consider only stationary scheduling policies, which are the mappings

$$\mathcal{P}: S^M \times \mathbb{R}_+^M \to \mathbb{R}_+^M.$$

We further assume that the scheduling policies depend only on the proportion of the individual queue lengths and not on the norm $\|\mathbf{q}(n)\|$; this covers most existing policies. The rate allocation according to the scheduling policy $\mathcal{P}$ is denoted as $\mathbf{r}^{\mathcal{P}}(\mathbf{h}(n), \mathbf{q}(n))$. Since both the arrivals $\mathbf{a}(n)$ and the channel state $\mathbf{h}(n)$ are i.i.d., the evolution of the queueing system can be modeled as a discrete-time Markov chain with general state space.
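The queue dynamics can be made concrete with a small sketch. It assumes the common slotted recursion in which each buffer is first served and then receives the new arrivals, q(n+1) = max(q(n) - r(n), 0) + a(n); this form, the helper names `step_queues` and `simulate`, and the numbers are assumptions made for illustration.

```python
def step_queues(q, r, a):
    """One slot of the assumed recursion q(n+1) = max(q(n) - r(n), 0) + a(n)."""
    return [max(qm - rm, 0.0) + am for qm, rm, am in zip(q, r, a)]


def simulate(q0, rates, arrivals):
    """Return the list of queue states visited, starting with q0."""
    q = list(q0)
    trace = [q]
    for r, a in zip(rates, arrivals):
        q = step_queues(q, r, a)
        trace.append(q)
    return trace


# Illustrative numbers only: two users, two time slots.
trace = simulate(q0=[3.0, 1.0],
                 rates=[[2.0, 2.0], [2.0, 2.0]],
                 arrivals=[[1.0, 0.0], [0.0, 1.0]])
print(trace)
```

The `max(..., 0.0)` reflects that a buffer cannot deliver more data than it holds, so service offered to an empty queue is wasted.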

A Cross-Layer Performance Measure.
The throughput region is defined as the set of all arrival rate vectors for which the Markov chain in (4) is stable in some sense [2]. There exist several relevant stability measures for Markov chains in literature, for example, strongly stable, weakly stable, recurrent. In this paper we resort to the definition of weak stability as in [21].

Definition 1.
If for every $\epsilon > 0$ there exist $B > 0$ and $N_0(\epsilon)$ such that $\Pr(\|\mathbf{q}(n)\| > B) < \epsilon$ for all $n > N_0(\epsilon)$, then the Markov chain is weakly stable.
In contrast the definition of an unstable Markov chain is more subtle.

Definition 2. A Markov chain is said to be uniformly transient
Let us introduce the following terminology: we call a vector of mean arrival rates ρ stabilizable (not stabilizable) under a specific scheduling policy P when the corresponding queueing system driven by P is weakly stable (uniformly transient). It is well-known that any vector of arrival rates inside the ergodic achievable rate region is stabilizable and all other vectors of arrival rates are not stabilizable [1,2]. Thus, a scheduling policy is called throughput-optimal if it keeps the system weakly stable for any vector of arrival rates ρ which lies in the ergodic achievable rate region.
Having defined throughput-optimal scheduling policies we now introduce a relevant cross-layer performance measure for the average packet queueing delay. Consider the following quantity:

$$D(N) := \sum_{m=1}^{M} D_m(N), \qquad D_m(N) := \frac{1}{N} \sum_{n=1}^{N} \alpha_m q_m(n), \tag{6}$$

where the natural number $N \ge 1$ is the length of the observation time window and $\alpha_1, \ldots, \alpha_M$ are positive real factors. If the factors are chosen such that $\alpha_m := 1/\rho_m = 1/E\{a_m(n)\}$, and the limit $\lim_{N \to +\infty} (1/N) \sum_{n=1}^{N} \alpha_m q_m(n)$ exists and is equal to its stationary value, then $D_m(N)$ represents the average queueing delay of each packet of user $m$ as $N \to +\infty$ [22]. Note that even if the average arrival rates $\rho_m$ are not known a priori or are only approximately estimated "on the fly," so that $\alpha_m \neq 1/\rho_m$, (6) still represents a useful, measurable quantity for practical purposes. In this case, $D(N)$ is the weighted average delay where the weight factor equals $\alpha_m \rho_m$.
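The delay measure can be evaluated directly from a recorded queue trace. The sketch below assumes that D(N) is the sum over users of the time-averaged queue lengths weighted by the factors alpha_m; the function name `weighted_delay`, the trace, and the arrival rates are hypothetical.

```python
def weighted_delay(queue_trace, alpha):
    """Return (D, [D_1, ..., D_M]) for a trace of N queue states.

    queue_trace -- list of N queue vectors q(1), ..., q(N)
    alpha       -- positive factors; alpha[m] = 1 / rho[m] turns D_m into
                   the Little's-law estimate of the mean delay of user m
    """
    num_slots = len(queue_trace)
    per_user = [
        alpha[m] * sum(q[m] for q in queue_trace) / num_slots
        for m in range(len(alpha))
    ]
    return sum(per_user), per_user


# Illustrative numbers only: mean arrival rates of 2 and 1 packets per slot.
rho = [2.0, 1.0]
alpha = [1.0 / x for x in rho]
total, per_user = weighted_delay([[2, 4], [4, 0], [0, 2]], alpha)
print(total, per_user)
```

With `alpha[m] = 1 / rho[m]`, each `per_user[m]` is the average queue length divided by the average arrival rate, which is the Little's-law delay estimate in units of time slots.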

Parameter Separation Design of Throughput-Optimal Scheduling Policies
In this section, we study some general characteristics of throughput-optimal scheduling policies.

Theorem 1.
A throughput-optimal policy always allocates the rate vector on the convex hull of the instantaneous rate region.
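Proof. If the scheduling policy $\mathcal{P}$ allocates a rate vector $\mathbf{r}^{\mathcal{P}}(\mathbf{h}(n), \mathbf{q}(n))$ in the interior of the convex hull of $C(\mathbf{h}(n), P)$, we have

$$\boldsymbol{\mu}^{*T} \mathbf{r}^{\mathcal{P}}(\mathbf{h}(n), \mathbf{q}(n)) < \max_{\mathbf{r} \in C(\mathbf{h}(n), P)} \boldsymbol{\mu}^{*T} \mathbf{r}$$

for some $\boldsymbol{\mu}^* \in \mathbb{R}_+^M$. Disregarding sets of measure zero and since the policy is independent of the time index $n$, the ergodic achievable rate region $C^{\mathcal{P}}_{\mathrm{erg}}(P)$ of the policy $\mathcal{P}$ is smaller than $C_{\mathrm{erg}}(P)$,

$$C^{\mathcal{P}}_{\mathrm{erg}}(P) \subset C_{\mathrm{erg}}(P).$$

Thus, the scheduling policy does not achieve the entire ergodic rate region $C_{\mathrm{erg}}(P)$ and is not throughput-optimal.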
Theorem 1 is to be understood in the sense that if the rates are not allocated on the convex hull of the instantaneous rate region, some arrival traffic with ρ ∈ C(h(n), P) can be constructed so that the queueing system is uniformly transient. Since not all arrival rates with ρ ∈ C(h(n), P) can be supported, the policy is not throughput-optimal.
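Therefore, throughput-optimal scheduling policies can be formulated as an optimization problem,

$$\mathbf{r}^{\mathcal{P}}(\mathbf{h}(n), \mathbf{q}(n)) = \arg\max_{\mathbf{r} \in C(\mathbf{h}(n), P)} \boldsymbol{\mu}^{\mathcal{P}}(\mathbf{h}(n), \mathbf{q}(n))^T \mathbf{r}, \tag{9}$$

where $\boldsymbol{\mu}^{\mathcal{P}}$, determined by the scheduling policy $\mathcal{P}$, is a mapping from the current channel state $\mathbf{h}(n)$ and the queue state $\mathbf{q}(n)$ to the set of weight factors. In general, two mappings $\boldsymbol{\mu}^{\mathcal{P}}$ and $\boldsymbol{\mu}'^{\mathcal{P}}$ can lead to the same rate point where the convex hull has no unique supporting hyperplane. In this case, we define $\boldsymbol{\mu}^{\mathcal{P}}$ to be equivalent to $\boldsymbol{\mu}'^{\mathcal{P}}$ if they lead to the same point. The following theorem presents an important property of the mapping $\boldsymbol{\mu}^{\mathcal{P}}$.

Theorem 2.
The mapping $\boldsymbol{\mu}^{\mathcal{P}}$ which characterizes a throughput-optimal scheduling policy is independent of the current fading state $\mathbf{h}(n)$.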
Proof. We choose arbitrarily a weight vector $\boldsymbol{\mu}^*$ corresponding to a fixed boundary point $\mathbf{r}^*$ of the ergodic achievable rate region; hence $\boldsymbol{\mu}^*$ is independent of the instantaneous channel state. According to Theorem 1 we have for the channel state $\mathbf{h}$ and the queue state $\mathbf{q}$

$$\mathbf{r}^{\mathcal{P}}(\mathbf{h}, \mathbf{q}) = \arg\max_{\mathbf{r} \in C(\mathbf{h}, P)} \boldsymbol{\mu}^{\mathcal{P}}(\mathbf{h}, \mathbf{q})^T \mathbf{r}.$$

Thus, for fixed $\mathbf{q} \in \mathbb{R}_+^M$, we have

$$\boldsymbol{\mu}^{*T} E\{\mathbf{r}^{\mathcal{P}}(\mathbf{h}, \mathbf{q})\} \le \boldsymbol{\mu}^{*T} \mathbf{r}^*.$$

Equality holds if and only if $\boldsymbol{\mu}^{\mathcal{P}}(\mathbf{h}, \mathbf{q}) = \boldsymbol{\mu}^*$ and the boundary point is achieved by the scheduler $\mathcal{P}$; otherwise the scheduling policy gives a rate vector in the interior of the ergodic rate region. Therefore, if $\boldsymbol{\mu}^{\mathcal{P}}$ depends on the instantaneous channel state, we can choose an arrival process whose mean rate $\boldsymbol{\rho}^*$ fulfills Define a bounded positive function with we have if $\|\mathbf{q}(n)\|$ is sufficiently large. Since the arrival rate is bounded, $a_m(n) < c_a < +\infty$ for all $m \in \mathcal{M}$, it holds Since the last inequality holds for any $\mathbf{q}(n)$, according to [23, Theorem 8.4.2] the queueing system is uniformly transient.
As we introduced in Section 1, cross-layer design usually improves the system performance at the cost of high computational complexity. The optimization problem involves variables and constraints from both the PHY and the MAC layer. The resources that can be dynamically adapted are not limited to transmit power, but can also be extended to code, frequency, and space according to the applied physical model. At the same time, the scheduler must consider the possible evolution of the queue states in subsequent time slots. However, following the result in Theorem 2, we can define the weight vector $\boldsymbol{\mu}^{\mathcal{P}}(\mathbf{q})$ of a throughput-optimal policy as a function determined only by the queue state $\mathbf{q}$. In this way, the classical cross-layer optimization problem can be divided into two subproblems: finding the optimal weight vector $\boldsymbol{\mu}^{\mathcal{P}}(\mathbf{q})$ according to the queue states, and solving the resource allocation problem (9) with the given weight vector. By this separation of the optimization parameters, the complexity of the optimization problem is largely reduced. Since the second subproblem can be efficiently solved for various physical models, the scheduling design problem reduces to finding the optimal weight vector for the optimization problem. In particular, for the considered delay optimization problem, the interface between the two subproblems is the weight factor $\boldsymbol{\mu}^{\mathcal{P}}(\mathbf{q})$. An illustration of the scheme is shown in Figure 2. The average packet delay $D(N)$ depends on the rate allocation $\mathbf{r}^{\mathcal{P}}$, which is controlled by the weight factor $\boldsymbol{\mu}^{\mathcal{P}}$. Thus, in Subproblem 1 we aim to find the optimal weight factor which minimizes the average delay $D(N)$. The obtained weight factor $\boldsymbol{\mu}^{\mathcal{P}}$ is then used in Subproblem 2 to calculate the rate allocation $\mathbf{r}$. The details of this scheduling algorithm are introduced in the next section.
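The parameter separation can be sketched in a few lines. The sketch assumes that the instantaneous rate region is available as a finite list of candidate rate points, and it uses the MWMS rule (weights equal to the queue lengths) as a stand-in for Subproblem 1; the names `weights_mwms`, `allocate`, and `schedule` are hypothetical, and the ISP weight rule of this paper is not implemented here.

```python
def weights_mwms(q):
    """Subproblem 1 (stand-in): weights depend on the queue state only."""
    return list(q)


def allocate(mu, rate_points):
    """Subproblem 2: weighted sum rate maximization over candidate points."""
    return max(rate_points, key=lambda r: sum(w * x for w, x in zip(mu, r)))


def schedule(q, rate_points, weight_fn=weights_mwms):
    """The channel enters only through rate_points, the queues only through mu."""
    mu = weight_fn(q)
    return mu, allocate(mu, rate_points)


# Illustrative rate points of one channel state for two users.
rate_points = [(4.0, 0.0), (3.0, 2.0), (0.0, 3.0)]
for q in ([1.0, 5.0], [5.0, 1.0], [2.0, 2.0]):
    print(q, schedule(q, rate_points))
```

Any other throughput-optimal weight rule can be passed as `weight_fn` without touching `allocate`, which mirrors the statement that the two subproblems are coupled only through the weight vector.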

Scheduling Design
In this section, we introduce our scheduling policy. First, we solve the delay-optimization problem for a queueing system with a static channel and no new packet arrivals. Then, we adapt the scheduling policy to the queueing system with dynamic channels and random packet arrivals.

Scheduling Policy for a Static
where $q_m^n$, $r_m^n$ denote the queue length and transmit rate of user $m$ in time slot $n$. For convenience, we also use the superscript to denote the time slot in the following. Extending the problem (16) to each queue state $\mathbf{q}^n$, we have the equivalent optimization problem (17). Problem (17) is a convex optimization problem and we can solve it using standard "ready-to-use" methods. However, this problem involves parameters over $N$ time slots and $M$ users, which is very complicated, especially if $N$ is large. Therefore, we introduce in the following an iterative algorithm called idle state prediction algorithm to solve the problem.

(3) Set the order $\pi$ so that $q^1_{\pi(1)}/r^{(0)}_{\pi(1)} \ge q^1_{\pi(2)}/r^{(0)}_{\pi(2)} \ge \cdots \ge q^1_{\pi(M)}/r^{(0)}_{\pi(M)}$.
$\epsilon$ is the predefined error tolerance of $\eta_m$.
Formulating the Lagrangian function $L(\mathbf{r}^n, \boldsymbol{\lambda}^n)$ of (17) and denoting $\eta_m^* = \alpha_m N - \sum_{t=1}^{N} \lambda_m^t$, we can, if $\eta_m^*$ is known, get the optimal $\mu_m^n$, and the delay-optimization problem is transformed into a weighted sum rate maximization, where $\boldsymbol{\mu}^n$ is the vector of weight factors in the $n$th time slot.
The parameters $\eta_m^*$ in (19) can be interpreted as the expected service time of user $m$ if the optimal solution is applied. In the time slots $n > \eta_m^*$ the buffer of user $m$ is emptied and the corresponding transmitter is in the idle state. Based on this property, $\boldsymbol{\eta}^*$ is obtained with an iterative approach given in Algorithm 1.

Proof. In any time slot $n > \eta_m^*$, we have $\mu_m^n = 0$, which means that the buffer of the $m$th user is empty in the $n$th time slot. In any time slot $n \le \eta_m^*$, the $m$th buffer must be nonempty. Therefore, if $\boldsymbol{\eta}^{(t)} = \boldsymbol{\eta}^*$, we have $q_m(\lceil \eta_m^* \rceil) = 0$ for all $m \in \mathcal{M}$, and $\boldsymbol{\eta}^{(t+1)} = \boldsymbol{\eta}^* = \boldsymbol{\eta}^{(t)}$. The break condition in Step (6) is fulfilled and the algorithm stops at the optimum.
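The notions of service time and idle state can be illustrated by draining a backlog over a static channel without arrivals. The following sketch is not Algorithm 1: it assumes a finite list of rate points for the static channel, uses queue-length weights as a placeholder weight rule, and simply records in which slot each buffer first becomes empty together with the accumulated weighted backlog. The function name `drain` and all numbers are hypothetical.

```python
def drain(q0, rate_points, alpha, weight_fn, max_slots=10_000):
    """Serve an initial backlog without arrivals over a static rate region.

    Returns (service, cost): service[m] is the slot after which queue m is
    first empty (None if never within max_slots), and cost is the sum over
    slots of sum_m alpha[m] * q_m measured at the start of each slot.
    """
    q = [float(x) for x in q0]
    service = [0 if x == 0 else None for x in q]
    cost = 0.0
    for n in range(1, max_slots + 1):
        if all(x == 0 for x in q):
            break
        cost += sum(a * x for a, x in zip(alpha, q))
        mu = weight_fn(q)  # empty queues get weight zero under mu = q
        r = max(rate_points, key=lambda p: sum(w * x for w, x in zip(mu, p)))
        q = [max(x - y, 0.0) for x, y in zip(q, r)]
        for m, x in enumerate(q):
            if x == 0 and service[m] is None:
                service[m] = n
    return service, cost


# Illustrative numbers only: two users, three rate points.
service, cost = drain(q0=[4, 2],
                      rate_points=[(2, 0), (1, 1), (0, 2)],
                      alpha=[1.0, 1.0],
                      weight_fn=lambda q: list(q))
print(service, cost)
```

Replacing `weight_fn` changes both the order in which buffers become idle and the accumulated cost, which is the quantity the delay-optimal weights are meant to minimize.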

Scheduling Policy for Dynamic Channels.
It is worth noting that if the channel state $\mathbf{h}(n)$ varies over time and the base station knows each channel state in advance, the algorithm of the previous subsection can also be used with some modifications. However, such a noncausal scheduler is not realizable. The base station usually has only the current channel state information and statistical knowledge of the channel. In this case, the optimal weight factors are calculated using the ergodic achievable rate region and under the assumption that no new packets arrive. In the next time slot, the weight factors must be recalculated according to the new queue state.
If no new packet arrives after the time slot $n = 0$, the expected delay for a given policy $\mathcal{P}$ is where $r_m^{\mathcal{P},n}$ is the rate allocated by the policy $\mathcal{P}$ for the $m$th user in the $n$th time slot. From Theorem 2, we know that if $\mathcal{P}$ is a throughput-optimal policy, then

$$E\{\mathbf{r}^{\mathcal{P}}(\mathbf{h}^n, \mathbf{q}^n)\} = \arg\max_{\mathbf{r} \in C_{\mathrm{erg}}(P)} \boldsymbol{\mu}^{\mathcal{P}}(\mathbf{q}^n)^T \mathbf{r},$$

where $\boldsymbol{\mu}^{\mathcal{P}}$ is independent of the current channel state. Hence, the optimization problem is equivalent to Then, the optimization problem can be solved using Algorithm 2.
In the system with new packet arrivals, the weight vector μ should be recalculated according to the new queue state and the rate allocation is determined by μ and the current channel state h(n).
As we introduced in Section 2, if we choose the factor $\alpha_m = 1/\rho_m$, the limit $\lim_{N \to +\infty} D(N)$ represents the average delay of each packet. The average arrival rates $\rho_m$ can be estimated from past arrivals. However, even if the estimate deviates from the actual arrival rate, $\lim_{N \to +\infty} D(N)$ can still be considered a useful delay measure.
The ergodic achievable rate region is also estimated based on the history. The ergodic region is calculated from a number of sampled fading states, thus the computational complexity might be very high. In [9], a method is introduced to approximate the boundary surface of C erg (P) by utilizing a hypersphere. Only M + 1 boundary points on C erg (P) are necessary to characterize the hypersphere so that the complexity is significantly reduced.
In order to prove the throughput-optimality of the policy, we need some technical propositions. The following propositions show the scheduling behavior as the queue length in the system increases. Supposing that the queue lengths of some users are bounded by some constant $c \ge 0$ while the sum of the queue lengths $\|\mathbf{q}\|$ is increasing, we denote the set of these users as $G_1 := \{m \mid q_m \le c, m \in \mathcal{M}\}$ and the remainder as $G_2 = \mathcal{M} \setminus G_1$.

Proposition 1.
If $q_i$ is bounded, $i \in G_1$, and $q_j$ is unbounded, $j \in G_2$, while $\|\mathbf{q}\|$ is increasing, there exists some $B > 0$, so that for arbitrary $\epsilon_1 > 0$.
Proof. We denote the expected service time for user i, j as η i , η j and η i , η j where the initial queue length of user i is fixed to q i and the queue length of user j is increased such that q j > q j . Suppose then the weight factor The rate allocation r n i and r n i are determined by the weight factor μ n i , μ n i , then it holds r n i ≥ r n i , ∀n ∈ 1, . . . , η i .
Further, we have where $\mathbf{h}^n$ is the current channel state, and reach a contradiction. Therefore, it is shown that if $q_i$ is fixed, $\eta_i/\eta_j$ monotonically decreases with growing $q_j$ as long as $\eta_i/\eta_j > 0$ and the proof follows.
Proof. Consider two queue states $\mathbf{q}$ and $\mathbf{q}'$ with $\mathbf{q}' = \theta \mathbf{q}$ for some $\theta > 1$. The estimated service times for the queue states $\mathbf{q}$ and $\mathbf{q}'$ are denoted as $\boldsymbol{\eta}$ and $\boldsymbol{\eta}'$, and the expected rate allocations in the $n$th time slot as $\mathbf{r}^n$ and $\mathbf{r}'^n$. Without loss of generality, we set $\eta_i/\eta_j > 1$. Supposing the contrary, a relation between $\eta'_i/\eta'_j$ and $\eta_i/\eta_j$ follows which leads to a contradiction to (35). Hence, $\eta_i/\eta_j$ decreases with $\|\mathbf{q}\|$ as long as $\eta_i/\eta_j > 1$ and the proof follows.

Proof. For the proof of weak stability it is sufficient to show that for any $\boldsymbol{\rho} \in C_{\mathrm{erg}}(P)$, the Lyapunov drift $\Delta V$ is negative for some lower bounded function $V: \mathbb{R}_+^M \to \mathbb{R}_+$ [21,23]. Supposing that there are $i \in G_2$ whose queue lengths are unbounded, we choose Since $q_i$ is unbounded while $r_i < c_r$, for all $i \in G_2$, the drift Choose arbitrary $j \in G_2$ and according to Propositions 1 and 2, we have Define $\beta = \max_{\mathbf{r} \in C_{\mathrm{erg}}(P)} \min_{i \in \mathcal{M}} (r_i - \rho_i)$, we have Since the first addend in (41) is constant and the last three addends vanish with increasing $\|\mathbf{q}^n\|$, the drift $\Delta V < 0$ for $\|\mathbf{q}^n\| > B$ if $B$ is sufficiently large. Hence, the Markov chain is weakly stable and the proof follows.

Numerical Evaluations
In order to evaluate the delay performance of the introduced ISP policy, we compare our policy with two other throughput-optimal policies: MWMS [3] and QPS [9]. MWMS uses the queue length as the weight factor in the maximization problem:

$$\mathbf{r}^{\mathrm{MWMS}}(\mathbf{h}^n, \mathbf{q}^n) = \arg\max_{\mathbf{r} \in C(\mathbf{h}^n, P)} (\mathbf{q}^n)^T \mathbf{r}.$$

For the QPS policy, the weight vector is chosen as the normal vector at the boundary point of $C_{\mathrm{erg}}(P)$ where $E\{\mathbf{r}^{\mathcal{P}}(\mathbf{h}^n, \mathbf{q}^n) \mid \mathbf{q}^n\}$ is proportional to $\mathbf{q}^n$,

$$E\{\mathbf{r}^{\mathcal{P}}(\mathbf{h}^n, \mathbf{q}^n)\} = \mathbf{q}^n \max_{x \mathbf{q}^n \in C_{\mathrm{erg}}(P)} x,$$

where $x$ is a scalar. The performance of the three schedulers is compared for an OFDMA system as described in [20]. The system has 250 orthogonal subcarriers and a total bandwidth of 2.5 MHz. The multipath channel is modeled as i.i.d. block fading and the length of the channel impulse response is $L_m = 4$ for all $m \in \mathcal{M}$. The length of a time slot $T$ is 2 milliseconds and in every slot 27 OFDM symbols are transmitted per subcarrier. The modulation is adapted to the different channel states on each subcarrier and can be chosen from QPSK, 16QAM, and 64QAM. The source data is coded at rate 2/3, so that the decoding error probability at the receiver is lower than $10^{-3}$. For an average receive SNR of 15 dB the ergodic achievable rate region for two users is shown in Figure 3. Note that the small number of users is not a limitation but facilitates the description of the ergodic achievable rate region.

Figure 3: Ergodic achievable rate region of an OFDMA system for 2 users. 7 sets of arrival rates (x-marks in the figure) are chosen from the inside/outside of the rate region to test the throughput and delay performance of the system.
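The difference between the two reference policies can be seen on a toy region. The sketch below assumes a hypothetical two-user region {r >= 0 : r_1/4 + r_2/2 <= 1} in place of the ergodic region, computes the QPS rate point by bisection on the scalar x (a numerical shortcut chosen here, not the method of [9]), and evaluates the MWMS rule on the vertices of the same region. All names and numbers are illustrative.

```python
def qps_rate(q, in_region, upper=1e6, tol=1e-9):
    """Largest scaling x with x * q inside a convex region containing 0.

    Assumes q is nonzero and upper * q lies outside the region, so that
    membership of x * q is monotone in x and bisection applies.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_region([mid * v for v in q]):
            lo = mid
        else:
            hi = mid
    return [lo * v for v in q]


def mwms_rate(q, vertices):
    """MWMS: maximize the queue-weighted sum rate over the region's vertices."""
    return max(vertices, key=lambda r: sum(w * x for w, x in zip(q, r)))


def in_toy_region(r):
    # Hypothetical region standing in for the ergodic rate region.
    return all(x >= 0 for x in r) and r[0] / 4.0 + r[1] / 2.0 <= 1.0


vertices = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
q = [1.0, 3.0]
print(qps_rate(q, in_toy_region), mwms_rate(q, vertices))
```

For this queue state MWMS serves only the second user, whereas QPS splits the rate in proportion to the backlog, so that both queues would empty at the same time if no new packets arrived.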
We  Figures 3 and 4, the arrival rates are converted to Mbit/s for convenience. In order to verify the stability properties of the system the last set of arrival rates is chosen to lie outside the ergodic achievable rate region.
The average packet delay in the system with the selected sets of arrival rates is shown in Figure 4. It can be seen that for the sets of arrival rates inside the ergodic rate region the system is kept stable in the sense that the average packet delay is finite. For the arrival rates ρ = [977; 2440] packet/s the packet delay tends to go to infinity. All three scheduling policies are throughput-optimal. Compared to the other two scheduling policies, the introduced scheduling policy has the best delay performance and achieves a significant gain. Note that in order to show the rapid growth of the delay time by increased arrival rates, the y-axis is logarithmically scaled.
In Figure 5, we compare the delay performance with respect to the number of supported users in the system. The number of users is increased from $M = 2$ to $M = 16$, while the sum of the expected arrival rates remains the same. Denoting the sum of the expected arrival rates by $S_\rho$, we set $\rho_i = 2S_\rho/(3M)$ for odd $i \in \mathcal{M}$ and $\rho_i = 4S_\rho/(3M)$ for even $i \in \mathcal{M}$. The other physical parameters are the same as in Figures 3 and 4. Figure 5 shows the average packet delay in the system resulting from the different scheduling policies. Solid lines are simulated by arrival rate set 1 with $S_\rho = 1098$ packet/s, the remaining lines by arrival rate set 3 with $S_\rho = 2196$ packet/s. As in Figure 4, it can be observed that the delay increases with increasing $S_\rho$. For fixed $S_\rho$, the delay decreases with the number of users due to multiuser diversity. At the same time, because of the higher flexibility in resource allocation, the ISP scheduler provides an even larger delay gain over the other two schedulers. In the case of $M = 6$, the ISP scheduler achieves about 30% reduction in average delay compared to QPS.
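The arrival rate assignment used for this comparison is easy to reproduce. The sketch below only encodes the rule stated in the text (users indexed from 1, odd-indexed users receive 2S/(3M) and even-indexed users 4S/(3M)); the function name `arrival_rates` is hypothetical.

```python
def arrival_rates(total_rate, num_users):
    """Per-user mean arrival rates for users i = 1, ..., M.

    Odd-indexed users get 2*S/(3*M), even-indexed users 4*S/(3*M).
    The rates add up to S whenever M is even.
    """
    return [
        2.0 * total_rate / (3 * num_users) if i % 2 == 1
        else 4.0 * total_rate / (3 * num_users)
        for i in range(1, num_users + 1)
    ]


for num_users in (2, 6, 16):
    rho = arrival_rates(1098.0, num_users)
    print(num_users, rho[:2], sum(rho))
```

Since the user numbers considered range over even values, half of the users carry one third of the total load and the other half two thirds, so the total is preserved while the per-user load shrinks with M.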

Conclusion
In this paper, we presented a concept to design throughput-optimal scheduling policies for cellular BC systems. In general it is shown that the scheduling problem can be formulated as a weighted sum rate maximization problem, where the characteristics of the scheduling policy are determined by the choice of weight factors in the maximization problem, which classifies all throughput-optimal policies. Based on this concept, a throughput-optimal policy is developed to achieve low delay performance. The weight factors achieving the minimum average delay are obtained by an iterative procedure called the idle state prediction (ISP) algorithm. The convergence of the algorithm as well as the throughput-optimality of the scheduling policy are proven. Numerical results show that ISP significantly reduces the average packet delay compared to other existing scheduling policies. In systems with a larger number of users, this advantage becomes even more noticeable due to the higher flexibility in resource allocation.