Utility maximizing load balancing policies

Consider a service system where incoming tasks are instantaneously dispatched to one out of many heterogeneous server pools. Associated with each server pool is a concave utility function which depends on the class of the server pool and its current occupancy. We derive an upper bound for the mean normalized aggregate utility in stationarity and introduce two load balancing policies that achieve this upper bound in a large-scale regime. Furthermore, the transient and stationary behavior of these asymptotically optimal load balancing policies is characterized on the scale of the number of server pools, in the same large-scale regime.


Introduction
We consider a service system where incoming tasks are instantaneously assigned to one out of many heterogeneous server pools. All the tasks sharing a server pool are executed in parallel and the execution times do not depend on the class of the server pool or the number of tasks currently contending for service. Nevertheless, associated with each server pool is a concave utility function which does depend on the class of the server pool and the number of tasks currently sharing it.
The goal is to assign tasks so as to maximize the overall utility of the system, defined as the aggregate utility of all the server pools normalized by the number of server pools. We derive an upper bound for its stationary mean through an optimization problem where the optimization variable is a sequence that describes the distribution of a fractional number of tasks across the server pools; the objective of the problem is the overall utility function, and the main constraint is that the total number of tasks must be equal to the offered load of the system. We construct an optimal (fractional) task assignment that solves this problem and has a particularly insightful structure, and we formulate the upper bound for the mean stationary overall utility in terms of this solution.
Armed with the above insight, we propose and analyze two assignment policies that maintain the occupancy state of the system aligned with an optimal task assignment. Specifically, we examine a policy that assigns every new task to a server pool with the largest marginal utility; this policy is dubbed Join the Largest Marginal Utility (JLMU). We also introduce a multi-threshold policy that follows the same greedy principle but only approximately, and uses significantly less state information. The optimal threshold values depend on the typically unknown offered load of the system and are adjusted over time through an inbuilt learning scheme; thus we name this policy Self-Learning Threshold Assignment (SLTA). Assuming exponential service times, we characterize the asymptotic transient and stationary behavior of both policies on the scale of the number of server pools, and we prove that both policies achieve the upper bound for the mean stationary overall utility as the number of server pools grows large.
A fundamental difference between JLMU and SLTA is that the former is agnostic to the offered load, whereas the thresholds of the latter must be set in terms of the offered load to achieve the asymptotic optimality. We demonstrate that the optimal threshold values depend solely on the offered load and that the online learning scheme of SLTA finds these values without any prior knowledge of the offered load.

Main contributions
The main contribution of this paper is an upper bound for the mean stationary overall utility that is asymptotically tight for exponentially distributed service times, and thereby serves as a crucial performance benchmark. The asymptotic tightness of the upper bound is proved by studying the stationary behavior of JLMU and SLTA in the regime where the number of server pools grows large, and by establishing that both assignment policies achieve the upper bound in the latter regime.
The analysis of JLMU is based on a fluid limit given by an infinite system of differential equations with a discontinuous right-hand side. We prove that the associated initial value problem always has a unique solution, by making a connection with a system of integral equations, expressed in terms of Skorokhod one-dimensional reflection mappings, and using a uniqueness result for certain Kolmogorov backward equations. Moreover, we show that the fluid limit holds with respect to an 1 norm, and that the system of differential equations is globally asymptotically stable with respect to this norm. These results are used to prove that the stationary distribution of the process that describes the occupancy state of the system converges in 1 to an optimal task assignment for the offered load of the system. The asymptotic optimality of JLMU is then established by proving that the stationary overall utilities form a convergent and uniformly integrable sequence of random variables; the proof of the latter properties exploits a representation of the overall utility as a linear functional on 1 and our convergence results with respect to the 1 norm.
While SLTA is simple to implement, its analysis is inherently challenging due to the complex interdependence between two components of the policy. Namely, the dispatching rule, which depends on the multiple thresholds, and the online learning scheme, which adjusts the thresholds over time. An additional technical difficulty is that the learning scheme is triggered by excursions of the occupancy state of the system that asymptotically vanish on the scale of the number of server pools. All these challenges are overcome by using a methodology developed in [12], which differs from a traditional fluid limit analysis. We generalize this proof technique to the heterogeneous setting considered in this paper, and we also extend the methodology to establish weak convergence of the stationary distribution of the occupancy process, to an optimal task assignment for the offered load of the system. The asymptotic optimality of SLTA is then proved in a similar way as for JLMU, by establishing that all of our limit theorems hold with respect to the 1 norm and exploiting the linear representation of the overall utility function.

Related work
Load balancing and task assignment in parallel-server systems has received immense attention in the past decades; some relevant papers are [6,22,24,[31][32][33]. While traditionally the focus used to be on performance, more recently the implementation overhead has emerged as an equally important issue. In large-scale deployments, this overhead has two main sources: the communication burden of messaging between the dispatcher and the servers, and the operational cost of storing and managing state information at the dispatcher [8,9]. We refer to [3] for an extensive survey on scalable load balancing.

Single-server dynamics
Infinite-server dynamics Homogenenous setting Heterogeneous setting [3,6,8,22,24,[31][32][33] [12, 13,26] [10, 11,18] Present paper Figure 1: Schematic view of some of the related work. Most of the load balancing literature concerns systems of parallel and homogeneous single-server queues; this vast literature is surveyed in [3]. Some recent papers study single-server dynamics in heterogeneous settings or infinite-server dynamics in homogeneous settings, whereas the present paper considers a heterogeneous system with infinite-server dynamics.
While the load balancing literature has been predominantly concerned with systems of parallel single-server queues, the present paper considers an infinite-server setting where the service times of tasks do not depend on the number of competing tasks. This feature is characteristic of streaming applications, where the level of congestion does not significantly affect the duration of tasks. The level of congestion has, however, a strong impact on the amount of resources received by individual streaming sessions, and thus on the experienced quality-of-service, which can be modeled through utility functions. Infinite-server dynamics have been commonly adopted as a natural paradigm for modeling streaming sessions on flow-level [1,21] and the problem of managing large data centers serving streaming sessions has been recently addressed in [26]. Systems with infinite-server dynamics have also been analyzed in [20,28,29,34], which concern loss models that are different in nature from the setting considered in the present paper.
When the server pools are homogeneous, the overall utility is a Schur-concave function of the vector describing the number of tasks at each server pool. In this case, maximizing the aggregate utility of the system boils down to equalizing the number of tasks across the various server pools. Join the Shortest Queue (JSQ) maximizes the mean stationary overall utility of the system for exponential service times, and in fact has stronger stochastic optimality properties [23,30]. In the homogeneous setting, JLMU reduces to JSQ and is thus optimal for exponential service times. Also, SLTA reduces to the policy considered in [12,13], which asymptotically matches the performance of JSQ on the fluid and diffusion scales for exponentially distributed service times. While the policy considered in [17,35,36] is similar to SLTA in name, this policy does not equalize the queue lengths.
The problem of maximizing the overall utility of the system is more challenging if the server pools are heterogeneous as in this paper. Heterogeneity is the norm in data centers, where servers from different generations coexist because old machines are only gradually replaced by more powerful versions; as shown in Figure 1, this feature has been recently addressed in the load balancing literature for single-server models [10,11,18] but not in the infinite-server context. When the server pools are heterogeneous, it is no longer optimal to maintain an evenly balanced distribution of the load, in fact it is not even obvious at all how tasks should be distributed in order to maximize the overall utility function, and the optimal distribution of tasks across the server pools depends on this function. Another striking difference with the homogeneous setting is that JLMU is generally not optimal in the pre-limit for exponentially distributed service times; we establish that, in general, the optimality is only achieved asymptotically in the heterogeneous case.
One of the most interesting features of SLTA is its capacity to learn the offered load of the system. The problem of adaptation to unknown demands was previously addressed in [14,15,27] in the context of single-server models, by assuming that the number of servers can be right-sized on the fly to match the load of the system. However, in the latter papers the dispatching rule remains the same at all times since the right-sizing mechanism alone is sufficient to maintain small queues, by adjusting the number of servers. Different from these right-sizing mechanisms, the learning scheme of SLTA modifies the parameters of the dispatching rule over time to maximize the overall utility of the system.

Outline of the paper
In Section 2 we introduce some of the notation used throughout the paper and we formulate the upper bound for the mean stationary overall utility. In Section 3 we specify the JLMU and SLTA policies, and we state their asymptotic optimality with respect to the mean stationary overall utility. In Section 4 we present several results that pertain to the asymptotic transient behavior of these two policies, and that are used to establish their asymptotic optimality. In Section 5 we prove the upper bound for the mean stationary overall utility. In order to characterize the asymptotic behavior of JLMU and SLTA, we construct systems of different sizes on a common probability space in Section 6, where we also prove relative compactness results. Limit theorems for JLMU and SLTA are proved in Sections 7 and 8, respectively, and the asymptotic optimality of these two policies is established in Section 9. Some proofs are deferred to Appendices A, B and C.

Problem formulation
In this section we define some of the notation used throughout the paper and we state the upper bound for the mean stationary overall utility. In Section 2.1 we introduce two descriptors for specifying the state of the system and we define the overall utility function. In Section 2.2 we present the optimization problem used to derive the upper bound for the mean stationary overall utility. In Section 2.3 we construct a solution of this problem explicitly, and in Section 2.4 we use the constructed solution to formulate the upper bound for the mean stationary overall utility.

Basic notation
Consider a system with m classes of server pools. All the tasks sharing a server pool are executed in parallel and the execution times do not depend on the class of the server pool or the number of tasks currently contending for service. Nevertheless, associated with each server pool is a concave utility function which does depend on the class of the server pool and the number of tasks sharing it. For example, these functions can be used to model the overall quality-of-service provided to streaming tasks sharing an underlying resource with a fixed capacity. The objective is to assign the incoming tasks to the various server pools so as to maximize the aggregate utility of all the server pools in stationarity.
The number of server pools is denoted by n and the number and fraction of server pools of class i are denoted by A n (i) and α n (i) = A n (i)/n, respectively. We assume that tasks arrive as a Poisson process of intensity nλ with independent and identically distributed service times of mean 1/µ, and we define X n (i, k) as the number of tasks in server pool k of class i; boldface symbols are used in the paper to indicate time-dependence. Server pools of the same class that have the same number of tasks are exchangeable, thus we usually consider a different state descriptor. Specifically, we let denote the fraction of server pools which are of class i and have at least j tasks. The values of X n and q n at a given time are referred as the occupancy state or task assignment. The concave utility function associated with server pools of class i is denoted by u i and the overall utility of the system is defined as the aggregate utility of all the server pools normalized by the number of server pools. More precisely, we let Note that q n (i, j) − q n (i, j + 1) is the fraction of server pools of class i with j tasks. Thus, the overall utility may equivalently be expressed as While the overall utility function is generally not linear as a function of X n , it is always linear as a function of q n , as shown by the above expression. The total number of tasks in the system, normalized by the number of server pools, can be expressed in terms of the occupancy state q n as follows: The quantity j [q n (i, j) − q n (i, j + 1)] represents the number of tasks in server pools of class i with exactly j tasks, normalized by the total number of server pools. Hence, s n indeed corresponds to the normalized total number of tasks. Throughout the paper, we write P and E to denote the probability and expectation with respect to a given probability measure. If ρ := λ/µ denotes the normalized offered load, then the stationary distribution of the total number of tasks is Poisson with mean nρ due to the infinite-server dynamics of the system. Thus, E [s n ] = ρ in stationarity, for any task assignment policy.

Optimization problem
Based on the above, we now formulate an optimization problem which yields an upper bound for the mean stationary overall utility: In order to see that the optimum of (2) yields an upper bound for the mean stationary overall utility, consider any policy such that q n has a stationary distribution. We assume that the policy is such that the evolution of the system over time can be described by a Markov process with a possibly uncountable state space that has a stationary distribution. Let q n be a random variable with the stationary distribution of q n and define E [q n ] as the sequence whose Indeed, the utility functions are concave, so for each i there exists j i ∈ N such that u i (j) and u i (k) have the same sign if j, k > j i . Therefore, (3) follows from Tonelli's theorem.
In addition, E [q n ] satisfies the constraints of (2) because the total number of tasks in stationarity has mean nρ. Thus, E [u(q n )] is upper bounded by the optimum of (2).
Number of servers j Server pool class i Figure 2: Schematic representation of the marginal utilities. White rectangular slots and gray rectangles represent idle and busy servers, respectively. Each of the columns labeled with letters represents a server pool and the dashed lines enclose server pools of the same class. If the tasks sent to a given server pool are always placed in the first idle server from bottom to top, then the marginal utilities written on top of the idle servers indicate the increase in the aggregate utility of the system when the server receives a task.

Structure of an optimal solution
For brevity, we refer to an optimizer of (2) as an optimal task assignment; the term optimal fractional task assignment would be more appropriate since the offered load nρ may not be integral. In this section we define a ranking of the server pools that can be used to construct an optimal task assignment. For this purpose, consider the sets I := {1, . . . , m} × N and I + := {(i, j) ∈ I : j ≥ 1} .
A server pool has coordinates (i, j) ∈ I + if its class is i and it has precisely j − 1 tasks; e.g., in Figure 2, server pool A of class 1 has coordinates (1,4) and both server pools of class 3 have coordinates (3,1). Since server pools with the same coordinates are statistically identical, we may focus on ranking coordinates rather than server pools.
Formally, we define a total order on I + that gives precedence to coordinates associated with larger marginal utilities. The marginal utility of a server pool of class i with j tasks is denoted by ∆(i, j) := u i (j + 1) − u i (j) and represents the change in the utility function of such a server pool if it receives an additional task. The marginal utility of the coordinates (i, j) is just ∆(i, j − 1), the marginal utility of server pools of class i with j − 1 tasks.
Consider the dictionary order ≺ on I, defined by We obtain a total order on I + by writing (i 1 , j 1 ) (i 2 , j 2 ) if and only if one of the following conditions holds:  Figure 3: Distribution of the offered load across the various server pools under the optimal task assignment of (4) for m = 3 and α n = (1/2, 1/4, 1/4). On the left, u i (x) = x log(r(i)/x) with r = (5, 10, 15). On the right, In particular, the marginal utility of server pools with coordinates (i 1 , j 1 ) is smaller than or equal to that of server pools with coordinates (i 2 , j 2 ) (i 1 , j 1 ). The dictionary order is used to break the tie when both coordinates are associated with the same marginal utility, but a different tie breaking rule could be used instead.
Consider the task assignment q * n defined by for all (i, j) ∈ I + .
In Section 5 we prove that q * n constitutes an optimal task assignment if σ * n is defined as the unique element of I + that satisfies Figure 3 illustrates the optimal task assignments obtained through (4) for two sets of utility functions and different values of ρ. If n is such that nα n (i) and nρ are integers for all i, then the plots can be interpreted as sets of n adjacent columns, where each column represents a server pool and the colored portion of a column indicates the number of tasks sharing the server pool, as in the diagram of Figure 2. The thick vertical lines separate the server pool classes and the quantities q * n (i, j) can be read off by rotating the plots. The left plot corresponds to utility functions of the form u i (x) = xg(r(i)/x), with g a concave and increasing function. These utility functions can be used to model the aggregate quality-of-service provided to streaming tasks sharing a single server pool. The quantity r(i) represents the total amount of resources in a server pool of class i and g(r(i)/x) models the quality-of-service provided to a single task when the server pool is shared by x tasks and each task gets a fraction r(i)/x of the total resources. For a given ρ, the left plot of Figure 3 depicts server pools with roughly r(i)/c(r, ρ) tasks, with c(r, ρ) a normalizing constant that does not depend on g; for some values of ρ, all server pools have exactly r(i)/c(r, ρ) tasks, but in other cases some of these numbers are rounded. This behavior is explained by noting that the derivative of u i (x) can be expressed as a function of r(i)/x, thus the occupancy levels r(i)/c(r, ρ) equalize the marginal utilities.
In the left plot of Figure 3, the occupancy levels of the various server pools maintain approximately fixed ratios as ρ increases. The right plot shows a completely different behavior: as ρ increases from 0 to 1.25, only the occupancy of server pools of class 2 grows, but from 1.25 to 6.25, only the occupancy of server pools of class 3 grows, and eventually exceeds the occupancy of server pools of class 2. Furthermore, as ρ increases beyond 7.5, only the occupancy of server pools of class 1 increases.

Performance upper bound
We now state the upper bound for the mean stationary overall utility, which follows from the observations of Sections 2.2 and 2.3; a rigorous proof is provided in Section 5. Theorem 1. Consider any task assignment policy such that the occupancy process q n has a stationary distribution, and let q n be a random variable distributed as this stationary distribution. Then In the following section we establish that the upper bound is asymptotically achievable when service times are exponentially distributed. In particular, we will see that JLMU achieves the upper bound of Theorem 1 as the number of server pools grows large; recall that JLMU is generally not optimal in the pre-limit, not even for exponential service times. Moreover, we will establish that SLTA also achieves the upper bound asymptotically, while relying on considerably less state information.
Before proceeding, it is illustrative to draw an analogy between the setting considered in this paper and the load balancing literature for systems of parallel single-server queues, where the primary objective is to minimize queueing delay. The natural policy for the setting considered in this paper is JLMU, while the natural policy for minimizing queueing delay in systems of parallel single-server queues is JSQ. The deployment of these policies involves a considerable communication overhead, or storing and managing a significant amount of state information. In the setting considered in this paper, SLTA provides a asymptotically optimal performance for exponential service times and uses substantially less state information than JLMU. From this perspective, SLTA is the counterpart of JIQ in the load balancing literature for systems of parallel single-server queues [22,31].

Load balancing policies
In this section we describe the load balancing policies considered in the paper and we state their asymptotic optimality with respect to the mean stationary overall utility. In Sections 3.1 and 3.2 we specify JLMU and SLTA, respectively. In Section 3.3 we define stochastic models, based on continuous-time Markov chains, for the analysis of both of these policies. In Section 3.4 we state the asymptotic optimality result.

Join the Largest Marginal Utility (JLMU)
JLMU assigns every new task to a server pool that currently has the best ranked coordinates, thus also the largest marginal utility. Formally, define for each occupancy state q. The maximum is taken with respect to and the condition q(i, j − 1) > q(i, j) implies that some server pool of class i has precisely j − 1 tasks. If q n is the occupancy state right before a task arrives, then JLMU assigns the task to a server pool of class σ i (q n ) with exactly σ j (q n ) − 1 tasks. The coordinates obtained through (6) correspond to server pools with the largest marginal utility by definition of . In addition, observe that the dictionary order is used to break ties between coordinates associated with the same marginal utility. If two server pools have the same coordinates, then it does not matter which of them is assigned the new task since they are statistically identical. For definiteness, we postulate that the tie is broken uniformly at random.
If all the server pools have the same utility function, then JLMU reduces to JSQ and the overall utility is a Schur-concave function of X n . If in addition the service times are exponential, then the stochastic optimality properties proven in [23,30] for JSQ imply that JLMU maximizes the mean stationary overall utility in this homogeneous setting. It might be natural to expect that the optimality with respect to the mean stationary overall utility extends to the heterogeneous setting. We refute this, however, in Section 9.3, where we construct a heterogeneous system for which JLMU is strictly suboptimal. The constructed example also hints at the underlying reasons for the suboptimality in the heterogeneous case. Essentially, instead of always assigning incoming tasks greedily, such that the increase in the overall utility is maximal, it is sometimes advantageous to dispatch the new tasks conservatively, to hedge against pronounced drops of the overall utility that may be caused by a quick succession of departures. The right balance between greedy and conservative actions depends intricately on the utility functions, but we prove that JLMU is always asymptotically optimal for exponential service times, regardless of the specific set of utility functions; this result is stated formally in Section 3.4.

Self-Learning Threshold Assignment (SLTA)
JLMU relies on complete information about the number of tasks per server pool, which could be impractical in large-scale deployments. In contrast, SLTA only requires to store at most two bits per server pool, which is considerably less state information. In order to specify this policy, we need to describe its two components. Namely, the dispatching rule, for assigning the incoming tasks to the server pools, and the learning scheme, for dynamically adjusting a set of thresholds that the dispatching rule uses.
Consider the unique enumeration {(i k , j k ) : k ≥ 1} of I + such that (i k , j k ) (i k+1 , j k+1 ) for all k. Given r ≥ 1, we define a set of thresholds by i (r) := max {j ≥ 0 : (i, j) (i r , j r ) or j = 0} for all i.
Recall the optimal task assignment defined at the end of Section 2.3. The learning scheme keeps an estimate (i rn , j rn ) of the coordinates σ * n , which depend on the typically unknown offered load of the system. The index r n determines this estimate and is used to compute thresholds from the above expression, which are in turn used to assign tokens to the server pools. Specifically, a server pool of class i with exactly j − 1 tasks has: a green token if j − 1 < i (r n ), a yellow token if i = i rn and j − 1 ≤ i (r n ).
The first condition is equivalent to r n > 1 and (i, j) (i rn , j rn ), and the second condition is equivalent to i = i rn and (i, j) (i rn , j rn ). Also, a server pool can have both a green and a yellow token at the same time. As indicated in Figure 4, a larger increase in the overall utility is obtained by dispatching tasks to server pools with green tokens first, then to server pools with yellow tokens and only afterwards to server pools without tokens.
The tokens are used by the dispatching rule. Specifically, when a task arrives, it is assigned to a server pool according to the following criteria.
In the presence of green tokens of class i = i rn−1 , the dispatcher picks one of these green tokens uniformly at random, and if only green tokens of class i = i rn−1 remain, then one of these is picked. Then the task is sent to the corresponding server pool.
In the presence of only yellow tokens, the dispatcher picks a yellow token uniformly at random and sends the task to the corresponding server pool.
Otherwise, the task is sent to a server pool chosen uniformly at random.
If (i rn , j rn ) are the coordinates σ * n defined in (5), then this dispatching rule drives the occupancy state of the system towards the optimal task assignment q * n specified in (4). The learning scheme aims at finding the coordinates σ * n , which depend on the typically unknown offered load. The learning scheme is parameterized by β n > 0 and adjusts the Number of servers j Server pool class i Figure 4: Schematic representation of the thresholds and tokens used by SLTA for (i rn , j rn ) = (2, 2). The thresholds are indicated by thick horizontal lines that cross the server pools, and the tokens are represented using squares and circles underneath the server pools; a square corresponds to a green token and a circle corresponds to a yellow token. Assuming that the rectangular slots within a server pool are always filled from bottom to top, the slots marked with an * provide a marginal utility of ∆(i rn , j rn − 1). The symbols ≤ and ≥ indicate how the marginal utility of the other slots compares to the latter value.
value of r n at certain arrival epochs, in steps of one unit. Specifically, when a task arrives, the learning scheme acts only under the following circumstances.
If the system has at least nβ n green tokens and at least one belongs to a server pool of class i rn−1 , then r n is decremented by one after the task is dispatched.
If the number of yellow tokens is smaller than or equal to one and there are no other tokens, then r n is incremented by one after the task is dispatched.
Observe that exactly one of the thresholds changes when the value of r n is modified, and that this threshold changes by one unit. Also, nq n (i, i (r n )) and nα n (i rn ) − nq n (i rn , j rn ) are the number of green and yellow tokens, respectively.

Stochastic models
If service times are exponentially distributed, then q n and (q n , r n ) are continuous-time Markov chains when the load balancing policies are JLMU and SLTA, respectively. In either case, the process s n that describes the normalized total number of tasks is defined by (1). Due to the infinite-server dynamics of the system, ns n has the law of an M/M/∞ queue with arrival rate nλ and service rate µ.
Let 1 be the space of absolutely summable sequences in R I , equipped with the norm Throughout we assume that s n (0) is finite, so q n (0) takes values in 1 . As a result, if we let F n := {k/n : 0 ≤ k ≤ n}, then q n takes values in the set If the load balancing policy is JLMU, then the state space of q n is defined as the subset S n of Q n that is reachable from an empty occupancy state. If the load balancing policy is SLTA, then the state space of (q n , r n ) is the subset S n of Q n × {r ∈ N : r ≥ 1} that is reachable from an empty occupancy state with r n = 1. The notation used for the processes s n and q n , for the state space S n , and for some other objects that will be defined later, is exactly the same for JLMU and SLTA, but we always indicate which policy is being considered.

Asymptotic optimality
Throughout the rest of the paper, we assume that there exist constants α(i) ∈ (0, 1) and a random variable q 0 such that the following limits hold: In analogy with (5) and (4), we consider the unique σ * = (i r * , j r * ) ∈ I + such that and we define a occupancy state q * in terms of σ * by In Section 9.1 we establish that q n and (q n , r n ) have a unique stationary distribution for all n, when the assignment policies are JLMU and SLTA, respectively. The following theorem is proved in Section 9.2 and implies that both policies are asymptotically optimal if the marginal utilities are bounded and the service times are exponentially distributed; we also require that (7) holds and we impose some mild technical assumptions, to be stated in Section 4.2.1. Note that the condition on the marginal utilities always holds when the q n (t) qn utility functions are non-decreasing, due to the concavity of these functions.

Theorem 2.
Suppose that service times are exponentially distributed. Then the following statements hold under (7) and the assumptions of Section 4.2.1.
(a) Suppose that JLMU is used and let q n have the stationary distribution of q n . Then the random variables q n converge weakly in 1 to q * .
(b) Suppose that SLTA is used and let (q n , r n ) have the stationary distribution of (q n , r n ).
The random variables (q n , r n ) converge weakly in 1 × N to (q * , r * ).
Furthermore, if the load balancing policy is either JLMU or SLTA and the marginal utilities are bounded, then the random variables u(q n ) are uniformly integrable and The claims concerning the stationary overall utilities are proved using (a) and (b), as well as the fact that u(q) is a bounded linear functional of q ∈ 1 . In order to establish (a) and (b), we first use drift analysis to prove that the random variables in (a) and (b) are tight in 1 and 1 × N, respectively. Then (a) is established through an interchange of limits argument based on a fluid limit and a global asymptotic stability result for the fluid dynamics; these two results are stated in Theorems 5 and 4, respectively. A different type of argument is used to prove (b). Namely, the fluid limit step is circumvented and Theorem 6 serves as the counterpart of Theorems 4 and 5, as illustrated in Figure 5.
Deriving a fluid limit for a SLTA system would be inherently difficult due to the intricate interdependence between the dispatching rule and the learning scheme, and because the actions of the learning scheme are triggered by excursions of the occupancy process that have vanishing size. We deal with these challenges using a methodology of [12] to derive the fluid approximation of Theorem 6, which consists of asymptotic bounds, over arbitrarily long intervals of time, for the occupancy state and the thresholds. As noted earlier, this   fluid approximation serves as a counterpart of both the fluid limit and the global asymptotic stability results for JLMU.

Simulation experiments
The asymptotic optimality of JLMU and SLTA is illustrated by Table 1, which shows estimates of the mean stationary overall utility E[u(q n )] for simulation experiments with different values of n. All the estimates correspond to systems with two server pool classes of equal size and utility functions of the form u i (x) = x log(r(i)/x) for r = (20,30). Also, two different values of ρ are considered, so that the optimal task assignment defined in (4) takes two distinct forms. For ρ = 9.75, the optimal task assignment is such that all server pools of class 1 have 8 tasks, half of the server pools of class 2 have 11 tasks and the other half have 12 tasks. For ρ = 10, the optimal task assignment is such that all server pools of class 1 have 8 tasks and all server pools of class 2 have 12 tasks.
All the server pools of the same class have the same number of tasks when ρ = 10, and thus we say that the optimal task assignment q * n is integral; i.e., the optimal task assignment is integral if q * n (σ * n ) = 0. In contrast, the optimal task assignment is fractional when ρ = 9.75 since server pools of class 2 may have either 11 or 12 tasks. Server pools of class 1 behave similarly in the fractional and integral settings: almost all of the time all server pools of class 1 have precisely 8 tasks when n is moderately large. However, the behavior of server pools of class 2 depends on the setting. In the fractional case, server pools typically have 11 or 12 tasks and the fractions of server pools with 11 and 12 tasks oscillate around a half. In the integral case, server pools typically have 12 tasks but a small number of server pools sometimes have 11 or 13 tasks instead. The aggregate utility of the system decreases by ∆(2, 11) whenever a server pool of class 2 goes from 12 to 11 tasks, and increases by the same quantity when the server pool goes from 11 to 12 tasks. Therefore, the contributions to the average overall utility of the oscillations observed in the fractional case roughly balance each other. In the integral case, the aggregate utility increases by ∆(2, 12), instead of ∆ (2,11), if a server pool of class 2 goes from 12 to 13 tasks. So in the integral case the contributions to the average overall utility of class 2 server pools that drop to 11 tasks or reach 13 tasks are amplified by different marginal utilities. As a result, the mean stationary overall utility E[u(q n )] is closer to the upper bound u(q * n ) in the fractional case, as reflected by the estimates in Table 1.
Although there is a difference between the fractional and integral settings, in both settings the empirical mean of the overall utility u(q n ) is extremely close to the upper bound u(q * n ) across all the values of n listed in Table 1. Furthermore, the deviation of the empirical mean from the upper bound approaches zero as n increases in both cases. We also observe that the empirical mean of u(q n ) is almost the same for JLMU and SLTA in all the experiments, and particularly for the largest values of n.
A final remark on the simulation experiments is that the empirical mean of u(q n ) is slightly larger than u(q * n ) in a few of the experiments within the fractional setting: both for JLMU and SLTA when n = 350 and just for JLMU when n = 450. It may be checked that the statement of Theorem 1 still holds if the stationary expectation sign is replaced by a time average and the upper bound is computed through (4) and (5) Table 1, and in all the experiments the empirical mean of u(q n ) is indeed smaller than this empirical upper bound. Thus, the experiments where the empirical mean of u(q n ) slightly exceeds the upper bound u(q * n ) are an indication of how close the performance of JLMU and SLTA is to optimal.

Approximation theorems
In this section we assume exponential service times and we state several results used to prove Theorem 2. In Section 4.1 we specify a fluid model of a JLMU system, based on differential equations, and we state some properties of this model. In Section 4.2 we state limit theorems that characterize the asymptotic transient behavior of JLMU and SLTA.

Fluid model of JLMU
Consider a large-scale system where the load balancing policy is JLMU, and assume that α(i) is the fraction of server pools of class i. Then the occupancy state of the system remains within the set The evolution of the occupancy state of this large-scale system can be modeled through the system of differential equations introduced in the following definition. Definition 1. We say that q : [0, ∞) −→ Q is a fluid trajectory if the coordinate functions q(i, j) are absolutely continuous for all (i, j) ∈ I and the following conditions hold almost everywhere with respect to the Lebesgue measure: where Λ : In the latter definition, λ is the arrival rate of tasks normalized by the number of server pools, µ is the service rate of tasks and q(i, j) represents the fraction of server pools which are of class i and have at least j tasks. Thus, the system of differential equations (10) has a simple interpretation. The right-most term of (10a) corresponds to the departure rate of tasks from server pools of class i with exactly j tasks and Λ(q, i, j) represents the arrival rate of tasks to server pools which belong to class i and have precisely j − 1 tasks. The definition of Λ is motivated by the following remarks.
Server pools of class i with exactly j − 1 tasks are not assigned additional tasks if (i, j) σ(q). Hence, we should have Λ(q, i, j) = 0 in this case.
All server pools of class i have at least j tasks if (i, j) σ(q). Therefore, Λ(q, i, j) should be equal to the last term of (10a) in this case, since q(i, j) is at its maximum value and thus its derivative should be zero.
The total arrival rate of tasks normalized by the number of server pools is equal to λ and this determines the value of Λ(q, i, j) when (i, j) = σ(q).

Properties of fluid trajectories
The two results stated below are proved in Section 7.1. The first one is a uniqueness theorem for the solutions of (10). Existence is ensured by Theorem 5 of Section 4.2.
Theorem 3. For each initial condition q ∈ Q there exists at most a unique fluid trajectory q with initial condition q(0) = q.
In order to prove this theorem, we first show that all fluid trajectories satisfy an infinite system of integral equations, stated using Skorokhod one-dimensional reflection mappings. The theorem is then proved using a Lipschitz property of these mappings and a uniqueness result for certain Kolmogorov backward equations.
Besides uniqueness of solutions of (10), we also establish that there exists a unique equilibrium point and that this equilibrium point is globally asymptotically stable, i.e., all fluid trajectories converge to the unique equilibrium over time.
Theorem 4. Let q * be as in (9). Then q * is the unique equilibrium of (10). Furthermore, all fluid trajectories converge to q * in 1 over time.
Recall that (9) is the counterpart of (4), which is used to formulate the upper bound for the mean stationary overall utility provided in Theorem 1. It is not difficult to check that u (q * n ) → u (q * ) as n grows large, which hints at the asymptotic optimality of JLMU.

Limit theorems
In Section 6.1 we construct the processes defined in Section 3.3 on a common probability space (Ω, F, P) for all n, in such a way that the sample paths of the occupancy processes lie in the space D 1 [0, ∞) of càdlàg functions with values in 1 , which we endow with the topology of uniform convergence over compact sets. This construction is used to prove limit theorems that characterize the asymptotic transient behavior of JLMU and SLTA. Before stating these theorems, we introduce some mild technical assumptions.

Technical assumptions
As indicated earlier, we assume that (7) holds with q 0 a random variable that takes values in Q and represents the limiting initial occupancy state. The initial number of tasks in the limit, normalized by the number of server pools, is defined as If the load balancing policy is SLTA, then we assume that the first inequality in (8) is strict and that there exists a constant γ 0 ∈ (0, 1/2) such that lim n→∞ β n = 0 and lim inf n→∞ n γ 0 β n > 0.
These assumptions are used to prove that the learning scheme reaches an equilibrium in all large enough systems with probability one. Finally, we adopt the following assumptions about the initial state of the system: there exists a random variable R ≥ 1 such that for all n with probability one. We impose (13b) just to simplify the analysis; this property always holds after a certain time, which depends on the initial state of the system.

Remark 1.
Property (13b) is preserved by arrivals and departures, thus it holds at all times provided that it holds at time zero. Furthermore, every new task is sent to a server pool with coordinates (i, j) (i rn , j rn ) if the number of tokens is positive right before the arrival. Hence, (13b) implies that tasks are sent to server pools with coordinates (i, j) (i rn , j rn ) at all times and for all n with probability one.

Statements of the theorems
First we state a fluid limit for JLMU, proven in Section 7.2. In view of Theorem 4, this fluid limit implies that, as n grows large, the occupancy processes of JLMU approach functions which converge over time to the unique equilibrium of (10).

Theorem 5.
Suppose that the load balancing policy is JLMU. Then there exists a set of probability one Γ with the following property. If ω ∈ Γ, then q n (ω) converges in D 1 [0, ∞) to the unique fluid trajectory with initial condition q 0 (ω).
Since q 0 is arbitrary, the above theorem implies that solutions to (10) exist for all initial conditions. Therefore, Theorems 3 and 5 imply that for each initial condition q ∈ Q there exists a unique fluid trajectory with initial condition q.
The proof of Theorem 5 uses a methodology of [4] to prove that, with probability one, every subsequence of {q n : n ≥ 1} has a further subsequence which converges uniformly over compact sets with respect to a metric for the product topology of R I . Then we show that this convergence in fact holds with respect to || · || 1 and that the limits of convergent subsequences are fluid trajectories, also with probability one.
The counterpart of Theorems 4 and 5 for SLTA is the following result. The proof is provided in Section 8.2 and is based on a methodology developed in [12]. Theorem 6. Suppose that the load balancing policy is SLTA and let σ * and r * be as in (8).

Performance upper bound
In this section we prove Theorem 1, which provides an upper bound for the mean stationary overall utility of a system where the service time distribution has a finite mean but is otherwise general. Recall from Section 2.2 that the optimum of (2) is an upper bound for the mean stationary overall utility. Therefore, we only need to prove that the task assignment q * n defined in (4) is an optimizer of (2).
We say that a sequence q ∈ R I is eventually zero if there exists k > 0 such that q(i, j) = 0 for all i and j > k. The following lemma implies that q * n is an optimizer of (2) if we impose the additional constraint that the solution must be eventually zero.

Lemma 1.
If q satisfies the constraints of (2) and is eventually zero, then u(q) ≤ u(q * n ).
Proof. Since q is eventually zero, it is possible to write In the last expression, the terms of the summation are ordered with respect to , and in particular in non-increasing order of the marginal utilities ∆(i, j − 1). The task assignment q * n is obtained by choosing the coefficients q(i, j) so that the first coefficients are maximal, while all the coefficients add up to ρ. Thus, q * n maximizes the right-hand side of (15).
We now provide a solution of (2), without imposing any additional constraints. Proposition 1. The task assignment q * n defined in (4) is an optimizer of (2).
Proof. By Lemma 1, it suffices to prove that u(q) ≤ u(q * n ) for each q that is not eventually zero and satisfies the constraints of (2). Next we fix one such sequence q and we construct Figure 6: Schematic view of the construction of z(i, · ) from q(i, · ) for some fixed i.
an eventually zero sequence z such that u(q) ≤ u(z) and z satisfies the constraints of (2). Then u(q) ≤ u(z) ≤ u(q * n ) by Lemma 1, as desired.
Choose k ∈ N such that α n (i)k > ρ for all i. For each i, we define z(i, j) iteratively, by Informally, each coefficient q(i, j) can be regarded as a container with capacity α n (i), as shown in Figure 6. For each i, the sequence q(i, · ) is transformed into z(i, · ) in two steps: first we remove all the mass from the coefficients q(i, j) with j > k, and then we place this mass on the coefficients q(i, j) with j ≤ k. In the latter step, we start with the first coefficient, placing as much mass as possible without exceeding the capacity α n (i). The remainder of mass is placed in the following coefficients in the same fashion, and in increasing order of j. Observe that all the mass will have been placed right after the coefficient q(i, k) is done since q satisfies the constraints of (2) and α n (i)k > ρ. Furthermore, the following property holds: The overall utility of the task assignment q satisfies: For the last step, recall that z is eventually zero, so u(z) can be computed as in (15).
The middle equality and last inequality follow from (16) and Lemma 1, respectively.
The proof of Theorem 1 follows easily from Proposition 1.
Proof of Theorem 1. As indicated at the end of Section 2.2, the optimum of (2) upper bounds the mean stationary overall utility E [u(q n )]. Thus, it follows from Proposition 1 that u (q * n ) upper bounds the mean stationary overall utility.

Strong approximations
In this section we construct the processes defined in Section 3.3 on a common probability space for all n. In addition, we prove that {q n : n ≥ 1} is almost surely relatively compact in D 1 [0, ∞) both for JLMU and SLTA. The construction of the processes is carried out in Section 6.1 and the relative compactness results are provided in Section 6.2.

Coupled construction of sample paths
Consider the following stochastic processes and random variables.

Construction for JLMU
Let N λ n (t) := N (nλt) for each t ≥ 0 and each n. This quantity will be used to count the number of tasks arriving to the system with n server pools during the interval [0, t]. Also, denote the jump times of N λ n by {τ n,k : k ≥ 1} and define τ n,0 := 0. For each function q : [0, ∞) −→ Q n and each n, we define two counting processes, for arrivals and departures, denoted A n (q) and D n (q), respectively. The coordinates (i, j) ∈ I of these processes are identically zero if j = 0, whereas the other coordinates are defined as follows: For each n, the functional equation has a unique solution with probability one. More precisely, there exists a set of probability one Γ 0 with the following property: for each ω ∈ Γ 0 and each n, there exists a unique càdlàg function q n (ω) : [0, ∞) −→ Q n that solves (18). This solution can be constructed by forward induction on the jump times of the driving Poisson processes. The assumption q n (ω, 0) ∈ Q n implies that q n (ω, 0) has finitely many non-zero coordinates and ensures that the constructed solution is defined on [0, ∞) with probability one; i.e., the constructed solution does not explode in finite time.
The occupancy processes are defined by extending the above solutions to Ω, setting q n (ω, t) = 0 for all t ≥ 0 and all ω / ∈ Γ 0 . In addition, we let A n := A n (q n ) and D n := D n (q n ), and we note that the sample paths of A n , D n and q n lie in D 1 [0, ∞). The functional equation (18) can now be rewritten as follows: This construction endows the processes q n with the intended statistical behavior. The processes A n (i, j) count the arrivals to server pools of class i with precisely j − 1 tasks and the processes D n (i, j) count the departures from server pools of class i with exactly j tasks. Indeed, A n (i, j) has a jump at the arrival epoch τ n,k if and only if the incoming task should be assigned to a server pool of class i with j − 1 tasks under the JLMU policy. In addition, the intensity of D n (i, j) equals the total number of tasks in server pools of class i with exactly j tasks times the rate at which tasks are executed, and this totals the departure rate from server pools of class i with precisely j tasks.

Construction for SLTA
The processes (q n , r n ) are constructed to a large extent as in Section 6.1.1 when the load balancing policy is SLTA. The only differences arise in (17a) and (18). Specifically, (17a) has to be modified to capture the dispatching rule of SLTA and (18) must be accompanied by another equation, for describing the evolution of r n .
The counterpart of (17a), with an extra argument r : [0, ∞) −→ {r ∈ N : r ≥ 1}, is The functions η k are defined in Appendix A using the selection variables U k , so that they have the following property. If (q, r) is the value of (q n , r n ) when the k th task arrives, then SLTA sends this task to a server pool of class i with precisely j − 1 tasks if and only if The analog of the functional equation (18) is where the sets I n,k and D n,k are defined formally in Appendix A. The former set indicates that (q, r) corresponds to a system with no green tokens and at most one yellow token right before the k th arrival. The latter set indicates that the number of green tokens is larger than or equal to nβ n and that at least one of these tokens belongs to a server pool of class i r−1 right before the k th arrival. Also, D n (q) is defined as in (17b). As in Section 6.1.1, there exists a set of probability one Γ 0 with the next property. For each ω ∈ Γ 0 and each n, there exists a unique pair of càdlàg functions q n (ω) : [0, ∞) −→ Q n and r n (ω) : [0, ∞) −→ {r ∈ N : r ≥ 1} that solve (20); these functions can be constructed by forward induction on the jumps of the driving Poisson processes. The processes (q n , r n ) are defined by extending the above solutions to Ω, setting q n (ω, t) = 0 and r n (ω, t) = 0 for all t ≥ 0 and all ω / ∈ Γ 0 . In addition, we define A n := A n (q n , r n ) and D n := D n (q n ), and we note that the sample paths of A n , D n and q n lie in D 1 [0, ∞). The functional equation (20) can now be rewritten as follows: for all ω ∈ Γ 0 and all t ≥ 0.

Relative compactness results
Let D R I [0, ∞) denote the space of càdlàg functions on [0, ∞) with values in R I . We endow the space R I with the metric defined in Appendix B, which is compatible with the product topology, and we equip D R I [0, ∞) with the topology of uniform convergence over compact sets. The following proposition is proved in Appendix B.

Proposition 2.
Suppose that the load balancing policy is JLMU or SLTA. There exists a set of probability one Γ ∞ where: are relatively compact subsets of D R I [0, ∞) for all ω ∈ Γ ∞ and satisfy that the limit of every convergent subsequence is a function with locally Lipschitz coordinates. If the load balancing policy is SLTA, then there exists a random variable R ≥ 1 such that, apart from the above properties, we also have on Γ ∞ that: r n (0) ≤ R and q n (0, i, j) < α n (i) for all (i, j) i rn(0) , j rn(0) and n.
The product topology of R I is coarser than the topology of 1 , thus convergence in D R I [0, ∞) does not imply convergence in D 1 [0, ∞). The following technical lemma is used to demonstrate that {A n : n ≥ 1}, {D n : n ≥ 1} and {q n : n ≥ 1} are relatively compact in D 1 [0, ∞) with probability one; the proof is given in Appendix A.

Lemma 2.
Suppose that the load balancing policy is JLMU or SLTA. There exists a set of probability one Γ ⊂ Γ ∞ with the following property. For each ω ∈ Γ and T ≥ 0, there exist j T (ω) and n T (ω) such that Proof. We fix some ω ∈ Γ which we omit from the notation. For {A n : n ≥ 1}, the claim is a straightforward consequence of Proposition 2 and Lemma 2. Below we prove the claim for {q n : n ≥ 1}. If the load balancing policy is JLMU, then (19) and (22a) imply that the claim also holds for {D n : n ≥ 1}. If the load balancing policy is SLTA, then we must invoke (21a) instead of (19).

Also, if the load balancing policy is SLTA, then there exists R T (ω) such that
Consider any increasing sequence of natural numbers. By Proposition 2, there exists a subsequence K such that {q k : k ∈ K} converges in D R I [0, ∞) to a function q ∈ D R I [0, ∞) that satisfies q(0) = q 0 and has locally Lipschitz coordinates. Therefore, it suffices to prove that the latter limit in fact holds in D 1 [0, ∞). More specifically, we have to demonstrate that q ∈ D 1 [0, ∞) and that For this purpose, fix arbitrary T ≥ 0 and ε > 0. In addition, let j T and n T be as in the statement of Lemma 2, which implies that q k (i, j) and q(i, j) are non-increasing on [0, T ] provided that j > j T and k ≥ n T . The coordinates of q are continuous, thus we may conclude from the monotone convergence theorem that ||q(s) − q(t)|| 1 → 0 as s → t ∈ [0, T ] monotonically, from above or below. Since T is arbitrary, q is continuous with respect to || · || 1 and, in particular, q ∈ D 1 [0, ∞).
Since q 0 ∈ 1 and ||q k (0) − q 0 || 1 → 0 with k, there exist j ε ≥ j T and k 0 ≥ n T such that the following inequality holds for all t ∈ [0, T ] and k ≥ k 0 : Note that convergence in D R I [0, ∞) implies uniform convergence over compact sets of the coordinate functions. In particular, there exists k ε ≥ k 0 such that Therefore, we conclude that which completes the proof since T and ε are arbitrary.

Limiting behavior of JLMU
In this section we assume that JLMU is used and we prove Theorems 3, 4 and 5. The first two theorems are proved in Section 7.1, which is devoted to the study of fluid trajectories. The proof of Theorem 5 is provided in Section 7.2.

Properties of fluid trajectories
In order to prove the uniqueness of fluid trajectories, we show that every fluid trajectory satisfies a system of equations involving one-dimensional Skorokhod reflection mappings. Then we use a Lipschitz property of these mappings to prove that the system of equations cannot have multiple solutions for a given initial condition. The proof strategy is inspired by a fluid limit derived in [2] using Skorokhod reflection mappings; this fluid limit corresponds to a system of parallel single-server queues with a JSQ policy.   The map (Ψ α , Φ α ) such that Ψ α (x) = y and Φ α (x) = z is called the one-dimensional Skorokhod mapping with upper reflecting barrier at α and satisfies In addition, if x, y ∈ D[0, ∞) are any two functions such that x(0), y(0) ≤ α, then for each T ≥ 0 we have the following Lipschitz properties: Consider càdlàg functions x : [0, ∞) −→ R I such that x(0, i, j) ≤ α(i) for all (i, j) ∈ I + and families of càdlàg functions v k : [0, ∞) −→ R such that v k (0) = 0 for all k ∈ N. Define for each k ≥ 1 a mapping Θ k as follows: here recall the enumeration of I + introduced in Section 3.2. The next lemma establishes that (q, w) satisfies the following set of conditions if q is a fluid trajectory and w is defined suitably in terms of q.

Lemma 4. Let q be a fluid trajectory and define r such that σ (q(t)) = i r(t) , j r(t)
for all t ≥ 0.
Proof. It is clear that (q, w) satisfies (25c) and (25d), so we only need to verify that (25a) and (25b) hold as well. By Lemma 3, it is enough to check the following properties.
In order to establish (a), it suffices to show thaṫ holds almost everywhere. Indeed, (a) holds at t = 0 and q(i k , j k ) ≤ α(i k ) at all times. Sinceẇ k−1 −ẇ k = Λ(q, i k , j k ) and q is a fluid trajectory, we obtain (26) from (10a). Property (b) is a consequence of the definition of w k and (10b), and (c) follows from the following observation. If q(t, i k , j k ) < α(i k ), then (i k , j k ) σ (q(t)) and thus k ≥ r(t), which implies thatẇ k (t) = 0.
Next we prove Theorem 3. As noted earlier, the proof relies on the Lipschitz property of the Skorokhod reflection mapppings Ψ α and Φ α . In addition, a uniqueness of solutions result for certain Kolmogorov backward equations is used.
Proof of Theorem 3. Suppose that there exist two fluid trajectories x and y such that Define v in terms of x and w in terms of y as in Lemma 4. It follows from the same lemma that (x, v) and (y, w) satisfy (25). Next we fix T > 0 and we prove that x(t) = y(t) and v(t) = w(t) for all t ∈ [0, T ].
As a first step, we demonstrate that there exists M > 0 such that v k (t) = w k (t) = 0 for all t ∈ [0, T ] and k ≥ M.
Sincev k−1 −v k = Λ(x, i k , j k ) ≥ 0, we conclude thatv k ≤v k−1 ≤v 0 = λ almost everywhere and for all k ≥ 1. It follows from (10a), or equivalently from (25a) and (25b), that the following inequalities hold almost everywhere: Define α min := min {α(i) : 1 ≤ i ≤ m} and note that there exist ε > 0 and k 0 such that  (27), both x(i) and y(i) satisfy the following initial value problem: The above system of differential equations are the backward Kolmogorov equations of the pure birth process with state space E i := {j ∈ N : j ≥ J i } that has birth rate λ j := µj at state j. This process is non-explosive since j≥J i 1/λ j = ∞, hence it follows from [7] that the initial value problem (29)  For all t ∈ [0, T ] and k ≤ M , we have these inequalities follow from the Lipschitz properties of Ψ α(i k ) and Φ α(i k ) . Let us define for all t ∈ [0, T ] and k ≤ M . For the first inequality, note that f (i k , j k + 1) is identically zero along the interval [0, T ] if (i k , j k + 1) = (i l , j l ) for some l > M , and for the last inequality observe that g 0 is identically zero by (25c). In addition, we have We conclude this section by establishing that (10) has a unique equilibrium point and that all fluid trajectories converge to this equilibrium point over time.
Proof of Theorem 4. First we verify that q * is an equilibrium of (10). To this end, note that σ (q * ) = σ * and that the right-hand side of (10a) equals zero if (i, j) = σ * and q = q * . It only remains to be shown that this also holds for (i * , j * ) := σ * . If Define J i := max {j ≥ 1 : (i, j) σ * } for each i. If (i, j) = (i * , j * ) and q is replaced by q * , then the right-hand side of (10a) equals The expression in the last line equals zero by definition of J i and therefore we conclude that q * is indeed an equilibrium point of (10).
Next we prove that all fluid trajectories converge coordinatewise to q * over time; this implies, in particular, that q * is the unique equilibrium of (10). Afterwards we prove that all fluid trajectories in fact converge to q * in 1 .

Proof of the fluid limit
In order to prove Theorem 5, it suffices to demonstrate, for each ω ∈ Γ, that every subsequence of {q n (ω) : n ≥ 1} has a further subsequence with a limit in D 1 [0, ∞), and that this limit is the unique fluid trajectory starting at q 0 (ω). The first part is covered by Proposition 3, every subsequence of {q n (ω) : n ≥ 1} has a further subsequence with a limit in D 1 [0, ∞). Next we characterize the limits of the convergent subsequences.
Let us fix an arbitrary ω ∈ Γ, which we omit from the notation for brevity, and certain functions a, d and q, respectively, which have locally Lipschitz coordinates by Proposition 2. In order to characterize these three limits, it suffices to just characterize a and d because (19) and (22a) imply that Since a and d have locally Lipschitz coordinates, there exists R ⊂ (0, ∞) such that R c has zero Lebesgue measure and the derivatives of a(i, j) and d(i, j) exist for all (i, j) ∈ I at all points in R. These derivatives are zero if j = 0 by the definitions of A k and D k . The following lemma computes the derivatives for (i, j) ∈ I + .

Lemma 5.
Fix an arbitrary t 0 ∈ R, we havė Furthermore, q(t 0 ) and the derivativesȧ(t 0 , i, j) satisfy Proof. The sequences {D k (i, j) : k ∈ K} and {q k (i, j) : k ∈ K} converge uniformly over compact sets to d(i, j) and q(i, j), respectively, for all (i, j) ∈ I. This remark, the definition of D k and (22c) imply that It is clear that this identity establishes (31). We now prove that The derivativesȧ(t 0 , i, j) are non-negative since the processes A k (i, j) are non-decreasing, so we only need to show that the derivatives add up to λ. For this purpose, note that Fix T > t 0 , and let j T and n T be as in Lemma 2. The left-hand side has at most mj T non-zero terms for all k ≥ n T and t ∈ [0, T ]. It follows from (22b) that This yields (33) since the left-hand side has at most mj T non-zero terms. It follows from (30) that the derivative of q(i, j) exists at t 0 anḋ In order to prove the equality in (32), define σ 0 = (i 0 , j 0 ) := σ (q(t 0 )). If (i, j) σ 0 , then q(t 0 , i, j) = q(t 0 , i, j − 1) by (6). Moreover, for a fixed i, the marginal utility ∆(i, j) does not increase with j since u i is a concave function. Therefore, (i, j) σ 0 and j > 1 imply that (i, j − 1) σ 0 . We conclude that The last property and (34) imply that (32) holds if (i, j) σ 0 . Note that q(t 0 , σ 0 ) < α(i 0 ) by (6). Since q(σ 0 ) is continuous and q k (σ 0 ) converges uniformly over compact sets to q(σ 0 ), there exist ε > 0 and k ε ∈ K such that It follows from (6) and the last statement that σ 0 σ (q k (t)) for all t ∈ (t 0 − ε, t 0 + ε) and k ≥ k ε .
Indeed, server pools of class i with exactly j − 1 tasks are not assigned incoming tasks in the system with k server pools if (i, j) σ (q k ). This proves (32) for (i, j) σ 0 , and we conclude from (33) that (32) must also hold in the case (i, j) = σ 0 .
Below we complete the proof of Theorem 5.
Proof of Theorem 5. As above, we fix some ω ∈ Γ which we omit from the notation. Every subsequence of {q n : n ≥ 1} has a further subsequence which converges in D 1 [0, ∞) by Proposition 3. It follows from (22a) and Lemma 5 that the limit q of this convergent subsequence is a fluid trajectory with q(0) = q 0 , which determines q by Theorem 3.

Limiting behavior of SLTA
In this section we assume that the load balancing policy is SLTA and we leverage a methodology developed in [12] to prove Theorem 6. The first steps of the proof are carried out in Section 8.1, where we establish that certain dynamical properties of the system hold asymptotically with probability one. The proof is completed in Section 8.2, where we analyze the evolution of the learning scheme over time. While the arguments used here are more involved due to the heterogeneity of the system, most of the proofs are conceptually similar to those in [12], and hence are deferred to Appendix C.

Asymptotic dynamical properties
In this section we establish asymptotic dynamical properties pertaining to the total and tail mass processes, which are defined as s n := (i,j)∈I + q n (i, j) and v n (r) := (i,j) (ir,jr) q n (i, j) for all r ≥ 1, respectively. Recall that the total mass process was introduced in (1) and represents the total number of tasks in the system, normalized by the number of server pools. The following proposition is proved in Appendix C.
where s 0 is as defined in (11). Explicitly, s(ω, t) While the above law of large numbers is known to hold weakly, it is not straightforward that it holds with probability one under the coupled construction of sample paths adopted in Section 6.1; this fact is established in Proposition 4.
The next result is also proved in Appendix C, it provides an asymptotic upper bound for certain tail mass processes, under specific conditions concerning q n and r n . The upper bound implies at least an exponentially fast decay over time.

Proposition 5.
Suppose that the next conditions hold for a given ω ∈ Γ and a given increasing sequence K of natural numbers.
(b) There exist r > 1 and 0 ≤ t 0 < t 1 such that for all t ∈ [t 0 , t 1 ] and k ∈ K.

Evolution of the learning scheme
In this section we complete the proof of Theorem 6. In Section 8.2.1 we establish that there exists a neighborhood of zero outside of which r n is asymptotically upper bounded by r * with probability one. This property partially proves (14a) and is used to obtain (14c). The proof of (14a) is finished in Section 8.2.2, where we also establish (14b).

Preliminary results
The following proposition states that r n is asymptotically upper bounded by r * outside of a neighborhood of zero with probability one; the proof is deferred to Appendix C.
Proof. We fix ω ∈ Γ and T ≥ τ > τ bd (s 0 (ω)), and we omit ω from the notation for brevity. Suppose that the statement of the corollary does not hold, then there exist ε > 0 and an increasing sequence K of natural numbers such that By Propositions 3 and 6, we may assume that {q k : k ∈ K} has a limit in D 1 [0, ∞) and that r k (t) ≤ r * for all t ∈ [τ, T ] and all k ∈ K. The latter property implies that since otherwise the number of tokens would be zero, which cannot occur by Remark 1. Therefore, Proposition 5 holds with r = r * + 1 along the interval [τ, T ], and in particular, there exists a function v(r * + 1) such that This leads to a contradiction, so the statement of the corollary must hold.

Proof of Theorem 6
Below we complete the proof of Theorem 6. For this purpose, let for all t ≥ 0 and r ≥ 1. It follows from (22b) and (22c) that The following two technical lemmas are proved in Appendix C.

Lemma 7.
Fix ω ∈ Γ, T ≥ 0 and 1 ≤ r ≤ r * . Assume that there exist an increasing sequence K of natural numbers and random times 0 ≤ ζ k,1 ≤ ζ k,2 ≤ T such that and all large enough k ∈ K.
The above lemmas are used to complete the proof of Theorem 6.
Proof of Theorem 6. We define τ eq as follows: for all s ≥ 0.
In order to prove that ξ n ≤ τ for all large enough n, we show that If r * = 1, then ξ n = τ 0 for all n and the above inequality holds, so suppose that r * > 1. Assume that (38) does not hold. Then there exists an increasing sequence K of natural numbers such that ξ k > τ 0 + τ (ε) for all k ∈ K. Moreover, by Propositions 3 and 6, this sequence may be chosen so that the next two properties hold.
The definition of ξ k implies that The hypotheses of Proposition 5 hold with r := r * , t 0 := τ 0 and t 1 := τ 0 +τ (ε), by the above remark and properties (i) and (ii). Let v(r * ) be the function defined in this proposition, as the uniform limit of the tail processes v k (r * ) over [0, T ]. It follows from Propositions 4 and 5 that sup for all sufficiently large k ∈ K. For each of these k, we have The third inequality follows from Proposition 5, and the last equality from the definition of τ (ε). It follows from (7) that the right-hand side is strictly larger than (i,j) σ * α k (i) for all large enough k ∈ K, which is a contradiction. We conclude that (38) holds, which proves (14a) and (14b). We had already proved (14c) in Corollary 1, thus the proof of the theorem is complete.

Asymptotic optimality
In this section we prove Theorem 2. Specifically, in Section 9.1 we use drift analysis to demonstrate that the continuous-time Markov chains introduced in Section 3.3 are irreducible and positive-recurrent, and to derive upper bounds for certain expectations and tail probabilities. In Section 9.2 we use these upper bounds to establish that the stationary distributions of the latter Markov chains are tight, and then we complete the proof of Theorem 2 using the results of Sections 7 and 8. Finally, in Section 9.3 we demonstrate that JLMU is not optimal in general, although it is asymptotically optimal.

Drift analysis
Denote the state space and the generator matrix of the continuous-time Markov chains defined in Section 3.3 by S n and A n , respectively. We use exactly the same notation for JLMU and SLTA, but we always indicate which policy is being considered. The drift of a function f : The proof of the following proposition uses a Foster-Lyapunov argument, which is based on the drift of certain suitably chosen functions. Proof. Suppose first that the load balancing policy is JLMU. Any occupancy state can reach the empty occupancy state after a finite number of consecutive departures. By the definition of S n provided in Section 3.3, the latter remark implies that q n is irreducible. Moreover, q n is the empty occupancy state if and only if s n = 0, which implies that the empty occupancy state is positive-recurrent, because the M/M/∞ queue s n is irreducible and positive-recurrent. Thus, q n is positive-recurrent.
Suppose now that the load balancing policy is SLTA. Any state (q, r) ∈ S n can reach the empty occupancy sate with r n = r after a finite number of consecutive departures. Moreover, the latter state can reach the empty occupancy state with r n = 1 after a finite number of alternate arrivals and departures. We conclude from the definition of S n provided in Section 3.3 that (q n , r n ) is irreducible.
Next we use a Foster-Lyapunov argument to prove the positive recurrence. Consider the functions f, g : S n −→ [0, ∞) defined by f (q, r) := (i,j)∈I + q(i, j) and g(q, r) := r for all (q, r) ∈ S n .
All server pools together form an infinite-server system, thus A n f (q, r) = λ − µf (q, r) for all (q, r) ∈ S n . In addition, we have A n g(q, r) = λ [1 I (q, r) − 1 D (q, r)] for all (q, r) ∈ S n .
Here I corresponds to those states (q, r) such that r n increases if (q n , r n ) = (q, r) and the next event is an arrival. Specifically, Also, D corresponds to those states (q, r) such that r n decreases if (q n , r n ) = (q, r) and the next event is an arrival. Specifically, Consider the function h := f + 2g and let F be the set of those (q, r) ∈ S n that satisfy the following two conditions.
The first condition holds for finitely many q ∈ Q n and the second condition holds for finitely many r ≥ 1, thus F is finite. Next we establish that A n h ≤ −λ + 4λ1 F . Note that (q n , r n ) is non-explosive since the infinite-server queue s n has this property. Therefore, it follows from [16, Proposition 2.2.1] that (q n , r n ) is positive-recurrent.
Next we provide upper bounds for certain expectations and tail probabilities, which are used in the following section to demonstrate that the sequence of stationary distributions {π n : n ≥ 1} is tight, both for JLMU and SLTA. First we state a technical lemma; the proof follows from Fubini's theorem and is provided in Appendix A. Lemma 8. Let x n have the stationary distribution π n , where x n = q n if JLMU is used and Consider the quantities The above lemma is used to prove the following two propositions. q(i, j) for all q ∈ S n and k ≥ 1.
If q n has the stationary distribution π n , then for all k ≥ 1.
Proof. Fix some k ≥ 1 and define J i := min {j ≥ 1 : (i, j) (i k , j k )}. The drift of f k with respect to q n satisfies where f (q) is as in (39). For the last step, note that (i, j) σ(q) implies that q(i, j) = α n (i).
Observe that |A n (x, x)| = nλ + nµf (x) because nf (x) is the total number of tasks at the occupancy state x and nλ is the arrival rate of tasks. Hence, The right-hand side has a finite mean with respect to π n since nf (q n ) is the total number of tasks in the system in stationarity, which is Poisson distributed with mean nρ. Thus, we conclude that E [A n f k (q n )] = 0 by Lemma 8.
Taking expectations with respect to π n on both sides of (41), and recalling that nf (q n ) is Poisson distributed with mean nρ, we obtain where the last inequality follows from a Chernoff bound.

Proposition 9.
Suppose that the load balancing policy is SLTA, fix n and consider the functions q(i, j) for all (q, r) ∈ S n and k ≥ 1.
Let (q n , r n ) have the stationary distribution π n . For each k ≥ 1, we have where θ k n is defined as in (40) Proof. Fix k ≥ 1. As in the proof of Proposition 8, we see that and that f k satisfies the hypothesis of Lemma 8. Then we obtain (42a) by taking the expectation with respect to π n on both sides of the latter inequality. Consider the sets I and D defined in the proof of Proposition 7 and let I k := I ∩ {(q, r) ∈ S n : r ≥ k} and D k := D ∩ {(q, r) ∈ S n : r > k} .
The first step of the proof of (42b) is to establish that P ((q n , r n ) ∈ I k ) = P ((q n , r n ) ∈ D k ) .
Fix l > k and consider the function g l k : S n −→ [0, ∞) defined by As in the proof of Proposition 7, we obtain where the sets I l k and D l k are defined by Define f (q, r) as in (39) and note that |A n ((x, r), (x, r))| = nλ + nµf (x, r). Thus, The right-hand side has a finite mean with respect to π n since nf (q n , r n ) is the total number of tasks in stationarity, which is Poisson distributed with mean nρ. Therefore, it follows from Lemma 8 and (44) that P (q n , r n ) ∈ I l k = P (q n , r n ) ∈ D l k . The sets I l k and D l k increase to I k and D k , respectively, as l → ∞. This implies (43) since P ((q n , r n ) ∈ I k ) = lim l→∞ P (q n , r n ) ∈ I l k = lim l→∞ P (q n , r n ) ∈ D l k = P ((q n , r n ) ∈ D k ) .

Now we may write
Using the definition of I k , we can bound the first term on the last line by The second term on the last line of (45) can be bounded by For the last inequality, observe that the condition inside the first probability sign of the left-hand side of (47) implies that and this in turn implies that since q n (i, j) is non-increasing in j for all i. In addition, the condition inside the second probability sign of the left-hand side of (47) implies that We obtain (42b) from (46) and (47), recalling that nf (q n , r n ) is Poisson distributed with mean nρ and applying Chernoff bounds.

Proof of the asymptotic optimality
In this section we prove Theorem 2. As a first step, we establish that the sequence of stationary distributions {π n : n ≥ 1} is tight both for JLMU and SLTA. Proof. Suppose first that the load balancing policy is JLMU and let q n have the stationary distribution π n for each n. The sequence {q n : n ≥ 1} is tight with respect to the product topology since the random variables q n take values in [0, 1] I , which is compact with respect to the product topology. Therefore, as in [25,Lemma 2], the tightness in 1 of {q n : n ≥ 1} will follow if we establish that lim k→∞ lim sup By Proposition 8 and Markov's inequality, we have for all k, n ≥ 1 and ε > 0.
For all sufficiently large k, the exponent on the right-hand side converges to minus infinity as n grows large and k is held fixed. Thus, {q n : n ≥ 1} is tight in 1 . Suppose now that the load balancing policy is SLTA and let (q n , r n ) have the stationary distribution π n for each n. In order to prove that {(q n , r n ) : n ≥ 1} is tight in 1 × N, it suffices to show that {q n : n ≥ 1} and {r n : n ≥ 1} are tight in 1 and N, respectively.
Indeed, if the latter properties hold, then for each ε > 0 there exist compact sets K q ⊂ 1 and K r ⊂ N such that Therefore, the compact set K q × K r ⊂ 1 × N satisfies By Proposition 9 and Markov's inequality, we have for all k, n ≥ 1.
For all large enough k, the right-hand side of the second inequality is summable over n and in particular vanishes with n. This implies that the sequences {q n : n ≥ 1} and {r n : n ≥ 1} are tight in 1 and N, respectively.
We also need the following technical lemma.

Lemma 9.
Fix q ∈ 1 and suppose that the marginal utilities are bounded. Then Proof. Note that The second term in the last line is absolutely convergent since the marginal utilities are bounded and q ∈ 1 . Moreover, there exists M ≥ 0 such that and the latter limit is zero for all i since q ∈ 1 .
Now we are ready to prove Theorem 2.
Proof of Theorem 2. Suppose first that the load balancing policy is JLMU. It follows from Prokhorov's theorem and Proposition 10 that the stationary distributions {π n : n ≥ 1} are relatively compact in 1 , thus every subsequence has a further subsequence which converges in distribution. To establish (a), it suffices to prove the following statement: if K is an increasing sequence of natural numbers such that {π k : k ∈ K} converges weakly to π, then π is the Dirac measure concentrated at q * . Similarly, the sequence {π n : n ≥ 1} is relatively compact in 1 × N if the load balancing policy is SLTA, and to prove (b) it suffices to establish the following statement: if K is an increasing sequence of natural number such that {π k : k ∈ K} converges weakly to π, then π is the Dirac measure at (q * , r * ). Below we prove (a) and (b) in parallel, proceeding as indicated above. Fix an arbitrary increasing sequence of natural numbers K such that {π k : k ∈ K} converges weakly to a certain probability measure π. The following constructions use Skorokhod's representation theorem. If the load balancing policy is JLMU, then there exist random variables q k and q, distributed as π k and π, respectively, that are defined on a common probability space (Ω I , F I , P I ) and satisfy: lim k→∞ ||q k (ω) − q(ω)|| 1 = 0 for all ω ∈ Ω I .
If the load balancing policy is SLTA, then there exist random variables (q k , r k ) and (q, r), with distributions given by the probability measures π k and π, respectively, that are defined on some common probability space (Ω I , F I , P I ) and satisfy: lim k→∞ ||q k (ω) − q(ω)|| 1 = 0 and lim k→∞ r k (ω) = r(ω) for all ω ∈ Ω I .
The second limit implies that there exists a random variable R such that r k (ω) ≤ R(ω) for all k ∈ K and ω ∈ Ω I . Moreover, q k (i, j) < α k (i) for all (i, j) (i r k , j r k ) on Ω I by the definition of the state space S k for SLTA. Indeed, recall from Remark 1 that the latter property is preserved by arrivals and departures and observe that it holds for the empty occupancy state with r k = 1.
If the load balancing policy is JLMU, then we may construct occupancy proceses q k on a common probability space (Ω, F, P) as in Section 6.1.1, such that q k (0) = q k and (7) holds with q 0 = q. If the load balancing policy is SLTA, then we may construct processes (q k , r k ) on a common probability space (Ω, F, P) as in Section 6.1.2, such that (q k (0), r k (0)) = (q k , r k ) and the assumptions of Section 4.2.1 hold with q 0 = q. By Proposition 3, we may assume that q k converges in D 1 [0, ∞) to a process q with probability one, this may require to replace K by a subsequence. If the load balancing policy is JLMU, then Theorem 5 implies that q(ω) is the unique fluid trajectory such that q(ω, 0) = q(ω) for each ω ∈ Ω. Moreover, it follows from Theorem 4 that lim t→∞ ||q(ω, t) − q * || 1 = 0 for all ω ∈ Ω. (49) If the load balancing policy is SLTA, then (14b) and (14c) hold by Theorem 6. The former of these limits implies that q(t, i, j) → q * (i, j) for all (i, j) σ * with probability one, and the latter that and in particular that q(t, i, j) → q * (i, j) for all (i, j) σ * with probability one. This also holds for (i, j) = σ * by Proposition 4, so we conclude that It follows from the stationarity of π k that q k (t) is distributed as q k for all t ≥ 0 in the case of JLMU, and that (q k (t), r k (t)) has the same distribution as (q k , r k ) for all t ≥ 0 if the load balancing policy is SLTA. Furthermore, recall that in either case we have P lim k→∞ ||q k (t) − q(t)|| 1 = 0 = 1 for all t ≥ 0 and q k ⇒ q as k → ∞, which implies that q(t) is distributed as q for all t ≥ 0. Moreover, q(t) converges weakly to q * as t → ∞ by (49) and (50). Therefore, q corresponds to the Dirac probability measure concentrated at q * both for JLMU and SLTA.
This completes the proof of (a). To finish the proof of (b), observe that s 0 = ρ with probability one since q 0 = q is equal to q * with probability one. It follows from (14a) that P lim k→∞ r k (t) = r * = 1 for all t > τ eq (ρ).
Hence, r k (t) converges weakly to r * for all t > τ eq (ρ). Since r k (t) has the same distribution as r k for all t ≥ 0, we conclude that r k converges weakly to r * . Note that q * and r * are deterministic, thus q k and r k converge to q * and r * , respectively, in probability, which implies that (q k , r k ) converges to (q * , r * ) in probability. This completes the proof of (b).
Next we prove the statements about the stationary overall utilities, and here exactly the same arguments apply both for JLMU and SLTA. Suppose that q n has the stationary distribution. By (a) and (b), q n converges in probability to q * in 1 , and by Lemma 9, Since the marginal utilities are bounded, there exists a constant a such that This implies that u(q n ) converges in probability to u(q * ). Since ns n is Poisson distributed with mean nρ, we have The last expression does not depend on n, therefore we conclude that {u(q n ) : n ≥ 1} is uniformly integrable. As a result, we have This completes the proof.

Suboptimality result
In this section we prove that JLMU is generally not optimal in the pre-limit, although it becomes optimal as the number of server pools grows large. For this purpose, we construct an example in which JLMU is strictly outperformed by another policy.
As a result, JLMU sends tasks to server pool two if and only if this server pool is empty, thus the number of tasks in server pool two is zero or one in stationarity. This assignment rule guarantees the largest increase in the aggregate utility of the system at each arrival epoch. However, this increase can be very small when tasks are sent to server pool one, whereas the decrease in the aggregate utility can be comparatively large when a departure leaves server pool two with zero tasks. Particularly, this is the case when ε is small.
Suppose that tasks arrive as a Poisson process of intensity λ with exponential service times of mean 1/µ. In addition, let X and Y denote the number of tasks in server pools one and two, respectively. Next we provide an upper bound for the mean stationary overall utility of JLMU when the utility functions are as defined above.
As already noted, a new task is sent to server pool two if and only if Y = 0. Therefore, the following statements hold.
If we let U := u 1 (X) + u 2 (Y ) denote the aggregate utility in stationarity, then Consider now the policy that sends all tasks to server pool two. While JLMU dispatches the new tasks in a greedy fashion, this other policy is conservative, since it tries to avoid drops of u 2 (Y ) from a to zero, by keeping a positive number of tasks in server pool two. The mean stationary aggregate utility can be computed explicitly for the policy that we described above since Y is now an M/M/∞ queue and X(t) = 0 for all sufficiently large t. Let (X, Y ) be the stationary distribution of (X, Y ) and let V := u 1 (X) + u 2 (Y ) denote the aggregate utility of the system in stationarity, we have It is not difficult to verify that ε can be chosen so that In general, the right balance between greedy and conservative actions is difficult to determine and depends intricately on the set of utility functions. However, the benefits of conservative actions, that prevent the number of tasks in server pools of specific classes from dropping below certain occupancy levels, diminish as the scale of the system grows. Indeed, the average fraction of these server pools that have less tasks than the optimal quantity decreases as the number of server pools grows, even if the assignment policy is purely greedy as JLMU; essentially, this is a consequence of the increase in the number of server pools per class and the decrease in the coefficient of variation of the total number of tasks in the system. Therefore, the associated loss in mean stationary overall utility asymptotically vanishes as the scale of the system grows.

Appendix A Auxiliary results
Construction of Section 6.1.2. First we define the functions η k . For this purpose, suppose that q ∈ Q n and r ≥ 1 are the current values of q n and r n , respectively. Then is the set of the coordinates that could belong to a server pool with a green token; i.e., a server pool of class i with exactly j − 1 tasks has a green token if and only if (i, j) ∈ G(q, r).
We partition the latter set of coordinates as follows: These intervals form a partition of [0, 1) such that the length of I(q, i, j) is the fraction of server pools which are of class i and have exactly j −1 tasks. The length of I(q, i, j) is equal to the probability of picking a server pool with coordinates (i, j) uniformly at random. Note that G 1 (q, r) := (k,l)∈G 1 (q,r) δ q (k, l) is the fraction of server pools of class i = i r−1 with a green token. If the latter quantity is positive, then we let G 1 (q, r, i, j) := 1 − (k,l) (i,j),(k,l)∈G 1 (q,r) δ q (k, l) G 1 (q, r) , 1 − (k,l) (i,j),(k,l)∈G 1 (q,r) δ q (k, l) G 1 (q, r) .
Observe that η k (q, r, i, j) ∈ {0, 1} for all k and (i, j) ∈ I + , and that for a fixed k there exists a unique (i, j) ∈ I + such that η k (q, r, i, j) = 1. Also, if (q, r) is the value of (q n , r n ) when the k th task arrives to the system, then η k (q, r, i, j) = 1 if and only if this task has to be sent to a server pool of class i with exactly j − 1 tasks. It only remains to define the sets I n,k and D n,k that appear in (20b). The former is the Also, if x, y ∈ D[0, ∞) satisfy x(0), y(0) ≥ 0, then for each T ≥ 0, we have ||Ψ(x) − Ψ(y)|| T ≤ ||x − y|| T and ||Φ(x) − Φ(y)|| T ≤ 2 ||x − y|| T .
The latter definition and properties of the Skorokhod mapping with lower reflecting barrier at zero can be found in [5, Theorem 6.1] and [19,Lemma 6.14]. Suppose x is as in the statement of the lemma. Define y and z as in (24) Properties (b') and (c') imply (b) and (c). Also, (a) holds since z = α − Φ(α − x) ≤ α by (a') and z = x − y by definition. Conversely, suppose that y and z satisfy (a)-(c). Then α − z = α − x + y ≥ 0 and y satisfies (b') and (c') with z replaced by α − z, hence we conclude that y = Ψ(α − x) and α − z = Φ(α − x). Therefore, y and z are as in (24).
Proof of Lemma 8. By assumption and Tonelli's theorem, The first equality follows from the stationarity of π n . If JLMU is used, it then follows from (19) and (52a) that the sequence {q n : n ≥ 1} has the same properties. If SLTA is used, then we need to invoke (21a) instead of (19).

Appendix B Relative compactness
The following characterization of relative compactness with respect to T will be useful. Proof. We only need to prove the converse, so assume that {x n (i, j) : n ≥ 1} is relatively compact for all (i, j) ∈ I. Given an increasing sequence K ⊂ N, we need to establish that there exists a subsequence of {x k : k ∈ K} that converges with respect to T . For this purpose, we may construct a family of sequences {K j : j ≥ 0} with the following properties.
Define {k l : l ≥ 1} ⊂ K such that k l is the l th element of K l . Then Proof. We fix an arbitrary ω ∈ Γ T and we omit it from the notation for brevity. As noted above, it suffices to prove that {A n (i, j) : n ≥ 1} and {D n (i, j) : n ≥ 1} are relatively compact subsets of D[0, T ] for all (i, j) ∈ I. We fix some (i, j) ∈ I and demonstrate that {A n (i, j) : n ≥ 1} is relatively compact in D[0, T ], and that the limit of every convergent subsequence is Lipschitz. The same arguments can be used if A n is replaced by D n . Let M j and {ε n (i, j) : n ≥ 1} be as in the statement of Lemma 12. It follows from Lemma 11 that for each n there exists x n (i, j) ∈ L M j such that ||A n (i, j) − x n (i, j)|| T ≤ 4ε n (i, j).
Recall that L M j is compact, thus every increasing sequence of natural numbers has a subsequence K such that {x k (i, j) : k ∈ K} converges to a function x ∈ L M j . Furthermore, where the limits are taken along K. This shows that every subsequence of {A n (i, j) : n ≥ 1} has a further subsequence which converges to a Lipschitz function.
We are now ready to prove Proposition 2 Proof of Proposition 2. Recall that Γ T has probability one for all T ≥ 0. Hence, has probability one as well. Also, (22) and (23) hold within Γ ∞ by Lemma 10.
Next we fix an arbitrary ω ∈ Γ ∞ , which we omit from the notation for brevity, and we prove that {q n : n ≥ 1} is a relatively compact subset of D R I [0, ∞) such that the limit of every convergent subsequence has locally Lipschitz coordinate functions. Exactly the same arguments can be used if q n is replaced by A n or D n .
Fix a sequence K ⊂ N. We must show that {q k : k ∈ K} has a subsequence which converges uniformly over compact sets to a function with locally Lipschitz coordinates. To this end, we may construct a family of sequences {K T : T ∈ N} such that: (a) K T +1 ⊂ K T ⊂ K for all T ∈ N;