Tightness of invariant distributions of a large-scale flexible service system under a priority discipline

We consider large-scale service systems with multiple customer classes and multiple server pools; interarrival and service times are exponentially distributed, and mean service times depend both on the customer class and server pool. It is assumed that the allowed activities (routing choices) form a tree (in the graph with vertices being both customer classes and server pools). We study the behavior of the system under a Leaf Activity Priority (LAP) policy, which assigns static priorities to the activities in the order of sequential "elimination" of the tree leaves. We consider the scaling limit of the system as the arrival rate of customers and number of servers in each pool tend to infinity in proportion to a scaling parameter r, while the overall system load remains strictly subcritical. Indexing the systems by parameter r, we show that (a) the system under the LAP discipline is stochastically stable for all sufficiently large r and (b) the family of the invariant distributions is tight on scales $r^{1/2 + \epsilon}$ for all $\epsilon>0$. (More precisely, the sequence of invariant distributions, centered at the equilibrium point and scaled down by $r^{-(1/2 + \epsilon)}$, is tight.)

1. Introduction. Large-scale service systems with heterogeneous customer and server populations bring up the need for efficient control policies that dynamically match arriving (or waiting) customers and available servers. It is desirable to have algorithms that avoid excessive customer waiting and do not rely on the knowledge of system parameters. Consider a service system with multiple customer and server types, where the arrival rate of class $i$ customers is $\Lambda_i$, the service rate of a class $i$ customer by a type $j$ server is $\mu_{ij}$, and the server pool sizes are $B_j$. A desirable feature of a dynamic control is insensitivity to parameters $\Lambda_i$ and $\mu_{ij}$. That is, the assignment of customers to server pools should, to the maximal degree possible, depend only on the current system state (server occupancies, queue sizes), and not on prior knowledge of arrival rates or mean service times, because those parameters may not be known in advance and, moreover, they may be changing in time.
If the system objective is to minimize the largest average load of any server pool, a "static" optimal control can be obtained by solving a linear program, called the static planning problem (SPP), which has the $B_j$, $\mu_{ij}$ and $\Lambda_i$ as parameters. An optimal solution to the SPP will prescribe optimal average rates $\Lambda_{ij}$ at which arriving customers should be routed to the server pools. Typically (in a certain sense) the solution to the SPP is unique and the basic activities, i.e. routing choices $(ij)$ for which $\Lambda_{ij} > 0$, form a tree; let us assume this is the case. Probabilistic routing with static probabilities $\Lambda_{ij}/\Lambda_i$ of routing a customer of type $i$ to server pool $j$ will balance the loads among different server pools and will avoid excessive customer waiting; however, in order to find the routing probabilities, it is necessary to know all of the parameters $\Lambda_i$, $B_j$, and $\mu_{ij}$ in advance. The Shadow Routing policy in [6] is a dynamic control policy, which achieves the load balancing objective without a priori knowledge of input rates $\Lambda_i$; in the process it "automatically identifies" the basic activity tree. The Shadow Routing policy, however, does need to "know" the service rates $\mu_{ij}$.
In this paper we assume that the basic activity tree is known, but not the precise rates $\Lambda_i$, $\mu_{ij}$; we restrict the routing choices to activities within the basic activity tree. We consider the large-number-of-servers asymptotic regime, in which the arrival rate of customers and number of servers in each pool tend to infinity in proportion to a scaling parameter $r$; our focus is on the case where the overall system load remains strictly subcritical. In a previous paper [8] we showed that a very natural load balancing policy considered e.g. by [4], [1], [2] may lead to instability at the system equilibrium point: in particular, for certain parameter settings [8, Theorem 7.2] demonstrated the non-tightness (in fact, evanescence to infinity) of invariant measures on the diffusion, $r^{1/2}$-scale. (More precisely, this means that the sequence of invariant distributions, centered at the equilibrium point and scaled down by $r^{-1/2}$, is non-tight, and moreover escapes to infinity.) In this paper we consider a different algorithm, which we call the Leaf Activity Priority (LAP) policy. As specified above, no precise knowledge of the rates $\Lambda_i$ and $\mu_{ij}$ is required, besides the knowledge of the basic activity tree, and routing is restricted to basic activities only. The policy assigns static priorities to the activities in the order of sequential "elimination" of the tree leaves. The precise definition will be given in Section 2.2. Assuming strictly subcritical load, for this policy we first prove that the system is stochastically stable for all sufficiently large values of $r$. (In contrast to load balancing policies, the stability under LAP is not "automatic".) Next, we demonstrate the $r$-scale (fluid-scale) tightness of stationary distributions; this fact is closely related to stability: both are "consequences" of the relatively "benign" behavior of the system on the fluid scale. Then, we obtain a much stronger tightness result, namely that the invariant distributions are tight on the $r^{1/2+\epsilon}$-scale, for any $\epsilon > 0$; this is the main contribution of the paper, which involves the analysis of the process under hydrodynamic and local-fluid scaling (in addition to "standard" fluid scaling). We believe that our analysis can be extended to prove still stronger, diffusion scale ($r^{1/2}$) tightness; this is work in progress.
For a general review of literature on the large-number-of-servers asymptotic regime, including design and analysis of efficient control algorithms, see e.g. [4, 6] and references therein.
The rest of the paper is organized as follows. In Section 2 the model, the asymptotic regime, the LAP discipline and basic notation are introduced. The main results are stated in Theorem 10 of Section 3, with its statements (i) and (ii) being the stability and tightness results, respectively. Section 4 contains the analysis of the process on the fluid scale, which leads to establishing stability (Theorem 10(i)) and fluid scale ($r$-scale) tightness of stationary distributions. In Section 5, using the fluid-scale tightness as a starting point, we prove the $r^{1/2+\epsilon}$-scale tightness (Theorem 10(ii)); this is the key part of the paper, which involves the analysis of system dynamics under the LAP discipline under hydrodynamic and local-fluid scaling.

2. Model.
2.1. The model; Static Planning (LP) Problem. Consider the model in which there are $I$ customer classes, labeled $1, 2, \ldots, I$, and $J$ server pools, labeled $1, 2, \ldots, J$. (Servers within pool $j$ are referred to as class $j$ servers. Also, throughout this paper the terms "class" and "type" are used interchangeably.) The sets of customer classes and server pools will be denoted by $\mathcal{I}$ and $\mathcal{J}$, respectively. We will use the indices $i$, $i'$ to refer to customer classes, and $j$, $j'$ to refer to server pools.
We are interested in the scaling properties of the system as it grows large. The meaning of "grows large" is as follows. We consider a sequence of systems indexed by a scaling parameter $r$. As $r$ grows, the arrival rates and the sizes of the service pools, but not the speed of service, increase. Specifically, in the $r$th system, customers of type $i$ enter the system as a Poisson process of rate $\lambda^r_i = r\lambda_i$, while the $j$th server pool has $r\beta_j$ individual servers. (All $\lambda_i$ and $\beta_j$ are positive parameters.) Customers may be accepted for service immediately upon arrival, or enter a queue; there is a separate queue for each customer type. Customers do not abandon the system. When a customer of type $i$ is accepted for service by a server in pool $j$, the service time is exponential of rate $\mu_{ij}$; the service rate depends both on the customer type and the server type, but not on the scaling parameter $r$. If customers of type $i$ cannot be served by servers of class $j$, the service rate is $\mu_{ij} = 0$.
Remark 1. Strictly speaking, the quantity $\beta_j r$ may not be an integer, so we should define the number of servers in pool $j$ as, say, $\lfloor \beta_j r \rfloor$. However, the change is not substantial, and will only unnecessarily complicate the notation.
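Consider the following load-balancing static planning problem (SPP):
$$\min_{\{\lambda_{ij}\},\,\rho}\ \rho \tag{1a}$$
subject to
$$\lambda_{ij} \ge 0, \quad \forall i, j, \tag{1b}$$
$$\sum_j \lambda_{ij} = \lambda_i, \quad \forall i, \tag{1c}$$
$$\sum_i \lambda_{ij}/(\beta_j \mu_{ij}) \le \rho, \quad \forall j. \tag{1d}$$
Throughout this paper we will always make the following two assumptions about the solution to the SPP (1):
Assumption 2 (Complete resource pooling). The SPP (1) has a unique optimal solution $\{\lambda_{ij},\ i \in \mathcal{I},\ j \in \mathcal{J}\}$, $\rho$. Define the basic activities to be the pairs, or edges, $(ij)$ for which $\lambda_{ij} > 0$. Let $E$ be the set of basic activities. We further assume that the unique optimal solution is such that $E$ forms a tree in the (undirected) graph with vertex set $\mathcal{I} \cup \mathcal{J}$.
Assumption 3 (Underload). The optimal solution to (1) has $\rho < 1$.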
Remark 4. Assumption 2 is the complete resource pooling (CRP) condition, which holds "generically" in a certain sense; see [7, Theorem 2.2]. Assumption 3 is essential for the main results of the paper ($r^{1/2+\epsilon}$-scale tightness), but many of the auxiliary results hold (along with their proofs) for the critically loaded case $\rho = 1$.
Note that under the CRP condition, all ("server pool capacity") constraints (1d) are binding: $\sum_i \lambda_{ij}/(\beta_j \mu_{ij}) = \rho$, $\forall j$. This in particular means that the optimal solution to the SPP is such that, if a system with parameter $r$ routes type $i$ customers to pool $j$ at the rate $\lambda_{ij} r$, the server pool average loads will be minimized and "perfectly balanced".
In this paper, we assume that the basic activity tree is known in advance, and restrict our attention to the basic activities only. Namely, we assume that a type $i$ customer service in pool $j$ is allowed only if $(ij) \in E$. (Equivalently, we can a priori assume that $E$ is the set of all possible activities, i.e. $\mu_{ij} = 0$ when $(ij) \notin E$, and $E$ is a tree. In this case CRP requires that all feasible activities are basic.) For a customer type $i$, let $S(i) = \{j : (ij) \in E\}$; for a server type $j$, let $C(j) = \{i : (ij) \in E\}$.
Under the CRP condition, optimal dual variables $\nu_i$, $i \in \mathcal{I}$, and $\alpha_j$, $j \in \mathcal{J}$, corresponding to constraints (1c) and (1d), respectively, are unique and all strictly positive. The dual variable $\nu_i$ is interpreted as the "workload" associated with one type $i$ customer, and $\alpha_j/\beta_j$ is interpreted as the (average) rate at which one server in pool $j$ processes workload when it is busy, regardless of the customer type on which it is working, as long as $i \in C(j)$. The dual variables satisfy the relations $\nu_i \mu_{ij} = \alpha_j/\beta_j$ for any $(ij) \in E$, and $\sum_j \alpha_j = 1$, which in particular imply that
$$\sum_i \nu_i \lambda_i = \rho. \tag{2}$$
Given $\rho < 1$, this means, for example, that when all servers in the system are busy, the total rate $\sum_i \nu_i \lambda_i r$ at which new workload arrives in the system is strictly less than the rate $\sum_j \alpha_j r = r$ at which it is served.
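As an illustration of the dual relations above, the following minimal Python sketch propagates $\nu_i \mu_{ij} = \alpha_j/\beta_j$ along a given activity tree, normalizes so that $\sum_j \alpha_j = 1$, and then evaluates the workload arrival rate $\sum_i \nu_i \lambda_i$. The function names, the three-edge example tree and its parameter values are hypothetical choices made for this illustration, not taken from the paper, and the sketch assumes that the supplied edge set really is the basic activity tree of the SPP.

```python
from fractions import Fraction


def dual_variables(edges, mu, beta):
    """Solve nu_i * mu_ij = alpha_j / beta_j on a tree, with sum_j alpha_j = 1.

    edges: list of (class, pool) pairs forming a tree (assumed, not checked).
    mu:    dict mapping (class, pool) -> service rate.
    beta:  dict mapping pool -> scaled pool size.
    """
    pools_of, classes_of = {}, {}
    for i, j in edges:
        pools_of.setdefault(i, []).append(j)
        classes_of.setdefault(j, []).append(i)

    # Start from an arbitrary pool with unnormalized alpha = 1 and walk the tree.
    root = edges[0][1]
    alpha, nu = {root: Fraction(1)}, {}
    stack = [("pool", root)]
    while stack:
        kind, v = stack.pop()
        if kind == "pool":
            for i in classes_of[v]:
                if i not in nu:
                    nu[i] = alpha[v] / (beta[v] * mu[(i, v)])
                    stack.append(("class", i))
        else:
            for j in pools_of[v]:
                if j not in alpha:
                    alpha[j] = nu[v] * mu[(v, j)] * beta[j]
                    stack.append(("pool", j))

    # Normalize so that the alphas sum to one.
    total = sum(alpha.values())
    alpha = {j: a / total for j, a in alpha.items()}
    nu = {i: n / total for i, n in nu.items()}
    return nu, alpha


def workload_arrival_rate(nu, lam):
    """Rate sum_i nu_i * lam_i at which workload arrives (fluid scale)."""
    return sum(nu[i] * lam[i] for i in lam)


# Hypothetical example: classes 1, 2 and pools "A", "B".
edges = [(1, "A"), (2, "A"), (2, "B")]
mu = {(1, "A"): 2, (2, "A"): 1, (2, "B"): 1}
beta = {"A": 1, "B": 2}
lam = {1: 1, 2: 2}

nu, alpha = dual_variables(edges, mu, beta)
print(nu, alpha, workload_arrival_rate(nu, lam))
```

For this example the balanced-load equations can also be solved by hand (pool A: $1/2 + \lambda_{2A} = \rho$; pool B: $\lambda_{2B}/2 = \rho$; $\lambda_{2A} + \lambda_{2B} = 2$), which gives a direct check of the value returned by `workload_arrival_rate`.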
Remark 5. Although (1) is the load-balancing SPP, and the notions introduced in this subsection are defined in terms of this SPP, the policy we consider in this paper (defined in Section 2.2) is not a load balancing policy. In particular, the system equilibrium point under the policy will not balance server pool loads, but rather will keep all pools, except one, fully occupied.
2.2. Leaf activity priority (LAP) policy. For the rest of the paper, we analyze the performance of the following policy, which we call leaf activity priority (LAP). The first step in its definition is the assignment of priorities to customer classes and activities.
Consider the basic activity tree, and assign priorities to the edges as follows. First, we assign priorities to customer classes by iterating the following procedure:
1. Pick a leaf of the tree;
2. If it is a customer class (rather than a server class), assign to it the highest priority that hasn't yet been assigned;
3. Remove the leaf from the tree.
Without loss of generality, we assume the customer classes are numbered in order of priority (with 1 being highest). We now assign priorities to the edges of the basic activity tree by iterating the following procedure:
1. Pick the highest-priority customer class;
2. If this customer class is a leaf, pick the edge going out of it, assign this edge the highest priority that hasn't yet been assigned, and remove the edge together with the customer class;
3. If this customer class is not a leaf, then pick any edge from it to a server class leaf (such an edge necessarily exists), assign to this edge the highest priority that hasn't yet been assigned, and remove the edge.
It is not hard to verify that this algorithm will successfully assign priorities to all edges; it suffices to check that at any time the highest remaining priority customer class will have at most one outgoing edge to a non-leaf server class.
Remark 6. This algorithm does not produce a unique assignment of priorities, neither for the customer classes nor for the activities, because there may be multiple options for picking a next leaf or edge to remove, in the corresponding procedures. This is not a problem, because our results hold for any such assignment. Different priority assignments may correspond to different equilibrium points (defined below in Section 2.3); once we have picked a particular priority assignment, there is a (unique) corresponding equilibrium point, and we will be showing steady-state tightness around that point. Furthermore, the flexibility in assigning priorities may be a useful feature in practice. For example, it is easy to specialize the above priority assignment procedure so that the lowest priority is given to any a priori picked activity.
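The two leaf-elimination procedures can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the tie-breaking rule (always take the smallest available leaf, with customer classes ordered before server pools) and the five-edge example tree are hypothetical, and any other valid choice of leaves would give an equally admissible priority assignment.

```python
def assign_class_priorities(edges):
    """Procedure 1: repeatedly remove a leaf; customer classes are ranked
    in the order in which they are removed (highest priority first)."""
    adj = {}
    for i, j in edges:
        adj.setdefault(("c", i), set()).add(("s", j))
        adj.setdefault(("s", j), set()).add(("c", i))
    order = []
    while adj:
        # A vertex with at most one remaining neighbor is a leaf
        # (the very last vertex has none). Ties are broken arbitrarily.
        leaf = min(v for v in adj if len(adj[v]) <= 1)
        if leaf[0] == "c":
            order.append(leaf[1])
        for u in adj.pop(leaf):
            adj[u].discard(leaf)
    return order


def assign_activity_priorities(edges, class_order):
    """Procedure 2: rank the edges (activities), highest priority first."""
    remaining = set(edges)
    ranked = []
    for i in class_order:  # highest-priority remaining customer class
        while True:
            mine = sorted(e for e in remaining if e[0] == i)
            if not mine:
                break  # class i and all of its edges have been removed
            if len(mine) == 1:
                pick = mine[0]  # class i is a leaf
            else:
                # Otherwise take an edge into a server pool that is a leaf,
                # i.e. a pool with exactly one remaining edge.
                pick = next(
                    e for e in mine
                    if sum(1 for f in remaining if f[1] == e[1]) == 1
                )
            ranked.append(pick)
            remaining.remove(pick)
    return ranked


# Hypothetical example tree: the path x - A - y - B - z - C.
edges = [("x", "A"), ("y", "A"), ("y", "B"), ("z", "B"), ("z", "C")]
class_order = assign_class_priorities(edges)
activity_order = assign_activity_priorities(edges, class_order)
print(class_order)
print(activity_order)
```

In this sketch the `next(...)` call would raise `StopIteration` if no leaf pool were available; by the observation in the text this cannot happen when `class_order` comes from the first procedure.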
We illustrate one such priority assignment in Figure 1. We will write $(ij) < (i'j')$ to mean that activity $(ij)$ has higher priority than activity $(i'j')$. It follows from the priority assignment algorithm that $i < i'$ (customer class $i$ has higher priority than $i'$) implies $(ij) < (i'j')$. In particular, if $j = j'$, we have $(ij) < (i'j)$ if and only if $i < i'$. Without loss of generality, we shall assume that the server classes are numbered so that the lowest-priority activity is $(IJ)$. (In Figure 1, this corresponds to assigning the number 3 to server pool C.)
Figure 1: An example assignment of priorities to customer classes and activities for an example network. Circles represent customer classes, squares represent server pools.
Now we define the LAP policy itself. It consists of two parts: routing and scheduling. "Routing" determines where an arriving customer goes if it sees available servers of several different types. "Scheduling" determines which waiting customer a server picks if it sees customers of several different types waiting in queue.
Routing: An arriving customer of type $i$ picks an unoccupied server in the pool $j \in S(i)$ such that $(ij) \le (ij')$ for all $j' \in S(i)$ with idle servers. If no server pools in $S(i)$ have idle servers, the customer queues.
Scheduling: A server of type $j$ upon completing a service picks the customer from the queue of type $i \in C(j)$ such that $i \le i'$ for all $i' \in C(j)$ with $Q_{i'} > 0$. If no customer types in $C(j)$ have queues, the server remains idle.
We introduce the following notation (for the system with scaling parameter $r$): $\Psi^r_{ij}(t)$, the number of servers of type $j$ serving customers of type $i$ at time $t$; $Q^r_i(t)$, the number of customers of type $i$ waiting for service at time $t$.
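The routing and scheduling rules can be written as two small decision functions. The sketch below is illustrative only: the names `route` and `schedule`, the dictionary-based state, and the example priority table `RANK` (rank 0 being the highest priority) are assumptions made for this example. Scheduling is expressed through activity ranks, which is equivalent to comparing customer class priorities because, for a fixed pool $j$, $(ij) < (i'j)$ if and only if $i < i'$.

```python
# Hypothetical priority table for the path x - A - y - B - z - C:
# a smaller rank means a higher priority.
RANK = {
    ("x", "A"): 0,
    ("y", "A"): 1,
    ("y", "B"): 2,
    ("z", "B"): 3,
    ("z", "C"): 4,
}


def route(i, idle, rank=RANK):
    """Pool chosen by an arriving type-i customer, or None if it must queue.

    idle: dict mapping pool -> number of idle servers.
    """
    candidates = [
        (r, j) for (ci, j), r in rank.items()
        if ci == i and idle.get(j, 0) > 0
    ]
    return min(candidates)[1] if candidates else None


def schedule(j, queue, rank=RANK):
    """Customer type taken next by a pool-j server that has just become
    free, or None if the server stays idle.

    queue: dict mapping customer type -> queue length.
    """
    candidates = [
        (r, ci) for (ci, cj), r in rank.items()
        if cj == j and queue.get(ci, 0) > 0
    ]
    return min(candidates)[1] if candidates else None


# Type y prefers pool A (rank 1) over pool B (rank 2) when both have idle servers.
print(route("y", {"A": 1, "B": 3}))
# A pool-B server prefers queued type y (rank 2) over type z (rank 3).
print(schedule("B", {"y": 1, "z": 2}))
```

Because the ranks are distinct, `min` never has to compare the labels themselves, so classes and pools may be named by any hashable values.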

2.3. LAP equilibrium point.
Informally speaking, the equilibrium point $(\psi^*_{ij}, q^*_i)_{(ij) \in E,\, i \in \mathcal{I}}$ is the desired operating point for the (fluid scaled) vector $(\Psi^r_{ij}/r, Q^r_i/r)_{(ij) \in E,\, i \in \mathcal{I}}$ of occupancies and queue lengths under the LAP policy. Specifically, we will be showing that in steady state the fluid-scaled vector converges in distribution to the equilibrium point, and will then show that the deviations from it are small. We define the equilibrium point below; it will be the stationary point of the fluid models defined in Section 4.
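The LAP discipline is not designed with load balancing in mind, so its equilibrium point does not, of course, achieve load balancing among the server pools. To define it, we recursively define the quantities $\lambda_{ij} \ge 0$, which have the meaning of routing rates, scaled down by factor $1/r$. (These $\lambda_{ij}$ are not the same as those given by the optimal solution to the SPP (1).) For the activity $(1j)$ with the highest priority, define either $\lambda_{1j} = \lambda_1$ and $\psi^*_{1j} = \lambda_1/\mu_{1j}$, or $\psi^*_{1j} = \beta_j$ and $\lambda_{1j} = \beta_j \mu_{1j}$, according to whichever $\lambda_{1j}$ is smaller. Replace $\lambda_1$ by $\lambda_1 - \lambda_{1j}$ and $\beta_j$ by $\beta_j - \psi^*_{1j}$, and remove the edge $(1j)$ from the tree. We now proceed similarly with the remaining activities. Formally, set
$$\lambda_{ij} = \min\Big\{ \lambda_i - \sum_{j' :\, (ij') < (ij)} \lambda_{ij'},\ \ \mu_{ij}\Big(\beta_j - \sum_{i' :\, (i'j) < (ij)} \lambda_{i'j}/\mu_{i'j}\Big) \Big\}.$$
Since the definition is in terms of higher-priority activities, this defines the $(\lambda_{ij})_{(ij) \in E}$ uniquely. The LAP equilibrium point is defined to be the vector given by
$$\psi^*_{ij} = \lambda_{ij}/\mu_{ij}, \quad (ij) \in E; \qquad q^*_i = 0, \quad i \in \mathcal{I}. \tag{3}$$
(Since we're in the underloaded case $\rho < 1$, all queues should be 0 at equilibrium.) To avoid trivial complications, throughout the paper we make the following assumption:
Assumption 7. If $(\psi_{ij})_{(ij) \in E}$ are such that $\sum_j \mu_{ij} \psi_{ij} = \lambda_i$ for all $i$, and $\sum_i \psi_{ij} \le \beta_j$ for all $j$, then $\psi_{ij} > 0$ for all $(ij) \in E$. In particular, the equilibrium point satisfies this condition and, moreover, it is such that $\sum_i \psi^*_{ij} = \beta_j$ for all $j < J$ and $\sum_i \psi^*_{iJ} < \beta_J$.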
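The recursive definition of the equilibrium point amounts to a greedy pass over the activities in priority order: each activity takes as much of the remaining arrival rate as the remaining pool capacity allows. A minimal Python sketch follows; the function name `lap_equilibrium` and the numerical example (unit service rates, unit pool sizes, and the activity order of a hypothetical path-shaped tree) are assumptions made for illustration only.

```python
from fractions import Fraction


def lap_equilibrium(ranked_edges, lam, beta, mu):
    """Greedy computation of routing rates and equilibrium occupancies.

    ranked_edges: activities (i, j), highest priority first.
    lam, beta:    dicts of arrival rates and pool sizes (fluid scale).
    mu:           dict mapping (i, j) -> service rate.
    Returns (routing, psi_star), both dicts keyed by activity.
    """
    lam_left = {i: Fraction(v) for i, v in lam.items()}
    beta_left = {j: Fraction(v) for j, v in beta.items()}
    routing, psi_star = {}, {}
    for (i, j) in ranked_edges:
        # Either all remaining type-i flow fits, or pool j is filled up.
        rate = min(lam_left[i], beta_left[j] * mu[(i, j)])
        routing[(i, j)] = rate
        psi_star[(i, j)] = rate / mu[(i, j)]
        lam_left[i] -= rate
        beta_left[j] -= psi_star[(i, j)]
    return routing, psi_star


# Hypothetical example on the path x - A - y - B - z - C.
ranked = [("x", "A"), ("y", "A"), ("y", "B"), ("z", "B"), ("z", "C")]
lam = {"x": Fraction(1, 2), "y": 1, "z": Fraction(6, 5)}
beta = {"A": 1, "B": 1, "C": 1}
mu = {e: 1 for e in ranked}

routing, psi_star = lap_equilibrium(ranked, lam, beta, mu)
for e in ranked:
    print(e, psi_star[e])
```

In this example every activity receives a strictly positive occupancy, pools A and B come out fully occupied, and only the pool of the lowest-priority activity keeps spare capacity, which is the structure described in Assumption 7.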
The assumption means that the system needs to employ (on average) all activities in order to be able to handle the load. It holds, for example, whenever $\rho$ is sufficiently close to 1. Indeed, suppose the arrival rates $(\lambda_i)_{i \in \mathcal{I}}$ are such that $\rho = 1 - \epsilon$, and consider the system with arrival rates $\tilde\lambda_i = \frac{1}{1-\epsilon}\lambda_i$. The basic activity tree $E$ will be the same for $(\tilde\lambda_i)_{i \in \mathcal{I}}$ as for $(\lambda_i)_{i \in \mathcal{I}}$. Since $\rho = 1$ for $(\tilde\lambda_i)_{i \in \mathcal{I}}$ and CRP holds, there exists a unique set of $(\tilde\psi_{ij})_{(ij) \in E}$ that satisfies the conditions, and it has $\tilde\psi_{ij} > 0$ for all $(ij) \in E$. Also, if $\epsilon$ is sufficiently small, we must have $\psi_{ij} \approx \tilde\psi_{ij}$ for all $(ij) \in E$, and hence $\psi_{ij} > 0$ for all $(ij) \in E$.
Remark 8. Assumption 7 is technical: our main result, Theorem 10, can be proved without it, by following the approach presented in the paper. However, it simplifies the statements and proofs of many auxiliary results, and thus substantially improves the exposition.
The symbol $\Longrightarrow$ denotes convergence in distribution of random variables in the Euclidean space $\mathbb{R}^d$ (with appropriate dimension $d$). The symbol $\to$ denotes ordinary convergence in $\mathbb{R}^d$. The standard Euclidean norm of a vector $x \in \mathbb{R}^d$ is denoted $|x|$, while $\|x\|$ denotes the $L_1$-norm (sum of absolute values of the components); u.o.c. means uniform on compact sets convergence of functions, with the domain defined explicitly or by the context. For $x \in \mathbb{R}$, $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$.
3. Main result. We are now in a position to state our main result.
Theorem 10. Consider the sequence of systems under the LAP policy, in the scaling regime and under the assumptions specified in Section 2, with $\rho < 1$. Then:
(i) For all sufficiently large $r$, the system is stable, i.e. the countable Markov chain $(\Psi^r_{ij}(\cdot), Q^r_i(\cdot))$ is positive recurrent.
(ii) For any $\epsilon > 0$, the sequence of invariant distributions, centered at the equilibrium point and scaled down by $r^{-(1/2+\epsilon)}$, is tight.
The proof is given in the rest of the paper, and consists roughly of two stages. First, we study the process under the fluid scaling, which allows us to prove stability and statement (ii) for $\epsilon = 1/2$. Then we need a more detailed analysis, involving hydrodynamic and local-fluid scaling of the process, to prove (ii) for any $\epsilon > 0$.
Throughout the paper, we will use the following additional notation for the system variables. For a system with parameter $r$, we denote: $X^r_i(t) = Q^r_i(t) + \sum_j \Psi^r_{ij}(t)$ is the total number of customers of type $i$ in the system at time $t$; $A^r_i(t)$ is the total number of exogenous arrivals of customers of type $i$ into the system in interval $[0, t]$; $D^r_{ij}(t)$ is the total number of customers of type $i$ that completed the service in pool $j$ (and departed the system) in interval $[0, t]$; $\Xi^r_{ij}(t)$ is the total number of customers of type $i$ that entered service in pool $j$ in interval $[0, t]$. There are some obvious relations between realizations of these processes: for example, $X^r_i(t) = X^r_i(0) + A^r_i(t) - \sum_j D^r_{ij}(t)$ for each $i$; $\Psi^r_{ij}(t) = \Psi^r_{ij}(0) + \Xi^r_{ij}(t) - D^r_{ij}(t)$ for each $j \in S(i)$; and so on.
We can and will assume that a random realization of the system with parameter $r$ is determined by its initial state and realizations of "driving" unit-rate, mutually independent Poisson processes $\Pi$ (for the arrival and service completion flows); the driving Poisson processes are common for all $r$.
4. Fluid scaling.We begin by analyzing the LAP discipline on the fluid scale.Namely, consider the scaling Proposition 11.Suppose (ψ r ij (0), q r i (0)) → (ψ ij (0), q i (0)).Then, w.p.1, for any subsequence r → ∞ there exists a further subsequence along which of Lipschitz continuous functions satisfying conditions (4).The conditions involving derivatives are to be satisfied at all regular points of the limiting set of functions.(A time point t ≥ 0 is regular if both minimum and maximum of any subset of component functions have derivatives at t.All points t ≥ 0 are regular, except a subset of zero Lebesgue measure.) The fluid model conditions are: whenever q i (t) > 0 (and then necessarily Proof of Proposition 11.The proof of convergence fact and of the basic conditions (4a)-(4d) of the limit, is very standard.Indeed, it follows from the Functional Strong Law of Large Numbers (FSLLN) for the driving processes, and the scaling applied, that w.p.1 each component function is asymptotically Lipschitz.For example, for each scaled departure process we have: w.p.1, for a fixed large C > 0 and any 0 ≤ t 1 < t 2 < ∞, This implies that, w.p.1 any subsequence of r has a further subsequence along which a u.o.c.convergence Similar convergence property holds for each arrival process.From here we obtain the convergence (along a subsequence) for all other components.Then, relations (4a)-(4d) are inherited from the corresponding conservation laws for the pre-limit trajectories.
Properties (4e)-(4i) easily follow from the priority rule of LAP; it suffices to consider the behavior of pre-limit trajectories in a small time interval [t, t + δ] when r sufficiently large.(See e.g.[5,Theorem 1] for this type of argument.) Finally, to show (4j) we recall that, by the priority assignment procedure, for the activity (ij): either (ij) has the lowest priority among activities associated with customer class i or (ij) has the lowest priority among activities associated with server pool j (or both).Taking into account that point t is regular (which in particular implies q i (t) = 0 and (d/dt) k ψ kj (t) = 0), we easily see that in the former case the only possibility is that and in the latter case we must have This implies (4j).We omit further details of the proof which are, again, rather standard.
We call any Lipschitz solution of conditions (4) a fluid model of the system with initial state $(\psi_{ij}(0), q_i(0))$; a set $(\psi_{ij}(\cdot), q_i(\cdot))$ that is a projection of a fluid model we often call a fluid model as well.
Remark 12. It will not be important for the results in the paper whether the fluid model with given initial conditions is unique; all that will matter is the long-term behavior of all fluid models with given initial conditions.
Proposition 13.For any > 0 and any K > 0 there exists a finite time T = T (K) such that all fluid models whose starting state satisfies |(ψ ij (0), q i (0))| ≤ K have i ψ ij (t) = β j , ∀j < J, q i (t) = 0, ∀i ∈ I, and Sketch of proof.For the highest priority activity (1j) there are two cases.Case a: Type 1 is a leaf.In this case j is the unique server to which type 1 jobs are allowed to go, and they have the highest priority there.Pick a small δ > 0. After a finite time (uniformly bounded above, across all starting states as in the proposition statement), the condition ψ 1j (t) ≥ ψ * 1j − δ must hold, because ψ 1j (t) < ψ * 1j − δ implies that (d/dt)ψ 1j (t) is positive and bounded away from 0. After such time, q 1 (t) > 0 implies i ψ i j (t) = β j and (recall that δ is small) λ 1 ≤ µ 1j ψ 1j (t)−δ 1 for some δ 1 > 0; and therefore (d/dt)q 1 (t) ≤ −δ 1 .We conclude that after a finite time (uniformly bounded above) we must have q 1 (t) = 0.This in turn implies that (d/dt)ψ 1j (t) is negative and bounded away from 0 as long as ψ 1j (t) > ψ * 1j + δ.Thus, |ψ 1j (t) − ψ * 1j | ≤ δ and q 1 (t) = 0 after a bounded time.Case b: Pool j is a leaf.Then Assumption 7 implies ψ * 1j = β j and λ 1 > µ 1j β j .In this case, ψ 1j (t) = ψ * 1j starting at some time (that is bounded uniformly on initial states), simply because (d/dt)ψ 1j (t) ≥ λ 1 − µ 1j β j > 0 as long as ψ 1j (t) < β j .We see that, in either case a or b, for arbitrarily small δ > 0, there exists We proceed by induction on the activity priorities and, using Assumption 7, easily establish analogous properties for every activity (i j ).This implies the result.We omit details.
Theorem 14. For all sufficiently large $r$, the LAP discipline stabilizes the network (in the sense of positive recurrence of the underlying Markov process). Moreover, the sequence of invariant distributions of $(\psi^r_{ij}, q^r_i)$ is tight, and the invariant distributions converge weakly to the point mass at the equilibrium point.
Before we proceed with the proof, we need the following lemma.
Lemma 15.There exists T 1 > 0 such that for any T 2 > T 1 there exists a sufficiently large C = C(T 2 ) for which the following holds.For any > 0, as r → ∞, uniformly on initial states with max i∈I q r i (0) ≥ C.
In turn, to prove this lemma, we will need to use fluid models with infinite initial states. Note that we cannot appeal directly to the properties of "standard" fluid models defined earlier, because we require convergence that is uniform in all large initial states. So, we need the following version of a fluid limit result. We will use the notation $\bar{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$ for the one-point compactification of $\mathbb{R}$.
Proposition 16.Consider a sequence of fluid-scaled processes (ψ r ij (•), q r i (•)) with deterministic initial states such that (ψ r ij (0), q r i (0)) = C (r) → ∞ and (ψ r ij (0), q r i (0)) → (ψ ij (0), q i (0)), where each q r i (0) and q i (0) is viewed as an element of R. Partition the customer classes as I = I ∞ ∪ I 0 , where q i (0) = ∞ for i ∈ I ∞ , and q i (0) < ∞ for i ∈ I 0 .(Necessarily, I ∞ is non-empty.)Then, with probability 1, any subsequence of trajectories has a further subsequence which converges u.o.c. to a fluid model, satisfying same conditions as (4), except that for all i ∈ I ∞ the queue length q i (t) = ∞, ∀t ≥ 0.Moreover, all such fluid models are such that, uniformly on all of them, starting at some finite time T 1 , all server pools are fully occupied: i ψ ij (t) = β j , t ≥ T 1 , ∀j.
Proof of this result is very similar to that of Proposition 13 (and in fact simpler), so it is not spelled out here. We just note that Assumption 7 is essential in showing that all server pools are occupied after a finite time. Without the assumption, we could still show that the occupancy becomes strictly greater than at the equilibrium point, and that would be enough for our purposes; however, it would make the statement and proof of Proposition 16 more cumbersome.
Proof of Lemma 15.Let us choose T 1 = 2T 1 , where T 1 is as in Proposition 16.Now, if the lemma statement would not hold, then for some fixed > 0 we could find a sequence of systems with (ψ r ij (0), q r i (0 This, however, is impossible because, by Proposition 16, w.p.1 from any subsequence of r we can find a further subsequence such that: and therefore Proof of Theorem 14. Recall that ν i > 0 is the workload associated with a single request of type i; i.e., the optimal dual variable associated with (1c) for type i.We consider the quantity (where x r i (t) = i q r i (t) + ij ψ r ij (t)), the total workload of the system.We will argue that the quantity will serve as a Lyapunov function for the rth system.Namely, the following property holds: there exist positive constants K, T , C 1 , C 2 , C 3 such that, for all sufficiently large r, (The proof of ( 5)-( 6) is given after we complete the theorem proof.)It is then a standard application of the Foster-Lyapunov criteria to conclude that for all sufficiently large r the system Markov process is positive recurrent, and moreover, the stationary distributions are such that EW r = i ν i Ex r i remains uniformly (in r) bounded.Indeed, for any fixed initial state of the process, consider the embedded chain at times 0, T, 2T, . ... It easily follows (using the fact that each input flow is Poisson, and fluid scaling is applied) that 0 ≤ EL r (nT ) = E[W r (nT )] 2 < ∞ for all n = 0, 1, 2, . ... Also, WLOG, by rechoosing if necessary C 1 and C 2 , we can assume that the "then" part of ( 5) holds for any L r (t).We see that, for any n, from here the positive recurrence and steady-state bound EW r ≤ C 2 /C 1 easily follow, because the opposite would imply EL r (nT ) → −∞ as n → ∞.Uniform bound on EW r implies tightness of invariant distributions.The tightness together with Proposition 13 imply that the sequence of invariant distributions must weakly converge to the point mass at equilibrium.
It remains to show property ( 5)-( 6).First, it is easy to see (and is a standard observation) that ( 7) ∀T > 0, E[W r (t + T ) − W r (t)] 2 are uniformly bounded across all r and t.This guarantees (6) for any fixed K. To prove (5), we fix T 1 > 0 as in Lemma 15, and then choose a large fixed T > T 1 .Note that (min in particular, the condition max i∈I q r i (0) → ∞ in Lemma 15 is equivalent to W r (0) → ∞.If we fix a sufficiently small > 0 and apply Lemma 15, we obtain the following fact: for a sufficiently large fixed K > 0 (as a function of T ), uniformly on all L r (0) > K and all large r, Indeed, the 2ρT 1 is a crude upper bound on W r (T 1 ) − W r (0), which holds with high probability (w.h.p.) for large r, since by (2) new workload arrives at average rate ρ (in the fluid-scaled system).The term is an upper bound on W r (T ) − W r (T 1 ), also holding w.h.p., because by Lemma 15 the average rate at which workload leaves the system is w.h.p.
close to 1 in $[T_1, T]$; and recall that $\rho < 1$. This proves (8). The RHS of the first inequality in (8) is negative if we choose $T$ large enough. This, along with (7), implies (5), since we have the identity

5. Proof of Theorem 10(ii).

5.1. Preliminaries. In the previous section we have shown that the process $(\Psi^r_{ij}(\cdot), Q^r_i(\cdot))$ is positive recurrent, and thus has a unique stationary (or invariant) distribution for all large $r$ (which proved Theorem 10(i)). Moreover, Here and in the rest of the paper So, we know that Theorem 10(ii) is true for $\epsilon = 1/2$, and our goal is to prove it for any $\epsilon > 0$. In what follows $0 < \epsilon < 1/2$ is fixed.
We will prove that there exist positive constants C and T , such that for any fixed δ 1 > 0 the following holds for all sufficiently large r: This fact, along with (10), implies that for all large r, in steady-state, This clearly proves Theorem 10(ii), because δ, δ 1 can be chosen, and rechosen, to be arbitrarily small.So, the rest of Section 5 is the proof of (11), with the final part of the proof given in Section 5.4.We will need FSLLN-type results, which can be obtained from a strong approximation of Poisson processes, available e.g. in [3, Chapters 1 and 2]: Proposition 17.A unit rate Poisson process Π(•) and a standard Brownian motion W (•) can be constructed on a common probability space in such a way that the following holds.For some fixed positive constants C 1 , C 2 , C 3 , such that ∀T > 1 and ∀u ≥ 0 From here, for the unit rate Poisson processes Π i (•), for example, we replace t with λ i rt; T with λ i rT log r; and u with r 1/4 .)Proposition 18.For any fixed T > 0 and any subsequence of r → ∞, we can find a further subsequence (with r increasing sufficiently fast), such that: Let F r (t) be the process of (unscaled) deviations from equilibrium; that is, ). Suppose we have a function h(r), such that r 1/2+ ≤ h(r) ≤ g(r).(The quantity h(r) will be the "scale" of |F r (0)|; sometimes, we simply use h(r) = |F r (0)|, but not necessarily.)We will establish properties of F r (•) under two different scalings, called hydrodynamic and local-fluid.
We remark that the use of multiple scalings (in addition to the "standard" fluid scaling) is typical in the analysis of systems in many-server asymptotic regime, cf.[4] and references therein.However, our hydrodynamic and localfluid scalings are somewhat unusual in that the scaling factor h(r) is strictly "between" r and r 1/2 .(When h(r) = r, both local-fluid and hydrodynamic scalings become the standard fluid scaling; if h(r) = r 1/2 , the local-fluid scaling becomes the standard diffusion scaling.)Also, the system behavior, of course, depends on the control discipline, LAP in our case; and so our analysis of LAP under various scalings is new.Most importantly, the way we use these multiple scalings for the purposes of proving tightness of stationary distributions is novel, to the best of our knowledge.5.2.Hydrodynamic scaling.Consider the process under the following scaling and centering: Theorem 19.Consider a sequence of deterministic realizations, such that the driving realizations satisfy FSLLN conditions, namely: Suppose (ψ r ij (0), q r i (0)) → (ψ ij (0), q i (0)).Then, for any subsequence of r there exists a further subsequence along which (ψ of Lipschitz continuous functions satisfying conditions (15).(The conditions involving derivatives are to be satisfied at regular time points t ≥ 0 of the limiting set of functions.) The hydrodynamic model conditions are: i ψ ij (t) = 0, whenever q i (t) > 0 for at least one i ∈ C(j) whenever q i (t) > 0 (and then necessarily whenever k ψ kj (t) < 0 (and then necessarily q i (t) = 0) (15j) There is a clear correspondence between the hydrodynamic model and fluid model conditions.This is not surprising, of course, -the hydrodynamic limit is also an FSLLN-type limit, but on a different, finer time and space scale.We omit the proof of Theorem 19 -it is analogous to that of Proposition 11.
We call any Lipschitz solution of (15) a hydrodynamic model (HM) of the system with initial state $(\psi_{ij}(0), q_i(0))$; a set $(\psi_{ij}(\cdot), q_i(\cdot))$ that is a projection of an HM we often call a hydrodynamic model as well. Also, we sometimes use the shorter notations $f^r(\cdot) = (\psi^r_{ij}(\cdot), q^r_i(\cdot))$ and $f(\cdot) = (\psi_{ij}(\cdot), q_i(\cdot))$. We have the following corollary of Theorem 19, which we record for future reference.
Corollary 20. For any fixed $T > 0$, $K > 0$ and $\delta_2 > 0$, there exists a sufficiently small $\delta_3 > 0$ such that the following holds. Uniformly on all $|f^r(0)| \le K$ and all sufficiently large $r$, conditions

where $f(\cdot)$ is a hydrodynamic model.
Proof. Suppose not. Fix $T$, $K$, $\delta_2$. There must exist a sequence $\delta_3 \downarrow 0$, and a corresponding sequence $r = r(\delta_3) \uparrow \infty$, such that (16), (17) and the convergence $f^r(0) \to f(0)$ of initial states hold, but (18) fails for any hydrodynamic model. This, however, is impossible, because according to Theorem 19 (or rather its version specialized to a finite time interval, to be precise) we can choose a further subsequence of $r$ along which $f^r(t) \to f(t)$, uniformly in $[0, T]$, where $f(\cdot)$ is a hydrodynamic model starting from $f(0)$.
Proof. Consider a fixed HM $f(\cdot)$. Consider the highest priority activity $(1j)$. There are two possible cases: $j$ is a leaf or $1$ is a leaf.

Case a: If $j$ is a leaf, then $\psi_{1j}(t) \le 0$ at all times, and $\psi_{1j}(t)$ must increase at a positive rate, bounded away from 0, until it reaches 0 within a finite time. Thereafter, $\psi_{1j}(t)$ will stay at 0. (The argument is very similar to Case b in the proof of Proposition 13.)

Case b: If type 1 is a leaf, then $q_1(t)$ must decrease and $\psi_{1j}(t)$ increase at the same rate (positive, bounded away from 0), until the entire queue (if any) "relocates into" $\psi_{1j}$; after that time, $\psi_{1j}(t)$ and $q_1(t) = 0$ will not change.
We see that in either case a or b, after a finite time, the highest priority activity $(1j)$ can be in a sense "ignored". This allows us to proceed by induction on the activities, from the highest priority to the lowest, to check that by some finite time $T$ (depending on $K$) the hydrodynamic model gets into a state $f(T)$ satisfying the conditions of the theorem, and will stay in this state for all $t \ge T$. Since all HMs are uniformly Lipschitz, we obviously have a uniform bound $\max_{t \ge 0} |f(t)| \le CK$ for some $C$.
Furthermore, since all $x_i(t)$ do not change with time, the linear mapping $L$ is as follows: $L(u_{ij}, w_i) = (c_{ij}, 0)$, where $(c_{ij})$ is the unique solution to (19a)

Remark 22. Examination of the proof of Theorem 21 reveals that the HM for any initial state is in fact unique. Moreover, with a little further argument, it is easy to show that an HM depends on the initial state continuously. Furthermore, the HMs are scalable: if $(f(t), t \ge 0)$ is an HM, then so is $(f(ct)/c, t \ge 0)$ for any $c > 0$. From here, it is easy to find that the theorem statement holds for a constant $C$ independent of $K$ and for $T = CK$. We will not need these stronger properties in this paper.
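The scaling property in Remark 22 can be illustrated on a toy one-dimensional trajectory modeled on Case a of the proof above: a nonpositive quantity increases at a constant rate until it reaches 0 and then stays there. This is a hedged sketch only. The constant drift `rate` and the function names are assumptions made for illustration (the proof only requires the rate to be bounded away from 0), and the toy trajectory is not claimed to be an HM of any particular system.

```python
def toy_trajectory(t, psi0, rate=2.0):
    """Toy Case-a path: starts at psi0 <= 0, drifts up at `rate` until it hits 0, then is frozen."""
    return min(0.0, psi0 + rate * t)


def rescaled(t, psi0, c, rate=2.0):
    """The rescaled path f(ct)/c from Remark 22, applied to the toy path."""
    return toy_trajectory(c * t, psi0, rate) / c


def hitting_time(psi0, rate=2.0):
    """First time the toy path reaches 0; it is linear in |psi0|."""
    return abs(psi0) / rate


if __name__ == "__main__":
    psi0, c = -6.0, 3.0
    for t in (0.0, 0.5, 1.0, 2.0):
        # The rescaled path coincides with the toy path started from psi0 / c.
        assert abs(rescaled(t, psi0, c) - toy_trajectory(t, psi0 / c)) < 1e-12
    # Doubling the initial deviation doubles the time to reach 0.
    print(hitting_time(psi0), hitting_time(2 * psi0))  # prints: 3.0 6.0
```

In this toy case the time to reach the absorbing state is proportional to the size of the initial state, which is the one-dimensional analogue of taking $T = CK$.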
For future reference, note that $L(u_{ij}, w_i) = (c_{ij}, 0)$ is a function only of the vector $(z_i)$, where $z_i = w_i + \sum_j u_{ij}$. The corresponding linear mapping from $(z_i)$ to $(c_{ij})$, we denote L .

5.3. Local-fluid scaling. The process under local-fluid scaling is as follows. For each $r$ consider

We will also denote … Since the process is centered before it is scaled in space, we in particular have (by Assumption 7) that $\sum_i \tilde\psi^r_{ij}(t) \le 0$ for all $j < J$ at all times $t$.
The local fluid model conditions are as follows:

… (24a)

$\tilde q_i(t) = 0, \quad \forall i \in I$

The $I + J - 1$ equations for the $I + J - 1$ functions $(\tilde\psi_{ij}(\cdot))$ can be solved sequentially, in order of decreasing activity priority, since the highest unsolved-for priority will always correspond to either a customer-type or a server-type leaf of the remaining activity tree. Any Lipschitz trajectory satisfying (24) we will call a local fluid model (LFM). Conditions (24) reduce to a system of linear ODEs for $(\tilde\psi_{ij}(t))$, which of course implies the continuous dependence on the initial state; the fact that each LFM converges to 0 is easily established, again by induction on activities; therefore, we obtain the uniform exponential bound (23). Analogously to $\tilde f^r(\cdot) = (\tilde\psi^r_{ij}(\cdot), \tilde q^r_i(\cdot))$, we will use the shorter notation $\tilde f(\cdot) = (\tilde\psi_{ij}(\cdot), \tilde q_i(\cdot))$.
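The mechanism described above (a linear system in which each component depends only on higher-priority components, so that decay to 0 follows by induction) can be sketched on a generic lower-triangular linear ODE. This is a toy illustration, not the system (24): the coefficients `DECAY` and `COUPLING`, the initial state, and the explicit Euler discretization are all assumptions chosen for the example.

```python
import math


def euler_step(x, decay, coupling, dt):
    """One explicit Euler step for dx_k/dt = -decay[k]*x_k + sum_{l<k} coupling[k][l]*x_l.

    Component k is driven only by components l < k, mimicking a system that can be
    solved one component at a time in priority order.
    """
    new_x = []
    for k in range(len(x)):
        drift = -decay[k] * x[k] + sum(coupling[k][l] * x[l] for l in range(k))
        new_x.append(x[k] + dt * drift)
    return new_x


def sup_norm(x):
    return max(abs(v) for v in x)


def simulate(x0, decay, coupling, horizon, dt=1e-3):
    """Return the list of (time, state) pairs produced by the Euler scheme."""
    x = list(x0)
    path = [(0.0, list(x0))]
    for n in range(1, int(round(horizon / dt)) + 1):
        x = euler_step(x, decay, coupling, dt)
        path.append((n * dt, x))
    return path


# Hypothetical coefficients: component 0 decays on its own; component 1 decays
# faster but is fed by component 0.
DECAY = [1.0, 2.0]
COUPLING = [[0.0, 0.0], [1.0, 0.0]]
path = simulate([3.0, -1.0], DECAY, COUPLING, horizon=5.0)

if __name__ == "__main__":
    t_end, x_end = path[-1]
    # Compare the final sup-norm with an exponential envelope |x(0)| * exp(-t).
    print(t_end, sup_norm(x_end), 3.0 * math.exp(-t_end))
```

With these particular coefficients the sup-norm contracts by a factor of at most $1 - dt$ per Euler step, so the discrete trajectory stays under the envelope $|x(0)| e^{-t}$; other coefficient choices would give a different constant and rate.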
Proof of Theorem 23. The non-trivial part of the proof is showing the Lipschitz property of the limit $\tilde f(\cdot)$, because it is no longer a simple consequence of the FSLLN for the driving processes (as it was for the fluid and hydrodynamic limits). This is because the arrival and service rates in the system (with index $r$) are $O(r)$, while the space is scaled down by $h(r) = o(r)$. For the same reason, it is also not "automatic" that the limit queues $\tilde q_i(\cdot)$ stay at 0.

This difficulty is resolved as follows. Consider an arbitrary number $C_4 > \|(\tilde\psi_{ij}(0))\|$, and the random time $\tau(r) = \min\{t \mid \|(\tilde\psi^r_{ij}(t))\| \ge C_4\}$. Then, speaking informally (the formal statements are given below), the trajectory $\tilde x^r_i(\cdot)$ for each $i$ must be "almost Lipschitz" in the interval $[0, \tau(r)]$, with the Lipschitz constant $\eta = C_4 \|(\mu_{ij})\|$, because the absolute difference between the arrival and departure rates (scaled down by $h(r)$) is upper bounded by $\eta$ in $[0, \tau(r)]$; similarly, each queue length trajectory $\tilde q^r_i$ …

Formally, it is easy to show the following: if $\tau(r) \to 0$ along some subsequence, then (denoting $\tilde x_i(0) = \sum_j \tilde\psi_{ij}(0)$)

… (25)

$\sup \|(\tilde q^r_i(t)) - (\tilde q_i(0))\| \to 0$.
Next, in addition to (26), we show that

Finally, as already observed earlier, the linear ODE (24) solutions satisfy condition (23). In particular, each local fluid model remains bounded in $[0, \infty)$. This in turn allows us to conclude that by choosing a sufficiently large $C_4$, the corresponding 4 can be arbitrarily large. This completes the proof.
We will actually need a generalized version of Theorem 23.
Theorem 24. Consider a sequence of deterministic realizations, such that the driving realizations satisfy (20)-(21). Assume that the initial states converge to a fixed vector, $f^r(0) \to f^{\bullet}(0)$. (We do not assume $f^{\bullet}(0) = L f^{\bullet}(0)$.) Then, for any subsequence of $r$ there exists a further subsequence along which

Then, we consider local-fluid scaled trajectories starting at time point $T_5 h(r)/r$ (as opposed to 0), and the rest of the proof is essentially the same as that of Theorem 23.
The following corollary is derived from Theorem 24 analogously to the way Corollary 20 was derived from Theorem 19.