Many-server queues with customer abandonment: numerical analysis of their diffusion models

We use multidimensional diffusion processes to approximate the dynamics of a queue served by many parallel servers. The queue is served in the first-in-first-out (FIFO) order and the customers waiting in queue may abandon the system without service. Two diffusion models are proposed in this paper. They differ in how the patience time distribution is built into them. The first diffusion model uses the patience time density at zero and the second one uses the entire patience time distribution. To analyze these diffusion models, we develop a numerical algorithm for computing the stationary distribution of such a diffusion process. A crucial part of the algorithm is to choose an appropriate reference density. Using a conjecture on the tail behavior of a limit queue length process, we propose a systematic approach to constructing a reference density. With the proposed reference density, the algorithm is shown to converge quickly in numerical experiments. These experiments also show that the diffusion models are good approximations for many-server queues, sometimes for queues with as few as twenty servers.

1. Introduction.The focus of this paper is the numerical analysis of multidimensional diffusion processes that approximate the dynamics of a queue with many parallel servers.A many-server queue serves as a building block modeling operations of a large-scale service system.Such a service system may be a call center with hundreds of agents, a hospital department with tens or hundreds of inpatient beds, or a computer cluster with many processors.When the customers of a service system are human beings, some of them may abandon the system before their service begins.The phenomenon of customer abandonment is ubiquitous because no one would wait for service indefinitely.As argued in Garnett, Mandelbaum and Reiman (2002), one must model customer abandonment explicitly in order for an operational model to be relevant for decision making.We model customer abandonment by assigning each customer a patience time.When a customer's waiting time for service exceeds his patience time, he abandons the queue without service.
The exact analysis of such a many-server queue has been largely limited to an M/M/n+M model (also called an Erlang-A model) that has a Poisson arrival process and exponential service and patience time distributions.See, e.g., Garnett, Mandelbaum and Reiman (2002).However, as pointed out by Brown et al. (2005), the service time distribution in a call center appears to follow a log-normal distribution.Such distributions have also been observed by Shi et al. (2010) for lengths of stay in a hospital.Moreover, the patience time distribution in a call center has been observed to be far from exponential by Zeltyn and Mandelbaum (2005).With a general service or patience time distribution, there is no finite-dimensional Markovian representation of the queue.Except computer simulations, there is no method to exactly analyze such a queue either analytically or numerically.To deal with the challenge, the following strategies are adopted in this paper for analyzing a many-server queue.
First, the service time distribution is restricted to be phase-type.Since phase-type distributions can be used to approximate any positive-valued distribution, such a queueing model is still relevant to practical systems.We focus on a GI/P h/n + GI queue with n identical servers.The first GI indicates that the customer interarrival times are independent and identically distributed (iid) following a general distribution, the P h indicates that the service times are iid following a phase-type distribution, and the +GI indicates that the patience times are iid following a general distribution.Second, we are particularly interested in a queue operating in the Quality-and Efficiency-Driven (QED) regime: The queue has a large number of servers and the arrival rate is high; the arrival rate and the service capacity are approximately balanced so that the mean waiting time is relatively short compared with the mean service time.As argued in Garnett, Mandelbaum and Reiman (2002), such a system has high server utilization as well as short customer waiting times and a small fraction of abandonment.Therefore, both quality and efficiency can be achieved in this regime.Third, rather than analyzing the many-server queue itself, we propose and analyze diffusion models that approximate the queue.Two diffusion models are proposed in this paper.In each model, a multidimensional diffusion process is used to represent the scaled customer numbers among service phases.The difference between the two diffusion models lies in how the patience time distribution is built into them.The first diffusion model uses the patience time density at zero and the second one uses the entire patience time distribution.In particular, the diffusion process in the first model is a multidimensional piecewise Ornstein-Uhlenbeck (OU) process.We propose an algorithm in this paper to numerically solve the stationary distribution of a diffusion process.The computed stationary distribution is used to estimate the performance measures of a many-server queue.Numerical examples in Section 6 demonstrate that the diffusion models are very accurate in predicting the performance of a many-server queue, even if the queue has as few as twenty servers.
Except for the one-dimensional case, the stationary distribution of a piecewise OU process has no explicit formula.The algorithm proposed in this paper is a variant of the one in Dai and Harrison (1992), which computes the stationary distribution of a semimartingale reflecting Brownian motion (SRBM).As in Dai and Harrison (1992), the starting point of our algorithm is the basic adjoint relationship that characterizes the stationary distribution of a diffusion process.With an appropriate reference density, the algorithm can produce a stationary density that satisfies this relationship.
We set up a Hilbert space using the reference density.In this space, the stationary density is orthogonal to an infinite-dimensional subspace H.A finite-dimensional subspace H k is used to approximate H and a function orthogonal to H k can be numerically computed by solving a system of finitely many linear equations.This function is used to approximate the stationary density.There are two sources of error in computing the approximate stationary density by our algorithm: approximation error and round-off error.The approximation error arises because H k is an approximation of H.As H k increases to H, the approximation error decreases to zero.The round-off error occurs because the solution to the system of linear equations has error due to the finite precision of a computer.As H k increases to H, the dimension of the linear system gets higher and the coefficient matrix becomes closer to singular.As a consequence, the round-off error increases.The condition number of the matrix is used as a proxy for the round-off error.Balancing the approximation error and the round-off error is an important issue in our algorithm.
A properly chosen reference density is essential for the convergence of the algorithm.By convergence, we mean that the approximation error converges to zero as H k increases to H.More importantly, a "good" reference density can make H k converge to H quickly so that the resulting approximation error and round-off error are small simultaneously even though the dimension of H k is moderate.To ensure the convergence of the algorithm, the reference density should have a comparable or slower decay rate than the stationary density.Since the stationary density is unknown, we make a conjecture on the tail behavior of the limit queue length process of many-server queues with customer abandonment.We conjecture that the limit queue length process has a Gaussian tail and the tail depends on the service time distribution only through its first two moments.This tail is used to construct a productform reference density.With this reference density, the algorithm appears to converge quickly, producing stable and accurate results.For comparison purposes, we also test the algorithm with a certain "naively" chosen reference density in Section 7.1.The algorithm fails to converge with the "naive" reference density.The major contributions of this paper are the proposed diffusion models and the proposed reference densities that are critical to the numerical algorithm for computing the stationary distribution of a diffusion model.
Our diffusion models are obtained by replacing certain scaled renewal processes by Brownian motions.The replacement procedure is rooted in the many-server heavy traffic limit theorems that are proved in an asymptotic regime.The two diffusion model proposed in this paper are motivated by the diffusion limits proved in Dai, He and Tezcan (2010) and Reed and Tezcan (2009).See Section 4.3 for more details.The theory of diffusion approximation for many-server queues can be traced back to the seminal paper by Halfin and Whitt (1981), where a diffusion limit was established for GI/M/n queues.Garnett, Mandelbaum and Reiman (2002) proved a diffusion limit for M/M/n + M queues that allows for customer abandonment, and Whitt (2005) generalized the result to G/M/n + M queues.Puhalskii and Reiman (2000) established a diffusion limit for GI/P h/n queues.Their result was extended to G/P h/n + GI queues with customer abandonment in Dai, He and Tezcan (2010).Recently, Reed and Tezcan (2009) proved a diffusion limit for GI/M/n + GI queues.In their framework, a refined limit process is obtained by scaling the patience time hazard rate function.Harrison and Nguyen (1990) derived Brownian models for multiclass open queueing networks.Their diffusion models are SRBMs and are rooted in the conventional heavy traffic limit theorems that are pioneered in Iglehart and Whitt (1970) for serial networks and Reiman (1984) for single-class networks.See Williams (1996) for a survey of limit theorems in literature.For a twodimensional SRBM living in a rectangle, Dai and Harrison (1991) proposed an algorithm computing its stationary distribution.Dai and Harrison (1992) extended the algorithm for an SRBM living in an orthant.To deal with the unbounded state space, the notion of a reference density was first introduced there.Their finite-dimensional space H k is constructed via (global) multinominals of order at most k.With this choice of H k , the algorithm appears numerically unstable occasionally.In such a case, the round-off error may dominate the approximation error while the approximation error is still significant.Shen et al. (2002) extended Dai and Harrison (1991) to a hypercube state space of an arbitrary dimension.They used a finite element method to construct H k to avoid numerical instability.Their algorithm sometimes converges slowly because they did not explore a reference density.A linear programming algorithm for computing the stationary distribution of a diffusion process was proposed in Saure, Glynn and Zeevi (2009).Both SRBMs in an orthant and a diffusion approximation of many-server queues with two priority classes were investigated in their paper.Like the role of the reference density, it appears that the rescaling of variables is essential to the convergence of their algorithm.
The remainder of the paper is organized as follows.General diffusion processes are introduced in Section 2, where the basic adjoint relationship for a diffusion process is also presented.In Section 3, we begin with recapitulating the generic algorithm of Dai and Harrison (1992), and then propose a finite element implementation that follows Shen et al. (2002).Two diffusion models for GI/P h/n + GI queues are presented in Section 4. In Section 5, we discuss how to choose an appropriate reference density exploiting the tail behavior of a diffusion process.In Section 6, it is demonstrated via numerical examples that the diffusion models serve as good approximations of many-server queues.Section 7 is dedicated to some implementation issues arising from the proposed algorithm.The paper is concluded in Section 8. We leave the proofs of Propositions 2 and 3 to the appendix.
Notation.The symbols N, R, and R + are used to denote the sets of positive integers, real numbers, and nonnegative real numbers, respectively.For d, m ∈ N, R d denotes the d-dimensional Euclidean space and R d×m denotes the space of d × m real matrices.We use C 2 b (R d ) to denote the set of real-valued functions on R d that are twice continuously differentiable with bounded first and second derivatives.For z, w ∈ R, we set z + = max{z, 0}, z − = max{−z, 0}, and z ∧ w = min{z, w}.All vectors are envisioned as column vectors.For a d-dimensional vector x ∈ R d , we use x j for its jth entry and diag(x) for the d × d diagonal matrix with jth diagonal entry x j .For a matrix M , M denotes its transpose, M ij denotes its (i, j)th entry, and |M | = ( i,j M 2 ij ) 1/2 .We reserve I for the d × d identity matrix, e for the d-dimensional vector of ones, and e j for the d-dimensional vector with its jth entry one and all other entries zero.Given two functions ϕ and φ from N to R, we write φ 2. Diffusion processes.Let d be a positive integer.This paper focuses on a d-dimensional diffusion process X = {X(t) : t ≥ 0}.Let (Ω, F, F, P) be a filtered probability space with filtration F = {F t : t ≥ 0}.We assume that X satisfies the following stochastic differential equation (2.1) where the drift coefficient b is a function from R d to R d , the diffusion coefficient σ is a function from R d to R d×m , and B = {B(t) : t ≥ 0} is an m-dimensional standard Brownian motion with respect to F. We assume that both b and σ are Lipschitz continuous, i.e., there exists a constant Under condition (2.2), the stochastic differential equation (2.1) has a unique strong solution, i.e., there exists a unique process X on (Ω, F, F, P) such that (a) X is adapted to F, (b) for each sample path ω ∈ Ω, X(t, ω) is continuous in t, and (c) for each t ≥ 0, the stochastic differential equation (2.1) holds with probability one.See Øksendal (2003) for more details.We also assume that σ is uniformly elliptic, i.e., there exists a constant c 2 > 0 such that We are interested in the diffusion processes that model the dynamics of a queue with many parallel servers.Parallel-server queues will be introduced in Section 4. In that section, two diffusion processes will be identified to model such a queue and the coefficients b and σ will be mapped out explicitly in terms of primitive data of the queue.The diffusion models presented in Section 4 are rooted in many-server heavy traffic limit theorems proved in Dai, He and Tezcan (2010) and Reed and Tezcan (2009).
A probability distribution π on R d is said to be a stationary distribution of X if X(t) follows distribution π for each t > 0 whenever X(0) has distribution π.Condition (2.3) is required to ensure the uniqueness of the stationary distribution.See Dieker and Gao (2011) for more details.In this paper, we assume that X has a unique stationary distribution π and π has a density g with respect to the Lebesgue measure on R d .For a general diffusion process, there is no explicit solution for π.This paper develops a numerical algorithm computing π.As in Dai and Harrison (1992), the starting point of the algorithm is the basic adjoint relationship (2.5) where G is the generator of X defined by (2.6) and Σ is the covariance matrix given by (2.4).The following theorem is a consequence of Proposition 9.2 in Ethier and Kurtz (1986).
Theorem 1.Let π be a probability distribution on R d that satisfies (2.5).Then, π is a stationary distribution of X.
In this paper, we conjecture that a stronger version of Theorem 1 is true.
Conjecture 2. Let π be a signed measure on R d that satisfies (2.5) and π(R d ) = 1.Then, π is a nonnegative measure and consequently it is a stationary distribution of X.
Our algorithm is to construct a function g on R d such that (2.7) Assuming that Conjecture 2 is true, g must be the unique stationary density of X.As a special case, the nonnegativity of a signed measure π that satisfies (2.5) for a piecewise OU process was proposed as an open problem by Dai and Dieker (2010).Piecewise OU processes will be introduced in Section 4.3.1.
3. A finite element algorithm for stationary distributions.In this section, we propose a numerical algorithm computing the stationary density g.The basic algorithm follows the one developed in Dai and Harrison (1992).The finite element implementation closely follows Shen et al. (2002).

3.1.
A reference density.To compute the stationary density g, we adopt a notion called a reference density that was first introduced by Dai and Harrison (1992).A reference density for g is a function r defined from R d to where is called the ratio function.Such a function r exists because r = g is a reference density.The reference density controls the convergence of our algorithm.We will discuss how to choose a reference density for the diffusion models of a many-server queue in Section 5.
For the rest of Section 3, we assume that a reference density r satisfying (3.1) has been determined and remains fixed.In addition, we assume that Let L 2 (R d , r) be the space of all square-integrable functions on R d with respect to the measure that has density r, i.e., where B(R d ) is the set of Borel-measurable functions on R d .Condition (3.1) implies that q ∈ L 2 (R d , r).We define an inner product on L 2 (R d , r) by The induced norm is given by One can check that L 2 (R d , r) is a Hilbert space and assumption (3.2) ensures that With a fixed reference density r, we need only compute the ratio function q by (3.5).Once q is obtained, we can compute the stationary density via g(x) = q(x)r(x) where the closure is taken in the norm in (3.4).As a subspace of L 2 (R d , r), H is orthogonal to q.Let c be a constant function and c(x be the projection of c onto H.Then, c−c must be orthogonal to H. Assuming that Conjecture 2 holds and X has a unique stationary density g, one must have q = κ q (c − c) for some constant κ q ∈ R. By (3.7), the normalizing constant κ q satisfies Hence, the ratio function is given by (3.9) 3.2.An approximate stationary density.To compute q by (3.9), we need first compute c, the projection of c onto H.The space H is linear and infinite-dimensional (i.e., a basis of H contains infinitely many functions).In general, solving (3.8) in an infinite-dimensional space is impossible.In the algorithm, we use a finite-dimensional subspace H k to approximate H.
Suppose that there exists a sequence of finite-dimensional subspaces Dai and Harrison (1992), we have the following approximation result.
Proposition 1. Assume that Conjecture 2 is true.Then, where q k = (c − ck )/ c − ck 2 .Furthermore, when the reference density r is bounded, As in Dai and Harrison (1992), we choose for some finite-dimensional space C k .In Section 3.3, we will discuss how to construct C k using a finite element method.For notational convenience, we omit the subscript k when k is fixed.The finite-dimensional functional space is thus denoted by C. Let m C be the dimension of C and {f i : i = 1, . . ., m C } be a basis of C. We assume that the family Using the fact Gf i , c − ck = 0 for i = 1, . . ., m C , we obtain a system of linear equations (3.12) Au = v where (3.13) By the linear independence assumption, the m C × m C matrix A is positive definite.Thus, u = A −1 v is the unique solution to (3.12).Once the vector u is obtained, we can compute the projection ck by (3.11).Finally, the stationary density g can be approximated via

3.3.
A finite element method.In Dai and Harrison (1992), the authors employed multinominals of orders up to k to construct the space C k .This choice appears to be numerically unstable.The approximation error is significant when k is small, say, k ≤ 5.As k increases, the round-off error in solving (3.12) increases and ultimately dominates the approximation error.Although their implementation produces accurate estimates for the stationary means of SRBMs, it sometimes produces poor estimates for the stationary distributions.In this section, we construct a sequence of spaces {C k : k ∈ N} using the finite element method as in Shen et al. (2002).Because the state space in Shen et al. (2002) is bounded, neither a reference density nor state space truncation is used there.
The state space of X is unbounded in our setting.It is necessary to truncate the state space to apply the finite element method.Let {K k : k ∈ N} be a sequence of compact sets in R d .For each f ∈ C k , we assume that f (x) = 0 for x ∈ R d \ K k .The subscript k is omitted again when it is fixed and we use K to denote the compact support of the space C. In our implementation, we restrict K to be a d-dimensional hypercube where both ζ j and ξ j are positive constants for j = 1, . . ., d.
We partition K into a finite number of subdomains.Such a partition is called a mesh and each subdomain is called a finite element.Since K is a hypercube, it is natural to use a lattice mesh, where each finite element is again a hypercube.In this case, each corner point of a finite element is called a node.In dimension j = 1, . . ., d, we divide the interval [−ζ j , ξ j ] into n j subintervals by partition points Then, K is divided into d j=1 n j finite elements.For future reference, we label the nodes in the way that node (i 1 , . . ., i d ) corresponds to spatial coordinate (y i 1 1 , . . ., y i d d ), and define h j = y +1 j − y j for = 0, . . ., n j − 1 and j = 1, . . ., d.
If ∆ denotes such a mesh, we define |∆| = max{h j : = 0, . . ., n j − 1; j = 1, . . ., d} and The finite-dimensional space C is generated using the above mesh.We use the cubic Hermite basis functions to construct a basis of C, as in Shen et al. (2002).The one-dimensional Hermite basis functions for −1 ≤ z ≤ 1 are given by In dimension j = 1, . . ., d and for = 1, . . ., n j − 1, let Let x = (x 1 , . . ., x d ) be a vector in K.At node (i 1 , . . ., i d ), the basis functions of C are the tensor-product Hermite basis functions (3.17) where χ j is either 0 or 1 and Therefore, each node has 2 d tensor-product basis functions and the space C has a total of For the one-dimensional Hermite basis functions in (3.16), the second order derivative of φ(z) is not defined at z = −1 and 1, and the second order derivative of ψ(z) is not defined at z = −1, 0, and 1.As a consequence, there exists f ∈ C for which Gf is not defined on the boundaries of certain finite elements.Because such boundaries have Lebesgue measure zero in R d , for each f ∈ C, we can find a sequence of functions For the linear system (3.12) to have a unique solution, the family of functions The following proposition provides sufficient conditions for the linear independence.Its proof can be found in the appendix.
Proposition 2. Let G be the generator of X in (2.6) such that conditions (2.2) and (2.3) holds and all entries of Σ are continuously differentiable.Assume that r(x) > 0 for all x ∈ R d .Then, the family of functions r), where f i 1 ,...,i d ,χ 1 ,...,χ d is the basis function of C given by (3.17).Consequently, the solution to the linear system (3.12) is unique.Now let us consider a sequence of functional spaces {C k : k ∈ N}.Let ∆ k be the mesh for constructing C k .We assume that the mesh ∆ k+1 is a refinement of ∆ k , i.e., a node or an interelement boundary in ∆ k is also a node or an interelement boundary in ∆ k+1 .We further assume that such refinements are regular, i.e., for each η ∆ k defined in (3.15), the set {η ∆ k : k ∈ N} is bounded.The next proposition, along with Proposition 1, justifies the proposed algorithm for computing the stationary distribution.We leave the proof of Proposition 3 to the appendix, too.
Proposition 3. Let {∆ k : k ∈ N} be a sequence of lattice meshes such that each ∆ k+1 is a refinement of ∆ k and the refinements are regular.Let K k be the d-dimensional finite hypercube that is the domain of ∆ k , and C k be the functional space generated by ∆ k using the tensor-product Hermite basis functions in (3.17).Let H be the infinite-dimensional space in (3.6) and H k be the finite-dimensional space in (3.10),where the generator G satisfies (2.2) and (3.2).Assume that Then, 4. Diffusion models for many-server queues.In this section, we introduce GI/P h/n + GI queues and present two diffusion models for such a queue with many servers.The two models differ in how the patience time distribution is built into them.The patience time density at zero is used in the first model, whereas the entire patience time distribution is used in the second model.4.1.GI/P h/n + GI queues in the QED regime.We focus on a queue with many servers working in the QED regime.The QED regime will be discussed shortly.In this queue, the service time distribution is restricted to be phase-type.All positive-valued distributions can be approximated by phase-type distributions.
Let p be a d-dimensional nonnegative vector whose entries sum to one, ν be a d-dimensional positive vector, and P be a d × d sub-stochastic matrix.We assume that the diagonal entries of P are zero and P is transient, namely, I−P is invertible.Consider a continuous-time Markov chain with d+1 phases (or states) where phases 1, . . ., d are transient and phase d + 1 is absorbing.For j = 1, . . ., d, the Markov chain starts in phase j with probability p j .The amount of time it stays in phase j is exponentially distributed with mean 1/ν j .When it leaves phase j, the Markov chain enters phase = 1, . . ., d with probability P j or enters phase d+ 1 with probability 1− d =1 P j .The phase-type distribution with parameters (p, ν, P ) is the distribution of time from starting until absorption in phase d + 1 for the above Markov chain.In particular, when P is a zero matrix, the associated phase-type distribution is a hyperexponential distribution with d phases.
In a GI/P h/n+GI queue, there are n identical servers working in parallel.The customer arrival process is a renewal process.Upon arrival, a customer enters service immediately if an idle server is available.Otherwise, he waits in a buffer with infinite waiting room that holds a first-in-first-out (FIFO) queue.The service times form a sequence of iid random variables, following a phase-type distribution.When a server finishes serving a customer, the server takes the leading customer from the waiting buffer.When the buffer is empty, the server begins to idle.Each customer has a patience time.The patience times are iid following a general distribution.When a customer's waiting time in queue exceeds his patience time, the customer abandons the system with no service.
Let λ be the arrival rate and 1/µ be the mean service time.The system is assumed to operate in the QED regime, i.e., both the arrival rate λ and the number of servers n are large, while the traffic intensity ρ = λ/(nµ) is close to one.Because customer abandonment is allowed, it is not necessary to assume ρ < 1 for the system to reach a steady state.For future purposes, we put (4.1) Assume that the phase-type service time distribution has parameters (p, ν, P ).Each service time can be decomposed into a number of phases.When a customer is in service, he must be in one of the d phases.Let Z j (t) denote the number of customers in phase j service at time t.In steady-state, one expects that the customers in service are distributed among the d phases following a distribution γ, given by One can check that d j=1 γ j = 1 and γ j is interpreted to be the fraction of phase j service load on the n servers.
Suppose that all customers, including those initial customers waiting in the buffer at time zero, sample their first service phases following distribution p upon arrival.One can stratify customers in the waiting buffer according to their first service phases.For j = 1, . . ., d, we use W j (t) to denote the number of waiting customers at time t whose service begins with phase j.Then, is the number of phase j customers in the system, either waiting or in service.
Let Y (t) be the corresponding d-dimensional random vector and In each diffusion model, the process Ỹ = { Ỹ (t) : t ≥ 0} is approximated by a d-dimensional diffusion process.
4.2.System equation.The GI/P h/n + GI queue is driven by several primitive processes.Let E = {E(t) : t ≥ 0} be the arrival process, where E(t) is the number of arrivals by time t.For j = 1, . . ., d, let S j = {S j (t) : t ≥ 0} be a Poisson process with rate ν j , and φ j = {φ j (i) : i ∈ N} be a sequence of iid d-dimensional random vectors such that φ j (i) takes e with probability P j and takes a zero vector with probability 1 − d =1 P j .Similarly, let φ 0 = {φ 0 (i) : i ∈ N} be a sequence of iid d-dimensional random vectors such that φ 0 (i) takes e with probability p .For j = 0, . . ., d, define the routing process We assume that Y (0), E, S 1 , . . ., S d , Φ 0 , . . ., Φ d are mutually independent.
For j = 1, . . ., d, let T j (t) be the cumulative amount of service effort received by customers in phase j service by time t.Clearly, (4.5) Thus, S j (T j (t)) is equal in distribution to the cumulative number of phase j service completions by time t.(For more details, please refer to Section 4.1 of Dai, He and Tezcan (2010) on a perturbed system.)Let L j (t) be the cumulative number of phase j customers who have abandoned the system by time t, and L(t) be the corresponding d-dimensional vector.One can check that the process Y = {Y (t) : t ≥ 0} satisfies the following equation where S(T (t)) = (S 1 (T 1 (t)), . . ., S d (T d (t))) .
To derive the diffusion models, consider a scaled version of (4.6).We define several scaled processes by for t ≥ 0 and j = 1, . . ., d, where p j is the jth column of P .By (4.1)-(4.5), the dynamical equation in (4.6) turns out to be (4.7) 4.3.Diffusion models.In both diffusion models, we replace the scaled primitive processes in (4.7) by certain Brownian motions.These approximations can be justified by the functional central limit theorem.When the number of servers n is large, the corresponding diffusion process in each model can be proved close to Ỹ via a continuous map.Please refer to Dai, He and Tezcan (2010) for related convergence results.
Let B E be a one-dimensional driftless Brownian motion with variance λc 2 a /n, where c 2 a is the squared coefficient of variation for the interarrival time distribution.Let B 0 , . . ., B d , and B S are d-dimensional driftless Brownian motions with covariance matrices H 0 , . . ., H d , and diag(ν), respectively, where for j = 1, . . ., d.We assume that Ỹ (0), B E , B 0 , . . ., B d , B S are mutually independent.In the diffusion models, the above Brownian motions take the places of the scaled primitive processes Ẽ, Φ0 , . . ., Φd , S, respectively.Let Q(t) be the queue length or the number of waiting customers at time t and Then, When n is large, these waiting customers are approximately distributed among the d phases according to distribution p (see Lemma 2 of Dai, He and Tezcan (2010)), i.e., It follows from (4.3) that By (4.4) and (4.8), this approximation has a scaled version (4.9) Z(t) ≈ Ỹ (t) − p(e Ỹ (t)) + .
Let G(t) be the cumulative number of abandoned customers by time t and These abandoned customers are also approximately distributed among the d phases by distribution p, i.e., (4.10) L(t) ≈ p G(t).
We also exploit the following approximations (4.11) The approximations in (4.9)-(4.11)are used in both diffusion models.These two models differ only in how to approximate the scaled abandonment process G = { G(t) : t ≥ 0}.
4.3.1.Diffusion model using the patience time density at zero.In the first diffusion model, the patience time distribution is used only through its density at zero when approximating G. Let F be the distribution function of the patience times.We assume that (4.12) Thus, α is the patience time density at zero.In this model, G is approximated by When α = 0, this approximation yields G(t) ≈ 0 for all t ≥ 0. In this case, the diffusion model is for a GI/P h/n queue without abandonment.When α > 0, the intuition of (4.13) is as follows.For a many-server queue in the QED regime, the queue length is typically in the order of O(n 1/2 ).As a result, each customer's waiting time should be in the order of O(n −1/2 ), which is relatively short because n is large.At time s, a waiting customer will abandon the system during the next δ time units with probability αδ.
Hence, the instantaneous abandonment rate of the system is approximately αQ(s).It follows that which, along with (4.4) and (4.8), leads to the approximation in (4.13).This relationship can be justified by Theorem 1 of Dai and He (2010).As pointed out by Dai and He (2010) and Mandelbaum and Momčilović (2009), the performance of a many-server queue in the QED regime is insensitive to the patience time distribution as long as its density α at zero is fixed and positive.
In the dynamical equation (4.7), when the scaled primitive processes are replaced by appropriate Brownian motions and the approximations in (4.9)-(4.13)are employed, we obtain the following stochastic differential equation where we take X(0) = Ỹ (0).We may write (4.14) into the standard form and B is a d-dimensional standard Brownian motion.One can check that Σ(x) is positive definite and thus satisfies (2.3).The drift coefficient b in (4.15) is a piecewise linear function of x.Both b and σ are Lipschitz continuous.Therefore, a strong solution to (4.14) exists and is known as a d-dimensional piecewise OU process.In this model, the diffusion process X depends on the patience time distribution only through its density at zero.When using the proposed algorithm to solve the stationary density, it follows from Proposition 2 that the linear system (3.12) has a unique solution.
If we replace ρ by one in (4.14), the resulting diffusion process turns out to be the diffusion limit for G/P h/n + GI queues in Theorem 2 of Dai, He and Tezcan (2010).This limit process is the strong solution of the following stochastic differential equation (4.17) Since ρ is close to one in the current setting, the above limit process justifies the diffusion model in (4.14).4.3.2.Diffusion model using patience time hazard rate scaling.When the patience time distribution does not have a density at zero, the diffusion model in (4.14) fails to exist.When α = 0 and ρ > 1, the diffusion process X in (4.14) does not have a stationary distribution.In this case, the model cannot be a satisfactory approximation of the many-server queue, as the queue may have a stationary distribution thanks to customer abandonment.It is also demonstrated in Reed and Tezcan (2009) that when the density near zero rapidly changes, the system performance can be sensitive to the patience time distribution in a neighborhood of the origin.In this case, using the patience time density at zero solely may not yield adequate approximation to the queue.Our second diffusion model exploits the idea of scaling the patience time hazard rate function, which was first proposed in Reed and Ward (2008) for single-server queues and was recently extended to manyserver queues in Reed and Tezcan (2009).
In this model, we assume that the patience time distribution F satisfies and it has a bounded hazard rate function h, given by where f F is the density of F .With the hazard rate function, F can be written by In the second diffusion model, the scaled abandonment process G is approximated by The entire patience time distribution is built into the approximation through its hazard rate function.The intuition of the hazard rate scaling approximation was explained in Reed and Ward (2008): Consider the Q(s) waiting customers in the buffer at time s.In general, only a small fraction of customers can abandon the system when the queue is working in the QED regime.Then by time s, the ith customer from the back of the queue has been waiting around i/λ time units.Approximately, this customer will abandon the queue during the next δ time units with probability h(i/λ)δ.It follows that for the system, the instantaneous abandonment rate at time s is close to i=1 h(i/λ).By (4.4) and (4.8), the scaled abandonment rate can be approximated by from which (4.18) follows.Note that the arrival rate λ is in the order of O(n) and Q(s) is in the order of O(n 1/2 ).The patience time distribution in a small neighborhood of zero, not just its density at zero, is considered in the instantaneous abandonment rate in (4.19).Hence, the hazard rate scaling approximation in (4.18) is more accurate than that in (4.13).This approximation can be justified for GI/M/n + GI queues by Propositions 9.1 and 9.2 in Reed and Tezcan (2009).With minor modifications to the proofs, these two propositions can be extended to GI/P h/n + GI queues.
Let m be a nonnegative integer.Suppose that the hazard rate function h is m times continuously differentiable in a neighborhood of zero.By Taylor's theorem, for z > 0 small enough, where h ( ) is the th order derivative of h.In this case, the approximation in (4.18) turns out to be ((e Ỹ (s)) + ) +1 ds.
Because h(0) is identical to the patience time density at zero, the approximation in (4.13) can be regarded as the zeroth degree Taylor's approximation of (4.18).When the patience times are exponentially distributed, the hazard rate function is constant and the two approximations in (4.13) and (4.18) are identical.See Section 4 of Reed and Tezcan (2009) for more discussion.
Using the Brownian motion replacement and the approximations in (4.9)-(4.11)and (4.18), we obtain the second diffusion model for the GI/P h/n + GI queue, given by Because h is bounded, the drift coefficient b is Lipschitz continuous and the stochastic differential equation (4.20) has a strong solution.By Proposition 2, the solution to the linear system (3.12) is unique when we use the proposed algorithm to solve the stationary density of this diffusion model.Comparing (4.14) and (4.20), one can see that the two models differ only in how the patience time distribution is incorporated.Because a more accurate approximation is used for the abandonment process, the second model can provide a better approximation for the queue.
5. Choosing a reference density.The reference density controls the convergence of the proposed algorithm.In this section, we discuss how to choose appropriate reference densities for the diffusion models.Some considerations are as follows.
First, to be a reference density, a candidate function r must satisfy (3.1) even though the stationary density g is unknown.The second condition in (3.1) requires that r have a comparable or slower decay rate than g.When g is bounded, its decay rate is sufficient to determine a function r that satisfies (3.1).
Second, the most computational effort in the algorithm is constructing and solving the system of linear equations (3.12).As demonstrated by Proposition 3, the finite-dimensional H k approximates the infinite-dimensional H better as k increases, thus reducing the approximation error.On the other hand, as the dimension of H k increases, constructing and solving (3.12) requires more computation time and memory space.The condition number of the matrix A in (3.12) also gets worse as the dimension of H k becomes large.This yields higher round-off error.A "good" reference density should balance the approximation error and the round-off error.With such a reference density, it is possible to have small approximation error even if the dimension of H k is moderate.
Intuitively, when r is "close" to the stationary density g, both the ratio function q and the projection c are close to constant.We can thus expect that a space H k with a moderate dimension is able to produce a satisfactory approximation.All these observations motivate us to explore the tail behavior of a diffusion model.5.1.Tail behavior.Let us focus on the diffusion limit in (4.17).Assume that the piecewise OU process X is positive recurrent and has a stationary distribution.Let X(∞) be the corresponding d-dimensional random vector in steady state.Since ρ is close to one, the tail behavior of the diffusion process X in (4.14) is expected to be comparable to that of the limit diffusion process X in (4.17).
To explore the tail behavior of X(∞), consider a sequence of GI/GI/n + GI queues in the QED regime.If all patience times are infinite, the queues turn out to be GI/GI/n queues without customer abandonment.In each queue, the service times are iid following a general distribution.We assume that these queues, each indexed by the number of servers n, have the same service time distribution.Let λ n be the arrival rate of the nth system.To mathematically define the QED regime, we assume that (5.1) lim where ρ n = λ n /(nµ) is the traffic intensity of the nth system.Assume that all these queues are in steady state.Let N n (∞) be the stationary number of customers in the nth system and For GI/GI/n queues in the QED regime, the limit queue length in steady state was studied in Gamarnik and Momčilović (2008), where the service time distribution is assumed to be lattice-valued on a finite support.The authors first showed that Ñn (∞) weakly converges to a random variable Ň (∞) as n goes to infinity, and then proved that where c 2 a and c 2 s are the squared coefficients of variation of the interarrival and the service time distributions, respectively.In (5.3), the decay rate does not depend on the service time distribution beyond its first two moments.Recently, this result has been extended in Gamarnik and Goldberg (2011) to GI/GI/n queues with a general service time distribution.
When α = 0 and d = 1, the limit diffusion process X in (4.17) is for GI/M/n queues without customer abandonment.In this case, the service time distribution is exponential and Ň (∞) = X(∞).It was proved in Halfin and Whitt (1981) that the stationary density of X(∞) has a closed-form expression where a 1 and a 2 are normalizing constants making ǧ continuous at zero.The decay rate of ǧ in (5.4) is consistent with (5.3).Both formulas suggest that Ň (∞) has an exponential tail on the right side.For a GI/GI/n + GI queue with many servers and customer abandonment, the limiting tail behavior of Ñn (∞) remains unknown except for very simple cases.When α > 0 and d = 1, the limit diffusion process X in (4.17) is a one-dimensional piecewise OU process.It admits a piecewise normal stationary density where a 3 and a 4 are normalizing constants that make ǧ continuous at zero.See Browne and Whitt (1995) for more details.Observing (5.3) and (5.5), we conjecture that for a sequence of GI/GI/n+ GI queues in the QED regime, the limiting tail behavior of Ñn (∞) depends on the service time distribution only through its first two moments, and on the patience time distribution only through its density at zero.Conjecture 3. Consider a sequence of GI/GI/n + GI queues that satisfies (4.12), (5.1), and (5.2).Assume that the patience time distribution has a positive density at zero, i.e., α > 0 in (4.12).Assume further that the interarrival and the service time distributions satisfy the T 0 assumptions (i)-(iii) in Section 2.1 of Gamarnik and Goldberg (2011).Then, (a) N n (∞) exists for each n; (b) the sequence of random variables { Ñn (∞) : n ∈ N} weakly converges to a random variable Ň (∞); (c) Ň (∞) satisfies .
The intuition below may help understand why the conjectured decay rate must be Gaussian.When Ň (∞) > z for some z > 0, there are more than n 1/2 z waiting customers in the queue correspondingly, and each waiting customer is "racing" to abandon the system.At any time, the instantaneous abandonment rate is approximately proportional to the queue length.In such a system, the customer departure process, including both service completions and customer abandonments, behaves as if the system is a queue with infinite servers.Thus, one can expect that the tail of the limit queue length is Gaussian, which decays much faster than an exponential tail for queues without abandonment.

Reference densities for model (4.14).
For GI/P h/n + GI queues, the limit diffusion process X in (4.17) satisfies The discussion in Section 5.1 gives ample evidence of the tail behavior remains unknown when d > 1, our numerical experiments suggest that this tail is not sensitive to the service time distribution beyond its mean.Thus, we use the left tail for a queue with an exponential service time distribution to construct the reference density.We propose to use a product reference density When α = 0 and ρ < 1 in (4.14), there is no abandonment in the queue.Based on (5.3) and (5.4), we choose (5.7) where β is given by (4.1).The function r j has an exponential tail on the right and a Gaussian tail on the left.One can check that the reference density given by (5.6) and (5.7) satisfies condition (3.3).In (5.7), we set the shift term for z < 0 to be γ j β according to the following observation.In the associated queue, β is the scaled mean number of idle servers and γ j is the fraction of phase j service load.In steady state, one can expect that Ỹj (t), the centered and scaled number of phase j customers, is around −γ j β.
When α > 0 in (4.14), the associated queue has abandonment.By (5.5) and Conjecture 3, we choose (5.8) whose two tails are both Gaussian but have different decay rates.This reference density also satisfies (3.3).In (5.8), the shift term for z ≥ 0 is taken to be p j µβ/α because of the observation below.When ρ ≥ 1, the throughput of the queue is nearly nµ.Let q 0 be the scaled queue length "in equilibrium", i.e., the arrival and the departure rates of the system are balanced when the queue length is around n 1/2 q 0 .Because in this case the abandonment rate is αn 1/2 q 0 , we must have λ = nµ + αn 1/2 q 0 , or q 0 = −µβ/α by (4.1).Since the fraction of phase j waiting customers is around p j , Ỹj (t) is around −p j µβ/α as the queue reaches a steady state.

Reference densities for model (4.20).
For the diffusion model (4.20) that adopts the patience time hazard rate scaling, the tail behavior of X in steady state is left to future research.In some cases, we may exploit the diffusion limit in (4.17) to facilitate the choice of a reference density for the current model.The principle is again to ensure that the reference density has a comparable or slower decay rate than the stationary density of X.For that, we build an auxiliary queue that shares the same arrival process and service times with the GI/P h/n + GI queue, but the auxiliary queue may have no abandonment or have an exponential patience time distribution.Let X be the diffusion process in (4.14) for the auxiliary queue.If X has a slower decay rate than X, a reference density of X must be a reference density of X, too.
When ρ < 1, the auxiliary queue is a GI/P h/n queue.It is intuitive that the queue length decays faster in the GI/P h/n + GI queue than in the auxiliary queue because the latter has no abandonment.As a consequence, X has a slower decay rate than X and the reference density given by (5.6) and (5.7) for X can be used for the current model.
When ρ > 1, the auxiliary queue is a GI/P h/n + M queue.Let α > 0 be the rate of the exponential patience time distribution, which is to be determined in order for X to have an appropriate decay rate.For that, we need investigate the abandonment process of the GI/P h/n + GI queue.
Assume that the hazard rate function h is m times continuously differentiable in a neighborhood of zero for some nonnegative integer m, and among = 0, . . ., m, there is at least one h ( ) (0) = 0. We follow the convention that h (0) (0) = h(0).Let 0 be the smallest nonnegative integer such that h ( 0 ) (0) = 0.For z > 0 in a small neighborhood of zero, the 0 th degree Taylor's approximation of h is (5.9) h(z) ≈ h ( 0 ) (0)z 0 0 !, which, along with (4.8) and (4.18), implies that the scaled abandonment process can be approximated by This approximation implies that the abandonment process depends on the hazard rate function primarily through h ( 0 ) (0), the nonzero derivative at the origin with the lowest order.It also implies that the scaled abandonment rate at time t is approximately (5.10) In the hazard rate scaling, the scaled queue length in equilibrium q 0 satisfies (5.11) If (5.10) holds, it turns out to be which gives us (5.12) .
The scaled queue length process fluctuates around this equilibrium length.
Correspondingly, the instantaneous abandonment rate changes around an equilibrium level, too.This observation motivates us to take for the auxiliary GI/P h/n + M queue.With this setting, the original queue and the auxiliary queue have comparable abandonment rates when the scaled queue length is close to q 0 .For any q 1 > q 0 , when the scaled queue length is q 1 in both queues, the abandonment rate in the auxiliary queue is lower because Hence, when the queue length is longer than q 0 , it decays slower in the auxiliary queue than in the original queue.Consequently, the decay rate of X is slower than that of X and the reference density of X can work for this diffusion model.The above discussion suggests a product reference density in (5.6) with where q 0 follows (5.12) and α follows (5.13).
The above reference density fails when ρ = 1 and 0 > 0, because q 0 = 0 by (5.12) and thus α is zero in (5.13).In this case, we can still choose a reference density by (5.6) and (5.14) but using a traffic intensity ρ that is slightly larger than one.Because the tail of the queue length becomes heavier as ρ increases, a reference density for model (4.20) with ρ > 1 must have a comparable or slower decay rate than the stationary density of the model with ρ = 1.
The reference density in (5.6) and (5.14) that exploits the lowest-order nonzero derivative at the origin may fail when the hazard rate function has a rapid change near the origin.In this case, the Taylor's approximation in (5.9) may not be satisfactory when the queue length is not short enough.Such an example is discussed in Section 6.4.Choosing a reference density for that is more flexible.In addition, the above procedure cannot choose a reference density when all h ( ) (0)'s are zero, i.e., the hazard rate function is zero in a neighborhood of the origin.This topic will be explored in future research.

Truncation hypercube.
Once the reference density has been determined, we can choose the truncation hypercube K in (3.14) by the procedure below.First, pick a small number ε 0 > 0.Then, choose a hypercube K such that (5.15) When ε 0 is small enough, the influence of the reference density outside K is negligible in computing the stationary density.

Numerical examples.
Several numerical examples are presented in this section.In each example, we compute the stationary distribution of the number of customers in a many-server queue using a diffusion model and the proposed algorithm.We assume that the customer arrivals follow a homogeneous Poisson process and the service times follow a two-phase hyperexponential distribution with mean one, i.e., the system is an M/H 2 /n + GI queue with c 2 a = 1 and µ = 1.In such a queue, there are two types of customers.One type has a shorter mean service time than the other, and the service times of either type are iid following an exponential distribution.We approximate this queue by a two-dimensional diffusion process X.When the patience time distribution is exponential, both (4.14) and (4.20) yield the same diffusion process.When the patience time distribution is non-exponential, we use model (4.20) as it is more accurate.The results computed using the diffusion models are compared with the ones obtained either by the matrix-analytic method or by simulation.Please refer to Neuts (1981) and Latouche and Ramaswami (1999) for the implementation of the  matrix-analytic method.All simulation results are obtained by averaging twenty runs and in each run, the queue is simulated for one hundred thousand time units.
In the proposed algorithm, all numerical integration is implemented using a Gauss-Legendre quadrature rule.See Kress (1998) for more details.When computing A i or v i in (3.13), the integrand is evaluated at eight points in each dimension.In the numerical examples, the tail probability (6.1) ) is the two-dimensional random vector having probability density g.The integral in (6.1) is computed by adding up the integrals over the finite elements that intersect with the set {x ∈ R 2 : x 1 + x 2 > z}, and the integral over each finite element is again computed using a Gaussian-Legendre quadrature formula.Because the indicator function has jumps inside certain finite elements, we use sixtyfour points in each dimension when evaluating the integrand over each finite element.
6.1.Example 1: an M/H 2 /n + M queue.Consider an M/H 2 /n + M queue that has an exponential patience time distribution.We are interested in such a queue because its customer-count process N = {N (t) : t ≥ 0} is a quasi-birth-death process, where N (t) is the number of customers in system at time t.The stationary distribution of that can be computed by the matrix-analytic method.In this example, we take α = 0.5 for the rate of the exponential patience time distribution and p = (0.9351, 0.0649) and ν = (9.354,0.072) for the hyperexponential service time distribution.The mean service time of the second-type customers is more than one hundred times longer than that of the first type.Although over ninety percent of customers are of the first type, the fraction of its workload is merely ten percent, i.e., γ = (0.1, 0.9) .Such a distribution has a large squared coefficient of variation c 2 s = 24.The queue is approximated by the two-dimensional piecewise OU process X in (4.14).Because the service time distribution is hyperexponential, P is a zero matrix and thus R = diag(ν).By (4.15) and (4.16), the drift coefficient of X is and the covariance matrix of the diffusion coefficient is Three scenarios are considered in this example, in all of which the queue is overloaded.In the first two scenarios, there are n = 50 and 500 servers, respectively.The arrival rates are λ = 57.071and 522.36, or equivalently, ρ = 1.141 and 1.045.By (4.1), β = −1 in both scenarios.The third scenario, with n = 20 servers, will be presented shortly.
To compute the stationary distribution of X, we use a product reference density given by (5.6) and (5.8).To generate basis functions by the finite element method, we set the truncation rectangle K = [−7, 32] × [−7, 32], which is obtained by (5.15) with ε 0 = 10 −7 , and use a lattice mesh in which all finite elements are 0.5 × 0.5 squares.
Once the stationary density of X is obtained, one can approximately produce the distribution of N (∞), the stationary number of customers in system.Note that the probability density of X 1 (∞) + X 2 (∞) is given by The distribution of N (∞) can be approximated by For the first two scenarios, the distributions of N (∞) obtained by the diffusion model are illustrated in Figure 1.In the same figure, the stationary distributions computed by the matrix-analytic method are plotted, too.We see good agreement in Figure 1.Comparing the two scenarios, we also find out that the diffusion model in (4.14) is more accurate when the number of servers n is larger.This observation is consistent with the many-server limit theorem for G/P h/n + GI queues in Dai, He and Tezcan (2010).
The matrix-analytic method can be used because in this queue, the threedimensional process {(Q(t), Z 1 (t), Z 2 (t)) : t ≥ 0} forms a continuous-time Markov chain and the customer-count process N is a quasi-birth-death process.Clearly, In this example, level consists of + 1 states if ≤ n and it contains n + 1 states if > n.In the matrix-analytic method, the transition rate matrices between adjacent levels are exploited to compute the stationary distribution of N iteratively.Each iteration requires O(n 3 ) arithmetic operations.For this queue, the transition rate matrices at different levels are different because the abandonment rate depends on the queue length.For implementation purposes, we assume in the algorithm that at level > 0 for some 0 n, the abandonment rate at level is α( 0 − n) rather than α( − n).In other words, the transition rate matrices at level are invariant with respect to when > 0 .We take 0 = n + 2000 in all numerical examples.The extra error caused by this modification is negligible, because in this queue, the queue length is in the order of O(n 1/2 ) and the chance of the customer number exceeding 0 is extremely rare.
To investigate the diffusion model in (4.14) quantitatively, we list some steady-state performance measures in Table 1.They include the mean queue length, the fraction of abandoned customers, and the probabilities that the number of customers exceeds certain levels.Using the diffusion model, the mean queue length ≈ √ n It follows from the latter approximation that the abandonment fraction In the table, the tail probability P[N (∞) > ] is approximated by is computed via (6.1).In both scenarios, the diffusion model produces satisfactory numerical estimates.The computational complexity of the proposed algorithm, whether in computation time or in memory space, does not change with the number of servers n.In contrast, the matrix-analytic method becomes computationally expensive when n is large.In particular, the memory usage becomes a serious constraint when a huge number of iterations are required.For the n = 500 scenario in this example, it took around one hour to finish the matrix-analytic computation and the peak memory usage is nearly five gigabytes.Using the diffusion model and the proposed algorithm, it took less than one minute and the peak memory usage is less than two hundred megabytes on the same computer.See Section 7.4 for more discussion on the computational complexity.
Although the diffusion model is motivated and derived from the theory of many-server queues, it is still relevant for a queue with a modest number of servers.In the third scenario, there are n = 20 servers and the arrival rate is λ = 22.24.Thus, ρ = 1.112 and β = −0.5.In the proposed algorithm, we keep the same truncation rectangle and lattice mesh as in the previous two scenarios, and the reference density is again from (5.6) and (5.8).As illustrated in Figure 2, the diffusion model can still capture the exact stationary distribution for a queue with as few as twenty servers.Thus, c 2 s = 3 and γ = (0.1, 0.9) .Since there is no abandonment, we must take ρ < 1 in order for the system to reach a steady state.
The stationary distribution of the number of customers in system is shown in Figure 3.In both scenarios, the diffusion model produces a good approximation of the result by the matrix-analytic method.As in the previous example, the diffusion model is more accurate when the system scale is larger.Several performance measures in steady state are listed in Table 2.As in Table 1, satisfactory agreement can be found between the two approaches.

Example 3: an
where k > 1 is a positive integer and +E k signifies an Erlang-k patience time distribution.In this queue, each patience time is the sum of k stages and the stages are iid having an exponential distribution with mean 1/θ.When k > 1, the probability density at zero of an Erlang-k distribution is zero.The diffusion model in (4.14) does not have a stationary distribution when the queue is overloaded.Hence, we evaluate the diffusion model in (4.20) that exploits the patience time hazard rate scaling.In the following numerical experiments, we take k = 2 or 3 for the Erlang-k distribution and set θ = k, so the mean patience time is one unit time.The hyperexponential service time distribution is taken to be identical to that in Section 6.2.
The hazard rate function of the Erlang-k distribution is For the diffusion model (4.20), it follows from (4.21) that the drift coefficient of X is where The first two scenarios has n = 50 and 500 servers, respectively.Their respective arrival rates are λ = 42.929 and 477.64.Hence, ρ = 0.8586 and 0.9553, both leading to β = 1.In the proposed algorithm, the reference density is chosen according to (5.6) and (5.7).The truncation rectangle is taken to be K = [−7, 35] × [−7, 35] and is divided into 0.5 × 0.5 finite elements.Some performance estimates can be found in Table 3.
To evaluate the diffusion model (4.20), we list corresponding simulation estimates of the performance measures in both tables.As in the previous examples, the diffusion model produces adequate performance approxima-

tions.
Theoretically, the matrix-analytic method can be used in this example as the customer-count process N is also a quasi-birth-death process.But it is impractical because the computational complexity is too high.Consider the case k = 2. Let V 1 (t) and V 2 (t) be the respective numbers of waiting customers whose patience times are in the first and in the second stage at time t.For this M/H 2 /n + E 2 queue, the four-dimensional process {(V 1 (t), V 2 (t), Z 1 (t), Z 2 (t)) : t ≥ 0} is a continuous-time Markov chain.At level , there are + 1 states if ≤ n and there are (n + 1)( − n + 1) states if > n.The number of states at level is formidable when is large.Even if we may truncate the state space using the technique described in Section 6.1, the number of states is still too large to apply the matrix-analytic method.In fact, we are not aware of any other numerical methods other than simulation that can produce approximations in Tables 3 and 4. 6.4.Example 4: an M/H 2 /n + H 2 queue.Let us consider an example in which the patience time hazard rate function changes rapidly near the origin.As pointed out by Reed and Tezcan (2009), the performance of such a queue is sensitive to the patience time distribution in a neighborhood of zero.A model that exploits the patience time density at zero solely may not produce adequate performance estimates.In this example, the patience times follow a two-phase hyperexponential distribution that has p = (0.9, 0.1) and ν = (1, 200) .
In other words, there are two types of patience times.Ninety percent of patience times are exponentially distributed with mean one and ten percent are exponentially distributed with mean 0.005.We take the same hyperexponential service time distribution as in Sections 6.2 and 6.3.The hazard rate function of the hyperexponential patience time distribution is The drift coefficient of X in (4.20) is also given by (6.4) where In this example, we have h(0) = p1 ν1 + p2 ν2 = 20.9 and h Thus, 0 = 0 and the hazard rate function has a steep slope near the origin.Since the zeroth degree Taylor's approximation in (5.9) may bring on too much error, the reference density exploiting the lowest-order nonzero derivative at the origin could be erroneous.
To choose an appropriate reference density, an auxiliary queue is used again.As in Section 5.3, the auxiliary queue is an M/H 2 /n + M queue that shares the same arrivals and service times with the M/H 2 /n + H 2 queue.Let α > 0 be the rate of the exponential patience time distribution.We take α = ν1 ∧ ν2 so that the patience times in the auxiliary queue are all of the type with the longer mean.If the queue lengths are equal, the abandonment rate in the auxiliary queue must be lower than that in the original queue.Therefore, the queue length decays slower in the former queue and the reference density for model (4.14) of the auxiliary queue should work.This observation leads to a reference density that follows (5.6) and (5.14), but in this example, we take α = ν1 ∧ ν2 and solve (5.11) to find q 0 .Two scenarios with n = 50 and 500 servers are investigated.The respective arrival rates are λ = 57.071and 522.36.Thus, ρ = 1.141 and 1.045 and both scenarios have β = −1.By solving (5.11), we have q 0 = 0.165 for the first scenario and q 0 = 0.0059 for the second scenario.The reference density follows (5.6) and (5.14) with α = ν1 = 1.With ε 0 = 10 −7 , the truncation rectangle is K = [−7, 9] × [−7, 9], partitioned into 0.5 × 0.5 finite elements.The performance estimates obtained by the diffusion model (4.20) are compared with the simulation results in Table 5.The performance estimates are still quite accurate.
We also put the performance estimates produced by the diffusion model (4.14) in this table.For this model, the reference density follows (5.6) and (5.8) with α = h(0) = 20.9.In the proposed algorithm, the mesh for model (4.20) is used again.Because in this example, using the patience time density at zero solely cannot capture the behavior of the abandonment process, model (4.14) fails to produce proper performance estimates.
7. Implementation issues.The proposed algorithm was implemented using the C ++ programming language.The package was tested on both Linux and Windows platforms.In this section, we discuss several important issues arising from the implementation.They are crucial for using the algorithm to solve practical problems.To demonstrate these issues, the second scenario with n = 500 servers in Section 6.1 is investigated throughout this section.The diffusion model (4.14) is used to approximate the M/H 2 /n+M queue.  of spaces {H k : k ∈ N} may not converge to H in L 2 (R d , r) and the output of the algorithm may significantly deviate from the exact stationary density.To demonstrate this issue, let us consider a "naive" reference density.
To produce a "naive" reference density, we consider a queue that has the same arrival process and patience time distribution as the M/H 2 /n + M queue.This new queue has an exponential service time distribution and its mean service time is equal to that of the M/H 2 /n + M queue.For this M/M/n + M queue, the diffusion model (4.14) is a one-dimensional piecewise OU process whose stationary density is given by (5.5).The "naive" reference density is a product reference density in (5.6) with each r j being the stationary density in (5.5).In other words, the "naive" reference density is obtained by pretending the service time distribution to be exponential.
Let us apply the "naive" reference density.With ε 0 = 10 −7 , the truncation rectangle is set to be K = [−7, 10] × [−7, 10] and is partitioned into 0.5 × 0.5 finite elements.As shown in Figure 5a, the output of the proposed algorithm noticeably deviates from the exact stationary distribution.To further confirm that the "naive" reference density cannot work, we next test the truncation rectangle K = [−7, 32] × [−7, 32], which is used in Section 6.1 along with the proposed reference density.In this case, the matrix A in (3.12) is close to singular and its condition number is 3.52 × 10 190 .Figure 5b manifests the severe error in the algorithm output.Recall that in this example, the hyperexponential service time distribution has c 2 s = 24.Comparing (5.5) with (5.8), we can tell that the decay rate of the "naive" reference density is much larger than that of the proposed density varies quickly.We need a finer mesh to improve the accuracy.In this case, the condition number of A is 5.70 × 10 20 .Recall that to produce the curve in Figure 1b, we use a mesh consisting of 0.5×0.5 finite elements.With this mesh, the condition number of A is 1.15 × 10 23 .When the element size is further reduced to 0.25 × 0.25, the condition number of A grows to 7.13 × 10 27 .As illustrated in Figure 6b, the output of the algorithm fits the exact stationary distribution well.When we compare Figures 1b and 6b, however, there is barely any difference noticeable between the algorithm outputs.To confirm that this mesh is not superior to the one with 0.5×0.5 finite elements, we list several performance estimates in Table 6.In this table, the results in Table 1b are duplicated for comparison purposes.The difference between the algorithm outputs using these two meshes is negligible.Considering the modeling error of the diffusion model, we can assert that using 0.5 × 0.5 finite elements is sufficient to produce an accurate approximation for this queue.
Given an appropriate reference density and the associated truncation hypercube, the above discussion has demonstrated an approach to selecting a mesh.Beginning with two meshes, with one finer than the other, we compare the algorithm outputs using these two meshes.If obvious difference is observed, the coarser mesh should be discarded and a further finer mesh is explored.Continue this procedure until the difference between the outputs of two meshes are negligible.Then, the coarser one of the remaining two is selected as an appropriate mesh.
We would also demonstrate that with an improper reference density, a finer mesh cannot make the algorithm yield an adequate output.Let us go back to the example in Section 7.1 with the "naive" reference density.We set the truncation rectangle to be K = [−7, 32] × [−7, 32] and the size of finite elements to be 0.25×0.25.The output is shown in Figure 7a.Although the curve appears smoother than the one in Figure 5b with 0.5 × 0.5 finite elements, the output still fails to capture the exact stationary distribution.This time, the condition number of A is 3.91 × 10 195 .There is no doubt that such an ill-conditioned matrix will bring about a huge round-off error in solving (3.12).A mesh with 0.125 × 0.125 finite elements is also investigated and the algorithm output is plotted in Figure 7b.The condition number of A increases to 6.35 × 10 198 and the algorithm misses the target as well.7.3.Gauss-Legendre quadrature.Before solving the linear system (3.12),we must generate the matrix A and the vector v whose entries are given by (3.13).We follow a Gauss-Legendre quadrature rule to compute the integral for each entry.The integral is taken over a two-dimensional rectangle and the quadrature rule evaluates the integrand at m points in each dimension.The results are more accurate when a larger m is used.In Section 6, we take m = 8 in the numerical examples.Here, we briefly discuss the impact of the order m.
Several performance estimates are listed in Table 7.We keep the same settings for the algorithm except the quadrature order in each dimension.For the convenience of comparison, the results in Table 1b are duplicated in this table.Clearly, the Gauss-Legendre quadrature of order m ≥ 4 is sufficiently accurate for our purposes.7.4.Computational complexity.Let d, the dimension of the diffusion model, be fixed.The size of A is m C × m C where m C is the dimension of the functional space C given by (3.18).The matrix A is sparse.There are at most 6 d nonzero entries in each row or column.Hence, it takes O(m C ) arithmetic operations to construct A. We may apply Gaussian elimination to solve the linear system (3.12).When the basis functions are properly or-1.0 × 1.0 0.5 × 0.5 0.25 × 0.25 0.125 × 0.  The computation time (measured by seconds) for various meshes can be found in Table 8, where we list both the time for constructing A and v and the time for solving (3.12).When computing A and v, we follow a Gauss-Legendre quadrature rule with m = 8 points in each dimension.The truncation rectangle is set to be K = [−7, 32] × [−7, 32].Each mesh is obtained by setting the size of finite elements.The dimension m C increases by around four times as the width of each finite element is reduced by half.The proposed algorithm is tested on a laptop with a 2.66GHz Intel Core 2 Duo processor and eight gigabytes memory.Both A and v are produced by our C ++ package.The linear system (3.12) is solved by Matlab.These two parts are connected via a MEX interface that comes with Matlab.
8. Concluding remarks.In this paper, we proposed two approximate models for many-server queues with customer abandonment.Both these models are diffusion processes and they differ in how the abandonment process is approximated.A finite element algorithm was proposed for computing the stationary distribution of each model.The essential part of the algorithm is a reference density that controls the convergence of the algorithm.To construct a reference density, we conjectured that the limit queue length process has a certain Gaussian tail.Using this conjecture, we proposed a systematic approach to choosing a reference density.With the proposed reference density, the output of the algorithm is stable and accurate.Numerical examples indicate that the diffusion models are good approximations for many-server queues.
Assume that the stationary density g is twice differentiable in R d and vanishes at infinity.Using the basic adjoint relationship (2.7) and applying integration by parts twice, we have where G * is the adjoint operator of the generator G. Fix a finite domain K ⊂ R d large enough.One can solve the stationary density g by the Dirichlet Proof of Proposition 2. Since G is a linear operator, it suffices to show that for any f 0 ∈ C, we must have f 0 = 0 if Gf 0 = 0 in L 2 (R d , r).The uniform elliptic operator G can be written into the divergence form as in (8.1) of Gilbarg and Trudinger (2001) Let U ⊂ R d be a connected open set that is bounded and contains K. Since r > 0 and Gf 0 is continuous in the interior of each finite element, we must have Gf 0 = 0 in K except on the boundaries of certain finite elements where Gf 0 is not defined.Hence, Gf 0 = 0 in U in the weak sense (see (8.2) of Gilbarg and Trudinger (2001)).Note that b, Σ, and the partial derivatives of Σ are all continuous, so both b and Σ are bounded in U .Because f 0 ∈ W 1,2 0 (K), it follows from Corollary 8.2 of Gilbarg and Trudinger (2001) that f 0 = 0 in K, and thus f 0 = 0 in R d .
Proof of Proposition 3. Given a compact set K ⊂ R d , let C 2 b (K) be the set of real-valued functions on a neighborhood of K that are twice continuously differentiable with bounded first and second derivatives in K. Consider the finite hypercube K a .By (A.1), there exists κ 0 (K a ) > 0 such that for all f ∈ C2 b (K a ).
A polynomial can be used to approximate f 0 on K a .By Proposition 7.1 in the appendix of Ethier and Kurtz (1986), there exists a polynomial f p such that f p − f 0 H 2 (Ka) < ε 2 2κ 0 (K a ) .
For the lattice mesh ∆ k , let Λ a,k be the set of its nodes in the interior of K a .For any k ≥ a, let ϕ k be a function in C k such that ϕ k (x) = 0 for all x ∈ R d \ K a and ϕ k (x) = f p (x) and ∂ϕ k (x) ∂x j = ∂f p (x) ∂x j for j = 1, . . ., d and all x ∈ Λ a,k .Clearly, ϕ k ∈ C2 b (K a ).Because the sequence of meshes {∆ k : k ∈ N} is regularly refined, there exists a constant κ 1 > 0 such that η ∆ k < κ 1 for all k ≥ a.Using the interpolation error estimate in Theorem 6.6 of Oden and Reddy (1976), we have where κ 2 > 0 is a constant independent of ∆ k and f p , and Hence, there exists δ 0 > 0 such that whenever |∆ k | < δ 0 .In this case, .
X in (4.20) has the same diffusion coefficient σ as in the first model (4.14).Its drift coefficient b is (4.21) b ρ = 1.045 and n = 500

Fig 1 :
Fig 1: The stationary distribution of the customer number in the M/H 2 /n+ M queue.

Fig 2 :
Fig 2: The stationary distribution of the customer number in the M/H 2 /n+ M queue, with ρ = 1.112 and n = 20.

Fig 3 :
Fig 3: The stationary distribution of the customer number in the M/H 2 /n queue.

7. 1 .
Influence of the reference density.The reference density plays a key role in the algorithm.If the function r does not satisfy (3.1), the sequence K = [−7, 32] × [−7, 32]

Fig 5 :
Fig 5: The output of the proposed algorithm with the "naive" reference density.
Fig 7: The output of the proposed algorithm with the "naive" reference density and different meshes.
dered, the nonzero entries of A are confined to a diagonally bordered band of width O(m(d−1)/d C).Hence, solving (3.12) requires O(m(2d−1)/d C ) operations as m C → ∞.
j (x)∂f (x)/∂x j ) ∂x for each f ∈ C 2 b (R d ), where bj (x) = b j (x) For each f ∈ C 2 b (K), define a norm • H 2 (K) dx.Because both b and Σ are bounded in K, there exists κ 0 (K) > 0 such that (A.1)K (Gf (x)) 2 r(x) dx ≤ κ 0 (K) f 2 H 2 (K) for all f ∈ C 2 b (K).Let C2 b (K) be the closure of C 2 b (K) in the above norm.A standard procedure can be used to define the first-order and the second-order derivatives for each f ∈ C2 b (K).Then, the operator G can be extended to C2 b (K) and inequality (A.1) holds for all f ∈ C2 b (K).Proof of Proposition 3. It suffices to prove that for anyf 0 ∈ C 2 b (R d ), there exists a sequence of functions {ϕ k ∈ C k : k ∈ N} such that Gϕ k − Gf 0 → 0 as k → ∞.Fix ε > 0. Because K k ↑ R d as k → ∞,by (3.2) and the Cauchy-Schwartz inequality, there exists a ∈ N such that (A.2) R d \Ka (Gf 0 (x)) 2 r(x) dx < ε 2 2 .
Performance measures of the M/H2/n + M queue.
Performance measures of the M/H2/n queue.
Performance measures of the M/H2/n + E k queue with ρ < 1.
Performance measures of the M/H2/n + E k queue with ρ > 1.
Performance measures of the M/H2/n + H2 queue.
The output of the proposed algorithm with different quadrature orders.
Computation time (in seconds) of the proposed algorithm using different meshes.