An M/M/c queue with queueing-time dependent service rates

Recent studies indicate that in many situations service times are affected by the experienced queueing delay of the particular customer. This effect has been detected in different areas, such as health care, call centers and telecommunication networks. In this paper we present a methodology to analyze a model having this property. The specific model is an M/M/c queue in which any customer may be tagged at her arrival time if her queueing time will be above a certain fixed threshold. All tagged customers are then served at a given rate that may differ from the rate used for the non-tagged customers. We show how it is possible to model the virtual queueing time of this queueing system by a specific Markov chain. Then, solving the corresponding balance equations, we give a recursive solution to compute the stationary distribution, which involves a mixture of exponential terms. Using numerical experiments, we demonstrate that the differences in service rates can have a crucial impact on queueing time performance.


Introduction
For classical queueing models, service times are typically assumed to be independent of experienced delay. Such independence assumptions are often crucial for analytical tractability of the queueing system's performance. In practice, however, it has been recognized that the amount of waiting affects service durations, and the assumed independence does therefore not hold. Empirical evidence of this dependence relation primarily stems from the health care domain. The studies [3,10,11,12,26,27,30,31] indicate that delays in admission have adverse effects on patient outcomes and consequently increase the patients length of stay; this is referred to as the slowdown effect in [29].
At a more conceptual level, the field of behavioral operations investigates how servers and customers behave in an operational setting. The recent study [16] indicates that in many situations service times are affected by the load. The authors develop a framework for the impact of load on service times, where they distinguish server, network, and customer mechanisms. For the server mechanism, it is observed (and supported by literature) that there is a clear impact of workload on the service speed, and this impact may go in different directions. The authors of [16] found far fewer customer mechanisms in the literature, although they expect them to exist. A psychological view of a customers queueing experience during its sojourn time is provided in [9]. Specifically, the authors assume that the dissatisfaction level of a customer increases during waiting, whereas this may be compensated during service. As a consequence, for an acceptable level of dissatisfaction after service, the service time should be longer for a customer experiencing longer delays. Moreover, after excessive waiting customers expect valuable service [22], which may also affect the corresponding service time. Similarly, the recent study [32] in a retail environment found that customers waiting longer in fact consume more. Thus, from the customer perspective, it seems conceivable that excessive waits are associated with longer service times.
The aim of this paper is to find the steady-state queueing time distribution in multi-server queues where the service time is affected by the experienced queueing time. Despite its apparent practical relevance, such queues have hardly been studied in a service setting with multiple servers. More specifically, we let the service rate of each server depend on whether the experienced queueing time of the customer in service is above or below a given threshold upon service initiation. We envisage that this typically corresponds to a customer mechanism, although the service rate adaptation might also be the consequence of the server adapting to congestion. Despite the inherent model complexity, the steady-state queueing time distribution turns out to be remarkably tractable in this case and can be expressed in terms of a mixture of exponential terms. From our numerical experiments, we see that taking the differences in service rates into account results in crucially different queueing time behavior. As such, ignoring the dependence between experienced waiting and service time might be wholly inadequate. An online implementation of the model is available to further facilitate managerial decision making [14].
Delay thresholds are typically used in empirical health care studies to distinguish delayed and non-delayed patients; if the admission delay is above the delay threshold, a patient is considered to be delayed. For instance, [10,27] investigated the impact of delayed patients at the emergency department on the inpatient length of stay. Based on the patient data, the difference in length of stay is in the order of hours. For cardiac patients, delays are even much more critical; delays in the order of minutes lead to adverse patient outcomes [12]. Less critical cases, such as surgery of hip fractures, have delay thresholds in the order of days [30]. For patients with communityacquired pneumonia a similar delay threshold is used [26]. For both situations it is shown again that delayed admissions experience extended length of stay. Another example of the impact of physician workload at the emergency department (ED) are [3,31]; amongst others, the authors observe that high physician workload leads to overtesting and generates extra post-ED care.
The health care situations described above have recently inspired the study of multi-server queues, in which the service time (i.e. the length of stay) is affected by delay and congestion at the clinical ward, such as the Intensive Care Unit [11,18,29]. The study of [11] is also supported with data verifying the correlation between delay and length of stay. In [11], the multi-server queue with delay-dependent service is abbreviated with M/M (f )/c; the focus from the queueing perspective is on approximations and bounds for the workload process. The multi-server variant with abandonments in the quality and efficiency driven (QED) regime is considered in [18]. Next to the fact that this involves an asymptotic analysis, the service rate adaptations are also instantaneous instead of the more intricate delay effects on individual customers. Such server mechanisms are referred to as operator slowdown in [29], as opposed to customer slowdown. The model in [29] also involves a multi-server queue, where the service rate depends on whether a customer has to wait or not. In terms of the current paper, this means that the waiting threshold is at zero. In addition, [29] focuses on the number of customers instead of queueing times.
There have been some recent studies on multi-server queues where service times depend on delay. The authors of [34] consider a general multi-server queue with abandonments and derive fluid limits as a proxy for expected queueing times. Moreover, [35] considers a setting with customer abandonments, where the service time is either endogenously or exogenously determined by the system's dynamics. The focus there is mainly on statistical estimation for both dependency situations. Finally, in [17] the service speed is affected by behavioral factors, such as server speedup due to increased workload and social loafing when multiple workers share the workload. However, the analysis is in terms of queue lengths instead of queueing times.
From the literature discussed above, we observe that almost all studies of multi-server queues with delay-dependent service involve some sort of approximation. This is different for the singleserver case, which is much more amenable for analysis. An important observation for the singleserver case is that the queueing time then corresponds to the workload a customer finds upon arrival; this is no longer the case for the multi-server setting with delay-dependent service. There is a long tradition of single-server queues with workload-dependent features; we refer to [19] for an early overview containing many references. Among those early papers are [25] and [8]. Interestingly, in 1973 Posner already noted that the server may provide more appropriate service to counter the negative effect of waiting [25]; the author then provides a complete analysis for the M/M/1 case in which the service rate is a step function of the queueing time. A little later, [8] provides an exact analysis for the M/M/2 queue where non-waiting customers have a different service rate.
For workload-dependent M/G/1 queues, often the service and/or arrival rates are assumed to depend on the workload, but not so often the complete service time. However, generalizations of such systems are Lévy driven queues in which the Lévy exponent depends on the position of the process. The Lévy exponent incorporates the Laplace transform of the service time distribution and, hence, the service time may thus depend on the workload found by a customer entering service. Examples of such Lévy driven queues with state-dependent exponent are [4,5,24]. Finally, [33] and [7] consider G/G/1 queues with service and interarrival times that depend linearly on delays.
Limiting distributions in terms of mixtures of exponentials are also common in Markovmodulated fluid models. In fact, our analysis is along similar lines as such fluid models, although our differential equations differ from the ones found in traditional fluid queues [2], see [21] for an early overview. Some examples of fluid models with level-dependent features are [13,23,28]. A crucial difference with fluid models is the role of the background state. Our state description, where the service time depends on experienced delay, is delicate. In our case, the background state should be interpreted as the server state process; our state description is based on [1].
The paper is organized as follows. In Section 2, a model and state description is provided. Section 3 presents balance equations that are required to determine the limiting distribution. The limiting distribution is derived in Section 4, followed by some illustrative special cases in Section 5. Section 6 contains some numerical insights. For readability, most of the technical proofs are deferred to Appendix A. A python algorithm to compute the queueing time distribution is avaiblabe for downloading at the public repository [15]; see [14] for an online implementation.

Model and state description
We consider a queueing system with c identical servers and an infinite waiting room. Customers arrive according to a Poisson process with rate λ. Let W (t) be the virtual queueing time (VQT) at time t. That is, if a customer arrives at time t, his service will start at time t + W (t). Clearly, if at least one server is idle at time t, W (t) = 0. If all servers are busy at time t, W (t) > 0. The service times of the customers depend on their queueing time through a critical level k > 0 as follows: if a customer arrives at time t, and W (t) ≤ k, he is classified as a class 1 customer, and his service time is exp(µ 1 ), otherwise he is classified as class 2 customer and his service time is exp(µ 2 ). In order to describe the dynamics of the VQT process {W (t), t ≥ 0}, we introduce the server state process S(t) = (S 1 (t), S 2 (t)) as follows. We say that S(t) = (i, j) if i servers are serving class 1 customers and j servers are serving class 2 customers at time t + W (t), just before the new service starts at time t + W (t). Clearly, we must have 0 ≤ S 1 (t) + S 2 (t) ≤ c − 1 for all t ≥ 0. Furthermore, We discuss the evolution of the {(W (t), S 1 (t), S 2 (t)), t ≥ 0} process below. We will use the following notation for the aggregate service rate: Suppose the state at time 0 is (0, i, j) with 0 ≤ i + j < c − 1. If the next event is an arrival, the state jumps to (0, i + 1, j); if it is a departure of type 1, it jumps to state (0, i − 1, j); and if it is a departure of type 2, it jumps to state (0, i, j − 1). Hence, the transition rate from state (0, i, j) to state (0, i + 1, j) is λ, to state (0, i − 1, j) is iµ 1 and to state (0, i, j − 1) is jµ 2 .
Next, suppose the state at time 0 is (0, i, c − 1 − i) with 0 ≤ i ≤ c − 1. Again, if the next event is a departure of type 1, it jumps to state (0, i − 1, c − 1 − i); and if it is a departure of type 2, it jumps to state (0, i, c − 1 − 2). Hence, the transition rate from state (0, If the next event is an arrival, all servers become busy, i + 1 of them serving type 1 customers and (c − 1 − i) of them serving type 2 customers. The next departure occurs after an exp(∆(i + 1, c − 1 − i)) amount of time and the VQT process jumps to level X ∼ exp(∆(i + 1, c − 1 − i)). Also, the next departure is of type 1 with probability (i + 1)µ 1 /∆(i + 1, c − 1 − i) and of type 2 with probability (c − 1 − i)µ 2 /∆(i + 1, c − 1 − i). Combining these observations, we see that the transition rate to state (x, i, c − 1 − i) is This completes the description of all transitions out of states (0, i, j) with 0 ≤ i + j ≤ c − 1.
Next, consider states (w, i, c − 1 − i) with 0 < w ≤ k and 0 ≤ i ≤ c − 1. The state does not change if the next event is a departure. It can change only if the next event is an arrival. An arrival in this state is of type 1. By following the same argument as in the case of state (0, i, c − 1 − i), we see that the transition rate to state (w + x, i, c − 1 − i) is Now consider states (w, i, c − 1 − i) with w > k and 0 ≤ i ≤ c − 1. An arrival in this state is of type 2. Hence, following the same argument as above, we see that the transition rate to state Finally, if W (t) > 0, the VQT process changes continuously at rate −1 between arrivals. This completes the description of the evolution of the process {(W (t), S(t)), t ≥ 0}.

Limiting distribution: balance equations
In this section, we derive the balance equations for the VQT process (W (t), S(t)) defined in Section 2 that are satisfied by the limiting distribution. It is straightforward to see that the VQT process is stable if We shall assume stability from now on and focus on the limiting distribution of (W (t), S(t)). Now let, for x ≥ 0 and t ≥ 0, x), and define the row vector function whose first two derivatives are denoted by F (x) and F (x). Also, for the case that no customers are waiting, let Using the transition rates derived above, we see that the π's satisfy the following balance equations: with 1 (·) denoting the indicator function.
Next we derive the integro-differential equations satisfied by F (·) for the case there is queueing delay.
We denote by I the identity matrix, whose size will be clear from the context, and byÎ a rectangular matrix obtained from I by adding a null column on the left, i.e.Î = ( 0 I ). Throughout the paper, we will use the convention of denoting by· a rectangular (instead of square) matrix.
Let B 1 be a c × c square matrix with entries given by and B 2 be a c × c square matrix with entries given by We finally define the matrices Theorem 1. The limiting distribution vector F satisfies the following integro-differential equations: Proof. The proof follows standard probabilistic arguments and uses an infinitesimal approach, which we defer to the Appendix. Equations (7) and (8) are integro-differential equations. These equations are related to level crossings principles; see Subsection 5.1 for the single-server case providing additional intuitive insight. To find the limiting distribution, we first convert them to second order linear nonhomogeneous differential equations. They are given in the following theorem. To do so, first define the differential operators L 1 and L 2 as follows: Theorem 2. The limiting distribution vector F satisfies the following second order differential equations: where Proof. This follows from rewriting the integro-differential equations (7) and (8), see Appendix.
The next corollary gives the boundary conditions for F . Here and later we use the notation g(x±) := lim y→x ± g(y) to denote the one-sided limits of the function g at x.
Corollary 1. The limiting distribution vector F satisfies the following boundary conditions: Proof. See Appendix for details.

Solution of the balance equations
In this section we determine the limiting distribution by developing the analytical solution of the differential equations in Theorem 2 and the boundary conditions in Corollary 1. In fact, we present two different ways to express the limiting distribution of the VQT process. The first is based on a scalar representation and clearly reveals that F (x) can be written as a mixture of exponentials. This representation gives insight in the probabilistic interpretation of the queueing delay and is presented in Subsection 4.1. The second concerns a matrix representation and is more compact. This representation is more amenable for numerical computations and can be found in Subsection 4.2. (9) in Theorem 2, we first need to solve the homogeneous equation. Hence, we first need to define several backgrounds quantities. Consider the following quadratic eigenvalue equation:

Limiting distribution. For the solution of Equation
There are 2c solutions {(θ i , φ i ), 0 ≤ i ≤ 2c−1} to the above system. Since the matrices involved in the above equation are all upper triangular, it is easy to see that these 2c solutions are given by the solutions to the following quadratic equations: If both conditions λ = cµ 1 and λ = c(µ 1 − µ 2 ) are satisfied, all these eigenvalues are real and distinct, see also Remark 2 for the other cases. For the rest of the paper we implicitly assume that these conditions hold. The solutions {θ i , 0 ≤ i ≤ c − 1} are given by 1} are positive and are given by Note that θ 2c−1 = max{0, λ − cµ 1 }. The corresponding eigenvectors {φ i , 0 ≤ i ≤ 2c − 1} are easy to compute. In particular, the eigenvector corresponding to the null eigenvalue is denoted by (22) φ * = [0, 0, · · · , 0, 1], which is a row vector of length c.
Next, we turn to the homogeneous equation based on Equation (10). For this, consider the following quadratic eigenvalue equation: and the β i+c 's, for 0 ≤ i ≤ c − 1 are given by Similarly to (20), and (21), all these eigenvalues are real and distinct, assuming which is a row vector of length c. respectively. Moreover, consider the process W (t) ∈ (0, k) and fix the server state process S(t) = (i, c − 1 − i); then the VQT process is decreasing with rate 1 and makes jumps with rate λ of size exp(∆(i + 1, c − 1 − i)). Upon a jump, the server state process S(t) changes with probability , which may be interpreted as a type of clearing [6]. It may be verified that the stationary density of such a 'clearing system' is a mixture of exp(θ i ) and exp(θ i+c ). A similar argument applies to W (t) > k in terms of β i .
As mentioned, to get the solution of the differential equations of Theorem 2 we first need to find the solution of the homogeneous differential equations using the above, and then we look for a particular solution. However, in order to construct a particular solution, since both equations (9) and (10) admit zero as eigenvalue, we would need together with the left eigenvectors φ * and ψ c , respectively defined in (22) and (26), the corresponding right eigenvectors that we denote byφ * andψ c . The following result shows an important relation between those eigenvectors that will be used later in the proof of Theorem 3 to show that F (x) does not have a linear term. Lemma 1. Letφ * andψ c be the right eigenvectors corresponding to the left eigenvectors φ * and ψ c , i.e. satisfing the following relations It follows that Proof. The proof uses linear algebra techniques and is included in the Appendix.
The next result gives the solution of the differential equations of Theorem 2 in terms of 4c Theorem 3. For 0 < x < k, the solution is given by For x > k, the solution is given by Proof. This follows from solving the systems of second order linear differential equations below and above level k in Theorem 2, thereby also utilizing Lemma 1; see Appendix.
Remark 2. We note that in the special case that λ = cµ 1 , we have two identical eigenvalues , then the β i for i = 0, . . . , c − 1 are identical to β 0 and the mixture of exponentials in F (x) for x > k needs to be replaced.

Remark 3.
In case µ 1 = µ 2 the model reduces to a simple M/M/c queue. Therefore, in this case the VQT distribution is given by is the Erlang's C formula and ρ = λ/(cµ 2 ) is the server utilization.
We next use the boundary conditions in Corollary 1 to solve for the 3c + 1 unknown constants We also have c(c + 1)/2 probabilities π(i, j), 0 ≤ i + j ≤ c − 1 that need to be determined. The result is given in the next theorem.
and the normalizing equation where 1 denotes the all-one vector.

4.2.
Computing the solution. For the limiting distribution in Theorem 3, we need Theorem 4 that expresses all constants as a solution of a quite large system of linear equations. In this section we try to express the solution in simpler terms that are easier to implement in a computer language equipped with basic matrix functions. In addition the contruction of the solution given below shows that the system in Theorem 4 only admits a unique solution.
First of all, it helps to write the function F (x), given in (30) and (32), in matrix form. We start by defining the matrices Φ , whose rows are given by the eigenvectors corresponding to the eigenvalues in (20) and (21). We also define the diagonal matrices Θ 1 having all non-positive (negative) eigenvalues, U + 1 having all positive (non-negative) eigenvalues if λ > cµ 1 (λ < cµ 1 ).
In a similar way we construct the matrices U ± 2 solving the equation 2 with all negative eigenvalues and U + 2 with all non-negative eigenvalues. This allows us to rewrite the expression in (30) and (32) as x ≥ k, for unknown constant vectors a − , a + and b − . We then express these vectors in terms of the unknown vector F (0) by using the continuity conditions given in Corollary 1. This gives the following easier expressions In the theorem below, F (0) is expressed in terms of δ c−1 and b c . Proof. This follows from Corollary 1 and some tedious rewriting, see Appendix.
Having expressed the function F (x) for x ≥ 0, in terms of δ c−1 , it is only left to find the probability of the discrete states, together with the constant b c that can be found by using the normalizing equation. This is the result of following theorem. Theorem 6. The discrete probabilities can be computed as follows where the matricesĤ i are defined in (82) and with the constant b c computed as where H 19 and H 20 are given in (76) and (77) in the Appendix.
Proof. The linear system of equations may be rewritten to recursively express δ c−1 in terms of b c , whereas b c follows from normalization; see Appendix.

Special cases
In this section, we present two special cases that provide probabilistic understanding of the limiting VQT process. In Subsection 5.1 we focus on the single-server case, whereas Subsection 5.2 covers the case with 2 servers.

5.1.
Single-server queue. In case c = 1, the process {W (t), t ≥ 0} is sufficient for the state description. In fact, this process now corresponds to an M/M/1 queue in which the jump size depends on the state found upon arrival; see Figure 1 for an illustration of its sample path. In this case, the model is a special case of the M/G/1 variant of Model I in [5]; see also e.g. [20] for a classical related model with two service speeds. Observe that for c = 1 all matrices become scalars. In particular, using that B i =∆ i = µ i andQ i (x) = e −µ i x , and applying integration by parts, the integro-differential equations (7) and (8) in Theorem 1 can be written as These equations can also be interpreted as level crossing equations, where the left and right hand sides correspond to the rate in and out of set (0, x), respectively. Solving these equations, in terms of the density F (x), we obtain, for 0 < x < k, whereas, for x > k, we have The VQT density allows for an intuitive interpretation. Specifically, in the region (0, k) jump sizes are always exp(µ 1 ), which implies that F (x) is proportional to the limiting workload density in an M/M/1 queue with service rate µ 1 (and finite workload capacity k). Also, observe that sample paths of W (t) in the region (k, ∞) are always initiated by an upcrossing of k with a jump of size exp(µ 1 ), after which all jumps are exp(µ 2 ) until a subsequent downcrossing of k. This implies that W (t) in (k, ∞) behaves as the workload process in an M/M/1 queue with service rate µ 2 , but with an exceptional first service time in a busy period that has rate µ 1 . This directly explains the mixture of the two exponentials in (k, ∞).

5.2.
Numerical example for c = 2. As an illustration of the balance equations and the limiting distribution, we consider the 2-server case in this section. A visual representation of the transition diagram of (W (t), S 1 (t), S 2 (t)) can be found in Figure 2. To avoid excessive expressions, we focus on a numerical example with specific parameters. We fix k = 0.45, λ = 2, µ 1 = 0.75 and µ 2 = 1.12. The second order differential equations of Theorem 2 then looks as follows: for 0 ≤ x < 0.45, we have 1.5π(0, 1), whereas, in the region x > 0.45, we obtain The boundary conditions in Corollary 1 are rather straightforward. By Theorem 5, the solution of the above system of differential equations is unique depending on a constant b c and the components π(1, 0) and π(0, 1). That is the quantities F 0 (0) and F 1 (0) are determined given those values as follows It follows that the remaining unknowns π(0, 0), π(1, 0), π(0, 1) and b c can be determined by imposing the following constraints The expressions for F 0 (x) and F 1 (x), in the interval 0 ≤ x ≤ 0.45, are whereas, in the interval x > 0.45, they are given by Note that θ 1 = |λ − cµ 1 | = +0.5 and β 0 = λ − cµ 2 = −0.24. The distribution of the virtual queueing time is visualized in Figure 3, along with the cases of 3 and 4 servers.

Numerical insights
In this section we focus on numerical insights. Specifically, we consider W , the stationary distribution of the queueing time or VQT. Note that the results in Section 4 contain more information, as they also provide the server state. To obtain VQT, observe that P (W = 0) = Figure 3. Stationary VQT cumulative distribution function for λ = 2, µ 1 = 0.75, µ 2 = 1.12 and k = 0.45. † i,j:0≤i+j≤c−1 π(i, j) and P (W ≤ x) = P (W = 0) + F (x)1. From F (x), we may also directly derive the mean stationary VQT, which we give here in matrix representation.

Lemma 2. The mean stationary VQT is computed as follows
where I(a, b; D) is defined as in (85).
Proof. The results follow from the density F (x) and integration by parts, see Appendix.
First, we consider the impact of having different µ 1 and µ 2 . In case µ 1 = µ 2 , the system corresponds to the classical M/M/c queue, whereas there is a slowdown (speedup) effect when µ 2 < µ 1 (µ 2 > µ 1 ). As a basic example, we take k = 5, c = 3, λ = 2, and µ 1 = 0.8, whereas µ 2 ∈ {0.7, 0.8, 0.9}. The cumulative distribution function (cdf) of W is visualized in Figure 4. Clearly, in case of a slowdown (µ 2 = 0.7), the queueing time strongly deteriorates compared to the standard situation where µ 2 = µ 1 = 0.8. In fact, if µ 2 ≤ 2/3 the system would even become unstable. For the current example, the impact of a speedup (µ 2 = 0.9) is relatively small compared to the standard situation, as the basic service rate of 0.8 is already sufficient to provide reasonable queueing times. Hence, taking differences in service rates into account is crucial to provide reliable queueing times, especially in case of slowdowns.
Second, the shape of the VQT density may also be strongly affected by speedups (or slowdowns), i.e., differences in µ 1 and µ 2 . The VQT density is strictly decreasing for the standard M/M/c queue, which will also hold in case µ 2 < µ 1 (slowdown). However, this is no longer necessarily the case for speedups, see Figure 5 for k = 5, c = 3, λ = 2, µ 2 = 0.8, and µ 1 ∈ {0.3, 0.6, 0.67, 0.74, 0.8}. In particular, for more extreme variants of a speedup effect, the peak in the VQT density may be around or above level k.
Third, we consider the impact of the number of servers, i.e., scale of the system. Let k = 5, µ 1 = 0.8, µ 2 = 0.7 (slowdown), and consider systems with 2, 3, and 4 servers. We let λ be 4/3, 2, and 8/3, respectively, such that the loads λ/(cµ i ), for i = 1, 2 are identical. The cdf of the stationary VQT is presented in Figure 6. Clearly, as the number of servers increases the queueing time improves, which is in line with economies of scale for regular M/M/c queues. We † The python algorithm to generate Figures 3, 4, 5, 6 and 7 is avaiblabe for downloading at the public repository [15]; see [14] for an online implementation. like to note that the relative ordering of cdf's below k may change in case λ ≥ cµ 1 (which we did not visualize here).
Finally, we consider the expected VQT. It is well known that E[W ] is convex and increasing in λ for regular M/M/c queues with c and µ fixed. This property is not necessarily preserved in the current model, see Figure 7. Specifically, in case µ 1 is relatively small (µ 1 = 0.3 in Fig. 7), a large fraction of the customers will experience a VQT of around k, assuming the system to be stable. This destroys the convexity of E[W ] as a function of λ. Moreover, the impact of µ 1 is also considerable for more heavily loaded systems. For instance, comparing E[W ] for different µ 1 ∈ {0.3, 0.6, 0.9} with fixed λ/(cµ 2 ) = 0.9, we see that the mean VQT is much smaller when a lot of customers can be served with rate µ 1 (i.e., for µ 1 = 0.9). To conclude this section, we note that neglecting the differences in service rate leads to rather inadequate performance characteristics.

Appendix A. Proofs
In this appendix, we present the technical proofs of the results presented throughout the paper. Proof of Theorem 1: Let 0 ≤ x < k, and 0 ≤ i ≤ c − 1 be fixed. Define P i (W (t) ∈ A) = P (W (t) ∈ A, S(t) = (i, c − 1 − i)), with A ⊂ R. Conditioning on the jump size being exactly let t → ∞, divide by h, and rearrange terms to get Letting h → 0 and multiplying both sides by −1, and noting that F (0) = 0, we get Equation (7), after noticing that Following the same steps as above, we get Substituting Q κ B κ = B κQκ , κ ∈ {1, 2}, yields Equation (8). Proof of Theorem 2: First consider Equation (7). Note that Since B 1 is invertible, we can use Equation (7) to get, for 0 ≤ x ≤ k, Taking the derivative with respect to x on both sides of Equation (7), we get Substituting Equation (50) in the above equation we get Equation (9) with Here we have used Equation (17) to eliminate F (0). This is a second order linear differential equation with constant coefficients and a constant driving function on the right hand side.
Next we consider Equation (8). First, note that, for x > k, we get, by applying Equation (50), that We also have, for x > k, Substituting in the RHS of Equation (8) we get, for x > k, Differentiating both sides of Equation (54), we get Using (54) we have Substituting Equation (56) in the RHS of Equation (55) we get Equation (10). This completes the proof. Proof of Corollary 1: Equation (14) follows from the definition of F . Equation (15) follows by taking the left and right limits at k in Equations (48) and (49), respectively. Equation (16) follows by taking the left and right limits at k in Equations (7) and (8), respectively.
The balance equation for state (0, i, c − 1 − i) yields In matrix form this can be written as which is Equation (17).
Lemma 3. Let M be a c × c matrix with entries given by It is non-singular and satisfies the following equation where {i, j} is the Kronecker delta function, that is Then by matrix multiplication we have Proof of Lemma 1: Let us assume that the left equation in (29) holds, that is α 1 ·ψ c = 0. Then using the definition of α 1 , given in (12), we have that 1∆ 2 − B 2 )ψ c = 0 and we are going to show that (59) (B 1 −∆ 1 )∆ −1 1∆ 2ψc = 0, so that the column vector∆ −1 1∆ 2ψc is parallel toφ * , because it is a right eigenvector of the matrix (B 1 −∆ 1 ) corresponding to the null eigenvalue, whose multiplicity is one.
If Equation (59) holds, we have that where in the last equality we used the relation∆ 2ψc = B 2ψc given by (28).
To prove (59), we continue from (60) by rewriting it in the following way In the first equality we used (28), in the third one that B 1∆ −1 1 = (µ 1 I +∆ c−1 ) −1 B 1 , given by the definition (5), in the sixth one we used the result of Lemma 3, where the matrix M is defined. Finally in the last two equalities we use again the definition (5) and the hypothesis (28).

Proof of Theorem 3:
We consider the two regions of x separately. First assume that 0 < x < k. Equation (9) is a non-homogeneous linear system of ordinary second order differential equations. Hence, we first try a homogeneous solution of the type where φ is a row vector of length c. Substituting in L 1 F h (x) = 0 and cancelling e θx , we get Equation (18), with 2c solutions {(θ i , φ i ), 0 ≤ i ≤ 2c − 1}. The eigenvalues θ i 's are given in Equations (20) and (21). The homogeneous solution to Equation (9) is then given by where the 2c constants {a i , 0 ≤ i ≤ 2c − 1} are to be determined.
For the particular solution we should look for a function of the following type because θ c−1 = 0 is an eigenvalue for the homogeneous solution. By substitution in (9) we get that the following equation has to be satisfied where the vector ζ can be chosen such that ζ · φ * = 0, because for all a ∈ R, In addition, in order to have (63) satisfied for any x, the coefficient of the linear term should be null implying that η = a φ * . Taking the scalar product of both sides of (63) by the right eigenvectorφ * satisfying (27), we get that −a φ * (λI −∆ 1 ) ·φ * = α 0 ·φ * , implying that Note that the linear term is missing if α 0 ·φ * = 0.
Finally, we have that the particular solution is equal to where a is defined as in Equation (64). Below we show that α 0 ·φ * = 0, implying that a = 0 and thereby Equation (30). Next consider the region x > k, where F satisfies Equation (10). It is also a non-homogeneous linear system of ordinary second order differential equations. As before we try a homogeneous solution of the type where ψ is a row vector of length c. Substituting in L 2 F h (x) = 0 and cancelling e βx we get Equation (23), which has 2c solutions (β i , ψ i ), 0 ≤ i ≤ 2c − 1. The eigenvalues β's are given in Equations (24) and (25). The homogeneous solution to Equation (10) is then given by Since the solution has to be bounded we immediately get that b i = 0 for c < i ≤ 2c − 1, since the corresponding β i 's are strictly positive. Moreover, we have β c = 0 and ψ c is as given in Equation (26). It follows that the homogeneous solution to Equation (10) can be written as where the c + 1 constants {b i , 0 ≤ i ≤ c} are to be determined. Next we determine the particular solution. Similarly to what we have done before for the interval [0, k], the particular solution associated with the constant term α 1 in the right hand side of (10) would be of the form (62), since the associated homogeneous equation L 2 F h (x) = 0 admits the constant function as solution. However, in this case, the boundary condition, requiring lim x→∞ F (x) to be bounded, implies that the vector η is zero and therefore that α 1 ·ψ c = 0. By applying Lemma 1, this also implies that the linear term in (65) is missing.
Eventually it follows that the particular solution in the region x > k is given by as can be verified by direct substitution, where M 1 and M 2 are as given in Equations (33) and (34). The general solution is then as given in Equation (32). This completes the proof. Proof of Theorem 5: According to the results of Corollary 1, we write F (k) to mean F (k−) = F (k+) and similarly for the derivative in k. By defining By defining Then substituting the expression of α 2 in (13), by employing also (12) and (11) we get the first result in (43).
Proof of Lemma 2: By taking derivatives of Equations (41) and (42) we can compute the VQT density function as follows We define the following matrix function