MINIMIZING EQUILIBRIUM EXPECTED SOJOURN TIME VIA PERFORMANCE-BASED MIXED THRESHOLD DEMAND ALLOCATION IN A MULTIPLE-SERVER QUEUEING ENVIRONMENT

We study the optimal demand allocation policies 
to induce high service capacity and achieve minimum expected sojourn times 
in equilibrium in a queueing system with multiple strategic servers. 
We propose the mixed threshold allocation policy 
as an optimal state-dependent policy that induces optimal service capacity from strategic servers. 
Compensation to the servers can be paid at customer allocation or upon job completion. Our study focuses on the use of a multiple-server mixed threshold allocation policy to replicate the demand allocation of a given state-independent policy in order to achieve a symmetric equilibrium with a lower expected sojourn time. 
The results indicate that, under both payment schemes, for any given multiple-server state-independent policy, there exists a multiple-server threshold policy that produces an identical demand allocation and the same Nash equilibrium (if any). 
Moreover, the policy can be designed to minimize the expected sojourn time at a symmetric equilibrium. 
Furthermore, under the payment-at-allocation scheme, our results, combined with existing results on the optimality of the multiple-server linear allocation policy, show that the mixed threshold policy can achieve the maximum feasible service capacity and thus the minimum feasible equilibrium expected sojourn time. 
Hence, our results agree with previous two-server 
results and affirm that a trade-off between incentives and efficiency need not exist in the case of multiple servers.


1. Introduction. The problem of finding the optimal control policy for a queueing system has been widely studied in the literature; see, for instance, [5,9,10]. Recent studies have focused on queueing systems with strategic servers [2,7], particularly on deriving an optimal policy to induce high service capacities in a competitive environment [1,3,6]. In these systems, the servers decide their own service capacities and compete with each other for higher market share and profit. For example, such models can describe service systems composed of independently operating service providers, or supply chains with make-to-order suppliers who make their own operating decisions. It is then of interest to determine which customer allocation and compensation policy induces high service capacities from the servers at minimum cost.
With performance-based demand allocation, the buyer decides the amount of demand to be allocated to a server based on the service capacities of all servers. This has been identified as a plausible option among the different means of motivating faster service, as it requires little bargaining power on the part of the buyer compared with other motivators, such as imposing late fees or offering a higher price per job [1]. The common-queue and separate-queue allocations studied in [6] are examples of such demand allocation policies. In the former, a common queue is maintained for two strategic servers and the demand allocated to each server is determined endogenously. In the latter, a separate queue is maintained for each server and the demand is split between the two queues in proportions such that the expected waiting times in the two queues are equal. Extensions of these two policies to the case of multiple servers have been studied in [3].
While both allocation policies discussed in [6] may be implemented without observing the servers' capacities, demand allocation policies that explicitly account for the servers' chosen service capacities may give the buyer greater power to control the servers' incentives and could be designed to induce the maximum feasible service capacity from the servers. This has been shown in [1], where several state-dependent and state-independent allocation policies are studied and compared. Under the assumption that payments to the servers are made at customer allocation, [1] concluded that the optimal state-independent policy (the linear allocation) and the optimal state-dependent policy (the mixed threshold allocation) induce the same maximum feasible service capacity, and thus there is no trade-off between incentives and efficiency.
The optimal state-independent policy in [1] has been extended to the case of multiple servers in [14], whereas the optimality argument of the two-server mixed threshold policy is significantly more difficult to generalize to the case of multiple servers and has not been considered in the literature.
The main aim of this paper is to generalize the mixed threshold policy proposed in [1] to a multiple-server mixed threshold policy, and to study to what extent such policies can replicate the demand allocation of state-independent policies. We address both the payment-at-allocation and the payment-upon-completion schemes. Our results show that, if server overloading is prohibited, multiple-server mixed threshold policies can replicate the demand allocation of any policy. Furthermore, under the payment-at-allocation scheme, replicating the demand allocation of a state-independent policy that overloads a server is feasible if we allow the mixed threshold policy to include single-sourcing with some probability. Assuming that all servers are identical, the expected sojourn time under our mixed threshold policy in the Nash equilibrium is minimal given the equilibrium service capacities. In other words, our results concur with the two-server results of [1] and indicate that there is no trade-off between incentives and efficiency.
The rest of the paper is structured as follows. Section 2 summarizes related literature. Section 3 introduces the multiple-server demand allocation problem and reviews previous results obtained in [1] and [14]. In Section 4, we generalize the two-server threshold policy to an n-server threshold policy and find the set of allocated demand vectors that can be replicated using an n-server mixed threshold policy. In Section 5, we summarize the results and discuss further research issues.
2. Literature review. Game-theoretic analysis of the equilibrium service capacities of two or more strategic servers has been considered in [2,7]. Later studies focused on choosing an optimal policy to induce high service capacities in a competitive environment [1,3,6]. In these systems, the servers decide their own service capacities and compete with each other for market share and profit. Game theory [11] is used to model the interactions between strategic servers in order to determine the equilibrium service capacities and profits.
In [6], the common-queue and separate-queue allocations were compared for two strategic servers in a principal-agent framework (see [8]), with the aim of minimizing the cost needed to keep the expected sojourn time at or below a required level. The equilibrium service capacities chosen by the servers were derived and compared. It was shown that the separate-queue allocation may yield lower costs than the common-queue allocation, suggesting that there is a trade-off between efficiency and incentives. These two policies and the corresponding comparison have been extended to the case of multiple servers in [3].
The study in [1] considered various demand allocation policies that explicitly account for the servers' chosen service capacities. The principal-agent problem studied there is based on a two-server Markovian queueing system, where the buyer wishes to induce a high service capacity from strategic servers through a performance-based allocation of demand and a compensation proportional to the allocated demand. Two classes of allocation policies, namely state-independent and state-dependent allocation policies, are studied and compared. The model under each allocation policy is treated as a multiple-player strategic game, and the Nash equilibrium, if any, is identified. Assuming that payment to the servers is made at customer allocation, they show that the linear allocation policy is an optimal state-independent policy and induces the maximum feasible service capacity from the servers. They further argue that, by randomizing between two-server threshold allocation policies, one can achieve an allocation identical to that of the linear allocation policy, and thus obtain an optimal state-dependent policy that induces the maximum feasible service capacity. However, we remark that when the capacity of the primary server is lower than its allocated demand, the mixed threshold policy under the payment-at-allocation scheme requires allocating customers only to the primary server. This makes the system unstable even when the total service capacity exceeds the total demand rate, and at the same time the buyer pays the server for more customers than it can actually serve. Similar optimality results were not obtained in [1] for the case where servers are paid upon job completion.
The optimality of the multiple-server linear allocation policy, again under the payment-at-allocation scheme, has been proved in [14]. However, the optimality argument for the two-server mixed threshold policy proposed in [1] has not been extended to multiple servers, and such an extension is mathematically much more complicated. The main difficulty lies in the complexity of the queueing system under an n-server threshold policy. With non-strategic servers, there have been studies on the optimality of threshold-type policies for heterogeneous server systems [10,13]. However, the steady-state probabilities of such systems cannot be obtained explicitly, and it is not straightforward to see how the demand allocation changes with the thresholds. Therefore, given a fixed state-independent policy, showing the existence of a mixed threshold policy that gives an identical allocation is much more difficult with multiple servers, since the allocation vector (for each chosen service capacity vector) has a higher dimension than in the two-server case. This paper focuses on characterizing the set of allocation vectors that can be achieved by mixed threshold policies and on establishing a similar optimality result for mixed threshold policies in the multiple-server case.
3. The multiple-server demand allocation problem. We consider a queueing system with $n$ identical strategic servers. Customers arrive to the system according to a Poisson process with rate $\lambda$. Each Server $i$ chooses its own service capacity $\mu_i$ and incurs a cost at the rate $c(\mu_i)$, where $c(0) = 0$ and $c(\cdot)$ is assumed to be strictly increasing and convex, i.e. $c'(\cdot) > 0$ and $c''(\cdot) \ge 0$. The time that Server $i$ takes to serve a customer is, independently of all other service times, exponentially distributed with rate $\mu_i$ (i.e. with mean $1/\mu_i$). The buyer pays each server an amount $R$ per customer; when this payment is made depends on the payment scheme, as described below. The aim of the buyer is to select a demand allocation policy, through which customers are assigned to the servers, that minimizes the expected sojourn time of a customer in equilibrium. We assume that

$$c\left(\frac{\lambda}{n}\right) < \frac{\lambda R}{n},$$

which is the necessary condition for the expected waiting times to be finite in an equilibrium where the $n$ servers split the demand equally. Moreover, as a benchmark for comparison, we define the maximum feasible service capacity as $\bar\mu$, where $c(\bar\mu) = \lambda R/n$. In other words, the maximum feasible service capacity is the service capacity at which, when it is chosen by all servers, each server receives an equal share of the demand and earns zero profit.

We consider two different payment schemes. The first is the payment-at-allocation scheme, where a server is paid when a customer is allocated to it. The second is the payment-upon-completion scheme, where a server is paid for a customer when it completes that customer's service. When the service capacity of a server is at least its allocated demand rate, the two schemes pay essentially the same amount to the server in the long run. However, if the service capacity $\mu_i$ of Server $i$ is lower than its allocated demand rate $\lambda_i$, i.e. $\mu_i < \lambda_i$, the payment-at-allocation scheme pays the server at $R$ times its allocated demand rate, i.e. $R\lambda_i$, while the payment-upon-completion scheme pays the server at $R$ times its service rate, i.e. $R\mu_i$, which is lower. It should be noted that, under the payment-upon-completion scheme, since Server $i$ is paid at most at the rate $R\mu_i$ even if $\lambda_i > \mu_i$, we need only consider allocation policies with $\lambda_i \le \mu_i$; that is, we never need to overload a server, as overloading does not increase the server's payment and thus does not strengthen its incentives.
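The benchmark $\bar\mu$ and the two payment schemes can be made concrete numerically. The sketch below is illustrative only: it assumes a hypothetical quadratic cost $c(\mu) = \mu^2$ (any continuous, strictly increasing cost with $c(0) = 0$ would do) and placeholder values of $\lambda$, $R$ and $n$, and the function names are ours rather than the paper's. It solves $c(\bar\mu) = \lambda R/n$ by bisection, checks that $\bar\mu > \lambda/n$ (so that an equal split leaves every server with spare capacity), and compares the long-run payment rates $R\lambda_i$ and $R\min(\lambda_i, \mu_i)$ of the two schemes.

```python
def max_feasible_capacity(cost, lam, R, n, tol=1e-12):
    """Solve cost(mu_bar) = lam * R / n by bisection.

    Assumes cost is continuous, strictly increasing and cost(0) = 0.
    """
    target = lam * R / n
    lo, hi = 0.0, 1.0
    while cost(hi) < target:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cost(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def payment_rate(R, lam_i, mu_i, scheme):
    """Long-run payment rate to a server with allocated demand lam_i and capacity mu_i."""
    if scheme == "at_allocation":
        return R * lam_i  # paid for every allocated customer
    if scheme == "upon_completion":
        return R * min(lam_i, mu_i)  # paid only for completed services
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    lam, R, n = 1.0, 2.0, 2
    cost = lambda mu: mu ** 2  # hypothetical convex cost, for illustration only
    mu_bar = max_feasible_capacity(cost, lam, R, n)
    print(mu_bar, mu_bar > lam / n)  # mu_bar is approximately 1.0 here
    # An overloaded server (lam_i = 0.6 > mu_i = 0.5) is paid more at allocation.
    print(payment_rate(R, 0.6, 0.5, "at_allocation"),
          payment_rate(R, 0.6, 0.5, "upon_completion"))
```

Bisection is used only because it needs nothing beyond monotonicity of $c$; any root finder would serve.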
3.1. State-independent and state-dependent allocation policies. As proposed in [1], demand allocation policies can be divided into two classes, namely state-independent and state-dependent allocation policies. Under a state-independent policy, customer allocation is based only on the service capacities of the servers and not on the states of the servers (i.e., whether a server is busy or idle). Consequently, it makes no difference whether or not a customer is allocated to a server immediately upon arrival. We therefore assume that a First-In-First-Out (FIFO) queue is maintained for each server and that customers are allocated to the queue of a server immediately upon arrival. We further assume that the arrival process of customers to Server $i$ is a Poisson process with rate $\lambda_i$. This assumption holds, for instance, when each customer is allocated to Server $i$ with probability $\lambda_i/\lambda$. Examples of state-independent policies with multiple servers are the separate-queue allocation [3,6], the linear allocation and the proportional allocation [1,14]. In particular, [14] proved that the n-server linear allocation policy is optimal under the payment-at-allocation scheme. The other class, the state-dependent policies, allows customer allocation to depend on the states of the servers. Consequently, a customer may not be allocated to a server immediately upon arrival. The most common example is the common-queue allocation policy [3,6], but here we focus on a multiple-server generalization of the two-server mixed threshold policy discussed in [1].
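When each customer is routed to Server $i$ with probability $\lambda_i/\lambda$, every server behaves as an independent M/M/1 queue with arrival rate $\lambda_i$, so the expected sojourn time of a state-independent policy has the standard closed form $\sum_i (\lambda_i/\lambda)/(\mu_i - \lambda_i)$, provided $\lambda_i < \mu_i$ for every server that receives demand. The helper below is a minimal sketch of that M/M/1 calculation; its name and interface are illustrative.

```python
import math


def split_sojourn_time(lam, alloc, mu):
    """Expected sojourn time when customers go to Server i with probability alloc[i] / lam.

    Each server is then an independent M/M/1 queue with arrival rate alloc[i]
    and service rate mu[i]; servers with zero allocation are skipped.
    """
    if not math.isclose(sum(alloc), lam):
        raise ValueError("allocations must sum to the total arrival rate")
    total = 0.0
    for lam_i, mu_i in zip(alloc, mu):
        if lam_i == 0:
            continue
        if lam_i >= mu_i:
            return math.inf  # queue i is unstable
        # fraction of customers sent to i, times the M/M/1 sojourn time 1/(mu_i - lam_i)
        total += (lam_i / lam) / (mu_i - lam_i)
    return total
```

For example, `split_sojourn_time(1.0, [0.5, 0.5], [1.0, 1.0])` evaluates to 2.0, the sojourn time of two separate M/M/1 queues at utilization 0.5.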

3.2. State-independent policies: A review of the multiple-server linear allocation policy. Under the payment-at-allocation scheme, the two-server linear allocation policy proposed in [1] has been shown to be an optimal state-independent policy when appropriate parameters are chosen. Under the same payment scheme, the multiple-server linear allocation policy and its optimality have been studied in [14]. Under the $n$-server linear allocation policy, the allocation $\lambda_i(\mu)$ to Server $i$ is determined by two parameters $\theta > 0$ and $0 < \rho \le 1$; in its definition, the servers' capacities are sorted in decreasing order, and $\hat n \le n$ is the largest integer such that $\lambda_{\hat n} \ge 0$ and $\mu_{\hat n} > 0$. It should be noted that under this $n$-server linear allocation, the demand allocated to Server $i$ can exceed the service capacity chosen by Server $i$, i.e., $\lambda_i(\mu) > \mu_i$ for some capacity vectors $\mu$. In other words, under the payment-at-allocation scheme there are cases where the policy pays a server for more customers than it can actually serve, but such cases do not occur in the Nash equilibrium of the game.
Under the payment-at-allocation assumption (i.e. servers are paid for a job at allocation), [14] modelled the servers' capacity decisions as an $n$-player strategic game and proved the existence and uniqueness of a Nash equilibrium in which the service capacity equals the maximum feasible service capacity when appropriate values of $\theta$ and $\rho$ are chosen. Specifically, when the cost function $c(\cdot)$ is strictly convex, [14] proved that a unique equilibrium exists for a suitable choice of $\theta$ (specified in [14]) and $\rho = 1$ when $R > r_1 = b$. In this equilibrium, $\mu_i = \bar\mu$ for all $i$ and the expected waiting times are finite. For the case where the cost function is linear, i.e., $c(\mu_i) = b\mu_i$ with $b > 0$, [14] proved that a unique equilibrium exists for suitable values of $\theta$ and $\rho$. In this equilibrium, too, $\mu_i = \bar\mu$ for all $i$ and the expected waiting times are finite. We remark that similar results have not been obtained under the payment-upon-completion scheme.
3.3. State-dependent policies: A review of the two-server mixed threshold allocation policy. Although the multiple-server linear allocation policy has been proved to induce the maximum feasible capacity from the servers, we are interested in whether the same equilibrium service capacity can be induced by a state-dependent allocation policy. The main reason is that linear allocation, being a state-independent policy, does not allow for demand pooling, so a customer may be waiting for a busy server while another server is idle, even when the idle server could provide a lower expected sojourn time. A state-dependent allocation policy that induces the same level of service capacity could give a lower expected sojourn time in equilibrium than the linear allocation policy. For the case of two servers, Cachon and Zhang [1] have shown that a mixed threshold allocation policy achieves this goal. The two-server threshold allocation was first studied in the literature as a control policy with non-strategic servers. In particular, it has been proved in [9] that the buyer's optimal allocation with two heterogeneous non-strategic servers is of threshold type. Under a two-server threshold allocation, a single queue is maintained for the two servers, but a job may not be allocated to a server immediately upon arrival, even if that server is idle. Job allocation is based on the designation of a primary and a secondary server and a threshold parameter $m$. When a job arrives, it is allocated to the primary server if that server is idle, and it waits in the queue for the primary server if fewer than $m$ jobs are already waiting; it is allocated to the secondary server only if the secondary server is idle, the primary server is busy, and $m$ jobs are already waiting in the queue. 
The advantage of a threshold allocation over a common-queue allocation is that, in some cases, an idle server may be so slow that waiting for another, busy but faster, server yields a lower expected sojourn time. A numerical method for evaluating the system's performance under threshold allocation has been studied in [12]. When different values of $m$ are chosen, the demand allocated to the servers differs. This allows us to parametrize the policy to create the level of competition that induces the desired service capacity in equilibrium.
In their study [1] of the two-server allocation problem with strategic servers, Cachon and Zhang proposed randomizing between threshold policies with different parameters to replicate the demand allocation of the linear allocation policy, so that the maximum feasible service capacity can be attained in the Nash equilibrium. Specifically, they argued that the buyer can allocate any portion of its demand to the primary server by varying which server is designated as the primary server and by randomizing between different threshold values $m$. They supported this claim by the facts that the primary server's allocated demand increases with $m$ and that, when $m$ is infinite, the primary server receives the buyer's entire demand.
The above argument is valid for many choices of service capacities, in particular when the service capacities are close enough to the equilibrium ones, which ensures the existence of the desired Nash equilibrium. However, when the primary server's service capacity $\mu_1$ is less than the total demand $\lambda$, then for any finite value of $m$ the secondary server is allocated at least $\lambda - \mu_1$ of the demand, and the limit of the primary server's demand as $m$ goes to infinity is $\mu_1$. The only way to allocate more than $\mu_1$ of the demand to the primary server is not to use the secondary server at all, i.e. to set $m = \infty$ so that $\lambda_1 = \lambda$, and to pay the server for customers at allocation instead of at service completion. However, this makes the system unstable, even when $\mu_1 + \mu_2 > \lambda$, and is therefore undesirable.
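Both observations (the primary server's share grows with $m$, yet stays below $\mu_1$ when $\mu_1 < \lambda$) can be checked by simulation. The following is a minimal discrete-event sketch of the two-server threshold policy described above, assuming a common queue in which an arriving job counts as waiting when the threshold is checked; the function name, horizon, seed and parameter values are illustrative assumptions, and the estimates carry Monte Carlo error.

```python
import random


def simulate_two_server_threshold(lam, mu1, mu2, m, horizon, seed=0):
    """Monte Carlo estimate of (lambda_1, lambda_2) under a two-server threshold policy.

    Server 1 is the primary server. An arriving job goes to Server 1 if it is
    idle; otherwise it joins a common queue, and Server 2 takes a job only when
    it is idle and more than m jobs (counting the new arrival) are waiting.
    """
    rng = random.Random(seed)
    busy1 = busy2 = False
    queue = 0
    assigned = [0, 0]  # customers handed to Server 1 and Server 2
    t = 0.0
    while True:
        r1 = mu1 if busy1 else 0.0
        r2 = mu2 if busy2 else 0.0
        rate = lam + r1 + r2
        t += rng.expovariate(rate)
        if t > horizon:
            break
        u = rng.random() * rate
        if u < lam:  # arrival
            queue += 1
            if not busy1:
                busy1, queue = True, queue - 1
                assigned[0] += 1
            elif not busy2 and queue > m:
                busy2, queue = True, queue - 1
                assigned[1] += 1
        elif u < lam + r1:  # Server 1 finishes; it always pulls from the queue
            if queue > 0:
                queue -= 1
                assigned[0] += 1
            else:
                busy1 = False
        else:  # Server 2 finishes; it pulls only above the threshold
            if queue > m:
                queue -= 1
                assigned[1] += 1
            else:
                busy2 = False
    return assigned[0] / horizon, assigned[1] / horizon
```

With $\lambda = 1$ and $\mu_1 = \mu_2 = 0.8$, the estimate of $\lambda_1$ should be about 0.55 at $m = 0$ and should approach, without exceeding, $\mu_1 = 0.8$ as $m$ grows, up to sampling noise.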
If server overloading is prohibited (either by allocating only $\lambda_i \le \mu_i$ to Server $i$ or by not allowing the buyer to pay at customer allocation), then some allocated demand vectors cannot be replicated by a two-server mixed threshold policy. It is then important to know which allocated demand vectors can be replicated. In the following sections, we extend the two-server mixed threshold allocation policies to the case of $n$ servers and address this issue.
4. Multiple-server threshold policies. In this section, we will generalize the two-server threshold policy to an n-server threshold policy. We will assume that the buyer pays the server for a customer when the service is completed.
4.1. The n-server policy. With $n \ge 2$ servers, it is natural to extend the two-server case by designating the servers as the 1st, 2nd, …, $n$th servers and specifying $n - 1$ threshold parameters. Similar control policies for non-strategic servers have been studied in [10]. In some of these studies the threshold parameters may depend on the states of the other servers (more precisely, the threshold for the $i$th server can depend on the states of the $(i+1)$th, …, $n$th servers). However, for simplicity, and because randomization gives enough flexibility for parametrizing the policy, we assume that each $m_i$ is a constant in each policy in our study.
An $n$-server (pure) threshold allocation policy $T$ is specified by an assignment of Servers $1, 2, \dots, n$ as the 1st, 2nd, …, $n$th servers and by thresholds $m_2, \dots, m_n$, where each $m_i$ is a nonnegative integer (or $\infty$; see below). We define $m_1 = 0$. A single queue is maintained in the system. When a customer arrives, it is assigned to Server 1 if Server 1 is idle. If Servers $1, 2, \dots, i-1$ are all busy, Server $i$ is idle, and the number of waiting customers (including the new arrival) is more than $m_1 + \dots + m_i$, the customer is assigned to Server $i$. (Alternatively, we can assign the first customer in the queue to Server $i$ and let the new arrival wait in the queue.) Otherwise, the customer waits in the queue. When Server $i$ completes the service of a customer, if the number of waiting customers is more than $m_1 + \dots + m_i$, then the first customer in the queue is assigned to Server $i$. If $m_i = \infty$ for some $i$, then no customer is allocated to Servers $i, i+1, \dots, n$.
Given any service capacity vector $\mu = (\mu_1, \dots, \mu_n)$, the demand allocated to the servers via the threshold policy $T$ is $\lambda^{(T)} = (\lambda^{(T)}_1, \dots, \lambda^{(T)}_n)$, where $\lambda^{(T)}_i$ is defined to be Server $i$'s expected rate of receiving customers. In each state, if Server $i$ is idle, its rate of receiving customers is the arrival rate of customers multiplied by the probability that an arriving customer is allocated to Server $i$. On the other hand, when Server $i$ is busy, its rate of receiving customers is $\mu_i$ if there are waiting customers that can be assigned to Server $i$ upon completion of its current service, and zero otherwise. Because the $n$-server policy is much more complicated than the two-server one, it is not straightforward to see which demand allocations can be achieved by a pure policy or by randomizing between $n$-server threshold policies. In the following, we give some properties of an $n$-server pure threshold allocation policy and its allocated demand $\lambda$.
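Since the steady-state probabilities are not available in closed form, the allocated demand can instead be computed numerically. The following is a minimal sketch, not the authors' method: it builds the continuous-time Markov chain of an $n$-server pure threshold policy with the routing rules of Section 4.1, truncates the common queue at a hypothetical level `q_max` (arrivals that would exceed it are blocked, so the result is only approximate), and uses the fact that in steady state Server $i$'s rate of receiving customers equals its throughput $\mu_i \Pr(\text{Server } i \text{ busy})$. It requires NumPy; `threshold_allocation` and its arguments are illustrative names.

```python
import itertools

import numpy as np


def threshold_allocation(lam, mu, m, q_max=200):
    """Approximate allocated demand of an n-server pure threshold policy.

    mu:    service rates of Servers 1..n in priority order.
    m:     thresholds (m_2, ..., m_n); m_1 = 0 is implicit.
    q_max: truncation level of the common queue (arrivals beyond it are blocked).
    """
    n = len(mu)
    if len(m) != n - 1:
        raise ValueError("expected thresholds m_2, ..., m_n")
    if q_max <= sum(m):
        raise ValueError("q_max must exceed m_2 + ... + m_n")
    cum = np.cumsum([0, *m])  # cum[i] = m_1 + ... + m_{i+1}
    states = [(b, q) for b in itertools.product((0, 1), repeat=n)
              for q in range(q_max + 1)]
    index = {s: k for k, s in enumerate(states)}
    gen = np.zeros((len(states), len(states)))

    def with_server(b, i, value):
        return b[:i] + (value,) + b[i + 1:]

    for (b, q), k in index.items():
        idle = [i for i in range(n) if b[i] == 0]
        if idle and q + 1 > cum[idle[0]]:
            # only the first idle server may receive the arriving customer
            gen[k, index[(with_server(b, idle[0], 1), q)]] += lam
        elif q < q_max:
            gen[k, index[(b, q + 1)]] += lam  # the arrival waits
        for i in range(n):  # service completions
            if b[i] == 1:
                if q > cum[i]:
                    gen[k, index[(b, q - 1)]] += mu[i]  # pull head of queue
                else:
                    gen[k, index[(with_server(b, i, 0), q)]] += mu[i]
    gen[np.diag_indices_from(gen)] = -gen.sum(axis=1)

    # Solve pi @ gen = 0 with sum(pi) = 1 by replacing one balance equation.
    a = gen.T.copy()
    a[-1, :] = 1.0
    rhs = np.zeros(len(states))
    rhs[-1] = 1.0
    pi = np.linalg.solve(a, rhs)

    busy = np.array([b for b, _ in states], dtype=float)  # shape (states, n)
    # rate of receiving customers = throughput = mu_i * P(Server i busy)
    return np.asarray(mu, dtype=float) * (pi @ busy)
```

As a sanity check, with $\lambda = 1$, $\mu = (0.8, 0.8)$ and $m_2 = 0$ the balance equations can be solved by hand and give $\lambda_1 = 43/78 \approx 0.551$; raising $m_2$ pushes $\lambda_1$ towards, but never above, $\mu_1 = 0.8$.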
Suppose Server $i$ is designated as the $i$th server. Let $\lambda = (\lambda_1, \dots, \lambda_n)$ be the allocated demand of an $n$-server pure threshold policy with thresholds $m_2, \dots, m_n$. We have the following results:

We have seen that the demand allocation to the servers can be varied by adjusting the thresholds of a policy. However, because the thresholds take only integer values, the demand allocation is limited to a countable set of points. To select from a wider range of demand allocations, we introduce the $n$-server mixed threshold policy, which randomizes between a number of pure threshold policies.
Definition 4.2. An $n$-server mixed threshold allocation policy $\tau$ is specified by an integer $k \ge 1$, nonnegative real numbers $\alpha_1, \dots, \alpha_k$ such that $\sum_{i=1}^{k} \alpha_i = 1$, and $k$ $n$-server threshold policies $T_1, \dots, T_k$. When the mixed threshold allocation policy is used, each threshold policy $T_i$ is used with probability $\alpha_i$. The demand allocated via the mixed threshold policy $\tau$ is then denoted by $\lambda^{(\tau)} = (\lambda^{(\tau)}_1, \dots, \lambda^{(\tau)}_n)$, where

$$\lambda^{(\tau)}_j = \sum_{i=1}^{k} \alpha_i \, \lambda^{(T_i)}_j$$

for any server $j = 1, 2, \dots, n$.
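Because the demand of a mixed policy is the $\alpha$-weighted average of the pure allocations, hitting a target share in the two-server case reduces to one-dimensional interpolation between two pure policies whose primary-server shares bracket the target. The sketch below illustrates this under that assumption; the function names are illustrative, and the pure shares used in the example are placeholders that would in practice come from evaluating specific threshold policies.

```python
def mixed_allocation(weights, pure_allocs):
    """Demand of a mixed policy: the alpha-weighted average of pure allocations."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must form a probability vector")
    n = len(pure_allocs[0])
    return [sum(w * alloc[j] for w, alloc in zip(weights, pure_allocs))
            for j in range(n)]


def bracket_and_mix(shares, target):
    """Two-server case: mix two pure policies so that Server 1 receives `target`.

    shares[k] is Server 1's allocated demand under pure policy k.
    Returns (k_low, k_high, alpha): use policy k_low with probability alpha
    and policy k_high with probability 1 - alpha.
    """
    order = sorted(range(len(shares)), key=lambda k: shares[k])
    for a, b in zip(order, order[1:]):
        lo, hi = shares[a], shares[b]
        if lo <= target <= hi:
            alpha = 1.0 if hi == lo else (hi - target) / (hi - lo)
            return a, b, alpha
    raise ValueError("target is not bracketed by the available pure policies")
```

For instance, with placeholder shares `[0.3, 0.5, 0.7]` and target 0.55, the sketch mixes the second and third policies with weights 0.75 and 0.25.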
It is clear that the set of demand vectors that can be allocated by a pure threshold policy when $\sum_{i=1}^{n} \mu_i > \lambda$ is contained in the set

$$S_\mu = \Big\{ (\lambda_1, \dots, \lambda_n) : \sum_{i=1}^{n} \lambda_i = \lambda,\ 0 \le \lambda_i \le \min(\mu_i, \lambda) \text{ for } i = 1, \dots, n \Big\}.$$

Since $S_\mu$ is a convex set, it follows immediately that the set of demand vectors that can be allocated by a mixed threshold policy is also contained in $S_\mu$. In the following, we explore which allocation vectors in $S_\mu$ can be achieved by some mixed threshold policy, given a fixed service capacity vector $\mu$ such that $\sum_{i=1}^{n} \mu_i > \lambda$. Unless otherwise specified, we assume such a fixed service capacity vector throughout.
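Suppose we have a target demand allocation vector $\lambda^t$ such that $\sum_{i=1}^{n} \lambda^t_i = \lambda$. We say that an allocation policy $\tau$ with demand allocation $\lambda^{(\tau)}$ is $\lambda^t$-dominated in the order $(i_1, i_2, \dots, i_n)$ if

$$\sum_{j=k}^{n} \lambda^{(\tau)}_{i_j} \le \sum_{j=k}^{n} \lambda^t_{i_j}, \qquad k = 1, 2, \dots, n.$$

Also note that, since both vectors sum to $\lambda$, the above condition implies that $\lambda^{(\tau)}_{i_1} \ge \lambda^t_{i_1}$.

Lemma 4.3. Let $\mu$ be a service capacity vector with $\sum_{i=1}^{n} \mu_i > \lambda$ and $\lambda^t$ an allocation vector with $\sum_{i=1}^{n} \lambda^t_i = \lambda$. If $\lambda^t_j < \min(\mu_j, \lambda)$ for all $j = 1, 2, \dots, n$, then there exists an $n$-server (pure) threshold policy that is $\lambda^t$-dominated in the order $(1, 2, \dots, n)$.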
The pure threshold policies in Lemma 4.3 will be used in the following to compose a mixed threshold policy that gives our target demand allocation. To illustrate the idea of the lemma, note that we can represent an allocated demand in a diagram as in Figure 1 by showing each $\sum_{j=k}^{n} \lambda_{i_j}$ for $k = 1, 2, \dots, n$. A $\lambda^t$-dominated policy in the order $(i_1, i_2, \dots, i_n)$ with demand allocation $\lambda^{(\tau)}$ then has each of these quantities less than or equal to the corresponding quantity for $\lambda^t$, as shown in Figure 2. The following two lemmas are used to show that we can construct mixed threshold policies with properties that are useful for constructing the policy that gives the target demand allocation.
To facilitate our discussion, we say that an allocation policy $\tau$ with demand allocation $\lambda^{(\tau)}$ is $\lambda^t$-dominated and $m$-smaller in the order $(i_1, i_2, \dots, i_n)$ if the policy is $\lambda^t$-dominated in the order $(i_1, i_2, \dots, i_n)$ and

$$\lambda^{(\tau)}_{i_j} \le \lambda^t_{i_j}$$

for all $j = m, m+1, \dots, n$, where $m$ is an integer such that $2 \le m \le n$ and $i_1, i_2, \dots, i_n \in \{1, 2, \dots, n\}$ are distinct.
Note that in the above definition, the property is invariant under any permutation of $i_m, i_{m+1}, \dots, i_n$. We also note that any policy that is $\lambda^t$-dominated in the order $(i_1, i_2, \dots, i_n)$ is $\lambda^t$-dominated and $n$-smaller in the same order. Therefore, Lemma 4.3 provides a set of $\lambda^t$-dominated and $n$-smaller policies in different orders $(i_1, i_2, \dots, i_n)$. The idea of a $\lambda^t$-dominated and $m$-smaller policy in the order $(i_1, i_2, \dots, i_n)$ is illustrated in Figure 3.
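The two definitions are easy to state as predicates on allocation vectors, which makes the remarks above (for instance, that domination implies $n$-smallness) checkable on small examples. The sketch below is an illustration with hypothetical function names; orders are 0-based Python lists, and a small tolerance absorbs floating-point rounding in the tail sums.

```python
def tail_sums(alloc, order):
    """[sum_{j=k}^{n} alloc[order[j]] for k = 1, ..., n], with a 0-based order list."""
    sums, running = [], 0.0
    for idx in reversed(order):
        running += alloc[idx]
        sums.append(running)
    return sums[::-1]


def is_dominated(alloc, target, order, tol=1e-12):
    """lambda^t-domination in `order`: every tail sum of alloc is <= that of target."""
    return all(a <= t + tol
               for a, t in zip(tail_sums(alloc, order), tail_sums(target, order)))


def is_dominated_and_m_smaller(alloc, target, order, m, tol=1e-12):
    """Also require alloc[order[j]] <= target[order[j]] for positions j = m, ..., n."""
    n = len(order)
    if not 2 <= m <= n:
        raise ValueError("m must satisfy 2 <= m <= n")
    # positions m..n (1-based) correspond to indices m-1..n-1
    tail_ok = all(alloc[order[j]] <= target[order[j]] + tol for j in range(m - 1, n))
    return is_dominated(alloc, target, order, tol) and tail_ok
```

For a target $(0.4, 0.35, 0.25)$ in the order $(1, 2, 3)$, the allocation $(0.5, 0.4, 0.1)$ is $\lambda^t$-dominated and 3-smaller but not 2-smaller, since its second component exceeds the target.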
Figure 3. The gray bars represent the target allocation $\lambda^t$ and the white bars represent the demand allocation $\lambda^{(\tau)}$. In addition to the requirement of a $\lambda^t$-dominated policy, this policy needs to satisfy $\lambda_{i_j} \le \lambda^t_{i_j}$ for $j = m, m+1, \dots, n$, as illustrated in the diagram by breaking up the sum $\lambda_{i_m} + \lambda_{i_{m+1}} + \dots + \lambda_{i_n}$ into the small blocks $\lambda_{i_m}, \lambda_{i_{m+1}}, \dots, \lambda_{i_n}$, each of which, for $\lambda^{(\tau)}$, is smaller than or equal to the corresponding block for $\lambda^t$.

In the following lemma, we show that, given policies that are $\lambda^t$-dominated and $m$-smaller in all possible orders $(j_1, j_2, \dots, j_n)$, we can obtain $\lambda^t$-dominated and $(m-1)$-smaller policies in any order $(i_1, i_2, \dots, i_n)$. Considering different orders $(i_1, i_2, \dots, i_n)$ and applying Lemma 4.4 inductively from $m = n$ down to $m = 2$, we obtain $\lambda^t$-dominated and 2-smaller policies in any order $(i_1, i_2, \dots, i_n)$. It can easily be seen that such a policy satisfies $\lambda^{(\tau)}_{i_j} \le \lambda^t_{i_j}$ for all $j = 2, \dots, n$; conversely, this condition already implies $\lambda^t$-domination in the order $(i_1, \dots, i_n)$. The property can therefore be expressed in a simpler way as requiring

$$\lambda^{(\tau)}_i \le \lambda^t_i \quad \text{for all } i \ne i_1.$$

The following lemma states the existence of such a policy.
Lemma 4.5. If $\lambda^t_i < \min(\mu_i, \lambda)$ for all $i = 1, 2, \dots, n$, then for any fixed $k$, there exists an $n$-server mixed threshold policy with allocated demand $\lambda$ such that $\lambda_i \le \lambda^t_i$ for all $i \ne k$.

Lemma 4.5 provides a set of policies that are close to the target demand allocation in the sense that only one of the servers may receive more demand than its target, while all other servers receive at most their targeted demand. Using these policies, we can find a mixed threshold policy that gives exactly the target demand allocation, as shown in the following proposition.

Proposition 1. Suppose $\sum_{i=1}^{n} \mu_i > \lambda$, and let $\lambda^t$ satisfy $\sum_{i=1}^{n} \lambda^t_i = \lambda$ and $0 < \lambda^t_i < \min(\mu_i, \lambda)$ for all $i$. Then there exists a mixed threshold allocation policy with allocated demand $\lambda$ such that $\lambda_i = \lambda^t_i$ for all $i = 1, 2, \dots, n$.

We have thus shown that, for any $\mu$ with $\sum_{i=1}^{n} \mu_i > \lambda$, any demand allocation vector in the interior of the set $S_\mu$ is the allocated demand of some mixed threshold policy. Moreover, if $\lambda^t_i = 0$ for some $i$, the demand allocation can be achieved by removing all servers $i$ with $\lambda^t_i = 0$ and considering a mixed threshold policy for the reduced set of servers. On the other hand, if $\lambda^t_i = \lambda \le \mu_i$ for some $i$, then it can be achieved by assigning all customers to Server $i$. Therefore, every vector in $S_\mu$ is achievable except possibly those with $\lambda^t_i = \mu_i < \lambda$ for some $i$. It remains to investigate whether we can find a mixed threshold policy that achieves $\lambda^t$ with $\sum_{i=1}^{n} \lambda^t_i = \lambda$ and $\lambda^t_i = \mu_i < \lambda$ for some $i$. However, this is impossible: whenever the system is stable (i.e. when the sum of the service capacities of all servers with finite thresholds exceeds the total demand rate), the demand allocated to Server $i$, $\lambda_i$, must be strictly less than $\mu_i$, as the proportion of time during which the system is empty is positive. Therefore, in order to have $\lambda_i = \mu_i$, the system must be unstable. This implies that the total service capacity $\sum_i \mu_i$ of the servers in use (those with finite thresholds) is at most $\lambda$. The remaining demand $\lambda - \sum_{i=1}^{n} \mu_i$ then cannot be allocated. 
Thus it is impossible to allocate to Server $i$ a demand of exactly $\mu_i$ using a mixed threshold allocation if all demand has to be allocated.
Nevertheless, the problem can be circumvented in two ways. First, note that if $\sum_{j=1}^{i} \mu_j < \lambda$, then $\lambda_i$ approaches $\mu_i$ in the limit as the threshold $m_{i+1}$ goes to infinity. Hence, for any $\epsilon > 0$, we can find a value of the threshold such that $|\mu_i - \lambda_i| < \epsilon$. Alternatively, we can use a state-independent allocation for such cases and assign a proportion $\mu_i/\lambda$ of the arrivals to Server $i$.

4.2. Analysis of unstable queueing systems. In the above sections, we have mainly focused on the case where the total service capacity exceeds the total demand rate, so that all demand is allocated, i.e. $\sum_{i=1}^{n} \lambda_i = \lambda$. If the sum of the chosen service capacities is at most the total demand rate, $\mu_1 + \dots + \mu_n \le \lambda$, the queueing system is not stable regardless of the values of $m_2, \dots, m_n$. Although it is natural to utilize the servers as much as possible when the system is not stable, allocating to a server strictly less than its service capacity may be useful with strategic servers, as it can induce the servers to switch to higher service capacities in the long run; after all, we are mainly concerned with the equilibrium service capacities. Technically, designing an allocation policy that assigns $\lambda_i < \mu_i$ to Server $i$ in these cases may help to avoid the existence of an undesirable Nash equilibrium in which the queueing system is unstable.
In [1], under the state-independent linear allocation, a server may be allocated a demand that is more than, equal to or less than its service capacity when the queueing system is not stable. We remark that with threshold allocation, even when the system is unstable, it remains impossible to allocate to a server a demand higher than its capacity, because a customer is assigned to a server only when the server is idle. Thus any demand allocation with $\lambda_i > \mu_i$ is impossible. As a pure strategy, the buyer can allocate a demand of zero or $\mu_i$ to Server $i$ by setting $m_i$ to be infinite or finite, respectively. Under the condition $\mu_1 + \mu_2 + \dots + \mu_n \le \lambda$, the threshold $m_i$ does not affect the allocated demand of the other servers. Consequently, we can randomize between values of $m_i$ and obtain any allocated demand $\lambda_i$ with $0 \le \lambda_i \le \mu_i$. We therefore conclude that the set of feasible allocations when $\mu_1 + \mu_2 + \dots + \mu_n \le \lambda$ is the set of allocation vectors satisfying $0 \le \lambda_i \le \mu_i$ for all $i$.

Efficient mixed threshold policies.
We have shown that the set of demand allocation vectors with $\lambda_i \le \mu_i$ can be replicated by mixed threshold policies. However, it is not yet certain whether such policies perform better than state-independent policies. It has been shown that for servers with different service capacities, the optimal policy that gives the lowest expected sojourn time is of threshold type [10], but some thresholds may depend on the states of the other servers, so the mixed threshold policy we construct may not give the lowest expected sojourn time for the chosen service capacities. Indeed, to design an allocation policy that induces the servers to choose the maximum feasible capacity, and thus minimizes the equilibrium expected sojourn time, efficiency must be sacrificed at some out-of-equilibrium choices of service capacities. Hence, our aim in this section is to determine whether the mixed threshold policy can give a lower expected sojourn time in equilibrium than the state-independent policies.
As we deal with identical servers, we expect a symmetric equilibrium, in which all servers choose the same service capacity and receive an equal share of the demand. It is desirable that our mixed threshold policy give the minimal expected sojourn time in this case, which is shown in the following two propositions.

Proposition 2. When $\mu_1 = \mu_2 = \cdots = \mu_n = \mu_c > \lambda/n$, one can randomize among threshold allocation policies with zero thresholds to obtain the demand allocation $\lambda_1 = \lambda_2 = \cdots = \lambda_n = \lambda/n$.

Proposition 3. In an n-server queueing system with $\mu_1 = \mu_2 = \cdots = \mu_n = \mu_c$, any n-server threshold allocation with all thresholds equal to zero gives the same expected sojourn time as an n-server common-queue system in which each server has service capacity $\mu_c$.
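The idea behind Proposition 2 (made precise by the cyclic relabelings $T_0, \ldots, T_{n-1}$ in the proof in Section 6) can be checked with simple arithmetic: if a pure zero-threshold policy gives the server in each designated position some demand rate, relabeling the servers cyclically shifts that vector, and mixing the $n$ relabelings with equal probability gives every server the average, $\lambda/n$. The position-wise demand rates below are made-up placeholders that sum to $\lambda = 3$; they are not computed from a queueing model, and the function name is hypothetical.

```python
from fractions import Fraction


def cyclic_mixture(position_demand):
    """Average allocation over the n cyclic relabelings T_0, ..., T_{n-1}.

    position_demand[k] is the demand rate received by the server designated
    (k+1)-th under a zero-threshold policy (illustrative numbers only).
    Under T_j, server i (0-based) occupies position (i + j) mod n.
    """
    n = len(position_demand)
    mixed = []
    for i in range(n):
        total = sum(Fraction(position_demand[(i + j) % n]) for j in range(n))
        mixed.append(total / n)  # each T_j is chosen with probability 1/n
    return mixed


# Placeholder position demands summing to lambda = 3 (not derived from a model).
alloc = cyclic_mixture([Fraction(3, 2), 1, Fraction(1, 2)])
print(alloc)  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```

Because servers are identical, only the designated position matters for the demand a server receives, which is what makes the equal-weight mixture symmetric regardless of the placeholder values.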
Finally, because any pure threshold policy with all thresholds equal to zero has an expected sojourn time identical to that of the n-server common queue, any mixed policy composed of such pure threshold policies has the same expected sojourn time. Combining this with Proposition 2, we have shown that the mixed threshold policy used to replicate a state-independent policy can be designed to have the minimal sojourn time in a symmetric equilibrium, which is lower than that of the state-independent policy. Thus the use of a mixed threshold policy can indeed improve efficiency and lower the expected sojourn time in equilibrium.
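To see the size of the efficiency gain behind Proposition 3, the sketch below compares two standard formulas: the expected sojourn time of an M/M/n common queue (via the Erlang C waiting probability, computed with the usual Erlang B recursion) and that of n separate M/M/1 queues, each fed at rate $\lambda/n$. Treating the state-independent policy as random routing, so that each server sees a Poisson stream of rate $\lambda/n$, is an assumption of this sketch; the function names are hypothetical.

```python
def erlang_c(n, a):
    """Probability that an arrival waits in an M/M/n queue with offered load a < n."""
    if not 0 <= a < n:
        raise ValueError("need 0 <= a < n for a stable M/M/n queue")
    b = 1.0  # Erlang B blocking probability with zero servers
    for k in range(1, n + 1):
        b = a * b / (k + a * b)  # standard Erlang B recursion
    return n * b / (n - a * (1.0 - b))  # convert Erlang B to Erlang C


def sojourn_common_queue(n, lam, mu):
    """Expected sojourn time in an M/M/n common queue: service time plus expected wait."""
    return 1.0 / mu + erlang_c(n, lam / mu) / (n * mu - lam)


def sojourn_split(n, lam, mu):
    """Expected sojourn time when each server is a separate M/M/1 queue fed at rate lam/n."""
    if mu <= lam / n:
        raise ValueError("each M/M/1 queue must be stable")
    return 1.0 / (mu - lam / n)


print(sojourn_common_queue(2, 1.0, 1.0), sojourn_split(2, 1.0, 1.0))
```

For example, with $n = 2$, $\lambda = 1$ and $\mu_c = 1$, the common queue gives an expected sojourn time of $4/3$ against $2$ for the split system; with $n = 1$ the two formulas coincide, as they should.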

Interpretations and discussions.
We have shown that for any fixed service capacity vector $\mu$ and any target demand allocation vector $\lambda$ such that $0 \le \lambda_i \le \lambda$ and $\lambda_i < \mu_i$, it is possible to choose a mixed threshold policy that gives the demand allocation $\lambda$. The case $\lambda_i = \mu_i$ can be handled by using a state-independent allocation for that capacity vector. Applying the respective policy for each service capacity vector $\mu$ when it is observed, we obtain a state-dependent policy that gives the demand allocation $\lambda(\mu)$. In other words, for any state-independent policy $P_1$ with demand allocation $\lambda$ such that $0 \le \lambda_i \le \min(\mu_i, \lambda)$, there exists a state-dependent policy that replicates the demand allocation of $P_1$. Moreover, from the discussion in Section 3.3, the expected sojourn time under the state-dependent policy is lower than that under $P_1$. We conclude that for any state-independent policy that does not overload the servers, i.e., $\lambda_i \le \mu_i$, there exists a state-dependent policy that replicates its demand allocation, thus giving the same Nash equilibrium but a lower expected sojourn time in equilibrium.
The arguments above apply to both the payment-at-allocation and payment-upon-completion schemes. Server overloading is not meaningful under the payment-upon-completion scheme, as the server is paid only for the customers it finishes serving. This is not true under the payment-at-allocation scheme, where server overloading needs to be considered because it results in a higher compensation rate to the server. In the following, we assume the payment-at-allocation scheme and discuss the case where server overloading is permitted. If we relax the conditions $\lambda_i \le \mu_i$ for $i = 1, 2, \ldots, n$ and use the payment-at-allocation scheme, then, as shown in [14], there can be a state-independent allocation that gives an equilibrium with the maximum feasible service capacity and $\lambda_i > \mu_i$ in some cases. To replicate the allocation of such policies, we must allow servers to be overloaded and use the payment-at-allocation scheme.
If we assign all the demand to one server, say Server $i$, and pay the server at customer allocation, then the demand allocated to Server $i$ and its rate of revenue are $\lambda$ and $\lambda R$, respectively. By randomizing this allocation with other mixed threshold policies, it is possible to achieve any target demand allocation $\lambda$ with $0 \le \lambda_i \le \lambda$ and $\sum_{i=1}^{n} \lambda_i = \lambda$. This follows by noting that allocating all demand to Server $i$ gives the demand allocation $\lambda e_i = (0, \ldots, 0, \lambda, 0, \ldots, 0)$, with $\lambda$ in the $i$th entry, for $i = 1, \ldots, n$, and any target demand allocation can be expressed as a convex combination of these vectors. However, such an allocation results in infinite waiting times and should be avoided as far as possible. Thus, for demand vectors with $0 \le \lambda_i \le \min(\mu_i, \lambda)$, we can apply the results of the previous subsections and replicate the demand allocation with a mixed threshold policy composed only of threshold policies with finite waiting times. In particular, at equilibrium we only need to randomize among threshold policies with zero thresholds, so that the expected sojourn time equals that of an n-server common-queue system with the maximum feasible service capacity.
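The convex-combination argument above amounts to choosing weights $w_i = \lambda_i/\lambda$ for the single-sourcing allocations $\lambda e_i$. A minimal sketch in exact rational arithmetic, with hypothetical helper names:

```python
from fractions import Fraction


def single_sourcing_weights(target, lam):
    """Weights w_i with target = sum_i w_i * (lam * e_i); here w_i = lambda_i / lam."""
    if any(t < 0 for t in target) or sum(target) != lam:
        raise ValueError("target must be nonnegative and sum to lambda")
    return [Fraction(t) / Fraction(lam) for t in target]


def mix_single_sourcing(weights, lam):
    """Recombine the vertices lam * e_i with the given weights."""
    n = len(weights)
    result = [Fraction(0)] * n
    for i, w in enumerate(weights):
        vertex = [lam if k == i else 0 for k in range(n)]  # all demand to one server
        result = [r + w * v for r, v in zip(result, vertex)]
    return result


lam = 4
target = [1, 3, 0]
weights = single_sourcing_weights(target, lam)
print(weights)
print(mix_single_sourcing(weights, lam) == target)
```

The weights are nonnegative and sum to one exactly when the target is nonnegative and sums to $\lambda$, which is why the check in `single_sourcing_weights` suffices.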

4.5.
Comparison of expected sojourn times. In previous subsections, we have shown that under the payment-at-allocation scheme, a mixed threshold allocation policy, if allowed to overload servers, can attain the same equilibrium service capacity as the linear allocation policy while giving a lower (in fact, the minimum) expected sojourn time. However, a mixed threshold allocation policy is complicated to implement and may be costly, especially when the number of servers is large. In this subsection, we investigate how the ratio of the expected sojourn times under the two policies changes as the number of servers, $n$, becomes very large.

SIN-MAN CHOI, XIMIN HUANG AND WAI-KI CHING
This would provide insight into whether it is worthwhile to implement the mixed threshold allocation policy when there is a higher implementation cost compared to the linear allocation policy.
Assume fixed $n \ge 2$ and $R$ such that … Let $W_{si}$ and $W_{sd}$ be the expected sojourn times in equilibrium under the optimal state-independent allocation and under the corresponding replication by the threshold allocation policy, respectively, assuming both allocations yield a unique symmetric equilibrium.
By standard results for an M/M/1 queue, with demand $\lambda/n$ allocated to each server, we have
$$W_{si} = \frac{1}{\bar{\mu}_n - \lambda/n} = \frac{n}{n\bar{\mu}_n - \lambda}. \qquad (2)$$
By Proposition 3 and standard results for an M/M/n system, we also have
$$W_{sd} = \frac{1}{\bar{\mu}_n} + \frac{C(n,a)}{n\bar{\mu}_n - \lambda}, \qquad C(n,a) = \frac{\frac{a^n}{n!}\,\frac{n}{n-a}}{\sum_{k=0}^{n-1}\frac{a^k}{k!} + \frac{a^n}{n!}\,\frac{n}{n-a}}. \qquad (3)$$
Combining Equations (2) and (3), we have
$$\frac{W_{sd}}{W_{si}} = 1 - \frac{a}{n} + \frac{C(n,a)}{n}, \qquad (4)$$
where $a = \lambda/\bar{\mu}_n$. We are interested in the limit of the ratio in (4) as $n$ goes to infinity.

Proposition 4. … Otherwise, …

To understand the implications of the proposition, first note that the maximum feasible service capacity $\bar{\mu}_n$ becomes arbitrarily small as $n$ goes to infinity. Then $c'(0)$ represents the marginal cost of increasing capacity at such a level. From Proposition 4, if this marginal cost is zero, i.e., $c'(0) = 0$, the advantage of the optimal state-dependent policy over the optimal state-independent policy vanishes as the number of servers approaches infinity. On the other hand, if $c'(0) > 0$, the ratio of the expected sojourn time under the optimal state-dependent policy to that under the optimal state-independent policy approaches a limit strictly below one as the number of servers approaches infinity.
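For a numerical feel for how the ratio $W_{sd}/W_{si}$ behaves as $n$ grows, the sketch below takes the equilibrium condition $c(\bar{\mu}_n) = \lambda R/n$ used in the proof of Proposition 4 and evaluates the standard M/M/1 and M/M/n expected sojourn times at $\bar{\mu}_n$. The quadratic cost $c(\mu) = k\mu + \mu^2$ (so that $c'(0) = k$) and the values $\lambda = 1$ and $R = 2$ are hypothetical choices made only for illustration. Under these assumptions, the ratio drifts toward 1 when $k = 0$ and stays clearly below 1 when $k = 0.5$, in line with the discussion of Proposition 4.

```python
import math


def erlang_c(n, a):
    """Probability that an arrival waits in an M/M/n queue with offered load a < n."""
    if not 0 <= a < n:
        raise ValueError("need 0 <= a < n for a stable M/M/n queue")
    b = 1.0
    for k in range(1, n + 1):
        b = a * b / (k + a * b)  # Erlang B recursion (may underflow harmlessly to 0)
    return n * b / (n - a * (1.0 - b))


def max_feasible_capacity(n, lam, R, k):
    """Solve c(mu) = lam * R / n for the hypothetical cost c(mu) = k*mu + mu**2."""
    rhs = lam * R / n
    # positive root of mu**2 + k*mu - rhs = 0, written to avoid cancellation
    return 2.0 * rhs / (k + math.sqrt(k * k + 4.0 * rhs))


def sojourn_ratio(n, lam, R, k):
    """W_sd / W_si at the symmetric equilibrium capacity mu_bar_n."""
    mu = max_feasible_capacity(n, lam, R, k)
    if n * mu <= lam:
        raise ValueError("equilibrium capacity does not stabilise the system")
    w_si = 1.0 / (mu - lam / n)  # n separate M/M/1 queues, each fed lam / n
    w_sd = 1.0 / mu + erlang_c(n, lam / mu) / (n * mu - lam)  # M/M/n common queue
    return w_sd / w_si


for k in (0.0, 0.5):
    ratios = [round(sojourn_ratio(n, 1.0, 2.0, k), 4) for n in (2, 10, 100, 1000, 10000)]
    print(f"c'(0) = {k}: {ratios}")
```

The closed-form root is used instead of a numerical solver because the hypothetical cost is quadratic; for a general convex cost, a bisection on $c(\mu) = \lambda R/n$ would play the same role.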
5. Concluding remarks. In this paper, we have extended the two-server mixed threshold allocation policy proposed in [1] to the case of n servers. For any state-independent policy that prohibits server overloading, we have shown that it is possible to replicate the allocated demand by a mixed threshold policy. We consider two payment schemes: the payment-at-allocation scheme and the payment-upon-completion scheme. Under the payment-at-allocation scheme, where server overloading is possible, we have shown that a mixed threshold allocation policy can replicate the allocated demand if we include a single-sourcing strategy in the mixed policy and pay the servers at customer allocation. For the payment-upon-completion scheme, although we do not know whether the mixed threshold policy can attain the maximum feasible service capacity $\bar{\mu}$, our results show that the mixed threshold policy can perform as well as any other state-independent or state-dependent policy in terms of the induced service capacity. For identical servers, the mixed threshold policy at the symmetric equilibrium can be composed solely of threshold policies with zero thresholds. As a result, the policy yields the minimal expected sojourn time at the equilibrium service capacities.
Our results concur with existing two-server results that there is no trade-off between incentives and efficiency. Whether or not server overloading is allowed, we can find an n-server mixed threshold policy that induces the same service capacity from the servers as any given state-independent policy. Moreover, in the symmetric equilibrium, the mixed threshold policy can always give a lower expected sojourn time.
Our extension of the mixed threshold policy to multiple servers is natural, but the proof that the n-server mixed threshold policy can replicate any other policy is significantly more difficult than its two-server counterpart. The technical hurdle is that, in the inductive construction of the desired mixed threshold policy, we need to match the demand of each new server while keeping the previously matched demands unchanged. This concern does not arise in the two-server case, and it makes the proof considerably more complicated, both in finding appropriate component pure policies and in matching the target demand allocation.
Our results have been derived in a framework where servers are identical, i.e., they have the same cost function $c(\mu)$. Nevertheless, the results in Sections 4.1-4.2 do not depend on the cost structure of the servers. Therefore, with asymmetric servers, it is also possible to replicate the demand allocation of any state-independent policy that prohibits server overloading by an n-server threshold allocation policy. However, because the Nash equilibrium, when it exists, may not be symmetric, it remains to be investigated whether a suitable n-server threshold allocation policy performs better than a state-independent policy in terms of achieving a lower expected sojourn time.
Our results are based on a Markovian queueing system. We believe that a similar analysis can be carried out for more general inter-arrival or service time distributions. However, the actual computation of the n-server mixed threshold policy may be more complicated because of the difficulty of computing the allocated demand in an n-server threshold system.
In our model, the service capacities chosen by the servers are assumed to be observable by the buyer. In reality, however, the buyer has to infer the service capacities from realized service times. We have not considered how statistical errors in this inference may affect our results, which is an interesting issue for future research.
In Zhang's work [14], it has been shown that the multiple-server linear allocation policy can achieve the maximum feasible service capacity $\bar{\mu}$ in the Nash equilibrium under the payment-at-allocation scheme with server overloading permitted. It then follows from our results that, if we also allow server overloading and payment at allocation, there exists an n-server mixed threshold allocation policy that achieves the maximum feasible service capacity with the minimum feasible expected sojourn time at equilibrium. However, as mentioned in earlier sections, server overloading and payment at allocation cause unnecessary infinite waiting times at some out-of-equilibrium plays and may be undesirable. It remains to be investigated whether there exists a state-independent policy without server overloading that achieves the maximum feasible service capacity under the payment-upon-completion scheme. Nevertheless, our results show that the mixed threshold policy can perform as well as any other state-independent or state-dependent policy in terms of the induced service capacities. Thus, if a policy that induces the maximum feasible service capacity exists, our results imply that an optimal mixed threshold allocation policy without server overloading (i.e., $\lambda_i \le \mu_i$ in all allocations) also exists.
Our work has proved the existence of an n-server threshold policy that replicates any given state-independent policy that prohibits server overloading. For any fixed service capacity vector and target demand allocation, it is desirable to find a mixed policy that not only gives the minimum expected sojourn time but also randomizes among the minimum number of pure policies. Finding an efficient way to identify such a mixed policy may be a direction for future research. Since the n-server mixed threshold policy involves a set of parameters for each service capacity vector, another direction is to investigate whether a simpler state-dependent policy with fewer parameters could give the same incentives and efficiency.
6. Proof of Lemmas and Propositions. Proof. For Statement (i): when $\mu_1 + \mu_2 + \cdots + \mu_k < \lambda$, in every state of the queueing system the arrival rate is $\lambda$, which exceeds the total service rate of at most $\mu_1 + \mu_2 + \cdots + \mu_k$. Since a birth-and-death process with birth rate $\lambda$ and death rate $\mu_1 + \mu_2 + \cdots + \mu_k$ is unstable, the number of customers in the queueing system grows without bound with probability 1. On the other hand, when the total number of customers in the system exceeds $k + m_2 + m_3 + \cdots + m_k$, Servers $1, 2, \ldots, k$ are always busy, so the number of additional customers evolves as a birth-and-death process with birth rate $\lambda$ and death rate $\mu_1 + \mu_2 + \cdots + \mu_k$. Therefore, $\mu_1 + \mu_2 + \cdots + \mu_k > \lambda$ is sufficient for the system to be stable. When the system is stable, all customers are served with probability 1, so $\lambda = \sum_{i=1}^{n} \lambda_i$. Statement (ii) is straightforward since, by definition, no customer is allocated to Servers $i, i+1, \ldots, n$ when $m_i = \infty$.
For Statement (iii), the fact that … is straightforward. For the other direction, we consider two cases.

Case I: If $\sum_{j=1}^{i-1} \mu_j > \lambda$, we compare the system with the delay system (subject to the same inter-arrival times and service times) in which the $i$th, $(i+1)$th, …, $n$th servers are not used. This is equivalent to the case $m_i = \infty$. Let $Y$ be the number of waiting customers in this system. Then $\lim_{k \to \infty} P(Y \ge k) = 0$, since the system is stable given $\sum_{j=1}^{i-1} \mu_j > \lambda$. Now consider the original threshold system with threshold $m_i$. For fixed $i$, let $Y_{m_i}$ be the total number of customers waiting in the first $m_2 + m_3 + \cdots + m_i$ positions of the queue. Since in this system some customers are allocated to Servers $i, i+1, \ldots, n$, while no customer is lost to other servers in the previous system, we have $Y_{m_i} \le Y$ for any realization of the inter-arrival and service times. Thus the event $Y_{m_i} \ge m_2 + m_3 + \cdots$ …
But then the left-hand side is nonnegative and the right-hand side approaches zero as $m_i \to \infty$. Clearly, the right-hand side is independent of $m_{i+1}, \ldots, m_n$. For any $\epsilon > 0$, we have $m_i^*$ such that …

Case II: If $\sum_{j=1}^{i-1} \mu_j \le \lambda$, once again consider $Y_{m_i}$, the number of customers waiting in the first $m_2 + m_3 + \cdots + m_i$ positions of the queue under the given threshold policy, and $Y^{l}_{m_i}$, the number of waiting customers in a loss system that consists of the first $i$ servers with thresholds $m_1, m_2, \ldots, m_{i-1}$ and queue length $m_2 + m_3 + \cdots + m_i$, with both systems subject to the same inter-arrival and service times. Then $Y_{m_i} \ge Y^{l}_{m_i}$ for any realization of the inter-arrival and service times. Thus
$$P\left(Y_{m_i} \ge m_2 + m_3 + \cdots + m_{i-1} + 1\right) \ge P\left(Y^{l}_{m_i} \ge m_2 + m_3 + \cdots + m_{i-1} + 1\right).$$
Note that the left-hand side is at most 1, while the right-hand side is independent of $m_{i+1}, \ldots, m_n$ and converges to 1 as $m_i \to \infty$, because it approaches the case of a delay system, which is unstable given $\sum_{j=1}^{i-1} \mu_j \le \lambda$.

For $j = 0, 1, \ldots, n-1$, let $T_j$ be the pure threshold policy with $m_2 = m_3 = \cdots = m_n = 0$ in which Server $i$ is designated as the $(i+j)$th server when $i + j \le n$, and as the $(i+j-n)$th server otherwise. Then we have …

Proof. First, because all servers have the same service capacity and all thresholds are zero, the state of the system can be represented by the number of customers in the system. Moreover, the designation of the 1st, 2nd, …, $n$th servers does not affect the expected number of customers in the system, because all servers have the same service capacity. Let $X_c$ denote the number of customers in the n-server common-queue system with all service capacities equal to $\mu_c$, and let $X_t$ denote the number of customers in the system under an n-server threshold allocation policy with all thresholds equal to zero. Suppose both systems are subject to the same arrivals and service times; then $X_c = X_t$. Taking expectations, $E[X_c] = E[X_t]$. Little's law states that in a stable system $L = \lambda W$, where $L$ is the long-run average number of customers in the system and $W$ is the long-run average time a customer spends in the system. Since $\lambda$ is the same for both systems, the expected sojourn times are equal, i.e., $W_c = W_t$, where $W_c$ and $W_t$ are the expected sojourn times in the n-server common-queue system and under the threshold allocation policy with all thresholds equal to zero, respectively.

Proof of Proposition 4.
Proof. Since $\bar{\mu}_n$ is given by $c(\bar{\mu}_n) = \lambda R/n$, $a_n = \lambda/\bar{\mu}_n$ goes to infinity as $n$ goes to infinity. To find the limit, we first note that $\lim_{n \to \infty} a_n/n = \lim$ …