Resource allocation in congested queueing systems with time-varying demand: An application to airport operations

Motivated by the need to develop time-efficient methods for minimizing operational delays at severely congested airports, we consider a problem involving the distribution of a common resource between two sources of time-varying demand. We formulate this as a dynamic program in which the objective is based on second moments of stochastic queue lengths and show that, for sufficiently high volumes of demand, optimal values can be well-approximated by quadratic functions of the system state. We identify conditions which enable the strong performance of myopic policies and develop approaches to the design of heuristic policies by means of approximate dynamic programming (ADP) methods. Numerical experiments suggest that our ADP-based heuristics, which require very little computational effort, are able to improve substantially upon the performance of more naive decision-making policies, particularly if exogenous system parameters vary considerably as functions of time.


1 Introduction
In the modern world there exists a need to develop fast, responsive and easily adaptable methods for computing optimal (or near-optimal) solutions to problems in which resources must be allocated dynamically in order to satisfy time-varying demands from multiple sources. A particularly pertinent example of this can be found in the real-time deployment of scarce resources at busy airports. With increasing numbers of airports operating at the very limits of their capacity in order to maximize runway throughput rates in the face of ever-intensifying demand from airlines and other stakeholders, efficient usage of runway capacity is essential if significant operational delays are to be avoided.
In this paper, we consider the optimal dynamic control of service facilities which can be modelled as dual-class queueing systems with random, nonstationary patterns of demand and adjustable service rates.
Various authors have formulated problems in which service rates may be adjusted dynamically in response to a stationary pattern of demand [1,3,13,18,34,50], and used the techniques of stochastic dynamic programming to characterize optimal policies. In real-world applications, however, there is usually a need to consider time-varying demand. Classical applications which have received a lot of attention in the literature include telephone call centres and hospital emergency departments [11,21,51]. In these settings, service rates are typically controlled via the setting of hourly staffing levels.
Although the results in this paper are intended to be of general interest, our primary motivation is the importance of reducing operational (queueing) delays at busy airports. The question of how to avoid excessive flight delays during peak congested periods has become increasingly topical in recent years due to relentless air traffic growth, which has placed the resources of many of the world's busiest airports under seemingly unsustainable pressure [16,20]. In 2016, more than 15 million minutes of delays were experienced by air passengers due to inefficiencies in the air traffic management system [28].
Measures aimed at reducing operational delays on a particular day of operations at an airport can be separated into two distinct categories:
• Strategic interventions: These are made in advance of the day of operations in question, and involve designing schedules and setting 'declared capacity' constraints in such a way that airline requests can be satisfied without causing large queues of traffic to form during congested periods [54]. For example, "demand smoothing" [43] may help to achieve this.
• Tactical interventions: These are made on the day of operations itself in response to real-time observed queues of traffic and other important factors, which may include weather and wind conditions and information about future expected arrival times.
In this paper we assume that the properties of the random, nonstationary processes by which customers arrive in the system have been predetermined and are not subject to any real-time control. In an airport context, this means we take the schedule of flights as a 'given' and consider the optimization of tactical interventions only. The adjustment of time-varying service rates for different customer classes is within the realm of tactical control, and in an airport context this should be interpreted as the dynamic allocation of resources between two sources of demand (arrivals and departures). We can also consider other applications involving the dynamic allocation of service capacity, including make-to-stock production systems [26,37] in which product inventories are updated based on anticipated consumer demand, maintenance problems with different classes of machines subject to breakdown [10] and resource allocation in wireless networks [15].
Tactical control of airport operations constitutes a relatively new area of research which began with Gilbo [19], who considered a problem involving the allocation of service rates between arrivals and departures based on linear capacity constraints. Gilbo's work, as well as that of other authors [7,12], assumed deterministic queueing dynamics. Jacquillat and Odoni [30] introduced a dynamic optimization framework based on stochastic queue dynamics which represents a considerable advancement on earlier work. Their model incorporates the dynamic control of runway configurations and arrival/departure service rates and includes stochastic models for the evolution of weather and wind conditions. This paper differs from previous research by focusing on the optimization of tactical decision-making during a congested period of time in which servers at the service facility are likely to be continuously busy.
We suppose that, during the period of interest, the probability of any server becoming idle is negligible, and refer to this as the "Servers Always Busy" (SAB) assumption. This assumption, which becomes asymptotically correct as demand rates increase towards infinity, facilitates an elegant mathematical analysis which can be used as the basis for a heuristic approach involving the use of approximate dynamic programming techniques to compute parametric value function approximations.
At a severely congested airport, there may indeed be lengthy periods of time during a typical day during which usage of the runways is constantly in demand from aircraft waiting to perform take-offs and landings. This is especially likely to be the case if weather conditions are poor, or if wind conditions prevent certain runway configurations from being used [4,30,43]. Naturally, the "busy" times of day are those at which capacity utilization decisions will have the greatest impact. In particular, lengthy queues and operational delays may result from suboptimal decision-making. Thus, we conjecture that optimizing tactical decision-making during a busy portion of the day may indeed go a long way towards alleviating the worst effects of congestion and optimizing overall performance.
An important measure of daily performance at an airport is the proportion of flight delays which exceed a certain length of time. In the airline industry, it is common practice to refer to departures or arrivals which take place within 15 minutes of their scheduled times as "on-time" [40,44,53]. For this reason, it is important that our objective function takes into account not only the average queueing delays over a period, but also their variability. In this paper we consider an objective function based on weighted second moments of queue lengths (measured in terms of service phases). Objectives based on second moments have previously been considered in [30,31] and are appropriate to use in problems where control must be exercised over the tails of queue length distributions.
The key contributions of this paper are as follows:
• We formulate an optimization problem based on the dynamic, tactical control of service rates for two distinct customer classes in response to nonstationary demand rates and capacity constraints. We then discuss a natural method for approximating the queueing dynamics of the system during a busy period and, based on this idea, introduce a "surrogate problem" which offers greater mathematical tractability than our original optimization problem.
• For the surrogate problem, we derive myopic decision-making policies (which optimize immediate outcomes) and also prove certain structural properties of optimal policies. We argue that, in certain plausible cases for the system parameters, the myopic and optimal policies obtained using our analyses of the surrogate problem should also perform well in the original problem.
• We develop various heuristic policies for the original optimization problem based on properties of the surrogate problem. One such heuristic applies 'approximate' policy improvement to a myopic policy, while two others are based on quadratic approximations for the value function. We test these heuristics against more 'naive' decision-making policies using simulation experiments.
Our work builds upon earlier research by introducing a new methodological approach to address problems involving optimal resource allocation in highly congested queueing systems. Our modelling assumptions are similar to those made in the existing airport literature, but our solution methods (including the development of value function approximations based on simplified queueing dynamics) are innovative and provide an alternative to rigorous dynamic programming methods which tend to be slow and reliant upon computational 'fixes', e.g. discretizations of the state and action spaces. We anticipate several exciting possibilities for future research, including the potential for our busy-period analyses to be incorporated within hybridized optimization methods for large-scale transportation problems.
In Section 2 we formulate the problem of optimizing tactical decision-making during a pre-specified period of operations as a finite-horizon Markov decision process in which we aim to minimize a cost function which takes into account system congestion levels and operational costs associated with redeployment of resources. The "servers always busy" (SAB) approximation is discussed in Section 3. In Section 4 we introduce our 'surrogate' problem, discuss myopic policies and develop quadratic approximations for the value function. Section 5 describes our heuristic methods, and Section 6 provides the results of our numerical experiments. In Section 7 we summarize the contributions of this paper.

2 Model formulation

2.1 The demand and service processes
There exists a well-established convention for airports to be modelled as queueing systems in which arrival and departure queues evolve independently according to M(t)/E_k(t)/1 dynamics [30,32,33,35,42,43]. The use of nonhomogeneous Poisson processes to model 'arrivals' to the runway (i.e. take-offs and landings) allows demand rates to be calibrated according to the schedule of operations for the day and other relevant operational factors. For service times, the E_k(t) (time-dependent Erlang) assumption offers the advantage of mathematical tractability and also allows many types of M(t)/G(t)/1 queues to be well-approximated due to the "richness" of the Erlang family of distributions [24]. The treatment of arrivals and departures as independent queueing processes is arguably an oversimplification, since aircraft arriving at an airport will enter the queue for departures some time later; however, empirical studies have suggested that this assumption is adequate for practical purposes [45].
In this paper we consider a queueing system in which two classes of customer arrive via separate, independent nonstationary random processes. In keeping with the airport paradigm, we will refer to the customer classes as A (for arrivals) and D (for departures). We will consider two possible cases for the random processes governing customer arrival times:
• Case 1: Arrivals of customers of classes A and D follow separate, independent nonhomogeneous Poisson processes with demand rate functions λ(t) and η(t) respectively (t ≥ 0).
• Case 2: An individual customer i (of either class) is scheduled to arrive at time ω_i. However, their actual arrival time is ω_i − E[ξ_i] + ξ_i, where the ξ_i are mutually independent, exponentially-distributed random variables and E[ξ_i] takes a uniform value of either ξ̄_A > 0 or ξ̄_D > 0, depending on whether customer i is of class A or class D.
As mentioned above, models with Poisson arrivals have been relied upon extensively in the previous aviation literature and empirical studies have justified their use [14,29,52]. In recent years, however, there has been a trend towards considering models in which inter-arrival time variances are smaller than one would observe under a Poisson model [27,36]. A key factor behind this trend has been the development of projects such as NextGen in the USA [38] and SESAR in Europe [17], which are intended to make flight arrival times more predictable through the use of four-dimensional trajectory-based operations. We therefore include Case 2, which bears similarities to pre-scheduled random arrival (PSRA) models considered in recent literature [9,23,25,36], as an alternative to the Poisson model. We assume that service times for all customers are Erlang-distributed and consist of k ≥ 1 i.i.d. service phases; these phases are exponentially distributed with intensity rates kµ(t) and kν(t) for classes A and D respectively, where t ≥ 0 is the continuous time variable.
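The two demand models and the Erlang service assumption can be sketched in code as follows. This is a minimal illustration only; the function names and sampling choices (e.g. thinning for the nonhomogeneous Poisson process) are ours, not part of the model specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nhpp(rate_fn, rate_max, T):
    """Case 1: nonhomogeneous Poisson arrivals on [0, T] via thinning.
    rate_fn is a demand-rate function (e.g. lambda(t) or eta(t));
    rate_max is any upper bound on rate_fn over [0, T]."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate_max)       # candidate arrival
        if t > T:
            return np.array(times)
        if rng.uniform() < rate_fn(t) / rate_max:  # keep w.p. rate(t)/rate_max
            times.append(t)

def sample_psra(scheduled, xi_bar):
    """Case 2: pre-scheduled random arrivals. Customer i, scheduled at
    omega_i, actually arrives at omega_i - xi_bar + Exp(mean xi_bar)."""
    xi = rng.exponential(xi_bar, size=len(scheduled))
    return np.asarray(scheduled) - xi_bar + xi

def sample_erlang_service(k, mu):
    """One Erlang-k service time with overall rate mu (k exponential
    phases, each with rate k*mu), so the mean service time is 1/mu."""
    return rng.exponential(1.0 / (k * mu), size=k).sum()
```

For instance, `sample_nhpp(lambda t: 30 + 10 * np.sin(t), 40.0, 8.0)` draws one realization of arrival times over an eight-hour horizon with a sinusoidally varying demand rate.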
The service rates µ(t) and ν(t) are within the realm of tactical control in our model and are modelled as piecewise constant functions, as in [19,30,31]. We consider a period of operations [0, T] and let 0 = t_0 < t_1 < ... < t_S = T be a partition of this interval into S equal-length time intervals. For each s ∈ {1, 2, ..., S}, the interval [t_{s−1}, t_s) will be referred to as time interval s. At the beginning of interval s ≥ 1, a decision-maker chooses a pair of service rates (µ_s, ν_s) for the respective customer classes and these remain constant until the next interval, i.e. µ(t) = µ_s and ν(t) = ν_s for t ∈ [t_{s−1}, t_s). For each s ∈ {1, 2, ..., S}, there is a set of linear constraints which governs the service rates µ_s, ν_s that may be chosen. We will use J_s to denote the number of linear constraints in interval s and write these constraints as

γ_js µ_s + ν_s ≤ θ_js,   j ∈ {1, 2, ..., J_s},

where the parameters γ_js and θ_js are positive constants. We will also assume that γ_1s < γ_2s < ... < γ_{J_s,s} and that the intersection points between pairs of constraints are located in the positive quadrant of the µ_s-ν_s plane, in order to avoid redundancy. For example, in Figure 1, the decision-maker may choose any point in the region R_s enclosed by the lines L_1s, L_2s, L_3s and the coordinate axes, and this corresponds to a pair of service rates (µ_s, ν_s) to be used during interval s. In fact, it can be assumed that the decision-maker always chooses a point on the upper boundary of R_s, since the objectives that we consider in this paper imply that one should always choose the largest possible value of ν_s for a given value of µ_s (this can be shown using a simple sample path argument, which we have omitted here). It follows that the decision-maker only needs to choose one value (µ_s) in each time interval, with ν_s then being given by the dominant linear constraint.
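The dominant-constraint relationship can be made concrete with a short sketch (our own notation): given the chosen class-A rate, the implied class-D rate is the smallest slack across the linear constraints, and the feasible range of µ_s follows from requiring a nonnegative intercept on each constraint.

```python
def nu_from_mu(mu_s, gammas, thetas):
    """Largest feasible class-D rate for a chosen class-A rate mu_s, given
    constraints gamma_js * mu_s + nu_s <= theta_js: the dominant constraint."""
    return min(th - g * mu_s for g, th in zip(gammas, thetas))

def mu_range(gammas, thetas):
    """Feasible range for mu_s: [0, min_j theta_js / gamma_js]."""
    return 0.0, min(th / g for g, th in zip(gammas, thetas))
```

For example, with γ = (0.5, 1, 2) and θ = (30, 40, 60), choosing µ_s = 10 yields ν_s = 25 via the first constraint, and µ_s may range over [0, 30].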
The assumption of linear constraints is well-suited to the airport applications that we are considering, since it is standard practice in the literature to represent the maximum number of departures that can be processed in any time interval as a function of the number of arrivals by means of a piecewise linear capacity envelope [19]. This capacity envelope can be constructed using an intricate procedure described in [46] which depends on aircraft separation requirements, approach speeds, runway occupancy times and other data. Recently, Jacquillat and Odoni [30] recommended interpreting each point on the piecewise linear envelope as a pair of average service rates for arrivals and departures (taking into account the uncertainty involved in actual take-off and landing times) and referred to the envelope as an operational throughput envelope (OTE). This interpretation makes more sense in the context of our paper, since we are treating each µ s (or ν s ) as an expected number of service completions per unit time for class A (or D) during interval s, rather than an upper bound. In reality, the exact shape of an OTE is interval-dependent and cannot be predicted with certainty for future time intervals, since it depends on weather conditions and also the mix and sequencing of different types of aircraft entering the queues for the runways. However, airports invariably have sophisticated meteorological equipment available for making weather forecasts and the mixture of air (or ground) traffic at an arbitrary point in time is heavily dependent on the predetermined schedule for arrivals (or departures), so it is reasonable to suppose that the OTEs for time intervals in the not-too-distant future can be predicted with a high degree of confidence.
The system evolves as a pair of independent, first-come-first-served queues of type M(t)/E_k(t)/1 or PSRA/E_k(t)/1, depending on the nature of the arrivals processes. We assume that the service rates selected at the beginning of any time interval are chosen with knowledge of the latest observed numbers of customers waiting in both queues, the random distribution of future arrival times and the service rate constraints for future intervals. In the PSRA case, the history of previous customer arrivals affects future random arrival times and we assume that this historical information is also known.

2.2 The objective function
We assume that the objective of the decision-maker is to find an admissible dynamic policy π for choosing the service rates µ_1, µ_2, ..., µ_S which minimizes

E^π[ Σ_{s=1}^S ( αX_s² + Y_s² + β[kτ(µ_s − µ_{s−1})]² ) | X_0 = x_0, Y_0 = y_0 ],   (1)

where X_s and Y_s are the respective numbers of Erlang service phases for customers of classes A and D remaining to be processed at the end of interval s, x_0 and y_0 are the initial values for these phase counts, k is the number of Erlang service phases per customer (as stated in Section 2.1), τ = t_1 − t_0 is the length of each time interval, and α > 0 and β ≥ 0 are constants. The quantity [kτ(µ_s − µ_{s−1})]² (which may be interpreted as the squared difference in the expected number of service phase completions between intervals s and s − 1 if the server remains constantly in use) represents the switching cost of changing the service rate (for class A) from µ_{s−1} to µ_s. Thus, the decision-maker aims to minimize the sum of two terms: a weighted, aggregated measure of the congestion in the system over all time intervals, and a penalty which increases with the frequency and severity of changes made to the service rates.
Our motivation for including the second term in (1) is that, in many applications, there may be some cost associated with altering the service rates due to the disruption caused by having to re-allocate physical and/or personnel resources. At airports, for example, changes in the service rates for arrivals and departures may require re-deployment of ground resources (e.g. apron stands, buses, marshallers) and in some cases, complicated operations such as runway configuration changes might be required. In inventory problems, frequent changes to the production rates for different products may result in inefficiencies due to set-up costs, etc. However, in some applications the inclusion of a switching cost may be unnecessary and throughout this paper we make special reference to the case β = 0.
Weighted second moments of queue lengths have previously been used in [30] as a performance measure, although in our case the formulation is slightly different because X_s and Y_s represent the numbers of service phases (not customers) present after s time intervals. Throughout this paper, we refer to the number of phases present as the phase count. The factor α allows for the possibility of class A customers being more expensive to keep waiting than class D customers, or vice versa.
The use of weighted second moments in our objective function can be justified in several ways. Due to the nature of the Erlang service process, if n ≥ 1 customers are present in one of the queues (including the customer being served), then the phase count r satisfies (n − 1)k + 1 ≤ r ≤ nk, where k is the Erlang parameter. As explained in [30], the total of the expected delays for n = ⌈r/k⌉ customers of the same class present in the system (where ⌈·⌉ denotes the ceiling function) is indeed of order n² and therefore also of order r². Furthermore, taking into account the second moments of X_s and Y_s assists us in finding policies which will restrict the tails of their distributions, so that the system performs better with respect to measures such as the proportion of "on-time" service completions (as discussed in the introduction). In real-world applications, of course, practitioners are likely to be more interested in meaningful performance measures than in abstract objective functions, and it is therefore desirable to show that a policy which performs well with respect to the objective (1) also achieves strong results with respect to other criteria. Later in this paper, we will examine the performance of our heuristics with respect to an alternative performance measure (expected waiting times) using numerical experiments.
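The structure of the objective can be illustrated by computing its realized analogue along one simulated sample path. The sketch below is our own and assumes, in particular, that α weights the class-A queue:

```python
import numpy as np

def sample_path_cost(X, Y, mu, alpha, beta, k, tau):
    """Realized cost along one path: a weighted sum of squared
    end-of-interval phase counts plus a switching penalty.
    X[s-1], Y[s-1] are the phase counts at the end of interval s;
    mu[0] is the (dummy) service rate preceding interval 1."""
    X, Y, mu = map(np.asarray, (X, Y, mu))
    congestion = float(np.sum(alpha * X**2 + Y**2))
    switching = float(np.sum((k * tau * np.diff(mu))**2))
    return congestion + beta * switching
```

Averaging `sample_path_cost` over many independently simulated paths gives a Monte Carlo estimate of the expected total cost under a fixed policy; setting `beta = 0` recovers the pure congestion objective discussed above.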

2.3 MDP formulation
The problem that we are considering can be modelled as a Markov decision process (MDP) [39,41] with a finite time horizon in which we aim to minimize expected total cost. Below, we describe the state space, action space, transition probabilities and single-stage cost function for the MDP.
• The state space of the MDP, which we will denote by X, consists of vectors of the form

x = (x, y, z, w, µ̄),   (2)

where x and y are the latest observed phase counts for the respective customer classes and µ̄ is the service rate for class A used in the previous time interval (this can be given a dummy value for the first interval). In the PSRA case, the extra state variables z and w are the numbers of class A and D customers respectively whose arrivals are 'imminent', in the sense that they have not yet arrived despite their earliest possible arrival times having passed (in other words, their exponential 'clocks' have already started running). Note that the memoryless property of the ξ_i variables implies that we do not need to know specifically which of the pre-scheduled arrivals have not yet occurred; we only require the number of imminent arrivals for each customer type. The inclusion of µ̄ as a state variable implies that the state space is uncountable, unless we are considering the special case β = 0, in which case µ̄ may be omitted.
• The action space at interval s is one-dimensional and consists of the continuous range of service rates µ_s which satisfy the feasibility constraints described in Section 2.1 (based on the relevant OTE). Specifically, given a set of constraints of the form γ_js µ_s + ν_s ≤ θ_js in interval s, we require 0 ≤ µ_s ≤ min_j {θ_js/γ_js}. As explained in Section 2.1, we can then assume that the complementary service rate ν_s is given by the dominant constraint, i.e. ν_s = min_j (θ_js − γ_js µ_s).
• Transition probabilities for our MDP are governed by the underlying nonstationary queueing dynamics. For example, in the case of Poisson arrivals, these probabilities are based on the Chapman-Kolmogorov equations for M(t)/E_k(t)/1 queues (see, for example, [22]). We have included a full description of these equations in Appendix A.1.
• The one-stage cost function is related to the objective function (1) and can be defined as

f_s(x, µ_s) = E[ αX_s² + Y_s² | x, µ_s ] + β[kτ(µ_s − µ̄)]².

• The Bellman optimality equation can be written, for x ∈ X, as

V_s(x) = min_{µ_s ∈ Θ_s} { f_s(x, µ_s) + E[ V_{s+1}(X_s) | x, µ_s ] },

where Θ_s is the feasible set of service rates for interval s, X_s is the random state at the end of interval s and we set V_{S+1}(x) := 0 for all states x, since our objective function (1) does not include costs incurred beyond the end of interval S. The functions V_1, V_2, ..., V_S are referred to as value functions. We also use V(x) := V_1(x) to denote the optimal expected total cost starting from state x.
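To make the backward-induction structure of the Bellman optimality equation concrete, the following toy performs backward recursion on a coarse grid, replacing the stochastic transitions with deterministic fluid-type updates and taking β = 0 (so the previous rate can be dropped from the state). It is purely illustrative of the recursion's shape; all function names and discretization choices are ours, and, as discussed later, this brute-force approach does not scale to the real problem.

```python
import numpy as np

def toy_backward_dp(S, k, lam, eta, gammas, thetas, alpha, x_max, n_mu=21):
    """Illustrative backward recursion: states are (rounded) phase-count
    pairs capped at x_max, actions are n_mu discretized values of mu,
    transitions follow deterministic fluid updates. Returns value arrays
    V[s] and a greedy policy; V[S+1] is the zero terminal value."""
    mu_hi = min(th / g for g, th in zip(gammas, thetas))
    mus = np.linspace(0.0, mu_hi, n_mu)
    nus = [min(th - g * m for g, th in zip(gammas, thetas)) for m in mus]
    V = np.zeros((S + 2, x_max + 1, x_max + 1))
    policy = np.zeros((S + 1, x_max + 1, x_max + 1))
    for s in range(S, 0, -1):            # backward over intervals
        for x in range(x_max + 1):
            for y in range(x_max + 1):
                best, best_mu = np.inf, 0.0
                for m, n in zip(mus, nus):
                    # fluid approximation to the end-of-interval phase counts
                    x1 = int(np.clip(round(x + k * (lam - m)), 0, x_max))
                    y1 = int(np.clip(round(y + k * (eta - n)), 0, x_max))
                    q = alpha * x1**2 + y1**2 + V[s + 1][x1, y1]
                    if q < best:
                        best, best_mu = q, m
                V[s][x, y] = best
                policy[s][x, y] = best_mu
    return V, policy
```

Even this crude version requires O(S · x_max² · n_mu) work, which hints at why an exact treatment of the continuous-state, continuous-action MDP with expensive transition probabilities is impractical.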
The inclusion of phase counts in our cost and objective functions makes sense from an optimization perspective, since we should be able to obtain stronger-performing policies by considering the fine-grained state of the system. However, we note that the phase counts are a mathematical construct and in physical applications would not be observable by a decision-maker. Instead, decisions would be taken in response to the numbers of customers in the queues. In this paper, we take the approach of finding an optimal phase-dependent policy (allowing decisions to depend on the exact phase counts), and then aggregating this to obtain a customer-dependent policy which chooses the same decision at any two states (x, y, z, w, µ̄) and (x′, y, z, w, µ̄) such that x, x′ ∈ {(n − 1)k + 1, ..., nk} for some n ∈ N. Various methods of aggregation are possible, but in this paper we adopt a simple averaging method which works as follows: given that n and m customers are observed in the A and D queues respectively and µ̄ is the service rate for the previous time interval, let the service rate chosen for class A be the mean of the service rates specified by the phase-dependent policy at the k² states (x, y, z, w, µ̄) satisfying both (n − 1)k + 1 ≤ x ≤ nk and (m − 1)k + 1 ≤ y ≤ mk.
Our simulation results in Section 6 incorporate this aggregation method.
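A minimal sketch of this averaging step (our own notation; `phase_policy` stands for any phase-dependent rule, with any extra state variables such as z, w and the previous rate held fixed inside it):

```python
def customer_level_rate(phase_policy, n, m, k):
    """Aggregate a phase-dependent policy to a customer-dependent one:
    average the class-A rates prescribed at the k*k phase states
    consistent with observing n class-A and m class-D customers,
    i.e. x in {(n-1)k+1, ..., nk} and y in {(m-1)k+1, ..., mk}."""
    rates = [phase_policy(x, y)
             for x in range((n - 1) * k + 1, n * k + 1)
             for y in range((m - 1) * k + 1, m * k + 1)]
    return sum(rates) / len(rates)
```

By construction, the aggregated policy returns the same rate for every phase configuration consistent with the observed customer counts, which is all a real decision-maker could observe.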
Unfortunately, the MDP formulated above cannot be solved using conventional dynamic programming (DP) methods such as value iteration or policy iteration. The reasons for this include the continuous state and action spaces, the large number of possible random transitions between consecutive time intervals and the fact that the transition probabilities themselves can only be obtained accurately via repeated recourse to slow numerical methods. Indeed, any computationally feasible DP algorithm would require an excessive amount of time to obtain a mere approximation to an optimal solution. The rest of this paper focuses on the development of heuristic policies which should be effective under busy conditions.
3 The "Servers Always Busy" (SAB) approximation

As discussed in the introduction, we aim to make progress by restricting attention to an interval of time [0, T] during which there is a high probability of both queues remaining non-empty. Initially, we will consider the first of the two cases for arrival times described in Section 2, in which customers arrive via nonhomogeneous Poisson processes. For class A, we use λ(t), µ(t) and X(t) to denote (respectively) the arrival rate, the service rate and the number of phases remaining to be completed (i.e. the phase count) at time t, where t ∈ [0, T] is a continuous time variable. For class D, the corresponding functions are denoted by η(t), ν(t) and Y(t). Let us also define two random variables X̂(t) and Ŷ(t) by

X̂(t) = x_0 + kA(t) − B(t),    Ŷ(t) = y_0 + kD(t) − C(t),

where x_0 = X(0) and y_0 = Y(0) are the initial phase counts and A(t), B(t), D(t) and C(t) are mutually independent Poisson random variables:

A(t) ∼ Po(∫_0^t λ(u) du),  B(t) ∼ Po(k ∫_0^t µ(u) du),  D(t) ∼ Po(∫_0^t η(u) du),  C(t) ∼ Po(k ∫_0^t ν(u) du).   (6)

Note that B(t) and C(t) are phase-based measures (service phase completion counts), but A(t) and D(t) are customer-based measures (customer arrival counts). If the servers are always busy during [0, T] then the customer arrival and service phase completion processes for the individual queues are indeed independent Poisson processes and we have X̂(t) ≡ X(t) and Ŷ(t) ≡ Y(t), i.e. X̂(t) and Ŷ(t) mirror the physical state of the system. If the servers are almost always busy during [0, T] then we can expect X̂(t) and Ŷ(t) to provide good approximations for X(t) and Y(t). In such cases, we can approximate the probability distributions of X(t) and Y(t) at t ∈ [0, T] by exploiting their relations to the random variables in (6).
If t is sufficiently large, we can make this process more practical computationally by claiming that, by the central limit theorem, X̂(t) and Ŷ(t) are approximately normally distributed with means and variances given by

E[X̂(t)] = x_0 + k ∫_0^t λ(u) du − k ∫_0^t µ(u) du,    Var[X̂(t)] = k² ∫_0^t λ(u) du + k ∫_0^t µ(u) du,

and similarly for Ŷ(t), with η and ν in place of λ and µ.

On the other hand, suppose we are considering the second case (PSRA) described in Section 2, whereby customer arrival times are randomly displaced from their pre-scheduled times. For simplicity we will describe the situation only for the first queue (with state X(t)); the situation for the Y(t) queue is obviously analogous. Let ω_i denote the scheduled arrival time of the i-th customer, so that ω_i − ξ̄_A is their earliest possible time of arrival. We consider a random variable

X̂(t) = x_0 + k Σ_{i=1}^m H_i(t) − B(t),   (8)

where x_0 = X(0) and B(t) ∼ Po(k ∫_0^t µ(u) du) (as before), but this time with m = max{i : ω_i − ξ̄_A < t}. Here, H_i(t) is a Bernoulli random variable equal to 1 if the i-th customer arrives before time t and 0 otherwise. Hence,

P(H_i(t) = 1) = 1 − exp(−(t − ω_i + ξ̄_A)/ξ̄_A),   i ∈ {1, 2, ..., m}.

If approximations are required for state probabilities, then (as in the Poisson case) we can make use of the central limit theorem for sufficiently large t and treat X̂(t) as normally distributed, with the mean E[X̂(t)] and variance Var[X̂(t)] derived from (8) (this holds since the variables H_i(t) are independent of each other, albeit not identically distributed). However, the ADP methods that we develop later in this paper do not actually require approximations for state probabilities.
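In the Poisson case with piecewise-constant rates, the integrals above reduce to sums and the normal approximation is immediate to evaluate. The following sketch (our own notation, assuming equal-length intervals of duration τ) computes the approximate mean, variance and a tail probability for X̂(t):

```python
import math

def sab_mean_var(x0, k, lam_rates, mu_rates, tau):
    """Mean and variance of X_hat(t) at t = S*tau under the SAB
    approximation, for piecewise-constant demand rates lam_rates[s]
    and service rates mu_rates[s] over S intervals of length tau."""
    int_lam = tau * sum(lam_rates)   # integral of lambda(u) over [0, t]
    int_mu = tau * sum(mu_rates)     # integral of mu(u) over [0, t]
    mean = x0 + k * int_lam - k * int_mu
    var = k**2 * int_lam + k * int_mu
    return mean, var

def tail_prob(r, mean, var):
    """Normal (CLT) approximation to P(X_hat(t) > r)."""
    return 0.5 * math.erfc((r - mean) / math.sqrt(2.0 * var))
```

Such tail probabilities are exactly the quantities one would need to control when targeting measures like the proportion of "on-time" service completions.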
Taking into account the importance of being able to apply our methods to real-world scenarios, we tested the quality of the SAB approximation using data obtained from John F. Kennedy International Airport in New York. Our experiment included time-dependent Poisson demand rates for arrivals and departures which were constructed using a particular flight schedule, and two different sets of capacity constraints (OTEs) based on alternative weather scenarios. Based on these capacity constraints, a naive 'greedy' heuristic was used to select the time-varying service rates and the quality of the SAB approximation was then tested by comparing its estimates for the objective function (1) with 'exact' values derived using Chapman-Kolmogorov equations. A full description of our experimental procedure and results can be found in Appendix A.2; we also summarize the results here:
• We found that, throughout a period of time beginning at roughly 4:30PM and ending at roughly 10:00PM on the day of operations that we considered, the arrivals queue and the departures queue both had idleness probabilities smaller than 0.05 in the 'good weather' case. In the 'poor weather' case, these probabilities were consistently smaller than 0.03.
• The error incurred by using the SAB approximation to estimate the objective function (1) over a period of 22 consecutive 15-minute time intervals, beginning at 4:30PM and ending at 10:00PM, was only 0.26% in the good weather case, and 0.02% in the poor weather case.
• In the poor weather case, it was possible to consider a longer time period beginning at 3:15PM and ending at 11:15PM without incurring a SAB approximation error greater than 0.5%.
Based on these findings, we propose to develop heuristic optimization strategies which offer the potential to improve the quality of decision-making at JFK Airport during a schedule-dependent 'busy' period of the day (e.g. 4:30PM to 10:00PM) by making use of the SAB approximation.

4 A surrogate optimization problem
As discussed earlier, the MDP formulated in Section 2.3 cannot be solved using conventional dynamic programming (DP) methods. We therefore aim to develop heuristic policies which are based on the simplified queueing dynamics discussed in Section 3.
In this section we introduce a surrogate optimization problem which closely resembles the problem formulated in Section 2, especially in cases where customer demand is consistently high in relation to service capacity. As discussed in the introduction, we intend to utilize this surrogate problem by (i) characterizing myopic policies and investigating the conditions under which these perform well; (ii) investigating the nature of optimal policies by finding quadratic approximations for the value functions in terms of the state variables; (iii) developing heuristic methods for the original (not the surrogate) optimization problem based on the properties of these myopic policies and quadratic approximating functions.
Please note that throughout this section we will make repeated use of the variables x, y, z, w and μ (as in (2)) to represent a generic observed state at the beginning of an arbitrary time interval s, suppressing the explicit s-dependence of these variables.
The surrogate problem differs from the problem formulated in Section 2 in the following ways:
1. For each s ∈ {1, 2, ..., S}, we remove the constraint μ_s ∈ Θ_s and allow the service rate μ_s (for class A) to take any positive or negative real value, with the corresponding service rate ν_s (for class D) then being given by the dominant linear constraint. Specifically, given a set of J_s linear constraints {γ_js μ_s + ν_s ≤ θ_js} for j = 1, ..., J_s, the feasible region for (μ_s, ν_s) is now {(μ_s, ν_s) ∈ R² : γ_js μ_s + ν_s ≤ θ_js for all j}, and is therefore an infinite region given by the intersection of closed half-planes.
2. Following the selection of (μ_s, ν_s) ∈ R² in time interval s, the random phase counts X_s and Y_s at the end of interval s are given by
X_s = x + A_s − sgn(μ_s) B_s,    Y_s = y + D_s − sgn(ν_s) C_s,    (11)
where sgn(·) denotes the signum function, x and y are the phase counts at the beginning of interval s, and A_s, B_s, C_s and D_s are mutually independent, discrete random variables. In particular, A_s and D_s (which represent the numbers of customer arrivals during interval s) obey the same laws as in the original problem and these depend on which case we are considering for the demand processes (either Poisson or PSRA). The variables B_s and C_s represent numbers of service phase completions, with distributions governed by the absolute service rates |μ_s| and |ν_s|. Note that, as a result of (11), a negative service rate causes the phase count to increase by 1 rather than decreasing by 1 at the end of a service phase completion.
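To make these dynamics concrete, the following is a minimal one-step simulation sketch. The Poisson choices for the arrival and phase-completion counts (and the τ = 1 scaling) are our own illustrative assumptions, not the paper's exact definitions of A_s, B_s, C_s and D_s:

```python
import numpy as np

def surrogate_step(x, y, mu, nu, lam, eta, rng):
    """One step of the surrogate dynamics (illustrative sketch).

    Assumptions: arrivals A_s ~ Poisson(lam), D_s ~ Poisson(eta); phase
    completions B_s ~ Poisson(|mu|), C_s ~ Poisson(|nu|), with tau = 1.
    Transition: X_s = x + A_s - sgn(mu) * B_s (similarly for Y_s), so a
    negative service rate makes completions *increase* the phase count.
    """
    A, D = rng.poisson(lam), rng.poisson(eta)
    B, C = rng.poisson(abs(mu)), rng.poisson(abs(nu))
    x_next = x + A - int(np.sign(mu)) * B
    y_next = y + D - int(np.sign(nu)) * C
    return x_next, y_next  # may be negative: non-negativity is relaxed
```

Note that, in keeping with the abstraction described below, the returned phase counts are not truncated at zero.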
We will denote the one-step cost function in the surrogate problem by g_s(x, μ_s) (equation (12)), so that it has essentially the same definition as f_s(x, μ_s) in the original problem except that the expectation is taken with respect to the dynamics of the surrogate problem. The objective in the surrogate problem is to find a dynamic policy π for choosing μ_1, μ_2, ..., μ_S ∈ R which minimizes the resulting expected total cost, denoted by (13). Throughout Sections 4 and 5 we will assume that the time interval duration τ is equal to one. This is without loss of generality, since the demand rates and service rates can always be re-scaled appropriately.
We should also note that the objective function (13) implies that, as in the original problem, the service rates (µ s , ν s ) should always be chosen on the upper boundary of the feasible region. Therefore the notation in (12) makes sense, since the complementary service rate ν s is given as a deterministic function of µ s via the dominant linear constraint.
By assuming that the random variables in (11) are independent, we implicitly relax the constraints on the phase counts X_s and Y_s by allowing them to take negative values. Effectively, we are considering a mathematical abstraction of our original problem in which servers remain busy at all times and can cause phase counts to either increase or decrease (depending on the signs of the service rates). Naturally, this abstraction makes it easier to perform analyses since we are approximating the queueing dynamics of the original system using independent random variables. At the same time, our analyses should provide useful insights into the nature of optimal policies during busy periods in which phase counts are likely to remain positive. In addition, given that (μ_s, ν_s) must be located on the upper boundary of the feasible region, a negative value for either μ_s or ν_s implies that |μ_s − ν_s| is relatively large, i.e. it implies imbalanced service rates. We can argue that imbalanced service rates are unlikely to be chosen by the decision-maker under certain plausible cases for the parameters of our problem, as described below:
(i) The nature of the objective function (1) implies that, under an optimal policy, the decision-maker should try to maintain an appropriate balance between the phase counts X_s and Y_s. Hence, imbalanced service rates will usually not be advantageous unless there is a significant imbalance between demand rates and/or service rate constraints for classes A and D.
(ii) The incorporation of non-zero switching costs (i.e. setting β > 0) will tend to deter the decision-maker from making drastic changes to the service rates from one time interval to the next, thereby reducing the likelihood of a negative service rate being chosen.
(iii) If the variances of A s and D s are relatively small (which may happen if we are considering prescheduled arrival times, for example) then the variances of X s and Y s will also be small, further reducing the likelihood of the decision-maker needing to select imbalanced service rates.
To summarize the above, each of the following characteristics of our original problem should tend to increase the likelihood of optimal policies and optimal costs being similar for the original and surrogate problems: (i) consistently high demand-to-capacity ratios over all time intervals; (ii) similarity between classes A and D with respect to time-varying demand rates and service rate constraints; (iii) non-zero switching costs; (iv) small variances for customer arrival times. Of course, these conditions may not necessarily be satisfied in real-world applications, but if they are satisfied then this should imply a stronger performance for the heuristics that we derive in Section 5. Numerical experiments can be used to illustrate the effects of varying demand rates, service rate constraints etc. (see Section 6).
In searching for an optimal policy which minimizes (13), a useful starting point is to find a myopic policy which operates by minimizing the expected single-step cost g s (x, µ s ) at each time interval s. In Section 4.1 we provide characterizations for myopic policies, and in Section 4.2 we examine the nature of optimal policies for our surrogate problem.

Myopic policies
Our assumptions regarding the set of linear constraints {γ_js μ_s + ν_s ≤ θ_js}, j = 1, ..., J_s, are the same as in the original version of the problem. Specifically, we assume that γ_1s < γ_2s < ... < γ_{J_s,s} and that the intersection point between any pair of consecutive constraints γ_js μ_s + ν_s = θ_js and γ_{j+1,s} μ_s + ν_s = θ_{j+1,s}, which we will denote by σ_js, lies within [0, θ_{J_s,s}/γ_{J_s,s}] (otherwise there would be at least one redundant constraint in the original problem). Using this notation, the complementary service rate is given by ν_s = min_{1≤j≤J_s} (θ_js − γ_js μ_s). We are seeking a myopic decision which minimizes the function g_s(x, μ_s) defined in (12).
For ease of notation, let the intervals (−∞, σ_1s), (σ_1s, σ_2s), ..., (σ_{J_s−2,s}, σ_{J_s−1,s}), (σ_{J_s−1,s}, ∞) be denoted by Ω_1s, Ω_2s, ..., Ω_{J_s,s} respectively. On each of the intervals Ω_js, we can check to see if g_s(x, μ_s) attains a local minimum. First, evaluating the means and variances of X_s and Y_s under the surrogate dynamics, we have
E[X_s | x, μ_s] = x + E[A_s] − μ_s,    Var[X_s | x, μ_s] = Var[A_s] + |μ_s|,    (16)
with analogous expressions for Y_s in terms of ν_s. Hence, the second moment of X_s follows as E[X_s² | x, μ_s] = (E[X_s | x, μ_s])² + Var[X_s | x, μ_s] (equations (17)-(18)). It follows from (17) and (18) that g_s(x, μ_s) is a continuous, piecewise quadratic function of μ_s. The boundaries between the quadratic pieces include the vertices σ_js (for j = 1, 2, ..., J_s) and also the end-points of the interval [0, θ_{J_s,s}/γ_{J_s,s}] at which the service rates change sign. On each quadratic piece, we can check to see if a local minimum is attained. The expressions for these local minima depend on the dominant linear constraints in the regions in which they are found, and also on the signs of μ_s and ν_s in these regions.
For example, if a local minimum is attained at some point μ_s ∈ (0, θ_{J_s,s}/γ_{J_s,s}) at which the j-th constraint is dominant, then this local minimum admits a closed-form expression depending on γ_js and θ_js. Other possible cases are discussed in Appendix A.3. To simplify the search for a global minimizer of g_s(x, μ_s) we provide a result, proved in Appendix A.3, which identifies a distinguished vertex among {σ_1s, ..., σ_{J_s,s}}, denoted by σ*_s. With this notation, the myopic service rate is obtained by comparing the value of g_s(x, μ_s) at σ*_s with its values at the local minima described above.
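The search just described can be sketched generically: since g_s(x, ·) is an exact quadratic on each piece, three function samples per piece identify its vertex, and the global minimizer is found by comparing interior vertices with the breakpoints. The cost function and the breakpoints (the vertices σ_js together with the sign-change points) are supplied by the caller; this is a sketch, not the paper's closed-form procedure:

```python
def piecewise_quadratic_argmin(g, breakpoints, lo, hi):
    """Global minimizer of a continuous piecewise quadratic g on [lo, hi].

    `breakpoints` separate the quadratic pieces.  On each piece, g is an
    exact quadratic, so three equally spaced samples recover its
    coefficients; interior vertices of convex pieces are candidates,
    alongside the breakpoints and interval end-points.
    """
    pts = sorted(set([lo, hi] + [b for b in breakpoints if lo < b < hi]))
    candidates = list(pts)
    for a, b in zip(pts, pts[1:]):
        t1 = 0.5 * (a + b)
        h = 0.25 * (b - a)
        y0, y1, y2 = g(t1 - h), g(t1), g(t1 + h)
        c2 = (y0 - 2 * y1 + y2) / (2 * h * h)       # leading coefficient
        if c2 > 0:
            vertex = t1 - (y2 - y0) / (2 * h) / (2 * c2)
            if a < vertex < b:
                candidates.append(vertex)
    return min(candidates, key=g)
```

For instance, g(t) = |t| + (t − 2)² has a kink at 0 and a global minimum at t = 1.5, which the routine recovers.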

Optimal policies
Although the surrogate problem that we are considering enables myopic service rates to be calculated quite easily, finding a policy π which minimizes the objective function (13) remains a challenging problem.
Our approach in this section is to look for a simple approximation to U(x) (specifically, a parametric function of the system state x) and use this approximation to develop a heuristic approach. The value functions satisfy the finite-horizon DP recursion
U_s(x) = min_{μ_s ∈ R} { g_s(x, μ_s) + E[ U_{s+1}(X_s) | x, μ_s ] },    (21)
where U_1, ..., U_S are finite-stage value functions and U_{S+1}(x) := 0 for all x ∈ X. We aim to find the minimizing value of μ_s in (21), assuming that the relationship ν_s = θ_s − γ_s μ_s holds for each interval s.
Let us begin by considering U_S(x), the optimal return under state x with only one interval remaining until the end of the horizon. Naturally, a myopic choice of service rate is optimal at this interval. Although we are considering a single linear constraint, the cost function g_S(x, μ_S) still exhibits piecewise quadratic behavior due to the nature of the variances in (16) as the service rates change sign (this is explained further in Appendix A.3). Ideally, we would like to express U_S(x) as a simple quadratic function of the state variables, but the piecewise behavior of g_S(x, μ_S) prevents this. However, we can derive a lower bound for U_S(x) by omitting the modulus signs in (16), which can only reduce the variance terms; we denote the resulting bound by (22). Equality holds in (22) if and only if μ_S and ν_S = θ_S − γ_S μ_S are both non-negative, which (as we have argued earlier) is likely to be the case for many states x ∈ X under plausible modelling assumptions.
The quantities appearing in (23)-(24) are as follows: L_j (resp. M_j) is the number of customers of class A (resp. D) scheduled to arrive at the beginning of interval j, and p_j (resp. q_j) is the probability that a customer of class A (resp. D) scheduled to arrive at the beginning of interval s + j arrives during interval s. (Note: this notation makes sense when j = 0 since, due to the memoryless property, we can treat all customers who have not yet arrived by interval s as if they were scheduled to arrive at interval s.) For simplicity we have assumed in (23)-(24) that all customers scheduled to arrive during interval j have the same scheduled arrival time (this might be at the beginning of interval j, for example), but these expressions can easily be modified if this is not the case.
Minimizing the right-hand side of (22) over μ_S ∈ R yields a local minimum in closed form (equation (25)), and with this value of μ_S, the right-hand side of (22) can be expressed as
xᵀ P_S x + q_Sᵀ x + r_S,    (26)
where P_S is a matrix of order n × n (where n is the number of state variables), q_S ∈ R^n and r_S ∈ R.
Importantly, P_S, q_S and r_S do not have any dependence on x or μ_s. We can infer from (26) that U_S^LB(x) is a quadratic function of the state variables. Note that the number of state variables n can be 2, 3, 4 or 5, depending on the type of arrivals distribution and the value of β. For example, in the case of Poisson arrivals with β = 0, we have x = (x, y) ∈ Z² and closed-form expressions for P_S and q_S can be derived (we have omitted the expression for r_S because it is too long). Evidently, U_S^LB(x) is a lower bound for U_S(x), since the latter is obtained by minimizing g_S(x, μ_S), whereas the former is obtained by minimizing an expression which is bounded above by g_S(x, μ_S). Let us proceed by obtaining lower bounds for U_s(x) for each s ∈ {1, 2, ..., S − 1}. We propose the following lemma.
Lemma 4.2. Suppose that U_{s+1}^LB(x) = xᵀ P_{s+1} x + q_{s+1}ᵀ x + r_{s+1}, where P_{s+1} is a real-valued matrix of order n × n, q_{s+1} ∈ R^n, r_{s+1} ∈ R and n is the number of state variables. Let U_s^LB : X → R be defined by (28), where the piecewise linear function ψ_s(μ_s) is defined by (29) and Ã_{s+1} and B̃_{s+1} are the coefficients of x² and y² respectively in the expression for U_{s+1}^LB(x). Then there exists a real-valued n × n matrix P_s, a vector q_s ∈ R^n and a constant r_s ∈ R such that (30) holds. The proof of the lemma can be found in Appendix A.4. We note that ψ_s(μ_s) is a piecewise linear function which 'compensates' for the fact that Var[X_s | x, μ_s] and Var[Y_s | x, μ_s] are not linear functions of μ_s. If either μ_s or ν_s = θ_s − γ_s μ_s is negative then ψ_s(μ_s) takes an artificial non-zero value in order to ensure that a quadratic representation of U_s^LB(x) is still possible. By using the result of Lemma 4.2 iteratively, we can find suitable values P_s, q_s and r_s satisfying (30) for each s ∈ {1, 2, ..., S}. We can then write U_s^LB(x) = xᵀ P_s x + q_sᵀ x + r_s ≤ U_s(x) for any state x ∈ X and s ∈ {1, 2, ..., S}. Generally it is not worthwhile to write expressions for P_s, q_s and r_s in terms of the system parameters, since these expressions are too convoluted. However, an interesting special case occurs when β = 0 and the OTE is the same for each time interval, so that γ_s = γ and θ_s = θ for each s ∈ {1, 2, ..., S}. This is the subject of the next subsection.

A single, fixed linear constraint with β = 0
The following lemma states what may be proved in this special case.
Lemma 4.3. (a) In the case of Poisson arrivals, U_s^LB(x) is a quadratic function of the state variables involving constants Ã_s, B̃_s, C̃_s and D̃_s which depend on the system parameters. In particular, the values of Ã_s, B̃_s and C̃_s are given by (32). (b) In the case of pre-scheduled arrivals, we have an analogous representation involving constants Ã_s, C̃_s, F̃_s, G̃_s, H̃_s, J̃_s, K̃_s, L̃_s, M̃_s, P̃_s and Z̃_s which depend on the system parameters. In particular, Ã_s and C̃_s have the same values as in (32).
Furthermore, the service rate μ_s^OPT(x) which attains the minimum in (28) is
μ_s^OPT(x) = μ_s^MYP(x) + δ,    (34)
where the fixed adjustment is
δ = (α² − γ³) / (2k(α + γ²)²),    (35)
and μ_s^MYP(x) is the myopic service rate. The proof of Lemma 4.3 can be found in Appendix A.6. If we consider a problem in which the cost function g_s(x_s, μ_s) is replaced by g_s(x_s, μ_s) + ψ_s(μ_s), then (35) provides us with a simple condition for the optimality of a myopic policy. We state this below as a corollary.
Corollary 4.4. Consider a decision problem in which we aim to minimize where π is a policy for choosing the service rates µ 1 , µ 2 , ..., µ S ∈ R. We assume a single constraint γµ s +ν s ≤ θ applies for all s ∈ {1, 2, ..., S} and β = 0. Then a myopic policy is optimal if and only if α 2 = γ 3 .
Proof. The value functions for the problem described in the corollary are U LB s (x), defined recursively via (28). By Lemma 4.3, we see that an optimal policy (with respect to U LB s (x)) differs from a myopic policy by adding a constant term (α 2 − γ 3 )/(2k(α + γ 2 ) 2 ) to the myopic service rate, and hence the myopic and optimal policies coincide if and only if α 2 = γ 3 .
It is important to note that the relationship α 2 = γ 3 holds if α = γ = 1, in which case the two classes of customer are equivalent with respect to both congestion charges and resource consumption rates. This might apply to many types of application in which there is no obvious reason to treat one class of customer differently from another. The fixed service rate adjustment δ increases with α but decreases with γ, which is logical given the physical interpretations of these parameters.
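Using the closed form for the adjustment quoted in the proof of Corollary 4.4, the optimality condition can be checked directly; this is a sketch, with k, α and γ denoting the paper's cost and constraint parameters:

```python
def myopic_adjustment(alpha, gamma, k):
    """Fixed adjustment delta = (alpha^2 - gamma^3) / (2k(alpha + gamma^2)^2)
    added to the myopic service rate.  delta = 0 (so the myopic policy
    is optimal) exactly when alpha^2 = gamma^3."""
    return (alpha ** 2 - gamma ** 3) / (2 * k * (alpha + gamma ** 2) ** 2)
```

In particular α = γ = 1 gives δ = 0, matching the observation above; so does any pair with α² = γ³, e.g. (α, γ) = (8, 4).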
Although Corollary 4.4 is related to U^LB_s(x) rather than our original value function U(x), the differences U(x) − U^LB(x) should be relatively small if (i) the optimal policy for the surrogate problem rarely chooses negative service rates, and (ii) phase counts for the surrogate problem rarely become negative. At the beginning of Section 4 we suggested conditions under which these effects should be observed. In Appendix A.5 we have provided a numerical example involving customers of classes A and D, from which we make two observations:
• The difference U^MYP(x) − U^LB(x) tends to increase as demand rates become more nonstationary, and also increases as the demand-to-capacity ratio increases.
• However, even in cases where total demand consistently exceeds capacity by as much as 100%, U^MYP(x) − U^LB(x) is generally smaller than 0.25%.
We also repeated our experiments with different values of α and γ, with similar results (please refer to Appendix A.5 for full details). The second of the bulleted points above is especially relevant in the context of this paper, since in problems where demand greatly exceeds capacity, we can expect the dynamics of our surrogate problem to closely resemble those of the original problem. We therefore have some evidence to suggest that, in the case of a single, fixed linear constraint with β = 0, myopic policies are very close to optimal in the surrogate problem, and this should also apply to the original problem if demand rates are sufficiently high. Our numerical results in Section 6 will demonstrate the effects of time-varying service rate constraints and positive β values on the performances of myopic policies.
Next, we consider a deterministic 'fluid flow' version of the problem, in which the phase counts evolve within interval s according to
dX(t)/dt = λ(t) − μ_s I(X(t) > 0),    dY(t)/dt = η(t) − ν_s I(Y(t) > 0),
where (x, y) is the state at the beginning of interval s, I(·) denotes the indicator function, λ(t) and η(t) are continuous flow rates and μ_s and ν_s are the chosen service rates for interval s, which are both non-negative and satisfy γμ_s + ν_s = θ. The phase counts at the end of interval s are X_s = X(t_s) and Y_s = Y(t_s). The value function for this problem will be denoted by V^FF(x), defined as the minimum total cost over policies π, where a policy in this deterministic setting is simply a vector of service rates.
In the 'surrogate' version of our fluid flow problem, we do not enforce non-negativity for either the service rates or the phase counts, and we replace the definitions of X(t) and Y(t) above by dX(t)/dt = λ(t) − μ_s and dY(t)/dt = η(t) − ν_s (i.e. the indicator functions are removed), with U^FF(x) used instead of V^FF(x) to represent the corresponding value function.
Proposition 4.5. The value functions for the surrogate fluid flow problem (with U^FF_{S+1}(x) := 0 for all x ∈ X) are given by
U^FF_s(x) = Ã_s(γx + y)² + B̃_s(γx + y) + D̃_s    ∀x ∈ X, s ∈ {1, 2, ..., S},
where Ã_s and B̃_s have the same definitions as in (32) and D̃_s is constant. For each interval s ∈ {1, 2, ..., S}, the myopic and optimal service rates for the surrogate problem are given by (39). Suppose, in addition, that conditions (40) and (41) hold for each s ∈ {2, ..., S}, and let n_0 = min{x_0, y_0}. Then the value functions of the two fluid flow problems coincide, and the service rates in (39) are optimal in both versions of the fluid flow problem.
Proposition 4.5, which is proved in Appendix A.7, provides us with conditions involving the demand rates and capacity envelope parameters which ensure that the simple myopic policy represented by (39) is also an optimal policy for both the surrogate problem and the original problem. Although the result concerns a modified version of the problem with deterministic fluid flow dynamics, the conditions (40) and (41) (or similar conditions adapted to the case of pre-scheduled arrivals) might also imply a strong performance for the aforementioned policy in the original stochastic problem, especially if the variances for end-of-interval phase counts X s and Y s are small.

Multiple linear constraints
We now discuss the case of multiple linear constraints. As before, let J_s be the number of constraints in interval s, so that the j-th constraint (for j ∈ {1, 2, ..., J_s}) has the form γ_js μ_s + ν_s ≤ θ_js. Also, let (n_s, n_{s+1}, ..., n_S) be an (S − s + 1)-tuple, such that n_i ∈ {1, 2, ..., J_i} for i ∈ {s, s + 1, ..., S}. The tuple (n_s, n_{s+1}, ..., n_S) represents a sequence of individual constraints 'chosen' from the constraint sets for intervals s, s + 1, ..., S. Given a particular tuple (n_s, n_{s+1}, ..., n_S) we can obtain a quadratic function U^LB_{(n_s, n_{s+1}, ..., n_S)}(x) such that U^LB_{(n_s, n_{s+1}, ..., n_S)}(x) ≤ U_s(x) for all x ∈ X by assuming that the relationship ν_i = θ_{n_i,i} − γ_{n_i,i} μ_i holds for all μ_i ∈ R in each interval i ∈ {s, s + 1, ..., S} and then constructing the quadratic function U^LB_{(n_s, n_{s+1}, ..., n_S)}(x) in exactly the same way as described in Section 4.2.2, replacing single-constraint parameters such as γ_s and θ_s by γ_js and θ_js as appropriate. Conceptually, for each of the intervals i ∈ {s, s + 1, ..., S}, we are drawing a straight line over the top of the OTE for interval i and deriving a lower bound by supposing that the decision-maker is allowed to choose any point on this line. However, the lower bound U^LB_{(n_s, n_{s+1}, ..., n_S)}(x) obtained in this way is unlikely to be very tight if there are many constraints, since the local minima found on the unbounded straight lines will rarely satisfy the constraints for the actual surrogate problem. A tighter lower bound is given by the maximum over all tuples, max_{(n_s, n_{s+1}, ..., n_S)} U^LB_{(n_s, n_{s+1}, ..., n_S)}(x), but this is a piecewise quadratic function of x which cannot be computed easily due to the large number of possible tuples (n_s, n_{s+1}, ..., n_S) that one must consider as S − s increases.
A computationally feasible algorithm would need to be based on a simple representation of U LB s (x), similar to the quadratic functions that we were able to obtain in the case of single linear constraints. In Section 5 we introduce a number of heuristic policies, two of which are based on the idea of approximating the value functions V s (x) using quadratic functions of the state variables.

Design of heuristics
In this section we introduce 4 different heuristics. All of these are dynamic decision-making policies which choose a service rate µ s for class A in response to an observed system state x ∈ X at time interval s (with the service rate for class D, ν s , being given by the OTE for interval s). We present the 4 heuristics in increasing order of the amount of computational effort required.
All of the heuristics that we develop in this section are based on optimal (or near-optimal) solutions to the surrogate problem described in Section 4, in which negative service rates and phase counts are permitted. However, in Section 6 we test the heuristics by applying them to the original problem formulated in Section 2 (which does not permit negative service rates or phase counts).

Myopic policy (MYP)
Under this policy, we choose the service rate μ_s^MYP(x) characterized in Section 4.1 at each interval s, truncating if necessary so that μ_s, ν_s ∈ [0, θ_{J_s,s}/γ_{J_s,s}].

Approximate Policy Improvement (API)
This heuristic offers a computationally inexpensive method of improving upon a myopic policy. The principle is that, at time interval s ∈ {1, 2, ..., S − 1} and state x ∈ X, we choose the service rate μ_s which minimizes the estimated two-step return
g_s(x, μ_s) + g_{s+1}( E[X_s | x, μ_s], μ^MYP_{s+1}(E[X_s | x, μ_s]) ),
where E[X_s | x, μ_s] is the expected state at the end of interval s given that we choose service rate μ_s. Effectively, we are relying upon a forecast of the next state in order to predict the service rate μ^MYP_{s+1}(E[X_s | x, μ_s]) chosen by the myopic policy at the beginning of interval s + 1, rather than taking into account the full distribution of μ^MYP_{s+1}(X_s) as a function of the random phase counts X_s and Y_s (which would require a prohibitive amount of computational effort). This is why we describe the heuristic as approximate policy improvement. For the final time interval S, a myopic service rate can be chosen. We also use the same truncation method as in the myopic policy, so that μ_s, ν_s ∈ [0, θ_{J_s,s}/γ_{J_s,s}].
In principle, we could extend this idea and obtain an approximate m-step improvement policy by calculating the best service rate for interval s under the assumption that myopic service rates are chosen for intervals s + 1, s + 2, ..., s + m. This method would also rely upon predictions of the myopic service rates for future time intervals based on expected values of the phase counts. However, our experiments have shown that any benefit obtained by implementing m-step improvement as opposed to one-step improvement is usually very slight, and in some cases it actually performs worse, perhaps because the forecasts for future phase counts become less reliable as the 'look-ahead distance' increases.
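The API rule can be sketched as follows, with the one-step cost, the state forecast E[X_s | x, μ_s] and the myopic rule supplied as callables (hypothetical stand-ins for the paper's closed-form expressions), and the feasible interval discretized into a finite grid of candidate rates:

```python
def api_rate(x, s, g, expected_next, myopic_rate, candidate_rates):
    """Approximate Policy Improvement (sketch).

    Chooses mu_s minimizing the estimated two-step return
        g(s, x, mu) + g(s+1, E[X_s | x, mu], mu_myp),
    where the interval-(s+1) rate is the myopic rate evaluated at the
    *forecast* state rather than the random one.
    """
    def two_step(mu):
        x_next = expected_next(x, mu)          # forecast of E[X_s | x, mu]
        mu_next = myopic_rate(s + 1, x_next)   # predicted myopic choice
        return g(s, x, mu) + g(s + 1, x_next, mu_next)
    return min(candidate_rates, key=two_step)
```

The grid of candidate rates is an implementation convenience here; a closed-form minimization over each quadratic piece would also work.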

Coefficient Averaging (CAV)
This heuristic is motivated by the discussion in Section 4.2.3 and is intended as a fairly naive but inexpensive method of acquiring quadratic approximations for the value functions in a problem with multiple linear constraints. It uses backwards induction, starting from the final time interval S. For each of the linear constraints γ_jS μ_S + ν_S ≤ θ_jS effective at interval S, we construct the function
U^LB_jS(x) = min_{μ_S ∈ R} { g_jS(x, μ_S) + ψ_jS(μ_S) },
where g_jS(x, μ_S) is similar to the usual single-step expected cost g_S(x, μ_S) except that it assumes the relationship ν_S = θ_jS − γ_jS μ_S holds for all μ_S ∈ R, and ψ_jS(μ_S) is similar to the function ψ_S(μ_S) defined in (29) except that γ_S and θ_S are replaced by γ_jS and θ_jS respectively. The analyses in Section 4.2.1 imply that, for each j ∈ {1, 2, ..., J_S}, U^LB_jS(x) is a quadratic function of x. We then create a new function U^CAV_S(x) by averaging the coefficients of the quadratic functions U^LB_jS(x) over the constraints j ∈ {1, 2, ..., J_S}. As demonstrated by the analyses in Section 4.2.1, the functions U^LB_{j,S−1}(x) are linear combinations of the same variables as the functions U^LB_jS(x). We can therefore obtain a function U^CAV_{S−1}(x) using the same coefficient-averaging procedure as that used for U^CAV_S(x), and we repeat these steps backwards through time to obtain functions U^CAV_s(x) for the remaining intervals.
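The averaging step itself is elementary once each per-constraint lower bound has been reduced to a coefficient vector in a common monomial basis; how those vectors are computed in closed form is left abstract in this sketch:

```python
import numpy as np

def coefficient_average(coeff_vectors):
    """CAV step (sketch): average, coordinate-wise, the coefficient
    vectors of the per-constraint quadratic lower bounds (one row per
    constraint j), expressed in a common basis such as
    [x^2, y^2, x*y, x, y, 1]."""
    return np.mean(np.asarray(coeff_vectors, dtype=float), axis=0)

def eval_quadratic(c, x, y):
    """Evaluate a quadratic with coefficients in the basis above."""
    return c[0]*x*x + c[1]*y*y + c[2]*x*y + c[3]*x + c[4]*y + c[5]
```

Averaging preserves the quadratic form, which is what allows the backward induction to proceed with a single coefficient vector per interval.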

Least-squares Fitting (LSF)
This heuristic, like the previous one, is based on backwards induction and the use of a parametric function of the system state to approximate the value functions V_s(x). In this case we assume, a priori, that V_s(x) can be well-approximated by a quadratic function of the state variables and use least-squares fitting (after 'sampling' the values at a selection of states) to find suitable values of the coefficients that appear in this quadratic function. The steps are as follows:
1. Set s = S and V̂_{S+1}(x) = 0 for all states x ∈ X.
2. Select a certain number of system states in X either randomly or systematically (choosing a large number of states should yield a stronger performance at the expense of greater computation time).
Let M s ⊂ X denote the set of states chosen.
3. For each x ∈ M_s, calculate the value
v_s(x) = min_{μ_s ∈ Θ_s} { g_s(x, μ_s) + E[ V̂_{s+1}(X_s) | x, μ_s ] }.    (43)
4. Use least-squares regression to minimize
Σ_{x ∈ M_s} ( v_s(x) − Q_s(x) )²,    (44)
where Q_s(x) is quadratic in the state variables and the minimization is carried out with respect to the coefficients in Q_s(x). Let V̂_s(x) denote the function Q_s(x) which minimizes (44).
5. If s > 1, decrease s by 1 and return to step 2.
The LSF heuristic chooses the service rate μ^LSF_s(x) ∈ Θ_s which attains the minimum in (43).
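The steps above can be sketched for a two-dimensional state (Poisson arrivals, β = 0), with the one-step lookahead minimization supplied as an assumed callable:

```python
import numpy as np

def fit_quadratic(states, values):
    """Least-squares fit (step 4, sketched) of
    Q(x, y) = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to sampled values."""
    X = np.array([[x*x, y*y, x*y, x, y, 1.0] for x, y in states])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return coeffs

def lsf_backward_pass(S, sample_states, lookahead_value):
    """Backward induction over intervals S, S-1, ..., 1: sample states,
    evaluate the one-step lookahead value using the fit for interval s+1,
    and refit.  `lookahead_value(s, state, next_fit)` is an assumed
    callable implementing the minimization over mu."""
    fits = {S + 1: np.zeros(6)}  # \hat{V}_{S+1} = 0
    for s in range(S, 0, -1):
        states = sample_states(s)
        values = [lookahead_value(s, st, fits[s + 1]) for st in states]
        fits[s] = fit_quadratic(states, values)
    return fits
```

When the sampled values are exactly quadratic in the state, the fit recovers the coefficients exactly; otherwise it returns the least-squares projection onto the quadratic family.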

Numerical experiments
In this section we evaluate the performances of the heuristic policies proposed in Section 5 using results gathered from numerical experiments with randomly-generated system parameters. Noting that we cannot feasibly compute optimal solutions to the optimization problem formulated in Section 2, we instead take the approach of comparing the performances of our SAB-based heuristics with that of a more naive 'baseline' decision-making policy which might be employed by a decision-maker in practice. In Section 6.1 we describe our baseline heuristic. The majority of our numerical results are presented in Section 6.2, and in Section 6.3 we investigate the effects of our heuristics on customer waiting times.

A naive baseline heuristic
The naive heuristic that we introduce here, which we will refer to as a "myopic demand ratio" (MDR) heuristic, is based on a measure of the demand placed on the server by customers of class A as a proportion of the total demand imposed by both customer classes over a single time interval. The heuristic works as follows: at the beginning of interval s ∈ {1, 2, ..., S}, the decision-maker observes the phase counts x and y for classes A and D respectively and also calculates the expected numbers of customers arriving during interval s, E[A_s | x] and E[D_s | x]. Taking into account that customers of class A are more expensive to hold in the queue by a factor α, the decision-maker calculates the 'effective demand ratio' defined in (45). They then select the unique point (μ_s, ν_s) on the OTE for interval s which results in the ratio μ_s/ν_s being equal to the effective demand ratio in (45).
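Selecting the point on the OTE with a prescribed ratio μ_s/ν_s can be done by bisection, using the fact that ν decreases (and hence μ/ν increases) as we move right along the envelope. This is a sketch; the definition of the effective demand ratio itself, (45), is not reproduced here:

```python
def mdr_point(ratio, gammas, thetas, tol=1e-10):
    """Find the unique (mu, nu) on the OTE upper boundary with
    mu / nu = ratio, where the boundary is nu = min_j(theta_j - gamma_j*mu)
    for mu in [0, mu_max].  Bisection on mu, exploiting monotonicity of
    mu/nu along the boundary."""
    def nu_of(mu):
        return min(t - g * mu for g, t in zip(gammas, thetas))
    lo = 0.0
    hi = min(t / g for g, t in zip(gammas, thetas) if g > 0)  # nu = 0 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        nu = nu_of(mid)
        if nu <= 0 or mid > ratio * nu:   # too far right along the envelope
            hi = mid
        else:
            lo = mid
    mu = 0.5 * (lo + hi)
    return mu, nu_of(mu)
```

For a single constraint μ + ν ≤ 10 and a ratio of 1, this returns (5, 5); with multiple constraints, the returned point may lie at a vertex of the envelope.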
The principle of making decisions based on short-term expected queue lengths has previously been adopted by Jacquillat and Odoni [31], who proposed a heuristic for air traffic control based on setting the service rate for arriving aircraft to be as close as possible to the 'effective arrival demand'.
It should be noted that the naive MDR heuristic, unlike the MYP, API, CAV and LSF heuristics developed in Section 5 (which we will refer to from this point on as the 'SAB heuristics'), is not based on any of the analyses in Section 4, and therefore does not rely upon a "servers always busy" (SAB) assumption. This suggests that any superiority of the SAB heuristics over the MDR heuristic may diminish as the SAB assumption becomes less reliable, i.e. as the system becomes less busy. Later in this section we will provide evidence to show that this is indeed the case.

Results from simulation experiments
We have created 28,800 test scenarios and used Monte Carlo simulation to test the performances of the 5 heuristics within each scenario. Details of the methods we have used for generating the parameters can be found in Appendix A.8. It should be noted that our heuristics are designed for use during congested periods, and therefore we have generated the random demand rates, initial phase counts and service rate constraints in such a way that server idleness is unlikely to occur.
For each scenario, we simulated the random evolution of the system under the 4 SAB heuristics and the MDR heuristic, carrying out 500 replications for each policy and ensuring that the same random number stream was used for all 5 policies. In all experiments, the random evolutions of the phase counts were simulated accurately but the decisions made under the various heuristic policies were made without reference to the exact phase counts; instead, these were made in response to the numbers of customers in the two queues and the aggregation method described in Section 2 was used to translate decisions based on phase counts to decisions based on observed numbers of customers.
The results indicate that the LSF heuristic performs best, and is able to improve upon the myopic policy by about 10% on average. The CAV heuristic is the second strongest, but (unlike the LSF) appears to become significantly weaker if pre-scheduled arrivals are considered rather than Poisson arrivals. In the PSRA case, the LSF and CAV heuristics both have to estimate a greater number of coefficients in their quadratic approximating functions, and the LSF heuristic seems able to adapt better to this than the more naive CAV heuristic. The API heuristic consistently performs better than the myopic policy, but is usually not able to match the more sophisticated CAV and LSF heuristics. The naive MDR heuristic is not able to perform as well as any of the SAB heuristics. Overall, as expected, the performances of the 4 SAB heuristics are ranked in an order which corresponds with their computational requirements.
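The common-random-numbers device mentioned above (the same stream for every policy, replication by replication) can be sketched as follows; `simulate` is an assumed callable returning the realized cost of one replication:

```python
import numpy as np

def compare_policies(policies, simulate, n_reps=500, seed=7):
    """Evaluate every policy on identical random number streams,
    replication by replication, which reduces the variance of the
    estimated cost *differences* between policies."""
    costs = {name: [] for name in policies}
    for rep in range(n_reps):
        for name, policy in policies.items():
            rng = np.random.default_rng([seed, rep])  # same stream for each policy
            costs[name].append(simulate(policy, rng))
    return {name: float(np.mean(c)) for name, c in costs.items()}
```

Because each policy sees the same randomness in each replication, pairwise comparisons between policies are much sharper than with independent streams.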
We now present further analyses of our numerical results, focusing only on the relative strengths of the 4 SAB heuristics (the naive MDR heuristic will be discussed again later in this section, when different levels of server 'busyness' are considered). We can gain greater insight into the relative strengths of the SAB heuristics by breaking down the results according to the values of individual system parameters. Figure 2 shows comparisons between the API, CAV and LSF policies for various different values of β (recall that β controls the weight of the 'switching costs' for changing service rates). The policies are compared with respect to the improvements they achieve against the myopic policy.
When β = 0, the API heuristic is relatively strong. This is because, under an optimal policy, service rates may change radically from one interval to the next, and API explicitly takes into account the piecewise linear nature of the OTE and the fact that different regions of the OTE correspond to constraints with different gradients, whereas the CAV and LSF heuristics would be most suitable if the OTE was an infinite straight line. As β becomes larger, each OTE effectively becomes narrower, as we only need to consider a small interval of possible μ_s values as realistic candidates for optimality. We can see that for relatively small, positive values of β, API is much worse than CAV and LSF, but as β becomes larger API seems to improve. This may be because, as β becomes large, the API policy's prediction of the next system state becomes almost unimportant with respect to predicting the next service rate that will be chosen (μ^MYP_{s+1}(X_s)), so the API policy is effectively optimizing a two-step cost without any major uncertainty about the next service rate. Meanwhile, the CAV and LSF heuristics still maintain some superiority over API, but lack the ability to prioritize any particular segment of the OTE when approximating the shapes of the value functions. Increased variability in customer arrival rates also appears to harm the performance of the myopic policy (the parameter(s) which control the demand rate variability are explained in Appendix A.8). We also tried varying the parameters k and α, but did not observe any significant ensuing effects on the relative strengths of the various policies.
We make two additional observations regarding the results in Tables 2-4:
• The values shown in the last row of Table 2 (for δ_OTE = 0.5) are quite large. This suggests that a myopic policy may be very weak if the OTEs vary greatly from interval to interval.
• By making comparisons across the columns of Table 3, we can see that the API policy's relative strength against the CAV and LSF policies seems to diminish as S increases. This may be because the API policy only looks ahead by one time interval, which makes it better suited to problems with short time horizons.

Our experiments were implemented in Python on a 2.5 GHz quad-core processor with 8 GB of RAM, and we observed that none of the four heuristics required any significant amount of computational effort in comparison to dynamic programming algorithms. The MYP and API heuristics use closed-form (albeit somewhat messy) formulas for their decision-making and are able to return decisions for all S time intervals in less than one second. The CAV and LSF heuristics differ from the simpler MYP and API heuristics by requiring the pre-computation of a set of quadratic coefficients for each time interval. However, for the parameter values that we considered, we still found that the total amount of time required by the CAV heuristic to pre-compute quadratic coefficients and then return decisions for all time intervals was less than one second. The LSF heuristic requires slightly more time due to the need to 'sample' a number of states at each time interval as part of its least-squares fitting technique, as explained in Section 5. As described in Appendix A.8, we used a constant sample size |M_s| = 500 in all of our experiments. With this value of |M_s|, we found that the total computation time required by the LSF heuristic was between one and ten seconds, depending mainly on the number of time intervals, the number of linear constraints on the service rates and the nature of the arrivals process (Poisson or PSRA).
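The pre-computation step described above can be sketched as follows. This is a minimal illustration of the least-squares fitting idea only: the synthetic value estimates, the state range, and the function names are assumptions, whereas in the paper the value estimates come from the analytical second-moment expressions of Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_quadratic(states, values):
    """Fit v(x, y) ~ c0 + c1*x + c2*y + c3*x^2 + c4*y^2 + c5*x*y by least squares."""
    x, y = states[:, 0], states[:, 1]
    design = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return coeffs

def eval_quadratic(coeffs, x, y):
    return coeffs @ np.array([1.0, x, y, x**2, y**2, x * y])

# Sample |M_s| = 500 states for one interval, as in the experiments, and fit
# a quadratic to (here, synthetic) value function estimates at those states.
states = rng.uniform(0.0, 50.0, size=(500, 2))
values = 1.3 * states[:, 0]**2 + 0.7 * states[:, 1]**2 + 2.0  # synthetic stand-in
coeffs = fit_quadratic(states, values)
```

Since the fit is a single linear least-squares solve per interval, the cost of the pre-computation grows only linearly in the number of intervals, consistent with the small running times reported above.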
Next, we tested the sensitivity of our results to the 'busyness' of the system by sampling 4800 of the 28,800 previously-tested scenarios and, for each of these 4800 scenarios, repeating our experiments several times with re-scaled demand rates. For these experiments, we restricted attention to scenarios with β = 0, in order to simplify the aggregate cost function to Σ_{s=1}^{S} (αX_s² + Y_s²), without the extra component related to switching costs. Our intention was to demonstrate that, with β = 0, reducing the demand rates causes the difference in performance between the SAB heuristics and the more naive MDR heuristic to become smaller. With β > 0 the picture becomes less clear because, in that case, the switching cost component of the objective function becomes relatively more important as demand rates decrease (due to the expected phase counts becoming smaller), and this in turn may actually cause the SAB heuristics to increase their advantage over the MDR heuristic, since they are able to deal with switching costs in a more sophisticated way.

Table 4: Percentage improvements for the API, CAV and LSF heuristics versus the MYP heuristic for various different ranges of τ, the length of each time interval.

We re-scaled the demand rates as follows:
• For scenarios with Poisson arrivals, the Poisson demand rate functions λ(t) and η(t) were replaced by pλ(t) and pη(t) respectively, where 0 < p < 1.
• For PSRA scenarios, the numbers of customers scheduled to arrive in the various time intervals were all multiplied by p before being rounded to the nearest integer.
We tried each of the values p = 0.75, p = 0.67, p = 0.5 and p = 0.25 and, in each case, re-ran our numerical experiments for the 4800 sampled scenarios.
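The two re-scaling rules above can be sketched directly. The function names and the piecewise-constant representation of the rate functions are assumptions for illustration; the paper's λ(t) and η(t) may be represented differently.

```python
# Hypothetical helpers implementing the two re-scaling rules described above.
def rescale_poisson_rates(rates, p):
    """Scale a piecewise-constant Poisson rate function lambda(t) by p."""
    return [p * r for r in rates]

def rescale_psra_counts(counts, p):
    """Scale scheduled arrival counts by p, rounding to the nearest integer.

    int(x + 0.5) is used rather than round() so that .5 cases round up
    instead of following Python's round-half-to-even behaviour.
    """
    return [int(p * c + 0.5) for c in counts]

poisson_scaled = rescale_poisson_rates([8.0, 12.0, 10.0], 0.5)  # [4.0, 6.0, 5.0]
psra_scaled = rescale_psra_counts([9, 12, 7], 0.5)              # [5, 6, 4]
```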

Analysis of customer waiting times
Our experiments in this section so far have been designed to compare the performances of the heuristic policies with respect to the objective function (1). A practitioner might also wish to know the effects of the heuristics on measures of performance which are more directly of interest, such as average waiting times and maximum waiting times. With this in mind, we carried out another set of experiments involving 4800 randomly-generated scenarios. In these experiments we used the parameter values α = 1 and β = 0 in order to make the objective function (1) as suitable as possible for minimizing the average customer waiting time across all time intervals, for classes A and D combined. All other parameters were generated as described in Appendix A.8.

Comparing the results in Table 6 with those in Table 1, we see that when average waiting times are used as a performance measure (with α = 1 and β = 0) as opposed to the usual objective function (1), the ranking order of the four SAB heuristics remains the same, although the percentage improvements against the myopic policy are somewhat smaller, which is perhaps unsurprising given that our objective function is not directly based on waiting times. Our method of construction implies that the purple myopic curve should be almost linear (the nonlinearity being due to ties, since there are zeros in the data), but the curves for the other three heuristics are slightly more convex in shape and tend to remain below the myopic curve, reflecting the fact that the proportions of customers exceeding the critical waiting time thresholds (as given by the myopic policy) are usually smaller under the API, CAV and LSF policies.
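The curves compared above plot, for each threshold, the proportion of customers whose waiting time exceeds it, with the thresholds themselves taken from the myopic policy's waiting times. A minimal sketch of this construction, with synthetic waiting-time samples (the sample values and quantile levels below are illustrative assumptions, not data from the paper):

```python
import numpy as np

def exceedance_curve(waits, thresholds):
    """Proportion of waiting times strictly exceeding each threshold."""
    waits = np.asarray(waits)
    return np.array([(waits > t).mean() for t in thresholds])

# Synthetic myopic-policy waiting times; note the ties at zero, which are
# what makes the myopic curve only *almost* linear in practice.
myp_waits = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
thresholds = np.quantile(myp_waits, [0.25, 0.5, 0.75])  # thresholds from myopic quantiles
curve = exceedance_curve(myp_waits, thresholds)          # near-linear by construction
```

Because the thresholds are quantiles of the same sample, the myopic curve descends almost linearly by construction; a competing policy with smaller exceedance proportions would plot below it.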

Conclusions and further work
In this paper we have considered a problem involving the dynamic allocation of resources between two sources of time-varying demand, with an intended application to the tactical control of scarce resources at congested airports. The small computational requirements of our heuristic methods imply that they could easily be re-run in order to prescribe new decisions at short notice if the arrival rates (or other system parameters) are subject to stochastic variations during on-line implementation.
A key finding of our work is that, under quite plausible modelling assumptions (e.g. similarity between the customer classes with respect to demand rates and service rate constraints), myopic policies based on SAB queue dynamics appear to perform strongly. This is worthy of note since, in general stochastic optimization problems, one usually cannot rely on a myopic policy to be anywhere near optimal. Furthermore, our numerical experiments have identified circumstances in which one can improve significantly on the performance of a myopic policy through the use of more sophisticated heuristics. These circumstances include long durations for time intervals and highly time-dependent service rate constraints.
Comparisons between our SAB-based heuristics suggest that:
• The LSF heuristic performs strongest overall, but is also the most computationally expensive.
• The CAV heuristic is strong in many cases but also appears to be more sensitive to the value of β and the type of arrivals distribution.
• The API heuristic offers a computationally inexpensive method of improving upon a myopic policy and appears to perform best in the β = 0 case.
As Table 5 shows, however, the ranking order of the heuristics can change dramatically if demand rates are not high enough for the SAB assumption to be justified.
From a methodological perspective, the approximate DP methods that we have developed in this paper -in particular, the least-squares fitting (LSF) heuristic -bear certain similarities to reinforcement learning (RL) methods which have been applied to many different types of dynamic control problems in the last few decades (see [6,47]). Our LSF heuristic operates by randomly sampling a set of states at each time interval and fitting a quadratic to the value function estimates obtained at these states. In this respect it is similar to RL algorithms which use value function approximation and/or neural networks. We believe that our approach is particularly efficient for this particular problem because, as a result of our analyses in Section 4, we are able to use analytical expressions for the second moments in our surrogate problem to efficiently obtain value function estimates at the various time intervals via backwards induction, without needing to simulate the random transitions of the process and explore potentially sub-optimal action choices (as a conventional RL algorithm might do). In future work we intend to investigate whether the use of simulation-based RL algorithms might enable us to obtain strong parametric value function approximations under a broader range of conditions, including non-busy periods.
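The backwards-induction structure described above can be sketched compactly. This is a schematic only: the one-dimensional state, the deterministic surrogate transition, the stage cost, and the grids below are all invented for illustration, standing in for the paper's analytical second-moment expressions and two-queue dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(x):
    """Quadratic feature map for a scalar state."""
    return np.column_stack([np.ones(len(x)), x, x**2])

S, n_samples = 5, 200
coeffs = [None] * (S + 2)
coeffs[S + 1] = np.zeros(3)  # terminal value function is zero

# Backwards induction: at each interval, sample states, evaluate the best
# action against (stage cost + fitted approximation at s+1), then fit a
# quadratic to the resulting value estimates by least squares.
for s in range(S, 0, -1):
    x = rng.uniform(0.0, 30.0, n_samples)      # sampled queue lengths
    mu_grid = np.linspace(0.0, 10.0, 21)       # candidate service rates
    v = np.empty(n_samples)
    for i, xi in enumerate(x):
        nxt = np.maximum(xi + 6.0 - mu_grid, 0.0)   # surrogate transition
        v[i] = np.min(nxt**2 + features(nxt) @ coeffs[s + 1])
    coeffs[s], *_ = np.linalg.lstsq(features(x), v, rcond=None)
```

No simulation of random transitions is needed: each value estimate is computed directly from the surrogate dynamics, which is the efficiency advantage over conventional RL noted above.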
There are several interesting possibilities for extensions of our work. In this paper we have assumed, a priori, that the period of operation is one in which the servers remain almost constantly in use. At an airport, this might correspond to a period of several hours during part of the day (e.g. a late-afternoon peak period in which many aircraft are expected to arrive). The question of how our approach may be combined with a method for decision-making during non-busy times of day is left open for further research, as is the related issue of how optimizing decisions during busy periods might affect the performance of the system during any subsequent, non-busy periods. As we have asserted elsewhere, however, it is intuitively clear that if a decision-maker's main interest is in mitigating the worst effects of congestion, then optimizing decisions during the busiest times of day should help greatly.
In the context of airport operations, our work continues the long-standing convention of modelling queues of aircraft as M(t)/E_k(t)/1 queues, but we have also adapted our heuristic methods to the case of pre-scheduled random arrivals, which has received considerable recent attention in the literature. With airport applications in mind, there are various other possible generalizations of our model. We have not incorporated a stochastic model of weather evolution, and have instead relied upon the implicit assumption that reliable weather forecasts are available which enable the OTEs for all time intervals (which, in practice, would be affected by weather variations, as well as the types of aircraft scheduled to arrive at different times of day and any scheduled runway configuration changes) to be predicted with a high degree of confidence. Also, although our incorporation of 'switching penalties' for changing service rates is intended to model, to some extent, the costs involved in dynamically re-deploying airport resources, our model would have to be extended in order to facilitate more complicated operations (e.g. runway configuration changes) which might broaden the range of options available to a decision-maker.