On the dynamic control of matching queues

We consider the optimal control of matching queues with random arrivals. 
In this model, items arrive to dedicated queues, and wait to be matched 
with items from other (possibly multiple) queues. A match type corresponds 
to the set of item classes required for a match. Once a decision has been 
made to perform a match, the matching itself is instantaneous and the 
matched items depart from the system. We consider the problem of minimizing 
finite-horizon cumulative holding costs. The controller must decide which 
matchings to execute given multiple options. In principle, the controller may 
choose to wait until some “inventory” of items builds up to facilitate 
more profitable matches in the future. 
 
We introduce a multi-dimensional imbalance process that, at each 
time $t$, is given by a linear function of the cumulative arrivals to 
each of the item classes. A non-zero value of the imbalance at time $t$ 
means that no control could have matched all the items that arrived by 
time $t$. A lower bound based on the imbalance process can be 
specified, at each time point, by a solution to an optimization 
problem with linear constraints. While not achievable in general, 
this lower bound can be asymptotically approached under a dedicated 
item condition (an analogue of the local traffic condition in bandwidth 
sharing networks). We devise a myopic discrete-review matching control 
that asymptotically, as the arrival rates become large, achieves the 
imbalance-based lower bound.

1. Introduction. We consider the matching of items that arrive randomly over time. Items of different classes arrive sequentially and wait in their respective queues, one queue for each class. Items can leave the system only after being matched to items of other (possibly multiple) classes. Once matched, the items leave the system together. We refer to such systems as matching queues and are concerned with their optimal control. The items may be thought of as people with relevant needs and/or skills (such as in an online market setting) or as inanimate components that must be combined to form completed products (such as in a manufacturing setting).
In the example depicted in Figure 1, there are 4 classes of items, and items of class $i$ arrive according to a time-varying Poisson process $A_i$ having instantaneous rate $\lambda_i(t)$, $i = 1, 2, 3, 4$. Items of class 1 can be matched to items of class 2. Items of class 2 can also be matched with items of classes 3 and 4. This structure is reflected in the graph in Figure 1, where each rectangle corresponds to an item class and each of the circles A and B to a matching type. When a class 1 item is matched with a class 2 item, they both leave the system: matchings are instantaneous. An item of class 4 must be matched to both a class 3 and a class 2 item to depart. The matching-to-queue adjacency matrix (henceforth, the matching matrix) is given in this case by

$$M = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},$$

with rows indexed by classes 1 to 4 and columns by matchings A and B.

Figure 1. A queueing network view of a system with four input streams and two matchings.

The controller must decide when to perform a matching and which matching to perform given multiple options. If the decision is to perform $d_A$ matchings of type A and $d_B$ matchings of type B, then $(Md)_i$ units are depleted from the class $i$ queue, where $d = (d_A, d_B)$. The controller may choose to wait even if items are available. In Figure 1, suppose that there is a single item available in each of the class 1, class 2 and class 4 queues but none in the class 3 queue. The controller may be greedy and match the available class 1 item with the available class 2 item, which would result in two items departing the system. It may be preferable, however, to "inventory" this class 2 item until a class 3 item arrives, at which point it can be matched with one class 2 item and one class 4 item, depleting three items.
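The greedy-versus-wait comparison in the Figure 1 example can be traced with a few lines of Python. This is only an illustrative sketch: the matrix `M` encodes the matchings described in the text (A uses classes 1 and 2, B uses classes 2, 3 and 4), while the helper `apply_matches` and the variable names are hypothetical and not part of the paper.

```python
# Matching matrix for the four-class, two-matching example
# (rows: classes 1-4, columns: matchings A and B).
M = [
    [1, 0],  # class 1 is used by matching A only
    [1, 1],  # class 2 is used by both A and B
    [0, 1],  # class 3 is used by matching B only
    [0, 1],  # class 4 is used by matching B only
]


def apply_matches(queues, d):
    """Return the queues after d = (d_A, d_B) matches; (M d)_i items leave queue i."""
    depleted = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(M))]
    remaining = [q - used for q, used in zip(queues, depleted)]
    if min(remaining) < 0:
        raise ValueError("not enough items for the requested matches")
    return remaining


start = [1, 1, 0, 1]           # one item each of classes 1, 2 and 4
class3_arrival = [0, 0, 1, 0]  # a single class 3 item arrives later

# Greedy: match classes 1 and 2 immediately, then the class 3 item arrives.
greedy = apply_matches(start, (1, 0))
greedy = [q + a for q, a in zip(greedy, class3_arrival)]

# Patient: hold the class 2 item and use it in a type-B match once class 3 arrives.
patient = [q + a for q, a in zip(start, class3_arrival)]
patient = apply_matches(patient, (0, 1))

print(sum(greedy), sum(patient))  # items left waiting under each choice
```

In this particular sample path, waiting leaves fewer items in the system after the class 3 arrival; whether waiting pays off in general depends on the cost function and on future arrivals.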
A first (and central) step in optimizing these networks is to identify a good state descriptor. This choice is simple for the simplest of matching networks, one with two item classes and one possible matching; see Figure 2. Assuming one performs matchings whenever there are items to be matched, either queue 1 or queue 2 must be empty. In turn, the imbalance between the arrivals, $S(t) = A_1(t) - A_2(t)$, is a sufficient state descriptor: the size of each of the queues at time $t$ is determined by the value of the imbalance. In particular,

$$Q_1(t) = [S(t)]^+ \quad \text{and} \quad Q_2(t) = [S(t)]^-, \tag{1.1}$$

where $[x]^+ = \max(x, 0)$ and $[x]^- = \max(-x, 0)$. Given a strictly increasing non-negative cost function $C : \mathbb{R}^2_+ \to \mathbb{R}_+$, it is now immediate that under any control

$$C(Q(t)) \ge C\big([S(t)]^+, [S(t)]^-\big). \tag{1.2}$$

If the imbalance at time $t$ is non-zero, then matching all arriving items is infeasible and the instantaneous cost must be strictly positive.
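For the two-class network, the claim that the imbalance determines both queues can be checked on a sample path. The sketch below assumes empty initial queues and a hypothetical arrival sequence; the function name `greedy_two_class` is illustrative only.

```python
def greedy_two_class(arrival_classes):
    """Match whenever possible; record (S, Q1, Q2) after each arrival.

    arrival_classes is a sequence of 1s and 2s giving the class of each arrival.
    """
    a1 = a2 = 0  # cumulative arrivals A_1(t), A_2(t)
    q1 = q2 = 0  # current queue lengths
    history = []
    for cls in arrival_classes:
        if cls == 1:
            a1 += 1
            q1 += 1
        else:
            a2 += 1
            q2 += 1
        matched = min(q1, q2)  # match every available pair immediately
        q1 -= matched
        q2 -= matched
        history.append((a1 - a2, q1, q2))
    return history


# Hypothetical arrival order, chosen only for illustration.
path = greedy_two_class([1, 1, 2, 1, 2, 2, 2])
for s, q1, q2 in path:
    # Queue 1 holds the positive part of the imbalance, queue 2 the negative part.
    assert q1 == max(s, 0) and q2 == max(-s, 0)
```

Because at most one queue is non-empty after each greedy match, the pair of queue lengths is recovered from the single number $S(t)$.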
More generally, the definition of the imbalance process is not as simple. The conceptual implications, however, remain valid in great generality:
• If the imbalance process is non-zero at time $t$, then there is no control that could have matched all items that arrived by time $t > 0$.
• A lower bound on the holding costs under any matching control is given by a simple function of the imbalance process, as in (1.2).
In this paper, we show how to explicitly construct such an imbalance process.
The imbalance $S$ is constructed as a linear mapping of the arrival processes. If $q_0$ is the vector of queues at time $t = 0$, then

$$S(t) = Y^T (q_0 + A(t)), \tag{1.3}$$

where the imbalance matrix $Y$ has dimension $I \times (I - J)$, with $I$ being the number of classes and $J$ the number of possible matchings; see Section 3. In the simple example of Figure 2, $Y$ is the column vector $(1, -1)^T$.
Our objective is to propose a matching control that minimizes the finite horizon cumulative cost, $\int_0^u C(Q(t))\,dt$, where $u > 0$ is the time horizon and $C : \mathbb{R}^I_+ \to \mathbb{R}_+$ is the instantaneous cost function. Finite horizon (rather than long-run average) objectives are natural because the matching queues we consider here are inherently unstable: for the simple example in Figure 2, either the queue sizes blow up to $\infty$ (if $\lambda_1 \ne \lambda_2$) or the imbalance process is null recurrent (if $\lambda_1 = \lambda_2$).
A lower bound on the instantaneous holding cost based on the imbalance process in (1.3) is given by

$$\begin{aligned} \min\ & C(q) && \text{(1.4a)} \\ \text{s.t.}\ & Y^T q = S(t), && \text{(1.4b)} \\ & q \ge 0, && \text{(1.4c)} \end{aligned}$$

at every time $t > 0$. That is, one optimizes the queue sizes $q$ subject to the constraint that the appropriately weighted (by $Y^T$) sum of the queues must equal the imbalance process. Obviously, there is no reason to expect that the lower bound in (1.4) is achievable as it ignores past actions; namely, the fact that once an item is matched, it cannot be "unmatched". In Figure 1, if the controller matches a class 2 item with a class 1 item when a class 4 item was waiting but a class 3 item arrives shortly after, the controller cannot "unmatch" the class 1 and 2 items in order to match instead the class 2, 3, and 4 items.
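For a linear cost $C(q) = c^T q$, the lower-bound problem (1.4) is a small linear program and can be solved by enumerating basic feasible solutions. The following is a minimal sketch under assumptions that are not in the text: the cost is linear (the paper allows more general costs), exact rational arithmetic is used, and the matrix `YT` is one assumed choice of $Y^T$ for the Figure 1 network that satisfies $Y^T M = 0$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; return None if A is singular."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def lower_bound(YT, s, c):
    """Minimize c.q subject to YT q = s, q >= 0 by checking every basic solution."""
    m, num_classes = len(YT), len(YT[0])
    best = None
    for basis in combinations(range(num_classes), m):
        sol = solve_square([[YT[r][k] for k in basis] for r in range(m)], s)
        if sol is None or min(sol) < 0:
            continue  # singular basis or infeasible basic solution
        q = [Fraction(0)] * num_classes
        for k, v in zip(basis, sol):
            q[k] = v
        value = sum(ci * qi for ci, qi in zip(c, q))
        if best is None or value < best[0]:
            best = (value, q)
    return best


# Assumed imbalance matrix for Figure 1 (an illustrative choice, not quoted from the paper).
YT = [[-1, 1, Fraction(-1, 2), Fraction(-1, 2)], [0, 0, 1, -1]]
arrivals = [1, 1, 0, 1]  # one item each of classes 1, 2 and 4
s = [sum(y * a for y, a in zip(row, arrivals)) for row in YT]
value, q = lower_bound(YT, s, [1, 1, 1, 1])
```

With unit holding costs and the arrivals used in the introduction's example, the bound equals one waiting item (the unmatched class 4 item). Vertex enumeration is exponential in general and is used here only because the example is tiny.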
In this paper, we show that a simple discrete-review matching control achieves the imbalance-based lower bound for a large class of networks (specifically, those that satisfy a dedicated item condition; see Assumption 1) when the arrival rates become large. In brief, our proposed control minimizes the costs given the backlog of items at specified decision epochs $t_1, \ldots, t_m$: at time $t_m$, we determine the number of matches of each type $d_m$ and the resulting queues $q_m$ by solving the optimization problem (1.5), where $M$ is the matching matrix. The algorithm is fleshed out in detail in §4.
The discrete-review matching control (1.5) is (partly) greedy: in the case of Figure 1, our control will not "inventory" items of class 2 in anticipation of arrivals of classes 3 and 4. However, the term "greedy" must be used with care. Discrete review means that we do nothing between review periods, so that items accumulate. The inventory accumulated between review periods provides the flexibility to achieve the lower bound in (1.4) at each decision epoch. Decision epochs must be close enough so as not to incur significant holding cost, but sufficiently spaced out so that enough items have accumulated between review periods to provide matching flexibility.
In summary, the main contribution of this paper is to introduce a model for matching queues, identify the imbalance process as a key concept, and prove that a simple control is asymptotically optimal for holding cost minimization.
Organization of the paper. We conclude this section with a literature review. We specify our model in §2, and explicitly construct the imbalance matrix $Y$ and the imbalance process $S(t)$ in §3. We fully describe our proposed control in §4. We state our first asymptotic optimality result in §5. That first result assumes that, in fluid scale, all items can be matched. When there is no such fluid balance, our asymptotic optimality result requires additional conditions, and that setting is studied in §6. Proof essentials appear in §7 and numerical examples in §8. We make concluding remarks in §9. The proofs of propositions and theorems appear in the main body of the paper, and there is an appendix for the proofs of lemmas.
Literature review. There are two streams of literature that are relevant to our work: that on stochastic processing networks and that on assemble-to-order systems.
The stochastic processing networks (SPNs) literature. In Harrison's SPN framework [6], a matching network can be viewed as an SPN with $I$ classes of items that are processed via $J$ activities. For the matching network drawn in Figure 1, there are $I = 4$ classes and $J = 2$ activities that can be used to process these classes. Each of the $J$ activities is undertaken by a resource and all processing is instantaneous (so that each resource has infinite capacity).
Our model can thus be viewed as an extension of assembly-queues. One early example is the model studied in [4], which has $I$ classes, a single activity served by a finite capacity resource, and positive processing times. Although the matching requirement is similar, there are two important distinctions. The first is the fact that we have instantaneous processing. A consequence of this is that the imbalance process can be approximated by an unregulated diffusion. With positive processing times, the appropriate approximation is a regulated diffusion. The second important distinction is that with a single activity there is no question of control. In contrast, our focus is on networks with multiple matching types, in which the network performance is determined by decisions concerning which activities to undertake when; that is, by the matching decisions. This question of matching control has not been fully explored in the SPN literature.
Studies of SPNs typically follow a hierarchy in which one first considers a static planning problem, and subsequently a waiting- or holding-cost minimization problem. The second step is often facilitated by use of the "standard Brownian machinery" proposed in [5]; namely, by introducing a heavy-traffic asymptotic regime in which resources are almost fully utilized and queue-lengths can be approximated by a function of a Brownian motion. An equivalent problem of lower dimension, that can be more easily solved, is constructed based on the workload process (see [8]) whose construction, in turn, relies on the dual of a static planning problem. A class of SPNs that is closely related to matching networks is that of parallel-server networks (Figure 1 serves to visualize this similarity). That class is studied using the standard Brownian machinery in, for example, [2,7,11], and those papers identify resource-pooling conditions under which the workload process is one-dimensional. The solution to the one-dimensional equivalent workload formulation is used to construct a control for the original SPN that is proved to be asymptotically optimal in heavy-traffic.
In spirit, our analysis of matching networks follows this same hierarchy. The construction of the imbalance process is based on the dual to an appropriate static planning problem, and we perform an asymptotic analysis to motivate a dynamic control and prove that the proposed control is asymptotically optimal. The conventional notion of heavy traffic does not, however, apply in our setting because there is no obvious notion of resource capacity. The appropriate analogue to the heavy-traffic asymptotic regime involves balance among the various input flows, which effectively "serve" one another. The network in Figure 2 is balanced when $\lambda_1(t) = \lambda_2(t)$ for all $t \ge 0$, and that of Figure 1 is balanced when $\lambda_1(t) + 0.5(\lambda_3(t) + \lambda_4(t)) = \lambda_2(t)$ and $\lambda_3(t) = \lambda_4(t)$ for all $t \ge 0$. Instead of the resource pooling condition, a reduction in problem dimensionality is achieved here by a dedicated-item condition which is an analogue of the local traffic condition developed in [10] for a bandwidth sharing model. The control problem does not, however, collapse here to a single dimension.
The assemble-to-order system literature. Our interpretation of the imbalance process is closely connected to the interpretation of the inventory position in an assemble-to-order (ATO) system. In such a system, stochastic demand for a set of end products is met by assembling components. Different end-products use different (possibly overlapping) subsets of components and the controller must decide dynamically which end products to assemble given the backlogged demand for products and the inventory of components; see, e.g., the survey paper [15]. The components may either be arriving stochastically, if the capacitated component production is explicitly modeled, or they may be ordered, in which case they arrive after a lead time. Regardless of the component delivery method, the relevant state descriptor is the inventory position which tracks the number of components required to satisfy product demand, and is positive (negative) if the system has extra (is short of) components. Our imbalance process is similar in that it tracks which items are plentiful and which items are in short supply.
In the assemble-to-order setting with component production, as in [12], the notion of heavy-traffic equates the rate at which products are demanded to the rate at which components are produced. When those rates are in balance, the standard Brownian machinery can be used to construct an asymptotically optimal control that specifies dynamically which products to assemble. An important similarity (and a departure from traditional queueing networks) is that, in both the ATO context and in our matching setting, capacity (be it the queue of components in the ATO setting or the queues of items in our setting, which serve as "resources" for other items) can be "banked" and is not perishable. Yet this is where the connection to the matching setting partly breaks down. In the matching setting there is no natural notion of physical capacity as items play the dual role of "products" needing components and "components" that are used in other products. This duality requires a different notion of "heavy-traffic".
The standard Brownian machinery is applied also in the study of ATO systems where components are ordered; see [3] and [13]. These papers use functional central limit theorems to show that, as the lead times grow large (and so does, in particular, the demand for products during the lead time), a certain lower bound stochastic program is attained. This is also the spirit of the analysis in this paper: to use functional central limit theorem methodology to show that when the arrival rates become large, the lower bound in (1.4) can be attained. In our setting, the absence of capacitated resources allows us to cover, within a single asymptotic framework, balanced and non-balanced networks.
Notation. We let $\mathbb{R}$ denote the real numbers and $\mathbb{R}_+$ denote the non-negative real numbers. The set of integers is $\mathbb{Z}$ and $\mathbb{N}$ denotes the non-negative integers. For a set $S$, $|S|$ denotes its cardinality. All vectors are assumed to be column vectors. The transpose of a vector $v$ is denoted by $v^T$. The notation $|v|$ denotes the Euclidean norm of $v$. We let $e$ be the vector of all 1's, and $e_j$ be the vector of all 0's except with a 1 in the $j$th place. All processes considered in what follows are assumed to be right continuous with left limits, and $D^d[0, \infty)$ denotes the space of such functions from $[0, \infty)$ to $\mathbb{R}^d$. For a process $x \in D^d[0, \infty)$ and a constant $u > 0$ we let $\|x\|_{s,u} = \sup_{s \le t \le u} |x(t)|$ (we abbreviate to $\|x\|_u$ if $s = 0$) and define $\Delta x(t) = x(t) - x(t-)$.
For asymptotic optimality we consider a sequence of systems indexed by $n \in \mathbb{R}_+$. We use the notation $\Rightarrow$ to denote convergence in distribution as $n \to \infty$ in the space $D^d[0, \infty)$. We use the same notation for weak convergence of random variables and the correct interpretation will be clear from the context. For a sequence of random variables $\{X^n\}$ and a sequence of non-negative numbers $a^n$ we say that

2. The matching model. The model consists of a set $\mathcal{I}$ of input streams, or item classes, and a set of matchings $\mathcal{J}$. A matching corresponds to a subset of $\mathcal{I}$ that contains at least two item classes. We let $\mathcal{I}(j)$ be the set of item classes participating in matching $j \in \mathcal{J}$ and $\mathcal{J}(i)$ be the set of matchings which involve item class $i \in \mathcal{I}$. The matching matrix $M \in \{0, 1\}^{I \times J}$, where $I = |\mathcal{I}|$ and $J = |\mathcal{J}|$, has $M_{ij} = 1$ if $i \in \mathcal{I}(j)$ and 0 otherwise. We assume that for each $i$, there exists at least one $j$ such that $M_{ij} = 1$; that is, each item class is connected to at least one matching. In Figure 1, $\mathcal{I}(A) = \{1, 2\}$ and $\mathcal{I}(B) = \{2, 3, 4\}$, so that

$$M = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}.$$

Class $i$ items arrive according to a (possibly time varying) Poisson process $A_i = (A_i(t), t \ge 0)$ with instantaneous rate $\lambda_i(t)$, so that $\Lambda_i(t) = \int_0^t \lambda_i(s)\,ds$ is a first order approximation for the cumulative arrivals up to time $t$. We assume that

The control is the vector of processes $D_j = (D_j(t), t \ge 0)$, $j \in \mathcal{J}$, where $D_j(t)$ tracks the cumulative number of times matching $j$ has been performed in $[0, t]$, and has

Let $q_{0,i}$ be the number of items in queue $i$ at time 0.
The number of class $i$ items waiting at time $t \ge 0$ is then

$$Q_i(t) = q_{0,i} + A_i(t) - \sum_{j \in \mathcal{J}(i)} D_j(t),$$

or, in vector notation,

$$Q(t) = q_0 + A(t) - M D(t).$$

Naturally, we only consider controls under which

Also, since matching $j \in \mathcal{J}$ is only feasible at times $t \ge 0$ in which at least one item is waiting in each of the queues $i \in \mathcal{I}(j)$, we require that for all $j \in \mathcal{J}$, $i \in \mathcal{I}(j)$ and $t > 0$,

where we define

Given a non-negative and convex function $C : \mathbb{R}^I_+ \to \mathbb{R}_+$ that has $C(0) = 0$ and is strictly increasing with respect to the natural partial order on $\mathbb{R}^I_+$, we seek to solve the problem (2.6) of minimizing the cumulative cost $\int_0^u C(Q(t))\,dt$ over admissible controls, for any given $u > 0$, where the minimization should be interpreted in a stochastic sense. In words, we wish to minimize the finite horizon costs of having items waiting (holding costs).
Our proposed solution will be pathwise (asymptotically) optimal under our assumptions. As a corollary, if one considers $q_0 = 0$ and the particular cost function $C(q) = \sum_{i \in \mathcal{I}} q_i$, the solution minimizes $\sum_{i \in \mathcal{I}} Q_i(t)$ for each $t \ge 0$ and, consequently, maximizes the total number of items matched in $[0, u]$.

Dedicated item condition.
The following condition is the analogue of the local traffic condition in [10, Assumption 5.1], defined there for a bandwidth sharing network. In words, the condition requires that each matching has at least one class that is used by that matching and that matching only.
In Figure 1, class 1 is served only by matching A and classes 3 and 4 are served only by matching B. The dedicated item condition is not satisfied, for example, in Figure 3(LHS) where matching B has no such item class, but it is satisfied in Figure 3(RHS).
Definition 1 (dedicated item (DI)). For each $j \in \mathcal{J}$ there exists $i \in \mathcal{I}$ such that $M_{ij} = 1$ but $M_{ik} = 0$ for all $k \ne j$.
Assumption 1. The network satisfies the DI condition.
Since a matching must have at least two classes, the dedicated item condition implies that $I > J$. It also implies that the matrix $M$ (after possibly re-arranging indices) has the form $M^T = (I, M_2)$, where $I$ here denotes the $J \times J$ identity matrix and $M_2$ is some $J \times (I - J)$ matrix. In particular, the matrix $M$ has rank $J$. A useful implication of the structure of the matrix $M$ is that $Mx \ge 0$ if and only if $x \ge 0$. In the algebraic literature this property is referred to as inverse monotonicity and it is this implication of the dedicated item condition that is central to our proofs.
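Definition 1 is easy to check mechanically from the matching matrix. The sketch below uses hypothetical helper names; `M_FIG1` is the Figure 1 matrix, while `M_NO_DI` is an invented three-class example (not the network of Figure 3) in which the second matching shares all of its classes with the first.

```python
def dedicated_classes(M):
    """For each matching j, list the classes that are used by j and by no other matching."""
    num_classes, num_matchings = len(M), len(M[0])
    return [
        [i for i in range(num_classes) if M[i][j] == 1 and sum(M[i]) == 1]
        for j in range(num_matchings)
    ]


def satisfies_dedicated_item(M):
    """True when every matching has at least one dedicated class (Definition 1)."""
    return all(dedicated_classes(M))


# Figure 1: matching A = {1, 2}, matching B = {2, 3, 4} (0-based row indices below).
M_FIG1 = [[1, 0], [1, 1], [0, 1], [0, 1]]

# Hypothetical counterexample: A = {1, 2, 3}, B = {2, 3}; B has no class of its own.
M_NO_DI = [[1, 0], [1, 1], [1, 1]]

print(dedicated_classes(M_FIG1))         # dedicated rows per matching
print(satisfies_dedicated_item(M_NO_DI))
```

When the condition holds, the row of a dedicated class of matching $j$ is the unit vector $e_j^T$, which is why $Mx \ge 0$ forces $x \ge 0$.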
We say that a network is fluid-balanced if for each t ≥ 0, there exists z = z(t) ≥ 0 such that M z = Λ(t): starting with empty queues one can (in first order) match all arrivals by time t and make the "fluid" queues empty.
Conversely, we say that a network is not balanced if there exists $t$ such that $Mz \ne \Lambda(t)$ for all $z \ge 0$. In this case, no control can empty the queues completely in fluid scale: there must be some positive queues some of the time.
This separation according to fluid balance is not central to the development of our proposed control in Sections 3 and 4. However, the conditions for asymptotic optimality are stronger for non-balanced networks; see Section 6.
3. The imbalance matrix and process. Our proposed control is based on the imbalance process (1.3). The imbalance process is used to identify the lower bound on the achievable cost given in (1.4). In this section, we provide one possible construction of the matrix $Y$ in (1.3), and relate this construction to the duality of item classes as customers and servers.
To motivate the imbalance process, suppose that no action is taken until time $t$. At this time, it is feasible to match all items if there exists $d \ge 0$ that solves

$$M d = q_0 + A(t). \tag{3.1}$$

The dual to (3.1) is given by the linear program

$$\max\ y^T (q_0 + A(t)) \quad \text{s.t.} \quad y^T M \le 0. \tag{3.2}$$

By strong duality, (3.1) has a feasible solution $d^*$ if and only if (3.2) has a finite optimal solution $y^*$ (any such solution must have $(y^*)^T (q_0 + A(t)) = 0$). Suppose that $q_0 + A(t) \gg 0$, i.e., that all components of the vector $q_0 + A(t)$ are strictly positive. Then, by the dedicated item condition it must be that $d^* \gg 0$ and, by complementary slackness, that $(y^*)^T M = 0$. Thus, for $q_0 + A(t) \gg 0$, the primal has a feasible solution $d^*$ if and only if the dual has an optimal solution $y^*$ with $(y^*)^T (q_0 + A(t)) = 0$ and $(y^*)^T M = 0$. If, on the other hand, there exists $y$ such that $y^T M = 0$ but $y^T (q_0 + A(t)) > 0$, there is no solution $d \ge 0$ to $Md = q_0 + A(t)$ so that, regardless of our actions, some queues must be positive at time $t$.
We construct a process $S = (S(t), t \ge 0)$ having $S(t) = 0$ if there exists $d \ge 0$ such that $Md = q_0 + A(t)$ and $S(t) \ne 0$ otherwise. We fix a matrix $Y$ whose columns span

$$\{ y \in \mathbb{R}^I : y^T M = 0 \} \tag{3.3}$$

and define

$$S(t) = Y^T (q_0 + A(t)), \quad t \ge 0. \tag{3.4}$$

The process $S(t)$ obtains values in the subspace

The following formalizes our heuristic motivation of the imbalance process.
Lemma 1. Suppose that Assumption 1 holds. For each $x \ge 0$ such that $Y^T x = 0$ there exists a unique solution $d$ to the system of equations $Md = x$, and this solution is non-negative.
As a corollary of this lemma we observe that the network is fluid balanced if, for all t ≥ 0, Y T Λ(t) = 0 as this guarantees the existence of z(t) ≥ 0 such that M z = Λ(t).
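Under the dedicated item condition, the unique solution in Lemma 1 can be read off directly: the row of a dedicated class of matching $j$ in $Md = x$ says $d_j = x_i$. The sketch below (hypothetical function name, Figure 1 matrix as the example) computes this candidate and then checks whether it reproduces $x$, which it does exactly when $x$ lies in the range of $M$.

```python
def solve_matches(M, x):
    """Return the d with M d = x if one exists and is non-negative, else None.

    Assumes every matching has a dedicated class; raises ValueError otherwise.
    """
    num_classes, num_matchings = len(M), len(M[0])
    d = []
    for j in range(num_matchings):
        dedicated = [i for i in range(num_classes) if M[i][j] == 1 and sum(M[i]) == 1]
        if not dedicated:
            raise ValueError("matching %d has no dedicated class" % j)
        d.append(x[dedicated[0]])  # the dedicated row of M d = x reads d_j = x_i
    rebuilt = [sum(M[i][j] * d[j] for j in range(num_matchings)) for i in range(num_classes)]
    return d if rebuilt == list(x) and min(d) >= 0 else None


M_FIG1 = [[1, 0], [1, 1], [0, 1], [0, 1]]

print(solve_matches(M_FIG1, [2, 5, 3, 3]))  # balanced backlog: every item can be matched
print(solve_matches(M_FIG1, [1, 1, 0, 1]))  # imbalanced backlog: no such d exists
```

Applied to the fluid arrivals $\Lambda(t)$ in place of `x`, the same check is one way to test fluid balance at a given time.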
Then, $S_2(t) = 0$ only if $A_3(t) = A_4(t)$, which is consistent with the fact that (assuming no matchings are performed by time $t$) all class 3 and 4 items can be matched at that time only if the exact same number of each class is present. If $S_2(t) = 0$, then $S_1(t) = 0$ only if $A_2(t) = A_1(t) + \tfrac{1}{2}(A_3(t) + A_4(t))$, so that there are enough class 2 items to match all items of classes 1, 3, and 4 that are present in the queues at time $t$.
In general, there may be multiple choices for the matrix $Y$. The choice of $Y$ does not affect our results so we do not make this dependence explicit in our notation. Regardless of how $Y$ is chosen, its rank is $I - J$ (recall that $\mathrm{rank}(M) = J < I$).
There are standard (algebraic) ways to construct such a matrix $Y$. The following is a construction of $Y$ that has an intuitive physical interpretation and is rooted in the greater relative importance of some item classes. In Figure 1, for example, class 2 is such a class. It is the only class that is used in more than one matching so that, by allocating items of this class between the two distinct matchings A and B, the system manager can control item departures. In this sense, class 2 is a "resource" for class 1, 3, and 4 "items". Formalizing this idea is conceptually useful and facilitates an explicit construction of the matrix $Y$.

Definition 2. We say that a class $i \in \mathcal{I}$ is a resource class if $|\mathcal{J}(i)| > 1$ (it participates in more than one matching). We let $\mathcal{S} \subset \mathcal{I}$ be the set of resource classes and let $\mathcal{C} = \mathcal{I} \setminus \mathcal{S}$ be the remaining classes, which we refer to as the customer classes.
The algorithm shown below can be used to construct an appropriate matrix $Y$ (that is, one whose rows span (3.3)). In the presentation of this algorithm, we use the notation $r$ to refer to a resource class and $c$ to refer to a customer class. Then, for a row vector $y$ in the matrix $Y^T$ (which has $I$ entries), the notation $y_r$ ($y_c$) refers to the position in the vector $y$ associated with that resource (customer) class. For easier understanding, we illustrate each step by applying it to Example 1.
(0) Start with an "empty" matrix Y T that has I − J rows and I columns.
(1) For each resource class $r \in \mathcal{S}$:
    (i) Add a row vector $y$ that has a 1 in the entry associated with that resource class.
        In Example 1: $r = 2$ and $y = (x, 1, x, x)$.
    (ii) For each matching $j \in \mathcal{J}(r)$: for each customer class $c \in \mathcal{C} \cap \mathcal{I}(j)$:
(2) For each matching $j \in \mathcal{J}$ having $|\mathcal{C} \cap \mathcal{I}(j)| \ge 2$, fill one of the remaining rows in the matrix $Y$ with the vector $y$ constructed as follows:
    (i) Arbitrarily designate a customer class $c(j) \in \mathcal{C} \cap \mathcal{I}(j)$.
In the application of the last step to the network in Example 1, note that matching A has a single customer class so that this step applies only to matching B. Overall, we generate exactly $I - J$ rows, each of size $I$. To see this, note that step (1) creates $|\mathcal{S}|$ rows. Then, step (2) creates $|\mathcal{C}| - J$ rows, because each matching $j$ yields $|\mathcal{C}(j)| - 1$ rows and there is no overlap between local inputs. Finally, the independence of the columns of $Y$ (rows of $Y^T$) follows immediately by construction, as does the fact that $Y^T M = 0$.
4. The proposed discrete-review control. Consider first the case that $q_0 = 0$: the initial queues are empty. A modification of the algorithm to account for general initial conditions is provided at the end of this section.
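The resource/customer construction can be sketched in code. Because some sub-steps of the algorithm are not legible in this text, the entries filled in below are assumptions chosen so that $Y^T M = 0$ holds: in a resource row, each customer class of a matching $j \in \mathcal{J}(r)$ receives $-1/|\mathcal{C} \cap \mathcal{I}(j)|$; in step (2), the designated customer class receives $+1$ and each other customer class of the same matching receives $-1$ in its own row. The function name `build_YT` is hypothetical, and the routine presumes the dedicated item condition (every matching has at least one customer class).

```python
from fractions import Fraction


def build_YT(M):
    """Construct rows of Y^T from the resource/customer split (assumed fill-in rule)."""
    num_classes, num_matchings = len(M), len(M[0])
    resources = [i for i in range(num_classes) if sum(M[i]) > 1]
    customers = [i for i in range(num_classes) if sum(M[i]) == 1]
    cust_of = {j: [c for c in customers if M[c][j] == 1] for j in range(num_matchings)}

    rows = []
    # Step (1): one row per resource class.
    for r in resources:
        y = [Fraction(0)] * num_classes
        y[r] = Fraction(1)
        for j in range(num_matchings):
            if M[r][j] == 1:
                for c in cust_of[j]:
                    y[c] = Fraction(-1, len(cust_of[j]))  # assumed weights
        rows.append(y)
    # Step (2): compare each extra customer class with a designated one.
    for j in range(num_matchings):
        if len(cust_of[j]) >= 2:
            designated = cust_of[j][0]  # arbitrary designation: first in index order
            for c in cust_of[j][1:]:
                y = [Fraction(0)] * num_classes
                y[designated], y[c] = Fraction(1), Fraction(-1)
                rows.append(y)
    return rows


M_FIG1 = [[1, 0], [1, 1], [0, 1], [0, 1]]
YT = build_YT(M_FIG1)
# Each row of Y^T must be orthogonal to each column of M.
YT_M = [[sum(y[i] * M_FIG1[i][j] for i in range(4)) for j in range(2)] for y in YT]
```

For Figure 1 this produces $I - J = 2$ rows, one for resource class 2 and one comparing customer classes 3 and 4, and `YT_M` is the zero matrix.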
At each decision epoch $t_m$ we solve for the number of matches of each type $d_m$ and the resulting queues $q_m$ that minimize the instantaneous holding cost; that is, we solve for $(d_m, q_m)$ in

$$\min\ C(q_m) \qquad \text{(4.1a)}$$

The proposed control $D^\star$ and queue-length processes are

From (4.1), it is straightforward to see that the constructed $D^\star$ is piecewise constant, satisfies (2.2)-(2.5) and is thus admissible.
The optimization (4.1) is exactly (1.5) in the Introduction because $\Delta A(t_m) = 0$ for all $m$ almost surely when arrivals are Poisson, so that

In particular, substitution shows that the constraints in (4.1) and (1.5) are identical. This is important because the formulation (1.5) does not require tracking the arrival increments $A(t_m) - A(t_{m-1})$.
To see the connection of our proposed control with the imbalance process introduced in §3, we multiply by $Y^T$ in (4.1b). Then, recalling that $Y^T M = 0$ and the imbalance process definition in (3.4), we find that if $Y^T q_{m-1} = S(t_{m-1})$, then $Y^T q_m = S(t_m)$. This is true at time $t_0 = 0$ and will subsequently hold for all $m$ by the definition of the algorithm. Then, (4.1) can be equivalently written as

$$\begin{aligned} \min\ & C(q_m) && \text{(4.2a)} \\ \text{s.t.}\ & Y^T q_m = S(t_m), && \text{(4.2b)} \end{aligned}$$

For each $m$ the problem (4.2) has the lower bound given in (1.4) of the introduction (set $t = t_m$):

$$\min\ C(q) \quad \text{s.t.} \quad Y^T q = S(t_m), \quad q \ge 0. \tag{4.3}$$

The potential gap between the cost under the proposed discrete-review matching control in (4.2) and the lower bound in (4.3) comes from the constraint (4.2c), which prevents previously matched items from being "rematched" more advantageously.
In particular, a lower bound on what the algorithm can achieve (or, in fact, on what any algorithm can achieve) is given by

$$\begin{aligned} \min\ & \int_0^u C(q(s))\,ds && \text{(4.4a)} \\ \text{s.t.}\ & Y^T q(t) = S(t), \ \text{for all } 0 \le t \le u, && \text{(4.4b)} \\ & q(t) \ge 0, \ \text{for all } 0 \le t \le u. && \text{(4.4c)} \end{aligned}$$

With $Q(x)$ denoting the solution in (4.5), $Q(S(t_m))$ is optimal for (4.3) and, given a sample path $(S(t), t \ge 0)$, the sample path $(Q(S(t)), t \ge 0)$ is optimal for (4.4). Thus, the imbalance-based process $Q(S(t))$ generates a lower bound on the original finite horizon cumulative cost objective (2.6). To understand how our algorithm overcomes the gap between the original formulation and this lower bound, it is useful to formalize an equivalence between (4.4) and the original cumulative finite horizon cost formulation (2.6).
Eliminate first the requirement in (2.6) that only integer numbers of items can be matched. The original problem formulation then becomes (4.6), and the imbalance-based problem formulation is (4.7). A function $(q(t), t \ge 0)$ is said to be admissible for (4.7) if it is RCLL and satisfies all the constraints. The following establishes that, under the dedicated item condition (Assumption 1), these problems are equivalent.
Theorem 1. Suppose that Assumption 1 holds. If $(Q, D)$ is an admissible solution for (4.6), then $Q$ is admissible for (4.7). Conversely, if $Q$ is an admissible solution for (4.7), then there exists a process $D$ such that $(Q, D)$ is admissible for (4.6).
The simple proof appears at the end of this section. Theorem 1 tells us that, to study the original problem formulation, it suffices to study the imbalance formulation (4.7). Note that the formulation (4.7) is the lower-bound problem (4.4) with the added constraint $A(t) - A(s) \ge q(t) - q(s)$; in other words, that constraint is the source of any potential cost gap. For the solution $Q(S(t))$ to be feasible for (4.7) we need that

$$A(t) - A(s) \ge Q(S(t)) - Q(S(s)) \quad \text{for all } 0 \le s \le t \le u.$$

This is the key observation underlying our proposed control. By spacing out the decision epochs far enough we can guarantee that, with high probability,

$$A(t_m) - A(t_{m-1}) \ge Q(S(t_m)) - Q(S(t_{m-1})).$$

Our control will then be able to track the trajectory of queues $Q(S(t))$ at decision epochs; that is, the solutions to (4.2) and (4.3) will be the same. Assuming that the decision epochs can still be spaced close enough so that the cost build-up during a review period is negligible, our proposed discrete-review matching control (4.2) will achieve a near minimum cumulative finite horizon cost.
In summary, the essential element is that there are sufficiently many arrivals during sufficiently short review periods.
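The review-epoch logic can be illustrated with a brute-force sketch for the Figure 1 network. This is not the paper's algorithm statement: it assumes empty initial queues, integer matches, a user-supplied cost function, and it solves each review problem by enumerating all feasible match vectors, which is only practical for tiny backlogs. All function names and the sample arrival batches are hypothetical.

```python
from itertools import product

M = [[1, 0], [1, 1], [0, 1], [0, 1]]  # Figure 1: A = {1, 2}, B = {2, 3, 4}


def review_step(backlog, cost):
    """Pick integer matches d >= 0 minimizing cost(backlog - M d) over feasible d."""
    num_classes, num_matchings = len(M), len(M[0])
    # d_j can never exceed the smallest backlog among the classes matching j needs.
    limits = [min(backlog[i] for i in range(num_classes) if M[i][j]) for j in range(num_matchings)]
    best = None
    for d in product(*(range(limit + 1) for limit in limits)):
        q = [backlog[i] - sum(M[i][j] * d[j] for j in range(num_matchings)) for i in range(num_classes)]
        if min(q) < 0:
            continue  # the shared class cannot serve both matchings this many times
        if best is None or cost(q) < best[0]:
            best = (cost(q), list(d), q)
    return best[1], best[2]


def run_discrete_review(arrivals_per_period, cost):
    """Accumulate arrivals during each review period, then match at the review epoch."""
    q = [0] * len(M)
    path = []
    for arrivals in arrivals_per_period:
        backlog = [qi + ai for qi, ai in zip(q, arrivals)]
        d, q = review_step(backlog, cost)
        path.append((d, q))
    return path


# Two review periods with hypothetical arrival batches; cost = total number waiting.
path = run_discrete_review([[1, 1, 0, 1], [0, 1, 1, 0]], sum)
```

Nothing is matched between epochs, so each call to `review_step` sees the whole batch accumulated over the period; that accumulated batch is the source of flexibility discussed above.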
Remark 1. It is important that if $(Q, D)$ is a feasible solution to (4.6), then the feasibility of $Q$ for (4.7) does not require the DI condition. It follows, in particular, that regardless of any assumptions the optimal value in (4.7) serves as a lower bound for (4.6) and, in turn, for (2.6).
Remark 2 (when $Q$ is not unique). If, for each $x \ge 0$, the solution $Q(x)$ in (4.5) is guaranteed to be unique (say, if $C(\cdot)$ is strictly convex), the explicit form of $Q$ is not required in order to use our algorithm and generate the optimality results that follow. In the absence of such uniqueness we modify the algorithm and use the explicit $Q$.
At review epoch $t_m$, if there exists $d_m \ge 0$ such that $(d_m, Q(S(t_m)))$ satisfies (4.1c) (and, in turn, is optimal for (4.1a)-(4.1c)), choose this solution, i.e., set $q_m = Q(S(t_m))$. In words, at review epoch $t_m$ the algorithm chooses $Q(S(t_m))$ whenever feasible.
Proof of Theorem 1. Let $(Q, D)$ be an admissible solution for (4.6). Then, the first equation in (4.6) together with the fact that $D$ is an increasing process guarantee that $A(t) - A(s) \ge Q(t) - Q(s)$ for all $0 \le s \le t \le u$. Also, since $Y^T M = 0$ by construction, we have that $Y^T Q(t) = Y^T (q_0 + A(t) - M D(t)) = S(t)$. Hence, $Q$ is feasible for (4.7).

Next, we will show that if $Q$ is a solution to (4.7) then there exists a process $D$ such that $(Q, D)$ is a solution to (4.6). We construct the process $D(t)$ as follows: let $x(t) = q_0 + A(t) - Q(t)$, where $q_0$ is defined to equal $Q(0)$. Note that since $Q$ is a solution to (4.7) we have, in particular, that $x(t) \ge 0$ and $Y^T x(t) = 0$. Using Lemma 1, let $D(t)$ be the unique non-negative solution to $M D(t) = x(t)$. We claim that the process $D(t)$ constructed this way is, in fact, increasing. Indeed, by construction $M(D(t) - D(s)) = x(t) - x(s)$. Since $x(t) - x(s) \ge 0$ (by the second constraint in (4.7)) and since $Y^T x(t) = Y^T x(s) = 0$, we have by Lemma 1 that $D(t) - D(s)$ must be the unique (non-negative) solution to this system. Thus, $D(t)$ is increasing.

Finally, since $A(t)$ is RCLL and so is, by definition, $Q(t)$, they both have a finite number of discontinuity points on any finite interval. Thus, to show that the third constraint in (4.6) holds, it suffices to show that if $(s, t]$ is an interval such that $Q_i(u-) + \Delta A_i(u) = 0$ for all $u \in (s, t]$ then $D_j(t) - D_j(s) = 0$ for all $j \in \mathcal{J}(i)$. Since $Q \ge 0$ and $A$ is an increasing pure jump process we have on this interval that $Q(s) = 0$ for all $s < t$ and $\Delta A_i(u) = 0$ for all $u \in (s, t]$. In particular, $A_i(t) - A_i(s) = 0$. Suppose, to reach a contradiction, that there exists $j \in \mathcal{J}(i)$ with $D_j(t) - D_j(s) > 0$. In this case $(M(D(t) - D(s)))_i > 0$ and, by our construction of $D$, $Q_i(t) - Q_i(s) = A_i(t) - A_i(s) - (M(D(t) - D(s)))_i < 0$ which, since $Q(s) = 0$ for all $s < t$, is a contradiction to the non-negativity of $Q$.
General initial conditions. We end this section with a modification of the algorithm to accommodate general initial conditions. Such a modification is needed only if, at time 0, $q_0 \ne Q(Y^T q_0)$. Otherwise, if $q_0 = Q(Y^T q_0)$, then the algorithm can be used as presented. Let (4.8) ) $\ge 0$ and $Y^T x = 0$ so that, applying Lemma 1, there exists $d \ge 0$ that solves $M d = x$ and we can set . From here, we can proceed as in our original algorithm.
5. Asymptotic optimality for balanced networks. We consider a sequence of systems, indexed by $n$, in which the arrivals are accelerated: $\lambda^n_i(t) = n\lambda_i(t)$ is the instantaneous arrival rate of class $i$ items at time $t$, and $\Lambda^n_i(t) = n\Lambda_i(t)$ is the mean cumulative number of class $i$ item arrivals. We assume, without loss of generality, that $\lambda_{\max} = 1$, so that $n$ is interpreted as the maximal aggregate arrival rate over the time horizon. The review epochs are (5.1) $t^n_m = t^n_{m-1} + n^{-2/3}$ for $m = 1, 2, \ldots$, and we have at most $\lfloor u n^{2/3} \rfloor$ review epochs on $[0, u]$. If $t^n_0 = 0$, then $t^n_m = m n^{-2/3}$. Our convention is to superscript with $n$ any process or quantity associated with the $n$th network. Thus, for example, $q^n_0$ is the initial queue-length vector in the $n$th network. It is standard to construct non-stationary Poisson processes from unit-rate Poisson processes $(A_i, i \in I)$ as follows: $A^n_i(t) = A_i(n\Lambda_i(t))$. Given a control $D^n$, the queue process $Q^n$ is constructed as in (2.2)-(2.5). With some abuse of terminology we henceforth say that a sequence $\{D^n\}$ is an admissible control if $D^n$ is admissible for each $n$ (i.e., $D^n$ satisfies (2.2)-(2.5)).
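The time-change construction and the review grid can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: the function names are ours, `big_lambda` stands for a user-supplied cumulative rate function $\Lambda_i$, and the epochs assume $t^n_0 = 0$.

```python
import bisect
import math
import random

def unit_rate_poisson_times(horizon, rng):
    """Event times of a unit-rate Poisson process on [0, horizon]."""
    times = []
    t = rng.expovariate(1.0)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0)
    return times

def accelerated_arrivals(n, big_lambda, u, rng):
    """Return the counting function t -> A(n * Lambda(t)) for t in [0, u]."""
    events = unit_rate_poisson_times(n * big_lambda(u), rng)
    # The number of unit-rate events up to the time-changed clock n * Lambda(t).
    return lambda t: bisect.bisect_right(events, n * big_lambda(t))

def review_epochs(n, u):
    """Epochs m * n^(-2/3), m = 1, 2, ..., with about u * n^(2/3) of them."""
    step = n ** (-2.0 / 3.0)
    count = math.floor(u * n ** (2.0 / 3.0))
    return [m * step for m in range(1, count + 1)]
```

With a stationary rate, `big_lambda = lambda t: t`, the count at time 1 is a Poisson random variable with mean `n`, and the spacing between consecutive review epochs shrinks like `n ** (-2/3)`.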
From the functional central limit theorem for renewal processes and the random time change theorem it follows, when the underlying Poisson processes are independent, that (5.2) where $B$ is a standard $I$-dimensional Brownian motion. In turn,

When the system is fluid balanced, $Y^T \Lambda^n = 0$. Then, for continuous $Q$, and assuming $q^n_0/\sqrt{n} \to \hat q_0$ as $n \to \infty$,

In particular, the lower-bound cost is of the order of $\sqrt{n}$. The fluid queues would be 0 in this case, so that $\sqrt{n}$ is the cost of stochasticity. Following the standard notion of asymptotic optimality, we say that a control is asymptotically optimal if its optimality gap is negligible relative to $\sqrt{n}$.

The informal weak-convergence argument in the previous paragraph suggests that our proof of asymptotic optimality hinges on the following continuity assumption. Below, $\mathcal{M}$ is as in (3.5) and, recall, $|v|$ is the Euclidean norm of a vector $v$.

Assumption 2 (Lipschitz selection). There exists a function $Q(\cdot)$ and a constant $\kappa$ such that, for each $s \in \mathcal{M}$, $Q(s)$ is a minimizer in (4.5) and $|Q(s) - Q(s')| \le \kappa |s - s'|$ for all $s, s' \in \mathcal{M}$.

Example 2. The matching network in Figure 4 has the matching matrix and it clearly satisfies the DI assumption. The unique (up to a multiplicative constant) vector that spans $Y$ is $y = (-1, 1, -1)^T$. Consider the cost function The function $C$ is convex and strictly increasing if $c_i > 0$, $i = 1, 2, 3$. Next, the solution to $\operatorname{argmin}_{q \ge 0} \{C(q) : -q_1 + q_2 - q_3 = s\}$ for $s \in \mathcal{M}$ is and

Theorem 2. Fix $u > 0$ and suppose that Assumptions 1 and 2 hold, that $C(\cdot)$ is Lipschitz continuous, and that the network is fluid-balanced. If $q^n_0/\sqrt{n} \to \hat q_0$, then:
(i) For any admissible control $\{D^n\}$ and all $n$, ), for all $t \ge 0$ almost surely.
(ii) For the proposed control and $\{D^n_\star\}$ is asymptotically optimal.
Remark 3 (General arrival processes). We restricted Theorem 2 to Poisson arrivals as this facilitates covering stationary and non-stationary arrivals in one result by having the weak convergence (5.2). However, neither the Poisson assumption nor independence between the $I$ components of $A^n$ is required. The key requirement in the proof of Theorem 2 is that the bounds in Lemma A.2 in our appendix hold. This would be the case, for example, if $A^n_1, \ldots, A^n_I$ are renewal processes with a finite fifth moment for the inter-arrival time (see the proof of Lemma A.2 and the references therein).
We postpone the proof of Theorem 2 to Section 7, after we have considered the case of non-balanced networks.

$q^n(t) = q^n_0 + \Lambda^n(t) - M z^n(t)$, for all $0 \le t \le u$, $z^n(0) = 0$, $z^n$ is increasing,

When the network is balanced there exists a solution $(z^n(t), t \ge 0)$ to $M z^n(t) = \Lambda^n(t)$ so that if $q^n_0 = o(n)$, the queues can be kept small in fluid scale. For networks that are not fluid-balanced, the optimal queue-length fluid trajectory is non-zero and is, specifically, constructed using $Q$ as follows: multiplying the first constraint by $Y^T$ on both sides and using $Y^T M = 0$ gives $Y^T q^n(t) = Y^T (q^n_0 + \Lambda^n(t))$, so that $\int_0^u C(Q(Y^T (q^n_0 + \Lambda^n(s))))\,ds$ provides then a lower bound on the optimal cost for (6.1). Further, if $\bar q^n$ is such that $\bar q^n(t) - \bar q^n(s) \le \Lambda^n(t) - \Lambda^n(s)$ for all $0 \le s \le t \le u$, then $x_{s,t} = \Lambda^n(t) - \Lambda^n(s) - (\bar q^n(t) - \bar q^n(s)) \ge 0$ satisfies $Y^T x_{s,t} = 0$ so that, by Lemma 1, there exists a non-negative solution $d_{s,t}$ to $M d = x_{s,t}$. Constructing $z^n$ by setting $z^n(0) = 0$ and $z^n(t) - z^n(s) = d_{s,t} \ge 0$, we have that $(\bar q^n, z^n)$ is feasible and optimal for (6.1).

What we seek to achieve with our control is to track the stochastic fluctuations of the optimal trajectory around the fluid trajectory. To that end, we impose conditions that guarantee that $Q$ is well behaved in the vicinity of $Y^T (q^n_0 + \Lambda^n(t))$. For simplicity of exposition, we focus for the remainder of this section on stationary arrivals, i.e., on the case that $\lambda(t) \equiv \lambda > 0$, so that $\Lambda^n(t) = n\lambda t$. The proofs apply to non-stationary arrivals, but the conditions in that case are less transparent and we relegate them to the appendix.

Assumption 3 (Contraction). There exists a function $Q$ that satisfies Assumption 2 and a constant $\eta < 1$ such that

Assumption 4 (Homogeneous cost function). The function $C(\cdot)$ is homogeneous; that is, there exists $\delta$ such that for all $x \in \mathbb{R}^I_+$ and all $\kappa > 0$, $C(\kappa x) = \kappa^{\delta} C(x)$.
Notice that, if the network is fluid balanced, $Y^T \lambda = 0$ and $Q(Y^T \lambda) = 0$, so that Assumption 3 is trivially satisfied. In general, whether Assumption 3 holds or not is a property of both the cost function $C(\cdot)$ and the rate $\lambda$. Verifying this condition merely requires solving the optimization problem (4.5) at the single point $s = Y^T \lambda$. Consider, for example, the network in Figure 4 with linear holding costs $C(x) = h^T x$ with $h_1 = 0$ and $h_2 = h_3 > 0$. Suppose that arrivals are stationary with instantaneous rate $\lambda = (2, 1, 1)$. Then, $Q(Y^T \lambda) = (2, 0, 0) \not< (2, 1, 1)$ so that Assumption 3 is violated. In this same example, if there exists $\epsilon \in (0, 1)$ such that $\lambda_2 = 1 + \epsilon$, the assumption is satisfied.
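The check described in this example can be reproduced numerically. The sketch below is an illustration under stated assumptions: it hard-codes $y = (-1, 1, -1)^T$ from Example 2, uses a closed-form minimizer that is valid only for linear costs with $h \ge 0$, and tests the strict componentwise comparison that appears in the example rather than the formal statement of Assumption 3. The function names and the rate $(2, 1.5, 1)$ are ours.

```python
def imbalance(lam):
    """y^T lam for y = (-1, 1, -1)."""
    return -lam[0] + lam[1] - lam[2]

def q_linear(s, h):
    """A minimizer of h.q over q >= 0 subject to -q1 + q2 - q3 = s, h >= 0."""
    if s >= 0:
        # Only class 2 can carry a positive imbalance.
        return (0.0, float(s), 0.0)
    # A negative imbalance sits on the cheaper of classes 1 and 3.
    if h[0] <= h[2]:
        return (float(-s), 0.0, 0.0)
    return (0.0, 0.0, float(-s))

def strictly_below(q, lam):
    """Componentwise strict inequality q < lam."""
    return all(qi < li for qi, li in zip(q, lam))

h = (0.0, 1.0, 1.0)
for lam in [(2.0, 1.0, 1.0), (2.0, 1.5, 1.0)]:
    q = q_linear(imbalance(lam), h)
    print(lam, q, strictly_below(q, lam))
```

The first rate reproduces $Q(Y^T\lambda) = (2, 0, 0)$, which fails the comparison in its first coordinate; raising $\lambda_2$ above 1 shrinks the first coordinate of the minimizer below $\lambda_1$.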
The intuition behind this requirement is as follows: to keep the queue of class 3 small, the "capacity" of resource class 2, λ 2 , must be strictly greater than the input of class 3. The reason the inequality is strict is that the controller must have enough flexibility to guard against stochastic fluctuations in the arrival process that result in no class 2 jobs being present when class 3 jobs need them.
Note that with $\bar q^n_0 = 0$, Assumption 3 implies that qn (t) − qn (s The fact that (6.4) ), which, in words, means that the optimal fluid queue of each item class increases slower than the arrivals, is the only consequence of Assumption 3 that we use in our proofs. If (6.4) can be verified directly by computing $Q$, then the requirement that arrivals are time-homogeneous is not needed. In fact, homogeneity of the cost function, as in Assumption 4, is not necessary. More general (but less transparent) conditions are specified in the appendix; see Remark 5.

Theorem 3. Suppose that the network is not balanced. If, in addition to the assumptions of Theorem 2, $\lambda(t) \equiv \lambda$ and Assumptions 3 and 4 hold, then the conclusions of Theorem 2 continue to hold.
Non-Lipschitz cost functions. The assumption that $C$ is Lipschitz continuous in Theorems 2 and 3 can be relaxed, but that relaxation does not come for free. The difference between the minimum achievable cost and the lower bound cost may be larger than $\sqrt{n}$, and the definition of asymptotic optimality must be modified accordingly. Assuming that $Q$ is Lipschitz continuous, In turn, the lower bound cost should satisfy, in order of magnitude, that where is the local Lipschitz constant of the convex function $C$. A control is asymptotically optimal if its distance from the optimum is negligible compared to the "cost of stochasticity" $\sqrt{n} L(\bar q^n(t))$.
Definition 4. The control $\{D^n_\star\}$ is asymptotically optimal if, given $\epsilon > 0$, there exists $K$ such that, for all $\delta > 0$, (6.5)

Remark 4 (Lipschitz $C$). When the cost function $C$ is Lipschitz continuous, $L^n_u(K) \le \beta$ for all $n$ and $K$, where $\beta$ is the Lipschitz constant of $C$. In this case, asymptotic optimality reduces to the simpler requirement that, for all $\epsilon, \delta > 0$, lim sup which is consistent with Definition 3.

Theorem 4. Theorems 2 and 3 hold for non-Lipschitz cost functions with (ii) in Theorem 2 replaced by
(ii) For the proposed control $\{D^n_\star\}$, as $n \to \infty$, Furthermore, given $\epsilon > 0$, there exists $K$ such that for all $\delta > 0$ (6.7) and $D^n_\star$ is asymptotically optimal in the sense of Definition 4.
To make the connection between the bound (6.7) in Theorem 4 and that in Theorems 2 and 3 more concrete, recall that for Lipschitz continuous $C$, $\limsup_{n \to \infty} L^n_u(K) < \infty$ and the optimality gap alternatively, one has separable costs of the form Here, if $\bar q^n_u = 0$, we have a gap that is $o_P(\sqrt{n}^{\,m})$, but if $\bar q^n_u \approx n$ (as one expects, e.g., in the non-balanced case), the optimality gap is $o_P(n^{m-1}\sqrt{n})$.
7. Proof essentials. We prove Theorem 4, which is the most general statement of Theorems 2 and 3. The key to its proof is the following lemma showing that the queue length under the proposed control tracks the path of $Q(S(t))$. Let $r^n(u)$ be the number of review epochs by time $u$. That is,

Lemma 2. Fix $u \ge 0$ and suppose that the conditions of Theorem 4 hold. Then, under the proposed control $\{D^n_\star\}$, (7.1) lim inf )) for all $m = 1, \ldots, r^n(u)\} = 1$, and for any $\epsilon > 0$ there exist $K(\epsilon)$ and $t_0(\epsilon)$ such that

The proof of Lemma 2 requires Assumptions 3 and 4, and this is the only place those assumptions are used in the proof of Theorem 4.
Proof of Theorem 4. From Remark 1, the further lower bound (4.4) and its solution (4.5) it follows that for all n and all t ≥ 0, which immediately proves (5.5).
It only remains to prove (6.6). To that end, note that Since matches are only made at review epochs we have that This results in the upper bound To complete the proof, we argue that each of the terms (7.7)-(7.9) converges weakly to zero. The term (7.7) converges in probability to 0 by (7.1) in Lemma 2. The weak convergence of $A^n$ to a continuous limit process in (5.2) implies that the term (7.8) converges weakly to 0. Finally, to prove that (7.9) converges weakly to 0, note that for t and we used again the fact Λ(t The right-hand side converges to 0 by the weak convergence of $S^n$ to a continuous limit (5.3), and we conclude that (7.9) converges weakly to 0 and, in turn, that (6.6) holds.

8. Numerical experiments. For our experiments, we use the networks in Figures 1 and 3(ii). We refer to these as network I and network II, respectively. The DI condition holds for both networks. We consider a separable quadratic cost of the form $C(q) = \sum_i c_i q_i^2$ with the coefficients $c^T = (2, 1, 5, 7)$ for network I and $c^T = (3, 1, 5, 7, 0)$ for network II.
We fix the horizon to be $[0, 1]$. Given a sample path of the arrivals $(A(t), 0 \le t \le 1)$ we generate the corresponding sample paths of $(Q(S(t)), 0 \le t \le 1)$ and of $(Q(t, D_\star), 0 \le t \le 1)$. In the figures, $q^L_i = (Q_i(S(t)), t \ge 0)$ is the lower bound trajectory of queue $i$ and $q^a_i = (Q_i(t, D_\star), t \ge 0)$ is the trajectory of queue $i$ under the proposed algorithm. We also compute the associated costs $\int_0^1 C(Q(S(t)))\,dt$ and $\int_0^1 C(Q(t, D_\star))\,dt$. The fact that the gap between $q^L_i$ and $q^a_i$ and the gap between the costs are all small is the numerical manifestation of our mathematical result in Theorem 4. In our experiments, the algorithm cost stays close to the lower bound cost when the proposed solution is applied to a concrete network.
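Since queue-length sample paths are piecewise constant, the cost integrals above reduce to finite sums. The following sketch shows one way to compute them; the path representation (jump times plus the state held after each jump) and the function names are our assumptions, not the authors' code.

```python
def quadratic_cost(q, c):
    """Separable quadratic cost: sum_i c_i * q_i^2."""
    return sum(ci * qi * qi for ci, qi in zip(c, q))

def integrated_cost(jump_times, states, c, horizon=1.0):
    """Integrate C(Q(t)) over [0, horizon] for a right-continuous step path.

    states[k] is the queue vector held on [jump_times[k], jump_times[k + 1]);
    the last state is held until `horizon`.
    """
    total = 0.0
    for k, q in enumerate(states):
        start = jump_times[k]
        end = jump_times[k + 1] if k + 1 < len(jump_times) else horizon
        end = min(end, horizon)
        if end > start:
            total += quadratic_cost(q, c) * (end - start)
    return total
```

Applying the same function to the lower bound path and to the path under the algorithm gives the two costs that are compared in the figures.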
Figures 5 and 6 depict the sample paths for the stationary balanced and non-balanced networks. Assumption 4 holds for this example since $C$ is a homogeneous function. It is also checked (by solving numerically for $Q(Y^T \lambda)$) that Assumption 3 holds for Network I but is violated for Network II. Nevertheless, the algorithm performs well in both settings. The costs appear in a box in the top-left corner. Each figure also lists the arrival rates that are used.
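For the separable quadratic cost used in these experiments, the homogeneity required by Assumption 4 can be verified directly; the degree $\delta = 2$ below follows from that specific form of $C$.

```latex
C(\kappa q) = \sum_i c_i (\kappa q_i)^2
            = \kappa^2 \sum_i c_i q_i^2
            = \kappa^2 C(q), \qquad \kappa > 0 .
```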
Finally, we consider a non-stationary setting with sinusoidal arrival-rate functions of the form $\lambda_i(t) = a_i + b_i \sin(c_i t)$. The arrival rates are specified by three vectors $a^T = (a_1, \ldots, a_I)$, $b^T = (b_1, \ldots, b_I)$ and $c^T = (c_1, \ldots, c_I)$. All sample paths are initialized with empty queues at time $t = 0$. The result is displayed in Figure 7. While verifying the conditions for non-stationary non-balanced networks (as in the appendix) is complex, the algorithm cost remains close to the lower bound cost in the test settings.

9. Concluding remarks. We have introduced a matching queue model. The control question of interest is how to match items in order to minimize holding costs. We established that a simple myopic discrete-review matching control performs well, both analytically (by proving asymptotic optimality) and numerically (through simulation), for the broad class of networks that satisfy the dedicated item condition. The central idea of this work is the identification of an imbalance process that facilitates the construction of a lower bound and the design of a policy that asymptotically achieves this bound.

The analogue in parallel-server networks to the imbalance process is the workload process. In that setting, the workload process and workload formulations have played a central role in solving a variety of queueing control problems. A natural question is, then, whether the imbalance process can play a similar role in solving a variety of matching control problems. In what follows, we hint at how it could.
Imbalance-based control problems. The purpose of the discussion here is to illustrate how the results of this paper may be leveraged towards the control of more elaborate matching models. We do not pursue the construction of a formal framework or the proof of asymptotic optimality. Our derivation is purely formal and intended to illustrate how the imbalance process can potentially be used to construct a lower-dimensional control problem in other matching models.

A problem of input regulation/admission control serves this illustration purpose. Suppose that arrivals of class $i$ follow a (possibly non-stationary) Poisson process $E_i = (E_i(t), t \ge 0)$ with instantaneous rate $\lambda_i(t)$. In contrast to our original model, assume here that the controller may accept some of the arriving items and reject others upon arrival. Let $R_i = (R_i(t), t \ge 0)$ be a process that counts the number of class $i$ items rejected by time $t$. This is a control process.

[Figure: queue-length sample paths, lower bound ($q^L_1, \ldots, q^L_5$) versus algorithm ($q^a_1, \ldots, q^a_5$). Lower bound cost = 51745.89; algorithm cost = 51749.67.]
[Figure: queue-length sample paths, lower bound versus algorithm. Lower bound cost = 70926.87; algorithm cost = 70929.18.]
[Figure: queue-length sample paths, lower bound versus algorithm. Lower bound cost = 2110.864; algorithm cost = 2123.518.]
Suppose there is a penalty p i for rejecting a class i item.The relevant control problem is to minimize where we now have two controls -an admission control R and a matching control D.
Let $A_i(t) = E_i(t) - R_i(t)$ be the cumulative number of class $i$ items that enter the system by time $t$. The constraints (2.2)-(2.5) remain valid, and the constraint that $R_i$ can increase only at points of increase of $E_i$ must be added.
In the same spirit, the definition of the imbalance process remains It is then natural to consider the following analogue of the imbalance formulation (4.7) Using, instead of $S^R(t)$, the control-invariant process $S(t) = Y^T (q_0 + E(t))$, we rewrite the above as

\[
\begin{aligned}
\min\ & \int_0^u C(q(s))\,ds + p' R(u) \\
\text{s.t.}\ & Y^T (q(t) + R(t)) = S(t), && \text{for all } 0 \le t \le u, \\
& A(t) - A(s) \ge q(t) - q(s), && \text{for all } 0 \le s \le t \le u, \\
& E(t) - E(s) \ge R(t) - R(s), && \text{for all } 0 \le s \le t \le u, \\
& q(t) \ge 0, && \text{for all } 0 \le t \le u,
\end{aligned}
\]

which leads to the lower bound

\[
\begin{aligned}
\min\ & \int_0^u C(q(s))\,ds + p' R(u) \\
\text{s.t.}\ & Y^T (q(t) + R(t)) = S(t), && \text{for all } 0 \le t \le u, \\
& q(t) \ge 0, && \text{for all } 0 \le t \le u, \\
& R(t) \le E(t), && \text{for all } 0 \le t \le u.
\end{aligned}
\]

Despite the addition of the constraint on $R(t)$, a useful re-shuffling interpretation is maintained. Our lower bound, recall, had the interpretation of allowing the controller to take corrective actions by "unmatching" items that were already matched. Here, we allow the controller to "un-reject" items that were previously rejected. The solution to this lower bound may not be as simple as (4.4). Nevertheless, it is a significant simplification relative to the original formulation.

From a solution point of view, it stands to reason that the following modification of the action in the review epochs will generate near-optimal solutions: $\min\ C(q_m) + p' r_m$ s.t. $Y^T (q_m + \sum_{k=1}^{m} r_k) = S(t_m)$,

A formal study of this heuristic derivation and, more broadly, of a variety of control problems for matching networks seems a promising (and challenging) direction for future research.

APPENDIX
Proof of Lemma 1. First note that every $x \ge 0$ that satisfies $Y^T x = 0$ (as in the statement of the lemma) is in the image of $M$. This follows from the fact that $Y$ spans the orthogonal space to $M$ and thus $x$ (being orthogonal to $Y$) must be in the column space of $M$: there must exist $d$ such that $M d = x$. The result is now immediate from the DI condition. As noted in the paragraph following Assumption 1, with the DI condition, $M d = x \ge 0$ implies $d \ge 0$.
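The argument can be made concrete with a small sketch. It rests on our reading of the DI condition, namely that every match type uses at least one item class that takes part in no other match; under that reading each $d_j$ is read off from the row of its dedicated class, which is why $x \ge 0$ forces $d \ge 0$. The function names and the example matrix are hypothetical.

```python
def dedicated_classes(M):
    """For each match j, find a class used by match j and by no other match."""
    n_classes, n_matches = len(M), len(M[0])
    dedicated = []
    for j in range(n_matches):
        rows = [i for i in range(n_classes)
                if M[i][j] != 0
                and all(M[i][k] == 0 for k in range(n_matches) if k != j)]
        if not rows:
            raise ValueError(f"match {j} has no dedicated class")
        dedicated.append(rows[0])
    return dedicated

def solve_lemma1(M, x, tol=1e-9):
    """Solve M d = x by reading each d_j off its dedicated class."""
    dedicated = dedicated_classes(M)
    d = [x[i] / M[i][j] for j, i in enumerate(dedicated)]
    # Verify the remaining rows; they hold whenever x is in the image of M.
    for i, row in enumerate(M):
        if abs(sum(row[j] * d[j] for j in range(len(d))) - x[i]) > tol:
            raise ValueError("x is not in the column space of M")
    return d
```

For the hypothetical matrix `[[1, 0], [1, 1], [0, 1]]` and `x = [2, 3, 1]`, the solution is two matches of the first type and one of the second; a vector such as `[1, 1, 1]`, which is not orthogonal to $y = (-1, 1, -1)^T$, is rejected.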
Proof of Lemma 2. For $x \in \mathcal{M}^n_t = \{y \in \mathbb{R}^{I-J} : Y^T (q^n_0 + \Lambda^n(t)) + y \in \mathcal{M}\}$, define $Q^n_t(x) = Q(Y^T (q^n_0 + \Lambda^n(t)) + x)$ and $\tilde Q^n_t(x) = Q^n_t(x) - \bar q^n(t)$.
Recall that $\bar q^n(t) = Q(Y^T (q^n_0 + \Lambda^n(t)))$, so that $\tilde Q^n_t(x)$ captures the effect of second-order perturbations around the optimal fluid trajectory; see (A. so that the result follows recalling that, by assumption, $h(\epsilon)$ satisfies $\epsilon^{-3/4} h(\epsilon) \to \infty$ as $\epsilon \to 0$.
Proof of Lemma A.2. The first part follows from [1] (see the explanation in the proof of Lemma 4.1 in [12]). We do not repeat the proof. For the second part, from strong approximation theorems follows the existence of an $I$-dimensional Brownian motion $B$ and a constant $c$ (depending on $u$) such that lim sup x n m > α 2 n 1/3 .
We will argue that each of the elements on the right-hand side converges to 0 as $n \to \infty$, starting with the second element. Define h(ε) = ε

Fig 2. The simplest matching network: one matching type and two item classes.

Fig 4. An example with one resource class and a one-dimensional matrix Y.