DIFFUSION APPROXIMATION FOR AN INPUT-QUEUED SWITCH OPERATING UNDER A MAXIMUM WEIGHT MATCHING POLICY

For N ≥ 2, we consider an N × N input-queued switch operating under a maximum weight matching policy. We establish a diffusion approximation for a (2N − 1)-dimensional workload process associated with this switch when all input ports and output ports are heavily loaded. The diffusion process is a semimartingale reflecting Brownian motion living in a polyhedral cone with $N^2$ boundary faces, each of which has an associated constant direction of reflection. Our proof builds on our own prior work [13] on an invariance principle for semimartingale reflecting Brownian motions in piecewise smooth domains and on a multiplicative state space collapse result for switched networks established by Shah and Wischik in [19].

1. Introduction. Input-queued crossbar switches are widely used in Internet routers. The main control feature in such switches is the choice of a scheduling policy for the transfer of packets from input ports to output ports in each time slot. A policy that has received considerable attention is the so-called maximum weight matching (MWM) policy. At each time step, this policy chooses a matching (or bijection) from the input ports to the output ports that maximizes the sum of the lengths of the virtual output queues served by the matching. Under various assumptions on the arrival processes, rate stability has been established for the MWM policy, provided the load placed on each of the input and output ports is less than its maximum capacity [7,11,21].
Recently, Shah and Wischik [19] have studied the asymptotic behavior of fluid models for heavily loaded switched networks operating under maximum-weight-like policies and used this behavior to prove a multiplicative state space collapse result for such networks. In this paper, assuming that multiplicative state space collapse holds, we establish a diffusion approximation for an N × N (where N ≥ 2) input-queued switch operating under the MWM policy when all of the input ports and output ports are heavily loaded. (This contrasts with the situation considered in Stolyar [20] for a generalized switch model, where in effect only one input port or one output port is heavily loaded.) The diffusion is a semimartingale reflecting Brownian motion living in a polyhedral cone in $\mathbb{R}^{2N-1}_+$ with $N^2$ boundary faces, each of which has an associated constant direction of reflection. Our proof builds on our own prior work [13] on an invariance principle for semimartingale reflecting Brownian motions in piecewise smooth domains. When combined with the multiplicative state space collapse result of Shah and Wischik [19], specialized to single-hop networks operating under the MWM policy, this yields a new diffusion approximation result for a heavily loaded N × N input-queued switch operating under the MWM policy.
Our interest in studying this problem stems not only from the application, but also from the fact that an input-queued switch can be viewed as an example of a stochastic processing network with head-of-the-line (HL) service [10] in which each activity can simultaneously process packets from more than one queue. This is similar to an assembly or joining operation familiar in manufacturing systems. Another type of stochastic processing network that involves simultaneous actions arises from bandwidth sharing models considered recently by several authors; see, e.g., [12,17,23]. However, in these models, the activities simultaneously use multiple resources or servers to process jobs; in other words, the simultaneous aspect is at the opposite end of the activity from that in the case of an input-queued switch. Furthermore, in both HL multiclass queueing networks and in bandwidth sharing models, the workload for a single server is typically one-dimensional, whereas for a single N × N input-queued switch, which has only one server, the workload dimension is 2N − 1 when all input and output ports are heavily loaded. The analysis of stochastic processing networks is still in its early stages of development and the study of this concrete switch example, with its simultaneous processing action, provides valuable information that can guide development of a general theory.
There are several possible directions for further research related to diffusion approximations for switches. Here we have focused on the situation where all input and output ports are heavily loaded. At the other end of the spectrum, Stolyar [20] has established a diffusion approximation for the situation when only one input or one output port is heavily loaded. It would be interesting to develop results where only some input and output ports are heavily loaded. Such an approximation does not follow from our results as we require a non-degenerate covariance matrix for the uniqueness of our diffusion. Indeed, in light of the complexities of the geometry of the state space and directions of reflection for our diffusion approximation, it is likely that some care is required to interpolate between the single heavily loaded port case of Stolyar and the all ports heavily loaded case we treat here. In addition to this interesting direction, one might also consider investigating other scheduling policies. In this paper, we have restricted attention to the maximum weight matching (MWM) policy. However, Shah and Wischik [19] have established multiplicative state space collapse for some generalizations of this policy. In particular, one class of such policies is obtained when the virtual output queue length is replaced by its $\alpha$th power in the criterion for determining an optimal matching, yielding the MW-α policies for α ∈ (0, ∞). (The MWM policy considered in this paper corresponds to α = 1.) The multiplicative state space collapse result established in [19] under such a policy leads to a natural conjecture for a diffusion approximation to the workload process; this diffusion lives in a cone with piecewise smooth curved boundaries when α ≠ 1. While we believe that the invariance principle in [13] can be adapted to justify this approximation for α ∈ (1, ∞) for small N, the complexities of the geometry make this a daunting task as N increases. For α ∈ (0, 1), in addition to the geometric complexity, there is not yet even an existence and uniqueness theory (nor an associated invariance principle) for the proposed diffusion process. Shah and Wischik [19] also considered multi-hop switched networks, and it would be interesting to see if a diffusion approximation could be established for that more general network setting. The interested reader may also wish to consult the recent work of Shah et al. [15,16,18] who, for several policies, have investigated bounds on the total system workload for input-queued switches as a function of both N and the distance from the heavy-traffic boundary.
This paper is organized as follows. In Section 2, we introduce our model for an input-queued switch operating under a maximum weight matching policy. We also introduce the key performance processes there, namely, an $N^2$-dimensional queue-length process with one component for each input-output pair and a (2N − 1)-dimensional workload process which tracks the work (measured in packets) waiting in each of the first (N − 1) input ports, the work waiting for each of the first (N − 1) output ports, and the total work waiting in the system. In Section 3, we introduce the setup for our heavy traffic diffusion approximation result. In particular, we consider an N × N input-queued switch with a sequence of arrival processes, where the load on all input and output ports approaches full capacity as one moves along the sequence. The arrival processes are assumed to satisfy functional central limit theorems.
In Section 4, we introduce a critical fluid model associated with the switch model. We recall the characterization of its invariant states and a sufficient condition for multiplicative state space collapse given in [19]. The definition of our diffusion process (which is a semimartingale reflecting Brownian motion living in a polyhedral cone) is given in Section 5, where the main result of this paper is also stated. The proof of this result is given in Section 6. When coupled with the multiplicative state space collapse result of Shah and Wischik [19], our main result yields a diffusion approximation for the workload process (and hence for the queue-length process) for a heavily loaded input-queued switch. In Section 7, with a view to future research, we discuss the main difficulty in proving diffusion approximations for an input-queued switch operating under the MW-α policies for α ≠ 1.
1.1. Notation and terminology. The following notation will be used throughout the paper. The symbol $\mathbb{N}$ will denote the set of strictly positive integers and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ will denote the set of non-negative integers. The symbol $\mathbb{R}$ will denote the set of real numbers and $\mathbb{R}_+ = [0, \infty)$. We denote the inner product on $\mathbb{R}^d$ by $\langle \cdot, \cdot \rangle$, that is, $\langle x, y \rangle = \sum_{i=1}^{d} x_i y_i$ for $x, y \in \mathbb{R}^d$. The usual Euclidean norm on $\mathbb{R}^d$ will be denoted by $|\cdot|$ so that $|x| = (\sum_{i=1}^{d} x_i^2)^{1/2}$ for $x \in \mathbb{R}^d$. For $x, y \in \mathbb{R}^d$, $x \ge y$ means $x_i \ge y_i$ for all i = 1, . . ., d. We will let A′ denote the transpose of A, where A is a matrix or a vector. Let |A| denote the norm of an m × n matrix A (for $m, n \in \mathbb{N}$). We denote by $D([0, \infty), \mathbb{R}^d)$ (for some $d \in \mathbb{N}$) the space of right continuous functions with finite left limits (r.c.l.l. functions) from $\mathbb{R}_+$ into $\mathbb{R}^d$ and we endow this space with the usual Skorokhod $J_1$-topology (see Section 5 of Chapter 3 of Ethier and Kurtz [9]) under which it is a Polish space. We denote by $\mathcal{M}^d$ the Borel σ-algebra on $D([0, \infty), \mathbb{R}^d)$ and by $C([0, \infty), \mathbb{R}^d)$ the subset of continuous functions in $D([0, \infty), \mathbb{R}^d)$. All continuous time stochastic processes used in this paper will be assumed to have r.c.l.l. paths in some Euclidean space. Consider $W, W^1, W^2, \ldots$, each of which is a d-dimensional process (possibly defined on different probability spaces). The sequence $\{W^n\}_{n=1}^{\infty}$ is said to be C-tight if the probability measures induced by the sequence $\{W^n\}_{n=1}^{\infty}$ on $(D([0, \infty), \mathbb{R}^d), \mathcal{M}^d)$ form a tight sequence and if each limit point, obtained as a limit in distribution along a subsequence, almost surely has sample paths in $C([0, \infty), \mathbb{R}^d)$. The notation "$W^n \Rightarrow W$" will mean that "$W^n$ converges in distribution to W as n → ∞".
Throughout this paper, we shall use Propositions to state previously known or standard results. Lemmas will state supporting results proved here and the Theorem will indicate our main result.

2. Model description.
Fix an integer N ≥ 2. The structure for an N ×N input-queued switch is illustrated in Figure 1.
An N × N input-queued switch has N input ports and N output ports. Time is slotted so that packets of fixed size arrive at the switch at the beginning of a time slot. For concreteness, time slot n denotes the time interval [n − 1, n), n ∈ N, and we assume zero packets arrive at time zero. Packets arriving at input port i and destined for output port j are stored in a first-in-first-out (FIFO) buffer called a "Virtual Output Queue" (VOQ), denoted here by $\mathrm{VOQ}_{ij}$. The use of VOQs avoids the so-called head-of-line blocking phenomenon (cf. [1] and [11]). Thus, there are $N^2$ separate VOQs, one for each input-output pair. The packets arriving at an input port are switched from that port to the appropriate output port by a crossbar fabric. At the beginning of each time slot (just after packets have arrived for the time slot), a matching policy specifies which input ports are to be connected to which output ports during the time slot. At the end of that time slot, input port i transfers one packet to output port j if they are matched to one another and $\mathrm{VOQ}_{ij}$ is non-empty. In each time slot, each input port can transmit at most one packet and each output port can receive at most one packet. Due to the constraints just mentioned, this scheduling amounts to choosing a bipartite matching (or bijection) between the sets of input ports and output ports at the beginning of each time slot. Such a matching may connect input port i to output port j even if $\mathrm{VOQ}_{ij}$ is empty, in which case no packet is transmitted and the system is said to "transmit a blank".

2.2. Stochastic primitives.
For each $i, j \in I \doteq \{1, 2, \ldots, N\}$ and integer n ∈ N, let $I_{ij}(n)$ denote the number of packets that have arrived to input port i that are destined for output port j in the time interval (0, n], and set $I_{ij}(0) = 0$. Note that these packets are stored in $\mathrm{VOQ}_{ij}$. We assume that, for i, j ∈ I, $I_{ij}(\cdot)$ is defined from a sequence of i.i.d., nonnegative, integer valued random variables $\{\vartheta_{ij}(k) : k \in \mathbb{N}\}$ with mean $\lambda_{ij} \in (0, \infty)$ and variance $b_{ij} \in (0, \infty)$, where for each k ∈ N, $\vartheta_{ij}(k)$ denotes the number of packets that arrive to input port i and that are destined for output port j at time k. Then $I_{ij}(n)$ has the representation: (1) $I_{ij}(n) = \sum_{k=1}^{n} \vartheta_{ij}(k)$, $n \in \mathbb{N}_0$, where an empty sum is defined to be zero. We assume that the sequences $\{\vartheta_{ij}(k) : k \in \mathbb{N}\}$ for i, j ∈ I are mutually independent, and so the processes $I_{ij}(\cdot)$ for i, j ∈ I are mutually independent. For each $n \in \mathbb{N}_0$, let I(n) denote the $N^2$-dimensional vector determined by $\{I_{ij}(n) : i, j \in I\}$ such that $I_{ij}(n)$ is the $((i − 1)N + j)$th entry of I(n). This notation allows us to refer to the $N^2$-dimensional vector I(n) rather than an N × N matrix.
2.3. Scheduling policy: Maximum weight matching. The N × N input-queued switch is assumed to be operated under a maximum weight matching policy, denoted by MWM policy. At the beginning of each time slot (just after the arrival of new packets), a matching is chosen by the MWM policy to connect input ports and output ports during that time slot. The matching is chosen as a function of the number of packets in each of the VOQs at the beginning of the time slot. For i, j ∈ I and $n \in \mathbb{N}_0$, let $Q_{ij}(n)$ denote the number of packets in $\mathrm{VOQ}_{ij}$ at time n (immediately after the arrival of any packets at time n), and let Q(n) denote the $N^2$-dimensional vector whose $((i − 1)N + j)$th entry is $Q_{ij}(n)$. Then (2) $Q_{ij}(n) = Q_{ij}(0) + I_{ij}(n) − D_{ij}(n)$, where $D_{ij}(n)$ denotes the cumulative number of packets that have departed from $\mathrm{VOQ}_{ij}$ by time n. For concreteness, we imagine the packets departing at the end of each time slot, just before the beginning of the next time slot, e.g., $D_{ij}(1)$ is the number of packets that departed during the time slot [0, 1). We next describe the MWM policy more precisely, which, in turn, specifies the form of $D_{ij}(\cdot)$. We can represent a matching π by an N × N matrix of zeros and ones $\pi = [\pi_{ij}]$, where $\pi_{ij} = 1$ if input port i is connected to output port j, otherwise $\pi_{ij} = 0$. The matrix π has exactly one 1 in each row and column. It can be viewed as representing a permutation of N elements and there are a total of N! distinct matchings. We denote the set of distinct matchings by Π. At time n for $n \in \mathbb{N}_0$, the MWM policy chooses a matching in the following way: immediately after the arrivals of packets at time n, for each matching π, the weight of π over the time interval [n, n + 1) is $\sum_{i,j \in I} \pi_{ij} Q_{ij}(n)$. Then, the MWM policy chooses the matching $\pi^*(Q(n)) \in \arg\max_{\pi \in \Pi} \sum_{i,j \in I} \pi_{ij} Q_{ij}(n)$ for use in the time interval [n, n + 1). (If there is more than one matching with maximal weight, a deterministic ordering of matchings is assumed and the earliest matching in the ordered list is used.) At the end of the interval [n, n + 1), packets are transferred according to the matching $\pi^*(Q(n))$.
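To make the selection rule concrete, the following is a minimal Python sketch of choosing a maximum weight matching by brute force over all N! permutations, which is only sensible for small N. The function name `mwm_matching` and the 0-based encoding of a matching as a tuple are illustrative assumptions, and the lexicographic order produced by `itertools.permutations` is used here as one possible choice of the deterministic ordering for breaking ties; the paper does not specify a particular ordering.

```python
from itertools import permutations


def mwm_matching(Q):
    """Return (matching, weight) for the queue-length matrix Q.

    Q[i][j] is the number of packets in VOQ_ij (0-based indices here).
    A matching is encoded as a tuple perm with perm[i] = j, meaning that
    input port i is connected to output port j.
    """
    n = len(Q)
    best_perm, best_weight = None, -1
    # permutations() yields tuples in lexicographic order; that order is an
    # assumed stand-in for the fixed deterministic ordering of matchings.
    for perm in permutations(range(n)):
        weight = sum(Q[i][perm[i]] for i in range(n))
        if weight > best_weight:  # strict inequality keeps the earliest maximizer
            best_perm, best_weight = perm, weight
    return best_perm, best_weight


if __name__ == "__main__":
    # Identity matching has weight 3 + 5 = 8, the swap has weight 1 + 2 = 3.
    print(mwm_matching([[3, 1], [2, 5]]))
```

For larger N one would replace the enumeration with a polynomial-time assignment algorithm; the brute-force loop is kept here only because it mirrors the definition directly.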
For each i, j ∈ I and n ∈ N 0 , we can now give a description of D ij (n).Let D ij (0) = 0.For each matching π, let T π (0) = 0 and for n ∈ N, let T π (n) denote the total number of time slots in [0, n) that the matching π has been used by the MWM policy.Then, Combining (2) and (7) we see that is the cumulative number of "blanks" transmitted from input port i to output port j up to time n.
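The slot-by-slot dynamics described in this section can be sketched as a single update step. This is a hedged illustration, not the paper's formal construction: the function name `mwm_step`, the 0-based indexing, and the use of lexicographic order to break ties are assumptions. The step selects a maximum weight matching from the queue lengths at time n, removes one packet from each matched non-empty VOQ at the end of the slot, records a blank for each matched empty VOQ, and then adds the arrivals at time n + 1.

```python
from itertools import permutations


def mwm_step(Q, next_arrivals):
    """Advance the switch by one time slot under an MWM rule.

    Q             : N x N list, queue lengths at time n (after arrivals at n).
    next_arrivals : N x N list, packets arriving at time n + 1.
    Returns (Q_next, matching, blanks), where blanks lists the pairs (i, j)
    that were matched although VOQ_ij was empty.
    """
    n = len(Q)
    # max() returns the first maximizer, so ties are broken by the
    # lexicographic order of permutations (an assumed tie-breaking rule).
    perm = max(permutations(range(n)),
               key=lambda p: sum(Q[i][p[i]] for i in range(n)))
    Q_next = [row[:] for row in Q]
    blanks = []
    for i, j in enumerate(perm):
        if Q_next[i][j] > 0:
            Q_next[i][j] -= 1      # one packet departs at the end of the slot
        else:
            blanks.append((i, j))  # matched but empty: a blank is transmitted
    for i in range(n):
        for j in range(n):
            Q_next[i][j] += next_arrivals[i][j]
    return Q_next, perm, blanks


if __name__ == "__main__":
    print(mwm_step([[2, 0], [0, 0]], [[0, 1], [0, 0]]))
```

Accumulating the `blanks` output over time slots gives, for each pair (i, j), the cumulative number of blanks transmitted, which is the quantity that later acts as the pushing term in the workload equations.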

2.4. Workload process.
In this subsection we define a (2N −1)-dimensional workload process W from the queue-length process Q.
For n ∈ N 0 and i, j ∈ I \ {N }, define For i, j ∈ I \ {N } and n ∈ N 0 , the quantity W i (n) is the amount of work (measured in packets) that is waiting at input port i at time n, W N −1+j (n) is the amount of work destined for output port j that is waiting to be transmitted at time n, and W 2N −1 (n) is the total amount of work in the system at time n.We have used this non-symmetric form of workload process, rather than the symmetric 2N -dimensional process that has one workload component for each input port and each output port, because the former has no redundant components whereas the latter has one redundant component.Note in particular that the workload at input port N is given by Then the workload process W can also be written in the following compact form: Let A ij denote the ((i−1)N +j) th column of A for i, j ∈ I and A ij k denote the k th element of this column for k = 1, . . ., 2N − 1.Then ( 12) and for any π ∈ Π, i,j∈I Thus, combining ( 5), ( 8), ( 11) and ( 13), we have that for each n ∈ N 0 , The following lemma shows that the matrix A has full row rank.
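As a small illustration of the verbal description above, the following Python sketch computes the (2N − 1)-dimensional workload vector from an N × N matrix of queue lengths: the first N − 1 entries are row sums (work waiting at input ports 1, ..., N − 1), the next N − 1 entries are column sums (work destined for output ports 1, ..., N − 1), and the last entry is the total. The function name `workload` and the 0-based list layout are assumptions made for the example.

```python
def workload(Q):
    """Workload vector of length 2N - 1 for the queue-length matrix Q."""
    n = len(Q)
    row_sums = [sum(Q[i]) for i in range(n)]                        # per input port
    col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]   # per output port
    total = sum(row_sums)                                           # all work in the system
    # Drop the N-th input and N-th output components: they are redundant,
    # since each equals the total minus the other components of its kind.
    return row_sums[: n - 1] + col_sums[: n - 1] + [total]


if __name__ == "__main__":
    Q = [[1, 2], [3, 4]]
    w = workload(Q)
    print(w)
    # The workload at input port N is recovered from the total:
    print(w[-1] - sum(w[: len(Q) - 1]))
```

The second printed value shows why no information is lost by omitting the last input port: its workload is the total minus the workloads of the other input ports.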
Lemma 2.1. The rows of the matrix A are linearly independent, hence A has row rank 2N − 1.
Proof.Let {A k : k = 1, . . ., 2N − 1} denote the set of rows of the matrix A and let {c k : k = 1, . . ., 2N − 1} be a set of constants such that In particular, c N = 0.Moreover, we can also observe that the (( Then we get that c 2N −1 = 0, and hence c i = 0 for each 1 ≤ i ≤ N − 1.That proves the lemma. Next we introduce two additional matrices.Let B be the 2N × N 2 matrix such that (e ′ i , e ′ j ) ′ is the ((i − 1)N + j) th column of B for i, j ∈ I, where for each i ∈ I, e i ∈ R N is the i th unit coordinate vector in R N .For example, when N = 2, the matrix B has the form It is not difficult to see that the matrix B has row rank 2N − 1.In fact, BQ would define a symmetric version of workload.Let C be the 2N × (2N − 1) matrix such that ( 16) It is readily verified that ( 17) We end this subsection by introducing the following rank property of the matrix AB ′ .
Proof.It suffices to show the null space relation: since if this holds, from the fact that B ′ has column rank 2N − 1, we have N (AB ′ ) is one-dimensional, and then since AB ′ is a (2N − 1) × 2N matrix, AB ′ has full row rank.
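The rank statements in this subsection can be checked numerically for small N. The sketch below is an illustration under stated assumptions: the matrix A is rebuilt from the verbal definition of the workload (rows for the first N − 1 input ports, the first N − 1 output ports, and the total, with column (i − 1)N + j corresponding to the pair (i, j)), B is built so that its column for the pair (i, j) is $(e_i', e_j')'$, and ranks are computed by exact Gaussian elimination over the rationals. The helper names are hypothetical.

```python
from fractions import Fraction


def build_A(n):
    """(2n-1) x n^2 workload matrix, columns ordered as (i, j) -> (i-1)n + j."""
    cols = [(i, j) for i in range(n) for j in range(n)]
    rows = [[1 if i == a else 0 for (i, j) in cols] for a in range(n - 1)]
    rows += [[1 if j == b else 0 for (i, j) in cols] for b in range(n - 1)]
    rows.append([1] * len(cols))  # total work in the system
    return rows


def build_B(n):
    """2n x n^2 matrix whose column for (i, j) is (e_i', e_j')'."""
    cols = [(i, j) for i in range(n) for j in range(n)]
    rows = [[1 if i == a else 0 for (i, j) in cols] for a in range(n)]
    rows += [[1 if j == b else 0 for (i, j) in cols] for b in range(n)]
    return rows


def transpose(X):
    return [list(c) for c in zip(*X)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M):
    """Rank via exact Gauss-Jordan elimination with Fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((k for k in range(r, len(M)) if M[k][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for k in range(len(M)):
            if k != r and M[k][c] != 0:
                f = M[k][c] / M[r][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[r])]
        r += 1
        if r == len(M):
            break
    return r


if __name__ == "__main__":
    for n in (2, 3, 4):
        A, B = build_A(n), build_B(n)
        print(n, rank(A), rank(B), rank(matmul(A, transpose(B))))
```

For each tested N, the three printed ranks are expected to equal 2N − 1, in line with Lemma 2.1 and the rank property of AB′. A numerical check for a few values of N is of course no substitute for the proofs.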
2.5.State descriptor.The dynamics of the switch are described by the following collection of processes: We extend χ(•), and its constituent processes, to be defined on R + such that for each t ∈ R + , where ⌊t⌋ denotes the integer part of t.

3. Sequence of systems and scaling.
To establish a diffusion approximation for our switch model, we now consider a sequence of switch models indexed by r where r tends to infinity through a sequence of positive real values.(To ease notation, we suppress the sequence indexing on r.)The basic switch structure with associated matrices A, B and C does not vary with r.Each member of the sequence is a stochastic system as described in the previous section.We append a superscript of r to any process, sequence of random variables or parameter associated with the r th system that can vary with r.Thus, we have processes I r , {T r π : π ∈ Π}, U r , D r , Q r , W r , sequences of random variables {ϑ r ij (k) : k ∈ N} for i, j ∈ I, and parameters λ r ij and b r ij for i, j ∈ I.The r th switch model has the associated state descriptor 3.1.Heavy traffic assumption.To obtain a heavy traffic diffusion approximation for our sequence of switch models, we impose the following heavy traffic condition.Let λ r denote the N 2 -dimensional vector determined by {λ r ij : i, j ∈ I} such that λ r ij is the ((i − 1)N + j) th entry of λ r .
We note that (20)-(21) imply that Aλ = v. This condition can be interpreted as meaning that in the heavy traffic limit, all of the input ports and output ports are heavily loaded, that is, $\sum_{j=1}^{N} \lambda_{ij} = 1$ for i ∈ I and $\sum_{i=1}^{N} \lambda_{ij} = 1$ for j ∈ I. Thus, λ, viewed as an N × N matrix, is doubly stochastic and by the Birkhoff-von Neumann theorem (see [2], Theorem 5.6 for a proof), it can be expressed as a convex combination of the permutation matrices π ∈ Π.
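The Birkhoff-von Neumann representation can be illustrated with a small greedy decomposition. This is a sketch under assumptions, not a construction taken from the paper: the function name `birkhoff_decomposition` is hypothetical, permutations are found by brute-force enumeration (so it is only practical for small N), and exact rational arithmetic is used so that the loop terminates cleanly. At each step a permutation supported on the strictly positive entries of the residual matrix is found, and the smallest entry along it is subtracted.

```python
from fractions import Fraction
from itertools import permutations


def birkhoff_decomposition(lam):
    """Decompose a doubly stochastic matrix into weighted permutations.

    lam : N x N list of Fractions (or ints) with all row and column sums 1.
    Returns a list of (coefficient, perm) with perm[i] = j, such that
    lam = sum of coefficient * (permutation matrix of perm).
    """
    n = len(lam)
    residual = [[Fraction(x) for x in row] for row in lam]
    terms = []
    while any(x > 0 for row in residual for x in row):
        # Birkhoff's theorem guarantees such a permutation exists as long as
        # the residual is a positive multiple of a doubly stochastic matrix.
        perm = next(p for p in permutations(range(n))
                    if all(residual[i][p[i]] > 0 for i in range(n)))
        coeff = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= coeff  # zeroes at least one entry
        terms.append((coeff, perm))
    return terms


if __name__ == "__main__":
    half = Fraction(1, 2)
    for coeff, perm in birkhoff_decomposition([[half, half], [half, half]]):
        print(coeff, perm)
```

In the heavy traffic limit the coefficients can be read as the long-run fractions of time slots in which each matching would need to be used to keep up with the arrival rates.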

3.2. Diffusion scaling.
Let us consider the following diffusion scaling.For r > 0 and t ∈ R + , we define For each t ∈ R + , it follows from ( 14) and ( 22)-(25) that For each i, j ∈ I, recall that A ij denotes the ((i − 1)N + j) th column of the matrix A, and let Note that since T r π , Q r only jump at positive integer times, it follows from (9) that Y r ij can jump at time s only if Q r ij (s−) = 0 and so for each i, j ∈ I, 3.3.Functional central limit theorem for stochastic primitives.We now introduce assumptions which will imply a functional central limit theorem for the diffusion scaled external packet arrival processes { I r : r > 0}.
Assumption 3.2.For each r and i, j with finite mean λ r ij and finite variance b r ij satisfying ( 20)-( 21) and (31) b r ij → b ij as r → ∞, and the following condition (of Lindeberg type) holds: For each r, the sequences {ϑ r ij (k) : k ∈ N} for i, j ∈ I are mutually independent.
Proposition 3.1.Let ν be a probability measure on R 2N −1 .Suppose that Assumptions 3.1 and 3.2 hold and W r (0) converges in distribution as r → ∞ to a (2N − 1)-dimensional random variable W 0 with distribution ν.Then, where W 0 is independent of I and I is an N 2 -dimensional Brownian motion starting from the origin that has zero drift and This proposition follows directly from the standard functional central limit theorems for triangular arrays (cf.Theorem 18.2 of [3]).We assume henceforth that Assumptions 3.1 and 3.2 hold.
4. Fluid model, invariant states and multiplicative state space collapse. In this section, we consider a critical fluid model for the input-queued switch. We specialize to our situation some items developed for switched networks with maximum-weight-like policies in [19], namely, the form of the model, the characterization of invariant states and sufficient conditions for multiplicative state space collapse. Throughout this section, we assume the critical loading condition Aλ = v for $\lambda \in \mathbb{R}^{N^2}_+$ satisfying λ > 0.

, d, is absolutely continuous. A regular time for an absolutely continuous function
+ is a value of t ∈ (0, ∞) at which each component of f is differentiable.(Since f is absolutely continuous, almost every time t ∈ (0, ∞) is a regular time for f and f can be recovered via integration of its a.e.defined derivative.)A uniformly Lipschitz continuous function is absolutely continuous.
+ such that there exist two families of nondecreasing, nonnegative, continuous functions {U ij : i, j ∈ I} and {T π : π ∈ Π} defined from [0, ∞) into R + satisfying for all i, j ∈ I and t ∈ R + , and for all π ∈ Π, It is clear that {T π : π ∈ Π} are Lipschitz continuous with Lipschitz constant one and it follows from the oscillation inequality (cf.Theorem 5.1 of Williams [22]) for solutions of the one-dimensional Skorokhod problem that {Q ij , U ij : i, j ∈ I} are uniformly Lipschitz continuous as well.In particular, almost every t > 0 is a regular time for {T π : π ∈ Π}, Q, U , and at such a time, for all i, j ∈ I, + is an invariant state for the fluid model if there exists a fluid model solution Q(•) such that Q(t) = q for all t ∈ R + .
For the situation of critical loading (Aλ = v) treated here, the following is a version of an optimization problem considered by Shah and Wischik [19]; see Definition 5.3 therein with f (x) = x.(In fact, Shah and Wischik [19] treated a more general situation, where some input and output ports can be underloaded.)Using this optimization problem, results in [19] yield a characterization of the invariant states of the fluid model and the property that fluid model solutions converge towards the manifold of invariant states as time goes to infinity.Indeed, as shown in [19], any fluid model solution Q satisfies the constraints of the optimization problem for all time with w = AQ(0) and the objective function in the problem can be used to create a Lyapunov function to prove the convergence towards the invariant manifold.x ∈ R N 2 + .
Since the function x → |x| 2 is strictly convex, and tends to ∞ as |x| → ∞, and the feasible set of (40) is non-empty, closed, convex, and bounded below, then (40) has a unique solution.Let ∆(w) denote the unique optimal solution.Since the constraints are linear, we can represent ∆(w) in terms of Lagrange multipliers (see [5], Proposition 3.4.1): Note that p here need not be unique.The function ∆ is called the lifting map, which has the following properties.
Proof.The first property follows from Corollary II.3.1 and Corollary I.3.4 of Dantzig et al. [6].The second property follows by the same argument as in the proof of the second property in Proposition 4.1 of [12].
The next proposition provides a representation of the invariant states.
+ is an invariant state if and only if q = ∆(Aq).
Proof.It follows from Lemma 5.11 of [19] that q ∈ R N 2 + is an invariant state if and only if q = ∆ W (q), where the map ∆ W is defined in Definition 5.3 of [19].(Here we use ∆ W instead of the ∆W in [19] to distinguish from the symbols used in this paper).The map ∆ W is the composition of two maps ∆ and W , where ∆ is defined by (35) of [19] and W is a workload map defined right before (35) of [19].In the setting of input-queued switches, Theorem 8.3 of [19] shows that W (q) = Bq = CAq and with Lemma A.1 of [19] we see that ∆ W (q) = ∆(Aq), and the desired result follows.
We now show that the lifting map ∆ maps R 2N −1 + into the set of invariant states and we give a representation for it.
To establish the second part of the lemma, let w ∈ W.There exists a p ∈ R 2N + such that w = AB ′ p.It follows from the argument above that On the other hand, since AB ′ has full row rank, its Moore-Penrose pseudoinverse (AB ′ ) † satisfies AB ′ (AB ′ ) † = I 2N −1 .Then we have Thus, by (18) This and (46) imply that (43) holds.
Definition 4.4 (MSSC).We say that multiplicative state space collapse holds if for each Remark 4.2.If the above holds without the factor in the denominator of (47), then state space collapse is said to hold, that is, for each Shah and Wischik [19] adapted a method of Bramson [4] and combined it with asymptotic behavior of fluid model solutions to give sufficient conditions for multiplicative state space collapse to hold for switched networks operating under maximum-weight-like policies.The following proposition is a consequence of their results specialized to single-hop networks and the maximum weight matching policy considered here.Proposition 4.3.Suppose that Assumptions 3.1 and 3.2 hold and for each r > 0, the initial queue sizes {Q r ij (0) : i, j ∈ I} are non-random and satisfy lim r→∞ Q r (0) = q 0 for some invariant state q 0 .In addition, suppose that there exists a sequence {δ n : n ∈ N} ⊂ R + such that δ n → 0 as n → ∞ and Then multiplicative state space collapse holds.
Proof. The input-queued switch model in this paper is a special case of the single-hop switched network considered in [19], and the maximum weight matching policy considered here is the max-weight policy (or MW-f policy with f(x) = x) considered in [19]. For each r > 0, the arrival process $I^r(\cdot)$ has stationary increments with mean arrival rate vector $\lambda^r$. It follows from (20)-(21) that $\lambda^r \to \lambda$ as r → ∞ and Aλ = v, which means all of the input ports and output ports are heavily loaded in the limit. By virtue of the fact that λ can be expressed as a convex combination of permutation matrices (see the discussion at the end of Section 3.1), λ is admissible in the sense of Definition 5.1 of [19]. Although r is restricted to N in [19], the proofs apply whenever r > 0 tends to infinity through an increasing sequence of values. With this observation, the proposition follows directly from Theorem 7.1 of [19].
Remark 4.3.As noted following Assumption 2.5 in [19], the last condition of Proposition 4.3 is satisfied in our setting if for each i, j ∈ I, the sequence {ϑ r ij (k) : k ∈ N} satisfies a uniform fourth moment bound of the following type: < ∞ for all i, j ∈ I.

5. Diffusion approximation. Let (50) $W = \{AB'p : p \in \mathbb{R}^{2N}_+\}$. It will be shown in Lemma 6.1 that the polyhedron W has $N^2$ boundary faces $B_{ij}$, i, j ∈ I, given by (51). Recall the diagonal matrix Ξ from Proposition 3.1. Let Γ = AΞA′, which is a (2N − 1) × (2N − 1) covariance matrix. We assume that Γ is non-degenerate, i.e., it is strictly positive definite. (This holds if $b_{ij} > 0$ for all i, j ∈ I, for example.) Also let µ be a probability measure on W, where W is endowed with the Borel σ-algebra of $\mathbb{R}^{2N-1}$. Recall $\gamma_{ij}$ defined in (27) for each i, j ∈ I and θ in (21).
Definition 5.1.A Semimartingale Reflecting Brownian Motion that lives in the polyhedral cone W, has direction of reflection γ ij on the boundary face B ij for each i, j ∈ I, has drift θ and covariance matrix Γ, and has initial distribution µ on W, is an {F t }-adapted, (2N − 1)-dimensional process W defined on some filtered probability space (Ω, F, {F t }, P ) such that (ii) P -a.s., W has continuous paths, W (t) ∈ W for all t ∈ R + , and W (0) has distribution µ, (iii) under P , (a) X is a (2N − 1)-dimensional Brownian motion starting from the origin with drift θ and covariance matrix Γ, iv) for each i, j ∈ I, Y ij is an {F t }-adapted, one-dimensional process such that P -a.s., Remark 5.1.We call a process that satisfies the above properties an SRBM associated with the data (W, {γ ij : i, j ∈ I}, θ, Γ, µ).Geometric conditions for existence and uniqueness in law of an SRBM are given in Dai and Williams [8].It will be shown in the course of proving our main result, Theorem 5.1, that these are satisfied by (W, {γ ij : i, j ∈ I}).
We now state the main result of this paper.
Theorem 5.1.Suppose that Assumptions 3.1 and 3.2 and multiplicative state space collapse all hold.Let W, {γ ij : i, j ∈ I}, θ, Γ be as described at the beginning of this section.Suppose that W r (0) converges in distribution to a random variable with distribution µ on W. Then ( W r , Q r ) converges in distribution as r → ∞ to a continuous process ( W , Q), where W is an SRBM associated with the data (W, {γ ij : i, j ∈ I}, θ, Γ, µ) and Q = ∆( W ).
The following corollary follows directly from Theorem 5.1 and Proposition 4.3.
Corollary 5.1. Suppose that all of the assumptions in Proposition 4.3 hold and Γ is non-degenerate. Then the conclusion in Theorem 5.1 holds, where µ is the point mass at $Aq_0$.
As an illustration of Corollary 5.1, consider a 2 × 2 input-queued switch operated under the MWM policy. The workload space W in this case is a polyhedral cone lying strictly inside $\mathbb{R}^3_+$ that has the following representation: The set of directions of reflection $\{\gamma_{ij} : 1 \le i, j \le 2\}$ associated with the four boundary faces is given by The workload space W in this case is depicted in Figures 2-3. The conclusion of Corollary 5.1 applies under Assumptions 3.1 and 3.2, provided Γ is non-degenerate and the initial queue sizes $\{Q^r_{ij}(0) : i, j \in I\}$ are non-random and satisfy $\lim_{r\to\infty} Q^r(0) = q_0$ for some invariant state $q_0$. Note that the latter holds if the VOQs in each of the switches indexed by r start empty and µ is the point mass at the origin in $\mathbb{R}^3_+$. Then the limiting SRBM W lives in W. It behaves like Brownian motion with a constant drift θ and covariance matrix Γ inside W and it is confined to W by instantaneous reflection (or pushing) at the boundary where the direction of reflection on $B_{ij}$ is given by $\gamma_{ij}$, i, j ∈ I. This pushing on the boundary corresponds to an underutilization of processing capacity (or transmittal of blanks) in the original system, where pushing on boundary face $B_{ij}$ corresponds to the situation where $\mathrm{VOQ}_{ij}$ is empty and so there are no packets to transmit from there and processing effort allocated there is wasted. In the diffusion approximation, this results in an instantaneous increase in $W_i$ (for i ≠ N), $W_{N-1+j}$ (for j ≠ N) and $W_{2N-1}$ on $B_{ij}$. We thus see that the constraints imposed by the switch architecture due to simultaneous processing from more than one queue are encoded in the geometry of the approximating diffusion.

6. Proof of diffusion approximation.
In this section we prove the diffusion approximation result, Theorem 5.1. Throughout this section, we assume that the hypotheses of Theorem 5.1 hold. For the proof of this theorem, we apply an invariance principle for SRBMs that we developed in [13].
To obtain the convergence of W r , we will apply Theorem 5.4 of [13]. To justify use of this theorem, we need to verify Assumptions (A1)-(A5), 4.1, and 5.1 of [13]. We first verify, in Section 6.1, that the workload space W and the directions of reflection {γ ij : i, j ∈ I} satisfy the geometric properties in Assumptions (A1)-(A5) and 5.1 of [13]. The verification of Assumption 4.1 of [13] relies on an oscillation inequality that we prove in this paper in Section 6.2. In Section 6.3, using the oscillation inequality, we prove that state space collapse follows from multiplicative state space collapse. Finally, in Section 6.4 we verify Assumption 4.1 of [13], which ensures that W r satisfies a perturbed version of the SRBM definition, and we prove the convergence of W r by applying Theorem 5.4 of [13]. The convergence jointly of Q r with W r then follows from the state space collapse already established in Section 6.3.

6.1. Verification of geometric conditions. Recall that the workload space W is given by (50), where AB ′ is a (2N − 1) × (2N ) matrix that has the following entries: for 1

Proof. We first show that W ⊆ G. Let x ∈ W. Then x = AB ′ p for some p ∈ R 2N + . It follows from (58) that for each i, j ∈ I, There is a $p \in \mathbb{R}^{2N}$ such that $x = AB'p$, since AB ′ has rank 2N − 1. Now we show that there is a $p^* \in \mathbb{R}^{2N}_+$ such that $x = AB'p^*$. We use $p$ to construct $p^*$. If $p_1 \ge 0$, then we let $p^1 = p$. If $p_1 < 0$, we let $p^1$ be such that $p^1_i = p_i - p_1$ and $p^1_{N+i} = p_{N+i} + p_1$ for i ∈ I. Then we have $p^1_1 = 0$ and by (55), If $p^1_2 \ge 0$, we let $p^2 = p^1$. If $p^1_2 < 0$, we let $p^2$ be such that Then by (55) again, we have that $x = AB'p^2$ and $p^2_1 \ge 0$, $p^2_2 \ge 0$. Continuing in this manner, we can construct for i ∈ I. By (60) and the fact that x ∈ G, we know that for 1 ≤ i ≤ N , and $p^{N+1}_{N+1} = 0$.
Using (55) again, we have that $x = AB'p^{N+1}$ and $p^{N+1}_i \ge 0$ for all 1 ≤ i ≤ N + 1. Continuing in this manner, we can construct $p^{2N} \in \mathbb{R}^{2N}$ such that $x = AB'p^{2N}$ and $p^{2N}_i \ge 0$ for all 1 ≤ i ≤ 2N . Letting $p^* = p^{2N}$, we conclude that G ⊆ W.
To prove the second claim in the lemma, fix i, j ∈ I. For each x ∈ B ij , we know that x = AB ′ p for some p ∈ R 2N + with p i = p N +j = 0. Then ⟨n ij , x⟩ = 0 by (60). Hence x ∈ ∂G ij and since x ∈ W = G, it follows that x ∈ ∂G ij ∩ ∂G. On the other hand, for each x ∈ ∂G ij ∩ ∂G, since W = G, we have that x = AB ′ p for some p ∈ R 2N + and ⟨n ij , x⟩ = 0. Then by (60), we have that

The next lemma establishes certain geometric properties of the set of directions of reflection {γ ij : i, j ∈ I}. For each x ∈ ∂W, let

Lastly, suppose that i = j = N . Then 1 ≤ k, l ≤ N − 1 and Combining all of the above, we see that Σ kl∈I(x) ⟨n ij , γ kl ⟩ > 0. Thus, since I(x) is finite, (62) follows.
We next establish (61). The argument is similar to that for (62), but it differs in some details as there is not exact symmetry between {n ij } and {γ kl }. It suffices to show that (70) where because we can then set b kl (x) = a kl /H(x) for kl ∈ I(x), where H(x) = Σ kl∈I(x) a kl , and (61) holds. To show (70), fix ij ∈ I(x). Now, Σ kl∈I(x) Note from (63) that a ij ⟨n ij , γ ij ⟩ > 0. Fix k ≠ i, l ≠ j, and consider the term between the braces {} in (71). If kl ∉ I(x), then the inner product of γ ij with the brace term is non-negative by (64)-(65). On the other hand, if kl ∈ I(x), then we know from (67) that il, kj ∈ I(x). We will show that, in this case, the inner product of γ ij with the brace term is strictly positive. There are a number of cases to consider. Suppose first that 1 Hence it follows that The above display holds since the sum of the first term and the third term on the right-hand side of the equals sign is non-negative because N ≥ 2.
Suppose next that 1 Lastly, suppose that i = j = N . Then 1 ≤ k, l ≤ N − 1 and Combining all of the above, we obtain that Σ kl∈I(x) a kl ⟨n kl , γ ij ⟩ > 0 for ij ∈ I(x) and (61) follows.

Corollary 6.1. Assumptions (A1)-(A5) of [13] are all satisfied by W and {γ ij : i, j ∈ I}, where W = G given by (59) is a minimal description for W.
Proof. In Lemma 6.1, we showed that W is a convex polyhedron with representation given by the intersection of the half-spaces G ij , i, j ∈ I. We now prove by contradiction that such a representation is minimal in the sense that no proper subcollection defines W. Suppose that there exist i, j ∈ I such that W = ∩ (k,l)≠(i,j) G kl . Let p ∈ R 2N be a vector such that p i = p N +j = −1 and p k = p N +l = 1 for k ≠ i and l ≠ j, and let x = AB ′ p. By (60), we have that Then x ∈ W. Hence there exists p * ∈ R 2N + such that x = AB ′ p * . It follows that AB ′ (p * − p) = 0 ∈ W. By the second equality in (60), which holds for any p ∈ R 2N , we obtain that In particular, p * i + p * N +j = p i + p N +j . Since p * i + p * N +j ≥ 0 and p i + p N +j = −2, we have a contradiction, and this proves the minimality of the representation.
To see that W has non-empty interior, for p ♯ > 0, let x ♯ = AB ′ p ♯ . Then it follows from (50) that x ♯ is in the interior of W. As noted in Section 3 of [13], it follows that the conditions on the geometry of W, Assumptions (A1)-(A3) of [13], are satisfied. Since the {γ ij : i, j ∈ I} are constant vectors, they are trivially uniformly Lipschitz continuous, and since they are also of unit length, Assumption (A4) of [13] holds. Assumption (A5) of [13] is implied by Lemma 6.2.

Remark 6.1. As noted in [13], in the context of W being a convex polyhedron with minimal description and constant vector fields {γ ij : i, j ∈ I} on the boundary faces, (A5) is equivalent to Assumption 5.1 of [13].
6.2. Oscillation inequality. The following oscillation inequality will be used in combination with the multiplicative state space collapse condition (Definition 4.4) to show that state space collapse (Theorem 6.2) holds. It is also the key to verifying Assumption 4.1 of [13]. For the statement of the oscillation inequality, we need the following notation. For any 0 ≤ s < t < ∞ and any integer k ∈ N, let D([s, t], R k ) denote the set of functions x : [s, t] → R k that are right continuous on [s, t) and have finite left limits on (s, t], and for

Lemma 6.3 (Oscillation Inequality). There exists a constant c 0 > 0 such that for any δ > 0 and any 0 the following hold:

Proof. A local version of this oscillation inequality is given in Theorem 4.1 of [13]. The main point of our argument below is to show that for the polyhedral cone W and constant directions of reflection {γ ij : i, j ∈ I}, the inequality actually holds globally. Since W and {γ ij : i, j ∈ I} satisfy Assumptions (A1)-(A5) of [13], Theorem 4.1 of [13] holds with W and γ ij in place of G and γ i there. By the remark after Lemma A.3 and the construction of Π m in the beginning of the proof of Theorem 4.1 of [13], we obtain that there is a constant c 0 > 0 such that Π(u) ≤ c 0 u for each u ≥ 0, where the function Π is constructed in the proof of Theorem 4.1 of [13]. Since B ij belongs to a hyperplane for each i, j ∈ I, we can choose R(·) in Assumption (A2) of [13] such that R(ε) = ∞ for each ε ∈ (0, 1). Since γ ij is a constant vector for each i, j ∈ I, the Lipschitz constant L in Assumption (A4) of [13] can be arbitrarily small and hence ρ 0 = a/(4L) in [13] can be arbitrarily large. Then the quantity min{ρ 0 /4, R(a/4)/4} in Theorem 4.1 of [13], which restricts the size of the neighborhood of a point in W in which the oscillation inequality holds, can be arbitrarily large. It follows that the oscillation inequality in Theorem 4.1 of [13] holds globally for paths w in W with Π(u) ≤ c 0 u for all u ≥ 0.

6.3. State space collapse. We first state and prove the following two preliminary lemmas.

Lemma 6.4. { W r (0) + X r (·) : r > 0} converges in distribution to a (2N − 1)-dimensional Brownian motion with initial distribution µ, drift θ and covariance matrix Γ.
we see that the left-hand inequality of (75) holds. On the other hand, for i, j ∈ I and t . By letting c 2 = N , we obtain the right-hand inequality of (75).
We are now ready to prove the main lemma used in proving the state space collapse. By (25), we obtain that for each t ∈ R + , (76) W r (t) = W r (t) + ξ r (t), where W r (t) = A∆( W r (t)), (77) By (41), we have that (79) W r (t) ∈ W for all t ∈ R + and r > 0.
Recall from (28)-(29) that for each r > 0 and where for each δ > 0 fixed and each r > 0, t ∈ R + , i, j ∈ I, (82) We then have the following estimate.
Lemma 6.6. For each T > 0, δ > 0, there exists r(T, δ) > 0 such that for all r ≥ r(T, δ), (83)

Proof. Fix T > 0 and δ > 0. By the convergence assumed for the initial random variables { W r (0) : r > 0}, the convergence in distribution of { W r (0) + X r (·) : r > 0} established in Lemma 6.4, multiplicative state space collapse and the convergence in distribution of I r to the Brownian motion I in (33), we have that, for each ε > 0, there are constants K 0 ≥ 1 (not depending on ε) and r 0 (ε) > 0 such that for all r ≥ r 0 (ε), (84) The constants K 0 and r 0 (ε) depend on T and δ as well, but since these parameters are fixed throughout this proof, we do not explicitly indicate that dependence here. In the following, ε > 0 will be fixed. A specific, suitably small value of ε will be chosen later (as a function of T and δ) to ensure that various inequalities hold. For r > 0, let From (84)-(86) we have that for all r ≥ r 0 (ε), (87) Now, for r ≥ r 0 (ε), on O r,ε , by (78) we have

We now focus on when Y r can increase. Fix r ≥ r 0 (ε), i, j ∈ I, and ω ∈ O r,ε . Recall the constraints on where Y r ij can increase from (28). Fix a time instant t * ∈ (0, T ] such that Q r ij (t * −, ω) = 0. Since ω ∈ O r,ε , we have (89) By (41), there is p r (t * , ω) ∈ R 2N + such that ∆ W r (t * −, ω) = B ′ p r (t * , ω). Then it follows from the definition of B and (89) that (90) By the definition (77) of W r , we have It then follows from (60), (90) and (91) that Notice that by (5) and (8), the jump of Q r ij at time t * is bounded in size by the size of the jump of I r ij at time t * plus one (bounding a possible jump of the departure process). Thus, for r ≥ 8|A| √ N /δ,
It follows from this and (88) that for r ≥ r Finally, on combining (96) with (93), we obtain that for r ≥ r

The following is immediate from Lemma 6.6.

Corollary 6.2. Under the assumptions of Theorem 5.1, state space collapse holds.

Remark 6.2. In Theorem 7.7 of [15], Shah et al. use an alternative method to prove that multiplicative state space collapse implies state space collapse under some conditions. Their method utilizes an a priori probabilistic bound on Q r that they obtain using a Lyapunov drift technique. Their result applies to input-queued switches operating under MW-α policies for α ≥ 1. Although we focus on the case α = 1 here, our methodology in fact extends to allow a proof for all α ∈ (0, ∞). Furthermore, Shah et al. require that the heavy traffic limit be reached through a sequence of strictly underloaded systems, whereas our result allows for the limit to be approached through critically loaded or even overloaded systems; in other words, the Shah et al. result assumes that θ in our Assumption 3.1 has all components strictly negative, whereas we do not restrict the sign of the components of θ at all. In summary, we have included our proof here because it allows a more flexible heavy traffic assumption and our methodology readily extends to allow a proof of Corollary 6.2 for MW-α policies for all α ∈ (0, ∞).

6.4. Proof of Theorem 5.1. Recall that we are assuming that the hypotheses of Theorem 5.1 hold. It suffices to show that the conditions of Theorem 5.4 of [13] hold, from which it will follow that W r converges in distribution as r → ∞ to an SRBM associated with the data (W, {γ ij : i, j ∈ I}, θ, Γ, µ). The joint convergence of Q r with W r and Q = ∆( W ) will then follow by the state space collapse of Corollary 6.2 and the continuity of ∆ established in Proposition 4.1. The conditions of Theorem 5.4 of [13] fall into four groups. We treat each of these groups separately below.
Firstly, as verified in Corollary 6.1, the workload space W is a convex polyhedron having non-empty interior described as the intersection of a minimal set of half-spaces, and the directions of reflection {γ ij , i, j ∈ I} satisfy Assumption 5.1 of [13].
Secondly, we verify that Assumption 4.1 of [13] holds. Recall the existence of r(T, δ) from Lemma 6.6. Let {r k : k ∈ N} be a strictly increasing sequence of positive constants such that for each k ∈ N, r k ≥ r(k, 1/k) and r k → ∞ as k → ∞. Define δ r such that δ r = 1 when r ≤ r 1 and δ r = 1/k when r k < r ≤ r k+1 for k ∈ N. Then δ r → 0 as r → ∞ and by Lemma 6.6, for each k ∈ N and r k < r ≤ r k+1 , (97) If for each i, j ∈ I and t ∈ R + , we define then from this, (29), (76)-(82), we have that for each r > 0 and t ∈ R + ,

Lemma 6.7. The process ξ r (·) and the processes ζ r,δ r ij (·), i, j ∈ I, all converge in probability to zero processes as r → ∞.
Proof.For ξ r (•), for any T > 0, it follows from (97) that for each k > T and any r > r k , we have It follows that ξ r (•) converges in probability to the zero process as r → ∞.
Fix i, j ∈ I.By (97), we know that for each k > T and any r > r k , (104) It follows that the non-decreasing process ζ r,δ r ij (•) converges in probability to the zero process as r → ∞.
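The step-function choice of δ r made before Lemma 6.7 can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the function name `delta_r` and the finite list `r_seq`, which stands in for the infinite sequence {r k }, are hypothetical names introduced here, and the thresholds in the usage lines are arbitrary numbers rather than values of r(k, 1/k) supplied by Lemma 6.6.

```python
from bisect import bisect_left


def delta_r(r, r_seq):
    """Evaluate the step function delta_r for thresholds r_seq = [r_1, r_2, ...].

    r_seq must be strictly increasing. The rule is:
      delta_r = 1      if r <= r_1,
      delta_r = 1 / k  if r_k < r <= r_{k+1}.
    Because r_seq is a finite stand-in for an infinite sequence, any r
    beyond the last threshold is assigned 1 / len(r_seq).
    """
    # bisect_left counts the thresholds strictly smaller than r,
    # which is exactly the index k with r_k < r <= r_{k+1}.
    k = bisect_left(r_seq, r)
    return 1.0 if k == 0 else 1.0 / k


# Arbitrary illustrative thresholds (assumed, not taken from the paper).
thresholds = [2.0, 5.0, 9.0]
values = [delta_r(r, thresholds) for r in (1.0, 2.0, 3.0, 5.5, 9.0, 10.0)]
```

With these thresholds the values are 1, 1, 1, 1/2, 1/2 and 1/3, so δ r is non-increasing in r and tends to zero as more thresholds are passed, which is the only property of δ r used in the argument.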
By Lemma 6.4, { W r (0) + X r (·) : r > 0} is C-tight. Combining the above, it follows that the conditions of Assumption 4.1 of [13] are satisfied with W in place of G, r in place of n, δ r in place of δ n , ij in place of i, γ ij in place of γ i , ξ r in place of α n , and ζ r,δ r in place of β n .

Thirdly, we have the conclusion of Lemma 6.4.

Fourthly, and finally, we must verify condition (vii) of Theorem 4.3 of [13] (with θ in place of µ there). This condition requires that for any weak limit point ( W , X, Y ) of {( W r , X r , Y r )}, { X(t) − θt : t ∈ R + } is a martingale relative to the filtration generated by ( W , X, Y ). (This condition is needed so that property (iii)(b) of Definition 5.1 will be satisfied.) By Proposition 4.1 of [13], it suffices to prove the following lemma, which implies that (vii) of Theorem 4.3 of [13] holds.

Lemma 6.8. The process X r as given by (30) has the decomposition: where X r , ε r are (2N − 1)-dimensional processes satisfying the following conditions.
Proof. We first introduce some martingales that will be used in defining X r . Let (Ω r , F r ) be the measurable space on which all of the processes indexed by r are defined. Let {H r t : t ∈ R + } be the filtration defined by H r t = σ{I r (s), Q r (0) : 0 ≤ s ≤ t}, t ∈ R + .
We first show that W r and U r are adapted to the filtration {H r t : t ∈ R + }. From the definitions, it is easy to see that Q r (0), I r (0), D r (0) ∈ H r 0 and Q r (0), I r (1) ∈ H r 1 . Since the MWM scheduling policy is being used,

Thus, we have verified all of the hypotheses of Theorem 5.4 of [13] and the desired result, Theorem 5.1, follows.
7. Discussion of other policies. In this section we elaborate on some possible directions for further research. Shah and Wischik [19] studied switched networks under some generalizations of the MWM policy considered in this paper. In particular, they established a multiplicative state space collapse result (Theorem 7.1 of [19]) for input-queued switches operating under a MW-α policy for α ∈ (0, ∞), where the MW-α policy chooses the matching π in the time interval [n, n + 1) that maximizes the weight (111) (The MWM policy considered in this paper corresponds to α = 1.) The set of invariant states for input-queued switches operating under a MW-α policy can then be characterized using the unique solution ∆ α (w) for w ∈ R This leads to a natural conjecture for a diffusion approximation to the workload process for a heavily loaded input-queued switch operating under a MW-α policy. This proposed diffusion lives in the cone: Proving a rigorous heavy traffic limit theorem justifying such a diffusion approximation for α ≠ 1 is a natural research problem. Here we illustrate some of the challenges associated with proving such a result. These revolve around the fact that when α ≠ 1, although the state space for the proposed diffusion approximation for the workload is a cone, it is not a polyhedral cone.
In fact, it has piecewise smooth curved boundary faces. The complexity of the geometry as N increases, and the current lack of a general existence  from the origin, the boundary and directions of reflection for the proposed diffusion approximation locally satisfy conditions required by the invariance principle given in [13]. However, the workload cone has a "singular point" at the origin where the conditions in [13] fail to be satisfied. Since this is an isolated point, we believe that the invariance principle in [13] and the uniqueness result of [8] can be adapted to validate the diffusion approximation in this case. For higher dimensional analogues of this case, we anticipate that a valid diffusion approximation can be established. As N increases, however, it becomes more difficult to compute the inward normals to all boundary faces, and as yet we do not have a systematic way to characterize these geometric conditions. For the case α = 0.5, depicted in Figures 6-7 (which is representative of the case α ∈ (0, 1)), the workload cone is convex and has boundary faces that curve outwards. In fact, in this case, the workload cone has a C 1 boundary except at the origin and the direction of reflection is piecewise constant on the boundary. There is not yet an existence and uniqueness theory (nor an associated invariance principle) for the proposed diffusion process in this case. Furthermore, as for α > 1, the geometry of the cone becomes difficult to compute as N increases.
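The MW-α selection rule described at the start of this section can be sketched in a few lines of Python. This is a minimal illustration, not the policy definition from [19]: the weight formula (111) is not reproduced in this text, so the sketch assumes the commonly used form in which the weight of a matching π is the sum over input ports i of Q i,π(i) raised to the power α (for α = 1 this is the sum of the served queue lengths, as in the MWM policy). The function name `mw_alpha_matching`, the brute-force search over all N! matchings, and the tie-breaking rule (first maximizer in lexicographic order) are all assumptions made for illustration.

```python
from itertools import permutations


def mw_alpha_matching(Q, alpha=1.0):
    """Return (perm, weight) for a matching maximizing sum_i Q[i][perm[i]]**alpha.

    Q is an N x N list of non-negative queue lengths; perm[i] is the output
    port matched to input port i. The weight formula is an assumed form of
    the MW-alpha weight, and ties go to the lexicographically first matching.
    """
    n = len(Q)
    best_perm, best_weight = None, float("-inf")
    # Enumerate every bijection from input ports to output ports.
    for perm in permutations(range(n)):
        weight = sum(Q[i][perm[i]] ** alpha for i in range(n))
        if weight > best_weight:  # strict inequality keeps the first maximizer
            best_perm, best_weight = perm, weight
    return best_perm, best_weight


# A 2 x 2 example in which the chosen matching depends on alpha.
Q_example = [[9, 4], [4, 0]]
mwm_choice = mw_alpha_matching(Q_example, alpha=1.0)   # straight matching
mw_half_choice = mw_alpha_matching(Q_example, alpha=0.5)  # crossed matching
```

For this queue configuration, α = 1 selects the straight matching (weight 9 against 8), while α = 0.5 selects the crossed matching (weight 4 against 3), since smaller α places relatively more weight on serving several moderately long queues. Exhaustive enumeration is only practical for small N.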
For a, b ∈ R, a ∨ b denotes the maximum of a and b and a ∧ b denotes the minimum of a and b. The indicator function of the set B is denoted by 1 B (that is, 1 B (x) = 1 if x ∈ B and 1 B (x) = 0 otherwise). All vectors and matrices in this paper are assumed to have real-valued entries.

4.1. Fluid model solutions. Fluid model solutions can be thought of as being obtained as formal limits of {χ r (·)} under law of large numbers scaling. The following terminology is used below.

Remark 4.1. Our fluid model description is equivalent to that in Definition 4.1 of [19].

4.2. Invariant states.

Fig 2. A portion of the polyhedral workload cone W is shown for a 2 × 2 input-queued switch.