Tail asymptotics for busy periods

The busy period for a queue is cast as the area swept under the random walk until it first returns to zero, $B$. Encompassing non-i.i.d. increments, the large-deviations asymptotics of $B$ is addressed, under the assumption that the increments satisfy standard conditions, including a negative drift. The main conclusions provide insight on the probability of a large busy period, and the manner in which this occurs: I) The scaled probability of a large busy period has the asymptote, for any $b>0$, \lim_{n\to\infty} \frac{1}{\sqrt{n}} \log P(B\geq bn) = -K\sqrt{b}, \hbox{where} \quad K = 2 \sqrt{-\int_0^{\lambda^*} \Lambda(\theta) d\theta}, \quad \hbox{with $\lambda^*=\sup\{\theta:\Lambda(\theta)\leq0\}$,} and with $\Lambda$ denoting the scaled cumulant generating function of the increments process. II) The most likely path to a large swept area is found to be a simple rescaling of the path on $[0,1]$ given by, [\psi^*(t) = -\Lambda(\lambda^*(1-t))/\lambda^*.] In contrast to the piecewise linear most likely path leading the random walk to hit a high level, this is strictly concave in general. While these two most likely paths have very different forms, their derivatives coincide at the start of their trajectories, and at their first return to zero. These results partially answer an open problem of Kulick and Palmowski regarding the tail of the work done during a busy period at a single server queue. The paper concludes with applications of these results to the estimation of the busy period statistics $(\lambda^*, K)$ based on observations of the increments, offering the possibility of estimating the likelihood of a large busy period in advance of observing one.

In contrast to the piecewise linear most likely path leading the random walk to hit a high level, this is strictly concave in general. While these two most likely paths have very different forms, their derivatives coincide at the start of their trajectories, and at their first return to zero. These results partially answer an open problem of Kulick and Palmowski [16] regarding the tail of the work done during a busy period at a single server queue. The paper concludes with applications of these results to the estimation of the busy period statistics (λ * , K) based on observations of the increments, offering the possibility of estimating the likelihood of a large busy period in advance of observing one.

Introduction
Consider S = {S k : k ≥ 0}, a random walk that starts at zero and has (not necessarily i.i.d.) increments process X = {X n : n ≥ 0}: The tail behavior of B (and related random variables) is of interest in several apparently distinct fields from queueing systems, to percolation, to insurance [16,18,12].
By simple rescaling arguments, should the following limit exist, it must have this form: This limit is established with K > 0 in [2] for the particular case of the the M/M/1 queue. By extending results in [12] from a fixed terminal point to the random terminal point τ via the infinite time-horizon sample path Large Deviation Principle setup in [13], Theorem 2 establishes the limit eq. (3) for a broad class of non-long-range dependent, light-tailed arrivals processes, providing a formula for K. The scaled Cumulant Generating Function (sCGF) associated with the scalar LDP is denoted Λ(θ) = lim k→∞ 1 k log E(exp(θS k )) for θ ∈ R.
Identifying λ * = sup{θ : Λ(θ) ≤ 0}, Theorem 2 shows that under Assumptions 1 and 2, Its proof establishes that the most likely path to a large swept area is strictly concave. The most likely path that first returns to zero at t = 1 is identified to be with all other most likely paths being simple rescalings of ψ * , as illustrated in Fig. 1. This is in contrast to the most likely path for the random walk to hit a high level, which is known to be piecewise linear, e.g. [1,14].
As an illustrative example, consider a random walk with i.i.d Gaussian increments for which everything is calculated in closed form in Sec. 4.1. Ten billion busy period paths were simulated and, of these, the one with the largest swept area as well as the one that attained the greatest height were recorded. In addition to plotting these paths in Fig. 2, conditioned on these values the most likely paths to these events are shown. The distinct shapes of the paths to these two unlikely events is apparent. Note that the terminal times of the theoretically predicted paths are deductions of the conditioned area and height, respectively, and are not constrained explicitly. Also shown is the logarithm of the empirical probability with which the busy period These results provide a partial answer to Open Problem 3.1 of [16] regarding precise asymptotics for the probability that B is large, by identifying the associated rough asymptotics and showing that most likely paths are typically strictly concave. They also reveal a lacuna in [3,Theorem 4.1] where the most likely path is assumed to be piecewise linear.
We conclude the paper in Sec. 5 with a discussion of the practical utility of these results. Given observations of the increments process X, estimates of the key quantities (λ * , K) can be created that, under additional restrictions on X, can be shown to satisfy a large deviation principle. This offers the possibility of accurately estimating the likelihood of a long busy period in advance of observing one.

Functional setup
The framework for analysis used in this paper was first developed for weak convergence of probability measures [19,4,23], and subsequently used in the context of sample path large deviations, e.g. [7,17,25,14,13,10]. In particular, Ganesh and O'Connell [14] employed this setup to establish an infinite time horizon version of Anantharam's result [1], proving that the most likely path to exceed a high level for a random walk with negative drift is piecewise linear on the scale of large deviations.
Let C[0, ∞) denote the collection of real-valued continuous functions on [0, ∞). Let A[0, ∞) denote the collection of the integrals of functions that are elements of L 1 [0, x) for all x > 0 (for example, see Riesz and Sz.-Nagy [21]). For each r ∈ R, define the space and equip it with the topology induced by the norm Define the polygonal sample paths The sample path process {S n (·)} is known to satisfy the LDP in Y r for a broad class of nonlong-range dependent, non-heavy-tailed random walks. For example, see [14,Theorem 1], where [5, Theorem 2] provides general mixing and uniform tail exponent conditions under which the prerequisites of this theorem hold. This encompasses increment processes that are Harris recurrent Markov chains, subject to a Foster-Lyapunov drift condition [15]. The existence of such an LDP will be the primary assumption in our proof of eq. (3).

Tail asymptotics
We shall make two assumptions. The first is the existence of a sample path LDP for the random walk that ensures the walk has negative drift, but a possibility of becoming positive on the scale of large deviations.

Assumption 1
The sample path process {S n (·)} satisfies the LDP in Y −δ , some δ > 0, with rate function where I is the good, strictly convex rate function associated with the random walk {S n (1)} and I(x) < ∞ for some x > 0.
Our identification of the most likely path to a large swept area will have a surprising relationship with the most likely path to exceed a high level, so we recall the following result of Ganesh and O'Connell.

Theorem 1 ([14]
) Under Assumption 1, the following exist and are non-negative Moreover, for any h > 0, while the most likely path to this event is Note that the probability of sweeping a large area, eq. (3), decays on a slower scale than the probability of hitting a high height, eq. (9).
For the supremum of a random walk, λ * determines the rate of decay of the probability of hitting a high level as shown in eq. (9). For busy periods, it will play a new and surprising rôle in the characterization of the most likely path to a large swept area. The inverse of ∇I will prove central to the development that follows. To that end, our second assumption is the following regularity condition.

Assumption 2
The rate function I is continuously differentiable on an interval that contains This assumption justifies the definition, The inverse exists, so that Ξ is finite-valued, on an interval that contains That λ * is significant here stems from the second part of the following lemma.

Lemma 1
The scalar λ * defined in eq. (8) satisfies Moreover, it is the unique positive solution of Proof: The identity ∇I(x * ) = I(x * )/x * follows from the first-order optimality condition for x * , based on its definition in eq. (8).
Clearly λ * is positive as x * is positive and I(x) > 0 for all x > −δ. To see that λ * thus defined is a solution of eq. (12), direct substitution, change of variables and integration by parts suffices. To see it is unique, note that it is equivalently characterized as any positive solution of As I is strictly convex, Ξ(x) is strictly increasing when finite. At x = 0 we have that Ξ(0) = −δ < 0, so this equation only has one positive solution.
We will use λ * to define the most likely path to sweep an area over [0, 1]: The most likely path to sweep any other area will be a simple rescaling of this solution. This path ψ * is strictly concave on [0, 1] as I is strictly convex. On this interval it is of the form found in [12] in the analysis of simulation of queues: on differentiating each side of eq. (13), it follows that the path satisfies the simple differential equation, Note that, thus defined, we have that That is, remarkably, the most likely paths to sweeping a large area, ψ * in eq. (13), and to exceeding a large height, ϕ * in eq. (10), both start and end with identical derivatives, but are distinct in-between. Before providing the main result, we establish the following characterizations of ψ * .

Proposition 1
The following hold for t ∈ [0, 1]: (i) In contrast to eq. (13), a non-integral representation is obtained in terms of the rate function, (ii) In terms of the sCGF, and hence, A simpler expression for λ * is obtained when Λ is symmetric. This symmetry can be interpreted as asymptotic reversibility of the underlying walk.
If eq. (18) holds for all θ, then This gives eq. (20) for all x ∈ R such that I(x) < ∞.
For example, eq. (18) is satisfied if X is i.i.d., E(exp(θX 1 ) is finite in a neighborhood of the origin and P (X 1 = x)/P (X 1 = −x) = exp(−λ * x) for all x, as holds for X 1 Gaussian or Bernoulli-{−C, +C}. This condition, however, extends beyond i.i.d. increments processes and in Sec. 4 we present an example where eq. (18) is satisfied for a Markov chain X.
Armed with these assumptions, definitions and characterizations of the path ψ * , we now prove the main result.
Theorem 2 Under the above assumptions, for any b > 0, where K is given in (5). Moreover, (i) The most likely asymptotic value of τ /n leading to {B ≥ n 2 b} is (ii) The rescaled, most likely asymptotic path of S n (·) is which is strictly concave on [0, a]. Define The set F is closed as φ → max(φ, 0) is Lipschitz continuous from Y −δ → Y 0 and, as for any φ ∈ Y −δ we have φ(t) < 0 for all t sufficiently large, integration is also continuous (e.g. [24, Theorem 11.5.1]). Thus we can use the LDP upper bound to obtain lim sup As lim t→∞ φ(t)/(1 + t) = −δ < 0 for all φ ∈ Y −δ , this infimum over t is attained at some finite t. Consider this inner functional infimum for fixed t: This functional optimization problem is closely related to [12, eq. (6)], where one can identify b with z. Mild alterations to Proposition 7 therein shows that with b = a 2 1 0 ψ * (t)dt for some a > 0, with ψ * (t) defined in eq. (13), this infimum occurs at any t ≥ a for which it transpires that ψ * b , defined in eq. (21), is the optimizer. This essentially occurs as The quantity λ * in the definition of ψ * arises as a scalar Lagrange multiplier [12]. Note that ψ * b (a) = 0, and thus the most likely path to sweep a rescaled area b satisfies τ /n ≈ a = b/ 1 0 ψ * (t)dt. Using Proposition 1 we have that and thus the expression for a in the statement follows.
What remains to be shown is that there is a coincident lower bound. Let ψ * be the unique solution of eq. (13) and define ψ * b using eq. (21). The path ψ * b (t) starts at 0 ends at a, and is the optimal path that sweeps an area of where, with e(t) = (1 + t), Both B 1 and B 2 are open by construction and, for sufficiently small, their intersection is non-empty. As defined, S n (·) / ∈ B 1 as S n (0) = 0, but this is not significant as we can use an exponentially equivalent representation with S n (0) := 1/n. Thus if S n (·) ∈ B with S n (0) := 1/n, then {B > bn 2 }. As B is open for all , we can use the LDP lower bound lim inf To evaluate K, note that Using Λ(θ) = θ Ξ(θ) − I(Ξ(θ)) for θ ∈ [0, λ * ], integration by parts and the fact that ∇Λ(λ * ) = 0, we have that Inserting this into eq. (23) in conjunction with the expression for 1 0 ψ * (t) dt given above obtains the expression for K in (5).

Examples
Theorem 2 provides a mechanism for calculating the most likely path to a large busy period B as well as the exponent K. We shall perform this calculation for illustrative examples: with i.i.d Gaussian increments where λ * , ψ * and K can all be determined in closed form; with Bernoulli{−1, +C} increments, which includes M/M/1 queue lengths, where explicit expressions of λ * are not always possible, but ψ * can be written in terms of it and K must always be calculated numerically; and, finally, for increments with Markovian dependencies where λ * and ψ * can be determined in closed form, but K must be identified numerically.
For each of the examples ten billion paths were simulated. As well as recording the logarithm of the frequency with which B exceeded b as a function of b, the largest swept area and highest paths were logged for comparison with the theoretically predicted most likely paths. For comparison with observations, Theorem 2 says that given we observe B = n 2 b, on the scale of large deviations the most likely time taken to generate the area is and the most likely path is then Thus, since τ * n = an, for large swept area we have the approximation This most likely path is solely parameterized from the observations by the value B. In particular, note that given B the time τ is determined, so in the comparisons for the simulation results that follow, the length of the most likely path is not explicitly fit to data.
Similarly, eq. (3) leads to the approximation For contrast, we also record the path that reaches the highest height and use Theorem 1 for comparison. Given a path S n that reaches a height H, S n (·) reaches H/n and therefore Again, given H, the time at which this most likely path returns to zero is completely determined.

Gaussian increments
Let X be i.i.d. Gaussian, with X 0 having mean −δ < 0 and variance σ 2 . As X is i.i.d., we have that The most likely path to {B ≥ bn 2 } for large n can also be determined to be A demonstration of these results appears in Fig. 2. The highest and largest swept area paths are compared with the most likely conditioned on these quantities using the approximations in eq. (24) and eq. (26). The distinct nature of two paths to these two unlikely events is evident. The empirical likelihood of a large deviation is shown along with that from the approximation above, which gives remarkably good agreement up to a constant prefactor. If C ≥ 2, then the corresponding reflected random walk is not reversible, and moreover the increments do not satisfy the symmetry assumptions of Proposition 2. For C = 2 we obtain the expression,

Bernoulli {−1, +C} increments
Using eq. (17), for arbitrary C we conclude that the most likely path on [0, 1] can be written in terms of λ * as In order to calculate K in equation (5), we need to evaluate the integral λ * 0 Λ(θ) dθ. This doesn't result in a closed form for any C, but it is simple to evaluate numerically.
In order to determine the most likely time and paths to a large busy period and a great height on the scale of large deviations we use the approximations eq. (24) and eq. (26). For the reversible Bernoulli {−1, +1} case corresponding to M/M/1 queue-lengths, Fig. 3 compares the highest and biggest paths with those from theory. The quality of the predictions is apparent. With a numerical integration of Λ giving K ≈ 0.1485, the asymptotic approximation is compared with the empirical probability, showing great accuracy up to a constant prefactor.
As an example of a non-reversible random walk, we consider the Bernoulli {−1, +10} case where the most likely paths are now asymmetric (as seen in Fig. 4). For this example, λ * ≈ 0.1439 and K ≈ 0.0978, both of which have been determined numerically.

Markovian {−1, +1} increments
As an example beyond i.i.d. increments, assume that the increments process X forms a two-state Markov chain on the state space {−1, +1} with transition matrix The stationary distribution is (β/(α + β), α/(α + β)) so we require α < β for stability. The sCGF Λ can be calculated using techniques described in [6, Section 3.1]: which satisfies the conditions of Proposition 2 and so the most likely path is symmetric. The rate function for the associated random walk can be calculated using methods described in [7]. For example, from [9] we have that where One can check directly that We have the following expression of the most likely path returning to 0 at t = 1, which can be seen directly to possess the symmetry ψ The integral λ * 0 Λ(θ) dθ does not evaluate in closed form, so again numerics must be used to determine K. For example, if α = 2/10 and β = 3/10, then δ = 2/10, x * = 2/10, λ * = log(8/7) and K ≈ 0.0489. A simulation-based illustration of these results appears in

Conclusions and discussion on estimation
As well as identifying the most likely path to a large swept area, Theorem 2 shows that in the absence of long range dependence and heavy tailed increments, in broad generality we have the approximation in eq. (25): where K can be identified in terms of the sCGF Λ associated with the increments X. An approximation such as this might be of value for practical purposes, but unless X is known in advance we would require a methodology to estimate K from observations of the system.
Based on thermodynamic ideas, Duffield et al. [8] investigated an estimation scheme for Λ based on observations of X. They demonstrated empirically that the scheme has desirable properties for a large class of of increment processes. Indeed, if X consists of i.i.d. bounded random variables [9, Theorem 1] or a finite state Markov chain [11,Theorem 3], from observations of X one can construct consistent functional estimates, {Λ n }, of Λ that themselves satisfy a LDP in the space of R-valued convex functions on R. From these, we can deduce an LDP for estimating K from observations of X as follows. If X is i.i.d, define and let D θ denote the matrix with diagonal entries exp(θf (1)), . . . , exp(θf (M )) and all offdiagonal entries equal to zero. Then our estimate of Λ given n observations is Λ n (θ) = log ρ(P n D θ ), where where ρ is the spectral radius. In both cases we define the estimates λ * n = sup{θ : Λ n (θ) ≤ 0} and K n = 2 − λ * n 0 Λ n (θ) dθ.
Regarding the λ * estimates, [ c(θ)dθ is also continuous so that again Puhalskii's extension of the contraction principle applies and we have an LDP for {K n }. Consistency of the Λ estimators ensures consistency of the λ * and K estimators. Thus these Λ-estimators enable the estimation of K directly from observations of X, offering the possibility of estimating the probability of a system experiencing an exceedingly large busy period in advance of one occurring.