Distribution of the supremum location of stationary processes

The location of the unique supremum of a stationary process on an interval need not be uniformly distributed over that interval. We describe all possible distributions of the supremum location for a broad class of such stationary processes. We show that, in the strongly mixing case, this distribution does tend to the uniform distribution, in a certain sense, as the length of the interval increases to infinity.


Introduction
Let X = (X(t), t ∈ R) be a sample continuous stationary process. Even if, on an event of probability 1, the supremum of the process over a compact interval [0, T] is attained at a unique point, this point does not have to be uniformly distributed over that interval, as has been known since Leadbetter et al. (1983). However, its distribution still has to be absolutely continuous in the interior of the interval, and the density has to satisfy very specific general constraints, as was shown in the recent paper Samorodnitsky and Shen (2011).
In this paper we give a complete description of the family of possible densities of the supremum location for a large class of sample continuous stationary processes. The necessary conditions on these densities follow by combining certain general results cited above, and for every function satisfying these necessary conditions we construct a stationary process of the required type for which this function is the density of the supremum location. This is done in Section 3, which is preceded by Section 2 in which we describe the class of stationary processes we are considering and quote the results from Samorodnitsky and Shen (2011) we need in the present paper.
Next, we show that for a large class of stationary processes, under a certain strong mixing assumption, the distribution of the supremum location does converge to the uniform distribution over very long intervals, and it does so in a strong sense. This is shown in Section 4.

Preliminaries
For most of this paper X = (X(t), t ∈ R) is a stationary process with continuous sample paths, defined on a probability space (Ω, F, P), but in Section 4 we will allow upper semi-continuous sample paths. In most of the paper (but not in Section 4) we will also impose two assumptions on the process, which we now state.
For T > 0 we denote by X*(T) = sup 0≤t≤T X(t) the largest value of the process in the interval [0, T].
Assumption U T : P( there is a unique t ∈ [0, T] such that X(t) = X*(T) ) = 1.
It is easy to check that the probability in Assumption U T is well defined. Under this assumption, the supremum over the interval [0, T] is uniquely achieved.
The second assumption on a stationary process deals with the fluctuations of its sample paths.
Assumption L: K := lim ε↓0 P( X has a local maximum in (0, ε) ) / ε < ∞; the limit is easily shown to exist. Under Assumption L the process X has sample paths of locally bounded variation; see Lemma 2.2 in Samorodnitsky and Shen (2011).
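To make Assumption L concrete, the limit K can be estimated by simulation. The sketch below uses a random-phase cosine sum as a stationary process; the spectrum (the array freqs), the grid, and all tuning constants are illustrative assumptions, not taken from this paper. The probability of a local maximum in (0, ε) is estimated by Monte Carlo and divided by ε:

```python
import numpy as np

rng = np.random.default_rng(4)
n_sims, eps = 100000, 0.05
t = np.linspace(0.0, eps, 26)          # sampling grid on (0, eps)
freqs = np.array([1.0, 1.7, 2.9])      # hypothetical spectrum

# X(t) = sum_k A_k cos(2*pi*f_k*t + Phi_k): a smooth stationary process,
# since the phases Phi_k are independent and uniform on (0, 2*pi).
phases = rng.uniform(0, 2 * np.pi, size=(n_sims, 3))
amps = rng.normal(size=(n_sims, 3))
x = np.einsum('sk,skt->st', amps,
              np.cos(2 * np.pi * freqs[None, :, None] * t[None, None, :]
                     + phases[:, :, None]))

# a local maximum shows up as an increment sign change + -> -
dx = np.diff(x, axis=1)
has_local_max = np.any((dx[:, :-1] > 0) & (dx[:, 1:] < 0), axis=1)
K_hat = has_local_max.mean() / eps     # finite for smooth processes
assert np.isfinite(K_hat) and K_hat > 0
```

For smooth processes the ratio stabilizes as ε decreases, which is exactly the finiteness required by Assumption L; for very rough paths (Brownian-like) the ratio would blow up as ε ↓ 0.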
For a compact interval [a, b], we will denote by τ X,[a,b] = min{ t ∈ [a, b] : X(t) = sup a≤s≤b X(s) } the leftmost location of the supremum in the interval; it is a well-defined random variable. If the supremum is unique, the adjective "leftmost" is, clearly, redundant. For a = 0, we will abbreviate τ X,[0,b] to τ X,b, and use the same abbreviation in similar situations in the sequel.
We denote by F X,[a,b] the law of τ X,[a,b]; it is a probability measure on the interval [a, b]. It was proved in Samorodnitsky and Shen (2011) that for any T > 0 the probability measure F X,T is absolutely continuous in the interior of the interval [0, T], and that the density can be chosen to be right continuous with left limits; we call this version of the density f X,[a,b]. This version of the density satisfies the universal upper bound
(2.1) f X,T (t) ≤ max( 1/t, 1/(T − t) ), 0 < t < T.
We will also use the following result from the above reference.
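The universal bound (2.1) is easy to probe numerically. The sketch below simulates a random-phase cosine process (an illustrative choice; the spectrum and all tuning constants are assumptions, not from the paper), estimates the density of the supremum location on interior bins, and compares it with max(1/t, 1/(T − t)); treat the exact form of the bound used here as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, T, n_grid = 20000, 1.0, 101
t = np.linspace(0.0, T, n_grid)
freqs = np.array([0.8, 1.6, 2.7])      # hypothetical spectrum

phases = rng.uniform(0, 2 * np.pi, size=(n_sims, 3))
amps = rng.normal(size=(n_sims, 3))
x = np.einsum('sk,skt->st', amps,
              np.cos(2 * np.pi * freqs[None, :, None] * t[None, None, :]
                     + phases[:, :, None]))
taus = t[np.argmax(x, axis=1)]         # (leftmost) supremum location per path

# empirical density of tau on interior bins; endpoint atoms are excluded
edges = np.linspace(0.1, 0.9, 9)
hist, _ = np.histogram(taus, bins=edges)
dens = hist / (n_sims * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])
bound = np.maximum(1.0 / centers, 1.0 / (T - centers))
assert np.all(dens <= 1.5 * bound)     # generous slack for Monte Carlo noise
```

Note that the law of the supremum location is typically far from uniform even though the process is stationary; only the universal envelope constrains it.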

Processes satisfying Assumption L
In this section we prove our main theorem, giving a full description of possible càdlàg densities f X,T for continuous stationary processes satisfying Assumption U T and Assumption L.
Theorem 3.1. Let X = (X(t), t ∈ R) be a stationary sample continuous process satisfying Assumption U T and Assumption L. Then the restriction of the law F X,T of the unique location of the supremum of the process in [0, T] to the interior (0, T) of the interval is absolutely continuous. The density f X,T has a càdlàg version with the following properties: (a) the density has bounded variation on (0, T); hence the one-sided limits at the endpoints exist and are finite; (b) the density is bounded away from zero in the interior of the interval; (c) either f X,T (t) = 1/T for all 0 < t < T, or the density integrates to strictly less than 1 over (0, T). Conversely, if f is a nonnegative càdlàg function satisfying (a)-(c) above, then there is a stationary sample continuous process X, satisfying Assumption U T and Assumption L, such that f is the density in the interior (0, T) of the unique location of the supremum of the process in [0, T].
Proof. The existence of a càdlàg density with properties (a)-(c) in the statement of the theorem is an immediate consequence of the statements of Theorems 3.1 and 3.3 in Samorodnitsky and Shen (2011). We proceed to show the converse part of the theorem. If f X,T (t) = 1/T for all 0 < t < T , then a required example is provided by a single wave periodic stationary Gaussian process with period T , so we need only to consider the second possibility in property (c). We start with the case where the candidate density f is a piecewise constant function of a special form.
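The uniform case mentioned in the proof can be checked directly: for a single-wave periodic stationary Gaussian process, X(t) = A cos(2πt/T) + B sin(2πt/T) with A, B independent standard normals, the supremum location over one period is exactly uniform. A short verification (using the closed-form argmax instead of a grid search; all numerical choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_sims = 1.0, 50000
# Single-wave periodic stationary Gaussian process with period T:
# X(t) = A cos(2*pi*t/T) + B sin(2*pi*t/T) = R cos(2*pi*(t - t*)/T),
# so the unique maximizer over a period is t* = (T/(2*pi))*atan2(B, A) mod T.
A = rng.normal(size=n_sims)
B = rng.normal(size=n_sims)
tau = (T / (2 * np.pi)) * np.arctan2(B, A) % T

# (A, B) is rotation invariant, so tau should be uniform on [0, T)
grid = np.linspace(0.0, T, 11)
ecdf = np.searchsorted(np.sort(tau), grid) / n_sims
assert np.max(np.abs(ecdf - grid / T)) < 0.02
```

Restricted to the closed interval [0, T], the endpoints t = 0 and t = T represent the same point of the period, so the supremum is still a.s. unique and its density is the constant 1/T.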
We call a finite collection (u i , v i ), i = 1, . . . , m, of nonempty open subintervals of (0, T) a proper collection of blocks if for any i, j = 1, . . . , m there are only three possibilities. We begin by constructing a stationary process as required in the theorem when the candidate density f satisfies requirements (a)-(c) of the theorem and has the form (3.3) for some proper collection of blocks, with the obvious convention at the endpoints 0 and T, for some H > 1. We will construct a stationary process by a uniform shift of a periodic deterministic function over its period; now, however, the period will be equal to HT > T. We start, therefore, by defining a deterministic continuous function x. We set x(0) = 2. Using the blocks of the first component we will define the function x on the interval (0, L 1 ] in such a way that x(L 1 ) = 2. Next, using the blocks of the second component we will define the function x on the interval (L 1 , L 1 + L 2 ] in such a way that x(L 1 + L 2 ) = 2, etc. This construction will terminate with a function x constructed on the entire interval [0, HT] with x(HT) = 2 = x(0), as desired.
We proceed, therefore, with defining the function x on an interval of length L j using the blocks of the jth component. For notational simplicity we will take j = 1 and define x on the interval [0, L 1 ] using the blocks of the first component. The construction is slightly different depending on whether or not the component has a central block, whether or not it has any left blocks, and whether or not it has any right blocks. If the component has l ≥ 1 left blocks, we will denote them by (0, v j ), j = 1, . . . , l. If the component has r ≥ 1 right blocks, we will denote them by (u j , T ), j = 1, . . . , r. If the component has a central block, we will denote it by (u, v). We will construct the function x by defining it first on a finite number of special points and then filling in the gaps in a piecewise linear manner.
Suppose first that the component has a central block, some left blocks and some right blocks. In this case we proceed as follows.
Step 1 Recall that x(0) = 2, and set the values of x at the points of this step. Note that the value at the last point obtained in this step is x( ld + v 1 + · · · + v l ) = 1.
Step 2 Set the values of x at the points of this step.
Step 3 Set the values of x at the points of this step, and note the value at the last point obtained.
Step 4 We add just one more point at distance d from the last point of the previous step. Note that this point coincides with L 1 as defined in (3.5).

If the component has no left blocks, then Step 1 above is skipped, and Step 2 becomes the initial step. If the component has no right blocks, then Step 3 above is skipped, and at Step 4 we add the distance d to the final point of Step 2. If the component has no central block, then Step 2 is skipped, but we do add the distance T to the last point of Step 1; that is, the first point obtained at Step 3 becomes once again the last point of Step 1 shifted by T, with the obvious change if l = 0. It is easy to check that in any case Step 4 sets x(L 1 ) = 2, with L 1 as defined in (3.5). In any case, if the values of x at two adjacent points are different, we define the values of x between these two points by linear interpolation. If the values of x at two adjacent points, say a and b with a < b, are equal to a common value y, we define the function x between these two points to dip linearly to the value y − (b − a)/(2d) at the midpoint, provided y − (b − a)/(2d) ≥ −1. If this lower bound fails, we define the values of x between the points a + dy and b − dy with a slope parameter τ > 0 chosen so that both τ ≤ 1/d and the value at the midpoint is at least −1. The reason for this slightly cumbersome definition is the need to ensure that x is nowhere constant, while keeping the lower bound of x and its Lipschitz constant under control. We note, at this point, that since in all cases b − a ≤ T + d, we can choose, for a fixed T, the value of τ so that τ ≥ τ d > 0, where the constant τ d stays bounded away from zero for d in a compact interval. Now that we have defined a periodic function (x(t), t ∈ R) with period HT, we define a stationary process X by X(t) = x(t − U), t ∈ R, where U is uniformly distributed between 0 and HT. The process is, clearly, sample continuous and satisfies Assumption L. We observe, further, that if the supremum in the interval [0, T] is achieved in the interior of the interval, then it is achieved at a local maximum of the function x.
If the value at the local maximum is equal to 2, then it is due to an endpoint of a component, and, since the contribution of any component has length exceeding T, this supremum is unique. If the value at the local maximum is smaller than 2, then that local maximum is separated from the nearest local maximum with the same value of x by at least the distance induced by Step 2, which exceeds T. Consequently, in this case the supremum over [0, T] is unique as well.
Similarly, if the supremum is achieved at one of the endpoints of the interval, it has to be unique as well, on a set of probability 1. Therefore, the process X satisfies Assumption U T .
We now show that, for the process X constructed above, the density f X,T coincides with the function f given in (3.3), with which the construction was performed. According to the above analysis, we need to account for the contribution of each local maximum of the function x over its period to the density f X,T . The local maxima may appear in Step 1 of the construction, in which case they are due to left blocks. They may appear in Step 3 of the construction, in which case they are due to right blocks. They may appear in Step 2 of the construction, in which case they are due to central blocks. Finally, the points where x has value 2 are always local maxima; we will see that they are due to base blocks. We start with the latter local maxima. Clearly, each such local maximum is attained, by periodicity, at one of the points L 1 + · · · + L i , and the global maximum of X is then located at the point L 1 + · · · + L i − HT + U. Therefore, the contribution of each such local maximum to the density is 1/HT at each 0 < t < T, and overall the points where x has value 2 contribute to f X,T the amount given in (3.6). Next, we consider the contribution to f X,T of the local maxima due to left blocks. For simplicity of notation we consider only the left blocks in the first component. As before, we need to check over what interval of the values of U the local maximum due to the jth left block becomes the global maximum of X over [0, T]; this happens when the corresponding interval of length v j is shifted to cover the origin, which corresponds to an interval of length v j of the values of U. The shifted local maximum itself will then be located within the interval (0, v j ), which contributes 1/HT at each 0 < t < v j . Overall, the local maxima due to left blocks contribute to f X,T the amount given in (3.7), and, similarly, the local maxima due to right blocks contribute the amount given in (3.8). Finally, we consider the central blocks.
If the first component has a central block, then the local maximum due to the central block is at the point determined by Step 2. Any value of U that makes this local maximum the global maximum over [0, T] must be such that the corresponding time interval is shifted to cover the origin; furthermore, that value of U must also be such that a second time interval, starting at (l + 1)d plus the lengths accumulated before it, is shifted to cover the right endpoint T. If we think of shifting the origin instead of shifting x, this corresponds to a set of values of U of measure v − u, and the shifted local maximum will then be located within the interval (u, v), which contributes 1/HT at each u < t < v to the density. Overall, the local maxima due to central blocks contribute to f X,T the amount given in (3.9), and we conclude by (3.6)-(3.9) that f X,T indeed coincides with the function f given in (3.3). Therefore, we have proved the converse part of the theorem in the case when the candidate density f is of the form (3.3).
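The mechanism just described, shifting a fixed periodic profile by an independent uniform random variable and reading off where the global maximum lands in [0, T], can be illustrated on the simplest possible profile. The sketch below uses a triangle wave with one peak per period P = HT with H = 2 (all choices here are illustrative, not the profile built in the proof): the peak falls inside (0, T) for a set of shifts of measure T, giving a constant interior density, and the remaining shifts push the supremum to an endpoint.

```python
import numpy as np

rng = np.random.default_rng(2)
P, T, n = 2.0, 1.0, 20000      # period P = H*T with H = 2 (illustrative)

def x(t):
    # periodic triangle wave with a single peak (height 1) per period
    s = np.mod(t, P)
    return 1.0 - 2.0 * np.minimum(s, P - s) / P

U = rng.uniform(0.0, P, size=n)        # uniform shift over one period
grid = np.linspace(0.0, T, 1001)
taus = np.array([grid[np.argmax(x(grid - u))] for u in U])
interior = taus[(taus > 0.0) & (taus < T)]
# the peak lands inside (0, T) for shifts in an interval of length T out of P
assert abs(interior.size / n - T / P) < 0.02
# conditionally on landing inside, the location is uniform on (0, T)
assert abs(interior.mean() - T / 2) < 0.02
```

Here the interior density is the constant 1/P = 1/(HT) on all of (0, T), with atoms of total mass 1 − T/P at the endpoints; richer proper collections of blocks shape richer piecewise constant interior densities in exactly the same way.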
We now prove the converse part of the theorem for a general f with properties (a)-(c) in the statement of the theorem. Recall that we need only to treat the second possibility in property (c). In order to construct a stationary process X for which f X,T = f, we will approximate the candidate density f by functions of the form (3.3). Since we will need to deal with convergence of a sequence of the continuous stationary processes constructed above for candidate densities of the form (3.3), we record, at this point, several properties of that stationary periodic process.
Property 1 The process X is uniformly bounded: −1 ≤ X(t) ≤ 2 for all t ∈ R.

Property 2
The process X is Lipschitz continuous, and its Lipschitz constant does not exceed 3/(2d).

Property 3
The process X is differentiable except at countably many points, at which X has left and right derivatives. On the set D 0 of points where X is differentiable, the absolute value of the derivative is bounded from below by a positive constant depending on d and on τ d , where τ d > 0 stays bounded away from zero for d in a compact interval.
Property 4 The distance between any two local maxima of X cannot be smaller than d. At its local maxima, X takes values in a finite set of at most N + 3 elements. Moreover, the absolute difference between the values of the process X at two local maxima in the interval (0, T) is at least 2^(−N), where N is as above.
All these properties follow from the corresponding properties of the function x by considering the possible configuration of the blocks in a component.
We will now construct a sequence of approximations to a candidate density f as above. Let n = 1, 2, . . .. It follows from the general properties of càdlàg functions (see e.g. Billingsley (1999)) that there is a finite partition 0 = t 0 < t 1 < . . . < t k = T of [0, T] over which the oscillation of f within each subinterval is suitably small. We define a piecewise constant function f̃ n on (0, T) by specifying, for each i = 1, . . . , k, the value of f̃ n for t i−1 ≤ t < t i . By definition and (3.10), f̃ n is uniformly close to f. Next, we notice that for every i = 1, . . . , k − 1 there are points s i ∈ (t i−1 , t i ) with the required properties, and we use them to define f n . Clearly, the function f n is càdlàg, has bounded variation on (0, T) and is bounded away from zero. By (3.12), f n also satisfies (3.1), since f does. Finally, since ∫_0^T f(t) dt < 1, we see by (3.11) that, for all n large enough, ∫_0^T f n (t) dt < 1 as well. Therefore, for such n the function f n has properties (a)-(c) in the statement of the theorem, and in the sequel we will only consider n large as above. We finally notice that f n takes finitely many different values, all of which are in the set {j/(knT), j = 1, 2, . . .}. Therefore, f n can be written in the form (3.3), with H = kn. Indeed, the blocks can be built by combining into a block all neighboring intervals where the value of f n is smallest, subtracting 1/(knT) from the value of f n in the constructed block, and iterating the procedure.
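The final observation, that a step function with values on the lattice {j/(knT)} decomposes into blocks by repeatedly peeling off a layer of height 1/(knT) from runs of neighboring intervals, is mechanical enough to sketch in code. Everything below (the toy step heights, the layer-by-layer peeling rule) is an illustrative assumption, not the paper's exact recipe:

```python
import numpy as np

def peel_blocks(levels):
    """Decompose a step function with nonnegative integer levels into
    horizontal blocks: (i_start, i_end, layer) triples, each covering a
    maximal run of cells whose remaining level is still positive."""
    levels = np.asarray(levels, dtype=int).copy()
    blocks = []
    layer = 0
    while np.any(levels > 0):
        pos = levels > 0           # runs of positive remaining level
        i = 0
        while i < len(levels):
            if pos[i]:
                j = i
                while j + 1 < len(levels) and pos[j + 1]:
                    j += 1
                blocks.append((i, j, layer))
                levels[i:j + 1] -= 1
                i = j + 1
            else:
                i += 1
        layer += 1
    return blocks

# a toy step density with heights 2, 1, 2 (in units of one lattice step)
blocks = peel_blocks([2, 1, 2])
assert sum(j - i + 1 for i, j, _ in blocks) == 5   # total mass preserved
assert (0, 2, 0) in blocks                          # base layer spans everything
```

Each peeled layer corresponds to one block of the proper collection, so the total number of blocks stays finite and controlled by the number of lattice levels.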
We have already proved that for any function of the type (3.3) there is a stationary process as required in the statement of the theorem. Recall that the construction of this stationary process depends on the assignment of blocks in a proper collection to components, and we would like to make sure that no component has "too many" left or right blocks. To achieve this, we need to distribute the left and right blocks as evenly as possible between the components. Two observations are useful here. First of all, it follows from the definition of f n and (3.3) that, for n large enough (we are writing k n instead of k to emphasize the dependence of k on n), a bound holds involving L n and B n , the numbers of the left and base blocks in the nth collection. On the other hand, similar considerations tell us that, once again for n large enough, a matching bound holds, where we have used property (b) of f. Therefore, for such n the relevant ratio is bounded, and the bound is a finite quantity depending on f, but not on n. Performing a similar analysis for the right blocks, and recalling that we are distributing the left and right blocks as evenly as possible between the components, we see that there is a number ∆ f ∈ (0, ∞) such that, for all n large enough, no component in the nth collection has more than ∆ f left blocks or ∆ f right blocks.
We will also need bounds on the important parameter d = d n appearing in the construction of a stationary process corresponding to functions of the type (3.3); these bounds do not depend on the particular way we assign blocks to different components. Recall (3.14), where m n = B n + L n + R n + C n (in the obvious notation) is the total number of blocks in the nth collection. We also know, by the uniform convergence, that ∫_0^T f n → ∫_0^T f. Therefore, by (3.14) and (3.15) we obtain that, for all n large enough, d n is bounded from above and from below by positive constants (3.16). An immediate conclusion is the following fact. By construction, the distribution of X n (0) is absolutely continuous; let g n denote the right continuous version of its density. Since X n is obtained by uniform shifting of a piecewise linear periodic function with period H n T, the value of the density g n (v) at each point v times the length of the period does not exceed the total number of the linear pieces in a period divided by the smallest absolute slope of any linear piece. The former does not exceed 2m n , and, by Property 3 and the above, the latter is bounded from below. Since, by (3.16), d n is uniformly bounded from above and from below, we conclude that (3.17) g n (v) is uniformly bounded in v and n.
Let X n be the stationary process corresponding to f n constructed above.
We view X n as a random element of the space C(R) of continuous functions on R, which we endow with the metric of uniform convergence on compact intervals. Let µ n be the law of X n on C(R), n = 1, 2, . . . (restricted to n large enough, as needed).
By Property 1 and Property 2 of the processes X n and the lower bound in (3.16), these processes are uniformly bounded and equicontinuous. Therefore, by Theorem 7.3 in Billingsley (1999), for every fixed m = 1, 2, . . . the laws of the restrictions of the processes to [−m, m] form a tight sequence, and a diagonal argument yields a subsequence along which these laws converge weakly to consistent limits ν m , and hence to a limiting measure ν on C(R). Since each probability measure ν m is supported by C([−m, m]), the measure ν itself is supported by functions in C(R). By construction, the measure ν is shift invariant. If X is the canonical stochastic process defined on (C(R), ν), then X is a sample continuous stationary process. In the remainder of the proof we will show that X satisfies Assumption L and Assumption U T , and that f X,T = f.
We start with proving that Assumption L holds for X. It is, clearly, enough to prove that, on a set of probability 1, X does not have two local maxima within a suitably small distance θ > 0 of each other. Suppose, to the contrary, that with positive probability such a pair of local maxima exists in some time interval [−m, m]. For notational simplicity we will identify the converging subsequence with the entire sequence (X n ). By the Skorohod representation theorem (Theorem 6.7 in Billingsley (1999)), we may define the processes (X n ) on some probability space so that X n → X a.s. in C([−m, m]). Fix ω for which this convergence holds, and for which X has two local maxima closer than θ in the time interval [−m, m]. It is straightforward to check that the uniform convergence and Property 3 above imply that, for all n large enough, the process X n will have two local maxima closer than 5θ/4. This is, of course, impossible, due to Property 4 and (3.16). The resulting contradiction proves that X satisfies Assumption L.
Next, we prove that Assumption U T holds for X. Since the process X is stationary, the fact that Assumption U T holds will follow once we prove (3.19). We proceed similarly to the argument in the proof of Assumption L. We may assume that X n → X a.s. in C[0, T]. Fix ω for which this convergence holds.
The uniform convergence and Property 3 of the processes (X n ), together with the uniform upper bound on d n in (3.16), show that, for every local maximum t ω of X in the interval (0, T) and any δ > 0, there is n(ω, δ) such that for all n > n(ω, δ), the process X n has a local maximum in the interval (t ω − δ, t ω + δ). This immediately implies the corresponding statement for the random vector (M n 1 , M n 2 ), which is defined for the process X n in the same way as the random vector (M 1 , M 2 ) is defined for the process X, n = 1, 2, . . .. In particular, for any ε > 0, the relevant probabilities for X are controlled by those for X n . As a first step, notice that, by Property 4 of the processes (X n ), for ε small enough the two largest values cannot both be achieved at local maxima and be within ε of each other. Next, the probability that one of the two largest values is achieved at a local maximum and one at an endpoint, while they are within ε of each other, is at most c f ε, for some c f ∈ (0, ∞). Finally, we consider the case when both of the two largest values are achieved at the endpoints of the interval. In that case, it is impossible that X n has a local maximum in (0, T), since that would force time 0 to belong to one of the decreasing linear pieces of the process due to left blocks, and time T to belong to one of the increasing linear pieces of the process due to right blocks; by construction, the distance between any two points belonging to such intervals is larger than T. That forces (X n (t), 0 ≤ t ≤ T) to consist of at most two linear pieces. By Property 3 of the process X n , in order to achieve |X n (0) − X n (T)| ≤ ε, each block of the proper collection generating X n contributes at most an interval of length ε/ min( 1/(2∆ d n ), τ d n ) to the set of possible shifts U. Recall that there are m n blocks in the collection.
Combining (3.20), (3.21), (3.22) and (3.23), we see that the required bound holds for all ε > 0 small enough. Letting ε ↓ 0 we obtain (3.19), so that the process X satisfies Assumption U T . It is now a simple matter to finish the proof of the theorem. Assume, once again, that X n → X a.s. in C[0, T]. Fix ω for which this convergence holds, and for which both X and each X n have a unique supremum in the interval [0, T]. It follows from the uniform convergence that τ Xn,T → τ X,T as n → ∞. Therefore, we also have τ Xn,T ⇒ τ X,T (weakly). However, by construction, f n (t) → f(t) for every 0 < t < T. This implies that f is the density of τ X,T , and the proof of the theorem is complete.

Long intervals
In spite of the broad range of possibilities for the distribution of the supremum location shown in the previous section, it turns out that, when the length of an interval becomes large, and the process satisfies a certain strong mixing assumption, uniformity of the distribution of the supremum location becomes visible at certain scales. We make this statement precise in this section.
In this section we allow a stationary process X to have upper semi-continuous, not necessarily continuous, sample paths. Moreover, we will not generally impose either Assumption U T or Assumption L. Without Assumption U T the supremum may not be achieved at a unique point, so we will work with the leftmost supremum location defined in Section 2.
Recall that a stationary stochastic process X = (X(t), t ∈ R) is called strongly mixing (or α-mixing) if
α(t) := sup { |P(A ∩ B) − P(A)P(B)| : A ∈ σ(X(s), s ≤ 0), B ∈ σ(X(s), s ≥ t) } → 0 as t → ∞;
see e.g. Rosenblatt (1962), p. 195. Sufficient conditions on the spectral density of a stationary Gaussian process that guarantee strong mixing were established in Kolmogorov and Rozanov (1960).
Let X be an upper semi-continuous stationary process. We introduce a "tail version" of the strong mixing assumption, defined as follows.
Assumption TailSM: there is a function ϕ : (0, ∞) → R such that the part of the process above the level ϕ satisfies the strong mixing requirement above, in the asymptotic sense made precise in the proof below. It is clear that if a process is strongly mixing, then it also satisfies Assumption TailSM. The point of the latter assumption is that we are only interested in the mixing properties of the part of the process "responsible" for its large values. For example, the process X(t) = max( Y(t), Z(t) ), t ∈ R, where Y is a strongly mixing process such that P(Y(0) > 1) > 0, and Z is an arbitrary stationary process such that P(Z(0) < 1) = 1, does not have to be strongly mixing, but it clearly satisfies Assumption TailSM with ϕ ≡ 1.
We will impose one more assumption on the stationary processes we consider in this section. It deals with the size of the largest atom the distribution of the supremum of the process may have.
Assumption A: D(T) := sup x∈R P( X*(T) = x ) → 0 as T → ∞.
In Theorem 4.1 below Assumption A could be replaced by requiring Assumption U T for all T large enough. We have chosen Assumption A instead since for many important stationary stochastic processes the supremum distribution is known to be atomless anyway; see e.g. Ylvisaker (1965) for continuous Gaussian processes and Byczkowski and Samotij (1986) for certain stable processes. The following sufficient condition for Assumption A is also elementary: suppose that the process X is ergodic. If for some a ∈ R, P( sup t∈[0,1] X(t) = x ) = 0 for all x > a and P(X(0) > a) > 0, then Assumption A is satisfied.
Theorem 4.1. Let X = (X(t), t ∈ R) be a stationary sample upper semi-continuous process satisfying Assumption TailSM and Assumption A. The density f X,T of the supremum location satisfies
(4.1) lim T→∞ sup ε≤t≤1−ε | T f X,T (tT) − 1 | = 0
for every 0 < ε < 1/2. In particular, the law of τ X,T /T converges weakly to the uniform distribution on (0, 1).
Proof. It is obvious that (4.1) implies weak convergence of the law of τ X,T /T to the uniform distribution. We will, however, prove the weak convergence first, and then use it to derive (4.1).
We start with a useful claim that, while having nothing to do with any mixing by itself, will be useful for us in a subsequent application of Assumption TailSM. Let T n , d n ↑ ∞ with d n /T n → 0 as n → ∞. We claim that, for any δ ∈ (0, 1), the probability in (4.2) vanishes in the limit. To see this, simply note that, by (2.1), the probability in (4.2) is bounded from above by a quantity of order d n /T n , which vanishes as n → ∞.
The weak convergence stated in the theorem will follow once we prove that, for any rational number r ∈ (0, 1), we have P( τ X,T ≤ rT ) → r as T → ∞.
Let r = m/k, m, k ∈ N, m < k, be such a rational number. Consider T large enough so that T > k 2 , and partition the interval [0, T] into k subintervals C 0 , . . . , C k−1 of equal length; observe that, by (4.2), the behavior of the process near the endpoints of the subintervals may be neglected. Therefore, up to an error vanishing as T → ∞, the event {τ X,T ≤ rT} may be compared with the event that the largest of the truncated suprema over the first m subintervals dominates the rest. Let ϕ be the function given in Assumption TailSM, and set V i,T = sup t∈C i X(t) 1( X(t) > ϕ(√T) ), i = 0, 1, . . . , k − 1.
Denote by G T the distribution function of each one of the random variables V i,T , and let W i,T = G T (V i,T ), i = 0, 1, . . . , k − 1. Notice that, by Assumption TailSM, for every 0 < w i < 1, i = 0, 1, . . . , k − 1, the joint probabilities factorize in the limit. By Assumption A, D(T) → 0 as T → ∞. Since, for every 0 < w < 1, P( W i,T ≤ w ) → w, we conclude by (4.6) that the law of the random vector (W 0,T , . . . , W k−1,T ) converges weakly, as T → ∞, to the law of a random vector (U 0 , . . . , U k−1 ) with independent standard uniform components. Since this limiting law does not charge the boundary of the set {(w 0 , w 1 , . . . , w k−1 ) : max 0≤i≤m−1 w i ≤ max m≤i≤k−1 w i }, we conclude by (4.3), (4.4) and (4.5) that
P( τ X,T ≤ rT ) → P( max 0≤i≤m−1 U i ≥ max m≤i≤k−1 U i ) = m/k = r,
and so we have established the weak convergence claim of the theorem.
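The weak convergence τ X,T /T ⇒ Uniform(0, 1) is easy to see in simulation for a concrete strongly mixing process. The sketch below uses a stationary Gaussian AR(1) sequence (a discretized Ornstein-Uhlenbeck process; the correlation parameter and interval lengths are illustrative choices, not from the paper) and looks at the rescaled location of the maximum:

```python
import numpy as np

rng = np.random.default_rng(3)
n_sims, n_steps, rho = 2000, 1500, 0.95
# Stationary Gaussian AR(1): x[j] = rho*x[j-1] + z[j], a strongly mixing
# process; its stationary variance is 1/(1 - rho^2).
z = rng.normal(size=(n_steps, n_sims))
x = np.empty((n_steps, n_sims))
x[0] = rng.normal(size=n_sims) / np.sqrt(1 - rho**2)
for j in range(1, n_steps):
    x[j] = rho * x[j - 1] + z[j]

taus = np.argmax(x, axis=0) / n_steps  # rescaled supremum location
# for long intervals tau/T is close to uniform on (0, 1)
assert abs(taus.mean() - 0.5) < 0.03
assert abs(taus.var() - 1.0 / 12.0) < 0.02
```

On short intervals (n_steps comparable to the correlation length 1/(1 − ρ)) the distribution of the maximum location is visibly non-uniform, which is consistent with Section 3: uniformity is strictly a long-interval phenomenon.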
We now prove the uniform convergence of the densities in (4.1). Suppose that the latter fails for some 0 < ε < 1/2. There are two possibilities.
The first way (4.1) can fail is that there is 0 < θ < 1, a sequence T n → ∞ and a sequence t n ∈ [ε, 1 − ε] such that for every n, T n f X,Tn (t n T n ) ≥ 1 + θ; passing to a subsequence, we may assume that t n → t* ∈ [ε, 1 − ε]. Since t n → t*, there is a choice of 0 < τ < 1 such that
(4.8) 1 + θ > 1/(1 − τ)
and, moreover, the range in (4.7) is nonempty for all n large enough. Furthermore, we can find 0 < a < b < 1 such that 1 − (1 − τ)/t n < a < b < min( τ/t n , 1 ) for all n large enough. Therefore, for such n, a lower bound on P( τ X,Tn ∈ (aT n , bT n ) ) follows from (4.7); letting n → ∞ and comparing with the limit given by the already established weak convergence, this contradicts the choice (4.8) of τ.
The second way (4.1) can fail is that there is 0 < θ < 1, a sequence T n → ∞ and a sequence t n ∈ [ε, 1−ε] such that for every n, T n f X,Tn (t n T n ) ≤ 1−θ.
We can show that this option is impossible as well by appealing, once again, to Lemma 2.1 and using an argument nearly identical to the one described above. Therefore, (4.1) holds, and the proof of the theorem is complete.
The following corollary is an immediate conclusion of Theorem 4.1. It shows the uniformity of the limiting conditional distribution of the location of the supremum given that it belongs to a suitable subinterval of [0, T ].
Corollary 4.2. Let X = (X(t), t ∈ R) be a stationary sample upper semi-continuous process satisfying Assumption TailSM and Assumption A. Let 0 < a ≤ a′ < b′ ≤ b < 1, and put θ = (b′ − a′)/(b − a). Then
lim T→∞ P( τ X,T ∈ (a′T, b′T) | τ X,T ∈ (aT, bT) ) = θ.