Vertices of the least concave majorant of Brownian motion with parabolic drift

It was shown in Groeneboom (1983) that the least concave majorant of one-sided Brownian motion without drift can be characterized by a jump process with independent increments, which is the inverse of the process of slopes of the least concave majorant. This result can be used to prove the result of Sparre Andersen (1954) that the number of vertices of the least concave majorant of the empirical distribution function of a sample of size n from the uniform distribution on [0,1] is asymptotically normal, with an asymptotic expectation and variance which are both of order log n. A similar (Markovian) inverse jump process was introduced in Groeneboom (1989), in an analysis of the least concave majorant of two-sided Brownian motion with a parabolic drift. This process is quite different from the process for one-sided Brownian motion without drift: the number of vertices with slopes in a given interval has an expectation proportional to the length of the interval, and the variance of the number of vertices in such an interval is about half the size of the expectation as the length of the interval tends to infinity. We prove an asymptotic normality result for the number of vertices in an increasing interval, which translates into a corresponding result for the least concave majorant of the empirical distribution function of a sample of size n, generated by a strictly concave distribution function. In this case the number of vertices is of order n^{1/3}, and the variance is again about half the size of the asymptotic expectation. As a side result we obtain some interesting relations between the first moments of the number of vertices, the square of the location of the maximum of Brownian motion minus a parabola, the value of the maximum itself, the squared slope of the least concave majorant at zero, and the value of the least concave majorant at zero.


Introduction
It was shown in [3] that one-sided Brownian motion can be generated by a jump process with independent increments (which is the inverse of the process of slopes of the least concave majorant) together with Brownian excursions between successive vertices of the least concave majorant. This decomposition of Brownian motion, using the inverse process, was also analyzed in [17], where certain path decomposition results, introduced by David Williams, were applied.
The study of the (least) concave majorant of Brownian motion in [3] was actually motivated by the wish to give an alternative derivation of the asymptotic distribution of certain statistics, studied in [1] and [19], and first proved in [10], where the asymptotic distribution was found by analyzing the spacings induced by the least concave majorant of the empirical distribution function.
As a side effect, [3] also threw some new light on a result of [20], which is stated below. A straightforward proof of this result, using characteristic functions and the Poisson representation in [10], is given in [9]. The corresponding result for the Brownian bridge on [0, 1] follows from [3]. If one looks more closely at "where the action is", in the sense of where the number of vertices increases to infinity, it turns out that all the action is near 0 and 1: on an interval [ε, 1 − ε], where ε > 0, there will, with probability one, only be finitely many vertices.
The situation is strikingly different for the least concave majorant of two-sided Brownian motion minus a parabola. Here the action is "the same everywhere", and the point process of locations of vertices is stationary. The real purpose of the papers [4] and [6] was to analyze the (stationary) process

V(a) = argmax_{t ∈ R} {W(t) − (t − a)^2},  a ∈ R,  (1.1)

where W is standard two-sided Brownian motion, originating from zero. The process V itself is a pure jump process, which runs through the locations of the vertices of the least concave majorant. At points where V(a) is not uniquely determined (which happens if a is a slope of the least concave majorant), we take the largest value t at which W(t) − (t − a)^2 is maximal. In this way the process V becomes right-continuous. The infinitesimal generator of the process (1.1) is given in Theorem 4.1 of [6], where it is expressed in terms of Airy functions. However, most attention has gone to the result on the distribution of V(0), which gave an analytic expression for the limit distribution of a whole class of so-called "isotonic estimators", for example the pointwise limit distribution of an estimator of the mode, discussed in [2], and the pointwise limit behavior of the Grenander (maximum likelihood) estimator of a decreasing density, see, e.g., [18] and [4]. For results on global functionals, however, like the L_1 or L_2 distance of the Grenander estimator to the underlying density, or the number of its jumps, one needs information on the whole process V, and not only on its pointwise behavior. In this paper we will show how one can extract information from Theorem 4.1 in [6] in the derivation of a central limit theorem for the number of points of jump of V in an increasing interval.
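The definition of V can be illustrated numerically. The following sketch (not from the paper; the grid resolution, horizon, and seed are arbitrary choices) discretizes two-sided Brownian motion and evaluates the right-continuous argmax process on a grid of a-values, confirming that V is a nondecreasing pure jump process:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize two-sided Brownian motion W on [-T, T], originating from zero.
T, dt = 20.0, 0.01
t = np.arange(-T, T + dt / 2, dt)
n0 = int(np.argmin(np.abs(t)))               # index of the point closest to t = 0
steps = rng.normal(0.0, np.sqrt(dt), size=t.size)
W = np.empty_like(t)
W[n0] = 0.0
W[n0 + 1:] = np.cumsum(steps[n0 + 1:])       # right half: sums of forward increments
W[:n0] = np.cumsum(steps[:n0][::-1])[::-1]   # left half: sums of backward increments

def V(a):
    """Largest t maximizing W(t) - (t - a)^2 (the right-continuous version)."""
    y = W - (t - a) ** 2
    return t[y.size - 1 - int(np.argmax(y[::-1]))]   # argmax of reversal = last maximizer

a_grid = np.linspace(-5.0, 5.0, 2001)
path = np.array([V(a) for a in a_grid])
jumps = int(np.count_nonzero(np.diff(path) > 0))
print("V nondecreasing:", bool(np.all(np.diff(path) >= 0)))
print("jumps of V on [-5, 5]:", jumps)
```

That the discretized V is nondecreasing follows from the standard argmax monotonicity argument: writing W(t) − (t − a)^2 = (W(t) − t^2) + 2at − a^2, the (rightmost) maximizer of h(t) + 2at is nondecreasing in a.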
The result has a rather large number of applications in statistics, but we will only sketch one such result for the number of points of jump of the Grenander estimator (which is equivalent to the corresponding result for the number of vertices of the least concave majorant).
Our main result is the following central limit theorem (Theorem 1.3), which is proved at the end of Section 2.
Perhaps somewhat remarkably, the difference between the results for the least concave majorants of one-sided Brownian motion without drift and of two-sided Brownian motion with a parabolic drift has its counterpart in the difference between the convex hulls of uniform samples of points from the interior of a convex polygon and from the interior of a convex figure with a smooth boundary, see [5]. In that setting one also meets the rates log n and n^{1/3} for the number of vertices of the convex hulls of the samples, with corresponding central limit results.

The number of jumps of the process V in an increasing interval
Although [6] has the simpler conceptual characterization, the characterization in [4] might be more useful for numerical computations. It was also used in [11], where Chernoff's density and its moments were computed. Chernoff's density is the density of V(0), which often occurs as the limit of isotonic estimators, and in particular in the limit distribution of the Grenander estimator. Here, however, we take [6] as our starting point.
The process (1.1) is completely characterized by Theorem 4.1 of [6]. As a corollary we have the following result for the jump measure.
where g has Fourier transform ĝ, given by (2.2), and where p_0 has Laplace transform p̂_0.

Remark 2.1 Note that we define the Fourier transform in the "probabilistic way", in analogy with the definition of the characteristic function of a probability distribution.
A picture of the function g, using the representation which follows from (2.2), is shown in Figure 2.
The function p has the representation (2.3), where the ã_n are the zeros of the Airy function Ai on the negative halfline, see (4.12) in [6]. This expansion is divergent at zero, however, where the local behavior is different; see part (ii) of Lemma 4.2 in [6]. This is the reason for considering the regularization, and for only using the representation (2.3) for u ≥ 1. If u < 1 we use the representation given below in (2.11) of Lemma 2.2. The function u → u^{3/2} p(u) is shown in Figure 3.
For later purposes we summarize in the following lemma some properties of the functions p and g and of the random variable V(0). This also gives yet another regularization of the function p.

Lemma 2.1
(i) The function p_1 has Fourier transform p̂_1.
(ii) The function h has Fourier transform ĥ.
(iii) The random variable V(0) has characteristic function (2.9), t ∈ R.
(iv) The random variable V(0) has expectation zero and second moment (2.10).

Proof. Part (i): this follows immediately from Theorem 2.1, noting that the function x → 1/√(2πx) has Laplace transform 1/√(2u), and by switching from the Laplace transform to the Fourier transform. Part (ii): this follows from (i) and Theorem 2.1, by noting that the convolution turns into the product of the Fourier transforms (with an added change of sign for the Fourier transform of p_1). Part (iii): this is the Fourier transform of the density of V(0), which turns into the convolution of the Fourier transform of x → g(−x) and the Fourier transform of g. Part (iv) follows from the formula (2.10). 2

Remark 2.2 The function p does not have the same meaning in [4]. If we denote the function p of [4] by p̃, then p̃ has an expansion which converges at zero.

Lemma 2.2
The function p can be written in the form (2.11), where the coefficients a_k and b_k are defined recursively as follows: set c_0 = 1, a_0 = 1, b_1 = 2/3, and, with B(p, q) ≡ Γ(p)Γ(q)/Γ(p + q) the standard Beta function, define the remaining coefficients by the recursion given in Theorem 4.2 of [4].

Proof. This follows from Remark 2.2 and Theorem 4.2 in [4]. 2

Let u_2 : R → R be defined by (2.14); here Ai′ is the derivative of the Airy function Ai. The notation u_2 is used because u_2 has the interpretation (2.15), where u(t, x) is the solution of the heat equation. This function occurred in the paper [2], where the density of the location of the maximum of Brownian motion minus a parabola was first characterized.
The following result summarizes the correspondence between the results in [4] and [6].
Theorem 2.2
(i) where u_2 is defined by (2.14) and (2.15) and p̃ by (2.11).
(ii) where the functions g and p are defined as in Theorem 2.1.
(iii) The density of V(0) is given in terms of g, where g is defined as in Theorem 2.1.

Remark 2.3
In [11], part (iii) of Theorem 2.2 was used in the computation of the Chernoff distribution. The function u_2 corresponds to the function k_1 in [6], and part (i) of Theorem 2.2 corresponds to the first version of the infinitesimal generator of the process V(a) − a, given in Theorem 4.1 of that paper. It is seen from Theorem 2.2 and (2.14) that the function p (or, alternatively, the regularization p̃) is the fundamental function; both the jump measure and the density of V(0) are expressed in terms of p.
We are now ready to compute the expectation of the number of jumps of the process V in an interval (of slopes) [a, b], where we use techniques similar to those in [5], which dealt with convex hulls of samples of points from the interior of a convex set in the plane.
Let the function φ be defined by (2.16); the corresponding centered process is a martingale w.r.t. the filtration generated by {V(b) : b ≥ a}. As a consequence, we have the following result.
Proof. From Theorem 2.1, using the stationarity of the process, the result follows. The constant k_1 was determined numerically by using Theorem 2.2. 2

Remark 2.4 As one of the referees remarks, Fourier analysis, applied to the right-hand side of (2.18), gives a representation of k_1 in terms of the maximum of W(t) − t^2, where W is standard two-sided Brownian motion, originating from zero, and V(0) is defined as in (1.1), for a = 0. This follows from Lemma 2.1, since we get a corresponding identity by Parseval's formula, where we use integration by parts and the Airy equation for the term involving Ai, as in the proof of part (iv) of Lemma 2.1. This also gives an interesting relation between the moments of the location of the maximum and the moments of the maximum itself. By (1.7) and (2.5) of [13], this is in accordance with the value given in Lemma 2.3. The integral representation of type (2.19) above for the maximum of W(t) − t^2 corresponds to (2.1) in their paper (after replacing W(t) − t^2 by W(t) − ½t^2; see also Remark 2.5 below). The value EV(0)^2 was computed in [11], where it is given as 0.26355964 (note that this value also appears in part (iv) of Lemma 2.1), and this gives k_1 = 2.10848 again. For convenience, we state this in a separate lemma.
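The two quoted numbers can be cross-checked with a one-line computation. The factor 8 below is inferred from the quoted values themselves, not taken from the (missing) display, and should be treated as an assumption:

```python
# E V(0)^2 as computed in [11] (also part (iv) of Lemma 2.1)
EV2 = 0.26355964
# hypothesis: k_1 = 8 * E V(0)^2 -- the factor 8 is inferred from the
# quoted value k_1 = 2.10848, not reproduced from the missing display
k1 = 8 * EV2
print(round(k1, 5))   # → 2.10848
```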
Remark 2.5 Brownian scaling implies that, if we define k_1(1) = k_1 and k_2(1) = k_2, and denote the corresponding constants for the process t → W(t) − ct^2 by k_1(c) and k_2(c), then k_1(c) = c^{2/3} k_1(1) and k_2(c) = c^{2/3} k_2(1). Relation (2.19) changes accordingly.

Remark 2.6 As also pointed out by one of the referees, [15] represent the constant k_1(1/2), where k_1(1/2) is defined as in Remark 2.5, in their Corollary 4 as the sum of two constants, where X(t) = W(t) − ½t^2 and X̂ is the greatest concave majorant of X, with slope X̂′(0) at zero. (I switch here from the convex minorants of Brownian motion plus a parabola to the concave majorants of Brownian motion minus this parabola; this gives the same k_1.) [15] give simulation results for X(t) = W(t) − ½t^2, which would imply that k_1(1/2) ≈ 1.289. We get from Remark 2.5: k_1(1/2) = 2^{−2/3} k_1(1) ≈ 1.32826, which is larger than the value arising from the simulations in [15], indicating that it is very hard to obtain precise values of these constants by direct simulation of Brownian motion. Denoting the least concave majorant of the process t → W(t) − ct^2 by X̂_c, and using a notation similar to that of Remark 2.5, we would get a corresponding relation, which is indeed compatible with the scaling above.

Remark 2.7
There exists a simple relation between V_c(0) and X̂′_c(0), where V_c is defined as in Remark 2.5 and X̂′_c(0) is defined as in Remark 2.6. From the so-called "switch relation" we get that X̂′_c(0) and 2cV_c(0) have the same distribution, and hence their moments coincide up to the corresponding powers of 2c. Combining the preceding remarks, and in particular assuming that (2.23) (or (2.22)) holds, we obtain the relations between first moments announced in the abstract. The asymptotic behavior of the functions p and φ is given in the following lemma.
Lemma 2.5 We have:
where ã_1 is the first zero of the Airy function Ai on the negative halfline.
(iii) The function g is denoted by g_1 in [6], and hence, according to part (i) of Corollary 3.4 in [6], where the ã_n are the zeros of the Airy function on the negative halfline. We now have, using part (i) of Corollary 3.4 and part (ii) of Lemma 4.2 in [6], if t < 0, and the first part of (iii) now follows. Using Laplace's method and part (ii) of Corollary 3.4 in [6], we find:

Proof. We have: and hence: Moreover, using an obvious time reversal argument, also used in [5], we get:

Note: We prove in the sequel that the dependence between φ(−V(c_1) + c_1) and φ(V(c_2) − c_2) dies out exponentially fast as c_2 − c_1 → ∞, which, together with part (iii) of Lemma 2.5, implies that the covariance of these two quantities also dies out exponentially fast. Hence we get: var(N[a, b])/(b − a) → k_2, as b − a → ∞, for a constant k_2 ≥ 0. There are several ways in which one could try to determine the constant k_2. One possible approach is to use an integro-differential equation, following a suggestion on p. 546 of [4]. Let the function k(a, t) be defined by

k(a, t) = E[φ(V(0)) | V(a) = t],  a ≤ 0.
Then we have, for a < 0: Hence we get: which leads to the integral equation. Using this approach, the constant k_2 was approximated numerically, on a grid with stepsize 10^{−3} in both coordinates on the interval [−10, 10], using the boundary condition k(0, t) = φ(t), t ∈ R, and replacing integrals by Riemann sums. In this way we obtained a value which would give k_2 ≈ 0.986. However, since the numerical computations seemed somewhat unstable, we have more trust in the value obtained by simulating the vertex process directly, without first generating Brownian motion, in the way described below. One could also try to determine an approximate value of the constant k_2 by simulating Brownian motion directly. However, since one needs very long intervals (or, alternatively, a rescaling which also leads to very computation-intensive simulation), it is doubtful that one gets a good approximation in this way. See also the discussion in Remark 2.6 on the constant k_1(1/2), obtained by simulating Brownian motion directly in [15], which gave the value 1.289, while the analytically determined value is 1.32826.
We can use Theorem 2.1 to generate the process {V (a) : a ∈ R} without first generating Brownian motion. This method of generating the vertices was also used in [16] and [7], for generating the vertices of convex hulls of Poisson processes of points in the plane, and seemed to work rather well in that situation.
We start the process at time zero, by generating V(0) according to the "Chernoffian" density f_{V(0)}, where g is defined as in Theorem 2.1. Suppose this gives V(0) = x. Next we generate the waiting time until a jump according to the distribution function determined by φ(u), the integrated jump measure, starting from position u. Suppose that this gives the jump time a > 0. Then we generate a jump according to the jump density. This gives a new position y, from which we generate a waiting time, according to the corresponding distribution function, which gives a new jump time b > a, from which we generate a jump according to the jump density, and so on.

We indeed used this procedure to generate the process V. Instead of the jump lengths themselves we generated the square roots of the jump lengths, which have a bounded density, in contrast with the jump lengths, whose density is unbounded near zero. This enabled us to generate the square roots of the jump lengths by rejection sampling, since we can compute the density from the theory above (note, however, that this density depends on the value of x − a, so we get a family of densities, parametrized by x − a).
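The rejection-sampling step can be sketched as follows. The true density of the square root of a jump length (parametrized by x − a, from Theorem 2.1) is not reproduced here; the density below, proportional to exp(−u^3) on (0, ∞), is a hypothetical stand-in with the same qualitative feature: it is bounded, while the induced density of the jump length itself blows up like s^{−1/2} near zero:

```python
import numpy as np

rng = np.random.default_rng(1)

def rejection_sample(target, bound, support, size, rng):
    """Sample from an (unnormalized) density bounded by `bound` on `support`."""
    lo, hi = support
    out = np.empty(size)
    filled = 0
    while filled < size:
        u = rng.uniform(lo, hi, size=size)
        keep = u[rng.uniform(0.0, bound, size=size) < target(u)]
        take = min(keep.size, size - filled)
        out[filled:filled + take] = keep[:take]
        filled += take
    return out

# Hypothetical stand-in for the density of the *square root* U of a jump
# length: f_U(u) ∝ exp(-u^3), bounded on (0, ∞), whereas the induced
# jump-length density ∝ s^{-1/2} exp(-s^{3/2}) is unbounded near zero,
# mirroring the qualitative picture described in the text.
target = lambda u: np.exp(-u ** 3)

roots = rejection_sample(target, bound=1.0, support=(0.0, 4.0), size=10_000, rng=rng)
jump_lengths = roots ** 2
print("mean jump length under the placeholder model:", jump_lengths.mean())
```

Sampling the square root and squaring afterwards avoids ever evaluating the unbounded jump-length density, which is exactly why the text works with square roots.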
The waiting times between jumps can be generated using the following observations. Note that, for a uniform random variable U, the waiting time can be written as Φ_x^{−1}(W), where W is standard exponentially distributed and Φ_x^{−1} is the inverse of the integrated jump intensity Φ_x. Hence the waiting times between jumps can be generated by generating the random variables Φ^{−1}_{x−a}(W), where W has a standard exponential distribution; Φ^{−1}_{x−a} was computed on an equidistant grid, with distance 10^{−3} between successive points of the grid, and with linear interpolation between points of the grid. In this way we found, in 10,000 simulations, where a ran through the interval [0, 10^4]: k_1 ≈ 2.1082 (note that this is very close to the analytically determined value 2.10848) and k_2 ≈ 1.029.
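The inversion step can be sketched as follows. Φ below is a hypothetical stand-in for the integrated jump intensity Φ_{x−a} of Theorem 2.4 (the choice Φ(s) = s + s^2/2 is arbitrary, but increasing with Φ(0) = 0); the grid stepsize 10^{−3} and linear interpolation mirror the description in the text:

```python
import numpy as np

rng = np.random.default_rng(2)

# Waiting time S until the next jump satisfies P(S > s) = exp(-Phi(s)),
# so S = Phi^{-1}(E) with E standard exponential.  Phi is a hypothetical
# stand-in for the integrated jump intensity Phi_{x-a} of Theorem 2.4.
grid = np.arange(0.0, 10.0 + 1e-3, 1e-3)     # stepsize 1e-3, as in the text
Phi = grid + 0.5 * grid ** 2                 # increasing, Phi(0) = 0

def sample_waiting_time(rng):
    E = rng.exponential()
    # invert Phi by linear interpolation (x and y roles swapped in np.interp)
    return np.interp(E, Phi, grid)

times = np.array([sample_waiting_time(rng) for _ in range(20_000)])
# for this placeholder, P(S > 1) = exp(-1 - 1/2) can be checked directly
emp = float((times > 1.0).mean())
print("empirical P(S > 1):", emp, "  exact:", float(np.exp(-1.5)))
```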
The alternative characterization of the jump process, used in the simulations, is given in the following theorem.

Theorem 2.4
The process {V(a) : a ∈ R} is a Markovian pure jump process, where the jump density at time a, given V(a−) = x, and the distribution function of the waiting time till the next jump, given V(a) = x, are as follows.
Remark 2.8 By part (iii) of Lemma 2.5, we have: This yields, for fixed x, a ∈ R: This is in accordance with part (iii) of Corollary 3.4 in [6].
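The exponential rates appearing here are governed by ã_1, the first zero of the Airy function Ai on the negative halfline. Its numerical value, and those of the further zeros ã_n, are easy to obtain with standard software (a small check, using SciPy's Airy routines):

```python
from scipy.special import ai_zeros, airy

# first five zeros of Ai on the negative halfline; a[0] = ã_1 ≈ -2.33811
a, _, _, _ = ai_zeros(5)
print(a)
for z in a:
    Ai_z = airy(z)[0]          # airy returns (Ai, Ai', Bi, Bi')
    assert abs(Ai_z) < 1e-8    # sanity check: Ai vanishes at the computed zeros
```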
We summarize our findings on the variance in the following lemma.

Lemma 2.6 where φ is defined by (2.16). The value of the constant k_2 was determined by simulating the vertex process directly, in the way described above, using Theorem 2.4, in 10^4 runs on the interval [0, 10^4].
For the central limit result, we also need the following lemma.

Lemma 2.7
The process {V(a) : a ∈ R} is strongly mixing, with a strong mixing function that decays exponentially, for a constant c > 0. More specifically, for arbitrary a ∈ R we have:

Proof. The proof proceeds along similar lines as the proof of Theorem 3.3 in [8]. Consider, for a_1, . . . , a_k ≤ a and b_ℓ ≥ · · · ≥ b_1 ≥ a + d, the events determined by Borel sets A_1, . . . , A_k and B_1, . . . , B_ℓ. Define M, and consider the events E_1 and E_2. By monotonicity, the event E_1 only depends on the increments of Brownian motion before time a + M, and the event E_2 only depends on the increments of Brownian motion after time a + d − M. By the definition of M and the independent increments property of Brownian motion, this implies that the events E_1 and E_2 are independent. Furthermore, by Corollary 3.4 of [6] we get an exponential bound, where, as before, Ai is the Airy function and ã_1 its largest zero on the negative halfline. The probability that E_2 differs from the corresponding original event can be handled in a similar way. Hence we get an exponential bound for a constant c > 0, and the result follows. 2

The following lemma shows that all moments of N[a, b] exist, for all b > a.

Lemma 2.8 We have:
for all l > 0 and all b > a.
Proof. By stationarity it is sufficient to prove this for N[0, a]. For l > 0, the process is a martingale w.r.t. the filtration {F_a : a ≥ 0}, for each a ≥ 0. Moreover, the required bound holds for all positive a and l, by part (iii) of Lemma 2.5 and part (iii) of Corollary 3.4 of [6].

We are now ready to prove our main result.

The jumps of the Grenander estimator
As an application of the results in Section 2, we now discuss how these results can be used to derive the asymptotic normality of the number of jumps of the Grenander estimator f̂_n of a strictly decreasing density f_0 on [0, M], M > 0. Note that the number of jumps of the Grenander estimator is the same as the number of segments of the least concave majorant of the empirical distribution function, which, in turn, is the same as the number of vertices of the least concave majorant minus one. To keep the length of the present paper within reasonable bounds, we only give a sketch of the proof. Full details will be given elsewhere.
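To make the objects concrete: the following sketch (not from the paper; the sampling density f_0(t) ∝ e^{−t} on [0, 1] is an arbitrary choice with density and derivative bounded away from zero) computes the least concave majorant of the empirical distribution function as the upper convex hull of the points (X_(i), i/n), with (0, 0) prepended, and counts its segments, i.e. the jumps of the Grenander estimator; the count grows like n^{1/3}:

```python
import numpy as np

rng = np.random.default_rng(3)

def grenander_jump_count(x):
    """Jumps of the Grenander estimator = segments of the least concave
    majorant of the empirical df = vertices of the majorant minus one."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))      # points (X_(i), i/n), with (0, 0) prepended
    py = np.arange(n + 1) / n
    hull = [0]                           # indices of the vertices of the upper hull
    for i in range(1, n + 1):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            cross = (px[k] - px[j]) * (py[i] - py[j]) - (py[k] - py[j]) * (px[i] - px[j])
            if cross >= 0:               # vertex k lies on or below chord j -> i: drop it
                hull.pop()
            else:
                break
        hull.append(i)
    return len(hull) - 1                 # number of linear segments

# sample from f_0(t) = e^{-t} / (1 - e^{-1}) on [0, 1] via inverse-cdf sampling
n = 10_000
u = rng.uniform(size=n)
sample = -np.log(1.0 - u * (1.0 - np.exp(-1.0)))
njumps = grenander_jump_count(sample)
print("jumps:", njumps, "  n^(1/3) ≈", round(n ** (1 / 3), 1))
```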
We have the following result.
Lemma 3.1 Let N_n be the number of jumps of f̂_n, where f̂_n is the Grenander estimator, based on a sample of size n from f_0, and where f_0 is a decreasing continuous density which stays away from zero on its support [0, M], with a continuous derivative f_0′ which also stays away from zero on [0, M] (one-sided derivatives are taken at the endpoints). Then: where the constants k_1 and k_2 are defined as in Theorem 1.3.

Sketch of proof. Note that, as on p. 542 of [4], we can locally introduce a process V_n, where g_0 is the inverse of f_0, F_n is the empirical distribution function, and a = f_0(t) for an interior point t of the support of f_0. The process V_n is the (local) inverse of the slope process. As noted in [4], the process V_n converges in distribution, in the Skorohod topology, to the process V, where V is the process of locations of maxima discussed in Section 2 (the constant c_1 has an extra factor 2, to obtain V(s) instead of V(½s) in the limit). The jumps of the limiting process form a stationary, locally finite point process, implying that the number of jumps of the process V_n on an interval [b, c] converges in distribution to the number of jumps of V on the same interval. Hence, defining Ṽ_n accordingly, we get that the number of jumps of Ṽ_n on an interval [b, c] converges in distribution to the number of jumps of V on an interval of corresponding length; Ṽ_n has, on an interval [b, c], the same number of jumps as the process U_n on the interval [a + bn^{−1/3}, a + cn^{−1/3}].
We can strengthen this argument somewhat (full details will be given elsewhere), using a Poissonization argument together with a strong approximation result of [14], to show that the expectation and variance of the number of jumps of U_n on an interval [a − n^{−1/3} log n, a + n^{−1/3} log n] are of order 2k_1 c_1^{−1} n^{1/3} log n and 2k_2 c_1^{−1} n^{1/3} log n, respectively, where k_1 and k_2 are defined as in Theorem 1.3.
Partitioning the interval [f_0(M), f_0(0)] into K_n intervals of length of order 2n^{−1/3} log n, with midpoints a_j, we get the stated convergence, where N(0, σ^2) is a normal distribution with expectation zero and variance σ^2. Although the proof proceeds along similar lines as the proof of Theorem 1.3 in Section 2, the embedding into Brownian motion needs some careful attention, and therefore the details of the proof will be given elsewhere.

Concluding remarks
In the preceding, a central limit result was proved for the number of vertices in an increasing interval of the concave majorant of the process {W(t) − t^2 : t ∈ R}, where W is two-sided standard Brownian motion, originating from zero. The central limit result involves two constants k_1 and k_2, for the mean and variance, respectively; see Theorem 1.3. Much less is known about the constant k_2. We used a direct simulation of the vertex process to determine this constant, but there is room for improvement here. The approximate value we found is close to 1; our preliminary value is k_2 = 1.029. The basis for simulating the vertex process directly, without first generating Brownian motion, is given in Theorem 2.4, which gives the distribution of the (non-exponential) waiting times between jumps of the process V, as a function of a and x, where V(a) = x, and the density of the sizes of the jumps. The square roots of the jump lengths were generated by rejection sampling, and the waiting times between jumps by generating standard exponential random variables and applying to these the inverse of the cumulative hazard function of the waiting time distribution (again parametrized by x and a).
A similar technique of generating vertices of a convex hull was used in [16] and [7], where convex hulls of random points in the plane were studied. The behavior of the least concave majorant of Brownian motion minus a parabola has some remarkable analogies with the behavior of the convex hulls of points chosen uniformly from the interior of a circle, where we also get central limit theorems for the number of vertices, with an expectation and variance which are both of order n^{1/3}, if n is the number of points chosen. On the other hand, the behavior of the concave majorant of one-sided Brownian motion without drift has analogies with the behavior of the convex hulls of points drawn uniformly from a convex polygon, where we get central limit theorems for the number of vertices with an asymptotic expectation and variance which are both of order log n, if n is the number of points chosen, as shown in [5].
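The analogy is easy to see in simulation. The sketch below (not from [5] or [16]; the sample size and shapes are arbitrary choices) counts hull vertices for uniform samples from a disk (smooth boundary, vertex count of order n^{1/3}) and from a square (a convex polygon, vertex count of order log n):

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(4)
n = 20_000

# uniform points in the unit disk (smooth boundary): hull has ~ n^(1/3) vertices
r = np.sqrt(rng.uniform(size=n))
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
disk = np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# uniform points in the unit square (a convex polygon): hull has ~ log n vertices
square = rng.uniform(size=(n, 2))

d = len(ConvexHull(disk).vertices)
s = len(ConvexHull(square).vertices)
print("disk hull vertices:", d, "  (n^(1/3) =", round(n ** (1 / 3), 1), ")")
print("square hull vertices:", s, "  (log n =", round(np.log(n), 1), ")")
```

The gap between the two counts already at this moderate n reflects the same log n versus n^{1/3} dichotomy as for the concave majorants above.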