The Pareto Record Frontier

For iid $d$-dimensional observations $X^{(1)}, X^{(2)}, \ldots$ with independent Exponential$(1)$ coordinates, consider the boundary (relative to the closed positive orthant), or "frontier", $F_n$ of the closed Pareto record-setting (RS) region \[ \mbox{RS}_n := \{0 \leq x \in {\mathbb R}^d: x \not\prec X^{(i)}\ \mbox{for all $1 \leq i \leq n$}\} \] at time $n$, where $0 \leq x$ means that $0 \leq x_j$ for $1 \leq j \leq d$ and $x \prec y$ means that $x_j<y_j$ for $1 \leq j \leq d$. With $x_+ := \sum_{j = 1}^d x_j$, let \[ F_n^- := \min\{x_+: x \in F_n\} \quad \mbox{and} \quad F_n^+ := \max\{x_+: x \in F_n\}, \] and define the width of $F_n$ as \[ W_n := F_n^+ - F_n^-. \] We describe typical and almost sure behavior of the processes $F^+$, $F^-$, and $W$. In particular, we show that $F^+_n \sim \ln n \sim F^-_n$ almost surely and that $W_n / \ln \ln n$ converges in probability to $d - 1$; and for $d \geq 2$ we show that, almost surely, the set of limit points of the sequence $W_n / \ln \ln n$ is the interval $[d - 1, d]$. We also obtain modifications of our results that are important in connection with efficient simulation of Pareto records. Let $T_m$ denote the time that the $m$th record is set. We show that $F^+_{T_m} \sim (d! m)^{1/d} \sim F^-_{T_m}$ almost surely and that $W_{T_m} / \ln m$ converges in probability to $1 - d^{-1}$; and for $d \geq 2$ we show that, almost surely, the sequence $W_{T_m} / \ln m$ has $\liminf$ equal to $1 - d^{-1}$ and $\limsup$ equal to $1$.

This paper is mainly about the stochastic process $(F_n)$, where $F_n$ is the boundary, or "frontier", for Pareto records (otherwise known as nondominated records or weak records; consult Definitions 1.1-1.2) in general dimension $d$ when the observations $X^{(1)}, X^{(2)}, \ldots$ are assumed (as they are throughout the paper) to be i.i.d. (independent and identically distributed) copies of a $d$-dimensional random vector $X$ with independent Exponential$(1)$ coordinates $X_j$.
The theoretical investigation leading to the results in this paper was spurred by empirical observations whose generation is discussed briefly in Section 5 (see especially Figure 3) and in detail in [5]; it began with the simple result of Theorem 1.4.
Notation: Throughout this paper we abbreviate the $k$th iterate of the natural logarithm $\ln$ by $L_k$ and $L_1$ by $L$, and we write $x_+ := \sum_{j=1}^d x_j$ for the sum of coordinates of the $d$-dimensional vector $x = (x_1, \ldots, x_d)$.
Unless otherwise specifically noted, all the results of this paper hold for any dimension $d \geq 1$.
1.1. Pareto records and the record-setting region. We begin with some definitions. Write $x \prec y$ (respectively, $x \leq y$) to mean that $x_j < y_j$ (resp., $x_j \leq y_j$) for $1 \leq j \leq d$. (We caution that, with this convention, $\leq$ is weaker than $\preceq$, the latter meaning "$\prec$ or $=$"; indeed, $(0, 0) \leq (0, 1)$ but we have neither $(0, 0) \prec (0, 1)$ nor $(0, 0) = (0, 1)$. This distinction will matter little in this paper, since the probability that any coordinate of an observation is repeated or vanishes is 0, but the distinction is important in [5].) The notation $x \succ y$ means $y \prec x$, and $x \geq y$ means $y \leq x$.

Definition 1.1. (a) We say that $X^{(k)}$ is a (Pareto) record (or that it sets a record at time $k$) if $X^{(k)} \not\prec X^{(i)}$ for all $1 \leq i < k$.
(b) If $1 \leq k \leq n$, we say that $X^{(k)}$ is a current record (or remaining record, or maximum) at time $n$ if $X^{(k)} \not\prec X^{(i)}$ for all $1 \leq i \leq n$.
(c) If $1 \leq k \leq n$, we say that $X^{(k)}$ is a broken record at time $n$ if it is a record but not a current record, that is, if $X^{(k)} \not\prec X^{(i)}$ for all $1 \leq i < k$ but $X^{(k)} \prec X^{(\ell)}$ for some $k < \ell \leq n$; in that case, the observation corresponding to the smallest such $\ell$ is said to break or kill the record $X^{(k)}$.
For $n \geq 1$ (or $n \geq 0$, with the obvious conventions) let $R_n$ denote the number of records $X^{(k)}$ with $1 \leq k \leq n$, let $r_n$ denote the number of remaining records at time $n$, and let $\beta_n := R_n - r_n$ denote the number of broken records. Note that $R_n$ and $\beta_n$ are nondecreasing in $n$, but the same is not true for $r_n$. For dimension $d \geq 2$, by standard consideration of concomitants [that is, by considering the $d$-dimensional sequence $X^{(1)}, \ldots, X^{(n)}$ sorted from largest to smallest value of (say) last coordinate] we see that $r_n(d)$ (that is, $r_n$ for dimension $d$, with similar notation used here for $R_n$) has, for each $n$, the same (univariate) distribution as $R_n(d-1)$; note, however, that the same equality in distribution does not hold for the stochastic processes $r(d)$ and $R(d-1)$.
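The counts just introduced can be computed directly from Definition 1.1. The following is a minimal sketch, assuming observations are stored as tuples in arrival order; the helper names `strictly_dominated`, `record_counts`, and `exponential_sample` are illustrative and do not come from the paper or from [5]. The quadratic-time scan is chosen for transparency, not efficiency.

```python
import random


def strictly_dominated(x, y):
    """Return True when x is strictly dominated by y, i.e. x_j < y_j for every j."""
    return all(a < b for a, b in zip(x, y))


def record_counts(points):
    """Return (R_n, r_n, beta_n) for observations listed in arrival order."""
    # A record at time k is an observation not strictly dominated by any earlier one.
    record_idx = [
        k for k, x in enumerate(points)
        if not any(strictly_dominated(x, points[i]) for i in range(k))
    ]
    # A current record is not strictly dominated by any observation up to time n.
    current = [
        k for k in record_idx
        if not any(strictly_dominated(points[k], y) for y in points)
    ]
    big_r, small_r = len(record_idx), len(current)
    return big_r, small_r, big_r - small_r


def exponential_sample(n, d, seed=0):
    """Draw n i.i.d. d-dimensional points with independent Exponential(1) coordinates."""
    rng = random.Random(seed)
    return [tuple(rng.expovariate(1.0) for _ in range(d)) for _ in range(n)]
```

For example, `record_counts(exponential_sample(200, 2))` returns one realization of $(R_{200}, r_{200}, \beta_{200})$ in dimension 2.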

Definition 1.2. (a) The record-setting region at time $n$ is the (random) closed set of points $\mathrm{RS}_n := \{x \in \mathbb{R}^d : 0 \leq x \not\prec X^{(i)} \mbox{ for all } 1 \leq i \leq n\}$.
(b) We call the (topological) boundary of $\mathrm{RS}_n$ (relative to the closed positive orthant determined by the origin) its frontier and denote it by $F_n$.
such that X (i) is a current record at time n}, and that the current records at time n all belong to RS n but lie on its frontier. Observe also that F n is a closed subset of RS n . Because this paper makes heavy use of the classical probabilistic notion of boundary-crossing probabilities, to avoid confusion we have chosen to use the term "frontier" for F n , rather than "boundary", in Definition 1.2(b).
1.2. The record-setting frontier. Our first result shows that deviations of the sum of coordinates for a generic current record at time $n$ from $L n$ are typically of constant order. Observe that the conditional distribution of $X^{(k)}_+$ given that $X^{(k)}$ is a current record at time $n$ doesn't depend on $k \in \{1, \ldots, n\}$; in particular, it's the conditional distribution of $X^{(n)}_+$ given that $X^{(n)}$ sets a record. Let $Y_n$ be a random variable with that distribution. Let $G$ denote a random variable with the standard Gumbel distribution (i.e., distribution function $x \mapsto e^{-e^{-x}}$, $x \in \mathbb{R}$), and write $\xrightarrow{\mathcal{L}}$ for convergence in law (i.e., in distribution).

Theorem 1.4. As $n \to \infty$ we have \[ Y_n - L n \xrightarrow{\mathcal{L}} G. \]

Proof. This is quite elementary. Let $p_n$ denote the probability that $X^{(n)}$ sets a record. Fix $n \geq 2$ for the moment. For $x \succ 0$ we have \[ P(X^{(n)} \mbox{ sets a record} \mid X^{(n)} = x) = (1 - e^{-x_+})^{n - 1}, \] and so the conditional density of $X^{(n)}$ given that it sets a record, namely $p_n^{-1} e^{-x_+} (1 - e^{-x_+})^{n - 1}$, depends on $x$ only through $x_+$. It follows that the density $f_n(y)$ of $Y_n$ satisfies \[ f_n(y) = p_n^{-1} \frac{y^{d - 1}}{(d - 1)!} e^{-y} (1 - e^{-y})^{n - 1}, \quad y > 0. \] Using the well-known asymptotic equivalence $p_n \sim n^{-1} (L n)^{d - 1} / (d - 1)!$ as $n \to \infty$ [see (4.5) below], it is easy to check that, for each fixed $z \in \mathbb{R}$, the density of $Y_n - L n$ at $z$ converges to the standard Gumbel density $e^{-z} e^{-e^{-z}}$ as $n \to \infty$. The claimed result thus follows from Scheffé's theorem (e.g., [4, Thm. 16.12]), which shows that there is in fact convergence in total variation.

This paper primarily concerns the stochastic process $(F_n)$, and specifically its "width" as defined next (see Figure 1).

Definition 1.5. Recall that $F_n$ denotes the frontier of $\mathrm{RS}_n$, and let \[ F^-_n := \min\{x_+ : x \in F_n\} \quad \mbox{and} \quad F^+_n := \max\{x_+ : x \in F_n\}. \qquad (1.1) \] We define the width of $F_n$ as \[ W_n := F^+_n - F^-_n. \qquad (1.2) \]

Very roughly put, what we will see in this paper is that, unlike $Y_n$ of Theorem 1.4, deviations of $F^+_n$ from $L n$ are exactly of order $L_2 n$; on the other hand, we will see that deviations of $F^-_n$ from $L n$ are of smaller order than $L_2 n$. It will follow that the width of the frontier is exactly of order $L_2 n$.
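The Gumbel limit for $Y_n - L n$ can be checked numerically. The sketch below assumes that the density of $Y_n$ is proportional to $y^{d-1} e^{-y} (1 - e^{-y})^{n-1} / (d-1)!$ on $y > 0$ (the Gamma$(d, 1)$ density of a coordinate-sum times the chance that none of $n - 1$ other observations strictly dominates the point); it normalizes that function by the trapezoid rule instead of using $p_n$, and reports the largest gap to the Gumbel density on a grid. The function names and grid parameters are illustrative choices.

```python
import math


def gumbel_density(z):
    """Standard Gumbel density e^{-z} exp(-e^{-z})."""
    return math.exp(-z - math.exp(-z))


def unnormalized_density(y, n, d):
    """y^(d-1)/(d-1)! * e^(-y) * (1 - e^(-y))^(n-1) for y > 0, else 0."""
    if y <= 0.0:
        return 0.0
    e = math.exp(-y)
    if e >= 1.0:  # guards log1p(-1) when y is positive but tiny
        return 0.0
    log_val = ((d - 1) * math.log(y) - math.lgamma(d) - y
               + (n - 1) * math.log1p(-e))
    return math.exp(log_val)


def max_gap_to_gumbel(n, d, lo=-20.0, hi=40.0, step=0.01):
    """Largest grid value of |density of Y_n - ln n at z  minus  Gumbel density at z|."""
    log_n = math.log(n)
    zs = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    vals = [unnormalized_density(z + log_n, n, d) for z in zs]
    # Trapezoid rule; the integral plays the role of the record probability p_n.
    total = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return max(abs(v / total - gumbel_density(z)) for z, v in zip(zs, vals))
```

Under these assumptions the gap shrinks as $n$ grows, but only at a logarithmic rate when $d \geq 2$, which is consistent with the slow convergence seen elsewhere in the paper.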
We next make some simple observations about the quantities appearing in Definition 1.5 that will prove fundamentally useful to our development. Lemma 1.6 (characterization of F + n ). We have Proof. The current records at time n all belong to F n , and broken records and non-records all have coordinate-sums (strictly) smaller than some current record. Thus F + n ≥ max{X (k) Let e j = (0, . . . , 0, 1, 0, . . . , 0) denote the jth coordinate vector. We claim that the points Y (j) := X (i j ) e j with j = 1, . . . , d all belong to F n (in fact, to F n ∩ RS n ), and then the inequality is immediate. To prove the claim, note that all of the points Y (j) belong to RS n [because Y (b) Over the event {r n ≥ m}, F − n is certainly at most the mth-largest sum of coordinates of remaining records, which is in turn at most B m,n .
(c) The asserted monotonicity is clear for the bounding processes. The asserted monotonicity of F − follows easily from the observation that F n+1 ⊆ RS n+1 ⊆ RS n .
It seems difficult to study the processes F + and F − bivariately, so we draw all our conclusions about the width process W by studying F + and F − univariately (that is, separately) and using W = F + − F − . The behavior of F + is well known from classical extreme value theory and is reviewed in Section 2. Conclusions about F − will be drawn from (i) the upper-bounding processes in Lemma 1.7(a)-(b) together with classical extreme value theory for those bounding processes and (ii) a rather nontrivial lower bound developed in Section 3.
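In dimension $d = 2$ the quantities $F^+_n$, $F^-_n$, and $W_n$ can be computed from the current records alone. The sketch below assumes no ties among coordinates (true with probability 1 here) and uses a staircase description of the planar frontier that is our own illustration, not a statement from the paper: with current records $(a_1, b_1), \ldots, (a_r, b_r)$ sorted so that $a_1 < \cdots < a_r$ and $b_1 > \cdots > b_r$, the minimum of $x_+$ over the frontier is attained at one of the inner corners $(a_i, b_{i+1})$, $0 \leq i \leq r$, with the conventions $a_0 = 0$ and $b_{r+1} = 0$. The maximum is the largest coordinate-sum of a current record.

```python
def current_records_2d(points):
    """Current records (maxima) of planar points, sorted by increasing first coordinate."""
    recs = []
    best_y = float("-inf")
    # Scan from largest x to smallest; keep points whose y beats every y seen so far.
    for x, y in sorted(points, reverse=True):
        if y > best_y:
            recs.append((x, y))
            best_y = y
    recs.reverse()
    return recs


def frontier_extremes_2d(points):
    """Return (F_plus, F_minus, W) for planar observations without ties."""
    recs = current_records_2d(points)      # a_1 < ... < a_r and b_1 > ... > b_r
    f_plus = max(a + b for a, b in recs)
    xs = [0.0] + [a for a, _ in recs]      # a_0 = 0, a_1, ..., a_r
    ys = [b for _, b in recs] + [0.0]      # b_1, ..., b_r, b_{r+1} = 0
    f_minus = min(a + b for a, b in zip(xs, ys))  # inner corners (a_i, b_{i+1})
    return f_plus, f_minus, f_plus - f_minus
```

With a single observation $(2, 5)$ the frontier runs from $(0, 5)$ to $(2, 5)$ to $(2, 0)$, so the sketch returns $F^+ = 7$, $F^- = 2$, and $W = 5$.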

1.3. Main results. We next present the main results of our paper. What the results show, in various precise senses, is that F + n and F − n both concentrate near L n, with deviations that are O(L 2 n), from which it follows of course that W n = O(L 2 n). But for d ≥ 2 we show more, namely, that L 2 n is the exact scale for W n , that is, that W n = Θ(L 2 n). We can even narrow things down further: W n / L 2 n → d − 1 in probability for each d ≥ 1, with an almost sure lim inf equal to d − 1 and an almost sure lim sup equal to d.
Here are our main results for arbitrary but fixed dimension d ≥ 1. We consider both convergence in probability (typical behavior) and almost sure largest and smallest deviations from L n (top and bottom boundary-behavior, respectively) for large n. Theorem 1.8 (Kiefer [7]). Consider the process F + defined at (1.1).
(a) Typical behavior of F + : (b) Almost sure behavior for F + : Remark 1.10. In fact, one can show rather simply from Corollary 1.9(b) and the fact that F + has nondecreasing sample paths that the set (call it Λ) of limit points of the sequence (F + n − L n)/ L 2 n is almost surely the closed interval [d − 1, d]. Here is a sketch of the proof. The set Λ is closed, so we need only show that Λ is dense in [d − 1, d], which clearly follows if we can show that the roughly stated idea being that then (a.s.) the sequence (F + n − L n)/ L 2 n "can't leap downward over any interval i.o." in its infinitely many downward moves from its lim sup to its lim inf. To prove (1.3), we first bound F + n+1 from below by F + n , then express the resulting difference with a common denominator, and finally use the consequence F + n ∼ L n a.s. of Corollary 1.9(b) to find as n → ∞.
Our results for F − show that the deviations of F − n from L n are almost surely negligible on a scale of L 2 n. Theorem 1.12. Consider the process F − defined at (1.1).
(a) Typical behavior of F − : (c1) A bottom outer boundary for F − on the scale of L 3 n: (c2) A bottom inner boundary for F − on the scale of L 3 n: Theorem 1.12 gives rise immediately to the following succinct corollary.
(a) Typical behavior of F − : We come now to our main focus, the process W . The results in Theorem 1.14 follow directly from Corollaries 1.9 and 1.13. Theorem 1.14. Consider the process W defined at (1.2).
(a) Typical behavior of W : and, in particular, W n = Θ(L 2 n) a.s.

Remark 1.15. (a) When d = 1, at each time n ≥ 1 there is exactly one current record, F + n = F − n is the value of that record, RS n is the closed interval [F + n , ∞), and W n = 0.
(b) Using Remark 1.10, Theorem 1.14(b) can be strengthened to the conclusion that the set of limit points of the sequence W n / L 2 n is almost surely the interval [d − 1, d].
(c) Theorem 1.14(b) has the following immediate corollary. If, for some positive integer d 0 , processes W (d) corresponding to dimension d, d = d 0 , d 0 + 1, . . . , are defined on a common probability space (regardless of any dependence among the processes), then That is, roughly speaking, for time n large relative to large dimension d, the width W n (d) almost surely concentrates near (d − 1) L 2 n.
(d) We could have used d in the denominators of (1.4), but we chose d − 1 because of Theorem 1.14(a). A remark of a somewhat similar flavor as (b) for convergence in probability is the following. If, for some integer d 0 ≥ 2, processes W (d) corresponding to dimension d, d = 2, . . . , d 0 , are defined on a common probability space (regardless of any dependence among the processes), then We have not investigated whether this result might extend to dimension d 0 growing with n.

1.4. Outline of paper. The stochastic process F + is studied in Section 2, where we prove Theorem 1.8. We treat the process F − in Section 3, where we prove Theorem 1.12. In Section 4 we assess asymptotic behavior of the record counts R n , r n , and β n introduced following Definition 1.1 as preparation for Section 5, where we produce versions of our main results concerning the record-setting frontier process F when time is measured in the number of records (rather than observations X (i) ) generated.

2. The process F +
This section is devoted to the proof of Theorem 1.8 concerning the process $F^+$ defined at (1.1). In light of the characterization provided by Lemma 1.6, Theorem 1.8 follows from results of [7]. Kiefer is concerned with behavior of the law of the iterated logarithm type for the empirical distribution function and sample $p_n$-quantiles for a sequence of independent uniform$(0, 1)$ random variables, with $p_n > 0$ and $p_n \downarrow 0$, but notes that his results "may easily be translated into results for general laws." Since we are concerned here with a sequence $X^{(1)}_+, X^{(2)}_+, \ldots$ from the Gamma$(d, 1)$ distribution and with (only) the $p_n = 1/n$ upper quantile, for completeness and the reader's convenience we distill Kiefer's proof(s) for our special case.
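To see where the scale $L_2 n$ comes from, one can compute the $1/n$ upper quantile of the Gamma$(d, 1)$ distribution deterministically. The sketch below assumes integer $d \geq 1$, uses the closed form $P(\mathrm{Gamma}(d, 1) > q) = e^{-q} \sum_{k < d} q^k / k!$, and works with $\ln n$ directly so that astronomically large $n$ poses no problem; the function names and the bisection tolerance are illustrative choices.

```python
import math


def log_survival_gamma(q, d):
    """ln P(Gamma(d, 1) > q) = -q + ln(sum_{k<d} q^k / k!) for integer d >= 1."""
    total, term = 0.0, 1.0
    for k in range(d):
        total += term
        term *= q / (k + 1)
    return -q + math.log(total)


def upper_quantile(log_n, d, tol=1e-6):
    """Solve P(Gamma(d, 1) > q) = 1/n for q by bisection, given log_n = ln n."""
    lo, hi = 0.0, log_n + 1.0
    while log_survival_gamma(hi, d) > -log_n:   # enlarge until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_survival_gamma(mid, d) > -log_n:  # survival still too large: go right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scaled_quantile_shift(log_n, d):
    """(q_n - ln n) / ln ln n, which should approach d - 1 for large n."""
    return (upper_quantile(log_n, d) - log_n) / math.log(log_n)
```

For $d = 3$ and $\ln n = 10^6$ the scaled shift is already close to $d - 1 = 2$, in line with the typical behavior of $F^+$ described in Section 1.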
Proof of Theorem 1.8. (a) This is elementary. We have Kiefer describes two proofs. The first proof observes, for any sequence b n → ∞ which is ultimately monotone nondecreasing, that o.} and applies the Borel-Cantelli lemmas to the sequence of independent events {X (n) The second proof exploits the nondecreasingness of the sample paths of the process F + · = B 1,· noted in Lemma 1.7 and proceeds as follows. If (b n ) is ultimately monotone nondecreasing and (n j ) is any strictly increasing sequence of positive integers, then are independent. Now choose b n ≡ L n + c L 2 n and n j ≡ 2 j and apply the Borel-Cantelli lemmas.
(c) For the case c < 0 of outer-class bottom boundaries, we start with the observation that if (b n ) is ultimately monotone nondecreasing and (n j ) is any strictly increasing sequence of positive integers, then and n j ≡ ⌊e |c|j/2 ⌋ and apply the first Borel-Cantelli lemma.
For the case c ≥ 0 of inner-class bottom boundaries, we start with the observation that if (b n ) is ultimately monotone nondecreasing and (n j ) is any strictly increasing sequence of positive integers, then, recalling the definition (2.1), and n j ≡ ⌊e αj L j ⌋ with α > 1 and apply the first Borel-Cantelli lemma to the events {F + n j > b n j+1 } and the second Borel-Cantelli lemma to the independent events {F + n j ,n j+1 ≤ b n j+1 }.
3. The process F −
3.1. Towards a stochastic lower bound on F − n . To prove Theorem 1.12 we need a stochastic lower bound on F − n to complement the upper bound of Lemma 1.7. For this we use the definitions of the frontier F n and the closed record-setting region RS n to argue as follows. For x ∈ R d , let The difficulty with upper-bounding the probability of this event is of course that the last union is uncountable. In the next subsection we produce a geometric lemma whose application effectively bounds the uncountable union by a finite union.

3.2. A geometric lemma. Consider the (uncountable) union of positive orthants whose vertices lie on the hyperplane where m ≥ d − 1 is an integer. We can also form a finite union of positive orthants whose vertices lie on the hyperplane x + = 2m − (d − 1) situated a bit further from the origin. Our key geometric lemma guarantees that the uncountable union contains the finite union (see Figure 2).
and so by finite subadditivity where the last inequality holds assuming that b = b n = (1 + o(1)) L n as n → ∞. We summarize and simplify the bound we have derived in the next proposition, where we assume further that L n − b n → ∞. The bound is the key to the proof of the first assertion in Theorem 1.12(a) and of Theorem 1.12(c1).
Proof of Theorem 1.12(a). The second assertion in Theorem 1.12(a) follows from the case d = 1 of Theorem 1.8(a) since, according to Lemma 1.7(a), we have where we recall the definition The first assertion follows from part (c1), proved next.
Proof of Theorem 1.12(c1). As noted in Lemma 1.7, the process F − has nondecreasing sample paths. From this it follows that if (b n ) is (ultimately) monotone nondecreasing and (n j ) is any strictly increasing sequence of positive integers, then To complete the proof, we choose b n ≡ L n − 3 L 3 n and n j ≡ 2 j , bound P(F − n j ≤ b n j+1 ) using Proposition 3.2, and apply the first Borel-Cantelli lemma.
Here are the details. Since $L n_j = j\, L 2$ and $b_{n_{j+1}} = (j + 1)\, L 2 - 3\, L_2[(j + 1)\, L 2] = j\, L 2 - (1 + o(1))\, 3\, L_2 j$, the hypotheses of Proposition 3.2 are met and which is summable.

Remark 3.3. We chose the constant 3 as the coefficient of $-L_3 n$ in parts (a) and (c1) of Theorem 1.12 for convenience. As the proof shows, we could have used any constant larger than 2.
Proof of Theorem 1.12(c2). This follows immediately from the case d = 1 of Theorem 1.8(c) using the aforementioned bound (3.5).
There remains only the proof of Theorem 1.12(b). For that we need first the following almost sure lower bound on r n , which is of interest in its own right.

4. Record counts
Knowledge about the record counts R n , r n , and β n discussed in Section 1 is interesting in its own right, and knowledge about R n will be needed in the next section.

4.1. Typical behavior. In this subsection we review a known central limit theorem (CLT) of Berry-Esseen type for r n and use it to derive easily CLTs for R n and β n . Here are the results. Complicated but explicit forms are known for the constants γ d,j appearing in the variance expressions.
Then the number R n of records set through time n satisfies Then the number β n = R n − r n of broken records at time n satisfies For d = 1, part (c) follows from part (b) because r n = 1 for n ≥ 1. For d ≥ 2, part (c) follows from parts (a) and (b); for Var β n we use the triangle inequality for L 2 -norm after centering by means, and for the CLT we use the CLT of part (b) together with Slutsky's theorem.
We have not attempted to find further terms in the asymptotic expansion for Var β n nor a Berry-Esseen theorem for β n .
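The first-order growth $E R_n \sim (L n)^d / d!$ used in this section can be illustrated exactly in dimension 2. The sketch below relies on two standard facts that are assumptions as far as this text is concerned: $E R_n = \sum_{k \leq n} p_k$, where $p_k$ is the probability that the $k$th observation sets a record, and $p_k = H_k / k$ in dimension 2, with $H_k$ the $k$th harmonic number. The function names are illustrative.

```python
import math


def expected_records_2d(n):
    """E R_n in dimension 2, computed as sum_{k<=n} H_k / k."""
    h = 0.0        # running harmonic number H_k
    total = 0.0
    for k in range(1, n + 1):
        h += 1.0 / k
        total += h / k
    return total


def ratio_to_leading_term(n):
    """E R_n divided by the leading term (ln n)^2 / 2!."""
    return expected_records_2d(n) / (math.log(n) ** 2 / 2.0)
```

The ratio decreases toward 1 only slowly (it is still above 1.1 at $n = 10^5$), a reminder that lower-order terms matter at any feasible sample size.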

4.2. Almost sure behavior. We next establish a sufficient condition for a top boundary for the absolute centered process (|R n − E R n |) to be of outer class, and derive from that condition strong-law concentration for R about its mean function. We also establish analogous results for the processes β and r.
(a) If ε > 0, then As a consequence, As a consequence, As a consequence, if d ≥ 5 then r n / E r n → 1 a.s.
Proof. (a) Since E R n ∼ (L n) d /d! by Theorem 4.1(b), the second assertion is indeed an immediate consequence of the first. To prove the first assertion, we establish P R n ≥ E R n + (L n) To prove (4.1) we exploit the nondecreasingness of the sample paths of the process R. If (b n ) is ultimately monotone nondecreasing and (n j ) is any strictly increasing sequence of positive integers, then 4 +ǫ (which is clearly nondecreasing) and n j ≡ ⌊e j 2/d ⌋. Observe for large j that L n j = j 2/d + O(e −j 2/d ), and hence from Theorem 4.1(b) that Observe also that As a consequence of these two observations, Further, from Theorem 4.1(b) we have Hence, by Chebyshev's inequality, which is summable. The first Borel-Cantelli lemma now implies that P(R n j+1 ≥ b n j i.o.(j)) = 0, and then (4.3) yields the desired (4.1). The proof of (4.2) is similar and again uses the nondecreasingness of the sample paths of R. If (b n ) is ultimately monotone nondecreasing and (n j ) is any strictly increasing sequence of positive integers, then 3d 4 +ǫ and, again, n j ≡ ⌊e j 2/d ⌋. The sequence (b n ) is ultimately monotone nondecreasing because it is known (e.g., [3]) that while also provided ǫ < d/4 (which we may assume without loss of generality), whence Proceeding as for (4.1), by Chebyshev's inequality we have which is summable. The first Borel-Cantelli lemma now implies that and then (4.4) yields the desired (4.2).
(b) For d = 1, part (b) follows from part (a) because r n = 1 for n ≥ 1, so we assume d ≥ 2. The sample paths of β, like those of R, are nondecreasing. Thus, just as part (a) is proved using the mean and variance results from Theorem 4.1(b), one can prove part (b) using the mean and variance results from Theorem 4.1(c). A key technical detail in establishing the analogue of (4.2) for the process β is this analogue of (4.5) [which follows immediately from (4.5) by use of concomitants]:
(c) We obtain part (c) by subtraction from parts (a)-(b): This gives the first assertion. Since E r n ∼ (L n) d−1 /(d − 1)! by Theorem 4.1(a), the second assertion is indeed an immediate consequence of the first provided 3d/4 < d − 1, i.e., d ≥ 5. In dimension d = 2 we can come close to (4.6), or at least to showing that r n = Θ(L n) a.s. Indeed, we can combine the representation of the distribution of r n as a Poisson-binomial sum with a Chernoff bound and the first Borel-Cantelli lemma to show that r n = O(L n) a.s., and Theorem 3.4 gives r n = Ω((L n)/(L 2 n)) a.s.

5. Time change
It is natural to wonder about the appearance of the record-setting frontier (even in dimension 2) when many observations, or (equivalently) many records, have been generated. Figure 3 displays the record-setting frontier for one trial after 10,000 bivariate records had been generated, at which point results such as those in Section 1 suggest themselves. According to Theorem 4.1(b) [or Proposition 5.1(a2)], had this been done naively, by generating observations $X^{(i)}$ and waiting for new records to be set, it would have taken roughly $10^{61}$ observations to obtain 10,000 records. Instead, only the records were generated, using the importance-sampling scheme described and analyzed in [5].

Figure 3. Record frontier F 10,000 after 10,000 records generated using the importance-sampling algorithm described in [5].
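The figure of roughly $10^{61}$ observations can be reproduced with a one-line calculation. The sketch below assumes only the first-order relation $E R_n \sim (L n)^d / d!$ and solves $(\ln n)^d / d! = m$ for $n$, ignoring lower-order terms and fluctuations; the function name is illustrative.

```python
import math


def log10_observations_needed(m, d):
    """Approximate log10 of the n solving (ln n)^d / d! = m."""
    log_n = (math.factorial(d) * m) ** (1.0 / d)   # ln n = (d! m)^(1/d)
    return log_n / math.log(10.0)
```

For $m = 10{,}000$ and $d = 2$ this gives $\ln n = \sqrt{20{,}000} \approx 141.4$, that is, $n \approx 10^{61.4}$; the same quantity $(d!\, m)^{1/d}$ is the first-order value of $F^{\pm}_{T_m}$ stated in the abstract.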
The record-setting region process $(\mathrm{RS}_n)$, and therefore also the frontier process $(F_n)$ we have studied in earlier sections, is adapted to the natural filtration for the process $C = (C_n)_{n \geq 0}$, where $C_n = (C^{(1)}_n, \ldots, C^{(r_n)}_n)$ is the $r_n$-tuple of remaining records at time $n$ in order of creation. Let $T_0 = 0$, and for $m \geq 1$ let $T_m$ denote the $m$th record-creation epoch; note that $C$ remains constant over each of the time-intervals $[T_{m-1}, T_m)$, $m \geq 1$. Fill and Naiman [5] don't simulate the i.i.d. observations process $X^{(1)}, X^{(2)}, \ldots$ (that is, they don't work in "observations-time"), but rather simulate the process $\widetilde{C} = (\widetilde{C}_m)_{m \geq 0}$, where $\widetilde{C}_m := C_{T_m}$ [and hence the processes $(\widetilde{\mathrm{RS}}_m := \mathrm{RS}_{T_m})$ and $(\widetilde{F}_m := F_{T_m})$] (that is, they work in "records-time"). The following goal thus naturally arises: Translate results about $C$ to results about $\widetilde{C}$.
The keys to doing so are (i) monotonicity of the sample paths of various processes of interest (such as $F^+$ and $F^-$) and (ii) the switching relation \[ \{T_m \leq n\} = \{R_n \geq m\}. \qquad (5.1) \] The switching relation enables us to obtain information about the record-creation times $T_m$ from the records-counts Theorems 4.1(b) and 4.2(a). The following proposition is not the most elaborate result which can be obtained in such fashion, but it will suffice for our purposes. Proof. Fix $d \geq 1$.
(a) Given ε > 0, by the switching relation (5.1) and Theorem 4.1(b) we have
(a2) If d = 2, then the same calculations show that for any real x we have yielding the claimed CLT, since from [3], $\gamma_{3,0} = \frac{\pi^2}{6} + \frac{1}{2}$.
(a1) If d = 1, then the same calculations show that for any real x we have yielding the claimed CLT, since $\gamma_{2,0} = 1$.
Indeed, (5.3) follows immediately from (4.6) by substitution of T m for n and use of Proposition 5.1(b1). To sketch a proof of the converse, consider the ratio on the left in (4.6) for T m ≤ n < T m+1 . For the numerator of the ratio, note that r n = r Tm . Use T m ≤ n < T m+1 in the denominator to get upper and lower bounds on the ratio, and then use Proposition 5.1(b1) to relate the upper and lower bounds on the ratio in (4.6) to the ratio in (5.3).
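The switching relation (5.1) used in the proof above says that the $m$th record has been set by time $n$ exactly when at least $m$ records have been counted through time $n$. The sketch below checks this sample-path identity on simulated data; the function names, the sample size, and the seed are illustrative assumptions, and the quadratic-time record scan is chosen for clarity only.

```python
import random


def record_times(points):
    """Return [T_1, T_2, ...]: the 1-based times at which Pareto records are set."""
    times = []
    for k, x in enumerate(points):
        dominated = any(
            all(a < b for a, b in zip(x, points[i])) for i in range(k)
        )
        if not dominated:
            times.append(k + 1)
    return times


def check_switching(points):
    """Verify {T_m <= n} = {R_n >= m} for every n and m within the sample."""
    times = record_times(points)
    for n in range(1, len(points) + 1):
        r_count = sum(1 for t in times if t <= n)           # R_n
        for m in range(1, len(times) + 2):
            # T_m is unobserved (hence > n) when fewer than m records occur in the sample.
            t_m_le_n = m <= len(times) and times[m - 1] <= n
            if t_m_le_n != (r_count >= m):
                return False
    return True


rng = random.Random(1)
sample = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(150)]
```

Calling `check_switching(sample)` confirms the identity on this path; because the relation is deterministic, it holds for every realization, and this is what lets statements about $R_n$ be converted into statements about $T_m$.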
We can now translate results of Section 1 from observations-time to records-time (the main goal of this section being to translate Theorem 1.14 about frontier width in this fashion), but [because of the limitation of Proposition 5.1(b2)] we only know how to translate some of our almost sure results when d ≥ 5.
(a2) If d ≥ 3 we have the following convergence in law to Gumbel: (b) Almost sure behavior for F + : (b1) For any d ≥ 1 we have Proof. (a2) Assume that d ≥ 3 and let . Given x ∈ R and ǫ > 0, we will show that and a similar proof establishes P( G m ≤ x) ≤ P(G ≤ x + ǫ) + o(1). Letting m → ∞ and then ǫ ↓ 0 completes the proof of (a2), and (a1) is a simple consequence. We now prove (5.4). By Proposition 5.1(a3) and nondecreasingness of the sample paths of F + , we have Thus, making use of Theorem 1.8(a), we arrive at as desired.
(b1) By Corollary 1.9(b) and Proposition 5.1(b1), the following asymptotic equivalences hold a.s.:  We come finally to our main focus of this section, the process W .