LIMIT THEOREMS FOR THE NUMBER OF MAXIMA IN RANDOM SAMPLES FROM PLANAR REGIONS

We prove that the number of maximal points in a random sample taken uniformly and independently from a convex polygon is asymptotically normal in the sense of convergence in distribution. Many new results for other planar regions are also derived. In particular, precise Poisson approximation results are given for the number of maxima in regions bounded above by a nondecreasing curve.


Introduction
A point $p_1 = (x_1, y_1)$ is said to dominate another point $p_2 = (x_2, y_2)$ if $x_1 \ge x_2$ and $y_1 \ge y_2$. For notational convenience, we write this as $p_1 \succ p_2$. The maxima (or maximal points) of a sample of points are those points dominated by no other point in the sample. In this paper we investigate distributional properties of the number of maximal points in a random sample taken uniformly and independently from a given planar region.
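To make the definition concrete, the following minimal Python sketch (not taken from the paper) extracts the maxima of a finite planar sample in two ways: directly from the definition, and by a sort-and-sweep pass. The function names are illustrative, and the points are assumed to be distinct.

```python
def dominates(p, q):
    """Return True if p dominates q: both coordinates of p are >= those of q."""
    return p[0] >= q[0] and p[1] >= q[1]


def maxima_brute_force(points):
    """Maxima straight from the definition: points dominated by no other point."""
    return [p for p in points
            if not any(q != p and dominates(q, p) for q in points)]


def maxima_sweep(points):
    """Maxima by sorting on x (descending) and sweeping.

    A point is maximal exactly when its y exceeds every y seen so far,
    that is, the y of every point lying to its right.
    Assumes the points are distinct.
    """
    result = []
    best_y = float("-inf")
    for p in sorted(points, key=lambda p: (-p[0], -p[1])):
        if p[1] > best_y:
            result.append(p)
            best_y = p[1]
    return result


sample = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
print(sorted(maxima_sweep(sample)))  # [(1, 5), (3, 4), (4, 1)]
```

The sweep costs one sort, whereas the brute-force version compares all pairs; both return the same set for distinct points.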
Such a dominance relation (not restricted to planar points), known more frequently as Pareto optimality, is an extremely useful notion in diverse fields ranging from economics to mechanics, from social sciences to algorithmics; see for example Karlin (1959), Bühlmann (1970), Leitman and Marzollo (1975), Preparata and Shamos (1985), Devroye (1986), Steuer (1986), Stadler (1988), Harsanyi (1988), Statnikov and Matusov (1995). It is one of the most natural order relations in multivariate observations because of the lack of intrinsic total order relations. Other terms like "efficiency" in econometrics, "noninferiority" in control, and "admissibility" in statistics are all similar notions. One also finds a similar dominance relation used in a card game called "Russian poker." From a probabilistic point of view, we do not distinguish in this paper between > and ≥, a key issue, however, in other fields. The following quote from the review by J. Stoer and J. Zowe in AMS Mathematical Review (MR: 50 #3928) of Zeleny's book [39] typically describes the general situation encountered in theory and practice:
"One of the more pertinent criticisms of traditional decision-making theory and practice is directed against the approximation of multiple goal behavior by a single technical criterion. In a realistic model of a technical or economical optimization problem it will be impossible to tie all the given criteria into a single function, which could serve as objective function for an associated mathematical programming problem. It will be more appropriate to handle such problems as problems with a vector-valued objective function. Instead of finding an optimal solution of a single objective function, the problem is now one of locating the set of all nondominated points (also called efficient points or Pareto-optimal points)."
We mention some concrete applications as follows.
1. Finding the maxima of a sample is a prototype problem with many algorithmic and practical applications. Many geometric and graph-theoretic problems can be formulated as maxima-finding problems, including the problem of computing the minimum independent dominating set in a permutation graph, the related problem of finding the shortest maximal increasing subsequence, the problem of enumerating restricted empty rectangles, and the related problem of computing the largest empty rectangle. Also, the d-dimensional maxima-finding problem is equivalent to the enclosure problem for planar d-gons (where the corresponding sides are parallel); the latter problem in turn has several applications in CAD systems for VLSI circuits; see [16,26,33] for details and references.
Let C be a given measurable region and let $M_n(C)$ denote the number of maxima in a random sample of n points taken uniformly and independently from C. Almost all results in the literature are concerned with the mean value of $M_n(C)$. Distributional results are rare, the most studied case being $C = [0, 1]^2$, for which the problem reduces to the number of records in iid (independent and identically distributed, here and throughout this paper) sequences of continuous random variables; see Baryshnikov (1987, 2000). From this connection, our study may also be regarded as another line of extensions of the theory of records; see Rényi (1962), Barndorff-Nielsen and Sobel (1966), Arnold et al. (1998). The main contribution of this paper is to provide means of establishing the central limit theorem for $M_n(C)$ when C is a convex polygon and when C is some region bounded above by a nondecreasing curve. Indeed, we derive precise Poisson approximation results in the second case (by computing the asymptotics of the total variation and Fortet-Mourier distances). A by-product of our central limit theorems gives the asymptotics of the variance. Many other auxiliary and structural results are also derived.
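The reduction to records in the unit square can be illustrated numerically. The sketch below is an illustration only: it scans the sample from the largest x-coordinate downwards and counts the records of the resulting sequence of y-values, then compares the simulated mean with the harmonic number $H_n$, which is the classical expected number of records among n iid continuous variables. The sample size, number of trials and seed are arbitrary choices.

```python
import random


def count_records(values):
    """Count strict upper records (running maxima) in a sequence."""
    count, best = 0, float("-inf")
    for v in values:
        if v > best:
            count += 1
            best = v
    return count


def maxima_count_unit_square(n, rng):
    """Number of maxima among n uniform points in [0,1]^2, counted as records."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    pts.sort(key=lambda p: -p[0])  # scan from the largest x down
    return count_records(y for _, y in pts)


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / j for j in range(1, n + 1))


rng = random.Random(2024)  # fixed seed so the run is reproducible
n, trials = 500, 1000
mean = sum(maxima_count_unit_square(n, rng) for _ in range(trials)) / trials
print(f"simulated mean {mean:.3f}, harmonic number {harmonic(n):.3f}")
```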

Results
Known results for $M_n(C)$ when $C = [0, 1]^d$ can be found in Bai et al. (1998), Devroye (1999) and the references therein.
When C is a convex polygon P, Golin (1993) gave the following "gap theorem": $E(M_n(P))$ is either of order $n^{1/2}$, or of order log n, or bounded, and no other scales are possible.
On the other hand, Devroye (1993) showed that if where f is nonincreasing and is either convex, or concave, or Lipschitz (of order 1), then E(M n (C)) ∼ π 0 n 1/2 , where π 0 = π 2 1/2 1 Note that by Cauchy-Schwarz inequality, it is easily deduced that the right-hand side of π 0 reaches its maximal value when C is a right triangle with decreasing hypotenuse of the shape @ @ ; compare Dwyer (1990).
Our first result shows in a precise way that in the case of a convex polygon P,
$$E(M_n(P)) = \pi_1 n^{1/2} + \pi_2 \log n + \pi_3 + O(n^{-1/2}),$$
where $\pi_1$ is essentially Devroye's ("discretized") constant, $\pi_2$ can assume only the values 0, 1/2 and 1, and $\pi_3$ is a complicated constant; see (1.3), (1.6), (3.9) and (3.12) for explicit expressions of these constants. In words, if we move a horizontal line from ∞ downwards, then $I_u$ denotes the intersection of this line and P when they first meet; likewise, $I_r$ denotes the intersection of a vertical line moving from ∞ leftwards and P when they first meet.
The upper-right part of P is defined as where $\min I_u := \min\{x : (x, y^*) \in P\}$ and $\min I_r := \min\{y : (x^*, y) \in P\}$. Note that the upper-right part can be either a point, or one or two lines, or a region with nonzero measure. Also note that this definition applies to any other planar region.
Define $s_1, s_2, \ldots, s_\nu$ to be the line segments on the upper-right part of P which bridge $I_u$ and $I_r$ when $I_u \cap I_r = \emptyset$, where ∅ denotes the empty set. Also let $\theta_j$ be the angle formed by $s_j$ and a horizontal line for j = 1, ..., ν; see Figure 1 for an illustration. Let N(0, 1) denote a normal random variable with zero mean and unit variance. Let |P| and $|s_j|$ denote the area of P and the length of $s_j$, respectively.
If $\pi_1 = 0$ and $\pi_2 > 0$ then
$$\mathrm{Var}(M_n(P)) \sim \pi_2 \log n, \qquad (1.8)$$
and
$$\frac{M_n(P) - \pi_2 \log n}{(\pi_2 \log n)^{1/2}} \stackrel{d}{\longrightarrow} N(0, 1). \qquad (1.9)$$
The proof of (1.7) consists in first splitting the upper-right part of P that lies in P into right triangles (except possibly the first and the last). Thus the asymptotic normality of $M_n(P)$ is reduced to that of $M_n(T)$, where T is the right triangle with corners (0, 0), (0, 1) and (1, 0). It turns out that in this specific case, $M_n(T)$ enjoys many interesting properties; details are examined in the next section. Originally, our proof for the asymptotic normality of $M_n(T)$ proceeded by computing the third and fourth central moments, which was very laborious; see Bai et al. (2000) for details. The proof given here uses the method of moments, and, because of a better manipulation of the recurrence, the result is stronger (convergence of all moments) and the proof is much shorter. The proof of Theorem 1, except (1.8) and (1.9), is then given in Section 3. The case when $I_u \cap I_r \ne \emptyset$ is essentially a special case of Theorem 2; a sketch of proof is given in Section 6. Note that in the special case when P is the unit square, the result (1.9) can be easily proved by checking Lyapunov's condition.
Our proof actually covers the case when the underlying region P is not convex, provided that the upper-right part of P can be split into a finite union of triangles and rectangles; see Section 3.2.
A comparison of (1.3) with the result for the expected number of points on the convex hull of n iid points chosen uniformly from P by Rényi and Sulanke (1963) shows that there are considerably more maxima than hull points in a random sample when I u ∩ I r = ∅ (the former can be used to approximate the latter; see Devroye, 1980). For more information on related results, see Buchta and Reitzner (1997) and the references given there.
The surprising factor 1/2 in (1.3) and (1.5) suggests several new questions like "Why is it 1/2?", "Is this factor universal?", and "Since this factor depends (from (1.5)) only on the boundary of the upper-right part of P, what happens for general regions?" It is to answer these questions that we study the following two problems. First, if we compare (1.2) and (1.3), it is then natural to consider the convergence rate of Devroye's result (1.2). Devroye (1993) observed that the continuous part of f contributes the $n^{1/2}$ term (as in the expression of $\pi_0$), while the discontinuous points contribute O(log n) maximal points on average. We show that if f is piecewise twice continuously differentiable then the next dominant term in (1.2) is asymptotic to c log n, where c can be explicitly computed in terms of the local behavior of f near "critical points," that is, points at which $|f'| = 0$, or $|f'| = \infty$, or $f'$ does not exist. This result thus connects the critical points and the log n term in the asymptotic expression of $E(M_n)$ in a quantitatively precise way. The idea is to break f at all critical points and then to sum the expected numbers of maxima (scaled by their area proportions) in the smaller regions with smoother boundaries (see Figure 8), the hard part being the determination of the coefficient of the log term. This problem is discussed in Section 4.
Second, we consider $E(M_n(C))$ when C is bounded by two nondecreasing curves. We show that if the rates at which the two functions tend to their values at unity are the same, then $E(M_n(C))$ is asymptotic to a constant; otherwise, it is asymptotic to c log n for some constant c. In particular, if one curve is linear and the other is either a vertical or a horizontal line, then c = 1/2. This consideration thus partly explains the constant 1/2 in a deeper way.
The last problem we consider is the distribution of M n (C) when C is of the shape (1.1), where f is nondecreasing. It is well known that when f (x) ≡ 1, M n is well approximated by a Poisson distribution with mean log n [29]. We derive, under suitable conditions on f , precise asymptotic approximations for the total variation and Fortet-Mourier distances between the distribution of M n (C) and a suitable Poisson distribution. Actually, our results suggest that Poisson law (with bounded or unbounded mean) is the universal limit law of M n for nondecreasing f ; see the discussion in Section 6 for more support of this suggestion.
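For the rectangle case f ≡ 1, the quality of the Poisson approximation can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's computation: it uses the classical representation of the number of records as a sum of independent Bernoulli(1/j) indicators, and it assumes the standard forms of the two distances for integer-valued variables, namely half the sum of absolute differences of the probability masses (total variation) and the sum of absolute differences of the distribution functions (Fortet-Mourier). Function names and the value of n are arbitrary.

```python
import math


def records_pmf(n):
    """pmf of the number of records among n iid continuous variables.

    Built as the distribution of a sum of independent Bernoulli(1/j)
    indicators, j = 1..n (classical representation of records).
    """
    pmf = [1.0]
    for j in range(1, n + 1):
        p = 1.0 / j
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf


def poisson_pmf(mean, size):
    """First `size` probabilities of a Poisson(mean) distribution."""
    probs = [math.exp(-mean)]
    for k in range(1, size):
        probs.append(probs[-1] * mean / k)
    return probs


def total_variation(p, q):
    """Assumed definition: half the l1 distance between the two pmfs.

    Mass beyond the common support length is ignored; it is negligible here.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def fortet_mourier(p, q):
    """Assumed definition: sum of absolute differences of the two cdfs."""
    total, cp, cq = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        total += abs(cp - cq)
    return total


n = 1000
pmf = records_pmf(n)
h_n = sum(1.0 / j for j in range(1, n + 1))
tv_harmonic = total_variation(pmf, poisson_pmf(h_n, len(pmf)))
tv_log = total_variation(pmf, poisson_pmf(math.log(n), len(pmf)))
print(f"TV to Poisson(H_n):   {tv_harmonic:.4f}")
print(f"TV to Poisson(log n): {tv_log:.4f}")
print(f"FM to Poisson(H_n):   {fortet_mourier(pmf, poisson_pmf(h_n, len(pmf))):.4f}")
```

Comparing the two Poisson means shows how sensitive the distances are to the choice of centring, which is the kind of effect the precise asymptotics are meant to quantify.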
Recall that the total variation and the Fortet-Mourier distances of two random variables X, Y can be defined, respectively, by
In particular, $M_n$ is asymptotically normally distributed with mean and variance asymptotic to $\frac{\alpha}{\alpha+1}\log n$. We can indeed derive a local limit theorem for $M_n$.
Note that depending on $\alpha < -1 + \pi/\sqrt{\pi^2 - 6}$ (plus sign) or $\alpha > -1 + \pi/\sqrt{\pi^2 - 6}$ (minus sign). The case when $\alpha = -1 + \pi/\sqrt{\pi^2 - 6}$ is of special interest since κ = 0 and our result reduces to an upper estimate. If we replace (1.10) in this case by $L(x) = O(|\log x|^{-3/2})$, then we can show, using the proof techniques for Theorem 2 and the approach in [29], that
The proof of Theorem 2 relies on an explicit expression for the moment generating function (in terms of the moment generating function of $M_n(C)$ when C is a rectangle) and a careful analysis of the associated sums and integrals. Details as well as other Poisson approximation results are given in Section 6.
Notation. Throughout this paper, n is the major asymptotic parameter, which is taken to be sufficiently large. All limits, whenever unspecified, are taken as n → ∞. The generic symbols ε, c, and K always represent suitably small, absolute, and large positive constants, respectively, independent of n, whose values may vary from one occurrence to another. The symbol $[z^n]F(z)$ denotes the coefficient of $z^n$ in the Taylor expansion of F(z). We write simply $M_n$ when there is no ambiguity about the underlying planar region.

Maxima in right triangles
Let T denote the right triangle with corners (0, 0), (0, 1) and (1, 0). We prove in this section the asymptotic normality of M n = M n (T ) by the method of moments. The key idea of the proof is to compute the (centralized) moments recursively and then to reduce all major asymptotic estimates to an asymptotic transfer lemma (see Lemma 4).

(2.4)
for n ≥ 1 with the initial condition $f_0(w) = 1$, where the sum is extended over all nonnegative integer triples (j, k, ℓ) such that j + k + ℓ = n − 1 and
$$\pi_{j,k,\ell}(n) := \binom{n-1}{j,k,\ell}\,\frac{2^{\ell}\,(2j+\ell)!\,(2k+\ell)!}{(2n-1)!}.$$
Proof. Let $X_j = (x_j, y_j)$, j = 1, ..., n, be n iid points taken uniformly in T. Let $z_j = x_j + y_j$.
The idea of the proof is to find the point, say $X_1$, that maximizes the sum $x_j + y_j$, writing for simplicity $X_1 = (x, y)$, and then to divide the triangle into two smaller triangles, $T_{1,x,y}$ and $T_{2,x,y}$, and a rectangle $R_{x,y}$, as shown in Figure 2. Since the rectangle $R_{x,y}$ contains no maxima, the number of maxima $M_n$ in T equals 1 plus those in the two smaller triangles. The probability that there are j points in $T_{1,x,y}$, k points in $T_{2,x,y}$ and ℓ points in the rectangle $R_{x,y}$ is equal to Thus we have from which (2.4) follows.
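The decomposition can be turned into an exact computation of the mean for small n. The sketch below is a hedged illustration: the splitting probabilities are derived here from the geometry (they are an assumption of this example, not a quotation of the paper's formula). Given the point maximizing x + y, its relative position u along the line x + y = const is uniform on (0, 1), and the two triangles and the rectangle have relative areas $u^2$, $(1-u)^2$ and $2u(1-u)$; integrating the trinomial probability over u gives a beta integral. The result is checked against a closed form obtained independently by computing the probability that a given point is maximal.

```python
from fractions import Fraction
from math import comb, factorial


def split_probability(n, j, k, l):
    """Probability that the other n-1 points fall j, k, l into T1, T2, R.

    Assumption (derived for this sketch): trinomial with cell probabilities
    u^2, (1-u)^2, 2u(1-u), averaged over u uniform on (0, 1).
    """
    assert j + k + l == n - 1
    multinomial = comb(n - 1, j) * comb(n - 1 - j, k)
    # integral of u^(2j+l) (1-u)^(2k+l) over (0,1), a beta integral
    beta = Fraction(factorial(2 * j + l) * factorial(2 * k + l),
                    factorial(2 * n - 1))
    return multinomial * 2**l * beta


def expected_maxima(n_max):
    """Exact E(M_n) for n = 0..n_max via 'one maximum plus two sub-triangles'."""
    mean = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = Fraction(1)  # the point maximizing x + y is always maximal
        for j in range(n):
            for k in range(n - j):
                l = n - 1 - j - k
                total += split_probability(n, j, k, l) * (mean[j] + mean[k])
        mean[n] = total
    return mean


def closed_form(n):
    """n times P(a given point is maximal), evaluated as 4^n / C(2n, n) - 1."""
    return Fraction(4**n, comb(2 * n, n)) - 1


means = expected_maxima(10)
for n in (1, 2, 3, 10):
    print(n, means[n], float(means[n]))
```

For n = 2 both routes give 5/3, which can also be confirmed by hand: two uniform points in T are comparable with probability 1/3.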
A "random divorce model?" The above probability distribution has a straightforward probabilistic interpretation: Given n − 1 couples, we randomly divide them into two groups with sizes t and 2n − 2 − t, where 0 ≤ t ≤ 2n − 2. Then $\pi_{j,k,\ell}(n)$ is the probability that there are j couples in one group and k couples in the other, the number of "un-coupled" being ℓ = t − 2j = 2n − 2 − t − 2k in both groups. From this point of view, the number of maxima in T can also be interpreted as the total number of steps needed to completely "divorce" n − 1 couples by repeating the above procedure until no further such divisions are possible, namely, when the sizes of all subproblems (or number of couples) reduce to zero. Our result (2.3) is equivalent to saying that this quantity is asymptotically normally distributed; see Frieze and Pittel (1995) for a similar example.

Mean and variance
We prove (2.1) and (2.2). Taking the derivative on both sides of (2.6) with respect to w and substituting w = 0, we obtain G 1 (0) = 0 and Then g 1 (0) = 0 and by equating the coefficients of z n on both sides where δ ab is the Kronecker symbol. Solving this recurrence, we have Thus the mean of M n is equal to This proves (2.1). The asymptotic approximation of E(M n ) follows from Stirling's formula. Note that (2.1) can be proved in a more straightforward way by computing the probability that a point is maximal. Taking the derivative twice with respect to w and substituting w = 0 in (2.6), we obtain G 2 (0) = 0 and We have g 2 (0) = 0 and (n + 1)g 2 (n + 1) + g 2 (n) = δ n0 + 4g 1 (n) 2n + 1 for n ≥ 0. Solving the recurrence, we obtain Thus the second moment of M n satisfies for n ≥ 1 To derive the asymptotics of E(M 2 n ), since both sums Σ 1 , Σ 2 are of the type $\sum_{0\le j\le n}\binom{n}{j}(-1)^j a_j$ for some sequence a j , which can essentially be regarded as the n-th difference of a 0 , we use the associated integral representation from finite differences to evaluate the sums; see [23].
Lemma 1. The sums Σ 1 and Σ 2 satisfy the asymptotic expansions , where ψ denotes the logarithmic derivative of the Gamma function.
Since the growth rate of ψ(s) at σ ± i∞ is of logarithmic order, we can evaluate the integral by shifting the line of integration to the left and by taking the residues of the poles encountered into account. There is no pole at s = 1 because φ 1 (1) = 0. The only singularities are (i) a double pole at s = 1/2 and (ii) simple poles at s = 0 and s = −1/2 − j for j ≥ 0. Collecting the residues of these poles, we obtain the asymptotic expansion (2.7).
Noting that there are perfect cancellations of the ψ-terms and the last sums in both expansions (2.7) and (2.8), we obtain for any K > 0. This completes the proof of (2.2) for the variance of M n .

Higher moments
We prove in this section that for m ≥ 1 where $h_m := (2m)!\,\sigma_2^{2m}/(2^m m!)$. From these the asymptotic normality of $(M_n - \sqrt{\pi n})/(\sigma_2 n^{1/4})$ will follow. The case m = 1 holds by (2.1) and (2.2). We prove the remaining cases m ≥ 2 by induction. A different approach for the asymptotics of higher moments is needed since the preceding one becomes too involved for moments of degree ≥ 3; see Bai et al. [4].
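The centring $\sqrt{\pi n}$ and the scale $n^{1/4}$ can be examined empirically. The following Monte Carlo sketch is an illustration only (sample size, number of trials and seed are arbitrary): it samples uniformly from T by reflecting points of the unit square, counts maxima by a sweep, and reports the sample mean against $\sqrt{\pi n}$ together with the sample variance divided by $n^{1/2}$. No value for the variance constant is assumed.

```python
import math
import random


def sample_triangle(rng):
    """Uniform point in the triangle with corners (0,0), (0,1), (1,0)."""
    x, y = rng.random(), rng.random()
    if x + y > 1.0:  # reflect the upper half of the unit square into T
        x, y = 1.0 - x, 1.0 - y
    return x, y


def count_maxima(points):
    """Count maxima by scanning from the largest x down and tracking the best y."""
    count, best_y = 0, float("-inf")
    for _, y in sorted(points, key=lambda p: -p[0]):
        if y > best_y:
            count += 1
            best_y = y
    return count


def simulate(n, trials, seed=1):
    """Return the sample mean and sample variance of M_n(T) over `trials` runs."""
    rng = random.Random(seed)
    counts = [count_maxima([sample_triangle(rng) for _ in range(n)])
              for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return mean, var


n, trials = 2000, 200
mean_count, var_count = simulate(n, trials)
print(f"mean {mean_count:.2f} vs sqrt(pi n) = {math.sqrt(math.pi * n):.2f}")
print(f"variance / sqrt(n) = {var_count / math.sqrt(n):.3f}")
```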
Solution and asymptotic transfers of the recurrence. We first study recurrences of the type (2.12).
By (2.12) and the asymptotic transfer lemma, the proofs of (2.9) and (2.10) are reduced, by induction, to estimating the asymptotics of $R^{(1)}_{n,m}$ and $R^{(2)}_{n,m}$.

Asymptotics of $R^{(1)}_{n,m}$. By (2.9), (2.10) and induction, we have for m ≥ 2 On the other hand, By Bernstein's inequality, we have, uniformly in x and for sufficiently large n, and, similarly, Thus, Consequently,
Estimate for $R^{(2)}_{n,m}$. We use a slightly different method to estimate $R^{(2)}_{n,m}$ since there are additional cancellations caused by the factor $\Delta^{r}_{n,j,k}$. Denote again by (J, K, Λ) a trinomial distribution with parameters (n − 1; x)). By induction, (2.9) and (2.10), we have By an argument similar to the estimate of $U_{m,p}(n)$ using Bernstein's inequality, we have from which it follows that An alternative way of estimating $V_r(n)$ is as follows. Using the inequality By the local limit theorem for the Poisson distribution (or the saddlepoint method), we have Using Cauchy's integral formula and this estimate, we obtain By the inequality $1 - \cos\theta \ge 2\theta^2/\pi^2$ for |θ| ≤ π, we have $V^{+}_r(n) = O(n!\,n^{-n}e^{n}n^{-1/2})$ The results (2.9) and (2.10) then follow from applying the asymptotic transfer lemma. This completes the proof of (2.3) and Theorem 3.

Maxima in convex polygons
We prove Theorem 1 in this section.

Mean value of M n (P)
We prove (1.3) in this section. Our approach can actually provide an asymptotic expansion but we content ourselves with (1.3) for simplicity.
Let The following lemma is useful for estimating Laplace-type integrals encountered in this paper.
Proof. The right inequality is obvious. For the left inequality, we have by using the inequalities $e^t \ge 1 + t$ and $(1 - t)^n \ge 1 - nt$ (by induction).
Most Laplace integrals in this paper are of the type for some T and E. If we apply the above lemma and the inequality $|e^t - 1| \le te^t$, we obtain which is usually easier to deal with than a "microscopic analysis".
Case 1: I u ∩ I r = ∅. Let P j be the right triangle with hypotenuse s j (and under s j ), the angle formed by s j and the adjacent being θ j (see Figure 3). Note that part of P j may lie outside P (see Figure 4(b)).

Lemma 6.
If P j ⊂ P, then
Let R j be a rectangle between P j and P j+1 as shown in Figure 3 (the exact position of the lower-left boundary of R j being immaterial).
Proof. By (3.1) and a change of variables, we have for any K > 0, where ζ(y) = tan θ j + 2y + y 2 cot θ j+1 . Interchanging the order of integration, the first term on the right-hand side is equal to Note that when θ j+1 → θ j , E(M n (R j )) → 1, which cancels nicely with the constant in (3.3); this is consistent with the result (2.1) for T .
Let Z u , Z r be the trapezoids formed by I u and I r , respectively, as shown in Figure 3 when |I u | > 0 and |I r | > 0.
and if |I r | > 0 then Proof. Again by (3.1), we obtain, using (3.2) for any K > 0. The proof for E(M n (Z r )) is similar.
When $|I_u| = 0$, there are two further possible cases: $P_1 \subset P$ and $P_1 \not\subset P$. Let τ denote the segment on P having the common intersection point $I_u$ with $s_1$. In the first case $P_1 \subset P$, let $T_u$ be the right triangle with hypotenuse τ and let $\theta_u$ be the angle between the hypotenuse and the opposite; see Figure 4(a). In the second case, we denote by $S_u$ and $\theta_u$ the right triangle with hypotenuse τ and the angle formed by τ and the opposite, respectively; see Figure 4(b).

Lemma 9.
If |I u | = 0 and P 1 ⊂ P then if |I r | = 0 and P ν ⊂ P then Proof. By (3.1), for any K > 0. The remaining proof of (3.5) is similar to the derivations for (3.4). The result (3.6) is proved in a similar manner.
Thus when I u ∩ I r = ∅, we have This proves (1.3) when π 1 > 0 with
Case 2: I u ∩ I r ≠ ∅. We further divide into four cases. First, if |I u | > 0 and |I r | > 0 (see Figure 5(a)), then where J denotes the contribution of the part in P outside the rectangle formed by the two segments I u and I r . If we are in the case shown in Figure 5 Similarly, the contributions from the other parts of P and the "overshoot" are all O(n −1 ).
On the other hand, if |I u | > 0 and |I r | = 0 with angle θ u at the upper-right corner (see Figure 5(b)), then by a similar argument Similarly, if |I r | > 0 and |I u | = 0 with angle θ r at the upper-right corner (see Figure 5(c)), then Finally, if |I u | = |I r | = 0 with angles θ u and θ r indicated as in Figure 5 1 + tan θ u tan θ r log √ 1 + tan θ u tan θ r + 1 − tan θ u tan θ r √ 1 + tan θ u tan θ r − 1 + tan θ u tan θ r + O(n −1/2 ). (3.11) Collecting the above results, we have when I u ∩ I r ≠ ∅ This completes the proof of (1.3).

Asymptotic normality
We split the upper-right part of P that lies in P into ν triangles as shown in Figure 3. Note that all $P_j$'s except possibly the first and the last are right triangles and that the number of maxima in a right triangle T is scale invariant. Assume that ν ≥ 1. Let $S_n := P \setminus \bigcup_{1\le j\le\nu} P_j$. Then, by our discussions in Section 3.1, $E(M_n(S_n)) = O(\nu + \log n)$. From this and the inequality it follows that $(M_n - \pi_1 n^{1/2})/(\sigma_1 n^{1/4})$ and $W_n := \sum_{1\le j\le\nu}[M_n(P_j) - (|P_j|\pi n/|P|)^{1/2}]/(\sigma_1 n^{1/4})$ have the same limit distribution. Let $\Phi_n$ and Φ denote the distribution functions of $W_n$ and N(0, 1), respectively.
To that purpose, assume first that all $P_j$, 1 ≤ j ≤ ν, are right triangles. Define $\Pi_n$ to be the set of all ν-tuples $\rho = (r_1, \ldots, r_\nu)$ of nonnegative integers $r_j$ such that $r_1 + \cdots + r_\nu \le n$. Let $\Omega_\rho$ be the event that there are exactly $r_j$ points lying in $P_j$, where $\rho \in \Pi_n$. Then $\Phi_n(x) = \sum_{\rho\in\Pi_n} P(\Omega_\rho)\Phi_{n,\rho}(x)$, where $\Phi_{n,\rho}$ is the conditional distribution function of $W_n$ under $\Omega_\rho$. Let $Y_j := M_n(P_j)$. Observe that, under $\Omega_\rho$, these $Y_j$'s are independent and that the distribution of $Y_j$ under $\Omega_\rho$ is the same as that of the number of maxima of $r_j$ iid points taken uniformly at random from $P_j$. Let $\Psi_{n,\rho}$ be the conditional distribution function (under $\Omega_\rho$) of . Define $\mu_j := |P_j|/|P|$ and the subset $\Pi'_n \subset \Pi_n$ by $\Pi'_n := \{\rho = (r_1, \ldots, r_\nu) \in \Pi_n : |r_j - \mu_j n| \le (\mu_j n)^{1/2}\log n \text{ for all } j = 1, \ldots, \nu\}$. Thus to prove (1.7), it suffices to show that (i) the probability of the union of $\Omega_\rho$, $\rho \in \Pi'_n$, tends to 1; (ii) uniformly for all $\rho \in \Pi'_n$, $P(|W_n - Z_{n,\rho}| > \varepsilon \mid \Omega_\rho) \to 0$, for any ε > 0.
This proves the asymptotic normality of M n (P) when both P 1 and P ν are right triangles.
Now assume that P 1 is not a right triangle (it is then an obtuse triangle). We split P 1 into 2µ right triangles and an obtuse triangle in the following "paper-folding" way (see Figure 6). Denote by s 1 the hypotenuse of P 1 and call the point opposite to the hypotenuse x. Make a right triangle by connecting a vertical line from x to s 1 . Assume that the horizontal line intersects s 1 at y. Draw a vertical line from y to the opposite. This results in two right triangles and an obtuse triangle. Call the right triangle sharing the same hypotenuse with s 1 T 1 . Repeating µ − 1 times the same construction in the obtuse triangle yields µ right triangles T i along s 1 . We take µ = c log n , where c is properly chosen so that the expected number of points lying in the obtuse triangle is ≤ n 1/5 and that the expected number of points in each T i is ≥ n 1/5 . This is achievable since the area of the obtuse triangle contracts exponentially. We then argue similarly as above, noting that the main contribution to M n (P 1 ) comes from T i . The asymptotic normality of M n (P 1 ) follows as above. The case when P ν is not a right triangle is similar. This completes the proof of (1.7).

Maxima in regions bounded above by a nonincreasing curve
We consider in this section C of the form (1.1) where f(x) is a nonincreasing function in the unit interval. We may assume for convenience that Let $C^2(a, b)$ denote the set of real functions whose second derivatives are continuous in (a, b), where a < b. For convenience we also define The classification of critical points into $|f'(x)| = 0$, $|f'(x)| = \infty$ or $f'_-(x) \ne f'_+(x)$ leads the study of the expected number of maxima to the following three prototypical cases of decreasing f (see Figure 7):
(ii) f(x) = η > 0 for 0 ≤ x ≤ ξ < 1 and $f \in C^2(\xi, 1)$; or its symmetric (with respect to the line y = x) counterpart;
(iii) There is a unique critical point ξ ∈ (0, 1) at which f is continuous (to exclude jumps), $f \in C^2(0, \xi)$ and $f \in C^2(\xi, 1)$.
[Figure 7: The three basic prototypes, panels (i), (ii) and (iii).]
It is obvious that any piecewise C 2 (0, 1) decreasing functions f can be finitely decomposed into the above prototypes and rectangles (see Figure 8). Our results provide a "transparent" connection between the coefficient of the logarithmic term and the local behavior of f near each critical point. This connection, already observed by Devroye (1993), is made quantitatively more precise here. Note that f may have critical points at the boundary (0 and 1) in the first case. Also if ξ = 1 in the second case, then C is the unit square and the expected number of maxima is asymptotic to log n + O(1).   (1)), The symmetric version with respect to the line y = x has the same asymptotic behavior.
(iii) Assume that ξ is the unique critical point in (0, 1) at which f is continuous. If f ∈ C 2 (0, ξ), f ∈ C 2 (ξ, 1), f satisfies (4.1) and (4.2), and (1)), Thus the expected number of maxima, E(R n ), contributed from the rectangle, satisfies in case (ii) and E(R n ) = O(1) in the last case (see Figure 7). Under stronger conditions, our method of proof can be further refined to make explicit the o-terms; see Bai et al. (2000).
Then prove that log n + α 3 log δ 0 + o(log n), and that the main contribution to E(M n ) comes from I 2 : For details, see Bai et al. (2000).
Case (ii). (Sketch) It suffices to consider the contribution from the rectangle R n (see Figure 7) since the other part is essentially covered by case (i). By (4.3), The symmetric version of E(M n ) with respect to the line y = x is similar.

Maxima in regions bounded by two increasing curves
The results in the preceding section give an interpretation of the "magic constant" 1/2 in the expansion (1.3). We give a different "structural interpretation" in this section by considering the region where f(x) and g(x) are two nondecreasing functions in the unit interval. Assume that and that We may assume that f(x) ≥ g(x) in the vicinity of x = 1, namely, α > β if α ≠ β and a < b if α = β.

(5.2)
where on the other hand, if α = β then It is easily shown that Here A(1 − x, 1 − y) denotes the area of the region By (5.1), uniformly for 0 ≤ x ≤ δ and φ(x) ≤ y ≤ γ(x).
Since for fixed x the function A(1 − x, 1 − y) is a nondecreasing function of y, If α = β, then by (5.3), .
On the other hand, if α > β, then From this integral representation, the expansion (5.2) is obtained by using Mellin transform techniques. See Bai et al. (2000) for details.
The case when α = ∞ and g is linear is similar. When both f and g are linear functions, we have α = β = 1, and thus which is essentially the case when |I u | = |I r | = 0; see (3.11).

Maxima in regions bounded above by an increasing curve
We prove Theorem 2 on Poisson approximations in this section.

Probability generating function
Let denote the probability generating function of the number of maxima when the underlying region C is a rectangle.

Proposition 2. Let $F(x) = \int_0^x f(t)\,dt$. Then for any nondecreasing function f
Proof. We use a method similar to the proof of (2.4). Let $V = \{(x_j, y_j) : 1 \le j \le n\}$ be n iid points chosen uniformly in C.
We first locate the point p = (x, y) for which $y = \max_{1\le j\le n} y_j$. Then the number of maxima equals one plus those in the (shaded) rectangle (see Figure 9). The probability that there are exactly k points in V lying in the rectangle formed by (x, y) and (1, 0) is equal to
[Figure 9: Division of C at the point with $\max_{1\le i\le n} y_i$; all other maxima are inside (and on) the rectangle.]
by the change of variables v = f −1 (y). Interchanging the order of integration and making the change of variables Expanding the factor (F (v) n−1−k by binomial theorem and then evaluating the inner beta integral, we have by an integration by parts. It follows that We can further remove the factor f (v) by an integration by parts: Thus . This completes the proof.
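The decomposition used in this proof (take the highest point, then continue in the rectangle to its right) can be checked on simulated data. The sketch below is an illustration under assumptions: the region is taken to be {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ f(x)} with the hypothetical choice $f(x) = x^{1/2}$, sampled by rejection, and the peeling count is compared with an ordinary sweep count. Because f is nondecreasing, every point to the right of the highest point and below its height stays inside the region, which is why the remaining piece is a rectangle.

```python
import random


def sample_under_curve(f, n, rng):
    """Rejection-sample n uniform points from {0<=x<=1, 0<=y<=f(x)}; needs 0<=f<=1."""
    pts = []
    while len(pts) < n:
        x, y = rng.random(), rng.random()
        if y <= f(x):
            pts.append((x, y))
    return pts


def count_by_peeling(points):
    """Count maxima by repeatedly taking the highest point, then moving right.

    The highest remaining point is maximal; every other maximum must lie
    strictly to its right, where all points are lower than it.
    """
    count = 0
    remaining = list(points)
    while remaining:
        top = max(remaining, key=lambda p: p[1])
        count += 1
        remaining = [p for p in remaining if p[0] > top[0]]
    return count


def count_by_sweep(points):
    """Reference count: scan from the largest x down and track the best y."""
    count, best_y = 0, float("-inf")
    for _, y in sorted(points, key=lambda p: -p[0]):
        if y > best_y:
            count += 1
            best_y = y
    return count


rng = random.Random(11)
curve = lambda x: x ** 0.5  # hypothetical nondecreasing boundary for illustration
pts = sample_under_curve(curve, 2000, rng)
print(count_by_peeling(pts), count_by_sweep(pts))
```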

Corollary 1.
For any increasing function f , Proof. This follows from (6.1) and the inequality (

Proposition 3. Under the assumptions of Theorem 2, let
Proof. For simplicity, we assume that L(x) ≡ 0; the method of proof is easily amended to the desired case. Thus we have where ε is a small but fixed positive number. First split the integral into two parts: This yields Estimation of I 2 for large j: j ≥ j 0 . The integral I 2 is equal to Observe that for j ≥ j 0 , j/n ≥ j 0 /n ≥ y. This implies that for 0 < y ≤ j 0 /n. Thus This together with (6.4) proves (6.3).
Evaluation of I 2 for small j: j ≤ j 0 . In this case, we have where for j ≥ j 1 := n α/(α+1) . On the other hand, if j ≤ j 1 − 1, then for large enough n. Consequently, Therefore, uniformly for 1 ≤ j ≤ j 0 . It remains to evaluate the integral on the right-hand side.
For the remaining case j 2 ≤ j ≤ j 0 , we take r = j/(n + 1) and r 0 = j 1/2 n −1 log n and decompose the integral into three parts: for j 2 ≤ j ≤ j 0 . Accordingly, for |y − r| ≤ r 0 , It follows that, by extending the integration limits to the unit interval and by estimating the errors so introduced as in the cases of I 4 and I 6 , This completes the proof of (6.2).
To this aim, we substitute the integral representation The case $\Re(u) \ge 2$ is similar.
Proof of Theorem 2. (Sketch) The results for the mean and the variance follow from the integral formulae and Proposition 4 using the approach in [28].
The Poisson approximation formulae are obtained by applying the approach in [29] using Proposition 4, details being omitted here.
Proof of (1.8) and (1.9). If C is a convex polygon P with $|I_u| = 0$ and $|I_r| > 0$, then suitable application of Theorem 2 gives
$$\frac{M_n(P) - \frac12 \log n}{(\frac12 \log n)^{1/2}} \stackrel{d}{\longrightarrow} N(0, 1),$$
since the region outside the upper-right part can contribute at most O(1) maximal points in probability. Also one easily deduces that (1.6) holds by using (1.11) and results in Section 3.1. The same result holds for the case $|I_u| > 0$ and $|I_r| = 0$ by symmetry. The remaining case $|I_u||I_r| > 0$ is similar.

Bounded number of maxima.
Theorem 6. If f is nondecreasing, Proof. (Sketch) Starting from (6.1), we have, by a much simpler analysis than the proofs of Propositions 3 and 4, from which (6.6) follows.
The analytic intuition behind this process is that the saddlepoint y = j/n of $(1 - y)^{n-j} y^j$ is asymptotically not altered by the appearance of the factor $(1 - L(y))^{j-1}$, which is smoother in nature. This intuition Therefore, we expect again a Poisson distribution in the limit with mean log(1/L(1/n)).
Although we can make this heuristic rigorous by adding suitable technical conditions on L, we only indicate an example to avoid excessive complication.

Extensions
We have derived many new quantitative results for the number of maxima in this paper, but more new problems arise; we briefly mention some of them.
Convex planar regions. It is stated in Golin (1993) that for convex planar regions C, the mean $E(M_n(C))$ satisfies either $E(M_n(C)) \asymp n^{1/2}$ or $E(M_n(C)) = O(\log n)$ depending on (in our notation) whether $I_u \cap I_r = \emptyset$, the upper-right part of C being defined as for convex polygons. By the same method of proving Propositions 1 and 2, it is easy to show that in the case when $I_u \cap I_r \ne \emptyset$, we have $E(M_n(C)) \le H_{n-2} + 2$, where $H_n$ denotes the n-th harmonic number. The idea is now to find the two points with maximal x and y coordinates, respectively. If these two points coincide, then there is only one maximal point and the inequality trivially holds. Otherwise, all other maxima lie inside the rectangle formed by the two points, for which the expected number of maxima is given by $H_{n-2}$. One may also derive more precise asymptotics for $E(M_n(C))$ when $I_u \cap I_r = \emptyset$ using the approaches in Sections 3.1 and 4. A natural question is what happens if we remove convexity?
Convergence rate and large deviations. We naturally expect an error of order $n^{-1/2}$ for our central limit theorems (1.7) and (2.3), but a proof is still lacking. What is the "entropy function" for the corresponding large deviation principle? How can the asymptotic approximation (1.6) for the variance be refined?
Poisson approximation. Although we have discussed several cases leading to Poisson distribution in the limit, we are still unaware of how "universal" the Poisson limit law is, that is, what is the "minimal condition" besides monotonicity for which M n (C) is asymptotically Poisson?
Higher dimensions. Most of our approaches either become too tedious or fail for dimension > 2; see Baryshnikov (2000) for a sketch of proof of the asymptotic normality of $M_n([0, 1]^d)$.
Average-case analysis of maxima-finding algorithms. Few results are known in this direction when the underlying region is not $[0, 1]^d$; see Devroye (1986, 1999). The efficient list algorithm by Bentley et al. (1993) for finding the maxima of a point set performs slightly slower than linear when C is, say, a circle or a right triangle with decreasing hypotenuse, as suggested by simulations; see Golin (1994). But the asymptotic behavior of the expected cost remains open.

Maxima layers. Regarding the set of maximal points as the first layer of maxima, we can consider the second-layer maxima of the remaining points after "peeling off" the first layer. Higher-layer maxima are defined similarly. The maximum possible layer is called the depth. If $C = [0, 1]^2$, maximal layers are closely related to the longest increasing subsequences in random permutations; see Bollobás and Winkler (1988), Aldous and Diaconis (1999), Baik and Rains (1999) for further details. The problem becomes more involved for other planar regions and for higher dimensions. For example, what is the mean value of the second-layer maxima when $C = [0, 1]^d$, d > 2?
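The peeling procedure is easy to state in code. The following sketch is an illustration with hypothetical names, assuming distinct points: it removes the maxima layer by layer and returns the layers, so that the depth is the number of layers. For points of the form (i, perm[i]) built from a permutation, the depth equals the length of the longest increasing subsequence, which the small example reproduces.

```python
def maxima_layers(points):
    """Peel a list of distinct points into successive layers of maxima."""
    layers = []
    # Sorting once by x descending is enough: removing a layer keeps the order.
    remaining = sorted(points, key=lambda p: (-p[0], -p[1]))
    while remaining:
        layer, rest = [], []
        best_y = float("-inf")
        for p in remaining:
            if p[1] > best_y:  # not dominated by any remaining point
                layer.append(p)
                best_y = p[1]
            else:
                rest.append(p)
        layers.append(layer)
        remaining = rest
    return layers


perm = [3, 1, 4, 2, 5]
points = list(enumerate(perm))  # the points (i, perm[i])
layers = maxima_layers(points)
for depth, layer in enumerate(layers, start=1):
    print(depth, sorted(layer))
print("depth:", len(layers))  # 3, e.g. the increasing subsequence 1, 2, 5
```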
In summary, the study of maxima of random samples is not an isolated subject but instead closely related to many well-studied problems; it also provides natural generalizations of many problems like records, longest increasing subsequences, permutations avoiding certain patterns, partially ordered sets, etc.