The hiring problem with rank-based strategies

The hiring problem is studied for general strategies based only on the relative ranking of the candidates; this includes some well-known strategies studied before, such as hiring above the median. We give general limit theorems for the number of hired candidates and some other properties, extending previous results. The results exhibit a dichotomy between two classes of rank-based strategies: either the asymptotics of the process are determined by the early events, with a.s. convergence of suitably normalized random variables, or there is a mixing behaviour without long-term memory and with asymptotic normality.


Introduction
The hiring problem is a variant of the well-known secretary problem, in which we want to hire many good applicants and not just the best. An informal formulation is that a large number of candidates are examined (interviewed) one by one; immediately after each interview we decide whether to hire the candidate or not, based on the value of the candidate (which is assumed to be revealed during the interview) and on the values of the candidates seen earlier. This is thus an on-line type of decision problem. The mathematical model assumes that the values of the candidates are i.i.d. random variables, with some continuous distribution (which prevents ties). See below and Section 3 for formal details.
There are two conflicting aims in the hiring problem: we want to hire (rather) many candidates but we also want them to be good. Thus there is no single goal in the hiring problem, and thus we cannot talk about an optimal solution. Instead, the mathematical problem is to analyse properties of various proposed strategies. The property that has been most studied is the number of accepted candidates among the first n examined, here denoted M_n. We will also study the inverse function N_m, the number of candidates examined until m are accepted. Some other properties, such as the distribution of the values of the accepted candidates, are discussed in Sections 9-11.
There are two main groups of strategies. In the present paper we study only rank-based strategies, i.e., strategies that depend only on the rank of the candidate among the ones seen so far; in other words, on the relative order of the values of the candidates. A typical example is 'hiring above the median', see below; see also [14; 4; 2; 6; 10; 8; 11; 9]. In statistical terms, the values of the candidates are regarded as on an ordinal scale. Thus, the distribution of the value of a candidate does not matter, and can freely be chosen as e.g. uniform (an obvious standard choice used by some previous authors) or exponential (used in the analysis in the present paper). Furthermore, for rank-based strategies, it is equivalent (for a fixed n) to assume that the values of the first n candidates form a uniformly random permutation of {1, . . . , n} [14; 2; 6; 10; 8; 11; 9].
The alternative is to use a strategy depending on the actual values; a typical example is 'hiring above the mean' [18; 4; 15; 16]. For such strategies, the results depend on the given distribution of the value of a candidate; several different distributions have been investigated in the papers just mentioned. Such strategies will not be considered in the present paper.
In the present paper we thus study rank-based strategies. More precisely, following Krieger, Pollak and Samuel-Cahn [14], we consider strategies of the following form (which seems to include all reasonable rank-based strategies). We assume throughout the paper that we are given a sequence of integers r(m), m ≥ 0, such that r(0) = 1 and r(m) ≤ r(m + 1) ≤ r(m) + 1, m ≥ 0; (1.1) if m candidates have been accepted so far, the next candidate is accepted if and only if her value exceeds the r(m)-th largest value among the m accepted ones (with the convention that she is always accepted when r(m) = m + 1). A basic example is 'hiring above the median', which is the case r(m) := ⌊m/2⌋ + 1. (1.2) Another example is 'the α-percentile rule', where 0 < α ≤ 1 is fixed and r(m) := ⌈αm⌉, m ≥ 1. (1.3) In the case α = 1/2, we thus take r(m) = m/2 when m ≥ 2 is even, meaning a smaller r(m) and thus a larger threshold than in 'hiring above the median'; the sequence (starting with r(0)) is 1, 1, 1, 2, 2, 3, . . .
A third simple example is 'hiring above the r-th best', where r ≥ 1 is a fixed number [2; 8; 9]. This means r(m) = r when m ≥ r; we always hire the first r candidates, so the complete definition in accordance with (1.1) is r(m) := min{r, m + 1}, m ≥ 0. (1.4) Note that the case r = 1 gives the strategy of hiring only the candidates that are better than everyone seen earlier, i.e., the records. The present paper gives a general analysis of strategies of the type above, with an arbitrary sequence r(m) satisfying (1.1). Our main results give the asymptotic distribution of N_m (in general) and M_n (under weak regularity assumptions on r(m)), see in particular Theorems 4.6, 7.5, 1.2 and 1.3. In particular, this gives new proofs of known results (and some new ones) for the examples above.
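These strategies are easy to simulate. The following Python sketch (ours, not from the paper; all names are our own) implements the general rank-based rule described in Section 3, where after m hires the threshold is the r(m)-th largest accepted value, and checks that the case r = 1 of (1.4) hires exactly the records:

```python
import bisect
import random

def hire_count(values, r):
    """Simulate the rank-based hiring process: after m hires, the threshold
    is the r(m)-th largest accepted value (0 if r(m) = m + 1), and a
    candidate is hired iff her value exceeds the threshold.  Returns M_n."""
    accepted = []                       # accepted values, sorted ascending
    for x in values:
        m = len(accepted)
        rm = r(m)
        threshold = 0.0 if rm == m + 1 else accepted[-rm]
        if x > threshold:
            bisect.insort(accepted, x)
    return len(accepted)

# 'Hiring above the r-th best' with r = 1 hires exactly the records:
rng = random.Random(1)
vals = [rng.expovariate(1.0) for _ in range(1000)]
records = sum(1 for i, x in enumerate(vals) if x == max(vals[:i + 1]))
assert hire_count(vals, lambda m: min(1, m + 1)) == records
```

Replacing the last argument by, e.g., `lambda m: m // 2 + 1` simulates 'hiring above the median'.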
It turns out that there is a dichotomy: the general results are of two different types, depending on whether ∑_m r(m)^{−2} < ∞ or ∑_m r(m)^{−2} = ∞; we will call these two cases large r(m) and small r(m), respectively. Note that the strategies 'hiring above the median' and 'the α-percentile rule' have large r(m), while 'hiring above the r-th best' has small r(m); indeed, the limit theorems found by previous researchers for these cases are of different types, compare e.g. [11] and [9].
The main differences between the two cases can be summarized as follows, at least assuming some further regularity of r(m). For simplicity, we consider here only M_n; the same types of results hold for N_m.

Large r(m), ∑_m r(m)^{−2} < ∞: M_n / E M_n converges to a non-degenerate distribution on [0, ∞). (Thus, the limit is not normal.) Furthermore, M_n / E M_n converges a.s. Hence, the limit and the asymptotic behaviour are essentially determined by what happens very early, i.e., by the values of the first few candidates. This also means a strong long-range dependence in the sequence (M_n).

Small r(m), ∑_m r(m)^{−2} = ∞: Asymptotic normality of M_n. There is no long-range dependence; instead M_n is asymptotically independent of what happened with the first n_0 candidates, for any fixed n_0. In particular, there is no a.s. convergence.
Intuitively, the reason for the difference between the two cases is that when r(m) is small, each accepted candidate has (typically) a rather large influence on the threshold, and thus on the future of the process, and these influences add up and eventually dominate over the influences of the values of the first candidates, while if r(m) is large, then the influences of later candidates are small, and the effects of the first few candidates dominate.
We state here two theorems that exemplify our main results, one for large r(m) and one for small. In both cases, we assume a regularity condition on r(m) yielding simpler results; proofs and more general results are given in Sections 4-7. See also the examples in Section 8, including the counterexample Example 8.9.
Theorem 1.2. Assume that r(m) = αm + O(1) for some α ∈ (0, 1]. Then, as n → ∞, M_n/n^α → W a.s. (1.5) for a random variable W, which can be represented as in (5.8) below. Furthermore, all moments converge in (1.5), and, for every s ≥ 0, E M_n^s/n^{αs} → E W^s. (1.6) The moments in (1.6) can be explicitly computed in many cases, see [6], Theorem 6.1 and Examples 8.1-8.2.
For our example result in the case of small r(m), we assume that the sequence r(m) is regularly varying. (See e.g. [3, p. 52] for the definition, and [1] for the definition of a mixing limit theorem.)
Theorem 1.3. Assume that r(m) is a regularly varying sequence such that ∑_m r(m)^{−2} = ∞. Let µ(n), n ≥ 1, be a sequence of real numbers satisfying (1.7), and let γ(n) be defined by (1.8)-(1.9). Then, as n → ∞, (M_n − µ(n))/γ(n) →d N(0, 1). (1.10) Furthermore, (1.10) is mixing.
Section 3 contains some basic general results. Then Theorem 1.2 and related results for large r(m) are proved in Sections 4-6, while Theorem 1.3 and related results for small r(m) are proved in Section 7. Section 8 contains some examples. The remaining sections consider some additional properties that have been studied by previous authors. Section 9 considers results conditioned on the value of the first candidate. Section 10 treats the probability of accepting the next candidate, and also the number of unsuccessful candidates since the last accepted one. Section 11 studies the distribution of the accepted values.
Remark 1.4. Some previous papers, in particular [11; 9], also contain interesting exact results on the distribution of M_n for finite n. We do not consider such results here.
Acknowledgement. This research was begun during a lecture by Conrado Martínez at the conference celebrating the 60th birthday of Philippe Flajolet in December 2008; Conrado talked about the hiring problem, and I got the basic idea of the method used here, and made some notes. It then took me almost 10 years to return to the problem and develop the ideas in the notes, with further inspiration from papers by Conrado and others that were written in the meantime. I thank Conrado Martínez for an inspiring lecture, and I find it fitting to dedicate this paper to the memory of Philippe Flajolet.

Notation
Exp(a) denotes the exponential distribution with rate a. In other words, if X ∈ Exp(a), then P(X > x) = e^{−ax} for x ≥ 0; equivalently, aX ∈ Exp(1). Hence, E X = 1/a. Ge(p) denotes the geometric distribution started at 1 (also called the first-success distribution), with P(X = n) = p(1 − p)^{n−1}, n ≥ 1.
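These distributions are easy to check by simulation; the following Python snippet (ours, purely illustrative) verifies the tail of Exp(a) and the mean 1/p of Ge(p):

```python
import math
import random

rng = random.Random(5)

# Exp(a): P(X > x) = exp(-a x).  Check empirically with a = 2, x = 0.5.
a, x0 = 2.0, 0.5
xs = [rng.expovariate(a) for _ in range(200000)]
emp_tail = sum(1 for x in xs if x > x0) / len(xs)
assert abs(emp_tail - math.exp(-a * x0)) < 0.01

def ge(p, rng):
    """Ge(p) started at 1 (first-success distribution)."""
    v = 1
    while rng.random() >= p:
        v += 1
    return v

# Ge(p) has mean 1/p; check with p = 1/4.
mean = sum(ge(0.25, rng) for _ in range(100000)) / 100000
assert abs(mean - 4.0) < 0.1
```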
E, E_i, E′_j and so on will always denote Exp(1) variables, independent of each other.
C and c are used for unspecified constants, which may vary from one occurrence to another. For constants that depend on some parameter (but not on other variables such as m or n), we similarly use e.g. C_K and c(δ).

General limit theorems
We begin by formalising the hiring strategy discussed in Section 1, at the same time introducing some further notation. Recall that for a rank-based strategy, the result does not depend on the (continuous) distribution of the values of the candidates. We choose this distribution to be exponential.
Thus, let X 1 , X 2 , . . . be i.i.d. random variables with X i ∈ Exp(1), representing the values of the candidates. We assume without further mention that these values are distinct (which happens a.s.), so we ignore the possibility of ties below. For convenience we identify a candidate and her value; we will thus say both 'candidate n is accepted' and 'value X n is accepted'.
Let N_m be the index of the m-th accepted candidate, and denote the m-th accepted value by X*_m := X_{N_m}. Conversely, let M_n be the number of candidates accepted among 1, . . . , n. Thus, M_n ≥ m ⟺ N_m ≤ n. (3.1) The hiring strategy is defined by a given function r : Z_{≥0} → Z_{>0} satisfying (1.1), and thus, in particular, 1 ≤ r(m) ≤ m + 1.
The basic rule of the strategy is that if m ≥ 0 values have been accepted so far, then the next value is accepted if and only if it exceeds a threshold Y_m, which is the r(m)-th largest value among the m values already accepted, interpreted as Y_m := 0 when r(m) = m + 1. (In particular, Y_0 = 0.)
Remark 3.1. It is easy to see [14] that this threshold Y_m is the same as the r(m)-th best value of all candidates seen so far, since all previous candidates with values at or above Y_m were accepted. If r(m) ≤ m, then the strategy is thus to accept a candidate if her value is among the r(m) best of all values seen so far (including her own). It follows by symmetry that conditioned on M_n = m, and on everything else that has happened earlier, the probability that candidate n + 1 is accepted equals r(m)/(n + 1), see [14].
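The acceptance probability in Remark 3.1 can be illustrated numerically. The Python sketch below (ours; the names are our own) uses it to predict E M_n for 'hiring above the r-th best', where the probability r(M_k)/(k + 1) reduces to r/k for candidate k > r, and the first r candidates are always hired:

```python
import bisect
import random

def simulate_Mn(n, r_of_m, rng):
    """One run of the hiring process with i.i.d. Exp(1) values; returns M_n."""
    accepted = []                       # accepted values, sorted ascending
    for _ in range(n):
        x = rng.expovariate(1.0)
        m = len(accepted)
        rm = r_of_m(m)
        thr = 0.0 if rm == m + 1 else accepted[-rm]
        if x > thr:
            bisect.insort(accepted, x)
    return len(accepted)

# Remark 3.1 for r(m) = min(r, m + 1):
# E M_n = r + sum_{k=r+1}^{n} r/k.
rng = random.Random(42)
r, n, runs = 2, 100, 4000
mean = sum(simulate_Mn(n, lambda m: min(r, m + 1), rng) for _ in range(runs)) / runs
exact = r + sum(r / k for k in range(r + 1, n + 1))
assert abs(mean - exact) < 0.25
```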
The threshold Y m is thus updated when a new value is accepted. This is described by the following lemma which is simple but basic for our analysis. In particular, note that Y m never decreases.
Hence, if r(m + 1) = r(m), then Y_{m+1} is the smallest of these values, while if r(m + 1) = r(m) + 1, then Y_{m+1} is the next smaller accepted value (or 0), which is X*_{(r(m))} = Y_m. So far, the argument has been deterministic. We now use our assumption that the values X_i are i.i.d. random variables as above; this is where the choice of exponential distribution is convenient and greatly simplifies the argument.
Lemma 3.3. Assume as above that X_1, X_2, . . . are i.i.d. and Exp(1). Then the increments Y_{m+1} − Y_m, m ≥ 0, are independent random variables, with Y_{m+1} − Y_m = 0 when r(m + 1) = r(m) + 1, and Y_{m+1} − Y_m ∈ Exp(r(m)) when r(m + 1) = r(m). (3.2)
Proof. Run the hiring process as above, but keep the values X_n secret as long as possible, revealing only enough to determine whether to accept X_n or not, and to determine the next threshold Y_m. To be precise, when a new candidate n is examined, reveal first only whether her value X_n is larger than the current threshold Y_m or not. If not, we forget this candidate and move on to the next. Suppose instead that X_n > Y_m, so that we accept n. Then we also have to update Y_m. By Lemma 3.2, this is trivial if r(m + 1) = r(m) + 1. However, if r(m + 1) = r(m), then there are r(m) accepted candidates (including the latest recruit, n) that have values > Y_m. We now reveal the minimum of these values, giving Y_{m+1}, but we do not reveal the remaining r(m) − 1 of them.
Claim. Conditioned on Y_m = y and on everything else that has been revealed so far, the r(m) − 1 (still hidden) accepted values that are larger than Y_m have the distribution of r(m) − 1 i.i.d. random variables with the distribution L(X | X > y).
To show the claim, we use induction on m. We condition on Y_m = y and everything else that has been revealed so far, and note that when we accept the next X_n, we know just that X_n > y, so X_n too has the distribution L(X | X > y). Hence, by the induction hypothesis, the r(m) accepted values that are larger than Y_m = y are r(m) (conditionally) independent random variables with this distribution. By Lemma 3.2, this completes the induction step when r(m + 1) = r(m) + 1; otherwise we reveal the minimum Y_{m+1} of them, and note that conditioned on Y_{m+1} = y′ > y, the remaining r(m) − 1 = r(m + 1) − 1 variables are i.i.d. with the distribution L(X | X > y′).
This proves the claim. Furthermore, since X ∈ Exp(1), this distribution L(X | X > y) is the same as the distribution of X + y. (The standard lack-of-memory property of exponential distributions.) Hence, if r(m + 1) = r(m), then the claim and its proof yield that, conditioned on Y_m = y and everything else revealed so far, Y_{m+1} = y + min_{1≤j≤r(m)} E_j, where E_1, E_2, . . . are i.i.d. and Exp(1). In particular, Y_{m+1} − Y_m is independent of Y_1, . . . , Y_m. Furthermore, (3.2) holds, since min_{1≤j≤r(m)} E_j ∈ Exp(r(m)). Let δ_k := 1{r(k) = r(k − 1)}, k ≥ 1. (3.4)
Lemma 3.4. We have the representation Y_m = ∑_{k=1}^{m} δ_k E_k / r(k), m ≥ 0. (3.5)
Proof. An immediate consequence of Lemma 3.3, with E_k := r(k)(Y_k − Y_{k−1}) when δ_k = 1, and E_k ∈ Exp(1) arbitrary but independent of everything else when δ_k = 0.
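Lemma 3.3 can be tested against a direct simulation of the process: by the lemma, E Y_m equals the sum of 1/r(k) over the steps with r(k) = r(k − 1). A Python sketch of ours (illustrative names, using 'hiring above the median' as the example):

```python
import bisect
import random

def threshold_after(m_target, r_of_m, rng):
    """Run the hiring process with i.i.d. Exp(1) values until m_target are
    accepted; return the threshold Y_{m_target}."""
    accepted = []
    while len(accepted) < m_target:
        x = rng.expovariate(1.0)
        m = len(accepted)
        rm = r_of_m(m)
        thr = 0.0 if rm == m + 1 else accepted[-rm]
        if x > thr:
            bisect.insort(accepted, x)
    m, rm = m_target, r_of_m(m_target)
    return 0.0 if rm == m + 1 else accepted[-rm]

# By Lemma 3.3, E Y_m = sum over k <= m with r(k) = r(k-1) of 1/r(k).
r_of_m = lambda m: m // 2 + 1           # 'hiring above the median'
expected = sum(1 / r_of_m(k) for k in range(1, 11) if r_of_m(k) == r_of_m(k - 1))
rng = random.Random(7)
runs = 3000
mean_Y10 = sum(threshold_after(10, r_of_m, rng) for _ in range(runs)) / runs
assert abs(mean_Y10 - expected) < 0.1
```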
We are now prepared for a theorem giving an exact representation of the sequence N_m.
Theorem 3.5. We have N_m = 1 + ∑_{k=1}^{m−1} V_k, m ≥ 1, (3.7) where, conditioned on the sequence (Y_m)_1^∞ given by Lemma 3.4, the random variables V_k are independent with V_k ∈ Ge(e^{−Y_k}).
Proof. Fix m and condition on Y_1, . . . , Y_m and N_1, . . . , N_m. Each new candidate after N_m has probability P(X_n > Y_m | Y_m) = e^{−Y_m} of exceeding the threshold Y_m, and these events are independent. Hence, still conditioned on the past, N_{m+1} − N_m ∈ Ge(e^{−Y_m}). (3.8) Furthermore, still conditioned on the past, this waiting time N_{m+1} − N_m is independent of the value of the next accepted candidate X*_{m+1}. Hence, the argument in the proof of Lemma 3.3 extends and shows that conditioned on Y_1, . . . , Y_m and N_1, . . . , N_m, the increments Y_{m+1} − Y_m and N_{m+1} − N_m are independent, with the (conditional) distributions given by (3.2) and (3.8).
This implies that conditioned on (Y_m)_1^∞, the increments V_k := N_{k+1} − N_k are independent, with (conditionally) V_k ∈ Ge(e^{−Y_k}).
Remark 3.6. As said above, our choice X_n ∈ Exp(1) simplifies the argument, but it is not really essential. An equivalent argument has been used by e.g. [4] with values X_n ∈ U(0, 1); then one considers the gap 1 − Y_m and shows that these gaps can be written as products of independent random variables. Taking (minus) the logarithm of the gap yields a sum of independent random variables (which is more convenient than a product for limit theorems), and that is equivalent to our version with exponentially distributed values X_n.
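The representation in Theorem 3.5, combined with Lemma 3.3, gives a second, much cheaper way to sample N_m without generating candidates one by one. The following Python sketch (ours; names are illustrative) compares the two samplers for 'hiring above the median':

```python
import bisect
import math
import random

def Nm_direct(m_target, r_of_m, rng):
    """N_m: number of candidates examined until m_target are accepted."""
    accepted, n = [], 0
    while len(accepted) < m_target:
        n += 1
        x = rng.expovariate(1.0)
        m = len(accepted)
        rm = r_of_m(m)
        thr = 0.0 if rm == m + 1 else accepted[-rm]
        if x > thr:
            bisect.insort(accepted, x)
    return n

def Nm_repr(m_target, r_of_m, rng):
    """N_m sampled via Theorem 3.5: N_m = 1 + sum_{k=1}^{m-1} V_k with
    V_k ~ Ge(exp(-Y_k)), Y built from the increments of Lemma 3.3."""
    Y, N = 0.0, 1
    for k in range(1, m_target):
        if r_of_m(k) == r_of_m(k - 1):  # delta_k = 1: an Exp(r(k)) jump
            Y += rng.expovariate(r_of_m(k))
        p = math.exp(-Y)                # here Y > 0, so 0 < p < 1
        # Ge(p) by inversion: floor(log U / log(1 - p)) + 1, U in (0, 1]
        u = 1.0 - rng.random()
        N += math.floor(math.log(u) / math.log1p(-p)) + 1
    return N

r_of_m = lambda m: m // 2 + 1           # 'hiring above the median'
rng = random.Random(3)
a = [Nm_direct(6, r_of_m, rng) for _ in range(3000)]
b = [Nm_repr(6, r_of_m, rng) for _ in range(3000)]
# the two samplers should give (approximately) the same distribution
frac_a = sum(1 for v in a if v <= 20) / len(a)
frac_b = sum(1 for v in b if v <= 20) / len(b)
assert abs(frac_a - frac_b) < 0.06
```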
So far we have given exact formulas, but now we start to approximate in order to obtain simpler formulas. First we approximate the geometric distributions in (3.7) by exponential distributions.
Theorem 3.7. As m → ∞, a.s., N_m = (1 + o(1)) ∑_{k=0}^{m−1} e^{Y_k} E′_k, (3.9)
where Y_k are given by (3.5) and E′_k ∈ Exp(1) are independent of each other and of (Y_k)_1^∞.
Proof. We use continuous time, and assume that candidate n is examined at time τ_n, where the waiting times τ_n − τ_{n−1} (with τ_0 = 0) are i.i.d. Exp(1). In other words, the candidates arrive according to a Poisson process on [0, ∞) with intensity 1. Note that by the law of large numbers, τ_n/n → 1 a.s. as n → ∞. We assume that the times (τ_n)_n are independent of the values (X_n)_n.
Let T_m := τ_{N_m} be the time the m-th candidate is accepted. Then, as m → ∞ and thus N_m → ∞, T_m/N_m = τ_{N_m}/N_m → 1 a.s. (3.10) We argue as in the proof of Theorem 3.5. Condition on Y_1, . . . , Y_m and T_1, . . . , T_m for some m. Then, after T_m, candidates arrive according to a Poisson process with intensity 1, and thus candidates with a value > Y_m arrive as a Poisson process with intensity e^{−Y_m}. Consequently, conditioned on the past, the waiting time T_{m+1} − T_m ∈ Exp(e^{−Y_m}). (3.11) By the same argument as in the proof of Theorem 3.5, this implies that T_m = ∑_{k=0}^{m−1} e^{Y_k} E′_k. (3.12) Finally, the exact continuous-time representation (3.12) implies the approximation (3.9) by (3.10).
The results above are valid for any sequence r(m) fulfilling the conditions (1.1). For further approximations, we treat the cases of large and small r(m) separately, in Sections 4-6 and Section 7, respectively.

Large r(m)
In this section we assume that r(m) is large, i.e., that ∑_{m=1}^{∞} r(m)^{−2} < ∞, (4.1) with, as always, (1.1) in force. Let Z := ∑_{k=1}^{∞} δ_k (E_k − 1)/r(k), (4.2) so Y_m − y_m converges to Z whenever the sum in (4.2) converges. Furthermore, this occurs a.s., since the summands in (4.2) are independent random variables with mean 0 and sum of variances ∑_{k=1}^{∞} δ_k r(k)^{−2} < ∞. We let Z denote the sum in (4.2) whenever (4.1) holds.
where Z and y_k are given by (4.2) and (3.13), and E′_k ∈ Exp(1) are independent of each other and of Z.
Hence, the result (4.6) follows from (3.9) and a simple deterministic lemma; the relevant sum is less than 2η ∑_{k=0}^{m} a_k if m is large enough, which implies (4.8), and consequently (4.14). Define, with y_k given by (3.13), λ_m := ∑_{k=0}^{m−1} e^{y_k}. (4.15)
Theorem 4.6. As m → ∞, N_m/λ_m → e^Z a.s., where Z and λ_m are given by (4.2) and (4.15).
Equivalently, a.s. N_m ∼ λ_m e^Z as m → ∞. Hence, N_m grows as the deterministic sequence λ_m, with a random factor (asymptotically independent of m) given by e^Z. Theorem 4.6 gives the asymptotics (and in particular the asymptotic distribution) of N_m, the number of candidates examined until m have been accepted. By inversion, we obtain corresponding asymptotic results for M_n, the number of accepted candidates when n have been examined. We state one general result as the next theorem. More explicit results require inversion of the function m ↦ λ_m, which is easily done under further assumptions; we study an important case in Section 5 below.
Note first that M_n → ∞ a.s. as n → ∞, since for every m, a.s. some future candidate n > N_m will satisfy X_n > Y_m and thus be accepted.

Roughly linear rank thresholds
As said in the introduction, the strategies 'hiring above the median' and 'the α-percentile rule' satisfy r(m) = αm + O(1) for some constant α > 0, and we stated Theorem 1.2 for this case. In fact, by the proof below, Theorem 1.2 holds under the weaker assumption (5.1). We assume throughout this section that (5.1) holds, with 0 < α ≤ 1. We then define ρ by (5.2). The following estimate is an easy exercise, but for the reader's convenience we give a proof in Appendix A. Note also that (5.3) implies ∑_{m=1}^{∞} r(m)^{−2} < ∞, so r(m) is large and we can use the results of Section 4; our goal in this section is to use the assumption (5.1) to make the results more explicit.
and the result (5.6) follows.
We are prepared to prove the convergence (1.5) in Theorem 1.2.
We proceed to compute moments of W .
Next we bound moments of M_n, using a series of lemmas. Recall T_m := τ_{N_m} from the proof of Theorem 3.7. We tacitly continue to assume (5.1).
Proof. For convenience, we first consider T_{2m}. By (3.12), we can split T_{2m} into three factors, which we estimate separately. For the first factor, by (4.3) and (5.12), for any real u, each factor in (5.12) is ≥ 1 (by the explicit form or by Jensen's inequality). Finally, for the third factor, we use the fact that U ∈ Γ(m) and compute its moments directly.
Proof. We may assume that K is an integer (by Lyapunov's inequality). Furthermore, N_m ≥ m ≥ 1, and thus E N_m^{−K} ≤ 1. It follows that (5.23) holds trivially for m < 4K + 2, so we may assume m ≥ 4K + 2. In this case, we use the Cauchy-Schwarz inequality together with Lemma 5.6(1). Since the E″_i have moments of all orders, the law of large numbers holds with moment convergence [7, Theorem 6.10.2], and thus E τ_n^{2K}/n^{2K} → 1 as n → ∞; therefore E τ_n^{2K}/n^{2K} ≤ C_K for all n ≥ 1. The result follows by (5.24) and (5.26).
Although perhaps of less interest, we show further that the moment convergence in Theorem 1.2 holds also for some, but not all, s < 0. Let r* ≥ 1 be as in Remark 5.5.
Lemma 5.9. For every real u < r*, there exists a constant C_u such that (5.29) holds.
Proof. The case u < 0 is Lemma 5.7, and u = 0 is trivial, so we may assume u > 0. We consider again first T_m = τ_{N_m}. Assume first that r(1) = 2, so that r* ≥ 2. It then suffices to prove (5.29) for 1 ≤ u < r*, so we assume this. Recall from Remark 5.5 that then E e^{uZ} < ∞. Hence, (3.12), Minkowski's inequality, (5.18), (4.15) and (5.6) yield, with ‖X‖_u := (E |X|^u)^{1/u}, a bound on ‖T_m‖_u; in other words, (5.32) holds. The argument above fails if r(1) = 1, since then r* = 1 and E e^{uZ} = ∞ for every u ≥ 1, see again Remark 5.5. (And we need u ≥ 1 in order to use Minkowski's inequality.) In this case, let k_0 := min{k : r(k) = 2} and consider the corresponding T̃_m. We have, cf. (4.3) and (5.12), the estimate (5.34), and thus, using (4.15) and (5.6) as above, (5.35). Consequently, for 0 < u < 1 = r*, using T_m = T_{k_0} + e^{Y_{k_0}} T̃_m and the subadditivity of x ↦ x^u, (5.32) holds in this case too.
Finally, by the law of large numbers, P(τ_n > n/2) → 1 as n → ∞, and thus P(τ_n > n/2) ≥ c for every n ≥ 1; hence the desired estimate follows.
Proof. By Lemma 5.8, it suffices to consider s < 0. Then, let −r*/α < t < s and u := −αt ∈ (0, r*). We argue as in the proof of Lemma 5.8 with minor modifications. Let x > 0 and let m := ⌊x n^α⌋; the corresponding estimate then follows by Lemma 5.9.
Proof. Note that the right-hand side of (1.6) is finite for Re s > −r*/α and an analytic function of s in that half-plane, see Remark 5.5. Lemma 5.10 implies uniform integrability of (M_n/n^α)^s for every s ∈ (−r*/α, 0), and thus, by Theorem 1.2, E M_n^s/n^{αs} → E W^s for every real s > −r*/α, which implies (1.6) for Re s > −r*/α. On the other hand, if s ≤ −r*/α, then E W^s = ∞, see Remark 5.5, and thus E M_n^s/n^{αs} → ∞ by (1.5) and Fatou's lemma.
Example 6.2. Taking s = q in (6.3), we see that the q-th moment E W^q has a rational value, given by (6.8) in terms of r(1), . . . , r(q).
Remark 6.3. The result extends to all complex s such that Re s > −q/ν (and more generally to Re s > −r*/α); in particular to imaginary s, which gives the characteristic function of log W, see Remark 5.5 and Theorem 5.11.
Remark 6.4. Positive random variables with moments that can be expressed as a fraction of products of Gamma functions as in (6.2) are studied in e.g. [12]. In particular, [12, Theorems 5.4 and 6.1] imply that W has an infinitely differentiable density function f_W(x) on [0, ∞), with the asymptotics (6.9), where the positive constants C_2, c_1, c_2 can be expressed explicitly in ν, q and r(1), . . . , r(q). This density f_W is of a type known as an H-function, see [12, Addendum].

Small r(m)
In this section we develop the results in the case of small r(m), i.e., ∑_m r(m)^{−2} = ∞. However, we begin with some results holding in general, although their main interest is for the case of small r(m).
Let, recalling (3.5), σ_m² be defined by (7.1) and, as a simpler approximation, σ̃_m² by (7.2). We have, by the argument in (3.13), the corresponding estimates. Hence, the condition that r(m) is small can be written in three equivalent forms; furthermore, when this holds, σ_m ∼ σ̃_m. For convenience, define m_1 := m − ⌈r(m)/2⌉, which by (7.9) yields the lower bound in (7.14). Consequently, by (3.5), (7.14) holds; in other words, if r(m) ≤ j ≤ m, then a corresponding estimate holds.
Proof. First condition on the entire sequence (Y_k)_1^∞. Then, by (7.5)-(7.6), we obtain (7.20). Using (7.9), with m_1 := m − ⌈r(m)/2⌉ as above, and taking the expectation, we obtain an upper bound. On the other hand, if sup_m r(m) < ∞, we note that the right-hand side of (7.20) is of order 1, and thus taking the expectation yields (7.25). The result in this case (bounded r(m)) follows from (7.24) and (7.25), together with (7.7) again.
Hence, (7.26) follows from (7.27).
Proof. An immediate consequence of (4.3) and the central limit theorem with Lyapounov's condition [7, Theorem 7.2.2], using the estimate above. The asymptotic distribution of N_m is thus log-normal, for any small sequence r(m). Under weak regularity assumptions, this can be inverted to yield asymptotic normality of M_n. For convenience, we will assume that the sequence r(m) is regularly varying, see e.g. [3, §1.9]. Let κ denote the index of regular variation of r(m). Then r(m)^{−2} is regularly varying of index −2κ, and it follows, using the assumption ∑_m r(m)^{−2} = ∞ when κ = 1/2, that σ̃_m² defined by (7.2) is regularly varying of index 1 − 2κ, see [3, Theorem 1.9.5(ii) and Propositions 1.5.8 and 1.5.9b]. Hence σ_m² ∼ σ̃_m².
Proof of Theorem 1.3. Note first that (1.7) implies that µ(n) → ∞ as n → ∞. We may thus assume µ(n) ≥ 1. Furthermore, (7.34) yields γ(n) = β(⌊µ(n)⌋) → ∞ as n → ∞. We may thus also replace µ(n) by ⌊µ(n)⌋, and thus in the sequel assume that µ(n) is an integer. Fix x ∈ R and let m := ⌈µ(n) + xγ(n)⌉. By (1.9) and (7.35), as n → ∞, m ∼ µ(n). In particular, m → ∞ as n → ∞ and m ≥ 1 for all large n; consider only such n. Then, using (3.1), we can pass from M_n to N_m. In the last line, the random variable on the left of '≤' is asymptotically normal by (7.30); we turn to the term on the right. By (1.7) and (7.32), y_{µ(n)} = log n + O(1); thus (7.43) and (7.44) yield the required estimate. Consequently, (7.42) and Theorem 7.5 imply, with ζ ∈ N(0, 1), for every x ∈ R, (7.49), which proves (1.10). Finally, the limit (7.28) is evidently mixing, i.e., it holds also conditioned on any fixed set Y_1, . . . , Y_K of the variables. (See [1, Proposition 2].) Using the argument in the proof of Lemma 3.3, (7.28) holds also conditioned on the sequence of indicators 1{X_k is accepted}, k ≤ N_K, which is equivalent to conditioning on M_1, . . . , M_{N_K}. It follows that the results above, including (7.49) and (1.10), hold also conditioned on M_1, . . . , M_{N_K}, and thus a fortiori also conditioned on M_1, . . . , M_K. Hence (1.10) is mixing.
Remark 7.7. It can be seen from the proof that the assumption that r(m) be regularly varying may be replaced by the weaker condition: if m̃ = m + O(β(m)), then r(m̃) ∼ r(m). (7.50)

Examples
Example 8.1. Consider 'hiring above the median'. By (1.2), this is the case r(m) = ⌊m/2⌋ + 1. Hence, (6.1) holds with ν = 1 and q = 2, and thus Theorem 6.1 applies and yields the moments of W, which shows that (W/2)² ∈ Exp(1) and that W thus has a Rayleigh distribution with density function (x/2) e^{−x²/4}, x > 0. Consequently, Theorem 1.2 shows that M_n/n^{1/2} converges in distribution to this Rayleigh distribution, as shown by Helmi and Panholzer [11]; moreover, Theorem 1.2 shows a.s. convergence, and convergence of all moments. ([11] treated only the mean.) Furthermore, the definition (3.4) yields δ_k = 1{k odd}, and consequently, (4.2) yields a sum over odd k. It is well-known that this sum yields a centered Gumbel distribution. (Recall that by definition, E Z = 0.) This can be seen in several ways, for example by computing the moment generating function E e^{sZ} = e^{−γs} Γ(1 − s), Re s < 1, by arguments similar to the proof of Theorem 6.1, or directly from (5.8) and the identification of W as a Rayleigh distribution; we omit the details.
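As a numerical illustration (a Python sketch of ours, not part of the paper), one can check that E M_n/n^{1/2} is close to the Rayleigh mean E W = 2Γ(3/2) = √π for moderately large n:

```python
import bisect
import math
import random

def M_n(n, r_of_m, rng):
    """M_n for the hiring process with i.i.d. Exp(1) values."""
    accepted = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        m = len(accepted)
        rm = r_of_m(m)
        thr = 0.0 if rm == m + 1 else accepted[-rm]
        if x > thr:
            bisect.insort(accepted, x)
    return len(accepted)

# 'Hiring above the median': M_n / n^{1/2} -> W with (W/2)^2 in Exp(1),
# so W = 2 sqrt(Exp(1)) and E W = 2 * Gamma(3/2) = sqrt(pi).
rng = random.Random(2018)
n, runs = 10000, 600
mean = sum(M_n(n, lambda m: m // 2 + 1, rng) for _ in range(runs)) / runs
assert abs(mean / math.sqrt(n) - math.sqrt(math.pi)) < 0.2
```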
Example 8.2. Let 0 < α ≤ 1 and consider 'the α-percentile rule', i.e. r(m) = ⌈αm⌉ by (1.3). Theorem 1.2 applies and shows convergence M_n/n^α → W_α a.s., in distribution, and with all moments, where W_α is a positive random variable with moments given by (1.6). In particular, when α is rational, the moments can be calculated (in terms of the Gamma function) by Theorem 6.1. We give a few examples in Table 1. (The expectations were given in [6], see below. Note that the results can be written in different forms, using standard Gamma function identities; cf. the partly different but equivalent formulas for the expectations given here and in [6].) In particular, note that (6.5) yields the explicit formulas (8.5)-(8.6) for the special case ν = 1 (where r(i) = 1 for 1 ≤ i ≤ q) and for ν = q − 1 (where r(i) = i for 1 ≤ i ≤ q − 1 and r(q) = ν = q − 1). The expectations c_α := E W_α = lim_{n→∞} E M_n/n^α have been considered before. Krieger, Pollak and Samuel-Cahn [14] found that W_1 ∈ U(0, 1) and thus c_1 = 1/2, but otherwise showed only existence of the limit c_α. Gaither and Ward [6] computed c_α (our E W_α) as the expression (8.7).
In the case α = ν/q rational, they showed further how this can be transformed into a form that they could evaluate symbolically; as examples they gave explicit values for all cases with q ≤ 6. The formula (8.7) must agree with (1.6) for s = 1, i.e., the identity (8.8) must hold, but we do not see any direct proof of this. The explicit values for α rational are obtained more easily from (6.2)-(6.5); in particular, for the cases ν = 1 and ν = q − 1, we can take s = 1 in (8.5)-(8.6). Gaither and Ward [6] gave a graph of the function α ↦ c_α, and conjectured that it is continuous at all irrational α but only left-continuous at rational α. This is easily verified from our form (8.8), since the infinite product converges uniformly on each interval [a, 1], and each factor in it is continuous at irrational α and left-continuous everywhere, but for each rational α there are factors that have jumps; furthermore, the jumps in the factors are always positive. Hence, c_α has a positive jump at each rational α ∈ (0, 1).
Let us consider the case α = 1/2 in more detail. For comparison, let W_med denote the limit variable for 'hiring above the median' in Example 8.1.
We can study this in general. Given a sequence r(m) satisfying (1.1), define a new sequence r̃(m) by inserting an extra 1 first, i.e., let r̃(m) := r(m − 1), m ≥ 1. We use ˜ to denote variables for the new sequence. It follows from (4.2) that Z̃ = Z + E − 1, (8.11) with E ∈ Exp(1) independent of Z. Suppose now that r(m) = αm + O(1), or more generally that (5.1) holds. Then the same is true for r̃(m); furthermore, it is easy to see from (5.2) that ρ̃ = ρ + 1, and thus (5.8) yields, using (8.11), W̃ = U^α W, (8.12) where U = e^{−E} ∈ U(0, 1) is independent of W. Equivalently, E W̃^s = E W^s/(1 + αs), (8.13) which also follows from (1.6). In fact, this has a simple probabilistic explanation. In the modified strategy, the first candidate is, as always, accepted, and because r̃(1) = 1, the threshold for the next candidate is Ỹ_1 = X_1. Since the threshold never decreases (see Lemma 3.3), this means that only candidates better than X_1 have a chance of being considered. Moreover, it is easy to see that if we consider only the subsequence of candidates with values X_n > X_1, then the ones hired by the modified strategy are precisely those that would have been hired by the original strategy applied to this subsequence of candidates. Conditioning on X_1 = x_1, the values in the subsequence will be independent with the conditional distribution L(E + x_1), and subtracting x_1 from all values, we obtain the original problem for the original sequence. However, still conditioned on X_1, if we start with a sequence of n candidates, the subsequence will contain only Bin(n − 1, e^{−X_1}) candidates. Note that U := e^{−X_1} ∈ U(0, 1). It follows, using the law of large numbers, that if Theorem 1.2 applies to the original strategy, then it holds for the modified one too, with W̃ = U^α W, where U ∈ U(0, 1) is independent of W.
Example 8.4. Another way to view the difference between 'hiring above the median' and 'the 1/2-percentile rule' is that r(m) has been decreased by 1 for every even m ≥ 2. Let us consider, in general, the effect of decreasing a single value r(m) by 1, assuming that this is possible (i.e., that r(m − 1) < r(m) = r(m + 1)). Assume also for simplicity that Theorem 1.2 applies. Then (1.6) shows that W is modified such that E W^s is multiplied by the factor in (8.15), where V has density 1/r(m) on (0, 1) and a point mass P(V = 1) = 1 − 1/r(m). Hence, the modified limit satisfies W̃ =d V^{α/(r(m)−1)} W. This can be repeated for several changes.
In particular, looking just at the expectation, decreasing r(2) from 2 to 1 in 'hiring above the median' multiplies E W by (1 + 1/4)/(1 + 1/2) = 5/6. As seen above, E W decreases by a factor 2/3 if we change 'hiring above the median' to 'the 1/2-percentile rule', and we now see that half of the decrease is due to the decrease of r(2). This illustrates that, as said in Section 1, in the case of large r(m), the asymptotic behaviour is heavily influenced by the effects of the first candidates.
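As a numerical check (assuming the single-decrease factor (1 + αs/r(m))/(1 + αs/(r(m) − 1)), and the convention r(m) = ⌈(m+1)/2⌉ for the median rule), the sketch below computes the single factor 5/6 for r(2) and the product of the factors over all even m, which indeed tends to 2/3:

```python
from fractions import Fraction

alpha = Fraction(1, 2)  # 'hiring above the median' has r(m) = alpha*m + O(1), alpha = 1/2
s = 1                   # look at the first moment E W

def factor(r_m):
    # multiplicative change in E W^s when r(m) is decreased by 1 (assumed form)
    return (1 + alpha * s / r_m) / (1 + alpha * s / (r_m - 1))

f2 = factor(2)  # decreasing r(2) from 2 to 1 alone: should be 5/6

# decreasing r(m) by 1 at every even m turns the median rule into the 1/2-percentile rule
total = Fraction(1)
for m in range(2, 20001, 2):
    total *= factor((m + 2) // 2)  # r(m) = ceil((m+1)/2) = (m+2)//2 for even m
```

The partial products telescope to (2R + 1)/(3R) after the term r(m) = R, so the exact arithmetic stays cheap and the limit 2/3 is approached like O(1/R).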
It follows from (8.17) that W²/4 ∈ Γ(2), and thus W has the density f(w) = (w³/8) e^{−w²/4}, w > 0. (8.19) Equivalently, W/√2 ∼ χ(4), a chi distribution. We return to the significance of this example in Section 9.
Example 8.6. The extreme case of small r(m) is r(m) = 1, m ≥ 0. This means that we only accept candidates that are better than all previous candidates, i.e., the record values in the sequence (X_n). Theorem 1.3 applies with μ(n) = log n, β(m)² = m and γ(n)² = ⌊log n⌋ ∼ log n, which yields (M_n − log n)/(log n)^{1/2} d−→ N(0, 1). This is a well-known result for the number of records, see e.g. [7, Theorem 7.4.2], and is easily proved directly by the central limit theorem, observing that the indicators I_k := 1{X_k is a record} are independent with I_k ∼ Be(1/k). See further the next example. (This connection between records and the hiring problem was noted by [4].)

Example 8.7. More generally, consider 'hiring above the r-th best' for a fixed r ≥ 1, with r(m) given by (1.4). Thus Example 8.6 is the case r = 1. This strategy was studied by Archibald and Martínez [2] and, in great detail, by Helmi, Martínez and Panholzer [9]. A value X_k is accepted if it is an r-record, in the sense that it is one of the r best values seen so far. (In particular, the first r values X_k are always accepted.) Theorem 1.3 applies with μ(n) = r log n, β(m)² ∼ m and γ(n)² ∼ r log n, which yields (M_n − r log n)/(r log n)^{1/2} d−→ N(0, 1), (8.21) as shown by [9] (who also gave many other results, including for fixed n, and for the case when both n, r → ∞). Again, this is easily shown directly by the central limit theorem, using the fact that the indicators I_k := 1{X_k is an r-record} are independent with I_k ∼ Be(r/k) for k ≥ r, as noted in [9]; see also the further references given there.
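The independent-indicator description makes this easy to verify numerically. A small sketch (illustrative code; since only ranks matter, uniform values suffice): count the r-records among n values and compare the sample mean with the exact mean Σ_k min(r, k)/k = r(1 + H_n − H_r) ≈ r log n.

```python
import random

def count_r_records(values, r):
    """Count the r-records: values that are among the r best seen so far."""
    top = []  # current up-to-r best values, ascending
    count = 0
    for x in values:
        if len(top) < r or x > top[0]:
            count += 1
            top.append(x)
            top.sort()
            if len(top) > r:
                top.pop(0)  # drop the (r+1)-th best
    return count

rng = random.Random(2)
n, r, reps = 10_000, 3, 200
mean = sum(count_r_records([rng.random() for _ in range(n)], r)
           for _ in range(reps)) / reps

H = lambda N: sum(1.0 / k for k in range(1, N + 1))
exact = r * (1 + H(n) - H(r))  # exact mean: sum over k of min(r, k)/k
```

With these parameters the standard error of `mean` is about 0.4, so the sample mean should land within about one unit of the exact value.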

Conditioning on the first value
We have seen above that in the case of large r(m), the asymptotics depend heavily on the first values X_k, and thus in particular on the first value X_1. Furthermore, as has been remarked by [4], assuming r(1) = 1, so that the second accepted candidate is the first one with X_n > X_1, the waiting time N_2 − N_1 until the second candidate is accepted has, conditioned on X_1, the distribution Ge(e^{−X_1}) with expectation e^{X_1}; since E e^{X_1} = ∞, it follows that E N_m = ∞ for every m ≥ 2. These effects led [4] to consider 'hiring above the median' conditioned on X_1. We can do this in general. We assume that r(m) is large, since for small r(m), conditioning on X_1 has no effect on the asymptotics, see e.g. the mixing property in Theorem 1.3. In particular, when Theorem 1.2 applies, (1.5) extends to the conditional statement M_n/n^α a.s.−→ p^α W̌, with p := e^{−X_1}, where W̌ is independent of X_1 and has moments as in (1.6) for the sequence ř(m) := r(m + 1).
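The infinite expectation can be made concrete: since p := e^{−X_1} ∈ U(0, 1), the gap G := N_2 − N_1 satisfies P(G > t) = E (1 − p)^t = 1/(t + 1), whence E G = Σ_{t≥0} 1/(t + 1) = ∞. A quick Monte Carlo check of this tail (illustrative code, assuming r(1) = 1 as in the text):

```python
import random

def gap_exceeds(rng, t):
    """True iff none of the next t candidates beats X_1, i.e. N_2 - N_1 > t."""
    p = rng.random()  # p = e^{-X_1} is uniform on (0, 1) when X_1 is Exp(1)
    return all(rng.random() > p for _ in range(t))  # each fails with prob 1 - p

rng = random.Random(3)
reps, t = 100_000, 9
tail = sum(gap_exceeds(rng, t) for _ in range(reps)) / reps
# P(N_2 - N_1 > 9) = 1/10 exactly, so tail should be close to 0.1
```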
For a different distribution of the values X n , e.g. uniform, p is of course given by the corresponding tail probability.
Proof. This was explained already in Example 8.3, although we here modify in the opposite direction, so the original sequence here is the modified one there. As explained in Example 8.3, among the first n candidates, the ones accepted are the first one and then the candidates accepted using the strategy given by ř(m) on the candidates that pass the test X_k > x_1. For asymptotics, we can ignore the first accepted candidate, and thus the results are the same as for ř(m) with n replaced by Ň_n, the (random) number of values X_k, 2 ≤ k ≤ n, such that X_k > x_1. By the law of large numbers, a.s. Ň_n ∼ pn, and the result follows. We omit the details.
Example 9.2. Consider again 'hiring above the median' as in Example 8.1, but condition on X_1. The sequence ř(m) := r(m + 1) then is the one studied in Example 8.5; thus we find, for example, see (9.2), that conditioned on X_1 = x_1, M_n/n^{1/2} a.s.−→ p^{1/2} W, where p = e^{−x_1} and W has the distribution with density (8.19). This (and more) has been shown by Helmi, Martínez and Panholzer [9].

Probability of accepting and length of gaps

Let I_n := 1{X_n is accepted} and p_n := E I_n be the indicator and the probability that candidate n is accepted; thus M_n = Σ_{k=1}^n I_k and E M_n = Σ_{k=1}^n p_k. Let Y*_n be the current threshold when candidate n is examined. We then have accepted M_{n−1} candidates, and thus Y*_n = Y_{M_{n−1}}. Furthermore, if P_n is the conditional probability that X_n is accepted given the past, then P_n = e^{−Y*_n}, and thus p_n = E P_n = E e^{−Y*_n}. We return to a more explicit asymptotic result in the case (5.1) in Theorem 10.5 below.
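For the record strategy r(m) = 1 of Example 8.6 the identity p_n = E e^{−Y*_n} can be checked directly: there Y*_n = max(X_1, …, X_{n−1}), and p_n = P(X_n is a record) = 1/n. A quick Monte Carlo sketch (illustrative code):

```python
import math
import random

rng = random.Random(5)
n, reps = 20, 100_000

# p_n = E exp(-Y*_n) with Y*_n = max of the first n - 1 Exp(1) values
est = sum(math.exp(-max(rng.expovariate(1.0) for _ in range(n - 1)))
          for _ in range(reps)) / reps
# should be close to 1/n = 0.05
```

Indeed, e^{−Y*_n} is the minimum of n − 1 independent U(0, 1) variables, with mean exactly 1/n.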
Conditioned on Y*_n, or equivalently on P_n, the waiting time until the next candidate is accepted is Ge(P_n). We will see that, asymptotically, the same holds if we go back in time from n to the last acceptance. The next lemma excludes some extreme cases.

Proof. It follows from (3.5) that

Then there exists a sequence a_n → ∞, such that on the interval J_n := [n − a_n P_n^{−1}, n], the stochastic process (I_k)_{k∈J_n} w.h.p. agrees with a sequence (I′_k)_{k∈J_n} of indicator variables that, conditioned on P_n, are i.i.d. with I′_k ∈ Be(P_n).
Proof. Fix an integer K > 0, and define the stopping time ν_n := min{k : M_{k−1} ≥ M_{n−1} − K}. Thus ν_n ≤ n and, assuming n is so large that M_{n−1} ≥ K, Y*_n − Y*_{ν_n} −→ 0 a.s. as n → ∞, since M_n → ∞ and thus r(M_{n−1} − K) → ∞ a.s. Consequently, P_{ν_n}/P_n −→ 1 (10.7) a.s. as n → ∞. By definition, K candidates are accepted in the interval J* := [ν_n, n) (provided ν_n > 1), and, conditioned on P_{ν_n}, each candidate in J* is accepted with probability at most P_{ν_n}.
Let a be a fixed large number and define n_1 := ⌈n − aP_{ν_n}^{−1}⌉. If ν_n ≥ n_1, then |J*| ≤ n − n_1 ≤ aP_{ν_n}^{−1}, and thus at least K candidates are accepted in the interval [ν_n, ν_n + ⌊aP_{ν_n}^{−1}⌋). Hence, using Markov's inequality, P(ν_n ≥ n_1 | P_{ν_n}) ≤ a/K. Consequently, given any ε > 0, we may by choosing K > a/ε make this probability < ε, uniformly in the value p > 0 of P_{ν_n}. Hence, we may in the rest of the proof assume that ν_n < n_1. This means that for every k in the interval [n_1, n], ν_n < k ≤ n, and thus P_{ν_n} ≥ P_k ≥ P_n. It follows that, conditioned on P_{ν_n}, we may couple the Markov process (I_k)_{k∈[n_1,n]} with a sequence of (conditionally) i.i.d. variables (I″_k)_{k∈[n_1,n]} with P(I″_k = 1) = P_{ν_n}, with an error probability at most, using (10.7), (n − n_1 + 1)(P_{ν_n} − P_n) ∼ aP_{ν_n}^{−1}(P_{ν_n} − P_n) a.s.
We now uncondition, and see (using (10.9) and dominated convergence) that we may couple (I_k)_{k∈[n_1,n]} and (I″_k)_{k∈[n_1,n]} with error probability o(1). We may then instead couple with (I′_k)_{k∈[n_1,n]}, where the I′_k are conditionally i.i.d. with P(I′_k = 1) = P_n, introducing an additional error o(1) by an estimate similar to (10.9).
We may here replace a by a + 1. Moreover, by a simple general argument, since this coupling with error probability o(1) is possible for every fixed a > 0, it is also possible for some sequence a_n → ∞; this follows from the following elementary lemma, taking x(a, n) to be the total variation distance between the two sequences. This completes the proof.

Lemma. Suppose that x(a, n), a, n ∈ N, are real numbers such that for every fixed a, x(a, n) → 0 as n → ∞. Then there exists a sequence a_n → ∞ such that x(a_n, n) → 0.
Proof. Let n_0 = 1. For every k ≥ 1, choose n_k > n_{k−1} such that |x(k, n)| < 1/k when n ≥ n_k. Define a_n = k when n_k ≤ n < n_{k+1}.

Let L_n := n − N_{M_n} be the number of candidates examined after the last accepted one, and let d_TV(X, Y) denote the total variation distance between two distributions or random variables. We can also find the unconditional distribution of L_n. For convenience, and in order to obtain more explicit results, we consider only the case in Section 5, and we assume α < 1, which implies (10.4). We first study P_n.
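Returning briefly to the elementary lemma above: its diagonal construction is easy to run. A small sketch (the choice x(a, n) = a/n is purely illustrative): for each k it finds the first index n_k from which |x(k, n)| < 1/k, and then sets a_n = k on [n_k, n_{k+1}).

```python
def diagonal_sequence(x, K, N):
    """Given x(a, n) -> 0 (as n -> infinity) for each fixed a, build a_n -> infinity
    with x(a_n, n) -> 0, following the proof (truncated at a_n <= K, n <= N)."""
    marks = [1]  # n_0 = 1
    for k in range(1, K + 1):
        n_k = marks[-1] + 1
        # advance n_k until |x(k, n)| < 1/k for all remaining n up to N
        while any(abs(x(k, n)) >= 1 / k for n in range(n_k, N + 1)):
            n_k += 1
        marks.append(n_k)
    # a_n = k when marks[k] <= n < marks[k+1]
    return [max([k for k in range(1, K + 1) if marks[k] <= n], default=0)
            for n in range(1, N + 1)]

a_seq = diagonal_sequence(lambda a, n: a / n, K=20, N=2000)
# a_seq is nondecreasing, grows to 20, and x(a_n, n) = a_n/n stays below 1/a_n
```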
Remark 10.9. Theorem 10.2 implies also the same limit results for, e.g., the distance between the last two accepted candidates. See [11, Theorem 4] for 'hiring above the median'.

The distribution of accepted values
Finally, we study the distribution of the accepted values. For simplicity we consider again only the situation in Section 5, and we assume that α < 1, leaving the case α = 1 to the reader.
Let, for a real number x, M_n^x be the number of values X_k with k ≤ n that are accepted and furthermore satisfy X_k ≤ x. Define M_n^{>x} := M_n − M_n^x similarly.
Theorem 11.1. Suppose that (5.1) holds for some α ∈ (0, 1). Then, a.s., for every u ∈ R, M_n^{Y*_n + u}/M_n −→ F(u). (11.1) In other words, the empirical distribution function of the differences X_k − Y*_n for the M_n accepted candidates converges a.s. to the distribution with distribution function F(u). Hence, if X̄_n is the value of one of the M_n accepted candidates, chosen uniformly at random, then X̄_n − Y*_n d−→ V, (11.2) where V has the distribution F(u).
The proof is given later. Note that V has density f(u) := F′(u) = αe^{αu/(1−α)} for u < 0, and f(u) = αe^{−u} for u > 0. (11.3) Thus, V has an asymmetric double exponential distribution (Laplace distribution); if α = 1/2, V has the usual symmetric Laplace distribution.

In order to prove Theorem 11.1, we introduce a simpler strategy. Fix a real number z, define x_k := (1 − α) log k + αz − log α, k ≥ 1, (11.4) and accept X_k if and only if X_k > x_k.

Proof. Consider first a fixed u ≥ 0. Then every value X_k > x_n + u with k ≤ n will have X_k > x_n ≥ x_k, and thus be accepted. Hence, M_n^{>x_n+u} is the number of all such values, and since the indicators 1{X_k > x_n + u} are i.i.d. with P(X_k > x_n + u) = e^{−x_n−u}, (11.10) it follows that M_n^{>x_n+u}/(n e^{−x_n−u}) a.s.−→ 1.

Consider next M_n. This too is a sum of independent indicators I_k := 1{X_k > x_k}. Furthermore, for k large enough that x_k ≥ 0, p_k := E I_k = e^{−x_k} = αe^{−αz} k^{α−1} = αw k^{α−1}, with w := e^{−αz}. The random variables I_k are not identically distributed, but the Chernoff bound holds for sums of arbitrary independent indicator variables [13, Theorem 2.8], and thus (11.9) holds for M_n too, and we obtain as above M_n/E M_n a.s.−→ 1, (11.13) which together with (11.12) yields (11.6). Furthermore, (11.6) and (11.10) show that (11.7) holds for every fixed u ≥ 0.

Finally, consider a fixed u ≤ 0. Similarly as above, we write M_n^{x_n+u} as a sum of independent indicators I′_k := 1{x_k < X_k ≤ x_n + u}. Note that I′_k = 0 unless x_k < x_n + u, which by (11.4) is equivalent to (1 − α) log k < (1 − α) log n + u, (11.15) or k < e^{u/(1−α)} n; thus, using (11.18) and (11.6), (11.7) holds a.s. for u ≤ 0 as well.

We have proved that (11.7) holds a.s. for every fixed u ∈ R. Hence, it holds a.s. for every rational u, but this implies that it holds for all u simultaneously, since the left-hand side is monotone in u and the right-hand side is continuous; we omit the details.
A lower bound follows in the same way, now comparing in the opposite direction with z > Z + ξ > z′. This proves (11.1) a.s. for a fixed u, and thus for all rational u simultaneously, which again implies the result for all real u simultaneously by monotonicity and continuity.
Corollary 11.3. Suppose that (5.1) holds for some α ∈ (0, 1). Then, the fraction of the accepted values that are larger than the current threshold, and thus would have been accepted now, converges a.s. to α.
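Corollary 11.3 is easily observed in simulation. A sketch (illustrative code; we take r(m) = ⌈m/2⌉, the '1/2-percentile rule', so that (5.1) holds with α = 1/2): run the strategy, then count which hired values exceed the current threshold.

```python
import bisect
import random

def run_hiring(n, r, rng):
    """Rank-based hiring on n Exp(1) candidates; the threshold is the r(m)-th
    best of the m hired so far. Returns the hired values (ascending) and the
    current threshold after all n candidates."""
    hired = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        m = len(hired)
        if m == 0 or x > hired[m - r(m)]:
            bisect.insort(hired, x)
    m = len(hired)
    return hired, hired[m - r(m)]

rng = random.Random(4)
r = lambda m: (m + 1) // 2  # r(m) = ceil(m/2): the '1/2-percentile rule', alpha = 1/2
hired, threshold = run_hiring(100_000, r, rng)
frac_above = sum(x > threshold for x in hired) / len(hired)
# Corollary 11.3: frac_above should be close to alpha = 1/2
```

The values above the threshold are exactly the top r(M_n) − 1 hired, so the fraction is (r(M_n) − 1)/M_n, which tends to α as M_n grows.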