Optimal Stopping and the Sufficiency of Randomized Threshold Strategies

In a classical optimal stopping problem the aim is to maximize the expected value of a functional of a diffusion evaluated at a stopping time. This note considers optimal stopping problems beyond this paradigm. We study problems in which the value associated with a stopping rule depends on the law of the stopped process. If this value is quasi-convex on the space of attainable laws then it is a well-known result that it is sufficient to restrict attention to the class of threshold strategies. However, if the objective function is not quasi-convex, this may not be the case. We show that, nonetheless, it is sufficient to restrict attention to mixtures of threshold strategies.


Introduction and main results
Let Y = (Y_t)_{t≥0} be a time-homogeneous, continuous strong-Markov process. Let T be the set of all stopping times, and let T_T be the set of all (one- and two-sided) threshold stopping times, i.e. stopping rules based on the first crossing of upper or lower thresholds. Let V = V(τ) be the value associated with a stopping rule τ. Consider the optimal stopping problem associated with V, i.e. the problem of finding

V*(S) = sup_{τ ∈ S} V(τ),     (1)

where S is some set of stopping times (for example S = T or S = T_T), and especially the problem of finding an optimizer for (1). We say that V = V(τ) is law invariant if, whenever σ, τ are stopping times, L(Y_σ) = L(Y_τ) implies that V(σ) = V(τ), where L(Z) denotes the law of Z. It follows that V(τ) = H(L(Y_τ)) for some map H.
The following result is well-known, but we include it as a contrast to our result on the sufficiency of randomized threshold rules.
Main Result 1 (See Theorem 2 below). Suppose H is quasi-convex and lower semi-continuous. Then V*(T_T) = V*(T).

Corollary 1. In the setting of Theorem 2, in solving the optimal stopping problem (1) over the set of all stopping times it is sufficient to restrict attention to threshold rules.
In the classical expected-utility setting it is well known that there is an optimal stopping rule which is of threshold form; see, for example, Dayanik and Karatzas [5]. The fact that quasi-convexity means that there is no benefit from following randomized strategies is well understood in the economics literature; see Machina [14], Camerer and Ho [3], Wakker [21] and He et al [11].
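As an illustration of why quasi-convexity rules out a benefit from randomization, consider the expected-utility objective H(µ) = ∫ u dµ, which is linear in µ and hence quasi-convex. The sketch below uses assumed, illustrative ingredients (utility u(z) = √z, a martingale in natural scale started at x = 4, an integer grid of thresholds) and checks numerically that no mixture of threshold rules beats the best pure threshold rule.

```python
import random

def pure_value(x, a, b, u):
    """Expected utility of the exit law of (a, b) for a martingale at x:
    the process stops at a w.p. (b-x)/(b-a) and at b w.p. (x-a)/(b-a)."""
    return u(a) * (b - x) / (b - a) + u(b) * (x - a) / (b - a)

u = lambda z: z ** 0.5          # illustrative concave utility
x = 4
pairs = [(a, b) for a in range(0, 5) for b in range(4, 9) if a < b]
best_pure = max(pure_value(x, a, b, u) for a, b in pairs)

# Randomly generated mixtures of the same threshold rules: the value of a
# mixture is the convex combination of the pure-threshold values, so it
# can never exceed the best pure value when H is linear (quasi-convex).
rng = random.Random(0)
best_mixed = 0.0
for _ in range(1000):
    weights = [rng.random() for _ in pairs]
    total = sum(weights)
    value = sum(w / total * pure_value(x, a, b, u)
                for w, (a, b) in zip(weights, pairs))
    best_mixed = max(best_mixed, value)
```

Here the best pure rule is to stop immediately (value u(4) = 2, by Jensen's inequality), and every mixture does strictly worse.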
Recently there has been a surge of interest in problems which, whilst they have the law invariance property, do not satisfy the quasi-convex criterion. Two examples are optimal stopping under prospect theory (Xu and Zhou [22]), and optimal stopping under cautious stochastic choice (Henderson et al [9]).
We introduce the set T_R of mixed, or randomized, threshold rules.
Main Result 2 (See Theorem 1 below). Suppose law invariance holds for V, but not quasi-convexity for H. Then V*(T_T) ≤ V*(T_R) = V*(T).
We will show by example that the first inequality may be strict.
Corollary 2. In the setting of Theorem 1, in solving the optimal stopping problem (1) over the set of all stopping rules it is sufficient to restrict attention to randomized threshold rules, but it may not be sufficient to restrict attention to (pure) threshold rules.
It should be noted that we do not include discounting in our analysis, since a problem involving discounting does not satisfy the law invariance property. Nonetheless, as is well known, the conclusion of Corollary 1 remains true for the problem of maximizing the discounted expected utility of the stopped process, V(τ) = E[e^{−βτ} u(Y_τ)]. However, in problems which go beyond the expected utility paradigm, there are often modelling issues which militate against the inclusion of discounting. For this reason, historically the literature has concentrated on problems with no discounting. Finding the optimal stopping rule is often already challenging in these models.
The significance of Corollary 2 is as follows. In many classical models optimal stopping behavior involves stopping on first exit from an interval. If decision makers are observed to stop at levels which have already been visited by the process, then this behavior is inconsistent with the classical optimal stopping model. However, our result implies that the converse is not true: if decision makers are observed to stop only when the process is reaching new maxima or minima, then it does not necessarily mean that they are maximizers of expected payoffs. Instead the decision criteria may be more complicated, and they may be utilizing a randomized threshold rule.

Problem specification and the problem in natural scale
We work on a filtered probability space (Ω, F, F = {F_t}_{t≥0}, P). Let Y = (Y_t)_{t≥0} be an (F, P)-stochastic process on this probability space with state space I, which is an interval. Let Ī be the closure of I. We suppose that Y is a regular, time-homogeneous diffusion with initial value Y_0 = y such that y lies in the interior of I.
Let T be the class of all stopping times τ such that lim_{t↑∞} Y_{t∧τ} exists (almost surely). We introduce two subclasses of stopping times:
• T_T, the subclass of (pure) threshold stopping times;
• T_R, the subclass of randomized threshold stopping times.
Note that T_T ⊂ T_R ⊂ T. The set of pure threshold stopping times includes stopping immediately and can be written as

T_T = T ∩ ∪_{a≤y≤b, a,b∈Ī} {τ_{a,b}},     (2)

where τ_{a,b} = inf{u ≥ 0 : Y_u ∉ (a, b)}. Note that if a = y or b = y then τ_{a,b} = 0 almost surely, and that if σ = τ almost surely then we have V(σ) = V(τ). Hence we may suppose that τ ≡ 0, the strategy of stopping immediately, lies in T_T.
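A pure threshold rule is straightforward to simulate. The sketch below uses a simple symmetric random walk (an assumed discrete stand-in for a martingale diffusion, with illustrative thresholds) and checks the exit law of τ_{a,b} against the gambler's-ruin formula.

```python
import random

def exit_law_upper_fraction(x, a, b, n_paths=20000, seed=1):
    """Monte Carlo estimate of P(X_tau = b) for the threshold rule
    tau_{a,b} applied to a simple symmetric random walk started at the
    integer x (the walk is a martingale, i.e. already in natural scale)."""
    rng = random.Random(seed)
    hits_upper = 0
    for _ in range(n_paths):
        pos = x
        while a < pos < b:
            pos += 1 if rng.random() < 0.5 else -1
        hits_upper += (pos == b)
    return hits_upper / n_paths

# Gambler's-ruin formula for a martingale: P(X_tau = b) = (x - a)/(b - a).
x, a, b = 2, 0, 5
p_hat = exit_law_upper_fraction(x, a, b)
p_theory = (x - a) / (b - a)
```

The two-point exit law {a, b} with these weights is the building block used repeatedly in the embedding arguments below.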
In order to be able to define a sufficiently rich class of randomized stopping times we need to assume that F is larger than the filtration generated by Y . Assumption 1. F 0 is sufficiently rich as to include a continuous random variable, and the stochastic process Y is independent of this random variable.
It follows from the assumption that for any probability measure ζ on D = ([−∞, y] ∩ Ī) × ([y, ∞] ∩ Ī) there exists an F_0-measurable random variable Θ = Θ_ζ = (A_ζ, B_ζ) such that (A_ζ, B_ζ) has law ζ. For a set Γ let P(Γ) be the set of probability measures on Γ. Then for any ζ ∈ P(D) we can define the randomized stopping time τ_ζ = inf{u ≥ 0 : Y_u ∉ (A_ζ, B_ζ)} as the first time Y leaves a random interval, where the interval is chosen at time 0 with law ζ.
The set of randomized threshold rules is given by

T_R = T ∩ {τ_ζ : ζ ∈ P(D)}.     (3)

Our analysis is focussed on problems in which the value associated with a stopping rule depends only on the law of the stopped process. For a set S of stopping times let Q(S) = {µ : µ = L(Y_τ), τ ∈ S}.
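A randomized threshold rule draws the interval once, at time 0, and thereafter behaves like a pure threshold rule; for a martingale diffusion the law of the stopped process is therefore an explicit mixture of two-point exit laws. A minimal sketch with an assumed, illustrative mixing law ζ:

```python
from collections import defaultdict

def randomized_threshold_law(x, zeta):
    """Exact law of the stopped process under a randomized threshold rule.
    zeta maps an interval (a, b) with a <= x <= b to the probability that
    it is selected at time 0.  For a martingale diffusion the exit law of
    (a, b) puts mass (b - x)/(b - a) at a and (x - a)/(b - a) at b."""
    law = defaultdict(float)
    for (a, b), p in zeta.items():
        if a == x or b == x:
            law[x] += p          # degenerate interval: stop immediately
        else:
            law[a] += p * (b - x) / (b - a)
            law[b] += p * (x - a) / (b - a)
    return dict(law)

# A 50/50 mixture of two pure threshold rules, started from x = 2.
law = randomized_threshold_law(2, {(0, 5): 0.5, (1, 3): 0.5})
```

The resulting four-point law still has mean x = 2, consistent with the martingale property; mixtures enlarge the set of attainable laws well beyond the two-point laws of pure threshold rules.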
Given that the value associated with a stopping rule is law invariant, one natural approach to finding the optimal stopping time is to try to characterize Q(S). Often, the best way to do this is via a change of scale. Let s be a strictly increasing function such that X = s(Y) is a local martingale. (Such a function s exists under very mild conditions on Y, see, for example, Rogers and Williams [16], and is called a scale function. Note that if s is a scale function then so is any affine transformation of s, and so we may choose any convenient normalization for s.) Let I_X = s(I) and let Ī_X be the closure of I_X. Then X is a regular, time-homogeneous local-martingale diffusion on I_X with initial value x = s(y). Set Q_X(S) = {ν : ν = L(X_τ), τ ∈ S}. It follows that ν ∈ Q_X(S) if and only if ν♯s ∈ Q(S), and hence Q(S) = {ν♯s : ν ∈ Q_X(S)}. Thus, if we can characterize Q_X(S) then we can also characterize Q(S). Moreover, defining H_X : P(Ī_X) → R by H_X(ν) = H(ν♯s), we have V(τ) = H_X(L(X_τ)). The problem of optimizing over stopping laws for the problem with Y becomes a problem of optimizing over the possible laws of the stopped process X in natural scale.
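The role of the scale function can be illustrated on a discrete-state example. For a nearest-neighbour walk with up-probability p (an assumed stand-in for the diffusion Y, with illustrative parameters), s(k) = ((1−p)/p)^k makes s(Y) a martingale, and threshold-exit probabilities then follow the natural-scale formula:

```python
import random

def scale(k, p):
    """Scale function of a nearest-neighbour walk with up-probability p:
    s(k) = ((1 - p)/p)**k makes s(Y) a martingale."""
    return ((1.0 - p) / p) ** k

def exit_upper_prob(x, a, b, p, n_paths=20000, seed=7):
    """Monte Carlo estimate of P(Y hits b before a) for the biased walk."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_paths):
        pos = x
        while a < pos < b:
            pos += 1 if rng.random() < p else -1
        wins += (pos == b)
    return wins / n_paths

# Natural-scale prediction: P(hit b first) = (s(x)-s(a)) / (s(b)-s(a)).
p, x, a, b = 0.6, 2, 0, 5
theory = (scale(x, p) - scale(a, p)) / (scale(b, p) - scale(a, p))
estimate = exit_upper_prob(x, a, b, p)
```

After the change of scale all threshold computations reduce to the martingale case, which is exactly why the analysis below is carried out in natural scale.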
Note that, for a ≤ y ≤ b, Y leaves the interval (a, b) at exactly the time at which X = s(Y) leaves (s(a), s(b)). Hence T_T has the alternative representation T_T = T ∩ ∪_{α≤x≤β, α,β∈Ī_X} {τ^X_{α,β}}, where τ^X_{α,β} = inf{u ≥ 0 : X_u ∉ (α, β)}, and the set of threshold stopping times for Y is the set of threshold stopping times for X. Similarly, T_R can be rewritten as T_R = T ∩ {τ^X_η : η ∈ P(D^X)}, where D^X = ([−∞, x] ∩ Ī_X) × ([x, ∞] ∩ Ī_X).

Characterizing the possible laws of the stopped process in natural scale

If X = s(Y) is in natural scale then the state space of X is an interval I_X = s(I) and X_0 = x := s(y).
There are four cases:
1. I_X is bounded;
2. I_X is unbounded above but bounded below;
3. I_X is bounded above but unbounded below;
4. I_X is unbounded above and below.
The third case can be reduced to the second by reflection. The first case is generally similar to the second case, and typically the proofs are similar but simpler. The final case is degenerate and will be treated separately. In the main text we will mainly present arguments for the second case (with the other cases covered in an appendix), but results will be stated in a form which applies in all cases.

Henceforth, in the main text we suppose I_X is bounded below, but unbounded above. Without loss of generality we may assume I_X = (0, ∞) or [0, ∞). Then X is a non-negative local martingale and hence a super-martingale. Moreover, lim_{t→∞} X_t exists. Hence T includes stopping rules which take infinite values, and on {τ = ∞} we set X_τ = lim_{t→∞} X_t = 0. In this case T is the set of all stopping times and the intersection with T in the definitions (2) and (3) is not necessary. By Fatou's lemma and the super-martingale property, E[X_τ] ≤ lim inf_t E[X_{τ∧t}] ≤ x for every τ ∈ T, so that Q_X(T) ⊆ P_{≤x}, where P_{≤x} = {ν ∈ P(Ī_X) : ∫ z ν(dz) ≤ x}.

Lemma 1. Q_X(T) = Q_X(T_R) = P_{≤x}.

Proof. Here we prove the lemma in the case where I_X is bounded below. We show that Q_X(T) = Q_X(T_R) = P_{≤x}. Given ν ∈ P_{≤x} the aim is to find a stopping time τ ∈ T_R such that L(X_τ) = ν. The task of finding general stopping times with L(X_τ) = ξ for given ξ ∈ P(I_X) is known as the Skorokhod embedding problem (Skorokhod [18]). In fact we use an extension of an embedding due to Hall [8], see also Durrett [6]. The extension relates to the fact that we allow for target laws which have a different mean to the initial value of X, whereas the Hall embedding assumes ∫ z ν(dz) = x. The Hall embedding, and the extension we give, are mixtures of threshold strategies.

Suppose ν is an element of P_{≤x} (and ν is not a point mass at x). The case ν = δ_x corresponds to the (threshold) stopping time τ = 0. Let G be the (right-continuous) quantile function of ν. We have x ≥ ∫ z ν(dz) = ∫_{(0,1)} G(u) du. In particular, unless lim_{u↑1} G(u) ≤ x, there exists a unique solution w* ∈ [0, 1) of ∫_{w*}^1 (G(u) − x) du = 0. Set z* = G(w*) and define ν_0(A) = Leb{u ∈ (0, w*] : G(u) ∈ A} and ν_1(A) = Leb{u ∈ (w*, 1) : G(u) ∈ A}. Then ν_1 has support in [z*, ∞) and barycentre x, in the sense that ∫ z ν_1(dz) = x ν_1(Ī_X). Moreover ν = ν_0 + ν_1.

Define c = ∫_x^∞ (y − x) ν(dy). By construction, c = ∫_x^∞ (y − x) ν_1(dy), and we have from the fact that ν_1 has barycentre x that ∫_{z*}^∞ (y − x) ν_1(dy) = 0 and hence

∫_{z*}^x (x − y) ν_1(dy) = c.     (5)

Let η ∈ P([0, x] × (x, ∞]) be given by

η(da, db) = ν_0(da) δ_∞(db) + c^{−1} (b − a) ν_1|_{[z*,x]}(da) ν_1|_{(x,∞)}(db).

Note first that η is a probability measure:

∫_{0≤a≤x} ∫_{x<b≤∞} η(da, db) = ν_0(Ī_X) + c^{−1} ∫∫ {(b − x) + (x − a)} ν_1|_{[z*,x]}(da) ν_1|_{(x,∞)}(db)
= ν_0(Ī_X) + c^{−1} {c ν_1([z*, x]) + c ν_1((x, ∞))} = ν_0(Ī_X) + ν_1(Ī_X) = 1,

where we use the definition of c and (5) in going from the second line to the third. It remains to show that L(X_{τ^X_η}) = ν. Let f be a bounded test function. Then, using the fact that if b = ∞ then X_{τ^X_{a,∞}} = a, and the definition of c and (5) for the penultimate line,

E[f(X_{τ^X_η})] = ∫ f(a) ν_0(da) + c^{−1} ∫∫ {f(a)(b − x) + f(b)(x − a)} ν_1|_{[z*,x]}(da) ν_1|_{(x,∞)}(db)
= ∫ f(a) ν_0(da) + ∫_{[z*,x]} f(a) ν_1(da) + ∫_{(x,∞)} f(b) ν_1(db)
= ∫ f dν.

Hence L(X_{τ^X_η}) = ν as required.
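The extended Hall construction can be checked mechanically for a discrete target law. The sketch below (illustrative integer-valued target, exact rational arithmetic; the greedy bottom-mass removal is an assumed discrete analogue of the quantile-splitting step) removes mass from the bottom of ν until the remainder ν_1 has barycentre x, builds a mixing law η with density proportional to (b − a) on pairs of atoms of ν_1 on either side of x, embeds the removed mass via one-sided intervals, and verifies that the induced stopped law is exactly ν.

```python
from fractions import Fraction as F

def hall_embedding(x, nu):
    """Sketch of the extended Hall embedding for a discrete target law nu
    (dict: value -> mass) with mean <= x.  Returns eta, mapping an interval
    (a, b) (b = None stands for +infinity) to the probability that the
    randomized threshold rule selects it."""
    assert sum(nu.values()) == 1
    deficit = x - sum(v * m for v, m in nu.items())   # int (x - z) nu(dz) >= 0
    assert deficit >= 0
    # Split nu = nu0 + nu1: remove mass from the bottom of the distribution
    # until the remaining (sub-probability) measure nu1 has barycentre x.
    nu0, nu1 = {}, dict(nu)
    for v in sorted(nu):
        if deficit == 0 or v >= x:
            break
        m = min(nu1[v], deficit / (x - v))   # removing m at v covers m*(x-v)
        nu0[v] = m
        nu1[v] -= m
        if nu1[v] == 0:
            del nu1[v]
        deficit -= m * (x - v)
    c = sum((v - x) * m for v, m in nu1.items() if v > x)
    assert c > 0, "degenerate target (all mass at or below x) not handled"
    # nu0 is embedded by one-sided intervals (a, infinity); nu1 by Hall's
    # two-sided randomization with density proportional to (b - a).
    eta = {(a, None): m for a, m in nu0.items()}
    for a, ma in nu1.items():
        if a > x:
            continue
        for b, mb in nu1.items():
            if b > x:
                eta[(a, b)] = (b - a) * ma * mb / c
    return eta

def stopped_law(x, eta):
    """Exact law of the stopped martingale under the randomization eta."""
    law = {}
    for (a, b), p in eta.items():
        if b is None:                       # exit of (a, infinity): stop at a
            law[a] = law.get(a, 0) + p
        else:
            law[a] = law.get(a, 0) + p * F(b - x, b - a)
            law[b] = law.get(b, 0) + p * F(x - a, b - a)
    return law

# Illustrative target with mean 7/4, strictly below the starting point x = 2.
x = 2
nu = {0: F(1, 4), 1: F(1, 4), 3: F(1, 2)}
eta = hall_embedding(x, nu)
law = stopped_law(x, eta)
```

Because the arithmetic is exact, the recovered stopped law matches the target measure atom for atom, mirroring the computation with the test function f in the proof.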

Sufficiency of mixed threshold rules
Our main result is that in a large class of problems it is sufficient to search over the class of mixed threshold rules.
Note that it is not our claim that every optimal stopping rule is a mixed threshold rule. Typically, at least in the case where V*(T_T) < V*(T), there will be other optimal stopping rules which are not of threshold type.

Rank dependent utility and optimal stopping
Let Z be a non-negative random variable. Let v : [0, ∞) → [0, ∞) be an increasing, differentiable function with v(0) = 0. Then the expected value of v(Z) can be expressed as

E[v(Z)] = ∫_0^∞ v′(z) P(Z > z) dz.

Under rank-dependent utility (Quiggin [15]) or probability weighting (Tversky and Kahneman [20]) the prospect value E_v(Z) of Z is

E_v(Z) = ∫_0^∞ v′(z) w(P(Z > z)) dz,

where w : [0, 1] → [0, 1] is an increasing probability weighting function with w(0) = 0 and w(1) = 1. Let Y be a non-negative diffusion and consider the problem of maximizing over stopping times the prospect value of the stopped process Y, i.e. of finding

sup_{τ ∈ T} E_v(Y_τ).

Clearly the prospect value depends on the stopping time only through the law of the stopped process.
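The tail representation of E[v(Z)] is the standard Fubini identity, using v(0) = 0 and v increasing:

```latex
\begin{aligned}
\mathbb{E}[v(Z)]
  &= \mathbb{E}\left[\int_0^{Z} v'(z)\,\mathrm{d}z\right]
     && \text{since } v(0)=0 \\
  &= \mathbb{E}\left[\int_0^{\infty} v'(z)\,\mathbf{1}_{\{Z>z\}}\,\mathrm{d}z\right] \\
  &= \int_0^{\infty} v'(z)\,\mathbb{P}(Z>z)\,\mathrm{d}z
     && \text{by Tonelli's theorem (non-negative integrand).}
\end{aligned}
```

The prospect value is then obtained by applying the weighting function w to the survival probability inside the final integral.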
Hence it is sufficient to characterize the optimal target distribution, for example via its quantile function. Xu and Zhou [22] solve for the optimal quantile function in several cases. One relevant case is the following:

Proposition 1 (Xu and Zhou [22]). Suppose Y is in natural scale and has state space [0, ∞) and initial value y. Suppose v and w are concave. Suppose there exists λ* ∈ (0, ∞) which solves

∫_0^1 (v′)^{−1}(λ*/w′(1 − u)) du = y.

Then the quantile function of the optimal stopping distribution is G*(u) = (v′)^{−1}(λ*/w′(1 − u)).

Xu and Zhou [22] point out that although there is a unique optimal prospect there are infinitely many stopping rules which attain this prospect. They advocate the use of the stopping rule based on the Azéma-Yor stopping time [1], in which case the stopping rule has a drawdown feature, and involves stopping the first time the process falls below some function of the maximum. Our main result says that there is also a randomized threshold rule which is optimal.
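Proposition 1 can be explored numerically. The sketch below makes illustrative parametric assumptions v(z) = √z and w(p) = p^0.75 (both concave, with decreasing derivatives) and bisects for a multiplier λ* such that the candidate quantile G*(u) = (v′)^{−1}(λ*/w′(1 − u)) meets the budget constraint ∫_0^1 G*(u) du = y; neither the parameters nor the bisection routine are taken from the paper.

```python
# Illustrative (assumed) specification: v(z) = sqrt(z), w(p) = p**0.75.
def v_prime_inv(q):
    """Inverse of v'(z) = 1 / (2 sqrt(z))."""
    return 1.0 / (4.0 * q * q)

def w_prime(p, delta=0.75):
    """Derivative of the weighting function w(p) = p**delta."""
    return delta * p ** (delta - 1.0)

def G_star(u, lam):
    """Candidate optimal quantile G*(u) = (v')^{-1}(lam / w'(1 - u))."""
    return v_prime_inv(lam / w_prime(1.0 - u))

def budget(lam, n=5000):
    """Midpoint-rule approximation of int_0^1 G*(u) du."""
    return sum(G_star((i + 0.5) / n, lam) for i in range(n)) / n

def solve_lambda(y, lo=1e-3, hi=100.0, tol=1e-6):
    """Bisect for lambda*: the budget is decreasing in lambda."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if budget(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = 1.0
lam = solve_lambda(y)
```

For these choices G*(u) is proportional to (1 − u)^{−1/2}, so the optimal target law is unbounded even though its mean is the finite budget y: the heavy right tail is characteristic of concave probability weighting.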

Cautious stochastic choice
Given a process Y and a utility function u, the certainty equivalent associated with a stopping time τ is C_u(τ) = u^{−1}(E[u(Y_τ)]). The idea in cautious stochastic choice (Cerreia-Vioglio et al [4]) is that agents use multiple utility functions and evaluate an outcome in a robust manner as the least favorable of the individual certainty equivalents. If the set of utility functions is {u_α}_{α∈A}, and if we write C_α as shorthand for C_{u_α}, then the CSC value of a stopping rule is

CSC(τ) = inf_{α∈A} C_α(τ),

and an optimal stopping rule is one which maximizes the CSC value. Clearly the CSC value of a stopping rule depends only on the law of Y_τ. Moreover, suppose A = {α, β} and suppose u_α and u_β are strictly increasing and continuous with strictly increasing and continuous inverses. Suppose further that there exist τ_1 and τ_2 such that C_α(τ_1) > C_β(τ_1) and C_α(τ_2) < C_β(τ_2). For θ ∈ [0, 1] let τ_θ be the randomized rule which follows τ_1 with probability θ and τ_2 with probability (1 − θ); then C_α(τ_θ) is a continuous function of θ. Suppose, moreover, that C_α(τ_θ) is strictly increasing in θ and C_β(τ_θ) is strictly decreasing. By our assumptions it follows that the best choice θ* of θ is such that C_α(τ_θ*) = C_β(τ_θ*), and then θ* ∈ (0, 1) and CSC(τ_θ*) > max{CSC(τ_1), CSC(τ_2)}.
In particular, the value associated with a stopping rule is not quasi-convex. By the analysis of this section, in searching for an optimal stopping rule it is sufficient to restrict attention to randomized threshold rules, but we cannot expect in general that there is a pure threshold rule which is optimal. For a deeper study of optimal stopping in the context of cautious stochastic choice see Henderson et al [9].
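The crossing argument can be made concrete. The example below is entirely illustrative (hand-picked piecewise-linear utilities and two-point stopped laws, not taken from the paper): u_a is concave-then-convex and u_b convex-then-concave, so neither is uniformly more risk averse than the other, and the CSC value of the randomization is maximized at an interior mixing weight where the two certainty equivalents cross.

```python
def make_pw(knots, slopes):
    """A strictly increasing piecewise-linear utility with u(0) = 0 and
    kinks at `knots`; returns the pair (u, u_inverse)."""
    def u(z):
        val, prev = 0.0, 0.0
        for k, s in zip(knots + [float('inf')], slopes):
            seg = min(z, k) - prev
            if seg <= 0:
                break
            val += s * seg
            prev = k
        return val
    def u_inv(v):
        val, prev = 0.0, 0.0
        for k, s in zip(knots + [float('inf')], slopes):
            cap = val + s * (k - prev)
            if v <= cap or k == float('inf'):
                return prev + (v - val) / s
            val, prev = cap, k
    return u, u_inv

# Illustrative utilities: neither is a concave transform of the other,
# so their certainty equivalents are not uniformly ordered across laws.
u_a, u_a_inv = make_pw([1.0, 3.0], [1.0, 0.1, 1.0])   # concave then convex
u_b, u_b_inv = make_pw([1.0, 3.0], [0.1, 1.0, 0.1])   # convex then concave

mu1 = {0.5: 0.6, 3.5: 0.4}   # stopped law favoured by u_b
mu2 = {1.0: 0.9, 4.0: 0.1}   # stopped law favoured by u_a

def ce(u, u_inv, law):
    """Certainty equivalent u^{-1}(E[u(Z)]) for a discrete law."""
    return u_inv(sum(p * u(z) for z, p in law.items()))

def csc(theta):
    """CSC value of following the mu2-rule w.p. theta, the mu1-rule else."""
    mix = {z: theta * mu2.get(z, 0.0) + (1 - theta) * mu1.get(z, 0.0)
           for z in set(mu1) | set(mu2)}
    return min(ce(u_a, u_a_inv, mix), ce(u_b, u_b_inv, mix))

best_theta = max((i / 1000 for i in range(1001)), key=csc)
```

At the optimum the two certainty equivalents agree, and the CSC value of the randomization strictly exceeds that of either pure alternative.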

Sufficient conditions for the optimality of pure threshold rules
In this section we argue that if the value associated with a stopping rule is law invariant, and if H is quasi-convex and lower semi-continuous then pure threshold rules are optimal.
Recall that H is quasi-convex if H(λµ + (1 − λ)ν) ≤ max{H(µ), H(ν)} for all µ, ν and all λ ∈ (0, 1). Recall also that if H is lower semi-continuous and µ_n ⇒ µ then H(µ) ≤ lim inf H(µ_n). In fact we do not require H(µ) ≤ lim inf H(µ_n), but rather the weaker condition H(µ) ≤ lim sup H(µ_n).
Lemma 2. Suppose ν ∈ Q_X(T) consists of finitely many atoms. Then there exists η ∈ P(D^X) such that η consists of finitely many atoms and L(X_{τ^X_η}) = ν.

Proof. It follows from the construction in the proof of Lemma 1 that if ν is purely atomic then so is η.
Lemma 3. Let ν be an element of Q_X(T). Then there exist (η_n)_{n≥1} such that η_n has finite support for each n and such that L(X_{τ^X_{η_n}}) ⇒ ν.

Proof. Since ν ∈ Q_X(T) = Q_X(T_R) there exists η such that L(X_{τ^X_η}) = ν. Let (η_n)_{n≥1} be a sequence of measures with finite support such that η_n ⇒ η. For f : [0, ∞) → R a bounded continuous test function define f̃ : D^X → R by f̃(a, b) = E[f(X_{τ^X_{a,b}})] = f(a)(b − x)/(b − a) + f(b)(x − a)/(b − a), with f̃(a, ∞) = f(a) and f̃(x, x) = f(x). Then, since f̃ is bounded and continuous and E[f(X_{τ^X_{η_n}})] = ∫ f̃ dη_n → ∫ f̃ dη = E[f(X_{τ^X_η})], it follows that ν_n := L(X_{τ^X_{η_n}}) ⇒ ν.

Theorem 2. Suppose Y is a regular, time-homogeneous diffusion. Suppose the law invariance property holds (Assumption 2). Suppose that H is quasi-convex and lower semi-continuous. Then V*(T) = V*(T_T).
Proof. Since T_T ⊆ T we have V*(T_T) ≤ V*(T), so it remains to prove the reverse inequality. For any µ_n with finite support we can define ν_n = µ_n♯s^{−1}. Then we can find a measure η_n with finite support such that L(X_{τ^X_{η_n}}) = ν_n. Moreover, writing η_n = Σ_i λ_i δ_{(a_i,b_i)}, ν_n can be decomposed as a convex combination ν_n = Σ_i λ_i L(X_{τ^X_{a_i,b_i}}). Then, since H is quasi-convex,

H(µ_n) = H_X(ν_n) ≤ max_i H_X(L(X_{τ^X_{a_i,b_i}})) ≤ V*(T_T).

Then, for τ ∈ T, if µ = L(Y_τ) and if (µ_n) is a sequence of finitely supported elements of Q(T) with µ_n ⇒ µ, we obtain

V(τ) = H(µ) ≤ lim sup H(µ_n) ≤ V*(T_T).

Hence V*(T) ≤ V*(T_T).

Discussion
In classical optimal stopping problems involving maximizing expected utility the optimal strategy is a threshold rule and involves stopping the first time that the process leaves an interval. However, in more general settings the optimal strategy may be more sophisticated. In some settings, for example those involving regret (Loomes and Sugden [13]), the optimal stopping rule may depend on some functional of the path (for example the maximum price to date). But, as argued here, for a large class of problems the payoff depends only on the distribution of the stopped process, and then there are many optimal stopping rules, some of which take the form of randomized threshold rules. In this article we have utilized (an extended version of) the Hall solution of the Skorokhod embedding problem (Hall [8]) to give our randomized threshold rule, but there are other solutions of the Skorokhod embedding problem which can also be viewed as mixed threshold rules, including the original solution of Skorokhod [18] and the solution of Hirsch et al [12]. The idea that if the objective is expressed in terms of a function which is not quasi-convex then agents may want to use randomized strategies is well appreciated in static settings. In a dynamic setting, He et al [11] argue that in a binomial-tree, probability-weighted model of a casino (Barberis [2]) gamblers may prefer path-dependent strategies over strategies which are defined via a partition of the set of nodes into those at which the gambler stops and those at which he continues. (See also Ebert and Strack [7] and Henderson et al [10] for discussion of a related optimal stopping problem with probability weighting based on a diffusion process.) He et al [11] argue further that the path-dependent strategy can be replaced by a randomized strategy under which the decision about whether to stop at a node depends not on the path history but rather on the realization of an independent uniform random variable.
This preference for randomization mirrors our result, but takes a different form. In our perpetual problem the agent chooses a randomized pair of levels and then follows a threshold strategy based on these levels. In He et al [11] a zero-one decision about whether to stop at a node is replaced by a probability of continuing, and the stopping rules which arise are not randomized threshold rules.
Many optimal stopping models in the economics literature predict that the agent will stop on first exit from an interval, which necessarily involves stopping either at the current maximum or the current minimum. If instead observed behavior includes stopping at levels which are not equal to one of the running extrema of the process, then this is evidence against the model. (Strack and Viefers [19] present experimental evidence from a laboratory game that players do not follow threshold strategies; instead, players visit the same price three times on average before stopping.) But our results imply that the converse is not true. Even if agents only ever take a decision to sell at a time when the process is at a new maximum or new minimum, this does not necessarily mean that agents are following a pure threshold rule. They could have any target distribution, as for example in Proposition 1, but be realizing this target distribution via a randomized threshold rule.

A.2 The range of X is bounded
Suppose X is bounded. In this case Q_X(T) = Q_X(T_R) = P_{=x}, where P_{=x} = {ν ∈ P(Ī_X) : ∫ z ν(dz) = x}. To see this, note that X is a uniformly integrable martingale and not just a super-martingale. Therefore we must have E[X_τ] = lim_t E[X_{τ∧t}] = x, and hence Q_X(T) ⊆ P_{=x}. Conversely, by the same argument as in Lemma 1, but this time with w* = 0 and ν_1 ≡ ν, we deduce that for any ν ∈ P_{=x} there exists a randomization η such that L(X_{τ^X_η}) = ν. It follows that Q_X(T) = Q_X(T_R) = P_{=x}. The proofs of Lemma 2, Lemma 3 and Theorem 1 go through unchanged.
A.3 The range of X is R

Now suppose I_X is unbounded above and below. By the Rogozin trichotomy (Rogozin [17]), −∞ = lim inf_t X_t < x < lim sup_t X_t = ∞ and lim_{t↑∞} X_t does not exist. In this case we must restrict T to the set of stopping times with P(τ < ∞) = 1. In the main text we set T_T = T ∩ ∪_{β≤y≤γ, β,γ∈Ī_Y} {τ_{β,γ}}, but, since every threshold time with at least one finite threshold is almost surely finite, we could equivalently write T_T = ∪_{β≤y≤γ, β,γ∈Ī_Y, (β,γ)≠(−∞,∞)} {τ_{β,γ}}. In the definition of randomized threshold rules we can write T_R = {τ_ζ : ζ ∈ P(D_0)}, where D_0 is as above, and similarly T_R = {τ^X_η : η ∈ P(D^X_0)}. When I_X = R we claim that Q_X(T) = Q_X(T_R) = P(R). Since stopping times are finite almost surely we must have Q_X(T) ⊆ P(R), so it is sufficient to show that for any ν ∈ P(R) we have ν ∈ Q_X(T_R). Given ν ∈ P(R), let A_ν be an F_0-measurable random variable with law ν and set τ = inf{u : X_u = A_ν}. Then L(X_τ) = L(A_ν) = ν.
The proofs of Lemma 2, Lemma 3 and Theorem 1 go through unchanged.

A.4 Other results
Proof of Proposition 1. A proof is given in Xu and Zhou [22, Theorem 5.1], but since it is short, elegant and pertinent to our main results we include it here. From the characterization of Q(T) we have that a quantile function must satisfy ∫_0^1 G(u) du ≤ y. By construction G* has this property, and since v′ and w′ are decreasing, G* is increasing. Hence G* has the properties required of a quantile function of a distribution which can be obtained by stopping Y. On the other hand, for any non-negative function G with ∫_0^1 G(u) du ≤ y, the concavity of v gives v(G(u)) ≤ v(G*(u)) + v′(G*(u))(G(u) − G*(u)), and hence, using v′(G*(u)) w′(1 − u) = λ* and ∫_0^1 G*(u) du = y ≥ ∫_0^1 G(u) du,

∫_0^1 v(G(u)) w′(1 − u) du ≤ ∫_0^1 v(G*(u)) w′(1 − u) du + λ* ∫_0^1 (G(u) − G*(u)) du ≤ ∫_0^1 v(G*(u)) w′(1 − u) du.

Since the prospect value of a law with quantile function G is ∫_0^1 v(G(u)) w′(1 − u) du, G* is optimal.