A controller-stopper-game with hidden controller type

We consider a continuous time stochastic dynamic game between a stopper (Player $1$, the \textit{owner} of an asset yielding an income) and a controller (Player $2$, the \textit{manager} of the asset), where the manager is either effective or non-effective. An effective manager can choose to exert low or high effort, corresponding to a low or a high positive drift, respectively, for the owner's accumulated income, with random noise in terms of a Brownian motion; high effort comes at a cost for the manager. The manager earns a salary until the game is stopped by the owner, after which no further income is earned. A non-effective manager cannot act but still receives a salary. For this game we study (Nash) equilibria using stochastic filtering methods; in particular, in equilibrium the manager controls the learning rate (regarding the manager type) of the owner. First, we consider a strong formulation of the game, which requires restrictive assumptions on the admissible controls, and find an equilibrium of (double) threshold type. Second, we consider a weak formulation, in which a general set of admissible controls is considered. We show that the threshold equilibrium of the strong formulation is also an equilibrium in the weak formulation.


Introduction
We consider a continuous time two player stochastic game between a stopper (Player 1) and a controller (Player 2). The controlled process (X_t) is given by (1), where (λ_t) is a process chosen by the controller, (W_t) is a Brownian motion, θ is an independent Bernoulli random variable with P(θ = 1) = 1 − P(θ = 0) = p ∈ (0, 1) indicating whether the controller is effective (or active) or not, and c > 0 is a constant. On the other hand, based on the observations of (X_t), the stopper selects a stopping time τ at which the game ends. For a given stopping-control strategy pair (τ, (λ_t)), the reward of the stopper is given in (2) and the reward of the controller in (3), where r > 0 is a constant (discount rate).
The control process values are restricted to take one of two constant values {λ̲, λ̄} at each time t, where we assume that λ̄ > λ̲ > c > 0. The model is further specified in Sections 2 and 3, where we also define notions of (Nash) equilibria corresponding to both players wishing to maximize their respective rewards.
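To make the setup concrete, the following sketch simulates one path of the income and both discounted rewards under the assumed dynamics dX_t = (θλ_t − c) dt + dW_t and the reward integrands described verbally in the text (discounted accumulated income for the stopper; salary c minus effort cost for the controller). The displayed equations (1)-(3) are not reproduced in this excerpt, so the exact formulas, the fixed horizon tau, and all numeric values are illustrative assumptions.

```python
import numpy as np

def simulate_rewards(theta, lam, lam_lo, c=0.5, r=0.05, tau=10.0, n=2000, seed=1):
    """One-path Euler estimate of both discounted rewards up to a fixed
    horizon tau (a deterministic stopping time, for simplicity)."""
    rng = np.random.default_rng(seed)
    dt = tau / n
    t, j1, j2 = 0.0, 0.0, 0.0
    for _ in range(n):
        # assumed income dynamics: dX = (theta * lam - c) dt + dW
        dx = (theta * lam - c) * dt + rng.normal(0.0, np.sqrt(dt))
        j1 += np.exp(-r * t) * dx                              # stopper: discounted income
        j2 += np.exp(-r * t) * (c - (lam - lam_lo) ** 2) * dt  # controller: salary minus effort cost
        t += dt
    return j1, j2
```

With constant low effort (lam = lam_lo) the controller's reward reduces to the discounted salary stream, which gives a simple sanity check against the closed form c(1 − e^{−r·tau})/r.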
The interpretation is that (X_t) is the accumulated income of Player 1, who wants to maximize the discounted accumulated income by selecting a time τ at which the game ends. This income has a positive drift if θ(ω) = 1 and a negative drift if θ(ω) = 0; however, the outcome of θ cannot be observed by Player 1, who must make the stopping decision based only on observations of (X_t). The stopping decision will in equilibrium, as we will show, be made based on the probability that Player 1 assigns to the event {θ = 1}, which is dynamically updated based on the observations of (X_t).
On the other hand, an active Player 2, i.e., in case θ(ω) = 1, can affect the drift of the income (X_t) by dynamically selecting the effort level, i.e., (λ_t), and thereby, as we shall see, affect the probability that Player 1 assigns to the event {θ = 1}. The accumulated income of an active Player 2 is based on the constant income rate c minus the cost rate (λ_t − λ̲)², which is zero when effort is low and positive otherwise; cf. (3). Moreover, an active Player 2 wants to dynamically select the effort level (λ_t) in order to maximize the discounted accumulated income of Player 2 until Player 1 ends the game. Player 2 must therefore consider the trade-off between exerting a large effort, which is costly, and a small effort, which implies no cost but decreases the probability that Player 1 assigns to {θ = 1} compared to the large effort. An inactive Player 2 cannot act at all.
In line with the interpretation above, our ansatz for this problem is to use stochastic filtering methods to search for an equilibrium which depends on the conditional probability of the event {θ = 1} given the observations of (X_t), which corresponds to Player 1's continuously updated belief about Player 2 being active. Indeed, we find such an equilibrium of (double) threshold type, meaning that we find two thresholds 0 < b_1^* < b_2^* < 1 such that, in equilibrium, Player 2 exerts the smaller effort λ̲ when the conditional probability of {θ = 1} is above b_2^* and the larger effort λ̄ when the conditional probability is below b_2^*, and Player 1 stops the game whenever the conditional probability of {θ = 1} falls below b_1^*; see Remark 6 for details. We study this game in an increasing order of generality regarding the set of admissible control strategies: first using a strong formulation and second using a weak formulation of the game. The same threshold equilibrium is obtained in both formulations.
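For concreteness, the double threshold equilibrium just described can be summarized as two small policy functions. The numeric thresholds and effort levels below are placeholders for illustration only; the actual thresholds b_1^* < b_2^* come from the existence result, not from these numbers.

```python
# Placeholder values for illustration; not values derived in the paper.
B1, B2 = 0.2, 0.6          # stopping threshold b1*, switching threshold b2*
LAM_LO, LAM_HI = 1.0, 2.0  # small and large effort rates

def controller_effort(p):
    """Threshold control: low effort when the belief is at least b2*,
    high effort when it is below b2*."""
    return LAM_LO if p >= B2 else LAM_HI

def stopper_stops(p):
    """The stopper ends the game once the belief falls below b1*."""
    return p < B1
```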
• Strong formulation: In the strong formulation we admit control strategies only of Markovian type in the sense that λ_t = λ(P_t), where λ : [0, 1] → {λ̲, λ̄} is a deterministic function and (P_t) is defined as a process which in equilibrium coincides with the conditional probability that the stopper assigns to {θ = 1}. The process (P_t) is here the strong solution to a particular stochastic differential equation; see the beginning of Section 2 for details. In this formulation the main results are: (i) we provide a verification theorem for a double threshold equilibrium, and (ii) we prove that a double threshold equilibrium exists under certain parameter restrictions.
• Weak formulation: In the weak formulation, admissible control strategies correspond to a general set of stochastic processes taking values in {λ̲, λ̄} and adapted to a filtration generated by (X_t). Here, however, we start by defining (X_t) as a Brownian motion, and we obtain a controlled process analogous to the one in (1) by means of a measure change, with which we define the reward functions and a corresponding equilibrium; see Section 3 for details. The main result is that the double threshold equilibrium found in the strong formulation is also an equilibrium in the weak formulation, i.e., when allowing a larger set of admissible control strategies.
In Section 1.1 we survey related previous literature and clarify the contribution of the present paper. In Section 1.2 we present stochastic filtering arguments which are relevant to the subsequent sections. The strong formulation of our game is studied in Section 2. In particular, the beginning of Section 2 specifies the strong formulation further, Section 2.1 contains a heuristic derivation of an equilibrium candidate, Section 2.2 reports the verification result, and Section 2.3 reports the equilibrium existence result. The weak formulation is studied in Section 3.

Previous literature and contribution
The problem studied in the present paper belongs to a new class of dynamic stochastic control and stopping games whose key feature is that the players may be ghosts (cf. [7]) in the sense that a player does not necessarily exist, or equivalently is not active, or not effective. This ghost feature was first studied in [7], where a two-player stopping game is considered and the term ghost was introduced. In [10], a controller-stopper-game where the stopper faces unknown competition in the form of a ghost controller is studied, in the context of a fraud detection application. In [11], a de Finetti controller-stopper-game (of resource extraction) is studied, in which the controller faces unknown competition in the form of a stopper ghost with the option to extract all the remaining resources instantaneously.
From a game theoretic perspective, our main contribution is that our game is a non-zero-sum game where the player objectives agree in the sense that both players would benefit if the hidden controller were revealed (in the case of an active Player 2). In this sense, the players are not exactly competing against each other, but rather aiming at finding an agreement that would benefit both. Typically, such situations are delicate, since the existence of (non-trivial) Nash equilibria is sensitive to the specific player payoffs. This stands in contrast to previously studied games of this type; see e.g. [10], where the profit of one player is an immediate loss for the other, which results in opposite player objectives in the sense that the controller aims at staying hidden, which is the opposite of our situation.
From a technical viewpoint, our main contribution is twofold. First, we constrain the control process to take values in a finite set, i.e., {λ̲, λ̄}. This means that we can interpret the problem of the controller as an optimal switching problem without a cost for switching, implying that switching (between the two control values λ̲ and λ̄) may occur infinitely often, which stands in contrast to the usual formulation of optimal switching problems; see e.g. [25] and the references therein. Second, we consider a weak formulation for these types of games, based on defining the state process (X_t) as a Brownian motion and the reward functions in terms of measure changes. We then show that the Nash equilibrium in Markovian strategies (i.e., in the strong formulation) is also a Nash equilibrium in the weak formulation.
This weak approach is inspired by [9], which formulates a weak approach in the study of a sequential estimation problem, where the optimizer can choose a bounded control representing the rate at which information is received and a stopping time at which the experiment ends; note, in particular, that [9] considers an optimization problem and not a game. Weak solution approaches to dynamic stochastic games have previously been considered in a variety of recent papers; see e.g. [26] and [27], which contain surveys of the related literature.
In a broader context, the problem studied in the present paper can be regarded as a controller-stopper-game under incomplete information. Controller-stopper-games were first studied in the zero-sum setting. In [19] a zero-sum game between a controller and a stopper is studied for a one-dimensional diffusion, whereas [1] considers the game in a multidimensional setting. In [24] a zero-sum game between a stopper and a controller choosing a probability measure is studied. Singular controls for zero-sum controller-stopper-games were studied in [16] for a one-dimensional diffusion and in [3, 4] for the multidimensional setting. In [17] zero-sum controller-stopper-games with singular control are studied for a spectrally one-sided Lévy process. A zero-sum game between a stopper and a player controlling the jumps of the state process is studied in [2].
Stochastic games under asymmetric information were first considered in [5], which considers a zero-sum stochastic differential game between two controllers. In [6] path-wise non-adaptive controls are studied for a zero-sum game between two controllers. An asymmetric information Dynkin game with a random expiry time observed by one of the players is studied in [20]. In [15] a two-player zero-sum game under asymmetric information is considered, where only one player can observe the underlying Brownian motion, while the second player only observes the strategy chosen by the first player. A zero-sum game where the players observe different processes is studied in [14]. Non-Markovian zero-sum games under partial information have also been considered; see e.g. [8].
For a background regarding the interpretation of our game as a dynamic signaling game between an owner (Player 1, the stopper) and a manager (Player 2, the controller) see [12] and the references therein.

The underlying stochastic filtering theory arguments
The present section contains a brief account of the stochastic filtering arguments that underlie the analysis of the present paper. The section is included as an informal and heuristic precursor to the content of the subsequent sections. A formal result in the direction of this section is Proposition 5.
Let us first consider the perspective of the stopper. Assuming that the controller uses a control strategy (λ*_t), we obtain, using standard filtering theory (see e.g. [21, Chapter 8.1]), that the corresponding innovations process is a Brownian motion with respect to ((F^X_t), P), where (F^X_t) is defined as the smallest right-continuous filtration to which (X_t) is adapted.
Relying again on basic filtering theory, and on arguments similar to those in [10, Section 2.1], we find that if the strategy (λ*_t) is (F^X_t)-adapted (we shall later see that an equilibrium with this property can indeed be found), then the conditional probability (process) that the stopper assigns to the controller being active, i.e., P(θ = 1 | F^X_t) = E[θ | F^X_t], t ≥ 0, solves a stochastic differential equation (SDE). Note that the observations above rely implicitly on the assumption that the control strategy (λ*_t) is fixed, in the sense that the stopper knows which process (λ*_t) the controller uses. However, in order to verify that a candidate equilibrium strategy (λ*_t) is indeed an equilibrium strategy (cf. Definition 2 below), we must be able to analyze what happens to an equilibrium stopping strategy, which as we shall see will be determined as a threshold time in terms of the conditional probability process, when the controller deviates from the candidate equilibrium strategy.
To this end, observe that if we consider an (F^X_t)-adapted candidate equilibrium strategy (λ*_t) and an arbitrary admissible deviation (control) strategy (λ_t), and define a process (P_t) by (5), then (P_t) depends, of course, on the equilibrium candidate (λ*_t) as well as on the deviation strategy (λ_t). However, using the observations above it is also directly verified that P_t = P(θ = 1 | F^X_t), t ≥ 0, in the special case of no deviation (i.e., with (λ*_t) = (λ_t)). In other words, (P_t) defined as in (5) coincides with the conditional probability process in the case of no deviation, but it also tells us how the controller affects (P_t) in the case of deviation, and we may therefore, as we will see, use this definition of (P_t) to find an equilibrium.
(Depending on the context we will, to ease notation, sometimes write P_t and sometimes write P^{λ,λ*}_t.) In the present section we restrict the set of admissible control strategies to be of Markov control type. Recall that Section 3 contains a weak formulation of our game, where we relax the notion of admissible strategies to a set of general stochastic processes (taking values in {λ̲, λ̄}). By restricting to Markov controls we ensure that (P_t) is obtained as the strong solution to (6); see Proposition 20 in Appendix A. Furthermore, using the definition of (X_t) in (1) as well as (6), we note that the dynamics of (P_t) can be written as in (7), and that (P_t) is (F^X_t)-adapted. Formally, we restrict the set of admissible control strategies to be of Markov control type by identifying an admissible control strategy (λ_t) with a deterministic function λ : [0, 1] → {λ̲, λ̄} according to λ_t = λ(P_t), where (P_t) is given by (6), and where λ satisfies the conditions of Definition 1 (which also defines the set of admissible stopping strategies).
Definition 1 (Admissibility in the strong formulation).
• A Markov control (deterministic function) λ : [0, 1] → {λ̲, λ̄} is said to be an admissible control strategy if it is RCLL (right-continuous with left-hand limits). The set of admissible control strategies is denoted by L.
• A stopping time τ is said to be an admissible stopping strategy if it is adapted to (F X t ).The set of admissible stopping strategies is denoted by T.
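The deviation mechanism described above can be sketched numerically. The innovations-form filter used below is the standard Shiryaev-type dynamics and is an assumption here, since the paper's equations (6)-(7) are not reproduced in this excerpt; all parameter values are illustrative.

```python
import numpy as np

def simulate_belief(p0, lam_star, lam_dev, T=5.0, n=5000, seed=0):
    """Euler scheme for the belief process when the control actually used
    may deviate from the control the stopper assumes (theta = 1 throughout)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    p = p0
    for _ in range(n):
        ls = lam_star(p)   # rate the stopper believes is being used
        ld = lam_dev(p)    # rate actually used by the (active) controller
        dw = rng.normal(0.0, np.sqrt(dt))
        # assumed innovations form: dP = P(1-P) ls [ (ld - P ls) dt + dW ]
        p += p * (1.0 - p) * ls * ((ld - p * ls) * dt + dw)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the belief inside (0, 1)
    return p
```

The sketch makes the key point of this section visible: the belief increment is scaled by the rate the stopper assumes (ls), while the actual rate (ld) enters only through the drift, so the controller steers the stopper's learning by choosing ld.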
In line with Section 1, both players want to maximize their respective rewards and we define our Nash equilibrium accordingly.
Remark 3. In line with the usual interpretation of a Nash equilibrium, we note that the first condition in (8) implies that deviating from the equilibrium is sub-optimal for the stopper, and that the second condition implies the same for the controller. Note also that the appearance of the equilibrium control λ* in the right hand side of the second condition in (8) is due to the role that it plays in the determination of P_t = P^{λ,λ*}_t also when the controller deviates from the equilibrium; cf. (6).
Remark 4. A connection between our equilibrium definition and a fixed-point of a suitable best response mapping can be established; in fact, we will use this connection when proving the equilibrium existence result, Theorem 10. Let (τ*, λ*) be any given admissible strategy pair. Then we may, in line with our equilibrium definition, define the (point-to-set) best response mapping of the stopper as well as the (point-to-set) best response mapping of the controller. It is then immediately clear that our equilibrium definition corresponds to a fixed-point of the composed best response mapping. In the following result we conclude this section by establishing that (P_t) does indeed correspond to the conditional probability of an active controller, i.e., of {θ = 1}, in case the controller does not deviate from an equilibrium (candidate).
Relying on standard filtering theory (see e.g. [21, Chapter 8.1] and arguments similar to those in the proof of [10, Proposition 11]), it can now be seen, for 0 ≤ t ≤ T, that the associated innovations process is a Brownian motion with respect to ((F^X_t), P). Hence, by the definition of (X_t) in (1), it is directly seen that (Π_t) satisfies the corresponding SDE. Recalling the definition of (P_t) in (6), we observe that (P_t) and (Π_t) are both strong solutions to the same SDE in case λ* = λ. The results follow.

Searching for a threshold equilibrium
The aim of the present section is to search for an equilibrium of threshold type in the sense that the equilibrium strategy pair satisfies (10)-(11). Remark 6. The (double) threshold strategy pair defined by (10)-(11) corresponds to (i) stopping the first time that (P_t), whose dynamics is in this case given by (6) with λ(P_t) = λ*(P_t) = λ_{b_2^*}(P_t), falls below b_1^*, and (ii) the controller using the control process (λ_{b_2^*}(P_t)), which is equal to the small controller rate λ̲ when P_t ≥ b_2^* and the large controller rate λ̄ when P_t < b_2^*.
We remark that the content of this section is mainly of motivational value and that a corresponding formal result is the verification theorem reported in Section 2.2, below.

The perspective of the controller
Given a candidate equilibrium strategy λ* ∈ L, and supposing that the stopper uses a candidate equilibrium threshold strategy of the kind (10) with b_1^* ∈ (0, 1), the controller faces an optimal control problem in which we recall that (P_t) = (P^{λ,λ*}_t) is given by (6); however, due to the conditioning on θ = 1 in the controller reward J_2 (see (3)), we may here set θ = 1 in (6).
Indeed, writing v(p) = v(p, b_1^*) and relying on (3), with the underlying process (P_t) in the representation (6) with θ = 1, we expect, using the usual dynamic programming arguments, that the optimal value v(p) satisfies the corresponding variational inequality for all λ ∈ {λ̲, λ̄} and p ∈ (b_1^*, 1), while equality should hold in case λ*(p) = λ. We will from now on ease the presentation by sometimes writing, e.g., λ* instead of λ*(p). By subtracting one of the two equations above from the other we obtain (13). We conclude that if (τ_{b_1^*}, λ*) is an equilibrium, then λ* = λ*(p) must satisfy (13) for λ ∈ {λ̲, λ̄} and all p ∈ (b_1^*, 1). We first consider the case λ*(p) = λ̲ with the deviation λ = λ̄ (if λ = λ̲, then (13) trivially holds). In this case (13) becomes (14). Supposing that p(p − 1)v_p is decreasing (this is verified, under additional assumptions on the model parameters, in Proposition 27 below), we see, for any given equilibrium strategy λ*, that if we can find a value of p that gives equality in (14), then it is a lower threshold for the set of points p where λ*(p) = λ̲ is possible; i.e., for any p smaller than this threshold we must have λ*(p) = λ̄. The interpretation is that if the stopper assigns a small probability to an active controller, then the controller will control with the large rate λ̄. We now consider the case λ*(p) = λ̄ and obtain, similarly to the above, a condition which in turn gives (15). Similarly to the analysis of (A) above, this gives us an upper threshold for p where λ*(p) = λ̄ is possible; i.e., for any p exceeding this threshold we need λ*(p) = λ̲.
In order for (14) and (15) to be feasible conditions we need that (A) minus (B) is non-negative, which is directly verified. Hence, with the observations above as motivation, we will search for an equilibrium strategy λ* of the threshold type (11), where the threshold satisfies (16). Note that (16) indicates that there may be multiple Nash equilibria, since every b_2^* satisfying (16) results in an equilibrium candidate strategy for the controller. As our equilibrium controller candidate we will, however, consider a switching point b_2^* that corresponds to equality in the right hand side inequality in (16). More precisely, we will search for an equilibrium controller strategy given by (11) with this choice of b_2^*. Let us lastly note that if the players use a threshold strategy pair (b_1^*, b_2^*), defined as in (10)-(11), with 0 < b_1^* < b_2^* < 1, then it can be shown that the corresponding value for the controller satisfies the boundary value problem (18), where (P_t) is given by (6) with λ = λ* = λ_{b_2^*}. Indeed, we will in the subsequent analysis show that we may choose a stopper-controller threshold pair (b_1^*, b_2^*) which is an equilibrium with a controller value given by (18), under certain parameter assumptions; see Theorems 8 and 10.
Remark 7. Note that (18) is a boundary value problem on (b_1^*, 1), whose solution v has been extended to be equal to zero on [0, b_1^*). The boundary conditions of (18) follow immediately from the boundary cases p ≤ b_1^* and p = 1, which result in immediate stopping (corresponding to no income for the controller) and never stopping (corresponding to the income rate c earned forever), respectively.
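A boundary value problem of this type can be sketched numerically with finite differences. Only the boundary values v(b_1^*) = 0 and v(1) = c/r follow from the text; the operator r v = c + (1/2)σ(p)² v'' with σ(p) = λ p(1 − p) is an assumed stand-in for the paper's equation (18), and all parameter values are illustrative.

```python
import numpy as np

def solve_controller_bvp(b1=0.2, r=0.1, c=0.5, lam=1.0, n=400):
    """Finite-difference solve of r v = c + (1/2) sigma(p)^2 v'' on (b1, 1)
    with v(b1) = 0 and v(1) = c / r (assumed operator, boundary values
    as interpreted in Remark 7)."""
    p = np.linspace(b1, 1.0, n)
    h = p[1] - p[0]
    sig2 = (lam * p * (1.0 - p)) ** 2   # assumed squared diffusion coefficient
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = 1.0                       # v(b1) = 0: stopping means no income
    A[-1, -1] = 1.0
    rhs[-1] = c / r                     # v(1) = c/r: salary c earned forever
    for i in range(1, n - 1):
        A[i, i - 1] = A[i, i + 1] = 0.5 * sig2[i] / h ** 2
        A[i, i] = -sig2[i] / h ** 2 - r
        rhs[i] = -c                     # interior equation: (1/2)s2 v'' - r v = -c
    return p, np.linalg.solve(A, rhs)
```

By the (discrete) maximum principle the computed value stays between the two boundary values 0 and c/r, which is a convenient sanity check.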
Relying again on Proposition 5, we may moreover use that P_t = E[θ | F^X_t] and iterated expectation to replace θ in the stopper reward with P_t; in other words, we have a representation of the stopper reward in which (P_t) is given by (9). Based on this it can be shown that the stopper reward coincides with the solution u of the boundary value problem (19). Note that (19) is also a boundary value problem on (b_1^*, 1), whose solution u has been extended to be equal to zero on [0, b_1^*). The boundary conditions of (19) can be interpreted using arguments similar to those in Remark 7.
Lastly, note that if (τ_{b_1^*}, λ_{b_2^*}) is an equilibrium, then we expect smooth fit to hold at the stopping boundary b_1^*, by the smooth fit principle of optimal stopping theory, which motivates condition (II) in Theorem 8 below.

A threshold equilibrium verification theorem
Here we present our first main result, which is a verification theorem based on the equilibrium conditions that were informally derived in Section 2.1.
Theorem 8. Let v and u be solutions to the boundary value problems (18) and (19), respectively, corresponding to thresholds 0 < b_1^* < b_2^* < 1, and suppose that conditions (I)-(IV) hold. Then the stopper-controller strategy pair (τ_{b_1^*}, λ_{b_2^*}) is a Nash equilibrium (Definition 2). Moreover, u and v correspond to the equilibrium values for the stopper and the controller, respectively.

Proof. Optimality of τ_{b_1^*}. Let n be a fixed number. Relying on Proposition 5 (which implies that (P_t) solves (9)), on (II), and on Itô's formula, we obtain for an arbitrary stopping time τ a representation of e^{−r(τ∧n)} u(P_{τ∧n}) in terms of u(p) and an Itô integral, where the Itô integral is a martingale since the integrand is bounded. Now use (19) and (21) to control the drift term. Using the above together with Proposition 5, iterated expectation, and (I), we obtain the desired inequality. (Note that the inequality is trivial when p ≤ b_1^*, since u(p) = 0 for p ≤ b_1^*.) Using that u is bounded together with u(b_1^*) = 0, we find using dominated convergence that the first expectation above converges to zero as n → ∞. Hence, using dominated convergence again, we conclude that τ_{b_1^*} is optimal for the stopper.

Optimality of λ*. The controller reward (3) is conditioned on θ = 1. Hence, in order to find the optimal strategy for the controller, we consider the process (P_t) defined by (6) with θ = 1; in particular, if the controller selects an admissible control λ, then (P_t) is given by the corresponding SDE. We now define the process (N_t). Note that an additional identity holds for p ∈ (b_1^*, 1); to see this, use (18) and that λ* = λ_{b_2^*} is given in (20). Multiplying this identity by e^{−rt} and subtracting the resulting left hand side (which is zero) from the drift coefficient of (N_t) yields an explicit expression for the drift coefficient of (N_t) on (b_1^*, 1). With arguments similar to those in Section 2.1.1, we find that conditions (III) and (IV) imply that this expression is non-positive (compare with (13)), i.e., the drift of (N_t) is non-positive regardless of the choice of λ ∈ L.
We conclude that (N_t) is a bounded process with non-positive drift. Using optional sampling we obtain the corresponding inequality for any n ∈ N. Using dominated convergence and lim_{n→∞} e^{−r(τ_{b_1^*}∧n)} v(P_{τ_{b_1^*}∧n}) = 0 a.s., we obtain the inequality in the limit. Repeating the same arguments with λ = λ*, we find that the drift of (N_t) vanishes and that the inequality holds with equality. We conclude that λ* is optimal for the controller, which completes the proof.

Equilibrium existence
The main result of this section is Theorem 10, which reports conditions on the primitives of the model that guarantee the existence of a threshold equilibrium. The proof of this result, which is reported in Section 2.4, relies on the Poincaré-Miranda theorem and is in this sense a fixed-point type proof. In particular, the Poincaré-Miranda theorem follows from the Brouwer fixed-point theorem; cf. e.g. [22].
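The Poincaré-Miranda step can be illustrated numerically: on a rectangle where f changes sign across one pair of opposite edges and g across the other, a common zero exists, and a brute-force grid search locates an approximate one. The functions f and g below are toy stand-ins, not the paper's equilibrium conditions.

```python
import numpy as np

def miranda_box_zero(f, g, rect, n=200):
    """Brute-force search for a joint near-zero of (f, g) on a rectangle.
    The Poincare-Miranda theorem guarantees an exact common zero when f
    changes sign across one pair of opposite edges and g across the other."""
    (x0, x1), (y0, y1) = rect
    best, best_val = None, np.inf
    for x in np.linspace(x0, x1, n):
        for y in np.linspace(y0, y1, n):
            val = abs(f(x, y)) + abs(g(x, y))
            if val < best_val:
                best, best_val = (x, y), val
    return best

# Toy stand-ins for the two threshold conditions; common zero at (0.3, 0.6).
root = miranda_box_zero(lambda x, y: x - 0.5 * y,
                        lambda x, y: y - 0.6,
                        ((0.0, 1.0), (0.0, 1.0)))
```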
The following notation will be used throughout this section; in particular, we will use it to express solutions to the ODEs in (18) and (19).
Theorem 10 (Equilibrium existence). Suppose the model parameters λ̲, λ̄, c and r are such that conditions (24) and (25) hold. Then there exist constants 0 < b_1^* < b_2^* < 1 such that the strategy pair given by (20) is a Nash equilibrium.
Using this observation it is easily verified that there exists, for fixed c and r, a constant h̄ ∈ (0, ∞) such that these conditions are satisfied for each h ≤ h̄. In other words, the conditions of Theorem 10 hold, i.e., an equilibrium exists, whenever λ̄ and λ̲ are sufficiently close to each other.

The proof of Theorem 10
The proof of Theorem 10 is found in Section 2.4.3. It relies on the content of Sections 2.4.1-2.4.2.

Observations regarding Equation (18)
where the constants k_i, i = 1, . . ., 4 can be determined by the boundary and smoothness conditions in (18). (Recall that α_i(λ), i = 1, 2 are defined in (23).) However, instead of directly determining k_i, i = 1, . . ., 4 to attain all of these conditions, we will determine these constants so as to attain only the boundary conditions and the continuity in (18), as well as the condition (27). (The interpretation of (27) is that condition (IV) in Theorem 8 holds from the right.) After this we will show that b_2^* can be chosen so that (27) also holds from the left (i.e., so that v satisfies all conditions of (18) as well as (IV)); see Lemma 12.
Using these constants we obtain an explicit expression for v from (26), and hence, using also continuity, a corresponding expression at the switching point.
We need the following technical result in the proof of Theorem 10 (in Section 2.4.3). The proof can be found in Appendix B. Lemma 12. Let v = v(·; b_1^*, b_2^*) be given by (26) with the constants k_i, i = 1, . . ., 4 determined above. Then the stated limit relations hold. In particular: (ii) for any fixed b_1^* ∈ (0, 1) there exists a b_2^* ∈ (b_1^*, 1) such that the solution v(p) to (18) satisfies (IV).

Observations regarding Equation (19)
where the constants c_i are determined by the boundary and smoothness conditions in (19). The boundary condition u(b_1^*) = 0 determines one of the constants, and the remaining two conditions give us the others. We will make use of the following technical result in the proof of Theorem 10. The proof can be found in Appendix B.
Lemma 13. Suppose (25) holds. Then the solution to (19) satisfies the stated limit properties. Let v and u be given by (26) and (32) with the constants c_i, k_i, i = 1, . . ., 4 determined as in Sections 2.4.1 and 2.4.2. Then all we have left to do is to show that the pair (b_1^*, b_2^*) can be chosen so that the remaining equilibrium conditions hold. To this end we introduce the functions f and g encoding these conditions. Using Lemma 13, it is easy to see that there exist constants 0 < b̲_1 < b̄_1 < 1 such that: (i) f(b̲_1, b_2) < 0 for all b_2 > b̲_1, and f(b̄_1, b_2) > 0 for all b_2 > b̄_1, and (ii) f is continuous on the corresponding set. Fix two such values b̲_1 and b̄_1 (arbitrarily). We can now find a continuous extension of f to the whole rectangle [b̲_1, b̄_1] × [0, 1]. Using Equations (29) and (30), we similarly find a continuous extension of g to the whole rectangle. Based on Lemma 12 (in particular the left hand side inequality of (31)), we may now conclude that g(b_1, 1) < 0 for any b_1 ∈ [b̲_1, b̄_1], and that g has the opposite sign on the opposite edge of the rectangle. The conclusions noted for f and g imply that we may use the Poincaré-Miranda theorem (cf. [22]). In particular, it implies that there exists a pair (b_1^*, b_2^*) at which both functions vanish, which completes the proof.

Weak formulation
The purpose of this section is to consider a more general class of admissible control strategies compared to that of the strong formulation in Section 2. To this end we consider here a weak formulation of our game based on measure changes and Girsanov's theorem. We remark that this formulation is closely related to [9], where a similar weak solution approach is used for an optimal control problem with discretionary stopping. The main finding of the present section is that the double threshold equilibrium of Theorem 8 is a Nash equilibrium also in the weak formulation.
Let (Ω, A, P) be a probability space supporting a one-dimensional Brownian motion (X_t) and a Bernoulli random variable θ with P(θ = 1) = p ∈ (0, 1). Denote by (F^X_t) the smallest right-continuous filtration to which (X_t) is adapted. Define the terminal σ-algebra F^X_∞ accordingly, and define (F^{X,θ}_t) and F^{X,θ}_∞ analogously.
Definition 14 (Admissibility in the weak formulation).
• A process (λ_t) is said to be an admissible control process if it has RCLL paths, is adapted to (F^{X,θ}_t), and takes values in {λ̲, λ̄}. The set of admissible control processes is denoted by L.
• A stopping time τ is said to be an admissible stopping strategy if it is adapted to (F^X_t). The set of admissible stopping strategies is denoted by T.

Remark 15. The set of admissible stopping strategies in the weak formulation is analogous to the set of admissible stopping strategies in the strong formulation. The main difference is instead that we define (X_t) as a Brownian motion in the weak formulation, whereas (X_t) is given by (1) in the strong formulation. Now, for any given control process (λ_t) ∈ L, we define the process (W^λ_t) according to (39). By Girsanov's theorem ([18, Chapter 3.5]) there exists a measure P^λ_t ∼ P on (Ω, F^{X,θ}_t), with density process (Λ^λ_t), such that {W^λ_t, F^{X,θ}_t; 0 ≤ t ≤ T} is a Brownian motion on (Ω, F^{X,θ}_T, P^λ_T) for each fixed T ∈ [0, ∞). Moreover, we note that (Λ^λ_t) is a martingale by Novikov's condition. Thus, the theory of the Föllmer measure gives us the existence of a measure P^λ on F^{X,θ}_∞ which satisfies P^λ(A) = P^λ_t(A) for every t ∈ [0, ∞) and A ∈ F^{X,θ}_t; see [9, Section 2], and also [13] and [18, p. 192]. This allows us to define the reward functions based on measure changes.

Definition 16. Given a strategy pair (τ, (λ_t)) ∈ T × L, we define the payoffs of the stopper and of the controller by means of expectations under the measure P^λ.

Remark 17. Let us motivate Definition 16 further. In the strong formulation (Section 2) we consider a fixed probability measure and define the controlled process (X_t) in terms of a control process (λ_t) and a given Brownian motion (W_t); cf. (1). In the present weak formulation we instead define (X_t) as a Brownian motion, and let the control process (λ_t) induce a measure change P^λ, such that W^λ defined by (39) is a Brownian motion under this measure. By comparing the resulting weak formulation equation for (X_t) (i.e., (39)) and the equation for (X_t) in the strong formulation (i.e., (1)), the connection between the formulations becomes clear.
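A discrete sketch of the Girsanov density along one simulated path, assuming that the drift removed by the change of measure is b_t = θλ_t − c (matching the verbal description of the strong-formulation dynamics; the exact displays defining W^λ and the density are not reproduced in this excerpt):

```python
import numpy as np

def girsanov_weight(theta, lam_path, c, dX, dt):
    """Discrete Girsanov density exp(sum b dX - 0.5 sum b^2 dt) along one
    path, with assumed drift b_t = theta * lam_t - c removed by the
    change of measure."""
    b = theta * np.asarray(lam_path) - c
    return float(np.exp(np.sum(b * np.asarray(dX)) - 0.5 * np.sum(b ** 2) * dt))
```

When the drift vanishes (theta = 1 and lam equal to c) the weight is identically one, i.e., the measure change is trivial, which gives a quick sanity check.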
The Nash equilibrium is now defined in the usual way. Definition 18. A pair of admissible strategies (τ*, (λ*_t)) ∈ T × L is said to be a Nash equilibrium if neither player can improve by a unilateral deviation, for any pair of deviation strategies (τ, (λ_t)) ∈ T × L.
In line with the strong formulation solution approach (cf. (10) and (11)), we define a double threshold strategy pair by (44)-(45), where (P_t) is (in analogy with (7)) given by the SDE (46). (Recalling that (X_t) is a Brownian motion, we find that (46) has a strong solution using analogous arguments as in the strong formulation; cf. Proposition 20.)
The main result of the present section is that the double threshold equilibrium investigated in the strong formulation is also an equilibrium in the weak formulation. Note that this implies that equilibrium existence in the weak formulation is guaranteed by the same parameter conditions as in Theorem 10.
Theorem 19. Suppose that $b^*_1$ and $b^*_2$ are such that the conditions of Theorem 8 hold (implying that they correspond to a double threshold equilibrium (10)-(11) in the strong formulation). Then the strategy pair given by (44), (45) and (46) is an equilibrium in the weak formulation (Definition 18).
Proof. In this proof we write $\lambda^* = \lambda^{b^*_2}$. For any admissible deviation strategy $(\lambda_t)$ it follows from (39) and (46) that $(P_t)$ has the representation

where we recall that $(W^\lambda_t)$ is a Brownian motion under the measure $P^\lambda$. We remark that $(P_t)$ depends in this sense on both $\lambda^*$ and $(\lambda_t)$ when the controller deviates from the equilibrium.
Note that the representation of $(P_t) = (P^{\lambda^*,\lambda^*}_t)$ in (47) is analogous to (6) in the strong formulation. Moreover, it is directly seen that the value functions are the same for both formulations in the case of no deviation, i.e.,

Hence, $u(p) = J_1(\tau^*, \lambda^*, p)$ and $v(p) = J_2(\tau^*, \lambda^*, p)$, with $u(p)$ and $v(p)$ as in Theorem 8. Using this, it is directly checked that the proof of Theorem 8 can be adjusted to show that $(b^*_1, b^*_2)$ corresponds to an equilibrium also in the present weak formulation. Indeed, this requires only minor adjustments, including that $(P_t)$ is here given by (47) and that the deviation strategies are allowed to be processes in $\mathcal{L}$. In particular, note that Proposition 5 holds also in this case.
Acknowledgment

The authors are grateful to Erik Ekström at Uppsala University for discussions regarding games of the kind studied in the present paper, and for suggestions that led to improvements of this manuscript.
A Properties of $(P_t)$ in the strong formulation

Proof. Consider the interval $I = [\epsilon, 1-\epsilon]$ for an arbitrary small constant $\epsilon > 0$. Then the diffusion coefficient is uniformly bounded away from zero in $I$. Thus we obtain, for both cases $\{\theta = 0\}$ and $\{\theta = 1\}$, that: (i) a weak solution $(P_t)$ to (6) exists (cf. e.g. [18, Ch. 5]), and (ii) a solution to (6) is pathwise unique in $I$ (see [23]). By Lemma 21, we obtain that $(P_t)$ cannot reach 0 or 1 in finite time. Hence, (6) admits a strong solution $(P_t)$ by [18, Corollary 3.23].
(The existence of a strong solution to (50) is given by arguments similar to those in the proof of Proposition 20.) Since $\lambda^*$ is RCLL on $[0,1]$ (and piecewise constant), there exists a $z \in (0,1)$ such that $\lambda^*(p) = \lambda^*(z)$ for $p \in (0, z]$. With some calculations we now obtain that the scale function of (50) is, for $a \le z$, given by

where $C(z) > 0$, and the density of the speed measure for $p \le z$ is given by

Using that $s'(x)$ is increasing for $x \in (0,1)$ we have

Hence, $\tau_0 = \infty$ follows from Feller's test for explosions, and (48) follows.
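The scale-function and speed-measure computation above follows the standard recipe for a one-dimensional diffusion; as a hedged sketch for a generic diffusion $dY_t = \mu(Y_t)\,dt + \sigma(Y_t)\,dW_t$ on $(0,1)$ (the concrete coefficients of (50) are not reproduced here):

```latex
s'(x) = \exp\Big(-\int_{x_0}^{x}\frac{2\mu(y)}{\sigma^2(y)}\,dy\Big),
\qquad
m(dy) = \frac{dy}{s'(y)\,\sigma^2(y)} .
```

Feller's test then says that the boundary $0$ is inaccessible (so that $\tau_0 = \infty$ a.s.) if and only if $\int_{0+}^{x_0}\big(s(y)-s(0+)\big)\,m(dy) = \infty$, which in particular holds whenever $s(0+) = -\infty$.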

B Proofs of Lemmas 12 and 13
Proof (of Lemma 12). Observe that b .
First we consider the limit $b^*_2 \to 1$. For the first part we have that it tends to $1$, since $\alpha_1(\bar\lambda), -\alpha_2(\bar\lambda) > 0$. We consider the remaining term. We obtain

For (A1) we obtain, using $\alpha_1(\bar\lambda) - \alpha_2(\bar\lambda) > 0$, that

Adding the limits gives us (29), and using (24) we thus obtain the first part of (31). For the second limit we find

We note that $\alpha_1(\bar\lambda) - \alpha_2(\bar\lambda) > 0$ and further investigate the limit by considering the denominator and numerator of $k_1$ separately. For the denominator of $k_1$ we have that

to see this use e.g. that $x \mapsto \frac{1-x}{x}$ is decreasing for $x > 0$. For the numerator of $k_1$ we use (24) to find

and by (52) we obtain (30) (from which the second part of (31) follows). Hence, statement (i) has been proved. (18) as well as (IV), and hence statement (ii) holds.
We need the following technical result in the proof of Lemma 13.

Lemma 22. Let $c_1$ be given by (34); then we have (a)

$x \mapsto \frac{1-x}{x}$ is strictly decreasing for $x > 0$, which implies that

Additionally, $-\alpha_2(\bar\lambda) > 0$ and $b^*_1 \le c/\bar\lambda$ imply that the numerator is positive and decreasing in $b^*_2$. We conclude that $c_1$ is strictly decreasing in $b^*_2$. In order to prove (b) we write

where

Proof (of Lemma 13). We find with some work (use e.g. (33)) that

C Results for the proof of Theorem 10
Throughout this section we consider the setting of the proof of Theorem 10. In particular, we here consider a pair $(b^*_1, b^*_2)$ such that (II) and (IV) hold. We also rely on condition (25).

$(b^*_1, 1)$ and $v(1) = c/r$ imply that there exists $p \in (p, 1)$ with $v(p) > c/r$ such that $v$ attains a local maximum at $p$. Using the ODE in (18) we find $v_{pp}(p-) \ge v_{pp}(p+) > 0$, which is a contradiction.
We continue to prove $v(p) < c/r$ by contradiction. For this purpose, assume that $v(p) = c/r$ for some $p \in (b^*_1, 1)$. Then by $v \le c/r$ we have that $v_p(p) = 0$. We have two cases: 2) > 0.

Proof. We will only prove the first statement, since the second statement follows using analogous arguments. Suppose that $v_p(b^*_1+) \le 0$. We will show that this implies that $v$ has a local minimum below zero, i.e., there exists a point $p \in (b^*_1, 1)$ such that $v_p(p) = 0$, $v_{pp}(p+) \ge 0$ and $v(p) < 0$. This contradicts the ODE in (18), since $c - (\lambda^*(p+) - \underline\lambda)^2 \ge 0$ (by (25)). We have three cases:

• If $v_p(b^*_1+) < 0$, then $v(1) = c/r$ and continuity immediately imply that $v$ has a local minimum below zero.
• If $v_p(b^*_1+) = 0$ and $c - (\bar\lambda - \underline\lambda)^2 > 0$, then the ODE in (18) implies that $v_{pp}(b^*_1+) < 0$. Analogously to the first case, this implies that $v$ has a local minimum below zero.

Proof. We show that $v_p(p) > 0$ for $p \in (b^*_1, b^*_2)$ by contradiction. The remaining case can be proved using analogous methods. To this end, assume that $p_1 \in (b^*_1, b^*_2)$ is the smallest point such that $v_p(p_1) = 0$. We consider three cases:

• If $v_{pp}(p_1) = 0$, then the ODE in (18) implies that $v$ is constant on $(p_1, b^*_2)$, which is a contradiction to $v \in C^1(b^*_1, 1)$ and Lemma 24(b).
• If $v_{pp}(p_1) > 0$, then $p_1$ is a local minimum and Lemma 24(a) implies that $p_1$ cannot be the first point with $v_p = 0$.
• It is easily verified that $f$ is increasing and that $f(c/\bar\lambda) \ge 0$ (use e.g. (25)). This is a contradiction to $u_p(b^*_1) = 0$ and the statement follows.

Figure 1
Figure 1 contains a numerical example.

Remark 11. (i) The conditions (24)-(25) of Theorem 10 can be directly examined for any given parameter specification. (ii) If we set $\bar\lambda = \underline\lambda + h$, then we can write these conditions as
Relying on the continuity of $v_p(b^*_2-, b^*_1, b^*_2)$ for $b^*_2 \in (b^*_1, 1)$, it follows immediately from (31) and the intermediate value theorem that we can choose b
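The intermediate value theorem argument above can be mimicked numerically: given a continuous function with a sign change on an interval, bisection locates a root. The function `g` below is a hypothetical stand-in for $b^*_2 \mapsto v_p(b^*_2-, b^*_1, b^*_2)$, whose concrete form is not reproduced here.

```python
def bisect_root(g, lo, hi, tol=1e-10, max_iter=200):
    """Find b in (lo, hi) with g(b) = 0, assuming g is continuous and
    g(lo), g(hi) have opposite signs (intermediate value theorem)."""
    glo, ghi = g(lo), g(hi)
    assert glo * ghi < 0, "a sign change on [lo, hi] is required"
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if abs(gmid) < tol or hi - lo < tol:
            return mid
        # Keep the subinterval on which the sign change persists.
        if glo * gmid < 0:
            hi, ghi = mid, gmid
        else:
            lo, glo = mid, gmid
    return 0.5 * (lo + hi)
```

In practice one would evaluate the (numerically computed) derivative of the candidate value function at the upper threshold and solve for the threshold at which it vanishes.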

Proof. Let us prove (a) by showing that $c_1$ is strictly decreasing in $b^*_2$ (recall that $b^*_2 \in (b^*_1, 1)$); the result then follows by taking $b^*_2 = 1$ in $c_1$. It holds that the denominator of $c_1$ is positive and strictly increasing in $b^*_2$. To see this use e.g. that $\alpha_1(\bar\lambda) - \alpha_1(\underline\lambda) < 0$ and that $x \mapsto \frac{1-x}{x}$

where $D(b^*_1, b^*_2)$ denotes the denominator of $c_1$. Note that $b^*_1 \ge c/\bar\lambda$ implies that the second expression is non-positive. Thus, by similar arguments to (a), the result follows by taking $b^*_2 = b^*_1$ in the first and $b^*_2 = 1$ in the second expression.
and the first result follows. Now suppose $b^*_1 \ge c/\bar\lambda$. Then Lemma 22(b) implies