Optimal Stopping of BSDEs with Constrained Jumps and Related Zero-Sum Games

In this paper, we introduce a non-linear Snell envelope which, at each time, represents the maximal value that can be achieved by stopping a BSDE with constrained jumps. We establish the existence of the Snell envelope by employing a penalization technique; the primary challenge we encounter is demonstrating the regularity of the limit of the scheme. Additionally, we relate the Snell envelope to a finite-horizon, zero-sum stochastic differential game, where one player controls a path-dependent stochastic system by invoking impulses, while the opponent is given the opportunity to stop the game prematurely. Importantly, by developing new techniques within the realm of control randomization, we demonstrate that the value of the game exists and is precisely characterized by our non-linear Snell envelope.


Introduction
In recent decades, the optimal stopping problem has garnered considerable attention as one of the fundamental stochastic control problems. As a non-linear counterpart to the classical optimal stopping problem, El Karoui et al. introduced the notion of reflected backward stochastic differential equations (RBSDEs) [11]. Since their introduction, RBSDEs have found wide-ranging applications in the realm of stochastic control. These applications involve strategies of mixed type, seamlessly integrating stopping [4,10] (or, more generally, switching [19,18] and impulse control [29]) with classical control. Additionally, RBSDEs have proven invaluable in addressing related challenges such as stochastic differential games (SDGs) [15]. However, the application of BSDEs, including RBSDEs, is constrained by a notable limitation: their semi-linear nature allows us to relate the solution of a BSDE to a stochastic control problem only when the volatility is not immediately affected by the classical control.
Efforts to address this issue have led to the development of two distinct approaches. On one hand, there is the advancement of quasi-sure analysis [35] and the related concepts of second-order BSDEs (2BSDEs) (see [6,36]) and G-nonlinear expectations [27]. On the other hand, there is the consideration of BSDEs driven by both a Brownian motion and an independent Poisson random measure, where the jumps are constrained to exceed a predefined barrier [22]. The latter type of BSDE was related to fully non-linear Hamilton-Jacobi-Bellman integro-partial differential equations (HJB-IPDEs) through a Feynman-Kac representation in [23]. This innovative approach to stochastic optimal control is commonly referred to as control randomization. A significant breakthrough in this field was achieved with the seminal work of [14], which directly linked the value function of the randomized control problem to that of the original control problem. This eliminated the need for a Feynman-Kac representation, thereby expanding the theoretical framework to encompass stochastic systems with path-dependencies. Building upon this foundation, subsequent advancements extended the approach to partial information settings in [1] and to optimal switching problems in [13].
Whereas approaches based on quasi-sure analysis and related techniques (notably that of [24]) have been successfully employed to solve various types of zero-sum stochastic differential games (see e.g. [30,25,31,28]), the extension of control randomization in this context appears to be constrained. It primarily manifests through a Feynman-Kac relation, established in [7], between RBSDEs where the jumps are constrained to be non-positive and fully non-linear variational inequalities. Building upon a result presented in [3], the latter offers a probabilistic representation of the value function in Markovian controller-and-stopper games. It is worth noting that the methodology employed in [7] to prove the existence of a solution to the RBSDE relies on a double penalization scheme. Therefore, similar to previous studies on doubly reflected BSDEs (see e.g. [8]), their approach assumes strong smoothness conditions on the reflecting barrier.
In the first part of the present work we take an altogether different approach and investigate the non-linear Snell envelope defined as Y_t := ess sup_{τ∈T_t} Y^τ_t, where for each [0, T]-valued stopping time τ, the quadruple (Y^τ, Z^τ, V^τ, K^τ) is the maximal solution to the stopped BSDE with constrained jumps, the jump constraint reading

V^τ_t(e) ≥ −χ(t, Y^τ_{t−}, Z^τ_t, e), dP ⊗ dt ⊗ λ(de)-a.e., (1.1)

in which χ : [0, T] × Ω × R^{d+1} × U → [0, ∞) provides a lower barrier for V^τ. We study a general setting where the barrier, S, is only required to be càdlàg and quasi-left upper semi-continuous. Under this assumption, along with mild conditions on the data f and χ, we demonstrate the existence of a càdlàg process Y that satisfies standard integrability assumptions and fulfills the aforementioned relation.
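For orientation, the full stopped BSDE behind the constraint (1.1) can be sketched as follows. This display is our reconstruction, consistent with the standard formulation of BSDEs with constrained jumps (the precise equation, including the sign convention for the compensating process K^τ, is given in (3.1) below):

```latex
\begin{aligned}
Y^\tau_t &= S_\tau + \int_t^\tau f(s, Y^\tau_s, Z^\tau_s)\,ds
  + K^\tau_\tau - K^\tau_t
  - \int_t^\tau Z^\tau_s\,dW_s
  - \int_t^\tau\!\!\int_U V^\tau_s(e)\,\mu(ds,de),
  \qquad t \in [0,\tau],\\
V^\tau_t(e) &\ge -\chi\big(t, Y^\tau_{t-}, Z^\tau_t, e\big),
  \qquad d\mathbb{P}\otimes dt\otimes\lambda(de)\text{-a.e.}
\end{aligned}
```

Here K^τ is the non-decreasing process that compensates for the jump constraint, and "maximal solution" refers to the ordering on the Y-component made precise in Section 3.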
In the second part, we shift our focus to the scenario where the stopped BSDE takes on a linear form, with the jump constraint

V^τ_t(e) ≥ −χ(t−, X, e), dP ⊗ dt ⊗ λ(de)-a.e., (1.2)

where we have introduced a state-process, X, that solves the path-dependent SDE

X_t = x_0 + ∫_0^t a(s, X)ds + ∫_0^t σ(s, X)dW_s + ∫_0^t ∫_U γ(s−, X, e)µ(ds, de), ∀t ∈ [0, T]. (1.3)

The primary contribution in this part lies in establishing a relationship between the non-linear Snell envelope, defined over solutions to (1.2), and a path-dependent SDG of impulse control versus stopping. Specifically, for any t ∈ [0, T] and any given impulse control u := (η_j, β_j)^∞_{j=1} ∈ U_t (the set of impulse controls where the first intervention is made after t), we let X^{t,u} solve the path-dependent SDE with impulses,

X^{t,u}_s = x_0 + ∫_0^s a(r, X^{t,u})dr + ∫_0^s σ(r, X^{t,u})dW_r + ∫_0^{s∧t} ∫_U γ(r−, X^{t,u}, e)µ(dr, de) + Σ_{j=1}^N ½_{[η_j ≤ s]} γ(η_j, X^{t,[u]_{j−1}}, β_j), ∀s ∈ [0, T], (1.4)

where [u]_k := (η_j, β_j)^k_{j=1} and N := sup{j : η_j ≤ T}. We then consider the game of impulse control versus stopping with lower (resp. upper) value process Y (resp. Ȳ). Within our problem formulation, the cost/reward functional takes on the form

J_t(u; τ) := E[ Ψ(τ, X^{t,u}) + ∫_t^τ f(s, X^{t,u})ds + Σ_{j=1}^N ½_{[η_j ≤ τ]} χ(η_j, X^{t,[u]_{j−1}}, β_j) | F_t ]

and U^{S,W}_t (resp. T^{S,W}_t) is the set of non-anticipative maps from T^W_t, the set of stopping times with respect to F^{t,W} (the filtration generated by µ(· ∩ [0, t], ·) and W) valued in [t, T], to U^W_t, the set of F^{t,W}-adapted impulse controls with the first intervention after t (resp. from U^W_t to T^W_t). In particular, we demonstrate that, under fairly general assumptions on the involved coefficients, the non-linear Snell envelope Y_t := ess sup_{τ∈T_t} Y^τ_t serves as a representation of the game's value by satisfying Y_t = Y_t = Ȳ_t. This finding extends the existing results on path-dependent impulse control in [9,20], as well as the recent advancements in path-dependent SDGs involving impulse controls in [29,28]. Notably, our work expands this framework to
incorporate scenarios where the opponent employs a stopping rule, while also providing opportunities for the development of more efficient numerical solution methods. Of greater significance, however, is that our work bridges a void in the literature on control randomization by extending the applicability of this methodology to zero-sum SDGs with path-dependencies. Additionally, our assumptions are formulated in a way that allows the results to transfer to other types of SDGs. In particular, it should be fairly straightforward to adapt the developed methodology to handle controller-stopper games, thereby extending the results in [7] to the non-Markovian framework while allowing for a more general setting compared to [5,25], as these works are based on a non-degeneracy assumption on the volatility.
The remainder of the article is structured as follows. In the next section, we establish the notation that will be used throughout the first part of the paper and recall some important results on reflected BSDEs with jumps. In Section 3, we show existence of a non-linear Snell envelope Y_t = ess sup_{τ∈T_t} Y^τ_t, where Y^τ is the first component in the unique maximal solution to the general BSDE in (1.1). In Section 4, we meticulously outline the framework for our SDG. In addition, we provide preliminary estimates on the solution to the controlled SDE defined in equation (1.4) and introduce approximations of the value functions Y and Ȳ, based on truncation and discretization. Subsequently, in Section 5 we demonstrate the existence of a value for the game by showing that the upper and lower value functions both coincide with the same non-linear Snell envelope, as defined in Section 3.

Probabilistic setup
Let (Ω, F, P) be a complete probability space that supports a d-dimensional Brownian motion, denoted by W, and an independent Poisson random measure µ defined on a compact set U ⊂ R^d with a finite compensator λ. We denote the P-augmented natural filtration generated by W and µ by F := (F_t)_{t≥0}.

Notations
Throughout, we will use the following notations, where T > 0 is the fixed problem horizon:
• For a measure space (Ω̃, F̃) and a filtration F̃ on F̃ we let Prog(F̃) (resp. P(F̃)) denote the σ-algebra of F̃-progressively (resp. F̃-predictably) measurable subsets of R_+ × Ω̃.
• For p ≥ 1 and a measure space (E, E, m) we let L^p(E, E, m) denote the set of functions ξ : E → R which are E-measurable and such that |ξ|^p is integrable under m. When m = P we often use the shorthand L^p(E, E) and when (E, E, m) = (Ω, F, P) we sometimes write L^p.
• We let T be the set of all [0, T]-valued F-stopping times and, for each η ∈ T, we let T_η be the corresponding subset of stopping times τ such that τ ≥ η, P-a.s.
• For p ≥ 1 and τ ∈ T, we let S^{p,τ} be the set of all R-valued, Prog(F)-measurable càdlàg processes X such that E[sup_{s∈[0,τ]} |X_s|^p] < ∞. When τ = T, we use the shorter notation S^p.
• We let A^{p,τ} be the subset of S^{p,τ} with all P(F)-measurable processes, Z, that are non-decreasing and start at Z_0 = 0. Moreover, we let A^p := A^{p,T}.
• We let H^{p,τ}(W) denote the set of all R^d-valued, P(F)-measurable processes Z such that E[(∫_0^τ |Z_s|² ds)^{p/2}] < ∞. When τ = T, we use the notation H^p(W).
• We let H^{p,τ}(µ) denote the set of all R-valued, P(F) ⊗ B(U)-measurable processes V such that E[(∫_0^τ ∫_U |V_s(e)|² λ(de)ds)^{p/2}] < ∞. When τ = T, we use the notation H^p(µ).
Unless otherwise specified, all inequalities involving random variables are assumed to hold P-a.s.

Reflected BSDEs with Jumps
Our approach will heavily rely on the existing theory of reflected backward stochastic differential equations (RBSDEs) with jumps. Several studies have addressed the existence and uniqueness of such RBSDEs, with varying assumptions on the involved coefficients and the obstacle, as documented in works such as [16,12,17]. We recall the following important result:
Theorem 2.1. (Hamadène–Ouknine [17]) Assume that
a) ξ ∈ L²(Ω, F_T).
b) The barrier S is real-valued, Prog(F)-measurable and càdlàg with S + ∈ S 2 and S T ≤ ξ.
c) The driver f satisfies ‖f(·, 0, 0, 0)‖_{H²(W)} < ∞ and, for some k_f > 0, we have, P-a.s., for all (t, y, y′, z, z′) and v, v′ ∈ L²_λ, |f(t, y, z, v) − f(t, y′, z′, v′)| ≤ k_f(|y − y′| + |z − z′| + ‖v − v′‖_{L²_λ}).
Then, there exists a unique quadruple (Y, Z, V, K) ∈ S² × H²(W) × H²(µ) × A² solving the RBSDE (2.1), where K^c is the continuous and K^d the purely discontinuous part of K, respectively.
The comparison principle for BSDEs with jumps is not as straightforward as for BSDEs driven solely by Brownian motion. Early work on this topic was presented in [2], where a first result was obtained, and it was later expanded upon in [34] and [32]. In addition to the prerequisites for Theorem 2.1, these studies assumed an integral constraint on the driver. In a related context, [33] employed a similar comparison result to establish a connection between the solution of a reflected BSDE with jumps and a stopping problem. In light of these findings, we recall the following:
Theorem 2.2. (Quenez–Sulem [32,33]) Assume that (ξ, S, f) satisfies the assumptions in Theorem 2.1 and that, dP ⊗ dt-a.e., for all (y, z) ∈ R^{1+d} and v, v′ ∈ L²_λ, we have f(t, y, z, v) − f(t, y, z, v′) ≤ ∫_U ψ(e)(v(e) − v′(e))λ(de), with ψ ∈ L²_λ. We then have:
i) The unique solution to the RBSDE (2.1) satisfies Y_t = ess sup_{τ∈T_t} Y^τ_t, where the triple (Y^τ, Z^τ, V^τ) solves the associated BSDE stopped at τ;
ii) If S is quasi-left upper semi-continuous, then, with D_t := inf{s ≥ t : Y_s = S_s}, we get that K^+ is continuous and satisfies K^+_{D_t} − K^+_t = 0, P-a.s.
iii) Assume that the parameters of another RBSDE, (ξ̃, S̃, f̃), satisfy the requirements of Theorem 2.1 in addition to ξ̃ ≤ ξ, P-a.s., S̃_t ≤ S_t, P-a.s., for all t ∈ [0, T], and that f̃ ≤ f. Then the first component Ỹ^τ of the solution to the stopped BSDE (2.2) with parameters (ξ̃, S̃, f̃) satisfies Ỹ^τ_t ≤ Y^τ_t, P-a.s., for each t ∈ [0, T] and τ ∈ T_t. In particular, if Ỹ is the first component in the solution to (2.1) with parameters (ξ̃, S̃, f̃) we get that Ỹ_t ≤ Y_t, P-a.s.
Remark 2.3. Recall here the concept of quasi-left continuity: a càdlàg process (X_t : t ≥ 0) is quasi-left continuous if for each predictable stopping time θ and every announcing sequence of stopping times θ_k ր θ we have X_{θ−} := lim_{k→∞} X_{θ_k} = X_θ, P-a.s. Similarly, X is quasi-left upper semi-continuous if X_{θ−} ≤ X_θ, P-a.s.
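For the reader's convenience, the reflected BSDE referred to as (2.1) can be written in the standard Hamadène–Ouknine form. The display below is a sketch (minor notational details relative to the original display are our assumption):

```latex
\begin{cases}
Y_t = \xi + \displaystyle\int_t^T f(s, Y_s, Z_s, V_s)\,ds + K_T - K_t
      - \int_t^T Z_s\,dW_s - \int_t^T\!\!\int_U V_s(e)\,\tilde\mu(ds,de),
      & t \in [0,T],\\[4pt]
Y_t \ge S_t, \qquad t \in [0,T],\\[2pt]
\displaystyle\int_0^T (Y_{t-} - S_{t-})\,dK^c_t = 0
  \quad\text{and}\quad
  \Delta K^d_t = \Delta K^d_t\,\mathbf 1_{[Y_{t-} = S_{t-}]},
\end{cases}
```

where μ̃ := µ(ds, de) − λ(de)ds denotes the compensated jump measure; the last line is the Skorokhod-type minimality condition stating that K acts only when Y touches the barrier S.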

Optimal stopping of BSDEs with constrained jumps
In this section, we consider optimal stopping of BSDEs with constrained jumps. For each τ ∈ T, we recall the definition of the quadruple (Y^τ, Z^τ, V^τ, K^τ) as the maximal¹ solution to equation (3.1), where the data (f, S, χ) satisfies the assumptions in Theorem 3.1 below. Indeed, repeating the steps in [22] we find that, under these assumptions, (3.1) admits a unique maximal solution for each τ ∈ T.
The main contribution of the present section is to show the existence of an aggregator Y ∈ S² satisfying Y_η = ess sup_{τ∈T_η} Y^τ_η for every η ∈ T, in addition to a corresponding optimal stopping time. We summarize this result in the following theorem:
Theorem 3.1. Assume that
• S ∈ S² is left upper semi-continuous at predictable stopping times; in particular, ξ := S_T ∈ L²(Ω, F_T) and lim_{t→T} S_t ≤ ξ, P-a.s.;
• f is Lipschitz continuous, for all (t, y, y′, z, z′) ∈ [0, T] × R^{2(1+d)}, P-a.s., and such that ‖f(·, 0, 0, 0)‖_{H²(W)} < ∞; moreover, dP ⊗ dt-a.e., for all (y, z) ∈ R^{1+d} and v, v′ ∈ L²_λ, we have the comparison condition of Theorem 2.2;
• χ is Lipschitz continuous in (y, z) ∈ R^{1+d}, dP ⊗ λ(de)-a.e., and such that χ(·, 0, 0, ·) is square integrable.
Then there exists a Y ∈ S² such that for every η ∈ T, we have Y_η = ess sup_{τ∈T_η} Y^τ_η.
¹ Maximal in the sense that Y^τ_t ≥ Ỹ^τ_t whenever Ỹ^τ is the first component of another solution.
As noted in the introduction, a similar result was shown in [7]. In particular, assuming that χ is identically zero and introducing an additional condition of regularity on the barrier S, the work presented in [7] established the existence of a process Y satisfying the conditions of Theorem 3.1. Furthermore, it was shown that there are processes (Z, V, K^−, K^+) ∈ H²(W) × H²(µ) × A² × A² such that the quintet (Y, Z, V, K^−, K^+) is the unique maximal solution to a reflected BSDE. However, the specific regularity assumption on S made in [7] and their choice of χ ≡ 0 render their methodology unsuitable for addressing impulse control problems. Conversely, Theorem 3.1 is meticulously tailored to accommodate impulses and the corresponding intervention costs, making it a more appropriate framework for handling this particular application.
To prove Theorem 3.1 we apply an approximation routine based on penalization: for each n ∈ N, violations of the jump constraint are penalized by adding to the driver f the term n ∫_U (v(e) + χ(t, y, z, e))^− λ(de), which satisfies the comparison condition of Theorem 2.2 with bound n ∫_U (v′(e) − v(e))λ(de).
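Concretely, the penalized equation, referred to as (3.3) below, is of reflected type with the constraint replaced by a penalty. We sketch it here; the exact arguments of χ and the sign conventions are our reading of the surrounding text:

```latex
Y^n_t = \xi + \int_t^T \Big[f(s, Y^n_s, Z^n_s)
        + n\!\int_U \big(V^n_s(e) + \chi(s, Y^n_{s-}, Z^n_s, e)\big)^{-}\,\lambda(de)\Big]\,ds
        + K^{+,n}_T - K^{+,n}_t
        - \int_t^T Z^n_s\,dW_s
        - \int_t^T\!\!\int_U V^n_s(e)\,\mu(ds,de),
```

with Y^n ≥ S and the usual Skorokhod minimality condition on K^{+,n}. As n → ∞ the penalty forces (V^n + χ)^− towards 0, recovering the constraint V ≥ −χ in the limit.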
Since (Y^n, Z^n, V^n, K^{+,n}) satisfies (3.3), the stochastic integral ∫∫ V^n_s(e)μ̃(ds, de) can be estimated against α ∫∫ |V^n_s(e)|² λ(de)ds for any α > 0 and we conclude the first bound. On the other hand, squaring ∫∫ V^n_{s−}(e)µ(ds, de) gives a bound in terms of ∫∫ |V^n_s(e)|² λ(de)ds and, by choosing α > 0 sufficiently small, the assertion follows.
An immediate consequence of Lemma 3.4 is that the sequence (Y^n)_{n∈N} is non-increasing. As it is, moreover, bounded from below by the barrier S, we find that there is a progressively measurable process Ỹ such that Y^n ց Ỹ, pointwise. Since each ν ∈ V is bounded and thus belongs to V^n for some n,
Proof. Since Y^n is a sequence of optional processes, Ỹ is optional. To prove right-continuity, we thus only need to show that Ỹ is right-continuous at each stopping time. Moreover, since Ỹ is the limit of a non-increasing sequence of càdlàg processes, it is clearly right upper semi-continuous. We argue by contradiction and assume that there is a ϑ ∈ T and an ε > 0 such that lim inf_{tցϑ} Ỹ_t ≤ Ỹ_ϑ − ε on some set B ∈ F_ϑ of positive measure. If this were true, then the sequence of stopping times defined as where On the other hand, for each ν ∈ V_{ϑ_i} and τ ∈ T_ϑ, we have and we find that where χ̄ := ∫_U χ(·, 0, 0, e)λ(de). Taking the expectation on both sides and utilizing the fact that B ∈ F_ϑ, we can deduce that Concerning the first term, right-continuity of S and dominated convergence implies that ½_B(S_{τ∧ϑ_i} − S_{ϑ_i}) tends to 0 in L¹ as i → ∞, uniformly over all τ ∈ T_ϑ. Moreover, |Y^{τ;ν}| ≤ |Y^0| + |S|, where the right-hand side belongs to S², and a simple application of Jensen's inequality, together with the fact that f(·, 0, 0, 0) and χ̄ both have finite H²(W)-norms, and since there is a C > 0 such that sup for all i ∈ N, we find that the right-hand side of (3.6) tends to 0 as i → ∞. This is a contradiction, since our construction above implies that In particular, we conclude that Ỹ is right-continuous. We turn to the existence of left limits and consider the number of downcrossings of the interval [a, b] made by Ỹ for any two constants a, b ∈ R with a < b. We thus define
is a non-decreasing sequence of stopping times (strictly increasing until it hits T, since Ỹ is right-continuous) and ϑ^a_i ր ϑ^{a,b} (and then also ϑ^b_i ր ϑ^{a,b}) for some predictable stopping time ϑ^{a,b} ∈ T. Moreover, on the set On the other hand, arguing as above we find that where the right-hand side tends to 0 since S has left limits and ϑ^a_i − ϑ^b_i tends to 0, P-a.s. Now, by construction (3.7) is P-a.s. finite for any a, b ∈ R with a < b. By countable additivity this holds simultaneously for all rational pairs a < b (outside of a P-null set) and thus also simultaneously for all real pairs a < b, proving that Ỹ has left limits everywhere.
Combining Lemma 3.3 and Fatou's lemma gives that ‖Ỹ‖_{S²} < ∞ and we conclude that Ỹ ∈ S².
Remark 3.6. By construction, it also follows that the limit Ỹ is quasi-left upper semi-continuous. In particular, if (ϑ_i)_{i∈N} is an increasing sequence in T with for any n ∈ N, whereas Y^n_ϑ ց Ỹ_ϑ as n → ∞ by definition. As Y^n is non-increasing in n, the sequence of stopping times (τ_n)_{n∈N} is non-increasing and, by right-continuity of the filtration, we conclude that there is a τ_⋄ ∈ T_η such that τ_n ց τ_⋄. Moreover, right-continuity of Ỹ and S implies that On the other hand, Y^n_{τ_⋄} ≥ S_{τ_⋄} for each n ∈ N, from which it follows that Ỹ_{τ_⋄} = lim_{n→∞} Y^n_{τ_⋄} ≥ S_{τ_⋄} and we conclude that Ỹ_{τ_⋄} = S_{τ_⋄}. In particular, the stopping time τ := inf{t ≥ η : Ỹ_t = S_t} satisfies η ≤ τ ≤ τ_⋄, P-a.s.
Lemma 3.7. Let η and τ be as defined above; then Ỹ_η = Y^τ_η.
We thus conclude that (Ỹ, Z̃, Ṽ, K̃^−) solves (3.2) for this τ on the interval [η, τ]. On the other hand, (Y^τ, Z^τ, V^τ, K^{τ,−}) is the unique maximal solution to this BSDE and, since Ỹ ≥ Y^τ, we conclude that the jump integrals ∫∫ V^{τ,n}_s(e)µ(ds, de) agree in the limit. Moreover, repeating the steps above it easily follows that Y^{τ,n} ց Y^τ pointwise and we conclude that Ỹ_η = Y^τ_η. In particular, (Ỹ, τ) fulfills the condition regarding (Y, τ*) as stated in the theorem.
Remark 3.8. An additional point of significance is that the sequence of reflected BSDEs, represented by (3.3), presents an efficient means for numerically approximating the non-linear Snell envelope, Y. Alternatively, one can first approximate τ* using the sequence τ_n and then employ a numerical scheme for BSDEs with constrained jumps (see e.g. [21]) to find an efficient numerical approximation of Y.
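The stopping component behind the approximation in Remark 3.8 can be illustrated on a toy discrete-time example. The sketch below is a hypothetical, heavily simplified illustration (not the paper's scheme: no jumps, no constraint, no penalization) computing a discrete Snell envelope Y_k = max(S_k, E[Y_{k+1} | F_k]) by backward induction on a binomial tree:

```python
import numpy as np

def snell_envelope_binomial(payoff, p=0.5, u=1.1, d=0.9, s0=1.0, n=50):
    """Discrete Snell envelope Y_k = max(S_k, E[Y_{k+1} | F_k]) on a
    recombining binomial tree; `payoff` maps state values to rewards."""
    # states at step k: s0 * u^(k-i) * d^i for i = 0..k
    s = s0 * u ** np.arange(n, -1, -1.0) * d ** np.arange(0, n + 1.0)
    y = payoff(s)                               # terminal value Y_n = S_n
    for k in range(n - 1, -1, -1):
        s = s0 * u ** np.arange(k, -1, -1.0) * d ** np.arange(0, k + 1.0)
        cont = p * y[:-1] + (1.0 - p) * y[1:]   # E[Y_{k+1} | F_k]
        y = np.maximum(payoff(s), cont)         # stop or continue
    return float(y[0])
```

For a put-like payoff (1 − s)^+, the envelope at the root dominates the immediate payoff, mirroring the relation Y ≥ S; the first time the two curves touch approximates the optimal stopping time, in analogy with τ* = inf{t ≥ η : Ỹ_t = S_t} above.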

A zero-sum game of impulse control versus stopping
We delve into formulating the game of impulse control versus stopping, which is closely connected to the problem of optimal stopping of BSDEs with constrained jumps treated above and, in particular, to the corresponding non-linear Snell envelope. In this section, we will precisely define the problem, state the main result and provide preliminary estimates for the involved processes. Subsequently, in the following section, we will prove that the game has a value by establishing a relationship between the upper and lower value functions and the aforementioned non-linear Snell envelope.

Additional notations and definitions
We introduce the following additional notations: • We let T W be the set of all F W -stopping times valued in [0, T ] and for each t ∈ [0, T ], we let T W t be the set of F t,W -stopping times τ such that τ ≥ t, P-a.s.
• We let D be the set of càdlàg paths [0, T] → R^d with a finite number of jumps and, for each t ∈ [0, T], we introduce the semi-norm d, where C(x) is the continuous part of x, J_i(x) is the time of the i-th jump of x and Δx_t := x_t − x_{t−}.
• We introduce the filtration D := (D_t)_{t∈[0,T]}, where D_t is the σ-algebra generated by the canonical maps up to time t.
• We let U be the set of all u = (η_j, β_j)^N_{j=1}, where (η_j)^∞_{j=1} is a non-decreasing sequence of F-stopping times, β_j is a U-valued, F_{η_j}-measurable random variable and N = N^u_T := sup{j ≥ 0 : η_j ≤ T} is P-a.s. finite. Moreover, for t ∈ [0, T], we let U_t be the subset of U with η_1 ≥ t, P-a.s.
• We let U^W_t be the subset of U_t where the η_j are F^{t,W}-stopping times and β_j is F^{t,W}_{η_j}-measurable.
• We let U^k_t (resp. U^{W,k}_t) be the set of all u := (η_j, β_j)^N_{j=1} in U_t (resp. U^W_t) for which N ≤ k, P-a.s.
• For any τ ∈ T and u = (η_j, β_j)^N_{j=1} ∈ U we let N^u_τ := max{j ≥ 0 : η_j ≤ τ} and define the concatenation operator ⊗_t accordingly.
Note that an alternative, and indeed a more general, metric on the space of càdlàg paths that we could have used as a basis for our definition of d is the Skorokhod metric. However, with the type of temporal distortions that we expect, d proves to be more convenient and practical to utilize.
Definition 4.1. For t ∈ [0, T], the set of non-anticipative stopping strategies, denoted T^{S,W}_t, is defined as the set of maps τ_S : U^W_t → T^W_t such that for any η ∈ T^W_t and u, ũ ∈ U^W_t, the difference between the sets {ω : τ_S(u) ≤ η, u^η = ũ^η} and {ω : τ_S(ũ) ≤ η, u^η = ũ^η} is a P-null set.
Definition 4.2. For t ∈ [0, T], the set of non-anticipative impulse control strategies U^{S,W}_t (resp. U^{S,W,k}_t) is defined as the set of maps u_S : T^W_t → U^W_t (resp. U^{W,k}_t) such that for any τ, τ′ ∈ T^W_t, there is a P-null set E outside of which u_S(τ) and u_S(τ′) agree whenever τ and τ′ do.

Problem formulation and main result
Let us recall the definition of the cost/reward process. For t ∈ [0, T], u ∈ U_t, and τ ∈ T_t, the random variable J_t(u, τ) was defined in the introduction, where we recall that (1.4) defined X^{t,u} as the solution to the path-dependent SDE with impulses, driven in part by γ(r, X^{t,u}, e)µ(dr, de). We then define the lower value process Y and the upper value process Ȳ as essential infima/suprema of J over the strategies and controls introduced above. The main result of the second part of the paper is the following:
Theorem 4.3. There is a Y ∈ S² such that Y_t = ess sup_{τ∈T_t} Y^τ_t, where Y^τ is the maximal solution to (1.2). Moreover, the upper and lower value processes satisfy Y_t = Ȳ_t = Y_t, P-a.s., for each t ∈ [0, T].
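Spelled out, one consistent reading of the two value processes is the following (an assumption on our part: the exact sup/inf pairing is fixed by the definitions of U^{S,W}_t and T^{S,W}_t above, with the strategy advantage given to the stopper in Ȳ and to the controller in Y):

```latex
\underline Y_t := \operatorname*{ess\,inf}_{u_S \in \mathcal U^{S,W}_t}\;
  \operatorname*{ess\,sup}_{\tau \in \mathcal T^W_t}
  J_t\big(u_S(\tau); \tau\big),
\qquad
\bar Y_t := \operatorname*{ess\,sup}_{\tau_S \in \mathcal T^{S,W}_t}\;
  \operatorname*{ess\,inf}_{u \in \mathcal U^W_t}
  J_t\big(u; \tau_S(u)\big).
```

Existence of a value then amounts to showing that these two quantities coincide, which is the content of Theorem 4.3.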
A comment regarding the use of filtrations seems appropriate in this context. For t = 0, as F_0 is the trivial σ-algebra, we encounter an SDG of impulse control versus stopping in a Brownian filtration. Our motivation for investigating the game formulation in a conditional setting, allowing t ∈ (0, T], goes beyond providing a representation for the non-linear Snell envelope discussed in Section 3. It also serves as a foundational element for considering more general scenarios. Subsequent research can build upon this framework, similar to the extension of reflected BSDEs, known to be associated with classical control versus stopping games [15,4], to encompass control problems involving a combination of switching and classical control [19], as well as impulse control versus classical control games [29]. Furthermore, the conditional framework facilitates the efficient use of the penalization routine described in Section 3 to approximate the value function over a wide range of historical information.

Assumptions
To be able to represent an impulse control u ∈ U^W_t by a randomized control, defined in terms of the random measure µ, we need the following assumption:
Assumption 4.4. λ is a finite positive measure on (U, B(U)) with full topological support.
We assume that the coefficients of the SDE satisfy the following:
Assumption 4.5. There is a C > 0 such that for any t, t′ ≥ 0, b, b′ ∈ U and x, x′ ∈ D, we have:
i) The coefficient γ satisfies the growth condition;
ii) The coefficients a : [0, T] × D → R^d and σ : [0, T] × D → R^{d×d} are Prog(D)-measurable and satisfy the growth condition and the Lipschitz continuity.
The coefficients of the reward/cost satisfy the following assumptions:
Assumption 4.6. There are constants C > 0 and q > 0, in addition to a family of moduli of continuity (̟_K)_{K≥0}, such that for any t, t′ ≥ 0, b, b′ ∈ U and x, x′ ∈ D, we have:
i) The running cost f : [0, T] × D → R is Prog(D)-measurable and satisfies the growth condition and, for any K > 0, the corresponding regularity property;
ii) The intervention cost χ is measurable and satisfies the corresponding growth and regularity conditions;
iii) The barrier function Ψ is measurable and satisfies the regularity property.
In addition, we assume inequality (4.3). In impulse control it is generally assumed that intervening on the system at the end of the horizon is suboptimal. Inequality (4.3) extends this assumption and ascertains that it is never optimal for the impulse controller to intervene on the system at the time that the game ends. Note also that we may (and will), without loss of generality, assume that there is a C > 0 such that ̟_K is bounded by C(1 + K^q) for any K ≥ 0.

Preliminary estimates
Lemma 4.7. Under Assumption 4.5, the path-dependent SDE (1.4) admits a unique solution for each t ∈ [0, T] and u ∈ U_t. Furthermore, the solution has moments of all orders; in particular, we have for p ≥ 0 a moment bound in which C > 0 does not depend on u and λ, and X^u := X^{0,u}.
Proof.The result was proved for a Brownian filtration in [29] (see Proposition 4.2) and the method of that proof readily extends to cover our framework.
Remark 4.8. Since (σ_j, ζ_j)_{j≥1} ∈ U, the above proposition immediately gives that there is a
Corollary 4.9. There is a C > 0 such that
Proof. We have and the assertion follows by the polynomial growth assumptions on Ψ and f and Remark 4.8.
In addition to the assumptions stated in Section 4.3, we introduce two hypotheses that are formulated in a more implicit manner to allow for broader applicability of the results presented in the following section. These hypotheses can be demonstrated to hold under relatively mild additional assumptions. For instance, the first hypothesis is shown to hold in Lemma 4.3 of [29] by imposing a continuity requirement on γ and L¹ (resp. L²) continuity on a (resp. σ). Furthermore, Proposition 4.12 below establishes that the second hypothesis holds when χ is bounded from below by a positive constant.
Moreover, it is worth noting that both hypotheses can be easily verified for a wide range of classical control versus stopping games if we initially approximate the classical control α by a piecewise constant process α̃_t := a_0 ½_{[0,η_1]}(t) + Σ^N_{j=1} β_j ½_{(η_j,η_{j+1}]}(t), as done in [14].
Proof. By (H.1) and Lemma 4.7, together with the Burkholder-Davis-Gundy inequality, it follows that there is a modulus of continuity ρ and a constant C_1 > 0 such that the required estimate holds. Hence, there is a subsequence (l_j)_{j∈N} such that Σ^∞_{j=1} P[A_{l_j}] < ∞, implying that lim sup_j A_{l_j} is P-negligible. In particular, d[(τ_{l_j}, X^{t,u_{l_j}}), (τ_{l_j}, X^{t,ũ_{l_j}})] → 0, P-a.s. as j → ∞.
The above lemma allows us to deduce the following important continuity result:
Proof. We treat only the term containing Ψ, as identical arguments can be used for the other terms. We seek a contradiction and assume that there is a ̺ > 0, a sequence (ε_l) with ε_l → 0 as l → ∞, and sequences (u_l), (ũ_l) for every l ∈ N. By Assumption 4.6.(iii) we find that for any K ≥ 0, it holds that Now, by possibly going to a subsequence, we have by Lemma 4.10 that ̟_K(d[(τ_l, X^{t,u_l}), (τ_l, X^{t,ũ_l})]) → 0, P-a.s. Since ̟_K is uniformly bounded, we can use dominated convergence to conclude that there is a subsequence such that the expectation vanishes, contradicting (4.7) as K > 0 was arbitrary.
We note that (H.1) presumes that the impulse controls have a limited number of interventions. The hypothesis becomes particularly significant when combined with the truncation of the upper and lower value functions to a maximal number of interventions, presented in the next subsection.

Truncation of the impulse control set
Next, we develop useful approximations of the value processes by their counterparts with a truncated number of interventions. For this, we introduce the truncated value processes. As the following proposition shows, a vital example where (H.2) is satisfied is when the intervention costs are strictly positive.
Proposition 4.12. Assume that χ is bounded from below by a positive constant, i.e. χ(t, x, b) ≥ δ > 0. Then there is a C > 0 such that the truncation error bound (4.11) holds and, similarly, its counterpart, P-a.s., for all k ∈ N and t ∈ [0, T].
Proof. We prove (4.11) as this proof is slightly more involved. Let τ_S ∈ T^{S,W}_t and note that for any u ∈ U^W_t, non-anticipativity gives that τ_S(u) = τ_S(ũ), P-a.s., where ũ equals u up to and at the stopping time τ_S(u), but then makes no further interventions. We thus have that ess inf where U^{τ_S}_t is the subset of U^W_t with all impulse controls that do not have any interventions after τ_S(u). For any u ∈ U^{τ_S}_t, we have There is thus a C > 0 (that does not depend on t, τ_S and u) such that the control u ∈ U^{τ_S}_t is dominated by the control ∅ ∈ U^{τ_S}_t (i.e. the impulse control that makes no interventions in [0, T]) whenever and we can choose a smaller ε until either u^ε dominates ∅ or, if this never happens, we have ess inf We thus assume that u^ε dominates ∅ so that where Using (4.12) together with Remark 4.8, the second inequality follows as τ_S ∈ T^{S,W}_t and ε > 0 were arbitrary. The first inequality follows analogously by first taking τ_S(u) = T.
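The mechanism behind Proposition 4.12 can be summarized informally (a sketch of the idea, not the proof's exact constants): since each intervention costs at least δ, any control that is not dominated by the do-nothing control ∅ can make only boundedly many interventions on average,

```latex
\delta\,\mathbb E\big[N^u_T \,\big|\, \mathcal F_t\big]
\;\le\;
\mathbb E\Big[\sum_{j=1}^{N^u_T} \chi\big(\eta_j, X^{t,[u]_{j-1}}, \beta_j\big)
  \,\Big|\, \mathcal F_t\Big]
\;\le\; J_t(u;\tau) + C,
```

where C absorbs the running and terminal terms via Corollary 4.9. Truncating at k interventions therefore perturbs the value only by a quantity that vanishes as k → ∞.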
• We let (U^ε_l)^{n^U_ε}_{l=1} be a Borel partition of U such that each U^ε_l has a diameter that does not exceed ε and there is a sequence (b^ε_l). For ε > 0 and k ∈ N, we then define. Similarly, we let T^{W,ε}_t be the subset of T^W_t with all stopping times τ for which τ ∈ T^ε, P-a.s.
Definition 4.14. For ε > 0, let T^{S,W,ε}_t be the subset of T^{S,W}_t containing all strategies τ_S : U^W_t → T^{W,ε}_t such that on the set {Ξ^ε(u) ∈ U^τ. On the other hand, we have by Assumption 4.6.(iii) and we conclude that we can disregard strategies with Ñ_S(τ) > 0. In particular, this gives that and, similarly, we have Hence, On the other hand, for each u ∈ U^{W,k,ε}_t and τ ∈ T^W_t, we have, and it follows by applying Corollary 4.11 that E[(Y^k_t − Y^{k,ε}_t)^+] → 0 as ε → 0. We approach the opposite inequality by noting that Now, for each u ∈ U^{W,k}_t and τ ∈ T^{W,ε}_t, we have On the other hand, with we get by repeatedly appealing to (4.3) of Assumption 4.6.(iii) that Combined, this gives that We can thus use Corollary 4.11 to conclude that as ε → 0 and the assertion follows.
Similarly, we have the following result.
Lemma 4.16. For each k ∈ N and t ∈ [0, T], we have
Proof. We have and we can repeat the argument in the first part of the proof of Lemma 4.15 to conclude that lim sup_{ε→0} E[( and, using that τ_S(u) = τ_S(Ξ^ε(u)), P-a.s., whenever τ_S ∈ T^{S,W,ε}_t, repeating the argument in the second part of the proof of Lemma 4.15 then completes the proof.
Lemma 4.17. For each k ∈ N, ε > 0 and t ∈ [0, T] we have
Proof. For 0 ≤ t ≤ s ≤ T, u ∈ U_t, ũ := (η̃_j, β̃_j)^Ñ_{j=1} ∈ U_s and τ ∈ T_s, we introduce the random variable which gives us the cost/reward of using the pair (ũ, τ) given that the minimizer has applied the control u on the interval [t, s]. We let Ũ^{S,W,k,ε}_s and immediately get that Y^{k,ε}_t = R^{∅,k}_t. Moreover, to simplify notation, we let where Ū^{ε,k} To see this, note that R^{v,k} and exploiting (4.17) in a standard fashion gives that where the last inequality holds since τ^{S,ε}_k ∈ T^{S,W,ε}_t.

Game value by control randomization
A successful approach to representing the solution to various types of control problems (including those with path-dependencies) has been to consider a weak formulation where the auxiliary probability space is endowed with an independent Poisson random measure that is used to represent the control. Optimization is then carried out by altering the probability measure to modify the compensator of the random measure, so that (in the limit and on an intuitive level) the path of the corresponding Poisson jump process has probabilistic characteristics that mimic those of an optimal control. This approach to stochastic optimal control is termed control randomization [14] and, as explained in the introduction, it is intimately connected to BSDEs with constrained jumps. Despite its efficacy in addressing various types of optimal control problems, the approach pioneered in [14] has yet to be extended to encompass stochastic differential games. In the present section we bridge this void by establishing a connection between the previously defined lower and upper value functions and a non-linear Snell envelope.
In particular, we first introduce a randomized version of the game in which we represent the impulse control by the sequence (σ_j, ζ_j)_{j≥1} that appears in the Dirac-sum formulation of the random measure, µ = Σ_{j≥1} δ_{(σ_j, ζ_j)}, and then control the integrand in the Doléans-Dade exponential appearing in a Girsanov transformation applied to the probability measure P, effectively changing the probability distribution of µ. Applying the same penalization routine as in Section 3, we show that the value function of this game corresponds to a non-linear Snell envelope defined over solutions to (1.2). We then proceed to show that the value of the randomized game coincides with both the upper and the lower value functions of the original game posed in Section 4, thus proving Theorem 4.3.
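For orientation, the Girsanov transformation alluded to above can be sketched as follows. This is the standard Doléans-Dade exponential for marked point processes (a generic form; the boundedness and measurability conditions on the density ν are those imposed on the set V in Section 5.1):

```latex
% Sketch: change of measure for the random measure
% \mu = \sum_{j\ge 1}\delta_{(\sigma_j,\zeta_j)} with P-compensator
% \lambda(de)\,ds and a bounded predictable density \nu_t(e) > 0.
\[
  \kappa^{\nu}_T
  \;=\; \exp\!\Big( \int_0^T\!\!\int_U \ln \nu_s(e)\, \mu(ds,de)
        \;+\; \int_0^T\!\!\int_U \bigl(1-\nu_s(e)\bigr)\, \lambda(de)\,ds \Big).
\]
% Under dP^{\nu} := \kappa^{\nu}_T\,dP, the compensator of \mu becomes
% \nu_s(e)\,\lambda(de)\,ds, i.e. the intensity of the marked points is
% tilted by the chosen density \nu.
```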

Randomized game and related non-linear Snell envelope
We introduce the concept of a randomized game and establish its connection to the non-linear Snell envelope. We recall the definition of the set V (resp. V^n) as consisting of all P(F) ⊗ B(U)-measurable bounded maps ν = ν_t(ω, e) : [0 where E^ν is expectation with respect to the probability measure P^ν on (Ω, F) defined by dP^ν := κ^ν_T dP with

Remark 5.1. As in Lemma 4.7, it can easily be deduced that for each p ≥ 0, there is a constant C > 0 such that

To establish a relation between our non-linear Snell envelope and the value of a game formulated over randomized controls, we adopt the approximation routine described in Section 3. Specifically, we consider the unique solution and define K^{-,n}_t := n ∫_0^t ∫_U (V^n_s(e) + χ(s, X, e))^- λ(de)ds. The following representation holds:

Proposition 5.2. For each n ∈ N and t ∈ [0, T], we have the representations and where τ_n(t) := inf{s ≥ t :

Proof. We remark that Ψ(·, X) is càdlàg and upper left semi-continuous at predictable stopping times, whereby Remark 3.2 implies that the condition for Theorem 2.2 holds, ensuring that τ_n(t) is an optimal stopping time for Y^n_t. We let and suppose that for each (τ, ν) V^{τ,ν}_s(e)µ(ds, de) χ(s−, X, e)µ(ds, de) where μ̃^ν(ds, de) := µ(ds, de) − ν_s(e)λ(de)ds. Now, under the measure P^ν defined above, the compensator of µ is ν_s(e)λ(de)ds. Hence, Finally, letting ν^*_t(e) := n1_{[V^n_t(e) < −χ(t,X,e)]} and arguing as in the proof of Lemma 3.4 gives that (ν^*, τ_n(t)) is a saddle point for the game, i.e. for any (ν, τ) ∈ V^n × T_t we have In particular, as the restriction to positive densities becomes irrelevant when taking the infimum, the representations (5.3) and (5.4) hold.
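Since the displayed representations (5.3)-(5.4) are established via a saddle point, it may help to record their schematic shape. The following is only a template: Γ_t(τ) is a hypothetical placeholder for the cost/reward functional of the randomized game, which we do not reproduce here.

```latex
\[
  Y^{n}_{t}
  \;=\; \operatorname*{ess\,inf}_{\nu \in V^{n}}\,
        \operatorname*{ess\,sup}_{\tau \in \mathcal T_{t}}\,
        E^{\nu}\!\bigl[\,\Gamma_t(\tau) \,\big|\, \mathcal F_{t}\bigr]
  \;=\; \operatorname*{ess\,sup}_{\tau \in \mathcal T_{t}}\,
        \operatorname*{ess\,inf}_{\nu \in V^{n}}\,
        E^{\nu}\!\bigl[\,\Gamma_t(\tau) \,\big|\, \mathcal F_{t}\bigr].
\]
% The equality of the two iterated optimizations is precisely what the
% saddle point (\nu^*, \tau_n(t)) delivers.
```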
By combining the above proposition with Theorem 3.1, we obtain the following corollary: There is a Y ∈ S^2 such that for any η ∈ T, we have Y_η = ess sup_{τ∈T_η} Y^τ_η, where for each τ ∈ T, the quadruple is the unique maximal solution to (1.2). Moreover, we have the representation that holds for all t ∈ [0, T].
To complete the proof of Theorem 4.3, we need to relate the value of the randomized game to that of the original game. This is accomplished in the subsequent subsections, where we begin by demonstrating that the expected value of the upper value function is dominated by that of Y.

Proving that Ȳ_t ≤ Y_t
We begin by examining the inequality E[Ȳ_t − Y_t] ≤ 0, which can be readily deduced from the findings reported in Section 4.1 of [1]. This relation will suffice, since the principal result of the next subsection implies that Ȳ_t ≥ Y_t, P-a.s., leading us to conclude that Ȳ_t = Y_t, P-a.s. To maintain formality, we state the result in the following proposition: The novel work in [14] considered a weak formulation of the control problem where, in addition to a supremum over controls, the value function was obtained by taking the supremum over all conceivable probability spaces. In particular, this made it straightforward to prove that the value in the original problem dominates that of the randomized version. An essential contribution of [1] was to consider a strong version of the control problem, where the probability space is fixed. Considering the type of zero-sum games that we analyze does not lead to a significant increase in complexity compared to the analysis in [1]. To see this, note that for each ε > 0, there is a stopping strategy τ^{S,ε} ∈ T^S_t (the set of non-anticipative maps τ^S : U_t → T_t), such that For each τ^{S,ε} ∈ T^S_t, the expression on the right-hand side represents an impulse control problem commencing at time t, built upon the historical trajectory (X_s : 0 ≤ s ≤ t). Notably, this problem deviates from the standard archetype, as the pertinent information (Ψ(τ^{S,ε}(u does not conform to conventional regularity assumptions.
Conversely, the pivotal results leading up to Proposition 4.2 in [1] hinge exclusively on measurability properties, without necessitating any further regularity constraints on this data. Consequently, we are able to replicate the reasoning delineated in the corresponding proofs, culminating in the deduction that for each n ∈ N, we have inf where V^n_{t,inf>0} := {ν ∈ V^n_{inf>0} : ν_s ≡ 1, ∀s ∈ [0, t]}. Now, as u^{t,µ} := (σ_{j+N^µ_t}, ζ_{j+N^µ_t})_{j≥1} ∈ U_t, where N^µ_t := µ((0, t], U), and τ := τ^{S,ε}(u^{t,µ}) ∈ T_t, it follows by the second representation of Y^n in (5.3) that the right-hand side is dominated by E[Y^n_t]. Taking the limit as n → ∞ and using dominated convergence, we conclude that Proposition 5.4 holds.

Proving that Y_t ≤ Y̲_t
We finish the proof of Theorem 4.3 by showing the following: Effectively, this proposition together with the preceding one implies that Y_t ≤ Y̲_t ≤ Ȳ_t ≤ Y^n_t, P-a.s., for each t ∈ [0, T] and n ∈ N, enabling us to show that the game has a value.
The proof of this proposition is more involved than that of Proposition 5.4 and is distributed over several lemmata. Leveraging Lemma 4.15 and (H.2), it suffices to prove that E[(Y_t − Y^{k,ε}_t)^+] → 0 as ε → 0 for each k ∈ N. To achieve this, we use the fact that each u ∈ U^{k,ε} can be approximated to arbitrary precision by a random measure, in the spirit of Section 4.1.2 in [14] or Section 4.2 in [13]. However, it should be noted that the game framework that we examine requires a different approach from that of the aforementioned works. Mainly, this is due to the fact that the optimal stopping times τ_n(t) depend on n. To resolve this issue we resort to a discretization of the set of stopping times T_t in the game representation of Y^n, both by restricting all stopping times to take values in T^ε and by restricting the information that is used, through only considering stopping times in a smaller filtration. It is worth noting that our approach has been specifically tailored to address both the game setting and the conditional framework, distinguishing it from the methods employed in [14,13].
To begin, we restrict the stopping set for the randomized version of the game and impose a similar restriction on the number of interventions in the randomized control as we did in the original version. Specifically, we define N^µ_{s,t} := µ((s, t], U), so that N^µ_{s,t} represents the number of interventions in the control corresponding to µ within the interval (s, t]. Since we are only considering ν bounded from below, we cannot place a P^ν-a.s. upper bound on N^µ_{t,T} in the optimization. Instead, we introduce the set: so that for any ν ∈ V^{n,k,t}_{inf>0}, we have E^ν[N^µ_{t,T}] ≤ k + Tλ(U) and, more importantly, there is a C > 0 such that for all (n, k, t) ∈ N^2 × [0, T] and ν ∈ V^{n,k,t}_{inf>0}. We now let where T^ε_t is the set of stopping times with respect to the filtration ) and for each u ∈ U, the filtration To support this approximation we introduce the following objects. We let and introduce the process X^{t,τ,ε} := lim_{j→∞} X^{t,τ,ε,j}_{·∧τ}, where the sequence (X^{t,τ,ε,j})_{j∈N} is defined recursively by letting X^{t,τ,ε,j} be the unique càdlàg process that satisfies for each τ ∈ T_t. We adopt the notation X^{t,ε} := X^{t,T,ε} and let With this definition, J^{R,t,ε} corresponds to delaying the jumps in the state X^{t,ε} so that for each s ∈ T^ε, the new state X^{t,ε}_{·∧s} is F^W_s ∨ F^{µ^{t,ε}}_s-measurable, while additionally discretizing the interventions to take values in Ū^ε. This discretization of the randomized impulse control allows us to solve the corresponding optimal stopping problem in a straightforward manner, as shown in the next lemma.

Lemma 5.6. For each ε > 0 and t ∈ [0, T], there is a non-increasing sequence of stopping times P-a.s., for all (k, n) ∈ N^2.
Proof. To show the existence of an optimal stopping time we use dynamic programming and introduce the processes from which we extract the F-stopping times Since Ỹ^{n,k,t,ε}_s is non-increasing in n, the sequence of stopping times (τ^ε_n)_{n∈N} ∈ T^ε_t is non-increasing in n. Now, in the left-hand side of (5.6), stopping outside of the set T^ε is suboptimal from the point of view of the maximizer. On T^ε, we thus have By a standard BSDE argument, the non-linear expectation ess inf_{ν∈V^{n,k,t}_{inf>0}} E^ν satisfies a tower property and we get that for arbitrary τ ∈ T_{t^ε_i}, ess inf This leads us to the conclusion that Ỹ^{n,k,t,ε} satisfies the weak dynamic programming principle, Ỹ^{n,k,t,ε} On the other hand, by iteration and again using the tower property, we find that the right-hand side is bounded from above by ess inf_{ν∈V^{n,k,t}} -stopping time and thus belongs to T^ε_t for each n ∈ N. This proves the assertion whenever t ∈ T^ε. The generalization to arbitrary t ∈ [0, T] is straightforward.
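The tower property invoked here is the standard one for non-linear expectations built from a family of densities that is stable under pasting; schematically (the relevant family being V^{n,k,t}_{inf>0}):

```latex
\[
  \operatorname*{ess\,inf}_{\nu}\,
  E^{\nu}\bigl[\,\xi \,\big|\, \mathcal F_{r}\bigr]
  \;=\;
  \operatorname*{ess\,inf}_{\nu}\,
  E^{\nu}\Bigl[\,
     \operatorname*{ess\,inf}_{\nu'}\,
     E^{\nu'}\bigl[\,\xi \,\big|\, \mathcal F_{s}\bigr]
  \,\Big|\, \mathcal F_{r}\Bigr],
  \qquad 0 \le r \le s \le T,
\]
% valid because the density family is stable under pasting at s: the
% infimum over the intervals [r,s] and [s,T] can be taken separately.
```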
In addition, we can compare the approximation to the original value process as follows.
with σ_0 := 0. Assuming w.l.o.g. that ν ≡ 1 on [0, t], we get that for each K > 0, Now, N^µ_{t,T} has a fourth-order moment under P^ν, for ν ∈ V^{n,k,t}_{inf>0}, that is uniformly bounded in n. Remark 5.1 then gives that while repeating the argument from Section 5.2 gives Since the right-hand side of (5.9) tends to 0 as ε → 0, uniformly in (ν, τ), by Corollary 4.11, we conclude that lim sup implying that the left-hand side must equal zero, as K > 0 was arbitrary. We move on to the second term on the right-hand side of (5.8) and have, for any ν ∈ V^{n,k,t}_{inf>0} and τ ∈ T_t valued in T^ε, that Repeating the latter part of the above argument now gives that for each K ≥ 0, it holds that The idea is to use the sequences Û^m_j and Ŝ^{m′}_{j′} to "randomize" an impulse control û ∈ Û^{Ŵ,k,ε}_t and then add π_l to get a new sequence ǔ := (η̌_j, β̌_j)^∞_{j=1} of random variables such that the P-compensator of the corresponding random measure µ̌ := Σ^∞_{j=1} δ_{(η̌_j, β̌_j)} has a density ν with respect to λ(da)dt which is strictly positive, and such that ǔ is sufficiently "close" to û.
Following the above procedure, we define Ŷ^{k,ε}_t as the canonical extension of Y^{k,ε}_t to Ω̂. Before we proceed to prove Proposition 5.5, we present the following lemma.
Lemma 5.8. For each t ∈ [0, T], k ∈ N and ̺ > 0, there is an ε ∈ (0, ̺] and a ǔ ∈ and the random measure on (t, T] × U corresponding to ǔ has a P-compensator with respect to the filtration F^{t,Ŵ} ∨ F^ǔ that is absolutely continuous with respect to λ and takes the form ν̌_t(ω, a)λ(da)dt ∈ U^S_τ ∩ U^{S,W}_t and by Assumption 4.6.(iii) it is optimal to have Ñ^S ≡ 0. Lemma 4.15 now implies that in L^1(Ω̂, F̂_t, P) we have In particular, there is an ε ∈ Using (5.10), we prove the lemma in two steps:

Step 1: We first show the existence of a non-negative (not necessarily bounded away from zero) map ν satisfying the first part of the lemma. By the definition of the essential supremum and stability under pasting of the set Û^{Ŵ,k,ε}_t, there is a û^{ε,̺} := (η̂^{ε,̺}_j, β̂^{ε,̺}_j) P-a.s. Define, for each m ∈ N, the transition kernel q_m(b, da) on U as in the proof of Lemma 4.4 of [13], let η̌^m_j := η̂^{ε,̺}_j + Ŝ^m_j, β̌^m_j := q_m(β̂^{ε,̺}_j, Û^m_j), Ň^m := inf{j ≥ 0 : η̌^m_j < T} and introduce the impulse control ǔ^m := (η̌^m_j, β̌^m_j). According to Lemma A.11 in [14], the corresponding P-compensator with respect to F^{t,Ŵ} ∨ F^{ǔ^m} is given by the explicit formula Moreover, there is an m′ ∈ N such that the densities of the random variables in the sequence (Ŝ^m_j)_{j∈N} all have support in (0, ∆t_ε) and ξ_{ε/2}(β̌^m_j) = β̂^{ε,̺}_j for j = 1, . .
., k, whenever m ≥ m′. We thus conclude that for any such m and each τ ∈ T^{Ŵ,û} Arguing as in the proof of Lemma 4.15, while appealing to a slightly adjusted version of Corollary 4.11 in which we allow impulse controls in Ǔ^Ŵ_t, we now find that there is an m′′ ≥ m′ such that and by (5.10)-(5.12) we get

Step 2: To establish the claim, we need to modify ǔ^m so that the corresponding density with respect to λ is bounded away from 0 on [t, T]. We therefore consider the control ǔ^{m,l} := (η̌^{m,l}_j, β̌^{m,l}_j)^{Ň^{m,l}}_{j=1} corresponding to the random measure µ̌^m + π_l(· ∩ [t, T], ·) and note that the number of interventions of ǔ^{m,l} on [t, T] is bounded by k + N^{π_l}_{t,T}, where N^{π_l}_{t,T} := π_l([t, T], U) is Poisson distributed with parameter λ(U)(T − t)/l under P. In particular, this gives that N^{π_l}_{t,T}, and then also k + N^{π_l}_{t,T}, has moments of all orders under P.
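The final moment claim is elementary; for completeness, a short reminder of the standard computation for a Poisson random variable N with parameter ρ := λ(U)(T − t)/l:

```latex
\[
  E\bigl[e^{\theta N}\bigr]
  \;=\; \sum_{j \ge 0} e^{\theta j}\, e^{-\rho}\,\frac{\rho^{j}}{j!}
  \;=\; \exp\!\bigl(\rho\,(e^{\theta}-1)\bigr) \;<\; \infty
  \qquad \text{for every } \theta \in \mathbb{R},
\]
% and since e^{\theta N} \ge (\theta N)^p / p! for \theta > 0, we get
% E[N^p] \le p!\,\theta^{-p} \exp(\rho(e^{\theta}-1)) < \infty for all p \ge 1,
% so N (and hence k + N) has moments of all orders.
```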
The proof, which is divided into three steps, uses an auxiliary formulation of the randomized version of the game with a state process X that is driven by Ŵ and μ.

Lemma 3.5.
time η ∈ T. Using this relation we are able to prove the following lemma: The process Ỹ ∈ S^2. In particular, Ỹ is càdlàg.
and thus lim sup_{i→∞} P[B_{i−1} \ B_i] = 0 and, by taking the expectation of both sides in (3.7), we conclude that P[B] = 0. As the number of downcrossings of the interval [a, b], denoted D([a, b]), is finite on the set Ω \ B, and since a < b were arbitrary, we conclude that D([a, b]) Ṽ_s(e)µ(ds, de), ∀t ∈ [η, τ]
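The path-regularity conclusion drawn from the downcrossing bound rests on the classical deterministic criterion used in the càdlàg-modification theorem for supermartingales; schematically, with Q_T := Q ∩ [0, T]:

```latex
\[
  \Bigl\{ \omega :\
     \lim_{s \downarrow t,\, s \in Q_T} \tilde Y_s(\omega)
     \ \text{or}\
     \lim_{s \uparrow t,\, s \in Q_T} \tilde Y_s(\omega)
     \ \text{fails to exist for some } t \in [0,T]
  \Bigr\}
  \ \subseteq\
  \bigcup_{\substack{a < b \\ a,\,b \,\in\, \mathbb{Q}}}
     \bigl\{ \omega :\ D([a,b])(\omega) = \infty \bigr\},
\]
% so P[D([a,b]) < \infty] = 1 for every rational pair a < b yields the
% P-a.s. existence of one-sided limits of \tilde Y along Q_T, from which
% the cadlag property follows.
```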

Theorem 4.3.
Under the assumptions detailed in Section 4.3 and (H.1)-(H.2) below, there is a process Y ∈ S^2 such that Y_η = ess sup_{τ∈T_η} Y^τ_η for any η ∈ T, where the triple
measurable and bounded away from zero. Moreover, the number of interventions of the impulse control ǔ (i.e. the µ̌-measure of the set (t, T] × U), denoted N^ǔ_{t,T}, has moments of all orders under P.

Proof. As noted in the proof of Lemma 4.15, non-anticipativity implies that any strategy u^S ∈ U^{S,W}_t can be written u^S(τ) := u_{τ−} ⊗_τ ũ^S(τ), with u ∈ U^W_t and ũ^S(τ) := (η^S_j(τ), β^S_j(τ))^{Ñ^S(τ)}_{j=1}
ensuring the existence of a C > 0 such that ‖Y^n‖_{S^2} ≤ C for all n ∈ N. Next, Remark 3.2 and Theorem 2.2.(ii) imply that for each η ∈ T and n ∈ N, the corresponding stopping time τ_n is optimal for (3.3) in the sense that in measure, which by uniform integrability gives strong convergence in H^p(W) × H^p(µ) for p ∈ [1, 2).