A note on the strong formulation of stochastic control problems with model uncertainty

We consider a Markovian stochastic control problem with model uncertainty. The controller (intelligent player) observes only the state and, therefore, uses feedback (closed-loop) strategies. The adverse player (nature), who does not have a direct interest in the pay-off, chooses open-loop controls that parametrize Knightian uncertainty. This creates a two-step optimization problem (like half of a game) over feedback strategies and open-loop controls. The main result shows that, under some assumptions, this yields the same value as half of the zero-sum symmetric game in which the adverse player also plays feedback strategies and actively tries to minimize the pay-off. The value function is independent of the filtration accessible to the adverse player. Aside from the modeling issue, the present note is a technical companion to [S13b].


Introduction
We consider a stochastic optimization problem over a state system which is formally identical to the one in [S13b]. However, in this model only one player is a true optimizer (the intelligent player), who tries to maximize the pay-off. The other control variable models Knightian uncertainty, creating an adverse player. This is not a zero-sum game, however, since the adverse player (nature) does not have a vested interest in minimizing the pay-off. Formally, the robust optimization problem appears to be identical to half of the game. When modeling it rigorously, we argue that this is actually not identical to half of a zero-sum game.
More precisely, we interpret the control problem with model uncertainty as a two-step optimization problem. The controller (intelligent player) observes only the state process, so he/she chooses feedback (closed-loop) strategies. The adverse player chooses open-loop controls, and such controls are actually adapted to a possibly larger filtration than the one generated by the Brownian motion. In other words, the adverse player, while not acting strategically against the controller, has access to the Brownian motion and other information, and may choose a parametrization of the model which happens to be totally adverse to the controller.
A similar model of robust control over feedback/closed-loop/positional strategies for the controller and open-loop controls for the adverse player has been considered in [KS88] in a deterministic setting. However, our discretization of time for the feedback strategies is different and, arguably, better fitted to the present case, where the system is stochastic. In addition, our note deals with the (in our view, important) issue of the filtration accessible to the adverse player. It is not obvious a priori that the value function does not depend on the filtration available to the adverse player. Proving that this is actually the case is part of our contribution.
There is a vast literature on robust optimization/model uncertainty, and we do not even attempt to scratch the surface in presenting the history of the problem. However, we have not encountered this very particular way to represent stochastic optimization problems with model uncertainty, i.e. a strong formulation over elementary feedback strategies for the controller vs. open-loop controls for nature, nor the technical result about the equality of the value functions that we obtain.
The message of the present note is two-fold: first, an optimization problem with model uncertainty is not the same as a zero-sum game, so it should be modeled differently. We propose to use feedback strategies for the controller and open-loop controls for the adverse player, obtaining a two-step/sup-inf optimization problem over strong solutions of the state system. Second, with this formulation, the value function is indeed equal to the (lower) value of the zero-sum game, where the adverse player is symmetric to the controller and also plays pure feedback strategies. Beyond the modeling issue, the mathematical statement does not seem obvious, and the proof is based on verification by Stochastic Perron's Method, along the lines of [S13b]. It is unclear how one could prove such a statement directly, using only the probabilistic representation of the value functions.

The Stochastic System
We consider a stochastic differential system of the form
$$dX_t = b(t, X_t, u_t, v_t)\,dt + \sigma(t, X_t, u_t, v_t)\,dW_t, \quad s \le t \le T, \qquad X_s = x,$$
starting at an initial time 0 ≤ s ≤ T at some position x ∈ R^d. Here, the control u chosen by the controller (intelligent player) belongs to some compact metric space (U, d_U), and the parameter v (chosen by the adverse player/nature) belongs to some other compact metric space (V, d_V) and represents the model uncertainty. In other words, the Brownian motion W represents the "known unknowns", and the process v stands for the "unknown unknowns", a.k.a. "Knightian uncertainty". The state X lives in R^d, and the process (W_t)_{s≤t≤T} is a d'-dimensional Brownian motion on a fixed probability space (Ω, F, P) with respect to some filtration F = (F_t)_{s≤t≤T} satisfying the usual conditions, which is usually larger than the augmented natural filtration generated by the Brownian motion, by which we mean
$$\mathcal{F}^W_t \triangleq \sigma(W_r,\ s \le r \le t) \vee \mathcal{N}(\mathbb{P}), \qquad s \le t \le T.$$
The space (Ω, F, P), the Brownian motion W and the filtration F may depend on s. To keep the notation simple, we do not emphasize the dependence on s unless needed. The coefficients b : [0, T] × R^d × U × V → R^d and σ : [0, T] × R^d × U × V → R^{d×d'} satisfy the

Standing assumption: b and σ are continuous on [0, T] × R^d × U × V and Lipschitz in x, uniformly with respect to (t, u, v).

Now, given a bounded and continuous function g : R^d → R, the controller is trying to maximize E[g(X_T^{s,x;u,v})]. Since v is "uncertain", optimizing "robustly" means optimizing the functional inf_v E[g(X_T^{s,x;u,v})], leading to the two-step optimization problem
$$V(s, x) \triangleq \sup_{u}\, \inf_{v}\, \mathbb{E}\big[g\big(X_T^{s,x;u,v}\big)\big].$$
It is not yet clear what u, v mean in the formulation above, and giving a precise meaning to this is one of the goals of the present note.
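To make the two-step (sup-inf) optimization concrete, the following sketch discretizes the state system with an Euler-Maruyama scheme and estimates E[g(X_T^{s,x;u,v})] by Monte Carlo, for constant controls on finite grids. The coefficients b, σ and the reward g below are purely illustrative choices (the note keeps them abstract), and the finite grids U_grid, V_grid only stand in for the compact spaces U, V.

```python
import math
import random

# Illustrative (hypothetical) coefficients -- the note leaves b, sigma abstract.
def b(t, x, u, v):        # drift, controlled by u (controller) and v (nature)
    return u * v * x

def sigma(t, x, u, v):    # diffusion coefficient
    return 0.2 + 0.1 * abs(v)

def g(x):                 # bounded continuous terminal reward
    return math.tanh(x)

def estimate_payoff(u, v, s=0.0, x0=1.0, T=1.0, n_steps=50, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[g(X_T^{s,x;u,v})] for constant controls u, v,
    using an Euler-Maruyama discretization of the state equation."""
    rng = random.Random(seed)
    dt = (T - s) / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, t = x0, s
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            x += b(t, x, u, v) * dt + sigma(t, x, u, v) * dw
            t += dt
        total += g(x)
    return total / n_paths

# Two-step optimization: sup over the controller's u, inf over nature's v.
U_grid = [-1.0, 0.0, 1.0]
V_grid = [-1.0, 0.0, 1.0]
robust_value = max(min(estimate_payoff(u, v) for v in V_grid) for u in U_grid)
```

Restricting to constant controls makes this only a crude lower-resolution picture of the problem; the whole point of the note is the precise meaning of u and v as feedback strategies and open-loop controls, respectively.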

Modeling a Zero-Sum Game
For an identical stochastic system, imagine that v represents the choice of another intelligent player and g(X_T^{s,x;u,v}) is the amount paid by the v-player to the u-player. For this closely related, but different, problem it was argued in [S13b] that, as long as both players only observe the state process, they should both play, symmetrically, as strategies, some feedback functionals u, v of restricted form.
We denote by C([s, T]) ≜ C([s, T]; R^d) and endow this path space with the natural (and raw) filtration B^s = (B^s_t)_{s≤t≤T} generated by the coordinate process. The elements of the path space C([s, T]) will be denoted by y(·) or y. The stopping times on the space C([s, T]) with respect to the filtration B^s, i.e. mappings τ : C([s, T]) → [s, T] such that {y : τ(y) ≤ t} ∈ B^s_t for all s ≤ t ≤ T, are called stopping rules, following [KS01]. We denote by B^s the class of such stopping rules starting at s.
An elementary strategy α starting at s, for the first intelligent player/controller, is defined by
• a finite non-decreasing sequence of stopping rules, i.e. τ_k ∈ B^s for k = 1, . . . , n, with s = τ_0 ≤ · · · ≤ τ_k ≤ · · · ≤ τ_n = T,
• for each k = 1, . . . , n, a constant value of the strategy ξ_k in between the times τ_{k−1} and τ_k, which is decided based only on the knowledge of the past state up to τ_{k−1}, i.e. ξ_k : C([s, T]) → U measurable with respect to B^s_{τ_{k−1}}.
The strategy is to hold ξ_k in between (τ_{k−1}, τ_k], i.e.
$$\alpha(t, y(\cdot)) \triangleq \sum_{k=1}^{n} \xi_k(y(\cdot))\, 1_{(\tau_{k-1}(y(\cdot)),\, \tau_k(y(\cdot))]}(t).$$
An elementary strategy β for the second player is defined in an identical way, but takes values in V. We denote by A(s) and B(s) the collections of all possible elementary strategies for the u-player and the v-player, respectively, given the initial deterministic time s.
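The definition above can be illustrated on a discretized path. The sketch below (all names hypothetical, time replaced by a grid index) encodes an elementary strategy through stopping rules tau_k that look only at the path, and actions xi_k that use only the path up to tau_{k-1}.

```python
# A minimal sketch of an elementary feedback strategy on a time grid.
# Stopping rules map a (discretized) path to a time index; actions xi_k
# see only the past of the path up to the previous stopping rule.

def tau_0(path):
    return 0                      # the initial time s

def tau_1(path):
    # first time the state exceeds a threshold, capped at T
    for i, x in enumerate(path):
        if x > 1.5:
            return i
    return len(path) - 1

def tau_2(path):
    return len(path) - 1          # the terminal time T

def xi_1(past):                   # constant action held on (tau_0, tau_1]
    return 1.0

def xi_2(past):                   # action on (tau_1, tau_2], decided at tau_1
    return -1.0 if past[-1] > 1.5 else 0.5

def alpha(i, path):
    """Evaluate the elementary strategy at time index i along `path`:
    hold xi_k on the interval (tau_{k-1}(path), tau_k(path)]."""
    taus = [tau_0(path), tau_1(path), tau_2(path)]
    xis = [xi_1, xi_2]
    for k in range(1, len(taus)):
        if taus[k - 1] < i <= taus[k]:
            return xis[k - 1](path[: taus[k - 1] + 1])  # uses only the past
    return xis[0](path[:1])       # convention at i = tau_0

path = [1.0, 1.2, 1.6, 1.4, 1.3]  # a discretized state trajectory
a1 = alpha(1, path)               # before the threshold is hit: xi_1
a3 = alpha(3, path)               # after tau_1 = 2: xi_2, seeing path[:3]
```

Note that each xi_k is fed only the slice of the path up to tau_{k-1}, mirroring the measurability requirement ξ_k ∈ B^s_{τ_{k−1}}.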
The main result in [S13b] is the description of the lower and upper values of such a zero-sum symmetric game over elementary feedback strategies. We recall the result, for convenience:

Theorem 2.2 Under the standing assumption, we have
1. for each α ∈ A(s), β ∈ B(s), there exists a unique strong solution (X_t^{s,x;α,β})_{s≤t≤T} (adapted to the augmented Brownian filtration, i.e. X_t^{s,x;α,β} ∈ F^W_t) of the closed-loop state system
$$dX_t = b\big(t, X_t, \alpha(t, X_\cdot), \beta(t, X_\cdot)\big)\,dt + \sigma\big(t, X_t, \alpha(t, X_\cdot), \beta(t, X_\cdot)\big)\,dW_t, \qquad X_s = x;$$
2. the functions
$$V^-(s, x) \triangleq \sup_{\alpha \in \mathcal{A}(s)}\, \inf_{\beta \in \mathcal{B}(s)}\, \mathbb{E}\big[g\big(X_T^{s,x;\alpha,\beta}\big)\big], \qquad V^+(s, x) \triangleq \inf_{\beta \in \mathcal{B}(s)}\, \sup_{\alpha \in \mathcal{A}(s)}\, \mathbb{E}\big[g\big(X_T^{s,x;\alpha,\beta}\big)\big]$$
are the unique bounded continuous viscosity solutions of the Isaacs equations (for i = − and i = +) of the game.
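For completeness, the Isaacs equations referred to above take the standard form below. This is a sketch using the usual lower/upper Hamiltonians; the precise statement (and assumptions) is that of [S13b].

```latex
\begin{align*}
&-w_t(t,x) - H^{\pm}\big(t, x, \nabla_x w(t,x), D^2_x w(t,x)\big) = 0
  \quad \text{on } [0,T)\times\mathbb{R}^d, \qquad w(T,\cdot) = g(\cdot),\\[2pt]
&H^-(t,x,p,M) \triangleq \sup_{u \in U}\, \inf_{v \in V}\,
  \Big( b(t,x,u,v)\cdot p
        + \tfrac12\,\mathrm{Tr}\big(\sigma(t,x,u,v)\,\sigma^{T}(t,x,u,v)\,M\big) \Big),\\
&H^+(t,x,p,M) \triangleq \inf_{v \in V}\, \sup_{u \in U}\,
  \Big( b(t,x,u,v)\cdot p
        + \tfrac12\,\mathrm{Tr}\big(\sigma(t,x,u,v)\,\sigma^{T}(t,x,u,v)\,M\big) \Big).
\end{align*}
```

The lower equation (with H^-) characterizes V^- and the upper equation (with H^+) characterizes V^+; the two coincide when the Isaacs condition H^- = H^+ holds.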

Back to Control with Model Uncertainty
In our setting, v does not represent an intelligent player: we can think about it as nature, which does not have a direct pay-off/vested interest from playing against player u. The controller (player u) does have a pay-off to maximize. It is still natural to assume that the controller only observes the state of the system, so he/she uses the same elementary feedback strategies α ∈ A(s). On the other hand, the adverse player, nature, can choose any parameter v, and can actually do so using the whole information available in the filtration F. In other words, we treat as the possible (uncertain) models the class V(s) of open-loop controls, i.e. measurable processes v : [s, T] × Ω → V adapted to the filtration F.

We would like to emphasize again that, in our model,
• nature uses open-loop controls v ∈ V(s), while the controller uses feedback strategies α ∈ A(s),
• nature's controls are adapted to the filtration F, which may be strictly larger than the one generated by the Brownian motion.
It is important, in our view, to obtain strong solutions of the state equation, and this is the main reason to restrict feedback strategies to the class of elementary strategies. Mathematically, our result states that the use of open-loop controls by the stronger player (here, nature), even adapted to a much larger filtration than the one generated by the "known randomness" W, does not change the value function from the one where the stronger player only observes the state process, as long as the weaker player only observes the state. Our main result can be rephrased as
$$V(s, x) = V^-(s, x).$$
This is the technical contribution of the note. In our understanding, this is not entirely obvious. The other part of the contribution is modeling the robust control/model uncertainty problem as the two-step optimization problem above.

Remark 2.4 1. One possible way to model the robust control problem is to assume that α is an Elliott-Kalton strategy (as in [EK72] or [FS89]) and v is an open-loop control.
While such an approach is present in the literature, we find it quite hard to justify the assumption that the controller can observe the changes in model uncertainty in real time, i.e. really observe v_t right at time t. Locally (over an infinitesimal time period), this amounts to nature first choosing the uncertainty parameter v and then, after observing v, the controller choosing u. This contradicts the very idea of Knightian uncertainty we have in mind. If one actually went ahead and modeled our control problem in such a way, then V would be equal to V^+, since the Elliott-Kalton player is the stronger player, as described above (see [FS89] for the mathematics, under stronger assumptions on the system).
2. Another way would be to model "nature" as the Elliott-Kalton strategy player β and let the controller/intelligent player use open-loop controls u. This does not appear too appealing either, since nature does not have any pay-off/vested interest. Why would nature be able to observe the controller's actions and act strategically against him/her? In addition, if the controller chooses open-loop controls, he/she needs to have the whole information in F available. The controller does not usually observe directly even the noise W, let alone the other possible information in F. However, with such a model, mathematically, the resulting value function is expected to be the same, V = V^− (see, again, [FS89], up to technical details).

Proofs
Proposition 3.1 Fix s, x and α ∈ A(s) and v ∈ V(s). Then there exists a unique strong (and square integrable) solution (X_t^{s,x;α,v})_{s≤t≤T}, X_t^{s,x;α,v} ∈ F_t, of the state equation
$$dX_t = b\big(t, X_t, \alpha(t, X_\cdot), v_t\big)\,dt + \sigma\big(t, X_t, \alpha(t, X_\cdot), v_t\big)\,dW_t, \qquad X_s = x.$$
The proof of the above proposition (both existence and uniqueness) is based on solving the equation successively on [τ_k(X_·^{s,x;α,v}), τ_{k+1}(X_·^{s,x;α,v})] for k = 0, . . . , n − 1, together with the following very simple lemma from [S13b].

Lemma 3.2 Fix s and let τ be a stopping rule, τ : C([s, T]) → [s, T], τ ∈ B^s. Let (X_t)_{s≤t≤T} be a process with continuous (all, not only almost surely) paths, which is adapted to F. Then the random time τ^X : Ω → [s, T] defined by τ^X(ω) ≜ τ(X_·(ω)) is a stopping time with respect to the filtration F. In addition, X_{τ^X} ∈ F_{τ^X}.
Before we proceed, let α ∈ A(s), β ∈ B(s). We can consider v_t ≜ β(t, X_·^{s,x;α,β}) ∈ V(s), such that X^{s,x;α,v} = X^{s,x;α,β}. This means that, for a fixed α, there are more open-loop controls nature can use than feedback strategies an adverse zero-sum player could use. This shows that
$$V(s, x) \le V^-(s, x).$$
The proof of the main Theorem relies on an adaptation of the Stochastic Perron's Method introduced in [S13b] for symmetric zero-sum games played over elementary feedback strategies. As mentioned, this is a technical companion to [S13b]. Compared to it, there are two technical differences that need to be pointed out right away:
1. the stochastic sub-solutions of the robust control problem need to be defined differently, to account for the fact that the adverse player is using open-loop controls;
2. the proof of the Perron scheme needs to be modified accordingly, to account for the same (but this difference is not major).
Following [S13b], we first define strategies starting at later times than the initial (deterministic) time s. The starting time is a stopping rule. An elementary strategy α ∈ A(s, τ), for the first player, starting at τ, is defined by
• (again) a finite non-decreasing sequence of stopping rules, i.e. τ_k ∈ B^s, k = 1, . . . , n, for some finite n, and with τ = τ_0 ≤ · · · ≤ τ_k ≤ · · · ≤ τ_n = T,
• for each k = 1, . . . , n, a constant action ξ_k in between the times τ_{k−1} and τ_k, which is decided based only on the knowledge of the past state up to τ_{k−1}, i.e. ξ_k : C([s, T]) → U such that ξ_k ∈ B^s_{τ_{k−1}}.
The notation is consistent with A(s) = A(s, s).
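Returning to the comparison between open-loop controls and feedback strategies for nature: the embedding β ↦ v noted above can be written out explicitly. The following is a sketch consistent with the notation of the note.

```latex
% For fixed \alpha \in \mathcal{A}(s), every elementary strategy
% \beta \in \mathcal{B}(s) induces the open-loop control
%   v_t \triangleq \beta(t, X^{s,x;\alpha,\beta}_\cdot) \in \mathcal{V}(s),
% with the same state process X^{s,x;\alpha,v} = X^{s,x;\alpha,\beta}.
% Hence the infimum over \mathcal{V}(s) runs over a larger set:
\begin{align*}
\inf_{v \in \mathcal{V}(s)} \mathbb{E}\big[g\big(X_T^{s,x;\alpha,v}\big)\big]
  &\le \inf_{\beta \in \mathcal{B}(s)} \mathbb{E}\big[g\big(X_T^{s,x;\alpha,\beta}\big)\big]
  \qquad \text{for every } \alpha \in \mathcal{A}(s),\\
V(s,x) = \sup_{\alpha \in \mathcal{A}(s)}\, \inf_{v \in \mathcal{V}(s)}
  \mathbb{E}\big[g\big(X_T^{s,x;\alpha,v}\big)\big]
  &\le \sup_{\alpha \in \mathcal{A}(s)}\, \inf_{\beta \in \mathcal{B}(s)}
  \mathbb{E}\big[g\big(X_T^{s,x;\alpha,\beta}\big)\big] = V^-(s,x).
\end{align*}
```

The substantial half of the main result is therefore the reverse inequality V ≥ V^-, which is obtained by the Perron scheme.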
We recall, still from [S13b], that strategies in A(s, τ ) cannot be used by themselves for the game starting at s, but have to be concatenated with other strategies.
Since w(T, ·) ≤ g(·), we obtain easily that w(s, x) ≤ V(s, x) for each stochastic sub-solution w. Since we have already characterized V^- as the unique solution of the lower Isaacs equation in [S13b], it turns out that we actually need only half of the Perron construction here. We denote by L the set of stochastic sub-solutions (non-empty by the boundedness assumptions) and define
$$v^-(s, x) \triangleq \sup_{w \in \mathcal{L}} w(s, x).$$
In order to prove the Theorem, we need two Lemmas, whose proofs are actually identical to the corresponding ones in [S13b].
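Schematically, the half of the Perron scheme that is needed can be summarized as follows. This is only a sketch, assuming the sub-solution convention w(T, ·) ≤ g and the comparison result for the lower Isaacs equation established in [S13b].

```latex
% Each stochastic sub-solution lies below the robust value, so taking the
% supremum over \mathcal{L} and then invoking the Perron argument together
% with comparison for the lower Isaacs equation:
\begin{align*}
v^-(s,x) \triangleq \sup_{w \in \mathcal{L}} w(s,x) &\le V(s,x)
  && \text{(each sub-solution is dominated by $V$)},\\
V^-(s,x) &\le v^-(s,x)
  && \text{(Perron: $v^-$ is a viscosity super-solution; compare with $V^-$)},\\
V^-(s,x) \le V(s,x) &\le V^-(s,x)
  && \text{(combining with the open-loop embedding)},
\end{align*}
```

which forces V = V^-, i.e. the statement of the main Theorem.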

Proof of the main Theorem
The proof is very close to that in [S13b], except for the fact that now v is an open-loop control when we apply Itô's formula. However, Itô's formula works just the same. We only sketch a couple of points of the proof, in order to avoid repeating arguments that have already appeared in [S13b].