On Markovian solutions to Markov Chain BSDEs

We study (backward) stochastic differential equations with noise coming from a finite state Markov chain. We show that, for the solutions of these equations to be 'Markovian', in the sense that they are deterministic functions of the state of the underlying chain, the integrand must be of a specific form. This allows us to connect these equations to coupled systems of ODEs, and hence to give fast numerical methods for the evaluation of Markov-Chain BSDEs.


Introduction
Over the past 20 years, the role of stochastic methods in control has been increasing. In particular, the theory of Backward Stochastic Differential Equations, initiated by Pardoux and Peng [9], has shown itself to be a useful tool for the analysis of a variety of stochastic control problems (see, for example, El Karoui, Peng and Quenez [6] for a review of applications in finance, or Yong and Zhou [11] for a more general control perspective). Recent work [4,5] has considered these equations where noise is generated by a continuous-time finite-state Markov Chain, rather than by a Brownian motion.
Applications of BSDEs frequently depend on the ability to compute solutions to these equations numerically. While part of the power of the theory of BSDEs is its ability to deal with non-Markovian control problems, the numerical methods that have been developed are typically still restricted to the Markovian case (see, for example, [2,1]). In this paper, we ask the question: when does a (B)SDE with underlying noise from a Markov Chain admit a 'Markovian' solution, that is, one which can be written as a deterministic function of the current state of the chain?
As we shall see, such a property implies strong restrictions on the parameters of the (B)SDE. However, these restrictions form a type of nonlinear Feynman-Kac result, connecting solutions of these SDEs to solutions of coupled systems of ODEs. This connection yields simple methods of obtaining numerical solutions to a wide class of BSDEs in this context.

Martingales and Markov Chains
Consider a continuous-time finite-state Markov chain $X$ on a probability space $(\Omega, P)$. (The case where $X$ is a countable state process can also be treated in this manner; we exclude it only for technical simplicity.) Without loss of generality, we shall represent $X$ as taking values from the standard basis vectors $e_i$ of $\mathbb{R}^N$, where $N$ is the number of states. An element $\omega \in \Omega$ can be thought of as describing a path of the chain $X$.
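Let $\{\mathcal{F}_t\}$ be the completion of the filtration generated by $X$, that is,
\[ \mathcal{F}_t = \sigma(X_s : s \le t) \vee \mathcal{N}, \]
where $\mathcal{N}$ denotes the collection of $P$-null sets. As $X$ is a right-continuous pure jump process which does not jump at time 0, this filtration is right-continuous. We assume that $X_0$ is deterministic, so $\mathcal{F}_0$ is the completion of the trivial $\sigma$-algebra.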
Let $A$ denote the rate matrix of the chain $X$. As we do not assume time-homogeneity, $A$ is permitted to vary (deterministically) through time. We shall assume for simplicity that the rate of jumping from any state is bounded, that is, all components of $A$ are uniformly bounded in time. Note that $(A_t)_{ij} \ge 0$ for $i \ne j$ and $\sum_i (A_t)_{ij} = 0$ for all $j$ (the columns of $A$ all sum to 0).
It will also be convenient to assume that $P(X_t = e_i) > 0$ for any $t > 0$ and any basis vector $e_i \in \mathbb{R}^N$, that is, there is instant access from our starting state to any other state of the chain. None of our results depend on this assumption in any significant way; however, without it we would constantly be forced to specify very peculiar null sets, for states which cannot be accessed before time $t$. If we were to assume time-homogeneity (that is, $A$ is constant in $t$), this assumption would simply be that our chain is irreducible. However, this assumption does not mean that $A_{ij} > 0$ for all $i \ne j$.
From a notational perspective, as $e_i$ denotes the $i$th standard basis vector in $\mathbb{R}^N$, the $i$th component of a vector $v$ is written $e_i^* v$, where $[\cdot]^*$ denotes vector transposition. This implies that useful quantities can be written simply in terms of vector products. For example, we have $I_{X_t = e_i} = e_i^* X_t$.
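To make the conventions above concrete, the following is a minimal sketch of simulating one path of a time-homogeneous chain whose rate matrix has columns summing to zero, and of reading off $X_t$ as a basis vector. The 3-state matrix `A`, the helper names `simulate_path` and `state_vector`, and the restriction to constant rates are all assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def simulate_path(A, x0, T, rng):
    """Simulate one path of a time-homogeneous chain on [0, T].

    A follows the column convention: A[i, j] >= 0 (i != j) is the rate of
    jumping from state j to state i, and every column sums to zero.
    Returns the jump times (starting with 0) and the states visited.
    """
    times, states = [0.0], [x0]
    t, j = 0.0, x0
    while True:
        rate_out = -A[j, j]                   # total rate of leaving state j
        if rate_out <= 0.0:
            break                             # absorbing state
        t += rng.exponential(1.0 / rate_out)  # exponential holding time
        if t >= T:
            break
        probs = A[:, j].copy()
        probs[j] = 0.0
        probs /= rate_out                     # jump distribution out of j
        j = int(rng.choice(len(probs), p=probs))
        times.append(t)
        states.append(j)
    return np.array(times), np.array(states)

def state_vector(times, states, t, N):
    """Return X_t as a standard basis vector of R^N (right-continuous path)."""
    k = np.searchsorted(times, t, side="right") - 1
    return np.eye(N)[states[k]]

# Hypothetical 3-state rate matrix; note that a direct jump 0 -> 2 is impossible.
A = np.array([[-1.0, 0.5, 0.0],
              [1.0, -1.5, 2.0],
              [0.0, 1.0, -2.0]])
rng = np.random.default_rng(0)
times, states = simulate_path(A, x0=0, T=5.0, rng=rng)
X_t = state_vector(times, states, 2.0, N=3)
indicator = np.eye(3)[1] @ X_t   # e_1^* X_t, the indicator of {X_t = e_1}
```

Here `indicator` equals 1 exactly when the simulated chain is in the second state at time 2, mirroring the identity $I_{X_t = e_i} = e_i^* X_t$.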

Markov-Chain SDEs
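We now relate our Markov chain to an $N$-dimensional martingale process, with which we can study SDEs. To do this, we write our chain in the following way
\[ X_t = X_0 + \int_{]0,t]} A_u X_{u-}\,du + M_t, \]
where $M$ is a locally-finite-variation pure-jump martingale in $\mathbb{R}^N$. Our attention is then on the properties of stochastic integrals with respect to $M$. We shall make some use of the following seminorm, which arises from the Itô isometry.
Definition 1. Let $Z$ be a vector in $\mathbb{R}^N$. Define the stochastic seminorm
\[ \|Z\|^2_{M_t} := \operatorname{Tr}\big(Z^* \Psi(A_t, X_{t-}) Z\big), \]
where $\operatorname{Tr}$ denotes the trace and
\[ \Psi(A_t, X_{t-}) := \frac{d\langle M, M\rangle_t}{dt} = \operatorname{diag}(A_t X_{t-}) - A_t \operatorname{diag}(X_{t-}) - \operatorname{diag}(X_{t-}) A_t^* \tag{1} \]
is the matrix of derivatives of the quadratic covariation matrix of $M$. This seminorm has the property that
\[ E\Big[\Big(\int_{]0,t]} Z_u^*\,dM_u\Big)^2\Big] = E\Big[\int_{]0,t]} \|Z_u\|^2_{M_u}\,du\Big] \]
for any predictable process $Z$ of appropriate dimension. We define the equivalence relation $\sim_M$ on the space of predictable processes by $Z \sim_M Z'$ if and only if $\|Z_t - Z'_t\|_{M_t} = 0$ $dt \times dP$-a.s.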
Remark 1. A consequence of this choice of seminorm is that $Z + c\mathbf{1} \sim_M Z$ for any $Z$ and any predictable scalar process $c$. This is simply because $\sum_i e_i^* A e_j = \sum_i A_{ij} = 0$ for all $j$, and so all row and column sums of $\Psi(A_t, X_t)$ are zero.
Theorem 1. Every scalar square-integrable martingale $L$ can be written in the form
\[ L_t = L_0 + \int_{]0,t]} Z_u^*\,dM_u \]
for some predictable process $Z$ taking values in $\mathbb{R}^N$. The process $Z$ is unique up to equivalence $\sim_M$.
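The seminorm and Remark 1 can be checked numerically on a small example. The sketch below assumes the expression $\Psi(A, x) = \operatorname{diag}(Ax) - A\operatorname{diag}(x) - \operatorname{diag}(x)A^*$ for the quadratic covariation density, which is the standard form for a chain with column-convention rate matrix; the 3-state matrix and the function names `psi` and `seminorm_sq` are hypothetical.

```python
import numpy as np

def psi(A, x):
    """Psi(A, x) = diag(A x) - A diag(x) - diag(x) A^T for a basis vector x."""
    return np.diag(A @ x) - A @ np.diag(x) - np.diag(x) @ A.T

def seminorm_sq(z, A, x):
    """Squared stochastic seminorm z^T Psi(A, x) z when the chain is in state x."""
    return float(z @ psi(A, x) @ z)

# Hypothetical rate matrix (columns sum to zero); state 2 is not reachable
# from state 0 in a single jump because A[2, 0] = 0.
A = np.array([[-1.0, 0.5, 0.0],
              [1.0, -1.5, 2.0],
              [0.0, 1.0, -2.0]])
e0 = np.eye(3)[0]
z = np.array([1.0, -2.0, 0.5])

base = seminorm_sq(z, A, e0)            # rate(0 -> 1) * (z_1 - z_0)^2 = 9
shifted = seminorm_sq(z + 7.0, A, e0)   # adding a constant changes nothing

z_far = z.copy()
z_far[2] = 100.0                        # alter the component of an unreachable state
unreachable = seminorm_sq(z_far, A, e0)
```

In this example all three values agree: the seminorm in state $e_0$ only sees the differences $z_i - z_0$ weighted by the jump rates out of $e_0$, so it ignores both constant shifts and components belonging to states that cannot be reached in one jump.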
The key SDEs which we shall study are equations of the form
\[ dY_t = -f(\omega, t, Y_{t-}, Z_t)\,dt + Z_t^*\,dM_t, \tag{2} \]
where $f : \Omega \times \mathbb{R}^+ \times \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}$ is a progressively measurable function, $Y$ is an adapted process with $E[\sup_{t<T} Y_t^2] < \infty$ for all $T$ and $Z$ is predictable. A solution $Y$ to this equation is a special semimartingale, hence the canonical decomposition into a predictable part ($-\int f(\omega, t, Y_{t-}, Z_t)\,dt$) and a martingale part ($\int Z_t^*\,dM_t$) is unique. We shall make the general assumption that $f(\omega, t, y, \cdot)$ is invariant with respect to equivalence $\sim_M$, that is, if $Z \sim_M Z'$ then
\[ f(\omega, t, y, Z_t) = f(\omega, t, y, Z'_t) \]
up to indistinguishability. This assumption is important, as it ensures that $f$ only considers $Z$ in the same way as it affects the integral $\int Z^*\,dM$.
In terms of existence and uniqueness of solutions to (2) we shall focus on two key cases,
• first, when $Y_0 \in \mathbb{R}$ and a predictable process $\{Z_t\}_{t \ge 0}$ are given, and so (2) is a forward SDE, and
• second, when $Y_T \in L^2(\mathcal{F}_T)$ is given for some $T > 0$, and $Z_t$ is chosen to solve the backward SDE, that is, to ensure that $Y$ is adapted and $Z$ is predictable.
For the forward equation, the existence and uniqueness of the solution process $Y$ is classical, under various assumptions on the driver $f$. For the backward equation, under the assumption of Lipschitz continuity of the driver $f$, a result on existence and uniqueness of the pair $(Y, Z)$ is given in [4] (where $Z$ is unique up to equivalence $\sim_M$, as defined in Definition 1). In this paper, we shall not focus on determining conditions on $f$ such that existence and uniqueness of solutions holds, but shall always assume that sufficient conditions are placed on $f$ such that the equation of interest has a unique solution.
Lemma 1. For any pair $(Y, Z)$ satisfying (2), the pair $(Y, Z')$ also satisfies (2) if and only if $Z \sim_M Z'$.
Proof. By the canonical decomposition of $Y$ into a finite variation and a martingale part, we see that $\int_{]0,t]} Z^*\,dM = \int_{]0,t]} (Z')^*\,dM$ up to indistinguishability, and so $Z \sim_M Z'$. Conversely, if $Z \sim_M Z'$, then by our assumption on $f$, $(Y, Z')$ will also satisfy (2), up to indistinguishability.
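Lemma 2. Let $L$ be a scalar square-integrable martingale and $Z$ a predictable process such that $\Delta L_t = Z_t^* \Delta M_t$ up to indistinguishability. Then $L_t = L_0 + \int_{]0,t]} Z_u^*\,dM_u$.
Proof. Let $L$ have a representation $dL_t = (Z'_t)^*\,dM_t$ for some predictable process $Z'$. Then as $\Delta L_t = Z_t^* \Delta M_t$ up to indistinguishability, we must have that
\[ (Z_t - Z'_t)^* \Delta M_t = 0 \]
up to indistinguishability. One can verify that a predictable set with this property is the union of a $dt \times dP$-null set and a set on which $A_t X_{t-} \equiv 0$. However, if $A_t X_{t-} = 0$ then $d\langle M, M\rangle/dt = 0$, and so we see that $Z \sim_M Z'$. As the martingale representation is unique up to equivalence $\sim_M$, we have our result.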

Markovian solutions to (B)SDEs
Definition 2. We say a stochastic process $Y$ is Markovian if, up to indistinguishability, $Y_t$ depends on $\omega$ only as a function of $X_t$, that is, it can be written as
\[ Y_t = u(t, X_t) \]
for some deterministic function $u$.
We note that this is, in some ways, a misnomer, as a process satisfying Definition 2 need not be Markovian in the sense that Y t is conditionally independent of Y r given Y s , for all r < s < t. Nevertheless, this terminology is standard in the theory of BSDEs, and describes adequately the property of interest.
Our key result on the structure of Markovian solutions to (2) is the following.
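Theorem 2. Suppose $Y$ has dynamics given by (2). Then the following statements are equivalent:
1. $Y$ is Markovian, that is, $Y_t = u(t, X_t)$ for some deterministic function $u$.
2. (i) $f$ is Markovian, that is, $f(\omega, t, Y_{t-}, Z_t) = \tilde f(X_{t-}, t, Y_{t-}, Z_t)$ for some function $\tilde f : \{e_1, \dots, e_N\} \times \mathbb{R}^+ \times \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}$, and (ii) up to equivalence $\sim_M$, $Z$ is a deterministic process with $Y_0 = X_0^* Z_0$ and dynamics $e_i^*\,dZ_t = -\big(\tilde f(e_i, t, e_i^* Z_t, Z_t) + e_i^* A_t^* Z_t\big)\,dt$ for each $i$.
Furthermore, under either set of conditions, $e_i^* Z_t = u(t, e_i)$ up to equivalence $\sim_M$.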
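Proof. 1 implies 2. From (2), $Y$ is continuous except at a jump of $X$. By boundedness of $A$, for any bounded interval $[a, b]$, with positive probability there will not be a jump of $X$ in $[a, b]$. Therefore, for each $i$, considering the non-null set $\{\omega : X_t = e_i \text{ for all } t \in [a, b]\}$, we see that $u(t, e_i)$ is continuous in $t$. Consider a jump of $X$ which occurs at the stopping time $\tau$.
Define the process $Z'_t$ with components
\[ e_i^* Z'_t = u(t, e_i). \]
Then we have $(Z'_\tau)^* X_\tau = Y_\tau$ and $(Z'_\tau)^* X_{\tau-} = Y_{\tau-}$, for every jump of $X$. Hence $Z'$ is a predictable (indeed, deterministic) process such that $\Delta L = (Z')^* \Delta M$, where $L$ denotes the martingale part of $Y$, and by Lemma 2, we see $Z \sim_M Z'$. Now note that, except on the thin set $\{\Delta X_t \ne 0\}$,
\[ X_t^*\,dZ'_t = dY_t = -\big(f(\omega, t, Y_t, Z'_t) + (Z'_t)^* A_t X_t\big)\,dt. \tag{3} \]
Considering these dynamics on the non-null set $\{\omega : X_t = e_i\}$, we obtain $e_i^*\,dZ'_t = -\big(f(\omega, t, e_i^* Z'_t, Z'_t) + e_i^* A_t^* Z'_t\big)\,dt$. As $Z'_t$ is deterministic, by rearrangement, we see that $f(\omega, t, Y_t, Z'_t)$ does not vary with $\omega$ on the sets where $X_t$ is constant. Therefore, we can write $\tilde f(X_t, t, Y_t, Z'_t)$ as the common value taken by $f$ on these sets. Finally, we see that replacing $f$ by $\tilde f$ and using the fact that $e_i^* Z'_t = u(t, e_i)$ in (3) yields the desired dynamics for $Z$.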
2 implies 1. By uniqueness of solutions to (2), as $Y_0 = X_0^* Z_0$ and $Z$ has the prescribed dynamics, we know that $Y_t = X_0^* Z_t$ up to the first jump of $X$. At the first jump time $\tau$, we see that
\[ \Delta Y_\tau = Z_\tau^* \Delta M_\tau = Z_\tau^* (X_\tau - X_{\tau-}), \]
and so $Y_\tau = X_\tau^* Z_\tau$ almost surely. Repeating this argument and using induction on the (almost surely countable) sequence of jumps of $X$, we see that $Y_t = X_t^* Z_t$ up to indistinguishability. However $Z_t$ is deterministic, so we can define a deterministic function $u(t, e_i) = e_i^* Z_t$, and we see that $Y_t = u(t, X_t)$.
The following corollary provides a frequently more convenient way to analyse such equations.
Corollary 1. Let $Y$ be as in Theorem 2, with associated function $u$. Write $u_t$ for the column vector with elements $u(t, e_i)$. Write $\mathbf{f}(t, u_t)$ for the column vector with elements $e_i^* \mathbf{f}(t, u_t) := \tilde f(e_i, t, e_i^* u_t, u_t)$. Then $u$ satisfies the vector ordinary differential equation
\[ du_t = -\big(\mathbf{f}(t, u_t) + A_t^* u_t\big)\,dt. \tag{4} \]
Proof. Simply note that $u = Z$ in the proof of the theorem, and so the dynamics are as given.
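The vector ODE of Corollary 1 can also be recovered by a short formal calculation, which may help in keeping track of signs and transposes. The sketch below assumes the decomposition $dX_t = A_t X_{t-}\,dt + dM_t$, BSDE dynamics of the form $dY_t = -f\,dt + Z_t^*\,dM_t$, and that $u_t$ is differentiable in $t$; it is a heuristic derivation under these assumptions rather than a replacement for the proof of Theorem 2.

```latex
% Formal derivation, assuming Y_t = X_t^* u_t with u_t deterministic and differentiable.
\begin{align*}
dY_t &= X_{t-}^*\,du_t + u_t^*\,dX_t
   && \text{(product rule; $u$ is continuous, so there is no covariation term)} \\
     &= X_{t-}^*\big(\dot u_t + A_t^* u_t\big)\,dt + u_t^*\,dM_t
   && \text{(using $dX_t = A_t X_{t-}\,dt + dM_t$ and $u_t^* A_t X_{t-} = X_{t-}^* A_t^* u_t$)}.
\end{align*}
% Matching the martingale and finite-variation parts with dY_t = -f dt + Z_t^* dM_t:
\begin{align*}
Z_t &= u_t \quad \text{(up to equivalence $\sim_M$)}, \\
X_{t-}^*\big(\dot u_t + A_t^* u_t\big) &= -\tilde f(X_{t-}, t, X_{t-}^* u_t, u_t).
\end{align*}
% Taking X_{t-} = e_i for each i gives the componentwise form of the ODE:
\[
e_i^* \dot u_t = -\big(\tilde f(e_i, t, e_i^* u_t, u_t) + e_i^* A_t^* u_t\big),
\qquad i = 1, \dots, N.
\]
```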
Corollary 2. Let $Y$ be the solution to a BSDE with Markovian terminal condition $Y_T = \phi(X_T)$, for some deterministic function $\phi : \{e_1, \dots, e_N\} \to \mathbb{R}$, and with a Markovian driver $\tilde f$ as in Theorem 2. Then $Y$ is Markovian,
• the associated vector $u_t$ satisfies the ODE (4) and
• the solution process $Z$ is given by $Z_t = u_t$.
In particular, note that $Z$ is deterministic and continuous.
Proof. Simply define $u_t$ as the solution to the ODE (4), working backwards in time, with initial value $e_i^* u_T = \phi(e_i)$. Then the pair of processes $(Y_t, Z_t) := (X_t^* u_t, u_t)$ is a solution to the BSDE with dynamics (2), and hence is unique (up to equivalence $\sim_M$ for $Z$). The remaining properties follow directly from the theorem.
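Corollary 2 reduces the BSDE to a backward initial value problem. As a sanity check, the sketch below integrates an ODE of the form $du_t = -(\mathbf{f}(t,u_t) + A^* u_t)\,dt$ backwards from $u_T = \phi$ with a hand-written fourth-order Runge-Kutta step, and compares the case $\mathbf{f} \equiv 0$ with the conditional expectation $E[\phi(X_T) \mid X_0 = e_i] = e_i^* e^{A^* T}\phi$. The constant 3-state rate matrix, the payoff vector and the helper names are assumptions chosen for illustration; any standard ODE solver could be used instead.

```python
import numpy as np

def solve_backward(A, f, phi, T, steps=2000):
    """Integrate du/dt = -(f(t, u) + A^T u) from u_T = phi back to t = 0 (RK4)."""
    h = T / steps
    def rhs(t, u):
        return -(f(t, u) + A.T @ u)
    u, t = np.asarray(phi, dtype=float).copy(), T
    for _ in range(steps):
        # Classical RK4 with a negative step of size h.
        k1 = rhs(t, u)
        k2 = rhs(t - h / 2, u - h / 2 * k1)
        k3 = rhs(t - h / 2, u - h / 2 * k2)
        k4 = rhs(t - h, u - h * k3)
        u = u - h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t -= h
    return u  # approximation of u_0

def expm_taylor(M, terms=60):
    """Matrix exponential by truncated Taylor series (adequate for small matrices)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

# Hypothetical time-homogeneous example (columns of A sum to zero).
A = np.array([[-1.0, 0.5, 0.0],
              [1.0, -1.5, 2.0],
              [0.0, 1.0, -2.0]])
phi = np.array([0.0, 1.0, 3.0])   # terminal values phi(e_i)
T = 1.0

zero_driver = lambda t, u: np.zeros_like(u)
u0 = solve_backward(A, zero_driver, phi, T)
expected = expm_taylor(A.T * T) @ phi   # E[phi(X_T) | X_0 = e_i], stacked over i
Y0 = np.eye(3)[0] @ u0                  # Y_0 = X_0^* u_0 when X_0 = e_0
```

With a zero driver the two vectors `u0` and `expected` should agree to solver accuracy, which is the linear Feynman-Kac relation; a nonzero driver is handled by passing a different `f`.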

Brownian BSDE and semilinear PDE
It is worth comparing these results with those for BSDEs driven by Brownian motion. In the simplest classical case, suppose the filtration is generated by a scalar Brownian motion $W$, and consider a BSDE of the form
\[ dY_t = -f(W_t, t, Y_t, Z_t)\,dt + Z_t\,dW_t, \qquad Y_T = \phi(W_T). \]
Then, one can show from the Feynman-Kac theorem (see, for example, [6] or [10]) that $Y_t = u(t, W_t)$, where $u$ is the viscosity solution to the semilinear PDE
\[ \partial_t u + \tfrac{1}{2}\partial^2_{xx} u = -f(x, t, u, \partial_x u), \qquad u(T, x) = \phi(x). \]
Comparing this to our equation (4) for $u_t$, we see that the first difference is that we have replaced the infinitesimal generator of the Brownian motion ($\tfrac{1}{2}\partial^2_{xx}$) by the infinitesimal generator of the Markov chain ($A_t^*$).
The second difference is that, in the Brownian case, $f(x, t, \cdot, \cdot)$ depends only on the behaviour of $u$ in a neighbourhood of $x$. In the Markov chain case, as our state space does not have a nice topological structure, it would seem that $e_i^* \mathbf{f}$ can depend on all the values of $u$, not only on those 'close' to the $i$th coordinate of $u$.
However, when we examine $e_i^* \mathbf{f}(t, u_t) = \tilde f(e_i, t, e_i^* u_t, u_t)$, we see that $e_i^* \mathbf{f}$ depends only on the $i$th coordinate of $u$, and on those properties of $u$ which are invariant up to equivalence $\sim_M$. In particular, if a jump from state $e_i$ to state $e_j$ is not possible, then $e_i^* \mathbf{f}$ cannot depend on the value of $e_j^* u_t$ (as changing this value will give a vector which is equivalent to $u_t$ up to equivalence $\sim_M$). Similarly, if a constant is added to every element of $u_t$, $e_i^* \mathbf{f}$ will change only through the dependence on $e_i^* u_t$, rather than through any other element of the vector. In this sense, the 'local' dependence is preserved by the equivalence relation.
An alternative way of thinking through the relationship between our result and those known in the Brownian case is through the following diagram, indicating how an equation of one type can be converted into another.

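[Diagram not reproduced: Brownian BSDE and semilinear PDE, linked by Feynman-Kac; Markov chain BSDE and coupled ODE system, linked by ( * ); the two pairs are related by spatial discretisation.]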
Our result provides the link indicated by ( * ). It is natural to think that, given appropriate choices of spatial discretisations and finite element methods, this diagram will commute.

Calculating BSDE solutions
As we have shown that there is a connection between BSDEs driven by Markov chains and systems of ODEs, it is natural to use this connection for the purposes of computation. As with classical BSDE, this connection can be exploited in both directions, depending on the problem at hand. As mentioned before, various practical problems can be analysed using the framework of BSDE. Of particular interest are dynamic risk measures and nonlinear pricing systems, as described in [5]. By connecting this theory with the theory of ODEs, we gain access to the large number of tools available for the numerical calculation of ODE solutions, the only concern being the dimensionality of the problem. We note that while we shall consider an example from finance, the same methods can be applied in other areas of stochastic control.
We give a practical example taken from Madan, Pistorius and Schoutens [8]. In that paper a 1600-state Markov chain on a non-uniform spatial grid is created to match the behaviour of a stock price in discrete time, assuming that there exists an underlying Variance-Gamma local Lévy process, based on the CGMY model of [3]. The transition probabilities are fitted in discrete time from month to month, yielding a calibrated discrete time risk-neutral transition matrix.
From this matrix, we extract a continuous-time Markov chain approximation, using a carefully constructed approximation of the matrix logarithm. (An approximation is needed as the matrix in question is large, and possibly due simply to calibration error, does not exactly correspond to the skeleton matrix of a continuous time Markov chain. We hope to give the details of this approximation in future work.) For our purposes, the only relevant quantities are the grid used, (that is, the value of the underlying stock in each state), and the rate matrix of the Markov chain.
Let S(X t ) denote the value of the stock in state X t . We shall consider the risk-averse valuation of a contingent claim using our BSDE. To do this, we fix the terminal value as a function of the state Y T = φ(S(X T )) for some function φ. We then take the risk-neutral valuation E[Y T ], and dynamically perturb this through the use of a BSDE with concave driver (if the driver were f ≡ 0, then we would simply obtain the risk-neutral price E[Y T ]). This yields a process Y t = u(t, X t ), which we interpret as the ask price (that is, the amount an agent is willing to pay, at time t in state X t ) of the terminal claim Y T .
The value Y t can be thought of as containing both the risk-neutral price and a correction due to risk-aversion. Through the use of a BSDE, we ensure that this correction can be dynamically updated, and so our prices are consistent through time (they do not admit arbitrage, see [5]).
If we think of Y t as the ask price, that is, the amount an agent is willing to pay to purchase Y T , then it is natural to also ask how much he would be willing to sell Y T for, that is, the bid price. This value corresponds to the negative of the solution of the BSDE with terminal value −Y T (as selling Y T is equivalent to purchasing −Y T , and we change the sign of the final solution so that it represents an inward rather than outward cashflow).
To solve this equation, we then convert our BSDE with driver f and terminal value φ(S(X T )) into a coupled system of ODEs, using Theorem 2. It is then a simple exercise to use any standard ODE toolbox (for our examples we have used the ode45 IVP solver in Matlab) to solve the relevant ODE system. Example 1. Consider the BSDE with driver f (X t , t, z) = min This equation corresponds to uncertainty about the overall rate of jumping from the current state -the parameter α ≥ 1 determines the scale of the uncertainty. The uncertainty is, however, only about the overall scale of the jump rate -the relative rates of jumping into different states remain the same.
Note that this driver is concave, and satisfies the requirements of the comparison theorem in [5]. Hence the solutions to this BSDE give a 'concave nonlinear expectation' $\mathcal{E}(Q|\mathcal{F}_t) := Y_t$ in the terminology of [5]. The driver is also positively homogeneous (that is, $f(X_t, t, \lambda z) = \lambda f(X_t, t, z)$ for all $\lambda > 0$) and so the nonlinear expectation is positively homogeneous, that is, it does not depend on the units of measurement.
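To illustrate the kind of computation involved, the following sketch solves the ODE system for one driver that is consistent with the description in Example 1. It assumes that the uncertainty rescales all jump rates out of the current state by a common factor $r \in [\alpha^{-1}, \alpha]$, giving $f(e_i, t, z) = \min_{r}(r - 1)\,z^* A e_i$; this specific form, the 3-state rate matrix, the payoff and the value $\alpha = 1.5$ are assumptions for illustration, not the calibrated model of [8]. A hand-written Runge-Kutta step stands in for a library ODE solver.

```python
import numpy as np

def scale_uncertainty_driver(A, alpha):
    """Assumed driver: min over r in [1/alpha, alpha] of (r - 1) z^T A e_i, for each i."""
    def f(t, u):
        s = A.T @ u   # s_i = u^T A e_i, the risk-neutral drift of Y in state i
        # The objective is linear in r, so the minimum sits at an endpoint.
        return np.minimum((1.0 / alpha - 1.0) * s, (alpha - 1.0) * s)
    return f

def solve_backward(A, f, phi, T, steps=2000):
    """Integrate du/dt = -(f(t, u) + A^T u) from u_T = phi back to t = 0 (RK4)."""
    h = T / steps
    rhs = lambda t, u: -(f(t, u) + A.T @ u)
    u, t = np.asarray(phi, dtype=float).copy(), T
    for _ in range(steps):
        k1 = rhs(t, u)
        k2 = rhs(t - h / 2, u - h / 2 * k1)
        k3 = rhs(t - h / 2, u - h / 2 * k2)
        k4 = rhs(t - h, u - h * k3)
        u = u - h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t -= h
    return u

A = np.array([[-1.0, 0.5, 0.0],
              [1.0, -1.5, 2.0],
              [0.0, 1.0, -2.0]])
phi = np.array([0.0, 1.0, 3.0])
T, alpha = 1.0, 1.5

f = scale_uncertainty_driver(A, alpha)
ask = solve_backward(A, f, phi, T)        # amount the agent is willing to pay
bid = -solve_backward(A, f, -phi, T)      # negative of the solution with terminal value -phi
neutral = solve_backward(A, lambda t, u: np.zeros_like(u), phi, T)
scaled = solve_backward(A, f, 2.0 * phi, T)
```

In this toy setting the risk-neutral value lies between `ask` and `bid` in every state, and `scaled` equals `2 * ask`, reflecting the positive homogeneity noted above.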
For our numerical example, we use a timeframe of one month, set α = 1. The results of solving the resultant system of ODEs can be seen in Figures 1 and 2. In Figure 1 we plot the surface generated by plotting the solution of the $i$th term of the ODE against the stock price $S(e_i)$. The asymmetry that can be seen in Figure 1 is due to asymmetry in the rate matrix of the Markov chain; the volatility of the stock depends on its current level. The prices look qualitatively similar to what one would obtain using the classic expectation (which, in fact, would lie between the bid and ask curves in Figure 2). As one would hope, the bid price lies above the ask price; however, we can see that the difference (the bid-ask spread) is not constant, and is higher when the stock price is larger.
An aspect of these prices which is not immediately apparent from the figures is that they are dynamically consistent, that is, if prices evolve in this manner, then one cannot make an arbitrage profit. This follows from the fact that these prices satisfy a BSDE, see [5].
Example 2. In [8], various prices are determined using the estimated discrete time model and a nonlinear pricing rule. The technique used is to apply a concave distortion to the cumulative distribution function of the values one step ahead. That is, the value at time 0 of a payoff at time 1 with cdf $F$ is given by
\[ \int_{\mathbb{R}} x \, d\psi(F(x)), \qquad \psi(u) = 1 - \big(1 - u^{1/(1+\gamma)}\big)^{1+\gamma}, \]
for some 'stress level' $\gamma > 0$. This is called the minmaxvar distortion.
In the same vein, we now consider the use of the driver
\[ f(X_t, t, z) = z^* (\tilde A^z_t - A_t) X_t, \]
where $\tilde A^z_t$ arises from the continuous time analogue of minmaxvar, where we distort the relative rates of jumps to each of the non-current states. This is defined by the following algorithm:
Minmaxvar rate matrix distortion
1. Sort the components $z_i$ of $z$ to give an increasing sequence $z_{\pi(i)}$, where $\pi$ is a permutation of $\{1, ..., N\}$.
2. Define the cumulative sum of the corresponding sorted rates (excluding the current state).
3. Apply a concave distortion $\psi$ to the scaled cumulative sum $G(i)$.
4. Define the individual distorted rates $q_i = \psi(G(i)) - \psi(G(i-1))$, with initial term $q_1 = G(1)$.
5. Unsort these rates to define a distorted rate matrix $\tilde A^z_t$ by $e_i^* \tilde A^z_t X_t = q_{\pi^{-1}(i)}$, with the diagonal then chosen to give row sums of zero.
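The steps above can be sketched in code as follows. Several details are assumptions on our part: the cumulative rates are scaled by the total jump rate out of the current state so that they lie in $[0, 1]$, the distorted increments are rescaled by the same total, the first increment is taken as $\psi(G(1)) - \psi(0)$, the diagonal is set so that each column sums to zero (matching the column convention used for $A$), and the driver is taken to be $z^*(\tilde A^z - A)e_i$. The rate matrix and function names are hypothetical.

```python
import numpy as np

def minmaxvar(u, gamma):
    """Concave distortion psi(u) = 1 - (1 - u^(1/(1+gamma)))^(1+gamma) on [0, 1]."""
    return 1.0 - (1.0 - u ** (1.0 / (1.0 + gamma))) ** (1.0 + gamma)

def distorted_rates(z, A, j, gamma):
    """Column j of a distorted rate matrix, given values z and current state j."""
    N = len(z)
    others = np.array([i for i in range(N) if i != j])
    rates = A[others, j]                 # jump rates out of state j
    total = rates.sum()
    col = np.zeros(N)
    if total <= 0.0:
        return col                       # absorbing state: nothing to distort
    order = np.argsort(z[others], kind="stable")                # step 1: sort z increasingly
    G = np.clip(np.cumsum(rates[order]) / total, 0.0, 1.0)      # step 2: scaled cumulative rates
    distorted = minmaxvar(G, gamma)                             # step 3: concave distortion
    q = np.diff(np.concatenate(([0.0], distorted)))             # step 4: increments
    col[others[order]] = total * q                              # step 5: unsort and rescale
    col[j] = -col.sum()                  # diagonal entry: column sums to zero
    return col

def driver(z, A, gamma):
    """Vector with entries z^T (A_tilde - A) e_i, one per state i."""
    N = len(z)
    A_tilde = np.column_stack([distorted_rates(z, A, j, gamma) for j in range(N)])
    return (A_tilde - A).T @ z

A = np.array([[-1.0, 0.5, 0.0],
              [1.0, -1.5, 2.0],
              [0.0, 1.0, -2.0]])
z = np.array([1.0, -2.0, 0.5])
f_val = driver(z, A, gamma=0.1)
```

Because the concave distortion shifts jump intensity towards the states with the lowest values of $z$, the resulting driver is nonpositive, and it vanishes when $\gamma = 0$ (no distortion). It is also unchanged when a constant is added to every component of $z$.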
We apply this pricing mechanism to a digital option with a knockout barrier, that is, a payoff φ where This type of simple barrier option is straightforward to calculate, as we simply ensure that our solution satisfies the additional boundary condition u(t, X t ) = 0 for all states where S(X t ) ≥ 25. Our solution is then the value of the option, conditional on the barrier not having been hit. The results of this, using a stress level γ = 0.1, can be seen in Figures 3 and 4.
Again we can see the effects of risk aversion in the differences of the bid and ask prices in Figure 4. The effects of the knockout barrier are also clear, as it causes a sharp change in the prices at the boundary.

Calculating ODE solutions
It is well known that a potential application of the theory of Brownian BSDEs is to provide stochastic methods for large PDE systems. In our situation, we have seen that the natural relation is not between a BSDE and a PDE, but a BSDE and a system of coupled ODE. We therefore can consider adaptations of the stochastic methods for solving BSDE as candidates for novel schemes for solving large systems of coupled nonlinear ODE. As numerical algorithms for ODEs are generally very good, we do not expect that this method will be of use except in some extreme cases. For this reason, we simply outline the algorithm.
To implement such a method, consider the following general setting. Suppose we have a system of ODEs of the form and the object of particular interest is the value of e * k v T , that is, the value of the kth component of v at some future time T . (This method can, of course, be modified to give all components, however is particularly well suited to when our interest is in a single component.) We split the ODE (5) into the form where A t is a rate matrix (that is, a matrix with nonnegative entries off the main diagonal and all row sums equal to zero) and f is a function with the property that there exists c such that for all i, where Ψ(·, ·) is as in (1). Practically, this means that the ith component of f can only depend on the jth component of v when (A t ) ij > 0. We then reverse time, defining u t = v T −t , so that (6) becomes precisely the ODE we obtain from our BSDE (4). Therefore, we have converted our problem into determining the initial value of a BSDE with terminal value X * t φ, when X is a Markov chain with rate matrix A t . Furthermore, as our interest is in the value of the kth component of v, we are interested in the initial value of our BSDE when X 0 = e k .
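The time reversal used here can be written out explicitly. The sketch below assumes that the split system has the form $dv_t = (\mathbf{f}(t, v_t) + A_t^* v_t)\,dt$ with $v_0 = \phi$, which is the form suggested by the comparison with the BSDE-derived ODE; the exact statement of the split should be checked against the original equations.

```latex
% Assumed split system:  \dot v_t = \mathbf{f}(t, v_t) + A_t^* v_t,  v_0 = \phi.
% Define u_t := v_{T-t}. By the chain rule,
\begin{align*}
\dot u_t &= -\dot v_{T-t}
          = -\big(\mathbf{f}(T-t, u_t) + A_{T-t}^* u_t\big),
\qquad u_T = v_0 = \phi .
\end{align*}
% This is an ODE of the form du_t = -(f + A^* u_t) dt with time-reversed coefficients
% (identical coefficients in the time-homogeneous case), so that
\[
e_k^* v_T = e_k^* u_0 = Y_0 \quad \text{when } X_0 = e_k \text{ and } Y_T = X_T^* \phi .
\]
```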
We now outline the natural modification of the algorithm of Bouchard and Touzi [2] for the calculation of solutions to these BSDEs. We shall not give a proof of the convergence of this algorithm, however it is natural to believe that the conditions for convergence from the Brownian setting will carry over to the setting of Markov Chains.
Monte-Carlo Algorithm for Markov Chain BSDE
(a) Use the function $u_{t+1}$ to calculate the values of $f^n := (X^n_{t+1})^* \mathbf{f}(t, u_{t+1})$ (recall that as $X^n_{t+1}$ takes values from the standard basis vectors of $\mathbb{R}^N$, this simply selects out some terms of the function $\mathbf{f}$).
(b) Calculate $\hat u^n_{t+1} = u_{t+1}(X^n_{t+1}) - f^n \Delta t$. We wish to approximate the conditional expectation of $\hat u^n_{t+1}$ given $X^n_t$. This can be done using a Longstaff-Schwartz technique (see [7]).
(c) Repeat until an estimate for $u_0$ is obtained.
We note that the accuracy and cost of the backward step primarily depends on the size and computational cost of evaluating $\mathbf{f}$. This algorithm is particularly well suited for calculating the values $u_0(e_k) = e_k^* v_T$, as the simulated paths can be chosen to start in state $e_k$. Hence the algorithm will not attempt to accurately calculate values of $u_t(e_k) = e_k^* v_{T-t}$ for combinations of $t$ and $k$ which have little impact on $e_k^* v_T$. We also note that there is a trade-off between the costs of the forward and backward processes, depending on the decomposition from (5) to (6). Typically, it seems that the forward process is relatively cheap to simulate accurately, so it is preferable to try and minimise the size of $\mathbf{f}$.
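A minimal sketch of such a backward Monte-Carlo scheme is given below, for a time-homogeneous chain on a small state space. It makes several simplifying assumptions that are ours rather than the paper's: the chain is simulated on a fixed time grid with the Euler transition matrix $I + A\,\Delta t$, the regression step is a plain average over the simulated paths sitting in each state (regression on state indicators), and the driver enters with the sign implied by dynamics of the form $dY_t = -f\,dt + Z_t^*\,dM_t$, that is, $Y_t \approx E[Y_{t+\Delta t} + f\,\Delta t \mid \mathcal{F}_t]$; the sign should be flipped if the driver is defined with the opposite convention. The driver used in the example is an illustrative concave choice, not one taken from the paper.

```python
import numpy as np

def mc_bsde(A, f, phi, T, k, steps=100, n_paths=20000, seed=0):
    """Estimate u_0(e_k) by backward averaging over simulated chain paths."""
    rng = np.random.default_rng(seed)
    N, dt = len(phi), T / steps
    P = np.eye(N) + A * dt                 # Euler one-step transition matrix (columns sum to 1)
    cdf = np.cumsum(P, axis=0)             # cdf[:, j]: distribution of the next state from j
    paths = np.empty((steps + 1, n_paths), dtype=int)
    paths[0] = k                           # all paths start in state k
    for m in range(steps):
        r = rng.random(n_paths)
        nxt = (r[None, :] > cdf[:, paths[m]]).sum(axis=0)
        paths[m + 1] = np.minimum(nxt, N - 1)   # guard against rounding in the cdf
    u = np.asarray(phi, dtype=float).copy()     # terminal condition u_T = phi
    for m in range(steps - 1, -1, -1):
        fu = f(u)                                            # driver evaluated with u_{m+1}
        target = u[paths[m + 1]] + fu[paths[m + 1]] * dt     # per-path one-step value
        new_u = u.copy()                   # states not visited at step m keep their old value
        for i in range(N):
            mask = paths[m] == i
            if mask.any():
                new_u[i] = target[mask].mean()   # conditional expectation given X_m = e_i
        u = new_u
    return u[k]

def ode_reference(A, f, phi, T, steps=20000):
    """Explicit Euler for du/dt = -(f(u) + A^T u), backwards from u_T = phi."""
    h = T / steps
    u = np.asarray(phi, dtype=float).copy()
    for _ in range(steps):
        u = u + h * (f(u) + A.T @ u)
    return u

A = np.array([[-1.0, 0.5, 0.0],
              [1.0, -1.5, 2.0],
              [0.0, 1.0, -2.0]])
phi = np.array([0.0, 1.0, 3.0])
alpha = 1.5   # hypothetical uncertainty parameter for the illustrative driver
driver = lambda u: np.minimum((1.0 / alpha - 1.0) * (A.T @ u), (alpha - 1.0) * (A.T @ u))

estimate = mc_bsde(A, driver, phi, T=1.0, k=0)
reference = ode_reference(A, driver, phi, T=1.0)[0]
```

For a chain this small the deterministic ODE solve is of course far cheaper; the comparison with `reference` is only meant to show that the two approaches target the same quantity $u_0(e_k)$, up to Monte-Carlo and time-discretisation error.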

Conclusion
We have considered solutions to Backward Stochastic Differential Equations where noise is generated by a Markov chain. We have seen that solutions to these equations are Markovian, that is, they can be expressed as a deterministic function of the current state, if and only if they come from the solution of a certain coupled ODE system.
Consequently, calculating the Markovian solutions to these BSDEs is a simple matter of evaluating an initial value problem, for which many good numerical methods exist. This has applications to calculating risk-averse prices (for example, bid-ask prices) for market models driven by Markov chains. We have also seen how this suggests new stochastic methods for the solution of large ODE systems.