An efficient Monte Carlo scheme for Zakai equations

In this paper we develop a numerical method for efficiently approximating solutions of certain Zakai equations in high dimensions. The key idea is to transform a given Zakai SPDE into a PDE with random coefficients. We show that under suitable regularity assumptions on the coefficients of the Zakai equation, the corresponding random PDE admits a solution random field which, for almost all realizations of the random coefficients, can be written as a classical solution of a linear parabolic PDE. This makes it possible to apply the Feynman-Kac formula to obtain an efficient Monte Carlo scheme for computing approximate solutions of Zakai equations. The approach achieves good results in up to 25 dimensions with fast run times.


Introduction
The goal of stochastic filtering is to estimate the conditional distribution of a stochastic process that cannot be observed directly but only through noisy measurements. The process of interest is usually called the signal process, state process or system process, while the observed process is typically referred to as the observation process. Whereas the signal process follows hidden dynamics, probing the system only reveals the observation process, which, in general, might depend nonlinearly on the signal process and, in addition, is blurred by measurement noise. Stochastic filtering problems were first studied in connection with tracking and signal processing (see the seminal works by Kalman [26] and Kalman & Bucy [27]) but soon turned out to also be relevant in a variety of other applications in finance, the natural sciences and engineering. Among others, nonlinear filtering problems naturally arise in, e.g., financial engineering ([3,10,13,18,20,21]), weather forecasting ([8,9,11,17,19,33]) or chemical engineering ([7,12,34,35,36,38]). For further applications of nonlinear filtering, we refer to the survey paper [31]. Stochastic filtering problems are naturally related to stochastic partial differential equations (SPDEs) since in continuous time, the (unnormalized) density of the unobserved signal process given the observations is described by a suitable SPDE, such as the Zakai equation [39] or the Kushner equation [30]. The SPDEs arising in this context typically cannot be solved explicitly but instead have to be computed numerically. Moreover, they are often high-dimensional, as the number of dimensions corresponds to the state space dimension of the filtering problem.
In this paper, we focus on Zakai equations with coefficients that satisfy certain regularity conditions. Let us assume the signal and observation processes follow d-dimensional dynamics of the form dY t = µ(Y t )dt + σdW t and dZ t = h(Y t )dt + dV t , respectively, for a d-dimensional random vector Y 0 with density ϕ : R d → [0, ∞), sufficiently regular functions µ, h : R d → R d , a constant d × d-matrix σ, and independent d-dimensional Brownian motions (W t ) t∈[0,T ] and (V t ) t∈[0,T ] that are independent of Y 0 . Then the solution of the corresponding Zakai equation describes the evolution of an unnormalized density of the conditional distribution of Y t given observations of Z s , s ≤ t. Our numerical method is based on a transformation which transforms a Zakai SPDE of the form (2) into a PDE with random coefficients. We show that under suitable conditions on the coefficients of the Zakai SPDE, the solution of the resulting random PDE is ω-wise a classical solution of a linear second order parabolic PDE. This makes it possible to apply the Feynman-Kac formula to obtain an efficient Monte Carlo scheme for the numerical approximation of solutions of high-dimensional Zakai equations. The following is this paper's main theoretical result.
Theorem 1.1. Let T ∈ (0, ∞), d ∈ N, and σ ∈ R d×d . Consider functions ϕ ∈ C 2 (R d , [0, ∞)), µ ∈ C 3 (R d , R d ) and h ∈ C 4 (R d , R d ) such that ϕ has at most polynomially growing derivatives up to the second order, µ has bounded derivatives up to the third order, and h has bounded derivatives up to the fourth order. Let (Ω, F, P, (F t ) t∈[0,T ] ) be a filtered probability space. For every v ∈ C([0, T ], R d ), t ∈ [0, T ], and x ∈ R d , let R v,t,x : [0, t] × Ω → R d be an (F s ) s∈[0,t] -adapted stochastic process satisfying (5). Let the functions B v,y , u v,y : [0, T ] × R d → R, v, y ∈ C([0, T ], R d ), be given by (6) and (7) for v, y ∈ C([0, T ], R d ), t ∈ [0, T ] and x ∈ R d . Then, the random field X : [0, T ] × R d × Ω → R given by (8) satisfies the following: P-a.s. for all t ∈ [0, T ] and x ∈ R d .
Representation (8) makes it possible to approximate the solution X t (x, ω) of the Zakai equation (9) for given t ∈ [0, T ], x ∈ R d and ω ∈ Ω by averaging over different Monte Carlo simulations of the process R V (ω),t,x given in (5). Note that expression (8) formally depends on ω through the realizations of the processes Y and V , while in a typical stochastic filtering application, the Zakai equation has to be solved along a given path of the observation process Z. Y and V determine Z through (4), but the path realizations of Y and V cannot be recovered from a realized path of Z. In fact, the whole purpose of stochastic filtering is to estimate Y t from observations of Z s , s ≤ t. However, it follows from (9) that X t (x, ω) depends only on the realization of (Z s ) s∈[0,t] . In other words, two different realizations of (Y s , V s ) s∈[0,t] resulting in the same path of (Z s ) s∈[0,t] lead to the same value of X t (x, ω). This means that for numerical purposes, a given realization z = (Z s (ω)) s∈[0,t] can artificially be decomposed by first fixing a likely path y of (Y s ) s∈[0,t] . In view of (4), a natural choice of y is the solution of the ODE y ′ (s) = µ(y(s)). The corresponding realization v of V can then be chosen so that ∫ 0 s h(y(r))dr + v(s) = z(s) for all s ∈ [0, t]. Theorem 1.1 ensures that formula (8) with input (y, v) yields the correct value for X t (x, ω) along the path z = (Z s (ω)) s∈[0,t] .
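The decomposition just described can be sketched numerically as follows. This is an illustrative sketch, not the paper's implementation; the drift µ, the sensor function h, the time grid and the path z below are placeholder choices.

```python
import numpy as np

def decompose_observation(z, mu, h, t_grid, y0):
    """Given a realized observation path z on a time grid, return an
    artificial pair (y, v) consistent with it: y solves y' = mu(y)
    (Euler scheme) and v(s) = z(s) - int_0^s h(y(r)) dr (trapezoidal rule)."""
    n, d = len(t_grid), len(y0)
    y = np.zeros((n, d))
    y[0] = y0
    for k in range(n - 1):
        dt = t_grid[k + 1] - t_grid[k]
        y[k + 1] = y[k] + mu(y[k]) * dt
    # cumulative trapezoidal approximation of int_0^s h(y(r)) dr
    hy = np.array([h(yk) for yk in y])
    integral = np.zeros((n, d))
    dts = np.diff(t_grid)[:, None]
    integral[1:] = np.cumsum(0.5 * (hy[1:] + hy[:-1]) * dts, axis=0)
    v = z - integral  # by construction, int_0^s h(y) dr + v(s) = z(s)
    return y, v

# toy example: mu(x) = -x, h = identity, and a synthetic observation path z
t_grid = np.linspace(0.0, 1.0, 101)
z = np.sin(t_grid)[:, None] * np.ones((1, 2))  # placeholder path in d = 2
y, v = decompose_observation(z, lambda x: -x, lambda x: x, t_grid, np.zeros(2))
```

The pair (y, v) can then be fed into formula (8) in place of the unobservable realizations of Y and V.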
The idea of transforming a stochastic differential equation into an ordinary differential equation with random coefficients goes back to Doss [16] and Sussmann [37]. An extension to SPDEs has been used by Buckdahn and Ma [4,5] to introduce a notion of stochastic viscosity solution for SPDEs and show existence and uniqueness results as well as connections to backward doubly stochastic differential equations. The same approach has been employed by Buckdahn and Ma [6] and Boufoussi et al. [2] to study stochastic viscosity solutions of stochastic Hamilton-Jacobi-Bellman (HJB) equations. In this paper, we analyze the regularity properties of such transformations and use them to develop a Monte Carlo method for approximating solutions of Zakai equations. For different numerical approximation methods for Zakai equations, see e.g., [1,14,15,22,23].
The rest of the paper is organized as follows. In Section 2 we provide preliminary regularity and stability results needed for the proof of the Feynman-Kac representation of Proposition 3.2 in Section 3.1, which is the main ingredient in the proof of Theorem 1.1, given in Section 3.2. In Section 4 we provide numerical results for a Zakai equation of the form (2) for dimensions d ∈ {1, 2, 5, 10, 20, 50, 75, 100}.

Preliminary regularity and stability results
In this section we present essentially well-known regularity and stability results needed to show the Feynman-Kac representation in Proposition 3.2, which is used in the proof of Theorem 1.1.

Approximation and mollification results for at most polynomially growing functions
Lemma 2.1. Let d ∈ N, T ∈ (0, ∞), and consider at most polynomially growing functions G ∈ C([0, T ] × R d , R) and H ∈ C(R d , R). Moreover, assume that and let G n ∈ C([0, T ] × R d , R) and H n ∈ C(R d , R) for all n ∈ N, t ∈ [0, T ] and x ∈ R d be given by and Then Proof. From (11) and the fact that |min{T, max{s, 0}} − t| ≤ |s − t| for all s ∈ R and t ∈ [0, T ], one obtains for all t ∈ [0, T ] and x ∈ R d . In particular, lim sup n→∞ sup t∈[0,T ] = 0, which shows (i). Next, note that one has and the assumption that H ∈ C(R d , R) is at most polynomially growing implies that Combining (15), (16), the assumption that H ∈ C(R d , R), the de la Vallée Poussin theorem (cf., e.g., [28, Corollary 6.21]), and the Vitali convergence theorem (cf., e.g., [28, Theorem 6.25]) yields that lim sup n→∞ sup x∈[−q,q] d |H n (x) − H(x)| = 0 for all q ∈ (0, ∞). This establishes (ii) and completes the proof of the lemma.
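The mollification behind this lemma can be illustrated in one dimension. The sketch below assumes a Gaussian kernel of variance 1/n and evaluates the smoothed function by Monte Carlo; the exact normalization in the definitions of G n and H n above may differ.

```python
import numpy as np

def mollify(H, n, x, n_samples=200_000, seed=0):
    """Monte Carlo evaluation of H_n(x) = E[H(x + n^{-1/2} Z)], Z ~ N(0, 1),
    i.e. the convolution of H with a Gaussian of variance 1/n."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples) / np.sqrt(n)
    return np.mean(H(x + noise))

H = np.abs  # continuous, at most polynomially growing test function
grid = np.linspace(-2.0, 2.0, 41)
# sup-norm error of the mollification on the compact set [-2, 2]
errors = {n: max(abs(mollify(H, n, x) - H(x)) for x in grid) for n in (1, 10, 100)}
# the error shrinks as n grows, reflecting uniform convergence on compacts
```

For H(x) = |x| the largest error sits at the kink x = 0, where the Gaussian average equals roughly (2/(πn))^1/2, so the decay as n → ∞ is visible directly.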
Lemma 2.2. Let d, m ∈ N, T ∈ (0, ∞), and consider a family of functions f n,t, and Let g n ∈ C(R m , R), n ∈ N 0 , such that for every p ∈ (0, ∞), Then, one has for all p ∈ (0, ∞), Proof. Observe that (18) ensures that there exist n p , N p ∈ N, p ∈ (0, ∞) such that The assumption that g 0 ∈ C(R m , R) and (18) hence imply that (22) Combining this with (19), (21), and the triangle inequality shows that for all p ∈ (0, ∞), which completes the proof of the lemma.
Lemma 2.3. Let d ∈ N, T ∈ (0, ∞), and let U : [0, T ] × Ω → R d be a standard Brownian motion with continuous sample paths defined on a probability space (Ω, F, P). Then

A priori estimates and regularity properties for solutions of ODEs with additive noise
(Ω, F, P) be a probability space supporting a standard Brownian motion U : Then for all p ∈ (0, ∞) and x ∈ R d , and Proof. The triangle inequality, the assumption that b(t, x) R d ≤ c(1 + x R d ) for all t ∈ [0, T ] and x ∈ R d , and (24) assure that for all x ∈ R d , t ∈ [0, T ] and s ∈ [0, t]. Hence, one obtains from Gronwall's integral inequality (cf., e.g., [24,Lemma 2.11]) that for all x ∈ R d , t ∈ [0, T ] and s ∈ [0, t]. So it follows from Lemma 2.3 that for all p ∈ (0, ∞) and x ∈ R d , as well as for all p ∈ (0, ∞), which proves the lemma. Then for all i, j ∈ {1, 2, . . . , d}.

Stability properties of solutions of ODEs with additive noise
partial derivatives of first and second order with respect to the x-variables. Let (Ω, F, P) be a probability space supporting a standard Brownian motion U : [0, T ] × Ω → R d with continuous sample paths, and consider stochastic processes Proof. Throughout this proof we assume without loss of generality that From (42) and the triangle inequality one obtains Combining this with (46) and the assumption that b has bounded partial derivatives with respect to the x-variables yields for all t ∈ [0, T ] and x ∈ R d , which establishes (i). Next, observe that (42), Lemma 2.5.(ii) and the triangle inequality imply that for all i ∈ {1, 2, . . . , d}, t ∈ [0, T ], s ∈ [0, t] and x ∈ R d . Therefore, Lemma 2.5.(iii) and the assumption that c has bounded partial derivatives with respect to the x-variables yield Now, note that it follows from (42) and Lemma 2.5.(iv) that This shows (iii) and completes the proof of the lemma.

Differentiability properties of certain random fields defined in terms of
ODEs with additive noise have bounded partial derivatives of first and second order with respect to the x-variables.
R d < ∞ and the first and second order partial derivatives of B with respect to the x-variables are at most polynomially growing.
Let (Ω, F, P) be a probability space supporting a standard Brownian motion U : Proof. It follows from Lemma 2.5.(i), the assumptions on b, ϕ, B, the chain rule, the fundamental theorem of calculus, (56), and (57) that for all t ∈ [0, T ] and ω ∈ Ω, the mapping R d x → U(t, x, ω) ∈ R belongs to C 1 (R d , R) and Similarly, it follows from Lemma 2.5.(i), (ii), and the assumptions that for all t ∈ [0, T ] and This shows (i) and (iii) and completes the proof of the lemma.
have bounded partial derivatives of first and second order with respect to the x-variables.
have at most polynomially growing partial derivatives of first and second order with respect to the x-variables, and assume that (Ω, F, P) be a probability space supporting a standard Brownian motion U : Proof. Lemma 2.4 and the assumption that ϕ ∈ C 2 (R d , [0, ∞)) has at most polynomially growing partial derivatives of first and second order ensure that for all p ∈ (0, ∞). Similarly, Lemma 2.4 and the assumption that B ∈ C 0,2 ([0, T ] × R d , R) has at most polynomially growing partial derivatives of first and second order with respect to the x-variables guarantee that for all p ∈ (0, ∞). By Lemma 2.4 and the assumption that for all p ∈ (0, ∞). Combining this, (64), and (65) with (63) and Hölder's inequality shows that which together with the de la Vallée Poussin theorem (cf., e.g., [ for all p ∈ (0, ∞). Therefore, one obtains from Lemma 2.7.(i), the de la Vallée Poussin theorem (cf., e.g., [28, Corollary 6.21]), the Vitali convergence theorem (cf., e.g., [28, Theorem 6.25]), and the fundamental theorem of calculus that for all p ∈ (0, ∞). Hence, one obtains from Lemma 2.7.(i), (a)-(b) above, the de la Vallée Poussin theorem (cf., e.g., [28, Corollary 6.21]), the Vitali convergence theorem (cf., e.g., [28, Theorem 6.25]), and the fundamental theorem of calculus that

In this section we derive the Feynman-Kac representation of Proposition 3.2 below from the results of Section 2 and use it to prove Theorem 1.1.

Feynman-Kac representations for linear parabolic partial differential equations
The following lemma is a stepping stone towards the proof of the Feynman-Kac representation of Proposition 3.2. It makes stronger regularity assumptions on the coefficients. Proposition 3.2 can be derived from it by mollifying the coefficients.
have bounded partial derivatives of first and second order with respect to the x-variables, let ϕ ∈ C 2 (R d , [0, ∞)) have at most polynomially growing partial derivatives of first and second order, and let B ∈ C 0,2 ([0, T ] × R d , R) have at most polynomially growing partial derivatives of first and second order with respect to the x-variables. In addition, assume that for all t ∈ [0, T ], s ∈ [0, t] and x ∈ R d , and let the function u : [0, T ] × R d → R be given by Then and sup Let the mappings ϕ n : R d → [0, ∞) and b n , B n : [0, T ] × R d → R d , n ∈ N, be given by and for n ∈ N, t ∈ [0, T ] and x ∈ R d . (102) and (104) imply that (ϕ n ) n∈N ⊆ C 3 (R d , R) as well as Furthermore, since exp( −s 2 /2) is even in s ∈ R, it follows from (105) that Moreover, one obtains from (96) and the fact that | min{T, max{s, 0}}−t| ≤ |t−s| for all t ∈ [0, T ] and s ∈ R that for all n ∈ N, t ∈ [0, T ] and x ∈ R d . Combining this with the assumption that b ∈ C 0,2 ([0, T ] × R d , R d ) has bounded partial derivatives of first and second order with respect to the x-variables and (105) ensures that (a) for all n ∈ N, b n ∈ C 1,2 ([0, T ] × R d , R d ) has a bounded partial derivative with respect to t and bounded partial derivatives of first and second order with respect to the x-variables, Next, note that it follows from (106) and the assumption that In addition, one obtains from (103) and (106) Now, note that it follows from Lemma 2.8 and the assumptions on ϕ, b, and B that x)] for all t ∈ [0, T ] and x ∈ R d . By Lemma 2.8, Lemma 3.1, (a)-(c) and (114)-(110), one has Hence, we obtain from Lemma 2.4 that for all p ∈ (0, ∞), which together with Lemma 2.6.(ii),(b) and (117) shows that for all i ∈ {1, 2, . . . , d} and p ∈ (0, ∞). Furthermore, it follows from Lemma 2.5.(iii) and (b) that for all p ∈ (0, ∞). Lemma 2.6.(iii) (b), (c), and (118) hence assure that lim sup for all i, j ∈ {1, 2, . . . , d} and p ∈ (0, ∞). 
Moreover, it follows from Lemma 2.5.(v), (b) and (c) that In the next step, we note that Lemma 2.

Proof of Theorem 1.1
Throughout this proof, let be given by and L : and define Observe that it follows from (4)-(8) that for all t ∈ [0, T ] and x ∈ R d , the mapping X t (x) : Ω → R is F t /B(R)-measurable, which establishes (i). Next, note that the assumptions on Moreover, note that it follows from (4) that inf This, (6), and (138) assure that for P-a.a. ω ∈ Ω the following hold: inf inf inf (a)-(c), the assumption that ϕ ∈ C 2 (R d , [0, ∞)) has at most polynomially growing derivatives up to the second order, Lemma 2.8, and Proposition 3.
,Y (ω) in the notation of Lemma 2.8 and Proposition 3.2) ensure the following: for all t ∈ [0, T ] and x ∈ R d .
It follows from the assumptions on h ∈ C 4 (R d , R d ) that for all ω ∈ Ω, the mapping (8), and the fact that X t (x) = e I(t,x) u(t, x) for all t ∈ [0, T ] and x ∈ R d imply that for every ω ∈ Ω, the mapping . This establishes (ii).
In the next step, we note that the fact that (137), and (C) show that for all t ∈ [0, T ] and x ∈ R d . Now, consider the random field The fact that X t (x) = e I(t,x) u(t, x) for all t ∈ [0, T ] and x ∈ R d , (A), and Lemma 3.3 below (applied for every t So it follows from (B) and (149) that P-a.s. for all t ∈ [0, T ] and x ∈ R d . Combining this and (6) with the fact that (∇ (155) Moreover, since V is a Brownian motion and I(t, x) = h(x), V t R d for all t ∈ [0, T ] and x ∈ R d , it follows from (5), (7), and (8) that Itô's formula, the fact that X t (x) = e I(t,x) u(t, x) for all t ∈ [0, T ] and x ∈ R d , (154), and (155) show that for all t ∈ [0, T ] and x ∈ R d . Since X t (x) = e I(t,x) u(t, x) for all t ∈ [0, T ] and x ∈ R d , it therefore follows from (151) that for all t ∈ [0, T ] and x ∈ R d . This, (ii), the fact that (4), and (137) demonstrate that P-a.s. for all t ∈ [0, T ] and x ∈ R d . This establishes (iii) and completes the proof of Theorem 1.1.
Lemma 3.3. Let d ∈ N, and consider the mapping L : where a ∈ R d×d is a symmetric matrix and ν a function in Then for all I ∈ C 2 (R d , R), u ∈ C 2 (R d , R) and x ∈ R d .
Proof. Let I ∈ C 2 (R d , R) and u ∈ C 2 (R d , R). One obtains from the product rule and chain rule that A further application of the product rule and chain rule gives for all i, j ∈ {1, 2, . . . , d} and x ∈ R d , which implies that for all x ∈ R d . Since a is symmetric, one has Trace R d (av ⊗ w) = ⟨av, w⟩ R d = ⟨aw, v⟩ R d = Trace R d (aw ⊗ v) for all v, w ∈ R d . Therefore, one obtains from (161) and (163) that for all x ∈ R d , which proves the lemma.
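In coordinate-free notation, the computation in this proof can be summarized by the following identity. This is a sketch written for the operator Lu = ⟨ν, ∇u⟩ + ½ Trace(a Hess u); the paper's exact displays (161)-(163) may be arranged differently.

```latex
% gradient and Hessian of e^{I}u via product and chain rule
\nabla\bigl(e^{I}u\bigr) = e^{I}\bigl(\nabla u + u\,\nabla I\bigr), \qquad
\operatorname{Hess}\bigl(e^{I}u\bigr)
  = e^{I}\bigl(\operatorname{Hess}u + \nabla I\otimes\nabla u
    + \nabla u\otimes\nabla I + u\,\operatorname{Hess}I
    + u\,\nabla I\otimes\nabla I\bigr),
% taking traces against the symmetric matrix a and adding the drift term yields
e^{-I}\,L\bigl(e^{I}u\bigr)
  = Lu + \langle a\nabla I,\nabla u\rangle
    + u\Bigl(LI + \tfrac12\langle a\nabla I,\nabla I\rangle\Bigr).
```

The symmetry of a is used to combine the two mixed terms, since Trace(a(∇I ⊗ ∇u)) = Trace(a(∇u ⊗ ∇I)) = ⟨a∇I, ∇u⟩.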

Numerical experiments
Together with time-discretization, the trapezoidal rule, and Monte Carlo sampling, the Feynman-Kac type representation formula of Theorem 1.1 can be used to approximate the solution of a given Zakai equation of the form (2) along a realization of the observation process Z. We illustrate this in the following example: Choose α, T ∈ (0, ∞), γ ∈ R, d ∈ N, and let σ ∈ R d×d be given by σ ij = d −1/2 for all i, j ∈ {1, . . . , d}. Consider a d-dimensional signal process with dynamics for a random initial condition Y 0 : Ω → R d with density ϕ(x) = α and v, y ∈ C([0, T ], R d ), t ∈ [0, T ], x ∈ R d . By Theorem 1.1, the random field X : [0, T ] × R d × Ω → R given by solves the Zakai equation t ∈ [0, T ], x ∈ R d , corresponding to the dynamics (167)-(168). We use formula (172) to approximate X T (x) for a given realization (z(t)) t∈[0,T ] of (Z t ) t∈[0,T ] . But instead of Y (ω) and V (ω), which, in a typical filtering application, are not observable, we plug artificial realizations of (Y t ) t∈[0,T ] and (V t ) t∈[0,T ] that are consistent with (z(t)) t∈[0,T ] into (172). First, we create a typical path of (Y t ) t∈[0,T ] by solving (167) with Y 0 = 0 and W ≡ 0. This gives y(t) = 0, t ∈ [0, T ]. By (168), the corresponding realization of (V t ) t∈[0,T ] then needs to be v(t) = z(t), t ∈ [0, T ].
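To produce a synthetic realization z for testing, one can simulate the signal and observation dynamics with an Euler-Maruyama scheme. The sketch below uses placeholder drift µ and sensor function h, not the example's actual coefficients (which involve α and γ); only the matrix σ with entries d^{-1/2} is taken from the text above.

```python
import numpy as np

def simulate_observation(mu, h, sigma, d, T, N, seed=0):
    """Euler-Maruyama simulation of the signal/observation pair on a grid
    of N steps; returns the observation path z with z[0] = 0."""
    rng = np.random.default_rng(seed)
    dt = T / N
    y = np.zeros(d)           # Y_0 = 0 as in the example above
    z = np.zeros((N + 1, d))  # Z_0 = 0
    for k in range(N):
        dW = np.sqrt(dt) * rng.standard_normal(d)  # signal noise increment
        dV = np.sqrt(dt) * rng.standard_normal(d)  # observation noise increment
        z[k + 1] = z[k] + h(y) * dt + dV
        y = y + mu(y) * dt + sigma @ dW
    return z

d, T, N = 5, 0.5, 100
sigma = np.full((d, d), d ** -0.5)  # sigma_ij = d^{-1/2}
z = simulate_observation(lambda x: -x, np.tanh, sigma, d, T, N)  # placeholder mu, h
```

The resulting array z plays the role of the realization (z(t)) t∈[0,T ] in the procedure described above.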
For numerical purposes we choose an N ∈ N and consider the following discretized versions of (167)-(168): for M ∈ N and x ∈ R d . It follows from the law of large numbers that, for M → ∞, the Monte Carlo average converges to the expectation in (172), which approximates X T (x). The results are reported in Table 1. The 95% confidence intervals were approximated, using the central limit theorem, as X M (x) ± q 0.975 s M /√M, where q 0.975 is the 97.5%-quantile of the standard normal distribution and s 2 M is the sample variance of (178). The reported runtimes are averages of the times needed to compute X M (0) for five different realizations of Z.
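The Monte Carlo average and the CLT-based confidence interval can be sketched as follows; the samples below are placeholder data standing in for M independent evaluations of the integrand in (172).

```python
import numpy as np

def mc_estimate_with_ci(samples, q975=1.959964):
    """Return the Monte Carlo mean and the half-width of the 95% confidence
    interval  mean +/- q_{0.975} * s_M / sqrt(M),
    with s_M^2 the (unbiased) sample variance."""
    M = len(samples)
    mean = np.mean(samples)
    s2 = np.var(samples, ddof=1)  # sample variance s_M^2
    half_width = q975 * np.sqrt(s2 / M)
    return mean, half_width

rng = np.random.default_rng(42)
# placeholder: 100_000 i.i.d. draws standing in for the M simulated payoffs
samples = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)
est, hw = mc_estimate_with_ci(samples)
# [est - hw, est + hw] covers the true mean exp(0.125) with roughly 95% probability
```

Averaging the runtimes of such computations over several realizations of Z, as done for Table 1, then only requires wrapping this estimator in a timing loop.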
The numerical experiments presented in this section were implemented in Python using TensorFlow on an NVIDIA GeForce RTX 2080 Ti GPU. The underlying system was an AMD Ryzen 9 3950X CPU with 64 GB DDR4 memory running TensorFlow 2.1 on Ubuntu 19.10. The Python source codes can be found in the GitHub repository https://github.com/seb-becker/zakai.