Nonlinear Filtering for Reflecting Diffusions in Random Environments via Nonparametric Estimation

We study a nonlinear filtering problem in which the signal to be estimated is a reflecting diffusion in a random environment. Under the assumption that the observation noise is independent of the signal, we develop a nonparametric functional estimation method for finding workable approximate solutions to the conditional distributions of the signal state. Furthermore, we show that the pathwise average distance, per unit time, of the approximate filter from the optimal filter is asymptotically small in time. Also, we use simulations based upon a particle filter algorithm to show the efficiency of the method.


Introduction
Herein, we consider a nonlinear filtering problem in which the signal to be estimated is a reflecting diffusion process in a random environment. The motivation comes from the tracking problem of a dinghy lost on a lake or ocean based upon infrared camera readings. Whereas in calm waters the dinghy moves as a diffusion process reflecting at the shores, lake swells provide a random environment potentially altering its motion greatly.
For concreteness, we let the lake be $D=(0,1)^d$, denote by $\partial D$ the boundary of $D$, and define $\overline D=D\cup\partial D$. Formally, the $\overline D$-valued signal process $X_t$ for the dinghy can be described by the following stochastic differential equation (SDE)
$$dX_t=\sigma_1(X_t)\,dB_t+\Big(c(X_t)-\tfrac12\,(a\nabla W)(X_t)\Big)dt+\chi_{\partial D}(X_t)\,U(X_t)\,d\xi_t, \qquad(1.1)$$
where $W$ is a $C(\overline D)$-valued random variable, $B_t$ is a standard $\mathbb R^{d_1}$-valued Brownian motion independent of $W$, $\sigma_1:\mathbb R^d\to\mathbb R^{d\times d_1}$ is a measurable matrix-valued function, $a=\sigma_1\sigma_1^{\top}$, $c=(c_1,\dots,c_d)^{\top}$ is a measurable vector field, $\chi$ is the indicator function, $U$ is the unit inward normal, and $\xi_t$ is the local time of $X_t$ on $\partial D$. Note that Equation (1.1) is purely heuristic because almost all paths of the random environment $W$ may be merely continuous, not differentiable. To use a symmetric Dirichlet form to construct a diffusion process formally associated with Equation (1.1), we assume that $c_i=\frac12\sum_{j=1}^{d}\frac{\partial a_{ij}}{\partial x_j}$ for $1\le i\le d$, where the derivatives are taken in the sense of Schwartz distributions.
The $\mathbb R^d$-valued observation sequence $\{Y_j\}$ is defined by
$$Y_j=h(X_{t_j})+V_j, \qquad(1.2)$$
where $t_j=j\varepsilon$ for some fixed constant $\varepsilon>0$ and $j\in\mathbb N$, $h:\overline D\to\overline D$ is a one-to-one, continuously differentiable vector field such that the Jacobian $J(h)\ne 0$ on $\overline D$, and $\{V_j\}$ is a sequence of i.i.d. $N(0,\sigma_2)$ random vectors independent of $X_t$, for some fixed positive definite matrix $\sigma_2$.
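In the one-dimensional case, this observation model is straightforward to simulate. The sketch below is our own illustration, not the paper's code: it evolves the signal by a reflected Euler scheme with zero drift (i.e. ignoring the environment term of (1.1)) and uses the sensor function and noise levels from the simulation section.

```python
import numpy as np

rng = np.random.default_rng(0)

def reflect(x):
    """Fold a point back into [0, 1] (normal reflection at the boundary)."""
    x = np.abs(x) % 2.0
    return np.where(x > 1.0, 2.0 - x, x)

def simulate_observations(h, sigma1, sigma2, eps, n_obs, n_sub=100, x0=0.5):
    """Simulate the signal by a reflected Euler scheme (zero drift here,
    i.e. no environment) and return Y_j = h(X_{t_j}) + V_j."""
    dt = eps / n_sub
    x = x0
    ys = []
    for _ in range(n_obs):
        for _ in range(n_sub):
            x = float(reflect(x + sigma1 * np.sqrt(dt) * rng.standard_normal()))
        ys.append(h(x) + np.sqrt(sigma2) * rng.standard_normal())
    return np.array(ys)

# sensor from Section 4; sigma1 = 0.1 and sqrt(sigma2) = 0.04 are the paper's choices
h = lambda x: (1.0 + x - np.cos(np.pi * x)) / 3.0
Y = simulate_observations(h, sigma1=0.1, sigma2=0.04**2, eps=1.0, n_obs=50)
```

The substeps `n_sub` and starting point `x0` are illustrative choices, not taken from the paper.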
Recently, there have been many developments in the study of diffusion processes in random environments (see Tanaka (1995) for a survey). In Kouritzin, Long and Sun (2003), we began the study of the nonlinear filtering problem in which the signal to be estimated is a diffusion in a random medium and the signal may be correlated with the observation noise. Using Dirichlet form theory, we introduced a precise nonlinear filtering model for the signal process and established a multiple Wiener integral representation for the unnormalized pathspace filtering process. Among other things, this representation generalizes Kunita's (1982) Wiener chaos expansion [cf. also Ocone (1983)] to the singular-coefficients case. Combining this representation with the spectral approach of Lototsky, Mikulevicius and Rozovskii (1997), we thus provided the capability to develop a numerical scheme for nonlinear filtering of diffusions in Brownian environments.
In this work, we consider the same filtering problem from a different viewpoint. Our practical simulation experience suggests that particle and space-discretization methods of implementing filters often work better than chaos methods. Therefore, it is important to come up with particle system approximations for the conditional distributions of the signal based upon the observations. The difficulty that arises is that we do not want to introduce an extremely large number of particles to account for the random environment. Instead, we try to "learn" the environment. Under the assumption that the observation noise is independent of the signal, we develop a nonparametric estimation method for finding workable approximate solutions to the conditional distributions of the signal state given the past observations. In this connection, we also refer the interested reader to Liptser (1997, 2001), Elliott (2001), and Kouritzin, Rémillard and Chan (2001) for some other recent work on nonparametric and parametric estimation for filtering problems. Since the functional estimates in our approximate filters are based upon long-time observations, it is important to study the convergence of the approximate filters over an infinite interval. Following the occupation measure arguments of Budhiraja and Kushner (1999, 2000a, 2000b), we show that the pathwise average distance, per unit time, of the approximate filter from the optimal filter is asymptotically small in time.
The remainder of this article is organized as follows. In Section 2, we perform nonparametric functional estimations for an unknown path of the random environment. In Section 3, we construct approximate filters using the obtained estimators and study their approximations to optimal filters over very long time intervals. Moreover, we introduce a particle filter algorithm for combined state and nonparametric estimation. In Section 4, we use simulation results to show the efficiency of the nonparametric estimation method. Finally, in the Appendix, we give the proof of the limit result in Section 3.

Nonparametric functional estimations
We start by introducing a precise signal and observation model. With the most natural example of a random environment, a Brownian sheet, in mind, we assume that the random environment $W$ is a $C(\overline D)$-valued random variable. That is, we assume that each randomly given outcome of the environment is a continuous function on $\overline D$. In this section, we will use the observations to construct estimators for the fixed unknown path of the random environment. Hereafter, to simplify the notation, we let $W$ denote the random environment outcome as well as the $C(\overline D)$-valued random variable, as the context will make the exact meaning clear.
Suppose that $a=\{a_{ij}\}_{i,j=1}^{d}$ is a symmetric matrix-valued function satisfying the uniform ellipticity condition $\lambda^{-1}|\eta|^2\le\eta^{\top}a(x)\eta\le\lambda|\eta|^2$ for all $x\in\overline D$ and $\eta\in\mathbb R^d$, for some constant $\lambda>1$. Let $W$ be the fixed unknown path of the random environment.
Then, we consider on $L^2(\overline D;\mu)$, where $\mu(dx):=e^{-W(x)}\,dx$, the symmetric bilinear form
$$\mathcal E(u,v)=\frac12\int_{D}\big(a(x)\nabla u(x)\big)\cdot\nabla v(x)\,\mu(dx),\qquad u,v\in H^1(D). \qquad(2.1)$$
One can check that $\mathcal E$ is a regular Dirichlet form satisfying the local property, so it is associated with a strong Markov diffusion process realized on a path space $(\Omega^1,\mathcal F^1)$ [Fukushima, Oshima and Takeda (1994), Theorems 7.2.1 and 7.2.2]. Here, the coordinate $X_t(\omega)$ refers to the state of the process at time $t$ on the path $\omega$, and, for any $x\in\overline D$, the probability measure $P_x$ describes the probabilistic behavior of the paths started at the point $x$ at time 0. The association means that if we denote by $(p_t)_{t>0}$ and $(T_t)_{t>0}$ the semigroups associated with $X_t$ and $\mathcal E$, respectively, then $p_tf=T_tf$ $dx$-a.e. for any $f\in L^\infty(\overline D;\mu)$ and any $t>0$. Comparing their (formal) generators, one can consider $X_t$ to be a solution of the formal Skorohod SDE (1.1) [cf. Freidlin (1985), Section 1.6].
If $W\equiv0$, we use $\mathcal E^0$ to denote the Dirichlet form (2.1), and we denote its associated Markov process and transition semigroup by $X^0_t$ and $(p^0_t)_{t>0}$, respectively. From, e.g., Theorem 2.2 of Fukushima and Tomisaki (1996), we know that $X^0_t$ is in fact a strong Feller diffusion. The relationship between $(p_t)_{t>0}$ and $(p^0_t)_{t>0}$ can be characterized by Equation (2.2), where $B_b(\overline D)$ denotes the set of all bounded measurable functions on $\overline D$. Therefore, $X_t$ is also a strong Feller diffusion. If $W\equiv0$ and $a\equiv I_d$, the $d$-dimensional identity matrix, then the diffusion associated with (2.1) is the reflecting Brownian motion on $\overline D$. By the ergodicity of the reflecting Brownian motion and the result on comparison of irreducibility [Fukushima, Oshima and Takeda (1994), Corollary 4.6.4], one can see that $\mathcal E$ is also irreducible and thus $X_t$ is an ergodic diffusion with stationary distribution $\mu/\mu(\overline D)$.
Let $\sigma_2$ be a fixed $d\times d$ positive definite matrix and $\{V_j\}$ a sequence of i.i.d. $N(0,\sigma_2)$ random vectors on some probability space $(\Omega^V,\mathcal F^V,P^V)$. We define the observation sequence $\{Y_j\}$ by Equation (1.2) and denote $\mathcal Y_n:=\sigma\{Y_1,\dots,Y_n\}$. In the sequel, we use $\|f\|_\infty$ and $\|f\|_2$ to denote, respectively, the supremum norm and the $L^2(\overline D;\mu)$-norm of a measurable function $f$ on $\overline D$.
Theorem 1. Suppose that the fixed unknown path $W$ of the random environment and the Jacobian $J(h)$ are 1-periodic in each coordinate of $x$. Then, there exists a sequence $\{\Theta_n\}$ of 1-periodic $C^\infty(\overline D)$-valued random variables adapted to the filtration $\{\mathcal Y_n\}$ such that $\lim_{n\to\infty}\|\Theta_n-(W+\ln\mu(\overline D))\|_\infty=0$ in probability $P^1$.
Proof. We denote $O_0=\ln\mu(\overline D)$ and define, for $k\in\mathbb Z^d$,
$$S_n(k)=\frac1n\sum_{j=1}^{n}e^{2\pi i\,k^{\top}Y_j}.$$
Then, by the ergodicity of $X_t$ and the independence of $\{X_t\}$ and $\{V_j\}$, we obtain from the Birkhoff ergodic theorem [cf. Rosenblatt (1971)] that, $P^1$-almost surely,
$$\lim_{n\to\infty}S_n(k)=e^{-2\pi^2k^{\top}\sigma_2k}\int_{\overline D}e^{2\pi i\,k^{\top}h(x)}\,e^{-(W(x)+O_0)}\,dx. \qquad(2.3)$$
From (2.3) we see that the Fourier coefficients of $e^{-(W\circ h^{-1}+O_0)}|J(h^{-1})|$ can be estimated from the observations, so we may apply trigonometric Fourier series to carry out functional estimation of $W+O_0$. To do this, we first estimate the convergence rate in (2.3). Following, e.g., Lemma 1.2 of Ledoux (2001), one can see that $\mathcal E$ satisfies the Poincaré inequality and thus $(T_t)_{t>0}$ has an exponential $L^2$-convergence rate; namely, there exists a constant $\alpha>0$ such that
$$\Big\|T_tf-\frac{\mu(f)}{\mu(\overline D)}\Big\|_2\le e^{-\alpha t}\|f\|_2,\qquad f\in L^2(\overline D;\mu),\ t>0.$$
In particular, for any $k\in\mathbb Z^d$,
$$\Big\|T_te_k-\frac{\mu(e_k)}{\mu(\overline D)}\Big\|_2\le e^{-\alpha t}\|e_k\|_2,\qquad e_k(x):=e^{2\pi i\,k^{\top}h(x)}. \qquad(2.4)$$
Therefore, the independence of $\{X_t\}$ and $\{V_j\}$, the i.i.d. property of $\{V_j\}$, the Markov property of $X_t$, and (2.4) together yield a quantitative error bound for the estimators $S_n(k)$. Let $\{m_l\}$ be a sequence of positive integers satisfying
$$\lim_{l\to\infty}\frac{(2l+1)^{3d}}{m_l}\sum_{k_1,\dots,k_d=-l}^{l}e^{4\pi^2k^{\top}\sigma_2k}=0.$$
Remark 1. Many random environments in practical applications take effect only in some subdomains of $D$, so it is not a strong restriction to assume in Theorem 1 that $W$ satisfies the periodic boundary condition. Likewise, many practical nonlinear sensor functions (after a linear transformation if necessary) satisfy the periodicity condition of Theorem 1. In the proof of Theorem 1, we made no attempt to optimize the positive integers $m_l$; in practical filtering, we usually let $m_l$ be a suitably chosen large number (cf. the simulation in Section 4 for an example of choosing $m_l$).
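The estimation strategy behind Theorem 1 can be sketched numerically in the one-dimensional case. The function below is our own illustrative implementation, not the paper's: empirical Fourier coefficients of the observations are divided by the characteristic function of the Gaussian noise (deconvolution), Fourier inversion yields an estimate $f$ of the stationary density of $h(X_{t_j})$, and, since $f(h(x))=e^{-(W(x)+O_0)}/|h'(x)|$, the quantity $W+O_0$ is read off as $-\ln f(h(x))-\ln|h'(x)|$.

```python
import numpy as np

def estimate_environment(Y, h, h_prime, sigma2, K, xs):
    """Deconvolution sketch of the estimator behind Theorem 1 (d = 1).

    Y       : 1-d array of observations Y_j = h(X_{t_j}) + V_j
    h, h_prime : sensor function and its derivative (vectorized)
    sigma2  : observation noise variance
    K       : Fourier truncation level (the paper's l)
    xs      : grid of points in (0, 1) at which to estimate W + O_0
    """
    ks = np.arange(-K, K + 1)
    # empirical Fourier coefficients of the law of Y
    c = np.array([np.mean(np.exp(-2j * np.pi * k * Y)) for k in ks])
    # undo the noise: characteristic function of N(0, sigma2) at frequency k
    c = c / np.exp(-2.0 * np.pi**2 * ks**2 * sigma2)
    # Fourier inversion: estimated stationary density of h(X) at y = h(xs)
    f = np.real(c @ np.exp(2j * np.pi * np.outer(ks, h(xs))))
    f = np.maximum(f, 1e-6)          # keep the logarithm well defined
    return -np.log(f) - np.log(np.abs(h_prime(xs)))
```

The truncation `1e-6` and the evaluation grid are illustrative choices; in the paper's notation, the sample size would be $m_l$ and the truncation level $l$.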

Approximate filters and approximations over long time intervals
We let $(\Omega^2,\mathcal F^2,P^2)$ be a probability space carrying a standard $\mathbb R^{d_1}$-valued Brownian motion $(B^2_t)_{t\ge0}$, define $\Omega=\Omega^1\times\Omega^2$, $\mathcal F=\mathcal F^1\times\mathcal F^2$, $P=P^1\times P^2$, and take $\{\Theta_n\}$ to be a sequence of $C^\infty(\overline D)$-valued random variables adapted to the filtration $\{\mathcal Y_n\}$ such that $\lim_{n\to\infty}\|\Theta_n-(W+O)\|_\infty=0$ in probability $P^1$, where $W$ is the fixed unknown path of the random environment and $O$ is some constant (cf. Theorem 1). Then, we define on $(\Omega,\mathcal F,P)$ a sequence of reflecting diffusion processes $\{X^n_t\}$ by
$$dX^n_t=\sigma_1(X^n_t)\,dB^2_t+\Big(c(X^n_t)-\tfrac12\,(a\nabla\Theta_n)(X^n_t)\Big)dt+\chi_{\partial D}(X^n_t)\,U(X^n_t)\,d\xi^n_t, \qquad(3.1)$$
where $\xi^n_t$ is the local time of $X^n_t$ for $n\in\mathbb N$. Compared with the signal process $X_t$, $X^n_t$ is a strong Feller process defined using the approximation $\Theta_n$ to $W$, which is constructed from the observations alone rather than from the unknown path $W$.
Let $E$ denote expectation with respect to $P$. For $\varphi\in C(\overline D)$, the filtering problem is to evaluate the optimal filter
$$\Pi_j\varphi:=E\big[\varphi(X_{t_j})\,\big|\,\mathcal Y_j\big], \qquad(3.2)$$
which is the least-squares estimate of $\varphi(X_{t_j})$ given all the observations up to time $t_j$. We define, for $j\in\mathbb N$,
$$L_j:=\exp\Big(-h(X_{t_j})^{\top}\sigma_2^{-1}Y_j+\tfrac12\,h(X_{t_j})^{\top}\sigma_2^{-1}h(X_{t_j})\Big).$$
Then, we define a probability measure $P^0$ on $(\Omega,\mathcal F)$ by
$$\frac{dP^0}{dP}\bigg|_{\mathcal F_t}=\prod_{j=1}^{\lfloor t/\varepsilon\rfloor}L_j,$$
where $\lfloor t/\varepsilon\rfloor$ denotes the greatest integer not exceeding $t/\varepsilon$ and $\mathcal F_t:=\sigma\{X_s,0\le s\le t\}\vee\mathcal Y_{\lfloor t/\varepsilon\rfloor}\vee\sigma\{B^2_s,0\le s\le t\}$. Let $E^0$ denote expectation with respect to $P^0$. Under $P^0$, the distribution of $\{X_{t_j}\}$ is the same as under $P$, and $\{Y_j\}$ is a sequence of i.i.d. $N(0,\sigma_2)$ random vectors independent of $\{X_{t_j}\}$.
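Under this change of measure, the optimal filter admits the standard Kallianpur–Striebel (Bayes) representation. The display below is a reconstruction of that standard identity from the definitions above (a sketch for orientation, not the paper's own display), with $\Lambda_j$ denoting the accumulated likelihood of the first $j$ observations:

```latex
\Pi_j\varphi
  \;=\; \frac{E^0\!\left[\varphi(X_{t_j})\,\Lambda_j \,\middle|\, \mathcal Y_j\right]}
             {E^0\!\left[\Lambda_j \,\middle|\, \mathcal Y_j\right]},
\qquad
\Lambda_j \;=\; \prod_{i=1}^{j}
  \exp\!\Big(h(X_{t_i})^{\top}\sigma_2^{-1}Y_i
             -\tfrac12\,h(X_{t_i})^{\top}\sigma_2^{-1}h(X_{t_i})\Big).
```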
Owing to the Markov property of $X_t$, the optimal filter defined by Equation (3.2) satisfies the semigroup relation
$$\Pi_j\varphi=\frac{E^0_{\{\Pi_{j-1}\}}\big[\varphi(X_{t_1})\,g(X_{t_1},Y_j)\,\big|\,\mathcal Y_j\big]}{E^0_{\{\Pi_{j-1}\}}\big[g(X_{t_1},Y_j)\,\big|\,\mathcal Y_j\big]},\qquad g(x,y):=\exp\Big(h(x)^{\top}\sigma_2^{-1}y-\tfrac12\,h(x)^{\top}\sigma_2^{-1}h(x)\Big). \qquad(3.3)$$
Hereafter, we use the notation $E^0_{\{\Pi_{j-1}\}}[F(X_{t_1},Y_j)\,|\,\mathcal Y_j]$ for the conditional expectation (under $P^0$) of a function $F$ of $X_{t_1}$ and $Y_j$ given the data $\mathcal Y_j$, where the initial distribution of $X$ is $\Pi_{j-1}$.
For $n\in\mathbb N$, we define the approximate filter $\Pi^n_\cdot$ using the recursive representation
$$\Pi^n_j\varphi=\frac{E^2_{\{\Pi^n_{j-1}\}}\big[\varphi(X^n_{t_1})\,g(X^n_{t_1},Y_j)\big]}{E^2_{\{\Pi^n_{j-1}\}}\big[g(X^n_{t_1},Y_j)\big]}, \qquad(3.4)$$
with $g$ the Gaussian likelihood factor $g(x,y)=\exp\big(h(x)^{\top}\sigma_2^{-1}y-\frac12\,h(x)^{\top}\sigma_2^{-1}h(x)\big)$. Hereafter, we use the notation $E^2_{\{\Pi^n_{j-1}\}}[F(X^n_{t_1})]$ for the expectation (under $P^2$) of a function $F$ of $X^n_{t_1}$, where the initial distribution of $X^n$ is $\Pi^n_{j-1}$. Now, we can state the main theoretical result of this section, which shows that the pathwise average distance, per unit time, of the approximate filter from the optimal filter is asymptotically small in time. The proof of this limit result is given in the Appendix.
Theorem 2. Let the filtering model be as in Section 2. Define the approximate filter $\Pi^n_\cdot$ via Equation (3.4), where $X^n_t$ satisfies Equation (3.1). Let $\{N_n\}$ be a sequence of positive integers satisfying $\lim_{n\to\infty}n/N_n=0$. Then, for any $\varphi\in C(\overline D)$,
$$\lim_{n\to\infty}\frac{1}{N_n}\sum_{j=1}^{N_n}\big|\Pi^n_j\varphi-\Pi_j\varphi\big|=0\quad\text{in probability.}$$

The approximate filters defined by Equation (3.4) usually require excessive computation, so it is attractive to construct numerically feasible approximations via particle filters. In the remainder of this section, we consider the simplest one, based upon pure random sampling of the approximating process $X^l_t$. Here $l\in\mathbb N$, the order of the approximate Cesàro mean, is a suitably chosen large number. The basic algorithm for combined state and nonparametric estimation is as follows.
Initialization: Let $\{X^{l,i}\}_{i=1}^{M_l}$ be a set of particles, where $M_l$ is the number of particles. In the initialization stage, each particle's state is independently initialized according to the uniform distribution on $\overline D$.
Evolution: In the evolution stage, each particle is independently evolved according to the approximate SDE of the signal, as described in Equation (3.1). Here $\Theta_l$ is defined through the observation sequence: (a) Let $m_l\in\mathbb N$ be a sufficiently large number. For each $q\in\mathbb Z^d_+$ satisfying $q_u\le l$ for all $1\le u\le d$, we define $S_{l,q}$.
(b) Let $g_l\in C^\infty(\overline D)$ be an approximating function of $h$, and define $\Theta_l$ accordingly.

Selection: In the selection stage, particles are weighted based upon their likelihood given the current observation, and the approximate filter $\Pi^l_j$ is defined by the weighted sample average. The evolution and selection steps are repeated at each observation time.
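A minimal sketch of one evolution/selection cycle is given below, assuming $d=1$, the sensor and noise levels of Section 4, an estimated environment supplied through its derivative `theta_prime` (hypothetical name), and plain multinomial resampling standing in for the weighting scheme; it is an illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def pf_step(particles, y, theta_prime, sigma1, sigma2, eps, n_sub=50):
    """One evolution + selection step (d = 1). The drift
    -0.5 * sigma1**2 * theta_prime(x) is the formal drift of (3.1) in one
    dimension with constant diffusion coefficient; resampling is multinomial."""
    dt = eps / n_sub
    x = particles.copy()
    for _ in range(n_sub):
        x = x - 0.5 * sigma1**2 * theta_prime(x) * dt \
              + sigma1 * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = np.abs(x) % 2.0
        x = np.where(x > 1.0, 2.0 - x, x)        # reflect back into [0, 1]
    h = lambda z: (1.0 + z - np.cos(np.pi * z)) / 3.0
    w = np.exp(-0.5 * (y - h(x)) ** 2 / sigma2)  # Gaussian likelihood weights
    w = w / w.sum()
    est = float(np.sum(w * x))                   # weighted filter-mean estimate
    return x[rng.choice(x.size, size=x.size, p=w)], est

particles = rng.uniform(0.0, 1.0, size=2000)     # uniform initialization on D
particles, est = pf_step(particles, y=0.5,
                         theta_prime=lambda x: np.zeros_like(x),
                         sigma1=0.1, sigma2=0.04**2, eps=1.0)
```

The substep count `n_sub` is an illustrative discretization choice; any consistent reflected Euler scheme would do here.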
Arguing as in the proof of Theorem 3.1 of Budhiraja and Kushner (2000b), one can show that the conclusion of Theorem 2 holds for this scheme (cf. Appendix). The scheme can also be generalized in many ways: common variance reduction methods such as antithetic variables and stratified sampling can be used, and we refer the interested reader to Budhiraja and Kushner (2000b) for more details. Furthermore, one expects to improve the performance of the particle filter algorithm using the many recently developed interacting and branching particle filters.

Simulation Results
Simulation results based upon the above particle filter algorithm are presented below for the one-dimensional case. We set $W(x)=Z_x-xZ_1$, where $Z$ is a standard $\mathbb R^1$-valued Brownian motion (so that $W$ is a Brownian bridge), and use the nonlinear sensor function $h(x)=\frac13(1+x-\cos\pi x)$. We specify the signal noise via $\sigma_1=0.1$, the observation noise via $\sqrt{\sigma_2}=0.04$, and the time period between observations to be $\varepsilon=1$ time unit. We let $l=12$, $m_l=1200$, and use $M_l=2000$ particles for the approximate filter.
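An environment path of this form is easy to generate. The following sketch (our own illustration) samples $W(x)=Z_x-xZ_1$ on a uniform grid; being a Brownian bridge, it satisfies $W(0)=W(1)=0$ and hence the periodic boundary condition of Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_environment(n_grid=1201):
    """Sample W(x) = Z_x - x * Z_1 on a uniform grid of [0, 1]; Z is a
    standard Brownian motion, so W is a Brownian bridge with W(0) = W(1) = 0."""
    xs = np.linspace(0.0, 1.0, n_grid)
    # Brownian increments over the grid, then cumulative sum for Z
    dz = rng.standard_normal(n_grid - 1) * np.sqrt(np.diff(xs))
    z = np.concatenate(([0.0], np.cumsum(dz)))
    return xs, z - xs * z[-1]

xs, W = sample_environment()
```

The grid size `n_grid` is an illustrative choice.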
The curve in the first figure represents an unknown path of the random environment. The curves in the second and third figures represent the signal state and the observations, respectively; when the observations fall outside the interval $[0,1]$, we truncate them. The fourth figure shows the estimates of the signal based upon the weighted particle method. Here, the filter does not have access to the true random environment, but rather must contend with noisy estimates. The filter performs better as time goes on and the functional estimate of the environment improves. The results show that our nonparametric estimation method provides an effective solution to the nonlinear filtering problem for reflecting diffusions in random environments.

Appendix: Proof of Theorem 2
Many arguments in the proof are similar to those used in Theorem 2.1 of Budhiraja and Kushner (2000b); however, the proof is nontrivial given the dependence of $X^n$ on $Y$ and the construction of our approximate filters. Also, we must be careful when dealing with asymptotic properties of nonlinear filters because of the gap recently found in Kunita's classic paper (1971) [cf. Budhiraja (2003)]. In the following, we refer to the proof in Budhiraja and Kushner (2000b) and concentrate on the differences.
Proof. The basic idea of the proof is to apply the method of occupation measures. For each $n,j\in\mathbb N$, we define the process
$$\Psi^n(j,\cdot):=\big(X_{j+\cdot},\,\Pi^n_{j+\cdot},\,Y_{j+\cdot}-Y_j,\,B_{j+\cdot}-B_j\big),$$
where $\Pi^n_\cdot$ is defined in Equation (3.4) and $B_j:=\sum_{i=1}^{j}V_i$. For an arbitrary measurable set $C$ in the product path space of $\Psi^n(j,\cdot)$, we define the measure-valued random variable $Q^{n,N_n}(\cdot)$:
$$Q^{n,N_n}(C)=\frac{1}{N_n}\sum_{j=1}^{N_n}I_C\big(\Psi^n(j,\cdot)\big),$$
where $I_C(\Psi^n(j,\cdot))$ denotes the indicator function of the event $\Psi^n(j,\cdot)\in C$. By the compactness of the state space, one can show that the families $\{X_{j+\cdot},j\in\mathbb N\}$, $\{\Pi^n_{j+\cdot},n,j\in\mathbb N\}$, $\{Y_{j+\cdot}-Y_j,j\in\mathbb N\}$, and $\{B_{j+\cdot}-B_j,j\in\mathbb N\}$ are tight, and thus the sequence $\{E[Q^{n,N_n}(\cdot)]\}$ is tight. Therefore, the sequence $\{Q^{n,N_n}(\cdot)\}$ of measure-valued random variables is tight by Theorem 1.6.1 of Kushner (1990).
We need to determine the sample values $Q_\omega$ of any weak-sense limit $Q(\cdot)$. For each $\omega$, $Q_\omega$ induces a process $\Psi^\omega_\cdot=(X^\omega_\cdot,\Pi^\omega_\cdot,Y^\omega_\cdot,B^\omega_\cdot)$. The proof of the stationarity of $(X^\omega_\cdot,\Pi^\omega_\cdot)$ in Budhiraja and Kushner (1999) works without change for our problem. By virtue of the assumption that $\lim_{n\to\infty}n/N_n=0$, one can similarly prove that the relationship between $X^\omega_\cdot$ and $Y^\omega_\cdot$ is characterized by (1.2), as in Theorem 5.1 of Budhiraja and Kushner (1999). Also, the proof that $X^\omega$ has the law of evolution of $X_\cdot$ for almost all $\omega$ is the same as in Budhiraja and Kushner (1999).
Let $p^0(t,x,y)$ be the transition density function of $X^0_t$. By Theorem 2.4.4 of Davies (1989), there exists $c>0$ such that $0\le p^0(t,x,y)\le ct^{-d/2}$ almost everywhere on $\overline D\times\overline D$ for all $0<t<1$. Then $(T^0_t)_{t>0}$, the $L^2$-semigroup associated with the Dirichlet form $\mathcal E^0$, is ultracontractive (cf. Lemma 2.1.2 of Davies (1989)). By the eigenfunction expansion of $p^0(t,x,y)$ (cf. Theorem 2.1.4 of Davies (1989)) and the argument in the proof of Theorem 2.4 of Bass and Hsu (1991), one can see that there exist $T>0$ and $c'>0$ such that $|p^0(t,x,y)-1|\le e^{-c't}$ almost everywhere on $\overline D\times\overline D$ for all $t\ge T$; in other words, $p^0(t,x,y)$ approaches the stationary distribution uniformly and exponentially. So, by Equation (2.2) and the remarks after Theorem 7.2 of Budhiraja and Kushner (1999), we conclude that the filter $\{\Pi_j\}$ forgets its initial condition. Thus, the signal-filter pair $\{X_{t_j},\Pi_j\}$ has a unique invariant probability measure by Theorem 7.1 of Budhiraja and Kushner (1999) [cf. also Budhiraja (2003)]. By Theorem 1 and Equation (2.2) (cf. also the remark below A.2.1 of Budhiraja and Kushner (2000a)), one can see that for any sequence $\{\nu_n\}$ of probability measures on $\overline D$ converging weakly to some probability measure $\nu$ on $\overline D$, $(X^n_0,X^n_{t_1})$ with initial distribution $\nu_n$ converges weakly to $(X_0,X_{t_1})$ with initial distribution $\nu$. Following the idea of Theorem 2.1 of Budhiraja and Kushner (2000b), we can thus establish the representation (3.3) for almost all $\omega$. Furthermore, we obtain, as in Budhiraja and Kushner (1999), that
$$\lim_{n\to\infty}\frac{1}{N_n}\sum_{j=1}^{N_n}\big|\Pi^n_j\varphi-\Pi_j\varphi\big|=0$$
in probability $P$ for any $\varphi\in C(\overline D)$. The proof is therefore complete by Theorem 5.2 of Budhiraja and Kushner (2000b).