On the role of the predictor in filtering stability

When is a nonlinear filter stable with respect to its initial condition? In spite of recent progress, this question still lacks a complete answer in general. Currently available results indicate that stability of the filter depends on the ergodic properties of the signal and the regularity of the observation process, and may fail if either of these ingredients is ignored. In this note we address the question of stability in a particular weak sense and show that the estimates of certain functions are always stable. This is verified without dealing directly with the filtering equation: the property turns out to be inherited from certain one-step predictor estimates.


Introduction
Consider the filtering problem for a Markov chain $(X, Y) = (X_n, Y_n)_{n\in\mathbb Z_+}$ with the signal $X$ and the observation $Y$. The signal process $X$ is a Markov chain itself, with transition kernel $\Lambda(u, dx)$ and initial distribution $\nu$. The observation process $Y$ has the transition probability law
$$
\mathrm P\bigl(Y_n \in B \mid X_{n-1}, Y_{n-1}\bigr) = \int_B \gamma(X_{n-1}, y)\,\varphi(dy), \qquad B \in \mathscr B(\mathbb R),
$$
where $\gamma(u, y)$ is a density with respect to a $\sigma$-finite measure $\varphi$ on $\mathbb R$. We set $Y_0 = 0$, so that the a priori information on the signal state at time $n = 0$ is confined to the signal distribution $\nu$. The random process $(X, Y)$ is assumed to be defined on a complete probability space $(\Omega, \mathscr F, \mathrm P)$. Let $(\mathscr F^Y_n)_{n\ge 0}$ be the filtration generated by $Y$: $\mathscr F^Y_0 = \{\varnothing, \Omega\}$, $\mathscr F^Y_n = \sigma\{Y_1, \dots, Y_n\}$. It is well known that the regular conditional distribution $\mathrm P(X_n \le x \mid \mathscr F^Y_n) =: \pi_n(dx)$ solves the recursive Bayes formula, called the nonlinear filter,
$$
\pi_n(dx) = \frac{\int \gamma(u, Y_n)\,\Lambda(u, dx)\,\pi_{n-1}(du)}{\int \gamma(u, Y_n)\,\pi_{n-1}(du)}, \qquad n \ge 1, \tag{1.1}
$$
subject to $\pi_0(dx) = \nu(dx)$. Clearly $\pi_n(f) := \int f(x)\,\pi_n(dx)$ is a version of the conditional expectation $\mathrm E\bigl[f(X_n) \mid \mathscr F^Y_n\bigr]$ for any measurable function $f = f(x)$ with $\mathrm E|f(X_n)| < \infty$.
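For a signal with finitely many states, the recursion (1.1) reduces to a ratio of matrix–vector operations. The following sketch runs the filter along a simulated trajectory; the two-state kernel, the Gaussian observation density and all numerical constants are illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state signal: transition kernel Lambda and initial law nu.
Lam = np.array([[0.9, 0.1],
                [0.2, 0.8]])
nu = np.array([0.5, 0.5])
states = np.array([-1.0, 1.0])

def gamma(u, y, sigma=0.5):
    """Observation density gamma(u, y): here Y_n = X_{n-1} + N(0, sigma^2) noise."""
    return np.exp(-0.5 * ((y - u) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def filter_step(pi_prev, y):
    """One step of the Bayes recursion (1.1):
    pi_n(x) is proportional to sum_u gamma(u, y) Lambda(u, x) pi_{n-1}(u)."""
    weights = gamma(states, y) * pi_prev   # gamma(u, Y_n) * pi_{n-1}(u)
    unnormalized = weights @ Lam           # propagate through Lambda(u, dx)
    return unnormalized / unnormalized.sum()

# simulate the chain and observations, then run the filter
x = rng.choice(2, p=nu)
pi = nu.copy()
for _ in range(50):
    y = states[x] + 0.5 * rng.standard_normal()  # Y_n depends on X_{n-1}
    x = rng.choice(2, p=Lam[x])                  # X_n ~ Lambda(X_{n-1}, .)
    pi = filter_step(pi, y)

print(pi)  # a probability vector over the two states
```

The update stays a probability vector by construction, since the denominator of (1.1) is exactly the normalizing constant.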
Assume $\nu$ is unknown and the filter (1.1) is initialized with a probability distribution $\bar\nu$ different from $\nu$; denote the corresponding solution by $\bar\pi = (\bar\pi_n)_{n\ge 0}$. Obviously, an arbitrary choice of $\bar\nu$ may not be admissible: it makes sense to choose $\bar\nu$ such that $\bar\pi_n(dx)$ preserves the properties of a probability distribution, i.e. $\int_B \bar\pi_n(dx) \ge 0$ for any measurable set $B \subseteq \mathbb R$ and $\int_{\mathbb R} \bar\pi_n(dx) = 1$ for each $n \ge 1$, $\mathrm P$-a.s. This would be the case if the right-hand side of (1.1) does not lead to a $0/0$ uncertainty with positive probability. As explained in the next section, the latter is provided by the relation $\nu \ll \bar\nu$, which is assumed to be in force hereafter. In fact, it plays an essential role in the proof of the main result.
The sequence $\bar\pi = (\bar\pi_n)_{n\ge 0}$ of random measures generally differs from $\pi = (\pi_n)_{n\ge 0}$, and the estimate $\bar\pi_n(f)$ of a particular function $f$ is said to be stable if
$$
\lim_{n\to\infty} \mathrm E\bigl|\pi_n(f) - \bar\pi_n(f)\bigr| = 0 \tag{1.2}
$$
holds for any admissible pair $(\nu, \bar\nu)$. The verification of (1.2) in terms of $\Lambda(u, dx)$, $\gamma(x, y)$, $\varphi(dy)$ is quite a nontrivial problem, which is far from being completely understood in spite of the extensive research of the last decade.
For bounded $f$, (1.2) is closely related to the ergodicity of $\pi = (\pi_n)_{n\ge 0}$, viewed as a Markov process on the space of probability measures. In the late 1950s D. Blackwell, motivated by problems in information theory, conjectured in [5] that $\pi$ has a unique invariant measure in the particular case of an ergodic Markov chain $X$ with a finite state space and noiseless observations $Y_n = h(X_n)$, where $h$ is a fixed function. This conjecture was found to be false by T. Kaijser [15]. In the continuous time setting, H. Kunita addressed the same question in [16] for a filtering model with a general Feller–Markov process $X$ and observations
$$
Y_t = \int_0^t h(X_s)\,ds + W_t, \tag{1.3}
$$
where the Wiener process $W = (W_t)_{t\ge 0}$ is independent of $X$. According to [16], the filtering process $\pi = (\pi_t)_{t\ge 0}$ inherits the ergodic properties of $X$ if the tail $\sigma$-algebra of $X$ is $\mathrm P$-a.s. trivial. Unfortunately, this assertion remains questionable due to a gap in its proof (see [4]).
Notice that (1.2) for bounded $f$ also follows from
$$
\|\pi_n - \bar\pi_n\|_{\mathrm{tv}} \xrightarrow[n\to\infty]{} 0, \quad \mathrm P\text{-a.s.}, \tag{1.4}
$$
where $\|\cdot\|_{\mathrm{tv}}$ is the total variation norm. Typically this stronger type of stability holds when $X$ is an ergodic Markov chain with state space $S \subseteq \mathbb R$ (or $\mathbb R^d$, $d \ge 1$), whose transition probability kernel $\Lambda(u, dx)$ is absolutely continuous with respect to a $\sigma$-finite reference measure $\psi(dx)$, while the density $\lambda(u, x)$ satisfies the so-called mixing condition
$$
\lambda_* \le \lambda(u, x) \le \lambda^*, \qquad u, x \in S, \tag{1.5}
$$
with a pair of positive constants $\lambda_*$ and $\lambda^*$. Then (see [2], [18], [11], [8])
$$
\lim_{n\to\infty} \frac 1n \log \|\pi_n - \bar\pi_n\|_{\mathrm{tv}} \le -\frac{\lambda_*}{\lambda^*}, \quad \mathrm P\text{-a.s.} \tag{1.6}
$$
The condition (1.5) was recently relaxed in [8], where (1.6) was verified with $\lambda_*$ replaced by a larger constant involving the invariant density $\mu(u)$ of the signal relative to $\psi(du)$. The mixing condition, including its weaker form, implies geometric ergodicity of the signal (see [8]). However, in general, ergodicity (and even geometric ergodicity) by itself does not imply stability of the filter (see the counterexamples in [15], [10], [4]). If the signal process $X$ is compactly supported, the density $\lambda(u, x)$ usually corresponds to the Lebesgue measure or to a purely atomic reference measure $\psi(dx)$. Signals with a noncompact state space do not fit the mixing condition framework, since an appropriate reference measure is hard to find and sometimes does not exist (as for the Kalman–Bucy filter).
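The contraction behind (1.5)–(1.6) is easy to observe numerically. In the following sketch, a two-state kernel with $\lambda_* = 0.3$ and $\lambda^* = 0.7$ relative to the counting measure is used, and the total variation distance between a correctly and a wrongly initialized filter is tracked; all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixing kernel (1.5) with lambda_* = 0.3 and lambda^* = 0.7
# relative to the counting measure on {0, 1}.
Lam = np.array([[0.7, 0.3],
                [0.3, 0.7]])
states = np.array([-1.0, 1.0])

def gamma(u, y, sigma=1.0):
    # unnormalized Gaussian observation density (constants cancel in the filter)
    return np.exp(-0.5 * ((y - u) / sigma) ** 2)

def step(pi, y):
    w = gamma(states, y) * pi
    p = w @ Lam
    return p / p.sum()

nu = np.array([0.99, 0.01])      # true initial law
nu_bar = np.array([0.01, 0.99])  # wrongly initialized filter

x = rng.choice(2, p=nu)
pi, pi_bar = nu.copy(), nu_bar.copy()
tv = []
for _ in range(200):
    y = states[x] + rng.standard_normal()
    x = rng.choice(2, p=Lam[x])
    pi, pi_bar = step(pi, y), step(pi_bar, y)
    tv.append(0.5 * np.abs(pi - pi_bar).sum())

print(tv[0], tv[-1])  # the total variation gap contracts
```

The decay is geometric, in agreement with the exponential bound (1.6).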
In summary, stability of the nonlinear filter stems from a delicate interplay between the ergodic properties of the signal and the "quality" of the observations. If one of these ingredients is removed, the other should be strengthened in order to keep the filter stable. Notably, all the available results verify (1.2) via (1.4) and thus require restrictive assumptions on the signal structure. Naturally, this raises the following question: are there functions $f$ for which (1.2) holds under "minimal" constraints on the signal model?
In this note, we give examples of functions for which this question has an affirmative answer. It turns out that (1.2) holds if $\nu \ll \bar\nu$ and the integral equation
$$
\int_{\mathbb R} \gamma(x, y)\, g(y)\,\varphi(dy) = f(x), \qquad x \in S,
$$
with respect to $g$, has a bounded solution. The proof of this fact relies on the martingale convergence theorem rather than on a direct analysis of the filtering equation (1.1). The precise formulations and further generalizations, with their proofs, are given in Section 2. Several nonstandard examples are discussed in Section 3.
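The effect can be illustrated on a deliberately non-ergodic sketch (a numerical illustration under assumed parameters, not taken from the text): the signal is frozen, $\Lambda(u, dx) = \delta_u(dx)$, so no mixing whatsoever holds; yet for a Gaussian observation density the choice $g(y) = y$ solves the integral equation with $f(x) = x$, and the wrongly initialized estimate of $f$ still approaches the correct one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Non-ergodic signal: the chain is frozen, Lambda = identity.
Lam = np.eye(3)
states = np.array([-1.0, 0.0, 1.0])
nu = np.array([0.2, 0.5, 0.3])         # true initial law
nu_bar = np.array([1/3, 1/3, 1/3])     # wrong initialization, nu << nu_bar

def gamma(u, y):
    # Gaussian observation density (normalization cancels in the filter);
    # g(y) = y then solves the integral equation with f(x) = x
    return np.exp(-0.5 * (y - u) ** 2)

def step(pi, y):
    w = gamma(states, y) * pi
    p = w @ Lam
    return p / p.sum()

x = rng.choice(3, p=nu)                # X_0, frozen forever after
pi, pi_bar = nu.copy(), nu_bar.copy()
for _ in range(500):
    y = states[x] + rng.standard_normal()
    pi, pi_bar = step(pi, y), step(pi_bar, y)

gap = abs(states @ pi - states @ pi_bar)  # |pi_n(f) - pi_bar_n(f)| for f(x) = x
print(gap)
```

Both posteriors concentrate on the same (random) frozen state, so the gap vanishes even though the signal is not ergodic.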

Preliminaries and the main result
For notational convenience, we assume that the pair $(X, Y)$ is the coordinate process defined on the canonical measurable space $(\Omega, \mathscr F) = \bigl((S \times \mathbb R)^{\mathbb Z_+}, \mathscr B((S \times \mathbb R)^{\mathbb Z_+})\bigr)$, where $\mathscr B$ stands for the Borel $\sigma$-algebra. Let $\mathrm P$ be a probability measure on $(\Omega, \mathscr F)$ such that $(X, Y)$ is a Markov process with the transition kernel $\gamma(u, y)\Lambda(u, dx)\varphi(dy)$ and the initial distribution $\nu(dx)\delta_{\{0\}}(dy)$, where $\delta_{\{0\}}(dy)$ is the point measure at zero. Let $\bar{\mathrm P}$ be another probability measure on $(\Omega, \mathscr F)$ such that $(X, Y)$ is a Markov process with the same transition law and the initial distribution $\bar\nu(dx)\delta_{\{0\}}(dy)$. Hereafter, $\mathrm E$ and $\bar{\mathrm E}$ denote the expectations relative to $\mathrm P$ and $\bar{\mathrm P}$ respectively. By the Markov property of $(X, Y)$, $\nu \ll \bar\nu$ implies $\mathrm P \ll \bar{\mathrm P}$ and
$$
\frac{d\mathrm P}{d\bar{\mathrm P}}(x, y) = \frac{d\nu}{d\bar\nu}(x_0), \quad \bar{\mathrm P}\text{-a.s.}
$$
We assume that $\mathscr F^Y_0$ is completed with respect to $\bar{\mathrm P}$. Denote $\mathscr F^Y_\infty = \bigvee_{n\ge 0} \mathscr F^Y_n$ and let $\mathrm P^Y, \bar{\mathrm P}^Y$ and $\mathrm P^Y_n, \bar{\mathrm P}^Y_n$ be the restrictions of $\mathrm P, \bar{\mathrm P}$ to $\mathscr F^Y_\infty$ and $\mathscr F^Y_n$ respectively. Obviously, $\mathrm P^Y \ll \bar{\mathrm P}^Y$ and $\mathrm P^Y_n \ll \bar{\mathrm P}^Y_n$; denote the corresponding density by $\varrho_n := d\mathrm P^Y_n / d\bar{\mathrm P}^Y_n$. Let $\bar\pi_n(dx)$ be the solution of (1.1) subject to $\bar\nu$, considered on $(\Omega, \mathscr F, \bar{\mathrm P})$, so that it is a version of the conditional distribution $\bar{\mathrm P}(X_n \le x \mid \mathscr F^Y_n)$. Since $\mathrm P \ll \bar{\mathrm P}$, $\bar\pi_n$ satisfies (1.1) on $(\Omega, \mathscr F, \mathrm P)$ as well.
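It is worth recording the standard representation of $\varrho_n$; the display below is a reconstruction from the definitions above (a consequence of the abstract Bayes formula), not a quotation from the original text:

```latex
% Reconstructed representation of the density \varrho_n:
\varrho_n \;=\; \frac{d\mathrm P^Y_n}{d\bar{\mathrm P}^Y_n}
          \;=\; \bar{\mathrm E}\Bigl[\,\frac{d\nu}{d\bar\nu}(X_0)\;\Big|\;\mathscr F^Y_n\Bigr],
          \qquad n \ge 0,
```

so that $(\varrho_n)_{n\ge 0}$ is a uniformly integrable $\bar{\mathrm P}$-martingale closed by $\frac{d\nu}{d\bar\nu}(X_0)$, converging $\bar{\mathrm P}$-a.s. and in $L^1(\bar{\mathrm P})$; this is where the martingale convergence theorem mentioned in the Introduction enters.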
In the sequel, we have to operate with $\varrho_n \pi_n(dx)$ as a random object defined on $(\Omega, \mathscr F, \bar{\mathrm P})$. Since $\bar\nu \ll \nu$ is not assumed, $\pi_n$ cannot be defined properly on $(\Omega, \mathscr F, \bar{\mathrm P})$ by the previous arguments. However, the product $\varrho_n \pi_n$ is well defined on $(\Omega, \mathscr F, \bar{\mathrm P})$, being set to zero on $\{\varrho_n = 0\}$. Indeed, let $\Gamma$ denote the set where $\varrho_n \pi_n$ is well defined, so that $\{\varrho_n = 0\} \subseteq \Gamma$. Notice that $\Gamma \in \mathscr F^Y_n$ and so $\bar{\mathrm P}(\Gamma) = \bar{\mathrm P}_n(\Gamma)$. By the Lebesgue decomposition of $\bar{\mathrm P}_n$ with respect to $\mathrm P_n$, the restriction of $\bar{\mathrm P}_n$ to $\{\varrho_n > 0\}$ is absolutely continuous with respect to $\mathrm P_n$, while the singular part of $\bar{\mathrm P}_n$ is concentrated on $\{\varrho_n = 0\}$. Since both $\pi_n$ and $\varrho_n$ are defined $\mathrm P$-a.s., $\mathrm P_n(\Gamma) = 1$ holds; moreover, $\mathrm P_n(\varrho_n > 0) = 1$, since
$$
\mathrm P_n(\varrho_n = 0) = \int_{\{\varrho_n = 0\}} \varrho_n\, d\bar{\mathrm P}_n = 0.
$$
Hence $\bar{\mathrm P}_n\bigl(\Gamma^c \cap \{\varrho_n > 0\}\bigr) = 0$ and $\bar{\mathrm P}_n\bigl(\Gamma^c \cap \{\varrho_n = 0\}\bigr) = 0$, that is, $\bar{\mathrm P}_n(\Gamma) = 1$.
Similarly to $\bar\pi_n$, the predictor $\bar\eta_{n|n-1}(g)$ is well defined both $\mathrm P$- and $\bar{\mathrm P}$-a.s., while only the product $\varrho_{n-1}\,\eta_{n|n-1}(g)$ makes sense with respect to both measures.

Examples
where $\xi_n(j)$, $j = 1, \dots, d$, are the independent entries of the random vectors $\xi_n$, which form an i.i.d. sequence independent of $X$. This variant of the Hidden Markov Model is popular in various applications (see e.g. [12]), and its stability analysis has been carried out by several authors (see e.g. [3], [18], [4]), mainly for an ergodic chain $X$. The nonlinear filter (1.1) is finite dimensional; namely, the conditional distribution $\pi_n(dx)$ is just the vector of conditional probabilities $\pi_n(i) = \mathrm P(X_n = a_i \mid \mathscr F^Y_n)$, $i = 1, \dots, d$. The following holds regardless of the ergodic properties of $X$: under the assumptions (a1) and (a2),
$$
\lim_{n\to\infty} \mathrm E\, \|\pi_n - \bar\pi_n\|_{\mathrm{tv}} = 0.
$$
Proof. The condition (ii) of Theorem 2.1 is satisfied for the functions $g_i(y) = y_i$, $i = 1, \dots, d$. Indeed, (a1) and (a2) imply $d\nu/d\bar\nu \le \mathrm{const.}$ and the uniform integrability of $g_i(Y_n)$ for each $i$; Theorem 2.1 then applies. The latter and the nonsingularity of $B$ prove the claim.

3.2. Observations with multiplicative white noise. This example is borrowed from [13]. The signal process is defined by the linear recursive equation
$$
X_n = a X_{n-1} + \theta_n,
$$
where $|a| < 1$ and $(\theta_n)_{n\ge 1}$ is a $(0, b^2)$-Gaussian white noise independent of $X_0$, so that the signal process is ergodic. The distribution function $\nu$ has a density $q(x)$ relative to $dx$ from the Serial Gaussian (SG) family, in which $\sigma$ is the scaling parameter, the $\alpha_i$'s are nonnegative weight coefficients with $\sum_{i\ge 0} \alpha_i = 1$, and the $C_{2i}$ are the normalizing constants. The observation sequence is given by
$$
Y_n = X_{n-1}\,\xi_n,
$$
where $(\xi_n)_{n\ge 1}$ is a sequence of i.i.d. random variables. The distribution function of $\xi_1$ is assumed to have a density relative to $dx$, parameterized by a positive constant $\rho$. This filtering model is motivated by financial applications, where $|X|$ is interpreted as the stochastic volatility parameter of an asset price. As proved in [13], the filter (1.1) admits a finite dimensional realization provided that $\alpha_j \equiv 0$, $j > N$, for some integer $N \ge 1$: namely, for any time $n \ge 1$ the filtering distribution $\pi_n(dx)$ has a density of SG type with scaling parameter $\sigma_n$ and weights $\alpha_{in}$, which are propagated by a finite (growing with $n$) set of recursive equations driven by the observations. Thus the evolution of $\pi_n(dx)$ is completely determined by $\sigma_n$ and $\alpha_{in}$. Some stability analysis for the sequences $\sigma_n$ and $(\alpha_{in})_{i\ge 1,\, n\ge 1}$ has been done in [14].
This bound is verified by information-theoretic arguments.
Then, for any continuous function $f(x)$, $x \in \mathbb R$, growing not faster than a polynomial of order $p$,
$$
\lim_{n\to\infty} \mathrm E\,\bigl|\pi_n(f) - \bar\pi_n(f)\bigr| = 0.
$$
Proof. If $f$ is an unbounded function, it can be approximated by a sequence of bounded functions, say the truncations $f_\ell := (f \wedge \ell) \vee (-\ell)$, $\ell = 1, 2, \dots$. Further, for $k = 1, 2, \dots$, set $f_{\ell,k}$ accordingly, where $\tilde f_{\ell,k}(x)$ is chosen so that the function $f_{\ell,k}(x)$ is continuous and $|f_{\ell,k}(x)| \le \ell$. By the second Weierstrass approximation theorem (see e.g. [21]), for any positive number $m$ one can choose a trigonometric polynomial $P_{m,\ell,k}(x)$ such that
$$
\sup_{|x| \le k} \bigl| f_{\ell,k}(x) - P_{m,\ell,k}(x) \bigr| \le \frac 1m.
$$
Since $P_{m,\ell,k}(x)$ is a periodic function,
$$
|P_{m,\ell,k}(x)| \le \frac 1m + \max_{|y|\le k} |f_{\ell,k}(y)| \le \frac 1m + \ell, \quad \text{for any } |x| > k.
$$
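The truncation and trigonometric approximation steps can be sketched numerically; the cubic test function, the taper making the periodic extension continuous (playing the role of $\tilde f_{\ell,k}$), and the FFT-based partial Fourier sum are all illustrative assumptions, not the construction of the original proof.

```python
import numpy as np

k, ell = 4.0, 2.0          # window [-k, k] and truncation level ell
f = lambda x: x ** 3       # an unbounded, polynomially growing test function

def f_trunc(x):
    # truncate at level ell and taper to 0 near |x| = k, so that the
    # 2k-periodic extension is continuous (the role of f~_{l,k})
    return np.clip(f(x), -ell, ell) * np.clip(k - np.abs(x), 0.0, 1.0)

# trigonometric polynomial: partial Fourier sum computed via the FFT
N = 4096
xs = -k + 2 * k * np.arange(N) / N
coef = np.fft.rfft(f_trunc(xs)) / N
m = 200  # number of harmonics retained

def P(x):
    j = np.arange(m + 1)
    phase = np.exp(1j * np.pi * np.outer(x + k, j) / k)
    c = coef[: m + 1].copy()
    c[1:] *= 2.0           # real-signal symmetry: double the positive modes
    return (phase @ c).real

grid = np.linspace(-k, k, 2001)
err = np.abs(P(grid) - f_trunc(grid)).max()
print(err)  # small uniform error on [-k, k]
# periodicity keeps P globally bounded: |P| <= err + max|f_trunc| everywhere
```

The uniform error on $[-k, k]$ plays the role of $1/m$ in the proof, and periodicity gives the global bound on $|P_{m,\ell,k}|$ used for $|x| > k$.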