Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes

- The problem of filtering of unobservable components of a multidimensional continuous diffusion Markov process , given the observations of the (multidimensional) process taken at discrete consecutive times with small time steps, is analytically investigated. On the base of that investigation the new algorithms for simulation of unobservable components, , and the new algorithms of nonlinear filtering with the use of sequential Monte Carlo methods, or particle filters, are developed and suggested. The analytical investigation of observed quadratic variations is also developed. The new closed-form analytical formulae are obtained, which characterize dispersions of deviations of the observed quadratic variations and the accuracy of some estimates for . As an illustrative example, estimation of volatility (for the problems of financial mathematics) is considered. The obtained new algorithms extend the range of applications of sequential Monte Carlo methods, or particle filters, beyond the hidden Markov models and improve their performance.


Introduction
In the last two decades a great deal of works has been devoted to development and investigation of particle filters, or sequential Monte Carlo algorithms, for filtering of an unobservable process given observations of the other process , taken at discrete times with small time steps (see, for example, survey [1], collection [2], works [3 -5], and references wherein). The observations are being obtained consecutively, and an estimate for should be updated at each time . It is assumed that the whole process is a Markov process. In order to solve the problem of filtering of , given the sequence of observations with the use of Monte Carlo methods, the samples of the random sequences (from the distributions of when and are given) should have been simulated numerically for .
In general case, first of all, the problem arises: how to obtain an explicit and compact analytical expression (exactly or approximately) for the conditional probability density in order to simulate samples of when and are given.
Suppose that such sample sequences are being simulated, for . The joint probability density for the given observations and the sequence could be computed: where represents the transition probability density of the Markov process , and is the probability density for the initial joint distribution of the initial value and the initial observation .
In a large number of works devoted to particle filters the various algorithms of resampling were introduced in order to obtain the most a posteriori probable samples. For the problem of searching for the maximum a posteriori probable sample the following algorithm of resampling can be suggested: Introduce the values , for . Then can be considered as a weight of the sample sequence (as well as of the sample point ), which characterizes the a posteriori probability (or the importance) of the sample sequence in comparison with the other samples given . Due to the Markov property of the process , the recursive formulae for and can be written in general form; we have . ( Then, it is possible to calculate recursively the new values and , when the new measurement is obtained and the new sample point is augmented to the sequence , so that . It is easy to see, that it is not necessary to keep in memory all the sequence , but only the last point Then the samples with small weights could be deleted, but the "more important" samples should be continued with a few 'offsprings', so that the number of all sample sequences under consideration is equal to N (exactly or approximately). The point and the sequence that have the maximum weight correspond to the point ) and to the sample sequence that is maximally a posteriori probable given the observations , among all the considered sample points ) and sample sequences . Then the value of can be taken as the sought estimate of On the base of the Laws of Large Numbers, it could be proved that this Monte Carlo estimate converges to the maximum a posteriori probability estimate for if increases.
In most of works the additional assumption is accepted that the process is a Markov process in itself, and that the conditional probability density (for the observable process ) can be presented in explicit and simple analytical form. Such cases are often referred to as hidden Markov models. Then, in many works, the sample sequences are being simulated as trajectories of the Markov process in itself, since such a simulation can be easily done. The joint probability density, corresponding to the constructed sample and to the observations , is equal to Then the weights (or the other similar weights) could be easily calculated, and the above procedure of sampling and resampling could be realized, in order to obtain the estimate for . In some cases it would be better to simulate samples with the use of the conditional probability density which includes the observed value . But this could have required a large amount of computations if we do not have an explicit and compact analytical expression for that conditional probability density. Meantime, if the process is being simulated just as a Markov process in itself, the sample sequences represent samples from the a priori distribution for that can be far apart from the a posteriori distribution for given . Then resampling with the use of weights and significant increase in the number are needed in order to find the most a posteriori probable values of using the algorithm described above. Nevertheless, for hidden Markov models the particle filters with such a simple simulation of the sample sequences , with resampling based on consideration of weights, but with large , have been implemented and proved to be useful in some applications.
In general case when represents only some components of a multidimensional Markov process (and is not a Markov process in itself), it was not shown in the literature how to simulate sample sequences when the values of the other components, , are given (i.e. how to simulate sample sequences without a formidable amount of computations at each time ).
Note that the difficulties in obtaining the samples that correspond to the distributions have led to the introduction and the use of some 'proposal sampling distributions' (which can be simulated easier) in many works (see, for example, [2]). In the present work the explicit and compact analytical formulae and algorithms for simulation of the sample sequences , when is given, and for recursive calculations of , weights, and estimates are obtained for the general case of a multidimensional continuous diffusion Markov process .
For the problem of estimation of a function or the new estimates in explicit and closed analytical form are obtained in the following Section 2.2.
Moreover, in the quotients the 'scale' of all the is cancelled, and the information that could characterize the smallness of the a posteriori probability of all the generated samples is lost. In the present work the new sequential Monte Carlo algorithms are derived that include some tests in order to discard the samples of low a posteriori probability before the calculation of all the weights is done. The implementation of the suggested tests guarantees that the samples that remain under consideration belong to the domain where the a posteriori probability density is localized.
The obtained new algorithms improve performance of particle filters, or sequential Monte Carlo methods, and extend the range of their applications beyond the hidden Markov models.
The special case when the diffusion coefficients of the observable process depend on unobservable components is considered. In that case "observed quadratic variations" of the process can be introduced (when observations are taken in discrete time), and they contain a lot of information about . In the present paper, the analytical formulae that characterize the observed quadratic variations are obtained in explicit and closed form. With the use of these results, the algorithm of filtering and estimation that incorporates the observed quadratic variations into the set of observed data is developed.
The implementation of particle filters, as it is usual for Monte Carlo methods, requires a lot of repeating computations, but it became accessible and useful in some applications (for example, [5], [18]), since the speed of computations and the capacity of computers have increased dramatically in the last two decades.
2. Derivation of new recursive algorithms for Monte Carlo simulation and filtering of unobservable components of multidimensional diffusion Markov processes.

Simulation of trajectories of unobservable components. Analytical investigation of a multidimensional diffusion Markov process observed at discrete times.
Consider the multidimensional diffusion Markov process . The components are unobservable, but the other components are available for observations, which are being taken at discrete times .
A diffusion Markov process with continuous trajectories can be characterized by its drift and diffusion coefficients: , where is a small time step. Denote From the assumption that the trajectories of the process are continuous functions of it follows [6,7] The matrix is symmetric and nonnegative definite. Therefore, in general case, that matrix can be represented with the use of its 'square root' in the form , where stands for transpose matrix .
Then the process could be constructed as a solution of the system of stochastic differential equations: where are independent Wiener processes. The initial condition at is given as , where is a random variable independent of all with given probability density .
The system (7) should be interpreted as the system of stochastic integral equations: , (8) with stochastic integrals in the sense of Ito [11]. It is assumed that the drift coefficients satisfy the Lipschitz condition , with (9) It is also assumed that the diffusion coefficients are continuous and differentiable functions of and The Lipschitz conditions (9) guarantee that trajectories do not have finite escapes, i.e.
does not tend to infinity when tends to some finite moments of time.
The system of integral equations (8) with any given continuous trajectory can be solved with the use of successive approximations, which converge and define the continuous trajectory . Thus, the trajectories of the process are continuous with the probability 1.
The diffusion Markov process can be also constructed as the limit solution of the system of finite difference equations: , (10) where the random impacts are independent random variables (for all , ), with and ; ; . In particular, the increments of Wiener processes, can be used in the finite difference equations (10) instead of : , Here again it is assumed that the Lipschitz conditions (9) are satisfied, and that the functions and are differentiable with respect to . The construction of diffusion Markov processes by the passage to the limit in the scheme of finite difference equations when time steps tend to zero was first introduced and investigated by Academician S.N. Bernstein [9,10] in 1934, 1938 years. The works [9,10] were published at first in French. They are republished in Collected works of S.N. Bernstein, vol.4 (in Russian), Publishing House Nauka (Academy of Science of USSR, Moscow, 1964). The results, obtained in these works [9,10], define analytically all the joint probability distributions for the limit process , for all the random variables , ,…, for any and By the Theorem of A.N. Kolmogorov on extension (or continuation) of measures in function space [8], those multidimensional distributions can be continued to the measure on the σ-algebra that corresponds to the process with continuous time . The convergence of the linear broken lines, which represent interpolation of the solutions of finite difference stochastic equations (10) or (11), to the continuous trajectories of diffusion Markov processes was also proved in [19]. Thus, the diffusion Markov process is well defined by the system of finite difference equations (which could be more general, but similar to (10)) and by the passage to the limit from those Markov processes with discrete time [9,10]. Analytical methods and solutions for some problems of filtering and estimation, based on recursive finite difference stochastic equations, with the passage to the limit (as the time steps tend to zero), similarly to the scheme of S.N. Bernstein [9,10] were developed and investigated in [13 -16].
In the present work, we are interested to describe analytically the conditional probability density . Consider the local increment of the Markov process when the value is given, with small time step ; denote . We have the relations (4) -(6) for the moments of given Then the characteristic function for the random value can be presented in the form: , where is a real vector. The last equality follows from the known expression for the characteristic function of a random variable with the use of its moments (see, for example, [12]). The inverse transformation provides the representation , (13) where is the inverse matrix of (with 1 The above expressions (12), (13) show that the local increment (in the small, but finite time interval ) of the multidimensional diffusion Markov process with continuous trajectories, when is given, can be considered as a multidimensional Gaussian random variable.
Using the Theorem on Normal Correlation, we find the following expressions for the first and second moments of the conditional probability distribution of the increments (with ) provided that the increments (with ) and the value are given (see [13] . Hereinafter the summation is being made over repeated indices, with , the matrix represents the inverse or pseudo-inverse (Moore -Penrose) matrix of the matrix of the diffusion coefficients of the observable components , so that For the probability density of the increments we obtain the following expression: , where is the normalization factor.
Note, that in the theory of filtering of diffusion Markov processes with observations made in continuous time , i.e. if is supposed to be known exactly on the time interval , in such case it is assumed that the diffusion coefficients , with , do not depend on unobservable components In the contrary case (as it was pointed out by this author in [13], Chapter 3, and [16], Chapter 3) since the diffusion coefficients could be (at least, theoretically) precisely restored on the base of a single realization of the observed trajectory on the small time interval (no matter how small the taken value of δ is), the functions could have been incorporated into the set of observable functions. In the problem at hand, when filtering of the process should be obtained on the base of observations taken at discrete times with small, but finite time steps , that restriction may be cancelled. We shall consider further how to incorporate some estimates of the values of given in order to improve filtering of The matrix is symmetric and nonnegative definite, and it can be presented in the following form: with , Then the samples , when and are given, can be simulated as where are independent samples from Gaussian standard distribution, with , (for all , , ; . We shall denote shortly Note, that the above expressions (14) - (17) show that if all the diffusion coefficients vanish (with , ) and if , do not depend on , then will be simulated without the use of . Only the initial measurement will be taken into account: the initial sample point should be simulated as a random variable with the probability density . (18) This particular case do not correspond to all the hidden Markov models: The process could be a Markov process in itself, but it is still possible, that some coefficients are not equal to zero, for example, in case, if there are some 'white noises', which enter simultaneously into equations for and into 'random disturbances or random errors' of the observable process .
In general case of a multidimensional continuous Markov diffusion process , the influence of the observed data is manifested in the expressions (17), which describe simulation of the random samples .
The values and can be easily calculated: (19) where the conditional probability densities for the increments, ) , ) are Gaussian densities with moments determined by (14), (15), and (16). And the above procedure of sampling and resampling, which corresponds to a search for the point that provides maximum to the a posteriori probability density, could be implemented. In the next sections 2.2 -2.4 the results of the further analytical investigation and development of the algorithms are presented.

Estimating of a function
Consider the problem of estimation of some functions or given The general expression for the estimate can be written in the following form: . (20) It is assumed that for the considered functions or this conditional expectation exists. In case of a Markov process we can write: .
In our case of a multidimensional continuous diffusion Markov process we have derived the explicit analytical expressions (19), (16). Hence, in this case we obtain , where ) represents the known normalization factor. Denote shortly .
If samples , with and , have been independently taken from the distribution , then, in accordance with basics of Monte Carlo methods, the following integrals can be interpreted as mathematical expectations. Then we obtain , and similarly .
Due to the Laws of Large Numbers, the accuracy of those approximate estimates increases as N increases.
Then the sought estimate can be written in the following form: In most of applications, we can assume that there exists the mathematical expectation of the random value , where is fixed and represent the results of independent random simulations with the use of (17) In case of hidden Markov models where the unobservable process is a Markov process in itself, and the observable process is described by a stochastic differential equation: where is a given nonlinear function with respect to , and is a standard Wiener process, the above estimate takes the form, which is similar to the estimates for that were earlier constructed and studied for this particular case in the literature. The novelty of the estimates (23) -(27) and the simulation (17) are being independently taken from the distribution . Then those sample sequences (with 'offsprings') will be continued until the next time of branching, , except for some sample sequences that are discarded before since their weights become 'too small' (for example, smaller than some chosen threshold ); and so on. For the simplicity of discussion, we can renumber again all the current sample sequences (that still exist at the time ) as . Consider at first the case when all the sample sequences are being continued without discarding. Then their number is growing, and at the time it is equal to (with ).
Consider the estimate , under condition that the above branching sampling procedure is being implemented. Denote the time interval that contains , so that . We can consider the Monte Carlo estimates for the integrals (23), (24) for each -th 'tree' of the branching sample sequences, which begins at the point , with . Denote these random values as , with . Then the random values are independent of each other, and they have one and the same probability distribution, and . Then they obey the Strong Law of Large Numbers, so that converges with probability one to the sought integral as .
But many of the exponential weights are rapidly decreasing with increase in the number of time steps. Therefore, it is possible to discard some highly a posteriori improbable sample sequences , which do not provide noticeable contribution into the estimates . The numbers , , and the threshold can be adjusted (in practical implementation for a particular class of applications) in order to decrease the amount of all the calculations and make them feasible, and at the same time to keep the current sample points in the domain where the a posteriori probability density is localized. In practical implementation, it can be also purposeful to begin branching not at the fixed moment of time but at some current moment of time when the number of all existed sample sequences (or sample points) becomes less than some given fraction of .
It is worth noting that in the following Section 2.4 the tests (47), (48) are obtained that allow detection and rejection of samples of low a posteriori probability to be done without the use of the weights or (for , i.e. independently for each sample sequence . Thus, in the above algorithm we can use such tests instead of the comparison of the weight with the threshold . Then the algorithm described above can be effectively implemented with the use of parallel computing, since the simulation and continuation (the branching resampling) of each sample sequence can be done independently of the other samples with . The values determined by (25) or (27) will be also calculated recurrently and independently for each sample sequence . Thus, that large amount of calculations could be effectively performed with the use of parallel computing. And only on the other stage all the obtained accumulated values (with ) will be used for the calculation of the weights , which are needed for the calculation of the sought estimate .
The new Monte Carlo algorithm of estimation of or (presented above) is derived straightforwardly from Bayes formula (20). The estimates are constructed with the use of the samples or that are simulated by (17), and their weights are defined by (23) -(26), (27).
For the particular case of hidden Markov models, another algorithm with random branching resampling for 'particle filtering' of unobservable Markov 'signal' was developed, that is the Sequential Importance Resampling (SIR) algorithm; it is studied in the works [4], [1] (for the processes with discrete time). In that SIR algorithm with random branching resampling, the a posteriori probability distribution for is approximated by some 'cloud of random particles' with weights , With the use of our new algorithm of simulation of unobservable components given (see Section 2.1, (17) ) and the new closed form analytical expressions (14) - (17), (23) -(27), that SIR algorithm with random branching can be generalized for the general case of filtering of unobservable components of a multidimensional diffusion Markov process , in the following way.
In the generalized algorithm that we are proposing the sample sequence should be chosen for continuation at the time of branching (and kept up to the next time of branching, ) with probability (which will be determined below) in each of independent attempts to continue their ensemble, so that the expectation of the random number of its 'offsprings' is equal to , with , where denotes the number of all the sample points that still existed at the time . We can suggest that the times of branching , with , (Note, that in the standard SIR algorithms). The probability depends on all the sample points , … , or sample sequences . Such a procedure of random resampling is similar to the standard SIR algorithm, but we have to determine the probability for the general case of multidimensional diffusion Markov processes .
For the general case, with the use of analytical expressions (16)  In the case if the diffusion coefficients of the observed process do not depend on the unobservable process , the above expression can be written in the following more concise form: , where .
In case of hidden Markov models (28), which can be considered as a particular case of a multidimensional diffusion Markov process , from the above expression follows: , where . For the simplicity of notation, the above formula is written for the case of one-dimensional processes and , which satisfy (28). The last expression for (for the case ) is in agreement with the determination of the probabilities of continuation of the samples presented in [4], [1].
With the use of the above procedure of random branching and with our algorithm (17) of simulation of given , we can generalize the SIR algorithm of particle filtering with random branching, which was developed for hidden Markov models, to the general case of multidimensional diffusion Markov processes .
Thus, in the present paper a few different possible versions of algorithms of particle filters are provided for filtering of unobservable components given , which are justified theoretically. The new algorithm developed above, with branching sampling and recurrent calculation of the values (25) or (27), appears to be preferable for implementation with the use of parallel computing. The further practical implementation and comparison for various applications could be achieved in the future works.

Analytical investigation of observed quadratic variations.
Consider the case when some diffusion coefficients of the observable process depend on the unobservable process . We assume throughout this paper that the diffusion coefficients are continuous and differentiable functions with respect to . For simplicity of notation, consider at first the case when is one-dimensional. For example, consider the following model that plays an important role in financial mathematics [17]: Here are independent Wiener processes, μ and are known constants, and the functions and are continuous and differentiable with respect to . In the famous Black -Scholes models [17], the observable process in (29) represents the natural logarithm of a stock price, , so that , while the process corresponds to volatility, which is to be estimated given the path , . If the process were available for observations continuously over the time interval , the value could have been restored precisely, at least theoretically.
It is well known that a diffusion Markov process is not differentiable at any time . It can be proved that its quadratic variations , if time steps tend to zero, while ; here , with given , and the value of δ can be arbitrarily small. Note that the relation (30) will be also obtained as a consequence of the new analytical estimates that are developed below.
For the system (29), , and from (30) it follows that the value of could be restored with any desirable accuracy. Then , and the filtering problem would be solved exactly.
But the observations are being taken at discrete times with small, but finite time steps , so that it is impossible to observe precisely the values of In general case, we can calculate the 'observed quadratic variations', , and consider it as an estimate for The similar estimates for volatilities were introduced and considered in [17, chapter 15]. It is more convenient to use the recursive averaging instead of the moving averaging (31). Then (32) with In the limit, if all decreased and tended to zero, we would have obtained .
We can incorporate the observed quadratic variations into the set of observed data. We shall show further, in section 2.4, how to use that "observed quadratic variations" in order to reject at once the samples that are highly improbable given .
We may assume that the considered realization of the process , , is being obtained consecutively in accordance with finite difference stochastic equations (11). The value of observed quadratic variation is being accumulated in parallel, along with that realization of , so that the random increment will be produced after is realized. We are interested to describe the properties of the observed quadratic variations under condition that the realization of unobservable components, , is fixed, although it is unknown, and that the next measurement will arrive after the previous measurement is already given.
The conditional expectation .
In general case, as it was demonstrated above, the increment in small time step , given , can be described as the Gaussian multidimensional random variable with the probability density (16). The increments are independent from the past history before if is given.
For simplicity of discussion suppose that all the time steps We obtain: . That notation means that this small value is proportional to a value which decreases faster than if decreases.

The increments
and with are independent when is given. We obtain that the dispersion of the deviation , .
The process is considered on the finite time interval , so that the values of are limited. Thus, the variance of the deviation (37) is a value proportional to , and if the observed quadratic variation tends to the limit (33). We shall demonstrate below, that the probability distribution for the deviation or tends to the Gaussian one as . It follows that the deviations converge to zero if .
Besides, we obtained new analytical formulae (38) -(41), which describe the dispersion of the deviation (37) of the observed quadratic variation when observations are taken in discrete times, with small time steps.
For the system (28), the general analytical formulae (38) -(40), obtained in the present paper, provide characterization of the accuracy of the estimate of volatility , which is constructed with the use of the observed quadratic variation. If this accuracy satisfies requirements, the problem of estimation is solved. In that case the value of the given time step can be considered as 'small enough'.
In general case, if the value of cannot be uniquely recovered only on the base of observed quadratic variations, the further filtering may be needed.
Finally, we are going to demonstrate that the probability density of the random value can be approximately described as a Gaussian one. The deviation contains the sum of squares of the increments , which are independent of the past history before given , . The increments (given , ) can be described as the Gaussian random variables with probability density (16). The sum of squares of Gaussian random variables is not a Gaussian random variable. However, the probability density for the random deviation can be approximately described as the Gaussian probability density. Consider the case when (with are the functions of and only. Then the variance of deviation is equal to (40), which represents a value proportional to (here the time step can be small, but is finite).
Note, that for a Gaussian random variable with the following relations hold true: , where is a constant, and for the odd moments , with . Those relations can be useful in estimating of the moments of when , is given. Then the consideration of the higher moments of the deviation proves that , if Then the characteristic function of the random deviation takes the form similar to (12), which implies the Gaussian expression for its inverse transformation, similar to (13). Hence, the probability density of that deviation can be approximately described as a Gaussian probability density. The smaller the higher the precision of that approximation, although the deviation itself tends to zero if But in practice the time step should not be chosen too small since in our mathematical model is considered as a diffusion random process, which is not differentiable at any time . That is not the case in practice where we have a smooth trajectory , and the quadratic variation of is equal to zero. Diffusion approximation can be accepted when the time steps , where represents the characteristic time span of decay of correlation, so that correlation between the random values and (given ) would be negligibly small [13 -16].

Detection and rejection of highly a posteriori improbable samples
In the process of resampling, the a posteriori improbable samples are being rejected when their weights become small. Besides, it is possible to discard some highly a posteriori improbable samples before all the weights or are calculated. It can be done with the use of the following tests that are based on analysis of the obtained analytical expressions (19), (21), (22).

The measurements
are being obtained under condition that there is an unobservable realization of the process , which is unknown. (Here the superscript "tr" stands for "true"). The first and the second moments of the random variables given ).
For simplicity of discussion we can assume that all the time steps are equal to . (46) In order to reject the samples , which are highly improbable given , we can use approximately the following test. If the considered dynamic system is stable, the following value can be used as an approximate estimate for the above expression (46), if the hypothesis is true and the measurements are obtained: 22 .
The probability of the following event: is small under condition that is true. Hence, if the inequality (47) is satisfied for the considered sample , the hypothesis should be rejected, and that sample should be discarded. If all the considered sample sequences are discarded, the new sample points should be generated as initial points, i.e. as samples from the initial distribution (18). For the samples that remain under consideration the values and or will be recursively computed. The samples that were not discarded can be continued with a few 'offsprings'. Due to the test (47) the samples that are highly a posteriori improbable (when the sequence of observations is given) will be rejected at once, before the weights or are computed. is small if the hypothesis is true. Hence, if the inequality (48) is satisfied at least for one number ρ, with , then the hypothesis should be rejected, and the sample should be discarded at once, before the calculation of all the weights is done.
In case if , with , the following test can be used instead of (48): .
The derivation of the algorithms is completed.

Conclusion
In this work the analytical investigation of multidimensional diffusion Markov processes considered at discrete times , with small time steps , is developed. The analytical formulae in closed form for the conditional probability density for increments of unobservable components given increments of the observable components and for the probability density of , given , are obtained. On the base of this investigation the new algorithms for simulation of unobservable components, , of a diffusion Markov process given the measurements of the other components and the new algorithms of nonlinear filtering and estimation with the use of sequential Monte Carlo methods, or particle filters, are derived and suggested.
The analytical investigation of observed quadratic variations is also developed. If diffusion coefficients of the observed components, , depend on unobserved components, , then the observed quadratic variations can be used for estimation of unknown values of . The new analytical formulae, (38) -(40), obtained in this work, characterize the accuracy of such estimates. It is shown how to incorporate the observed quadratic variations into the set of observed data.
Besides the use of the weights or , some new tests for detection and rejection of highly a posteriori improbable samples are derived. Realization of such tests provides opportunity for implementation of the new suggested particle filtering algorithm (with branching sampling and recurrent accumulation of the values (25) or (27)) with the use of parallel computing.