Inference for Convolutionally Observed Diffusion Processes

We propose a new statistical observation scheme of diffusion processes named convolutional observation, where it is possible to deal with smoother observation than ordinary diffusion processes by considering convolution of diffusion processes and some kernel functions with respect to time parameter. We discuss the estimation and test theories for the parameter determining the smoothness of the observation, as well as the least-square-type estimation for the parameters in the diffusion coefficient and the drift one of the latent diffusion process. In addition to the theoretical discussion, we also examine the performance of the estimation and the test with computational simulation, and show an example of real data analysis for one EEG data whose observation can be regarded as smoother one than ordinary diffusion processes with statistical significance.


Introduction
We consider a d-dimensional diffusion process defined by the following stochastic differential equation, where λ > 0, {w t } t≥−λ is a standard r-dimensional Wiener process, x −λ is an R d -valued random variable independent of {w t } t≥−λ , α ∈ Θ 1 and β ∈ Θ 2 are unknown parameters, Θ 1 ⊂ R m 1 and Θ 2 ⊂ R m 2 are compact and convex parameter spaces, a : R d × Θ 1 → R d ⊗ R r and b : R d × Θ 2 → R d are known functions. Our concern is statistical estimation for α and β from observation. θ = (α , β ) denotes the true value of θ := (α, β).
We denote the observation as the discretised process X ih n ,n : i = 0, . . . , n with discretisation step h n > 0 such that h n → 0 and T n := nh n → ∞, where the convoluted process X t,n t≥0 is defined as X t,n := t t−ρh n V h n (t − s) X s ds = R V h n (t − s) X s ds = (V h n * X) (t) , where V h n is an R d ⊗ R d -valued kernel function whose support is a subset of [0, ρh n ], and ρ > 0 such that sup n ρh n ≤ λ. In this paper, we specify V h n = V ρ,h n which is a parametric kernel function whose support is a subset of [0, ρh n ] defined as δ (t) is the Dirac-delta function, ρ = ρ (1) , . . . , ρ (d) T ∈ Θ ρ := [0, ρ] d is the smoothing parameter determining the smoothness of observation. That is to say, the observed process is defined as follows: for all = 1, . . . , d. Let us consider both the problems that (i) ρ is a known parameter, and (ii) ρ is an unknown one whose true value is denoted by ρ and this is estimated by observation X ih n ,n , and the parameter space is denoted as Ξ := Θ ρ × Θ, where Θ := Θ 1 × Θ 2 . When assuming ρ as a known parameter, we can find literature for parametric estimation for α and/or β based on observation schemes which can be represented as special cases for some specific ρ. If ρ = 0, our scheme is simply equivalent to parametric inference based on discretely observed diffusion processes X ih n : i = 0, . . . , n studied in [1][2][3][4][5][6][7][8] and references therein. If ρ = [1, . . . , 1] T , we can regard the problem as parametric estimation for integrated diffusion processes discussed in Gloter [9], Ditlevsen and Sørensen [10], Gloter [11], Gloter and Gobet [12], Sørensen [13]. Even for the case ρ = [0, . . . , 0, 1, . . . , 1] T where some axes correspond to direct observation and the others do to integrated observation, we give consistent estimators for α and β by considering the scheme of convolutionally observed diffusion processes and this is one of the contributions of our study.
What is more, our contribution is to consider the scheme where ρ is unknown and succeed in representation of the microstructure noise which makes the observation smoother than the latent diffusion process itself, which can be seen in some biomedical time series data. Statistical modelling of biomedical time series data with stochastic differential equations has been one of the topics eagerly studied e.g., [14][15][16][17]. As [18] states, the existence of microstructure noise in financial data affects realised volatilities to increase as the subsampling frequency gets higher, for instance, see Figure 7.1 on p. 217 in [19]. However, realised volatilities of some biomedical data such as EEG decrease as subsampling frequency increases. For instance, some time series data for the 2nd participant in the dataset named Two class motor imagery (002-2014) of BNCI Horizon 2020 [20] show clear tendency of decreasing realised volatilities as subsampling frequency increases. Figure 1 shows the path of the 2nd axis of the data S02E.mat BNCI Horizon 2020 [20] for all 222 seconds (the observation frequency is 512 Hz, and hence the entire data size is 113664) and that for the first one second; it seems to perturb like a diffusion process. We define realised volatilities with subsampling as for a sequence of real-valued observation {Y i } i=0,...,n , where k = 1, . . . , 100 is the subsampling frequency parameter, and provide a plot of the realised volatilities the 2nd axis of the data S02E.mat in Figure 2: The altitudes of the graph represented in the y-axis correspond to the values of the realised volatilities RV (k) with subsampling at every k observation represented in the x-axis. It is observable that the increasing subsampling frequency results in decreasing realised volatilities, which cannot be explained by the existent major microstructure noises, e.g., see [21][22][23][24][25]. To explain this phenomenon, we consider the smoother process than the latent one, though ordinarily microstructure noises make the observation rougher than the latent process, because quadratic variation of a sufficiently smooth function is zero. One way to deal with smoother observation than the latent state is convolutional observation. As a concrete example, we show a convolutionally observed diffusion process and its characteristics in realised volatilities: let us consider the following 1-dimensional stochastic differential equation defining an Ornstein-Uhlenbeck (OU) process: where λ = 10 −2/5 . We simulate the stochastic differential equation by Euler-Maruyama method see [26] with parameters n = 10 7 , h n = 10 −5 , and T n = 10 2 and its convolution approximated by summation with the smoothing parameter ρ = 10 (for details, see Section 5). Figure 3 shows the latent diffusion process and the convoluted observation on [0, 1], and we can see that the observation is indeed smoothed compared to the latent state. In Figure 4, we also give the plot of realised volatilities of the convolutionally observed process with subsampling as Figure 2. It is seen that the convolutional observation of a diffusion process also has the characteristics of decreasing realised volatilities as subsampling frequency increases, which can bee seen in some biological data such as BNCI Horizon 2020 [20]. Of course, graphically comparing characteristics of simulation and real data is insufficient to verify the convolutional observation with smoothing parameter ρ > 0 in 1-dimensional case; therefore, we propose statistical estimation method for unknown ρ and hypothesis test with the null hypothesis H 0 : ρ = 0 and the alternative one H 1 : ρ = 0 from convolutional observation in Section 3. Moreover, in Section 6, we examine the real EEG data plotted in Figure 2 by the statistical hypothesis testing we propose, and see it is more appropriate to consider the data as a convolutional observation of a latent diffusion process with ρ = 0 rather than direct observation of the latent process, which indicates the validity to deal with the problem of the convolutional observation scheme with unknown ρ.
The paper is composed of the following sections: Section 2 gives the notations and assumptions used in this paper; Section 3 discusses the estimation and test for smoothing parameter ρ, and the discussion provides us with the tools to examine whether we should consider the convolutional observation scheme; Section 4 proposes the quasi-likelihood functions for the parameter of diffusion processes α and β, and corresponding estimators with consistency; Section 5 is for the computational simulation to examine the theoretical results in the previous sections; and Section 6 shows an application of the methods we propose in real data analysis.

Notations
First of all, we set A (x, α) := a (x, α) ⊗2 , a (x) := a (x, α ), A (x) := A (x, α ) and b (x) := b (x, β ). We also give the notation for a matrix-valued function G (x, α|ρ) such that G (i,j) (x, α|ρ) := The continuity of the function f G is shown in the supplementary material. For the detailed discussion of the necessity of f G , see Remark 4. In addition, we also give the notation used throughout this paper. (Ω, P, F , F t ) denotes the stochastic basis, where F t := σ (x −λ , w s : s ≤ t).
• µ ( f (·)) denotes the integral of a µ-integrable function f where µ is a measure.

Assumptions
With respect to X t , we assume the following conditions.
(iii) There exists a unique invariant measure ν 0 on R d , B R d and for all p ≥ 1 and f ∈ L p (ν 0 ) with polynomial growth,

Remark 1.
For sufficient conditions of these regularity ones, please see [A1] and Remark 1 in Uchida and Yoshida [7].
[A2] There exists C > 0 such that a : With the invariant measure ν 0 , ξ := ρ T , θ T T , and ξ denoting the true value of ξ, we define For these functions, let us assume the following identifiability conditions hold.

Estimation and Test of the Smoothing Parameter
In this section, we discuss the case where the smoothing parameter ρ of the kernel function V ρ,h n is unknown. The estimation is significant for estimation of α and β since we utilise the estimate for ρ in quasi-likelihood functions of α and β. The test problem for hypotheses H 0 : ρ = 0 and H 1 : ρ = 0 is also important to examine whether our framework of convolutional observation is meaningful.

Estimation of the Smoothing Parameter
For simplicity of notation, let us consider the case ρ > 2; otherwise the discussion is quite parallel. We should note that for all i = 1, . . . , d, Let us consider the estimation of ρ (i) with using the next statistics: the full quadratic variation because of Proposition 3 in supplementary material Appendix A, and the reduced quadratic variation defined as 1 converges in probability as follows.

Lemma 1. Under [A1]
, we have the convergence in probability such that Then we define the ratio of those statistics such that where R has the next property.
We defineρ n such that and then continuous mapping theorem for convergence in probability verifies the next result.

Test for Smoothed Observation
For all i = 1, . . . , d, we consider the next hypothesis testing: Let us consider the following test statistic: We also obtain the result to support the consistency of the test.

Theorem 3.
Under H 1 and [A1], we have convergence such that for any c ∈ R,

Remark 3.
These results are intuitive since the quadratic variation and the reduced one with some appropriate scaling should converge to the same value if H 0 holds and the quadratic variation with some appropriate scaling should converge to the value which is smaller than the value which the reduced one with scaling converge to.
Hence when we set the significance level α sig ∈ (0, 1), then we have the rejection region where Φ is the distribution function of the standard Gaussian distribution. Theorem 3 supports the consistency of the test. This test is essential in terms of examining the validity to consider the scheme of convolutional observation: if ρ = 0, then the ordinary observation scheme can be applied, but if ρ = 0, then we have the motivation to consider the convolutional observation scheme.

Least Square Estimation of the Diffusion and Drift Parameters
Let us set the least-square quasi-likelihood functions such that and the least-square estimatorsα n andβ n satisfying when ρ is known, and when ρ is unknown.

Remark 4.
The function G and f G are indeed complex and confusing; hence, we can consider some alternative estimation methods with subsampling or pre-averaging. However, these methods also have the problem what size of subsampling or pre-averaging is proper and the result of the estimation can be dependent on tuning the subsampling size or pre-averaging one. Therefore, our work proposes the estimation method which uses the observation without subsampling or pre-averaging.

Simulations
In this simulation section, we only consider the case where ρ is unknown and should be estimated by data with the method proposed in Section 3.

1-Dimensional Simulation
We examine the following 1-dimensional stochastic differential equation whose solution is a 1-dimensional Ornstein-Uhlenbeck (OU) process: 10], and λ = 10 −7/3 . The procedure of the simulation is as follows: in the first place we iterate an approximated OU process by Euler-Maruyama scheme, for example, see [26] with simulation parameters n sim = 10 5+m , h sim = 10 −10/3−m , T sim = 10 5/3 where m ∈ N is a parameter to determine the precision of approximation; secondly, we give the approximation of convolution by summation such that where i = 0, . . . , n, the sampling frequency h n = 10 −10/3 and n = 10 5/3 . In this Section 5.1, we fix the true value of α and β as α = 3 and β = [−2, 1] T , and change the true value of ρ ∈ Θ ρ := [0, 100] to see the corresponding changes of performance of estimation for ξ, and test for ρ in comparison to estimation by an existent method called local Gaussian approximation (LGA) for parametric estimation of discretely observed diffusion processes, e.g., see [4] which does not concern convolutional observation. All the numbers of iterations for different ρ's are 1000.
In the first place, we see the estimation and test with small values of ρ such that ρ = 0, 0.1, 0.2, . . . , 1 to observe how the performance of statistics changes by difference in ρ. Table 1 summarises the results of simulation ofρ n for ρ's with respective empirical means and root mean square error (RMSE). We can see the proposed estimatorρ n works well for small ρ. With respect to the performance of the test statistic T n proposed in Section 3.2, Table 2 shows the empirical ratio of the number of iterations whose T n is lower than some typical critical values where Φ indicates the distribution function of 1-dimensional standard Gaussian distribution as well as the maximum value of T n in 1000 iterations. Table 2. Empirical ratio of test statistic T n less than some critical values, and the maximum value of T n in 1000 iterations.
Empirical Ratio of T n Less Than ... Even for ρ = 0.1, the simulation result supports the theoretical discussion of the test with consistency. Because Φ 10 −16 = −8.222, all the iterations with ρ ≥ 0.3 result in rejection of H 0 with substantially significance level 10 −16 . Let us see the estimation for α and β by our proposal method and that by the LGA in Table 3. Table 3. Estimation of θ by the proposed method and LGA with small ρ.

The Proposed Method
LGA α β (1) β (2) α β (1) β (2)  Note that the biases of the estimation by LGA increase as the true value of ρ gets larger, while the estimation by our proposal method is not influenced by the true value of ρ. This result of the simulation supports the theoretical discussion in Section 4 stating the consistency ofθ n , and necessity to consider the convolutional observation scheme where the LGA method does not work properly.
Secondly, we consider the estimation and test with large ρ such that ρ = 10, 15, 20 to see if our proposal methods work even for large ρ. We note that the maximum values of T n for ρ = 10, 15, 20 in 1000 iterations are −55.091, −68.462 and −79.105, and hence we can detect the smoothed observation easily. Table 4 shows the empirical means and RMSEs ofρ n for ρ = 10, 15, 20 and we can see that the RMSEs increase as ρ's increase; it indicates the difficulty to estimate ρ accurately when ρ is large.  Table 5 summarises the estimation for θ by means and RMSE, and tells us that although the large RMSE ofρ n results in increase of RMSE ofα n , estimation by our method is substantially better than that by LGA. Table 5. Estimation of θ by the proposed method with large ρ.

2-Dimensional Simulation
We consider the following 2-dimensional stochastic differential equation whose solution is a 2-dimensional OU process: The simulation is conducted with the settings as follows: firstly, we iterate the OU process by Euler-Maruyama scheme with the simulation sample size n sim = 10 5+m , T sim = 10 5/3 and discretisation step h sim = 10 −10/3−m , where m = 2 is the precision parameter for approximation of convolution; in the second place, we approximate the convoluted process with summation such that  Table 6 summarises the estimation for ρ with the method proposed in Section 3 (the inverse of r is computationally obtained) with empirical means and empirical RMSEs ofρ n in 1000 iterations. We can see thatρ n is sufficiently precise to estimate the true value of ρ indeed in this result, which is significant to estimate the other parameters α and β. ρ (1) ρ (2) true value 2. With respect to the estimation for α and β, we compare the estimates by our proposal method with that by LGA which does not concern convolutional observation. Table 7 is the summary for α estimate by both the methods: Table 7. Summary for α estimate.
α (1) α (2) α (3)  We can see that the estimation precision for α by our proposal outperforms those by LGA. This results support validity of our estimation method if we have convolutional observation for diffusion processes. Regarding β, the simulation result is summarised in Table 8: Table 8. Summary for β estimate.
β (1) β (2) β (3) β (4) β (5) β (6)  Though the estimation for β (3) by our method has the smaller bias in comparison to that by LGA, the RMSE of our method is larger than that of LGA; in the estimation for other parameters, our proposal method outperforms the method by LGA. We can conclude that our proposal for estimation of α and β concerning convolutional observation performs better than that not considering this observation scheme.

Estimation and Test for the Smoothing Parameters
In the first place, we pick up the first 15 axes of the dataset and computeρ n and T n proposed in Section 3.1 and Section 3.2 respectively. The results are shown in Table 9.
We can observe that all the 15 time series data have the smoothing parameter ρ > 0 with statistical significance when we assume ordinary significance levels. These results motivate us to use our methods for parametric estimation proposed in Section 4 when we fit stochastic differential equations for these data.

Parametric Estimation for a Diffusion Process
We fit a 1-dimensional OU process for the time series data in the 2nd column of the data file S02E.mat with 512 Hz observation for 222 s (the plot of the path can be seen in Figure 1), whosê ρ n = 1.037 is the largest among those for the 15 axes and it is larger than 0 with statistical significance. According to the simulation result shown in Section 5.1, this size of the smoothing parameter gives critical biases when we estimate α and β with LGA method not concerning convolutional observation scheme.
In the next place, we fit α and β with the least square method proposed in Section 4, and then we have the next fitting result: It is worth noting that this fitting result is substantially different to that by LGA as shown above: hence these results indicate the significance to examine if the observation is convoluted with the smoothing parameter ρ > 0 and otherwise the estimation is strongly biased.

Summary
We have discussed the convolutional observation scheme which deals with the smoothness of observation in comparison to ordinary diffusion processes. The first contribution is to propose this new observation scheme with the statistical test to confirm whether this scheme is valid in real data. The second one is to prove consistency of the estimatorρ n for the smoothing parameter ρ, those for parameters in diffusion and drift coefficient, i.e., α and β, of the latent diffusion process {X t }. Thirdly, we have examined the performance of those estimators and the test statistics in computational simulation, and verified these statistics work well in realistic settings. In the fourth place, we have shown a real example of observation where ρ = 0 holds with statistical significance.
If we combine the test for noise detection by Nakakita and Uchida [28] and that for smoothed observation proposed in this paper, we can test if the observed process is diffusion or not in terms that the observation is noisy or smoothed. Note that the realised volatilities of the noisy observation of diffusion processes increase as observation frequency increases while those of the smoothed observation decreases as the frequency grows. On that point, the noisy observation in ordinary meaning and the smoothed one are 'opposite' to each other.
These contributions, especially the third one, will cultivate the motivation to study statistical approaches for convolutionally observed diffusion processes furthermore, such as estimation of kernel function V appearing in the convoluted diffusion X t := (V * X) (t), test theory for parameters α and β as likelihood-ratio-type test statistics, for example, see [29,30], large deviation inequalities for quasi-likelihood functions and associated discussion of Bayes-type estimators, e.g., [6,[31][32][33]. By these future works, it is expected that the applicability of stochastic differential equations in real data analysis and contributions to the areas with high frequency observation of phenomena such as EEG will be enhanced.

the Results for Some Laws of Large Numbers
In this subsection, we give the notations and statements of propositions without proofs except for Proposition 3: the detailed proofs are given in supplementary material. We assume ∆ ≤ λ, k ∈ N, M > 0, and consider a class of R k ⊗ R d -valued kernel functions on R denoted as K (∆, k, M) such that for all Φ ∆ ∈ K (∆, k, M), it holds: is continuous and at most polynomial growth,

Remark 5. Note one sufficient condition for
for the Cauchy-Schwarz inequality, and Fubini's theorem.
It is easily checked that V ρ,h n ∈ K (max i=1,...,d ρ i h n , d, d).
Let p denote an integer such that sup n∈N ph n ≤ λ, ∆ n := ph n . We set the sequence of the kernels {Φ ∆ n ,n } n∈N such that Φ ∆ n ,n ∈ K (∆ n , d, M) for some M > 0, ∆ n 0 Φ ∆ n ,n ds = I d and there exist a matrix a set L ⊂ {0, . . . , p} such that there exist functions D : We define and the following random quantities such that ν n ( f (·, ξ)) := 1 n n ∑ i=1 f X ih n ,n , ξ , and their first and second derivatives and themselves are at most polynomial growth uniformly in ξ ∈ Ξ. We set p = [ρ] + 1, ∆ n = ph n and see the evaluation of B, D and G when setting our kernel {Φ ∆,n } = V ρ,h n as follows (for the derivation of the evaluation, see the supplementary material):

Proof of the Results in Section 3.1
Proof of Lemma 1. By following the proof of the Proposition 3, it is sufficient to evaluate ρ,h n ph n − s ds ds for the asymptotic behaviour of the reduced quadratic variation by choosing Hence, we obtain the proof (for details, see the supplementary material).

Proof of Lemma 2. Continuity is obvious, and monotonicity is obtained as follows
The inverse can be obtained directly.
Proof of Theorem 1. It follows from Lemma 1, 2 and continuous mapping theorem.

Proofs of the Results in Section 3.2
Proof of Theorem 2. We can clearly prove the result by using Lemma 7 in Kessler [4], Proposition 7 in Nakakita and Uchida [28], and Slutsky's theorem.
Proof of Theorem 3. By Lemma 1, there exists a number < 0 such that and it is obvious that by Proposition A in Gloter [9], and a parallel result holds for X (i) (k−1)h n ,n − X (k−ρ−1)h n 4 . Hence we obtain the result.

Proof of the Results in Section 4
Proof of Theorem 4. We only deal with the case where ρ is unknown because the discussion for the case where ρ is known is parallel. First of all, we prove the consistency ofα n . We obtain that 1 In the next place, we consider the consistency ofβ n . Firstly, we consider the case max i ρ (i) ∈ ( − 1, ) for an integer ∈ {1, . . . , [ρ] + 1}. Then it is sufficient to show 1 nh n H 2,n (β|ρ n ) − 1 nh n H 2,n (β |ρ ) → P V 2 (β|ξ ) uniformly in β due to Assumption [A3]. Because the evaluation D j (x) = O where j ≥ max i=1,...,n ρ (i) * + 1 using independent increments of the Wiener process, Proposition 1 and Proposition 2 verify where F j (β) := − 1 nh 2 n n ∑ k=1+j X kh n ,n − X (k−1)h n ,n − h n b X (k−1−j)h n ,n , β 2 .