The YUIMA Project : a Computational Framework for Simulation and Inference of Stochastic Differential Equations

The Yuima Project is an open source and collaborative effort aimed at developing the R package named yuima for simulation and inference of stochastic differential equations. In the yuima package, stochastic differential equations can be of very abstract type, multidimensional, driven by Wiener process or fractional Brownian motion with general Hurst parameter, with or without jumps specified as Lévy noise. The yuima package is intended to offer the basic infrastructure on which complex models and inference procedures can be built on. The computational framework implemented allow for the estimation of high frequency data and also offer the ability to perform Monte Carlo analysis using cluster infrastructure whenever available in a transparent way to the user. Some real examples of model implementation and data estimation will be considered.


Introduction
The plan of the YUIMA 1 Project is to construct the bases for a complete environment for 1 YUIMA is both the acronym for Yoshida-Uchida-Iacus-Masuda-Andothers but also the name of an important character in Buddhism religion (http://www.kyohaku.go.jp/eng/syuzou/meihin/kaiga/chuugoku/ item01.html)whose approach to problems fits well this project.
simulation and inference for stochastic processes via an R (R Core Team 2013) package called yuima.The yuima package adopts the S4 system of classes and methods (Chambers 1998).
Under this framework, the yuima package also supplies various functions to carry out simulation and statistical analysis.Both categories of procedures may depend on each other.Statistical inference often requires a simulation technique as a subroutine, and a certain simulation method needs to fix a tuning parameter by applying a statistical methodology.It is especially the case of stochastic processes because most of expected values involved do not admit an explicit expression.The yuima package facilitates comprehensive, systematic approaches to the solution.
Stochastic differential equations are commonly used to model random evolution along continuous or practically continuous time, such as the random movements of stock prices.Theory of statistical inference for stochastic differential equations already has a fairly long history, more than three decades, but it is still developing quickly new methodologies and expanding the area.The formulas produced by the theory are usually very sophisticated, which makes it difficult for standard users not necessarily familiar with this field to enjoy utilities.For example, by taking advantage of the analytic approach, the asymptotic expansion method for computing option prices (i.e., expectation of an irregular functional of a stochastic process) provides precise approximation values instantaneously.The expansion formula, which has a long expression involving more than 900 terms of multiple integrals, is already coded in the yuima package for generic diffusion processes.
The yuima package delivers up-to-date methods as a package onto the desk of the user working with simulation and/or statistics for stochastic differential equations.In the yuima package, stochastic differential equations can be of very abstract type, multidimensional, driven by Wiener process or fractional Brownian motion with general Hurst parameter, with or without jumps specified as Lévy noise.
The yuima package is intended to offer the basic infrastructure on which complex models and inference procedures can be built on.This paper explains the design of the yuima package and illustrates some examples of applications.The paper is organized as follows.Section 2 is an overview about the package.Section 3 describes the way in which models are specified in yuima.Section 5 explains asymptotic expansion methods.Section 6 is a review of basic inference procedures.Finally, Section 7 gives additional details and the roadmap of the YUIMA Project.
Although we assume that the reader of this paper has a basic knowledge of the R language, most of the examples are easy to understand if he/she knows stochastic differential equations intuitively or symbolically.

The yuima Package
The package yuima depends on some other packages, like zoo (Zeileis and Grothendieck 2005), which can be installed separately.The package zoo is used internally to store time series data.This dependence may change in the future adopting a more flexible class for internal storage of time series.

How to Obtain the Package
The yuima package is hosted on R-Forge and the web page of the Project is http://r-forge.r-project.org/projects/yuima.The R-Forge page contains the latest development version, and stable version of the package will also be available through CRAN.Development versions of the package are not supposed to be stable or functional, thus the occasional user should consider to install the stable version first.The package can be installed from R-Forge using install.packages("yuima",repos="http://R-Forge.R-project.org").If, the R-Forge system does not provide binary builds of the yuima package, the user can also try install.packages("yuima",repos="http://R-Forge.R-project.org",type="source").

The Main Object and Classes
Before discussing the methods for simulation and inference for stochastic processes solutions to stochastic differential equations, here we discuss the main classes in the package.As mentioned there are different classes of objects defined in the yuima package and the main class is called the yuima-class.This class is composed of several slots.Figure 1 represents the different classes and their slots.The different slots do not need to be all present at the same time.For example, in case one wants to simulate a stochastic process, only the slots model and sampling should be present, while the slot data will be filled by the simulator.We discuss in details the different objects separately in the next sections.
The general idea of the yuima package is to separate into different subclass objects the statistical model, the data and the statistical methods.As will be explained with several examples, the user may give a mathematical description of the statistical model with setModel which prepares a yuima.modelobject by filling the appropriate slots.If the aim is the simulation of the solution of the stochastic differential equation specified in the yuima.modelobject then, using the method simulate, it is possible to obtain one trajectory of the process.As an output, a yuima object (by "yuima object" we mean a, possibly incomplete, object of class yuima) is created which contains the original model specified in the yuima.modelobject in the slot named model and two additional slots named data, for the simulated data, and sampling which contains the description of the simulation scheme used as well as other informations.The details of simulate will be explained in Section 4 along with the use of method setSampling which allows to specify the type of sampling scheme to be used by the simulate method.But yuima object may contain the slot data non only as the outcome of simulate but also because the user decides to analyse its own data.In this case the method setData is used to transform most types of R time series object into a a proper yuima.dataobject.When the slots data and model are available, many other methods can be used to perform statistical analysis on these SDE's (stochastic differential equation) models.These methods will be discussed in Section 6.Further, functionals of stochastic differential equations can be defined using the setFunctional method and evaluated using asymptotc expansion methods as explained in Section 5.The setFunctional method creates a yuima.functionalobject which is included along with a yuima.modelinto a yuima object in order to be used for the evaluation of its asymptotic expansion.

The yuima.model Class
At present, in yuima three main classes of stochastic differential equations can be easily specified.Here we present a brief overview of these models as they will be described in details in Section 3, but this allow to introduce an overall view of the slots of the yuima.modelclass.In yuima one can describe three main families of stochastic processes at present.These models can be one or multidimensional and eventually described as parametric models.Let X 0 = x 0 be the initial value of the process, then, the main classes are as follows: • diffusion models described as where W t is a standard Brownian motion; • SDE's driven by fractional Gaussian noise, with H the Hurst parameter, described as • diffusion process with jumps and Lévy processes solution to The functions a(•), b(•) and c(•) may have a different number of arguments.For example, if the model is homogeneous in time and drift and diffusion coefficients are entirely specified, then we will use the notaion a(x) and b(x) and describe the diffusion model simply as dX t = a(X t )dt+ b(X t )dW t .And so forth.Detailed hypotheses and regularity conditions on the coefficients a(•), b(•) and c(•) for each class of models will be given in the next sections.Nevertheless, it is important to remark that these notations only matter the mathematical description of the model while each coefficient is passed to yuima methods as R mathematical expressions.It means that, for example, a(t, X t , θ) = t • √ θX t will be passed as t*sqrt(x*theta), therefore, the order of the arguments is not relevant to R as well as its mathematical description as long as it is consistent through each specific section.Furhter, yuima is able to accept any userspecified notation for the state variable x (for X t ) and the time variable t and the remaining term of an R expression will be interpreted as parameter as will be explained in Section 3.1.We are now able to give an overview of the main slots of the most important class of the yuima package.
The yuima.model class contains information about the stochastic differential equation of interest.The constructor function setModel is used to give a mathematical description of the stochastic differential equation.All functions in the package are assumed to get as much information as possible from the model instead of replicating the same code everywhere.If there are missing pieces of information, we may change or extend the description of the model.
An object of class yuima.modelcontains several slots listed below.To see inside its structure, one can use the R command str on a yuima object.
• drift is an R vector of expressions which contains the drift specification; • diffusion is itself a list of 1 slot which describes the diffusion coefficient matrix relative to first noise; • hurst is the Hurst index of the fractional Brownian motion, by default 0.5 meaning a standard Brownian motion.More details will be given in Section 3.5; • parameter, which is a short name for "parameters", is a list with the following entries (explained details in Section 3.3): all contains the names of all the parameters found in the diffusion and drift coefficient; common contains the names of the parameters in common between the drift and diffusion coefficients; diffusion contains the parameters belonging to the diffusion coefficient; drift contains the parameters belonging to the drift coefficient; jump contains the parameters belonging to the coefficient of the Lévy noise; measure contains the parameters describing the Lévy measure (explained details in Section 3.6); • measure specifies the measure of the Lévy noise (see Section 3.6); • measure.type a switch to specify how the Lévy measure is described (see Section 3.6); • state.variableand time.variable,by default, are assumed to be x and t but the user can freely choose them and they matter the right hand-side of the equation of the SDE.The yuima.model class assumes that the user either uses default names for state.variableand time.variablevariables or specifies his own names.All other symbols are considered parameters and distributed accordingly in the parameter slot.
Example of use will be given in Section 3.1; • jump.variableindicates the name of the variable used in the description of the Lévy component (see Section 3.6); • solve.variablecontains a vector of variable names, each element corresponds to the name of the solution variable (left-hand-side) of each equation in the model, in the corresponding order.An example of use can be found in Section 3.4.
• noise.numberindicates the number of sources of noise.
• xinit is the initial value of the stochastic differential equation; • equation.numberrepresents the number of equations, i.e., the number of one dimensional stochastic differential equations.
• dimension reports the dimensions of the parameter space.It is a list of the same length of parameter with the same names.
• J.flag is for internal use only.

The YUIMA Project
As seen in the above, the parameter space is accurately described internally in a yuima object because in inference for stochastic differential equations, estimators of different parameters have different properties.Usually, the rate of convergence for estimators in the diffusion coefficient are similar to the ones in the i.i.d.(independent and idensitcally distributed) sampling while estimators of parameters in the drift coefficient are slower or, in some cases, not even consistent.The yuima always tries to implement the best statistical inference for the given model under the observed sampling scheme.

Model Specification
In order to show how general is the approach of the yuima package, we present some examples.
Throughout this section we assume that all the stochastic differential equations exist while in Section 6 we will give regularity conditions needed to have a properly defined statistical model.

One Dimensional Diffusion Processes
Assume that we want to describe the following stochastic differential equation In the above a(x) = −3x and b(x) = 1 1+x 2 according to the notaion of previous section and W t is a standard Wiener process.This can be described in yuima by specifying the drift and diffusion coefficients as plain R expressions passed as strings R> mod1 <-setModel(drift = "-3*x", diffusion = "1/(1+x^2)") By default, yuima assumes that the state variable is x and the time variable is t and the solve variable is again x.Notice that the left hand-side of the equation is implicit, this is why yuima.modelhas the slot solve.variable.The user should not be worried about the warning raised by yuima at this stage, as this is just to inform her on the implicit assumption on the solution variable.At this point, the package fills the proper slots of the yuima object

User Specified State and Time Variables
Suppose now that the user wants to specify her own model using a prescribed notation, e.g., some SDE's like where a(s, y) = −3sy and b(y) = 1/(1 + y 2 ).Then this model can be described in yuima as follows R> mod1b <-setModel(drift = "-3*s*y", diffusion = "1/(1+y^2)", + state.var="y",time.var="s") In this case the solution variable is the same as the state variable.Indeed, the yuima.modelobject appears as follows

Specification of Parametric Models
Assume now that we want to specify a parametric model like this where a(x, θ) = −θx and b(x, γ) = 1/(1 + x γ ).Then, yuima attempts to distinguish the parameters' names from the ones of the state and time variables R> mod2 <-setModel(drift = "-theta*x", diffusion = "1/(1+x^gamma)") so, in this case, theta and gamma, which are different form x and t, are assumed to be parameters.Notice that in the above notation θ and γ are generic names for the components of a parameters' vector θ in the notation of Section 2.3.

Multidimensional Processes
Next is an example of a system of two stochastic differential equations for the couple (X 1,t , X 2,t ) driven by three independent Brownian motions (W but this has to be organized into matrix form with a vector of drift expressions and a diffusion matrix For this system it is now necessary to instruct yuima about the state variable on both the left-hand side of the equation and the right-hand side of the equation, i.e. there is the need to specify also the solve.variablefor the left hand side of the SDE: R> sol <-c("x1","x2") # variable for numerical solution R> a <-c("-3*x1","-x1-2*x2") # drift vector R> b <-matrix(c("1","x1","0","3","x2","0"),2,3) # diffusion matrix R> mod3 <-setModel(drift = a, diffusion = b, solve.variable= sol) Looking at the structure of the noise.numberslot in mod3, one can see that this is now set to 3 which is taken as the number of columns of the diffusion matrix.Again, this model can be easily simulated R> set.seed(123)R> X <-simulate(mod3) R> plot(X, plot.type="single",lty=1:2) Notice that the role of solve.variable is essential because it allows to specify the left hand side of an SDE equation.For example, if one wants to specify this model instead of the previous one the solve.variableargument should be specified as solve.variable=c("x2","x1") in place of solve.variable=c("x1","x2"),all the rest being the same as in model mod3.

Lévy Processes
Jump processes can be specified in different ways in mathematics and hence in yuima package.Let Z t be a Compound Poisson Process (i.e., jumps size follow some distribution, like the Gaussian law and jumps occur at Poisson times).Then it is possible to consider the simple following SDE which involves jumps or the more general representation with µ a random measure associated with the jumps of X, that is, where δ denotes the Dirac measure and Z the driving pure-jump Lévy process of the form zµ(ds, dz).

Specification of Generic Models in yuima
In general, the yuima package allows to specify a large family of models solutions to using the following interface R> setModel(drift, diffusion, hurst = 0.5, jump.coeff,+ measure, measure.type,state.variable= "x", + jump.variable= "z", time.variable= "t", solve.variable,xinit) The YUIMA Project The coefficient c() may also depend on the process Z, in this case the admissible form for the coefficient is c(t, x, θ, z) = c(t, x, θ) • z.In the integral representation correspoding to formula (1) the coefficient c(•) is not allowed to depend on time though.The yuima package implements many multivariate Random Numbers Generators (RNG) which are needed to simulate Lévy paths including rIG (Inverse Gaussian), rNIG (Normal Inverse Gaussian), rbgamma (Bilateral Gamma), rngamma (Normal Gamma) and rstable (Stable Laws).Other user-defined RNG can be used freely.

Simulation, Sampling and Subsampling
The simulate function simulates yuima models according to Euler-Maruyama scheme in the presence of non-fractional diffusion noise and Lévy jumps and uses the Cholesky or the Wood and Chan methods for the fractional Gaussian noise.For diffusion models without jumps, yuima also implements the space discretized Euler-Maruyama method, which is more efficient, in the sense of mean squared error approximation, than the usual time-discretized Euler-Maruyama scheme.The (time discretized) Euler-Maruyama simulation scheme defines a grid of times 0 = τ As the interval τ j+1 − τ j is independent to {W t } t≥τ j for each j, W τ j+1 − W τ j has the same distribution as √ τ j+1 − τ j N j , where N j is an i.i.d.sequence of standard normal variables.
The basic discretization scheme is equidistant, i.e. τ j+1 − τ j = T /n = ∆ n , for all j ≥ 0. But the simulate method provides also space-discretized Euler-Maruyama method to solve SDE.This is at the moment designed for 1 factor SDE only, i.e., the case the driving Brownian motion W is 1-dimensional.In this case, the discretization scheme {τ j } is defined as for each j ≥ 0. Internally, simulate takes 2 = T /n = ∆ n which coincides with the mean of the interval τ j+1 − τ j .This space-discretizing scheme is known to be 3 times efficient than the usual time-discretized scheme one in the sense of the mean squared error.To make use of the space-discretized Euler-Maruyama scheme, one should use the argument space.discretized= TRUE which by default, is set to FALSE.Other simulation scheme for 1-dimensional diffusion processes as explained in Iacus (2008) will be implemented in the near future.
The simulate function accepts several arguments including the description of the sampling structure, which is an object of type yuima.sampling.The setSampling allows for the specification of different sampling parameters including random sampling.Further, the subsampling allows to subsample a trajectory of a simulated stochastic differential equation or a given time series in the yuima.dataslot of a yuima object.Sampling and subsampling can be specified jointly as arguments to the simulate function.This is convenient if one wants to simulate data at very high frequency but then return only low frequency data for inference or other applications.In what follows we explain how to specify arguments of these yuima functions.Complete details can be found in the manual pages of the yuima package.

Subsampling
The sampling structure can be used to operate subsampling.Next example shows how to perform Poisson random sampling, with two independent Poisson processes, one per coordinate of X2.
Subsampling can be used within the simulate function.What is usually done in simulation studies, is to simulate the process at very high frequency but then use data for estimation at a lower frequency.This can be done in a single step in the following way: q q q q qq q q q q q q q q q q q q qq q q qq qq q q q q q q q q qq q q q q q qq q q q q q q qq q q q q qq q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q qq q In the previous code we have simulated the process twice just to show the effect of the subsampling, but the reader should use only the line which outputs the simulation to Y.sub as seen in the next plot.

Asymptotic Expansion
For numerical computation of the expectation of a random variable, the Monte-Carlo method gives a universal solution although it is time consuming and involves stochastic errors of a certain scale depending on the number of replications.An alternative tool is the asymptotic expansion method that can often give a solution with accuracy comparable or superior to Monte-Carlo methods.The asymptotic expansion method has an advantage in the computational time because the approximation is given through an analytic formula.
Let us consider a family of d-dimensional diffusion processes X = (X (ε) t ) t∈[0,T ] (ε ∈ (0, 1]) specified by the stochastic integral equation for ε ∈ (0, 1], where W t = (W 1,t , . . ., W r,t ) is an r-dimensional Wiener process.The functional of interest is expressed in the following abstract form A typical application is the Asian option pricing.For example, in the Black-Scholes model dX the price of the option under zero interest rate is of the form Thus the functional of interest is T and the price of the European call option is E[max(X (ε) T − K, 0)].This value has a closed form in the Black-Sholes economy, but it is necessary to apply some numerical method for pricing the Asian option even in this linear case.
Since X (0) t is the deterministic solution to the ordinary differential equation the functional F (0) becomes a constant given by T , 0). (5) Under standard regularity of a, b, f α and F , it is possible to show F (ε) has a version that is smooth in ε ∈ [0, 1) almost surely, and hence, admits a stochastic expansion as ε ↓ 0. This stochastic expansion makes sense in the Sobolev spaces of the Malliavin calculus.
Then the so-called Watanabe's theory (Watanabe (1987)) validates the asymptotic expansion of the (generalized) expectation as ε ↓ 0 for measurable functions g at most polynomial growth or, more generally, for Schwartz distributions, under the uniform nondegeneracy of the Malliavain covariance of F (ε) .2In the present situation, each d i (g) is expressed as where p i is a polynomial and φ(z; 0, v) is the density of the normal distribution N (0, v) with v = Cov[ F (0) ].Polynomials p i are given by the conditional expectation of multiple Wiener integrals.The expansion (7) holds uniformly in a class of functions g.
As mentioned above, Monte Carlo methods require a huge number of simulations to get the desired accuracy of the calculation of the expectation, while the asymptotic expansion of F (ε)  gives very fast and accurate approximation by analytic formulation.The yuima package provides functions to construct the functional F (ε) and perform automatic asymptotic expansion based on the Malliavin calculus starting from a yuima object.This asymptotic expansion approach to option pricing was proposed in early 1990's ( (Yoshida 1992a), (Takahashi 1999), (Kunitomo and Takahashi 2001)), and several related papers are available today.
We remark that the expansion of E[g( F (ε) )G (ε) ] is also possible by the same method for a functional G (ε) having a stochastic expansion like (6).Thus our method works even under the existence of a stochastic discount factor.
One can compare the result of the asymptotic expansion with other well known techniques like Edgeworth series expansion for the log-normal distribution as proposed, e.g., in Levy (1992).This approximation is available through the package fExoticOptions.

Inference for Stochastic Processes
The yuima implements several optimal techniques for parametric, semi-and non-parametric estimation of (multidimensional) stochastic differential equations.Most of the techniques presented below apply to high frequency data, i.e. when ∆ n , the time lag between to consecutive observations of the process, converges to zero as the number n of observations increases.

How to Input Data Into a yuima Object
Although most of the examples in this section are given on simulated data, the main way to fill up the data slot of a yuima object is to use the function setYuima.The function setYuima sets various slots of the yuima object.In particular, to a yuima.modelcalled mod on the data X one can use a code like this my.yuima<-setYuima(data=setData(X), model=mod) and then pass my.yuima to the inference functions as described in the following.
For example, assuming that an Internet connection is available, the following simple list of commands downloads data from the internet and constructs a yuima object with the data slot containing the time series.

Quasi Maximum Likelihood Estimation
Consider a multidimensional diffusion process where W t is an r-dimensional standard Wiener process independent of the initial variable X 0 .Moreover, The naming of θ 2 and θ 1 is theoretically natural in view of the optimal convergence rates of the estimators for these parameters.Given sampled data X n = (X t i ) i=0,...,n with t i = i∆ n , ∆ n → 0 as n → inf ty, QMLE (Quasi Maximum Likelihood Estimator) makes use of the following approximation of the true log-likelihood for multidimensional diffusions where exactly or approximately.
The yuima package implements QMLE via the qmle function.The interface and the output of the qmle function are made as similar as possible to those of the standard mle function in the stats4 package of the basic R system.The main arguments to qmle consist of a yuima object and initial values (start) for the optimizer.The yuima object must contain the slots model and data.The start argument must be specified as a named list, where the names of the elements of the list correspond to the names of the parameters as they appear in the yuima object.Optionally, one can specify named lists of upper and lower bounds to identify the search region of the optimizer.The standard optimizer is BFGS when no bounds are specified.If bounds are specified then L-BFGS-B is used.More optimizers can be added in the future.

Theoretical Remarks on QMLE
Consistency of an estimator is always a required property; otherwise the estimation would loose mathematical backing because the more data the observer obtains, the worse the estimator behaves.For the consistency of θ1 , we are assuming ∆ n → 0 as n → ∞.Indeed, under this condition, θ1 has asymptotically (mixed) normality (Genon-Catalot and Jacod 1993; Uchida and Yoshida 2012b).On the other hand, one needs T = n∆ n → ∞ for the consistency of θ2 since the Fisher information for θ 2 is finite for a finite T and consistent estimation of θ 2 is theoretically impossible.When T → ∞, usually ergodicity is assumed to ensure a law of large numbers and as a result the consistency of θ2 is obtained.Moreover, asymptotic normality is also established.We assume the condition n∆ p n → 0 for p = 2 while applying the quasi log-likelihood (10) based on the local Gaussian approximation.In practical applications, reduction of the parameter's dimension and relaxing the above condition to n∆ p n → 0 for p larger than 2 are extremely important.Adaptive estimation methods were proposed for p = 3 and for any p in (Yoshida 1992b) and (Uchida and Yoshida 2012a), respectively, with the convergence of moments by a large deviation argument.When T is regarded to be not large, the small sample effect on estimation of θ 2 appears, which will be discussed in Section 6.3.2.Modules for QMLE and Bayes estimators are going to be available for jump-diffusions.See Shimizu and Yoshida (Shimizu and Yoshida 2006a) and Ogihara and Yoshida (Ogihara and Yoshida 2012).

Adaptive Bayes Estimation
Consider again the diffusion process in (9) and the quasi likelihood defined in (10).
The adaptive Bayes type estimator is defined as follows.First we choose an initial arbitrary value θ 2 ∈ Θ 2 and pretend θ 1 is the unknown parameter for which we construct the Bayesian type estimator θ1 as follows where π 1 is a prior density on Θ 1 .According to the asymptotic theory under high-frequency samplings, any function π 1 can be used if it is positive on Θ 1 .For the estimation of θ 2 , we use θ1 to reform the quasi-likelihood function.That is, the Bayes type estimator for θ 2 is defined by where π 2 is a prior density on Θ 2 .In this way, we obtain the adaptive Bayes type estimator θ = ( θ1 , θ2 ) for θ = (θ 1 , θ 2 ).

Theoretical Remarks on Adaptive Bayes Estimator
Under the same conditions, the adaptive Bayes estimator θ1 and θ2 perform in the same way as θ1 and θ2 , respectively.See the remark in Section 6.2 and also Yoshida (1992b) and Uchida and Yoshida (2012b) for details.

Asynchronous Covariance Estimation
Suppose that two Itô processes are observed only at discrete times in a nonsynchronous manner.We are interested in estimating the covariance of the two processes accurately in such a situation.This type of problem arises typically in high-frequency financial time series.
Let T ∈ (0, ∞) be a terminal time for possible observations.We consider a two dimensional Itô process (X 1 , X 2 ) satisfying the stochastic differential equations dX l,t = µ l,t dt + σ l,t dW l,t , t ∈ [0, T ] X l,0 = x l,0 for l = 1, 2.Here W l denote standard Wiener processes with a progressively measurable correlation process d W 1 , W 2 t = ρ t dt, µ l,t and σ l,t are progressively measurable processes, and x l,0 are initial random variables independent of (W 1 , W 2 ).Diffusion type processes are in the scope but this model can express more sophisticated stochastic structures.
The process X l is supposed to be observed over the increasing sequence of times T l,i (i ∈ Z ≥0 ) starting at 0, up to time T. Thus, the observables are (T l,i , X l,i ) with T l,i ≤ T .Each T l,j may be a stopping time, so it possibly depends on the history of (X 1 , X 2 ) as well as on the precedent stopping times.Two sequences of stopping times T 1,i and T 2,j are nonsynchronous, and irregularly spaced, in general.In particular, cce can apply to estimation of the quadratic variation of a single stochastic process sampled regularly/irregularly.
The parameter of interest is the quadratic covariation between X 1 and X 2 : The target variable θ is random in general and it can be estimated with the nonsynchronous covariance estimator (Hayashi-Yoshida estimator) (15) That is, the product of any pair of increments (X 1,T 1,i −X 1,T 1,i−1 ) and (X 2,T 2,j −X 2,T 2,j−1 ) will make a contribution to the sum only when the respective observation intervals (T 1,i−1 , T 1,i ] and (T 2,j−1 , T 2,j ] are overlapping.It is known that U n is consistent and has asymptotically mixed normal distribution as n → ∞ if the maximum length between two consecutive observing times tends to 0. See Hayashi and Yoshida (2005, 2008a, 2006, 2008b) for details.

Example: Data Generation and Estimation by yuima Package
We will demonstrate how to apply cce function to nonsynchronous high-frequency data by simulation.As an example, consider a two dimensional stochastic process (X 1,t , X 2,t ) satisfying the stochastic differential equation dX 1,t = σ 1,t dB 1,t , dX 2,t = σ 2,t dB 2,t . (16) Here B 1,t and B 2,t denote two standard Wiener processes, however they are correlated as where W 1,t and W 2,t are independent Wiener processes, and ρ t is the correlation function between B 1,t and B 2,t .We consider σ l,t , l = 1, 2 and ρ t of the following form in this example: To simulate the stochastic process (X 1,t , X 2,t ), we first build the model by setModel as before.
It should be noted that the method of generating nonsynchronous data can be replaced by a simpler one but we will take a general approach here to demonstrate a usage of the yuima comprehensive package for simulation and estimation of stochastic processes.The parameter we want to estimate is the quadratic covariation between X 1 and X 2 :

R> # diffusion coefficient
Later, we will compare estimated values with the true value of θ given by R> CC.theta <-function( T, sigma1, sigma2, rho) + { + tmp <-function(t) return( sigma1(t) * sigma2(t) * rho(t) ) + integrate(tmp,0,T) + } For the sampling scheme, we will consider the independent Poisson sampling.That is, each configuration of the sampling times T l,i is realized as the Poisson random measure with intensity np l , and the two random measures are independent each other as well as the stochastic processes.Under this scheme the data become asynchronous.It is known that as n → ∞, where We now apply random sampling in the following way: we define a new sampling structure via setSampling specifying in the argument random a list which contains a vector of random distributions.For the i-th component of X we specificy an exponential distribution with rate n•p i /T for the random times.This will generate Poisson random times with the corresponding rate.

Change-Point Analysis
Consider a multidimensional stochastic differential equation of the form where W t is a r-dimensional Wiener process and a t and X t are multidimensional processes, When Y = X the problem is a diffusion model.The process a t may have jumps but should not explode and it is treated as a nuisance in this model.The change-point problem for the volatility is formalized as follows The change-point τ * instant is unknown and is to be estimated, along with θ * 0 and θ * 1 , from the observations sampled from the path of (X, Y ).The yuima implements the quasi-maximum likelihood approach as in Iacus and Yoshida (2012)  Φ n (t; θ0 , θ1 ).

Example of Volatility Change-Point Estimation for 2-Dimensional SDE's
One example of model that can be analyzed by this technique is, for example, the 2-dimensional stochastic differential equation where b 1 (•) and b 2 (•) are any functions and θ 1.k and θ 2.k the value of the parameters before (k = 0) and after k = 1) the change-point.Just for simplicity and in order to simulate some data, we specify some specific form for b 1 (•) and b 2 (•) but this information will not be used in the change-point analysis.For example, we will simulate the following 2-dimensional stochastic differential equation X 1,0 = 1.0,X 2,0 = 1.0, with change-point instant at time τ = 4 and T = 10.First, we describe the model to be simulated now we generate the second trajectory with parameters θ 1.1 = 0.2 and θ 2.1 = 0.4.For this trajectory, the initial value is set to the last value of the first trajectory stored in x1 and x2 for the two component of the process.
R> t1 <-list(theta1.k=0.2, theta2.k=0.4)R> ysamp2 <-setSampling(Initial=n*pobs*0.01,n=n*(1-pobs), delta=0.01)R> yuima2 <-setYuima(model=ymodel, sampling=ysamp2) R> yuima2 <-simulate(yuima2, xinit=c(x1, x2), true.parameter=t1) finaly we collate the two trajectories The composed trajectory appears as follows R> plot(yuima) As said, the change-point analysis does not consider the information coming from the drift part of the model and yuima ignores this internally.Just to make clear that the information on the drift term is not considered by the function CPoint, we redefine the yuima model removing the information coming from the drift and then adding back the data.

An Example of Two Stage Estimation
In practical situations, the initial values of the parameters are not known and there is the need to provide a preliminary estimators of them.One possible approach is the two stage change-point estimation approach as explained in Iacus and Yoshida (2012).The idea is to take a small subsets of observations at the very beginning and the end of the time series, estimate a change point and and then refine the estimation.
To this aim, the yuima package contains two functions which are useful in the framework of change-point or sequential analysis.The function qmleL estimates a model by quasi maximum likelihood using observations in the time interval [0, t] where t can be specificed by the user.In our example, we set t=2. Similarly for qmleR, which uses only observations in the time interval [t, T ].In our example, we take t=8.

LASSO Model Selection
The Least Absolute Shrinkage and Selection Operator (LASSO) is a useful and well studied approach to the problem of model selection and its major advantage is the simultaneous execution of both parameter estimation and variable selection (see Tibshirani (1996); Knight and Fu (2000); Efron, Hastie, Johnstone, and Tibshirani (2004)).
To simplify the idea: take a full specified regression model model selection occurs when some of the θ i are estimated as zeros.The same idea can be applied to diffusion processes.Let X t be a diffusion process solution to dX t = a(X t , α)dt + b(X t , β)dW t α = (α 1 , ..., α p ) ∈ Θ p ⊂ R p , p ≥ 1 β = (β 1 , ..., β q ) ∈ Θ q ⊂ R q , q ≥ 1 with b : Θ p × R d → R d , σ : Θ q × R d → R d × R m and W t , t ∈ [0, T ], is a standard Brownian motion in R m .We assume that the functions a and b are known up to α and β.We denote by θ = (α, β) ∈ Θ p × Θ q = Θ the parametric vector and with θ 0 = (α 0 , β 0 ) its unknown true value.Let H n (X n , θ) = − n (X n , θ) from equation ( 10 and Ḧn is the matrix of second partial derivatives of H with respect to the vector θ.For more details see De Gregorio and Iacus (2012a).The tuning parameters should be chosen as in Zou (2006) in the following way where αj and βk are the unpenalized QMLE's of α j and β k respectively, δ 1 , δ 2 > 0 and usually taken unitary.

Miscellanea and Roadmap of YUIMA Project
Other statistical techniques have been are already implemented in the developer version of the yuima package although not yet released into the current distribution.Among these we mention: the QMLE approach for stochastic differential equations with Lévy noise (Shimizu and Yoshida 2006b;Ogihara and Yoshida 2011); the parametric estimation for the fractional Ornstein-Uhlembeck model (Brouste and Iacus 2012); different simulation schemes as in (Iacus 2008) for multidimensional diffusion processes; Lead-lag estimation (Hoffmann, Rosenbaum, and Yoshida forthcoming); hypotheses testing via φ-divergence (De Gregorio and Iacus 2012b).
• diffusion is itself a list of 1 slot which describes the diffusion coefficient relative to first noise.
• parameter which is a short name for "parameters" which is a list of objects.

F
).The QMLE θ for this model is the solution of the following problemθ = (α, β) = arg min θ H n (X n , θ)The adaptive LASSO estimator is defined as the solution to the quadratic problem under L 1 constraints θ = (α, β) = arg min θ

FIGURE 2 . 1 :
FIGURE 2.1: The main classes in the yuima package.

Figure 1 :
Figure 1: The main classes in the yuima package.
The simulate function fills in addition the two slots data and sampling of the yuima object.
and constructs an approximative solution { Xt , t ≥ 0} of the original process {X t , t ≥ 0} given by Xτ j+1