On parameter estimation for critical affine processes

First we provide a simple set of sufficient conditions for the weak convergence of scaled affine processes with state space $R_+ \times R^d$. We specialize our result to one-dimensional continuous state branching processes with immigration. As an application, we study the asymptotic behavior of least squares estimators of some parameters of a two-dimensional critical affine diffusion process.


Introduction
In recent years there has been rapidly growing interest in the mathematical finance literature in the pricing of credit-risky securities (e.g., defaultable bonds). One of the basic models (for applications see for instance Chen and Joslin [8]) is the following two-dimensional affine diffusion process:
$$dY_t = (a - bY_t)\,dt + \sqrt{Y_t}\,dW_t, \qquad dX_t = (m - \theta X_t)\,dt + \sqrt{Y_t}\,dB_t, \qquad t \geq 0, \tag{1.1}$$
where a, b, θ and m are real parameters such that a > 0 and B and W are independent standard Wiener processes. Note that Y is a Cox-Ingersoll-Ross (CIR) process. For practical use, it is important to estimate the parameters from a discretely observed real data set. In the case of the one-dimensional CIR process, the estimation of a and b goes back to Overbeck and Rydén [30] and Overbeck [31]; see also the very recent papers of Ben Alaya and Kebaier [5,6]. For asymptotic results on discrete time critical branching processes with immigration, one may refer to Wei and Winnicki [34] and [35].
The process (Y, X) given by (1.1) is a very special affine process. The set of affine processes contains a large class of important Markov processes such as continuous state branching processes and Ornstein-Uhlenbeck processes. Further, many models in financial mathematics are also special affine processes, such as the Heston model [17], the model due to Barndorff-Nielsen and Shephard [4] or the model due to Carr and Wu [7]. A precise mathematical formulation and complete characterization of regular affine processes are due to Duffie et al. [11]. Later, several authors contributed to the study of properties of general affine processes: to name a few, Andersen and Piterbarg [1] (moment explosions in stochastic volatility models), Dawson and Li [10] (jump-type SDE representation for two-dimensional affine processes), Filipović and Mayerhofer [13] (applications to the pricing of bond and stock options), Glasserman and Kim [15] (the range of finite exponential moments and the convergence to stationarity in affine diffusion models), Jena et al. [23] (long-term and blow-up behaviors of exponential moments in multi-dimensional affine diffusions), Keller-Ressel et al. [25,26] (stochastically continuous, time-homogeneous affine processes with state space $\mathbb{R}_+^n \times \mathbb{R}^d$ or more general ones are regular). We also refer to the overview articles Cuchiero et al. [9] and Friz and Keller-Ressel [14].
To the best knowledge of the authors the parameter estimation problem for multi-dimensional affine processes has not been tackled so far. Since affine processes are being used in financial mathematics very frequently, the question of parameter estimation for them is of high importance. Our aim is to start the discussion with a simple non-trivial example: the two-dimensional affine diffusion process given by (1.1).
The article is divided into two parts and has two appendices. In Section 2 we recall some notations, the definition of affine processes and some of their basic properties, and then we present a simple set of sufficient conditions for the weak convergence of scaled affine processes. Roughly speaking, given a family of affine processes $(Y^{(\theta)}(t), X^{(\theta)}(t))_{t \geq 0}$, $\theta > 0$, such that the corresponding admissible parameters converge in an appropriate way (see Theorem 2.9), the scaled processes $(\theta^{-1} Y^{(\theta)}(\theta t), \theta^{-1} X^{(\theta)}(\theta t))_{t \geq 0}$ converge weakly towards an affine diffusion process as $\theta \to \infty$. We specialize our result to one-dimensional continuous state branching processes with immigration, which generalizes Theorem 2.3 in Huang et al. [20]. The scaling Theorem 2.9 is proved for quite general affine processes since it might have applications elsewhere later on. In Section 3 the scaling Theorem 2.9 is applied to study the asymptotic behavior of least squares and conditional least squares estimators of some parameters of a critical two-dimensional affine diffusion process given by (1.1), see Theorems 3.5, 3.8 and 3.11. In Appendix A we check that some integrals in the form of the infinitesimal generator of an affine process that we use are well-defined. Appendix B is devoted to showing that the least squares estimator of m cannot be asymptotically weakly consistent.

A scaling theorem for affine processes
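Let $\mathbb{N}$, $\mathbb{Z}_+$, $\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{R}_-$, $\mathbb{R}_{++}$, and $\mathbb{C}$ denote the sets of positive integers, non-negative integers, real numbers, non-negative real numbers, non-positive real numbers, positive real numbers and complex numbers, respectively. For $x, y \in \mathbb{R}$, we will use the notations $x \wedge y := \min(x, y)$ and $x \vee y := \max(x, y)$. For $x, y \in \mathbb{C}^k$, $k \in \mathbb{N}$, we write $\langle x, y \rangle := \sum_{i=1}^k x_i y_i$ (notice that this is not the scalar product on $\mathbb{C}^k$; however, for $x \in \mathbb{C}^k$ and $y \in \mathbb{R}^k$, $\langle x, y \rangle$ coincides with the usual scalar product of $x$ and $y$). By $\|x\|$ and $\|A\|$ we denote the Euclidean norm of a vector $x \in \mathbb{R}^p$ and the induced matrix norm of a matrix $A \in \mathbb{R}^{p \times p}$, respectively. Further, by $C_c^2(\mathbb{R}_+ \times \mathbb{R}^d)$ ($C_c^\infty(\mathbb{R}_+ \times \mathbb{R}^d)$) we denote the set of twice (infinitely) continuously differentiable complex-valued functions on $\mathbb{R}_+ \times \mathbb{R}^d$ with compact support, where $d \in \mathbb{N}$. The set of càdlàg functions from $\mathbb{R}_+$ to $\mathbb{R}_+ \times \mathbb{R}^d$ will be denoted by $D(\mathbb{R}_+, \mathbb{R}_+ \times \mathbb{R}^d)$. For a bounded function $g : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^p$, let $\|g\|_\infty := \sup_{x \in \mathbb{R}_+ \times \mathbb{R}^d} \|g(x)\|$.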
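Convergence in distribution, in probability and almost sure convergence will be denoted by $\stackrel{L}{\longrightarrow}$, $\stackrel{P}{\longrightarrow}$ and $\stackrel{\mathrm{a.s.}}{\longrightarrow}$, respectively.

Next we briefly recall the definition of affine processes with state space $\mathbb{R}_+ \times \mathbb{R}^d$ based on Duffie et al. [11].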

Remark.
We call the attention that Duffie et al. [11] in their Definition 2.1 assume only that Equation (2.1) hold for x ∈ R + × R d , u ∈ ∂U = iR 1+d , t ∈ R + , i.e., instead of u ∈ U they only require that u should be an element of the boundary ∂U of U . However, by Proposition 6.4 in Duffie et al. [11], one can formulate the definition of a regular affine process as we did. Note also that this kind of definition was already given by Dawson 2.4 Remark. Note that our Definition 2.3 of the set of admissible parameters is not so general as Definition 2.6 in Duffie et al. [11]. Firstly, the set of admissible parameters is defined only for affine process with state space R + × R d , while Duffie et al. [11] consider affine processes with state space R n + × R d . We restrict ourselves to this special case, since our scaling Theorem 2.9 is valid only in this case. Secondly, our conditions (v) and (vi) of Definition 2.3 are stronger than that of (2.10) and (2.11) of Definition 2.6 in Duffie et al. [11]. Thirdly, according to our definition, a set of admissible parameters does not contain parameters corresponding to killing, while in Definition 2.6 in Duffie et al. [11] such parameters are included. Our definition of admissible parameters can be considered as a (1 + d)-dimensional version of Definition 6.1 in Dawson and Li [10]. The reason for this definition is to have a more pleasant form of the infinitesimal generator of an affine process compared to that of Duffie et al. [11, formula (2.12) , . . . , 1 + d}, and f ′′ i,j , i, j ∈ {1, . . . , 1 + d}, denote the first and second order partial derivatives of f with respect to its i-th and i-th and j-th variables, and f ′ (  [11]) of the affine semigroup (P t ) t∈R + in Theorem 2.5 is identically zero. This also implies that the affine processes that we will consider later on will have lifetime infinity. ✷ 2.7 Remark. In dimension 2 (i.e., if d = 1), by Theorem 6.2 in Dawson and Li [10] and Theorem 2.7 in Duffie et al. 
[11] (see also Theorem 2.5), for an infinitesimal generator A given by (2.2) with d = 1 one can construct a two-dimensional system of jump type SDEs of which there exists a pathwise unique strong solution (Y (t), X(t)) t∈R + which is a regular affine Markov process with the given infinitesimal generator A. ✷ The next lemma is simple but very useful.
2.8 Lemma. Let (Z(t)) t∈R + be a time-homogeneous Markov process with state space R + × R d and let us denote its infinitesimal generator by A Z . Suppose that C 2 c (R + × R d ) is a subset of the domain of A Z . Then for all θ ∈ R ++ , the time-homogeneous Markov process (Z θ (t)) t∈R + := (θ −1 Z(θt)) t∈R + has infinitesimal generator Proof. By definition, the infinitesimal generator of (Z θ (t)) t∈R + takes the form Let a, α, β ∈ R (1+d)×(1+d) , b ∈ R + × R d , and let (Y (t), X(t)) t∈R + be a (1 + d)-dimensional affine process with state space R + × R d and with the set of admissible parameters (a, α, b, β, 0, 0), where 2.10 Remark. (i) Note that the limit process (Y (t), X(t)) t∈R + in Theorem 2.9 has continuous sample paths almost surely. However, this is not a big surprise, since in condition (2.3) of Theorem 2.9 we require finite second moment for the measure µ.
(ii) Note also that the matrix α ∈ R (1+d)×(1+d) given in Theorem 2.9 is symmetric and positive semidefinite, since α is symmetric and positive semidefinite, and for all z ∈ R 1+d , ✷ Proof of Theorem 2.9. By Duffie et al. [11,Theorem 2.7], C ∞ c (R + × R d ) is a core of the infinitesimal generator A (Y,X) of the process (Y (t), X(t)) t∈R + , and hence ) denotes the domain of A (Y,X) , see, e.g., Ethier and Kurtz [12, page 17]. In other words, the closure First note that it is enough to prove (2.4) for real-valued functions f ∈ C ∞ c (R + × R d ), since if (2.4) holds for for real-valued functions f ∈ C ∞ c (R + ×R d ), then, by decomposing f into real and imaginary parts, the linearity of the infinitesimal generators in question and triangular inequality yield Hence in what follows without loss of generality we can Then, by Lemma 2.8, (2.2) and (2.5), Hence, for all x = (x 1 , x 2 ) ∈ R + × R d , using the triangular inequality and that | u, v | u v , u, v ∈ R p , we have and hence, by our assumptions, in order to prove (2.4) it is enough to check that (2.7) First we consider (2.6). Let ε ∈ R ++ be fixed. Let us choose an M ∈ R ++ such that In what follows, for abbreviation, [0, M ] × [−M, M ] d will be denoted by D M . Such an M can be chosen, since f ∈ C ∞ c (R + ×R d ) yields f ′ ∞ < ∞ and, by assumption (2.3), R + ×R d ξ m(dξ) < ∞. By Taylor's theorem, for all θ ∈ R ++ , x ∈ R + × R d and ξ ∈ D M there exists some τ = τ (θ, x, ξ) ∈ [0, 1] such that The convexity of D M implies τ ξ = τ (θ, x, ξ)ξ ∈ D M for all θ ∈ R ++ , x ∈ R + × R d and ξ ∈ D M , and, hence, sup Since f ′ is uniformly continuous on R + × R d (which follows by mean value theorem using also that f ′′ ∞ < ∞), there exists a θ 0 ∈ R ++ (depending on ε and M ) such that for all x ∈ R + × R d and θ ∈ [θ 0 , ∞). Further, by (2.8), we have for all x ∈ R + × R d and θ ∈ R ++ . Putting the pieces together we have (2.6).
Now we turn to prove (2.7) in a similar way. Let ε ∈ R ++ be fixed again. Let us now choose an M ∈ R ++ such that 2 sup Here Further, we have again for all x ∈ R + × R d and θ ∈ [θ 1 , ∞). Moreover, there exists a θ 2 ∈ R ++ (depending on ε and M ) such that . Here Consequently, by (2.10), we have for all x ∈ R + × R d and θ ∈ R ++ . Putting the pieces together we have (2.7).
Finally, Ethier and Kurtz [12, Corollary 8.7 on page 232] yields our assertion. Namely, with the notations of part (f) of this corollary (but replacing n by θ), let otherwise, satisfies (4.7) on page 113 in Ethier and Kurtz [12]), θ ) defined on page 24 in Ethier and Kurtz [12] by part (c) of Proposition 1.5 on page 9 in Ethier and Kurtz [12]), Then, by our assumptions, convergence of the initial distributions holds, condition (8.35) on page 232 in Ethier and Kurtz [12] is automatically satisfied, and (2.4) shows the validity of condition (8.36) on page 232 in Ethier and Kurtz [12]. ✷ 2.11 Remark. By giving an example, we shed some light on why we consider only (1+d)-dimensional affine processes with state space R + × R d in Theorem 2.9 instead of (n + d)-dimensional ones with state space R n + × R d , n ∈ N. Let (Y t ) t∈R + be a two-dimensional continuous state branching process with infinitesimal generator for f ∈ C 2 c (R 2 + ) and y = (y 1 , y 2 ) ∈ R 2 + , where p i , i = 1, 2, are σ-finite measures on R 2 + \ {0} such that see, e.g., Duffie et al. [11,Theorem 2.7]. Note that Y can be considered as a two-dimensional affine process with state space R 2 + (formally with d = 0). Then, by a simple modification of Lemma 2.8, for all θ > 0, f ∈ C 2 c (R 2 + ) and y = (y 1 , y 2 ) ∈ R 2 + , where the last equality follows by (2.12). Supposing that f is real-valued, by Taylor's theorem, Hence, similarly to the proof of (2.7), we get for real-valued f ∈ C 2 c (R 2 + ) and y = (y 1 , We also note that this phenomena is somewhat similar to that of Remark 2.1 in Ma [28]. ✷ In the next remark we formulate some special cases of Theorem 2.9. 3) is satisfied, then the conditions of Theorem 2.9 are satisfied for (Y (θ) (t), X (θ) (t)) t∈R + := (Y (t), X(t)) t∈R + , θ ∈ R ++ , and hence where (Y(t), X (t)) t∈R + is a (1 + d)-dimensional affine process on R + × R d with admissible parameters (0, α, b, 0, 0, 0), where α and b are given in Theorem 2.9.
(ii) If $(Y(t), X(t))_{t \in \mathbb{R}_+}$ is a (1 + d)-dimensional affine process on $\mathbb{R}_+ \times \mathbb{R}^d$ with $(Y(0), X(0)) = (0, 0)$ and with admissible parameters (0, α, b, 0, 0, 0), then
$$\big(\theta^{-1} Y(\theta t), \theta^{-1} X(\theta t)\big)_{t \in \mathbb{R}_+} \stackrel{L}{=} \big(Y(t), X(t)\big)_{t \in \mathbb{R}_+} \qquad \text{for all } \theta \in \mathbb{R}_{++},$$
where $\stackrel{L}{=}$ denotes equality in distribution. Indeed, by Proposition 1.6 on page 161 in Ethier and Kurtz [12], it is enough to check that the semigroups (on the Banach space of bounded Borel measurable functions on $\mathbb{R}_+ \times \mathbb{R}^d$) corresponding to the processes in question coincide. By the definition of a core, this follows from the equality of the infinitesimal generators of the processes in question on the core $C_c^\infty(\mathbb{R}_+ \times \mathbb{R}^d)$, which has been shown in the proof of Theorem 2.9. ✷

Next we present a corollary of Theorem 2.9 which states weak convergence of appropriately normalized one-dimensional continuous state branching processes with immigration. Our corollary generalizes Theorem 2.3 in Huang et al. [20] in the sense that we do not have to suppose that $\int_1^\infty \xi^2\, m(d\xi) < \infty$, only that $\int_1^\infty \xi\, m(d\xi) < \infty$ (with the notations of Huang et al. [20]), and our proof differs from that of Huang et al. [20].

2.13
Corollary. For all θ ∈ R ++ , let (Y (θ) (t)) t∈R + be a one-dimensional continuous state branching process with immigration on R + with branching mechanism and with immigration mechanism where α (θ) 0, b (θ) 0, β (θ) ∈ R and n and p are measures on (0, ∞) such that Let α, b, β ∈ R, and let (Y (t)) t∈R + be a one-dimensional continuous state branching process with immigration on R + with branching mechanism and with immigration mechanism Proof. For each θ ∈ R ++ and t ∈ R + , let X (θ) (t) := 0.
Then for each θ ∈ R ++ , the process (Y (θ) (t), X (θ) (t)) t∈R + is a two-dimensional affine process with admissible parameters where δ 0 denotes the Dirac measure concentrated on 0 ∈ R. Then, by Theorem 2.9, for the two-dimensional affine processes ( Note that in fact X is the identically zero process. Finally, Theorem 9.30 in Li [27] yields the assertion. ✷ 3 Least squares estimator for a critical two-dimensional affine diffusion process In this section continuous time stochastic processes will be written as (ξ t ) t∈R + instead of (ξ(t)) t∈R + . Let (Ω, F, (F t ) t∈R + , P) be a filtered probability space satisfying the usual conditions, i.e., (Ω, F, P) is complete, the filtration (F t ) t∈R + is right-continuous and F 0 contains all the P-null sets in F. Let (W t ) t∈R + and (B t ) t∈R + be independent standard (F t ) t∈R + -Wiener processes. Let us consider the following two-dimensional diffusion process given by the SDE where a ∈ R ++ and b, θ, m ∈ R.

Preparations and (sub)(super)criticality
The next proposition is about the existence and uniqueness of a strong solution of the SDE (3.1).
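3.1 Proposition. Let (η, ζ) be a random vector independent of $(W_t, B_t)_{t \in \mathbb{R}_+}$ satisfying $P(\eta \geq 0) = 1$. Then, for all $a \in \mathbb{R}_{++}$ and $b, m, \theta \in \mathbb{R}$, there is a (pathwise) unique strong solution $(Y_t, X_t)_{t \in \mathbb{R}_+}$ of the SDE (3.1) such that $P((Y_0, X_0) = (\eta, \zeta)) = 1$ and $P(Y_t \geq 0 \text{ for all } t \in \mathbb{R}_+) = 1$. Further, for all $0 \leq s \leq t$,
$$Y_t = e^{-b(t-s)} Y_s + a \int_s^t e^{-b(t-u)}\,du + \int_s^t e^{-b(t-u)} \sqrt{Y_u}\,dW_u, \tag{3.2}$$
$$X_t = e^{-\theta(t-s)} X_s + m \int_s^t e^{-\theta(t-u)}\,du + \int_s^t e^{-\theta(t-u)} \sqrt{Y_u}\,dB_u. \tag{3.3}$$

Proof. By Ikeda and Watanabe [21, Example 8.2, page 221], there is a pathwise unique non-negative strong solution $(Y_t)_{t \in \mathbb{R}_+}$ of the first equation in (3.1) with any initial value η satisfying $P(\eta \geq 0) = 1$. Next, by applying Itô's formula to the processes $(e^{bt} Y_t)_{t \in \mathbb{R}_+}$ and $(e^{\theta t} X_t)_{t \in \mathbb{R}_+}$, respectively, we have
$$d(e^{bt} Y_t) = e^{bt}\big(a\,dt + \sqrt{Y_t}\,dW_t\big), \qquad d(e^{\theta t} X_t) = e^{\theta t}\big(m\,dt + \sqrt{Y_t}\,dB_t\big), \qquad t \in \mathbb{R}_+,$$
and integrating over [s, t] yields (3.2) and (3.3). Note that it is the assumption $a \in \mathbb{R}_{++}$ that ensures $P(Y_t \geq 0, \ \forall\, t \in \mathbb{R}_+) = 1$.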
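For readers who wish to experiment with the process of Proposition 3.1, the following is a minimal simulation sketch and not part of the analysis of this paper. It assumes that the SDE (3.1) has the same form as (1.1), i.e., dY = (a − bY) dt + √Y dW and dX = (m − θX) dt + √Y dB, and it uses a plain Euler-Maruyama scheme in which negative Euler values of Y are truncated at 0 (a discretisation device only). The function name and its arguments are hypothetical.

```python
import math
import random


def simulate_affine_diffusion(a, b, m, theta, y0, x0, horizon, steps, seed=0):
    """Euler-Maruyama path of the two-dimensional diffusion
        dY = (a - b*Y) dt + sqrt(Y) dW,   dX = (m - theta*X) dt + sqrt(Y) dB,
    with independent standard Wiener processes W and B.

    Assumption: y0 >= 0.  Negative Euler values of Y are truncated at 0;
    this is a numerical fix, not a feature of the model itself.
    Returns the lists [Y_0, ..., Y_steps] and [X_0, ..., X_steps].
    """
    rng = random.Random(seed)
    h = horizon / steps                      # step size
    ys, xs = [y0], [x0]
    for _ in range(steps):
        y, x = ys[-1], xs[-1]
        vol = math.sqrt(y * h)               # sqrt(Y_t) * sqrt(h)
        dw = rng.gauss(0.0, 1.0)             # increment driving Y
        db = rng.gauss(0.0, 1.0)             # independent increment driving X
        ys.append(max(y + (a - b * y) * h + vol * dw, 0.0))
        xs.append(x + (m - theta * x) * h + vol * db)
    return ys, xs
```

Taking b = θ = 0 in this sketch gives an approximate path of the critical model studied below; the step size and the truncation rule are choices of the sketch and affect accuracy.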
Next we present a result about the first moment of (Y t , X t ) t∈R + .
Proof. By Proposition 3.1, we have and so, taking expectations of both sides, where we used that the processes are martingales which can be checked as follows. First we check that they are local martingales with respect to the filtration (F t ) t∈R + . Let us define the increasing sequence of stopping times by δ n := inf{t 0 : Y t n}, n ∈ N. Since Y has continuous trajectories almost surely, we have P(lim n→∞ δ n = ∞) = 1. Using (δ n ) n∈N as a localizing sequence, we have The local martingale property of t 0 e bu √ Y u dW u t∈R + follows by Ikeda and Watanabe [21, page 57]. Hence, using (3.2) and that a ∈ R ++ , we find that for all t ∈ R + and n ∈ N, and then, by Fatou's lemma, we can deduce that t 0 e bu √ Y u dW u t∈R + is indeed a martingale. First, we note that a local martingale M is a square integrable martingale if E([M, M ] t ) < ∞ for all t ∈ R + , where ([M, M ] t ) t∈R + denotes the quadratic variation process of M , see, e.g., Corollary 3 on page 73 in Protter [32]. Here the quadratic variation process of t 0 e bu √ Y u dW u t∈R + takes the form where, for the inequality, we used Fubini's theorem, (3.4) and our assumption E(Y 0 ) < ∞. Replacing b by θ, we have the desired martingale property of t 0 e θu √ Y u dB u t∈R + , too. ✷ Next we show that the process (Y t , X t ) t∈R + given by the SDE (3.1) is an affine process.
Then (Y t , X t ) t∈R + is an affine process with infinitesimal generator Proof. For calculating the infinitesimal generator of (Y t , X t ) t∈R + , without loss of generality, we may suppose that P((Y 0 , X 0 ) = (y 0 , x 0 )) = 1, where (y 0 , x 0 ) ∈ R + × R. By Itô's formula, for all real-valued functions f ∈ C 2 and A (Y,X) f is given by (3.5). Here (M t (f )) t∈R + is a square integrable martingale with respect to the filtration with some constants C 1 > 0 and C 2 > 0, where the finiteness of the integrals follow by Proposition 3.2. Finally, if f ∈ C 2 c (R + × R) is complex valued, then, by decomposing f into real and imaginary parts, one can argue in the same way as above. In what follows we define and study criticality of the affine process given by the SDE (3.1).

Definition.
Let $(Y_t, X_t)_{t \in \mathbb{R}_+}$ be an affine diffusion process given by the SDE (3.1) satisfying $P(Y_0 \geq 0) = 1$. We call $(Y_t, X_t)_{t \in \mathbb{R}_+}$ subcritical, critical or supercritical if the spectral radius of the matrix
$$\begin{pmatrix} e^{-bt} & 0 \\ 0 & e^{-\theta t} \end{pmatrix}$$
is less than 1, equal to 1 or greater than 1, respectively.
Note that, since the spectral radius of the matrix given in Definition 3.4 is $\max(e^{-bt}, e^{-\theta t})$, the affine process given in Definition 3.4 is subcritical if $b \wedge \theta > 0$, critical if $b \wedge \theta = 0$, and supercritical if $b \wedge \theta < 0$. Definition 3.4 of criticality is in accordance with the corresponding definition for one-dimensional continuous state branching processes, see, e.g., Li [27, page 58].
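The classification can be illustrated by a short sketch. Since the spectral radius of the diagonal matrix in Definition 3.4 is max(e^{−bt}, e^{−θt}), for any fixed t > 0 its position relative to 1 is decided by the sign of min(b, θ). The function names below are hypothetical and serve only as an illustration.

```python
import math


def spectral_radius(b, theta, t=1.0):
    """Spectral radius of diag(exp(-b*t), exp(-theta*t)) for a fixed t > 0."""
    return max(math.exp(-b * t), math.exp(-theta * t))


def classify(b, theta):
    """Classify the process by the sign of min(b, theta).

    This is equivalent to comparing the spectral radius with 1 for t > 0,
    but avoids floating point comparisons of exponentials.
    """
    r = min(b, theta)
    if r > 0:
        return "subcritical"
    if r == 0:
        return "critical"
    return "supercritical"
```

For instance, (b, θ) = (0, 0) and (b, θ) = (0, 2) are both critical, whereas (b, θ) = (1, −0.5) is supercritical.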
In this section we will always suppose that For some explanations why we study only this special case, see Remarks 3.6, 3.7 and 3.9. In the next sections under Condition (C) we will study asymptotic behaviour of least squares estimator of θ and (θ, m), respectively. Before doing so we recall some critical models both in discrete and continuous time.
In general, parameter estimation for critical models has a long history. A common feature of the estimators for parameters of critical models is that one may prove weak limit theorems for them by using norming factors that are usually different from the norming factors for the subcritical and supercritical models. Further, it may happen that one has to use different norming factors for two different critical cases.
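We recall some discrete time critical models. If $(\xi_k)_{k \in \mathbb{Z}_+}$ is an AR(1) process, i.e., $\xi_k = \varrho\, \xi_{k-1} + \zeta_k$, $k \in \mathbb{N}$, with $\xi_0 = 0$ and an i.i.d. sequence $(\zeta_k)_{k \in \mathbb{N}}$ having mean 0 and positive variance, then the (ordinary) least squares estimator of the so-called stability parameter $\varrho$ based on the sample $\xi_1, \ldots, \xi_n$ takes the form
$$\hat{\varrho}_n = \frac{\sum_{k=1}^n \xi_{k-1} \xi_k}{\sum_{k=1}^n \xi_{k-1}^2}, \qquad n \in \mathbb{N},$$
and in the critical case, i.e., when $\varrho = 1$,
$$n(\hat{\varrho}_n - 1) \stackrel{L}{\longrightarrow} \frac{\int_0^1 W_t\,dW_t}{\int_0^1 W_t^2\,dt} \qquad \text{as } n \to \infty,$$
where $(W_t)_{t \in \mathbb{R}_+}$ is a standard Wiener process and $\stackrel{L}{\longrightarrow}$ denotes convergence in distribution. Here $n(\hat{\varrho}_n - 1)$ is known as the Dickey-Fuller statistic. We emphasize that the asymptotic behaviour of $\hat{\varrho}_n$ is completely different in the subcritical ($|\varrho| < 1$) and supercritical ($|\varrho| > 1$) cases, where it is asymptotically normal and asymptotically Cauchy, respectively, see, e.g., Mann and Wald [29], Anderson [2] and White [36].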
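As an illustration of the AR(1) example, the ordinary least squares estimator of the stability parameter, i.e., the ratio of the sum of ξ_{k−1}ξ_k to the sum of ξ_{k−1}², and the statistic n(ϱ̂_n − 1) can be computed as in the following sketch. The function names are hypothetical, and the Gaussian innovations in the simulator are an assumption made only for the illustration (the discussion above requires merely i.i.d. innovations with mean 0 and positive variance).

```python
import random


def ar1_lse(xi):
    """OLS estimator of the stability parameter from xi = [xi_0, ..., xi_n]."""
    num = sum(xi[k - 1] * xi[k] for k in range(1, len(xi)))
    den = sum(xi[k - 1] ** 2 for k in range(1, len(xi)))
    if den == 0:
        raise ValueError("sum of squared lagged observations is zero")
    return num / den


def dickey_fuller_statistic(xi):
    """The normalised quantity n * (rho_hat_n - 1) used in the critical case."""
    n = len(xi) - 1
    return n * (ar1_lse(xi) - 1.0)


def simulate_ar1(rho, n, seed=0):
    """AR(1) path with xi_0 = 0 and standard normal innovations (assumption)."""
    rng = random.Random(seed)
    xi = [0.0]
    for _ in range(n):
        xi.append(rho * xi[-1] + rng.gauss(0.0, 1.0))
    return xi
```

Evaluating `dickey_fuller_statistic(simulate_ar1(1.0, n, seed))` for many seeds gives an empirical picture of the non-normal limit law in the critical case.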
For continuous time critical models, we recall that Huang et al. [20,Theorem 2.4] studied asymptotic behaviour of weighted conditional least squares estimator of the drift parameters for discretely observed continuous time critical branching processes with immigration given by where Y 0 0, a 0, b ∈ R, σ 0, W is a standard Wiener process, N 0 (ds, dξ) is a Poisson random measure on (0, ∞) × [0, ∞) with intensity ds n(dξ), N 1 (ds, du, dξ) is a Poisson random measure on (0, ∞) × (0, ∞) × [0, ∞) with intensity ds du p(dξ) such that the σ-finite measures n and p are supported by (0, ∞) and Our technique differs from that of Huang et al. [20] and for completeness we note that the limit distribution and some parts of the proof of their Theorem 2.4 suffer from some misprints. Furthermore, Hu and Long [19] studied the problem of parameter estimation for critical mean-reverting α-stable where Z is an α-stable Lévy motion with α ∈ (0, 2)) observed at discrete instants. A least squares estimator is obtained and its asymptotics is discussed in the singular case (m, θ) = (0, 0). We note that the forms of the limit distributions of least squares estimators for critical two-dimensional affine diffusion processes in our Theorems 3.5 and 3.8 are the same as that of the limit distributions in Theorems 3.2 and 4.1 in Hu and Long [19], respectively. We also recall that Hu and Long [18] considered the problem of parameter estimation not only for critical mean-reverting α-stable motions, but also for some subcritical ones (m = 0 and θ > 0) by proving limit theorems for the least squares estimators that are completely different from the ones in the critical case. Huang et al. [20] investigated the asymptotic behaviour of weighted conditional least squares estimator of the drift parameters not only for critical continuous time branching processes with immigration, but also for subcritical and supercritical ones.
Using our scaling Theorem 2.9 we can only handle a special critical affine diffusion model given by (1.1) (for a more detailed discussion, see Remark 3.7). The other critical and non-critical cases are under investigation but different techniques are needed.

Least squares estimator of θ when m is known
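The least squares estimator (LSE) of θ based on the observations $X_i$, $i = 0, 1, \ldots, n$, can be obtained by solving the extremum problem
$$\hat{\theta}_n^{\mathrm{LSE}} := \arg\min_{\theta \in \mathbb{R}} \sum_{i=1}^n \big(X_i - X_{i-1} - (m - \theta X_{i-1})\big)^2.$$
This definition of LSE of θ can be considered as the counterpart of the one given in Hu and Long. Setting the derivative of the sum with respect to θ equal to zero gives
$$\hat{\theta}_n^{\mathrm{LSE}} = -\frac{\sum_{i=1}^n (X_i - X_{i-1} - m) X_{i-1}}{\sum_{i=1}^n X_{i-1}^2}, \tag{3.6}$$
i.e., (3.6) is indeed the solution of the extremum problem provided that $\sum_{i=1}^n X_{i-1}^2 > 0$.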
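3.5 Theorem. Let us assume that Condition (C) holds. Then $P\big(\sum_{i=1}^n X_{i-1}^2 > 0\big) = 1$ for all $n \geq 2$, and there exists a unique LSE $\hat{\theta}_n^{\mathrm{LSE}}$ which has the form given in (3.6). Further,
$$n\, \hat{\theta}_n^{\mathrm{LSE}} \stackrel{L}{\longrightarrow} -\frac{\int_0^1 \mathcal{X}_t\, d(\mathcal{X}_t - mt)}{\int_0^1 \mathcal{X}_t^2\, dt} \qquad \text{as } n \to \infty, \tag{3.7}$$
where $(\mathcal{X}_t)_{t \in \mathbb{R}_+}$ is the second coordinate of a two-dimensional affine process $(\mathcal{Y}_t, \mathcal{X}_t)_{t \in \mathbb{R}_+}$ given by the unique strong solution of the SDE
$$d\mathcal{Y}_t = a\,dt + \sqrt{\mathcal{Y}_t}\,d\mathcal{W}_t, \qquad d\mathcal{X}_t = m\,dt + \sqrt{\mathcal{Y}_t}\,d\mathcal{B}_t, \qquad t \in \mathbb{R}_+, \tag{3.8}$$
with initial value $(\mathcal{Y}_0, \mathcal{X}_0) = (0, 0)$, where $(\mathcal{W}_t)_{t \in \mathbb{R}_+}$ and $(\mathcal{B}_t)_{t \in \mathbb{R}_+}$ are independent standard Wiener processes.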
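The estimator of Theorem 3.5 is straightforward to evaluate on data. The sketch below minimises the sum of the squares (X_i − X_{i−1} − (m − θX_{i−1}))² over θ for a known m, which gives θ̂ = −Σ(X_i − X_{i−1} − m)X_{i−1} / ΣX_{i−1}². The function name is hypothetical.

```python
def theta_lse(xs, m):
    """LSE of theta from observations xs = [X_0, ..., X_n] when m is known.

    Minimises sum_i (X_i - X_{i-1} - (m - theta * X_{i-1}))**2 over theta.
    """
    num = sum((xs[i] - xs[i - 1] - m) * xs[i - 1] for i in range(1, len(xs)))
    den = sum(xs[i - 1] ** 2 for i in range(1, len(xs)))
    if den == 0:
        raise ValueError("sum of X_{i-1}^2 is zero, the LSE is not defined")
    return -num / den
```

As a sanity check, for the noise-free recursion X_i = X_{i−1} + m − θX_{i−1} with m = 1, θ = 0.5 and X_0 = 0 (so that the observations are 0, 1, 1.5, 1.75), the function returns 0.5.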
3.6 Remark. (i) The limit distributions in Theorem 3.5 have the same forms as those of the limit distributions in Theorem 3.2 in Hu and Long [19].
(ii) The limit distribution of n θ LSE n as n → ∞ in Theorem 3.5 can be written also in the form (iii) By Proposition 3.3, the affine process (Y t , X t ) t∈R + given in Theorem 3.5 has infinitesimal generator Hence for all t ∈ R ++ , the conditional distribution of X t given X 0 and (Y s ) s∈[0,t] is a normal distribution with mean X 0 + mt and with variance t 0 Y s ds. Here the variance t 0 Y s ds is positive almost surely for all t ∈ R ++ . Indeed, let A t := {ω ∈ Ω : s → Y s (ω) is continuous on [0, t]}. Then P(A t ) = 1, and, since P(Y 0 0) = 1, for all ω ∈ A t , t 0 Y s (ω) ds = 0 if and only if Y s (ω) = 0 for all s ∈ [0, t]. By (3.1), we have The stochastic integral on the right hand side can be approximated as for all t ∈ R + , by Jacod and Shiryaev [22,Theorem I.4.44]. Hence there exists a sequence (n k ) k∈N of positive integers such that for all t ∈ R + . Consequently, with the notation implying P t 0 Y s ds = 0 = 0, and hence P t 0 Y s ds > 0 = 1. It yields that and hence P( n i=1 X 2 i−1 > 0) = 1 for all n 2. Now we turn to prove (3.7). By Itô's formula, we have d(X 2 t ) = 2X t dX t + Y t dt, t ∈ R + , and hence, using also X 0 = 0, we have For the process (X t ) t∈R + , a discrete version of the identity (3.10) can be easily checked. The aim of the following discussion is to prove (3.12) Let us consider the unique strong solution of the SDE Consequently, for all n ∈ N, we have as n → ∞, as n → ∞, Here the first two convergences are consequences of the definition of the Riemann integral using also that (X t ) t∈R + has continuous sample paths almost surely. The convergence (3.15) can be checked as follows. With the notations of Jacod and Shiryaev [22], τ n := i n ∧ 1 i∈N n∈N is a Riemann sequence of (adapted) subdivisions and hence, by Jacod and Shiryaev [22,Theorem I.4.47], the sequence of processes converges to the quadratic variation process of X in probability, uniformly on every compact interval. 
Especially, with t = 1, using also the SDE (3.8), we have (3.15). Hence, in order to prove (3.12), it suffices to show convergences as n → ∞. Indeed, one can refer to Slutsky's lemma using also that for any random vectors U n , By (3.1) and (3.13), we have For each n ∈ N, there exists an even and twice continuously differentiable function ψ n : R → R + with |ψ n (x)| |x|, |ψ ′ n (x)| 1, ψ n (x) ↑ |x| as n → ∞ for all x ∈ R, and for all n ∈ N and x, y ∈ R + , see, e.g., in Karatzas for all t ∈ R + and n ∈ N. The last term is an (F t ) t∈R + -martingale, since where the last inequality follows by Lemma 3.2. Thus the expectation of the last term on the right hand side of (3.22) is zero, whereas the expectation of the second integral is bounded by 2t/n. We conclude By monotone convergence theorem, we have which yields (3.21).
Finally, by (3.12) and the continuous mapping theorem, and using that we have the assertion. Indeed, the function g : R 5 → R, defined by is continuous on the set {(x, y, z, u, v) ∈ R 5 : y = 0}, and the limit distribution in (3.12) is concentrated on this set since P 1 0 X 2 t dt > 0 = 1. Indeed, if P( 1 0 X 2 t dt = 0) > 0 held, then, by the almost sure continuity of the sample paths of X , we would have P(X t = 0, ∀ t ∈ [0, 1]) > 0. Hence on the event {ω ∈ Ω : X t (ω) = 0, ∀ t ∈ [0, 1]}, the quadratic variation of X would be identically zero. Since dX t = m dt + √ Y t dB t , t ∈ R + , the quadratic variation of X is the process t 0 Y u du t∈R + , and then we would have t 0 Y u du = 0 for all t ∈ [0, 1] on the event {ω ∈ Ω : X t (ω) = 0, ∀ t ∈ [0, 1]}. This yields us to a contradiction similarly as at the beginning of the proof due to that a ∈ R ++ and dY t = a dt + √ Y t dW t , t ∈ R + . Hence the continuous mapping theorem (see, e.g., Theorem 2.3 in van der Vaart [33]) yields Since P n θ LSE for all n 2, the assertion follows using (3.10) and that if ξ n , η n , n ∈ N, and ξ are random variables such that ξ n L −→ ξ as n → ∞ and lim n→∞ P(ξ n = η n ) = 1, then η n L −→ ξ as n → ∞, see, e.g., Barczy et al. [3,Lemma 3.1]. ✷ 3.7 Remark. If the affine diffusion process given by the SDE (3.1) is critical but (b, θ) = (0, 0) (i.e., b = 0, θ > 0 or b > 0, θ = 0), then the asymptotic behaviour of the LSE θ LSE n cannot be studied using Theorem 2.9 since its condition lim θ→∞ θβ (θ) = β is not satisfied. ✷

Least squares estimator of (θ, m)
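The LSE of (θ, m) based on the observations $X_i$, $i = 0, 1, \ldots, n$, can be obtained by solving the extremum problem
$$(\hat{\theta}_n^{\mathrm{LSE}}, \hat{m}_n^{\mathrm{LSE}}) := \arg\min_{(\theta, m) \in \mathbb{R}^2} \sum_{i=1}^n \big(X_i - X_{i-1} - (m - \theta X_{i-1})\big)^2.$$
We need to solve the following system of equations with respect to (θ, m):
$$2 \sum_{i=1}^n \big(X_i - X_{i-1} - (m - \theta X_{i-1})\big) X_{i-1} = 0, \qquad -2 \sum_{i=1}^n \big(X_i - X_{i-1} - (m - \theta X_{i-1})\big) = 0,$$
which can be written also in the form
$$\begin{pmatrix} \sum_{i=1}^n X_{i-1}^2 & -\sum_{i=1}^n X_{i-1} \\ -\sum_{i=1}^n X_{i-1} & n \end{pmatrix} \begin{pmatrix} \theta \\ m \end{pmatrix} = \begin{pmatrix} -\sum_{i=1}^n (X_i - X_{i-1}) X_{i-1} \\ \sum_{i=1}^n (X_i - X_{i-1}) \end{pmatrix}.$$
Then one can check that
$$\hat{\theta}_n^{\mathrm{LSE}} = -\frac{n \sum_{i=1}^n (X_i - X_{i-1}) X_{i-1} - \sum_{i=1}^n X_{i-1} \sum_{i=1}^n (X_i - X_{i-1})}{n \sum_{i=1}^n X_{i-1}^2 - \big(\sum_{i=1}^n X_{i-1}\big)^2}, \tag{3.27}$$
$$\hat{m}_n^{\mathrm{LSE}} = \frac{\sum_{i=1}^n X_{i-1}^2 \sum_{i=1}^n (X_i - X_{i-1}) - \sum_{i=1}^n X_{i-1} \sum_{i=1}^n (X_i - X_{i-1}) X_{i-1}}{n \sum_{i=1}^n X_{i-1}^2 - \big(\sum_{i=1}^n X_{i-1}\big)^2}, \tag{3.28}$$
provided that $n \sum_{i=1}^n X_{i-1}^2 - \big(\sum_{i=1}^n X_{i-1}\big)^2 > 0$. In this case the matrix
$$\begin{pmatrix} 2\sum_{i=1}^n X_{i-1}^2 & -2\sum_{i=1}^n X_{i-1} \\ -2\sum_{i=1}^n X_{i-1} & 2n \end{pmatrix},$$
which consists of the second order partial derivatives of the function $\mathbb{R}^2 \ni (\theta, m) \mapsto \sum_{i=1}^n \big(X_i - X_{i-1} - (m - \theta X_{i-1})\big)^2$, is positive definite.

3.8 Theorem. Let us assume that Condition (C) holds. Then for all $n \geq 2$,
$$P\Big(n \sum_{i=1}^n X_{i-1}^2 - \Big(\sum_{i=1}^n X_{i-1}\Big)^2 > 0\Big) = 1, \tag{3.29}$$
and there exists a unique LSE $(\hat{\theta}_n^{\mathrm{LSE}}, \hat{m}_n^{\mathrm{LSE}})$ which has the form given in (3.27) and (3.28). Further,
$$n\, \hat{\theta}_n^{\mathrm{LSE}} \stackrel{L}{\longrightarrow} -\frac{\int_0^1 \mathcal{X}_t\, d\mathcal{X}_t - \mathcal{X}_1 \int_0^1 \mathcal{X}_t\, dt}{\int_0^1 \mathcal{X}_t^2\, dt - \big(\int_0^1 \mathcal{X}_t\, dt\big)^2}, \qquad \hat{m}_n^{\mathrm{LSE}} \stackrel{L}{\longrightarrow} \frac{\mathcal{X}_1 \int_0^1 \mathcal{X}_t^2\, dt - \int_0^1 \mathcal{X}_t\, dt \int_0^1 \mathcal{X}_t\, d\mathcal{X}_t}{\int_0^1 \mathcal{X}_t^2\, dt - \big(\int_0^1 \mathcal{X}_t\, dt\big)^2}$$
as $n \to \infty$, where $(\mathcal{X}_t)_{t \in \mathbb{R}_+}$ is the second coordinate of a two-dimensional affine process $(\mathcal{Y}_t, \mathcal{X}_t)_{t \in \mathbb{R}_+}$ given by the unique strong solution of the SDE
$$d\mathcal{Y}_t = a\,dt + \sqrt{\mathcal{Y}_t}\,d\mathcal{W}_t, \qquad d\mathcal{X}_t = m\,dt + \sqrt{\mathcal{Y}_t}\,d\mathcal{B}_t, \qquad t \in \mathbb{R}_+,$$
with initial value $(\mathcal{Y}_0, \mathcal{X}_0) = (0, 0)$, where $(\mathcal{W}_t)_{t \in \mathbb{R}_+}$ and $(\mathcal{B}_t)_{t \in \mathbb{R}_+}$ are independent standard Wiener processes.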
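The joint estimator can be sketched numerically as follows. The code solves the two normal equations of the least squares problem for the sum of (X_i − X_{i−1} − (m − θX_{i−1}))² by Cramer's rule, which amounts to a linear regression of the increments X_i − X_{i−1} on X_{i−1} with intercept m and slope −θ. The function name is hypothetical, and the closed forms are obtained from the normal equations in this sketch.

```python
def lse_theta_m(xs):
    """Joint LSE (theta_hat, m_hat) from observations xs = [X_0, ..., X_n].

    Minimises sum_i (X_i - X_{i-1} - (m - theta * X_{i-1}))**2 over (theta, m).
    """
    n = len(xs) - 1
    prev = xs[:-1]                                      # X_0, ..., X_{n-1}
    incr = [xs[i] - xs[i - 1] for i in range(1, n + 1)]  # X_i - X_{i-1}
    s_x = sum(prev)
    s_xx = sum(x * x for x in prev)
    s_d = sum(incr)
    s_dx = sum(d * x for d, x in zip(incr, prev))
    det = n * s_xx - s_x ** 2       # determinant of the normal equations
    if det <= 0:
        raise ValueError("normal equations are singular, the LSE is not unique")
    theta_hat = -(n * s_dx - s_x * s_d) / det
    m_hat = (s_xx * s_d - s_x * s_dx) / det
    return theta_hat, m_hat
```

On the noise-free observations 0, 1, 1.5, 1.75 (generated with θ = 0.5 and m = 1) the function recovers both parameters up to rounding error. The determinant check mirrors the requirement that n ΣX_{i−1}² − (ΣX_{i−1})² be positive.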
3.9 Remark. (i) The limit distributions in Theorem 3.8 have the same forms as those of the limit distributions in Theorem 4.1 in Hu and Long [19].
By (3.3), for all n 2, where we used that the conditional distribution of By (3.27) and (3.28), we have , and using (3.11) and (3.12), as in the proof of Theorem 3.5, the continuous mapping theorem yields the assertion. We only remark that Since X has continuous sample paths almost surely, holds if and only if Hence, since X 0 = 0, we have (3.31) holds if and only if P(X t = 0, ∀ t ∈ [0, 1]) > 0, which is a contradiction due to our assumption a ∈ R ++ (for more details, see the proof of Theorem 3.5). ✷

Conditional least squares estimator of (θ, m)
For all t ∈ R + , let F (Y,X) t be the σ-algebra generated by (Y s , X s ) s∈[0,t] . The conditional least squares estimator (CLSE) of (θ, m) based on the observations X i , i = 0, 1, . . . , n, can be obtained by solving the extremum problem where we used that t 0 e θu √ Y u dB u t∈R + is a martingale (which follows by the proof of Proposition 3.2). Using that (Y t , X t ) t∈R + is a time-homogeneous Markov process, we have where ( γ CLSE n , δ CLSE n ) is a CLSE of (γ, δ) based on the observations X i , i = 0, 1, . . . , n, which can be obtained by solving the extremum problem is bijective and measurable, and then there is a bijection between the set of CLSEs of the parameters (θ, m) and the set of CLSEs of the parameters A(θ, m). This follows easily, since for all n ∈ N, For the extremum problem (3.33), we need to solve the following system of equations with respect to (γ, δ): which can be written also in the form consisting of the second order partial derivatives of the function 3.10 Remark. Using the definition of CLSE of (θ, m) we give a mathematical motivation of the definition of the LSE θ n of θ introduced in Section 3.2. Note that if θ = 0, then If θ = 0, then, by Taylor's theorem, 1 − e −θ = e −τ θ θ with some τ = τ (θ) ∈ [0, 1], and hence for i = 1, . . . , n − 1. Hence for small values of θ one can approximate . . , n. Based on this, for small values of θ, in the definition of the LSE of θ, the sum n i=1 (X i − X i−1 − (m − θX i−1 )) 2 can be considered as an approximation of the sum i−1 )) 2 in the definition of the CLSE of (θ, m). ✷ 3.11 Theorem. Let us assume that Condition (C) holds. Then for all n 2, (3.36) and there exists a unique CLSE ( θ CLSE n , m CLSE n ) which has the form given in (3.32). 
Further, where (X t ) t∈R + is the second coordinate of a two-dimensional affine process (Y t , X t ) t∈R + given by the unique strong solution of the SDE   with initial value (Y 0 , X 0 ) = (0, 0), where (W t ) t∈R + and (B t ) t∈R + are independent standard Wiener processes.
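A numerical sketch of the CLSE may help to fix ideas. It rests on an assumption that we state explicitly: by the variation of constants formula for X and the martingale property of the stochastic integral, the conditional mean of X_i given the past is γX_{i−1} + δ with γ = e^{−θ} and δ = m∫_0^1 e^{−θu} du. Under this parametrisation, one first regresses X_i on X_{i−1} to obtain (γ̂, δ̂) and then inverts the map (θ, m) ↦ (γ, δ), which requires γ̂ > 0. The function name is hypothetical.

```python
import math


def clse_theta_m(xs):
    """CLSE sketch for (theta, m) from observations xs = [X_0, ..., X_n].

    Assumes E(X_i | past) = gamma * X_{i-1} + delta with
    gamma = exp(-theta) and delta = m * int_0^1 exp(-theta * u) du.
    """
    n = len(xs) - 1
    prev, curr = xs[:-1], xs[1:]
    s_p = sum(prev)
    s_c = sum(curr)
    s_pp = sum(p * p for p in prev)
    s_pc = sum(p * c for p, c in zip(prev, curr))
    det = n * s_pp - s_p ** 2
    if det <= 0:
        raise ValueError("normal equations are singular")
    gamma = (n * s_pc - s_p * s_c) / det       # slope of the regression
    delta = (s_pp * s_c - s_p * s_pc) / det    # intercept of the regression
    if gamma <= 0:
        raise ValueError("gamma_hat <= 0, so theta = -log(gamma) is undefined")
    theta = -math.log(gamma)
    # int_0^1 exp(-theta * u) du equals (1 - gamma) / theta, or 1 if theta = 0
    integral = 1.0 if theta == 0.0 else (1.0 - gamma) / theta
    return theta, delta / integral
```

For the noise-free recursion X_i = 0.5 X_{i−1} + 1 started from X_0 = 0, the sketch returns θ̂ = log 2 and m̂ = 2 log 2, in agreement with γ = 0.5 and δ = 1.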
Proof. By the proof of Theorem 3.8, we have (3.36). By (3.34) and (3.35), for all n 2 we have Using (3.11) and (3.12), the continuous mapping theorem, by the same technique as in the proof of Theorem 3.8, we get as n → ∞. (3.40) By Slutsky's lemma, we also have γ CLSE n P −→ 1 as n → ∞. Hence, by Taylor's theorem using also that γ CLSE n > 0, n ∈ N (due to its definition given in (3.32)), we have where ξ n is in the interval with endpoints 1 and γ CLSE n . Since γ CLSE n P −→ 1 as n → ∞, we have ξ n P −→ 1 as n → ∞, and hence using the decomposition Slutsky's lemma and (3.39), we get (3.37).
Next we turn to prove (3.38). For this, by (3.32), (3.40) and by Slutsky's lemma, it is enough to check that as n → ∞.
by (3.41), for all ε > 0 we have 3.12 Remark. (i) We do not consider the CLSE of θ supposing that m is known since the corresponding extremum problem is rather complicated, and from statistical point of view it has less importance.
Note also that there is another way for checking that the integrals in (2.2) are well-defined under the conditions (v) and (vi) of Definition 2.3. Namely, using that the integrals in (2.12) in Duffie et al. [11] are well-defined, the assertion follows since and similarly B On consistency properties of the LSE and CLSE of (θ, m) Let us suppose that Condition (C) holds. Using Slutsky's lemma and Theorems 3.8 and 3.11 we get θ LSE n and θ CLSE n converge stochastically to the parameter θ = 0 as n → ∞, respectively, and in what follows we show that m LSE n and m CLSE n do not converge stochastically to the parameter m as n → ∞, respectively. For this it is enough to check that the weak limits of m LSE n and m CLSE n given in Theorems 3.8 and 3.11 do not equal to m almost surely, respectively. Since the weak limits in question are the same, we can give a common proof. First note that and hence X 1 Here (X t −mt) t∈R + = t 0 √ Y u dB u t∈R + is a square integrable martingale (see the proof of Proposition 3.2) and X s = ms + s 0 √ Y u dB u for s ∈ R + , thus, for all s ∈ [0, 1], we have where F Y 1 denotes the σ-algebra generated by (Y u ) u∈[0,1] . For the last equality above, we used that conditionally on the σ-algebra F Y 1 , the stochastic process  Similarly, for all s ∈ [0, 1], we have If s 1 , s 2 ∈ [0, 1], then, by (3.10), Similar expression hold in case of s 1 , s 2 ∈ [0, 1] with s 1 s 2 , we have to change s 1 by s 2 .
Furthermore, using (3.10), J 2 2 can be written in the form