Compound Poisson approximation for triangular arrays with application to threshold estimation

We prove weak convergence of triangular arrays to the compound Poisson limit using Tikhomirov's method. The result is applied to statistical estimation of the threshold parameter in autoregressive models.


Introduction and main result
This paper is concerned with weak convergence of sums over triangular arrays with a certain dependence structure to the compound Poisson distribution. It is motivated by the threshold estimation problem, described in detail in Section 2. We consider triangular arrays of random variables $Y_{n,j}$, $j = 1, \dots, n$, $n \in \mathbb N$, whose rows are adapted to a filtration $(\mathscr F_j)$, $j \in \mathbb N$. The $Y_{n,j}$'s are asymptotically negligible and satisfy a weak dependence (mixing) condition, made precise by the following assumptions.
(A2) there is an integer $\ell \ge 1$, such that

The following condition on the individual characteristic functions $\phi_{n,j}(t) = \mathsf E e^{itY_{n,j}}$, together with the above assumptions, will ensure convergence of the sums $S_n = \sum_{j=1}^n Y_{n,j}$, $n \in \mathbb N$, to the compound Poisson law (hereafter we abbreviate $\dot\varphi(t) = \frac{d}{dt}\varphi(t)$, etc.):

(A4) There exist a characteristic function $\varphi(t)$ and positive constants $C_2$ and $\mu$ such that
\[
\big|\dot\phi_{n,j}(t) - n^{-1}\mu\,\dot\varphi(t)\big| \le C_2\, n^{-2}, \qquad t \in \mathbb R.
\]
Note that the mixing in A2 and A3 can be arbitrarily weak. Further assumptions on the rate of convergence of $\alpha(k)$ to zero, such as

(A5) $\alpha(k) \le C_3 r^k$ for some $r \in (0,1)$ and $C_3 > 0$,

make it possible to obtain rates of convergence in an appropriate metric. Below we work with the Lévy distance, defined for a pair of distribution functions $F$ and $G$ by (see e.g. [10])
\[
L(F, G) := \inf\big\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon \ \ \forall x \in \mathbb R\big\}.
\]
Our main result is the following:

Theorem 1.1. Let $Y_{n,j}$, $j = 1, \dots, n$, $n \in \mathbb N$ be a triangular array of random variables, whose rows are adapted to a filtration $(\mathscr F_j)$, $j \in \mathbb N$, and satisfy the assumptions A1-A4. Then $S_n$ converges weakly to $S$, where $S$ has the compound Poisson distribution with intensity $\mu$ and i.i.d. jumps with characteristic function $\varphi(t)$. Moreover, if assumption A5 holds, then there is a constant $C > 0$ such that (1.2) holds for all $n$ large enough, where $L(\mathscr L(S_n), \mathscr L(S))$ is the Lévy distance between the distribution functions of $S_n$ and $S$.
Remark 1.2. Both the constant $C$ and the smallest $n$ for which (1.2) holds can be found explicitly in terms of the $C_i$'s and $\alpha(\cdot)$ mentioned in the assumptions above. Bounds on the Lévy distance can also be obtained similarly for, e.g., polynomially decreasing $\alpha(\cdot)$, by replacing $b \log n$ with $n^\delta$ for some $\delta > 0$ in the proof of Theorem 1.1 and optimizing the right-hand side of the corresponding inequality, analogous to (3.5) below.
In the application to threshold estimation, $Y_{n,j}$ is derived from an autoregressive stationary process $X_j$, generated by the recursion
\[
X_j = h(X_{j-1}) + \varepsilon_j, \qquad (1.3)
\]
where $h(\cdot)$ is a given measurable function and $(\varepsilon_j)$ is a sequence of i.i.d. random variables with continuous positive probability density $q(\cdot)$. As explained in Section 2, in this context
\[
Y_{n,j} = f(\varepsilon_j)\,\mathbf 1_{\{X_{j-1} \in B_n\}},
\]
where $B_n := [0, 1/n]$ and $f(\cdot)$ is a measurable function. Theorem 1.1 implies that, under appropriate conditions, $S_n$ converges weakly to the compound Poisson random variable with i.i.d. jumps, distributed as $f(\varepsilon_1)$, and intensity $\mu := p(0)$, where $p(\cdot)$ is the unique invariant density of $(X_j)$.
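To make the setting concrete, here is a minimal Monte Carlo sketch (not from the paper; all concrete choices are illustrative assumptions): take $h(x) = x/2$, standard Gaussian noise, $f(x) = x$ and $B_n = [0, 1/n]$, so that $(X_j)$ is a stationary AR(1) sequence with invariant law $N(0, 4/3)$ and the limiting intensity is $\mu = p(0)$.

```python
import numpy as np

# Illustrative assumptions (not from the paper): h(x) = x/2,
# eps_j ~ N(0,1), f(x) = x, B_n = [0, 1/n].  Then (X_j) is AR(1)
# with stationary law N(0, 4/3), and the limiting intensity is
# mu = p(0), the invariant density at zero.
rng = np.random.default_rng(0)
n, reps = 1000, 2000

mu = 1.0 / np.sqrt(2 * np.pi * 4.0 / 3.0)  # p(0) for N(0, 4/3)

# start every replication from the stationary distribution
x = rng.normal(0.0, np.sqrt(4.0 / 3.0), size=reps)
jumps = np.zeros(reps, dtype=int)
s = np.zeros(reps)
for _ in range(n):
    eps = rng.normal(size=reps)
    hit = (x >= 0.0) & (x <= 1.0 / n)  # indicator of X_{j-1} in B_n
    s += np.where(hit, eps, 0.0)       # S_n accumulates f(eps_j) = eps_j
    jumps += hit
    x = 0.5 * x + eps                  # X_j = h(X_{j-1}) + eps_j

# The number of jumps should be approximately Poisson(mu):
print(jumps.mean())        # close to mu (about 0.35)
print((jumps == 0).mean()) # close to exp(-mu) (about 0.71)
```

With these choices the number of indices $j$ with $X_{j-1} \in B_n$ is approximately Poisson($\mu$), and $S_n$ collects one $N(0,1)$-distributed jump per such index, which is the compound Poisson limit of Theorem 1.1.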
Somewhat surprisingly, we were not able to find in the literature a general result from which this limit could be deduced. In this regard, one naturally thinks of Stein's method or martingale convergence results. Stein's method appears to be particularly well suited to the compound Poisson distribution with integer-valued jumps (see e.g. [3], [2]). Results such as [4], [18,19], [5] or [17] come close, but apparently do not quite fit our setting.
In the particular case when $\mathsf E f(\varepsilon_j) = 0$, $S_n$ becomes a sum over the array of martingale differences $Y_{n,j} := f(\varepsilon_j)\mathbf 1_{\{X_{j-1} \in B_n\}}$, $j = 1, \dots, n$, with the quadratic variation sequence
\[
V_{n,n} = \sum_{j=1}^n \mathsf E\big(Y_{n,j}^2 \,\big|\, \mathscr F_{j-1}\big) = \mathsf E f(\varepsilon_1)^2 \sum_{j=1}^n \mathbf 1_{\{X_{j-1} \in B_n\}}.
\]
A typical martingale limit result, such as e.g. [6] or Theorem 2.27 Ch. VIII §2c in [13], requires that $V_{n,n}$ converge in probability. However, in our case $V_{n,n}$ converges only in distribution (to a Poisson random variable), but not in probability (since e.g. $V_{n,n}$ is uniformly integrable, but is not a Cauchy sequence in $L^1$). It is known that $S_n$ may have a different limit, or no limit at all, if the convergence in probability of the quadratic variation is replaced with convergence in distribution (see [1] and the references therein), so the martingale results also do not appear applicable.
The objective of this paper is to prove Theorem 1.1 using Tikhomirov's method from [20]. Originally applied to the CLT in the dependent case, it turns out to be remarkably suitable to the setting under consideration. Before proceeding to the proof in Section 3, we discuss in more detail the application in which the aforementioned convergence arises.

Application to threshold estimation
Suppose one observes a sample $X^n = (X_1, \dots, X_n)$ from a threshold autoregressive (TAR) time series, generated by the recursion
\[
X_j = g_+(X_{j-1})\,\mathbf 1_{\{X_{j-1} \ge \theta\}} + g_-(X_{j-1})\,\mathbf 1_{\{X_{j-1} < \theta\}} + \varepsilon_j, \qquad (2.1)
\]
where $g_+(\cdot)$ and $g_-(\cdot)$ are known functions and $(\varepsilon_j)$ is a sequence of i.i.d. random variables with known probability density $q(\cdot)$. The unknown threshold parameter $\theta$, taking values in an open interval $\Theta := (a, b) \subset \mathbb R$, is to be estimated from the sample $X^n$. TAR models such as (2.1) have been the subject of extensive research in statistics and econometrics (see e.g. [21] and the recent surveys [22], [11], [8]). From the statistical point of view, this estimation problem is "singular", since the corresponding likelihood function is discontinuous in $\theta$. Typically in such problems, the sequence of Bayes estimators is asymptotically efficient in the minimax sense for an arbitrary continuous prior density $\pi(\cdot)$ (see [12]). The asymptotic distribution of these estimators is determined by the weak limit of the likelihood ratios as follows. Let $\theta_0 \in \Theta$ be the true unknown value of the parameter and $r_n$ an increasing sequence of numbers. Consider the change of variables $\theta = \theta_0 + u/r_n$, where $Z_n(u)$, $n \ge 1$, are the rescaled likelihood ratios. If $r_n$ can be chosen so that $Z_n(u)$, $u \in \mathbb R$, converges weakly to a random process $Z(u)$, $u \in \mathbb R$, in an appropriate topology, then (2.3) holds (a comprehensive account of this approach can be found in [12]). For likelihoods as in (2.2), a simple calculation (see eq. (4) in [9]) reveals the form of $\log Z_n(u)$ for $u > 0$, and a similar expression is obtained for $u < 0$. It can be shown that (2.3) indeed holds with $r_n := n$, if $(X_j)$ is sufficiently fast mixing with the unique invariant probability density $p(x; \theta_0)$, and the sequence $\log Z_n(u)$ converges weakly to the compound Poisson process $\log Z(u)$ of (2.5), where $(\varepsilon_j^\pm)$ are i.i.d. copies of $\varepsilon_1$ and $\Pi_+(u)$ and $\Pi_-(u)$ are independent Poisson processes with the same intensity $p(\theta_0; \theta_0)$. The rate $r_n = n$ and the Poisson behavior are typical for discontinuous likelihoods (see e.g. Ch. 5, [12]).
For the linear TAR model, i.e. when $g_\pm(x) = \rho_\pm x$ with constants $\rho_- \ne \rho_+$, these asymptotics appeared in [7], and the aforementioned generalization is taken from [9].
One particularly interesting ingredient in the proof, which is the main focus of this article, is the convergence of the finite-dimensional distributions of $Z_n(u)$ to those of $Z(u)$. In its prototypical form, the problem can be restated as follows. Consider the stationary Markov sequence $(X_j)$, generated by the recursion (1.3), and let (cf. (2.4))
\[
S_n := \sum_{j=1}^n f(\varepsilon_j)\,\mathbf 1_{\{X_{j-1} \in B_n\}}, \qquad (2.6)
\]
where $B_n := [0, 1/n]$ and $f(\cdot)$ is a measurable function. It is required to show that the sums $(S_n)$ converge weakly to the compound Poisson random variable with i.i.d. jumps, distributed as $f(\varepsilon_1)$, and intensity $p(0)$, where $p(\cdot)$ is the unique invariant density of $(X_j)$. This convergence is not hard to prove using the blocks technique: $S_n$ is partitioned into, say, $n^{1/2}$ blocks of $n^{1/2}$ consecutive summands, $n^{1/4}$ of which are discarded. Removing a total of $n^{1/2} \cdot n^{1/4}$ out of $n$ terms in the sum does not alter its limit, but the residual blocks become nearly independent if the mixing is fast enough. Moreover, a single event $\{X_j \in B_n\}$ occurs within each block with probability of order $n^{-1/2}$, and hence the sum over the approximately independent $n^{1/2}$ blocks yields the claimed compound Poisson behavior. This approach dates back at least to [15] in the Poisson case, and the details for the compound Poisson setting can be found in [9].
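The counting behind the blocks heuristic can be sketched as follows (a back-of-the-envelope computation in which $\approx$ hides the mixing and edge errors, and $I_1, \dots, I_m$, $m = n^{1/2}$, denote the residual blocks):

```latex
\mathsf P\Big(\sum_{j \in I_i} \mathbf 1_{\{X_{j-1} \in B_n\}} \ge 1\Big)
  \approx n^{1/2}\,\mathsf P(X_0 \in B_n)
  \approx n^{1/2} \cdot \frac{p(0)}{n}
  = \frac{p(0)}{n^{1/2}},
\qquad
\mathrm{Binomial}\big(n^{1/2},\, p(0)\, n^{-1/2}\big)
  \;\Longrightarrow\; \mathrm{Poisson}\big(p(0)\big).
```

Each block containing an event contributes a single summand $f(\varepsilon_j)$, which is independent of the past, so the jumps are i.i.d. copies of $f(\varepsilon_1)$, as claimed.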
An alternative proof can now be given by applying Theorem 1.1.

Corollary 2.1. Let $(X_j)$ be defined by (1.3) and $S_n$ by (2.6). Assume that

(i) $\varepsilon_1$ has a positive, Lipschitz continuous and bounded probability density $q(x)$, $x \in \mathbb R$, with finite first absolute moment $\int_{\mathbb R} |x|\, q(x)\, dx < \infty$;

(ii) for some $r \in (0, 1)$ and $C > 0$, for all $n$ large enough.
Then the Markov process $(X_j)$ has a unique invariant density $p(x)$, $x \in \mathbb R$, which is positive, Lipschitz continuous and bounded; for stationary $(X_j)$, the sums $(S_n)$ converge weakly to the compound Poisson random variable with intensity $p(0)$ and i.i.d. jumps with the same distribution as $f(\varepsilon_1)$.
Remark 2.2. Corollary 2.1 verifies the weak convergence of the one-dimensional distributions of the processes $\log Z_n(u)$ from (2.4) to those of $\log Z(u)$, $u \in \mathbb R$, defined in (2.5). The convergence of finite-dimensional distributions of higher orders can be treated along the same lines. The limit (2.3) then follows from the tightness of the sequence of processes $\log Z_n(u)$ (see [9] for further details).
Proof. Under assumptions (i) and (ii), the standard ergodic theory of Markov chains (see e.g. Theorem 16.0.2 in [16]) implies that $(X_j)$ is an irreducible, aperiodic and positive recurrent Markov chain with a unique invariant measure. Due to the additive structure of the recursion (1.3), the invariant measure has a density $p(\cdot)$, which is positive and continuous with the same Lipschitz constant $L_q$ as the density $q(\cdot)$, and $\|p\|_\infty \le \|q\|_\infty := \sup_{x \in \mathbb R} q(x)$. Moreover, $(X_j)$ is geometrically mixing, i.e. there exist positive constants $R$ and $\rho < 1$ such that (2.7) holds for any measurable function $|g(x)| \le 1$. Hence A1 is satisfied for all $n$ large enough. Further, by the Markov property and (2.7), for $i < j - 1$, A2 holds with $\ell = 2$ and
\[
\alpha(k) := R\rho^{k-2}. \qquad (2.8)
\]
Assumption A3 is checked similarly. Finally,
\[
\dot\phi_{n,j}(t) = \mathsf E\, i f(\varepsilon_j)\, e^{itf(\varepsilon_j)}\,\mathbf 1_{\{X_{j-1} \in B_n\}} = \dot\varphi(t)\,\mathsf P(X_{j-1} \in B_n),
\]
where $\varphi(t) = \mathsf E e^{itf(\varepsilon_1)}$, and interchanging the derivative and the expectation is valid by dominated convergence and (iii).
Since the invariant density is Lipschitz, it follows that
\[
\Big|\mathsf P(X_{j-1} \in B_n) - n^{-1} p(0)\Big| \le \int_0^{1/n} |p(x) - p(0)|\, dx \le \tfrac{1}{2} L_q\, n^{-2},
\]
which verifies A4, and the claim now follows from Theorem 1.1. In fact, assumption A5 holds by virtue of (2.8), and the Lévy distance to the limit distribution converges at the rate claimed in (1.2).

Proof of Theorem 1.1
Tikhomirov's approach [20] is applicable when the characteristic function of the limit distribution uniquely solves an ordinary differential equation. Roughly, the idea is to show that the characteristic functions of the prelimit distributions satisfy the same equation in the limit.
The characteristic function of the compound Poisson distribution with intensity $\mu$ and jump characteristic function $\varphi(t)$ is given by
\[
\psi(t) = e^{\mu(\varphi(t) - 1)},
\]
which uniquely solves the initial value problem
\[
\dot\psi(t) = \mu\dot\varphi(t)\,\psi(t), \qquad \psi(0) = 1, \quad t \in \mathbb R.
\]
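Indeed, differentiating $\psi$ directly confirms both the equation and the initial condition:

```latex
\psi(t) = e^{\mu(\varphi(t) - 1)}
\quad\Longrightarrow\quad
\dot\psi(t) = \mu\dot\varphi(t)\, e^{\mu(\varphi(t) - 1)} = \mu\dot\varphi(t)\,\psi(t),
\qquad
\psi(0) = e^{\mu(\varphi(0) - 1)} = e^{0} = 1,
```

and uniqueness follows from the standard theory of linear ODEs (e.g. via Grönwall's inequality), since the coefficient $\mu\dot\varphi(t)$ is continuous.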
Since $\mathsf E|S_n| < \infty$, the characteristic function $\psi_n(t) := \mathsf E e^{itS_n}$ is continuously differentiable, and $\Delta_n(t) := \psi(t) - \psi_n(t)$ satisfies
\[
\dot\Delta_n(t) = \mu\dot\varphi(t)\,\Delta_n(t) + r_n(t), \quad t \in \mathbb R,
\]
subject to $\Delta_n(0) = 0$, where $r_n(t) := \mu\dot\varphi(t)\psi_n(t) - \dot\psi_n(t)$. Solving for $\Delta_n(t)$ gives
\[
\Delta_n(t) = \int_0^t e^{\mu(\varphi(t) - \varphi(s))}\, r_n(s)\, ds. \qquad (3.1)
\]
As we show below, (3.2) holds for any constant $b > 0$ such that $b \log n$ is a positive integer, and, since $|\varphi(t)| \le 1$, the bound (3.3) follows from (3.1). A similar bound holds for $t < 0$, and the claimed weak limit (1.1) follows once we check (3.2). To this end, we have
\[
\dot\psi_n(t) = \sum_{k=1}^n \mathsf E\, iY_{n,k}\, e^{itS_n},
\]
where we used $\mathsf E|S_n| < \infty$ and dominated convergence to interchange the derivative and the expectation. Note that $|e^{ix} - e^{i(x+y)}| \le 2 \cdot \mathbf 1_{\{y \ne 0\}}$ for any $x, y \in \mathbb R$, and hence by assumption A1
\[
\sum_{k=1}^n \mathsf E\, |Y_{n,k}|\,\mathbf 1_{\{\sum_{|j-k| \le b \log n,\, j \ne k} Y_{n,j} \ne 0\}}
\le 2 \sum_{k=1}^n \mathsf E\, |Y_{n,k}| \sum_{|j-k| \le b \log n,\, j \ne k} \mathbf 1_{\{Y_{n,j} \ne 0\}}
= O\Big(\frac{b \log n}{n}\Big).
\]
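The representation (3.1) follows from the standard integrating-factor computation for the linear equation satisfied by $\Delta_n$:

```latex
\frac{d}{dt}\Big(e^{-\mu\varphi(t)}\,\Delta_n(t)\Big)
  = e^{-\mu\varphi(t)}\big(\dot\Delta_n(t) - \mu\dot\varphi(t)\,\Delta_n(t)\big)
  = e^{-\mu\varphi(t)}\, r_n(t),
\qquad
\Delta_n(t) = \int_0^t e^{\mu(\varphi(t) - \varphi(s))}\, r_n(s)\, ds,
```

where the integral form is obtained by integrating from $0$ to $t$ and using $\Delta_n(0) = 0$; its kernel $K(s, t) = e^{\mu(\varphi(t) - \varphi(s))}$ is the one discussed in Remark 3.1.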

Further, by the triangle inequality
The quantity
\[
\Big|\mathsf E\, iY_{n,k}\, e^{itY_{n,k}} \exp\Big(it \sum_{|j-k| > b \log n} Y_{n,j}\Big) - \dot\phi_{n,k}(t)\,\psi_n(t)\Big|
\]
is estimated similarly to (3.4), using the triangle inequality. Since $U$ and $V$ are $\mathscr F_k$-measurable, $|U| \le 1$, $|W| \le 1$ and $\mathsf E|V| \le C_1 n^{-1}$, A3 applies; since $U$ is measurable with respect to $\mathscr F_{k - b \log n}$, A2 applies for $b \log n \ge \ell$. Hence, and consequently by A4,
\[
\Big|\sum_{k=1}^n \mathsf E\, iY_{n,k}\, e^{itY_{n,k}} \exp\Big(it \sum_{|j-k| > b \log n} Y_{n,j}\Big) - \mu\dot\varphi(t)\,\psi_n(t)\Big|
\le C_2 n^{-1} + 3 C_1\, \alpha(b \log n) + 4 C_1^2\, \frac{b \log n}{n}.
\]
Assembling all the parts together, we obtain (3.2). The bound (1.2) for the Lévy metric is obtained by means of Zolotarev's inequality [23], which bounds $L(\mathscr L(S_n), \mathscr L(S))$ by $\frac{1}{\pi}$ times an integral involving $|\psi_n(t) - \psi(t)|$ over a finite interval $[-T, T]$ plus a remainder decreasing in $T$. If $\alpha(k)$ decays geometrically as in A5, the bound (1.2) is obtained by choosing $T = n^{1/2}$ and $b \ge \frac{1}{\log 1/r}$.

Remark 3.1. The rate in (1.2) is not as sharp as the one obtained by Tikhomirov in [20] in the CLT case. Apparently, the deficiency originates in the specific form of the compound Poisson characteristic function $\psi(t) = e^{\mu(\varphi(t) - 1)}$, which does not vanish as $t \to \infty$. More specifically, the integration kernel $K(s, t) := e^{\mu(\varphi(t) - \varphi(s))}$ in (3.1) does not decay when $t$ is fixed and $s$ decreases, which contributes to the linear growth in $t$ of the right-hand side of (3.3) and the corresponding linear growth in $T$ in (3.5). In the Gaussian case, this kernel has the form $K(s, t) := e^{s^2/4 - t^2/4}$ (see eq. (3.25), page 809 in [20]), which yields a better balance between the growth in $t$ and the decay in $n$. It seems that in the compound Poisson setting under consideration, the rate cannot be essentially improved within the framework of Tikhomirov's method.