ON BROWNIAN MOTION APPROXIMATION OF COMPOUND POISSON PROCESSES WITH APPLICATIONS TO THRESHOLD MODELS

Compound Poisson processes (CPPs) constitute a fundamental class of stochastic processes and a basic building block for more complex jump-diffusion processes such as Lévy processes. However, unlike those of a Brownian motion (BM), distributions of functionals of a CPP, e.g. maxima, passage times, argmin and others, are often intractable. The first objective of this paper is to propose a new approximation of a CPP by a BM so as to facilitate closed-form expressions in concrete cases. Specifically, we approximate, in some sense, a sequence of two-sided CPPs by a two-sided BM with drift. The second objective is to illustrate the above approximation in applications, such as the construction of confidence intervals for threshold parameters in threshold models, which include the threshold regression (also called two-phase regression or segmentation) and numerous threshold time series models. We conduct numerical simulations to assess the performance of the proposed approximation. We illustrate the use of our approach with a real data set.

Keywords: Brownian motion, compound Poisson process, TAR, TARMA, TCHARM, TDAR, TMA, threshold regression.

To circumvent the difficulty, an important approach is to approximate the CPP by a Brownian motion (BM) in some sense because the situation is much more tractable in the latter; see, e.g., Stryhn (1996). The primary objective of this paper is to introduce a new approximation of a two-sided CPP, defined as

$$P(z) = I(z < 0) \sum_{k=1}^{N_1(-z)} \zeta_k^{(1)} + I(z \ge 0) \sum_{k=1}^{N_2(z)} \zeta_k^{(2)}, \qquad (1.1)$$

where $\{N_1(z), z \ge 0\}$ and $\{N_2(z), z \ge 0\}$ are independent Poisson processes with rates $\lambda_1$ and $\lambda_2$, respectively, and $\{\zeta_k^{(1)} : k \ge 1\}$ and $\{\zeta_k^{(2)} : k \ge 1\}$ are mutually independent sequences of independent and identically distributed (i.i.d.) random variables with $E\zeta_1^{(i)} > 0$ for $i = 1, 2$. The processes $\{N_i(z)\}$ and the sequences $\{\zeta_k^{(j)}\}$ are also mutually independent. These assumptions are supposed to hold throughout the paper.

This paper is organized as follows. We state the main results in Section 2. In Section 3, we describe some important applications in threshold models. In Section 4, we assess the efficacy of the theoretical approximation results by numerical simulations. A real data set is also included. All proofs of the theorems are in the online supplementary material.
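As a minimal sketch of the object being approximated, the following Python code simulates one path of a two-sided CPP on a bounded window and evaluates it at a point. It assumes the usual construction in which the value at $z < 0$ is the sum of the first $N_1(-z)$ left jumps and the value at $z \ge 0$ is the sum of the first $N_2(z)$ right jumps. The function names, the window length and the Gaussian jump law in the usage lines are illustrative assumptions, not part of the paper.

```python
import bisect
import random


def simulate_side(rate, z_max, draw_jump, rng):
    """Simulate one side of the CPP on (0, z_max].

    Returns the jump times of a Poisson process with the given rate and the
    running sums of the i.i.d. jumps drawn by draw_jump(rng).
    """
    times, values = [], []
    t, level = 0.0, 0.0
    while True:
        t += rng.expovariate(rate)      # exponential inter-arrival time
        if t > z_max:
            break
        level += draw_jump(rng)         # add the next i.i.d. jump
        times.append(t)
        values.append(level)
    return times, values


def evaluate_side(times, values, distance):
    """Compound sum at a distance >= 0 from the origin (0 before the first jump)."""
    k = bisect.bisect_right(times, distance)   # number of jumps up to 'distance'
    return values[k - 1] if k > 0 else 0.0


def two_sided_cpp(z, left, right):
    """Value of the two-sided path at z; 'left' and 'right' come from simulate_side."""
    if z < 0:
        return evaluate_side(left[0], left[1], -z)
    return evaluate_side(right[0], right[1], z)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical jump law with positive mean 0.2 (an illustrative choice).
    jump = lambda r: r.gauss(0.2, 1.0)
    left = simulate_side(1.0, 100.0, jump, rng)
    right = simulate_side(1.0, 100.0, jump, rng)
    print(two_sided_cpp(-3.5, left, right), two_sided_cpp(4.0, left, right))
```

Because the path is piecewise constant, any functional of interest (for instance a minimizer) can be computed exactly from the stored jump times and running sums rather than from a grid.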
Thus, we embed (1.1) into a sequence $\{P_\gamma(z) : z \in \mathbb{R}\}$. Our interest lies in $m_\gamma$ and in the limit of $\{P_\gamma(z) : z \in \mathbb{R}\}$ as $\gamma$ shrinks to zero under suitable conditions. To this end, we first introduce two assumptions.
Note that the choice of $\lambda_i$ is critical. For example, if the rate of the component Poisson process is chosen as a function such that it tends to a deterministic continuous one, the CPP converges weakly to a CPP. See, e.g., Jacod and Shiryaev (2003). In our approach, the $\lambda_i$ are fixed constants. The following theorem states that the sequence of stochastic processes $\{P_\gamma(z) : z \in \mathbb{R}\}$ can be approximated by a two-sided BM with drift.
Theorem 1. Let $\Longrightarrow$ stand for weak convergence. Let $D(\mathbb{R})$ denote the space of functions defined on $\mathbb{R}$ that are right continuous and have left limits, endowed with the Skorokhod topology. If Assumptions 1-2 hold, then, as $\gamma \to 0$, with $B_1(z)$ and $B_2(z)$ being two independent standard Brownian motions on $[0, \infty)$. Further, let $T := \arg\min_{z \in \mathbb{R}} W(z)$. Then $m_\gamma \Longrightarrow T$.
In the literature, the density of T is readily available and has a closed form, which is given in the following theorem; see Proposition 1 in Stryhn (1996).
Theorem 2. The probability density of T is given by and Φ(·) is the standard normal distribution.
Corollary 2. Suppose that $\gamma = E\xi_1 > 0$, $\lambda_1 = \lambda_2 := \lambda$, and E{ξ (1) where $T_2$ has the density

Thus, Theorem 2 includes the density (2.2) as a special case. Yao (1987) used this special case in his study of the approximation of the limiting distribution of the maximum likelihood estimate of a change-point problem. The distribution of $T_2$ has exponential tails; see Remark 1 in Yao (1987). Note that Theorem 1 includes Theorem 1 of Hansen (2000) as a special case. Figure 1 displays the density and the cumulative distribution function (CDF) of $T_2$. From Figure 1, we can see that the distribution of $T_2$ is symmetric. Moreover, for our needs, it is easy to tabulate the quantiles of $T_2$. For any given level $\alpha \in (0, 1)$, denote by $Q_\alpha$ the $\alpha$th quantile of $T_2$. Table 1 gives some commonly used quantiles.

3.1. Threshold regression model. To the best of our knowledge, the threshold regression model, also called the two-phase regression model or the segmentation model, dates back to Quandt (1958). Since then it has been widely used in economics and other areas. Asymptotics on statistical inference for such models have been considered; see, e.g., Hinkley (1969, 1971), Hansen (2000), Koul and Qian (2002), Seijo and Sen (2011) and Yu (2012, 2015). We say $(x', y, z)$ follows a threshold regression model if where $y$ is a scalar dependent variable, $x = (x_1, \ldots, x_p)$ is a vector of explanatory (or independent) variables, $z$ is called the threshold variable and $r$ the threshold parameter, and $\varepsilon$ is the error with zero mean and unit variance. Suppose that $\{(x_i', y_i, z_i)\}$ is a random sample of size $n$ from model (3.1) with the true parameter $\theta_0 = (\beta_{10}, \beta_{20}, r_0)$ and $(\sigma_{10}, \sigma_{20})$. Denote by $\hat\theta_n$ the least squares estimator (LSE) of $\theta_0$.
Under some conditions (e.g., Koul and Qian (2002)), we have Here, $N_1(\cdot)$ and $N_2(\cdot)$ are independent Poisson processes with the same rate $\pi(r_0)$, which is the value of the density $\pi(\cdot)$ of $z$ at $r_0$, and where $\{\zeta_k^{(1)} : k \ge 1\}$ is a sequence of i.i.d. random variables with the same distribution as the one induced by and the sequence {ζ Here, $z = r_0^-$ and $z = r_0^+$ denote convergence to $r_0$ from below and from above, respectively. Clearly, $E\zeta^{(i)}$ is a function of $\beta_{10} - \beta_{20}$. To obtain an approximation of $M_-$ by Theorem 1 when $\beta_{10} - \beta_{20}$ is small, we can introduce a new parameter $\gamma = E\zeta^{(1)}$ to re-parameterize the CPP (3.2). Note that, unlike in Hansen (1997), $\beta_{10} - \beta_{20}$ is fixed and does not depend on the sample size.
In practice, $\pi(\cdot)$ can be estimated by the nonparametric kernel method. Then, using the plug-in method, an estimate $\hat\pi_n(\hat r_n)$ of $\pi(r_0)$ can be obtained. An estimate of $\sigma^2$ can be obtained from the residuals. However, estimating $\gamma$ is more complicated since it is a conditional expectation, namely $\gamma = E(\{x'(\beta_{10} - \beta_{20})\}^2 \mid z = r_0)$. Of course, if $x$ and $z$ are independent, then $\gamma$ is an unconditional expectation. In this case, it is easy to estimate $\gamma$ by $\hat\gamma_n = n^{-1} \sum_{i=1}^{n} \{x_i'(\hat\beta_{1n} - \hat\beta_{2n})\}^2$. If they are not independent, a good choice is to use the best linear predictor of $\{x'(\beta_{10} - \beta_{20})\}^2$ based on $z$, with $\hat\theta_n$ in lieu of $\theta_0$, to approximate $\gamma$; see (3.7) in Subsection 3.3. Once the estimates of $\gamma$, $\pi(r_0)$ and $\sigma^2$ are obtained, we can construct confidence intervals of $r_0$ by using the quantiles of $T_2$.
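The quantiles of $T_2$ can be tabulated numerically. The sketch below assumes that the density (2.2) has the form used by Yao (1987), namely $f(x) = \tfrac{3}{2} e^{|x|} \Phi(-\tfrac{3}{2}\sqrt{|x|}) - \tfrac{1}{2} \Phi(-\tfrac{1}{2}\sqrt{|x|})$; if the normalization in (2.2) differs, the density function must be adapted accordingly. The function names, the integration step and the search interval are illustrative choices.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def density_t2(x):
    """Assumed form of the density of T_2 (Yao, 1987); symmetric about zero."""
    a = abs(x)
    s = math.sqrt(a)
    return 1.5 * math.exp(a) * std_normal_cdf(-1.5 * s) - 0.5 * std_normal_cdf(-0.5 * s)


def cdf_t2(x, step=2e-3):
    """CDF by trapezoid integration of the density on [0, |x|], using symmetry."""
    a = abs(x)
    n = max(1, int(math.ceil(a / step)))
    h = a / n
    total = 0.5 * (density_t2(0.0) + density_t2(a))
    for i in range(1, n):
        total += density_t2(i * h)
    half = total * h                    # mass between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def quantile_t2(alpha, lo=-60.0, hi=60.0, tol=1e-4):
    """alpha-th quantile by bisection; the search interval is an assumption."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_t2(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.975, 0.995):
        print(level, round(quantile_t2(level), 2))
```

By symmetry, $Q_{1-\alpha} = -Q_\alpha$, so only the upper quantiles need to be computed. The search interval $[-60, 60]$ is adequate for the usual confidence levels because the tails decay exponentially.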
3.2. Threshold AR model. The TAR model is an important class of nonlinear time series models. The idea of threshold in the time series context was initially conceived around 1976, first appeared in Tong (1978) and was later formalized in Tong and Lim (1980). Fuller results can be found in the monograph of Tong (1990). For history and future outlook, see, e.g., Tong (2011, 2015). Chan (1993) is a significant contribution to the inference of TAR models. It is the first breakthrough in the asymptotic theory of the LSE of the threshold parameter in discontinuous two-regime TAR models. Other important contributions include Tsay (1989, 1998), Gonzalo and Pitarakis (2002), and others.  first established the asymptotic theory of the LSE in multiple-regime TAR models.
A time series $\{y_t\}$ is said to follow a two-regime TAR model of order $p$ if it satisfies random variables with zero mean and unit variance and $\varepsilon_t$ independent of $\{y_{t-j} : j \ge 1\}$. Suppose that $\{y_1, \ldots, y_n\}$ is a sample from the TAR model (3.3). Denote by $\hat r_n$ the LSE of $r_0$. Under Conditions 1-4 in Chan (1993) or Assumptions 3.1-3.4 in , and by Theorem 3.3 in , we have where the left and the right jump distributions in the two-sided CPP $P(\cdot)$ are induced by respectively. Both rates are the same, i.e., $\pi(r_0)$, which is the value of the density $\pi(\cdot)$ of $y_t$ at $r_0$. From the above expressions, we can set $\gamma = E(\{y_{t-1}(\beta_{10} - \beta_{20})\}^2 \mid y_{t-d} = r_0)$, which is a function of $\beta_{10} - \beta_{20}$. Note that when $\beta_{10} - \beta_{20}$ is small, the range of $M_-$ is large. In this case, we can use

In applications, in order to construct confidence intervals of $r_0$ by (3.4), we must estimate $\pi(r_0)$ and $\gamma$. Clearly, estimating $\pi(\cdot)$ is easy. For example, we can use the nonparametric kernel method and then the plug-in method to get an estimate $\hat\pi_n(\hat r_n)$ of $\pi(r_0)$. However, it is rather difficult to estimate $\gamma$ directly since it is a conditional expectation. An easy and good choice is to replace it by the best linear predictor. Of course, the re-sampling method in  is still helpful. Now, by using Algorithms B and C in , we can draw a new sample $\{y_t^*\}$ with $y_{i-d}^* = \hat r_n$ for each $y_i^*$ and then use the new sample to estimate $\gamma$ with $\hat\beta_{1n} - \hat\beta_{2n}$ in lieu of $\beta_{10} - \beta_{20}$.

In particular, we consider a simple discontinuous TAR(1) model: where the notation is the same as in model (3.3), except for $\mathrm{var}(\varepsilon_t) = \sigma^2$. In this case, the jumps are unconditional and simple: In this simple case, we can estimate $\gamma$ by $\hat\gamma_n = \{\hat r_n(\hat\beta_{1n} - \hat\beta_{2n})\}^2$, $\pi(r_0)$ by $\hat\pi_n(\hat r_n)$, a nonparametric kernel estimate, and $\sigma^2$ by $\hat\sigma_n^2 = n^{-1} \sum_{t=1}^{n} \hat\varepsilon_t^2$, where $\{\hat\varepsilon_t\}$ are the residuals based on the LSE. Thus, can be approximated by $T_2$ when $|r_0(\beta_{10} - \beta_{20})|$ is small.
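The three plug-in estimates for the simple TAR(1) case can be sketched as follows. The code assumes a TAR(1) model without intercepts in which the first regime is $y_{t-1} \le r$, a Gaussian kernel, and a normal-reference rule-of-thumb bandwidth; these choices, like the function names, are illustrative assumptions and are not prescribed by the text.

```python
import math


def gaussian_kde_at(sample, point, bandwidth=None):
    """Gaussian kernel density estimate of the sample, evaluated at one point."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed default, not from the paper).
        mean = sum(sample) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((point - v) / bandwidth) ** 2) for v in sample)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


def tar1_plugin_estimates(path, r_hat, beta1_hat, beta2_hat):
    """Return (gamma_hat, pi_hat, sigma2_hat) for a TAR(1) fit without intercepts.

    gamma_hat  = {r_hat * (beta1_hat - beta2_hat)}^2
    pi_hat     = kernel density estimate of y_t evaluated at r_hat
    sigma2_hat = mean of the squared residuals of the fitted two-regime model
    """
    gamma_hat = (r_hat * (beta1_hat - beta2_hat)) ** 2
    pi_hat = gaussian_kde_at(path, r_hat)
    residuals = [
        y - (beta1_hat if x <= r_hat else beta2_hat) * x
        for x, y in zip(path[:-1], path[1:])
    ]
    sigma2_hat = sum(e * e for e in residuals) / len(residuals)
    return gamma_hat, pi_hat, sigma2_hat
```

Here `path` is the observed series and `r_hat`, `beta1_hat`, `beta2_hat` are the least squares estimates obtained beforehand. The residual variance divides by the number of usable lagged pairs, which differs from $n$ by one.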
3.3. Threshold MA model. The TMA model is an important class of threshold time series models. It is a natural generalization of linear MA models. The linear MA model was first introduced by Slutsky (1927) and since then it has been widely used in many areas such as business and economics. It has played a prominent role in the development of time series analysis. However, nonlinear MA models have developed slowly and have been overshadowed by nonlinear AR models. The slow development was mostly due to difficulties in statistical inference for general nonlinear MA models; see Robinson (1977). To date, studies on nonlinear MA models have mainly focused on TMA models; see, e.g., Ling and Tong (2005), Ling, Tong and Li (2007), Li and Li (2008), Li, Ling and Tong (2012) and Li (2012). Recently, Li, Ling and Li (2013) studied the asymptotic theory of the LSE in TMA models and succeeded in obtaining the limiting distribution of the estimated threshold for the first time in the literature.
A time series $\{y_t\}$ is said to follow a TMA model of order 1 if it satisfies with mean zero and variance $\sigma_\varepsilon^2 \in (0, \infty)$, and $\varepsilon_t$ is independent of $\{y_j : j < t\}$. Let $\theta = (\phi, \psi, r)$ denote the parameter and $\theta_0$ its true value.
Let $\hat\theta_n$ be the LSE of $\theta_0$. Li, Ling and Li (2013) showed that under their Assumptions 2.1-2.3 Here, $N_1(\cdot)$ and $N_2(\cdot)$ are independent Poisson processes with the same rate $\pi(r_0)$, which is the value of the density $\pi(\cdot)$ of $y_t$ at $r_0$, and $\{\zeta_k^{(1)} : k \ge 1\}$ is a sequence of i.i.d. random variables with the same distribution as the one induced by Therefore, by Corollary 2, it follows that

In applications, $\pi(r_0)$ is readily estimated by the nonparametric kernel method, and $\sigma_\varepsilon^2$ by the residuals $\{\hat\varepsilon_t\}$ based on the LSE. The real difficulty lies in estimating or approximating $\gamma$. The key point is how to approximate $E(\varepsilon_t^2 \mid y_t = r_0)$. Here, we propose three ways to approximate this conditional expectation. The first is the re-sampling method of Li, Ling and Li (2013). As for the high-order TAR models in Subsection 3.2, we can draw a new sample satisfying the condition $y_{t-1} = \hat r_n$ and then calculate the conditional expectation. This procedure is complicated and computationally more demanding. The second is to use a nonparametric method to estimate it.
In particular, if $\varepsilon_t$ is symmetric, then $\kappa_3 = 0$ and in turn (3.8) reduces to

3.4. Threshold ARMA model. The TARMA model is a natural extension of TAR and TMA models. Like linear ARMA models, the TARMA model can provide a parsimonious form for high-order TAR or high-order TMA models. Recently, Chan and Goracci (2019) studied the ergodicity of first-order TARMA models. However, in the literature to date, there are few results on the statistical inference of TARMA models. Exceptions are  and Li, Li and Ling (2011), who considered the LSE and established its asymptotic theory.

3.5. T-CHARM.
To characterize the martingale difference structure implied in log-returns of assets in financial time series, Chan et al. (2014) proposed a simple yet versatile model, called the conditional heteroscedastic AR model with thresholds (T-CHARM), which is a special case of the models in Rabemananjara and Zakoïan (1993), Zakoïan (1994), , Li, Ling and Zakoïan (2015) and Li, Ling and Zhang (2016).
For multiple-regime T-CHARM, approximations of the limiting distributions of the thresholds can be obtained similarly.
3.6. Threshold DAR model. The TDAR model is a significant extension of conditional heteroscedastic models, including the threshold ARCH model of Rabemananjara and Zakoïan (1993) and Zakoïan (1994). On TDAR models, recent work can be found in Li, Ling and Zakoïan (2015) and Li, Ling and Zhang (2016).
A time series $\{y_t\}$ is said to follow a TDAR model of order (1, 1) if where the left and the right jumps in $P(z)$ are respectively For simplicity, we tentatively assume that $\varepsilon_t \sim N(0, 1)$. Denote By a simple calculation, we have Thus, by Corollary 2, $\gamma \pi(r)\, n(\hat r_n - r)$ can be approximated by $T_2$.
In applications, γ and π(r 0 ) can be estimated by their sample counterparts. For high-order cases, similar to high-order TAR models in Subsection 3.2, we can use the re-sampling method to estimate γ.

4. Simulation studies.
In this section, we use simulations to assess the performance of the approximation in Section 2. The TAR(1) and TMA(1) models and the T-CHARM are used as typical cases. The errors $\{\varepsilon_t\}$ are supposed to be i.i.d. $N(0, 1)$ for simplicity. For each model, the sample size is 500 and 2000 replications are used.
The TAR(1) model is defined as

$$y_t = \{0.5\, I(y_{t-1} \le 1.5) + 0.9\, I(y_{t-1} > 1.5)\}\, y_{t-1} + \varepsilon_t. \qquad (4.1)$$

Figure 2 shows the histogram and the empirical CDF of $NS_n$ in (3.5) as well as those of $T_2$ in (2.2), from which we can see that the approximation performs well, even when the threshold effect is not small with $\gamma = |0.9 - 0.5| = 0.4$.

Hansen (1997, 2000) was probably the first to adopt a BM approximation approach to handle statistical inference in TAR models. His approach is based on a different setting from ours: he has effectively replaced the TAR model by a sequence of TAR models indexed by the sample size $n$, with $n$-dependent regression slopes, which coalesce (with a speed apparently not easily determined) as $n$ goes to infinity. Let us call the difference between the regression slopes of the two regimes the threshold effect. On the other hand, for cases with fixed threshold effects,  proposed a re-sampling method to simulate $M_-$. This method works well when the range of $M_-$ is not very large, e.g., when the expectation of the jump is sufficiently large, implying a large threshold effect. However, the range becomes very large when the expectation of the jump is small, which is associated with a small threshold effect. In this case the re-sampling method is not so accurate. We now take up the challenge of obtaining a BM approximation for the case with fixed (i.e. not $n$-dependent) but small threshold effects.
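One replication of the TAR(1) experiment based on (4.1) can be sketched as below: simulate a path with slopes 0.5 and 0.9 and threshold 1.5, then compute the LSE of the threshold by a search over the order statistics of the lagged values. The burn-in length, the 10% trimming, and the convention of reporting the left end point of the minimizing interval are assumptions made for illustration, as are the function names.

```python
import random


def simulate_tar1(n, slope_low, slope_high, threshold, rng, burn_in=200):
    """Simulate a TAR(1) path without intercepts and with N(0, 1) errors."""
    y, path = 0.0, []
    for t in range(n + burn_in):
        slope = slope_low if y <= threshold else slope_high
        y = slope * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:                # discard the burn-in segment
            path.append(y)
    return path


def fit_threshold_lse(path, trim=0.1):
    """Least squares fit of (threshold, slope_low, slope_high).

    Candidate thresholds are the ordered lagged values; prefix sums give the
    residual sum of squares of each candidate split in constant time.
    """
    pairs = sorted(zip(path[:-1], path[1:]))     # (y_{t-1}, y_t), sorted by y_{t-1}
    m = len(pairs)
    sxy, sxx = [0.0], [0.0]
    for x, y in pairs:
        sxy.append(sxy[-1] + x * y)
        sxx.append(sxx[-1] + x * x)
    syy = sum(y * y for _, y in pairs)
    best = None
    for k in range(int(trim * m), int((1.0 - trim) * m)):
        low_xy, low_xx = sxy[k + 1], sxx[k + 1]          # regime y_{t-1} <= pairs[k][0]
        high_xy, high_xx = sxy[m] - low_xy, sxx[m] - low_xx
        ssr = syy - low_xy ** 2 / low_xx - high_xy ** 2 / high_xx
        if best is None or ssr < best[0]:
            best = (ssr, pairs[k][0], low_xy / low_xx, high_xy / high_xx)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    rng = random.Random(12345)
    path = simulate_tar1(500, 0.5, 0.9, 1.5, rng)
    print(fit_threshold_lse(path))
```

Repeating this step over many seeds and recording $n(\hat r_n - r_0)$, suitably normalized, gives the empirical distribution that is compared with that of $T_2$.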
To compare the performance of the likelihood ratio method in Hansen (1997) with ours, we compute the coverage probabilities of $r_0$ at the 90% and 95% levels, respectively. The estimator of $\pi(\cdot)$ is obtained by two methods: one based on a nonparametric kernel method and the other on the moving block bootstrap (MBB) method. For this and other bootstrap methods for dependent data, see Lahiri (2003). When the sample is small, the estimator $\hat\pi_n(\hat r_n)$ may have a larger bias, which affects the performance of the statistic $NS_n$. In this case, we recommend the MBB method. Table 2 reports the numerical results. Here, for each sample size, 1,000 replications are used. Within each replication, 10 replicates are used for the MBB. From the table, we can see that Hansen's method over-estimates the coverage probability and becomes quite conservative when the sample size $n$ is moderately large, like that in Hansen (1997, 2000). On the other hand, our method based on the MBB performs well across all sample sizes; the method based on nonparametric kernels shows stable performance across all $n$, with good coverage probability for $n = 500$, but is not as good as the MBB method for smaller $n$. Based on our experience, we recommend the nonparametric kernel method for large $n$, which saves computing costs, and the MBB for smaller $n$.

Figure 3 shows the histogram and the empirical CDF of $NS_n$ in (3.9) as well as those of $T_2$ in (2.2), from which we can see that the approximation performs well.
For the T-CHARM defined as

$$y_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = 1 \cdot I(y_{t-1} \le 0.5) + 2\, I(y_{t-1} > 0.5), \qquad (4.3)$$

Figure 4 shows that the performance of the approximation is good. Here, the ratio $\kappa := \sigma_2^2 / \sigma_1^2 = 2$. When $\kappa$ increases, the performance of the approximation may deteriorate. For example, consider a T-CHARM model defined as Here, the ratio $\kappa = 6$. Figure 5 shows the approximation for (4.4). Compared with Figure 4, the approximation is poorer, as expected. Of course, when $\kappa > 5$, simulating a CPP will generally result in a better approximation for $n(\hat r_n - r_0)$.
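The resampling step of the moving block bootstrap mentioned in the coverage comparison can be sketched as follows: overlapping blocks of a fixed length are drawn with replacement and concatenated until the original sample size is reached. How the bootstrap series are then combined into an estimate of $\pi(r_0)$ is not specified here, so the sketch covers the resampling step only; the function name and the block length in the usage lines are illustrative assumptions.

```python
import random


def moving_block_bootstrap(series, block_length, rng):
    """Return one moving-block-bootstrap resample with the same length as series."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must lie between 1 and len(series)")
    n_blocks = -(-n // block_length)            # ceiling of n / block_length
    last_start = n - block_length               # last admissible block start
    resample = []
    for _ in range(n_blocks):
        start = rng.randint(0, last_start)      # uniform over overlapping blocks
        resample.extend(series[start:start + block_length])
    return resample[:n]                          # trim the overshoot of the last block


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.gauss(0.0, 1.0) for _ in range(100)]
    replicates = [moving_block_bootstrap(data, 15, rng) for _ in range(10)]
    print(len(replicates), len(replicates[0]))
```

Drawing whole blocks rather than single observations preserves the short-range dependence of the series within each block, which is the reason for preferring this scheme to an i.i.d. bootstrap with time series data.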
Unfortunately, there are no theoretical results to guide us on the choice between the resampling method in  and our approximation method. However, our experience suggests the following procedure in practice. First, we use the resampling method in  to simulate M − . If the simulated numerical range of M − is large, e.g., bigger than 50, then we use our approximation method instead.
5. An empirical example. The unemployment rate is an important index in measuring economic activity. Hansen (1997) explored the presence of nonlinearities in the business cycle through the use of a TAR model for the U.S. unemployment rate among males age 20 and over. The sample is monthly from January 1959 through July 1996. There are 451 observations in total over the period, which are plotted in Figure 6. Let $\{y_t\}$ be the rate. Hansen (1997) suggests the following fitted model where $\Delta y_t = y_t - y_{t-1}$, $\hat\sigma_1^2 = 0.154^2$, $\hat\sigma_2^2 = 0.187^2$, and the estimates of the coefficients are summarized in Table 3. For more details, including the standard errors and 95% confidence intervals of the estimated coefficients, see Table 5 in Hansen (1997). From (3.4), using the estimated coefficients, we can obtain the density of $T_1$, which is displayed in Figure 7. The 2.5% and 97.5% quantiles of $T_1$ are −0.2477 and 0.3972, respectively. Now, using these quantiles, we can construct confidence intervals of the threshold parameter $r_0$ by our nonparametric kernel method with the MBB. This method gives the 95% confidence interval [0.255, 0.332]. Here, the length of the moving block is 15 and the number of replicates is 50. The corresponding result using Hansen's method is [0.213, 0.340], where the likelihood ratio is adjusted for residual heteroscedasticity by using a kernel estimator for the nuisance parameters. We note that Hansen's method gives a much wider confidence interval.
6. Conclusion and discussion. In this paper, we have developed an alternative approach to approximate two-sided CPPs by two-sided BMs. Significantly, we address the issue with small but fixed threshold effects. The new approach provides a simple yet efficacious tool to derive distributions of some functionals of the sample paths of CPPs, thus rendering statistical inference of the key threshold parameter in a threshold model, such as the construction of its confidence intervals, a practical proposition. Further, our approach continues to apply to threshold regression/autoregressive models with multiple regimes since the estimated threshold parameters are asymptotically independent; see, e.g., Chan et al. (2014) and Li, Ling and Zakoïan (2015). Thus, we can use our approach to construct confidence intervals for the thresholds one by one. Our theory can also be applied to other application-oriented problems. For example, Hansen (1997, 2000) proposed a likelihood ratio-based statistic $LR_n(r_0)$ to test the null hypothesis $H_0 : r = r_0$ in threshold (auto)regression under his framework. However, under Tong's framework, i.e., when the threshold effect is fixed, the limiting distribution of the related likelihood ratio-based statistic $LR_n(r_0)$ is a functional of a two-sided compound Poisson process, which is hard to use for the same purpose. Our new theory can provide a usable approximation to the distribution of $LR_n(r_0)$, so that statistical inference for the threshold can be realised.
By Theorem 16 in Pollard (1984, p. 134), we claim that Similarly, we can show the weak convergence of $P_{1,\gamma}(z)$. The proof of the first claim is complete. We now prove the second claim. Note that $P_\gamma(0) = 0$; by the definition of $m_\gamma$, for any $A > 0$, it follows that k ≤ 0, $m_\gamma < -A$ := I + II.