TAR Modeling with Missing Data when the White Noise Process Follows a Student's t-Distribution

This paper considers the modeling of the threshold autoregressive (TAR) process driven by a noise process that follows a Student's t-distribution. The analysis is carried out in the presence of missing data in both the threshold process {Z_t} and the process of interest {X_t}. We develop a three-stage procedure based on the Gibbs sampler to identify and estimate the model. Procedures for the estimation of the missing data and for forecasting are also provided. The proposed methodology is illustrated with simulated and real-life data.


Introduction
TAR models, proposed by Tong (1978), assume that the values of a process {Z_t} (the threshold process) determine not only the values of the process of interest {X_t} but also its dynamics. When the threshold process is the process of interest itself, lagged, the model is known as SETAR (self-exciting TAR; Briñez & Nieto, 2005). Nieto (2005) developed a Bayesian methodology for the identification and estimation of TAR models allowing missing data in both the threshold process and the process of interest. On the other hand, Nieto (2008, 2011) characterized univariate TAR models in terms of their mean, conditional mean, variance and conditional variance, and found the expressions for the best predictor. Vargas (2012) improved prediction with TAR models by taking into account the variability in the parameters, and Nieto & Moreno (2013) explored three kinds of conditional variance in order to compare this type of nonlinear model with GARCH models. TAR models can also be easily extended to threshold autoregressive moving average (TARMA) models, whose Bayesian modeling with two regimes has been investigated by Sáfadi & Morettin (2000) and Xia, Liu, Pan & Liang (2012). Another extension of the TAR model allows two threshold variables instead of one; this is done by Chen, Chong & Bai (2012) for the particular case of two regimes.
Despite their usefulness, TAR models are not easily identified due to the large number of parameters and the nesting structure between them. For SETAR models, Tsay (1989) provided a simple and widely applicable model-building procedure. However, generally speaking, most of the structural parameters, such as the thresholds and the number of regimes, are assumed to be known; otherwise they can be identified using Tong's (1990) NAIC criterion together with some graphical techniques. Assuming that the noise process is Gaussian, Nieto (2005) developed a Bayesian procedure to identify the number of regimes and estimate the other parameters, once the thresholds are identified using the NAIC criterion for each possible number of regimes. This work is accomplished in the presence of missing data in both the process of interest and the threshold process.
In many cases, the data cannot be appropriately described by the Gaussian distribution; for example, it is well known that financial time series often have heavy tails, and the t-distribution may be more appropriate for the noise process than the Gaussian distribution. Zhang (2012) estimated the parameters of the TAR model with a t-distributed error process when the model is completely identified. In practice, however, the identification step may be difficult to carry out, because it requires additional knowledge about the phenomenon, which is particularly unrealistic given the complex model structure. The focus of this work is to propose a Bayesian methodology that includes model identification in TAR modeling. Specifically, a three-stage methodology based on the Gibbs sampler is proposed: in the first stage, the number of regimes and the thresholds are estimated; in the second, the autoregressive orders in the regimes are estimated; and in the last stage, the autoregressive coefficients, the variance weights and the other parameters of the noise process are estimated. In this way, the identification of the model takes place in the first two stages, and its estimation in the last stage. Additionally, a Bayesian methodology for the estimation of missing data and for forecasting is developed. The methodologies are illustrated with simulated and real-life data from the finance field.
The work is organized as follows: in Section 2, we introduce the TAR model with t-distributed noise process; in Section 3, we present the estimation procedure for the non-structural parameters when the structural parameters are known; in Section 4, we present the identification of the structural parameters; in Sections 5 and 6, we present the forecasting and missing-data estimation procedures. Finally, in Section 7, we illustrate the developed methodology with simulated and real-life data.

TAR Model with t-Distributed Noise
In order to introduce the TAR model, first suppose that the set of real numbers is partitioned into l disjoint intervals, R = ∪_{j=1}^{l} R_j with R_j = (r_{j−1}, r_j], where r_0 = −∞ and r_l = ∞. The sets R_1, …, R_l are denominated the regimes and the values r_1, …, r_{l−1} are denominated the thresholds. Let {Z_t} be a stochastic process whose behaviour is described by a Markov chain of order p, called the threshold process, and let {X_t} be the process of interest.
The dynamics of the process {X_t} are determined by the process {Z_t} in the sense that, when Z_t ∈ R_j = (r_{j−1}, r_j] for some j = 1, …, l, the model for {X_t} is

X_t = a_0^{(j)} + Σ_{i=1}^{k_j} a_i^{(j)} X_{t−i} + h^{(j)} e_t.   (1)

The values k_1, …, k_l are nonnegative integers representing the autoregressive orders in the l regimes; that is, different autoregressive orders are allowed in different regimes, and h^{(j)} > 0 for j = 1, …, l. Nieto & Moreno (2013) found that the parameters (h^{(j)})^2 correspond to the variance of X_t conditional on the regime and the past values of X, the so-called type II conditional variance. With respect to the noise process {e_t}, Nieto (2005) uses a Gaussian distribution; in this paper a Student's t-distribution is used. However, in order to maintain the interpretation of (h^{(j)})^2 mentioned before, we use a t-distribution with n degrees of freedom divided by its standard deviation √(n/(n−2)), with n > 2, mutually independent of the process {Z_t}; in this way, Var(e_t) = 1 for all t and hence, following Nieto & Moreno (2013), (h^{(j)})^2 = Var(X_t | R_j, x_{t−1}, …, x_1). Additionally, we assume that {Z_t} is exogenous in the sense that there is no feedback from {X_t} towards it. Nieto & Moreno (2013) found conditions on the coefficients a_i^{(j)} that achieve stationarity when the distribution of the noise process is Gaussian; however, as pointed out by Nieto (2005), stationarity is not required for the correct implementation of the proposed Bayesian methodology. The same situation holds in this paper, and the corresponding conditions for stationarity deserve future investigation.
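For concreteness, a trajectory of the TAR model (1) with standardized t noise can be simulated as follows. This is a minimal sketch in Python (a language not used in the paper); the AR(1) Gaussian specification for the exogenous threshold process mirrors the simulated examples of Section 7, and all function and parameter names are ours.

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(42)

def simulate_tar(T, thresholds, coefs, h, n, phi_z=0.5):
    """Simulate (X_t, Z_t) from the TAR model (1) with standardized t noise.

    coefs[j] = (a0, a1, ..., a_kj) for regime j; `thresholds` holds the
    interior cut points r_1 < ... < r_{l-1}; `phi_z` is the AR(1)
    coefficient of the exogenous threshold process (an assumption here).
    """
    k = max(len(c) - 1 for c in coefs)
    z = np.zeros(T)
    x = np.zeros(T)
    # standardized t noise: a t_n draw divided by its s.d. sqrt(n / (n - 2))
    e = student_t.rvs(df=n, size=T, random_state=rng) / np.sqrt(n / (n - 2))
    for t_ in range(1, T):
        z[t_] = phi_z * z[t_ - 1] + rng.standard_normal()
    for t_ in range(k, T):
        # side='left' sends a Z_t equal to a threshold to the lower regime,
        # matching R_j = (r_{j-1}, r_j]
        j = np.searchsorted(thresholds, z[t_])
        a = coefs[j]
        x[t_] = a[0] + sum(a[i] * x[t_ - i] for i in range(1, len(a))) \
                + h[j] * e[t_]
    return x, z

x, z = simulate_tar(200, thresholds=[0.0],
                    coefs=[(1.0, 0.5), (-0.5, 0.4, -0.3)],
                    h=[1.0, 1.5], n=5)
```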
The parameters of the model can be divided into two groups:
• Structural parameters: the number of regimes l, the l − 1 thresholds r_1, …, r_{l−1} and the autoregressive orders k_1, …, k_l of the l regimes.
• Non-structural parameters: the autoregressive coefficients a_i^{(j)}, the variance weights h^{(j)} and the degrees of freedom n of the noise process.

Conditional Likelihood Function of the Model
According to Nieto (2005), and conditional on the values of the structural parameters, the initial values x_k = (x_1, …, x_k)′, where k = max{k_1, …, k_l}, and the observed data of the threshold process z = (z_1, …, z_T)′, the conditional likelihood function is

L(θ_x | x, z, θ_z) = ∏_{t=k+1}^{T} p(x_t | x_{t−1}, …, x_1, z),   (2)

where θ_z denotes the vector of parameters of the threshold process {Z_t} and θ_x the vector of all the non-structural parameters. For t = k + 1, …, T, the variable x_t | x_{t−1}, …, x_1, z is distributed as a t_n variable multiplied by h^{(j_t)}√((n−2)/n) and shifted by a_0^{(j_t)} + Σ_{i=1}^{k_{j_t}} a_i^{(j_t)} x_{t−i}, where j_t = j if Z_t ∈ R_j = (r_{j−1}, r_j] for some j = 1, …, l. That is, the distribution of x_t, conditional on the past values of x and on z, is the non-standardized Student's t-distribution with n degrees of freedom, location parameter a_0^{(j_t)} + Σ_{i=1}^{k_{j_t}} a_i^{(j_t)} x_{t−i} and scale parameter h^{(j_t)}√((n−2)/n). The conditional likelihood function is then the product of these non-standardized t densities.
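The conditional likelihood above can be sketched in code: each term is a non-standardized t density whose location is the regime's autoregression and whose scale is h^{(j_t)}√((n−2)/n). This is an illustrative Python sketch (names ours), not part of the paper's implementation.

```python
import numpy as np
from scipy.stats import t as student_t

def tar_conditional_loglik(x, z, thresholds, coefs, h, n):
    """Conditional log-likelihood of the TAR model with standardized t noise.

    Each x_t given the past is non-standardized Student's t with n d.f.,
    location a0^{(jt)} + sum_i a_i^{(jt)} x_{t-i} and scale
    h^{(jt)} * sqrt((n - 2) / n), so that Var(e_t) = 1."""
    k = max(len(c) - 1 for c in coefs)
    ll = 0.0
    for t_ in range(k, len(x)):
        j = np.searchsorted(thresholds, z[t_])   # regime index j_t
        a = coefs[j]
        loc = a[0] + sum(a[i] * x[t_ - i] for i in range(1, len(a)))
        scale = h[j] * np.sqrt((n - 2) / n)
        ll += student_t.logpdf(x[t_], df=n, loc=loc, scale=scale)
    return ll

# toy data, two regimes with orders k1 = 1, k2 = 1 (hypothetical values)
ll = tar_conditional_loglik(
    x=[0.0, 0.5, 1.0, 0.2], z=[0.1, -0.3, 0.4, 0.0],
    thresholds=[0.0], coefs=[(1.0, 0.5), (-0.5, 0.4)], h=[1.0, 1.5], n=5)
```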

Estimation of Non Structural Parameters
In this part of the research, the structural parameters are assumed to be known, and we focus on finding the posterior conditional distributions of the autoregressive coefficients θ_j, the variance weights h^{(j)}, j = 1, …, l, and the degrees of freedom n of the noise process. Additionally, we assume prior independence between the parameter blocks θ, h and n, as well as prior independence among the non-structural parameters of the l regimes.
The prior distribution for the vector θ_j is a multivariate normal distribution with mean vector θ_{0,j} and covariance matrix V_{0,j}^{−1}, denoted θ_j ∼ N(θ_{0,j}, V_{0,j}^{−1}), and the posterior conditional distribution of θ_j is given by the following result.

Proposition 1. For each j = 1, …, l, the conditional distribution of θ_j, given the structural parameters, θ_i with i ≠ j, h and n, is given by (3).

Note that the posterior conditional distribution of θ_j is affected only by h^{(j)}, and not by the components of h in regimes other than j, so we have a kind of posterior independence between regimes. With respect to the variance weights h^{(j)}, we follow the standard Bayesian methodology, assigning an inverse gamma distribution with shape parameter α and scale parameter β, IG(α, β), as the prior distribution of (h^{(j)})^2. Combining this prior with the conditional likelihood function, we obtain the following posterior conditional distribution of (h^{(j)})^2.

Proposition 2. For each j = 1, …, l, the posterior distribution of (h^{(j)})^2, given the structural parameters, θ_j with j = 1, …, l, h^{(i)} with i ≠ j, and n, is given by (4).

Note that the posterior conditional distribution of (h^{(j)})^2 is affected only by θ_j, and not by θ_i with i ≠ j, so again we have posterior independence between (θ_j, h^{(j)}) across regimes.
Finally, we derive the posterior conditional distribution of the degrees of freedom n of the noise process. Following the suggestion of Watanabe (2001), the prior distribution of n is a gamma distribution, since for the Gamma(α′, β′) distribution the expectation and the variance are α′/β′ and α′/β′^2, respectively. The values α′ and β′ can be chosen according to prior knowledge about n; when there is no such prior information, we can choose a fairly large prior variance to represent the high degree of uncertainty about n. Using this prior distribution, we find the posterior conditional distribution of n.
Proposition 3. The posterior conditional distribution of the degrees of freedom of the noise process {e_t} is given by (5).

In conclusion, the estimation of the non-structural parameters can be carried out by means of a Gibbs sampler, using the conditional densities (3), (4) and (5). We use the grid method to simulate values from these distributions. In order to define the parameter space, we fit an autoregressive AR(p) model, with p = max{k_1, …, k_l}, and use the estimate plus or minus two standard errors as the parameter space. For example, suppose that l = 2, k_1 = 1 and k_2 = 2, and that the estimated coefficients of the AR(2) model are 0.8 and −0.4 with standard errors 0.1 and 0.2; then the parameter space for a_1^{(1)} and a_1^{(2)} would be (0.6, 1) and the parameter space for a_2^{(2)} would be (−0.8, 0). For the coefficients h^{(j)}, we also take into account the estimate of the error variance in the AR(2) fit. Finally, for the degrees of freedom n, we choose the parameter space (2, 30), since the t-distribution has no finite variance when n ≤ 2 and would be too similar to a normal distribution for n > 30.
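The grid method described above can be sketched as follows: evaluate the (unnormalized) conditional density on a grid over the parameter space and sample a grid point with the normalized weights. The sketch is in Python with names of our choosing; the example parameter space (0.6, 1) comes from the AR-fit rule above.

```python
import numpy as np

rng = np.random.default_rng(0)

def grid_sample(log_density, lo, hi, m=200):
    """Draw one value from a univariate density known up to a constant,
    by evaluating it on a grid over (lo, hi) and sampling the grid
    points with the normalized weights (the 'grid method')."""
    grid = np.linspace(lo, hi, m)
    logw = np.array([log_density(g) for g in grid])
    w = np.exp(logw - logw.max())   # subtract the max for numerical stability
    w /= w.sum()
    return rng.choice(grid, p=w)

# e.g. sampling a coefficient whose AR fit gave 0.8 with standard error 0.1,
# using a toy Gaussian log-density for illustration:
draw = grid_sample(lambda a: -0.5 * ((a - 0.8) / 0.1) ** 2, 0.6, 1.0)
```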

Estimation of Structural Parameters
In this section, we develop the results concerning the estimation of the structural parameters, i.e., the identification of a TAR model. First, we assume that the number of regimes l and the l − 1 thresholds are known, and we estimate the autoregressive orders of the regimes; then we consider the case where only the thresholds are known; and finally we treat the general case, where all the structural parameters are unknown.

Estimation of the Autoregressive Orders k_1, …, k_l
As we assume that the number of regimes and the thresholds are known, the remaining parameters to be estimated are the autoregressive orders and the non-structural parameters.
We assume that the autoregressive orders k_1, …, k_l are realizations of discrete random variables K_1, …, K_l, each taking values in the set {0, 1, …, k_max}. It is important to note that when the values of some autoregressive orders change, the specification of the TAR model changes and the dimension of the vector of autoregressive coefficients Θ changes as well. Carlin & Chib (1995) developed a Bayesian methodology for model selection, and Nieto (2005) adapted it to identify the TAR model with Gaussian noise. In this research, we adapt the same methodology to identify the TAR model with t-distributed noise. Suppose that M is a discrete random variable indexing the model, taking values 1, …, (k_max + 1)^l. For each possible model M = m, we define the vector of parameters Θ_m = (θ′_1, …, θ′_l, h′)′, with m = 1, …, (k_max + 1)^l. The degrees of freedom n can be considered a nuisance parameter, since its dimension and interpretation are the same for all models. Carlin & Chib (1995) found the conditional densities (6) and (7), where Θ = {Θ_1, …, Θ_{(k_max+1)^l}} and y = (x, z) is the vector of full data. The densities p(Θ_m | M = m) are denominated the link functions, and can be taken as the prior distributions of Θ_m.
In the context of the identification of the autoregressive orders, the model indicator M is determined jointly by the values of the variables K_1, …, K_l. In this way, computing the density (6) is equivalent to computing the densities p(k_j | Θ, k_i (i ≠ j), y) for j = 1, …, l, because once the conditional distribution of each K_j is known, we can sample values of k_j using a Gibbs sampler and thus sample values of the model indicator M. In order to compute these densities, Nieto (2005) found the expression (8), where k = (k_1, …, k_l)′ and k′ is obtained by replacing the component k_j of the vector k by k′_j, for each j = 1, …, l.
In summary, using the densities (3), (4), (5) and (8), a Gibbs sampler can be implemented to estimate the probabilities of all the possible values of each K_j, j = 1, …, l. Denoting these estimated probabilities by p̂_{0j}, p̂_{1j}, …, p̂_{k_max j}, we choose the value of K_j with the highest estimated probability.
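Each Gibbs step for an order K_j reduces to sampling from a discrete distribution over {0, 1, …, k_max} once the unnormalized log-posterior of each candidate order has been evaluated. The following sketch (Python, names and the log-posterior values ours and purely hypothetical) shows that step.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_order(log_post_by_order):
    """One Gibbs step for K_j: draw an autoregressive order from its
    conditional posterior, given unnormalized log-posterior values for
    k_j = 0, 1, ..., k_max."""
    logp = np.asarray(log_post_by_order, dtype=float)
    p = np.exp(logp - logp.max())   # stabilize, then normalize
    p /= p.sum()
    return rng.choice(len(p), p=p), p

# hypothetical log-posterior values for k_j = 0, 1, 2, 3
k_draw, probs = sample_order([-10.2, -3.1, -2.9, -8.0])
```

Repeating this draw for each regime inside the Gibbs sampler, and tabulating the sampled values, yields the estimated probabilities p̂_{0j}, …, p̂_{k_max j}.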

Estimation of the Number of Regimes l
In order to estimate the number of regimes, we again use the approach developed in Nieto (2005), adapting the methodology of Carlin & Chib (1995). Suppose that the number of regimes l is the realization of a discrete random variable L taking values in the set {2, …, l_max}, with prior distribution denoted by p(l).
Clearly, when the value of l changes, the model specification also changes, so we have l_max − 1 possible models. Suppose that M is the discrete random variable indexing the model; then M takes values 2, …, l_max and, for each possible model M = j, Θ_j denotes the vector of the parameters in this model, where k_{ij} denotes the autoregressive order of the ith regime in the model M = j and h_j = (h^{(1)}, …, h^{(j)})′. Finally, we define Θ = (Θ′_2, …, Θ′_{l_max})′, the vector containing the parameters of all the possible models. Nieto (2005) found the conditional densities (9), (10) and (11) for l = 2, …, l_max, where Θ_{−k_{ij}} denotes the vector Θ without the element k_{ij}, and Θ_{−θ_j,h^{(j)}} denotes the vector Θ without the components θ_j and h^{(j)}.
Jointly using the conditional densities (9), (10), (11) and (5), we can implement a Gibbs sampler and obtain the posterior probabilities of all the possible values of L. The estimate of the number of regimes l can be taken as the value with the largest posterior probability, or as the mode of the values of L across the iterations of the Gibbs sampler.

Estimation of the Number of Regimes l and the Thresholds
Finally, we assume that the l − 1 thresholds are also unknown and need to be estimated jointly with the number of regimes l. Following the approach of Carlin & Chib (1995), the model is indexed by a discrete variable M, which takes values 2, …, l_max according to the value of the variable L. For each possible model M = j, the thresholds are denoted r_j = (r_1, …, r_{j−1})′, j = 2, …, l_max.
It is straightforward to obtain the posterior conditional density of r_j given the values of the other structural and non-structural parameters, p(r_j | l, Θ_{−r_j}, y), shown in (12), where j_t = j if Z_t ∈ R_j = (r_{j−1}, r_j] for some j = 1, …, l. The posterior conditional density of l is given by (9). Note that the expression in (12) depends on the thresholds r_j, since j_t = j when Z_t ∈ R_j = (r_{j−1}, r_j], so both R_j and j_t depend on the thresholds. In this way, using the posterior conditional densities of r_j, l and Θ_j, we can implement a Gibbs sampler and obtain estimates of the number of regimes and the thresholds. In order to draw samples from this density, we use the grid method, where the parameter space consists of all possible ordered values of Z_t.
With respect to the prior density of r_j, recall that the values of the thresholds are based on the values of the process {Z_t}, so we can assume that the thresholds take values in an interval (a, b), appropriately specified; furthermore, we assume a uniform distribution on (a, b) for the thresholds r_1, …, r_{j−1}, for j = 2, …, l_max.
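A Gibbs step for one threshold under the grid method above can be sketched as follows: the candidate set is the ordered observed values of {Z_t}, weighted by the conditional posterior. The restriction to inner quantiles (so that every regime keeps some observations) is our own addition, not stated in the paper; the flat log-posterior in the usage line is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_threshold(z, log_post, lo_q=0.1, hi_q=0.9):
    """One Gibbs step for a threshold r: candidates are the ordered
    observed values of {Z_t} (restricted to inner quantiles, an
    assumption of this sketch), weighted by the conditional posterior."""
    cand = np.sort(np.asarray(z, dtype=float))
    lo, hi = np.quantile(cand, [lo_q, hi_q])
    cand = cand[(cand >= lo) & (cand <= hi)]
    logw = np.array([log_post(r) for r in cand])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return rng.choice(cand, p=w)

z = rng.normal(size=200)
r = sample_threshold(z, log_post=lambda r_: 0.0)   # flat posterior, illustration only
```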

Proposed Algorithm
In conclusion, a three-stage process is proposed for the identification and estimation of TAR models with t-distributed noise and no missing data. The algorithm consists of the following steps:
1. The number of regimes and the thresholds are estimated using a Gibbs sampler based on the densities (9), (10), (11), (12) and (5).
2. The number of regimes and the thresholds are fixed, and the autoregressive orders are estimated using a Gibbs sampler based on the densities (7), (8) and (5).
3. The structural parameters are fixed, and the non-structural parameters are estimated using a Gibbs sampler based on the densities (3), (4) and (5).

Forecasting
In order to develop the predictive inference, we focus on finding the posterior predictive distribution of the variable X_{T+h}, conditional on the observed x_T = (x_1, …, x_T)′ and z_T = (z_1, …, z_T)′, with h > 0. Vargas (2012) developed the formal Bayesian approach to find the predictive density of X_{T+h}, accounting for the variability in the parameters of the model; this predictive density is given by (13), where p_j(h) = P(Z_{T+h} ∈ R_j | x_T, z_T) for h = 1, 2, … and j = 1, …, l, and where θ^{(j)} denotes the vector of the non-structural parameters of regime j. On the other hand, in order to forecast the threshold variable Z_{T+h}, Nieto (2008) found the expression (15). Based on equations (13), (14) and (15), we can compute forecasts for both processes {X_t} and {Z_t}. In order to draw values from p(z_{T+h} | z_T), h = 1, 2, …, Congdon (2001) suggests drawing a value of z_{T+1} from p(z_{T+1} | z_T), then a value of z_{T+2} from p(z_{T+2} | z_{T+1}, z_T), and so on.
On the other hand, in order to forecast the process {X_t}, notice that p(x_{T+h} | x_T, z_T) is a mixture density, so we draw a value from p(x_{T+h} | R_j, x_T, z_T) with probability p_j(h). Moreover, each term p(x_{T+m} | θ^{(j)}, R_j, x_{T+m−1}, …) for m = 1, …, h corresponds to the density of a non-standardized Student's t-distribution with n degrees of freedom, location parameter a_0^{(j)} + Σ_{i=1}^{k_j} a_i^{(j)} x_{T+m−i} and scale parameter h^{(j)}√((n−2)/n). This concludes the forecasting procedure.
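The recursive draw-Z-then-draw-X scheme described above can be sketched by Monte Carlo simulation. As a simplification of this sketch (names ours), the threshold process is taken to be a known Gaussian AR(1); in the paper, draws of Z_{T+m} would instead come from the Markov-chain predictive densities in (15).

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(3)

def forecast_paths(x, z, thresholds, coefs, h, n, horizon,
                   phi_z=0.5, n_paths=200):
    """Monte Carlo draws from the predictive distribution of X_{T+1..T+h}:
    propagate Z recursively, then draw each X_{T+m} from the regime's
    non-standardized t density (phi_z is an assumption of this sketch)."""
    draws = np.empty((n_paths, horizon))
    for p in range(n_paths):
        zs, xs = list(z), list(x)
        for m in range(horizon):
            zs.append(phi_z * zs[-1] + rng.standard_normal())
            j = np.searchsorted(thresholds, zs[-1])
            a = coefs[j]
            loc = a[0] + sum(a[i] * xs[-i] for i in range(1, len(a)))
            xs.append(student_t.rvs(df=n, loc=loc,
                                    scale=h[j] * np.sqrt((n - 2) / n),
                                    random_state=rng))
            draws[p, m] = xs[-1]
    return draws

draws = forecast_paths([0.0, 0.2, -0.1], [0.1, -0.2, 0.3], thresholds=[0.0],
                       coefs=[(1.0, 0.5), (-0.5, 0.4)], h=[1.0, 1.5],
                       n=5, horizon=4)
```

Point forecasts and credible intervals follow from the columns of `draws` (e.g. their means and empirical quantiles).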

Estimation of Missing Data
We assume that there are missing observations in both processes {X_t} and {Z_t}, and that the observed data of {X_t} are located at time points t_1, …, t_N, with 1 ≤ t_1 < ⋯ < t_N ≤ T. The estimation of these missing data can be carried out using the approach of Nieto (2005), as shown below.
The TAR model without missing data can be put in state space form by taking the state vector α_t = (X_t, X_{t−1}, …, X_{t−k+1})′, with k = max{k_1, …, k_l}:

X_t = H α_t,   (16)

α_t = C_t + A_t α_{t−1} + ω_t,   (17)

where H = (1, 0, …, 0), ω_t = (h^{(j)} e_t, 0, …, 0)′ and, when Z_t ∈ R_j, C_t = (a_0^{(j)}, 0, …, 0)′ and A_t is the companion matrix with first row (a_1^{(j)}, …, a_k^{(j)}), where a_i^{(j)} = 0 for i > k_j, and with the identity matrix I_{k−1} of order k − 1 in its lower-left block. Equation (16) is the observation equation and equation (17) is the state equation. As pointed out by Nieto (2005), this state space form corresponds to a state space model with regime switching and can be analysed efficiently using MCMC simulation procedures.
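The regime-dependent transition matrix of the state equation is a companion matrix; its construction can be sketched as follows (Python, names ours), padding each regime's coefficients with zeros up to k = max{k_1, …, k_l}.

```python
import numpy as np

def companion(a_coefs, k):
    """Companion matrix for one regime of the state equation: first row
    holds (a_1, ..., a_kj) padded with zeros up to k; the identity block
    below shifts (X_{t-1}, ..., X_{t-k+1}) down the state vector."""
    A = np.zeros((k, k))
    A[0, :len(a_coefs)] = a_coefs
    if k > 1:
        A[1:, :-1] = np.eye(k - 1)
    return A

# regime with k_j = 2 and coefficients (0.5, -0.3), inside a model with k = 3
A = companion([0.5, -0.3], k=3)
```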
When there are missing data, the observation equation (16) can be modified to include them; the new observation equation is X_t = H_t α_t + δ_t W, where H_t = H and δ_t = 0 if t ∈ {t_1, …, t_N}, and H_t = 0 and δ_t = 1 otherwise; W is a discrete random variable with Pr(W = w_0) = 1 for some point w_0 in the support of X_t. The state equation remains the same.
Since the optimal estimates of the missing data, under the minimum mean squared error criterion, are the conditional expectations of the missing data given the observed data, we need to sample from the density p(x_m, z_m | x_o, z_o), where x_m and z_m denote the missing data and x_o and z_o the observed data. Nieto (2005) states that this can be achieved by sampling from p(α, z | x), where x and z consist of the full data x_1, …, x_T and z_1, …, z_T, with the missing data replaced by artificial values, for example the medians of {x_t} and {z_t}, and α = (α′_1, …, α′_T)′. Nieto (2005) proposes the use of a Gibbs sampler to draw samples from p(z | α, x) and p(α | z, x). The density p(z | α, x) is found in Nieto (2005), with z_t = (z_{t−p+1}, …, z_t)′, and α_t and x_t defined similarly. Finally, in order to sample values from the joint distribution p(α | z, x), note that sampling each term p(α_t | α_{t−1}, …, α_1, z, x) is equivalent to sampling from α_t | α_{t−1}, Z_t, since α_t = (X_t, X_{t−1}, …, X_{t−k+1})′; and this in turn is equivalent to sampling from the density p(X_t | X_{t−1}, …, X_{t−k}, Z_t), that is, from a non-standardized Student's t-distribution with n degrees of freedom, location parameter a_0^{(j_t)} + Σ_{i=1}^{k_{j_t}} a_i^{(j_t)} X_{t−i} and scale parameter h^{(j_t)}√((n−2)/n). In summary, the estimation of missing data in TAR models can be carried out as follows:
1. Completion of the time series, replacing the missing data of {x_t} and {z_t} with their respective medians.
2. Identification and estimation of the TAR model using the completed time series following the algorithm presented in subsection 2.4.
3. Estimation of the missing data by means of a Gibbs sampler using the above methodology.
4. Re-estimation of the TAR model with the missing data replaced by their estimates.

Simulated Examples
In this section we present two simulation examples in order to illustrate the performance of the proposed methodology.

Example 1
We simulated a series {x_t} of 100 observations from the model (18), with e_t ∼ t_5/√(5/3), Z_t = 0.5Z_{t−1} + ε_t and {ε_t} a Gaussian white noise process with mean 0 and variance 1 (GWN(0, 1)). The simulated series are shown in Figure 1. In the first stage, we identified the number of regimes and the thresholds. Following Nieto, Zhang & Li (2013), the prior distribution for l is the Poisson distribution with parameter 3 truncated to the set {2, 3, 4}, and the prior distribution of the thresholds is as described above. We ran a Gibbs sampler of 1,000 iterations with a burn-in period of 200. In order to check that, for each parameter, the draws have converged to the posterior distribution, we use Geweke's Z-score plot (Geweke 1992) from the package coda in R (Plummer, Best, Cowles & Vines 2006). Geweke's Z-score computes the difference between the means of the first and last parts of the draws of a Markov chain, and the plot shows the values of the Z-score as successively larger numbers of iterations (at most half of the chain) are removed from the beginning of the chain. For the number of regimes and the autoregressive orders, we monitor the corresponding posterior probabilities. Since there are many parameters in the model, we cannot show all the plots, only a few for which we consider the burn-in period of 200 iterations appropriate (Figure 2). The posterior probabilities of the number of regimes are given in Table 1, where we can see that the number of regimes with the largest posterior probability is 2. The estimate of the threshold is 0.0846. The 95% credible interval for the threshold is (−0.2892, 0.6737), which contains the real threshold 0.
In the second stage, we estimated the autoregressive order in each of the two regimes, with the number of regimes fixed at 2 and the threshold at 0.0846. The prior distribution for k_j is the Poisson distribution with parameter 2 truncated to the set {0, 1, 2, 3}, for each j = 1, 2. We ran a Gibbs sampler of 1,000 iterations and obtained the posterior probabilities for K_1 and K_2 displayed in Table 2. We can see that the identified autoregressive orders are k̂_1 = 2 and k̂_2 = 1, corresponding to the real autoregressive orders. Finally, we estimated the non-structural parameters: the autoregressive coefficients, the variance weights and the degrees of freedom of the error process. The prior distributions for these parameters are: N(0, 10) for the autoregressive coefficients a_i^{(j)}, i = 1, …, k_j, j = 1, 2; IG(2, 3) for the variance weights (h^{(1)})^2 and (h^{(2)})^2; and Gamma(1, 0.1) for the degrees of freedom n. In this way, the prior mean of n is 10 and the prior variance is 100, which can be considered a non-informative prior distribution.
We ran another Gibbs sampler of 1,000 iterations; the estimates and 95% credible intervals of the autoregressive coefficients and the variance weights are given in Table 3. These estimates are close to the true parameters, and all the 95% credible intervals contain them. With respect to the degrees of freedom n, the results obtained from the Gibbs sampler are displayed in Figure 4, where we note that the values of n with large posterior probability are around the true parameter 5. The posterior mean of n is 7.25, and a 95% credible interval for n is (3.3, 21.56). In order to check the adequacy of the model, we use the CUSUM and CUSUMSQ plots of the standardized residuals, since, to our knowledge, there is no investigation of the distribution of the usual statistical tests in TAR models. These plots are shown in Figure 3, where we can see that the overall performance is good. In conclusion, the identification and estimation results were satisfactory, and we proceeded to illustrate the estimation of the missing data. We set the number of missing data in the processes {Z_t} and {X_t} to 4 and 6, respectively, and placed the missing data randomly. The resulting missing data for {Z_t} and {X_t} were situated at time points 8, 55, 63, 83 and 2, 13, 37, 41, 77, 80, respectively. The estimates and credible intervals for the missing data after 5,000 iterations are shown in Table 4. We can see that the overall performance of the procedure is satisfactory, except at time 63 for {Z_t}, where the observed value lies outside the 95% credible interval. Finally, we illustrate the forecasting procedure, where the sample period considered is 1-92 and the forecast horizon is set to 8. We assume that the non-structural parameters are known and, for each horizon, we simulate 100 series from the model (18). For each series, we estimate the structural parameters and compute the forecast value and the respective credible interval. In order to illustrate the results, we compute the percentage of these 100 credible intervals containing the true observation. These percentages are shown in Table 5.
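The CUSUM and CUSUMSQ statistics used above as an informal adequacy check can be computed as follows; this is our own Python sketch of the standard definitions (cumulative sums of the standardized residuals and of their squares), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

def cusum_cusumsq(resid):
    """CUSUM and CUSUMSQ of a residual series: the residuals are
    standardized, then CUSUM is their scaled cumulative sum and
    CUSUMSQ the cumulative share of the total sum of squares."""
    r = np.asarray(resid, dtype=float)
    r = (r - r.mean()) / r.std()
    cusum = np.cumsum(r) / np.sqrt(len(r))
    cusumsq = np.cumsum(r ** 2) / np.sum(r ** 2)
    return cusum, cusumsq

cusum, cusumsq = cusum_cusumsq(rng.standard_normal(50))
```

For an adequate model, the CUSUM path should stay near zero and the CUSUMSQ path near the diagonal from 0 to 1.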
Table 5: Coverage of the credible intervals of forecasting results for the simulated {Xt} in example 1.

Example 2
We simulated a series {x_t} of 300 observations from the model, with e_t ∼ t_5/√(5/3), Z_t = 0.5Z_{t−1} + ε_t and ε_t ∼ GWN(0, 1). The simulated series are shown in Figure 6. In the first stage, we identified the number of regimes and the thresholds. The prior distribution for l is the Poisson distribution with parameter 3 truncated to the set {2, 3, 4}, and the prior distribution of the thresholds is as described before. We ran a Gibbs sampler of 1,000 iterations, and the posterior probabilities of all the possible values of the number of regimes, shown in Table 6, suggest that l = 3. The estimates of the two thresholds are −0.6394 and 0.5205, respectively. In Figure 7, we present the histograms of the simulated values of the thresholds. The 95% credible intervals for the thresholds are (−1.1701, 0.6956) and (−0.3848, 1.1779), respectively, each containing the real value of the corresponding threshold. However, note that the histograms of the simulated thresholds seem to be bimodal, so we also computed the medians, −1.08 and 0.723, which are slightly away from the estimated values. In future research, we will run more simulations in order to determine in which cases the median works well. In the second stage, we estimated the autoregressive order in each of the three regimes. The prior distribution for k_j is the Poisson distribution with parameter 2 truncated to the set {0, 1, 2, 3}, for each j = 1, 2, 3. We ran a Gibbs sampler of 1,000 iterations and obtained the posterior probabilities for K_1, K_2 and K_3 displayed in Table 7. We can see that the identified autoregressive orders are k̂_1 = 1, k̂_2 = 2 and k̂_3 = 1, corresponding to the real autoregressive orders. Finally, we estimated the non-structural parameters: the autoregressive coefficients, the variance weights and the degrees of freedom of the error process. The prior distributions for these parameters are: N(0, 10) for the autoregressive coefficients a_i^{(j)}, i = 1, …, k_j, j = 1, 2, 3; IG(2, 3) for the variance weights (h^{(1)})^2 and (h^{(2)})^2; and Gamma(1, 0.1) for the degrees of freedom n, which can be considered a non-informative prior distribution, as discussed in Example 1.
Revista Colombiana de Estadística 38 (2015) 239-266

Histogram of the simulated threshold
We ran another Gibbs sampler of 1,000 iterations; the estimates of the autoregressive coefficients and the variance weights are given in Table 8. These estimates are close to the true parameters and, except for the parameter a_1^{(2)}, whose true value is 0.2, all the 95% credible intervals contain the true parameters. With respect to the degrees of freedom n, the results obtained from the Gibbs sampler are displayed in Figure 8, where we note that the values of n with large posterior probability are around the true parameter 5. The posterior mean of n is 5.11, and a 95% credible interval for n is (3.17, 8.92). Since the identification and estimation results are, generally speaking, satisfactory, we proceeded with the estimation of the missing data. We set the number of missing data to 10 in each of the processes {Z_t} and {X_t}, and placed the missing data randomly. The estimates and credible intervals after 5,000 iterations are shown in Table 9. We can see that the overall performance of the procedure is satisfactory, in the sense that all the observed data are within the 95% credible intervals.

Histogram of n
Finally, we illustrate the forecasting procedure, where the sample period considered is 1-290 and the forecast horizon is set to 10. We assume that the non-structural parameters are known and, for each horizon, we simulate 100 series from the model (18). As in the previous example, we compute the percentage of these 100 credible intervals containing the true observation. These percentages are shown in Table 10.

An Application in Finance
In this section, we applied the proposed algorithm to financial time series to illustrate the methodology. Specifically, we used the daily log return of the Dow Jones Industrial Average as the threshold series and the daily log return of the BOVESPA index (Brazil São Paulo Stock Exchange Index) as the series of interest, from December 12th, 2000 to June 2nd, 2010. Moreno (2010) tested the non-linearity of the data using the test of Tsay (1998) with lags up to 4 for the log return of the Dow Jones index, and found that the appropriate lag is 0. In this way, we defined X_t = ln(BOVESPA_t) − ln(BOVESPA_{t−1}) and Z_t = ln(DOWJONES_t) − ln(DOWJONES_{t−1}). The log returns of these series are displayed in Figure 11.
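The log-return transformation defined above is a one-line computation; the price levels below are made-up numbers, not actual BOVESPA data.

```python
import numpy as np

def log_returns(prices):
    """Daily log returns: r_t = ln(P_t) - ln(P_{t-1})."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

# illustrative price levels only
prices = [100.0, 101.0, 99.5, 100.5]
r = log_returns(prices)
```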
In the first stage of the algorithm, the identified number of regimes is 3 with probability 1, that is, in all of the 1,000 iterations of the Gibbs sampler, the sampled value of L is 3.The estimated thresholds are -0.0051 and 0.0054, with respective credible intervals (-0.0242, 0.0099) and (-0.0081, 0.0226).The histograms of the sampled thresholds are shown in Figure 12.Observing the values of the two thresholds, we could name the three regimes as large negative return in Dow Jones, small return in Dow Jones and large positive return in Dow Jones.
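Given the two estimated thresholds, each value of the threshold series maps to one of the three regimes described above. A minimal sketch, using the point estimates −0.0051 and 0.0054 from the text:

```python
import numpy as np

def assign_regime(z, thresholds=(-0.0051, 0.0054)):
    """Map each threshold-series value to a regime (1, 2 or 3).

    With the estimated thresholds, regime 1 corresponds to a large
    negative Dow Jones return, regime 2 to a small return, and
    regime 3 to a large positive return.
    """
    z = np.asarray(z, dtype=float)
    return np.searchsorted(thresholds, z) + 1

regimes = assign_regime([-0.03, 0.0, 0.04])
```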
Once the number of regimes and the thresholds were identified, we proceeded with the identification of the autoregressive orders using another Gibbs sampler. In Table 11, we show the posterior probabilities for all possible values of the autoregressive orders, suggesting that k_1 = k_2 = 1 and k_3 = 3. Although the residuals themselves show little autocorrelation, the squared residuals show large autocorrelations, which is a disadvantage compared to the family of GARCH models, where the residuals show strong evidence of independence (see the ACF of the residuals and squared residuals of a GARCH(1,1) model in Figure 14). In Figure 15, we show the ACF of the residuals and squared residuals; clearly, the squared residuals of the TAR model with t-distributed noise still exhibit the same problem as the TAR model with Gaussian noise. Although the TAR model seems to fail to capture all the dependence structure in the data, Nieto & Moreno (2013) found the expression for the conditional variance Var(X_t | x_{t−1}, ..., x_1) in a TAR model, making this class of models comparable with GARCH models. In Figure 16, we show the conditional variance Var(X_t | x_{t−1}, ..., x_1) for the TAR model (19) as well as for the GARCH(1,1) model; the general behaviour is similar for the two models, although the bottom line in the TAR model is around 0.0076, while in the GARCH model it is around 0.0003.
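The diagnostic above, in which levels appear uncorrelated while squares are strongly autocorrelated, can be reproduced with a simple sample-ACF routine. The series below is an illustrative ARCH(1)-type process, not the fitted residuals of the paper's model.

```python
import numpy as np

def acf(x, nlags=10):
    """Sample autocorrelation function up to nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(2)
# an ARCH(1)-like series: uncorrelated levels, correlated squares
e = rng.standard_normal(2000)
x = np.empty(2000)
x[0] = e[0]
for t in range(1, 2000):
    x[t] = e[t] * np.sqrt(0.2 + 0.5 * x[t - 1] ** 2)

acf_levels = acf(x)
acf_squares = acf(x ** 2)
```

For such a series the ACF of the levels stays near zero at all lags, while the ACF of the squares is clearly positive at short lags, which is the pattern the TAR residuals exhibit in Figure 15.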

Conclusion
In this work, we proposed a new family of TAR models, TAR models with a t-distributed noise process, together with a three-stage modeling procedure consisting of: (1) identifying the number of regimes and the corresponding thresholds, (2) identifying the autoregressive order in each regime, and (3) estimating the non-structural parameters, i.e., the autoregressive coefficients and the type II conditional variance in each regime, plus any other parameters that a particular model may contain. The performance of the developed methodology is satisfactory on simulated data; however, GARCH models seem to capture the heteroscedastic behaviour of financial data better. In future research, GARCH models may be used together with TAR models.

Figure 2: Geweke's Z-score for some of the parameters in the Gibbs sampler.

Figure 3: CUSUM and CUSUMSQ plot of the standardized residuals of the estimated model in example 1.

Figure 7: Histogram of the simulated values of the thresholds for three regimes in example 2.

Figure 8: Histogram of the simulated values of the degrees of freedom n in example 2.

Table 1: Posterior probability for the number of regimes L in example 1.

Table 2: Posterior probabilities of the variables K1 and K2 in example 1.

Table 3: Estimation and 95% credible intervals for the non-structural parameters for the simulated data in example 1.

Table 4: Estimation and 95% credible intervals for the missing data in example 1.

Table 6: Posterior probability for the number of regimes L in example 2.

Table 7: Posterior probabilities of the variables K1, K2 and K3 in example 2.

Table 8: Estimation and 95% credible intervals for the non-structural parameters in example 2.