Volatility Forecasting Models and Market Co-Integration: A Study on South-East Asian Markets

Volatility forecasting is an important research field in financial markets and a crucial component of most financial decisions. Nevertheless, which model should be used to assess volatility remains a complex issue, as different volatility models produce different volatility estimates. The issue becomes more complicated when the forecasts are used for asset allocation and risk management in linked regional markets. This paper examines the effectiveness of competing statistical and econometric volatility forecasting models in three prominent South-East Asian capital markets, i.e. STI, KLSE, and JKSE. We evaluate eleven different models based on two classes of evaluation measures, i.e. symmetric and asymmetric error statistics, following Kumar's (2006) framework. We employ 10 years of data as the in-sample period and 6 months of data as the out-of-sample period to construct and test the models, respectively. The superior methods, which are selected based on the out-of-sample forecasts and some evaluation measures in the respective markets, are then used to assess the markets' cointegration. We find that the best volatility forecasting models for JKSE, KLSE, and STI are GARCH(2,1), GARCH(3,1), and GARCH(1,1), respectively. We also find that international portfolio investors cannot benefit from diversification among these three equity markets, as they are cointegrated.

Volatility modeling is a research area that has grown sharply in recent years. It is a non-linear modeling approach used to estimate the behavior of capital market products, such as stocks. A volatility model tries to estimate the risk of an asset, a common measure of which is Value at Risk (VaR). The calculation of VaR can be parametric or nonparametric. A statistically developed model is usually categorized as a parametric model and is based on the probability distribution of returns.
The development of forecasting models started from the mean models, such as AR, MA, ARMA and ARIMA, and moved on to models that incorporate volatility, such as ARCH/GARCH and their derivatives. The mean models have some limitations, as they do not account for time-varying volatility in the forecasts. Volatility models, in contrast, assume that volatility varies over time, and are therefore considered more suitable for forecasting.
Moreover, volatility forecasting and correlation are crucial factors in risk management. An investor's ability to appropriately estimate the variability of asset price movements and the relationships among assets may help reduce the risk the investor faces.

Obstacles in Dealing With Financial Time Series
The need for downside risk measurement has pushed scholars and institutions to work on measurement techniques. In 1994, JP Morgan introduced Value at Risk (VaR) to measure market risks and report the results in a standard way. Although VaR is not a perfect solution for measuring market risks, it plays a vital role in informing other risk studies and enhancing investors' understanding of risk. VaR is a statistical measure that states a single number for the maximum loss per day, per week or per month. In other words, VaR is a statistical summary of financial assets or a portfolio in terms of market risk (Culp, Mensink, Neves, 1999:3). A VaR calculation is aimed at making a statement that the investors are X% certain that they will not lose more than V units of money in the next N days.
Several problems occur when a financial model is developed using financial time series (Hassan & Shamiri, 2005), especially high-frequency data. First, financial time series often reveal volatility clustering: large changes tend to be followed by large changes and small changes by small changes. Second, the series often exhibit leverage effects, in the sense that changes in stock prices tend to be negatively correlated with changes in volatility. This implies that volatility is higher after negative shocks than after positive shocks of the same magnitude. Finally, the series often show leptokurtosis, i.e. the distribution of their returns is heavy-tailed (McMillan and Speight, 2004).
Meanwhile, traditional regression tools cannot overcome the abovementioned obstacles, as they have proven limited in modeling high-frequency data. These tools assume that only the mean response changes, while the variance stays constant over time. This is impractical, as financial series demonstrate clusters of volatility, which can be identified graphically. Engle (1982) proposed Auto-Regressive Conditional Heteroscedastic (ARCH) models to address volatility clustering and leptokurtosis. Such models provided new instruments for measuring risk and its associated influence on return. The models also provided new means for pricing and hedging non-linear assets. To deal further with leptokurtosis, the ARCH models were then generalized. Bollerslev (1986) introduced the Generalized Auto-Regressive Conditional Heteroscedastic (GARCH) model, which was then extended into derivations such as EGARCH (Nelson, 1991) and TGARCH (Zakoian, 1994). Nevertheless, GARCH models often do not fully capture the heavy-tails property of high-frequency data. Therefore, the application of non-normal distributions, such as Student-t, generalized error distribution (GED), and Normal-Poisson, becomes necessary. Additionally, adaptive exponential smoothing methods allow smoothing parameters to change over time, in order to adapt to changes in the characteristics of the time series. In this paper, we compare a covariance matrix model with exponential smoothing models and with GARCH models and their derivations. With the exception of GARCH models, Ederington and Guan (2005) find that models based on absolute return deviations generally forecast volatility better than otherwise equivalent models based on squared return deviations. Among the most popular time series models, we find that GARCH(1,1) generally yields better forecasts than the historical standard deviation and exponentially weighted moving average.

Cointegration of Three Stock Markets
In regional and international investment activities, investors, portfolio managers, and policy makers require a model that can reveal linkage and causality across financial markets, especially neighboring markets. Such a model gives them a better view of the markets' movements and therefore enables them to appropriately price underlying assets and their derivatives, as well as to hedge the associated portfolio risks. Cointegration analysis has been the most popular approach employed by academics and stock market researchers in developing such a linkage and causality model.
Cointegration analysis originated with the seminal contributions by Granger (1981), Engle & Granger (1987), and Granger & Hallman (1991). It can reveal common stochastic trends in financial time series data and is useful for long-term investment analysis. The analysis considers the I(1) − I(0) type of cointegration, in which linear combinations of two or more I(1) variables are I(0) (Christensen & Nielsen, 2003). In the bivariate case, if $y_t$ and $x_t$ are I(1), and hence in particular nonstationary (unit root) processes, but there exists a process $e_t$ which is I(0) and a fixed β such that $y_t = \beta' x_t + e_t$, then $x_t$ and $y_t$ are defined as cointegrated. Consequently, the nonstationary series move together, in the sense that a linear combination of them is stationary and therefore a common stochastic trend is shared. Granger & Hallman (1991) show that investment decisions based merely on short-term asset returns are inadequate, as the long-term relationship of asset prices is not considered. They also show that hedging strategies developed based on correlation require frequent rebalancing of portfolios, whereas those developed strictly based on cointegration do not require rebalancing. Lucas (1997) and Alexander (1999), using applications of cointegration analysis to portfolio asset allocation and trading strategies, have shown that index tracking and portfolio optimization based on cointegration rather than correlation alone may result in higher asset returns. Meanwhile, Duan and Pliska (1998), by developing a theory of option valuation with cointegrated asset prices, reveal that cointegration can have a considerable impact on spread option price volatilities. Furthermore, economic policy makers must have comprehensive knowledge of the transmission of price movements in regional equity markets, especially during periods of high volatility, so that appropriate policy may be designed to lessen the severity of financial crises.
Therefore, research on cointegration and causality among regional equity markets is essential. The cointegration approach complements correlation analysis: correlation analysis is appropriate for short-term investment decisions, while cointegration-based strategies are necessary for long-term investment.

Historical Model
The simplest model for volatility is the historical estimate. Historical volatility simply involves calculating the variance (or standard deviation) of returns in the usual way over some historical period, and this then becomes the volatility forecast for all future periods. The historical average variance (or standard deviation) was traditionally used as the volatility input to options pricing models, although there is a growing body of evidence suggesting that the use of volatility predicted from more sophisticated time series models will lead to more accurate option valuations. Historical volatility is still useful as a benchmark for comparing the forecasting ability of more complex time series models.
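The historical estimate can be illustrated with a minimal Python sketch. The price levels are made-up numbers, and the function names `log_returns` and `historical_variance` are hypothetical helpers introduced only for this illustration; the use of log returns and of the n − 1 divisor are assumptions.

```python
import math

def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def historical_variance(returns):
    """Sample variance of the returns (divisor n - 1, an assumption)."""
    n = len(returns)
    mean = sum(returns) / n
    return sum((r - mean) ** 2 for r in returns) / (n - 1)

# Made-up index levels, for illustration only.
prices = [100.0, 101.0, 99.5, 100.2, 102.0, 101.1]
rets = log_returns(prices)
hist_var = historical_variance(rets)
hist_vol = math.sqrt(hist_var)

# The same number is used as the forecast for every future period.
forecast = [hist_var] * 5
```

Because the forecast is flat, the historical model cannot react to recent shocks, which is why it mainly serves as a benchmark.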

Exponential Smoothing
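With a large history of observations available, the variance estimator can be written in the simple exponential smoothing recursive form with smoothing parameter α:
$$\hat{\sigma}_t^2 = \alpha\,\varepsilon_{t-1}^2 + (1-\alpha)\,\hat{\sigma}_{t-1}^2$$
where $\varepsilon_{t-1}$ is the previous period's residual. Some researchers have argued that the smoothing parameter should be allowed to change over time in order to adapt to the latest characteristics of the time series. Since exponential smoothing for volatility forecasting is formulated in terms of variance forecasts, $\hat{\sigma}_t^2$, RiskMetrics (1997) suggests the following minimization:
$$\min_{\alpha} \sum_{t} \left(\varepsilon_t^2 - \hat{\sigma}_t^2\right)^2$$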
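As a rough sketch of how the smoothing parameter might be chosen, the Python code below runs the exponential smoothing recursion for the variance and picks α from a grid by minimizing the sum of squared differences between squared residuals and variance forecasts. The recursion form, the grid search, the starting variance, and the residual values are all assumptions made for illustration, not the procedure used in the paper.

```python
def smoothed_variance(residuals, alpha, init_var):
    """One-step-ahead variance forecasts:
    s2[t] = alpha * e[t-1]^2 + (1 - alpha) * s2[t-1], with s2[0] = init_var."""
    forecasts = [init_var]
    for e in residuals[:-1]:
        forecasts.append(alpha * e * e + (1 - alpha) * forecasts[-1])
    return forecasts

def sum_squared_error(residuals, alpha, init_var):
    """Sum over t of (e[t]^2 - s2[t])^2, the quantity being minimized."""
    forecasts = smoothed_variance(residuals, alpha, init_var)
    return sum((e * e - s2) ** 2 for e, s2 in zip(residuals, forecasts))

def best_alpha(residuals, init_var, grid):
    """Grid value of alpha with the smallest sum of squared errors."""
    return min(grid, key=lambda a: sum_squared_error(residuals, a, init_var))

# Made-up residuals, for illustration only.
residuals = [0.010, -0.020, 0.015, -0.005, 0.030, -0.012]
grid = [k / 10 for k in range(1, 10)]
alpha_star = best_alpha(residuals, init_var=0.0002, grid=grid)
```

A grid search is used here only because it is easy to read; a numerical optimizer would serve the same purpose.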

ARIMA Model
An ARIMA model is a univariate model that seeks to depict a single variable as an Autoregressive Integrated Moving Average process. The series is fully described by p, the order of the AR component, q, the order of the MA component, and d, the order of integration. The AR component is built upon the assumption that future realizations can be approximated and predicted by the behaviour of current and past values. The MA component, on the other hand, seeks to depict processes where the effects of past environmental innovations continue to reverberate for a number of periods. If $y_t$ is an ARIMA(p,d,q) process, then the series, after differencing d times, evolves according to the following specification:
$$\Delta^d y_t = \theta_0 + \sum_{i=1}^{p} \phi_i\,\Delta^d y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\,\varepsilon_{t-j}$$
where $\theta_0$ is a constant, $\varepsilon$ is the error term, q is the number of lagged terms of $\varepsilon$, p is the number of lagged terms of $y_t$, and $\phi_i$ and $\theta_j$ are the AR and MA coefficients. The ARIMA model can be described as atheoretical, as it ignores all potential underlying theories, except those that hypothesize repeating patterns in the variable under study.
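The two ingredients of an ARIMA forecast, differencing and the ARMA recursion, can be sketched in a few lines of Python. The function names, the coefficient values, and the sign convention for the MA terms (added rather than subtracted) are assumptions for illustration only.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def arma_one_step(y, eps, theta0, phi, theta):
    """One-step-ahead forecast of an ARMA(p, q) process:
    theta0 + sum_i phi[i] * y[t-i] + sum_j theta[j] * eps[t-j].
    y and eps hold past values, most recent last."""
    ar_part = sum(p * y[-i] for i, p in enumerate(phi, start=1))
    ma_part = sum(q * eps[-j] for j, q in enumerate(theta, start=1))
    return theta0 + ar_part + ma_part

# Differencing a made-up series once and twice.
levels = [1.0, 4.0, 9.0, 16.0]
first_diff = difference(levels, d=1)
second_diff = difference(levels, d=2)

# ARMA(1,1) forecast with hypothetical coefficients.
next_value = arma_one_step(y=[1.0, 2.0], eps=[0.5, -0.5],
                           theta0=0.1, phi=[0.5], theta=[0.2])
```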

EWMA Model
RiskMetrics measures volatility using the EWMA model, which gives the heaviest weight to the most recent data. An exponentially weighted model reacts immediately to market crashes or large changes, so such changes are rapidly reflected in the volatility estimate. If every observation is given the same weight, it is hard to capture extraordinary events and their effects, and EWMA is therefore considered a good model for this problem. If the exponential coefficient is chosen to be large, the effect of the current variance on the total variance will be small. EWMA assumes that recent days carry more weight than older days and that volatility is not constant through time. JP Morgan uses the EWMA model for VaR calculation. Using EWMA to model volatility, the equation is:
$$\sigma = \sqrt{(1-\lambda)\sum_{t=1}^{n} \lambda^{t-1}\,(r_t - \mu)^2}$$
where λ is the exponential factor, n is the number of days, and $r_t$ is the return observed t days back. In the equation, μ is the mean value of the distribution, which is normally assumed to be zero for daily VaR. The equation can be stated recursively for exponentially weighted volatility:
$$\sigma_t^2 = \lambda\,\sigma_{t-1}^2 + (1-\lambda)\,r_{t-1}^2$$
This form of the equation compares directly with the GARCH model. The crucial part of the performance of the model is the chosen value of the factor. JP Morgan's RiskMetrics model uses a factor value of 0.94 for daily and 0.97 for monthly volatility estimations. For the EWMA calculation, the necessary number of days can be calculated by a formula given in Best (1999:70). The exponential factor is identified by minimizing the average of squared errors, with the variance treated as a function of the exponential factor. Using this methodology, the factor is determined to be 0.94 for daily volatility forecasting and 0.97 for monthly volatility forecasting.
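The following Python sketch shows the EWMA variance in both its weighted-sum form and its recursive form, assuming a zero mean and a zero starting variance. The function names and the return values are hypothetical; only the factor 0.94 comes from the text.

```python
def ewma_variance_recursive(returns, lam=0.94):
    """Recursive form: var = lam * var + (1 - lam) * r^2,
    applied from the oldest to the newest return, starting from zero."""
    var = 0.0
    for r in returns:
        var = lam * var + (1.0 - lam) * r * r
    return var

def ewma_variance_weighted(returns, lam=0.94):
    """Weighted-sum form: (1 - lam) * sum_k lam^k * r_k^2,
    where k = 0 is the most recent return (zero mean assumed)."""
    newest_first = reversed(returns)
    return (1.0 - lam) * sum(lam ** k * r * r
                             for k, r in enumerate(newest_first))

# Made-up daily returns, oldest first.
daily_returns = [0.004, -0.012, 0.007, -0.021, 0.015]
var_rec = ewma_variance_recursive(daily_returns)
var_sum = ewma_variance_weighted(daily_returns)
```

Both forms give the same number here, which is why the recursion is usually preferred in practice: it needs only the previous variance and the latest return.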

Auto-Regressive Conditional Heteroscedastic (ARCH)
The ideas behind ARCH can be traced back to Bachelier in the 1900s, before Mandelbrot (1963) advanced them in observing economic and financial variables. He stated that the unconditional distribution had thick tails, that variance changed over time, and that each change, small or large, would usually be followed by another change. Several years later, Engle (1982) developed this approach by assuming that the error term of the ARCH model is normally distributed with mean zero and non-constant variance, $u_t \mid \Omega_{t-1} \sim N(0, \sigma_t^2)$, where the restrictions $\alpha_0 > 0$ and $\alpha_i \ge 0$ ensure that the variance is positive, or explicitly stated as:
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i\,u_{t-i}^2$$

Generalized Auto-Regressive Conditional Heteroscedastic (GARCH)
The GARCH model was developed independently by Bollerslev (1986) and Taylor (1986). The GARCH model allows the conditional variance to depend upon its own previous lags, so that the conditional variance equation in the simplest case is now
$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$
This is a GARCH(1,1) model. $\sigma_t^2$ is known as the conditional variance, since it is a one-period-ahead estimate of the variance calculated based on any past information thought relevant. GARCH is considered better than ARCH, as the former is more parsimonious and avoids overfitting. Consequently, the model is less likely to breach non-negativity constraints.
The GARCH(1,1) model can be extended to a GARCH(p,q) formulation, where the current conditional variance is parameterized to depend upon q lags of the squared error and p lags of the conditional variance:
$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \dots + \alpha_q u_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \beta_2 \sigma_{t-2}^2 + \dots + \beta_p \sigma_{t-p}^2$$
In general, however, a GARCH(1,1) model will be sufficient to capture the volatility clustering in the data, and rarely is any higher order model estimated or even entertained in the academic finance literature.
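The GARCH(p,q) recursion can be sketched in Python as below, with q squared-error lags and p conditional-variance lags as in the text. The parameter values are hypothetical, and filling pre-sample squared errors and variances with a single starting value is an assumption; in practice the parameters would be estimated by maximum likelihood.

```python
def garch_variance(u, alpha0, alphas, betas, init_var):
    """Conditional variances of a GARCH(p, q) model with
    q = len(alphas) squared-error lags and p = len(betas) variance lags.
    Pre-sample squared errors and variances are set to init_var."""
    sigma2 = []
    for t in range(len(u)):
        arch_part = sum(
            a * (u[t - i] ** 2 if t - i >= 0 else init_var)
            for i, a in enumerate(alphas, start=1))
        garch_part = sum(
            b * (sigma2[t - j] if t - j >= 0 else init_var)
            for j, b in enumerate(betas, start=1))
        sigma2.append(alpha0 + arch_part + garch_part)
    return sigma2

# Hypothetical shocks and GARCH(1,1) parameters, for illustration only.
shocks = [1.0, 2.0, 0.0]
path_11 = garch_variance(shocks, alpha0=0.1, alphas=[0.2], betas=[0.5],
                         init_var=1.0)

# The same function handles higher orders by passing longer lists,
# for example two variance lags and one squared-error lag.
path_21 = garch_variance(shocks, alpha0=0.1, alphas=[0.2],
                         betas=[0.3, 0.2], init_var=1.0)
```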

Exponential Generalized Auto-Regressive Conditional Heteroscedastic (EGARCH)
The exponential GARCH model was proposed by Nelson (1991). There are various ways to express the conditional variance equation, but one possible specification is given by
$$\ln(\sigma_t^2) = \omega + \beta \ln(\sigma_{t-1}^2) + \gamma\,\frac{u_{t-1}}{\sqrt{\sigma_{t-1}^2}} + \alpha\left[\frac{|u_{t-1}|}{\sqrt{\sigma_{t-1}^2}} - \sqrt{\frac{2}{\pi}}\right]$$
The model has several advantages over the pure GARCH specification. First, since $\log(\sigma_t^2)$ is modeled, $\sigma_t^2$ will be positive even if the parameters are negative. There is thus no need to artificially impose non-negativity constraints on the model parameters. Second, asymmetries are allowed for under the EGARCH formulation, since if the relationship between volatility and returns is negative, γ will be negative.
Note that in the original formulation, Nelson assumed a Generalized Error Distribution (GED) structure for the errors. GED is a very broad family of distributions that can be used for many types of series. However, due to its computational ease and intuitive interpretation, almost all applications of EGARCH employ conditionally normal errors as discussed above rather than using GED.
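Both advantages of the log-variance specification can be checked with a one-step Python sketch. It assumes the common form in which the log variance depends on the lagged log variance, the standardized shock, and its absolute value less sqrt(2/π), under conditionally normal errors; the parameter values are hypothetical.

```python
import math

def egarch_next_variance(u_prev, var_prev, omega, beta, gamma, alpha):
    """One EGARCH step:
    ln(var_t) = omega + beta * ln(var_{t-1}) + gamma * z
                + alpha * (|z| - sqrt(2 / pi)),  z = u_{t-1} / sqrt(var_{t-1}).
    Exponentiating guarantees a positive variance for any parameter signs."""
    z = u_prev / math.sqrt(var_prev)
    log_var = (omega + beta * math.log(var_prev) + gamma * z
               + alpha * (abs(z) - math.sqrt(2.0 / math.pi)))
    return math.exp(log_var)

# Hypothetical parameters with a negative gamma (leverage effect).
params = dict(omega=-0.1, beta=0.9, gamma=-0.1, alpha=0.2)
after_good_news = egarch_next_variance(1.0, 1.0, **params)
after_bad_news = egarch_next_variance(-1.0, 1.0, **params)
```

With a negative γ, a negative shock raises next-period variance more than a positive shock of the same size, and the variance stays positive even though ω and γ are negative.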

Threshold Auto-Regressive Conditional Heteroscedastic (TARCH)
TARCH, or Threshold ARCH, was introduced independently by Zakoian (1990) and Glosten, Jagannathan and Runkle (1993). The specification for the conditional variance is given by
$$\sigma_t^2 = \omega + \alpha\,u_{t-1}^2 + \gamma\,u_{t-1}^2 d_{t-1} + \beta\,\sigma_{t-1}^2$$
where $d_t = 1$ if $u_t < 0$, and 0 otherwise. In this model, good news ($u_t > 0$) and bad news ($u_t < 0$) have differential effects on the conditional variance: good news has an impact of α, while bad news has an impact of (α + γ). If γ > 0 we say that the leverage effect exists. If γ ≠ 0, the news impact is asymmetric. These findings suggest that traders and risk managers are able to generate asset profit and minimize risks if they obtain a better understanding of how volatility is being forecasted.
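The differential impact of good and bad news can be illustrated with a one-step Python sketch. It assumes a first-order specification with a constant term, an indicator that equals one after a negative shock, and hypothetical parameter values.

```python
def tarch_next_variance(u_prev, var_prev, omega, alpha, gamma, beta):
    """One TARCH step:
    var_t = omega + (alpha + gamma * d) * u_{t-1}^2 + beta * var_{t-1},
    where d = 1 after bad news (u_{t-1} < 0) and 0 otherwise."""
    d = 1.0 if u_prev < 0 else 0.0
    return omega + (alpha + gamma * d) * u_prev ** 2 + beta * var_prev

# Hypothetical parameters with gamma > 0 (leverage effect present).
params = dict(omega=0.1, alpha=0.2, gamma=0.3, beta=0.5)
after_good_news = tarch_next_variance(1.0, 1.0, **params)   # impact alpha
after_bad_news = tarch_next_variance(-1.0, 1.0, **params)   # impact alpha + gamma
```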

The Power ARCH (PARCH) Model
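Taylor (1986) and Schwert (1989) introduced the standard deviation GARCH model, where the standard deviation is modeled rather than the variance. This model, along with several other models, is generalized in Ding et al. (1993) with the Power ARCH specification. In the Power ARCH model, the power parameter δ of the standard deviation can be estimated rather than imposed, and the optional γ parameters are added to capture asymmetry of up to order r:
$$\sigma_t^{\delta} = \omega + \sum_{i=1}^{q} \alpha_i \left(|u_{t-i}| - \gamma_i u_{t-i}\right)^{\delta} + \sum_{j=1}^{p} \beta_j\,\sigma_{t-j}^{\delta}$$
where δ > 0, $|\gamma_i| \le 1$ for i = 1, ..., r, $\gamma_i = 0$ for i > r, and r ≤ q.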

Cointegration
The second phase involves assessing the three market series for cointegration. The cointegration test determines whether or not the three non-stationary price indices share a common stochastic trend. The estimated cointegrating equation is as follows: (4) In equation (4), the cointegrating relationship is normalized on the log of JKSE index. If it is normalized, say, on the log of JKSE, then (4) becomes: We do not report cointegration results normalized on the largest stock market by capitalization. Instead, we report results normalized on JKSE, which has the smallest market capitalization among the three markets.
The Johansen and Juselius (JJ) estimation procedure, which uses the maximum likelihood method, is then employed. The cointegration tests assume no deterministic trends in the series and use lag intervals 1 to 1, as suggested by the SBIC for appropriate lag lengths. Choosing the AIC (Akaike Information Criterion) would have made no difference, because both the AIC and the SBIC suggested the same lag length as well as the same assumptions for the test. The assumptions of the test are that the indices in log levels have no deterministic trends and that the cointegrating equation has an intercept, with no intercept in the VAR. The results of the cointegration tests are presented in Table 6. The trace test tests the null hypothesis of r cointegrating relations against k cointegrating relations, where k is the number of endogenous variables, for r = 0, 1, ..., k. If there are k cointegrating relations, it implies that there is no cointegration between the three series. The maximum eigenvalue test tests the null of r cointegrating relations against the alternative of r + 1 cointegrating relations. The test results indicated one cointegrating equation at the 5 percent level of significance. The critical values used, from Osterwald-Lenum (1992), are slightly different from those reported in JJ (1990). The cointegrating relationship is normalized on ljkse. The cointegrating vector of the three daily price indices, JKSE, KLSE, and STI, normalized on JKSE, is [1 -1.0 -0.44]. The cointegrating equation indicates that the JKSE and KLSE indices adjust one-to-one in the long run, and that a smaller adjustment occurs between the JKSE index and the STI index.
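To show how the reported cointegrating vector is read, the Python sketch below computes the equilibrium error implied by [1 -1.0 -0.44] for a given set of index levels. The intercept is not reported in the text, so it is left as a hypothetical argument defaulting to zero, and the index levels used are made-up numbers.

```python
import math

# Cointegrating vector from the text, normalized on the log of JKSE.
COINT_VECTOR = (1.0, -1.0, -0.44)

def equilibrium_error(jkse, klse, sti, intercept=0.0):
    """z = ln(JKSE) - 1.0 * ln(KLSE) - 0.44 * ln(STI) - intercept.
    The intercept value and its sign convention are assumptions."""
    logs = (math.log(jkse), math.log(klse), math.log(sti))
    return sum(w * x for w, x in zip(COINT_VECTOR, logs)) - intercept

# Made-up index levels, for illustration only.
z_today = equilibrium_error(jkse=2100.0, klse=1350.0, sti=3500.0)

# A proportional rise in JKSE and KLSE leaves the error unchanged,
# which is what the one-to-one long-run adjustment means.
z_scaled = equilibrium_error(jkse=2100.0 * 1.1, klse=1350.0 * 1.1, sti=3500.0)
```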

Descriptive Data
In Figure 1, it can be seen that STI values were consistently above those of the other two indices from 1998 to 2007. Meanwhile, the JKSE index was below the KLSE index before crossing over it in 2004. Since then, the JKSE index has been consistently above the KLSE index, and was considered the best index in Asia Pacific. In Table 1, we can see that the mean value of JKSE is the highest, i.e. 0.0004, followed by those of STI and KLSE. The JKSE index also shows the highest volatility, with a standard deviation of 0.017, while the STI index records the lowest standard deviation, i.e. 0.013957. The probabilities of the Jarque-Bera statistics show that the data of all indices are not normally distributed, as all the probabilities are less than 0.05.

The Best Forecasting Models
From Table 2, we can see that the best model for the three indices is GARCH. However, the respective GARCH orders differ across the indices. For JKSE, KLSE, and STI, the best models are GARCH(2,1), GARCH(3,1), and GARCH(1,1), respectively. The best model is chosen based on the greatest absolute SIC value of a model. Based on that criterion, the AR, MA, ARMA, and ARIMA models, as well as some derivations of ARCH and GARCH, do not show the best results. The ARCH LM test results validate the selected models, as all the associated probability values are greater than the significance level of 0.05, which means that no ARCH effect remains in the fitted models.
Source: Processed Data
Note: This table presents the results of the four models for the conditional mean and conditional variance of JKSE, KLSE and STI daily log returns from July 1, 1997 to June 30, 2007, a total of 2559 observations. * significant at the 10% level; ** significant at the 5% level; *** significant at the 1% level.
The above three models are in-sample forecasting models. The out-of-sample forecasts do not outperform the in-sample ones. To some degree, this finding is in line with the result of a study by Day and Lewis (1992), who concluded that the out-of-sample model was not accurate in predicting stock or bond prices. In Table 3, we can see that the RMSE and MAE indicators of the respective models are quite close, meaning that their forecasting powers are somewhat similar.
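The two symmetric error statistics mentioned above, RMSE and MAE, can be computed as in the following Python sketch. The realized and forecast values are made-up numbers, and the use of some volatility proxy as the "actual" series is an assumption.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between realized and forecast values."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mae(actual, forecast):
    """Mean absolute error between realized and forecast values."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n

# Made-up realized volatility proxy and forecasts, for illustration only.
realized = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error_rmse = rmse(realized, predicted)
error_mae = mae(realized, predicted)
```

Both statistics penalize over-prediction and under-prediction equally, which is what makes them symmetric; RMSE gives more weight to large misses.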

Augmented Dickey Fuller (ADF) Unit Root Test
The first phase in the estimation process is deciding the order of integration of the individual price index series in natural log levels. The logs of the indices, denoted as JKSE, KLSE, and STI, are tested for unit roots using the augmented Dickey-Fuller (ADF) (1979) test, with the lag structure indicated by the Schwarz Bayesian Information Criterion (SBIC). The p-values used for the tests are the MacKinnon (1996) one-sided p-values. The test results, as can be seen in Table 5, indicate that the null hypothesis that the price index in log levels contains a unit root cannot be rejected for any of the three price series. Unit root tests are then performed on each of the price index series in log first differences. The null hypothesis of a unit root is rejected for each of the time series. No further tests are performed, since each of the series is found to be stationary in log first differences. The finding that each price series is non-stationary implies that each observed market is weak-form efficient.
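The logic of testing levels first and first differences second can be sketched with a simplified Dickey-Fuller regression in Python. This is not the augmented test used in the paper: it includes a constant but no lagged differences, it is applied to simulated data, and it compares the statistic with the asymptotic 5% critical value of -2.86 rather than MacKinnon p-values. All names are hypothetical.

```python
import math
import random

def dickey_fuller_t(series):
    """t-statistic on rho in the regression
    diff_t = a + rho * level_{t-1} + e_t (no lagged differences)."""
    x = series[:-1]                                   # lagged levels
    y = [b - a for a, b in zip(series, series[1:])]   # first differences
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    rho = sxy / sxx
    a = y_bar - rho * x_bar
    ssr = sum((yi - a - rho * xi) ** 2 for xi, yi in zip(x, y))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho

random.seed(1)
shocks = [random.gauss(0.0, 0.01) for _ in range(500)]

# Simulated log price level: a random walk, so it contains a unit root.
log_level = [0.0]
for e in shocks:
    log_level.append(log_level[-1] + e)

# Its first differences are the shocks themselves, which are stationary.
log_diff = [b - a for a, b in zip(log_level, log_level[1:])]

t_level = dickey_fuller_t(log_level)
t_diff = dickey_fuller_t(log_diff)
CRITICAL_5PCT = -2.86  # asymptotic Dickey-Fuller value, constant, no trend
```

A statistic below the critical value rejects the unit root. For the differenced series the statistic is strongly negative, which is the pattern reported for the three indices.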
We test for cointegration between the market indices in pairs: JKSE and KLSE, JKSE and STI, and KLSE and STI. All of these pairs are cointegrated, but the test results are not presented, as our focus is the relationship among the three markets. The finding that the market indices are cointegrated means that there is one linear combination of the three price series that forces these indices to have a long-term equilibrium relationship, even though the indices may wander away from each other in the short run. It also implies that the returns on the indices are correlated in the long term. The message for long-term international investors is that it does not matter, in terms of portfolio returns, whether investors in the three countries hold a fully diversified portfolio of stocks contained in all three indices or hold portfolios consisting of all the stocks of only one index. Cointegration between a portfolio and an index is assured when there is at least one portfolio of stocks that has a stationary tracking error, that is, when the difference between the portfolio of stocks and the stock index is stationary or, to put it differently, the price spread between the two is mean-reverting. However, in the short run, the two may deviate from each other, with the potential for higher returns on the portfolio relative to the index. So, investors may still be able to earn excess returns in the short run by holding a portfolio of stocks from the three markets.
The final phase is the estimation of the three-variable VEC model. In this study, the estimated vector error-correction model of the price indices has the following form:
$$\Delta y_{i,t} = \lambda_i z_{t-1} + \sum_{p} \left(b_{ip}\,\Delta ljkse_{t-p} + c_{ip}\,\Delta lklse_{t-p} + d_{ip}\,\Delta lsti_{t-p}\right) + \varepsilon_{i,t}, \quad i = 1, 2, 3$$
where $\Delta y_{1,t} = \Delta ljkse_t$, $\Delta y_{2,t} = \Delta lklse_t$ and $\Delta y_{3,t} = \Delta lsti_t$; $\Delta ljkse_{t-p}$, $\Delta lklse_{t-p}$ and $\Delta lsti_{t-p}$ are the first log differences of the three market indices lagged p periods; $z_{t-1}$ are the equilibrium errors, or the residuals of the cointegrating equations, lagged one period; and $\lambda_i$ are the coefficients of the error-correction term. The lag lengths for the series in the system are determined according to the SIC. The suggested lag lengths are one to one. No restrictions are imposed in identifying the cointegrating vectors. Estimated results can be seen in Table 5.

The estimated coefficient values of the lagged variables, along with the t-statistics, are presented without the asymptotic standard errors corrected for degrees of freedom for want of space; these are available from the authors. At the bottom of Table 5, the log likelihood values, the AIC and the SBIC are reported.
Three types of inference, concerning the dynamics of the three markets, can be drawn from the reported results of the VEC model in Table 5. The first one concerns whether the left hand side variable in each equation in the system is endogenous or weakly exogenous. The second type of inference is about the speed, degree, and direction of adjustment of the variables in the system to restore equilibrium following a shock to the system. The third type of inference is associated with the direction of short-run causal linkages between the three markets.

Adjustment to Shocks
In general, a cursory look at the statistical significance of the reported coefficients of the error-correction terms ($\lambda_i$) of the Δljkse, Δlklse, and Δlsti equations gives us an idea of whether the left-hand side variable in each equation of the system is exogenous or endogenous. If the coefficient of the error-correction term is not significantly different from zero, it usually implies that the variable is weakly exogenous; otherwise, it is endogenous.
Reviewing the results in Table 5, we see that the coefficient of the error-correction term, $\lambda_3$, in the Δlsti equation is not significantly different from zero, implying that the STI index is weakly exogenous to the system. The weak exogeneity of the STI index means that it is the initial receptor of external shocks and that it, in turn, transmits the shocks to the other markets in the system. As a result, the equilibrium relationship of the three markets is disturbed. The adjustment back to equilibrium can be inferred from the signs and magnitudes of the coefficients $\lambda_1$ (Δljkse equation) and $\lambda_2$ (Δlklse equation). The sign of $\lambda_1$ is positive and its magnitude, in absolute terms, is relatively small (0.000502); the sign of $\lambda_2$ is negative and its magnitude is larger (-0.015962), while $\lambda_3$ shows a smaller magnitude of -0.004787.
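The size of the adjustment implied by these coefficients can be illustrated with a short Python sketch. It takes the three reported error-correction coefficients and computes the part of each market's next-day log change that is attributable to a given lagged equilibrium error, ignoring the lagged-difference terms. The equilibrium error value is a made-up number.

```python
# Error-correction coefficients reported in the text.
ADJUSTMENT = {"JKSE": 0.000502, "KLSE": -0.015962, "STI": -0.004787}

def error_correction_response(z_prev):
    """Part of each market's next-period log change that is due to the
    lagged equilibrium error z_prev (short-run lag terms are ignored)."""
    return {name: lam * z_prev for name, lam in ADJUSTMENT.items()}

# Hypothetical equilibrium error of 0.10 in log terms.
response = error_correction_response(0.10)
largest_mover = max(response, key=lambda name: abs(response[name]))
```

Under these numbers KLSE shows the largest response in absolute terms, in line with the relative magnitudes discussed above.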
Meanwhile, the risk performance of each of the observed markets is assessed using delta-normal Value at Risk. Using the variance of each market displayed in Table 1, with 2,436 observations for each market and a confidence level of 95%, our calculation yields the following delta-normal Value at Risk: The delta-normal VaR of the JKSE index is the largest, meaning that this market is the riskiest among the three. The delta-normal VaR of KLSE is slightly smaller than that of STI. If this risk measure is compared with the markets' returns, we can say that the long-standing rule of financial management, i.e. high risk means high return, does not hold. The JKSE index records the lowest average return while revealing the highest risk. On the contrary, the STI market records the highest growth level with a relatively low risk level. To some extent, this phenomenon can be explained by the domestic political and economic stability influencing each market.
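A delta-normal VaR of this kind can be sketched in Python as follows. The sketch assumes a zero mean return, a one-day horizon, a position value of one unit, and a one-sided 95% normal quantile; only the two standard deviations are taken from the descriptive statistics quoted earlier, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def delta_normal_var(sigma, confidence=0.95, value=1.0, horizon_days=1):
    """Delta-normal VaR = value * z * sigma * sqrt(horizon),
    where z is the one-sided standard normal quantile (zero mean assumed)."""
    z = NormalDist().inv_cdf(confidence)
    return value * z * sigma * math.sqrt(horizon_days)

# Daily standard deviations quoted in the descriptive statistics.
var_jkse = delta_normal_var(0.017)
var_sti = delta_normal_var(0.013957)
```

Because the quantile is the same for every market, the ranking of delta-normal VaR follows the ranking of the standard deviations.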

Discussion
We find that the best volatility forecasting models for JKSE, KLSE, and STI are GARCH(2,1), GARCH(3,1), and GARCH(1,1), respectively. These three models are in-sample forecasting models, whose performance is better than that of the out-of-sample models. This finding is to some extent in line with