Kernel-based portmanteau diagnostic test for ARMA time series models

In this paper, the definition of the Toeplitz autocorrelation matrix is used to derive a kernel-based portmanteau test statistic for ARMA models. Under the null hypothesis of no serial correlation, the distribution of the test statistic is approximated by a standard normal using the kernel-based normalized spectral density estimator, without having to specify any alternative model. Unlike most existing portmanteau test statistics, the proposed test is defined for all lags. Simulation studies are conducted to assess the performance of the proposed test. A real application is given to demonstrate the usefulness of this goodness-of-fit test statistic.

ABOUT THE AUTHOR Esam Mahdi obtained his PhD in Statistics from Western University, London, Ontario, Canada. He has been working as an assistant professor since 2011 in the Department of Mathematics, Islamic University of Gaza, Palestine. In September 2015, Dr Mahdi was appointed as an adjunct faculty member in the Department of Statistics and Actuarial Sciences at the University of Western Ontario. His research interests include time series and regression analysis, multivariate data analysis, and spatio-temporal analysis.

PUBLIC INTEREST STATEMENT
One of the most important stages of constructing a statistical model is its diagnosis. If the model is well specified, then the serial autocorrelations of the residuals (the observable estimates of the unobservable statistical errors) should be close to zero. Portmanteau test statistics are used as diagnostic tools to check whether any group of autocorrelations of the residual time series is significantly different from zero. Based on this concept, and using the autocorrelation matrix defined by Peña and Rodríguez (2002), we derive a portmanteau test statistic. Our test statistic is essentially the same as the one proposed by Peña and Rodríguez (2006); however, the two have different distributions. The distribution of our proposed statistic is approximated by a standard normal using the approach described by Hong (1996a). We support the theoretical results with simulation experiments and provide a real application to demonstrate the usefulness of our test statistic in detecting underfit time series models.

Introduction
The autoregressive-moving-average, ARMA(p, q), models for the univariate time series Z_t can be written as

Φ_p(B) Z_t = Θ_q(B) a_t,  (1)

where Φ_p(B) = 1 − φ_1 B − ⋯ − φ_p B^p and Θ_q(B) = 1 − θ_1 B − ⋯ − θ_q B^q are polynomials in B of degrees p and q, respectively; p, q ≥ 0 are the orders of the autoregressive (AR) and moving-average (MA) parts, respectively; B is the backshift operator on t; and {a_t} is an uncorrelated white noise series with mean zero and finite variance. It is assumed that the model is stationary and invertible, with no common roots between Φ_p(B) and Θ_q(B) (Box & Jenkins, 1970; Brockwell & Davis, 2009).
Denote the variance, autocovariance, and autocorrelation functions of a_t by σ² = Var(a_t), γ_ℓ = cov(a_t, a_{t−ℓ}) = E(a_t a_{t−ℓ}), and r_ℓ = γ_ℓ/σ², for time lag ℓ ∈ ℤ, respectively. The null and alternative hypotheses are

H_0: r_1 = r_2 = ⋯ = r_m = 0  versus  H_a: r_ℓ ≠ 0 for at least one lag ℓ = 1, …, m.

The normalized spectral density of the stationary process {a_t} is

f(ω) = (2π)^{−1} ∑_{ℓ=−∞}^{∞} r_ℓ e^{−iℓω},  ω ∈ [−π, π].

The null hypothesis, H_0, is equivalent to the null normalized spectral density f(ω) = f_0(ω) = 1/(2π) (Anderson, 1993; Hong, 1996a, 1996b). After fitting the model in Equation (1) to a series of length n and obtaining the residuals, â_t, it is important to conduct some goodness-of-fit tests on these residuals to check the validity of the model assumptions (Li, 2004). Under the null hypothesis, which indicates that the model has been correctly identified, we expect the residuals to be approximately white noise. When there is no significant autocorrelation in the residuals, their sample autocorrelations satisfy

r̂_ℓ = ∑_{t=ℓ+1}^{n} â_t â_{t−ℓ} / ∑_{t=1}^{n} â_t² ≈ 0,  for ℓ = 1, 2, …, m ≤ n − 1,

where m is the largest lag considered for autocorrelation. On the other hand, when autocorrelation is present, these values should deviate significantly from zero.
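As a concrete check of this definition, the sample autocorrelations of a residual series can be computed directly. The following is a minimal Python sketch (the function name `sample_acf` is ours), assuming a mean-zero residual series, exactly as in the formula above:

```python
import numpy as np

def sample_acf(a, m):
    """Sample autocorrelations r_l = sum_{t=l+1}^n a_t a_{t-l} / sum_t a_t^2,
    for l = 1..m, of a (mean-zero) residual series."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    denom = np.sum(a ** 2)
    return np.array([np.sum(a[l:] * a[:n - l]) / denom
                     for l in range(1, m + 1)])
```

For a white noise residual series, each r̂_ℓ should be of order n^{−1/2}.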
Since the early 1970s, many researchers, including Box and Pierce (1970), Ljung and Box (1978), Monti (1994), Peña and Rodríguez (2002, 2006), Fisher and Gallagher (2012), and Mahdi and McLeod (2012), developed portmanteau test statistics to check the lack of fit of ARMA models. Although the asymptotic distributions of these test statistics were derived under the assumption that the time-series data are driven by independent and identically distributed (iid) Gaussian innovations, simulation experiments suggested that these tests remain asymptotically valid without the Gaussian assumption, provided that the innovations are zero-mean iid with finite fourth moment. In this respect, Hong (1996a, 1996b) implemented the kernel-based normalized spectral density approach to derive three classes of efficient portmanteau test statistics without restrictions on the distribution of the innovations except that they are iid with mean zero and finite fourth moment. Hong (1996a) showed that the distributions of these test statistics are asymptotically normal, with better power than the portmanteau tests of Box and Pierce (1970) and Ljung and Box (1978).
In this paper, we devise a portmanteau test statistic for ARMA models using the Toeplitz autocorrelation matrix defined by Peña and Rodríguez (2002). The proposed test statistic is essentially the same as the statistic provided by Peña and Rodríguez (2006); however, the two have different distributions. Under the null hypothesis of no serial correlation, we prove that the kernel-based normalized spectral density estimator described by Hong (1996a) can be used to approximate the distribution of the test statistic by a standard normal, without having to specify any alternative model.
In Section 2, a brief review of univariate portmanteau tests commonly employed for diagnostic checking in ARMA models is given. In Section 3, we propose a kernel-based test statistic using the Toeplitz autocorrelation matrix defined by Peña and Rodríguez (2002), where the kernel-based spectral density approach of Hong (1996a) is used to derive its asymptotic distribution. Section 4 provides simulation experiments to demonstrate the behaviour of the proposed test statistic and to compare its power with the Ljung and Box (1978) and Gallagher and Fisher (2015) test statistics. We close this article with Section 5 by presenting an illustrative application to real data demonstrating the usefulness of the proposed test.

Portmanteau test statistics for ARMA models
Several works in the time series literature consider portmanteau tests for diagnosing the adequacy of fitted ARMA models. In this section, we briefly review the most significant contributions.
The well-known portmanteau test statistics are

Q_m = n ∑_{ℓ=1}^{m} r̂_ℓ²,  Q̃_m = n(n + 2) ∑_{ℓ=1}^{m} (n − ℓ)^{−1} r̂_ℓ²,  Q̂_m = n(n + 2) ∑_{ℓ=1}^{m} (n − ℓ)^{−1} π̂_ℓ²,

where Q_m was proposed by Box and Pierce (1970), Q̃_m by Ljung and Box (1978), and Q̂_m by Monti (1994), with π̂_ℓ the residual partial autocorrelation at lag ℓ. The asymptotic distribution of each of these three statistics was derived as a chi-squared with m − p − q degrees of freedom. Peña and Rodríguez (2002) devised a generalized variance portmanteau test statistic based on the mth root of the determinant of the Toeplitz Hermitian residual autocorrelation matrix of order m + 1, R̂_m = [r̂_{i,j}; i, j = 0, 1, …, m], where r̂_{i,j} = 1 for i = j and r̂_{i,j} = r̂_{j−i} for i ≠ j, i.e. a matrix of the form

R̂_m =
  1         r̂_1       r̂_2      ⋯   r̂_m
  r̂_1      1          r̂_1      ⋯   r̂_{m−1}
  ⋮          ⋮           ⋮          ⋱   ⋮
  r̂_m      r̂_{m−1}   r̂_{m−2}  ⋯   1          (4)

They approximated the distribution of their proposed test statistic by a gamma distribution and provided simulation experiments demonstrating the improvement of their statistic over the one given by Ljung and Box (1978). Peña and Rodríguez (2006) modified the previous generalized variance test by taking the log of the (m + 1)th root of the determinant of R̂_m in Equation (4). They approximated the distribution of the modified test using the Gamma and Normal distributions, and indicated that the two approximations perform similarly for diagnostic checking in linear models and are more powerful for small sample sizes than their earlier statistic; the first approximation is a Gamma whose parameters depend on n, m, p, and q, whereas the second is standardized so that it is approximately N(0, 1). Mahdi and McLeod (2012) extended the Peña and Rodríguez (2002, 2006) tests to multivariate time series. The univariate test statistic of Mahdi and McLeod (2012) is

D_m = −3n(2m + 1)^{−1} log |R̂_m|,  (5)

where R̂_m is defined in Equation (4). They demonstrated that the null distribution of D_m is approximately chi-squared with 1.5m(m + 1)/(2m + 1) degrees of freedom. Recently, Fisher and Gallagher (2012) provided a portmanteau statistic consisting of a weighted sum of squared residual autocorrelation terms,

Q_W = n ∑_{ℓ=1}^{m} w_ℓ r̂_ℓ².  (6)

They utilized a gamma approximation similar to Peña and Rodríguez (2002) and derived the limiting distribution of their weighted portmanteau tests as a Gamma distribution. More recently, Gallagher and Fisher (2015) suggested three weighting schemes for the weights in Equation (6): the squared Daniell kernel-based weights suggested by Hong (1996a, 1996b), w_ℓ = (n + 2)(n − ℓ)^{−1} K²(ℓ/m); the geometrically decaying weights, w_ℓ = (p + q)a^{ℓ−1}, for some 0 < a < 1; and the data-adaptive weights, which give the data-adaptive weights test

Q_D = n ∑_{ℓ=1}^{m} w_ℓ r̂_ℓ²,  (7)

where the first m_0 terms obtain the standardizing weight w_ℓ = (n + 2)/(n − ℓ) from the Ljung-Box statistic, and the remaining weights (from m_0 + 1 to m) are selected from the data to be summable, w_ℓ = −log(1 − |r̂_ℓ|). In practice we select m_0 = ⌊log(n)⌋ or m_0 = ⌊p + q + 1⌋, where ⌊x⌋ is the floor function denoting the largest integer less than or equal to x. Note that if m_0 ≥ m, then the data-adaptive weights test in Equation (7) reduces to the classical Ljung-Box test statistic. Gallagher and Fisher (2015) showed that the weighted portmanteau tests can be more powerful for detecting underfit ARMA models in many situations and less sensitive to the choice of the maximum correlation lag than the other statistics from the literature, especially when m depends on n. They highlighted that the data-adaptive weights test statistic inherits desirable properties of both the Ljung-Box (AR underfit power) and Monti (MA underfit power) statistics.
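To make these statistics concrete, the following Python sketch (our own helper names, assuming a vector of residual autocorrelations r̂_1, …, r̂_m is already available) computes the Box-Pierce and Ljung-Box statistics and a generalized-variance statistic of the log-determinant type built from the Toeplitz matrix of Equation (4):

```python
import numpy as np

def box_pierce(r, n):
    # Q_m = n * sum of squared residual autocorrelations
    r = np.asarray(r, dtype=float)
    return n * np.sum(r ** 2)

def ljung_box(r, n):
    # Q~_m = n(n+2) * sum r_l^2 / (n - l)
    r = np.asarray(r, dtype=float)
    lags = np.arange(1, len(r) + 1)
    return n * (n + 2) * np.sum(r ** 2 / (n - lags))

def gen_variance(r, n):
    # Log-determinant statistic -3n/(2m+1) * log|R_m|, with R_m the
    # (m+1)x(m+1) Toeplitz correlation matrix of Equation (4)
    r = np.asarray(r, dtype=float)
    m = len(r)
    first = np.concatenate(([1.0], r))
    idx = np.abs(np.subtract.outer(np.arange(m + 1), np.arange(m + 1)))
    sign, logdet = np.linalg.slogdet(first[idx])
    return -3.0 * n / (2 * m + 1) * logdet
```

When all r̂_ℓ are zero, R̂_m is the identity matrix and all three statistics vanish, in line with the null-hypothesis discussion above.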
The problem is that, for large order p + q and small lag m, the previously mentioned test statistics for ARMA(p, q) models are not defined. Hong (1996a, 1996b) addressed the problem of testing the null hypothesis of no serial correlation of unknown form in the residual series without having to specify any alternative model. He proposed three classes of portmanteau test statistics using kernel-based normalized spectral density estimators of f(ω), defined here as

f̂_n(ω) = (2π)^{−1} ∑_{ℓ=1−n}^{n−1} k(ℓ/m_n) r̂_ℓ e^{−iℓω},  ω ∈ [−π, π],  (8)

where k(⋅) is a nonnegative symmetric kernel function and m_n is a bandwidth that depends on the sample size (Priestley, 1981), with m_n → ∞ and m_n/n → 0. The tests measure the distance (quadratic norm, Hellinger metric, and Kullback-Leibler information criterion) between a kernel-based normalized spectral density estimator and the null normalized spectral density; each was shown to be asymptotically standard normal.
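The Daniell kernel and the estimator in Equation (8) can be sketched in a few lines of Python (a simplified illustration under our own naming; note that `np.sinc(z)` is exactly sin(πz)/(πz)):

```python
import numpy as np

def daniell(z):
    # Daniell kernel k(z) = sin(pi z)/(pi z), with k(0) = 1
    return np.sinc(np.asarray(z, dtype=float))

def f_hat(omega, r, m_n):
    # Kernel-based normalized spectral density estimator of Equation (8),
    # using the symmetry r_{-l} = r_l and r_0 = 1; the vector r holds the
    # sample autocorrelations at lags 1, 2, ..., n-1.
    r = np.asarray(r, dtype=float)
    lags = np.arange(1, len(r) + 1)
    w = daniell(lags / m_n)
    return (1.0 + 2.0 * np.sum(w * r * np.cos(lags * omega))) / (2.0 * np.pi)
```

With all r̂_ℓ equal to zero, the estimator reduces to the null density f_0(ω) = 1/(2π) at every frequency.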

Asymptotic distribution
In this section, under the null hypothesis of no serial correlation, and without having to specify any alternative model, we use the Toeplitz autocorrelation matrix defined by Peña and Rodríguez (2002) to derive a portmanteau test statistic. The portmanteau test statistic is essentially the same as the one proposed by Peña and Rodríguez (2006) and can be seen as the Kullback-Leibler discrimination information test discussed by Hong (1996a). We derive the asymptotic distribution of the portmanteau test statistic as N(0, 1). Using Szegő's theorem (Grenander & Szegő, 1958), which links the asymptotic behaviour (as m grows large) of the eigenvalues, {λ̂_{m,i}; i = 0, 1, …, m}, of R̂_m to the integral of the spectral density function, we get

(m + 1)^{−1} log |R̂_m| = (m + 1)^{−1} ∑_{i=0}^{m} log λ̂_{m,i} → (2π)^{−1} ∫_{−π}^{π} log(2π f(ω)) dω,  as m → ∞,

where f is a positive function associated with positive eigenvalues.
It follows from Hadamard's inequality for the determinant of a positive definite matrix that log |R̂_m| ≤ 0. When there is no significant autocorrelation in the residuals, r̂_ℓ = O(n^{−1/2}), so R̂_m is approximately the identity matrix and hence log |R̂_m| ≈ 0; on the other hand, when autocorrelation is present, log |R̂_m| is expected to be less than 0. That is, under the null hypothesis H_0, R̂_m ≈ R_m = [r_{i,j}; i, j = 0, 1, …, m], where r_{i,j} = 1 for i = j and 0 otherwise. Hence the difference between (m + 1)^{−1} log |R_m| and (m + 1)^{−1} log |R̂_m| will be

(m + 1)^{−1} log |R_m| − (m + 1)^{−1} log |R̂_m| ≈ −(2π)^{−1} ∫_{−π}^{π} log( f(ω)/f_0(ω) ) dω,  (11)

where f_0(ω) = 1/(2π) is the null normalized spectral density.
Note that the integral in Equation (11) is the same as the Kullback-Leibler discrimination information (Kullback & Leibler, 1951; Kullback, 1959). Note also that Hong (1996a) used the Kullback-Leibler information criterion as an entropy-based test to measure the divergence between the two spectral densities f(ω) and f_0(ω) as follows:

I(f; f_0) = −(2π)^{−1} ∫_{−π}^{π} log( f(ω)/f_0(ω) ) dω.

Under the iid assumption on a_t, with m_n → ∞ and m_n/n → 0, Hong (1996a) showed that

[ n I(f̂_n; f_0) − C_n(k) ] / [ 2 D_n(k) ]^{1/2}  ⇒  N(0, 1),  (12)

where f̂_n is the kernel spectral density estimate of f defined in Equation (8), C_n(k) = ∑_{ℓ=1}^{n−1} (1 − ℓ/n) k²(ℓ/m_n), D_n(k) = ∑_{ℓ=1}^{n−2} (1 − ℓ/n)(1 − (ℓ + 1)/n) k⁴(ℓ/m_n), lags of order ℓ > m_n receive zero weight for truncated kernels, and ⇒ N(0, 1) denotes convergence in distribution to a standard normal. Hong (1996a) indicated that the distribution given in Equation (12) still holds if C_n(k) and D_n(k) are replaced by their asymptotic approximations. Hong (1996a, 1996b) demonstrated that the Daniell kernel,

k(z) = sin(πz)/(πz),  −∞ < z < ∞,  (13)

is optimal in the sense of maximizing the power of the proposed tests under the alternative hypotheses, and showed that the Daniell kernel tests have better power than the portmanteau tests of Box and Pierce (1970) and Ljung and Box (1978). The kernel k(⋅) is assumed to be symmetric, continuous at zero and at all but a finite number of points, with k(0) = 1 and ∫ k²(z) dz < ∞. In addition, the fourth joint cumulant of the distribution of {a_t, a_{t+i}, a_{t+j}, a_{t+l}},

κ(i, j, l) = E(a_t a_{t+i} a_{t+j} a_{t+l}) − E(ã_t ã_{t+i} ã_{t+j} ã_{t+l}),

is assumed to be absolutely summable, where {ã_t} is a Gaussian sequence with the same mean and covariance as {a_t}; that is, if {a_t} is a fourth-order stationary linear process with absolutely summable coefficients and innovations whose fourth moments exist, the cumulant condition also holds.
Following the kernel-based normalized density approach in Hong (1996a), under the above-mentioned assumptions, one can show that −n(m + 1)^{−1} log |R̂_m| is asymptotically equal to n I(f̂_n; f_0). In this respect, we propose the kernel-based portmanteau test statistic

D*_m = [ −n(m + 1)^{−1} log |R̂_m| − C_n(k) ] / [ 2 D_n(k) ]^{1/2},

where C_n(k) = ∑_{ℓ=1}^{n−1} (1 − ℓ/n) k²(ℓ/m) and D_n(k) = ∑_{ℓ=1}^{n−2} (1 − ℓ/n)(1 − (ℓ + 1)/n) k⁴(ℓ/m), and k(⋅) is the kernel, which can be computed from Equation (13). It is worth noting that the test statistic D*_m is defined for all lags m < n regardless of the ARMA order p + q.
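A direct implementation of the proposed statistic can be sketched in Python. The name `proposed_stat` and the exact handling of the kernel sums are our own reading of the construction, with the Daniell kernel as in Equation (13):

```python
import numpy as np

def proposed_stat(r, n, m):
    """Kernel-based statistic (-n/(m+1) * log|R_m| - C_n) / sqrt(2 * D_n),
    with the Daniell kernel k(z) = sin(pi z)/(pi z).  The vector r holds
    the residual autocorrelations at lags 1..m."""
    r = np.asarray(r, dtype=float)
    # Toeplitz correlation matrix of Equation (4)
    idx = np.abs(np.subtract.outer(np.arange(m + 1), np.arange(m + 1)))
    R = np.concatenate(([1.0], r))[idx]
    sign, logdet = np.linalg.slogdet(R)
    # Standardization constants C_n(k) and D_n(k)
    l1 = np.arange(1, n)
    C = np.sum((1 - l1 / n) * np.sinc(l1 / m) ** 2)
    l2 = np.arange(1, n - 1)
    D = np.sum((1 - l2 / n) * (1 - (l2 + 1) / n) * np.sinc(l2 / m) ** 4)
    return (-n / (m + 1) * logdet - C) / np.sqrt(2 * D)
```

Because log |R̂_m| ≤ 0 by Hadamard's inequality, any residual autocorrelation pushes the statistic upward, so the test rejects for large positive values.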

Simulation study
In this section, we conduct numerical simulation experiments to explore the performance of the proposed portmanteau test statistic in finite samples. We investigate the accuracy of its approximating distribution in estimating the empirical type I error rates. We also conduct comparison studies between the empirical power of the classical Ljung-Box test, Q_m, the data-adaptive weights test of Gallagher-Fisher, Q_D, and the proposed test, D*_m, against some ARMA(p, q) models. The main objective of our simulation is to show that the proposed test performs as well as Q_m and Q_D and is more powerful in many cases. In our simulation, we consider the Daniell kernel-based weights suggested by Hong (1996a, 1996b). The simulations were run on a quad-core personal computer using the packages portes (Mahdi, Xiao, & McLeod, 2014) and WeightedPortTest (Fisher & Gallagher, 2012), which are available from the CRAN website (R Development Core Team, 2015).

Empirical size
Here we investigate the accuracy of the D*_m statistic by comparing the empirical significance levels with the corresponding nominal levels at several lags m and sample sizes n. The type I error rates at nominal levels 1, 5, and 10% have been evaluated under fitted AR(1) models with different AR coefficients using the approximating distribution, based on 10⁴ iterations. We also evaluate the empirical size of the proposed statistic as the lag increases in the case of a properly fitted ARMA(1, 1) model of sample size n = 150, based on 10⁴ iterations. We use the Gaussian and uniform distributions to generate the sequence of innovations. The results are summarized in Figures 1-3 and Tables 1 and 2. Figure 1 shows the empirical rejection rate using the distribution of D*_m. The AR(1) model with parameter φ = 0.7 is simulated 10⁴ times for Gaussian series of lengths n = 50, 100, 200, and 500, and the empirical distribution of D*_m at lag m = 10 is compared with its asymptotic distribution using a Q-Q plot with corresponding levels 0.01 and 0.05i, where i = 1, 2, …, 19. Figure 1 shows that the empirical rate gets closer to its nominal 5% level as n increases. In Figure 2, the empirical rejection rates and their 95% confidence intervals are shown for the lags m = 10, 15, 20, and 25 with 10⁴ simulations based on the non-Gaussian (uniform) innovation series. The 95% confidence intervals are indicated by the vertical lines in Figure 2. From the plots in this figure, we see that the empirical rate varies between 0.039 and 0.060, so that the test is not conservative. It can also be seen from Figure 2 that the empirical rate gets a little closer to its nominal 5% level as n increases.
Although the simulation results suggest that the empirical size of the D*_m statistic based on the uniform innovation distribution worsens as m increases, it is not significantly different from the corresponding nominal level based on the Gaussian innovation distribution in most cases (Tables 1 and 2). This suggests that the proposed test statistic is robust to the distribution of the noise, provided the innovations are iid (but not necessarily Gaussian).
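A scaled-down version of such an empirical-size experiment can be sketched as follows. This is our own simplified design (Gaussian white noise, no model fitting, and the Ljung-Box statistic compared against the χ²₁₀ 5% critical value 18.307), whereas the paper's experiments use the proposed statistic and 10⁴ replications:

```python
import numpy as np

rng = np.random.default_rng(2024)
n, m, reps, crit = 200, 10, 2000, 18.307   # chi-square(10), 5% critical value
rejections = 0
for _ in range(reps):
    a = rng.standard_normal(n)             # white noise under the null
    denom = np.sum(a ** 2)
    r = np.array([np.sum(a[l:] * a[:n - l]) / denom for l in range(1, m + 1)])
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, m + 1)))
    rejections += int(q > crit)
rate = rejections / reps
print("empirical size:", rate)             # should be near the nominal 0.05
```

A rate close to 0.05 indicates that the nominal level is well approximated, mirroring the pattern reported in Figures 1 and 2.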

Power comparisons
It is worth noting that, for ARMA models, Gallagher and Fisher (2015) demonstrated that the power of the data-adaptive weights test statistic, Q_D, is always larger than the power of the classical test statistic of Box and Pierce (1970), and in most cases it performs better than Ljung and Box (1978), Peña and Rodríguez (2002), and Peña and Rodríguez (2006). In this respect, we conduct a comparison study between the empirical power of the proposed test statistic and the empirical powers of the Ljung-Box (Q_m) and the data-adaptive weights (Q_D) test statistics against some ARMA(p, q) models. We use the same models originally used by Monti (1994), which appear in several later articles (Fisher & Gallagher, 2012; Gallagher & Fisher, 2015; Peña & Rodríguez, 2002, 2006). Tables 3 and 4 provide the empirical powers for a nominal 5% level of the Q_m, Q_D, and D*_m tests when a series of 12 different ARMA(2, 2) processes (each of size n = 100) are inadequately fitted as an AR(1) and tested at lag m = 10. The results in Table 3 are obtained from data driven by Gaussian iid innovations, whereas the results in Table 4 are obtained from data with uniform noise.
We see from the results in Tables 3 and 4 that the statistics Q_D and D*_m are consistently more powerful than the Ljung-Box test statistic. Generally, the simulation results show that the proposed test performs as well as Q_D and provides more power in some situations.
Furthermore, the results suggest that the powers associated with data driven by Gaussian innovations are not significantly different from the corresponding powers associated with data driven by non-Gaussian innovations.
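The flavour of such a power experiment can be reproduced on a small scale. The sketch below (our own simplified design, not the paper's exact one) simulates an MA(2) process, deliberately underfits it as an AR(1) via a Yule-Walker estimate, and estimates the rejection rate of the Ljung-Box statistic at m = 10 against the χ²₉ 5% critical value 16.919 (df = m − p − q with p = 1, q = 0):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, reps, crit = 100, 10, 1000, 16.919
hits = 0
for _ in range(reps):
    a = rng.standard_normal(n + 2)
    z = a[2:] + 0.6 * a[1:-1] + 0.5 * a[:-2]       # true model: MA(2)
    phi = np.sum(z[1:] * z[:-1]) / np.sum(z ** 2)  # Yule-Walker AR(1) estimate
    e = z[1:] - phi * z[:-1]                       # residuals of the underfit
    ne = len(e)
    denom = np.sum(e ** 2)
    r = np.array([np.sum(e[l:] * e[:ne - l]) / denom for l in range(1, m + 1)])
    q = ne * (ne + 2) * np.sum(r ** 2 / (ne - np.arange(1, m + 1)))
    hits += int(q > crit)
print("empirical power:", hits / reps)
```

Because the AR(1) fit cannot absorb the lag-2 moving-average structure, the residuals stay correlated and the rejection rate is well above the nominal 5% level.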

An illustrative application
Here, we are not interested in selecting the best-fitting model; the main objective of this application is to demonstrate that the proposed test statistic can be useful for investigating whether the autocorrelations of the residuals of an ARMA model are different from zero.
In this section, we make use of the monthly Recruitment (number of new fish) series. The data are available from the R package astsa under the name rec, ranging over the years 1950-1987 with 453 observations (Shumway & Stoffer, 2011). Although Figure 4 suggests that the Recruitment series tends to exhibit repetitive behavior every 12 months, Shumway and Stoffer (2011, p. 108) indicated that a second-order (p = 2) autoregressive model might provide a good fit.
Following the recommendation of Shumway and Stoffer (2011), we purposely fit the AR(2) model to the Recruitment series, and then investigate how the different portmanteau statistics detect the underfit at several lags (see Table 5). Using the 5% significance level and considering the lags
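This diagnostic workflow can be imitated in Python without the astsa package. The sketch below simulates an AR(2) stand-in for the Recruitment series (the coefficients 1.35 and −0.46 are close to the AR(2) estimates reported by Shumway and Stoffer for rec; the stand-in and all names are ours), refits an AR(2) by Yule-Walker, and computes a Ljung-Box-type statistic on the residuals at several lags:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 453
# Simulated AR(2) stand-in: z_t = 1.35 z_{t-1} - 0.46 z_{t-2} + a_t,
# with a burn-in of 100 observations discarded.
a = rng.standard_normal(n + 100)
z = np.zeros(n + 100)
for t in range(2, n + 100):
    z[t] = 1.35 * z[t - 1] - 0.46 * z[t - 2] + a[t]
z = z[100:]

# Yule-Walker fit of AR(2) from the first two sample autocorrelations
r1 = np.sum(z[1:] * z[:-1]) / np.sum(z ** 2)
r2 = np.sum(z[2:] * z[:-2]) / np.sum(z ** 2)
phi2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
phi1 = r1 * (1 - phi2)
e = z[2:] - phi1 * z[1:-1] - phi2 * z[:-2]   # AR(2) residuals

# Ljung-Box-type diagnostic on the residuals at several lags
ne = len(e)
denom = np.sum(e ** 2)
for m in (5, 10, 20):
    r = np.array([np.sum(e[l:] * e[:ne - l]) / denom for l in range(1, m + 1)])
    q = ne * (ne + 2) * np.sum(r ** 2 / (ne - np.arange(1, m + 1)))
    print(f"m = {m:2d}  Q = {q:6.2f}")
```

Since the AR(2) model is correctly specified here, the statistics should be unexceptional relative to their chi-squared references; an underfit (e.g. fitting AR(1) instead) would inflate them, which is exactly what Table 5 examines for the real series.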

Conclusion
Most existing portmanteau test statistics are not consistent against serial correlation of unknown form. Moreover, for ARMA(p, q) models these tests are not defined in many cases where p + q is large and m is small. In this paper, we proposed a kernel-based consistent portmanteau test statistic for residual serial correlation of unknown form (i.e. when no prior information about the true alternative is available). Regardless of the ARMA order p + q, the proposed test is defined for all lags m < n, where n is the series length. We derived the test statistic from the Hermitian Toeplitz autocorrelation matrix defined by Peña and Rodríguez (2002). Essentially, the proposed test statistic is the same as that of Peña and Rodríguez (2006); however, the two have different distributions. The asymptotic distribution of the proposed statistic is derived as a standard normal distribution. We conducted simulation studies to demonstrate the efficacy, in terms of power, of the proposed statistic in finite samples. The simulation results suggest that the proposed test statistic is robust to the distribution of the noise, provided the innovations are iid (but not necessarily Gaussian). The ideas and procedures in this paper can perhaps be extended in several ways to check the goodness of fit of other univariate and multivariate time series models. For instance, the methods of Francq and Raïssi (2007) and Francq, Roy, and Zakoïan (2005) could be modified using the ideas in this article to provide new portmanteau test statistics with higher power to detect underfit time series models when the underlying noise process is assumed to be uncorrelated rather than independent (weakly dependent white noise).
It may also be possible to combine the Toeplitz autocorrelation matrix, as defined by Peña and Rodríguez (2002), with the kernel-based normalized density method, as described by Hong (1996a), to derive a new portmanteau test statistic for checking the adequacy of fitted long-memory and nonlinear models, as in McLeod and Li (1983).