Sparse transition matrix estimation for high-dimensional and locally stationary vector autoregressive models

We consider the estimation of the transition matrix in high-dimensional time-varying vector autoregression (TV-VAR) models. Our model builds on a general class of locally stationary VAR processes that evolve smoothly in time. We propose a method that hybridizes kernel smoothing and $\ell^1$-regularization to directly estimate the sequence of time-varying transition matrices. Under a sparsity assumption on the transition matrix, we establish the rate of convergence of the proposed estimator and show that the convergence rate depends on the smoothness of the locally stationary VAR processes only through the smoothness of the transition matrix function. In addition, for our estimator followed by thresholding, we prove that the false positive rate (type I error) and false negative rate (type II error) in the pattern recovery can asymptotically vanish in the presence of weak signals without assuming a minimum nonzero signal strength condition. Favorable finite-sample performance over the $\ell^2$-penalized least-squares estimator and the unstructured maximum likelihood estimator is shown on simulated data. We also provide two real-data examples on estimating the dependence structures of financial stock prices and economic exchange rates.


Introduction
Vector autoregression (VAR) is a basic tool in multivariate time series analysis and has been extensively used to model cross-sectional and serial dependence in various applications from economics and finance [31,3,4,10,32,14,34,13,1]. There are two fundamental limitations of vector autoregressive models. First, conventional methods to estimate the transition matrix of a VAR model are based on the least squares (LS) estimator and the maximum likelihood estimator (MLE), for which the parameter estimation is consistent when the sample size increases and the model size is fixed [8,25]. Since the number of parameters grows quadratically in the number of time series variables, the VAR model typically includes no more than ten variables in many real applications [18]. However, due to the recent explosive data enrichment, panel data with a few hundred variables are often encountered, for which the LS estimator and the MLE are not suitable even for a moderate problem size. Second, stationarity plays a major role in the VAR model. Therefore, the stationary VAR does not capture time-varying underlying data generation structures, which have been observed in a broad range of applications in economics and finance [2,27]. Motivated by these limitations, this paper studies the estimation problem of the time-varying VAR (TV-VAR) model for high-dimensional time series data.

Let $X_{d\times n} = (x_1, \cdots, x_n)$ be a sequence of $d$-dimensional observations generated by a mean-zero TV-VAR of order 1 (TV-VAR(1)),
$$x_i = A_i x_{i-1} + e_i, \qquad i = 1, \cdots, n, \qquad (1.1)$$
where $A(t)$, $t \in [0, 1]$, is a $d \times d$ matrix-valued deterministic function consisting of the transition matrices $A_i := A(i/n)$ at evenly spaced time points, and the $e_i$ are independent and identically distributed (iid) mean-zero random errors, i.e. innovations. In this paper, our main focus is to estimate the transition matrices $A_i$ for the TV-VAR(1); the extension to higher-order VAR is straightforward. Indeed, for a general TV-VAR of order $k \ge 1$ with lag matrices $A_{i,1}, \cdots, A_{i,k}$, we can rewrite $z_i = (x_i^\top, \cdots, x_{i-k+1}^\top)^\top$ at time $i = k, k+1, \cdots, n$, as a TV-VAR(1) in the augmented space, $z_i = \tilde A_i z_{i-1} + \tilde e_i$, where
$$\tilde A_i = \begin{pmatrix} A_{i,1} & A_{i,2} & \cdots & A_{i,k-1} & A_{i,k} \\ I_{d\times d} & 0_{d\times d} & \cdots & 0_{d\times d} & 0_{d\times d} \\ \vdots & & \ddots & & \vdots \\ 0_{d\times d} & 0_{d\times d} & \cdots & I_{d\times d} & 0_{d\times d} \end{pmatrix}, \qquad \tilde e_i = (e_i^\top, 0, \cdots, 0)^\top,$$
$I_{d\times d}$ is the $d \times d$ identity matrix and $0_{d\times d}$ is the $d \times d$ zero matrix. Then, we need only to estimate the first $d$ rows in $\tilde A_i$.
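As a small illustration of this augmentation, the following sketch assembles the companion transition matrix from the lag matrices; the block layout follows the companion form written above, and the function name is ours.

```python
import numpy as np

def companion_blocks(A_list):
    """Stack the lag matrices A_{i,1}, ..., A_{i,k} (each d x d) into the
    (dk x dk) companion transition matrix of the augmented VAR(1)."""
    k = len(A_list)
    d = A_list[0].shape[0]
    top = np.hstack(A_list)                                        # first d rows: [A_1 ... A_k]
    sub = np.hstack([np.eye(d * (k - 1)), np.zeros((d * (k - 1), d))])  # identity blocks on the subdiagonal
    return np.vstack([top, sub])
```

Only the first $d$ rows of the output carry unknown parameters, which is why the higher-order case reduces to the TV-VAR(1) estimation problem.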
To make the estimation problem feasible for high-dimensional time series when $d$ is large, it is crucial to carefully regularize the coefficient matrices $A_i$. The main idea is to exploit certain low-dimensional structures in $A_i$ so that the degrees of freedom of (1.1) can be dramatically reduced. In our problem, we assume two key structures in $A_i$. First, since our goal is to estimate a sequence of transition matrices, which can be viewed as a discrete version of a matrix-valued function, it is natural to impose a smoothness condition on $A(t)$. In this case, (1.1) is closely related to the locally stationary processes, a general class of non-stationary processes proposed by [15]. Examples of other linear and nonlinear locally stationary processes can be found in [11]. In particular, let $i_0 = 1, \cdots, n$, be any time point and $t_0 = i_0/n$. Then, under suitable regularity conditions [16], there exists a stationary process $\tilde x_i(t_0)$ such that for all $j = 1, \cdots, d$,
$$\big\| X_{ij} - \tilde X_{ij}(t_0) \big\| \le C\Big( \Big|\frac{i}{n} - t_0\Big| + \frac{1}{n} \Big),$$
where $X_{ij}$ and $\tilde X_{ij}(t_0)$ are the $j$-th elements of $x_i$ and $\tilde x_i(t_0)$, respectively, and $\tilde x_i(t_0)$ is the stationary VAR process obtained by freezing the transition matrix at $t_0$, i.e. $\tilde x_i(t_0) = A(t_0)\tilde x_{i-1}(t_0) + e_i$. Therefore, $x_i$ behaves approximately like the stationary VAR process $\tilde x_i(t_0)$ in a small neighborhood of $t_0$. Second, at each time point $i = 1, \cdots, n$, we need to estimate a $d \times d$ matrix. There have been different structural options in the literature, such as the sparse [21,35,33,30], banded [19], and low-rank [22] transition matrix, all of which only considered the stationary VAR model. In this paper, we consider a sequence of sparse transition matrices and we allow the sparsity patterns to change over time. Note that our problem is also different from the emerging literature on estimating the high-dimensional covariance matrix and its related functionals for time series data [11,12,37,5], since our goal is to directly estimate the data generation mechanism specified by $A(\cdot)$.
To simultaneously address the two issues, we propose a method that hybridizes nonparametric kernel smoothing and $\ell^1$-regularization to estimate the sparse transition matrices of the locally stationary VAR. The proposed method is equivalent to solving a large number of linear programs, and therefore the estimates can be efficiently obtained by using high-performance (i.e. parallel) computing; see Section 2 for details. In Section 3, we establish the rate of convergence under suitable assumptions on the smoothness of the covariance function and the sparsity of the transition matrix function $A(\cdot)$. Specifically, the dimension $d$ is permitted to increase sub-exponentially fast in the sample size $n$ while retaining consistent estimation, i.e. $d = o(\exp(n))$. In addition, we also prove that when our estimator is followed by thresholding, the type I and type II errors in the pattern recovery asymptotically vanish in the presence of weak signals. In contrast with the existing literature on consistent model selection, we do not require the minimum nonzero signal strength to be bounded away from zero. Simulation studies in Section 4 and two real data examples in Section 5 demonstrate favorable performance and more interpretable results of the proposed TV-VAR model. Technical proofs are deferred to Section 6.
We fix the notation that will be used in the rest of the paper. Let $M$ be a generic matrix and $I \subset \mathbb N$ an index subset, and let $[M]_{*I}$ and $[M]_{I*}$ be the submatrices of $M$ with columns and rows indexed by $I$, respectively. $\mathrm{supp}(M) = \{(j,k) : M_{jk} \neq 0\}$ is the support of $M$ and $|S|$ is the number of nonzeros in the support set $S$. For $q > 0$ and a random variable (r.v.) $X$, $\|X\|_q = (\mathbb E|X|^q)^{1/q}$ and we say that $X \in \mathcal L^q$ iff $\|X\|_q < \infty$. Write $\|X\| = \|X\|_2$. Denote $a \wedge b = \min(a, b)$ and $a \vee b = \max(a, b)$. If $a$ and $b$ are vectors, then the maximum and minimum are interpreted element-wise.

Method and algorithm
From (1.1) and the fact that the innovation $e_i$ is uncorrelated with $x_{i-1}$, we obtain the Yule-Walker equations
$$\Sigma_{i,1} = \Sigma_{i,0} A_i^\top, \qquad (2.1)$$
where $\Sigma_{i,0} = \mathbb E(x_{i-1} x_{i-1}^\top)$ and $\Sigma_{i,1} = \mathbb E(x_{i-1} x_i^\top)$ are the marginal and lag-one covariance matrices. Therefore, for any estimator, say $\hat A_i$, that is reasonably close to the true coefficient matrix $A_i$, the relation (2.1) must approximately hold with $\hat A_i$ in place of $A_i$. A naive estimator for $A_i$ would be constructed by inverting the sample versions of (2.1). Estimators of this kind do not have good statistical properties in high dimensions because of ill-conditioning, and the dependence information in $A_i$ is not directly used in the estimation. If $A_i$ is known to be sparse a priori, we may instead consider the constrained minimization program
$$\min_{M \in \mathbb R^{d\times d}} |M|_1 \quad \text{subject to} \quad \big| \Sigma_{i,0} M - \Sigma_{i,1} \big|_\infty \le \tau, \qquad (2.2)$$
where $|M|_1$ is the elementwise $\ell^1$ norm, and take the transpose of the solution as an estimate of $A_i$. Because $\Sigma_{i,1}$ and $\Sigma_{i,0}$ are unknown, the solution of (2.2) is an oracle estimator and therefore it is not implementable in practice.
Let $A(\cdot)$ be the continuous version of the transition matrix sequence $\{A_i\}$, and fix a $t \in (0, 1)$. To estimate $A(t)$, we first estimate $\Sigma_{i,1}$ and $\Sigma_{i,0}$ in (2.1) by their empirical versions. Let
$$\hat\Sigma_{t,\ell} = \sum_{m} w(t, m)\, x_{m-\ell}\, x_m^\top, \qquad \ell = 0, \pm 1, \qquad (2.3)$$
with the sum taken over all $m$ for which both $x_m$ and $x_{m-\ell}$ are observed, be the kernel smoothed estimators of $\Sigma_{i,0}$, $\Sigma_{i,1}$ and $\Sigma_{i,-1}$, where the $w(t, m)$ are nonnegative weights. Here, we consider the Nadaraya-Watson smoothed estimator with weights
$$w(t, m) = \frac{K\big( (m/n - t)/b_n \big)}{\sum_{m'=1}^{n} K\big( (m'/n - t)/b_n \big)}, \qquad (2.4)$$
where $K(\cdot)$ is a nonnegative bounded kernel function with support in $[-1, 1]$ such that $\int_{-1}^{1} K(v)\,dv = 1$ and $b_n$ is the bandwidth parameter. We assume throughout the paper that the bandwidth satisfies the natural conditions $b_n = o(1)$ and $n^{-1} = o(b_n)$. Then, our estimator $\hat A(t)$ is defined as the transpose of the solution of
$$\min_{M \in \mathbb R^{d\times d}} |M|_1 \quad \text{subject to} \quad \big| \hat\Sigma_{t,0}\, M - \hat\Sigma_{t,1} \big|_\infty \le \tau. \qquad (2.5)$$
Observe that (2.5) is equivalent to the following $d$ optimization sub-problems,
$$\hat\beta_j(t) = \arg\min_{\beta \in \mathbb R^d} |\beta|_1 \quad \text{subject to} \quad \big| \hat\Sigma_{t,0}\, \beta - [\hat\Sigma_{t,1}]_{*j} \big|_\infty \le \tau, \qquad j = 1, \cdots, d, \qquad (2.6)$$
in that $[\hat A(t)]_{j*} = \hat\beta_j(t)^\top$. Since the $d$ sub-problems (2.6) can be independently solved, we can efficiently compute the solution of (2.5) by parallelizing the optimizations in (2.6). In addition, each sub-problem in (2.6) can be recast as a linear program (LP) by writing $\beta = \beta^+ - \beta^-$ with $\beta^+, \beta^- \ge 0$ and minimizing $\mathbf 1^\top(\beta^+ + \beta^-)$ subject to the corresponding linear inequality constraints.
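To make the procedure concrete, the following Python sketch implements the kernel-smoothed covariances (2.3)-(2.4) and solves each column sub-problem (2.6) as a linear program via scipy.optimize.linprog. The function names, the Epanechnikov kernel choice, and the handling of boundary indices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def smoothed_cov(X, t, b, lag=0):
    """Nadaraya-Watson smoothed lag-`lag` autocovariance at rescaled time t in (0, 1).
    X is an (n, d) array with rows x_1, ..., x_n; b is the bandwidth b_n."""
    n, d = X.shape
    u = (np.arange(1, n + 1) / n - t) / b
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)   # Epanechnikov kernel
    idx = np.arange(lag, n)                                      # pairs (x_{m-lag}, x_m)
    w = w[idx] / w[idx].sum()                                    # normalized NW weights
    S = np.zeros((d, d))
    for k, m in enumerate(idx):
        S += w[k] * np.outer(X[m - lag], X[m])
    return S

def tv_var_estimate(X, t, b, tau):
    """Solve the d column sub-problems (2.6) by linear programming and return
    the transition matrix estimate A_hat(t) (the transpose of the solution)."""
    d = X.shape[1]
    S0 = smoothed_cov(X, t, b, lag=0)
    S1 = smoothed_cov(X, t, b, lag=1)
    # beta = bp - bm with bp, bm >= 0; minimize 1'(bp + bm)
    c = np.ones(2 * d)
    A_ub = np.vstack([np.hstack([S0, -S0]), np.hstack([-S0, S0])])
    M = np.zeros((d, d))
    for j in range(d):
        g = S1[:, j]
        b_ub = np.concatenate([tau + g, tau - g])                # |S0 beta - g|_inf <= tau
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        M[:, j] = res.x[:d] - res.x[d:]                          # assumes the LP is feasible for tau
    return M.T
```

The LP reformulation uses the standard positive/negative split $\beta = \beta^+ - \beta^-$, so each sub-problem has $2d$ variables and $2d$ inequality constraints, and the $d$ columns can be solved in parallel.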
To establish an asymptotic theory for estimating the continuous matrix-valued transition function $A(\cdot)$ of the non-stationary VAR, it is more convenient to model the time series as realizations from a continuous $d$-dimensional process. Following the framework of [17], we call $\{x_m\}_{m=1}^n$ a weakly locally stationary VAR process (Definition 3.1) if, for every $t_0 \in (0, 1)$, there exists a stationary VAR process $\tilde x_m(t_0)$, defined in (3.1) by freezing the transition matrix at $A(t_0)$, such that for all $m = 1, \cdots, n$ and $j = 1, \cdots, d$,
$$\big\| X_{mj} - \tilde X_{mj}(t_0) \big\| \le C\Big( \Big|\frac{m}{n} - t_0\Big| + \frac{1}{n} \Big), \qquad (3.2)$$
where the constant $C > 0$ does not depend on $d$, $m$, and $n$.
Note that the approximating stationary VAR process in Definition 3.1 generally depends on $t$. As suggested by the Yule-Walker equations (2.1), given a fixed time of interest, estimation of $A(t)$ relies on an estimate of $\Sigma_{t,\ell}$. Consider $\ell = 0$ and let $\Sigma(t) = \Sigma_{t,0}$ be the matrix-valued covariance function at lag zero. With a finite number of observations $x_1, \cdots, x_n$, it is unclear how to define the covariance function $\Sigma(\cdot)$ off the $n$ points $t_i = i/n$. Nevertheless, the weakly locally stationary VAR processes provide a natural framework for extending $\Sigma(t_i)$ to $\Sigma(t)$ for all $t \in (0, 1)$.
Since $A(\cdot)$ is continuous on $[0, 1]$, the stationary VAR process $\tilde x_m(t)$ in (3.1) is defined for all $t \in (0, 1)$. Let $\tilde\Sigma(t) = \mathbb E[\tilde x_m(t)\tilde x_m(t)^\top]$. By (3.2) and the Cauchy-Schwarz inequality, we have for each $j, k = 1, \cdots, d$,
$$\big| \Sigma_{jk}(t_i) - \tilde\Sigma_{jk}(t) \big| \le C\Big( |t_i - t| + \frac{1}{n} \Big),$$
where the constant $C$ here is uniform in $j$ and $k$. Letting $n \to \infty$ and using the continuity of $A(\cdot)$, we can extend $\Sigma(t) = \tilde\Sigma(t)$ for all $t \in (0, 1)$. A similar extension can be done for $\Sigma_{t,\ell}$ with $\ell = \pm 1$. In Section 3.2, it will be shown that the asymptotic theory for estimating $A(t)$, $t \in (0, 1)$, depends on the smoothness of the weakly locally stationary VAR processes only through the smoothness of $\Sigma(t)$ and therefore of $A(t)$.

Rate of convergence
In this section, we characterize the rate of convergence of our estimator (2.5) under various matrix norms. We assume $d \ge 2$. To study the asymptotic properties of the proposed estimator, we make the following assumptions on model (1.1).
1. The coefficient matrices are sparse: for some $0 \le \alpha < 1$, every $A_i$ belongs to the matrix class defined in (3.3), which bounds the $\ell^\alpha$ "norms" of all rows and all columns, i.e.
$$\max_{1\le k\le d} \sum_{j=1}^{d} \big|[A_i]_{jk}\big|^\alpha \le s \quad \text{and} \quad \max_{1\le j\le d} \sum_{k=1}^{d} \big|[A_i]_{jk}\big|^\alpha \le s. \qquad (3.3)$$
2. The marginal and lag-one covariance matrix processes $\{\Sigma_0(t)\}_{t\in[0,1]}$ and $\{\Sigma_1(t)\}_{t\in[0,1]}$ have entries in $\mathcal C^2([0,1])$, the class of functions defined on $[0, 1]$ that are twice differentiable with bounded derivatives uniformly in $j, k = 1, \cdots, d$.
3. The random innovations $e_i = (e_{i1}, \cdots, e_{id})^\top$ have iid components with sub-Gaussian tails: $\|e_{ij}\|_q \le C_1 q^{1/2}$ for all $q \ge 1$.
Before proceeding, we discuss the above assumptions. (3.3) requires that the transition matrices are sparse in both columns and rows at all time points. A similar matrix class defined by (3.3) was first proposed in [7] for symmetric matrices and it has been widely used for estimating high-dimensional covariance and precision matrices; see e.g. [9,11]. If $\alpha = 0$, then the maximum number of nonzeros in the columns and rows of $A_i$ is at most $s$. Assumption 2 requires that the marginal and lag-one covariance matrices evolve smoothly in time. The smoothness is not imposed directly on $A(\cdot)$ for ease of the theorem statements. In view of (3.2), Assumption 2 is implied by smoothness of $A_i$ under extra regularity conditions. For a generic matrix $M(t)$ parameterized by $t \in (0, 1)$, let $\dot M(t)$ and $\ddot M(t)$ be the first two element-wise derivatives of $M(t)$ w.r.t. $t$. If $\dot A(t)$ and $\ddot A(t)$ exist and are uniformly bounded, then each entry of $\Sigma_0(\cdot)$ and $\Sigma_1(\cdot)$ is a $\mathcal C^2([0,1])$ function and therefore Assumption 2 is fulfilled.
Assumption 3 specifies the tail probability of the innovations $e_i$. In [21], the $e_i$'s follow iid $N(0, \Psi)$ for some error covariance matrix $\Psi$. A simple transformation by $\Psi^{-1/2}$ reduces this to the case where $e_i$ has iid standard normal components, a special case of Assumption 3, which covers sub-Gaussian innovations.
Theorem 3.2 shows that, with probability at least $1 - 2d^{-1}$, the estimator $\hat A_i$ with a suitably chosen tuning parameter $\tau$ attains the stated rate of convergence. From Theorem 3.2, the bandwidth $b_n \asymp ((\log d)/n)^{1/5}$ gives the optimal rate of convergence, and the resulting tuning parameter is $\tau \ge C(1+M_d)((\log d)/n)^{2/5}$. When $d$ is fixed, it is known that the optimal bandwidth in nonparametric kernel density estimation is of order $n^{-1/5}$ for twice continuously differentiable functions under the mean integrated squared error. So in the high-dimensional context, the dimension only has a logarithmic impact on the choice of the optimal bandwidth.
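A quick heuristic for these rates (a sketch of the bias-variance trade-off, not the proof of Theorem 3.2) is as follows: under the $\mathcal C^2$ smoothness in Assumption 2, the kernel smoothing bias of the covariance estimates is of order $b_n^2$, while the stochastic error of the smoothed covariances, uniformly over the $d^2$ entries, is of order $\sqrt{(\log d)/(n b_n)}$ by a sub-Gaussian concentration and union-bound argument. Balancing the two terms,
$$ b_n^2 \asymp \sqrt{\frac{\log d}{n b_n}} \;\Longrightarrow\; b_n \asymp \Big(\frac{\log d}{n}\Big)^{1/5}, \qquad b_n^2 + \sqrt{\frac{\log d}{n b_n}} \asymp \Big(\frac{\log d}{n}\Big)^{2/5}, $$
which matches the order of the tuning parameter $\tau$ above.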

Pattern recovery
We also study the recovery of the time-varying sparsity patterns using the estimator (2.5). Let $S_i = \mathrm{supp}(A_i)$ be the set of nonzero positions of $A_i$. If the nonzero entries of $A_i$ are small enough, then it is impossible to accurately distinguish the small nonzeros from the zeros. Therefore, the best we can hope for is that the nonzero entries of $A_i$ with large magnitudes can be well separated from the zeros in $A_i$. Let $u_*$ be the threshold level defined in (3.9) in terms of the tuning parameter $\tau$ from Theorem 3.2. We use the thresholded version of (2.5),
$$\hat S_i = \big\{ (j,k) : \big|[\hat A_i]_{jk}\big| \ge u_* \big\}, \qquad (3.10)$$
as an estimator of $S_i$. By Theorem 3.2, the maximal fluctuation $|\hat A_i - A_i|_\infty$ is controlled by $u_*$ with probability at least $1 - 2d^{-1}$. So if we apply a thresholding procedure to $\hat A_i$ at the level $u_*$, then we expect that the zeros and the nonzeros with magnitudes larger than $2u_*$ in $A_i$ can be identified by the thresholded support of $\hat A_i$; this instantaneous partial recovery consistency is made precise in Theorem 3.3. Theorem 3.3 states that, with high probability, the zeros in $A_i$ can be identified and the nonzero entries in $A_i$ with signal strength above $2u_*$ can be recovered by $\hat S_i$. Therefore, the false positives (type I error) of the estimator (3.10) are asymptotically controlled; see Theorem 3.5 for a precise statement. However, Theorem 3.3 does not provide much information regarding the false negatives, since there is no characterization of the signal strength in $(0, 2u_*)$.
Let $\beta > 0$ and $u_0 \in (0, 1)$. We introduce a $d \times d$ matrix class $\mathcal G_{\alpha,\beta}(s, M_d, L_d)$ in which the parameters $\beta$ and $L_d$ control the proportion of small entries in the support of $A$. If $\beta$ is large and $L_d$ grows slowly, then the fraction of weak signals in $A$ is small and therefore the false negatives (type II error) can also be well controlled. Below, we give such an example.
Example 3.1 (A spatial design). Let $0 < r < 1$. Consider a $d \times d$ symmetric matrix $A = (A_{mk})_{d\times d}$ generated by the covariance function of a spatial process $Z_1, Z_2, \ldots, Z_d$, a random vector observed at sites $h_1, \cdots, h_d$, where $A_{mk}$ is determined by the distance between the sites $h_m$ and $h_k$ through a real-valued covariance function $f$. Here, we consider the rational quadratic covariance function of [11,36] given in (3.14). In this example, the sparsity and weak-signal parameters of the class can be calculated explicitly. For any fixed distance parameter $r \in (0, 1)$, the weak signal parameter $L_d$ has a natural dependence on $\gamma$: if $\gamma$ is smaller, then the covariance function $f$ decays to zero more slowly and there is a smaller fraction of weak signals in $A$, which allows $L_d$ to grow slowly in $d$.

Note that the class $\mathcal G_{\alpha,\beta}(s, M_d, L_d)$ is much less stringent than the widely used condition for support recovery and model selection in the literature, which requires that the minimal nonzero signal strength is uniformly bounded away from zero [28]. To quantify the error in the pattern recovery, we use the following two error rate measures.

Definition 3.2. The false positive rate (FPR) and false negative rate (FNR) of $\hat S_i$ are defined as
$$\mathrm{FPR}_i = \frac{|\hat S_i \cap S_i^c|}{|S_i^c|}, \qquad \mathrm{FNR}_i = \frac{|\hat S_i^c \cap S_i|}{|S_i|}.$$
By convention, if $S_i^c = \emptyset$, then $\mathrm{FPR}_i = 0$; if $S_i = \emptyset$, then $\mathrm{FNR}_i = 0$. If $\mathrm{FPR}_i = \mathrm{FNR}_i = 0$ with probability tending to one, which is a very strong requirement, then we have the pattern recovery consistency $\mathbb P(\hat S_i = S_i) \to 1$. Below, we show that both the FPR and the FNR are asymptotically controlled in the presence of weak signals.
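For concreteness, the thresholded support (3.10) and the error rates of Definition 3.2 can be computed as in the following sketch; the function names are illustrative and u plays the role of the threshold $u_*$.

```python
import numpy as np

def thresholded_support(A_hat, u):
    """Estimated support (3.10): entries of A_hat with magnitude at least u."""
    return np.abs(A_hat) >= u

def fpr_fnr(S_hat, S_true):
    """FPR and FNR of Definition 3.2, with the zero-by-convention cases handled."""
    S_hat = np.asarray(S_hat, dtype=bool)
    S_true = np.asarray(S_true, dtype=bool)
    n_zero = np.count_nonzero(~S_true)          # |S^c|
    n_nonzero = np.count_nonzero(S_true)        # |S|
    fpr = np.count_nonzero(S_hat & ~S_true) / n_zero if n_zero else 0.0
    fnr = np.count_nonzero(~S_hat & S_true) / n_nonzero if n_nonzero else 0.0
    return fpr, fnr
```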
Theorem 3.5. Assume that Assumptions 1, 2 and 3 are satisfied and fix an $i \in \{1, \cdots, n\}$. Then the FPR of $\hat S_i$ is asymptotically controlled and, since $u_* = o(1)$, the FNR vanishes with probability tending to one provided the fraction of weak signals in $A_i$ is suitably small.

Simulation studies
In this section, we present numerical results on simulated datasets. We compare the following five methods: (i) our TV-VAR estimator (2.5); (ii) the stationary sparse VAR estimator of [21]; (iii) the time-varying lasso [23]; (iv) the time-varying ridge [20]; and (v) the unstructured time-varying MLE.

Data generation
We consider different setups with $n = 100$ and $d = 20, 30, 40$ and $50$. For each setup $(n, d)$, the data are generated by the following procedure. First, the baseline coefficient matrices $A_{01}$ and $A_{02}$ are generated by using sugm.generator() in the flare R package [24]. We consider four graph structures defined in flare, namely hub, cluster, band and random, for $A_{01}$ and $A_{02}$. Examples of these four structures are shown in Appendix B. Then we normalize the baseline matrices so that $\rho(A_{01}) = 0.2$ and $\rho(A_{02}) = 1$, and smoothly interpolate between them to obtain the intermediate transition matrices. Following [21], we specify the innovation covariance matrix $\Psi = \Sigma - A_{01}\Sigma A_{01}^\top$ with $\Sigma = I_d$; [21] showed that the choice of $\Sigma$ does not affect the numerical performance significantly. In our simulation studies, we use the Epanechnikov kernel $K(v) = 0.75(1-v^2)I(|v| \le 1)$ and fix the model order $k = 1$. The bandwidth $b_n$ is set to $b_n = 0.8 n^{-1/5}$ for methods (i), (iii) and (iv).
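As an illustration of this data-generating scheme, the following sketch simulates one TV-VAR(1) path. The linear interpolation between the two baselines and the function names are assumptions made for the illustration; the paper's exact interpolation scheme is not reproduced here.

```python
import numpy as np

def simulate_tv_var(A01, A02, n, seed=None):
    """Simulate x_1, ..., x_n from a TV-VAR(1) whose transition matrix moves
    smoothly from A01 to A02; innovations are N(0, Psi) with
    Psi = Sigma - A01 Sigma A01^T and Sigma = I_d (Psi assumed positive definite)."""
    rng = np.random.default_rng(seed)
    d = A01.shape[0]
    Psi = np.eye(d) - A01 @ A01.T
    L = np.linalg.cholesky(Psi)
    X = np.zeros((n, d))
    for i in range(1, n):
        t = i / n
        A_t = (1.0 - t) * A01 + t * A02      # smooth (here: linear) interpolation
        X[i] = A_t @ X[i - 1] + L @ rng.standard_normal(d)
    return X
```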

Tuning parameter selection
For the tuning parameter selection in (i)-(iv), we propose a data-driven procedure that minimizes the one-step-ahead prediction errors as follows.
1. Choose a grid for the tuning parameter (say $\tau$) and the number $n_1$ of training data points.
2. For each $\tau$, perform the one-step-ahead prediction on the testing set by estimating $A_t$ with $\hat A_t(\tau)$ and then predicting $X_t$ by $\hat X_t = \hat A_t(\tau) X_{t-1}$, where $t = n_1+1, \ldots, n$. Then calculate the prediction error at time $t$ as $\mathrm{Err}_t(\tau) = \|\hat X_t - X_t\|_2^2$.
3. Select the $\tau$ that minimizes the prediction error averaged over the testing set $t = n_1+1, \ldots, n$ (a code sketch of this procedure is given at the end of this subsection).

From Table 1 to Table 4, larger $d$ often results in larger errors. In general, the unstructured MLE performs the worst under almost all matrix norms. Although the ridge method shrinks the coefficients in the transition matrix towards zero, those coefficients are not exactly zero. Thus, the estimated transition matrices from the ridge method are not sparse, which leads to higher estimation errors compared with the TV-VAR, the time-varying lasso and the stationary VAR. The stationary VAR always performs worse than the TV-VAR and the time-varying lasso, since it cannot capture the dynamic structure. The time-varying lasso performs better than the other methods except the TV-VAR. The proposed TV-VAR performs the best under almost all matrix norms.
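A minimal sketch of the selection criterion described above is given below; the squared $\ell^2$ error metric and the placeholder fit_transition (standing in for any of the estimators (i)-(iv) evaluated at time $t$) are illustrative assumptions.

```python
import numpy as np

def select_tau(X, tau_grid, n1, fit_transition):
    """Pick the tuning parameter minimizing the averaged one-step-ahead
    prediction error over the testing period t = n1+1, ..., n."""
    n = X.shape[0]
    avg_err = []
    for tau in tau_grid:
        errs = []
        for t in range(n1, n):                      # 0-based index of x_t
            A_hat = fit_transition(X[:t], t, tau)   # fitted from the training data
            x_pred = A_hat @ X[t - 1]
            errs.append(np.sum((x_pred - X[t]) ** 2))
        avg_err.append(np.mean(errs))
    return tau_grid[int(np.argmin(avg_err))]
```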

Pattern recovery
For the TV-VAR, we also report the $\mathrm{FPR}_{t,l,nn}$ and $\mathrm{FNR}_{t,l,nn}$ in (3.17), where $l = 1, \ldots, 30$ indexes 30 possible tuning parameters ranging from 0.001 to 0.45. Then, we calculate the averaged rates $\mathrm{FPR}_l$ and $\mathrm{FNR}_l$ by averaging $\mathrm{FPR}_{t,l,nn}$ and $\mathrm{FNR}_{t,l,nn}$ over time points and replications. Following [9], we set $u_*$ to $10^{-3}$, which is considered to be numerically nonzero. The ROC curves for all possible values of the sparsity control parameter are plotted in Fig 6a, Fig 6b, Fig 6c and Fig 6d. Based on the ROC curves, the TV-VAR method has better discrimination power for the band and random patterns than for the hub and cluster patterns. We also calculate $u_*$ by using the true value of $\Sigma$ in (3.9); the resulting ROC curves are shown in Appendix C and are similar to those based on $u_* = 10^{-3}$.

Finance data: stock prices
In this section, we compare the aforementioned estimators on a real financial dataset from Yahoo! Finance (finance.yahoo.com). The data matrix contains daily closing prices of 452 stocks that were consistently in the S&P 500 index between January 1st, 2003 and January 1st, 2008. We choose this time range to avoid the effects of the two financial crises in 2001 and 2008, which could make the stock prices non-smooth with sharp drops and rebounds. In total, there are 1,258 time points. We first standardize the data to zero mean and unit variance and detrend the data. We then fit an AR(1) model to the detrended series of the smoothest stocks (those without obvious change points) and perform Ljung-Box tests using Box.test() in R. We keep the stocks with nonzero AR coefficients at significance level 0.05. In total, 30 stocks are selected and 10 of them are shown in Table 9. The ten selected stocks in Table 9 come from six sectors: Consumer Staples, Consumer Discretionary, Industrials, Financials, Utilities and Energy. The resulting data matrix is denoted as $X \in \mathbb R^{1258 \times 30}$. We apply the Priestley-Subba Rao (PSR) stationarity test (e.g. stationarity() in the R package fractal) to some of the stocks, such as Kellogg Co. and Target Corp., and find that these time series are not stationary at significance level 0.05. Thus, it is inappropriate to model the data with the stationary VAR model of [21]. Finally, we fit to the data $X$ the sparse TV-VAR model with order $k = 1$, the stationary VAR of [21] with order $k = 1$, the time-varying lasso [23], the time-varying ridge [20] and the time-varying MLE.
In order to compare the performance of the five methods, we consider the one-step-ahead prediction of $X_{t,*}$ for $t = 1159, \ldots, 1258$, using $X_{J_t,*}$ with $J_t = \{j : t - 1158 \le j \le t - 1\}$ as the training set. We estimate the transition matrices $\hat A_t(\lambda)$, where $\lambda$ is the regularization parameter, such as the $\tau$ in the TV-VAR or the shrinkage parameter in the ridge method. We use the Epanechnikov kernel $K(v) = 0.75(1 - v^2)I(|v| \le 1)$ with bandwidth $b_n = 0.3$, which means that only the last 30% of the data in the training set are used for prediction by the TV-VAR, the time-varying lasso, the time-varying ridge and the time-varying MLE. Since the stationary VAR model of [21] uses all of the training data, in order to make a fair comparison, we only treat the last 30% of the time points (347 observations) in $X_{J_t,*}$ as the training set for the stationary VAR. The averaged prediction error for a specific $\lambda$ is the one-step-ahead prediction error averaged over the 100 testing time points. The smallest averaged one-step-ahead prediction errors $\min_\lambda \overline{\mathrm{Err}}(\lambda)$ for the five methods are shown in Table 5. In Figure 2, we show an example of the predicted closing prices from the five methods together with the detrended true closing prices of Target Corp. Clearly, the methods with time-varying structures perform similarly and outperform the stationary VAR.
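The rolling evaluation just described can be sketched as follows; the squared $\ell^2$ error and the placeholder fit_and_predict (standing in for whichever of the five methods is being evaluated at a given $\lambda$) are illustrative assumptions.

```python
import numpy as np

def rolling_one_step_error(X, fit_and_predict, window=1158, start=1158):
    """Average one-step-ahead prediction error over the test period.
    fit_and_predict(X_train, x_last) returns the prediction of the next observation."""
    errs = []
    for t in range(start, X.shape[0]):        # 0-based: predicts rows start, ..., n-1
        X_train = X[t - window:t]             # J_t = {t - window, ..., t - 1}
        x_pred = fit_and_predict(X_train, X[t - 1])
        errs.append(np.sum((x_pred - X[t]) ** 2))
    return float(np.mean(errs))
```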
One advantage of the TV-VAR over the stationary VAR is that it can capture the time-varying data structures. If we treat a transition matrix as an adjacency matrix of an undirected weighted graph, we can visualize the graph structures at different time points. In the resulting graphs, the edges represent the cross-sectional dependence between stocks, bolder lines indicate stronger dependence, and stocks in the same sector share the same color. In order to illustrate the time-varying structure clearly, we only consider the 10 stocks in Table 9. From these four figures, we observe that stocks in the same or closely related sectors are often connected. For example, CME Group Inc. and Hartford Financial Svc. GP from the Financials sector show a dynamic dependence structure, and Boeing Company in Industrials shows a consistent correlation with Exxon Mobil Corp. in the Energy sector.

Economic data: exchange rates
We also apply the proposed methods to exchange rate data on international currencies and study how the exchange rates evolve over time. We use the exchange rate data from the Federal Reserve Bank of St. Louis and choose the date range from January 1st, 2003 to January 1st, 2008 to avoid the effects of the financial crises on the exchange rates. In total, there are 1,303 time points. We choose 15 major currencies from Europe, Australia, North America, Asia and South America and normalize each currency by its exchange rate to the U.S. dollar, as shown in Table 10. As in Section 5.1, we standardize and detrend the data. The exchange rates of all currencies to the U.S. dollar are smooth, and in addition the PSR test rejects the stationarity hypothesis of the Euro/U.S. dollar exchange rate at significance level 0.05. Thus, the stationary VAR is again not appropriate in this application, which is also supported by our empirical findings in Table 6 and Figure 4. We compare the performance of the five methods by the one-step-ahead prediction errors for the last 50 data points, using the same prediction metric as in Section 5.1. The bandwidth here is $b_n = 0.3$. We do not include the prediction curve of the stationary VAR in Figure 4 since its errors are too large. Table 6 and Figure 4 show that the methods with time-varying structures perform similarly and much better than the stationary VAR.
Then, (3.4) follows from the assumption that $\sup_{t\in[0,1]}|A(t)|_1 < 1$ and the fact that $\Sigma(t)$ is symmetric, and (3.5) follows by differentiating $\Sigma(t)$ w.r.t. $t$.

Lemma 6.1. Let $A$, $B$, $C$ be matrices of compatible dimensions for the product $ABC$.

Proof of Lemma 6.1. The first and second inequalities follow from direct computation. The third inequality follows from the second one by considering $(ABC)^\top = C^\top B^\top A^\top$.

Recall the TV-VAR(1) model $x_i = A_i x_{i-1} + e_i$. The following key lemma presents a large deviation bound for the marginal and lag-one autocovariance matrices with sub-Gaussian innovations.

Lemma 6.2. Suppose that $\rho = \sup_{i\ge 0}\rho(A_i) < 1$ and that the transition matrices are uniformly bounded by some absolute constant $C_0 < \infty$. If $e_i = (e_{i1}, \cdots, e_{id})^\top$ has iid sub-Gaussian components, i.e. $\|e_{ij}\|_q \le C_1 q^{1/2}$ for all $q \ge 1$, then for any fixed $t \in [b_n, 1-b_n]$ the deviation bound (6.1) holds with probability at least $1 - 2d^{-1}$.

Then, the bias part is controlled by the smoothness of $\Sigma(\cdot)$. Now, we deal with the stochastic part I. Writing $B_{i,m}$ for the $m$-fold products of the transition matrices, we have $\sup_i \rho(B_{i,m}) \le \rho^m$ and the moving-average (MA) representation of $x_i$; let $W_t$ denote the corresponding $n \times n$ diagonal weight matrix.
Then, we can write the smoothed covariance in this matrix form. Observe that $(\tilde B^{(j)})^\top W_t \tilde B^{(k)}$ has the same nonzero eigenvalues as the corresponding $n \times n$ matrix, and by the definition of the matrix spectral norm, (*) is bounded accordingly. By the Cauchy-Schwarz inequality and the spectral norm bound $\sup_i \rho(B_{i,m}) \le \rho^m$, we obtain a bound on $\rho(W_t)$. Therefore, for any $x > 0$ the uniform tail bound (6.3) holds. Now, by (6.2) and the union bound applied to (6.3), we obtain that there exists a constant $C$ which only depends on $\rho$, $C_0$ and $C_1$ such that the desired bound holds with probability $\ge 1 - 2d^{-1}$. A similar argument applied to $m = 1$ shows that the lag-one autocovariance matrix obeys the same bound as in (6.1).
Proof of Lemma 3.4. Denote $l = d/2$. For $0 < \alpha < 1$, the stated bound on the maximum holds, from which the theorem follows.

Fig 2: Comparison of the predicted closing prices and the true detrended closing prices of Target Corp.

Table 1
Comparison of estimation errors under different setups. The standard deviations are shown in parentheses. Here $\rho$, F, $\ell^1$ and $\ell^\infty$ denote the spectral, Frobenius, $\ell^1$ and $\ell^\infty$ matrix norms, respectively. The pattern of the transition matrix is 'hub'.

Table 2
Comparison of estimation errors under different setups. The standard deviations are shown in parentheses. Here $\rho$, F, $\ell^1$ and $\ell^\infty$ denote the spectral, Frobenius, $\ell^1$ and $\ell^\infty$ matrix norms, respectively. The pattern of the transition matrix is 'cluster'.

Table 3
Comparison of estimation errors under different setups. The standard deviations are shown in parentheses. Here $\rho$, F, $\ell^1$ and $\ell^\infty$ denote the spectral, Frobenius, $\ell^1$ and $\ell^\infty$ matrix norms, respectively. The pattern of the transition matrix is 'band'.

Table 4
Comparison of estimation errors under different setups. The standard deviations are shown in parentheses. Here $\rho$, F, $\ell^1$ and $\ell^\infty$ denote the spectral, Frobenius, $\ell^1$ and $\ell^\infty$ matrix norms, respectively. The pattern of the transition matrix is 'random'.

Table 5
The prediction errors for the five methods on the stock price data. The standard deviations are shown in parentheses.

Table 6
The prediction errors for the five methods on the exchange rates data. The standard deviations are shown in parentheses.