ΑΔΣ Advances in Decision Sciences

Factor models (FM) are now widely used for forecasting with large sets of time series. Another class of models, which can be easily estimated and used in a large dimensional setting, is multivariate autoregressive models (MAR), where independent autoregressive processes are assumed for the series in the panel. When applied to big data, the estimation, model selection and combination of both models can be time consuming. We assume both FM and MAR models are misspecified and provide a scoring rule which can be evaluated on an initial training sample to either select or combine the models in forecasting exercises on the whole sample. Some numerical illustrations are provided both on simulated data and on well-known large economic datasets. The empirical results show that the frequency of the true positive signals is larger when FM and MAR forecasting performances differ substantially and decreases as the horizon increases.


Introduction
The recent fast growth in (real-time) big data allows researchers to model and predict variables of interest more accurately and suggests that there are large potential gains from using a big set of variables instead of a single univariate time series model in many inference applications. Consequently, new big databases have been developed and studied for economic problems; see, for example, Choi and Varian (2012); Varian (2014); Varian and Scott (2014); Einav and Levin (2014). Some leading examples in macroeconomics are Watson (2002, 2005); Bańbura et al. (2010); Koop and Korobilis (2013); Stock and Watson (2012). However, there are many issues still open when working with big datasets, including high-dimensional modeling and lack of inference efficiency. We refer to Granger (1998) for an early discussion of these issues, and to Litterman (1980); Sims and Zha (1998); Koop (2013) for a discussion from a Bayesian perspective.
In this paper we focus on two simple models that are widely used in forecasting: factor models with a reduced number of factors (FM) (see, e.g., Stock and Watson, 2002, 2004, 2005; Bańbura et al., 2010; Watson, 2012, 2014), and multivariate autoregressive models (MAR), where no interaction is assumed between the series in the panel (e.g., see Penny and Harrison, 2006; Hytti et al., 2006). MAR models have been successfully used in various fields such as ecology (Ives et al., 2003) and neuroimaging (Harrison and Friston, 2003) and revealed their potential when applied to large sets of time series. We refer to Gabriel Fagan (2006) for the use of time series models by the European Central Bank and to Reinsel (1983); Carriero et al. (2016); Cubadda and Guardabascio (2017) for extensions and recent applications to economics. In order to obtain some theoretical results, and having in mind applications to very large sets of time series, in this paper we focus on a parsimonious specification of the MAR. We assume a diagonal MAR model where lagged interactions among variables are excluded (Marcellino et al., 2006). The model is very simple and can be extended along various directions, for example by including lagged interactions, exogenous covariates and interactions between covariates.
We derive a scoring rule for FM and MAR models. Various scoring rules have been proposed in the literature (Mitchell and Hall, 2005; Gneiting and Raftery, 2007; Ranjan, 2011, 2013; Lerch et al., 2016), but in order to preserve some analytical tractability of the comparison we consider mean square errors (MSEs) of the vectors of point forecasts generated with the two models. MSEs are the forecast error second moments and their trace is often used to evaluate forecast accuracy. See Hendry and Martinez (2017) for a discussion and an alternative proposal in terms of MSE determinants. We expect that the use of a reduced number of factors in the FM model and of prior parameter restrictions in the MAR model may lead to model misspecification. Thus, in this paper we follow some forecast analysis studies, see, e.g., Schorfheide (2005), and score the two models under the assumption that they are misspecified.
Our new model-specific scoring rule for FM and MAR models indicates that the forecasting performance of the models depends crucially on the parameter settings of the data generating process. More specifically, forecasting accuracy depends on the level of simultaneous or lagged dependence between the series. The proposed scoring rule is well suited for big data applications since it can be used on an initial set of observations to choose the forecasting model for the remaining set of data. It can be generalized to models that include both autoregressive and factor model features, such as the multivariate index-augmented autoregression (MIAAR) models of Cubadda and Guardabascio (2019). The scoring rule can be applied to combine forecasts as well (e.g., see Bates and Granger, 1969; Diebold and Pauly, 1987, 1990; Geweke and Amisano, 2010). See also Kapetanios et al. (2015); Bassetti et al. (2018) for generalized combination schemes and Billio et al. (2013); Pettenuzzo and Ravazzolo (2016) for forecast combinations based on time-varying weights.
We provide an illustration of the theoretical results and study the reliability of the proposed scoring rule through some simulation experiments and three applications to widely used datasets. The first database includes the quarterly Stock and Watson (2004) series; the second one the monthly McCracken and Ng (2015) series; the third one a set of cross-country series downloaded from Bloomberg which represent a three-country extension (EU, US and Japan) of the previous two datasets. We find that our scoring rule detects ex-ante the more accurate model in all these exercises. The reliability of the proposed scoring rule increases when accuracy differs substantially across models and decreases as the horizon increases.
The paper is organized as follows. Section 2 introduces some notation and the models discussed in this paper. Section 3 derives the scoring rule. Section 4 applies the proposed scoring rule to simulation exercises. Section 5 exhibits some empirical results on well studied macroeconomic datasets. Section 6 concludes.

Forecasting models
We introduce some notation and define the factor (FM), vector autoregressive (VAR) and independent multivariate autoregressive (MAR) models used in this paper. Let {x_t}_{t≥0} be an n-dimensional real-valued random process. In what follows, we assume the process is weakly stationary with zero mean, variance-covariance matrix E(x_t x′_t) = Γ_X and auto-covariance function E(x_t x′_{t+j}) = Γ_{X,j}, j ∈ Z, with Γ_{X,j} = Γ′_{X,−j}. We denote with a_i and λ_i, i = 1, ..., n, the eigenvectors and eigenvalues, respectively, of Γ_X, such that Γ_X a_i = λ_i a_i. Let A be the orthogonal matrix with the normalized eigenvectors a_i, i = 1, ..., n, in its columns and Λ the diagonal matrix with the eigenvalues λ_i, i = 1, ..., n, on the main diagonal, where we assume the eigenvalues are in decreasing order, i.e.
λ_1 ≥ λ_2 ≥ ... ≥ λ_n. It follows that the decomposition Γ_X = AΛA′ holds true. Under suitable conditions, the process {x_t}_{t≥0} can always be represented by using a set of factors f_{n,t} = (f_{1,t}, ..., f_{n,t})′ and a factor loadings matrix A such that x_t = A f_{n,t}, t = 1, 2, ..., T.
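As a minimal numerical sketch of the decomposition above, the eigenpairs of a sample covariance matrix can be computed and sorted in decreasing order, after which the identity Γ_X = AΛA′ and the factor representation x_t = A f_{n,t} can be checked directly (the simulated panel X is an illustrative stand-in for data):

```python
import numpy as np

# Simulated zero-mean panel X of shape (T, n), standing in for the data.
rng = np.random.default_rng(0)
T, n = 200, 10
X = rng.standard_normal((T, n))
X = X - X.mean(axis=0)

Gamma_X = X.T @ X / T                # sample variance-covariance matrix
lam, A = np.linalg.eigh(Gamma_X)     # eigh: ascending eigenvalues, eigenvectors in columns
order = np.argsort(lam)[::-1]        # re-sort in decreasing order, as in the text
lam, A = lam[order], A[:, order]

# Decomposition Gamma_X = A Lambda A' and full factor set f_{n,t} = A' x_t
assert np.allclose(A @ np.diag(lam) @ A.T, Gamma_X)
F = X @ A                            # T x n matrix of factors
assert np.allclose(F @ A.T, X)       # x_t = A f_{n,t} recovers the data exactly
```

Note that with all n factors the representation is exact; the dimension reduction of the next paragraph comes from truncating to the first k columns.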
In order to obtain a representation of {x_t}_{t≥0} on a lower dimensional space, we use the subset f_{k,t} = (f_{1,t}, ..., f_{k,t})′ of the first k latent factors in f_{n,t}, with k < n, and assume that the factors have an autoregressive dynamics. In our empirical applications the k factors are used to forecast n* < n variables of interest, whereas our theoretical results are presented for n* = n, for the sake of simplicity and without loss of generality. Our FM model is defined by

x_t = A_k f_{k,t} + ξ_t,    f_{k,t} = Φ_k f_{k,t−1} + η_{k,t},    η_{k,t} ∼ WN(0, Σ_k),

where A_k is the matrix of the first k columns of A, ξ_t is an idiosyncratic component with E(ξ_t) = 0, Cov(ξ_t, ξ_s) = 0 for t ≠ s and V(ξ_t) = Σ_{ξ,t}, and WN(0, Σ_k) denotes a white noise process with mean 0 and variance-covariance matrix Σ_k. Furthermore, we assume the factors f_{k,t} admit the infinite MA representation f_{k,t} = Σ_{j≥0} Φ_k^j η_{k,t−j}. In the forecasting practice the factors f_{k,t} are first extracted and then predicted out of sample with a dynamic model (e.g. a VAR model). Predictions are then used to generate forecasts for x_t. The regression model x_t = (I_n ⊗ f′_{k,t})β_k + ξ_t is usually employed to recover the relationship between the dependent variables and the factors.
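The two-step forecasting practice just described can be sketched as follows. This is a simplified illustration on simulated data: the factor dynamics here are diagonal AR(1) processes estimated by least squares (a special case of the VAR step mentioned in the text), and T, n, k, s are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, k, s = 300, 8, 2, 4
X = rng.standard_normal((T, n))      # illustrative stationary panel
X = X - X.mean(axis=0)

# Step 1: extract the first k principal-component factors
lam, A = np.linalg.eigh(X.T @ X / T)
idx = np.argsort(lam)[::-1][:k]
A_k = A[:, idx]                      # loadings: first k eigenvectors
F_k = X @ A_k                        # T x k factor series f_{k,t} = A_k' x_t

# Step 2: fit an AR(1) to each factor by least squares (simplified dynamics)
phi = np.array([f[1:] @ f[:-1] / (f[:-1] @ f[:-1]) for f in F_k.T])

# s-step-ahead factor forecast, then map back to the observables
f_fore = (phi ** s) * F_k[-1]        # f_{k,T+s|T} = Phi_k^s f_{k,T} (diagonal Phi_k)
x_fore = A_k @ f_fore                # x_{T+s|T} = A_k f_{k,T+s|T}
```

As Remark 1 below notes, with least squares the loadings regression adds nothing beyond A_k, so the forecast quality rests entirely on how predictable the k factors are.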
Remark 1. We shall notice that this model does not allow for extracting further information from the data than the one encoded in the matrix A_k. The least square estimator β̂_k of β_k is equal to vec(A′_k) (see Appendix A). Thus the forecasting performance depends crucially on the predictability of the factors and on the choice of the reduced number of factors k used for forecasting the n* variables of interest.
The second forecasting model used in this paper is a MAR, which is defined as

x_t = Φ x_{t−1} + η_t,    η_t ∼ WN(0, Σ_η),    t = 1, 2, ..., T,

where Φ = diag{(φ_{1,1}, ..., φ_{1,n})′} is a diagonal coefficient matrix and Σ_η = diag{(σ²_1, ..., σ²_n)′} is a diagonal variance-covariance matrix. We assume the process {x_t}_{t≥0} admits the infinite MA representation x_t = Σ_{j≥0} Φ^j η_{t−j}. We will compare the FM and MAR models under the assumption that they are misspecified. We assume that the data generating process (DGP) for {x_t}_{t≥0} is a VAR process of the first order with an infinite MA representation. The difference between a MAR model and a VAR relates to the assumptions on the lag order and on Σ_η. Appendix A describes the linkage between a factor model and a VAR(1) model.
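Because Φ and Σ_η are diagonal, the MAR reduces to n independent univariate AR(1) regressions, which is what makes it cheap in large panels. A minimal sketch, with simulated data standing in for the panel:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, s = 300, 6, 3
X = rng.standard_normal((T, n))      # illustrative panel
Xc = X - X.mean(axis=0)

phi = np.empty(n)
sigma2 = np.empty(n)
for j in range(n):                   # n independent univariate regressions
    x = Xc[:, j]
    phi[j] = x[1:] @ x[:-1] / (x[:-1] @ x[:-1])   # OLS AR(1) coefficient
    resid = x[1:] - phi[j] * x[:-1]
    sigma2[j] = resid @ resid / (T - 1)           # diagonal element of Sigma_eta

# s-step-ahead MAR forecast: x_{t+s|t} = Phi^s x_t, with Phi diagonal
x_fore = (phi ** s) * Xc[-1]
```

Each series is handled in isolation, so the cost grows linearly in n and the model trivially parallelizes, which is the parsimony the Introduction appeals to.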

Forecast accuracy
Let x_{j,t+s|t} denote the s-step-ahead forecast for x_{t+s} made at time t with a given model j, with j ∈ {k, m}, where k indicates the factor model and m the MAR model. We define the mean square forecast error (MSE) as

MSE_j(s) = E[(x_{t+s} − x_{j,t+s|t})(x_{t+s} − x_{j,t+s|t})′].

The following theorems give the MSE for the FM and MAR models and two useful decompositions, which will be used to evaluate the models' forecast accuracy. The MSE is derived under the assumption that the data generating process is a VAR process.
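In practice the trace of the MSE matrix is estimated from realized forecast errors. A short sketch, where a simulated error array stands in for the errors x_{t+s} − x_{j,t+s|t} collected over R forecast origins:

```python
import numpy as np

rng = np.random.default_rng(3)
R, n = 25, 5                           # R forecast origins, n series (illustrative)
errors = rng.standard_normal((R, n))   # row t: forecast error vector at origin t

MSE = errors.T @ errors / R            # sample second-moment matrix of the errors
trace_mse = np.trace(MSE)
# the trace equals the sum of the per-series mean square errors
assert np.isclose(trace_mse, (errors ** 2).mean(axis=0).sum())
```

The equivalence in the last line is why the trace is a natural scalar summary: it aggregates the univariate MSEs across the panel, as opposed to the determinant-based alternative of Hendry and Martinez (2017).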
Theorem 1. Assume x_t follows the model in Eq. 9. The MSE's trace for the FM can be decomposed as follows:

tr(MSE_k(s)) = tr(E(e_{t+s|t} e′_{t+s|t})) + tr(E(e_{k,t+s|t} e′_{k,t+s|t})),

where e_{t+s|t} = x_{t+s} − x_{t+s|t} is the forecast error under the DGP and e_{k,t+s|t} = x_{t+s|t} − x_{k,t+s|t}.
Proof : see Appendix A.

Theorem 2. The MSE's trace for the MAR model is
where e_{m,t+s|t} = x_{t+s|t} − x*_{t+s|t} and x*_{t+s|t} is the forecast under the assumption of MAR dynamics, γ_{j,1} is the j-th element of the main diagonal of the MAR first order autocorrelation matrix Γ_1 = diag{(γ_{1,1}, ..., γ_{n,1})′}, and σ²_j is the j-th element of the main diagonal of the covariance matrix Σ_η. Proof: see Appendix A.
The following properties of the factors f_{k,t} will be used to derive the main result of the paper. It can be shown that, under our assumption on the DGP and the assumption on the factor loading matrix A, the sets of factors f_{n,t} and f_{k,t} satisfy the following conditions. We assume both FM and MAR models are misspecified and provide in the following theorem a scoring rule which is a function of the FM and MAR parameters.
The rule can be computed on an initial training sample to either select or combine the models on the whole sample and in out-of-sample forecasting exercises. Also, the rule can be applied recursively in sequential forecasting exercises.
Theorem 3. Let x_{k,t+s} be the value of the process under the assumption of FM dynamics and x*_{t+s} be the value of the process under the assumption of MAR dynamics. Then tr(MSE_k(s)) ≤ tr(MSE_m(s)) if the following inequality is satisfied: Proof: see Appendix A.
The inequality shows that in the presence of misspecification one model is not always superior to the other in terms of MSE and its forecasting performance crucially depends on the covariance and auto-covariance structures in the series.
In the following sections, we show how to use the inequality to score the FM and MAR models or to combine their forecasts (see Billio et al. (2013)). Our empirical applications will show that the inequality can be successfully used in a context of large datasets.

Simulation results
We generate a dataset of 57 time series of 290 time observations each from a VAR model of the first order. We consider two sets of experiments. In the first set we study the effect of the correlation between idiosyncratic error terms on the reliability of the proposed scoring rule. We assume the variance-covariance matrix is parametrized in the correlation and variance parameters, ̺ ∈ (−1, 1) and σ², respectively, as follows:

Σ_ε = σ²((1 − ̺) I_n + ̺ ι ι′),

where ι = (1, ..., 1)′ is the unit vector and I_n the identity matrix. As regards the coefficient matrix, we assume it is constructed from two random matrices Y and X of dimension T × n with i.i.d. entries generated from a standard normal N(0, 1). Two experiment settings are considered: "weakly correlated noise" (̺ = 0.2) and "strongly correlated noise" (̺ = 0.7). In all settings σ² = 3. For each setting, MAR and FM models have been estimated on 25 expanding window samples and forecasts generated for 12 steps ahead.
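A hedged sketch of this DGP is below. The equicorrelated error covariance follows the parametrization above; the VAR coefficient matrix Phi is an illustrative stationary choice, not the paper's exact construction from the random matrices Y and X.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, rho, sigma2 = 57, 290, 0.7, 3.0   # "strongly correlated noise" setting

# Equicorrelated covariance: Sigma_eps = sigma^2((1 - rho) I_n + rho iota iota')
iota = np.ones(n)
Sigma_eps = sigma2 * ((1 - rho) * np.eye(n) + rho * np.outer(iota, iota))
L = np.linalg.cholesky(Sigma_eps)       # for drawing correlated errors

# Illustrative stationary VAR(1) coefficient matrix (spectral radius < 1)
Phi = 0.3 * np.eye(n)

X = np.zeros((T, n))
for t in range(1, T):
    X[t] = Phi @ X[t - 1] + L @ rng.standard_normal(n)
```

With ̺ ∈ (−1, 1) the matrix Σ_ε has eigenvalues σ²(1 − ̺) and σ²(1 − ̺ + n̺), so it stays positive definite and the Cholesky draw is well defined for ̺ in the ranges used here.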
In the second set of experiments we study the effect of the lagged dependence between series on the reliability of our scoring rule. We assume the VAR coefficient matrix is parametrized in the persistence and causality parameters, α and β, respectively, where α and β are such that the process is weakly stationary. We consider the following settings: α = 0.01 and α = 0.5, with β = (1 − α − 0.01)/n. The covariance matrix Σ_ε is parametrized in ̺ and σ² as in the first set of experiments, with ̺ = 0.9 and σ² = 3. We evaluate the trace of the mean square error, tr(MSE_j(s, i)), for all subsamples i = 1, ..., 25, horizons s = 1, ..., 12, and models j ∈ {k, m}. For each pair (s, i), we compute the scoring rule C. If C < 0 then the FM is underperforming the MAR. In our applications, evaluating U_m and U_k would require the estimation of a VAR on n variables, which might not be feasible for large n, thus forecasting error estimates are obtained with a VAR model on the n* variables of interest.
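The comparison step can be sketched as follows. The exact expression for C was not reproduced above, so this sketch only assumes the stated sign convention: C shares its sign with the MSE-trace difference between the MAR and the FM. The error arrays are illustrative stand-ins for the two fitted models' forecast errors.

```python
import numpy as np

def trace_mse(errors):
    """Trace of the sample MSE matrix from an (R, n) array of forecast errors."""
    R = errors.shape[0]
    return float(np.trace(errors.T @ errors / R))

rng = np.random.default_rng(5)
R, n = 25, 57
e_fm = rng.standard_normal((R, n))           # stand-in FM forecast errors
e_mar = 0.9 * rng.standard_normal((R, n))    # stand-in MAR forecast errors

C = trace_mse(e_mar) - trace_mse(e_fm)       # assumed sign convention only
fm_underperforms = C < 0                     # C < 0: FM underperforming MAR
```

In the paper the rule is evaluated ex-ante on a training sample; the ex-post trace comparison above is what the rule is scored against in the reliability exercise.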
We study the reliability of the proposed scoring rule by evaluating the frequency of true positives and false negatives in our 25 samples. More specifically, we count the proportion of times the ordering of the models induced by the scoring rule C agrees with the one induced by their MSE. We also compute the average performance over samples. In Fig. 1 we report the frequency f(s) (vertical axis) at different forecasting horizons s (horizontal axis) for the three datasets (different rows). Our results show that the reliability of the scoring rule changes over the horizons and depends crucially on the dependence structure between the series. The left plot shows that larger simultaneous dependence, ̺ = 0.7, induces an increase in reliability at short-term horizons. In the right plot, one can see that larger lagged dependence levels, α = 0.5, have an impact on both long- and short-term horizons.
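The agreement frequency f(s) can be computed as sketched below, with simulated stand-ins for the per-sample values of the ex-ante rule C and the ex-post MSE-trace difference (both arrays are illustrative; the paper's values come from the fitted models):

```python
import numpy as np

rng = np.random.default_rng(6)
S, I = 12, 25                                 # horizons s and subsamples i
C = rng.standard_normal((S, I))               # stand-in for the ex-ante rule C(s, i)
dMSE = C + 0.5 * rng.standard_normal((S, I))  # stand-in for tr(MSE_m) - tr(MSE_k)

# f(s): share of the I samples where the rule and the MSE ordering agree
f = (np.sign(C) == np.sign(dMSE)).mean(axis=1)
```

Each element of f is a proportion in [0, 1]; plotting f against s reproduces the layout of Fig. 1 (frequency on the vertical axis, horizon on the horizontal axis).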

Empirical results
We consider some well-known datasets used in macroeconomics. The first one is an extension of the Stock and Watson (2005) dataset (SW dataset). The second dataset is the one described in McCracken and Ng (2015) (NM dataset). The dataset includes 120 time series sampled at monthly frequency from September 1992 to November 2016 and covers most of the time series used in the previous database. It exhibits some appealing features. First, it is updated monthly using the FRED database. Second, it is publicly accessible in an easy manner, facilitating comparison of related research and replication of empirical work. McCracken and Ng (2015) show that factors extracted from this dataset share similar predictive content with factors based on the Stock and Watson (2005) dataset.
The third dataset is a collection of 68 macroeconomic series from Bloomberg (BL dataset), sampled at monthly frequency from September 1993 to November 2017 for the US, EU and Japan. Precisely, it includes 31 variables for the US; 17 for the EU; 18 for Japan; and the two exchange rates euro/yen and euro/US dollar. The list of variables is similar to the previous databases and contains different measures of core and headline prices; labor market variables; imports and exports; industrial production; consumption; sales; leading indicators; and several interest rates. See Tables 1-2. We report in Figure 3 the 25 sets of 12-step-ahead forecasts (red lines) for various US variables obtained by fitting FM (first line) and MAR (second line) models on the SW, NM and BL datasets. We report in Fig. 2 the sequences of generated forecasts from the two models for two variables of interest, industrial production and unemployment rate, for the twelve horizons and the three databases. Also, the forecasts for the EU and Japan variables are reported in Figure 4.

INSERT FIGURES 3-4 HERE
In the FM we choose the number of factors (column k in Tab. 3) such that it is the same in all datasets and the percentage of variance explained is above 75% (column V in Tab. 3). We evaluate the mean square error expressions given in Th. 1-2, MSE_j(s, i), for all subsamples i = 1, ..., 25, horizons s = 1, ..., 12, and models j ∈ {k, m}.

INSERT TABLE 3 HERE
The average MSE over samples and horizon is reported in Tab. 3, whereas the horizon-specific MSE averaged over samples is reported in the first and second column of Fig. 5.

INSERT FIGURE 5 HERE
The main evidence is that the FM performs on average less accurately than the MAR in all three databases. The difference varies across databases: the average MSE for the FM equals 0.097 versus 0.094 for the MAR in the SW database; 0.039 versus 0.034 in the NM database; and 0.021 versus 0.017 in the BL database. In the NM database the loss is economically important, up to 8%; the difference is smaller for the SW database, where the two models perform similarly. Looking at Fig. 5, the MAR outperforms on average the FM mainly at short horizons in the SW and NM databases. Maximum and minimum errors of the two models are more comparable over horizons. In the case of the Bloomberg database, the MAR provides more accurate forecasts more uniformly across horizons. Therefore, our finding is in line with the recent literature on the lack of superior forecast ability of FM models when used for some variables or forecast horizons; see, for example, Medeiros et al. (2018).
In all three cases, predictions of MAR models are less volatile and flatter than those of the FM (e.g., see the trending behaviour of the forecasts in Fig. 2). Our results also provide some guidelines for improving the predictive ability of FM models. In the forecasting regression, factors should be chosen specifically for each predicted variable, by applying some model selection procedure, or by estimating factor loadings with regularization techniques proposed in sparse factor and principal component analysis (e.g., see Carvalho et al., 2008; Zou et al., 2011; Qi et al., 2013; Lan et al., 2014; Ročkovà and George, 2016). Also, horizon-specific models could be considered and the dynamic properties accounted for when extracting the factors. We leave these issues for further research.

Conclusion
This paper establishes a set of conditions to be satisfied for a factor model to outperform a multivariate autoregressive model in terms of mean square forecasting error. The conditions result in a scoring rule that can be used to select between the two models or to combine them. Furthermore, we show the performance of the scoring rule in simulation exercises where both the factor model and the multivariate autoregressive model are misspecified. Then, the analysis continues with three well-known macroeconomic datasets. The empirical results show that the frequency of the true positive signals is larger when the forecasting performances of the factor model and the multivariate autoregressive model differ substantially. It also documents that factor models do not provide more accurate forecasts uniformly across variables and horizons.
A Proof of the results in the paper

A.1 Proof of the result in Remark 1

Let X be the T × n sample matrix, A_k the matrix with the first k orthonormal eigenvectors of the sample covariance of X, and F_k the associated T × k factor matrix such that F_k = X A_k. The least square estimator B̂_k of B_k in the regression model X = F_k B_k + Ξ is B̂_k = (F′_k F_k)^{-1} F′_k X = A′_k, so that β̂_k = vec(A′_k).

A.2 Linkage between factor model and VAR(1) model

Let x_t be an n-dimensional real-valued vector, with zero mean and stationary variance-covariance matrix Γ_X. Then Γ_X a_i = λ_i a_i, where a_i and λ_i, i = 1, ..., n, are the eigenvectors and eigenvalues of Γ_X and A is the (n × n) orthogonal matrix with the normalized eigenvectors a_i, i = 1, ..., n, in its columns. With Λ the diagonal matrix with the eigenvalues λ_i, i = 1, ..., n, it follows that A′Γ_X A = Λ and Γ_X = AΛA′. The factors f_{n,t} are computed as f_{n,t} = A′ x_t and the first k factors as f_{k,t} = A′_k x_t, where A_k is the matrix of the first k columns of A and A′A = I, A′_k A_k = I_k. Therefore, we can compute x̃_{k,t} = A_k f_{k,t}, which can proxy the true x_{k,t}. Let us now assume that x_t is generated by a VAR(1) model x_t = Φ x_{t−1} + ε_t. By pre-multiplying by A′_k, we have f_{k,t} = A′_k Φ x_{t−1} + A′_k ε_t. Then, we introduce Φ_k such that A′_k Φ = Φ_k A′_k. By pre-multiplying by A_k and recalling x̃_{k,t} = A_k f_{k,t}, we find the VAR(1) representation for x̃_{k,t}.

A.3 Proof of the result in Theorem 1
Let x_{t+s|t} denote the best linear forecast under the DGP. By applying the infinite MA representation of f_{k,t+s}, we obtain the forecast error decomposition e_{t+s} = e_{t+s|t} + e_{k,t+s|t}, where e_{t+s|t} = x_{t+s} − x_{t+s|t} and e_{k,t+s|t} = x_{t+s|t} − x_{k,t+s|t}. Since the two components are uncorrelated, it follows that tr(MSE_k(s)) = tr(E(e_{t+s|t} e′_{t+s|t})) + tr(E(e_{k,t+s|t} e′_{k,t+s|t})).

A.4 Proof of the result in Theorem 2
Decompose the MSE as follows ((2005), p. 34). Recall that E(η_{k,t+1} η′_{k,t+1}) = Σ_k. Then Γ_{k,1} = diag{(γ_{1,1}, ..., γ_{k,1})′} is the first order autocovariance matrix and Γ_k = E(f_{k,t} f′_{k,t}) the covariance matrix. By using Eq. A.21 and A.22 with k = n, we conclude that the second term of the decomposition in Eq. A.20 can be written as

A.5 Proof of the result in Theorem 3
From the properties of the factors given in Section 3, the factor mean forecasting error can be written in terms of Φ_k and Γ_{k,1}. By applying the trace operator and using the decomposition in Th. 1, one obtains tr(MSE_k(s)) = tr(E(e_{t+s|t} e′_{t+s|t})) + tr(E(e_{k,t+s|t} e′_{k,t+s|t})). Finally, since Σ_{j=1}^n σ²_j = Σ_{j=1}^n λ_j, applying the decomposition in Th. 2 we obtain the inequality if and only if tr(MSE_k(s)) ≤ tr(MSE_m(s)).

Figures 3-4 caption: In each panel, the forecasts for the SW (first column), NM (second column) and BL (third column) datasets obtained by applying FM (first line) and MAR (second line) models.

Figure 5 caption: Estimates of the MSE in Th. 1-2 (solid line) and 5% and 95% quantiles, for the FM (first column) and MAR (second column) models. Ex-ante scoring rule performance (third column) measured as the frequency (vertical axis) of a true positive signal f(s), over different horizons s = 1, ..., 12 (horizontal axis). Top: SW dataset. Middle: NM dataset. Bottom: BL dataset.