Testing big data in a big crisis: Nowcasting under Covid-19

During the Covid-19 pandemic, economists have struggled to obtain reliable economic predictions, as standard models became outdated and their forecasting performance deteriorated rapidly. This paper presents two novelties that forecasting institutions could adopt in unconventional times. The first is the construction of an extensive data set for macroeconomic forecasting in Europe: we collect more than a thousand time series from conventional and unconventional sources, complementing traditional macroeconomic variables with timely big data indicators and assessing their added value in nowcasting. The second is a methodology to merge an enormous amount of non-encompassing data with a large battery of classical and more sophisticated forecasting methods in a seamless dynamic Bayesian framework. Specifically, we introduce an innovative “selection prior” that is used not as a way to influence model outcomes, but as a selection device among competing models. By applying this methodology to the Covid-19 crisis, we show which variables are good predictors for nowcasting gross domestic product and draw lessons for dealing with possible future crises.

This section provides additional details about the data set and on the transformations applied to each variable. We select data starting in January 1995 up to the most recent release available. Our data set is updated with the most recently available information every week, and the variables observed at weekly or daily frequency are aggregated by taking monthly averages. Most data are publicly available; a few series are confidential and were provided by internal sources of the European Commission. Table A.1 reports the variables included as additional regressors in our models falling under the "fat" data category. We consider stock and volatility indexes to proxy the present state of financial markets. We crawl the complete DBnomics data sets to extract, at monthly frequency, all financial and macroeconomic variables related to the countries under analysis. Moreover, we include the complete list of variables described in Schumacher (2016). Table A.2 describes all the regressors defined as "big" data, namely variables extracted from alternative sources that are not commonly used in economic forecasting (e.g., air quality, mobility and news indicators, among others). This type of data has three main advantages: (i) it is typically observed at a higher frequency (e.g., daily) than standard official economic statistics; (ii) it is released in real time, with short or no publication delay and no later revision; (iii) it may provide early warnings when a rapid deterioration of economic conditions occurs. However, the signal extracted from these alternative data sources is often noisy and its relevance for forecasting purposes is harder to evaluate. Furthermore, alternative data come with different starting dates, raising the question of how to properly compare different models across time points.
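The monthly aggregation of the higher-frequency indicators amounts to averaging all observations within a calendar month; a minimal sketch in Python (the dates and values below are hypothetical):

```python
from collections import defaultdict
from datetime import date

def monthly_average(series):
    """Aggregate daily (or weekly) observations, given as a
    date -> value mapping, to monthly means."""
    buckets = defaultdict(list)
    for day, value in series.items():
        buckets[(day.year, day.month)].append(value)
    return {ym: sum(vals) / len(vals) for ym, vals in sorted(buckets.items())}

# Hypothetical daily readings of a big data indicator
daily = {date(2020, 3, 1): 1.0, date(2020, 3, 2): 3.0, date(2020, 4, 1): 2.0}
monthly = monthly_average(daily)  # {(2020, 3): 2.0, (2020, 4): 2.0}
```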
The majority of the big data variables in Table A.2 are publicly available and collected from published sources: for instance, aviation figures are collected from Iacus et al. (2020), mobility information based on mobile phone data comes from Santamaria et al. (2020), and text-based sentiment indicators are downloaded from Barbaglia et al. (2021). Among all the variables listed in Table A.2, the GDELT indicators and Google Trends are the only novel additions to the final data set. From GDELT we extract media attention, sentiment and emotion indicators belonging to five main topics: macroeconomics and structural policies, economic growth, social protection and labour, macroeconomic vulnerability and debt, and disease. The GDELT platform collects real-time news stories worldwide and, using state-of-the-art natural language processing techniques (see Leetaru and Schrodt, 2013), extracts themes according to popular domain-expert topical taxonomies and retrieves sentiments and emotions from news. From this vast amount of data, we select only the narratives from newspapers belonging to the four countries of interest and focus on articles having at least two keywords related to each specified theme. We use the World Bank Topical Taxonomy to identify the primary focus (topic) of each article and select the relevant narratives. From this subset of news, we construct three different sets of indicators. The first captures media attention to the five topics mentioned above (news volume): for each country and each topic, our measure is the count of the total number of stories focusing on that specific theme, normalized by the overall number of stories published in the country. The second set of indicators provides tonality measures of the selected news, calculated using a generalist GDELT built-in dictionary and three dimensions of the Loughran and McDonald (2011) dictionary: positive, negative and uncertainty.
We normalize these metrics by the overall number of stories published in a country and by the number of news items related to the topics of interest. The last set of indicators captures the emotional connotation of the selected narratives. We collect the word count of emotions belonging to two dictionaries, namely the Regressive Imagery dictionary of Martindale (1987) and the WordNet Affect of Strapparava and Valitutti (2004). From the first dictionary we consider only its anxiety dimension, while from the second we select the following dimensions: anger, contempt, disgust, fear, happiness, sadness and surprise. We also retrieve the happiness score proposed by Dodds et al. (2014). All the emotional measures are normalized by the overall number of stories published in a country and by the number of news items associated with the topics of interest.
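The normalization step shared by the indicator sets can be sketched as follows; the topic labels and story counts are hypothetical, while in the application they come from the GDELT queries described above:

```python
def news_volume(topic_counts, total_stories):
    """Media-attention indicator: for each topic, the number of stories
    on that topic divided by all stories published in the country."""
    return {topic: count / total_stories for topic, count in topic_counts.items()}

# Hypothetical counts for one country on one day
shares = news_volume({"economic_growth": 30, "disease": 120}, total_stories=600)
# shares == {"economic_growth": 0.05, "disease": 0.2}
```

The tonality and emotion indicators follow the same logic, with the numerator replaced by dictionary-based word counts.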

Model-specific transformations
In our application, we rely on BMA to produce the final forecasts based on the individual predictions provided by our set of models. Each of the underlying models requires a specific treatment of the input data and deals in its own way with the issues associated with imperfect data structures (e.g., time series with different start dates, missing values or the "ragged edge" as in Wallis, 1986). As each model handles input data differently, we decided not to apply any transformation to the data set, except for the ones suggested in the original work. For instance, the text-based indicators by Barbaglia et al. (2021) have mean zero and variance one. Below we provide detailed information on the transformation applied to the input data by each model.
For the ARDL unrestricted equations we adopt several transformations for each big data variable. We consider each transformation of a variable as an additional regressor; therefore, in different equations the same variable may enter in levels, in quarter-on-quarter or month-on-month growth rates, and with different lags (up to three months). Non-stationary specifications are dropped. Estimation samples may therefore vary, depending on the variable and the transformation. Ragged-edge issues are dealt with by using different lags and by dropping those models where the transformed indicator is not available.
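The set of candidate regressors generated from one monthly variable can be sketched as follows (a level series, its month-on-month and quarter-on-quarter growth rates, each lagged up to three months; the stationarity check that drops some specifications is omitted, and None marks unavailable observations):

```python
def candidate_regressors(x, max_lag=3):
    """Build the transformations entering the ARDL equations as
    alternative regressors: levels, m-o-m and q-o-q growth rates,
    each lagged up to max_lag months."""
    def growth(series, k):
        return [None] * k + [series[t] / series[t - k] - 1
                             for t in range(k, len(series))]
    def lag(series, l):
        return [None] * l + series[:len(series) - l]
    base = {"level": list(x), "mom": growth(x, 1), "qoq": growth(x, 3)}
    return {f"{name}_lag{l}": lag(s, l)
            for name, s in base.items() for l in range(max_lag + 1)}

# Hypothetical monthly index
regs = candidate_regressors([100.0, 102.0, 103.0, 105.0, 104.0])
# regs["level_lag1"][1] == 100.0 ; regs["mom_lag0"][1] is approx. 0.02
```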
The DFM picks, for each country in the analysis, only 20 variables at monthly frequency. More precisely, the model contains: six text-based sentiment indicators by Barbaglia et al. (2021) related to the overall state of the economy, the financial sector, industrial production, inflation and monopoly (these are daily variables aggregated to monthly frequency by averaging); six indicators produced by Eurostat, namely the construction, consumer, industrial, retail and service confidence indicators and an economic sentiment indicator; five surveys from the European Commission, namely the composite PMI output index, the construction PMI total activity index, the manufacturing PMI new orders index, the manufacturing PMI index and the services PMI business activity index; and two confidence indicators from the OECD, namely the consumer opinion surveys and the business tendency surveys (manufacturing). These variables enter the model without any transformation since they are stationary by construction (see European Union, 2006 and the appendix of Giannone et al., 2009 for more details). For all the variables we consider a sample period starting in 2000 and running until the last available monthly observation. We deal with the ragged-edge issue by imputing the last available observation to missing values, since no missing values are allowed in the estimation procedure.
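The ragged-edge imputation used for the DFM amounts to carrying the last available observation forward; a minimal sketch, where None marks a missing monthly observation:

```python
def fill_ragged_edge(series):
    """Impute missing observations with the last available value,
    so that the series has no gaps at the end of the sample."""
    out, last = [], None
    for value in series:
        if value is None:
            out.append(last)  # carry the last observation forward
        else:
            out.append(value)
            last = value
    return out

fill_ragged_edge([1.2, None, 1.5, None, None])
# -> [1.2, 1.2, 1.5, 1.5, 1.5]
```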
The MG-MIDAS, thanks to the GLSS step, is able to use all the available monthly variables that have no missing observations. Variables are made stationary by differencing whenever the unit-root null of an Augmented Dickey-Fuller test is not rejected. This conservative procedure could result in over-differencing, because we do not apply any correction for multiple testing (e.g., Bonferroni), but the model inference remains valid. The lags considered for the MIDAS weights are selected by estimating models for different sets of lags and choosing among them with the Bayesian Information Criterion. Different variables may enter the different lagged models according to their missing-value structure, which also takes the ragged-edge issue into account.
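The lag selection step can be sketched as follows: each candidate lag set is associated with the residual sum of squares of its estimated MIDAS regression (the RSS values below are hypothetical, standing in for the estimation step), and the set minimising the Bayesian Information Criterion is retained:

```python
import math

def bic(rss, n_obs, n_params):
    """Bayesian Information Criterion of a Gaussian regression,
    up to a constant: n*log(RSS/n) + k*log(n)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

def select_lags(candidates, n_obs):
    """Pick the lag set whose fitted model minimises the BIC.
    `candidates` maps each lag set to its model's RSS."""
    return min(candidates, key=lambda lags: bic(candidates[lags], n_obs, len(lags)))

# Adding a second lag lowers the RSS enough to pay its penalty;
# a third lag does not.
best = select_lags({(1,): 4.1, (1, 2): 3.2, (1, 2, 3): 3.1}, n_obs=80)
# best == (1, 2)
```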
The MF-BVAR considers only a limited subset of the available variables, namely GDP, the unemployment rate, the CPI, the business and consumer confidence indicators, the PMI activity indicator and the text-based sentiment measure about the overall state of the economy. Following Schorfheide and Song (2015), the variables enter the model in log levels, with the exception of the unemployment rate, which is not log-transformed. The series available at weekly or daily frequency are aggregated to monthly frequency by averaging; in this way, the variables entering the MF-BVAR are either quarterly or monthly. We select the input variables such that no missing values are present at the beginning of the sample. Regarding the presence of ragged edges, we fill the monthly series with missing data as in Ankargren and Jonéus (2021).
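The log-level convention, with the unemployment rate left untransformed, can be sketched as follows (series names and values are hypothetical):

```python
import math

NO_LOG = {"unemployment_rate"}  # series entering the model untransformed

def to_log_levels(data):
    """Transform each series to log levels, except the ones listed in
    NO_LOG, following the convention used for the MF-BVAR inputs."""
    return {name: series if name in NO_LOG
            else [math.log(v) for v in series]
            for name, series in data.items()}

transformed = to_log_levels({"gdp": [100.0, 105.0],
                             "unemployment_rate": [7.1, 7.4]})
# transformed["unemployment_rate"] is unchanged;
# transformed["gdp"][0] equals math.log(100.0)
```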
The ML models consider the full data set as input. The variables are aggregated at quarterly frequency by averaging and taken as percentage returns (e.g., the target variable GDP is log-transformed and taken in first differences). We expand the cross-section of the input data by adding a one-quarter lag of each variable. Regarding incomplete data structures, each ML model deals with missing values in its own way. For instance, the NN deals with missing entries by mean imputation, while the RF and XGB treat missing values as separate categorical labels.
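The quarterly aggregation with percentage returns, and the mean imputation used for the NN inputs, can be sketched as follows (input values are hypothetical):

```python
def quarterly_returns(monthly):
    """Average monthly observations to quarters, then take
    percentage returns between consecutive quarters."""
    quarters = [sum(monthly[i:i + 3]) / 3
                for i in range(0, len(monthly) - 2, 3)]
    return [quarters[t] / quarters[t - 1] - 1 for t in range(1, len(quarters))]

def mean_impute(series):
    """NN-style handling of missing entries: replace None
    by the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

quarterly_returns([100.0, 100.0, 100.0, 110.0, 110.0, 110.0])  # approx. [0.1]
mean_impute([1.0, None, 3.0])  # -> [1.0, 2.0, 3.0]
```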
Appendix B. Model predictive likelihood

Figure B.1 reports the predictive likelihood attached to each model listed in the main paper, namely autoregressive distributed lag models (ARDL), mixed-data sampling regressions (MIDAS), the mixed-frequency Bayesian vector autoregression (MF-BVAR) and dynamic factor models (DFM), together with the machine learning (ML) forecasting models. At each point in time, the predictive likelihood of each model is measured and reported (after normalization), so that a bar twice as large as another also reflects a predictive likelihood ratio of two between model types. The models' weights seem relatively stable before the pandemic, while the Covid-19 crisis imposes a different weighting scheme. On the one hand, in 2020 we observe higher weights attached to the ML and DFM models, which include big data variables as additional regressors and are able to fit more easily the non-linear shock imposed by the pandemic on the national economies. On the other hand, the weights attached to the MF-BVAR, MIDAS and BMA equations are relatively smaller in 2020: while these models are dominant in normal times, their performance deteriorates drastically with the outbreak of the pandemic.
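The normalization behind the plotted bars can be sketched as follows (model labels and likelihood values are hypothetical): the predictive likelihoods are rescaled to sum to one, so that ratios between bars equal predictive likelihood ratios between model types.

```python
def bma_weights(pred_likelihoods):
    """Normalise predictive likelihoods into model weights
    that sum to one at each point in time."""
    total = sum(pred_likelihoods.values())
    return {model: pl / total for model, pl in pred_likelihoods.items()}

# Hypothetical predictive likelihoods at one date
w = bma_weights({"ARDL": 1.0, "DFM": 2.0, "ML": 2.0})
# the DFM bar is twice as large as the ARDL bar: likelihood ratio of 2
```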