Macroeconomic indicators alone can predict the monthly closing price of major U.S. indices: Insights from artificial intelligence, time-series analysis and hybrid models
Introduction
The prediction of stock prices has continued to fascinate both academia and business. The question driving early stock market research was: “To what extent can the past history of a common stock's price be used to make meaningful predictions concerning the future price of the stock?” [1]. The Efficient Market Hypothesis (EMH) [1] and the Random Walk Theory [2] provided a theoretical foundation for tackling this question. These theories posited that stock prices cannot be forecast because they are driven by new information rather than by present or past prices. Thus, prices follow a random walk and cannot be predicted accurately.
A growing number of studies [3], [4], [5], [6], [7] provide evidence contrary to what the EMH and random walk theory suggest. These studies show that the stock market can be predicted to some degree and therefore call the EMH's underlying assumptions into question. Moreover, many practitioners point to two main examples as evidence that the stock market can be predicted accurately: (a) Warren Buffett's ability to consistently beat the S&P 500 index [8], [9]; and (b) the successful prediction of the 2008 stock market crash based on the “housing bubble”, popularized by the New York Times bestseller (later adapted into a film) “The Big Short: Inside the Doomsday Machine” [10].
Based on the type of prediction model used, the stock market prediction literature can be divided into statistical time-series models and machine learning techniques [11]. According to the review in [12], autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models are the most commonly used time-series approaches for stock prediction. Despite their widespread use, these models have several limitations: (a) the model form must be prespecified [13]; (b) estimation error has a growing effect as the models become more complex [13]; and (c) their predictive ability is sub-par compared with machine learning methods [14], [15], [16]. Machine learning (ML) techniques, on the other hand, can be categorized into (a) non-voting approaches, which include artificial neural networks (ANNs) [17], support vector machines (SVMs) [18], [19], [20], and classification and regression trees (CART) [20], [21]; and (b) voting/ensemble [22], [23], [24] and hybrid methods [25], [26], [27]. The reader is referred to [14], [15] for detailed reviews of ML stock market prediction methods.
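To make limitation (a) concrete, the sketch below fits a tiny ARIMA(1,1,0) model with drift by conditional least squares and returns a one-step-ahead forecast. It is a minimal illustration under stated assumptions, not the ARIMA/GARCH configuration used in this paper: the order (1,1,0) is fixed in advance by the analyst, which is exactly the kind of prespecification the text refers to. The function name arima_110_forecast is hypothetical, and GARCH volatility modeling is not shown.

```python
from statistics import fmean


def arima_110_forecast(prices):
    """One-step-ahead forecast from an ARIMA(1,1,0) model with drift.

    The order (p, d, q) = (1, 1, 0) is fixed in advance (an assumption);
    the AR coefficient and drift are estimated by conditional least squares
    on the first differences of the series.
    """
    if len(prices) < 4:
        raise ValueError("need at least 4 observations")
    # d = 1: model the first differences dy_t = y_t - y_{t-1}
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    x = diffs[:-1]  # lagged differences dy_{t-1}
    z = diffs[1:]   # current differences dy_t
    mx, mz = fmean(x), fmean(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    phi = sxz / sxx if sxx else 0.0  # AR(1) coefficient (0 if no variation)
    c = mz - phi * mx                # drift term
    next_diff = c + phi * diffs[-1]  # forecast of the next difference
    return prices[-1] + next_diff    # integrate back to the price level
```

For a series rising by exactly one unit per month, arima_110_forecast([1, 2, 3, 4, 5]) returns 6.0. Changing the structure (say, to ARIMA(2,1,1)) would require a different estimator, which illustrates why a prespecified form can be limiting.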
Based on the above discussion and the reviews in [12], [14], [15], [17], four important observations can be made. First, most (if not all) stock market prediction papers used some form of the previous price as a predictor/feature. In our estimation, this follows from a simple logic: if the market can be predicted, then the previous price (or a feature derived from it, e.g., a technical indicator) should explain some of the variation in prices/returns. Second, only a small subset of ML papers considered macroeconomic predictors [28], [29], [30], [13], [19] (see Table 1 for a summary of their contributions). In our estimation, this is because (a) the majority of stock market prediction papers focus on short-term predictions, and (b) macroeconomic indicators are released, at best, monthly. Thus, a paper focused on short-term (e.g., daily) prediction cannot effectively use a predictor that remains unchanged across many consecutive prediction periods. The third observation relates to the papers summarized in Table 1. These papers typically showed that macroeconomic indicators can be strong predictors of price when machine learning models are used. However, these papers generally (a) examined a single index, and (b) used both macroeconomic indicators and past prices as predictors. It is therefore unclear whether their results generalize to other indices, and whether macroeconomic indicators alone can predict future prices. Fourth, ensemble and hybrid approaches improve prediction results through voting/averaging, which is expected based on the data mining literature. Based on these observations, this paper will investigate the utility of macroeconomic variables (including those highlighted in [10]) in predicting the one-month-ahead price of major U.S. stock and sector indices. The overarching hypothesis is that the prices of different indices can be predicted fairly accurately by different economic indicators.
Such effects will be quantified and validated using a novel soft computing approach.
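The one-month-ahead, macro-only setting can be illustrated by how a training set might be assembled: indicators known at the end of month t are paired with the index close at the end of month t+1, and no past prices enter the feature vector. This is a minimal sketch assuming the indicators and closes are already aligned month by month; the helper name make_one_month_ahead_dataset and its release_lag parameter (for indicators published with a delay) are hypothetical, not part of the paper's described pipeline.

```python
def make_one_month_ahead_dataset(macro_rows, closes, release_lag=0):
    """Pair macroeconomic indicators with the index close one month later.

    macro_rows[t] holds the indicator values for month t and closes[t] the
    index close for month t. The hypothetical release_lag shifts features
    back to account for indicators that are published with a delay.
    Past prices are deliberately not used as features.
    """
    if len(macro_rows) != len(closes):
        raise ValueError("macro_rows and closes must cover the same months")
    if release_lag < 0:
        raise ValueError("release_lag must be non-negative")
    X, y = [], []
    for t in range(release_lag, len(closes) - 1):
        X.append(list(macro_rows[t - release_lag]))  # latest indicators known at end of month t
        y.append(closes[t + 1])                      # target: close in month t + 1
    return X, y
```

With three months of data and no release lag, the last month's indicators are dropped because their target (the following month's close) is not yet observed.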
In this paper, the main research questions are: (a) whether macroeconomic factors can predict the 1-month-ahead price of four major U.S. stock indices (the Dow Jones Industrial Average Index, $DJI; the NYSE Composite Index, $NYA; the NASDAQ Composite Index, $IXIC; and the S&P 500 Index, $GSPC) and of the nine U.S. sector indices; and (b) if the answer to question (a) is “yes”, which factors are the most predictive of each index. To examine these research questions, a two-stage experimental framework is proposed.
The first stage comprises two main phases. In phase I, an automated data acquisition procedure is developed to capture the monthly values of the different macroeconomic factors (i.e., the independent variables) and the closing prices of the different stock indices (i.e., the response variables). In phase II, four ensemble models and three time-series models are used to predict the closing price of the different indices. The ensemble models chosen for the analysis are: (i) quantile regression random forest (QRF), (ii) quantile regression neural network ensemble (QRNN), (iii) bagging regression ensemble (BAGReg), and (iv) boosting regression ensemble (BOOSTReg). These were chosen because they are the most commonly used ensembles for continuous predictions. The performance of these ensembles is then compared with that of ARIMA and GARCH models, as well as a deep long short-term memory recurrent neural network (LSTM) for time-series forecasting [32] (see recent applications to stock prediction in [33], [34], [35]). If the overarching hypothesis of this paper is true (i.e., medium-term index prices are driven by macroeconomic factors), then one would expect the ensemble methods to outperform the time-series methods, since the information affecting the medium-term price resides in the economy (and is not completely contained in past prices).
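Of the four ensembles, bagging is the simplest to show compactly. The sketch below is a generic bootstrap-aggregation regressor, assuming single-split regression stumps as base learners; the excerpt does not state which base learners or hyperparameters BAGReg used, so fit_stump, bagging_regressor, n_estimators=25 and the stump choice are all illustrative assumptions.

```python
import random
from statistics import fmean


def fit_stump(X, y):
    """Least-squares regression stump: the best single split on one feature."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue  # a split must leave data on both sides
            ml, mr = fmean(left), fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # no valid split (e.g., identical rows): predict the mean
        m = fmean(y)
        return lambda row: m
    _, j, thr, ml, mr = best
    return lambda row: ml if row[j] <= thr else mr


def bagging_regressor(X, y, n_estimators=25, seed=0):
    """Fit stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: fmean(m(row) for m in models)  # aggregate by averaging
```

Averaging over bootstrap resamples mainly reduces variance, which is one reason ensembles of this kind tend to be more stable than a single tree fitted to the full sample.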
To validate the insights gained from the first stage, a hybrid of the LSTM and ensemble models is constructed and used in the second stage. In the hybrid approach, the LSTM model (chosen given its nonparametric nature) is used to predict the closing price of the different indices (i.e., the same approach as in Stage 1). The residuals from this model then serve as the prediction target (i.e., the dependent variable) for the four ensemble models, and the LSTM predictions are adjusted by adding the corrections predicted by the ensembles. As in the first stage, the macroeconomic indicators are used to predict the 1-month-ahead residuals from the LSTM model. Thus, the hybrid approach is used to test the following secondary hypothesis: the errors/residuals from the time-series models are not completely random and can be explained by the macroeconomic indicators.
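The residual-correction logic of the hybrid can be sketched in a few lines. To keep the example self-contained, two substitutions are assumed: the LSTM is represented only by its forecasts (base_train_preds, base_next), and the four ensembles are replaced by a one-predictor least-squares line (fit_simple_linear) fitted to the residuals. Both function names are hypothetical; only the structure (residuals as the target, base forecast plus predicted correction) follows the description above.

```python
from statistics import fmean


def fit_simple_linear(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda v: a + b * v


def hybrid_forecast(base_train_preds, actual_train, macro_train, base_next, macro_next):
    """Base time-series forecast corrected by a residual model on a macro indicator."""
    # Stage-2 target: residuals e_t = y_t - yhat_t of the base (time-series) model
    residuals = [y - p for y, p in zip(actual_train, base_train_preds)]
    # Stand-in for the ensembles: learn residuals from the macro indicator
    correction_model = fit_simple_linear(macro_train, residuals)
    # Hybrid prediction = base forecast + predicted correction
    return base_next + correction_model(macro_next)
```

For example, when the base forecasts undershoot by an amount that tracks the indicator exactly, the correction recovers the gap: hybrid_forecast([9, 10, 11, 12], [10, 12, 14, 16], [1, 2, 3, 4], 13, 5) returns 18.0. If the residuals were pure noise, the fitted correction would be close to zero on average, which is the intuition behind the secondary hypothesis.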
The remainder of this paper is organized as follows. Section 2 presents the macroeconomic factors that are used in this study and provides a justification for their selection. In Section 3, the proposed two-stage approach is detailed. The results of the experimental study for Stages 1 and 2 are shown in Sections 4 and 5, respectively. Finally, the main contributions of the paper, its practical implications, and suggestions for future work are highlighted in Section 6.
Section snippets
Justification for the macroeconomic indicators used
Researchers have identified several macroeconomic factors that could affect stock market movements, including oil prices [36], [37], [38], housing prices [39], interest rates [40], foreign markets [41], and inflation [21]. Ref. [42] explored the effects of important macroeconomic variables on stock market returns and concluded that industrial production, risk premium change, yield curve twist, and inflation all have significant effects on the variability
Two-stage approach to demonstrate the utility of macroeconomic indicators in predicting monthly stock prices
Fig. 1 presents the model-building process. In Stage 1, data are first collected from several online sources. The data acquisition phase is divided into two main steps, in which the monthly closing prices of the dependent indices and the independent macroeconomic predictors (used in the ensemble models) are obtained. Then, in phase II, the variables are selected using three machine learning models, and the selections are consolidated into one final set of features. Phase III compares the seven
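The excerpt does not say how the outputs of the three selection models are consolidated into one feature set. One simple possibility, shown here purely as an assumption, is a vote: keep a feature if at least min_votes models selected it. Setting min_votes to 1 yields the union and setting it to the number of models yields the intersection, echoing the union/intersection strategies in the cited work on combining feature selection methods. The function consolidate_features and the indicator names in the example are hypothetical.

```python
from collections import Counter


def consolidate_features(selected_sets, min_votes=2):
    """Keep features chosen by at least `min_votes` of the selection models.

    min_votes=1 returns the union of all sets; min_votes=len(selected_sets)
    returns their intersection. Output is sorted for reproducibility.
    """
    votes = Counter(f for s in selected_sets for f in set(s))
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

With three hypothetical selections, consolidate_features([{"CPI", "OIL", "HPI"}, {"CPI", "OIL"}, {"CPI", "UNRATE"}]) returns ['CPI', 'OIL'] under the default two-vote rule.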
Stage 1: Results and discussion
In this section, the Stage 1 results for the proposed method are presented. First, the variable selection results are highlighted, identifying irrelevant and redundant features that do not contribute, or contribute only minimally, to the predictive models. Then, the results of the seven prediction models (four ensembles and three time-series models) are presented. The performance of these models is compared using three metrics, as described in Section 3.1.3. To facilitate the replication of
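The three comparison metrics are defined in Section 3.1.3, which is not part of this excerpt. As an assumption, the sketch below computes three metrics that are common in forecasting studies (RMSE, MAE and MAPE); they may differ from the ones the paper actually reports. The function name forecast_metrics is hypothetical.

```python
from math import sqrt


def forecast_metrics(actual, predicted):
    """Return RMSE, MAE and MAPE (in percent) for paired forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = sqrt(sum(e * e for e in errors) / n)  # penalizes large misses more heavily
    mae = sum(abs(e) for e in errors) / n        # average absolute miss, in price units
    # Scale-free percentage error; assumes no actual value is zero
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}
```

For instance, forecast_metrics([100, 200], [110, 190]) gives an RMSE and MAE of 10.0 and a MAPE of about 7.5 percent. A scale-free measure such as MAPE is convenient when comparing indices whose price levels differ by orders of magnitude.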
STAGE 2: Experimental results and discussion
In this section, the evidence for the secondary hypothesis, “the errors/residuals from the time-series models are not entirely random and can be explained by the macroeconomic indicators”, is evaluated. As explained in Section 3, the results for the proposed hybrid Deep LSTM-Ensemble formulation are presented here. Recall that the residuals from the Deep LSTM model, e_t, are used as the target for the four ensembles analyzed. The predicted residuals are then used to correct the errors in the
An overview of the impacts and contributions of this paper
The overarching goal of this paper was to investigate whether macroeconomic indicators drive the monthly prices of the main stock and sector indexes in the U.S. To investigate this hypothesis, a two-stage approach was proposed. The first stage comprised three phases. In phase I, data from 01/1992 to 10/2016 were acquired, covering the monthly values of 13 major indexes and 23 potentially relevant macroeconomic indicators. Phase II involved the use of variable selection
References
Twitter mood predicts the stock market. J. Comput. Sci. (2011).
Stock index forecasting based on a hybrid model. Omega (2012).
The use of data mining and neural networks for forecasting stock market returns. Expert Syst. Appl. (2005).
Surveying stock market forecasting techniques – Part II: Soft computing methods. Expert Syst. Appl. (2009).
Financial time series forecasting using support vector machines. Neurocomputing (2003).
Forecasting stock market movement direction with support vector machine. Comput. Oper. Res. (2005).
Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decis. Support Syst. (2010).
Predicting stock returns by classifier ensembles. Appl. Soft Comput. (2011).
Tweet sentiment analysis with classifier ensembles. Decis. Support Syst. (2014).
A hybrid approach by integrating wavelet-based feature extraction with MARS and SVR for stock index forecasting. Decis. Support Syst. (2013).
Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl.
A hybrid model based on differential fuzzy logic relationships and imperialist competitive algorithm for stock market forecasting. Appl. Soft Comput.
Using neural networks for forecasting volatility of S&P 500 index futures prices. J. Bus. Res.
Oil price shocks and stock market activity. Energy Econ.
Oil price shocks and stock markets in the US and 13 European countries. Energy Econ.
A probabilistic data-driven framework for scoring the preoperative recipient-donor heart transplant survival. Decis. Support Syst.
Predicting heart transplantation outcomes through data analytics. Decis. Support Syst.
Evolving and clustering fuzzy decision tree for financial time series data forecasting. Expert Syst. Appl.
Using artificial neural network models in stock market index prediction. Expert Syst. Appl.
Applying artificial neural networks to prediction of stock price and improvement of the directional prediction index: case study of PETR4, Petrobras, Brazil. Expert Syst. Appl.
Machine learning the harness track: crowdsourcing and varying race history. Decis. Support Syst.
Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing.
A hybrid ARIMA and support vector machines model in stock price forecasting. Omega.
Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction. Appl. Energy.
Support vector regression with chaos-based firefly algorithm for stock market price forecasting. Appl. Soft Comput.
A decision support system for stock investment recommendations using collective wisdom. Decis. Support Syst.
Measures of risk. J. Bank. Finance.
The behavior of stock-market prices. J. Bus.
The Random Character of Stock Market Prices.
The efficient market hypothesis and its critics. J. Econ. Perspect.
Constructivist and ecological rationality in economics. Am. Econ. Rev.
Social mood and financial economics. J. Behav. Finance.
The financial/economic dichotomy in social behavioral dynamics: the socionomic perspective. J. Behav. Finance.
Here's How Badly Warren Buffett Beat the Market.
Buffett Beats the S&P for the 39th Year.
The Big Short: Inside the Doomsday Machine (movie tie-in).
Forecasting volatility in financial markets: a review. J. Econ. Lit.
Application of data mining techniques in stock markets: a survey. J. Econ. Int. Finance.
An analysis of the performance of artificial neural network technique for stock market forecasting. Int. J. Comput. Sci. Eng.
Stock market prediction performance of neural networks: a literature review. Int. J. Econ. Finance.