Oil Price Forecasting Using FRED Data: A Comparison between Some Alternative Models

: This paper investigates the forecasting accuracy of alternative time series models when augmented with partial least-squares (PLS) components extracted from economic data, such as Federal Reserve Economic Data, as well as Monthly Database (FRED-MD). Our results indicate that PLS components extracted from FRED-MD data reduce the forecasting error of linear models, such as ARIMA and SARIMA, but produce poor forecasts during high-volatility periods. In contrast, conditional variance models, such as ARCH and GARCH, produce more accurate forecasts regardless of whether or not the PLS components extracted from FRED-MD data are used.


Introduction
Energy consumption is required to power the worldwide economy, including most human activities [1]. Accordingly, nearly all worldwide industrial production activities, transportation systems, economic development projects, and machinery and equipment rely upon the energy production and distribution of oil, natural gas, coal, electricity, biofuels, waste, and other sources of supply and their derivatives to enable all of these integrated components of the worldwide economy to function [2]. This also means that energy prices that fuel all parts of the supply chain distribution system are a vital fraction of the price of every commercial product that requires shipping and electrical products to function [3].
Because of this dependence, unexpected steep increases in energy prices can suppress economic growth. They could generate inflation in countries that are net importers of oil, depending upon circumstances, duration, the type of oil, existing inventory, the size of the nation, energy substitution capability, and the number of industries affected [4,5]. Conversely, substantial energy price drops create serious budgetary problems for oilexporting countries [6]. At present, oil, which is by far the most highly consumed and concentrated energy source, provides somewhere between 30-35% of all of the energy needed to run the global economy, which is 5-10% more than coal, 10-15% more than natural gas, and 25-30% more than renewable and nuclear energy [7].
Despite the importance of energy in human activities, there needs to be more consensus on the roles of energy consumption and energy pricing mechanisms [8]. Historical examples such as oil demand shocks in the price of oil in 1973/74, 1979/80, 1986, 1997/2000, and 2003/2008 are examples of complex events that have both stimulating and depressing effects on national economies. There is also a lack of consensus about the predictive accuracy of energy commodity pricing models, e.g., [9,10]. The lack of understanding of how oil prices are developed and structured has resulted in unexpected, international high and low oil price volatility, spikes, and supply and demand shocks. Price volatility has a significant impact on global financial markets, economic development activity, and political stability, as well as on automobile fuel prices, airline transportation prices, and shipping rates. It also financially impacts national economies and consumer goods and services.
To alleviate the oil price modeling challenges, researchers have used a panoply of models spanning from Stevenson and Bear's [11] most trivial random walk to highly sophisticated nonlinear models, such as dynamic model averaging, not to mention the recent advances in commodity forecasting using machine learning models. Most of the literature on commodity price forecasting can be grouped into three categories: univariate time series, multivariate time series, and volatility forecasting. In turn, under each category, researchers have used linear and nonlinear models with and without exogenous variables.
The autoregressive and moving average (ARMA) model of Box and Jenkins [12] and its variants are widely used in time series forecasting. ARMA models stipulate that a stationary (A time series is stationary if its mean, variance, and covariance do not vary with time. If the series is not stationary, we can integrate it to make it stationary by taking successive differences, yielding an autoregressive integrated moving average (ARIMA) process) time series can be modeled as a weighted average of its past observations (AR process) and the past observations of the white noise process. The literature abounds with studies that use ARIMA as either a forecasting tool or a benchmark in predicting commodity prices, for example, [13][14][15][16] for oil prices, [17] for electricity prices, and [18] gold.
There are other variations of the ARIMA model, such as the seasonal autoregressive integrated moving average or SARIMA, the controlled autoregressive integrated segmented moving average or CARISMA [19], the fuzzy autoregressive integrated moving average or FARIMA [20], and the fuzzy seasonal autoregressive integrated moving average (FSARIMA) that combines the seasonal SARIMA with the fuzzy regression model [21]. The ARIMA model, Poisson, Markov, autoregressive, moving average, ARMA, and ARIMA processes are limited to short-range dependencies. In addition, the autoregressive fractional integral moving average model (ARFIMA) is identified as a fractional order signal processing technique as a generalization of the ARIMA and the ARMA models [22]. Because of this integration, the authors of [22] found the ARFIMA to offer broader applications than all of the previously mentioned time series analysis methodologies with both short-term and long-term dependence. In addition, the authors found the ARFIMA forecasting model to result in a superior fit compared with established integer-order models when working with spatial or time series data with long-range dependence (LRD), that is, long-memory or long-range persistence.
To overcome the limitations of the ARMA and its variant models, some researchers investigate whether oil price time series portray long-memory properties or volatility (see, for example, [23][24][25][26][27]). However, with evidence of nonlinear dependence, predictable returns, and volatility, long-memory properties contradict the validity of weak-form oil market efficiency. Although, at the same time, some studies have empirically tested the modeling and forecasting of long-memory fluctuations in crude oil markets with the use of generalized autoregressive conditional heteroskedasticity (GARCH) type models (e.g., [28][29][30][31]; researchers conducting these studies still think that long memories and volatility that appear in returns are irrelevant. Nevertheless, because it has been well-known that market shocks have a substantial simultaneous influence on returns and volatility, they have dual long-memory properties. Thus, based on this, researchers such as [32][33][34][35] use the joint ARFIMA-FIGARCH model to study the relationship between returns and volatility in economic and financial time series. This model is well suited to conducting such an analysis of a process demonstrating dual long-memory properties.
The recent availability of economic data, such as the Federal Reserve Economic Data (FRED) (For a detailed description of the FRED data, see [36]) and the Economic Policy Uncertainty data (EPU; the authors of [37] have presented new research perspectives to use the exogenous variable approach to forecast oil prices. For instance, the authors of [38] use FRED and EPU datasets and other financial variables to predict crude oil prices using various machine learning models. However, these data exhibit high multicollinearity among their variables. For instance, Appendix Table A1 indicates that approximately 50% of the variables in FRED data show a correlation coefficient greater than 0.50 in absolute value, with more than 26% exhibiting a correlation coefficient greater than 0.75. The high correlation coefficient and dimensionality of big data, such as FRED, make data reduction techniques suitable by projecting the high dimension data onto a few orthogonal components, eliminating multicollinearity, and reducing the computational cost when the big set is used (for instance, the dynamic model averaging, introduced by [39], does not allow more than 30 variables in the DMA R package). Principal component analysis (PCA) appears to be a popular choice among researchers in the area of economics and finance (see, for example, [40][41][42][43]). The partial least squares (PLS) method is another data reduction technique to avoid the multicollinearity problem while taking advantage of the data pattern. While the PCA technique uses only the explanatory variables to extract the principal components without considering how each variable relates to the dependent variable, the partial least squares method offers an alternative approach to PCA by capturing the relationship between explanatory variables and dependent variables when extracting the components.
This paper assesses the out-of-sample forecasting accuracy of several time series models in predicting the West Texas Intermediate (WTI) crude oil prices. We consider a basic ARIMA model, a seasonal ARIMA (SARIMA), the partial least squares using FRED economic data, augmented ARIMA and SARIMA models with exogenous variables (the PLS components, COVID-19, and the 2008 financial crisis dummy variables), an autoregressive conditional heteroskedasticity model (ARCH), and a generalized autoregressive conditional heteroskedasticity model (GARCH). We also ask and answer the following question: Can economic data, such as FRED, improve oil forecasting accuracy? We compare these alternative models using the mean absolute (MAE), the mean absolute percentage error (MAPE), and the root-mean-squared error (RMSE) of the out-of-sample predictions.
This paper adds to the existing literature on oil price forecasting in at least two aspects. First, we combine mean and variance models to produce more accurate oil price forecasts. The results show that considering the volatility equation along the mean equation improves the forecasting accuracy by 70%. Second, we use the FRED data after reducing its dimension using partial least-squares. This is crucial as the original FRED data are characterized by a high correlation among most variables, potentially inducing overfitting.
The rest of the paper is organized as follows. First, in Section 2, we describe the alternative models used for prediction. Then, Section 3 describes the data and estimation issues. Finally, in Sections 4 and 5, we present and discuss the results; in Section 6, we conclude and suggest future research avenues.

Methods
We start this section by presenting the partial least squares method to extract the most relevant components from FRED economic data, to explain the variation in crude oil price WTI . Then, we present ARIMA and SARIMA models and their variants and close the section by describing the ARCH-GARCH models.

Partial Least Squares
The starting point is the research's goal to explain the variation of the dependent variable y by the variation of the explanatory variables X. However, using the linear regression y = Xβ + poses the issue of collinearity in the variables of the matrix X.
To overcome this issue, in the case of principal component analysis, the matrix X of explanatory variables is decomposed into a smaller set of uncorrelated components. While PCA extracts the features without considering the dependent variable y, the partial least squares extract the components by maximizing the covariance between y and X. We project the explanatory variable space into components z, such as z = v X with v v = I, where I is the identity matrix. Similarly, we project the dependent variable space into components r, such that r = w y, where w w = I (y canrepresent multivariate dependent variables. In this study, y represents the oil price.) The loading factors v and w are then found by maximizing the covariance between the loading scores z and r, that is, max E zr = max E v Xw y] subject to v v = I and w w = I With v at hand, we can predict the dependent variable as y = zγ = v Xγ. Another alternative is to use a set of components explaining a given level of variance in y as regressors instead of X.

ARIMA-SARIMA Models
In these models, the dependent variable y t is a weighted average of the past value of y t and a white noise error term t , that is, where p is the number of autoregressive processes (AR) and q is the number of moving average processes (MA). The dependent variable needs to be stationary; otherwise, it should be differenced as many times as necessary to make it stationary, yielding the integration order, d. So, an ARIMA(p, d, q) is a process that has been differenced d times and decomposed into p AR and q MA processes. In addition, some time series exhibit seasonality in the AR, the d, and the MA processes. Therefore, the ARIMA model is modified accordingly, yielding the seasonal ARIMA model or SARIMA. The following equation summarizes the SARIMA model: where P indicates the number of seasonal autoregressive terms, Q represents the number of moving average terms, and s is the number of seasonal periods (for example, s = 12 for monthly data, and s = 4 for quarterly data). For instance, SARIMA(p = 1, d = 0, q = 1)(P = 1, D = 0, Q = 1) s=12 =SARIMA(1, 0, 1)(1, 0, 1) 12 is given by: In addition, the ARIMA and SARIMA models can be extended to include exogenous variables. This study will add partial least squares components to ARIMA and SARIMA to create ARIMAX and SARIMAX models.

ARCH-GARCH Models
The ARIMA, ARIMAX, SARIMA, and SARIMAX models assume the variance is constant over time and therefore do not capture the time-varying volatility commodity prices exhibit. One of the stylized facts of financial time series is volatility clustering: periods of high volatility follow periods of high volatility, and vice versa [44]. To account for volatility clustering, the mean equation model is augmented by the variance equation. There are two main approaches to modeling volatility: the autoregressive conditionally heteroscedastic (ARCH) model and the generalized autoregressive conditionally heteroscedastic (GARCH) model. The ARCH model, introduced by [45], simultaneously models the mean of the time series y t and the shocks variance σ 2 t as follows: where η t is a white noise process. Here, µ can follow any process described in the previous section, such as ARIMA or SARIMA. We obtain the GARCH model by adding lagged shock variances as suggested by [46]. Hence, the model in Equation (5) becomes: The use of these panoplies of models allows the researchers to benefit from their advantages and avoid their disadvantages. For example, the linear models of ARIMA and SARIMA present the advantage of linearity and parsimony that are simple to estimate. However, most financial and economic time series are highly nonlinear, implying the failure of linear models to provide better forecasts. To avoid this disadvantage, volatility models (ARCH and GARCH) are necessary, especially if the time series shows periods of volatility clustering, as in the case of crude oil prices. Nevertheless, when there is an asymmetry in the volatility clustering between good news and bad news, the GARCH models are inadequate.

Data
The data used in this study consist of monthly West Texas Intermediate crude oil prices obtained from the U.S. Energy Information Administration (EIA) and the Federal Reserve Economic Data-Monthly Database (FRED-MD) from February 1992 to October 2022. The FRED-MD data consist of 128 economic variables (FRED-MDalso has an oil price variable, but we excluded it and used WTI from US-EIA), classified into eight groups: (1) output and income, (2) labor market, (3) consumption and orders, (4) orders and inventories, (5) money and credit, (6) interest rate and exchange rates, (7) prices, and (8) the stock market. (Giventhe big number of variables in FRED-MD data, we do not provide descriptive statistics for this data. Interested readers are referred to [36] for a thorough description of the FRED-MD data.). Table 1 provides descriptive statistics of the oil prices and the percentage change in the oil price (price return), defined as r t = 100 * p t −p t−1 p t−1 , during the period of study. The WTI prices averaged $51.27 a barrel and had a standard deviation of $29.50 a barrel, indicating the high volatility of the oil prices during this period. The oil prices oscillated between a minimum of $11.28 a barrel and a maximum of $133.93 a barrel with relatively low skewness (0.53) and kurtosis (−0.81).  Oil prices increased again until July 2014 but fell sharply as the United States shale oil product expanded. From 2016 to 2019, oil prices exhibited relatively stable behavior but dropped dramatically during the COVID-19 pandemic. In recent months, oil prices exhibited an upward trend in response to the war between Russia and Ukraine. Moreover, oil prices showed high levels of volatility from 2006 to 2008, increasing by almost 120%, from $60 to approximately $134 a barrel. Within six months, they decreased by nearly 250% to $39 a barrel by February 2009. Since then, oil prices have substantially oscillated, indicating structural breaks in the industry.

Estimation Strategies
Before using the price in time series analysis, proceeding to some diagnostic checks, such as the stationarity and autoregressive conditional heteroscedasticity tests, is routine. This study applies different traditional tests to test the stationary nature of oil prices, namely, the augmented Dickey-Fuller test ( [47], ADF hereafter). For completeness, we also run the Phillips and Perron (PP) test [48] and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test proposed by [49]. In addition, we perform Engle's autoregressive conditional heteroscedasticity test [45] to test for the presence of volatility clustering or time-varying volatility.
For the specification of the ARIMA and SARIMA models, we follow the authors of [12]'s strategy, which consists of identification, estimation, and diagnostic checking stages. In the identification stage, we determine the autoregressive and moving average orders using the autocorrelation functions (ACF) and the partial autocorrelation functions (PACF) and their corresponding plots (see Figure A1 in the Appendix A.2). For the WTI crude oil price, the figure indicates that the autocorrelations do not die out quickly for the price variable, suggesting nonstationarity. However, the ACF dies out quickly for the percentage change in the price or price return, implying stationarity, as established by the stationarity tests in Table 2. For a moving average process MA(q), the autocorrelations are zero for j > q, and the partial autocorrelations tapper off. The cutoff point for the autocorrelation function is to compare the sample autocorrelations to ± √ 2 T , where T is the number of observations. For an AR(p) process, the partial autocorrelations θ ii = 0 f or i > p, and the autocorrelations die out quickly. We also compare the partial autocorrelation function T to decide about the AR(p) cutoff. If neither the autocorrelations nor the partial autocorrelations have a cutoff point, then the ARMA model may be appropriate [50].
In the empirical part, we split the data into a training set (75%) and a test set (25%). We also augment the FRED-MD data with a dummy representing the 2007-2009 recession (According to the U.S. National Bureau of Economic Research (NBER) website, the recession started in December 2007 and ended in June 2009. See https://www.nber.org/research/ data/us-business-cycle-expansions-and-contractions, accessed on 22 April 2023), and a dummy representing the COVID-19 pandemic. We then estimate and use the following models to forecast the WTI oil price:

1.
Partial least squares model (PLS). We use this model to forecast the WTI oil price and extract a reduced set of components that we use as explanatory variables in subsequent models.

2.
Ordinary least-squares (OLS) with the most important PLS components as explanatory variables, in addition to recession and COVID-19 dummy variables. 3.
Autoregressive integrated moving average (ARIMA) with p, d, and q determined by minimizing the Akaike information criterion. 4.
Autoregressive integrated moving average with the most important PLS components as exogenous variables (ARIMAX1) and the PLS components, in addition to recession and COVID-19 dummy variables as exogenous variables (ARIMAX2).

5.
Seasonal autoregressive integrated moving average (SARIMA) with p, d, q, P, and Q determined by minimizing the Akaike information criterion. 6.
Seasonal autoregressive integrated moving average with the most important PLS components as exogenous variables (SARIMAX1) and the PLS components, in addition to recession and COVID-19 dummy variables as exogenous variables (SARIMAX2). 7.
Autoregressive conditional heteroscedasticity (ARCH) model. Given its wide use for financial data and parsimony, we used ARCH(1). 8.
The autoregressive conditional heteroscedasticity model with the most important PLS components as exogenous variables (ARCHX (1)). 9.
The generalized autoregressive conditional heteroscedasticity (GARCH) model. Given its wide use for financial data and parsimony, we used GARCH(1,1). 10. The generalized autoregressive conditional heteroscedasticity model with the most important PLS components as exogenous variables (GARCHX(1,1)). To compare the forecasting performance of each model, we use the mean absolute (MAE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE) on the out-of-sample observations. Equation (7) gives the expression of each criterion used in model comparison.
where T os is the out-of-sample total number of observations, y t is the observed WTI oil price, andŷ t is the forecast WTI oil price. Table 2 shows the p-values of the unit root test result using the ADF, PP, and KPSS tests for the WTI oil price and its return. For the ADF and PP tests, the null hypothesis is the existence of a unit root (the time series is not stationary), while for the KPSS test, the null hypothesis is that the time series is stationary. The p-values indicate that the ADF and PP test statistics fail to reject the null hypothesis for the oil price. The KPSS test also confirms this result, rejecting the null hypothesis of the stationarity of the oil prices. In contrast, the ADF and PP tests reject the null hypothesis for the price return variable, indicating that the oil price is integrated with order one. A similar conclusion is reached using the KPSS test. To avoid spurious regression results, we estimate all of the models using the percentage change in the price (price return) but recover the forecast of the original price series.

Time Series Specification and Diagnostic Results
Moreover, Table 2 provides the results of Engle's test, testing the null hypothesis of the absence of conditional heteroscedasticity. The p-values of Engle's test indicate the presence of autoregressive conditional heteroscedasticity for the WTI oil price and its return.
In selecting the appropriate ARIMA model, we consider the parsimony concept; the authors of [12] argue that parsimonious models produce better forecasts than over-parameterized models. We achieve parsimony by using the Akaike information criterion (AIC) when using the function auto_arima from the Python package pmdarima [51]. The algorithm returned an ARIMA(3,1,1) as the best model with AIC = 1574.575. For the seasonal autoregressive moving average model, the algorithm returned a SARIMA(3,1,1)(3,0,1,12) for the seasonal autoregressive moving average. Moreover, the partial least-squares method reduces the number of explanatory variables and removes the multicollinearity issues inherent in the FRED-MD data, as evidenced in Appendix Table A1. Table 3 gives the cumulative variance and the reduction in the mean squared error of the PLS components. In contrast, Figure 2 provides a graphical representation of Table 3. The results indicate that ten PLS components explain more than 98% of the oil price variation. As the number of components increases, the explained variance increases but very marginally. Consequently, we can use the first ten components as explanatory variables of the oil price variation, that is,

PLS Results
The partial least-squares model produced the out-of-sample WTI oil price forecast summarized in Figure 3. The figure shows that the PLS model with ten components successfully forecasts the in-sample WTI oil prices. The out-of-sample forecasts tell a different story, however. The out-of-sample predictions mimic the observed oil price movements, but the prediction values are off, especially during the COVID-19 pandemic. During the pandemic, the PLS model with ten components predicted negative oil prices. Nevertheless, the results indicate that the FRED-MD data, summarized in PLS principal components, is a good predictor tool, but not during extreme events, such as the COVID-19 pandemic. Observed oil price In-sample forecasts Out-of-sample forecasts

OLS Results
To overcome this issue, we augment the ten PLS components with the 2007-2009 financial crisis and COVID-19 pandemic dummy variables and use them as predictors for the WTI oil price. We summarize the results of the least-squares estimation in Table 4. The OLS results show that all ten PLS components are statistically significant in explaining the variation in oil prices. In addition, the 2007-2009 financial crisis had a positive and statistically significant effect on WTI crude oil prices. However, the COVID-19 dummy variable is not statistically significant in explaining the variation in crude oil prices.  Figure 4 displays the in-and out-of-sample forecasts using the OLS model with the PLS components, the COVID-19, and the financial crisis dummy variables. As with PLS, the OLS model successfully fitted the data. However, the out-of-sample forecasts fail to provide high-quality forecasts, especially during high-volatility periods. The OLS model overshoots the price decrease during the COVID-19 pandemic in 2020 and the price increase during the start of the Russia-Ukraine war in 2022. Observed oil price In-sample forecasts Out-of-sample forecasts

ARIMA Results
As indicated before, the Box-Jenkins [12] identification step yielded the following ARIMA(3,1,1) as the best model: The maximum likelihood estimation of this model yields the results summarized in Table 5. All of the parameter estimates are statistically significant. The model fitted well the in-sample data, as shown in Figure 5. However, ARIMA(3,1,1) performed poorly in producing accurate out-of-sample forecasts (see Figure 5). The model does not even mimic the price movements as the PLS and OLS did.  Compared to the PLS and OLS models, the ARIMA model's poor out-of-sample performance may suggest that the FRED-MD data's explanatory power, summarized in PLS components, could improve the forecasting accuracy. To explore this possible explanation, we augment the ARIMA(3,1,1) model with ten PLS components and estimate the following equation: The results of this estimation are summarized in Table 6. The findings indicate that all of the variables included are statistically significant except for the second lag of the price difference. In addition, the augmented ARIMA model produces better out-of-sample forecasts than the plain ARIMA model, as indicated by Figure 6. This could indicate the importance of including economic data, such as FRED data, when forecasting commodity prices. However, this model also fails to reproduce the out-of-sample oil prices during highvolatility periods. Observed oil price In-sample forecasts Out-of-sample forecasts To account for the 2007-2009 financial crisis and the COVID-19 pandemic, we added these two events as dummy variables and estimated the following model: The estimation results, summarized in Table 7, show that all PLS components and ARIMA terms are statistically significant at the 1% level, except for the second lag of the price difference, which is significant at the 10% level. The COVID-19 dummy variable has a negative but not statistically significant effect on the oil price. In contrast, the 2007-2009 financial crisis contributed to oil price increases statistically significantly. Nevertheless, adding these two event variables did not improve the forecasting accuracy of the model, as indicated by the negative price forecast during the COVID-19 pandemic period (see Figure 7).

SARIMA Results
To investigate the seasonality of the WTI crude oil prices, we estimate the following seasonal ARIMA model (the orders of SARIMA were chosen using the AIC in the auto_arima function of the pdmarima Python package.): Though the seasonal lags are statistically significant, as indicated by the results in Table 8, the out-of-sample SARIMA forecasts in Figure 8 show poor performance as the model fails to mimic the price movements, let alone the forecast accuracy.   However, SARIMA forecasts improve when we add the PLS components alone (Figure 9a) and when we augment them with the 2007-2009 financial crisis and the COVID-19 dummy variables (Figure 9b), at least in terms of mimicking the price movements. In terms of the results of the regressions, most PLS components and seasonal terms are statistically significant (see Table 9). In contrast, the results in Table 10 show the COVID-19 dummy variable is not statistically significant in explaining the variation in oil prices for the sample at hand.

ARCH Results
As seen in Figure 1b, the WTI crude oil price exhibits volatility clustering with periods of high volatility and periods of low volatility. To account for this phenomenon, we estimate augment the ARIMA models to account for conditional variance. For parsimony reasons, we estimated an ARIMA(1,1,0) for the mean and autoregressive conditional heteroscedasticity, ARCH(1), for the variance, that is, The maximum likelihood estimation of Equation (14) produces the results summarized in Table 11. The parameter estimates of the mean and volatility equations are statistically significant at conventional levels. More importantly, the out-of-sample forecasts of the model accurately mimic the observed oil prices even during periods of high instabilities. Figure 10 shows how the ARIMA(1,1,0)-ARCH(1) model predicts future prices with high accuracy. Including the PLS components does not provide a good statistical fit as all of the variables included in the mean and volatility equations are not statistically significant (see Table 12). Nonetheless, the ARIMA(1,1,0)-ARCH(1) with exogenous variables forecasts still produce accurate forecasts, as evidenced in Figure 11.

GARCH Results
We generalized the ARCH model by including past volatility values and estimated the following ARIMA(1,10)-GARCH(1,1): Table 13 presents the maximum likelihood estimation results. The findings show that ARIMA(1,1,0)-GARCH(1,1) provides an appropriate statistical fit for the data at hand. The lagged price difference in the mean equation and the ARCH and GARCH effects are statistically significant at the 5% level. Moreover, the model produces accurate out-of-sample forecasts. Figure 12 shows that ARIMA(1,1,0)-GARCH(1,1) accurately predicts the oil price movements and values, even during periods of high volatility, such as the COVID-19 pandemic and the Russia-Ukraine war. This narrative does not change when we add the PLS components to the mean equation. Unlike in the case of ARCH, adding PLS components to the GARCH specification (mean equation) does not impact the statistical significance of ARCH and GARCH effects (see Table 14), nor does it significantly improve the forecasting accuracy of the model (see Figure 13). Observed oil price In-sample forecasts Out-of-sample forecasts

Discussion
In this study, we provided alternative models to forecast WTI crude oil prices. We also emphasized the forecasting power of the FRED-MD economic data by estimating models with and without the data. In Table 15, we used the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean squared error (RMSE) to compare alternative models. Our results indicate that the ARIMA(1,1,0)-GARCH(1,1) model outperforms all of the other models in the three criteria. In addition, the models with conditional volatility outperform all of the other models using the three criteria.  Table 15 shows a significant difference between the ARIMA family models and volatility models, whether we use FRED data or not. Hence, the worst volatility model (in terms of RMSE), namely, ARIMA(1,1,0) with ARCH(1), reduces the forecasting error of ARI-MAX1(3,1,1) by more than 70%. This highlights the importance of considering nonlinear time series when studying variables with extreme events.
On the other hand, using FRED-MD data reduces the forecasting error of the corresponding model, except for GARCH. For instance, ARIMA(3,1,1)'s mean absolute percentage error drops from 32.62% to 27.80% when we add the PLS components extracted from FRED-MD data. Moreover, the forecasting error also drops when the PLS components are added to the mean equation of the ARCH(1) model, from 5.69 to 5.34 in terms of root mean squared error.
Regarding previous studies, the authors of [14,38] offer two interesting studies we can use to compare our findings as their goals were to forecast crude oil prices using somewhat similar series. The authors of [14] use ARIMA and SARIMA to forecast crude oil prices in the United States and Europe from January 2017 to September 2021. Their ARIMA and SARIMA models yielded an out-of-sample MAPE of 0.05 and 0.09, respectively. However, their training set included the COVID-19 pandemic period. In contrast, our training set did not include that period, but our conditional variance models (ARCH and GARCH) were successful in producing highly accurate forecasts.
The study by [38] is close to our study regarding the variables and the training set used. The authors used several machine learning models to forecast the WTI crude oil prices, using data from March 1993 to December 2021 (thedata we used was from February 1992 to October 2022) with a test set including the COVID-19 pandemic. In addition, their study used also FRED data besides the economic policy uncertainty data and other financial data. Their change point-adaptive recursive neural network (CP-ADARNN) allows for more reduction in forecast errors compared to our best model. However, one has to be cautious regarding the use of economic data without orthogonalization (partial least-squares or principal components) due to multicollinearity and the consequent risk of overfitting. In contrast, the GARCH model does not present the overfitting risk and is available in most traditional statistical software.

Conclusions
In this study, we propose alternative time series models to forecast WTI crude oil prices. We also assess the forecasting power of economic data, namely, FRED data. Our results indicate that when linear models are used (ARIMA and SARIMA), the inclusion of the partial least-squares components extracted from FRED reduces the forecasting error without providing an accurate forecast during high-volatility periods. In contrast, including PLS components from FRED data does not improve the forecasting power volatility models, especially the GARCH model.
Moreover, this paper offers empirical evidence against ignoring conditional volatility in forecasting commodity prices. Even when augmented with economic data spanning all economic activities, linear models, such as ARIMA, provide poor forecasts, especially during extreme events. Our finding highlights the importance of considering nonlinear time series when studying variables with extreme events. In addition, this study's outcome has practical implications for commodity traders as the GARCH estimation only requires the commodity price.
The findings of this study confirm the use of the generalized autoregressive conditional heteroskedasticity (GARCH) models as a statistical tool that provides high-quality forecasts, not only for the volatility but also for the time series mean. The GARCH model is important because it makes it possible to model financial and economic time series data more precisely. The model effectively captures the key characteristics of the data by taking volatility clustering into consideration. This is critical in crude oil pricing since forecasting depends on precise modeling of volatility. As indicated by Figure 1, the WTI oil prices exhibited volatility clusters during several periods in the 1992-2022 monthly data. Unlike studies, such as [14], our study properly accounts for volatility clustering to capture the important feature of WTI oil prices and produce accurate forecasts.
This study can be improved in several respects. First, future studies may extend the factor models to include global demand, international economic, and geopolitical indicators instead of limiting the analysis to the United States' economic indicators. In fact, the FRED database only includes variables related to U.S. economic activities. Exploring other variables or indices, such as the ones provided by the Economic Policy Uncertainty (EPU) database, may take into consideration global economic activities.
Second, to palliate linear models' shortcomings, one of the promising nonlinear modeling techniques is the dynamic model averaging developed by [39]. To allow the forecasting model's parameters and forecasting accuracy to vary over time, the model integrates the state space approach with the Markov chain process. This modeling approach allows the parameters and the set of explanatory variables to vary over time. To select the model with the highest probability at any given time, the model is frequently accompanied by dynamic model selection (see, for instance, [52]).
Finally, future studies should also consider asymmetric volatility models. Observational evidence shows that negative shocks have different impacts than positive ones on volatility. The use of asymmetric GARCH models, such as the exponential general autoregressive conditional heteroskedastic (EGARCH) model, developed by [53], and the Glosten-Jagannathan-Runkle GARCH(GJR-GARCH) model, developed by [54], are some examples that consider asymmetric volatility.

Data Availability Statement:
The data used in this study will be made available upon request.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Appendix A.1 Table A1. Correlation among FRED data variables.

Percentage of Variables
More than 0.95 5.55% More than 0.90 11.06% More than 0.75 26.07% More than 0.