Deep learning systems for forecasting the prices of crude oil and precious metals

Commodity markets, such as crude oil and precious metals, play a strategic role in the economic development of nations, with crude oil prices influencing geopolitical relations and the global economy. Moreover, gold and silver are argued to hedge the stock and cryptocurrency markets during market downturns. Therefore, accurate forecasting of crude oil and precious metals prices is critical. Nevertheless, due to the nonlinear nature, substantial fluctuations, and irregular cycles of crude oil and precious metals, predicting their prices is a challenging task. Our study contributes to the commodity market price forecasting literature by implementing and comparing advanced deep-learning models. Because most precious-metal studies focus on gold alone, we include silver alongside gold in our analysis, offering a more comprehensive understanding of the precious metal markets. This research expands existing knowledge and provides valuable insights into predicting commodity prices. In this study, we implemented 16 deep- and machine-learning models to forecast the daily price of the West Texas Intermediate (WTI), Brent, gold, and silver markets. The employed deep-learning models are long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), bidirectional gated recurrent unit (BiGRU), T2V-BiLSTM, T2V-BiGRU, convolutional neural network (CNN), CNN-BiLSTM, CNN-BiGRU, temporal convolutional network (TCN), TCN-BiLSTM, and TCN-BiGRU. We compared the forecasting performance of the deep-learning models with the baseline random forest, LightGBM, support vector regression, and k-nearest neighbors models using mean absolute error (MAE), mean absolute percentage error, and root mean squared error as evaluation criteria. We examine the forecasting performance of our models across different sliding window lengths. Our results reveal that the TCN model outperforms the others for WTI, Brent, and silver, achieving the lowest MAE values of 1.444, 1.295, and 0.346, respectively.
The BiGRU model performs best for gold, with an MAE of 15.188 using a 30-day input sequence. Furthermore, LightGBM exhibits comparable performance to TCN and is the best-performing machine-learning model overall. These findings are critical for investors, policymakers, mining companies, and governmental agencies to effectively anticipate market trends, mitigate risk, manage uncertainty, and make timely decisions and strategies regarding crude oil, gold, and silver markets.


Introduction
Nonrenewable commodities usually mined in certain countries can strongly impact their economies, policies, currencies, and international or political issues. Energy and precious metals markets, among other commodities, are well-known alternatives to stock markets (Pullen et al. 2014; Hussain Shahzad et al. 2017; Akbar et al. 2019; Adekoya et al. 2022; Phan et al. 2016; Sarwar et al. 2019). Their prices are critical indicators of economic health and crucial determinants for financial planning and decision making. In this regard, understanding the dynamics of such markets and forecasting their evolutions is crucial for portfolio optimization and management. Crude oil, a crucial energy commodity, is pivotal in global macroeconomics and influences the decisions made by policymakers like governments and central banks. Fluctuations in crude oil prices have profound implications for a country's political and economic security; therefore, accurate crude oil price forecasting is imperative. Crude oil market shocks in April 2020 and their impacts have increased interest in understanding oil price dynamics (Wang et al. 2021; Murshed and Tanha 2021; Balcilar et al. 2021; Zhang et al. 2022a, b; Enwereuzoh et al. 2021). Conversely, gold is important for investment portfolio diversification and hedging (ben Khelifa et al. 2021; Reboredo 2013; Baek 2019). Gold contributes a large portion of the commodity reserves of major economies. As of September 2022, the official United States (US) gold reserve was 8133.47 tons, approximately 66.6% of total US reserves.
Given these markets' multifaceted nature, forecasting the trajectories of these commodities is crucial in financial markets, serving as an essential tool for investors, policymakers, and analysts. For investors, anticipating price movements in crude oil and precious metals provides a strategic advantage in optimizing portfolio performance and risk management. A comprehensive understanding of potential price fluctuations allows investors to make informed decisions, allocate resources optimally, and ultimately enhance their overall financial returns (Bhowmik and Wang 2020). In contrast, policymakers rely on accurate market forecasts to develop effective economic policies and mitigate the potential impact of market volatility on national economies. Fluctuations in crude oil prices, for instance, can have cascading effects on inflation, trade balances, and overall economic stability (Uzo-Peters et al. 2018; Xiuzhen et al. 2022; Periwal 2023). Similarly, precious metal prices often indicate broader economic sentiments and can influence monetary policies and international trade relationships.
In this context, the science of forecasting plays a pivotal role in providing foresight into future trends in crude oil and precious metal prices. Advanced analytical models (Kou et al. 2021, 2022; Li et al. 2022a, b; Lahmiri 2023a), statistical methods (Lahmiri et al. 2022; Lahmiri 2023b), machine learning (Lahmiri et al. 2023), and deep-learning algorithms (Amirifar et al. 2023; Amirshahi and Lahmiri 2023a, b; Lahmiri and Bekiros 2019, 2020, 2021) enable analysts to search through vast datasets, identify patterns, and make predictions that are invaluable for both short-term traders and long-term investors (Abdullah Ahmed and Bin Shabri 2014; Zhao et al. 2015; Das et al. 2022; Jiang et al. 2022; Liang et al. 2023). Driven by this motivation, this study investigates forecasting methodologies within the domains of crude oil and precious metals markets to enhance the precision of price predictions.
Recent innovations in deep-learning models seem promising for time-series forecasting; however, the crude oil and precious metals forecasting literature has made only limited use of these models for price prediction. This study attempts to fill this gap in the forecasting literature by applying several deep- and machine-learning models to predict the daily closing prices of crude oil, gold, and silver. First, the time-series data of daily spot prices of two prominent crude oils, West Texas Intermediate (WTI) and Brent, and two precious metal markets, gold and silver, are gathered and normalized. Then, several input sequences are prepared using the sliding window method with four different window lengths. Next, the dataset is split into training, validation, and test sets using a time-based splitting approach. Finally, a comprehensive set of 16 forecasting models, consisting of 12 deep-learning models, 2 baseline ensemble models, and 2 baseline machine-learning models, is implemented to predict the next-day market price. The deep-learning models used in the current study include long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent units (GRU), bidirectional GRU (BiGRU), Time2Vector BiLSTM (T2V-BiLSTM), Time2Vector BiGRU (T2V-BiGRU), convolutional neural networks (CNN), hybrid CNN-BiLSTM, hybrid CNN-BiGRU, temporal convolutional networks (TCN), hybrid TCN-BiLSTM, and hybrid TCN-BiGRU models. The two baseline ensemble models are the random forest and LightGBM gradient-boosting models, and the two baseline machine-learning models are the support vector regression (SVR) and k-nearest neighbors (KNN) models.
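The preprocessing steps described above (normalization, sliding windows, and a time-based split) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names, the 70/15/15 split fractions, the min-max scaler, and the choice to take the scaling statistics from the earliest 70% of the series are all assumptions, and the prices are made up.

```python
def min_max_scale(series, lo, hi):
    """Rescale values with fixed bounds lo and hi (assumed scaler)."""
    return [(v - lo) / (hi - lo) for v in series]


def sliding_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for t in range(window, len(series)):
        inputs.append(series[t - window:t])  # the previous `window` days
        targets.append(series[t])            # the next-day value to predict
    return inputs, targets


def time_based_split(items, train_frac=0.7, val_frac=0.15):
    """Split in chronological order, without shuffling (fractions are assumptions)."""
    n = len(items)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return items[:train_end], items[train_end:val_end], items[val_end:]


prices = [float(p) for p in range(10, 30)]     # 20 made-up daily prices
lo, hi = min(prices[:14]), max(prices[:14])    # bounds from the earliest 70% only (assumption)
scaled = min_max_scale(prices, lo, hi)

X, y = sliding_windows(scaled, window=5)
X_train, X_val, X_test = time_based_split(X)
y_train, y_val, y_test = time_based_split(y)
print(len(X_train), len(X_val), len(X_test))
```

Because the split is positional rather than random, every test window lies strictly after every training window, which is the property a time-based split is meant to guarantee.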
Each of the employed models has its strengths and limitations. LSTM models are a type of recurrent neural network (RNN) that is popular for its ability to capture long-term dependencies, overcome the vanishing gradient problem, and handle variable-length sequences; however, LSTMs can be computationally expensive and prone to overfitting, requiring regularization techniques (Yu et al. 2019). GRU models, another type of RNN, have a simpler architecture, resulting in faster training and inference times; however, they may have limitations in capturing complex patterns compared with LSTM models. Bidirectional models, such as BiLSTM or BiGRU, consider both forward and backward information, making them more robust to variations in the input sequence order; however, they are computationally complex and require more memory resources (Khan et al. 2021). CNNs are effective at capturing local patterns and features within time-series data. CNNs learn filters to detect specific temporal patterns and are translation invariant, meaning they can detect patterns regardless of their position in the input sequence; however, CNNs have limitations, such as the requirement for fixed-length inputs, limited consideration of temporal ordering, and a limited ability to capture long-term dependencies. Hybrid CNN-LSTM models combine the strengths of both CNNs and LSTMs, capturing spatial and temporal features. They are suitable for tasks that require capturing complex patterns in time-series data; however, they can be less interpretable than standalone models (Gharghory 2021). TCNs are designed to capture long-term dependencies efficiently. They use dilated convolutions to capture information from several past time steps. TCNs are adaptable to different time-series lengths without padding or truncation; however, they can be complex to design and tune and are sensitive to input scaling (Gopali et al. 2021).
Ensemble machine-learning models such as random forest and LightGBM are also used in time-series analysis. Random forest combines multiple decision trees and offers high prediction accuracy and robustness against outliers. LightGBM is an efficient gradient-boosting framework that effectively handles large datasets. Both models are strong in accuracy and generalization but cannot explicitly capture temporal dependencies (Ke et al. 2017). SVR is a flexible model that can capture linear and nonlinear relationships; it focuses on support vectors, which greatly influence the model's decision boundary. SVR can handle high-dimensional datasets and complex relationships between variables; however, the performance of SVR depends on selecting appropriate hyperparameters, and it does not explicitly model temporal dependencies. KNN is an instance-based algorithm that makes predictions based on the similarity of training instances; it requires no training phase but suffers from the curse of dimensionality and cannot capture temporal dependencies.
Our paper compares the forecasting performance of these models using the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) metrics. This paper primarily aims to answer the following questions through empirical experiments. (1) What is the best deep-learning model that can predict crude oil, gold, and silver spot prices reliably and precisely? (2) In response to the first question, does a particular model outperform other models for crude oil and precious metals prices? (3) Which input sequence length is more informative for each market's price prediction? (4) Are hybrid models effective in forecasting crude oil, gold, and silver spot prices? (5) What conclusions about the properties of each deep-learning model can be drawn in the context of crude oil and precious metals time-series forecasting?
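The three error measures named above can be written out in a few lines. The sketch below uses their standard definitions; the function names are illustrative, the example prices are made up, and reporting MAPE as a percentage is an assumption about the convention used.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, in price units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (requires nonzero actual values)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100.0, 50.0, 80.0]      # made-up observed prices
predicted = [110.0, 45.0, 80.0]   # made-up forecasts
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE are expressed in the units of the price series, so they are not comparable between gold and silver; MAPE is scale free, which is why it is useful when comparing errors across markets.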
The rest of this manuscript is arranged as follows. The "Literature review" section provides an overview of the relevant prior research and summarizes our contributions to the existing literature. The "Methodology" section explains the methods and performance evaluation criteria used in this study. The "Empirical analysis and results" section describes the datasets, demonstrates the results, and discusses our findings. Finally, the "Conclusion" section summarizes the paper and presents some managerial implications and policy suggestions.

Literature review
Accurately forecasting financial markets is a critical guide for determining economic policies. Consequently, researchers have dedicated their efforts to developing and improving models that capture the intrinsic behavior and dynamics of financial market time series. The prediction methods used in these studies generally comprise statistical or econometric, machine-learning, and deep-learning methods. Several forecasting modeling approaches have recently been applied to crude oil and precious metals. For instance, Zhao et al. (2018) proposed a numerical vector trend forecasting method for predicting the daily spot price of Brent crude oil, outperforming traditional models such as autoregressive integrated moving average (ARIMA), SVR, and wavelet analysis models. Similarly, Szarek et al. (2020) proposed a new stochastic distribution, skewed Student's t-distribution, for silver, copper, and gold time-series estimation, which accounts for the time-dependent parameters and non-Gaussian behavior of time-series data. Drachal (2022) employed the Bayesian symbolic regression method to address variable uncertainty in monthly crude oil price forecasting.
Due to the nonlinearity, nonstationarity, and heteroscedasticity of crude oil and precious metal markets, classical statistical forecasting models such as vector autoregressive (VAR), ARIMA, and autoregressive distributed lag (ARDL) struggle to perform well in forecasting tasks. These models make assumptions about the normality and stationarity of price data, which often do not hold for commodity market time series. As a result, recent studies have used machine- and deep-learning models, which excel in handling nonlinear data and do not rely on the normality assumption for accurate price predictions. In the literature, three main types of deep neural networks are used for sequence modeling, and they can be applied for time-series forecasting (Lim and Zohren 2021). These networks include (i) RNNs and their variants, such as LSTM (Hochreiter and Schmidhuber 1997) and GRUs (Cho et al. 2014), (ii) CNNs (Lecun et al. 1998) and their recent variant, TCN (Lea et al. 2016), and (iii) the transformer (Vaswani et al. 2017) and its variants (Devlin et al. 2018; He et al. 2020; Liu et al. 2019).
Given the importance of gold price forecasting, several studies have applied statistical, machine-learning, and deep-learning models to it. Alameer et al. (2019) used a multilayer perceptron model with a whale optimization algorithm for gold next-month price forecasting. This model demonstrates a lower forecasting error than ARIMA model forecasts. Madziwa et al. (2022) employed an ARDL model to forecast annual gold prices using lagged gold prices, gold demand, and treasury bill rates as predictors. In another study, Zhang and Ci (2020) used the US Consumer Price Index, crude oil price, exchange rate, and Dow Jones Industrial Price Index in a deep belief network to predict monthly gold prices. Risse (2019) predicted gold returns in excess of the risk-free rate using an SVR model. SVR captures nonlinear relationships in the data by mapping the inputs into a high-dimensional feature space in which a linear function is fitted. Tree-based ensemble models have demonstrated promising performance in forecasting gold prices. Yuan (2023) leveraged the XGBoost (Chen and Guestrin 2016) and LightGBM (Ke et al. 2017) models for gold and bitcoin price forecasting. Furthermore, deep-learning methods have been increasingly used for gold price prediction. For instance, using association rules and the LSTM model, Boongasame et al. (2022) predicted the price of gold. Vidal and Kristjanpoller (2020) developed a hybrid of convolutional neural networks and long short-term memory models (CNN-LSTM), which incorporates historical log-return series and time-series data in an image format to predict the volatility of gold spot prices.
Likewise, various studies have used deep-learning models for crude oil price forecasting. Orojo et al. (2019) employed a multirecurrent network to forecast the one-month-ahead WTI crude oil price. Lin et al. (2022) forecasted crude oil futures prices using a BiLSTM-Attention-CNN model with wavelet transform. Swamy and Lagesh (2023) explored the effectiveness of investor sentiments from Twitter in predicting the daily gold price by a wavelet analysis method and unveiled a strong correlation between Twitter sentiments and the gold price. Fang et al. (2023a, b) forecasted Brent crude oil prices using an improved slope-based method based on empirical mode decomposition (EMD) and feedforward neural network (FNN) methods.
Conversely, the literature on forecasting other precious metal markets is relatively limited. Sroka (2022) utilized block bootstrap methods to forecast daily silver prices, while Salisu et al. (2020) tested the impact of Google Trends on forecasting the prices of four precious metal markets using an ARDL model. Zhang et al. (2022a, b) introduced a new objective function to forecast commodity markets, including silver prices. To our knowledge, no prior study has forecast the silver price using machine- and deep-learning models. We attempt to fill this void in the literature.
Given the ongoing improvements in natural language processing (NLP) tasks, recent studies have incorporated news text and Google Trends features into their forecasting models. These approaches leverage the valuable information in the textual data to enhance the accuracy of predictions. For example, Li et al. (2019) extracted text data from online news media and created sentiment features that were grouped by their topics using a latent Dirichlet allocation method. Their topic-sentiment forecasting model shows that text features complement financial features for crude oil price forecasting. Similarly, Bai et al. (2022) constructed features from news headlines for WTI crude oil forecasting. Fang et al. (2023a, b) employed a FineBERT approach to extract sentiment information from crude oil-related news, which was then integrated into a hybrid attention-based BiGRU model for WTI price forecasting. Kertlly de Medeiros et al. (2022) demonstrated performance enhancement using a mixed data sampling model incorporating mixed-frequency data and a textual sentiment indicator for oil price forecasting. Salisu et al. (2020) utilized an econometric ARDL model to show that search engine data from Google Trends have a significant positive effect on precious metal returns. Similarly, Tang et al. (2020) considered Google Trends a useful predictor in a multivariate empirical mode decomposition method for forecasting Brent crude oil spot prices. Other EMD methods have been used by Wang et al. (2018), Qin et al. (2019), Yang et al. (2020), G. Li et al. (2022a, b), and Guo et al. (2022) in their proposed crude oil forecasting models. Liang et al. (2023) also used historical crude oil prices in a deep reinforcement learning algorithm to forecast multistep-ahead WTI, Brent, and Oman prices. A recent review paper (Mohamed and Messaadia 2023) highlights that artificial neural networks and support vector machines (SVMs) are the most popular artificial intelligence techniques used to forecast crude oil prices. Collectively, these studies showcase the growing significance of advanced forecasting methods to enhance the accuracy and reliability of predictions in the crude oil and precious metal markets.
Some studies have achieved improved forecasting performances by developing ensemble models. Zhao et al. (2017) combined the advantages of stacked denoising autoencoders (SDAE) and bootstrap aggregation (bagging) techniques to model the nonlinear and complex relationships of oil price factors and to generate multiple data sets for training a set of base learners. Wang et al. (2020) proposed an ensemble of five linear and nonlinear submodels to produce the prediction intervals of crude oil spot prices while optimizing the weights of the submodels using the gray wolf optimizer. Zhang et al. (2021) developed an ensemble deep-learning model for electricity price series prediction. Jiang et al. (2022) combined a decomposition-ensemble approach optimized by the seagull algorithm with sentiment analysis to forecast future crude oil prices. Su et al. (2022) proposed a hybrid forecasting model using SVM, extreme learning machines, XGBoost, and LSTM models to predict crude oil futures series. Sun et al. (2022) proposed a secondary decomposition-reconstruction-ensemble approach for crude oil price forecasting.
Temporal convolutional networks (TCNs) (Lea et al. 2016) are variants of CNN models that employ causal convolutions and dilations to predict sequential data with temporality and large receptive fields. A simple convolution can only look back at a fixed time window, whereas a TCN uses dilated convolutions to achieve a large receptive field with fewer convolutional layers. TCNs capture long-term patterns using a hierarchy of temporal convolutional filters, and in that manner, they tend to outperform bidirectional LSTM models and are an order of magnitude faster to train. The TCN was first developed for action detection in video data settings to account for spatial and temporal input features (Lea et al. 2016). However, recently, TCNs have drawn more attention from scholars and have been applied to various time-series data. For instance, Lara-Benítez et al. (2020) utilized a TCN model to forecast electricity demand and prices in Spain. In the environmental milieu, Yan et al. (2020) predicted the El Niño-Southern Oscillation, an index measuring the earth's climate variability, by applying an ensemble empirical mode decomposition-TCN model. This model shows improved prediction performance compared with the LSTM model.
Considering temporal patterns in predicting time-series data is a significant challenge for many models. Some recent studies have introduced learnable time representations to account for temporal patterns in sequential data (Xu et al. 2019, 2021; Li et al. 2017). Among these studies, Kazemi et al. (2019) introduced the Time2Vector method to represent sequential data as periodic and nonperiodic vectors that can capture complex temporal patterns in data. Yang et al. (2021) improved the performance of an attention neural network for nonintrusive load monitoring by applying the Time2Vector method. The current study applies Time2Vector embedding to input series and incorporates the resulting periodic and nonperiodic features into several deep-learning models to forecast crude oil, gold, and silver prices. Table 1 summarizes the literature on crude oil and precious metal forecasting.
Gradient-boosting methods are powerful predictive models for many tasks. Borisov et al. (2021) compared the performance of tree-based ensembles, such as XGBoost, LightGBM, and CatBoost (Prokhorenkova et al. 2018), with some deep-learning models, including but not limited to multilayer perceptron, regularization learning networks, neural oblivious decision ensembles, and transformers. They assert that tree-based machine-learning models outperform deep-learning models in several prediction tasks with tabular data; however, their study does not include deep-learning models for sequential data and is silent about forecasting financial market prices. To address this shortfall, the current study compares tree-based ensemble models, namely random forest and LightGBM, with 12 deep-learning models and two other machine-learning models (KNN and SVR) in forecasting daily crude oil and precious metals market prices.
This study makes significant contributions to the literature on forecasting commodity market prices.
• Considering that there is limited literature on using deep-learning models to forecast the price of commodity markets, this study implements and compares various types of state-of-the-art deep-learning models for crude oil and precious metal spot price forecasting. Hence, our study encompasses several forecasting results that provide comprehensive insights for crude oil, gold, and silver market players and investors.
• Most studies on precious metals focus only on gold price predictions; however, this study forecasts the price of both gold and silver to maintain a more general understanding of the precious metal markets.
• To the best of our knowledge, this study is the first in the forecasting literature that applies the TCN model, the Time2Vector embedding module, and hybrid TCN-BiLSTM and TCN-BiGRU models to forecast the spot price of the WTI, Brent, gold, and silver time series.
• The forecasting period in the test dataset of this study, from 2020-01-03 to 2022-03-25, covers two critical global events that significantly affected financial markets. First, the financial crisis during the COVID-19 pandemic significantly impacted all financial markets; in particular, crude oil prices plunged in April 2020. Second, the Russia-Ukraine conflict in February 2022 was associated with a sharp rise in crude oil, gold, and silver prices. Therefore, the results of this study and the proposed models can be used during financial crises and extreme global situations.
Figure 5 shows the line chart of the WTI, Brent, gold, and silver prices for reference.

LSTM and BiLSTM
LSTM and BiLSTM are structural variants of RNN models that can remember important information from time-series sequences (Lin et al. 2022). In particular, BiLSTM concatenates two LSTM layers in opposite directions. The interior structure of a common LSTM cell is shown in Fig. 1a. An LSTM unit consists of an input gate, a forget gate, and an output gate. These gates facilitate information flow and help the cell forget unnecessary information. First, the forget gate decides what information from the inputs and previous hidden states to discard. Second, the input gate decides what information from the inputs and previous cell states to keep and updates the cell state. Finally, the output gate obtains the output $h_t$ by multiplying $o_t$, computed from the input information by the sigmoid activation function, with the cell state vector transformed by the tanh activation function. The equations of a forward pass in an LSTM unit are as follows:
$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
c_t' &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot c_t' \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
\tag{1}
$$
where $x_t \in \mathbb{R}^d$ is the input vector, and $h_t \in \mathbb{R}^h$ is the hidden state vector. Furthermore, $f_t$ is the forget gate vector, $i_t$ is the input gate vector, $o_t$ is the output gate vector, $c_t'$ is the temporary cell state vector, $c_t \in \mathbb{R}^h$ is the cell state vector, and $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$, and $b \in \mathbb{R}^h$ represent the parameter matrices and vectors. Here, $\sigma$ is the sigmoid activation function and $\odot$ denotes element-wise multiplication.
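In a BiLSTM model, the hidden states $h_t$ computed in the two opposite directions are concatenated to construct the bidirectional hidden state. The formulas of the bidirectional $h_t$ are as follows:
$$
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
h_t &= \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right]
\end{aligned}
\tag{2}
$$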
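To make the gate arithmetic and the bidirectional concatenation concrete, the sketch below runs a standard LSTM step over a short sequence in both directions using plain Python lists. It is an illustration under stated assumptions rather than the models trained in this study: the helper names, the constant weights produced by the hypothetical `make_params`, and the zero initial states are all chosen here for readability.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gate(params, x, h, activation):
    """Apply activation(W x + U h + b) row by row; params is the tuple (W, U, b)."""
    W, U, b = params
    return [
        activation(sum(w * xi for w, xi in zip(W_row, x))
                   + sum(u * hi for u, hi in zip(U_row, h)) + bi)
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    f = gate(params["f"], x, h_prev, sigmoid)         # forget gate
    i = gate(params["i"], x, h_prev, sigmoid)         # input gate
    o = gate(params["o"], x, h_prev, sigmoid)         # output gate
    c_tmp = gate(params["c"], x, h_prev, math.tanh)   # temporary cell state
    c = [fj * cj + ij * tj for fj, cj, ij, tj in zip(f, c_prev, i, c_tmp)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def run_lstm(xs, params, hidden):
    """Return the hidden state at every time step, starting from zero states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states


def bilstm(xs, fwd_params, bwd_params, hidden):
    forward = run_lstm(xs, fwd_params, hidden)
    backward = run_lstm(xs[::-1], bwd_params, hidden)[::-1]   # re-align to time order
    return [hf + hb for hf, hb in zip(forward, backward)]     # concatenate per step


def make_params(hidden, d, value):
    """Hypothetical helper: every weight equals `value`, biases are zero."""
    def one():
        return ([[value] * d for _ in range(hidden)],
                [[value] * hidden for _ in range(hidden)],
                [0.0] * hidden)
    return {name: one() for name in ("f", "i", "o", "c")}


xs = [[0.5], [1.0], [0.5]]            # three time steps, one feature each
params = make_params(hidden=2, d=1, value=0.1)
out = bilstm(xs, params, params, hidden=2)
print(len(out), len(out[0]))          # sequence length, 2 * hidden
```

Each output vector has twice the hidden size because the forward and backward states are concatenated, which is also why bidirectional layers roughly double the parameter count and memory use of their unidirectional counterparts.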

GRU and BiGRU
Like the LSTM, the GRU is a variant of RNN cells that can forget insignificant information and help the model use longer data sequences.GRU has fewer parameters than LSTM because it eliminates the output gate.
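A commonly used formulation of the GRU forward pass, written with the symbols defined in this section, is sketched below. The gate subscripts on $W$, $U$, and $b$ are notation assumed here, as is the convention that $z_t$ weights the candidate state; some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\hat{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
```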
where $x_t \in \mathbb{R}^d$ is the input vector, and $h_t \in \mathbb{R}^h$ is the hidden state vector. Additionally, $z_t$ is the update gate vector, $r_t$ is the reset gate vector, $\hat{h}_t$ is the candidate activation vector, $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$, and $b \in \mathbb{R}^h$ represent the parameter matrices and vectors, and $\sigma$ is the sigmoid activation function. For certain sequential datasets, GRUs outperform LSTM models (Chung et al. 2014; Gruber and Jockisch 2020). The internal structure of the GRU cell is depicted in Fig. 1b.
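For a bidirectional GRU model, hidden state vectors from two opposite directions are concatenated as follows:
$$
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
h_t &= \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right]
\end{aligned}
$$
Figure 1c shows the architecture of a single-layer bidirectional LSTM (BiLSTM) or bidirectional GRU (BiGRU) model.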

CNN
A CNN is an FNN model proposed by Lecun et al. (1998). CNNs are very popular in computer vision applications, such as facial recognition systems, object localization, object detection, and semantic segmentation. CNNs are effective at capturing local patterns and features within a time series. The convolutional layers learn filters to detect specific temporal patterns, making CNNs well suited for capturing local dependencies and short-term patterns in time-series data. CNNs are inherently translation invariant, meaning they can detect patterns regardless of their position in the input sequence. This property is helpful for time-series analysis because the same patterns may occur at different time steps. The local perception and weight sharing of CNNs can significantly reduce the number of parameters, thus improving the efficiency of model learning (Lu et al. 2020); however, they suffer from limitations such as the requirement for fixed-length inputs, lack of consideration of temporal ordering, and limited ability to detect long-term temporal dependencies.
The architecture of this model is generally constructed from two layers: the convolution layer and the pooling layer. The convolution layer extracts useful features from the input series by applying several convolution kernels to the inputs, as indicated in Eq. 17. Then, a pooling layer is applied to the output of the convolution layer to downsample it and reduce the dimensionality of the model before the final forecast.
$$
l_t = \sigma\left(x_t * k_t + b_t\right) \tag{17}
$$
where $l_t$ is the output of the convolution layer, $\sigma$ is the activation function, $x_t \in \mathbb{R}^d$ is the input vector, $k_t \in \mathbb{R}^d$ is the parameter vector of the convolution kernel, $b_t$ is the bias term, and $*$ denotes the convolution operation.
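The convolution-then-pooling pattern described above can be illustrated on a one-dimensional series. The sketch below is a simplified, single-kernel version with assumed names (`conv1d`, `max_pool1d`); like most deep-learning libraries it computes a sliding dot product (cross-correlation) rather than a flipped-kernel convolution, and the identity activation in the example is chosen only to keep the numbers easy to read.

```python
import math


def conv1d(x, kernel, bias=0.0, activation=math.tanh):
    """Slide one kernel over the series; the same weights are reused at every position."""
    k = len(kernel)
    return [
        activation(sum(kernel[j] * x[t + j] for j in range(k)) + bias)
        for t in range(len(x) - k + 1)
    ]


def max_pool1d(x, size):
    """Keep the largest value in each non-overlapping block of `size` values."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up input window
kernel = [0.5, 0.5]                        # a two-point averaging filter
features = conv1d(series, kernel, activation=lambda v: v)
pooled = max_pool1d(features, size=2)
print(features)   # [1.5, 2.5, 3.5, 4.5, 5.5]
print(pooled)     # [2.5, 4.5]
```

The kernel has two weights regardless of the window length, which illustrates the parameter savings from weight sharing, while pooling shortens the feature sequence passed to later layers.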

TCN
The intrinsic weaknesses of the CNN, including fixed-size inputs and mismatched input and output dimensions, restrict its application in time-series forecasting. The TCN (Lea et al. 2016) is a variant of the CNN that employs causal and dilated convolutions appropriate for sequential data with temporality and large receptive fields. Causal means that no information leaks from the future to the past, and the receptive field is the set of elements of the original input that affect a specific element of the output. A TCN model can cover the full input history by setting a proper dilation factor and kernel size. Furthermore, the TCN has a simple network structure and outperforms standard recurrent networks, such as the RNN and LSTM networks, regarding the effectiveness and efficiency of time-series predictions (Yan et al. 2020). Figure 2 shows a general representation of our TCN model with dilated causal convolutions. This model's architecture consists of the following.
Dilated convolution layer: The dilated convolution architecture modifies Kronecker-factored convolutional filters, enabling a larger receptive field with fewer parameters and layers (Zhou et al. 2015). For a sequence of $x_t \in \mathbb{R}^d$ and a filter $f: \{0, \ldots, k-1\} \rightarrow \mathbb{R}$, the dilated convolution operation $*_D$ on entry $s$ of the sequence is defined as follows:
$$
\left(x *_D f\right)(s) = \sum_{i=0}^{k-1} f(i) \, x_{s - D \cdot i}
$$
where $D$ is the dilation factor, $k$ is the filter size, and the index $s - D \cdot i$ ensures that only past data are convolved. A tanh function transforms the output of the dilated causal convolution layer.
Dropout layer: A dropout layer with a probability of 0.2 is applied after each dilated convolution layer to regularize the model and reduce overfitting.
Residual block: We used a stack of two dilated causal convolution layers together, and the results from the final convolution were added back to the inputs to obtain the outputs of the block. The residual connection mitigates the vanishing and/or exploding gradient problem in deep-learning models.
Fully connected layer: The output of the residual block is then fed into a fully connected layer to predict the next-day price.
In Fig. 2, the TCN model has a stack of two layers, a residual connection, and a fully connected layer. Each layer in the stack has a dilated causal convolution, a tanh activation function, and a dropout for regularization. The dilation factors for the dilated convolution layer are $D = 1, 2, 4$, with a filter size of $k = 2$. When $D = 1$, the dilated convolution becomes a basic convolution.
In recurrent-type neural networks, operations are applied sequentially. In contrast, in a TCN model, all time steps of a sequence are convolved simultaneously in each dilated convolutional layer; hence, training a TCN is much faster than training LSTM or GRU models (Lea et al. 2016).
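The dilated causal convolution and its receptive field can be checked numerically. The sketch below is a simplified illustration: it assumes one convolution per dilation level, zero padding before the start of the series, no activation, dropout, or residual connection, and an all-ones filter so that each output is simply the sum of the inputs it can see. The function names are chosen here for illustration.

```python
def dilated_causal_conv(x, f, dilation):
    """out[s] = sum over i of f[i] * x[s - dilation * i]; indices before 0 count as zero."""
    out = []
    for s in range(len(x)):
        total = 0.0
        for i in range(len(f)):
            idx = s - dilation * i
            if idx >= 0:              # never reads a value later than position s
                total += f[i] * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Input steps visible to one output, assuming one convolution per dilation level."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # made-up series
y = x
for d in (1, 2, 4):                              # dilation factors used in the text
    y = dilated_causal_conv(y, [1.0, 1.0], d)    # filter size k = 2

print(receptive_field(2, (1, 2, 4)))   # 8
print(y[-1])                           # 36.0, the sum of all eight inputs
```

With $k = 2$ and $D = 1, 2, 4$, three layers are enough for the last output to depend on eight consecutive inputs, whereas three undilated layers with the same filter size would see only four.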

Time2Vector (T2V-BiLSTM and T2V-BiGRU)
Time-series input can be considered a sequence in which a dependency across time exists among the sample data rather than being independent and identically distributed (i.i.d.); therefore, it is essential to account for time features while developing a time-series forecasting model. Vector embedding has been successfully used in many NLP tasks (Pennington et al. 2014; Mikolov et al. 2013; Almeida and Xexéo 2019). Similarly, Time2Vector (Kazemi et al. 2019) is a learnable vector embedding for time that can be easily combined with many deep-learning models. Time2Vector is a decomposition technique that encodes a temporal signal into periodic and nonperiodic patterns, allowing the model to understand and learn from the time-dependent patterns. It eliminates the need for explicit feature engineering when dealing with time-related features. By incorporating temporal information meaningfully, Time2Vector can improve the performance of time-series models.
For a given scalar notion of time $\tau$, Time2Vec of $\tau$ is a vector of size $k + 1$ defined as follows:
$$T2V(\tau)[i] = \begin{cases} w_i \tau + b_i, & \text{if } i = 0 \\ F(w_i \tau + b_i), & \text{if } 1 \le i \le k \end{cases}$$
where $T2V(\tau)[i]$ is the $i$th element of $T2V(\tau)$, $F$ is a periodic activation function, and $w$ and $b$ are learnable weight and bias parameters, respectively. Following the activation function indicated in the original T2V paper (Kazemi et al. 2019), we use a sine function as $F$. Time2Vector (T2V) ensures that the time scale will not affect the learned periodic and nonperiodic time features (Yang et al. 2021).
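The Time2Vec definition can be sketched as a forward pass in NumPy. In this illustration the weights `w` and biases `b` are random placeholders rather than learned parameters, and the function name `time2vec` and the use of an integer time index as $\tau$ are assumptions.

```python
import numpy as np

def time2vec(tau, w, b):
    """Map each scalar time value to a vector of size k + 1.

    Element 0 is the linear (nonperiodic) term w[0]*tau + b[0];
    elements 1..k are the periodic terms sin(w[i]*tau + b[i]).
    """
    tau = np.asarray(tau, dtype=float)
    z = tau[..., None] * w + b      # shape (..., k + 1)
    out = np.sin(z)                 # periodic components
    out[..., 0] = z[..., 0]         # overwrite index 0 with the linear term
    return out

rng = np.random.default_rng(0)
k = 4                               # hypothetical T2V output size
w = rng.normal(size=k + 1)          # placeholder for learnable weights
b = rng.normal(size=k + 1)          # placeholder for learnable biases
emb = time2vec(np.arange(5), w, b)  # embedding for 5 time steps, shape (5, k + 1)
```

In a trained model, `w` and `b` would be updated by backpropagation together with the recurrent layers that consume the embedding.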
To construct the T2V-BiLSTM and T2V-BiGRU models, the input sequences are first transformed by Time2Vector embeddings, then the embedded input vectors are fed into a single-layer BiLSTM or BiGRU model, and finally, the output is predicted through a fully connected layer. Figure 3 presents a schematic of the T2V-BiLSTM or T2V-BiGRU model. Figure 4 summarizes the complete data preprocessing, model training, and prediction process for this study's test set.

Hybrid models
Fig. 3 T2V-BiLSTM or T2V-BiGRU models. s is the input sequence length, k is the T2V output size, and h is the recurrent hidden size.
To verify the applicability of hybrid models in forecasting daily crude oil, gold, and silver prices, we used CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU models. CNNs in the initial layers of the hybrid model can learn low-level spatial features, such as local patterns, while the BiLSTM layers can learn high-level temporal dependencies. This hierarchical representation learning allows the model to capture local and global dependencies in the time-series data. CNNs and TCNs are well suited for feature extraction from raw data, including time-series data. They can automatically learn relevant features and reduce the dimensionality of the input, which can help the downstream BiLSTM or BiGRU layers learn more meaningful representations. The structure of each model is explained as follows.
CNN-BiLSTM and CNN-BiGRU models: First, a one-dimensional convolution layer is applied to the input sequences in the CNN module. Then, a max pooling layer is applied to the output of the convolution layer to extract the essential features. Next, the output of the pooling layer is fed into a single-layer BiLSTM or BiGRU module, and the final output is predicted through a fully connected layer.
TCN-BiLSTM and TCN-BiGRU models: First, a TCN module receives the input sequences. Next, the output of the TCN is fed into a single-layer BiLSTM or BiGRU module, and the final output is predicted through a fully connected layer.
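One practical consequence of the convolution and pooling steps in the CNN-based hybrids is that the recurrent module receives a shorter sequence than the original window. The sketch below traces the sequence length through the two steps. The kernel size of 3, the pool size of 2, and the unpadded ("valid") convolution are hypothetical values chosen for illustration, not the hyperparameters selected in this study.

```python
def conv1d_output_length(n, kernel_size, stride=1):
    """Length after an unpadded ('valid') one-dimensional convolution."""
    return (n - kernel_size) // stride + 1

def pool1d_output_length(n, pool_size, stride=None):
    """Length after unpadded max pooling; stride defaults to the pool size."""
    stride = stride or pool_size
    return (n - pool_size) // stride + 1

lengths = {}
for window in (5, 30, 60, 90):  # sliding window lengths used in the study
    after_conv = conv1d_output_length(window, kernel_size=3)   # assumed kernel size
    lengths[window] = pool1d_output_length(after_conv, pool_size=2)  # assumed pool size
# lengths maps each window length to the number of time steps
# that would reach the BiLSTM or BiGRU module
```

Under these assumed settings, roughly half of the time steps remain after pooling, which is the sense in which the CNN module downsamples the input.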

Ensemble and machine-learning models
Among the ensemble machine-learning models, this study uses random forest and LightGBM, a gradient-boosting technique. Random forest generally provides high prediction accuracy because it aggregates multiple decision trees. It is less prone to overfitting than individual decision trees. By combining multiple trees and using techniques such as bagging and random feature selection, random forest reduces variance and improves the model's generalization ability. It is also robust to outliers and missing values; however, it lacks autocorrelation modeling because random forest treats each data point independently and does not explicitly consider the temporal dependencies between consecutive observations in the time series. Random forest is not well suited for extrapolation, especially for long-term forecasts; thus, it may be difficult to capture and project future trends extending beyond the observed data range. While random forest is generally robust to overfitting, it can still be sensitive to noisy data; it may overfit the noise if the dataset contains a substantial amount of noise or irrelevant features, leading to degraded performance.
LightGBM is a powerful and efficient gradient-boosting framework that performs well in various machine-learning tasks. LightGBM is highly efficient and can handle large datasets with millions of instances and features. It uses a histogram-based algorithm to achieve faster training and prediction times than traditional gradient-boosting implementations. The main advantage of LightGBM is low memory usage due to the use of a compact data structure for representing the dataset during training. Like other gradient-boosting algorithms, LightGBM can be prone to overfitting if not properly regularized or tuned. LightGBM may struggle to capture complex feature interactions compared with deep-learning models.
Support vector regression (SVR) is a machine-learning model that captures linear and nonlinear relationships between variables. It can handle high-dimensional datasets and capture complex relationships between variables. The algorithm focuses on the support vectors, the data points that influence the model's decision boundary most. Outliers have less impact on this model because of the use of a margin. SVR allows the use of different kernel functions, such as linear, polynomial, radial basis function, and sigmoid. This flexibility enables the modeling of various relationships between the input and target variables; however, SVR performance depends heavily on selecting appropriate hyperparameters, such as the kernel type, regularization parameter, and kernel-specific parameters. Training an SVR model can be computationally expensive, especially when dealing with large datasets or complex kernel functions. For time-series datasets, SVR does not account for the temporal dependencies among observations.
K-nearest neighbors (KNN) is an instance-based, nonparametric algorithm that uses a distance metric, such as Euclidean distance, Manhattan distance, or cosine similarity, to make predictions. KNN does not explicitly learn a model from the training data. Instead, it stores the entire training dataset and uses it during prediction, eliminating the need for a time-consuming training phase. As the number of training instances increases, the algorithm's prediction time can become significant because it requires calculating distances to all training samples. Some limitations of KNN models are the curse of dimensionality, sensitivity to the scale of features, intensive memory requirements, time-consuming predictions with large datasets, and the inability to capture temporal dependencies.

Evaluation criteria
This study adopts the following three metrics to calculate the forecasting error and evaluate the prediction performance: MAE, MAPE, and RMSE. MAE measures the difference between two continuous variables and calculates the mean value of all absolute errors. MAPE is a scaleless error value that measures the relative forecasting error. RMSE represents the standard deviation of the residual error between the predicted and observed values. The models' prediction performance increases with decreasing error measures. The formulas for the above evaluation criteria are as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where $n$ is the sample size, and $y_i$ and $\hat{y}_i$ are the true and predicted values for sample $i$, respectively.
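The three criteria are straightforward to compute. The following sketch uses plain Python and a small set of made-up prices; expressing MAPE in percent is an assumption about the reporting convention.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed convention)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

# Made-up true and predicted prices, for illustration only
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]

errors = {
    "MAE": mae(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
}
```

Because RMSE squares the errors before averaging, it penalizes large individual errors more heavily than MAE, which is one reason the rankings of models can differ between the two criteria.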

Data description and preprocessing
The daily closing prices of WTI and Brent crude oil, gold, and silver were collected from 2000-01-04 to 2022-03-25 (Fig. 5). The original spot price data for WTI and Brent crude oil are derived from the US Energy Information Administration (https://www.eia.gov), while the spot prices of gold and silver are from KITCO (https://www.kitco.com). We used data from the same trading days across all four markets to obtain an identical sample size for all time series.
To find the best hyperparameters and evaluate the models' real-world performances, evaluating them on a separate validation set and a test set representing future unseen data is essential. Splitting time-series datasets is challenging because of temporal dependencies, seasonality, and trends. If we split the data randomly, it breaks the temporal order, and the model may be trained on future data, leading to data leakage and overfitting. Moreover, if the training set does not capture the full range of seasonality or fails to include representative trend patterns, the model's ability to generalize to unseen data may be compromised. Ensuring the training set contains consecutive past observations to predict future observations, includes multiple seasonal cycles, and adequately captures the underlying trends is crucial. Time-based splitting and rolling window approaches can address these challenges in time-series analysis. In time-based splitting, we split the data based on a specific date or time, ensuring that the training set only contains past observations and the test set contains future observations. In the rolling window approach, a sliding window is used to create samples in the training, validation, and test sets, where each sample includes past observations and the corresponding future target. The price series were standardized before modeling, with $x_t$ and $x_t'$ denoting the data before and after standardization, respectively.
Table 2 summarizes the sample's descriptive statistics and statistical tests for WTI and Brent crude oil, gold, and silver. The total sample size for all markets is 5426. All four market spot prices show significant skewness, while WTI, Brent, and gold also show significant leptokurtic properties at a 5% significance level. Furthermore, the significant Jarque-Bera test statistics at a 1% significance level show that the WTI, Brent, gold, and silver price time series do not comply with the normal distribution; hence, these markets can be treated as nonstationary signals.
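A minimal sketch of time-based splitting followed by standardization is given below. The 80/10/10 split fractions, the synthetic price series, the z-score form of standardization, and the choice to compute the mean and standard deviation on the training portion only are all assumptions made for illustration; the study's exact settings are not restated here.

```python
import numpy as np

def time_based_split(series, train_frac=0.8, val_frac=0.1):
    """Split chronologically: oldest data for training, newest for testing."""
    n = len(series)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]

def standardize(train, *others):
    """Z-score all parts using statistics from the training part only,
    so that no information from later periods leaks into training."""
    mu, sigma = train.mean(), train.std()
    return [(part - mu) / sigma for part in (train, *others)]

prices = np.linspace(20.0, 120.0, 100)   # synthetic, steadily rising series
train, val, test = time_based_split(prices)
train_s, val_s, test_s = standardize(train, val, test)
```

Fitting the scaling statistics on the training portion alone mirrors the leakage argument made above for random splitting: the validation and test sets should look like unseen future data at every preprocessing step.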
For these forecasting tasks, $x_t = \{x_1, x_2, \ldots, x_s\}$ is the input vector, where $x_i$ is the price on day $i$ and $s$ is the sequence length (sliding window length), and $y_t = \{x_{s+1}\}$ is the target. We constructed the input windows for each sequence length before feeding the series into the models. In this study, we train 16 deep- and machine-learning models with four different sliding window lengths of 5, 30, 60, and 90 days to predict the next-day WTI, Brent, gold, and silver prices. We consider 5 a relatively short sliding window length and 30, 60, and 90 relatively long ones that can capture any seasonality or trend in the data. We compare deep- and machine-learning models to determine how they forecast commodity price time series with longer input sequences.
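The construction of input windows and next-day targets can be sketched as follows. The function name `make_windows` and the toy series are illustrative assumptions.

```python
import numpy as np

def make_windows(series, s):
    """Build (X, y) pairs: each row of X holds s consecutive values,
    and the matching entry of y is the value on the following day."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + s] for i in range(len(series) - s)])
    y = series[s:]
    return X, y

series = np.arange(10.0)          # toy series 0, 1, ..., 9
X, y = make_windows(series, s=5)  # sliding window length of 5
# X[0] is [0, 1, 2, 3, 4] and y[0] is 5: five past values predict the next one
```

A series of length N yields N - s samples, so longer windows leave slightly fewer training samples from the same data.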

Empirical results
Crude oil and precious metals are essential commodities in financial markets. This study aims to forecast the daily price of WTI and Brent crude oil, gold, and silver through deep-learning models and compare the prediction performance of deep-learning models with random forest, LightGBM, SVR, and KNN models as baseline machine-learning models. Hence, our results indicate the best deep-learning model for forecasting crude oil, gold, and silver daily prices. We examine the performance of all models across four sliding window lengths of 5, 30, 60, and 90 days to identify the input length that gives the best performance for each model. The deep-learning models used in this study are the LSTM, BiLSTM, GRU, BiGRU, T2V-BiLSTM, T2V-BiGRU, CNN, CNN-BiLSTM, CNN-BiGRU, TCN, TCN-BiLSTM, and TCN-BiGRU models.
We used grid search on the validation dataset to tune and select the optimal hyperparameters of each model. The common hyperparameters among all models are the number of epochs, batch size, dropout rate, and learning rate, equal to 50, 32, 0.2, and 0.001, respectively. Table 3 presents the selected hyperparameters of the four best-performing models in this study. Due to the large scale of the study and space limitations, we only present the selected hyperparameters of the BiGRU, T2V-BiGRU, TCN, and TCN-BiGRU models for each market. The hyperparameters of the other models are available upon request from the corresponding author.
After each training step, the weights of the models are updated by the Adam optimizer with a scheduled learning rate (lr) as follows:
$$lr = \begin{cases} lr_0, & \text{if epoch} < 5 \\ lr \cdot e^{-0.1}, & \text{otherwise} \end{cases} \qquad (24)$$
The initial learning rate ($lr_0$) is 0.001, applied from epoch one through epoch five, and then decreases exponentially for each epoch after epoch five. In this study, the models were trained to minimize the MSE loss function. The objective function of the training process is as follows:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
where $n$ is the number of samples, $\hat{y}_i$ is the predicted price, and $y_i$ is the true target price for sample $i$.
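The learning-rate schedule can be traced with a few lines of plain Python. The sketch assumes zero-based epoch indices (so the first five epochs are indices 0 to 4) and a function taking the epoch index and the current rate, which is the form that Keras's `LearningRateScheduler` callback accepts; the name `scheduled_lr` is hypothetical.

```python
import math

def scheduled_lr(epoch, lr):
    """Keep the rate constant for the first five epochs (indices 0-4,
    an assumed indexing), then multiply it by e^(-0.1) every epoch."""
    if epoch < 5:
        return lr
    return lr * math.exp(-0.1)

lr = 0.001                # initial learning rate lr_0
history = []
for epoch in range(50):   # 50 epochs, as in the common hyperparameters
    lr = scheduled_lr(epoch, lr)
    history.append(lr)
# history[0..4] stay at 0.001; each later entry is the previous one times e^(-0.1)
```

Under this reading, the rate in the final epoch equals the initial rate multiplied by e^(-0.1) once for each of the 45 decayed epochs.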
Overfitting in financial market price forecasting experiments can lead to misleading and unreliable results. Overfitting occurs when a model is too complex and captures the noise in the data rather than the underlying patterns. The consequences of overfitting in financial market price forecasting can be severe. Traders reliant on an overfitted model may make poor investment decisions, leading to significant losses. Furthermore, an overfitted model may be susceptible to market changes, making it difficult to use in real-world situations. Techniques such as cross-validation, dropout, early stopping, and pruning (for random forest and LightGBM) are employed to mitigate the risk of overfitting in crude oil and precious metals market price forecasting. Cross-validation involves partitioning the data into training and validation sets and evaluating the model on the validation set to assess its generalization performance. Model regularization in this study is achieved through a dropout layer in the models' architectures and early stopping after 10 epochs during training. Early stopping ends the training process if the validation error does not improve. To further ensure the robustness of the forecasting results, all reported errors and predicted values are the average outputs from 10 runs of each model.
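The early-stopping rule can be sketched independently of any deep-learning framework. The sketch below interprets "after 10 epochs" as a patience of 10 epochs without improvement in the validation loss, which is an assumption; the function name and the sequence of validation losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has not improved on its best value
    for `patience` consecutive epochs (assumed interpretation).
    """
    best = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: improvement for three epochs, then a plateau
val_losses = [1.0, 0.8, 0.7] + [0.75] * 20
stop_epoch, best_epoch = early_stopping_epoch(val_losses)
```

With this rule, training on the example halts well before the 23 available epochs are used, and the weights from the best epoch would normally be the ones kept.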
All deep-learning models are implemented using TensorFlow Keras, and the machine-learning models are created using scikit-learn. The experiments were conducted using Python 3.8 and run on a computing system with a 70 W NVIDIA Tesla T4 GPU, CUDA version 11.2, and 16 GB RAM.

WTI price forecasting
To show the computational performance of our deep-learning models for WTI next-day spot price forecasting, we report the forecasting performance of the LSTM, BiLSTM, GRU, BiGRU, T2V-BiLSTM, T2V-BiGRU, CNN, CNN-BiLSTM, CNN-BiGRU, TCN, TCN-BiLSTM, and TCN-BiGRU models and compare it with that of the baseline models, i.e., the random forest, LightGBM, KNN, and SVR models. Each model was executed 10 times to reduce randomness and improve the robustness of the results. Table 4 presents the MAE, MAPE, and RMSE values for the forecasted next-day WTI prices in the test dataset across all models. Among the evaluated models and considering two out of three performance criteria, the TCN model consistently achieves the lowest MAE and MAPE for WTI price forecasting across all input sliding window sizes. However, when considering the RMSE metric, the BiGRU model outperforms the other models for input sequences of lengths 5 and 30. Conversely, for input sequences of lengths 60 and 90, the TCN-BiGRU and T2V-BiGRU models demonstrate superior performance, respectively. In addition to its superior prediction performance, the forecasting error of the TCN model is not significantly affected by the input sequence length, as we obtain MAE values of 1.510, 1.455, 1.444, and 1.472 with sequence lengths of 5, 30, 60, and 90, respectively. By comparison, the performance of most other models is more sensitive to the input sequence length.

Table 4 WTI price forecasting performance
To ensure the robustness of the models' performances, the average errors over ten runs of the models are reported here.

Using bidirectional models has proved effective in NLP tasks (Arbane et al. 2023; Huang et al. 2023; G. Liu and Guo 2019; Raza and Schwartz 2023); however, little attention has been paid to using these models for price time-series forecasting. In this study, all three performance criteria from Table 4 show that bidirectional recurrent models, such as BiLSTM and BiGRU, perform better than unidirectional models, such as LSTM and GRU, for all sequence lengths. Bidirectional RNNs exploit the network memory to process information from backward and forward directions. Therefore, interdependency among data samples is learned better compared to unidirectional models that only use forward-direction information processing. Our findings agree with those of Yang and Wang (2022) and Siami-Namini et al. (2019), who found that the BiLSTM model outperformed the LSTM model for time-series prediction. Furthermore, it is evident from Table 4 that GRU-type models such as GRU, BiGRU, T2V-BiGRU, CNN-BiGRU, and TCN-BiGRU perform better than LSTM-type models such as LSTM, BiLSTM, T2V-BiLSTM, CNN-BiLSTM, and TCN-BiLSTM in WTI price forecasting.
To evaluate the effectiveness of Time2Vector embedding in WTI price forecasting, we compare the MAE, MAPE, and RMSE of the BiLSTM and BiGRU models with those of the T2V-BiLSTM and T2V-BiGRU models, respectively. Using the T2V input embedding, the MAE of the BiLSTM and BiGRU models with input sequence 5 increases from 1.821 and 1.570 to 1.985 and 1.889, respectively. In contrast, the MAE of the BiLSTM and BiGRU models with input sequence 90 decreases from 1.904 and 1.699 to 1.670 and 1.523, respectively. Arguably, Time2Vector embedding does not improve forecasting with smaller input sequences, 5 and 30, while it improves the WTI price forecasting performance for longer sequences of 60 and 90. To study the impact of hybrid models, such as CNN-BiLSTM and CNN-BiGRU, we compared their performance with single BiLSTM and BiGRU models. Combining the CNN model with recurrent-type models has a detrimental effect on the forecasting performance of WTI prices, as evidenced by an increase in MAE across all sequence lengths. This outcome occurs because the CNN module downsamples the input sequence, and some information that might be useful for the BiLSTM or BiGRU models is lost, resulting in higher forecasting errors. Similarly, a single TCN model outperforms the hybrid TCN-BiLSTM and TCN-BiGRU models. The TCN model can see the entire sequence in its receptive field and use the best temporal features to forecast the WTI price; therefore, combining it with a recurrent-type model only increases the complexity of the model and causes overfitting without significant improvements in forecasting performance.
Upon examining the forecasting errors of ensemble tree-based models, i.e., random forest and LightGBM, it becomes clear that random forest performs poorly in predicting WTI prices, whereas LightGBM demonstrates exceptional forecasting capabilities. The MAPE and RMSE values of LightGBM across sequence lengths of 5, 30, and 90 days are consistently the lowest among all 16 forecasting models. Consequently, LightGBM can be considered an approximate match to the TCN model as the top-performing method for WTI price forecasting. Moreover, the performance of LightGBM declines slightly as the input sequence length increases; however, this decrease in performance is not significant, indicating that LightGBM is relatively insensitive to variations in the input sequence length. Conversely, the results of the SVR and KNN models show that the performance of conventional machine-learning models tends to deteriorate as the input sequences grow. In contrast, deep-learning models are less affected by larger input sequences, demonstrating their robustness. All deep-learning models outperform the SVR and KNN models for larger input sequences; however, for smaller sequences, such as those with a length of 5, the KNN model performs better than the deep-learning models, except for the BiGRU and TCN models. This discrepancy can be attributed to the data within each sequence serving as input features for the KNN model. As the sequence length increases, the KNN model faces greater challenges in identifying the nearest neighbors required for accurately predicting the target price.
Figure 6 presents the RMSE for the WTI next-day spot price forecasting models to find the best sliding window length for each forecasting model. Our experiments with WTI price forecasting show that using only recurrent-type models such as LSTM, GRU, BiLSTM, BiGRU, T2V-BiLSTM, and T2V-BiGRU, we obtain better prediction performance compared with using only CNN or a hybrid of CNN with recurrent-type models such as CNN-BiLSTM and CNN-BiGRU. Recurrent-type models are not very sensitive to the input sequence length, and they even perform slightly better with relatively longer input sequences because longer sequences enable the model to learn more upward, downward, and complex patterns and generalize better in predicting unseen data. Nonetheless, since the CNN models cannot memorize important information from past data points, the forecasting error of CNN-type models, such as a single CNN, CNN-BiLSTM, and CNN-BiGRU, increases with the input sequence length. The RMSE of TCN-BiLSTM and TCN-BiGRU is generally smaller than the RMSE of the CNN-BiLSTM and CNN-BiGRU models; therefore, we can conclude that among the hybrid models, the TCN module performs better than the CNN module in extracting the essential temporal features. Figure 6 shows that an input sequence of 60 days of lagged data points is generally better than other sliding window lengths such as 5, 30, or 90 days for WTI daily price forecasting; however, the CNN, CNN-BiLSTM, and CNN-BiGRU models perform better with an input sequence of 5 days than with the other sequence lengths for WTI price prediction. Among the machine-learning models, ensemble tree-based models emerge as the leading models for forecasting WTI prices. Notably, the random forest model exhibits subpar performance with shorter input sequences. LightGBM consistently performs well across all input sequences, demonstrating its robust forecasting capabilities. In contrast, the forecasting performance of the SVR and KNN models deteriorates as the input sequence length increases, suggesting that these models struggle to capture complex patterns and relationships effectively within longer data sequences.
Our observations regarding WTI forecasting align with Qin et al. (2023), where the GRU model demonstrated superior performance compared with random forest, SVR, and LSTM models, achieving a lower MAPE value. Similarly, our results corroborate those of J. Yuan et al. (2023), who found that LightGBM performed significantly better than the LSTM and SVR models.

Brent price forecasting
Table 5 shows the MAE, MAPE, and RMSE of our forecasting models for Brent next-day spot price forecasting. We compared the forecasting performance of the LSTM, BiLSTM, GRU, BiGRU, T2V-BiLSTM, T2V-BiGRU, CNN, CNN-BiLSTM, CNN-BiGRU, TCN, TCN-BiLSTM, and TCN-BiGRU models with the baseline models, i.e., the random forest, LightGBM, KNN, and SVR models. According to the lowest values of the MAE and RMSE measures for all input sequence lengths, 5, 30, 60, and 90, the TCN is the best-performing model in predicting the Brent crude oil price in the test dataset. Considering the MAPE for input sequences with 5 lagged data points, the TCN model has the best Brent price prediction performance; for input sequences of lengths 30, 60, and 90, the T2V-BiGRU model outperforms the other models. Furthermore, the TCN model is not particularly sensitive to the input sequence length. The TCN achieves a robust and stable forecasting performance for all input sequence lengths, as the MAE values with sequence lengths of 5, 30, 60, and 90 are 1.295, 1.353, 1.315, and 1.301, respectively. The performance of most other models exhibits higher sensitivity to changes in the input sequence length for Brent crude oil. For instance, the MAE of the CNN model grows with increasing sequence length, as it obtains MAEs of 1.542, 1.879, 2.818, and 5.194 with sequence lengths of 5, 30, 60, and 90, respectively. Similar to our findings for WTI crude oil price forecasting, we found that the BiLSTM and BiGRU models generally outperform the unidirectional LSTM and GRU models in forecasting Brent crude oil prices. By comparing the MAE, MAPE, and RMSE of the GRU-type models (such as GRU, BiGRU, T2V-BiGRU, CNN-BiGRU, and TCN-BiGRU) with those of the LSTM-type models (such as LSTM, BiLSTM, T2V-BiLSTM, CNN-BiLSTM, and TCN-BiLSTM), we found that the GRU is a more appropriate recurrent unit for Brent crude oil price forecasting.
The impact of Time2Vector embedding in Brent crude oil price forecasting is assessed by comparing the MAE, MAPE, and RMSE of the T2V-BiLSTM and T2V-BiGRU models with those of the BiLSTM and BiGRU models, respectively. Table 5 shows that T2V embedding improves the forecasting performance of the BiLSTM model for input sequences of 60 and 90, while it improves the performance of the BiGRU model for input sequences of 30, 60, and 90. The results of Brent crude oil price forecasting confirm that T2V embedding favorably influences forecasting with longer input sequences. For the hybrid models, our results indicate that combining the CNN model with recurrent-type models adversely affects the performance of the BiLSTM and BiGRU models for Brent crude oil price forecasting. The same pattern appears when comparing the forecasting performance of a single TCN model with the TCN-BiLSTM and TCN-BiGRU hybrid models in predicting Brent daily prices. The TCN model outperforms the hybrid models.
Comparing the forecasting errors of the random forest, LightGBM, SVR, and KNN models with those of our deep-learning models indicates that the forecasting performance of deep-learning models is superior to that of machine-learning models. However, the ensemble LightGBM model stands as an exception, demonstrating remarkable performance as the second-best model among all 16 models for forecasting Brent crude oil prices across all input sequence lengths. This exceptional performance sets LightGBM apart from the other models, emphasizing its robustness and effectiveness in accurately predicting Brent crude oil prices, regardless of the input sequence length; however, for the short sequence length of 5, the KNN performs better than the deep-learning models, except for the BiGRU, CNN, and TCN models.
Figure 8 presents the RMSE of the forecasting models implemented in this study to predict the next-day Brent crude oil price in the test dataset. Our results indicate that the recurrent-type models such as LSTM, GRU, BiLSTM, BiGRU, T2V-BiLSTM, and T2V-BiGRU outperform the CNN and hybrid models such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU in terms of Brent price forecasting. Figure 8 shows that, in general, the efficacy of recurrent-type models in predicting the Brent price is enhanced with relatively longer input sequences; however, the CNN and hybrid models do not perform well with longer input sequences. The RMSEs of TCN-BiLSTM and TCN-BiGRU are mainly lower than those of the CNN-BiLSTM and CNN-BiGRU models; therefore, we can infer that the TCN module performs better than the CNN module in extracting the critical temporal features of the Brent crude oil price. Examining the ensemble and conventional machine-learning models, namely random forest, LightGBM, SVR, and KNN, indicates that the optimal forecasting input sequence for Brent price prediction is five days. The LightGBM model achieves superior forecasting across all input sequences and, thus, is not significantly affected by changes in the input sequence length. As a general observation, the forecasting performance of these baseline models declines as the input sequence length increases, which indicates that shorter input sequences provide more accurate and reliable predictions than longer sequences when using these models for forecasting Brent prices. Apart from the machine-learning models and the CNN, CNN-BiLSTM, and CNN-BiGRU models, which perform better with shorter input sequences, our experiments indicate that the best input sequence length for Brent crude oil forecasting is 60 days of past data. Indeed, the lowest RMSE values across most of the deep-learning models in this study are achieved for an input sequence length of 60 for Brent crude oil price forecasting.

Gold price forecasting
Table 6 presents the forecasting errors of gold price prediction with 16 deep- and machine-learning models. Considering the models' resulting MAE, MAPE, and RMSE, the TCN model has the best gold price prediction performance for input sequences of 5 and 90 days. Moreover, for gold price predictions with input sequences of 30 and 60, the BiGRU and GRU models show superior performance. Our results show that in most cases, the deep-learning models performed remarkably better than the baseline random forest, LightGBM, SVR, and KNN models in predicting the price of gold. However, compared with CNN-BiLSTM, TCN-BiLSTM, and TCN-BiGRU, the SVR model achieved lower MAE, MAPE, and RMSE values. The prediction with gold price data shows that bidirectional LSTM models perform better than unidirectional LSTM models for all input sequences. Meanwhile, the BiGRU model outperformed the GRU model exclusively for input sequences of 5 and 60 days. Comparing the gold price forecasting errors of the GRU-type models, such as GRU, BiGRU, T2V-BiGRU, CNN-BiGRU, and TCN-BiGRU, with those of the LSTM-type models, such as LSTM, BiLSTM, T2V-BiLSTM, CNN-BiLSTM, and TCN-BiLSTM, we found that the GRU-type models are more appropriate than the LSTM-type models for gold price forecasting.
Figure 5 shows that the dynamics of gold price movement from 2000-01-04 to 2022-03-25 differ from those of the WTI and Brent crude oil markets, and an upward trend is visible in gold price movements throughout the period. Nevertheless, our deep-learning models could predict the gold price for the test data relatively well. In contrast to its performance in WTI and Brent price forecasting, the LightGBM model surprisingly did not exhibit strong generalization capabilities when predicting the gold price during the test data period. Despite its success in other forecasting tasks, the LightGBM model failed to provide accurate and reliable predictions for gold prices, indicating that the underlying dynamics and patterns of gold price data might differ significantly from those of WTI and Brent. Table 8 shows the coefficient of variation for the resulting MAEs of all forecasting models. The coefficient of variation is a scaleless value calculated by dividing the standard deviation (SD) of a model's MAEs across the input sequence lengths by the mean of those MAEs. Comparing the forecasting results of the gold market with those of the WTI and Brent crude oil markets in Table 8 shows that the models are more sensitive to the input sequence length in the gold market, as the MAE of each model varies markedly across the sequence lengths.
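The coefficient of variation can be computed directly from the MAE values quoted in the text. The sketch below uses the TCN MAEs reported for WTI and Brent and the CNN MAEs reported for Brent across the four sequence lengths. Whether the population or the sample standard deviation underlies Table 8 is not stated, so the use of the population SD here is an assumption, and the resulting numbers are not claimed to reproduce the table.

```python
import statistics

def coefficient_of_variation(values):
    """SD of the values divided by their mean (population SD assumed)."""
    return statistics.pstdev(values) / statistics.fmean(values)

# MAE values for sequence lengths 5, 30, 60, and 90, as quoted in the text
tcn_wti = [1.510, 1.455, 1.444, 1.472]
tcn_brent = [1.295, 1.353, 1.315, 1.301]
cnn_brent = [1.542, 1.879, 2.818, 5.194]

cv = {
    "TCN (WTI)": coefficient_of_variation(tcn_wti),
    "TCN (Brent)": coefficient_of_variation(tcn_brent),
    "CNN (Brent)": coefficient_of_variation(cnn_brent),
}
```

A small coefficient means the error barely changes with the window length, which is the sense in which a model is described as insensitive to the input sequence length.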
Figure 10 depicts the RMSE of our forecasting models for predicting the next-day gold price in the test dataset. The recurrent-type models, such as LSTM, GRU, BiLSTM, BiGRU, T2V-BiLSTM, and T2V-BiGRU, generally have lower RMSE values compared to the CNN and hybrid models, such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU. This result aligns with the research conducted by He et al. (2019) on gold price prediction, which demonstrated that a hybrid CNN-LSTM model did not exhibit superior performance compared with individual CNN or LSTM models.
A shorter input sequence of 5-day price data is more useful in gold price predictions with deep- and machine-learning models. The gold price forecasting performance generally deteriorates as the input sequence length increases. The best prediction performance across all models and sequences was achieved through the BiGRU model using 30 days of gold price data. Based on the findings presented in Table 8, LightGBM exhibits a higher coefficient of variation for MAE in gold price forecasting than in WTI and Brent crude oil price forecasting. This outcome indicates that LightGBM is considerably sensitive to changes in the input sequence length when predicting the gold price, underscoring the need to tune the input sequence length carefully for gold price forecasting with LightGBM.
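How an input sequence length turns a price series into training samples can be sketched as follows. This is a minimal illustration that assumes a univariate series and a next-day target; `make_windows` is a hypothetical helper, not the study's code.

```python
import numpy as np

def make_windows(prices, window):
    """Build (inputs, targets) pairs from a 1-D price series.

    Each input row holds `window` consecutive prices; the target is the
    price on the following day (assumed one-step-ahead setup).
    """
    prices = np.asarray(prices, dtype=float)
    n_samples = len(prices) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([prices[i:i + window] for i in range(n_samples)])
    y = prices[window:]
    return X, y

# Hypothetical toy series; the study uses windows of 5, 30, 60, and 90 days.
toy_prices = np.arange(10.0)
X, y = make_windows(toy_prices, window=5)
print(X.shape, y)
```

Longer windows give each sample more history but also reduce the number of samples by the window length, which is one practical trade-off behind comparing several sequence lengths.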

Silver price forecasting
The daily spot price of silver, a precious metal, is forecasted with the deep-learning models in this study and compared with the random forest, LightGBM, SVR, and KNN forecasts. Table 7 shows the MAE, MAPE, and RMSE of silver price predictions. The TCN model is the best-performing model across all input sequence lengths to forecast the daily silver price, as it scores the lowest MAE, MAPE, and RMSE among all models. Besides its superior ability to forecast the silver price, the TCN model is the least sensitive to the input sequence length, as shown by the MAE coefficient of variation in Table 8. The coefficient of MAE variation across all sequence lengths is 0.015 for the TCN model, the lowest among all models. The results of this study indicate that, except for the TCN-BiLSTM and TCN-BiGRU models with an input sequence of five days, our deep-learning models are superior to the SVR and KNN models in predicting the price of silver. For silver price forecasting, providing bidirectional information seems promising with the BiLSTM model, as it reached lower MAE, MAPE, and RMSE values than the unidirectional LSTM; however, bidirectional information did not improve the forecasting performance of the GRU model for silver price prediction. Furthermore, the results from Table 7 indicate that GRU-type models have a relatively better forecasting performance than LSTM-type models for silver price prediction.
Among the ensemble (random forest and LightGBM) and conventional (SVR and KNN) machine-learning models, only LightGBM outperformed some of the deep-learning models, namely CNN, CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU, in silver price forecasting. LightGBM was the best machine-learning model for silver price forecasting across all sequence lengths.
Comparing the MAE coefficients of variation between the silver and gold markets in Table 8 shows that the performance of our forecasting models is relatively less affected by changes in the input sequence length when predicting the silver market. This finding indicates that the forecasting models exhibit greater stability and consistency in their predictions for the silver market, regardless of variations in the input sequence length. Unlike the gold market, where the models show higher sensitivity to changes in the input sequence length, the silver market demonstrates a more robust and reliable forecasting performance across different input sequence lengths.
Figure 12 presents the RMSE of our deep-learning models to forecast the next-day silver price in the test dataset. Similar to the results of the WTI, Brent, and gold markets, the silver price forecasting errors of the recurrent-type models such as LSTM, GRU, BiLSTM, BiGRU, T2V-BiLSTM, and T2V-BiGRU are generally lower than those of the CNN and hybrid models such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU. The best-performing model for predicting the silver price is the TCN model, which demonstrates robust forecasting performance across all input sequence lengths. Our results show that the recurrent-type models generally perform better with a longer input sequence of 90 days to predict the next-day silver price. The best prediction performance across all models and sequences is achieved through the TCN model using 60 days of past silver price data. Moreover, in hybrid models such as CNN-BiLSTM, CNN-BiGRU, TCN-BiLSTM, and TCN-BiGRU, the TCN module performs better than the CNN module in extracting the temporal features of the silver market price.
Figure 13 plots the best-predicted silver prices in the test dataset against the actual silver prices from 2020-01-03 to 2022-03-25, showing that the TCN and random forest models are, respectively, the best and the worst at generalizing in silver price forecasting.
Using MAPE as the metric, our silver price prediction results surpass those of Gono et al. (2023), who employed random forest and XGBoost methods. Our best MAPE for silver price prediction, 1.52%, is substantially lower than the best MAPE of 5.98% achieved by Gono et al. (2023).
Our significant empirical findings can be summarized as follows.
1. TCN is the best model for generalizing and forecasting commodity market prices.
2. LightGBM is the best machine-learning model for forecasting commodity market prices; however, compared with the TCN model, it performs poorly in capturing and responding to sharp market dynamics.

[...] manage exposure to market volatility. Accurate crude oil price forecasting can provide a competitive advantage by enabling managers to make timely and informed decisions. They can anticipate market trends, respond quickly to price fluctuations, and stay ahead of competitors regarding pricing, supply chain management, and customer satisfaction.

Conclusion
Crude oil, particularly WTI and Brent, is crucial in global financial markets and economics. In recent years, crude oil prices have become more vulnerable to geopolitical and macroeconomic factors. Thus, understanding the dynamics of crude oil markets is essential. Furthermore, precious metals such as gold and silver are key commodities mined in particular countries, which makes the economies of these countries highly reliant on precious metal markets. Moreover, gold is a substitute asset for stock markets and is indispensable in financial investment portfolios. Therefore, developing an accurate forecasting model for crude oil, gold, and silver price movements is vital for policymakers, business owners, investors, and other stakeholders to take timely policy actions, foresee market trends, and properly design investment strategies to mitigate investment risks.

In this study, we implement 12 deep-learning models, namely, LSTM, BiLSTM, GRU, BiGRU, T2V-BiLSTM, T2V-BiGRU, CNN, CNN-BiLSTM, CNN-BiGRU, TCN, TCN-BiLSTM, and TCN-BiGRU, to forecast the WTI, Brent, gold, and silver market prices and compare their forecasting performance with four baseline models, namely, random forest, LightGBM, SVR, and KNN. We use each market's historical price information and apply four different sliding window lengths of 5, 30, 60, and 90 days. The MAE, MAPE, and RMSE evaluation metrics are employed to assess the forecasting power of each model. We compared the forecasting performance of these models across various input sequence lengths and found that the TCN model is the best-performing model for forecasting the prices of WTI, Brent, gold, and silver. LightGBM exhibits comparable forecasting performance to the TCN model in accurately predicting WTI and Brent crude oil prices. Our results also indicate that the BiGRU and GRU models are the best for predicting gold spot prices with input sequences of 30 and 60 days, respectively. The best forecasting performance for each market is achieved as follows: WTI through a TCN model with an input sequence of 60 days (MAPE 3.53%), Brent through a TCN model with an input sequence of 5 days (MAPE 2.64%), gold through a BiGRU model with an input sequence of 30 days (MAPE 0.85%), and silver through a TCN model with an input sequence of 60 days (MAPE 1.53%). Overall, our study supports using the TCN model for financial time-series price prediction.
From the empirical results, we determine that the bidirectional LSTM and GRU models outperform the unidirectional LSTM and GRU models, respectively. Moreover, GRU-type models such as GRU, BiGRU, T2V-BiGRU, CNN-BiGRU, and TCN-BiGRU outperformed their LSTM-type peers in predicting WTI, Brent, gold, and silver prices.
Our study has several implications for policymakers and investors. First, the results of this study can assist investors and decision makers in promptly anticipating crude oil, gold, and silver market prices and adjusting their investment portfolios. Additionally, stakeholders can execute risk-hedging methods and lower their losses with timely predictions. In particular, gold is considered a suitable safe-haven asset for the stock and cryptocurrency markets (Junttila et al. 2018). Therefore, timely prediction of the gold market price will help stock market investors hedge their portfolios. At the organizational and country levels, organizations such as the Organization of the Petroleum Exporting Countries, the World Petroleum Council, and the International Energy Agency, as well as government agencies, can apply the indicated method, for example, the TCN model, to devise profitable policies related to global crude oil prices. Finally, our study would be particularly valuable for forecasting crude oil, gold, and silver prices during extreme events such as the COVID-19 pandemic and the recent conflict between Russia and Ukraine, both of which fall within the period considered in this study.
Several limitations must be acknowledged in our research on forecasting crude oil and precious metal prices. First, the volatile and nonlinear nature of these markets makes it difficult to capture all the intricate patterns and sudden price changes. Additionally, external factors such as natural disasters, geopolitical events, and supply-demand dynamics can significantly influence commodity prices, and accurately incorporating these factors into forecasting models remains a complex task. Finally, it is essential to acknowledge the inherent uncertainty in forecasting and implement appropriate risk management strategies. Addressing these limitations will enhance the robustness and reliability of our research findings.
Several directions exist for improving crude oil and precious metals price forecasting. First, rather than using only historical price data, other features such as technical indicators, macroeconomic features, supply and demand data, production rate, and interconnections with other financial markets can be used to predict crude oil and precious metal prices. Second, incorporating the stakeholders' sentiments, which can be derived from news articles and social media platforms, might improve the forecasting performance of our proposed method. Finally, as an alternative to sequential data, other data structures and learning methods, such as temporal graph neural networks, can be implemented to forecast price time-series data.

Fig. 1 a LSTM internal cell structure, b GRU internal cell structure, c A single layer BiLSTM or BiGRU model

Fig. 2 (left) The architecture of a TCN model with a stack of two dilated causal convolutional layers and a residual connection. (right) A dilated causal convolution layer with dilation factors D = {1, 2, 4} and kernel size k = 2
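The dilated causal convolution in the caption above can be illustrated with a small NumPy sketch. It assumes a single channel, one convolution per dilation level, zero left-padding, and all-ones weights; the function names are hypothetical and this is not the study's TCN implementation.

```python
import numpy as np

def dilated_causal_conv1d(x, weights, dilation):
    """1-D causal convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ..."""
    x = np.asarray(x, dtype=float)
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left padding keeps the layer causal
    out = np.zeros_like(x)
    for t in range(len(x)):
        for j in range(k):
            # weights[j] multiplies the input j * dilation steps in the past
            out[t] += weights[j] * padded[pad + t - j * dilation]
    return out

def receptive_field(kernel_size, dilations):
    """Number of past time steps visible to the top layer of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Stack with the caption's settings: k = 2 and D = {1, 2, 4}.
impulse = np.zeros(12)
impulse[3] = 1.0
h = impulse
for d in (1, 2, 4):
    h = dilated_causal_conv1d(h, weights=[1.0, 1.0], dilation=d)

print(receptive_field(2, (1, 2, 4)))  # 8
print(h)
```

With these settings, an impulse at time step 3 influences outputs at steps 3 through 10 only: nothing leaks backward in time, and the span of eight steps matches the receptive field of the stack.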

Fig. 4 The price time-series forecasting flow chart

Thus, for each market, the entire dataset is split into three parts: 65% training data (from 2000-01-04 to 2014-06-15), 25% validation data (from 2014-06-16 to 2020-01-02), and 10% test data (from 2020-01-03 to 2022-03-25). The test data period includes the financial crisis due to the COVID-19 pandemic and the sharp decline in crude oil prices in April 2020. Therefore, the test data include highly volatile price data, making forecasting even more challenging. Since deep-learning models are sensitive to the scale of data, we normalized each dataset into the [0, 1] interval to limit the effect of noise, speed up the updating of neural network parameters, and enhance the training performance of the model. The formula to normalize the data is as follows:
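Scaling a series into [0, 1] is commonly done with min-max normalization, x' = (x - x_min) / (x_max - x_min). The sketch below assumes that form together with a chronological 65/25/10 split. Both helper names are hypothetical, and taking the minimum and maximum from the training portion only is an assumption made here to avoid look-ahead; the text does not state which portion the scaling statistics come from.

```python
import numpy as np

def chronological_split(series, train_frac=0.65, val_frac=0.25):
    """Split a series in time order into train, validation, and test parts."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return series[:train_end], series[train_end:val_end], series[val_end:]

def min_max_scale(values, lo, hi):
    """Min-max normalization: maps lo to 0 and hi to 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

# Hypothetical toy series standing in for a daily price history.
toy_series = np.arange(100.0)
train, val, test = chronological_split(toy_series)

# Assumption: scaling statistics come from the training data only.
lo, hi = train.min(), train.max()
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
print(len(train), len(val), len(test))
```

With training-only statistics, later values in a trending series can fall outside [0, 1] after scaling; passing the minimum and maximum of the full series instead keeps every value inside the interval, at the cost of using information from the test period.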

Fig. 6 RMSE of WTI crude oil next-day price forecasting models

Figure 7 compares the line chart of predicted WTI prices in the test dataset with the actual WTI price values from 2020-01-03 to 2022-03-25. The predicted values at the end of April 2020 indicate that the TCN model surpasses the LightGBM model in accurately capturing sharp changes in the WTI price. The TCN model demonstrates superior performance in detecting and predicting abrupt fluctuations in price, showcasing its ability to capture and respond to sudden market dynamics with greater precision than the LightGBM model.

Fig. 7 Comparison of WTI crude oil price forecasting models on the test dataset

Fig. 8 RMSE of Brent next-day price forecasting models

Fig. 9 Comparison of Brent crude oil price forecasting models on the test dataset

Fig. 10 RMSE of Gold next-day price forecasting models

Fig. 11 Comparison of Gold price forecasting models on the test dataset

Fig. 12 RMSE of Silver next-day price forecasting models

Table 1 Literature review of crude oil and precious metal forecasting

Table 2 Descriptive statistics. a Null hypothesis is that the series are not skewed. b Null hypothesis is that the series show normal kurtosis. c Null hypothesis is that the series are not normally distributed. *, ** denote the rejection of the null hypothesis at the 1% and 5% significance level, respectively

Table 3 Selected hyperparameters of models. Bold indicates the lowest value

Table 5 Brent price forecasting performance. To ensure the robustness of the models' performance, the average errors over ten runs of the models are reported here. Bold indicates the lowest value

Table 6 Gold price forecasting performance. To ensure the robustness of the models' performance, the average errors over ten runs of the models are reported here. Bold indicates the lowest value

Table 7 Silver price forecasting performance. To ensure the robustness of the models' performance, the average errors over ten runs of the models are reported here. Bold indicates the lowest value

Table 8 Coefficient of variation (CoV) for the MAE of forecasting models, computed across the input sequence lengths $i = s_5, s_{30}, s_{60}, s_{90}$. Models with values in bold are least sensitive to the input sequence lengths for each market's price predictions. Bold indicates the lowest value