Abstract
Several events in recent years have changed, to some extent, the common understanding of the electricity day-ahead market (DAM). The shape of the electricity price curve has been altered as some factors that underpinned the electricity price forecast (EPF) lost their importance and new influential factors emerged. In this paper, we aim to showcase the changes in EPF, understand the effects of uncertainties and propose a forecasting method using machine learning (ML) algorithms to cope with random events such as the COVID-19 pandemic and the conflict in the Black Sea region. Significant improvements in the performance of the EPF are achieved by adjusting the training period according to the standard deviation that reflects the price volatility, by feature engineering and by using two regressors for weighting the results. One of the contributions of the proposed method consists in adjusting the training period considering the price variation. Thus, we introduce a rule-based approach built on the empirical observation that, for days with a higher growth in prices, the training interval should be shortened to capture the sharp variations of prices. The results of several cutting-edge ML algorithms represent the input for a predictive meta-model to obtain the best forecasting solution. The input dataset spans from Jan. 2019 to Aug. 2022, testing the proposed EPF method for both stable and more tumultuous intervals and proving its robustness. This analysis provides decision makers with an understanding of the price trends and suggests measures to combat spikes. Numerical findings indicate that, on average, the mean absolute error (MAE) improved by 48% and the root mean squared error (RMSE) improved by 44% compared to the baseline model (without feature engineering/adjusting training). When the output of the ML algorithms is weighted using the proposed meta-model, MAE further improved by 2.3% in 2020 and 5.14% in 2022. Fewer errors are recorded in stable years like 2019 and 2020 (MAE = 6.71, RMSE = 14.67) compared to 2021 and 2022 (MAE = 9.45, RMSE = 20.64).
1 Introduction
Starting in 1996 with Directive 96/92/EC, the European Union (EU) progressively enabled competition in the energy market, aiming to create a single integrated internal European electricity market, reduce prices, increase the security of supply, and raise living standards for European citizens. This effort started more than 20 years ago and is still ongoing [27]. It required the transposition of the EU directives and regulations into national legislation [16]. Therefore, the structure of the electricity market is relatively similar across the EU member states. In this context, the performance of the EPF is important for economic activities, energy traders, suppliers, producers, researchers and policy makers, as the electricity traded on the DAM currently has a predominant share (43%) in the total electricity consumption. Thus, DAM prices strongly influence the economy, as they have a significant impact on the final electricity prices [23]. Furthermore, EPF has become essential for energy suppliers to optimize their energy procurement strategies [33].
Differences between long- and mid-term bilaterally negotiated contracts and short-term energy requirements are settled through DAM and intraday markets. However, even in more stable intervals, EPF has been a challenge for researchers as it depends on numerous complex drivers; some influences are difficult to model, and sharp fluctuations of electricity prices, also known as spikes, often occur [20, 38]. The share of renewable energy sources (RES) is increasing in the EU member states, although their generation is more volatile, and administrators rely on short-term forecasts and prefer to trade on the DAM rather than sign forward contracts [12, 31]. On the other hand, random events have tremendous side effects that influence electricity prices, and most EPF methods are not robust enough to cope with random events or black swans such as those recently experienced.
The first event, the COVID-19 pandemic, shook businesses worldwide and was something that modern citizens had not experienced before. It started to spread in March 2020 and soon affected the entire world. The constraints and lockdowns disrupted and slowed down economic activities, slightly decreasing electricity prices. Electricity consumption stalled at almost the same level between 2019 and 2020, and prices slowly went down until 2021, when, after the lockdowns, activities almost fully resumed and electricity requirements increased. In 2021, electricity prices on the DAM more than doubled compared with 2020 and, in 2022 vs. 2021, they doubled and even tripled on average at certain time intervals. Such soaring prices were also influenced by severe droughts in recent years and by higher inflation after the lockdowns, during which economic activities were interrupted or drastically reduced. The supply of goods and services was reduced during lockdowns, but afterwards demand increased abruptly, leading to higher inflation and interest rates. Furthermore, some electricity producers faced higher prices for emission allowances under the European Union Emissions Trading Scheme (EU ETS), making electricity more expensive, especially when generated by coal-based power plants. Together, these factors created a snowball effect that amplified electricity prices.
The second event, the military conflict that started in Ukraine with the Russian invasion and eventually turned into a war with millions of refugees and casualties, was also rather unexpected. In modern times, when business prevails and artificial intelligence is expected to guide and improve our living standards, nobody expected conflicts in Europe. It also led to sanctions and energy sabotage that underlined Europe's dependency on Russian energy resources, especially gas and oil. This conflict pushed gas and oil prices to unexpected levels, acting as a cold shower on most European day-ahead markets. Furthermore, the measures taken in European countries accelerated the installation of PV systems and the transformation of consumers into prosumers, as well as efficiency measures such as replacing light bulbs and installing heat pumps. However, the pressure on electricity prices further inflated the prices of goods and services and the interest rates, exacerbating the side effects of the first event. Huge variations of average prices were encountered on the DAMs of the European countries. In Fig. 1, two snapshots of the average electricity prices from 5th of Jan. and 30th of Aug. 2022 are presented.
Noticeable changes took place in the DAM as the energy crisis hit this market: prices go up and down, the curve level is 3–4 times higher than in 2020, the curve is more humped, and the spikes are sharper and more frequent. The effects of these events and the evolution of prices on the DAMs motivated us to understand the trends and propose a forecast method to cope with random events. To obtain a robust forecasting method able to handle random events, an economic understanding of electricity markets, combined with feature engineering and standout machine learning algorithms, is necessary.
The proposed forecasting method relies on several ML algorithms: random forest, light gradient boosting, histogram-based gradient boosting, extreme gradient boosting and voting regressors. The results of the ML algorithms are weighted, and the training interval is adjusted per forecasting interval considering the standard deviation of the prices. Furthermore, its strength lies in collecting numerous fundamental features from various sources, such as the National Statistics Institute, Electricity Market Operator, Romanian Commodities Exchange, Transmission System Operator, reliable websites for commodities historical prices, Romanian National Bank for projections, Eurostat, etc., and in feature engineering, i.e., derived features extracted from the raw ones.
An interesting finding is related to the shape of the electricity price curve and the importance of the features. Some of them lost or gained importance depending on the forecast interval, as we train and test the algorithms over a longer interval from January 2019 to August 2022. The novelty of our method lies in adjusting the training period considering the price variation in the past days. Thus, we build a rule-based approach on the empirical observation that, for days with a higher increase in prices, the training interval should be shortened to capture the sharp increase and speculative effect of the soaring prices. The results of several state-of-the-art ML algorithms are input for a regressor (that could be linear regression or gradient descent) to obtain the best solution or the weighted forecast. This approach is robust and provides accurate results both for more stable intervals and for those with much higher price variation.
The remainder of this paper is structured as follows: in Sect. 2, an investigation of similar EPF methods is performed and in Sect. 3, input data are analyzed to understand the current general context. Section 4 focuses on the proposed EPF method, Sect. 5 is dedicated to the results and comparison, whereas Sect. 6 is dedicated to the discussions and interpretation of the results. Conclusion and implications for DAM are presented in Sect. 7.
2 Literature Review
As the world transitions to cleaner and more sustainable energy sources to combat climate change, there are risks associated with this transition [17] including economic and employment disruptions in fossil fuel-dependent regions, technological challenges, and policy uncertainties. Considering the importance of the EPF, researchers struggled to perform accurate EPF. There is a large palette of methods implemented to obtain the EPF: from statistical and hybrid methods [13, 32] to deep learning [26, 35]. However, in comparison with other research studies, that consider shorter intervals that do not cope with uncertainties, the proposed EPF solution is robust and efficient in both stable and tumultuous context of the electricity market. The drawback of shorter intervals in which EPF solutions are tested is that they are unable to effectively predict prices outside these intervals.
To identify the factors that influence the electricity price on DAM, several hypotheses are tested in [14] with data from 2007 to 2019. In terms of random events, this interval is characterized by the financial crisis that started in 2008 and continued in the following years. The influence on the DAM electricity price in Montenegro of the cost of active energy, the utilization cost of transmission and distribution capacity, losses in transmission and distribution systems, supplier and market operator fees, the fee for stimulating RES and the electricity generated by RES is tested using a regression model. Only the cost of active energy had a significant impact on the price of electricity. Furthermore, the impact of distribution capacity utilization on the price of electricity was significant.
Aiming to forecast the baseline and spikes in the electricity price, decision trees are employed in [15]. One classification tree predicts the occurrence of spikes and two regression trees are applied for EPF. First, the price is classified as spike or normal, then the price variation is calculated. Feature selection is performed to choose the features with the highest explanatory capacity. As input, hourly electricity prices in different bidding zones, load, transmission capacity, cross-border physical flows, and wind and PV forecasts are collected to perform EPF. A price value is identified as a spike if it is outside a band of plus/minus 2 times the standard deviation of the price series. Econometric time-series models for EPF, such as autoregressive models (ARMA, ARIMA, VAR, GARCH and ARIMAX) and singular spectrum analysis [29], are usually outperformed by machine learning algorithms [22, 25, 34]. Artificial neural networks (ANN) are considered to perform EPF in [24], focusing on the selection and processing of the input variables, such as gas prices, CO2 certificate prices, renewable feed-in, cost of fuel and consumption. It is interesting that, in the recent past, electricity prices depended on weekdays and weekend days, holidays, seasons, and weekly and annual cycles, but these are no longer significant as random events are stronger drivers. RES volatility also changed the pattern of the DAM, as the RES share increases and introduces more price variation. The authors of [24] prepared 55 fundamental variables and tested multiple configurations of the ANN model with various input parameters to find the best setting of the model. They compared ANN with (S)ARIMA results using RMSE and MAD. However, [24] did not provide forecasts at different moments in time (in various years, for instance) and the capacity of the model to handle random events was not tested.
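As a minimal sketch of the spike rule attributed to [15] above, the following Python snippet flags prices that fall outside a band of two standard deviations. It assumes that the band is centred on the mean of the series and that the population standard deviation is used; neither detail is stated in the text, and the function name `flag_spikes` and the sample prices are purely illustrative.

```python
from statistics import mean, pstdev


def flag_spikes(prices, k=2.0):
    """Return one boolean per price: True where it lies outside mean +/- k * std."""
    mu = mean(prices)
    sigma = pstdev(prices)  # population standard deviation of the whole series (assumption)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [p < lower or p > upper for p in prices]


# Hypothetical hourly prices with one obvious spike at the end
prices = [200, 210, 205, 198, 202, 207, 203, 199, 201, 900]
flags = flag_spikes(prices)
```

In a two-stage setup such as the one described, these flags would serve as labels for the classification tree, while the regression trees would be fitted separately on the normal and spike observations.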
Many recent studies [18, 19, 28] included RES generation among the main EPF drivers, along with total load, weather data (temperatures in 2 cities in Germany), and intraday prices, investigating several years, from October 2015 to September 2019, during which no significant random events took place. They used autoregressive (AR) models with exogenous variables for day-ahead and intraday prices. However, the absence of random events (numerous fundamental variables, such as gas/oil price, emission price and price index, were ignored as they were not significant at that time) and the reputation of the AR models do not recommend this approach as a robust method. Additionally, more recent research is focused on RES optimization and integration into power systems, energy communities and microgrids [2, 3, 8, 30], prediction [2, 6, 7, 8], air quality [4], conventional resources optimization [6, 7, 9], water quality [1], reducing computation [21] and data processing [5].
Zhang et al. [36] proposed an integrated model using the improved empirical mode decomposition, ARMA with exogenous terms (demand for Spanish and Australian markets), exponential GARCH and adaptive network-based fuzzy inference system. They considered 672–1200 observations for training and 24–168 observations for testing and compared the results with wavelet transform combined with ARMA and back propagation neural network, kernel extreme learning machine, and least squares support vector machine. Ziel and Steinert [37] proposed a model, namely the X-model, using the sale and purchase curves of the electricity exchange, i.e., the bidding structure of the auction data. They processed the data, reduced its dimension and considered lasso-based estimation methods to obtain the day-ahead EPF for the German and Austrian markets. Usually, the individual bids and offers are not open-source data. Another drawback of the model is that it was not tested over a long period including severe changes generated by random events.
More recent studies regarding electricity price predictions were provided in [10, 11]. In this paper, the dataset interval is generous and recent. It refers to an S–E European country (Romania), whereas most previous studies focused on Western and Northern European markets (such as UK, Norway, Germany, Austria, France, etc.). The authors identified a research gap as the S–E European countries were less investigated. Moreover, most of the studies do not include recent events (such as: COVID-19 pandemic and the conflict in the Black Sea region) and their impact on electricity prices. Another interesting aspect refers to the novelty of the dataset that was entirely built for this research. It can be used in further studies to analyze the storage requirements to replace conventional power plants or to identify the required power system flexibility that can come from consumers or business partners.
3 Input Data Depiction
In this section, the input dataset is described. Numerous time series are considered in the EPF to cope with the uncertainties brought by random events. We collected several datasets for January 2019–August 2022 from various open sources and merged these time series on a daily and hourly basis. Different time resolutions were encountered: quarterly, monthly, daily, hourly, and 10-min. As the target is the electricity price, which is extracted from the Romanian Market Operator (MO) on an hourly basis, all datasets were adjusted to be in line with this resolution. Furthermore, the traded electricity on DAM was extracted from the MO (Footnote 1). The data related to the Romanian power system operational state (total consumption and generation, wind, hydro, PV, oil and gas, biomass, coal and nuclear generation and exchange on the overhead tie-lines) were extracted from the Transmission System Operator website (Footnote 2). Furthermore, gas and oil prices, gas quantity traded in Romania (Romanian Commodities Exchange, Footnote 3), price index or inflation in Romania (National Statistics Institute) and at the European Union level (Eurostat, Footnote 4), interest rate (Footnotes 5 and 6), Danube water level (Footnote 7; measured at three points) and emission allowances (Footnote 8) were scraped from open data sources. The initial data processing flow is briefly depicted in Fig. 2.
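The alignment of the different resolutions to the hourly target can be sketched as follows. The text does not say how each series was adjusted, so averaging the 10-min readings and repeating the daily values over the 24 hours of the day are assumptions made only for illustration; the helper names and sample values are hypothetical.

```python
def ten_min_to_hourly(values):
    """Average consecutive groups of six 10-minute readings into hourly values."""
    if len(values) % 6 != 0:
        raise ValueError("expected a whole number of hours (6 readings each)")
    return [sum(values[i:i + 6]) / 6 for i in range(0, len(values), 6)]


def daily_to_hourly(values, hours_per_day=24):
    """Repeat each daily value for every hour of that day."""
    return [v for v in values for _ in range(hours_per_day)]


# Hypothetical 10-minute wind generation covering two hours
wind_10min = [100, 110, 120, 130, 140, 150, 90, 90, 90, 90, 90, 90]
wind_hourly = ten_min_to_hourly(wind_10min)   # two hourly means

# Hypothetical daily gas prices covering two days
gas_daily = [95.0, 97.5]
gas_hourly = daily_to_hourly(gas_daily)       # 48 hourly values
```

Monthly and quarterly series (for instance inflation or interest rates) could be broadcast in the same way as the daily ones, at the cost of step changes at period boundaries.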
Out of the total 43 features, 21 are raw features (electricity quantity on DAM, gas price on DAM, oil price, etc.) and 22 are engineered features: six are obtained by aggregation (minimum, mean, variance, standard deviation, maximum, median) and 16 are derived (range, max/mean, previous prices at 24 h, 48 h, 72 h, 96 h, 120 h, 144 h and 168 h, previous 3 days prices mean, previous 7 days prices mean, hourly variations of the previous prices for the last 3 days, encoded weekday from 0 to 6, hour from 1 to 24). The electricity price on DAM is the target that is present in the training dataset and removed from the testing datasets. The raw features extracted from open data sources from 2019 to 2022 are depicted in Appendix 1, Tables 4, 5, 6 and 7, showing the average basic statistics. It is worth mentioning that the electricity price standard deviation varied from 85 in 2020 to 707 in 2022. In 2019, the electricity price on DAM (or target) is directly correlated with traded electricity on DAM (60%) and total consumption (64%), and inversely correlated with Danube water levels (− 30%), as in Table 1. In 2020, the electricity price on DAM is directly correlated with total consumption (77%) and with coal and oil & gas generation (57%). Furthermore, it was moderately correlated with the price of emissions (44%). In 2021, the electricity price on DAM is directly correlated with gas price on DAM (83%), inflation (79%), price of emissions (71%), the ROBOR 3 M interest rate (70%) and oil price (59%), and inversely correlated with Danube water levels (− 30% and − 57%). As in Table 1, in 2022, the electricity price on DAM is directly correlated with gas price on DAM (76%) and interest rate (59%), and inversely correlated with Danube water levels (− 52% on average) and wind generation (− 32%).
Several more comprehensive correlations are presented as heatmaps in Appendix B.
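For readers who wish to reproduce this kind of figure, a minimal sketch of a correlation coefficient between the target and one feature is given below. It assumes the Pearson coefficient, which the text does not name explicitly, and the three short series are made-up values chosen only to show a direct and an inverse relationship.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical values: electricity price, gas price and a Danube water level
price = [300, 320, 350, 400, 450]
gas = [100, 105, 120, 140, 160]
water = [500, 480, 450, 400, 380]

r_gas = pearson(price, gas)      # close to +1: direct correlation
r_water = pearson(price, water)  # close to -1: inverse correlation
```

Multiplying the coefficient by 100 gives percentages in the form quoted in the paragraph above.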
The average hourly consumption is presented in Fig. 3a; its variations are small in amplitude. The hourly average price variation is depicted in Fig. 3b. It can be noticed that both the level and the shape of the electricity price have changed over time. The price soared to unprecedented levels in 2022. The electricity price shapes in 2019 and 2020 are almost flat, but in 2021 and 2022 they show two humps that are similar to the daily load curves, with morning/evening peaks and night valleys. This characteristic is related to the very expensive generating units that are needed to satisfy the request bids at peaks. Also, there is a speculative effect of the ever-increasing prices, thus new features are built to capture and predict these spikes. Producers tend to ask for more on DAM only because they know the trend and try to maximize their advantage. The producers also know the hydrologic conditions, RES forecast and the planned outages of the nuclear units, anticipating the level of scarcity and asking for higher prices. Another aspect is related to the auctioning price mechanism that favors the producers with cheaper fuel. The uniform price mechanism pays the price of the most expensive accepted unit of energy to all producers with accepted offers, regardless of their offer prices. This means the same price is paid to all producers, which increases the burden on the supplying side, especially during energy scarcity. The pay-as-bid pricing mechanism could be more appropriate for this market.
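The difference between the two pricing mechanisms can be illustrated with a small worked example. The sketch below accepts the cheapest offers until a given demand is covered and then computes the total payment under uniform pricing and under pay-as-bid. The offers, the demand and the function names are hypothetical, and the comparison keeps the offers fixed, although producers would be expected to bid differently under pay-as-bid rules.

```python
def clear_market(offers, demand):
    """Accept the cheapest offers until demand is met.

    offers: list of (price, quantity) tuples.
    Returns a list of (price, accepted_quantity) in merit order.
    """
    accepted = []
    remaining = demand
    for price, qty in sorted(offers):
        if remaining <= 0:
            break
        take = min(qty, remaining)
        accepted.append((price, take))
        remaining -= take
    return accepted


def uniform_cost(accepted):
    """Every accepted MWh is paid the price of the most expensive accepted offer."""
    marginal_price = max(price for price, _ in accepted)
    return marginal_price * sum(qty for _, qty in accepted)


def pay_as_bid_cost(accepted):
    """Every accepted MWh is paid its own offer price."""
    return sum(price * qty for price, qty in accepted)


# Hypothetical offers as (price in RON/MWh, quantity in MWh)
offers = [(150, 400), (900, 200), (300, 300)]
accepted = clear_market(offers, demand=800)
cost_uniform = uniform_cost(accepted)        # 900 * 800
cost_pay_as_bid = pay_as_bid_cost(accepted)  # 150*400 + 300*300 + 900*100
```

With these numbers, the last 100 MWh from the most expensive unit sets the price for all 800 MWh under uniform pricing, which is the effect described above for peak hours.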
In Fig. 4, the average electricity traded and prices on DAM are depicted in combo graphs, year by year: from 2019 to 2022. The price scales are similar in 2019 and 2020, but in the next two years, tremendously soaring prices are encountered.
The average monthly prices for electricity and gas on DAM in 2019–2022 are displayed in Fig. 5. It is noticeable that in the last two years 2021 and 2022, the electricity price is highly correlated with gas price. The increase is constant from May 2021 to December 2021. Then, the increase is even sharper in 2022.
The monthly electricity traded on DAM and the share of electricity traded on DAM in total consumption are shown in Fig. 6. Monthly electricity traded on DAM varied according to Fig. 6a. In 2022, the monthly traded electricity is higher in July and August, whereas the highest monthly volume overall was recorded in January 2020. The differences are not large, but the last two months of the 2022 data show an increase that is directly correlated with the increasing prices. The share of electricity traded on DAM in the total consumption varied between 38.5% in 2019 and 46.6% in 2022 (as in Fig. 6b).
The evolution of electricity price and quantity on DAM and gas price on DAM are also represented in Fig. 7. There is an obviously directly correlated increase in the prices starting from mid-2021, when lockdowns and restrictions were eliminated. Then, the increase is amplified by the second random event–conflict in Ukraine.
The variations of the electricity prices and quantities on DAM are showcased in Fig. 8 using boxplots. They show an accelerated increase in the electricity price from 2020 to August 2022. In 2022, values above 2900 RON/MWh are outliers, whereas 50% of the prices varied between 849 and 1645 RON/MWh. The price variation is more evident in 2021 and 2022, showing more extreme values and even outliers that were less frequent in the previous years.
By comparison, in 2020, 50% of the prices varied between 173 and 289 RON/MWh. Impressive variations took place in the European market for emission allowances, as in Fig. 9a. The price of emitting a tonne of carbon (in the EU ETS) reached a minimum in March 2020 (16 €/tCO2) and then rose from 33 €/tCO2 (1st Jan. 2021) to a record high of 89 €/tCO2 (8th Dec. 2021). The barrier of 40 €/tCO2 was broken in March 2021 due to cold temperatures. In May 2021, prices climbed above 56 €/tCO2 for the first time and, in Aug. 2022, they continued to head higher, above 98 €/tCO2. The monthly variation in Fig. 9b shows a constant increase in 2021.
Three of the Danube water levels are represented in Fig. 10. The water levels were daily recorded in Turnu Magurele, Braila and Tulcea points. They reflect the precipitation level in different years and the potential that some important hydro-power plants (like Portile de Fier) had to generate electricity. There is a clear decrease of the water levels starting from the end of 2021 that accentuated the drought and diminished the hydro generation. This issue led to a smaller hydro capacity and a higher market concentration.
The levels are highly correlated and especially in 2022, the water levels were extremely low indicating a weak hydrology and less potential for hydro-power plants depending on the Danube river.
4 Methodology and Research Hypotheses
The EPF methodology consists of two major steps: the first prepares and configures the input of the ML models; the second trains the algorithms, computes the final forecast, and evaluates the accuracy of the models. For training the ML models, a rule-based approach is built on the empirical observation that, for a month with a higher deviation in prices, the training interval should be shortened to capture the sharp increase and speculative effect of the soaring price. After training, the output of the ML algorithms is the input for a regressor model in order to improve on the individual accuracy and obtain a weighted forecast of the electricity prices.
4.1 Configure the Input for the ML Algorithms
The input of the ML algorithms is initially composed of the 22 raw features (including the target) that represent the recorded values of the previous day:
The input is filled with 22 more engineered features obtained by aggregating the previous day’s hourly prices and by determining the range and variations of the current prices versus previous days prices.
4.1.1 Aggregated Features
The following variables are calculated for the previous day: minimum and maximum price, mean price, variance, standard deviation and median of the previous day prices.
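The aggregation step can be sketched in a few lines of Python. The snippet assumes population (rather than sample) variance and standard deviation, which the text does not specify, and the function name `aggregate_previous_day` and the sample prices are hypothetical.

```python
from statistics import mean, median, pstdev, pvariance


def aggregate_previous_day(prices_prev_day):
    """Return the six aggregated features for the 24 hourly prices of the previous day."""
    return {
        "min": min(prices_prev_day),
        "max": max(prices_prev_day),
        "mean": mean(prices_prev_day),
        "variance": pvariance(prices_prev_day),  # population variance (assumption)
        "std": pstdev(prices_prev_day),          # population standard deviation (assumption)
        "median": median(prices_prev_day),
    }


# Hypothetical previous-day prices: 12 off-peak hours and 12 peak hours
prev_day = [100.0] * 12 + [200.0] * 12
aggregated = aggregate_previous_day(prev_day)
```

The same six values are attached to each of the 24 hourly rows of the forecast day, since they describe the previous day as a whole.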
These variables are added to the initial input \({X}^{{{h}}}\):
4.1.2 Derivate Features
Based on the previous day aggregated values, in this step, 16 new variables are calculated and added to the model. Range price is obtained in Eq. (7) as the difference between the maximum and the minimum price of the previous day:
Ratio price index is obtained in Eq. (8) as the ratio between the maximum and the average price of the previous day:
Previous hourly prices with lags of 24–168 h: \({{\text{ElQ}}}^{{{h}}-24}\), \({{\text{ElQ}}}^{{{h}}-48}\), \({{\text{ElQ}}}^{{{h}}-72}\), \({{\text{ElQ}}}^{{{h}}-96}\), \({{\text{ElQ}}}^{{{h}}-120}\), \({{\text{ElQ}}}^{{{h}}-144}\), \({{\text{ElQ}}}^{{{h}}-168}\). The averages of the previous hourly prices over 3 and 7 consecutive days are calculated using Eqs. (9), (10):
The hourly variations of the previous prices for the last 3 days are determined using Eq. (10):
The weekday (\({W}^{{\text{d}}}\)) and the hour (\({{h}}\)) are also added to the input variable to capture the time variations. Finally, the input of the ML algorithms is completed with the above calculated features:
The actual electricity hourly prices constitute the target variable (\({y}^{h}={ElP}^{h}\)) of the ML algorithms that are trained and applied to provide their estimations or output (\(\widehat{{y}_{m}^{h}}\)).
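A minimal sketch of the 16 derived features listed in Sect. 4.1.2 is given below. Several details are assumptions: the price series is taken to start at the first hour of a day, the three hourly variations are interpreted as same-hour differences between consecutive previous days, and the weekday is encoded with Monday as 0. The function and key names are hypothetical.

```python
from datetime import datetime

LAGS = (24, 48, 72, 96, 120, 144, 168)  # hourly lags named in the text


def derived_features(prices, t, timestamp):
    """Build the 16 derived features for the target hour with index t.

    prices: chronological hourly prices, assumed to start at the first hour of a day.
    t: index of the hour to forecast (at least 7 full days of history are needed).
    timestamp: datetime of the target hour.
    """
    day_start = t - (t % 24)  # index of the first hour of the target day
    if day_start < 168:
        raise ValueError("need at least 7 full days of history")
    prev_day = prices[day_start - 24:day_start]
    feats = {
        "range": max(prev_day) - min(prev_day),
        "max_over_mean": max(prev_day) / (sum(prev_day) / 24),
    }
    for lag in LAGS:
        feats[f"lag_{lag}"] = prices[t - lag]
    feats["mean_3d"] = sum(prices[day_start - 72:day_start]) / 72
    feats["mean_7d"] = sum(prices[day_start - 168:day_start]) / 168
    # Assumed definition: same-hour change between consecutive previous days
    for d in (1, 2, 3):
        feats[f"var_{d}"] = prices[t - 24 * d] - prices[t - 24 * (d + 1)]
    feats["weekday"] = timestamp.weekday()  # 0 = Monday ... 6 = Sunday (assumption)
    feats["hour"] = timestamp.hour + 1      # 1 ... 24
    return feats


# Synthetic 8-day series in which the price equals the hour index
prices = [float(i) for i in range(24 * 8)]
features = derived_features(prices, t=170, timestamp=datetime(2022, 1, 8, 2))
```

Every feature uses only prices recorded before the target day, so none of them leaks the value that is being forecast.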
4.2 Training and Evaluation of the Ensemble ML Algorithms
Five ML algorithms are trained: Random Forest (RF) from bagging ensemble machine learning algorithms category, Light Gradient Boosting Regressor (LGBR), Histogram-Based Gradient Boosting Regressor (HGBR), eXtreme Gradient Boosting (XBR) from boosting category and voting regressors (VR) from voting category.
4.2.1 Setting the Training Interval
The training process reveals that the results depend on the training interval. Thus, for periods with small fluctuations a longer training interval is suitable for ML training, while for periods with higher fluctuations it is recommended to shorten the training interval. Therefore, the training interval is adapted considering the monthly standard deviation of the price. The greater the deviation, the shorter the training interval. As it can be noticed from Fig. 11, the deviations are higher during 2021–2022. This is also evident from the basic statistics performed at the yearly level and shown in Appendix A (variable El_price_DAM).
Based on the results obtained during the training process, the following rules can be applied to the training interval (\(T\)):
where \({\sigma {\text{ElP}}}^{m-1}\) represents the previous month standard deviation of electricity price. The ML models are trained and tested on the pair (\({X}^{h},{y}^{h}\)) and provide their output \(\widehat{{y}_{k}^{h}}\) that is evaluated according to Eqs. (18–20) and used in the next step to train a regressor model and obtain the final prediction of electricity price.
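The rule for the training interval can be expressed as a simple threshold lookup. The thresholds and interval lengths below are placeholders chosen only for illustration and are not the values of Eq. (15); only the direction of the rule (the greater the deviation, the shorter the interval) is taken from the text.

```python
def training_interval_days(prev_month_std, rules=((100.0, 365), (300.0, 180)), shortest=60):
    """Return the length of the training interval T in days.

    prev_month_std: standard deviation of the electricity price in the previous month.
    rules: (threshold, days) pairs in ascending threshold order; hypothetical values.
    shortest: interval used when the deviation exceeds every threshold; hypothetical value.
    """
    for threshold, days in rules:
        if prev_month_std < threshold:
            return days
    return shortest


# Deviations of the order reported for 2020 (85) and 2022 (707)
calm = training_interval_days(85.0)
turbulent = training_interval_days(707.0)
```

In practice the thresholds would be tuned on the results of the training process, as described above.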
4.2.2 Weighted Forecast of Electricity Price
The estimations provided by the ML algorithms are used as input for a regressor model to obtain a weighted forecast that improves on the accuracy of the individual models. The final forecast of the hourly electricity prices for the next day (\({\text{hf}}=\overline{1,24}\)) is obtained using Eq. (16) by adjusting the individual estimations of the ML models (\(\widehat{{y}_{k}^{{\text{hf}}}}\)) with a set of weights (\({\theta }_{k}\)):
The weights are calculated with a regressor model (e.g., using linear regression or gradient descent) by minimizing the difference between the actual electricity prices and their estimation during the training interval (\(T\)). Therefore, the regressor model minimizes the sum of squared errors using Eq. (17). Figure 24 (Appendix C) represents the steps of the proposed methodology to obtain the forecast of the hourly electricity prices.
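The gradient descent variant of the weighting step can be sketched as follows. The snippet assumes a weighted sum without an intercept, starts from equal weights (the simple average) and rescales the data so that a fixed step size is safe; these choices, the function names and the sample values are illustrative only and are not taken from the paper.

```python
def fit_weights(preds, actual, steps=5000):
    """Estimate the weights theta_k by gradient descent on the mean squared error.

    preds: one row per hour, each row holding the K individual model estimates.
    actual: recorded prices for the same hours.
    """
    n, k = len(preds), len(preds[0])
    # Rescale so that all values lie in [-1, 1]; the weights are unaffected by this scaling
    scale = max(max(abs(v) for row in preds for v in row), max(abs(v) for v in actual)) or 1.0
    P = [[v / scale for v in row] for row in preds]
    y = [v / scale for v in actual]
    theta = [1.0 / k] * k   # start from the simple average
    lr = 1.0 / (2 * k)      # conservative step size for inputs scaled to [-1, 1]
    for _ in range(steps):
        residuals = [sum(t * p for t, p in zip(theta, row)) - yi for row, yi in zip(P, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, P)) for j in range(k)]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def weighted_forecast(preds, theta):
    """Combine the individual estimates with the weights theta."""
    return [sum(t * p for t, p in zip(theta, row)) for row in preds]


def sse(forecast, actual):
    """Sum of squared errors between a forecast and the recorded prices."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual))


# Hypothetical estimates of two models for four hours, and the recorded prices
preds = [[120.0, 90.0], [230.0, 185.0], [370.0, 265.0], [470.0, 365.0]]
actual = [100.0, 200.0, 300.0, 400.0]

theta = fit_weights(preds, actual)
sse_weighted = sse(weighted_forecast(preds, theta), actual)
sse_average = sse([sum(row) / len(row) for row in preds], actual)
```

Here the errors are compared on the same hours used for fitting. In the methodology described above, the weights are fitted on the training interval and then applied to the estimations for the next day.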
4.2.3 Assessing the Accuracy of the Models
To evaluate the accuracy of the ML models, the following metrics are calculated for the training and testing process: root-mean squared error (RMSE), coefficient of determination (\({R}^{2}\)), mean absolute percentage error (MAPE) and mean absolute error (MAE):
For the weighted forecast, the same equations are used by replacing \(\widehat{{y}_{k}^{{{h}}}}\) with \(\widehat{{{\text{ElP}}}^{{{h}}}}\).
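The four metrics can be computed with the standard textbook definitions, sketched below. MAPE is expressed as a percentage and assumes that no recorded price is zero; the sample series is hypothetical.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def r2(actual, forecast):
    """Coefficient of determination."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical recorded prices and a forecast that is off by 10 RON/MWh every hour
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
scores = {
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
    "R2": r2(actual, forecast),
}
```

Because MAPE divides by the recorded price, the same absolute error weighs more in low-price hours, which is worth keeping in mind when comparing years with very different price levels.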
5 Results and Assessment
The training interval varies according to Eq. (15) and the forecasting horizon is 7 days although the focus is on the day-ahead forecast (or the electricity prices for the next 24 h) that is required for DAM. The comparison is provided in Tables 2 and 3 between the proposed method (weighted forecast), the average results of the five ML algorithms and the baseline approach with fixed training interval and no feature engineering. The results for the baseline approach are obtained by calculating the yearly average of the estimations of the five ML algorithms without feature engineering.
Several graphical results are extracted and showcased in Figs. 12, 13, 14, 15 and 16. Results are grouped by several consecutive days in each year (2019–2022). The forecast is performed for the next 7 days, but the results are reliable only for the first days: the more distant the time horizon, the greater the prediction errors. For 2019–2021, the forecast is reliable for the next 4–5 consecutive days, whereas for 2022, only the electricity prices for the next 2 consecutive days are accurately predicted. P1–P5 represent the estimations (\(\widehat{{y}_{k}^{{{h}}}}\)) of the five ML algorithms described in Sect. 4.2; El_price_DAM (orange) represents the actual electricity price (\({y}^{{{h}}}\)); El_price_DAM_F (blue) represents the average of the estimations obtained by the five ML algorithms (average of the \(\widehat{{y}_{k}^{{{h}}}}\) values). For simulations performed for 2019 and 2020, the results are showcased in Figs. 12 and 13.
For simulations performed for 2021, the results are showcased in Fig. 14.
For simulations performed for 2022, the results are showcased in Figs. 15 and 16.
For stable years such as 2019 and 2020, the forecast was more accurate, the model providing very good results. Additionally, for more disruptive years (2021 and 2022), the forecast provided reliable results by adjusting the training interval and extracting features that enhance the proposed model.
The weighted forecast (\(\widehat{{{\text{ElP}}}^{{{h}}}}\)) is obtained using two regressors (linear regression and gradient descent) that calculate the weights (\({\theta }_{k}\)) and adjust the predictions obtained with the five ML algorithms. El_price_DAM_PF represents the weighted forecast (\(\widehat{{{\text{ElP}}}^{{{h}}}}\)) that is obtained either by linear regression or gradient descent (as in Figs. 17, 18, 19). The results are provided for days in 2019 and 2022, showing that the proposed method is robust for both calmer and more turbulent intervals. Our approach relies on adjusting training intervals based on the monthly standard deviation, multiple fundamental features from various open data sources, feature engineering, several standout ML algorithms and combining the individual estimations of the ML algorithms with a set of weights. The regressor model (either linear regression or gradient descent) minimizes the sum of squared errors. For simulations performed for 2019 (known as a stable year from the price fluctuations point of view) using the proposed methodology, the results are showcased in Figs. 17 and 18.
For simulations performed for 2022 (known as a disruptive year from the price fluctuations point of view) using the proposed methodology, the results are showcased in Fig. 19a, b. Thus, based on the results, the proposed EPF approach is able to cope with random events such as the COVID-19 pandemic, lockdowns and conflict in Ukraine.
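The weighting step can be sketched as follows, under stated assumptions: the weights \({\theta }_{k}\) are fitted without an intercept, the closed-form variant uses ordinary least squares, and the iterative variant uses plain batch gradient descent on the mean squared error. The function names, the learning rate, the number of iterations and the synthetic data are hypothetical; the paper's actual settings are not reproduced here.

```python
import numpy as np


def fit_weights_lstsq(P, y):
    """Least-squares weights theta minimizing ||P @ theta - y||^2.

    P has shape (n_hours, n_models); y has shape (n_hours,).
    """
    theta, *_ = np.linalg.lstsq(np.asarray(P, float), np.asarray(y, float), rcond=None)
    return theta


def fit_weights_gd(P, y, lr=0.05, n_iter=5000):
    """Batch gradient descent on the mean squared error, no intercept."""
    P = np.asarray(P, dtype=float)
    y = np.asarray(y, dtype=float)
    scale = np.abs(P).max()           # rescale so a fixed step size is stable
    Ps, ys = P / scale, y / scale     # same scale on both sides keeps theta unchanged
    theta = np.full(P.shape[1], 1.0 / P.shape[1])  # start from the simple average
    for _ in range(n_iter):
        grad = 2.0 / y.size * Ps.T @ (Ps @ theta - ys)
        theta -= lr * grad
    return theta


def sse(P, y, theta):
    """Sum of squared errors of the weighted forecast."""
    return float(np.sum((P @ theta - y) ** 2))


# Synthetic illustration only: five hypothetical models with different noise levels.
rng = np.random.default_rng(1)
y = 250.0 + 60.0 * np.sin(2 * np.pi * np.arange(240) / 24)
P = np.column_stack([y + rng.normal(0.0, s, y.size) for s in (5, 8, 12, 20, 30)])

theta_avg = np.full(5, 0.2)
theta_ls = fit_weights_lstsq(P, y)
theta_gd = fit_weights_gd(P, y)
print(sse(P, y, theta_avg), sse(P, y, theta_ls), sse(P, y, theta_gd))
```

Starting gradient descent from equal weights means that the iterative solution begins at the simple average (El_price_DAM_F) and, with a sufficiently small step, can only reduce the squared error on the fitting data from there.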
6 Discussion
In this paper, we investigated a newly built data set with numerous variables that spans from January 2019 to August 2022. It entirely covers the interval before and after the COVID-19 pandemic and the emergence of the conflict in the Black Sea region. For interpreting the results, we grouped them into two types of years: the more stable 2019 and 2020, and the more disruptive 2021 and 2022 from the price fluctuations point of view. The numerical results are presented in Tables 2 and 3. Numerical results indicate that on average MAE improved by 48% and RMSE improved by 44% over the entire interval compared to the baseline model (without feature engineering).
When the output of the ML algorithms is weighted using the proposed meta-model, MAE further improved by 2.3% in 2020 and 5.14% in 2022. Lower errors are recorded in stable years like 2019 and 2020 (MAE = 6.71, RMSE = 14.67) compared to 2021 and 2022 (MAE = 9.45, RMSE = 20.64). MAPE and the coefficient of determination follow similar trends. The latter significantly improved from 0.73 to 0.97 in 2019, from 0.75 to 0.98 in 2020, from 0.68 to 0.95 in 2021 and from 0.65 to 0.93 in 2022.
When the results of the ML algorithms were averaged, MAPE decreased compared to the baseline model from 12.98 to 4.06 in 2019, from 11.76 to 3.89 in 2020, from 26.94 to 5.05 in 2021 and from 29.18 to 6.43 in 2022. It further decreased by around 1 unit when the proposed model was applied. The lowest error values were recorded in the more stable years: 2019 and 2020. However, good results were also obtained for 2021 and 2022, when the price fluctuations were much higher.
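For reference, the four evaluation metrics and the relative improvement over a baseline can be computed as in the sketch below. It uses the standard textbook definitions and assumes that MAPE is expressed in percent and that no actual price equals zero; the function names and the toy numbers are illustrative and are not results from the paper.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for one forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # MAPE is undefined when an actual value is zero (assumed not to occur here).
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),
        "R2": float(1.0 - ss_res / ss_tot),
    }


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric with respect to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy values for illustration only.
metrics = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(metrics)
print(relative_improvement(10.0, 5.0))
```

Because MAE and RMSE are expressed in the units of the price while MAPE and \({R}^{2}\) are scale-free, the latter two are more convenient for comparing years with very different price levels.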
To graphically visualize the prediction results, they were depicted in several figures showcased in the previous section. Several consecutive days (in March and April) were chosen to show the hourly price variation and the prediction performed with the individual ML algorithms and with the proposed model in which their results are combined. One can notice that the forecasting lines follow the price for both stable and disruptive intervals, indicating a good predictive capacity of the proposed model. The linear regressor meta-model provides better results. Additionally, the average output of the ML models (blue line) proved to be a good estimation.
7 Conclusion
To cope with black swans, a robust method for EPF is required, in which more fundamental features and feature engineering should be considered. The training interval requires adjustment based on the price variations to obtain the best performance: the more the price varies, the shorter the training interval. Aggregated features and features generated by shifting the interval from 24 to 168 h are important, especially for the last 2 years. Finally, two approaches are taken into account to combine the results of the five standout ML algorithms (random forest, light gradient boosting, histogram-based gradient boosting, extreme gradient boosting and voting regressors): the results are weighted using two regressors (linear regression and gradient descent). The algorithms learn from different features depending on the forecasting interval.
From the exploratory data analysis, we noticed that the most important features in 2019 lost their importance and other features became significant in the following years marked by black swans or random events, such as COVID-19 and the conflict in Ukraine. Some features lost importance (e.g., total consumption is less important in 2021 and 2022) and others emerged from the uncertainties (e.g., gas price, price index, interest rate, emissions allowance, water level) brought by the random events or extreme climate conditions. In simulations, we also added the electricity prices and quantities traded on other DAMs from neighboring countries (Bulgaria, Serbia, Hungary). Moreover, the capacity flows on the interconnection with Hungary were added to the model as fundamental features, but these did not improve the performance of the results and they were removed from the input dataset.
From investigating the evolution of the variables between January 2019 and August 2022, we found that the shape of the electricity price showed sharper humps in the last 2 years, proving that the most expensive generators are used at peaks. RES producers and even other producers with a more predictable output tend to trade more on DAM. The share of electricity traded on DAM in the total consumption increased to 46% in 2022 (in the interval January–August). The humps of the electricity price shape indicate that the uniform price mechanism may not be appropriate for DAM in the context of severe drought, higher gas and oil prices, higher price index and increased consumption level. The most expensive price for each MWh will be paid to all generators regardless of their costs, but the burden on the electricity suppliers and final consumers is tremendous. Alternative price mechanisms should be investigated and applied for DAM, such as the more popular pay-as-bid, or the less speculative generalized second-price or Vickrey–Clarke–Groves pricing mechanisms.
In Romania, governmental measures to protect consumers cap the price of electricity for households at 0.68 RON/kWh if the consumption is less than or equal to 100 kWh, and at 0.8 RON/kWh if the consumption is between 101 and 255 kWh. For those with higher consumption, the market price, which can be five times higher, will be applied. Some offers are as high as 4 RON/kWh. However, the budgetary impact of this measure is 1 billion RON/month up to August 2023. One of the drawbacks of this measure is that there is no incentive for consumers to reduce their actual consumption, since the thresholds are set for 2021. One suggestion is to consider the current consumption and better stimulate consumption reduction.
To reduce the burden on the governmental budget, alternatives should be further analyzed, such as more incentives for PV panel installations for households and small and medium-sized businesses. Since December 2021, the greenhouse program has been stopped for verification of files due to the lack of administrative capacity. This capacity should be increased, and the process sped up. Alternatively, the flexibility of consumers to adjust or reduce consumption should be further investigated. What if the consumers’ flexibility is cheaper than the financial effort of capping the price? Therefore, to combat spikes, as future work, the flexibility of consumers and the potential of storage facilities will be investigated.
One of the limitations is related to data extraction, as in the current paper the data were scraped from various sources. When the page source or libraries/drivers change over time, the scraper has to be updated to continue extracting data. Therefore, an API or a download option could provide more reliability in terms of data extraction. Another limitation is that the variables in the current paper are selected for Romania; for other countries, more or fewer variables could be considered, thus the variables may depend on the region.
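The two ingredients mentioned above, a training interval that shrinks as volatility grows and features obtained by shifting the price series by 24 to 168 h, can be sketched as follows. The thresholds, the window lengths and the 24-hour step between shifts are hypothetical values chosen only for illustration; the rule actually used in the paper is not reproduced here.

```python
import numpy as np


def training_window_days(monthly_std, low=50.0, high=150.0,
                         long_days=90, mid_days=60, short_days=30):
    """Rule-based choice of the training interval (hypothetical thresholds).

    The higher the monthly standard deviation of the price, the shorter
    the training interval.
    """
    if monthly_std < low:
        return long_days
    if monthly_std < high:
        return mid_days
    return short_days


def lagged_features(prices, lags=(24, 48, 72, 96, 120, 144, 168)):
    """Build a feature matrix from hourly prices shifted by the given lags.

    Row i of X holds the prices observed `lag` hours before target y[i].
    The first max(lags) hours are dropped because they lack a full history.
    """
    prices = np.asarray(prices, dtype=float)
    max_lag = max(lags)
    X = np.column_stack(
        [prices[max_lag - lag: prices.size - lag] for lag in lags]
    )
    y = prices[max_lag:]
    return X, y


# Illustration on a synthetic ramp so the shifts are easy to verify by eye.
X, y = lagged_features(np.arange(200))
print(X.shape, y.shape)
print(training_window_days(20.0), training_window_days(400.0))
```

With lags that are multiples of 24 h, each feature is the price at the same hour on one of the previous seven days, which is one plausible reading of the shifted features described in the text.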
Availability of data and materials
Data will be made available on request.
Notes
Abbreviations
- \(h\): Time of the historical records used for training and testing [hours]
- \({\text{hf}}\): Time of the forecasted values [hours]
- \(d, m\): Day, month corresponding to the training interval
- \(T\): Time interval for the training and testing of the ML algorithms [hours]
- \({\text{RMSE}}, {R}^{2},\mathrm{ MAPE},\mathrm{ MAE}\): Evaluation metrics of the ML models
- \({{\text{ElQ}}}^{{{h}}}\): Traded quantity of electricity on DAM [MWh]
- \({{\text{ElP}}}^{{{h}}}\): Hourly electricity price on DAM [RON/MWh]
- \({{\text{GasP}}}^{{\text{d}}}\): Daily gas price on DAM [RON/MWh]
- \({{\text{GasQ}}}^{{\text{d}}}\): Traded quantity of gas on DAM [MWh]
- \({{\text{InfRO}}}^{{\text{m}}}\): Monthly inflation for Romania
- \({{\text{InfEU}}}^{{\text{m}}}\): Monthly inflation for EU
- \({{\text{Robor}}}^{{\text{m}}}\): ROBOR index
- \({{\text{OilP}}}^{{\text{d}}}\): Daily oil price [$/barrel]
- \({{\text{LevelTM}}}^{{\text{d}}}\), \({{\text{LevelBR}}}^{{\text{d}}},\,{{\text{LevelTL}}}^{{\text{d}}}\): Danube water levels in 3 regions (Turnu Magurele, Braila, Tulcea) [cm]
- \({{\text{ElC}}}^{{{h}}}\): Hourly electricity consumption [MW]
- \({{\text{ElG}}}^{{{h}}}\): Hourly electricity generation [MW]
- \({{\text{CG}}}^{{{h}}}\): Hourly coal generation [MW]
- \({{\text{HG}}}^{{{h}}}\): Hourly hydro generation [MW]
- \({{\text{OGG}}}^{{{h}}}\): Hourly oil and gas generation [MW]
- \({{\text{NG}}}^{{{h}}}\): Hourly nuclear generation [MW]
- \({{\text{WG}}}^{{{h}}}\): Hourly wind generation [MW]
- \({{\text{PVG}}}^{{{h}}}\): Hourly photovoltaic generation [MW]
- \({{\text{BG}}}^{{{h}}}\): Hourly biomass generation [MW]
- \({{\text{Ex}}}^{{{h}}}\): Hourly exchange [MW]
- \({{\text{EUETS}}}^{{\text{d}}}\): Daily price per tonne of CO2 [EUR/tCO2]
- \({W}^{{\text{d}}}\): Weekday, \({W}^{{\text{d}}}=\overline{0,6}\)
- \({\mu {\text{ElP}}}^{{\text{d}}-1}\): Mean price of the previous day [RON/MWh]
- \({{\text{ElPmax}}}^{{\text{d}}-1}, {{\text{ElPmin}}}^{{\text{d}}-1}\): Maximum and minimum price of the previous day [RON/MWh]
- \({\sigma {\text{ElP}}}^{{\text{d}}-1}\), \({{\sigma }^{2}{\text{ElP}}}^{{\text{d}}-1}\): Standard deviation and variance of the price of the previous day
- \(\widetilde{{{\text{ElP}}}^{{\text{d}}-1}}\): Median of the price of the previous day [RON/MWh]
- \({{\text{ElPrange}}}^{{\text{d}}-1}\): Range price of the previous day [RON/MWh]
- \({{\text{ElPratio}}}^{{\text{d}}-1}\): Ratio price index of the previous day
- \({\mu {\text{ElP}}}^{{{h}}3}, {\mu {\text{ElP}}}^{{{h}}7}\): Averages of the previous hourly prices for 3 and 7 consecutive days
- \({\Delta {\text{ElP}}}^{{{h}}1}, {\Delta {\text{ElP}}}^{{{h}}2}, {\Delta {\text{ElP}}}^{{{h}}3}\): Hourly variation of the prices for 3 consecutive days
- \({X}^{{{h}}}\): Input variable of the ML algorithms [array]
- \({y}^{{{h}}}\): Actual target variable of the ML algorithms [RON/MWh]
- \(\widehat{{y}_{k}^{{{h}}}}, \,k\in \{{\text{RF}},\,\mathrm{ LGBR},\mathrm{ HGBR},\,\mathrm{ XGB},\,\mathrm{ VR}\}\): Output of the ML models (forecasted values of electricity price)
- \(\widehat{{{\text{ElP}}}^{{{h}}}}\): Forecast of the hourly electricity price [RON/MWh]
- \({\theta }_{k}\): Weights used to adjust the output of the ML models to obtain \(\widehat{{{\text{ElP}}}^{{{h}}}}\)
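As an illustration of the aggregated previous-day features listed above (mean, maximum, minimum, standard deviation, variance, median and range), a minimal sketch is given below. It assumes that the hourly price series starts at the first hour of a day and contains 24 values per day, and it uses the population standard deviation and variance; the function name and the dictionary keys are hypothetical. The ratio price index is omitted because its definition is not given in this section.

```python
import numpy as np


def previous_day_stats(hourly_prices, hours_per_day=24):
    """Daily price statistics shifted by one day and repeated for every hour.

    For each hour of day d, the returned arrays hold the statistics of
    day d-1. The first day has no predecessor and is filled with NaN.
    """
    p = np.asarray(hourly_prices, dtype=float)
    n_days = p.size // hours_per_day
    daily = p[: n_days * hours_per_day].reshape(n_days, hours_per_day)

    stats = {
        "mean": daily.mean(axis=1),
        "max": daily.max(axis=1),
        "min": daily.min(axis=1),
        "std": daily.std(axis=1),   # population standard deviation (assumed)
        "var": daily.var(axis=1),   # population variance (assumed)
        "median": np.median(daily, axis=1),
    }
    stats["range"] = stats["max"] - stats["min"]

    features = {}
    for name, values in stats.items():
        shifted = np.concatenate(([np.nan], values[:-1]))  # day d sees day d-1
        features[name] = np.repeat(shifted, hours_per_day)
    return features


# Two synthetic days: a ramp from 0 to 23, followed by a flat day at 100.
prices = np.concatenate((np.arange(24.0), np.full(24, 100.0)))
feats = previous_day_stats(prices)
print(feats["mean"][24], feats["range"][24], feats["median"][24])
```

Shifting the statistics by one day ensures that only information available before the forecast day enters the input, which avoids leaking the target into the features.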
Acknowledgements
This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS-UEFISCDI, project number PN-III-P4-PCE-2021-0334, within PNCDI III.
Contributions
AB: conceptualization, methodology, validation, formal analysis, investigation, resources, data curation, writing–original draft, writing–review and editing, visualization, and supervision. SVO: conceptualization, validation, formal analysis, investigation, writing–original draft, writing–review and editing, visualization, and project administration.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bâra, A., Oprea, SV. Predicting Day-Ahead Electricity Market Prices through the Integration of Macroeconomic Factors and Machine Learning Techniques. Int J Comput Intell Syst 17, 10 (2024). https://doi.org/10.1007/s44196-023-00387-3
DOI: https://doi.org/10.1007/s44196-023-00387-3