Predictive models development using gradient boosting based methods for solar power plants

doi:10.1016/j.jocs.2023.101958

Journal of Computational Science

Volume 67, March 2023, 101958

https://doi.org/10.1016/j.jocs.2023.101958

Highlights

• Solar power prediction studies and methods have been reviewed briefly.
• Gradient boosting machine methods: XGBoost, LightGBM, CatBoost are briefly explained.
• Problem formulation with dataset and proposed methods’ implementations are elucidated.
• Models’ results have been given and compared.

Abstract

Predicting the power to be generated by solar power plants in a smart grid, microgrid or nanogrid with high accuracy and speed brings many advantages to the decisions to be made for these systems. Power generation forecasts, which are closely tied to the dynamic energy management of these grids, influence many factors, from the amount of energy to be stored to the cost of energy. In this study, three gradient boosting machine learning-based methods for power prediction are developed and analyzed. Innovative and fast predictive models are designed with the XGBoost, LightGBM and CatBoost algorithms. These models, which are trained on a set of several meteorological features, offer considerable benefits such as high accuracy and fast learning. Further, the performances of these models are compared and their applicability is discussed.

Introduction

To leave a more livable environment with reduced carbon emissions, energy should be obtained from renewable sources, and increasing the amount of energy produced by such sources will contribute significantly to reducing the carbon footprint. Converting the clean and inexhaustible energy of the sun into usable electrical energy has been studied for decades. Meanwhile, the maturity reached in artificial intelligence (AI), together with algorithms of proven performance, motivates integrating this field into energy systems. AI methods contribute significantly to controlling a system, to deciding on actions and future strategies for it, and to increasing its efficiency. For this purpose, many machine learning and deep learning algorithms have been integrated into the use of renewable energy resources. As in this study, machine learning (ML) algorithms have been used for solar power estimation with different training sets. In [1], a two-stage estimation is carried out: irradiance is first estimated from numerical data, and solar power generation is then predicted from the estimated irradiance. Moreover, some studies combine more than one algorithm [2], while others combine modified and enhanced versions of the same algorithm [3]. In addition to machine learning methods, deep learning approaches have also been introduced for solar power forecasting [4], [5]. Among the studies [6], [7], [8], artificial neural network (ANN) models are trained for prediction in [6], whereas LSTM with memory cells or its derivatives is preferred in [7], [8]. Another issue that determines the choice of method in these studies is whether the targeted prediction is short-term or long-term.
Day-ahead forecasting, for example, has yielded successful results [9]. Further, the selection of the right data for training predictive models has a significant impact on their performance. Some studies use geographic features for training, and satellite images are very popular training data in this context [10]. Convolutional neural networks (CNN), a powerful deep learning method, stand out in this regard. Both LSTM and CNN are frequently combined with deterministic, heuristic, or meta-heuristic optimization methods that enhance learning [11]. However, a common problem with these approaches is the need for high computational power and long training times. On the other hand, meteorological data have a considerable influence on the level of solar energy produced; in particular, radiation is directly related to solar power generation. Accordingly, some studies have focused on estimating solar radiation rather than power and have achieved very successful results [12], [13]. When all these prediction studies on solar power plants are examined, two difficulties emerge. The first is selecting and preprocessing the dataset used to train the models. The second is the long training time, even when high accuracy is demonstrated. To overcome these difficulties, using meteorological data with more than one feature as the training set, rather than only radiation data or satellite images, comes to the fore. In addition, innovative regression algorithms based on the gradient boosting machine (GBM) have been investigated to shorten the training time. The XGBoost algorithm has been shown to yield very promising results in very short-term energy forecasts [14].
Similarly, the performance reported in a load prediction study using this method is impressive [15]. In [16], XGBoost and LightGBM are used together with CNN for load prediction, and very successful results are achieved. In [17], LightGBM used in conjunction with CNN proves a useful solution for forecasting the power generation of wind turbines. In a similar study addressing the same prediction problem, two popular deep learning techniques, CNN and LSTM, are applied together with LightGBM, and a successful model is presented [18]. CatBoost, a relatively new GBM-based algorithm, is another method that shows significant success in prediction studies. It has been used effectively to solve the long-term load prediction problem [19].

Another benefit of fast and highly accurate prediction models, as targeted in this study, emerges in machine learning-based control applications, where such models can augment the control methods. This field, with the innovations it promises, is explained in detail in [20] under the name of data-driven control. In parallel with this field, in which system dynamics are learned from data and ML models, reinforcement learning (RL), which is based on learning to control a system through a model environment, is becoming increasingly popular. In [21], temporal difference-based and deep neural network-based RL algorithms are described in detail. How prediction models can improve novel control methods can be exemplified by control problems in smart grids, microgrids, or nanogrids, the latter referring to a building’s grid [22]. A control agent based on ML needs a well-designed environment model to learn to control the system. Good design here means determining the control actions appropriately, defining the finite or infinite set of possible states correctly, and determining the rewards that the agent receives as a result of its control decisions. In energy management or control problems for grids that have solar power plants or panels, highly accurate prediction of the power to be generated by the solar plant is vital, because the ML (RL) agent uses it to learn to control the system through the environment model [23]. Briefly, accurate prediction leads to correct rewards and to an agent that learns to control better [24]. In addition to RL-based methods that try to control the system directly, data-driven control methods that first learn the system dynamics and then control the system are also promising [25].
In this context, the Sparse Identification of Nonlinear Dynamics with Control (SINDYc) and SINDY-PI algorithms [26], [27], which give very successful results in the control of nonlinear systems, can be enhanced with GBM-based prediction models. When all these studies and the proposed solutions are examined in terms of solar power plants and forecasting models, the contribution of a predictive model that produces highly accurate and fast results is considerable. This study creates and analyzes solar power prediction models using relatively novel GBM-based algorithms (LightGBM, XGBoost, CatBoost). The performance outputs of these models are presented and compared, and their usability is examined. In the next section, the methodology of the models is described. In the third section, dataset analysis and problem formulation are given. In the fourth section, the performance outputs are given.

Section snippets

Gradient Boosting Machine

Ensemble learning (EL) algorithms come to the fore in many AI or prediction-based academic publications, competitions, and prediction projects aiming for high accuracy. The concept is based on creating multiple models and combining their results into the final output, rather than creating a single model for a forecasting problem. It is divided into two families: bagging and boosting. In bagging, more than one model is trained in parallel using the same dataset. In
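The boosting idea can be illustrated with a small, self-contained sketch. This is not the authors' implementation and not how XGBoost, LightGBM, or CatBoost work internally; it is a minimal gradient boosting loop for squared error, assuming one-feature decision stumps as weak learners. The data (a made-up daily power curve over the hours of a day) and all function names are hypothetical and chosen only for illustration.

```python
import math


def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimises squared error."""
    best = None
    # Every unique value except the largest is a candidate threshold,
    # so both sides of the split are guaranteed to be non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps one after another, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict_boosting(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: normalised power is zero at night and peaks at noon.
hours = list(range(24))
power = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in hours]

for rounds in (0, 1, 50):
    model = fit_boosting(hours, power, n_rounds=rounds)
    print(rounds, "rounds -> training MSE", round(mse(model, hours, power), 5))
```

Each round corrects the errors left by the previous rounds, which is what distinguishes boosting from the parallel training used in bagging. The learning rate shrinks every correction, so more rounds are needed but each step is less likely to overfit.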

Dataset analysis and problem formulation

In solar power forecasting studies, the dataset used to train the model is as important as the method used to create it. The choice of machine learning or deep learning method determines the kind of dataset that must be obtained. For example, if a CNN-based model is to be created, training with a dataset consisting of satellite images may give good results, while a prediction model using LSTM requires training with time-series data so that the memory cells can learn more easily. The

Performance outputs

Once the dataset has been pre-processed and made ready for training, it is suitable for use with the proposed algorithms. The models are then created using the Python language and the open-source library for each method. The open-source software library prepared by the Distributed (Deep) Machine Learning Community (DMLC) group is used for XGBoost [32]. For LightGBM, the open-source software library provided by Microsoft is used [36]. Finally, the Python library prepared by Yandex
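As a rough illustration of how the three Python libraries named above are typically called, the following sketch fits each available regressor on synthetic data and records its test error and training time. It is not the authors' code: the data, the feature count, the hyperparameter values, and the helper names `available_regressors` and `compare_on_synthetic_data` are all assumptions made for this example. Libraries that cannot be loaded are skipped.

```python
import time


def available_regressors(n_rounds=50, learning_rate=0.1):
    """Return {name: unfitted regressor} for each GBM library that can be loaded."""
    models = {}
    # A broad except is used on purpose: a library may be missing, may lack
    # its scikit-learn dependency, or may fail to load its native backend.
    try:
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(
            n_estimators=n_rounds, learning_rate=learning_rate)
    except Exception:
        pass
    try:
        from lightgbm import LGBMRegressor
        models["LightGBM"] = LGBMRegressor(
            n_estimators=n_rounds, learning_rate=learning_rate, verbose=-1)
    except Exception:
        pass
    try:
        from catboost import CatBoostRegressor
        models["CatBoost"] = CatBoostRegressor(
            iterations=n_rounds, learning_rate=learning_rate,
            verbose=0, allow_writing_files=False)
    except Exception:
        pass
    return models


def compare_on_synthetic_data(seed=0):
    """Fit every available model on made-up data; return {name: (rmse, seconds)}."""
    models = available_regressors()
    if not models:
        return {}
    import numpy as np  # a dependency of all three libraries

    # Hypothetical stand-in for meteorological features and measured power.
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(300, 3))
    y = 5.0 * X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, size=300)
    X_train, X_test = X[:240], X[240:]
    y_train, y_test = y[:240], y[240:]

    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        predictions = model.predict(X_test)
        rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))
        results[name] = (rmse, elapsed)
    return results


for name, (rmse, seconds) in compare_on_synthetic_data().items():
    print(f"{name}: RMSE={rmse:.3f}, training time={seconds:.3f}s")
```

All three libraries follow the same `fit`/`predict` pattern, so a comparison like this needs no library-specific data handling. Note that CatBoost names the number of boosting rounds `iterations` rather than `n_estimators`.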

Conclusion

As the importance of renewable energy sources is increasingly felt around the world, using the sun’s inexhaustible energy efficiently is vital for the environment that will be left to future generations. To increase the efficiency of solar power generation, it is essential to plan accurately for the future and to anticipate what may happen so that an appropriate plan can be prepared. The idea of integrating the powerful and innovative prediction methods offered by machine learning with solar power

CRediT authorship contribution statement

Necati Aksoy: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Istemihan Genc: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

All authors approved the version of the manuscript to be published.

Necati Aksoy earned his B.Sc. in Electrical and Electronics Engineering. He then received his Master’s degree from Temple University, USA. He is a Ph.D. candidate at Istanbul Technical University and a member of the Smart Grid lab at ITU. His research interests include reinforcement learning, microgrids, smart grids, machine learning applications in power engineering and related fields.

References (37)

Mishra M. et al., Deep learning and wavelet transform integrated approach for short-term solar PV power prediction, Measurement (2020)
Gao Y. et al., Interpretable deep learning models for hourly solar radiation prediction based on graph neural network and attention, Appl. Energy (2022)
İzgi E. et al., Short–mid-term solar power prediction by using artificial neural networks, Sol. Energy (2012)
Ghimire S. et al., Efficient daily solar radiation prediction with deep learning 4-phase convolutional neural network, dual stage stacked regression and support vector machine CNN-REGST hybrid model, Sustain. Mater. Technol. (2022)
Deng X. et al., Bagging–XGBoost algorithm based extreme weather identification and short-term load forecasting model, Energy Rep. (2022)
Ren J. et al., A CNN-LSTM-LightGBM based short-term wind power prediction method based on attention mechanism, Energy Rep. (2022)
Xiang W. et al., Multi-dimensional data-based medium- and long-term power-load forecasting using double-layer CatBoost, Energy Rep. (2022)
Brunton S.L. et al., Sparse identification of nonlinear dynamics with control (SINDYc), IFAC-PapersOnLine (2016)
Kim H. et al., Probabilistic solar power forecasting based on bivariate conditional solar irradiation distributions, IEEE Trans. Sustain. Energy (2021)
Al-Dahidi S. et al., Ensemble approach of optimized artificial neural networks for solar photovoltaic power prediction, IEEE Access (2019)
Wen S. et al., A hybrid ensemble model for interval prediction of solar power output in ship onboard power systems, IEEE Trans. Sustain. Energy (2021)
Chai M. et al., PV power prediction based on LSTM with adaptive hyperparameter adjustment, IEEE Access (2019)
Tang N. et al., Solar power generation forecasting with a LASSO-based approach, IEEE Internet Things J. (2018)
Zhang Y. et al., Day-ahead power output forecasting for small-scale solar photovoltaic electricity generators, IEEE Trans. Smart Grid (2015)
Cheng L. et al., Short-term solar power prediction learning directly from satellite images with regions of interest, IEEE Trans. Sustain. Energy (2022)
Bae K.Y. et al., Hourly solar irradiance prediction based on support vector machine and its error analysis, IEEE Trans. Power Syst. (2017)
Singla P. et al., An ensemble method to forecast 24-h ahead solar irradiance using wavelet decomposition and BiLSTM deep learning network, Earth Sci. Inform. (2022)
Ma Z. et al., Very short-term renewable energy power prediction using XGBoost optimized by TPE algorithm


Istemihan Genc received the B.Sc. degree in electrical engineering from Istanbul Technical University, the M.Sc. degrees in electrical engineering, systems and control engineering, and systems science and mathematics from Istanbul Technical University, Boğaziçi University, and Washington University, respectively. After receiving the Doctor of Science (D.Sc.) degree in 2001 from Washington University in St. Louis, he joined Istanbul Technical University where he is currently a Professor in the Department of Electrical Engineering. His research interests include power system dynamics and stability, smart grids, optimization and control applications in power engineering.
