Designing a multi-stage multivariate empirical mode decomposition coupled with ant colony optimization and random forest model to forecast monthly solar radiation
Introduction
The quest for increased generation of electricity from renewable sources is a key priority of many nations, including Australia, as a measure to mitigate the drastic impacts of a changing climate. However, during 2014–2015, the electricity generated in Australia from renewable sources declined by 7%, with hydro-electric generation showing the largest decline [1]. This decline was predominantly caused by the recession of dam water levels following a prolonged period of drought. In contrast, solar energy could provide a viable alternative for Australia, which is susceptible to drought events. Australia is one of the countries that receive an extremely high level of annual global solar radiation, indicating that solar energy could sustain a high percentage of Australia’s electricity demand [2]. In addition, solar energy has the least adverse impact on the environment and is hence regarded as one of the cleanest renewable energy sources [3]. In particular, the state of Queensland (also known as the Sunshine State) has a very high rate of incident solar radiation, with the State Government committing to a 50% renewable energy target by the year 2030 [4].
Even though this is a very strong proposition, solar energy is known to be highly variable in nature requiring specific technological implementation and grid management systems [2]. Therefore, effective forecasting tools that can potentially be implemented into smart-grid systems are necessary for efficient supply and demand matching to ensure reliable and sustainable solar power generation. A plethora of studies has attempted to forecast solar radiation via data-driven modeling approaches [5], which can largely be classified into classical and hybrid models. In the classical modeling approach, standalone models were applied to forecast solar radiation including (but not limited to) the widely used artificial neural networks (ANN) [6], [7], [8], [9], support vector machine (SVM) [9], [10], [11], support vector regression (SVR), gradient boosted regression (GBR) and random forest [12]. Additionally, Guermoui, Melgani [13] trialed weighted Gaussian process regression both in a parallel forecasting architecture and a cascade forecasting architecture for solar radiation forecasts, while the adaptive neuro-fuzzy inference system (ANFIS) was proposed by Quej, Almorox [9]. Furthermore, a more advanced and efficient algorithm, namely the extreme learning machine (ELM), has also produced worthy results [14]. Yet, these classical and standalone models might not be able to capture all the deterministic features that exist in the historical data when predicting this important variable i.e., solar radiation.
To augment the forecasting capabilities of data-driven modeling, hybrid models have been developed and explored. Gala, Fernández [12] proposed a weighted linear combination of SVR, GBR and random forest regression (RFR) outputs, whereby the respective weights were derived from each individual model's mean absolute error (MAE) during the training period. Hybrid models are developed with the intention of extracting as many pertinent features as possible from the predictor input data set to optimize model performance. A strategic approach to achieving this is to incorporate a suitable feature selection algorithm. Bouzgou and Gueymard [15] applied a maximum relevance – minimum redundancy (MRMR) filter as a feature selection method with ELM in order to optimize forecasting performance. Similarly, fuzzy logic feature pre-processing with an ANN model also gave commendable forecasts [16]. Recently, Salcedo-Sanz, Deo [17] developed a method integrating Coral Reefs Optimization (CRO) with ELM (CRO-ELM), where the CRO acted as a feature selection function guided by an ELM algorithm. The authors compared the forecasts of their CRO-ELM method with those of the Grouping Genetic Algorithm integrated with the ELM model (GGA-ELM). In a similar manner, hybrids of Multivariate Adaptive Regression Splines (MARS), Multiple Linear Regression (MLR) and Support Vector Regression (SVR) were also studied. The authors found that CRO-ELM performed better than the other models. A major shortcoming of these hybridized models is that they concentrated only on the issue of feature selection, while the other important issue of non-stationarity was ignored.
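The weighted linear combination described above can be sketched as follows. This is a minimal illustration only: the text states that the weights were derived from each model's training-period MAE but does not give the formula, so the inverse-MAE weighting (normalised to sum to one) is an assumption, and all function names and data values are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def inverse_mae_weights(observed, model_predictions):
    """ASSUMED scheme: weight each model by 1/MAE, normalised to sum to 1."""
    inverse = [1.0 / mean_absolute_error(observed, p) for p in model_predictions]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(weights, model_predictions):
    """Weighted linear combination of the individual model forecasts."""
    horizon = len(model_predictions[0])
    return [sum(w * p[t] for w, p in zip(weights, model_predictions))
            for t in range(horizon)]


# Toy training-period data (illustrative values only, not from the paper).
observed = [20.0, 22.0, 19.0, 24.0]
svr_pred = [21.0, 23.0, 20.0, 25.0]   # MAE = 1.0
gbr_pred = [22.0, 24.0, 21.0, 26.0]   # MAE = 2.0
rfr_pred = [20.5, 22.5, 19.5, 24.5]   # MAE = 0.5

predictions = [svr_pred, gbr_pred, rfr_pred]
weights = inverse_mae_weights(observed, predictions)
blended = combine(weights, predictions)
```

Under this assumed scheme the model with the smallest training error receives the largest weight. A model with a training MAE of exactly zero would need special handling, which the sketch omits.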
Owing to the day-to-day variability in irradiance, cloud cover, and other environmental and atmospheric parameters, solar energy is a strongly random process [18], which makes the time series intrinsically stochastic. Accordingly, appropriate multi-resolution data pre-processing tools are necessary to suitably extract the embedded information within the non-stationary historic time series. A commonly used procedure is to decompose the predictor signals using the discrete wavelet transformation (DWT) into respective detailed components and an approximation component. Consequently, many studies have applied the DWT. Royer, Wilhelm [19] combined DWT with ANN to forecast short-term solar radiation, while Deo, Wen [20] combined DWT with SVM, resulting in their DWT-SVM model. Although DWT-based models achieved improved performance in comparison to the classical models, they have inherent drawbacks. Firstly, the DWT is impaired by the decimation effect, whereby half of the wavelet coefficients are recursively lost at each subsequent transformation level [21], [22]. Additionally, the DWT requires the adoption of a pre-selected mother wavelet; otherwise, a time-consuming trial-and-error process is often needed. Moreover, different decomposition levels have been found to generate varying forecasting performances [22].
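The decimation effect mentioned above can be made concrete with a small sketch. The Haar wavelet is used here purely as an assumption for simplicity, since the text does not name a mother wavelet, and the function names are hypothetical. The point of the example is only that the number of coefficients produced halves at every level.

```python
import math


def haar_dwt_level(signal):
    """One level of the decimated Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_dwt(signal, levels):
    """Repeatedly decompose the approximation, collecting details per level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details


# Toy series of 16 samples (illustrative values only).
series = [float(v) for v in range(16)]
approx, details = haar_dwt(series, levels=3)

# Number of detail coefficients at levels 1, 2 and 3.
detail_lengths = [len(d) for d in details]
```

Each level keeps only every second coefficient, so the deeper components are described by progressively fewer samples than the original series.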
An alternative multi-resolution analysis (MRA) tool, the empirical mode decomposition (EMD) developed by Huang, Shen [23], segregates higher frequency input series into lower frequency resolved components. The EMD gained prominence due to its self-adaptability, i.e., it neither requires prescribed frequency bands nor imposes any basis functions [24]. This property of complete data dependence makes EMD advantageous in terms of extracting pertinent features from the predictor time series without any loss of information. In addition, the set of decomposed salient features is a representation of the physical structure of the data, since the EMD temporally decomposes the predictor inputs using the extrema information of the riding waves [25]. Accordingly, EMD integrated with ANN has proved to be a successful model for forecasting solar radiation [24]. In another recent study, Wang, Tian [18] combined EMD with local mean decomposition (LMD) to decompose the non-stationary solar radiation series into simpler components and employed least squares support vector machine (LSSVM) and Volterra models for forecasting. A comparison of these EMD-based algorithms with an autoregressive integrated moving average (ARIMA) method revealed that the EMD-LMD-LSSVM-Volterra model performed better.
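To illustrate how the extrema of a series drive the decomposition, the sketch below performs a single sifting pass of the univariate EMD. It is a simplification under stated assumptions: the envelopes are piecewise linear rather than the cubic splines normally used, only one pass is made (a full EMD repeats sifting until an IMF criterion is met and then continues on the residual), and all names are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through the knots, held flat at both ends.

    ASSUMPTION: linear interpolation stands in for cubic splines.
    """
    env = []
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            for a, b in zip(knots, knots[1:]):
                if a <= t <= b:
                    frac = (t - a) / (b - a)
                    env.append(x[a] + frac * (x[b] - x[a]))
                    break
    return env


def sift_once(x):
    """Subtract the mean of the upper and lower envelopes from the series."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return list(x)  # no oscillation left to extract
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    return [v - 0.5 * (u + l) for v, u, l in zip(x, upper, lower)]


# Toy series: an oscillation with a 10-sample period riding on a linear trend.
n = 200
signal = [math.sin(2 * math.pi * t / 10) + 0.02 * t for t in range(n)]
candidate = sift_once(signal)
```

For this toy series the mean envelope follows the linear trend away from the ends, so the sifted candidate retains the oscillation while the trend is left for the residual.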
Yet, the key issue with EMD and its variants, including the ensemble EMD (EEMD) [26], the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [27], and the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) [28], is that they can only be applied in a univariate setting, i.e., only the significant lags of the solar radiation time series can be used as predictors to forecast future solar radiation values. This is a critical issue since the variability of incident solar radiation depends on many dynamic meteorological and environmental factors that would thereby be left out. Meteorological parameters such as air temperature, sunshine duration, relative humidity and cloud cover are indeed correlated with solar irradiation [29]. Therefore, these parameters need to be appropriately incorporated into the respective models.
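The univariate setting described above, in which only lagged values of the target series serve as predictors, can be sketched as a simple lag matrix. The choice of lags and the function name are hypothetical, and the values are illustrative only.

```python
def lagged_design(series, lags):
    """Build predictor rows from past values of the same series.

    Row for time t holds series[t - k] for each lag k; the target is series[t].
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - k] for k in lags])
        targets.append(series[t])
    return rows, targets


# Toy monthly series (illustrative values only, not from the paper).
monthly = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

# ASSUMPTION: lags 1 and 2 are taken as the "significant" lags.
rows, targets = lagged_design(monthly, lags=(1, 2))
```

Every column of the resulting matrix comes from the solar radiation series itself, which is why covariates such as temperature or humidity cannot enter a model built this way.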
To utilize several predictors and subsequently extract most, if not all, possible relevant predictive features, a new hybridized modeling approach based on multivariate empirical mode decomposition (MEMD) is introduced in this study and applied to forecast monthly solar radiation at three sites. The proposed MEMD is an extension of standard EMD to multivariate signals, where EMD has been shown to accurately perform data-driven time-frequency analysis of complex, nonlinear and multichannel dynamical processes [30]. Another key advantage is that the MEMD overcomes the mode alignment issue in the joint analysis of multiple oscillatory components within a higher dimensional signal, which has remained unresolved in standard EMD [31]. The MEMD has been successfully applied to forecasting evapotranspiration [32], soil water [33], crude oil price [34] and iceberg drift [35], but this is the first application of this novel technique to solar radiation forecasting in Australia.
For solar radiation forecasting, eight meteorological predictor time series (i.e., maximum temperature, minimum temperature, precipitation, evaporation, vapour pressure, estimated relative humidity at maximum temperature, estimated relative humidity at minimum temperature and potential evapotranspiration) acquired from the Scientific Information for Land Owners (SILO) database are concurrently transformed into respective intrinsic mode functions (IMFs) and residual components via the MEMD process. The MEMD addresses the important non-stationarity issue via a simultaneous demarcation of the input series into resolved components. The important problem of feature selection is resolved via a robust bio-inspired feature selection method called ant colony optimization (ACO), which mimics the behavior of ant colonies [36] and has been successfully used in different applications [37], [38], [39], [40], [41]. The novelty of this paper lies in the proposal of a hybridized data-intelligent model that integrates MEMD and ACO with a robust tree-based model, namely random forest (RF), resulting in the hybrid MEMD-ACO-RF model for forecasting solar radiation (Rn). The proposed model simultaneously addresses the non-stationarity and feature selection problems that negatively impact Rn forecasting models. The MEMD-ACO-RF model is benchmarked against competitive M5tree and MPMR models as well as the standalone RF, M5tree and MPMR models in forecasting monthly solar radiation at three solar-rich stations in the state of Queensland, Australia. In the remainder of this paper, the theoretical frameworks of these models are presented, followed by descriptions of the study region, results, discussion, and conclusions.
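The idea of ant-colony feature selection can be sketched generically as follows. This is not the configuration used in the study, which is not described in this section: the pheromone update rule, the fixed subset size, the parameter values and the toy scoring function are all assumptions, and every name is hypothetical. In the setting above, the candidate features would presumably be the IMFs, and the score would come from a forecasting model's fitness rather than a lookup table.

```python
import random


def aco_select(n_features, score, subset_size=2, n_ants=10, n_iter=30,
               evaporation=0.2, seed=0):
    """Generic ACO sketch: ants sample feature subsets guided by pheromone."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        trails = []
        for _ in range(n_ants):
            available = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                # Pick the next feature with probability proportional to pheromone.
                trail_strength = [pheromone[j] for j in available]
                pick = rng.choices(available, weights=trail_strength, k=1)[0]
                subset.append(pick)
                available.remove(pick)
            quality = score(subset)
            trails.append((subset, quality))
            if quality > best_score:
                best_subset, best_score = sorted(subset), quality
        # Evaporate old pheromone, then deposit in proportion to subset quality.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for subset, quality in trails:
            for j in subset:
                pheromone[j] += quality
    return best_subset, best_score, pheromone


# HYPOTHETICAL per-feature relevance used as a stand-in for model fitness.
relevance = [0.9, 0.1, 0.8, 0.05, 0.2]


def toy_score(subset):
    """Toy non-negative fitness: sum of the relevance of the chosen features."""
    return sum(relevance[j] for j in subset)


best_subset, best_score, pheromone = aco_select(len(relevance), toy_score)
```

Features that appear in high-scoring subsets accumulate pheromone and are sampled more often in later iterations, while evaporation lets the colony forget weak early choices.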
Theoretical overview
This section presents an overview of the random forest (RF) forecasting model, multivariate empirical mode decomposition (MEMD), the ant colony optimization (ACO) method, and the comparative M5tree and minimax probability machine regression (MPMR) models.
Study region and datasets
The study area is located in Queensland (QLD), which is Australia’s sunshine state that has an abundance of solar resource. To construct a large set of predictor matrices, predictor data for neighboring meteorological sites were acquired from the Scientific Information for Land Owners (SILO) Portal developed by the Queensland Department of Environment and Resource Management [72]. The data is comprised of monthly rainfall (Rain; mm), maximum (Tmax; °C) and minimum temperature (Tmin; °C),
Results
The performance of the proposed multi-stage MEMD-ACO-RF model relative to the comparative models in the testing phase was assessed with the aid of a set of statistical metrics, visual figures and error distributions between the forecasted and observed Rn.
Discussion: limitations and opportunities for further research
In this paper, the suitability of the ACO optimized MEMD coupled RF (benchmarked with M5 Tree model and MPMR) for monthly solar radiation forecasting was investigated. Generally, the RF outperformed M5tree and MPMR models for all selected sites, thus revealing that the RF model was efficient in the detection of the features within the meteorological inputs in a physically meaningful way in order to forecast Rn. The hybridized MEMD-ACO-RF also outperformed the other models compared (i.e.,
Conclusion
A hybrid multi-stage MEMD-ACO-RF model has been designed by applying the ACO feature selection method to the IMFs of the decomposed input data (Tmax, Tmin, Rain, Evap, VP, RHmax, RHmin, FAO56) and training the model on the selected IMFs to forecast future Rn at the Springfield, Ross River, and Clare Solar Farms. The monthly input data, collected from January 1905 to June 2018 for these candidate sites, were decomposed into IMFs and a residual through the MEMD algorithm. The ACO algorithm
Acknowledgements
The authors are grateful to Scientific Information for Land Owners (SILO) for providing the relevant meteorological and solar radiation data for the study regions. The authors also thank the editor and the two reviewers for their comments, which improved the quality of the paper.
References (93)
Australian renewable energy progress. Renew Sustain Energy Rev (2010)
A review on global solar energy policy. Renew Sustain Energy Rev (2011)
Machine learning methods for solar radiation forecasting – a review. Renew Energy (2017)
ANFIS, SVM and ANN soft-computing techniques to estimate daily global solar radiation in a warm sub-humid environment. J Atmos Sol Terr Phys (2017)
A nonlinear support vector machine model with hard penalty function based on glowworm swarm optimization for forecasting daily global solar radiation. Energy Convers Manage (2016)
A novel soft computing framework for solar radiation forecasting. Appl Soft Comput (2016)
Hybrid machine learning forecasting of solar radiation values. Neurocomputing (2016)
Multi-step ahead forecasting of daily global and direct solar radiation: a review and case study of Ghardaia region. J Cleaner Prod (2018)
Minimum redundancy – maximum relevance with extreme learning machines for global solar radiation forecasting: toward an optimized dimensionality reduction for solar time series. Sol Energy (2017)
An efficient neuro-evolutionary hybrid modelling mechanism for the estimation of daily global solar radiation in the Sunshine State of Australia. Appl Energy (2018)
Hourly solar radiation forecasting using a Volterra-least squares support vector machine model combined with signal decomposition. Energies
A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset. Appl Energy
Input selection and performance optimization of ANN-based streamflow forecasts in the drought-prone Murray Darling Basin region using IIS and MODWT algorithm. Atmos Res
Improved complete ensemble EMD: a suitable tool for biomedical signal processing. Biomed Signal Process Control
Prediction of daily global solar irradiation data using Bayesian neural network: a comparative study. Renew Energy
Soil water prediction based on its scale-specific control using multivariate empirical mode decomposition. Geoderma
A proof of convergence for ant algorithms. Inf Sci
A review of ant algorithms. Expert Syst Appl
Feature selection using ant colony optimization with tandem-run recruitment to diagnose bronchitis from CT scan images. Comput Methods Programs Biomed
Ant colony algorithms in MANETs: a review. J Network Comput Appl
Very short-term reactive forecasting of the solar ultraviolet index using an extreme learning machine integrated with the solar zenith angle. Environ Res
Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition. Geoderma
Neural networks and M5 model trees in modelling water level–discharge relationship. Neurocomputing
Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J Hydrol
Using spatial interpolation to construct a comprehensive archive of Australian climate data. Environ Modell Software
Improved historical solar radiation gridded data for Australia. Environ Modell Software
HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ Modell Software
Hour-ahead wind power forecast based on random forests. Renew Energy
Australian Energy Update 2016
An ANN-based approach for predicting global radiation in locations with no direct measurement instrumentation. Renew Energy
Predicting global solar radiation using an artificial neural network single-parameter model. Adv Artificial Neural Syst
Prediction of solar energy based on intelligent ANN modelling. Int J Renew Energy Res
Application of extreme learning machine for estimating solar radiation from satellite data. Int J Energy Res
Short-term solar radiation forecasting by using an iterative combination of wavelet artificial neural networks. Independent J Manage Prod
Wavelet-based multiscale performance analysis: an approach to assess and improve hydrological models. Water Resour Res
The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc Royal Soc
Radiation time-series prediction based on empirical mode decomposition and artificial neural networks
On the time-varying trend in global-mean surface temperature. Clim Dyn
Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adaptive Data Anal
A complete ensemble empirical mode decomposition with adaptive noise
Multivariate empirical mode decomposition. Proc Roy Soc A: Math Phys Eng Sci
Multiscale image fusion using complex extensions of EMD. IEEE Trans Signal Process
Scale-dependent prediction of reference evapotranspiration based on Multi-Variate Empirical mode decomposition. Ain Shams Eng J
Multivariate EMD-based modeling and forecasting of crude oil price. Sustainability