Short-term electricity load time series prediction by machine learning model via feature selection and parameter optimization using hybrid cooperation search algorithm

Reliable load time series forecasting plays an important role in guaranteeing the safe and stable operation of modern power system. Due to the volatility and randomness of electricity demand, the conventional forecasting method may fail to effectively capture the dynamic change of load curves. To satisfy this practical necessity, the goal of this paper is set to develop a practical machine learning model based on feature selection and parameter optimization for short-term load prediction. In the proposed model, the ensemble empirical mode decomposition is used to divide the original loads into a sequence of relatively simple subcomponents; for each subcomponent, the support vector machine is chosen as the basic predictor where the real-valued cooperation search algorithm (CSA) is used to seek the best model hyperparameters, while the binary-valued CSA is set as the feature selection tool to determine the candidate input variables; finally, the aggregation of all the submodules’ outputs forms the final forecasting result. The presented method is assessed by short-term load data from four provincial-grid dispatching centers in China. The experiments demonstrate that the proposed model can provide better results than several conventional models in short-term load prediction, while the emerging CSA method is an effective tool to determine the parameter combinations of machine learning method.


Introduction
Accurate short-term load forecasting information has a profound impact on not only the social and economic decision-making plan [1][2][3], but also the safe and stable operation of modern power system, like hydrothermal scheduling, unit commitment, security analysis, risk assessment, and equipment overhaul [4][5][6]. Generally, load forecasting is an important but challenging production task because the energy curve is usually influenced by a large number of factors at the same time, like economic level, industrial structure, weather condition, living style and population [7][8][9]. Due to the booming economic development and growing extreme climate events, the energy quantity and peak-valley difference of load curves becomes increasingly prominent in recent years [7][8][9]. Meanwhile, the installed capacity of distributed renewable energy (like wind and solar) with the features of strong volatility and high uncertainty grows rapidly, increasing the change frequency and amplitude of load demand curve [10][11][12]. Under this engineering background, short-term load forecasting is raising new challenges for the daily operation of power system [13][14][15].
Over the past few decades, many advanced methods have been developed for short-term load forecasting [16][17][18][19]. Based on the employed theory principle, the existing methods can be roughly divided into two different categories: one is the conventional approaches represented by multiple linear regression and autoregressive integrated moving average method [20][21][22], while another is intelligent approaches represented by SVM, adaptive networkbased fuzzy inference system, ANN and deep machine learning [23][24][25][26]. Conventional approaches based on linear analysis have the merits of easy implementation and broad adaptability, but may fail to capture the nonlinearity of load curves in many cases [27][28][29][30]. Intelligent methods can obtain satisfying forecasting results without explicitly deducing the underlying mechanism of load series, promoting their widespread applications in practical problems [31][32][33][34]. Given the above considerations, the intelligent approach is chosen as the focus of this paper. As one of the most famous intelligent methods, SVM is based on the structural risk minimization theory and Vapnik-Chervonenkis dimension [35][36][37]. In theory, SVM can guarantee the achievement of global optima for regression and classification problems [38][39][40]. However, the SVM performance often changes with the combinations of computation parameters [6]. To refill the research gaps, more effective methods should be developed to improve the SVM performance in the short-term load forecasting problem.
In order to overcome the SVM defects, two popular strategies (feature selection and parameter optimization) are employed in this paper. As an effective signal analysis tool, EEMD can divide the original load series into a set of subcomponents with different frequencies and resolutions [41][42][43]. EEMD has the merits of high flexibility and strong robustness compared with the traditional EMD method [44][45][46], promoting its wide applications in time series analysis and forecasting. Hence, EEMD is set as the data processing tool to identify the hidden information in the time series. Besides, due to the rapid development of information technology, a variety of swarm intelligence methods are successfully proposed to search for the ideal parameter combination of the SVM method [47][48][49][50]. Recently, an emerging method named as CSA is proposed to solve numerical optimization and engineering optimization problems [51][52][53]. In the CSA method, it is believed that each solution is equal to a staff in the teamwork while a group of staffs can converge to the promising areas around the best solution. After randomly placing the positions of the initial swarm in the state space, the team commutation operation is used to enhance the global exploitation; the reflective learning is used to improve local exploration, while the internal competition is used to guarantee the survival of the fittest. Hence, the RCSA method is employed to search for the optimal parameters of the SVM model for the first time. Moreover, feature selection proves to be an effective way in identifying candidate variables and reducing modeling difficulty [54][55][56]. In order to achieve this goal, the BCSA method is proposed for the first time. Based on the above analysis, an effective machine learning model using feature selection and parameter optimization is developed to predict electricity load time series. Specially, the EEMD method is used to divide the load series into several different parts; and then the SVM method using the HCSA method is adopted to model the inputs-outputs relationships; and then the outputs of all the models are combined to produce the final forecasting result. Specially, the HCSA method is mainly composed of two different parts: real-valued zone for parameter optimization and binary-valued zone for feature selection. The feasibility and practicability of the proposed method is fully proved by the experimental results of several power grids in China.
To better understand this paper, the interesting insights of this paper lie in: (a) the RCSA method is used to optimize the hyperparameters of the SVM method; (b) the BCSA method is developed to select candidate input variables; (c) a hybrid method using EEMD, CSA and SVM are developed for short-term load prediction. For all we know, the above three points have not been reported by other research groups by far, providing useful technical reference for relevant engineering problems.
The reminder of this paper is organized as below: the technical details of the presented method is given in section 2. The performance of the presented method is verified in section 3, while the conclusions are given in section 4.

Ensemble empirical mode decomposition
EEMD is an effective signal analysis tool developed to divide the complex data series into a set of small and

Evolutionary optimizer
Decoding Figure 1. Sketch map of the HCSA method.
simple subseries called IMF [57][58][59][60]. In the decomposing process of the EEMD method, the white noise with finite amplitude is added to the original signal to split various scales of sub-signals in the frequency space; and then each IMF subseries is equal to the ensemble mean of several trails [36,61,62]. Next, the execution steps of the EEMD method are summarized as below: Step 1: define the number of ensembles M and the amplitude of the white noise.
Step 2: add the white noise to the original signal x (t) to reconstruct the modified data signal: where n i (t) and x i (t) are the tth value of the ith white noise and the ith modified signal.
Step 3: use the standard EMD method to divide the newly-obtained signal x i (t) into a series of IMFs where S is the number of IMFs. h i,s (t) is the tth value of the sth IMFs associated with x i (t). r i,S (t) is the residue representing the mean trend of x i (t).
Step 4: steps 2-3 are repeated M times to produce an ensemble of IMFs. Then, the average of all the obtained IMFs possessing identical frequency is treated as the final result whereh s (t) is the tth value of the sth IMF subcomponent.

Hybrid CSA
In practice, many problems often have binary and real variables at the same time. To address this kind of problem, all the decision variables in HCSA method are encoded as real values using the same evolutionary optimizer. To effectively respond the actual problem, all the variables are divided into two groups: real zone and binary zone. As showed in figure 1, the variables in binary zone should be mapped into binary values via a transfer function while the variables in real zone require no processing. By this time, all the variables for the target problem will be known and then the fitness values of all the staffs can be obtained. The technical details of the HCSA algorithm are given in the following sections.

Real-valued CSA
Based on the team cooperation behaviors drawn in figure 2, the CSA method starts from an initial population randomly placed in the feasible decision space, and then iteratively makes use of three evolutionary operators to approach the promising area around the global optimal solution [51][52][53]: the team communication operator where each staff gain knowledge from leaders to improve the job performance; the reflective learning operator where each staff has the chance to draw lessons from the past and avoid making similar mistakes as far as possible in the future; and the internal competition operator where elite staffs are conserved and under-performing staffs are dismissed. Generally, the team communication operator helps the population explore in the unknown space with a large search step; the reflective learning operator is used to enhance the local search ability of the swarm with a small search step; the internal competition operator gradually improves the solution's quality from generation to generation. Without loss of generality, for the J-variable minimization optimization problem, the detailed procedures of the CSA method with I staffs are given as below: (a) Team building phase. The job tasks of all the staffs at hand are randomly assigned in the feasible decision space, which can be expressed as below: superior behaviors are seen as the candidate members in the supervisor and directors board; and then each staff improves the professional skills by gaining favorable information from excellent staffs in both supervisory and directors boards, which can be described as follows: where u k+1 i,j is the jth value of the ith group staff at the k + 1th cycle. A k i,j , B k i,j and C k i,j denote the influence of the chairman, the directors and the supervisors on the jth value of the ith staff at the kth cycle. pBest k i,j is the jth value of the ith personal best-known staff at the kth cycle. gBest k ind,j is the jth value of the indth global best-known staff from the beginning to the kth cycle. ind is the index randomly selected from the set of {1, 2, …, M}. M is the number of the global bestknown staffs. α and β are two positive learning coefficients. To be mentioned, if the boundary constraints are violated, the elements of the obtained solutions will be modified to the feasible space by the following equation: where x j andx j are the upper or lower limits of the jth value of the target problem.
(c) Reflective learning operator. In daily work, each staff has the chance to draw lessons from the past to reduce the possible mistakes in the future, which can be described as below: where v k+1 i,j is the jth value of the ith reflective staff at the k + 1th cycle.
(d) Internal competition operator. For the entire team, the survival of the fittest is preferred to gradually improve the overall strength under a highly competitive environment, which can be described as below: where F (x) is the fitness value of the staff x.
The pseudo-code of the CSA method for solving global optimization problem is given as: 1. Set the objective function and physical constraints of the target problem. 2. Initialization 3.
Create the initial swarm in the feasible search space by equation (4).

4.
Calculate the fitness values of all initial solutions.

5.
Both group and reflective solutions are set as its initial one. 6. End Initialization 7. Repeat search process 8.
Update I personal best-known solutions for the current swarm. 9.
Update M global best-known solutions found by far.
Compute the fitness values of both group and reflective solutions.

13.
Choose I better solutions by internal competition operator in equation (14). 14. While (maximum iterations is not met) 15. Global best-known individual is seen as the final solution of the target problem.

Binary-valued CSA
To effectively solve the discrete optimization problem, this section introduces the BCSA whose search process is similar to the RCSA method. In BCSA, the elements of each staff in the current swarm will be turned into the binary variable (i.e. 0 or 1) via the aid of the classical sigmoid function defined as below: where w and S are the original and modified values, respectively. bx k+1 i,j is the binary variable associated with the jth value of the ith staff at the kth cycle.

Support vector machine
SVM is a classical statistical machine learning method for regression and classification problem. In the optimization process, SVM takes advantages of the structural risk minimization principle to produce the global optimal solution [63][64][65]. Mathematically, SVM for a regression problem with N training samples can be described as: where x i ∈ R D and y i ∈ R are the input and output variables of the ith sample. ϕ (x i ) is the nonlinear transfer function. w and b are the weight vector and bias, which can be obtained by solving the following constrained optimization problems: where ξ i and ξ * i are two slack variables for penalizing the training errors of the loss function over the error tolerance ε. C is the parameter used to determine the empirical risk.
To effectively solve the above problem, the Lagrangian function is reconstructed as below: where η i , η * i , a i , and a * i denote the Lagrange multiples.
Based on the famous Karush-Kuhn-Tucker condition, the dual form of the problem in equation (19) can be expressed as below: After solving the above constrained optimization problem, the desired weight vector can be derived, and then the regression function for the SVM model becomes where K (x i , x) is the kernel function meeting the Mercer's condition. Here, the radial basis function where σ is the value of the standard deviation.

The proposed hybrid method for short-term load time series prediction
As mentioned above, the load curve in practice usually presents the characteristics of strong nonlinearity, randomness and instability due to the growing economy, changeable electric demand and other uncertain factors. In order to accurately trace the dynamic process of load time series, this paper introduces a hybrid load forecasting method linking the advanced signal decomposition tool and population-based metaheuristic algorithm into the classical machine learning model. Figure 3 shows the sketch map of the proposed method. It can be clearly found that the presented method is composed of three different parts: (a) the first is the decomposition phase where the EEMD method is driven to extract different resolution subcomponents hidden in the original load series; (b) the second is the modeling phase where SVM is used to simulate the possible inputs-outputs relationship hidden in each subseries, and the HCSA is used to find the best combinations of input variables and model parameters; (c) the last is the ensemble phase where the outputs of all the optimized SVM models are combined to produce the final forecasting value. By this way, the proposed method is able to effectively identify the inherent property of nonlinear load series and provides strong technical support for smart grid operation. Then, the execution procedures of the proposed method in load prediction are given as below: Step 1: the original load series is decomposed into a group of subseries. For each subseries, the dataset is normalized into the range of [0,1] and then divided into training and testing samples.
Step 2: use the SVM method to construct the forecasting model for all the subcomponents, where the HCSA method is driven to search for the satisfying combination of model parameters and candidate inputs. In the search process, the RCSA method is in charge of optimizing model parameters while the BCSA method is used to eliminate the redundant information from multiple possible input variables.
Step 3: the forecasting results of all the optimized forecasting model are combined to produce an aggregated output. In practice, for each forecasting model, the newly-obtained variable should be normalized into the preset range of [0,1] while the simulated outputs should be renormalized to the original range of the studied signal.

Study area and dataset
In practice, it is preferred that the developed load forecasting model can work in different cases. Thus, to fully test the robust performance of the proposed method, the load series of four China's provincialgrid dispatching centers (A-D) in both summer and winter are chosen for comparison. The reasons lie in that the load series of four power grids have different features, like numbers of peak loads, occurrence times of both peak and valley loads, total loads, as well as peak-valley differences. If not robust enough, the developed method may be suitable for one power grid and fail to work for another grid; but not vice versa. As illustrated in figure 4, the loads in summer are considered to reflect the huge energy demand caused by the high temperature weathers; besides, the load in winter are considered in the second case. Table 1 shows the statistical information of four power grids in summer and winter, demonstrating the huge differences of various load curves. For the studied power grids, about 1600 quarter-hour load series in both winter and summer are collected, where the first 75% observation are chosen for model training while the left is used for testing the performance of the forecasting model. The forecasting window as a quarter-hour. The experiments are executed on a personal computer with 3.30 GHz processor and 8 GB RAM.

Evaluation indicators
To fully check the quality of various forecasting model, the RMSE, MAE, MARE, R, and NSE are chosen as the performance evaluation indexes for comparison. Generally, the better the forecasting mode, the larger the values of NSE and R, the smaller values of RMSE, MAE and MARE. Then, the definitions of five statistical measures are given as below: where n is the number of samples. y i andỹ i are the ith observed and forecasted load values. y avg andỹ avg are the mean value of the observed and forecasted load series.  power grids. It can be found that obvious differences exist in the obtained subcomponents: the first subseries has the largest amplitude and frequency while the other subseries gradually changes the frequency and amplitude. Specially, the first two IMFs have no obvious changing rules and fluctuates frequently, reflecting the random high-frequency features of the original loads; the 3rd IMFs basically varies by hours, reflecting the hourly-level feature of original loads; the 4th and 5th IMFs are the high-frequency periodic components varying in days; the following 6-8th IMFs roughly represent low-frequency fluctuations varying in weeks; while the last IMF denote the overall trend of the load signal. Hence, the feasibility of the EEMD method in identifying internal features of the target loads is successfully verified.

Model development
Given the inputs has a great influence on the model performance, the set of the candidate input variables should be determined before optimizing the forecasting model. For the proposed method, the PACF is used to identify the initial input sets. Generally, the variable at lag t can be chosen as the possible influence factor as the relevant PACF value falls into the confidence interval. Based on the PACF value and other relevant information, the initial set of input variables for various subseries can be roughly determined.
Several artificial intelligence methods are introduced for comparison, including ANN, ELM, SVM, SVM couple with EMD (SVM-EMD), SVM couple with EEMD (SVM-EEMD). For the standard ANN, ELM and SVM methods, the original loads are chosen as the target object. The trial-and-error method is used to select the number of hidden neurons in both ANN and ELM, while the grid search method is used to identify the computational parameters of SVM. For both SVM-EMD and SVM-EEMD method, the decomposed results by EMD or EEMD are selected as the target object, while the relevant parameters are also determined by the grid search method. For the proposed method, the decomposed result produced by the EEMD method are selected as the target object, while the HCSA method is used to find satisfying combinations of model parameters and input variables. The number of staffs and iterations in HCSA are set as 20 and 100. Table 2 shows the statistical results of various methods for four power grids in summer. It can be seen that in the load prediction works, there are obvious differences in the performances of three artificial intelligence methods (like SVM, ELM and ANN), demonstrating the importance of the training technique and model theory; secondary, the modified SVM methods are able to yield better results than the standard SVM method, proving the role of decompositionensemble tool in improving the model performances; moreover, the proposed method has the ability of producing the best performances with respect to various indexes in different power grids, demonstrating the feasibility of feature selection and parameter optimization. For instance, as compared with the RMSE values of the ELM and SVM approaches, the proposed forecasting model can make about 52.25% and 35.03% reductions for grid A during the training phase; compared with the MAE values of the  To show the improvements of the proposed over other methods, the comparison between the proposed method and other methods in terms of four statistical indicators are drawn in figure 6. It can be found that the largest value in the performance indexes viewpoint is often achieved by the ANN method, demonstrating the huge room for improvement. Besides, as adopted to forecast the summer loads of four grids, the improvements of several control methods are larger than 0, while the ranking order from the best to worst are basically ELM, SVM, SVM-EMD and SVM-EEMD on the whole. Hence, the proposed method is an effective tool outperforming several conventional methods in the complex load prediction task.

Simulation results
In order to fully demonstrate the superiority of the proposed method, figure 7 draws the original and forecasted loads produced by various forecasting methods in the testing phase. It can be seen that in the testing phase, all the developed forecasting methods can closely track the dynamic change process of four power grids, proving the practicability of the developed models. Besides, the performances of the modified SVM methods (like SVM-EMD and SVM-EEMD) are superior to three traditional methods, which proves the role of the decomposition-coordination technique. The proposed method outperforms the control methods in approximating the dynamic change of the load series because its trend lines in different cases are closer to the ideal value. Thus, the feasibility of the proposed method in forecasting load series is proved again. Table 3 shows the statistical results of various methods for four power grids in winter. It can be found that for four power grids, the forecasting results of the proposed method are better than five control methods during both training and testing phases in terms of various indexes; besides, three standard methods are inferior to the decomposition and ensemble-based forecasting method. For instance, as used to forecast the load of grid D, the proposed method can make about 53.82%, 48.29%, 37.16%, 31.49% and 25.76% reductions as compared with the RMSE values of the ANN, ELM, SVM, SVM-EMD and SVM-EEMD methods during the testing phase. Thus, the superior performance of the proposed method for load forecasting is proved.   Figure 8 shows the comparison between the proposed method and other methods in terms of four statistical indicators. It can be clearly observed that a single forecasting method (like SVM and ELM) can make obvious improvements with respect to various performance indexes in the testing phase. Besides, the results of all the methods can be improved to different degree compared with the proposed method. Thus, we can draw the conclusion that the proposed method is a powerful method for short-term load time series forecasting. Figure 9 shows the results of various forecasting methods for four power grids in the testing phase. It can be seen that the developed methods can obtain satisfying forecasting results for four power grids, while the proposed method has the best performance among in different scenarios since its trend lines are closer to the observed load line than the other methods. Thus, the proposed method can provide superior performances as used to forecast the load data series in different cases.

Case 3: comparison with different swarm intelligence methods
To further test the performance of the proposed method, the classical GA and PSO are chosen to optimize the SVM parameters. Table 4 lists the statistical results of various SVM methods using different strategies for four power grids. In SVM-EEMD-PSO and SVM-EEMD-GA, the original load is divided into several subcomponents while the SVM method based on PSO or GA are driven for modeling. For table 4, it can be found that three SVM variants outperforms the standard SVM method, demonstrating the effectiveness of decomposing technique and swarm intelligence; besides, the proposed method yields better or same statistical indexes as compared with two other SVM variants during both training and testing, proving the superiority of the CSA approach in parameter optimization. Therefore, it can be concluded that the proposed method is an effective tool to accurately forecast the load series.

Result discussions
From the above analysis, it can be clearly found that in short-term load time series prediction, the developed model based on the dynamic combinations of three effective methods is capable of producing better performances than several traditional methods with distinct improvements in the statistical indexes. Based on the EEMD method, the complex load time series with the stochastic and time-dependent features can be transformed into a series of relatively stable subseries, making an obvious reduction in the modeling complexity; and then the HCSA method is driven to optimize the computational parameters of the SVM model for each subcomponent, which can help improve the network compactness and generalization ability of the forecasting model; finally, the result is obtained by summing the outputs of all the SVM models, producing a synergetic effect in the final predicted value. Thus, the hybrid model based on the decomposition-and-ensemble idea can enhance the forecasting performances compared with several conventional methods.
Although the feasibility of the proposed method is proved in the real-world load demand of several power grids in China, its practical application may face some possible defects, one is the credibility of the EEMD method since its decomposing results change with the features (like length or magnitude) of the studied dataset; the second is the local minimum of the SVM model since even with the aid of the HCSA method, it is still difficult to guarantee the best choice of the input variables and network parameters. Thus, future research can be directed to the developments of the data decomposition tool with more stable performance, evolutionary algorithm with strong global search ability [66], input variable determination tool and forecasting model with stronger generalization ability. Besides, there is no uncertainty quantification in the resultant forecasting result because the deterministic load value is preferred in the decision-making process made by operators and managers of power grids [67,68]. For better and reliable predictions, it is quite necessary to consider the relevant input variables (like weather conditions and date attribute) in the uncertain forecasting model in the future.

Conclusion
In order to improve the short-term electricity load prediction accuracy, this research develops a practical machine learning model based on feature selection and parameter optimization. In the proposed method, the EEMD is employed to divide the complex load data into a set of subcomponents that will be modeled by the famous SVM method; and then the HCSA is used to find the best combinations of model hyperparameters and input variables; finally, the outputs of all the submodules are used to form the final forecasting results. The load series of four provincialgrid dispatching centers are used to verify the performances of the proposed method. The simulations demonstrate that the presented method can effectively enhance the forecasting accuracy of several traditional methods in terms of all the employed assessment indexes. Hence, a powerful machine learning model with robust performances is presented to forecast the short-term load demand curve in practice. For the policy and decision makers, the proposed methodology can provide strong technical support information to determine the satisfying operation plan of various power resources (like hydropower, thermal and wind), rationally arrange the maintenance plans of the generating units, maintain the safety and stability of power grid. By this way, the fuel cost and pollutant emission of electric power system can be sharply improved in the long run, which can effectively increase the economy benefits, environmental and social benefits.

Data availability statement
Due to the strict security requirements from the departments, some or all data, models, or code generated or used in the study are proprietary or confidential in nature and may only be provided with restrictions (e.g. anonymized data).