Artificial Intelligence based accurate load forecasting system to forecast short and medium-term load demands

Accurate electrical load forecasting is of great significance for efficient management and better scheduling by power companies. The load time series carries a high level of uncertainty, which makes accurate short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF) challenging. To extract local trends and capture recurring patterns in short- and medium-term forecasting time series, we propose long short-term memory (LSTM), multilayer perceptron (MLP), and convolutional neural network (CNN) models to learn the relationships in the time series and improve forecasting accuracy. The models were tested on a real-world case through detailed experiments to validate their stability and practicality. Performance was measured in terms of R-squared (R²), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). For 24-hours-ahead forecasting, LSTM yielded R² (0.5160), while MLP gave the lowest MAPE (4.97), MAE (104.33), and RMSE (133.92). For 72-hours-ahead forecasting, LSTM yielded R² (0.7153), while MLP gave the lowest MAPE (7.04), MAE (125.92), and RMSE (188.33). For one-week-ahead forecasting, CNN yielded R² (0.7616), while MLP gave the lowest MAPE (6.162), MAE (103.156), and RMSE (150.81). For one-month-ahead forecasting, CNN yielded R² (0.820), MLP gave the lowest MAPE (5.18), and LSTM gave the lowest MAE (75.12) and RMSE (109.197). The results show that the proposed methods achieve good and stable performance for short- and medium-term load forecasting.
The STLF findings indicate that the proposed models are well suited to local system planning and dispatch, while the MTLF findings indicate that they can support better scheduling and maintenance operations.


Introduction
Load forecasting makes it possible to meet upcoming load demands so that power outages can be avoided. Faults caused by excessive loading and other electrical failures, such as blackouts, can be prevented if the necessary precautionary measures are taken to manage load demand. However, load demand keeps increasing day by day and fluctuates considerably on an hourly basis [1], so automated short-term load forecasting is highly desirable. Load forecasting is fundamentally a stochastic problem rather than a deterministic one. Nothing is 'certain' in this setting, because forecasters have to deal with randomness. The output of the process is therefore expected to take a probabilistic form, for instance a forecast under a given scenario, a prediction interval, or a quantile of interest. Forecasting is considered crucial and a high priority in the power sector. Two global energy forecasting competitions, GEFCom2012 and GEFCom2014, attracted the attention of many academics, data scientists, and industry practitioners, and addressed numerous forecasting challenges faced by the power sector, including probabilistic forecasting [2] and hierarchical forecasting [3].
Load forecasting is commonly divided into three categories: short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF) [1]. STLF has been explored extensively and modeled at the generation, transmission, and distribution levels. Most problems arise at the distribution stage, because distribution utilities are tied to generation capacity, its scheduling, and system peaking, and have no direct relation to the short-term load demand of individual customers. STLF is therefore central to the operation of power networks and to avoiding the serious consequences of failures in power and energy systems. It is essential to the economic operation of power networks and forms the basis of dispatching and of unit start-up and shut-down schedules, which play a crucial part in automatically controlled power networks. An accurate load forecast not only helps consumers choose a more suitable electricity consumption plan but also reduces electricity costs, improves equipment utilization, lowers production costs, and improves economic benefits. It is also conducive to optimizing the power system, improving the efficiency of power supply, and achieving the goals of energy conservation and emission reduction.
Proper and efficient electric load forecasting can save electricity and make distribution easier and more efficient while using minimal energy resources. If load forecasting is inaccurate, the cost of electricity may increase and energy resources may be wasted. Conversely, accurate load forecasting helps to manage future electric load demand through efficient load planning [4].

Factors affecting Electrical Load
In a large power system, the usage patterns of appliances differ from one another, and any single electrical device is used more or less at random. Individual loads therefore fluctuate widely, but when these loads are summed into a larger facility load, the emerging pattern can be predicted statistically [5]. The factors that affect electric load include weather, time factors, economic factors, and random effects.

Weather
Weather is one of the most influential of the factors listed above, and weather information is an essential input to load forecasting. Load forecasting models are usually built and tested using actual weather readings. Operational models, however, must rely on weather forecasts together with their associated forecast errors, and these errors inevitably degrade model performance. Forecast weather variables such as wind speed, temperature, and cloud cover play a major role in STLF by changing the shape of the load curve. The characteristics of these variables are reflected in the load demand, although some influence it far more than others.
Weather has a significant effect on short-term electrical load forecasting for power networks [6]. Weather-sensitive loads such as heating, ventilation, and air conditioning have a large effect on small-scale industrial energy systems. Weather variables that can influence the hourly load forecast include barometric pressure, precipitation, wind speed, solar irradiance, and humidity. On very humid days, cooling equipment may run for longer duty cycles to remove excess moisture from the conditioned air. Precipitation can lower the air temperature, which in turn reduces the cooling load [5].

Time Factors
Three time-related factors influence electric load: holidays, the daily and weekly cycle, and seasonal effects. The daily load profile on a holiday is similar to that observed on weekends, and the magnitude of the load during holidays is lower than on working days. The daily and weekly cycle refers to load patterns that recur periodically every day and every week, while seasonal effects account for long-term changes in load growth and weather patterns [7].

Economic Factors
Economic factors relate to investment in infrastructure, such as establishing laboratories and new construction, which adds load to the power system.

Random Effects
Random effects cover all other random disturbances, apart from the three factors mentioned above, that can distort the load pattern. These disturbances include significant unscheduled loads and employee absences, which make prediction difficult [5].
Load forecasting has gained importance in smart energy management systems in recent years, and the number of users of load forecasting grows daily. STLF is mostly used to manage load dispatch and to control the energy transfer schedule over horizons from thirty minutes to a full day. Any improvement in the accuracy of STLF therefore reduces the operating costs of the electrical management system and enhances the efficiency of the energy network [8].
According to Gross and Galiana [5], STLF plays an important part in developing feasible, secure, reliable, and economic operating strategies for the power system. STLF draws on the latest weather predictions, recent load behavior, and the random behavior of the power system [8]. Load forecasting has attracted growing attention from researchers in recent years because of its importance for microgrids, smart grids, and renewable energy sources. Different strategies have been applied to load forecasting and management, such as autoregressive integrated moving average, seasonal autoregressive, and fractionally integrated moving average models, regression, and Kalman filtering. Al-Hamadi and Soliman [9] presented a load management model to meet the time-varying demand of users on an hourly basis. Another approach used the Kalman filter to compute the optimal hourly load forecast [8]. Song et al. [10] introduced a technique based on fuzzy linear regression built on the load data.
In recent years, numerous other techniques, such as fuzzy logic and artificial intelligence (AI), have been applied to enhance the accuracy and efficiency of load forecasting for power systems. AI-based intelligent systems are becoming widespread and are being developed on a large scale. Because of their explanation capability, flexibility, and symbolic reasoning, they are deployed worldwide in many applications. Ranaweera et al. presented a technique based on fuzzy rules, obtained with a learning-type algorithm, to incorporate historical load and weather data [11]. He et al. [12] proposed a method to quantify the uncertainty associated with power load and to obtain more information about future load; a neural network is then used to construct a quantile regression model for probabilistic forecasting.
Recently, AI-based methods have been used for load forecasting in smart grids and buildings [13], for next-day load demand [14], for load forecasting in distributed systems [15], and in the form of autoregressive ANNs with exogenous vector inputs [16]. Other AI-based methods, including fuzzy logic techniques [17], expert systems [18], Bayesian neural networks [19], and support vector machines [20], are widely used for a number of forecasting problems. Despite extensive research, accurate STLF remains a challenge because load data are nonstationary and exhibit long-term dependencies across forecasting horizons. For this reason, long short-term memory (LSTM) [21], a special kind of recurrent neural network (RNN) [22], is applied to the STLF problem. Compared with other AI techniques, LSTM performs well for long-horizon forecasting because it exploits previous load data and the dependencies across the time series.
The authors of [23] suggested a hybrid technique based on the wavelet transform and Bayesian neural networks (BNN), in which load characteristics are extracted to train the BNN model. In this technique, a weighted sum of the BNN outputs is used to predict the load for a specific day. Fan et al. [24] introduced another hybrid design that combines a bi-square kernel (BSK) regression model with the phase space reconstruction (PSR) method. In this approach, the load data are reconstructed using PSR to obtain the evolutionary trends of the historical load data and thereby improve prediction reliability [24]. The authors of [25] proposed a fuzzy logic controller for hourly load forecasting that depends on different conditional variables, e.g., random disturbances, historical load data, time, and climate. Finally, a hybrid model was developed by combining evolutionary algorithms and neural networks.
In another study, Metaxiotis et al. [26] compared models based on CNNs and AI, in which the CNN models showed distinctive performance in load forecasting. Fukushima [27] presented CNNs in a very simple form. LeCun et al. proposed the current form of CNNs with more advanced concepts, and many extensions and improvements have followed, such as batch normalization and max-pooling layers [28]. More specifically, some strategies have been devised to simplify CNN structures, for instance the addition of a pooling layer to the design [29]. In short, CNN is considered a potential candidate for load forecasting applications as far as the control of overfitting is concerned. On the other hand, fuzzy time series (FTS) have been employed in pattern-learning techniques in numerous applications, including load forecasting. In 2005, Yu [30] proposed a weighted FTS for stock market forecasting. In 2009, Wang et al. proposed another FTS technique that was applied to stock index and temperature forecasting [31]. Many other studies show that FTS has been widely used in load forecasting, for example a hybrid dynamic and fuzzy time series model for mid-term power load forecasting [32], a new linguistic out-of-sample approach of fuzzy time series for daily load forecasting [33], and an imperialist competitive algorithm combined with refined high-order weighted fuzzy time series (RHWFTS-ICA) for short-term load forecasting [34].
AI-based machine learning and deep convolutional neural network methods have also been used successfully in many other areas involving complex physiological signals and image processing problems. Applications include seizure detection using entropy-based methods [35] and machine learning methods [36], congestive heart failure detection by extracting multimodal features and employing machine learning techniques [37], Alzheimer's disease detection via machine learning [38], brain tumor detection based on hybrid features and machine learning techniques [39], arrhythmia detection [40], lung cancer detection by extracting refined fuzzy entropy [41], and prostate cancer detection based on deep learning [42] and machine learning [43] methods.
Deep learning methods such as CNN improve prediction performance on large datasets and have advanced traditional computer vision tasks such as image classification. Recently, CNNs have been used for both imaging and non-imaging data. There are many applications of 1D-CNN to time series data, including electricity load forecasting [44], electricity load forecasting for each day of the week [45], hydrological time-series prediction [46], short-term load forecasting of the Romanian power system [47], and short-term wind power forecasting [48].
Figure 1 shows the schematic diagram of the electric load forecasting system for short-term and medium-term load forecasting. STLF is used for power system planning over horizons ranging from one hour up to one week. In this case, we computed the STLF for the next 24 hours, 72 hours, and one week. MTLF is used to plan maintenance and similar activities over horizons ranging from one day to a few months. In this case, we computed the load forecast for the next day (24 hours), 72 hours, one week, two weeks, and one month. After data preprocessing, we initialized and optimized neural network and deep learning models, namely multilayer perceptron (MLP), LSTM, and CNN. Performance was computed on the test set using standard error metrics: R-squared, MAPE, MAE, MSE, and RMSE. Finally, the STLF and MTLF forecasts were computed and evaluated in terms of the errors between actual and predicted load demands.
For the MLP, we tuned the number of neurons in the hidden layer; this number must be high enough to model the problem but not so high that the network overfits. The iterative backpropagation algorithm was used for training. To minimize the cost function with respect to the connection weights, the gradient descent algorithm is used in conjunction with backpropagation.
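As an illustration of how backpropagation and gradient descent work together, a minimal sketch is given here. It is not the implementation used in this study: the function name `train_tiny_mlp`, the tanh hidden activation, the plain full-batch gradient descent update, the learning rate, and the toy linear target are all assumptions chosen to keep the example small and dependency-free.

```python
import math
import random


def train_tiny_mlp(samples, hidden=4, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer MLP to (x, y) pairs; return the MSE seen in each epoch."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    n = len(samples)
    history = []
    for _ in range(epochs):
        g_w1 = [0.0] * hidden
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        loss = 0.0
        for x, y in samples:
            # Forward pass: tanh hidden layer, linear output neuron.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = pred - y
            loss += err * err
            # Backward pass: push d(loss)/d(pred) = 2 * err back through the layers.
            g_b2 += 2 * err
            for j in range(hidden):
                g_w2[j] += 2 * err * h[j]
                d_hidden = 2 * err * w2[j] * (1 - h[j] ** 2)
                g_w1[j] += d_hidden * x
                g_b1[j] += d_hidden
        # Gradient descent step on the gradients averaged over the batch.
        for j in range(hidden):
            w1[j] -= lr * g_w1[j] / n
            b1[j] -= lr * g_b1[j] / n
            w2[j] -= lr * g_w2[j] / n
        b2 -= lr * g_b2 / n
        history.append(loss / n)
    return history


# Toy target (not the load data of this study): y = 0.5 * x + 0.2 on [0, 1].
toy = [(i / 10, 0.5 * (i / 10) + 0.2) for i in range(11)]
losses = train_tiny_mlp(toy)
print(f"first epoch MSE {losses[0]:.4f}, last epoch MSE {losses[-1]:.4f}")
```

The returned loss history makes the effect of the hidden-layer size easy to inspect: rerunning with a different `hidden` value shows how capacity changes the fit on the same data.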

Fine-tuning of parameters
The following configurations were used to fine-tune the neural network parameters.

LSTM
We created an LSTM model with one LSTM layer of 48 neurons and ReLU as the activation function. We also added a dense layer containing 24 neurons, and the last layer, which acts as the output layer, contains 1 neuron. Finally, we compiled the model with the Adam optimizer and trained it for 100 epochs with a batch size of 24.

Conv1D
For Conv1D, we defined 48 filters and a kernel size of 2 with ReLU as the activation function. To reduce the complexity of the output and prevent overfitting, we placed a max-pooling layer with a pool size of 2 after the convolutional layer. A dense layer with 24 neurons was then added, followed by a final output layer with 1 neuron. The model was compiled with the Adam optimizer and fit for 100 epochs with a batch size of 24 samples.
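To make the convolution and pooling steps concrete, the following sketch applies a kernel of size 2, a ReLU activation, and max pooling of size 2 to a short sequence. It is only an illustration: the function names `conv1d_relu` and `max_pool1d`, the two hand-picked kernels, and the five-value input are hypothetical, and stride 1 without padding is assumed. As in common deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the kernel.

```python
def conv1d_relu(series, kernels, biases):
    """Slide each kernel over the series (stride 1, no padding) and apply ReLU."""
    size = len(kernels[0])
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        fmap = []
        for start in range(len(series) - size + 1):
            window = series[start:start + size]
            value = sum(w * x for w, x in zip(kernel, window)) + bias
            fmap.append(max(0.0, value))  # ReLU
        feature_maps.append(fmap)
    return feature_maps


def max_pool1d(fmap, size=2):
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]


loads = [1.0, 3.0, 2.0, 5.0, 4.0]          # toy sequence, not real load data
kernels = [[1.0, -1.0], [0.5, 0.5]]        # a difference filter and an averaging filter
maps = conv1d_relu(loads, kernels, [0.0, 0.0])
pooled = [max_pool1d(fmap) for fmap in maps]
print(maps)    # [[0.0, 1.0, 0.0, 1.0], [2.0, 2.5, 3.5, 4.5]]
print(pooled)  # [[1.0, 1.0], [2.5, 4.5]]
```

In the configuration described above there would be 48 such kernels whose weights are learned during training rather than fixed by hand.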

MLP
The MLP model contained a single layer with 48 neurons and ReLU as the activation function. As in the LSTM and Conv1D models, a dense layer with 24 neurons was added, followed by a final output layer with a single neuron. Lastly, the model was fit using the Adam optimization algorithm and the mean squared error loss function for 100 epochs.
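For a rough sense of model size, the number of trainable parameters implied by the three configurations can be estimated with the usual per-layer formulas. Several inputs to this calculation are not given in the text and are assumptions here: an input window of 24 hourly values (`WINDOW`), a single input feature (`FEATURES`), a flatten step between the pooling layer and the dense layer of the CNN, no padding with stride 1 in the convolution, and a standard four-gate LSTM cell. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_features, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (n_features + units) + units)


def conv1d_params(in_channels, filters, kernel_size):
    """One kernel of shape (kernel_size, in_channels) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


WINDOW = 24    # assumed number of past hourly values fed to each model
FEATURES = 1   # assumed: load value only

# LSTM(48) -> Dense(24) -> Dense(1)
lstm_total = lstm_params(FEATURES, 48) + dense_params(48, 24) + dense_params(24, 1)

# Conv1D(48 filters, kernel 2) -> MaxPool(2) -> flatten (assumed) -> Dense(24) -> Dense(1)
conv_len = WINDOW - 2 + 1        # output length without padding
pooled_len = conv_len // 2       # pooling with size 2
cnn_total = (conv1d_params(FEATURES, 48, 2)
             + dense_params(pooled_len * 48, 24)
             + dense_params(24, 1))

# Dense(48) -> Dense(24) -> Dense(1) on the flat window
mlp_total = dense_params(WINDOW, 48) + dense_params(48, 24) + dense_params(24, 1)

print(lstm_total, cnn_total, mlp_total)  # 10801 12865 2401
```

Under these assumptions the MLP is the smallest of the three models, and the CNN count is dominated by the dense layer that follows the flattened feature maps.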

2. Material and methods

Dataset
The data were obtained from the Al-Khwarizmi Institute of Technology, The University of Punjab, Lahore, from one of its projects on hourly electricity load demand, and cover the complete year 2008 and July to December 2009 for one grid, as used in [4]. The data were taken from a feeder chosen to fulfill the maximum load requirements for this study. The load time series was generated for 24 hours ahead, 72 hours ahead, one week ahead, and one month ahead to predict the short-term and medium-term load demands.
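The text does not state how the hourly series was framed into training samples. One common approach, shown here only as a sketch under assumptions, is a sliding window that pairs a block of past values with the value a fixed number of steps ahead. The function name `make_windows` and the parameters `n_lags` and `horizon` are hypothetical, and the integer sequence stands in for real load readings.

```python
def make_windows(series, n_lags, horizon):
    """Pair each window of n_lags past values with the value `horizon` steps after it."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags + horizon - 1])
    return inputs, targets


hourly = list(range(10))                 # placeholder for hourly load values
X, y = make_windows(hourly, n_lags=3, horizon=2)
print(X[0], y[0])    # [0, 1, 2] 4
print(X[-1], y[-1])  # [5, 6, 7] 9
print(len(X))        # 6
```

With hourly data, `horizon=24` would correspond to a 24-hours-ahead target and `horizon=72` to a 72-hours-ahead target under this framing.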

Multilayer Perceptron (MLP)
Paul Werbos developed the MLP in 1974; it generalizes the simple perceptron to the non-linear case by using the logistic function. It has become one of the most popular neural networks for supervised learning. The MLP consists of three kinds of layers: an input layer, an intermediate (hidden) part formed by at least one layer, and an output layer. Information is transmitted in one direction, from the input layer towards the output layer. By iteratively comparing the computed outputs with the desired outputs, the MLP adjusts the weights of the neural connections to find an optimal weight structure through gradient backpropagation. The network generally converges to a state where the calculation error is low [49].
A local learning procedure is used to train the MLP [50]. For this purpose, a few samples are picked from the neighbourhood of a selected query point x* to train the MLP. The neighbourhood is the set of k nearest neighbours of the query point in the training set Φ. The model is trained to fit the target function around the query point x* and is competent for this query only. The local complexity of the target function is lower than its global complexity, so a simple MLP with few hidden neurons can be used and trained quickly. In [50], the author reported that a single neuron with a sigmoid activation function delivered better results than networks with several neurons in the hidden layer. The MLP trained with the backpropagation algorithm is one of the most popular and commonly used networks. Backpropagation is a supervised learning algorithm and has been applied on a large scale to prediction problems such as pan evaporation (EP) prediction [51] and prediction of annual gross electricity demand based on socio-economic indicators and climate conditions [52].
The number of nearest neighbours (that is, learning points) is the only hyperparameter tuned in the local learning procedure (LLP). The MLP is trained using the Levenberg-Marquardt algorithm with Bayesian regularization [53], which minimizes a combination of squared errors and network weights to avoid overfitting. The optimization of the MLP is outlined in the accompanying algorithm. Figure 2 shows the general architecture and operation of the MLP algorithm: the input time series of normalized load demands, the hidden neurons and activation functions, the learning step, and finally the error metrics computed to quantify the difference between actual and predicted load demands.
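The neighbourhood selection at the heart of the local learning procedure can be sketched as follows. Euclidean distance, the function name `k_nearest`, and the toy training set are assumptions for illustration; the procedure in [50] may use a different distance measure.

```python
import math


def k_nearest(training_set, query, k):
    """Return the k (x, y) pairs whose input vectors are closest to `query`."""
    def distance(pair):
        x, _ = pair
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query)))
    return sorted(training_set, key=distance)[:k]


# Toy training set of (input pattern, target) pairs, not real load data.
training = [
    ([0.0, 0.0], 1.0),
    ([1.0, 1.0], 2.0),
    ([5.0, 5.0], 3.0),
    ([0.9, 1.2], 4.0),
]
neighbours = k_nearest(training, query=[1.0, 1.0], k=2)
print(neighbours)  # [([1.0, 1.0], 2.0), ([0.9, 1.2], 4.0)]
```

A small local model would then be trained only on `neighbours`, so `k` controls how much of the training set each query sees.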

Deep Learning Methods
Deep learning is a sub-domain of artificial intelligence (AI) that works on principles similar to those of machine learning (ML) and artificial neural networks (ANN). AI aims to function in a manner similar to humans. Like a human brain, an ANN takes in information and processes it using groups of neurons that form layers. These neurons pass information to other neurons, in some architectures information is also fed back to a previous layer, and the processed information is finally sent to the output layer in the form of a classification or regression result. Deep learning methods extract features from data automatically and improve the prediction of complex problems [54].

Long Short-Term Memory (LSTM)
LSTM is an artificial recurrent neural network (RNN) architecture used in the field of deep learning [55]. It is a popular and capable model for time series forecasting that can efficiently handle both long-term and short-term data dependencies. LSTM was designed to overcome the vanishing gradient problem of the RNN architecture, which arises especially when dealing with long-term dependencies. The LSTM architecture adds an input gate, a forget gate, and an output gate to the neurons of the recurrent neural network, which allows it to manage the vanishing gradient problem efficiently [56] and makes it more appropriate for problems with long-term dependencies. LSTM can learn both long-term and short-term dependencies from a given input sequence, which is why it is widely employed in time series prediction [57].
LSTM networks have been used in many applications such as speech and language modelling, speech recognition, and classification of neurocognitive performance, as reported by Greff et al. [58]. P. Nagabushanam et al. [59] employed LSTM to improve the classification of EEG signals, and Senyurek et al. [60] employed LSTM to detect puffs in smoking from the temporal dynamics of sensor signals. LSTM also improved gait recognition in a neurodegenerative disease compared with older RNN methods [59]. Other machine learning algorithms face the problem of vanishing gradients during learning. LSTM, in contrast, addresses this problem because it is based on an appropriate gradient-based learning algorithm that solves the error backflow problem. LSTM is also appropriate for noisy or incompressible input sequences, without losing short time-lag capabilities, and it learns faster and more adaptively than other machine learning algorithms. It can solve tasks with very long time lags and complex problems that cannot be solved using conventional machine learning methods.
The hidden layers of LSTM are linear, but self-looping memory blocks allow the gradient to flow through long sequences. LSTM consists of recurrent blocks termed memory blocks; each block contains recurrent memory cells and three multiplicative units, namely the input, output, and forget gates [61]. These cells allow the memory blocks to store and access information over long periods, which addresses the vanishing gradient problem [62]. LSTM originally comprised only input and output gates; the forget gate was added by [63] to improve functionality by resetting memory cells. Figure 3 shows the general architecture of the LSTM model. The memory cell is considered the major innovation of LSTM and acts as an accumulator that stores state information. In the first step, the forget gate is used to discard unnecessary information: a sigmoid function is applied to compute the activation of the forget gate, $f_t$:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (2.2)

where $h_{t-1}$ is the previous hidden state, $x_t$ is the current input, and $W_f$ and $b_f$ are the weights and bias of the forget gate. The second step decides which new information should be stored in the cell state. Another sigmoid layer, known as the "input gate layer", decides which values to update. Then, a tanh function creates a vector $\tilde{C}_t$ of new candidate values that may be added to the state. In the last step, the output is determined. This step consists of two further operations: a sigmoid function serves as the output gate that filters the cell state, and the cell state is passed through tanh(·) and multiplied by the output of this gate to obtain the desired information.
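The three steps described above can be traced in a minimal sketch of one LSTM time step for a single hidden unit and a scalar input. It follows the standard gate formulation that matches the forget gate equation (2.2); the function name `lstm_step`, the parameter layout (a dict of `(w_h, w_x, b)` triples), and the numeric values are hypothetical, and practical layers operate on vectors and weight matrices instead of scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell; returns (h_t, c_t)."""
    def gate(name, activation):
        w_h, w_x, b = params[name]
        return activation(w_h * h_prev + w_x * x_t + b)

    f_t = gate("forget", sigmoid)            # step 1: how much old cell state to keep
    i_t = gate("input", sigmoid)             # step 2: how much new information to write
    c_tilde = gate("candidate", math.tanh)   # step 2: candidate values
    c_t = f_t * c_prev + i_t * c_tilde       # updated cell state
    o_t = gate("output", sigmoid)            # step 3: output gate filters the cell state
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# With all weights and biases at zero every gate outputs 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "candidate", "output")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(c_t)  # 1.0
```

Running the step repeatedly over a sequence, feeding each returned `(h_t, c_t)` into the next call, gives the recurrent behaviour of the layer.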

Convolutional Neural Network (CNN)
CNN is a class of ANN that has become dominant in many applications, including computer vision, signal and image processing, and electricity load forecasting. A CNN is designed to learn spatial hierarchies of features automatically and adaptively through backpropagation, using multiple building blocks such as convolution layers, pooling layers, and fully connected layers. In computer vision, artificial neural networks have improved results on various problems compared with several classical techniques, and the convolutional neural network is the most common type of ANN used for computer vision tasks involving images.
The scope of CNN is wide, and it has numerous applications. In healthcare, where saving lives is the top priority, CNN can be used for health risk assessment. In radiology, CNN can be used for efficient classification of certain diseases, segmentation of organs or anatomical structures, and detection of abnormalities in computed tomography (CT) or MRI images. Drug discovery is another major healthcare field that makes extensive use of CNNs. Other applications in computer vision include object localization, object detection, and image segmentation [64], face recognition [65], and video classification, depth estimation, and image captioning [66]. The application of CNN is not limited to image processing tasks: it is also used in natural language processing [67], day-ahead building-level load forecasting [68], anomaly detection [69], and time series stream forecasting [70].
The idea was subsequently adopted by researchers in industrial domains, where the technology became popular. In 1998, LeCun et al. [71] proposed a design that improved on the architecture in [72]; present-day convolutional neural networks resemble LeCun's design. CNNs became widely known after winning the ILSVRC 2012 competition [73].
In recent years, many other advanced strategies, for example artificial intelligence and fuzzy logic, have been implemented for load forecasting. Intelligent solutions based on AI technologies for short-term load forecasting are becoming increasingly widespread. AI-based systems are developed and deployed worldwide in numerous applications, mainly because of their symbolic reasoning, flexibility, and explanation capabilities. For instance, Ranaweera et al. suggested a technique that uses fuzzy rules to incorporate historical weather and load data; these fuzzy rules were obtained from the historical data using a learning-type algorithm [11]. He et al. [12] introduced an architecture to quantify the uncertainty associated with electrical load and to obtain more information about future load; a neural network was then used to reconstruct a quantile regression model for probabilistic forecasting. In one study, a hybrid forecasting technique combining the wavelet transform, a neural network, and an evolutionary algorithm was suggested [74] and used for STLF. In a similar approach, Ghofrani et al. [23] presented a combination of a Bayesian neural network (BNN) and the wavelet transform (WT) to produce detailed load characteristics for BNN training. In this design, a weighted sum of the BNN outputs was used to predict the load for a particular day. In another hybrid model, the STLF design suggested in [24] combines the phase space reconstruction (PSR) algorithm with a bi-square kernel (BSK) regression model.
In this structure, the load data are reconstructed by the PSR algorithm to determine the evolutionary trends of the historical load and the embedded feature information, which increases the reliability of the forecast. The BSK model, in turn, captures the spatial structure between the regression points and their neighbouring points to obtain the rules of rotation and disturbance in each dimension. The suggested model, including multi-dimensional regression, was successfully established and used for STLF [24]. Mamlook et al. [25] used a fuzzy logic controller on an hourly basis to predict the effect of different conditional parameters, e.g., time, climate, historical load data, and random disturbances, on load forecasting in terms of fuzzy sets during the generation process. In this study, the WT decomposed the time series into its components, and each component was then forecast by a combination of a neural network and an evolutionary algorithm.

Performance Evaluation Measures
The quality of the predictor was examined by quantitatively measuring its accuracy in terms of root mean squared error (RMSE), coefficient of determination (R²), mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The following well-known error metrics, detailed in [75], are used:

Root Mean Squared Error (RMSE)
To examine the quality of a predictor, we need metrics that quantitatively measure its accuracy. In the current study, the RMSE is used for this purpose, defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$
where $x_i$ and $y_i$ denote the measured and predicted values of the i-th sample, and $n$ denotes the total number of samples of the training dataset. A smaller RMSE value denotes a better set of selected descriptors.

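Coefficient of determination (R2)
R2 can be computed using the following function:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
Here $x_i$ and $y_i$ denote the measured and predicted values of the i-th sample, and $\bar{x}$ denotes the average of the measured values over all the samples.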

Mean Square Error (MSE)
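MSE can be mathematically computed as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2 \tag{2.10}$$
where $x_i$ and $y_i$ denote the measured and predicted values of the i-th sample and $n$ is the number of samples. The MSE of an estimator measures the average of the squares of the errors or deviations. The MSE is also the second moment of the error, and so incorporates both the variance and the bias of an estimator.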

Mean Absolute Error (MAE)
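MAE measures the difference between two continuous variables. For example, if $y_i$ and $x_i$ denote the predicted and observed values of the i-th sample and $n$ is the number of samples, then MAE can be calculated as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - x_i\right| \tag{2.11}$$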

Mean Absolute Percentage Error (MAPE)
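The MAPE is computed using the following formula:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_i - y_i}{x_i}\right| \tag{2.12}$$
where $x_i$ and $y_i$ denote the observed and predicted values of the i-th sample and $n$ is the number of samples.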
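As an illustration of the five measures described above, the following is a minimal Python sketch using their standard textbook definitions. It is not the code used in this study: the function names and the toy `observed` and `predicted` lists are hypothetical, with x denoting observed values and y denoting predicted values.

```python
import math

def mse(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    return sum((x - y) ** 2 for x, y in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Square root of the MSE, in the same unit as the load."""
    return math.sqrt(mse(observed, predicted))

def mae(observed, predicted):
    """Mean of the absolute differences."""
    return sum(abs(y - x) for x, y in zip(observed, predicted)) / len(observed)

def mape(observed, predicted):
    """Mean absolute error relative to the observed value, in percent.

    Assumes no observed value is zero.
    """
    total = sum(abs((x - y) / x) for x, y in zip(observed, predicted))
    return 100.0 * total / len(observed)

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_x = sum(observed) / len(observed)
    ss_res = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    ss_tot = sum((x - mean_x) ** 2 for x in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data, not taken from the study's dataset.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

for name, metric in [("MSE", mse), ("RMSE", rmse), ("MAE", mae),
                     ("MAPE", mape), ("R2", r2)]:
    print(f"{name}: {metric(observed, predicted):.4f}")
```

Because every toy prediction is off by exactly 10 units, MSE, RMSE and MAE are easy to check by hand, while MAPE depends on the size of each observed value.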

Results
In this study, we applied machine learning and deep learning methods, namely MLP, LSTM, and CNN, to load forecasting time series data from January 01, 2008 to December 31, 2009. The performance was evaluated in terms of R2, MAPE, MAE, MSE, and RMSE. We computed the next 24 hours, 72 hours, one week, and one-month forecasts using the proposed methods. The distance between the actual and predicted values is computed. If the differences between the observed and predicted values are small and unbiased, the model fits the data well. Statistically, the goodness of fit is also assessed using residual plots, which can reveal unwanted residual patterns that indicate biased results more effectively than numbers. R-squared, also called the coefficient of determination, is a statistical measure that indicates how close the data are to the fitted model.
Table 1 reports the prediction results for the next 24 hours ahead load forecasting. Based on R-squared, the LSTM yields the lowest value with R2 (0.5160), followed by CNN with R2 (0.5462) and MLP with R2 (0.6217). Based on MAPE, the best (lowest-error) 24 hours ahead prediction was obtained using MLP with MAPE (4.97), followed by LSTM with MAPE (5.17) and CNN with MAPE (5.62). In terms of MAE, the best 24 hours ahead prediction was obtained using MLP with MAE (104.33), followed by LSTM with MAE (109.2) and CNN with MAE (115.62). In terms of MSE, the best prediction was obtained using MLP with MSE (17936.03), followed by CNN with MSE (21515.55) and LSTM with MSE (22947.59). Likewise, in terms of RMSE, the best 24 hours ahead prediction was obtained using MLP with RMSE (133.92), followed by CNN with RMSE (146.68) and LSTM with RMSE (151.48). Figure 5 shows the results of the 24-hours ahead load forecasts obtained by the three methods (MLP, LSTM, and CNN). From the extracted curves, the MLP is closest to the actual load curve, followed by LSTM and CNN.
The corresponding error values are reported in Table 1.
Table 2 reports the prediction results for the next 72 hours ahead load forecasting. Based on R-squared, the LSTM yields the lowest value with R2 (0.7153), followed by CNN with R2 (0.7176) and MLP with R2 (0.7588). Based on MAPE, the best (lowest-error) 72 hours ahead prediction was obtained using MLP with MAPE (7.04), followed by CNN with MAPE (7.44) and LSTM with MAPE (8.07). In terms of MAE, the best 72 hours ahead prediction was obtained using MLP with MAE (125.92), followed by CNN with MAE (140.21) and LSTM with MAE (144.84). In terms of MSE, the best prediction was obtained using MLP with MSE (35393.93), followed by CNN with MSE (41451.11) and LSTM with MSE (41786.91). Likewise, in terms of RMSE, the best 72 hours ahead prediction was obtained using MLP with RMSE (188.13), followed by CNN with RMSE (203.59) and LSTM with RMSE (204.42). Figure 6 shows the results of the 72-hours ahead load forecasts obtained by the three methods (MLP, LSTM, and CNN). From the extracted curves, it can be seen that the LSTM and CNN are closest to the actual load curve, followed by MLP. The corresponding error values are reported in Table 2.
Table 3 reports the prediction results for the next one week ahead load forecasting. Based on R-squared, the CNN yields the lowest value with R2 (0.7616), followed by LSTM with R2 (0.8814) and MLP with R2 (0.8879). Based on MAPE, the best (lowest-error) one-week ahead prediction was obtained using MLP with MAPE (6.162), followed by LSTM with MAPE (6.74) and CNN with MAPE (6.79). In terms of MAE, the best one-week ahead prediction was obtained using MLP with MAE (103.156), followed by LSTM with MAE (107.13) and CNN with MAE (158.27). In terms of MSE, the best prediction was obtained using MLP with MSE (22746.21), followed by LSTM with MSE (24060.60) and CNN with MSE (48390.42). Likewise, in terms of RMSE, the best one-week ahead prediction was obtained using MLP with RMSE (150.81), followed by LSTM with RMSE (155.11) and CNN with RMSE (219.97). Figure 7 shows the results of the one week ahead load forecasts obtained by the three methods (MLP, LSTM, and CNN). From the extracted curves, it can be seen that the LSTM and CNN are closest to the actual load curve, followed by MLP. The corresponding error values are reported in Table 3.
Table 4 reports the prediction results for the next 15 days (two weeks) ahead load forecasting. Based on R-squared, the CNN yields the lowest value with R2 (0.76), followed by MLP with R2 (0.87) and LSTM with R2 (0.89). Based on MAPE, the best (lowest-error) two-weeks ahead prediction was obtained using LSTM with MAPE (5.44), followed by MLP with MAPE (5.54) and CNN with MAPE (8.67). In terms of MAE, the best two-weeks ahead prediction was obtained using LSTM with MAE (93.67), followed by MLP with MAE (99.50) and CNN with MAE (141.89). In terms of MSE, the best prediction was obtained using LSTM with MSE (17395.14), followed by MLP with MSE (19963.23) and CNN with MSE (36938.71). Likewise, in terms of RMSE, the best two-weeks ahead prediction was obtained using LSTM with RMSE (131.89), followed by MLP with RMSE (141.29) and CNN with RMSE (192.19).
Table 5 reports the prediction results for the next one month ahead load forecasting. Based on R-squared, the CNN yields the lowest value with R2 (0.90), followed by MLP and LSTM, both with R2 (0.92). Based on MAPE, the best (lowest-error) one-month ahead prediction was obtained using MLP with MAPE (4.23), followed by LSTM with MAPE (4.38) and CNN with MAPE (5.1). In terms of MAE, the best one-month ahead prediction was obtained using LSTM with MAE (75.12).
Figure 9. One month ahead load forecasting using a) MLP, b) LSTM, c) CNN.
Figure 9 shows the results of the one month ahead load forecasts obtained by the three methods (MLP, LSTM, and CNN). From the extracted curves, it can be seen that the LSTM and CNN are closest to the actual load curve, followed by MLP. The corresponding error values are reported in Table 5.
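Since RMSE is by definition the square root of MSE, the MSE and RMSE values quoted in this section can be cross-checked against each other. The short Python sketch below does this for the 24 hours, 72 hours, one week, and two weeks horizons, using only numbers reported in the text. The dictionary layout and the 0.02 rounding tolerance are illustrative choices, and the one-week RMSE of 219.97 is paired with the CNN MSE, whose square root it matches.

```python
import math

# (MSE, RMSE) pairs copied from the values reported in the text.
reported = {
    ("24 hours", "MLP"): (17936.03, 133.92),
    ("24 hours", "CNN"): (21515.55, 146.68),
    ("24 hours", "LSTM"): (22947.59, 151.48),
    ("72 hours", "MLP"): (35393.93, 188.13),
    ("72 hours", "CNN"): (41451.11, 203.59),
    ("72 hours", "LSTM"): (41786.91, 204.42),
    ("one week", "MLP"): (22746.21, 150.81),
    ("one week", "LSTM"): (24060.60, 155.11),
    ("one week", "CNN"): (48390.42, 219.97),
    ("two weeks", "LSTM"): (17395.14, 131.89),
    ("two weeks", "MLP"): (19963.23, 141.29),
    ("two weeks", "CNN"): (36938.71, 192.19),
}

def rmse_gap(mse_value, rmse_value):
    """Absolute difference between sqrt(MSE) and the reported RMSE."""
    return abs(math.sqrt(mse_value) - rmse_value)

gaps = {key: rmse_gap(m, r) for key, (m, r) in reported.items()}
max_gap = max(gaps.values())

for (horizon, model), gap in gaps.items():
    print(f"{horizon:>9} {model:<4} |sqrt(MSE) - RMSE| = {gap:.4f}")

# Illustrative tolerance for two-decimal rounding of the reported values.
print("all pairs consistent:", max_gap < 0.02)
```

The same check cannot be run for the one-month horizon, because no MSE value is quoted for it in this section.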

Discussions
Accurate load forecasting can help alleviate the impact of renewable energy access to the network, help power plants arrange unit maintenance, and encourage power broker companies to develop a reasonable electricity price plan. Many activities within the power system, such as the maintenance scheduling of generators, renewable energy integration, and even investment in power plants and power grids, depend on load forecasting. In the electricity market, the regulators monitor activities based on the forecasted load and the power generators. Customers and power brokers decide their action strategies.
Convolutional neural networks (CNNs) are extensively used in forecasting. CNNs are capable of capturing trend features along with scale-invariant features when nearby data points are strongly related to one another [76]. The local trend patterns of load data in nearby hours can be extracted by a CNN. In [77], another load forecasting model that uses a CNN architecture is presented and compared with other neural networks. The results reveal that the MAPE and CV-RMSE of the proposed algorithm are 9.77% and 11.66%, the lowest among all the models. The experiments demonstrate that the CNN architecture is important for load forecasting and that hidden features can be extracted through the designed 1D convolution layers. In light of the above, both LSTM and CNN have been shown to give high-accuracy forecasts in STLF because of their ability to capture hidden features. It is therefore desirable to build a hybrid neural network framework that can capture and integrate such different hidden features to give better performance. More specifically, it consists of three parts: the LSTM module, the CNN module, and the feature fusion module. The LSTM module can learn useful information over long periods through the forget gate and memory cell, while the CNN module is used to extract local trend patterns and similar patterns that appear in different regions. The feature fusion module is used to integrate these hidden features and make the final forecast. The proposed CNN-LSTM framework was developed and applied to forecast a real-world electric load time series. In addition, several methods were implemented for comparison with our proposed model. To demonstrate the validity of the proposed model, the CNN and LSTM modules were tested separately.
Moreover, the dataset was divided into several partitions to test the robustness of the proposed model. In summary, this paper proposes a deep learning framework that can effectively capture and integrate the hidden features of the LSTM and CNN models to achieve higher accuracy and robustness.
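To make the three-part structure described above more concrete, the following is a minimal, untrained Python sketch of a CNN branch, an LSTM branch, and a fusion step. It is an illustration under stated assumptions, not the architecture used in this study: the single-unit LSTM, the global max pooling, the linear fusion, and every weight, kernel and input value are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conv1d_relu(series, kernel, bias=0.0):
    """Valid (no padding) 1D convolution followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(series[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(series) - k + 1)
    ]

def lstm_step(x, h_prev, c_prev, weights):
    """One time step of a single-unit LSTM cell.

    weights maps each gate name to (w_x, w_h, b); all values are hypothetical.
    """
    def gate(name, activation):
        w_x, w_h, b = weights[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much of the memory to expose
    g = gate("candidate", math.tanh)   # candidate memory content
    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # updated hidden state
    return h, c

def cnn_lstm_forecast(window, kernel, lstm_weights, fusion_weights):
    """Fuse a CNN feature and an LSTM feature into one forecast value."""
    # CNN branch: local patterns, summarized by global max pooling (assumption).
    cnn_feature = max(conv1d_relu(window, kernel))
    # LSTM branch: run the cell over the whole window, keep the last hidden state.
    h, c = 0.0, 0.0
    for x in window:
        h, c = lstm_step(x, h, c, lstm_weights)
    # Fusion: a simple linear combination of both features (assumption).
    w_cnn, w_lstm, b = fusion_weights
    return w_cnn * cnn_feature + w_lstm * h + b

# Hypothetical normalized load window and untrained illustrative weights.
window = [0.2, 0.4, 0.6, 0.5, 0.3]
kernel = [-1.0, 0.0, 1.0]  # responds to a local upward trend
lstm_weights = {
    "forget": (0.5, 0.5, 1.0),
    "input": (0.5, 0.5, 0.0),
    "output": (0.5, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
fusion_weights = (0.5, 0.5, 0.1)

forecast = cnn_lstm_forecast(window, kernel, lstm_weights, fusion_weights)
print(f"toy normalized forecast: {forecast:.4f}")
```

In practice the weights would be learned from the load data, and each branch would have many units and filters. The sketch only shows how the forget gate and memory cell carry information across time steps while the convolution responds to local patterns.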
In this study, we computed one-day, one-week, two-week and one-month ahead forecasts by applying MLP, LSTM and CNN. The performance was evaluated in terms of R2, MAPE, MAE, MSE and RMSE. We optimized the parameters of these algorithms to improve the STLF and MTLF performance, as reflected in Table 6. Comparing the results of our study with previous findings in terms of MAPE for one day, one week, two weeks and one month reveals that our proposed models with parameter optimization give the best ahead forecasting performance.

Conclusion
In this study, we took load data from a feeder and computed load forecasts for the next 24 hours, 72 hours, one week, two weeks, and one month to predict short-term and medium-term load demands. We optimized AI algorithms, namely MLP, LSTM, and CNN, to improve the forecasting performance. We measured the forecasting performance using robust error metrics such as squared error, MAPE, MAE, MSE, and RMSE. The smallest error between the observed and predicted values at each forecast horizon indicates the model that best fits the data for the next 24 hours, 72 hours, one week and one-month load forecast. The smallest value of the R2 statistical measure indicates how closely the data are fitted. R-squared yields the smallest error in all these cases, followed by MAPE, MAE, RMSE, and MSE. For STLF, the MLP and LSTM give better forecasts, whereas for MTLF the CNN and LSTM give better forecasts. This indicates that, as data demands grow, deep learning models with a high number of neurons and optimized activation functions provide better predictions. The results indicate that the proposed approach of optimizing the deep learning models yielded good predictions for short-term and medium-term forecasting. This indicates that power systems, with their complexity, growth, and other influential factors in power generation and consumption, power planning, etc., can be better predicted using this approach.

Limitations of study and future recommendations
Various factors affect load growth, such as the peak hours of the day during which electricity demand increases, or environmental factors that affect energy demand. Currently, we computed ahead load demands using the proposed models on data from January 2008 to December 2008 and July to December 2009 from one grid. In future work, we will consider each of these factors separately to examine their effects on load growth.