
Next-day load forecasting is complex because load patterns vary with external factors such as weather and the calendar. This study proposes a hybrid model that combines the Classification And Regression Tree (CART) with pruning conditions and a Deep Belief Network (DBN) to improve forecasting accuracy. The CART recognizes load patterns by grouping similar days into low-variance classes, which reduces the complexity of the forecasting model. Actual 48-period load data from the Electricity Generating Authority of Thailand (EGAT) is used. The proposed model is compared with five widely used standalone forecasting benchmark models and achieves better performance, with a minimum MAPE of 0.46%. Moreover, the forecasting performance of DBN and the other four benchmark models is improved by our hybrid approach.

INDEX TERMS Classification and Regression Tree (CART), Daily load forecasting, Deep Belief Network (DBN), Forecasting accuracy, Pruned-CART.


I. INTRODUCTION
Load forecasting plays a vital role in the planning and operation of electric generators, since generating units must be scheduled to balance supply and demand. By horizon, it can be divided into short-term, medium-term, and long-term forecasting [1]. Forecasting annual electrical load is considered long-term, and monthly forecasting is considered medium-term. Daily load forecasting, ranging from hours to weeks, is called short-term load forecasting. An accurate forecasting model can reduce operating cost, as fewer standby generators are needed for spinning reserve. However, the reliability of the predicted results depends on the input features and the model fitting.
Daily load patterns depend on myriad factors that change incessantly, such as weather effects, seasonal effects, and calendar effects. Weather effects typically include temperature, sunshine, cloud cover, and humidity, while seasonal effects comprise spring, summer, fall, and winter. Among these, temperature has a significant impact on consumers' electric load demand [1]. Calendar effects vary from day to day, month to month, and on holidays. Thus, understanding the nonlinear relationship between load patterns and influential variables can enhance electric load forecasting performance. This paper proposes the classification and regression tree (CART) with predefined pruning conditions to classify the patterns of electrical data. The proposed classification depends on calendar variables such as day of the week, month of the year, holidays, and bridging holidays.

A. RELATED WORKS
Pattern classification was introduced in the late 20th century and later applied to power system load forecasting [2]-[4]. Load pattern clustering for electricity customers was conducted using defined harmonic-based features, grouped by minimizing a modified Euclidean distance. The primary purpose of that system is to store electricity customers' big data in manufacturer databases in a classified manner. The scholars developed a clustering algorithm to store refined data according to the time-frequency domain. This technique reduced the data by selecting and storing essential features in the database for each cluster, using the harmonic order in terms of the time zone to cluster the customers' data [5]. In addition, Zhong and Tam proposed characteristic attributes in the frequency domain (CAFD), which help create a classification tree for load profiles. This CAFD classification method was tested with different complexity-parameter values to fine-tune the number of final leaf nodes of the tree. The cited work is a hierarchical classification of load profiles according to the frequency domain [6].
Ferreira et al. evidenced that recognizing load patterns could improve the reliability of the electric power sector and support decision-making levels in management. In their research, the whole sample was clustered using specific features at the first iteration; next, the similarity between the medians of each group and the median of the first iteration is verified to cluster some patterns in further iterations [7]. Kotriwala et al. combined load patterns and power-zone attributes for clustering power-zones and used the k-means algorithm for classifying the load patterns. They used logistic regression (LR) and a support vector machine (SVM) for classification and a linear regression model for forecasting the power load. They revealed that load classification could optimize the system efficiency of power generation planning [8]. Yin et al. proposed deep forest regression, wherein a random forest with the Gini index parameter was applied for both classification and regression to improve the forecasting accuracy of the power system. However, the random forest (RF) algorithm chooses both the sample data and the features randomly, which causes sample duplication and the loss of essential features [9].
Classification is one of the significant components of data mining techniques. The prime task is to analyze a set of provided data and generate essential rules, which can be used to classify future data. Many applicable classification models exist, including the k-means algorithm, SVM, the naive Bayesian classifier, the artificial neural network (ANN), and the decision tree (DT) [10]. These models are generally useful for both classification and regression problems. The k-means algorithm was used for clustering hourly load data to improve a functional linear forecasting model. It optimized the number of clusters, denoted as G, by computing the coefficient of determination and the G-dimensional function of the regression model. That approach used a fixed number of groups for classification and functional regression for forecasting. The method is complex and only works within a parametric system such as parametric regression models [11].
Moreover, load patterns were clustered using threshold values of different average loads (30 MW and 60 MW) with respect to day types between train and test data. The average load was then predicted using the SVM model. However, they used manual classification in terms of day type [12]. Zhang et al. combined two models, where CART with the Gini index parameter was used to build the tree analysis and the SVM model was used for forecasting one week of test data. They performed short-term load forecasting (STLF) on smart meter data using big data analytics technologies. However, an overfitting problem arose while training the machine learning (ML) model [13]. Besides, Moon et al. proposed a hybrid model for STLF by combining RF and a multilayer perceptron (MLP). The RF model selects a subset of variables from the whole original set. To form the optimal trees, this model tunes the total number of trees to be generated (nTree) and the number of variables sampled at each split (mTry). Nevertheless, although they tried various ML models with different configurations, their errors remained high [14].
Much research on classifiers or clustering for electricity load forecasting has been conducted over the last decade. Among popular classification models, the DT is a promising one. Many algorithms such as Iterative Dichotomiser 3 (ID3), the C4.5 algorithm, RF, and CART are used to create DTs [15]. Srivastava et al. conducted STLF using DT algorithms, for instance RF, bagging, and the M5P tree algorithm, and compared the results for each algorithm. In their research, the bagging model trained a DT on each bootstrap training sample and selected the final bagged model with the minimum mean for prediction. Nonetheless, they applied all models with default configurations for regression performance [16]. Loh also reviewed comprehensive classification tree algorithms for classification and regression algorithms for prediction, comparing their strengths and weaknesses [17]. Among the above-discussed algorithms, CART is considered one of the most popular models, efficiently handling both classification and regression problems [4].
Mori and Kosemura proposed optimizing the CART with the tabu search algorithm at each split node and pruning it by computing cross-validation and standard errors [18]. Similarly, Mori et al. used simplified fuzzy inference to find efficient rules from actual data for CART and predicted one-step-ahead load with an MLP. Their CART model classified the data into only two terminal nodes, which can cause impurity of the load profiles [19]. Hambali et al. worked with three decision tree algorithms for forecasting electrical power load: CART, the reduced error pruning tree (REPTree), and the decision stump (DS). In the cited work, CART and REPTree used the Gini index and regression logic tree, while DS measures the dataset's entropy with respect to each attribute. These tree models, however, were applied for regression purposes [20]. Besides, Lei et al. combined the k-means algorithm for clustering with RF and CART for regression and implemented these algorithms on the Apache Spark machine learning library (MLlib). However, the data processing ability of the model needs further optimization to improve forecasting performance [21]. Additionally, Guo used CART to classify load patterns and an "If...Then" structure to create pattern rules, measuring the misclassification risk and complexity to prune the CART. A simple ANN model was used for load forecasting, which requires considerable computation power and training time [22].
Many researchers have cited the advantages of tree pruning since the late 1990s. In this regard, Shah and Sastry proposed Alopex, learning automata (LA), and a genetic algorithm (GA) for pruning trees on two classification problems [23]. Alternatively, pruning the CART algorithm can support penalized regression model selection by proving risk bounds [24]. Guo and Niu proposed CART pruning by calculating the cost-complexity measure, which chooses the minimal sum of squared residuals together with the complexity parameter to obtain the final pruned tree. Likewise, the cost-complexity function is used for trimming the CART in load pattern recognition, where the data is first clustered using fuzzy C-means clustering to increase forecasting accuracy [25]. Feature selection and input data reduction can also be achieved by using a classification model with a predefined set of clusters to assign new electrical load customers [26]. Hanif et al. proposed the optimal constrained pruning of decision trees, applied it to a large-scale transportation analysis and simulation system (TRANSIMS), and revealed the flexibility and effectiveness of the approach. For these reasons, adding objective functions and constraints can optimally prune the original decision tree [27].
Based on the above-cited works, it is evident that forecasting performance can be improved by an appropriate classification model. Therefore, CART appears to be a promising approach for grouping similar load profiles and thereby improving the model's forecasting performance. Using the CART algorithm eliminates the data preprocessing stage and can also reduce data imbalance issues. In addition, pruning the decision tree reduces its size and the overfitting problem, which might slightly increase the training error while decreasing the testing error. As a result, the pruned tree is more efficient, accurate, and understandable than the original unpruned tree.
Since the start of the 21st century, neural network-based models have been successfully deployed in the field of load forecasting. These models can be divided into traditional ML and deep learning (DL). Popular ML forecasting models are regression trees, the gradient boosting model [22], [28], the bagged neural network (BNN) [29], the ANN, and the SVM [30]-[32]. The above-cited models, applied to different data, have achieved good accuracy for STLF. Nevertheless, DL models have become more popular because of exceptional capabilities such as handling nonlinear relationships, and advantages in model complexity and computation time over ML models. Schmidhuber surveyed deep ANNs, unsupervised and reinforcement learning, and evolutionary computation to clarify the use of DL models [33]. The recurrent neural network (RNN) [34], long short-term memory (LSTM) [35], the convolutional neural network (CNN) [36], the deep neural network (DNN) [37], and the deep belief network (DBN) [38] have all been investigated for STLF. The forecasting performance of these DL models was satisfactory compared with traditional statistical time series and ML models across different case studies.
Furthermore, many researchers have performed STLF using hybrid models that combine two or more models. The hybrid approach appears to be a viable choice depending on the input structure, selected method, and application. Kouhi applied a cascaded neural network consisting of an intelligent two-stage feature selection and a forecasting engine of three cascaded neural network structures [39]. In the same way, GA operations within particle swarm optimization (PSO) and Bayesian optimization (BO) were applied to achieve better forecasting accuracy for the ANN and SVM, respectively. However, these ML models incur high computation time [40], [41].
As mentioned above, hybrid models provide promising results compared to standalone models in terms of accuracy. Most hybrid studies combine two or more models to perform load forecasting; however, this approach still requires manual data preprocessing. In our work, we use CART to group similar load patterns and reduce variance, so there is no need for data preprocessing or handling data imbalance issues. Another benefit of our approach is decision tree pruning, which reduces overfitting and provides a balanced training set for the forecasting model. Additionally, our DBN forecasting model has several benefits, such as avoiding the vanishing gradient problem, its pre-training and fine-tuning process, and less computation time than other NNs. A summary of the literature on classification models for electricity load is listed in the last table in the Appendix.

B. CONTRIBUTIONS
For the first time, a mixture of classification and regression models is proposed to predict short-term electrical loads. The historical electrical load data is classified using the CART model, which uses the mean squared error (MSE) criterion to group similar load data with low variance. Because of this, the proposed hybrid model omits the data preprocessing module that is compulsory in conventional electrical load forecasting models. Predefined conditions are created to prune the original CART, because many irregular load patterns occur on special days, such as holidays or the days before or after holidays, which CART splits off separately. This issue could impair the training of the forecasting model, so we prune the original CART with predefined algorithms. A separate DBN forecasting model is also built for each terminal node of the CART to train and test the input data. The forecasting performance of this DBN model is better than that of benchmark models trained on manually classified data based on day of the week (DoW), since the grouped input load data have similar variance. Finally, an extensive comparison is carried out between our proposed hybrid model and other baseline models to prove the validity of the proposed hybrid electrical load forecasting model.

C. PAPER ORGANIZATION
The remainder of the article is organized as follows. Section II describes the system modeling of the proposed hybrid model. Section III gives the details of the design of experiments; the outcomes of the experiments with comparative analysis are also provided in Section III. The conclusion is provided in Section IV.

II. SYSTEM MODELING
This section describes the whole system of the proposed hybrid model, including brief descriptions of both the CART and DBN models. The proposed framework consists of three steps, namely the classification module, the forecasting-model training module, and accuracy measurement on test data, as exhibited in Figure 1. The CART, pruned-CART, and DBN regression processes are further explained in separate subsections.
• First, the classification module arranges the training and testing datasets of independent and dependent variables according to the case studies. The training data is then provided to the classification module to build the training CART. The original CART is grown by splitting on the independent variables until the leaf nodes of the target load data are homogeneous. The original CART is then pruned based on predefined constraints, and pruning continues until all leaf nodes satisfy the requirements. After the final pruned-CART is created, a test set, including both independent and dependent variables, is fed into it; each test sample falls into its respective terminal node based on its independent variables.
• Second, the forecasting-model training module builds a separate DBN regression model for each terminal node generated by the pruned-CART. Each forecasting model is then trained on its node's training set with the adjusted parameters.
• Finally, each trained DBN model predicts its portion of the test set, and error metrics are calculated on the test data to measure forecasting accuracy.
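As an illustration of these three steps, the following sketch uses scikit-learn's `DecisionTreeRegressor` for the classification module and an MLP as a stand-in for the per-leaf DBN (all data, names, and parameters here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor  # stand-in for the DBN

rng = np.random.default_rng(0)
# Toy calendar features [Hol, B_Hol, DoW] and a daily peak-load target.
X = np.column_stack([rng.integers(0, 2, 200),      # Hol: 0/1
                     rng.integers(0, 2, 200),      # B_Hol: 0/1
                     rng.integers(1, 8, 200)]).astype(float)  # DoW: 1..7
y = 1000 + 50 * X[:, 2] + rng.normal(0, 5, 200)

# Step 1: classification module -- CART groups days with similar load.
cart = DecisionTreeRegressor(min_samples_leaf=15).fit(X, y)
leaf_of = cart.apply(X)                      # terminal-node id per day

# Step 2: one forecasting model per terminal node.
models = {}
for leaf in np.unique(leaf_of):
    idx = leaf_of == leaf
    models[leaf] = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500,
                                random_state=0).fit(X[idx], y[idx])

# Step 3: route each test day to its leaf's model and predict.
X_test = X[:5]
pred = np.array([models[l].predict(x.reshape(1, -1))[0]
                 for l, x in zip(cart.apply(X_test), X_test)])
print(pred.shape)
```

The key point is `cart.apply(...)`, which returns the terminal-node id for each sample, so training and test days are routed to the same per-leaf forecaster.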

A. CLASSIFICATION AND REGRESSION TREE (CART)
CART is one of the decision tree models that builds either a regression or a classification model in the form of a tree structure [42]. An example of CART in the framework is shown in Figure 2, where a CART is a pair $G = (N, E)$, in which $N$ is the set of nodes and $E$ is the set of edges. Suppose there is a sample CART as shown in Figure 1, where $\{n_1, n_2, \dots, n_{11}\}$ are the nodes and $\{e_1, e_2, \dots, e_{10}\}$ are the edges of the CART. The leaf nodes are $l_1, l_2, l_3, l_4, l_5$, and $l_6$, whereas an edge connects two nodes; for example, $e_1 = (n_7, l_1)$ is an edge going from $n_7$ to $l_1$. CART splits a node into two or more sub-nodes on the independent variables so as to minimize the total data variance in the sub-nodes; the variable whose split yields the minimum criterion value is chosen as the best split variable. In our case, the reduction-in-variance criterion is fitted for the continuous data type. Therefore, the MSE is minimized to group similar load patterns along with the feature selection, which is expressed mathematically as

$$\mathrm{MSE}_l = \frac{1}{48} \sum_{i=1}^{48} \mathrm{MSE}_{l,i},$$

where $D_l = \{\bar{d}_1, \bar{d}_2, \dots, \bar{d}_{|S_l|}\}$ is the set of load demand vectors in node $l$; $\bar{d}_j = \{d_{j,1}, d_{j,2}, \dots, d_{j,48}\}$ is the load demand vector of day $j$ in node $l$, with $d_{j,i}$ the load demand on day $j$ at period $i$; $S_l$ is the set of days in node $l$; $\mu_{l,i}$ is the average load demand at period $i$ in node $l$; and $\{\mathrm{MSE}_{l,1}, \dots, \mathrm{MSE}_{l,48}\}$ is the set of mean squared errors of load demand for each period. The MSE of demand at period $i$ in node $l$ is given as

$$\mathrm{MSE}_{l,i} = \frac{1}{|S_l|} \sum_{j \in S_l} \left(d_{j,i} - \mu_{l,i}\right)^2.$$
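The node criterion described above can be computed in a few lines (a sketch with synthetic data; the function names are mine):

```python
import numpy as np

def node_mse(loads):
    """MSE of a node: loads is a (days x 48) array of half-hourly demand.
    For each period i, average the squared deviation from the node mean,
    then average over the 48 periods."""
    mu = loads.mean(axis=0)                         # mean demand per period
    per_period = ((loads - mu) ** 2).mean(axis=0)   # MSE_{l,i}
    return per_period.mean()                        # MSE_l

def split_gain(parent, left, right):
    """Variance reduction of a candidate split: parent MSE minus the
    sample-weighted MSE of the two children."""
    n = len(parent)
    child = (len(left) * node_mse(left) + len(right) * node_mse(right)) / n
    return node_mse(parent) - child

rng = np.random.default_rng(1)
node = rng.normal(2000, 100, size=(30, 48))         # 30 days in one node
print(round(node_mse(node), 2))
```

Because the per-period MSE is the (biased) within-node variance, any split's weighted child MSE never exceeds the parent's, so `split_gain` is non-negative; CART picks the split with the largest gain.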

B. PROCESS OF PRUNED-CART
Pruning methods generally aim to solve overfitting problems and reduce the size of the tree, which might slightly increase the training error while decreasing the testing error. Moreover, the pruned tree is more efficient, accurate, and understandable than the original unpruned tree. Hence, the pruned-CART is proposed to mitigate overfitting when training the forecasting model. The original CART is checked recursively on each leaf to see whether it satisfies the predefined constraints; if a leaf node does not, it is pruned (deleted) from the tree:

1: for each $l \in L$
2:   if $|S_l| < \theta$ (the minimum sample threshold, here 15)
3:     for each $n \in N$
4:       if $child(n) = l$ then $p \leftarrow n$ end if   (find the parent $p$ of $l$)
5:     end for
6:     $R \leftarrow \emptyset$ (i.e., $R$ is the set of removed edges)
7:     for each $e \in E$
8:       if $e$ is incident to a child of $p$ then $R \leftarrow R \cup \{e\}$ end if
9:     end for
10:    $E \leftarrow E \setminus R$; the children of $p$ are deleted, and $p$ joins the leaf set $L$
11:  end if
12: end for

For example, the edges $e_1$ and $e_2$ are deleted from set $E$; node $n_7$ then becomes a leaf node of set $L$. Similarly, $e_3$ (resp. $e_5$) and $e_4$ (resp. $e_6$) are pruned because the sample of $l_4$ (resp. $l_5$) has fewer than 15 days and the two leaves share the same parent. Therefore, $n_8$ (resp. $n_9$) become leaf nodes of set $L$, and the edges $e_3$ (resp. $e_5$) and $e_4$ (resp. $e_6$) are deleted from set $E$.
After pruning, the first pruned-CART is obtained, as demonstrated in Figure 3(b); it has five nodes and four edges. The one-layer-deep pruned-CART is pruned again recursively using recursive-pruned-CART, where the samples of all leaf nodes are checked using check-sample. Ultimately, the pruned-CART ends with three nodes and two edges, as shown in Figure 3(c). The two leaf nodes of the pruned-CART have enough samples to train the forecasting model.
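The recursive pruning described above can be sketched on a toy tree representation (the 15-sample threshold follows the paper's constraint; the dict layout and node ids are illustrative assumptions):

```python
MIN_SAMPLES = 15  # predefined constraint from the paper

def prune(tree, node=0):
    """Recursively prune: if either child leaf of `node` holds fewer than
    MIN_SAMPLES training days, drop both children so `node` becomes a leaf.
    `tree` maps node id -> {'samples': int, 'children': [id, id] or []}."""
    kids = tree[node]['children']
    if not kids:
        return
    for k in kids:
        prune(tree, k)
    # after recursion, prune if any child is a leaf with too few samples
    if any(not tree[k]['children'] and tree[k]['samples'] < MIN_SAMPLES
           for k in kids):
        for k in kids:
            del tree[k]
        tree[node]['children'] = []

# Toy example: the root's children hold 10 and 20 training days.
tree = {0: {'samples': 30, 'children': [1, 2]},
        1: {'samples': 10, 'children': []},
        2: {'samples': 20, 'children': []}}
prune(tree)
print(tree[0]['children'])  # []  -- node 0 became a leaf
```

Because the check runs after the recursive calls, a parent whose children were just merged can itself be merged on the way back up, which is how the tree of Figure 3(b) collapses further into Figure 3(c).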

C. DEEP BELIEF NETWORK (DBN)
The DBN model was introduced by Hinton [43]. It has become popular in forecasting with real-world data such as sunspot data, exchange rate data, electricity load data, and generation data [43]. It is also combined with traditional time series models to optimize forecasting performance [44]. The DBN reconstructs its inputs probabilistically and stacks several learning modules with low complexity. It mainly consists of restricted Boltzmann machines (RBMs), which use unsupervised learning to pre-train the network one pair of layers at a time. As depicted in Figure 4, the RBM is a two-layer neural network containing a visible layer and a hidden layer with Boolean hidden units. This pre-trained, energy-based generative model can learn a probability distribution over a set of inputs. Because there are no interconnections within the same layer, the RBM has a single hidden layer symmetrically connected to the visible layer, so its neurons form a bipartite graph. The layer-wise learning process and the probability distribution of the configurations [45] are written as

$$E(v, h) = -\sum_{i=1}^{n_v} a_i v_i - \sum_{j=1}^{n_h} b_j h_j - \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} v_i w_{i,j} h_j,$$

$$P(v, h) = \frac{e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}},$$

where $v_i$ is the binary state of the $i$th neuron in the visible layer, $n_v$ is the number of neurons in the visible layer, $h_j$ is the binary state of the $j$th Boolean neuron in the hidden layer, $n_h$ is the number of neurons in the hidden layer, $w_{i,j}$ is the weight matrix between the visible and hidden layers, and $a, b$ are the bias vectors of the visible and hidden layers. The activation functions of the visible and hidden layers are described as

$$P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_{i=1}^{n_v} v_i w_{i,j}\right), \qquad P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_{j=1}^{n_h} h_j w_{i,j}\right),$$

where $\sigma$ refers to the sigmoid activation function [46]. The following steps describe how the DBN regression model predicts the future daily load.
• After achieving the pruned-CART process, different DBN models for each terminal node are created for training.
• The same training parameters are applied to train all DBN models of each terminal node of the pruned-CART in the training process.
• Testing data from Apr 2020 to Mar 2021 are predicted by the respective trained model of each leaf node, $DBN_k$ ($1 \le k \le K$, with $K$ the number of terminal nodes).
• Error metrics measure the forecasting performance on the test data.
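The RBM conditionals used in the pre-training step can be illustrated with NumPy (a sketch with random, untrained weights; the layer sizes are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n_v, n_h = 6, 4                       # visible / hidden units
W = rng.normal(0, 0.1, (n_v, n_h))    # weight matrix w_{ij}
a = np.zeros(n_v)                     # visible-layer bias
b = np.zeros(n_h)                     # hidden-layer bias

v = rng.integers(0, 2, n_v).astype(float)   # a binary visible vector
p_h = sigmoid(b + v @ W)                    # P(h_j = 1 | v)
h = (rng.random(n_h) < p_h).astype(float)   # sample hidden states
p_v = sigmoid(a + h @ W.T)                  # P(v_i = 1 | h): reconstruction
print(p_h.shape, p_v.shape)
```

One contrastive-divergence step alternates exactly these two conditionals: sample h from P(h|v), then reconstruct v from P(v|h).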

III. DESIGN OF EXPERIMENTS, RESULTS, AND DISCUSSIONS
The section highlights the necessary steps involved in the design of experiments, and later, the discussions of experimental results are also provided.

A. EXPERIMENTAL DATA
Electrical data recorded every thirty minutes per day is provided by the Electricity Generating Authority of Thailand (EGAT). The net peak load data in megawatts (MW) from Apr 2018 to Mar 2021 is used to train the proposed model. Because of outliers in specific periods, the data is first categorized using CART, which improves the forecasting accuracy. Figure 5 illustrates the load patterns for the first week of May 2018 to reveal the difference between weekdays and weekends. Among the weekdays there is one holiday, International Workers' Day, on May 1. The load on this holiday behaves differently from other regular days. In general, the load pattern on Saturday is almost like that of weekdays. Nevertheless, load demand on Sunday is lower than on other days, so its load pattern fluctuates inversely.

B. DATA SEGMENTATION
The dataset is divided into a training set from Apr 2018 to Mar 2020 and a testing set from Apr 2020 to Mar 2021. The whole training set is used to build the original CART, grouping similar load patterns with low variance. Leaf nodes of this CART that lack enough training samples are then pruned under the predefined conditions. Next, the test samples are fed into the model one at a time. For instance, if the first leaf node of the tree groups only Mondays in the training set and the first test sample is a Monday, it falls into that first leaf node. Each following test sample is likewise routed into its associated leaf node. The whole test set slides forward in this manner until the end of the classification process; this walk-forward train-and-test routine is shown in Figure 6.

C. DATA ARRANGEMENT
Input selection is one of the crucial factors in network modeling for forecasting purposes. Furthermore, the input variables vary from day to day, week to week, and month to month. Consequently, the following independent and dependent variables are applied to train the CART and handle the nonlinear relationship between input and output variables. The independent variables are holiday or non-holiday (Hol), bridging holiday or non-bridging holiday (B_Hol), day of the week (DoW), and month of the year (MoY). Binary variables (1 or 0) are used for Hol and B_Hol. For the DoW variable, 1 is Monday, 2 is Tuesday, and so on. The MoY variable uses 1 for Jan, 2 for Feb, 3 for Mar, and so forth. The dependent variable of the CART is the actual load demand, which is a continuous data type.
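The encoding of the independent variables can be written directly (a sketch; the holiday sets here are illustrative placeholders):

```python
from datetime import date

HOLIDAYS = {date(2018, 5, 1)}            # e.g., International Workers' Day
BRIDGE_HOLIDAYS = set()                  # illustrative; normally precomputed

def calendar_features(d):
    """Encode one day as [Hol, B_Hol, DoW, MoY] per the scheme above."""
    hol = int(d in HOLIDAYS)             # 1 = holiday, 0 = non-holiday
    b_hol = int(d in BRIDGE_HOLIDAYS)    # 1 = bridging holiday
    dow = d.isoweekday()                 # 1 = Monday ... 7 = Sunday
    moy = d.month                        # 1 = Jan ... 12 = Dec
    return [hol, b_hol, dow, moy]

print(calendar_features(date(2018, 5, 1)))  # [1, 0, 2, 5] -- a Tuesday holiday in May
```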
For the DBN forecasting model, the input variables are yesterday's load demand (YDt(d-1)) from the same leaf, Hol, B_Hol, DoW, MoY, and the seasonal index (SI), used to predict the daily forecasted load demand (FDt(d)). The SI for each period is calculated by dividing the actual load demand of the period by the average daily load demand. This paper conducts two cases to train both the classification and forecasting models with different variables: the former case includes neither the MoY nor the SI variable, whereas the latter includes both. The same train and test durations are used for both cases. The parameters for each DBN model are: (i) hidden layer = 10, (ii) learning rate of RBM = 0.001, (iii) number of epochs for RBM = 10, (iv) number of iterations for backpropagation = 100, (v) batch size = 50, and (vi) activation function = rectified linear unit (ReLU).
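The seasonal index described above (period demand divided by the day's average demand) amounts to:

```python
import numpy as np

def seasonal_index(day_load):
    """SI for each half-hour period: actual demand of the period divided
    by the average demand of the whole day (48 periods)."""
    return day_load / day_load.mean()

day = np.array([2000.0] * 24 + [3000.0] * 24)   # toy 48-period profile
si = seasonal_index(day)
print(round(si[0], 2), round(si[-1], 2))  # 0.8 1.2
```

By construction the SI averages to 1.0 over the day, so it expresses each period's demand relative to the daily mean.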
Alternatively, the same forecasting input features of the DBN model are used to train all forecasting benchmark models, with the training parameters of each benchmark model as follows. The two input structures of CART are also used for manual classification (MC). The DNN model uses the ReLU function with 50 hidden layers, 100 training cycles, ten epochs, and stochastic gradient descent (SGD) for backpropagation. The ANN model applies a sigmoid activation function, two hidden layers, a 0.01 learning rate, 0.9 momentum, and 100 training cycles. The SVM model uses a dot kernel and a convergence epsilon of 0.001 to optimize its parameters. The LR model is trained with M5 prime for feature selection.

1) DATA ARRANGEMENT FOR CASE I
For the classification module in case I, three independent variables (Hol, B_Hol, and DoW) are included to create the CART. The forecasting model, on the other hand, has four input variables (Hol, B_Hol, DoW, and YDt(d-1)) to forecast $FD_4(d)$, the forecasted load demand with four inputs. The forecasting equation for case I is denoted in Eq. 8,

$$FD_4(d) = c + \beta_1 \cdot Hol + \beta_2 \cdot B\_Hol + \beta_3 \cdot DoW + \beta_4 \cdot YD_t(d-1),$$

where $c$ is a constant and $\beta_1, \dots, \beta_4$ are the coefficients of each input. The sample data arrangement of leaf node 1 for case I is indicated in Table I.

D. EXPERIMENTAL RESULTS
The mean absolute percentage error (MAPE) and MSE are calculated to measure forecasting accuracy. These error metrics reveal how far the forecasted demand deviates from the actual demand, according to Eq. 10 and Eq. 11. For both cases, the proposed model is compared to DBN with MC. Furthermore, the results of the proposed model are compared with other standalone benchmark models: DNN, ANN, SVM, and LR. In addition, we combine CART with all standalone benchmark models to verify that adding classification improves the forecasting models. Five categories are considered, namely weekdays (Tuesday-Friday), weekends, Mondays, holidays, and bridging holidays (BH), to make the monthly MAPE and MSE comparisons for each case. Before discussing the MAPE and MSE comparisons, the differences between the CART before and after pruning are explained for both cases.
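Eq. 10 and Eq. 11 are the standard MAPE and MSE definitions; a minimal implementation:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (Eq. 10)."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def mse(actual, forecast):
    """Mean squared error (Eq. 11)."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean((actual - forecast) ** 2)

a = [2000.0, 2500.0, 3000.0]   # actual half-hourly demand (toy values)
f = [1990.0, 2525.0, 2970.0]   # forecasted demand
print(round(mape(a, f), 3), round(mse(a, f), 1))  # 0.833 541.7
```

Note that MAPE is unit-free (a percentage), whereas MSE carries the squared unit of the load data.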

1) CASE I: DIFFERENCE BETWEEN ORIGINAL CART AND PRUNED-CART
In case I, the original CART is primarily built on the DoW independent variable, and holidays are also detected separately. This CART produces twelve terminal nodes at the last depth layer before pruning, as indicated in Figure 7, where x[0], x[1], and x[2] refer to the independent input variables Hol, B_Hol, and DoW, respectively. The five nodes with black boxes are the final terminal nodes after pruning the CART. The first leaf node contains only the Monday load group, with 88 training days excluding holidays. The group of Tuesdays, Wednesdays, Thursdays, and Fridays forms the second terminal node, with 390 training days. The third and fifth leaf nodes group the loads of Saturdays without holidays and of Sundays including holidays, respectively. The load group for holidays from Monday to Saturday forms the fourth leaf node. The details of each leaf node's load group, total training days, and categories of independent variables for case I are given in Table III. Clearly, CART works very well for the classification module.

2) CASE II: DIFFERENCE BETWEEN ORIGINAL CART AND PRUNED-CART
In case II, the additional MoY variable is given in the training process of the CART, so the seasonality of the data affects the tree. There are twenty-two terminal nodes in the last depth layer of the original CART; only six terminal nodes remain after pruning, as shown in Figure 8, where x[0], x[1], x[2], and x[3] represent the independent variables Hol, B_Hol, DoW, and MoY, respectively. In this case, the tree is mainly classified by the MoY variable. Leaf 1 groups the loads from Monday to Saturday in January, amounting to 52 days. The loads from Mondays to Saturdays in February-June, July-November, and December are categorized into leaf 2, leaf 3, and leaf 4, with 238, 250, and 45 training days, respectively. The fifth leaf node holds the holiday loads of the training set, 41 days in total. The last terminal node contains only the Sunday loads from January to December, including holidays, 105 days in total. The details of the load groups under DoW and MoY, total training days, and categories of independent variables for case II are indicated in Table IV.

4) MONTHLY MAPE AND MSE COMPARISONS FOR CASE II
For case II, the monthly MAPE and MSE comparisons of the test predictions between CART and MC across the five categories are presented in Table VII and Table VIII, respectively. In this case, the performance of the proposed model for weekdays is improved. Moreover, the BH category indicates that CART is still better than MC, as in case I. However, the MAPEs and MSEs of CART for holidays are much higher than those of MC in Apr, Jan, Feb, and Dec. This indicates that adding the MoY and SI variables worsens the proposed model's forecasting performance, whereas it improves that of the standalone forecasting model.

5) MONTHLY MAPE AND MSE COMPARISONS BETWEEN CASE I AND CASE II
The error comparisons of the proposed model between case I and case II are enumerated in Table IX and Table X. The experimental results show that the additional input features cannot improve the forecasting performance even though they suit the classification model. The total average error percentage of $FD_4(d)$ outperforms $FD_6(d)$ in all groups except weekdays, where the MAPE performance of $FD_4(d)$ deteriorates in comparison to $FD_6(d)$. In the $FD_4(d)$ case, the error performance is about 5%.

6) MAPE COMPARISON BETWEEN THE PROPOSED MODEL AND STANDALONE FORECASTING MODELS
Comparative analysis between the proposed and benchmark standalone forecasting models is also performed, where MAPE is evaluated using case I. The opted benchmark models are DBN, DNN, ANN, SVM, and LR. Four input features and MC are used for training all benchmark models on test predictions from Apr 2020 to Mar 2021. The analysis of results is exhibited in Figure 9 and Figure 10. The proposed model outperforms the benchmark models for weekdays, at around 4% MAPE on total average. CART with DBN and the DBN model give about 6% MAPE, while the other models deliver around 7% on weekends. For the Monday group, the proposed model performs worse than all baseline models except SVM. CART with DBN ranks as the second-best model in the holiday group, at nearly 12%. ANN and LR give better accuracy than the others for the BH category, although the proposed model provides a reasonable error of approximately 4%. This reveals that the external input factors hinder accuracy improvements for the holiday and Monday groups.

7) ERROR COMPARISONS BETWEEN HYBRID AND STANDALONE MODELS
All standalone benchmark models are combined with the CART model to measure the improvement in forecasting performance. Two error metrics are therefore computed on the whole testing set for all the alternative hybrid and standalone models and compared. Overall, all forecasting models achieve better accuracy with our approach, at approximately 5% MAPE and 2500 GW MSE, as represented in Table XI and Table XII. Apart from the errors of the standalone DBN model in case I, the combination of CART with the five forecasting models improves performance compared with all standalone forecasting models. Consequently, the use of the classification algorithm can improve short-term forecasting performance.

IV. CONCLUSION
In conclusion, this article focuses on daily load forecasting for the electric industry. The CART model is proposed to classify the load data and handle the nonlinear relationships between input and output variables. Moreover, grouping similar load patterns helps the forecasting model's training process generalize to unseen data. The DBN forecasting model is applied to predict the daily load demand, using historical load data provided by EGAT. Our proposed model has been compared with five standalone forecasting benchmark models using two error metrics; it outperforms all benchmark models, giving a minimum MAPE of 0.462%. Additionally, the CART is combined with all forecasting models and their accuracies are measured. Consequently, the combination of classification and forecasting models ensures improved accuracy in STLF. Besides, the CART captures the seasonality effect in its classification better than MC.