Equation Based New Methods for Residential Load Forecasting

This work proposes two non-linear and one linear equation-based systems for residential load forecasting that consider heating degree days (HDD), cooling degree days (CDD), occupancy, and day type, and that are applicable to any residential building with a small set of smart meter data. The coefficients of the proposed non-linear and linear equations are tuned by particle swarm optimization (PSO) and the multiple linear regression method, respectively. For comparison, a subtractive clustering based adaptive neuro fuzzy inference system (ANFIS), random forests, gradient boosting trees, a long short-term memory (LSTM) neural network, and conventional and modified support vector regression methods were considered. Simulations were performed in the MATLAB environment, and all the methods were tested on 30 randomly chosen days of data from a residential building in Memphis City for energy consumption prediction. The absolute average error, root mean square error, and mean average percentage error are tabulated and used as performance indices. The efficacy of the proposed systems over the other systems has been validated by both simulation results and performance indices, which indicate that the proposed equation-based systems have the lowest absolute average errors, root mean square errors, and mean average percentage errors among the methods compared. In addition, the proposed systems are easy to implement in practice.


Introduction
Residential and commercial buildings across the USA use almost 40% of the overall energy generation. As residents' comfort requirements grow, energy consumption keeps increasing [1,2]. Supplying the required power from the grid is therefore difficult, especially during the peak hours of the day. This problem can be addressed in two ways. First, with proper planning and allocation of energy resources, the grid can supply adequate power to consumers. Second, an effective demand-side energy management system in a smart building can schedule loads efficiently, reducing the total cost of energy by running fewer grid-powered loads during peak hours without affecting the consumers' comfort demands [3,4]. An efficient load forecasting system helps the building's energy management system schedule loads ahead of time and operate the energy sources and energy storage systems effectively during peak hours, which reduces the cost of energy and relieves the burden on the grid [5][6][7]. It also makes it possible for the smart building to sell energy to the grid during peak hours in return for incentives [8]. Moreover, with the knowledge of load forecasting, the grid can allocate

• Three generalized equations are developed for predicting load consumption based on the HDD, CDD, occupancy, and day type. The coefficients of the non-linear equations and the linear equation are optimized by the well-known PSO and the multiple linear regression method, respectively.
• To assess the efficacy of the proposed equation-based methods in predicting loads, their performance has been compared with that of recently published forecasting methods: the subtractive clustering based ANFIS approach, random forest, gradient boosting trees, LSTM, and conventional and modified support vector regression models.
In this work, the predictions of all methods are simulated in MATLAB, and several error measures are used as performance indices to validate the efficacy of the proposed equation-based prediction systems.
The rest of the paper is organized as follows. In Section 2, the proposed equation-based prediction systems are described. Section 3 explains the conventional forecasting methods, i.e., the ANFIS system, random forest, gradient boosting, LSTM, and conventional and modified support vector regression. Simulation results are presented and explained in Section 4. The conclusion and future research directions are provided in Section 5. Finally, the references are listed.

Proposed Equation Based Prediction Methods
The load consumption of a building depends strongly on temperature. Above a certain temperature, which is generally taken as 65 °F in the USA, an increase in temperature increases the load consumption because of the higher cooling requirement. Likewise, when the temperature drops below that same reference, the load consumption increases because of the higher heating requirement. The energy consumption of a residential building therefore depends on HDD and CDD, which represent temperatures below and above 65 °F, respectively. Based on this fact, the energy consumption, e, can be expressed as follows: e ∝ HDD, e ∝ CDD.
Moreover, for the same temperature (HDD, CDD), the energy consumption in the same apartment increases with the number of occupants, and vice versa. Therefore, e ∝ O. In addition, the energy consumption pattern of a building differs between a normal working day, a weekend, and a special day or occasion. A special day depends on the family living in the building: there may be a religious festival celebration, a family event, or more family members than usual staying in the building. A special day can fall on a normal working day, a weekend, or a holiday. Therefore, e ∝ Day type.
Based on the above discussion, three equations, shown in (1) to (3) below, have been developed for load prediction in this work. The first equation is linear: the variable HDCC, the number of occupants (O), and the day type (D) are each multiplied by a coefficient to predict the total energy consumption of the day. The other two equations are non-linear: powers of HDCC and O are multiplied by the coefficients, and D appears as an exponent. In (2) the base of that exponent is the exponential constant, whereas in (3) the base, a, is a constant whose value is determined by the optimization algorithm.

$$ e = C_1\,\mathrm{HDCC} + C_2\,O + C_3\,D + C_4 \tag{1} $$

$$ e = C_5\,\mathrm{HDCC}^{p} + C_6\,O^{q} + C_7 \exp(D) + C_8 \tag{2} $$

$$ e = C_9\,\mathrm{HDCC}^{p} + C_{10}\,O^{q} + C_{11}\,a^{D} + C_{12} \tag{3} $$

where e and O represent the total load consumption of a day in kWh and the number of occupants present on that day, respectively. When the day's average temperature is at or below 65 °F, HDCC takes the HDD value, i.e., the difference between 65 °F and the average temperature. When the average temperature is above 65 °F, HDCC takes the CDD value, i.e., the difference between the average temperature and 65 °F. The coefficients $C_1$, $C_5$, and $C_9$ depend on the HDD or CDD values for (1) to (3), respectively. $C_2$, $C_6$, and $C_{10}$ are the coefficients for the number of occupants. The coefficients $C_3$, $C_7$, and $C_{11}$ vary with the day type. The coefficients $C_4$, $C_8$, and $C_{12}$ are offsets that depend on HDD, CDD, occupancy, and day type. In this work, D is 0 for normal working days, 1 for weekends, and 2 for special days.

Equation (1) is linear and Equations (2) and (3) are non-linear, and their performance depends on coefficients that are properly tuned for the varying HDD and CDD values, occupancy, and day type. Therefore, the multiple linear regression method and the PSO algorithm have been used to obtain the coefficients of the linear equation in (1) and of the non-linear equations in (2) and (3), respectively, in order to predict the total energy consumption of the day. The working principle of the equation-based methods for residential load prediction, which take the HDD, CDD, occupancy, and the value of D as inputs (x), is shown in Figure 1. In this work, generalized equations are formulated based on the inputs. The dotted-line portion of Figure 1 represents the equation-based prediction systems.
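As an illustration of how such equations are evaluated, the following minimal sketch computes HDCC from a day's average temperature and evaluates the power-law form of Equation (3) together with a linear form assumed from the description of Equation (1). The function names and all coefficient values are hypothetical and are not the tuned values of Tables 1-3.

```python
import math

REFERENCE_TEMP_F = 65.0  # reference temperature stated in the text


def hdcc(avg_temp_f):
    """Degree-day magnitude: HDD below the reference, CDD above it."""
    return abs(avg_temp_f - REFERENCE_TEMP_F)


def predict_linear(hdcc_value, occupants, day_type, coeffs):
    """Assumed linear form of Equation (1); coeffs = (C1, C2, C3, C4)."""
    c1, c2, c3, c4 = coeffs
    return c1 * hdcc_value + c2 * occupants + c3 * day_type + c4


def predict_power(hdcc_value, occupants, day_type, coeffs, p, q, a):
    """Equation (3); coeffs = (C9, C10, C11, C12).

    Passing a = math.e gives the exponential day-type term described
    for Equation (2).
    """
    c9, c10, c11, c12 = coeffs
    return (c9 * hdcc_value ** p + c10 * occupants ** q
            + c11 * a ** day_type + c12)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
example_coeffs = (1.5, 2.0, 3.0, 10.0)
x = hdcc(85.0)  # 85 F average temperature gives CDD = 20
print(predict_linear(x, 4, 1, example_coeffs))                      # 51.0
print(predict_power(x, 4, 1, example_coeffs, p=1.0, q=1.0, a=2.0))  # 54.0
```

With p = q = 1 the two forms differ only in the day-type term, which makes the effect of using D as an exponent easy to see.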
First, the inputs (x) are fed into the equation-based prediction system, where the range of the input variables is selected. Once the range is selected, the MLR/PSO-tuned coefficient values for that range are passed to the main equation block, where Equations (1) to (3) are used to predict the energy consumption from the inputs and the coefficients. The multiple linear regression (MLR) method or the PSO method provides the optimized coefficients ($C_1 \ldots C_4$ / $C_5 \ldots C_8$ / $C_9 \ldots C_{12}$) listed in Tables 1-3. These tuned coefficients are obtained from previous input data and the energy consumption data recorded by the smart meter.

Parameter Tuning by Multiple Linear Regression (MLR) Algorithm
In MATLAB, the command regress is used to calculate the coefficients of the linear model, which has the following format:

$$ e = C\,x \quad \text{subject to} \quad \sum (y - e)^2 = \text{minimum}, $$

where the input matrix is $x = [\mathrm{HDCC};\ O;\ D;\ U]$, $C = [C_1\ C_2\ C_3\ C_4]$, and y represents the anticipated output obtained from the smart meter. U is a vector of ones with the same length as the HDCC vector. It is introduced into the x matrix as a dummy input so that the multiple linear regression algorithm can determine $C_4$, because the number of columns of C must equal the number of rows of x for each set of data. The predicted output (e) is calculated by multiplying the C and x matrices, and the coefficient values ($C_1, \ldots, C_4$) are those for which the sum of the squared differences between the anticipated output (y) and the predicted output (e) is minimum.
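The least-squares fit described above can be sketched without MATLAB. The following Python example is an illustration only, not the regress implementation: it solves the normal equations by Gaussian elimination, with a column of ones playing the role of U. The function names and the sample data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_mlr(hdcc, occupants, day_type, y):
    """Return [C1, C2, C3, C4] minimizing sum((y - e)^2) for e = C*x."""
    # The trailing 1.0 is the unity input U that yields the offset C4.
    rows = [[h, o, d, 1.0] for h, o, d in zip(hdcc, occupants, day_type)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from known coefficients (1.5, 2.0, 3.0, 10.0).
hdcc_values = [5.0, 10.0, 15.0, 20.0, 8.0, 12.0]
occupant_counts = [2, 3, 4, 2, 5, 3]
day_types = [0, 1, 2, 0, 1, 2]
energy = [1.5 * h + 2.0 * o + 3.0 * d + 10.0
          for h, o, d in zip(hdcc_values, occupant_counts, day_types)]
print(fit_mlr(hdcc_values, occupant_counts, day_types, energy))
```

Because the sample data are noise-free, the fit recovers the generating coefficients up to rounding. With real smart meter data the residual sum of squares is minimized but not zero.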

Parameter Tuning by Particle Swarm Optimization (PSO) Algorithm
As already mentioned, the PSO method has been used in this work to tune the parameters of the non-linear equations shown in (2) and (3). It has been widely applied in areas such as energy management [44,45] and load prediction [46][47][48]. It is easy to implement, converges quickly, and is effective compared with other optimization algorithms such as the genetic algorithm [45].
In PSO, a number of particles are placed at random in the search space and an objective function is defined. Based on the cost function at each particle's current location, the best position and cost are determined and shared among the particles. Each particle then finds its new position based on its current position, its previous velocity, and the global best location among the particles. After the position and velocity vectors are updated, the best position and cost among the particles are again shared and updated. By repeatedly updating their positions and velocities and sharing the best location and cost, the particles as a swarm reach the optimal goal.
The PSO algorithm is characterized by two model equations, for the velocity and position vectors in an N-dimensional solution space, as shown below:

$$ v_i^{k+1} = w\,v_i^{k} + c_1 r_1 \left(p_i^{k} - x_i^{k}\right) + c_2 r_2 \left(p_g^{k} - x_i^{k}\right) $$

$$ x_i^{k+1} = x_i^{k} + v_i^{k+1} $$

where $v_i^{k+1}$ represents the velocity of the i-th particle at the (k + 1)-th iteration in the N-dimensional search space. Similarly, $x_i^{k}$ corresponds to the position of the i-th particle at the k-th iteration. $p_i^{k}$ and $p_g^{k}$ correspond to the individual best position of the i-th particle and the global best position of the swarm, respectively. Moreover, $r_1$ and $r_2$ are random numbers uniformly distributed between 0 and 1. $c_1$ and $c_2$ are known as learning factors, which control the significance of the best solutions. Both learning factors are set to 2. The value of the inertia coefficient, w, at each iteration is calculated using the following equation:

$$ w = w_{max} - \frac{w_{max} - w_{min}}{MaxI}\,t $$

where $w_{max}$ and $w_{min}$ represent the upper and lower values of w, and t and MaxI correspond to the current iteration number and the maximum iteration number, respectively. The objective function for the current work is to find the coefficients subject to y − e = minimum, where y and e are the anticipated and predicted outputs.
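A minimal sketch of these update rules is given below, assuming the standard velocity and position updates with a linearly decreasing inertia weight. The learning factors follow the text (c1 = c2 = 2), while the inertia limits (0.9 and 0.4), the swarm size, the search bounds, and the function names are assumptions made for illustration.

```python
import random


def inertia(t, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (w_max and w_min are assumed values)."""
    return w_max - (w_max - w_min) * t / max_iter


def pso(cost, dim, n_particles=30, max_iter=200, c1=2.0, c2=2.0,
        bounds=(-10.0, 10.0), seed=0):
    """Minimize cost(position) and return (best_position, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]           # individual best positions
    p_cost = [cost(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_cost[i])
    g_best, g_cost = p_best[g][:], p_cost[g]  # global best of the swarm
    for t in range(1, max_iter + 1):
        w = inertia(t, max_iter)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                x[i][d] += v[i][d]
            ci = cost(x[i])
            if ci < p_cost[i]:
                p_best[i], p_cost[i] = x[i][:], ci
                if ci < g_cost:
                    g_best, g_cost = x[i][:], ci
    return g_best, g_cost


# Toy objective standing in for the prediction error of Equations (2) and (3).
best, best_cost = pso(lambda z: sum(zi * zi for zi in z), dim=2)
print(best, best_cost)
```

To tune the coefficients of (2) or (3), the cost function would map a candidate coefficient vector to the accumulated difference between the smart meter readings y and the predicted outputs e.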
The procedure of the PSO algorithm is described as follows:
• Initialization:
1. Load the input (x) and anticipated output (y) values based on the smart meter data.
After the optimal coefficients are obtained from the MLR and PSO, they are substituted into (1) to (3) to get the predicted outputs. The coefficients determined by the MLR and PSO methods for the different HDD, CDD, occupancy, and day-type conditions are shown in Tables 1-3, respectively.
Interpretability is the main advantage of the proposed method. The model explains the energy consumption in terms of the heating degree days (HDD), cooling degree days (CDD), occupancy, and day type. The proposed equation-based system is practical to implement because it needs only three parameters (temperature, number of occupants, and type of the day). The predicted temperature for future days can easily be found online. The number of occupants can be entered by the consumer, or a motion detector can be placed inside the building to count the occupants. Normal working day and weekend information is available from an online calendar, and special day information can be entered by the consumers. Once the coefficients and the temperature range are known, consumers can even calculate the energy consumption by hand. Moreover, the method requires only a moderate amount of data (energy consumption, HDD, CDD, occupancy, day type) for coefficient tuning by MLR and PSO, which makes it convenient for practical implementation. However, the energy consumption of a residential building depends on the habits of its residents, their responses to changes in environmental conditions, their mode of comfort (the usage of appliances based on the comfort desired under different conditions), etc.
Therefore, these three equations can be implemented for any building provided that the coefficients are re-tuned based on the energy consumption pattern and other conditions such as country, region, and location.
The first condition in Table 1 refers to temperatures for which CDD is 17 °F or more above the reference temperature (65 °F): all temperatures equal to or higher than 82 °F (65 °F + 17 °F) have a CDD of 17 °F or higher. Similarly, in the second condition, CDD values from 0 to less than 17 °F cover all temperatures from 65 °F to 81 °F (below 82 °F). The third condition refers to temperatures 20 °F or more below the reference value of 65 °F: all temperatures in the range 0 °F to 45 °F (65 °F − 20 °F) have an HDD of 20 °F or higher. Finally, temperatures above 45 °F and below 65 °F correspond to HDD values from just under 20 °F down to 0.1 °F. These four ranges therefore cover all temperatures. The temperature ranges in Tables 2 and 3 are likewise defined in terms of HDD and CDD.
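The four conditions described for Table 1 amount to a simple range selector. The sketch below is one way to express it, with hypothetical function names and labels; the tuned coefficient sets of Tables 1-3 are not reproduced here and would be looked up by the returned label.

```python
REFERENCE_TEMP_F = 65.0  # reference temperature used for HDD and CDD


def degree_days(avg_temp_f):
    """Return (hdd, cdd) for one day's average temperature in degrees F."""
    hdd = max(REFERENCE_TEMP_F - avg_temp_f, 0.0)
    cdd = max(avg_temp_f - REFERENCE_TEMP_F, 0.0)
    return hdd, cdd


def select_range(avg_temp_f):
    """Map a temperature to one of the four conditions described for Table 1."""
    hdd, cdd = degree_days(avg_temp_f)
    if avg_temp_f >= REFERENCE_TEMP_F:
        # 65 F itself is treated as CDD = 0, following the second condition.
        return "CDD >= 17" if cdd >= 17.0 else "0 <= CDD < 17"
    return "HDD >= 20" if hdd >= 20.0 else "0 < HDD < 20"


for temp in (90.0, 70.0, 50.0, 30.0):
    print(temp, degree_days(temp), select_range(temp))
```

In a full implementation, each label would index a separate set of tuned coefficients, so that the same equation is evaluated with range-specific values.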
It is important to note that the HDD and CDD values are calculated from the constant reference temperature (65 °F) used in the USA. However, consumers' comfort temperature can differ between seasons and conditions. Therefore, in order to handle both situations and predict the energy consumption accurately from HDD/CDD, the coefficients ($C_1$ for Equation (1), $C_5$ for Equation (2), and $C_9$ for Equation (3)) are tuned for each defined range of HDD/CDD values and represent the variation in energy per degree of HDD/CDD (kWh/°F). Moreover, if the above methods are used for residences in other countries or regions, the HDD/CDD values should be calculated from that region's reference temperature and the coefficients should be tuned accordingly.

Conventional Methods
As already mentioned, in this work, the performance of the proposed equation-based methods has been compared with that of the conventional methods such as the ANFIS, random forest, gradient boosting trees, and LSTM. These conventional methods are described below.

Adaptive Neuro Fuzzy Inference System (ANFIS) Based Load Forecasting
The ANFIS is an intelligent model that combines a neural network and a fuzzy system. In this work, a Sugeno-type ANFIS system is considered. The ANFIS system is governed by two major parts, namely the antecedent and the conclusion, which are related to each other by fuzzy rules. For the chosen Sugeno-type ANFIS system, the fuzzy rules are formulated as follows [34]:

$$ \text{Rule } i:\ \text{if } x_1 \text{ is } A_i \text{ and } x_2 \text{ is } B_i, \text{ then } f_i = p_i x_1 + q_i x_2 + r_i, $$

where $x_1$ and $x_2$ are the inputs to the ANFIS system. The two inputs chosen are temperature ($x_1$) and a variable R ($x_2$), defined in (10). $A_i$ and $B_i$ represent the fuzzy sets, and $f_i$ is the output governed by the fuzzy rules. For example, if the temperature corresponds to $A_1$ and the R value corresponds to $B_1$, the output of rule 1 would be:

$$ f_1 = p_1 x_1 + q_1 x_2 + r_1. $$

During the training process, the parameters (i.e., $p_i$, $q_i$, and $r_i$) are calculated. The input R is determined by (10) from the number of occupants and the day-type value d, which is 0, 1, or 2 for normal working days, weekends, and special days, respectively. Therefore, if the number of occupants for a day is 5 and the day is a normal working day (d = 0), the value of R would be 5. If the day is a weekend (d = 1) or a special day (d = 2), for the same number of occupants (5), the values of R would be 6.5 and 8, respectively.

In the ANFIS system, the data is first used in the training process, during which the rules are extracted and the membership function types and positions are determined through training and testing. The results are then used for future predictions. For this work, the temperature, R values, and output energy consumption data of 304 days are provided during training. The parameters of the input (temperature, R) and output (total energy consumption) membership functions are tuned by the hybrid algorithm, which uses the backpropagation method for the parameters of the input membership functions and the least square estimation method for the parameters of the output membership functions.
Subtractive clustering defines the number of fuzzy rules along with the number and type of the membership functions. The subtractive method is therefore very useful when the data pattern is unknown, or when one is unsure how many membership functions to choose and which type and center position to give them. The parameters of subtractive clustering are chosen from [34]. In a normal fuzzy system, if both inputs have 10 membership functions, there would be 100 fuzzy rules in total, all of which have to be evaluated for each input data point. However, for the chosen subtractive clustering parameters, each input has 10 membership functions and the total number of fuzzy rules is 10, as shown in Figure 2, which makes subtractive clustering beneficial and the system faster. The minimum error and number of epochs are chosen to be 0 and 500, respectively. The minimal root-mean-square error is found to be 5.13 after 500 epochs. The tuned Gaussian fuzzy membership functions are shown in Figure 3. The parameters of the ANFIS system are taken from [34].
Energies 2020, 13, x FOR PEER REVIEW 9 of 22
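To make the role of the rules concrete, the following sketch evaluates a first-order Sugeno system with Gaussian membership functions. It assumes the usual product operator for the "and" of the two antecedents and a weighted average of the rule outputs, which the text does not state explicitly. The two rules and all of their parameters are hypothetical and are not the trained values.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian membership value of x for a fuzzy set (center, sigma)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(x1, x2, rules):
    """Weighted average of the rule outputs f_i = p*x1 + q*x2 + r."""
    weighted_sum = 0.0
    weight_total = 0.0
    for rule in rules:
        # Firing strength: product of the two antecedent memberships (assumed).
        w = gaussian(x1, *rule["A"]) * gaussian(x2, *rule["B"])
        f = rule["p"] * x1 + rule["q"] * x2 + rule["r"]
        weighted_sum += w * f
        weight_total += w
    return weighted_sum / weight_total


# Two hypothetical rules: A and B hold (center, sigma) for temperature and R.
example_rules = [
    {"A": (50.0, 10.0), "B": (3.0, 2.0), "p": -0.4, "q": 2.0, "r": 60.0},
    {"A": (85.0, 10.0), "B": (6.0, 2.0), "p": 0.6, "q": 2.5, "r": -10.0},
]
print(sugeno_predict(80.0, 5.0, example_rules))
```

With subtractive clustering, each cluster contributes one such rule, which is why 10 clusters give 10 rules rather than the 100 rules of a full grid over two inputs with 10 membership functions each.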

Random Forest Based Load Forecasting
Random forest is an ensemble approach that combines the predictions of decision trees that are built independently of each other [49]. A sample is drawn at random from the training data and fitted to a regression tree. This process is known as bagging, and the selected sample is called a bootstrap sample. A new bootstrap sample is drawn, with replacement, for each tree, and every observation is assumed to have the same probability of being selected. The bagging algorithm then applies the classification and regression tree (CART) algorithm to obtain a set of regression trees and finally averages the output of all trees based on the following equation:

$$ \hat{Y} = \frac{1}{q} \sum_{i=1}^{q} \hat{h}\left(X, S_n^{\theta_i}\right) $$

where $\hat{Y}$ is the output estimate for a new input X, q is the number of trees, and $\hat{h}(X, S_n^{\theta_i})$ is the output predicted from the bootstrap sample of $S_n$. $\theta_i$ represents a randomly chosen variable with an identical distribution.
For this method, the input variables are temperature, occupancy, and day type. The energy consumption per day is the output of the prediction system. The unbiased importance of the input variables, measured using the out-of-bag method, and the number of levels are shown in Figure 4.
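The bagging step can be sketched as follows. This is a simplified illustration, not the MATLAB implementation used in the paper: each tree is reduced to a single-split regression stump, no random feature selection is performed, and the data, tree count, and function names are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a one-split regression tree: (feature, threshold, left_mean, right_mean)."""
    mean_y = sum(y) / len(y)
    best = (None, 0.0, mean_y, mean_y)  # fallback: predict the mean
    best_sse = sum((yi - mean_y) ** 2 for yi in y)
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if sse < best_sse:
                best, best_sse = (j, thr, lm, rm), sse
    return best


def predict_stump(stump, row):
    j, thr, lm, rm = stump
    if j is None:
        return lm
    return lm if row[j] <= thr else rm


def fit_forest(X, y, n_trees=25, seed=0):
    """Fit each tree on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical rows: [temperature in F, occupancy, day type]; targets in kWh.
X = [[30, 2, 0], [40, 3, 0], [50, 2, 1], [80, 4, 0], [90, 5, 1], [95, 3, 2]]
y = [40.0, 35.0, 30.0, 45.0, 55.0, 60.0]
forest = fit_forest(X, y)
print(predict_forest(forest, [85, 4, 0]))
```

Averaging over trees fitted on different bootstrap samples is what reduces the variance of the individual trees.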

The parameters of this method, optimized by the Bayesian optimization algorithm [50], are summarized in Table 4.

Gradient Boosting Trees Based Load Forecasting
Gradient boosting is an additive model that is characterized by the following equation [51]:

$$ F_m(x) = \sum_{j=1}^{m} h_j(x) $$

where $F_m(x)$ represents the sum of the predictions of all m regression trees and $h_m(x)$ are the fixed-size regression trees. In MATLAB, least squares boosting (LSBoost) is used for regression [52,53]. At each iteration, the ensemble fits a new tree to the difference between the observed response and the sum of the predictions of all trees added before. LSBoost is efficient in minimizing the mean-squared error. As in the random forest method, temperature, occupancy, and day type are the inputs for this method, and the energy consumption per day is the output of the prediction system. The parameters of this method, optimized by the Bayesian optimization algorithm, are summarized in Table 5.
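The residual-fitting idea behind least squares boosting can be sketched as follows. This is an illustration under stated assumptions rather than MATLAB's LSBoost: each tree is a single-split stump, the ensemble starts from the mean response, and the learning rate (shrinkage), tree count, data, and function names are hypothetical choices.

```python
def fit_stump(X, y):
    """Fit a one-split regression tree: (feature, threshold, left_mean, right_mean)."""
    mean_y = sum(y) / len(y)
    best = (None, 0.0, mean_y, mean_y)  # fallback: predict the mean
    best_sse = sum((yi - mean_y) ** 2 for yi in y)
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if sse < best_sse:
                best, best_sse = (j, thr, lm, rm), sse
    return best


def predict_stump(stump, row):
    j, thr, lm, rm = stump
    if j is None:
        return lm
    return lm if row[j] <= thr else rm


def fit_lsboost(X, y, n_trees=50, learn_rate=0.1):
    """Add trees one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)  # initial prediction (assumed: mean response)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residual)
        trees.append(stump)
        pred = [pi + learn_rate * predict_stump(stump, row)
                for pi, row in zip(pred, X)]
    return base, learn_rate, trees


def predict_lsboost(model, row):
    base, learn_rate, trees = model
    return base + learn_rate * sum(predict_stump(s, row) for s in trees)


# Hypothetical rows: [temperature in F, occupancy, day type]; targets in kWh.
X = [[30, 2, 0], [40, 3, 0], [50, 2, 1], [80, 4, 0], [90, 5, 1], [95, 3, 2]]
y = [40.0, 35.0, 30.0, 45.0, 55.0, 60.0]
model = fit_lsboost(X, y)
print(predict_lsboost(model, [85, 4, 0]))
```

Unlike bagging, the trees here are built sequentially, so each one corrects the errors left by the trees before it.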

LSTM Based Load Forecasting
The LSTM is an improved version of the recurrent neural network (RNN) with an added cell state and gates, which allows it to overcome the vanishing gradient problem of the conventional RNN [35,36]. The LSTM is characterized by the following set of equations:

$$ f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) $$
$$ i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) $$
$$ o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) $$
$$ \tilde{C}_t = \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) $$
$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$
$$ h_t = o_t \odot \tanh(C_t) $$

where $f_t$ represents the forget gate, which controls how much of the previous state is reflected in the current state. $i_t$ is the input gate and $o_t$ is the output gate, which decide how much new information updates the cell state and how much of the cell state is output, respectively. σ is the sigmoid function, which keeps the gate values between 0 and 1, and W and b denote the weight matrices and bias vectors of each gate. All the gates are updated based on the current input $x_t$ and the previous output $h_{t-1}$. $C_t$ and $\tilde{C}_t$ represent the cell state and the candidate value required for calculating the cell state, respectively. For the LSTM based load forecasting, the input variables are temperature, occupancy, and day type. The training of the LSTM approach is shown in Figure 5. The Adam optimization approach is used for the LSTM model parameters [34], and the parameters for LSTM are shown in Table 6.
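A single time step of the gate computations can be sketched as below, assuming the standard LSTM cell with one hidden unit so that every gate is a scalar. The parameter layout, the function names, and the all-zero example weights are hypothetical and serve only to show the data flow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell; x_t is a list of input features.

    params maps each of "f", "i", "o", "c" to (input_weights, hidden_weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = params[name]
        z = sum(w * x for w, x in zip(w_x, x_t)) + w_h * h_prev + b
        return activation(z)

    f_t = gate("f", sigmoid)        # forget gate
    i_t = gate("i", sigmoid)        # input gate
    o_t = gate("o", sigmoid)        # output gate
    c_tilde = gate("c", math.tanh)  # candidate cell value
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# Hypothetical all-zero parameters for three inputs:
# temperature, occupancy, and day type.
zero_params = {name: ([0.0, 0.0, 0.0], 0.0, 0.0) for name in ("f", "i", "o", "c")}
h, c = lstm_step([72.0, 4.0, 1.0], h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

With zero weights every gate equals 0.5 and the candidate value is 0, so the new cell state is simply half of the previous one. A trained network replaces these weights with learned values.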

Conventional and Modified Support Vector Regression Based Load Forecasting
The modified support vector regression (SVR)-based prediction method involves three stages for predicting the energy consumption of residential buildings, as shown in Figure 6. In the first stage, the historical input data ($x_{tr}$) and the known energy consumptions ($y_{tr}$) are fed into the SVR training stage, which produces the values of $\beta_0$ and $b_0$. $\beta_0$ has 14 values, which correspond to the coefficients of 14 input parameters such as temperature, humidity, wind speed, etc. The values of $\beta_0$ and $b_0$ obtained by the SVR training system are then used as the initial values for the PSO stage. In the PSO stage, the predicted inputs (x) and the anticipated consumption (y), which can be obtained from the smart meter by a similar day/input approach, are inserted. As already mentioned, the energy consumption of a residential building depends on the temperature range, the range of other environmental conditions, occupancy, and even the day type. Therefore, several sets of parameter values, defined by temperature range, are required to predict the energy consumption more accurately. Four sets of $\beta_{optn}$, $b_{optn}$ values are generated by the PSO method based on the temperature range, and the set corresponding to the current temperature is used by the SVR equation to predict the energy consumption of the residential building, as shown in Figure 6, where n = 1, 2, ..., 4.

Conventional and Modified Support Vector Regression Based Load Forecasting
Because of its dependence on a kernel function, support vector regression is considered a nonparametric technique [54]. MATLAB provides epsilon-insensitive support vector regression, in which a training set of predictor variables ($x_{tr}$) and observed response values ($y_{tr}$) is used to derive a function $f(x)$ that deviates from each observed response $y$ by no more than $\varepsilon$. The function $f(x)$ can be expressed as shown in (19) [54,55].
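$$f(x) = x'\beta + b \tag{19}$$
where $x$ is the set of $N$ observations, and $\beta$ and $b$ represent the coefficients of the inputs and the bias, respectively. To formulate a convex optimization problem and to ensure that $f(x)$ is as flat as possible, the following objective function is minimized:
$$J(\beta) = \frac{1}{2}\beta'\beta \tag{20}$$
subject to
$$\forall n: \ \left| y_n - (x_n'\beta + b) \right| \le \varepsilon$$
where $\varepsilon$ is the allowed residual. Since $f(x)$ may not satisfy the constraint in (20) for every observation, two slack variables $\xi_n$ and $\xi_n^*$ are introduced so that the relaxed constraints in (21) can hold for all observations. The objective function in (20) is therefore rewritten as
$$J(\beta) = \frac{1}{2}\beta'\beta + C\sum_{n=1}^{N}\left(\xi_n + \xi_n^*\right) \tag{21}$$
subject to
$$\forall n: \ y_n - (x_n'\beta + b) \le \varepsilon + \xi_n, \quad (x_n'\beta + b) - y_n \le \varepsilon + \xi_n^*, \quad \xi_n^* \ge 0, \quad \xi_n \ge 0$$
where $C$ is the box constraint, which controls the penalty imposed on observations that fall outside the $\varepsilon$ margin. It also controls the trade-off between the flatness of $f(x)$ and the extent to which deviations larger than $\varepsilon$ are tolerated. The linear $\varepsilon$-insensitive loss function can be expressed as
$$L_\varepsilon = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}$$
Non-linear support vector regression can be obtained using the Lagrange dual formulation, with nonnegative multipliers $\alpha_n$ and $\alpha_n^*$ for each observation. The objective function then becomes as shown in (22):
$$L(\alpha) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,G(x_i, x_j) + \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^*) - \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^*) \tag{22}$$
The constraints in (22) are
$$\sum_{n=1}^{N}(\alpha_n - \alpha_n^*) = 0, \quad \forall n: \ 0 \le \alpha_n \le C, \quad \forall n: \ 0 \le \alpha_n^* \le C$$
where the linear kernel function can be expressed as
$$G(x_j, x_k) = x_j' x_k$$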
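To make the role of $\varepsilon$, the slack variables, and the box constraint $C$ concrete, the following sketch evaluates the $\varepsilon$-insensitive loss and the slack-based primal objective for a given linear model. It assumes the standard formulation with $f(x) = x'\beta + b$; the function names and the small data set are illustrative only and do not come from the paper.

```python
import numpy as np


def eps_insensitive_loss(y, y_hat, eps):
    """Zero inside the eps tube, |y - y_hat| - eps outside it."""
    return np.maximum(0.0, np.abs(y - y_hat) - eps)


def primal_objective(beta, b, X, y, eps, C):
    """0.5 * beta'beta + C * sum(xi + xi_star) for the smallest feasible slacks."""
    residual = y - (X @ beta + b)
    xi = np.maximum(0.0, residual - eps)        # observation above the tube
    xi_star = np.maximum(0.0, -residual - eps)  # observation below the tube
    return float(0.5 * beta @ beta + C * np.sum(xi + xi_star))


# Illustrative numbers only (one input, three observations).
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.5, 2.0])
beta = np.array([1.0])
b = 0.0

losses = eps_insensitive_loss(y, X @ beta + b, eps=0.2)
objective = primal_objective(beta, b, X, y, eps=0.2, C=10.0)
```

The first observation lies inside the tube and contributes nothing, while the other two contribute slacks of 0.3 and 0.8, so a larger $C$ penalizes them more heavily relative to the flatness term.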
The objective function shown in (22) can be solved by quadratic programming techniques. In this work, the sequential minimal optimization (SMO) method, a very popular approach for SVR problems, is used. SMO performs a series of two-point optimizations, in which the two points are chosen by a selection rule governed by second-order information. In SVR, the gradient vector is updated after each iteration. After the training process described in (19) to (24), the values of $\beta_0$ and $b_0$ are obtained and then fed to the PSO stage for further optimization. The PSO method and its parameters are the same as those described in Section 2.2.
After the PSO produces the optimal coefficients based on the temperature range, the inputs, and the anticipated output, the coefficients are substituted into (19) to obtain the predicted output.
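The hand-over from SVR training to PSO can be sketched as a swarm that is seeded around $(\beta_0, b_0)$ and refines the coefficients of the linear model in (19). This is only an illustration under stated assumptions: the mean squared error cost, the inertia and acceleration values, the swarm size, and the function name `pso_refine` are hypothetical, whereas the paper's actual PSO settings are those of Section 2.2.

```python
import numpy as np


def pso_refine(theta0, X, y, n_particles=20, n_iter=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Refine theta = [beta, b] of f(x) = x'beta + b, starting near theta0."""
    rng = np.random.default_rng(seed)

    def cost(theta):
        # Assumed fitness: mean squared error of the linear prediction.
        return float(np.mean((X @ theta[:-1] + theta[-1] - y) ** 2))

    dim = theta0.size
    pos = theta0 + 0.1 * rng.standard_normal((n_particles, dim))
    pos[0] = theta0  # keep the SVR solution itself in the swarm
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Standard velocity update: inertia, cognitive pull, social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])

    return gbest[:-1], float(gbest[-1]), gbest_cost


# Synthetic data, used only to exercise the routine.
data_rng = np.random.default_rng(1)
X_demo = data_rng.random((30, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0
theta_start = np.zeros(4)  # stands in for the SVR-trained (beta_0, b_0)
beta_opt, b_opt, final_cost = pso_refine(theta_start, X_demo, y_demo)
```

Because the starting point is kept as one of the particles, the returned cost can never exceed that of the SVR solution. Running the routine once per temperature range would yield the four coefficient sets of the modified method.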
Moreover, the conventional PSO-tuned SVR method, shown in Figure 7, has also been used in this work. Like the modified SVR system, the conventional system involves three stages for energy consumption prediction. The SVR training stage produces $\beta_0$ and $b_0$ for the PSO stage. The PSO then provides only one set of values $\beta_{opt}$, $b_{opt}$ based on the predicted inputs and the anticipated consumption, which can be obtained from a smart meter using the similar day/input approach. Therefore, the SVR training stage and the PSO stage are the same for both methods, except that the modified system considers the temperature range as an additional input. The coefficients, namely one set per temperature range for the modified SVR method and a single set for all temperatures for the conventional SVR method, are shown in Table 7, where all T values are in degrees Fahrenheit (°F).

Simulation Data and Conditions
In this work, the daily total energy demand and the daily average temperature were collected for an apartment located at 3571 Midland Avenue, Memphis, TN. The smart energy meter (meter 54BKW988882) data are available in the MLGW web account. The number of occupants present on each day and the day type were collected from the residents of the building. In total, 334 days of data (334 data sets), each comprising the average temperature of the day, the average number of occupants for the day, and the day type, were collected. Of these, 30 randomly chosen days (30 data sets) were used to predict the total energy consumption per day for comparison purposes, and the remaining 304 days were used to train and validate the ANFIS, random forest, LSBoost, and LSTM network methods. Similarly, 30 days of HDD/CDD, occupancy, and day type value (D) data were used to tune the coefficients of the proposed equation-based systems. For the modified and conventional SVR methods, 14 inputs (temperature, average dew points, relative humidity, specific humidity, indoor humidity, average wind speed, atmospheric pressure, average precipitation, insolation index and solar radiation, occupancy, normal weekdays/weekend/special holidays, HDD, CDD) were considered, and 304 data sets from 304 days were used for training and validation.
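The split described above, with 30 randomly chosen days held out for prediction and the remaining 304 days used for training and validation, can be sketched as follows. The function name and the fixed seed are assumptions made for reproducibility of the illustration; the paper does not state how the random selection was performed.

```python
import numpy as np


def split_days(n_days=334, n_test=30, seed=0):
    """Randomly hold out n_test day indices; the rest are used for training."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_days)
    test_idx = np.sort(order[:n_test])
    train_idx = np.sort(order[n_test:])
    return train_idx, test_idx


train_idx, test_idx = split_days()
```

Each index refers to one day of data, so the two index arrays can be used directly to slice the arrays of inputs and daily energy consumption.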

Effectiveness of Proposed Equation Based Prediction System over ANFIS, Random Forest, LSBoosting, and LSTM, Modified and Conventional SVR Methods
For all the prediction systems, as previously explained, 30 randomly chosen days of data were used for prediction and comparison. For the ANFIS system, as previously explained, two inputs, namely the temperature and the P value, were considered. Three inputs (HDD/CDD, occupancy, day type) were considered for the equation-based systems, and three inputs (temperature, occupancy, day type) were considered for the other methods except the modified and conventional SVR methods. Since occupancy and day type are inputs common to all the methods, their values for the 30 predicted days are shown in Figure 8.

Figure 9 compares the energy consumption predicted by the proposed equations and by the ANFIS, random forest, LSBoosting, LSTM, and modified and conventional SVR-based prediction systems with the actual energy consumption data. The results show that the proposed equation-based prediction systems perform better than all the other systems.
Furthermore, the absolute percentage error (%Err), the absolute average error (A.E), the root mean square error (RMSE), and the mean average percentage error (MAPE) of the prediction systems have been calculated using (25), (26), (27), and (28), respectively. The absolute percentage error expresses the prediction error of each day as a percentage of that day's total consumption and helps determine the maximum error that occurs within the considered time period. The absolute average error gives the average deviation of the predictions from the actual consumption over the considered time period. Similarly, the RMSE and MAPE show the root mean square error and the mean percentage error over the considered time period. These error measures are standard for performance comparison. Lower values mean that the system predicts values closer to the actual consumption. Therefore, these errors have been used as performance indices in this work to identify the best-performing system.
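The four indices can be computed as in the sketch below, which assumes their standard definitions: the per-day absolute error divided by the actual consumption for %Err, and the mean, root mean square, and mean percentage of the absolute errors over the N days for A.E, RMSE, and MAPE. The function name, the dictionary keys, and the three sample values are illustrative and are not taken from the paper's data.

```python
import numpy as np


def error_indices(actual, predicted):
    """Return per-day %Err and the A.E, RMSE and MAPE over all days."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(actual - predicted)
    pct_err = 100.0 * abs_err / actual  # per-day absolute percentage error
    return {
        "pct_err": pct_err,
        "AE": float(abs_err.mean()),
        "RMSE": float(np.sqrt(np.mean(abs_err ** 2))),
        "MAPE": float(pct_err.mean()),
    }


# Made-up daily consumptions in kWh, for illustration only.
indices = error_indices(actual=[20.0, 25.0, 40.0], predicted=[22.0, 24.0, 36.0])
```

For this sample the per-day errors are 10%, 4%, and 10%, so the maximum daily error is 10% and the MAPE is 8%.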
In Equations (25) to (28), N = 30. The percentage errors of the proposed methods and the other systems in predicting the energy demand of the 30 chosen days are shown in Figure 10. Moreover, the average, root mean square, and mean average percentage errors of all the systems are shown in Table 8. From Table 8, it is evident that the average errors of the equation-based prediction systems are lower than those of the ANFIS, random forest, LSBoosting, LSTM, and modified and conventional SVR-based prediction systems. In this case, the proposed methods given in (1), (2), and (3) perform 29.75%, 47.97%, and 48.63% better, respectively, than the ANFIS system. The modified SVR performs 2.87% better than the ANFIS system. However, the ANFIS system performs 106.8%, 96.31%, 109.01%, and 71.31% better than the random forest, LSBoosting, LSTM, and conventional SVR methods, respectively.
Moreover, the RMSE values indicate that the equation-based systems proposed in (1) to (3) perform 48.72%, 50.83%, and 48.42% better, respectively, than the ANFIS system. The modified SVR performs 8.31% better than the ANFIS system. However, the ANFIS system shows 44.18%, 59.38%, 54.87%, and 33.01% better performance than the random forest, LSBoosting, LSTM, and conventional SVR methods, respectively. In addition, the equation-based systems perform 19.62%, 35.21%, and 44.38% better, respectively, than the ANFIS system in terms of MAPE. Moreover, the ANFIS system performs 281.56%, 117.83%, 125.72%, 30.11%, and 170.42% better than the random forest, LSBoosting, LSTM, modified SVR, and conventional SVR methods, respectively. Therefore, the proposed equation-based prediction systems perform better than the other methods in all cases. The errors of the ANFIS system serve as the reference for all the performance improvement calculations mentioned above.
In addition to the RMSE, the sum of squares due to error (SSE) and the coefficient of determination ($R^2$) are used to evaluate the goodness of fit [56]. The $R^2$ value is calculated using Equation (29):
$$R^2 = 1 - \frac{SSE}{SST} \tag{29}$$
where SST is the sum of squares about the mean. Based on Equation (29), the $R^2$ value of the system in Equation (1), whose coefficients are tuned by multiple linear regression, is found to be 0.9804, which means that 98.04% of the total variation in the data (N = 30) is explained by this system. The SSE and SST values are found to be 139.867 and 7136.418, respectively.
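As a quick arithmetic check, and assuming that Equation (29) takes the standard form $R^2 = 1 - SSE/SST$, the reported SSE and SST values reproduce the reported coefficient of determination. The variable names below are illustrative.

```python
# Values reported in the text for the Equation (1) system (N = 30).
sse = 139.867   # sum of squares due to error
sst = 7136.418  # sum of squares about the mean

# Assumed standard form of the coefficient of determination.
r_squared = 1.0 - sse / sst

print(round(r_squared, 4))  # 0.9804
```

The ratio SSE/SST is about 0.0196, which is the 1.96% of the variation that the fitted equation leaves unexplained.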

Conclusions
This paper proposes new equation-based methods for residential load forecasting that use HDD, CDD, occupancy, and day type (weekday, weekend, or special day). The performance of the proposed methods has been compared with that of the ANFIS, random forest, LSBoosting, LSTM, and modified and conventional SVR approaches. The energies forecasted by all the methods are compared with the actual energy consumption data for validation. Data from 304 days are used to train the ANFIS, random forest, LSBoosting, LSTM, and modified and conventional SVR systems. Moreover, 30 days of data from the same apartment are used to test the predictions of all the methods. Based on the obtained simulation responses and performance indices, the following conclusions can be drawn.

1. The proposed equation-based methods are effective in predicting residential loads.

2. The proposed prediction systems require less computation and perform better than the ANFIS, random forest, LSBoosting, LSTM, and modified and conventional SVR systems.

It is noteworthy that the energy consumption of a residential building depends on the people living there and on their habits, their responses to different environmental conditions, their preferred comfort level, etc. Therefore, if the energy consumption is categorized based on HDD, CDD, the number of occupants, and the day type, the uncertainty in the energy consumption is reduced considerably. From Table 9, it is evident that if the whole data range is considered (bottom row), the uncertainty of the system is high (a standard deviation of 12.91), while the average energy consumption is 27.15 kWh. However, after the data are divided based on the conditions, the average energy consumption differs from one category to another, and each category has much less uncertainty than the whole data set. This variation arises from the varying number of occupants on a particular day and from the day type. That is why the proposed systems perform better than the other considered systems. Moreover, the proposed systems do not require large data sets for training or sequential data for efficient prediction, as LSTM does, and can efficiently predict any randomly chosen day. The proposed equation-based systems can easily be implemented in practice. In the near future, the performance of the equation-based prediction systems will be compared with that of other methods such as deep neural networks and other new probabilistic prediction systems. In addition, Bayesian optimization, which considers the data to have a normal distribution, will be considered in future work.