A multi-energy load prediction of a building using the multi-layer perceptron neural network method with different optimization algorithms

Since cooling and heating loads are recognized as key characteristics for evaluating the energy efficiency of buildings, they must be predicted and analyzed for residential structures. Accordingly, the multi-layer perceptron neural network is applied for predicting the heating and cooling loads using the experimental dataset. The dataset is obtained by monitoring the impact of the building's dimensions on energy consumption. To optimize the training process of the multi-layer perceptron neural network, several optimizers are employed. Besides, different statistical performance indicators are considered to see which selected optimizer outperforms the rest in terms of accuracy and authenticity. The obtained results emphasize the remarkable performance of adaptive chaotic grey wolf optimization, which can be used to train the multi-layer perceptron neural network and forecast the buildings’ energy consumption with the highest accuracy. According to the obtained results, the hybrid multi-layer perceptron neural network-adaptive chaotic grey wolf optimization method demonstrates the best performance. The optimum number of neurons in the hidden layer is found to be 15. Also, based on the statistical performance indicators, the selected method yields an R² of 0.9123 and 0.9419 for cooling and heating loads, respectively.

Introduction
vehicles, photovoltaic unit, and battery energy storage systems were considered. Besides, the efficient scheduling state of such units was specified considering the uncertainty of distributed energy resources. A model-based predictive control method was also proposed by Ferreira et al. (2012) to control the HVAC system in a building, and the energy consumption was reduced by 50%. In another study, the energy consumption of HVAC was precalculated to optimize the setpoint and deadband parameters of the HVAC system using a systematic technique (Ghahramani et al., 2016). Considering such studies, the total energy consumption of a building can be significantly reduced when the heating and cooling loads are predicted beforehand.
It should be noted that parameters such as temperature, sunlight, equipment, occupant behavior, wall materials, glazing area, surface, height, and volume of the building are important to specify the interactions between the general energy requirements of the building (Barkokebas et al., 2019; Ghahramani et al., 2016). For example, the study of Ihara et al. (2015) revealed that the façade properties, consisting of solar reflectance, U-value, and solar heat gain coefficient (SHGC), have a variety of impacts on the energy efficiency of buildings. The authors stated that the energy consumption is minimized when the SHGC is significantly reduced. Liu et al. (2019) took an opposing view and concluded that the SHGC reduction could not be the main reason for enhancing the energy efficiency of the building without a suitable U-value. Since the weather conditions, building dimensions, and the behavior of residents also influence the overall energy consumption, calculating the energy consumption is significantly difficult. Hence, an artificial neural network (ANN) is a promising way to specify the heating and cooling loads for forecasting the energy consumption in buildings.
ANN is one of the most significant artificial intelligence algorithms utilized in various applications. This approach was developed to simulate the architecture and activities of human brain cells during learning and thinking. These cells are known as neurons, and there are more than 50 billion of them in the average human brain (AlShabi and Assad, 2021). ANN approaches are widely used in applications involving renewable and sustainable energy, for example to estimate the performance of renewable energy systems. For instance, Rashidi et al. (2022) studied the thermal conductivity of EG-Water-based nanofluids containing alumina particles using the Multi-Layer Perceptron (MLP) neural network and the Group Method of Data Handling (GMDH) as two effective intelligent techniques. According to the obtained results, the coefficients of determination (R²) for the MLP neural network with tansig and radial basis functions and for GMDH were 0.9998, 0.9998, and 0.9996, respectively. In another application, ANN with two transfer functions, normalized radial basis and tansig, was utilized by Komeili Birjandi et al. (2022) to estimate CO2 emission from Southeast Asian countries. According to the results, a network with a normalized radial basis function and 11 neurons in the hidden layer yielded the most accurate model, with an R² of 0.9997.
Moreover, ANN models have gained popularity in the area of building energy forecasting owing to the strong results they deliver. Neto and Fiorelli (2008) indicated that the energy consumption could be accurately predicted through both the simulation method and ANN; however, using ANN reduces the computational cost and is faster. There exists a considerable body of literature on using the neural network method in this regard. To mention a few, Biswas et al. (2016) showed how ANN models might be developed and validated in the TxAIRE Research homes. The number of days, temperature, and solar radiation were the input factors taken from the home data, whereas house energy use and heat pump energy use were the output variables. Promising findings with R² in the range of 0.87-0.91 were obtained using the Levenberg-Marquardt and OWO-Newton algorithms. Li et al. (2018) proposed a hybrid teaching-learning artificial neural network to predict the main parameters for electrical energy. Lee et al. (2019) examined changes in energy consumption using ANN to forecast user-based energy consumption and presented acceptable results, which imply a relationship between the characteristics and energy usage. Also, Ilbeigi et al. (2020) attempted to introduce a trustworthy method for optimizing the energy consumption of buildings. To simulate the energy consumption, a robust ANN based on the MLP model was created, trained, and tested. Wang et al. (2021) used a deep convolutional neural network to predict the building load and presented remarkable results. Notably, a novel feature fusion was presented to improve the learning ability of the model. Despite such interest, there are still many concerns regarding the accuracy and ability of ANNs since the training process is not always efficient.
To tackle this problem, optimizers can be used to optimize the training procedure.
A large number of existing studies have examined the capabilities of various optimizers for training ANNs in the field of energy consumption prediction. For instance, the optimization of heating, HVAC system operations, and other building parameters using ANN and a multi-objective genetic algorithm (MOGA) was examined by Satrio et al. (2019). The aim was to minimize the annual energy consumption and maximize thermal comfort. Notably, the two-chiller system operation of a building was optimized through the applied techniques, and acceptable results were presented. Furthermore, Chou and Truong (2021) used nonlinear machine learning models and specified the historical pattern of regional energy consumption to propose a new prediction system that optimizes the linear time series. Also, the capabilities of machine learning models in predicting the HL and CL of a building were exploited by Sauer et al. (2021). Zhou et al. (2021) evaluated the energy performance of buildings using the multi-layer perceptron neural network modified by a teaching-learning-based metaheuristic method. As a result, the proposed method's prediction error decreased by around 20%, and the correlation between measured and predicted cooling loads increased from 0.8875 to 0.9207. To boost energy efficiency without deteriorating comfort, Ruiz et al. (2018) proposed Elman neural networks with the genetic algorithm for forecasting energy demand in public buildings. According to the obtained results, an average improvement of 61% was obtained.

Main contributions and novelties
Using energy prediction tools is important to improve decision-making when it comes to reducing the amount of energy used in buildings since they can analyze a wide range of building designs and methodologies. However, several important factors contribute to a building's energy consumption, including weather conditions, the behavior of its inhabitants, and the installed technology and equipment. This makes energy prediction a difficult scientific problem to solve. Nevertheless, progress has been made in the area of energy consumption evaluation in sustainable buildings. Artificial Intelligence-based methods have recently captivated the interest of scientists and have been applied in a broad variety of applications.
Despite the significance of the above-mentioned reviewed papers, there are some research gaps:
• The reviewed studies do not concentrate on the impact of "building characteristics" on energy efficiency. Various features/characteristics, including temperature, weather, and date, have been used to estimate energy consumption. However, few studies on building energy consumption prediction concentrate on the specifics of the building (surface area, wall area, roof area, relative compactness, overall height, orientation, glazing area, and glazing area distribution), which are the most important aspects to take into account during the design phase.
• The reviewed studies do not pay much attention to the categorization/combination of prediction techniques and hybrid optimization algorithms. The current work aims at giving a new combination of prediction techniques and optimizers with thorough descriptions, statistics, and comparisons of the used approaches.
• Existing works do not offer a mapping between energy consumption forecast methodologies and appropriate performance indicators in buildings.
To bridge these gaps, this research aims at using different optimization approaches to improve the multi-layer perceptron neural network in order to forecast the heating and cooling loads of residential buildings. Surface area, wall area, roof area, relative compactness, overall height, orientation, glazing area, and glazing area distribution are the eight independent parameters employed in this research. Test and training data are also evaluated to demonstrate the dependability and accuracy of the findings. Therefore, the MLP neural network is considered for the process of prediction, in which genetic algorithm (GA), multi-verse optimizer (MVO), chaotic grey wolf optimization (CGWO), adaptive chaotic grey wolf optimization (ACGWO), augmented grey wolf optimizer (AGWO), grey wolf optimizer (GWO), and particle swarm optimization (PSO) are employed to optimize the hyperparameters of the MLP. Additionally, standard error (SE), mean squared error (MSE), root mean squared error (RMSE), mean absolute percent error (MAPE), mean absolute error (MAE), relative absolute error (RAE), correlation coefficient (R), coefficient of determination (R²), and normalized mean squared error (NMSE) were used as the statistical performance indicators to see which model is the best in terms of accuracy, speed, and authenticity.

Methodology
The methods employed to analyze the problem are described in this section. The heating and cooling loads are predicted based on the experimental data presented in this paper. For this purpose, seven meta-heuristic algorithms, namely GA, MVO, CGWO, ACGWO, AGWO, GWO, and PSO, are considered to optimize the MLP neural network and are explained here. Notably, a statistical analysis is also conducted to identify the best prediction technique among the selected ones.

Multi-layer perceptron (MLP) neural network
Artificial neural networks (ANNs) have received much attention due to their remarkable applications for modeling engineering parameters and are widely used to solve complicated problems. ANNs are popular due to their striking features such as adaptive learning, self-organization, real-time operation, generalization, stability and flexibility, and parallel processing. ANN is mainly used for function estimation, prediction, pattern recognition, control, and so on (Bui et al., 2018). A multi-layer perceptron (MLP), which is a class of feedforward artificial neural network (ANN), is employed in the present work to specify the unknown parameters of HL and CL. The aim here is to predict the energy consumption in buildings and minimize it significantly according to the values of HL and CL. Accordingly, the MLP is utilized to approximate the nonlinear function with the highest accuracy and identify the system's unknown parameters.
However, the outputs are sometimes not as accurate as expected, and in this case the training process needs to be improved using optimizers. Hence, several optimizers, explained in the following, are used in this study to reach the best parameters and examine the HL and CL.
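To make the link between the MLP and the optimizers concrete, the following minimal sketch shows one common way to expose a one-hidden-layer MLP to a meta-heuristic: all weights and biases are packed into a single flat vector, and the optimizer minimizes the mean squared error of the resulting network. The layer sizes (8 inputs, 15 hidden neurons, 2 outputs) follow the text; the tanh hidden activation, the linear output layer, the parameter layout, and the names `mlp_forward` and `mse_fitness` are illustrative assumptions, not the authors' implementation.

```python
import math

# Sizes taken from the text: eight inputs, 15 hidden neurons, two outputs (HL, CL).
N_IN, N_HIDDEN, N_OUT = 8, 15, 2


def n_params(n_in=N_IN, n_hidden=N_HIDDEN, n_out=N_OUT):
    """Number of weights and biases in a one-hidden-layer MLP."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)


def mlp_forward(theta, x, n_in=N_IN, n_hidden=N_HIDDEN, n_out=N_OUT):
    """Forward pass. theta is a flat list: for each neuron, its weights then its bias.

    Assumed (hypothetical) architecture: tanh hidden layer, linear output layer.
    """
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        w = theta[pos:pos + n_in]
        b = theta[pos + n_in]
        pos += n_in + 1
        hidden.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b))
    out = []
    for _ in range(n_out):
        w = theta[pos:pos + n_hidden]
        b = theta[pos + n_hidden]
        pos += n_hidden + 1
        out.append(sum(wi * hi for wi, hi in zip(w, hidden)) + b)
    return out


def mse_fitness(theta, X, Y):
    """Mean squared error over all samples and outputs; the value an optimizer minimizes."""
    total, count = 0.0, 0
    for x, y in zip(X, Y):
        for p, t in zip(mlp_forward(theta, x), y):
            total += (p - t) ** 2
            count += 1
    return total / count
```

Any of the optimizers discussed below can then search over vectors of length `n_params()` and call `mse_fitness` as the objective.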

Optimization methods
The optimization methods considered to specify the best parameters of the CL and HL and train the MLP neural network are as follows: GA, MVO, CGWO, ACGWO, AGWO, GWO, and PSO.
As shown in Figure 1, the optimization process is conducted for eight inputs, namely relative compactness, surface area, wall area, roof area, overall height, glazing area, orientation, and glazing area distribution. Also, cooling and heating loads are considered the outputs in this analysis.
According to Figure 1, the input and output data are first pre-processed and normalized, and then the network structure is determined. After that, the activation function, parameters, hidden layers, maximum number of neurons in the hidden layer, and the performance function are selected. Then, the network is trained with each of the selected algorithms. Notably, the best algorithm in terms of accuracy and speed is identified through the statistical analyses.
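The pre-processing step can be illustrated with a short sketch. The text only states that the data are normalized, so min-max scaling to [0, 1] is an assumption here; the 75% training share is the one reported in the data collection section. The function names are hypothetical.

```python
import random


def min_max_fit(rows):
    """Return per-column minima and maxima of a list of equal-length rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_apply(rows, lo, hi):
    """Scale each column to [0, 1]; constant columns are mapped to 0.0."""
    return [
        [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
        for row in rows
    ]


def train_test_split(rows, train_frac=0.75, seed=0):
    """Shuffle the rows reproducibly and split them into training and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * len(rows)))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]
```

In practice the scaling bounds would be fitted on the training part only and then applied to the test part, so that no test information leaks into training.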
Genetic algorithm and particle swarm optimization. The genetic algorithm has a vast range of applications for optimizing the training process in the MLP neural network. Due to the remarkable ability to create high-quality solutions for optimization and search problems, this algorithm is employed here, relying on biologically inspired operators such as mutation, crossover, and selection (Gerges et al., 2018).
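The three operators named above (selection, crossover, and mutation) can be sketched as a small real-coded genetic algorithm. This is a generic illustration under stated assumptions, not the configuration used in the study: binary tournament selection, uniform crossover, Gaussian mutation, one elite individual, and all default parameter values are hypothetical choices.

```python
import random


def genetic_algorithm(fitness, dim, pop_size=30, generations=50,
                      bounds=(-1.0, 1.0), mutation_rate=0.1, seed=0):
    """Minimize `fitness` over real vectors of length `dim`."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        history.append(fitness(ranked[0]))
        new_pop = [ranked[0][:]]  # elitism: the best individual always survives
        while len(new_pop) < pop_size:
            # Selection: binary tournament, the fitter of two random individuals wins.
            p1 = min(rng.sample(pop, 2), key=fitness)
            p2 = min(rng.sample(pop, 2), key=fitness)
            child = []
            for g1, g2 in zip(p1, p2):
                # Crossover: each gene is copied from one parent at random.
                gene = g1 if rng.random() < 0.5 else g2
                # Mutation: occasional Gaussian perturbation, clipped to the bounds.
                if rng.random() < mutation_rate:
                    gene = min(hi, max(lo, gene + rng.gauss(0.0, 0.1 * (hi - lo))))
                child.append(gene)
            new_pop.append(child)
        pop = new_pop
    best = min(pop, key=fitness)
    return best, fitness(best), history
```

For MLP training, `fitness` would be the network error evaluated for a candidate weight vector, and `dim` the number of weights and biases.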
Moreover, PSO has drawn considerable attention from researchers for efficiently identifying unknown parameters and optimizing the network training process (Askarzadeh and Rezazadeh, 2011; Sedighizadeh et al., 2011). PSO emulates the movements of birds and starts from a set of initial solutions (Takagi and Sugeno, 1985). As the name of this optimizer implies, the swarm is a population represented by the set of particles. The particles of the swarm search the space to find the best location, that is, the most efficient solution. Accordingly, each particle of the swarm constantly corrects its location. In Eqs. 1 and 2, Gbest and Pbest denote the best swarm location and the best particle location, respectively (Diab and Rezk, 2017; Mohamed et al., 2019).
According to the equations above, k is the iteration, and a particle i changes its position in a dimension d from location P(k) to location P(k + 1). Besides, the inertia weight is indicated by w, and c1 and c2 are the two scaling constants that determine the weights of the local experience and the global experience. Furthermore, r1 and r2 are two random parameters that vary in the closed range [0, 1] and account for the stochastic behavior of the optimizer.
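The symbols defined above match the standard PSO update, in which the velocity is V(k + 1) = w V(k) + c1 r1 (Pbest - P(k)) + c2 r2 (Gbest - P(k)) and the position is P(k + 1) = P(k) + V(k + 1). The sketch below implements this textbook form; it is assumed, not confirmed, to correspond to Eqs. 1 and 2, and the default values of w, c1, c2, the swarm size, and the iteration count are hypothetical.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one in-place velocity and position update to every particle."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = rng.random(), rng.random()  # stochastic factors
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])  # local experience
                                + c2 * r2 * (gbest[d] - positions[i][d]))    # global experience
            positions[i][d] += velocities[i][d]
    return positions, velocities


def pso_minimize(fitness, dim, n_particles=20, n_iter=50, bounds=(-1.0, 1.0), seed=0):
    """Minimize `fitness`; returns the best location found and its fitness."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        pso_step(pos, vel, pbest, gbest, rng=rng)
        for i, p in enumerate(pos):
            f = fitness(p)
            if f < pbest_fit[i]:          # particle improved on its own best
                pbest[i], pbest_fit[i] = p[:], f
                if f < gbest_fit:         # and on the swarm's best
                    gbest, gbest_fit = p[:], f
    return gbest, gbest_fit
```

A particle that already sits on both Pbest and Gbest with zero velocity does not move, which is a quick sanity check on the update rule.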
Multi-verse optimizer. The current paper has also exploited the MVO as a new stochastic population-based algorithm to optimize the training process of the MLP neural network (Mirjalili et al., 2016). This algorithm is inspired by the multi-verse theory in physics. Notably, the MVO is based on three concepts of the multi-verse theory, namely the white hole, the black hole, and the wormhole, which are mathematically modeled to construct this optimizer.
Grey wolf optimizer and chaotic grey wolf optimization. Many studies emphasize the benefits of GWO, which is inspired by grey wolves (Canis lupus) (Mirjalili et al., 2014). As shown in Figure 2, the leadership hierarchy and hunting mechanism of grey wolves in nature form the basis of this optimizer. Grey wolves, which belong to the Canidae family, are regarded as apex predators at the top of the food chain. These animals mainly tend to live in packs of 5-12 wolves on average. According to Figure 2, they follow a very strict dominance hierarchy. The leaders of the pack, a male and a female, are called alphas and are responsible for decisions about hunting, sleeping place, wake-up time, etc. Hence, the members of the pack are supposed to follow the alphas' orders. The beta category comprises the subordinate wolves, male or female, that aid the alphas in decision-making and other pack activities. The omega group includes the lowest-ranked wolves, which must always follow the rules and mainly eat the leftovers (Mech, 1999). Lastly, the delta group comprises the wolves that are not alpha, beta, or omega. Based on the study of Muro et al. (2011), the major steps of grey wolf hunting can be described as follows:

• Tracking, pursuing, and coming close to the prey.
• Chasing, surrounding, and harassing until the prey stops running.
• Attacking the prey.

Parameter                                   Value
Size of the initial population in GWO       50
Number of GWO iterations                    50
Range of scale value (k)                    0 < k < 1
r1 and r2                                   dynamic
Range of r1 and r2                          0 < r1 < 1, 0 < r2 < 1
Number of nests (CS)                        15
Number of iterations in CS                  15
Pa (probability of nest rebuilding)         dynamic
According to the aforementioned explanations, the hunting technique and the social hierarchy of grey wolves are mathematically modeled for designing GWO and conducting optimization. The GWO algorithm, whose pseudocode is provided in Figure 3, can in principle solve optimization problems (Mirjalili et al., 2014). This algorithm is employed in the present paper since it can solve complicated function optimization and engineering problems (Yu et al., 2016). Although GWO has an acceptable convergence rate, it cannot always find the global optimum efficiently, which considerably affects the convergence rate of the algorithm (Kohli and Arora, 2018). To tackle this problem, the CGWO can be considered, which is obtained by adding chaos to the GWO. Chaos is a deterministic, random-like behavior found in nonlinear dynamical systems; it is non-periodic, non-converging, and bounded. Accordingly, a variety of chaotic maps with different mathematical equations can be employed. Hence, the CGWO is able to explore the search space more dynamically and globally during the optimization process.
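As a rough illustration of how the hierarchy and hunting steps translate into an update rule, the sketch below follows the commonly published GWO formulation (Mirjalili et al., 2014): the three best wolves (alpha, beta, delta) guide the pack, and the control parameter a decreases linearly from 2 towards 0. It is a minimal sketch, not the pseudocode of Figure 3; the default population size and iteration count (50 and 50) are those listed in the parameter table, while the bounds, the clipping to the bounds, and the function name are assumptions.

```python
import random


def gwo_minimize(fitness, dim, n_wolves=50, n_iter=50, bounds=(-1.0, 1.0), seed=0):
    """Minimize `fitness` with a basic grey wolf optimizer (needs n_wolves >= 3)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_fit = None, float("inf")
    for t in range(n_iter + 1):
        # Rank the pack; copies are taken because wolves are updated in place below.
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = ranked[0][:], ranked[1][:], ranked[2][:]
        alpha_fit = fitness(alpha)
        if alpha_fit < best_fit:
            best, best_fit = alpha[:], alpha_fit
        if t == n_iter:
            break
        a = 2.0 * (1.0 - t / n_iter)  # linearly decreasing control parameter
        for w in wolves:
            for d in range(dim):
                x_sum = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a    # |A| > 1 favors exploration
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])     # distance to the leader
                    x_sum += leader[d] - A * D
                # New position: average of the three leader-guided moves (clipped).
                w[d] = min(hi, max(lo, x_sum / 3.0))
    return best, best_fit
```

A chaotic variant would replace some of the `rng.random()` draws with values generated by a chaotic map; which map and which draws are replaced is not specified in the text, so that step is left out here.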
Augmented grey wolf optimizer. In spite of the striking benefits of GWO, including simplicity, flexibility, and globalism, the augmented GWO (AGWO) has a better performance in hunting (Qais et al., 2018). This algorithm, which is more useful for problems with a low number of search agents, such as electric power system applications, raises the possibility of exploration over exploitation by correcting the behavior of the control parameter and the position updating. Figure 4 illustrates the convergence factor (a) against the number of iterations for the linear regressive and nonlinear regressive curves. Accordingly, the highest value of a and the number of iterations are 2 and 100, respectively.
Adaptive chaotic grey wolf optimization. The fundamental GWO algorithm also has several shortcomings, including that it easily falls into local optima and converges slowly in the later phase of the search. To deal with such drawbacks, an ACGWO capable of tackling the uncertainties is considered in this paper. The flowchart of the algorithm proposed here is outlined in Figure 5.

Statistical analysis
The main statistical parameters, including SE, MSE, RMSE, MAPE, MAE, RAE, R, R², and NMSE, are considered to identify the best algorithm in terms of accuracy and authenticity. More details regarding these performance criteria and their defining equations are presented in previous studies (Bui et al., 2018; Chou and Bui, 2014; Chou et al., 2016). In Eq. (7), the number of instances and the number of outputs are indicated by N and N_out, respectively. It should be noted that y and ŷ are the actual output and the output predicted by the ANN, respectively (Bui et al., 2020).
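For orientation, the sketch below computes several of these indicators for a single output using their widely used textbook definitions. The exact forms adopted in the paper may differ (for example, R² is sometimes defined as the square of R), and SE and NMSE are omitted because their definitions vary between sources. The function name and the dictionary keys are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Common error and correlation indicators for one output variable.

    Assumes non-constant y_true and y_pred and no zero targets (needed for MAPE).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)     # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
        "RAE": sum(abs(e) for e in errors) / sum(abs(t - mean_t) for t in y_true),
        "R": cov / math.sqrt(ss_tot * ss_pred),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Lower values of the error indicators and values of R and R² closer to 1 indicate a more accurate model, which is the ranking logic used in the results section.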

Data collection
The total dataset considered in this paper includes 770 samples, of which 75% are used for the training process, and the rest are allocated for testing (Tsanas and Xifara, 2012). Using the Ecotect simulation software, this dataset was obtained from twelve building types, each composed of 18 simple cubes (3.5 m × 3.5 m × 3.5 m); their shapes are shown in Figure 6. Accordingly, the buildings have the same volume but different surface areas and dimensions. The simulated buildings were residential buildings in Athens, Greece. Considering this dataset, the impact of dimension on the CL of a building was examined. The material properties of the façade are the same for all twelve buildings, with U-values of 1.78 W/m²K for the walls, 2.26 W/m²K for the windows, 0.86 W/m²K for the floors, and 0.50 W/m²K for the roofs. Notably, values of 300 lux and 2 W/m² were set for the lighting level and latent heat, respectively. Eight features, namely the relative compactness (RC), surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution, were considered for simulating the CL. Accordingly, the RC is computed using Eq. 8 as follows (Bui et al., 2020):
$$RC = \frac{6V^{2/3}}{A} \tag{8}$$
According to Eq. (8), the volume and surface area of the building are denoted by V and A.
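A short numerical sketch may help to see how the relative compactness behaves. It assumes the commonly used definition RC = 6V^(2/3)/A, under which a perfect cube has RC = 1 and less compact shapes of the same volume have smaller values. The building volume follows from the 18 cubes of side 3.5 m described above; the surface area used in the example is a hypothetical value chosen for illustration.

```python
def relative_compactness(volume, surface_area):
    """RC = 6 * V^(2/3) / A, assuming the common definition (a cube gives RC = 1)."""
    return 6.0 * volume ** (2.0 / 3.0) / surface_area


# Volume shared by all building shapes: 18 elementary cubes of side 3.5 m.
BUILDING_VOLUME = 18 * 3.5 ** 3  # 771.75 cubic meters

# Hypothetical surface area (square meters) of a fairly compact arrangement.
example_rc = relative_compactness(BUILDING_VOLUME, 514.5)
```

Because the volume is fixed, RC varies only with the surface area, so it is strongly tied to the surface area feature in this dataset.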
The building was simulated according to the experiments, considering a system with glazing and a system without glazing. Three glazing-to-floor area ratios, namely 10%, 25%, and 40%, are employed in the glazing system. In this situation, five glazing distributions are examined, including 25% glazing on each face; 55% on the north face and 15% on each of the other faces; 55% on the east face and 15% on each of the other faces; and 55% on the west face and 15% on each of the other faces (Tsanas and Xifara, 2012). Lastly, all building shapes were rotated to four orientations: north, south, east, and west. The necessary information regarding the selected parameters is highlighted in Table 1.

Results and discussion
The main results obtained in this study are presented and discussed in this section. The selected optimization methods illustrated in the second section are applied to train the MLP neural network, and the obtained results are analyzed.

Neuron number selection
In the MLP neural network, the number of neurons selected for the CL and HL parameters plays a prominent role in the accuracy of the final results. Accordingly, the accuracy achieved with eight, fifteen, twenty, twenty-four, and thirty neurons is examined here. According to Figures 7 to 9, fifteen neurons are the best choice for the analysis of this study. As shown in Figures 7 and 8, the accuracy generally increases as the number of neurons rises. However, this trend does not continue beyond fifteen neurons, and the data are not well organized when the number of neurons is higher than fifteen. In this case, the accuracy is not desirable, and the computational cost is high as well. Besides, the values of R² are generally high, as shown in Figure 9.
Figure 10 presents the statistical results of RMSE for the HL and CL parameters based on the selected optimization methods. Accordingly, the ACGWO method has the best performance since its RMSE value is lower than the rest and accounts for 10% of the whole. On the other hand, the genetic algorithm (GA) is the weakest optimization technique in terms of the RMSE performance criterion. Besides, CGWO has an acceptable performance for the CL parameter, second only to ACGWO. Notably, the performance of PSO, MVO, and AGWO for the HL parameter is also considerable.
Furthermore, the results shown in Figure 11 imply that ACGWO has the best performance compared to the others. The results of the RAE, SE, and MAE performance criteria are drawn for the test data, training data, and total data based on HL and CL. Accordingly, there is a considerable gap between the values of these parameters obtained for GA and the values obtained for the other optimizers. As a result, GA has the weakest performance and is not suitable for optimizing the MLP neural network. The values of RAE, SE, and MAE are low for MVO, CGWO, AGWO, GWO, and PSO, which means that their performance is acceptable, especially for examining CL.
The results of R, R², MSE, and MAPE are also indicated in Figure 12, according to which ACGWO has the best performance, whereas GA deviates significantly from the rest, which is not desirable.
As shown in Figure 13, AGWO converges fastest and shows a value of 0.238 s based on 157 iterations. Notably, the lowest convergence rate belongs to ACGWO, with 3000 iterations at 0.201 s.
According to Figures 14 to 16, since the values of R and R² are higher for ACGWO than for the rest, this optimizer is more beneficial and is likely to predict the energy consumption with the highest accuracy. It is also noteworthy that the error rate of this optimizer is considerably low, a result that is not observed for the other optimizers.
In general, the statistical results highlighted in Tables 2 and 3 emphasize the remarkable capabilities of ACGWO for optimizing the training process of the MLP neural network. The values of R and R² are significantly high for ACGWO, while the opposite holds for GA. Table 3 ranks the selected optimizers in order of accuracy, according to which GA has the weakest performance and ACGWO is strikingly accurate.

Conclusion
In summary, the issue of energy consumption has been comprehensively examined by calculating the HL and CL of a building. Using the MLP neural network for a dataset, the heating load (HL) and cooling load (CL) were calculated for the building. Since the standard training process of the MLP network does not always yield optimal results, various optimizers have been considered, including GA, MVO, CGWO, ACGWO, AGWO, GWO, and PSO. In addition, a statistical analysis was conducted to find the optimizer that obtains the most accurate results. For this purpose, the performance criteria, namely SE, MSE, RMSE, MAPE, MAE, RAE, R, R², and NMSE, were analyzed in this study. Although MVO, CGWO, AGWO, GWO, and PSO turned out to be practical for the optimization process, ACGWO outperforms the rest according to the statistical analyses. On the other hand, the performance of GA was not acceptable since it has the lowest values of R and R² and the highest values of SE, MSE, RMSE, MAPE, MAE, and RAE compared to the other selected optimizers. Notably, AGWO converges fastest and shows a value of 0.238 s based on 157 iterations. Overall, ACGWO is introduced as the best optimizer in terms of accuracy and authenticity for predicting energy consumption. Future research is needed to apply and test other methods and parameters. According to the obtained results, MLP-ACGWO demonstrates the best performance when the number of neurons in the hidden layer is 15. In this condition, for the total dataset, the coefficient of determination (R²) is 0.94198 and 0.9123 for the heating load and cooling load prediction, respectively.