A comprehensive comparative analysis of machine learning models for predicting heating and cooling loads

The continuous increase in energy consumption has drawn worldwide attention to its significant environmental effects, which are driven by the increase in greenhouse gas emissions, global warming, and rapid climate change. As such, more energy-efficient buildings are required to minimize the energy consumed for heating and cooling. The present study introduces a set of machine learning-based models to predict the heating and cooling loads in buildings: back-propagation artificial neural network, generalized regression neural network, radial basis neural network, radial kernel support vector machines and ANOVA kernel support vector machines. The models are compared in terms of mean absolute percentage error (MAPE), mean absolute error (MAE) and root-mean-square error (RMSE). Finally, the statistical significance of the differences between the machine learning models is evaluated using two-tailed Student's t-tests. Results demonstrate that the radial basis neural network outperformed the other machine learning models.


Introduction
The increase in energy consumption contributes significantly to the increase in greenhouse gas emissions, which in turn amplifies the implications of climate change (United Nation, 2005). This increase is associated with the growth of urbanization and industrialization in both developing and developed countries. As this growth continues, energy consumption will rise along with its impact. It is widely acknowledged that the building sector is a significant contributor to global energy consumption. Buildings (e.g. residential, commercial and public places) require around 2 billion Tons Oil Equivalent (TOE) of fuel, which is about 31% of fuels for global energy use. Buildings also consume 0.84 billion TOE in electricity and heating, which is about 46% and 51% for energy use. The consumption of the building sector in developing countries is about 20%-25%, while it is about 30%-40% in developed countries (Akande et al., 2015; Al-Sakkaf et al., 2019). The building industry accounts for 32% of global energy consumption, more than a third of global material resource consumption, and 12% of all fresh water use; it also contributes an estimated 40% of global solid waste generation and 40% of CO₂ emissions. Hence, sustainable buildings are crucially needed to help decrease greenhouse gas (GHG) emissions and their related side effects, which in turn helps to reduce air pollution, improve occupants' health and quality of life, increase productivity, employment rates and new business opportunities, improve social welfare and poverty alleviation, and increase energy security.
Consequently, understanding the sustainability of buildings is vital. This highlights the need for energy prediction models to assess the sustainability of buildings, to reduce their harmful impacts on the environment, and to encourage facility managers and investors to improve the performance of their buildings by taking economic, environmental and social aspects into consideration (Ürge-Vorsatz et al., 2007; Mahmoud et al., 2019; Al-Sakkaf et al., 2019).

Literature review
Energy performance of a building was defined by Poel et al. (2007) as "the amount of energy actually consumed or estimated to meet the different needs associated with a standardized use of the building". According to them, this amount is reflected in one or more calculated numerical indicators. They also pointed out that energy performance is influenced by other characteristics, namely: insulation, technical and installation characteristics, design and positioning in relation to climatic aspects, solar exposure and influence of neighboring structures; the building's own energy production; and other factors such as indoor climate. Radhi et al. (2013) presented a model to evaluate the impact of climate interactive facade systems (CRFS) on cooling energy in fully glazed buildings. Their research combined building energy simulation and computational fluid dynamics to determine the boundary conditions and to develop geometrical models based on a newly constructed multi-storey building. Mousa et al. (2016) presented an approach to analyze and visualize building carbon emissions by incorporating both building information modeling and carbon estimation models. The proposed approach provides a graphical representation of data, which allows facility managers to make informed decisions. Attar et al. (2013) utilized the TRNSYS simulation platform to model and evaluate the performance of a solar water heating system (SWHS) used for greenhouses under Tunisian weather conditions. They pointed out that the stored solar energy alone cannot meet the total heating requirements. Hence, it is necessary to use an auxiliary heating system such as a fuel boiler or electric energy. Fumo and Biswas (2015) developed a multiple regression analysis-based model to predict the whole-building energy consumption in single-family homes. They highlighted that the time interval of the observed data plays a very important role in improving the prediction capacity of the regression model.
Li et al. (2017) presented a hybrid simulation-optimization approach to minimize the CO₂ emissions of on-site construction processes in cold regions. They concluded that optimizing labor allocation can reduce on-site construction emissions by 21.7%. Wong et al. (2013) presented a virtual prototype-based model to predict and simulate carbon emissions of construction projects. They pointed out that the developed visualization model provides an interactive tool for decision makers to manage construction projects. Fahmy et al. (2014) evaluated three different external wall specifications for three climatic zone scenarios in Egypt based on energy consumption, energy cost and thermal comfort. The experiments were based on building performance simulations that take the thermal properties of materials into consideration. They demonstrated that the 10 cm GRC (C2) wall specification provided a more energy-efficient alternative than the single wall of half red-brick -Ct. Afsordegan et al. (2016) introduced a multi-criteria decision-making approach for sustainable energy planning. The weights of the nine attributes were obtained using fuzzy AHP (analytical hierarchy process). Seven energy alternatives were presented, and qualitative TOPSIS was utilized to provide a final ranking of the alternatives. They stated that qualitative TOPSIS provided the same final ranking as the modified fuzzy TOPSIS although it required less cognitive effort from the decision makers. Aranda et al. (2012) developed a multiple regression analysis-based model to forecast the energy consumption in the banking sector. They constructed three models. The objective of the first model was to forecast the energy consumption of the whole banking sector, while the objectives of the second and third models were to predict the energy consumption of the branches with low winter climate severity and high winter climate severity, respectively. They concluded that the first model was capable of predicting energy consumption of bank branches with good energy consumption and detecting inefficiencies in bank branches with poor energy consumption.
Hygh et al. (2012) developed a multivariate regression model to evaluate building energy performance in early design stages. They indicated that standardized regression coefficients (SRCs) can provide valuable information to designers regarding the relative impact of each parameter on heating and cooling loads. They highlighted that the proposed model provides architects with an assessment tool that gives rapid feedback based on changes to high-level design parameters. Marzouk et al. (2017) developed a building information modeling (BIM)-based model that enables the estimation of six types of environmental emissions during different project phases: greenhouse gases, sulfur dioxide, particulate matter, eutrophication particles, ozone depleting particles and smog. They concluded that the indirect emissions produced from the manufacturing phase and the offsite transportation phase constitute from 85% to 94% of the equivalent amount of greenhouse gases in the case of conventional materials. Abanda et al. (2013) presented a mathematical model to enhance the prediction of embodied energy, greenhouse gases, waste, and time-cost parameters of building projects. They concluded that the developed quantification model can facilitate the engagement into low carbon buildings. Alarcon et al. (2011) constructed a general value function to assess the sustainability of industrial buildings. They highlighted that the mathematical formulation of the universal function is flexible enough to incorporate the decision makers' preferences. Cuadrado et al. (2015) presented a methodology to calculate an environmental sustainability index for timber structures based on the Integrated Value Model for Sustainability Assessment (MIVES) concept. With the help of the developed model, decision makers can identify the most environmentally sustainable alternatives at the design phase.
Yildirim and Bilir (2017) evaluated the renewable energy option for meeting the total energy need of a greenhouse using solar photovoltaic panels. They concluded that the energy payback time of the system was 4.9 years. Moreover, they highlighted that the greenhouse gas payback time was 5.7 and 2.6 years based on natural gas and coal-based electricity generation, respectively. Marzouk and Mohammed Abdelkader (2019a) introduced a hybrid fuzzy multi-objective non-dominated sorting genetic algorithm II model to select the most sustainable materials of building components subject to time, cost and environmental constraints. They then utilized the TOPSIS technique to select the most feasible solution among the finite set of optimal solutions.

The proposed method
The primary objective of the present study is to develop a machine learning-based model to forecast the heating and cooling loads based on a set of main characteristics of residential buildings. The framework of the proposed model, which is composed of four modules, is described in Figure 1. The dataset used in the present study consists of 768 instances of energy simulation generated using the Ecotect energy analysis software. The dataset was published by the UCI machine learning repository (Asuncion & Newman, 2007) based on the work of Tsanas and Xifara (2012). It is worth mentioning that the simulated energy is generated from different buildings of different surface areas and dimensions. The input variables utilized to predict the heating and cooling loads are: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution. The numbers of possible values of these eight variables are 12, 12, 7, 4, 2, 4, 4 and 6, respectively.
Five machine learning models are constructed to forecast heating and cooling loads, such that a separate prediction model is constructed for each of the two loads. These models are: back-propagation artificial neural network, generalized regression neural network, radial basis neural network, radial kernel support vector machines and ANOVA kernel support vector machines. The performance of these models is evaluated using split validation based on mean absolute percentage error, mean absolute error and root-mean-square error. Finally, two-tailed Student's t-tests are performed to evaluate the significance level of the outcome of the machine learning models (Mohammed Abdelkader et al., 2019b).

Model development
This section describes some of the models and algorithms presented in the "Proposed Method" section.

Back-propagation Artificial Neural Network
A neural network can be defined as a parallel distributed processing structure composed of an input layer, an output layer, and one or more hidden layers whose neurons are interconnected. Each neuron receives one or more inputs and produces an output through an activation function. Each neuron in the hidden layer receives a signal from all the neurons of the input layer, which is equal to the weighted sum of all the inputs entering the neuron. There is a weight for each connection between neurons. The most common transfer or activation function is the sigmoid function, which can be calculated using Eq. (1) (Mohammed Abdelkader et al., 2019c; Wang et al., 2015).
$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
where x represents the weighted sum of all inputs entering the hidden neuron.
The input of the neurons in the output layer is also transformed using the sigmoid activation function. The error function at the output neurons should be minimized, and it can be calculated using Eq. (2).
$$E(W) = \frac{1}{2} \sum_{j} \left( d_j - O_j \right)^{2} \tag{2}$$
where E(W) represents the error function, and d_j and O_j represent the actual and predicted values, respectively.
Based on the gradient descent algorithm, the weights are adjusted during each training epoch (k) according to Eq. (3): the partial derivative of the error is computed during each training epoch and the weights are then updated as per this derivative and the learning rate (Yu & Xu, 2014).
$$\Delta W(k) = -\eta \, \frac{\partial E}{\partial W(k)}, \qquad W(k+1) = W(k) + \Delta W(k) \tag{3}$$
where ΔW(k) represents the adjustment or increment in the weights (weight updates), W(k+1) and W(k) represent the new (updated) and current (old) weights, respectively, η depicts the learning rate, and ∂E/∂W(k) represents the partial derivative of the error with respect to the weights.
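As an illustration of the sigmoid activation, the squared-error function and the gradient-descent weight update described above, the following is a minimal sketch and not the implementation used in the study. It assumes a single hidden layer, a single output neuron, per-sample updates, and a constant input of 1.0 acting as a bias; the function names `sigmoid`, `forward` and `train_step`, the toy sample and the learning rate are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic activation applied to a weighted sum."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for one input vector."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(x, d, w_hidden, w_out, eta):
    """Apply one gradient-descent update in place; return the error before the update."""
    hidden, o = forward(x, w_hidden, w_out)
    error = 0.5 * (d - o) ** 2
    # Derivative of the error with respect to the net input of the output neuron.
    delta_out = -(d - o) * o * (1.0 - o)
    for j, h in enumerate(hidden):
        # Back-propagate through the (not yet updated) output weight of neuron j.
        delta_hidden = delta_out * w_out[j] * h * (1.0 - h)
        for i, xi in enumerate(x):
            w_hidden[j][i] -= eta * delta_hidden * xi
        w_out[j] -= eta * delta_out * h
    return error


# Toy example (hypothetical data): one sample, target scaled into (0, 1).
random.seed(0)
x = [0.5, -0.2, 1.0]  # the last entry acts as a bias input
d = 0.8
w_hidden = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(2)]
errors = [train_step(x, d, w_hidden, w_out, eta=0.5) for _ in range(200)]
```

Because the output neuron uses a sigmoid, the targets in this sketch must be scaled into the interval (0, 1); a real heating or cooling load would need to be normalized first.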

Generalized regression neural network
Generalized regression neural network is a type of feed-forward neural network that is based on normalized radial basis functions and kernel regression. In a generalized regression neural network, a probabilistic function is applied to model the dependent variables in a regression problem. Due to its probabilistic nature, the generalized regression neural network does not face the problem of local minima entrapment that is encountered by other types of neural networks. It is composed of an input layer, a pattern layer, a summation layer and an output layer. The numbers of neurons in the input layer and output layer are equal to the dimensions of the input vector and output vector, respectively. The input variables are transmitted directly from the input layer to the pattern layer. The number of neurons in the pattern layer depends on the total number of observations in the training dataset. The Gaussian function is the most commonly utilized activation function. In the summation layer, the number of neurons is equal to the number of output neurons plus one. The summation layer contains two types of neurons: a single division unit and summation units. The spread parameter is a significant design parameter that heavily influences the recognition capacity of the generalized regression neural network. A spread that is too small can degrade the generalization of the network, while a spread that is too large can over-smooth the function approximation. In both cases, a poorly chosen spread parameter can heavily alter the network's performance (Lu et al., 2015; Modaresi et al., 2018).
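The role of the pattern layer, the summation layer and the spread parameter can be illustrated with a minimal sketch. It assumes the common kernel-regression form of the generalized regression neural network, in which each training observation is a Gaussian pattern neuron and the prediction is the kernel-weighted average of the training targets; the function name `grnn_predict` and the toy data are hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, spread=1.0):
    """Predict one output as a Gaussian-weighted average of the training targets."""
    # Pattern layer: one Gaussian neuron per training observation.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * spread ** 2))
        for xi in train_x
    ]
    # Summation layer: weighted sum of targets and plain sum of activations.
    numerator = sum(w * y for w, y in zip(weights, train_y))
    denominator = sum(weights)
    # Output layer: ratio of the two sums.
    return numerator / denominator


# Toy data (hypothetical): the effect of the spread on the prediction at x = 1.
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
narrow = grnn_predict([1.0], train_x, train_y, spread=0.1)   # follows the nearest sample
wide = grnn_predict([1.0], train_x, train_y, spread=100.0)   # approaches the overall mean
```

With a very small spread the prediction follows the nearest training target, while with a very large spread it approaches the mean of all targets, which is the trade-off described in the text. If a query point lies far from every training sample relative to the spread, the denominator can underflow to zero, so a practical implementation would guard against that case.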

Radial Basis Neural Network
Radial basis neural network is a type of feed-forward neural network that utilizes a supervised learning technique to model the input-output relationship. The radial basis neural network offers faster convergence, high reliability and smaller extrapolation errors when compared to the multi-layer perceptron. The architecture of the radial basis neural network is less complex than that of the multi-layer perceptron, and it is based on iterative function approximation and localized basis functions. It is composed of an input layer, a hidden layer with a non-linear activation function, and an output layer. The input layer collects the input information while the hidden layer performs the non-linear transformation of the input data. In the radial basis neural network, the Gaussian function is the activation function, which implies that the center and width of the activation function are the two parameters that heavily influence the network's performance. The weights of the neural network are adjusted by minimizing the mean squared error using the gradient descent algorithm (Vallabhaneni & Maity, 2011; Pinar et al., 2010).
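A minimal sketch of this architecture is given below. It assumes that the Gaussian centers and the width are fixed in advance and that only the output-layer weights (plus a bias) are trained by batch gradient descent on the mean squared error; the function names, the learning rate, the number of epochs and the toy data are hypothetical choices for illustration.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian hidden-layer activations for one input, plus a constant bias term."""
    feats = [
        math.exp(-sum((a - c) ** 2 for a, c in zip(x, ctr)) / (2.0 * width ** 2))
        for ctr in centers
    ]
    return feats + [1.0]


def train_rbf(train_x, train_y, centers, width, eta=0.1, epochs=2000):
    """Fit the output weights by batch gradient descent on the mean squared error."""
    phi = [rbf_features(x, centers, width) for x in train_x]
    weights = [0.0] * (len(centers) + 1)
    n = len(train_x)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for feats, y in zip(phi, train_y):
            err = sum(w * f for w, f in zip(weights, feats)) - y
            for j, f in enumerate(feats):
                grad[j] += 2.0 * err * f / n
        weights = [w - eta * g for w, g in zip(weights, grad)]
    return weights


def predict_rbf(x, centers, width, weights):
    """Linear output layer applied to the Gaussian hidden activations."""
    return sum(w * f for w, f in zip(weights, rbf_features(x, centers, width)))


# Toy data (hypothetical): the training points double as the Gaussian centers.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 4.0, 9.0]
centers = train_x
weights = train_rbf(train_x, train_y, centers, width=1.0)
mse = sum(
    (predict_rbf(x, centers, 1.0, weights) - y) ** 2 for x, y in zip(train_x, train_y)
) / len(train_x)
```

Keeping the centers and width fixed makes the training problem linear in the weights, which is one reason this family of networks tends to converge quickly; learning the centers and widths as well would require additional gradient terms.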

Support Vector Machines
Support vector machines (SVM) is a supervised learning technique that can be utilized in either classification or regression applications. Support vector machines were originally proposed by Cortes and Vapnik in 1995 for classification purposes (Kohestani & Hassanlourad, 2016). SVMs are capable of learning and modeling both linear and complex (non-linear) mapping functions. The simplest form of SVM is the linear SVM, which can be utilized for data that are linearly separable in the original space (Park et al., 2008). Support vector machines are sometimes called "Support Vector Regression" when used for regression purposes, in which case they apply principles similar to those of support vector machines for classification. Consider the following linear model, which infers the relationship between the response variable and one or more independent variables. The linear model is shown in Eq. (4).
$$y(x) = w^{T}\phi(x) + b \tag{4}$$
where φ(x) represents the non-linear mapping function that maps the training data into a high-dimensional linear feature space, w represents the normal vector to the regression hyperplane, and b indicates the bias term. x and y represent the input variables and output variables, respectively. The optimization problem is shown in Eq. (5).
$$\text{Minimize } J(w, e) = \frac{1}{2} w^{T} w + \gamma \, \frac{1}{2} \sum_{i=1}^{N} e_i^{2} \quad \text{subject to } y_i = w^{T}\phi(x_i) + b + e_i, \; i = 1, \dots, N \tag{5}$$
where γ is a regularization constant greater than zero, N indicates the number of training samples, and e_i represents the error variable. The solution of the primal problem is very difficult, so it is important to establish the Lagrangian and to derive the dual problem as shown in Eq. (6).
$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left( w^{T}\phi(x_i) + b + e_i - y_i \right) \tag{6}$$
where α_i represents the Lagrange multiplier and φ(x_i) is the mapping that defines the kernel function. According to Mercer's condition, the kernel function can be applied using Eq. (7), where it is equal to the inner product of two vectors.
$$K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j) \tag{7}$$
The support vector machines can be implemented for regression using Eq. (8).
$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \tag{8}$$
where α_i and b are the solution of the linear system obtained after defining the conditions of optimality. Any function that satisfies Mercer's condition can be used as a kernel function. Non-linear regression can be implemented using non-linear functions called "kernel functions". Non-linear classification is performed by mapping the data to a high-dimensional space where linear classification is feasible. The kernel function is implemented to calculate the inner product of two vectors in a high-dimensional space. The three most common kernel functions are the polynomial, radial basis, and sigmoid functions as shown in Eq. (9), Eq. (10) and Eq. (11), respectively.
$$K(x_i, x_j) = \left( x_i^{T} x_j + 1 \right)^{p} \tag{9}$$
$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right) \tag{10}$$
$$K(x_i, x_j) = \tanh\left( \alpha \, x_i^{T} x_j + \beta \right) \tag{11}$$
where σ, p, α, and β are adjustable kernel parameters. The proposed model investigates the application of the radial kernel function and the ANOVA kernel function to predict the future heating and cooling loads.
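To make the kernel functions and the kernel expansion used for regression more concrete, the following is a minimal sketch. The exact parameterization of each kernel is an assumption (the widely used textbook forms of the polynomial, radial basis and sigmoid kernels), and the function names, the multiplier values and the bias in the example are hypothetical; in practice the multipliers and bias come from solving the training problem.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, p=2):
    """Polynomial kernel of degree p (assumed form with a unit offset)."""
    return (dot(u, v) + 1.0) ** p


def radial_kernel(u, v, sigma=1.0):
    """Radial basis (Gaussian) kernel with width sigma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, alpha=1.0, beta=0.0):
    """Sigmoid (hyperbolic tangent) kernel with slope alpha and offset beta."""
    return math.tanh(alpha * dot(u, v) + beta)


def svr_predict(x, support_x, multipliers, bias, kernel):
    """Kernel expansion: weighted sum of kernel evaluations plus the bias term."""
    return sum(m * kernel(x, xi) for m, xi in zip(multipliers, support_x)) + bias


# Hypothetical multipliers and bias, used only to show the call pattern.
prediction = svr_predict([1.0, 2.0], [[1.0, 2.0]], [2.0], 0.5, radial_kernel)
```

Passing the kernel as an argument keeps the prediction code identical for every kernel, so swapping the radial kernel for another Mercer kernel only changes the function handed to `svr_predict`.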

Performance Indicators
The present study utilizes three performance indicators to compare the five machine learning models: root-mean-square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). RMSE, MAE, and MAPE can be calculated using Eq. (12), Eq. (13) and Eq. (14), respectively (Nazari et al., 2015; Ranjith et al., 2013).
$$RMSE = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left( O_i - P_i \right)^{2}} \tag{12}$$
$$MAE = \frac{1}{K} \sum_{i=1}^{K} \left| O_i - P_i \right| \tag{13}$$
$$MAPE = \frac{100\%}{K} \sum_{i=1}^{K} \left| \frac{O_i - P_i}{O_i} \right| \tag{14}$$
where O_i and P_i stand for the observed and predicted heating or cooling loads, respectively, and K indicates the number of observations.
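The three indicators can be computed with a few lines of code. The sketch below assumes the standard definitions of RMSE, MAE and MAPE (with MAPE expressed as a percentage); the function names and the small example values are hypothetical and are not taken from the study's results.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    k = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / k)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    k = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / k


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be non-zero."""
    k = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / k


# Hypothetical loads, chosen so the arithmetic is easy to check by hand.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
scores = (rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

For these values the absolute errors are 2, 2 and 0, so the MAE is 4/3, the RMSE is the square root of 8/3, and the MAPE is (20% + 10% + 0%) / 3 = 10%. RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free but undefined when an observed value is zero.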

Model implementation
The dataset comprises 768 observations, of which 614 data points are utilized for training while the remaining 154 data points are used for testing. A sample of the dataset required to build the heating and cooling load prediction models is shown in Table 1. The terms "X1", "X2", "X3", "X4", "X5", "X6", "X7" and "X8" stand for relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution, respectively. The terms "Y1" and "Y2" refer to the heating load and cooling load, respectively. As mentioned before, five machine learning models are developed to predict the heating and cooling loads. For the back-propagation feed-forward neural network, the number of hidden layers, number of hidden neurons and momentum coefficient are set to 4, 2 and 0.001, respectively. In the generalized regression neural network, the spread of the Gaussian activation function is set to 1. For the radial basis neural network, the maximum number of neurons in the hidden layer is set to 10 while the spread of the Gaussian activation function is set to 1. The gamma and convergence epsilon of the radial kernel support vector machines are 1 and 0.001, respectively. The gamma, degree and convergence epsilon of the ANOVA kernel support vector machines are 1, 2 and 0.001, respectively. A sample of 30 observations of the actual and predicted heating and cooling loads using the back-propagation feed-forward neural network and the ANOVA kernel support vector machines is presented in Figures 2 and 3, respectively. As shown in Figure 2, the back-propagation feed-forward neural network can serve as an efficient platform to predict the heating loads. However, the ANOVA kernel support vector machines failed to predict the cooling loads, as shown in Figure 3. The predicted values of the heating and cooling loads using the five machine learning models are described in Tables 2 and 3, respectively.
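The split of 768 observations into 614 training and 154 testing points corresponds to roughly an 80/20 partition. The following minimal sketch shows one way such a split could be produced; the random shuffling, the seed and the function name `split_indices` are assumptions, since the text does not state how the observations were assigned to each subset.

```python
import random


def split_indices(n_samples, n_train, seed=0):
    """Shuffle the row indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# 768 observations: 614 for training, the remaining 154 for testing.
train_idx, test_idx = split_indices(768, 614)
```

Using a fixed seed makes the partition reproducible, so every model is trained and tested on exactly the same observations, which is a prerequisite for comparing their errors pairwise.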
A comparative analysis between the different machine learning models is described in Tables 4 and 5 for the heating and cooling loads, respectively.
Two-tailed Student's t-tests were performed to evaluate the significance level of the machine learning models' outcome, with the significance level (α) set to 0.05. The performed Student's t-tests examine the null hypothesis (H0), which is that there is no significant difference between the capacities of the machine learning models. On the other hand, the alternative hypothesis (H1) assumes that there is a significant difference between the capacities of the machine learning models. If the P-value is less than the significance level, the null hypothesis is rejected in favor of the alternative hypothesis. If the P-value is greater than the significance level, the null hypothesis cannot be rejected. The paired two-tailed Student's t-tests of the machine learning models for heating and cooling loads prediction are depicted in Tables 6 and 7, respectively. As presented in Table 6, the P-values of the pairs (BPNN, GRNN), (BPNN, Radial SVM), (GRNN, RBNN), (GRNN, Radial SVM), (RBNN, Radial SVM) and (Radial SVM, ANOVA SVM) are less than 0.05, which means that the null hypothesis (H0) is rejected. Thus, there is a statistically significant difference between the models in each of these pairs. The P-value of the pair (BPNN, ANOVA SVM) is more than 0.05, which indicates that there is no statistically significant difference between these two machine learning models. In Table 7, the P-values of the pairs (BPNN, GRNN), (BPNN, Radial SVM), (BPNN, ANOVA SVM), (GRNN, RBNN), (GRNN, Radial SVM), (GRNN, ANOVA SVM), (RBNN, Radial SVM) and (RBNN, ANOVA SVM) are less than 0.05, which shows that there is a statistically significant difference between the models in each of these pairs.
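The decision rule described above can be illustrated with a minimal sketch of a paired two-tailed Student's t-test. It assumes that the two samples are paired values of two models on the same test observations (the text does not state which quantity was paired), and it obtains the P-value by numerically integrating the t density with Simpson's rule so that only the standard library is needed; all function names and the example numbers are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1.0 + t * t / df) ** (-(df + 1) / 2.0)


def two_tailed_p(t_stat, df, steps=10000):
    """Two-tailed P-value from Simpson integration of the density over [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * h, df)
    central_area = 2.0 * total * h / 3.0  # probability of falling in (-|t|, |t|)
    return max(0.0, 1.0 - central_area)


def paired_t_test(sample_a, sample_b):
    """Return the t statistic and two-tailed P-value of the paired differences."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t_stat = mean / math.sqrt(variance / n)  # undefined if all differences are equal
    return t_stat, two_tailed_p(t_stat, n - 1)


# Hypothetical paired values for two models on the same two observations.
t_stat, p_value = paired_t_test([3.0, 1.0], [1.0, 1.0])
reject_null = p_value < 0.05
```

In this toy case the differences are 2 and 0, which gives a t statistic of 1 with one degree of freedom and a P-value of 0.5, so the null hypothesis is not rejected at the 0.05 level.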

Table 6
Statistical comparison between the machine learning models for heating loads prediction based on two-tailed Student's t-test

Conclusion
Energy consumption and greenhouse gas emissions have increased significantly. Thus, there is a substantial need for energy efficiency measures for buildings. Such measures are meant to reduce the amount of energy consumed for space heating and/or cooling while maintaining or improving the quality of services provided in the building. This study is based on a comprehensive literature review and a comparative analysis of different machine learning models for predicting the heating and cooling loads in residential buildings. The investigated machine learning models are: back-propagation artificial neural network, generalized regression neural network, radial basis neural network, radial kernel support vector machines and ANOVA kernel support vector machines. Radial kernel support vector machines achieved the lowest performance in predicting the heating loads, with MAPE, RMSE and MAE of 17.0715%, 5.74 and 3.863, respectively. Radial basis neural network attained the highest performance, with MAPE, RMSE and MAE of 1.016%, 0.5363 and 0.2133, respectively. Radial kernel support vector machines also achieved the lowest performance in predicting the cooling loads, with MAPE, RMSE and MAE of 17.0715%, 5.74 and 3.863, respectively. In contrast, radial basis neural network achieved the highest prediction accuracy, with MAPE, RMSE and MAE of 12.7762%, 4.9793 and 3.0076, respectively. Finally, two-tailed Student's t-tests were performed to explore the statistical significance of the output provided by the different machine learning models. Accordingly, it is expected that the radial basis neural network can provide a solid paradigm for modeling the heating and cooling loads in residential buildings.