Development of a surrogate model by extracting top characteristic feature vectors for building energy prediction

doi:10.1016/j.jobe.2018.12.018

Journal of Building Engineering

Volume 23, May 2019, Pages 38-52

https://doi.org/10.1016/j.jobe.2018.12.018

Highlights

• Determination of annual energy consumption of 100 thousand input combinations using machine learning techniques.
• Verification of the predicted energy consumption with EnergyPlus simulated energy consumption.
• Usage of efficient sampling techniques to increase the accuracy of the machine learning models.
• Demonstration of this study on two Indian cities namely, Jaipur and Hyderabad.

Abstract

In the early stage of building design, the design team has to simulate several combinations of input parameters to analyze the building's energy consumption. In a scenario with five parameters, each with ten variations, one has to simulate one hundred thousand combinations, which requires a large amount of computation. This paper aims to reduce the computation required to obtain the energy consumption of all the combinations. This is done by identifying appropriate training samples, computing their energy consumption using EnergyPlus, and estimating the energy consumption of the remaining combinations using machine learning techniques. The paper presents two sampling methods for identifying the training data, along with various regression techniques, to predict the energy consumption of a building in the early design phase. The key contribution of this surrogate modeling method is a ~100-fold reduction in computation. The method is tested for the Indian cities of Jaipur and Hyderabad. Approximately one hundred thousand simulations are performed for each location using parallel computation. By simulating approximately one percent of the input combinations, the annual energy consumption for the full set of combinations is predicted using SVR and k-means clustering for Jaipur with accuracy greater than 93% for 99.8% of the input combinations. When the same model is trained for Hyderabad, it produces accuracy greater than 93% for 98% of the input combinations.

Introduction

Buildings consume 27% of the total energy delivered in the United States. Energy delivered to the building sector is expected to grow by 0.3% per year from 2017 to 2050 [1]. According to the Global Construction 2030 report, the volume of construction output will grow by 85% to $15.5 trillion worldwide by 2030, with China, the US and India leading the way and accounting for 57% of all global growth [2]. The construction industry is growing rapidly, and there is a need to incorporate energy efficiency. Much of the building energy use is wasted because of “poor design, inadequate technology and inappropriate behaviors” [3].

In 2015–16, buildings accounted for 32.45% of the total electrical energy consumption in India. This is a considerable share, second only to the industry sector (42.30%) [4]. Hence, for a developing country like India, it is very important to employ methods that improve the energy footprint and efficiency of the buildings sector.

Building design has various phases, which include predesign, schematic design, design development, construction documents, construction administration, procurement and operation. In the early design phase, it is easy and inexpensive to make significant design changes in order to arrive at the right solution. Building Energy Modeling (BEM) is a crucial activity in the predesign and schematic design phases, with the potential to improve building energy consumption by a significant margin. At this early stage, a simplified energy simulation model can be used, and envelope-related parameters are important to consider. Finding the optimum design choices is important for a building, and many methods have been implemented in the past to identify low-cost design choices. Mostavi et al. [5] built a framework for a multi-objective design optimization tool, using the building envelope as the main component for optimization. Fesanghary et al. [6] obtained optimized solutions by taking Pareto-optimal solutions into account.

BEM plays a key role in finding an energy-optimized configuration for the building. This usually requires testing a large number of configurations using dynamic Building Energy Simulation (BES) programs such as EnergyPlus [7], IES-VE [8] and eQuest [9]. For example, studying energy consumption over five variables with ten values each requires studying 100 thousand (10^5) different input combinations. Each combination needs to be simulated using a BES tool to compute its effect on energy consumption, which is computationally very expensive.

This task is not only computationally expensive but also time-consuming. EnergyPlus takes around 150 s on a regular personal computer to complete one energy simulation for a simple five-zone model. To perform 100 thousand energy simulations, approximately 1666 CPU hours are needed in total. To address this problem, some efforts have been made to use parallel computing to reduce the simulation time for a group of simulations, and some tools that employ parallel computing are available. A tool developed by Zhang et al. [10] runs multiple instances of EnergyPlus in parallel on multiple machines, specifically for parametric analysis where multiple design alternatives have to be analyzed simultaneously. Garg et al. [11] sped up simulation by dividing each annual simulation into 12 monthly simulations and running them on a parallel system. However, parallel systems only reduce the elapsed time; they do not reduce the total amount of computation required.
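
The scale of the problem can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the 1% training fraction is taken from the abstract. Note that 100,000 runs at 150 s each come to roughly 4167 CPU hours, whereas a figure of about 1666 CPU hours corresponds to roughly 60 s per run, so both per-run times are evaluated.

```python
from itertools import product

# Hypothetical design space: five variables, ten candidate values each.
N_VARIABLES = 5
N_VALUES = 10
levels = range(N_VALUES)

# Enumerate every combination of the five variables and count them.
n_combinations = sum(1 for _ in product(levels, repeat=N_VARIABLES))


def cpu_hours(n_runs, seconds_per_run):
    """Total sequential CPU time, in hours, for n_runs simulations."""
    return n_runs * seconds_per_run / 3600.0


# Cost of simulating the full design space at two assumed per-run times.
hours_at_150s = cpu_hours(n_combinations, 150)
hours_at_60s = cpu_hours(n_combinations, 60)

# Simulation cost if only ~1% of the combinations are simulated and the
# rest are predicted (model training time is ignored in this sketch).
n_training = n_combinations // 100
hours_surrogate = cpu_hours(n_training, 150)

print(n_combinations, round(hours_at_150s), round(hours_at_60s),
      round(hours_surrogate, 1))
```

Under these assumptions, simulating only the training sample reduces the simulation cost by the same factor as the sampling fraction, which is where the ~100-fold saving comes from.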

Various machine learning techniques have been employed in this domain to reduce the computational cost. Artificial Neural Networks (ANNs) have long been investigated for their applicability to building energy problems, particularly in energy simulation and the development of surrogate models. Ekici and Aksoy [12] used a three-layered feedforward ANN to predict the heating energy of a building. Different configurations of form factors, transparency ratios and orientation angles, along with their corresponding energy values, were used to train the neural network. When compared to the calculated values, their ANN had a successful prediction rate of 94.8–98.5%. Yu et al. [13] developed a decision tree by considering 10 input parameters and predicted building energy demand levels with an accuracy of 93% for the training data and 92% for the test data. Their research aimed at building a simple, easily interpretable decision tree rather than using complex regression techniques and artificial neural networks.

Melo et al. [14] employed neural network based models to represent the interaction between building input parameters and the energy outputs. They experimented with several configurations of ANNs using nineteen input parameters and were able to predict the output with errors of ±16% at a confidence level of 90% for the building stock of Brazil. Tsanas and Xifara [15] studied the effects of eight input variables, including compactness, orientation and glazing properties, on the heating load and cooling load of a building using Random Forests. They were able to predict the heating and cooling loads with small deviations of 0.5 points and 1.5 points, respectively, from the simulated results. Amiri et al. [16] used a randomized approach to reduce the number of simulations required to examine the whole design space. A Monte Carlo simulation technique was used to generate combinations of design parameters, covering the full range for each climate region. A detailed analysis of the various machine learning techniques that have been employed for building energy prediction is given in Table 1, Table 2 and Table 3, which are divided according to the data used for building the model.

Unlike the present study, most of these models are used for studying aspects related to the analysis of detailed building design. This paper aims at speeding up the process of simulating energy consumption for a large set of options at a simple level. The present study uses data generated by EnergyPlus 8.6 for the cities of Hyderabad and Jaipur. Various methods are tried on the Jaipur dataset and tested for accuracy using the Hyderabad dataset.

The development of an optimized surrogate model requires a clear definition of the model's objective. The objective of this study is to reduce both the computation and the time needed to find the energy consumption for large sets of input combinations (approximately 100,000 data points) in the early stage of design, for a simplified five-zone energy simulation model. The estimated statistical model should fit the EnergyPlus-simulated values and their corresponding input vectors with as low a mean error as possible. The novelty of this method lies in identifying and using top characteristic feature points as the training data, which saves a lot of time in data generation.

Section snippets

Methodology

The methodology in the paper consists of five steps as shown in Fig. 1.

There can be many variables influencing energy consumption in the early design phase of a building. However, envelope-related parameters are the most important, as they connect the indoor and outdoor environments directly. Garg et al. [31] identified five variables that are important to consider for the building design. These variables are building orientation, aspect ratio, window-to-wall ratio, glass type and

Details of inputs and simulation model

This study aims at speeding up the process of simulating a large number of combinations for a simplified five-zone building energy simulation model. The building model has a rectangular footprint with four perimeter zones and a core zone, as shown in Fig. 2. The dimensions of the floor plan are determined by the floor area and the aspect ratio. The energy consumption of about 100,000 input combinations is generated for each of the two cities. To generate the datasets for testing, a parallel computing
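
As a small illustration of how a rectangular footprint follows from a floor area and an aspect ratio, consider the sketch below. It assumes that the aspect ratio is defined as length divided by width, which is not stated in the text above; the function name and the example values (1000 m² and 2.5) are hypothetical.

```python
import math


def footprint_dimensions(floor_area, aspect_ratio):
    """Return (length, width) of a rectangle with the given area.

    Assumption of this sketch: aspect_ratio = length / width.
    """
    if floor_area <= 0 or aspect_ratio <= 0:
        raise ValueError("floor_area and aspect_ratio must be positive")
    # area = length * width = aspect_ratio * width**2
    width = math.sqrt(floor_area / aspect_ratio)
    length = aspect_ratio * width
    return length, width


# Hypothetical example: 1000 m^2 floor area with an aspect ratio of 2.5.
length, width = footprint_dimensions(1000.0, 2.5)
print(length, width)
```

With the opposite convention (width divided by length), the two returned values simply swap.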

Sampling for training data

It is crucial to select an appropriate sample to train a regression model. The accuracy and validity of any regression model depend heavily on the sample used for training it. In particular, when a large number of predictions are to be made with a model, care must be taken that the training sample adequately represents the original dataset. For this study, we have considered two methods of sampling.
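
One way to draw a representative training sample with k-means can be sketched as follows. This is a minimal illustration and not necessarily the procedure used in this study: it assumes all inputs are numeric and comparably scaled, uses a plain Lloyd's-algorithm k-means written with the standard library, and picks the point nearest to each centroid as the combination to simulate. The function name `kmeans_sample`, the toy three-variable grid and the choice of k are hypothetical.

```python
import random
from itertools import product


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sample(points, k, iterations=20, seed=0):
    """Return sorted indices of up to k representative points.

    Runs plain k-means (Lloyd's algorithm), then selects the data point
    closest to each final centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    # Representative of each cluster: the data point nearest its centroid.
    chosen = {min(range(len(points)),
                  key=lambda i: squared_distance(points[i], centroid))
              for centroid in centroids}
    return sorted(chosen)


# Toy design space: 3 variables x 10 levels = 1000 combinations.
design_space = list(product(range(10), repeat=3))
training_indices = kmeans_sample(design_space, k=10)  # about 1% of the space
print(len(training_indices), [design_space[i] for i in training_indices])
```

Only the selected combinations would then be run through the simulator, and their results would form the training set for the regression model. Categorical inputs would need an encoding before distances are meaningful, which this sketch does not address.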

Regression

Regression is the method of fitting the detailed simulation values to a statistical function. Regression models can be used to predict a building's energy demand as a function of different variables. They can efficiently provide the precision of building energy simulation software with the running times of simplified linear models. Regression models have been used to predict building energy use in multiple scenarios. Statistical regression techniques are found to perform on par with artificial neural networks

Results and discussion

Based on the sampling technique used, this section is divided into two sub-sections. As the clustering-based sampling works with a training set of ~1% and the domain-knowledge-based sampling works with a training set of ~2.5%, a direct comparison between these two sampling techniques is not made.
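
Results of the form "accuracy greater than 93% for 99.8% of the input combinations" can be computed along the following lines. The exact accuracy definition is not given in the text above, so this sketch assumes per-sample accuracy = 100 × (1 − |predicted − simulated| / |simulated|); the function names and the four data points are made up for illustration and are not results from the study.

```python
def accuracy_percent(simulated, predicted):
    """Per-sample accuracy in percent (assumed definition, see lead-in)."""
    return 100.0 * (1.0 - abs(predicted - simulated) / abs(simulated))


def share_above(simulated_values, predicted_values, threshold=93.0):
    """Percentage of samples whose accuracy exceeds the threshold."""
    accuracies = [accuracy_percent(s, p)
                  for s, p in zip(simulated_values, predicted_values)]
    hits = sum(1 for a in accuracies if a > threshold)
    return 100.0 * hits / len(accuracies)


# Made-up annual energy values (arbitrary units), for illustration only.
simulated = [100.0, 200.0, 150.0, 80.0]
predicted = [98.0, 210.0, 135.0, 80.0]

share = share_above(simulated, predicted)
print(share)
```

In this toy example the per-sample accuracies are about 98%, 95%, 90% and 100%, so three of the four samples clear the 93% threshold.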

Conclusion and future work

This paper aims at building efficient, programmatic sampling techniques that yield accurate energy consumption predictions when used to train a regression model. We experimented with two different methods of sampling combined with various regression techniques to predict annual building energy consumption for two different cities. Our results show that the data sampled through k-means clustering retains the energy distribution and produces the most accurate results when the sampled data is trained

References (53)

M. Fesanghary et al., Design of low-emission and energy-efficient residential buildings using a multi-objective optimization algorithm, Build. Environ. (2012)
B.B. Ekici et al., Prediction of building energy consumption by using artificial neural networks, Adv. Eng. Softw. (2009)
Z. Yu et al., A decision tree method for building energy demand modeling, Energy Build. (2010)
S.S. Amiri et al., Using multiple regression analysis to develop energy consumption indicators for commercial buildings in the U.S., Energy Build. (2015)
Z. Wang et al., Random Forest based hourly building energy prediction, Energy Build. (2018)
L. Wang et al., Adaptive learning based data-driven models for predicting hourly building energy use, Energy Build. (2018)
B. Dong et al., Applying support vector machines to predict building energy consumption in tropical region, Energy Build. (2005)
S.-M. Hong et al., A comparative study of benchmarking approaches for non-domestic buildings: part 1 – Top-down approach, Int. J. Sustain. Built Environ. (2013)
D. Hawkins et al., Determinants of energy use in UK higher education buildings using statistical and artificial neural network methods, Int. J. Sustain. Built Environ. (2012)
S. Gou et al., Passive design optimization of newly-built residential buildings in Shanghai for improving indoor thermal comfort while reducing building energy demand, Energy Build. (2018)
C. Zhang et al., An improved cooling load prediction method for buildings with the estimation of prediction intervals, Procedia Eng. (2017)

