Development of a surrogate model by extracting top characteristic feature vectors for building energy prediction

https://doi.org/10.1016/j.jobe.2018.12.018

Highlights

  • Determination of annual energy consumption of 100 thousand input combinations using machine learning techniques.

  • Verification of the predicted energy consumption with EnergyPlus simulated energy consumption.

  • Usage of efficient sampling techniques to increase the accuracy of the machine learning models.

  • Demonstration of this study on two Indian cities, namely Jaipur and Hyderabad.

Abstract

In the early stage of building design, the design team has to simulate energy consumption for many combinations of input parameters in order to analyze the building's energy performance. In a scenario with five parameters, each with ten variations, one has to simulate one hundred thousand combinations, which requires a large amount of computation. This paper aims to reduce that computation by identifying appropriate training samples, computing their energy consumption using EnergyPlus, and estimating the energy consumption of the remaining combinations using machine learning techniques. It presents two sampling methods for identifying the training data, along with various regression techniques, to predict the energy consumption of a building in the early design phase. The key contribution of this method of surrogate modeling is a reduction in computation of ~100-fold. The method is tested for the Indian cities of Jaipur and Hyderabad. Approximately one hundred thousand simulations are performed for each location using parallel computation. By simulating approximately one percent of the input combinations, the annual energy consumption for the full set of combinations is predicted using SVR and k-means clustering for Jaipur with accuracy greater than 93% for 99.8% of the input combinations. When the same model is trained for Hyderabad, it produces accuracy greater than 93% for 98% of the input combinations.

Introduction

Buildings consume 27% of the total energy delivered in the United States. Energy delivered to the building sector is expected to grow by 0.3% per year from 2017 to 2050 [1]. According to the report Global Construction 2030, the volume of construction output will grow by 85% to $15.5 trillion worldwide by 2030, with China, the US and India leading the way and accounting for 57% of all global growth [2]. The construction industry is growing rapidly, and there is a need to incorporate energy efficiency. Much of the building energy use is wasted because of “poor design, inadequate technology and inappropriate behaviors” [3].

In 2015–16, buildings accounted for 32.45% of the total electrical energy consumption in India, second only to the industry sector (42.30%) [4]. Hence, for a developing country like India, it is very important to employ methods that improve the energy footprint and efficiency of the buildings sector.

Building design has various phases, which include predesign, schematic design, design development, construction documents, construction administration, procurement and operation. In the early design phase of a building, it is easy and inexpensive to make significant design changes in order to arrive at the right solution. Building Energy Modeling (BEM) is a crucial operation in the predesign and schematic design phases, with the potential to reduce building energy consumption by a significant margin. In the early stage of design, a simplified energy simulation model can be used, and envelope-related parameters are important to consider. Finding the optimum design choices is important for a building, and many methods have been implemented in the past to identify low-cost design choices. Mostavi et al. [5] developed a framework for a multiobjective design optimization tool, using the building envelope as the main component for optimization. Fensanghary et al. [6] obtained optimized solutions by taking Pareto-optimal solutions into account. BEM plays a key role in finding an energy-optimized configuration for the building. This usually requires testing a large number of configurations using dynamic Building Energy Simulation (BES) programs such as EnergyPlus [7], IES-VE [8] and eQuest [9]. For example, a scenario studying energy consumption over five variables, with ten values for each variable, requires studying 100 thousand different input combinations. Each combination needs to be simulated using a BES tool to compute its effect on energy consumption. This task is computationally very expensive.
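The size of this design space follows directly from the full factorial product of the parameter levels. The snippet below is a minimal sketch, assuming five hypothetical parameters (`param_1` to `param_5`) with ten placeholder levels each; the names and level values are illustrative and are not the inputs simulated in the study.

```python
from itertools import product

# Hypothetical design parameters: five variables, ten levels each.
# The level values are placeholders, not the values used in the study.
levels = {f"param_{i}": list(range(10)) for i in range(1, 6)}

# Full factorial design: one level per parameter, every possible combination.
combinations = list(product(*levels.values()))

print(len(combinations))  # 10 ** 5 combinations
```

Each tuple in `combinations` would correspond to one BES run, which is why the count grows tenfold with every additional ten-level parameter.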

This task is not only computationally expensive but also time-consuming. EnergyPlus takes around 150 s on a regular personal computer to complete one energy simulation for a simple five-zone model. To perform 100 thousand energy simulations, approximately 1666 CPU hours are needed in total. To address this problem, some efforts have been made to use parallel computing to reduce the simulation time for a group of simulations, and some tools that employ parallel computing are available. A tool developed by Zhang et al. [10] runs multiple instances of EnergyPlus in parallel on multiple machines, specifically for parametric analysis in which multiple design alternatives have to be analyzed simultaneously. Garg et al. [11] sped up simulation by dividing each annual simulation into 12 monthly simulations and running them on a parallel system. However, parallel systems only reduce the elapsed computation time; they do not reduce the total amount of computation required.
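The difference between total CPU cost and elapsed time can be illustrated with a back-of-the-envelope estimate. In the sketch below, the function names and the worker count are assumptions made for illustration, and perfect parallel speed-up is assumed. As a matter of arithmetic, 100,000 runs at 150 s each amount to roughly 4,167 CPU hours, while a total of about 1,666 CPU hours corresponds to about 60 s per run.

```python
def cpu_hours(n_runs, seconds_per_run):
    """Total CPU time in hours for n_runs independent simulations."""
    return n_runs * seconds_per_run / 3600.0


def wall_clock_hours(n_runs, seconds_per_run, n_workers):
    """Elapsed time under ideal parallel speed-up (an assumption).

    Parallelism divides the elapsed time, not the total CPU work.
    """
    return cpu_hours(n_runs, seconds_per_run) / n_workers


full_cost = cpu_hours(100_000, 150)                  # all combinations
parallel_time = wall_clock_hours(100_000, 150, 32)   # 32 workers is a hypothetical count
sampled_cost = cpu_hours(1_000, 150)                 # simulating ~1% of the combinations

print(round(full_cost), round(parallel_time), round(sampled_cost))
```

Running on more workers shortens `parallel_time` but leaves `full_cost` unchanged, whereas simulating only a sample reduces the CPU cost itself.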

Various machine learning techniques have been employed in this domain to reduce the computational expenditure. Artificial Neural Networks (ANNs) have long been investigated extensively for their applicability to building energy problems, particularly in energy simulation and the development of surrogate models. Bektas Ekici and Teoman Aksoy [12] used a three-layered feedforward ANN to predict the heating energy of a building. Different configurations of form factors, transparency ratios and orientation angles, along with their corresponding energy values, were used to train the neural network. When compared to the calculated values, their ANN had a successful prediction rate of 94.8–98.5%. Yu et al. [13] developed a decision tree considering 10 input parameters and predicted building energy demand levels with an accuracy of 93% for the training data and 92% for the test data. Their research aimed at building a simple, easily interpretable decision tree rather than using complex regression techniques and artificial neural networks.

Melo et al. [14] employed neural-network-based models to represent the interaction between building input parameters and the energy outputs. They experimented with several ANN configurations using nineteen input parameters, and were able to predict the output with errors of ±16% for a confidence level of 90% of the cases for the building stock of Brazil. Tsanas and Xifara [15] used Random Forests to study the effects of eight diverse input variables, including compactness, orientation and glazing properties, on the heating load and cooling load of a building. They were able to predict the heating and cooling loads with deviations of only 0.5 points and 1.5 points, respectively, from the simulated results. Amiri et al. [16] used a randomized approach to reduce the number of simulations required to examine the whole design space. The Monte Carlo simulation technique was used to generate combinations of design parameters covering the full range for each climate region. A detailed analysis of the machine learning techniques employed in the past for building energy prediction can be seen in Table 1, Table 2 and Table 3, which are divided according to the data used for building the model. Unlike the present study, most of these models are used for studying aspects related to the analysis of detailed building design. This paper aims at speeding up the process of simulating energy consumption for a large set of options at a simple level. The present study uses data generated by EnergyPlus 8.6 for the cities of Hyderabad and Jaipur. Various methods are tried on the Jaipur dataset and tested for accuracy using Hyderabad.

Development of an optimized surrogate model requires a clear definition of the model's objective. The objective of this study is to reduce both the computation and the time needed to find the energy consumption of large sets of input combinations (approximately 100,000 data points) in the early stage of design, for a simplified five-zone energy simulation model. The estimated statistical model should fit the EnergyPlus simulated values and their corresponding input vectors with as small a mean error as possible. The novelty of this method lies in identifying and using top characteristic feature points as the training data, which saves a lot of time in data generation.

Section snippets

Methodology

The methodology in the paper consists of five steps as shown in Fig. 1.

Many variables can influence energy consumption in the early design phase of a building. However, envelope-related parameters are the most important, as they connect the indoor and outdoor environments directly. Garg et al. [31] identified five variables that are important to consider for the building design. These variables are building orientation, aspect ratio, window-to-wall ratio, glass type and

Details of inputs and simulation model

This study aims at speeding up the process of simulating a large number of combinations for a simplified five-zone building energy simulation model. The building model has a rectangular footprint with four perimeter zones and a core zone, as shown in Fig. 2. The dimensions of the floor plan are determined by the floor area and the aspect ratio. The energy consumption of about 100,000 input combinations is generated for both cities. To generate the datasets for testing, a parallel computing

Sampling for training data

It is crucial to select an appropriate sample to train a regression model. The accuracy and validity of any regression model are highly dependent on the sample used for training it. Especially when a large number of predictions are to be made with a model, care must be taken that the training sample faithfully represents the original dataset. For this study, we have considered two methods of sampling.
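One way to obtain a representative training sample is clustering-based selection. The snippet below is a minimal sketch, not the authors' implementation: it runs a plain Lloyd's k-means on a toy design space and keeps the real data point closest to each centroid. Choosing the nearest point per centroid, the three-parameter toy grid, and all function names are assumptions made for illustration.

```python
import random
from itertools import product


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids


def select_training_indices(points, k, seed=0):
    """Indices of the data points closest to each k-means centroid."""
    centroids = kmeans_centroids(points, k, seed=seed)
    chosen = {
        min(range(len(points)), key=lambda i: squared_distance(points[i], c))
        for c in centroids
    }
    return sorted(chosen)


# Toy design space: 3 hypothetical parameters x 10 levels, scaled to [0, 1].
design_space = [tuple(v / 9 for v in combo) for combo in product(range(10), repeat=3)]
training_indices = select_training_indices(design_space, k=10)  # about 1% of 1,000 points
print(len(design_space), len(training_indices))
```

Only the combinations at `training_indices` would then be passed to the simulator; the rest are left for the regression model to predict.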

Regression

Regression is the method of fitting the detailed simulation values to a statistical function. Regression models can be used to predict a building's energy demand as a function of different variables. They can efficiently provide the precision of building energy simulation software with the running times of simplified linear models. Regression models have been used to predict building energy use in multiple scenarios. Statistical regression techniques are found to perform on par with artificial neural networks
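The idea of fitting simulated values to a statistical function can be sketched with the simplest case. The example below uses ordinary least squares, solved through the normal equations, as a stand-in for the regression techniques compared in this paper (such as SVR); the function names and the toy data are assumptions for illustration only.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, w_1, ..., w_d].
    """
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, x):
    """Evaluate the fitted linear function at input vector x."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Toy training data: the "simulated" output is exactly 3 + 2*x1 - x2.
X_train = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y_train = [3 + 2 * x1 - x2 for x1, x2 in X_train]

weights = fit_linear(X_train, y_train)
print([round(w, 6) for w in weights], round(predict(weights, (3, 2)), 6))
```

In a surrogate workflow, `X_train` would hold the sampled input combinations and `y_train` their simulated energy consumption, and `predict` would then be applied to every remaining combination.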

Results and discussion

This section is divided into two sub-sections based on the sampling technique used. As the clustering-based sampling works with a training set of ~1% and the domain-knowledge-based sampling works with a training set of ~2.5%, a direct comparison between the two sampling techniques is not made.
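Results of the form "accuracy greater than 93% for a given share of input combinations" can be computed as sketched below. The definition of accuracy as one minus the absolute relative error against the simulated value is an assumption (this excerpt does not state the definition), and the function name and sample values are hypothetical.

```python
def share_above_accuracy(simulated, predicted, min_accuracy=0.93):
    """Fraction of cases whose prediction accuracy exceeds min_accuracy.

    Accuracy is assumed to be 1 - |predicted - simulated| / |simulated|.
    """
    hits = sum(
        1
        for s, p in zip(simulated, predicted)
        if 1.0 - abs(p - s) / abs(s) > min_accuracy
    )
    return hits / len(simulated)


# Hypothetical annual energy values (arbitrary units), not data from the study.
simulated = [100.0, 200.0, 150.0, 120.0]
predicted = [98.0, 230.0, 151.0, 119.0]

print(share_above_accuracy(simulated, predicted))
```

Reporting the share of cases above a threshold, rather than a single mean error, shows how many individual combinations are predicted poorly.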

Conclusion and future work

This paper aims at building efficient, programmed sampling techniques that yield accurate energy consumption predictions when used to train a regression model. We experimented with two different methods of sampling combined with various regression techniques to predict annual building energy consumption for two different cities. Our results show that the data sampled through k-means clustering retains the energy distribution and produces the most accurate results when the sampled data is trained

References (53)

  • X. Chen et al. A multi-stage optimization of passively designed high-rise residential buildings in multiple building operation scenarios. Appl. Energy (2017)
  • A.P. Melo et al. Development of surrogate models using artificial neural network for building shell energy labelling. Energy Policy (2014)
  • A.P. Melo et al. A novel surrogate model to support building energy labelling system: a new approach to assess cooling energy demand in commercial buildings. Energy Build. (2016)
  • R.E. Edwards et al. Constructing large scale surrogate models from big data and artificial intelligence. Appl. Energy (2017)
  • A. Rahman et al. Predicting fuel consumption for commercial buildings with machine learning algorithms. Energy Build. (2017)
  • A. Tsanas et al. Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy Build. (2012)
  • T. Catalina et al. Development and validation of regression models to predict monthly heating demand for residential buildings. Energy Build. (2008)
  • S. Asadi et al. On the development of multi-linear regression analysis to assess energy consumption in the early stages of building design. Energy Build. (2014)
  • M. Daszykowski et al. Representative subset selection. Anal. Chim. Acta (2002)
  • H.X. Zhao et al. A review on the prediction of building energy consumption. Renew. Sustain. Energy Rev. (2012)
  • W. Chung et al. Benchmarking the energy efficiency of commercial buildings. Appl. Energy (2006)
  • B. Howard et al. Spatial distribution of urban building energy consumption by end use. Energy Build. (2012)
  • G.K.F. Tso et al. Predicting electricity energy consumption: a comparison of regression analysis, decision tree and neural networks. Energy (2007)
  • K. Kavaklioglu. Modeling and prediction of Turkey's electricity consumption using support vector regression. Appl. Energy (2011)
  • Q. Li et al. Applying support vector machine to predict hourly cooling load in the building. Appl. Energy (2009)
  • C. Zhang et al. An improved cooling load prediction method for buildings with the estimation of prediction intervals. Procedia Eng. (2017)