An overview of machine learning applications for smart buildings



Introduction
Conventional automation systems are challenged by the increasing complexity of the built environment. The development of sustainable buildings, cities and societies presumes that energy systems can cope with the increased penetration of intermittent renewable energy resources under strict demands for energy efficiency, flexibility, and resilience (e.g., Aduda, Labeodan, Zeiler & Boxem, 2017). At the same time, buildings need the capability to adapt to changing boundary conditions (e.g., users' needs, changing climate and fluctuating grid prices) (Al Dakheel, Del Pero, Aste and Leonforte (2020); EPBD Recast (2010)). The variables related to the operational environment, such as occupancy patterns, are subject to unpredictable changes, which calls for a system-level ability to learn quickly and autonomously from experience that originates from outside the historical datasets recorded over a long period of time (Xie et al., 2021).
Overcoming the above challenges is possible due to the rapid evolution of information and communication technologies (ICT) and building energy management systems (BEMS), together with that of the concepts 'intelligent building' (IB) and 'smart building' (SB) (Al Dakheel et al. (2020)). A shift towards the implementation of artificial intelligence (AI) trained by machine learning algorithms is recognized as one of the major trends of development (Karpook, 2017). Given the complexities related to the operational environment, the machine learning technique 'reinforcement learning (RL)' and its derivative 'deep reinforcement learning (DRL)' have proven useful for the autonomous control networks of buildings.
Quite a few review articles have been published with various perspectives on smart buildings. A quick look at the most relevant review articles in the field reveals that most of them focus on issues such as hardware technologies, monitoring, forecasting, modelling, building energy management, and applications of machine learning (Alawadi et al. (2020); Djenouri, Laidi, Djenouri and Balasingham (2019); Li, Lu, Yan, Xiao and Wu (2021); Petrosanu, Carutasu, Carutasu and Pîrjan (2019)). Here, learning is referred to as a single, data-driven process based on the analysis of simulated or measured occupancy or system status data (e.g., Jazizadeh, Ghahramani, Becerik-Gerber, Kichkaylo and Orosz (2014), Marinaki et al. (2013)). Machine learning applications have been reviewed in terms of predicting occupancy and window-opening behaviours (Dai, Liu & Zhang, 2020), self-tuned indoor thermal environments, occupancy estimation (Amayri, Ploix, Bouguila & Wurtz, 2020), and building energy efficiency (Merabet et al. (2021); Wang et al. (2021)).
Abbreviations: AI, artificial intelligence; AGI, artificial general intelligence; ANN, artificial neural network; BAS, building automation system; BD, big data; BEMS, building energy management system; BI, building intelligence; BIM, building information modelling; BIQ, building intelligence quotient; DRL, deep reinforcement learning; DT, digital twin; EPBD, energy performance of buildings directive; HVAC, heating, ventilation and air-conditioning; IB, intelligent buildings; ICT, information and communication technologies; IoT, Internet of things; KPI, key performance indicator; LAI, learning ability index; LM, linear (regression) model.
The existing reviews commonly provide a specific, technological perspective without a vision for when and how smart technologies should be integrated (Khan, Seo and Kim (2020); Stopps, Huchuk, Touchie and O'Brien (2021)). Qolomany et al. (2019) take a step towards a more holistic vision by reviewing smart buildings jointly from the perspectives of application, data analytics, and machine learning. Yet the learning ability as a feature of smart buildings is interpreted narrowly, without addressing issues such as the roles of human and AI agents, the training environments, and their impacts on the learning process in complex and abruptly changing operational environments. Moreover, there is a lack of reviews describing the training of autonomous, building-integrated AI applications capable of independent decision-making.
Thus, the first part of this article is dedicated to the discussion of the buildings' learning ability in general, including an overview of the concepts, learning goals, AI training methods, and training environments. The second part is an overview of reported machine learning applications, focusing on autonomous AI agents that make independent decisions for building energy management. Specifically, we present applications based on (deep) reinforcement learning (RL), their reward mechanisms, training data, and the training environments in various application domains.
This article is expected to be a useful source of information and ideas for further research for engineers and research scientists who develop and design autonomous AI systems for smart buildings and communities.

Learning ability as a feature of buildings
The discussion of building intelligence and the learning ability of buildings originates from the idea of integrated buildings and machines, which, in turn, can be considered to have its earliest roots in the year 1923, when architect Le Corbusier characterized a house as 'a machine for living in' (Le Corbusier, 1923). The evolution of brain research, cognitive science, and computer science gradually resulted in the use of the word 'intelligence' to address the increased ability of artificial systems to operate autonomously, whereas 'intelligent buildings' were introduced in the scientific literature in the 1990s (Derek & Clements-Croome, 1997). From the beginning of the 2000s, the need to emphasize the interaction between humans and machines was acknowledged. Himanen (2003), for example, stated that building intelligence may refer to intelligence 'imprinted into an inorganic object (such as a building) by human intelligence'.
Later, the term 'smart building' (SB) was adopted in the Energy Performance of Buildings Directive (EPBD) to promote the energy efficiency of buildings (Al Dakheel et al. (2020); EPBD Recast (2010)). More recently, Albino, Berardi and Dangelico (2015), among others, concluded that while 'intelligence' refers to the diffusion of ICT in the infrastructure, 'smartness' entails the building's interaction with people and the community, and the 'smart' feature particularly aims at improving the system's interaction with humans. Since there is no standard definition of a smart or intelligent building, these concepts are often treated as synonymous. The next step is the 'cognitive building', which is rooted in the term 'cognition', i.e., the process of acquiring knowledge and understanding through thought, experience, and senses. A 'cognitive building' can be considered an enhanced version of a smart building, since it includes more intelligence in the loop (Pasini et al. (2016); Xu, Lu, Xue and Chen (2019)). Here, the term cognition particularly refers to cognitive computing (modelling the human thinking process in complex and uncertain situations), which is integrated with the aim of managing buildings better (Ploennigs, Ba & Barry, 2018).
The ability to learn can be considered one of the cognitive building features, since cognitivism refers to acquiring knowledge and skills (Gross (2010); Masethe, Masethe and Odunaike (2017)). In terms of buildings, learning is a process that particularly relies on the implementation of machine learning (ML) algorithms that mimic human learning (Karpook, 2017). So far, the learning applications reported in the literature are mainly human-initiated, but the learning process can also be initiated by artificial intelligence through autodidactic functions (Albino et al., 2015).
Learning is about continuous adaptation based on experience so that unfavourable decisions will not be repeated. The decisions may be based on historical relationships and trends in a given set of data, and they should be proactive rather than reactive (Nie, Xu, Cheng and Yu (2019); Van Offeren (2020)). Systemic antifragility can also be considered learning when the adaptation is based on disorder such as mistakes or failures (De Florio (2014); Taleb and Douady (2013)).
Mofidi and Akbari (2020) associate learning with issues such as occupants' preferences and behaviour, occupancy patterns, productivity, indoor environmental preferences, and adaptive behaviour and thermal behaviour of the building and its environment. Lê, Nguyen and Barnett (2012) refer to learning as the building's ability to predict and satisfy the needs of users and to adapt to the stress from the external environment. To that end, shared learning between humans, machines, and buildings is necessary and it is realized through 'people-literate technology', i.e., technology with 'an ability to put human intentions into context by including multisensory and multi-touchpoint interfaces like wearables and advanced computer sensors' (Panetta, 2019).
Shared learning can also be considered organizational learning, i.e., a process that is expected to yield a cultural change within an organization (= building + occupants + processes) (Argyris & Schön, 1978). Here, the role of artificial intelligence is to facilitate organizational learning. The reported applications vary from the resource management of a warehouse (Zhang, Pee & Cui, 2021) to AI-assisted maintenance of renewable energy systems (Shin, Han & Rhee, 2021).

Learning goals and the assessment of learning ability
The term 'learning goal' (aka 'learning objective' or 'learning outcome') refers to a measurable skill or knowledge that a learner (here: AI) is expected to have after being trained in a learning process (Bloom, Engelhart, Furst, Hill & Krathwohl, 1956).
Table 1 gives practical examples of how AI training may improve the operation of the building automation system (BAS), classified by main function (observe, predict, adjust, manage data, and interact with humans).
The examples in Table 1 show that misinterpreted data (e.g., a temperature data set with gaps) is a likely reason behind biased operation. In situations where the BAS must quickly adapt to a changing operational environment, the data acquisition and management abilities with respect to routinely monitored variables (e.g., temperatures, energy demands, occupancy detection) play a crucial role.
On the other hand, quick adaptation benefits from direct feedback from users through occupant-building interaction (Stopps et al., 2021). Carreira, Costa, Mansur and Arsénio (2018), for example, propose a 'learning occupant-centric control (OCC) system', where user feedback is acquired through mobile devices. The OCC system combines historical sensor data and occupants' preferences with an aim to learn an appropriate HVAC control configuration. Here, the decreased number of votes expressing discomfort demonstrates the performance of the OCC system before and after training.
To assess the learning ability with respect to various learning goals, Alanne (2021) proposes buildings' learning ability to be quantified as the growth of the building intelligence function (BI). Here, BI is a function of n key performance indicators (KPIs), which, in turn, are functions of time (t). Now, 'learning' can be understood as seeking the best available performance with respect to specific KPIs (i.e., optimization of given objective function) over a given period of time. To that end, Alanne (2021) introduces a novel performance indicator, Learning Ability Index (LAI), which is a single, dimensionless number between zero and one. Since the value of the LAI is bound to time invested in training, this method is useful for assessing a building's learning performance also in dynamic and unpredictable operational environments.
In Alanne's method, learning is measured using the value of KPI recorded at two moments in time (pre-test before and post-test after the learning process) and the performance is assessed using a novel learning ability index (LAI), which varies between zero and one so that zero represents the starting level (pre-test), and the full match between a predefined learning goal and reality (post-test) is awarded by LAI = 1. Alanne (2021) also states that the quality (and also diversity) of training data determines at which level a learning goal can be achieved. Given the quantitative essence of training data in smart building applications, it is essential that experimental and computational data match the reality as accurately as possible. Therefore, Alanne (2021) suggests that the quality of training data (and thus the quality of learning) should be considered by way of a separate correction coefficient (between 0 and 1) when assessing the building's learning ability by way of the learning ability index (LAI). Here, for example, the measurement error of +/−2% would result in the quality factor of 0.98.
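The pre-test/post-test idea can be illustrated with a short numerical sketch. The formula of Alanne (2021) is not reproduced in this overview, so the normalization below (the fraction of the gap between the pre-test value and the learning goal that is closed by training, clipped to the range from zero to one and multiplied by the data-quality coefficient) is an assumption made for illustration only. The function name, parameter names and example values are hypothetical.

```python
def learning_ability_index(kpi_pre, kpi_post, kpi_goal, quality=1.0):
    """Illustrative LAI: share of the pre-test-to-goal gap closed by training.

    Assumed form (NOT taken from Alanne, 2021):
        LAI = quality * clip((kpi_post - kpi_pre) / (kpi_goal - kpi_pre), 0, 1)
    """
    if kpi_goal == kpi_pre:
        raise ValueError("the learning goal must differ from the pre-test value")
    if not 0.0 <= quality <= 1.0:
        raise ValueError("the quality coefficient must lie between 0 and 1")
    progress = (kpi_post - kpi_pre) / (kpi_goal - kpi_pre)
    return quality * min(1.0, max(0.0, progress))


# Hypothetical case: the RMSE of a temperature prediction falls from 2.0 K
# (pre-test) to 0.5 K (post-test), the goal is 0 K, and the training data
# carry a measurement error of +/-2 % (quality coefficient 0.98).
lai = learning_ability_index(kpi_pre=2.0, kpi_post=0.5, kpi_goal=0.0, quality=0.98)
print(round(lai, 3))  # 0.735
```

With this assumed form, no improvement gives LAI = 0 and a full match with the goal gives LAI equal to the quality coefficient, which is consistent with the verbal description above.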
The elicitation of learning goals and useful KPIs is, of course, case-specific. To that end, several generic assessment frameworks have been proposed (Chen, Clements-Croome, Hong, Li & Xu, 2006). Tools such as the New Intelligent Building Index (IBI) (Chow, 2005) and the Honeywell Smart Building Score™ (Honeywell, 2016) have been suggested to measure a building's intelligence with respect to several criteria. Al Dakheel et al. (2020) present a table of 34 KPIs, where the indicators have been divided into four subsets (nearly zero-energy targets, flexibility, monitoring, interaction with users).
The work of Candanedo et al. (2018) presents an example where the root mean square error (RMSE) is used as the KPI for assessing the learning ability of linear regression (LM) and random forest (RF) algorithms when predicting average indoor temperatures on the basis of incomplete data. The expectation is that, after the training, the RMSE should be as close to zero as possible and, correspondingly, the value of LAI should be close to one. Mosavi et al. (2019) propose the correlation coefficient (r) between the training data and the output of the trained model for the same purpose.
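Both indicators mentioned above are straightforward to compute. The following minimal sketch uses standard textbook definitions of the RMSE and the Pearson correlation coefficient; the temperature values are invented for illustration and do not come from the cited studies.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical indoor temperatures (degrees Celsius): measured vs. model output.
measured = [21.0, 21.5, 22.0, 22.5]
predicted = [21.2, 21.4, 22.1, 22.3]

print(f"RMSE = {rmse(measured, predicted):.3f} K")
print(f"r    = {pearson_r(measured, predicted):.3f}")
```

A perfectly trained model would give an RMSE of zero and r equal to one; note that r measures linear agreement, so it can remain high even when the predictions are systematically offset.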

Components of the AI training
Technically, a building-integrated AI is advanced software implemented in a building automation system (BAS). To enable the AI training, the BAS needs to have the related software, hardware, protocols and standards for communication (e.g., C-Bus, LonWorks) (Gholamzadehmir, Del Pero, Buffa, Fedrizzi & Aste, 2020).
The main sources of training data are a history database and preferably an access to other external databases (e.g., weather, energy, and cost databases) (Martín-Lopo, Boal & Sánchez-Miralles, 2020). A connection with the Internet of Things (IoT) enables data transfer over the Internet between the BAS and potentially any component from light bulbs to presence sensors. It also offers an access to big data (BD), which is above all weather data or sensor data (Daissaoui, Boulmakoul, Karim & Lbath, 2020), but may also include various other items such as hyperlocal and site-specific data, street view images or building codes (Mehmood, Chun, Han, Jeon & Chen, 2019). Data on occupants and their behaviour can be acquired through mobile and wearable technology, including their feedback, health, emotions, mobility, social media activity etc. (Zhang & He, 2020).
A typical feature of BD is that the data mass grows continuously, which enables learning from experience. The AI training may be based on correlations and patterns recognized by means of a data analysis technique known as data mining (Zhao, Zhang, Zhang, Wang & Li, 2020) or on machine learning, which this article focuses on.
Machine learning can be broadly categorized into supervised, unsupervised and reinforcement learning. In supervised learning, a significant set of historical data is used to train a mapping from independent variables to a dependent variable that is being predicted. In a building context, the independent variables are often sensor measurements. The predictions are generally either time-series forecasts or classifications. An example of the former is building energy consumption forecasting. An example of the latter is diagnosing the type of fault that has occurred in HVAC equipment. Supervised learning requires an extensive training set of data for the independent variables, as well as the corresponding correct values for the dependent variable. These correct values are known as labels. After the training process, a mapping from the independent variables to the dependent variable has been established, so the model is able to receive a previously unseen combination of independent variables and predict the value of the dependent variable (Alawadi et al., 2020).
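The supervised workflow (labelled training data, a fitted mapping, prediction for an unseen input) can be shown with the simplest possible model. The sketch below fits a straight line by ordinary least squares; the data are hypothetical, and a real application would use many more independent variables and typically a library model rather than this hand-written fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled training set:
# independent variable = outdoor temperature (degrees Celsius),
# dependent variable (label) = measured heating demand (kW).
outdoor_temp = [-10.0, -5.0, 0.0, 5.0, 10.0]
heating_kw = [30.0, 25.0, 20.0, 15.0, 10.0]

slope, intercept = fit_line(outdoor_temp, heating_kw)


def predict(temp):
    """Apply the trained mapping to a previously unseen input."""
    return slope * temp + intercept


print(predict(2.5))  # 17.5
```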
Unsupervised learning can be applied when a labelled training set does not exist. A common application of unsupervised learning is anomaly detection. The machine learning model is trained with data in normal conditions, after which it is able to detect whether the system is in a normal condition or not. However, it will not be able to diagnose the type of failure condition (Mirnaghi & Haghighat, 2020).
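The anomaly-detection idea described above (learn what 'normal' looks like, then flag deviations without naming the fault) can be sketched with a simple statistical stand-in. The z-score rule, the class name and the sample values below are assumptions chosen for brevity; they do not reproduce the methods reviewed by Mirnaghi and Haghighat (2020).

```python
import statistics


class ZScoreDetector:
    """Flags values that lie far from the mean of the normal-condition data."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # allowed distance in standard deviations
        self.mean = None
        self.std = None

    def fit(self, normal_values):
        """Learn the normal operating range from unlabelled data."""
        self.mean = statistics.mean(normal_values)
        self.std = statistics.stdev(normal_values)
        return self

    def is_anomaly(self, value):
        """Return True if the value deviates from normal; the cause stays unknown."""
        return abs(value - self.mean) > self.threshold * self.std


# Hypothetical room temperatures (degrees Celsius) recorded in normal operation.
normal_temps = [21.0, 21.4, 20.8, 21.2, 21.1, 20.9]
detector = ZScoreDetector().fit(normal_temps)

print(detector.is_anomaly(21.3))  # False
print(detector.is_anomaly(27.5))  # True
```

As noted in the text, the detector only reports that something is abnormal; distinguishing, for example, a stuck valve from a faulty sensor would require labelled fault data.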
Table 1. Examples of how AI training may improve the operation of the building automation system (BAS), by main function.
Main function | Before training | After training
Observe | [...] | The BAS recognizes an unexpectedly high room temperature.
Predict | The BAS mispredicts the building's energy demand for the next 24 h. | The BAS is able to predict the energy demand with a reasonable accuracy.
Adjust | The BAS follows a pre-defined control strategy without considering unpredicted changes in the occupancy. | The BAS introduces a new control strategy on the basis of up-to-date occupancy data.
Manage (data) | The BAS mispredicts the energy demand due to the gaps in acquired weather data. | The BAS predicts the energy demand accurately due to a reconstructed weather data set.
Interact (with humans) | The BAS calls service without a reason due to a misinterpreted temperature. | The BAS recognizes an unexpected temperature and calls the service if needed.

Supervised and unsupervised approaches are suitable for observing and predicting, but less well suited for adjusting, managing, and interacting. For these latter functions, the reinforcement learning technique is suitable (Perera & Kamalaruban, 2021). Key concepts of reinforcement learning are in italics in Fig. 1, with typical building sector examples in parentheses. Instead of labelled data as in supervised learning, a special-purpose environment is constructed for the reinforcement learning agent. The agent takes actions that impact the environment and gets feedback in the form of a reward that quantifies whether the impact was beneficial or not. Based on the feedback, the agent gradually learns to take actions with beneficial outcomes.
Additionally, the environment provides state information, which the agent uses to select the action. The application of these concepts to building energy management is discussed further in Section 3.
Commonly, AI training is not a continuous process; its essence is pre-optimization, where algorithm-specific variables (e.g., weight factors of an artificial neural network, ANN) are fixed to yield a certain output from a fixed set of training data (e.g., Gharehbaghi, Nguyen, Farsangi & Yang, 2020). Here, the AI is not able to adapt to any situation other than those included in the training data in the pre-optimization phase. When exposed to anomalies, the AI repeats an erroneous action until a new training with extended data is performed. Therefore, the model training must take place either on a regular basis or on demand (e.g., user feedback, initiation by the automation system).
An autonomous building should also be capable of both identifying the possibilities to improve its performance and initiating and automatically running appropriate learning processes. Creating a self-taught AI calls for the implementation of autodidactic functions (Albino et al., 2015). Here, a database provides a platform for the AI, whereas the learning process continuously utilizes available data, unassisted by humans (Bailey, 2020).
An autodidactic algorithm can be realized under the reinforcement learning (RL) paradigm, where the AI is set into an unknown environment (data set) and either rewarded or penalized according to its actions. The AI aims at maximizing the cumulative reward, wherefore it (in theory) evolves endlessly (e.g., Dey, 2016). Deep reinforcement learning (DRL) incorporates deep learning (DL) into the algorithm. DRL has been mentioned as a potential direction for future research for coping with the complex decisions related to multi-energy systems (Hassan, Acharya, Chertkov, Deka and Dvorkin (2020); Ye, Qiu, Wu, Strbac and Ward (2020)). An example of an autodidactic approach is given in the work of McAleer, Agostinelli, Shmakov and Baldi (2018), who have developed a reinforcement learning algorithm called Autodidactic Iteration that teaches itself to solve the Rubik's Cube with no human assistance. To the best of our knowledge, however, autodidactic training has not been applied so far in the field of building automation.
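The reward-driven loop described above (state, action, reward, cumulative reward) can be made concrete with tabular Q-learning, one of the basic RL algorithms. The toy thermostat environment below, with five temperature bins, deterministic transitions and hand-picked penalty values, is entirely hypothetical and far simpler than the building simulators used in the reviewed work; it is a sketch of the principle, not of any cited application.

```python
import random

N_STATES, TARGET = 5, 2   # temperature bins 0 (cold) .. 4 (hot); bin 2 is the comfort target
ACTIONS = (0, 1)          # 0 = heater off, 1 = heater on


def step(state, action):
    """Toy environment: heating raises the bin by one, otherwise it falls by one."""
    nxt = min(N_STATES - 1, state + 1) if action == 1 else max(0, state - 1)
    reward = -abs(nxt - TARGET) - 0.1 * action  # discomfort penalty plus energy penalty
    return nxt, reward


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            nxt, reward = step(state, action)
            # Move the estimate towards reward + discounted best future value.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES)]
print(policy)  # learned action per temperature bin
```

After training, the greedy policy heats in the cold bins and switches the heater off in the warm bins, without ever having been given that rule explicitly. In the target bin itself, both actions are almost equally good in this toy setting, so the learned choice there is not meaningful.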
The present-day AI is called narrow or weak AI, since it can specialize in one task only (Mehmood et al., 2019). In contrast, artificial general intelligence (AGI) (aka strong AI) is capable of performing various tasks in a manner similar to humans (e.g., Bołtuć, 2020). To that end, techniques such as program synthesis (Reynolds, 2017), neuromorphic computing (Mizutani, Ueno, Arakawa & Yamakawa, 2018) and the merger of biology and technology (Dambrot (2020); Pisarchik, Maksimenko and Hramov (2019)) have been suggested. These approaches are still at the level of basic research, however, and their applications have not been reported so far in the field of building automation.
Attempts to imitate AGI-like qualities in machine learning have been reported. For example, several weak AI agents may be connected together to solve a particular computational problem (a multi-agent system), or knowledge stored while solving one problem may be applied to a different but related problem (transfer learning) (Wang et al., 2017). Applications of these methods have also been demonstrated for smart buildings (e.g., smart-grid interaction by Labeodan, Aduda, Boxem and Zeiler (2015), temperature prediction by Grubinger et al. (2017) and prediction of energy use by Pinto, Praça, Vale and Silva (2020)). Transfer learning in multi-agent systems has been identified as a potential avenue of research, but this field is still maturing (Da Silva & Costa, 2019).
Finally, we point out that the reported machine learning applications often focus on tasks where the AI does not make decisions on the basis of what it has learned, such as predicting different variables with improved accuracy (e.g., Candanedo et al., 2018). In contrast, when the AI actuates single processes (e.g., HVAC, lighting), it can be classified as an autonomous AI at the lower hierarchy level of the BAS. If it redefines the setpoints to change the control strategy (e.g., to achieve demand response), it is an autonomous AI at the upper hierarchy level of the BAS. To support these decisions, an AI may utilize user feedback as training data, as exemplified by Carreira et al. (2018).

Training environment
Training environments for AI may be virtual, physical or a hybrid of virtual and physical environments. Here, virtual training environments often refer to whole-building simulations, where all the building's physical components and their functions have mathematical counterparts. AI training may also take place completely on the basis of sensor data (e.g., smart controller) or in the AI's natural operational environment (e.g., kitchen robot). A hybrid of mathematical models and physical components has been described, for example, by Kilpeläinen, Lu, Cao, Hasan and Chen (2018). In the work of Lu et al. (2020), the accuracy of the whole-building simulation is enhanced through re-calibration of the building model on the basis of a continuously updated set of sensor data.
One of the most promising virtual training environments is the digital twin (DT), which is defined as "a digitalized version of a physical object" (i.e., a building and its systems) (Mathupriya, Saira Banu, Sridhar & Arthi, 2020). The DT may be constructed as an extremely detailed whole-building simulation model, with building information modelling (BIM) employed as a major product database (e.g., Boje, Guerriero, Kubicki and Rezgui (2020), Sacks, Girolami and Brilakis (2020)). Furthermore, the DT includes an interface between the physical system and its virtual counterpart, e.g., a building energy management system (Agouzoul et al. (2021); Koulamas and Kalogeras (2018)). Given that combined data from both physical and soft (virtual) sensors can be synchronized in real time, the DT allows a quick and reliable building performance prediction as a response to changes in operational variables (e.g., Srinivasan, Manohar & Issa, 2020).

Material and methods
The review was initiated with quick searches using different combinations of keywords in various Internet databases to obtain an overall conception of the relevant literature. The suggestions by the search engine and the reference lists in the found publications were utilised, when applicable. To narrow down the scope, the search was supplemented with four (4) searches in the field 'Abstract, Title, Keywords' in the ScienceDirect database using its own search engine. In the first search, the subject areas were not restricted (Search 0). The rest of the searches (1-3) included the subject areas 'engineering' and 'energy' only. The search procedure (ScienceDirect) and the number of hits are shown in Table 2. The development of research activity is shown in Fig. 2.
The papers classified as relevant were first chosen for further consideration using annotations. Second, the papers were scanned with the aim of identifying and classifying the application domains and application-specific machine learning techniques. Finally, autonomous applications based on reinforcement learning were reviewed in more detail to identify their key features, such as sources of training data, applied reward mechanisms and training environments. An overview of the application domains is presented in Section 3.2, whereas Section 3.3 is dedicated to autonomous applications utilizing RL techniques.

Application domains by main function
The survey of 47 selected articles resulted in a classification where 18 key application domains were recognized using the five (5) main functions listed in Section 2.2. These application domains are listed in Table 3. They are not constrained to autonomous AI agents or reinforcement learning techniques; all the detected approaches are included.
The main function 'observe' is one of the basic functions of BAS, and there are plenty of reported applications. The recent reports include applications such as the use of energy signatures to identify the heating system and building type using smart meter data and unsupervised regression modelling (Westermann, Deb, Schlueter & Evins, 2020), detection of anomalies (Araya et al., 2017) and BAS intrusions (Pan, Hariri & Pacheco, 2019). Araya et al. (2017) use (supervised) ensemble learning in their application, as do Han, Zhang, Cui and Meng (2020) for fault diagnosis. Autonomous AI agents are not typical for this category, and the reports surveyed for this overview do not include the implementation of reinforcement learning, either. Hence, a detailed treatment is beyond the scope of this survey, and the reader is referred to the original papers for further information.
Quite a few of the reported applications represent the main function 'predict'. In this category, autonomous AI agents are not typical, but reinforcement learning has been reported in the context of predicting energy use (Liu, Tan, Xu, Chen & Li, 2020) and occupant behaviour. The prevailing approach is deep learning (e.g., Gao, Ruan, Fang and Yin (2020), Wen, Zhou and Yang (2020)), whereas statistical learning has been applied, for example, to the prediction of the risk of power outages (Mukherjee, Nateghi & Hastak, 2018). Applications relying on transfer learning (e.g., Chen, Tong, Zheng, Samuelson and Norford (2020), Gao et al. (2020), Qian, Gao, Yang and Yu (2020)) and supervised ensemble learning (e.g., Gong, Wang, Bai, Li and Zhang (2020), Pinto et al. (2020)) have also been reported.
The main function 'manage (data)' typically does not include autonomous AI agents. To the best of our knowledge, reinforcement learning has not been applied in this area in terms of smart buildings, either. Data management issues such as 'poor data quality' have been referred to in the context of other main functions (e.g., Gao et al., 2020). Amayri et al. (2020) present database quality assessment for interactive learning in the context of occupancy estimation. End-user group categorization on the basis of reduction and transformation of energy use data is presented by Song, Ahn, Ahn, Park and Kwon (2020). An example of dataset reconstruction is in the work of Candanedo et al. (2018).
The main function 'interaction with humans' is a new albeit growing area of smart building research. An example is reported by Konstantakopoulos et al. (2019), who present a deep learning and gamification approach to improving human-building interaction and energy efficiency in smart infrastructure. While gamification also fits into the framework of organizational learning, a more holistic approach is the so-called 'people-oriented' approach, where training data are gathered through 'five senses' (voice, visual recognition, recognition of emotions etc.) and, in tandem with the users' active feedback, used to improve building performance (Li, Zhang, Li, Huang & Wang, 2020).
In contrast to the aforementioned categories, autonomous AI agents are common under the category 'adjust'. Here, the reinforcement learning technique is appropriate and also prevailing, wherefore Section 3.3 as a whole is dedicated to a more detailed review of those applications.

Autonomous AI based on RL and DRL
In this section, we survey applications, where the AI not only actuates control processes autonomously (i.e., lower hierarchy level of the BAS) but chooses the values of parameters and control variables with an aim to adapt to changing operational environment (i.e., upper BAS hierarchy level).
Five categories emerged from our review of reinforcement learning applications to intelligent buildings. These categories are summarized in the 'Application' column of Table 4, and each of them is discussed in a separate subsection of Section 3.3. The key elements of a reinforcement learning system illustrated in Fig. 1, namely, training environment, state, action, and reward, are elaborated for each of these application categories in Table 4.

Application 1: controller tuning
Shipman and Coetzee (2019) apply RL to train an autonomous AI agent to tune a PI controller, given only the process variable, set-point, manipulated variable and prior controller gains. The training considers random changes in plant dynamics, disturbances, and measurement noise and it is realized in a simulation. The saturating reward function (Deisenroth, 2012) is used as the reward signal to the AI agent.

Actuation of heating and cooling systems
Many reinforcement learning applications for HVAC control focus on heating. Gupta, Badr, Negahban and Qiu (2021) develop a reinforcement learning controller for automatically actuating a heating element. A reward function is designed to simultaneously optimize thermal comfort and energy saving. A simulation model of a house is used as the environment to train the controller. Most works control indoor temperature, whereas Brandi, Piscitelli, Martellacci and Capozzoli (2020) control supply water temperature. Rahimpour, Verbič and Chapman (2020) demonstrate the advantages of reinforcement learning in indoor temperature control for buildings with phase change materials, which have complex dynamics and thus pose significant challenges for conventional control approaches. Whereas the majority of approaches are limited to considering thermal comfort and energy consumption, Y. Du et al. (2021) additionally consider a variable electricity retail price and Yoon and Moon (2019) consider humidity as an additional factor for occupant comfort.
The majority of approaches use a building energy simulator as the training environment. Criticizing the simplifying assumptions made when building such simulation models, Zou, Yu and Ergan (2020) instead use two years of building automation system data to train deep neural networks that serve as the environment for training the reinforcement learning agent that performs the HVAC control.
In general, reinforcement learning methods for HVAC control assume that the agent takes actions at regular intervals, but Hosseinloo et al. (2020) propose an event-triggered approach, in which examples of events are state variables crossing a threshold. Very similar setups for the building environment and reinforcement learning agents are applicable to cooling (Jia, Jin, Sun, Hong & Spanos, 2019).
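The difference between acting at regular intervals and acting on events can be sketched as follows. The example only shows the triggering logic for a single state variable crossing a threshold. The function names and the sample temperatures are hypothetical and do not reproduce the method of Hosseinloo et al. (2020).

```python
def crossed_threshold(previous: float, current: float, threshold: float) -> bool:
    """True when two consecutive samples lie on different sides of the threshold.

    A sample exactly on the threshold is treated as being on the upper side.
    """
    return (previous < threshold) != (current < threshold)


def decision_steps(samples, threshold):
    """Indices at which an event-triggered agent would be asked for an action."""
    return [
        i
        for i in range(1, len(samples))
        if crossed_threshold(samples[i - 1], samples[i], threshold)
    ]


# Hypothetical indoor temperatures (degrees Celsius) and a 21.0 degree threshold.
temperatures = [20.0, 20.5, 21.2, 21.4, 20.8]
steps = decision_steps(temperatures, 21.0)
```

In this sketch the agent is queried only at the two steps where the temperature crosses the threshold, instead of at every sample.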
The majority of research ignores the possible availability of rooftop photovoltaic generation, despite the strong increase in such capacity (Gernaat, de Boer, Dammeier & van Vuuren, 2020). However, Lissa et al. (2021) propose a reinforcement learning agent for home energy management, performing three distinct tasks: space heating, domestic hot water heating and ensuring that local photovoltaic generation is used locally as much as possible. A separate reward function is defined for each of these tasks. By weighting these, a higher-level reward function computes the overall reward used to train the agent. The environment for training the agent is a house energy simulation, parameterized with real sensor data from a case study.
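The weighting of task-specific rewards into one higher-level reward can be sketched as a weighted sum. This is a minimal illustration under stated assumptions: the task names, weights and reward values below are hypothetical, and the actual weighting scheme of Lissa et al. (2021) may differ.

```python
def overall_reward(task_rewards: dict, weights: dict) -> float:
    """Weighted sum of per-task rewards (assumed aggregation rule)."""
    return sum(weights[task] * task_rewards[task] for task in task_rewards)


# Hypothetical per-task rewards for one time step.
task_rewards = {
    "space_heating": -1.0,        # e.g. comfort violation in the living space
    "hot_water": -0.5,            # e.g. tank temperature below target
    "pv_self_consumption": 2.0,   # e.g. locally generated energy used locally
}
# Hypothetical weights expressing the relative priority of the tasks.
weights = {"space_heating": 0.5, "hot_water": 0.25, "pv_self_consumption": 0.25}

total = overall_reward(task_rewards, weights)
```

The choice of weights determines how the agent trades one task against another, so in practice they would need to be tuned for the building in question.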

Actuation of ventilation systems
One direction of research on learning ventilation systems concerns natural ventilation, in which the opening and closing of windows is actuated. Han et al. (2020) developed a reinforcement learning agent for automatically closing and opening windows in order to optimize indoor conditions. Chen et al. (2018) present a reinforcement learning agent for a similar problem, with the capability of controlling an air conditioner and heater in addition to opening and closing the windows. The capability to control indoor temperature and humidity all year round is demonstrated. Another direction of research in learning ventilation applications assumes that windows remain closed. Valladares et al. (2019) propose a reinforcement learning agent for the joint control of ventilation fans and air conditioners in a subtropical environment with no heating need. Unlike the majority of the research, air quality (CO2 levels) is added to the usual optimization criteria of thermal comfort and energy efficiency. Building on this work, Yu et al. (2021) consider an environment with a variable number of occupants as part of the system state. The number of occupants is recorded manually, which requires further work before the system could be deployed.
We note that learning applications for ventilation have been studied less than applications for heating and cooling. With the exception of Chen et al. (2018), the reviewed works make simplifying assumptions: Han et al. (2020) is limited to two months during a season in which outdoor temperatures were favourable to natural ventilation, and Valladares et al. (2019) and Yu et al. (2021) assume a climate in which heating is never needed.

Actuation of indoor lighting systems
Park, Dougherty, Fritz and Nagy (2019) present a reinforcement learning agent for controlling lighting levels of an office. In addition to sensor data about occupancy and ambient lighting, the users' activity in operating the light switch is considered as a state variable, in order to allow the agent to learn the user preferences. The actions are limited to turning the lights on or off, lacking the possibility for adjustment of lighting levels. Cho et al. (2020) present a similar approach and go further by analysing occupants' text messages and by tracking their location and activity. Cheng et al. (2016) propose a reinforcement learning agent for lighting and window blinds control, to optimize the combined energy consumption of lighting and air conditioning. Motamed, Bueno, Deschamps, Kuhn and Scartezzini (2020) perform joint control of blinds and lighting to achieve sophisticated goals for visual comfort. As a pre-study for reinforcement learning applications that affect human emotions by adjusting lighting levels and light colour, Seo, Choi and Sung (2021) designed a supervised learning model for recommending an illuminance level and correlated colour temperature depending on the task to be performed and the fatigue level and emotional mood of the person performing the task.
In summary, learning applications for indoor lighting present broad research challenges due to the complexity of determining the impact on human productivity and comfort, the complexity of tracking humans, and the interdependencies with other systems such as window blinds and heating or cooling.

Building energy management: decision making at the setpoint level
One category of learning applications consists of higher-level building energy management systems that do not directly participate in real-time control but rather adjust the setpoints of lower-level HVAC control systems. Vázquez-Canteli, Ulyanin, Kämpf and Nagy (2019) use a building energy simulator as the environment to train a reinforcement learning agent. Two case studies are demonstrated. Firstly, an agent is trained to operate a heat pump to manage a chilled water tank in order to minimize electricity consumption of an air conditioning system over a one-day period. Secondly, the same problem is solved in the presence of local photovoltaic generation. Jiang et al. (2021) present a similar approach with another building energy simulator, using the solid mass of the building as a heat sink instead of a chilled water tank as a cold storage. A very similar approach is presented by Schreiber, Eschweiler, Baranski and Müller (2020), with the addition of considering the spot electricity price as part of the electricity cost minimization problem. Pinto, Piscitelli, Vázquez-Canteli, Nagy and Capozzoli (2021) propose another building energy simulator for training reinforcement learning models, with the advantage of being scalable from a single building to clusters of buildings and urban districts.
The abovementioned works assume that the same indoor environment should be maintained throughout the building. Lork et al. (2020) note that for managing the air conditioning of a residential building, the individual preferences of inhabitants should be considered in different parts of the building. Room-specific environments are trained for reinforcement learning agents, which adjust the temperature setpoint of the air conditioning unit of the room. Y. Du et al. (2021) present a similar multi-zone indoor temperature control system, with the additional capability of considering a variable electricity retail price for the energy cost optimization. Luo et al. (2020) propose the self-learning controller as an alternative to reinforcement learning approaches such as the ones reviewed by Vázquez-Canteli and Nagy (2019B). The self-learning controller does not employ machine learning techniques but is similar to reinforcement learning in the sense that it receives feedback from a building energy simulation and adjusts its actions accordingly.
The majority of the research does not consider the possible availability of rooftop photovoltaic generation of electricity or hot water. Works that do consider it have diverse approaches for incorporating it into the learning targets. Soares et al. (2020) use reinforcement learning to control a domestic heat pump and electric loads in the presence of a hot water tank for heat storage, a battery storage and local photovoltaic production. The goal is simply to maximize local self-consumption of the photovoltaic energy on an hourly basis, with occupant comfort being treated as a hard constraint for the lower-level control. Thus, comfort is not included in the reward function of the reinforcement learning agent. The energy content of the tank and battery are modelled analytically instead of relying on ready components in building energy simulators.  aim to use photovoltaic generation to reduce the electricity consumption in the presence of cold storage. Correa-Jullian, Droguett and Cardemil (2020) consider a domestic solar thermal collector and a reinforcement learning agent performing on/off control of a solar circulation pump and a heat-recovery circulation pump. A complex combination of factors contributes to the reward, including energy efficiency, local exploitation of renewable energy, energy cost and thermal comfort.

Demand response: decision making for rescheduling HVAC loads and using alternative (non-electric) sources of energy
In an overview of machine learning approaches for demand response in residential buildings, Sharda, Singh and Sharma (2021) identify RL as an emerging alternative to conventional multi-objective optimization techniques. Our survey, in turn, reveals great variety in how different authors formulate the RL problem. Mathew, Jolly and Mathew (2021) train an RL agent for rescheduling residential electricity load away from peak-priced hours. The scheduling problem is modelled as a game, in which loads are blocks on a chart depicting time on the horizontal axis and the total load on the vertical axis. The agent is able to move the blocks, and it is rewarded when it creates a load profile shifted away from peak hours. Sheikhi, Rayati and Ranjbar (2016) propose an energy management system for a residential building equipped with a gas-powered micro-CHP. The system includes a reinforcement learning agent that takes demand response actions to avoid electricity consumption at times of high electricity prices. It can do this either by rescheduling electric loads or by buying gas and using the micro-CHP. In (Sheikhi, Rayati & Ranjbar, 2016), they generalize the approach to a multi-energy system with several possible energy carriers. In a review of reinforcement learning demand response solutions for HVAC and other assets, Vázquez-Canteli and Nagy (2019B) point out a problem in the majority of such approaches: in markets with demand-independent electricity prices, these approaches are likely to only shift the consumption peaks. This critique could be applied to Mathew et al. (2021); however, Sheikhi et al. (2016) assume a real-time pricing environment, which could overcome this problem. Demand response research is focused on electrical systems, but Solinas, Bottaccioli, Guelpa, Verda and Patti (2021) apply the concept to district heating, in which a hot water piping network supplies heat to buildings from a CHP plant. The authors apply reinforcement learning to reduce peak demand at the CHP plant by controlling the heating load in buildings.
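The block-moving formulation of load rescheduling described above can be sketched with a simple price-based reward. This is an illustrative assumption rather than the reward used by Mathew et al. (2021): each load is represented as a hypothetical (start hour, duration, power) block, and the reward is the negative energy cost of the resulting load profile.

```python
def load_profile(blocks, horizon):
    """Total load per hour for blocks given as (start_hour, duration_h, power_kw)."""
    profile = [0.0] * horizon
    for start, duration, power_kw in blocks:
        for hour in range(start, start + duration):
            profile[hour] += power_kw
    return profile


def schedule_reward(blocks, prices):
    """Negative energy cost of a schedule (assumed reward definition)."""
    profile = load_profile(blocks, len(prices))
    return -sum(load * price for load, price in zip(profile, prices))


# Hypothetical hourly prices with a peak in hours 2 and 3.
prices = [1.0, 1.0, 4.0, 4.0, 1.0, 1.0]
during_peak = [(2, 2, 1.0)]   # 1 kW load running in the peak hours
after_peak = [(4, 2, 1.0)]    # same load moved to hours 4 and 5

reward_peak = schedule_reward(during_peak, prices)
reward_shifted = schedule_reward(after_peak, prices)
```

With a fixed price signal such as this one, the reward favours any schedule outside the peak hours, which is consistent with the critique that such approaches may only shift the consumption peak.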
We note that the reviewed works do not consider the additional possibility to trade on ancillary markets such as frequency reserve markets, which is a special form of demand response and offers additional possibilities for profitable exploitation of flexible energy resources (Giovanelli, Kilkki, Sierla, Seilonen & Vyatkin, 2019;Kempitiya et al., 2020;Subramanya et al., 2021).

Discussion and recommendations for future research
In the first part of this article, we discussed the learning ability as a feature of buildings. We conclude that the increasing autonomy of smart buildings, the evolving AI, and the increasing demand for interaction between humans and buildings pose challenges for future research. Further research is needed, for example, to find out to what extent AI may enhance building performance and the buildings' adaptability to unpredicted changes when the entire system rather than single processes is concerned. Here, the application of digital twins as training environments for boosting the learning processes is considered one of the major research topics.
The evolving autonomy of building energy management systems requires further specifications for AI-initiated monitoring, analysis, and decision-making tasks (Aguilar, Garces-Jimenez, R-Moreno & García, 2021). One of the potential research topics is building performance assessment with an aim to identify processes that yield the highest benefit from automation, including the autonomous learning processes. Here, the key performance indicators are indoor comfort, energy efficiency, carbon footprint and techno-economic viability. We propose the Learning Ability Index (LAI) as a tool to assess the implications of autonomous learning processes at the system level. This is because the LAI indicates the performance of AI by aggregating multiple attributes into a single number and considers the time invested in AI training (Alanne, 2021). Other indicators such as the Building Intelligence Quotient (BIQ), the Consumer Engagement, and the Smart Readiness Indicator (SRI) can be used to quantify the contribution of AI-initiated processes in the whole system (Batov (2015); SCIS (2017), Verbeke et al. (2017); Vigna, Pernetti, Pernigotto and Gasparella (2020); Volkov (2013)).
With the intention of outlining pathways towards higher building intelligence, we want to stimulate discussion on whether to standardize autonomy levels for buildings, inspired by the levels of automated driving (SAE J3016 2018). The Society of Automotive Engineers (SAE) explicitly determines the driving tasks belonging to the human driver and those belonging to AI on a scale from zero (0) to five (5) (Gruyer et al., 2017). AI training takes place on a regular basis and incorporates both hardware/software updates and data collected from real traffic conditions (Badue et al., 2021). There is no corresponding standardization for buildings. We conclude on the basis of our survey, however, that this type of definition and standard could serve as a roadmap supporting a progression from basic capabilities towards more advanced capabilities that build on the basic ones.
In the second part of this paper, we presented an overview of reported machine learning applications under five main functions, namely, observe, predict, adjust, manage, and interact. Here, we focused on autonomous AI agents that make independent decisions for building energy management and are based on (deep) reinforcement learning (RL).
The reviewed reinforcement learning applications involve adjustment on real-time, hourly, and daily timescales. We conclude that the adjustments would perform better if they incorporated the outputs of asset management and prediction as state information to the reinforcement learning agent (see Fig. 1), but such integrations were not found in the reviewed papers. This is a major direction for future research.
Management of big data will be a major issue related to large-scale deployment of all of the above, but these aspects have not been addressed by the studies reviewed in this paper. To that end, Qolomany et al. (2019) perform a review of machine learning applications for smart buildings with a focus on big data. The application of these technologies is a major area of further research to address the challenges of deploying AI solutions to physical buildings.
The reviewed papers address interaction with human users in limited ways, building on adjustment capabilities, so that the adjustments are made based on occupancy sensor data and in some cases providing very simple user interfaces for human users. Sophisticated interaction technologies such as wearable technologies or speech detection are topics for further research. Besides, our survey revealed some further limitations and potential areas of future research, which are briefly discussed in the following paragraphs.
One direction of further research would be to develop benchmarks of building environments with constraints for the indoor environment, in order to permit direct comparisons between the performance of different systems targeting similar goals; the benchmarks could be in the form of open source RL training environments implementing the popular OpenAI Gym interface (Brockman et al., 2016). Ma, Aviv, Guo and Braham (2021) present a comprehensive overview of the variables used in the literature and identify additional variables that have hardly been used such as adjustments to lighting colour that are made possible by LED technology. Han et al. (2019) identify a lack of research that accounts for the behaviour of the occupant. The simplest way to take the occupant into consideration is to add occupancy as a state variable to the reinforcement learning model (Han, Zhao, Zhang, Shen & Li, 2021). Deng and Chen (2021) propose an alternative approach by creating a model of the behaviour of occupants in how they adjust the thermostat and adjust their clothing level.
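To indicate what such a benchmark environment might look like, the sketch below follows the classic Gym call pattern of `reset()` and `step(action)` returning an observation, a reward, a done flag and an info dictionary. It is written in plain Python without importing the Gym library. The class name, the single-zone thermal model and all coefficients are assumptions chosen for illustration only.

```python
class ToyRoomEnv:
    """Single-zone room with an on/off heater (hypothetical toy model)."""

    def __init__(self, outdoor_temp=5.0, comfort_low=20.0, comfort_high=23.0, horizon=24):
        self.outdoor_temp = outdoor_temp
        self.comfort_low = comfort_low      # lower comfort constraint (deg C)
        self.comfort_high = comfort_high    # upper comfort constraint (deg C)
        self.horizon = horizon              # episode length in steps
        self.indoor_temp = 18.0
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.indoor_temp = 18.0
        self.t = 0
        return self.indoor_temp

    def step(self, action):
        """Apply action (0 = heater off, 1 = heater on) for one time step."""
        heat_gain = 2.0 if action == 1 else 0.0
        heat_loss = 0.1 * (self.indoor_temp - self.outdoor_temp)
        self.indoor_temp += heat_gain - heat_loss
        self.t += 1
        comfortable = self.comfort_low <= self.indoor_temp <= self.comfort_high
        # Penalize comfort violations and, to a lesser degree, energy use.
        reward = (0.0 if comfortable else -1.0) - 0.1 * action
        done = self.t >= self.horizon
        return self.indoor_temp, reward, done, {}


env = ToyRoomEnv()
observation = env.reset()
observation, reward, done, info = env.step(1)  # heater on for the first step
```

If different research groups trained their agents against the same environment class with the same comfort constraints, the cumulative rewards of their agents would be directly comparable.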
The majority of the works reviewed in this article fail to account for human behaviour; however, Park et al. (2019) incorporate the occupant's manipulation of the light switch into the training of the reinforcement learning agent and Cho et al. (2020) extensively track the activity, movements, and messaging of the occupants, which may raise privacy concerns should the system be deployed.
In a review of reinforcement learning applications for HVAC control systems, Wang and Hong (2020) note that a major upcoming challenge for HVAC controllers is that they need to integrate with higher-level building energy management systems; this improves energy savings (Mason and Grijalva, 2019). For example, Azuatalam et al. (2020) integrate a demand response operating mode into the reinforcement learning HVAC controller. As a long-term challenge, Perera and Kamalaruban (2021) identify further research horizons for exploiting reinforcement learning for energy management across buildings and other sectors, namely, transportation, agriculture, and waste management.
When reinforcement learning agents are deployed to physical buildings, only a limited number of experiences can be gained from the physical environment. One avenue of further research would be transfer learning in multi-agent systems, in which experiences gained by an agent in one building could be exploited by agents in other buildings.
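One simple way to picture the reuse of experience across buildings is to initialize the value estimates of an agent in a new building from those learned elsewhere, and then continue learning locally. The sketch below does this for a tabular Q-learning agent. It is only one possible reading of transfer learning, not a method from the reviewed papers, and the state and action names are hypothetical.

```python
from collections import defaultdict


def warm_start(source_q, scale=1.0):
    """Initialize a new Q-table from one learned in another building.

    Unseen state-action pairs default to 0.0; `scale` (an assumed parameter)
    can discount the transferred values if the buildings differ.
    """
    target_q = defaultdict(float)
    for key, value in source_q.items():
        target_q[key] = scale * value
    return target_q


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update applied in the target building."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical values learned in a source building.
source_q = {("cold", "heat_on"): 1.0}
q = warm_start(source_q)

# One local experience in the target building refines the transferred table.
q_update(q, "cold", "heat_off", -1.0, "cold", ["heat_on", "heat_off"], alpha=0.5, gamma=0.5)
```

The agent in the target building starts with a preference for the action that worked in the source building and needs fewer physical experiences to correct or confirm it.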
The great majority of the reviewed approaches use some kind of building energy simulation as the environment for training the reinforcement learning agent. However, Han et al. (2020) and Schreiber, Netsch, Baranski and Müller (2021) create data-driven black-box models of the environment with datasets collected from the physical building. Qiu et al. (2020) compare two alternatives for the environment: a building simulation model developed with real data from the case as well as a data-driven model using the same data. Superior performance was achieved with the former. The building simulation approach is also preferred to the data-driven approach in the sense that it enables predicting the system behaviour under operating conditions other than those measured.

Conclusions
In this paper, we discussed the learning ability of buildings in general and reviewed machine learning applications for training building-integrated AI with an emphasis on the reinforcement learning (RL) technique and autonomous AI agents that make decisions to initiate building control and energy management processes. To clarify the learning goals for buildings, we classified the learning applications according to five main functions, namely, observe, predict, adjust, manage data, and interact with occupants.
We conclude that buildings' ability to learn is a key element of their adaptability. Learning can be achieved through AI training by way of machine learning algorithms, but also through shared learning between humans and a building. However, adaptability at the system level and within a limited time calls for increased autonomy of building energy management systems. This can be realized through autodidactic functions, which initiate learning processes. Here, the training environment plays a significant role in making the learning process efficient.
Significant activity with RL was discovered, and it was concluded that the field is not yet mature, as shown by the following three limitations: (1) the performance of different solutions for similar problems is in general not comparable, (2) the capabilities of RL are not exploited to perform optimization across subsystems, and (3) solutions do not scale up to several buildings without the need to engineer environments for each building from scratch.
Under global climate change and its consequences, as well as the increased penetration of intermittent renewable energy production, building occupancy scenarios may not stabilize to a 'new normal', limiting the relevance of long sets of training data from a single building. Hence, there is a need for solutions that can quickly adapt to changed conditions and detect when conditions have changed so that the historical experiences of the agent might be obsolete. As the research on reinforcement learning for building energy management begins to confront the challenges related to deployment to physical buildings, there are clear motivations to explore transfer learning and multi-agent reinforcement learning for energy management of intelligent buildings.
Another line of research would be to employ a digital twin of the building as a virtual training environment for the reinforcement learning agent, to ensure that the environment reflects the most recent conditions at the building. An agent trained in such an environment would have a high potential for successful deployment to the real building. However, a high-fidelity digital twin is computationally expensive, and one line of further research would be to determine the adequate level of fidelity in the digital twin to enable the reinforcement learning agent trained in this environment to successfully generalize to the operating environment of the real building.

Declaration of Competing Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no financial support for this work that could have influenced its outcome.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.