Automating occupant-building interaction via smart zoning of thermostatic loads: A switched self-tuning approach

are the of for the set point of the thermostat. This fixed set of actions prevents smart zoning, i.e. dynamically regulating the set points in every room at different levels according to geometry, orientation and interaction among rooms caused by occupancy patterns. In this work we frame the problem of load management with smart zoning as a multiple-mode feedback-based optimal control problem: multiple-mode refers to embedding multiple behaviors (triggered by building-occupant dynamic interaction) into the optimization problem; feedback-based refers to adopting a Hamilton-Jacobi-Bellman framework, with closed-loop control strategies using information stemming from building and weather states. The framework is solved by parameterizing the candidate control strategies and by searching for the optimal strategy in an adaptive self-tuning way. To demonstrate the proposed approach, we employ an EnergyPlus model of an actual office building in Crete, Greece. Extensive tests show that the proposed solution is able to provide, dynamically and autonomously, dedicated set point levels in every room so as to optimize the whole building performance (exploitation of renewable energy sources with improved thermal comfort). As compared to pre-programmed (non-optimal) strategies, we show that smart zoning makes it possible to save more than 15% energy consumption, with 25% increased thermal comfort. As compared to optimized strategies in which smart zoning is not implemented, smart zoning leads to an additional 4% reduction in energy and 8% improvement in comfort, demonstrating improved occupant-building interaction. Such improvements are motivated by the fact that the approach exploits the building dynamics as learned from feedback data. Moreover, the closed-loop feature of the approach makes it robust to variable weather conditions and occupancy schedules.


Introduction
The future will see more and more developments in the smart buildings and smart grids areas [1]: while smart buildings should implement demand management programs (sometimes also referred to as load management programs) [2], smart grids should implement demand response programs. Both load management programs and demand response programs should ultimately act on the thermostat set points. Two ways of acting on the set points are possible: in the first one, the users can actively manage the thermostat set points; in the second one, automated solutions for set point selection are established, in which the set point is automatically regulated without users being actively involved. However, automated demand management presents several challenges, one of the main ones being enhancing energy efficiency in thermostatically controlled HVAC loads via smart zoning [5]. Smart zoning is the capability of dynamically creating localized climate conditions that take into account the usage of a room, its orientation and its occupancy. For example, there is a recent trend in developing smart thermostats, Nest, Tado and Toon being just a few examples, that allow a sort of 'smart automated' regulation of the thermostat set point (e.g. via learning algorithms). While being appealing and in many cases effective, such solutions work for small homes, and demand management actions often consist of a fixed (non-dynamic) set of rule-based options. This fixed set of options often neglects the building dynamics and the dynamic occupant-building interaction: in fact, in order to keep consistent performance, the HVAC set points should be continuously adjusted depending on variable weather conditions (which will affect the availability of renewable energy sources) or depending on user activity (which will affect occupancy patterns).
A zoning program which is truly smart should (1) combine dynamically the available information stemming from the building and the weather states [6] (intelligent load management); (2) embed the occupancy pattern using models that can be clearly interpretable by human beings [7] (occupancy-based load management). This, unfortunately, turns out to be a big challenge due to the multiple factors that dynamically influence energy consumption [8]. The following subsections give an overview of recent results on intelligent load management and on occupancy-based load management. Some open problems in these areas are discussed, from which the motivations for this work arise.

Related work in intelligent load management
HVAC load management is the most cost-effective option for energy efficiency in buildings: while it is clear that raising the HVAC set point during summer and decreasing it during winter has great energy saving potential, smart zoning programs would push the energy efficiency even further by intelligently taking thermal comfort constraints into account [9]: the thermal conditions for human occupancy are codified in the ASHRAE Standard 55 [10]. Energy/comfort/economy costs are studied in [11] using a static building model. However, dynamic models are more appropriate to study the delicate trade-off between changing the thermostatic set point and the effect on thermal comfort: efforts in this direction can be found, for example, in [12] via a simulation-based method, in [13] via weighted linguistic fuzzy rules in combination with a rule selection, in [14] via a numerical procedure based on the finite-difference method, and in [15] via population-based stochastic optimization, based on different comfort bounds. Since thermal comfort is closely connected to indoor air quality, a conflict exists also between energy saving and indoor air quality improvement, as studied in [16] via a knowledge-based automation approach, or in [17] via a genetic algorithm. Sometimes the focus is on user satisfaction, which may or may not include thermal comfort: a rule-based demand side load management technique that is capable of controlling loads within the residential building in such a way that the user satisfaction is maximized is considered in [18].
Importance of a dynamic optimization: With few exceptions, most state-of-the-art works consider the optimization of 'static' parameters: such parameters are not able to evolve dynamically and in real time if new conditions arise. In order to achieve dynamic optimization, feedback-based strategies are necessary. This is particularly relevant if renewable energy sources like PV panels must be exploited [19], so that the HVAC management should take this information into account to minimize non-renewable energy consumption. In fact, a reasonable criterion, in addition to thermal comfort/user satisfaction, is the one of covering the HVAC load using renewable energy sources: [20] tackles the problem via a rule-based algorithm that controls the battery inverter, whereas [21] considers different rule-based strategy planning models that allow selecting the optimum preheating/cooling time. When considering the power demand of aggregated HVAC, the HVAC control problem is formulated as a scheduling problem in [22]. Few works consider dynamic programs for joint energy savings with thermal comfort, and no works, to the best of the authors' knowledge, embed smart zoning in such programs. For example, in previous work by some of the authors, a combined criterion composed of the non-renewable energy consumption and the thermal comfort has been used in [23] to design an appropriate feedback-based strategy: however, the fact that the availability of the building testbed was limited to a few rooms prevented the implementation of a true zoning strategy.

Related work in occupancy-based load management
None of the works cited above considers occupancy-based strategies, i.e. setting the temperature in every room depending on occupancy patterns. Among the few works available in the literature, [24] has considered selecting different set points based on aligning the residents' thermostat preferences with the indoor temperature, whereas [25] has considered aligning building residents' thermal preferences by assigning optimal resident-apartment pairs via integer programming. The system proposed in [26] aims to match thermal service with the spatial distribution of occupants. In general, the goal of these works is to minimize the difference between unregulated room/zone temperature and the occupants' thermal preference based on heating/cooling loads: therefore, no dynamic smart zoning depending on weather conditions or occupancy schedule is considered.
Importance of a dynamic optimization: The importance of dynamically exploiting occupancy information in open-loop solutions like model predictive control has been recognized as a key enabler for energy efficiency with thermal comfort [27]. Interestingly, in [28] an occupancy-based rule-based controller is compared with an occupancy-based model predictive control that requires real-time optimization: it is found that the much higher complexity of the model predictive control yields negligible benefits over the simple rule-based controller. This suggests that feedback-based (closed-loop) solutions are of fundamental importance for occupancy-based load management. The integration of (rule-based) expert knowledge with automated feedback-based decisions is not trivial, since most advanced programs based on model predictive control can achieve this only by resorting to complex mixed-integer nonlinear programming [29] or by adding new constraints [30], which might make the optimization infeasible. Previous work by some of the authors has shown that occupancy information can be embedded not only in open-loop solutions, but also in closed-loop solutions [31]: furthermore, it was shown that occupancy information can be efficiently combined with the availability of renewable energy supply so as to shape the demand based on the real-time building/weather/occupants measurements [32]. However, in such works the occupancy schedule is homogeneous within a building: no works, to the best of the authors' knowledge, explore the possibility of embedding the occupant behavior to implement zoning strategies that can take actions in a systematic and dynamic way based on the real-time building/weather/occupants measurements.

Motivations and contributions of this work
Summarizing, the motivations for us to implement an automated smart zoning program can be listed as:
• To truly optimize the energy consumption and thermal comfort, it is crucial to consider the 'system-of-systems' structure of buildings (composition of interacting rooms, with dense interconnection of HVAC actuation/sensing [33]);
• When delivering their load management actions, smart thermostats still cannot automatically account for user behavior and occupancy patterns at the zone level (with different actions in different zones [34]);
• Rule-based actions alone cannot promote dynamic set point adjustment based on internal conditions (building state) or external conditions (weather state) [35].
We tackle the aforementioned difficulties by embedding the load management with smart zoning program into a Hamilton-Jacobi-Bellman (HJB) optimal control problem whose main components can be identified as: (a) Closed-loop control: by using feedback information stemming from the internal building state (temperatures, occupancy schedule, availability of renewable energy) and from the external state (weather conditions, weather forecasts) we are able to intelligently adapt the HVAC set point to all these conditions; (b) Multi-modal control: by integrating the user behavior via a switched model that includes the occupancy state of the different rooms, we can explicitly consider the occupant-building interaction and the user-driven energy pattern; (c) Self-tuning control: by solving the HJB framework via parameterization of the candidate solution, we are able to use learning mechanisms to search for the optimal solution in an adaptive self-tuning way.
We refer to the proposed approach as R-PCAO (Rule-based Parameterized Cognitive Adaptive Optimization).
The paper is organized as follows: Section 2 introduces the problem setting, while Section 3 focuses on the control goals. Section 4 presents the automated load management, with simulations performed in Section 5 to demonstrate and analyze performance. Conclusions are in Section 6.

Problem setting
The main purpose of HVAC automated programs should be to employ feedback to establish meaningful 'relations' among data gathered from internal states (temperature in each room, occupancy schedule) and from external states (weather conditions). Such relations should be ultimately exploited to select the HVAC set point in each room. In this section we present the mathematical models describing the 'relations' among different states. We consider a cooling problem and we start from the dynamics (1) of a single room $i$ with thermostatically controlled HVAC. In (1), the superscript $(i)$ indicates a quantity of room $i$, while a separate superscript indicates a quantity of a neighboring room of room $i$ (see footnote 1). The other parameters are as follows: $C_R$ is the thermal capacitance of the room, $T^{(i)}$ the room temperature (with its neighbor-superscripted counterpart denoting the neighbor room temperature), $T_O$ the outside temperature, $S_O$ the solar gain entering from the window. The heat transfer coefficients between the room and the outside, and between the room and the neighboring rooms, are material-dependent and room-dependent. The solar gain coefficient depends on the size of the window (the larger the window, the larger the solar radiation entering the room), and might also be room-dependent (see footnote 2). Two other inputs affect (1): the first is the heat gain $Q_{occ}^{(i)}$ resulting from the presence of occupants (which clearly cannot be controlled), and the second is the power $Q_{HVAC}^{(i)}$ injected by the HVAC to cool the room (which can be controlled). The mechanism for controlling the HVAC is the classical thermostatic mechanism, whose switching is driven by $T^{(i)}$ and $T_{set}^{(i)}$ as illustrated in (1).
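The single-room dynamics described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the paper's model: the names `alpha`, `beta` and `gamma` for the heat-transfer and solar-gain coefficients, the hysteresis half-band `delta`, the rated power `q_max`, and all numerical values are hypothetical and chosen only for illustration. The heat balance is the usual lumped-capacitance form, integrated with forward Euler.

```python
def simulate_room(t_set, hours=6.0, dt=60.0):
    """Forward-Euler simulation of one thermostatically cooled room.

    All parameter values below are illustrative assumptions, not values
    taken from the paper.
    """
    c_r = 2.0e6      # thermal capacitance of the room [J/K] (assumed)
    alpha = 60.0     # room-to-outside heat transfer coefficient [W/K] (assumed)
    beta = 30.0      # room-to-neighbor heat transfer coefficient [W/K] (assumed)
    gamma = 2.0      # effective window area for solar gain [m^2] (assumed)
    q_max = 3000.0   # rated cooling power of the HVAC [W] (assumed)
    delta = 0.5      # thermostat hysteresis half-band [K] (assumed)

    # Disturbances held constant for simplicity: outside temperature [deg C],
    # neighbor temperature [deg C], solar radiation [W/m^2], occupant gain [W].
    t_out, t_neigh, s_out, q_occ = 32.0, 27.0, 400.0, 200.0
    temp, hvac_on = 29.0, False
    temps, on_steps = [temp], 0
    for _ in range(int(hours * 3600 / dt)):
        # Thermostatic switching driven by room temperature and set point.
        if temp > t_set + delta:
            hvac_on = True
        elif temp < t_set - delta:
            hvac_on = False
        q_hvac = q_max if hvac_on else 0.0
        # Lumped heat balance: gains from outside, neighbor, sun and
        # occupants, minus the heat removed by the HVAC.
        heat_flow = (alpha * (t_out - temp) + beta * (t_neigh - temp)
                     + gamma * s_out + q_occ - q_hvac)
        temp += dt * heat_flow / c_r
        temps.append(temp)
        on_steps += hvac_on
    energy_kwh = on_steps * dt * q_max / 3.6e6
    return temps, energy_kwh
```

Calling `simulate_room(24.0)` and `simulate_room(26.0)` gives a quick feel for how the set point trades energy against indoor temperature in this toy setting.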
Note that the effect of the thermostatic threshold can be removed in the presence of a variable-speed drive. In fact, variable-speed drives can modulate the HVAC action in such a way that the larger the difference between the set point and the room temperature, the larger the power injected by the HVAC, and the smaller the difference, the smaller the power injected. To model such a variable-speed drive, we consider a proportional law (2) in which the HVAC power is proportional to the difference between the room temperature and the set point, where $K$ is the constant of the proportional controller in the variable-speed drive. So far we have considered a generic set point $T_{set}^{(i)}$: however, in practice the set point is scheduled by the building management system according to some rule-based strategies. In the following we describe the strategy according to which the set point is scheduled.
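As an illustration of the variable-speed behavior just described, the following sketch computes the cooling power from a proportional law. The gain value `k`, the rated power `q_max`, and the saturation at zero and at rated power are assumptions added here for a physically sensible example; they are not stated in the text.

```python
def hvac_power_variable_speed(temp, t_set, k=800.0, q_max=3000.0):
    """Cooling power [W] of a variable-speed HVAC under a proportional law.

    k (proportional gain, W/K) and q_max (rated power, W) are assumed values.
    """
    # Cooling is requested only when the room is warmer than the set point,
    # and the request is saturated at the rated power of the unit.
    return min(q_max, max(0.0, k * (temp - t_set)))
```

Unlike the on/off thermostat, the power here shrinks smoothly as the room approaches the set point, so no hysteresis band is needed.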

Rule-based set point selection
It is common practice in many office buildings to deploy a rule-based load management. Basically, the rules determine the HVAC set points based on the occupancy schedule. In this work we focus on three basic rules, implemented nowadays in the majority of the building/facility management systems:
1. Normal mode: This represents the desired set point when there are people in the room.
2. Set-back mode: This is the most common strategy in buildings. The load management program selects a higher set point during non-occupancy hours (e.g. outside office hours). In fact, it is usually preferable to switch off HVACs when occupants are not present, aiming for lower energy consumption.
3. Pre-cooling mode: A common feature of many load management programs is to turn on the HVAC some time before people arrive, usually with a set point slightly lower than normal (for cooling), in order to reach appropriate indoor conditions faster.
Next to these three rules, a so-called zoning program allows selecting a different set point for every room. This might be necessary, e.g. due to different usage of a room, different window area and orientation, or even different preferences of the users. Note that in large buildings like office buildings or commercial buildings the difference between the set points in the different rooms can be 2-3°C, and is therefore very relevant. We assume that such a program is available in our test case in view of better energy efficiency.
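The three rules and the zoning program can be sketched as a small scheduling routine. This is a hypothetical illustration: the function and room names are invented, the 1°C pre-cooling offset is an assumption, and the half-hour pre-cooling window and 30°C set-back value are borrowed from the comparison strategies described later in the paper.

```python
NORMAL, SET_BACK, PRE_COOLING = "normal", "set-back", "pre-cooling"


def active_mode(hour, arrival, departure, pre_cool_hours=0.5):
    """Mode of a room at `hour`, given its predicted occupancy window."""
    if arrival <= hour < departure:
        return NORMAL
    if arrival - pre_cool_hours <= hour < arrival:
        return PRE_COOLING
    return SET_BACK


def set_point(mode, normal_set_point, pre_cool_offset=1.0, set_back=30.0):
    """Set point [deg C] associated with a mode for one room."""
    if mode == NORMAL:
        return normal_set_point
    if mode == PRE_COOLING:
        return normal_set_point - pre_cool_offset  # slightly lower than normal
    return set_back  # high enough to switch the HVAC off in practice


def zone_set_points(hour, rooms):
    """Per-room set points; `rooms` maps name -> (arrival, departure, normal)."""
    return {name: set_point(active_mode(hour, arr, dep), normal)
            for name, (arr, dep, normal) in rooms.items()}
```

For example, `zone_set_points(8.75, {"north": (9.0, 17.0, 25.0), "south": (8.0, 13.0, 24.0)})` puts the north room in pre-cooling while the south room is already in normal mode. Such a schedule is static: the set point levels themselves do not react to weather or building state.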
It is clear that the switch from one mode to another is driven by the occupant behavior, as represented in Fig. 1. Of course, the implementation of the schedule in Fig. 1 requires the presence of a system with the ability of predicting user behavior: this is a topic of increasing interest in recent years, cf. the survey [36] and the work [37]. Therefore, we assume the presence of such a prediction system.
Footnote 1: In case a room has more than one neighboring room, one should consider the summation of all these terms. To avoid making the notation more cumbersome, in the following we will not report such summation.
Footnote 2: Therefore, the three coefficients should actually carry the room superscript $(i)$.

Overall simplified building model
By combining the single room dynamics with the set point selection, one obtains a dynamical model, represented in Fig. 2, with three modes: normal mode, set-back mode, and pre-cooling mode.
• Each room $i$ has one state (its own temperature $T^{(i)}$) and one input.
• It is assumed that forecasts for both $T_O$ and $S_O$ are available, so that these disturbances will be treated as present and future measurable disturbances.
• The occupancy schedule leading to $Q_{occ}^{(i)}$ is also assumed to be known at each time.
• The cost comprises two terms: an energy term and a thermal comfort term.
• By taking into account the building topology, the model can be extended from one room to the entire building, as shown in (5). We do not describe the complete procedure for lack of space: the attentive reader will recognize how the building-level model is obtained. Note that each room has its own switching signal because its mode is independent from the modes of the other rooms. The building-level quantities collect those of all rooms, and the building-level cost is the summation of the costs in each room.

Control goals
The specific goal of any load management program is to reduce the energy costs while keeping occupants satisfied, as formalized in (5). However, (5) refers to a specific instant in time, whereas, due to the dynamic evolution of temperatures, solar radiation and occupancy schedules (at least at the time scale of minutes), one should integrate the cost (5) over some long enough horizon, where $N_t$ is the horizon length, which can be anything from one day to one week, or even longer depending on the length of the simulations one is interested in performing. In many cases, one is interested in infinitely long horizons, with a discounting factor so as to obtain a finite integral, where the positive discounting factor reduces the importance of the cost far in the future. Clearly, the importance of minimizing an integral cost (9) in place of an instantaneous one like (5) is that the minimization of an integral cost takes into account dynamic behaviors in conjunction with occupancy schedule and weather conditions (an example is to 'overcool' the building when enough solar energy is available, so that energy can be saved when less solar energy is available [19]). Summarizing, a dynamic optimization problem is obtained where all the variables have been defined in (3)-(7). Let us underline that the dynamics in (11) describe dynamics at the scale of minutes, which is crucial in order to account for changes in weather conditions and changes in occupancy. Clearly, the model (11) is a simplified room/building model: it is useful to define the state and inputs of the system and the control objective, but it cannot be used for realistic testing of a smart zoning program: it is well known that simulation tools like EnergyPlus or TRNSYS [38] provide more realistic building dynamics.
Nevertheless, the simplified dynamics summarized in (11) are of fundamental importance to understand which measurements from EnergyPlus or TRNSYS can actually be used by the optimization algorithm for feedback and real-time control.
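To make the difference between the instantaneous cost and the integral cost concrete, the following sketch approximates a discounted integral cost by a Riemann sum over sampled stage costs. The function name and the symbol `rho` for the discounting factor are assumptions introduced for this example; setting `rho = 0` recovers the undiscounted finite-horizon cost.

```python
import math


def discounted_cost(stage_costs, dt, rho):
    """Riemann-sum approximation of the integral of exp(-rho * t) * cost(t).

    stage_costs: instantaneous costs sampled every dt time units
    rho:         non-negative discounting factor (rho = 0 means no discount)
    """
    return sum(math.exp(-rho * k * dt) * c * dt
               for k, c in enumerate(stage_costs))
```

With a positive `rho`, the same instantaneous cost counts less the later it occurs, which is what keeps the infinite-horizon cost finite.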

Simulation model
To test the proposed algorithm on a realistic model, we use a building test case in EnergyPlus [39]. Our EnergyPlus model, shown later in Fig. 3, represents an actual office building on the campus of the Technical University of Crete, Greece. The building has 10 rooms with 10 different HVAC set points that can be selected independently. Interestingly, the building is oriented along the North-South axis, with offices on each side of the building: note that, in view of its orientation, the offices receive considerably different solar radiation from their windows. As a result, the different solar gain might drastically influence the selection of the HVAC set point. The building is also equipped with a photovoltaic panel that can be used to partially cover the energy demand. The EnergyPlus model has been developed during previous European research projects, mainly the AGILE project [40], coordinated by one of the authors, Prof. Kosmatopoulos. It has been developed and validated in such a way that the thermal and energy dynamics of the model can capture in a realistic way the actual dynamics of the building. The energy cost and the comfort cost are automatically calculated by EnergyPlus. Because the photovoltaic energy is free of charge, the total energy cost (in kWh) takes into account only the energy absorbed from the power grid: in other words, unless stated otherwise, in the following we will use the term 'energy consumption' to indicate the non-renewable portion of the energy consumption. To make the simulation even more realistic, typical load profiles from the actual building (PCs and appliances) have been implemented in EnergyPlus: clearly these loads are uncontrollable, but they make the total energy consumption of the building more realistic.
For the thermal comfort cost, we resort to an established metric, standardized in the ANSI/ASHRAE Standard 55 [10]: the Predicted Mean Vote (PMV) index. The PMV index is a thermal comfort model predicting the mean response of people according to a seven-grade thermal sensation scale.

Comparison strategies
For comparison purposes, the following load management strategies are adopted and implemented in EnergyPlus:
• Two Fixed Set Point (FSP) strategies. The FSPs employ a simple strategy, which consists of fixing the HVAC set points of each room at 24°C (FSP$_{24^\circ\mathrm{C}}$) or 25°C (FSP$_{25^\circ\mathrm{C}}$) during occupancy hours (the set points are 30°C outside occupancy hours, which implies switching off the HVAC). Such simple strategies (they actually implement no zoning) provide acceptable performance in terms of the cost (10), although the performance is clearly far from optimal. The choice of these two set point temperatures is motivated by the AGILE project: during the project, fixed set point temperatures were studied so as to find a trade-off between good energy consumption and good thermal comfort. It turned out that, in the summer season, a set point of 24°C gives an acceptable PPD of 10-12% (the ASHRAE standard suggests a PPD of around 10%), whereas a set point of 25°C makes the PPD 1-2% worse while reducing the energy consumption by around 20%. Thus, these two strategies can be considered as two extremes of the Pareto front in between which optimal control strategies can play: most importantly, such strategies also provide a fair base scenario that reduces any bias arising from calculating improvements for different weather conditions.
• A rule-based load management (RBLM) strategy, whose rules include a dedicated set point in room $i$ half an hour before people enter the room (i.e. pre-cooling mode). Such a pre-cooling strategy is also motivated by the AGILE project [40], as a trade-off between the two FSP strategies. The values $T_s^{(i)}$ °C are optimized with a genetic algorithm in such a way that the PPD is below 9% for a nominal occupancy schedule and some nominal weather conditions. This guarantees a baseline strategy that keeps occupants satisfied, so that we can perform meaningful comparisons with regard to energy consumption.
It is important to remark that the good performance for RBLM can only be evaluated under nominal conditions (weather and occupancy): this is because the designed RBLM is an open-loop schedule whose optimization would have to run continuously, otherwise its performance cannot be robust to changing conditions (e.g. weather and occupancy conditions). From these considerations we infer that a truly optimal strategy should be feedback-based, able to exploit information stemming from the entire building, and, in particular, able to use the information about environmental conditions so as to continuously and automatically adjust the set points: our proposed solution follows this idea as explained in the next section.
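The PPD figures used in this section to rate the comparison strategies are tied to the PMV index through the standard Fanger relation, as given in ISO 7730 and used alongside ASHRAE Standard 55. The sketch below evaluates that relation; the function name is an assumption, and the snippet is only meant to show how a PMV value maps to a percentage of dissatisfied occupants, not how EnergyPlus computes PMV itself.

```python
import math


def ppd_from_pmv(pmv):
    """Predicted Percentage of Dissatisfied [%] for a given PMV value."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)
```

The relation is symmetric in PMV and never drops below 5%, reached at PMV = 0, so a PPD of around 10% corresponds to a PMV of roughly ±0.5.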

The proposed optimization methodology
Here we present the Rule-based Parameterized Cognitive Adaptive Optimization (R-PCAO) we adopted to solve the load management with smart zoning problem defined by (10) and (11). We will first give a brief overview of the concept, and then we will provide the details of the optimization algorithm.

The concept
The connections between the simplified building model, the EnergyPlus model and the optimization algorithm are shown in Fig. 3. Basically, the components interact as follows:
• The simplified building model: defined by (10) and (11), it is used to identify the inputs and outputs of interest (HVAC set points and zone temperatures, respectively), as well as the external factors influencing temperature (outside temperature, solar radiation, occupancy) and the different operating points of the HVAC (triggered by the occupancy pattern).
• The EnergyPlus building model: it provides the actual measurements to be used for feedback. To perform the simulation tests the EnergyPlus building model receives the inputs from the controller.
• The switched dynamic controller: it processes all the quantities previously identified (building and weather states) in such a way to generate the feedback-based input actions. The controller depends on some parameters (the control gains to be introduced later).
• The self-tuning optimizer: it must be able to self-tune the controller parameters in such a way as to maximize the building performance. This is the task of the proposed Rule-based Parameterized Cognitive Adaptive Optimization (R-PCAO), which exploits a Hamilton-Jacobi-Bellman (HJB) formulation described in the next section.
Depending on the building structure (defined by the simplified Eqs. (10) and (11), or even by the more realistic EnergyPlus model), different control gains are needed to optimize the performance at the whole building level. Therefore, one requires an optimization algorithm with the ability to 'learn' the building dynamics from data, and use this knowledge to optimize the whole building performance (in our case, exploit the renewable energy sources while delivering improved thermal comfort). Before giving the mathematical details of R-PCAO, it is instructive to collect in Table 1 all the quantities involved in the optimization.

The Rule-based Parameterized Cognitive Adaptive Optimization
For convenience, let us first separate the cost, for some appropriate positive constant. From dynamic programming theory [41], we know that the optimal solution to (10) and (11) satisfies the Hamilton-Jacobi-Bellman equation, where $V^{\circ}$ is typically referred to as the optimal value function, while $u^{\circ}$ is referred to as the optimal control. Note that the optimal value function and the optimal control depend on the switching signal, in view of the different modes of the system. The main idea behind the R-PCAO algorithm is to parameterize both the optimal value function and the optimal control, where $P^{\circ}$ can be referred to as the optimal (or nearly optimal) parameterization matrix. Because the value function can be interpreted as a Lyapunov function for the system, it is positive definite, which can be achieved by imposing bounds of the form $\epsilon_1 I \preceq P^{\circ} \preceq \epsilon_2 I$. In addition, the function $M_z(x)$ is the Jacobian matrix of $z(x)$ with respect to $x$, $z(x)$ is the feedback vector to be defined later, and $O(1/L)$ is the approximation term. The exact form of $L$ and $z(x)$ will be discussed later: for the moment it is sufficient to say that $L$ is a parameter such that by increasing $L$ the approximation error becomes smaller (similar to neural-network approximation error). The form of $z(x)$ also depends on $L$ (similar to neural-network regressors).
Since $V^{\circ}$ is unknown, the main problem resides in the fact that the optimal parameterization $P^{\circ}$ is unknown. However, one can substitute the optimal parameterization matrix $P^{\circ}$ with an estimate $P$. At this point, one should iteratively approach the nearly-optimal solution $P^{\circ}$ by updating the parameterization $P$ at every time step. The R-PCAO algorithm is a specific way of updating $P$ through the use of the building model. The R-PCAO algorithm is schematically represented in Fig. 4, and described through the following steps.
(1) Calculation of close-to-optimality index: Select a sampling time $dt$, and consider the index $E(P)$, which can be referred to as the close-to-optimality index. In fact, the smaller the squared index $E(P)^2$, the closer $P$ is to $P^{\circ}$ (up to the approximation error $O(1/L)$). Therefore, one can think about using a gradient-like descent (19) for updating $P$, where $a(t) > 0$ is an update step, so as to minimize the squared index. In fact, when $P$ converges to the optimal $P^{\circ}$, the squared index would be of the order of the approximation error for all time steps. However, (19) cannot be directly used because:
a. An analytic expression of $E(P)$ in (19) is not available. In fact, $E(P)$ depends on the simplified dynamics (11), which are an approximation of the actual building dynamics;
b. It is well known that, in practice, the approximation error term $O(1/L)$ can make the convergence properties of the standard gradient descent algorithm invalid [41].
To overcome these technical difficulties we construct an alternative descent method according to the following steps.
(2) Update linear-in-the-parameters estimator: Consider the linear-in-the-parameters estimator to approximate the gradient of the squared index, i.e. to approximate the gradient of the objective function with respect to $P(t)$. Using stochastic approximation techniques, it has been shown in [41] that such an approximation will iteratively converge close to the actual gradient as more data are collected.
(3) Generate candidate strategies: Because $P$ is the parameterization of a Lyapunov function, only positive definite matrices should be considered for every update. This can be easily achieved by generating the appropriate candidate perturbations, where $a(t)$ is a positive update step and $P_{best}$ will be defined later. The candidate strategies are evaluated through the estimator (20), and only the best one (according to the estimator) is selected for actual testing in the building model. Therefore, the use of the estimator (20) has the clear advantage that only one evaluation using the building model is performed for each time step. This is important because for each evaluation of a strategy using a building model (simulation-based evaluation), the computational cost is in general proportional to the simulation horizon that one wants to test.
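The steps above can be sketched as a small self-tuning loop. This is a simplified illustration under several assumptions, not the authors' implementation: the matrix $P$ is reduced to a diagonal stored as a list, the building simulation is replaced by a hypothetical toy cost `toy_building_cost`, the estimator is a quadratic-feature linear model updated with normalized least mean squares, candidates are Gaussian perturbations of the best parameters so far, and the bounds `eps1`, `eps2` are arbitrary.

```python
import random


def toy_building_cost(p):
    """Stand-in for one simulation-based evaluation (hypothetical)."""
    target = [2.0, 0.5, 1.2]
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target))


def features(p):
    # Linear-in-the-parameters regressor: constant, linear and squared terms.
    return [1.0] + list(p) + [pi * pi for pi in p]


def predict(w, p):
    return sum(wi * fi for wi, fi in zip(w, features(p)))


def update_estimator(w, p, cost, step=0.5):
    """Normalized LMS step: a simple stochastic-approximation update."""
    phi = features(p)
    err = cost - predict(w, p)
    norm = 1.0 + sum(f * f for f in phi)
    return [wi + step * err * f / norm for wi, f in zip(w, phi)]


def project(p, eps1=0.1, eps2=5.0):
    # Keep diag(p) positive definite: eps1 * I <= diag(p) <= eps2 * I.
    return [min(eps2, max(eps1, pi)) for pi in p]


def r_pcao_sketch(iterations=200, n_candidates=20, seed=0):
    rng = random.Random(seed)
    p_best = [1.0, 1.0, 1.0]
    cost_best = toy_building_cost(p_best)
    history = [cost_best]
    w = update_estimator([0.0] * len(features(p_best)), p_best, cost_best)
    for t in range(1, iterations + 1):
        a_t = 1.0 / (1.0 + 0.05 * t)  # decreasing positive update step
        candidates = [
            project([pi + a_t * rng.gauss(0.0, 1.0) for pi in p_best])
            for _ in range(n_candidates)
        ]
        # Rank all candidates with the estimator only (no simulation needed).
        chosen = min(candidates, key=lambda p: predict(w, p))
        cost = toy_building_cost(chosen)  # the single evaluation of this step
        w = update_estimator(w, chosen, cost)
        if cost < cost_best:
            p_best, cost_best = chosen, cost
        history.append(cost_best)
    return p_best, cost_best, history
```

The design point mirrored here is that the many candidates are screened by the cheap estimator, while the expensive model evaluation happens once per iteration.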

Switched-based approximation
Any building management system exploits a certain set of information to operate its management actions. Such information can typically be categorized as internal feedback factors (internal temperatures, generally denoted with $x$) and external measurable feedback factors (external temperature and solar radiation, generally denoted with $d$). The same holds for R-PCAO, where the information of $x$ and $d$ should be used to operate an optimal management: in particular, in R-PCAO, the measurements are used to approximate the value function and the control law. In many applications, a quadratic approximation of the value function, e.g. $x^\top P x$, and a linear approximation of the control law, i.e. a feedback that is linear in $x$, can provide acceptable performance. However, because we have seen in (3)-(5) that switching modes will occur, the dynamics corresponding to each mode might be too different to be handled by a single (linear) controller. This implies that one must go beyond the quadratic/linear approximations. Therefore, we utilize different controllers depending on the active mode. In this case, $P$ and $z(x)$ become mode-dependent, where the following things have to be noticed: the feedback vector contains information about outside weather conditions, people occupying a certain room, inside temperature and temperature of the neighboring rooms. Moreover, $P$ has a special block structure, in which a block is activated depending on the mode of room $i$.
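The idea of using a different controller per active mode can be illustrated as follows. This is a hypothetical sketch: the layout of the feedback vector, the gain values in `GAINS`, and the admissible set point range are assumptions made for the example and are not the tuned quantities of R-PCAO.

```python
def feedback_vector(t_room, t_neigh, t_out, solar, occupied):
    """Stack building and weather measurements (layout is an assumption)."""
    return [t_room, t_neigh, t_out, solar, 1.0 if occupied else 0.0, 1.0]


# One gain vector per mode; the values are hypothetical, not tuned gains.
GAINS = {
    "normal":      [0.0, 0.0, -0.05, -0.001, -0.5, 26.0],
    "pre-cooling": [0.0, 0.0, -0.05, -0.001, 0.0, 25.0],
    "set-back":    [0.0, 0.0, 0.0, 0.0, 0.0, 30.0],
}


def switched_set_point(mode, z, gains=GAINS, low=20.0, high=30.0):
    """Set point [deg C] from the linear feedback law of the active mode."""
    raw = sum(g * zi for g, zi in zip(gains[mode], z))
    return min(high, max(low, raw))  # keep the command in an admissible range
```

Each mode is linear in the feedback vector, but the overall law is not, because the gain vector changes with the mode. In a self-tuning scheme these gains would be the parameters adjusted by the optimizer.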

Results
This section focuses on the simulations for the test case of Section 2. The simulations have been conducted with Matlab R2015b and EnergyPlus 6.1.0, on a PC with 16 GB RAM and an Intel 4770K processor. The weather data have been taken from the EnergyPlus database, in particular for Athens, year 2011. It is important to underline that the numerical values of the RBLM have been optimized so as to perform well (with average PPD around 9% for a nominal occupancy schedule, and as small energy consumption as possible) over 7 different sets of 7 days during summer (Greek climate, from mid June to early August 2011). Note that the hottest weeks of the season are included in these sets, with some variability: considering the hottest weeks allows us to maximize the need for zoning in the building.
The average occupancy schedule of the building is given in Table 2. Note that sometimes the building is occupied during the whole day and sometimes only during the morning, creating a variability that makes the control design more challenging (because occupant behavior has to be taken into account). The one reported in Table 2 is the average occupancy schedule over the entire building: the actual occupancy schedule for each room consists of some random perturbation of the average schedule, as reported in Fig. 5.
In all subsequent tables we will report the improvement in terms of non-renewable energy consumption (intended as the energy consumed from the grid), and the improvement in terms of PPD. We calculate such costs for the different load management programs, including the proposed one. Furthermore, in order to provide a cumulative (total) improvement, we also calculate a total score, which is a linear combination of the two main performance criteria. The weights 0.1 and 0.9 reflect the same proportion used in the cost matrix $H$, and thus reflect the actual cost driving the R-PCAO optimization. Different weights would lead to different trade-offs and thus different Pareto-optimal results. To highlight the benefit of smart zoning, we deploy a 'downgraded' version of R-PCAO where the HVAC set point is the same in all rooms (we call this version R-PCAO no zoning); finally, to prove the necessity of the multi-modal control actions parameterized by the switching signal, we also implement a linear version of R-PCAO (named R-PCAO linear), which implements zoning, but with a single linear controller (and a single quadratic value function) to handle all possible working modes shown in Fig. 2. The performance of all programs is tested over 7 different sets of 7 days, and a percentage range is given to show the effect of weather variability on the performance. Table 3 shows the improvement of R-PCAO and RBLM as compared to the two simple FSP strategies. It can be noted that R-PCAO attains total improvements in the range 26-33% as compared to the two FSP strategies. This is a remarkable 39-42% performance improvement over the RBLM strategy (whose total improvement is in the range 15-20% as compared to the two FSP strategies). Such improvements are quite consistent despite different external weather.
It is not a surprise that, when no smart zoning action is implemented by R-PCAO (R-PCAO no zoning), the performance almost falls back to the performance of RBLM: in fact, RBLM is a strategy optimized via a genetic algorithm that does not implement any smart zoning. Therefore, as compared to optimized strategies with no smart zoning, there is an additional benefit of around 4% reduced energy and 8% improved comfort in implementing a smart zoning program. Finally, the last row of Table 3 reveals that the linear version of R-PCAO (R-PCAO linear) attains improvements in the range 21-27% as compared to the two FSP strategies, which is about an 18-19% performance degradation as compared to the proposed (switched) R-PCAO. From here we see that the switching action is fundamental to maximize the performance of the load management and zoning program.
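The relative figures quoted for Table 3 (39-42% with respect to RBLM and 18-19% with respect to the linear version) can be reproduced from the reported ranges under one reading: the gap between corresponding range endpoints, normalized by the R-PCAO value. This normalization is an inference from the numbers, not a definition stated in the text, and the function name is illustrative.

```python
def relative_drop(reference, other):
    """Fraction of the reference improvement lost when moving to `other`."""
    return (reference - other) / reference

# Total-improvement ranges quoted in the text (percent, relative to FSP).
r_pcao = (26.0, 33.0)
rblm = (15.0, 20.0)
r_pcao_linear = (21.0, 27.0)

rblm_gap = [round(100 * relative_drop(p, r)) for p, r in zip(r_pcao, rblm)]
linear_gap = [round(100 * relative_drop(p, l))
              for p, l in zip(r_pcao, r_pcao_linear)]
print(rblm_gap)    # [42, 39]
print(linear_gap)  # [19, 18]
```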
For ease of analysis, let us now focus on one of the 7 sets of Table 3 (in particular on the last set), and let us plot the two bar charts presented in Fig. 6: the first bar chart reports the energy cost (in kWh), while the second bar chart reports the PPD (in %) for all strategies. The important observation is that R-PCAO scores better in both bar charts than all other strategies: for example, as compared to FSP 24°C, R-PCAO saves more than 150 kWh in 7 days (even with 4% better comfort levels). On the other hand, as compared to FSP 25°C, R-PCAO has only a slightly better energy cost, but a relevant 6% improvement in thermal comfort. This is particularly remarkable because R-PCAO consumes less grid energy than a strategy that, by not implementing any pre-cooling action and by raising the set point, already saves a large amount of energy. The reason behind this reduction of energy consumption is that R-PCAO manages to exploit the renewable energy from the solar panel more efficiently. For the same experiment as before, Fig. 7 reports that FSP 24°C uses 389 kWh from the solar panel (35% of the total building energy), FSP 25°C uses 395 kWh from the solar panel (41% of the total building energy), RBLM uses 475 kWh from the solar panel (44% of the total building energy), and R-PCAO uses a remarkable 590 kWh from the solar panel (51% of the total building energy). The interesting observation is that, if we look at the total energy consumption (non-renewable + renewable) for all strategies, we have: FSP needs 964 kWh, RBLM needs 1079 kWh, and R-PCAO uses 1157 kWh, which is the largest of all. So, the benefit of R-PCAO lies in exploiting in a smarter way the renewable energy, which is free of charge. Fig. 8 reports, for both RBLM and R-PCAO, the evolution of the set points for a 1-day simulation (the different color variations indicate the different set point levels).
The figure shows how R-PCAO is able to give sharper feedback-based adjustments as compared to the RBLM basic working point. In addition, while RBLM does not implement any zoning (all zones are optimized at the same set point, indicated on top of Fig. 8(a)), R-PCAO is able to deliver different set points for each room (indicated in Fig. 8(b) with different color gradations). The difference in the set points from room to room can be up to 2°C. We conclude that R-PCAO is able to learn the building dynamics from the EnergyPlus data and to use them for feedback-based control.
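Returning to the energy figures reported with Fig. 7, the split between solar and grid energy can be checked with a short calculation. The sketch assumes that the 964 kWh total belongs to FSP 25°C, an inference from the fact that 395/964 matches the quoted 41% share; the total for FSP 24°C is not stated in the text and is therefore left out.

```python
# (solar kWh, total kWh) as quoted in the text.
# Pairing 964 kWh with FSP 25C is an inference, not stated explicitly.
usage = {
    "FSP 25C": (395.0, 964.0),
    "RBLM": (475.0, 1079.0),
    "R-PCAO": (590.0, 1157.0),
}

def breakdown(solar, total):
    """Solar share (rounded percent) and energy drawn from the grid (kWh)."""
    return {"solar_share_pct": round(100 * solar / total),
            "grid_kwh": total - solar}

summary = {name: breakdown(*vals) for name, vals in usage.items()}
for name, row in summary.items():
    print(name, row)
```

Under this assumption R-PCAO draws 567 kWh from the grid against 569 kWh for FSP 25°C and 604 kWh for RBLM, consistent with the statement that R-PCAO has the largest total consumption but the smallest non-renewable consumption.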

Robustness to variable weather
It is clear that rule-based programs can seldom guarantee robustness against weather conditions (unless the rules are weather-dependent, which is not the case for the rules in Section 2.1). In fact, when rule-based programs are optimized over a specific data set, robustness to different data sets may not occur. Additional simulation tests will now demonstrate that the proposed R-PCAO, by providing a feedback-based action, can enhance robustness to different weather conditions. In these simulations all the strategies are optimized over seven days in early August, and used over 4 different sets of 7 days of the early autumn season (mid September-mid October 2011) and over 4 different sets of 7 days of the late spring season (mid May-mid June 2011).
The purpose is to test to what extent a load management and zoning program optimized over short data sets can be robust when implemented over longer data sets (this feature is often referred to as generalization, i.e. the performance is consistent over data sets different from the data sets used for optimization). Moreover, we perform the optimization of RBLM and R-PCAO during the 4 sets of the autumn season and during the 4 sets of the spring season, so as to check the degradation of performance with respect to the best possible performance: we refer to such (best possible) strategies as RBLM autumn, R-PCAO autumn, RBLM spring and R-PCAO spring, respectively. Table 4 presents the results of the early autumn 4-week experiment and Table 5 presents the results of the late spring 4-week experiment. In order to avoid any bias arising from different weather, we compare the performance with respect to the FSP strategies. As expected, the improvements of RBLM and R-PCAO are smaller than the improvements of their full data set counterparts RBLM autumn, R-PCAO autumn and RBLM spring, R-PCAO spring, respectively. However, a notable difference arises: while the performance degradation of R-PCAO is acceptable (improvements in the range 21-30% as compared to 24-33% in R-PCAO autumn and 23-31% in R-PCAO spring), the performance degradation for RBLM is higher (improvements in the range 9-12%, as compared to 14-20% in RBLM autumn and 13-19% in RBLM spring). In other words, the average degradation of performance for R-PCAO is below 10%, whereas the average degradation of performance for RBLM is around 40%. A small degradation of performance indicates robustness to variability of external conditions.
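The average degradation can be estimated from the quoted ranges. The averaging rule used here (relative loss at each range endpoint, averaged over the autumn and spring comparisons) is an assumption, since the text does not state how the averages were obtained.

```python
def mean_degradation(short, best_ranges):
    """Average relative loss (percent) of `short` versus each best-case range.

    Losses are computed endpoint by endpoint and then averaged.
    """
    losses = [(b - s) / b
              for best in best_ranges
              for s, b in zip(short, best)]
    return 100 * sum(losses) / len(losses)

# Ranges quoted in the text (percent improvement relative to FSP).
r_pcao_deg = mean_degradation((21, 30), [(24, 33), (23, 31)])
rblm_deg = mean_degradation((9, 12), [(14, 20), (13, 19)])
print(round(r_pcao_deg, 1), round(rblm_deg, 1))
```

With this rule the degradation is about 8% for R-PCAO and about 36% for RBLM, in line with the "below 10%" statement and of the same order as the quoted "around 40%".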
The poor robustness of RBLM can be explained by the fact that RBLM does not employ any feedback control: therefore, in order to keep consistent performance under different dynamics and weather conditions, continuous tuning and redesign of RBLM by the user or by the building manager are required. On the other hand, the feedback action embedded in R-PCAO (note that the feedback vector of R-PCAO includes external weather conditions) makes it intrinsically more robust.

Robustness to variable user behavior (occupancy schedule)
While weather conditions can be imagined to follow some continuous stochastic process, user behavior (in terms of occupancy schedule) will typically follow a discrete stochastic process (e.g. in the form of a Markov chain). Therefore, assessing robustness with respect to variable occupant behavior is even more crucial than with respect to weather conditions. It is clear that the rules in rule-based programs as presented in Section 2.1 are occupancy-dependent. However, because a careful tuning of the set points at different times of the day and over different rooms is necessary (cf. Fig. 8), robustness is not guaranteed. Therefore, set points optimized over a short data set with a specific occupancy schedule may not generalize to longer data sets with variable occupancy behavior. Similarly to the previous set of simulations, RBLM and R-PCAO are optimized over seven days in early August, and used over 4 different sets of 7 days. During these 4 sets, we keep the same weather conditions, and we only change the occupancy schedule: the different occupancy schedules are generated as a perturbation of the nominal schedule in Table 2. Moreover, we perform the optimization of RBLM and R-PCAO during the 4 sets of variable occupancy schedules (full data set), so as to check the degradation of performance with respect to the best possible performance: we refer to such (best possible) strategies as RBLM schedule and R-PCAO schedule, respectively. Table 6 presents the results where, in order to avoid any bias arising from different occupancy schedules, we compare the performance with respect to the FSP strategies. Because the same weather conditions are used for all weeks, the indicated range refers to the different occupancy schedules (differently from Tables 3-5, where the range arose because of different weather).
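To make the notion of a discrete stochastic occupancy process concrete, the following sketch simulates a two-state Markov chain (empty/occupied) for a single room. It is only an illustration of the kind of process mentioned above: the transition probabilities, the hourly time step and the function name are hypothetical, and this is not the procedure used to perturb the schedule of Table 2.

```python
import numpy as np

def simulate_occupancy(n_steps, p_arrive, p_leave, rng, start=0):
    """Two-state Markov chain: 0 = empty, 1 = occupied.

    p_arrive: probability of switching from empty to occupied in one step.
    p_leave:  probability of switching from occupied to empty in one step.
    """
    state = start
    states = []
    for _ in range(n_steps):
        u = rng.random()
        if state == 0:
            state = 1 if u < p_arrive else 0
        else:
            state = 0 if u < p_leave else 1
        states.append(state)
    return np.array(states)

rng = np.random.default_rng(1)
# Hypothetical hourly schedule for one day and one room.
schedule = simulate_occupancy(24, p_arrive=0.3, p_leave=0.2, rng=rng)
```

Running the simulation with different seeds yields different realizations of the schedule, which is the type of variability a zoning program has to tolerate.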
By looking at Table 6, a notable difference arises: when optimized over the full data set, the range of improvements of RBLM schedule and R-PCAO schedule is quite narrow (the differences are of the order of 1-2%). On the other hand, optimization over a shorter horizon leads to ranges of the order of 3-4% for R-PCAO and of 6-7% for RBLM, which shows that R-PCAO is more robust to variability in the occupancy schedule. This happens because the feedback action embedded in R-PCAO exploits the relations between the occupancy status and the thermal state of the building, so that such relations can be generalized when different occupancy schedules occur.
We conclude that the proposed R-PCAO program, due to its embedded feedback action (with information of thermal and occupancy conditions), leads to consistent improvements even under changing conditions, covering both variable occupancy patterns and variable weather.

Conclusions and future work
Load management actions in large buildings often neglect the occupant-building dynamic interaction and prevent smart zoning strategies, i.e. setting the temperature in every room at different levels according to geometry, orientation and interaction among rooms caused by variable occupancy patterns. In this work we created a novel self-tuning demand management architecture that adds, on top of a rule-based load management, sharper feedback-based zoning actions: this was achieved by embedding multi-mode (switched) behavior into the approximate solution of a Hamilton-Jacobi-Bellman framework. We demonstrated the proposed load management and zoning program via a test case for intelligent management of heating, ventilating and air conditioning (HVAC) in a building with multiple zones. In particular, to demonstrate the proposed approach, we employed a realistic EnergyPlus model of an actual office building in Crete, Greece. Extensive tests show that the proposed solution is able to learn the building dynamics and to provide different set points in every room in such a way as to optimize the whole building performance (exploitation of renewable energy sources with improved thermal comfort). As compared to pre-programmed (non-optimal) strategies, we show that smart zoning makes it possible to save more than 15% energy consumption, while thermal comfort improves by more than 25%. As compared to optimized strategies in which smart zoning is not implemented, smart zoning leads to an additional 4% reduction in energy and 8% improvement in comfort, demonstrating improved occupant-building interaction. Moreover, the closed-loop feature of the approach makes it robust to variable weather conditions and occupancy schedules.
Despite its positive performance, this approach is open to future improvement. First, it would be relevant to include the occupant interaction with the building automation system, for example opening windows/doors or overruling the set point: unfortunately, we could not find any 'human model' that could be included in EnergyPlus to simulate such human response, and we would like to study this point in future work. Second, load management and zoning could be connected with demand response programs defined by the grid operator via price-based incentives. This scenario could be addressed via game-theoretic approaches, for example in an extended Hamilton-Jacobi-Isaacs formulation: studies in this direction can be found in [42]. Finally, some further steps are still necessary to reach a fully automated program, for example plug-and-play integration of new loads, storage and generation devices. Studies in this direction can be found in [43], with multiple control strategies taking different decisions depending on the available equipment.