Passive versus active learning in operation and adaptive maintenance of Heating, Ventilation, and Air Conditioning

maintenance scheduling are integrated in the same optimization framework. Continuous and discrete states are embedded as hybrid dynamics of the system, while considering both continuous controls (for energy management) and discrete controls (for maintenance scheduling). To account for the need to estimate the equipment efficiency online, the solution to the problem is addressed via an adaptive dual control formulation. We show, via a zone-boiler-radiator simulator, that the best economic cost of the system is achieved by active learning strategies, in which control interacts with estimation (dual control design).


Introduction
The economic cost of buildings is largely dependent on control and maintenance of Heating, Ventilation, and Air Conditioning (HVAC) equipment [1]. For example, neglecting performance degradation or faults in HVAC will inevitably lead to increased costs for facility managers and building owners. While control decisions have a direct impact on energy consumption [2], the literature has shown that the effect of performance degradation is more complex and essentially twofold: firstly, increased consumption of resources in order to compensate for the system inefficiency [3]; secondly, failure to meet the given set points, leading to decreased comfort within the building (with loss of productivity, complaints, etc.) [4]. This human-centered impact adds to the cost of maintenance, which should be scheduled optimally so as to minimize the overall adverse economic effects. In a nutshell, joint control and predictive maintenance is a complex and largely unsolved optimization problem involving the joint design of estimation and control. In this work, we formulate this problem as a dual control problem: the term 'dual' refers to the twofold role of the control action, which is in charge both of running the HVAC system toward optimal performance and of reducing the uncertainty when estimating HVAC degradation. We address two types of actions: a continuous action involving the selection of the HVAC set point (e.g. water temperature set point for boilers), and a discrete action determining maintenance (e.g. whether or not to repair the boiler). Because HVAC operation is governed by a thermostatic mechanism, we embed this mechanism in a hybrid dynamical system with continuous and discrete dynamics, thus requiring the solution of a hybrid control problem.
Because it involves the monetization of HVAC performance and discomfort, the formulation and the solution of this control problem are relevant and, to the best of the authors' knowledge, novel. Previous research has analyzed smaller aspects of the global problem, namely: (a) optimization of HVAC energy consumption and thermal comfort (with no focus on maintenance); (b) performance monitoring, i.e. fault detection and identification of HVAC equipment (with no focus on scheduling maintenance); (c) scheduling HVAC maintenance (with no focus on the human-centered impact of decreased comfort). In the following, we review these three research directions.

Related works in optimization of energy/comfort
With respect to optimization of energy consumption and thermal comfort, several strategies can be found in the literature to predict the effect of changing the control strategy on indoor comfort [5], or energy consumption [6], or both [7]: typical criteria driving the optimization include maximizing economy while satisfying power demand [8], optimizing component sizing [9], maximizing self-consumption [10], balancing natural ventilation and air conditioning [11], and many more techno-economic criteria [12] (see also references therein). The terms 'human-in-the-loop optimization' [13], 'comfort-driven optimization' [14], or 'occupancy-based optimization' [15] are sometimes adopted, referring to the fact that the energy demand is ultimately driven by human needs [16]. See also the recent review [17]. In [18], a data-driven approach for minimization of HVAC energy consumption and room temperature ramp rate is presented. Intelligent glazed facades are the subject of [19], with emphasis on the influence of different control policies on energy and comfort performance. The authors in [20] apply particle swarm optimization to optimize the set points based on some comfort zones. In [21] the operation of variable air volume HVAC is optimized with respect to comfort and indoor air quality. The influence of thermostat operation on energy consumption and thermal comfort is studied in [22]; [23] focuses on integration of multiple HVAC systems, [24] studies how to optimize several HVAC set points simultaneously, and [25] studies cooperation among intelligent HVAC systems. Cooperative HVAC control has led to studying the effect of HVAC operation at the grid level, such as demand response [26] or other ancillary services [27]. All these approaches show, sometimes also via real-life experiments, that relevant energy savings can be achieved without compromising thermal comfort.
However, in these and other related works the degradation of HVAC components is neglected to a large extent: the HVAC system is assumed to work as good as new, thus neglecting the possible waste of energy and loss of comfort due to HVAC degradation.

Related works in performance monitoring
On the other hand, much literature has focused on HVAC performance monitoring, either at system level or at component level [28]. System-level approaches describe the HVAC system as a network of interconnected subsystems [29]: for every subsystem, a monitoring agent is designed that combines local information with information transmitted from its neighboring agents in order to provide a decision on the type and location of the faults [30]. In the presence of uncertainty, decisions can be based on stochastically robust thresholds [31], adaptive thresholds [32], or on state estimation techniques [33]. Centralized (in place of distributed) strategies are also possible, like the data-driven automated building HVAC fault detection methods in [34] and the system identification-based method in [35]. At component level, mainly boilers and air handling units have been studied. For boilers, in [36] a model was developed to predict the seasonal efficiency based on the efficiency at full load evaluated at return water mean temperature. In [37] heat and mass transfer analytical models of a condensing heat exchanger system were developed to predict the boiler efficiency according to design parameter choices: the model in [38] includes flue gas outlet temperature, supply water temperature, water vapor mole fraction, and condensation rate of water vapor. A dynamic relation between boiler efficiency and state of the heat exchange can be derived from the model in [39]. In [40] algorithms for real-time monitoring of condensing boilers have been developed. For air handling units, the work in [41] focuses on monitoring techniques as part of the on-going commissioning process. The set of expert rules derived from mass and energy balances in [42] is able to detect faults in air handling units, whereas [43] adopts Kalman filtering techniques instead of expert rules. A detailed overview of fault detection and diagnosis methodologies for air handling units is given in [44].
What is missing in current fault detection and diagnosis methodologies is a complete monetization analysis taking into account the balance between costs due to loss of performance and costs due to maintenance actions. A work partly going in this direction is [45], which adopts a hybrid approach utilizing expert rules, performance indexes and statistical process control models: in this way it is possible to include increased energy consumption due to HVAC degradation. In summary, most works on fault detection and diagnosis do not investigate the full economic impact of degraded HVAC operation.

Related works in scheduling maintenance
In the category of maintenance, the authors in [4] develop commissioning strategies to identify cost-effective operational and maintenance measures in buildings to bring them up to optimum operation. The aim of [46] is to plan maintenance interventions early for a multi-component system based on stoppage characteristics, system remaining useful life and component criticalities. Retrofitting is the focus of [47]. The approach in [48] focuses on operational and cleaning costs of a biomass boiler. In [49] the energy and economic performance of energy recovery ventilators is studied as a function of parameters such as climate, building design and HVAC system parameters. An overview of procedures for continuous commissioning in office buildings is given in [50]: interestingly, this work discusses how to select good models not only for maintenance, but also for model-based control. However, these two aspects are rarely connected into a human-centric (e.g. comfort-driven) maintenance strategy: notable exceptions are [51], where it is recognized that discomfort plays an important role in determining when the maintenance is performed, and [52], which investigates the maintenance characteristics of HVAC systems that affect occupants' satisfaction. What is still missing in these works is recognizing the role of control in reducing uncertainty (e.g. uncertainty around efficiency parameters). To clarify this point, observe the following: the use of identification techniques as in [52] to establish relationships among quantities, e.g. regression models, is a passive learning method; on the other hand, the use of the control action to improve the fidelity of the regression models while minimizing HVAC operational costs (dual control action) is an active learning framework, whose formulation and solution are still missing.

Main contribution and originality
In this work we address the gaps in the state of the art by considering an active learning framework which is relevant to the maintenance optimization problem. The monetization model we propose will incorporate in a comprehensive cost function the operational costs of HVAC equipment subject to degradation, the human-centered costs of the fault occurring in the system, and the costs of maintenance actions. The following points are covered in this work that, to the best of the authors' knowledge, have not been covered in the state of the art:
• The control, the monitoring, and the maintenance problems are recast in the same optimization framework via a dual control formulation (joint design of control and estimation).
• The thermostat hysteretic behavior is embedded as hybrid dynamics of the system (with continuous and discrete states). In addition, both continuous and discrete control actions are considered.
• Comparisons between passive learning strategies and active learning strategies are provided.
In order to keep the optimization problem tractable, assumptions and simplifications have been made when describing the joint control/estimation problem: such assumptions and simplifications have been chosen by the authors so as to retain the main features of the HVAC problem. It is worth mentioning that the proposed framework is validated (cf. Section 7) on a zone-boiler-radiator simulation environment developed within the European Union project 'Advanced Methods in Building Diagnostics and Maintenance (AMBI)' (FP7-PEOPLE-2012-IAPP, Industry-Academia Partnerships and Pathways).
The rest of the paper is organized as follows: in Section 2 the HVAC and room models are given; in Section 3 the efficiency model is given, whereas in Section 4 all the continuous/discrete dynamics are recast as a hybrid system. The role of uncertainty is covered in Section 5, and the proposed adaptive approach is in Section 6. Section 7 gives the simulation results, and Section 8 concludes the work.
Notation: The notation is quite standard, as explained in Table 1. The subscripts B, R, Z refer to boiler, radiator, and zone, respectively. The subscripts rw, sw stand for return and supply water. The numerical values of such parameters used for simulation purposes are reported in the Appendix.

HVAC and room models
In the following we provide the details of the model used for synthesis of the maintenance strategy. We consider a single zone whose HVAC consists of a radiator driven by a boiler. The model has been selected as a trade-off between depth of description of the thermal/energy dynamics and computational feasibility of the maintenance strategy. The dynamics of boiler, radiator and zone are presented in this order.

Boiler
We will focus on a condensing boiler which, whenever the return temperature from the heating system is below the dew temperature of the flue gas, can recover the latent heat of water vapor in the flue gas so as to achieve higher efficiency than traditional boilers. Above the dew temperature, no latent heat is recovered and the boiler operates in a non-condensing mode [53]. We assume that the boiler has no dynamics, which amounts to assuming steady-state operation. This is a reasonable assumption, since most boiler models available in the literature are static models [40]. The input provided to the boiler is the supply temperature set point $T_{sw}$, which determines the power (on the water side) necessary to reach the set point, according to
$$p_{out} = c_w\, w_{wB}\, (T_{sw} - T_{rw}) \tag{1}$$
where $c_w$ is the specific heat of water in [kJ/kg°C], $w_{wB}$ is the boiler mass flow rate in [kg/s], and $T_{sw}$ and $T_{rw}$ are the temperatures in [°C] of the supply and return water exiting and entering the boiler, respectively. Let us distinguish between the power on the water side and the power on the gas side, by calling them $p_{out}$ and $p_{in}$, respectively. The output power of the boiler in [kW] (on the water side) is
$$p_{out} = \eta(T_{rw})\, p_{in}$$
where $p_{in}$ is the input power to the boiler in [kW] (on the gas side) and $\eta(T_{rw})$ is the (dimensionless) efficiency curve depending on $T_{rw}$ (an example of this curve is shown in Fig. 1). The boiler mass flow rate $w_{wB}$ and the firing rate will be assumed to be constant.
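The static boiler relations can be checked numerically with a minimal sketch. It assumes the water-side balance (power equals specific heat times mass flow times temperature rise) and the relation between water-side and gas-side power through the efficiency; the value of `C_W` and all numerical inputs are illustrative assumptions, not values from the Appendix.

```python
C_W = 4.186  # specific heat of water [kJ/(kg °C)], textbook value (assumption)


def water_side_power(w_wB: float, T_sw: float, T_rw: float) -> float:
    """Power on the water side [kW] needed to heat the flow from T_rw to T_sw."""
    return C_W * w_wB * (T_sw - T_rw)


def gas_side_power(p_out: float, efficiency: float) -> float:
    """Gas-side power [kW] obtained by inverting p_out = efficiency * p_in."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return p_out / efficiency


# Illustrative operating point (hypothetical numbers).
p_out = water_side_power(w_wB=0.2, T_sw=70.0, T_rw=50.0)
p_in = gas_side_power(p_out, efficiency=0.9)
```

Since kJ/(kg °C) times kg/s times °C gives kJ/s, the result is directly in kW, matching the units used in the text.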

Radiator
The radiator is modeled as the first-order system (2), where $\rho_w$ is the density of water in [kg/m$^3$], $V_R$ is the volume of the radiator in [m$^3$], $w_{wR}$ is the water mass flow rate into the radiator in [kg/s], $A_R$ is the surface area of the radiator in [m$^2$] and $h_R$ is the convection heat transfer coefficient of the radiator in [kW/m$^2$°C]. In addition, $T_{rwR}$ represents the temperature in [°C] of the return water out of the radiator, and $T_Z$ is the temperature in [°C] of the air in the zone. The quantity $\xi_R$ represents a stochastic process noise. According to (2), the heat exchange with the room occurs via the difference between the zone temperature and the radiator mean temperature. Assuming a heating setting (late fall or winter), we model the thermostatic control in the radiator as in (3). In other words, the valve can be fully open or fully closed, leading to rising or decreasing temperature (thermostat hysteretic behavior).
where is the discretization step of the valve. This model is not considered to keep the control setting as simple as possible.
The radiator receives water from the boiler: the presence of a radiator shunt splits the mass flow rate as
$$w_{wB} = w_{wR} + w_{wS}, \qquad w_{wS} \geq c_S\, w_{wB} \tag{5}$$
where $w_{wS}$ is the water mass flow rate into the shunt in [kg/s] and $c_S$ is the minimum percentage of flow circulating in the shunt. In other words, even if the radiator valve is fully open, a mass flow rate $c_S w_{wB}$ will circulate in the shunt. As shown in Fig. 2, the return water to the boiler is given by the mixing between the return water from the radiator and the supply water,
$$T_{rw} = \frac{w_{wR}\, T_{rwR} + w_{wS}\, T_{sw}}{w_{wB}} \tag{6}$$
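The shunt split and the mixing of return water can be illustrated with a short sketch. It assumes a flow-weighted mixing of the radiator return water with the shunted supply water, and it assumes that a closed valve diverts the whole flow to the shunt; the value of `c_S` and the temperatures are hypothetical.

```python
def split_flow(w_wB, valve_open, c_S=0.1):
    """Split the boiler flow into radiator flow and shunt flow [kg/s].

    Assumption: an open valve leaves only the minimum share c_S in the shunt,
    a closed valve sends the whole flow through the shunt.
    """
    w_wS = c_S * w_wB if valve_open else w_wB
    return w_wB - w_wS, w_wS


def return_temperature(w_wR, w_wS, T_rwR, T_sw):
    """Flow-weighted mixing of radiator return water and shunted supply water."""
    return (w_wR * T_rwR + w_wS * T_sw) / (w_wR + w_wS)


# Valve open: 90% of the flow goes through the radiator (hypothetical numbers).
w_wR, w_wS = split_flow(w_wB=0.2, valve_open=True)
T_rw = return_temperature(w_wR, w_wS, T_rwR=50.0, T_sw=70.0)
```

With the valve closed, the boiler return temperature equals the supply temperature in this sketch, since no water passes through the radiator.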

Zone
The zone is modeled as the first-order system (7), interacting with the outside air, with the neighboring zones, and with the radiator.

Boiler efficiency
The efficiency of the boiler is approximated as a piecewise affine function of $T_{rw}$, similarly to what is shown in Fig. 3. The approximation range is 30-71°C (corresponding to 90-160°F):
$$\eta(T_{rw}) = \begin{cases} \theta_1 T_{rw} + \theta_2 & \text{if } T_{rw} < T_{dew} \\ \theta_3 T_{rw} + \theta_4 & \text{if } T_{rw} \geq T_{dew} \end{cases} \tag{8}$$
with $\theta_4 = (\theta_1 - \theta_3)\, T_{dew} + \theta_2$ for continuity of the efficiency curve. In order to reduce the number of parameters from four to three, we explicitly make use of the continuity condition, so that the previous expression can be written as
$$\eta(T_{rw}) = \begin{cases} \theta_1 T_{rw} + \theta_2 & \text{if } T_{rw} < T_{dew} \\ \theta_3 (T_{rw} - T_{dew}) + \theta_1 T_{dew} + \theta_2 & \text{if } T_{rw} \geq T_{dew} \end{cases} \tag{9}$$
The curve (9) not only has fewer parameters, but is also continuous by construction. The dew point $T_{dew}$ is the temperature at which the condensing process occurs: as commonly done by boiler manufacturers, this temperature is given in terms of the return water temperature even if, from a physical point of view, it should be calculated in terms of the flue gas temperature [39]. The dew point is commonly in the range 54-58°C of $T_{rw}$ (slightly depending on the flue gas composition). The temperature of the return water $T_{rw}$ is calculated as in (6). In the following, it is described how the parameters of the curve (9) change with time as a consequence of performance degradation.

Fig. 1. Condensing boiler efficiency curve. As a condensing boiler can recover the latent heat of water vapor in the flue gas, its efficiency is higher at low water temperature.
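A three-parameter, continuous piecewise affine efficiency curve can be sketched as follows. The parametrization (two slopes `th1`, `th3` and one shared intercept `th2`, with the non-condensing branch anchored at the dew point) is one assumption that satisfies the continuity condition; the parameter values and the dew point are purely illustrative.

```python
def efficiency(T_rw, theta, T_dew=56.0):
    """Piecewise affine boiler efficiency, continuous at T_dew by construction.

    theta = (th1, th2, th3): th1 and th2 describe the condensing branch,
    th3 is the slope of the non-condensing branch.
    """
    th1, th2, th3 = theta
    if T_rw < T_dew:                                   # condensing mode
        return th1 * T_rw + th2
    return th3 * (T_rw - T_dew) + th1 * T_dew + th2    # non-condensing mode


# Hypothetical parameters: efficiency drops faster below the dew point.
THETA_NEW = (-0.004, 1.1, -0.001)
eta_cold = efficiency(30.0, THETA_NEW)   # condensing branch
eta_hot = efficiency(71.0, THETA_NEW)    # non-condensing branch
```

Because the second branch starts from the value of the first branch at the dew point, no fourth parameter is needed and the curve cannot jump at the breakpoint.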

Boiler degradation and maintenance
In order to include performance degradation we consider the following multiplicative degradation
$$\eta(T_{rw}, t) = d(t)\, \eta_{new}(T_{rw}) \tag{10}$$
The degradation factor $d(t)$ in (10) is used to model the deleterious effects of processes like deposition, erosion and corrosion. The formulation (10) leads to the degradation model (11), in which the parameters of (9) are scaled by $d(t)$.

Remark 2. In other words, the linear-in-the-parameters efficiency model (9) leads to a degradation model (11) whose parameters evolve linearly with the degradation $d(t)$. So, these parameters (which are typically unknown) can be estimated using standard state estimation techniques (cf. Section 5).

We assume an exponential incipient degradation
$$d(t) = e^{-\lambda t} \tag{12}$$
where $\lambda > 0$ is the degradation rate, i.e. the rate of deposition, erosion and corrosion deteriorating the efficiency of the boiler. The relation (12) implies that the half-life of the boiler, i.e. the time necessary for the efficiency to fall to one half of its initial value, is $\ln(2)/\lambda$. Most condensing boilers have a half-life of several months or a few years. The exponential incipient degradation (12) results in the following degradation model for the parameters
$$\dot{\theta}_i(t) = -\lambda\, \theta_i(t) + \xi_i(t), \qquad \theta_i(0) = \theta_{i,new}, \qquad i = 1, 2, 3 \tag{13}$$
which is a set of first-order filters driven by stochastic noises. The parameters $\theta_{1,new}$, $\theta_{2,new}$ and $\theta_{3,new}$ describe the efficiency of a new boiler.

Remark 3. The stochastic noises $\xi_1$, $\xi_2$, $\xi_3$ account for model inaccuracies, for example if the degradation is not exactly exponential. Similarly to $\xi_R$ and $\xi_Z$, by setting appropriate covariances for such disturbances, the designer can set to which extent the model is an approximation of the actual system (the larger the covariance, the larger the modelling inaccuracies).
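As a quick numerical check of the half-life statement, the sketch below assumes a noise-free exponential decay with rate `lam` and a backward-Euler discretization with step `dt`. The one-year half-life and the parameter values are hypothetical.

```python
import math


def half_life(lam: float) -> float:
    """Time for an exponential decay with rate lam to halve: ln(2) / lam."""
    return math.log(2.0) / lam


def degrade_step(theta, lam, dt):
    """One backward-Euler step of d(theta)/dt = -lam * theta (noise omitted)."""
    return tuple(th / (1.0 + lam * dt) for th in theta)


# Hypothetical boiler with a half-life of 365 days, simulated with dt = 1 day.
lam = math.log(2.0) / 365.0
theta = (-0.004, 1.1, -0.001)
for _ in range(365):
    theta = degrade_step(theta, lam, dt=1.0)
ratio = theta[1] / 1.1   # close to 0.5 after one half-life
```

The discretized decay slightly overestimates the remaining efficiency compared with the exact exponential, and the gap shrinks as `dt` is reduced.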

Actions
Two possible actions, namely control and maintenance actions, can be taken on the system:
• The first type of action is the local continuous control, i.e. setting the set point for building equipment: in the case at hand, this amounts to properly setting the supply hot water temperature set point $T_{sw}$ for the boiler.
• The second type of action is the discrete maintenance action, i.e. the repair at a certain time $\bar{t}$ of the building equipment. In this work we consider an ideal repair restoring its performance to the initial performance, as in (14), where $x$ and MAINT are the variables and the set to be used by the maintenance strategy, as they will be defined later. In other words, whenever it occurs, the maintenance action is supposed to restore the state of the boiler to its initial value.

Automaton formulation
The boiler-radiator-zone system can be described by a particular class of stochastic hybrid systems. A hybrid dynamical system is an indexed collection of dynamical systems along with some map for jumping among them (switching dynamical system and/or resetting the state). This jumping occurs whenever the state satisfies certain conditions, given by its membership in a specified subset of the state space. The hybrid dynamical system can be described as
$$H = [Q, \Sigma, A, G, V, C, F] \tag{15}$$
with constituent parts as follows:
• $Q$ is the set of index states or discrete states.
• $\Sigma = \{\Sigma_q\}_{q \in Q}$ is the collection of controlled dynamical systems, where each $\Sigma_q = [X_q, f_q, U_q]$ is a controlled dynamical system. Here, $X_q$ are the continuous state spaces, and $f_q$ are the continuous dynamics; $U_q$ is the set of continuous controls.
• $A = \{A_q\}_{q \in Q}$, with $A_q \subset X_q$ for each $q \in Q$, is the collection of autonomous jump sets.
• $G = \{G_q\}_{q \in Q}$, where $G_q : A_q \times V_q \to S$ is the autonomous jump transition map, parameterized by the transition control set $V_q$, a subset of the collection $V = \{V_q\}_{q \in Q}$; they are said to represent the discrete dynamics and controls.
• $C = \{C_q\}_{q \in Q}$, with $C_q \subset X_q$, is the collection of controlled jump sets.
• $F = \{F_q\}_{q \in Q}$, where $F_q : C_q \to 2^S$, is the collection of controlled jump destination maps.
• $S = \bigcup_{q \in Q} X_q \times \{q\}$ is the hybrid state space of the dynamical system.
For the problem at hand we have:
• The continuous state space $X_q$ arises from the room temperature (7), the radiator return water temperature (2) (which can be observed) and the boiler performance parameters (13) (which have to be estimated).
The maintenance action defines the map $F$: the continuous state (boiler performance parameters) changes impulsively on hitting prescribed regions of the state space (maintenance action region). The hybrid dynamical system can be represented as an automaton as in Fig. 4: note that the room temperature and radiator return water temperature evolve continuously, but the performance parameters evolve discontinuously after repair. The different regimes of the automaton, valve open (16) and valve closed, are formally defined with constraints that have been selected taking into account typical operating conditions. It is clear that the control actions influence the transitions and thus the behavior of the hybrid dynamical system. In the following we will introduce a cost to quantify the performance associated with a certain behavior.
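The two kinds of jumps in the automaton can be sketched in a few lines. The sketch assumes a symmetric hysteresis band of half-width `h` around the desired temperature `T_d` for the autonomous (thermostat) jump, and an ideal repair for the controlled (maintenance) jump; the class `HybridState` and both function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HybridState:
    valve_open: bool   # discrete state q (regime of the automaton)
    T_Z: float         # zone temperature [°C]
    theta: tuple       # boiler performance parameters


def autonomous_jump(state, T_d, h):
    """Thermostat hysteresis: switch regime when T_Z leaves the band T_d +/- h.

    Returns the new state and a flag telling whether a jump occurred.
    """
    if state.valve_open and state.T_Z >= T_d + h:
        return replace(state, valve_open=False), True
    if not state.valve_open and state.T_Z <= T_d - h:
        return replace(state, valve_open=True), True
    return state, False


def controlled_jump(state, in_maint_set, theta_new):
    """Ideal repair: reset the performance parameters to those of a new boiler."""
    return replace(state, theta=theta_new) if in_maint_set else state


s0 = HybridState(valve_open=True, T_Z=21.6, theta=(0.5,))
s1, jumped = autonomous_jump(s0, T_d=21.0, h=0.5)     # valve closes
s2 = controlled_jump(s1, in_maint_set=True, theta_new=(1.0,))
```

The returned jump flag is convenient when a cost is later attached to each autonomous switch.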

Cost
The operation of the automaton presented in Fig. 4 must be optimized taking into account the following cost:
$$C(t) = C_{oper}(t) + C_{fail}(t) + C_{maint}(t) + C_{cycl}(t) \tag{19}$$
The four terms are all monetized in €, as explained in the following.
• Operational costs $C_{oper}$: the first term in (19) is related to the costs of system operation, which in most cases is simply the cost of energy consumed by the HVAC system for a given time step as a function of observations and inputs. In the boiler case, the energy consumption is given by $p_{in}$, i.e. the energy (at the gas side) necessary to reach the set-point temperature, based on the efficiency of the boiler. As the boiler performance degrades, more and more energy will be necessary to achieve the same set point (for the same return water temperature). The economic value of this term is derived from the natural gas price statistics in EU-28 [54], which is 0.07 €/kWh.
• Failure costs $C_{fail}$: the second term in (19) is related to the costs due to improper behavior of the system, which in our case is not following the desired temperature $T_d$. We focus on zone set points, since in the building domain these are crucial constraints for occupant comfort and thus productivity. It has been estimated by the Federation of European Heating, Ventilation and Air Conditioning Associations (REHVA) that improving the indoor environment in office buildings would result in a direct increase in productivity of 0.5-5%: reduction in performance is around 4% at cooler temperatures and 6% at warmer temperatures [55]. For the purpose of this study, the failure cost is taken as the squared distance from the desired temperature, where dt is the sample time. The following estimate is made: losses of 0.2 € per sample time for one degree away from the desired temperature, 0.8 € per sample time for two degrees away, and so on.
• Maintenance costs $C_{maint}$: the third term in (19) is related to the costs of maintenance actions. We will focus on the maintenance action of repair, whose cost is chosen considering that the cost of repairing a condensing boiler varies in the range 500-2000 € depending on the brand or output [56].
• Cycling costs $C_{cycl}$: finally, one last term should be considered in (19), mainly for well-posedness reasons:
$$C_{cycl}(t) = 0.15 \quad \text{at autonomous jumps} \tag{23}$$
In fact, in order to have a well-posed formulation with no chattering phenomena (high frequency control actions), it is necessary to penalize any transition caused by autonomous switches (i.e. changes in the valve). Roughly speaking, assigning a cost to such switches has an interpretation in terms of avoiding fast cycling, which could potentially wear out the equipment.

Fig. 4. Automaton associated with the joint energy/maintenance problem: it comprises two regimes (on/off) driven by the thermostat. Each regime contains continuous dynamics for evolution of temperature and degradation. The maintenance action is a discrete control that restores the efficiency of the boiler.
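The four monetized terms can be combined in a small sketch of the per-sample cost. The gas price, the failure weight and the cycling cost are taken from the text; the repair cost of 1000 € is a hypothetical value inside the quoted 500-2000 € range, and the function name and signature are illustrative.

```python
GAS_PRICE = 0.07      # [€/kWh], natural gas price quoted in the text
FAIL_WEIGHT = 0.2     # [€ per sample per °C^2], 0.2 € at 1 °C and 0.8 € at 2 °C
CYCLING_COST = 0.15   # [€] charged at each autonomous jump


def stage_cost(p_in_kw, dt_hours, T_Z, T_d, repaired, jumped, repair_cost=1000.0):
    """Cost of one sample: operation + failure + maintenance + cycling [€].

    repair_cost is a hypothetical value within the 500-2000 € range.
    """
    c_oper = GAS_PRICE * p_in_kw * dt_hours       # energy bought on the gas side
    c_fail = FAIL_WEIGHT * (T_Z - T_d) ** 2       # squared comfort violation
    c_maint = repair_cost if repaired else 0.0    # paid only when a repair occurs
    c_cycl = CYCLING_COST if jumped else 0.0      # paid only at valve switches
    return c_oper + c_fail + c_maint + c_cycl


# One hour at 10 kW gas input, one degree above the desired temperature.
cost = stage_cost(10.0, 1.0, T_Z=22.0, T_d=21.0, repaired=False, jumped=False)
```

The quadratic failure term reproduces the quoted progression: a deviation of two degrees costs four times as much as a deviation of one degree.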

Uncertainty
Let us specify which coefficients can be measured and which ones must be estimated. The known coefficients are:
• The thermostat hysteretic threshold h.
• Properties of fluids (density and heat capacity of air and water).
• Volumes of boiler, radiator and zone.
• The heat transfer coefficients.
• The exponential decay rate $\lambda$ of the boiler efficiency.
Uncertainty arises from the unknown coefficients in the boiler efficiency curve (the efficiency of the boiler cannot be known perfectly and it is thus subject to uncertainty):
• The coefficients $\theta_1$, $\theta_2$ and $\theta_3$ are unknown and must be estimated.
Let us consider a least-squares estimator for a linear-in-the-parameters model of the boiler, where $p_{in}$ is assumed to be measured (from gas side measurements). The least-squares estimator involves a regressor vector and the covariance matrix $P$ of the uncertainty. In order to take into account the error arising from neglecting the boiler transient, a stochastic model is taken, in which $\xi_B$ is a stochastic noise. Because the management/maintenance algorithm is ultimately implemented on a digital controller, we consider the estimator in discrete time in place of (28). After discretization of (13) using backward Euler and sample time dt, the parameters in $\eta(T_{rw})$ are estimated using a stochastic Kalman filter.

Remark 4.
It has to be noted that, despite the fact that the condensing boiler operates in two modes, a unique estimate $\bar{\theta}$ and a unique covariance matrix $P$ are updated. Because $\theta_1$ and $\theta_2$ are shared in both modes, but $\theta_3$ can be observed only in the non-condensing mode, the following estimation strategy is adopted:
• in the non-condensing mode, full-order update (for the three components from measurements) and full-order prediction (based on dynamics for the three components) are performed;
• in the condensing mode, reduced-order update (for the first two components from measurements) and full-order prediction (based on dynamics for the three components) are performed.
This amounts to assuming that when $\theta_3$ is not observed its estimate is only based on the evolution of its dynamics. In fact, $\theta_3$ will decay exponentially also when it is not observed (in the condensing mode). Also, every time a maintenance occurs, we reset the covariance matrix $P$ to some initial value, i.e. $P(\bar{t}) = P_{new}$, where $\bar{t}$ is the same instant as in (14). This is done in order to reset the a priori knowledge of the Kalman estimator.
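The mode-dependent estimation strategy of Remark 4 can be sketched with a scalar-measurement Kalman filter. The sketch assumes that the measured water-side power is linear in three efficiency parameters, with a regressor built from a three-parameter continuous piecewise affine curve; the regressor, the decay factor `a` (for instance 1/(1 + rate × dt) from backward Euler), and the noise levels `q` and `r` are assumptions, not the filter equations of the paper.

```python
def regressor(p_in, T_rw, T_dew):
    """Regressor phi such that p_out = phi . theta, plus the observable indices."""
    if T_rw < T_dew:   # condensing mode: the third parameter is not observed
        return [p_in * T_rw, p_in, 0.0], [0, 1]
    return [p_in * T_dew, p_in, p_in * (T_rw - T_dew)], [0, 1, 2]


def kalman_step(theta, P, y, phi, idx, a=1.0, q=0.0, r=1e-4):
    """Update restricted to the components idx, then full-order prediction."""
    theta = list(theta)
    P = [row[:] for row in P]
    # Measurement update (reduced-order when idx excludes the third component).
    P_phi = [sum(P[i][j] * phi[j] for j in idx) for i in idx]
    s = sum(phi[i] * P_phi[n] for n, i in enumerate(idx)) + r
    err = y - sum(phi[i] * theta[i] for i in idx)
    gain = [v / s for v in P_phi]
    for n, i in enumerate(idx):
        theta[i] += gain[n] * err
        for m, j in enumerate(idx):
            P[i][j] -= gain[n] * P_phi[m]
    # Prediction for all three components (exponential decay with factor a).
    theta = [a * t for t in theta]
    P = [[a * a * P[i][j] + (q if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    return theta, P


# Noise-free illustration with hypothetical true parameters and no decay.
true_theta = [-0.004, 1.1, -0.001]
est = [0.0, 1.0, 0.0]
cov = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for T_rw in [40.0, 50.0, 40.0, 50.0, 60.0, 70.0]:
    phi, idx = regressor(10.0, T_rw, 56.0)
    y = sum(p * t for p, t in zip(phi, true_theta))
    est, cov = kalman_step(est, cov, y, phi, idx)
```

In this simplified version the cross-covariances with the unobserved component are left untouched during reduced-order updates, and a covariance reset after maintenance would simply replace `cov` with its initial value.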

Optimization approach
The idea is to adopt the unified framework for optimal control in hybrid systems [57,58]. A total discounted cost (30) is considered, in which a discounting factor weights the future cost. The decision variables over which this cost has to be minimized are: the continuous control $T_{sw}$ (boiler set point), and the maintenance strategy, which will both be defined in the following.
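A discounted accumulation of per-sample costs can be sketched as below. The discrete-time geometric form and the value of `gamma` are assumptions made for illustration; the exact discounting used in the paper is not reproduced here.

```python
def discounted_cost(stage_costs, gamma=0.999):
    """Sum of per-sample costs, each weighted by gamma**k (assumed form)."""
    total = 0.0
    for k, c in enumerate(stage_costs):
        total += (gamma ** k) * c
    return total


# Three identical costs with a strong discount (hypothetical numbers).
J = discounted_cost([1.0, 1.0, 1.0], gamma=0.5)   # 1 + 0.5 + 0.25
```

A discount factor close to one makes late repairs and long-term degradation count almost as much as immediate energy costs.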
According to the way the automaton in Fig. 4 is operated, the cost (30) can be closer to or more distant from the optimum. It is clear that the continuous control $T_{sw}$ (supply water temperature) is a crucial parameter in operating the automaton. Let us define a simple base case strategy for operating the supply water temperature, namely a proportional controller with gain $k_p$. For the maintenance strategy, a simple idea is to perform maintenance once the efficiency falls below a certain percentage r% of the efficiency of a new boiler. However, in view of uncertainty in the parameters $\theta_1$, $\theta_2$ and $\theta_3$, the efficiency must be estimated. Therefore, we propose different strategies depending on how the estimated efficiency is used.

Certainty equivalence strategy
The simplest idea for the maintenance strategy is to perform maintenance once the estimated efficiency falls below a certain percentage $r_{th}$% of the efficiency of a new boiler. In other words, in this certainty equivalence framework the control action is calculated as if the estimate $\bar{\theta}(k)$ were exact. This amounts to neglecting any uncertainty in the estimate, as in (33), where $\eta_{new}(T_{rw})$ is the efficiency curve of a new boiler (with parameters $\theta_{1,new}$, $\theta_{2,new}$ and $\theta_{3,new}$); note that the dependence of the efficiency curve on $T_{rw}$ is taken into account.

Remark 5.
Two things must be noticed. The first one is that it is not difficult to include the firing rate, call it fr, in the efficiency model, i.e. $\eta(T_{rw}, fr)$: this can be simply achieved by a linear-in-the-parameters model not only with respect to $T_{rw}$, but with respect to $T_{rw}$ and fr. In fact, most efficiency curves for boilers, heat pumps, etc. are given as linear-in-the-parameters models with respect to two or more parameters (cf. [23] for more details). The second one is that, according to (14), the maintenance variables $x$ involve $\theta_1$, $\theta_2$ and $\theta_3$, and that the strategy MAINT is defined by the parameters $T_{dew}$, $r_{th}$, $\theta_{1,new}$, $\theta_{2,new}$ and $\theta_{3,new}$.

Cautious strategy
The maintenance strategy (33) does not take into account any uncertainty in the estimation of the efficiency. A simple design to take into account the uncertainty is a 'cautious' control action that adds a measure of caution depending on the uncertainty: to this purpose we first define the covariance $\sigma^2(t)$ of the efficiency (34) and then we define the 'cautious' threshold (35). In other words, for the same threshold $r_{th}$, the cautious controller will tend to do maintenance more often if $\sigma$ is large. Basically, we notice that the numerator decreases if the uncertainty associated with $\eta(T_{rw})$ increases. The uncertainty on $\eta(T_{rw})$ is measured as the square root of the covariance $\sigma^2(t)$. If $\sigma$ reduces to zero, then the cautious strategy converges to the certainty equivalence strategy. Therefore, a crucial question arises: is it possible to actively reduce uncertainty by means of the control action? The next strategy tries to address this question.
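The difference between the certainty equivalence test and the cautious test can be sketched as follows. The sketch assumes that the efficiency uncertainty is the square root of a quadratic form in the parameter covariance, and that caution is introduced by subtracting one standard deviation from the estimated efficiency before comparing with the threshold; both choices, and all function names, are assumptions made for illustration.

```python
import math


def efficiency_std(psi, P):
    """Standard deviation of psi . theta given the parameter covariance P."""
    n = len(psi)
    var = sum(psi[i] * P[i][j] * psi[j] for i in range(n) for j in range(n))
    return math.sqrt(max(var, 0.0))


def maintenance_ce(eta_hat, eta_new, r_th):
    """Certainty equivalence: trust the estimated efficiency as if exact."""
    return eta_hat / eta_new <= r_th


def maintenance_cautious(eta_hat, sigma, eta_new, r_th):
    """Cautious: lower the numerator by the uncertainty before the comparison."""
    return (eta_hat - sigma) / eta_new <= r_th


# Hypothetical numbers: the estimate is above the threshold, but uncertain.
sigma = efficiency_std([1.0, 0.0, 0.0], [[0.0025, 0, 0], [0, 1, 0], [0, 0, 1]])
repair_ce = maintenance_ce(0.80, 0.95, r_th=0.8)
repair_cautious = maintenance_cautious(0.80, sigma, 0.95, r_th=0.8)
```

With zero uncertainty the two tests coincide, which mirrors the convergence of the cautious strategy to the certainty equivalence one.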

Dual strategy
Both the certainty equivalence and the cautious strategy are adaptive because they depend on an efficiency curve that is estimated, and thus adapted, online. However, both the certainty equivalence and the cautious strategies are passive learning policies because they do not involve any active probing signal generated to improve the estimation of the efficiency curve. In the following, we want to create a more active learning mechanism. Let us define the proportional gain as in (36). In other words, the control gain increases if the uncertainty in $\eta(T_{rw})$ increases. This is because when the uncertainty is large, larger control actions might help in reducing the estimation error (and thus in reducing $\sigma^2(t)$). It has to be remarked that $p_{in}$ enters the linear-in-the-parameters model, thus the selection of $T_{sw}$ has an effect on reducing the uncertainty. Note that the opposite mechanism (the gain $k_p$ is decreased as the uncertainty is increased) is not desirable: since the effect of a control action on estimation is not taken into account, this can lead to turning off the controller if the uncertainty becomes too large.
In view of these considerations, the dual strategy is given by (37). Summarizing, the proposed strategies are illustrated in Table 2: their performance will be compared via numerical simulations.
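The idea of a gain that grows with the estimation uncertainty can be sketched as below. The affine dependence on the standard deviation `sigma`, the name `k_probe` for the probing gain, the base temperature and the clipping bounds of the set point are all hypothetical choices; the paper's own gain law is not reproduced here.

```python
def dual_gain(k_p, k_probe, sigma):
    """Proportional gain inflated by the uncertainty (assumed affine form)."""
    return k_p * (1.0 + k_probe * sigma)


def supply_setpoint(T_d, T_Z, gain, T_base=40.0, T_min=30.0, T_max=80.0):
    """Proportional set point for the supply water, clipped to a safe range."""
    T_sw = T_base + gain * (T_d - T_Z)
    return min(max(T_sw, T_min), T_max)


# Same temperature error, with and without uncertainty (hypothetical numbers).
T_sw_passive = supply_setpoint(21.0, 19.0, dual_gain(5.0, 2.0, sigma=0.0))
T_sw_active = supply_setpoint(21.0, 19.0, dual_gain(5.0, 2.0, sigma=0.1))
```

Setting `k_probe` to zero recovers the passive proportional controller, so the same code covers the certainty equivalence and cautious policies on the energy management side.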

Table 2
Summary of the tasks to be accomplished (energy management and maintenance schedule) and of the policies to accomplish them (certainty equivalence, cautious and dual policy).

Simulation experiments
Simulation experiments are performed on a 'smart building' simulator developed by the authors. The simulator is based on the zone-boiler-radiator dynamics described in the previous sections and is implemented in Matlab, similarly to what was previously done by some of the authors in [23]. The features of the simulator are visualized in Fig. 5, together with the available measurements, the flow diagrams of the energy/maintenance controls and the different policies. The simulator also comprises a few additional modules, such as thermostat features and testing criteria, which are proprietary and cannot be disclosed due to an intellectual property agreement. The weather data used for the simulations represent outside temperature and solar radiation for 36 days; as can be seen from the weather trend, a rather harsh winter initially evolves into a milder one. In order to consider a longer simulation horizon, we repeated these 36 days 60 times, with random perturbations on the values. In this way, we are able to simulate around 2200 days of winter season (Fig. 6).
The simulations are run to optimize the following parameters:
• for the certainty equivalence strategy: $k_p$ and $r_{th}$;
• for the cautious strategy: $k_p$ and $r_{th}$;
• for the dual strategy: $k_p$, the probing gain $k$, and $r_{th}$.
Because of the low number of parameters, they can be optimized using a brute-force search over a grid. The initial grid chosen for the optimization is given in (38), with the following meaning: the proportional gain $k_p$ ranges from low gain (shallow control) to high gain (aggressive control); the threshold $r_{th}$ goes from 10% degradation to 55% degradation in steps of 5%; finally, the probing gain $k$ goes from low probing to high probing. All experiments were run on a Dell OptiPlex 7060 MT with an Intel Core i5 processor.

Fig. 5. Visualization of the features of the zone-boiler-radiator test case used to test the proposed framework (the left part is only an artist's impression; the actual test case is written in Matlab). The building test case (of around 150 m²) contains a boiler driving a zone with radiators. The boiler uses a proportional controller to set the water supply temperature, while the thermostatic controller determines the on and off regimes. The efficiency of the boiler degrades with time, so that maintenance is needed. The flow diagrams of the energy/maintenance controls are on the right. The overall energy/maintenance scenario can be managed according to three policies (certainty equivalence/cautious/dual).

Once a rough estimate of the optimal point has been found on the initial grid, the grid can be refined and reduced in size to further improve the performance. Spanning this smaller grid takes around 30 min. The computational complexity of the proposed approach is therefore relatively low, because one policy can be simulated over a 2200-day horizon in less than half a minute. Key to such a fast simulation is the relatively simple nature of the proposed hybrid modelling, which simplifies the model while still retaining the main features of the HVAC maintenance problem.
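The brute-force search described above can be sketched in Python as follows. Only the grid for $r_{th}$ (10% to 55% in steps of 5%) is taken from the text; the candidate values for $k_p$ and for the probing gain, as well as the function `toy_cost`, are placeholders standing in for a full simulation run and are purely hypothetical.

```python
from itertools import product


def r_th_grid():
    """Degradation thresholds from 10% to 55% in steps of 5% (from the text)."""
    return [round(0.10 + 0.05 * i, 2) for i in range(10)]


def grid_search(cost_fn, k_p_values, r_th_values, k_values):
    """Evaluate cost_fn on every grid point and return the best one
    as a tuple (cost, k_p, r_th, k)."""
    best = None
    for k_p, r_th, k in product(k_p_values, r_th_values, k_values):
        cost = cost_fn(k_p, r_th, k)
        if best is None or cost < best[0]:
            best = (cost, k_p, r_th, k)
    return best


def toy_cost(k_p, r_th, k):
    """HYPOTHETICAL cost surface; in practice each evaluation would be
    one simulation of a policy over the whole horizon."""
    return (k_p - 2.0) ** 2 + (r_th - 0.30) ** 2 + (k - 0.5) ** 2


# Illustrative candidate values for k_p and k (assumptions, not from the paper).
best = grid_search(toy_cost, [0.5, 1.0, 2.0, 4.0], r_th_grid(), [0.0, 0.5, 1.0])
print(best)
```

A refinement step would call `grid_search` again on a smaller, denser grid centred on the best point found in the first pass.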
We believe that the proposed hybrid modelling reaches a good trade-off between realism of the maintenance scenario and complexity of the formulation. Inevitably, as the number of states and actions increases, the proposed optimization becomes more complex and cumbersome: this issue can be studied in future work.

The results of the optimization are shown in Table 3: the dual strategy improves the cost by around 0.02 €/h with respect to the certainty equivalence strategy, and by around 0.01 €/h with respect to the cautious strategy (over 8760 h per year, this amounts to around 175 €/year and 87.5 €/year, which are non-negligible savings for a zone of 150 m²). The first thing to notice is that the certainty equivalence strategy is not the best strategy: the controller is designed independently of the estimator, so learning is passive and uncertainty leads to non-optimal maintenance decisions. By selecting a lower threshold, the cautious strategy adds some caution to the maintenance decisions, which helps improve the cost; however, even in this case the controller is designed independently of the estimator, so learning is still passive. The dual strategy is the only one in which the controller is co-designed with the estimator (active learning): this dual role of control appears to be necessary to improve the performance further. This is an interesting result, as it shows that passive learning (certainty equivalence or cautious strategies) does not pay off in the long run.

From Figs. 13-15, it can be seen that the dual strategy is the one that saves the most on energy (followed by the cautious and the certainty equivalence strategies): therefore, the dual strategy appears to offer the best trade-off between energy costs and maintenance costs. Figs. 16-18 show the estimation of the efficiency parameters and the covariance of the efficiency curve: the peaks visible in the figures correspond to maintenance actions restoring the efficiency to its initial value (which also requires resetting the covariance matrix of the estimator). The boiler has to be maintained several times because its degradation rate has been deliberately set higher than usual, so as to obtain a richer scenario in which maintenance is required quite often. As indicated in the figures, the dual strategy is the one with the smallest estimation error (this can also be seen by noticing that its covariance matrix decreases fastest), followed by the cautious strategy and by the certainty equivalence strategy. Therefore, the estimation of the dual strategy is more accurate, which is due to the presence of the probing gain $k$. Note that the certainty equivalence strategy is the one with the largest gain $k_p$, so one might expect a better estimation performance due to the high gain: however, this does not happen, and it is the probing gain $k$ in the dual strategy that, by making the controller interact with the estimator, contributes to an appreciable reduction of the estimation error. Overall, the simulations demonstrate that the best economic cost of the system is achieved by active learning, i.e. the dual strategy, in which control interacts with estimation.
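The link between control action, covariance and maintenance resets can be illustrated with a minimal recursive least squares sketch in Python. It assumes a scalar linear-in-the-parameters model $y = \varphi\,\theta$ with unit noise variance, whereas the efficiency curve in the paper has several parameters; the class name `ScalarRLS` and the choice to reset the estimate together with the covariance are assumptions made for illustration.

```python
class ScalarRLS:
    """Recursive least squares for a single parameter (illustrative only)."""

    def __init__(self, theta0, p0):
        self.theta0 = theta0  # initial parameter estimate
        self.p0 = p0          # initial covariance
        self.reset()

    def reset(self):
        """Called after a maintenance action: restore the initial estimate
        and covariance (ASSUMPTION: both are reset)."""
        self.theta = self.theta0
        self.p = self.p0

    def update(self, phi, y):
        """One RLS step with regressor phi and measurement y."""
        gain = self.p * phi / (1.0 + phi * self.p * phi)
        self.theta += gain * (y - phi * self.theta)
        self.p -= gain * phi * self.p
        return self.theta, self.p


true_theta = 0.8
weak = ScalarRLS(theta0=0.5, p0=10.0)
strong = ScalarRLS(theta0=0.5, p0=10.0)
for _ in range(5):
    weak.update(0.1, 0.1 * true_theta)    # small regressor: little information
    strong.update(1.0, 1.0 * true_theta)  # large regressor: more information

print(weak.p, strong.p)  # the covariance shrinks faster with the larger regressor
strong.reset()           # maintenance: covariance back to its initial value
```

In this sketch the covariance obeys $1/p_{new} = 1/p + \varphi^2$, so a larger regressor, which in the paper is influenced by the control action, reduces the uncertainty faster.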

Conclusions
In smart buildings, the models used for Heating, Ventilation, and Air Conditioning energy management and for maintenance scheduling differ in scope and structure: while the models for energy management describe continuous quantities (energy, temperature), the models used for maintenance scheduling describe only a few discrete states (healthy/faulty equipment, and fault typology). In addition, models for energy management typically assume the Heating, Ventilation, and Air Conditioning equipment to be healthy, whereas the models for maintenance scheduling do not take possible human factors (e.g. discomfort) into account. In this work, a framework for human-centric optimal maintenance is proposed: energy management and maintenance scheduling for Heating, Ventilation, and Air Conditioning are recast in the same optimization framework. Both continuous and discrete states are embedded as hybrid dynamics of the system; in addition, both continuous controls (for energy management) and discrete controls (for maintenance scheduling) are considered. Because of the presence of uncertainty (the status of the equipment), the solution to the problem is addressed via an adaptive dual control formulation, where control occurs jointly with estimation. Numerical examples obtained via a zone-boiler-radiator test case demonstrate the effectiveness of the approach.
This work can be extended in several directions: (1) considering more complex models that give a more realistic measure of human comfort, such as the Predicted Mean Vote and the Predicted Percentage of Dissatisfaction; (2) studying trade-offs between the complexity of human comfort models (requiring measurements of metabolic rate, ratio of clothed/nude surface area, surface temperature of clothing, air velocity relative to the human body, etc.) and the most commonly available sensors (temperature and humidity); (3) studying the feasibility and the effectiveness of simplified models, i.e. studying how the simplifications in the human comfort model introduce additional uncertainties to be estimated online and embedded in the optimization; (4) extending the maintenance actions by considering non-ideal repair (which would decrease the degradation without restoring the initial state) or inspection (whose effect would be to improve the estimate of the efficiency, possibly resetting the estimate and the covariance matrix).