Multi-task deep reinforcement learning for intelligent multi-zone residential HVAC control
Introduction
The latest developments in machine learning, such as deep learning and reinforcement learning, are being widely discussed in many critical areas that were once dominated by human intelligence, such as robotic control and autonomous driving [1], as well as in the field of power and energy [2]. In particular, deep reinforcement learning (deep RL) has been implemented for controlling heating, ventilation, and air conditioning (HVAC) systems to achieve both economic benefit and improved customer comfort. In [3], a model-free deep Q network (DQN) is applied to joint data center and HVAC load control in mixed-use buildings to reduce energy consumption. In [4], the authors compare the value-based DQN method with the policy-based deep policy gradient (DPG) method in residential energy management, and demonstrate that the latter is more suitable for online scheduling of energy sources. Given that many control variables in HVAC thermal control are continuous, the deep deterministic policy gradient (DDPG) method is implemented in [5,6] to avoid discretizing the control variables and to obtain better learning performance. In [7], the authors utilize imitation learning to pre-train the HVAC control agent on historical data so that it behaves similarly to the existing controller; the RL agent then continues to improve its policy during online training with a policy gradient method, proximal policy optimization (PPO). In [8], the authors further extend deep RL to multi-zone HVAC system control, where a pair of actor and critic networks is designed for each thermal zone, and features extracted from selected neighboring zones are incorporated to better capture the mutual thermal effects between zones and improve the control policy.
While the effectiveness of deep RL based HVAC control has been illustrated in the existing research above, one deficiency is that most studies focus on learning a single HVAC control task by training the algorithm in either the cooling scenario or the heating scenario alone. Retraining is then required whenever the scenario switches. It is widely known in the RL community that training can be time- and resource-intensive, so solving only one task at a time becomes increasingly inefficient as more complex control problems emerge. Motivated by this concern, in this short communication we teach the RL agent to master both the cooling and heating tasks simultaneously, guaranteeing optimal HVAC control regardless of the scenario. A multi-task DDPG algorithm is developed for this purpose and is tested on a multi-zone residential HVAC system. Comparisons with a rule-based HVAC control strategy and a single-task DDPG algorithm demonstrate that the multi-task DDPG algorithm generalizes better and, through intelligent scheduling, achieves lower energy consumption cost and fewer user comfort violations.
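The training structure described above can be sketched as a single shared agent that alternates between cooling and heating episodes while filling one common replay buffer. This is a minimal illustrative skeleton, not the paper's implementation: the stub environment physics, the placeholder actor, and the episode/step counts are all assumptions made for the sketch.

```python
import random
from collections import deque

class StubHVACEnv:
    """Toy stand-in for a cooling or heating scenario (placeholder physics)."""
    def __init__(self, mode):
        self.mode = mode  # "cooling" or "heating"

    def reset(self):
        # Start hot in the cooling task, cold in the heating task.
        return 25.0 if self.mode == "cooling" else 15.0

    def step(self, setpoint):
        # Placeholder reward: negative distance of the setpoint from 22 C.
        reward = -abs(setpoint - 22.0)
        return setpoint, reward

tasks = {"cooling": StubHVACEnv("cooling"), "heating": StubHVACEnv("heating")}
replay = deque(maxlen=10_000)  # one buffer shared by both tasks

def agent_act(state):
    # Stand-in for the shared actor network's output (a temperature setpoint).
    return 22.0 + random.uniform(-1.0, 1.0)

random.seed(0)
for episode in range(4):
    mode = random.choice(list(tasks))  # sample a task for each episode
    env = tasks[mode]
    state = env.reset()
    for t in range(24):                # e.g. hourly steps over one day
        action = agent_act(state)
        next_state, reward = env.step(action)
        replay.append((mode, state, action, reward, next_state))
        state = next_state
        # A real implementation would sample a mixed minibatch from `replay`
        # here and update the shared actor/critic with DDPG gradient steps.

print(len(replay))  # transitions collected across both tasks
```

The key design choice this sketch highlights is that both tasks feed the same networks and buffer, so features common to cooling and heating (price response, comfort-band tracking) can be learned once and shared.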
Section snippets
Multi-task DDPG for multi-zone residential HVAC control
The changing of indoor temperature under the control of residential HVAC system can be formulated as a Markov Decision Process (MDP) [9], and the key parameters are defined as follows:
- 1) State: the outdoor temperature Tout(t), the indoor temperature Tin,z(t) for each zone z, and the retail price λretail(t), where t is the index of the time step;
- 2) Action: the setpoint Setptz(t) for each zone z;
- 3) Reward: the total energy consumption cost plus the temperature violation penalty.
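The reward defined above can be sketched as follows. The comfort band [T_MIN, T_MAX], the penalty weight BETA, and the per-zone energy inputs are illustrative assumptions; the source only states that the reward combines the energy cost and a temperature violation penalty.

```python
# Hedged sketch of the MDP reward: negative of (energy cost + comfort
# violation penalty), summed over zones. Band and weight are assumed values.
T_MIN, T_MAX = 20.0, 24.0  # assumed comfort band (deg C)
BETA = 2.0                 # assumed penalty weight per deg C of violation

def reward(energy_kwh, lam_retail, t_in):
    """energy_kwh, t_in: per-zone lists; lam_retail: retail price this step."""
    cost = sum(lam_retail * e for e in energy_kwh)
    violation = sum(max(0.0, T_MIN - T) + max(0.0, T - T_MAX) for T in t_in)
    return -(cost + BETA * violation)

# Two zones at peak price; zone 2 is 1 deg C above the comfort band:
r = reward([1.5, 2.0], 0.2, [22.0, 25.0])
print(round(r, 2))  # -(0.2 * 3.5 + 2.0 * 1.0) = -2.7
```

Maximizing this reward pushes the agent to shift consumption away from peak prices while keeping every zone inside the comfort band.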
Simulation results
The multi-task DDPG algorithm above is tested on a two-zone residential HVAC building model [11]. The weather data and Georgia Power price data from [12], [13] are used for algorithm training and testing. The Georgia Power tariff contains only two price levels: a peak price of $0.20/kWh and an off-peak price of $0.05/kWh. For the cooling scenario, the algorithm is trained with data from Jul. 1, 2019 to Jul. 31, 2019 and tested with data from Aug. 1, 2019 to Aug. 10, 2019, and the
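The two-level tariff described above can be encoded as a simple time-of-use lookup. The peak window chosen here (2 pm to 7 pm on weekdays) is an illustrative assumption; the source only states the two price levels.

```python
# Hedged sketch of the two-level Georgia Power TOU tariff.
PEAK, OFF_PEAK = 0.20, 0.05  # $/kWh, the two levels stated in the text

def retail_price(hour, weekday=True):
    """Return lambda_retail(t) for a given hour; peak window is assumed."""
    return PEAK if weekday and 14 <= hour < 19 else OFF_PEAK

print([retail_price(h) for h in (8, 15, 20)])  # [0.05, 0.2, 0.05]
```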
Conclusions
In this short communication, a multi-task DDPG method is applied to learn setpoint control strategies for multi-zone residential HVAC systems in both cooling and heating scenarios. The multi-task learning process yields a more generalized feature extraction across tasks that share similarities and improves learning efficiency compared to single-task learning. Comparisons with rule-based control strategies demonstrate the economic benefit and adaptability of the RL-based HVAC
Author statement
Yan Du: Concept development, algorithm development, algorithm implementation, and writing
Fangxing (Fran) Li: Concept development, algorithm development, technical supervision, and editing
Jeffery Munk: Concept development and algorithm development
Kuldeep Kurte: Concept development and algorithm development
Olivera Kotevska: Concept development and algorithm development
Kadir Amasyali: Concept development and algorithm development
Helia Zandi: Concept development, algorithm development, and
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work was funded in part by the U.S. Department of Energy, Energy Efficiency and Renewable Energy, Building Technology Office under contract number DE-AC05-00OR22725, in part by CURENT which is an Engineering Research Center (ERC) funded by the U.S. National Science Foundation (NSF) and DOE under the NSF award EEC-1041877, and in part by the U.S. NSF ECCS awards 1809458 and 2033910.
References (13)
- et al., From AlphaGo to power system AI, IEEE Power Energy Mag., Mar. 2018.
- et al., Reinforcement Learning: An Introduction, 2018.
- et al., Deep reinforcement learning for joint datacenter and HVAC load control in distributed mixed-use buildings, IEEE Trans. Sustain. Comput., 2019.
- et al., On-line building energy optimization using deep reinforcement learning, IEEE Trans. Smart Grid, Jul. 2019.
- et al., Deep reinforcement learning for smart home energy management, IEEE Internet Things J., 2019.
- et al., DeepComfort: energy-efficient thermal comfort control in buildings via reinforcement learning, IEEE Internet Things J., 2020.
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).