Multi-task deep reinforcement learning for intelligent multi-zone residential HVAC control

https://doi.org/10.1016/j.epsr.2020.106959

Highlights

  • A multi-zone residential HVAC control method is proposed in this paper.

  • It is based on a multi-task deep reinforcement learning (deep RL) method.

  • It minimizes energy consumption costs while maintaining users’ comfort.

  • It is compared with a rule-based case and a single-task deep deterministic policy gradient (DDPG) algorithm.

Abstract

In this short communication, a data-driven deep reinforcement learning (deep RL) method is applied to minimize HVAC users’ energy consumption costs while maintaining users’ comfort. The efficiency of the applied deep RL method is enhanced through multi-task learning, which achieves an economic control strategy for a multi-zone residential HVAC system in both cooling and heating scenarios. The applied multi-task deep RL method is compared with a rule-based benchmark case and a single-task deep deterministic policy gradient algorithm to verify its effectiveness and generalization in optimizing HVAC operation.

Introduction

The latest developments in machine learning, such as deep learning and reinforcement learning, are being widely discussed in many critical areas that were once dominated by human intelligence, such as robotic control and autonomous driving [1], as well as in the field of power and energy [2]. In particular, the deep reinforcement learning (deep RL) method has been implemented for controlling heating, ventilation, and air conditioning (HVAC) systems to achieve both economic benefits and improved customer comfort. In [3], a model-free deep Q network (DQN) is applied for joint data center and HVAC load control in mixed-use buildings to reduce energy consumption. In [4], the authors compare the value-based DQN method with the policy-based deep policy gradient (DPG) method for residential energy management and demonstrate that the latter is more suitable for online scheduling of energy sources. Given that many control variables in HVAC thermal control are continuous, the deep deterministic policy gradient (DDPG) method is implemented in [5,6] to avoid discretizing the control variables and to obtain better learning performance. In [7], the authors utilize imitation learning to pre-train the HVAC control agent on historical data so that it behaves similarly to the existing controller; the RL agent then continues to improve its policy during online training using a policy gradient method, proximal policy optimization (PPO). In [8], the authors further extend the deep RL algorithm to multi-zone HVAC system control, where an actor network and a critic network are designed for each thermal zone, and features extracted from selected neighboring zones are collected to better capture the mutual thermal effects between zones and improve the control policy.

While the effectiveness of deep RL based HVAC control has been illustrated in the existing research above, one deficiency is that most studies focus on learning a single HVAC control task by training the algorithm in either the cooling scenario or the heating scenario alone; retraining is required whenever the scenario switches. It is widely known in the RL community that training can be time- and resource-consuming, so solving only one task at a time becomes increasingly inefficient as more complex control problems emerge. Motivated by this concern, in this short communication we teach the RL agent to master both the cooling and heating tasks simultaneously, guaranteeing optimal HVAC control regardless of the scenario, as sketched after this paragraph. A multi-task DDPG algorithm is developed for this purpose and is further tested on a multi-zone residential HVAC system. Comparisons with a rule-based HVAC control strategy and a single-task DDPG algorithm demonstrate that the multi-task DDPG algorithm generalizes better and achieves lower energy consumption costs and fewer user comfort violations through intelligent scheduling.
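This preview does not detail how the multi-task agent is constructed. One common way to realize multi-task DDPG, shown below as a minimal sketch and not the authors' implementation, is to condition a single actor on a one-hot task indicator so that one policy network serves both the cooling and heating scenarios without retraining. All class names, dimensions, and values here are illustrative assumptions.

    # A minimal sketch, NOT the authors' implementation: one DDPG actor
    # conditioned on a one-hot task indicator (cooling vs. heating), so a
    # single policy network can serve both scenarios.
    import torch
    import torch.nn as nn

    class TaskConditionedActor(nn.Module):
        """Deterministic policy: (state, task one-hot) -> one setpoint per zone."""
        def __init__(self, state_dim: int, n_tasks: int, n_zones: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + n_tasks, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_zones), nn.Tanh(),  # setpoints normalized to [-1, 1]
            )

        def forward(self, state: torch.Tensor, task: torch.Tensor) -> torch.Tensor:
            # Concatenate the task indicator onto the state before the shared layers.
            return self.net(torch.cat([state, task], dim=-1))

    # Hypothetical state: [T_out, T_in zone 1, T_in zone 2, retail price]
    actor = TaskConditionedActor(state_dim=4, n_tasks=2, n_zones=2)
    state = torch.tensor([[33.0, 24.5, 25.1, 0.20]])
    task = torch.tensor([[1.0, 0.0]])   # first task slot = cooling (an assumed convention)
    setpoints = actor(state, task)      # rescale to the physical setpoint range downstream

Sharing one actor across tasks is what enables feature reuse between the cooling and heating scenarios, which is the mechanism the paper credits for improved learning efficiency.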

Section snippets

Multi-task DDPG for multi-zone residential HVAC control

The change of indoor temperature under the control of a residential HVAC system can be formulated as a Markov Decision Process (MDP) [9], with the key parameters defined as follows:

  • 1) State: the outdoor temperature Tout(t), the indoor temperature Tin,z(t) for each zone z, and the retail price λretail(t), where t is the index of the time step;

  • 2) Action: the setpoint Setptz(t) for each zone z;

  • 3) Reward: the total energy consumption cost plus the temperature violation penalty, as sketched below.
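The reward expression r(t) is cut off in this preview. A plausible form consistent with the definition above, written here only as a hedged sketch, with E_z(t) an assumed symbol for the zone-z HVAC energy use over step t, β a hypothetical comfort-penalty weight, and [T_min, T_max] the comfort band, is:

    r(t) = -\sum_{z}\Big[\lambda_{retail}(t)\,E_{z}(t)
           + \beta\big(\max(0,\,T_{in,z}(t)-T_{max})
           + \max(0,\,T_{min}-T_{in,z}(t))\big)\Big]

The negative sign reflects that the RL agent maximizes reward while the stated objective minimizes cost plus penalty.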

Simulation results

The above multi-task DDPG algorithm is tested on a two-zone residential HVAC building model [11]. Weather data and Georgia Power price data from [12], [13] are used for algorithm training and testing. The Georgia Power tariff contains only two price levels: a peak price of $0.20/kWh and an off-peak price of $0.05/kWh. For the cooling scenario, the algorithm is trained with data from Jul. 1st, 2019 to Jul. 31st, 2019 and tested with data from Aug. 1st, 2019 to Aug. 10th, 2019, and the …
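For concreteness, the two-tier tariff can be encoded as below. This is a hedged sketch: the actual peak window is not stated in this snippet, so the 14:00–19:00 interval is a hypothetical placeholder, as is the flat hourly HVAC draw in the usage line.

    # Hedged sketch of the two-tier Georgia Power tariff described above.
    # The true peak hours are not given in this preview; 14:00-19:00 is a
    # hypothetical placeholder, as is the flat 1.2 kWh/h HVAC draw.
    PEAK_PRICE = 0.20      # $/kWh
    OFF_PEAK_PRICE = 0.05  # $/kWh

    def retail_price(hour: int) -> float:
        """Return lambda_retail(t) in $/kWh for an hour of day in [0, 24)."""
        return PEAK_PRICE if 14 <= hour < 19 else OFF_PEAK_PRICE

    # Example: cost of one day at an assumed constant 1.2 kWh per hour.
    daily_cost = sum(retail_price(h) * 1.2 for h in range(24))
    print(f"Example daily HVAC energy cost: ${daily_cost:.2f}")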

Conclusions

In this short communication, a multi-task DDPG method is applied to learn setpoint control strategies for multi-zone residential HVAC systems in both cooling and heating scenarios. The multi-task learning process leads to more generalized feature extraction across tasks that share similarities and improves learning efficiency compared to single-task learning. Comparisons with rule-based control strategies demonstrate the economy and adaptability of the RL-based HVAC …

Author statement

Yan Du: Concept development, algorithm development, algorithm implementation, and writing

Fangxing (Fran) Li: Concept development, algorithm development, technical supervision, and editing

Jeffery Munk: Concept development and algorithm development

Kuldeep Kurte: Concept development and algorithm development

Olivera Kotevska: Concept development and algorithm development

Kadir Amasyali: Concept development and algorithm development

Helia Zandi: Concept development, algorithm development, and …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was funded in part by the U.S. Department of Energy, Energy Efficiency and Renewable Energy, Building Technology Office under contract number DE-AC05-00OR22725, in part by CURENT which is an Engineering Research Center (ERC) funded by the U.S. National Science Foundation (NSF) and DOE under the NSF award EEC-1041877, and in part by the U.S. NSF ECCS awards 1809458 and 2033910.

References (13)

  • F. Li et al.

    From AlphaGo to power system AI

    IEEE Power Energy Mag.

    (2018)
  • R.S. Sutton et al.

    Reinforcement Learning: An Introduction

    (2018)
  • T. Wei et al.

    Deep reinforcement learning for joint datacenter and HVAC load control in distributed mixed-use buildings

    IEEE Trans. Sustain. Comput.

    (2019)
  • E. Mocanu et al.

    On-line building energy optimization using deep reinforcement learning

    IEEE Trans. Smart Grid

    (2019)
  • L. Yu et al.

    Deep reinforcement learning for smart home energy management

    IEEE Internet Things J.

    (2019)
  • G. Gao et al.

    DeepComfort: energy-efficient thermal comfort control in buildings via reinforcement learning

    IEEE Internet Things J.

    (2020)
There are more references available in the full text version of this article.


This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
