HVAC system attack detection dataset

Securing building management systems (BMSs) has become more important as the technologies they use have advanced. Because the Heating, Ventilation, and Air Conditioning (HVAC) system accounts for about 40% of a building's total energy consumption, threats targeting it can be severe and costly. Given the limited access to real HVAC systems for research purposes and the lack of public labeled datasets for investigating HVAC cybersecurity, this paper presents a dataset of a 12-zone HVAC system collected from a simulation model built with the Transient System Simulation Tool (TRNSYS). The dataset aims to promote and support research on the cybersecurity of HVAC systems in smart buildings [1] by facilitating the validation of attack detection and mitigation strategies, the benchmarking of data-driven algorithms, and the study of the impact of attacks on the HVAC system.


Specifications
Subject: Engineering
Specific subject area: Cybersecurity of industrial control systems and cyber-physical systems
Type of data: Table
How data were acquired: Simulation tool
Data format: Raw and analyzed data in three spreadsheet files
Parameters for data collection: Hour of the year; hour of the day; temperature sensor measurements T_zA1 to T_zA4, T_zB1 to T_zB4, T_zC1 to T_zC4, T_t, T_chiller, T_aoA, T_aoB, T_aoC, T_woA, T_woB, T_woC, T_amb; control signals U_1 to U_13; setpoints; zones' thermal comfort indices PMV_1 to PMV_12; total estimated power usage P_total; status of the HVAC system
Description of data collection: The data were collected from a simulation model of a 3-floor, 12-zone HVAC system using the Transient System Simulation Tool (TRNSYS), a user-friendly software package that simulates the behavior of dynamic systems using energy and mass balance equations [2]. It has been widely used as a reliable tool for simulating HVAC system dynamics, since its models were developed by authoritative departments to be consistent with practical data and to reproduce the HVAC system to a large extent [3].

Value of the Data
• The dataset is useful for anomaly detection and cybersecurity research for multi-zone HVAC systems in light of the increased threats on BMSs (which have the HVAC system as one of their major components), the limited accessibility to real HVAC systems for research purposes, and the unavailability of public labeled datasets to investigate the cybersecurity of HVAC systems.
• It covers different models of attacks that can be launched against the HVAC system with different levels of severity.
• The dataset will be useful for promoting and supporting the study and research in the field of the security of intelligent buildings with respect to the most expensively operated equipment, namely the HVAC system.
• It can be used for benchmarking the performance of the various data-driven approaches for HVAC system attack diagnosis and mitigation.
• It is useful for studying the impact of HVAC system malfunction on the efficiency of the system and the thermal comfort levels of the occupants.

Data Description
The dataset was collected from a simulated 12-zone HVAC system for a cooling application. As presented in Table 1, it consists of three logs collected at a sampling interval of 1 min: Dataset log 1 contains normal operational data collected over four months (June to September), Dataset log 2 contains normal operational data collected over 20 days, and Dataset log 3 contains both normal and attack data, with 16 attacks injected over a span of 20 days. The following variables were recorded: the hour of the year, the hour of the day, the measurements of 21 temperature sensors, 13 control signals, temperature setpoints, the 12 indices of the zones' thermal comfort of occupants, the total estimated power used by the HVAC system, and the status of the system (0 for normal operation and 1 for under attack). The detailed description of the variables is presented in Tables 2 and 3, and Table 4 shows the months in terms of the hour of the year. The attack models used were presented in [4]:
• Attack 1: Changing the setpoints of the control system

Table 3
The description of the data parameters.

The details of the attacks are presented in Table 5.
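As an illustration of how the hour-of-the-year and status variables might be used, the following minimal Python sketch maps an hour of the year to its month and groups consecutive status-1 samples into attack episodes. It assumes a non-leap, 8760-hour year in which hour 0 is the start of January 1; the function names are hypothetical and are not taken from the dataset or its supplementary code.

```python
from itertools import groupby

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def month_of_hour(hour_of_year):
    """Return the month name for an hour in [0, 8760) of a non-leap year."""
    if not 0 <= hour_of_year < 8760:
        raise ValueError("hour_of_year must lie in [0, 8760)")
    day = int(hour_of_year // 24)  # zero-based day of the year
    for name, n_days in zip(MONTH_NAMES, DAYS_IN_MONTH):
        if day < n_days:
            return name
        day -= n_days


def attack_episodes(status):
    """Return (start, end) index pairs, end exclusive, for each run of status 1."""
    episodes = []
    index = 0
    for value, run in groupby(status):
        length = sum(1 for _ in run)
        if value == 1:
            episodes.append((index, index + length))
        index += length
    return episodes


print(month_of_hour(3624))                      # first hour of June under the stated assumption
print(attack_episodes([0, 0, 1, 1, 0, 1]))      # two episodes: rows 2-3 and row 5
```

Applied to the status column of Dataset log 3, a function such as `attack_episodes` would be expected to recover one interval per injected attack, provided that consecutive attacks are separated by at least one normal sample.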

Experimental Design, Materials and Methods
As presented in [1], the building is a 3-floor office building operating from 6 AM to 6 PM. The floors are labeled A, B, and C, and each floor consists of four zones, where Zones 1-3 are office rooms and Zone 4 is a hall, as shown in Fig. 1. The building has a simple HVAC system for the cooling application, as shown in Fig. 2, in which the temperature of each zone is controlled by proportional-integral-derivative (PID) controllers [5]. Each floor is equipped with an air handling unit (AHU) that provides the zones with cold air at a constant temperature of 13 °C and a variable flow rate controlled by the variable air volume (VAV) terminals. The chiller system and the cooling coils of the AHUs are connected through a water tank that supplies chilled water to the cooling coils using a flow pump. The temperature of the chiller supply water, T_chiller, is 9 °C. The water tank temperature, T_t, is held at 11 °C by a PID controller that drives a water valve modulating the chilled water flow from the chiller to the tank.

Fig. 2. The diagram of a typical HVAC system using the Variable Air Volume (VAV) system [5].
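The zone temperature loop described above can be illustrated with a minimal discrete PID sketch in Python. The gains, the anti-windup rule, and the first-order zone model are all hypothetical choices made for illustration; they are not the controller settings or the TRNSYS model used to generate the dataset. Only the 13 °C supply air temperature and the 1 min time step are taken from the text.

```python
class PID:
    """Discrete PID controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, kd, dt, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        # Cooling loop: the error is positive when the zone is too warm,
        # so a larger error requests more cold air.
        error = measurement - setpoint
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        candidate = self.integral + error * self.dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        output = min(max(raw, self.out_min), self.out_max)
        if output == raw:
            # Accept the new integral only while the actuator is not saturated.
            self.integral = candidate
        self.prev_error = error
        return output


def simulate_zone(minutes=600, setpoint=24.0, t_start=30.0, t_amb=35.0, t_supply=13.0):
    """Toy first-order zone model (hypothetical coefficients, 1 min steps)."""
    pid = PID(kp=0.5, ki=0.05, kd=0.1, dt=1.0)
    temp = t_start
    for _ in range(minutes):
        u = pid.update(setpoint, temp)           # normalized VAV airflow in [0, 1]
        heat_gain = 0.01 * (t_amb - temp)        # heat gain from ambient per minute
        cooling = 0.05 * u * (temp - t_supply)   # cooling by the 13 °C supply air
        temp += heat_gain - cooling
    return temp


final_temp = simulate_zone()
print(round(final_temp, 2))
```

In this toy setting the airflow saturates while the zone is far above the setpoint and then settles at the value that balances the ambient heat gain. A setpoint attack would correspond to changing the `setpoint` argument while the loop is running.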

Table 6
The ranges of the PMV value for thermal comfort conditions.

It is challenging to obtain actual data or gain access to real building management systems because of confidentiality and feasibility constraints. Reliable simulation tools are therefore commonly used as a flexible, high-fidelity means of conducting research and analysis. Using the TRNSYS HVAC system simulation model, attacks were simulated by modifying the setpoint, the sensor reading, or the control signal. HVAC systems are used to condition the indoor environment for occupants at minimum energy utilization. The energy usage of the HVAC system can be estimated from the consumption of equipment such as the chiller, fans, and pumps. Thermal comfort is defined as the degree of satisfaction of occupants with the indoor thermal environment; the predicted mean vote (PMV) index is used to predict the mean response of a large group of people according to the ASHRAE thermal sensation scale [6], as presented in Table 6.
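The three attack surfaces named above (setpoint, sensor reading, and control signal) can be illustrated with a minimal Python sketch that modifies one field of a logged series over a time window and labels the affected samples with status 1. The `Sample` record, the `inject` helper, and the numerical values are hypothetical. Note that the sketch edits recorded values only, whereas in the simulation a modified signal feeds back into the closed-loop dynamics.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sample:
    """One logged minute for a single zone (hypothetical, simplified record)."""
    minute: int
    setpoint: float
    sensor: float
    control: float
    status: int = 0  # 0 for normal operation, 1 for under attack


def inject(samples, start, end, target, transform):
    """Return a copy in which `target` is transformed for start <= minute < end."""
    if target not in ("setpoint", "sensor", "control"):
        raise ValueError("target must be 'setpoint', 'sensor', or 'control'")
    attacked = []
    for sample in samples:
        if start <= sample.minute < end:
            changes = {target: transform(getattr(sample, target)), "status": 1}
            sample = replace(sample, **changes)
        attacked.append(sample)
    return attacked


baseline = [Sample(minute=m, setpoint=24.0, sensor=24.0, control=0.4) for m in range(10)]

# Example: add a +2 °C bias to the sensor reading during minutes 3 to 5.
biased = inject(baseline, start=3, end=6, target="sensor", transform=lambda v: v + 2.0)
print([s.status for s in biased])
```

Passing a different `target` and `transform` (for example, a constant override of the setpoint or a scaling of the control signal) covers the other two attack surfaces with the same helper.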
The dataset can be used to facilitate validating attack detection and mitigation strategies, benchmarking the performance of different algorithms, and studying the impact of attacks on the HVAC system. As described in Table 7 , four code files are provided as supplementary materials for training machine learning-based detection models using the Isolation Forest algorithm [1] .
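For orientation, a minimal sketch of Isolation Forest based detection with scikit-learn is given below. It is not the supplementary code described in Table 7: the two features, the synthetic values, and the hyperparameters are assumptions chosen for illustration. The model is fitted on normal data only (analogous to Dataset log 1) and then applied to data containing attacks (analogous to Dataset log 3).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-ins for two recorded variables: [zone temperature, control signal].
train_normal = np.column_stack([rng.normal(24.0, 0.3, 2000), rng.normal(0.4, 0.05, 2000)])
test_normal = np.column_stack([rng.normal(24.0, 0.3, 500), rng.normal(0.4, 0.05, 500)])
test_attack = np.column_stack([rng.normal(28.0, 0.3, 100), rng.normal(0.9, 0.05, 100)])

# Fit on normal operation only; hyperparameters are illustrative defaults.
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(train_normal)


def predict_status(fitted_model, features):
    """Map scikit-learn's -1 (outlier) / +1 (inlier) to the dataset's 1 / 0 labels."""
    return (fitted_model.predict(features) == -1).astype(int)


attack_rate = predict_status(model, test_attack).mean()
false_alarm_rate = predict_status(model, test_normal).mean()
print(f"flagged attack samples: {attack_rate:.2f}, false alarms: {false_alarm_rate:.2f}")
```

With the real logs, the feature matrix would be built from the recorded sensor, control, and setpoint columns, and the predicted labels would be compared against the status column to compute detection metrics.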