Elsevier

Energy and Buildings

Volume 210, 1 March 2020, 109689

Unsupervised learning for fault detection and diagnosis of air handling units

https://doi.org/10.1016/j.enbuild.2019.109689

Highlights

  • This work applies an unsupervised learning technique, generative adversarial network (GAN), to deal with the imbalanced training dataset problem in data-driven AHU FDD.

  • Detailed steps of applying GAN to generate artificial faulty training samples are demonstrated and analyzed in depth.

  • A complete AHU FDD framework integrating extensions of GAN is proposed.

  • The existing GAN-based AHU FDD framework has been optimized.

  • Comprehensive experimental results are shown before and after applying GAN extensions to the traditional supervised FDD framework to illustrate the importance of GAN.

Abstract

Supervised learning techniques have achieved significant success in fault detection and diagnosis (FDD) for heating, ventilation and air-conditioning (HVAC) systems. Despite their good performance, these techniques rely heavily on balanced datasets that contain large numbers of both faulty and normal data points. In real-world scenarios, however, it is often very challenging to collect enough faulty training samples to build a balanced training dataset. In this paper, we introduce a framework that uses the generative adversarial network (GAN) to address the imbalanced data problem in FDD for air handling units (AHUs). To this end, we first show the procedures necessary for applying GAN to increase the number of faulty training samples in the training pool and re-balance the training dataset. The proposed framework then trains supervised classifiers on the re-balanced datasets. Finally, we present a comparative study that illustrates the advantages of the proposed method for FDD of AHUs under various evaluation metrics. Our work demonstrates the promising prospects of performing robust FDD of AHUs with a limited number of faulty training samples.

Introduction

Artificial intelligence (AI) techniques, including machine learning (ML) and deep learning (DL) methods, are important data-driven methods for fault detection and diagnosis (FDD) of engineering systems, such as the heating, ventilation and air-conditioning (HVAC) systems in buildings. Recent publications have shown the success of applying supervised learning techniques to detect and diagnose faults in crucial HVAC components, such as air handling units (AHUs). FDD classification accuracy above 93% has been reported for typical AHU faults [1], [2], [3], [4].

This high FDD classification accuracy depends on a well-shaped training dataset in the training phase [5]. Imbalanced training datasets invalidate most supervised learning based FDD systems [6], [7]. In real-world scenarios, historical data is collected through remote sensors, and the HVAC system operates under normal conditions most of the time. Faulty data samples are hard to collect for the following reasons:

  • The chance of any particular HVAC system becoming faulty is much lower than the chance of it working normally.

  • Any faulty HVAC system is usually fixed within a very short period before a sufficient amount of faulty data is collected.

The generative adversarial network (GAN), proposed by Goodfellow et al. in 2014 [8], is capable of generating synthetic faulty training samples that are ‘very close/similar’ to real-world faulty data samples, which can then be used to re-balance the original training dataset. As a result, the traditional supervised learning based FDD system is revised by adding GAN as a pre-processing step to re-balance the training dataset (Fig. 1). While GAN and its extensions are considered important solutions to the imbalanced training dataset problem of AHU FDD, several questions remain open in the literature:

  • What is the minimum number of real-world faulty data samples for the revised AHU FDD framework to achieve acceptable performance?

  • As an unsupervised learning method, how are GAN and its extensions applied to generate various types of faulty training samples?

  • How much improvement does the revised AHU FDD framework provide compared with traditional supervised learning FDD approaches?

In this study, the importance of unsupervised learning techniques in the existing typical supervised FDD framework is shown by comparing the FDD results before and after applying GAN to imbalanced training datasets that mimic real-world scenarios. A semi-supervised learning FDD framework is designed that combines GAN with traditional supervised learning based FDD methods to detect and diagnose typical AHU faults. Both fault detection and fault diagnosis accuracy rates are drastically improved after applying extended GAN to re-balance the original training dataset. The main contributions of the current study include:

  1. Detailed steps of applying GAN to generate synthetic faulty training samples. As an unsupervised learning technique, GAN is capable of generating synthetic faulty training samples with only a few real-world samples available. First, the available real-world faulty data samples are collected to form the initial dataset. Second, we illustrate the detailed steps of applying GAN to generate synthetic faulty samples from random noise. Last, we provide a quality check protocol to validate the quality of the synthetic samples.

  2. A complete FDD framework based on extended GAN methods. Extended GAN techniques are used to re-balance the training dataset in both the detection and the diagnosis processes. We extend the work in [9] by designing a more sophisticated framework that uses extended GAN methods to enhance the classification accuracy.

  3. Quality control protocol optimization. The quality control protocol proposed in [9] has been adopted and optimized to achieve better FDD performance. The original ensemble learning quality control protocol, which combined support vector machine (SVM), decision tree (DT) and random forest (RF), has been evaluated and replaced by a combination of SVM, RF and multi-layer perceptron (MLP).

  4. Comprehensive experimental results. The importance of applying GAN to the supervised FDD framework is shown by comparing the experimental results with and without the GAN process. With 5, 10, 20, 30 and 40 faulty training samples available for each fault type, both fault detection and fault diagnosis results are improved significantly.
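The re-balancing and quality check steps listed above can be pictured with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration only: the function names (`synthetic_needed`, `passes_quality_check`, `rebalance`), the use of a strict majority vote to combine the three classifiers, the choice to match the faulty count to the normal count, and the toy threshold rules that stand in for trained SVM, RF and MLP models. The candidate samples would in practice come from a trained GAN generator.

```python
from collections import Counter
from typing import Callable, List, Sequence

Sample = Sequence[float]
Classifier = Callable[[Sample], str]  # maps a sample to a predicted label


def synthetic_needed(n_normal: int, n_faulty_real: int) -> int:
    """Synthetic faulty samples needed to match the normal count (assumed target)."""
    return max(0, n_normal - n_faulty_real)


def passes_quality_check(sample: Sample, target_label: str,
                         classifiers: Sequence[Classifier]) -> bool:
    """Accept a sample if a strict majority of classifiers predict target_label."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes[target_label] > len(classifiers) / 2


def rebalance(real_faulty: Sequence[Sample], candidates: Sequence[Sample],
              target_label: str, n_normal: int,
              classifiers: Sequence[Classifier]) -> List[Sample]:
    """Return real faulty samples plus accepted synthetic ones, up to the target."""
    needed = synthetic_needed(n_normal, len(real_faulty))
    accepted = [s for s in candidates
                if passes_quality_check(s, target_label, classifiers)]
    return list(real_faulty) + accepted[:needed]


def make_threshold_classifier(threshold: float) -> Classifier:
    """Hypothetical stand-in for a trained SVM, RF or MLP."""
    return lambda sample: "fault" if sample[0] > threshold else "normal"


classifiers = [make_threshold_classifier(t) for t in (0.4, 0.5, 0.6)]
real_faulty = [[0.8]]
candidates = [[0.9], [0.55], [0.45], [0.1]]  # would come from the GAN generator
faulty_pool = rebalance(real_faulty, candidates, "fault",
                        n_normal=3, classifiers=classifiers)
print(len(faulty_pool))
```

In this toy run, two of the four candidates receive at least two of three "fault" votes and are added to the single real faulty sample, so the faulty pool reaches the assumed target of three samples.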

Section snippets

Related works

Automatic FDD methods in intelligent buildings can be roughly divided into two categories: model-based methods and data-driven methods [10]. Unlike the model-based approach, data-driven methods build models purely from historical sensor data, without any prior knowledge of the physical system. Such models are hard to interpret using mathematical formulas and are also known as black-box models. Fig. 2 shows a typical supervised learning data-driven FDD approach, where historical

Preliminary study

In this section, we introduce the necessary background of this study, which includes the original data description and the feature selection process.

Methodology

In this study, we improved the work in [17] by applying generative adversarial network (GAN) to generate synthetic training samples to re-balance the training dataset. Detailed procedures of applying GAN are shown in this section. Note that we assume the number of faulty training samples is much smaller than the number of normal training samples and is not adequate to support conventional supervised AHU FDD approaches.
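For orientation, the objective of the original GAN formulation by Goodfellow et al. (2014) is reproduced below. It is given as background only; the extended GAN variants used in this framework may optimize a different loss. Here $G$ is the generator, $D$ the discriminator, $x$ a real sample (in this setting, a real faulty data sample) and $z$ a random noise vector.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator is trained to tell real samples from generated ones, while the generator is trained to produce samples that the discriminator accepts as real.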

Performance evaluation metrics

Classification accuracy, precision, recall and F-score are four important performance evaluation metrics for HVAC FDD methods. Before we show the experimental results of the proposed framework, formal mathematical formulas of the four evaluation metrics are revisited in this subsection.

Referring to the confusion matrix shown in Table 3, the true positive (TP) value counts the number of samples that belong to the current class and are correctly classified as such. The true negative (TN) value refers
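As a brief illustration, the four metrics can be computed from confusion matrix counts using their standard textbook definitions. The sketch below is an assumption-laden example rather than the paper's own formulation: the names `ConfusionCounts`, `accuracy`, `precision`, `recall` and `f_score` are hypothetical, the counts in the usage lines are made up, and returning 0.0 for a zero denominator is a convention chosen here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 when the denominator is zero (a convention assumed here)."""
    return numerator / denominator if denominator else 0.0


def accuracy(c: ConfusionCounts) -> float:
    return _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    return _safe_div(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    return _safe_div(c.tp, c.tp + c.fn)


def f_score(c: ConfusionCounts, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    b2 = beta * beta
    return _safe_div((1 + b2) * p * r, b2 * p + r)


# Made-up counts for one fault class, for illustration only.
counts = ConfusionCounts(tp=30, fp=10, fn=10, tn=950)
print(accuracy(counts), precision(counts), recall(counts), f_score(counts))

# A classifier that labels every sample "normal" on the same data.
always_normal = ConfusionCounts(tp=0, fp=0, fn=40, tn=960)
print(accuracy(always_normal), recall(always_normal))
```

The second example shows why accuracy alone can mislead on imbalanced data: a classifier that never reports a fault still scores 0.96 accuracy here while its recall is 0.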

Conclusion, limitation and future works

This work demonstrates the importance of unsupervised learning techniques, more specifically generative adversarial networks (GANs), in the field of data-driven FDD for air handling units (AHUs). In real-world scenarios, there are always cases in which the number of faulty data samples is not enough for the training process. For example, a specific fault might be fixed within 40 min, so that no more than 40 faulty data samples are collected, along with thousands of normal

CRediT authorship contribution statement

Ke Yan: Conceptualization, Methodology, Writing - original draft, Formal analysis, Writing - review & editing. Jing Huang: Software, Data curation, Investigation. Wen Shen: Formal analysis, Writing - review & editing. Zhiwei Ji: Supervision, Validation, Software, Data curation, Investigation.

Declaration of Competing Interest

None.

Acknowledgment

This study is supported by faculty start-up funding from National University of Singapore under grant number R-296-000-208-133.
