Unsupervised learning for fault detection and diagnosis of air handling units
Introduction
Artificial intelligence (AI) techniques, including machine learning (ML) and deep learning (DL) methods, are important data-driven methods for fault detection and diagnosis (FDD) of engineering systems, such as heating, ventilation and air-conditioning (HVAC) systems in buildings. Recent publications have shown that supervised learning techniques can successfully detect and diagnose faults in crucial HVAC components, such as air handling units (AHUs). FDD classification accuracy above 93% has been reported for typical AHU faults [1], [2], [3], [4].
This high FDD classification accuracy depends on a well-shaped training dataset in the training phase [5]. Imbalanced training datasets invalidate most supervised learning based FDD systems [6], [7]. In real-world scenarios, historical data is collected through remote sensors, and the HVAC system operates under normal conditions most of the time. Faulty data samples are hard to collect for the following reasons:
- The chance of any particular HVAC system becoming faulty is much lower than the chance of it working normally.
- Any faulty HVAC system is usually fixed within a very short period, before a sufficient amount of faulty data is collected.
The generative adversarial network (GAN), proposed by Goodfellow et al. in 2014 [8], is capable of generating synthetic faulty training samples that closely resemble real-world faulty data samples, which can be used to re-balance the original training dataset. As a result, the traditional supervised learning based FDD system is revised by adding GAN as a pre-processing step to re-balance the training dataset (Fig. 1). While GAN and its extensions are considered important solutions to the imbalanced training dataset problem of AHU FDD, several questions remain open in the literature:
- What is the minimum number of real-world faulty data samples for the revised AHU FDD framework to achieve acceptable performance?
- As an unsupervised learning method, how are GAN and its extensions applied to generate various types of faulty training samples?
- How much improvement does the revised AHU FDD framework provide compared with traditional supervised learning FDD approaches?
In this study, the importance of unsupervised learning techniques in the existing typical supervised FDD framework is shown by comparing the FDD results before and after applying GAN to the imbalanced training datasets mimicking the real-world scenarios. A semi-supervised learning FDD framework is designed that combines GAN with traditional supervised learning based FDD methods to detect and diagnose typical air handling unit faults. Both fault detection and diagnosis accuracy rates are drastically improved after applying extended GAN to re-balance the original training dataset. The main contributions of the current study include:
1. Detailed steps of applying GAN to generate synthetic faulty training samples. As an unsupervised learning technique, GAN is capable of generating synthetic faulty training samples with only a few real-world samples available. First, the available real-world faulty data samples are collected to form the initial dataset. Second, we illustrate the detailed steps of applying GAN to generate synthetic faulty samples from random noise. Last, we provide a quality check protocol to validate the quality of the synthetic samples.
2. A complete FDD framework based on extended GAN methods. Extended GAN techniques are used to re-balance the training dataset in both the detection and the diagnosis processes. We complete the work in [9] by designing a more sophisticated framework using extended GAN methods to enhance the classification accuracy.
3. Quality control protocol optimization. The quality control protocol proposed in [9] is adopted and optimized to achieve better FDD performance. The original ensemble learning quality control protocol, which combines support vector machine (SVM), decision tree (DT) and random forest (RF), is evaluated and replaced by a combination of SVM, RF and multi-layer perceptron (MLP).
4. Comprehensive experimental results. The importance of applying GAN to the supervised FDD framework is shown by comparing the experimental results with and without the GAN process. With 5, 10, 20, 30 and 40 faulty training samples available for each fault type, both fault detection and fault diagnosis results are improved significantly.
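The ensemble quality check described in contribution 3 can be illustrated with a minimal sketch. It assumes one plausible reading of the protocol: a synthetic sample is kept only if a majority of the three classifiers assign it the intended fault label. The function name `filter_synthetic`, the `min_votes` parameter and the three threshold rules standing in for trained SVM, RF and MLP models are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]
Classifier = Callable[[Sample], str]


def filter_synthetic(
    classifiers: List[Classifier],
    candidates: List[Sample],
    intended_label: str,
    min_votes: int = 2,
) -> List[Sample]:
    """Keep candidates that at least `min_votes` classifiers label as `intended_label`."""
    kept = []
    for sample in candidates:
        votes = sum(1 for clf in classifiers if clf(sample) == intended_label)
        if votes >= min_votes:
            kept.append(sample)
    return kept


# Hypothetical stand-ins for classifiers trained on real data (assumption:
# in practice these would be fitted SVM, RF and MLP models).
def svm_stub(s: Sample) -> str:
    return "fault" if s[0] > 0.5 else "normal"


def rf_stub(s: Sample) -> str:
    return "fault" if s[1] > 0.5 else "normal"


def mlp_stub(s: Sample) -> str:
    return "fault" if s[0] + s[1] > 1.0 else "normal"


# Made-up two-feature synthetic candidates, all intended to be "fault" samples.
candidates = [(0.9, 0.8), (0.9, 0.0), (0.1, 0.2), (0.3, 0.9)]
ensemble = [svm_stub, rf_stub, mlp_stub]

kept = filter_synthetic(ensemble, candidates, "fault")
strict = filter_synthetic(ensemble, candidates, "fault", min_votes=3)
print(kept)
print(strict)
```

Raising `min_votes` to 3 requires unanimous agreement, which trades a smaller pool of synthetic samples for a stricter quality filter.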
Related works
Automatic FDD methods in intelligent buildings can be roughly divided into two categories: model-based methods and data-driven methods [10]. Unlike the model-based approach, data-driven methods build models purely from historical sensor data, without any prior knowledge of the physical system. The resulting models are hardly interpretable using mathematical formulas and are also known as black-box models. Fig. 2 shows a typical supervised learning data-driven FDD approach, where historical
Preliminary study
In this section, we introduce the necessary background of this study, which includes the original data description and the feature selection process.
Methodology
In this study, we improve on the work in [17] by applying a generative adversarial network (GAN) to generate synthetic training samples that re-balance the training dataset. The detailed procedures for applying the GAN are given in this section. Note that we assume the number of faulty training samples is much smaller than the number of normal training samples and is not adequate to support conventional supervised AHU FDD approaches.
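For reference, the original GAN formulation of Goodfellow et al. [8] trains a generator $G$ and a discriminator $D$ through a two-player minimax game. The objective below is the standard one from that formulation. The extended GAN variants used in this study may optimize a different objective, so it should be read as background rather than as the exact loss of the proposed framework.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

Here $x$ is a real faulty data sample, $z$ is a random noise vector drawn from a prior $p_z$, $G(z)$ is a synthetic sample, and $D(\cdot)$ is the discriminator's estimated probability that its input is real.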
Performance evaluation metrics
Classification accuracy, precision, recall and F-score are four important performance evaluation metrics for HVAC FDD methods. Before we show the experimental results of the proposed framework, formal mathematical formulas of the four evaluation metrics are revisited in this subsection.
Referring to the confusion matrix shown in Table 3, the true positive (TP) value counts the number of samples that belong to the current class and are correctly classified as such. The true negative (TN) value refers
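The four metrics can be computed directly from the confusion matrix counts. The following is a minimal sketch using the standard definitions, assuming the F-score is the balanced F1 variant (the harmonic mean of precision and recall). The function name `classification_metrics` and the example counts are illustrative and are not taken from the paper.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Made-up imbalanced example: 950 normal samples and 50 faulty samples,
# with "faulty" treated as the positive class.
m = classification_metrics(tp=40, fp=10, fn=10, tn=940)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the accuracy is 0.98 while precision and recall are both 0.8, which shows why accuracy alone can look optimistic on an imbalanced dataset.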
Conclusion, limitation and future works
This work demonstrates the importance of unsupervised learning techniques, specifically generative adversarial networks (GANs), in the field of data-driven FDD for air handling units (AHUs). In real-world scenarios, it is common that the number of faulty data samples is insufficient for the training process. For example, a specific fault might be fixed within 40 min, so that no more than 40 faulty data samples are collected, along with thousands of normal
CRediT authorship contribution statement
Ke Yan: Conceptualization, Methodology, Writing - original draft, Formal analysis, Writing - review & editing. Jing Huang: Software, Data curation, Investigation. Wen Shen: Formal analysis, Writing - review & editing. Zhiwei Ji: Supervision, Validation, Software, Data curation, Investigation.
Declaration of Competing Interest
None.
Acknowledgment
This study is supported by faculty start-up funding from National University of Singapore under grant number R-296-000-208-133.
References (31)
- et al., Sensor fault detection and its efficiency analysis in air handling unit using the combined neural networks, Energy Build. (2014)
- et al., Robust model-based fault diagnosis for air handling units, Energy Build. (2015)
- et al., A decision tree based data-driven diagnostic strategy for air handling units, Energy Build. (2016)
- et al., Diagnostic Bayesian networks for diagnosing air handling units faults – part I: faults in dampers, fans, filters and sensors, Appl. Therm. Eng. (2017)
- et al., Artificial intelligence-based fault detection and diagnosis methods for building energy systems: advantages, challenges and the future, Renewable Sustainable Energy Rev. (2019)
- et al., Development and implementation of automated fault detection and diagnostics for building systems: a review, Autom. Constr. (2019)
- et al., Chiller fault diagnosis with field sensors using the technology of imbalanced data, Appl. Therm. Eng. (2019)
- et al., ARX model based fault detection and diagnosis for chillers using support vector machines, Energy Build. (2014)
- et al., An intelligent chiller fault detection and diagnosis methodology using Bayesian belief network, Energy Build. (2013)
- et al., Fault detection and diagnosis of chillers using Bayesian network merged distance rejection and multi-source non-sensor information, Appl. Energy (2017)
- Early detection of faults in HVAC systems using an XGBoost model with a dynamic threshold, Energy Build.
- Semi-supervised learning for early detection and diagnosis of various air handling unit faults, Energy Build.
- Cost-sensitive and sequential feature selection for chiller fault detection and diagnosis, Int. J. Refrig.
- Effective data generation for imbalanced learning using conditional generative adversarial networks, Expert Systems with Applications
- An improved Elman neural network with piecewise weighted gradient for time series prediction, Neurocomputing