Review of model-based and data-driven approaches for leak detection and location in water distribution systems

Leak detection and location in water distribution systems (WDSs) is essential for reducing water loss, yet it remains a major challenge for water utility companies. To this end, researchers have proposed a multitude of methods to detect leaks in WDSs. Model-based and data-driven approaches, in particular, have found widespread use in this area. In this paper, we review both approaches and classify their techniques according to their leak detection methods. Model-based approaches require well-calibrated hydraulic models, and their accuracy is sensitive to modeling and measurement uncertainties. In contrast, data-driven approaches do not require an in-depth understanding of the WDS; however, they tend to produce high false positive rates. Furthermore, neither approach can handle anomalous variations caused by unexpected water demands.


INTRODUCTION
Detecting and locating leaks is crucial for reducing water loss in water distribution systems (WDSs) and realizing sustainable water usage. The global volume of non-revenue water (NRW) is estimated to be 126 billion cubic meters per year, and the value of the water lost amounts to USD 39 billion annually (Liemberger & Wyatt). In addition to causing water loss, leaks can contaminate water supplies and adversely affect human health (Kouchi et al.), and they may damage public infrastructure such as roads. Generally, the amount of water lost to a leak depends on the time elapsed between its occurrence and its detection by the water utility company (Bakker et al. a). Water leakages in WDSs are typically classified into reported, unreported, and background types (Lambert). Reported leakages are generally detected very quickly via complaint hotlines, which ensures their quick repair. However, unreported or background leakages may continue for much longer before they are discovered, which significantly increases the amount of water lost. Therefore, to reduce water loss, the leak duration must be kept as short as possible.
Leak detection methods may be broadly divided into active and passive approaches, based on the technique used for leak detection (Puust et al.). In a passive approach, leaks are detected via in-situ visual inspection or monitoring; in an active approach, leaks are detected by performing signal analyses on measurements such as acoustic signals, vibrations, pressure, and flow data. Because the passive approach cannot be used for continuous leak monitoring and typically involves long testing times, the active approach is much more commonly used for leak detection. Active approaches can be further divided into transient-based, model-based, and data-driven approaches. Transient-based leak detection is performed by measuring transient pressure signals (Colombo et al.).
The model-based approach typically uses mathematical equations to represent scenarios in a WDS (Almandoz et al.). In this approach, the location of a leak may be approximated by comparing measurements with WDS model estimations (Young & Tych). In a data-driven approach, leaks are detected by applying signal processing techniques or statistical analysis to the acquired data, which does not require a profound understanding of the WDS. [...] (Adedeji et al.), data-driven methods (Wu & Liu), and current and proposed intelligent methods (Chan et al.). More details on these articles can be found in Table 1. Colombo et al. have reviewed and classified transient-based methods. Therefore, in this paper, we discuss model-based and data-driven methods in detail.
In this review, we provide a detailed description of model-based and data-driven approaches for leak detection.
We will classify the techniques that fall under these categories according to their respective leak detection method and summarize the generic processes used by them. We will then compare a variety of techniques in terms of their performance, analyze their weaknesses, and suggest new directions for future research.

MODEL-BASED APPROACHES
In model-based leak detection techniques, a hydraulic model of the WDS is used to simulate its state of operation (Pérez et al.). Once the model is constructed, it must be calibrated to ensure that it accurately reflects the actual operating states of the WDS.
Model-based approaches involve four key steps: (1) construction of a hydraulic model, (2) calibration of the hydraulic model, (3) leak detection, and (4) leak localization. The first three steps are very similar in most model-based approaches, while the fourth step depends on the leak localization strategy; more specifically, it depends on the type of data being analyzed and the localization method. Figure 2 illustrates the generic framework of model-based approaches in the form of a generic algorithm that captures the essential features of many of these methods. Table 2 [...]

(2) Calibrating the hydraulic model
The parameters of a WDS hydraulic model include the nodal head, water consumption, pipe length, pipe diameter, and pipe roughness coefficients. Compared to parameters such as the nodal head and pipe length, the pipe diameter, pipe roughness, and water consumption at a demand node are the most uncertain input variables in the simulation model because they are typically not directly measurable. Therefore, they require calibration (Walski).
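To make the calibration step concrete, the following is a minimal sketch for a single pipe. It assumes, purely for illustration, that head loss follows the Hazen-Williams formula in SI units and that only the roughness coefficient (the C-factor) is unknown; the function names, the grid-search strategy, and all numerical values are hypothetical and are not taken from the reviewed studies. A real calibration would adjust many parameters at once across the whole network with a hydraulic solver.

```python
# Hypothetical single-pipe calibration sketch; names and numbers are illustrative only.

def hazen_williams_head_loss(flow_m3s, length_m, diameter_m, c_factor):
    """Head loss (m) along one pipe from the Hazen-Williams formula in SI units."""
    return 10.67 * length_m * flow_m3s ** 1.852 / (c_factor ** 1.852 * diameter_m ** 4.8704)


def calibrate_roughness(observations, length_m, diameter_m,
                        c_min=40.0, c_max=160.0, step=0.5):
    """Grid-search the C-factor that minimizes the squared head-loss error.

    observations: list of (flow_m3s, measured_head_loss_m) pairs.
    """
    best_c, best_err = None, float("inf")
    n_steps = int(round((c_max - c_min) / step))
    for k in range(n_steps + 1):
        c = c_min + k * step
        err = sum(
            (hazen_williams_head_loss(q, length_m, diameter_m, c) - h) ** 2
            for q, h in observations
        )
        if err < best_err:
            best_c, best_err = c, err
    return best_c


# Synthetic "field" data generated with an assumed true C-factor of 110.
TRUE_C = 110.0
flows = [0.01, 0.02, 0.03, 0.04]
observations = [(q, hazen_williams_head_loss(q, 500.0, 0.2, TRUE_C)) for q in flows]

# The search recovers the C-factor that was used to generate the data.
calibrated_c = calibrate_roughness(observations, 500.0, 0.2)
```

A grid search is used here only because it is easy to follow; with noisy field data and several unknown parameters, an optimization routine would normally replace it.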

(3) Leak detection
In an operational WDS, the pressure at node $i$ at time $t$ given by the SCADA system is $p_i(t)$, while the estimate given by the WDS hydraulic model is $\hat{p}_i(t)$. The residual pressure, $r_i(t)$, is then given by Equation (1):
$$r_i(t) = p_i(t) - \hat{p}_i(t), \quad i = 1, \ldots, n \tag{1}$$
where $n$ is the number of pressure monitoring points in the WDS. The $r_i(t)$ values are then compared to threshold values $\tau_i$.
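The residual check described above can be sketched as follows. This is an illustration only: it assumes that the residual is the measured pressure minus the modelled pressure and that its absolute value is compared with the threshold, neither of which is stated explicitly here. The function names and the pressure and threshold values are hypothetical.

```python
def pressure_residuals(measured, estimated):
    """Residual r_i(t) = p_i(t) - p_hat_i(t) for each monitoring point i."""
    return [p - p_hat for p, p_hat in zip(measured, estimated)]


def leak_warnings(measured, estimated, thresholds):
    """Return indices of monitoring points whose |residual| exceeds its threshold.

    Comparing the absolute residual with tau_i is an assumption of this sketch.
    """
    residuals = pressure_residuals(measured, estimated)
    return [i for i, (r, tau) in enumerate(zip(residuals, thresholds)) if abs(r) > tau]


# Illustrative pressures (m of head) at n = 4 monitoring points for one time step.
measured = [52.1, 47.8, 44.0, 50.3]    # p_i(t), e.g. from SCADA
estimated = [52.3, 48.0, 46.5, 50.4]   # p_hat_i(t), from the hydraulic model
thresholds = [1.0, 1.0, 1.0, 1.0]      # tau_i, assumed values

# Only the third monitoring point (index 2) deviates by more than its threshold.
flagged = leak_warnings(measured, estimated, thresholds)
```

In practice the thresholds would be set per sensor from the observed residual spread during leak-free operation, since a single fixed value rarely suits every monitoring point.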
If the residual exceeds the threshold, a leak warning is issued, and the leak is then located by some method (as per Step 4). The effects of measurement noise and model uncertainty [...]
As shown in Equation (2), one must consider the effects of modeling ($U_{\text{modelling}}$) and measurement ($U_{\text{measurement}}$) uncertainties when comparing a model prediction with a measurement. Equation (2) may be rearranged to obtain Equation (3) [...] Using this combined uncertainty ($U_{\text{combined}}$), we can compute the threshold bounds ($T_{\text{low}}$, $T_{\text{high}}$) by taking the 95% interval of the probability density function. These threshold bounds are then used in Equation (4) to determine the closeness of predictions to measurements at each location [...]
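The thresholding idea above can be sketched as follows. Several assumptions are made here that the text does not confirm: both uncertainties are modelled as independent zero-mean Gaussians, the combined uncertainty is taken as the measurement uncertainty minus the modelling uncertainty, the residual is the prediction minus the measurement, and a candidate leak scenario is rejected when the residual at any location falls outside the bounds. The function names and numbers are hypothetical.

```python
import random


def combined_uncertainty_bounds(sigma_model, sigma_meas,
                                n_samples=20000, coverage=0.95, seed=0):
    """Monte Carlo estimate of (T_low, T_high) for U_combined = U_meas - U_model.

    Both sources are assumed to be independent zero-mean Gaussians.
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.gauss(0.0, sigma_meas) - rng.gauss(0.0, sigma_model)
        for _ in range(n_samples)
    )
    tail = (1.0 - coverage) / 2.0
    low_idx = round(tail * n_samples)
    high_idx = round((1.0 - tail) * n_samples) - 1
    return samples[low_idx], samples[high_idx]


def is_falsified(predictions, measurements, t_low, t_high):
    """A candidate scenario is rejected if any residual leaves [t_low, t_high]."""
    return any(not (t_low <= g - y <= t_high)
               for g, y in zip(predictions, measurements))


# Assumed standard deviations (m of head) for the two uncertainty sources.
t_low, t_high = combined_uncertainty_bounds(sigma_model=0.4, sigma_meas=0.3)

measurements = [50.0, 47.5, 45.0]   # measured pressures at three sensors
scenario_a = [50.3, 47.1, 45.2]     # predictions for candidate leak scenario A
scenario_b = [50.2, 49.0, 45.1]     # predictions for candidate leak scenario B

a_rejected = is_falsified(scenario_a, measurements, t_low, t_high)
b_rejected = is_falsified(scenario_b, measurements, t_low, t_high)
```

Sampling is used instead of a closed-form interval so that the same code still applies if the Gaussian assumption is replaced by another distribution.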

DATA-DRIVEN APPROACHES
In data-driven approaches, leaks are detected via statistical and signal processing analyses of acquired data (Cody et al. a). This approach is especially promising for WDSs with a large number of sensors because it does not require the construction of complex WDS hydraulic models, which makes it insensitive to structural or operational complexities (Wu & Liu). Simply put, these techniques look for values in the data that deviate from the usual patterns recorded in the pipeline system and that may be caused by abnormal events such as leaks (Geelen et al.). Data-driven techniques typically utilize flow or pressure data, but they may also use end-user water demands for leak analysis. Figure 7 illustrates the generic framework of data-driven approaches, while Table 3 summarizes and compares a variety of these methods. In most cases, a data-driven technique is performed in two steps: (1) data acquisition, preprocessing, and transformation, and (2) application of a leak detection strategy.
(1) Data acquisition, preprocessing, and transformation
The primary objective of data preprocessing is to eliminate erroneous or missing data from the time series data to facilitate subsequent analyses. The acquired measurement data must be processed before they can be used for leak analysis, which ensures that they are suitable for the selected algorithm. Although issues pertaining to variability and uncertainty may arise during the processing of the measurement data, they may be overcome to an extent by preprocessing and transforming the data (Zaman et al.).

(2) Leak detection strategy
Data-driven techniques can be classified according to their leak detection strategies into feature set classification methods, prediction-classification methods, statistical methods, and unsupervised clustering methods.
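As an illustration of the preprocessing described in step (1), the sketch below fills missing samples and replaces isolated erroneous readings in a flow time series. The specific choices (linear interpolation for gaps, and a median absolute deviation rule for outliers) are assumptions made for this example and are not prescribed by the reviewed studies; the function names and data are hypothetical.

```python
from statistics import median


def fill_missing(series):
    """Linearly interpolate None gaps; gaps at the edges take the nearest valid value."""
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no valid samples")
    for i in range(len(values)):
        if i in known:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            values[i] = values[right]
        elif right is None:
            values[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            values[i] = values[left] + weight * (values[right] - values[left])
    return values


def replace_outliers(series, n_mads=3.0):
    """Replace samples farther than n_mads scaled MADs from the median with the median."""
    med = median(series)
    mad = median(abs(v - med) for v in series)
    scale = 1.4826 * mad  # scales the MAD to match a standard deviation for Gaussian data
    if scale == 0:
        return list(series)
    return [med if abs(v - med) > n_mads * scale else v for v in series]


# Illustrative flow readings (L/s) with two gaps and one spurious spike.
raw_flow = [10.0, 10.2, None, 10.6, 55.0, 10.4, None]
clean_flow = replace_outliers(fill_missing(raw_flow))
```

The outlier rule has to be tuned with care: a cutoff that is too tight would also remove the sustained deviations that a real leak produces, which is exactly the signal the later steps depend on.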

Feature set classification methods
Simple classification methods such as artificial neural networks (ANNs) can be used to distinguish leaks from normal data by constructing classifiers. However, these classifiers must be trained with large amounts of normal and anomalous (pipe burst or leak) data, which are often very difficult to obtain (Wu & Liu). A number of studies have been [...] given that the leakage events obtained from the model are irregular in nature, it is difficult to satisfy this assumption of non-stationarity.

PERFORMANCE ASSESSMENT
Assessment metrics
The performance of leak detection methods is usually assessed using four metrics: true positive rate (TPR), false positive rate (FPR), detection time (DT), and average topological distance (ATD). The first three metrics assess leak detection, while the last one assesses leak localization accuracy. Leak detection results are binary, as they can only be characterized as 'no leak' or 'leak.' For this reason, TPR and FPR are widely used to assess the performance of leak detection methods. As shown in Equation (5), TPR represents the percentage of detected leaks, while FPR represents the percentage of false alarms in 'no leak' cases.
DT is the time elapsed between the beginning of a leak and its detection, and the leak damage increases in proportion to DT. It is therefore an important metric for leak detection methods. It should be noted that leak simulations were not always performed to assess leak detection. In some studies, the records of water utility companies were used to calculate DT, while others used synthesized data to estimate the detection performance of the leak detection system.
Pressure measurements tend to be noisy, which limits their applicability in leak detection. As such, certain techniques, such as wavelet analysis, can be used to exclude signal noise and extract information from the raw data [...]
Therefore, we can attempt to combine different methods to improve the leak detection performance of data-driven methods.
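The detection metrics defined above can be computed as in the following sketch. It assumes the standard confusion-matrix definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN), and it takes DT as the time from leak onset to the first alarm raised at or after onset. The function names, labels, and timings are hypothetical, and ATD is omitted because it requires the network topology.

```python
def detection_rates(true_labels, predicted_labels):
    """Return (TPR, FPR) from binary labels, where 1 = 'leak' and 0 = 'no leak'."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


def detection_time(leak_start, alarm_times):
    """Time from leak onset to the first alarm at or after onset; None if undetected."""
    later = [t for t in alarm_times if t >= leak_start]
    return min(later) - leak_start if later else None


# Illustrative evaluation over ten scenarios: four with a leak, six without.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = detection_rates(true_labels, predicted_labels)

# Leak starts at minute 120; the alarm at minute 45 precedes it and is ignored.
dt_minutes = detection_time(120, [45, 150, 180])
```

Reporting TPR and FPR together matters because either one can be made to look good on its own, for example by raising an alarm in every scenario.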

CONCLUSION
In this paper, we have reviewed model-based and data-driven approaches for WDS leak detection and location.
We then classified the techniques that fall under these approaches according to their respective leak detection methods. Although these approaches are promising, they have not been well developed, and the current technology is far from ideal (Zaman et al.). Model-based approaches include sensitivity matrix-based approaches, mixed model-based/data-driven approaches, optimization-calibration approaches, and error-domain model falsification. Data-driven approaches, in turn, include feature set classification methods, prediction-classification methods, statistical methods, and unsupervised clustering methods. The generic processes of these methods were summarized and encapsulated in a generic algorithm.
It is seen that model-based approaches are capable of detecting and locating leaks but require calibrated hydraulic models and optimized sensor placement; their results are also highly sensitive to modeling and measurement errors.
For data-driven methods, a profound understanding of the WDS is not necessary as these methods only involve statistical or signal processing analyses of the acquired data.
However, they require large quantities of data and are also sensitive to data loss, anomalous sensor data, communication issues, and noise. Furthermore, fluctuations in water demand also affect their leak detection performances.
A data-driven approach is more appropriate when a large amount of historical data can be obtained from a real network. However, when data are scarce and a hydraulic model of the network is easy to obtain, model-based methods are preferred.
Model-based and data-driven approaches each have their own strengths and weaknesses. Some researchers have tried to combine two or more of these approaches to improve leak detection performance; these combined methods are called 'hybrid leak detection techniques'.