
Water distribution pipe lifespans: Predicting when to repair the pipes in municipal water distribution networks using machine learning techniques

  • Nacer Farajzadeh,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    n.farajzadeh@azaruniv.ac.ir

    Affiliations Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran, Artificial Intelligence and Machine Learning Research Laboratory, Azarbaijan Shahid Madani University, Tabriz, Iran

  • Nima Sadeghzadeh,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran, Artificial Intelligence and Machine Learning Research Laboratory, Azarbaijan Shahid Madani University, Tabriz, Iran

  • Nastaran Jokar

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran, Artificial Intelligence and Machine Learning Research Laboratory, Azarbaijan Shahid Madani University, Tabriz, Iran

Abstract

Water is essential to all living species, yet aging pipes waste it in large amounts through two direct mechanisms: pipe leakages and pipe bursts. Consequently, detecting aged pipes in water distribution networks has long been a central challenge, which makes water pipe monitoring an important duty of municipalities. Traditionally, leakages and bursts were detected only visually or through local reports, after which municipalities replaced the old pipes. Although this fixes the issue, a more desirable approach is to inform officials of likely problems in advance by predicting which pipes are aged, so that the wastage can be prevented. Therefore, to automate the detection process, in this study we take the initial steps toward predicting the pipes needing repair in a particular area using machine learning methods. We first obtain a private dataset, provided by the municipality of Saveh, Iran, that outlines pipes that were damaged previously. We then train three machine learning algorithms to predict whether a set of pipes in an area is prone to damage. To achieve this, One-Class (OC) classification methods, namely OC-SVM, Isolation Forest, and Elliptic Envelope, are used; the highest accuracy achieved is 0.909. This study is of value because it requires no additional devices (i.e., sensors).

1. Introduction

Water is one of the most critical substances on which the life of every living species depends. Therefore, the water shortage crisis is among the top issues that concern humankind. Nowadays, due to global warming, underground water levels are decreasing and natural glaciers are melting. Hence, it is predicted that Earth will experience drought in the future [1], and people should be more vigilant about water usage. To alleviate the issue, municipalities have considered several approaches such as water purification (for reuse), cloud fertilization technology (to increase annual rainfall), and cultivation of optimal water use (to reduce waste) [2, 3]. Nonetheless, pipe leakages and pipe bursts are two issues that cannot be naturally controlled and may expedite the water shortage crisis. They are mainly a result of old pipes, which are prone to bursts or cracks that cause water to be wasted.

Traditionally, leakages and bursts were detected only visually or through local reports, both of which are still prevalent in less developed areas. Specialists have since introduced technological means to accelerate the reports. A prominent example is monitoring leakage using pipe pressure gauges to locate old pipes [1, 4]. However, all of the current methods detect leakages or bursts only after a great amount of water has been wasted, not before. Therefore, the current technology is not sufficient to prevent the water shortage crisis [5].

Additionally, not every leakage is detectable. For example, the damaged part of a pipe might be so small that it is neither visible to people nor does it produce a pressure difference significant enough to be detected in the monitoring rooms. The leaking water in such cases goes unnoticed and, over time, may damage infrastructure, e.g., by undermining buildings or forming sinkholes [6, 7]. Hence, it is important to locate old pipes to prevent this.

To tackle this, the predictive aspects of computers, especially those of Artificial Intelligence (AI) and Machine Learning (ML), could be utilized to see which areas in a city are more prone to experience water leakage. Once such areas are outlined, municipalities can start preventive operations to decrease the chance of leakage. Such operations include changing old pipes, covering rotten part(s) of pipes, re-routing distribution network, etc. Clearly, if leakage times are minimized, water can be even more available in less developed cities as well as dry areas of a country [8, 9].

AI and ML are versatile and have been applied to a wide range of problems [4, 10–13]. They can learn latent relations in data to provide the desired outcome for a given task [14]. In the present task, it is expected that a model will be able to map features related to pipes (e.g., age, pressure tolerance, etc.; see Section 3.1 for other examples) to a probability of leaking/bursting. Inspired by these, we propose an AI-based method that learns the features of pipes and predicts whether pipes in a specific area are prone to burst or leak. The proposed method achieves an accuracy of 90.9%. This study is of importance since no previous study has used such a method (see Section 2 for a review) to eliminate the need for sensors or district metering for leakage/burst monitoring.

Monitoring pipe health with tabular data can eliminate the need for sensors by leveraging existing data sources such as maintenance records, inspection reports, and historical performance data. By analyzing this tabular data, patterns and trends can be identified to assess the health of pipes. For example, monitoring parameters like pressure, flow rate, temperature, and corrosion levels over time can provide insights into potential issues or deterioration. This approach eliminates the cost and complexity associated with installing and maintaining physical sensors while still enabling proactive maintenance and minimizing the risk of pipe failures. From a more scientific perspective, the primary objective of this paper is to predict the lifespan of specific sets of pipes. However, given that bursts and leakages are direct noticeable consequences of aging pipes, they receive greater emphasis in this paper compared to the broader concept of overall lifespan. Another consequence includes internal pipe corrosion. When a metallic pipe corrodes, the inner coating can detach and cycle through the flow, potentially causing issues such as reduced flow rates, blockages, or contamination of the water being transported.

The remainder of the paper is organized as follows: Section 2 provides a review of related work. Section 3 describes the proposed method in detail. Section 4 provides the results of the experiments. Finally, Section 5 concludes the paper and provides suggestions for future work.

2. Related works

Reasons for pipe bursting or leaking could be generally classified into seven categories [6, 15]:

  1. Pipe wear and/or age is the top cause of the problem. When a pipe is old enough for its wall to become thin, its resistance to water pressure decreases and it may burst at any time.
  2. Non-genuine parts, mostly in joints, are sometimes used in water distribution networks to reduce costs. Such parts may not be able to hold the water pressure and may burst.
  3. Passage of vehicles may impose additional pressure on the pipes, especially where the pipes are buried near the ground surface, which ultimately causes them to burst. This can also be due to the resonance caused by vehicles passing over a set of pipes.
  4. Sand or objects may enter pipes during maintenance and block water flow. On such occasions, pipes may not resist the water pressure and eventually burst.
  5. Pipe diameters of joined sections may sometimes mismatch due to miscalculation. This can cause the smaller pipes to burst due to the high volume of water passing through.
  6. Pipe material/type, if not chosen according to the environment, may contribute to the problem. Some types of pipes, such as copper, are thin and lose heat rapidly, thus reaching freezing point. Therefore, if used in cold areas, they may not resist low temperatures when subjected to high water pressure, and consequently crack.
  7. Land cover may protect pipes in environments that would otherwise affect them directly. For instance, pipes installed beneath asphalt are considered safer than those buried in the soil. This is because the asphalt layer provides a protective cover that shields the pipes from external objects, reducing the likelihood of damage or disruption. As a result, the pipes are less susceptible to impact from external factors.

Hence, municipalities must consider at least seven factors when managing a water distribution network, each of which presents its own challenges. Furthermore, not all factors can be controlled at all times. For instance, when branching a pipeline, municipalities must increase water pressure to ensure uniform pressure throughout each branch. However, this may require an increase in pipe diameter which is not always feasible. If the network is large enough to require a diameter increase throughout the entire line, this task becomes nearly impossible as municipalities cannot overhaul an entire network [16].

To accommodate such challenges, it becomes necessary to increase the number of pipes branching from a location closer to the central water distribution unit [17]. As a result, municipalities must be even more vigilant about maintaining these additional pipes as their numbers increase. This task is clearly difficult without an assistive automated tool. Therefore, academia has recently proposed automated methods for monitoring and identifying pipes that are susceptible to such incidents.

Moulik et al. [18] presented a water leakage and blockage detection system specifically designed for hilly regions. Their proposed system focused on analyzing the variations in vibrations caused by different water levels within pipes. To capture those vibrations, wireless sensors were utilized on PVC pipes during water flow. Machine learning algorithms were then applied to those vibration records to accurately identify any disruptions in the regular water flow, indicating potential leaks or blockages. Asghari et al. [3] presented an ML-based framework for detecting leaks in pipes using transient waves. At that time, Transient-Based Leak Detection (TBLD) was treated as an optimization problem that was typically solved using inefficient Metaheuristic Optimization Algorithms (MOAs). To address that drawback, they proposed an efficient ML approach that involved training an ensemble of CatBoost models on a dataset of over 3.8 million records to classify leaky sections and predict leak sizes. They achieved a 97% accuracy in leak detection and an F1-score of 0.86, indicating a significant improvement over MOAs.

In 2018, Van der Walt et al. [4] explored the pressure flow deviation method using three solution strategies to determine their suitability for specific networks. Their study combined three strategies (Bayesian probabilistic analysis, support vector machine, and artificial neural network) with the inverse analysis technique on two numerical networks and one experimental network to identify the limitations of each strategy. The findings indicated that the Bayesian probabilistic analysis struggled to find unique solutions when only a few observations were available, while the support vector machine and artificial neural network faced challenges when only flow measurements were available. Furthermore, the artificial neural network encountered difficulties in estimating unique solutions for leak size and location. No strategy consistently outperformed the others, which raised the question of whether the strategies should be combined to obtain better predictions. The leak detection technique could also be combined with online monitoring systems, allowing for quick and accurate detection of leaks. Although their technique required further investigation to estimate accurate leak sizes and exact locations, it accurately found leaks in networks and identified the pipes they were on. Additionally, their work showed that more research was needed on model calibration techniques to help with the detection of leaks, their sizes, and their locations.

In 2019, Zhou et al. [1] saw a lack of studies on the accurate localization of a burst within a potential district by accessible meters. To address this, they proposed a novel Burst Location Identification Framework by Fully Linear DenseNet (BLIFF). In this framework, additional pressure meters were placed at limited, optimized places for a short period (minutes to hours) to monitor system behavior after the burst. The fully linear DenseNet (FL-DenseNet) developed in their study modifies the state-of-the-art deep learning algorithm to effectively extract features in the limited pressure signals for accurate burst localizations. BLIFF was tested on a benchmark network with different parameter settings, which showed that accurate burst localization results can be achieved even with high model uncertainties. The framework was also applied to a real-life network, in which 57 of the total 58 synthetic bursts in the potential burst district were correctly located when the top five most possible pipes were considered, and among them, 37 were successfully located when considering only the top one. Only one failed because of the very small pipe diameter and remote location. Comparisons with DenseNet and the traditional fully linear neural network demonstrated that the framework could effectively narrow the potential burst district to one or several pipes with good robustness and applicability.

In 2020, Wang et al. [2] presented a data-driven method for burst detection that consisted of three stages: prediction, classification, and correction. During the prediction stage, the accuracy of flow prediction was improved. The classification stage utilized multiple thresholds to ensure the method was robust to time variation. An outlier feedback correction stage allowed consecutive detection of outliers. In simulated experiments, their proposed method triggered burst alarms with 99.80% detection accuracy (DA), 85.71% true-positive rate (TPR), and 0.14% false-positive rate (FPR). In synthetic experiments over a 10-minute detection time in a real-life district metered area (DMA), the proposed method achieved 99.77% DA, 94.82% TPR, and 0.21% FPR. The identifiable minimum burst rate was as low as 2.79% of the average DMA inflow.

Understanding the complex factors contributing to leakage is crucial in efforts to reduce it. Therefore, Hayslep et al. in 2023 [19] focused on examining the relationship between leakage and static characteristics of district metered areas (DMAs), without considering pressure or flow. The characteristics analyzed include the number of pipes and connections, total DMA volume, network density, pipe diameter, length, age, and material statistics. Accurately quantifying leakage, particularly background, and unreported leakage, can be challenging. To address this, the Average Weekly Minimum Night Flow (AWM) over the past five years was used as a proxy for leakage. While some legitimate demand might have been included in this measure, it was generally assumed that minimum night flow strongly correlates with leakage. Their study conducted a data-driven case study using data from over 800 real DMAs in UK networks. Two regression models–a decision tree model and an elastic net linear regression model–were developed to predict AWM for unseen DMAs. Despite not including pressure as a feature, these models achieved reasonable accuracy.

Mazaev et al. [20] proposed a hybrid approach for localizing leaks in water distribution networks (WDNs) by combining model-based and data-driven modeling techniques. Their approach involved simulating pressure heads of leak scenarios using a hydraulic model, which was then used to train an ML-based leak localization model. A crucial aspect of their methodology was the incorporation of dynamically calculated bias correction to account for discrepancies between simulated and measured pressures, based on historical pressure measurements. To evaluate the effectiveness of their approach, they conducted in-field leak experiments in operational WDN and collected realistic test data. The results demonstrated that their leak localization model successfully reduced the search region for leaks in parts of the network where detectable drops in pressure occurred. Even in cases where such drops were not observed, their model still managed to localize the leak but with a higher level of uncertainty regarding its predictions.

To diagnose leakage of the main gas extraction pipeline in coal mines, a pipeline leakage diagnosis method based on Simulated Annealing (SA) and Particle Swarm Optimization (PSO) collaborative optimization Back Propagation Neural Network (BPNN) was proposed by Zhou et al. [21]. The reliability of the SA-PSO BPNN leakage diagnosis model was verified by establishing a mapping relationship between the location of the leakage point and monitoring values. The results indicate that as the diameter of the leakage increases at a given location, there is a greater rate of change in flow and pressure at each monitoring point. When leakage occurs, there is a significant decrease in flow and pressure near the leakage point, but these parameters change only slightly further away from the leakage point. Comparing the accuracy of leakage identification among different models (BP, SABP, PSOBP, and SA-PSO BPNN) at various leakage points, it was found that the SA-PSO BPNN model had higher accuracy. The Area Under Curve (AUC) values ranged from 0.614 to 0.940 under different leakage conditions, with test accuracies of 79.61%, 87.21%, and 92.25% respectively when inputting 2, 3, and 5 sets of monitoring point parameters. The diagnostic accuracy of the SA-PSO BPNN model was determined to be 93.33% through verification samples. Overall, the SA-PSO BPNN diagnosis model provides theoretical guidance for achieving timely and accurate detection of leaks in coal mine gas extraction pipelines.

Polyethylene pressure pipes are designed to last for at least 50 years. Traditionally, the lifespan of these pipes has been determined through pipe pressure tests. However, with advancements in slow crack growth (SCG) resistance and longer testing times, these tests are no longer suitable for modern pipe grades. Hence, Frank et al. [22] introduced a method for predicting the lifetime of those pipes, that combined the practical benefits of the cyclic cracked round bar (CRB) test and linear elastic fracture mechanics. One major advantage of their approach was that material characterization could be done at ambient temperatures. Using the stress intensity factor concept and taking into account realistic considerations such as initial defect size and changing crack front geometry, the predicted lifetimes of four different PE pipe grades are calculated. The results demonstrated that all materials will meet the minimum required lifetime of 50 years, and under practical assumptions, even exceed 100 years.

Buried cast iron pipelines are prone to corrosion due to the underground soil environment, leading to damage accumulation and eventual fracture failure. Therefore, Ji et al. [23] proposed a probabilistic method for quantitatively assessing the time-dependent reliability of corroded cast iron pipes. A Gamma-based corrosion process was derived based on established corrosion models. The first-order reliability method and Monte Carlo simulation were utilized for cross-validation and time-dependent reliability analysis using a fracture failure criterion. Additionally, uncertain physical parameters of the pipes were updated using the Bayesian Markov Chain Monte Carlo (MCMC) algorithm, incorporating regional historical data of failed pipes. This allowed for lifetime predictions of buried cast iron pipelines.

Wang et al. [24] aimed to propose a model for predicting the lifetime of aging natural gas polyethylene (PE) pipelines with different internal pressures. This was achieved through thermal-oxidative aging (TOA) tests and oxidative induction time (OIT) tests. To simulate real-world conditions, a pressured natural gas PE80 pipe was used in an improved TOA experimental setup. Accelerated TOA tests were conducted on these pressured PE pipes at various temperatures. The results showed that under internal pressures of 0.1MPa, 0.2MPa, 0.3MPa, and 0.4MPa, the lifetimes of aging PE gas pipes were found to be 14%, 24.5%, 36.1%, and 41.6% shorter compared to those without internal pressure, respectively.

Remarks: The recent methods are based either on district metering or on external sensor installation. Therefore, they may introduce additional challenges to municipalities, such as the requirement to buy, install, maintain, and upgrade sensors for all of the pipes. However, a system that relies solely on tabular features of pipes (e.g., pipe age, passing pressure, etc.) can learn burst probability and eliminate the need for physical monitoring devices. Therefore, inspired by these works, this study proposes data mining and machine learning approaches to predict pipe burst probability solely from pipe features [6, 25, 26].

3. Proposed method

The novelty of this paper lies mainly in the dataset and in the models used, which are trained without negative samples. Accordingly, this section describes the dataset and the three one-class (OC) classifiers.

3.1. Dataset

To conduct our research, we requested tabular data from the Saveh municipality in Iran regarding the pipes located throughout the city. The dataset contained a total of 8048 rows, all of which were associated with burst pipes. To ensure accuracy and reliability, we removed any rows with missing or duplicate values, resulting in a final dataset of 1467 rows. The features and their number of categories are listed in Table 1, while Table 2 gives the categories and feature frequencies in the pruned dataset. In the dataset, the “date” column indicates the date on which the incident occurred, while the “address” column shows the exact address where it occurred. The “district” column converts long addresses to a nominal value for convenience. The “branch” feature indicates the branch of the distribution network on which the incident occurred. The “size” column indicates the size of a specific pipe, whereas “type” shows the type of the pipe. The “incident reason” column classifies a set of pre-defined reasons into one class. The “contributing factor” column indicates the most effective contributing factor to the event. The “land cover” column indicates the covering of the land in a specific area.

As could be seen in the former table, only one factor in each bin/category is the most effective contributing factor for bursts. For instance, when analyzing bursts by "year," it was found that the majority of bursts occurred in 2020. Similarly, when examining bursts by "district," it was observed that pipes in districts one and five experienced more bursts than those in other districts. It is crucial to analyze this information as it allows municipalities to identify patterns, such as the fact that pipes with a diameter of 20 in district one are more susceptible to bursting. This enables officials to proactively maintain these pipes before any incidents occur. Nevertheless, analyzing a vast list of pipes can be challenging and time-consuming.
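The kind of frequency analysis described above can be sketched in a few lines. The rows, column names, and values below are hypothetical placeholders, not records from the Saveh dataset; the sketch only shows how counting incidents per category, or per combination of categories, surfaces the most burst-prone groups.

```python
from collections import Counter

# Hypothetical incident rows; the real dataset is private.
incidents = [
    {"district": 1, "size": 20, "year": 2020},
    {"district": 1, "size": 20, "year": 2020},
    {"district": 1, "size": 20, "year": 2019},
    {"district": 5, "size": 25, "year": 2020},
    {"district": 5, "size": 20, "year": 2020},
    {"district": 3, "size": 25, "year": 2018},
]

def most_common_value(rows, *columns):
    """Return the most frequent value combination for the given columns and its count."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return counts.most_common(1)[0]

top_year = most_common_value(incidents, "year")
top_district_size = most_common_value(incidents, "district", "size")
print(top_year, top_district_size)
```

Counting over pairs of columns is what reveals combined patterns such as a particular pipe size within a particular district, which single-column counts would miss.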

3.2. Classifying algorithms

Since the data in the dataset are not normally distributed, the learning algorithms should not assume that the data are drawn from a normal distribution [27]. Hence, we used three prominent algorithms in this regard: OC-SVM, Isolation Forest, and Elliptic Envelope [28], all of which are described below. The features used for testing and validating the methods are the same as those used for training.

OC-SVM is a type of machine learning algorithm used for anomaly detection, where the goal is to identify data points that are significantly different from the majority of the data [27]. In OC-SVM, there is only one class of data available for training, which means that the algorithm learns to identify outliers based on the characteristics of the majority class. The algorithm creates a hyperplane that separates the majority class from the outliers, and any new data point that falls on the other side of this hyperplane is considered an anomaly.

The algorithm starts by mapping the input data into a higher-dimensional feature space using a kernel function. Then, it tries to find the optimal hyperplane that separates the mapped data points from the origin while maximizing the margin around the normal data points. This hyperplane acts as a decision boundary, and any new data point falling on the positive side of this hyperplane is considered normal, while those falling on the negative side are classified as anomalies.

The training process involves solving an optimization problem to find the support vectors, which are the closest data points to the decision boundary. These support vectors define the shape and position of the decision boundary. During testing or prediction, new data points are projected into the same feature space using the same kernel function and then classified based on their position relative to the decision boundary. In summary, OC-SVM works by creating a decision boundary around normal data points in a higher-dimensional feature space and classifying any new data point falling outside this boundary as an anomaly.
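For reference, the description above corresponds to the commonly used ν-formulation of the one-class SVM, written out below. The paper does not state which formulation or kernel was used, so the symbols here (feature map Φ, kernel k, offset ρ, slack variables ξ_i, dual coefficients α_i, and the parameter ν that bounds the fraction of training points treated as outliers) are assumptions taken from that standard formulation.

```latex
\begin{aligned}
\min_{w,\,\xi,\,\rho}\quad & \frac{1}{2}\lVert w\rVert^{2}
      + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \\
\text{s.t.}\quad & \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,
      \qquad \xi_i \ge 0,\quad i = 1,\dots,n, \\[4pt]
f(x) \;=\; & \operatorname{sgn}\bigl(\langle w, \Phi(x)\rangle - \rho\bigr)
      \;=\; \operatorname{sgn}\Bigl(\sum_{i=1}^{n}\alpha_i\,k(x_i, x) - \rho\Bigr).
\end{aligned}
```

Under this formulation, f(x) = +1 marks a point as normal and f(x) = -1 marks it as an anomaly; only the support vectors have nonzero α_i, so they alone determine the boundary.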

Isolation Forest is mostly used as an anomaly detection method and is based on decision trees [29]. This algorithm has five steps:

  1. Random Selection: The algorithm employs a random selection process to choose a feature and a split value within the range of that particular feature.
  2. Splitting: The data is divided into two new branches within the tree based on the selected feature and split value, thereby facilitating further analysis.
  3. Recursive Splitting: The recursive nature of the algorithm involves repeating the process of randomly selecting a feature and splitting value for each branch until a predetermined maximum tree depth is reached or all data points are individually isolated in their own leaf nodes.
  4. Anomaly Score Calculation: Anomaly scores are computed for each data point by considering the average path length required to isolate that specific point across all trees in the forest. Points with shorter average path lengths are deemed more anomalous.
  5. Anomaly Detection: Data points possessing anomaly scores surpassing a certain threshold are classified as anomalies, while those falling below the threshold are considered normal instances.

The Isolation Forest technique capitalizes on the observation that anomalies tend to be infrequent and distinctive, rendering them comparatively easier to isolate when contrasted with normal instances characterized by similar patterns. By rapidly isolating anomalies through relatively few splits, this method effectively identifies outliers or anomalies present within datasets.
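The five steps above can be sketched in plain Python. This is a minimal illustration on synthetic two-dimensional points, not the implementation used in the study; the function names, the number of trees, the depth limit, and the 0.7 threshold are all assumptions chosen for the example.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    # Stop once a point is isolated or the depth limit is reached (step 3).
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))            # step 1: random feature
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)                        # step 1: random split value
    left = [p for p in points if p[feature] < split]   # step 2: split into branches
    right = [p for p in points if p[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(point, node, depth=0):
    # Leaves that still hold several points get the expected extra depth added.
    if "feature" not in node:
        return depth + c_factor(node["size"])
    child = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[child], depth + 1)

def anomaly_scores(data, n_trees=100, seed=0):
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(max(len(data), 2)))
    trees = [build_tree(data, 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(len(data))
    # Step 4: shorter average path length gives a score closer to 1.
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in data]

rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
data.append((10.0, 10.0))                  # one obvious synthetic outlier
scores = anomaly_scores(data)
threshold = 0.7                            # step 5: hypothetical cut-off
flagged = [i for i, s in enumerate(scores) if s > threshold]
print(flagged)
```

The far-away point is usually separated from the cluster within one or two random splits, which is why its score is the highest in the set.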

Elliptic Envelope is a machine learning algorithm commonly used for anomaly (outlier) detection [30]. It assumes that the majority of data points are generated from a known distribution, typically an elliptical one. The algorithm’s objective is to enclose the normal observations while excluding outliers by fitting an elliptical envelope around the data points. To achieve this, the algorithm estimates the mean and covariance of the data points and calculates a Mahalanobis distance for each point. The Mahalanobis distance measures how many standard deviations away each point is from the distribution center. Using a user-defined contamination parameter, which represents the expected proportion of outliers in the data, a decision function is applied to classify each point as either an inlier or an outlier. Points with a high Mahalanobis distance are considered outliers. By effectively fitting an elliptical envelope around the normal observations, this algorithm can accurately identify outliers that significantly deviate from the expected pattern. The Elliptic Envelope algorithm is particularly useful for detecting outliers in high-dimensional datasets where other methods may not be effective.
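The Mahalanobis-distance rule described above can be illustrated with a small sketch for two-dimensional data. It is an assumption-laden simplification: it uses the plain sample mean and covariance, whereas library implementations such as scikit-learn's EllipticEnvelope rely on a robust covariance estimate, and the data, function names, and contamination value are made up for the example.

```python
import math
import random

def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse of the 2x2 covariance matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        dists.append(math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy))
    return dists

def flag_outliers(points, contamination=0.05):
    """Flag the given fraction of points with the largest distances as outliers."""
    dists = mahalanobis_2d(points)
    n_out = max(1, round(contamination * len(points)))
    cutoff = sorted(dists, reverse=True)[n_out - 1]
    return [d >= cutoff for d in dists]

rng = random.Random(7)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(99)]
points.append((8.0, -8.0))                 # synthetic outlier
dists = mahalanobis_2d(points)
flags = flag_outliers(points, contamination=0.05)
print(sum(flags), flags[-1])
```

Because the contamination parameter fixes the share of points that will be flagged, choosing it too high labels ordinary points as outliers, and choosing it too low lets real outliers through.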

4. Experiments

This study is unique in its approach to addressing the problem at hand, as no previous research has utilized a similar methodology. The primary objective of this study is to investigate the effectiveness of a set of features in predicting water pipe bursts within a specific geographical area, using the only available dataset. To achieve this goal, the dataset is subjected to several preprocessing steps, including pruning, extension, and augmentation. Subsequently, three different algorithms are trained on the processed data to determine which one yields the most accurate results in predicting water pipe bursts. By employing this innovative approach, this study aims to contribute valuable insights into the field of water management and infrastructure maintenance, as well as to cast light on future works’ paths.

4.1. Dataset pruning, extending, and augmentation

As previously mentioned, the original dataset consisted of 8048 rows, but rows with missing values and duplicate rows were removed, reducing the dataset to 1467 rows. Since the “address” column is a long string and does not carry contributing information about the incident, the “district” column is used instead. The latter summarizes the former and, since it is a nominal value, the learning algorithm will not struggle to extract features from it for training purposes. An additional “season” column is computed from “date” and appended to the data. A “healthy” class is added to the dataset and filled manually in order to obtain a balanced and uniform dataset. Specifically, 200 records are manually added to the healthy pipe class, which also lets us over-sample the dataset via the SMOTE [31] algorithm and later use two-class classifiers. The algorithm augments the minority class; after augmentation, both classes are equal in size (1467 rows each), for a total of 2934 rows.
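The over-sampling step can be sketched as follows. This is a simplified SMOTE-style interpolation written from scratch, assuming purely numeric features; the study's features are largely nominal, so the actual encoding and SMOTE variant used are not known from the text. The function name, the value of k, and the synthetic "healthy" points are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

rng = random.Random(1)
# 200 hypothetical "healthy" records with two numeric features each.
healthy = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
n_burst = 1467                              # burst rows reported in the text
new_healthy = smote_like(healthy, n_burst - len(healthy))
balanced_total = n_burst + len(healthy) + len(new_healthy)
print(len(new_healthy), balanced_total)
```

With 200 manual records, 1267 synthetic ones are needed to match the 1467 burst rows, which gives the total of 2934 stated above.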

4.2. Dataset split

Seventy percent of the dataset (2054 rows) is devoted to training, 20% (586 rows) to validation, and the remaining 10% (294 rows) to testing. In addition, 5-fold cross-validation is used to check that the algorithms are neither under- nor over-fitted. Every value reported in the results is therefore the average over the five folds of the cross-validation.
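The split sizes and the fold averaging can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the rounding rule is an assumption chosen because it reproduces the reported counts, the helper names are hypothetical, and since the text does not specify how the fixed split and the cross-validation are combined, the two are shown separately.

```python
import random


def split_sizes(n_rows, train_frac=0.7, val_frac=0.2):
    """Row counts for train/validation/test (rounding rule is an assumption)."""
    n_train = round(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    n_test = n_rows - n_train - n_val
    return n_train, n_val, n_test


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, held_out_idx) pairs; each row is held out exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def mean_over_folds(scores):
    """Average a per-fold metric, as done for the reported values."""
    return sum(scores) / len(scores)


sizes = split_sizes(2934)  # (2054, 586, 294), matching the reported counts
fold_sizes = [len(held_out) for _, held_out in k_fold_indices(2934, k=5)]
```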

4.3. Results

The performance of the models is presented in Tables 3 through 7. In the first experiment, every feature in the original dataset is used to train the models; the results are presented in Table 3. In this experiment, the Elliptic Envelope achieved the best performance on every metric except AUC, for which OC-SVM scored highest. It should be noted that the Accuracy and AUC metrics require samples from at least two classes to be meaningful, so they were only used to evaluate the balanced data, which included healthy pipe samples.
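The dependence of these metrics on the healthy samples can be seen from their definitions. The sketch below computes accuracy and a rank-based AUC in plain Python; AUC in particular is undefined when every sample belongs to the same class. The labels (1 for damaged, 0 for healthy), the scores, and the 0.5 threshold are hypothetical values chosen for illustration and are not taken from the tables.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = damaged, 0 = healthy) and model scores.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)  # 4 of 5 labels match
auc = roc_auc(y_true, scores)   # 5 of 6 positive/negative pairs ranked correctly
```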

Table 4 illustrates the performance of the models when the “date” feature is simplified to “season”. In the previous experiment, the models used each component of the “date” feature (year, month, and day) separately. Here, the “season” feature was instead derived from the original feature and input as a nominal value. In line with the previous experiment, the Elliptic Envelope exhibited the best overall performance across the metrics. However, this change decreased overall performance; hence, the original “date” components are used again in the subsequent experiments.

The impact of removing the “branch” feature is shown in Table 5, which reveals a decrease in performance and highlights the feature’s significance as a contributing factor. Therefore, this feature is included in the subsequent experiments. As with previous experiments, the Elliptic Envelope proved to be the most effective method. Table 6 illustrates that excluding “incident reason” did not have a significant impact on the performance of the classification algorithms. As a result, this feature is not included in the subsequent experiments, which reduces the computational load. In this experiment, the best and the worst performances are those of the Elliptic Envelope and Isolation Forest, respectively. Lastly, Table 7 presents the outcomes after removing the “contributing factor” feature from the set. Eliminating this feature improved the algorithms’ performance, so it can be excluded for both reduced computational load and improved accuracy.

5. Conclusion

Water is a vital resource for sustaining life, but pipe leakages and bursts are major causes of water waste [32], and both are mainly direct consequences of pipe aging. Detecting these issues in real time has been a challenge for municipalities, making water pipe monitoring a crucial responsibility. Traditionally, leakages and bursts were only detected visually or through reports in local areas. However, to prevent such problems, officials need to be informed in advance about the likelihood of such issues. To automate the detection process, this study takes the initial steps to predict the likelihood of leakages in a particular area using machine learning methods. The study uses a private dataset provided by the municipality of Saveh, Iran, that outlines previously leaked pipes. Three One-Class (OC) Classification methods (OC-SVM, Isolation Forest, and Elliptic Envelope) are trained to predict whether a set of pipes is prone to leakage or burst. The highest accuracy achieved is 0.909. This study is significant as it appears to be one of the first studies to use such techniques for predicting pipe leakages and bursts. These findings can help municipalities in three major ways:

  1. Cost savings: By preventing water pipe leakage/burst, municipalities can reduce the amount of treated water that is lost before reaching consumers. This means that they can save on the costs associated with treating and distributing water that is ultimately wasted. Additionally, by reducing leaks, municipalities can avoid costly emergency repairs and minimize disruptions to the water supply system.
  2. Water conservation: Leakage/burst prevention helps conserve water resources by minimizing wastage. Water scarcity is a growing concern in many regions, and preventing leaks ensures that precious water resources are used efficiently. By conserving water, municipalities can meet the needs of their residents while also protecting the environment.
  3. Improved infrastructure maintenance: Regular monitoring of pipe health and maintenance can help identify potential issues before they become major problems. This proactive approach allows municipalities to address small leaks or weak points in the infrastructure promptly, preventing them from escalating into larger, more expensive issues such as burst pipes or complete system failures. By maintaining their infrastructure effectively, municipalities can extend the lifespan of their water distribution systems and avoid costly replacements or extensive repairs.

Overall, investing in water pipe leakage/burst prevention measures not only helps municipalities save money but also promotes sustainable use of water resources and ensures reliable access to clean drinking water for their residents.

We decided to experiment with multiple approaches instead of using feature selection primarily because we lacked a clear understanding of which features were most influential in this particular case. Without being able to accurately weigh or manually control these features when necessary, especially when the feature space is limited and unbalanced, we opted to experiment with each one individually. This approach allowed us to narrow down the problem space and choose a suitable method for solving it, taking into account the size of the dataset. Additionally, by indicating the contribution of each feature, we aimed to pave the way for future studies in this area. Furthermore, the algorithms employed in our study generate 2D comparison tables. Consequently, implementing feature selection would result in a multidimensional matrix, making it more challenging for readers to readily compare the impact of different algorithms on different features. Therefore, our intention was to present this work as a foundational reference, allowing each feature to showcase its individual contribution at a glance. However, in future instances where a more comprehensive dataset is published, researchers in academia can employ feature selection techniques to modularize their investigations.

The use of anomaly detection algorithms, as opposed to probability laws, for predicting lifespan is justified by several factors [33–35]: Anomaly detection algorithms excel at identifying unusual patterns in data, which can be indicative of potential system failures or abnormalities. Unlike probability laws, these algorithms do not rely on prior knowledge or accumulated data, making them particularly valuable for detecting previously unknown anomalies. Probability laws, on the other hand, depend on strong and implicit hypotheses and may not offer reliable anomaly detection capabilities. Furthermore, the concept of an anomaly itself is ambiguous and prone to subjective and imprecise definitions, making it challenging to utilize probability laws effectively for anomaly detection.

As no prior research has employed a similar approach, burst reports did not require additional and detailed information such as pressure, flow rate, etc. By including more comprehensive pipe details in burst reports, such as those mentioned, municipalities can provide algorithms with a wealth of valuable information. These additional parameters enable algorithms to learn more complex features and patterns within the data, ultimately leading to more accurate predictions. For instance, by considering pressure variations along with other relevant factors like pipe material, age, and location, algorithms can better understand the underlying causes of bursts and identify potential vulnerabilities in the water distribution network. Moreover, incorporating these detailed pipe characteristics into burst reports allows for a more holistic analysis of the system’s performance. By capturing a broader range of variables that influence pipe failure, municipalities can gain deeper insights into the overall health and resilience of their infrastructure. This information can then be used to prioritize maintenance efforts, allocate resources effectively, and implement proactive measures to mitigate future bursts. Furthermore, by embracing this new approach to burst reporting that emphasizes comprehensive data collection, municipalities can foster collaboration between various stakeholders involved in water management. Sharing detailed pipe information with researchers, engineers, and other experts can facilitate a collective effort toward developing innovative solutions for sustainable water infrastructure management.

On the other hand, municipalities can also incorporate information regarding the condition of pipes (healthy, damaged, worn, etc.) to enable algorithms to differentiate between cases with greater accuracy. Currently, only damaged pipes are taken into consideration in reports. However, there exist latent factors in healthy pipes that may prevent damages or at least postpone them. By also listing the condition, algorithms will be able to track health over time or similarly map such features to further provide suggestions to keep pipes healthy or to provide information regarding pipe service life. For instance, if an algorithm identifies a pattern where burst pipes are more likely to occur in areas with a certain type of soil composition or high water pressure, it can recommend measures to mitigate these risks. By analyzing data from both damaged and healthy pipes, the algorithm can identify common characteristics among healthy pipes that contribute to their longevity. Furthermore, incorporating information about the condition of pipes can help municipalities prioritize maintenance efforts. Instead of solely focusing on repairing damaged pipes, they can proactively address potential issues in healthy ones as well. By identifying early signs of deterioration or weaknesses in seemingly healthy pipes, preventive measures can be implemented to avoid costly repairs and minimize disruptions in the water supply. In addition to providing suggestions for maintaining pipe health, this approach can also contribute to resource optimization. By understanding the factors that contribute to pipe damage or bursts, municipalities can allocate their resources more efficiently. For example, if certain types of soil composition are found to accelerate pipe deterioration, efforts can be directed toward reinforcing those specific areas with appropriate materials or implementing protective measures.

References

  1. Zhou X, Tang Z, Xu W, Meng F, Chu X, Xin K, et al. Deep learning identifies accurate burst locations in water distribution networks. Water Res. 2019;166: 115058. pmid:31536886
  2. Wang X, Guo G, Liu S, Wu Y, Xu X, Smith K. Burst Detection in District Metering Areas Using Deep Learning Method. J Water Resour Plan Manag. 2020;146: 04020031.
  3. Asghari V, Kazemi MH, Duan H-F, Hsu S-C, Keramat A. Machine learning modeling for spectral transient-based leak detection. Autom Constr. 2023;146: 104686.
  4. van der Walt JC, Heyns PS, Wilke DN. Pipe network leak detection: comparison between statistical and machine learning techniques. Urban Water J. 2018;15: 953–960.
  5. Gopalakrishnan P, Abhishek S, Ranjith R, Venkatesh R, Suriya VJ. Smart pipeline water leakage detection system. Int J Appl Eng Res. 2017;12: 5559–5564.
  6. Al Qahtani T, Yaakob MS, Yidris N, Sulaiman S, Ahmad KA. A review on water leakage detection method in the water distribution network. J Adv Res Fluid Mech Therm Sci. 2020;68: 152–163.
  7. Zhang C, Alexander BJ, Stephens ML, Lambert MF, Gong J. A convolutional neural network for pipe crack and leak detection in smart water network. Struct Health Monit. 2023;22: 232–244.
  8. Farah E, Shahrour I. Leakage detection using smart water system: Combination of water balance and automated minimum night flow. Water Resour Manag. 2017;31: 4821–4833.
  9. Ayati AH, Haghighi A. Multiobjective Wrapper Sampling Design for Leak Detection of Pipe Networks Based on Machine Learning and Transient Methods. J Water Resour Plan Manag. 2023;149: 04022076.
  10. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521: 436–444.
  11. Neocleous C, Schizas C. Artificial Neural Network Learning: A Comparative Review. In: Vlahavas IP, Spyropoulos CD, editors. Methods and Applications of Artificial Intelligence. Berlin, Heidelberg: Springer Berlin Heidelberg; 2002. pp. 300–313.
  12. Kim H, Lee J, Kim T, Park SJ, Kim H, Jung ID. Advanced thermal fluid leakage detection system with machine learning algorithm for pipe-in-pipe structure. Case Stud Therm Eng. 2023;42: 102747.
  13. Mikami N, Ueki Y, Shibahara M, Aizawa K, Ara K. State sensing of bubble jet flow based on acoustic recognition and deep learning. Int J Multiph Flow. 2023;159: 104340.
  14. Farajzadeh N, Sadeghzadeh N. NSSI questionnaires revisited: A data mining approach to shorten the NSSI questionnaires. Ijaz MF, editor. PLOS ONE. 2023;18: e0284588. pmid:37083960
  15. Wang C, Xu Q, Qiang Z, Zhou Y. Research on pipe burst in water distribution systems: knowledge structure and emerging trends. J Water Supply Res Technol-Aqua. 2022;71: 1408–1424.
  16. El-Zahab S, Zayed T. Leak detection in water distribution networks: an introductory overview. Smart Water. 2019;4: 1–23.
  17. Cheng J, Peng S, Cheng R, Wu X, Fang X. Burst Area Identification of Water Supply Network by Improved DenseNet Algorithm with Attention Mechanism. Water Resour Manag. 2022;36: 5425–5442.
  18. Moulik S, Majumdar S, Pal V, Thakran Y. Water Leakage Detection in Hilly Region PVC Pipes using Wireless Sensors and Machine Learning. 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). Taoyuan, Taiwan: IEEE; 2020. pp. 1–2. https://doi.org/10.1109/ICCE-Taiwan49838.2020.9258144
  19. Hayslep M, Keedwell E, Farmani R, Pocock J. Understanding district metered area level leakage using explainable machine learning. IOP Conf Ser Earth Environ Sci. 2023;1136: 012040.
  20. Mazaev G, Weyns M, Vancoillie F, Vaes G, Ongenae F, Van Hoecke S. Probabilistic leak localization in water distribution networks using a hybrid data-driven and model-based approach. Water Supply. 2023;23: 162–178.
  21. Zhou J, Lin H, Li S, Jin H, Zhao B, Liu S. Leakage diagnosis and localization of the gas extraction pipeline based on SA-PSO BP neural network. Reliab Eng Syst Saf. 2023;232: 109051.
  22. Frank A, Arbeiter FJ, Berger IJ, Hutař P, Náhlík L, Pinter G. Fracture Mechanics Lifetime Prediction of Polyethylene Pipes. J Pipeline Syst Eng Pract. 2019;10: 04018030.
  23. Ji J, Xie X, Fu G, Kodikara J. Time-Dependent Reliability Analysis of Fracture Failure of Corroded Cast Iron Water Pipes and Bayesian Updating for Lifetime Prediction. Keawsawasvong S, editor. Adv Civ Eng. 2023;2023: 6644493.
  24. Wang Y, Lan H, Meng T. Lifetime prediction of natural gas polyethylene pipes with internal pressures. Eng Fail Anal. 2019;95: 154–163.
  25. Chatzigeorgiou DM, Youcef-Toumi K, Khalifa AE, Ben-Mansour R. Analysis and Design of an In-Pipe System for Water Leak Detection. IDETC-CIE2011. Volume 5: 37th Design Automation Conference, Parts A and B; 2011. pp. 1007–1016.
  26. Chan TK, Chin CS, Zhong X. Review of Current Technologies and Proposed Intelligent Methodologies for Water Distributed Network Leakage Detection. IEEE Access. 2018;6: 78846–78867.
  27. Bishop CM, Nasrabadi NM. Pattern recognition and machine learning. Springer; 2006.
  28. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521: 436–444. pmid:26017442
  29. Somvanshi M, Chavan P, Tambade S, Shinde SV. A review of machine learning techniques using decision tree and support vector machine. 2016 International Conference on Computing Communication Control and automation (ICCUBEA). 2016. pp. 1–7.
  30. Smiti A. A critical overview of outlier detection methods. Comput Sci Rev. 2020;38: 100306.
  31. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16: 321–357.
  32. Kang J, Park Y-J, Lee J, Wang S-H, Eom D-S. Novel Leakage Detection by Ensemble CNN-SVM and Graph-Based Localization in Water Distribution Systems. IEEE Trans Ind Electron. 2018;65: 4279–4289.
  33. Sunny JS, Patro CPK, Karnani K, Pingle SC, Lin F, Anekoji M, et al. Anomaly Detection Framework for Wearables Data: A Perspective Review on Data Concepts, Data Analysis Algorithms and Prospects. Sensors. 2022;22: 756. pmid:35161502
  34. Le Lan C, Dinh L. Perfect Density Models Cannot Guarantee Anomaly Detection. Entropy. 2021;23: 1690. pmid:34945996
  35. Pang J, Liu D, Peng Y, Peng X. Optimize the Coverage Probability of Prediction Interval for Anomaly Detection of Sensor-Based Monitoring Series. Sensors. 2018;18: 967. pmid:29587372