A Novel Intelligent Leakage Monitoring-Warning System for Sustainable Rural Drinking Water Supply

Li, Xiaoqin; Wu, Xiaomei; Sun, Mingzhuang; Yang, Shengqiao; Song, Weikun

doi:10.3390/su14106079


¹ State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100038, China

² Department of Irrigation and Drainage, China Institute of Water Resources and Hydropower Research, Beijing 100048, China

³ School of Environment, Tsinghua University, Beijing 100084, China

⁴ School of Energy and Environmental Engineering, Hebei University of Engineering, Handan 056038, China

* Author to whom correspondence should be addressed.

Sustainability 2022, 14(10), 6079; https://doi.org/10.3390/su14106079

Submission received: 13 April 2022 / Revised: 12 May 2022 / Accepted: 13 May 2022 / Published: 17 May 2022

(This article belongs to the Section Sustainable Urban and Rural Development)


Abstract

Leakage occurs frequently in rural water supply pipelines, and locating it is demanding even for specialists. This can lower the operating efficiency of rural water supply projects and, in turn, the rural water supply guarantee rate. The detection and prediction of leakage are therefore of great significance for the operation, maintenance, and administration of rural water supply projects. The traditional monitoring-warning systems for urban water distribution networks cannot be applied to rural water distribution networks, due to various limitations. Likewise, most newer approaches based on machine learning, such as the artificial neural network (ANN), probabilistic neural network (PNN), and statistical learning theory (SLT), do not fit rural water distribution networks much better: they struggle to converge and to deliver high-accuracy results with small sample sizes, which is a crucial requirement when dealing with rural water supply pipelines. Extreme gradient boosting (XGBoost), a model that specializes in small sample sizes and has a high generalization ability, was applied to a rural water supply project in Ningxia, China. In this study, a monitoring-warning system featuring both leakage locating and quantity estimation was established based on XGBoost. The accuracy and F1-score of the leakage locating model were 95% and 93%, respectively, while those of the leakage quantity model reached 96% and 97%, respectively. Furthermore, the importance of the pressure at each monitoring station could be obtained through the feature importance analysis enabled by XGBoost, which is essential for leakage warning. These results indicate that this system based on XGBoost could be a promising solution to the leakage issue in rural water supply projects and an inspiration for future developments in intelligent monitoring-warning systems, thus providing reliable approaches for the sustainable development of rural drinking water supply projects.

Keywords: rural; water distribution; leakage; monitoring; XGBoost

1. Introduction

By the end of 2020, there were around 580,000 concentrated rural water supply projects in China, providing safe tap water to 909 million rural residents [1]. However, according to the rural water supply plan in the 14th five-year plan of China, around 10% of the concentrated rural water supply projects were constructed before 2005, most of which are facing issues such as aging pipelines and high leakage rates. Reports show that in some regions, the leakage rate of rural water supply pipelines has surpassed 30%, threatening rural tap water safety. Pipeline leakage can have various causes, such as pipe cracking, low construction quality, and maintenance difficulties caused by the depth of buried pipes. In the northeast of China, where pipes are buried relatively deep and winter temperatures can be low, the leakage rate could reach 20% in some cities [2].

As stated above, leakage is a serious issue in rural water supply projects and restricts the sustainable development of rural water supply projects. The existing leakage detection approaches are mostly developed from the passive testing method, MNF (minimum night flow), or warning mechanism based on the upper and lower pressure limits. Negharchi and Shafaghat (2020) [3] carried out research on a rural network in the north of Iran using two leakage calculation methods, including background and bursts estimates (BABE) and MNF; the average leakage was found to be 1.45 L/s and 1.105 L/s, respectively. The effects of the legitimate night-time consumption (LNC) and leakage exponent (N) have been evaluated. Norouzi et al. (2019) [4] proposed that employing suitable techniques for specifying the right domestic night-time consumption values is essential when applying the MNF method. Flow measurement by loggers was used to determine real losses through MNF analysis in the Juru Rural Service Centre by Chawira et al. (2022) [5]. Dandansaz et al. (2020) [6] minimized network leakage by applying minimal pressure on network nodes and analyzed the water distribution network using WaterGEMS and ArcGIS, aiming to determine the optimal pressure in the network.

The methods listed above are apparently much less efficient than those applied to urban water distribution networks, so a compatible monitoring model for the rural water distribution network is needed. Traditional monitoring models already applied to leakage detection in urban water distribution networks usually rely on sufficient historical data, a hydraulic model of the pipeline, and real-time online monitoring data. However, when developing monitoring-warning systems for rural water distribution networks, researchers face a lack of online monitoring stations, a limited variety of monitoring indicators, and insufficient historical data, which result in a low accuracy of the hydraulic model of pipelines and thus a poor leakage detection model. New methods, different from those adopted in monitoring models for urban water distribution networks, need to be introduced and assessed in rural water supply projects.

With the development of network technology, the Internet of Things, and a cloud platform, real-time monitoring data of high quality could be obtained and transmitted by online sensors, enabling the application of artificial intelligence and machine learning to the analysis of online monitoring data. During the last decade, much progress has been made in researching new algorithms such as the artificial neural network (ANN) and fuzzy inference system (FIS). Mounce et al. (2002) [7] first proposed the application of artificial intelligence to water distribution networks. Mounce et al. (2006) [8] utilized multi-layered perceptrons (MLP) and the time delay neural network (TDNN) when studying the process where a fire hydrant was applied to a simulated burst in a pipeline. In the following research of Mounce et al. (2003, 2008, 2010, 2007) [9,10,11,12], an artificial intelligence system was established for the detection of bursts and flow meter data analysis, enabled by continuously updated historical data.

Many other researchers also developed a variety of ANN models. Caputo and Pelagagge (2002, 2003) [13,14] proposed a leakage monitoring approach based on a multi-layered neural network. Feng and Zhang (2006) [15] proposed a leakage monitoring approach based on a fuzzy neural network that could also identify anomalies in a pipeline. Aksela et al. (2009) [16] established a pipeline leakage detection model based on a self-organizing map, where the leakage function consisted of a distance function and a confidence function. Tao et al. (2014) [17] proposed a burst detection method based on an artificial immune network, where a burst could be located by the algorithm of nearest neighbors after monitoring data were inputted into the artificial immune network. Liang et al. (2001) [18] established a model based on the ANN, able to describe the relationship between three pressure monitoring stations and leakage-concerning parameters such as leakage location, intensity, and influence. Huang et al. (2007) [19] developed a method based on supervisory control and data acquisition (SCADA) that could locate a leakage through a fuzzy similar priority comparison. The industrial application of an ANN to leakage detection has also been reported with satisfying outcomes. However, it is not easy for such models to converge and their convergence would require large training sample sizes, thus making them unfit for rural water supply projects.

Bayesian analysis (BA) was introduced to solve the convergence problem of the ANN, where the probability distribution could describe all forms of uncertainty. Poulakis et al. (2003) [20] established a Bayesian probabilistic framework for leakage detection. Costanzo et al. (2014) [21] also realized leakage area determination through BA. Romano et al. (2010) [22] integrated an ANN, statistical process control (SPC), and BA into a burst-leakage detection framework and tested its applicability in a district metered area (DMA) of the UK. The results showed that this framework could successfully locate bursts and leakage. Despite the satisfying results obtained with BA, the Bayesian models have to make assumptions on the probability distribution of training samples, which would often cause a low identification accuracy. Large training sample sizes are required to improve accuracy, which is not practical in rural water supply projects.

Efforts have been made to solve the problem of sample size requirement. Vapnik et al. (1982) [23] developed a statistical learning theory (SLT) that could be applicable to small sample sizes. Mounce et al. (2010) [24] utilized support vector machines (SVMs) to analyze the time-series data obtained from monitoring stations and realized online monitoring of abnormal events such as burst-leakage, pipe cleaning, and sensor malfunction. Mamo et al. (2014) [25] developed a leakage detection and classification technique based on multi-class SVMs (M-SVMs). The operation state of the pipeline was classified into six categories according to the degree of leakage. M-SVMs could then be applied to identify the operation state of the DMA, based on the flow and pressure data obtained from monitoring stations. Zhang et al. (2016) [26] proposed a leakage zone identification model that could be suitable for large-scale pipe networks. Compared to ANN and FIS, this method could send warnings much faster. The weakness of this method is that it fails to consider the optimization of parameters, which could greatly influence the classification accuracy.

Another well-known algorithm suited to small sample sizes is gradient boosting, developed by Friedman as an approximation of gradient descent. The gradient boosting decision tree (GBDT) is one of the best-performing models derived from this algorithm. It is a recursive model, consisting of multiple decision trees. GBDT could deal with all types of data, achieve high accuracy, and stay robust against anomalies. Extreme gradient boosting (XGBoost) is an enhanced version of GBDT, where regularizers have been introduced to avoid overfitting. XGBoost has a great generalization ability and is widely used for various aims such as parameter anomaly detection in satellite engineering, personal credit risk assessment, water depth inversion based on remote sensing, determination of urban water sustainability index, and predicting the quantity of the urban water supply (Chen et al., 2016, Devan et al., 2020, Clercq et al., 2018, He et al., 2020) [27,28,29,30]. XGBoost has been applied to pipeline maintenance in several studies. Snider and McBean (2020) [31] used XGBoost to predict pipeline rupture. Wu et al. (2022) [32] adopted XGBoost to study the influence of the count of leakage occurrence and its location on model performance. Artificial leakage data were generated using a hydraulic model simulation and the prediction accuracy could reach 90.4%. The research of Mohsen (2021) [33] showed that SVM and XGBoost ranked first in predicting leak and nonleak samples in a laboratory-scale water distribution system. Moreover, Wang et al. (2021) [34] found that XGBoost had a better generalization ability than SVM, as XGBoost could improve its prediction accuracy via the decontamination effect. Nagaraj and Lakshmi (2021) [35] reported that the XGBoost classifier outperformed the other machine learning algorithms assessed in their study in terms of water body extraction.

As a model applicable to small sample sizes, XGBoost has a high potential for leakage detection in rural water distribution networks. To the best of our knowledge, its feasibility for this purpose has not yet been validated, and such a validation would be of great value to the better development of rural water supply projects. In this study, XGBoost was applied in a rural water supply project in Ningxia, China. A novel intelligent monitoring-warning system for leakage detection was established, consisting of a leakage locating model and a leakage quantity model, with the aim of providing valuable insight into the construction and maintenance of future rural water supply projects.

2. Materials and Methods

2.1. Leakage Simulation and Monitoring

The water supply plant where this study was carried out is in Zhongwei, Ningxia, China, and has a capacity of 20,000 m³/d. It has been in operation since May 2019, supplying water to 120,000 residents. Groundwater pumped from four wells is treated in this plant and then pumped to the secondary booster pump station. As shown in Figure 1, the water distribution system is a branching network. On each main branch, multiple monitoring and test points were set up. In this study, four additional monitoring points were set up at Jingtai, Yanggou, Shuangda, and Caoshan, where pipe flow and pressure could be measured and transmitted in real time. Test points were set up between monitoring points as simulated leakage locations. Controlled leakage was realized by a three-way valve or an effluent pipe with a valve.

The parameters considered in the experiment plan included water usage period, leakage location, and leakage quantity. Table 1 shows the parameter settings for water usage period. The distribution of test points was as displayed in Figure 1. Table 2 shows the parameter settings for leakage quantity. When leakage quantity surpasses 30% of the flow in pipeline, the pressure change could be obvious and thus easy to identify. As this study focused on occasions where leakage could be difficult to identify, the leakage quantity was set to be smaller than 30% of the flow.

Before each test, a handheld ultrasonic flowmeter (TDS-100H, Dalian Haifeng Development Co., Ltd., Dalian, China) was installed on the pipe. The timing of each test started when the valve was opened. The designed leakage quantity was realized by adjusting the valves. After the flowmeter readings had been recorded for 5 min, the valves were closed. The valves were then opened and adjusted again for another leakage quantity. The same procedure was repeated for each preset experimental condition until all desired data were collected.

2.2. Model Setup and Analysis

2.2.1. Data Preparation

The models developed in this study were based on XGBoost, the mechanism of which was as described in Appendix A. Data preparation consists of four steps: data acquisition, data transfer, generation of positive and negative samples, and data pre-treatment. Data transfer refers to the process of transferring the results of tests carried out at a couple of water usage periods to those of tests carried out at different water usage periods, as there were certain limitations during the experiment and sample sizes were relatively small. The generation of positive and negative samples could then be completed based on the expanded database. Data pre-treatment mainly concerns the determination of x features and y features, as well as the mapping of several nonnumerical data sources.

The x features are water usage period, pipe depth, pipe flow, pre-test pressure of Jingtai (Jingtai 1st), pre-test pressure of Yanggou (Yanggou 1st), pre-test pressure of Shuangda (Shuangda 1st), pre-test pressure of Caoshan (Caoshan 1st), test pressure of Jingtai (Jingtai 2nd), test pressure of Yanggou (Yanggou 2nd), test pressure of Shuangda (Shuangda 2nd), and test pressure of Caoshan (Caoshan 2nd), labeled as x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, and x11, respectively. Among the x features, two of them, x1 and x2, are nonnumerical and thus need to be transformed, as shown in Table 3. The y features are leakage location and leakage quantity, which are represented by the percentage of leakage in pipe flow. Both features are nonnumerical and their corresponding numerical labels are as shown in Table 4.
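As a minimal sketch of this pre-treatment step, the snippet below assembles one sample in the x1 to x11 order listed above and bins the leakage share of pipe flow into the four quantity classes reported with the AUC-ROC results in Section 3.2. The category names and integer codes in `PERIOD_CODES` and `DEPTH_CODES`, the function names, and the pressure values in the usage lines are hypothetical placeholders; the actual mappings are those of Table 3 and Table 4.

```python
# Hypothetical integer codes for the two nonnumerical features (x1, x2).
# The real mapping is the one given in Table 3; these are placeholders.
PERIOD_CODES = {"peak": 0, "normal": 1, "low": 2}
DEPTH_CODES = {"shallow": 0, "deep": 1}

# Monitoring stations, in the order used for x4..x7 and x8..x11.
STATIONS = ("Jingtai", "Yanggou", "Shuangda", "Caoshan")


def build_features(period, depth, flow, pre_test, test):
    """Return [x1, ..., x11] for one sample.

    pre_test and test map a station name to the pressure measured
    before the test (1st) and during the test (2nd), respectively.
    """
    x = [PERIOD_CODES[period], DEPTH_CODES[depth], float(flow)]
    x += [float(pre_test[s]) for s in STATIONS]  # x4..x7
    x += [float(test[s]) for s in STATIONS]      # x8..x11
    return x


def quantity_class(leak_flow, pipe_flow):
    """Bin leakage as a percentage of pipe flow into classes 0 to 3."""
    share = 100.0 * leak_flow / pipe_flow
    if share < 10:
        return 0
    if share < 20:
        return 1
    if share < 30:
        return 2
    return 3


# Illustrative numbers only, not measurements from the study.
pre = {"Jingtai": 0.52, "Yanggou": 0.61, "Shuangda": 0.33, "Caoshan": 0.40}
during = {"Jingtai": 0.52, "Yanggou": 0.58, "Shuangda": 0.30, "Caoshan": 0.39}
sample = build_features("peak", "deep", 20.0, pre, during)
label = quantity_class(leak_flow=3.0, pipe_flow=20.0)  # 15% of the flow
```

Keeping the station order in a single tuple makes it harder to mix up first-test and second-test pressures when the feature vector is built.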

After data pre-treatment, the samples were ready for training. The sample distribution of the leakage locating model and that of the leakage quantity model were as shown in Table 5 and Table 6.

2.2.2. Iteration Setup

Proportions of 80%, 10%, and 10% of the total samples were used for training, validation, and testing, respectively. The data distribution of the two models was as shown in Table 7.
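The 80%/10%/10% partition can be sketched as follows. This is only an illustration under stated assumptions: the source does not say whether samples were shuffled or stratified by class, so the random shuffle, the seed, and the function name `split_indices` are choices made here for the example.

```python
import random


def split_indices(n_samples, train=0.8, val=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    The test part receives whatever remains after the first two cuts.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle (assumed)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(200)
```

With small datasets such as the one described here, a stratified split per class may be preferable so that every leakage class appears in each part; that refinement is left out to keep the sketch short.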

The parameter optimization was based on the greedy algorithm as explained in Section 2.2. At every iteration, only one parameter would be modified until no improvement could be made. After finishing the establishment of the leakage locating model, the input of the leakage quantity model could be obtained and iterations for the leakage quantity simulation would be started until optimized parameters were found. The hyper-parameters selected in this study and their default values were as shown in Table 8.
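The one-parameter-at-a-time procedure described above can be sketched as a coordinate-wise greedy search. The function `greedy_search`, the candidate grids, and the toy scoring function are assumptions for illustration; in practice `score` would train an XGBoost model with the given parameters and return a validation metric such as the F1-score.

```python
def greedy_search(score, start, candidates):
    """Coordinate-wise greedy search over hyper-parameters.

    score: callable taking a params dict and returning a number
           (higher is better), e.g. a validation F1-score.
    start: dict with the initial (default) value of each parameter.
    candidates: dict mapping each parameter name to the values to try.
    """
    best = dict(start)
    best_score = score(best)
    improved = True
    while improved:  # stop once a full pass brings no improvement
        improved = False
        for name, values in candidates.items():
            for value in values:
                if value == best[name]:
                    continue
                trial = dict(best)
                trial[name] = value  # modify exactly one parameter
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score = trial, trial_score
                    improved = True
    return best, best_score


# Toy stand-in for "train and validate"; its optimum is chosen arbitrarily.
def toy_score(p):
    return -(p["max_depth"] - 5) ** 2 - 100 * (p["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 4, 5, 6, 7], "learning_rate": [0.01, 0.05, 0.1, 0.3]}
best, best_score = greedy_search(
    toy_score, {"max_depth": 3, "learning_rate": 0.3}, grid
)
```

Because a change is accepted only when the score strictly improves and the grid is finite, the loop always terminates, although it may stop at a local optimum when parameters interact.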

2.2.3. Results Analysis

The assessment of model performance was based on the F1-score, accuracy, and AUC-ROC (area under the receiver operating characteristic curve), as calculated in Equations (1)–(6). The F1-score and accuracy have been used in many studies on XGBoost, such as Mohsen (2021) [33] and Zhong et al. (2018) [36].

Accuracy = (TP + TN)/(TP + FN + FP + TN)

(1)

F1-score = 2 × Precision × Recall/(Precision + Recall)

(2)

Precision = TP/(TP + FP)

(3)

Recall = TP/(TP + FN)

(4)

TPR = Recall

(5)

FPR = FP/(TN + FP)

(6)

TP/TN stands for true positive/negative, which means that a positive/negative class is predicted as positive/negative. FP/FN stands for false positive/negative, which means that a datapoint of one class is predicted as that of the other class.
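Equations (1)–(6) can be checked numerically with a few lines of code. The sketch below computes them from the four confusion counts of a binary problem; the function name and the counts in the usage line are made up for illustration, and the zero-denominator fallbacks are a convention chosen here rather than something specified in the source.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute Equations (1)-(6) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    fpr = fp / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,  # identical to TPR, Equation (5)
        "f1": f1,
        "fpr": fpr,
    }


# Illustrative counts only.
m = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

For the multi-class models in this study, the same counts would be taken per class in a one-vs-rest fashion before averaging.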

The AUC-ROC curve is a preferred method to assess the performance of classification models. The ROC curve plots TPR (y-coordinate) against FPR (x-coordinate) as the classification threshold varies. AUC, the area under the ROC curve, ranges from 0 to 1 and represents how well the different classes have been separated from each other. A useful model has an AUC between 0.5 and 1, and the higher the AUC, the better the performance. An AUC of 0.5 corresponds to random guessing; when AUC is less than 0.5, the model performance would be considered poor.
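To make the construction concrete, the following sketch builds an ROC curve for a binary problem, with FPR on the horizontal axis and TPR on the vertical axis, and integrates it with the trapezoidal rule. The function names and the toy labels and scores are assumptions for illustration; the input must contain at least one positive and one negative sample.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold stepwise.

    y_true holds 0/1 labels; scores holds the predicted score of class 1.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy example: one negative sample is ranked above one positive sample.
fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
area = auc(fpr, tpr)
```

A multi-class model can be evaluated by repeating this for each class against all the others, which is one common way to obtain per-class curves together with micro- and macro-averages.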

To examine the sensitivity of results to the parameters, sensitivity analysis was realized by IBM SPSS Statistics (Version 27.0, IBM, New York, NY, USA). Feature importance analysis was also carried out as XGBoost could obtain the importance of each feature. The more one feature participates in the construction of the decision tree, the higher its importance.
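The idea that a feature's importance grows with its participation in tree construction can be illustrated by counting splits. The sketch below uses a hypothetical nested-dictionary representation of trees, not XGBoost's internal format, and the function name is likewise an assumption. XGBoost itself offers several importance types (for example split count or gain), and the source does not state which one was used; in the scikit-learn wrapper of XGBoost the scores are exposed through the `feature_importances_` attribute.

```python
from collections import Counter


def split_count_importance(trees, feature_names):
    """Share of all splits that use each feature.

    Each tree is a nested dict: an internal node has the keys
    "feature", "left" and "right"; a leaf has no "feature" key.
    """
    counts = Counter()

    def visit(node):
        if "feature" not in node:  # leaf node: nothing to count
            return
        counts[node["feature"]] += 1
        visit(node["left"])
        visit(node["right"])

    for tree in trees:
        visit(tree)
    total = sum(counts.values())
    return {
        name: (counts[name] / total if total else 0.0)
        for name in feature_names
    }


# Two tiny hypothetical trees splitting on x9 and x10.
leaf = {"value": 0.0}
trees = [
    {"feature": "x9", "left": {"feature": "x10", "left": leaf, "right": leaf},
     "right": leaf},
    {"feature": "x9", "left": leaf, "right": leaf},
]
importance = split_count_importance(trees, ["x1", "x9", "x10"])
```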

3. Results and Discussion

3.1. Sensitivity Analysis

As can be seen in Table 9, Max_depth and Learning_rate had the highest F-values, at 10.458 and 7.407, respectively, meaning that the results were most sensitive to these two factors. Max_depth had great influence on the model complexity, while Learning_rate controlled the step size of each iteration and was thus highly relevant to model robustness, which could explain their relatively high sensitivity. However, the p-values of all the factors were greater than 0.05, indicating poor statistical significance.

3.2. Model Performance

Training samples were used to establish the models, which were then used to predict the testing samples. The accuracy and F1-score of the two models, applied to training samples and testing samples, were as shown in Table 10. The results obtained from testing samples were almost as good as those obtained from training samples.

The fluctuation in the results along the iterations was also studied. As can be seen in Figure 2, the F1-score and accuracy of the leakage locating model were both satisfying. During iterations, most leakage could be identified. The lowest accuracy and F1-score were 74% and 72%, respectively. At iteration 15, the accuracy and F1-score could already reach 95% and 93%, respectively, indicating that the leakage could be accurately located by the leakage locating model in this study.

The leakage quantity model could predict the quantity of leakage at the location where leakage occurs according to the leakage locating model. The performance of this model was as shown in Figure 3. At iteration 8, the accuracy and F1-score could reach 96% and 97%, respectively, while the error of leakage quantity did not surpass 10% of the pipe flow, indicating that this model predicted leakage quantity with high accuracy and sensitivity. Farley et al. (2008, 2010, 2013) [37,38,39] used a dynamic hydraulic model to locate leakage and noted that the leakage quantity had to be at least 1.5 L/s to cause a pressure change measured at monitoring points. In this study, the leakage quantity model could predict a leakage of less than 1 L/s. Mounce et al. (2006, 2010, 2007) [8,11,12] developed a leakage quantity model based on ANN and the error of its estimate was around 10% of the pipe flow, which was comparable to the performance of the XGBoost model developed in this study.

The results obtained from the AUC-ROC curve were as shown in Figure 4. Class 0, 1, 2, 3, and 4 in the leakage locating model represented “No leakage”, “Yanggou”, “Chengnong”, “Shuangda”, and “Caoshan”, respectively, while Class 0, 1, 2, and 3 in the leakage quantity model represented “y2 < 10%”, “10% ≤ y2 < 20%”, “20% ≤ y2 < 30%”, and “y2 ≥ 30%”, respectively. As can be seen in Figure 4, the AUC of each curve was close to 1. The micro- and macro-average were both over 0.99. The results of Figure 4 further demonstrated the promising performance of the models established in this study.

3.3. Feature Importance Analysis

By calling the function of xgb.feature_importances, the importance of each feature could be obtained, as shown in Figure 5.

It could be seen that in the leakage locating model, the pressure of monitoring points had great importance, whereas neither the influence of water usage period nor that of pipe depth surpassed 5%. Among the four pressure monitoring points, Yanggou, which was situated at a low position, and Shuangda, which was situated at a high position, were very sensitive to pressure, and their pressure proved to have the greatest importance according to the feature importance analysis. In the leakage quantity model, the most important features were still the pressures of the monitoring points, but the importance of pipe flow was lower than in the leakage locating model. In addition, in both models, the importance of second tests was higher than that of first tests, which indicates that the accurate and timely update of monitoring data is vital to the monitoring and warning of leakage.

After further analysis, it could be concluded that monitoring points near the end of the pipeline were more sensitive to pressure than those near the start of the pipeline. Jingtai was located at the start, where the pressure measured had much less importance than that measured elsewhere. Both the importance of pipe depth and that of water usage period were under 10% in both models. The two features mentioned above should not be included in future studies. The feature importance analysis found that the pressure of the nodes could have profound influence on the operation of water distribution networks, thus being among the most important factors to consider when dealing with leakage issues. Extra attention should be paid to pressure management, especially that of nodes near the end of the pipeline, so as to better deal with existing and potential leakage issues.

4. Conclusions

XGBoost, a model that has a great generalization ability and specializes in small sample sizes, was applied to the leakage detection of a rural water supply project in Ningxia, China. A novel intelligent monitoring-warning system was established, consisting of a leakage locating model and a leakage quantity model. The accuracy and F1-score of the leakage locating model were 95% and 93%, respectively, while those of the leakage quantity model were 96% and 97%, respectively. The AUCs of the AUC-ROC curves were all close to 1, while both the micro- and macro-average were over 0.99. The model performance was satisfying. In addition, with the help of the feature importance analysis enabled by XGBoost, the most important feature for leakage detection was found to be the pressure of monitoring points, and the importance of second tests was found to be greater than that of first tests, indicating that the stable and timely transmission of online monitoring data could be crucial for the establishment of an intelligent monitoring system for rural water distribution networks.

The local water management authorities were also satisfied with the results. The intelligent system established in this study could help not only with major leakage incidents but also with minor leakage issues that are difficult to notice. The main drawback of this system found by managers is that a reliable leakage warning service depends on the stable operation of the warning system, which requires the timely upload of pressure data and stable internet access. Effort must therefore be devoted to the maintenance of pressure monitoring devices.

The successful application of XGBoost in this study shows that a highly intelligent monitoring system for leakage detection in rural water supply projects is not impossible. To further improve the models developed in this study, the system could be enabled to constantly learn from new samples while conserving existing knowledge. Hopefully, the leakage locating model could become even more precise. A future study on the application of this model in rural water supply projects could eventually realize the efficient and accurate identification of leakage, early prediction, timely treatment, and hence significant improvement of rural water supply services.

Author Contributions

Conceptualization, X.L.; data curation, X.L. and S.Y.; formal analysis, X.L., M.S. and W.S.; funding acquisition, X.W.; investigation, X.W.; methodology, X.L.; validation, W.S.; visualization, S.Y.; writing—original draft, X.L.; writing—review and editing, X.L., X.W. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a cooperative research project of academician workstation (No.2020.C-003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

Thanks to Zhang Yanhua from the Zhongwei Water Limited Company of Ningxia Water Investment Group and Yang Pengwei from the China Institute of Water Resources and Hydropower Research, for providing support in the leakage simulation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

XGBoost model: the main mechanism of the XGBoost model is integrating boosted trees as a kind of additive model (i.e., training process) [40]. It can be written as follows:

$$\hat{Y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

(A1)

$$f_k(x_i) = \omega_{q(x_i)} \quad (q : \mathbb{R}^{m} \to T, \ \omega \in \mathbb{R}^{T})$$

(A2)

$K$ is the number of trees and $f_k(x)$ is a function in the function space $\mathcal{F}$; the function $q(x_i)$ assigns sample $x_i$ to a certain leaf node; $\omega$ is the leaf score of leaf nodes.

The target function is as described in Equations (A3) and (A4).

$$\mathcal{L} = \sum_{i=1}^{n} l(\hat{Y}_i, Y_i) + \sum_{k=1}^{K} \Omega(f_k)$$

(A3)

$$\text{where } \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^{2} = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^{2}$$

(A4)

Here, the function $l$ is a differentiable convex loss function that represents the difference between the predicted value $\hat{Y}$ and the true value $Y$, measured, for example, as the squared error. The penalty term $\Omega$ is used to ensure that the tree structure does not become too complex, and it smooths the leaf weights to avoid overfitting. The objective of XGBoost can include L1 and L2 regularization terms, the loss function can be a squared loss or a logistic loss, and $T$ represents the number of leaf nodes. The regularizers are introduced to prevent overfitting. More precisely, the term $\gamma T$ penalizes the number of leaf nodes and thus acts as a form of pre-pruning, while the coefficient $\lambda$ of the squared L2 norm of the leaf scores smooths the leaf scores.

The objective function is expanded and simplified using Taylor’s second-order formula, as shown in Equations (A5)–(A8).

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l\left(Y_i, \hat{Y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

(A5)

$$\text{where } g_i = \partial_{\hat{Y}_i^{(t-1)}} \, l\left(Y_i, \hat{Y}_i^{(t-1)}\right) \text{ and } h_i = \partial^{2}_{\hat{Y}_i^{(t-1)}} \, l\left(Y_i, \hat{Y}_i^{(t-1)}\right)$$

(A6)
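As a worked instance of the first and second derivatives in Equation (A6), consider the two losses named in this appendix. The factor of 1/2 in the squared loss and the sigmoid parameterization of the logistic loss are conventions assumed here for convenience; the source does not fix them.

$$l = \tfrac{1}{2}\left(Y_i - \hat{Y}_i^{(t-1)}\right)^{2} \;\Rightarrow\; g_i = \hat{Y}_i^{(t-1)} - Y_i, \quad h_i = 1$$

$$l = -\left[ Y_i \ln p_i + (1 - Y_i) \ln (1 - p_i) \right], \; p_i = \frac{1}{1 + e^{-\hat{Y}_i^{(t-1)}}} \;\Rightarrow\; g_i = p_i - Y_i, \quad h_i = p_i (1 - p_i)$$

In both cases $h_i > 0$, so the second-order approximation is a convex quadratic in the leaf scores and has a unique minimizer.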

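$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^{2}$$

(A7)

$$= \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^{2} \right] + \gamma T$$

(A8)

Here, $I_j = \{ i \mid q(x_i) = j \}$ denotes the set of samples assigned to leaf $j$.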

After the structure of the tree is determined, the objective function can be minimized by setting its derivative with respect to each leaf score to zero. For continuous features, the values are discretized into candidate split points, and the best one is selected from these candidates. The optimal leaf score of each leaf node can be obtained as follows:

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

(A9)

Substituting this into the objective function, the minimum loss is:

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^{2}}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

(A10)

After the determination of the objective function and the best solution, the nodes should be split according to the node segmentation algorithm. A greedy algorithm is where all splitting points are identified using an exhaustive method, and then the splitting point with the largest information gain is selected. The greedy algorithm is as described in Figure A1.

Figure A1. Mechanism of greedy algorithm.
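The exhaustive split search can be sketched for a single feature as follows. From Equation (A10), splitting one leaf into a left and a right leaf reduces the loss by Gain = ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ, where G and H are the sums of g_i and h_i in each leaf. The function names, the default values of `lam` and `gamma`, and the toy data are assumptions made for this illustration.

```python
def leaf_weight(g_sum, h_sum, lam):
    """Optimal leaf score, Equation (A9)."""
    return -g_sum / (h_sum + lam)


def leaf_term(g_sum, h_sum, lam):
    """G^2 / (H + lambda), the per-leaf term of Equation (A10)."""
    return g_sum ** 2 / (h_sum + lam)


def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Scan every threshold on one feature; return (gain, threshold).

    Returns (0.0, None) when no split gives a positive gain.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    g_total, h_total = sum(g), sum(h)
    parent = leaf_term(g_total, h_total, lam)
    g_left = h_left = 0.0
    best = (0.0, None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += g[i]
        h_left += h[i]
        if x[i] == x[nxt]:
            continue  # cannot place a threshold between equal values
        gain = 0.5 * (
            leaf_term(g_left, h_left, lam)
            + leaf_term(g_total - g_left, h_total - h_left, lam)
            - parent
        ) - gamma
        if gain > best[0]:
            best = (gain, (x[i] + x[nxt]) / 2)
    return best


# Toy data: squared loss with initial prediction 0, so g = -y and h = 1.
pressure = [0.50, 0.52, 0.31, 0.30]  # illustrative values only
g = [0.0, 0.0, -1.0, -1.0]
h = [1.0, 1.0, 1.0, 1.0]
gain, threshold = best_split(pressure, g, h)
w_left = leaf_weight(-2.0, 2.0, 1.0)  # score of the low-pressure leaf
```

In this toy case the best threshold falls between the two low-pressure samples and the two high-pressure ones, which is the split that separates the nonzero gradients from the zero ones.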

References

Ministry of Water Resources of the P. R. China. National Rural Water Supply Guarantee Plan for the 14th Five Year Plan; Ministry of Water Resources of the P. R. China: Beijing, China, 2021. Available online: https://baijiahao.baidu.com/s?id=1711649821043075162&wfr=spider&for=pc (accessed on 12 April 2022).
The National Development and Reform Commission of the P. R. China; Ministry of Water and Resources of the P. R. China; Ministry of Housing and Urban Rural Development of the P. R. China; Ministry of Industry and Information Technology of the P. R. China; Ministry of Agriculture and Rural Areas of the P. R. China. Construction Plan of Water-Saving Society in the 14th Five Year Plan; The National Development and Reform Commission of the P. R. China: Beijing, China, 2021. Available online: https://www.ndrc.gov.cn/xxgk/zcfb/ghwb/202111/t20211108_1303414_ext.html (accessed on 12 April 2022).
Negharchi, S.M.; Shafaghat, R. Leakage estimation in water networks based on the BABE and MNF analyses: A case study in Gavankola village, Iran. Water Supply 2020, 20, 2296–2310. [Google Scholar] [CrossRef]
Norouzi, J.; Jalili, G.M.; Moslehi, I. A review of previous studies in determining the night water consumption of household customer (in Persian). In Proceedings of the Second National Conference on Water Consumption Management, Loss Reduction & Reuse, Tehran, Iran, 10–12 December 2019; Tabesh, M., Ed.; University of Tehran: Tehran, Iran, 2019; pp. 1–9. [Google Scholar]
Chawira, Z.M.; Hoko, A.; Mhizha, A. Partitioning non-revenue water for Juru Rural Service Centre, Goromonzi District, Zimbabwe. Phys. Chem. Earth 2022, in press. [CrossRef]
Dandansaz, H.K.; Asl, M.S.; Joneidi, A. Hydraulic Simulation of Rural Water Distribution Network Aiming at Reduced Leakage (Case Study: Ghorakhk Village, Binalood Region). J. Water Wastewater Sci. Eng. 2020, 5, 48–59. [Google Scholar]
Mounce, S.R.; Day, A.J.; Wood, A.; Khan, A.; Widdop, P.; Machell, J. A neural network approach to burst detection. Water Sci. Technol. 2002, 45, 237–246. [Google Scholar] [CrossRef]
Mounce, S.R.; Machell, J.; Boxall, J.B. Development of Artificial Intelligence Systems for Analysis of Water Supply System Data. Am. Soc. Civ. Eng. 2006, 1–15. [Google Scholar] [CrossRef]
Mounce, S.R.; Khan, A.; Wood, A.; Day, A.J.; Widdop, P.D.; Machell, J. Sensor-fusion of hydraulic data for burst detection and location in a treated water distribution system. Inf. Fusion 2003, 4, 217–229. [Google Scholar] [CrossRef]
Mounce, S.R.; Boxall, J.B.; Machell, J. Online application of ANN and fuzzy logic system for burst detection. In Proceedings of the 10th Annual Water Distribution Systems Analysis Conference WDSA2008, Kruger National Park, South Africa, 17–20 August 2008; Zyl, J.E., Ilemobade, A.A., Jacobs, H.E., Eds.; pp. 735–746. [Google Scholar]
Mounce, S.R.; Boxall, J.B.; Machell, J. Development and Verification of an Online Artificial Intelligence System for Detection of Bursts and Other Abnormal Flows. J. Water Resour. Plan. Manag. 2010, 136, 309–318. [Google Scholar] [CrossRef]
Mounce, S.R.; Boxall, J.B.; Machell, J. An artificial neural network/fuzzy logic system for DMA flow meter data analysis providing burst identification and size estimation. In Proceedings of the Water Management Challenges in Global Change, Leicester, UK, 3–5 September 2007; pp. 313–320. [Google Scholar]
Caputo, A.C.; Pelagagge, P.M. An inverse approach for piping networks monitoring. J. Loss Prev. Process Ind. 2002, 15, 497–505. [Google Scholar] [CrossRef]
Caputo, A.C.; Pelagagge, P.M. Using neural networks to monitor piping systems. Process. Saf. Prog. 2003, 22, 119–127. [Google Scholar] [CrossRef]
Feng, J.; Zhang, H. Algorithm of Pipeline Leak Detection Based on Discrete Incremental Clustering Method. In Proceedings of the International Conference on Intelligent Computing, Kunming, China, 16–19 August 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 602–607. [Google Scholar] [CrossRef]
Aksela, K.; Aksela, M.; Vahala, R. Leakage detection in a real distribution network using a SOM. Urban Water J. 2009, 6, 279–289. [Google Scholar] [CrossRef]
Tao, T.; Huang, H.; Li, F.; Xin, K. Burst Detection Using an Artificial Immune Network in Water-Distribution Systems. J. Water Resour. Plan. Manag. 2014, 140, 04014027. [Google Scholar] [CrossRef]
Liang, J.W.; Xiao, D.; Zhao, X.H.; Zhang, H.W. Real-time fault diagnosis method of water supply network. J. Hydraul. Eng. 2001, 12, 40–47. [Google Scholar]
Huang, Y.L.; Cao, M.H.; Zhang, H. Research on the method of real-time detection of pipe burst position in water supply network based on SCADA system. Water Wastewater Eng. 2007, 33, 104–108. [Google Scholar]
Poulakis, Z.; Valougeorgis, D.; Papadimitriou, C. Leakage detection in water pipe networks using a Bayesian probabilistic framework. Probabilistic Eng. Mech. 2003, 18, 315–327. [Google Scholar] [CrossRef] [Green Version]
Costanzo, F.; Morosini, A.F.; Veltri, P.; Savic, D. Model calibration as a tool for leakage identification in WDS: A real case study. In Proceedings of the Procedia Engineering, 16th Water Distribution System Analysis Conference (WDSA2014) Urban Water Hydroinformatics and Strategic Planning, Bari, Italy, 14–17 July 2014; Giustolisi, O., Brunone, B., Laucelli, D., Berardi, L., Campisano, A., Eds.; Elsevier Ltd.: Amsterdam, The Netherlands, 2014; Volume 89, pp. 672–678. [Google Scholar]
Romano, M.; Kapelan, Z.; Savic, D. Real-time leak detection in water distribution systems. In Proceedings of the 12th Annual Conference on Water Distribution Systems Analysis (WDSA), Tucson, AZ, USA, 12–15 September 2010; pp. 1074–1082. [Google Scholar]
Vapnik, V.N.; Kotz, S. Estimation of Dependences Based on Empirical Data, 2nd ed.; Springer Science + Business Media: New York, NY, USA, 1982; pp. 433–450. [Google Scholar]
Mounce, S.; Mounce, R.; Boxall, J. Novelty detection for time series data analysis in water distribution systems using support vector machines. J. Hydroinform. 2010, 13, 672–686. [Google Scholar] [CrossRef]
Mamo, T.; Juran, I.; Shahrour, I. Virtual DMA Municipal Water Supply Pipeline Leak Detection and Classification Using Advance Pattern Recognizer Multi-Class SVM. J. Pattern Recognit. Res. 2014, 9, 25–42. [Google Scholar] [CrossRef]
Zhang, Q.; Wu, Z.Y.; Zhao, M.; Qi, J.; Huang, Y.; Zhao, H. Leakage Zone Identification in Large-Scale Water Distribution Systems Using Multiclass Support Vector Machines. J. Water Resour. Plan. Manag. 2016, 142, 04016042. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
Devan, P.; Khare, N. An efficient XGBoost—DNN-based classification model for network intrusion detection system. Neural Comput. Appl. 2020, 32, 12499–12514. [Google Scholar] [CrossRef]
Clercq, D.D.; Smith, K.; Chou, B.; Gonzalez, A.; Kothapalle, R. Identification of urban drinking water supply patterns across 627 cities in China based on supervised and unsupervised statistical learning. J. Environ. Manag. 2018, 223, 658–667. [Google Scholar] [CrossRef]
He, B.; Ma, J.; Gao, H.Y. Predicting urban daily water supply based on multi-granularity feature and XGBoost integrated model. J. Yangtze River Sci. Res. Inst. 2020, 37, 43–49. [Google Scholar]
Snider, B.; McBean, E.A. Watermain breaks and data: The intricate relationship between data availability and accuracy of predictions. Urban Water J. 2020, 17, 163–176. [Google Scholar] [CrossRef]
Wu, J.; Ma, D.; Wang, W. Leakage Identification in Water Distribution Networks Based on XGBoost Algorithm. J. Water Resour. Plan. Manag. 2022, 148, 04021107. [Google Scholar] [CrossRef]
Mohsen, A. Evaluation and Detection of Leaks in a Laboratory-Scale Water Distribution System with Acoustic, Acceleration, and Dynamic Pressure Sensors. Ph.D. Thesis, Texas A&M University, College Station, TX, USA, 2021. [Google Scholar]
Wang, X.; Fu, D.; Wang, Y.; Guo, Y.; Ding, Y. The XGBoost and the SVM-based prediction models for bioretention cell decontamination effect. Arab. J. Geosci. 2021, 14, 1–11. [Google Scholar] [CrossRef]
Nagaraj, R.; Lakshmi, S.K. Performance analysis of machine learning techniques for water body extraction. In Proceedings of the 2021 IEEE Bombay Section Signature Conference (IBSSC), Gwalior, India, 18–20 November 2021. [Google Scholar]
Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2018, 221, 430–443. [Google Scholar] [CrossRef]
Farley, B.; Boxall, J.B.; Mounce, S. Optimal Locations of Pressure Meters for Burst Detection. Water Distrib. Syst. Anal. 2008, 1–11. [Google Scholar] [CrossRef]
Farley, B.; Mounce, S.R.; Boxall, J.B. Field testing of an optimal sensor placement methodology for event detection in an urban water distribution network. Urban Water J. 2010, 7, 345–356. [Google Scholar] [CrossRef]
Farley, B.; Mounce, S.R.; Boxall, J. Development and Field Validation of a Burst Localization Methodology. J. Water Resour. Plan. Manag. 2013, 139, 604–613. [Google Scholar] [CrossRef]
Friedman, J.H. Stochastic gradient boosting. Comput Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]

Figure 1. Schematic representation of water distribution network featuring monitoring and test points.

Figure 2. Performance indicators of leakage locating model along the iterations.

Figure 3. Performance indicators of leakage quantity model along the iterations.

Figure 4. ROC curve of (a) leakage locating model and (b) leakage quantity model.

Figure 5. Feature importance in (a) leakage locating model and (b) leakage quantity model.

Table 1. Selection of water usage periods.

Water Usage Period	Time	Water Usage Period	Time
Morning peak hours	6:00–8:00	Noon off-peak hours	15:00–17:00
Morning off-peak hours	9:00–11:00	Evening peak hours	18:00–20:00
Noon peak hours	11:00–13:00	Evening off-peak hours	21:00–23:00

Table 2. Experimental setting of leakage quantity.

Sequence	Date	Leakage Location	Leakage Flow (L/s)	Leakage Percentage (%)
1	2 August 2021	Yanggou	2.08	5
2	2 August 2021	Yanggou	4.17	10
3	2 August 2021	Yanggou	6.25	20
4	3 August 2021	Chengnong	0.83	5
5	3 August 2021	Chengnong	1.67	10
6	3 August 2021	Chengnong	2.50	15
7	3 August 2021	Chengnong	3.33	20
8	3 August 2021	Chengnong	5.00	30
9	4 August 2021	Shuangda	1.11	5
10	4 August 2021	Shuangda	2.22	10
11	4 August 2021	Shuangda	3.33	15
12	1 August 2021	Caoshan	0.28	5
13	1 August 2021	Caoshan	0.56	10
14	1 August 2021	Caoshan	0.83	15

Table 3. Mapping of nonnumerical x features.

Feature	Nonnumerical Label	Numerical Label
x1	Morning peak hours	001
	Noon peak hours	010
	Evening peak hours	100
	Evening off-peak hours	011
	Morning off-peak hours	101
	Noon off-peak hours	110
	Other	111
x2	1.9 m	01
	1.8 m	10
	1.6 m	11

Table 4. Mapping of nonnumerical y features.

Feature	Nonnumerical Label	Numerical Label
y1	No leakage	0
	Yanggou	1
	Chengnong	2
	Shuangda	3
	Caoshan	4
	Other	99
y2	y2 < 10%	0
	10% ≤ y2 < 20%	1
	20% ≤ y2 < 30%	2
	y2 ≥ 30%	3
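
The label mapping in Table 4 can be expressed as a small lookup, sketched below. The function and variable names are hypothetical; only the class boundaries and label values are taken from the table.

```python
from bisect import bisect_right

# y1: leakage location labels from Table 4.
LOCATION_LABELS = {
    "No leakage": 0,
    "Yanggou": 1,
    "Chengnong": 2,
    "Shuangda": 3,
    "Caoshan": 4,
    "Other": 99,
}

# y2: lower bounds (in %) of classes 1, 2, and 3 from Table 4.
QUANTITY_BOUNDS = [10, 20, 30]


def leakage_quantity_class(leak_percent):
    """Map a leakage percentage of pipe flow (e.g. 15 for 15%) to its y2 label."""
    return bisect_right(QUANTITY_BOUNDS, leak_percent)


classes = [leakage_quantity_class(p) for p in (5, 10, 15, 20, 30)]
```

`bisect_right` places a value equal to a boundary in the higher class, which matches the half-open intervals of the table (for example, exactly 10% falls in class 1).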

Table 5. The sample distribution of leakage locating model.

Leakage Locating Model	Total Samples	No Leakage	Chengnong	Shuangda	Caoshan	Yanggou
Sample size	6884	3999	1904	587	222	172
Percentage (%)	100.00	58.09	27.66	8.53	3.22	2.50

Table 6. The sample distribution of leakage quantity model.

Leakage Quantity Model	Total Samples	Percentage of Leakage in Pipe Flow (y2)
		0	1	2	3
Sample size	2952	448	1133	1112	260
Percentage (%)	100.00	15.18	38.38	37.67	8.81

Table 7. The data distribution of the two models.

Model	Total Samples	Training	Validation	Testing
Leakage locating model	6884	5507	688	689
Leakage quantity model	2952	2362	295	295
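
The counts in Table 7 are consistent with an approximate 8:1:1 partition of each data set. The sketch below reproduces them under the assumption that the training and validation sizes are rounded to the nearest integer and the remainder is assigned to testing; the fractions, the rounding rule, and the function name are inferred from the table for illustration, not taken from a stated procedure.

```python
def split_sizes(n_total, train_frac=0.8, val_frac=0.1):
    """Return (training, validation, testing) sample counts for a data set."""
    n_train = round(n_total * train_frac)
    n_val = round(n_total * val_frac)
    n_test = n_total - n_train - n_val  # remainder goes to testing
    return n_train, n_val, n_test


locating_split = split_sizes(6884)
quantity_split = split_sizes(2952)
```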

Table 8. Default values of parameters.

Parameter	Description	Leakage Locating Model (XGBClassifier)	Leakage Quantity Model (XGBClassifier)
Max_depth	maximum depth of each tree	3	4
Learning_rate	learning rate (step size) applied at each iteration	0.3	0.3
N_estimators	number of sub-models (boosted trees)	80	200
Objective	loss function	binary:logistic	binary:logistic
Booster	model solving method (base learner)	gbtree	gbtree
Min_child_weight	minimum sum of sample weights in a child node	6	5
Reg_alpha	weight of L1 regularization term	0	0
Reg_lambda	weight of L2 regularization term	1	1
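
For reference, the settings in Table 8 can be written as keyword dictionaries using the lowercase parameter names of the scikit-learn style `XGBClassifier` interface. This is only a configuration sketch: the dictionary names are hypothetical, and the values are copied from the table without any claim about how the authors instantiated their models.

```python
# Settings of the leakage locating model, copied from Table 8.
LOCATING_PARAMS = {
    "max_depth": 3,
    "learning_rate": 0.3,
    "n_estimators": 80,
    "objective": "binary:logistic",
    "booster": "gbtree",
    "min_child_weight": 6,
    "reg_alpha": 0,
    "reg_lambda": 1,
}

# The leakage quantity model differs only in three settings.
QUANTITY_PARAMS = {
    **LOCATING_PARAMS,
    "max_depth": 4,
    "n_estimators": 200,
    "min_child_weight": 5,
}

# With the xgboost package installed, a model could be created as, e.g.:
#   from xgboost import XGBClassifier
#   model = XGBClassifier(**LOCATING_PARAMS)
differing = sorted(k for k in LOCATING_PARAMS if LOCATING_PARAMS[k] != QUANTITY_PARAMS[k])
```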

Table 9. Tests of between-subjects effects.

Dependent Variable: F1_Score
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	0.063 ^a	13	0.005	10.852	0.234
Intercept	7.891	1	7.891	17,534.741	0.005
Max_depth	0.024	5	0.005	10.458	0.23
Learning_rate	0.007	2	0.003	7.407	0.251
N_estimators	0.004	3	0.001	2.939	0.399
Min_child_weight	0.001	1	0.001	2.778	0.344
Reg_alpha	0	1	0	0.778	0.54
Reg_lambda	0	0	.	.	.
N_estimators * Reg_alpha	0	1	0	0.397	0.642
Error	0	1	0
Total	10.481	15
Corrected Total	0.064	14

a. R-squared = 0.993 (adjusted R-squared = 0.901). *. Interaction between two parameters.

Table 10. Performance indicators of training samples and testing samples.

Model	Accuracy (%)		F1-Score (%)
	Training Samples	Testing Samples	Training Samples	Testing Samples
The leakage locating model	97	95	95	93
The leakage quantity model	97	96	98	97

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
