Leak detection and size identification in fluid pipelines using a novel vulnerability index and 1-D convolutional neural network

This paper proposes a leak detection and size identification technique for fluid pipelines based on a new leak-sensitive feature called the vulnerability index (VI) and a 1-D convolutional neural network (1D-CNN). Acoustic emission hit (AEH) features can differentiate between normal and leak operating conditions of the pipeline. However, the multiple sources of acoustic emission hits, such as fluid pressure on the joints, interference noises, flange vibrations, and leaks in the pipeline, make the features less sensitive to the leak size. To address this issue, AEH features are first extracted from the acoustic emission (AE) signal using a sliding window with an adaptive threshold. Since the distribution of the AEH features changes according to the pipeline working conditions, a newly developed multiscale Mann–Whitney test (MMU-test) is applied to the AEH features to obtain the new VI feature, which shows the pipeline's susceptibility to leaks and changes according to the pipeline working conditions. Finally, the VI is provided as input to a 1D-CNN for leak detection and size identification. Experimental results show a higher accuracy than the reference state-of-the-art methods under variable fluid pressure conditions.


Introduction
Pipelines are among the top five modes of transport in the modern world. They are cheap and safe, and they provide economical transportation for gas and fluids. However, material corrosion, fatigue cracks, earthquakes, material defects, and discontinuities in the pipelines due to the external environment can all lead to pipeline leaks (B. Zhang, Kang et al., 2022; Z. Zhang, Zhang, et al., 2022). The repercussions of leaks are severe and can include economic losses, impacts on public safety, pollution, and waste of resources (Che et al., 2021; Duan et al., 2020). Fatalities due to leakage in the pipeline are about 46% around the world. A case study presented in 2020 reveals that in Guizhou (China) leakage in diesel pipelines resulted in the loss of 1.5 million RMB and cross-provincial environmental pollution. The same case study also reported over 120 deaths and dozens of injuries in Hidalgo (Mexico) due to an explosion caused by a leak in a petroleum pipeline (Xing et al., 2020). To avoid such severe consequences, early pipeline leak detection is essential. The easy installation and high sensitivity of AE technology make it attractive for the industry (B. Wang et al., 2020). Considering the widespread use of AE-based monitoring of pipelines in the industry, AE technology is utilized in this study for pipeline leak detection and leak size identification.

Related research works
A leak in the pipeline generates elastic energy, which forms an AE event. The AE sensors installed on the pipeline surface detect the AE events in the form of AEHs. The AEHs detected due to the leak cause variations in the AE signal (Rienstra & Hirschberg, 2018). To utilize these variations for leak detection in the pipeline, researchers have primarily focused on feature extraction and intelligent pattern recognition methods (Hu et al., 2021; Korlapati et al., 2022). Intelligent pattern recognition methods for pipeline leak detection can be grouped into the temporal domain (TD), frequency domain (FD), and multi-resolution time-frequency domain (TFD).
F. Wang et al. (2017) presented a strategy for pipeline leak detection based on TD statistical features. The AE signals were preprocessed and features were extracted in the TD to obtain discriminant features with low dimensions. The method further preprocessed the features using principal component analysis. Finally, the pipeline working condition was identified using the support vector data descriptor (SVD). Wang, Mao, et al. (2021) detected the leak in the pipeline by utilizing the TD AE signal amplitude and artificial neural networks (ANN). However, AE signals are strongly affected by interference noises and attenuation, which degrade the signal in the TD and make leak detection in the fluid pipeline more challenging.
The frequency spectrum shows sensitivity to variations in the AE signals due to leak-related AEHs (Sun et al., 2016). Wang, Sun, et al. (2021) detected the leak and identified the leak size in the fluid pipeline by analysing the change in the amplitude of leak-related frequencies. Furthermore, the accuracy of leak detection and leak size identification was enhanced using ANNs. Xiao et al. (2021) extracted features such as mean, root mean square, crest factor, kurtosis, skewness, mean frequency, median frequency, and peak frequency slope from the AE signal in the TD and FD. A single feature vector was formed by combining the TD and FD features, and leak-related features were selected using the Kullback-Leibler distance. The leak-related features were classified using ANN, SVD, and random forest for pipeline leak state identification. It is known that the AE signal obtained from the pipeline is complex and non-stationary in nature and that FD analysis is more suitable for stationary signals (Keramat & Duan, 2021; Zheng et al., 2021).
TFD techniques such as empirical mode decomposition (EMD), wavelet transform for multi-resolution analysis, and variational mode decomposition can be utilized for leak-related information extraction from the non-stationary AE signal. Meaningful features are extracted after preprocessing the complex AE signal using TFD techniques. The extracted features are utilized to recognize the health conditions of the pipeline using well-known pattern recognition techniques such as SVD, ANN, and fuzzy-SVD (Goliatt et al., 2021; S. Li, Gong, et al., 2021; Li, Cai, et al., 2022; Stajuda et al., 2022; Xu et al., 2021). However, TFD preprocessing of the AE signal is computationally expensive, and the mother wavelet selection for wavelet transforms needs experimental validation. EMD is a self-adaptive technique; however, mode mixing and extreme interpolation make EMD less attractive for the preprocessing of pipeline AE signals.
A leak in the pipeline generates elastic energy. The elastic energy travels in the form of a stress wave towards the AE sensor through the pipeline surface. The AE sensor records the stress wave, which in turn produces transients in the AE signal commonly known as AEHs (Miller et al., 2005). Features concerning the AEH are called AEH features. The commonly used AEH features are peak amplitude, counts, rise time, decay time, and average frequency. The AEH features hold information related to the AE event resulting from a defect. Therefore, the AEH features are independent of the entire AE signal distribution (Rai et al., 2021). The AEH features have performed better than the traditional features extracted from the AE signal in the TD, FD, and TFD for different applications such as pipelines, concrete structures, and mooring chains (Angulo et al., 2021; Banjara et al., 2020). Even though the AEH features perform better for pipeline diagnosis than traditional features such as mean, root mean square (RMS), crest factor, kurtosis, skewness, mean frequency, median frequency, and peak frequency slope, they also suffer from several shortcomings. Concerning the pipeline, the sources of the hits in the AE signal can include the fluid pressure on the joints, background interference noises, flange vibrations, and leaks in the pipeline. Thus, the multiple sources of AEHs make the AEH features less sensitive to the leak size in the pipeline. To address this problem, this paper proposes a new technique to extract a leak-sensitive feature called the VI from the AE signal.
An AEH can be of the burst or continuous type; a burst-type AEH is a rapid sequence of hits, and the continuous type of AEH is the noise generated by the AE source. Thus, burst-type AEHs are easier to differentiate from the background noise than continuous-type AEHs. To address this issue, in the first step an adaptive threshold is introduced that can consider both the burst-type and continuous-type AEHs for meaningful AE feature extraction from the AE signal. The distribution of the AEH features changes according to the change in the pipeline working conditions. To extract intrinsic discriminant information from the AEH features based on the change in distribution, in the second step a newly developed MMU-test is applied to the AEH features on multiple scales. The MMU-test output statistics obtained from the AEH features on multiple scales result in a new feature called the VI.
The Mann-Whitney test (MU-test) is a non-parametric test that compares two independent samples by utilizing the differences in their mean ranks. The MU-test considers two 1-D samples to be from the same population if their ranks are randomly ordered. In contrast, if the ranks of the two 1-D samples are clustered, the test determines that the samples are from different populations (Corder & Foreman, 2014). Costa et al. (2002) introduced multiscale analysis for the extraction of entropy from 1-D TD signals. The AEH features extracted from the pipeline AE signals and represented on multiple scales will have different mean rank orders and can therefore be utilized for leak detection and leak size identification in pipelines. Hence, in this paper, the MU-test is combined with multiscale analysis to obtain a new MMU-test. The test output statistic results in a new feature called the VI. The name VI is given to the MMU-test output statistics because the MMU-test output shows susceptibility to the conditions of the pipeline. To the best of the authors' knowledge, the ability of the MMU-test for pipeline leak detection and leak size identification is explored for the first time in this research work.
The pipeline leak diagnosis system starts with leak-related feature extraction from the AE signal and ends with the classification of pipeline working conditions based on the extracted features. Compared to machine learning techniques, deep learning (DL) methods can analyse complex data. Furthermore, DL methods autonomously extract meaningful discriminant information from complex data for the task of pattern recognition (W. Li, Huang, et al., 2022). The most prominent DL methods used for fault diagnosis are deep belief networks, neural auto-encoders, recurrent neural networks, and CNNs. The CNN reduces the risk of overfitting and enables low computational complexity by sharing weights in the network, utilizing local receptive fields, and spatial domain subsampling (Yafouz et al., 2021; Zhang et al., 2021). Furthermore, CNNs have shown successful pattern recognition in fault diagnosis for bearings, centrifugal pumps, and pipelines (Ahmad et al., 2022; Chen et al., 2021; Hasan et al., 2021). The capability of the CNN to extract discriminant information and recognize patterns in 1-D data with low computational complexity while avoiding overfitting makes it more attractive for pipeline fault diagnosis than other DL methods. For this reason, a 1-D CNN is used in this paper for pipeline leak detection and leak size identification.
The novelty of this paper can be summarized as follows: (i) An adaptive thresholding technique based on a sliding window is introduced for AEH feature extraction from the pipeline AE signals. (ii) A new leak-sensitive feature called the VI is introduced, which is calculated from the AEH features using the newly developed MMU-test.
The contributions of this work can be summarized as follows: (i) The VI is calculated from the AEH features using the MMU-test. (ii) Pipeline leak detection and leak size identification are performed based on the VI and a 1-D CNN. (iii) Real-world industrial fluid pipeline AE data are utilized for the validation of the proposed method.
The remaining sections of this study are organized as follows: Section 2 presents the technical background of the methods used in this paper. The proposed method is explained in Section 3. Results obtained from the proposed method are presented in Section 4. Finally, the paper is concluded in Section 5.

Review of acoustic emission hit features
A leak in the pipeline generates elastic energy, which forms an AE event. The elastic energy travels in the form of a stress wave towards the AE sensor through the surface of the pipeline. The AE sensor records the stress wave, which in turn produces transients in the AE signal commonly known as AEHs. AE monitoring takes place in the presence of continuous background noise, and for this reason, the AEHs are separated from the background noise using a threshold. The AEH features extracted from the AE signal are peak amplitude, rise time, decay time, counts, and average frequency. Figure 1 shows the graphic illustration of the AE features. These features can be explained as follows:
• AE amplitude: The maximum measured voltage of the AEH in the AE signal. AE amplitude is an important feature because it has a direct relationship with the source AE event. The AE amplitude feature is measured in decibels.
These significant features are extracted from the AE signal in the current work.

Review of Mann-Whitney test and multiscale analysis
The MU-test is a statistical test that determines whether two samples belong to the same population or to different populations without any prior assumptions about the distribution of the samples. The hypotheses for the MU-test can be expressed as follows:
• Null hypothesis (N_0): Let G(x) and F(x) represent the distributions of two independent 1-D samples. N_0 states that there will be no significant difference between the MU-test statistics of G(x) and F(x).
• Alternative hypothesis (N_1): N_1 states that there will be a systematic decrease or increase in the MU-test statistics of one sample as compared to the other sample (G(x) ≠ F(x)).
The MU-test statistics can be calculated using Equation (1).
S_i can be calculated using the following equation.
where m_1 and m_2 are the independent samples, μ and var are the mean and variance for the sample of interest, and R_i reflects the sum of the ranks for the sample of interest in Equations (1) and (2). The MU-test statistic represented in Equation (1) will change with changes in the condition of the pipeline. Thus, the MU-test can be used to detect a leak in the pipeline with the help of hypotheses N_0 and N_1, where N_0 indicates that the pipeline is working under a no-leak condition and N_1 indicates that the pipeline is working under a leak condition.
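For illustration, the following is a minimal sketch of the textbook Mann-Whitney statistic with the normal approximation. It assumes the standard form, in which the rank sum of one sample is converted into U and then standardized with the mean n1·n2/2 and variance n1·n2·(n1+n2+1)/12. This may differ in notation or detail from Equations (1) and (2) of this paper, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_z(sample_a, sample_b):
    """Standardized Mann-Whitney U of sample_a versus sample_b (no tie correction)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n1])          # sum of ranks of the sample of interest
    u = rank_sum_a - n1 * (n1 + 1) / 2    # U statistic
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    return (u - mean_u) / sqrt(var_u)
```

Under this sketch, two samples with interleaved (random-looking) ranks give a statistic close to zero, whereas samples with clustered ranks give a statistic far from zero, which mirrors the N_0 versus N_1 interpretation described above.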
In this study, the MU-test statistic is calculated on multiple scales from the pipeline AE features, and a new MMU-test is introduced. From the 1-D acoustic emission feature vector V = {V_i; i = 1, 2, 3, ..., N}, the new multiscale 1-D representation of the features can be obtained using the following equation.
where L_x^τ is the feature vector represented on multiple scales at scale factor τ, N is the length of the feature vector V_i, and x is the parameter used to adjust the number of iterations i to obtain the coarse-grained multiscale representation of the original feature vector with the scale factor τ. The MU-test statistic is calculated from the AE features on each scale, and the corresponding new feature VI is obtained on multiple scales. Equation (4) presents the mathematical expression for the VI calculation.
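The multiscale procedure can be sketched as follows. The coarse-graining follows the scheme of Costa et al., in which each point at scale τ is the mean of τ consecutive, non-overlapping points of the original vector. How the two samples of the test are formed is not recoverable from the text here, so this sketch assumes a hypothetical `reference` vector (for example, features from a baseline recording) and evaluates one statistic per scale from 1 to `max_scale`. The raw U statistic is used for brevity; all names are illustrative.

```python
def coarse_grain(values, scale):
    """Average consecutive non-overlapping blocks of `scale` points."""
    n_blocks = len(values) // scale  # a trailing partial block is dropped
    return [
        sum(values[j * scale:(j + 1) * scale]) / scale
        for j in range(n_blocks)
    ]


def mann_whitney_u(x, y):
    """U of x versus y: number of pairs where x wins, ties counted as half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def multiscale_u(feature, reference, max_scale):
    """One U statistic per scale 1..max_scale (assumed reading of the MMU-test)."""
    stats = []
    for scale in range(1, max_scale + 1):
        coarse_feature = coarse_grain(feature, scale)
        coarse_reference = coarse_grain(reference, scale)
        stats.append(mann_whitney_u(coarse_feature, coarse_reference))
    return stats
```

For example, `coarse_grain([1, 2, 3, 4, 5, 6], 2)` yields `[1.5, 3.5, 5.5]`, so the number of points available to the test shrinks as the scale factor grows.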

Convolutional neural networks
The working principle of the CNN is like that of the biological visual system. In the biological visual system, the neurons of the cortex respond only to the stimulation of specific areas; thus, the neurons focus only on the local information in the image. The local information of each neuron is processed in the visual cortex, and the global perception of the image is created. The CNN has a unique architecture that helps reduce the computational complexity of the classification and the overfitting of the neural network. The CNN architecture consists of an input layer, convolutional layer (CL), pooling layer (PL), fully connected layer (FCL), and a final output layer.
The convolutional layer of the CNN learns high-level abstract features from the input data. The abstract features of the convolutional layer are further enhanced by adding an activation function with different weights and biases. Equation (5) shows the convolutional operation of the CNN.
The number of layers is represented by l; in each layer, the jth component is represented by c_j^i, P_n denotes the convolved region, w represents the weights, and b represents the biases in Equation (5).
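As a rough illustration of the operation described above, the sketch below applies a single 1-D kernel to a signal, adds a bias, and passes the result through a ReLU activation. It assumes "valid" sliding (no padding) and a single input channel; these choices, and the function names, are assumptions rather than details taken from Equation (5).

```python
def relu(x):
    """Rectified linear unit: negative inputs are set to zero."""
    return x if x > 0 else 0.0


def conv1d_valid(signal, kernel, bias=0.0):
    """Slide `kernel` over `signal` without padding, add bias, apply ReLU."""
    k = len(kernel)
    return [
        relu(sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]
```

With the difference kernel `[-1, 1]`, a rising signal such as `[1, 2, 3, 4]` produces positive responses, while the reversed kernel `[1, -1]` produces negative pre-activations that the ReLU clips to zero.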
Discriminant features play an important role in the classification accuracy of a classifier; therefore, a pooling layer is introduced to further improve the quality of the features obtained from the convolutional layer in the CNN. In this study, max-pooling is utilized for the removal of redundant information. The max-pooling operation can be expressed mathematically using the following equation.
The down-sampled output of the convolution is represented by c_j^t, w represents the weights, b represents the biases, and max(C_j^i) denotes the max-pooling operation in Equation (6). To improve the discriminability of the features, the rectified linear unit (ReLU) is used as the activation function in this study.
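A minimal sketch of 1-D max-pooling is given below. The default stride equal to the pool size (non-overlapping windows) is an assumption, and the function name is hypothetical.

```python
def max_pool1d(values, size=2, stride=None):
    """Keep the maximum of each window of `size` points, stepping by `stride`."""
    if stride is None:
        stride = size  # non-overlapping windows by default
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]
```

With a pool size of 2, the feature map length is roughly halved, and only the strongest response in each window is passed on to the next layer.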
To increase the ability to extract local information from the input data, several convolutional and pooling layers are used in the CNN. After obtaining the latent information from the convolutional and pooling layers, the next step in the CNN is the classification of the information into its respective classes. For this purpose, a fully connected layer is used. First, the latent features obtained from the convolution kernels are flattened into a one-dimensional feature vector, and the flattened feature vector is provided as input to the fully connected layer. Equation (7) shows the operation of the fully connected layer.
For a neuron, w represents the weight and b represents the bias in the jth layer, and f(C_{j+1}) expresses the output value of the ith neuron at layer l + 1. In this paper, multiclass classification is performed; therefore, a SoftMax function is used in the output layer of the CNN. The SoftMax activation function calculates the probability of each class for the given input and assigns the classification label according to the highest probability value. To avoid overfitting, the dropout strategy introduced by Srivastava et al. (2014) is used in the CNN. The dropout strategy deactivates neurons in some layers of the network using probabilistic approaches. Deactivating the neurons at a certain layer of the network reduces the co-adaptation of the neurons and improves their generalization ability. In this paper, the 1D-CNN is utilized for the identification of pipeline leak size.
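The classification stage can be sketched as a fully connected layer followed by SoftMax and a highest-probability decision. The weights and inputs below are placeholders, not values from the trained network, and the function names are hypothetical.

```python
from math import exp


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = [exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(inputs, weights, biases):
    """Return the index of the most probable class and all class probabilities."""
    probs = softmax(dense(inputs, weights, biases))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

For instance, `predict([1.0, 2.0], [[1, 0], [0, 1], [0, 0]], [0, 0, 0])` gives the raw scores 1, 2, and 0, so the second class (index 1) receives the highest probability.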

Proposed method
The proposed method starts with acquiring the AE signals from the pipeline and ends with identifying the working conditions of the pipeline. Figure 2 shows the graphical abstract of the proposed method. The steps involved in the proposed method are as follows.
Step 1: The AE signals are collected from the pipeline under test.
Step 2: The AE signals obtained from the pipeline are utilized to extract the AEH features explained in Section 2. A leak in the pipeline produces multiple AEHs in the AE signal. An AEH in the AE signal can be of the burst or continuous type. A burst-type AEH is a rapid sequence of hits, and the continuous type of AEH is the noise generated by the AE source. Thus, burst-type AEHs are easier to differentiate from the background noise than continuous-type AEHs. To preserve the properties of both the burst-type and continuous-type AEHs caused by the leak in the pipeline and to prevent the loss of leak-related information, a proper threshold setting is important for AEH feature extraction. For this purpose, a self-adaptive threshold is introduced in this step. The threshold is calculated as follows: to cover every AEH, a sliding window of length l is used. Rather than calculating a common threshold, a threshold is calculated in each sliding window for separating the AEH from the continuous background noise. Thus, the threshold adapts itself to the attributes of the AE signal in each sliding window.
The AEHs in the AE signals appear randomly with durations ranging from nanoseconds to milliseconds; therefore, a balanced choice for the window length is in the millisecond range. For the balanced selection of l, the Hsu-Nielsen test (Sause, 2011) was performed on the surface of the pipeline 1000 times in this study. The Hsu-Nielsen test is a pencil-lead break test used to simulate an AE event. Based on this test, l = 1 ms is selected for the sliding window. Previous work (Das et al., 2019; Kelkel et al., 2020; Rai et al., 2021) has suggested that the peak value has a direct relationship with AEHs. Therefore, in each sliding window, a specific percentage PA of the peak value is selected as the threshold for the separation of AEHs from the continuous background noise. In this study, PA = 10% of the peak value is taken as the threshold in each sliding window for the identification of AEHs.
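The adaptive thresholding of Step 2 can be sketched as follows. At a sampling frequency of 1 MHz, a 1 ms window corresponds to 1000 samples, and the threshold in each window is PA = 10% of that window's peak. Working on the rectified signal, counting upward threshold crossings as "counts", and measuring rise time in samples are assumptions of this sketch; the function and key names are hypothetical.

```python
def adaptive_hit_features(signal, window=1000, pa=0.10):
    """Per-window peak, adaptive threshold, crossing count, and rise time."""
    features = []
    for start in range(0, len(signal) - window + 1, window):
        # Rectify the window so positive and negative excursions are treated alike.
        segment = [abs(s) for s in signal[start:start + window]]
        peak = max(segment)
        threshold = pa * peak  # threshold adapts to each window's own peak
        # Counts: number of upward crossings of the threshold inside the window.
        counts = sum(
            1 for a, b in zip(segment, segment[1:]) if a <= threshold < b
        )
        # Rise time: samples from the first point above threshold to the peak.
        first = next((i for i, s in enumerate(segment) if s > threshold), None)
        rise = segment.index(peak) - first if first is not None else 0
        features.append(
            {"peak": peak, "threshold": threshold,
             "counts": counts, "rise_samples": rise}
        )
    return features
```

Because the threshold is recomputed in every window, a weak continuous-type emission in a quiet window is still evaluated against its own peak rather than against a global level set by the strongest burst in the recording.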
Step 3: The distribution of the AEH features changes according to changes in the pipeline working conditions. To detect the leak and to identify the leak size, intrinsic discriminant and leak-sensitive information is extracted in this step from the AEH features based on the change in distribution. For this reason, the MMU-test is applied to the AEH features with the multiscale factor τ. After applying the MMU-test to the AEH features, a new feature, the VI, is obtained.
Step 4: To utilize the VI for leak detection and leak size identification, the VI is provided as input to the 1D-CNN. Table 1 shows the network architecture used in this study, which was obtained after several trials. The 1D-CNN for pipeline leak detection and leak size identification comprises five CLs, four PLs, and two FCLs. The input feature vector comprises five VIs calculated from the AE features. CL1 uses 128 convolutional kernels of size 16 × 1, CL2 uses 64 kernels of size 8 × 1, CL3 uses 32 kernels of size 4 × 1, CL4 uses 16 kernels of size 4 × 1, and CL5 uses eight kernels of size 4 × 1; all CLs use the ReLU activation function. A PL is added after each of CL1, CL2, CL3, and CL4 and performs a max-pooling operation of 2 × 2. To generalize the model, dropout with a ratio of 0.25 is applied at PL3 and CL5.
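The layer sequence described in Step 4 can be written down as a simple configuration and traced to see how the feature-map length shrinks. Since Table 1 is not reproduced here, the sketch assumes "same" padding in every CL (so only pooling changes the length), a pool size of 2 along the single signal axis, and a hypothetical input of length 100 with five channels; none of these are confirmed by the text, and the sizes of the two FCLs are left out because they are not stated.

```python
# Layer list transcribed from the description in Step 4; padding is an assumption.
LAYERS = [
    ("CL1", {"filters": 128, "kernel": 16}),
    ("PL1", {"pool": 2}),
    ("CL2", {"filters": 64, "kernel": 8}),
    ("PL2", {"pool": 2}),
    ("CL3", {"filters": 32, "kernel": 4}),
    ("PL3", {"pool": 2, "dropout": 0.25}),
    ("CL4", {"filters": 16, "kernel": 4}),
    ("PL4", {"pool": 2}),
    ("CL5", {"filters": 8, "kernel": 4, "dropout": 0.25}),
]


def trace_shapes(length, channels, layers=LAYERS):
    """Return (name, length, channels) after each layer under 'same' padding."""
    shapes = []
    for name, cfg in layers:
        if "filters" in cfg:
            channels = cfg["filters"]   # convolution changes channels only
        else:
            length //= cfg["pool"]      # pooling divides the length
        shapes.append((name, length, channels))
    return shapes
```

Under these assumptions, `trace_shapes(100, 5)` ends at a length of 6 with 8 channels, which would give 48 values after flattening for the fully connected layers.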

Experimental results, evaluation, and discussion
The objective of this section is the unbiased evaluation of the proposed method. Therefore, the evaluation is divided into two parts. Part one focuses on the evaluation of the new feature VI against the traditional features extracted from the pipeline AE signals, and part two focuses on the evaluation of the proposed method for leak size identification as compared to the existing state-of-the-art methods.

Test setup
The pipeline used for AE data acquisition is a part of a large industrial fluid pipeline. Figure 3(a) shows the pictorial view of the pipeline testbed, and the schematics of the testbed are presented in Figure 3(b). The experimental setup parameters used during the data acquisition process are listed in Table 2.
A hole was created in the pipeline using an electric drill. To simulate leaks of different sizes, a fluid control valve was welded onto the pipeline where the hole was created. Water was used as the fluid inside the pipeline during the experiment because it poses no hazards to the environment or the operating staff.

System development for AE data acquisition
The flow diagram for the AE data acquisition system is shown in Figure 3(c). Plastic tape is used to fix the position of the sensors on the pipeline; furthermore, to ensure contact between the AE sensors and the pipeline surface, a specialized gel is applied to the contact areas. After the installation of the AE sensors, data acquisition software developed in the Ulsan Industrial Artificial Intelligence Laboratory using interface libraries from NI and the Python language is used to control the data acquisition process. Before the acquisition of the pipeline leak data, the calibration of the sensors and the sensitivity of the acquisition system were tested using the Hsu-Nielsen test. After ensuring the sensitivity of the AE sensors and the functionality of the acquisition system, a reliable pipeline leak dataset was recorded.
The data acquisition system presented in Figure 3(c) can be summarized as follows: in the first phase, the AE waves are converted into electric signals with the help of AE sensors. The AE signals obtained from the AE sensors are in analogue form; in the second phase, an ADC is used to convert them into digital signals. In the third phase, an interface module is used for communication between the personal computer and the ADC. In the final phase, the data acquisition software receives the data and stores it on the hard drive.

Dataset recording and its description
Datasets were collected from the pipeline under normal and leaking conditions. The normal condition refers to the pipeline condition when the experiment valve is closed. The fluid pressure inside the pipeline was controlled with the help of a centrifugal pump (CP). The fluid pressures considered in this study are 7 and 13 bar. For readability, these pressures are represented as p1 and p2, respectively. The environmental temperature was around 25°C at the time of data collection. For each pressure condition, data were collected for 2 min with a sampling frequency of 1 MHz. AE signals were collected for each pressure condition under three leak states. The leak states considered in this study are pipeline leaks with diameters of 0.3, 0.5, and 1 mm. Table 3 shows the datasets collected from the pipeline under normal and leaking conditions. The data collection process can be explained as follows: First, the leak valve on the pipeline was kept closed and the CP was turned on. When the pressure inside the pipeline reached p1, data were recorded for 2 min; this is referred to as the normal condition for Dataset-1. After collecting the data in normal conditions, the pipeline valve was opened to 0.3 mm under p1, and data were recorded for 2 min. It should be emphasized that the acquisition process was not interrupted during the condition switching; thus, data were acquired continuously.
With the collection of the 0.3 mm leak signals under p1, the Dataset-1 acquisition was completed. The same procedure was repeated to collect pipeline data for leak sizes of 0.5 and 1 mm under p1.
For the pressure condition p2, the leak valve on the pipeline was kept closed and the CP was turned on. When the pressure inside the pipeline reached p2, data were recorded for 2 min; this is referred to as the normal condition for Dataset-2. After collecting the data in normal conditions, the pipeline valve was opened to 0.3 mm under p2, and data were recorded for 2 min. Again, the acquisition process was not interrupted during the condition switching; thus, data were acquired continuously. With the collection of the 0.3 mm leak signals under p2, the Dataset-2 acquisition was completed. The same procedure was repeated to collect pipeline data for leak sizes of 0.5 and 1 mm under p2. The CP and fluid flow inside the pipeline during the data acquisition process are illustrated in Figure 4.

Features description and data configuration
The AEH features, namely peak amplitude, rise time, decay time, counts, and average frequency, are extracted from each 1 s AE signal. The sampling frequency for the AE signal acquisition is 1 MHz; thus, each 1 s signal has 1 × 10^6 data points. Figure 5(a,b) shows the 1 s AE signals obtained from the pipeline under fluid pressure operating conditions of 7 and 13 bar. For each AEH feature, a total of 1000 samples/s were obtained by utilizing the sliding window of l = 1 ms. As can be seen in Table 3, a total of 100 samples were collected for each operating condition of the pipeline; therefore, the total number of samples for each AEH feature under each operating condition is 1000 × 100. Each AEH feature vector is represented on multiple scales using Equation (3) with τ = 100, the MMU-test is applied to each scale, and the corresponding VIs are obtained using Equation (4). A total of 100 VI features were obtained from each AEH feature under a single operating condition of the pipeline, i.e. normal = 100 VI features and leak = 100 VI features. Figure 6(a) and Figure 7(a) show the VI obtained from the pipeline under different operating conditions.
After extracting the VI, the next step is the proper configuration of the training and testing datasets. In this study, the k-fold cross-validation (KCV) strategy is used for the validation of the proposed model for leak detection and leak size determination. In KCV, the dataset is divided randomly into K folds. In this study, K = 3; thus, the dataset is divided into 3 folds. In each iteration, one fold is used for testing while the remaining folds are used for training the 1D-CNN. A total of 100 × 5 VIs were extracted from the AEH features for the normal condition and 100 × 5 VIs were extracted for the leak condition of the pipeline, which results in a total of 200 × 5 samples. The dataset is divided randomly into training and testing sets with a ratio of 7:3. Hence, 140 × 5 randomly chosen samples were used for training while 60 × 5 randomly chosen samples were used for testing the 1D-CNN for leak detection and leak size determination.
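A minimal sketch of the random K-fold partitioning with K = 3 is shown below. It illustrates the generic KCV procedure only; the fixed `seed` and the round-robin assignment of shuffled indices to folds are assumptions, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random assignment to folds
    folds = [indices[i::k] for i in range(k)]   # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With 200 samples and K = 3, each iteration tests on 66 or 67 samples and trains on the rest, and every sample is used for testing exactly once across the three iterations.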
The performance evaluation metrics used for the validation of the proposed method are precision (P_r), recall (R_r), error rate (E_r), and average classification accuracy (ACA). These metrics can be calculated using the following mathematical equations:
The term k represents the number of folds of KCV; the subscripts TP, TN, FN, and FP with U represent the true positive, true negative, false negative, and false positive samples classified as class m by the classification algorithm; i represents the KCV iteration number; and U_samples represents the number of samples provided to the classifier for each test subset in Equations (8)-(11).
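For reference, the sketch below computes these metrics from confusion counts using their standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), accuracy = (TP + TN) / total, and error rate = 1 − accuracy, with the ACA taken as the mean accuracy over the folds. These are the textbook forms and are assumed, not confirmed, to match Equations (8)-(11); the function names are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and error rate from confusion counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


def average_classification_accuracy(fold_counts):
    """Mean accuracy over folds; each item is a (tp, tn, fp, fn) tuple."""
    accuracies = [
        classification_metrics(*counts)["accuracy"] for counts in fold_counts
    ]
    return sum(accuracies) / len(accuracies)
```

As a usage example with made-up counts, `classification_metrics(45, 50, 5, 0)` gives a precision of 0.9, a recall of 1.0, and an accuracy of 0.95.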

Comparison of the novel feature vulnerability index with traditional features
The proposed feature VI is compared with the traditional AE features. The AE features used for comparison in this study are kurtosis, mean, RMS, variance, and standard deviation. It can be observed from Figure 6(a) that the proposed VI differentiates between the normal and leak operating conditions of the pipeline. As the pipeline condition switches from normal to leak, the proposed VI significantly increases or decreases. This is because a leak in the pipeline generates high-amplitude AEHs that change the distribution of the AE signal when the pipeline condition changes from normal to leak. The AEH features extracted from the AE signal can represent these changes; however, the sources of AEHs in the AE signal can be the fluid pressure on the joints, background interference noises, flange vibration, and leaks in the pipeline. Thus, the multiple sources of AEHs make the AEH features less sensitive to the leak size in the pipeline. Therefore, to retain the properties of both continuous and burst-type AEHs generated from a leak, a sliding window with an adaptive threshold is used for AEH feature extraction. Furthermore, to utilize the distribution change of the AEH features for leak detection and leak size identification, the MMU-test is applied to the AEH features. The MMU-test output statistics presented in Equation (4), obtained from the AEH features, result in the new feature called the VI.
The VI shows the pipeline's susceptibility to a leak and changes according to the pipeline working conditions, as can be observed from Figure 6(a-ii) and Figure 6(a-v). The VI can also differentiate between the leak sizes, as it fluctuates at different levels when the pipeline condition goes from normal to severe leak. The VI shows the same behaviour in Figure 7(a), when the fluid pressure inside the pipeline was increased from 7 to 13 bar. The VI significantly increased or decreased when the pipeline condition went from normal to incipient leak and from incipient to severe leak. Notably, the fluid pressure inside the pipeline does not affect the leak detection and leak size discrimination ability of the VI. Furthermore, the VI satisfies hypothesis $N_0$ presented in Section 2.2 for the normal condition of the pipeline and hypothesis $N_1$ for the pipeline operating under leak conditions; thus, based on these hypotheses, a computer can detect a leak in the fluid pipeline.
The traditional features such as mean, kurtosis, RMS, variance, and standard deviation are extracted from the AE signal under fluid pressures of 7 and 13 bar. It can be observed from Figures 6(b) and 7(b) that the traditional features do not show any significant difference between the normal and leak operating conditions of the pipeline. However, the RMS shows a slight discriminatory capacity towards the pipeline operating conditions. Figures 6(b-iii) and 7(b-iii) illustrate that as the pipeline condition switches from normal to leak, the RMS fluctuates slightly more than under normal conditions. The remainder of the statistical features cannot differentiate between the pipeline working conditions. This performance was expected because statistical features extracted from AE signals in the TD are known to be insensitive to incipient faults. Furthermore, AE signals are strongly affected by interference noises and attenuation, which degrade the signal in the TD, and the traditional features extracted from the AE signal are very sensitive to noise.
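For reference, the five traditional TD features named above can be computed as in the following minimal Python sketch. The use of population (1/n) moments and of non-excess kurtosis, as well as the function name `td_features`, are assumptions for illustration; the paper does not state which variants were used.

```python
import math
from typing import Dict, Sequence


def td_features(signal: Sequence[float]) -> Dict[str, float]:
    """Mean, variance, standard deviation, RMS and kurtosis of one signal window."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    variance = sum(c * c for c in centered) / n           # population variance (assumed)
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    fourth_moment = sum(c ** 4 for c in centered) / n
    # Non-excess kurtosis (a Gaussian signal gives about 3); undefined for a constant window.
    kurtosis = fourth_moment / variance ** 2 if variance > 0 else float("nan")
    return {"mean": mean, "variance": variance, "std": std, "rms": rms, "kurtosis": kurtosis}


# Hypothetical window of AE samples.
features = td_features([1.0, -1.0, 1.0, -1.0])
```

All five quantities are moments of the raw amplitudes, so additive background noise changes them directly, which is consistent with the noise sensitivity described above.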

Validation of the leak size identification capability of the proposed method
The leak size identification ability of the proposed method is evaluated in this section. The proposed method is compared with one AE TD features-based leak size identification method, called the TD-CNN, and with one AEH features-based leak size identification method (Banjara et al., 2020), which is referred to as Nawal et al. in the text. For a balanced comparison, the proposed and the reference techniques were each applied to the dataset presented in Table 3. Leak size identification capability was considered under different fluid pressures inside the pipeline. Each method was applied to dataset-1, dataset-3, and dataset-5, and the results were obtained for the leak sizes of 0.3, 0.5, and 1 mm under 7 bar pressure, referred to as test-1 in the text. The methods were also applied to dataset-2, dataset-4, and dataset-6, and the results for leak size identification under 13 bar fluid pressure inside the pipeline were obtained, referred to as test-2 in the text. Table 4 and Figure 8(a) show the leak size identification performance of the proposed and reference methods under 7 bar pressure (test-1). The results obtained from the proposed and reference methods for leak size identification under 13 bar pressure (test-2) are presented in Table 5 and Figure 8(b).
Applying the proposed method to the test-1 dataset resulted in an ACA of 97.74% with a $P_r$ of 97.7, $R_r$ of 97.7, and $E_r$ of 4.3%, as seen from Table 4 and Figure 8(a). For test-2, the proposed method resulted in an ACA of 100% with a $P_r$ of 100, $R_r$ of 100, and $E_r$ of 0%, as seen from Table 5 and Figure 8(b). The proposed method achieved a higher class-TPR than the reference methods irrespective of the fluid pressure inside the pipeline. These results were expected because the proposed method provides the new VI feature to the 1D-CNN for leak detection and leak size identification. As seen in Figure 6(a) and Figure 7(a), the VI is discriminant and shows high sensitivity towards the leak size irrespective of the fluid pressure inside the pipeline. The classification accuracy of a classifier depends directly on the discriminant capacity of its input features. Therefore, when the VI is provided to the 1D-CNN, the 1D-CNN detects the leak and identifies the leak size more accurately than the reference methods.
The reference method of Nawal et al. extracted AEH features such as counts, energy, peak amplitude, rise time, and duration from the AE signals obtained from the pipeline. After evaluating these features for pipeline leak identification using SVM and relevance vector machine (RVM), the study suggested that certain AEH features such as count, energy, and signal strength are highly capable of identifying leaks in the pipeline. For a fair comparison, instead of SVM and RVM, we used the 1D-CNN for pipeline leak state identification. Applying the method of Nawal et al. to our datasets gave an ACA of 90.70% with a leak size identification error $E_r$ of 14.6% for test-1, and an ACA of 94.09% with an $E_r$ of 10.9% for test-2. These ACAs are lower than those of the proposed method, as seen in Tables 4 and 5 and Figure 8(a,b). The low ACA of the reference method for leak detection and leak size identification may be due to the varied sources of AEHs in the AE signal, such as the fluid pressure on the joints, background interference noises, flange vibrations, and leaks in the pipeline. The multiple sources of AEHs make the AEH features less sensitive to the leak size. In contrast to the reference method, the proposed method preprocesses the AEH features and, by utilizing the change in their distribution, obtains the new feature VI, which is more discriminant and more sensitive to leak size than the raw AEH features.
The proposed method is also compared with the TD-CNN. In the TD-CNN, AE TD features such as RMS, kurtosis, mean, variance, and standard deviation are first extracted from the AE signals. For leak detection and leak size identification, these features are provided to the 1D-CNN. Extracting the AE TD features from test-1 and providing them as input to the 1D-CNN gave an ACA of 64.43% with an $E_r$ of 36.2% for the identification of leak sizes and pipeline operating conditions at 7 bar fluid pressure. As in test-1, the TD-CNN underperformed for test-2, resulting in a low ACA of 84.79% with a high $E_r$ of 21.1%. From the results presented in Tables 4 and 5 and Figure 8(a,b), the TD-CNN is not an appropriate choice for pipeline leak detection and leak size identification. The underperformance of the TD-CNN can be explained as follows: AE signals are collected under continuous background noise, so the recorded signal contains noise along with the AEHs obtained from AE events. The TD features are very sensitive to this background noise and can be strongly affected by it, as seen in Figures 6(b) and 7(b). Under leak conditions, these features fluctuate at almost the same level as under normal conditions.

Surveillance zone of the proposed method
According to ISO 18211:2016, before data are acquired from the AE sensor, the surveillance zone is determined from the attenuation characteristics of the AE signal for AE source-induced noise. In AE, the term attenuation refers to the loss of signal strength, measured in decibels (dB). The attenuation characteristic of an AE sensor can be calculated using Equation (12).
In Equation (12), the measured potential is represented by $V$ and the reference potential is represented by $V^*$. The measured AE potential is the AE signal obtained from the AE sensor. In acoustic emission, the 0 dB reference is an AE signal potential of 1 µV at the AE sensor without any amplification. In this study, the Hsu-Nielsen test is used as the AE source to calculate the attenuation characteristics of the AE sensor. The Hsu-Nielsen test is a pencil-lead break test in which a 0.5 mm diameter lead is pressed against the pipeline surface until it breaks, which generates an AE event. The AE hits generated from the Hsu-Nielsen test are similar to those from natural AE sources, such as leak-related AE hits. The attenuation characteristics of a fluid-filled industrial pipeline with an outer diameter of 114.3 mm are illustrated in Figure 9. For the R15I-AST AE sensor installed on this pipeline, the attenuation over the distance between two sensors should be less than 25 dB. This level of attenuation occurs at 10.9 m, as can be seen in Figure 9. Thus, an R15I-AST AE sensor installed on the industrial pipeline with an outer diameter of 114.3 mm can provide surveillance over 10.9 m. When the attenuation of the acoustic signal exceeds 25 dB, the AE hits start to fall below 10% of the peak value of the AE signal, and it becomes difficult to distinguish the leak-induced noise from the background noise. Because the proposed method uses an adaptive threshold of 10% of the peak value to separate the AE events from the background noise, it can detect the leak and identify the leak size within a range of 10.9 m (10,900 mm) using a single R15I-AST AE sensor.
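The decibel conversion described above can be sketched in Python as follows. The sketch assumes the conventional AE amplitude form, 20·log10(V/V*), with V* = 1 µV, which matches the definitions of the measured and reference potentials given in the text; the function names and the example voltages are hypothetical, and only the 25 dB limit is taken from the text.

```python
import math

REFERENCE_UV = 1.0  # 0 dB reference: 1 microvolt at the sensor, before amplification


def amplitude_db(v_uv: float, v_ref_uv: float = REFERENCE_UV) -> float:
    """AE amplitude in dB relative to the reference potential (assumed 20*log10 form)."""
    return 20.0 * math.log10(v_uv / v_ref_uv)


def attenuation_db(v_near_uv: float, v_far_uv: float) -> float:
    """Loss of signal strength between a reading near the source and one further away."""
    return amplitude_db(v_near_uv) - amplitude_db(v_far_uv)


def within_surveillance_zone(loss_db: float, limit_db: float = 25.0) -> bool:
    """True while the attenuation stays below the limit quoted in the text (25 dB)."""
    return loss_db < limit_db


# Hypothetical pencil-lead break amplitudes measured at two distances from the source.
loss = attenuation_db(v_near_uv=1000.0, v_far_uv=100.0)  # a tenfold drop, about 20 dB
usable = within_surveillance_zone(loss)
```

Repeating the measurement at increasing distances gives an attenuation curve of the kind shown in Figure 9, from which the distance at which the loss reaches the limit can be read off.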
Overall, the success of the proposed technique lies in its core idea, that is, to utilize the distribution changes in the AEH features at multiple scales as they vary with the operating conditions of the pipeline. The implementation of the proposed technique is very simple, and its computational complexity and memory consumption are very low. Furthermore, the proposed method can help the industry to avoid the consequences that arise from pipeline leaks, such as economic losses, impacts on public safety, pollution, and waste of resources. All these features make the proposed method attractive for use by the industry.

Conclusions
This paper proposed a new technique for leak detection and leak size identification in fluid pipelines. Acoustic emission hit features are extracted from the pipeline acoustic emission signals. To retain the properties of both continuous and burst-type acoustic emissions, a sliding window with an adaptive threshold is used for acoustic emission hit feature extraction. The distribution of the acoustic emission hit features changes along with the pipeline operating conditions. This distribution change is quantified using a newly developed multiscale Mann-Whitney test, and the test output statistic yields a new feature called the vulnerability index. The Mann-Whitney test is insensitive to the outliers generated by acoustic emission sources other than the leak. Therefore, the vulnerability index resulting from the multiscale Mann-Whitney test showed sensitivity towards different leak sizes. To identify the operating conditions of the pipeline, the vulnerability index is provided as an input to a 1D-CNN. The proposed method is verified using acoustic emission signals obtained from an industrial fluid pipeline. The results show that the proposed method detects the leak and identifies its size with higher accuracy than the reference methods, irrespective of the fluid pressure inside the pipeline. The proposed method can detect the leak and identify the leak size; however, it cannot provide any information about the leak location. In the future, the proposed method will be improved to localize leaks in fluid pipelines. Furthermore, leak detection in underground pipelines and in pipelines operating in marshy conditions can also be considered for future research.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 2. Proposed framework for pipeline leak detection and leak size identification.

Figure 3. (a) Picture of the industrial fluid pipeline test setup. (b) Schematic of the industrial fluid pipeline test setup. (c) AE data acquisition system.

Figure 4. The CP and pipeline operation during the data acquisition process.

Figure 5. AE signals obtained from pipeline under different operating conditions (a) 7 bar fluid pressure (b) 13 bar fluid pressure.

Figure 6. Comparison of the proposed feature with traditional AE features under 7 bar fluid pressure (a) proposed VI (b) traditional AE features.
Figure 6(a) shows the VI obtained from the pipeline under 7 bar fluid pressure. The blue shaded region in Figure 6(a) represents the VI in normal conditions and the yellow shaded region shows the VI under leak conditions. In Figure 6(a), the blue plot shows the severe leak condition of 1 mm, the red plot shows the leak size of 0.5 mm, and the black plot shows the incipient leak condition of 0.3 mm.

Figure 7. Comparison of the proposed feature with traditional AE features under 13 bar fluid pressure (a) proposed VI (b) traditional AE features.
Figures 6(b) and 7(b) show the traditional features extracted from the AE signal under the respective fluid pressures.

Figure 8. Performance comparison of the proposed method with the reference methods for 7-bar and 13-bar fluid pressure inside the pipeline (a) 7 bar (b) 13 bar.

Figure 9. AE signal attenuation in 114.3 mm diameter steel pipe.

Table 1. Architecture of 1D-CNN used for pipeline leak detection and leak size identification.

Table 2. Parameters used for data acquisition.

Table 3. Details of the dataset.

Table 4. Performance comparison of the proposed method with the reference methods under 7-bar fluid pressure.

Table 5. Performance comparison of the proposed method with the reference methods under 13-bar fluid pressure.