Hybrid Feature Extraction of Pipeline Microstates Based on Φ-OTDR Sensing System

This paper proposes a general integration method that can effectively describe the characteristics of pipeline leakage and help distinguish multiple pipeline microstates. With the rapid development of Φ-OTDR in recent years, this technology has been applied to more and more fields, such as fiber optic safety monitoring, seismic monitoring, and structural health monitoring. In pipeline monitoring, Φ-OTDR offers continuous full-scale monitoring, but there has been little research on pipeline state characteristics so far. In this paper, based on the analysis of the pipeline state with Φ-OTDR technology, a method of extracting multiple microstates of pipelines is proposed. This method combines the peak-to-average power ratio, short-term interval zero crossing, and fractal characteristics in the frequency domain; it can effectively characterize the microstates of pipes and supports the identification of more microstates of pipelines. These features reflect the common characteristics of leaks in gas pipelines and liquid pipelines. Meanwhile, their combination can represent the small differences in pipeline states. The experimental results show that the method can effectively characterize the microstate information of the pipeline, and the recognition rate of the hybrid features reaches 91% for two kinds of pipeline leakage and 83% under multipressure conditions.


Introduction
Pipelines are important infrastructure in modern society. Because of environmental corrosion, aging, and man-made damage, pipelines are prone to leaks. Pipeline leakage is an important security issue that directly affects daily life and production. Because pipeline leakage causes huge losses, monitoring of pipelines is imperative [1][2][3][4]. In pipeline monitoring, point sensors are easily disturbed by humidity, low temperature, electromagnetic radiation, and other factors, which makes it difficult to identify the pipeline states accurately. Φ-OTDR is a distributed optical fiber sensing technology that collects the disturbance signals surrounding the optical fiber. With its high sensitivity and fast response, Φ-OTDR can overcome the interference of humidity, low temperature, and electromagnetic radiation and is likely to become the trend in pipeline monitoring [5][6]. Because of its high accuracy and high spatial resolution, Φ-OTDR is widely used in fiber optic safety monitoring, perimeter security, submarine cable safety, and structural health monitoring [7].
At present, the wavelet has been used to analyze pipeline leakage states through single threshold comparison [8]. Qu et al. proposed an "energy-pattern" method based on the wavelet and support vector machine (SVM), which can recognize whether any abnormal event is taking place [1]. Wang used the fractal box dimension and improved approximate entropy to distinguish leakage and interference signals. However, only the anomaly in the pipe was studied, and the recognition results are greatly influenced by noise [9]. Meanwhile, optical fiber perimeter security has developed quickly. Huang et al. proposed a high-resolution scheme combining empirical mode decomposition (EMD) with kurtosis, whose average accuracy reached 89% [6]. After that, a high-resolution scheme combining empirical mode decomposition with hybrid features was proposed. In such methods, EMD works iteratively and consumes excessive computation [10]. However, much research remains to be done on monitoring and representing the states of pipelines with Φ-OTDR. The pipeline leakage signal, a nonlinear and nonstationary signal, is affected by multiple factors. The signal from the Φ-OTDR sensing system is extremely sensitive to monitored events with significant changes in magnitude, so features must cope with strong noise interference and with the high hardware cost of signal processing; in addition, most Φ-OTDR sensing systems exhibit fading, so time-domain features struggle to represent the effective information. Moreover, the amount of monitoring data produced by the Φ-OTDR sensing system is huge, and long computation times are unsuitable for engineering applications. The studies above show that a single feature can hardly represent enough effective information about pipeline leakage to support multistate or even microstate recognition [11].
This paper proposes a method for extracting pipeline microstates based on the Φ-OTDR sensing system.
The method combines the peak-to-average power ratio (PAR), short-term interval zero-crossing rate (STI-ZCR), and fractal characteristics in the frequency domain (FF) [12,13]. The peak-to-average power ratio reflects the overall linear information of the signal over a period; it responds to signal distortion effectively and incurs no hardware processing cost. Zero crossing reflects the sensitivity of the signal energy to the amplitude and describes the frequency-domain information of signals from the time domain. Moreover, we extend the range of frequency-domain information by improving this feature, while retaining the convenience of calculation. Because of the fading in Φ-OTDR, a fractal feature computed directly in the time domain cannot sufficiently characterize the target event. However, the leakage signals based on Φ-OTDR have self-similarity at different scales [14,15]. Therefore, the fractal characteristics in the frequency domain are chosen to reflect the subtle characteristics of the signal frequency. The hybrid features consist of the peak-to-average power ratio, short-term interval zero crossing, and fractal characteristics in the frequency domain, which not only reflect the leakage states of pipelines but also represent more detailed information for multiple microstates of pipelines. Although these features do not directly reveal the hidden pipeline state information, machine learning methods, such as support vector machine (SVM), C5.0, and the random-forest algorithm, can be used to recognize the microstates of pipelines [16,17]. The experimental results show that this method based on hybrid features effectively improves pipeline leakage event recognition accuracy with the random-forest classifier, whose average accuracy is above 91% on two kinds of pipeline leakage and 83% on four microstates.
This paper is organized as follows: Section 1 introduces the feature extraction research on pipelines based on Φ-OTDR. Section 2 introduces the related work to extract features of the pipeline. Section 3 describes the method of hybrid feature extraction and the classifier algorithm. Section 4 presents the method application and details of experiments. Finally, Section 5 summarizes the main results and concludes the whole article.

Preliminary Work on Signal Collection
Figure 1 shows the schematic diagram of the pipeline leak experiment. The Φ-OTDR sensing system mainly consists of a laser, an acousto-optic modulator, an erbium-doped optical fiber amplifier, a photodetector, an analog-to-digital converter, a data acquisition card, and a personal computer. We use two identical pipes and make a leak hole in one of them, shown as the upper pipeline in Figure 1. Through the Φ-OTDR sensing system, the leakage data and the noise are collected with different types of inputs. We also control the pipe output with the control valves and pressure gauge and collect the pipeline leakage data in different states through the Φ-OTDR sensing system. The time-domain morphologies of the leakage signals are shown in Figure 2. The pipeline leakage data are obtained through the Φ-OTDR system. The figure shows a pair of signals collected under the same conditions: gas leakage signals and gas noise signals.
From the above analysis, we can make the following three observations:
(a) The pipeline leakage signal is a continuous and nonstationary signal. There is no obvious difference between the leakage signal and the noise signal, so feature extraction is needed to obtain the effective information hidden in the signal.
(b) The pipeline data acquisition environment is complex, and a single feature can hardly characterize pipeline leakage information fully or distinguish the tiny states of pipelines.
(c) Because of the correlation fading effect of Φ-OTDR, time-domain morphological features cannot effectively represent pipeline information for recognition. This requires that feature extraction consider both computational efficiency and conventional time-domain features.
Through extensive study of pipeline detection based on Φ-OTDR and of feature engineering, we find that the hybrid features of the peak-to-average power ratio, short-term interval zero crossing, and fractal characteristics in the frequency domain can effectively reflect the accurate information of pipeline leakage and help identify the leakage, types, and multiple microstates of pipelines.

The Proposed Method
Because the pipeline leakage data based on Φ-OTDR are nonstationary and subject to the fading effect, the conventional time-domain characteristics of leakage data cannot be used. To solve this problem and support pattern recognition for multiple microstates of pipelines, this paper takes into account the energy characteristics, linear characteristics, and local self-similarity of the signal and chooses the peak-to-average power ratio, short-term interval zero crossing, and frequency fractal to reflect the common features and details of the pipeline leakage [18,19].
The overall block diagram of the method is shown in Figure 3.
This paper only discusses the identification of point data; the analysis models of point data could be applied to the monitoring and identification of the whole fiber optic cable. Therefore, we select several stable points on the two experimental pipelines as the dataset and perform data preprocessing. To evaluate the hybrid features, we introduce the random-forest classifier.

Data Preprocessing
3.1.1. Outlier Processing. In the original signal, there is some single-point impulse noise generated by the equipment itself. To reduce the impact of single points on the feature calculation, we replace these outlier points with window mean values. The steps are as follows.
The first step is to find all the offset points by analyzing the global distribution of the data and calculating the maximum and minimum values:
$$\mathrm{max} = Q_3 + k\,(Q_3 - Q_1), \qquad \mathrm{min} = Q_1 - k\,(Q_3 - Q_1), \tag{1}$$
where $Q_3$ is the third quartile, $Q_1$ is the first quartile, and $k = 3$. The second step is to replace these points with the window mean values. The window length is usually set according to the actual requirement.
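The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `replace_outliers`, the default window length of 11 samples, and the choice to exclude other flagged points from the window mean are all assumptions made here for the example.

```python
import numpy as np

def replace_outliers(signal, k=3.0, window=11):
    """Replace points outside the quartile fences with a local window mean.

    k=3 follows the text; `window` is a hypothetical default.
    """
    x = np.asarray(signal, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (x < lower) | (x > upper)

    cleaned = x.copy()
    half = window // 2
    for i in np.flatnonzero(is_outlier):
        lo, hi = max(0, i - half), min(x.size, i + half + 1)
        # Assumption: flagged points are left out of the window mean.
        neighbours = x[lo:hi][~is_outlier[lo:hi]]
        cleaned[i] = neighbours.mean() if neighbours.size else np.median(x)
    return cleaned

# Usage: a sine wave with one injected single-point spike.
x = np.sin(np.linspace(0, 20 * np.pi, 1000))
x[500] = 50.0
cleaned = replace_outliers(x)
```

Only the flagged samples are modified, so the rest of the record is passed through unchanged.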

Centralization.
Centralization subtracts the mean from the raw data so that the processed data fluctuate around zero.
This operation facilitates the calculation of the improved peak-to-average power ratio and short-term interval zero-crossing rate:
$$\operatorname{sgn}'(x) = \operatorname{sgn}(x) - a, \tag{2}$$
where sgn(x) denotes the raw signal and sgn′(x) the centered signal.
In general, a takes the mean value of sgn(x).

Standardization and Normalization.
To reduce the influence of signal amplitude differences and focus on the essence of pipeline leakage, the acquired data should first be standardized and normalized. This step can eliminate errors in the optical time-domain reflectometry acquisition and ensure the subsequent computation speed. The formulas are as follows: (3)
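A common reading of "standardized and normalized" is z-scoring followed by min-max scaling. The sketch below assumes that reading; the exact formulas used by the authors are not recoverable from the text, and the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(signal):
    """Return (centered, standardized, normalized) versions of a 1-D signal.

    Assumed definitions (not confirmed by the source):
      centered     = x - mean(x)
      standardized = centered / std(x)               (zero mean, unit variance)
      normalized   = (x - min(x)) / (max(x) - min(x))  (range [0, 1])
    """
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    std = centered.std()
    standardized = centered / std if std > 0 else centered
    span = x.max() - x.min()
    normalized = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return centered, standardized, normalized

centered, standardized, normalized = preprocess([1.0, 2.0, 3.0, 4.0, 5.0])
```

The three outputs are returned separately because a [0, 1] range is no longer centered on zero, while zero-crossing features need a zero-centered input; which version feeds which feature is a choice left to the reader.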

Data Cutting.
The collected pipeline leakage signal is a time series of amplitudes. To facilitate feature extraction, the time series is cut into windows. Because the pipeline leakage state is continuous and changes gradually, we use a sliding step to extract the features so that no leakage information is omitted. The steps of cutting the window are as follows:
(a) Window length N and sliding step length S are set. We usually find the best effect at S = N/2. The window length is limited by the sampling frequency, and the window must contain more than 2 peaks and valleys.
(b) The peak-to-average power ratio and short-term interval zero-crossing rate are calculated window by window as the window slides. More details are given in the next section.
(c) The Fourier transform is used to obtain the frequency-domain amplitude sequence of the original signal, and the fractal characteristics are then computed with the sliding window.
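Step (a) can be illustrated with a short helper. This is a sketch under the stated settings (step defaulting to half the window length); the name `sliding_windows` is introduced here for illustration, and trailing samples that do not fill a whole window are simply dropped, which is an assumption.

```python
import numpy as np

def sliding_windows(signal, length, step=None):
    """Cut a 1-D series into overlapping windows of `length` samples.

    `step` defaults to length // 2, matching S = N/2 in the text.
    """
    x = np.asarray(signal)
    step = length // 2 if step is None else step
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    starts = range(0, x.size - length + 1, step)
    return np.array([x[s:s + length] for s in starts])

# Usage: 10 samples, window of 4, step of 2 -> windows start at 0, 2, 4, 6.
windows = sliding_windows(np.arange(10), length=4)
```

Each row of the result is one window, so per-window features can be computed with a plain loop or `np.apply_along_axis`.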

Feature Extraction.
The pipeline leakage signal, a nonlinear, nonstationary signal with the fading phenomenon, is affected by the coupling of multiple factors. The hybrid features consist of the peak-to-average power ratio from the field of signal processing, the short-term interval zero-crossing rate from the field of speech processing, and the frequency fractal from the field of image processing. The leakage signal and the nonleakage signal have an intrinsic relationship under the intensity characterization but are not directly related to the strength. Therefore, more attention should be paid to the relative transformation characteristics of the pipeline signals over a time period [12,20].
These three features are based on the time-frequency analysis of the pipeline leakage signal. The combination covers the common features and local detailed information of pipelines. Therefore, the hybrid features are a combination of common and individual characteristics that can accurately reflect the leakage [21].

Peak-to-Average Power Ratio.
PAR is a feature extraction method from the field of signal processing that reflects the overall linear characteristics of signals. A large PAR represents a larger relative peak and linear range of the signal [22]. In feature analysis, PAR incurs no hardware processing cost and also avoids the impact of fading. Under strong noise interference, the abnormal values of an extremely strong noise signal can be processed: the strongest part of the noise is removed, and the stable part of the signal peak value is selected. In this way, the contrast of the features is increased. PAR is regarded as a reciprocal estimate of the signal-to-noise ratio. According to the characteristics of the leakage signal, its definition is modified as follows:
Step 1. Get the absolute value |sgn(x)|.
Step 2. Take the mean value |x|_peak of the values above the 90% quantile of |sgn(x)|.
Step 3. Take the mean value x_rms of |sgn(x)|.
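The three steps can be sketched as below. The final ratio |x|_peak / x_rms is implied by the name of the feature but is not stated in the steps, so it is an assumption here, as are the function name `peak_to_average_ratio` and the expectation that the window has already been centered.

```python
import numpy as np

def peak_to_average_ratio(window, quantile=0.9):
    """Modified PAR of one (already centered) window.

    Step 1: magnitude of the signal.
    Step 2: mean of the magnitudes at or above the 90% quantile ("peak").
    Step 3: mean of all magnitudes ("average").
    Assumed final step: return peak / average.
    """
    magnitude = np.abs(np.asarray(window, dtype=float))
    threshold = np.quantile(magnitude, quantile)
    peak = magnitude[magnitude >= threshold].mean()
    average = magnitude.mean()
    return float(peak / average) if average > 0 else 0.0

# Usage: a square-like wave has PAR 1; one isolated spike raises it sharply.
flat = peak_to_average_ratio([1.0, -1.0, 1.0, -1.0])
spiky = peak_to_average_ratio([0.0] * 9 + [10.0])
```

Averaging the top decile rather than taking the single maximum is what makes the peak estimate less sensitive to one strong noise sample.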

Short-Term Interval Zero-Crossing Rate.
In the field of speech signal processing, the short-term average zero-crossing rate is commonly used for endpoint detection of unvoiced and voiced speech. The zero-crossing rate is the rate of sign changes of a signal [23]. For a sinusoidal signal, the average zero-crossing rate is twice the signal frequency divided by the sampling frequency, and the sampling frequency is fixed. Therefore, the zero-crossing rate can describe the frequency information of the signal from the time domain. When the pipeline leaks continuously, the leakage signal resembles an unvoiced signal, and the noise signal resembles a voiced signal. The pipeline leakage signal is a relatively stable continuous signal during a steady leak. In view of the characteristics of the pipeline leakage signal and of the short-time zero-crossing rate in speech processing, we improve the short-time zero-crossing rate: we retain its computational convenience and extend the range of frequency information it extracts from the time domain. The result is called the short-term interval zero-crossing rate. This paper calculates the rate of sign changes of the signal through the interval [−a, a].
The improved calculation steps are as follows:
Step 1. Center the original signal by formula (2) and obtain sgn′(x).
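Only the first step of the procedure is available in the text, so the following is one possible reading rather than the authors' definition: a crossing is counted each time the centered signal moves from above +a to below −a or back, and samples inside the band are ignored. The function name `sti_zcr` and the parameter name `band` (standing in for a) are hypothetical.

```python
import numpy as np

def sti_zcr(window, band=0.0):
    """Interval zero-crossing rate of one window (assumed definition).

    The window is centered first. Samples with |x| <= band are ignored;
    a crossing is a change between "above +band" and "below -band".
    With band = 0 this reduces to the ordinary zero-crossing rate.
    """
    x = np.asarray(window, dtype=float)
    x = x - x.mean()
    state = np.where(x > band, 1, np.where(x < -band, -1, 0))
    state = state[state != 0]          # drop samples inside the band
    if state.size < 2:
        return 0.0
    crossings = np.count_nonzero(np.diff(state))
    return crossings / (x.size - 1)

# Usage: a 100 Hz tone sampled at 10 kHz gives a rate close to 2*f/fs.
fs, f = 10_000, 100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * f * t + 0.3)
rate = sti_zcr(tone)
```

With a nonzero band, low-amplitude fluctuations around zero no longer register as crossings, which is one way to make the feature less sensitive to small noise.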

Frequency Fractal.
Fractality is a universal characteristic of complex things in nature. In fractal theory, the fractal dimension is generally used to measure the self-similarity of curves. The degree of irregularity of nonstationary random vibration signals can be described by the fractal dimension. There are many methods of calculating the fractal dimension, such as the Hausdorff dimension, capacity dimension, and information dimension. Leakage signals and nonleakage signals have an intrinsic relationship under the intensity characterization but are not directly related to the strength; that is, the pipeline leakage signal has a certain self-similarity at different intensities. We extract the frequency fractal, abbreviated as FF, to represent the local information in the frequency domain.
The information dimension can reflect the inhomogeneity of the distribution of the set to be tested, but the calculation is relatively complex.
This method overcomes the difficulty of quantitatively describing the feature in the time domain with the Φ-OTDR sensing system and extracts the leakage signal and noise signal information from a microscopic point of view. Based on the analysis and comparison, the fractal dimension is calculated based on the idea of "covering" [24,25]:
$$D = \lim_{x \to 0} \frac{\log N(x)}{\log (1/x)},$$
where x is the size of the covering closed interval and N(x) represents the number of such sets needed to cover the target set. Its definition is modified as follows:
Step 1. Get frequency-domain data through the Fourier transform of raw data.
Step 2. Acquire the frequency-domain amplitude sequence.
Step 3. Calculate the fractal by the sliding window.
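The three steps can be sketched with a box-counting estimate of the covering dimension. This is an illustrative sketch only: the grid sizes, the rescaling of each window to the unit square, and the function names `box_counting_dimension` and `frequency_fractal` are assumptions, and the authors' exact estimator may differ.

```python
import numpy as np

def box_counting_dimension(curve, grid_sizes=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of a sampled curve.

    The curve is rescaled to the unit square; for each grid of k x k
    boxes, the boxes touched by the curve are counted, and the slope of
    log(count) against log(k) is returned.
    """
    y = np.asarray(curve, dtype=float)
    n = y.size
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.zeros(n)
    counts = []
    for k in grid_sizes:
        edges = np.linspace(0, n - 1, k + 1).round().astype(int)
        boxes = 0
        for lo, hi in zip(edges[:-1], edges[1:]):
            segment = y[lo:hi + 1]      # share endpoints so the curve stays connected
            top = min(int(segment.max() * k), k - 1)
            bottom = min(int(segment.min() * k), k - 1)
            boxes += top - bottom + 1
        counts.append(boxes)
    slope, _ = np.polyfit(np.log(grid_sizes), np.log(counts), 1)
    return float(slope)

def frequency_fractal(signal, window=1000, step=500):
    """FF feature: box-counting dimension over sliding windows of the
    amplitude spectrum of the whole record."""
    amplitude = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    starts = range(0, amplitude.size - window + 1, step)
    return np.array([box_counting_dimension(amplitude[s:s + window])
                     for s in starts])

# Usage on synthetic data (not the paper's recordings).
rng = np.random.default_rng(0)
d_line = box_counting_dimension(np.linspace(0.0, 1.0, 1000))
d_noise = box_counting_dimension(rng.random(1000))
ff = frequency_fractal(rng.standard_normal(10_000))
```

A smooth curve gives an estimate near 1 and an irregular one gives a value closer to 2, which is the property the FF feature relies on to separate spectra of different roughness.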

Random-Forest Classifier.
Random forest (RF) is an ensemble machine learning method. It uses the bootstrap random resampling technique and random node splitting to build decision trees, whose parameters are independent and identically distributed vectors. For a given input, each decision tree classification model has one vote in selecting the optimal classification result. RF is able to analyze complex interactions and is robust to data with noise and missing values [21,26]. At the same time, it learns quickly.
The implementation process of the random-forest algorithm is as follows:
Step 1. N training sets are extracted from the original dataset with the bootstrap sampling method, and a classification and regression tree is built for each training set. The size of each training set is about 2/3 that of the original dataset.
Step 2. At each node of the tree, m features are randomly selected from all n features (m ≤ n). After computing the amount of information contained in each feature, one of the m features is selected to split the node.
Step 3. Each tree grows to its maximum extent without any pruning.
Step 4. The N decision trees constitute a random forest. When new data enter the random forest, the results of all decision trees are gathered, and the voting result determines the classification.
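As an illustration of the steps above, the sketch below trains a random forest on synthetic three-column feature rows standing in for [PAR, STI-ZCR, FF]. It assumes scikit-learn is available; the data, the class means, and the hyperparameters are invented for the example and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for [PAR, STI-ZCR, FF] rows; NOT the paper's data.
leak = rng.normal(loc=[2.0, 0.10, 1.4], scale=0.1, size=(200, 3))
noise = rng.normal(loc=[3.0, 0.05, 1.7], scale=0.1, size=(200, 3))
X = np.vstack([leak, noise])
y = np.array([1] * 200 + [0] * 200)   # 1 = leakage, 0 = noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees (N in Step 1)
    max_features="sqrt",   # m features tried at each split (Step 2)
    bootstrap=True,        # bootstrap resampling (Step 1)
    max_depth=None,        # trees grow fully, no pruning (Step 3)
    random_state=0,
)
forest.fit(X_train, y_train)

# Note: scikit-learn averages the trees' class probabilities rather than
# counting hard votes, a small departure from Step 4.
accuracy = forest.score(X_test, y_test)
```

On the 2/3 figure in Step 1: a bootstrap sample of the same size as the dataset contains each original sample with probability 1 − (1 − 1/n)^n, which tends to 1 − 1/e ≈ 0.632, so roughly two thirds of the distinct samples appear in each tree's training set.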

Experimental Environment and Parameters.
To verify our method of hybrid feature extraction, we designed an experimental environment. The experiment scenes are shown in Figure 4. In the upper left corner, there is a pressure gauge, and in the upper right corner, there is the Φ-OTDR acquisition system. The laser source is a 1550 nm distributed feedback laser. To ensure universality and stability of the detection, the fiber optic cable (400 m long, single mode) is laid parallel to the pipeline instead of being wound around the pipe, as shown in the lower picture. The white part in the picture is a 2-core single-mode optical cable, and the silver section is the pipelines. The diameter of the leak hole on the top pipeline is 4 mm, while the pipe diameter is 19 mm. The length, diameter, and other parameters of the pipelines are all the same. First, the pipe pressure was stabilized at 0.4 MPa; we chose gas or liquid as the input for the two pipelines and took turns collecting the leakage data from the upper pipeline and the noise from the lower pipeline through the Φ-OTDR sensing system. Then, after changing the type of input, the above steps were repeated. Finally, we changed the pressure to obtain more microstates of pipelines, such as 0.5-0.4 MPa, 0.4-0.3 MPa, 0.3-0.2 MPa, and 0.2-0.1 MPa. The sampling rate of the Φ-OTDR system was set to fs = 10 kHz, and the recording duration of each trial was set to 10 s.
After the data acquisition and preprocessing, we extract the hybrid features and build the random-forest model, which helps us recognize the leakage states and the type of the pipeline. The length of the cutting window is 1000, and the length of the sliding step is 500. Each class contains more than 700 groups of features in the following sections.
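As a quick check on these settings, and assuming each 10 s trial at one sensing point is cut independently with the stated window and step, the number of windows per trial works out as follows (L is the record length in samples, T the duration, N the window length, S the step):

$$L = f_s\,T = 10\,000 \times 10 = 100\,000, \qquad n = \left\lfloor \frac{L - N}{S} \right\rfloor + 1 = \left\lfloor \frac{100\,000 - 1000}{500} \right\rfloor + 1 = 199.$$

Under that assumption, more than 700 groups per class corresponds to at least four such trials per class, since 3 × 199 = 597 and 4 × 199 = 796.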

Leakage State Characterization.
In this section, we extract the hybrid features to recognize the leakage state of pipelines. With the gas leakage signal, we obtain the hybrid features, as shown in Figure 5.

Journal of Control Science and Engineering
The noise signal has a wider distribution range on PAR and FF. We choose the random-forest classifier to judge the validity of the hybrid features. For comparison, we refer to the SVM method based on the wavelet and the RBF method based on kurtosis [1,27]. The wavelet is a common signal processing method that has been applied to pipeline monitoring. The RBF method performs well in perimeter security.
To evaluate the models more accurately, we choose these two other models for comparison. The results are listed in Table 1 in terms of average accuracy, average recall rate, and F1.
From the table above, we can find the following:
(a) The kurtosis feature widely used in perimeter security is not suitable for pipeline state recognition.
(b) When only the leakage of the pipelines is identified, the wavelet feature recognition method has an accuracy close to 80%.
(c) The accuracy of pipeline state recognition based on the hybrid features reaches 98%. This indicates that the combined features reflect pipeline state information well.
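The three reported metrics can be computed from predicted and true labels as sketched below. The source does not say how the per-class values are averaged, so macro-averaging (an unweighted mean over classes) is an assumption here, and the function name `macro_scores` is hypothetical.

```python
import numpy as np

def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged recall and F1 (assumed averaging)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls, f1s = [], []
    for label in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_true == label) & (y_pred == label))
        fn = np.sum((y_true == label) & (y_pred != label))
        fp = np.sum((y_true != label) & (y_pred == label))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "recall": float(np.mean(recalls)),
        "f1": float(np.mean(f1s)),
    }

# Usage: six windows, one leakage window misclassified as noise.
scores = macro_scores([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0])
```

Macro-averaging weights every class equally, which matters when the microstate classes contain different numbers of windows.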

Multiple Microstates of Pipeline Representation.
The combination of hybrid features and the random-forest classifier can reliably identify whether there is a leak in the pipeline. However, in a real environment, the more information we obtain, the easier it is to arrange maintenance work. This is also the significance of the extraction method, which captures the energy characteristics, linear characteristics, and frequency-domain local characteristics of pipelines. Here, we further study two microstate recognition tasks based on this extraction method.

Type Characterization of Leakage Pipelines.
For the collected gas leakage data, liquid leakage data, and noise data, we identify the three states based on the hybrid features. Here, the noise data are a random mixture of gas noise and liquid noise. The feature distributions of these three microstates are shown in Figure 6.
For comparison, we consider the C5.0, SVM, and random-forest algorithms [26]. The recognition results for these three microstates based on hybrid features are shown in Table 2.
The following can be seen from the above results:
(a) The combination of PAR, STI-ZCR, and FF can effectively improve the recognition rate of pipeline microstates based on the Φ-OTDR sensing system.
(b) Compared with C5.0 and SVM, random forest achieves a higher accuracy.

Microstate Characterization of Leakage Pipelines.
To characterize more microstates of leaking pipelines, we control the pressure at 0.5-0.4 MPa, 0.4-0.3 MPa, 0.3-0.2 MPa, and 0.2-0.1 MPa while the gas pipelines are leaking. The distribution of hybrid features for these four microstates is shown in Figure 7. For comparison, we consider the C5.0, SVM, and random-forest algorithms. The recognition results for these four states based on hybrid features are shown in Table 3.
When finer states are recognized, the accuracies obtained with these features decrease noticeably. Nevertheless, the combination of PAR, STI-ZCR, and FF still gives the best recognition, and the random-forest algorithm achieves the best results among the classifiers.

Discussion and Conclusions
Through the experimental results, we find that the proposed hybrid features take into account the generality of pipeline leakage and also consider the local characteristics of the pipelines based on the Φ-OTDR sensing system. The features proposed in this paper avoid the interference of strong noise and the fading effect. Describing high-frequency information from the time domain (PAR and STI-ZCR) also helps improve operational efficiency. This extraction method includes time-frequency information of pipeline states and can effectively identify pipeline leakage, with accuracy exceeding 98%. In the case of multiple microstates, FF plays a very significant role in characterizing the microstates of pipelines based on the Φ-OTDR sensing data. The accuracy for pipeline type is more than 91%, and the accuracy for the four pressure states is above 83%. Therefore, the combination of hybrid features and the RF classifier can be applied to more kinds of microstate identification for pipeline monitoring based on the Φ-OTDR sensing system. The analysis of pipeline signals from these different aspects provides more possibilities for promoting pipeline monitoring based on Φ-OTDR in engineering practice.
It should be mentioned that the current work has its limitations. Although pipeline monitoring based on Φ-OTDR is developing rapidly, strong noise and the fading effect still suppress feature representation in the time domain. PAR is regarded as a certain estimate of the SNR, and the strong noise in practical environments requires that feature enhancement methods be studied [28,29]. STI-ZCR is an improved feature that expresses higher-frequency information from the time domain and avoids processing costs such as pre-emphasis.
This feature still needs more discussion in the online monitoring field. FF describes objects in terms of irregularity and self-similarity, which represents the microscopic leakage signal and extends the range of feature extraction.
The features discussed in this paper draw on the fields of signal processing, speech processing, and quantitative description methods such as graph theory, so they may not be the best hybrid features for reflecting all the useful information in pipeline signals. Further research drawing on other fields will be carried out in the coming years. At the same time, we hope to extend the research to more pipeline leakage scenarios, such as pipelines of different materials and different embedded environments, and provide support for monitoring multiple pipeline microstates.

Figure 6: Feature distribution of the three microstates. (a) Hybrid features of gas leakage. (b) Hybrid features of liquid leakage. (c) Hybrid features of mixed noise.

Table 1: Comprehensive comparison of hybrid features.

Table 2: Recognition of three microstates with different features.