Fault Diagnosis of Oil Pumping Machine Retarder Based on Sound Texture-Vibration Entropy Characteristics and Gray Wolf Optimization-Support Vector Machine

To diagnose retarder faults of oil pumping machines accurately in complex environments and to improve the generalization of the algorithm, a GWO-SVM fault diagnosis algorithm based on the combination of sound texture and vibration entropy characteristics is proposed. First, the acquired sound signal is purified by a band-pass filter, and the generalized S-transform is applied to obtain the time-frequency spectrum, from which the box dimension, directivity, and contrast ratio are extracted to construct three-dimensional texture eigenvectors. Second, the K parameter of variational mode decomposition (VMD) is selected by the energy method, the vibration signal is decomposed into modal components, and the permutation entropy of the modal components is calculated. Finally, joint eigenvectors are constructed and fed into the SVM for learning. The gray wolf optimization (GWO) algorithm is used to optimize the parameters of the SVM model based on a mixed kernel function, which reduces the impact of sensor frequency response, environmental noise, and load fluctuation disturbance on the accuracy of retarder fault diagnosis. The results show that the GWO-SVM fault diagnosis method, based on the combination of sound texture and vibration entropy characteristics, makes full use of the complementary frequency bands of the two signals. The overall diagnostic accuracy for the experimental samples reaches 100%, and the method shows good generalization ability.


Introduction
An oil pumping machine relies on the up-and-down movement of the horsehead to lift crude oil from the wellbore. The retarder connecting the crank train is the key component of the power transmission. Due to friction and impact, the oil pumping machine is prone to oil leakage, gear damage, bearing damage, belt wear, and other faults [1,2]. Vibration signal characteristics can be used to diagnose mechanical failure of the oil pumping machine retarder [3,4]. However, because of the charge-accumulating effect and coupling mode of the piezoelectric accelerometer, the continuous impacts caused by pitting and broken gear teeth of the retarder are likely to go undetected. The reason is that the vibration sensor covers only a narrow frequency response range and is insensitive to high-frequency signal changes. The sound signal accompanying the operation of the oil pumping machine shares its source with the vibration signal and can be obtained by a noncontact electret film capacitive sensor, which can effectively compensate for the detection failures caused by the band limitation of the vibration sensor.
For nonstationary signals such as sound and vibration, the main analysis methods include dynamic time warping (DTW), wavelet transform (WT), empirical mode decomposition (EMD), and local mean decomposition (LMD) [5][6][7][8]. The optimal path planned by DTW is prone to distortion, WT suffers from energy leakage, and neither method can decompose the signal adaptively. The adaptive decomposition process of EMD is prone to over-enveloping, end effects, and modal aliasing. LMD alleviates the underfrequency, over-enveloping, and other issues of EMD.
However, LMD and EMD are essentially recursive decomposition methods and cannot fundamentally eliminate the end effect, modal aliasing, and related issues. Variational mode decomposition (VMD) is a newer nonrecursive mode decomposition method that avoids the modal aliasing caused by envelope error [9,10]. The generalized S-transform is developed from the short-time Fourier transform (STFT) and the Wigner-Ville distribution (WVD); it makes up for the single resolution of the STFT and the phase defects of the WT, and it is free of cross-term interference. It introduces a frequency-dependent adjustment factor into the window function to adjust the time-frequency resolution [11].
For combined sound-vibration methods, Zhao et al. [12] process sound and vibration signals with an improved ensemble empirical mode decomposition (EEMD) to calculate the two-dimensional spectral entropy, which improves the diagnostic effect. In [13], wavelet packet and feature entropy theory are used to extract features from the collected sound and vibration signals, which improves the diagnostic accuracy. Zhang et al. [14] construct a three-dimensional (3D) graph of sound and vibration and extract the envelope of the 3D graph and shape-based hierarchical eigenvectors, which provides a new idea for circuit breaker diagnosis. In [15], LMD is used to decompose sound and vibration signals, and appropriate PF components are selected to obtain the feature entropy as eigenvectors, which improves the diagnostic accuracy. Although studies [12][13][14][15] have achieved good results, they do not consider the differences between sound and vibration signals: the same feature extraction method is applied to both signals, which leads to misdiagnosis and poor generalization.
Distinguishing defects of the oil pumping machine retarder by combining features of sound and vibration signals is therefore a worthwhile attempt. The generalized S-transform is very sensitive to high-frequency impact signals and has high time-frequency resolution [16], so it can accurately reflect the high-frequency impact characteristics and rich time-frequency information of sound signals.
The load fluctuation during the up and down strokes of the oil pumping machine makes the spectrum of the retarder vibration signal extremely complicated. The signal components include combinations of the gear shaft rotation frequency, the gear meshing frequency, and so on [17]. Applying the VMD method can effectively reduce the misdiagnosis and missed diagnosis caused by band aliasing. In this paper, the combined features of sound and vibration signals are extracted and passed to the GWO-SVM model for fault identification. The diagnosis process is shown in Figure 1. An AC144 piezoelectric acceleration sensor (0.6-10000 Hz) and an NVL-AF-audio embedded waterproof (explosion-proof) high-fidelity pickup (20-20000 Hz) were used to collect multiple sets of sound and vibration sample signals from the retarder (CJH1100 × 73) of a CYJ10-3-48HB oil pumping machine under the retarder oil leakage, gear pitting peeling, belt damage, and normal states, from which the sound texture and vibration entropy characteristics were extracted, respectively.

Sound Signal Feature Extraction
The motor transmits power to the retarder through the belt. The retarder reduces the high-speed rotation of the motor to the low-speed rotation of the crankshaft through three-axis, two-stage deceleration. The whole operation process is relatively complex, and the machine runs in the open air. As a result, the collected sound signals include the belt friction noise of the oil pumping machine, motor noise, wind noise, thunder, and human voices.
Through spectrum analysis, it is apparent that the above noise is mostly low-frequency interference concentrated between 0 and 10 kHz. Therefore, the components of the sound signal below 10 kHz and above 20 kHz are removed by a finite impulse response (FIR) band-pass filter. The sound signals in the normal operating state between 10 and 20 kHz, before and after denoising, are shown in Figure 2.
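The band-pass step can be sketched with a windowed-sinc FIR design. This is a minimal illustration under stated assumptions, not the filter used in the experiments: the tap count (101), the Hamming window, and the helper names `fir_bandpass` and `gain_at` are hypothetical; only the 10-20 kHz pass band and the 40 kHz sampling rate are taken from the paper. Because 20 kHz is the Nyquist frequency at a 40 kHz sampling rate, the design effectively behaves as a high-pass filter.

```python
import math


def fir_bandpass(num_taps, f_low, f_high, fs):
    """Windowed-sinc FIR band-pass taps (Hamming window, odd length)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric design")
    mid = (num_taps - 1) // 2
    fl, fh = f_low / fs, f_high / fs  # cutoffs in cycles per sample

    def lowpass_tap(fc, k):
        # Ideal low-pass impulse response: 2*fc*sinc(2*fc*k)
        if k == 0:
            return 2.0 * fc
        return math.sin(2.0 * math.pi * fc * k) / (math.pi * k)

    taps = []
    for n in range(num_taps):
        k = n - mid
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        # Band-pass = low-pass(f_high) minus low-pass(f_low)
        taps.append((lowpass_tap(fh, k) - lowpass_tap(fl, k)) * window)
    return taps


def gain_at(taps, freq, fs):
    """Magnitude of the filter frequency response at `freq` (Hz)."""
    re = sum(h * math.cos(2.0 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2.0 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    return math.hypot(re, im)


FS = 40_000.0  # sampling rate reported for the acquisition card
taps = fir_bandpass(101, 10_000.0, 20_000.0, FS)
passband_gain = gain_at(taps, 15_000.0, FS)  # inside the 10-20 kHz band
stopband_gain = gain_at(taps, 2_000.0, FS)   # low-frequency interference
```

Filtering a recorded signal then amounts to convolving it with `taps`; a longer filter would give a sharper transition at 10 kHz at the cost of more delay.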

Sound Signal Generalized S-Transform.
The windowed Fourier transform of the signal x(t) is performed as in formula (1), using a Gaussian window function that is scaled by σ and translated by τ as in formula (2). The time-frequency spectrum of x(t) is obtained by introducing formula (2) into formula (1). Setting σ(f) = 1/|f| gives the S-transform, where f is the frequency, t is the time, τ is the positional parameter controlling the Gaussian window on the time axis t, and ω(f, τ − t) is the Gaussian window function, whose height and width vary with the frequency f. At a given frequency, the Gaussian window function is fixed regardless of the signal. To improve the time-frequency energy concentration, the parameters α and β are introduced into the window, where α and β are adjustment factors and are generally positive. The Gaussian window changes as β increases or α decreases, which adjusts the time-frequency resolution.
The box dimension, directivity, and contrast ratio, which reflect the time-frequency texture features, are extracted as eigenvectors to classify faults.
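For reference, the windowed Fourier transform, the Gaussian window, and the S-transform described here are commonly written as below. These are the standard textbook forms and are assumed, not confirmed, to match the paper's own formulas; the generalized window with α and β is omitted because several parameterizations are in use.

```latex
\begin{align}
\mathrm{STFT}(\tau, f) &= \int_{-\infty}^{\infty} x(t)\, \omega(\tau - t)\, e^{-j 2\pi f t}\, dt, \\
\omega(\tau - t) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\tau - t)^{2}}{2\sigma^{2}}\right), \\
S(\tau, f) &= \int_{-\infty}^{\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}
  \exp\!\left(-\frac{(\tau - t)^{2} f^{2}}{2}\right) e^{-j 2\pi f t}\, dt .
\end{align}
```

The third line follows from substituting the window into the first line with σ(f) = 1/|f|, so the window narrows as the frequency increases.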

Box Dimension.
The box dimension is particularly sensitive to the texture roughness of the image spatial distribution and can quantitatively characterize the distribution law of the time-frequency diagram. The calculation process is as follows. Suppose the minimum and maximum values of the image grayscale in the (i, j) grid fall in the k-th and l-th boxes, respectively; the number of boxes required for the (i, j) grid is then $n_r(i, j) = l - k + 1$. The number of boxes required to cover the entire image is $N_r = \sum_{i,j} n_r(i, j)$. The box dimension is $D = \lim_{r \to 0} \log(N_r) / \log(1/r)$. In practice, the slope of log(N_r) against log(1/r) is fitted by the least-squares method, and its absolute value is taken as the box dimension.
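The box-counting procedure can be sketched as follows. This is a minimal illustration that assumes the usual differential box-counting rule (l − k + 1 boxes per grid cell, box height scaled with the grid size); the function name `box_dimension`, the grid sizes, and the two synthetic test images are hypothetical and are not taken from the paper.

```python
import math


def box_dimension(image, gray_levels=256, grid_sizes=(2, 4, 8)):
    """Differential box-counting estimate for a square grayscale image."""
    size = len(image)
    log_inv_r, log_n_r = [], []
    for s in grid_sizes:
        if size % s != 0:
            continue  # use only grid sizes that tile the image exactly
        box_height = s * gray_levels / size
        total = 0
        for i in range(0, size, s):
            for j in range(0, size, s):
                cell = [image[y][x] for y in range(i, i + s) for x in range(j, j + s)]
                k = int(min(cell) // box_height)  # box holding the minimum
                l = int(max(cell) // box_height)  # box holding the maximum
                total += l - k + 1
        log_inv_r.append(math.log(size / s))  # log(1/r) with r = s/size
        log_n_r.append(math.log(total))       # log(N_r)
    if len(log_inv_r) < 2:
        raise ValueError("need at least two usable grid sizes")
    # Least-squares slope of log(N_r) against log(1/r)
    n = len(log_inv_r)
    mx, my = sum(log_inv_r) / n, sum(log_n_r) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_inv_r, log_n_r))
    sxx = sum((x - mx) ** 2 for x in log_inv_r)
    return abs(sxy / sxx)


# A flat image behaves like a smooth surface; a 0/255 checkerboard is maximally rough.
flat = [[100] * 16 for _ in range(16)]
checker = [[255 if (x + y) % 2 else 0 for x in range(16)] for y in range(16)]
d_flat = box_dimension(flat)
d_checker = box_dimension(checker)
```

The flat image yields a dimension of 2 and the checkerboard a dimension of 3, which brackets the range expected for real time-frequency images.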

Directivity.
Directivity describes the global characteristics of texture images and characterizes the trend of the basic texture units and their arrangement tendency in all directions. The gradient vector ΔV of each pixel is calculated; |ΔV| is the modulus of the gradient vector, ΔP and ΔQ are the changes of ΔV in the horizontal and vertical directions, respectively, and θ is the angle of the gradient vector.
A histogram of the gradient directions is then built, and the directivity is calculated from the peaks of this gradient statistical histogram. In the directivity formula, w_p is the amplitude of the p-th peak, n_p is the number of histogram peaks, the subscript D of H_D denotes the D-th peak of the histogram, and r is the normalization factor.

Contrast Ratio.
The contrast ratio reflects the difference of the image in gray level. The larger the difference in gray value, the stronger the contrast ratio. When there is a significant peak at the gray value of 0 or 255, the degree of deviation is measured by the kurtosis $k_4 = \mu_4 / \sigma_h^4$, where μ_4 is the fourth moment about the gray mean of the whole image and σ_h is the standard deviation of the image.
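A kurtosis-corrected contrast measure can be sketched as below. The exact contrast formula is not recoverable from the text, so this assumes the Tamura-style form F_c = σ_h / k_4^(1/4); the exponent 1/4 and the function name `tamura_contrast` are assumptions made for illustration.

```python
import math


def tamura_contrast(image, exponent=0.25):
    """Contrast as sigma / k4**exponent, with k4 = mu4 / sigma**4 (assumed form)."""
    pixels = [float(v) for row in image for v in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    if variance == 0.0:
        return 0.0  # a uniform image has no contrast
    mu4 = sum((p - mean) ** 4 for p in pixels) / n  # fourth central moment
    kurtosis = mu4 / variance ** 2                  # k4 = mu4 / sigma^4
    return math.sqrt(variance) / kurtosis ** exponent


# Two-level image with equal shares of 0 and 255: sigma = 127.5 and k4 = 1.
two_level = [[0, 255], [255, 0]]
uniform = [[80, 80], [80, 80]]
c_two_level = tamura_contrast(two_level)
c_uniform = tamura_contrast(uniform)
```

Dividing by the kurtosis term lowers the contrast of images whose histogram is dominated by a sharp peak, which is the situation described for gray values of 0 or 255.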
The box dimension, directivity, and contrast ratio of the generalized S-transform time-frequency diagram of the sound signal are calculated for the four states (belt breakage, retarder oil leakage, gear pitting peeling, and normal state) and recorded as D, F_dir, and F_c, respectively. The sound texture features are shown in Table 1.
The texture features of the sound signal described above are noise-resistant and rotation-invariant and can describe the local pattern and arrangement rule of the image. Longitudinal and horizontal comparison of the sound texture characteristics makes it possible to identify the operating state of the retarder.

Vibration Signal Feature Extraction
The energy spectrum of the vibration signal is concentrated within 10 kHz. The vibration signals of the oil pumping machine in the normal state and the belt damage state were collected, and their time domain waveforms are compared in Figure 4.

VMD Decomposition
The VMD decomposition is mainly divided into two parts: the establishment and the solution of the constrained variational problem. The constrained variational problem is established for the vibration signal of data length N collected in the oil production process, where {μ_p} := {μ_1, . . . , μ_p} denotes the p decomposed modes and {ω_p} := {ω_1, . . . , ω_p} denotes the center frequencies of the p modes.
To find the optimal solution, the quadratic penalty factor α and the Lagrange multiplier λ(t) are introduced to convert the constrained variational problem into an unconstrained one, which yields the augmented Lagrangian of equation (13). The saddle point of equation (13) is found by the alternating direction method of multipliers (ADMM), in which $\mu_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ are updated iteratively to obtain the modal components μ_k and the center frequencies ω_k. The VMD steps are as follows:
(1) Initialize $\mu_k^1$, $\omega_k^1$, $\lambda^1$, and n to 0, and set the number of decomposition modes K to 2 (predecomposition optimization).
(4) If the convergence criterion is satisfied, stop the iteration and output the result; otherwise, return to step (2).
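For reference, the VMD problem, its augmented Lagrangian, the ADMM updates, and the stopping rule are usually written as below. These are the standard forms from the VMD literature, assumed here rather than copied from the paper; the input signal symbol f(t), the dual step size τ, and the tolerance ε are notational assumptions.

```latex
\begin{align}
&\min_{\{\mu_k\},\{\omega_k\}} \sum_{k=1}^{K}
  \left\| \partial_t \!\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2
  \quad \text{s.t.} \quad \sum_{k=1}^{K} \mu_k(t) = f(t), \\
&L(\{\mu_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K}
  \left\| \partial_t \!\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} \mu_k(t) \right\|_2^2
  + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} \mu_k(t) \right\rangle, \\
&\hat{\mu}_k^{\,n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{\mu}_i(\omega) + \hat{\lambda}(\omega)/2}
  {1 + 2\alpha\,(\omega - \omega_k)^2}, \qquad
\omega_k^{\,n+1} = \frac{\int_0^{\infty} \omega\, |\hat{\mu}_k^{\,n+1}(\omega)|^2 \, d\omega}
  {\int_0^{\infty} |\hat{\mu}_k^{\,n+1}(\omega)|^2 \, d\omega}, \\
&\hat{\lambda}^{\,n+1}(\omega) = \hat{\lambda}^{\,n}(\omega)
  + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{\mu}_k^{\,n+1}(\omega) \right), \qquad
\sum_{k=1}^{K} \frac{\left\| \hat{\mu}_k^{\,n+1} - \hat{\mu}_k^{\,n} \right\|_2^2}{\left\| \hat{\mu}_k^{\,n} \right\|_2^2} < \varepsilon .
\end{align}
```

Each mode update is a Wiener-type filter centered at ω_k, and each center frequency is the power-weighted mean frequency of its mode, which is why the modes settle into separate bands.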

VMD Parameter K Optimization.
To prevent the VMD from overdecomposing the signal, the K parameter is selected according to energy conservation before and after decomposition. For the original retarder vibration signal sequence x(i), the signal energy E is calculated over the n sampling points. To characterize the energy difference before and after the VMD decomposition, the energy difference parameter ψ is defined in terms of E_x, the energy of the x-th component; K, the number of components; and E, the energy of the original signal. Ideally, the total energy of the K components after VMD decomposition equals the energy of the original signal (i.e., the ideal value of ψ is 0). After many experiments and calculations, the trend of ψ with K is shown in Figure 5.
It can be seen from Figure 5 that when K is greater than 6, the energy difference parameter ψ increases, which indicates that overdecomposition occurs. The K value at the turning point is therefore the optimal number of decomposition modes for the VMD. The time-frequency diagrams obtained by decomposing the vibration signal in the belt damage state are shown in Figures 6 and 7.
It can be seen from the frequency domain spectrum in Figure 7 that the VMD decomposition of the vibration signal effectively suppresses modal aliasing and provides strong support for accurate fault diagnosis.
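The energy criterion can be illustrated with a short sketch. The exact expressions for E and ψ are not recoverable from the text, so this assumes E is the sum of squared samples and ψ = |E − Σ E_x| / E; the function names and the two sinusoids standing in for VMD modes are hypothetical.

```python
import math


def signal_energy(x):
    """Energy of a sequence, assumed here to be the sum of squared samples."""
    return sum(v * v for v in x)


def energy_difference(signal, modes):
    """Assumed form of psi: |E - sum of mode energies| / E."""
    total = signal_energy(signal)
    if total == 0.0:
        raise ValueError("signal has zero energy")
    return abs(total - sum(signal_energy(m) for m in modes)) / total


# Toy signal: two orthogonal sinusoids standing in for well-separated modes.
N = 64
mode_a = [math.sin(2.0 * math.pi * 1 * i / N) for i in range(N)]
mode_b = [0.5 * math.sin(2.0 * math.pi * 5 * i / N) for i in range(N)]
signal = [a + b for a, b in zip(mode_a, mode_b)]

psi_exact = energy_difference(signal, [mode_a, mode_b])         # close to 0
psi_over = energy_difference(signal, [mode_a, mode_b, mode_b])  # redundant mode, about 0.2
```

In this toy case a redundant component inflates the summed mode energy, so ψ moves away from 0, which mirrors the rise in ψ used to detect overdecomposition.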

Calculated Permutation Entropy.
Permutation entropy can detect sudden changes in the signal, has strong noise resistance and high time resolution, and is well suited to the nonstationary, chaotic vibration signals found in the complex environment of an oil production field.
For the signal sequence {X(i), i = 1, 2, . . . , n}, phase space reconstruction is performed to obtain the reconstruction matrix, whose rows are indexed by j = 1, 2, . . .
Shock and Vibration
The symbol sequence has m! possible arrangements, and the probabilities of occurrence of the s different symbol sequences are P_1, P_2, . . . , P_s, respectively. The permutation entropy Pe(m) is calculated from these probabilities and then normalized. After many experiments and comparative analysis, the embedding dimension m is set to 8 and the delay time τ to 5, and the entropy values of the six modes of the vibration signal are obtained as shown in Table 2.
The entropy value reflects the regularity of the sample sequence. From Table 2, it can be seen that the permutation entropy differs clearly between states, so the states are well separated and the state information of the retarder can be effectively characterized.
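The permutation entropy calculation can be sketched as follows. This is a minimal illustration of the standard Bandt-Pompe definition with normalization by ln(m!), which is assumed to correspond to the normalized Pe(m) used here; the function name and the short test sequences are hypothetical, and the small m in the examples is chosen only to keep them readable (the paper uses m = 8 and τ = 5).

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Permutation entropy of sequence x with embedding dimension m and delay tau."""
    if m < 2:
        raise ValueError("embedding dimension m must be at least 2")
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    patterns = Counter()
    for j in range(n_vectors):
        # j-th row of the reconstruction matrix: X(j), X(j+tau), ..., X(j+(m-1)tau)
        window = [x[j + i * tau] for i in range(m)]
        # Symbol sequence: the index order that sorts the row in ascending order
        patterns[tuple(sorted(range(m), key=window.__getitem__))] += 1
    probs = [count / n_vectors for count in patterns.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(math.factorial(m)) if normalize else entropy


pe_monotone = permutation_entropy(list(range(20)), m=3, tau=1)      # one pattern only
pe_example = permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=2, tau=1)  # 4 rising, 2 falling pairs
```

A perfectly regular sequence gives an entropy of 0, while a sequence in which all m! orderings are equally likely gives a normalized entropy of 1.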

GWO-SVM Diagnostic Model.
SVM is especially suitable for small-sample fault diagnosis, but the choice of the penalty factor C and the kernel function directly affects its classification performance. The samples in this paper form a multisensor data feature set, so the sensor frequency response, environmental noise, and load output fluctuation interference have a great influence on the diagnostic accuracy. In addition, GA, PSO, and similar algorithms easily fall into local optima. Therefore, a mixed kernel function is constructed, and the gray wolf optimization (GWO) algorithm is introduced to find the best penalty factor and mixing coefficient.
It is necessary to combine different kernel functions to obtain a kernel function with strong generalization ability, learning ability, and extrapolation ability. The Taylor-Kernel with moderate decreasing (T-KMOD) function satisfies the zero-point near-descent criterion and offers good flexibility, fast convergence, and good locality.
Polynomial functions have high classification accuracy and strong generalization ability. Therefore, the two are combined into a mixed kernel function, in which x, x′ ∈ R^n and L > 0. L controls the value of the kernel function at 0, σ and c control the width and convergence rate of the kernel function, respectively, and both i and n are positive integers.
GWO has a fast convergence speed and a simple structure, which makes it easier to reach the optimal classification. Its mathematical model is $D = |C \cdot X_p(t) - X(t)|$ and $X(t+1) = X_p(t) - A \cdot D$, where t represents the current number of iterations, X_p(t) is the prey position vector, X(t) is the position vector of a gray wolf, and A and C are coefficient vectors.
A and C are given by $A = 2a \cdot r_1 - a$ and $C = 2 r_2$, where a is the convergence factor, satisfying a ∈ [0, 2], and r_1 and r_2 are random vectors in [0, 1].
For each state, 60 sets of samples were collected, of which 40 sets were used for training and 20 sets for testing. The optimization proceeds as follows: initialize the population, calculate the fitness value of each wolf, select the three best fitness values to determine the gray wolf ranks, and update the gray wolf positions, head wolves, coefficient vectors, and other parameters until the SVM parameters are optimal. The algorithm flowchart is shown in Figure 8.
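The GWO loop described above can be sketched in a few lines. This is a minimal illustration of the standard algorithm, not the authors' implementation: the function name `gwo_minimize`, the population size, the linear decay of a from 2 to 0, and the toy quadratic objective are assumptions. In the paper's setting the objective would instead be an SVM validation error evaluated at a candidate (C, λ) pair.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` over box `bounds` with a basic gray wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(n_iter):
        # Rank the pack: the three fittest wolves lead the search
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_val = objective(alpha)
        if alpha_val < best_val:
            best_pos, best_val = alpha, alpha_val
        a = 2.0 * (1.0 - t / n_iter)  # convergence factor, decays from 2 toward 0
        new_wolves = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a        # A = 2a*r1 - a
                    C = 2.0 * rng.random()                # C = 2*r2
                    D = abs(C * leader[d] - wolf[d])      # D = |C*X_p(t) - X(t)|
                    estimate += leader[d] - A * D         # X_p(t) - A*D
                # Average the three leader-guided moves and clip to the bounds
                position.append(min(max(estimate / 3.0, lo), hi))
            new_wolves.append(position)
        wolves = new_wolves
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val


def toy_objective(p):
    """Stand-in for the SVM error surface: a quadratic bowl with minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_position, best_value = gwo_minimize(toy_objective, search_bounds, seed=1)
```

Large |A| early in the run pushes wolves away from the leaders (exploration), and the shrinking a later pulls them toward the leaders (exploitation), which is the mechanism credited with avoiding local optima.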

Experimental Result.
In the experiment, the MVP-6000 acquisition card from ADLINK was used, with its sampling rate set to 40 kHz. The sound sensor was placed about 50 cm away from the sound source, and the vibration sensor was attached to the surface of the vibrating body of the retarder. The acquisition card is equipped with an IEPE constant current source, a ±10 V voltage range, 24-bit resolution, and a 110 dB dynamic range. The sound texture features extracted by the generalized S-transform and the vibration entropy features calculated from the VMD decomposition are combined to construct a joint eigenvector matrix, which is sent to the SVM for training. GWO is used to optimize the penalty factor C and the mixing coefficient λ to improve the SVM classification performance. The number of iterations is set to 100, and the optimal values of C and λ are 2.5693 and 0.17, respectively. The convergence of the algorithm is shown in Figure 9.
The sound-vibration joint eigenvectors are constructed by combining Tables 1 and 2. The sample label is set to 1 for the normal state (groups 1-20), 2 for belt damage (groups 21-40), 3 for retarder oil leakage (groups 41-60), and 4 for gear pitting peeling (groups 61-80). Some test data are shown in Table 3.
The diagnosis results are as follows; Figures 10 and 11 show the fault identification results before and after the model optimization.
According to the results in Figure 10, two groups of normal state samples were misjudged as the belt damage state, two groups of retarder oil leakage samples were misjudged as the belt damage state, and one group of gear pitting peeling samples was misjudged as the retarder oil leakage state. With 5 of the 80 test groups misjudged, the identification accuracy is 93.75%. The results of the GWO-SVM model are shown in Figure 11. All 80 groups of test samples are classified correctly, and the identification accuracy is 100%. Although the input characteristics are the same, the diagnostic accuracy differs significantly: compared with SVM, the accuracy of GWO-SVM is higher by 6.25 percentage points. The reason is that the GWO-SVM diagnostic model can find appropriate parameters for the SVM classifier through GWO, which makes full use of the classification advantages of SVM in constructing the optimal hyperplane.

Sound-Vibration Joint Characteristic Method Verification.
The diagnostic results obtained with the sound signal alone, the vibration signal alone, and the combined sound-vibration features are compared in Figure 12. For the experimental sample data, the diagnostic accuracies of the sound characteristics and the vibration characteristics were 91% and 94%, respectively, whereas the combined sound-vibration diagnosis reached 100%. Therefore, the combined sound and vibration characteristics fully reflect the state information of the retarder, the extracted eigenvectors are complementary, and the diagnostic effect is improved.

Verification of Generalization Performance.
Because the source and structure of the data differ in actual oilfield operation, it is necessary to classify fault data of the same type but with different characteristics. In the generalization experiment, the sampling rate was changed from 40 kHz to 30 kHz, the sensors were replaced with a PCB357B21 vibration sensor and a WM-025N pickup, and the sensor placement was changed. The diagnosis result is shown in Figure 13. It can be seen from Figure 13 that the overall diagnostic accuracy of the optimized SVM model still reaches 97.8% despite the changes in acquisition parameter settings, sensor types, and positions. This is much higher than that of the unoptimized model, indicating that the optimized model adapts better to fresh samples and generalizes better.

Conclusion
Defect identification for the retarder of beam oil pumping machines has always been a technical problem in the state monitoring of distributed oil production wells. The combination of sound texture and vibration entropy characteristics with the GWO-SVM classification algorithm, proposed in this paper, can effectively and accurately diagnose field faults by exploiting complementary frequency bands. The main contributions and novelties of the proposed method are summarized as follows:
(1) A fault diagnosis method based on the complementary combination of sound and vibration signals is proposed for the retarder of oil pumping machines, which improves the accuracy of fault identification while avoiding missed detection of retarder defects.
(2) For sound signals, the box dimension, directivity, and contrast ratio of the time-frequency diagram are calculated after the generalized S-transform to construct the sound texture features. For vibration signals, the parameter K of the VMD method is selected by the energy method, and the permutation entropy of the modal components is obtained to construct the vibration entropy characteristics. The combination of the two effectively characterizes comprehensive information on the various types of fault samples.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
All the authors contributed substantially to this work: Shutao Zhao provided the innovation and ideas of the paper; Erxu Wang and Ke Chang wrote the article and revised it later; Bo Li translated the content of the article; Kedeng Wang and Qingquan Wu collected the experimental data.