Fault Diagnosis of Reciprocating Compressor Valve Based on Transfer Learning Convolutional Neural Network

Reciprocating compressors play a vital role in oil, natural gas, and general industrial processes. Their safe and stable operation directly affects the healthy development of the enterprise economy. Since valve failure accounts for 60% of all reciprocating compressor failures, it is of great significance to quickly find and diagnose the failure type of the valve. At present, reciprocating compressor valve fault diagnosis based on deep neural networks requires sufficient labeled data for training, but the valve in a real-case reciprocating compressor (VRRC) does not have enough labeled data to train a reliable model. Fortunately, the data of the valve in a laboratory reciprocating compressor (VLRC) contain relevant fault diagnosis knowledge. Therefore, inspired by the idea of transfer learning, a fault diagnosis method for reciprocating compressor valves based on a transfer learning convolutional neural network (TCNN) is proposed. This method uses a convolutional neural network (CNN) to extract the transferable features of gas temperature and pressure data from VLRC and VRRC and establishes pseudolabels for the unlabeled VRRC data. Three regularization terms are proposed: the maximum mean discrepancy (MMD) of the transferable features of VLRC and VRRC data, the error between the VLRC sample label prediction and the actual label, and the error between the VRRC sample label prediction and the pseudolabel. Their weighted sum is used as the objective function to train the model, thereby reducing the distribution difference between the domain features and increasing the distance between the classes of the learned features. Experimental results show that this method can use VLRC data to identify the health status of VRRC, and the fault recognition rate reaches 98.32%. Compared with existing methods, this method has higher diagnostic accuracy, which demonstrates its effectiveness.


Introduction
Reciprocating compressors are important equipment in the oil and gas industry. If failures are not detected and eliminated in time, they cause huge losses to enterprises [1][2][3][4][5][6][7]. The valve is an important part of the reciprocating compressor. The literature shows that 60% of reciprocating compressor failures are gas valve failures. Gas valve failures account for 36% of shutdowns and 50% of the total maintenance cost [8][9][10]. Therefore, it is of great significance to quickly find and diagnose the fault type of the gas valve for the fault diagnosis of the reciprocating compressor.
In recent years, deep learning has been widely used in machinery fault diagnosis and prediction because it overcomes the network complexity of traditional learning models, achieves high accuracy, and resists overfitting [11]. The convolutional neural network is one of the classic deep learning algorithms. The weight sharing principle of its convolutional layers greatly reduces the number of free parameters of the network and thus its complexity. In addition, the convolutional and pooling layers are translation invariant, which makes fault feature extraction more accurate.
These characteristics of convolutional neural networks have attracted a large number of scholars to study their theory and applications. Some experts have also tried to use convolutional neural networks for fault diagnosis. Yang et al. [12] used three sensors to collect vibration signals during valve failures of a reciprocating compressor. The signals were used directly as the input of a convolutional neural network, making full use of its automatic feature extraction. Fault diagnosis was then carried out, and a higher fault identification rate was obtained. Ince et al. [13] used the one-dimensional motor current signal as the input of a convolutional neural network to diagnose motor faults. The fault recognition rate was as high as 97.8%.
Although deep learning has achieved some results in fault diagnosis of reciprocating compressor valves, the good performance of these diagnostic models depends on the availability of massive labeled data. For the valve in a real-case reciprocating compressor (VRRC), it is difficult to collect enough labeled data. Since reciprocating compressors are mostly in a normal state and rarely fail during operation, failure data are more difficult to collect than normal data. In addition, the time at which a VRRC failure occurs is unknown, and it is unrealistic to stop the machine frequently to check the health status of the valve against the data. Moreover, as the data grow, manually labeling every sample becomes impractical. For these reasons, there is not enough labeled data in practice to train a reliable diagnostic model through supervised learning. At the same time, directly applying a model trained in the laboratory to actual practice results in wrong recognition.
Transfer learning is an effective method to solve this kind of problem. Its main goal is to apply models trained with a large amount of labeled data from a related domain to unlabeled data in the target domain so as to improve the performance of the model in the target domain [14][15][16][17][18][19][20][21]. At present, transfer learning methods have achieved remarkable results in many areas of visual recognition and have also received extensive attention from scholars in the field of mechanical fault diagnosis. Yang et al. [22] proposed an intelligent fault diagnosis method based on feature transfer, using diagnostic knowledge from bearings in laboratory machines to identify the health status of bearings in practical applications. The results show that the method can effectively learn transferable features and bridge the difference between the laboratory bearing data and the actual bearing data. Wen et al. [23] used a three-layer sparse autoencoder to extract features from the spectral data of bearings under different operating conditions. The model is then trained by minimizing the maximum mean discrepancy between the source domain and the target domain. The experimental results show that the fault prediction accuracy of this method is as high as 99.82%. Chen et al. [24] proposed a transferable convolutional neural network to improve target task learning. Using the transfer learning strategy, the network is pretrained on the source data and then trained on the target task. The effectiveness of the model is verified through four cases, and the diagnosis accuracy reaches up to 99.9%. Wen et al. [25] proposed a TCNN model combining the ResNet-50 model and transfer learning, which converts time-domain signals into RGB images as model input and uses the trained ResNet-50 model for feature extraction and classification. It is verified on three datasets, and the prediction accuracy reaches up to 99.99%.
This paper draws on the successful application of transfer learning in mechanical fault diagnosis and proposes a reciprocating compressor valve fault diagnosis model based on a transfer learning convolutional neural network (TCNN). It uses the diagnostic knowledge of the valve in a laboratory reciprocating compressor (VLRC) to identify the health of the VRRC. The method uses a convolutional neural network to extract the transferable features of the inlet and outlet gas temperature and pressure data of VLRC and VRRC and calculates the maximum mean discrepancy (MMD) between the two sets of transferable features. In addition, pseudolabels are set for the unlabeled VRRC data, which are trained together with the VLRC data. The weighted sum of three regularization terms is used as the objective function: the MMD of the transferable features of VLRC and VRRC data, the error between the VLRC sample label prediction and the actual label, and the error between the VRRC sample label prediction and the pseudolabel. The TCNN model is trained by minimizing this objective function. The contributions of this paper are as follows.
(1) A TCNN model that uses VLRC diagnostic knowledge to identify VRRC health status is proposed. This method can transfer the diagnosis knowledge of an adjacent domain to the target domain and solves the problem that the VRRC data are not sufficient to train a reliable diagnosis model.
(2) In order to complete the TCNN model training, three regularization terms are proposed to constrain the model learning and improve the recognition rate of VRRC health status.

Transfer Learning.
At present, in deep learning-based reciprocating compressor valve fault diagnosis, the dataset used for model training and the test set used to evaluate the model are assumed to belong to the same feature space and follow the same distribution, and a large amount of data is required. However, in practice, the amount of data related to valve failures of the reciprocating compressor is small, and the training of the fault diagnosis model cannot be completed. To solve this kind of problem, this paper adopts transfer learning. As the name implies, transfer learning transfers the parameters of a trained model to a new model, which can speed up the modeling of the second task or improve its performance [26,27].
As shown in Figure 1, the source domain ($D_s$) and the target domain ($D_t$) are the two basic domains in transfer learning. If the sample spaces satisfy $X_s \in D_s$ and $X_t \in D_t$, then the datasets drawn from them can be expressed as $X_s = \{x_{s1}, x_{s2}, \ldots, x_{sn}\}$ and $X_t = \{x_{t1}, x_{t2}, \ldots, x_{tm}\}$, and the corresponding labels can be expressed as $Y_s = \{y_{s1}, y_{s2}, \ldots, y_{sn}\}$ and $Y_t = \{y_{t1}, y_{t2}, \ldots, y_{tm}\}$. This paper mainly studies intelligent fault diagnosis based on transfer learning from VLRC data to VRRC data. Assuming that the two collected datasets follow the marginal probability distributions $P(x_s)$ and $Q(x_t)$, the source domain can be expressed as $D_s = \{X_s, P(x_s)\}$, where $X_s = \{(x_{si}, y_{si})\}_{i=1}^{n}$ represents the $n$ labeled samples from the laboratory. The target domain can be expressed as $D_t = \{X_t, Q(x_t)\}$, where $X_t = \{x_{ti}\}_{i=1}^{m}$ represents the $m$ unlabeled samples from the field.

Introduction to MMD.
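The maximum mean discrepancy (MMD) is used to measure the distribution similarity between two datasets. If the datasets $X = \{x_i\}_{i=1}^{n}$ and $Y = \{y_j\}_{j=1}^{m}$ follow the probability distributions $p$ and $q$, respectively, then the MMD between $X$ and $Y$ can be expressed as [28]

$$\mathrm{MMD}[F, p, q] = \sup_{f \in F} \left( E_p[f(x)] - E_q[f(y)] \right). \tag{1}$$

Here $F$ denotes the set of all functions $f(\cdot)$ that map data in the feature space to the set $R$ of real numbers, and sup denotes the supremum, that is, the least upper bound. According to equation (1), the empirical estimate of the MMD of the datasets $X$ and $Y$ can be expressed as

$$\mathrm{MMD}[F, X, Y] = \sup_{f \in F} \left( \frac{1}{n} \sum_{i=1}^{n} f(x_i) - \frac{1}{m} \sum_{j=1}^{m} f(y_j) \right). \tag{2}$$

For the MMD to be 0 if and only if $p$ and $q$ are the same distribution, $F$ must be sufficiently rich (universal). At the same time, as the datasets grow, $F$ must be restricted so that the empirical estimate converges quickly. To satisfy both requirements, the literature proposes taking $F$ to be the unit ball of a reproducing kernel Hilbert space (RKHS) $H$, because the evaluation $f(\cdot) \to f(x)$ can then be expressed as an inner product in that space:

$$f(x) = \langle f, \phi(x) \rangle_H, \tag{3}$$

where $\phi$ represents the mapping $x \to H$. Let $\mu_p$ denote $E_p[\phi(x)]$ and $\mu_q$ denote $E_q[\phi(y)]$. Substituting into equation (1) gives

$$\mathrm{MMD}[F, p, q] = \sup_{\|f\|_H \le 1} \left( E_p[f(x)] - E_q[f(y)] \right) = \sup_{\|f\|_H \le 1} \langle f, \mu_p - \mu_q \rangle_H = \|\mu_p - \mu_q\|_H. \tag{4}$$

Squaring both sides of equation (4) gives

$$\mathrm{MMD}^2[F, p, q] = \|\mu_p - \mu_q\|_H^2 = E_p \langle \phi(x), \phi(x') \rangle_H - 2 E_{p,q} \langle \phi(x), \phi(y) \rangle_H + E_q \langle \phi(y), \phi(y') \rangle_H. \tag{5}$$

The inner products in equation (5) are calculated by a kernel function $k(x, x') = \langle \phi(x), \phi(x') \rangle_H$. The radial basis function is usually used:

$$k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right), \tag{6}$$

where $\sigma$ is the kernel bandwidth. In summary, the empirical estimate of the MMD based on the kernel mean embedding is

$$\mathrm{MMD}^2[F, X, Y] = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k(x_i, x_j) - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j) + \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} k(y_i, y_j). \tag{7}$$

Convolutional Neural Network Structure.
Usually, a convolutional neural network consists of three parts: convolutional layers, pooling layers, and fully connected layers. Its biggest advantages are the weight sharing principle of the convolutional layer and its invariance to translation of the input. Figure 2 shows a typical CNN topology.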
This CNN architecture includes two convolutional layers and two pooling layers ($C_1$, $P_1$, $C_2$, $P_2$), a flattening step that produces layer $F_1$, and three fully connected layers ($F_1$, $F_2$, and the output layer). The input data pass through the convolutional and pooling layers, which extract local features; the fully connected layers combine these into more abstract features; and the classifier finally classifies the features. The basic structure of convolutional neural networks is introduced further below.

Convolutional Layer.
The convolutional layer consists of a set of convolution kernels and a bias for each feature map. Each convolution kernel extracts one feature, but the feature information that a single kernel can extract is limited. Therefore, multiple convolution kernels are generally used for feature extraction. Assuming that the input of the convolutional neural network is $X$ and that $H^l$ represents the feature maps of the $l$th layer ($H^0 = X$), the $j$th feature map $H_j^l$ of the $l$th convolutional layer can be calculated by the following formula:

$$H_j^l = f\left( \sum_{i=1}^{k} H_i^{l-1} * w_{ij}^{(l)} + b_j^l \right), \tag{8}$$

where $*$ denotes convolution, $w_{ij}^{(l)}$ represents the weight matrix connecting the $i$th feature map of layer $l-1$ and the $j$th feature map of layer $l$, and $k$ represents the number of feature maps of layer $l-1$. The indices $i$ and $j$ run over the input and output feature maps, $b_j^l$ represents the bias of the $j$th feature map of the $l$th layer, and $f(\cdot)$ is an activation function. Since the ReLU function has been shown to accelerate convergence and alleviate vanishing gradients in most classification tasks, this paper uses ReLU as the activation function. The formula is as follows:

$$f(x) = \max(0, x). \tag{9}$$
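The convolution-plus-ReLU step described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the kernels slide without padding and with stride 1, and, as in most CNN libraries, the sliding product is computed without flipping the kernel (cross-correlation).

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return x if x > 0.0 else 0.0


def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` with stride 1 and no padding."""
    width = len(kernel)
    return [
        sum(signal[start + k] * kernel[k] for k in range(width))
        for start in range(len(signal) - width + 1)
    ]


def conv_layer(inputs, weights, biases):
    """Compute every output feature map of one 1-D convolutional layer.

    inputs  : list of input feature maps (lists of floats of equal length)
    weights : weights[i][j] is the kernel linking input map i to output map j
    biases  : one bias per output map
    """
    outputs = []
    for j in range(len(biases)):
        # Convolve each input map with its kernel for output map j.
        partials = [conv1d_valid(inputs[i], weights[i][j]) for i in range(len(inputs))]
        # Sum over input maps position by position, add the bias, apply ReLU.
        summed = [sum(values) + biases[j] for values in zip(*partials)]
        outputs.append([relu(v) for v in summed])
    return outputs
```

For example, `conv_layer([[1.0, 2.0, 3.0, 4.0]], [[[-1.0, 1.0]]], [0.5])` applies a single difference kernel to one input map and adds a bias of 0.5 to each position.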

Pooling Layer.
The convolutional layer is followed by the pooling layer, also known as the downsampling layer. Its function is to reduce the size and the number of parameters of each feature map of the previous layer, reduce the data dimension, and achieve spatial invariance. Suppose $H_j^l$ is the $j$th feature map of the $l$th pooling layer; its calculation formula is as follows:

$$H_j^l = f\left( \beta_j^l \, \mathrm{down}(H_j^{l-1}) + b_j^l \right), \tag{10}$$

where $\beta_j^l$ and $b_j^l$ are the multiplicative bias and additive bias of each output feature map and $\mathrm{down}(\cdot)$ represents the pooling function. Common pooling functions include average pooling and maximum pooling. Because maximum pooling can select better features and leads to faster convergence, this paper chooses maximum pooling as the pooling function.

After the last pooling layer, the feature maps are flattened into a one-dimensional vector. This one-dimensional vector is the output of the layer $F_1$. The output of the $F_2$ layer can be calculated by the following formula:

$$H_j^{F_2} = f\left( \sum_{i} w_{ij}^{(F_2)} H_i + b_j^{F_2} \right), \tag{11}$$

where $H$ is the flattened one-dimensional vector and $w_{ij}^{(F_2)}$ and $b_j^{F_2}$ represent the weight and bias of each feature in the $F_2$ layer. After the output of the $F_2$ layer, a softmax function is added to classify the features, and the multiclass cross-entropy loss function is used to measure the classification results. Assuming that $p(x)$ represents the target class probability distribution and $q(x)$ represents the predicted probability distribution, the cross-entropy loss function for $p(x)$ and $q(x)$ is

$$H(p, q) = -\sum_{x} p(x) \log q(x). \tag{12}$$

Since the Adam algorithm [29] maintains independent adaptive learning rates for different parameters and requires few tuning parameters, it is selected as the gradient descent optimizer for updating the weights and biases to minimize the loss function.
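The pooling, flattening, fully connected, softmax, and cross-entropy steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, `max_pool1d` plays the role of the pooling function alone (the multiplicative and additive biases are left out), and `dense` returns the weighted sum plus bias without an activation so that its output can be passed straight to `softmax`.

```python
import math


def max_pool1d(feature_map, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(feature_map[start:start + size])
        for start in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_maps):
    """Concatenate all feature maps into one one-dimensional vector."""
    return [value for feature_map in feature_maps for value in feature_map]


def dense(vector, weights, biases):
    """Fully connected layer; weights[i][j] links input i to output j."""
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + biases[j]
        for j in range(len(biases))
    ]


def softmax(logits):
    """Turn raw scores into a probability distribution (shifted for stability)."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(p * math.log(q + eps) for p, q in zip(target, predicted))
```

A forward pass for one sample would chain these as `softmax(dense(flatten(pooled_maps), weights, biases))`, and `cross_entropy` would then compare the result with a one-hot label.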

Establishment of TCNN Model
The structure of the TCNN model proposed in this paper is shown in Figure 3.
The model uses a one-dimensional convolutional neural network to extract the transferable features from the temperature and pressure data in the source and target domains and then reduces the distribution difference between the transferable features through domain adaptation. Finally, the CNN is trained with the help of pseudolabels created for the unlabeled data in the target domain. Because unlabeled target domain data cannot be used to train the part of the network from the $F_2$ layer to the softmax layer, this paper creates pseudolabels [30] for training. The pseudolabel of a sample is the label with the largest predicted probability, which is treated as if it were the true label. Creating pseudolabels therefore requires two steps: the output of the $F_2$ layer is passed through the softmax function to predict the probability distribution over the labels of each target domain sample, and this distribution is then converted into a pseudolabel.
Therefore, the pseudolabels can be calculated by the following formula:

$$\hat{y}_{ti} = [\hat{y}_{ti,1}, \hat{y}_{ti,2}, \ldots, \hat{y}_{ti,K}], \qquad \hat{y}_{ti,j} = \begin{cases} 1, & j = \arg\max_{k} h_{ti,k}, \\ 0, & \text{otherwise}, \end{cases} \tag{13}$$

where $\hat{y}_{ti}$ represents the pseudolabel of the $i$th target domain sample, $h_{ti}$ represents the label prediction of the $i$th target domain sample through the softmax function, and $K$ is the number of health states.

The TCNN model is trained by minimizing three regularization terms: the error between the predicted label and the actual label of the source domain samples, the error between the predicted label and the pseudolabel of the target domain samples, and the distribution difference (MMD) between the transferable features of the source domain samples and the target domain samples. The model is trained by minimizing the weighted sum of these three terms, equation (14), in which $\alpha$ and $\beta$ are trade-off parameters.

The specific TCNN model diagnosis process is shown in Figure 4. First, according to formulas (8), (10), and (11), feature extraction is carried out for the sample data of the source domain VLRC and the target domain VRRC, respectively. Then, label prediction is performed by the softmax function, and formula (7) is used to calculate the distribution difference (MMD) of the transferable features extracted from the source domain and target domain samples. Based on the prediction results for VLRC in the source domain, formula (12) is used to calculate the prediction error Loss1 for the source domain samples. Based on the prediction results for VRRC in the target domain, the corresponding pseudolabels are established for the target domain samples through equation (13), and the prediction error Loss2 of the target domain samples with pseudolabels is then calculated through equation (12). The weighted sum of the three regularization terms, MMD, Loss1, and Loss2, is taken as the objective function, namely, equation (14).
If the objective function reaches the set value, training ends; otherwise, the CNN model parameters are corrected and the model is retrained by gradient descent optimized with the Adam algorithm. Finally, the VRRC data are input into the trained TCNN model to obtain the diagnosis result.
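The three terms of the objective can be sketched numerically as follows. This is a hedged illustration rather than the authors' code. It assumes a biased empirical estimate of the squared MMD with a radial basis function kernel, one-hot pseudolabels, and an unweighted source loss. The weight names `w_pseudo` and `w_mmd` are hypothetical: the text does not make clear which of α and β weights which term, so the sketch does not commit to a mapping.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Radial basis function kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two feature sets."""
    n, m = len(source), len(target)
    k_ss = sum(rbf_kernel(a, b, sigma) for a in source for b in source) / (n * n)
    k_tt = sum(rbf_kernel(a, b, sigma) for a in target for b in target) / (m * m)
    k_st = sum(rbf_kernel(a, b, sigma) for a in source for b in target) / (n * m)
    return k_ss + k_tt - 2.0 * k_st


def pseudolabel(probabilities):
    """One-hot vector marking the class with the largest predicted probability."""
    best = max(range(len(probabilities)), key=lambda j: probabilities[j])
    return [1.0 if j == best else 0.0 for j in range(len(probabilities))]


def mean_cross_entropy(labels, predictions, eps=1e-12):
    """Average cross-entropy between one-hot labels and softmax outputs."""
    total = 0.0
    for label, predicted in zip(labels, predictions):
        total -= sum(p * math.log(q + eps) for p, q in zip(label, predicted))
    return total / len(labels)


def tcnn_objective(src_labels, src_probs, tgt_probs, src_feats, tgt_feats,
                   w_pseudo, w_mmd, sigma=1.0):
    """Weighted sum of Loss1, Loss2, and the MMD term (weight names are hypothetical)."""
    loss1 = mean_cross_entropy(src_labels, src_probs)        # source: prediction vs. true label
    pseudo = [pseudolabel(p) for p in tgt_probs]             # target: build pseudolabels
    loss2 = mean_cross_entropy(pseudo, tgt_probs)            # target: prediction vs. pseudolabel
    distance = mmd_squared(src_feats, tgt_feats, sigma)      # feature distribution difference
    return loss1 + w_pseudo * loss2 + w_mmd * distance
```

In a real training loop this value would be minimized with respect to the network parameters by an automatic differentiation framework; the sketch only shows how the three terms combine for fixed predictions and features.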

Dataset Description.
In this experiment, the diagnosis knowledge of the valve in a laboratory reciprocating compressor (VLRC) is used to identify the health status of the valve in a real-case reciprocating compressor (VRRC). VLRC takes the valve on a five-stage reciprocating compressor in the laboratory as the test object, while VRRC takes the valve on a five-stage reciprocating compressor in a field filling station as the test object. The inlet and outlet temperature and pressure data of the five-stage reciprocating compressor valve (IODM-115-5-3-16) in the laboratory are obtained by temperature sensors and pressure sensors placed in the inlet and outlet pipelines of each stage cylinder. The installation position of each sensor is shown in Figure 5. In the figure, $T_{i1}$ and $P_{i1}$ represent the temperature and pressure of the gas entering the cylinder, $T_{i2}$ and $P_{i2}$ represent the temperature and pressure of the gas discharged from the cylinder, and $i$ is the stage index of the cylinder. Because the temperature of the inlet and outlet gas ranges from 0° to 135° and the pressure ranges from 0.125 MPa to 25 MPa, the BRW600-400 temperature sensor and the UNIK-5000 pressure sensor are selected to measure the temperature and pressure of the inlet and outlet gas of each cylinder. At a motor speed of 1000 r/min, six health states are simulated: normal valve (N), first-stage valve leak (VL1), second-stage valve leak (VL2), third-stage valve leak (VL3), fourth-stage valve leak (VL4), and fifth-stage valve leak (VL5). The data acquisition system collects temperature and pressure data at sampling frequencies of 1 Hz and 2 kHz, respectively. As shown for dataset A in Table 1, it contains 2400 samples, and the 20 values of each sample are the readings of the 20 sensors. The motor speed of the five-stage reciprocating compressor (IODM 70-5-4R) in the on-site filling station is 1480 r/min.
The gas temperature and pressure data at the inlet and outlet of each stage cylinder were obtained from daily operating records. Data were collected with the valve in each of the six health states (N, VL1, VL2, VL3, VL4, and VL5). As shown for dataset B in Table 1, it contains 2400 samples, and each sample consists of 20 values representing the inlet and outlet temperature and pressure of the cylinder of each stage of the five-stage compressor. Data A and data B differ because the field reciprocating compressor operates in a complex environment (influenced by temperature, vibration, humidity, etc.) and the motors of the two compressors run at different speeds. This leads to a number of deviations between the measured data of the two datasets; for example, the VRRC exhaust temperature of each stage is higher than the corresponding VLRC exhaust temperature. However, the characteristics of valve faults are similar, so according to Table 1, a transfer learning task can be created: data A ⟶ B. Data A is regarded as the source domain data providing diagnostic knowledge, and data B is regarded as the target domain data. The goal of this transfer learning task is to classify the samples in data B as accurately as possible. According to the data types in Table 1, the detailed parameters of the CNN model are shown in Table 2.

Experimental Results and Discussion.
When verifying the effectiveness of the model, it was found that the choice of the trade-off parameters α and β in equation (14) strongly affects the diagnosis results of the TCNN model, so the choice of the two parameters needs further study. Since pseudolabels of the target domain are established to complete the training of the model, the key to improving the diagnostic accuracy is the difference in the distribution of the transferable features extracted from the source domain and the target domain. Therefore, the weight parameter α is selected from {0, 0.01, 0.05, 0.1, 0.5, 1}, and the weight parameter β is selected from {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50}. The results obtained by averaging ten experiments are shown in Figure 6. It can be seen that when α = 0.01 and β = 5, the TCNN model has the highest fault diagnosis accuracy for data B. Therefore, α and β in equation (14) are set to 0.01 and 5, respectively. Figure 7 shows the error curve (Figure 7(a)) and the accuracy curve (Figure 7(b)) of the TCNN model during training. The error in curve (a) reaches the red line, which marks the set error of 0.001, after 260 training iterations. Curve (b) gradually approaches 100% as the number of training iterations increases and reaches an accuracy of 99.18% at 260 iterations.
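The parameter selection described above amounts to a grid search in which each (α, β) pair is scored by its accuracy averaged over repeated experiments. A minimal sketch follows; `train_and_evaluate` is a hypothetical callback standing in for training the TCNN and returning its accuracy on data B, and the tie-breaking rule (keep the first best pair) is an assumption.

```python
from statistics import mean

# Candidate values taken from the text.
ALPHAS = [0, 0.01, 0.05, 0.1, 0.5, 1]
BETAS = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50]


def grid_search(alphas, betas, train_and_evaluate, repeats=10):
    """Return (alpha, beta, mean_accuracy) for the best-scoring pair.

    train_and_evaluate(alpha, beta, run) is a hypothetical function that
    trains one model for the given weights and returns its accuracy.
    """
    best = None
    for alpha in alphas:
        for beta in betas:
            # Average over repeated runs to smooth out training randomness.
            accuracy = mean(train_and_evaluate(alpha, beta, run) for run in range(repeats))
            if best is None or accuracy > best[2]:
                best = (alpha, beta, accuracy)
    return best
```

With 6 values of α, 8 values of β, and 10 repeats, this procedure trains 480 models, which is feasible here only because each sample has just 20 values.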
In order to verify the effectiveness of this method, its transfer results and transfer performance are compared with those of the BP, TCA [31], DAFD [32], and CNN methods. TCA is one of the classic distribution adaptation methods in transfer learning. It learns from the source domain and target domain data after dimensionality reduction. DAFD is an intelligent diagnosis model based on deep transfer learning. It differs from the TCNN proposed in this paper in that it does not use pseudolabels for learning and its objective function uses different regularization weights.

Mathematical Problems in Engineering
Figure 8 shows the fault diagnosis accuracy of each method on data B over 10 tests. The diagnosis accuracy of BP is mostly between 63% and 67%, that of TCA is mostly between 68% and 73%, that of CNN is mostly between 73% and 78%, that of DAFD is mostly between 87% and 89%, and that of TCNN is mostly between 97% and 99%. These results show that training the TCNN model with the weighted sum of the three regularization terms as the objective function improves the average diagnostic accuracy of transfer learning.
Transfer ratio (TR) and transfer loss are used to measure the transfer performance of TCNN and the comparison methods [33]. The transfer ratio is computed over $n$ transfer tasks from two quantities: $err(S_i, T_i)$, the transfer error, that is, the error obtained when a model trained on the source domain $S_i$ is tested on the target domain $T_i$; and $err_b(T_i, T_i)$, the in-domain baseline error, that is, the error obtained by training a baseline model on the target domain and testing it on data from the same domain. Transfer loss is the difference between the transfer error and the in-domain baseline error. In this paper, CNN is used as the baseline model for the metric calculation. The ratio between the training set and the test set is 7 : 3 [34]. Over 10 trials, the average classification accuracy of the baseline model on data B is 99.44%. Therefore, the in-domain baseline error is $err_b(T_i, T_i) = 0.56\%$. Table 4 shows the transfer loss and transfer ratio of each method (BP, TCA, CNN, DAFD, and TCNN) calculated for data A ⟶ B.
The transfer loss of TCNN is 1.12%, which is much lower than that of the other methods. Moreover, the transfer ratio of the proposed TCNN method is 0.988, the highest among the five methods. These two results show that the proposed TCNN has better transfer performance than the other methods.
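As an arithmetic cross-check of these figures, the short sketch below recomputes both metrics from the accuracies reported in this paper (the per-method accuracies are those quoted in the Conclusion). The function names are hypothetical. One observation, offered with caution: the reported transfer ratios agree to within 0.001 with the ratio of target accuracy to baseline accuracy, whereas a ratio of errors would give 1.68 / 0.56 = 3 for TCNN. This is a remark about the reported numbers, not a reconstruction of the authors' formula.

```python
BASELINE_ACCURACY = 99.44  # percent, CNN trained and tested on data B

# Accuracies on data B (percent) as reported in the paper.
ACCURACY = {"BP": 64.08, "TCA": 69.92, "CNN": 74.8, "DAFD": 88.08, "TCNN": 98.32}


def transfer_loss(accuracy, baseline_accuracy=BASELINE_ACCURACY):
    """Transfer error minus in-domain baseline error, in percentage points."""
    transfer_error = 100.0 - accuracy
    baseline_error = 100.0 - baseline_accuracy
    return transfer_error - baseline_error


def accuracy_ratio(accuracy, baseline_accuracy=BASELINE_ACCURACY):
    """Target accuracy divided by baseline accuracy (assumed reading of the reported ratios)."""
    return accuracy / baseline_accuracy


for method, accuracy in ACCURACY.items():
    print(method, round(transfer_loss(accuracy), 2), round(accuracy_ratio(accuracy), 4))
```

For TCNN, the transfer error is 100 - 98.32 = 1.68%, so the transfer loss is 1.68 - 0.56 = 1.12 percentage points, which matches the value stated above.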
In order to understand the feature extraction of the transfer learning process more intuitively, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is introduced [35]. The algorithm reduces the dimensionality of the high-dimensional learned features and plots their distribution on a low-dimensional graph. Figure 9 shows a visualization of the transferable features during the learning process of CNN and TCNN. According to the results shown in Figure 9(a), there are serious distribution differences in the transferable features learned by the CNN method. Moreover, the CNN classifies the features of the target domain data poorly: the class spacing between the features is small, so the fault types cannot be distinguished well. It is therefore impossible to classify the target domain data accurately when the model is trained only on the source domain data. Figure 9(b) shows the transferable features learned by the TCNN method proposed in this paper. It not only reduces the distribution difference between the transferable features but also enlarges the class distance between the learned features of the target domain data, making the fault types easy to distinguish. Therefore, the TCNN method can accurately classify the samples in the target domain data. The results show that compared with the CNN method, the TCNN method has better

Conclusion
In this paper, a fault diagnosis method for reciprocating compressor valves based on a transfer learning convolutional neural network (TCNN) is proposed. In this method, the feature extraction capability of the CNN is used to extract the transferable features of the temperature and pressure data of the gas entering and leaving the VLRC and VRRC. Then, the weighted sum of the three regularization terms is used as the objective function to constrain the parameter set of the TCNN model so that the learned transferable features are easy to identify and classify. We use the fault diagnosis knowledge of a laboratory five-stage reciprocating compressor valve to identify the health status of the five-stage reciprocating compressor valve of an on-site gas station. The fault recognition accuracy of TCNN reaches 98.32%, and its transfer ratio reaches 0.988. The fault recognition accuracy of the DAFD method is 88.08%, and its transfer ratio is 0.885. The fault recognition accuracy of the CNN method is 74.8%, and its transfer ratio is 0.752. The fault recognition accuracy of the TCA method is 69.92%, and its transfer ratio is 0.703. The fault recognition accuracy of the BP method is 64.08%, and its transfer ratio is 0.644. The results show that the TCNN method has higher classification accuracy and better transfer performance. The method addresses the problem that the actual reciprocating compressor valve does not have enough labeled data to train a reliable model. In the future, we will conduct extensive research on other failure types of reciprocating compressor valves and strive to make the method proposed in this paper more reliable in practical applications.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.