Remaining useful life prediction with insufficient degradation data based on deep learning approach

Abstract
Remaining useful life (RUL) prediction plays a crucial role in decision-making in condition-based maintenance for preventing catastrophic field failure. For degradation-failed products, the data of the performance deterioration process are the key for lifetime estimation. Deep learning has been proved to have excellent performance in RUL prediction given that the degradation data are sufficiently large. However, in some applications, the degradation data are insufficient, and how to improve the prediction accuracy under this condition remains a challenging problem. To tackle this challenge, we propose a novel deep learning-based RUL prediction framework that amplifies the degradation dataset. Specifically, we leverage the cycle-consistent generative adversarial network to generate synthetic data, with which the original degradation dataset is amplified so that the data characteristics hidden in the sample space can be captured. Moreover, the sliding time window strategy and a deep bidirectional long short-term memory network are employed to complete the RUL prediction framework. We show the effectiveness of the proposed method by running it on the turbine engine data set from the National Aeronautics and Space Administration. The comparative experiments show that our method outperforms a case without the use of the synthetically generated data.

Highlights
• A data amplification network based on CycleGAN is designed to effectively increase the size of the degradation dataset.
• An RUL prediction framework is constructed with the sliding time window strategy and BiLSTM network.
• Experimental results show that the RUL prediction performance is significantly improved by the proposed data amplification approach.

Introduction
Accurate prediction of the remaining useful life (RUL) is extremely valuable for decision-making in condition-based maintenance for preventing catastrophic field failure. For degradation-failed products, the data of the performance deterioration process play a major role in RUL estimation. The methods of RUL estimation can be divided into three categories: 1) methods based on failure mechanism analysis [9,22], 2) data-driven methods, and 3) hybrid methods that combine the first two. The key point of RUL prediction using the first method is to fully understand the degradation mechanism of the target equipment. Prior knowledge in the target field is indispensable when establishing a mathematical model of the degradation process. However, as the complexity of the equipment increases and automation advances, obtaining complete knowledge of the degradation mechanism becomes difficult [7,14]. For example, the aircraft turbine engine data set of the National Aeronautics and Space Administration (NASA) was built from more than ten sensors, and these data should be analyzed together to reveal the health indicators of the turbine engine. Different from the method based on failure mechanism analysis, the data-driven approach does not require researchers to have a comprehensive understanding of the target equipment [12,23]. After collecting sufficient degradation data from sensors, researchers can construct a nonlinear mapping between the degradation data and the real equipment health indicators while also handling the dynamic dependency problems [8,28]. This nonlinear mapping network can be used to predict the RUL of the equipment used on site.
Data-driven methods, especially the deep learning approach, have developed substantially in recent years [3,6,16–18,24]. Considering the problem of weak dependence of time-series information, Zhu [36] combined the information of the previous convolutional layer with the current layer and proposed a multiscale convolutional neural network (CNN) for RUL prediction. The long-range dependence problem exists in many studies on time-series data. Li [11] selected the long short-term memory network (LSTM) and CNN as the base models to build the RUL prediction model. LSTM can save past information for the current network parameter update, and CNN has a strong ability in local feature extraction. The combination of the two improves the accuracy of the prediction network. The group method of data handling-type neural network (GMDH) can self-organize and generate the optimal network structure based on the training data [22]. Ge [4] generated three GMDH networks through different divisions of the training data and integrated their results with a three-layer back propagation (BP) neural network to overcome the local-optimum weakness of GMDH and improve the generalization ability. Ragab [19] developed a data-driven prognostic methodology using both the age and condition monitoring data as inputs, which can deal with any number of condition indicators. Under different test conditions, different workloads, environmental conditions, and noise levels may lead to different distributions of the training set and test set. To solve this problem, Wen [26] used a domain-adversarial neural network (DANN) and proposed a data-driven framework with domain adaptability using a bidirectional gated recurrent unit (BGRU). This method can effectively reduce the impact of the distribution mismatch between training data and testing data on RUL prediction performance. Deep learning methods have also been adopted to address the RUL prediction issue in specific fields, such as bearings [20,36], lithium-ion batteries [32,34], lathe tool wear [37], and nuclear systems [38]. Nevertheless, the estimation effect of these methods is highly dependent on the capacity of the degradation data set. That is, the scale of the dataset available in the model training phase has a great influence on the RUL prediction accuracy [13]. Abdulraheem [1] explored the effect of the dataset size on prediction results under supervised learning techniques. Among three datasets of sizes 400, 800, and 1200, the model trained on the largest dataset had the best prediction effect.
The larger the dataset, the better the established model. However, in many actual industrial production practices, obtaining a large-scale dataset is not realistic owing to the long degradation time and high cost of collecting degradation data. The XJTU-SY rolling bearings dataset mentioned by Wang [25] only collected the complete life cycle of 15 bearings (type LDK UER204), and the entire life cycle is only 42 h and 18 min. These restrictions on obtaining large-scale degradation data limit the further development of deep learning data-driven methods in RUL prediction. Moreover, newly emerging equipment also lacks degradation data. Under these scenarios, the RUL prediction performance is severely affected. Hence, how to improve the prediction accuracy with insufficient degradation data remains a challenging task.
In the case of insufficient degradation data, the low accuracy of RUL prediction is mainly caused by low sample diversity, which can be effectively improved by data augmentation [29]. The generative adversarial network (GAN) is a common data augmentation strategy, which can capture the characteristics hidden in the sample space and enrich the diversity of samples [30]. Yoon [31] applied the GAN to the task of generating medical data and produced a patient electronic health dataset containing discrete time-series data. In the sequence data generation task, Li [10] utilized a GAN to capture the temporal correlation of time-series distributions; the generator and discriminator inside the GAN adopt the LSTM network as the basic network, which is well suited to time-series data. Subsequently, Xie [27] generated bearing datasets for various working conditions based on the cycle-consistent generative adversarial network (CycleGAN) framework, and its GAN discriminator was trained for fault diagnosis.
Based on the above research, this study develops a complete framework to improve the RUL prediction performance when degradation data are insufficient. Four steps are involved in this framework. First, a data amplification model is constructed using the LSTM network, which also serves as the generator inside the CycleGAN, to mine the inherent distribution of the existing degradation data samples of a machine. Second, a data preprocessing strategy is designed for time-series degradation data before they are sent to the amplification network. Third, the obtained amplified data are preprocessed using the sliding time window method, and their labels for prediction model training are obtained. Finally, a data-driven method is built with the amplified data for RUL prediction. The contributions of this study are summarized as follows:
• Proposed an amplification network for generating time-series degradation data based on CycleGAN; this method uses a small amount of data to train CycleGAN and uses the designed generator based on the LSTM network for data amplification without excessive prior knowledge of the data.
• Designed a data preprocessing strategy to resize the time-series degradation data before they are sent to the designed amplification network.
• Constructed a data-driven RUL prediction model and integrated the above work into a complete set of RUL prediction methods, which is suitable for time-series degradation data.
• Compared the performance differences between RUL prediction models trained with amplified data obtained from various amounts of degradation data.
The rest of this paper is organized as follows. The theoretical foundation of the CycleGAN is introduced in Section II. The proposed LSTM-based amplification network, the data preprocessing strategy, and the RUL prediction model are introduced in Section III. An experiment is presented in Section IV. The conclusions are summarized in Section V.

Theoretical Foundation
CycleGAN is an unsupervised generative network that was designed to solve the problem of image-to-image translation in the field of vision and graphics by learning the mapping from a source domain to a target domain without requiring aligned image pairs. The key to achieving this function is an adversarial structure composed of two networks called the generator and the discriminator. The generator captures the distribution of the true images and constructs fake ones, and the discriminator estimates the probability that an image came from the true images rather than the generator. Ideally, the discriminator's recognition success rate should be approximately 0.5, which means that the discriminator cannot distinguish whether the test image is real or generated, that is, the generator has learned the true mapping between the domains. To improve learning efficiency, a cycle-consistent structure is built in two directions. Two generators and two discriminators are used; in the direction starting from domain A, one generator transforms the data from domain A to domain B, and the other generator reconstructs the generated data back to domain A. The structure is shown in Figure 1.
The data obtained from the generator $G_{B \to A}$ are distinguished from the real data $X$ of domain A by the discriminator $D_A$. The value function is shown in Formula 1. To simplify the notation, we define $G_{A \to B}$ as $G$ and $G_{B \to A}$ as $F$, and we write the discriminators of domains A and B as $D_X$ and $D_Y$, respectively.
In the process of optimizing this value function, the distribution of the data generated by the generator $G$ is moved close to that of domain B, and the discriminator $D_Y$ distinguishes the generated data from the real data. The value function aims to minimize the generation error of $G$ and maximize the recognition success rate of $D_Y$. Similarly, we can obtain the value function of the other generator $F$ and discriminator $D_X$ (Formula 2). The two directions are tied together by a cycle-consistency loss (Formula 3), and the complete value function is shown as Formula 4.
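For reference, the standard CycleGAN objective, which Formulas 1 to 4 are assumed to follow, can be written with the notation above as shown below. The weight $\lambda$ and the L1 norm in the cycle term come from the standard formulation and are assumptions with respect to this study.

```latex
\begin{align}
\mathcal{L}_{GAN}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \tag{1} \\
\mathcal{L}_{GAN}(F, D_X, Y, X) &= \mathbb{E}_{x \sim p_{data}(x)}\big[\log D_X(x)\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\log\big(1 - D_X(F(y))\big)\big] \tag{2} \\
\mathcal{L}_{cyc}(G, F) &= \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \tag{3} \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{cyc}(G, F) \tag{4}
\end{align}
```

The generators are trained to minimize this objective while the discriminators are trained to maximize it.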

Methodology
In data-driven RUL prediction, the actual effect of the model is largely determined by the data size. Insufficient run-to-failure degradation data are the key factor limiting the reliability of the prediction model. This work focuses on how to mine potential data distribution information from limited samples and improve the effect of the RUL prediction model using deep learning technology.
Our primary hypothesis is that the time-series degradation data available for constructing the RUL prediction model are scarce. If a deep learning method is applied directly to summarize the degradation features from the limited degradation data and perform RUL prediction, the prediction effect will not be as good as expected. We propose a method that consists of three parts. The first part is an amplification network built on the LSTM network, which mines the data distribution information from known samples to expand the sample size [15]. The second part is a data preprocessing strategy. Owing to the time-dependent dynamic characteristics of the degradation data, the sliding time window strategy is used to capture the dynamic degradation information of the data, and the size of the degradation data is adjusted before they are sent to the amplification network to improve the network processing efficiency. The third part is a prediction network, mainly based on the bidirectional long short-term memory (BiLSTM) network, that is constructed using the amplified data obtained from the first part. In the training process, the recurrent structure in BiLSTM can effectively handle long-range dependence in time series, the optimal parameters of the model are obtained through the backpropagation algorithm, and the resulting RUL prediction network is used to predict the samples.

Data Amplification Network Based on CycleGAN
In CycleGAN, given data from two different domains, the generators learn to convert data between the two domains through adversarial training with the discriminators. To capture the information in the sparse degradation data of our hypothesis, we replaced the data of the two domains with degradation data of a single domain. Unlike the original CycleGAN, in which each of the two generators learns the distribution information of one domain, the proposed scheme lets the generators learn from each other on the scarce degradation data, and the trained generator is then used to generate degradation data.
The generator based on the LSTM was designed as the amplification network. LSTM is a type of recurrent neural network whose structure contains units with functions such as forgetting and remembering; this network is suitable for processing time-series data [35]. In actual situations, the degradation data of a device are usually strongly correlated with time, and LSTM can handle the resulting long-range dependence problems [21].
To establish a connection between the calculation units at each moment, three gate structures are designed in LSTM, namely, the forget gate layer, input gate layer, and output gate layer. These gate structures control the information flow at different times and store short-term time-step dependent information for network parameter updates, which alleviates the gradient vanishing or gradient explosion problem of the classic neural network structure during backpropagation. The LSTM cell structure at time $t$ is shown in Figure 2. The input of the current moment consists of the current input data and the previous output; likewise, the input of the next moment is composed of the current output and the next input data. In the related formulas for the forget gate layer, input gate layer, and output gate layer, $x_t$ is the input data at time $t$, $h_t$ is the output data at time $t$, and $C_t$ represents the information flow that participates in parameter updates throughout the entire training process.
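As an illustration of the gate structure described above, the following sketch implements one step of a standard LSTM cell in NumPy and keeps the output $h_t$ of every timestep. The stacked weight matrix `W`, the hidden size of 8, and the random inputs are assumptions chosen for illustration, not the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W maps the concatenated [h_prev, x_t] to the four stacked
    pre-activations (forget, input, candidate, output).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:hidden])                 # forget gate
    i = sigmoid(z[hidden:2 * hidden])        # input gate
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
    o = sigmoid(z[3 * hidden:4 * hidden])    # output gate
    c_t = f * c_prev + i * g                 # updated cell state C_t
    h_t = o * np.tanh(c_t)                   # output h_t
    return h_t, c_t

# Toy run: 5 timesteps of 14 sensor readings (illustrative sizes).
rng = np.random.default_rng(0)
s, hidden = 14, 8
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + s))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
outputs = []
for x_t in rng.normal(size=(5, s)):
    h, c = lstm_step(x_t, h, c, W, b)
    outputs.append(h)
outputs = np.stack(outputs)  # one h_t per timestep, shape (5, hidden)
```

Because the forget gate multiplies the previous cell state directly, information can persist across many timesteps without passing through repeated squashing nonlinearities.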
As the degradation data are essentially a continuous time series, we modified the output form of the LSTM network and fixed the input and output sizes of the generator network to be consistent, which preserves the structure of the sequence and reduces the loss of degradation information. We saved the output $h_t$ of the LSTM at every timestep from 1 to $m$, and these outputs form the final output of the network. The output can therefore be a series instead of a scalar, which meets the requirement of the network for input data with time-dynamic characteristics. The schematic is shown in Figure 3. On the left is the input data with dimension $n \times s$, where $n$ represents the length of the input data and $s$ represents the number of sensors. In the center is the generator with $m$ timesteps. On the right is the first output data with dimension $m \times s$. The second output data, with dimension $n \times s$, are obtained with a dense operation.
The dimension $n \times s$ of the input data is given by the task. The number of timesteps of the LSTM is set larger than the length of the input data; in Figure 3, the timestep is set to $m$, where $m > n$. In the training process, the first row of the input data, of size $1 \times s$, is sent to the generator, and the output of the generator, of size $1 \times m$, consists of the values obtained at each timestep. After all the input data are sent into the network, all the outputs are combined into a matrix of dimension $n \times m$. Finally, a dense operation is performed to obtain output data whose size is consistent with the input data.
To ensure that the generated data are similar to the real data in distribution, and to prevent deviations in the generated data from distorting the degradation information, we add the maximum mean discrepancy (MMD) to the generator's loss function $J(G)$, where $n$ is the number of samples, $y_i$ is the generated sample of the $i$-th instance, and $\hat{y}_i$ is the target sample of the $i$-th instance.
MMD was designed to measure the difference in data distribution by comparing the statistical information of two sets of data and has been used as a training objective function for generative networks. In practice, the inner product between two samples is replaced with a kernel calculation; here, a Gaussian kernel with bandwidth $\sigma$ is used. We select a group of different $\sigma$ values and average the resulting MMD values to obtain the final value.
In the training process, we optimize the parameters of the generative model by the gradient descent algorithm. As training proceeds, the difference between the generated samples and the target samples is further reduced, so that the generated samples meet the task requirements.
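The multi-bandwidth MMD described above can be sketched as follows. This is a minimal NumPy version of the biased estimate of the squared MMD with a Gaussian kernel; the bandwidth set `(0.5, 1.0, 2.0, 4.0)` and the toy data are assumptions for illustration, since the values used in this study are not stated.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Biased squared-MMD estimate, averaged over several bandwidths."""
    values = []
    for sigma in sigmas:
        k_xx = gaussian_kernel(x, x, sigma).mean()
        k_yy = gaussian_kernel(y, y, sigma).mean()
        k_xy = gaussian_kernel(x, y, sigma).mean()
        values.append(k_xx + k_yy - 2.0 * k_xy)
    return float(np.mean(values))

# Toy check: a shifted sample should be farther from x than a near copy.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
mmd_same = mmd2(x, x)
mmd_near = mmd2(x, x + 0.1)
mmd_far = mmd2(x, x + 3.0)
```

Averaging over several bandwidths avoids committing to a single length scale, which matters when the sensors vary at different magnitudes.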

Data Preprocessing Strategy for Amplification
Under ideal conditions, the RUL of the degradation data used for training should be clear. However, even equipment of the same type can have different life cycles owing to differences in quality or operating environment. To accurately characterize the temporal dynamics of degradation data, we need a data preprocessing strategy before the degradation data with different life cycles are sent to the amplification network.
The strategy for processing data with inconsistent life spans is as follows. We obtain the initial value of the rapid degradation stage through statistical analysis. This initial value divides the degradation data into a normal stage and a rapid degradation stage. We retain the values of the rapid degradation stage. The data are then resized in the normal stage, because the values in the normal stage usually vary within a small range and predicting the RUL in the normal stage is less important than in the rapid degradation stage.
Given time-series degradation data $X_{s,n}$ with size $s \times n$, as shown in Formula 14, we obtain the output $X'_{s,n'}$ with the required size $s \times n'$, where $s$ represents the number of data features and $n$ represents the life span of the degradation data. We assume that the initial value of the rapid degradation stage obtained by the statistical analysis is $m$.
In the rapid degradation stage, the values are directly retained without any processing. In the normal stage, two data resizing strategies are proposed as follows:
1) If the current degradation data length is more than $n'$, we remove the excess part directly to obtain data that meet the requirements. The size of the processed data is $s \times n'$, where $l = n - n'$ is the length of the excess part, which is removed from the beginning.
2) If the current degradation data length is shorter than $n'$, we apply a data padding strategy. We calculate the average value of the same sensor data in the first time window as the padding data. The substituted value $x'$ for sensor $s$ is expressed as Formula 16, where $L_{tw}$ is the length of the time window and $\gamma$ is Gaussian noise bounded by the maximum and minimum values of the data at sensor $s$ in one time window. The processed data have length $n' = l + n$, where $l$ is the padded length.
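The two resizing strategies above can be sketched as follows. Several details are assumptions made for illustration: the padding is placed at the beginning of the sequence (so that the rapid degradation stage at the end is untouched), the noise standard deviation is one sixth of the window range, and padded values are clipped to the window's minimum and maximum. The function name `resize_run` is hypothetical.

```python
import numpy as np

def resize_run(x, n_target, tw_len, rng):
    """Resize one run-to-failure record x of shape (s, n) to (s, n_target)."""
    s, n = x.shape
    if n >= n_target:
        # Strategy 1: drop the excess l = n - n_target cycles from the start.
        return x[:, n - n_target:]
    # Strategy 2: pad l = n_target - n cycles before the first real cycle.
    l = n_target - n
    window = x[:, :tw_len]                      # first time window
    mean = window.mean(axis=1, keepdims=True)   # per-sensor window mean
    lo = window.min(axis=1, keepdims=True)
    hi = window.max(axis=1, keepdims=True)
    # Assumed noise scale: one sixth of the per-sensor window range.
    noise = rng.normal(loc=0.0, scale=(hi - lo) / 6.0, size=(s, l))
    pad = np.clip(mean + noise, lo, hi)         # keep within window bounds
    return np.concatenate([pad, x], axis=1)

rng = np.random.default_rng(0)
long_run = np.arange(20, dtype=float).reshape(2, 10)
short_run = np.array([[1.0, 2.0, 3.0, 4.0],
                      [10.0, 10.0, 10.0, 10.0]])
trimmed = resize_run(long_run, 6, 3, rng)   # keeps the last 6 cycles
padded = resize_run(short_run, 6, 3, rng)   # prepends 2 synthetic cycles
```

Both branches leave the final cycles unchanged, which is where the failure-relevant information is concentrated.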

Data Degradation Strategy
The degradation data produced by the generative network should be processed into the same dimensions as the data used during training. The time-series degradation data can be expressed as an $s \times n$ matrix, where $s$ is the number of data features; for instance, bearing data may have features such as vibration, rotation speed, and temperature. $n$ represents the length of the data on the time scale, which reflects the working time or service life of the equipment; this value is directly related to the RUL.
All the real degradation data are sent into the CycleGAN to train the generator. The first batch of degradation data is sent into the trained generator to obtain the first batch of amplified data. Each subsequent batch of degradation data is obtained from the amplified data of the previous batch, and amplification continues until a predetermined amount of amplified data is obtained. To ensure that the amplified data retain the original degradation information during the iterative process, the number of amplification iterations should not be excessive.
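The iterative amplification loop described above can be sketched as follows. The function `amplify`, the cap `max_rounds=3`, and the stand-in `toy_generator` (which only adds small noise) are hypothetical; in the actual framework the callable would be the trained CycleGAN generator.

```python
import numpy as np

def amplify(real_batch, generator, target_count, max_rounds=3):
    """Feed each batch through the generator and reuse its output as the next input."""
    amplified = []
    batch = list(real_batch)
    for _ in range(max_rounds):              # cap the number of iterations
        batch = [generator(x) for x in batch]
        amplified.extend(batch)
        if len(amplified) >= target_count:   # stop once enough data exist
            break
    return amplified[:target_count]

rng = np.random.default_rng(0)
real = [rng.normal(size=(160, 14)) for _ in range(4)]

def toy_generator(x):
    # Stand-in for the trained generator: a slightly perturbed copy.
    return x + rng.normal(scale=0.01, size=x.shape)

synthetic = amplify(real, toy_generator, target_count=10)
```

Capping the rounds reflects the point made above: each pass moves the data one step further from the original records, so fewer passes retain more of the original degradation information. If the cap is reached first, fewer than `target_count` records are returned.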

1) Sliding Time Window Strategy
For RUL prediction on time-series degradation data, the problem of label identification needs to be solved. One intuitive and efficient method is the sliding time window method [11,15,33], where $X_i^{tw}$ is the $i$th window. Each time window records a segment of the degradation data. For a complete degradation sequence, we can obtain $k$ segments of degradation data and the RUL label of each segment in order.
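A minimal sketch of the sliding time window labeling is given below. It assumes a stride of one cycle, a record of shape (n, s) whose last row is the failure cycle, and a piecewise-linear label capped at 125 cycles (the threshold quoted later for this dataset); the window length of 30 is purely illustrative.

```python
import numpy as np

def sliding_windows(x, tw_len, rul_cap=125):
    """Cut x of shape (n, s) into k = n - tw_len + 1 windows with RUL labels."""
    n = x.shape[0]
    k = n - tw_len + 1
    windows = np.stack([x[i:i + tw_len] for i in range(k)])
    # RUL at the last cycle of window i is n - (i + tw_len) = k - 1 - i.
    rul = np.arange(k - 1, -1, -1)
    labels = np.minimum(rul, rul_cap)  # assumed cap for the normal stage
    return windows, labels

x = np.zeros((200, 14))                # one engine: 200 cycles, 14 sensors
windows, labels = sliding_windows(x, tw_len=30)
```

Each window inherits the RUL of its last cycle, so the final window of a run-to-failure record is labeled 0.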

2) Prediction Model
A non-linear mapping from data to labels is built by a data-driven method with sufficient labeled data. We use the deep BiLSTM network [5] to build the prediction model. The difference between LSTM and BiLSTM is that the latter adds a reverse pass over the data and contains more hidden layers. The structure of BiLSTM is shown in Figure 5. The final output $y_t$ of the bidirectional LSTM consists of three parts: the input of the model, the input of the forward propagation process, and the input of the reverse propagation process. Here, $w_1$ to $w_6$ represent network parameters, $x_t$ is the input at timestep $t$, $h_t$ is the value from the forward propagation process, $h'_t$ is the value from the reverse propagation process, and $g$ is the activation function.
Owing to the flexibility and versatility of the BiLSTM, stacking it into three layers yields a deep network with a stronger non-linear fitting ability, which is beneficial for RUL prediction. Under this framework, a mapping between time windows and RUL labels is established, as presented in Figure 6.
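For orientation, a commonly used simplified form of the bidirectional recurrence with six weights is sketched below. The assignment of $w_1$ to $w_6$ and the hidden activation $f$ are assumptions based on that common form and may differ from the formulas of this study.

```latex
\begin{align}
h_t  &= f\big(w_1 x_t + w_2 h_{t-1}\big)   && \text{forward pass} \\
h'_t &= f\big(w_3 x_t + w_5 h'_{t+1}\big)  && \text{reverse pass} \\
y_t  &= g\big(w_4 h_t + w_6 h'_t\big)      && \text{combined output}
\end{align}
```

The output at timestep $t$ therefore depends on both earlier and later parts of the time window.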

3) RUL Prediction Objective
The parameters of the prediction network are obtained through the backpropagation through time (BPTT) algorithm. The value function, given as Formula 22, is defined as the error between the model output and the label.

Algorithm Summary
The procedure of data amplification and RUL prediction is summarized in Algorithm 1. The entire flowchart of data amplification and RUL prediction is shown in Figure 7.

Experiment
An experiment was conducted to validate that the proposed data amplification strategy can improve the prediction effect of data-driven methods when training data are insufficient. We selected the multi-sensor turbo aero engine degradation dataset from NASA. This dataset contains the operational data of the complete life cycle of multiple turbo aero engines, and each engine is monitored by multiple sensors. The multi-sensor degradation data place higher requirements on the RUL prediction model and demonstrate the generality of the proposed method.

Data Preprocessing and Analysis
The turbo aero engine dataset is divided into four sub-datasets: FD001, FD002, FD003, and FD004. They differ only in operating conditions and failure modes, and no dependency exists among the sub-datasets. In this experiment, FD001 was selected as the experimental dataset. FD001 contains the complete degradation data of 100 turbine aero engines. The maximum life span is 362, which means that the longest-lived engine ran for 362 working cycles. Details of the dataset are shown in Table I.
The sensors are located in all important parts of the turbine aero engine and record the parameters that may be related to the corresponding degradation indicators. Data from more sensors are considered to provide more comprehensive information on engine degradation. Details are shown in Table II. A total of 21 sensors were used. Among them, 14 were related to the potential degradation mechanism during the entire degradation process; these sensors are numbered 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21. In the data preprocessing stage, to avoid interference from useless information, we select the information of these 14 sensors as target data. The life cycle of most equipment can be divided into a normal stage and a rapid degradation stage. For the purpose of RUL prediction, predicting the RUL in the rapid degradation stage is more important than in the normal stage. According to the work of Babu [2], a clear degradation trend appears when 125 cycles remain. The degradation failure threshold is therefore set to 125 cycles, as shown in Figure 8. Since the degradation trend in the normal stage is not obvious, the data are intercepted before entering the rapid degradation stage for use as training data, and the data length is set to 160 cycles.
To prevent differences in sensor numerical scales from increasing the difficulty of network training, we apply z-score normalization to all the training data: $x_i' = (x_i - u_i)/\sigma_i$, where $u_i$ is the mean value of the $i$th sensor and $\sigma_i$ is the corresponding standard deviation.
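The normalization step can be sketched as follows, assuming the statistics are computed per sensor on the training data and then reused for other splits. The guard for constant sensors and the toy data are assumptions added for illustration.

```python
import numpy as np

def zscore_fit(train):
    """Per-sensor mean and standard deviation from data of shape (cycles, sensors)."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # avoid dividing by zero
    return mu, sigma

def zscore_apply(x, mu, sigma):
    return (x - mu) / sigma

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(1000, 14))
mu, sigma = zscore_fit(train)
train_z = zscore_apply(train, mu, sigma)
```

Reusing the training statistics for validation and test data keeps information from those splits out of the preprocessing.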

Data Generated
The preprocessed data are sent to the amplification network as input. Different from the regression task, the customized generator is a single-layer LSTM network that prevents the output of the network from becoming highly abstract and affecting the expression of the details of the original degradation data.
The number of parameters in LSTM has a positive correlation with the complexity of the model. In this experiment, the number of parameters in LSTM is set to 160. To keep the input and output dimensions of the network consistent, a dense operation is conducted at the output of the network. For the discriminator, features with degraded information need to be extracted extensively, so a stacked three-layer LSTM network is applied, and the number of parameters in LSTM is set to 100.
The trained generator from CycleGAN is used to amplify the training data. The data selected in this experiment are all from FD001. We divide the 100 engine records into a training set, validation set, and test set in a ratio of 7:2:1. Within the training set, we use different numbers of records (10, 30, 50, and 70) to train the generator and explore the effect of various amounts of degradation data on the experimental results. The generators constructed from different amounts of training data are used to generate FD001 Unit 1. The obtained data are shown in Figure 9. For further explanation, we number the data shown in Figure 9.
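The split described above can be sketched as follows. The random shuffle, the seed, and the choice to make the 10-, 30-, 50-, and 70-record subsets nested are assumptions; the study does not state how the engines were assigned.

```python
import numpy as np

rng = np.random.default_rng(42)
units = rng.permutation(np.arange(1, 101))   # FD001 engine IDs 1..100

# 7:2:1 split into training, validation, and test engines.
train_units = units[:70]
val_units = units[70:90]
test_units = units[90:]

# Training subsets of increasing size used to train the generator.
subsets = {k: train_units[:k] for k in (10, 30, 50, 70)}
```

Splitting by engine rather than by time window keeps all windows of one engine in the same set.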
As shown in Figure 9, 1# indicates the original data, and 2#, 3#, 4#, and 5# are the degradation data produced by generators trained on different amounts of original data. Even when the generator is built from less training data, as in Figure 9(b), the model can still learn the approximate distribution of the samples. We compared the MMD differences between them, and the results are shown in Table III. Although these samples look similar on visual inspection, they are not simply copied.
Furthermore, to examine the difference in the overall distribution of the generated degradation data, we compared the MMD between the data in Figure 9 and all original training data, as shown in Figure 10.
As the amount of data participating in training increases, the MMD between the generated and target samples is gradually reduced, which means that the generated samples and the overall real samples are getting closer in distribution. At the same time, the degradation trend in the generated data becomes more pronounced, because the network summarizes the distribution of the overall training data and provides the most common distribution. The generated degradation data are close to the real data in distribution. As the amount of data increases, the difference between the generated data and real data narrows, which is also in accordance with expectations.
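The MMD comparison can be illustrated with a small sketch of a kernel two-sample estimate. The text does not specify the kernel or the estimator. This sketch therefore assumes a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma` and uses the simple biased estimator of the squared MMD.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of samples."""
    def mean_kernel(a, b):
        return sum(rbf_kernel(p, q, gamma) for p in a for q in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

A value near zero indicates that the two sample sets are close in distribution under the chosen kernel, and larger values indicate a greater discrepancy. In practice, each sample could be a flattened window of normalized sensor readings.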

C. RUL Prediction
To test whether the generated data can be used as training data to build a prediction network, and to explore the effect of our proposed model on training data of different sizes, we add the amplified data to the training data and establish several prediction networks. To meet the requirements of controlled experiments, we built several sets of prediction models using real and generated data. The details are presented in Table IV. To measure the prediction performance of the model, we evaluate the RUL predictions for the multi-sensor turbo aero engine with the RMSE and with the score function from the dataset provider. The score function has more practical significance: the penalty for a small prediction deviation is small, whereas that for a large prediction deviation is larger. The difference is shown in Figure 11. The training data are grouped as A#, B#, C#, and D# and used to construct the RUL prediction model. To reduce the influence of error on the experimental results, a prediction model is built 10 times for each set of data and evaluated on the test set. The average of the results is taken as the final result. The RMSE and scores are shown in Tables V and VI, and the results are plotted as histograms in Figures 12 and 13. In Figures 12 and 13, the blue bars show the general method, a model built using only the real data, and the orange bars show the proposed method, a model built using both the real data and the data generated from them. Note that both methods use the same predictive model; only the data used to build the model differ. When the MSE function is used to evaluate the test results, our proposed method achieves the best results in all four groups of experiments.
However, under the score function, the proposed method shows no obvious advantage in groups A# and B#, and the score of group B# is higher than that of group A#. In our analysis, the difference between the individual samples in the middle part of group B# and the test samples is extremely large, and the score function is closer to the actual situation, which results in the poor performance of models built at an extremely small training data scale. When the amount of real data increases, especially in groups C# and D#, the proposed method performs better than the model built from real data alone.
Intuitively, the RUL prediction for test set FD001 Unit 95 is shown in Figure 14. Prediction networks can be built from various amounts of training data, but the networks built from mixed data composed of real and generated data predict better. By comparing the MSE results, we find that their curves converge better than those of the models built from real data alone. In addition, prediction models with more training data predict better, especially at the end of the life cycle, where the prediction accuracy improves.
In the MSE evaluation, the proposed method also improves the prediction accuracy. The improvement is not obvious in the score evaluation with 10 and 30 samples, but with 50 and 70 samples the proposed method achieves better prediction scores.
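The two evaluation metrics used in this section can be sketched as follows. The formula of the score function is not printed in the text. The sketch therefore assumes the asymmetric exponential score commonly used with the NASA turbofan benchmark, with constants 13 for early predictions and 10 for late predictions. These constants should be checked against the dataset documentation before the sketch is relied on.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual RUL values."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rul_score(predicted, actual):
    """Asymmetric score: late predictions are penalized more than early ones.

    Assumption: d = predicted - actual, with exp(-d/13) - 1 for d < 0
    and exp(d/10) - 1 for d >= 0, summed over all test units.
    """
    total = 0.0
    for p, a in zip(predicted, actual):
        d = p - a
        total += math.exp(-d / 13.0) - 1.0 if d < 0 else math.exp(d / 10.0) - 1.0
    return total
```

Under this definition, lower values are better for both metrics. Overestimating the RUL by 10 cycles costs more than underestimating it by 10 cycles, which reflects the higher practical risk of a late maintenance action.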

Applicability analysis
The method proposed in this study is suitable for devices with multiple sensors whose degradation data are presented as time series. When only a small number of run-to-failure degradation records are available, the proposed method performs well, and the remaining useful life of the equipment can still be predicted effectively. When sufficient degradation data are available, the sample space of the degradation data is sufficiently complete and a prediction model established on this basis already performs well, so the proposed method offers limited improvement. Because obtaining a large amount of degradation data in actual industrial production remains difficult, the proposed method is still of practical significance.

Conclusion
In this study, a framework for predicting the RUL with insufficient data was proposed, which consists of two main parts. First, based on the characteristics of the sequential degradation data, an amplification network was designed using CycleGAN. Second, a sliding time window strategy and a deep BiLSTM network are jointly employed to construct the RUL prediction model based on the amplified degradation data. The following conclusions can be drawn: 1) A generative adversarial network, as an unsupervised deep learning network, can indeed learn relevant information about the data distribution. 2) The improved generative network based on LSTM can generate data with a distribution similar to that of the real data, and the RUL prediction network constructed using these amplified data has proved to be effective. 3) In cases where the RUL prediction accuracy is limited by the size of the training data, our proposed method provides a new reference for the development of RUL prediction. Some possible topics for future research include the following.
(1) In many applications, the test set and training set may come from different test conditions, under which the equipment workloads, environmental conditions, and noise levels may vary. This may lead to different distributions of the training set and test set. It would be interesting to improve the domain adaptability of our RUL prediction framework.
(2) Due to the variability of raw materials quantity and manufacturing accuracy, the degradation characteristics of individual units commonly show unit-to-unit variability. How to improve prediction accuracy while considering individual characteristics deserves further investigation.