RAEF: An Imputation Framework Based on a Gated Regulator Autoencoder for Incomplete IIoT Time-Series Data

The number of intelligent applications available for IIoT environments is growing, but when the time-series data these applications rely on are incomplete, their performance suffers. Unfortunately, incomplete data are an all too frequent phenomenon in the world of IIoT. A common workaround is to use imputation. However, current methods are largely designed to reconstruct a single missing pattern, whereas a robust and flexible imputation framework should be able to handle many different missing patterns. Hence, the framework presented in this study, RAEF, is capable of processing multiple missing patterns. Based on a recurrent autoencoder, RAEF houses a novel neuron structure, called a gated regulator, which reduces the negative impact of different missing patterns. In a comparison with state-of-the-art time-series imputation frameworks at a range of different missing rates, RAEF yielded fewer errors than all its counterparts.


Introduction
Today's IIoT sensors are capable of collecting an inordinate amount of data, and the applications built to process these data allow us to monitor, analyze, and understand how things in our physical world change over time [1]. However, to continue improving our capacity for time-series analysis, it is not enough to improve just the analysis methods with better context recognition [2], expanded service recommendations [3], improved anomaly detection [4], and so on. The quantity and quality of the time-series data also need to be improved. For the most part, improving data quality means making sure a data stream is comprehensive in scope and complete. Scope tends to be the easier of these two to address: simply adding more and different types of sensors will get the job done. Unfortunately, completeness is often a more common and difficult issue to overcome [5]. Data can be incomplete due to noise, sensor malfunctions, equipment error, human error, incorrect measurements, and other unavoidable circumstances [6]. As such, almost every data stream produced by a sensor will be incomplete to some degree [7].
Because incomplete data are so common, there are several methods of dealing with them. The first is to install redundant sensors as backups. If one sensor fails to capture some data, the other may still capture it. The main drawback with this solution is that two sensors cannot be in exactly the same place, nor do they tend to operate on exactly the same timing, so it can be difficult to align the temporal and spatial characteristics of the data [8]. Hence, a more common remedy has been some form of data manipulation: generally, either deletion or imputation [9].
Deletion is a simple and efficient answer when the amount of missing data is very small in comparison with the total. However, in applications that are very sensitive to time series, deleting a small number of records can be enough to destroy the coherence of a sequence and may seriously affect the correctness of the results. Further, most data analysis methods, especially machine-learning methods, require a complete set of time stamps and are not robust to missing data. In contrast, imputing missing data can reduce sensitivity and provide a complete set of time stamps. Hence, imputation has commanded the bulk of the research focus in recent decades [10,11]. The easiest methods of imputation simply replace the missing information with statistically reasonable values, such as means, modes, medians, or any predefined value [12]. However, while straightforward and convenient, the accuracy of such methods depends on the complexity of the samples. With small, basic samples, they work fine, but when the features become complex, these methods are not reliable. For example, imputing with multivariate data generally requires an algorithm based on clustering. Similar samples are grouped into the same cluster and then used to evaluate missing values group-by-group [13]. Rahman and Islam's [14] rough fuzzy k-means algorithm is one example of this type of clustering imputation. Here, the researchers exploited fuzzy expectation-maximization and fuzzy clustering to build a missing value imputation framework for data preprocessing. Raja and Sasirekha's [15] method was also designed to handle missing values, while Zhao et al. [8] developed a local similarity imputation method that estimates missing data based on the stacked autoencoder (SAE) fast clustering algorithm and the top k-nearest neighbors. There is no doubt that these clustering-based imputation methods yield excellent results.
However, clustering an entire time series is hugely time-consuming to the extent that these approaches cannot keep pace with today's dramatic increases in data volume. Additionally, there comes a point when the data may be too incomplete for these methods to work with any level of accuracy.
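The simplest statistical replacement discussed above can be sketched in a few lines. This is only an illustration under our own assumptions: missing readings are encoded as `None`, and the function name `impute_with_mean` is hypothetical rather than taken from any of the cited works.

```python
from statistics import mean


def impute_with_mean(series):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in series]


# Toy sensor readings with two gaps (assumed encoding: None = missing).
readings = [20.0, None, 22.0, None, 24.0]
imputed = impute_with_mean(readings)
```

Every gap receives the same value regardless of where it sits in the sequence, which is why such methods tend to break down once the features become complex.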
In the IIoT paradigm, sensor data have special properties. For instance, in many systems, multiple sensors are used to record the same or similar measurements [16]. Sensors that are geographically close to each other tend to be highly correlated for certain periods of time [17]. This means that missing data can sometimes be imputed from the associated sensors, whether spatially or temporally. In these situations, modeling the time series and then applying an imputation method such as smoothing or interpolation [18] can be a good choice.
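As a minimal sketch of temporal interpolation, the function below fills each gap linearly from its nearest observed neighbours. The `None` encoding for missing readings and the choice to fill leading or trailing gaps with the nearest observed value are assumptions made for illustration only.

```python
def interpolate_linear(series):
    """Fill None gaps by linear interpolation between observed neighbours.

    Gaps at the start or end of the series are filled with the nearest
    observed value (an assumed convention).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


interpolated = interpolate_linear([10.0, None, 20.0, None])
```

The estimate for a gap depends only on the two closest observations, which is cheap to compute but cannot capture long-term correlations.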
Generally, smoothing or interpolation methods have a low computational overhead and are simple to implement, although they are not suitable for finding long-term correlations in time-series data. Machine-learning techniques, such as generative adversarial models [19-22] and recurrent neural networks (RNNs), can correlate features, which can improve imputation performance. Among them, RNNs are known to be good at modeling time series, and for this reason, many hybrid-RNN methods have been developed. Hybrid variants are favored because vanilla RNNs estimate missing values from the data immediately preceding the gap. For instance, Kim et al. [23] devised an RNN model to impute missing medical examination data. The time series are modeled by RNNs, which compensate for the missing measurements and then predict future values. Minseok et al. [24] developed an imputation framework called DeepIN based on correlation information. DeepIN uses a deep network consisting of multiple LSTMs arranged according to the correlation information of each IIoT device. Ma et al.'s [25] LIME-RNN (linear memory vector recurrent neural network) models incomplete time series. A learnable linear combination of previous history states means gradient information can be propagated efficiently. In this way, LIME-RNN can take full advantage of the previously observed information to reduce the negative impact of missing values. Alternatively, Li et al. [26] proposed a multi-view learning method for estimating missing values in time-series traffic data that combines RNNs and collaborative filtering techniques.
There is a large body of papers on imputing incomplete time series that either assume any missing data at the current time step are the same as at the previous time step [25,27,28] or apply a decay mechanism to a hidden state to impute the missing data [29-31]. Yet, with RNNs, imputation performance suffers when the missing values become continuous. Further, the above imputation strategies can lead to instability during training, and with high missing rates, decay mechanisms will not find sufficient hidden information.
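The "same as the previous time step" assumption can be illustrated with a short forward-fill sketch. The gap counter returned alongside the filled values is our own addition for illustration; it shows how stale the carried value becomes when missing readings are continuous, which is the situation in which these strategies degrade.

```python
def forward_fill(series):
    """Carry the last observed value forward over None gaps.

    Returns the filled series and, for each step, the number of
    consecutive steps since the last real observation.
    """
    filled, gaps = [], []
    last, gap = None, 0
    for value in series:
        if value is None:
            gap += 1
        else:
            last, gap = value, 0
        filled.append(last)  # stays None until a first observation arrives
        gaps.append(gap)
    return filled, gaps


filled, gaps = forward_fill([1.0, None, None, 2.0, None])
```

A decay mechanism would typically shrink the influence of the carried value as the gap counter grows; with long gaps, little usable information remains.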
Another branch of investigation in the search to improve imputation performance is missing patterns. In this stream, Minseok et al. [24] compared the effects of missing continuity and discontinuity on imputation performance. Anindita et al. [32] and Tsai and Chang [33] considered the missing patterns of arbitrariness and monotonicity in medical data. Insuwan et al. [34] found that the rating data present a special missing pattern caused by user preference genres. Tak et al. [35] distinguished and contrasted the missing patterns in traffic data caused by prolonged physical damage to the sensors and measurement noise. However, to the best of our knowledge, no special missing patterns for IIoT environments have been proposed.
This study is an attempt to change that. As such, our contributions are as follows:
(1) We propose a framework based on a recurrent autoencoder, called RAEF. The encoder turns an incomplete time series into vector representations of both local information and global information. The decoder is then initialized with the global information and decodes the local information into complete time-series data.
(2) As an alternative to decaying the hidden state, a gated regulator inside RAEF focuses on discriminating between ground truth information and fictitious information. This mechanism is better able to reduce the negative impact of increased missing rates in different missing patterns.
(3) In empirical evaluations in a real IIoT environment, RAEF proves to be effective. Additionally, comparisons between RAEF and several state-of-the-art frameworks demonstrate that RAEF results in fewer errors at each missing rate tested.
The remainder of this study is organized as follows. Section II presents the problem formulation and some necessary preliminaries. Section III describes RAEF's structure. Section IV presents the details of the experiments and results, and Section V concludes the study.

Preliminary
A binary missing mask $m_t \in \{0, 1\}^D$ is applied when generating representations of the data, where $m_t$ denotes which features are missing at time step $t$. The features missing at time step $t$ can be described as follows:
Thus, an incomplete sequential time series is denoted as
The following rule is applied when training the model to create an artificial incomplete time series: (2)

Analysis of Missing Pattern.
From an analysis of a large amount of time-series data from a real IIoT environment, we found that the missing patterns in incomplete time series (ITS) fall into two main types: univariate missing and common-mode missing. Univariate missing data are the most common pattern, which often appears as a series of reading losses in a single sensor over a short period of time, as shown in Figure 1. The usual cause is a fault in the sensor itself. For simplicity, we have only considered recoverable cases in this study, namely, those where the data collection can be recovered in a limited time. Here, $k_{\max}$ is the maximum length of continuous missing data, noting that, in general, $k_{\max} = D$.
The other type of missing pattern is common-mode missing data, also known as common-mode failure. In these cases, a large number of sensors fail to upload their readings at the same time. Usually, this is caused by some external factor, such as a disk error, a network communications error, or human intervention [36]. Figure 2 shows an example of this type of missing pattern.
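The two missing patterns can be made concrete with a small mask generator. This is a sketch under stated assumptions: the mask convention (1 = observed, 0 = missing) and the helper names are ours, and the gap is placed at a fixed position rather than sampled.

```python
def univariate_mask(T, D, sensor, start, length):
    """One sensor loses `length` consecutive readings; others are intact."""
    mask = [[1] * D for _ in range(T)]
    for t in range(start, min(start + length, T)):
        mask[t][sensor] = 0
    return mask


def common_mode_mask(T, D, start, length):
    """All D sensors lose their readings for `length` consecutive steps."""
    mask = [[1] * D for _ in range(T)]
    for t in range(start, min(start + length, T)):
        mask[t] = [0] * D
    return mask


def missing_rate(mask):
    """Fraction of cells marked as missing."""
    cells = [m for row in mask for m in row]
    return cells.count(0) / len(cells)


um = univariate_mask(T=6, D=3, sensor=1, start=2, length=2)
cm = common_mode_mask(T=6, D=3, start=1, length=2)
```

For the same gap length, the common-mode mask removes every feature at the affected time steps, so no concurrent reading from another sensor is left to support the imputation.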

Recurrent Neural Networks.
Recurrent neural networks (RNNs) are especially suited to dealing with temporally and spatially correlated information because they process historical information recursively and model historical memory. RNNs are neural networks that work on a variable-length sequence $X = (x_1, x_2, \ldots, x_T)$ by maintaining a hidden state $h$ over time. At each time step $t$, the hidden state $h_t$ is updated by Equation (3), where $f$ is an activation function. Often, $f$ is as simple as performing a linear transformation on the input vectors, summing them, and applying an element-wise logistic sigmoid function. $z_t$ is an internal intermediate state, and the model parameters are symbolized by $\theta$. Further, we can simplify the RNN at time step $t$ as a function $f_{RNN}$, formulated in Equation (4), where $f_{RNN}$ encapsulates the different RNN variants. LSTMs [37] and gated recurrent units (GRUs) [38] are both very popular RNN variants.

Figure 3 shows the structure of RAEF. It learns to encode a sequence that may contain missing data and then to decode the resulting vectors back into sequential time-series data without missing data. Note that the basic neuron used in RAEF includes a novel gated regulator (GR).
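The recurrent update described in the text (linear transformations of the input and previous hidden state, summed and passed through an element-wise logistic sigmoid) can be sketched as follows. The parameter names `W_x`, `W_h`, and `b` are assumptions introduced for illustration; the paper's own parameterization is not reproduced here.

```python
import math


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = sigmoid(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for j in range(len(b)):
        z = b[j]  # intermediate pre-activation state for unit j
        z += sum(W_x[j][i] * x_t[i] for i in range(len(x_t)))
        z += sum(W_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(1.0 / (1.0 + math.exp(-z)))
    return h_t


def run_rnn(X, W_x, W_h, b):
    """Apply rnn_step over a sequence, starting from an all-zero state."""
    h = [0.0] * len(b)
    states = []
    for x_t in X:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states


# With all-zero parameters every pre-activation is 0, so each unit is 0.5.
zeros = [[0.0, 0.0], [0.0, 0.0]]
states = run_rnn([[1.0, 2.0], [3.0, 4.0]], zeros, zeros, [0.0, 0.0])
```

Note that the step needs every component of `x_t`; a missing reading leaves the update undefined, which is the issue the encoder in the next subsection addresses.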

RNN Encoder.
The encoder is a model based on an RNN or one of its variants. In our case, since $x_t$ may have missing data, it cannot always be used to update $h_t$ as per Equation (4). So, when $x_t$ is missing, the output of the previous time step is used instead. The information in this previous time step is a type of local information. Further, with the mean of $x_t$ across time steps denoted as $\bar{x}$, $r_t$ and $\bar{x}$ can be described as in Equation (5). Formally, the initial hidden state $h_0$ is initialized as an all-zero vector. From $t = 1$ to $T$, the model is updated by Equation (6), where $c$ is a learnable scalar, initialized as 0. Introducing a learnable $c$ allows the network to first rely on the cues in $\bar{x}$. Gradually, it learns to assign more weight to $r_t$.
Hence, the encoder can be described as $E(x, h_e; \theta_e)$, where $x$ is the sequential input, $h_e$ is a hidden state, and $E$ is a differentiable function represented by Equation (6) with the parameters $\theta_e$. Once the sequential time-series data $X$ have been fed into the encoder, $R = (r_1, r_2, \ldots, r_T)$ is recorded, and a vector $h_c = g(h_1, h_2, \ldots, h_T)$ is generated that contains global information about the full input sequence, where $g(\cdot)$ is some nonlinear function. Here, we consider a simple deployment and so assume that $g(h_1, h_2, \ldots, h_T) = h_T$. The loss function of the encoder is as follows:

Complexity
where $\lambda_t$ is a weighting coefficient that represents the importance of the previous imputation at each time step. Intuitively, the imputation does not need to be overly precise for the first few time steps of training the encoder. Commonly, it is assumed that:

RNN Decoder.
The decoder is also a model based on an RNN or one of its variants. It aims to decode the sequence $R$ from the encoder back into sequential time-series data without missing values. $h_c$ is used to initialize the hidden state of the decoder. Note that, according to Equation (6), $R = (r_1, r_2, \ldots, r_T)$ is considered to be a replacement for $(x_2, x_3, \ldots, x_{T+1})$. Hence, the decoder works backwards, reading the sequence in reverse order (i.e., from $r_T$ to $r_1$). The sequential outputs of the decoder can be derived using Equation (3) and are denoted as $y_T, y_{T-1}, \ldots, y_1$.
Hence, the decoder can be described as $D(r, h_d; \theta_d)$. Finally, the decoder trains the parameters by minimizing the errors between the output $y_t$ and the input sequential time-series data $x_t$. The loss function is defined as follows, where $L_e$ uses the absolute error:

Gated Regulator.
Since the operation of the encoder is represented in Equations (5) and (6), the input data at each time step are not completely consistent in authenticity: some values are observed, and others are substituted. Intuitively, if the imputation framework can evaluate the authenticity of the input data at an early stage, before calculating the candidate state, the hidden state can absorb less inaccurate information. A gated structure, i.e., a gated regulator, is therefore integrated into the encoder, as shown in Figure 4. The motivation is to allow the encoder to decide how much of the current hidden state $h_t$ will gain its information from the current input without introducing extra information. Formally, this can be described as follows:
With the gated regulator, Equation (6) becomes:
Note that the gated regulator is an independent structure, which means it is compatible with an RNN or any of its variants. As an example, LSTM-GR means an LSTM with the gated regulator.
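To illustrate why the encoder's inputs differ in authenticity, the sketch below builds the value fed to the RNN at a single time step. It is an assumption-laden illustration, not the paper's Equations (5) and (6): we assume a mask convention of 1 = observed, and we assume the substitute for a missing entry is a convex blend of the previous output and the feature mean, weighted by the learnable scalar $c$. This blend is chosen only because it matches the description that the network first relies on the mean when $c = 0$ and gradually shifts weight to $r_t$. The gated regulator itself is not implemented here.

```python
def encoder_input(x_t, m_t, r_prev, x_bar, c):
    """Assemble one time step of encoder input (hypothetical form).

    x_t    : raw readings, None where missing
    m_t    : mask, 1 = observed, 0 = missing (assumed convention)
    r_prev : encoder output carried over from the previous time step
    x_bar  : per-feature mean across time steps
    c      : learnable scalar, initialised at 0
    """
    u_t = []
    for d in range(len(x_t)):
        if m_t[d] == 1:
            u_t.append(x_t[d])  # authentic observation
        else:
            # Substituted value: blend of local and global information.
            u_t.append(c * r_prev[d] + (1.0 - c) * x_bar[d])
    return u_t


at_start = encoder_input([1.0, None], [1, 0], [0.9, 0.4], [1.2, 0.6], c=0.0)
later_on = encoder_input([1.0, None], [1, 0], [0.9, 0.4], [1.2, 0.6], c=1.0)
```

In both calls the first component is a real reading while the second is fictitious, which is exactly the distinction a gate placed before the candidate-state computation is meant to exploit.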

Training Process.
To prevent vanishing or exploding gradients while backpropagating through RAEF, the training algorithm, Algorithm 1, prescribes that the encoder and decoder are trained asynchronously. The training process is therefore divided into three parts:
(1) Input $x$ into the encoder, and update the encoder by descending its gradient $\nabla_{\theta_e} \ell_{encoder}$.
(2) Record the encoder's output $r$.
(3) Input $r$ into the decoder, and update the decoder by descending its gradient $\nabla_{\theta_d} \ell_{decoder}$.
Throughout, weight clipping of $\theta_e$ is used to limit changes in the encoder's gradient.
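The three-part schedule and the clipping step can be sketched as below. `ToyEncoder` and `ToyDecoder` are hypothetical stand-ins that only mimic the interface needed to show the order of operations; they perform no real gradient computation, and the clip bound of 0.01 is an arbitrary illustrative value.

```python
def clip_weights(theta, c):
    """Clamp every parameter to the interval [-c, c]."""
    return [max(-c, min(c, w)) for w in theta]


def train_one_batch(batch, encoder, decoder, clip_bound):
    """One asynchronous update over a mini-batch."""
    encoder.update(batch)                                   # (1) encoder step
    encoder.theta = clip_weights(encoder.theta, clip_bound)
    r = [encoder.encode(x) for x in batch]                  # (2) record output
    decoder.update(r, batch)                                # (3) decoder step
    return r


class ToyEncoder:
    """Hypothetical stand-in; update() deliberately overshoots."""

    def __init__(self):
        self.theta = [0.0, 0.0]

    def update(self, batch):
        self.theta = [w + 5.0 for w in self.theta]

    def encode(self, x):
        return list(x)


class ToyDecoder:
    """Hypothetical stand-in that only counts its updates."""

    def __init__(self):
        self.calls = 0

    def update(self, r, batch):
        self.calls += 1


enc, dec = ToyEncoder(), ToyDecoder()
recorded = train_one_batch([[1.0, 2.0]], enc, dec, clip_bound=0.01)
```

Because the decoder only ever sees the recorded output `r`, its update does not propagate gradients back through the encoder, which is the point of training the two parts asynchronously.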

Experiments Details
Our experience of real-world IIoT data, as shown in Figure 5, is that a great many data points can be missing from time series collected in these environments. The levels shown in Figure 5 indicate just how widespread the problem of missing data is in IIoT environments.
This disturbing phenomenon not only affects the ability to monitor devices in real time but also reduces the accuracy of any subsequent analysis done by downstream applications.
In a series of analyses, we compare imputation with RAEF to several state-of-the-art imputation frameworks based on RNNs. Then, we illustrate how incomplete time-series imputation can improve the effectiveness of data applications. Last, we discuss the choice of $T$.

Dataset and Experiment Setup.
The datasets used in the experiment are summarized in Table 1.

UCI Air Quality Data (UAQ).
The UCI dataset contains 9358 records of average hourly responses from an array of 5 metal oxide chemical sensors embedded in an air quality chemical multi-sensor device, taken between March 2004 and February 2005. The air quality data points have 12 features, and 7.5% of the values are missing. After removing the records with missing data, we randomly selected 20% of the data for testing and used the rest for training. Pearson's correlations between each feature are shown in Figure 6. This dataset can be thought of as an incomplete time-series dataset of a real IIoT environment that is rich in information and has a low- to middle-level missing rate.

Base Station Status Data (BSS).
This dataset was collected from an ePLCM002FR edge node, developed by Hangzhou Yiyitaidi Information Technology Co., Ltd. and deployed in a base station located at the Spring Shopping Mall in the Zhangdian District, Zibo, Shandong Province (see Figure 7). The dataset comprises 14,820 data readings taken between February 2018 and February 2019. Every data point contains six attributes: the temperature and current intensity of two rectifiers, the air conditioning setting temperature, and the environmental temperature. Of the values, 18.2% are missing. We used the data collected for May and September 2018 and February 2019 for testing. The remaining data were used for training.
Pearson's correlations between each feature are shown in Figure 8. Compared with the UAQ dataset, the BSS dataset has shorter collection cycles, lower data dimensions, and a higher missing rate. To stabilize the training with each dataset, we normalized the raw data via a linear transformation using the maximum and minimum (min-max normalization) before the experiment. However, because the BSS dataset does not contain any ground truth labels for its missing values, experimenting with the actual missing values was not possible.
Thus, we simulated missing data by randomly omitting data according to different missing rates and used the real values as ground truth labels. Table 1 provides the details.
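The two preprocessing steps just described, min-max normalization and random omission at a target missing rate, can be sketched as follows. The helper names, the `None` encoding for omitted values, and the fixed random seed are assumptions made for this illustration.

```python
import random


def min_max_normalize(column):
    """Linearly rescale a feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in column]


def drop_at_random(column, missing_rate, seed=0):
    """Omit a fraction of values; return corrupted data and missing indices."""
    rng = random.Random(seed)
    n_missing = round(len(column) * missing_rate)
    missing_idx = set(rng.sample(range(len(column)), n_missing))
    corrupted = [None if i in missing_idx else v for i, v in enumerate(column)]
    return corrupted, sorted(missing_idx)


normalized = min_max_normalize([10.0, 20.0, 30.0])
complete = [float(v) for v in range(10)]
corrupted, missing_idx = drop_at_random(complete, missing_rate=0.2)
```

Keeping the list of omitted indices alongside the untouched `complete` series is what makes the original values usable as ground truth labels during evaluation.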
The results were assessed in terms of mean absolute error (MAE) and mean relative error (MRE). In their definitions, $\Omega$ denotes the index set of the missing values, and $S(\cdot)$ denotes the size. $x_i$ is the ground truth of the $i$th missing item, and $\hat{x}_i$ is its imputed value.
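As a minimal sketch, the two metrics can be computed over the missing indices as shown below. The MAE follows directly from the definitions above. For the MRE, we assume the convention that is common in the time-series imputation literature (total absolute error divided by the total absolute ground truth); the exact form used in this study may differ.

```python
def mae(truth, imputed, missing_idx):
    """Mean absolute error over the index set of missing values."""
    total = sum(abs(truth[i] - imputed[i]) for i in missing_idx)
    return total / len(missing_idx)


def mre(truth, imputed, missing_idx):
    """Mean relative error (assumed form: sum |error| / sum |truth|)."""
    error = sum(abs(truth[i] - imputed[i]) for i in missing_idx)
    scale = sum(abs(truth[i]) for i in missing_idx)
    return error / scale


truth = [1.0, 2.0, 4.0]
imputed = [1.0, 2.5, 3.0]
missing_idx = [1, 2]  # only the imputed positions are scored

mae_value = mae(truth, imputed, missing_idx)  # (0.5 + 1.0) / 2
mre_value = mre(truth, imputed, missing_idx)  # 1.5 / 6.0
```

Restricting both sums to the missing indices means that values the framework merely copies from the input do not inflate the scores.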
bidirectional recurrent dynamical system without any specific assumptions. (4) LIME-LSTM [25]: a novel framework for modeling incomplete time series based on LIME-RNN using an LSTM, where a network learns the residual connection between time steps and implements a linear combination of previous historical states.
Note that BM and KNN are common approaches to imputation. BRITS and LIME-LSTM are both imputation frameworks for time series based on RNNs.

Implementation Details.
We developed two implementations of RAEF: one with an LSTM and the other with a GRU. Further, we configured each model with and without a gated regulator to result in four baselines as follows.
For all methods, we fixed the parameters of the RNNs to be the same. The dimension of the hidden state was $n = 64$, and the learning rate was $\alpha = 0.0001$. In deploying the RNN-based models, we cut the datasets into sequences with a fixed length of $T$ and input $ms$ samples at once for training. The settings for the values of $T$ and $ms$ are shown in the last two rows of Table 1 and were applied to all RNNs consistently. Note that, in the training process, instead of using a validation set, we ended the training when the training loss leveled off.
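Cutting a dataset into fixed-length sequences and grouping them into mini-batches can be sketched as follows. Whether the windows overlap is not stated in the text, so non-overlapping windows with the incomplete tail dropped are an assumption here, as are the helper names.

```python
def make_windows(series, T):
    """Cut a series into non-overlapping windows of length T.

    Any incomplete tail shorter than T is dropped (assumed convention).
    """
    return [series[i:i + T] for i in range(0, len(series) - T + 1, T)]


def make_batches(windows, ms):
    """Group windows into mini-batches of at most ms samples."""
    return [windows[i:i + ms] for i in range(0, len(windows), ms)]


windows = make_windows(list(range(10)), T=4)
batches = make_batches(make_windows(list(range(12)), T=4), ms=2)
```

Because every RNN-based model receives the same windows and batch size, differences in imputation error can be attributed to the models rather than to the data pipeline.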
Our experimental procedure had three main steps. First, we randomly deleted data from the complete time series to mimic the different missing patterns at different missing rates. We then split the data into training and testing sets according to the proportions mentioned in Section A. Second, we trained all the frameworks. Third, we used the different frameworks to generate imputation results for the testing set and evaluated the frameworks by comparing the results with the ground truth data in terms of the evaluation metrics.
All experiments were run on the TensorFlow platform using an Intel Core i7-8700K, 3.60-GHz CPU with 16-GB RAM, and a GeForce RTX 2080 8G. Tables 2 and 3 show the results of the imputations, where MP denotes missing pattern and MR stands for missing rate. From these results, we drew the following observations.

Imputation Performance for Single Missing Pattern.
(1) Border mean (BM) was quite inaccurate and became less accurate as the missing rate increased.
(2) KNN was not effective for imputing missing values with the common-mode pattern because the distance between the samples often could not be measured given the complete loss of all attributes. KNN was able to achieve a result with the univariate-type missing patterns at low missing rates but was sensitive to changes in this rate, and its performance grew worse as the rate increased.
(3) LIME-LSTM, with its unidirectional RNNs, did not perform as well as the frameworks that contain a bidirectional RNN, i.e., BRITS and RAEF.
(4) LIME-LSTM and BSS could not cope with high missing rates. At low missing rates, RAEF and BRITS demonstrated similar performance. However, as the missing rate increased, RAEF performed significantly better than BRITS, especially the LSTM-GR implementation.

Algorithm 1 (excerpt):
(6) $r^{(i)} \leftarrow E(x^{(i)}, h_e; \theta_e)$.
(7) $g_{\theta_e} \leftarrow g_{\theta_e} + \nabla_{\theta_e} \ell_{encoder}$.
(8) end for
(9) $\theta_e \leftarrow \theta_e + \alpha \cdot \mathrm{RMSProp}(\theta_e, g_{\theta_e}/ms)$.
(10) $\theta_e \leftarrow \mathrm{clip}(\theta_e, -c, c)$.
... by Equation (7).
(13) for $i = 1$ to $ms$ do
(14) $h$ ...

In addition to these basic observations, we also noted some distinguishing performance features when comparing the gated regulator variants of RAEF to the plain version. In general, the gated regulator implementations of RAEF both outperformed the other frameworks within a limited range of missing rates and had obvious advantages under higher missing rates. From 5% to 15% missing rates on UAQ, the percentage increase in MRE over the non-gated versions of RAEF with the common-mode patterns was 7.46% and 23.54%, respectively. With the univariate pattern, this number was 8.57% and 20.23%. We can see the same trend with the BSS dataset. These results suggest that the gated regulator was able to reduce the negative impact of increased missing rates with both types of missing patterns.

Imputation Performance for Mixed Missing Pattern.
Ideally, the missing data in a time series will conform to a single pattern, either univariate or common mode.
However, there are occasions where both patterns will be present. For this series of experiments, we fixed the missing rates at 10% for the UAQ dataset and at 20% for BSS. We then simulated the following patterns of missing data in the time series: 100% univariate (UM), 20% common mode (CM)/80% UM, 40% CM/60% UM, 60% CM/40% UM, 80% CM/20% UM, and 100% CM. Figure 9 shows the results. RAEF-LSTM-GR was the clear top performer, with significantly better results than the others.

Task: Imputing Missing Values in an Incomplete Time Series.
To show more clearly the importance of data imputation for downstream applications, we undertook a prediction task using incomplete time-series data and compared the results to the same task using imputed data. To approximate different real application scenarios, we performed the tasks with a range of missing rates. More specifically, we prepared versions of the UAQ dataset with missing rates of 5%, 10%, and 15% and conducted three groups of experiments, A, B, and C, as follows:
(1) A: impute with RAEF-LSTM and then use an LSTM for prediction
(2) B: impute with BRITS and then use an LSTM for prediction
(3) C: impute with LIME-LSTM and then use an LSTM for prediction
Groups A-C all used the same LSTM for predictions, which contained 64 neurons and was trained with a complete dataset. The difference between the prediction results and the ground truth data was measured using MAE.
Each experiment was repeated 50 times, and the results were recorded as shown in Figure 10.

The Choice of T.
To assess the effect of this choice, we varied the value of $T$. As shown in Figure 11, RAEF-LSTM-GR generally delivered the best performance for each dataset. But, as the missing rate changed, the optimal value of $T$ was slightly different. At higher rates, RAEF-LSTM-GR preferred a larger $T$ to obtain more information from the input time series. However, when $T$ was too large, performance dropped, indicating that the model was affected by an exploding gradient.

Figure 9: Imputation performance for the mixed missing pattern in terms of MRE (%), where aCbU means that, at a fixed missing rate, a percent of the missing data followed a common-mode pattern and b percent followed a univariate pattern.

Conclusion
This study presents RAEF, an imputation framework for IIoT environments based on a recurrent autoencoder. RAEF identifies the missing patterns in incomplete time series and uses them as a guide to impute the values that are missing. As part of this research, we summarized, for the first time, the missing patterns in incomplete IIoT time-series data. Unlike some other methods, which decay the hidden state, RAEF uses a gated regulator to reduce the negative impact of larger missing rates. Tests on both synthetic and real data show that RAEF is more robust, is more flexible, and returns fewer errors than other state-of-the-art imputation frameworks designed for time-series data.
Data Availability
The BSS and UAQ data used to support the findings of this study are available from the corresponding author upon request.