Electronic medical records imputation by temporal Generative Adversarial Network

The loss of electronic medical records has seriously affected the practical application of biomedical data, so effectively filling in the lost data is a meaningful research effort. Currently, state-of-the-art methods focus on using Generative Adversarial Networks (GANs) to fill the missing values of electronic medical records, achieving breakthrough progress. However, when facing datasets with high missing rates, the imputation accuracy of these methods decreases sharply. This motivates us to explore the uncertainty of GANs and improve GAN-based imputation methods. In this paper, the GRUD (Gate Recurrent Unit Decay) network and the UGAN (Uncertainty Generative Adversarial Network) are proposed and combined into a model called UGAN-GRUD. UGAN-GRUD uses the GAN to generate imputation values and then leverages GRUD to compensate them. The UGAN learns the distribution pattern and uncertainty of the data by iterating between the Generator and the Discriminator. The GRUD compensates the UGAN through a time decay factor, which lets it learn the specific temporal relations in electronic medical records. Experiments on publicly available biomedical datasets show that UGAN-GRUD outperforms the current state-of-the-art methods, with average improvements of 13% in RMSE (Root Mean Squared Error) and 24.5% in MAPE (Mean Absolute Percentage Error).


Introduction
Electronic medical records are often lost due to equipment failures, data transmission interruptions, and other reasons [1]. As a result, the final collections of electronic medical records are often sparse and irregular. To fill in the lost values in electronic medical records, most state-of-the-art methods currently employ Generative Adversarial Networks (GANs) [2], which can learn the distribution of the original dataset and generate imputation values. However, when the missing rate of the dataset is high, there is a significant deviation between the data distribution learned by GANs and the actual data distribution, which leads to a sharp decrease in the accuracy of missing value imputation. Figure 1 shows the missing situation of Health-care [3], a publicly available electronic medical record dataset.
In Fig. 1, there are a total of 42 physiological attributes and 196,000 records. In Fig. 1a, BUN, Bilirubin, Cholesterol, Creatinine, DiasABP, FiO2, GCS, Glucose, HCO3, HCT, HR, K, Lactate, and MAP are physiological attributes, and the ordinate represents the serial number of records in the dataset. The values of Bilirubin, Cholesterol, and HCO3 are the most severely lost. For this high-missing-rate dataset, the imputation errors of the state-of-the-art GAN-based methods are quite high, as shown in Fig. 1b. Our method performs well in Fig. 1c, achieving an average improvement of 13.0% in RMSE and 24.5% in MAPE, where the abscissa is the missing rate and the ordinate is the error.
From Fig. 1, it is apparent that the Health-care dataset has massive missing values. When using and storing this dataset, if the "deletion" method [4] is employed, almost all the records in the dataset will be deleted. If the "mean" or "zero-value" imputation method [5] is utilized, the filled dataset will differ significantly from the original dataset. If the time series relation is exploited for imputation, it is impossible to establish an effective time series-based prediction model because too many values are missing. If GANs are utilized to learn the distribution of the original data, the imputation accuracy is somewhat improved, but it still does not reach the level required for practical application.
Fig. 1 High-missing-rate dataset and imputation methods. For such a high-missing-rate dataset (Fig. 1a), the imputation effect of current GAN-based methods is shown in the bottom-left (Fig. 1b), and the effect of our approach is shown in the bottom-right (Fig. 1c), which achieves an average improvement of 13.0% in RMSE (Root Mean Squared Error) and 24.5% in MAPE (Mean Absolute Percentage Error).
Given those practical issues, GAN-based imputation of data with high missing rates has been increasingly studied in recent years, with multivariate time series GANs being a research hotspot. Early work attempted to employ GANs to learn the distribution patterns of multivariate electronic medical records [6,7]. In recent years, methods combining multivariate time series data mining and GANs for missing value imputation have emerged. For example, Miao et al. [8] explored the time series classification method and the GAN model, and proposed a semi-supervised GAN imputation approach. Cao et al. [9] investigated the Recurrent Neural Network (RNN) model and proposed a time series data imputation method based on a bidirectional RNN. Wang et al. introduced the attention mechanism [6] and proposed the STA-GAN model [10]. Based on these, Benchekroun et al. [11] studied the characteristics of heart rate variability physiological data with high missing rates, and applied several missing value imputation methods to fill these data.
Although these methods have proven effective, the uncertainty of GANs has not been considered, nor has the role of GRUs (Gate Recurrent Units) based on time decay been explored in missing value imputation. This could potentially provide another method for electronic medical records imputation. This has motivated us to explore the utilization of time decay compensation and the UGAN (Uncertainty Generative Adversarial Network), which allows traditional GANs and GRUs to work together and form a new missing value imputation method, UGAN-GRUD (Uncertainty GAN-Gate Recurrent Unit Decay). In UGAN-GRUD, to overcome the challenge of capturing the distribution pattern of high-missing-rate datasets, we introduce the uncertainty matrix unit $U$ into the GAN to form the UGAN, which is an improvement not considered in existing methods. To utilize the time interval information and $U$ in the dataset, we introduce the time decay factor $v_t$ into GRUs to form the GRUD (Gate Recurrent Unit Decay) network. We propose a dual-network collaborative training mechanism, where the uncertainty matrix $U$ in the GAN is output and utilized to guide the training of GRUD. Compared to the current state-of-the-art methods, our approach better captures the distribution of high-missing-rate datasets and performs more accurate imputation. Experimental results demonstrate that our method outperforms existing state-of-the-art methods.
The main contributions of this paper can be summarized as follows:
(1) We propose the UGAN-GRUD model for the first time, where UGAN is employed to learn spatial distribution patterns and GRUD is leveraged to learn time series patterns. The combination of the two improves the imputation accuracy of high-missing-rate datasets.
(2) We propose an improved method for GAN, called UGAN, which includes Generator G, Discriminator D, and uncertainty matrix U, and which can capture the distribution patterns and uncertainty of the dataset more accurately. We also propose an improved method for GRU based on a time decay factor, called GRUD, which can further improve the imputation accuracy.
(3) We theoretically and experimentally demonstrate that the proposed UGAN-GRUD achieves better performance, and also discuss the impact of dataset dimensions on UGAN-GRUD.
The organization of the paper is as follows. Related work is introduced in Sect. 2. The proposed UGAN-GRUD model is detailed in Sect. 3, including the architecture of the model and the design of its components. In Sect. 4, experiments are conducted on three publicly available electronic medical record datasets, and the results are compared and analyzed. Section 5 concludes the study and discusses future research directions.

Related work
In addition to the current state-of-the-art missing value imputation methods based on GANs, there are many other missing value imputation methods. In this section, we conduct a literature review of missing value imputation methods from four aspects: statistics, machine learning, deep learning, and electronic medical records.

Imputation methods based on statistics
Imputation methods based on statistics fill the missing values with statistical quantities, such as the "constant", "mean", and "sampling" methods. For example, Park et al. [4] adopted the missing value imputation method based on "constant" in analyzing sleep data. Robertson et al. [5] designed a missing value imputation method based on "mean". Further, Nickerson et al. [12] designed a missing value imputation method based on adjacent observations. Zhang et al. [13] modeled the probability distribution of data changes and used the probability distribution model to predict missing values. Later, Singh et al. [14] investigated statistical sampling and sample estimation, and proposed a method for filling missing values based on continuous "sampling". In general, the imputation methods based on statistics are suitable for discrete data imputation, and the imputation effect is better when the data follow a normal distribution.

Imputation methods based on machine learning
Imputation methods based on machine learning include the K-Nearest Neighbor (KNN) algorithm, shallow neural network methods, and Matrix Factorization (MF) methods. For example, Ma et al. [15] proposed a missing value imputation method based on KNN clustering. A shallow neural network is an early form of neural network model, whose network structure is simple and whose number of layers is small. Chen et al. [16] investigated the overfitting problem of neural networks, and proposed a neural network method based on steams. Tang et al. [17] employed a fuzzy neural network to classify the data, followed by the KNN method to predict the amount of missing values in each category, and finally utilized fuzzy rough sets to fill in missing values. The MF algorithm attempts to reconstruct the original data by matrix factorization to find the correlation between the data. In recent years, methods based on MF have been introduced into time series data imputation. Generally, MF-based methods decompose a data matrix into two low-dimensional matrices, and then attempt to reconstruct the original matrix. During the process of matrix reconstruction, missing values are filled in. Fernandes et al. [18] proposed an MF-based method for filling missing values in multivariate time series data, and smoothed the filled values and observed values. Rios et al. [19] exploited machine learning methods for cardiovascular disease prediction and evaluated seven methods for filling in missing values. The imputation methods based on machine learning usually rely on prior knowledge of the data, which makes it difficult for them to capture the latent rules in the data. In addition, most machine learning-based imputation methods emphasize the structure of data, so they cannot handle unstructured data well.

Imputation methods based on deep learning
The imputation methods based on deep learning exploit the powerful learning ability of deep neural networks to learn latent rules from the dataset, and then predict the missing values. RNNs can process time series data through iterative and scalable neurons, which can remember the sequence relation of time series data well and thus fill the missing values effectively. Ouyang et al. [20] used an RNN to learn the relation between data and time, and then utilized neural networks to predict missing values. Cao et al. [9] proposed a supervised learning-based time series imputation model named BRITS. BRITS assumes that all the labels of time series data are complete, and therefore data without labels are discarded during training. It is worth noting that on datasets with high missing rates, BRITS usually suffers severe overfitting due to the sharp reduction of training samples. Shukla et al. [21] improved the weights of BRITS and proposed the AUCOA model. A GAN can generate new data from the distribution of the original data. Considering that the missing data and the non-missing data in the dataset follow the same distribution, data can be generated by the GAN to fill in the missing values. Yoon et al. [6] proposed GAIN, a model that fills missing values through a GAN. GAIN exploits the Generator to learn the distribution of the original data with missing values, and leverages the Discriminator to judge the missing values produced by the Generator. Miao et al. [8] proposed a semi-supervised GAN model named SSGAN for missing value imputation. In SSGAN, a semi-supervised classifier is designed to iteratively classify unlabeled time series data and make the Generator produce predictions for missing values. The methods based on deep learning have greatly improved the accuracy of missing value imputation. However, for datasets with high missing rates, the accuracy is still not high.

Electronic medical record imputation
The problem of electronic medical record imputation is a clinical application-oriented issue that has gone through three stages of development. Early electronic medical record imputation employed traditional zero-value imputation methods [4], followed by the adoption of machine learning methods [19]. Currently, most electronic medical record imputation employs deep learning methods, such as the study by Zheng et al. [22] on predicting mortality risk, which utilized the LSTM-RUN model to fill missing values. Experimental results show that this method is effective, where LSTM is a special type of recurrent neural network. Shi et al. [23] applied GRU, a simplified version of LSTM, to the learning of clinical time series data and found that the GRU-based method is a fast missing value imputation method. The latest electronic medical record imputation methods are based on GANs, but GANs have only been applied in a simplistic manner [24,25]. That is to say, these methods have not considered the specific needs of the electronic medical record field, and thus have not improved GANs accordingly. As a result, the problem of high missing rates has not been adequately addressed.
To sum up, when the data are seriously missing: if the "deletion" method is utilized, almost all records in the dataset will be removed; if the "mean" or "zero" method is employed, the filled dataset will be very different from the original dataset; if the time series method is exploited, a time-based prediction model cannot be established because too many values are missing; and if GANs are employed, there will be a large deviation between the learned data distribution and the real data distribution. Therefore, the existing methods cannot effectively handle electronic medical record imputation with high missing rates.

Imputation based on temporal GAN
An imputation method for electronic medical records based on GAN and temporal relation is proposed. The method first exploits the GAN to learn the true distribution of the original data, and fills in the missing values with the generated data. Then, it leverages the temporal relation to rectify the filled values.

Problem descriptions
(1) High-dimensional electronic medical record: an electronic medical record that contains multiple medical features.
Let $x = \{x_0, x_1, \ldots, x_{n-1}\} \in \mathbb{R}^{d \times n}$ denote the electronic medical record dataset, where $x_0$ represents the observation value of $x$ at time $t_0$, $x_1$ represents the observation value at time $t_1$, and so forth. Each observation value includes $d$ features; for example, $x_0^j$ represents the $j$-th feature value of $x_0$. In general, when $d \ge 3$, $x$ is a high-dimensional electronic medical record dataset.
(2) Missing value mask matrix: used to mark the missing status of high-dimensional electronic medical records.
Let $m = (m_i^j) \in \mathbb{R}^{d \times n}$ mark the missing status of the electronic medical record dataset $x$, as defined in Eq. (1):
$m_i^j = \begin{cases} 0, & \text{if } x_i^j \text{ is missing} \\ 1, & \text{otherwise} \end{cases}$ (1)
where $m_i^j$ is a flag in the mask matrix; 0 means missing and 1 means normal.
(3) Missing interval matrix: used to mark time intervals.
Let $\delta = (\delta_i^j) \in \mathbb{R}^{d \times n}$ denote the time interval matrix of the electronic medical record dataset $x$, defined by Eq. (2) in terms of the observation times $t_i$ and $t_{i-1}$ and the mask element $m_{i-1}^j$. $\delta$ is employed to compensate for time decay.
The task of missing value imputation can be described as follows: based on the given high-dimensional electronic medical record dataset $x$, the missing value mask matrix $m$, and the missing interval matrix $\delta$, establish a missing value imputation model and predict the missing data.
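The two auxiliary matrices can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `mask_and_intervals`, the use of `None` for a missing value, and the time-major layout (`x[i][j]` is feature j at time step i) are assumptions made here. The interval recurrence is also an assumption: it follows the convention common in decay-based recurrent imputation models, where the interval keeps accumulating while a feature stays unobserved, and it uses only the quantities named above ($t_i$, $t_{i-1}$, and $m_{i-1}^j$).

```python
from typing import List, Optional, Sequence, Tuple


def mask_and_intervals(
    x: Sequence[Sequence[Optional[float]]],
    t: Sequence[float],
) -> Tuple[List[List[int]], List[List[float]]]:
    """Build the mask matrix m and an interval matrix delta.

    x[i][j] is feature j at time step i (None marks a missing value);
    t[i] is the timestamp of step i.
    """
    n, d = len(x), len(x[0])
    # Mask: 0 means missing, 1 means observed.
    m = [[0 if x[i][j] is None else 1 for j in range(d)] for i in range(n)]
    # Interval: 0 at the first step by convention (assumed).
    delta = [[0.0] * d for _ in range(n)]
    for i in range(1, n):
        gap = t[i] - t[i - 1]
        for j in range(d):
            if m[i - 1][j] == 1:
                # Previous value was observed: the interval restarts.
                delta[i][j] = gap
            else:
                # Previous value was missing: the interval accumulates.
                delta[i][j] = gap + delta[i - 1][j]
    return m, delta


records = [[1.0, None], [None, None], [3.0, 4.0]]
times = [0.0, 1.0, 3.0]
m, delta = mask_and_intervals(records, times)
```

In this toy input the second feature is unobserved for the first two steps, so its interval at the third step is the full elapsed time since the start of the series.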

UGAN-GRUD model
To overcome the imputation problem faced by high-missing-rate electronic medical records, we propose a missing value imputation model based on an uncertainty matrix and a time decay factor, viz., UGAN-GRUD. In UGAN-GRUD, to alleviate the difficulty of learning the distribution of electronic medical records with high missing rates, we propose a control network UGAN based on the uncertainty matrix $U$, where $U$ is the difference between the generated data and the original data and represents the uncertainty of the GAN. Due to the high missing rates of the original dataset, $U$ changes drastically, and its values are uncertain. Considering the accuracy and diversity of imputation, we propose the GRUD based on a time decay factor, where the time decay factor is an operator that uses time order and time interval to correct the filled data and is a function of $u_t$ ($u_t \in U$). The illustration of UGAN-GRUD is shown in Fig. 2.
In the proposed method, the latent distribution of electronic medical records is captured by the Generator, and the output data of the Generator are judged and optimized by the Discriminator. The Generator and the Discriminator form two opposing sides, so that they constantly optimize themselves and improve their ability to generate or discriminate. Eventually, the neural network becomes stronger during the training process. In the time decay compensation process on the right side of Fig. 2, we exploit the temporal dependencies between GRU units and the attenuation matrix to rectify the values filled by UGAN. Since the time intervals between missing values are not necessarily equal, it is necessary to obtain the time interval information. UGAN-GRUD not only considers the correlation of features and the uncertainty of GAN-generated data, but also exploits the correlation across time.
Fig. 2 The illustration of the UGAN-GRUD model. G, D, and U are the Generator, Discriminator, and uncertainty matrix, respectively. $z$, $x_t$, and $m_t$ are the inputs, where $z$ is a random value, $x_t$ is multivariate time series data with missing values, and $m_t$ is the mask vector. DDM is the data distribution matrix produced by G; $D(x)$ is the output of D. G updates the neural network with its loss function $J_G$; D updates the neural network with its loss function $J_D$. $m_t$, $u_t$, and $\hat{x}_t$ are inputs to the neural network GRUD for secondary imputation based on time decay compensation, where $u_t \in U$ and $\hat{x}_t \in DDM$. The neural network GRUD consists of $T$ units, each corresponding to a specific set of $m_t$, $u_t$, and $\hat{x}_t$.

UGAN
The data generated by ordinary GANs is not accurate for filling in missing values in electronic medical records with high missing rates.To alleviate this issue, we propose an uncertainty matrix-based control network UGAN that takes into account the dynamics of the data distribution.
Unlike ordinary GANs, UGAN consists of G, D, and U. The input of G is not only $z$, but a combination of $z$, $x$, and $m$, where $x$ is the original input, $z$ is a random matrix based on $x$, and $m$ is a mask matrix based on $x$. In UGAN, to improve the optimization speed of the neural network, tanh() is selected as the activation function by G and D. The raw data are normalized and mapped to the range [-1.0, 1.0].
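The mapping to [-1.0, 1.0] is not specified further in the text. One common choice that matches the output range of tanh() is per-feature min-max scaling over the observed values, sketched below. The function name `scale_to_tanh_range` and the use of `None` for missing entries are assumptions made for this illustration.

```python
from typing import List, Optional, Sequence


def scale_to_tanh_range(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Min-max scale the observed values of one feature to [-1.0, 1.0].

    Missing entries (None) are left untouched. A constant feature maps to 0.0.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    lo, hi = min(observed), max(observed)
    if hi == lo:
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else 2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


scaled = scale_to_tanh_range([0.0, 5.0, None, 10.0])
```

Computing the minimum and maximum from observed values only keeps the zero placeholders for missing entries from distorting the scale.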
At time $t$, the input of G is $x_t$, $m_t$, and $z_t$, and the output is the data distribution matrix DDM, which consists of a series of estimated values $\hat{x}_t$, as given by Eq. (3), where $\odot$ is the element-wise multiplication and $\hat{x}_t$ is the estimated value of the original input vector $x_t$. Regardless of whether there are missing values in $x_t$, G generates estimates in all of its dimensions; that is, the non-missing values in $x$ also have corresponding estimates. It should be noted that zero is utilized as a placeholder for missing values in the dataset before the neural network is trained.
To rectify the values of the DDM, it is necessary to replace the corresponding values in the DDM with the non-missing values in $x$, as shown in Eq. (4):
$\tilde{x}_t = m_t \odot x_t + (1 - m_t) \odot \hat{x}_t$ (4)
where $\tilde{x}_t$ is the corrected vector, $m_t$ is the corresponding mask vector, $x_t$ is the corresponding original input vector, and $\hat{x}_t$ is the output of Eq. (3). To measure the accuracy of the data generated by G, an uncertainty matrix $U = \{u_1, u_2, \ldots, u_t\}$ is introduced, where $U$ is the difference between the generated vector $\hat{x}$ and the original data vector $x$; that is, at time $t$, the error between $\hat{x}_t$ and $x_t$ can be calculated by Eq. (5).
In Eq. (5), $d$ is the dimension of the multivariate time series data at time $t$, and $k$ is the number of observed values at time $t$. Since values in some dimensions at time $t$ may be missing, $d \ge k$. $u_t$ represents the uncertainty of the filled data at time $t$, and it is further exploited in the subsequent neural network GRUD.
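The correction step and the uncertainty value can be illustrated with a small sketch. The correction follows the description directly: observed entries are kept and only missing entries take the Generator's estimate. The exact form of Eq. (5) is not reproduced in this text, so `uncertainty` below assumes a mean absolute error over the $k$ observed dimensions; that choice, the scalar return value, and both function names are assumptions made for illustration.

```python
from typing import List, Sequence


def correct_with_observed(
    x_t: Sequence[float], m_t: Sequence[int], x_hat_t: Sequence[float]
) -> List[float]:
    """Keep observed entries of x_t; use the generated estimate elsewhere."""
    return [m * x + (1 - m) * xh for x, m, xh in zip(x_t, m_t, x_hat_t)]


def uncertainty(
    x_t: Sequence[float], m_t: Sequence[int], x_hat_t: Sequence[float]
) -> float:
    """Assumed form of u_t: mean absolute error over the k observed dimensions."""
    k = sum(m_t)
    if k == 0:
        return 0.0  # nothing observed at this step, so no error can be measured
    return sum(m * abs(xh - x) for x, m, xh in zip(x_t, m_t, x_hat_t)) / k


x_t = [0.5, 0.0, -0.2]      # 0.0 is the placeholder for the missing second feature
m_t = [1, 0, 1]
x_hat_t = [0.4, 0.1, 0.0]   # hypothetical Generator output
corrected = correct_with_observed(x_t, m_t, x_hat_t)
u_t = uncertainty(x_t, m_t, x_hat_t)
```

Only observed dimensions contribute to `u_t`, because the error of an estimate can be measured only where the ground truth exists.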
D is responsible for judging the accuracy of the generated data. The main task of D is to calculate a probability value between 0 and 1 based on the true label, the original input, and the generated data. UGAN calls the Discriminator twice, once for real data discrimination and once for fake data discrimination. The different outputs of the Discriminator are leveraged to calculate the loss values of the Generator and the Discriminator. Finally, the parameters of the neural network are updated using the back-propagation mechanism. In Algorithm 1, UGAN is described in more detail. During the training of UGAN, samples need to be extracted from the training dataset, and these samples are utilized to generate the mini-batches used in the iterations, denoted as x , m and e . Briefly, the main steps of the algorithm UGAN are as follows.

GRUD
To further rectify the missing values filled by UGAN, we propose an iterative and scalable neural network structure, GRUD. In GRUD, by introducing a time decay factor, the missing data are filled differently according to how long they have been missing, which increases the diversity of missing value imputation. GRUD provides the corresponding information by memorizing the sequential relationship and historical time information of time series data.
In electronic medical records, values are often missing for long periods [24,25]. To handle such long-term missingness, we attenuate the historical memory vector according to the length of the missing time: if the missing time is long, the historical information has little influence on the current state due to the principle of forgetting, so the historical memory vector should be attenuated greatly; otherwise, if the missing time is short, the historical memory vector should undergo only a small decay. To adapt to the missing time intervals of electronic medical records, we propose the GRUD based on a time decay factor, as shown in Fig. 3.
The time decay matrix is composed of time decay factors, which exploit the sequential and historical time information to fill in the missing data more finely. Specifically, the time decay matrix $V = (v_t)$ is calculated by Eq. (8),
where $W_u$ is the weight parameter, $b_u$ is the bias vector, $u_t$ is the error between $\hat{x}_t$ and $x_t$ at time $t$, and the range of $v_t$ is [0, 1]. Because $u_t$ is the deviation between the vector generated by UGAN and the original data vector, using it in $v_t$ can further improve the diversity and accuracy of the filled values. Therefore, $u_t$ is introduced into the GRUD to further fill in the missing data by using the temporal associations.
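Eq. (8) itself is not reproduced in this text. As an illustration of a decay factor with the stated properties (a learned weight $W_u$ and bias $b_u$ applied to $u_t$, with output in [0, 1]), the sketch below uses a negative exponential of a rectified linear term, a form common in decay-based recurrent models. The functional form, the scalar treatment of $u_t$, and the name `decay_factor` are assumptions, not the paper's definition.

```python
import math


def decay_factor(u_t: float, w_u: float = 1.0, b_u: float = 0.0) -> float:
    """Assumed decay: v_t = exp(-max(0, w_u * u_t + b_u)), which lies in (0, 1].

    A larger uncertainty u_t yields a smaller v_t, i.e. less trust in the
    value generated by UGAN at this step.
    """
    return math.exp(-max(0.0, w_u * u_t + b_u))


v_small = decay_factor(0.05)  # low uncertainty, factor close to 1
v_large = decay_factor(2.0)   # high uncertainty, factor close to 0
```

The rectification keeps the exponent non-positive, so the factor never exceeds 1 regardless of the learned parameters.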
$v_t$ highlights the reliability of the generated imputation values, which reduces the attention paid to strongly biased data generated by G. The estimated value $x_t^r$ of the current sequence can be predicted from the hidden layer state $h_{t-1}$, as shown in Eq. (9):
$x_t^r = W_r h_{t-1} + b_r$ (9)
Based on $v_t$, $x_t^r$ and $x_t$ are combined to obtain the estimated value of GRUD, as shown in Eq. (10).
Finally, the missing values are replaced with the estimated values $c_t$ to obtain the complete vector $x_t^c$, as shown in Eq. (11).
Additionally, the "∘" operator is used to concatenate the complete vector with the corresponding mask vector. The hidden state $h_{t-1}$ is processed with $v_{t-1}$ to obtain its decayed version. The update of the hidden state at time $t$, $h_t$, is then given by Eq. (12),
where $\sigma$ represents the activation function, $W_h$ and $P_h$ are the weight parameters, and $b_h$ is the bias vector. The specific definition of the loss function is shown in Eq. (13).
In Eq. (13), $L_{MAE}$ denotes the mean absolute error loss, and the meanings of $x_t$, $m_t$, and $c_t$ are the same as those described above. Algorithm 2 describes the entire procedure of GRUD in detail.

Algorithm 2. GRUD Algorithm
The UGAN-GRUD model includes two parts: the deep neural network UGAN and the deep neural network GRUD. The former learns the distribution of the original dataset through the Generator, guides the Generator through the Discriminator, and records the deviation of the filled values through the uncertainty matrix. The latter memorizes the sequence relations and historical time information of time series data, and then employs the learning ability of the deep neural network to discover the correlations between data. In this way, the goal of improving the imputation accuracy for datasets with high missing rates is achieved.
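One GRUD imputation step can be sketched as follows. Only the regression $x_t^r = W_r h_{t-1} + b_r$ is taken from the text. The convex combination used for the GRUD estimate, the masked mean absolute error, and all function names are assumptions made for this illustration, since Eqs. (10), (11), and (13) are not reproduced here; the hidden state update of Eq. (12) is omitted.

```python
from typing import List, Sequence, Tuple


def linear(W: Sequence[Sequence[float]], h: Sequence[float], b: Sequence[float]) -> List[float]:
    """Return W h + b for a small dense layer written with plain lists."""
    return [sum(w * hv for w, hv in zip(row, h)) + bias for row, bias in zip(W, b)]


def grud_estimate(
    h_prev: Sequence[float],
    W_r: Sequence[Sequence[float]],
    b_r: Sequence[float],
    x_gan_t: Sequence[float],
    v_t: float,
) -> Tuple[List[float], List[float]]:
    """Regress x_r from the hidden state, then blend it with the UGAN value.

    Assumed blend: c_t = v_t * x_gan_t + (1 - v_t) * x_r, so a reliable UGAN
    value (v_t near 1) dominates and an unreliable one defers to the history.
    """
    x_r = linear(W_r, h_prev, b_r)
    c_t = [v_t * xg + (1.0 - v_t) * xr for xg, xr in zip(x_gan_t, x_r)]
    return x_r, c_t


def complete_vector(x_t: Sequence[float], m_t: Sequence[int], c_t: Sequence[float]) -> List[float]:
    """Replace only the missing entries of x_t with the estimate c_t."""
    return [m * x + (1 - m) * c for x, m, c in zip(x_t, m_t, c_t)]


def masked_mae(x_t: Sequence[float], m_t: Sequence[int], c_t: Sequence[float]) -> float:
    """Assumed loss: mean absolute error over observed entries only."""
    k = sum(m_t)
    return sum(m * abs(x - c) for x, m, c in zip(x_t, m_t, c_t)) / k if k else 0.0


h_prev = [1.0, -1.0]
W_r = [[0.5, 0.0], [0.0, 0.5]]
b_r = [0.0, 0.1]
x_r, c_t = grud_estimate(h_prev, W_r, b_r, x_gan_t=[0.3, 0.0], v_t=0.5)
x_c = complete_vector([0.0, -0.1], [0, 1], c_t)
loss = masked_mae([0.0, -0.1], [0, 1], c_t)
```

In a full model these steps would be expressed with tensors and trained end to end; the list-based version is only meant to make the data flow explicit.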

Experiments and analysis
To validate the UGAN-GRUD model, we conducted three experimental studies: (1) the performance study, (2) the ablation study, and (3) the efficiency study. Following existing methods [4,5,8,10,12,21], we used the same dataset selection and experimental parameter settings.

Experimental datasets and baseline models
To verify the effectiveness of the UGAN-GRUD model, three publicly available e-health datasets, Health-care [3], Perf-DS1 [26-28], and Perf-DS2 [28], were used. These electronic medical records contain data on human physiological indicators [3,28]. The datasets are provided by intensive care units and community hospitals, and the indicators involved include body temperature, heart rate, blood sugar content, electrocardiogram, and so forth. The Health-care dataset has a total of 4,000 records, each 24-36 h long, and consists of multivariate time series data. Most of the records of the Health-care dataset are incomplete (components missing); it has an average missing rate of 80.67%, and the related main task is to classify patients. The Perf-DS1 dataset has a total of 90,000 records and an average missing rate of 50%, with a serious continuous missing problem. The Perf-DS2 dataset has a total of 12,000 records and an average missing rate of 13%, and there is obvious periodicity in these data.
Following the experiments of the current state-of-the-art methods [8,21], the dataset is divided into a training set and a test set at a ratio of 7:3. Since missing value imputation based on traditional statistical methods does not require training, it directly enters the testing phase. To simulate massive missingness, secondary missing processing is required. The method of secondary missing processing [6] is to randomly select a record; if it is a complete record, delete it and mark it as missing data, and if it is a record with missing values, select the next record to handle. We employ a normal distribution with a random seed of 1024 to randomly select the serial number (position) of the record in the dataset.
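The secondary missing procedure can be sketched as below. This is one hedged reading of the description, not the authors' implementation: "delete it and mark it as missing" is interpreted as setting every value of the selected record to `None`, the normal distribution is assumed to be centred on the middle of the dataset with a standard deviation of a quarter of its length, and the function name `secondary_missing` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Record = List[Optional[float]]


def secondary_missing(
    records: Sequence[Sequence[Optional[float]]], n_remove: int, seed: int = 1024
) -> Tuple[List[Record], List[int]]:
    """Blank out n_remove complete records chosen at normally distributed positions."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n = len(records)
    out: List[Record] = [list(r) for r in records]  # work on a copy
    n_complete = sum(all(v is not None for v in r) for r in out)
    n_remove = min(n_remove, n_complete)  # cannot remove more than exist
    removed: List[int] = []
    while len(removed) < n_remove:
        # Draw a position from a normal distribution (assumed parameters).
        i = int(round(rng.gauss((n - 1) / 2, n / 4)))
        if not 0 <= i < n:
            continue  # discard draws that fall outside the dataset
        # If the record already has missing values, move on to the next one.
        while any(v is None for v in out[i]):
            i = (i + 1) % n
        out[i] = [None] * len(out[i])
        removed.append(i)
    return out, removed


data = [[1.0, 2.0], [None, 3.0], [4.0, 5.0], [6.0, 7.0]]
masked, removed = secondary_missing(data, n_remove=2)
```

Because the generator is seeded, repeated calls with the same input select the same records, which keeps the artificially removed values available as ground truth for evaluation.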
The following models are compared:
∎ Zero [4] model: This is a classic model that features the use of 0 to fill in missing values.
∎ Mean [5] model: This is also a widely used classic model, characterized by using the global average to fill in missing values.
∎ Last [12] model: This is a widely used model in the field of behavioral data mining, which features the use of the last observations to fill in the missing values.
∎ KNN [29] model: It is also called the K-Nearest Neighbor imputation algorithm, which is characterized by using the KNN algorithm to find the samples with "near neighbor", and then employing the weighted average of the "near neighbor" samples to fill in missing values.
∎ STA-GAN [10] model: This is a GAN-based missing value imputation model, which fills missing values through the Hint Matrix mechanism [18].
∎ AUCOA [21] model: This is a time-series neural network model that is characterized by bidirectional training of data. One direction arranges the data and trains them along increasing time, and the other direction arranges the data and trains them along decreasing time. Experiments showed that this bidirectional training method can improve the accuracy of missing value imputation of time-series data.
∎ SSGAN [8] model: This is an improved GAN model that is characterized by iteratively classifying unlabeled time series data through a semi-supervised classifier, which in turn assists the Generator to estimate missing values by using these classified data.
∎ UGAN-GRUD model: The method proposed in this paper.
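The three simplest baselines can be written down directly for a single feature. The sketch below is illustrative only: the function names, the use of `None` for missing values, and the fallback value of 0.0 (used by Mean when nothing is observed and by Last before the first observation) are assumptions made here.

```python
from typing import List, Optional, Sequence


def impute_zero(series: Sequence[Optional[float]]) -> List[float]:
    """Zero baseline: fill every missing value with 0."""
    return [0.0 if v is None else v for v in series]


def impute_mean(series: Sequence[Optional[float]]) -> List[float]:
    """Mean baseline: fill missing values with the global mean of observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None else v for v in series]


def impute_last(series: Sequence[Optional[float]], fallback: float = 0.0) -> List[float]:
    """Last baseline: carry the most recent observation forward."""
    filled: List[float] = []
    last = fallback  # used until the first observation appears
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


series = [None, 2.0, None, 4.0]
zero_filled = impute_zero(series)
mean_filled = impute_mean(series)
last_filled = impute_last(series)
```

None of these baselines requires training, which is why they enter the testing phase directly.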
Since some baseline methods address missing value imputation for general-purpose domains, whereas we address missing value imputation for the biomedical field, we utilize biomedical datasets [3,28] to re-compare these methods. In the experiments, based on the characteristics of the datasets, we utilized a normal distribution to initialize the parameters in the models. In addition, as in Refs. [8-10], the neural network models were trained with a batch size of 128 for 1000 epochs. The Adam optimizer was chosen for stochastic gradient descent training with a learning rate of 0.001, and the Sigmoid was chosen as the activation function to map variables between 0 and 1. To prevent the distribution of the dataset from adversely affecting the training process, all data were normalized so that their means were zero.

Evaluation criteria
To facilitate evaluation and comparison, the Root Mean Squared Error (RMSE) [30] and the Mean Absolute Percentage Error (MAPE) [31] between the ground-truth values and the predicted values are adopted as the evaluation criteria in this paper, as shown in Eqs. (14) and (15). In Eqs. (14) and (15), $n$ represents the number of samples, and $y_i$ and $y'_i$ denote the ground-truth value and predicted value at time $i$, respectively. RMSE and MAPE represent the gap between the original data and the filled data. The smaller the RMSE and MAPE, the better the performance.
To evaluate the classification effect of the filled data, the Area Under Curve (AUC) metric is adopted in this paper, as shown in Eq. (16). AUC represents the area under the Receiver Operating Characteristic (ROC) curve. AUC is not sensitive to the proportion of positive and negative samples, so it can better distinguish the pros and cons of binary classification models [9].
In Eq. (16), $D^+$ represents the set of all positive samples, $D^-$ represents the set of all negative samples, and $f(x^+) > f(x^-)$ indicates that the predicted score of positive sample $x^+$ is higher than that of negative sample $x^-$.
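The three criteria can be computed as in the sketch below, which uses the standard textbook definitions with the symbols named above. Two details are assumptions, since Eqs. (14) to (16) are not reproduced in this text: MAPE is returned as a fraction rather than multiplied by 100, and ties between a positive and a negative score count as half a correctly ordered pair in the AUC.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between ground truth y and predictions y_pred."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y))


def mape(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; assumes no y value is zero."""
    return sum(abs((a - b) / a) for a, b in zip(y, y_pred)) / len(y)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    total = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5  # tie handling is an assumption
    return total / (len(pos_scores) * len(neg_scores))


rmse_value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
mape_value = mape([1.0, 2.0, 4.0], [1.0, 1.0, 5.0])
auc_value = auc([0.9, 0.4], [0.5, 0.1])
```

The pairwise form makes the insensitivity to class proportions visible: the result is normalized by the number of positive-negative pairs, not by the class sizes themselves.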

Performance of imputation
In the experiments, we implemented all the evaluation testbeds using PyTorch. To evaluate the missing value imputation performance of UGAN-GRUD, it is necessary to select homogeneous and comparable methods. In this paper, Zero, Mean, Last, KNN, STA-GAN, AUCOA, and SSGAN were selected as comparison methods. At the same time, to reflect the handling of high-missing-rate datasets, additional (secondary) missing values were introduced into the datasets Health-care, Perf-DS1, and Perf-DS2, and the missing positions of records were randomly selected according to the normal distribution. Following Refs. [6,8,9], underlining was used to mark the top three models, and bold was used to mark the model that performed best in the experiments. Table 1 shows the imputation performance of different models on the dataset Perf-DS1 with different missing rates.
It is easy to see from Table 1 that the UGAN-GRUD model achieves the best performance. Compared with the model Zero, the performance of UGAN-GRUD is greatly improved, by 50%. The model AUCOA ranks second, but its imputation performance drops drastically as the missing rate increases. UGAN-GRUD has an average improvement of 36.2% in RMSE and 39.4% in MAPE compared to AUCOA, and an average improvement of 39.2% in RMSE and 41.8% in MAPE compared to SSGAN. Table 2 shows the imputation performance of different models on the dataset Perf-DS2 with different missing rates.
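To make the experimental protocol concrete, the following sketch hides a further fraction of the observed entries of one series and then applies the three simplest comparison methods (Zero, Mean, Last). It is only an illustration: the function names are hypothetical, `None` is an assumed encoding for missing entries, and the positions are drawn uniformly at random as a simplification, whereas the paper states that they were selected according to a normal distribution.

```python
import random
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing entry (assumed encoding)


def add_missing(series: Series, extra_rate: float, rng: random.Random) -> Series:
    """Hide a further fraction of the currently observed entries (uniform choice)."""
    observed = [i for i, v in enumerate(series) if v is not None]
    k = round(extra_rate * len(observed))
    hidden = set(rng.sample(observed, k))
    return [None if i in hidden else v for i, v in enumerate(series)]


def impute_zero(series: Series) -> List[float]:
    """Zero baseline: replace every missing entry with 0."""
    return [0.0 if v is None else v for v in series]


def impute_mean(series: Series) -> List[float]:
    """Mean baseline: replace missing entries with the mean of observed ones."""
    obs = [v for v in series if v is not None]
    mean = sum(obs) / len(obs) if obs else 0.0
    return [mean if v is None else v for v in series]


def impute_last(series: Series) -> List[float]:
    """Last baseline: carry the last observed value forward."""
    out, last = [], 0.0  # leading gaps fall back to 0.0 (assumption)
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


if __name__ == "__main__":
    s = [1.0, None, 3.0, 5.0]
    print(impute_zero(s), impute_mean(s), impute_last(s))
    print(add_missing([1.0, 2.0, 3.0, 4.0], 0.5, random.Random(0)))
```

Because the hidden entries were originally observed, they provide ground truth against which the filled values can be scored.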
It is easy to see from Table 2 that UGAN-GRUD performs the best on the RMSE criterion. On average, UGAN-GRUD improves on AUCOA by 19.7% and on SSGAN by 22.8%. In terms of MAPE, UGAN-GRUD is slightly worse than AUCOA, because (1) the periodicity of the Perf-DS2 dataset is stronger, and it is likely that UGAN and GRUD disturb the original time series regularities of the data; and (2) the initial missing rate of the Perf-DS2 dataset is relatively low, which to a certain extent prevents the advantages of the UGAN-GRUD method from coming into play. This indicates that the UGAN-GRUD method is more suitable for datasets with random distribution and high missing rates. Table 3 shows the imputation performance of different models on the dataset Health-care with different missing rates.
It can be seen from Table 3 that the UGAN-GRUD model can still achieve better performance under the condition of large data loss, where the initial missing rate of the Health-care dataset is 80.67%. The other models that performed well were SSGAN, in second place, and STA-GAN, in third.
Analysis: From the imputation performance experiments, it can be seen that the UGAN-GRUD model performs well on the datasets Health-care, Perf-DS1, and Perf-DS2. It should be noted that the overall missing rates of the datasets Health-care and Perf-DS1 are relatively high, while the overall missing rate of the dataset Perf-DS2 is relatively low. This indicates that the UGAN-GRUD model is not only suitable for high-missing-rate datasets, but also has certain reference value for datasets with ordinary missing rates.
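This section does not state how the average improvements are computed. One plausible reading, assumed here, is the relative error reduction (e_baseline − e_model) / e_baseline averaged over the tested missing rates. The function name is hypothetical and the numbers in the usage example are made-up placeholders, not values from Tables 1 to 3.

```python
from typing import Sequence


def avg_relative_improvement(
    baseline_errors: Sequence[float], model_errors: Sequence[float]
) -> float:
    """Mean relative error reduction in percent (lower error is better)."""
    gains = [(b - m) / b for b, m in zip(baseline_errors, model_errors)]
    return 100.0 * sum(gains) / len(gains)


if __name__ == "__main__":
    # Placeholder errors at two missing rates (illustrative only).
    print(avg_relative_improvement([0.5, 0.4], [0.25, 0.3]))
```

Under this reading, a negative result would mean the model is worse than the baseline on average, as in the MAPE comparison with AUCOA on Perf-DS2.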

Performance of classification and regression
Since the ultimate purpose of electronic medical record imputation is to support decision-making, the classification and regression performance obtained on the filled data needs to be evaluated. Following Refs. [6,8,9], we constructed an RNN classifier and an RNN regression predictor, and trained the models using the filled datasets. The number of training iterations is 30, the learning rate is 0.005, the dropout is 0.5, and the dimension of the hidden state in the RNN is 64. The evaluation criterion used for classification is AUC, and the evaluation criterion used for regression prediction is RMSE.
The Health-care dataset was used for the training and testing process of the classification task, which has 30 classes. The Perf-DS1 and Perf-DS2 datasets were used for the training and testing process of the regression task. Figure 4 shows the classification performance on the Health-care dataset.
It is easy to see from Fig. 4 that the classification performance is the best after the dataset is filled by the UGAN-GRUD method, which is 19.2% higher than the KNN method and 1.4% higher than the SSGAN method. Figure 5 shows the regression task performed on the filled Perf-DS1 dataset, where the smaller the RMSE, the better the regression effect.
Figure 6 shows the regression task performed on the filled Perf-DS2 dataset, where the smaller the RMSE, the better the regression effect.
It is easy to see from Fig. 5 and Fig. 6 that the regression effect differs after the dataset is filled by different methods, among which UGAN-GRUD gives the best effect, followed by SSGAN and then STA-GAN. Classification and regression effects are inseparable from imputation effects: the methods with better imputation effects, such as UGAN-GRUD, SSGAN, AUCOA, and STA-GAN, also yield better classification and regression effects. This confirms that dataset imputation is a meaningful endeavor.

Ablation experiments
To explore the impact of various improvements in the UGAN-GRUD model on performance, an ablation study is required. This means removing the improved parts in the UGAN-GRUD model and observing changes in model performance. The key improvements in the UGAN-GRUD model are two-fold, namely UGAN and GRUD. We utilized a GAN-based model [6] as the "Base" model. Then, we added GRUD to the "Base", called "Base + GRUD"; and we added UGAN to the "Base", called "Base + UGAN". Finally, we added both of these key improvements together, called "Base + GRUD + UGAN", which is also the UGAN-GRUD model. All neural network parameters were initialized with the same values. Table 4 shows the ablation study results of the UGAN-GRUD model.

Fig. 4 Classification performance of Health-care after filled. In Fig. 4, the abscissa is the eight compared methods, i.e., Zero, Mean, Last, KNN, STA-GAN, AUCOA, SSGAN, and UGAN-GRUD, and the ordinate is the AUC evaluation criterion

Fig. 5 Regression performance of Perf-DS1 after filled. In Fig. 5, the abscissa is the eight compared methods, i.e., Zero, Mean, Last, KNN, STA-GAN, AUCOA, SSGAN, and UGAN-GRUD, and the ordinate is the evaluation criterion RMSE

Analysis
(1) From Table 4, it can be seen that after adding GRUD to the "Base", the model's performance is improved. Since GRUD can mine the correlations in time series data, this indicates that GRUD helps to improve the accuracy of imputation. Similarly, after adding UGAN to the "Base", the model's performance is significantly improved. This shows that using an uncertainty matrix to capture the distribution of high-missing-rate datasets is an effective method.
(2) Additionally, when both GRUD and UGAN are added to the "Base", the model's performance reaches its optimum. This indicates that the key improvements GRUD and UGAN are effective not only individually but also in combination, where the overall performance is best. In summary, the improvements of the UGAN-GRUD model are all effective, making it a competitive model.

[...] and the training efficiency is second. Since the UGAN-GRUD model adds the computation of the uncertainty matrix and the training of the GRU neural network, it is not as efficient as the STA-GAN model. However, the performance of the UGAN-GRUD model far exceeds that of the STA-GAN model. Considering both performance and efficiency, the UGAN-GRUD model is the best choice.

Discussion on scalability and limitations
Missing value imputation is used to restore data in real-world domains and plays an important role in intelligent decision-making. Although the method proposed in this paper is tailored to the characteristics of electronic medical records, it can also be tried in other scenarios with high missing rates. For example, in our experiments, we applied the proposed method to the datasets used in Refs. [8,10,21], among others, and the experimental results show that performance improved to some extent. Since the research task of this paper is the missing value imputation of electronic medical records, no further research or experimental comparisons have been conducted on these datasets. This will be one of the topics of our future research.

Conclusion
Missing data in electronic medical records is a commonly observed phenomenon that holds significant research value. In this paper, we propose a missing value imputation model called UGAN-GRUD based on an uncertainty matrix and a time decay factor. UGAN-GRUD consists of two important components: UGAN, an improvement on the traditional GAN, which includes a generator G, a discriminator D, and an uncertainty matrix U; and GRUD, an improvement on the traditional GRU, which introduces the time decay factor. We conducted experimental studies, and the results show that UGAN-GRUD not only surpasses existing state-of-the-art methods in terms of imputation performance but also performs well in supporting subsequent classification and regression tasks.
The future research direction is to explore the interaction of correlated features [32][33][34] and their impact on imputation performance. We believe that this will motivate new algorithm discoveries.

(4) Calculate the loss function of G using Eq. (7);
(5) Repeat the training within a given number of iterations (n_iter);
(6) Obtain the electronic medical record dataset with filled values after training UGAN.

Fig. 3 GRUD based on time decay factor. In Fig. 3, z represents the update gate, r represents the reset gate, h represents the candidate hidden state, x c represents the complete vector, c represents the estimated value, and v represents the time decay factor. Mask(m), In(x), In(x ), In(u) represent the four inputs, and Out(h) represents the output

Fig. 6 Regression performance of Perf-DS2 after filled. The meanings of the abscissa and the ordinate are the same as those of Fig. 5

Table 1
Imputation performance experiments on the dataset Perf-DS1. In Table 1, "Criteria" denotes the evaluation criteria, which include RMSE and MAPE. Zero, Mean, Last, KNN, STA-GAN, AUCOA, SSGAN, and UGAN-GRUD are the eight compared models. The underline indicates the top-3 performance, while the bold indicates the best performance

Table 2
Imputation performance comparisons on the dataset Perf-DS2. In Table 2, the evaluation criteria and missing rates are the same as those in Table 1. The underline indicates the top-3 performance, while the bold indicates the best performance

Table 3
Imputation performance comparisons on the dataset Health-care. The underline indicates the top-3 performance, while the bold indicates the best performance