Gappy AE: A Nonlinear Approach for Gappy Data Reconstruction using Auto-Encoder

We introduce a novel data reconstruction algorithm known as Gappy auto-encoder (Gappy AE) to address the limitations associated with Gappy proper orthogonal decomposition (Gappy POD), a widely used method for data reconstruction when dealing with sparse measurements or missing data. Gappy POD has inherent constraints in accurately representing solutions characterized by slowly decaying Kolmogorov N-widths, primarily due to its reliance on linear subspaces for data prediction. In contrast, Gappy AE leverages the power of nonlinear manifold representations to address data reconstruction challenges of conventional Gappy POD. It excels at real-time state prediction in scenarios where only sparsely measured data is available, filling in the gaps effectively. This capability makes Gappy AE particularly valuable, such as for digital twin and image correction applications. To demonstrate the superior data reconstruction performance of Gappy AE with sparse measurements, we provide several numerical examples, including scenarios like 2D diffusion, 2D radial advection, and 2D wave equation problems. Additionally, we assess the impact of four distinct sampling algorithms - discrete empirical interpolation method, the S-OPT algorithm, Latin hypercube sampling, and uniformly distributed sampling - on data reconstruction accuracy. Our findings conclusively show that Gappy AE outperforms Gappy POD in data reconstruction when sparse measurements are given.

1. Introduction.In digital twin applications, adopting efficient data-driven models for representing physical fields has become increasingly prevalent.To enhance efficiency, these models involve mapping from a low-dimensional to full-dimensional space.In other words, these learned models approximate solutions in the full space based on information from a reduced-dimensional space.This approach encompasses two distinct solution representations: linear subspaces and nonlinear manifolds.However, the key challenge is identifying a suitable generalized coordinate within the low-dimensional space.
Methods to find a suitable generalized coordinate can be categorized into two approaches: intrusive and non-intrusive.In intrusive methods, the governing equations are solved with a substituted solution approximation.Projection-based reduced-order models, such as the linear subspace reduced-order model [7,16,12,15,18,10], nonlinear manifold reduced-order model [37,31,30,20], component-wise reduced-order model [39,40] and space-time reduced-order model [14,13,33,49], are examples of intrusive methods.These techniques are widely employed to expedite physical simulations.Recent research in reduced order models (ROMs) focuses on enhancing the representability of autoencoders [46] and the computational efficiency of autoencoder training [17,20].In [17], the authors reduced computational costs by training autoencoders on subsampled points only.During the online phase, the decoder outputs data corresponding to subsampled points, and then a POD basis matrix is used to reconstruct the data.The authors in [20] present the domain decomposition (DD) method for nonlinear-manifold reduced order models (NM-ROMs).In NM-ROMs, nonlinear manifolds are found by training sparse and shallow autoencoders, which can be costly for large-scale problems.To alleviate the training cost of the sparse and shallow autoencoders, the authors decomposed the domain into subdomains, resulting in the reduction of the number of learnable parameters of autoencoders for each subdomain and the size of the training data.In [46], the authors introduced convolutional decoders and compressed decoders.The convolutional decoder learns a latent representation of the dynamical system.There is no constraint on the structure of the convolutional decoder, giving us the opportunity to test various deep learning models.The compressed decoder learns mapping from latent space to subsampled data.The latent space coordinates are found by minimizing subsampled residuals, and then the latent space coordinates are taken as input for the convolutional decoder to predict the solution.In this case, subsampled residuals are only used to find latent space coordinates, and the convolutional decoder computes full-order outputs.
In contrast, non-intrusive methods do not involve solving governing equations.Dynamic identification (DI) methods presented in [24,27,3] identify dynamics in latent space using regression method in a leastsquares sense.The Gappy proper orthogonal decomposition (POD) method developed by [23] approximates a solution with a linear subspace spanned by the POD basis matrix.The gappy POD method finds generalized coordinates, which are POD basis coefficients that minimize the error between the predicted and gappy data.This method was successfully applied to unsteady flow reconstruction [50].Linear subspace-based approaches can represent data with fast decaying Kolmogorov N-width, such as elliptic and parabolic equations.However, such methods are not effective in representing data with slowly decaying Kolmogorov N-width, such as linear transport problems [44] and wave equation [25].For digital twins of problems with Kolmogorov N-width decaying slowly, e.g., construction of earthquake digital twin model for source characterization, we need more general method.Since seismic waves are solutions to the wave equation, conventional Gappy POD does not perform well in terms of data reconstruction.The dynamic mode decomposition (DMD) method is a data-driven and non-intrusive method developed by Peter Schmid [47,35] that approximates local dynamic systems from the dynamic modes found using eigenvectors and eigenvalues of the reduced system obtained through singular value decomposition (SVD) of the dynamic system.
An alternative method is to determine approximate solutions via machine learning.Physics-informed neural networks (PINNs) [4,5,19,45] are trained by minimizing loss between output data and training data together with residuals of given governing equations.PINNs predict solutions directly based on temporal and spatial variables and can be adapted to restore data when sparse measurement data is given to address the challenge of reconstructing gappy data.Specifically, these neural network models use time and spatial parameters in conjunction with the observed data to generate the requisite solutions.
There is also an auto-encoder-based approach to reconstruct data.When gappy data is given as an input, a model outputs reconstructed data directly.In [26], the authors presented a conditional variational autoencoder (CVAE)-based method by conditioning on measurements, simultaneously predicting the solution and uncertainty.Their model reduces the input data to latent space variables, which describe probability distributions.The probability distribution in the latent space is found with the measurement data.Then, solutions can be produced from observations and latent space variables.In [43], a deep neural network is used to predict the reduced state of full-order data from sensor measurements directly.Then, the full-order data is reconstructed by multiplying POD basis matrix by the reduced state.In [26] and [43], measurement data is an input argument of the neural networks and is used during the forward pass during inference time.However, our proposed method does not take measurement data as the input of the neural network model.In our method, we solve the minimization problem between the measurement data and the model prediction to find generalized coordinates in a latent space.
We developed a novel gappy data reconstruction method called Gappy auto-encoder (AE) to represent physics field data for digital twin purposes.The proposed method learns a nonlinear mapping between the latent and original data spaces using an auto-encoder.Then, we find generalized coordinates in the latent space by minimizing the error in l 2 norm between gappy measured data and the decoder output.It is categorized into a data-driven and non-intrusive method.Compared to the Gappy POD method, our Gappy AE method represents data using a nonlinear manifold, overcoming the limitations of linear subspace-based methods.Replacing a linear subspace basis matrix with an auto-encoder requires special care, which is not trivial, such as determining sparse sampling locations, having sparse and shallow decoder structure, and creating the subnet that only contains active nodes and weights as detailed in Section 3 and 4 of [31].The purpose of these non-trivial things is to compute the Jacobian of the decoder quickly which is used in Gauss-Newton iterations for finding generalized coordinates in a latent space.The major contributions of our method are summarized as follows: 1. Gappy AE demonstrates superior solution approximations compared to Gappy POD, even with a reduced latent space dimension.2. Gappy AE remains effective even in scenarios where measurements are extremely sparse.3. Gappy AE framework employs four distinct sampling algorithms, and their respective accuracy is assessed and compared.The paper is structured as follows: Section 2 provides an overview of the background materials and our proposed method.We delve into the traditional linear subspace-based approach, the Gappy POD, in Section 2.1.The Gappy AE method is the focus of Section 2.2, which encompasses three key aspects: Section 2.2.1 explores the concept of a nonlinear manifold-based data representation, Section 2.2.2 delves into the structure of the auto-encoder, and Section 2.2.3 explains the Gappy AE algorithm.In Section 2.3, we describe sampling on the residual term and elaborate on the sampling algorithms for measurement location, followed by a showcase of numerical examples in Section 3. Finally, Section 4 concludes the paper.

Nomenclature. N s
Full order model space dimension n s Reduced space dimension n r The number of samples n µ The number of parameters N t The number of time steps Right singular matrix The proper orthogonal decomposition (POD) method is commonly used to construct a basis matrix, Φ.In POD method [2], Φ is obtained by truncating a singular value decomposition (SVD) of a snapshot matrix, X := x Nt+1) , where x µ k n is a solution state at n-th time step with parameter µ k for n ∈ N(N t ) and k ∈ N(n µ ).It pertains to the statistical analysis technique known as principal component analysis [29] and the stochastic analysis concept called Karhunen-Loève expansion [38].The POD method computes its thin SVD: where U ∈ R Ns×nµ(Nt+1) and V ∈ R nµ(Nt+1)×nµ(Nt+1) are orthogonal matrices and Σ ∈ R nµ(Nt+1)×nµ(Nt+1) is a diagonal matrix with singular values on its diagonals.The leading n s columns of U are selected to set Φ (i.e., Φ = u 1 • • • u ns , where u k is k-th column vector of U ).The basis matrix from POD method, Φ is a solution to the minimization problem among Φ ∈ R Ns×ns with orthonormal columns For more detailed information on the POD method, please refer to [28,34].For brevity and readability, n and µ k are omitted in the notation x µ k n .We can set a minimization problem between the measurement and solution data approximation given the sparse measurement data.By solving the minimization problem, we find the generalized coordinates, x.The Gappy POD first approximates x ∈ R Ns as (2.4) x ≈ x = Φx, assuming x ref = 0. Solving the following minimization problem (2.5) x = argmin with given sampling matrix, Z T ∈ R nr×Ns , that extracts the rows of matrix and vector corresponding to measurement points, where n r is the number of samples and n r ≥ n s gives us and then we have where Z T x is replaced with the sparse measurement data.The graphical summary of Gappy POD process is shown in Fig. 1. 2.2.Gappy Auto-Encoder.Gappy AE uses nonlinear manifold solution representation described in Section 2.2.1.The nonlinear manifold is found by training the auto-encoder in an unsupervised fashion as presented in 2.2.2.In Section 2.2.3, gappy procedures for finding points in the reduced space with given measurements are given.A graphical summary of the Gappy AE process is shown in Fig. 2.

Nonlinear manifold solution representation.
The Gappy AE utilizes a nonlinear manifold S := {g (v) |v ∈ R ns }, where g : R ns → R Ns with n s ≪ N s denoting a nonlinear function that maps a latent space of dimension n s to the full order model space of dimension N s .In other words, the Gappy AE estimates the solution by approximating it within a trial manifold as (2.8) x where x ∈ R ns denotes the generalized coordinates.The initial condition for the generalized coordinate, x0 ∈ R ns , is given by x0 ).The details of the nonlinear functions, h and g, are presented in Section 2.2.2.
2.2.2.Sparse Shallow Auto-Encoder.Auto-Encoder is a feed-forward neural network model that learns identity mapping.The auto-encoder consists of an encoder, h : R Ns → R ns , and a decoder, g : R ns → R Ns .The nonlinear activation functions in the auto-encoder make them nonlinear.The encoder maps high dimensional data, x ∈ R Ns to low dimensional data, x ∈ R ns and the decoder maps back the low dimensional data to the high dimensional data.The decoder is used for nonlinear mapping, g(x), mentioned in Section 2.2.1.
The general structure of the encoder and decoder can vary, but for our study, we use a sparse shallow network as outlined in Section 3.2 of [31].The sparsity in the network enhances efficiency by eliminating connections in the layers.However, when constructing neural networks for physical data with higher dimensions, caution must be exercised to ensure that the spatial data organization aligns with the sparse connection.In other words, spatially near nodes share part of the connections, whereas far-away nodes in the space domain do not share the connections even if two nodes are neighbors.In this work, the encoder and decoder have three layers.The encoder layers are fully connected and a nonlinear activation function is applied in the hidden layer, while the output layer lacks a nonlinear activation function.For the decoder architecture, the nodes between the input and hidden layer are fully connected with a nonlinear activation function, however, the nodes between the hidden and output layer are sparsely connected without an activation function.We have introduced two hyper-parameters, b, and δb, to design a sparsity structure.b denotes the size of a block to compute for one output node and δb defines the amount of shift that the block moves.In our work, a swish activation function is used.For more information on the neural network architecture, please refer to [31].
A sparse and shallow decoder can efficiently compute a subset of the output because the active nodes in a hidden layer constitute a subset of the entire set of hidden nodes.Given that we exclusively analyze the sparse measurement data, we only need to consider the subset of output nodes generated by the decoder.This approach enables leveraging the inherent sparsity of measurements.In essence, the sparse and shallow neural network structure can be streamlined into a more compact network by retaining only the active nodes and edges, allowing computing of the output nodes associated with measurement points.

Gappy Procedures.
Minimizing error between measurements and decoder output leads to a nonlinear problem, which can be solved by the Gauss-Newton method to find latent space coordinates.We first define n ∈ R Ns as (2.9) where x ∈ R Ns is a vector that represents the solution at a given time and parameter and x j denotes j-th component of x.Then, a diagonal matrix, N ∈ R Ns×Ns is constructed with elements of n vector on the diagonal and removing zero columns gives us Z ∈ R Ns×nr , where n r is the number of known values.Now, we define an error, ϵ ∈ R as , where Z T x is a measurement data and x is prediction given by (2.11) x = where b ∈ R ns is a generalized coordinate to be found.Minimizing ϵ gives us a generalized coordinate b through the Gauss-Newton method.At the i-th iteration, first, we linearize Z T (x − x) at b i as (2.12) where J = ∂ x ∂b and ∆b i = b i+1 − b i .Then, the linearized error ϵ lin becomes (2.13) Imposing ∂ϵ lin ∂b = 0, which is given by (2.17) Thus, we have The above steps are iterated until the error is minimized.Let us write x as the generalized coordinate at the final iteration.Then x is computed using x = x ref + g(x) to complete the reconstruction.
In practical applications, sensor placement options are often constrained by factors such as product design, operational considerations, and maintenance requirements.We assume the flexibility to position sensors anywhere for our sampling algorithm tests.
For the Gappy POD method, we use the POD basis matrix for DEIM and S-OPT sampling algorithms; however, the Gappy AE does not have such POD basis matrix.To apply DEIM and S-OPT sampling algorithms to the Gappy AE, we introduce a residual, r ∈ R Ns , defined by assuming x ref = 0. Following the GNAT-SNS approach introduced in [15], the residual is approximated by where Φ r ∈ R Ns×nr is POD basis matrix from the residual snapshot matrix and r ∈ R nr is a reduced residual.The residual snapshot is constructed by concatenating residual vectors during iterations when minimizing r for the training data.Then, applying SVD to the residual snapshot and choosing leading n r left singular vectors yield the residual basis matrix, Φ r .Next, we define Z T := [e p1 , . . ., e pn r ] T ∈ R nr×Ns , n s ≤ n r ≪ N s , is the sampling matrix and e pi is the p i th column of the identity matrix I Ns ∈ R Ns×Ns .Z T extracts rows corresponding to sampling points from matrices or vectors.Then, solving the following equation Therefore, we have Then, the residual minimization problem given by (2.25) x = argmin solves x using the Gauss-Newton method.The update for each iteration is given by where J(x) is a Jacobian of g(x).

Discrete Empirical Interpolation Method (DEIM).
The DEIM algorithm employed here is the so-called oversampled DEIM.The oversampled DEIM differs from the original DEIM approach and aligns more closely with the sampling technique presented in [6].Moreover, unlike the original DEIM, the oversampled DEIM allows a larger number of sample points to be selected than the number of POD basis.
Given POD basis matrix, , the algorithm selects indices such that the reconstruction error bound is minimized in a greedy sense.The first sampling index is chosen at the largest absolute value of ϕ 1 .Next, the algorithm proceeds to iterate through the remaining columns of the Φ for the jth iteration with j > 1, approximating and Z T 1:j−1 is a sampling matrix that extracts selected rows (i.e., selected indices) up to (j − 1)th iteration.Then, the next index is selected, at which the absolute value of the error (2.28) ϵ j = ϕ j − φj is the largest.When oversampling, we iterate the above process multiple times as we update sampling indices for a given ϕ j .Implementation-wise, we follow Algorithm 3.1 of [36] for the oversampled DEIM.

S-OPT.
We introduce the S-OPT sampling algorithm presented in [48,36].As in Section 2.3.1, we denote Φ as the POD basis matrix, Z T as the sampling matrix, and x as the ground truth data to be reconstructed.According to Theorem 3.1 of [36], we can write the reconstruction error as (2.29) x. Since we do not know the ground truth of x in practice, the true optimum Z * cannot be found.We use training data to find Z and this is quasi-optimal.The S-OPT algorithm finds sampling points to have the smallest ϵ under the condition of given training data of x, corresponding to maximizing the column orthogonality of Z T Φ and determinant of (Z T Φ) T Z T Φ.In [48], the quantity S : R N ×p → [0, 1] is defined as where A is a N × p matrix and α i is the ith column of A assuming ∥α i ∥ ̸ = 0. Maximizing S(A) provides maximal orthogonality of A and determinant of A T A. In the context of sampling algorithms in the Gappy POD and AE, the S-OPT seeks the optimal sampling matrix Z that maximizes S, i.e., (2.31) The S-OPT sampling algorithm is presented in Section 3 of [48].The source code of the S-OPT algorithm is available at [11].
3. Numerical Results.We used three normalized example problems, diffusion, radial advection, and wave problems, for testing the Gappy AE algorithm.The Gappy AE results are shown in this section.Due to page constraints, the Gappy POD results are presented in Appendix B of [32].The Python codes that are used to produce the results with noiseless measurement assumption are available at https://github.com/youngkyu-kim/GappyAE.The codes for noisy measurement cases are omitted because the only difference is the line for adding noise to the measurement data.Open source finite element method (FEM) solver, called MFEM [1,42] is used to generate simulation data.The auto-encoder is trained using the simulation data, where training parameters for each problem are given by (3.1) µ ∈ D train = {0.8,0.85, 0.9, 0.95, 1, 1.05, 1.  1 of Appendix A in [32].For all cases, both training curves and test curves consistently continue to improve and we don't see the test curves plateau or worsen, which suggests the models are not overfitting.We conclude that the auto-encoder models are not over-fitted to the training data.
The auto-encoder training is repeated 10 times to account for randomness due to random initialization and mini-batch sampling.The statistics of MSE losses are presented in Fig. 2 of Appendix A in [32].The MSE losses decrease as latent space dimensions increase because of more learnable hyper-parameters.
For each numerical problem, we present two cases: measurements with and without noise.We assume white noise and create noisy measurements by adding random numbers with uniform distribution within the range [−0.01, 0.01), which is ±1% of the maximum data value in each example.In our numerical examples, projection error is calculated when all data points are measured with and without noise and used as a low bound of the Gappy AE and POD data reconstruction accuracy.The Gappy AE projection errors are shown in Figs. 3, 4, and 5 of Appendix A in [32].The Gappy POD projection errors are presented in Figs. 26, 27, and 28 of Appendix B in [32].Gappy AE is O (10) to O(100) times more accurate than the Gappy POD in terms of projection error.The projection errors without noise are slightly lower than those with noise; the effects of noise in computing projection errors are negligible.
For our numerical examples, the size of the reduced dimension (i.e., latent space dimension for nonlinear manifold or the reduced basis dimension for linear subspace) should be at most half of the number of sample points to avoid numerical round-off errors.If the reduced dimension size is close to the number of sample points, the condition number of the matrix in solving x increases, leading to errors in the calculation of the pseudo-inverse.In addition, the reduced dimension size is greater than that of the intrinsic dimension.Through the numerical examples, the intrinsic dimension is 2, with time t ∈ R + and parameter µ ∈ R; the number of sample points is 12.Thus, we limit the dimension size to {3, 4, 5, 6}.
The computational cost of the online phase of gappy data reconstruction is measured in terms of wallclock time.Calculations are performed on one CPU of an AMD EPYC 7763 @ 2.45 GHz and DDR4 Memory @ 3200 MT/s.With given 12 sample points and reduced dimension (i.e., the POD basis or latent space dimensions) of [3,4,5,6], the online phase of the Gappy POD method for each reconstruction step in our example problems takes around 0.1 ms.However, it takes around 3.0 to 3.8 ms for the Gappy AE method to reconstruct the data during the online phase.To compare the online cost of Gappy AE, Gappy POD, and FE simulation, wall-clock time for each time step is measured 10 times and the mean value is used in Table 1.The performances of Gappy AE and POD methods in terms of reconstruction error are compared in the following sections.

Table 1
Comparison of online cost of Gappy AE and Gappy POD under noiseless measurement conditions.Note that FE simulation does not reconstruct data but solves governing equations.For data reconstruction, the best-performing cases in terms of accuracy from each numerical example are selected, and the online costs of Gappy AE, Gappy POD, and FE simulation are computed by averaging the mean wall-clock times for each time step over FOM parameters.

Problem Type
Sensor   First, we show the effects of the latent space dimensions in Figs. 4 and 5.In Fig. 4, the relative errors are similar over the latent space dimensions.However, with noise in measurement data, as described in Section 3, the accuracy of Gappy AE is better when the latent space is 3 or 4. We consider that the auto-encoder with a smaller latent space dimension is less sensitive to noise.Next, we choose the best-performing latent space dimension for each sampling method and they are compared in Figs. 6, 7, 8, and 9 of Appendix A.1.1 in [32].When there is no noise, all three sampling methods offer similar accuracy, and no differences in solution plots are observed in Fig. 7 of Appendix A.1.1 in [32].For a noisy measurement case, the uniform sampling method gives the lowest relative error.
Finally, Gappy AE and POD methods are compared in Figs. 6, 7, 8, and 9. Regardless of noise in the measurement data, the Gappy AE algorithm outperforms the Gappy POD in terms of accuracy.The effects of the latent space dimension in Fig. 10 demonstrate no significant changes in relative errors with increasing dimension when there is no noise in measurement data.However, with measurement noise, as shown in Fig. 11, Gappy AE exhibits better accuracy with a smaller latent space dimension, which is due to the less sensitivity of its auto-encoder to the noise.The section also includes a comparison of the Gappy AE performance for each sampling method.The results in Figs.10,11,12, and 13 of Appendix A.1.2 in [32] indicate similar results across all methods with no noise.For a noisy measurement case described in Section 3, the uniform sampling method yields the most accurate results.Adding noise to measurement data not only deteriorates the data reconstruction accuracy but also weakens the spatial characteristics of the dynamics embedded in the data.Thus, the uniform sampling method works the best among the 4 different sampling methods.
Finally, Gappy AE and Gappy POD methods are compared in Figs. 12, 13, 14, and 15.Regardless of noise in the measurement data, Gappy AE algorithm outperforms Gappy POD in terms of accuracy.
where the spatial and time domains are given by x ∈ Ω = [−1, 1]×[−1, 1] and t ∈ [0, 3], respectively; velocity, v is defined as T .The Dirichlet boundary condition (3.8) u = 0 on ∂Ω and initial condition (3.9) u(x, 0; µ) = 0.5(sin(πµx 1 ) sin(πµx 2 ) + 1) are imposed, where µ is a parameter.The spatial domain is discretized into a 64 × 64 square mesh.The fourth-order Runge-Kutta time stepping scheme is applied with uniform ∆t = 0.005.This problem is based on the example 9 of MFEM's example problems.The source code of the radial advection problem is available at https://github.com/mfem/mfem/blob/master/examples/ex9.cpp.Figure 16 shows five snapshots for the two extreme parameter values.For this radial advection problem, we impose the Dirichlet boundary condition, and its value is constant.Thus, we only allow sensors to be located in the inner domain region because data measured at the boundary never changes.The findings indicate that relative errors are the lowest for all sampling algorithms when the dimension of the latent space is 6, as in Fig. 17.However, with noise, the Gappy AE demonstrates higher accuracy with a smaller latent space dimension (Fig. 18).Thus, DEIM, uniform sampling, LHS, and S-OPT algorithms work best when the latent space dimension is 4, 3, 3, and 5, respectively.Next, we choose the best-performing latent space dimension for each sampling method and compared them in Figs.14,15,16, and 17 of Appendix A.2.1 in [32].The results reveal similar results for S-OPT and LHS methods with no noise.However, the accuracy of the DEIM sampling method deteriorates at parameters near 0.8.The uniform sampling method yields less accurate results overall.In contrast, the overall accuracy decreases with noise.The LHS method gives the lowest relative error with noisy data among different sampling algorithms.
A comparison is made between Gappy AE and Gappy POD methods, as illustrated in Figs. 19, 20, 21, and 22.The Gappy AE consistently demonstrates superior accuracy, irrespective of noise in the measurement data.The results for the Gappy POD can be found in Appendix B of [32].The results in Figs.24 and 25 reveal that for 3 different sampling algorithms, data reconstruction accuracies of the Gappy AE with noiseless measurement are similar for latent space dimensions of 4, 5, and 6.When the latent space dimension is 3, the relative errors are worst for all sampling cases.However, noise in the measurement data leads to different results.The Gappy AE with the latent space dimension of 4 gives us the lowest relative error among other latent space dimensions for three different sampling algorithms.The comparison of the data reconstruction error of Gappy AE with the latent space that gives the best performance for each sampling method is shown in Figs.18,19,20, and 21 of Appendix A.3.1 in [32].The Gappy AE shows negligible differences in solution plots in Fig. 19 of Appendix A.3.1 in [32] when there is no noise.However, in Fig. 20 of Appendix A.3.1 in [32], the Gappy AE with uniform sampling yields the lowest relative error for noisy measurements.Moreover, adding noise to measurement data aggravates the relative errors.The effects of varying the latent space dimension size are illustrated in Fig. 30.The relative errors decrease with increasing dimension except for the DEIM sampling case, where the relative error is not the lowest when the latent space dimension is 6.However, as shown in Fig. 31, the presence of noise in measurement data results in better accuracy for Gappy AE with a smaller latent space dimension (i.e., 4).We proposed the Gappy AE method for gappy data reconstruction and compared it with the popular Gappy POD method.In the Gappy AE method, data is reconstructed using a nonlinear manifold by training the auto-encoder.Unlike a linear subspace-based approach, such as the Gappy POD method, the Gappy AE shows O (10) to O(100) times less data reconstruction error when assuming noiseless measurement as shown in Table 2.Even for diffusion dominant problems, the Gappy Table 2 Comparison of relative errors between Gappy AE and Gappy POD under noiseless measurement conditions.The bestperforming cases from each numerical example are selected, and their relative errors are computed by averaging over FOM parameters.

Problem Type
Sensor AE can outperform Gappy POD in terms of accuracy when the number of sample points is very sparse.However, we acknowledge that Gappy POD will in general outperform Gappy AE in terms of speed-up.The superior representability of the nonlinear manifold allows the Gappy AE to reconstruct gappy data when known (measured) data is sparse.This prominent feature is aligned with the digital twin application for physical data because, in reality, sensors are limited to measuring sparse physical field data, yet they want to estimate the entire field data.In addition to that, if there is training data available for autoencoders, the Gappy AE method can be used in any domains whether simulation is possible or not.In our numerical tests, the spatial dimension of reconstruction data is 3, 600 and the number of sample points is 12.The speed of data reconstruction using the Gappy POD is approximately 30 times faster than the Gappy AE.Although the Gappy AE takes more time to reconstruct data because of solving nonlinear problems in latent space dimension, its speed is sufficient for a 260 Hz measurement frequency.As explained in Section 2.3, DEIM and S-OPT algorithms are applied to the residual term to select sampling points for the Gappy AE.These algorithms find sample points to estimate the residual well, not the data to be reconstructed.Therefore, new algorithms to suggest sample points for the Gappy AE based on reconstruction error are needed.
The number of operations for the Gappy POD computation given by Eq. (2.7) is O(N s n r ), where N s is the dimension of original data and n r is the number of samples.Φ(Z T Φ) † is precomputed once and Z T x is not a matrix and vector multiplication but a sampling process.Thus, N s × n r shaped matrix and vector multiplication is done every measurement step, resulting in O(N s n r ).For Gappy AE, generalized coordinates of latent space are computed first, and its computational cost is O(n r bf ) + O(f n 2 r ), where b is the number of nodes in the hidden layer to compute single output nodes and f is the latent space dimension.Then, decoder evaluation for output requires O(N s b) + O(N s f δb) operations, where δb is the amount by which the block of b nodes shifts.Thus, we have O(n r bf ) + O(n 2 r f ) + O(N s f δb) + O(N s b) for computational cost of the Gappy AE for each measurement step.Details on counting operations are presented in [31].Since we assume n r << N s , δb < b << N s , and f << N s , the computational cost of the Gappy AE increases linearly as the problem size, N s increases, indicating that the Gappy AE is also suitable for a large problem if n r , f , δb, and b are small.
In future work, we will investigate retraining the auto-encoder with noisy data to make the Gappy AE more reliable for noisy measurements.To be more specific, we plan to train two decoders.One is for noisy data and the other is for noiseless data.The decoder trained with noisy data can be used for finding x in a latent space.Then, the decoder trained with noiseless data will be used to reconstruct data.

25 Fig. 3 .
Fig. 3. Diffusion simulation solutions from the initial to the final time for the two endpoints of µ

Fig. 4 .Fig. 5 .
Fig. 4. Accuracy of Gappy AE for diffusion problem with noiseless measurements on the boundary

Fig. 10 .Fig. 11 .
Fig. 10.Accuracy of Gappy AE for diffusion problem with noiseless measurements in the domain

25 Fig. 16 .
Fig. 16.Radial advection simulation solutions from the initial to final time for two endpoints of µ

Fig. 17 .Fig. 18 .
Fig. 17.Accuracy of Gappy AE for radial advection problem with noiseless measurements in the domain

Figure 23
shows five snapshots for two extreme parameter values.As in the diffusion problem case, two sensor location scenarios are considered: boundaries and inner regions of the domain.

25 Fig. 23 .
Fig. 23.Wave simulation solutions from the initial to the final time for two endpoints of µ

Fig. 30 .Fig. 31 .
Fig. 30.Accuracy of Gappy AE for wave problem with noiseless measurements in the domain
test = {µ i |µ i = 0.75 + i/100, i = 0, 1, • • • , 50}In our numerical examples, the width of the encoder is twice the input size and the latent space dimension varies from 3 to 6.The block size and the amount of shifts are set as 100 and 20, respectively.The batch size is 240 and the data set is split into 80% for training and 20% for testing.The ADAM optimizer, a variant of stochastic gradient descent (SGD), is used to update the learnable parameters.The initial learning rate is set as 0.001 and reduced by a factor of 10 if learning shows no improvement for 50 epochs.The number of epochs for training is 10000, but if training does not yield improvement after 200 times, the training is stopped.The autoencoders are trained on a single NVIDIA A100 GPU with 40GB of memory.The average training time per epoch is 0.539 seconds with a latent space dimension set to 6.The early stopping scheme introduces variability in the training time; however, we enforce a maximum of 10000 epochs.Consequently, the training time is capped at approximately 90 minutes.The mean squared error (MSE) loss of the auto-encoder is shown in Fig.