Combining a fully connected neural network with an ensemble Kalman filter to emulate a dynamic model in data assimilation

Neural networks can learn the dynamic characteristics of a system from model output or assimilation results and use them to make predictions, an area that has progressed rapidly in recent years. By combining a fully connected neural network with an ensemble Kalman filter, a data-driven data assimilation method is proposed to emulate a dynamic model from sparse and noisy observations. First, the surrogate model is trained on model forecast values and assimilation results, and its performance is verified using the training accuracy (loss) and the validation accuracy (loss) at different training times. The hybrid model couples the original dynamic model with the surrogate model. Second, the assimilation process follows a two-stage procedure: Stage 0 generates the training sets and trains the surrogate model, and the hybrid model is then used for the next assimilation period in Stage 1. Finally, numerical experiments with the Lorenz-63 and Lorenz-96 models demonstrate that the proposed approach outperforms the ensemble Kalman filter under different model error covariances, observation error covariances, and observation time steps. The approach is also applied to sparse observations to improve assimilation performance. Although the approach is formulated here for the ensemble Kalman filter, the basic strategy is not restricted to any particular version of the Kalman filter.


I. INTRODUCTION
Data assimilation (DA) is an important method for fusing observational information with nonlinear physical models. It draws on estimation theory, control theory, optimization methods and error estimation theory in mathematics [1].
Early DA methods, up to the 1990s, were mainly polynomial interpolation, successive corrections, and optimal interpolation. Since then, modern DA methods have been widely studied and rapidly applied [2][3]. DA methods fall into two main categories: continuous DA and sequential DA. The former mainly includes three-dimensional and four-dimensional variational assimilation (3DVar and 4DVar, respectively) [4]. The latter mainly includes the Kalman filter (KF), the ensemble Kalman filter (EnKF) and the particle filter (PF). These approaches update the model state by weighting the observation error against the model error to obtain an a posteriori optimal estimate of the model state while the system is running [5][6][7][8]. After the state update, the model is integrated forward to the next observation time. DA has been a huge success [9][10] and has been widely used in ocean, land surface and atmospheric applications [11].
Since objective reality is unknown, the model operator is usually replaced with an expanded and refined mathematical model, but such a model can never equal objective reality. Because of the diversity of mathematical representations, environmental changes and other factors, model errors and observation errors always exist [12][13]. At present, many practical DA schemes rely on Gaussian or linear assumptions. In particular, most practical DA schemes are approximations of the famous Kalman filter (KF), introduced to reduce computational costs and improve statistical predictions. Although current DA methods have greatly improved error processing and correction, errors are inevitable owing to various uncertain factors, such as the initial values, boundary conditions and model structure [14][15]. Therefore, an important challenge for DA is how to intelligently use existing methods in the presence of such errors to solve practical problems, such as an incomplete understanding of physical processes, limited computational resources, and interactions across scales.
With recent advances in machine learning (ML), the problem of using surrogate models to identify unknown processes from observations has also been addressed through sparse regression and DA approaches. Tang et al. [21] suggested artificial neural networks as a possible DA technology.
However, Campos Velho et al. [22] used neural networks in all DA spatial domains. Later, this approach was improved by Harter and Campos Velho [23].
The output of DA clearly depends on the uncertainty in the numerical model, which has led to the development of techniques for treating model errors in the DA process [24]. One can accomplish this by parameterizing model errors in the model equations or by adding random noise to deterministic models [13]. In either case, a dynamic model must exist, and its existence is the key to DA. It is worth noting that, when model errors exist, the optimization problem solved by DA is equivalent to an ML problem. For instance, the DA approach has been used to infer ordinary differential equation representations of dynamical models [25]. The use of analog techniques to replace the numerical forecast model has been described by Lguensat [26].

Ensemble Kalman filter
In the analysis step, each ensemble member is updated with a perturbed observation:

$$X^{a}_{i,t} = X^{f}_{i,t} + K_t\left(y_t + v_{i,t} - H X^{f}_{i,t}\right),$$

where $X^{a}_{i,t}$ represents the analysis value of the $i$-th ensemble member at time $t$, $K_t$ is the Kalman gain and $H$ is the observation operator. The analysis ensemble is then propagated by the model to give the forecast at the next time:

$$X^{f}_{i,t+1} = \mathcal{M}\left(X^{a}_{i,t}\right), \qquad \bar{X}^{f}_{t+1} = \frac{1}{N}\sum_{i=1}^{N} X^{f}_{i,t+1},$$

where $\bar{X}^{f}_{t+1}$ is the mean of the forecast values at time $t+1$, $X^{a}_{i,t+1}$ is the analysis value at time $t+1$, and $v_{i,t+1}$ is Gaussian white noise with expectation 0 and variance $R_t$. This is the principle of the EnKF in one cycle.
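For concreteness, the following is a minimal NumPy sketch of one EnKF analysis step with perturbed observations; the linear observation operator `H`, the dimensions and the random generator `rng` are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of one EnKF analysis step with perturbed observations.
import numpy as np

def enkf_analysis(X_f, y, H, R, rng):
    """X_f: (n, N) forecast ensemble, y: (p,) observation,
    H: (p, n) observation operator, R: (p, p) observation error covariance."""
    n, N = X_f.shape
    A = X_f - X_f.mean(axis=1, keepdims=True)            # ensemble anomalies
    P_f = A @ A.T / (N - 1)                              # sample forecast covariance
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)     # Kalman gain K_t
    # Perturb the observation for each member: v_{i,t} ~ N(0, R).
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X_f + K @ (Y - H @ X_f)                       # analysis ensemble X^a
```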

Fully Connected Neural Network
An FCNN is a network in which adjacent network layers are fully connected to each other, and the most basic principles are derived from back propagation [35][36].
Neurons are grouped in layers. Figure 1 shows the schematic diagram of an FCNN. The leftmost layer is the input layer, which receives the input data. The rightmost layer is the output layer, from which the network output is obtained. The layers between the input and output layers are called hidden layers because they are not visible from the outside [37][38]. Each neuron applies an activation function, such as the sigmoid, tanh or rectified linear unit (ReLU), to a weighted sum of its inputs. The ReLU is computationally efficient and alleviates the vanishing-gradient problem; therefore, the ReLU is chosen as the activation function in this study [40].

[Figure 1. Schematic diagram of an FCNN: input layer, hidden layers, output layer.]
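As a concrete illustration, the following sketch builds such a network in PyTorch; the hidden-layer widths are illustrative assumptions, since the paper does not fix the architecture at this point. ReLU activations are used in the hidden layers, as stated above.

```python
# A minimal FCNN of the kind shown in Figure 1, built with PyTorch.
import torch.nn as nn

def build_fcnn(n_in, n_out, hidden=(64, 64)):
    layers, prev = [], n_in
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.ReLU()]  # hidden layer + ReLU
        prev = width
    layers.append(nn.Linear(prev, n_out))              # linear output layer
    return nn.Sequential(*layers)

# Example: a surrogate mapping a 3-variable state to the next state.
surrogate = build_fcnn(3, 3)
```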

EnKF coupled with FCNN
The general idea of this algorithm is that the FCNN provides a surrogate forward model to the EnKF; in return, the EnKF provides a complete time series with which to train the neural network. The scheme of the algorithm is displayed in Figure 2, and the schematic diagram of the EnKF coupled with an FCNN is shown in Figure 3. The assimilation process is separated into two stages. In stage 0, the nonlinear physical model, denoted $\mathcal{M}(\cdot)$, is used in the EnKF to generate the training sets, and the surrogate model $G_W(\cdot)$ is trained on them. In the training process, the optimal weight $W$ (the weights of the artificial neural network) is determined by iteratively minimizing the cost function. In this case, the cost function to minimize is

$$J(W) = \sum_{t=1}^{m}\left(y_t - H\,G_W(x_{t-1})\right)^{T} R_t^{-1}\left(y_t - H\,G_W(x_{t-1})\right),$$

where $m$ is the length of the assimilation or training window and $W$ is also the set of parameters of the surrogate model. This formulation also estimates the model error from sparse and noisy observations. If $y_t$ is the full observation and $R_t = 0$ (no observation noise), the cost reduces to the standard ML cost function shown in equation (13). In stage 1, the hybrid model, which couples $\mathcal{M}(\cdot)$ with $G_W(\cdot)$, is used for the assimilation of the next period.
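A sketch of evaluating this cost function for a candidate surrogate is given below, assuming PyTorch tensors, a linear observation operator `H` and a precomputed constant `R_inv`; in practice, the minimization over $W$ would be driven by automatic differentiation of this quantity. All names are illustrative.

```python
# Evaluate J(W) for a surrogate G_W over a window of length m = len(ys).
import torch

def cost_J(surrogate, x0, ys, H, R_inv):
    """Propagate the surrogate from x0 and accumulate the
    R^{-1}-weighted misfit to each observation."""
    J, x = torch.tensor(0.0), x0
    for y in ys:
        x = surrogate(x)            # x_t = G_W(x_{t-1})
        d = y - H @ x               # innovation y_t - H G_W(x_{t-1})
        J = J + d @ R_inv @ d       # quadratic form with R^{-1}
    return J                        # differentiable w.r.t. W via autograd
```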

Assimilation evaluation metrics
(1) The root mean square error (RMSE) is used to quantify the assimilation performance, as shown in equation (15):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(x^{a}_{t} - x^{\mathrm{true}}_{t}\right)^{2}},$$

where $n$ is the number of assimilation steps, $x^{a}_{t}$ is the analysis value and $x^{\mathrm{true}}_{t}$ is the true value. (2) The mean absolute error (MAE), defined analogously with the absolute difference, is used alongside the RMSE in the comparisons below.
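Minimal NumPy versions of these two metrics follow; the 1-D array layout over the assimilation window is an illustrative assumption.

```python
import numpy as np

def rmse(x_a, x_true):
    return np.sqrt(np.mean((x_a - x_true) ** 2))   # equation (15)

def mae(x_a, x_true):
    return np.mean(np.abs(x_a - x_true))
```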

Numerical experiments using the Lorenz-63 model
The Lorenz-63 model is a nonlinear spectral model for the finite-amplitude convection of a fluid, proposed by Lorenz and Saltzman in 1963 [41][42]. Because of its nonlinear chaotic behavior and low dimensionality, the Lorenz-63 model is often used to verify the performance of data assimilation systems. The model is defined as shown in equation (17):

$$\frac{dx(t)}{dt} = \sigma\left(y(t) - x(t)\right), \qquad \frac{dy(t)}{dt} = r\,x(t) - y(t) - x(t)z(t), \qquad \frac{dz(t)}{dt} = x(t)y(t) - b\,z(t).$$

In this configuration, $\sigma = 10$, $r = 28$ and $b = 8/3$, as in previous studies [44]. This study chooses the variable $x(t)$ to observe the assimilation results.
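For reference, the system can be integrated with `scipy.integrate.solve_ivp` as in the following sketch; the initial condition, integration window and sampling step are illustrative choices, not the paper's exact configuration.

```python
# Integrate the Lorenz-63 system of equation (17) with SciPy.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz63(t, s, sigma=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), r * x - y - x * z, x * y - b * z]

sol = solve_ivp(lorenz63, (0.0, 40.0), [1.0, 1.0, 1.0],
                t_eval=np.arange(0.0, 40.0, 0.01))  # sol.y has shape (3, n_steps)
```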

Training the surrogate model
For the dynamic system described by equation (14), we run the system and record its true state $(x(t), y(t), z(t))$, from which the training sets are constructed, as sketched below.
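The following shows one plausible way to assemble training pairs from the recorded trajectory `sol` of the Lorenz-63 sketch above: inputs are states at time $t$ and targets are states at time $t+1$. In stage 0, the EnKF analysis values would play the role of `states`; the array layout and shuffling are assumptions.

```python
import numpy as np

states = sol.y.T                                  # (n_steps, 3) trajectory
X_train, Y_train = states[:-1], states[1:]        # (state_t, state_{t+1}) pairs
idx = np.random.permutation(len(X_train))
X_train, Y_train = X_train[idx], Y_train[idx]     # shuffle before training
```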

Performance of the EnKF-FCNN with different model error covariances
In the EnKF-FCNN, the surrogate model $G_W(\cdot)$ is added to the forecast step, and the forecast values are calculated by equation (11); one possible form of this step is sketched after this paragraph. In these experiments, the model error covariance is $Q_t \in \{0.5, 1.5, 3\}$ and the observation error covariance is $R_t = 2$. The other configurations are the same as in the training of the surrogate model. Figure 5 compares the MAE of the EnKF-FCNN approach with that of the EnKF, and Table II compares the RMSEs of the two approaches. The conclusions of this experiment are as follows: (1) the MAE and RMSE of the EnKF-FCNN approach are significantly lower than those of the EnKF; (2) as $Q_t$ increases, the RMSE of the new method also increases, but not as fast as that of the EnKF. These results show that the EnKF-FCNN approach performs better than the EnKF, especially as the model error grows.
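Since equation (11) is not reproduced here, the sketch below shows one plausible form of the hybrid forecast step, which is an assumption on our part: each analysis member is propagated by the trained surrogate and perturbed with model-error noise of covariance $Q_t$, as in a standard stochastic EnKF forecast.

```python
import numpy as np

def hybrid_forecast(surrogate, X_a, Q, rng):
    """X_a: (n, N) analysis ensemble; `surrogate` maps a length-n state
    to its one-step forecast; returns the (n, N) forecast ensemble."""
    n, N = X_a.shape
    X_f = np.column_stack([surrogate(X_a[:, i]) for i in range(N)])
    X_f += rng.multivariate_normal(np.zeros(n), Q, size=N).T  # model-error noise
    return X_f
```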

Performance of the EnKF-FCNN with different observation error covariances
Another group of experiments, with observation error covariance $R_t \in \{1, 2, 3\}$, is conducted using a model error covariance of $Q_t = 0.1$, which differs from that used in the training process. The other configurations are the same as in the previous experiments. Figure 6 again compares the MAE of the EnKF-FCNN approach with that of the EnKF. The same result is obtained: the MAE of the EnKF-FCNN remains lower than that of the EnKF for all tested observation error covariances.

Performance of the EnKF-FCNN with different observation time steps
A further group of experiments is conducted with different observation time steps, with the other configurations unchanged. As in the previous experiments, the EnKF-FCNN yields lower MAE and RMSE than the EnKF.

Numerical experiments using the Lorenz-96 model
The Lorenz-96 system is also widely used to verify assimilation algorithms [45]. Its expression is

$$\frac{dx_j}{dt} = \left(x_{j+1} - x_{j-2}\right)x_{j-1} - x_j + F, \qquad j = 1, \dots, N,$$

with cyclic indices ($x_{-1} = x_{N-1}$, $x_0 = x_N$, $x_{N+1} = x_1$), where $F$ is the forcing term. Figure 8 compares the performance of the EnKF-FCNN with that of the EnKF. In these configurations, $\mathrm{RMSE}_{\mathrm{EnKF}} = 1.272$ and $\mathrm{RMSE}_{\mathrm{EnKF\text{-}FCNN}} = 0.864$. These results show that the EnKF-FCNN performs better than the EnKF. Another group of experiments, with $Q_t = 2$ and $R_t = 2$, is also conducted; all other parameters remain the same. Figure 9 shows the performance of the two approaches in these configurations, where $\mathrm{RMSE}_{\mathrm{EnKF}}$ and $\mathrm{RMSE}_{\mathrm{EnKF\text{-}FCNN}}$ equal 2.121 and 1.067, respectively.
In conclusion, the MAEs and RMSEs demonstrate that the EnKF-FCNN also outperforms the EnKF on the Lorenz-96 model.
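For reference, the Lorenz-96 tendency above can be written compactly with `np.roll`; $N = 40$ and $F = 8$ are the usual choices and are assumptions here, since the paper's exact configuration is not reproduced.

```python
import numpy as np

def lorenz96(t, x, F=8.0):
    # dx_j/dt = (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F, cyclic in j
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
```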

Training and loss
With too few training epochs, learning is insufficient and the accuracy of the trained model is poor, whereas additional epochs beyond convergence add cost without improving the validation loss.

Uncertainties
The EnKF-FCNN approach performs fairly well under the uncertainties considered in this study. Merging multiscale ML and sparse coding greatly improves saliency detection [46]. A multiscale convolutional neural network effectively handles remote sensing images with complex backgrounds [47]. A multiscale target detection algorithm based on a region proposal convolutional neural network can detect differences in the target scales of images and improve detection speed [48]. Therefore, coupling the EnKF-FCNN strategy with multiscale space theory is a very promising research direction.

The computational cost
In the EnKF-FCNN approach, the computational cost grows with the size $m$ of the training set, while the assimilation performance obtained with larger training sets, although good, was similar to that obtained with $m = 10^3$. Therefore, training sets that are too large waste resources and incur high computational costs. An effective remedy is to end the training when the training accuracy or validation loss has not improved over several cycles, as sketched below; this saves considerable computing cost.
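A minimal patience-based early-stopping loop of this kind is sketched below in PyTorch; the validation split and all names are illustrative assumptions.

```python
# Stop when the validation loss has not improved for `patience` epochs.
import copy
import torch

def train_early_stop(model, loss_fn, opt, train_loader, X_val, y_val,
                     max_epochs=500, patience=10):
    best_val, best_state, stale = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        for xb, yb in train_loader:          # one pass over the training set
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = loss_fn(model(X_val), y_val).item()
        if val < best_val:                   # validation loss improved
            best_val, stale = val, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
            if stale >= patience:            # no improvement: stop early
                break
    model.load_state_dict(best_state)        # restore the best weights
    return model
```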

IV. CONCLUSION
In this study, a new method coupling an FCNN with an EnKF is proposed to emulate a dynamic model from sparse and noisy observations, and numerical experiments with the Lorenz-63 and Lorenz-96 models show that it outperforms the standard EnKF. Nevertheless, the effectiveness of the proposed algorithm has yet to be verified in specific fields with large state spaces and more complex internal and external mechanisms.
Future work should include the implementation of DA+NN methods under more general conditions [49][50]. For instance, the combination of parametric model inaccuracy and structural model uncertainty will be studied in the future [51]. In addition, modern deep learning tools (such as CNNs and RNNs) can be introduced into data assimilation to improve adaptability and performance under different data conditions [20].