Application of Recurrent Neural Networks to Model Bias Correction: Idealized Experiments With the Lorenz‐96 Model

Systematic biases in numerical weather prediction models cause forecasts to deviate from reality. While model biases also affect data assimilation and degrade the analysis accuracy, observation information incorporated through data assimilation can be used to detect and alleviate such biases. In this study, the application of machine learning to model bias correction is demonstrated, emphasizing the effectiveness of recurrent neural networks. Idealized experiments are performed using the two-scale coupled Lorenz-96 model as the true system and a single Lorenz-96 model as the imperfect forecast model, to compare the effectiveness of bias correction methods based on various architectures of neural networks and simple linear regression. The neural networks generally outperformed linear regression, and recurrent neural networks showed the best ability to extract the systematic bias component from the analysis increment data. Bias correction using recurrent neural networks also gave the largest improvement in reducing the error growth rate in extended range forecasts. The results suggest that including the past time series of the forecast variables improves model bias correction when only limited observation information is incorporated through data assimilation.

The choice of predictor variables for neural networks in data-driven prediction is an important issue, as is the choice of neural network architecture. In purely data-driven time series modeling, using multiple time steps of the state variables as predictors usually gives better prediction of the next time step than using only the instantaneous values. This can be implemented using recurrent neural networks, or neural networks with "delay embedding," which use the history of the state variables as a single input vector. The advantage of using recurrent neural networks over non-recurrent neural networks in data-driven modeling has been demonstrated (Chattopadhyay et al., 2020). In purely data-driven modeling, dimensionality reduction with latent variables is often applied for effective training. In such models, even when the original time series of the system is governed by Markovian dynamical system equations, the dynamics in the reduced space can be non-Markovian. Therefore, the past history of the time series may be significant for forecasting the future state (Nguyen & Ouala, 2021; Ouala et al., 2020).
On the other hand, when the model includes partial and imperfect knowledge about the governing equation, the necessity of including past history and using recurrent neural networks is unclear. Previous studies applying machine learning to data-driven forecasting and model bias correction (Bonavita & Laloyaux, 2020; Brajard et al., 2020) have used neural networks without recurrent connections, corresponding to the functional form of the part of the governing equation that depends on the instantaneous state variables. If the governing equation describes a Markovian process and the previous state variables are fully observed, the input of the neural network model requires only the values at the previous time step. However, when the observation is partial in elements and sparse in space and time, which is the typical case in numerical weather prediction, the analysis in the model space obtained through data assimilation gives the time series of the full state variables. In such a case, a multiple time step input may be advantageous in compensating for the uncertainty of the analysis due to imperfect observation. This study examines the extent to which recurrent neural networks can be used for model bias correction to improve the analysis and the future forecast in an extended range.
In this study, the application of neural networks, with emphasis on recurrent neural networks, to model bias correction is explored in the context of data assimilation. Idealized experiments are performed using a true time series generated by the true dynamical equations and an imperfect forecast model that represents only part of the true dynamics. In one of the experiments, partial observation coverage is assumed. For model bias correction, various types of neural networks, including long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997), are applied and compared.
The remainder of this article is structured as follows. Section 2 describes the experimental settings, including the governing equations and entire structure of the data assimilation cycle with bias correction. Section 3 provides the result of bias correction term estimation using neural networks. A comparison between networks with different architectures is performed. Section 4 provides the result of the data assimilation experiment and shows the improvement of the analysis and forecast accuracy by bias correction. Section 5 summarizes the paper and discusses remaining issues.

Two-Scale Lorenz-96 System
The Lorenz-96 system is a well-known idealized dynamical system used to study the data assimilation and predictability of spatio-temporal chaotic systems (Lorenz, 1996). The coupled system consisting of multiple Lorenz-96 equations with different scales is also widely used in data assimilation studies. In this study, the two-scale Lorenz-96 system, which has been used in previous studies (Arnold et al., 2013; Pulido et al., 2016; Watson, 2019; Wilks, 2005), is used as the "true" system. The two-scale Lorenz-96 system equations are as follows:

$$\frac{dX_k}{dt} = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F - \frac{hc}{b}\sum_{j=1}^{J} Y_{j,k}, \quad (1)$$

$$\frac{dY_{j,k}}{dt} = cb\,Y_{j+1,k}(Y_{j-1,k} - Y_{j+2,k}) - c\,Y_{j,k} + \frac{hc}{b}X_k, \quad (2)$$

where F is the external forcing, and h, b, and c are the coupling parameters: h corresponds to the coupling intensity, b to the amplitude scale of the variable Y, and c to the time scale of Y. The second subscript of Y indicates the element of X to interact with. For each k = 1…K, there are J elements Y_{1,k}…Y_{J,k}, ordered in such a manner that Y_{J+1,k} = Y_{1,k+1} (1 ≤ k ≤ K − 1) and Y_{0,k} = Y_{J,k−1} (2 ≤ k ≤ K). Both X and Y are ordered on cyclic one-dimensional coordinates; the boundary conditions are X_0 = X_K, X_{K+1} = X_1 and Y_{0,1} = Y_{J,K}, Y_{J+1,K} = Y_{1,1}. The numbers of grid points are K = 8 and J × K = 256 for X and Y, respectively. The parameter values are set to F = 20, h = 1, b = 10, and c = 4. These parameter settings are identical to those used in the previous studies of Wilks (2005), Arnold et al. (2013), and Watson (2019). Times are denoted in model time units (MTUs) in the following. Figure 1 shows an example of the evolution of the first elements of the variables X (upper panel) and Y (lower panel). The range of X is larger than that of Y by approximately one order of magnitude, and the variable X has a longer timescale than Y. Therefore, X and Y can be called the "slow" and "fast" variables, respectively. Overall, this system exhibits chaotic behavior, in which a small initial perturbation grows exponentially with time and eventually leads the system to a totally different state.
The forecast model is a single Lorenz-96 model without the coupling term in Equation 1:

$$\frac{dX_k}{dt} = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F. \quad (3)$$
The number of grid points K = 8 and the parameter value F = 20 are the same as in the coupled system. The forecast model does not include the fast variable Y and lacks the coupling term. This imperfection leads to the deviation of the simulated time evolution from the truth and eventually produces systematic errors in the first guess in the data assimilation cycle. As the error is assumed to be partly dependent on the state of X, bias correction is expected to reduce the systematic error caused by the model imperfections.
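For concreteness, the following is a minimal Python sketch of the tendencies of the true two-scale system (Equations 1 and 2) and of the imperfect forecast model (Equation 3), together with a fourth-order Runge-Kutta step; the integration scheme and the array layout are our assumptions, as the text does not specify them.

```python
import numpy as np

K, J = 8, 32                      # K slow variables; J * K = 256 fast variables
F, h, b, c = 20.0, 1.0, 10.0, 4.0

def tendency_true(X, Y):
    """Two-scale Lorenz-96 tendencies (Equations 1 and 2). X: shape (K,);
    Y: shape (J * K,), flattened so that cyclic np.roll implements the
    boundary conditions stated above."""
    Ysum = Y.reshape(K, J).sum(axis=1)
    dX = (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))
          - X + F - (h * c / b) * Ysum)
    dY = (c * b * np.roll(Y, -1) * (np.roll(Y, 1) - np.roll(Y, -2))
          - c * Y + (h * c / b) * np.repeat(X, J))
    return dX, dY

def tendency_forecast(X):
    """Single-scale Lorenz-96 tendency of the imperfect model (Equation 3)."""
    return np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2)) - X + F

def rk4_step(X, Y, dt=0.001):
    """One Runge-Kutta step of the true system (the scheme is assumed)."""
    kx1, ky1 = tendency_true(X, Y)
    kx2, ky2 = tendency_true(X + 0.5 * dt * kx1, Y + 0.5 * dt * ky1)
    kx3, ky3 = tendency_true(X + 0.5 * dt * kx2, Y + 0.5 * dt * ky2)
    kx4, ky4 = tendency_true(X + dt * kx3, Y + dt * ky3)
    return (X + dt * (kx1 + 2 * kx2 + 2 * kx3 + kx4) / 6,
            Y + dt * (ky1 + 2 * ky2 + 2 * ky3 + ky4) / 6)
```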

Data Assimilation With LETKF
Previous studies on data-driven modeling often assumed that the true time series of the state variables is available and can be used as training data for the model (Pathak et al., 2018; Watson, 2019). We consider a more challenging situation in which the observation is imperfect. Data assimilation is used to incorporate such observations into the time series of the forecast model variables.
In this study, the local ensemble transform Kalman filter (LETKF) (Hunt et al., 2007) is used for data assimilation. Since the choice of data assimilation method is beyond the scope of this study, we choose a method that has been well studied with the Lorenz-96 system.
We use 10 ensemble members for the LETKF. This number is set to be larger than the number of grid points of the forecast model so that the effect of sampling error does not overwhelm the model bias signals that we focus on in this study. Gaussian-form spatial localization is applied with a length scale of one grid. The cutoff length is also set to one grid, which means that the analysis at each grid takes into account only the observations at the same and adjacent grids. This short localization scale was found to be suitable for reducing the analysis error.
For covariance inflation, we apply additive inflation. We have confirmed that additive inflation leads to significantly smaller analysis error than multiplicative inflation (not shown), as it can represent errors arising from the model bias. We first add random noise at each grid point of each ensemble member; the noise is uncorrelated between grid points and members and has a fixed standard deviation. We then apply re-centering so that the additive inflation changes only the ensemble perturbations while keeping the ensemble mean. The additive inflation factor, which is the standard deviation of the additive noise, is treated as a tuning parameter to be optimized.
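As an illustration, additive inflation with re-centering as described above can be written in a few lines of NumPy; the function name and array shapes are our own choices.

```python
import numpy as np

def additive_inflation(ensemble, sigma_add, rng):
    """Additive inflation with re-centering. ensemble: (n_members, n_grid).
    The noise is uncorrelated between grid points and members and has the
    fixed standard deviation sigma_add (the tuning parameter)."""
    inflated = ensemble + rng.normal(0.0, sigma_add, size=ensemble.shape)
    # Re-centering: keep the ensemble mean, change only the perturbations.
    return inflated + (ensemble.mean(axis=0) - inflated.mean(axis=0))
```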
The "nature run" time series is generated by integrating the "true" system equations (Equations 1 and 2). The time step of the integration is 0.001 for both variables. The integration is performed for 16,00,000 steps (1,600 MTUs), starting from a random initial state. After discarding the initial 1,00,000 steps (100 MTUs) as a spinup, the time series of X is stored every 0.05 MTUs. Therefore the nature run has 30,000 records. The synthetic observation is generated based on the nature run, adding random noise to mimic observation error. The observation time interval is set to 0.05 in MTUs. For the observation coverage, we consider two cases. First is the "full observation" case, in which all eight grids of X are observed each time. Second is the "partial observation" case, in which only six out of eight grids of X are observed. The position of grids to be observed is chosen randomly at each observation step. The observation error is assumed to be unbiased, temporally and spatially uncorrelated, and have a Gaussian distribution with a 0.1 standard deviation for both cases. This value is chosen to be small enough to be comparable to the background error caused by model bias, such that we can focus on the effectiveness of model bias correction. The relatively large number of selective observation which is 6 has been chosen for the same reason.
The data assimilation experiments are performed using the synthetic observation and ensemble of the state vector X. The initial values of ensemble X are chosen randomly from the time series obtained by integrating the Lorenz-96 model separately for each member, to ensure that the initial values are on the attractor of the "true" system. The data assimilation cycle is performed for the whole period of nature run time series, which corresponds to 30,000 steps. The result is evaluated as the average over the whole period after discarding the first 100 steps as a spinup.
Before introducing bias correction methods, we first evaluated the performance of data assimilation using the imperfect model without bias correction. First, a set of experiments was performed to optimize the additive inflation factor so as to minimize the analysis root-mean square error (RMSE). The optimal values were determined to be 0.8 and 0.5 for the full and partial observation cases, respectively. Such large values are necessary because of the model imperfections when bias correction is not applied: the additive inflation factor must be sufficiently large to compensate for the underestimation of model error caused by model bias. Figure 2 shows an example of the time series of the first element of the analysis ensemble mean X and the analysis increment, which is the difference between the analysis and first guess ensemble means. The analysis increment looks noisy and mostly random, reflecting observation error and the unknown instantaneous state of the variable Y, which is not represented in the model. Figure 3 shows the time series of all eight grid point values of X in the observation, the analysis ensemble mean, the analysis ensemble spread, and the deviation from the nature run. The analysis ensemble mean reproduces a temporally and spatially smooth time series, filling the unobserved grids with estimated values. The uncertainty of the analysis ensemble mean state can be inferred from the analysis ensemble spread, which tends to be significantly larger at grid points where observations are missing. The deviation of the analysis ensemble mean from the nature run is shown at the bottom. The analysis ensemble spread partly corresponds to the magnitude of the deviation from the nature run.

Model Bias Correction
In this study, we investigate the bias correction method in a simple form such that the correction is applied to the first guess before the data assimilation calculation. Figure 4 illustrates the processes in the data assimilation cycle using the LETKF with model bias correction. At each data assimilation cycle, the ensemble of first guesses x^f is obtained by integrating the model (Equation 3) over the time interval of data assimilation, denoted by the operator M. The first guess ensemble is passed to the LETKF after bias correction by a pre-defined function $\mathcal{B}$. The estimation of the bias correction function is performed in an "offline" manner: the function is determined statistically using the data set collected in a previous data assimilation experiment without bias correction. Although other implementations of bias correction in data assimilation are possible, including "online" bias correction, in which the bias correction function is updated simultaneously with the system state variables, we use this rather simple method to focus on the impact of using recurrent neural networks for bias correction. The bias correction function in this study is described in a general form as follows:

$$\tilde{\mathbf{x}}^f_t = \mathbf{x}^f_t + \mathcal{B}\left(\mathbf{x}^f_{t-T}, \mathbf{x}^f_{t-T+1}, \ldots, \mathbf{x}^f_t\right), \quad (4)$$

where $\mathbf{x}^f_t$ is the first guess of the state variables $(X_1, \ldots, X_K)$ at time $t$ and $\tilde{\mathbf{x}}^f_t$ is the corrected first guess.
$\mathcal{B}$ is an arbitrary nonlinear function. It can be a function of either the instantaneous value $\mathbf{x}^f_t$ alone or a series of recent values $\mathbf{x}^f_{t-T}, \mathbf{x}^f_{t-T+1}, \ldots, \mathbf{x}^f_t$. When the system is assumed to be a Markovian process, $\mathcal{B}$ is a function of the instantaneous state $\mathbf{x}^f_t$. However, when the model is biased and the observation is noisy and sparse, considering the history of $\mathbf{x}^f$ over a certain range of time may be beneficial for better bias estimation.
Spatial localization of $\mathcal{B}$ is also applied. The value of $\mathcal{B}$ at each grid point is assumed to depend only on the neighboring grid point values of X. In this study, we assume that $\mathcal{B}$ depends on the X values within two grids on either side, that is, five grid point values in total.
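As a structural sketch of one cycle in Figure 4 (not the authors' code), the step could look as follows; `integrate_model`, `bias_fn`, and `letkf_update` are hypothetical stand-ins for the model integration M, the trained correction function $\mathcal{B}$, and the LETKF analysis step.

```python
import numpy as np

def da_cycle_step(ensemble, obs_idx, y_obs, bias_fn, history):
    """One bias-corrected assimilation cycle (cf. Figure 4).
    ensemble: (n_members, K) analysis ensemble from the previous cycle;
    history: list of recent first guess ensemble means (for a recurrent B)."""
    # First guess: integrate each member over one assimilation interval (M).
    first_guess = np.array([integrate_model(m) for m in ensemble])  # hypothetical
    history.append(first_guess.mean(axis=0))
    # Bias correction B, here fed the recent history of first guess means;
    # the same correction is added to every member.
    corrected = first_guess + bias_fn(history[-5:])
    # LETKF analysis using the corrected first guess ensemble.
    return letkf_update(corrected, obs_idx, y_obs)                  # hypothetical
```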
Simple first-order linear regression and fourth-order polynomial regression are first applied as baseline references. Subsequently, neural networks with several different choices of predictors and architectures, including recurrent neural networks, are compared. The detailed neural network settings and training procedure are described in the following section.

Estimation of Bias Correction Function From the Data
Our goal is to estimate $\mathcal{B}$ such that it gives an appropriate value to correct the first guess calculated by an imperfect model in the data assimilation cycle. In other words, $\mathcal{B}$ is supposed to predict the systematic difference between the first guess and the hypothetical "optimal first guess," which is an unbiased estimation of the true state. When the true state is unknown and can only be estimated from the assimilation of partial and noisy observations, the resultant analysis state is the proper choice for the reference. Therefore, the difference between the analysis and the first guess, known as the analysis increment, is used as the instructor data to estimate $\mathcal{B}$. The same approach has been used in a previous study using neural networks (Bonavita & Laloyaux, 2020). This study further explores the advantages of giving $\mathcal{B}$ a more flexible functional dependency using recurrent neural networks. We use the time series of the analysis and first guess ensemble mean values obtained in the data assimilation cycle without bias correction, as described in the previous sections. The cycle is performed for a sufficiently long period of time to provide a large training data set.

Model Bias Correction Using Neural Networks
In the following, we compare the bias correction performance of various forms of $\mathcal{B}$. We employ five different types of neural networks and first-order and fourth-order linear regression methods. They differ in model complexity and input variables, but they are all trained using the same data set.

Network Types
The five types of neural networks used in our experiments are listed below. Their names correspond to the layer classes of Tensorflow 2.0, except "Dense_single," which uses the "Dense" layer with a different input time slice.
• Dense_single: A basic feed-forward neural network (multilayer perceptron) using input data at a single time step.
• Dense: A basic feed-forward neural network using a series of input data for the five most recent time steps.
• SimpleRNN: A simple recurrent neural network without gate functions, with fully connected recurrent layers in which the output is stored in memory cells and fed into the next input.
• GRU: A gated recurrent unit recurrent neural network.
• LSTM: A long short-term memory recurrent neural network.
The first two types, Dense and Dense_single, are standard neural networks, whereas the other three are recurrent neural networks with different structures. Dense and Dense_single differ in the choice of input variables. SimpleRNN is a simple neural network with a recurrent dependency, whereas GRU and LSTM are advanced recurrent neural networks with gate functions, which enable them to treat more complex temporal dependencies. LSTM also has internal memory cells to represent multiple time scales.
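To make the five types concrete, the following Keras sketch shows how such networks could be built in Tensorflow 2.0; the layer sizes are placeholders (the tuned sizes are given in Table 1), and the exact wiring is our assumption.

```python
from tensorflow.keras import layers, models

N_STEPS, N_GRID = 5, 5      # 5 recent time steps, 5 localized grid points

def build_network(kind, units=16):
    """Illustrative builders for the five network types."""
    if kind == "Dense_single":                    # a single time step input
        inp = layers.Input(shape=(N_GRID,))
        hid = layers.Dense(units, activation="relu")(inp)
    elif kind == "Dense":                         # 5 time steps, flattened
        inp = layers.Input(shape=(N_STEPS, N_GRID))
        hid = layers.Dense(units, activation="relu")(layers.Flatten()(inp))
    else:                                         # recurrent types
        rnn_cls = {"SimpleRNN": layers.SimpleRNN,
                   "GRU": layers.GRU,
                   "LSTM": layers.LSTM}[kind]
        inp = layers.Input(shape=(N_STEPS, N_GRID))
        hid = rnn_cls(units)(inp)                 # tanh recurrent activation
        hid = layers.Dense(4, activation="relu")(hid)
    out = layers.Dense(1)(hid)                    # bias correction at one grid
    return models.Model(inp, out)

model = build_network("LSTM")
```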

Input, Output Data, and Loss Function
The neural networks are designed to represent the function $\mathcal{B}$ in Equation 4. The output of $\mathcal{B}$ is a vector with eight elements for the X variables. However, as the model equations are homogeneous over all grid points, the bias correction function is also assumed to be homogeneous; that is, the functional form of the bias does not depend on the grid position. Therefore, the calculation of $\mathcal{B}$ is performed for each grid point separately using the same function. In addition, we apply localization of the spatial dependency in a similar manner to the LETKF: the value of $\mathcal{B}$ at each grid point is assumed to depend only on the surrounding five grids, including the grid at the same location. As multiple time steps are considered for the input, this localization is set broader than that of the LETKF.
Among the non-recurrent neural networks, the Dense type uses the five most recent time steps as input, whereas Dense_single uses only one. The recurrent neural networks also use the five most recent time steps in a similar manner, to account for short-period dependency in addition to the recurrent structure. As a result, the input and output data sizes are 5 × 5 and 1, respectively, for all types except Dense_single, which considers the input variables of a single time step without a recurrent structure and therefore has five input variables, whereas the other network types have 25 in total.
The input and output data are both generated from the result of the data assimilation cycle without bias correction described in Section 2, using the analysis ensemble mean and the first guess ensemble mean. From the total of 30,000 analysis time steps with a 0.05 MTU interval, the first 100 time steps are considered a spinup period and removed. The data at each grid point of the output, together with the corresponding grids of the input, are treated as independent samples and concatenated. After that, the data are normalized and separated into batches of a prescribed size for training and validation. The forecast ensemble mean as a first guess is used for the input data, whereas the difference between the analysis ensemble mean and the forecast ensemble mean is used for the output data. The batch size is 8,192 for both training and validation. Eight batches are used for training and one for validation.
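A minimal sketch of this data construction (before normalization and batching); the function name and array layout are our own.

```python
import numpy as np

def make_training_pairs(x_fg, x_an, n_steps=5, half_width=2):
    """Build input/output pairs from the first guess (x_fg) and analysis
    (x_an) ensemble mean time series, each of shape (n_times, K).
    Inputs : the n_steps most recent first guess values at 5 neighboring grids.
    Outputs: the analysis increment at the central grid."""
    n_times, K = x_fg.shape
    offsets = np.arange(-half_width, half_width + 1)
    inputs, outputs = [], []
    for t in range(n_steps - 1, n_times):
        for k in range(K):
            local = (k + offsets) % K                  # cyclic coordinates
            inputs.append(x_fg[t - n_steps + 1:t + 1, local])
            outputs.append(x_an[t, k] - x_fg[t, k])    # analysis increment
    return np.asarray(inputs), np.asarray(outputs)     # (N, 5, 5) and (N,)
```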
The loss function is defined as the mean-squared error between the instructor data and the output of the neural networks. Additionally, to incorporate the information on input data accuracy, we applied an accuracy weight to the loss function in a similar manner to Brajard et al. (2020): each training sample was weighted by the inverse of the corresponding analysis ensemble spread. This treatment is important, particularly when the analysis ensemble spread varies significantly between directly observed and unobserved grids.
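In Keras, this weighting can be expressed with per-sample weights, for example as follows; `spread` denotes the analysis ensemble spread matched to each training sample, and the variable names are illustrative (the learning rate is the fixed value given in the next subsection).

```python
import tensorflow as tf

# Samples with a large analysis spread (e.g., at unobserved grids) are
# considered less reliable and therefore contribute less to the loss.
sample_weight = 1.0 / spread          # spread: shape (N,), from the LETKF

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.032),
              loss="mse")
model.fit(train_in, train_out,
          sample_weight=sample_weight,
          batch_size=8192, epochs=200,
          validation_data=(val_in, val_out))
```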

Network Parameters
The parameters of the neural network structures and training are set as follows. Most parameters are common among all network types. The activation function is a rectified linear unit for the forward connections in all layers and a hyperbolic tangent (tanh) for the recurrent connections. No regularization is used. The learning rate is fixed to 0.032. The Adam optimizer is used with the default parameters of Tensorflow 2.0. The networks were trained for 200 epochs in all experiments, which is sufficiently long for the loss functions to converge. The number of layers and the number of nodes per layer are treated as tunable parameters and determined in an objective way. The best configuration that minimizes the loss function is searched for separately for each type of neural network. The parameter space to explore has a graph structure in which the number of parameters to tune varies: the number of nodes in each layer is decided after the number of layers has been fixed. To tackle this problem effectively, the sequential model-based optimization (SMBO) algorithm is applied. The SMBO uses a Bayesian approach, executing each trial with a new parameter set in a sequential manner. The evaluation of the objective function, which is the validation loss in this study, is approximated by a surrogate model when selecting the parameters for the next trial. The tree-based Parzen estimator (Bergstra et al., 2011) was used as the surrogate model. The optimization was performed with Optuna (Akiba et al., 2019), which implements the SMBO and pruner algorithms to speed up the search. This study applied the percentile pruner, which prunes a trial if the tracked intermediate validation loss is in the specified bottom percentile among all previously completed trials.
In addition, in comparing different types of neural networks, the constraint on the hyperparameters is set to keep the total number of tunable parameters within the prescribed range, as it controls the expressive ability of the neural networks. We set the range to be 1,000 to 1,100, which is sufficiently small to avoid overfitting.
The search space for the parameters is set as follows:
• Number of recurrent/dense layers: $l_r \in [1, 3]$
• Number of units in each recurrent/dense layer: $n^{(j)} \in [4, 32]$ for j = 1, 2, 3

For recurrent neural networks, a dense layer with four units and an output layer follow the one to three recurrent layers of adjustable size. For non-recurrent neural networks, an output layer follows the one to three dense layers of adjustable size. Only the combinations of hyperparameters that satisfy the constraint on the total number of parameters are used as candidates.
To find the optimal parameter values that minimize the validation loss, 30 trials are conducted for each type of neural network and for each of the full observation and partial observation cases. The optimal hyperparameter values for each type of neural network are shown in Table 1. The results with the optimal parameters are shown in the following.
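In outline, the search described above can be written with Optuna as follows; `build_network_from` and `train_one_epoch` are hypothetical stand-ins, and the pruner percentile is illustrative.

```python
import optuna

def objective(trial):
    n_layers = trial.suggest_int("n_layers", 1, 3)
    units = [trial.suggest_int(f"units_{j}", 4, 32) for j in range(n_layers)]
    model = build_network_from(units)        # hypothetical model builder
    if not 1000 <= model.count_params() <= 1100:
        raise optuna.TrialPruned()           # enforce the size constraint
    for epoch in range(200):
        val_loss = train_one_epoch(model)    # hypothetical training step
        trial.report(val_loss, epoch)        # tracked by the pruner
        if trial.should_prune():
            raise optuna.TrialPruned()
    return val_loss

study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(),          # tree-based Parzen estimator
    pruner=optuna.pruners.PercentilePruner(25.0))  # percentile pruner
study.optimize(objective, n_trials=30)
```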

Linear Regression
For comparison, we calculate the coefficients of linear regression for $\mathcal{B}$ using the same data. The configuration of input and output variables is the same as that of the Dense_single type neural network: we use the instantaneous first guess values for the input, with the same five grid point localization. No regularization is used. Two regressions of different complexity, one with first-order and the other with fourth-order basis functions, are performed. Figure 5 shows the comparison of losses calculated, after training, using an independent test data set of the same 30,000 steps in the full observation case. The test loss is defined as the unweighted root-mean square difference between the predicted bias correction value and the analysis increment. For comparison, the same loss values are calculated for the first-order and fourth-order linear regressions on the same data. The ratio of the loss with each bias correction method to that without correction is shown. For each type, the loss values are calculated for 10 small batches of test data to obtain means and standard deviations, which are shown as black bars. The losses of the neural networks are smaller than those of the linear regressions. Among the neural networks, the three types of recurrent neural networks all provide better results than the others, while the differences among these three types are comparable to their standard deviations and insignificant. Figure 6 shows similar results for the partial observation case, in which only six out of eight grid points were observed at every step. The advantage of using recurrent neural networks over the other types is again evident.

Figure 5. Comparison of the average root-mean-square difference between the corrected forecast and analysis (the test loss in neural network training) for different bias correction methods (None: no correction; Linear: first-order linear regression; Linear4th: fourth-order linear regression; the others are neural networks with the names introduced in this section). Red markers indicate the ratio of the loss value against None. Thin black bars indicate, for each neural network method, the standard deviation calculated from 10 batches of test data.
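For reference, the regression baselines described at the beginning of this subsection can be fitted by least squares as sketched below; the text does not state whether cross terms between grid points are included, so this sketch uses element-wise powers only.

```python
import numpy as np

def fit_polynomial_baseline(x_in, y_out, degree=4):
    """Fit the bias correction with polynomial basis functions of the five
    local instantaneous first guess values; degree=1 reproduces first-order
    linear regression. x_in: (N, 5); y_out: (N,)."""
    A = np.column_stack([np.ones(len(x_in))] +
                        [x_in ** d for d in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(A, y_out, rcond=None)
    return coef            # prediction: features(x) @ coef
```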

Training Results
Bias correction can reduce only the systematic part of the model error, that is, the part determined by the state of the model variables. This explains the large fraction of the original loss that remains (Figures 5 and 6). Figure 7a shows a scatterplot of the forecast value against the difference between the forecast and analysis values in the full observation case. The blue and red dots indicate the results before and after bias correction using LSTM, respectively. Bias correction tends to reduce the large forecast-analysis differences that occur where the forecast value at the same grid point is large. The remaining variance in the difference between the corrected forecast and the analysis, shown by the red dots, consists of the stochastic component and the partially corrected systematic component of the model error. Figure 7b shows a similar plot but for the difference between the forecast and the true value. The overall characteristics are similar to Figure 7a, indicating that the analysis increment used in training sufficiently represents the true bias, which is the difference between the first guess ensemble mean and the truth.

Data Assimilation and Forecast Experiments With Model Bias Correction
We incorporated the bias correction functions represented by each type of neural network and by linear regression into the data assimilation cycle and examined their effects. The parameters of the neural networks are the optimal parameters found in Section 3. Bias correction was applied before each data assimilation step to modify the first guess, as shown in Figure 4.
With each of the bias correction methods, data assimilation cycles were performed for 3,000 steps with the additive inflation factor ranging from 0.05 to 1.00 to determine the optimal value. The optimal inflation factor was expected to be smaller when bias correction is introduced (Li et al., 2009). Figures 8 and 9 show the analysis RMSE, defined as the root mean square of the difference between the analysis ensemble mean and the true state in the nature run, for various additive inflation factors. Each curve corresponds to one bias correction method. In both Figures 8 and 9, bias correction reduces not only the RMSE but also the value of the inflation factor that minimizes the RMSE. The methods with recurrent neural networks (SimpleRNN, GRU, and LSTM) show smaller RMSE than the others.
Since the analysis accuracy varies with time, its improvement by bias correction also varies with time. To examine the robustness of the improvement, the ratio of the analysis RMSE with each bias correction method to that without bias correction is calculated as a function of time, and its statistical characteristics are examined. Figure 10 shows the average value and the 10th and 90th percentiles of the relative analysis error for each bias correction method in the full observation and partial observation cases. The reduction of the analysis RMSE is observed for most of the analysis time series in the case of the feed-forward and recurrent neural networks.

Figure 10. The mean value (red crosses) and the 10th and 90th percentiles (vertical lines) of the ratio of the analysis root-mean square error against that of the data assimilation without bias correction. Black and gray lines indicate the percentile ranges for the full observation and partial observation cases, respectively.
Using the analysis time series with the optimal inflation factor obtained above, extended forecast experiments were performed to evaluate the model improvement by bias correction. The forecast model applies the same type of bias correction used to generate the analysis. For each type of bias correction, 100 snapshots were taken every 12.5 MTUs from the analysis time series and used to initialize independent forecasts of 4 MTUs, which is sufficiently long for errors to grow. The RMSEs of the forecasts against the truth are averaged over the 100 forecasts to obtain the mean RMSE as a function of forecast time. Figures 11 and 12 show the mean forecast RMSEs for the full and partial observation cases, respectively. Besides the difference in the initial RMSE, which results from the different bias correction methods in data assimilation, the error growth rate also differs among the curves. Bias correction with recurrent neural networks provided the smallest RMSEs in both cases. The time required for the RMSE to exceed 1.0 in the forecast with the LSTM bias correction is approximately 30% and 20% longer than in the case without bias correction, in the full observation and partial observation cases, respectively. Figure 13 shows the improvement in the lead time for the forecast RMSE to grow to 1.0. The lead times are first calculated for each of the 100 forecasts and each bias correction method, and the ratio against the forecast without bias correction is calculated. The average value and the 10th and 90th percentiles over the 100 samples are shown for each bias correction method in the full observation and partial observation cases. Although the improvement varies significantly depending on the initial time of the forecast, the impact of introducing bias correction is mostly positive for all methods, and recurrent neural networks provide the largest average improvement.

Figure 11. Average forecast root-mean square errors as a function of forecast time in the full observation experiments with different bias correction methods, denoted in the same manner as in Figure 6. Each forecast is initialized with the analysis ensemble mean state obtained using the same bias correction method.
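The evaluation described above (mean RMSE curves and lead times to a threshold) amounts to a computation along the following lines; the array shapes and function names are our own.

```python
import numpy as np

def mean_forecast_rmse(forecasts, truth):
    """Mean RMSE as a function of lead time over forecast cases.
    forecasts, truth: shape (n_cases, n_lead, K)."""
    return np.sqrt(((forecasts - truth) ** 2).mean(axis=2)).mean(axis=0)

def lead_time_to_threshold(rmse, dt=0.05, threshold=1.0):
    """First lead time (in MTUs) at which the RMSE exceeds the threshold."""
    above = np.nonzero(rmse > threshold)[0]
    return above[0] * dt if above.size else np.nan
```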
To emphasize the difference in error growth rate, an additional set of forecast experiments was performed with a different initialization procedure. Instead of using the analysis ensemble mean, the initial conditions were created from the true time series of the nature run by adding temporally and spatially uncorrelated random noise from a Gaussian distribution with a 0.1 standard deviation. This common initialization method allows us to focus on the differences in the error growth slopes, which directly indicate the improvement of the forecast model by bias correction. Figures 14 and 15 show the average forecast RMSE as a function of forecast time, as in Figures 11 and 12, but with common initial states. The "truth" curve represents the average RMSE of forecasts using the true coupled Lorenz-96 equations; it shows the error caused by the initial random noise and provides the lower limit of the possible error growth. The advantage of recurrent neural networks over the other bias correction methods is evident. However, even with bias correction using recurrent neural networks, the error growth rates in both Figures 14 and 15 are significantly larger than the lower limits. While a large part of the systematic model error is effectively reduced, the remaining part includes the error from the uncertainty in the instantaneous states of the unresolved fast variables, causing error growth that is significantly faster than in the perfect model scenario.

Summary and Discussion
In this study, the application of recurrent neural networks to model bias correction in data assimilation was studied using a two-scale Lorenz-96 system. We focused on the advantage of using recurrent neural networks in correcting systematic model biases through data assimilation when the observation is imperfect. We expected that the use of the recent history of the state variables can improve model bias correction by compensating for imperfect observation information. Idealized experiments using the Lorenz-96 system were performed for both full and partial observation cases. First, data assimilation cycles were performed to generate the time series of the analysis and first guess using the imperfect forecast model without bias correction. Large inflation factors were required to stabilize the data assimilation and suppress the analysis error. The analysis and first guess time series were used to train the neural networks to approximate bias correction functions. The training was performed with various neural network architectures for comparison. In training the neural networks, the inverse of the analysis ensemble spread was used to weight the training data. The performance of the bias correction methods was evaluated in terms of analysis and extended forecast RMSEs.
In both the full and partial observation cases, the three recurrent neural networks performed similarly and yielded the smallest analysis and forecast RMSEs among the methods compared. In addition, the optimal inflation factor was significantly smaller than with linear regression or without bias correction.
In our experiments, only part of the model error was corrected by the bias correction function (Figures 5 and 6). The remaining errors in the first guess include the random component caused by the missing information on the unresolved variables. To tackle this issue, some previous studies have proposed stochastic parameterization methods (Arnold et al., 2013; Gagne et al., 2020).
Our approach using data assimilation is applicable to bias correction based on imperfect observation, considering that observations are noisy and limited in time, space, and variables. The effectiveness of bias correction depends significantly on the observation accuracy and coverage. In our experiments, the standard deviation of the observation error used to generate the synthetic observations was set to a small value of 0.1. If the observations were noisier or covered fewer grid points, the systematic bias signals would be more obscured by random errors in the analysis and more difficult to capture. The feasibility of bias correction methods based on machine learning in real-world applications, such as in weather and climate models with realistic observation coverage, should be investigated in future work.