Comparing an Ensemble Kalman Filter to a 4 DVAR Data Assimilation System in Chaotic Dynamics

ABSTRACT: In this paper, the Ensemble Kalman Filter is compared with a 4DVAR Data Assimilation System in chaotic dynamics. The Lorenz model is chosen for its simplicity in structure and the dynamic similarities with primitive equations models, such as modern numerical weather forecasting. It was examined if the Ensemble Kalman Filter and 4DVAR are effective to track the Control for 10, 20 and 40% of error at the Initial Conditions. With 10% of noise, the trajectories of both are almost perfect. With 20% of noise, the differences between the simulated trajectories and the observations as well as “true trajectories” are rather small for the Ensemble Kalman Filter but almost perfect for 4DVAR. However, the differences are increasingly significant at the later part of the integration period for the Ensemble Kalman Filter, due the chaotic behavior system. However, for the case with 40% error at the Initial Condition, neither the Ensemble Kalman Filter or 4DVAR could track the Control with only 3 observations ingested. To evaluate a more realistic assimilation application, it was created an experiment in which the Ensemble Kalman Filter ingested single observation at the 180th time step in the X, Y, and Z Lorenz variables and only in the X variable. The results show a perfect fit of 4DVAR and the Control during a complete integrations period, but the Ensemble Kalman Filter has a disagreement after the 80th time step. On the other hand, it was shown a considerable disagreement between the Ensemble Kalman Filter trajectories and the Control as well as a total fail of 4DVAR. Better results were obtained for the case in which observation covers all the components of the model vector.


INTRODUCTION
Numerical models are an important tool to weather forecasting, probably the most of all, being largely applied to specific issues, such as aeronautic weather forecasting.The development to improve these models is an ongoing process, and Data Assimilation has been a significant research line to improve weather forecasting model and its applications, such as wind profile predictive models, applied by the Brazilian Air Force at the Centro de Lançamento de Alcântara (CLA), Maranhão State.
Data Assimilation is a procedure to get the Initial Condition as accurately as possible, through the statistical combination of collected observations and a background field, usually a short-range forecast; such a best estimate is called "analysis" in Meteorology.The Data Assimilation community has extensively tested 2 approaches, one based on Kalman filtering and another, on variational calculation.
Several operational meteorological centers, such as the European Centre for Medium-Range Weather Forecasts (ECMWF), used to have Three-Dimensional Data Assimilations (3DVAR) to build their analysis (Parrish and Derber 1992;Courtier et al. 1998).3DVAR is a kind of accurate statistical interpolation, being computationally feasible.However, it does not consider the "errors of the day".It means that the background error covariance is static.To overcome this failure, 4DVAR (Courtier and Talagrand 1998;Rabier and Courtier 1992;Courtier et al. 1994;Rabier et al. 2000) and Kalman filters (Miller et al. 1994) have been subject of many researches.
Harter FP, Corrêa CS Th e 4DVAR is model-dependent and computationally much more expensive than 3DVAR, but more accurate.In turn, only reduced rank Kalman Filters (KF) can be applied to operational numerical weather forecasting owing the computational cost.KF are especially interesting due the capacity to estimate the error covariance of the analysis.
The large dimension of the models can prohibit the implementation of KF in operational numerical weather prediction, but its implementation with simplifi cations -such as Ensemble Kalman Filter (EnKF; Evensen 1994;Burgers et al. 1998;Whitaker and Hamill 2002;Tippet et al. 2003), Ensemble Transform Kalman Filter (ETKF; Bishop et al. 2001), Ensemble Adjustment Kalman Filter (EAKF; Anderson 2001) and Local Ensemble Kalman Filter (LEKF; Ott et al. 2002Ott et al. , 2004) -is possible due to the decrease in the computational cost.
The purpose of this research was to compare an EnKF to a 4DVAR Data Assimilation system through the Lorenz equations and the sensitivity of the model to Initial Condition (IC) for non-linear and chaotic regimes.The main objective was to compare an implementation of the EnKF to a 4DVAR method with different noise levels at the IC and explore the assimilation systems over-determined by lack of observation.
Th e organization of the paper is as follows: the "Methodology" section describes a brief theoretical formulation of Lorenz equations and presents a basic introduction to EnKF and 4DVAR; numerical experiments are shown in the "Results" section and fi nal comments are found in the "Final Comments" section.

METHODOLOGY LORENZ EQUATIONS
Lorenz was looking for the periodic solutions of the Saltzman's model (1962), considering a spectral Fourier decomposition and only low-order terms.Then, he obtained the following system of non-linear coupled ordinary differential equations: (diameter of the Rayleigh-Bénard cell) and time, respectively; σ ≡ κ -1 v is the Prandtl number, being v the kinematic viscosity; b ≡ 4(1 + a 2 ) -1 is the relation between the height and the width of the rectangle (orbit travelled; Saltzman 1962); the parameter r = R/R c ∞ ∆T is the Rayleigh number, being T the temperature and R c the critical Rayleigh number.

ENSEMBLE KALMAN FILTER
Kalman Filter is the best linear unbiased estimator for a linear model under Gaussian assumption for the measurements and model random errors.The Kalman Filter method for non-linear models is called Extended Kalman Filter (EKF) and is given by the following definition, considering forecast and analysis procedures: where τ ≡ π 2 H -2 (1 + a 2 ) κ t is the non-dimensional time, being H, a, κ and t layer height, thermal conductivity, wave number where F n is our mathematical model; μ n is the stochastic forcing (modeling noise error); subscript n denotes discrete time-step; superscript f represents the forecasting value.
is modeled by the matrix H, and v n is the noise associated to the Y observation.The typical gaussianity, 0-mean and ortogonality hypotheses for the noises are adopted.Th e state vector is defi ned as where w a n+1 is the analysis value, K n is the Kalman gain, computed from the minimization of the estimation error variance J n+1 (Lorenz 1963), being )},being E{.} the expected value; Q is the covariance of μ n and R n is the covariance of v n .The assimilation is done through the sampling: r n+1 ≡ y n+1 -y f n+1 = y n+1 -H n w f n+1 ; P a and P f are forecast and analysis covariance error,respectively.
Comparing an Ensemble Kalman Filter to a 4DVAR Data Assimilation System in Chaotic Dynamic s According to Kalnay (2004), "Even if a system starts with a poor initial guess of the state of the atmosphere, the EKF may go through an initial transient period of a week or so, after which it should provide the best unbiased estimate of the state of the atmosphere and its error covariance".However, according to Miller et al. (1994), if the system is very unstable and the observation is not frequent enough, it is possible for the linearization to became inaccurate, and the EKF may drift away from the true solution.
The updating of Eq. 5 provides the "errors of the day", but its computational cost makes this updating impossible in practice to carry out.Therefore, this equation was replaced by simplifying assumptions, such as ensemble mean.In this study, it was proposed an EnKF which consists of replacing the forecast error covariance (Eq.5) by: where α is a parameter to be chosen to achieve convergence through iterations; ∇J is the gradient of the cost function with respect to initial state w 0 f, k .If the iteration Eq. 11 converges, w 0 f, k will approximate the desired initial state w 0 f, ∞ that satisfies J = min (J).However, for operational primitive equation models, where the number of degrees of freedom is of the order of 10 7 , this approach has some limitations.One of the limitations is overcoming by the adjoint model integrated backwards in time, which achieves an exact cost function gradient.This methodology, called 4DVAR, is summarized in the next section.

TANGENT LINEAR OPERATOR AND TANGENT LINEAR MODEL
Considering the model state w 0 f and its small perturbation w 0 f, tl (where tl means tangent linear), the change in the cost function J (w 0 f ), caused by the small perturbation, is denoted by J tl (w 0 f ), where: where the ensemble has K data assimilation cycles; k is the iteration.

FOUR-DIMENSIONAL VARIATIONAL DATA ASSIMILATION
In order to formulate the 4DVAR, first of all, a cost function J should be introduced to measure the misfit between the model and observations: In this approach, the model is used as a strong constraint (Sasaki 1970); it means that the model is considered perfect in the cost function formulation.
To minimize the difference between the model state and the observation, it could be used an iterative process where the values of the initial ones and model parameters are adjusted in the opposite direction of the gradient of the cost function.In operational weather forecasting, efficient minimizations methods are necessary, such as quasi-Newton and conjugate-gradient.In this research, it was used a simple steepest descent approach with α, due the lower dimension of the Lorenz equations, given by: In the limit of || w 0 f, tl || → 0, Eq. 12 becomes Using the definition of J(w 0 f ) in Eq. 10, Eq. 12 can be written as Combining Eqs. 13 and 14, one has on the basic state w k f and time step k.
Equation 16 is the tangent linear equation of the forward model (Eq.4).Through this equation, iteratively, the desired relation between w k f,tl and w 0 f,tl is obtained by: • Integration of adjoint model backward in time (Eq.21).During this integration, the observation is ingested whenever it exists.By this step, the cost function is minimized (Eq.11).

•
A new initial state is provided by Eq. 11.
• This new initial state is used to start the next k cycle.

NUMERICAL EXPERIMENTS
In the results explored forward from here, the Lorenz equations are solved by finite differences with a nondimensional time increment of 0.01 for 200 time steps integration length.According to Lorenz (1963), at a given σ = 10 and b = 8/3, the corresponding Rayleigh number is 24.74, which means that r larger than 24.74 will make a chaotic system.In this paper, only the variable X is depicted, just because Y and Z take to the same conclusions.
In this study, the sensitivity of the model (Classic Experiment) to the IC is evaluated.In Fig. 1a, it is depicted the resulting model state trajectories for σ = 10, b = 8/3 and r = 10, assuming the IC to be X 0 = 1.00,Y 0 = 3.00, Z 0 = 5.00 (Case 1) and assuming IC to be X 0 = 1.10,Y 0 = 3.30, Z 0 = 5.50 (Case 2).It means that Case 1 diff ers from Case 2 with an off set of 10% of noise for all model variables.In Fig. 1b, this experiment is directed to the chaotic regime (r = 32), Case 3 and Case 4.
This simple experiment shows that, for non-linear regime (r = 10), the noise at the IC is not determinant for the success of the prediction.By other side, for chaotic regime (r = 32), comparing the trajectories of Case 1 and Case 2, it is clearly seen that the bifurcation in the model solution where: Substituting Eq. 17 in Eq. 15, it is obtained where The transpose of the tangent linear operator is known as the adjoint operator.It is important to note that the order of the time index of the adjoint operator is reversed, compared to tangent linear operator.By definition, the adjoint model is: Thus, 4DVAR can be summarized by the procedures described below: • Integration of forecast model (Eqs. 1 -3) forward in time and storage of the trajectories.occurs throughout the integration period.This experiment illustrates the sensitivity of the chaotic system to IC.

Experiment 1
In this experiment (Fig. 2), EnKF and 4DVAR assimilate the same evolution trajectories as the Classical Experiment, but the initial guess is rather poor: 10% (X = 1.10,Y = 3.30, Z = 5.50) -Fig.2a; 20% (X 0 = 1.20,Y 0 = 3.60, Z 0 = 6.00) -Fig.2b; and 40% (X 0 = 1.40,Y 0 = 4.20, Z 0 = 7.00) -Fig.2c.It can be observed from the plots in Fig. 2 that, with 10% of noise, the trajectories of both EnKF and 4DVAR are almost perfect; with 20% of noise, the differences between the simulated trajectories and the observations as well as "true trajectories" are rather small for EnKF but almost perfect for 4DVAR.However, the differences are increasingly significant at the later part of the integration period for EnKF, due the chaotic behavior of the system.However, for the case with 40% error at the IC, neither EnKF or 4DVAR could track the Control with only 3 observations ingested.In fact, according to the literature, given a 40% error at the IC, the chaotic nature of the model solution demands a relatively large number of observations in the EnKF assimilation and large data assimilation window in 4DVAR.

Experiment 2
Numerical Weather Forecasting is of the order of 16 6 -10 7 degrees of freedom, whereas the total number of conventional observations of the variables used in the models is of the order of 10 4 (Kalnay 2004).Thus, the amount of observed variables applied hitherto is somewhat overestimated.By this reason, Experiment 2 represents a more realist assimilation application, where the EnKF ingests a single observation at the 180 th time step in the X, Y, and Z Lorenz variables   (Fig. 3a) and only in the X variable (Fig. 3b).In Fig. 3a, the results show a perfect fit of 4DVAR and the Control during a complete integrations period, but EnKF has a disagreement after the 80 th time step.On the other hand, Fig. 3b shows a considerable disagreement between the EnKF trajectories and the Control and a total fail of 4DVAR.It is clear that better results are obtained for the case in which observation covers all the components of the model vector.

FINAL COMMENTS
In this study, an EnKF and 4DVAR were available in a Data Assimilation context in chaotic regime.Both methodologies were examined and are effective to track the Control for 10, 20 and 40% of error at the IC.Considering 10% of noise at the IC, the results show a perfect fitting between assimilation curves and the Control.These results are still quite good for 20% of noise, but there is a disagreement between the "truth" and the estimation, especially at the end of integration period for EnKF, due the chaotic nature of the system.Considering 40% of noise at the IC, both EnKF and 4DVAR fail.
Regarding an experiment in which it is explored the assimilation over-determined by lack of observation, results show that EnKF needs more frequent observations, and the 4DVAR demands for observation in more than 1 variable of the Lorenz system.
Despite the limitations, EnKF and 4DVAR are state-ofthe-art techniques in Data Assimilation implemented in numerical weather forecasting.
Comparing an Ensemble Kalman Filter to a 4DVAR Data Assimilation System in Chaotic Dynamics 4DVAR × EnKF -Single assimilation in X, Y and Z

Figure 2 .
Figure 2. Evolution of the EnKF and 4DVAR for poor Initial Conditions.

Figure 3 .
Figure 3. EnKF and 4DVAR ingest a single observation at the 180 th time step.