Development of NAVDAS-AR: non-linear formulation and outer loop tests

NAVDAS-AR is the 4DVAR extension of the United States Navy's operational 3DVAR data assimilation system, NAVDAS (Naval Research Laboratory Atmospheric Variational Data Assimilation System), where AR stands for accelerated representer. Like NAVDAS, NAVDAS-AR is cast in observation space, in contrast to the European Centre for Medium-Range Weather Forecasts 4DVAR system, which is an analysis space algorithm. In this paper we show how an inner loop linear solution can be augmented by an outer loop iteration procedure that introduces non-linear effects through first-order Taylor series expansions; the non-linear problem is then solved iteratively as a sequence of linear problems. We show results of the outer loop strategy, in which the first loop, the 'linear' solution, is modified in the second outer loop both by a Taylor series term in the calculation of the observation innovations and by a form of the background model linearized about an analysis 'trajectory' from the solution of the first loop. We also show how the outer loop strategy allows better assimilation of observations of surface wind speed and precipitable water than the operational NAVDAS.


Introduction
Atmospheric variational data assimilation is the object of considerable research and development effort in both the operational numerical weather prediction (NWP) community and meteorological research organizations around the world. In variational assimilation, generalized cost functions allow optimal combination of observations, prior model forecast information, and the errors associated with each. From these cost functions Euler-Lagrange equations can be derived and solved if appropriate simplifications and linearizations are introduced. A major advantage of variational assimilation is that generalized cost functions can contain so-called forward models which allow conversion of non-conventional observations (e.g. satellite radiances) into forecast model state variables such as temperature and specific humidity. In fact, much of the current research effort in variational assimilation is focused on these forward models.
Variational data assimilation is normally formulated as a three-dimensional problem (3DVAR) or a four-dimensional problem (4DVAR). In 3DVAR, the cost function has no explicit representation of the time dimension, so differences in observation time among the observations must be treated in an ad hoc way independent of the cost function. In 4DVAR, the time dimension is explicitly contained in the cost function through a linearized form of an NWP forecast model and its adjoint, each of which must be integrated repeatedly as part of the solution process. This gives a dynamically consistent time evolution of analysis increments, but at a significantly greater computational cost than a 3DVAR scheme, which has no model integration requirements. The extra cost can be as much as a factor of 10, so only recently have computer resources become adequate to make 4DVAR operationally feasible at major NWP centers around the world.
*Corresponding author. e-mail: xu@nrlmry.navy.mil
Besides the separation of variational data assimilation into 3DVAR and 4DVAR categories, the formulation can be done in (a) information or analysis space form or (b) error or observation space form (see Daley and Barker, 2001a for discussion and details). Examples of analysis space variants of 3DVAR have been developed and run operationally in the United States at the National Centers for Environmental Prediction (NCEP; Derber et al., 1996) and the European Centre for Medium-Range Weather Forecasts (ECMWF; Courtier et al., 1998), among others. An observation space 3DVAR scheme, the Physical Space Assimilation System (PSAS) has been developed at the Data Assimilation Office (DAO) of NASA/Goddard (see Cohn et al., 1998), and another observation space 3DVAR system, the Naval Research Laboratory (NRL) Atmospheric Variational Data Assimilation System (NAVDAS; see Daley and Barker, 2001a,b), has been developed by the Marine Meteorology Division of the NRL in Monterey, California. It is run operationally by the United States Navy for their global NWP requirements. NAVDAS is the predecessor to the four-dimensional observation space system to be described in this paper.
An analysis space 4DVAR atmospheric global data assimilation system has been developed jointly by ECMWF and Meteo France as an evolution of the ECMWF 3DVAR system. It is described by Courtier et al. (1994) and has been run operationally at these institutions since the late 1990s. It uses a perfect model assumption that allows a strong constraint variational formulation. A similar global 4DVAR system recently became operational at the United Kingdom (UK) Met Office. In the United States, an observation space 4DVAR system, NAVDAS-AR, where AR stands for accelerated representer, is under development at NRL. Courtier (1997) has shown that when the perfect model assumption is made, the observation space 4DVAR algorithm is the dual of the analysis space 4DVAR algorithm, and from a linear algebra perspective, the two methods are equivalent.
The direct representer algorithm was introduced by Bennett and McIntosh (1982) and Bennett and Thorburn (1992). They applied it to oceanographic data assimilation problems that minimize a four-dimensional weak constraint variational cost function measuring the fit of an analysis to observations, an a priori background forecast, and time and space boundary conditions. In their application the minimization time interval is quite long (e.g. weeks or months), and each observation requires an adjoint model run and a tangent-linear model (TLM) run over this interval. The first meteorological application of the direct representer method was by Xu and Daley (2000), who applied it to short time intervals (e.g. 6 h), as is normal practice in NWP data assimilation. The cycling representer algorithm performs a series of these short interval minimizations as the system steps forward in time. Like the oceanographic application, each observation requires an adjoint and TLM integration. An important benefit of the direct representer algorithm is that it predicts the evolution of background error covariances. Unfortunately, however, the computational cost of essentially two model integrations for each observation makes it impractical for anything except 'toy' problems; it has no operational potential for the foreseeable future.
In light of the impracticality of the direct representer algorithm, Xu and Daley (2002) developed an accelerated form of the cycling representer algorithm. In the accelerated representer algorithm, the error covariances at the beginning of each cycle are specified, as they are in other 3DVAR and 4DVAR algorithms. The cost function is then minimized by solving the Euler-Lagrange equations as a large descent problem, using methods such as conjugate gradient. Each descent iteration requires an adjoint and TLM integration over the update cycle period, but a converged solution usually takes fewer than 100 iterations. This is in contrast to the O(10^6) integrations that the direct representer algorithm would take for typical observation volumes encountered in operational global NWP. In fact, there is little difference between the cost function minimization methodologies in analysis space 4DVAR and the accelerated representer algorithm; both solve with some kind of descent algorithm. The only difference is the control variable upon which the minimization is based: in the strong constraint analysis space 4DVAR system the control variable normally has the size of the initial-condition grid-point or spectral field of an NWP forecast model, while in the observation space accelerated representer it is a variable the size of the observation vector. Courtier (1997) points out that if the perfect model assumption in analysis space 4DVAR is relaxed and a weak constraint cost function is defined, the control variable is defined over the much larger analysis space of the entire time period of the minimization. On the other hand, an imperfect model assumption has no impact on the size of the accelerated representer control variable, because in an observation space algorithm it is always the size of the observation vector. This has significant implications for future applications of the two forms when a model error term is included in the cost function (i.e. an imperfect model). We discuss these implications in our concluding remarks.
Daley (2000, 2002) introduced the representer algorithm concept and its application to meteorological data assimilation. These two papers describe the first steps toward an operational four-dimensional data assimilation system for the United States Navy using representer algorithms. In a recent paper (Xu et al., 2005, hereafter referred to as XRD), we describe the first application of the accelerated representer algorithm to a fully configured global atmospheric data assimilation system. The system is based on the Navy Global Atmospheric Prediction System (NOGAPS) forecast model (Hogan and Rosmond, 1991) and many of the components of NAVDAS, the Navy's 3DVAR analysis system. XRD presents the formulation of NAVDAS-AR in terms of a generalized non-linear cost function, and describes the simplifications and approximations necessary to reduce the problem to a solvable set of linear Euler-Lagrange equations. In this paper we take the next step in this evolution: the solution of the non-linear problem as a series of linearized Euler-Lagrange equations, the outer loop strategy. A similar procedure is performed in the ECMWF 4DVAR system and is described by Courtier et al. (1994), Rabier et al. (1998) and Rabier et al. (2000).
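The equivalence between the analysis space and observation space formulations noted by Courtier (1997) can be illustrated with a small linear example. The sketch below is our own toy construction (not NAVDAS-AR code): it solves the same linear estimation problem once with a state-sized control variable and once with an observation-sized control variable, and the resulting increments agree.

```python
import numpy as np

# Toy illustration of the analysis space / observation space duality:
# both solves yield the same analysis increment for a linear problem.
rng = np.random.default_rng(0)
I, K = 8, 3                      # state size I, observation count K
# A simple symmetric positive definite "background error covariance".
B = np.eye(I) + 0.5 * np.exp(-np.abs(np.subtract.outer(range(I), range(I))))
H = rng.standard_normal((K, I))  # linear observation operator
R = 0.1 * np.eye(K)              # observation error covariance
d = rng.standard_normal(K)       # innovation vector y - H(x_b)

# Analysis space: the control variable has length I (the model state).
dx_analysis = np.linalg.solve(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H,
                              H.T @ np.linalg.inv(R) @ d)

# Observation space: the control variable beta has length K (the observations).
beta = np.linalg.solve(H @ B @ H.T + R, d)
dx_obs = B @ H.T @ beta

print(np.allclose(dx_analysis, dx_obs))  # the two increments coincide
```

The observation space system is only K x K, so when the number of observations is much smaller than the (weak constraint) control vector of the analysis space formulation, the dual form is the cheaper one.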
In Section 2, we describe the formulation of NAVDAS-AR as a generalized inverse problem in terms of a non-linear cost function. We summarize the development of the Euler-Lagrange equations described in XRD, and reintroduce the assumptions and approximations necessary to make the equations tractable. In Section 3, we give a brief overview of the fully linear NAVDAS-AR solution described in XRD. In Section 4, we formulate the outer loop strategy necessary to solve the non-linear problem as a series of linearized Euler-Lagrange equations. We present the details of the solution process for these equations, in particular how the non-linear operators that appear in the general problem are linearized. In Section 5, we present the results of some preliminary data assimilation experiments, showing the impact of the outer loops on analysis increment fields. However, we are not yet in a position to demonstrate the impact of the outer loop strategy on actual forecast skill. Instead we will examine the convergence properties of the outer loop strategy and its numerical robustness. In Section 6 we describe our plans for the next step in the evolution of NAVDAS-AR. In Section 7 we give a summary and some conclusions.

Formulation of NAVDAS-AR: the Euler-Lagrange equations
In XRD, we defined a generalized inverse problem formulated as a non-linear least-squares cost function, from which we derived the Euler-Lagrange equations that were linearized to yield a tractable problem. We then derived the observation space algorithm upon which NAVDAS-AR is based. These details can also be found in Xu and Rosmond (2004). In this section we will summarize the linear form, and then formulate a more general linearization strategy based on Taylor series expansion of the non-linear operators about an analysis trajectory. This leads to an iterative outer loop procedure similar in philosophy to that described by Courtier et al. (1994), Rabier et al. (1998), Veerse and Thepaut (1998) and Rabier et al. (2000).

Generalized non-linear cost function
The generalized inverse problem can be posed as the minimization of the following generalized cost function for the time period t_0 ≤ t_n ≤ t_N:

J(x) = (x_0 − x^b_0)^T (P^b_0)^{−1} (x_0 − x^b_0) + q^T Q^{−1} q + r^T R^{−1} r ≡ J^b_0 + J^q + J^r,   (1)

where q is the vector of model residuals q_n = x_n − M(x_{n−1}) stacked over 1 ≤ n ≤ N, r is the vector of observation residuals r_n = y_n − H(x_n) stacked over 0 ≤ n ≤ N, and Q and R are the block covariance matrices with blocks Q_{nn'} and R_{nn'}. J^r and J^q are the scalar cost functions for the observation and model error, respectively, and J^b_0 is the scalar cost function for the background error at the beginning of the period (t = t_0). Superscripts 'r', 'q' and 'b' refer to the observations, model and background, respectively; superscript 'T' denotes a matrix transpose. Subscripts '0', 'n' and 'n − 1' refer to the times t = t_0, t = t_n and t = t_{n−1}, respectively. M is the non-linear forecast model, with a state vector of length I. x^b_n is the a priori background forecast and x_n is the state vector, each describing atmospheric estimates of length I at time t = t_n. Q_{nn'} is a rank-I model error covariance matrix between t = t_n and t = t_{n'}. P^b_0 is a rank-I background error covariance matrix at the initial time t = t_0. y_n is a vector of observations of length k_n at time t_n, where Σ_{n=0}^{N} k_n = K; K is therefore the length of the observation vector. H is the non-linear observation operator that produces k_n pseudo-observations from a given state vector x_n at time t_n. R_{nn'} is the observation error covariance matrix between t = t_n and t = t_{n'}.
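For concreteness, a scalar-state sketch of evaluating such a cost function is given below, with errors assumed uncorrelated in time so that each term uses a simple variance. The model, observation operator and numbers are illustrative assumptions of ours, not NAVDAS-AR components.

```python
# Scalar sketch of the weak-constraint cost function: background,
# model-error and observation terms, with time-uncorrelated errors.
def cost(x, x_b0, M, H, y, P_b0, Q, R):
    """x: candidate trajectory x_0..x_N; y: observations y_0..y_N."""
    J_b = (x[0] - x_b0) ** 2 / P_b0                              # background term
    J_q = sum((x[n] - M(x[n - 1])) ** 2 / Q for n in range(1, len(x)))  # model error
    J_r = sum((y[n] - H(x[n])) ** 2 / R for n in range(len(x)))  # observation term
    return J_b + J_q + J_r

M = lambda x: 0.9 * x          # toy "forecast model"
H = lambda x: x                # identity observation operator
y = [1.0, 0.95, 0.88]
print(cost([1.0, 0.9, 0.81], x_b0=1.0, M=M, H=H, y=y,
           P_b0=0.5, Q=0.1, R=0.2))
```

Here the candidate trajectory follows the model and background exactly, so only the observation term contributes; a perfect model assumption (Q_n = 0, discussed below) would instead enforce the model equation as a hard constraint.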
Taking the first variation of eq. (1) and setting it to zero yields eq. (5) of XRD, the stationarity condition satisfied by the analysis x^a, the value of x that minimizes J. The matrix M_{n−1} = ∂M(x)/∂x, evaluated at x^a_{n−1}, is an I × I Jacobian matrix (also commonly known as the TLM in the atmospheric sciences) corresponding to the (possibly) non-linear model M; when M is linear, M(x^a_{n−1}) = M_{n−1} x^a_{n−1}. The matrix H_n = ∂H(x)/∂x, evaluated at x^a_n, is a k_n × I Jacobian matrix corresponding to the (possibly) non-linear forward operator H; when H is linear, H(x^a_n) = H_n x^a_n. In XRD we showed how eq. (5) leads to the following coupled non-linear Euler-Lagrange equations:

λ^a_n − M^T_n λ^a_{n+1} = H^T_n Σ_{n'=0}^{N} (R^{−1})_{nn'} [y_{n'} − H(x^a_{n'})],   (6)

for 1 ≤ n ≤ N − 1, and subject to the final condition λ^a_N = H^T_N Σ_{n'=0}^{N} (R^{−1})_{Nn'} [y_{n'} − H(x^a_{n'})], and

x^a_n − M(x^a_{n−1}) = Σ_{n'=1}^{N} Q_{nn'} λ^a_{n'},   (7)

for 1 ≤ n ≤ N, and subject to the initial condition x^a_0 = x^b_0 + P^b_0 λ^a_1, plus an n = 0 innovation vector term discussed below. Equation (6) is an equation in the adjoint variable λ^a_n that integrates backward in time, from n = N to n = 1, assimilating observations as a right-hand side forcing term (i.e. the innovation vector). Equation (7) integrates the analysis variable x^a_n forward in time, with an initial condition coming from the convolution of the background error covariance P^b_0 with the initial-time solution λ^a_1 of eq. (6), plus an n = 0 innovation vector term that can be discarded, as explained below. For the imperfect model case, a model error term also appears on the right-hand side of eq. (7) as a time-dependent forcing term formed by the convolution of the model error covariance matrix with the adjoint variable λ^a_n. At this point we have made essentially no simplifying assumptions in the derivation of eqs. (6) and (7). They form a coupled non-linear Euler-Lagrange system that is very difficult to solve. When the forecast model M and the observation operator H are linear, however, various representer methods can be used to decouple and solve the linear Euler-Lagrange system (see Daley, 2000, 2002; Chua and Bennett, 2001; Bennett, 2002).
Besides the linearization that is necessary to achieve a tractable solution, it is also convenient to make some additional assumptions. If we assume that model errors and observation errors are not correlated in time, we have Q_{nn'} = 0 and R_{nn'} = 0 for n ≠ n', and Q_n = Q_{nn} and R_n = R_{nn} for n = n'. If we also assume that observation errors are uncorrelated in space, then R_n is a diagonal matrix of observation error variances at each t = t_n. Q_n is a model error covariance matrix at each t = t_n. A perfect model assumption gives Q_n = 0, and because of our lack of understanding of the form of Q_n, we will make this assumption in the subsequent development. However, we will briefly discuss the imperfect model case in the conclusions.
For cycling operational NWP applications, the observations at the beginning of a given data assimilation period, t = t_0, have usually already been used in the previous data assimilation cycle, i.e. at t = t_N of the previous cycle. Because observations should not be assimilated twice, we must exclude observations in the t = t_0 time bin, which requires that the innovation vector term in the initial condition (n = 0) of eq. (7) be dropped.
In the next section we summarize the case of linear M and H described in XRD, and then in the following section we show how a more general linearization leads to a sequence of linear problems that solves the non-linear problem with an outer loop strategy.

Solution of the fully linear case
The approximations and simplifications introduced at the end of the previous section, plus the assumption that M(x_n) = M_n x_n and H(x_n) = H_n x_n, yield the following linear Euler-Lagrange equations:

λ^a_n − M^T_n λ^a_{n+1} = H^T_n R^{−1}_n (y_n − H_n x^a_n),   (8)

for 1 ≤ n ≤ N − 1, and subject to the final condition λ^a_N = H^T_N R^{−1}_N (y_N − H_N x^a_N), and

x^a_n − M_{n−1} x^a_{n−1} = 0,   (9)

for 1 ≤ n ≤ N, subject to x^a_0 = x^b_0 + P^b_0 λ^a_1. The prior background state estimate x^b is from a non-linear model forecast:

x^b_n = M(x^b_{n−1}), for 1 ≤ n ≤ N.   (10)

In XRD the representer method is used to decouple eqs. (8)-(9) and yield the observation space NAVDAS-AR algorithm:

[H M P^b_0 M^T H^T + R] β = y − H x^b,   (11)

and

x^a = x^b + M P^b_0 M^T H^T β,   (12)

where M, M^T and H here denote the tangent-linear integration, the adjoint integration and the observation operator applied over the whole assimilation window, and β is the observation space control vector of length K. Here, P^b_0 is the background error covariance at the initial time (n = 0). NAVDAS-AR uses the same P^b_0 as is used in NAVDAS (see Daley and Barker, 2001a). The details of the derivation of eq. (11) can be found in XRD and Xu and Rosmond (2004). The solution procedure is also described in great detail in XRD and will not be repeated here.
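The solver and post-multiplier steps of the linear case can be sketched for a toy system as follows. The matrices and sizes are our own assumptions; a real implementation applies M, M^T and H as TLM, adjoint and observation operator integrations rather than as explicit matrices.

```python
import numpy as np

# Toy linear observation space sketch: with a perfect linear model,
# observing x_n is equivalent to observing the initial state through H M^n,
# so the solver is a K x K system however large the state.
I, N = 6, 3
rng = np.random.default_rng(1)
M = np.eye(I) + 0.05 * rng.standard_normal((I, I))  # linear model (its own TLM)
H = np.zeros((1, I))
H[0, 2] = 1.0                                       # observe one state component
P_b0 = np.eye(I)                                    # background error covariance
R_n = 0.1                                           # observation error variance

# Background trajectory from the model, x_b_n = M x_b_{n-1}.
x_b0 = rng.standard_normal(I)
x_b = [x_b0]
for n in range(N):
    x_b.append(M @ x_b[-1])

# Stack the per-time observation operators acting on the initial state.
G = np.vstack([H @ np.linalg.matrix_power(M, n) for n in range(1, N + 1)])
y = np.array([(H @ x).item() + 0.05 for x in x_b[1:]])  # synthetic observations
d = y - G @ x_b0                                        # innovation vector

beta = np.linalg.solve(G @ P_b0 @ G.T + R_n * np.eye(N), d)  # solver step
x_a0 = x_b0 + P_b0 @ G.T @ beta                              # post-multiplier
```

The analysis trajectory then follows from x^a_n = M x^a_{n−1}, and by construction it fits the observations more closely than the background does.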

Solution of the non-linear case: the outer loop strategy
The coupled Euler-Lagrange eqs. (6)-(7) are complicated by the non-linearity of the forecast model M and the observation operator H. However, it is possible to recast the equations as a sequence of coupled problems, with each iteration step linearized about an analyzed basic state (trajectory) from the previous iteration. The non-linearity is therefore approximated by Taylor series expansions. If the non-linearity of the system is not too strong, we expect the procedure to converge to a good approximation of the non-linear solution. In this section we develop this iterative system in detail. First, let x^j and x^{j−1} be the estimated states for the jth and (j − 1)th iterations, respectively, and let x^j_n = x^{j−1}_n + δx^j_n. For simplicity we have dropped the 'a' superscripts on the x and λ variables. The non-linear model M and the non-linear observation operator H at the jth iteration are linearized around the previous analysis (j − 1) using Taylor expansion,

M(x^j_{n−1}) ≈ M(x^{j−1}_{n−1}) + M^{j−1}_{n−1} δx^j_{n−1},  H(x^j_n) ≈ H(x^{j−1}_n) + H^{j−1}_n δx^j_n,

such that

λ^j_n − (M^{j−1}_n)^T λ^j_{n+1} = (H^{j−1}_n)^T Σ_{n'=0}^{N} (R^{−1})_{nn'} [y_{n'} − H(x^{j−1}_{n'}) − H^{j−1}_{n'} δx^j_{n'}],   (13)

for 1 ≤ n ≤ N − 1, and subject to the corresponding final condition at n = N, and

x^j_n − M(x^{j−1}_{n−1}) − M^{j−1}_{n−1} δx^j_{n−1} = Σ_{n'=1}^{N} Q_{nn'} λ^j_{n'},   (14)

for 1 ≤ n ≤ N, and subject to x^j_0 = x^b_0 + P^b_0 λ^j_1. Note the similarity of eqs. (13) and (14) to eqs. (6) and (7).
The matrix M^{j−1}_n = ∂M(x)/∂x, evaluated at x^{j−1}_n, is the I × I Jacobian matrix or TLM corresponding to the non-linear model M. The matrix H^{j−1}_n = ∂H(x)/∂x, evaluated at x^{j−1}_n, is a k_n × I Jacobian matrix corresponding to the non-linear observation operator H.
Equations (13) and (14) form a coupled linearized Euler-Lagrange system. The system can now be decoupled using similar procedures to those used in XRD.

Solution of the coupled linearized Euler-Lagrange system
Although there are many similarities between the coupled linear system defined by eqs. (13) and (14) and that defined by eqs. (8) and (9), there are differences in the details that require careful examination. With similar assumptions to those used in Section 3 (errors uncorrelated in time, perfect model), we simplify eqs. (13) and (14) to yield

λ^j_n − (M^{j−1}_n)^T λ^j_{n+1} = (H^{j−1}_n)^T R^{−1}_n [y_n − H(x^{j−1}_n) − H^{j−1}_n δx^j_n],   (15)

for 1 ≤ n ≤ N − 1, and subject to the corresponding final condition at n = N, and

x^j_n − M(x^{j−1}_{n−1}) − M^{j−1}_{n−1} δx^j_{n−1} = 0,   (16)

for 1 ≤ n ≤ N, subject to x^j_0 = x^b_0 + P^b_0 λ^j_1. Let us introduce a prior linear state (x^p)^j for the jth iteration, which satisfies the following governing equation linearized around the previous analysis x^{j−1}:

(x^p)^j_n = M(x^{j−1}_{n−1}) + M^{j−1}_{n−1} [(x^p)^j_{n−1} − x^{j−1}_{n−1}], subject to (x^p)^j_0 = x^b_0.   (17)

We also introduce the definitions

δ*x^j_n ≡ x^j_n − (x^p)^j_n   (18)

and

(δx^p)^j_n ≡ (x^p)^j_n − x^{j−1}_n.   (19)

(x^p)^j_n is used here as the background state for the minimization of the linear problem (the inner loop) in the representer method. The analysis x^j for the jth iteration can now be written as

x^j = (x^p)^j + δ*x^j,   (20)

with

δ*x^j = M^{j−1} P^b_0 (M^{j−1})^T (H^{j−1})^T β^j,   (21)

where β^j is the coupling vector, or the analysis residual normalized by the observation error covariance, for the jth iteration, and satisfies the following:

[H^{j−1} M^{j−1} P^b_0 (M^{j−1})^T (H^{j−1})^T + R] β^j = y − H(x^{j−1}) − H^{j−1} (δx^p)^j.   (22)

Detailed derivations of eqs. (21) and (22) can be found in Appendices A and B, respectively. Also notice that eqs. (21) and (22) are, respectively, the post-multiplier and inner loop solver steps described in XRD.

Calculation of (x p ) j , the background trajectory
Equation (17) as written cannot be solved because of the presence of the M(x^{j−1}_{n−1}) term. However, because our Taylor series linearization is about a base state that is the previous analysis, the obvious choice is to assume that x^{j−1}_n = M(x^{j−1}_{n−1}). This assumption, along with eq. (19), yields

(δx^p)^j_n = M^{j−1}_{n−1} (δx^p)^j_{n−1},   (23)

for 1 ≤ n ≤ N, and subject to (δx^p)^j_0 = x^b_0 − x^{j−1}_0. Equation (23) is a tangent-linear integration of an initial condition correction to the previous analysis, yielding an adjusted background trajectory that is used to correct the innovation term on the right-hand side of eq. (22). Notice, however, that we still have (x^p)^j_0 = x^b_0, which is consistent with our original formulation.
Although eq. (23) is a natural choice of linearization scheme for predicting (x^p)^j, because it is consistent with the representer derivation, there is no reason to restrict ourselves to this linearization. In fact, Bennett (2002) suggests experimenting with linearization schemes to assess the impact on iteration strategies such as the NAVDAS-AR outer loop, particularly if numerical instabilities are a potential problem in the scheme. We have experimented with an alternative scheme to eq. (23), specifically a Picard iteration in which the non-linear model M is linearized about the previous analysis in the form of a relaxation term on (x^p)^j_n toward that analysis. A simple one-dimensional example of a Picard iteration is shown in Appendix C. This approach allows us to retain the non-linear model diabatic physics as a forcing term to the linear model, an appealing property. In fact, in the results of the next section, we use this Picard iteration with a linearized form of the NOGAPS model. Preliminary tests show better convergence of the outer loops with the Picard iteration model than with eq. (23), and so far no instabilities are apparent, so we intend to continue testing the system with the Picard iteration model.
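The core idea of a Picard iteration in one dimension can be sketched as below, in the spirit of the paper's Appendix C (whose details may differ, and the relaxation term of the NAVDAS-AR scheme is omitted here): each pass integrates a linear problem in which the non-linear term is evaluated on the previous iterate, so the non-linearity enters only as a forcing term, much as the diabatic physics is retained as forcing in the linearized NOGAPS model.

```python
# One-dimensional Picard iteration sketch (textbook form, our own example):
# successive forward integrations of dx/dt = f(x) with f evaluated on the
# previous iterate, so each pass solves only a linear (forced) problem.
def f(x):
    return -x * x          # simple non-linear right-hand side

def picard_pass(x_prev, x0, dt):
    """One Picard pass: forward Euler with f taken from the previous iterate."""
    x = [x0]
    for n in range(1, len(x_prev)):
        x.append(x[-1] + dt * f(x_prev[n - 1]))  # forcing from previous iterate
    return x

steps, dt, x0 = 50, 0.02, 1.0
x = [x0] * (steps + 1)         # initial guess: constant trajectory
for _ in range(8):             # successive linear (forced) integrations
    x = picard_pass(x, x0, dt)
# x now approximates the solution of dx/dt = -x**2, x(0) = 1, i.e. 1/(1 + t)
```

If the non-linearity is not too strong, the iterates contract toward the non-linear solution, mirroring the convergence expectation stated for the outer loops.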

Summary of the non-linear case
Step 1. For the first iteration (j = 1), use eq. (10) to find x^b_n. For j > 1, find (δx^p)^j from the solution of eq. (23). Notice that (x^p)^j = x^{j−1} + (δx^p)^j. For the first iteration (δx^p)^1 = 0, i.e. (x^p)^1 = x^0 = x^b. With this assumption and using eq. (10), the first outer loop iteration is identical to the inner loop procedure described in Section 3.

Step 2. Compute the innovation vector on the right-hand side of the solver (22) from (x^p)^j.

Step 3. Find the analysis residual for the jth iteration, β^j, which is the solution of the solver (22).

Step 4. Find the increment δ*x^j, which is the solution of the post-multiplier (21).
Step 5. The analysis for the jth iteration is x^j = (x^p)^j + δ*x^j. If δ*x^j satisfies the convergence criteria, stop the outer loop; otherwise return to step 1.
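The outer loop steps summarized above can be sketched for a scalar state with a single observation at the end of the window. The model, Jacobian and error variances below are illustrative assumptions of ours, and for simplicity the new trajectory is obtained by re-running the non-linear model rather than by the tangent-linear trajectory correction of the formal derivation.

```python
# Scalar outer loop sketch: each loop re-linearizes the non-linear model
# about the previous analysis trajectory and solves the scalar analogues of
# the solver and post-multiplier steps.
def M(x):                  # non-linear one-step "forecast model" (toy)
    return x + 0.1 * (-x * x)

def M_tl(x):               # its Jacobian (TLM), evaluated on a trajectory
    return 1.0 - 0.2 * x

P_b0, R = 0.5, 0.05        # background / observation error variances
x_b0 = 1.0                 # background initial condition
y = 0.85                   # observation of the state at the end of the window

x0, x1 = x_b0, M(x_b0)     # first-loop trajectory is the background forecast
for j in range(5):         # outer loops
    G = M_tl(x0)                         # linearization about current trajectory
    dxp0 = x_b0 - x0                     # initial-condition correction
    innov = y - x1 - G * dxp0            # innovation against prior trajectory
    beta = innov / (G * P_b0 * G + R)    # solver step (scalar)
    x0 = x_b0 + P_b0 * G * beta          # prior plus post-multiplier correction
    x1 = M(x0)                           # non-linear re-forecast: new trajectory
    print(j + 1, round(x0, 5), round(x1, 5))
```

After a couple of loops the iterates satisfy the stationarity condition of the full non-linear scalar cost function, consistent with the rapid two-loop convergence reported in the results below.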

Results
At this point in NAVDAS-AR development we are not in a position to demonstrate the impact of the new system on NOGAPS forecast performance, i.e. we have not yet run the system in a full update cycle mode. Instead, we are concentrating on testing individual system components in isolation to ensure numerical stability and robustness. In XRD, we demonstrated the ability of NAVDAS-AR to produce analysis increments comparable to those of the operational 3DVAR NAVDAS, but only for the linear case, i.e. no outer loops.
In this paper, we extend the tests to the non-linear case with the introduction of the outer loop strategy described in Section 4. There are two principal components to be tested in the outer loop process: (a) the linearization of the NOGAPS forecast model about an analysis trajectory, yielding more accurate TLM and adjoint models (M and M^T) in eq. (A7); (b) the linearization of the non-linear observation operators H in eq. (22), yielding the Jacobian H, which allows adjustments to the innovations during the outer loop iterations.
For the test results in this section, the outer loop spectral resolution is T239L30, with a Gaussian grid of 720 × 360. The calculation of observation innovations is carried out on this grid. The inner loop spectral resolution is T79L30, with a Gaussian grid of 240 × 120. The analysis increment fields are computed at this resolution. Combining background fields with the increments is all done in spectral space, so complications due to the mismatch in grid resolutions are avoided.

Inner and outer loop convergence properties
Before showing results for (a) and (b), however, it is worthwhile to examine the numerical behavior of both the outer loop and inner loop iteration processes. An important detail in the outer loop strategy is the distinction between the analysis trajectories x^j_n and the forecast model trajectories (x^p)^j_n: x^j_n is constrained by the observations during the 6-h update cycle window, while (x^p)^j_n is constrained by the fixed initial condition x^b_0 and the analysis trajectory from the previous outer loop. Figure 1 shows the convergence behavior of x^j_n and (x^p)^j_n, respectively, for six outer loop iterations. Each line is a perturbation energy norm change in δ(x)^j and δ(x^p)^j for the analysis trajectories and forecast trajectories, respectively, for 1 ≤ j ≤ 6. It is quite apparent that the outer loops converge very rapidly, with over 90% of the convergence taking place after only two loops (note that the ordinate axis is a log scale). The analysis trajectories x^j_n converge quite uniformly over the 6-h window, which is not surprising, because they are directly constrained by the observations over the entire time window. The forecast trajectories (x^p)^j_n behave differently, however. Every forecast trajectory starts from the same x^b_0, so the initial differences are zero, which we cannot plot on the logarithmic ordinate axis. Notice also that the forecast trajectory norm changes are about an order of magnitude smaller than the analysis norm changes. This is clearly a consequence of the fixed x^b_0 for each iteration, which is a stronger constraint than the varying analysis trajectories about which the Picard iteration model is linearized. It also seems likely that the fixed initial condition makes model error more important for the forecast trajectories.
The conjugate gradient solutions of the inner loop as a function of the outer loop index j are shown in Fig. 2. Here we are finding the solution of r = d^j − A^j z, the solver equation described in XRD, where A^j and d^j are functions of the analysis trajectories, z is the analysis residual we are solving for, and r is the cost function gradient we are iterating toward zero.
For the first outer loop (j = 1), we start the conjugate gradient iteration with z = 0. The large value of the initial gradient norm, ≈4 (see Fig. 2), shows that this is not a very good starting point, so the first outer loop, i.e. the linear solution, takes many more iterations to converge than subsequent outer loops. We exploit the fact that, because A^j and d^j do not change substantially as the outer loop iterations proceed, the converged solution z of outer loop j − 1 should be an excellent starting guess for outer loop j. The much smaller starting gradients in the solver equation for subsequent outer loops confirm this. It is important to point out that in these experiments we are reducing our cost function gradient to values much smaller than necessary, because we are interested in testing the robustness of our conjugate gradient algorithm. For j = 1 the ratio of starting gradient to ending gradient is about 400. For subsequent iterations the ratio is smaller, but this is primarily because the starting gradient is smaller; the converged outer loop solutions are quite similar after the first two iterations, as shown in Fig. 1. Daley and Barker (2001a) suggest that a ratio of about 100 is adequate for practical applications, and that for very small gradient values the data are being 'fitted' with small-scale analysis features that may be undesirable, in addition to the extra computational cost of the additional conjugate gradient iterations. In future NAVDAS-AR applications, we will certainly stop our iterations at more reasonable gradient values.
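The warm-starting strategy described above can be sketched on a toy symmetric positive definite system (our own construction, not the NAVDAS-AR solver): when A and d change only slightly between outer loops, starting conjugate gradient from the previous loop's solution gives a much smaller initial gradient and a quicker solve.

```python
import numpy as np

# Plain conjugate gradient for A z = d with an arbitrary starting guess z0.
def cg(A, d, z0, tol=1e-10, max_iter=500):
    z = z0.copy()
    r = d - A @ z          # residual = (negative) cost function gradient
    p = r.copy()
    iters = 0
    while np.linalg.norm(r) > tol and iters < max_iter:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        z = z + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        iters += 1
    return z, iters

rng = np.random.default_rng(2)
C = rng.standard_normal((40, 40))
A1 = C @ C.T + 40 * np.eye(40)          # SPD "solver" matrix, outer loop j = 1
d1 = rng.standard_normal(40)
z1, n_cold = cg(A1, d1, np.zeros(40))   # cold start from z = 0

A2 = A1 + 0.01 * np.eye(40)             # slightly changed system, loop j = 2
d2 = d1 + 0.01 * rng.standard_normal(40)
z2, n_warm = cg(A2, d2, z1)             # warm start from previous solution
print(n_cold, n_warm)
```

The warm start's initial residual is orders of magnitude smaller than the cold start's, mirroring the much smaller starting gradients seen for j > 1 in Fig. 2.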

Linearization of NOGAPS about the analysis trajectories
As discussed above, we want to examine the impact of linearizing both the forecast model M and the observation operators H on outer loop performance. In a first experiment, we limited the data types assimilated into NAVDAS-AR to 'conventional' observations, i.e. observations of the actual analysis variables, such as temperature, wind vectors and relative humidity. In this case, H reduces to simple interpolation and is already linear, so H = H. The innovation term on the right-hand side of eq. (22) reduces to y − H(x^p)^j, which changes only as the forecast trajectory (x^p)^j changes for each outer loop iteration, i.e. there is no dependence on the analysis trajectory x^{j−1}. The linearized model operator M^j is still a function of x^{j−1}, however, and this is what we want to examine. The top panel of Fig. 3 shows the 00Z 12 March 2004 NAVDAS-AR increment field for 250-hPa temperature after the first outer loop, i.e. the (j = 1) 'linear' solution. The bottom panel is the change in this increment field due to the second outer loop. The only significant changes are in the central United States, where the difference field is generally of the same sign as the increment field. The sign of this difference field is such that it should be subtracted from the (j = 1) solution to yield the (j = 2) solution. The net result of the updated analysis and forecast trajectories in the second outer loop is to slightly reduce the amplitude of the increments. We have seen this pattern frequently: apparently the linear solution tends to overestimate the increments, and the second outer loop reduces them. Which solution is actually the best will have to wait until we can run forecast experiments from each candidate solution.
An even more dramatic example of the difference between outer loop 1 and outer loop 2 is shown in Fig. 4. The top panel is the loop 1 linear solution, showing relatively large increments of terrain pressure over land masses, particularly Australia and North America. The bottom panel shows almost none of these features over land. Figure 5 clearly shows why this happens; the differences almost exactly cancel the loop 1 result. We believe the explanation for this somewhat troubling behavior lies in the way that terrain pressure innovations are derived in NAVDAS-AR. Innovations of terrain pressure are computed at observation locations, at the pressure corresponding to the true terrain height at those locations. The NOGAPS terrain pressure, however, is defined at the terrain heights of the spectrally truncated T239 model terrain, which often departs substantially from the true terrain. We only use terrain pressure innovations at locations where there is a reasonably close match between model and true terrain height (within ≈250 m), but this still requires a hydrostatic extrapolation of pressure innovations between the two levels that is subject to error. The second outer loop adjustments are clearly correcting for these errors, because we are sure that the bottom panel result of Fig. 4 is a better solution than the top panel result.
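The kind of hydrostatic extrapolation described above can be sketched with the textbook hypsometric relation; the constants and functional form below are standard, not taken from NAVDAS-AR code, and a hypothetical layer-mean temperature stands in for whatever the operational system uses.

```python
import math

# Hydrostatic adjustment sketch: move a surface pressure across the height
# difference dz between the true terrain and the model terrain, using
# p2 = p1 * exp(-g * dz / (Rd * T)).
def adjust_pressure(p_obs_hpa, dz_m, t_mean_k):
    """Pressure extrapolated across dz = model height - true height (m)."""
    g = 9.80665      # gravitational acceleration (m s^-2)
    Rd = 287.05      # dry-air gas constant (J kg^-1 K^-1)
    return p_obs_hpa * math.exp(-g * dz_m / (Rd * t_mean_k))

# A 200 m terrain mismatch (near the ~250 m acceptance limit) shifts the
# innovation reference pressure by roughly 2.4%.
print(round(adjust_pressure(1000.0, 200.0, 280.0), 1))
```

Errors in the assumed layer temperature enter this extrapolation directly, which is consistent with the innovation errors the second outer loop appears to be correcting.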
In both Figs. 1 and 2 we see convincing evidence that two outer loops are enough for practical convergence of NAVDAS-AR. Figure 6 is another indication of this convergence.
It is equivalent to the bottom panel of Fig. 3, except that it is the difference between outer loop 1 and outer loop 6. The similarities between the two figures are striking, showing that the extra four outer loop iterations added nothing to the NAVDAS-AR solution. These results, and the cost of another outer loop iteration, make it unlikely that we will go beyond two outer loops for any practical NAVDAS-AR applications.

Linearization of observation operators
At this point in NAVDAS-AR development we only have the ability to assimilate two 'non-conventional' observation types: SSM/I (Special Sensor Microwave Imager) wind speeds and SSM/I total column precipitable water. The SSM/I wind speeds require a non-linear observation operator H, so they will have an impact on the right-hand side of eq. (22) during outer loop iterations. The observation operator for the SSM/I precipitable water is linear, but it must transform these observations from column integral values, expressed in units of kg m^−2, to profiles of relative humidity, the native moisture variable in NAVDAS-AR. The operator therefore does much more than the simple interpolations that H performed in the 'conventional' data results shown previously. The details of this operator and of the non-linear wind speed operator can be found in Daley and Barker (2001a).
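The column-integral relationship underlying the precipitable water operator can be illustrated with the standard definition TPW = (1/g) ∫ q dp. The profile values and pressure levels below are made up, and the real operator works the other way (relating relative humidity profiles to the observed column value); this sketch only shows the integral that ties the two together.

```python
import numpy as np

G = 9.80665                                       # m s^-2

p = np.array([100000., 85000., 70000., 50000.])   # Pa, surface upward (illustrative)
q = np.array([0.010, 0.007, 0.004, 0.001])        # specific humidity, kg/kg (illustrative)

# Trapezoidal integration over pressure; the minus sign accounts for
# pressure decreasing with height along the array.
tpw = -np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(p)) / G   # kg m^-2
```

The linearization of this operator spreads a single column-integrated innovation over the whole humidity profile, which is why it is much more than an interpolation.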
The top panel of Fig. 7 shows the impact of assimilating the SSM/I surface winds on 500-hPa vorticity after two outer loops. We are intentionally showing results that are remote from the actual observations to demonstrate the ability of NAVDAS-AR to propagate information away from the surface into the mid-troposphere under the dynamical control of M and M^T. 3DVAR systems such as NAVDAS have difficulty accomplishing this (Ed Barker, private communication). The very strong signal in the Southern Hemisphere is a direct reflection of the inability of the operational NAVDAS to effectively assimilate SSM/I wind speeds, because the background fields we are using are still very much influenced by previous update cycles from the operational NAVDAS. When we are cycling NAVDAS-AR on its own, we expect to see the background fields 'conditioned' by the assimilation of these new data types, and the Southern Hemisphere should become more like the Northern Hemisphere in the impact of these observations. The bottom panel of Fig. 7 shows the impact of assimilating the SSM/I precipitable water at 1000 hPa. We do not see the same strong Southern Hemisphere signal with these data, primarily because the operational NAVDAS is also assimilating them, although not with the same observation operator used in NAVDAS-AR. Nevertheless, we see noticeable changes due to the incorporation of SSM/I precipitable water over all ocean areas, a strong indication that these data should have significant impact on NOGAPS moisture and precipitation forecasts in these areas.

Future plans
We are now in a position to take the next logical step in NAVDAS-AR development: running the system in full data assimilation mode to produce initial conditions for the NOGAPS forecast model that give forecast skill competitive with that from the operational NAVDAS. Several components must still be added to the system before this can be accomplished, however. A separate univariate 2DVAR analysis for subterranean sea level pressure is currently run as part of the operational NAVDAS; this must also be added to the NAVDAS-AR system. A greater challenge, however, is the incorporation of satellite radiance retrievals into NAVDAS-AR. The observation operators H, together with their tangent-linear and adjoint forms H and H^T, must be adapted from the operational NAVDAS to NAVDAS-AR. Experience has shown that although direct radiance assimilation can have significant positive impact on forecast system performance, a great deal of testing and tuning is required to achieve this impact.
In its weak constraint observation space formulation, NAVDAS-AR has the unique advantage of allowing the addition of a model error term Q on the right-hand side of eq. (16), without a change in the size of the control variable. The form that Q should take is the source of considerable research in the data assimilation community, and we expect to investigate its form in future research with NAVDAS-AR.
Computational cost is an obvious issue for any operational 4DVAR assimilation system, and NAVDAS-AR is no exception. Currently, NAVDAS-AR is five to six times as expensive as the operational NAVDAS. Next-generation computer systems will certainly deliver the necessary computational power to cover this extra cost, but there is still considerable motivation to improve the system's computational efficiency, because the cost of development testing has a direct impact on the rate at which we can move the system toward operational implementation. In Fig. 2 we show the convergence properties of the conjugate gradient algorithm without preconditioning. In the operational NAVDAS, a Cholesky-based preconditioner almost doubles the rate of conjugate gradient convergence, but this approach cannot be applied in NAVDAS-AR because of the time integration operators in the solver equation. A promising alternative is the so-called Lanczos connection (Golub and Van Loan, 1996), which finds the leading eigenvalues and eigenvectors of the solver equation matrix, from which an approximate inverse can be constructed. This is the current direction of our research in this area.
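The approximate-inverse idea can be sketched on a small synthetic problem. In practice the leading eigenpairs would come as a by-product of Lanczos iterations; here, purely for illustration, we take them from a direct eigendecomposition of an assumed SPD matrix (identity plus a few large spikes, a crude stand-in for the solver equation matrix).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
# Synthetic SPD solver matrix: identity plus k large eigenvalue 'spikes'
A = np.eye(n) + U @ np.diag(np.linspace(50.0, 5.0, k)) @ U.T

w, V = np.linalg.eigh(A)             # eigenvalues in ascending order
lam, vecs = w[-k:], V[:, -k:]        # k leading eigenpairs

# Approximate inverse built from the leading pairs: maps the leading
# eigenvalues to 1 and leaves the rest of the spectrum untouched.
P = np.eye(n) + vecs @ np.diag(1.0 / lam - 1.0) @ vecs.T

cond_before = np.linalg.cond(A)      # dominated by the spikes
cond_after = np.linalg.cond(P @ A)   # close to 1 once spikes are deflated
```

Deflating the leading directions collapses the condition number, which is what accelerates conjugate gradient convergence; how many pairs are worth computing in the real system is an open tuning question.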
NAVDAS-AR is a message passing interface (MPI) application, so scalability is a critical issue for overall system computational efficiency. The cost of NAVDAS-AR is dominated by the inner loop computations, divided between the integration of the NOGAPS tangent-linear M and adjoint M^T models and the background covariance calculations with P^b_0. The models are essentially dynamical cores and scale very well to processor numbers of O(100), but the P^b_0 operations do not scale as well, primarily because of a complex observation space search algorithm taken from NAVDAS that is unnecessary in NAVDAS-AR. Although both NAVDAS and NAVDAS-AR are observation space based, the background error covariance calculations are quite different in the two systems: NAVDAS-AR performs these calculations on the native computational grid of the NOGAPS model, but we are not yet exploiting this feature. As an initial expedient, we adapted the NAVDAS observation space P^b_0 algorithm by mimicking observations at the NOGAPS grid locations. This comes with a significant amount of extra overhead that can be eliminated if we redesign the system for the actual grid geometry. This is a top priority for operational implementation.

Summary and Conclusions
In this paper we present results and a progress report on the development of NAVDAS-AR, an accelerated representer form of a weak constraint observation space 4DVAR algorithm. NAVDAS-AR is the basis for the United States Navy's next-generation atmospheric data assimilation system. We have reached an important milestone in the development of NAVDAS-AR: the implementation of an outer loop strategy that allows the introduction of non-linear observation operators. Preliminary tests show rapid convergence of the outer loops; two loops achieve over 90% of the non-linear adjustments. We are now ready to proceed to the next step toward operational implementation: cycling the system in a full data assimilation mode and assessing the impact on NOGAPS forecast system performance. Computational issues still remain, however, because NAVDAS-AR is significantly more expensive than the 3DVAR operational NAVDAS. Next-generation computer systems will remove most of these obstacles, but there is still room for improvement in system efficiency.
The benefits of 4DVAR data assimilation have been clearly demonstrated by the experience at ECMWF, and NAVDAS-AR has the potential to deliver the same benefits to Navy operational NWP. An important by-product of our efforts will be comparisons of the relative advantages and disadvantages of the ECMWF-type analysis space algorithm versus the NAVDAS observation space algorithm. The impact of an imperfect model assumption (the weak constraint formulation) on the two approaches will certainly be one of the most intriguing research questions. We know that an observation space algorithm has a theoretical advantage here, because the size of the control variable remains unchanged in the weak constraint case. However, in XRD we point out that the size of the control variable only impacts the cost of the observation operators, a relatively minor part of the total cost of NAVDAS-AR compared to the cost of model integrations and covariance matrix convolutions. Adding an expensive model error covariance term will further reduce the impact of control variable size on overall computational cost. We also suggest that even with the substantially larger control variable of a weak constraint analysis space algorithm (i.e. the ECMWF system), its impact on computational cost will be minimal compared to the cost of a model error convolution. Future research in the data assimilation community will certainly answer this question.
Another significant difference between NAVDAS-AR and the ECMWF system is the use of an incremental form of the 4DVAR algorithm by ECMWF. We investigated a similar incremental approach for NAVDAS-AR and decided not to pursue it at this time. We did so because (a) we found our full variable form performed quite satisfactorily, (b) the representer method does not lend itself to an incremental method, and (c) the original theoretical background work on the accelerated representer by Xu and Daley (2002) was done in a full variable form. At this point we want to pursue NAVDAS-AR as they originally envisioned it, although in the future we may revisit an incremental form if a clear advantage is apparent.
We are now ready to bring NAVDAS-AR to a full prototype operational configuration. In the future we will be reporting on results of data assimilation experiments and the impact of NAVDAS-AR on operational NOGAPS forecast performance. In addition, we will contribute to answering the questions on the relative merits of analysis space/observation space 4DVAR algorithms, such as those posed above. The data assimilation community will clearly benefit from the answers, whatever they may be.

Acknowledgments
We are deeply indebted to our colleague, the late Roger Daley, who formulated and initiated the development of NAVDAS-AR. We would also like to thank Andrew Bennett and Boon Chua of Oregon State University, and Ed Barker of NRL in Monterey for their encouragement and help during the development. Support of the sponsor, the NRL, under base program elements 0601153N and 0602436N, is gratefully acknowledged. We also thank the two anonymous reviewers who gave us numerous helpful comments and corrections, which greatly improved the manuscript.

Appendix A: Derivation of eq. (21)
Let (α_k)^j and (γ_k)^j be the adjoint field and the representer function for the kth observation, respectively. Let (α_k)^j satisfy eq. (A1) for 1 ≤ n ≤ N − 1, subject to its end condition, and let (γ_k)^j satisfy eq. (A2) for 1 ≤ n ≤ N, subject to its initial condition. Also, let M^{j−1} be a matrix with zero upper triangle, and [M^{j−1}]^T be its transpose. With the definitions above, we may rewrite eqs. (A1) and (A2) in compact matrix form.
Tellus 58A (2006), 1
We also have the corresponding covariance relation, in which (P^b)^{j−1} is the background error covariance used in the jth iteration. According to representer theory, the analysis x^j for the jth iteration is simply the background (x^p)^j plus a correction term. The correction term is essentially a linear combination of all the representer functions (γ_k)^j, with coefficients β^j_k for k = 1, …, K. The solution to the coupled linearized Euler–Lagrange system (15)–(17) can now be written as eq. (A8). For the special case j = 1 we use eq. (10) and set (x^p)^0 = x^0 = x^b and δ(x^p)^0 = 0. Recalling that x^j = x^{j−1} + δx^j, (x^p)^j = x^{j−1} + δ(x^p)^j, and δx^j − δ(x^p)^j = x^j − (x^p)^j, we can rewrite eq. (A8) as eq. (21), where β^j is the coupling vector, or the analysis residual normalized by the observation error covariance, for the jth iteration. Notice that β

Appendix B: Derivation of eq. (22)
Define a matrix A consisting of N × N blocks, each of which is an I × I matrix. For the (j − 1)th iteration we have A^{j−1}, where I is an I × I identity matrix. With this definition we may rewrite eq. (12) and eq. (A1) in block form. Now define a matrix F, also consisting of N × N blocks of size I × I, for the (j − 1)th iteration. Subtracting eq. (14) from eq. (13) and using eq. (B4) then yields eq. (22), where δ*x^j_n = x^j_n − (x^p)^j_n for 1 ≤ n ≤ N.

Appendix C: Simple Picard iteration model
A Picard model is a linearized model form with a relaxation term forcing the solution toward some desired trajectory. In NAVDAS-AR we want to relax the NOGAPS forecast model toward the analysis trajectories of our outer loop iterations. To demonstrate the technique, consider a one-dimensional advection equation (C1), in which both the advecting wind and the advected field φ are decomposed as in eqs. (C2) and (C3), where 'p' superscripts identify the Picard model variables and 'a' superscripts identify the analysis trajectory variables produced from the previous outer loop. We substitute eqs. (C2) and (C3) into eq. (C1) and expand the terms, dropping the second-order term (U^p − U^a) ∂(φ^p − φ^a)/∂x as in any linearization. This yields eq. (C4), whose second term on the right-hand side is the relaxation term toward the analysis trajectory.
In our application, the dynamical core equations of the NOGAPS spectral model are linearized as shown here, producing a Picard NOGAPS model relaxed toward the analysis.
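The structural idea can be sketched numerically. This is only a hedged illustration of a Picard-style model, not the form of eq. (C4): the advecting wind is frozen at an analysis value U_a, and an explicit nudging term pulls the state toward an analysis trajectory that evolves under the same advection. The upwind scheme, the relaxation time scale and all parameters are made-up choices, not the NOGAPS implementation.

```python
import numpy as np

nx, dx, dt = 100, 1.0, 0.4
U_a, tau = 1.0, 5.0                    # analysis wind, relaxation time (made up)

x = np.arange(nx) * dx
phi_a = np.sin(2.0 * np.pi * x / nx)   # analysis trajectory field
phi = np.zeros(nx)                     # Picard model state, starts cold

def ddx_upwind(f):                     # periodic upwind derivative for U_a > 0
    return (f - np.roll(f, 1)) / dx

for _ in range(200):
    phi_a_next = phi_a - dt * U_a * ddx_upwind(phi_a)   # advect the analysis
    # linearized advection plus relaxation toward the analysis trajectory
    phi = phi - dt * U_a * ddx_upwind(phi) + dt * (phi_a - phi) / tau
    phi_a = phi_a_next

err = float(np.max(np.abs(phi - phi_a)))  # relaxation drives this toward zero
```

The difference between the Picard state and the analysis trajectory obeys the same linear advection equation damped at rate 1/τ, so the model state is forced onto the analysis trajectory, which is the role the Picard NOGAPS model plays in the outer loop.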