Simplified Kalman smoother and ensemble Kalman smoother for improving reanalyses

. The paper presents a simplification to the Kalman smoother that can be run as a post processing step using only minimal stored information from a Kalman filter analysis, which is intended for use with large model products such as reanal-yses of Earth system variability. A simple decay assumption is applied to cross time error covariances and we show how the resulting equations relate formally to the fixed-lag Kalman smoother, and how they can be solved to give a smoother analysis along with an uncertainty estimate. The method is demonstrated in the Lorenz 1963 idealised system, being applied with both 5 an extended Kalman smoother and an ensemble Kalman smoother. In each case the root mean square errors (RMSE) against truth, for both assimilated and unassimilated (independent) data, of the new smoother analyses are substantially smaller than for the original filter analyses, while being larger than for the full smoother solution. Typically 60% of the full smoother error reduction with respect to the filter, is achieved. The uncertainties derived for the new smoother also agree remarkably well with the actual RMSE values throughout the assimilation period. The ability to run this smoother very efficiently as a post 10 processor should allow it to be useful for real large model reanalysis products, especially ensemble products, that are already being developed by various operational centres.


Introduction
Data assimilation is widely used for making atmosphere and ocean predictions, providing a best estimate of the current state of the system by combining the information from model forecasts with new observations available up to the current time.The state estimates are used for two purposes.First, they are used to initialise new model forecasts (from minutes to seasons ahead).
Second, the state estimates can provide reanalysis products representing our best estimate of past environmental conditions.This involves assimilating historical observational data using the newest models and assimilation methods available to us today, eg.Uppala et al. (2005); Balmaseda et al. (2013).However assimilation systems suitable to initialise forecasts may be less than optimal when used for reanalysis production.
The main distinction we will draw is between sequential assimilation methods which use only past data, as appropriate for forecasting, and temporal smoothing methods which can use past and future data to obtain a better state estimation, which may be more useful for reanalysis.Although 4DVar is used in operational meteorology and provides some temporal smoothing, it is only used to smooth within a short past data window when applied to initialise forecasts.The archetypal sequential data 1 https://doi.org/10.5194/egusphere-2023-337Preprint.Discussion started: 15 March 2023 c Author(s) 2023.CC BY 4.0 License.assimilation approach, originally for linear systems, is the Kalman filter (KF), see e.g.Chapter 6 of Evensen et al. (2022)).
While the basic KF is inefficient to use in applications with large state spaces (due to the size of the error covariance matrices and the problem of propagating them from one time to the next), the ensemble Kalman filter (EnKF, invented by Evensen (1994)) is a popular and tractable approximation which also allows for non-linear systems to be treated.The EnKF exists in many flavours (e.g. in 'stochastic' (Burgers et al., 1998;Houtekamer and Mitchell, 1998) and 'square-root' forms (Bishop et al., 2001;Whitaker and Hamill, 2002)), which, like the basic KF, are all based on Bayes' theorem, and assume that errors in observed, prior, and posterior quantities are Gaussian distributed.Under the EnKF, the prior distribution is described by an ensemble of model forecast states, and the posterior distribution by an ensemble of posterior states found by assimilating current observational information.This makes the EnKF suitable for model-based forecasting systems.
Ensemble Kalman filters, applied either on their own, or hybridised with variational approaches, have shown success in numerous geophysical applications.For example in meteorological applications with the Canadian forecasting system (Houtekamer et al., 2005), with the NCEP global (Hamill et al., 2011;Wang et al., 2013) and regional (Pan et al., 2014) models, and the WRF model (Zhang and Zhang, 2012); in ocean analysis (van Velzen et al., 2016); in ocean and sea ice analysis (Sakov et al., 2012); in atmospheric chemical analysis (Skachko et al., 2016); and in surface trace gas analysis (Feng et al., 2009).
The filtering problem, as noted above, includes only past and present observational data, but this can be extended to a smoothing problem, which can also use observations within a future time window, usually referred to as the lag (e.g.Todling and Cohn (1996)).Kalman smoothers (KSs) are made possible by the construction of cross-time error covariance matrices that link the observations at future times with the current analysis, often up to some maximum lag time.A smoother analysis will therefore use more observational data than a filter analysis and should therefore provide a more accurate state estimate.This would seem particularly relevant for reanalysis applications when full time series of past and future observations are available for constructing system states.Various smoothers have been proposed for use in the geosciences (e.g.Evensen and van Leeuwen (2000); Ravela and McLaughlin (2007); Bocquet and Sakov (2014)).These smoothers have been proposed for both reanalyses, e.g.Zhu et al. (2003), and parameter estimation, e.g.Evensen (2009).Just like the EnKF, the ensemble Kalman smoother (EnKS) uses an ensemble of model realisations to estimate the error distribution of the model forecasts, which can be very efficient.
The KS has been shown to be effective in various applications.For example, Zhu et al. (2003) designed a meteorological reanalysis system using a fixed-lag KS, and Khare et al. (2008) with longer lags; Cosme et al. (2010) developed an EnKS for ocean data assimilation; and Pinnington et al. (2020) used KS techniques for land surface analysis.These applications though rely on the cross-time covariance matrix (either explicitly or implicitly) for the smoothing problem.
For large operational forecasting and reanalysis systems, especially for high resolution global ocean, climate or Earth system models, which contain substantially long timescale processes of up to weeks or months, running a smoother with a reasonably long lag could be very expensive in computation and thus impractical.Even for the relatively cost effective EnKS, the ensemble anomaly matrix for each time-step could consist of billions of elements, which takes large chunks of computer memory space.
In addition it would not be easy to retrofit smoothing code into an operational data assimilation system that has been developed over thousands of person years primarily for initialising forecasts.For reanalysis products developed in this way a simpler post-processing approach to smoothing could be very valuable.Dong et al. (2021) recently proposed a new smoother designed to be used offline through post processing of a filter analysis.
It was based on the simplifying physical assumption of decaying error covariances across time, resulting a formulation similar to an autoregressive model.This smoother uses only the filtering increments, without needing to seek other information.The method was shown capable of improving the Met Office GloSea5 ocean reanalysis (MacLachlan et al., 2015), reducing RMSE against both assimilated and independent data, and producing more realistically smooth temporal variability for important quantities such as ocean heat content.
In this study we further explore the characteristics of the Dong et al. (2021) smoother as an approximation to the Kalman smoother framework.We demonstrate that with proper assumptions, this method can be reproduced by an extended Kalman smoother and an Ensemble Kalman smoother, in the latter case retaining the benefit of the ensemble's flow dependent covariances.We also extend the theory to show how the uncertainty estimates of the smoothed analyses can be obtained from post processed filter information.The full and approximate smoother approaches are implemented in the Lorenz 1963 model and the results compared.We show that the Dong et al. (2021) post processing method produces intermediate error results between the filter and the full Kalman smoother without costing significant computer time or adapting the filter codes.
Section 2 presents the method, starting with the simple smoother representation of Dong et al. (2021), and where the theory is extended to the smoothed uncertainties.Section 3 presents the implementation of the extended filter/smoother in the Lorenz 1963 system.Section 4 adapts the methods presented earlier for application for the EnKF/S and presents results for these assimilation approaches, also in the Lorenz 1963 system.Section 5 is a discussion of the applicability of these approximations in larger models were the simplifications should allow for post process smoothing of operational reanalysis products, and section 6 presents conclusions and recommendations for stored variables that would allow post processed smoothing in larger systems.The appendix reviews the conventional KF and fixed-lag KS equations, and shows formally the approximations that are applied, which lead to our simplified smoothing algorithm.

The simple smoother method
In Dong et al. (2021) a simple smoother method (hereafter referred to as DHM) was presented for application in operational ocean reanalysis products, where the original analysis had been performed with a purely time sequential approach, as used in forecasting situations when future data are never available.This simple approach was designed to use the archive of increments to create a post hoc smoothing of the original reanalysis.Dong et al. (2021) showed the positive impact of this smoothing on both a full ocean reanalysis and also on the low-dimensional Lorenz 1963 system.The algorithm applied is as follows.Let A t be the forward sequential (filtering) analysis at time t, and I t be the analysis increment field used to produce A t .The smoother https://doi.org/10.5194/egusphere-2023-337Preprint.Discussion started: 15 March 2023 c Author(s) 2023.CC BY 4.0 License.solution at time t is denoted S t .The smoother algorithm is then written as follows, firstly for S 0 : where 0 < γ a < 1 is the increment decay rate per analysis time window, so that analysis increments from future analysis times decay in their influence on S 0 .Similarly for S 1 : By rearrangement we get with where SI t = S t − A t defines the 'smoother increment'.These recursive relationships allow the smoother to be applied as a post processing algorithm, which is run backwards in time starting with the final sequentially analysed time window, using the stored archive of analysis increments.It will be convenient later to define the increment decay per model timestep which we will write as just γ where γ N = γ a and N is the number of model timesteps between filter analyses.Later we will assume each analysis window consists of one timestep, therefore N = 1 and γ = γ a .The decay timescale τ associated with the smoothing is then given, in model timesteps δt, by which is effectively a measure of the smoother lag which is not given an explicit maximum cutoff.
Below we discuss how this simple smoother is related to the conventional KS approach (a more formal proof of the equivalence is given in the appendix).

Extended Kalman filter and extended Kalman smoother
We start from the classical extended Kalman filter (ExtKF) and fixed-lag extended Kalman smoother (ExtKS) formulation in which a tangent linear model is used when the model is nonlinear.We will use superscripts f, a, s to describe filter forecasts, filter analyses, and smoother analyses respectively.The analysis of the Kalman filter at time k is given by where the subscript represents the time step, x ∈ R n is the n-dimensional state vector, y ∈ R m is the observations, H k is the observation operator.In the ExtKF, the observation operator and the model can be nonlinear, where the state vector evolves with a model x k = M(x k−1 ).The nonlinear operators have to be replaced by their tangent linear approximations in the forecast and analysis steps, leading to the linear transformations M k ∈ R n×n and H k ∈ R m×n for the tangent linear model and tangent linear observations matrix respectively.K a k ∈ R n×m is the Kalman gain for the analysis, which is given by with P f k ∈ R n×n being the forecast error covariance matrix, R k ∈ R m×m being the observation error covariance, all at the current timestep k.The analysis error covariance can be derived from the forecast error covariance as follows: which we will return to later.
For the fixed-lag ExtKS, Todling and Cohn (1996) (hereafter TC96) derive backward looking equations for the smoother, however here we will present forward looking equations aimed at expressing the fully smoothed state, including contributions from multiple future filter steps, as presented for the simple smoother in Eqs. ( 1)-( 5).The equivalence between TC96's and our notation is demonstrated in the appendix.The contributions from observations at time step k + ℓ to the smoother solution at timestep k can be written in the same Kalman gain notation as We note that index ℓ could be defined as future analysis timesteps rather than model timesteps if data are only introduced at regular analysis intervals.The full smoother solution for timestep k looking forward, is then obtained by the summation of smoother increments for all future timesteps (here assumed truncated to maximum lag L) as (see the appendix for the derivation).The cross time smoother gain matrix is simply a modified version of the standard ExtKF gain, and can be written as There is a subtlety here because in the full TC96 smoother the cross time error covariance P k,k+ℓ is not independent of P k,k+ℓ−1 .However, this will not be relevant in the simple smoother approximation as applied below.
At this point we introduce the key simple smoother approximation when we re-write the cross time error covariance as a simple decay rate and consequently also neglect any inter-dependence of smoother contributions from different times: which is equivalent to assuming This Eq. ( 13), when substituted into Eq.( 10), clearly expresses the approximation being made to recover the simple smoother solution from the ExtKS equations: which when using Eq. ( 6) then gives where I k+ℓ = x a k+ℓ − x f k+ℓ , reproducing the simple smoother under the additional assumption that L >> τ in Eq. ( 5).The smoother is now defined entirely in terms of the sequential analysis increments which allow post processing from an archive of increments from the sequential filter run.Another way to interpret this approximation is to say that the spatial and temporal error covariances in the KS are assumed separable, with the spatial covariances being determined by the KF equations, but the temporal covariances (from times k + ℓ to k) being approximated by a simple decay.We will return to this description later when we seek to extend the approximations to the ensemble KS case.
It is also possible to make the equivalent approximations to the smoothed uncertainties.For each smoother increment introduced, Eq. ( 9), there will be a corresponding reduction in the smoother error covariance given by so that the fully smoothed error covariance can be written as (see the appendix).Then, making the simple smoother approximation, Eqs. ( 12) and ( 13), here gives Now returning to use Eq. ( 8) we finally obtain where IP k+ℓ = P f k+ℓ − P a k+ℓ are the filter error covariance increments mirroring Eq. ( 15), the simple smoother equations for the increments.The smoothing equations ( 15) and ( 19) could clearly both be written in recursive format like Eq. ( 3) for ease of post processing.In the following sections we investigate how well these approximations work through comparisons in the where the model parameters are chosen as σ = 10; ρ = 28; β = 8 3 .All experiments are run for 20 time units, with each time unit consisting 100 timesteps.We performed a 'truth' run first with x, y and z values of 5 as the initial condition.Observations are assigned for x, y with frequency of every 5 and 20 timesteps respectively and error standard deviation of 2. No observations are taken for z.The simple smoother (DHM), Eqs.(1)-( 5), was executed with γ = 0.9 when the smoothing results have smallest error as compared to other γ values (experiments not shown).The reasons for this appear later.We also ran a modified Kalman smoother (MKS) using the approximated cross time Kalman gain as in Eq. ( 13).This is implemented by directly substituting the approximation into the full KS equations described in the appendix, in order to demonstrate DHM equivalent results.The uncertainty estimation for the MKS smoother is also obtained in the same way.

ExtKS Assimilation Results
Across the 100 member ensemble, we calculated the root mean square error (RMSE) timeseries against the truth for each smoothing method.Figure 1(a,b,c) show a portion of the (x, y, z) RMSE timeseries respectively, for the filter and the different smoothing methods in the thicker lines.Without any smoothing, for most timesteps the KF errors are larger than the smoother errors.The full ExtKS has smallest errors, however the DHM and MKS are almost identical and lie in between those for the KF and KS.Also on Figure 1 are dashed blue and green lines representing the ensemble mean smoothed standard deviation (STD) uncertainty estimates for the ExtKS (Eq.( 17)) and MKS (Eq.( 19)) respectively.It should be emphasised that these uncertainty estimates are calculated entirely independently of the actual truth values themselves which would not be known in a real assimilation problem.The level of agreement between these uncertainty estimates and the true RMSE is remarkable.Time mean RMSE for x, y, z are summarised in Table 1, along with the uncertainty STD, where calculated, over the entire 20 time units of the runs.Both DHM and MKS provide an improvement on the ExtKF results by 60-70% relative to the 200 ExtKS improvements for x and y, although the RMSE for z is not reduced in DHM and MKS.This is perhaps because the instantaneous error covariances between z and the assimilated x, y variables, as used in DHM, are insufficient to improve z, whereas the full ExtKS allows some history of x, y evolution to be used in deriving z smoothing increments.The RMSE numbers in parentheses are evaluated only at filter update timesteps where observation data are assimilated.These errors are smaller than the all time RMSE by ∼ 5%, as a result of the data assimilation at these timesteps.This is consistent with 205 the RMSE time series in Fig. 1 where the red line declines sharply where data are available.For the smoother solutions, however, the RMSEs are not only reduced at these times but as smoothers, by design, also yield improved analyses in between observation timesteps.Figure 2(a,b,c) show the actual x, y, z increments respectively, being introduced by the ExtKF and the different smoothers through the analysis times.In each case the smoother increments are additional to the filter increments, which appear clearly as red spikes every 5 timesteps where data are available.Smoother increments in between analysis times can be seen, with DHM and MKS increments being virtually identical, and decaying backwards in time from each filter increment.The ExtKS increments are more complex, sometimes being similar to the DHM, but sometimes they can be considerably larger.
If we look at the mean ratio of cross time error covariances relative to the filter forecast error covariances, in comparison to the simplified γ decay representation across time in Fig. 3, we can understand something of the performance of the smoothers.We do not expect these to be identical because the full cross time smoother covariances for larger lags, ℓ, take account of increments from intermediate times.For small lags values the average cross time error decay rates are fairly similar, however for larger lag values the model derived cross time error covariance can take the opposite sign.This happens on a similar timescale to the short oscillation period of x, y in L63 and is associated with the growing amplitude of these oscillations, reaching larger and smaller x, y values before phase lobe transitions.This is a very model specific behaviour and the simple smoothers γ decay error covariances cannot represent this.This also explains why larger γ values make the simple smoother worse in L63, because The key point is that the simplified smoother DHM provides substantial improvement over the ExtKF while incurring very little computational cost (no tangent linear model (TLM) runs and no storage of cross time error covariances) compared to the ExtKS.The DHM smoother can therefore be applied entirely through post processing of the filter results.While this was demonstrated in Dong et al. (2021), here we show more clearly how the equivalent MKS approximation is derived from the ExtKS equations and we also show how the smoothed uncertainties can be cheaply post processed and still give useful information.
In the next section we extend the decay assumption for cross time error covariances to apply to the ensemble Kalman filter/smoother equations which are much more relevant to large nonlinear models where direct modelling of error covariances across time is in any case infeasible.
4 Ensemble Kalman Smoother experiments in the Lorenz 1963 system

Approximating ensemble error covariances
In the ExtKF/S, a TLM propagates the flow-dependent error statistics which are then used to calculate increments.However, the TLM is not always reliable for a system as non-linear as the L63 model.Time mean RMSE against truth for the ExtKF and ExtKS, along with the modified MKS and the simplified DHM smoother for each variable in the L63 system (see also legend for Fig. 1).Numbers in parentheses are mean RMSE at observation timesteps only (not independent data).
The time mean of the standard deviations calculated for the uncertainties are also shown as STD.Time averaging is now over the entire 20 time units of each assimilation run.
results by estimating the error statistics with a finite ensemble of state realisations propagated by the full nonlinear model rather than by a TLM.This improves the quality of the forecast error covariance matrix.However, the update gains for the EnKF and ensemble Kalman smoother (EnKS) are defined identically to Eqs. ( 7) and ( 9) respectively, although the error covariances, P f k (the error covariance at timestep k) and P k,k+ℓ (the error covariance between timesteps k and k + ℓ) are calculated differently, being emulated from the limited ensemble of state vectors whose variability represents the uncertainty of the system.While ensemble filter methods are starting to be adopted for larger environmental models, the cost to store, update and apply posterior ensemble covariances still makes ensemble smoother methods generally infeasible.
However, these constraints can again be overcome by retaining the EnKF flow dependent ensemble spreads to represent current errors while making a simple decay approximation for the time shift error covariances, similar to our modified ExtKS in Sect.2.2.For comparison purposes, we demonstrate this method by starting with the forecast error covariance estimate in general form of EnKS in state space, keeping the same notation as in the MKS, although the actual computation is performed in the ensemble space as in the ensemble-transform Kalman filter (Bishop et al., 2001, ETKF). where is the normalised anomaly around the mean in an N e member ensemble of forecasts.
The first step in the ensemble filter is to update the ensemble mean: and the second step is to update the uncertainty using Eq. ( 8).Then in order to regenerate the ensemble of analysis perturbations the ETKF uses the transformation where T is chosen to ensure Eq. ( 8) is satisfied.
The EnKS can be viewed with EnKF as the basis, but with additional needs for the cross time error covariances for smoothing updates.Similar to the EnKF, the cross time covariances in the EnKS can be expressed in state space as As explained in the appendix, in the full smoother these cross time error covariances are calculated between the filter forecast (X f k+ℓ in the EnKS) and previous partially smoothed states (X k again in the EnKS), which require both past ensemble means and error covariances to be repeatedly smoothed.However, this is not necessary for the modified (simple) ensemble smoother, which we will here call MEnKS.The ensemble mean smoothing using Eqs.( 13) and ( 10) can be written: and then Eq. ( 18) can be used to obtain the smoothed uncertainties.This is a great simplification because to perform full ensemble smoothing would require the whole past ensemble to be stored at all times and re-processed.Equation ( 19) suggests that the error covariance increments must also be stored during the EnKF filter phase which would still be a large storage requirement for a big model but in fact only the diagonal elements of P s are likely to be of interest, i.e. the uncertainty variance of the state fields or even just a subset of these, so only a smaller set of uncertainty increments may need to be stored for post processing through Eq. ( 19).In the next subsection we show results from applying these approximations in L63.

EnKS Assimilation Results
Using the same L63 twin experiment as in Section 3 we solve both a full smoother (EnKS), and using the modified (MEnKS) algorithm Eq. ( 28).To be consistent with the extended KS configuration, we use an ensemble size of 100 and a fixed lag of 40 timesteps for smoothing.Figures 4(a,b,c) show the x, y, z RMSE and STD uncertainties, respectively, from these Ensemble Filter and Smoother runs in the same format as Figures 1 for the Extended Kalman filter.
These ensemble results are seen to produce lower RMSE than the equivalent ExtKF/KS results, c.f. Fig. 1, demonstrating the superiority of the ensemble method in dealing with the nonlinearity of the L63 model.Again the EnKS substantially reduces the RMSE compared to the filter and again the approximated simple ensemble filter MEnKS gives intermediate RMSE results with much less computational effort than the full EnKS.Although not as optimal as the EnKS, the simplified MEnKS shows much smoother temporal evolution of the RMSE, which will be a significant improvement if eg. applied to an ocean reanalysis.
The post processed uncertainty estimates also reproduce a reasonable estimate of the true RMSEs of the smoothed ensemble mean analyses.Figures 5(a,b,c) show the mean x, y, z increment timeseries, respectively, for the ensemble filter and the 2 smoothers.The increments are smaller than those from the ExtKF/KS.Fig. 2, reflecting the improved assimilation approaches.numbers in parentheses are RMSE calculated at timesteps with observations only (no independent data).Lag=40, γ=0.9

Discussion
The aim of this paper is clearly not to present an improved data assimilation approach for simple models but to explore traceable simplifications to current assimilation approaches which could be applied to high-dimensional models.In particular ocean and 290 earth system models are starting to be used for reanalysis of past climate states using essentially the same codes that have been developed for operational forecasting, especially of the atmosphere i.e. sequential "filter" codes.Even when 4D Variational  approaches are being used, eg. at ECMWF, the effective temporal smoothing window timescales are generally short reflecting atmospheric timescales.In these cases Kalman smoothing approaches could still yield tangible benefits especially for long timescale process variables associated with the earth system, and when reanalysing using sparse historical observing systems.
However there are still further challenges to applying smoothing in real large systems.In Dong et al. (2021), the simple smoother was applied to an ocean reanalysis and it was found that the smoothed analysis gave reduced errors compared to the filter, against independent, unassimilated, data.However it was also noted that problems can occur when observations or model are biased.Biased increments can be detected when the same increment gets repeatedly assimilated by the filter, which is a signal that the model is unable to retain the information.While this may not invalidate the filter analysis it could have a very detrimental impact on smoothing when multiple versions of the same increment may be added without the model being re-run.While bias can be allowed for if it is identified prior to smoothing, any real application of smoothing needs to consider this carefully.This is perhaps another reason to prefer smoothing as a post processing step when bias assessments can be made beforehand, rather than as an integrated part of a sequential forward analysis as it is usually presented in the literature (Todling and Cohn, 1996;Evensen and van Leeuwen, 2000;Bocquet and Sakov, 2014).Another option not explored here because L63 is too simple, is the ability to tune the γ decay timescale for different state variables.In Dong et al. (2021), this was suggested as allowing subsurface ocean increments to decay more slowly than surface increments for example.In the notation used here γ would then become a diagonal decay matrix multiplying the forecast error covariance to convert to a cross time error covariance.
A key benefit to smoothing in real systems would be to bring influence from observations made in the near future where none have been available in the near past, eg after deployment of new observing platforms.A key difference between our approach and using a full smoother is that in a full smoother the cross time error covariances depend upon observations previously assimilated within the smoothing lag time window (see Appendix).Thus a full smoother will reduce analysed error covariances due to the influence of short lag future data first, and in doing so will reduce the cross time error covariances to be applied for longer lag future data, ensuring that the most important near future data has the biggest smoothing influence.This shielding of longer lag influences if shorter lag data are available is missing in the simple smoother as presented and could cause the application of the smoother to give poor results when very frequent observations are available.Further simple modifications to take this into account might be envisioned, for example, allowing γ to reflect upgrades in the observing network during the period of the reanalysis.Alternatively, uncertainty reduction information for each future increment as estimated through Eq. 19 could be used to truncate, or reduce γ, for the smoothing of longer lag increments through Eq. 15.
Although we have proposed how these ideas could be used in ensemble systems we have not explored other challenges of using ensembles in large model products.In particular localisation is often required to remove unrealistic error covariances arising from limited ensemble sizes eg.Petrie and Dance (2010), and when extended to ensemble smoothing that localisation may need to vary with lag for the cross time error covariances eg.Desroziers et al. (2016).Faced with such challenges the simple smoothing method is at least explicit in its assumption that the spatial structure of the error covariances are static while guaranteeing that cross time error covariances will always decay away.
We have included the smoothing of uncertainty estimates in the analysis here despite the fact that these have rarely been attempted for previous large model reanalysis products even when only forward filter steps are involved.However, with the recent trend towards ensemble analysis products for both operational work and for reanalysis systems, it seems sensible to ask how well uncertainty estimates do correspond to the errors in an idealised system where this can be evaluated eg.against independent data.At the same time we have therefore demonstrated the ability to also evaluate smoother uncertainty estimates, and we have found these first results very encouraging.

Conclusions
We have demonstrated that both the extended Kalman smoother and the ensemble Kalman smoother can be simplified to use only a relatively small amount of information stored during a forward filter analysis.This permits the simple smoothing approach to be applied through post processing.The essential novelty is to treat cross time error covariance information as decaying exponentially on some tuneable timescale rather than seeking to model it directly.This allows stored state increments to be down-weighted and added to previous filter analyses.We also show how the smoother uncertainty information can be https://doi.org/10.5194/egusphere-2023-337Preprint.Discussion started: 15 March 2023 c Author(s) 2023.CC BY 4.0 License.post processed provided the increments (changes) in the error covariances between the forecast and analysis for each filter assimilation window are also stored.And we note that the error variance of state fields alone could be smoothed meaning that only 1 additional state field needs to be stored from each filter analysis.
The method has been demonstrated using assimilation runs of the Lorentz 1963 model, using the same idealised assimilated data over a 20 time unit truth run when starting the model from different initial conditions.Observational, but no model errors, are being simulated.In both the extended and ensemble smoother cases the full smoother approaches give the best RMSE results against the truth.However in each case the simple smoother still gives substantially improved RMSE results over the filter, typically giving around 50-60% of the improvements obtained from the full smoothing.We also include the RMSE evaluated only at filter analysis times, when the truth comparison data is not independent, and still find that the smoother results provide substantial improvements over the filter.The ensemble filter/smoother results are substantially better than the extended filter/smoother results as would be expected for such a nonlinear system as L63.The simple smoother retains this benefit as the flow dependent filter error covariances are used in filter analyses and it is these that are down-weighted again for the smoother's cross-time errors covariances.
We also demonstrate the smoothing of the uncertainty estimates in both systems.Remarkably the uncertainty estimates, presented as the STD of the smoother state variances, are in very good agreement with the RMSE errors actually being calculated against the truth.The uncertainties rise and fall over time similarly to the RMSEs as the model moves through more stable and unstable regions of phase space.Uncertainty estimates are usually a little lower than the calculated RMSE values.
The simple smoothing approach gives higher uncertainties than the full smoother estimates but is in excellent agreement with the simple smoother RMSE values.
We believe this approach offers a feasible offline post processing approach for improving reanalyses in real large Earth system models.An initial paper with first results on smoothing the Met Office ocean reanalysis using stored increments was presented in Dong et al. (2021).This paper demonstrates the traceable origin of the approach from Kalman filtering roots and puts the methods in a wider context, including showing how it can be used in ensemble systems that are just starting to be used operationally in order to get better representations of uncertainty.
To summarise the post processing requirements that would allow smoothing of large model datasets; 1.If increments from the sequential filter analysis are stored this should be sufficient to allow post processing of a smoother solution.
2. If an ensemble product is being generated only the ensemble mean fields and ensemble mean increments would be needed to obtain a smoothed ensemble mean solution.
3. If an uncertainty estimate is also needed for the smoother solution the minimum additional requirement would be to store the increments (change from filter forecast to analysis) of those components of the error covariance matrix of interest.This may consist of the error variances of all state fields or only a subset of state fields, eg.only surface fields from an ocean model.Appendix A: Formal derivation of the simple smoother system from the Kalman filter and Kalman smoother equations In order to show formally how our simple smoother system in Sect.2.1 is related to the classical Kalman smoother, we start with a brief summary of the Kalman filter (KF) and fixed-lag Kalman smoother (KS) formulations.We base our system of equations on Todling and Cohn (1996) (hereafter TC96), which we use as our reference for the classical smoother and we therefore adopt a notation similar to theirs.This is a more complex notation from that of the main part of this paper, but is necessary to complete our proof.

A1 Background to the Kalman filter and smoother
The analysis of the KF at time k, and its error covariance are given by where Here the subscript k|k − 1 indicates that the object is valid at timestep k, and has been formed from information up to and including timestep k − 1. States x f k|k−1 and x a k|k are the forecast state and filter analysis respectively at validity time k, where x f k|k−1 has been evolved by the model M, eg.x f k|k−1 = M(x a k−1|k−1 ).The forecast state error covariance P f k|k−1 may be evolved by the model (as in the extended KF) or obtained from an ensemble of model state forecasts (ensemble KF), but either way the analysis error covariance Eq. (A2) for P a k|k is obtained.The vector y k represents the observations at k, whose model counterparts are found using the observation operator H k via H k (x f k|k−1 ), and H k is the tangent linear operator of H k .R k is the observation error covariance matrix, and K k|k is the Kalman gain.Equations (A1), (A2), and (A3) are the same equations as ( 6), (8), and ( 7) respectively in the main paper, but using the TC96 notation.
For the classical fixed-lag KS of maximum lag L, an interval of L + 1 timesteps are updated together after every filter timestep.These L+1 states are valid for timesteps k, k −1, . . ., k −L and are to be updated by observations at timestep k.This is the backward-looking scheme of TC96 which runs interleaved with the filter (below we use j to represent backward-looking intervals).Prior to this update -and using a similar notation to TC96 -these states are x f k|k−1 , x a k−1|k−1 , . . ., x a k−L|k−1 .These are shown as the black states in Fig. A1(a).At this point, observations only up to k −1 have been assimilated, which is reflected in the notation, and so when 2 ≤ j ≤ L, superscripts 'a' refer to partially smoothed analyses generated only using observations up to time k − 1.The state x a k−1|k−1 is the pure k − 1 filter analysis and x f k|k−1 is the filter forecast for k derived from it.The where The new objects are: x a k−j|k is the updated smoothed state at k − j due to observations at k and P a k−j|k is the corresponding updated covariance.Both objects are obtained using K k−j|k , which is the gain for the smoother state at k−j due to observations at k.
To make these updates requires a new kind of covariance for errors between different times.These have a subscript of the form k, k ′ |k − 1, which indicates that the covariance is between timesteps k and k ′ , and has been formed from information up to and including timestep k − 1.In the above P af k−j,k|k−1 = P fa k,k−j|k−1 T are the covariances between errors in x a k−j|k−1 and x f k|k−1 .Each P af k−j,k|k−1 is obtained from P aa k−j,k−1|k−1 in Fig. A1(b) as separate covariance propagations (or from an ensemble of forecasts in the EnKS) for each k − j.If the EnKS is being used, these error covariances are not derived directly, but require all the partially smoothed ensemble members within the lag L to be retained.These are the same covariances as those expressed in Eq. ( 27) in the trucated main text notation.Equation (A4) is a version of (26) given in TC96, (A5) is their (39), and (A6) is their (35).Notice that the same error covariance matrix, P f k|k−1 , appears in the brackets in Eqs.(A3) and (A6).The incremental part of Eq. (A5) is the same as (16) in the main paper, Eq. (A6) is the same as (11), but here using the TC96 notation.There is no equivalent of (A4) given in the main paper.
The KS system is advanced one timestep by propagating x a k|k using the model to x f k+1|k (the blue state in Fig. A1 A2 Explicit equations for maximally smoothed states and covariances, and equivalence to the simple smoother Given the maximum lag, L, the sequence of states x a 0|L , x a 1|L+1 , x a 2|L+2 , . . .(and their error covariances P a 0|L , P a 1|L+1 , P a 2|L+2 , . ..) exploit maximum amount of observational information as they have been updated with all L + 1 sets of future and present observations.These are the states that are analogous to S 0 , S 1 , S 2 . . . in Sect.2.1.Other states in this appendix, such as x a 1|L , are only partially smoothed, but are still needed as part of the classical algorithm.For a general k, the fully smoothed states are x a k|k+L , which can be found from cyclic application of the KS equations.By recursively applying Eq. (A4) over the lag window, it is straightforward to find the following explicit full smoothing solution (now with superscript s as in main text although TC96 and Table A1 retain a) in a forward-looking perspective (with ℓ to represent forward-looking intervals):
Lorenz 1963 system.https://doi.org/10.5194/egusphere-2023-337Preprint.Discussion started: 15 March 2023 c Author(s) 2023.CC BY 4.0 License. 3 Extended Kalman Smoother experiments in the Lorenz 1963 system 3.1 Assimilation setup A twin experiment using the Lorenz 1963 model(Lorenz, 1963), hereafter L63, was carried out to evaluate the smoother.The L63 uses the classical setup with model equations of: Dong et al. (2021) used 3DVar for assimilation into L63 and they used a fixed background error covariance from a climatological L63 run.Here we ran the extended Kalman filter (ExtKF) and extended Kalman smoother (ExtKS) with 100 different initial conditions, but to avoid filter divergence we use a hybrid forecast error covariance, retaining a 5% weighting of the L63 climatological covariances in the background error.The fixed lag smoother uses L = 40 timesteps, as we found that errors of this lag are smaller than other lags in our L63 experiments.

Figure 1 .
Figure 1.The RMSE timeseries for the ExtKF and ExtKS along with the modified MKS and the simplified DHM smoother in the L63 system for (a) x, (b) y and (c) z.The RMSE are averaged across 100 independent assimilation runs starting from different initial conditions, assimilating observations from the same 20 time unit truth run.Only 2 time units of the run are shown to allow the behaviour around analysis times to be clearly seen.Dotted green lines and blue lines are the posterior uncertainty STD estimates for the MKS and ExtKS respectively.

Figure 2 .
Figure 2. As in Fig. 1 but the filter and smoother increments are shown, where the smoother increments where applied are additive to the filter increments.

Figure 3 .
Figure 3. Cross time error covariance decay rates for the ExtKS and the simple smoother.For the ExtKS the y axis is the smoother cross time error covariance divided by the filter forecast error covariance, averaged for all times over the 20 time unit L63 run.This is compared to the decay of γ ℓ used in the simple smoother.
https://doi.org/10.5194/egusphere-2023-337Preprint.Discussion started: 15 March 2023 c Author(s) 2023.CC BY 4.0 License.4. If uncertainty information from an ensemble product is required the minimum additional storage requirement would still be the filter increments in the error covariance components of interest.The whole past ensemble analyses would not be needed.

Figure A1 .
Figure A1.Schema of the fixed lag KS of TC96.Panel (a) shows the update and evolution of the set of states within the fixed lag interval of L + 1 timesteps.The smoother update starts with the set of states in black, which are updated (or smoothed) by observations at k to the set of states in red using Eq.(A1) for the most recent state (at the top), and Eq.(A4) for the remainder.The subscripts have the form k|p, where k is the state's validity time, and p is the timestep of the latest observations that have contributed to estimating that state.The Kalman gains (Eqs.(A3) and (A6)) rely on knowledge of the covariances in panel (b) (first column of the black matrix).The lag interval then progresses by one timestep (blue box of panel (a)), where the forecast (blue state) is evolved from the latest analysis using the model.Panel (b) shows the update and evolution of the error covariances within the fixed lag interval.The black matrix blocks are the error covariances of the set of black states in (a).The diagonal blocks have a subscript of the same form as the states, but the off-diagonal blocks have subscripts of the form k, k ′ |p, which correspond to cross covariances between timesteps k and k ′ .The smoother updates the black covariances to the red covariances using equations not fully shown in this paper (see Eqs. (37), (39), and (41) of TC96, which use information from the black matrices).The lag interval then progresses, as it does for the states, by one timestep (blue box), where the extra covariances (blue) are propagated from the red covariances using Eqs.(46) and (47) of TC96 and the tangent linear model.
(a)), giving a shifted interval of states (blue box in (a)).The covariances are propagated by the tangent linear model (the blue block matrices in Fig. A1(b)), or by propagating the ensemble of new analyses, giving a shifted interval of covariances (blue box in (b)).

Table 2 .
Time mean RMSE and STD uncertainties for each variable in EnKF, EnKS and MEnKS in the L63 model, averaged over time unit 1-20;