Investigating the role of prior and observation error correlations in improving a model forecast of forest carbon balance using Four Dimensional Variational data assimilation

Efforts to implement variational data assimilation routines with functional ecology models and land surface models have been limited, with sequential and Markov chain Monte Carlo data assimilation methods being prevalent. When data assimilation has been used with models of carbon balance, prior or “background” errors (in the initial state and parameter values) and observation errors have largely been treated as independent and uncorrelated. Correlations between background errors have long been known to be a key aspect of data assimilation in numerical weather prediction. More recently, it has been shown that accounting for correlated observation errors in the assimilation algorithm can considerably improve data assimilation results and forecasts. In this paper we implement a Four-Dimensional Variational data assimilation (4D-Var) scheme with a simple model of forest carbon balance, for joint parameter and state estimation and assimilate daily observations of Net Ecosystem CO 2 Exchange (NEE) taken at the Alice Holt forest CO 2 ﬂux site in Hampshire, UK. We then investigate the effect of specifying correlations between parameter and state variables in background error statistics and the effect of specifying correlations in time between observation errors. The idea of including these correlations in time is new and has not been previously explored in carbon balance model data assimilation. In data assimilation, background and observation error statistics are often described by the background error covariance matrix and the observation error covariance matrix. We outline novel methods for creating correlated versions of these matrices, using a set of previously postulated dynamical constraints to include correlations in the background error statistics and a Gaussian correlation function to include time correlations in the observation error statistics. The methods


Introduction
The land surface and oceans are responsible for removing around half of all human emitted carbon-dioxide from the atmosphere and therefore mediate the effect of anthropogenic induced climate change.Terrestrial ecosystem carbon uptake is the least understood process in the global carbon cycle (Ciais et al., 2014).It is therefore vital that we improve understanding of the carbon uptake of terrestrial ecosystems and their response to climate change in order to better constrain predictions of future carbon budgets.Observations of the Net Ecosystem Exchange (NEE) of CO 2 between terrestrial ecosystems and the atmosphere are now routinely made at flux tower sites world-wide, at sub-hourly resolution and covering multiple years (Baldocchi, 2008), providing a valuable resource for carbon balance model validation and data assimilation.
Data assimilation is the process of combining a mathematical model with observations in order to improve the estimate of the state of a system.Data assimilation has successfully been used in many applications to significantly improve model state and forecasts.Perhaps the most important application has been in numerical weather prediction where data assimilation has contributed to the forecast accuracy being increased at longer lead times, with the four day forecast in 2014 having the same level of accuracy as the one day forecast in 1979 (Bauer et al., 2015).This increase in forecast skill is obviously not solely due to data assimilation but also increased quality and resolution of observations along with improvements in model structure, however the introduction and evolution of data assimilation has played a large part (Dee et al., 2011).The current method implemented at many leading operational numerical weather prediction centres is known as Four-Dimensional Variational data assimilation (4D-Var) (Bonavita et al., 2015;Clayton et al., 2013), which has been shown to be a significant improvement over its predecessor three-dimensional variational data assimilation (Lorenc and Rawlins, 2005).Variational assimilation techniques minimise a cost function to find the optimal state of a system given all available knowledge of errors in the model and observations.The minimisation routine typically requires the derivative of the model which can sometimes prove difficult to calculate.Using techniques such as automatic-differentiation (Renaud, 1997) can reduce the time taken to implement the derivative of a model.
In numerical weather prediction data assimilation has been predominately used for state estimation whilst keeping parameters fixed.This is because numerical weather prediction is mainly dependent on the initial state with model physics being well understood.Ecosystem carbon cycle models are more dependent on finding the correct set of parameters to describe the ecosystem of interest (Luo et al., 2015).This is possibly why Monte Carlo Markov chain (MCMC) data assimilation methods have been used more with ecosystem carbon cycle models.Smaller ecosystem models are much less computationally expensive to run than large numerical weather prediction models, meaning that MCMC methods (requiring many more model runs than variational assimilation methods) are more easily implemented.For larger scale and more complex ecosystem models variational methods represent a much more computationally efficient option for data assimilation.Variational data assimilation can be used for joint parameter and state estimation by augmenting the state vector with the parameters (Navon, 1998).By including the parameters in the state vector we must also specify error statistics and error correlations for them.Smith et al. (2009) show that the prescription of these error statistics and their correlations can have a significant impact on parameter-state estimates obtained from the assimilation.
Many different observations relevant to the carbon balance of forests have now been combined with functional ecology models, using data assimilation, in order to improve our knowledge of ecological systems (Zobitz et al., 2011;Fox et al., 2009;Richardson et al., 2010;Quaife et al., 2008;Zobitz et al., 2014;Niu et al., 2014).Two such models that have been used extensively with data assimilation are the Data Assimilation Linked Ecosystem Carbon (DALEC) model (Williams et al., 2005) and the Simplified Photosynthesis and Evapo-Transpiration (SIPNET) model (Braswell et al., 2005).Nearly all data assimilation routines built with these models have used sequential and Monte Carlo Markov chain (MCMC) data assimilation methods with the exception of a variational routine being implemented for DALEC by Delahaies et al. (2013).There have been examples of global land surface models being implemented with variational methods such as the ORganizing Carbon and Hydrology In Dynamic EcosystEms model (ORCHIDEE) (Krinner et al., 2005) and the Biosphere Energy Transfer HYdrology scheme (BETHY) in a Carbon Cycle Data Assimilation System (CCDAS) (Kaminski et al., 2013).These examples have mainly been used to assimilate data from satellite and atmospheric CO 2 observations with only a few cases where site level data has also been assimilated (Verbeeck et al., 2011;Bacour et al., 2015).
Forest carbon balance model parameters are often determined in advance of using the model for forecasting by calibration of the model against observations (Richardson et al., 2010;Bloom and Williams, 2015).Here we take the alternative approach of concurrent state-parameter estimation.A key difference between the joint state-parameter estimation approach and a priori calibration is the way that the observational data is used.Pre-calibration approaches train the model against historical data and so become infeasible when there is a lack of sufficient observational information prior to the model forecast period.Joint stateparameter estimation methods have the advantage that observations could be used as they arrive in real time, by sequential assimilation cycling.This approach also gives the possibility of adapting to changes in the forest (e.g., tree thinning, fires etc.) that may change the parameter values over time.
Background errors (describing our knowledge of error in prior model estimates before data assimilation) and observation errors have largely been treated as uncorrelated and independent in ecosystem model data assimilation schemes.In 3D and 4DVar schemes background and observation errors are represented by the error covariance matrices B and R respectively.
The off-diagonal elements of these matrices indicate the correlations between errors in the parameter and state variables for B and the correlations between observation errors for R. In the assimilation, the off-diagonal terms in the B matrix act to spread information between the state and augmented parameter variables (Kalnay, 2003).This means that assimilating observations of one state variable can act to update different state and parameter variables in the assimilation when correlations are included in B. In 4D-Var the B matrix is propagated implicitly by the forecast model, so that even a propagated diagonal B matrix can develop correlations throughout an assimilation window.These correlations will only be in the propagated B matrix, with the B matrix valid at the initial time remaining unchanged.Including correlations in B has been shown to significantly improve data assimilation results in numerical weather prediction (Bannister, 2008).
Including correlations between observation errors has only started to be explored recently in numerical weather prediction, with R still often treated as diagonal (Stewart et al., 2013).Including some correlation structure in R has been shown to improve forecast accuracy (Weston et al., 2014).Currently the correlations included in R have been mainly between observations made at the the same time rather than correlations between observations throughout time.When assimilating observations, data streams with many more observations can have a greater impact on the assimilation than those with fewer observations.In Richardson et al. (2010) this problem is discussed when assimilating large numbers of NEE observations along with smaller numbers of leaf area index and soil respiration observations.To address this problem Richardson et al. uses a cost function that calculates the product of the departures from the observations rather than a cost function which sums these departures, giving a relative rather than absolute measure of the goodness-of-fit to the observations.This problem is also encountered in Bacour et al. (2015) when assimilating daily eddy covariance data with weekly observations of the FrAction of Photosynthetically Active Radiation (FAPAR).In Bacour et al. (2015) the error in observations of FAPAR is divided by two in order to give these less frequent observations more weight in the assimilation algorithm.Specifying serial time correlations between observations represents another way of addressing this problem, whilst also adding valuable information to the data assimilation routine.Including serial correlations between observations of the same quantity decreases the impact of these observations (Järvinen et al., 1999) therefore increasing the impact of less frequent observations.
In this paper we implement the new version of DALEC (DALEC2 (Bloom and Williams, 2015)) in a 4D-Var data assimilation scheme for joint state and parameter estimation, assimilating daily NEE observations from the Alice Holt flux site in Hampshire, UK (Wilkinson et al., 2012).This assimilation scheme is then subjected to rigorous testing to ensure correctness.
A new method is outlined for including parameter and state correlations in the background "prior" error covariance matrix.
Currently parameter and state error statistics are largely treated as independent and uncorrelated when data assimilation has been used with models of carbon balance.We also introduce a novel method for including serial time correlations in the observation error covariance matrix.The idea of including time correlations between observation error statistics is new and has not been previously explored in carbon balance model data assimilation.These correlated matrices are then used in a series of experiments in order to examine the effect that including correlations in the assimilation scheme has on the results.

Alice Holt research forest
Alice Holt Forest is a research forest area managed by the UK Forestry Commission located in Hampshire, SE England.
Forest Research has been operating a CO 2 flux measurement tower in a portion of the forest, the Straits Inclosure, since 1998 so it is one of the longer forest site CO 2 flux records, globally.The Straits Inclosure is a 90ha area of managed deciduous broadleaved plantation woodland, presently approximately 80 years old, on a surface water gley soil.The majority of the canopy trees are oak (Quercus robur L.), with an understory of hazel (Corylus avellana L.) and hawthorn (Crataegus monogyna Jacq.); but there is a small area of conifers (Pinus nigra J. F. Arnold) within the tower measurement footprint area in some weather conditions.Further details of the Straits Inclosure site and the measurement procedures are given in Wilkinson et al. (2012), together with analysis of stand-scale 30 minute average net CO 2 fluxes (NEE) measured by standard eddy covariance methods from 1998-2011.The data used here span from January 1999 to December 2013, and consist of the NEE fluxes and meteorological driving data of temperatures, irradiance and atmospheric CO 2 concentration.The original NEE data were subjected to normal quality control procedures, including u * filtering to remove unreliable data when there were low turbulence night time conditions, as described in Wilkinson et al. (2012), but were not gap-filled.To compute daily NEE observations we take the sum over the 48 measurements made each day.We only select days where there is no missing data and over 90% of CO 2 flux observations have a quality control flag associated with the best observations and no observations associated with the worst from the EddyPro flux processing software (LI-COR, Inc., 2015).

The DALEC2 model
The DALEC2 model is a simple process-based model describing the carbon balance of a forest ecosystem (Bloom and Williams, 2015) and is the new version of the original DALEC (Williams et al., 2005).The model is constructed of six carbon pools (labile (C lab ), foliage (C f ), fine roots (C r ), woody stems and coarse roots (C w ), fresh leaf and fine root litter (C l ) and soil organic matter and coarse woody debris (C s )) linked via fluxes.The aggregated canopy model (ACM) (Williams et al., 1997) is used to calculate daily gross primary production (GPP) of the forest, taking meteorological driving data and the modelled leaf area index (a function of C f ) as arguments.Figure 1 shows a schematic of how the carbon pools are linked in DALEC2.The model equations for the carbon pools at day i are as follows: where T i−1 is the daily mean temperature, Ψ represents the meteorological driving data used in the GPP function and Φ on /Φ o f f are functions controlling leaf-on and leaf-off.Descriptions for each model parameter used in equations ( 1) to ( 7) are included in the appendix in table 3. DALEC2 differs from the original DALEC in that it can be parameterised for both deciduous and evergreen sites with Φ on and Φ o f f being able to reproduce the phenology of either type of site.The full details of this version of DALEC can be found in Bloom and Williams (2015).

4D-Var
Following the approach of Smith et al. (2011) for joint state and parameter estimation, we consider the discrete nonlinear dynamical system given by where z i ∈ R n is the state vector at time t i , f i−1→i is the nonlinear model operator propagating the state at time t i−1 to time t i for i = 1, 2, . . ., N and p i−1 ∈ R q is a vector of q model parameters at time t i−1 .For DALEC2 the state vector T , with the parameters shown in table 3. Given a set of fixed parameters, the value of the forecast at time t i is uniquely determined by the initial value.The model parameters are not updated by the nonlinear model operator, therefore the evolution of the parameters is given by, for i = 1, 2, . . ., N. We define the new vector x by joining the parameter vector p with the model state vector z, giving us the augmented state vector We define the augmented system model by where The available observations at time t i are represented by the vector y i ∈ R r i which are related to the augmented state vector through the equation where h i : R q+n → R r i is the observation operator mapping the augmented state vector to observation space and ε i ∈ R r i represents the observation errors.These errors are usually assumed to be unbiased, Gaussian and serially uncorrelated with known covariance matrices R i .
In the 4D-Var data assimilation detailed here we aim to find the parameter and initial state values such that the model trajectory best fits the data over some time window, given some prior information about the system.The output from 4D-Var is an updated set of parameters, and an updated model state, valid at the beginning of the time window.The updated model state may be used as initial conditions for a forecast using the full nonlinear DALEC2 model.We assume that at time t 0 we have an initial estimate to the augmented state, usually referred to as the background vector denoted x b .This background is assumed to have unbiased, Gaussian errors with known covariance matrix B. Adding the background term ensures that our problem is well posed and that we can find a locally unique solution (Tremolet, 2006).In 4D-Var we aim to find the initial state that minimises the weighted least squares distance to the background while minimising the weighted least squares distance of the model trajectory to the observations over the time window t 0 , . . .,t N (Lawless, 2013).We do this by finding the state x a 0 at time t 0 that minimises the cost function subject to the augmented states x i satisfying the nonlinear dynamical model ( 11).The state that minimises the cost function, x a 0 , is commonly called the analysis.This state is found using a minimisation routine that takes as its input arguments the cost function, the background vector (x b ) and also the gradient of the cost function given as, where . In practice ∇J(x 0 ) is calculated using the method of Lagrange multipliers as shown in Lawless (2013).We can rewrite the cost function and its gradient to avoid the sum notation as, and where, Solving the cost function in this form also allows us to build serial time correlations into the observation error covariance matrix R. The off-diagonal blocks of R represent correlations in time between assimilated observations and are usually taken to be zero.In section 2.6 we show how these off-diagonal blocks can be specified.We can also calculate the posterior or analysis error covariance matrix after assimilation as, We can use this matrix to estimate the uncertainty in our parameter and initial state variables after assimilation.

Implementation and testing of 4D-Var system
In our DALEC2 4D-Var scheme we are performing joint parameter and state estimation.Typically MCMC techniques have been used for joint parameter and state estimation with functional ecology models, such as DALEC2.However 4D-Var has been used for joint parameter and state estimation with global carbon cycle models (Kaminski et al., 2013).The variational approach is computationally efficient and robust, making it particularly suited to large problems with complex models.The augmented state vector, x 0 , corresponds to the vector of the 17 model parameters and 6 initial carbon pool values, which can be found in the appendix in table 3.Here the nonlinear model (DALEC2) only updates the initial carbon pool values when evolving the augmented state vector forward in time with the parameters being held constant.To find the background estimate, x b , to the augmented state vector we can either use a previous DALEC2 model forecast estimate of the state of the system for the site (when available) or use expert elicitation to define likely state and parameter values and ranges for the site.The background vector (x b ) and its corresponding standard deviations (see table 3) used in this paper were provided from existing runs of the the CARbon DAta-MOdel fraMework (CARDAMOM) (Exbrayat et al., 2015).The CARDAMOM output is a dataset derived from satellite observations of leaf area index which provides a reasonable first guess to DALEC2 state and parameter values for the Alice Holt research site.In this paper we assimilate observations of daily NEE.From Richardson et al. (2008) the measurement error in observations of daily NEE is between 0.2 to 0.8 gCm −2 day −1 .Richardson et al. ( 2008) also shows that flux errors are heteroscedastic.We assume a constant standard deviation of 0.5 gCm −2 day −1 in the assimilated observations of daily NEE as we found this standard deviation gave the best weighting to the observations in the assimilation algorithm, producing the best results for the forecast of NEE after assimilation.Assuming this constant standard deviation also allows for correlations in time between observation errors to be included more easily.Ignoring the heteroscedastic nature of NEE errors may influence results by giving observations of larger magnitude a higher weight than would be realistic.Future work should try to incorporate the heteroscedastic nature of NEE errors.
In order to find the tangent linear model (TLM) for DALEC2 it is necessary to find the derivative of the model at each time step with respect to the 17 model parameters and the 6 carbon pools.We use the AlgoPy automatic differentiation package (Walter and Lehmann, 2013) in Python to calculate the TLM at each time step.This package uses forward mode automatic differentiation to calculate the derivative of the model.In the following tests we use a diagonal approximation to the background and observation error covariance matrices so that, , where σ σ σ b is the vector of background standard deviations found in table 3 and σ σ σ o is the vector of observational standard deviations, for a single observation of NEE σ o = 0.5 gCm −2 day −1 .To minimise the cost function we use the truncated Newton iteration method (Nocedal and Wright, 1999) from the Python package Scipy.optimize(Jones et al., 2001).This method uses a number of stopping criteria to ensure convergence to a minimum of our cost function.In sections 2.4.1 to 2.4.3 we show tests of our scheme.

Test of tangent linear model
The TLM is used in the calculation of the gradient of our cost function in 4D-Var.We can have confidence that our implementation of the TLM for DALEC2 is correct as it passes the following relevant tests (Li et al., 1994).In 4D-Var we assume the tangent linear hypothesis, where δ x 0 is a perturbation of the initial augmented state x 0 and γ is a parameter controlling the size of this perturbation.The validity of this assumption depends on how nonlinear the model is, the length of the assimilation window and the size of the augmented state perturbation δ x 0 .We can test this by rearranging equation ( 20) to find, as γ → 0 (here we are using the Euclidean norm).Equation ( 21) should hold if our implementation of the TLM is correct, even for a weakly non-linear model.Figure 2 shows equation ( 21) plotted for DALEC2 with i fixed at 731 days, a fixed 5% perturbation δ x 0 and values of γ approaching zero .Figure 2 shows that the TLM behaves as expected for values of γ approaching 0. This was also tested for different choices of x 0 and sizes of perturbation with similar results.
It is also useful to show how the TLM behaves over a time window to see how the error in the TLM grows as we evolve the augmented state further forward in time.We again rearrange equation ( 20) with an additional error term to find, J a n 1 9 9 9 A p r 1 9 9 9 J u l 1 9 9 9 O c t 1 9 9 9 J a n 2 0 0 0 A p r 2 0 0 0 J u l 2 0 0 0 O c t 2 0 0 0 In figure 3 we plot the percentage error in the TLM tested throughout a two-year period as DALEC2 is run forward.From figure 3 we can see that the TLM for DALEC2 performs well after being run forward a year with less than a 7% error for all values of γ.By the second year we see some peaks in the error in spring and autumn.This is due to leaf on and leaf off functions in the TLM going out of phase with the nonlinear DALEC2.At these peaks the error reaches a maximum at 35% then coming back to around 10% before growing again in the autumn.Although this level of error is still acceptable we present results using a one year assimilation window in this paper as in practice we could cycle assimilation windows to make use of multiple years of data (Moodey et al., 2013).

Test of adjoint model
The adjoint model we have implemented for DALEC2 passes correctness tests.For the TLM M i,0 and its adjoint M T i,0 we have the identity for any inner product <, > and perturbation δ x 0 .This is derived from the adjoint identity (Lawless, 2013).Using the Euclidean inner product, equation ( 23) is equivalent to We evaluated the left hand side and right hand side of this identity for differing values of x 0 and size of perturbation δ x 0 and showed that they were equal to machine precision.

Gradient test
The 4D-Var system we have developed passes tests for the gradient of the cost function (Navon et al., 1992).In the implementation of the cost function and its gradient we regularise the problem using a variable transform (Freitag et al., 2010).For the cost function J and its gradient ∇J we can show that we have implemented ∇J correctly using the identity, where b is a vector of unit length and α is a parameter controlling the size of the perturbation.For small values of α not too close to machine precision we should have f (α) close to 1. Figure 4a shows f (α) for a 365 day assimilation window with b = x 0 ||x 0 || −1 , we can see that f (α) → 1 as α → 0, as expected until f (α) gets too close to machine zero at order α = 10 −11 .
This was also tested with b in different directions and similar results obtained.

Including correlations in the background error covariance matrix
As discussed in section 1, including correlations in B impacts how information from assimilated observations is spread between different types of analysis variables (Bannister, 2008).We explored a number of different methods in order to include parameter-state correlations in B. In this paper we present a method using a set of ecological dynamical constraints, based on expert judgement, on model parameters and state variables from Bloom and Williams (2015).Bloom and Williams (2015) show that implementing these constraints in a Metropolis Hastings MCMC data assimilation routine improves results significantly.The constraints impose conditions on carbon pool turnover and allocation ratios, steady state proximity and growth and the decay of model carbon pools.
In order to create a correlated background error covariance matrix, B corr , using these constraints we create an ensemble of state vectors which we then take the covariance of to give us B corr .To create this ensemble we use the following procedure: 1. Draw a random augmented state vector, x i , from the multivariate truncated normal distribution described by where B diag is the diagonal matrix described in section 2.4 and x i is bound by the parameter and state ranges given in table 3 in the appendix.

Test this
x i with the ecological dynamical constraints (requiring us to run the DALEC2 model using this state).
3. If x i passes it is added to our ensemble, else it is discarded.
Once we have a full ensemble we then take the covariance of the ensemble to find B corr .We chose an ensemble size of 1500 as a qualitative assessment using a larger ensemble showed little difference in correlations.In figure 5 we have plotted the correlation matrix or normalised error covariance matrix associated with B corr .This matrix includes both positive and negative correlations between parameter and state variables, with correlations of 1 down the diagonal between variables of the same quantity as expected.The largest positive off-diagonal correlation is 0.42 between f lab and C lab .This makes physical sense as f lab is the parameter controlling the amount of GPP allocated to the labile carbon pool, C lab .

Correlation
Figure 5: Background error correlation matrix created using method in section 2.5.Here the correlation scale for off-diagonal values ranges from −0.5 to 0.5 with the correlation along the diagonal being 1.For explanation of parameter and state variable symbols see table 3.

Specifying serial correlations in the observation error covariance matrix
The observation error covariance matrix does not only represent the instrumentation error for an observation but also the error in the observation operator (mapping the model state to the observation) and representativity error (error arising from the model being unable resolve the spatial and temporal scales of the observations).These other sources of error represented in R can also lead to correlations between observation errors (Waller et al., 2014).Errors in NEE observations come from different sources such as instrument errors, sampled ecosystem structure from the variable footprint of the flux tower and turbulent conditions (when there is low turbulence and limited air mixing the magnitude of NEE is underestimated).These errors due to turbulence can still have effect even after u * filtering (Papale et al., 2006).Due to this dependence on atmospheric conditions we expect the errors in observations of NEE to be serially correlated, as the atmospheric signal itself is serially correlated (Daley, 1992).If we were assimilating half hourly observations of NEE we would expect stronger correlations between observation errors, as atmospheric conditions are more constant at this time scale, with correlations between observation errors getting weaker with lower frequency observations.Although some studies suggest that the correlation between NEE measurement errors on the scale of a day is negligible (Lasslop et al., 2008), it is also likely that error in the observation operator and representativity error will lead to observation error correlations for NEE (Waller et al., 2014).
In section 2.3 we have re-written the 4D-Var cost function in equation ( 16) in order to allow the specification of serial observation error correlations in our assimilation scheme.These serial correlations are represented by the off-diagonal blocks of R. In work carried out with spatial correlations it has been shown that the structure of the correlation is not critical and that it is better to include some estimate of error correlation structure in the observation error covariance matrix than wrongly assume that errors are independent (Stewart et al., 2013;Healy and White, 2005).As a first attempt we try including temporal correlations on the scale of the observation frequency.We adapt the simple Gaussian model found in Järvinen et al. (1999) (a second order autoregressive correlation function was also tested but is not presented here).The correlation r between 2 observations at times t 1 and t 2 is given as, where τ is the e-folding time in days, a controls the strength of correlation, δ is the Kronecker delta and η is the cut off time after which the correlation between two observation errors is zero.We have incorporated a cut off for correlations between observation errors as the assumed correlation length scale for the assimilated observations is short.This cut off along with the form of correlation function using the Kronecker delta helps ensure R is positive definite and therefore invertible, as required in the assimilation process.The standard deviation assumed in the observations of NEE is 0.5 gCm −2 day −1 as described in section 2.4. Figure 6 shows the correlation matrix for R created using equation ( 27).Here observations made on adjacent days will have an error correlation of 0.3; this will then decay exponentially for observations farther apart in time.There are 67 NEE observations in this one year assimilation window.These observations are not all on adjacent days and this is evident in the structure of R. The effect of the short e-folding time chosen here (τ = 4) provides the desired structure.

Experiments
In the following sections we present the results of four experiments where we vary the representations of B and R while assimilating the same NEE observations in the window from the beginning of January 1999 to the end of December 1999.As shown in figure 3 the performance of the tangent linear model deteriorates after the first year.We then forecast the NEE over the next 14 years (Jan 2000 -Dec 2013) and compare with the observed data.Using this shorter analysis window with a long forecast allows us to see the effect of including correlations in the error statistics more clearly, as we have a longer time-series of data with which to judge our forecast after data assimilation.These experiments are outlined in table 1 where B diag and Rdiag are the diagonal matrices of the parameter and state variances and the observations variances respectively and B corr and Rcorr are the matrices as specified in section 2.5 and section 2.6 respectively. Experiment Table 1: The combination of error covariance matrices used in each data assimilation experiment.

Experiment A
In this experiment B diag and Rdiag were used in the assimilation as described in section 3.1.Because these contain no correlations this experiment forms the baseline by which the subsequent results from assimilation experiments are judged.
Figure 7a shows assimilation and forecast results for NEE.We can see that assimilating the observations of NEE has improved the background with the analysis trajectory (green line) fitting well with the observations during the assimilation window (Jan 1999-Dec 1999).The analysis trajectory then diverges in the forecast (Jan 2000 -Dec 2013).This can be seen more clearly in figue 8a, where there is an over prediction of respiration in the winter and the seasonal cycle does not match that of the observations.This is also shown in figure 9a where we have plotted the model-data differences for a year's period averaged over the 14 years in the forecast period.Figure 9a shows that the largest errors in our posterior model forecast occur as a result of not capturing the phenology of the site correctly, in particular the start of the season from April to June.
To see how well the forecast performs after assimilation we show a scatter plot of modelled NEE against observed NEE in figure 10b.From table 2 the predictions have a Root-Mean-Square Error (RMSE) of 4.22 gCm −2 day −1 and a bias of −0.3 gCm −2 day −1 for the forecast of NEE, whereas the analysis (Jan 1999 -Dec 1999) has a RMSE of 1.36 gCm −2 day −1 and a bias of −0.03 gCm −2 day −1 .The background trajectory is the model trajectory for DALEC2 when run using the prior estimate of the parameter and initial state values described in section 2.4.The background or prior model trajectory has a RMSE of 3.86 gCm −2 day −1 and a bias of −1.60 gCm −2 day −1 in the analysis window (Jan 1999 -Dec 1999) and the same RMSE of 3.86 gCm −2 day −1 but a bias of −1.36 gCm −2 day −1 during the forecast period (Jan 2000 -Dec 2013).Although using B diag and Rdiag in the assimilation has considerably reduced the RMSE in the analysis period, it has also increased the RMSE in the forecast of NEE.However it has reduced the bias in the model forecast considerably from −1.36 gCm −2 day −1 to −0.3 gCm −2 day −1 .The bias in the background is due to the background model predicting less negative values of NEE than observed (i.e.above the 1:1 line shown in figure 10a).This leads to considerably worse results for the background trajectory than the analysis and its forecast for total forest carbon uptake.It is important to compare our results here with the background trajectory.The background acts as our initial prior model estimate and is the starting point for our minimisation in 4D-Var.Comparing our assimilation results with our background trajectory give us confidence that our 4D-Var scheme is improving the results of our model after assimilation.

Experiment B
Here B corr (as defined in section 2.5) and Rdiag are used in the assimilation.Figure 7b shows assimilation and forecast results for NEE.In figure 8b we can see that the forecast performs considerably better than in experiment A, with the analysis trajectory no longer over predicting winter respiration and matching the observed seasonal cycle of NEE more closely in the forecast period (Jan 2000-Dec 2013).This can be seen more clearly in figure 9b where the improvement in the period April-June is considerable as we capture green-up at the site more closely.Even though we have improved the representation of leaf-on in our model significantly here we can see from figure 8b that this is still where we have the largest uncertainty for our model after assimilation.From figure 10c and table 2 we see that the forecast RMSE has almost halved (now 2.56 gCm −2 day −1 ) with a reduction in bias also, now −0.2 gCm −2 day −1 .In comparison using B corr in the assimilation very slightly degrades the fit for the analysis (Jan 1999 -Dec 1999), with a RMSE of 1.42 gCm −2 day −1 and a bias of −0.04 gCm −2 day −1 , as shown in table 2.
As discussed in section 1 previous work has shown the importance of specifying parameter-state correlations when using variational data assimilation for joint parameter and state estimation (Smith et al., 2009).In 4D-Var the initial correlation structure is evolved implicitly through time.However, in order to make full use of the observations it is essential to specify an accurate estimate to the initial correlation structure.Therefore by not specifying these correlations in experiment A we allow the parameter and state variables to attain unrealistic values in order to find the best fit to the observations in the analysis window (Jan 1999 -Dec 1999), leading to the divergence seen in the forecast (1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) in experiment A.
We can see the effect that including correlations in B has on the analysis update in figure 11.For some variables including correlations in B has had a large impact on the analysis update after assimilation.This is particularly clear for the f lab parameter.The largest positive off-diagonal correlation in B corr is between C lab and f lab , with f lab also having a large positive correlation with c lma as shown in section 2.5.The effect of these correlations has been to change the analysis increment for f lab from being slightly positive in experiment A to being strongly negative by following the analysis update of its correlated variables C lab and c lma .From figure 11 we can also see some of the possible reasons for the improved fit to the observations in experiment B. From figure 9a the largest errors in our model forecast of NEE in experiment A stem from a misrepresentation of the phenology of the site in the months April-June.We see that the parameter controlling day of leaf on (d onset ) has been updated slightly differently in comparison to experiment A, with day of leaf on now being slightly later in the year (day 124 instead of 119), again this is due to the included correlations in B.Even this small change in d onset appears to reduce the errors at the start of the season for experiment B as seen from figure 9b.The forecast is also no longer over predicting winter respiration to the same extent as in experiment A. From figure 11 we see that the main parameters controlling ecosystem respiration in NEE ( f auto , θ lit , θ som , Θ) have been reduced in comparison with experiment A, which we believe have led to an improved fit to observations in experiment B. In experiment A we also had an over prediction of peak carbon uptake in summer which has been improved in this experiment.From figure 11 we see that one of the parameters controlling the magnitude of gross primary productivity (c e f f ) has been decreased in comparison to experiment A. This appears to lead to less extreme predictions of peak summer carbon uptake than in experiment A. Two parameters with a significant change from experiment A are f f ol and C lit ; however in Chuter (2013) the DALEC model prediction of NEE is shown to be largely insensitive to variations in these parameters.
The added constraints provided by the correlations in B corr acts to regularise the data assimilation problem and avoid overfitting to the assimilated data by limiting the parameter space of the problem (Smith et al., 2009).These correlations have been diagnosed using the EDC's from Bloom and Williams (2015), as shown in section 2.5, and help to limit unrealistic behaviour for a mature forest site.Although this has led to a slightly degraded fit to the observations in the analysis window (Jan 1999 -Dec 1999) it has also significantly improved the fit to observations for the forecast (Jan 2000 -Dec 2013).

Experiment C
Here we use B diag and Rcorr (as defined in section 2.6) in the assimilation.Results shown in figure 7c, 8c and 9c appear similar to those in section 3.2 however there are some differences.From table 2 and figure 10d we see a slight reduction in RMSE for the forecast (now 4.09 gCm −2 day −1 ) in comparison with experiment A. As in experiment B the fit to the observations in the analysis window (Jan 1999 -Dec 1999) is very slightly degraded as the added correlations in Rcorr act to reduce the weight of the observations in the assimilation (Järvinen et al., 1999).The changes seen when using Rcorr in the assimilation are less than when using B corr as the correlations specified in Rcorr are on a short timescale and much weaker than those in B corr .In figure 11 we can see that the changes between experiment A and C in the analysis increment are much less than when using We also expect that specifying time correlations in R will help when assimilating other less frequently sampled data streams along with NEE as the serial correlations reduce the weight given to the mean of the more frequently sampled observations (here NEE) and also reduce the information content of these more frequently sampled observations (Järvinen et al., 1999;20002002200420062008  analysis and forecast after assimilation, grey shading: Error in model after assimilation (+/-3 standard deviations), red dots: observations from Alice Holt flux site with error bars.Daley, 1992), meaning that less frequently sampled data streams can have more impact on the assimilation.In the final experiment we use B corr and Rcorr in the assimilation.Figure 8d, figure 8b and 9a shows that using both correlated 372 matrices gives similar results as experiment B when B corr is used with Rdiag .However using Rcorr in addition to B corr provides 373 similar improvements as in experiment C. From table 2 and figure 10e we see the forecast RMSE is slightly reduced from 374 results in experiment B to 2.38 gCm −2 day −1 .Using both matrices appears to combine the beneficial effects described in both 375 section 3.3 and section 3.4.In figure 11 we can see that the analysis increment is very similar to experiment B. for the four experiments.Explanation of parameter and state variable symbols in table 3.

Summary
In our experiments we have shown that both B corr and Rcorr have the effect of improving the model forecast of NEE.As it can be difficult to inspect the skill of a certain model by only plotting model trajectories, in figure 12 we show Taylor diagrams displaying a statistical comparison of the four experiment and background analysis (Jan 1999 -Dec 1999) and forecast (Jan 2000 -Dec 2013) results with the observations of NEE.Here the radial distances from the origin to the points are proportional to the standard deviations of the observations and modelled observations and the azimuthal positions give the correlation coefficient between the modelled and observed NEE (Taylor, 2001).If a model predicted the observations perfectly it would have a correlation coefficient of 1 and a radial distance matching that of the observations (represented by the dotted line).
Figure 12a shows that all the experiments give very similar results in the analysis window (Jan 1999 -Dec 1999) with all the experiment points closely grouped on top of each other, whereas figure 12b shows the significant difference between the experiment results in the forecast (Jan 2000 -Dec 2013), with experiments B and D being closer to the dotted line.In all our experiments we find that θ min , C lab and C f ol reach the bounds after assimilation.In the case of θ min this is most likely due to the fact that we do not have enough information to recover this parameter when only assimilating observations of NEE, as the DALEC model prediction of NEE is insensitive to variations in this parameter (Chuter, 2013).Assimilating more distinct data streams could help avoid this edge-hitting behaviour.For C lab and C f ol this could suggest a flaw in the model or the fact that the prescribed bounds need to be relaxed slightly for the studied ecosystem.Our hypothesis is that the mechanism by which C lab is distributed to the leaves is over simplified; we intend to test this in future work.In table 4 we show the standard deviations for our parameter and state variables after assimilation.We can see that we have improved our confidence for most of these variables after assimilation when compared with the standard deviations in table 3.

Discussion
In this paper we have implemented the DALEC2 functional ecology model in a 4D-Var data assimilation scheme, building an adjoint of the DALEC2 model and applying rigorous tests to our scheme.Using 4D-Var can provide much faster assimilation results than MCMC techniques as we have knowledge of the derivative of the model.For our experiments the 4D-Var routine has taken in the order of 10 2 function evaluations to converge to a minimum, whereas MCMC techniques using the same model take in the order of 10 8 function evaluations (Bloom and Williams, 2015).However, we do assume that the statistics of the problem are Gaussian whereas MCMC techniques do not.We have shown that 4D-Var is a valid tool for improving the DALEC2 model estimate of NEE and that even when assimilating only a single year of NEE observations we can improve the forecast significantly.If more than one year was required, this type of data assimilation routine could be run in cycling mode, allowing for the assimilation of multiple years of data (Moodey et al., 2013).This also avoids any possible unstable behaviour associated with much longer single assimilation windows.However, here our aim is to investigate the effect of specifying correlations in background and observation error statistics on the forecast of NEE.We have therefore assimilated just one year of NEE observations and produced a long 14 year forecast in order to see more clearly the effect of including these correlations on the forecast when judging against observations.The observations of daily NEE from the Alice Holt flux site are quite variable year to year, peak summer uptake varies from −14.35 gCm −2 day −1 to −9.04 gCm −2 day −1 , and therefore provide a reasonable test for the ability of the DALEC2 model forecast, especially over a 14 year period.
We then considered the nature of background and observation errors.The effect of specifying parameter-state correlations in the background information and serial correlations between the observation errors was explored.
The technique presented here to specify B corr has been shown to have significantly improved forecasts of NEE over using Analysis (Jan 1999-Dec 1999) Experiment RMSE (gCm −2 day −1 ) Bias (gCm a diagonal representation of B. In section 3.3 we discuss how the correlations in B corr impact the analysis update for the parameter and state variables (see figure 11) causing the seasonal cycle of carbon uptake and magnitude of fluxes to fit more closely with the observations than when using a diagonal B in the assimilation algorithm.These results agree with those of Smith et al. (2009) where the importance of specifying parameter-state correlations when performing joint parameter and state estimation with variational data assimilation was shown.The added constraint provided by including correlations in the prior error covariance matrix, B, acts to regularise the assimilation problem.Hence, the parameter and initial state values we retrieve from our data assimilation are more likely to be realistic, leading to better insight into the studied system.For example we see from figure 11 that when using B corr in our assimilation we find a much longer labile release period (c ronset ) than when using B diag .This means that the period of green-up in our study site is possibly much longer than we would have estimated had we based our analysis on our assimilation results using a matrix B with no correlations.The method for specifying B corr in this paper used a series of ecological dynamical constraints taken from Bloom and Williams (2015).Implementing correlations in the prior error statistics in this way may prove difficult for models where these type of constraints are not available; however there are other methods to build correlations into B. One technique we also tested (not presented here) to create a correlated B involved evolving an ensemble of state vectors over the length of the chosen assimilation window using the model (DALEC2) and then taking the covariance of the evolved ensemble.This gave us a B with parameter-state and state-state correlations, but no parameter-parameter correlations as the parameters are not updated by the model.Using the B created with this method also improved assimilation results significantly over using a diagonal B. A larger number of different tests were run using different background vectors and variances and it was found that specifying some form of correlation structure in B always made an improvement to the results of the assimilation.As this work has only considered a single deciduous site, it would be useful to apply the techniques detailed here for an evergreen site.Evergreen ecosystems usually have less extreme seasonal variation, it will therefore be of interest to see if a similar improvement for evergreen ecosystem forecast results is found when using a B corr created using the same method.
The purpose of this exercise was to see how well we could forecast NEE while also investigating the effect of including correlations between error statistics.It was not an attempt to recover all the parameters and state variables with a high level of accuracy.However, it is still instructive to look at these values and compare with data where available.In Meir et al. (2002) an observed range is given for leaf mass per area (c lma ) for the Alice Holt flux site of between 40 to 80 gCm −2 .The background value for c lma in our experiments is 128.5 gCm −2 .When using diagonal error covariance matrices in experiment A we find a value of 38.7 gCm −2 for c lma after assimilation which is almost within the range given by Meir et al. (2002).In experiment D when using error covariance matrices including correlations c lma has a value of 51.6 gCm −2 after assimilation, this is well within the observed range given by Meir et al. (2002).From observations made by Forest Research we also have estimates of the above and below ground woody carbon pool (C woo ) at the start of 1999, with an observed value of 14258 gCm −2 .It is not clear how uncertain this estimate is.The background value for C woo in our experiments is 6506 gCm −2 .When using diagonal error covariance matrices in experiment A we find a value of 7291 gCm −2 for C woo , an increase but still far away from the observed estimate.In experiment D when using error covariance matrices including correlations C woo has a value of 7262 gCm −2 a similar result as experiment A.Here the assimilation has not been able to recover a value of C woo similar to that of the observed estimate.This is not necessarily of concern as we are not able to quantify the error in this observation.
Also we are assimilating observations of daily NEE only; NEE is the difference between Gross Primary Productivity (GPP) and Total ecosystem respiration (RT), (NEE = RT -GPP), with neither GPP nor RT being direct functions of C woo .Therefore it is unlikely that we will recover an accurate value of C woo , as the assimilated observations are not significantly impacted by large changes in this state variable; this result is also discussed in Fox et al. (2009).This may also explains why we are able to recover a reasonable value of c lma from the assimilation, as from equation ( 1) we can see that c lma is one of the input arguments taken by GPP.The function calculating NEE will therefore be sensitive to variations in the c lma parameter and so assimilating observations of NEE could help to constrain this parameter.
In numerical weather prediction it has been shown that including correlations in R can help improve data assimilation results (Weston et al., 2014;Stewart et al., 2013).However the specified correlations have most commonly been satellite interchannel correlations with observations errors still being considered independent in time.In this paper we have shown that including correlations between observation errors in time can also improve data assimilation results, here providing a slight improvement for the DALEC2 model forecast of NEE.Here we only see a small impact on our results when using Rcorr in the assimilation as the correlations we have included are weak (especially in comparison to those included in B corr ) and on a short time-scale.
We expect including correlations in R will have more of an impact on data assimilation results when assimilating data with stronger error correlations (i.e.finer temporal-resolution observations).We also expect including these serial correlations to have an even greater impact when assimilating more than one data stream as discussed in section 1.Using the form of R given in this paper for specifying serial correlations will also allow us to specify serial correlations between different observation types.When running the DALEC2 model with a day-night time step, instead of the daily time step used for this paper, this will allow us to build in the type of correlations investigated by Baldocchi et al. (2015) between ecosystem respiration and canopy photosynthesis.More work is needed to investigate the effect of including correlations between observations error statistics when assimilating multiple data streams.
The Rcorr presented in this paper has a weak correlation (a = 0.3 as shown in section2.6) in time between observations of NEE, this representation of Rcorr has slightly improved the model forecast of NEE.However other choices of Rcorr (with much stronger correlations between observations) tested for this paper degraded the forecast.This is probably due to the specified correlations being unrealistic and highlights the fact that a reasonable estimate of the true correlation structure for Rcorr is needed to have a positive impact on results.The development of a more diagnostic approach for the calculation of serial correlations in R would be useful.One option would be to adapt the Desroziers et al. (2005) diagnostic, which has been used successfully in numerical weather prediction for diagnosing observation error correlations for observations taken at the same time (Weston et al., 2014), and extending this technique to diagnose serial correlations.

Conclusion
Functional ecology and land surface model data assimilation routines largely treat prior estimates of parameter and state uncertainties and observation error statistics as independent and uncorrelated.In this paper we have shown the importance of including estimates of such correlations, especially between background parameter and state error statistics when performing joint parameter and state estimation.
When performing joint parameter and state estimation including correlations in the background error covariance matrix significantly improves the forecast after assimilation, in comparison to using a diagonal representation of B. Specifying serial time correlations between observation errors in R also improves the forecast and we expect these correlations to have a greater impact when assimilating more than one data stream.More work is needed to investigate the effect of including these correlations when assimilating multiple data streams.The development of a more diagnostic tool for the calculation of the error correlation structure in R is also important.
When including both parameter-state correlations in B and time correlations between observation errors in R and assimilating only a single year of NEE observations we can forecast 14 years of NEE observations with a root-mean square error of 2.38 gCm −2 day −1 and a correlation coefficient of 0.88.This is a significant 44% reduction in error from the results when using a B and R with no specified correlations of 4.22 gCm −2 day −1 and a correlation coefficient of 0.79.

Figure 1 :
Figure 1: Representation of the fluxes in the DALEC2 carbon balance model.Green arrows represent C allocation, purple arrows represent litter fall and decomposition fluxes, blue arrows represent respiration fluxes and the red arrow represents the influence of leaf area index in the GPP function.

Figure 2 :
Figure 2: Plot of the tangent linear model test function (equation (21)) for DALEC2, for a fixed TLM evolving the perturbed augmented state 731 days forward in time and a fixed 5% perturbation, δ x 0 .

Figure 3 :
Figure 3: Plot of the percentage error in the tangent linear model (equation (22)) for DALEC2 when evolving the model state forward over a period of two years with three different values of γ and a fixed 5% perturbation δ x 0 .

Figure 4 :
Figure 4: Tests of the gradient of the cost function for a 365 day assimilation window with b = x 0 ||x 0 || −1 .

Figure 6 :
Figure6: Observation error correlation matrix for the 67 observations used in assimilation created using method in section 2.6 with τ = 4, a = 0.3 and η = 4.

Figure 7 :
Figure 7: One year assimilation and fourteen year forecast of Alice Holt NEE with DALEC2, blue dotted line: background model trajectory, green line: analysis and forecast after assimilation, grey shading: Error in model after assimilation (+/-3 standard deviations), red dots: observations from Alice Holt flux site with error bars.

Figure 8 :Figure 9 :Figure 10 :Figure 11 :
Figure 8: As figure 7 but only showing the first and final two years results from the one year assimilation and fourteen year forecast of Alice Holt NEE with DALEC2, blue dotted line: background model trajectory, green line: analysis and forecast after assimilation, grey shading: Error in model after assimilation (+/-3 standard deviations), red dots: observations from Alice Holt flux site with error bars.
Figure 12: Taylor diagrams displaying statistical comparison of the four experiment and background analysis (Jan 1999 -Dec 1999) and forecast (Jan 2000 -Dec 2013) results with observations of NEE (gCm −2 day −1 ).The dotted line represents the standard deviation of the observations and the contours represent values of constant root mean square error between model and observations.

Table 3 :
Parameter values and standard deviations for background vector used in experiments.

Table 4 :
Standard deviations for each experiment after assimilation, calculated using equation 19.