A variational data assimilation system for soil–atmosphere flux estimates for the Community Land Model (CLM3.5)

This paper presents the development and implementation of a spatio-temporal variational data assimilation system (4D-var) for the soil–vegetation–atmosphere transfer model “Community Land Model” (CLM3.5), along with the development of the adjoint code for the core soil–atmosphere transfer scheme of energy and soil moisture. The purpose of this work is to obtain an improved estimation technique for the energy fluxes (sensible and latent heat fluxes) between the soil and the atmosphere. Optimal assessments of these fluxes are neither available from model simulations nor measurements alone, while a 4D-var data assimilation has the potential to combine both information sources by a Best Linear Unbiased Estimate (BLUE). The 4D-var method requires the development of the adjoint model of the CLM which is established in this work. The new data assimilation algorithm is able to assimilate soil temperature and soil moisture measurements for one-dimensional columns of the model grid. Numerical experiments were first used to test the algorithm under idealised conditions. It was found that the analysis delivers improved results whenever there is a dependence between the initial values and the assimilated quantity. Furthermore, soil temperature and soil moisture from in situ field measurements were assimilated. These calculations demonstrate the improved performance of flux estimates, whenever soil property parameters are available of sufficient quality. Misspecifications could also be identified by the performance of the variational scheme.


Introduction
Interaction processes between the atmosphere and the solid earth surface are a case in which the full range of timescales involved, from hours to centuries, is of importance. The quality of both short-term meteorological forecasts and centennial runs with climate models strongly depends on the models' ability to correctly simulate sensible and latent heat fluxes. However, skillful assessments of these fluxes over regional or global domains, or of integral quantities such as heat and moisture budgets, cannot be obtained from model simulations or measurements alone. All models are imperfect, and in prognostic mode errors at grid points are known in statistical terms at best, rather than exactly. Model simulations by soil-vegetation-atmosphere transfer (SVAT) models are set up to provide flux results on regular grids. The quality of simulations depends on the initial values of moisture and temperature in soil and atmosphere, as well as on insolation and atmospheric turbulence, which are controlled by cloud and surface parameters, including soil-vegetation properties. Misspecification of any one of these quantities will result in biased flux simulations. Observations are also affected by errors, for which only statistical information is available. In situ measurements are typically sparse for soil, especially for deeper soil layers, while spaceborne sensors have coarse footprints and long revisit times and are valid for skin layers only. There are also no flux measurements available which cover large areas, and eddy covariance devices are sparse.
Rather, a combination of both information sources, models and observations, has the potential to optimally estimate fluxes, which are not directly observable. In any case, advanced data assimilation can be considered a key technique to achieve best estimates of heat and moisture fluxes (Houser et al., 2010). On the other hand, if there are systematic discrepancies between model and data, this clearly indicates deficiencies in at least one of these components. In this case, the identification can only be based on independent data of sufficient quality (Talagrand, 2010).
In meteorological applications, data assimilation is focused on estimating optimal initial values, tacitly assuming that available observations shall serve to analyse the physical state of the atmosphere as the most important parameter set for the best forecast. In contrast, from a meteorological viewpoint, soil data assimilation serves to provide optimal values of energy and moisture fluxes as a lower boundary condition of the forecast model.
Soil measurements are typically sparse. This is an incentive to apply advanced spatio-temporal data assimilation techniques, which generally have the highest potential to exploit limited data sets (Evensen, 2007). Most prominently, these include Kalman filtering and the four-dimensional variational (4D-var) data assimilation scheme. In typical soil data assimilation studies, the objective is to produce an improved, continuous land surface state estimate in space and time, from which fluxes between soil and atmosphere can be inferred as a secondary product. Yet fluxes are typically expressed not as prognostic but as diagnostic parameters in models.
An introduction to different data assimilation methods for SVAT models can be found in Reichle (2008). The application of Kalman filters in soil data assimilation appears prominently in tandem with remote sensing data assimilation. First, Milly and Kabala (1986) presented an integration of models and remote sensing temperature data using an extended Kalman filter (EKF). Notably, numerically simulated vertically and horizontally polarised passive microwave and thermal infrared observations were assimilated by Entekhabi et al. (1994) into a one-dimensional soil moisture and temperature diffusion model by Kalman filtering. Soil moisture profile estimates were provided by Walker et al. (2001), assimilating near-surface parameters by Kalman filtering. A wealth of further Kalman filter studies has since been published, demonstrating the popularity of Kalman filtering in SVAT modelling.
In contrast to soil data assimilation, Kalman filtering has gained less attention in operational meteorological forecast systems. Rather, the 4D-var method is considered the most advanced practicable technique. This was often the motivation to complement the meteorological part in assimilation systems by the same variational method for the soil and SVAT section. Earlier examples of adjoint SVAT models, sometimes simplified versions, include those of Marais and Musson-Genon (1992), Callies et al. (1998), Rhodin et al. (1999) and Margulis and Entekhabi (2001). As a typical meteorological objective, Mahfouf (1991) and Bouyssel et al. (1999) applied the ISBA model and its adjoint, assessing the potential of standardised 2 m temperature observations to improve soil humidity simulations. In the former study, weather situations with strong direct radiative impact were selected, where a tight coupling between atmosphere and soil prevails. Under these conditions, meteorological data proved to be especially useful to improve soil analyses. Hess et al. (2008) report a similar observational condition, and additionally used precipitation data, demonstrating improvements in forecasting 2 m temperatures and atmospheric low-level humidity. As a large step forward to satellite data assimilation, Reichle et al. (2001) introduced remotely sensed brightness temperature for assimilation with a radiative transfer model.
In very recent years, an increasing number of studies on energy and moisture fluxes, applying advanced data assimilation techniques, have been published. These include the work of Bateni and Entekhabi (2012), who implemented an ensemble Kalman smoother. The authors demonstrated that this algorithm is an efficient and flexible data assimilation procedure that is able to extract useful information on the partitioning of available surface energy from land surface temperature measurements. Although the study was not based on a dynamic variational model, its results were compared to one, and the technique eventually provides reliable estimates of turbulent heat fluxes. Traditional approaches consider soil and vegetation as a combined source, not accounting for the difference between soil and canopy temperatures and turbulent exchange rates. In contrast, Bateni and Liang (2012) consider the markedly different behaviour and analyse the contribution of soil and canopy to the turbulent heat fluxes separately. Soil parameter and flux estimation by remote sensing data assimilation is another area of recent progress, mostly based on microwave sensors. Hain et al. (2012) examine the assimilation of a thermal infrared product based on surface evaporative flux estimates from the Atmosphere Land Exchange Inverse (ALEXI) model and the MW-based VU Amsterdam NASA surface soil moisture product generated with the Land Parameter Retrieval Model (LPRM).
The general objective of this study is to evaluate the estimation of fluxes of energy and moisture between soil and atmosphere with a state-of-the-art SVAT model, based on soil temperature and humidity measurements. The 4D-var method is adopted here because of its potential to provide physically consistent flux process simulations within an assimilation interval, without disturbing intermediate corrections at instances of available data. As the underlying model, the Community Land Model (CLM) version 3.5 is used (Oleson et al., 2008). This model simulates complex interactions between soil, vegetation and atmosphere in terms of energy and humidity, and optionally also the carbon-nitrogen cycle. A specific objective of this study is to develop the adjoint of this sophisticated and widely used SVAT model, to evaluate the potential of 4D-var for flux estimates with it, and to make it available.
Section 2 briefly describes the theoretical basis of time-variational data assimilation. Section 3 introduces the Community Land Model and the development of its adjoint, while Sect. 4 provides a succinct discussion of model parameter impact. Results are presented in Sect. 5, and Sect. 6 contains the conclusions.

Theory of 4D-var data assimilation
This section gives a short description of the 4D-var method, as applied in the study. More comprehensive expositions in the context of general data assimilation may be found in, for example, Talagrand (1997) and Bouttier and Courtier (1999). A general overview of data assimilation in all earth compartments can be found in Lahoz et al. (2010).
Data assimilation seeks to combine the following information sources, to provide a best estimate of states or processes: 1. a priori or background knowledge, provided by forecasts or climatological information sources, 2. measurements of geophysical states or parameters, and 3. knowledge of governing process dynamics, as introduced to the model code.
Advanced data assimilation methods include the solution of partial differential equations, as for example in this study, the parabolic equations of moisture and heat fluxes. Placing emphasis on the 4D-var method, this technique is briefly described as follows. In order to obtain a phase space trajectory of the model for the assimilation interval, which accounts for continuous and consistent model dynamics and related heat and soil moisture fluxes and their budgets, the adjoint model version is developed and set in a 4D-var context. However, the coding effort for this method is high. Let $x$ be the control vector containing the variables to be optimised, which may be model initial conditions, model parameters, or both. The optimal state estimate, commonly termed the analysis $x_a$, is found by the minimisation of a quadratic cost function $J$:

$$J(x_0) = J_b + J_o = \frac{1}{2}(x_0 - x_b)^T B^{-1} (x_0 - x_b) + \frac{1}{2}\sum_{i=0}^{N} \left(H_i M_i(x_0) - y_i\right)^T R_i^{-1} \left(H_i M_i(x_0) - y_i\right), \tag{1}$$

with background costs $J_b$ and observational costs $J_o$. Here $x_b$ denotes the background state and $N$ the number of time steps in the assimilation interval. Matrix $B$ is the background error covariance matrix, containing the estimated errors of background knowledge and its covariances. In this study, $B$ includes the vertical correlation, while cross-covariances between temperature and humidity are not taken into account. With these two parameters and 10 soil layers, $B$ is a symmetric 20 × 20 two-block diagonal matrix, which can be factorised into a diagonal matrix of standard deviations $\Sigma$ and a correlation matrix $C$ to read $B = \Sigma C \Sigma$. We assume that the vertical correlation increases with depth, following the same reasoning by which the vertical grid spacing is designed to increase with depth. Therefore, we argue that the variable layer thickness can be taken as the unit of distance. Adopting a Gaussian covariance model for the correlation, dependent on distance in terms of model layer units, elements of $C$ then read

$$C_{ij} = \exp\left(-\frac{(i-j)^2}{2 l^2}\right). \tag{2}$$

Here $i$ and $j$ are the soil layer indices, and $l$ is the correlation length in terms of layers. In our case we found best results with $l = 2$.
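As an illustration of the structure of $B$ described above, the following minimal Python sketch assembles a 20 × 20 two-block diagonal matrix from a Gaussian correlation in layer units with $l = 2$. The standard deviations and the function names are hypothetical, and the factor 2 in the exponent is an assumed convention for the Gaussian model; none of this is taken from the CLM adjoint code.

```python
import math

N_LAYERS = 10      # number of CLM soil layers (from the text)
CORR_LENGTH = 2.0  # correlation length l in layer units (from the text)

def gaussian_correlation(n, length):
    """C[i][j] = exp(-(i - j)^2 / (2 l^2)); the factor 2 is an assumed convention."""
    return [[math.exp(-((i - j) ** 2) / (2.0 * length ** 2)) for j in range(n)]
            for i in range(n)]

def covariance_block(sigma, corr):
    """B = Sigma C Sigma for one variable, with Sigma = diag(sigma)."""
    n = len(sigma)
    return [[sigma[i] * corr[i][j] * sigma[j] for j in range(n)] for i in range(n)]

def block_diagonal(a, b):
    """Place blocks a and b on the diagonal; cross-covariances stay zero."""
    n, m = len(a), len(b)
    out = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        out[i][:n] = a[i]
    for i in range(m):
        out[n + i][n:] = b[i]
    return out

SIGMA_T = [1.0] * N_LAYERS   # hypothetical temperature standard deviations (K)
SIGMA_W = [0.04] * N_LAYERS  # hypothetical soil moisture standard deviations
C = gaussian_correlation(N_LAYERS, CORR_LENGTH)
B = block_diagonal(covariance_block(SIGMA_T, C), covariance_block(SIGMA_W, C))
```

With $l = 2$, the correlation between a layer and its second neighbour is $e^{-1/2} \approx 0.61$ under this convention, so information from one observed layer spreads over a few adjacent layers.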
Observational cost $J_o$ measures the differences of model values and observations over the entire assimilation interval. To compare observations $y_i$ of time step $i$ with the corresponding model prediction $M_i(x_0)$, the model state must be projected onto the observation space by the observation operator $H_i$, which is linearised, if applicable. Matrix $R_i$ denotes the observation error covariance. To minimise the cost function $J$, the gradient with respect to the initial state $x_0$ is calculated by adjoint calculus, prior to the minimisation step, which is typically provided by quasi-Newton techniques, for example by the L-BFGS (limited memory Broyden-Fletcher-Goldfarb-Shanno) algorithm (Liu and Nocedal, 1989). The gradient $\nabla_{x_0} J$ of total costs $J$ is

$$\nabla_{x_0} J = B^{-1}(x_0 - x_b) + \sum_{i=0}^{N} M_i^{*} H_i^{T} R_i^{-1} \left(H_i M_i(x_0) - y_i\right), \tag{3}$$

where $x_b$ is the background state, $M_i^{*}$ is the adjoint model and $H_i^{T}$ denotes the transposed linear observation operator.
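The interplay of cost function, adjoint-based gradient and minimisation can be sketched with a scalar toy problem. The linear model $x_{i+1} = a\,x_i$, the identity observation operator, all numerical values and all function names below are illustrative assumptions. A Newton-type step, which is exact for this quadratic toy cost, stands in for the L-BFGS minimiser named in the text.

```python
# Toy scalar 4D-var: x_{i+1} = a * x_i, observation operator H = 1.
# All numbers are illustrative assumptions, not values from the study.

def forward(x0, a, n_steps):
    """Return the model trajectory [x_0, ..., x_N]."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(a * traj[-1])
    return traj

def cost(x0, xb, obs, a, b_var, r_var):
    """J = J_b + J_o for scalar background and observation error variances."""
    traj = forward(x0, a, len(obs) - 1)
    j_b = 0.5 * (x0 - xb) ** 2 / b_var
    j_o = 0.5 * sum((x - y) ** 2 / r_var for x, y in zip(traj, obs))
    return j_b + j_o

def gradient(x0, xb, obs, a, b_var, r_var):
    """Gradient of J with respect to x0 from one backward (adjoint) sweep."""
    traj = forward(x0, a, len(obs) - 1)
    adj = 0.0
    for x, y in zip(reversed(traj), reversed(obs)):
        # Adjoint of x_{i+1} = a * x_i is multiplication by a;
        # then the observation forcing of the current step is added.
        adj = a * adj + (x - y) / r_var
    return (x0 - xb) / b_var + adj

def minimise(xb, obs, a, b_var, r_var, n_iter=3):
    """Newton-type iteration, used here instead of L-BFGS for brevity."""
    hessian = 1.0 / b_var + sum(a ** (2 * i) for i in range(len(obs))) / r_var
    x0 = xb
    for _ in range(n_iter):
        x0 -= gradient(x0, xb, obs, a, b_var, r_var) / hessian
    return x0

XB, A, B_VAR, R_VAR = 290.0, 0.99, 1.0, 0.25
OBS = [291.0, 290.2, 289.1, 288.4]
XA = minimise(XB, OBS, A, B_VAR, R_VAR)  # analysis of the initial value
```

The backward loop in `gradient` mirrors the sum over $M_i^{*} H_i^{T} R_i^{-1}(\ldots)$: all time steps are accumulated in a single sweep from the end of the interval to its start.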
For the calculation of the gradient at the initial time of the assimilation window, $\nabla_{x_0} J$, the adjoint model is required, sometimes also referred to as the backward model. It is the development of this adjoint model which renders the 4D-var method labour- and maintenance-intensive.
This study applies the 4D-var method to individual soil columns and time. Strictly speaking, this results in a 2D-var approach. As this term is typically reserved for spatial data assimilation, the term 4D-var is used in what follows.
The adjoint model can be understood as follows. The variation $\delta J_o$ of observational costs caused by a variation $\delta x_i$ of the state during the $i$th time step is linearly approximated by

$$\delta J_o \approx \langle \nabla_{x_i} J_o, \delta x_i \rangle, \tag{4}$$

with $\langle \cdot , \cdot \rangle$ denoting the scalar product, and Eq. (4) the tangent-linear equation, valid if $\delta x$ is sufficiently small. Let $M_i$ be the model operator, here CLM, which projects the model state from time 0 to time $i$, $x_i = M_i(x_0)$. Then an initial perturbation $\delta x_0$ evolves to time $i$ by $\delta x_i \approx M'_i \delta x_0$, where $M'_i$ is the tangent-linear model.
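Introducing this to Eq. (4), we find

$$\delta J_o \approx \langle \nabla_{x_i} J_o, M'_i \delta x_0 \rangle. \tag{5}$$

One obtains

$$\delta J_o \approx \langle M'^{T}_i \nabla_{x_i} J_o, \delta x_0 \rangle = \langle M^{*}_i \nabla_{x_i} J_o, \delta x_0 \rangle. \tag{6}$$

Hence, by $M^{*}_i = M'^{T}_i$, the sought-after gradient $\nabla_{x_0} J$ of the cost function with respect to the initial values is available.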
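The step from the tangent-linear to the adjoint operator rests on the identity $\langle g, M'\delta x_0\rangle = \langle M'^{T} g, \delta x_0\rangle$. A minimal numerical check is sketched below; the 2 × 2 matrix standing in for the tangent-linear model, the gradient at time $i$ and the perturbation are arbitrary hypothetical values.

```python
def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical tangent-linear operator M'_i of a two-variable model
M_TL = [[0.9, -0.2],
        [0.1, 0.8]]
M_ADJ = transpose(M_TL)  # adjoint operator M*_i = (M'_i)^T

grad_xi = [0.5, -1.5]    # hypothetical gradient of J_o at time i
dx0 = [0.01, 0.02]       # hypothetical initial perturbation

lhs = dot(grad_xi, matvec(M_TL, dx0))   # <grad_{x_i} J_o, M'_i dx_0>
rhs = dot(matvec(M_ADJ, grad_xi), dx0)  # <M*_i grad_{x_i} J_o, dx_0>
grad_x0 = matvec(M_ADJ, grad_xi)        # gradient with respect to x_0
```

Both scalar products agree, which is why a single application of the adjoint yields the gradient for all components of $x_0$ at once.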
Complex models such as the CLM are composed of long routines of several hundred lines of code. The development of the adjoint is facilitated by adjoint compilers (see Sect. 3.2).

Community Land Model
The CLM (Bonan et al., 2002b; Oleson et al., 2008) is a land surface model originally developed for coupling with the Community Earth System Model (CESM) and the Community Atmosphere Model (CAM). Model components of the CLM include biogeophysics, the hydrological cycle, biogeochemistry and dynamic vegetation, but the latter two are not part of this study. The underlying fundamental equations for soil temperature $T$ and soil humidity $\theta$ are

$$c \frac{\partial T}{\partial t} = \frac{\partial}{\partial z}\left(\lambda \frac{\partial T}{\partial z}\right) \tag{7}$$

and

$$\frac{\partial \theta}{\partial t} = \frac{\partial}{\partial z}\left[k \left(\frac{\partial \psi}{\partial z} + 1\right)\right] + S, \tag{8}$$

respectively. Here, $c$ denotes soil heat capacity, $\lambda$ thermal conductivity, $z$ soil depth and $k$ hydraulic conductivity. The soil water or capillary potential is $\psi$, while $S$ gives the local net effect of sources and sinks.
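To make the parabolic character of the soil heat equation concrete, the following sketch integrates $c\,\partial T/\partial t = \partial/\partial z(\lambda\,\partial T/\partial z)$ with an explicit Euler step on a column of 10 layers whose thickness increases with depth. The layer thicknesses, the constant values of $c$ and $\lambda$, the zero-flux boundaries and the explicit scheme are all simplifying assumptions for illustration; they are not the CLM discretisation.

```python
# Hypothetical layer thicknesses (m), increasing with depth; not the CLM values.
DZ = [0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.30, 0.50, 0.80, 1.30]
HEAT_CAPACITY = 2.0e6  # c in J m^-3 K^-1, assumed constant
CONDUCTIVITY = 1.5     # lambda in W m^-1 K^-1, assumed constant

def node_depths(dz):
    """Depth of each layer centre below the surface."""
    depths, top = [], 0.0
    for d in dz:
        depths.append(top + 0.5 * d)
        top += d
    return depths

def heat_step(temp, dz, dt):
    """One explicit Euler step with zero heat flux at top and bottom."""
    z = node_depths(dz)
    n = len(temp)
    # Downward heat flux across the interface between layers i and i + 1.
    flux = [-CONDUCTIVITY * (temp[i + 1] - temp[i]) / (z[i + 1] - z[i])
            for i in range(n - 1)]
    new = []
    for i in range(n):
        f_in = flux[i - 1] if i > 0 else 0.0
        f_out = flux[i] if i < n - 1 else 0.0
        new.append(temp[i] + dt * (f_in - f_out) / (HEAT_CAPACITY * dz[i]))
    return new

def heat_content(temp, dz):
    """Column heat content per unit area (J m^-2)."""
    return sum(HEAT_CAPACITY * d * t for d, t in zip(dz, temp))

profile = [295.0] + [285.0] * 9  # warm surface layer over a cooler column (K)
for _ in range(100):
    profile = heat_step(profile, DZ, dt=60.0)
```

With closed boundaries the column heat content is conserved, while the warm anomaly of the thin top layer diffuses into the layers below.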
The land surface representation distinguishes between five primary land cover types (glacier, lake, wetland, urban and vegetated) in each grid cell. The vegetated area of a grid cell is described by plant functional types (PFTs), which are characterised by their typical leaf and stem area index, canopy height and a number of other physiological parameters. Each subgrid land cover type and PFT patch represents at least one separate column for energy and water calculations (Bonan et al., 2002a).
CLM represents the hydrological cycle over land through interception of water by plant foliage and wood, throughfall and stemflow, infiltration, runoff, soil water and snow. These processes are directly linked to temperature, precipitation and runoff, and affect the biogeophysics module as well. In this study, CLM version 3.5 is used. The most important difference from the previous version (CLM3.0) concerns the representation of the hydrological cycle. Alterations include an improved canopy integration scheme (Thornton and Zimmermann, 2007), a new frozen soil scheme (Niu and Yang, 2006), a basic groundwater model for identifying the water table depth (Niu et al., 2007), novel surface data sets derived from Moderate Resolution Imaging Spectroradiometer (MODIS) products (Lawrence and Chase, 2007), scaling of canopy interception (Lawrence et al., 2007) and a simple TOPMODEL-based model for surface and subsurface runoff (Niu et al., 2005).
The CLM comprises 10 soil layers which are thin close to the surface and thicker with increasing soil depth (see Table 1). The concept of this soil layer definition is presented by Lawrence et al. (2008).

Adjoint compiler TAPENADE
The tangent-linear and adjoint code of CLM were created using the adjoint compiler TAPENADE (Hascoët and Pascual, 2004). The latter is designed to create tangent-linear or adjoint code automatically from given FORTRAN code. However, there are some structures that cannot be differentiated by TAPENADE. In these cases, the original code has to be modified to be compatible with the adjoint compiler. Examples of incompatible FORTRAN structures are pointers and allocatable arrays. In CLM, all global variables are stored in pointer structures. Consequently, all pointers have to be converted to subroutine arguments before using TAPENADE.
Since the adjoint code should be as compact as possible, there are several levels of shortening the differentiated code. In this case, certain variables are not buffered or even used in the adjoint code. This is possible for variables that are not adjoint variables and have no impact on other adjoint variables.

Validation of the adjoint code
The correctness of the adjoint code is decisive for achieving proper analyses. The occurrence of errors cannot be excluded during the automatic differentiation procedure. Therefore, the automatically differentiated code must be tested in any case.
To verify the adjoint code, the derivatives of the cost function with respect to the initial state can be calculated using different methods. Here, the gradient $\nabla_{x_0} J$ with respect to the initial states as calculated by the adjoint model has been tested by finite differences and by tangent-linear model integration.
Using the finite differences method, the gradient $\nabla_{x_0} J$ for small $\delta x_{0k}$ can be approximated as

$$\frac{\partial J}{\partial x_{0k}} \approx \frac{J(x_0 + \delta x_{0k}\, e_k) - J(x_0)}{\delta x_{0k}}, \tag{9}$$

where $e_k$ is the $k$th unit vector. In Eq. (9), $J$ has to be continuously differentiable at $x_0$. Here, $x_{0k}$ is one component of the vector of the initial state $x_0$. Equation (9) shows that, for each variable, one additional run of the forward model is required to calculate the gradient. However, this method does not deliver an exact result. The quality of the result depends on the choice of $\delta x_{0k}$.
The second possibility for calculating $\nabla_{x_0} J$ requires the tangent-linear model $M'$. The derivative $\partial J / \partial x_{0k}$ can be calculated using the chain rule:

$$\frac{\partial J}{\partial x_{0k}} = \langle \nabla_{x_i} J, M'_i e_k \rangle.$$

Here, $e_k$ is the $k$th unit vector. As in the first method, one model run has to be performed per entry in $x_0$.
The adjoint model $M^{*}$ is the third possibility to obtain the gradient. All components of the gradient can be calculated in one single run of the adjoint model by Eq. (6).
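The three ways of obtaining the gradient can be compared on a small example. The sketch below uses a hypothetical nonlinear two-variable model with hand-written tangent-linear and adjoint steps; the model, the observations of the first variable and the perturbation size are assumptions chosen for illustration and have no relation to the CLM code.

```python
DT = 0.1  # time step of the hypothetical toy model

def step(x):
    """One step of a hypothetical nonlinear two-variable model."""
    t, w = x
    return [t - DT * t * w, w - DT * 0.5 * w]

def step_tl(x, dx):
    """Tangent-linear of `step` around state x, applied to perturbation dx."""
    t, w = x
    d_t, d_w = dx
    return [d_t - DT * (d_t * w + t * d_w), d_w - DT * 0.5 * d_w]

def step_ad(x, adx):
    """Adjoint (transposed Jacobian) of `step` around state x."""
    t, w = x
    a_t, a_w = adx
    return [a_t * (1.0 - DT * w), -DT * t * a_t + a_w * (1.0 - DT * 0.5)]

def trajectory(x0, n_steps):
    traj = [list(x0)]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj

def cost(x0, obs):
    """J = 1/2 sum_i (first variable at step i - obs_i)^2."""
    traj = trajectory(x0, len(obs) - 1)
    return 0.5 * sum((x[0] - y) ** 2 for x, y in zip(traj, obs))

def grad_finite_diff(x0, obs, eps=1e-6):
    """One additional forward run per component of x0."""
    base = cost(x0, obs)
    grad = []
    for k in range(len(x0)):
        xp = list(x0)
        xp[k] += eps
        grad.append((cost(xp, obs) - base) / eps)
    return grad

def grad_tangent_linear(x0, obs):
    """One tangent-linear run per unit vector e_k."""
    traj = trajectory(x0, len(obs) - 1)
    grad = []
    for k in range(len(x0)):
        dx = [1.0 if j == k else 0.0 for j in range(len(x0))]
        total = 0.0
        for i, (x, y) in enumerate(zip(traj, obs)):
            total += (x[0] - y) * dx[0]
            if i < len(traj) - 1:
                dx = step_tl(x, dx)
        grad.append(total)
    return grad

def grad_adjoint(x0, obs):
    """All components from a single backward sweep."""
    traj = trajectory(x0, len(obs) - 1)
    adx = [0.0, 0.0]
    for i in range(len(traj) - 1, -1, -1):
        adx[0] += traj[i][0] - obs[i]  # observation forcing at step i
        if i > 0:
            adx = step_ad(traj[i - 1], adx)
    return adx

X0 = [1.0, 0.5]
OBS = [1.2, 1.1, 1.0, 0.9]
G_FD = grad_finite_diff(X0, OBS)
G_TL = grad_tangent_linear(X0, OBS)
G_AD = grad_adjoint(X0, OBS)
```

In this sketch the tangent-linear and adjoint gradients agree to rounding error, whereas the finite difference result differs slightly and depends on `eps`.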
The advantage of the tangent-linear method is that the equivalence of the adjoint and tangent-linear methods can be validated exactly. On the other hand, there is still the problem that the automatic differentiation tool may introduce the same error into both calculations. In our case, we used TAPENADE as adjoint and tangent-linear compiler. We also applied the finite difference method for validation. Applying both the finite difference method and the tangent-linear method, it could be verified that our adjoint code development of the core of CLM is correct. In more detail, it turned out that the difference between the exact tangent-linear result and the finite difference method is less than 1 per mille if the choice of $\delta x$ is appropriate. In the case of the CLM plant respiration, it was found by this double-checking procedure that the highest TAPENADE optimisation level gave erroneous results. By reducing the optimisation, the correctness of the code could be directly proven.

Parameter impact
Data assimilation as a branch of inverse modelling seeks to optimise initial values, such as those of soil humidity and temperature in this study. It is tacitly assumed that these parameters are both insufficiently known and of high impact on the forecast skill. On the other hand, all other parameters are considered as sufficiently well known. However, in real cases this is often not true and significant model biases can be introduced. For the core differential equations of CLM, important parameters in Eqs. (7) and (8) include soil heat capacity $c$, thermal conductivity $\lambda$, hydraulic conductivity $k$ and soil water or capillary potential $\psi$, which are often coupled by soil classification with typical values. Further, the local net effect of sources and sinks of water, the latter mostly boundary conditions like precipitation, evaporation, ground water level variation, vegetation states and their impact, and horizontal run-off, can be difficult to observe and determine. In principle, all these parameters can be estimated by inverse modelling, given a first guess estimate of reasonably good quality, that is, one for which the respective tangent-linear assumption is valid. However, a situation with multiple ill-defined parameters will render the generalised optimisation problem extremely ill-posed, especially if vegetation parameters and unobserved, yet highly volatile, meteorological parameters like cloud-modulated insolation and turbulence are included. Surface albedo, in addition, will change with vegetation and soil moisture.
A pragmatic and practical way out of this problem can be found by test runs, where individual parameter variations exhibit parameter-specific perturbation fields in the model results. These exercises are especially valuable when the timescales of the error sources involved are strictly different. As an example, a modified soil heat conductivity produces distinct amplitudes of heat during a diurnal cycle, provided surface forcing engenders a strong enough signal.
A statistical approach to identify a sufficiently consistent analysis is given by assimilation diagnostics (Talagrand, 2010), most prominently by the cost function normalised by the number of observations $p$, for which a consistent system yields $\chi^2 = J(x_0)/p = 1/2$. Degradations by biased parameters are readily visible in sequences of variational data assimilation results, where a zigzag-like time series emerges, following the chain of data assimilation intervals. Upon redefinition of related parameters, this feature reduces significantly. In this study, several test runs have been performed with moderately varied soil parameters and surface albedo. The best parameter setting was chosen for the assimilation runs. This procedure reduces model biases, though this basic method cannot deliver optimised parameters like a data assimilation algorithm.
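The $\chi^2$ diagnostic can be monitored over a chain of assimilation windows with a few lines of code. In the following sketch the cost values, the number of observations per window and the tolerance are hypothetical; only the target value of 1/2 is taken from the text.

```python
def chi_square(cost_value, n_obs):
    """Normalised cost chi^2 = J / p."""
    return cost_value / n_obs

def flag_inconsistent(windows, tol=0.25):
    """Indices of windows whose chi^2 deviates from 1/2 by more than tol.

    `windows` is a list of (J, p) pairs; `tol` is an arbitrary
    illustrative threshold, not a value from the study.
    """
    return [i for i, (j, p) in enumerate(windows)
            if abs(chi_square(j, p) - 0.5) > tol]

# Hypothetical (J, p) pairs for three consecutive assimilation windows
WINDOWS = [(130.0, 240), (118.0, 240), (410.0, 240)]
SUSPECT = flag_inconsistent(WINDOWS)
```

A window flagged in this way would be a candidate for revisiting the assumed error statistics or the model parameters.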
To illustrate the importance of well-defined soil properties, we show results for two assimilation runs with different soil parameters. The assimilation included soil temperature and soil moisture measurements at the station Selhausen, which is located close to the station Merken and has a similar measurement setup (see Sect. 5). The simulation setup was also similar to the assimilation for the station Merken described in Sect. 5. In a first assimilation run, we used the soil type that was given in the description of the measurements, namely silt loam with 13 % sand and 17 % clay. The result for the soil temperature at 45 cm depth of the first run is shown in the left panel of Fig. 1. It is visible that the soil temperature is clearly overestimated in the first guess. The analysis of the soil temperature is of the same order of magnitude as the observations, but there are significant discontinuities visible at the ends of the assimilation intervals. In a second run, we changed the soil properties to 5 % sand and 25 % clay, which constitutes a finer soil texture but is still classified as silt loam. Using the finer soil texture, the background is closer to the observations of soil temperature at 45 cm depth, and the discontinuities in the analysis are smaller than in the first run (right panel of Fig. 1).

Idealised experiments
This section presents results from experiments with virtual measurements in an idealised environment. These experiments examine the assimilation algorithm in different configurations to expose its potential and limitations. In this way, the impact of changes in single parameters can be investigated without secondary effects.
A synthetic meteorology is used, which represents a day in June at mid-latitude (say 51° N) under clear sky conditions. Diurnal variations of solar radiation $R$ and temperature $T$ are prescribed by sine functions of time $t$, given in days. The amplitudes are set to $a_R = 700$ W m$^{-2}$ for insolation and $a_T = 10$ K for temperature. The mean temperature $T_0$ is set to 290 K. A constant breeze of 2 m s$^{-1}$ and a constant atmospheric humidity of 0.01 kg kg$^{-1}$ are assumed.
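A forcing of this kind could be generated as sketched below. The amplitudes and the mean temperature are those quoted in the text, whereas the phases (maximum insolation at noon, maximum temperature at 15:00) and the clipping of negative insolation to zero are assumptions made only for this illustration, since the exact functional forms are not reproduced here.

```python
import math

A_R = 700.0  # insolation amplitude in W m^-2 (from the text)
A_T = 10.0   # temperature amplitude in K (from the text)
T_0 = 290.0  # mean temperature in K (from the text)

NOON = 0.5      # assumed time of maximum insolation (days)
T_PEAK = 0.625  # assumed time of maximum temperature, 15:00 (days)

def solar_radiation(t):
    """Sine-shaped insolation; clipping to zero at night is an assumption."""
    return max(0.0, A_R * math.sin(2.0 * math.pi * (t - NOON + 0.25)))

def air_temperature(t):
    """Sine-shaped diurnal temperature cycle around T_0."""
    return T_0 + A_T * math.sin(2.0 * math.pi * (t - T_PEAK + 0.25))

# Half-hourly forcing for one day, matching a 30 min model time step
FORCING = [(k / 48.0, solar_radiation(k / 48.0), air_temperature(k / 48.0))
           for k in range(48)]
```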
All soil levels hold the same soil texture. The chosen soil type is loam, containing 40 % sand and 25 % clay. At the beginning of an assimilation interval, the relative humidity of the soil is set to the uniform value of 70 % in all soil levels. The soil is treated as bare, that is, there is no vegetation. This setup follows Schwinger et al. (2010), who performed sensitivity studies using the tangent-linear version of the CLM.
The experiments presented in this section contain the following steps: first, a forward run of the CLM is performed, which will be called background run or first guess in the following. Then, virtual measurements are defined, which markedly differ from the background run. After this, an assimilation run is performed and the resulting analysis is compared to the virtual observations and the background run. Error (co)variances in the cost function are considered to be similar for background and observations. For this reason, the impact of the observations on the analysis is as large as the impact of the background, which can easily be verified: the analysis should lie right in the middle between background and virtual measurements. In the following, several experiments are discussed.
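The expectation that equal error variances place the analysis midway between background and observation can be checked with the scalar form of the Best Linear Unbiased Estimate. The sketch below assumes a single observation of the analysed quantity itself, without model dynamics in between; the function name and the numbers are hypothetical.

```python
def scalar_analysis(background, observation, var_b, var_o):
    """Scalar BLUE: background plus gain times innovation."""
    gain = var_b / (var_b + var_o)
    return background + gain * (observation - background)

# Equal error variances: the analysis lies midway between the two values.
MIDWAY = scalar_analysis(285.0, 288.0, 1.0, 1.0)    # 286.5

# A more accurate observation pulls the analysis towards the measurement.
TRUSTED = scalar_analysis(285.0, 288.0, 1.0, 0.25)  # 287.4
```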

Assimilation of synthetic measurements
A first test is performed, aiming to exploit the potential of 4D-var to provide balanced analyses, that is, analyses in which no or only marginal disturbances or spin-up effects occur in the phase space evolution. In the parlance of dynamical systems theory, this implies adherence to the slow or central manifold in phase space. In Fig. 2 the assimilation result for a virtual temperature observation at 172 cm depth at the end of a 12 h assimilation interval is shown. The analysis produces a good result, in between the observation and the background, without any spin-up effects.
Other experiments are performed with soil temperature and soil moisture measurements in different environments (not shown). Single parameters, such as the depth of the measurement, initial soil moisture, length of assimilation interval and vegetation type, are changed in different experiments. To summarise, the analysis delivers a reasonable result whenever the initial values of the active variables have an impact on the measured value. This is the case for soil temperature and soil moisture measurements in deep soil layers, and also in upper soil layers in the case of dense vegetation or shorter assimilation intervals.

Interaction of soil temperature and soil moisture
This section explains how different active variables in the assimilation system interact with each other. As an example, it is shown that a measurement of elevated soil temperature influences soil moisture content. The analysis, with a changed soil moisture content, is able to better represent the measured temperature.
In this experiment, virtual measurements of soil temperature down to 50 cm soil depth are assumed. The measurement, at the end of the assimilation interval (06:00-18:00 UTC), is set to a temperature that is 3 K above the respective simulated temperature in soil layers 1-7. The relative humidity of the soil is 50 %. The vegetation type selected is corn.
Figure 3 shows the time evolution of the soil temperature profile. The left panel displays the result of the background run and the right panel presents the analysis. To increase visibility, every soil layer is plotted with the same vertical extent, whereas the thickness of the levels in the simulation is different (see Table 1).
The upper soil layers show a pronounced diurnal temperature cycle of up to approximately 10 °C in both the background and the analysis run. The separated bars on the right-hand side of each panel in Fig. 3 depict the virtual temperature measurement. In the analysis, there is a stronger warming of the upper soil layers than in the first guess. Accordingly, the temperatures of the analysis lie in between the measurement and the background. It is noticeable that the initial values of soil temperature have not changed much in the analysis.
Figure 4 shows the corresponding profiles of soil moisture. The left panel displays the volumetric soil moisture of the background simulation. At the beginning, soil moisture is constant in all soil layers. During the day, the upper soil layers become drier. This process starts first in the upper soil layers and is most pronounced there. In the analysis, shown on the right-hand side, the initial values are changed compared to the background run. The upper soil layers are drier than in the first guess. This causes lower evaporation rates at the surface. Thus, higher surface soil temperatures are achieved in the analysis by changing initial soil moisture values. It should be noted that adjusting the initial soil temperature alone would not significantly improve the fit to the measurements, since the surface temperature cycle in this specific case is controlled by the balance between absorbed solar radiation and latent and sensible heat fluxes. In this setup the assimilation algorithm changes this balance by changing the initial soil moisture values.

Assimilation of soil temperature and soil moisture observations
This section presents results obtained from assimilation of real soil temperature and soil moisture measurements. It is investigated to what extent the assimilation is able to improve the model result of these variables. Section 5.2.3 shows the influence of the assimilation on surface heat fluxes.

Setup
The measurements were taken in Merken (Germany, 50°48′ N, 6°24′ E) in summer 2009 during the FLUXPAT campaign. Graf et al. (2010) present the setting of a similar measurement campaign at the same location. The measurement station is placed on a barley field which was harvested in the middle of June. After this, it was a stubble field until young plants began to grow during August.
The measuring device for the soil temperature profile is a stick with five PT100 sensors at 2, 5, 10, 25 and 50 cm depth. Soil moisture is measured with two CS616 water content reflectometers, which measure soil moisture in parallel at 3 cm depth. All observations at this station are available every 10 min.
For this study, the CLM is run in offline mode. The meteorological input data are taken from high-resolution 24 h forecasts of the Weather Research and Forecasting (WRF) model, version 3.1 (Skamarock, 2008). The model domain consists of 109 × 119 grid boxes and covers an area surrounding the measurement station (50°12′-51°24′ N, 5°36′-7°12′ E). For both models, WRF and CLM, the same horizontal grid structure with a resolution of 0.01° in the north-south direction and 0.015° in the west-east direction is used. This corresponds to a horizontal resolution of about 1 km × 1 km.
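The quoted resolution of about 1 km × 1 km can be verified with a short calculation. The sketch assumes a spherical Earth with a mean radius of 6371 km and evaluates the west-east spacing at the latitude of the station; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumed)

def grid_spacing_km(dlat_deg, dlon_deg, lat_deg):
    """North-south and west-east grid spacing in km at a given latitude."""
    dy = math.radians(dlat_deg) * EARTH_RADIUS_KM
    dx = (math.radians(dlon_deg) * EARTH_RADIUS_KM
          * math.cos(math.radians(lat_deg)))
    return dy, dx

# Grid increments from the text, evaluated at 50.8 N (50 deg 48 min N)
DY, DX = grid_spacing_km(0.01, 0.015, 50.8)
```

Both spacings come out slightly above 1 km, in line with the stated resolution.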
The CLM time step is 30 min. All measurements taken on the full hour or half hour are included in the assimilation. The assumed errors for measurements and background information are listed in Table 2.
The simulation is run from June 2009 to August 2009. The CLM simulates the soil state in one single column of the model grid, where the measurement site is located. The assimilation interval comprises 24 h and starts at 00:00 UTC. Parameters for soil texture were adjusted as described in Sect. 4. For comparison, CLM is first run without data assimilation over the whole simulation period. This run will be referred to as the control run in the following.
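The selection of observations that coincide with model time steps can be illustrated as follows. This is a minimal sketch, not the preprocessing used in the study: the helper name `on_model_step` and the synthetic list of 10 min time stamps are assumptions made for illustration, and the check is only meaningful for time steps that divide one hour.

```python
from datetime import datetime, timedelta


def on_model_step(timestamp, step_minutes=30):
    """True if the time stamp falls exactly on a model time step."""
    return (timestamp.minute % step_minutes == 0
            and timestamp.second == 0
            and timestamp.microsecond == 0)


# Hypothetical observation times: one day of 10 min data (144 values).
start = datetime(2009, 8, 1, 0, 0)
obs_times = [start + timedelta(minutes=10 * i) for i in range(144)]

# Keep only observations on the full hour or half hour.
assimilated = [t for t in obs_times if on_model_step(t)]
print(len(obs_times), len(assimilated))
```

With a 30 min time step, one in three of the 10 min observations is retained, i.e. 48 observation times per sensor in each 24 h assimilation interval.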

Assimilation based analysis
In August 2009, the barley had already been harvested at Merken, but new plants had regerminated from lost grain.
The assimilation results for soil temperature at 5 cm depth are illustrated in Fig. 5. At this depth, the CLM first guess simulates soil temperatures which are too high. In the assimilation, these values are significantly improved, particularly the representation of the diurnal cycle. During two days at the end of the month, the analysis temperatures are 1-2 K higher than the measurements. On the other days, the differences between the analysis and the observations are small compared to the assumed observational error of 0.5 K.
At 50 cm depth, the measurements show lower temperatures than the CLM control run (see Fig. 6). The difference is approximately 2-3 K. The analysis is more consistent with the observations. The maximum difference between the analysis and the measurements is around 0.5 K, and in most cases lower. There are discontinuities visible at the beginning of the assimilation intervals. This suggests that the model is not yet able to properly reproduce the real situation, so that the assimilation algorithm has to correct these differences in every assimilation interval, which indicates remaining deficiencies in the model parameters.
In Fig. 7 the soil moisture at 3 cm depth is shown. In the control run, the soil moisture is clearly underestimated by the model. In the background run, based on the analysis of the day before, as well as in the analysis, the CLM simulation is in better agreement with the observations. The difference is in most cases lower than the assumed observational error of 4 %. In the analysis, there are also discontinuities at the edges of the assimilation intervals. The magnitude of these discontinuities is highly sensitive to the chosen error estimates of soil temperature and soil moisture. If, for example, the error of soil temperature is considered to be rather small, then the jumps in the analysis of soil moisture become quite large.
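The sensitivity of the soil moisture jumps to the assumed temperature error can be illustrated with a two-variable Best Linear Unbiased Estimate. The sketch below is a toy example and not the CLM 4D-var system: the linear operator `H`, the background error variances in `B`, the innovation and all function names are illustrative assumptions and are not taken from Table 2. The only physical ingredient is that wetter soil is assumed to lower the observed temperature.

```python
def matmul(a, b):
    """Multiply two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*a)]


def inv2(m):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def blue_increment(B, R, H, innovation):
    """Analysis increment B H^T (H B H^T + R)^-1 d of the BLUE."""
    BHt = matmul(B, transpose(H))
    S = matmul(H, BHt)
    S = [[S[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    gain = matmul(BHt, inv2(S))
    d = [[value] for value in innovation]
    return [row[0] for row in matmul(gain, d)]


# Illustrative assumptions: state = (initial temperature in K, initial moisture).
H = [[0.5, -20.0],   # observed temperature responds to both initial values
     [0.0, 1.0]]     # observed moisture responds to initial moisture only
B = [[1.0 ** 2, 0.0], [0.0, 0.05 ** 2]]  # assumed background error variances
innovation = [-2.0, 0.0]  # model 2 K too warm, moisture matches the observation


def moisture_jump(sigma_t_obs, sigma_w_obs=0.04):
    """Soil moisture increment for a given temperature observation error."""
    R = [[sigma_t_obs ** 2, 0.0], [0.0, sigma_w_obs ** 2]]
    return blue_increment(B, R, H, innovation)[1]


for sigma_t in (2.0, 0.5, 0.1):
    print(f"sigma_T = {sigma_t:3.1f} K -> moisture increment {moisture_jump(sigma_t):.4f}")
```

In this toy setting the moisture increment grows as the assumed temperature observation error shrinks, because a temperature observation that is trusted more forces a larger correction of the initial moisture on which it depends. This mirrors the qualitative behaviour described above, but the numbers carry no quantitative meaning for the Merken experiment.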

Energy fluxes
Measurements of energy fluxes are also available for the measurement site Merken. The instruments used were an ultrasonic anemometer (CSAT3, Campbell Scientific, Logan, UT, USA) and an H2O/CO2 gas analyser (Li7500, Li-Cor, Lincoln, NE, USA). The measurement method of a similar measurement campaign at the same location is described in Graf et al. (2010).
Figure 8 shows the measured and modelled sensible heat flux at the station Merken. Flux measurements were not included in the assimilation algorithm, which means that the differences between background and analysis are only due to the assimilation of soil temperature and soil moisture measurements. Figure 8 shows that the sensible heat flux is overestimated in the reference run, as visible for example on 4 and 24 August. The comparison between analysis and observations shows that the overall sensible heat flux of the analysis is closer to the observations than that of the background, so that the assimilation can improve the simulation results.
The results for the latent heat flux are shown in Fig. 9. The quality of the forecast varies strongly. On some days the background, as well as the analysis, fit the measurements very well, e.g. on 2, 8, and 13 August. In other cases, an improvement is visible in the analysis, as for example on 27 and 28 August. In the case of direct solar insolation on nearly bare soil, however, the latent heat flux analysis can degrade, due to loss of hydraulic contact of the soil skin with the lower soil layers. When the soil skin is heated and soil humidity is fully evaporated, the upper soil layers should intercept the latent heat flux, which is not well represented in the model. Therefore erroneously high latent heat flux values occur on 4-7 and 15-19 August, which are not seen in the observations.

Conclusions
To summarise, the results of the assimilation show that the developed assimilation system for the CLM is able to produce reasonable results, under the condition that the parameters of the model are chosen correctly. This study shows that the quality of the simulation result depends strongly on parameters of soil properties and vegetation, which are insufficiently known, and which are highly variable in space. The atmospheric impact is also an important factor, and a fully coupled SVAT-atmospheric 4D-var assimilation scheme including plant parameter optimisation is a target set-up. To obtain a good analysis, these parameters have to be optimised systematically. This is also possible in a data assimilation algorithm, and is scheduled for a later development phase.

Figure 1. Soil temperature in Selhausen for 15 to 18 July 2007 at 45 cm depth. The black dashed curve shows the control run, the black solid curve shows the CLM forecast based on the analysis of the previous day, while the blue curve depicts the analysis. Measurement values are shown as a red line. The left panel shows results for the original soil composition, containing 13 % sand and 17 % clay. The right panel shows the respective results for a soil containing 5 % sand and 25 % clay.

Figure 2. Assimilation of a virtual soil temperature observation in soil level 9 (172 cm depth) at the end of an assimilation interval of 12 h. The black curve shows the background result, the dotted black lines indicate the background error, and the blue dashed line shows the analysis. The observation is displayed in red.

Figure 3. Time development of soil temperature profile: CLM forecast without assimilation (background run, left panel) and analysis (right panel). In each panel, the separated columns next to the 18:00 temperature profile depict the virtual measurements in soil layers 1-7.

Figure 4. Time development of soil moisture profile: CLM forecast without assimilation (background run, left panel) and analysis (right panel).

Figure 5. Soil temperature in Merken in August 2009 at 5 cm depth. The black dashed curve shows the control run, the black solid curve shows the CLM forecast based on the analysis of the previous day, while the blue curve depicts the analysis. Measurement values are shown as a red line.

Figure 7. Soil moisture in Merken in August 2009 at 3 cm depth. The red dotted line displays values of a further instrument (see text). Otherwise, colours are as in Fig. 5.

Table 1. Soil layers in the CLM.

Table 2. Error estimates in the assimilation run for measurement site Merken.