Assimilation of High‐Resolution Soil Moisture Data Into an Integrated Terrestrial Model for a Small‐Scale Head‐Water Catchment

Land surface‐subsurface modeling combined with data assimilation was applied on the Rollesbroich hillslope (Germany). Dense information from a soil moisture sensor network was assimilated with the ensemble Kalman filter applying different scenarios including the update of model states with or without updating of saturated soil hydraulic conductivity on an ensemble size of 128 (or 256) realizations with 3‐D heterogeneous fields of Mualem‐van Genuchten parameters. Simulations were also carried out with a synthetic test case mimicking the Rollesbroich site, to get more insight in the role of model structural errors. The combination of joint updating of model states and hydraulic conductivity was more efficient in updating the soil water content than state updating alone for the real‐world case. On average, the root‐mean‐square error at the sensor locations was reduced by 14% if states and parameters were updated jointly, but discharge estimation was not improved significantly. Synthetic simulations showed much better results with an overall root‐mean‐square error reduction by 55% at independent verification locations in case of daily soil water content data assimilation including parameter estimation. Individual synthetic data assimilation scenarios with parameter estimation showed an increase of the Nash‐Sutcliffe‐Efficiency for discharge from −0.04 for the open loop run to 0.61. This shows that data assimilation in combination with high‐resolution physically based models can strongly improve soil moisture and discharge estimation at the hillslope scale. Large performance differences between synthetic and real‐world experiments indicated the limits of such an approach associated with model structural errors like errors in the prior geostatistical parameters.


Introduction
Modeling the soil water content (SWC) is of high interest for various geoscientific research fields. The SWC influences the water and energy cycles at the local, regional, and global scales . It controls, for example, the partitioning of net radiation into latent, sensible and ground heat fluxes, as well as the partitioning of precipitation into infiltration and runoff (Grayson et al., 1997;Robinson et al., 2008). Thus, a precise characterization and prediction of the SWC patterns is essential for understanding and quantifying the water and energy cycles for applications like weather prediction, flood prediction, and real-time irrigation scheduling.
Given this importance, recent modeling studies have applied integrated terrestrial model platforms (e.g., AquiferFlow-SiB2, Tian et al., 2012;CATHY, Bixio et al., 2002;Camporese et al., 2010;MikeShe, Abbott et al., 1986;Graham & Butts, 2005; ParFlow-CLM, Maxwell & Miller, 2005;Kollet & Maxwell, 2006; ParFlow-WRF, Maxwell et al., 2011;PIHM, Qu & Duffy, 2007;Kumar, 2009;TerrSysMP, Shrestha et al., 2014) for modeling two-way feedbacks between different terrestrial compartments. Coupled land surfacesubsurface models are well suited for in-depth investigation of SWC spatial variability in combination with in situ SWC data at the hillslope scale. Examples are studies on the controls of SWC variability (e.g., Cornelissen et al., 2014;Fatichi et al., 2015;Herbst et al., 2006;Ivanov et al., 2010) as well as the relation between SWC distribution and the rainfall runoff response (e.g., Herbst & Diekkrüger, 2003;Sciuto & Diekkrüger, 2010). These studies demonstrated that fully coupled land surface-subsurface models can be a valuable tool for a better understanding of the small-scale processes at the hillslope scale. However, due to the high computational needs of these models, studies were often conducted with uncalibrated parameters and without model uncertainty estimation. The uncertainty of modeled SWC is related to erroneous model forcings (e.g., precipitation), model parameters (e.g., soil hydraulic properties), and initial conditions, as well as model structural deficits.
With the help of data assimilation (DA) techniques, model predictions can be improved by merging (uncertain) observation data with uncertain model predictions (Burgers et al., 1998;Vrugt et al., 2005). Ensemble based sequential DA methods rely on a probabilistic framework where an ensemble of model realizations is propagated forward and updated each time observation data are available. One of the most commonly applied algorithms is the ensemble Kalman filter (EnKF; Evensen, 1994;Burgers et al., 1998). The EnKF is also used for updating model parameters, for example, by an augmented state vector approach (Chen & Zhang, 2006;Hendricks Franssen & Kinzelbach, 2008). This is especially important for subsurface applications, since parameter uncertainty is an important source of uncertainty for subsurface terrestrial system models. The computational costs for the EnKF are not excessive as the needed ensemble size is in general not too large (less than 1,000 ensemble members) and parallelization to speed up the simulations is trivial (Kurtz et al., 2016;Pauwels & De Lannoy, 2009). This is one of the reasons that the EnKF has been used in combination with different types of terrestrial system models (e.g., atmospheric models, hydrological models, and land surface models) and various kinds of observation data.
Only relatively few studies at the hillslope scale combine DA with integrated terrestrial system models. Nevertheless, some authors were able to show the high potential of the EnKF in combination with integrated hydrological models. For example, Camporese et al. (2009) found that streamflow predictions improved by assimilating different combinations of pressure head and streamflow data. They performed the DA experiments for a synthetic tilted v-catchment with a fully coupled surface water-groundwater flow (CATHY) model including 3-D subsurface flow and found that pressure head data always improved the characterization of streamflow, whereas standalone streamflow assimilation showed less improvement. This was also confirmed by Bailey and Baù (2012) who estimated the spatial distribution of hydraulic conductivity by the assimilation of streamflow and water table data in a similar test case. Shi et al. (2014) assimilated six data types (discharge, groundwater table depth, SWC, land-surface temperature, sensible heat, latent heat, and transpiration) with the EnKF and the land surface-subsurface model Flux-PIHM mimicking a small forested catchment. They estimated several, but spatially homogeneous, Mualem-van Genuchten soil hydraulic properties and found an improvement of the model simulations in association with a high sensitivity of the subsurface soil hydraulic properties (i.e., the Mualem-van Genuchten air entry and shape parameters). Shi et al. (2015) confirmed these findings also for experiments with real-world observations of the Shale Hills catchment in Pennsylvania, USA. Both studies by Shi et al. (2014Shi et al. ( , 2015 were conducted with a simplified representation of subsurface flow, namely, 2-D groundwater flow and 1-D flow in the unsaturated zone. Some studies with integrated hydrologic models focused more on experimental design. For instance, in a synthetic experiment, which mimicked the Biosphere 2 Landscape Evolution Observatory hillslopes, Pasetto et al. (2015) estimated saturated hydraulic conductivity values and investigated the impact of the number and spatial distribution of assimilated SWC data for a hillslope on the characterization of the hydrologic response of the CATHY model with a 3-D synthetic subsurface. If the number of sensors fell below a certain threshold (100 instead of 496), the model was not able to reproduce the synthetic truth by SWC assimilation with the EnKF. Rasmussen et al. (2015) investigated the relationship between the number of ensemble members and the number of pressure head measurement data on the estimation of the discharge rate with help of the ensemble transform Kalman filter. They performed this study for the agriculture dominated Karup catchment in Denmark and concluded that less head observations required more ensemble members to reproduce the synthetic discharge and pressure head observations. Assimilating discharge observations in combination with parameter estimation required a larger ensemble size than the assimilation of groundwater table measurements. Zhang et al. (2015) found that the EnKF performance is very sensitive to the estimation of the initial model parameter uncertainty. The majority of these examples are limited to synthetic test cases or, in case of real-world case studies, strong simplifications of the subsurface compartment. In particular, complex subsurface structures (i.e., fully 3-D heterogeneous fields of subsurface hydraulic properties) and their impacts on the EnKF performance within a fully coupled land surfacesubsurface model at high resolution have not yet been investigated.
DA studies in combination with highly nonlinear vadose zone flow models and heterogeneity of soil hydraulic parameters face additional challenges. The EnKF shows an optimal performance for Gaussian distributions, but strongly skewed pressure head distributions in the vadose soil zone can be expected under very dry conditions (Erdal et al., 2015;Zhang et al., 2018). Furthermore, it is important that the spatial structure of the heterogeneous soil hydraulic parameters (i.e., horizontal layers) is well captured. Otherwise, a calibration with the EnKF leads to poor results (Erdal et al., 2014). DA experiments for different soil types (e.g., Li & Ren, 2011;Montzka et al., 2011;Tran et al., 2014) also indicated that, besides stratification and heterogeneity, the capacity of DA to adequately characterize model states and parameters depends on the soil type and its related hydraulic properties. Within this context, Li and Ren (2011) indicated that the joint update of multiple hydraulic properties (i.e., Mualem-van Genuchten inverse air entry (α) and shape parameter (n) in combination with hydraulic conductivity (K s ) by the EnKF using a 1-D subsurface hydraulic model is more efficient than updating hydraulic conductivity alone. Other examples for the joint update of hydraulic conductivity, α, and n in combination with the EnKF and a 1-D subsurface model are given by Margulis (2011, 2013). Montzka et al. (2011Montzka et al. ( , 2013 used the particle filter for updating these soil hydraulic parameters in a 1-D model. The vast majority of 2-D and 3-D studies where soil hydraulic properties are estimated only update saturated hydraulic conductivity, since the estimation of other soil hydraulic parameters is associated with numerical instability (Rasmussen et al., 2015;Vereecken et al., 2016). Exceptions are studies by Shi et al. (2014Shi et al. ( , 2015 and Pasetto et al. (2015). However, these studies made various simplifications. Shi et al. (2014Shi et al. ( , 2015 assumed spatially homogeneous zones within their subsurface setup. Pasetto et al. (2015) estimated 3-D spatially distributed hydraulic conductivity and porosity under assumption of perfect knowledge of the remaining hydraulic parameters. Chaudhuri et al. (2018) estimated 3-D spatially distributed hydraulic conductivity and Mualem-van Genuchten parameters with an iterative filter for a synthetic case. The impact of only updating saturated hydraulic conductivity on the performance of an integrated 3-D terrestrial model for a real-world case, which includes also other spatial heterogeneous and uncertain soil hydraulic parameters (e.g., Mualem-van Genuchten α, n), has not yet been explored.
This gap of applying the EnKF on a complex high-resolution integrated terrestrial model at the field scale with 3-D heterogeneous fields of Mualem-van Genuchten subsurface parameters using real-world data is covered in this study. More specifically, SWC observations were assimilated in the Terrestrial Systems Modeling Platform (TerrSysMP) developed by Shrestha et al. (2014), which was coupled to the Parallel Data Assimilation Framework (PDAF) DA framework (Nerger & Hiller, 2013) by Kurtz et al. (2016). The physically based model ParFlow (Ashby & Falgout, 1996;Jones & Woodward, 2001;Kollet & Maxwell, 2006) and the Community Land Model (Oleson et al., 2004;Oleson et al., 2008) of TerrSysMP were used. ParFlow and Community Land Model (CLM) are fully coupled via source/sink terms at root zone layer. This allows transient interaction between 3-D subsurface flow, overland flow, and land-surface processes. Data sets collected in the context of the infrastructure initiative Terrestrial Environmental Observatories (TERENO; Zacharias et al., 2011), and the Transregional Collaborative Research center (TR 32;Simmer et al., 2015) were used for assimilation and also for verification. In particular, SWC, evapotranspiration (ET), and discharge measurements were available for this study in the Rollesbroich headwater grassland catchment in the Eifel (Germany). The simulations were made with fully 3-D heterogeneous fields of soil hydraulic properties which results in a model system with 0.3 million unknowns. DA experiments were both made for the real-world case and a synthetic case, which mimics the real-world case, with the aim to unravel the role of model structural errors. In the DA-experiments different combinations of state (i.e., pressure head) and parameter (i.e., saturated hydraulic conductivity) updating were explored. The following research questions are addressed with the conducted experiments: 1. Can the characterization of the hydrology at the hillslope scale (SWC, ET, and discharge) be improved with a combination of physically based integrated hydrological models and a dense network of SWC data using DA? 2. Is the joint update of model states and saturated soil hydraulic conductivity sufficient to adjust the model simulations closer to the observations or is it necessary to include additional soil hydraulic properties (e.g., inverse air entry) within the parameter update to constrain this complex nonlinear coupled model system with spatially heterogeneous soil hydraulic properties? 3. Is there a systematic difference in the performance of DA between synthetic experiments and real-world experiments which points to processes which are not captured by the high-resolution integrated model?

Study Site
The Rollesbroich study site (50°37′ 27″N, 6°18′ 17″E) is located in the Eifel low mountain range (Germany). The Rollesbroich study site is a subcatchment of the river Rur and has an area of~0.38 km 2 , an altitude varying between 474 and 518 m above sea level and slopes between 0% and 10%. As a part of the TERENO infrastructure, the Rollesbroich test site is an extensively managed grassland site dominated by smooth meadow grass (Poa pratensis) and perennial ryegrass (Lolium perenne). The prevalent soils are Cambisol (gleyic), Stagnosol and Cambisol-Stagnosol (Food and Agriculture Organization classification) with a depth between only 0.5 and 1.5 m Koyama et al., 2010). The underlying bedrock is sandstone and siltstone with a heavily weathered top layer (saprolite) of~0.1-to 0.5-m thickness. Data from a meteorological station in approximately 4-km distance operated by the North Rhine-Westphalian State Environment Agency (LUA NRW) for the period 1981-2001 show that the study site has an annual mean temperature of 7.7°C and a mean yearly precipitation of 1,033 mm. A drainage system in the northwestern part of the study site prevents flooding induced by high groundwater tables. The outlet of the drainage system is located close to the Kieselbach source. The diameters of the~80-year-old clay pipes vary between 3 and 20 cm. However, the state of the pipes (possibly broken or blocked) and their detailed impact on the hydrological characteristics of the study site are uncertain. Figure 1 provides an overview map of the study site and the measurement equipment used in this study.
Precipitation measurements at the study site are performed with an interval of 10 min and a resolution of 0.1 mm using a standard Hellmann type tipping bucket balance (TB) rain gauge (ecoTech GmbH, Bonn, Germany). The device setup 1 m above ground is in agreement with the policy of the German weather service for areas with heavy snowfall and altitude >500 m above sea level (World Meteorological Organization guidelines recommend 0.5 m). To avoid measurement errors due to instrument freezing, the device is temporally heated during the winter period.
Latent and sensible heat fluxes are measured by an eddy covariance station at the southern part of the study site (50°37′ 19″N, 6°18′ 15″E, 514 m above sea level). At this location also temperature and air humidity are recorded by a HMP45C, Vaisala Inc., Helsinki, Finland, at 2.58 m above ground. Furthermore, a fourcomponent net radiometer (CNR-1, Kipp and Zonen, Delft, The Netherlands) measuring incoming and outgoing longwave and shortwave radiation, a sonic anemometer (CSAT3, Campbell Scientific, Inc., Logan, USA) recording wind speed and direction are installed there at 2.6 m above ground surface. At this location also a gas analyzer (LI7500, LI-COR Inc., Lincoln, NE, USA) for specific humidity and air pressure measurements is installed. These on-site data were used as hourly forcing for the CLM model simulations for 2011. For 2010 and for gap filling off-site meteorological data from the nearby LUA NRW station were applied instead. Furthermore, the data were used to calculate actual ET, which was used for verification in the project.
The SWC at the study site is measured with a wireless sensor network (SoilNet; Qu et al., 2013). SoilNet data were used in the DA and for model verification. At each of the 179 sensor locations two redundant SPADE ring oscillator sensors (Model 3.04, sceme.de GmbH i.G., Horn-Bad Meinberg, Germany) are vertically installed at 5-, 20-, and 500cm depths within a distance of~10 cm to each other. This increases measurement volume and precision and helps to avoid data inconsistencies (e.g., contact issues with the soil matrix). Additional technical details can be found in Qu et al. (2013) and Qu et al. (2016). A long-term evaluation of the SWCs measured with the SPADE sensors at the study site indicated a drift in SWC values for individual SoilNet locations. The SWC data exhibited a gradual increase of maximum SWCs during saturated soil conditions for the period 2011 to 2013, which was not in line with the annual precipitation trend (Gebler et al., 2017). A linear SWC trend correction was applied which takes the maximum SWC of the wet periods in 2011 and 2012 into account. Trend correction was initially conducted for 82 SoilNet locations located in the southern part of the study site and separately for all three sensor depths (Gebler et al., 2017). This data set was further reduced to 61 sensor locations, since only gap-free SWC time series were used for DA. The sensors of the northern part of the study site were not included in our experiments as SWC values for the northern part of the study site were not available before 2012.
Discharge measurements for the Kieselbach, used for verification in this study, are conducted with a Venturi-Gauge weir close to the catchment outlet and two upstream Tomson gauges close to the headwaters of the Kieselbach (Qu et al., 2016).

TerrSysMP
For this study, ParFlow and CLM were applied in coupled fashion within the TerrSysMP (Shrestha et al., 2014) using the external coupler OASIS3-MCT (Valcke, 2013). While ParFlow simulates subsurface flow as well as overland flow, CLM calculates land-atmosphere exchange fluxes, snow, and vegetation processes. Both components were used in combination with the PDAF (Nerger & Hiller, 2013), which was implemented by Kurtz et al. (2016).
The subsurface model ParFlow simulates transient, variably saturated subsurface flow two-way coupled with overland flow at a high spatial resolution. On this account, ParFlow is designed to run in parallel on highperformance computing platforms (Kollet et al., 2010;Maxwell, 2013). Driven by external boundary conditions, ParFlow calculates the water pressure and saturation field as function of time. It solves the threedimensional Richard's equation (Richards, 1931) using a cell-centered finite differences scheme in space: where S s is the specific storage coefficient (L −1 ), S w relative saturation (−), ψ pressure head (L), t time (T), φ porosity (−), q the volume specific Darcy flux (L/T), q s a source/sink term (T −1 ) (e.g., pumping and injection), q e a general source/sink term that represents exchange fluxes between surface water and subsurface (L/T), m′ an interfacial thickness (L), K s saturated hydraulic conductivity tensor (L/T), k r relative permeability (−), and z positive downward vertical coordinate (L).
Relative saturation and permeability in ParFlow are expressed by the Mualem-van Genuchten relationships (van Genuchten, 1980). The subsurface parametrization in ParFlow can be done with constant input or spatially distributed input files like the ones generated with geostatistical conditioning methods (i.e., turning bands algorithm and parallel Gaussian simulation). Groundwater and overland flow in ParFlow are coupled via a backward Euler scheme in time using a free-surface overland flow boundary condition (Kollet & Maxwell, 2006). Shallow overland flow is simulated by the kinematic wave model and the 2-D momentum equation for streamflow is resolved without parameterized routing routines. Thus, the free surface flow is exclusively driven by gravity. The flow discharge relationships are described by Manning's equation (Kollet & Maxwell, 2008;Maxwell & Miller, 2005). For robust solutions of the subsurface and overland flow equations, the Newton-Krylov method in combination with multigrid preconditioning as described in Jones and Woodward (2001) is computed in a globally implicit manner. With an optional terrain-following grid (TFG) transformation which was introduced by Maxwell (2013), the computational efficiency of ParFlow can be improved, since TFG enables a reduction of model layers for study areas with large topographic gradients.
CLM v.3.5 is the upper boundary condition of ParFlow. Both models are dynamically coupled via source/sink terms at 10 overlapping subsurface horizons (root zone layer) of variable extent. This allows a transient feedback in two ways. Via the OASIS3-MCT coupler (Valcke, 2013), CLM provides ET to ParFlow and controls infiltrating precipitation. ParFlow passes SWC and pressure to the CLM subsurface. All water and heat fluxes in the CLM are calculated according the principles of mass and energy conservation. The CLM hydrologic cycle interacts with the land surface heat fluxes and accounts for snow, soil infiltration, and vegetation water uptake as well as for precipitation interception by plant foliage, throughfall, and stem flow. All these water fluxes can add (source) or extract (sink) water from the upper 10 subsurface horizons which are coupled with ParFlow. In the coupled ParFlow-CLM model, water flow in the subsurface, SWC content, and pressure heads is calculated by ParFlow. In the CLM, the Monin-Obukhov similarity theory according to Zeng et al. (1998) is used for calculating latent and sensible heat flux. The boundary layer resistance terms are implemented using the Biosphere-Atmosphere Transport Scheme, which was developed by Dickinson et al. (1993) and adapted by Zeng et al. (2005). The CLM distinguishes between five different land surface classes (glacier, lake, wetland, urban, and vegetation). The vegetated area can be further subdivided into plant functional types (Bonan et al., 2002) for which physiological plant parameters and other variables are defined. In the coupled mode, the TOPMODEL-based CLM runoff scheme and CLM variably saturated subsurface flow are replaced by ParFlow. Atmospheric forcing data (temperature, precipitation, solar radiation, humidity, and barometric pressure) are provided in a spatially distributed manner.

State and Parameter Estimation With the EnKF
In this study, the EnKF was applied to the ParFlow-CLM model of the Rollesbroich study site. The EnKF is an ensemble based sequential DA method to improve the estimation of model states (and possibly parameters) on the basis of optimally combining information from model simulations and observations.
The assimilation cycle of the EnKF is a two-step process consisting of a forecast and an analysis, which are repeated sequentially. In the forecast step, different model runs (in our case 128 or 256 which are also called stochastic realizations) with the ParFlow-CLM model (Μ) are propagated forward in time. The ParFlow-CLM model, as introduced in section 2.2, solves equations (1) and (2) for subsurface flow by the finite volume method, and in addition, it solves overland flow and the exchange of water and heat between the land and the atmosphere. The stochastic realizations differ because they use different initial conditions, saturated hydraulic conductivity, and precipitation as input. These are considered the main sources of uncertainty. The forecast step is given by the following: where x f ;t i represents the predicted model state vector at time step t, x t−1 i is the model state vector of the previous time step t − 1, p i represents all the model parameters, and q t i are the model forcings for each ensemble realization i (i = 1,2,…,n realz ).
The prognostic variable in ParFlow is pressure. The model state vector x f ;t i consists of the pressure head (ψ) simulated for the N different grid cells (x f ;t i ¼ ψ f ;t i ). In this study, SWC (θ) observations are assimilated: where y t i is the observation vector and ε iy is a vector whose elements contain random values drawn from a normal distribution N o; σ 2 ð Þwith mean zero and standard deviation σ (Burgers et al., 1998). The dimension of all the vectors in equation (4) is equal to the number of observations M assimilated for a certain time step.
The pressure heads in the model state vector are updated by assimilating the SWC observations: where x a i is the updated state vector, K the Kalman gain, y t i is the observation vector, and y f ;t i are the forecasted observations according y f ;t i = h(x f ;t i Þ where h is the nonlinear measurement operator, which handles the mapping between (measured) SWC and (simulated) pressure via the Mualem-van Genuchten relationship according to Reichle et al. (2002).
The Kalman gain K is calculated according where Cov indicates covariances and R (M × M) is the observation error covariance matrix, which contains on its diagonal the SWC measurement error variances, which are defined prior to the simulations in correspondence to equation (5).
The model covariance matrix is not completely calculated in equation (6), as only the covariances between the measurements and the model states are needed. The complete model covariance matrix can be determined from the ensemble of model forecasts, where x f ;t contains the average model states calculated over all realizations n realz at each time step: The EnKF is used in this work to also update uncertain parameters (e.g., Chen et al., 2011;Chen & Zhang, 2006;Hendricks Franssen & Kinzelbach, 2008).
The augmented state vector for updating both states and parameters is as follows: where x is now the augmented state vector including pressure heads (ψ) (L) and the logarithm of soil hydraulic conductivities (Y = log 10 K s (L/T)) at all model grid cells N. This implies that for the case of joint state-parameter updating the augmented state vector is of dimension 2N.
In some of the scenarios where both states and parameters are updated, a damping factor (α) was used to reduce filter inbreeding (Hendricks Franssen & Kinzelbach, 2008). Filter inbreeding is the underestimation of the ensemble variance after applying the EnKF updating equation repeatedly. This underestimation of the variance is related to the low rank approximation of the covariance matrix by a limited number of ensemble members and the linearization of the relation between states and parameters. A damping factor reduces the impact of filter inbreeding because parameter updates are made in smaller steps, limiting the impact of nonlinear relations between parameters and states, which are not captured by the covariances, and limiting the impact of spurious covariances. This results in the following updating equation for the joint state-parameter estimation problem: where x a i is the updated augmented state vector after DA and α T a vector with damping factors, which are 1 for updating states and between 0 and 1 for updating parameters.

The TerrSysMP-PDAF Setup
The PDAF (v1.16) module coupled to TerrSysMP was used for the assimilation of SWC observations from SoilNet infrastructure installed at the Rollesbroich study site. Figure

Setup of the DA Experiments
The Rollesbroich model domain covers an area of 1,280 × 1,120 m with a lateral resolution of 10 × 10 m and a total depth of 3.2 m. This results in a model domain of 128 × 112 × 20 (286,720) grid cells and includes the Rollesbroich catchment area of 38 ha as well as parts of the Rollesbroich residential area, streets, and other anthropogenic artificial structures. The slopes in x and y directions, which represent the topographic driven overland flow in the domain, were calculated with the help of a digital elevation model. Further details of the digital elevation model preparation can be found in Gebler et al. (2017). A Manning's roughness value of 0.001 hr/m 1/3 accounts for high (bank) vegetation density and the small stream channel bed of approximately 30 cm. The lower boundary was set impermeable at 3.2-m soil depth. The ParFlow subsurface domain was built with a variable vertical resolution making use of the TFG approach (Maxwell, 2013). From surface to bedrock the model resolution was gradually reduced with depth. The first layer with a vertical extent of 0.025 m is followed by 10 layers of 0.05-m vertical resolution and a twelfth layer with 0.1-m depth. The vertical resolution then further decreases from 0.2 m (Layer 13-17) to 0.5 m (Layer 18-19), and 0.575 m (Layer 20) for the deepest layer (bedrock).
The subsurface parameterization scheme mainly follows the heterogeneous subsurface setup by Gebler et al. (2017) with exception of the bedrock layers. For the assimilation experiments, the subsurface was separated into three soil horizons and two bedrock layers with spatially heterogeneous two-dimensional stochastic fields of Mualem-van Genuchten hydraulic properties. These stochastic fields include saturated hydraulic conductivity (K s ), inverse air entry (α), shape factor (n) as well as residual (θ r ) and saturated water content (θ s ) which also contain porosity information. A detailed description of the parameter sampling is given in section 3.2. For the weathered siltstone and sandstone bedrock layers preferential flow through fractures is very likely. To mimic these characteristics, the bedrock was subdivided into two separate layers. The mean hydraulic conductivity (log 10 K s ) of the bedrock was increased to the magnitude of a sandy soil to enable an adequate drainage of the upper soil horizons and a high lateral flow component in the bedrock layer which also mimics the efficiency of the drainage system. Figure 3 illustrates the simulation setup with the different horizons. The default CLM subsurface setup with 10 exponentially increasing subsurface layers was adapted to match the before mentioned extent of 10 corresponding ParFlow model layers (0.475-m total depth). The CLM vegetation parameterization mainly relies on standard spatial uniform CLM C3-grass parameters with a leaf area index varying between 0.3 and 3.0 over the year. The CLM default root distribution parameters (roota and rootb) were adapted in accordance with the modifications made for the CLM subsurface extent (roota: 10.6; rootb: 6.0). This modification, which assigns 90% of grass roots within the upper 30-cm soil, is in line with literature values (e.g., Brown et al., 2010). No flow boundary conditions were imposed for the northern, eastern, southern, and western domain boundaries.
Precipitation, air pressure, wind speed, specific humidity, and incoming shortwave and longwave radiation from on-site measurements (2011) and the nearby LANUV station (2010) were used as input to CLM in hourly time steps (Gebler et al., 2017).
In total, a period of two years (2010-2011) was simulated with ParFlow-CLM.

PDAF Setup and Assimilation Scenarios
First, ParFlow-CLM open loop simulations (Table 1) for the entire ensemble and the entire simulation period (January 2010 to December 2011) were performed. This period includes the model spin-up, which was conducted for a period of 16 months (January 2010 to April 2011) beginning with an initial hydrostatic equilibrium condition and a groundwater table at 1.5-m depth. Second, several assimilation variants (Table 1) were tested with the EnKF assimilating SWC data taken from 61 locations in the southern part of the study site at 5-, 20-, and 50-cm depths. More specifically these variants are as follows: 1. State update on daily basis 2. Joint update of states and hydraulic conductivity on daily basis 3. Joint update of states and hydraulic conductivity where states are updated daily and hydraulic conductivity only each 5 days. In addition a damping factor (α = 0.1) was applied for the parameter update.
In order to investigate potential model structural errors or methodical issues with the EnKF, the aforementioned variants were also tested with synthetic data mimicking the real-world data at identical measurement locations. This includes a scenario using deterministic Mualem-van Genuchten α and n using the true parameter set of 2-D heterogeneous layers for the entire ensemble. Furthermore, a scenario with biased saturated soil hydraulic conductivity (log 10 K s : +0.5 in all soil layers) and an incorrect spatial correlation structure (increased variogram range by factor 2) was investigated. The impact of ensemble size was also explored. The selected synthetic reference truth was an ensemble member of the open loop ensemble run (i.e., SYN_OL), and synthetic measurement data were extracted from this reference run at 61 observation locations at 5-, 20-, and 50-cm depths. The synthetic observations were used in all synthetic experiments including experiments with biased K s and erroneous spatial correlation structure. The assimilation period for realworld and synthetic experiments was from 1 May to 31 December 2011. The synthetic cases were further evaluated against 63 random locations ( Figure 1) different from the assimilated locations in 5-, 20-, and 50-cm depths. In addition, two layers of the upper and lower bedrock in 130-and 190-cm depths were examined for verifying the performance of the DA runs.
The standard deviation of the measurement error was set to 0.04 cm 3 /cm 3 and assumed to be spatially uncorrelated for synthetic and real-world scenarios. Surface cells with overland flow (i.e., with saturated conditions) were excluded from the update. This avoids numerical instability by an on/off switching of the surface runoff generation which is caused by the perturbation of the pressure field after the state update.

Ensemble Generation
In order to account for input uncertainties, model forcings as well as soil hydraulic parameters of the ParFlow-CLM model were perturbed. This resulted in 128 (or 256) different stochastic model realizations for the synthetic and real-world experiments. Hourly precipitation, assumed homogeneous for the catchment, was perturbed according to a normal distribution by multiplying with a Gaussian random value ðN (1, 0.15 2 )). As precipitation measurements were available in the catchment, it was assumed that the uncertainty with respect to this forcing was mainly related to measurement uncertainty and therefore relatively small which justifies the Gaussian assumption.
The creation of an ensemble of uncertain subsurface hydraulic properties departed from van Genuchten soil hydraulic properties estimated by HYDRUS-1D (Šimůnek et al., 2008) and the Shuffled Complex Evolution algorithm (SCE-UA; Duan et al. (1992) for 82 locations in the southern part of the study area. This approach was already demonstrated to capture the spatial variability at the Rollesbroich test site (Qu et al., 2014) and tested in combination with a 3-D ParFlow-CLM model (Gebler et al., 2017). On this account, Gebler et al. (2017) extended the initial approach by an uncertainty assessment of the estimated soil hydraulic properties (Figure 4), which was not provided by the original SCE-UA parameter estimation. Therefore, the discrepancy between the measurement support (at the optimization location) and the model discretization was considered on the basis of ordinary block kriging (Burgess & Webster, 1980) using the VESPER (Whelan et al., 2002) software in an unconditional manner at 10 × 10-m model resolution for all soil hydraulic parameters. The estimated uncertainty was similar to the uncertainty that alternatively was derived with the help of measured texture data and the ROSETTA (ROS) pedotransfer function (Schaap et al., 2001; e.g., Mualemvan Genuchten inverse air entry α: ±1.3 cm −1 [SCE-UA]; ±1.5 cm −1 [ROS]) as clarified in Gebler et al. (2017). These data (optimized SCE-UA parameter with uncertainty estimation by block kriging) were the basis for the sampling of 128 (256) stochastic realizations with 3-D heterogeneous fields of soil hydraulic properties which were used in the DA approach.
The subsequent sampling of Mualem-van Genuchten soil hydraulic properties (Figure 4) consists of four major steps. First, for the upper three layers, samples were taken from the multivariate distribution of the soil hydraulic parameters for each model realization. This distribution is defined by the individual parameter mean values and (co)variances, which were given by the optimized 1-D parameter set and the available uncertainty at each spatial location and depth. For the underlying siltstone and sandstone bedrock (Horizons 4 and 5), the hydraulic properties were randomly sampled at each location on the basis of Note. The open loop simulations refer to the synthetic reference (SYN_OL) or to the associated real-world case (REAL_OL). The "ST" refers to simulation with exclusive state update, while "PAR" includes simulation with state and parameter update. The letter "d" indicates scenarios with damping. The "Ks" specifies simulation scenarios with biased saturated conductivity and incorrect spatial correlation in all soil layers. Some parameters were not available in specific scenarios ("n/a"). The assimilation period for the data assimilation scenarios is May-December 2011.
parameter ranges provided by Bogena (2003). The bedrock hydraulic properties were further artificially increased to the magnitude of a sandy soil with the aim to mimic the hydraulic characteristics of the saprolite and the on-site drainage system. The hydraulic conductivity of the lower bedrock horizon (1.2to 1.5-m thickness) was more increased than the upper bedrock horizon (0.2-to 0.5-m thickness). This guarantees a realistic drainage of the lower soil horizon.
Second, the logarithmic hydraulic conductivity (log 10 K s ) at each location of the upper layers (5, 20, and 50 cm) was perturbed with a spatially homogeneous normal distributed Gaussian random value N (0, 0.25 2 ) to create more variable logarithmic hydraulic conductivity averages between different realizations for the individual soil horizons. With this procedure, additional model uncertainty was taken into account which may originate from an underestimation of the 1-D inversion uncertainty for hydraulic conductivity.
Third, a spatial heterogeneous field of log 10 K s was generated with sequential Gaussian simulation using GCOSIM3D (Gómez-Hernández & Journel, 1993) independently for each soil layer using the generated log 10 K s samples for the individual sensor locations. The required variogram parameters (range, sill, and nugget) were estimated by fitting the experimental semivariograms for the different soil layers to an exponential model for each soil layer (Horizons 1-3). For the upper and lower bedrock layer only very limited information regarding the spatial dependence of the hydraulic properties was available. Therefore, the variogram parameters of Horizon 3 for the bedrock stochastic simulations were used.
In the fourth and last step, log 10 α, log 10 n, θ r , and θ s were adapted according to their relations with log 10 K s . The logarithmic transform was also used for α and n as the spatial variability of these parameters can in general not be described very well with the Gaussian approximation (e.g., Carsel & Parrish, 1988). The spatially heterogeneous log 10 K s fields, obtained in the third step, render log 10 α, log 10 n also spatially variable using the multivariate relationships of log 10 K s with the other variables. This results in stochastic realizations for the different soil and bedrock layers of log 10 K s , log 10 α, log 10 n, θ r , and θ s . For the different scenarios 128 (256) realizations were used in the DA procedure in ParFlow. In case of synthetic scenarios with deterministic α and n the true realization values were taken for the entire ensemble.

Performance Validation
The performance of the open loop and different DA scenarios was evaluated with measured or synthetic observations of daily SWC, daily ET and daily discharge. The Nash-Sutcliffe efficiency (NSE) index, the root-mean-square error (RMSE), and bias (BIAS) were calculated for the forecasted state variables of each simulation scenario.
The NSE was calculated according to the following: where x t j are the simulated ensemble mean values at the observation (or verification) locations for the jth sample at time t, y t j are the observed data (or synthetic observations from the synthetic reference), y j the average of the observed data over time, n sample the number of samples (per layer) , and n tstep the total number of time steps. The NSE range is between −∞ and 1.0. Negative values indicate unacceptable simulation performance whereas positive values suggest good performance (Nash & Sutcliffe, 1970;Moriasi et al., 2007).
The model bias (BIAS) and the relative model bias in percent (PBIAS) are given by the following: The RMSE (equation (13)) was calculated to compare observed values with the ensemble mean from the simulations: The RMSE was also calculated for individual locations. These RMSEs at individual locations were averaged for each individual soil layer, which was a further performance statistics to evaluate the model simulations.  The differences between the individual DA scenarios are rather small. The different DA runs show larger differences in terms of SWC differences in May as well as in October and November 2011. Another notable difference is the reduced ensemble spread of SYN_PARαn_1, which is indicated by a~75% smaller standard deviation of mean SWC compared to other synthetic scenarios. SYN_OL shows a good performance (Table 2) for the upper soil layers (NSE: 0.571-0.793). In contrast, negative NSE is found at 130-and 190-cm depths (NSE: −6.761 to −2.333). With exception of the SWC in 20-cm depth (NSE: 0.631), SYN_Ks_OL shows an overall reduced performance compared to SYN_OL, particularly at 50, 130, and 190 cm (NSE: −14.738 to −9.486).
After DA, a performance improvement is found for the mean SWC at 5-, 20-, 130, and 190-cm depths indicated by an RMSE decrease of 0.002-0.020 cm 3 /cm 3 (6-63%). Particularly, the mean SWC at 5, 20, and 130 cm shows a very good fit (NSE 0.765-0.944) after DA, also for SYN_Ks_OL, which exhibits a large SWC fluctuation in all soil layers. For the bedrock layers the RMSE decrease ranges between 31% and 84%. In contrast, at 50-cm depth all assimilation scenarios show larger deviations from the reference. For this layer there is a decrease of performance in terms of mean SWC at the verification locations (NSE: −1.363 to −0.159; BIAS: −0.009-0.013 cm 3 /cm 3 ). An exception is SYN_PARKs_1 (NSE: −2.846), which exhibits a strong performance improvement compared to SYN_Ks_OL (NSE: −9.484) and indicates that the biases in this simulation scenario could be partly corrected by DA. The performance of SYN_Ks_OL improves despite the strong fluctuation in SWC at the end of May.
Hereinafter, the RMSE at the individual assimilation locations is discussed (Figures 1 and 2 in the supporting information) for the different subsurface layers at the individual verification locations at 5, 20, 50, 130, and SYN_Ks_OL: 0.0384-0.0635) at the verification locations is 8-40% smaller than the RMSE at the assimilation locations. The mean RMSE for simulations without biased Ks over all verification locations decreases between 40% (SYN_ST) and 55% (SYN_PARαn_1). For the individual layers, the mean RMSE averaged over the locations decreases between 20% and 66% compared to the open loop run depending on scenario and layer. The scenarios with parameter estimation show, for almost all layers, a better performance than SYN_ST with state update only. SYN_PARαn_1 performs best in terms of layeraveraged RMSE for the individual locations over all five soil layers showing the strongest performance increase. The RMSE reduction at 130-and 190-cm depths is smaller than in the upper soil layers. Particularly, the RMSE reduction at 190 cm is relatively small (20-34%). These layers also show more locations (58-63%) with decreasing performance. In contrast, fewer locations with decreasing performance are found at the upper soil layers particularly for SYN_PARαn_1 (0.0-1.5%). For SYN_PARKs_1 ( Figure S2) the overall RMSE decrease (13%) as well as the RMSE decrease of the individual layers is smaller (9-18%) than for the other simulation scenarios. The best performance is found at 50 cm, where 83% of the individual locations show an improvement. Figure 6 illustrates the temporal evolution of the hydraulic conductivity (log 10 K s ) estimates for the synthetic experiments. In the synthetic scenario, the bedrock layer log 10 K s is reproduced well, despite relative large differences between initial average log 10 K s and the synthetic truth. Results for the other soil horizons for the synthetic case vary among the different DA experiments. While SYN_PAR_1 (Figure 6a) overestimates the hydraulic conductivity for Horizon 2 (20 cm) and Horizon 3 (50 cm), SYN_PAR_5d and SYN_PARαn_1 overestimate the conductivity for Horizon 1 (5 cm). However, in most of these cases the deviations are very small (≤0.05 log 10 Ks). Only the SYN_PARαn_1 Horizon 3 shows larger deviations (~0.10 log 10 Ks). The parameter estimation for this scenario is also affected by temporal instability for Horizons 1 and 4 during the first month of the estimation, due to an insufficient ensemble spread. Higher differences between open loop and DA simulation are found for SYN_PARKs_1 (Figure 6b). While log 10 K s is slightly overestimated in Horizons 1 and 2, for the deeper horizons log 10 K s is strongly adjusted to the synthetic true reference. In this context, Horizons 4 shows an almost perfect fit although all horizons are affected by temporal instability during the first month of assimilation. The parameter updates ( Figure S3) for the synthetic scenarios with biased Ks are very similar to each other. The parameter updates for SYN_PAR_1 and SYN_PARαn_1 are only slightly stronger than for SYN_PAR_5d. SYN_PAR_1 and SYN_PARαn_1 exhibit higher log 10 K s contrasts between the sensor locations than SYN_PAR_5d. On the contrary, high contrasts are found for SYN_PARKs_1 particularly for 50, 130, and 190 cm. This includes also differences in the spatial log 10 K s distribution compared to the other DA scenarios. Figure 7 shows the spatial distribution of the differences between the ensemble hydraulic conductivity estimated for different DA scenarios and the reference hydraulic conductivity. This gives further insight in the change of the spatial patterns of the hydraulic conductivity during the assimilation period. For the scenarios SYN_PAR_1, SYN_PARαn_1, SYN_PARKs_1, and SYN_PAR_5d in general a strong reduction of the RMSE

10.1029/2018WR024658
Water Resources Research is found. For SYN_Ks_OL and SYN_PARKs_1 a higher level of RMSE is found due to the introduction of the biased K s . A large improvement is found at the sensor locations of Horizons 1-3 and also within the bedrock layers where no measurements were available. The spatial distribution of the RMSE reduction differs between SYN_PAR_1, SYN_PAR_5d, and the DA scenarios with biased K s and spatial correlation range. Horizon 1 of SYN_PAR_1 is on average closer to the reference than its SYN_PAR_5d counterpart but shows some locations with relatively high RMSE (0.4-0.5 log 10 K s ) in the center and the west of the study site. This is similar for Horizon 3. In contrast, the RMSE reduction is smoother and less spiky for the SYN_PAR_5d scenario. The spatial pattern of RMSE reduction for SYN_PARαn_1 follows closely the spatial pattern of SYN_PAR_1. The spatial RMSE distribution for the bedrock layers also differs among the simulation scenarios. Local maxima of RMSE are found mainly in the north and center of the study site. SYN_PARKs_1 shows a significant RMSE reduction for Horizons 3-5 but less for Horizons 1 and 2. The RMSE patterns in SYN_Ks_OL and SYN_PARKs_1 reflect also the larger correlation range of Ks in the simulation setup. From these findings it can be concluded that the different spatial patterns of the estimated hydraulic conductivity are strongly associated with the update strategy resp. the model scenarios.

Discharge
SYN_ST shows strong discharge fluctuations until the high flow period associated with a high ensemble spread. In some realizations the Kieselbach fell dry during assimilation, in particular during the period May-June 2011 ( Figure S4). This leads to an underestimation of discharge which is also indicated by a negative relative bias (−22.94%; Table 3  The water balance gaps for the synthetic experiments are mainly related to errors in the simulated discharge (Table 4) for simulations without Ks bias. The discharge of the open loop simulations is higher than the reference truth in the assimilation period (+24%). This gap is reduced by all DA scenarios. All DA scenarios are closer to the reference truth but show a negative bias (−17% to −9%) which is is 62-68% of the total open loop discharge. The EnKF does not preserve the water balance and can add/extract water in correspondence with the measurements. Hence, large differences up to 134 mm can be found for the DA experiments in the assimilation period particularly for experiments with larger impact of the state update (e.g., SYN_ST and SYN_PAR_5d). For SYN_Ks_OL and SYN_PARKs_1 the water balance gap is related to both ET and discharge. Particularly, SYN_PARKs_1 shows a high deficit (26%), which is related to extraction of water during DA.

Real-World Experiments 4.2.1. SWC
In this section simulation results are discussed for SWC characterization with the help of the assimilation of data for the real-world case. Figure 8 shows the temporal evolution of the mean SWC for the open loop simulations and three different DA experiments for the assimilated locations in 2011. The measured mean SoilNet SWC at the study site for this period is high during winter and periods with intensive rainfall or thunderstorm events. The smallest SWC for 2011 is reached in May-June. This basic seasonality is captured well by the open loop and DA simulations. The comparison with the measured average SWC (Table 5) shows that the open loop simulation (REAL_OL) is close to the measurements at 5-cm depth (NSE: 0.894; RMSE: 0.026 cm 3 /cm 3 ), but not at 20-and 50-cm depths with negative NSE (20 cm: −0.269; 50 cm: −7.711), high RMSE (20 cm: 0.048 cm 3 /cm 3 ; 50 cm: 0.059 cm 3 /cm 3 ) and significant bias (20 cm: 0.046 cm 3 /cm 3 ; 50 cm: 0.057 cm 3 /cm 3 ). The DA scenarios REAL_ST (state updating only) and REAL_PAR_1 (daily joint state-parameter updating) showed a slightly improved SWC average compared to the open loop run for May and June 2011 for the 5-cm layer, but these scenarios diverge from the observations afterward until the end of November 2011. Also, for the REAL_PAR_5d scenario with daily updates of states and each 5-day updates of parameters the scenario diverges from the observations during the (late) summer and autumn 2011. The average SWC at 20-and 50-cm depths is better characterized than in the open loop run for the three DA scenarios. For the 50-cm layer the high initial model bias gradually decreased over the entire assimilation period. Whereas the three DA scenarios for 5-and 20-cm depths give very similar results (October and November), at 50-cm depth REAL_PAR_5d outperforms REAL_ST and REAL_PAR_1, which shows the smallest performance improvement compared to the open loop. The bias for SWC-characterization at 50-cm depth decreases from 0.057 cm 3 /cm 3 (REAL_OL) to 0.017 cm 3 / cm 3 (REAL_PAR_5d).  Note. The model performance is indicated by the mean Nash-Sutcliffe efficiency (NSE), percent bias, and the root-mean-square error (RMSE).

10.1029/2018WR024658
Water Resources Research with joint state-parameter estimation results in a small decrease in layer-averaged RMSE of the individual sensors (REAL_OL: 0.0936 cm 3 /cm 3 ; REAL_PAR_5d: 0.0858 cm 3 /cm 3 ) at this depth. However, an improvement was found for SWC model estimation at 20-and 50-cm depths for all real-world DA experiments (16-28% RMSE-reduction at 20 cm and 8-18% RMSE reduction at 50 cm). Nevertheless, at 50-cm depth the initial bias is only partly removed in the different DA scenarios. REAL_PAR_5d reproduces SWC profiles the best over all three layers showing a reduction by 14% for the RMSE averaged over all sensor locations. For REAL_ST the RMSE decreases by 6%, whereas REAL_PAR_1 shows only a 2% RMSE decrease. RMSE decreases or increases thereby do not show vertical correlations. Spatial RMSE structures are not very similar for the different simulation scenarios. In particular, the spatial RMSE structure for REAL_PAR_5d differs from REAL_ST and REAL_PAR_1.
Compared to the open loop simulation for the synthetic case, the open loop simulation for the real-world case (RMSE: 0.0825-0.094 cm 3 /cm 3 ) has a larger layer-averaged RMSE over the observation locations (0.044-0.056 cm 3 /cm 3 ). These numbers suggest that 40% of the layer-averaged RMSE over the observation locations for the real-world case could be related to model structural errors (processes which are not    Figure 10 illustrates the temporal evolution of the hydraulic conductivity (log 10 K s ) estimates for the realworld experiments. For REAL_PAR_1 the ensemble spread rapidly decreases within the first weeks of assimilation. In contrast, the ensemble spread of REAL_PAR_5d with less frequent parameter updating (each five days only) still exhibits~25% of the initial ensemble spread at the end of the assimilation period. The different DA scenarios show large spatial variations in log 10 K s estimates for the real-world case. In particular, log 10 K s of REAL_PAR_1 varies between the individual horizons and the final parameter estimates show major differences in comparison with REAL_PAR_5d. The temporal evolution of log 10 K s for REAL_PAR_1 shows also some instability. The log 10 K s of REAL_PAR_1 for the second and third layer first rapidly decreases resp. increases until the end of May. Afterward, it increases resp. decreases again until it converges to a final parameter estimate which is reached at the end of July. The final estimates differ up to 0.6 log 10 K s (m/hr) between REAL_PAR_1 and REAL_PAR_5d. Figure 11 shows the differences of the ensemble average hydraulic conductivity fields, as updated in the DA scenarios, compared to the ensemble average hydraulic conductivity of the open loop run, for the real-world case. The figure shows that for REAL_PAR_1 parameter updates are stronger than for REAL_PAR_5d. Similar to the synthetic scenarios, REAL_PAR_1 exhibits high log 10 K s contrasts between the sensor locations and also for the bedrock layer between different areas of the northern study site. Also REAL_PAR_5d shows positive and negative changes in log 10 K s which are more limited spatially. Figure 12 shows In particular the bias during low flow periods is strongly reduced for these simulations. However, the peak discharge of REAL_PAR_1 is still overestimated during the entire assimilation period.

Water Balance
The water balance gaps for the real-world experiments are mainly related to discharge differences. Table 7 gives an overview of the different measured and modeled water balance components at the study site for 2011, where all model simulations overestimate discharge. Large differences up to 224 mm can be found for the DA experiments, which indicate that the EnKF extracts a significant amount of water in the assimilation period through the adaptation of the pressure heads. Similar to the synthetic experiments, higher balance gaps are found for the real-world experiments with the strong adaptation of the pressure heads within the simulations (REAL_ST and REAL_PAR_5d).

Discussion
Our findings indicate that the EnKF in combination with fully coupled land surface-subsurface models and SWC data from a dense observation network can potentially improve the characterization of the hydrology at the hillslope scale. For the real-world case joint updating of model states and hydraulic conductivity is  more effective in updating the SWC (14% RMSE reduction averaged over all locations and layers for REAL_PAR_5d compared to open loop) than state update alone (6% RMSE reduction). However, a larger impact for parameter updating could have been expected. In contrast to their real-world counterparts, the synthetic scenarios exhibit a more significant improvement of SWC after DA showing a relatively strong RMSE reduction at the average over all SWC verification locations from 40% (SYN_ST) to 55% (SYN_PARαn_1). This can be called satisfying, especially because the evaluations in these synthetic runs are made at verification locations not used in the assimilation.
The fact that for the synthetic case simulation results are better than for real-world case (with 40% lower layer-averaged RMSE of SWC for the open loop run) indicates that model structural errors play a role for the SWC estimation. One possible explanation, which was addressed with the simulation experiments, was the fact that all soil hydraulic parameters are uncertain, but only hydraulic conductivity is estimated. For a relatively simple 1-D subsurface flow model Li and Ren (2011) already demonstrated that the update of multiple Mualem-van Genuchten parameters (i.e., hydraulic conductivity, α, and n) is more effective than updating hydraulic conductivity alone. This issue has been analyzed by the SYN_PARαn_1 and SYN_PARαn_5_256 scenarios with perfect knowledge of Mualem-van Genuchten α and n. Although overall the SYN_PARαn scenarios give smaller errors, the improvement with respect to their counterparts with uncertain α and n is small. This indicates that model simulations could be further improved by an estimation of these parameters but that their impact at least in this study is relatively small. However, under drier conditions the SWC could be more strongly governed by α and n, so that estimating the three parameters jointly could potentially improve SWC characterization more than in this study. Chaudhuri et al. (2018) showed that under certain conditions the joint estimation of multiple spatially distributed parameter fields can be successful, but it strongly depends on data availability and the joint presence of very dry and very wet conditions in the data set.
The performance differences between the real-world case and the synthetic case are also analyzed with a scenario for the synthetic case that departs from erroneous prior geostatistical parameters (saturated hydraulic conductivity and variogram range). It is found that modest errors already result in a declined performance of the DA, especially for the fluxes in this case. The too high saturated hydraulic conductivity for the upper soil layers creates drought stress and a significant underestimation of ET, which is not corrected by DA. This simulation experiment shows that erroneous values for the prior geostatistical parameters are a possible explanation for the relative poor performance in the real-world case. Note. The model performance is indicated by the mean Nash-Sutcliffe efficiency (NSE), percent bias, and root-mean-square error (RMSE). Such erroneous initial guesses of geostatistical parameters can, for example, also be related to the existence of preferential flow through macropores, which was already observed at the nearby Wüstebach site (Wiekenkamp et al., 2016) and other studies investigating in situ SWC observations (e.g., Martini et al., 2015;Poltoradnev et al., 2016). The underrepresentation of preferential flow in the model might explain the high bias at 50-cm depth for the real-world simulations having a large influence on the infiltration of a grassland with an extensive rooted top soil layer (Weiler & Naef, 2003). This issue was already indicated by Gebler et al. (2017) for a slightly different ParFlow-CLM setup for the Rollesbroich study site. Furthermore, the representation of the saprolite and the drainage system in the subsurface might be suboptimal in our simulation model and could be better represented with a dual porosity approach (e.g., Frey et al., 2012;Frey et al., 2016). Moreover, an unknown additional storage in the bedrock fractures potentially has influence on the baseflow of the Kieselbach (Hale et al., 2016). The representation of the drainage system in the model might be an additional cause for the model structural error, although no direct evidence in the SWC spatial error distribution was found.
Other limitations of the DA experiments are potentially associated with update frequency, ensemble size, and filter inbreeding. Whereas the larger ensemble size of 256 gives slightly better results than the ensemble size of 128, the overall effect of the increased ensemble size is small. Nevertheless, it is found that in the DA experiments, especially in case of daily updating and the real-world experiment, the ensemble variance decreases too fast and is too low. For the real-world case, in too many cases, the observations are not covered by the ensemble of model simulations. As a consequence, restricted weights are assigned to the measurements, which have too little influence on the state and parameter updates (e.g., Evensen, 2004;Hendricks Franssen & Kinzelbach, 2008;Houtekamer & Mitchell, 1998;Zhang et al., 2007). For the synthetic case, the observations are generally well covered by the ensemble of model simulations shown in Figure 5 (note that ±1 standard deviation is plotted for the ensembles). The only exception is scenario SYN_PARKs_1 with biased hydraulic conductivity, indicating the negative impact of biases, which could also be the reason for the reduced performance of the real-world case. While for the synthetic case the ensemble spread for soil moisture is adequate in general, the parameter spread is sometimes too narrow. Figure 6 shows that this is the case for the shallow layers, for several of the scenarios with daily parameter updating. A possible explanation is that updating hydraulic conductivity compensates for uncertain α and n parameters, which are not updated. However, also for the scenario SYN_PARαn_1 the ensemble of estimated parameters does not include the reference values for Horizons 1 and 3. Therefore, also the synthetic case appears to be affected to some degree by filter inbreeding impacting the parameter estimation, in spite of applying dampening and tests with a relatively large ensemble size of 256. In a comparison study of seven EnKF variants, Keller et al. (2018) showed for two physical setups and 1,000 synthetic studies per physical setup that the introduction of the damping factor improves the parameter estimation for relatively small ensemble sizes, in comparison to the standard EnKF without damping factor. Although the damping factor and other simple inflation methods have been applied successfully (e.g., Aksoy et al., 2006;Erdal et al., 2014;Hendricks Franssen & Kinzelbach, 2008), these methods are subject of discussion (Houtekamer & Zhang, 2016). The reduced impact of the error statistics in the EnKF potentially obscures other sources of error and, therefore, can lead to inconsistencies in the derivation of the Kalman gain (Houtekamer & Zhang, 2016). The question is how the estimation of spatially distributed parameters can be further improved in the future avoiding filter inbreeding. In order to answers this question, we see three different strategies, which can also be applied in combination. First, the application of localization to reduce the influence of spurious correlations may be useful (Houtekamer & Mitchel, 2001;Anderson, 2007). A more sophisticated DA strategy including localization could potentially improve the results, as suggested by Rasmussen et al. (2015) who performed tests with integrated terrestrial models, but this option was not yet available and implemented in our TerrSysMP-PDAF framework. Second, the use of normal score ensemble Kalman filter (NS-EnKF; Zhou et al., 2012;Schöniger et al., 2012), which includes a step to make the marginal distributions of states and parameters Gaussian, may show improvements. The NS-EnKF update is performed then in terms of transformed variables. The transformation may be helpful in case of strongly skewed pressure head distributions in the vadose zone under dry conditions (Erdal et al., 2015;Zhang et al., 2018). Especially under extremely dry conditions it could be expected that NS-EnKF outperforms classical EnKF. However, our study area was not affected by these very dry conditions; thus, the advantage of using NS-EnKF, in combination with a relative small ensemble size, would probably have been limited. NS-EnKF is not implemented yet in combination with TerrSysMP-PDAF because it is associated with additional challenges in terms of (complex) parallelization. Third, in very recent work, we found that combining the EnKF with a pilot points method (PP-EnKF) can preserve ensemble spread much better than (other variants of) EnKF. The PP-EnKF method does not need more compute time than classical EnKF. In this method, the update at measurement locations is done with numerically estimated covariances, and between measurement locations analytical covariance functions are applied for parameters. The method still needs to be implemented in the TerrSysMP-PDAF framework.
Discharge was in general not well reproduced in the experiments showing that the SWC characterization is not the main factor for improving discharge estimation. Nevertheless, for the best performing scenarios in the synthetic case the considerable NSE improved from −0.04 to 0.61 is related to assimilation of SWC-data and parameter estimation. For this small hillslope scale where precipitation uncertainty plays a minor role, still a higher impact of SWC-assimilation could have been expected. Particularly for the real-world case where the improvement of discharge estimation related to DA and parameter estimation was smaller. The representation of the Kieselbach in the model is critical (Gebler et al., 2017). Given the small spatial extent of the channel (0.3-1.0 m) the interaction between surface water and groundwater is highly sensitive to the vertical pressure gradient which controls the reinfiltration of groundwater into the channel. The assimilation of discharge data, combined with the updating of Manning's roughness coefficient as well as other Mualem-van Genuchten soil hydraulic properties, could further improve discharge estimation. Baatz et al. (2017) demonstrated the benefit of estimating Manning's roughness coefficients on the discharge simulation of a 2-D regional-scale synthetic ParFlow model.
The assimilation of SWC data had in most scenarios no significant effect on the reproduction of the annual actual ET in the synthetic simulations, which was already very good in the open loop. However, the scenario with a systematic too high saturated hydraulic conductivity for the upper soil layers created drought stress and underestimation of ET, showing that SWC assimilation for drier conditions with limited water supply has potentially more impact.
Overall, this study indicates that DA with integrated physically based terrestrial systems models can potentially strongly improve SWC and parameter characterization, but that (realistic) biases in prior (geo)statistical parameters can already degrade the performance considerably. Although DA under those conditions with bias still improves SWC characterization, the estimation of fluxes like discharge and ET might degrade. This points to the high importance of the assimilation of additional data, including ET measurements or proxies of ET, which is until now not very common in integrated terrestrial systems modeling, as these data are often only used for verification.

Conclusions
This study investigated the assimilation of high-resolution SWC data with the EnKF in the fully coupled land surface-subsurface model TerrSysMP. The assimilation experiments were performed for the small headwater grassland catchment Rollesbroich in the Eifel (Germany), or alternatively, for a synthetic reality which mimics this catchment. DA experiments were performed with an ensemble size of 128 (256) model realizations at 10 × 10-m lateral resolution and a variable vertical resolution (0.025-0.575 m) which resulted in a problem size of 0.3 million unknowns. The individual realizations were set up with a fully heterogeneous subsurface with geostatistical realizations of Mualem-van Genuchten soil hydraulic properties. The following DA experiments were performed: (i) daily state updating, (ii) daily joint state parameter updating, and (iii) daily state updating combined with parameter updating each 5 days and a damping factor. In addition, for the synthetic test case also experiments were performed where (i) Mualem-van Genuchten parameters α and n were deterministic and only saturated hydraulic conductivity uncertain, and (ii) prior geostatistical parameters like the mean saturated hydraulic conductivity and the range of all soil hydraulic parameters were systematically biased. In all experiments SWC data from 61 sensor network locations and three depths (5, 20, and 50 cm) were assimilated.
Considerable improvement was found for the synthetic case, which showed on average 40-55% RMSE reduction for the verification locations and for the different DA scenarios. Also the discharge estimation was improved with a NSE of −0.04 for the open loop and~0.61 for the DA scenarios, which involved parameter estimation. It was found that estimated saturated hydraulic conductivity was much closer to the reference values (after joint state-parameter updating). However, in case of biased prior geostatistical parameters the characterization of SWC and saturated hydraulic conductivity improved less, and the estimation of discharge and ET was even degraded with DA.
For the real-world case the EnKF in combination with joint updating of model states and hydraulic conductivity was more efficient in updating the SWC than state updating alone. For joint state-parameter updating the average RMSE for the overall SWC sensor locations decreased by 14% for the real-world scenario with joint state-parameter updating and damping while state updating reduced the RMSE only by 6%. The improvement of the SWC characterization for the real-world case was limited. The uppermost layer of 5cm depth exhibited a performance impoverishment, indicated by strong RMSE increases. The discharge for the verification period was only marginally improved, although some discharge improvement was found, if both states and parameters were updated with DA. In particular, the systematic bias up to 65% between real-world model simulations and measurements was not significantly improved.
In order to explain the very different results for the real-world case and the synthetic case, it is important to point to the very different RMSE for the open loop runs for the real-world case and the synthetic case. The RMSE for the open loop run, calculated over the 61 measurement locations and upper three layers, was 53% lower for the synthetic case than for the real-world case. This can be related to model structural errors, and these same model structural errors might have inhibited the positive impact of DA as updated parameters just might have compensated for model structural errors. In this context, it was investigated whether results could be better, if besides saturated hydraulic conductivity also Mualem-van Genuchten parameters α and n were estimated. This was tested for the synthetic case by assuming that α and n were perfectly known and only saturated hydraulic conductivity was uncertain. Results improved only very slightly compared to the scenarios where α and n were unknown and point to the fact that for this case the uncertain α and n were not a main additional limitation to improve SWC estimation by DA. However, from the synthetic experiment with relative modest biases in the prior geostatistical parameters, for example related to preferential flow, we conclude that such biases could explain the relative poor performance of the DA experiments for the real-world case.
In summary, the potential of assimilating SWC observations at the hillslope scale to improve the characterization of spatially distributed SWC and discharge has been demonstrated for the synthetic case, which mimicked the real-world case with all its complexities including 3-D fully distributed fields of saturated hydraulic conductivity and Mualem-van Genuchten parameters α and n. The fact that for the real-world case the improvements induced by DA were much smaller we attribute mainly to model structural errors like biases in prior geostatistical parameters and also other model structural errors like the representation of drainage might have played some role. These results might be site specific, and a similar approach could give better results at another hillslope site. However, it also points to the importance of precise estimates of prior geostatistical parameters and the representation of small-scale processes (e.g., macropore flow), which are difficult to capture, even with a physically based integrated terrestrial systems model. The study suggests that for predictions with integrated terrestrial system models it is essential to assimilate multiple data types, including (proxies of) ET.