DasPy 1.0

Abstract. Data assimilation has become a popular method to integrate observations from multiple sources with land surface models to improve predictions of the water and energy cycles of the soil-vegetation-atmosphere continuum. Multivariate data assimilation refers to the simultaneous assimilation of observation data from multiple model state variables into a simulation model. In recent years, several land data assimilation systems have been developed in different research agencies. Because of the software availability or adaptability, these systems are not easy to apply for the purpose of multivariate land data assimilation research. We developed an open source multivariate land data assimilation framework (DasPy) which is implemented using the Python script language mixed with the C++ and Fortran programming languages. LETKF (Local Ensemble Transform Kalman Filter) is implemented as the main data assimilation algorithm, and uncertainties in the data assimilation can be introduced by perturbed atmospheric forcing data, and represented by perturbed soil and vegetation parameters and model initial conditions. The Community Land Model (CLM) was integrated as the model operator. The implementation allows also parameter estimation (soil properties and/or leaf area index) on the basis of the joint state and parameter estimation approach. The Community Microwave Emission Modelling platform (CMEM), COsmic-ray Soil Moisture Interaction Code (COSMIC) and the Two-Source Formulation (TSF) were integrated as observation operators for the assimilation of L-band passive microwave, cosmic-ray soil moisture probe and land surface temperature measurements, respectively. DasPy has been evaluated in several assimilation studies of neutron count intensity (soil moisture), L-band brightness temperature and land surface temperature. DasPy is parallelized using the hybrid Message Passing Interface and Open Multi-Processing techniques. All the input and output data flows are organized efficiently using the commonly used NetCDF file format. Online 1-D and 2-D visualization of data assimilation results is also implemented to facilitate the post simulation analysis. In summary, DasPy is a ready to use open source parallel multivariate land data assimilation framework.

The implementation allows also parameter estimation (soil properties and/or leaf area index) on the basis of the joint state and parameter estimation approach.The Community Microwave Emission Modelling platform (CMEM), COsmic-ray Soil Moisture Interaction Code (COSMIC) and the Two-Source Formulation (TSF) were integrated as observation operators for the assimilation of L-band passive microwave, cosmic-ray soil moisture probe and land surface temperature measurements, respectively.DasPy has been evaluated in several assimilation studies of neutron count intensity (soil moisture), L-band brightness temperature and land surface temperature.DasPy is parallelized using the hybrid Message Passing Interface and Open Multi-Processing techniques.All the input and output data flows are organized efficiently using the commonly used NetCDF file format.Online 1-D and 2-D visualization of data assimilation results is also implemented to facilitate the post simulation analysis.In summary, DasPy is a ready to use open source parallel multivariate land data assimilation framework.

Introduction
Hydrological and land surface processes can be studied on the basis of measurement data or numerical models.For example, remote sensing provides valuable measurements of soil moisture (SM) (Entekhabi et al., 2010;Kerr et al., 2010), land surface temperature (LST) (Wan and Li, 2008) and snow (Hall et al., 2002).In addition to remote sensing measurements, large amounts of in situ data from monitoring networks measure land surface processes locally (Li et al., 2013;Simmer et al., 2014;Vereecken et al., 2010;Zreda et al., 2012).These measurements are representative of the real world, but have a limited spatio-temporal coverage and resolution.This is a main limitation for the assessment of climate change or water resources research purely on the basis of data (Anderson et al., 2008;Li et al., 2009).Therefore, numerical modeling approaches (e.g.land surface and hydrologic models) have become very important tools for earth science research (Liang et al., 1994;Niu et al., 2011;Sellers et al., 1996;Shrestha et al., 2014;Wood et al., 2011).With numerical models, the spatiotemporal variables of land surface states can be estimated at high spatial and temporal resolution and in a cheap and efficient way.However, because of uncertainties in the atmospheric forcings, soil and vegetation parameters and the biophysical processes themselves, usually there are large differences as well as biases in the model simulation results (Bosilovich et al., 2007;De Lannoy et al., 2007;Dee, 2005;Liu and Gupta, 2007).
Land data assimilation can integrate many observation types into land surface or hydrological models to improve the model performance.The uncertainties associated with measurement data and numerical model results are optimally weighted in the land data assimilation approaches (McLaughlin, 2002;Montzka et al., 2012;Moradkhani, 2008;Reichle, 2008).In recent years, many studies have shown the important role of data assimilation for the characterization of water and energy cycles, and multisource observations have been used to improve the characterization of soil moisture (Crow et al., 2008;Kumar et al., 2012), streamflow (Brocca et al., 2010b) (Bateni and Entekhabi, 2012;Xu et al., 2014) and snow (Che et al., 2014;De Lannoy et al., 2012), amongst others.For example, soil moisture profile characterization can be improved by assimilating multisource soil moisture data (Crow et al., 2008;X. Han et al., 2012X. Han et al., , 2015a;;Pauwels et al., 2007;Reichle et al., 2007;Yang et al., 2009); which was also proven to be beneficial for the characterization of land surface fluxes (Han et al., 2015a;Jin and Li, 2009;Xu et al., 2011).Remotely sensed land surface temperature products have been assimilated into land surface models to improve the estimation of land surface fluxes (Ghent et al., 2010;Han et al., 2015a;Huang et al., 2008;Reichle et al., 2010;Xu et al., 2014).Streamflow is sensitive to soil moisture and its prediction can be improved by soil moisture assimilation (Brocca et al., 2010b; E. J. Han et al., 2012;Lee et al., 2011;Mascaro and Vivoni, 2012;Meier et al., 2011;Pauwels et al., 2001) or through streamflow assimilation (Clark et al., 2008;Lee et al., 2011;Li et al., 2011;Warrach-Sagi and Wulfmeyer, 2010).Assimilation of snow cover fraction (SCF) (Andreadis and Lettenmaier, 2006;Clark et al., 2006), snow water equivalent (SWE) (De Lannoy et al., 2012;Dong et al., 2007) and passive microwave brightness temperature data (Che et al., 2014) are all helpful for snow characterization.Terrestrial water storage characterization can be improved by assimilating the Gravity Recovery and Climate Experiment (GRACE) data (Lo et al., 2010;Zaitchik et al., 2008).
Data assimilation is also used in the context of joint state and parameter (or bias) estimation (Ruiz et al., 2013).One possible approach in this context is the state augmentation method (Bateni and Entekhabi, 2012;Franssen and Kinzelbach, 2008;Han et al., 2014).Another approach is dual state and parameter estimation in which the models need to be run twice compared with the state augmentation method (Cammalleri and Ciraolo, 2012;Moradkhani et al., 2005).Further approaches are the combined data assimilation and optimization method (Vrugt et al., 2005;Xu et al., 2011;Yang et al., 2009), and joint state and bias estimation methods (Bosilovich et al., 2007;De Lannoy et al., 2007;Dee, 2005;Kumar et al., 2012).Figures

Back Close
Full Usually the development of the necessary land data assimilation system for conducting the above-mentioned research is laborious.Examples of land data assimilation systems recently developed are the Global Land Data Assimilation System (GLDAS) (Rodell et al., 2004), the European Land Data Assimilation System (ELDAS) (Wilker et al., 2006), the Chinese Land Data Assimilation System (CLDAS) (Li et al., 2007;Tian et al., 2010), the Land Information System (LIS) (Kumar et al., 2008), the Canadian Land Data Assimilation System (CaLDAS) (Balsamo et al., 2007), the GEOS-5 Data Assimilation System (Rienecker et al., 2008), the Data Assimilation Research Testbed (DART) (Anderson et al., 2009), the Korea Land Data Assimilation System (KLDAS) (Lim et al., 2012), the Earth Observation Land Data Assimilation System (EO-LDAS) (Lewis et al., 2012), the Parallel Data Assimilation Framework (PDAF) (Nerger and Hiller, 2013), the Open Data Assimilation library (OpenDA) (Ridler et al., 2014) and the Auto-Tuned Land Data Assimilation System (ATLAS) (Crow and Yilmaz, 2014).The source codes of these systems are not available for the broad data assimilation community except for GEOS-5, DART, EO-LDAS, PDAF and OpenDA.However, GEOS-5, DART, EO-LDAS, PDAF and OpenDA are not specifically designed for land data assimilation, and complex code structures are used in GEOS-5, DART, PDAF and OpenDA for the purpose of versatility.More technical efforts are still necessary to adapt these systems for multisource and multivariate land data assimilation including parameter estimation or bias estimation.For example, observation operators are necessary to assimilate the radiance observation directly.However, observation operators are usually implemented with various programming languages like Fortran and Matlab.Yet additional efforts are required to integrate the observation operators into a single data assimilation framework.
Therefore, a ready to use multivariate land data assimilation system is still necessary for the large land data assimilation community.In this paper, we present the development of an open-source multivariate land data assimilation framework (DasPy).The main programming language of DasPy is Python (https://www.python.org/).Python is a free and advanced scripting language, and is also called the glue language as the Figures

Back Close
Full different programming languages can be integrated easily in Python.Python holds the first place of script languages for scientific computing (http://www.tiobe.com).Python is very easy to learn and use for scientific computing, like for vector/matrix operations, statistics and plotting because of plenty of third party libraries.DasPy has been evaluated in several catchment scale studies of soil moisture assimilation (X.Han et al., 2012), joint assimilation of L-band brightness temperature and land surface temperature (Han et al., 2013), joint soil moisture and soil properties estimation by assimilating L-band brightness temperature (Han et al., 2014) and the joint assimilation of cosmicray neutron intensity and land surface temperature with leaf area index estimation (Han et al., 2015a).Local Ensemble Transform Kalman Filter (LETKF) is implemented as the main data assimilation algorithm and parallel computing is also implemented in DasPy.The objective of this paper is to introduce the implementation of the DasPy land data assimilation framework.A real-world case study with land surface temperature assimilation including parameter estimation illustrates the capability of DasPy at the catchment scale.This paper is organized as follows: Sect. 2 introduces the framework of DasPy implementation; the land surface temperature assimilation with DasPy is shown in Sect.3; Sect. 4 presents the discussion; and the final conclusion is summarized in Sect. 5.

General Framework of DasPy
The objective of the DasPy development is to integrate multisource measurements into a land surface model to obtain an improved characterization of the water and energy cycles.DasPy has been designed in a modular form and is integrated in a parallel computing framework.

Assimilation approaches
The ensemble Kalman filter (EnKF) (Evensen, 2003) has become the most popular data assimilation algorithm in land surface modelling because it is robust against nonlinear model dynamics and can easily handle different uncertainty sources (Reichle, 2008).Many variants of EnKF have been proposed (Bishop et al., 2001;Whitaker and Hamill, 2002) since its classical version with observation perturbation (Burgers et al., 1998).The Local Ensemble Transform Kalman Filter (LETKF) without observation perturbation (Hunt et al., 2007;Miyoshi and Yamane, 2007;Ott et al., 2004) was chosen as the main data assimilation algorithm of DasPy.LETKF is very efficient for parallel computing (Miyoshi et al., 2014) and easy to implement because the data assimilation is handled separately for each model grid cell.In LETKF, observation localization (OL) is used to assimilate spatially distributed observation data (Greybush et al., 2011;Han et al., 2015b).Spatial correlation is an intrinsic characteristic of land surface states such as soil moisture and soil temperature (Brocca et al., 2010a;Ryu and Famiglietti, 2006).Observation data can be extrapolated from the observed locations to unobserved locations in LETKF with OL (Greybush et al., 2011;X. Han et al., 2012;Hunt et al., 2007) on the basis of spatial correlation characteristics of model state variables or observation data.The combination of spatial correlation and land data assimilation was implemented by the OL technique in LETKF and DasPy (X.Han et al., 2012Han et al., , 2015b)).Spatial correlations are typically estimated using geostatistical analysis methods such as semivariogram modeling (Goovaerts, 1997).Several semivariogram models are available to characterize the spatial horizontal correlation of observation data.These are the Gaussian model, exponential model, spherical model and the more general Matern model (Goovaerts, 1997;X. Han et al., 2012;Minasny and McBratney, 2005).The popular 5th order polynomial correlation function is also implemented as an option in DasPy (Gaspari and Cohn, 1999;Reichle and Koster, 2003).
Filter inbreeding, the underestimation of ensemble variance in data assimilation, is a common problem related to small ensemble sizes and the inadequate characterization of all sources of uncertainty.As a consequence of filter inbreeding, the ensemble spread narrows down along with the data assimilation progress, ultimately even leading to filter divergence so that the observation data cannot be fully exploited.Therefore the spread of ensemble members should be kept to retain the ensemble variance and to avoid the filter divergence (Anderson, 2009).The multiplicative ensemble inflation method is implemented as follows (Whitaker and Hamill, 2012): where X a i is the analysis ensemble member i for a given model state, σ b is the standard deviation on the basis of the model forecast ensemble members and σ a is the stan-Figures

Back Close
Full dard deviation on the basis of the model analysis ensemble members.α is a tunable parameter between 0 and 1.The inflation of the analysis ensemble is proportional to the amount of reduction of ensemble spread by the observations after assimilation.
A modified version of the LETKF code developed by Takemasa Miyoshi (https://code.google.com/p/miyoshi/)has been integrated in DasPy.

Model operator
The current version of DasPy is mainly designed for the Community Land Model (CLM -version 4.5) (Oleson et al., 2013), which is a commonly used land surface model and is still under very active development (http://www.cesm.ucar.edu/models/cesm1.2/clm),but DasPy can also be extended to include more model operators.CLM includes the comprehensive solution of biogeophysical and biogeochemical processes, such as water flow and storage in soils, snow packs and vegetation, heat transfer in soils and snowpacks including phase change, momentum, sensible and latent heat fluxes, photosynthesis, surface radiation, dynamic vegetation and carbon-nitrogen cycling.CLM is driven by atmospheric forcing data (e.g., precipitation, long and short wave radiation, wind speed, temperature and humidity) and soil/vegetation parameters.Hybrid MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) parallelization were implemented and a parallel Input/Output system was used to minimize the global array in CLM, and this makes CLM efficient for large scale applications.
CLM is loosely coupled into DasPy.DasPy calls the executable program of CLM directly, and the data transfer between DasPy and CLM is by the initial files of CLM in NetCDF format.There are two advantages with loose coupling: (1) the source code of CLM did not need to be modified (which is an advantage because its structure is very complex), ( 2

Representation of model uncertainties
The data assimilation performance depends on the suitable representation of model uncertainties (Crow and Van Loon, 2006;Moradkhani, 2008).The uncertainties from the atmospheric forcing data, model initial conditions and model parameters are commonly considered in land data assimilation (Reichle, 2008).These uncertainties can be represented by randomly perturbing model parameters and atmospheric forcing data, for example by additive noise and multiplicative noise.In order to facilitate the application of DasPy, standard values for the perturbation of model forcings and model parameters are implemented as follows: 1. Atmospheric forcing perturbation: multiplicative noise with a log-normal distribution (mean = 1, SD = 0.25) is imposed on precipitation.Air temperature and longwave radiation are perturbed using additive noises with normal distribution (mean = 0, SD = 2.5 K) and (mean = 0, SD = 10 W m −2 ), respectively.Shortwave radiation is multiplied by a random noise (constrained in the range of [0.2, 1.8]) with normal distribution (mean = 1, SD = 0.7) (Kumar et al., 2012).Although standard perturbations are defined, the perturbation methods need to be chosen carefully depending on the uncertainty of the input data.The spatio-temporal correlations and the cross correlations among the four variables are not considered for simplicity (Kumar et al., 2009;Reichle et al., 2010), but can be handled if desired.available, the perturbation of model forcings, soil parameters and vegetation parameters can be modified according to the data basis.
Model forcings can also be perturbed including spatio-temporally correlated noise, for example with help of the Fast Fourier Transform approach and a first-order autoregressive model.See (Reichle et al., 2010) for more details.Spatial correlation of soil and vegetation properties can be considered by geostatistical simulation methods (Goovaerts, 1997), this part was implemented in DasPy with the R geostatistical package geoR (http://cran.r-project.org/web/packages/geoR).

Observation operators
The measurements to be assimilated, such as soil moisture, land surface temperature and snow cover fraction, can provide direct information on the land surface states.However, some measurements may provide indirect information, examples are the passive microwave brightness temperature (Entekhabi et al., 2010) or the fast neutron intensity as measured by the cosmic-ray soil moisture probe (Zreda et al., 2012).The observation operators need to be integrated to model the relationship between the land surface states and the indirect measurements of different sensors.The spatial support of a measurement might also differ from the CLM spatial resolution.Therefore, the observation operators should include downscaling/upscaling procedures (Mascaro and Vivoni, 2012; Merlin et al., 2013).Three observation operators (Fig. 3) have been coupled to DasPy.These observation operators handle the assimilation of passive microwave brightness temperature, land surface temperature and cosmic-ray neutron intensity observations: 1.The forward observation operator for L-band passive microwave brightness temperature of the land surface is the Community Microwave Emission Model (CMEM) developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Drusch et al., 2009;Holmes et al., 2008).CMEM can be used to assimilate the direct radiance measurements of L-band passive microwave 7406 Introduction

Conclusions References
Tables Figures
2. The forward observation operator for land surface temperature products (e.g.Moderate-resolution Imaging Spectroradiometer, MODIS, and Geostationary Operational Environmental Satellites, GOES) is the so-called Two-Source Formulation (TSF) (Anderson et al., 2005;Kustas and Anderson, 2009).CLM only calculates the ground temperature and vegetation temperature, and does not model the composed land surface temperature from the view point of remote sensors.
The TSF can be used to calculate the composed land surface temperature using ground temperature and vegetation temperature of CLM (Han et al., 2013).TSF is tightly coupled in DasPy and the data transfer between DasPy and TSF is by memory.
3. The new developed cosmic-ray soil moisture probe is a noninvasive soil moisture sensor at the field scale (Zreda et al., 2012).DasPy integrates the COsmic-ray Soil Moisture Interaction Code (COSMIC) model (Rosolem et al., 2014;Shuttleworth et al., 2013) as the observation operator to assimilate the fast neutron intensity measured by the cosmic-ray soil moisture probe (Han et al., 2015a).COSMIC was developed under the COsmic-ray Soil Moisture Observing System project (COSMOS; http://cosmos.hwr.arizona.edu).COSMIC was tightly coupled in DasPy using the Python package of F2py.

Multivariate data assimilation
The multivariate data assimilation capabilities of DasPy include: (1) soil moisture data assimilation with direct soil moisture measurements (X.Han et al., 2012), assimilation of L-band passive microwave brightness temperature (Han et al., 2013(Han et al., , 2014) )  of remote sensing (Han et al., 2015a(Han et al., , 2013)); (3) joint estimation of soil moisture and soil properties (sand fraction, clay fraction and organic matter density) (Han et al., 2014); (4) joint estimation of soil temperature and leaf area index (Han et al., 2015a); (5) all the above capabilities combined together.Moreover, DasPy can easily be extended for the assimilation of other measurements like land surface fluxes, snow, ground water table and leaf area index.Parameter estimation can be further extended and is not limited to soil properties and leaf area index.The multivariate data assimilation of DasPy is summarized in Fig. 4.

Parallel computing
DasPy can run on a standard PC in serial mode, and also on a cluster or supercomputer in parallel mode.Four components of DasPy were parallelized for efficient data assimilation applications: (1) the model operators and observation operators have to run in parallel for different ensemble members; (2) the spatial domain needs to be decomposed into small blocks in order to meet the memory limitation of each computer node of the cluster; (3) the file operations (input/output) of CLM need to be in parallel for The OpenMP technique is also implemented in DasPy for two reasons: (1) LETKF is parallelized using Fortran and OpenMP to do data assimilation for each model grid cell of each sub-domain (Fig. 5); (2) the model states, updated after data assimilation, are used to update the CLM initial files using C++ and OpenMP.

Organization of input and output
The NetCDF file format (http://www.unidata.ucar.edu/software/netcdf)was chosen for the input/output operations of DasPy, like for example the atmospheric forcing data, multisource observation data, intermediate assimilation results and the output data.The NetCDF format is a popular machine-independent binary format for scientific data.The data and metadata can be stored together, and subsets of large datasets can be accessed efficiently.The spatio-temporal datasets can be stored in NetCDF as well.Data compression is also implemented in the NetCDF format.The Python package netCDF4 is utilized to handle all the NetCDF related operations.

Visualization
DasPy also supports online plotting.The scientific results need to be displayed graphically for a quick assessment of simulation results.Two visualization methods have been implemented using Python plotting library matplotlib: (1) 1-D plotting method for time series data, (2) 2-D plotting method for spatial data.The results can be checked in the online mode while DasPy is running.This allows a very fast visualization of analysis results and it releases researchers from the intensive work of data processing and plotting.and latent heat fluxes.MODIS (Terra/Aqua) measures the land surface temperature four times per day (twice during daytime and twice during night) with a high spatial resolution (1 km) and with an expected high accuracy.Therefore, MODIS land surface temperature is a good data source for land data assimilation at the catchment scale.CLM calculates soil temperature and vegetation temperature from the incoming and outgoing heat fluxes (Oleson et al., 2013): where ) is the amount of heat conducted across a unit cross-sectional area in unit time and λ (W mk −1 ) is the thermal conductivity.
The heat flux h (W m −2 ) into the soil is given by: The change in vegetation temperature ∆T v (K) is given by: fluxes to the CLM states/parameters has been evaluated in recent studies (Bonan et al., 2011;Hou et al., 2012;Schwinger et al., 2010).It was shown that the most sensitive CLM states/parameters were soil moisture, leaf area index, maximum rate of carboxylation (Vcmax), decay factor of subsurface runoff (Fdrai) and maximum drainage when the water table is at the surface (Qdrai).In this study, DasPy was extended to include the estimation of the three mentioned CLM-parameters above.
The parameter values of Vcmax, Fdrai, and Qdrai are tuned globally in CLM (Oleson et al., 2013).These values need to be calibrated for the specific applications at the smaller catchment scale.Therefore, MODIS LST data were assimilated in CLM to update these sensitive variables/parameters with joint state and parameter estimation in DasPy.However, a limitation is that the leaf area index as provided by MODIS underestimates the true LAI by 0.66 (m 2 m −2 ) on average for all landcover types (http://landval.gsfc.nasa.gov).

Study area and data
The data assimilation was carried out for the Rur catchment (2454 km 2 ) located in the border region of Belgium-Netherlands-Germany (Han et al., 2013;Montzka et al., 2008).The main vegetation types are forest, crop and grassland (Fig. 6) according to the MODIS land cover product (MCD12Q1 -https://lpdaac.usgs.gov)(Friedl et al., 2010).The annual precipitation and potential evapotranspiration for the northern part are in the range of [650-850 mm] and [580-600 mm], respectively.In the mountainous southern part, annual precipitation is larger due to orographic forcing [850-1300 mm] and potential evapotranspiration is lower related to less incoming radiation and lower temperatures [450-550 mm].The MODIS leaf area index product (MCD15A3 -https: //lpdaac.usgs.gov)and the soil properties of the Harmonized World Soil Database (FAO et al., 2012) were used as CLM input (X.Han et al., 2012Han et al., , 2014)).The COSMO_DE reanalysis data from 2010-2012 were used as atmospheric forcing data (Baldauf et al., 2009).The land surface temperature measurements to be assimilated in CLM were obtained from the daily MODIS land surface temperature product MOD11A1 in 2012 (https://lpdaac.usgs.gov).Only good quality data were used for the assimilation, according to the data quality flag (criterion: average error ≤ 1 K), and 246 observations were assimilated in total.

Experiment design
The spinup of CLM was done for 10 years using the forcing data from 2010 and 2011 repeatedly.Data assimilation for the year 2012 was carried out for the following seven simulation/assimilation scenarios: ( 1 [0.75, 1.25]) for Qdrai.This means that we used the default CLM parameter values as the prior information for parameter estimation.These perturbations were applied only when these parameters were updated together with the states in the data assimilation procedure.It means that these parameters were not perturbed for the scenarios without parameter estimation.DasPy was run on the JUROPA supercomputer platform (http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUROPA/JUROPA_node.html) with 5 computer nodes for each scenario, using 8 processors on each node.About 6 h were needed for each assimilation scenario.
The Root Mean Square Error (RMSE) was used to compare simulation results: where a i is the ensemble mean (either for open loop or data assimilation scenarios), obs i corresponds to a field measurement and N is the number of time steps with valid Introduction

Conclusions References
Tables Figures

Back Close
Full field measurements, the value of N for each station is different depending on the data availability.

Results
Figure in LST_Feedback_Par_All slightly increased errors.The results show that assimilating MODIS LST products improved the estimation of the soil temperature profile when the LAI and SAI were updated jointly.The RMSE values decreased by 0.2 K for the scenarios LST_LAI_SAI and LAT_Par_All.The RMSE values further decreased by 0.1 K for the scenario of LST_Feedback_Par_All where the soil moisture profile was updated together with the parameters.
Figure 11 shows the total annual actual evapotranspiration at five EC-stations for the different simulation scenarios (the basin scale characterization of annual actual evapotranspiration is plotted in Fig. 12). Figure 11 shows that assimilation of LST in combination with estimation of Vcmax, Fdrai or Qdrai alone hardly modified the performance of the simulations.The calibration of LAI and SAI results in higher actual evapotranspiration compared with the open loop simulation.The MODIS product of actual evapotranspiration shows the largest values.For the two grassland sites Rollesbroich and Kall-Sistig the difference between CLM-simulations and the MODIS product is large, whereas for the other sites the difference can be considered relatively small.Considering the relatively low RMSE values (around 40 W m −2 ) of latent and sensible heat fluxes for all simulation/assimilation scenarios, it could be argued that the multivariate assimilation resulted in an acceptable estimation of the actual evapotranspiration.The spatial pattern of actual evapotranspiration is different for the CLM-runs compared with the MODIS-product.The MODIS-product shows enhanced actual evapotranspiration in the hilly area of the south, whereas CLM does not show such a pattern.The higher values in the south could be related to a combination of higher soil moisture content and forests as land use type.The figure also shows lower actual evapotranspiration over urban areas.
The updated LAI for five sites is shown in Fig. 13.The red lines show the annual LAI from the MODIS product, interpolated linearly by CLM.The underestimation of LAI by the MODIS sensor is reduced by assimilating land surface temperature for four of the sites.However, for the forest areas no significant update was found.Introduction

Conclusions References
Tables Figures

Back Close
Full

Discussion
The results of LST assimilation for the Rur catchment show the ability of the DasPy system for joint state and parameter estimation.The characterization of the latent heat flux was (slightly) improved with the update of the soil temperature profile, LAI and SAI.The calibration of Vcmax, Fdrai and Qdrai could (slightly) improve the sensible heat flux estimation.The EC measurement data used for the validation were not corrected for the energy balance deficit.The EC system usually underestimates the latent and sensible heat fluxes (Wilson et al., 2002).In addition, the EC-footprint at the sites may not be consistent with the model spatial resolution.However, in this latter case we do not expect a systematic impact on the results but rather a random impact.The assimilation could reduce the soil temperature RMSE by 0.3 K.The rather small improvements with assimilation might be related to the good quality of the open loop simulation with CLM.
Previous studies gave a RMSE of latent and sensible heat fluxes with/without data assimilation around 50 W m −2 (Bateni and Entekhabi, 2012;Reichle et al., 2010;Xu et al., 2014) and here we find somewhat lower values around 35 W m −2 .Results were not improved when soil moisture was updated by assimilating LST.This is at least partly related to the dense vegetation coverage for large parts of the Rur catchment (Fig. 6).
For dense vegetation, the correlation between LST and soil moisture is worse than for light vegetation (Xu et al., 2014).We would therefore expect better results for sparsely vegetated areas.
Overall, the assimilation of LST data hardly improves the characterization of hydrological states and fluxes.These results are consistent with previous findings (Bosilovich et al., 2007;Ghent et al., 2010;Reichle et al., 2010;Xu et al., 2014).A main reason why LST assimilation most probably did not improve much the assimilation results is the limited information content of a single LST measurement.We expect that LST measurements carry more information on model parameters, whose update could help to more structurally improve the characterization of hydrological states and fluxes.This was done in this study.However, instead of assimilating data measured at a single  tekhabi, 2012;Xu et al., 2014).The ground validation data could also be biased due to the scale mismatch of the CLM simulations and in-situ sensors like the EC footprint.All these factors could be reasons why the assimilation of LST did not improve the CLM simulations significantly (Reichle et al., 2010).The inclusion of other data types besides LST in the data assimilation, like for example soil moisture content, leaf area index, and the combination with parameter estimation could be promising to improve the results as compared to univariate data assimilation of LST (Bosilovich et al., 2007).For large scale applications, the data transfer among different components should be changed from file to memory making use of the high performance global array package, e.g.petsc4py (https://bitbucket.org/petsc/petsc4py). DasPy provides a free multivariate land data assimilation framework to the community, whose development required several years of effort.DasPy is open source and modular.We leave more space for the users to improve or expand the performance of DasPy.
DasPy has been evaluated for spatial scales ranging from 100 s of grid cells to 100 000 s of grid cells.Multivariate data assimilation of soil moisture, L-band passive microwave remote sensing, land surface temperature and cosmic-ray neutron intensity, including model parameter estimation has been implemented in DasPy.DasPy can be used to integrate more remote sensing products and more land surface states, such as the leaf area index, latent heat flux, sensible heat flux and snow depth.

Conclusions
The multivariate land data assimilation framework DasPy is introduced in this paper.
DasPy has been utilized successfully in several multivariate land data assimilation studies of soil moisture, land surface temperature, L-band passive microwave brightness temperature, cosmic-ray neutron intensity, joint estimation of soil moisture and soil properties, and the joint estimation of soil temperature and leaf area index.It has been applied in applications at the point scale and the catchment scale.In this study, the performance of DasPy was illustrated for a new case study of the assimilation of a land surface temperature product from MODIS for the Rur catchment in western Germany.
DasPy is implemented in the Python programming language which is easy to learn and use for scientific computing.Three observation operators have been integrated for the assimilation of land surface temperature, L-band passive microwave brightness temperature and cosmic-ray neutron intensity, respectively.The LETKF algorithm is utilized as the main data assimilation method because of its easy parallelization, and Figures the state augmentation approach for the joint state and parameter estimation is also implemented as an option for the data assimilation.DasPy can be executed in serial mode on a laptop but also in the high performance computing environments with many processors.
The current version of DasPy is designed for CLM specifically.DasPy is not limited to soil moisture and soil temperature assimilation.DasPy provides a flexible framework to extend assimilation of snow depth, terrestrial water storage and the ground water table depth and include the estimation of other model parameters.Assimilation of the normalized difference vegetation index, leaf area index, and net ecosystem exchange (NEE) measurements can also be integrated relatively easily if the biogeochemical version of CLM is used as model operator.
In summary, DasPy is a ready to use multivariate land data assimilation framework.DasPy is under continuous further development and more multivariate data assimilation capabilities will be added to DasPy in the nearby future.
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

( 3 )
representation of model uncertainties, (4) observation operators, (5Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ) it can easily be adapted to future new versions of CLM and other model operators.The main disadvantage is the loss of computational efficiency, however, the binary NetCDF file format used by CLM could compensate for loss of computational performance.Discussion Paper | Discussion Paper | Discussion Paper | 2. Soil properties perturbation: the soil hydraulic and thermal properties of CLM are derived from the soil properties of sand fraction, clay fraction and organic matter density(Han et al., 2014;Oleson et al., 2013).Additive perturbations are imposed on the soil properties, typically using a uniformly distributed random noise U[−10.0,10.0 %] on sand and clay fraction and U[−10.0,10.0 kg m −3 ] on organic matter density (Han et al., 2014).3. Vegetation properties: leaf area index is typically perturbed using a multiplicative noise with uniform distribution U[0.75, 1.25].If more detailed information is Figures Back Close Full Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | and cosmic-ray fast neutron intensity (Han et al., 2015a); (2) soil temperature data assimilation with direct soil temperature measurements and land surface temperature products 7407 Discussion Paper | Discussion Paper | Discussion Paper | large ensemble size; (4) the data assimilation of each model grid cell was parallelized in LETKF.The parallel implementation of DasPy is illustrated in Fig. 5. First, the mpi4py library is used to initialize the global communicator (MPI_COMM_WORLD) of MPI, which controls the communication among different computer nodes, and then the servers of parallelpython (ppserver) at each computer node are started by MPI.The ensemble members of ppserver are utilized to organize the parallel computation of model simulation, filter operations and so on.The ensemble members of the model operator and the observation operators, and the file operations of DasPy are distributed on each computer node evenly.For data assimilation, the domain decomposition of DasPy needs to be specified according to the number of computer nodes and the memory requirement.The parallel calculation of each spatial block (according domain decomposition) for data assimilation is distributed to each node by ppserver.Discussion Paper | Discussion Paper | Discussion Paper |

3
Land surface temperature assimilation at catchment scale Land surface temperature is a key indicator of the Earth surface energy budget, and has large implications on hydrological processes at the land surface, such as sensible 7409 Discussion Paper | Discussion Paper | Discussion Paper | where S soil (W m −2 ) is the solar radiation absorbed by the top soil, L soil (W m −2 ) is the longwave radiation absorbed by the soil, H soil (W m −2 ) is the sensible heat flux; and λE soil (W m −2 ) is the latent heat flux.
4) where S v (W m −2 ) is the net solar radiation absorbed by the vegetation, L v (W m −2 ) is the net longwave radiation absorbed by the vegetation and H v (W m −2 ) and λ vap E v (W m −2 ) are sensible and latent heat fluxes from vegetation.From Eqs. (2) and (4) it follows that the calculated soil/vegetation temperature is sensitive to the sensible and latent heat fluxes.The sensitivity of sensible and latent heat 7410 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ) Openloop -CLM open loop simulation without data assimilation, (2) LST_Vcmax -update Vcmax by assimilating LST; (3) LST_Fdrai -update Fdrai assimilating LST; (4) LST_Qdrai -update Qdrai by assimilating LST; (5) LST_LAI_SAI -update LAI and SAI (stem area index) by assimilating LST, (6) LST_Par_All -update Vcmax, Fdrai, Qdrai, LAI and SAI by assimilating LST, (7) LST_Feedback_Par_All -update Vcmax, Fdrai, Qdrai, LAI and SAI and soil moisture by assimilating LST.The soil temperature profiles were also updated in all assimilation scenarios.The two-source formulation model (TSF, see Sect.2.4) of DasPy was used as the observation operator.Fifty ensemble members of atmospheric forcing data, soil properties and leaf area index were generated with the random perturbation method and using the standard perturbation values introduced in Sect.2.3.The parameter estimation of Vcmax, Fdrai, Qdrai, LAI and SAI was carried out with help of the state augmentation approach.The update of the soil moisture profile (scenario LST_Feedback_Par_All) was also done with the state augmentation approach.As Vcmax, Fdrai and Qdrai are hard coded parameter values in CLM, the CLM source code was modified to allow the changes of these parameters.The empirical values of Vcmax, Fdrai and Qdrai can vary between [10.75 (µmol m −2 s) ∼ 177.32 (µmol m −2 s)] (Walker et al., 2014), [0.1 ∼ 5.0] and [10 −6 ∼ 10 −1 ] (Hou et al., 2012), respectively.CLM however uses default, constant values for these parameters.In order to represent the uncertainties of these 7412 Discussion Paper | Discussion Paper | Discussion Paper | parameters, the default parameter values in CLM were perturbed by multiplying their values with a value sampled from the following uniform distributions: U[0.75, 1.25] for Vcmax, log 2(U[0.75,1.25]) for Fdrai and log 10(U Discussion Paper | Discussion Paper | Discussion Paper | Figure 7 shows the RMSE values of the latent heat flux for the open loop run and the different data assimilation scenarios.The RMSE values for the five EC-stations are summarized.The results show that only in case LAI and SAI were updated together with LST, the characterization of the latent and sensible heat fluxes improved slightly (small RMSE-reduction).Nevertheless, the RMSE values at the Merzenhausen site increased slightly.Results for the sensible heat flux characterization are displayed in Fig. 8 for the same scenarios.The assimilation scenarios where only one parameter was updated together with LST (LST_Vcmax, LST_Fdrai and LST_Qdrai) show a clear reduction of the RMSE values.However, the results for three sites (Merzenhausen, Rollesbroich and Selhausen) show a slightly worse performance for the scenarios of LST_LAI_SAI, LST_Par_All and LST_Feedback_Par_All compared with the scenarios of LST_Vcmax, LST_Fdrai and LST_Qdrai.The results are however not worse than for the open loop scenario.It seems that for the sensible heat flux characterization, the calibration of Vcmax has the largest positive impact.The RMSE values for the characterization of the soil moisture and soil temperature profile at the Rollesbroich site are shown in Figs. 9 and 10.During the winter period, both the measurements and simulated values show a very high soil water content related to frequent rainfall and low evapotranspiration.Interesting for an analysis of the differences between simulated and measured soil moisture content is the period between April and September.The results of the open loop simulation for soil moisture content at 20 and 50 cm depth are already of good quality, and RMSE values are smaller than 0.04 m 3 m −3 .Data assimilation has a limited impact on soil moisture profile characterization, except for the scenario of LST_Feedback_Par_All, in which the soil moisture profile was updated explicitly.The direct updating of soil moisture content 7414 Discussion Paper | Discussion Paper | Discussion Paper | The development of DasPy is for the purpose of scientific research and not for operational application.The performance of DasPy is influenced by many aspects: the ensemble generation (perturbations of soil and vegetation parameters and atmospheric forcing data), characterization of the observation error, and the calibration of the model operator and observation operators.The users should pay attention to these aspects in order to achieve good data assimilation results with DasPy.Although hybrid parallel computation with MPI and OpenMP has been implemented in DasPy, the computational efficiency of DasPy is not very excellent because of the Python script language, intensive data transfer among the different modules of DasPy by NetCDF files and loose coupling methods for the model operator and the observations operators.Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |time point, it could be better to assimilate data from a daily LST-cycle with help of a smoother approach.This was not tested in this work and is subject of future research including extension of DasPy.Other reasons for the limited impact of LST-assimilation are probably: (1) incorrect values for model parameters like soil texture, leaf area index, Vcmax, and others; (2) uncertainties of atmospheric forcing data, such as precipitation, air temperature, wind speed and incoming shortwave radiation; (3) model structural errors and (4) intrinsic errors related to model parameterizations.The bias/error of the MODIS LST products and the spatial and temporal mismatch of the CLM model and MODIS products could also have influenced the assimilation performance negatively.Soil moisture content and vegetation coverage are constantly high for the Rur catchment and the dense vegetation and wet soil could decrease the performance of LST assimilation as compared to areas with sparse vegetation and dry soil (Bateni and En-