Elsevier

Remote Sensing of Environment

Volume 112, Issue 4, 15 April 2008, Pages 1347-1364
Remote Sensing of Environment

Assimilating canopy reflectance data into an ecosystem model with an Ensemble Kalman Filter

https://doi.org/10.1016/j.rse.2007.05.020Get rights and content

Abstract

An Ensemble Kalman Filter (EnKF) is used to assimilate canopy reflectance data into an ecosystem model. We demonstrate the use of an augmented state vector approach to enable a canopy reflectance model to be used as a non-linear observation operator. A key feature of data assimilation (DA) schemes, such as the EnKF, is that they incorporate information on uncertainty in both the model and the observations to provide a best estimate of the true state of a system. In addition, estimates of uncertainty in the model outputs (given the observed data) are calculated, which is crucial in assessing the utility of model predictions.

Results are compared against eddy-covariance observations of CO2 fluxes collected over three years at a pine forest site. The assimilation of 500 m spatial resolution MODIS reflectance data significantly improves estimates of Gross Primary Production (GPP) and Net Ecosystem Productivity (NEP) from the model, with clear reduction in the resulting uncertainty of estimated fluxes. However, foliar biomass tends to be over-estimated compared with measurements. Issues regarding this over-estimate, as well as the various assumptions underlying the assimilation of reflectance data are discussed.

Introduction

Quantifying the magnitude and dynamics of carbon (C) sources and sinks across regions is of major importance to Earth System Science because of the relevance to the dynamics of atmospheric carbon dioxide (CO2) (Schimel et al., 2001) and hence climate, and national and international carbon accounting policies (IPCC, 2001). A major uncertainty in the analysis of terrestrial C dynamics exists over whether, and under what circumstances, particular regions or landscapes act as sources or sinks. Advances in this area have been made through the use of process-based ecosystem models (Law et al., 2001a, Rastetter et al., 2003) but large variations remain between different models (Churkina et al., 2005) and their parameterisation over large areas is often uncertain. Additional knowledge has been gained through networks of C flux tower measurements over a range of ecosystems (Valentini et al., 2000). But whilst these aid understanding in conjunction with models, the results are difficult to directly scale up to provide regional or global estimates (Running et al., 1999).

This paper develops from the work of Williams et al. (2005), who argue that the Bayesian assimilation of measurements into a terrestrial ecosystem model can improve estimates of ecosystem C stocks and fluxes over model runs alone. Importantly, uncertainty in the predicted stocks and fluxes is reduced compared to the original measurements. Williams et al. (2005) use a simple box representation (the Data Assimilation Linked Ecosystem Carbon model, DALEC, outlined below) that represents the major pools and fluxes of C in an ecosystem, and assimilate a time series of flux and occasional stock measurements into the model using the Ensemble Kalman Filter (EnKF) (Evensen, 2003). This variant of the Kalman Filter (KF) tracks the model error statistics using an ensemble of model state vectors and has the desirable advantage of being applicable to non-linear models. The ecosystem model parameters (initial carbon pool sizes and rate parameters) are estimated through optimisation of a merit function against the observations available. This model can then be run to provide estimates of ecosystem dynamics and C fluxes when driven by climate data. Williams et al. (2005) demonstrate that with measurements over a young ponderosa pine forest in central Oregon, USA over a 3-year period, assimilation provides improved estimates of net ecosystem exchange of C (NEE) compared to model runs alone (− 419 ± 29 g C m 2 compared to − 251 ± 197 g C m 2). In addition, assimilation can aid the identification of model deficiencies through examining model-data biases (Cosby, 1984, Rastetter, 2003). Such an approach can be relatively easily applied to other ‘data rich’ ecosystems and regions. It can potentially be extended in spatial extent by applying the calibrated model to regions with similar ecosystems to those under which the model optimisation was performed. The fact that such a large reduction in uncertainty can potentially be achieved through the assimilation of time series observations implies that to estimate regional to global carbon stocks and fluxes accurately, such data would ideally be utilised both spatially and temporally. However, measurements of C fluxes over ecosystems are only available at sparse tower sites and obviously cannot directly fulfil this role. The use of atmospheric measurements from so-called ‘tall towers’ (Davis et al., 2003) and air- and spaceborne sensors (Gerbig et al., 2003, Raupach et al., 2005) could potentially fill this data gap on C flux measurements in the future, although this will require linking the ecosystem models to models of atmospheric transport to allow assimilation into ‘spatialised’ ecosystem data assimilation (DA) schemes. Whilst direct measurements of C fluxes clearly provide strong constraints to modelling C fluxes, there are a range of other measurements available relating to ecosystem processes that can also be considered as candidates for assimilation and reducing uncertainty in model estimates. Probably the only source of such data available as time series over wide geographical extents are those obtained from Earth Observation (EO) satellites.

The synoptic coverage of moderate to coarse spatial resolution EO sensors makes them ideal candidates for generation of data suitable for integration into ecosystem models. Typical moderate spatial resolution (250 m–1 km) sensors designed to study the biosphere, such as the MODIS sensors on NASA's Terra and Aqua satellites (Heinsch et al., 2006), or the VEGETATION sensor on board SPOT-4 and SPOT-5 (Maisongrande et al., 2004), have near daily coverage of the whole globe. Measurements in the optical spectrum are responsive to vegetation state variables such as leaf area index (LAI) (Cohen et al., 2006) that can be linked to ecosystem model variables such as leaf foliar biomass through simple linear operators (multiplication by specific leaf area in this case). In addition, such observations can be used to track vegetation dynamics to estimate vegetation phenology to calibrate temperature responses of plant functional types as well as test ecosystem model behaviour by comparison with satellite observations (McCloy & Lucht, 2004). An appreciation of the importance of such data in driving, parameterising and testing ecosystem models has resulted in much effort in the past decade being put into the generation of suites of global high-level products from EO measurements (Justice et al., 2002). Among those most directly relevant to C modelling in terrestrial ecosystems derived from optical sensors are: land cover and land cover change; LAI; the fraction of absorbed photosynthetically-active radiation (fAPAR); snow cover; tree and shrub cover and biomass (Justice et al., 2002, Law and Waring, 1994). Such data are derived via a range of methods ranging from pattern recognition (classification of land cover) through assumed or calibrated empirical relationships with satellite-derived spectral indices (snow cover, some fAPAR products), to the inversion of physically-based canopy reflectance models (LAI, fAPAR) (Knyazikhin et al., 1998). Further derived products exist, such as estimates of daily Gross Primary Production (GPP) and annual Net Primary Production (NPP) (Running et al., 2000), which are generated via production efficiency models, driven by satellite-derived estimates of fAPAR. Other relevant forms of information, particularly related to canopy structure, such as above-ground biomass or tree height are starting to become available from other sensor technologies such as Synthetic Aperture RADAR (SAR) and LiDAR (Treuhaft et al., 2004). Such products are of great importance in mapping the dynamics of terrestrial vegetation over recent decades and also in reducing the vast amount of data available from satellites to more easily manageable quantities of information. In addition, many of these products either directly produce information on components of the terrestrial C cycle (GPP, NPP) or give information that could be assimilated into terrestrial ecosystem models within the framework described by Williams et al. (2005). We can however identify a number of features of such datasets that can practically limit their utility in such an application:

  • 1.

    Despite many efforts at validation of these high-level, derived products it is generally difficult to accurately characterise their error and uncertainty (Heinsch et al., 2006). Characterisation of bias and uncertainty are key to effective DA; without this information assimilation will at best achieve sub-optimal results;

  • 2.

    A range of estimates of what are nominally the same product (e.g. fAPAR from MODIS, MERIS, AVHRR (Gobron et al., 2002, Knyazikhin et al., 1998, Tian et al., 2000)) are provided from different sensors processed by different methods and relying on different assumptions. Whilst this may be of value in obtaining ensemble statistical characterisations of such quantities, the generation of various (somewhat different) products from individual sensors can make it difficult to exploit such information in a DA scheme i.e. how should such products be used/combined;

  • 3.

    Products are generated using specific models based on particular (implicit or explicit) sets of assumptions and often particular sets of driving (e.g. climate) data, which may in turn be inconsistent with the assumptions made in the ecosystem model being used for assimilation. GPP, for example, is typically derived from EO data using a production efficiency model driven by estimates of fAPAR and climate data (Potter et al., 1993). Ecosystem models often use more sophisticated schemes to model GPP, which may result in inconsistencies between observed and modelled GPP values over a given site. This can be further exacerbated through use of different driving climate datasets;

  • 4.

    For efficiency and for other reasons such as the need for cloud-free observations, high-level EO products are generally derived from a set of observations composited over a limited time period (e.g. 8 days for the MODIS LAI/fAPAR product). Most products are derived independently for each temporal window over which samples are taken but this can result in artificially large high frequency variations in the products due to the temporal compositing. Although such variations can be smoothed out (e.g. Plummer et al., 2006), the degree of smoothing needs to be imposed, and the products take no direct advantage of expectations of relatively low temporal frequency variations which would be expected for the parameters under observation (e.g. LAI). A further disadvantage of this approach is that if insufficient cloud-free observations are available during a given compositing period for a full model-based retrieval to be made, typically either some form of empirical ‘back up’ algorithm is used, or no parameter value is generated at all.

Given the issues surrounding the high-level EO-derived ecosystem products, it is in many ways advantageous to directly assimilate satellite observations of surface-leaving spectral radiance, reflectance or albedo. This has the specific advantage, from a DA perspective, of allowing uncertainties in the observations to be tracked, as they can be far more easily characterised for satellite radiance than for higher level products. In the case of optical EO, the satellite measures top-of-atmosphere (TOA) spectral radiance. A desirable strategy, therefore, would be to assimilate these data directly into the ecosystem model. However, these measurements are affected by the atmospheric composition, to which the ecosystem model is only very slightly sensitive (e.g. the ratio of direct to diffuse global illumination, water vapour and CO2 in the planetary boundary layer). It may be feasible to attempt to assimilate TOA radiance measurements into an ecosystem model coupled with an atmospheric circulation model, but that presents many technical challenges and is beyond the scope of this paper. A compromise then, is to assimilate relatively low-level EO-derived observations of an intrinsic surface property. The most relevant observation here being at-ground spectral bidirectional reflectance factor (BRF), i.e. satellite radiance corrected for atmospheric scattering and absorption (Vermote et al., 2002) and normalised by the incident radiation field.

A typical ecosystem model does not directly provide estimates of canopy spectral reflectance. Indeed, the ecosystem model will not generally be greatly sensitive to all of the factors that affect canopy BRF (e.g. arrangement and distribution of leaves). In order to be able to assimilate such data then, an ‘observation operator’ is required that relates one or more ecosystem state variables to canopy BRF. This is achieved here with a physically-based model of canopy BRF. A key requirement of the EnKF is that the observations must be predicted from the ecosystem model state vector by means of a linear observation operator. Canopy BRF is not, however, a linear function of the state vector of a typical ecosystem model and hence the EnKF scheme must be extended to deal with this non-linearity.

Several studies have presented data assimilation schemes that use EO data to improve the functioning of ecosystem or plant growth models. Knorr and Lakshmi (2001) assimilate fAPAR, derived from AVHRR, and skin surface temperature derived from TOVS into the Biosphere Energy-Transfer Hydrology (BETHY) model. Assimilating the fAPAR and temperature data separately the study demonstrated the improvement in the modelled estimates of the un-assimilated variable. The fAPAR was used to determine the model parameters by minimising against the whole data time series using a merit function weighted by observational uncertainty and the temperature data was assimilated using a nudging technique. The fAPAR was predicted from the BETHY state vector using a semi-discrete canopy BRF model (Gobron et al., 1997, Knorr, 1998). This BRF model does not take into account various factors, such as multi-scale clumping, that are important for predicting BRFs in areas where the vegetation does not provide continuous cover. Furthermore the minimisation scheme does not account for uncertainty in the predicted fAPAR and so BRF model inadequacies are not carried through uncertainties in the model outputs. The fAPAR assimilation described by Knorr and Lakshmi (2001) forms part of the Carbon Cycle Data Assimilation System (CCDAS; Rayner et al., 2005). The CCDAS also assimilates observations of atmospheric CO2.

Nouvellon et al. (2001) couple the Markov chain canopy reflectance model of Kuusk (1995) to a grassland ecosystem model to predict the Normalised Difference Vegetation Index (NDVI). NDVI data were assimilated from a time series of TM and ETM+ images by minimising the mean squared difference between the predictions and observations. Neither uncertainties in the model nor in the observations were taken into account, hence the assimilation scheme is sub-optimal and the reduction in the error of the model predictions could not be quantified. The authors recommend the use of the Kalman Filter to address this issue.

Vivoy et al. (2001) have used the Kalman Filter to assimilate NDVI data into the ORCHIDEE (Organizing Carbon and Hydrology In Dynamic Ecosystems Environment) model. The SAIL (Scattering from Arbitrarily Inclined Leaves) model of Verhoef (1984) was used to predict the NDVI from the ORCHIDEE state vector and assimilation was performed using the Extended Kalman Filter (EKF). The authors overcome the restriction that the observation operator must be linear in the EKF assimilation scheme by using a Jacobian matrix to represent the SAIL model at the required point in parameter space. A problem with this approach is that if uncertainty in the parameters of the SAIL model is large then the Jacobian is not an adequate description of the SAIL model. The authors address this problem by running the observation operator in an ensemble mode, which they concede will only partially overcome the inadequacies of using the Jacobian (Vivoy et al., 2001). In addition the error covariance matrix used in the assimilation scheme is diagonal and so cross correlations between the errors of the different parameters cannot develop. Both of these issues may be addressed using the Ensemble Kalman Filter.

We demonstrate the feasibility of the assimilation of canopy BRF data into the simple ecosystem model, DALEC, described by Williams et al. (2005), using a non-linear observation operator and an augmented state vector. The DALEC model assimilation scheme has been shown to work well in reducing uncertainty in C flux estimates under situations where large quantities of field (C flux and stock) data are available for assimilation. DALEC operation is examined here for a test region in Oregon, USA, using the calibrated DALEC parameterisation of Williams et al. (2005), and the assimilation of 500 m resolution Terra MODIS surface BRF data at red and near infrared (NIR) wavelengths over a 3-year period. Predicted carbon fluxes are compared to those measured in the field during the same time period. Assumptions underlying the assimilation scheme are discussed and reductions in uncertainty in the predictions of C fluxes are demonstrated.

Section snippets

The Ensemble Kalman Filter

The EnKF was designed to facilitate data assimilation into non-linear models within a Kalman gain scheme. Its formulation is largely due to Evensen, 1994, Evensen, 2003. It has the same basic form as the KF but instead of propagating a model error covariance matrix the EnKF uses an ensemble of model states to represent error statistics in the model:Aa=A+AATHT(HAATHT+Re)1(DHA)where, A is a matrix containing the ensemble of model state vectors; A′ is the matrix of ensemble perturbations

Results

The DALEC model was run in an ensemble of 200 members for the years 2000 to 2002 inclusive using the meteorological data collected at the field site and the parameterisation previously determined by Williams et al. (2005) for each of the 81 pixels in the study area. On days when MODIS observations were available the EnKF was used to adjust the ensemble state so as to represent the best estimate of the true system state (the ensemble mean) and the second order error statistics. Data for

Discussion

The DALEC model is a simplified description of C dynamics, which makes it suitable for DA studies of this type. In the current use of the model, EO measurements of BRF data at moderate spatial resolution are assimilated into DALEC and the state variables modified sequentially i.e. based on current estimates of the state variables at the previous time step and a current set of observations. Whilst this provides the optimal estimate of model predictions at the next time step of operation, it does

Conclusion

This paper aims to assess the feasibility of the assimilation of canopy BRF data into the simple ecosystem model, DALEC as an aid to improving spatial estimates of C fluxes and pools. The study has shown the approach to be feasible and can be considered successful in many respects, in particular in developing and linking the theory and tools required to allow assimilation of moderate resolution BRF data from MODIS into an ecosystem model through the use of an ensemble scheme with an augmented

Acknowledgements

We would like to acknowledge the support of NERC in this work through funding of the NERC Centre for Terrestrial Carbon Dynamics (CTCD). We are also most grateful to Dr. Wenge Ni-Meister for the provision of source code for the GORT model, the supporting work of Paul Schwarz, James Irvine and Meredith Kurpius at OSU and to the anonymous reviewers for their helpful comments.

References (66)

  • W. Verhoef

    Light scattering by leaf layers with application to canopy reflectance modeling: The SAIL model

    Remote Sensing of Environment

    (1984)
  • E.F. Vermote et al.

    Atmospheric correction of MODIS data in the visible to middle infrared: First results

    Remote Sensing of Environment

    (2002)
  • R.E. Wolfe et al.

    Achieving sub-pixel geolocation accuracy in support of MODIS land science

    Remote Sensing of Environment

    (2002)
  • G. Churkina et al.

    Spatial analysis of growing season length control over net ecosystem exchange

    Global Change Biology

    (2005)
  • W.B. Cohen et al.

    MODIS land cover and LAI Collection 4 product quality across nine sites in the Western Hemisphere

    IEEE Transactions on Geoscience and Remote Sensing

    (2006)
  • B.J. Cosby

    Dissolved oxygen dynamics of a stream: Model discrimination and estimation of parameter variability using an extended Kalman filter

    Water Science and Technology

    (1984)
  • K.J. Davis et al.

    The annual cycles of CO2 and H2O exchange over a northern mixed forest as observed from a very tall tower

    Global Change Biology

    (2003)
  • G. Evensen

    Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics

    Journal of Geophysical Research

    (1994)
  • G. Evensen

    The Ensemble Kalman Filter: Theoretical formulation and practical implementation

    Ocean Dynamics

    (2003)
  • C. Gerbig et al.

    Toward constraining regional-scale fluxes of CO2 with atmospheric observations over a continent: 2. Analysis of COBRA data using a receptor-oriented framework

    Journal of Geophysical Research

    (2003)
  • N. Gobron et al.

    A semidiscrete model for the scattering of light by vegetation

    Journal of Geophysical Research

    (1997)
  • Gobron, N., Pinty, B., Verstraete, M., & Taberner, M. (2002). Medium resolution Imaging Spectrometer (MERIS): An...
  • A. Granier

    Evaluation of transpiration in a Douglas-fir stand by means of sap flow measurements

    Tree Physiology

    (1987)
  • M.Z. Heinsch et al.

    Evaluation of remote sensing based terrestrial production from MODIS using AmeriFlux eddy tower flux network observations

    IEEE Transactions on Geoscience and Remote Sensing

    (2006)
  • IPCC

    Climate Change 2001: The Scientific Basis

  • J. Irvine et al.

    Contrasting soil respiration in young and old-growth ponderosa pine forests

    Global Change Biology

    (2002)
  • J. Irvine et al.

    Water limitations to carbon exchange in old-growth and young ponderosa pine stands

    Tree Physiology

    (2002)
  • J. Irvine et al.

    Age-related changes in ecosystem structure and function and effects on water and carbon exchange in ponderosa pine

    Tree Physiology

    (2004)
  • S. Jacquemoud et al.

    Prospect redux

  • W. Knorr

    Constraining a global mechanistic vegetation model with satellite data

  • W. Knorr et al.

    Assimilation of fAPAR and surface temperature into a land surface and vegetation model

  • Y. Knyazikhin et al.

    Synergistic algorithm for estimating vegetation canopy leaf area index and fraction of absorbed photosynthetically active radiation from MODIS and MISR data

    Journal of Geophysical Research

    (1998)
  • B.E. Law et al.

    Seasonal and annual respiration of a ponderosa pine ecosystem

    Global Change Biology

    (1999)
  • Cited by (123)

    • Kalman filter method for generating time-series synthetic Landsat images and their uncertainty from Landsat and MODIS observations

      2020, Remote Sensing of Environment
      Citation Excerpt :

      As a well-known sensor fusion and data fusion algorithm, Kalman filter has been widely used not only for guidance, navigation, and control of vehicles such as aircraft and spacecraft, etc., but also in time series data analysis and integration in many fields such as signal processing and econometrics, etc. In remote sensing, we can also find its applications in correction of atmospheric effects in satellite images (Arbel et al., 2004), image fusion of multisource images (Sun and Deng, 2004; Hu et al., 2013), integration of remote sensing data with ecosystem models to retrieve soil temperature (Huang et al., 2008), surface BRDF parameters (Samain et al., 2008), soil moisture (Huang et al., 2008), change detection (Kleynhans et al., 2011), and ecosystem productivity (Quaife et al., 2008), or monitor crop phenology (Vicente-Guijalba et al., 2014) and generate time series of Medium-Resolution NDVI (Normalized Difference Vegetation Index) Images (Sedano et al., 2014). However, all of these applications focused on different problems from the issues to be solved by our study.

    • A model of gross primary productivity based on satellite data suggests formerly afforested peatlands undergoing restoration regain full photosynthesis capacity after five to ten years

      2019, Journal of Environmental Management
      Citation Excerpt :

      Models utilising satellite data have been used to estimate carbon fluxes over many ecosystems (eg. Desai et al., 2011; Quaife et al., 2008; Sims et al., 2008; Wu, 2012; Xiao et al., 2004; Yuan et al., 2010), but there has only been limited work done on peatland landscapes globally (Connolly et al., 2009; Gatis et al., 2017; Harris and Dash, 2011; Kross et al., 2013; Lees et al., 2018; Letendre et al., 2008; Schubert et al., 2010), and especially few studies that we are aware of on restored peatlands (Chasmer et al., 2018). This study uses data from the NASA Moderate Resolution Imaging Sensor (MODIS), as it has long archives of freely available data (1999 to present) which allows monitoring across peatlands undergoing restoration over many years.

    View all citing articles on Scopus
    View full text