Impact of measurement error and limited data frequency on parameter estimation and uncertainty quantification

doi:10.1016/j.envsoft.2019.03.022

Environmental Modelling & Software

Volume 118, August 2019, Pages 35-47

https://doi.org/10.1016/j.envsoft.2019.03.022

Highlights

• Historical observed data are required for calibration.
• Measurement error and limited data frequency result in parameter uncertainty.
• The results highlight the critical roles of measurement error and frequency in calibration.
• The effect of measurement uncertainty is significant when the calibration data are limited.
• The research findings can be used to support measurement prioritization and resource allocation.

Abstract

Parameter estimation using historical observed data is an important part of environmental modeling. Uncertainty in the estimated parameters limits the applications of environmental models.

In this paper, the influence of limited and uncertain calibration data on the performance of parameter estimation is systematically investigated. For this purpose, synthetic observations with a given uncertainty and frequency are used to estimate the parameters of a conceptual water quality (WQ) model of the River Zenne, Belgium. Bayesian inference using Markov Chain Monte Carlo sampling is adopted to perform automatic calibration and uncertainty analysis simultaneously. The results highlight the critical roles of measurement frequency and uncertainty in model calibration. We found that the effect of measurement uncertainty on parameter estimation is significant when the calibration data points are limited (e.g. monthly data). The research findings can be used to support measurement prioritization and resource allocation.

Introduction

In order to use environmental models for different tasks, such as prediction, scenario analysis, and setting up regulations, they should represent reality adequately. Moreover, they should be scientifically sound, robust and defensible (U.S. EPA, 2002). In general, model results are affected by the model structure (i.e. the model assumptions), the model inputs, the boundary conditions, and the model parameters (van Griensven and Meixner, 2006; Rode et al., 2010). The underlying assumptions of the model are often fixed, and therefore the model structure is not changed during the modeling process. Moreover, the input data and the boundary conditions, obtained through measuring campaigns or provided by responsible authorities, are not altered by the modeler (Nossent, 2012). In contrast, most model parameters, which represent properties of the system, cannot be measured directly (Vrugt et al., 2003). As a consequence, the model parameters must be set to values that increase the agreement between the model results and the real system. Parameter adjustment is based on reducing the difference between the model results and historical measurements of the system response (Laloy et al., 2010; Vrugt et al., 2013; Leta et al., 2015). This procedure is referred to as parameter estimation, parameter optimization, model calibration or inverse modeling (Raat et al., 2004).
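A minimal sketch of calibration as error reduction, assuming a hypothetical one-parameter first-order decay model (`simulate`, with decay rate `k`) in place of a real WQ model: the objective is the sum of squared differences between simulated and observed values, and a brute-force grid search picks the candidate `k` with the smallest error. The model, the candidate grid and the error-free observations are illustrative assumptions, not taken from the study.

```python
import numpy as np

def simulate(k, t, c0=10.0):
    """Hypothetical first-order decay model: C(t) = c0 * exp(-k * t)."""
    return c0 * np.exp(-k * t)

def sse(k, t, obs):
    """Sum of squared errors between simulated and observed values."""
    return float(np.sum((simulate(k, t) - obs) ** 2))

def calibrate_grid(t, obs, candidates):
    """Return the candidate k with the smallest SSE (brute-force grid search)."""
    errors = [sse(k, t, obs) for k in candidates]
    return float(candidates[int(np.argmin(errors))])

t = np.arange(0.0, 10.0, 1.0)            # hypothetical sampling times (days)
obs = simulate(0.3, t)                   # error-free "observations" made with k = 0.3
candidates = np.linspace(0.05, 1.0, 96)  # grid with a step of 0.01
k_hat = calibrate_grid(t, obs, candidates)
print(round(k_hat, 2))  # prints 0.3
```

Grid search is only practical for one or two parameters; with more parameters an optimizer or, as in this study, a Bayesian sampler is used instead.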

Parameter estimation using historical observed data is an important part of environmental modeling practice and has been the focus of many studies (e.g. Duan et al., 2006). However, because models are only an approximation of the real system and the observed data used for calibration contain errors (i.e. measurement uncertainty), parameter estimation is error-prone (Vrugt et al., 2002). As a result, the parameters are difficult to identify precisely, which gives rise to parameter uncertainty. In addition, in some fields, such as water quality modeling, data collection is resource-intensive (Mannina and Viviani, 2010); consequently, the available data have a limited frequency, e.g. biweekly or even monthly intervals (Zheng and Keller, 2007a). In these cases, a serious complication for model calibration is the lack of reliable calibration data, which results in parameter uncertainty (Raat et al., 2004; Franceschini and Tsai, 2010a,b). This ambiguity in parameter estimation has a considerable impact on the uncertainty of model simulations and therefore limits the applications of environmental models such as water quality models (Wagener et al., 2003).
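The link between measurement error, data amount and parameter uncertainty can be illustrated with a small Bayesian experiment. The sketch below assumes a hypothetical first-order decay model, a Gaussian likelihood with known measurement error `sigma` and a uniform prior on the decay rate `k`. It samples the posterior with a plain random-walk Metropolis algorithm (a much simpler sampler than those typically used in practice) and reports the posterior standard deviation of `k` for a sparse, noisy data set and for a dense, precise one. All names and settings are illustrative assumptions.

```python
import numpy as np

def simulate(k, t, c0=10.0):
    """Hypothetical first-order decay model: C(t) = c0 * exp(-k * t)."""
    return c0 * np.exp(-k * t)

def log_likelihood(k, t, obs, sigma):
    """Gaussian log-likelihood (up to a constant) with known measurement error sigma."""
    resid = obs - simulate(k, t)
    return -0.5 * float(np.sum((resid / sigma) ** 2))

def metropolis(t, obs, sigma, n_iter=20000, k0=0.5, step=0.02, bounds=(0.01, 2.0), seed=0):
    """Random-walk Metropolis sampler for k under a uniform prior on `bounds`."""
    rng = np.random.default_rng(seed)
    k, ll = k0, log_likelihood(k0, t, obs, sigma)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        k_new = k + step * rng.standard_normal()
        # A proposal outside the prior bounds has zero prior density and is rejected.
        if bounds[0] <= k_new <= bounds[1]:
            ll_new = log_likelihood(k_new, t, obs, sigma)
            if np.log(rng.uniform()) < ll_new - ll:
                k, ll = k_new, ll_new
        chain[i] = k
    return chain

def posterior_sd(n_obs, sigma, k_true=0.3, seed=1):
    """Generate n_obs noisy observations over 10 days and return the posterior SD of k."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, n_obs)
    obs = simulate(k_true, t) + sigma * rng.standard_normal(n_obs)
    chain = metropolis(t, obs, sigma)
    return float(chain[5000:].std())  # discard the first 5000 iterations as burn-in

sparse_noisy = posterior_sd(n_obs=5, sigma=1.0)
dense_precise = posterior_sd(n_obs=50, sigma=0.2)
print(f"5 obs, sigma=1.0: {sparse_noisy:.4f}; 50 obs, sigma=0.2: {dense_precise:.4f}")
```

The posterior spread is the quantity of interest here: fewer and noisier observations leave a wider range of `k` values consistent with the data.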

Despite the critical role of the amount and reliability of calibration data in the performance of parameter estimation, to the best of the authors’ knowledge, the literature contains very few studies that quantify the impact of these two aspects of the measurements (i.e. amount and reliability) on the parameter uncertainty of water quality models. For example, for catchment nitrogen modeling, Raat et al. (2004) used virtual data to explore the relationship between the quality of the calibration data (i.e. measurement uncertainty) and the uncertainty associated with the final parameter estimates (i.e. parameter uncertainty). Wang et al. (2017) used synthetic data to explore how the number of tracer (i.e. isotope) samples (i.e. measurement amount) affects model calibration; however, the effect of measurement errors in the tracer data on parameter estimation was not studied. It is therefore necessary to investigate the effect of both the amount and the reliability of the calibration data on calibration performance. Neglecting other sources of uncertainty (e.g. model structural uncertainty, input data uncertainty), a modeler should first investigate whether it is feasible to reach a pre-defined model performance with a given amount of uncertain measured data.

As the impact of these two critical characteristics of calibration data, amount and reliability, on model calibration has not been fully addressed in the literature, this study focuses on assessing the influence of limited, uncertain calibration data on the performance of parameter estimation and on the parameter uncertainty intervals of water quality models. For this purpose, the following questions are formulated:

1) What is the effect of increasing the measurement frequency on the water quality parameter estimation?
2) What is the effect of reducing the measurement error of the observed data on the water quality parameter estimation?

To address these research questions, synthetic observations with a given uncertainty and frequency are used to estimate the parameters of a conceptual water quality (WQ) model of the River Zenne in Belgium, which simulates dissolved oxygen (O2) and biological oxygen demand (BOD). As an optimization tool, Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling (Bates and Campbell, 2001; Kuczera and Parent, 1998) is adopted to perform automatic calibration and uncertainty analysis simultaneously. The synthetic data series are generated by running the WQ model with a given set of parameter values, treated as the ‘true’ values. The model outputs are then perturbed with a pre-specified random error, representing measurement error, and sampled at a given frequency to mimic discrete measurements. The sampled data are then treated as observed data and used to calibrate the model parameters with the MCMC algorithm. Finally, it is verified whether the model parameters can be identified from the limited and uncertain data. To evaluate the relationship between the amount and reliability of the calibration data and the parameter uncertainty, nine different synthetic data sets are generated by increasing the measurement error and decreasing the measurement frequency. These synthetic data sets are then used as calibration data in subsequent optimization runs.
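The synthetic-data procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `simulate` is a hypothetical stand-in for the WQ model, the error is assumed to be multiplicative Gaussian noise, and the 3 × 3 grid of error levels and sampling intervals is only one hypothetical way of producing nine data sets; the actual error levels and frequencies used in the study are not given in this section.

```python
import numpy as np

def simulate(k, t, c0=10.0):
    """Hypothetical stand-in for the WQ model: first-order decay with rate k (1/day)."""
    return c0 * np.exp(-k * t)

def make_synthetic(k_true, t_daily, rel_error, every_n_days, rng):
    """Run the model with the 'true' parameter, perturb the output, then subsample it."""
    truth = simulate(k_true, t_daily)
    # Assumed error model: multiplicative Gaussian noise with relative SD rel_error.
    noisy = truth * (1.0 + rel_error * rng.standard_normal(truth.size))
    idx = np.arange(0, t_daily.size, every_n_days)  # keep one sample every n days
    return t_daily[idx], noisy[idx]

rng = np.random.default_rng(42)
t_daily = np.arange(365.0)                            # one year of daily model output
error_levels = [0.05, 0.10, 0.20]                     # hypothetical relative errors
intervals = {"daily": 1, "weekly": 7, "monthly": 30}  # hypothetical sampling intervals

datasets = {
    (err, name): make_synthetic(0.01, t_daily, err, step, rng)
    for err in error_levels
    for name, step in intervals.items()
}
sizes = {name: datasets[(0.10, name)][0].size for name in intervals}
print(len(datasets), sizes)  # 9 {'daily': 365, 'weekly': 53, 'monthly': 13}
```

Each of the nine data sets would then serve, in turn, as the calibration data for a separate MCMC run.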

Section snippets

Conceptual Integrated Tool for Water Quality Assessment (CIToWA)

In order to enhance the applicability of conceptual river water quality simulators, Woldegiorgis (Woldegiorgis, 2017; Woldegiorgis et al., 2017) developed a Conceptual Integrated Tool for Water Quality Assessment (CIToWA), as an alternative to detailed WQ simulators. In CIToWA, the river system is represented by reaches which are conceptual elements that divide the channel longitudinally into different parts. CIToWA obtains estimates of discharges and velocities of the reaches from external
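The reach-based representation can be illustrated with a toy cascade. This sketch assumes hypothetical first-order decay kinetics and a plug-flow travel time (reach length divided by velocity) for each reach, with the velocities standing in for values that CIToWA obtains externally; it does not reproduce the actual CIToWA process equations, which are not described in this section.

```python
import math
from dataclasses import dataclass

@dataclass
class Reach:
    length_m: float     # reach length (m)
    velocity_ms: float  # mean flow velocity (m/s), assumed to be supplied externally

def route_concentration(c_in, reaches, k_per_day):
    """Propagate a concentration downstream through a cascade of reaches.

    Hypothetical kinetics: first-order decay over each reach's plug-flow travel time.
    Returns the concentration at the upstream boundary and at the end of each reach.
    """
    profile = [c_in]
    for reach in reaches:
        travel_days = reach.length_m / reach.velocity_ms / 86400.0
        profile.append(profile[-1] * math.exp(-k_per_day * travel_days))
    return profile

reaches = [Reach(8640.0, 0.1), Reach(17280.0, 0.2)]  # each reach has a one-day travel time
profile = route_concentration(10.0, reaches, k_per_day=0.2)
print([round(c, 3) for c in profile])  # [10.0, 8.187, 6.703]
```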

The results of the sensitivity analysis

The PAWN sensitivity indices of the model parameters, together with that of the dummy parameter, are presented in Fig. 3. The dashed line represents the sensitivity index of the dummy parameter, which serves as a threshold for parameter screening. For simulating O2 (Fig. 3 (a)), RK2 is the most influential parameter, followed by RK1 and RK3. Considering the sensitivity index of the dummy parameter (dashed line in Fig. 3 (a)), the other parameters are considered less influential for simulating O2. For
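The screening logic can be sketched with a simplified, given-data version of PAWN. Assumptions: the test model `y` and its inputs are hypothetical, inputs are sampled uniformly, each input's range is split into equal-probability intervals, and the Kolmogorov-Smirnov (KS) statistics across intervals are summarized by their maximum (other summaries, such as the median, are also used). The study itself relies on the SAFE toolbox, not on this code. An input is flagged as influential when its index exceeds that of the dummy input, which never enters the model.

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def pawn_indices(X, y, n_intervals=10, stat=np.max):
    """For each input, compare the output distribution conditional on the input lying
    in each equal-probability interval with the unconditional output distribution."""
    indices = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0.0, 1.0, n_intervals + 1))
        ks_values = [
            ks_stat(y[(X[:, j] >= lo) & (X[:, j] <= hi)], y)
            for lo, hi in zip(edges[:-1], edges[1:])
        ]
        indices[j] = stat(ks_values)
    return indices

rng = np.random.default_rng(0)
X = rng.uniform(size=(10_000, 4))  # columns: x1, x2, x3, dummy
# Hypothetical test model; the fourth column (dummy) never enters the output.
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * X[:, 2]

S = pawn_indices(X, y)
threshold = S[3]  # dummy-parameter index used as the screening threshold
influential = [f"x{j + 1}" for j in range(3) if S[j] > threshold]
print(np.round(S, 3), influential)
```

Because the dummy input has no effect on `y`, its nonzero index reflects only sampling error, which is why it is a natural screening threshold.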

Conclusion

Parameter estimation requires measurements of the system response. However, in some fields, such as water quality modeling, the measurement frequency is limited. Moreover, the measurements are uncertain. These limitations cause uncertainty in the parameter estimation and in the simulation results. The objective of this study was to investigate the effect of the measurement frequency and uncertainty on the parameter estimation process and the uncertainty quantification.

To this aim, the DREAM_(ZS)

Software/data availability

The PAWN method is implemented in the SAFE Matlab/Octave Toolbox for GSA (Pianosi et al., 2015). SAFE is freely available for non-commercial purposes at www.bristol.ac.uk/cabot/resources/safe-toolbox/.

The MATLAB toolbox of DREAM is available upon request from the author, [email protected].

The CIToWA tool is available upon request from the author, [email protected].

Acknowledgment

The authors would like to thank the Flanders Hydraulics Research for supporting and coordinating the project of “Development of conceptual models for an integrated river basin management”.

References (55)

E.G. Bekele et al.
Multi-objective automatic calibration of SWAT using NSGA-II
J. Hydrol.
(2007)
E. Borgonovo
A new uncertainty importance measure
Reliab. Eng. Syst. Saf.
(2007)
L.C. Brown et al.
The Enhanced Stream Water Quality Models QUAL2E and QUAL2E-UNCAS: Documentation and User Manual
(1987)
Q. Duan et al.
Model Parameter Estimation Experiment (MOPEX): an overview of science strategy and major results from the second and third workshops
J. Hydrol.
(2006)
S. Franceschini et al.
Assessment of uncertainty sources in water quality modeling in the Niagara River
Adv. Water Resour.
(2010)
S. Franceschini et al.
Assessment of uncertainty sources in water quality modeling in the Niagara River
Adv. Water Resour.
(2010)
F. Han et al.
Multiple-response Bayesian calibration of watershed water quality models with significant input and model structure errors
Adv. Water Resour.
(2016)
F. Khorashadi Zadeh et al.
Comparison of variance-based and moment-independent global sensitivity analysis approaches by application to the SWAT model
Environ. Model. Softw.
(2017)
G. Kuczera et al.
Monte Carlo assessment of parameter uncertainty in conceptual catchment models: the Metropolis algorithm
J. Hydrol.
(1998)
E. Laloy et al.
Parameter optimization and uncertainty analysis for plot-scale continuous modeling of runoff using a formal Bayesian approach
J. Hydrol.
(2010)

R.E. Brazier et al.
Equifinality and uncertainty in physically based soil erosion models: application of the GLUE methodology to WEPP–the Water Erosion Prediction Project–for sites in the UK and USA
Earth Surf. Process. Landforms: J. Br. Geomorphol. Res. Group
(2000)
L.C. Brown et al.
Computer Program Documentation for the Enhanced Stream Water Quality Model QUAL 2E (No. 471
(1985)

A Markov chain Monte Carlo scheme for parameter estimation and inference in conceptual rainfall‐runoff modeling
Water Resour. Res.
Uniqueness of place and process representations in hydrological modelling
Hydrol. Earth Syst. Sci. Discuss.
The future of distributed models: model calibration and uncertainty prediction
Hydrol. Process.
