Abstract
Probabilistic characterization of environmental variables or data typically involves distributional fitting. Correlations, when present in the variables or data, can considerably complicate the fitting process. In this work, the effects of high-order correlations on distributional fitting were examined, and the way they can be technically accounted for was described using two multi-dimensional formulation methods: maximum entropy (ME) and Koehler–Symanowski (KS). The ME method formulates a least-biased distribution by maximizing its entropy subject to specified moment constraints, whereas the KS method uses a formulation that preserves specified marginal distributions. Two bivariate environmental data sets, ambient particulate matter and water quality, were chosen for illustration and discussion. Three metrics (the log-likelihood function, the root-mean-square error, and the bivariate Kolmogorov–Smirnov statistic) were used to evaluate the distributional fit. Bootstrap confidence intervals were also employed to inspect the degree of agreement between distributional and sample moments. Both methods are shown to fit the data well and to have potential for practical use. The KS distributions were found to be of good quality, and estimating the parameters of a KS distribution by maximum likelihood is computationally efficient.
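The ME idea summarized above can be illustrated with a minimal sketch; it is not the article's own formulation or solver. It assumes (none of these choices come from the article) that the data have been rescaled to the unit square, that the density is discretized on a regular grid, and that the constraints are the first and second marginal moments plus the cross moments E[xy], E[x²y], and E[xy²], which carry the (high-order) correlation. Under moment constraints the maximum-entropy density has the exponential form p(x, y) ∝ exp(Σ_k λ_k g_k(x, y)), and the multipliers λ_k minimize the convex dual log ∫ exp(λ·g) − λ·μ̂, solved here with a damped Newton method. The function name `fit_maxent_2d` is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fit_maxent_2d(samples, n_grid=101, max_iter=100, tol=1e-8):
    """Hypothetical sketch: bivariate maximum-entropy density on [0, 1]^2.

    Matches the sample moments E[x], E[y], E[x^2], E[y^2] and the cross
    moments E[xy], E[x^2 y], E[x y^2]. Returns (multipliers, density on
    the grid, fitted moments).
    """
    g = np.linspace(0.0, 1.0, n_grid)
    X, Y = np.meshgrid(g, g, indexing="ij")
    cell = (g[1] - g[0]) ** 2  # area of one grid cell

    def constraint_funcs(x, y):
        # Cross terms carry the (high-order) correlation between x and y.
        return np.stack([x, y, x**2, y**2, x * y, x**2 * y, x * y**2])

    feats = constraint_funcs(X, Y).reshape(7, -1)
    target = constraint_funcs(samples[:, 0], samples[:, 1]).mean(axis=1)

    def probs(lam):
        s = lam @ feats
        w = np.exp(s - s.max())  # shift the exponent to avoid overflow
        return w / w.sum()

    def dual(lam):
        # Convex dual: log-normalizer minus lam . target, minimized at the ME solution.
        s = lam @ feats
        m = s.max()
        return m + np.log(np.exp(s - m).sum() * cell) - lam @ target

    lam = np.zeros(feats.shape[0])
    for _ in range(max_iter):
        p = probs(lam)
        mean_g = feats @ p
        grad = mean_g - target  # gradient of the dual
        if np.max(np.abs(grad)) < tol:
            break
        centered = feats - mean_g[:, None]
        hess = (centered * p) @ centered.T  # covariance of g under p = Hessian of the dual
        step = np.linalg.solve(hess, grad)
        # Backtracking (Armijo) line search keeps the Newton iteration stable.
        t, d0, slope = 1.0, dual(lam), grad @ step
        while dual(lam - t * step) > d0 - 1e-4 * t * slope and t > 1e-8:
            t *= 0.5
        lam = lam - t * step

    p = probs(lam)
    density = p.reshape(n_grid, n_grid) / cell  # integrates to 1 on the grid
    return lam, density, feats @ p
```

Other constraint sets can be tried by editing `constraint_funcs`; the Newton step always uses the covariance of the constraint functions under the current density as the Hessian of the dual.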
References
Ades AE, Lu G (2003) Correlations between parameters in risk models: estimation and propagation of uncertainty by Markov Chain Monte Carlo. Risk Anal 23:1165–1172
Anex RP, Lund JR, Grant R (1999) A maximum entropy approach to estimating emissions. J Air Waste Manage Assoc 49:943–952
Bukowski J, Korn L, Wartenberg D (1995) Correlated inputs in quantitative risk assessment: the effects of distributional shape. Risk Anal 15:215–219
Burmaster DE, Thompson KM (1998) Fitting second-order parametric distributions to data using maximum likelihood estimation. Hum Ecol Risk Assess 4:319–339
Cario MC, Nelson BL (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, IL
Christakos G (2000) Modern spatiotemporal geostatistics. Oxford University Press, New York
Christakos G, Li X (1998) Bayesian maximum entropy analysis and mapping: a farewell to Kriging estimators? Math Geol 30:435–462
Clemen RT, Reilly T (1999) Correlations and copulas for decision and risk analysis. Manage Sci 45:208–224
Cullen AC, Frey HC (1999) Probabilistic techniques in exposure assessment: a handbook for dealing with variability and uncertainty in models and inputs. Plenum Press, New York
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, London
Fletcher R (2003) Practical methods of optimization. Wiley, Chichester
Frey HC, Rhodes DS (1998) Characterization and simulation of uncertain frequency distributions: effects of distribution choice, variability, uncertainty, and parameter dependence. Hum Ecol Risk Assess 4:423–468
Georgopoulos PG, Seinfeld JH (1982) Statistical distribution of air pollutant concentration. Environ Sci Technol 16:401A–416A
Ghosh S, Henderson SG (2002) Chessboard distributions and random vectors with specified marginals and covariance matrix. Oper Res 50:820–834
Haas CN (1999) On modeling correlated random variables in risk assessment. Risk Anal 19:1205–1214
Iman RL, Conover WJ (1982) A distribution-free approach to inducing rank correlation among input variables. Commun Stat Simul Comput 11:311–334
Jaynes ET (1957) Information theory and statistical mechanics. Phys Rev 106:620–630
Justel A, Peña D, Zamar R (1997) A multivariate Kolmogorov–Smirnov test of goodness of fit. Stat Probab Lett 35:251–259
Klapper H (1991) Control of eutrophication in inland waters. Ellis Horwood, New York
Koehler KJ, Symanowski JT (1995) Constructing multivariate distributions with specific marginal distributions. J Multivar Anal 55:261–282
Lee RC, Wright WE (1994) Development of human exposure-factor distributions using maximum-entropy inference. J Expo Anal Environ Epidemiol 4:329–341
Lu HC (2002) The statistical characters of PM10 concentration in Taiwan area. Atmos Environ 36:491–502
Mittelhammer RC, Judge GG, Miller DJ (2000) Econometric foundations. Cambridge University Press, Cambridge
Morgan MG, Henrion M (1990) Uncertainty: a guide to dealing with uncertainty in quantitative risk and policy analysis. Cambridge University Press, New York
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313
NWIS (2004) National Water Information System, USGS http://waterdata.usgs.gov/nwis
Park SK (2005) Particulate modeling and control strategy for Atlanta, Georgia. Doctoral Thesis, Georgia Institute of Technology, Atlanta, GA
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in FORTRAN. Cambridge University Press, New York
Ryu HK (1993) Maximum entropy estimation of density and regression functions. J Econ 56:397–440
Seinfeld JH, Pandis SN (1998) Atmospheric chemistry and physics: from air pollution to climate change. Wiley, New York
Smith AE, Ryan PB, Evans JS (1992) The effect of neglecting correlations when propagating uncertainty and estimating the population distribution of risk. Risk Anal 12:467–474
US EPA (1990) Drinking water criteria document on nitrate/nitrite. Office of Drinking Water, US EPA, Washington, DC
US EPA (2004) Air quality criteria for particulate matter. Office of Research and Development, US EPA, Research Triangle Park, NC
Weisstein EW (2003) CRC Concise Encyclopedia of Mathematics. CRC Press, Boca Raton
Wu X (2003) Calculation of maximum entropy densities with application to income distribution. J Econ 115:347–354
Acknowledgments
The authors thank Dr. Seoung Bum Kim, Department of Industrial and Manufacturing Systems Engineering, University of Texas at Arlington, for his suggestions on the manuscript. Partial financial support was provided by the US EPA under Contract No. RD-83096001. Some technical assistance was provided by the Joint Graduate School of Energy and Environment (JGSEE).
Appendix: Univariate distributions
The forms of the univariate distributions used in this work are summarized below:
for x ∈ (0, ∞), a ∈ (−∞, ∞), and b ∈ (0, ∞), where a and b are the scale and shape parameters, respectively.
for x ∈ [0, ∞), a ∈ (0, ∞), and b ∈ (0, ∞), where a and b are the scale and shape parameters, respectively. P(·) is the regularized lower incomplete gamma function (Weisstein 2003).
for x ∈ [0, ∞), a ∈ (0, ∞), and b ∈ (0, ∞), where a and b are the scale and shape parameters, respectively.
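One of the univariate forms above is defined through the incomplete gamma function P(·), with scale a and shape b. Assuming it is the gamma CDF in the usual scale/shape parameterization, F(x) = P(b, x/a) with P the regularized lower incomplete gamma function (the formula itself is not reproduced here, so this parameterization is an assumption), the sketch below evaluates it in pure Python using the series / continued-fraction split described in Press et al. (1992). All function names are hypothetical.

```python
import math


def _lower_gamma_series(s, x, eps=1e-14, max_iter=10_000):
    """P(s, x) by its power series; converges quickly for x < s + 1."""
    term = 1.0 / s
    total = term
    denom = s
    for _ in range(max_iter):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def _upper_gamma_cont_frac(s, x, eps=1e-14, max_iter=10_000, tiny=1e-300):
    """Q(s, x) = 1 - P(s, x) by a continued fraction (modified Lentz); for x >= s + 1."""
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x + s * math.log(x) - math.lgamma(s))


def regularized_lower_gamma(s, x):
    """P(s, x) = gamma(s, x) / Gamma(s) for s > 0 and x >= 0."""
    if s <= 0.0:
        raise ValueError("s must be positive")
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if x == 0.0:
        return 0.0
    if x < s + 1.0:
        return _lower_gamma_series(s, x)
    return 1.0 - _upper_gamma_cont_frac(s, x)


def gamma_cdf(x, a, b):
    """Assumed gamma CDF with scale a and shape b: F(x) = P(b, x / a)."""
    if a <= 0.0 or b <= 0.0:
        raise ValueError("scale a and shape b must be positive")
    return regularized_lower_gamma(b, max(x, 0.0) / a)
```

With b = 1 the gamma CDF reduces to the exponential CDF, so `gamma_cdf(5.0, 1.0, 1.0)` should agree with 1 − exp(−5) to near machine precision; this is a quick check on both branches when x/a is chosen on either side of b + 1.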
Cite this article
Manomaiphiboon, K., Park, SK. & Russell, A.G. Accounting for high-order correlations in probabilistic characterization of environmental variables, and evaluation. Stoch Environ Res Risk Assess 22, 159–168 (2008). https://doi.org/10.1007/s00477-007-0106-5
DOI: https://doi.org/10.1007/s00477-007-0106-5