Accounting for high-order correlations in probabilistic characterization of environmental variables, and evaluation

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Probabilistic characterization of environmental variables or data typically involves distributional fitting. Correlations, when present in the variables or data, can considerably complicate the fitting process. This work examined the effects of high-order correlations on distributional fitting and described how such correlations can be accounted for using two multidimensional formulation methods: maximum entropy (ME) and Koehler–Symanowski (KS). The ME method constructs the least-biased distribution by maximizing entropy subject to specified constraints, whereas the KS method uses a formulation that preserves specified marginal distributions. Two bivariate environmental data sets, ambient particulate matter and water quality, were chosen for illustration and discussion. Three metrics (the log-likelihood function, the root-mean-square error, and the bivariate Kolmogorov–Smirnov statistic) were used to evaluate distributional fit, and bootstrap confidence intervals were used to inspect the agreement between distributional and sample moments. Both methods fit the data well and show potential for practical use. The fitted KS distributions were of good quality, and estimating the parameters of a KS distribution by maximum likelihood is computationally efficient.
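The ME formulation can be made concrete with a small example. The following is a minimal sketch, not the authors' implementation, that fits a bivariate maximum-entropy distribution on a discretized rectangular grid. It assumes the constraints are polynomial moments E[x^i y^j] matched to their sample values. A cross moment such as (1, 1) carries the linear correlation, and higher-order cross moments such as (2, 1) or (1, 2) can be added to represent high-order correlations. The function name `fit_maxent_2d` is hypothetical, as are the grid size and the damped Newton solver on the convex dual. The paper's actual constraint set, domain, and optimizer are not given in this section.

```python
import numpy as np


def fit_maxent_2d(samples, powers, lo, hi, n_grid=101, max_iter=100, tol=1e-9):
    """Hypothetical helper: bivariate maximum-entropy pmf on a grid.

    The fitted pmf has the form p(x, y) ∝ exp(sum_k lam_k * x**i_k * y**j_k),
    with one constraint E[x**i * y**j] = sample mean for each (i, j) in powers.
    Do not include (0, 0); normalization is handled separately.
    Returns (lam, x_grid, y_grid, pmf), where pmf sums to 1 over the grid.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.linspace(lo[0], hi[0], n_grid)
    y = np.linspace(lo[1], hi[1], n_grid)
    X, Y = np.meshgrid(x, y, indexing="ij")
    # One feature surface per constraint; (1, 1) gives the cross moment E[xy].
    feats = np.stack([X**i * Y**j for i, j in powers])
    target = np.array([np.mean(samples[:, 0]**i * samples[:, 1]**j) for i, j in powers])

    def log_weights(lam):
        return np.tensordot(lam, feats, axes=1)  # shape (n_grid, n_grid)

    def dual(lam):
        # Convex dual log Z(lam) - lam . target; its minimizer is the ME solution.
        s = log_weights(lam)
        smax = s.max()
        return smax + np.log(np.exp(s - smax).sum()) - lam @ target

    def pmf_of(lam):
        s = log_weights(lam)
        w = np.exp(s - s.max())  # shift for numerical stability
        return w / w.sum()

    lam = np.zeros(len(powers))  # lam = 0 corresponds to the uniform distribution
    for _ in range(max_iter):
        p = pmf_of(lam)
        model = (feats * p).sum(axis=(1, 2))
        grad = model - target  # gradient of the dual
        if np.max(np.abs(grad)) < tol:
            break
        centered = feats - model[:, None, None]
        hess = np.einsum("kab,lab,ab->kl", centered, centered, p)  # feature covariance
        step = np.linalg.solve(hess, grad)
        # Backtracking (Armijo) line search damps the Newton step.
        t, f0 = 1.0, dual(lam)
        while dual(lam - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5
        lam = lam - t * step
    return lam, x, y, pmf_of(lam)
```

Consider the constraint set powers = [(1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]. The exponent is then quadratic, so the fitted form is a discretized, truncated bivariate-normal-like density. Adding third-order cross moments departs from that form. Rescaling each variable to a unit interval before fitting, as assumed in this sketch, keeps the moment features and the Hessian reasonably well conditioned.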

Fig. 1
Fig. 2

References

  • Ades AE, Lu G (2003) Correlations between parameters in risk models: estimation and propagation of uncertainty by Markov Chain Monte Carlo. Risk Anal 23:1165–1172

  • Anex RP, Lund JR, Grant R (1999) A maximum entropy approach to estimating emissions. J Air Waste Manage Assoc 49:943–952

  • Bukowski J, Korn L, Wartenberg D (1995) Correlated inputs in quantitative risk assessment: the effects of distributional shape. Risk Anal 15:215–219

  • Burmaster DE, Thompson KM (1998) Fitting second-order parametric distributions to data using maximum likelihood estimation. Hum Ecol Risk Assess 4:319–339

  • Cario MC, Nelson BL (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, IL

  • Christakos G (2000) Modern spatiotemporal geostatistics. Oxford University Press, New York

  • Christakos G, Li X (1998) Bayesian maximum entropy analysis and mapping: a farewell to Kriging estimators? Math Geol 30:435–462

  • Clemen RT, Reilly T (1999) Correlations and copulas for decision and risk analysis. Manage Sci 45:208–224

  • Cullen AC, Frey HC (1999) Probabilistic techniques in exposure assessment: a handbook for dealing with variability and uncertainty in models and inputs. Plenum Press, New York

  • Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, London

  • Fletcher R (2003) Practical methods of optimization. Wiley, Chichester

  • Frey HC, Rhodes DS (1998) Characterization and simulation of uncertain frequency distributions: effects of distribution choice, variability, uncertainty, and parameter dependence. Hum Ecol Risk Assess 4:423–468

  • Georgopoulos PG, Seinfeld JH (1982) Statistical distribution of air pollutant concentration. Environ Sci Technol 16:401A–416A

  • Ghosh S, Henderson SG (2002) Chessboard distributions and random vectors with specified marginals and covariance matrix. Oper Res 50:820–834

  • Haas CN (1999) On modeling correlated random variables in risk assessment. Risk Anal 19:1205–1214

  • Iman RL, Conover WJ (1982) A distribution-free approach to inducing rank correlation among input variables. Commun Stat Simul Comput 11:311–334

  • Jaynes ET (1957) Information theory and statistical mechanics. Phys Rev 106:620–630

  • Justel A, Peña D, Zamar R (1997) A multivariate Kolmogorov–Smirnov test of goodness of fit. Stat Probab Lett 35:251–259

  • Klapper H (1991) Control of eutrophication in inland waters. Ellis Horwood, New York

  • Koehler KJ, Symanowski JT (1995) Constructing multivariate distributions with specific marginal distributions. J Multivar Anal 55:261–282

  • Lee RC, Wright WE (1994) Development of human exposure-factor distributions using maximum-entropy inference. J Expo Anal Environ Epidemiol 4:329–341

  • Lu HC (2002) The statistical characters of PM10 concentration in Taiwan area. Atmos Environ 36:491–502

  • Mittelhammer RC, Judge GG, Miller DJ (2000) Econometric foundations. Cambridge University Press, Cambridge

  • Morgan MG, Henrion M (1990) Uncertainty: a guide to dealing with uncertainty in quantitative risk and policy analysis. Cambridge University Press, New York

  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313

  • NWIS (2004) National Water Information System, USGS http://waterdata.usgs.gov/nwis

  • Park SK (2005) Particulate modeling and control strategy for Atlanta, Georgia. Doctoral Thesis, Georgia Institute of Technology, Atlanta, GA

  • Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in FORTRAN. Cambridge University Press, New York

  • Ryu HK (1993) Maximum entropy estimation of density and regression functions. J Econ 56:397–440

  • Seinfeld JH, Pandis SN (1998) Atmospheric chemistry and physics: from air pollution to climate change. Wiley, New York

  • Smith AE, Ryan PB, Evans JS (1992) The effect of neglecting correlations when propagating uncertainty and estimating the population distribution of risk. Risk Anal 12:467–474

  • US EPA (1990) Drinking water criteria document on nitrate/nitrite. Office of Drinking Water, US EPA, Washington, DC

  • US EPA (2004) Air quality criteria for particulate matter. Office of Research and Development, US EPA, Research Triangle Park, NC

  • Weisstein EW (2003) CRC Concise Encyclopedia of Mathematics. CRC Press, Boca Raton

  • Wu X (2003) Calculation of maximum entropy densities with application to income distribution. J Econ 115:347–354

Acknowledgments

The authors thank Dr. Seoung Bum Kim, Department of Industrial and Manufacturing Systems Engineering, University of Texas at Arlington, for his suggestions on the manuscript. Partial financial support was provided by the US EPA under Contract No. RD-83096001. Some technical assistance was provided by the Joint Graduate School of Energy and Environment (JGSEE).

Author information

Correspondence to Kasemsan Manomaiphiboon.

Appendix: Univariate distributions

The forms of the univariate distributions used in this work are summarized below:

$$\hbox{Lognormal:}\quad f_X (x)=\frac{1}{\sqrt {2\pi } b x}\exp \, \left[ {\frac{-(\ln x-a)^2}{2\,b^2}} \right]\; \hbox{and}\; F_X (x)=\frac{1}{2}\left[ {1+\hbox{erf}\,\left( {\frac{\ln x-a}{\sqrt 2 b}} \right)} \right] $$

for x ∈ (0, ∞), a ∈ (−∞, ∞), and b ∈ (0, ∞), where a and b are the mean and standard deviation of ln x, respectively; e^a is the scale (median) parameter and b is the shape parameter.
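As a check on the lognormal expressions above, the sketch below codes f_X and F_X directly with Python's standard library. The maximum likelihood estimator included with it is the standard closed-form result: a is the mean of ln x, and b is the 1/n standard deviation of ln x. This appendix does not state how the univariate parameters were estimated, so the estimator is shown only as an assumed illustration. All function names are hypothetical.

```python
import math


def lognormal_pdf(x, a, b):
    """Hypothetical helper: f_X(x) for x > 0; a = mean of ln x, b = std of ln x."""
    z = (math.log(x) - a) / b
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * b * x)


def lognormal_cdf(x, a, b):
    """Hypothetical helper: F_X(x) = [1 + erf((ln x - a) / (sqrt(2) b))] / 2."""
    return 0.5 * (1.0 + math.erf((math.log(x) - a) / (math.sqrt(2.0) * b)))


def fit_lognormal_mle(data):
    """Hypothetical helper: closed-form ML estimates (mean and 1/n std of ln x)."""
    logs = [math.log(v) for v in data]
    n = len(logs)
    a = sum(logs) / n
    b = math.sqrt(sum((u - a) ** 2 for u in logs) / n)
    return a, b
```

A quick consistency check is that F_X(e^a) = 1/2 for any b, since e^a is the median.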

$$ \hbox{Gamma:}\quad f_X (x)=\frac{1}{\Gamma(a)\,b^a}\,x^{a-1}\exp \left( \frac{-x}{b} \right)\; \hbox{and}\; F_X (x)=P\left( a,\frac{x}{b} \right) $$

for x ∈ [0, ∞), a ∈ (0, ∞), and b ∈ (0, ∞), where a and b are the shape and scale parameters, respectively, and P(·, ·) is the regularized lower incomplete gamma function (Weisstein 2003).
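Evaluating F_X for the gamma distribution requires the regularized lower incomplete gamma function P(a, x). The sketch below assumes the approach described in Numerical Recipes (Press et al. 1992): a series expansion for x < a + 1 and a continued fraction (modified Lentz method) otherwise. The function names are hypothetical. In practice, a library routine such as `scipy.special.gammainc` would more likely be used.

```python
import math

_EPS = 1e-14   # relative convergence tolerance
_TINY = 1e-300  # guards against division by zero in the Lentz recursion


def reg_lower_gamma(a, x, max_iter=1000):
    """Hypothetical helper: regularized lower incomplete gamma P(a, x), a > 0."""
    if x <= 0:
        return 0.0
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series: P = e^-x x^a / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n)).
        ap, term = a, 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * _EPS:
                break
        return total * math.exp(log_pref)
    # Continued fraction for the complement Q(a, x) = 1 - P(a, x).
    bb = x + 1.0 - a
    c = 1.0 / _TINY
    d = 1.0 / bb
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        bb += 2.0
        d = an * d + bb
        if abs(d) < _TINY:
            d = _TINY
        c = bb + an / c
        if abs(c) < _TINY:
            c = _TINY
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return 1.0 - math.exp(log_pref) * h


def gamma_cdf(x, a, b):
    """Hypothetical helper: F_X(x) = P(a, x / b) for shape a and scale b."""
    return reg_lower_gamma(a, x / b)
```

With a = 1, the gamma distribution reduces to the exponential distribution, so gamma_cdf(x, 1, b) should equal 1 − exp(−x/b). This gives a convenient sanity check on both branches.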

$$ \hbox{Weibull:}\quad f_X (x)=b\,a^{-b}x^{b-1}\exp \,\left[ {-\left( {\frac{x}{a}} \right)^b} \right]\; \hbox{and}\; F_X (x)=1-\exp \left[ {-\left( {\frac{x}{a}} \right)^b} \right] $$

for x ∈ [0, ∞), a ∈ (0, ∞), and b ∈ (0, ∞), where a and b are the scale and shape parameters, respectively.
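The appendix does not describe how the univariate parameters were estimated. The following is an assumed illustration of maximum likelihood estimation for the Weibull parameterization above, not necessarily the authors' procedure. Setting ∂(ln L)/∂a = 0 gives a^b = (1/n) Σ x_i^b. Substituting this back leaves a single equation in b: Σ x_i^b ln x_i / Σ x_i^b − 1/b − (1/n) Σ ln x_i = 0, whose left side increases with b. The sketch solves this equation by bisection. The function name is hypothetical.

```python
import math
import random


def fit_weibull_mle(data, iters=80):
    """Hypothetical helper: ML estimates (a, b) for F(x) = 1 - exp[-(x/a)^b].

    Assumes all data are strictly positive and not all equal.
    """
    c = max(data)
    z = [v / c for v in data]  # rescale so z <= 1, avoiding overflow in z**b
    logz = [math.log(v) for v in z]
    mean_logz = sum(logz) / len(z)

    def profile(b):
        # Score equation for b after eliminating a via a^b = mean(x^b);
        # increasing in b, negative near 0 and positive for large b.
        zb = [v ** b for v in z]
        return sum(w * l for w, l in zip(zb, logz)) / sum(zb) - 1.0 / b - mean_logz

    lo, hi = 1e-3, 1.0
    while profile(hi) < 0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(iters):  # bisection on the monotone profile equation
        mid = 0.5 * (lo + hi)
        if profile(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = c * (sum(v ** b for v in z) / len(z)) ** (1.0 / b)
    return a, b


if __name__ == "__main__":
    random.seed(0)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    sample = [random.weibullvariate(2.0, 1.5) for _ in range(5000)]
    print(fit_weibull_mle(sample))
```

Rescaling by the sample maximum leaves the profile equation unchanged, because the ln c terms cancel. The rescaling only improves numerical behaviour for large b.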

About this article

Cite this article

Manomaiphiboon, K., Park, SK. & Russell, A.G. Accounting for high-order correlations in probabilistic characterization of environmental variables, and evaluation. Stoch Environ Res Risk Assess 22, 159–168 (2008). https://doi.org/10.1007/s00477-007-0106-5

  • DOI: https://doi.org/10.1007/s00477-007-0106-5
