Modeling of variability and uncertainty in human health risk assessment

Graphical abstract

A probability mass function of a discrete random variable X with possible values $x_1, x_2, \ldots, x_n$ is a function f such that
$f(x_i) \ge 0, \qquad \sum_{i=1}^{n} f(x_i) = 1, \qquad f(x_i) = P(X = x_i).$
The cumulative distribution function (CDF) of a discrete random variable X, denoted F(x), is
$F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i).$
Let X be a continuous random variable. A probability density function (PDF) of X is a non-negative function f which satisfies
$P(X \in B) = \int_{B} f(x)\,dx$
for every subset B of the real line. As X must assume some value, f must satisfy
$\int_{-\infty}^{\infty} f(x)\,dx = 1.$
This means the entire area under the graph of the PDF must be equal to one. In particular, the probability that the value of X falls within an interval [a, b] is
$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx.$
The CDF of a continuous random variable X is
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt.$

Fuzzy set theory
Environmental/human health risk assessment is an important aid in any decision-making process that aims to minimize the effects of human activities on the environment. Unfortunately, environmental data tend to be vague and imprecise, so uncertainty is associated with any study based on such data. Fuzzy set theory provides a way to deal with imprecisely defined variables: relationships between variables are defined on the basis of expert knowledge and then used to compute results. In this section, the background and notions of fuzzy set theory [1,2] that are required in the sequel are reviewed.
Definition: Let X be a universal set. A fuzzy subset A of X is defined by its membership function $\mu_A : X \to [0, 1]$, which assigns a real number $\mu_A(x)$ in the interval [0,1] to each element $x \in X$, where the value of $\mu_A(x)$ shows the grade of membership of x in A.
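Definition: Given a fuzzy set A in X and any real number $\alpha \in [0,1]$, the α-cut (also called the α-level set or cutworthy set) of A, denoted by ${}^{\alpha}A$, is the crisp set ${}^{\alpha}A = \{x \in X : \mu_A(x) \ge \alpha\}$.
Definition: The support of a fuzzy set A defined on X is the crisp set $\mathrm{supp}(A) = \{x \in X : \mu_A(x) > 0\}$.
Definition: The height of a fuzzy set A, denoted by h(A), is the largest membership grade obtained by any element in the set: $h(A) = \sup_{x \in X} \mu_A(x)$.
Definition: An interval-valued fuzzy set A defined on the universe of discourse X is represented by a lower membership function (LMF) and an upper membership function (UMF), where $h(\underline{A})$ is the height of the LMF, $h(\overline{A})$ is the height of the UMF and $\emptyset$ is the empty set.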
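To make the α-cut and height definitions concrete, the following minimal Python sketch assumes a generalized trapezoidal fuzzy number written as (a, b, c, d; w) with a < b ≤ c < d and height w. The trapezoidal shape and the function names `membership` and `alpha_cut` are assumptions made for illustration only.

```python
def membership(x, a, b, c, d, w):
    """Membership grade of x in the trapezoidal fuzzy number (a, b, c, d; w)."""
    if x <= a or x >= d:
        return 0.0                      # outside the support (a, d)
    if x < b:
        return w * (x - a) / (b - a)    # rising edge
    if x <= c:
        return w                        # plateau: the height of the fuzzy number
    return w * (d - x) / (d - c)        # falling edge


def alpha_cut(alpha, a, b, c, d, w):
    """Closed interval {x : membership(x) >= alpha}, valid for 0 < alpha <= w."""
    if not 0.0 < alpha <= w:
        raise ValueError("alpha must lie in (0, w]")
    t = alpha / w                       # rescale alpha to the height w
    return (a + t * (b - a), d - t * (d - c))
```

For (1, 2, 3, 5; 0.8), the cut at α = 0.4 is the interval whose end points both have membership grade 0.4, and the cut at α = w collapses to the plateau [b, c].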

Approach to propagate variability and uncertainty
When uncertainty is modeled with fuzzy set theory, uncertain parameters are usually represented by Type-I fuzzy sets, whose membership functions assign a precise point value from [0,1].
However, in some situations it is not possible for a membership function of the type $\mu : X \to [0, 1]$ to assign precisely one point from [0,1], and it is more realistic to assign an interval value. This is where interval-valued fuzzy numbers come into the picture.
Here, probability distributions, generalized fuzzy numbers and normal IVFNs are combined.
Consider an arbitrary mathematical model $M = f(P_1, \ldots, P_m, G_1, \ldots, G_s, F_1, \ldots, F_n)$, which is a function of the parameters $P_i$, $G_k$ and $F_l$, where i = 1, 2, 3, ..., m; k = 1, 2, 3, ..., s and l = 1, 2, 3, ..., n. Suppose the $P_i$ are m parameters represented by probability distributions, the $G_k$ are s parameters represented by generalized fuzzy numbers with heights $w_s$, and the $F_l$ are n parameters represented by normal interval-valued fuzzy numbers (IVFNs).

The approach is explained below
Step 1: Initially, consider all generalized fuzzy numbers $G_k$ with heights $w_s$ together with the upper membership functions $F_1^u, F_2^u, \ldots, F_n^u$ of the normal interval-valued fuzzy numbers (IVFNs). As generalized fuzzy numbers and normal fuzzy numbers have different heights, we take $\alpha = [0 : w/10 : w]$, i.e. α runs from 0 to w in steps of w/10, where $w = \min(w_s, 1)$.
Step 2: Generate 5000 uniformly distributed random numbers from [0,1] and perform Monte Carlo simulation to obtain m cumulative distribution functions (CDFs), one for each of the m probabilistic input parameters.
Step 3: Calculate the α-cut of each fuzzy number (α is taken stepwise from 0 to w). Since each α-cut is a closed interval, this gives s + n closed intervals.
Step 4: Substitute the m CDFs and every combination of the initial and end points of the n + s intervals into the model M, which produces $2^{s+n}$ CDFs. Taking the infimum (minimum) and supremum (maximum) of the model M over these combinations gives a pair of CDFs (one lower probability distribution and one upper probability distribution).
Step 5: Take the next α level (for example, α = 0.08 if w = 0.8), calculate the α-cut of each fuzzy number and repeat steps 3 and 4. The process terminates after the run for α = w.
This produces a family of CDFs.
Step 6: Next, consider all generalized fuzzy numbers $G_k$ together with the lower membership functions $F_1^l, F_2^l, \ldots, F_n^l$ of the normal interval-valued fuzzy numbers $F_l$. Here too the heights of the normal and generalized fuzzy numbers differ, so we take $\alpha \in [0, w]$ where $w = \min(w_s, 1)$.
Step 7: Repeat steps 3 to 5, again with $\alpha = [0 : w/10 : w]$ as in step 6. This gives another family of CDFs.
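One pass of steps 1 to 5 (a single family of CDFs) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every fuzzy number is assumed trapezoidal and passed as a tuple (a, b, c, d, height), each probabilistic parameter is passed as a hypothetical inverse-CDF callable, and the names `alpha_cut` and `propagate` are invented here. Running it once with the UMFs and once with the LMFs of the IVFNs would give the two families described in steps 6 and 7.

```python
import itertools

import numpy as np


def alpha_cut(fn, alpha):
    """Closed interval for a trapezoidal fuzzy number fn = (a, b, c, d, height)."""
    a, b, c, d, h = fn
    t = alpha / h
    return a + t * (b - a), d - t * (d - c)


def propagate(model, inverse_cdfs, fuzzy_numbers, n_sim=5000, n_steps=10, seed=0):
    """Return a list of (alpha, lower_samples, upper_samples), one per alpha level."""
    rng = np.random.default_rng(seed)
    # Step 1: common height w = min(all heights, 1).
    w = min([fn[4] for fn in fuzzy_numbers] + [1.0])
    # Step 2: uniform random numbers mapped through each inverse CDF.
    u = rng.uniform(size=(len(inverse_cdfs), n_sim))
    p = [inv(u_i) for inv, u_i in zip(inverse_cdfs, u)]
    family = []
    for alpha in np.linspace(0.0, w, n_steps + 1):      # Step 5: alpha = 0 : w/10 : w
        # Step 3: one closed interval per fuzzy number.
        cuts = [alpha_cut(fn, alpha) for fn in fuzzy_numbers]
        # Step 4: evaluate the model at every combination of interval end points.
        outputs = np.array([model(p, ends) for ends in itertools.product(*cuts)])
        # The sorted minima form the left-most CDF (upper probability bound),
        # the sorted maxima the right-most CDF (lower probability bound).
        family.append((float(alpha),
                       np.sort(outputs.min(axis=0)),
                       np.sort(outputs.max(axis=0))))
    return family
```

Here `model(p, ends)` receives the list of sampled arrays and one tuple of interval end points and returns one model output per simulation. Evaluating only the end points, as in step 4, bounds the output correctly when the model is monotonic in each fuzzy parameter.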
The final result can be interpreted in two ways. The first is the envelope of all the obtained CDFs, or P-box: if the cumulative distribution function of a variable lies on or between two monotonic curves, these curves form a box called a probability box, or P-box, for that variable.
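The envelope can be computed directly when each CDF in a family is stored as an equal-length array of simulated outputs. The sketch below assumes that storage format; the function name `p_box` is hypothetical.

```python
import numpy as np


def p_box(family):
    """Envelope of empirical CDFs given as equal-length arrays of samples."""
    samples = np.vstack([np.sort(s) for s in family])
    n = samples.shape[1]
    probs = np.arange(1, n + 1) / n    # cumulative probability of each order statistic
    left = samples.min(axis=0)         # left bounding curve (upper probability)
    right = samples.max(axis=0)        # right bounding curve (lower probability)
    return probs, left, right
```

Every CDF in the family then lies on or between the curves (left, probs) and (right, probs).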
The second is that membership functions at different fractiles can be generated from these families of CDFs. The result is a completely generalized trapezoidal-type interval-valued fuzzy number: the first family of CDFs produces the UMF and the second family gives the LMF, each of height w, of the resulting completely generalized interval-valued fuzzy number at each fractile.
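Reading a membership function off a family of CDFs amounts to taking, at each α level, the chosen fractile of the lower and upper bounding samples. The sketch below assumes the family is stored as a list of (alpha, lower_samples, upper_samples) tuples; that layout and the name `fractile_membership` are assumptions for illustration.

```python
import numpy as np


def fractile_membership(family, q):
    """Return (alpha, left, right) rows: the alpha-cuts of the output at fractile q."""
    return [(alpha, float(np.quantile(lo, q)), float(np.quantile(hi, q)))
            for alpha, lo, hi in family]
```

Plotting α against the left and right end points traces one membership function. Applying the function to the family obtained from the UMFs and to the one obtained from the LMFs gives the upper and lower membership functions of the output at that fractile.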
In this approach, 5000 Monte Carlo simulations are used. We initially tested 3000, 4000, 5000 and up to 10,000 simulations. Since the results are very similar from 5000 simulations onwards, 5000 Monte Carlo simulations are used for the rest of the study.

Hypothetical case study
To demonstrate the proposed hybrid approach, a hypothetical case study for non-cancer risk assessment is carried out here. The discharge of produced water into the sea releases many organic and inorganic pollutants that are harmful to aquatic organisms; in this case study we consider only the heavy metal arsenic (As) because of its toxicity and high concentration in produced water. Human beings may therefore be affected by ingesting contaminated aquatic organisms. An evaluation is necessary to determine the possible impact such substances may have on human health and ecology. For this purpose, risk assessment is performed to quantify the potential detriment to humans and to evaluate the effectiveness of proposed remediation measures.
The general form of a comprehensive food chain risk assessment model, as provided by the U.S. EPA [3], is as follows:
$CDI = \dfrac{C_f \times FIR \times FR \times EF \times ED \times CF}{BW \times AT}$
where CDI = chronic daily intake (mg/kg-day), FIR = fish ingestion rate (g/day), FR = fraction of fish from contaminated source, EF = exposure frequency (day/year), ED = exposure duration (years), CF = conversion factor ($=10^{-9}$), BW = body weight (kg), AT = averaging time (days) and $C_f$ = chemical concentration in fish tissue (mg/kg). The chemical concentration in fish tissue ($C_f$) can be computed as
$C_f = PEC \times BCF$
where PEC = predicted environmental concentration (mg/l) and BCF is the chemical bioaccumulation factor in fish (l/kg). The non-cancer risk model for fish ingestion is expressed as
$Risk = \dfrac{CDI}{RfD}$
where RfD is the reference dose. In this study, the predicted environmental concentration (PEC) and the chemical bioaccumulation factor (BCF) are represented by fuzzy numbers, the fish ingestion rate (FIR) is represented by a normal probability distribution, and the other parameters are taken to be constant. Values of the parameters for the calculation of non-cancer risk are given in Table 1.
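For a single crisp set of inputs, the three relations above reduce to a few lines of arithmetic. The Python sketch below shows that deterministic core only; the function names are hypothetical and no values from Table 1 are assumed.

```python
def fish_concentration(pec, bcf):
    """C_f = PEC * BCF."""
    return pec * bcf


def chronic_daily_intake(c_f, fir, fr, ef, ed, cf, bw, at):
    """CDI = (C_f * FIR * FR * EF * ED * CF) / (BW * AT)."""
    return (c_f * fir * fr * ef * ed * cf) / (bw * at)


def non_cancer_risk(cdi, rfd):
    """Risk = CDI / RfD."""
    return cdi / rfd
```

In the hybrid setting, `pec` and `bcf` would be replaced by α-cut end points and `fir` by Monte Carlo samples, with this arithmetic applied to each combination.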
The result of the non-cancer human health risk assessment, obtained using the proposed hybrid approach, is depicted in Fig. 1. Similarly, the p-box of the CDFs obtained for the LMF is depicted in Fig. 3; its range is [2.24787e-07, 6.11537e-07], its mean is [2.66e-07, 5.03e-07] and its variance is [0, 1.996526479e-14].
However, since an envelope is the boundary of all the CDFs, the resultant envelope/p-box is the p-box obtained for the UMFs.
On the other hand, from the CDFs in Fig. 1, the risk at different fractiles [1,4] can be calculated. It is obtained in the form of a completely generalized interval-valued fuzzy number whose membership functions each have height 0.8, because any arithmetic operation between generalized fuzzy numbers and normal fuzzy numbers produces a generalized fuzzy number.
The resulting non-cancer risk value at the 85th fractile is depicted graphically in Fig. 5.

Additional information
Health risk assessment of exposure to a substance or activity subject to uncertainty has to rely on a robust methodology in order to support optimal decisions. The decision maker in health risk assessment is responsible for estimating the severity and likelihood of harm to human health from exposure to a substance that can cause harm under plausible circumstances. Mathematical models are often used in health risk assessment and are associated with a varying degree of uncertainty, both in the choice of model and in its parameters. These models are functions of many variables that are subject to uncertainty due to a lack of measurement points, over-calibration, inaccurate expert judgment and subjective interpretation of the available data or information [5].
Generally, uncertainty is broadly categorized into aleatory and epistemic uncertainty. Aleatory uncertainty (or simply variability) arises from inherent variability, natural stochasticity, environmental or structural variation across space or through time, manufacturing or genetic heterogeneity among components or individuals, and a variety of other sources of randomness, and can be handled by traditional probability theory. On the other hand, epistemic uncertainty (or simply uncertainty) arises from incompleteness of knowledge about the world. Sources of epistemic uncertainty include measurement uncertainty, small sample size, detection limits and data censoring, ignorance about the details of the physical mechanisms and processes involved, and other imperfections in scientific understanding.
In health risk assessment, variability and uncertainty co-exist. Thus, there is a need to develop special techniques that can handle hybrid uncertainties (i.e. fuzzy and random) when carrying out risk assessment. Various researchers have worked on the joint propagation of variability and uncertainty within the same risk computation: Flage et al. [6] discussed probabilistic and possibilistic treatment of epistemic uncertainties; Dutta and Ali [7] studied fuzzy focal elements in the Dempster-Shafer theory of evidence, with a case study in risk analysis; Ali et al. [8] discussed modeling uncertainty in risk assessment using the Double Monte Carlo method; Dutta and Ali [9] proposed a hybrid method to deal with aleatory and epistemic uncertainty in risk assessment; Pedroni et al. [10,11] studied the propagation of aleatory and epistemic uncertainties; Arunraj et al. [12] proposed an integrated approach with fuzzy set theory and Monte Carlo simulation for uncertainty modeling in risk assessment; Pastoor et al. [13] studied a roadmap for human health risk assessment in the 21st century; Farako et al. [24,25] studied risk assessment for Salmonella in tree nuts, in low-water activity foods and in low-moisture foods; Zwietering [14] studied uncertainty modeling for risk assessment and risk management for safe foods; Rębiasz et al. [15] studied the joint treatment of imprecision and randomness in the appraisal of the effectiveness and risk of investment projects; Alyami et al. [16] studied advanced uncertainty modeling for container port risk analysis; a further study addressed uncertainty handling in safety instrumented systems according to IEC, with a new proposal based on coupling Monte Carlo analysis and fuzzy sets; Abdo and Flaus [17] proposed a new approach with randomness and fuzzy theory for uncertainty quantification in dynamic system risk assessment; and Zhang et al. [18] discussed risk assessment of shallow groundwater contamination under irrigation and fertilization conditions. However, in all these efforts epistemic uncertainty is represented by Type-I fuzzy sets. In some situations it is not possible for a membership function of the type $\mu : X \to [0, 1]$ to assign precisely one point from [0,1], and it is more realistic to assign an interval value. According to Gehrke et al. [19], many people believe that assigning an exact number to an expert's opinion is too restrictive and that assigning an interval value is more realistic. In such situations the interval-valued fuzzy set (IVFS) comes into the picture. The IVFS was developed in the 1970s. In May 1975, Sambuc [20] presented in his doctoral thesis the concept of the IVFS under the name Φ-fuzzy set. Since the development of IVFSs, different researchers have studied them and applied them in different areas. An IVFS is a set in which every element has a degree of membership in the form of an interval. In other words, an IVFS consists of two membership functions: an upper membership function (UMF) and a lower membership function (LMF). Dutta [21] presented a hybrid approach that combines probability distributions, type-I fuzzy sets (normal fuzzy numbers) and generalized fuzzy numbers. Dutta [22] also presented an approach that combines probability distributions, normal fuzzy numbers and generalized interval-valued fuzzy numbers, and carried out a hypothetical case study using that approach. Dutta [23,5] also presented approaches to deal with hybrid situations. However, none of these approaches is appropriate when the model parameters are represented by probability distributions, generalized fuzzy numbers and normal interval-valued fuzzy numbers (IVFNs). This motivates us to devise a new technique to deal with such situations.
In this regard, this paper presents an approach that combines probability distributions, generalized fuzzy numbers and normal interval-valued fuzzy numbers (IVFNs) within the same framework, and a case study in non-cancer risk assessment is carried out in this setting.
In this paper we have proposed a method to deal with situations in which some possibilistic distributions are represented by normal interval-valued fuzzy numbers together with generalized fuzzy numbers. To demonstrate the proposed approach, a hypothetical case study for non-cancer risk assessment is presented. With our approach the risk is obtained in the form of CDFs, from which it is evaluated in two forms: the first is a p-box, and the second is a set of membership functions of the risk generated at different fractiles. The membership functions of risk at different fractiles are completely generalized interval-valued fuzzy numbers, since at least one parameter is represented by a generalized fuzzy number. The upper and lower membership functions of the completely generalized interval-valued fuzzy number are trapezoidal-type generalized fuzzy numbers, because any arithmetic operation on generalized fuzzy numbers (or on a generalized fuzzy number and a normal fuzzy number) produces a trapezoidal-type generalized fuzzy number.
The main disadvantage of the proposed approach is that it is not appropriate for models in which any parameter is represented by a generalized IVFN.