Statistical Inference for the Kumaraswamy Distribution under Imprecise Information

Traditional statistical approaches to estimating the parameters of the Kumaraswamy distribution have dealt with precise information. However, in real-world situations, some information about an underlying experimental process might be imprecise and might be represented in the form of fuzzy information. In this paper, we consider the problem of estimating the parameters of a univariate two-parameter Kumaraswamy distribution when the available observations are described by means of fuzzy information. We derive the maximum likelihood estimates of the parameters by using the Newton-Raphson method as well as the EM algorithm. The estimation procedures are discussed in detail and compared via Markov Chain Monte Carlo simulations in terms of their average biases and mean squared errors.


Introduction
Kumaraswamy (1980) introduced a two-parameter absolutely continuous distribution which compares extremely favorably, in terms of simplicity, with the beta distribution. The Kumaraswamy distribution on the interval (0, 1) has probability density function (pdf) and cumulative distribution function (cdf), with two shape parameters a > 0 and b > 0, given by

f(x) = a b x^(a−1) (1 − x^a)^(b−1) I(0 < x < 1) and F(x) = 1 − (1 − x^a)^b.    (1)

If a random variable X has the pdf given in (1), we write X ∼ K(a, b). The density in (1) has properties similar to those of the beta distribution but offers some advantages in terms of tractability. Depending on the values of the parameters (as for the beta distribution), the Kumaraswamy pdf is unimodal (a > 1 and b > 1), uniantimodal (a < 1 and b < 1), increasing (a > 1 and b ≤ 1), decreasing (a ≤ 1 and b > 1), or constant (a = b = 1). For a detailed survey of properties of the Kumaraswamy distribution, the reader is referred to Jones (2009). The distribution is closely related to the beta and generalized beta (first kind) distributions:
• If X ∼ Beta(1, b), then X ∼ K(1, b).
• If X ∼ Beta(a, 1), then X ∼ K(a, 1).
• If X ∼ K(a, b), then X ∼ GB1(a, 1, 1, b), where GB1 stands for the generalized beta distribution of the first kind.
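As a quick illustration (not part of the original derivation), the pdf and cdf in (1) can be sketched in Python; since the cdf inverts in closed form, F⁻¹(u) = (1 − (1 − u)^(1/b))^(1/a), sampling via the probability integral transform is straightforward:

```python
import random

def kuma_pdf(x, a, b):
    """Density f(x; a, b) = a b x^(a-1) (1 - x^a)^(b-1) on (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1)

def kuma_cdf(x, a, b):
    """Distribution function F(x) = 1 - (1 - x^a)^b."""
    return 1.0 - (1.0 - x ** a) ** b

def kuma_quantile(u, a, b):
    """Closed-form inverse cdf, used for sampling."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def kuma_sample(n, a, b, rng=random):
    """Draw n variates from K(a, b) by the probability integral transform."""
    return [kuma_quantile(rng.random(), a, b) for _ in range(n)]
```

The closed-form quantile function is one of the tractability advantages over the beta distribution mentioned above.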
Over the last few years, there has been great interest in studying the Kumaraswamy distribution and in mixing it with other well-known probability models to achieve greater flexibility in modeling several types of real data exhibiting various patterns. Ghosh (2015) studied a Kumaraswamy mixture with the Pareto (type IV) model, useful for income modeling. In another article, Ghosh (2014) derived and discussed a further Kumaraswamy generalization obtained by mixing with a half-Cauchy distribution. Regarding discrete mixtures, Ramos et al. (2015) developed and studied a new family, namely the Kumaraswamy-G Poisson family of distributions, and discussed the associated inference for the model parameters. Ghosh and Nadarajah (2016) studied in detail Bayesian inference for the Kumaraswamy distribution based on censored samples. All the above references indicate that the Kumaraswamy distribution has great applicability when modeling observed phenomena in which the values of the variable of interest are bounded between 0 and 1.
However, the majority of the inferential work for the Kumaraswamy distribution has been conducted under the assumption that complete data are available. In contrast, not much work has been done on inferential strategies for the Kumaraswamy distribution under missing (and/or imprecise) information. This is a major motivation for this article.
It has been observed that in numerous real-life situations we encounter data which are not only random in nature but ambiguous as well. Randomness involves only uncertainty in the outcomes of an experiment, while ambiguity involves uncertainty in the meaning of the data. For example, consider a case study on an electric bulb manufacturing process that focuses on the lifetime of a bulb. An electric bulb may work perfectly over a certain period, may then not work efficiently for some time, and finally becomes totally exhausted after a certain time point. Therefore, the lifetime of each electric bulb may be reported by means of ambiguous statements such as "approximately lower than 95 hours", "approximately 25 to 40 hours", "approximately 74 to 98 but near to 110 hours", "approximately higher than 125 hours", and so on. In such a scenario, randomness occurs because the electric bulbs are selected at random, while vagueness (or ambiguity) is due to the limited ability of the observer to describe the lifetimes of those randomly selected bulbs using numbers. To deal with both types of uncertainty, randomness and vagueness, it is important to incorporate fuzzy concepts into the statistical toolbox.
In recent years, numerous papers on generalizations of classical statistical methods to the analysis of fuzzy data have appeared in the literature; see, for example, Wu (2004). The main objective of this paper is to obtain suitable inferential procedures for the Kumaraswamy distribution when the available observations are reported by means of fuzzy information. We first describe the construction of fuzzy data from imprecise (equivalently, vague) observations, and then discuss the computation of the maximum likelihood estimates of the parameters a and b. Based on fuzzy data, there is no closed form for the MLEs; therefore, we employ the EM algorithm to determine them. We also construct approximate confidence intervals for the unknown parameters by using the asymptotic distribution of the MLEs. Additionally, we consider Bayesian inference for the parameters of the Kumaraswamy distribution. Since the Bayes estimates cannot be obtained in explicit form, we provide an approximation, namely Tierney and Kadane's approximation, as well as a Markov Chain Monte Carlo (MCMC) technique to compute these estimates and construct the highest posterior density (HPD) credible intervals of the parameters a and b.
The rest of this paper is organized as follows. In Section 2, we obtain the maximum likelihood estimates of the parameters a and b, and also construct approximate confidence intervals by using the asymptotic normality of the MLEs. The Bayesian analysis is provided in Section 3. A Monte Carlo simulation study is presented in Section 4, providing a comparison of all estimation procedures developed in this paper. Some concluding remarks are presented in Section 5.
In the following, we first fix the notation and recall some basic definitions of fuzzy set theory which will be used frequently in this paper. Consider an experiment characterized by a probability space X = (Ω, F, Pθ), where (Ω, F) is a Borel measurable space and Pθ belongs to a specified family of probability measures (Pθ, θ ∈ Θ) on (Ω, F). Assume that the observer cannot distinguish or transmit with exactness the outcome of the experiment, and that the available observation may instead be described in terms of fuzzy information, which is defined as follows. For details on this topic, see Tanaka et al. (1979).
• Definition 1: A fuzzy event x̃ on X, characterized by a Borel measurable membership function μx̃(x) from X to [0, 1], where μx̃(x) represents the "grade of membership" of x in x̃, is called fuzzy information associated with the experiment X. The set consisting of all observable events from the experiment X determines a fuzzy information system associated with it, which is defined as follows.
• Definition 2: A fuzzy information system (henceforth, in short f.i.s.) X̃ associated with the experiment X is a fuzzy partition with fuzzy events on X, that is, a finite set of fuzzy events x̃1, …, x̃K on X satisfying the orthogonality condition Σk μx̃k(x) = 1 for all x ∈ X.

According to Zadeh (1968), given the experiment X = (Ω, F, Pθ), θ ∈ Θ, and a f.i.s. X̃ associated with it, each probability measure Pθ on (Ω, F) induces a probability measure on X̃ defined as follows:

• Definition 3: The probability distribution on X̃ induced by Pθ is the mapping P from X̃ to [0, 1] such that

P(x̃) = ∫ μx̃(x) dPθ(x) for x̃ ∈ X̃.

In particular, the conditional density of a continuous random variable U with p.d.f. g(u) given the fuzzy event Ã can be defined as

g(u | Ã) = μÃ(u) g(u) / ∫ μÃ(v) g(v) dv.

For more details about the membership functions and probability measures of fuzzy sets, one can refer to Pak et al. (2013) and the references therein. In this context, we consider another definition due to Shafiq and Viertl (2014):

• Definition 4: A fuzzy number is a subset, denoted by x̃, of the set of real numbers R, characterized by a so-called membership function μx̃ which attains the value 1 for at least one point and whose δ-cuts {x : μx̃(x) ≥ δ}, δ ∈ (0, 1], are nonempty closed bounded intervals.

Widely used examples of membership functions characterizing fuzzy numbers are the triangular and trapezoidal fuzzy numbers. A triangular fuzzy number x̃ = (ξ, ω, δ) has membership function

μx̃(x) = (x − ξ)/(ω − ξ) for ξ ≤ x ≤ ω, (δ − x)/(δ − ω) for ω < x ≤ δ, and 0 otherwise.

Similarly, a trapezoidal fuzzy number x̃ = (ξ, ω, δ, θ) has membership function

μx̃(x) = (x − ξ)/(ω − ξ) for ξ ≤ x < ω, 1 for ω ≤ x ≤ δ, (θ − x)/(θ − δ) for δ < x ≤ θ, and 0 otherwise.

Let us revisit the example mentioned earlier in the context of the lifetime of an electric bulb. Example 1: Consider a life-testing experiment in which n identical electric bulbs (made by the same company) are placed on test. A tested electric bulb may be considered as failed, or more precisely nonconforming, when at least one of its parameters (constituent parts) falls beyond the specification limits.
In reality, however, the observer does not have the possibility to measure all parameters and may not be able to define precisely the moment of failure. He/she therefore provides an interval [ω i , δ i ] containing the highly plausible values for the lifetime of the bulb marked i, and a wider interval [ξ i , θ i ] which certainly contains that lifetime. This information may be encoded as a trapezoidal fuzzy number x̃ i = (ξ i , ω i , δ i , θ i ) with the corresponding trapezoidal membership function given above. In this case, randomness arises from the selection of the electric bulbs as well as from other observable factors which influence the perception of the observer. In contrast, fuzziness arises from the meaning of the reported failure times.
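For concreteness, the trapezoidal membership function described above can be written in a few lines of Python; the parameter ordering ξ ≤ ω ≤ δ ≤ θ is the convention assumed here:

```python
def trapezoidal_membership(x, xi, omega, delta, theta):
    """Trapezoidal fuzzy number (xi, omega, delta, theta):
    rises linearly on [xi, omega], equals 1 on [omega, delta],
    falls linearly on [delta, theta], and is 0 outside [xi, theta]."""
    if x < xi or x > theta:
        return 0.0
    if x < omega:
        return (x - xi) / (omega - xi)
    if x <= delta:
        return 1.0
    return (theta - x) / (theta - delta)
```

For instance, the vague report "approximately 74 to 98 but near to 110 hours" might be encoded as the trapezoidal number (74, 80, 98, 110), with full membership between 80 and 98 hours.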

Maximum likelihood estimation
Suppose that X1, …, Xn is a random sample of size n from the Kumaraswamy distribution with the density function given by (1), and let X = (X1, …, Xn) denote the corresponding random vector. If a realization x of X were known exactly, the complete-data log-likelihood function would be

log L(x; a, b) = n (log a + log b) + (a − 1) Σ_{i=1}^n log x_i + (b − 1) Σ_{i=1}^n log(1 − x_i^a).    (4)

Next, consider the situation where x is not observed precisely, and only partial information about x is available in the form of fuzzy observations x̃ = (x̃1, x̃2, …, x̃n) with Borel measurable membership functions μx̃i. In practice, the grade of membership μx̃(x) is often regarded as the "probability with which the observer reports the information x̃ when he/she has really obtained the exact outcome x". Once x̃ is given, we can obtain the observed-data log-likelihood function as

ℓO(a, b) = Σ_{i=1}^n log ∫ a b x^(a−1) (1 − x^a)^(b−1) μx̃i(x) dx.    (5)

The maximum likelihood estimates of the parameters a and b can be obtained by maximizing the log-likelihood ℓO. Equating the partial derivatives of the log-likelihood (5) with respect to a and b to zero, the resulting maximum likelihood equations are

∂ℓO/∂a = Σ_{i=1}^n [ ∫ (∂f(x; a, b)/∂a) μx̃i(x) dx / ∫ f(x; a, b) μx̃i(x) dx ] = 0,    (6)
∂ℓO/∂b = Σ_{i=1}^n [ ∫ (∂f(x; a, b)/∂b) μx̃i(x) dx / ∫ f(x; a, b) μx̃i(x) dx ] = 0.    (7)

Since the solutions to the likelihood equations (6) and (7) have no closed form, an iterative numerical search procedure is needed to obtain the MLEs. Below we describe two widely practiced search procedures, namely the Newton-Raphson method and the EM algorithm, for determining the MLEs of the parameters a and b.
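As a sketch of how the observed-data log-likelihood (5) can be evaluated numerically, the integral for each fuzzy observation may be approximated by a quadrature rule. In this illustration (an assumption of this sketch, not the paper's implementation), each observation is represented by a membership function `mu` on (0, 1) and a simple midpoint rule is used:

```python
import math

def kuma_pdf(x, a, b):
    """Kumaraswamy density f(x; a, b) on (0, 1)."""
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1) if 0.0 < x < 1.0 else 0.0

def fuzzy_loglik(memberships, a, b, grid=400):
    """Observed-data log-likelihood: sum_i log of integral f(x; a, b) mu_i(x) dx,
    each integral approximated by the midpoint rule on (0, 1)."""
    h = 1.0 / grid
    total = 0.0
    for mu in memberships:
        integral = sum(kuma_pdf((k + 0.5) * h, a, b) * mu((k + 0.5) * h)
                       for k in range(grid)) * h
        total += math.log(integral)
    return total
```

A quick sanity check: with a vacuous membership μ ≡ 1 the integral equals 1 and the contribution to the log-likelihood is 0.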

Newton-Raphson procedure
The Newton-Raphson algorithm is a direct approach to maximizing the likelihood function. In this procedure, the solution of the likelihood equations is obtained through the following iterative scheme. Let δ = (a, b)^T be the parameter vector, where T stands for transpose. At the (h + 1)th step of the iteration, the parameter is updated as

δ^(h+1) = δ^(h) − [H(δ^(h))]^(−1) S(δ^(h)),

where S(δ) = (∂ℓO/∂a, ∂ℓO/∂b)^T is the score vector and H(δ) is the matrix of second-order derivatives of the observed-data log-likelihood,

H(δ) = [ ∂²ℓO/∂a²   ∂²ℓO/∂a∂b ; ∂²ℓO/∂a∂b   ∂²ℓO/∂b² ].

The iteration process continues until convergence, i.e., until ||δ^(h+1) − δ^(h)|| ≤ ε for some predefined ε > 0. Note that the second-order derivatives of the log-likelihood are required at every iteration stage of the Newton-Raphson method. However, the computation of these derivatives from fuzzy data can be quite troublesome, which is a major drawback of the method.
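The update above can be sketched generically. The following Python snippet implements the Newton-Raphson step for any two-parameter log-likelihood, using central finite differences for the score and Hessian (a simplification of the analytical derivatives discussed above; as noted, Newton-Raphson is only guaranteed to converge near the maximum):

```python
def newton_raphson(loglik, delta0, eps=1e-6, h=1e-4, max_iter=100):
    """Newton-Raphson iteration delta_(h+1) = delta_h - H^{-1} S for a
    two-parameter log-likelihood, with finite-difference score S and Hessian H."""
    a, b = delta0
    f = loglik
    for _ in range(max_iter):
        # score vector via central differences
        s0 = (f(a + h, b) - f(a - h, b)) / (2 * h)
        s1 = (f(a, b + h) - f(a, b - h)) / (2 * h)
        # Hessian via second-order central differences
        h00 = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h ** 2
        h11 = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h ** 2
        h01 = (f(a + h, b + h) - f(a + h, b - h)
               - f(a - h, b + h) + f(a - h, b - h)) / (4 * h ** 2)
        # solve the 2x2 system H step = S and update delta <- delta - step
        det = h00 * h11 - h01 ** 2
        step_a = (h11 * s0 - h01 * s1) / det
        step_b = (h00 * s1 - h01 * s0) / det
        a, b = a - step_a, b - step_b
        if (step_a ** 2 + step_b ** 2) ** 0.5 <= eps:
            break
    return a, b
```

On a concave quadratic test surface the iteration reaches the maximizer in a single step, which is a convenient way to check the implementation.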
To remedy this malady, a viable alternative to the Newton-Raphson algorithm is the well-known EM algorithm. In the following, we discuss how it can be utilized to determine the MLEs in this case.

EM algorithm
The Expectation Maximization (EM) algorithm is a widely applicable approach to the iterative computation of maximum likelihood estimates and is useful in a variety of incomplete-data problems. For details, see Dempster et al. (1977). Since the observed fuzzy data x̃ can be seen as an incomplete specification of the complete data vector x, the EM algorithm is applicable for obtaining the maximum likelihood estimates of the unknown parameters. In the following, we use the EM algorithm to determine the MLEs of a and b.
Taking the partial derivatives of the complete-data log-likelihood function (4) with respect to a and b and equating them to zero, the following likelihood equations are obtained:

n/a + Σ_{i=1}^n log x_i − (b − 1) Σ_{i=1}^n [x_i^a log x_i / (1 − x_i^a)] = 0,
n/b + Σ_{i=1}^n log(1 − x_i^a) = 0.

Therefore the EM algorithm is given by the following iterative process: 1. Start with initial values of a and b, say a^(0) and b^(0), and set h = 0.

At the (h + 1)th stage of iteration:
• The E-step computes, for each i, the conditional expectations of the quantities appearing in the complete-data likelihood equations with respect to the conditional density f(x | x̃ i) ∝ f(x; a^(h), b^(h)) μx̃i(x), namely E(log X | x̃ i), E(log(1 − X^a) | x̃ i) and E(X^a log X / (1 − X^a) | x̃ i).
• The M-step then solves the complete-data likelihood equations with the sums replaced by these conditional expectations; in particular, b^(h+1) = −n / Σ_{i=1}^n E(log(1 − X^a) | x̃ i), while a^(h+1) is obtained from the first equation. The two steps are repeated until convergence.
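A minimal numerical sketch of this scheme follows. As an illustration only, the conditional densities of the E-step are discretized on a grid, and the M-step is replaced by a coarse local search that merely increases the expected complete-data log-likelihood (a generalized EM step), rather than solving the likelihood equations exactly:

```python
import math

def kuma_pdf(x, a, b):
    """Kumaraswamy density on (0, 1)."""
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1)

def em_fuzzy(memberships, a, b, n_iter=50, grid=200):
    """Generalized EM for fuzzy Kumaraswamy data (illustrative sketch).
    E-step: discretized conditional density f(x | x_i~) on a grid.
    M-step: coarse search that increases the expected complete-data log-likelihood."""
    xs = [(k + 0.5) / grid for k in range(grid)]
    for _ in range(n_iter):
        # E-step: posterior weights w_i(x) proportional to f(x; a, b) mu_i(x)
        weights = []
        for mu in memberships:
            w = [kuma_pdf(x, a, b) * mu(x) for x in xs]
            s = sum(w)
            weights.append([wi / s for wi in w])

        def q(aa, bb):
            # expected complete-data log-likelihood under the current weights
            return sum(wi * math.log(kuma_pdf(x, aa, bb))
                       for w in weights for wi, x in zip(w, xs))

        # M-step (generalized): keep the best point in a small neighbourhood
        best = (q(a, b), a, b)
        for fa in (0.9, 1.0, 1.1):
            for fb in (0.9, 1.0, 1.1):
                cand = (q(a * fa, b * fb), a * fa, b * fb)
                best = max(best, cand)
        a, b = best[1], best[2]
    return a, b
```

Because the current point is always among the candidates, each iteration cannot decrease the objective, which is the defining property of a (generalized) EM step.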

Bayesian estimation
In this section we describe the Bayes estimates of the unknown parameters as well as the corresponding highest posterior density credible intervals. In the Bayesian framework, the unknown parameters are treated as random variables with a distribution commonly known as the prior distribution. Here, we consider independent gamma priors for the parameters a and b, given as follows: • Prior for a : Π(a) ∼ Γ(2.1, 1.7).
Note: We do not claim that these choices of the hyperparameters are the optimal or uniformly best in all situations like this. However, in all the simulations/examples tried, we found this to be a reasonable one. Of course there might be others.
By combining (5) with the above set of independent priors, the joint density of the data and the parameters a and b becomes

Π(data, a, b) ∝ L_O(a, b) Π(a) Π(b).

Therefore, the marginal posterior density of a (and similarly of b) given the data can be obtained as

Π(a | data) = ∫_0^∞ Π(data, a, b) db / ∫_0^∞ ∫_0^∞ Π(data, a, b) da db.    (11)

Note that the Bayes estimate of any function of a, say h(a), under the squared error loss function is the posterior mean

E[h(a) | data] = ∫_0^∞ h(a) Π(a | data) da,    (12)

and similarly for the other parameter b. However, (11) and (12) are not available in analytically tractable closed form due to the complex form of the likelihood function. Therefore, we use Tierney and Kadane's approximation as well as an MCMC method for computing the Bayes estimates of a and b.

Tierney and Kadane's approximation
First, we rewrite the posterior expectation (12) as (for each of the parameters a and b)

E[h(a) | data] = ∫ e^{n F*(a)} da / ∫ e^{n F(a)} da,    (15)

where F(a) = (1/n) log Π(data, a) and F*(a) = F(a) + (1/n) log h(a).
Tierney and Kadane (1986) applied Laplace's method to approximate (15) as

E[h(a) | data] ≈ (θ*/θ)^{1/2} exp{ n [F*(ā*) − F(ā)] },

where ā* and ā maximize F*(a) and F(a), respectively, and θ* and θ are minus the inverses of the second derivatives of F*(a) and F(a) at ā* and ā, respectively. The same construction applies to the other parameter b. Next, we apply this approximation to obtain the Bayes estimate of the parameter a. Setting h(a) = a, we have

F*(a) = F(a) + (1/n) log a,    (16)

with corresponding maximizer ā* and curvature term θ*.    (17)

On substituting (16) and (17) into (15), one obtains the Bayes estimate of a under squared error loss. A similar approach yields the Bayes estimate of b.
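The approximation above can be sketched for a generic scalar parameter. In this illustrative Python snippet (not the paper's implementation), the maximizers are located by a plain grid search and the curvatures by finite differences; as a check, it is applied to a case with a known posterior, a Gamma(12, 3) posterior with mean 4:

```python
import math

def tierney_kadane(log_joint, h, n, lo=1e-6, hi=20.0, step=1e-3):
    """Tierney-Kadane approximation to E[h(theta) | data] for a scalar parameter.
    log_joint(theta) = log(prior x likelihood); F = log_joint / n,
    F* = F + log(h) / n.  Returns sqrt(theta*/theta) exp{n [F*(t*) - F(t^)]}."""
    F = lambda t: log_joint(t) / n
    Fstar = lambda t: F(t) + math.log(h(t)) / n

    def argmax(g):
        # crude grid search for the maximizer (illustration only)
        best_t, best_v, t = lo, g(lo), lo
        while t < hi:
            v = g(t)
            if v > best_v:
                best_t, best_v = t, v
            t += step
        return best_t

    def curvature(g, t, eps=1e-2):
        # second derivative by central finite difference
        return (g(t + eps) - 2.0 * g(t) + g(t - eps)) / eps ** 2

    t_hat, t_star = argmax(F), argmax(Fstar)
    theta = -1.0 / curvature(F, t_hat)
    theta_star = -1.0 / curvature(Fstar, t_star)
    return math.sqrt(theta_star / theta) * math.exp(n * (Fstar(t_star) - F(t_hat)))
```

With log Π(data, θ) = 11 log θ − 3θ (a Gamma(12, 3) posterior once h(θ) = θ is absorbed), the approximation reproduces the exact posterior mean 4 to within a fraction of a percent, which illustrates the well-known accuracy of the method.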

MCMC and HPD credible interval
Here, we first draw random samples from the joint posterior density in (11), and then use them to compute the Bayes estimates of a and b and to construct the corresponding Highest Posterior Density (in short, HPD) credible intervals. Since the joint posterior density function in (11) cannot be computed explicitly, we use a Metropolis-Hastings algorithm to generate samples from the posterior density of τ = (a, b). The algorithm is carried out by the following steps:
1. We consider a starting (initial) value τ^(0) = (a^(0), b^(0)).
2. At step j, we draw a candidate τ* from a proposal density q(τ* | τ^(j−1)).
3. We compute the acceptance probability ρ = min{1, [f(τ* | data) q(τ^(j−1) | τ*)] / [f(τ^(j−1) | data) q(τ* | τ^(j−1))]}.
4. We draw u ∼ Uniform(0, 1); if u ≤ ρ we set τ^(j) = τ*, otherwise τ^(j) = τ^(j−1).
5. We repeat steps 2-4, M times, to get M draws from f(τ | data). We consider 20 parallel chains, each of length M. Thus, we have a_j and b_j for j = 1, 2, …, M.
The retained sample values, say τ_1, …, τ_M, form a sample from the joint posterior density Π(a, b | data). Next, by the Monte Carlo integration technique (Rubinstein and Kroese, 2006), the Bayes estimates of a and b under the squared error loss function can be obtained as

â = (1/M) Σ_{j=1}^M a_j and b̂ = (1/M) Σ_{j=1}^M b_j.

For constructing the HPD credible interval of a (say), one may use the method proposed by Chen and Shao (1999) as follows. Let a_(1) < ⋯ < a_(M) be the ordered values of a_j for j = 1, 2, …, M. Then consider the following 100(1 − α)% credible intervals of a:

( a_(j), a_(j + [(1−α)M]) ),  j = 1, 2, …, M − [(1 − α)M],

where [y] denotes the integer part of y. The HPD credible interval of a can be derived by choosing the interval which has the shortest length. Similarly, one may obtain the HPD interval for the parameter b.
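A compact sketch of the two ingredients above, with a random-walk Metropolis sampler for a scalar parameter and the Chen and Shao (1999) shortest-interval construction (the Gaussian proposal and its scale are illustrative choices, not the paper's tuning):

```python
import math
import random

def metropolis(log_post, init, n_draws=5000, scale=1.0, rng=random):
    """Random-walk Metropolis: propose tau* = tau + N(0, scale^2) and accept
    with probability min(1, post(tau*)/post(tau)) (symmetric proposal)."""
    cur, cur_lp = init, log_post(init)
    draws = []
    for _ in range(n_draws):
        prop = cur + rng.gauss(0.0, scale)
        lp = log_post(prop)
        if math.log(rng.random()) < lp - cur_lp:
            cur, cur_lp = prop, lp
        draws.append(cur)
    return draws

def hpd_interval(draws, alpha=0.05):
    """Chen-Shao HPD: the shortest among the 100(1-alpha)% intervals
    (a_(j), a_(j + [(1-alpha)M])) based on the ordered draws."""
    s = sorted(draws)
    m = len(s)
    k = int((1.0 - alpha) * m)
    j = min(range(m - k), key=lambda i: s[i + k] - s[i])
    return s[j], s[j + k]
```

Run on a standard normal log-posterior, the 95% HPD interval recovered this way is close to the exact (−1.96, 1.96).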

Simulation study
In this section, simulation studies are conducted to compare the performances of the different estimators and of the different confidence/credible intervals. Our main objective is to compare the performances of the MLEs and the Bayes estimates of the unknown parameters in terms of their average values and mean squared errors. We also compare the average lengths of the asymptotic confidence intervals and the HPD credible intervals, together with their coverage percentages. All the computations are performed in the R programming environment. For simulation purposes, we have taken a = 2 and b = 3 and different choices of sample size, namely n = 50, 75, 100, 150, 200. For each n, we generated random samples from the Kumaraswamy distribution with a = 2 and b = 3. Then, using the method proposed by Pak et al. (2014), each realization of the generated samples was fuzzified by employing a fuzzy information system. The estimates of the parameters a and b for the fuzzy samples were computed using the maximum likelihood method and under the Bayesian paradigm (with the independent priors set-up). As initial choices of the parameters (a, b) required for the MLE methods, we took values wide apart from the actual values of the parameters. For computing the Bayes estimates, we assumed that both a and b have independent gamma priors with the specific choices of hyperparameters described earlier. We replicated the process 20000 times with a burn-in of 1000 samples and report the average values (AV) and mean squared errors (MSE) of the estimates in Tables 1-2. Furthermore, we provide approximate 95% confidence intervals as well as the HPD credible intervals of the unknown parameters. Criteria appropriate to the evaluation of the two interval methods include closeness of the coverage probability to its nominal value and the expected interval width.
For each simulated sample, we computed the confidence/credible intervals, checked whether the true value of the parameter lay within the intervals, and recorded the lengths of the intervals. The estimated coverage probability was computed as the number of intervals that covered the true value divided by 20000, while the estimated expected width of the intervals was computed as the sum of the lengths of all intervals divided by 20000. The coverage probabilities and the expected widths for different sample sizes are presented in Tables 1-2. From Tables 1-2, one may observe the following: • The MSEs of the estimators decrease significantly as the sample size n increases, as one would expect.
• The performances of the Bayes estimates with informative priors are uniformly better. It is also seen that the Bayes estimates obtained by Tierney and Kadane's approximation and by the MCMC method behave in a similar manner, so we cannot say that one procedure is uniformly better than the other. It should be noted that although the MCMC techniques are computationally expensive, they can in turn be used to construct HPD credible intervals.
Next, considering the confidence and credible intervals, it is observed that the asymptotic results for the MLE work quite satisfactorily: they maintain the coverage percentages in most of the cases, even when the sample size is relatively small. The widths of the confidence/credible intervals decrease with an increase in the sample size n, as expected. The performances of the credible intervals are quite good, with coverage percentages close to the corresponding nominal level. Moreover, in most of the cases, the average lengths of the credible intervals are slightly shorter than those of the confidence intervals. One may consider a dependent prior set-up for the Bayesian inference and perform a similar study.
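The coverage and width computations described in this section amount to the following small helper (a generic illustration, not the paper's simulation code):

```python
def coverage_and_width(intervals, true_value):
    """Estimated coverage probability (fraction of intervals containing the
    true parameter value) and estimated expected width (average interval
    length) over a set of replicated (lower, upper) intervals."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    total_width = sum(hi - lo for lo, hi in intervals)
    return hits / len(intervals), total_width / len(intervals)
```

In the study above, `intervals` would hold the 20000 replicated confidence (or HPD) intervals for a given parameter and sample size.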

Conclusions
A considerable amount of work has been done on the estimation of the parameters of the Kumaraswamy distribution based on complete and censored samples (see, for example, Ghosh and Nadarajah (2016)). However, in almost all of the references cited in this article, it was assumed that the available data are (or can be) obtained as exact numbers. In contrast, in many real-world phenomena, the data obtained as the outcome of an experiment may not always be recorded, evaluated or measured precisely. As a consequence, there is a clear need for appropriate statistical methodology to handle such data and to conduct a proper statistical analysis.
In this paper, we have discussed several estimation procedures for the Kumaraswamy distribution when the reported data are available in the form of fuzzy information. In particular, we have discussed the traditional maximum likelihood method and the Bayesian procedure (under the independent gamma prior set-up). From the simulation study, it appears that the MLE based on the Newton-Raphson method is less efficient than that based on the EM algorithm. Although one cannot say in an absolute sense that one method is always superior to the other, the EM algorithm is preferred for its computational simplicity. In terms of overall comparison (with respect to minimum average bias and MSE), the performance of the Bayes estimates is generally the best.