Analysis of Count Data by Transmuted Geometric Distribution

The transmuted geometric distribution (TGD) was recently introduced and investigated by Chakraborty and Bhati (2016). It is a flexible extension of the geometric distribution with an additional parameter that governs its zero inflation as well as its tail length. In the present article we further study this distribution with respect to its reliability, stochastic ordering and parameter estimation properties. Among the estimation methods we discuss an EM algorithm, and the performance of the estimators is evaluated through extensive simulation. For assessing the statistical significance of the additional parameter, the likelihood ratio test, Rao's score test and Wald's test are developed, and their empirical powers are compared via simulation. We demonstrate two applications of the TGD in modeling real-life count data.


Introduction
Chakraborty and Bhati (2016) recently introduced the transmuted geometric distribution TGD(q, α) using the quadratic rank transmutation technique of Shaw and Buckley (2007). It may be noted that, although a large number of new continuous distributions in the statistical literature have been derived using the rank transmutation technique, TGD(q, α) is the first discrete distribution derived with it. Chakraborty and Bhati (2016) investigated various distributional properties, showed the applicability of TGD(q, α) in modeling aggregate loss and claim frequency data from automobile insurance, and demonstrated its feasibility as a count regression model by considering data from the health sector. TGD(q, α) is thus a simple yet elegant extension of the celebrated geometric distribution with potential applications in various contexts of discrete data analysis. In the current article we discuss some additional theoretical and applied aspects of TGD(q, α), structured as follows. In Section 2 we present various reliability properties and the stochastic ordering of TGD(q, α). In Section 3 a comparative simulation study of the maximum likelihood (ML) estimators obtained numerically and through the EM algorithm is presented, whereas in Section 4 hypothesis testing is discussed in detail, considering the Wald, Rao's score and likelihood ratio tests for testing α = 0. To illustrate the applicability of TGD models in disciplines other than those discussed in Chakraborty and Bhati (2016), we consider two real data sets and compare the fit with different families of distributions in Section 5. Finally, some conclusions and comments are presented in Section 6.
Remark: For α = −1, (1) reduces to a special case of the exponentiated geometric distribution of Chakraborty and Gupta (2015) with power parameter equal to 2; this is the distribution of the maximum of two iid GD(q) rvs.

Reliability properties and Stochastic Ordering
There are several situations in reliability where continuous time is not a good scale on which to measure lifetime: in production we may be interested in how many units a machine produces before failure, or health insurance companies may be interested in how long a patient stays in hospital before discharge or death. In such situations discrete hazard rate functions can be used to model the ageing properties of discrete random lifetimes. We consider different hazard rate functions of the TGD model and the associated results as follows.

Hazard rate function and its classification
The hazard rate function r_X(x) for X ∼ TGD(q, α) follows directly from the pmf and the survival function. It is plotted in Figure 1 for various values of the parameters to investigate its monotonicity, and it is clear that the hazard rate of TGD(q, α) is increasing for −1 < α < 0, decreasing for 0 < α < 1, and constant if α = 0 or 1. It can also be seen that for other values of α the hazard rate approaches a constant as x increases; the smaller the value of q, the faster the hazard rate stabilizes.
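For reference, the following is a reconstruction (not quoted from the paper) obtained from the quadratic rank transmutation map, F = (1 + α)G − αG², with G(x) = 1 − q^{x+1} the GD(q) cdf; the pmf, survival function and hazard rate then take the forms:

```latex
\begin{align*}
F_X(x) &= (1+\alpha)\bigl(1-q^{x+1}\bigr) - \alpha\bigl(1-q^{x+1}\bigr)^2
        = \bigl(1-q^{x+1}\bigr)\bigl(1+\alpha q^{x+1}\bigr),\\
P(X=x) &= (1-q)\,q^{x}\bigl[\,1-\alpha+\alpha(1+q)q^{x}\,\bigr],
        \qquad x = 0, 1, 2, \ldots\\
P(X\ge x) &= q^{x}\bigl(1-\alpha+\alpha q^{x}\bigr),\\
r_X(x) &= \frac{P(X=x)}{P(X\ge x)}
        = \frac{(1-q)\bigl[\,1-\alpha+\alpha(1+q)q^{x}\,\bigr]}{1-\alpha+\alpha q^{x}}.
\end{align*}
```

Setting α = 0 gives the constant geometric hazard 1 − q, and α = 1 gives the constant 1 − q²; for intermediate α the ratio tends to 1 − q as x grows, consistent with the behaviour seen in Figure 1.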

Second hazard rate
The second rate of failure (Xie et al. (2002)) is considered next.

Theorem 2: The mean residual life function given in (3) is a monotone decreasing (increasing) function of y according as −1 < α < 0 (0 < α < 1).

Proof: It can easily be seen.

Stochastic Ordering
Many times there is a need to compare the behaviour of one random variable with another. Shaked and Shanthikumar (1994) have given many such comparisons, such as the likelihood ratio order (lr), the stochastic order (st), the hazard rate order (hr), the reversed hazard rate order (rh) and the expectation order (E), which have applications in various contexts.
Corollary: The following results are direct implications of Theorem 3.

Parameter Estimation and their comparative evaluation
Estimates of the parameters q and α of the TGD model can be computed by the following five methods: (i) the sample proportions of 0's and 1's, (ii) sample quantiles, (iii) the method of moments, (iv) the maximum likelihood (ML) method, and (v) ML via the EM algorithm. In this section we also carry out a comparative study of the ML estimators obtained numerically and via the EM algorithm, using initial estimates from one of the first three methods.

From sample proportion of 1's and 0's:
If p̂0 and p̂1 are the observed proportions of 0's and 1's in the sample, then the parameters q and α can be estimated by solving the equations:
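Assuming the pmf P(Y = y) = (1 − q)q^y[1 − α + α(1 + q)q^y] (a reconstruction from the transmuted cdf), the two equations take the explicit form:

```latex
\begin{align*}
\hat{p}_0 &= P(Y=0) = (1-q)(1+\alpha q),\\
\hat{p}_1 &= P(Y=1) = (1-q)\,q\,\bigl[\,1-\alpha+\alpha(1+q)q\,\bigr],
\end{align*}
```

which can be solved simultaneously (numerically) for q and α.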

From sample quantiles
If t1 and t2 are two observed points such that F_Y(t1) = γ1 and F_Y(t2) = γ2, then the two parameters q and α can be estimated by solving the simultaneous equations

Method of Moments
Denoting the first and second observed raw moments by m1 and m2 respectively, the moment estimates can be obtained either (a) by solving the following two equations simultaneously, or (b) by the minimization method proposed by Khan et al. (1989).
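As a sketch of part (a), the first theoretical moment follows by summing the survival function P(Y ≥ y) = q^y(1 − α + αq^y) (again a reconstruction from the transmuted cdf):

```latex
\begin{align*}
m_1 = E(Y) &= \sum_{y\ge 1} P(Y\ge y)
            = (1-\alpha)\sum_{y\ge 1} q^{y} + \alpha\sum_{y\ge 1} q^{2y}\\
           &= (1-\alpha)\frac{q}{1-q} + \alpha\frac{q^{2}}{1-q^{2}},
\end{align*}
```

with a longer but analogous expression equating m2 to E(Y²); the pair is then solved numerically for q and α.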

Maximum Likelihood Method
Let y = (y1, y2, · · · , yn) be a sample of n observations drawn from the TGD distribution, and let Θ = (q, α) be the parameter vector. The log-likelihood function for this sample is given below, and the score function U(Θ, y) = (∂ ln L/∂q, ∂ ln L/∂α) is obtained by differentiating the log-likelihood with respect to q and α. The maximum likelihood estimator (MLE) Θ̂ of Θ is obtained by solving the non-linear system of equations U(Θ, y) = 0. Since the likelihood equations have no closed-form solution, the estimators q̂ and α̂ of the parameters q and α can be obtained by maximizing the log-likelihood function using global numerical maximization techniques. Further, the Fisher information matrix is given by I_y(q̂, α̂), where q̂ and α̂ are the MLEs of q and α respectively; the elements of I_y(q, α) are given below.
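The maximization can be sketched in a few lines. The pmf below, p(y) = (1 − q)q^y[1 − α + α(1 + q)q^y], is reconstructed from the transmuted cdf rather than quoted from the paper, and the coarse grid search is a dependency-free stand-in for the global numerical techniques mentioned above:

```python
import math

def tgd_pmf(y, q, a):
    # Assumed TGD(q, alpha) pmf, reconstructed from the transmuted cdf:
    # p(y) = (1 - q) q^y [1 - a + a (1 + q) q^y]
    return (1.0 - q) * q**y * (1.0 - a + a * (1.0 + q) * q**y)

def loglik(data, q, a):
    # Log-likelihood of the sample under TGD(q, a)
    return sum(math.log(tgd_pmf(y, q, a)) for y in data)

def mle_grid(data, steps=60):
    # Coarse grid search over q in (0, 1) and alpha in (-1, 1);
    # in practice one would refine this with a Newton-type or simplex optimizer.
    best_q, best_a, best_ll = 0.5, 0.0, -math.inf
    for i in range(1, steps):
        q = i / steps
        for j in range(-steps + 1, steps):
            a = j / steps
            ll = loglik(data, q, a)
            if ll > best_ll:
                best_q, best_a, best_ll = q, a, ll
    return best_q, best_a, best_ll
```

The grid solution can then be polished by a local optimizer, and the Fisher information inverted at (q̂, α̂) to obtain standard errors.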

MLE through EM Algorithm
The Expectation–Maximization (EM) algorithm is a useful iterative procedure for computing ML estimates in the presence of missing data, or of data assumed to have missing values. The procedure consists of two steps, called the expectation step (E-step) and the maximization step (M-step). The E-step concerns the estimation of the unobserved data, whereas the M-step is a maximization step; for more details one may refer to Dempster et al. (1977).

Let the complete data be constituted by the observed set of values y = (y1, · · · , yn) and the hypothetical data set x = (x1, · · · , xn), where the observations yi are realizations of the rv Y defined through X, with Z_{1:2} ∼ GD(q²) and Z_{2:2} ∼ EGD(q, 2) (see Chakraborty and Gupta (2015)).

Under this formulation, the E-step of an EM cycle requires the expectation of X|Y, where Θ^{(k)} is the set of known or estimated parameters at the kth step, with known initial values. Thus, by the property of the binomial distribution, the conditional mean follows.

For the M-step, the likelihood function of the joint pdf of the hypothetical complete data (Yi, Xi), i = 1, · · · , n, and the corresponding complete log-likelihood function are given below, with the components of the score function U*_n(Θ) = (∂l*_n/∂α, ∂l*_n/∂q). The EM cycle is completed in the M-step by maximum likelihood estimation over Θ, i.e., by solving U*_n(Θ̂; y, x) = 0 with the unobserved xi replaced by their conditional expectations given in (10). Hence we obtain the iterative procedure of the EM algorithm, in which q^{(k+1)} must be determined numerically.
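A compact sketch of such an EM cycle is given below. It assumes a mixture reading of the construction above: X_i ∼ Bernoulli((1 + α)/2) selects the Z_{1:2} ∼ GD(q²) component, and otherwise Y_i comes from Z_{2:2} ∼ EGD(q, 2). The component pmfs, the weights and the one-dimensional grid search for q^{(k+1)} are illustrative assumptions, not the authors' exact expressions:

```python
import math

def pmf_min(y, q):
    # pmf of Z_{1:2} = min of two iid GD(q) rvs, i.e. GD(q^2)
    return (1.0 - q * q) * q ** (2 * y)

def pmf_max(y, q):
    # pmf of Z_{2:2} = max of two iid GD(q) rvs, i.e. EGD(q, 2)
    return (1.0 - q ** (y + 1)) ** 2 - (1.0 - q ** y) ** 2

def mix_pmf(y, q, a):
    # TGD(q, a) read as a two-component mixture (assumed representation):
    # weight (1+a)/2 on the min component, (1-a)/2 on the max component
    return 0.5 * (1.0 + a) * pmf_min(y, q) + 0.5 * (1.0 - a) * pmf_max(y, q)

def em_fit(data, q=0.5, a=0.0, iters=50, grid=200):
    for _ in range(iters):
        # E-step: posterior probability that y_i arose from the GD(q^2) component
        w = [0.5 * (1.0 + a) * pmf_min(y, q) / mix_pmf(y, q, a) for y in data]
        # M-step: closed form for alpha; a 1-d grid search stands in for the
        # numerical determination of q^(k+1)
        a = 2.0 * sum(w) / len(w) - 1.0

        def expected_ll(qq):
            return sum(wi * math.log(pmf_min(y, qq))
                       + (1.0 - wi) * math.log(pmf_max(y, qq))
                       for wi, y in zip(w, data))

        q = max((g / grid for g in range(1, grid)), key=expected_ll)
    return q, a
```

Because the complete-data log-likelihood separates in α and q, each M-step cannot decrease the observed log-likelihood, which is the usual EM monotonicity property.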

Standard errors of estimates obtained from EM-algorithm
In this section we obtain the standard errors (se) of the estimators from the EM algorithm using the result of Louis (1982). Let z = (y, x); then the 2 × 2 observed information matrix I_c(Θ; z) = −∂U_c(Θ; z)/∂Θ is given below. Taking the conditional expectation of I_c(Θ; z) given y, we obtain the 2 × 2 matrix whose entries involve the terms listed below. Finally, the observed information matrix I can be computed as I(Θ̂; y) = I_c(Θ̂; y) − I_m(Θ̂; y), and I(Θ̂; y) can be inverted to obtain an estimate of the covariance matrix of the incomplete-data problem. The square roots of its diagonal elements are the estimated standard errors of the parameters.

Simulation Study to evaluate EM algorithm
Here we study the behaviour of the ML estimators obtained by direct numerical optimization and through the EM algorithm, for different finite sample sizes and different TGD(q, α) configurations. Observations from TGD(q, α) are generated using the quantile function provided in Chakraborty and Bhati (2016) (see result 4 of Table 1). In the next two subsections we first investigate the performance of the ML estimators (q̂, α̂) for various combinations of the parameters (q, α) in subsection 3.6.1, and then evaluate the performance with respect to varying sample size for fixed parameter values in subsection 3.6.2.
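A sketch of such a generator via inversion of the closed-form cdf F(y) = (1 − q^{y+1})(1 + αq^{y+1}) is given below; solving F = u for t = q^{y+1} reduces to a quadratic, which should agree with the quantile function of result 4 of Table 1, though the code is a reconstruction rather than the published formula:

```python
import math
import random

def tgd_sample(n, q, a, rng=None):
    # Draw n observations from TGD(q, a) by inverting the assumed cdf
    # F(y) = (1 - q^(y+1)) (1 + a q^(y+1))
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        u = rng.random()
        # Solve (1 - t)(1 + a t) = u for t = q^(y+1), i.e. the quadratic
        # a t^2 + (1 - a) t - (1 - u) = 0; the root below lies in (0, 1]
        if abs(a) < 1e-12:
            t = 1.0 - u
        else:
            t = (-(1.0 - a) + math.sqrt((1.0 - a) ** 2
                                        + 4.0 * a * (1.0 - u))) / (2.0 * a)
        # Smallest integer y with q^(y+1) <= t, i.e. with F(y) >= u
        out.append(max(0, math.ceil(math.log(t) / math.log(q)) - 1))
    return out
```

With α = 0 this reduces to ordinary geometric inversion; the empirical proportion of zeros can be checked against P(Y = 0) = (1 − q)(1 + αq).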
1. Choose the values (q0, α0) for the corresponding elements of the parameter vector Θ = (q, α), to specify the TGD(q, α);
2. Choose the sample size n;
3. Generate N independent samples of size n from TGD(q, α);
4. Compute the ML and EM estimates Θ̂n of Θ for each of the N samples;
5. Compute the average bias and the average standard error of the estimates.
In our experiment we considered N = 1000 replications. It can be observed from Table 1 and Table 2 that as the sample size increases, both the average bias and the average se decrease.

Performance of estimators for different sample size
In this subsection we assess the performance of the ML estimators of (q, α) as the sample size n increases, considering n = 25, 26, ..., 200, for q = 0.25 and α = −0.5. For each n we generate one thousand samples of size n and obtain the MLEs and their standard errors, and for each repetition we compute the average bias and average standard error. Figures 2 and 3 show the behaviour of the average bias and average standard error of the parameters q and α, for fixed q = 0.25 and α = −0.5, as the sample size n varies. The horizontal dotted lines in Figure 2 correspond to the value zero, and it is clear from Figure 2 that the biases approach zero with increasing n; likewise, in Figure 3 the average standard errors of both parameters (q and α) decrease with n. Based on our findings it is clear that the EM algorithm produces ML estimators with smaller average bias than the regular ML estimators, while with respect to standard error there is not much to choose between the two procedures.

Tests of hypothesis
The TGD(q, α) distribution with parameter vector Θ = (q, α) reduces to the geometric distribution with parameter q when α = 0. The additional parameter α controls the proportion of zeros of the distribution relative to the geometric distribution, and also the tail length. It is therefore of interest to develop test procedures for detecting departures of α from 0. In this section we develop the likelihood ratio test (LRT), Rao's score test and Wald's test for the null hypothesis H0 : α = 0 against the alternative H1 : α ≠ 0, and numerically study the statistical power of these tests through extensive simulation.

Likelihood Ratio Test, Rao's Score Test and Wald's Test
The likelihood ratio test (LRT) is based on the difference between the maxima of the likelihood under the null and alternative hypotheses. The LRT statistic is −2 log(L(Θ̂*; y)/L(Θ̂; y)), where Θ̂* and Θ̂ are the MLEs obtained under the null and alternative hypotheses respectively. The LRT is generally employed to test the significance of an additional parameter that has been included to extend a base model.
Rao's score test (Rao, 1948) is based on the score vector, defined as the first derivative of the log-likelihood function with respect to the parameters. Rao's score test statistic is U′I⁻¹U, where U is the score vector and I is the information matrix, both evaluated under the null hypothesis. The score vector and the information matrix, obtained by differentiating the log-likelihood function log L, are provided in Section 4.4. Note that the scores are the slopes of the log-likelihood function.
The Wald test statistic (Wald, 1943) is based on the difference between the maximum likelihood estimate of the parameter under the alternative hypothesis and the value specified under the null hypothesis. In our case the Wald statistic is (α̂ − α0)²/I⁻¹[2,2], where I⁻¹[2,2] is the (2, 2)th element of the inverse of the information matrix I and α̂ is the MLE of α, both under the alternative hypothesis, whereas α0 is the value of α under H0. Note that I⁻¹[2,2] is an estimate of the variance of α̂, so in the present case the Wald statistic reduces to (α̂)²/V̂(α̂).
All three test statistics asymptotically follow a chi-square distribution with k degrees of freedom, where k is the number of parameters specified by the null hypothesis; in the present case the df is just 1. For a well-behaved likelihood function all these tests measure the discrepancy between the null and alternative hypotheses.
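As an illustration of the LRT, the sketch below assumes the TGD pmf p(y) = (1 − q)q^y[1 − α + α(1 + q)q^y] (reconstructed from the transmuted cdf) and the closed-form geometric MLE q̂0 = ȳ/(1 + ȳ) under H0; the statistic is then compared with the χ²(1) critical value:

```python
import math

def tgd_loglik(data, q, a):
    # Log-likelihood under the assumed TGD(q, alpha) pmf
    return sum(math.log((1.0 - q) * q**y * (1.0 - a + a * (1.0 + q) * q**y))
               for y in data)

def lrt_alpha_zero(data, steps=60):
    # LRT statistic for H0: alpha = 0 (geometric) vs H1: alpha != 0
    ybar = sum(data) / len(data)
    q0 = ybar / (1.0 + ybar)              # geometric MLE under H0
    ll0 = tgd_loglik(data, q0, 0.0)
    # unrestricted maximum by coarse grid search (stand-in for a numeric optimizer)
    ll1 = max(tgd_loglik(data, i / steps, j / steps)
              for i in range(1, steps)
              for j in range(-steps + 1, steps))
    ll1 = max(ll1, ll0)                   # guard: the grid may miss the H0 optimum
    return 2.0 * (ll1 - ll0)
```

Reject H0 at the 5% level when the statistic exceeds 3.841, the upper 5% point of the χ²(1) distribution.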

Statistical Power Analysis
Here we present a simulation-based study of the statistical power of the LRT, Rao's score test and the Wald test at the 5% level of significance. Since the tests are asymptotic in nature, we considered four different sample sizes: two smaller sizes, n = 100 and 300, one medium size, 500, and one large size, 1000. We generated 1000 replications for each sample size n, and the power of each test is estimated by the proportion of rejections in these 1000 replications. The effect size (ES), a measure of departure from the null hypothesis, is in the present case α − 0 = α, which is fixed at −0.7, −0.5, −0.3, −0.1, 0.1, 0.2, 0.5, 0.7 in our experiments.
The results presented in Table 3, Table 4 and Figures 4 to 7 reveal that, as expected, the power increases with the sample size n and with the ES. For positive ES all the tests display an increase in power as either or both of the ES and the sample size increase, while for negative ES the power increases at a much faster pace. The power of the score test exceeds that of the LRT for negative effect sizes, whereas the reverse holds for positive effect sizes, and for positive effect sizes the powers of the tests draw closer together as the sample size increases. From the overall observations it is clear that the Wald test is more reliable than both the LRT and the score test.
The null hypothesis H0 : α = 0 against H1 : α ≠ 0 is examined using the LRT, Rao's score and Wald tests, and the results, along with descriptive statistics, are presented in Table 5. Both datasets confirm the presence of overdispersion; moreover, Rao's score test and the Wald test reject the null hypothesis at the 5% significance level. The suitability of the proposed TGD(q, α) model is compared with other competitive distributions, namely COM-Poisson(p, α) (Conway and Maxwell (1962)), ZDGGD(q, α) (Sastry et al. (2016)) and the negative binomial(r, p), and the log-likelihood and Akaike Information Criterion (AIC) values are computed for the four models on both datasets. The results in Table 6 reveal that TGD(q, α) is the best-fitting model and can be considered a competitive model for the datasets considered.

Conclusion
The current paper investigates some additional properties of the TGD(q, α) distribution, with emphasis on a simulation study of the behaviour of the parameter estimators and on the power of tests of hypothesis for checking the statistical significance of the additional parameter. For parameter estimation we presented different methods, including an EM implementation of the MLE. A comparative simulation-based evaluation of the EM-based MLE against the usual MLE revealed the superiority of the former in terms of bias. We also presented data modeling examples to showcase the advantage of the TGD(q, α) over some existing distributions from the literature. It is envisaged that the present contribution will be useful for discrete data analysts.