Transmuted Singh-Maddala Distribution: A new Flexible and Upside-Down Bathtub Shaped Hazard Function Distribution

The Singh-Maddala distribution is widely used to analyze data in income, expenditure, actuarial, environmental, and reliability studies. To enhance its scope and application, we propose the four-parameter transmuted Singh-Maddala distribution. The proposed distribution is more flexible than the parent distribution for modeling a variety of data sets. Its basic statistical properties, reliability function, and the behavior of its hazard function are derived. The hazard function exhibits decreasing and upside-down bathtub shapes, which are required in various survival analyses. The order statistics and generalized TL-moments, with their special cases such as L-, TL-, LL-, and LH-moments, are also explored. Furthermore, maximum likelihood estimation is used to estimate the unknown parameters of the transmuted Singh-Maddala distribution. Real data sets are considered to illustrate the utility and potential of the proposed model. The results indicate that the transmuted Singh-Maddala distribution models the data sets better than its parent distribution.


Introduction
The quality of a statistical analysis depends heavily on the assumed probability distribution. Therefore, an attractive research direction for mathematicians and statisticians is to develop classes of suitable distributions, along with their relevant statistical properties and methodologies. The goal is to design probability distributions that serve as realistic models for real-world situations. However, there are still many important situations, involving both symmetric and asymmetric data, where the existing distributions do not follow the pattern of the real data. Keeping this in mind, we adopt the quadratic rank transmutation map proposed by Shaw & Buckley (2009), which is applicable to both symmetric and asymmetric distributions. This map is a special case of the general rank transmutation map and is defined by Shaw & Buckley (2009) without loss of generality. It is given as

$$F(x) = (1+\lambda)\,G(x) - \lambda\,G(x)^{2}, \tag{1}$$

which yields the probability density function on differentiation:

$$f(x) = g(x)\left[(1+\lambda) - 2\lambda\,G(x)\right], \tag{2}$$

where g(x) and G(x) are the probability density and cumulative distribution functions of the parent distribution, respectively. The parameter λ lies in [−1, 1], and its extreme values produce two extreme cases: λ = −1 gives $F(x) = G(x)^{2}$, the distribution of the maximum of two independent draws from G, and λ = +1 gives $F(x) = 1 - [1 - G(x)]^{2}$, the distribution of the minimum. When λ = 0, the transmuted distribution reduces exactly to the parent distribution.

This quadratic rank transmutation map has been used in many studies to obtain flexible and versatile models. Sharma, Singh & Singh (2014) proposed the transmuted inverse Rayleigh distribution and used this density function in survival analysis because its hazard function has the upside-down bathtub shape. Khan & King (2014) proposed a generalized transmuted inverse Weibull distribution and found it to perform better than the parent distribution in a real data application. Similar findings are also reported by Shahzad & Asghar (2016), Ahmad, Ahmad & Ahmed (2014), Khan, King & Hudson (2014), Aryal (2013), Merovci (2013), Elbatal (2013) and Aryal & Tsokos (2011).
The rest of the paper is organized as follows. In Section 2, we derive and sketch the pdf and cdf of the transmuted Singh-Maddala distribution. In Section 3, the rth moment, moment generating function, quantile function, and random number generating process for the transmuted Singh-Maddala distribution are explored. Survival analysis of the distribution, namely the reliability function and hazard rate function, is obtained and presented graphically in Section 4. In Section 5, the order statistics and the densities of the lowest, highest, and joint order statistics are specified. Section 6 is about the TL-moments and their special cases. The methodology for parameter estimation is discussed in Section 7. The real data application of the transmuted Singh-Maddala distribution is given in Section 8, and finally the study is concluded in Section 9.

Transmuted Singh-Maddala Distribution
The Singh-Maddala distribution is well known and is attributed to Singh & Maddala (1976). It was initially used for modeling income data, but due to its good performance it is now popular in a range of fields including actuarial science, economics, extreme value analysis, and reliability studies. Zimmer, Keats & Wang (1998) studied this model and concluded that it performs better for certain failure time data. Shao, Wang & Zhang (2013) applied the extended Singh-Maddala (SM) distribution to flood frequency analysis. Brzezinski (2014) modeled the empirical impact factor distribution and observed that the SM distribution performs much better than the models previously applied to this type of data. Sakulski, Jordaan, Tin & Greyling (2014) fitted several statistical distributions to rainfall data, including the Extreme Value, Frechet, Log-normal, Log-logistic, Rice, SM, and Rayleigh distributions, for the summer, autumn, winter and spring seasons. They concluded that the SM distribution fits accurately for all seasons. A wide monographic and periodical literature is available on it; see, for example, Kleiber & Kotz (2003). To enhance its applicability in various other fields, we introduce the transmuted Singh-Maddala distribution in this section. The transmuted Singh-Maddala (TSM) distribution is more versatile and flexible than the SM distribution.
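The TSM distribution is proposed using the quadratic rank transmutation map, taking the SM distribution as the parent distribution. Let X be a random variable that follows the SM distribution, with pdf of the following form:

$$g(x) = \frac{\alpha\delta}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1}\left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-(\delta+1)}, \quad x > 0, \tag{3}$$

where α and δ are the shape parameters (α, δ > 0) and β is the scale parameter (β > 0). The corresponding cdf is

$$G(x) = 1 - \left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-\delta}. \tag{4}$$

The cdf and pdf of the TSM distribution are derived by using (3) and (4) in the transmutation mapping given in (1) and (2). They are obtained in the following form:

$$F(x) = \left\{1 - \left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-\delta}\right\}\left\{1 + \lambda\left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-\delta}\right\}, \tag{5}$$

$$f(x) = \frac{\alpha\delta}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1}\left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-(\delta+1)}\left\{1 - \lambda + 2\lambda\left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-\delta}\right\}, \quad x > 0. \tag{6}$$

In the TSM distribution, λ is the transmutation parameter and lies in [−1, 1]. The TSM distribution is appealing because of its flexibility, which allows a more accurate fit to complex data. The shapes of the pdf and cdf of the TSM distribution for various combinations of the four parameters are sketched in Figure 1 and Figure 2, respectively. These figures indicate that the TSM density is more flexible than the parent SM density.
Note: * Here the notation α[i]β denotes a sequence of parameter values that starts at α and approaches β in increments of i.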
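The construction above can be sketched numerically. The following Python fragment is a minimal illustration, assuming the standard Singh-Maddala parameterization G(x) = 1 − [1 + (x/β)^α]^(−δ); the function names `sm_cdf`, `sm_pdf`, `tsm_cdf` and `tsm_pdf` are hypothetical and chosen only for this example.

```python
import math


def sm_cdf(x, alpha, beta, delta):
    """Parent Singh-Maddala cdf: G(x) = 1 - [1 + (x/beta)^alpha]^(-delta)."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + (x / beta) ** alpha) ** (-delta)


def sm_pdf(x, alpha, beta, delta):
    """Parent Singh-Maddala pdf g(x); returns 0 outside the support x > 0."""
    if x <= 0:
        return 0.0
    z = (x / beta) ** alpha
    return (alpha * delta / beta) * (x / beta) ** (alpha - 1) * (1.0 + z) ** (-(delta + 1))


def tsm_cdf(x, alpha, beta, delta, lam):
    """Quadratic rank transmutation of G: F = (1 + lam) * G - lam * G^2."""
    G = sm_cdf(x, alpha, beta, delta)
    return (1.0 + lam) * G - lam * G * G


def tsm_pdf(x, alpha, beta, delta, lam):
    """Derivative of tsm_cdf: f = g * [(1 + lam) - 2 * lam * G]."""
    G = sm_cdf(x, alpha, beta, delta)
    return sm_pdf(x, alpha, beta, delta) * ((1.0 + lam) - 2.0 * lam * G)
```

Setting `lam=0` should recover the parent distribution, while `lam=-1` and `lam=1` should give the distributions of the maximum and minimum of two independent SM draws, which is a convenient sanity check for any implementation.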

Basic Properties
In this section, the main statistical properties of the TSM random variable X are derived.

Moments
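Theorem 1. Let the random variable X follow the TSM distribution. Then, for r < αδ, its rth conventional moment has the following form:

$$m_r = \beta^{r}\,\Gamma\!\left(1+\frac{r}{\alpha}\right)\left[(1-\lambda)\frac{\Gamma\!\left(\delta-\frac{r}{\alpha}\right)}{\Gamma(\delta)} + \lambda\,\frac{\Gamma\!\left(2\delta-\frac{r}{\alpha}\right)}{\Gamma(2\delta)}\right]. \tag{7}$$

Proof. By definition, the rth conventional moment of the TSM distribution is given by

$$m_r = \int_{0}^{\infty} x^{r} f(x)\,dx,$$

with f(x) as in (6). For convenience, we substitute $y = (x/\beta)^{\alpha}$ in the above expression and, after simple steps, obtain

$$m_r = \delta\beta^{r}\left[(1-\lambda)\,B\!\left(1+\frac{r}{\alpha},\ \delta-\frac{r}{\alpha}\right) + 2\lambda\,B\!\left(1+\frac{r}{\alpha},\ 2\delta-\frac{r}{\alpha}\right)\right],$$

where B(., .) is the beta type-II function, which is defined as

$$B(a,b) = \int_{0}^{\infty} \frac{y^{a-1}}{(1+y)^{a+b}}\,dy = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.$$

Writing the beta functions in terms of gamma functions gives the required result, which holds only for r < αδ.

The mean and variance of the TSM distribution are obtained from (7) in the following form:

$$E(X) = \beta\,\Gamma\!\left(1+\frac{1}{\alpha}\right)\left[(1-\lambda)\frac{\Gamma\!\left(\delta-\frac{1}{\alpha}\right)}{\Gamma(\delta)} + \lambda\,\frac{\Gamma\!\left(2\delta-\frac{1}{\alpha}\right)}{\Gamma(2\delta)}\right] \tag{8}$$

and

$$\mathrm{Var}(X) = m_2 - m_1^{2}, \tag{9}$$

respectively, where $m_1$ and $m_2$ are given by (7) and the variance requires αδ > 2.
The moment ratios such as the coefficient of variation (CV), skewness (Sk) and kurtosis (Kr) can be obtained by using (7) and (9) in the usual formulas.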
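As a numerical companion to the moment result, the sketch below evaluates the rth raw moment in gamma-function form, assuming the standard Singh-Maddala parameterization with cdf 1 − [1 + (x/β)^α]^(−δ) and the quadratic transmutation with parameter λ. The helper names `tsm_moment` and `tsm_summary` are hypothetical.

```python
import math


def tsm_moment(r, alpha, beta, delta, lam):
    """r-th raw moment of TSM(alpha, beta, delta, lam); finite only for r < alpha * delta."""
    if r >= alpha * delta:
        raise ValueError("moment exists only for r < alpha * delta")
    # Contribution of the parent SM distribution (shape delta).
    sm_part = math.gamma(delta - r / alpha) / math.gamma(delta)
    # Contribution of the minimum of two SM draws (an SM law with shape 2 * delta).
    min_part = math.gamma(2 * delta - r / alpha) / math.gamma(2 * delta)
    return beta ** r * math.gamma(1 + r / alpha) * ((1 - lam) * sm_part + lam * min_part)


def tsm_summary(alpha, beta, delta, lam):
    """Mean, variance and coefficient of variation (requires alpha * delta > 2)."""
    m1 = tsm_moment(1, alpha, beta, delta, lam)
    m2 = tsm_moment(2, alpha, beta, delta, lam)
    var = m2 - m1 ** 2
    return {"mean": m1, "variance": var, "cv": math.sqrt(var) / m1}
```

For example, `tsm_summary(3.0, 2.0, 2.0, 0.5)` returns the three summaries for one illustrative parameter choice; higher-order ratios such as skewness and kurtosis would follow the same pattern with `tsm_moment(3, ...)` and `tsm_moment(4, ...)`, provided αδ > 4.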

Quantile Function
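Let the random variable X follow the pdf given in (6). The quantile function, say Q(q), is the inverse of the cdf, that is, the solution of the equation

$$(1+\lambda)\,G(Q(q)) - \lambda\,G(Q(q))^{2} = q, \quad 0 < q < 1.$$

Solving this quadratic for G and then inverting (4), we get

$$Q(q) = \beta\left\{\left[1 - \frac{(1+\lambda) - \sqrt{(1+\lambda)^{2} - 4\lambda q}}{2\lambda}\right]^{-1/\delta} - 1\right\}^{1/\alpha}, \quad \lambda \neq 0,$$

and for λ = 0 the expression reduces to the SM quantile function $Q(q) = \beta\{(1-q)^{-1/\delta} - 1\}^{1/\alpha}$.
To obtain the quartiles, deciles and percentiles of the TSM distribution, simply replace q with the desired value. The median of the TSM distribution is the special case q = 0.5 of the above expression and is given as

$$Q(0.5) = \beta\left\{\left[1 - \frac{(1+\lambda) - \sqrt{1+\lambda^{2}}}{2\lambda}\right]^{-1/\delta} - 1\right\}^{1/\alpha}.$$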

Random Data Generation
One can generate random data from the TSM distribution by the inversion method, that is, by solving F(X) = u for X, which gives

$$X = Q(u), \tag{10}$$

where Q is the quantile function of the TSM distribution and u is a standard uniform variate. The X in (10) follows the TSM distribution and can be readily used to generate random data by taking suitable values of the parameters α, β, δ and λ.
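The inversion method can be sketched in a few lines of Python. The code below is an illustration under the assumption that the parent cdf is G(x) = 1 − [1 + (x/β)^α]^(−δ); it solves the quadratic (1 + λ)G − λG² = q for G in the algebraically equivalent rationalized form 2q / ((1 + λ) + √((1 + λ)² − 4λq)), which avoids division by λ and therefore also covers λ = 0. The names `tsm_quantile` and `tsm_sample` are hypothetical.

```python
import math
import random


def tsm_quantile(q, alpha, beta, delta, lam):
    """Quantile function Q(q) of TSM(alpha, beta, delta, lam) for 0 <= q < 1."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if q == 0.0:
        return 0.0
    # Root of lam*G^2 - (1 + lam)*G + q = 0 that lies in [0, 1].
    root = math.sqrt((1.0 + lam) ** 2 - 4.0 * lam * q)
    G = 2.0 * q / ((1.0 + lam) + root)
    G = min(G, 1.0 - 2.0 ** -53)  # guard against rounding to exactly 1
    # Invert the parent cdf G = 1 - [1 + (x/beta)^alpha]^(-delta).
    return beta * ((1.0 - G) ** (-1.0 / delta) - 1.0) ** (1.0 / alpha)


def tsm_sample(n, alpha, beta, delta, lam, seed=None):
    """Draw n TSM variates by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [tsm_quantile(rng.random(), alpha, beta, delta, lam) for _ in range(n)]
```

A call such as `tsm_sample(200, 3.0, 2.0, 2.0, 0.5, seed=1)` would produce one simulated sample; the parameter values here are illustrative only and are not taken from the paper's tables.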

Survival Analysis
In lifetime data analysis, the reliability and hazard rate functions are most commonly used to describe the life of a component or system. This section discusses these functions.

Reliability Function
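The reliability function R(t) gives the probability that an item functions for a specified period of time without failure. The reliability function and the cdf F(t) are complements of each other, R(t) = 1 − F(t), since R(t) and F(t) represent the probabilities of survival and failure, respectively. The reliability function of the TSM distribution is given by

$$R(t) = \left[1+\left(\frac{t}{\beta}\right)^{\alpha}\right]^{-\delta}\left\{1 - \lambda + \lambda\left[1+\left(\frac{t}{\beta}\right)^{\alpha}\right]^{-\delta}\right\}. \tag{11}$$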

Hazard Function
The hazard function is the ratio of the pdf to the reliability function. The hazard rate is an important property of a random variable in survival analysis: it describes the instantaneous conditional rate of failure at time t, given survival up to time t. The hazard rate for the TSM distribution is given by

$$h(t) = \frac{f(t)}{R(t)} = \frac{\alpha\delta}{\beta}\left(\frac{t}{\beta}\right)^{\alpha-1}\left[1+\left(\frac{t}{\beta}\right)^{\alpha}\right]^{-1}\frac{1 - \lambda + 2\lambda\left[1+(t/\beta)^{\alpha}\right]^{-\delta}}{1 - \lambda + \lambda\left[1+(t/\beta)^{\alpha}\right]^{-\delta}}. \tag{12}$$

It can be observed that when α < 1, the hazard function decreases and then levels off. When α > 2, the hazard function is upside-down bathtub shaped (increasing to a maximum and then decreasing). Thus, the TSM distribution shows a decreasing, increasing, or unimodal hazard rate in specified ranges of the parameter values. The various shapes of the hazard function are presented in Figures 3 and 4 for different combinations of parameter values. Many survival studies call for hazard functions that increase quickly to a maximum at the beginning of life and then gradually decrease until they stabilize.
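The shape behavior described above can be explored numerically. The following sketch assumes the parent cdf G(t) = 1 − [1 + (t/β)^α]^(−δ) and the quadratic transmutation, from which R(t) = v(1 − λ + λv) with v = [1 + (t/β)^α]^(−δ); the function names `tsm_reliability` and `tsm_hazard` are hypothetical.

```python
import math


def tsm_reliability(t, alpha, beta, delta, lam):
    """R(t) = 1 - F(t) = v * (1 - lam + lam * v), with v = [1 + (t/beta)^alpha]^(-delta)."""
    v = (1.0 + (t / beta) ** alpha) ** (-delta)
    return v * (1.0 - lam + lam * v)


def tsm_hazard(t, alpha, beta, delta, lam):
    """h(t) = f(t) / R(t) for t > 0, written so that the common factor v cancels."""
    u = 1.0 + (t / beta) ** alpha
    v = u ** (-delta)
    # Hazard of the parent Singh-Maddala distribution.
    base = (alpha * delta / beta) * (t / beta) ** (alpha - 1) / u
    # Multiplicative adjustment introduced by the transmutation parameter.
    return base * (1.0 - lam + 2.0 * lam * v) / (1.0 - lam + lam * v)
```

Evaluating `tsm_hazard` on a grid of `t` values for, say, `alpha=0.5` and `alpha=3.0` (illustrative values) should reproduce the decreasing and upside-down bathtub shapes, respectively.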

Order Statistics of Transmuted Singh Maddala Distribution
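Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables. Arranged in increasing order, $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$, they form the order statistics of the sample. The order statistics of the extreme (smallest and largest), median, and joint observations are of great interest. Typical examples are the lowest temperature in winter, the median income in a country, the highest flood flow in dams, and joint breaking strength. We derive the densities of the order statistics in this section.
The density of the rth order statistic is defined by Arnold, Balakrishnan & Nagaraja (1992) and is given as

$$f_{r:n}(x) = C_{r:n}\,[F(x)]^{r-1}\,[1-F(x)]^{n-r}\,f(x), \tag{13}$$

where

$$C_{r:n} = \frac{n!}{(r-1)!\,(n-r)!}.$$

The density of the rth order statistic for the TSM distribution is obtained by substituting (5) and (6) in (13). Writing $v = [1+(x/\beta)^{\alpha}]^{-\delta}$ for brevity, it is given by

$$f_{r:n}(x) = C_{r:n}\,\frac{\alpha\delta}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1}\left[1+\left(\frac{x}{\beta}\right)^{\alpha}\right]^{-(\delta+1)}(1-\lambda+2\lambda v)\,\left[(1-v)(1+\lambda v)\right]^{r-1}\left[v\,(1-\lambda+\lambda v)\right]^{n-r}. \tag{14}$$

Revista Colombiana de Estadística 40 (2017) 1-27

The density of the smallest order statistic $X_{(1)}$ has the following form:

$$f_{1:n}(x) = n\,f(x)\left[v\,(1-\lambda+\lambda v)\right]^{n-1}.$$

The density of the nth order statistic $X_{(n)}$ is obtained from (14) in the following form:

$$f_{n:n}(x) = n\,f(x)\left[(1-v)(1+\lambda v)\right]^{n-1},$$

where f(x) is the TSM pdf in (6). The joint pdf of $X_{(r)}$ and $X_{(s)}$ ($1 \le r < s \le n$) for the TSM distribution is derived by using the general expression defined by Balakrishnan & Cohen (1991). So, the joint pdf is obtained in the following form:

$$f_{r,s:n}(x,y) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\,[F(x)]^{r-1}\,[F(y)-F(x)]^{s-r-1}\,[1-F(y)]^{n-s}\,f(x)\,f(y), \quad x < y,$$

with F and f the TSM cdf and pdf given in (5) and (6).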
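The general order-statistic density can be evaluated directly once the cdf and pdf are available. The sketch below assumes the TSM cdf F = (1 − v)(1 + λv) and pdf f = g(1 − λ + 2λv), with v = [1 + (x/β)^α]^(−δ) and g the Singh-Maddala density; `order_stat_pdf` and the helper names are hypothetical.

```python
import math


def tsm_cdf(x, alpha, beta, delta, lam):
    """TSM cdf F(x) = (1 - v) * (1 + lam * v), v = [1 + (x/beta)^alpha]^(-delta)."""
    if x <= 0:
        return 0.0
    v = (1.0 + (x / beta) ** alpha) ** (-delta)
    return (1.0 - v) * (1.0 + lam * v)


def tsm_pdf(x, alpha, beta, delta, lam):
    """TSM pdf f(x); returns 0 outside the support x > 0."""
    if x <= 0:
        return 0.0
    u = 1.0 + (x / beta) ** alpha
    g = (alpha * delta / beta) * (x / beta) ** (alpha - 1) * u ** (-(delta + 1))
    return g * (1.0 - lam + 2.0 * lam * u ** (-delta))


def order_stat_pdf(x, r, n, alpha, beta, delta, lam):
    """Density of the r-th order statistic in a sample of size n from TSM."""
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    F = tsm_cdf(x, alpha, beta, delta, lam)
    return c * F ** (r - 1) * (1.0 - F) ** (n - r) * tsm_pdf(x, alpha, beta, delta, lam)
```

Taking `r=1` or `r=n` gives the densities of the sample minimum and maximum as special cases, so one function covers all three situations discussed in the text.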

Generalized TL-Moments
The TL-moments are a worthwhile contribution to extreme value analysis. These moments, based on order statistics, describe the shape of a probability distribution better than conventional moments. Elamir & Seheult (2003) introduced the rth generalized TL-moment as follows:

$$T_r^{(s,t)} = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}\,E\!\left(X_{r+s-k:r+s+t}\right), \quad r = 1, 2, \ldots, \tag{15}$$

where $T_r^{(s,t)}$ is a linear function of the expectations of the order statistics, and s and t are the numbers of lowest and highest values trimmed, respectively.
The expected value of the (r + s − k)th order statistic of a random sample of size (r + s + t) is given as

$$E\!\left(X_{r+s-k:r+s+t}\right) = \frac{(r+s+t)!}{(r+s-k-1)!\,(t+k)!}\int_{0}^{\infty} x\,[F(x)]^{r+s-k-1}\,[1-F(x)]^{t+k}\,f(x)\,dx, \tag{16}$$

where F is the cdf and f is the pdf.
The generalized TL-moments for the TSM distribution are derived by substituting (5), (6) and (16) in (15), and we get

$$T_r^{(s,t)} = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}\frac{(r+s+t)!}{(r+s-k-1)!\,(t+k)!}\int_{0}^{\infty} x\,[F(x)]^{r+s-k-1}\,[1-F(x)]^{t+k}\,f(x)\,dx, \tag{17}$$

with F and f the TSM cdf and pdf in (5) and (6). This general expression is used to obtain the special cases, namely the L-moments, TL-moments, LH-moments and LL-moments. The first two TL-moments measure the location and dispersion of the data, respectively; the TL-moment ratios $\tau = T_2/T_1$, $\tau_3 = T_3/T_2$ and $\tau_4 = T_4/T_2$ are the analogues of the CV, Sk, and Kr of the probability distribution, respectively. In this way, the first four TL-moments summarize the characteristics of the probability distribution.
The L-, TL-, LH-, and LL-moments were introduced independently by different authors, but they are all special cases of the generalized TL-moments. These moments are derived for the TSM distribution using (17) in the following subsections.

The TL-Moments (1,1)
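In general, any number of the smallest and largest values can be trimmed from the ordered observations. As a special case, only the extreme values on both sides are trimmed (s = t = 1) to derive the rth TL-moment. The following expression is obtained from (17):

$$T_r^{(1,1)} = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}\,E\!\left(X_{r+1-k:r+2}\right),$$

where the expectations are computed from (16) under the TSM distribution. To derive the first four TL-moments, substitute r = 1, 2, 3, 4.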

The L-Moments
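When none of the observations is trimmed from the ordered sample (s = t = 0), the generalized TL-moments reduce to the L-moments, which were introduced by Hosking (1990). The rth L-moment for the TSM distribution is given as

$$T_r^{(0,0)} = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}\,E\!\left(X_{r-k:r}\right),$$

where the expectations are computed from (16) under the TSM distribution.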

The LH-Moments
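LH-moments were proposed by Wang (1997), and they describe the upper part of the data more precisely. These moments give more weight to the upper values of the data (s = s, t = 0), and the theoretical LH-moments for the TSM distribution are derived as given below:

$$T_r^{(s,0)} = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}\,E\!\left(X_{r+s-k:r+s}\right),$$

where the expectations are computed from (16) under the TSM distribution.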

The LL-Moments
LL-moments progressively reflect the characteristics of the lower part of the distribution. Bayazit & Onoz (2002) introduced these moments, and they are the special case of the generalized TL-moments with s = 0 and t = t in (17). The following is the expression of the rth LL-moment:

$$T_r^{(0,t)} = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^{k}\binom{r-1}{k}\,E\!\left(X_{r-k:r+t}\right),$$

where the expectations are computed from (16) under the TSM distribution. The LH- and LL-moments can be evaluated for any value of s and t, but values up to 4 are preferable for both.
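The generalized TL-moments and their special cases can also be approximated numerically. The sketch below is an illustration, not the closed-form derivation of the paper: it rewrites the expectation of an order statistic as an integral of the quantile function over (0, 1), E(X_{i:n}) = n!/((i−1)!(n−i)!) ∫ Q(u) u^(i−1) (1−u)^(n−i) du, and evaluates it with a plain midpoint rule. The quantile function assumes the parent cdf 1 − [1 + (x/β)^α]^(−δ); all function names are hypothetical, and the accuracy is limited by the simple quadrature.

```python
import math


def tsm_quantile(q, alpha, beta, delta, lam):
    """TSM quantile function for 0 < q < 1 (rationalized root of the quadratic in G)."""
    root = math.sqrt((1.0 + lam) ** 2 - 4.0 * lam * q)
    G = min(2.0 * q / ((1.0 + lam) + root), 1.0 - 2.0 ** -53)
    return beta * ((1.0 - G) ** (-1.0 / delta) - 1.0) ** (1.0 / alpha)


def expected_order_stat(i, n, quantile, steps=20000):
    """Midpoint-rule approximation of E(X_{i:n}) using the quantile function."""
    c = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        total += quantile(u) * u ** (i - 1) * (1.0 - u) ** (n - i)
    return c * total * h


def tl_moment(r, s, t, quantile):
    """r-th generalized TL-moment with s lowest and t highest values trimmed."""
    n = r + s + t
    acc = 0.0
    for k in range(r):
        acc += (-1) ** k * math.comb(r - 1, k) * expected_order_stat(r + s - k, n, quantile)
    return acc / r


def tsm_tl_moment(r, s, t, alpha, beta, delta, lam):
    """Convenience wrapper: TL-moment of TSM(alpha, beta, delta, lam)."""
    return tl_moment(r, s, t, lambda u: tsm_quantile(u, alpha, beta, delta, lam))
```

With this wrapper, `(s, t) = (0, 0)` gives L-moments, `(1, 1)` gives TL-moments, `(s, 0)` gives LH-moments and `(0, t)` gives LL-moments, mirroring the four subsections above.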

Parameter Estimation by Maximum Likelihood
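In this section, the interest is in estimating the parameters of the TSM distribution by maximum likelihood. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n drawn from the TSM(α, β, δ, λ) distribution. Then, the sample likelihood function for $\theta = (\alpha, \beta, \delta, \lambda)^{T}$ is given as

$$L(\theta) = \prod_{i=1}^{n} \frac{\alpha\delta}{\beta}\left(\frac{x_i}{\beta}\right)^{\alpha-1} u_i^{-(\delta+1)}\,w_i, \qquad u_i = 1+\left(\frac{x_i}{\beta}\right)^{\alpha}, \quad w_i = 1-\lambda+2\lambda\,u_i^{-\delta}.$$

The sample log-likelihood function corresponding to the above expression is obtained as

$$\ell(\theta) = n\log\alpha + n\log\delta - n\log\beta + (\alpha-1)\sum_{i=1}^{n}\log\frac{x_i}{\beta} - (\delta+1)\sum_{i=1}^{n}\log u_i + \sum_{i=1}^{n}\log w_i. \tag{18}$$

The maximum likelihood estimators are found by taking the first order derivatives $(D_\alpha, D_\beta, D_\delta, D_\lambda)$ of (18) with respect to the parameters and setting the resulting expressions equal to zero. With $z_i = (x_i/\beta)^{\alpha}$, the first order derivatives are as follows:

$$D_\alpha = \frac{n}{\alpha} + \sum_{i=1}^{n}\log\frac{x_i}{\beta} - (\delta+1)\sum_{i=1}^{n}\frac{z_i\log(x_i/\beta)}{u_i} - 2\lambda\delta\sum_{i=1}^{n}\frac{z_i\log(x_i/\beta)\,u_i^{-(\delta+1)}}{w_i},$$

$$D_\beta = -\frac{n\alpha}{\beta} + \frac{\alpha(\delta+1)}{\beta}\sum_{i=1}^{n}\frac{z_i}{u_i} + \frac{2\lambda\alpha\delta}{\beta}\sum_{i=1}^{n}\frac{z_i\,u_i^{-(\delta+1)}}{w_i},$$

$$D_\delta = \frac{n}{\delta} - \sum_{i=1}^{n}\log u_i - 2\lambda\sum_{i=1}^{n}\frac{u_i^{-\delta}\log u_i}{w_i},$$

$$D_\lambda = \sum_{i=1}^{n}\frac{2u_i^{-\delta}-1}{w_i}.$$

Closed forms of the maximum likelihood estimators are not available, so the estimates $\hat{\alpha}$, $\hat{\beta}$, $\hat{\delta}$, and $\hat{\lambda}$ of the parameters α, β, δ and λ, respectively, are obtained by numerically solving the above four nonlinear equations. This nonlinear system can be solved conveniently by a quasi-Newton algorithm.
Let $D_\theta = (D_\alpha, D_\beta, D_\delta, D_\lambda)^{T}$ be the TSM-score vector; then, by definition, the TSM-expected information for θ can be computed as $I_\theta = E\!\left(D_\theta D_\theta^{T}\right)$. Thus, the elements $I_{\theta_i\theta_j} = E\!\left(D_{\theta_i} D_{\theta_j}^{T}\right)$ of the matrix are derived, as shown in the Appendix, and it is observed that the matrix is not singular. In particular, the diagonal elements of the inverse Fisher information matrix can be used to obtain the standard errors of the parameter estimates. Under general regularity conditions, the asymptotic distribution of $\hat{\theta} - \theta$ is multivariate normal $N_4\!\left(0, I_\theta^{-1}\right)$. Consequently, the approximate multivariate normal distribution of $\hat{\theta}$ can be used to obtain two-sided confidence intervals for the parameters in θ. Furthermore, the likelihood ratio (LR) statistic can be used to compare the TSM distribution with its special model. Let us consider the partition $\theta = (\theta_i^{T}, \theta_r^{T})^{T}$, where $\theta_i^{T} = (\alpha, \beta, \delta)$ and $\theta_r^{T} = (\lambda)$. The LR statistic to test the null hypothesis $H_0: \lambda = 0$ versus the alternative hypothesis $H_a: \lambda \neq 0$ is given by $w = 2\{\ell(\hat{\theta}) - \ell(\tilde{\theta})\}$, where $\tilde{\theta}$ and $\hat{\theta}$ are the estimates under the restricted and unrestricted log-likelihood, respectively. Moreover, the sub-matrix of the full information matrix when λ = 0 coincides with the SM-information matrix. In this case, the columns of the matrix are linearly independent and none of the columns consists of zeros, so it also leads to a nonsingular information matrix.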
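The estimation and testing steps can be sketched as follows. This is a minimal illustration assuming the TSM log-density log(αδ/β) + (α − 1)log(x/β) − (δ + 1)log u + log(1 − λ + 2λu^(−δ)) with u = 1 + (x/β)^α; the names `tsm_loglik` and `lr_test` are hypothetical. The log-likelihood could be handed to any quasi-Newton optimizer (for example SciPy's `scipy.optimize.minimize` with `method="BFGS"`, which is not called here), and the LR p-value uses the identity P(χ²₁ > w) = erfc(√(w/2)), valid for one restricted parameter.

```python
import math


def tsm_loglik(data, alpha, beta, delta, lam):
    """Log-likelihood of TSM(alpha, beta, delta, lam) for positive observations."""
    if alpha <= 0 or beta <= 0 or delta <= 0 or not -1.0 <= lam <= 1.0:
        return -math.inf  # outside the parameter space
    ll = 0.0
    for x in data:
        u = 1.0 + (x / beta) ** alpha
        w = 1.0 - lam + 2.0 * lam * u ** (-delta)
        ll += (math.log(alpha * delta / beta)
               + (alpha - 1.0) * math.log(x / beta)
               - (delta + 1.0) * math.log(u)
               + math.log(w))
    return ll


def lr_test(loglik_unrestricted, loglik_restricted):
    """LR statistic w = 2 * (l_hat - l_tilde) and its chi-square(1) p-value."""
    w = 2.0 * (loglik_unrestricted - loglik_restricted)
    p_value = math.erfc(math.sqrt(max(w, 0.0) / 2.0))
    return w, p_value
```

In use, `loglik_restricted` would be the maximized log-likelihood of the SM model (λ fixed at 0) and `loglik_unrestricted` that of the TSM model; a large statistic then corresponds to a p-value close to 0 and to rejection of H0: λ = 0.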

Simulation Study
A simulation study has been carried out for two purposes: first, to investigate the precision and accuracy of the estimates; second, to explore the impact of the sample size on the estimation technique. Keeping this in mind, we present an empirical analysis based on simulated data; variates from the TSM distribution are easily generated through the derived result (10). The data are simulated using the R language, assuming different sample sizes, n = 25, 50, 100 and 200, and different values of each parameter. Each configuration is replicated 1000 times. For each estimate $\hat{\theta} = (\hat{\alpha}, \hat{\beta}, \hat{\delta}, \hat{\lambda})$, we computed the bias and the mean square error (MSE), respectively, as

$$\mathrm{Bias}(\hat{\theta}) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\theta}_i - \theta\right), \qquad \mathrm{MSE}(\hat{\theta}) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\theta}_i - \theta\right)^{2}.$$

The results are presented in Tables 1 and 2. These tables are self-explanatory. In general, accuracy and efficiency improve as the sample size increases.

Application
In this section, two real data sets are used to illustrate the proposed model: the first data set consists of the remission times of bladder cancer patients, and the second data set is the Pakistani annual household expenditure data. In order to compare the distribution models, we consider the AIC (Akaike information criterion), AICC (corrected Akaike information criterion), and BIC (Bayesian information criterion). The best fitted distribution for the data has the lowest values of $-2\ell$, AIC, AICC, and BIC. Herein

$$\mathrm{AIC} = -2\ell + 2k, \qquad \mathrm{AICC} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{BIC} = -2\ell + k\log n,$$

where k is the number of parameters in the statistical model, n is the sample size, and $\ell$ is the maximized value of the log-likelihood function of the model under consideration.
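The three information criteria are simple functions of the maximized log-likelihood, the number of parameters k and the sample size n. The sketch below uses their standard definitions (AIC = −2ℓ + 2k, AICC = AIC + 2k(k + 1)/(n − k − 1), BIC = −2ℓ + k log n); the function name `information_criteria` and the numbers in the usage note are hypothetical and are not taken from the paper's tables.

```python
import math


def information_criteria(loglik, k, n):
    """Return -2*loglik, AIC, AICC and BIC for a fitted model.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICC requires n > k + 1")
    minus2ll = -2.0 * loglik
    aic = minus2ll + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = minus2ll + k * math.log(n)
    return {"-2logL": minus2ll, "AIC": aic, "AICC": aicc, "BIC": bic}
```

For instance, `information_criteria(-100.0, 4, 128)` would evaluate the criteria for a four-parameter model fitted to 128 observations with an illustrative log-likelihood of −100; the model with the smaller values would be preferred.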
The first data set represents the remission times (in months) of a random sample of 128 bladder cancer patients. This data set is reported in Lee & Wang (2003). Table 3 shows the parameter estimates for each of the five fitted distributions for this data set, together with the AIC, BIC and AICC values. The values in Table 3 indicate that the TSM distribution is a strong competitor to the other distributions considered here. The variance-covariance matrix of the MLEs under the TSM distribution was computed for this data set; its diagonal elements give the variances of the MLEs of α, β, δ, and λ as $\mathrm{var}(\hat{\alpha}) = 0.03575$, $\mathrm{var}(\hat{\beta}) = 173.78411$, $\mathrm{var}(\hat{\delta}) = 1.69406$ and $\mathrm{var}(\hat{\lambda}) = 0.58391$. Therefore, the 95% confidence intervals for α, β, δ, and λ are [1.042173, 1.783376], [0, 43.0642], [0, 4.781056] and [−1, 1], respectively. The density plot over the empirical histogram, the cdfs of the fitted models over the empirical cdf, and the PP-plots are presented in Figure 5, Figure 6 and Figure 7, respectively, to compare the TSM and SM models. All the criteria show that the TSM model provides a good fit.
Household expenditure data are a good tool to measure the living standards and consumption patterns in a society. The best fitting distribution provides reliable knowledge about data patterns, which supports policies that lead society in the direction of development. In this study, we used the average monthly household expenditure data from the Household Integrated Economic Survey (HIES) for 2010-2011, conducted annually by the Pakistan Bureau of Statistics. HIES provides reliable data about the expenditure patterns of the people of Pakistan at the national level. The summary statistics of this data set are given in Table 4.
The MLE parameter estimates and the AIC, AICC, BIC and KS-test statistic values corresponding to the fitted models for the expenditure data set are presented in Table 5. The results indicate that the TSM distribution provides a better fit than the parent distribution. The likelihood ratio test statistic is also computed to test the hypothesis $H_0: \lambda = 0$ versus $H_1: \lambda \neq 0$, and we obtain the statistic 486.76 with a p-value of almost 0. Therefore, the test does not support the null hypothesis and leads us to conclude that the TSM model is the better fitted model. The standard deviations (sd) of the MLEs of α, β, δ and λ are $\mathrm{sd}(\hat{\alpha}) = 0.051936$, $\mathrm{sd}(\hat{\beta}) = 177.33528$, $\mathrm{sd}(\hat{\delta}) = 0.02436$ and $\mathrm{sd}(\hat{\lambda}) = 0.10997$, respectively; therefore, the 95% confidence intervals for α, β, δ and λ are [3.83943, 4.04302], [5188.248, 5883.402], [0.80352, 0.89902] and [0.30195, 0.73304], respectively. Figure 8 presents the fitted densities over the histogram of the data, and Figure 9 shows the fitted cdfs of the TSM and SM distributions over the empirical cdf of the expenditure data. The PP-plots for both distributions are given in Figure 10 for the observed expenditure data set. All three plots indicate that the TSM distribution models the data better than the SM distribution, so we suggest the proposed distribution for modeling data sets of this kind.
The first four moments and moment ratios are presented in Table 6. It is noticed that the mean (1st moment) is highest for the LH-moments and lowest for the LL-moments. The reason is that the LH-moments and LL-moments were introduced to represent the high and low parts of the data, respectively. The variation (2nd moment) is lowest for the TL-moments and highest for the L-moments, because the TL-moments trim the extreme values of the data whereas the L-moments are based on the full data. The value of the CV can be interpreted in the same way. It can also be observed that the Sk and Kr are high for the LH-moments, which is due to the trimming of the lower values from the data.

Conclusions
The proposed transmuted Singh-Maddala distribution is a generalization of the Singh-Maddala distribution. The main motivation for generalizing a standard distribution is to provide a more flexible distribution whose hazard function exhibits the behavior required in survival analysis. To show the flexibility of the new density, plots of the pdf and cdf have been presented. We have derived the moments and other basic properties of the proposed distribution. One interesting point is that it has an upside-down bathtub-shaped hazard function in addition to the other shapes. The densities of the lowest, highest, and rth order statistics, the joint density of order statistics, and the TL-moments have also been studied. The parameters are estimated by maximum likelihood using a Newton-Raphson approach. Here, five goodness of fit criteria are considered to select the most appropriate model. In terms of all of these criteria and the results for the real data sets, we found that the transmuted Singh-Maddala distribution is superior to its parent distribution. Finally, we hope that the proposed model will be useful for income distribution, actuarial, meteorological and survival data analysis.

Figure 5: Estimated densities and empirical histogram for the remission times dataset.

Figure 6: Empirical, fitted TSM, and SM cdf for the remission times dataset.

Figure 8: Estimated densities and empirical histogram for the household expenditures dataset.

Figure 10: P-P plots for fitted TSM and SM distribution for the household expenditures dataset.

Table 1: Average estimates for various choices of the parameters and sample size.

Table 2: Mean square errors for various choices of the parameters and sample size.

Table 3: Estimated parameters of the TSM, SM, Beta Pareto, Exponentiated Pareto and Pareto distributions for the remission times dataset.

Table 5: Estimated parameters of the TSM and SM distributions by MLE.

Table 6: First four L-, TL-, LL- and LH-moments for the household expenditure data.