MODELING PROPORTION OF SUCCESS IN HIGH SCHOOL LEAVING CERTIFICATE EXAMINATION: A COMPARATIVE STUDY OF INFLATED UNIT LINDLEY AND INFLATED BETA DISTRIBUTION

Abstract. In this article, we introduce the inflated unit Lindley distribution, considering zero-or/and-one inflation scenarios, and study its basic distributional and structural properties. Both proposed distributions are shown to be members of the exponential family of full rank. Different parameter estimation methods are discussed, along with supporting simulation studies to check their efficacy. The proportions of students passing the High School Leaving Certificate examination in schools across the state of Manipur, India, for the year 2020 are then modeled using the proposed distributions. The fits are compared with the inflated beta distribution to demonstrate their benefits.


INTRODUCTION
In the field of applied statistics, one of the most common issues researchers have to deal with is data arising in the form of fractions, rates or proportions, i.e., variables which assume values in the range (0, 1). However, in many cases the data contain zeroes and/or ones, i.e., the observations lie in the interval [0, 1), (0, 1] or [0, 1]. In such cases, continuous distributions such as the beta distribution, the Kumaraswamy distribution [8] or the unit Lindley distribution [10], all of which have support on (0, 1), are not suitable for modeling the data. We need probability models which are able to capture the probability mass at 0, at 1 or at both. Such distributions are obtained by mixing a continuous distribution having support on (0, 1) in one of two ways:
- with a degenerate distribution whose probability mass is concentrated at 0 or at 1 (for data arising in [0, 1) or (0, 1]);
- with the Bernoulli distribution, which assigns positive probability to both 0 and 1 (for data arising in [0, 1]) [11, 12].
The idea of modelling zero-inflated continuous data was first reported in [1]. There, the authors dealt with a continuous distribution which has a non-zero probability of assuming the value zero, that is, a non-zero probability mass at zero. Such distributions are mainly termed zero-inflated versions of the original distribution. They are obtained as a mixture of a degenerate probability mass at zero with the underlying distribution. The terminology of zero inflation is commonplace for count distributions; in the literature, the term zero-inflated is often replaced by zero-adjusted, zero-spiked, etc. Zero-inflated continuous variables occur in many areas of application (for details, see [13], [6], [9], [5] and references therein). Equally important is the situation where the observations from continuous data contain zeros and/or ones, that is, where there is one-inflation or zero-and-one-inflation. To model such phenomena, the probability mass at 0, at 1 or at both is included by mixing the continuous distribution with a degenerate distribution in the case of zero or one inflation. In the case of zero and one inflation, a Bernoulli distribution is used instead. The first such attempt is the inflated beta distribution proposed and studied by [12].
The main objective of this article is to consider the inflated versions of the recently introduced unit Lindley distribution and to briefly investigate their basic distributional properties. We consider data on [0, 1] with significant zero and one inflation. We analyze these data with the unit Lindley distribution and its inflated variants to establish the importance of the proposed models.
The High School Leaving Certificate (H.S.L.C.) examination is one of the most important milestones for individual schools in particular and for the educational scenario of a state in general. The result of this examination, declared in terms of the pass percentage of students, is an important indicator of the state of affairs at both the micro and the macro level. It is not surprising, therefore, that many state education departments offer incentives for good performance in this examination, while poor performance often brings penalties for the schools. It is thus relevant to investigate the statistical modelling of the school-wise pass proportion. In this article, we consider data on the results of the H.S.L.C. examination in the state of Manipur, India, for the academic session 2020. These data sets are available in the public domain ([3], [2] and [4]) and are easily accessible. In these data sets, pass percentages are given for all the schools under the Manipur secondary education board. The schools are classified into three types, namely Government, Government Aided and private schools. The data also classify the pass results into three divisions: first division, second division and third division.
In this paper, the zero-or/and-one inflated unit Lindley distribution is proposed. The paper is organized as follows. Section 2 introduces the zero-or-one inflated unit Lindley distribution and some of its distributional properties. In Section 3, the zero-and-one inflated unit Lindley distribution is presented and some of its properties are discussed. Section 4 presents parameter estimation for both distributions. Section 5 evaluates the performance of the proposed estimators through extensive simulation studies. Section 6 contains empirical applications of the proposed inflated unit Lindley distribution to four data sets, in comparison with the inflated beta distribution. The paper concludes with some final remarks in Section 7.

THE ZERO-OR-ONE INFLATED UNIT LINDLEY DISTRIBUTION
The unit Lindley distribution is a one-parameter continuous distribution with support on (0, 1). It is obtained from the Lindley distribution through the transformation Y = X/(1 + X), where X follows the Lindley distribution [10]. It has certain advantages over the commonly used two-parameter beta distribution on (0, 1) [7], such as a closed-form c.d.f., a closed-form quantile function and simple expressions for the moments. It also scores over the competing Kumaraswamy distribution, whose moments do not have comparably simple closed-form expressions. This distribution has also enabled the development of a new bounded regression model, which is a feasible alternative to the popular beta regression model. The unit Lindley distribution with parameter θ > 0 has the p.d.f.

$f(y;\theta) = \frac{\theta^{2}}{1+\theta}\,(1-y)^{-3}\exp\left(-\frac{\theta y}{1-y}\right), \qquad 0 < y < 1, \qquad (1)$

and the c.d.f.

$F(y;\theta) = 1 - \left[1 + \frac{\theta y}{(1+\theta)(1-y)}\right]\exp\left(-\frac{\theta y}{1-y}\right), \qquad 0 < y < 1.$
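A minimal numerical companion to (1) may help readers who want to experiment with the unit Lindley law. The sketch below uses plain Python (standard library only). The function names `ul_pdf`, `ul_cdf`, `ul_quantile` and `ul_mean` are our own hypothetical helpers, not part of any package. The c.d.f. is the Lindley c.d.f. evaluated at x = y/(1 − y). The quantile function does have a closed form (via the Lambert W function), but this sketch simply inverts the c.d.f. by bisection to stay within the standard library.

```python
import math


def ul_pdf(y: float, theta: float) -> float:
    """Unit Lindley density f(y; theta) from (1); zero outside (0, 1)."""
    if not 0.0 < y < 1.0:
        return 0.0
    # Work on the log scale so (1 - y)**-3 cannot overflow near y = 1.
    log_f = (2.0 * math.log(theta) - math.log1p(theta)
             - 3.0 * math.log1p(-y) - theta * y / (1.0 - y))
    return math.exp(log_f)


def ul_cdf(y: float, theta: float) -> float:
    """Closed-form c.d.f.: the Lindley c.d.f. evaluated at x = y / (1 - y)."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    x = y / (1.0 - y)
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)


def ul_quantile(u: float, theta: float, tol: float = 1e-12) -> float:
    """Invert the c.d.f. by bisection (a Lambert-W closed form also exists)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ul_cdf(mid, theta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ul_mean(theta: float) -> float:
    """E(Y) = 1 / (1 + theta) for the unit Lindley law."""
    return 1.0 / (1.0 + theta)


if __name__ == "__main__":
    theta = 2.0
    print(ul_cdf(0.5, theta))                       # 1 - (5/3) * exp(-2), about 0.7744
    print(ul_quantile(ul_cdf(0.3, theta), theta))   # recovers 0.3 up to the tolerance
```

Bisection is slower than the Lambert W formula but has no external dependency and always converges, because the c.d.f. is strictly increasing on (0, 1).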
Real-life data may include values such as zeroes and/or ones. In such cases, one needs to incorporate a discrete component into the continuous data-generating process so that zeroes and/or ones are observed with positive probability. We thus consider a mixture of two distributions: the continuous unit Lindley distribution on (0, 1) and a degenerate distribution with the entire probability mass concentrated at a known point c, where c = 0 or c = 1. We refer to such data as inflated (having a higher probability of occurrence) at one or both endpoints of the standard unit interval. The c.d.f. of the mixture distribution, known as the inflated unit Lindley distribution, is given by

$G(y;\alpha,\theta) = \alpha\, I_{[c,1]}(y) + (1-\alpha)\,F(y;\theta), \qquad (2)$

where I_A is the indicator function, which takes the value 1 if y ∈ A and 0 if y ∉ A. Here 0 < α < 1 is the mixture parameter and F(·; θ) is the c.d.f. of the unit Lindley distribution with parameter θ.
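Here, the r.v. Y follows the unit Lindley distribution with parameter θ with probability (1 − α) and the degenerate distribution at c with probability α. We write ULZI(α, θ) for the zero-inflated case (c = 0) and ULOI(α, θ) for the one-inflated case (c = 1). The p.d.f. of the inflated unit Lindley distribution corresponding to the c.d.f. in (2) is given by

$g(y;\alpha,\theta) = \begin{cases} \alpha, & y = c,\\ (1-\alpha)\,f(y;\theta), & 0 < y < 1, \end{cases} \qquad (3)$

where f(·; θ) is the unit Lindley density in (1). The parameter α ∈ (0, 1) is the probability mass at c, i.e., the probability of observing 0 (when c = 0) or 1 (when c = 1).

The rth raw moment of Y is

$E(Y^{r}) = \alpha c^{r} + (1-\alpha)\,\mu_r',$

where μ'_r is the rth raw moment of the unit Lindley distribution; in particular, μ'_1 = 1/(1 + θ). Since c^r = c for c ∈ {0, 1}, the mean and variance of Y are

$E(Y) = \alpha c + \frac{1-\alpha}{1+\theta}, \qquad \operatorname{Var}(Y) = \alpha c + (1-\alpha)\,\frac{1-\theta+\theta^{2}e^{\theta}\,\mathrm{Ei}(1,\theta)}{1+\theta} - \left[\alpha c + \frac{1-\alpha}{1+\theta}\right]^{2},$

where Ei(1, θ) represents the exponential integral function [10].

Figure 1 presents the inflated unit Lindley density for different values of θ and c, with α = 0.5. The parameter θ controls the shape of the density: its skewness increases as θ increases. In all the subplots of Figure 1, the vertical bar topped by a circle represents the point mass α = 0.5, i.e., P(Y = 0) for ULZI or P(Y = 1) for ULOI. Both the ULZI and ULOI distributions have the same functional shape on (0, 1). The only difference is the location of the mass point, which is 0 for ULZI and 1 for ULOI.

Result 1. The zero-or-one-inflated unit Lindley distribution in (3) is a two-parameter exponential family distribution of full rank.

Proof: The p.d.f. of the zero-or-one-inflated unit Lindley distribution given in (3) can be rewritten in the canonical form $g(y) = h(y)\exp\{\eta_1T_1(y)+\eta_2T_2(y)-B^{*}(\eta)\}$ with $T(y) = (T_1(y), T_2(y))$. Note that B*(η) is a real-valued function of (η_1, η_2) and h(y) is a positive real-valued function. The transformation from (α, θ) to (η_1, η_2) is one-to-one, and neither the T's nor the η's are linearly related. Hence the p.d.f. in (3) belongs to a two-parameter exponential family distribution of full rank.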
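The mean and variance of the ULZI and ULOI laws can be evaluated with a short sketch. Python's standard library has no exponential integral. The hypothetical helper `scaled_exp1` therefore computes e^θ Ei(1, θ) = e^θ E_1(θ) in two regimes: the textbook power series for θ ≤ 1, and a Lentz continued fraction for θ > 1. Scaling by e^θ keeps the product θ² e^θ Ei(1, θ) stable for large θ. The second raw moment used here is μ'_2 = (1 − θ + θ² e^θ Ei(1, θ))/(1 + θ). It follows from writing y = x/(1 + x) under the Lindley density and integrating. All function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def scaled_exp1(x: float, eps: float = 1e-15, max_iter: int = 1000) -> float:
    """Return exp(x) * E1(x) for x > 0; E1(x) is written Ei(1, x) in the text.

    Power series for x <= 1, modified Lentz continued fraction for x > 1.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, max_iter):
            term *= -x / k                      # term = (-x)^k / k!
            delta = -term / k
            total += delta
            if abs(delta) < eps * abs(total):
                break
        return math.exp(x) * total
    # exp(x) * E1(x) = 1/(x+1- 1/(x+3- 4/(x+5- 9/(x+7- ...))))
    tiny = 1e-300
    b = x + 1.0
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def ul_raw_moment(r: int, theta: float) -> float:
    """mu_r' of the unit Lindley law for r = 1, 2."""
    if r == 1:
        return 1.0 / (1.0 + theta)
    if r == 2:
        return (1.0 - theta + theta ** 2 * scaled_exp1(theta)) / (1.0 + theta)
    raise ValueError("this sketch only implements r = 1 and r = 2")


def inflated_moments(alpha: float, theta: float, c: int):
    """Mean and variance of ULZI (c = 0) or ULOI (c = 1).

    Uses E(Y^r) = alpha * c**r + (1 - alpha) * mu_r', where c**r = c for c in {0, 1}.
    """
    m1 = alpha * c + (1.0 - alpha) * ul_raw_moment(1, theta)
    m2 = alpha * c + (1.0 - alpha) * ul_raw_moment(2, theta)
    return m1, m2 - m1 ** 2


if __name__ == "__main__":
    for c in (0, 1):
        print(c, inflated_moments(alpha=0.5, theta=2.0, c=c))
```

For large θ, the numerator 1 − θ + θ² e^θ E_1(θ) involves some cancellation and loses a few digits. For moderate values such as those used in the simulation study (θ = 0.25 to 7), this is harmless.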

THE ZERO-AND-ONE-INFLATED UNIT LINDLEY DISTRIBUTION
The zero-or-one inflated unit Lindley distribution introduced in the previous section is suitable for data inflated at one of the two end points of the standard unit interval, but not at both. To model data arising in the interval [0, 1], we need a mixture of the unit Lindley distribution and the Bernoulli distribution, which assigns positive probability to the points 0 and 1. The c.d.f. of this mixture, known as the zero-and-one inflated unit Lindley distribution (ULZOI), is given by

$G(y;\alpha,p,\theta) = \alpha\,\mathrm{Ber}(y;p) + (1-\alpha)\,F(y;\theta), \qquad (6)$

where y ∈ [0, 1] and Ber(·; p) represents the c.d.f. of a Bernoulli r.v. with success probability p. F(·; θ) is the c.d.f. of the unit Lindley distribution with parameter θ, and α ∈ (0, 1) is the mixing parameter. We say that a r.v. Y assuming values in [0, 1] has a ULZOI distribution with parameters α, p and θ if its density function corresponding to the c.d.f. in (6) is given by

$g(y;\alpha,p,\theta) = \begin{cases} \alpha\, p^{y}(1-p)^{1-y}, & y \in \{0, 1\},\\ (1-\alpha)\,f(y;\theta), & 0 < y < 1. \end{cases} \qquad (7)$

Consequently, the mean and variance of Y are

$E(Y) = \alpha p + \frac{1-\alpha}{1+\theta}, \qquad \operatorname{Var}(Y) = \alpha p + (1-\alpha)\,\frac{1-\theta+\theta^{2}e^{\theta}\,\mathrm{Ei}(1,\theta)}{1+\theta} - \left[\alpha p + \frac{1-\alpha}{1+\theta}\right]^{2},$

where Ei(1, θ) represents the exponential integral function. Figure 2 presents the ULZOI density for different values of θ with α = 0.7 and p = 0.5. The plot shows that the skewness increases with θ. For higher values of θ, the density of the ULZOI distribution approaches a reverse sigmoid curve.

Result 2. The zero-and-one-inflated unit Lindley distribution in (7) is a three-parameter exponential family distribution of full rank.
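The construction of (6) as a mixture can be checked directly. The sketch below is plain Python with hypothetical helper names. It evaluates the ULZOI c.d.f. and the point masses P(Y = 0) = α(1 − p) and P(Y = 1) = αp that it implies. It also computes the mean αp + (1 − α)/(1 + θ). The sketch takes p = P(Y = 1 | Y ∈ {0, 1}), which is our reading of Ber(·; p).

```python
import math


def ul_cdf(y: float, theta: float) -> float:
    """Unit Lindley c.d.f. F(y; theta) for 0 < y < 1."""
    x = y / (1.0 - y)
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)


def ulzoi_cdf(y: float, alpha: float, p: float, theta: float) -> float:
    """Mixture c.d.f. (6): alpha * Ber(y; p) + (1 - alpha) * F(y; theta)."""
    if y < 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    bernoulli_part = 1.0 - p                      # Ber(y; p) for 0 <= y < 1
    continuous_part = 0.0 if y == 0.0 else ul_cdf(y, theta)
    return alpha * bernoulli_part + (1.0 - alpha) * continuous_part


def ulzoi_point_masses(alpha: float, p: float):
    """Return (P(Y = 0), P(Y = 1)) implied by (7)."""
    return alpha * (1.0 - p), alpha * p


def ulzoi_mean(alpha: float, p: float, theta: float) -> float:
    """E(Y) = alpha * p + (1 - alpha) / (1 + theta)."""
    return alpha * p + (1.0 - alpha) / (1.0 + theta)


if __name__ == "__main__":
    alpha, p, theta = 0.7, 0.5, 2.0               # alpha, p as in Figure 2; theta chosen arbitrarily
    print(ulzoi_cdf(0.0, alpha, p, theta))        # equals P(Y = 0) = 0.35
    print(ulzoi_point_masses(alpha, p))
    print(ulzoi_mean(alpha, p, theta))
```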
Proof: The p.d.f. of the zero-and-one-inflated unit Lindley distribution given in (7) can be rewritten in the canonical form $g(y) = h(y)\exp\{\eta_1T_1(y)+\eta_2T_2(y)+\eta_3T_3(y)-B^{*}(\eta)\}$. Note that B*(η) is a real-valued function of (η_1, η_2, η_3) and h(y) is a positive real-valued function. The transformation from (α, p, θ) to (η_1, η_2, η_3) is one-to-one. Also, neither the T's nor the η's are linearly related. Hence the p.d.f. in (7) belongs to a three-parameter exponential family distribution of full rank.
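One explicit choice of statistics and natural parameters that satisfies the conditions in the proof of Result 2 is sketched below. The symbols T_j, η_j, h and B* follow the proof. The particular forms, especially T_3 and h, are our own choice, and other equivalent parameterizations exist.

```latex
\begin{aligned}
T_1(y) &= I_{\{1\}}(y), \qquad T_2(y) = I_{\{0\}}(y), \qquad
T_3(y) = \bigl[1 - I_{\{0,1\}}(y)\bigr]\frac{y}{1-y},\\
h(y) &= \begin{cases} 1, & y \in \{0,1\},\\ (1-y)^{-3}, & 0<y<1,\end{cases}\\
\eta_1 &= \log\frac{\alpha p\,(1+\theta)}{(1-\alpha)\theta^{2}}, \qquad
\eta_2 = \log\frac{\alpha (1-p)(1+\theta)}{(1-\alpha)\theta^{2}}, \qquad
\eta_3 = -\theta,\\
g(y;\alpha,p,\theta) &= h(y)\exp\bigl\{\eta_1T_1(y)+\eta_2T_2(y)+\eta_3T_3(y)-B^{*}(\eta)\bigr\},\\
B^{*}(\eta) &= \log\Bigl(e^{\eta_1}+e^{\eta_2}+\frac{1-\eta_3}{\eta_3^{2}}\Bigr).
\end{aligned}
```

The inverse map is θ = −η_3, p = e^{η_1}/(e^{η_1} + e^{η_2}) and α = S/(S + (1 − η_3)/η_3²), with S = e^{η_1} + e^{η_2}. This shows the transformation is one-to-one. The two-parameter case (3) follows the same pattern. There, T_1(y) = I_{\{c\}}(y), T_2(y) = [1 − I_{\{c\}}(y)] y/(1 − y), η_1 = log[α(1 + θ)/((1 − α)θ²)], η_2 = −θ, and B*(η) = log(e^{η_1} + (1 − η_2)/η_2²).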

ESTIMATION AND RELATED
In this section, maximum likelihood estimation (MLE) of the parameters and the construction of the Fisher information matrix are considered.
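4.1. MLE: Zero or One inflated ULD. The likelihood function for ν = (α, θ) based on the random sample y = (y_1, y_2, ..., y_n) from iuL_c is given by

$L(\nu; y) = \prod_{i=1}^{n} g(y_i;\alpha,\theta) = L_1(\alpha; y)\,L_2(\theta; y), \qquad L_1(\alpha; y) = \alpha^{n_c}(1-\alpha)^{n-n_c}, \qquad L_2(\theta; y) = \prod_{i:\,y_i \neq c} f(y_i;\theta),$

where n_c is the number of observations equal to c. The likelihood function thus factorizes into two terms: L_1, which depends only on α, and L_2, which depends only on θ. The log-likelihood function of the zero-or-one inflated unit Lindley distribution is

$\ell(\nu) = n_c\log\alpha + (n-n_c)\log(1-\alpha) + (n-n_c)\left[2\log\theta - \log(1+\theta)\right] - 3\sum_{i:\,y_i\neq c}\log(1-y_i) - \theta\sum_{i:\,y_i\neq c}\frac{y_i}{1-y_i}.$

The score function is obtained by differentiating the log-likelihood function and is denoted by $U(\nu) = (U_\alpha, U_\theta)$, where

$U_\alpha = \frac{n_c}{\alpha} - \frac{n-n_c}{1-\alpha}, \qquad U_\theta = (n-n_c)\left[\frac{2}{\theta} - \frac{1}{1+\theta}\right] - \sum_{i:\,y_i\neq c}\frac{y_i}{1-y_i}.$

Setting $U(\nu) = 0$ gives $\hat\alpha = n_c/n$, the proportion of the n values that are equal to c. The maximum likelihood estimator of θ is

$\hat\theta = \frac{1-\bar t+\sqrt{\bar t^{2}+6\bar t+1}}{2\bar t}, \qquad \bar t = \frac{1}{n-n_c}\sum_{i:\,y_i\neq c}\frac{y_i}{1-y_i}.$

The Fisher information matrix for the zero-or-one inflated unit Lindley law is

$k(\nu) = \begin{pmatrix} \dfrac{n}{\alpha(1-\alpha)} & 0\\ 0 & \dfrac{n(1-\alpha)(\theta^{2}+4\theta+2)}{\theta^{2}(1+\theta)^{2}} \end{pmatrix}.$

Suppose $\hat\nu = (\hat\alpha, \hat\theta)$ denotes the m.l.e. of ν = (α, θ). In large samples, $\hat\nu$ is asymptotically normally distributed, i.e., $\hat\nu \xrightarrow{D} N_2(\nu, k(\nu)^{-1})$. Using this result, approximate confidence intervals for the parameters α and θ can be constructed. Let δ ∈ (0, 0.5). Then (1 − δ) × 100% asymptotic confidence intervals for α and θ are given respectively by

$\hat\alpha \pm z_{\delta/2}\sqrt{\frac{\hat\alpha(1-\hat\alpha)}{n}}, \qquad \hat\theta \pm z_{\delta/2}\,\frac{\hat\theta(1+\hat\theta)}{\sqrt{n(1-\hat\alpha)(\hat\theta^{2}+4\hat\theta+2)}},$

where $z_{\delta/2}$ is the upper δ/2 quantile of the standard normal distribution.

4.2. MLE: Zero and One inflated ULD. The p.d.f. in (7) can be written as

$g(y;\alpha,p,\theta) = \left[\alpha^{I_{\{0,1\}}(y)}(1-\alpha)^{1-I_{\{0,1\}}(y)}\right]\left[p^{y}(1-p)^{1-y}\right]^{I_{\{0,1\}}(y)}\left[f(y;\theta)\right]^{1-I_{\{0,1\}}(y)},$

where $I_{\{0,1\}}(y)$ is an indicator function such that $I_{\{0,1\}}(y) = 1$ if y ∈ {0, 1} and 0 if y ∈ (0, 1). Here, the first term depends only on α, the second only on p and the third only on θ. The likelihood function for ν = (α, p, θ) based on the random sample (y_1, y_2, ..., y_n) from the ulzoi(y; α, p, θ) distribution is given by

$L(\nu; y) = \alpha^{n_0+n_1}(1-\alpha)^{m}\; p^{n_1}(1-p)^{n_0}\prod_{i:\,0<y_i<1} f(y_i;\theta),$

where n_0 and n_1 are the numbers of observations equal to 0 and 1, respectively, and m = n − n_0 − n_1. The corresponding log-likelihood function is l(ν) = log L(ν; y) = l_1(α; y) + l_2(p; y) + l_3(θ; y), where

$l_1(\alpha; y) = (n_0+n_1)\log\alpha + m\log(1-\alpha), \qquad l_2(p; y) = n_1\log p + n_0\log(1-p),$

$l_3(\theta; y) = m\left[2\log\theta - \log(1+\theta)\right] - 3\sum_{i:\,0<y_i<1}\log(1-y_i) - \theta\sum_{i:\,0<y_i<1}\frac{y_i}{1-y_i}.$

The score function is denoted by $U(\nu) = (U_\alpha, U_p, U_\theta)$, where

$U_\alpha = \frac{n_0+n_1}{\alpha} - \frac{m}{1-\alpha}, \qquad U_p = \frac{n_1}{p} - \frac{n_0}{1-p}, \qquad U_\theta = m\left[\frac{2}{\theta} - \frac{1}{1+\theta}\right] - \sum_{i:\,0<y_i<1}\frac{y_i}{1-y_i}.$

Setting the score to zero gives $\hat\alpha = (n_0+n_1)/n$ and $\hat p = n_1/(n_0+n_1)$. $\hat\theta$ has the same form as in the single inflation case, with $\bar t$ computed from the m observations in (0, 1). The Fisher information matrix for the zero-and-one inflated unit Lindley law is

$k(\nu) = \operatorname{diag}\left(\frac{n}{\alpha(1-\alpha)},\ \frac{n\alpha}{p(1-p)},\ \frac{n(1-\alpha)(\theta^{2}+4\theta+2)}{\theta^{2}(1+\theta)^{2}}\right).$

Suppose $\hat\nu = (\hat\alpha, \hat p, \hat\theta)$. In large samples, $\hat\nu$ is asymptotically normally distributed, i.e., $\hat\nu \xrightarrow{D} N_3(\nu, k(\nu)^{-1})$, where k(ν) is the Fisher information matrix. Let δ ∈ (0, 0.5). Then (1 − δ) × 100% asymptotic confidence intervals for α, p and θ are given respectively by

$\hat\alpha \pm z_{\delta/2}\sqrt{\frac{\hat\alpha(1-\hat\alpha)}{n}}, \qquad \hat p \pm z_{\delta/2}\sqrt{\frac{\hat p(1-\hat p)}{n\hat\alpha}}, \qquad \hat\theta \pm z_{\delta/2}\,\frac{\hat\theta(1+\hat\theta)}{\sqrt{n(1-\hat\alpha)(\hat\theta^{2}+4\hat\theta+2)}},$

where the symbols have their usual meaning.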
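Because the likelihood factorizes, each maximum likelihood estimator in this section has a closed form, so the estimation step is short to code. The standard-library Python sketch below computes α̂, p̂ and θ̂ together with Wald intervals built from the diagonal Fisher information. The function names are hypothetical, and the normal quantile comes from `statistics.NormalDist`. The sketch assumes at least one observation lies strictly inside (0, 1), and, for ULZOI, at least one 0 or 1.

```python
import math
from statistics import NormalDist


def theta_mle(interior):
    """Closed-form MLE of theta from the observations lying strictly inside (0, 1)."""
    t_bar = sum(y / (1.0 - y) for y in interior) / len(interior)
    return (1.0 - t_bar + math.sqrt(t_bar ** 2 + 6.0 * t_bar + 1.0)) / (2.0 * t_bar)


def theta_se(theta, n, alpha):
    """Asymptotic s.e. of theta-hat: inverse square root of the theta entry of k(nu)."""
    return theta * (1.0 + theta) / math.sqrt(n * (1.0 - alpha) * (theta ** 2 + 4.0 * theta + 2.0))


def wald(est, se, z):
    """Return (estimate, lower limit, upper limit)."""
    return est, est - z * se, est + z * se


def fit_single_inflated(y, c, level=0.95):
    """ULZI (c = 0) or ULOI (c = 1) fit with Wald intervals."""
    n = len(y)
    interior = [v for v in y if v != c]
    a = 1.0 - len(interior) / n                   # alpha-hat = n_c / n
    th = theta_mle(interior)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # z_{delta/2} with delta = 1 - level
    return {"alpha": wald(a, math.sqrt(a * (1.0 - a) / n), z),
            "theta": wald(th, theta_se(th, n, a), z)}


def fit_zero_one_inflated(y, level=0.95):
    """ULZOI fit: closed-form MLEs of (alpha, p, theta) with Wald intervals."""
    n = len(y)
    n0 = sum(1 for v in y if v == 0)
    n1 = sum(1 for v in y if v == 1)
    interior = [v for v in y if 0 < v < 1]
    a = (n0 + n1) / n
    p = n1 / (n0 + n1)
    th = theta_mle(interior)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return {"alpha": wald(a, math.sqrt(a * (1.0 - a) / n), z),
            "p": wald(p, math.sqrt(p * (1.0 - p) / (n * a)), z),
            "theta": wald(th, theta_se(th, n, a), z)}


if __name__ == "__main__":
    # Hypothetical pass proportions, not the Manipur data.
    sample = [0.0, 1.0, 1.0, 0.62, 0.85, 0.47, 0.91, 0.73, 0.0, 1.0]
    print(fit_zero_one_inflated(sample))
```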

ASSESSMENT OF ESTIMATORS: SIMULATION STUDY
In this section, a Monte Carlo simulation study is conducted to evaluate the performance of the estimators of the parameters of the ULZI and ULZOI distributions. To simulate n observations from the ULZI(α, θ) distribution, the following algorithm was implemented.
Algorithm to generate from ULZI(α, θ).
Step 1. Generate U_i from the Uniform(0, 1) distribution.
Step 2. If U_i < α, then we assign y_i = 0.
Step 3. If U_i ≥ α, then we draw a random number from the Lindley(θ) distribution, say x_i, and assign y_i = x_i/(1 + x_i).
Observations are simulated from the ULZOI(α, θ, p) distribution using the following algorithm.
Algorithm to generate from ULZOI(α, θ, p).
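Step 1. Generate U_i from the Uniform(0, 1) distribution.
Step 2. If U_i ≤ α, then we assign y_i = 1 with probability p and y_i = 0 with probability 1 − p.
Step 3. Otherwise, we draw a random number from the Lindley(θ) distribution, say x_i, and assign y_i = x_i/(1 + x_i).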
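The two generation algorithms translate directly into code. The sketch below uses Python's `random` module. It draws Lindley(θ) variates through the standard mixture representation: an Exp(θ) draw with probability θ/(1 + θ), otherwise a Gamma(2, rate θ) draw. This representation is our choice, since the text does not say how the Lindley draws were produced. The Bernoulli(p) step for ULZOI reflects our reading of Step 2. The function names are hypothetical.

```python
import random


def draw_lindley(theta: float, rng: random.Random) -> float:
    """Lindley(theta) variate via its two-component mixture representation."""
    if rng.random() < theta / (1.0 + theta):
        return rng.expovariate(theta)               # Exp(rate = theta)
    return rng.gammavariate(2.0, 1.0 / theta)       # Gamma(shape 2, scale 1/theta)


def simulate_ulzi(n: int, alpha: float, theta: float, c: float = 0.0, seed=None):
    """ULZI (c = 0) or ULOI (c = 1) sample following Steps 1 to 3."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u = rng.random()                            # Step 1: U_i ~ Uniform(0, 1)
        if u < alpha:
            ys.append(c)                            # Step 2: point mass at c
        else:
            x = draw_lindley(theta, rng)            # Step 3: transform a Lindley draw
            ys.append(x / (1.0 + x))
    return ys


def simulate_ulzoi(n: int, alpha: float, theta: float, p: float, seed=None):
    """ULZOI(alpha, theta, p) sample; mass points come from a Bernoulli(p) draw."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        if rng.random() <= alpha:
            ys.append(1.0 if rng.random() < p else 0.0)
        else:
            x = draw_lindley(theta, rng)
            ys.append(x / (1.0 + x))
    return ys


if __name__ == "__main__":
    sample = simulate_ulzoi(10, alpha=0.7, theta=2.0, p=0.5, seed=2020)
    print([round(v, 3) for v in sample])
```

Passing an explicit seed to a private `random.Random` instance keeps each simulated data set reproducible without touching the global generator.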

The performance of the estimators was evaluated in terms of the estimated bias and root mean square error (RMSE). Table 1 and Table 2 present the simulation results for the ULZI and ULZOI distributions, respectively. Table 1 shows that, for the ULZI distribution, the bias-corrected maximum likelihood estimate (BCMLE) of θ substantially reduces bias compared with the conditional mean estimate (CME) and the maximum likelihood estimate (MLE). The RMSEs of the MLE and BCMLE are smaller than that of the CME. Both the bias and the RMSE decrease as n increases. For moderately large and large sample sizes, the bias of the estimate of α is negative, and its RMSE also decreases as n increases.
Table 2 shows that, for the ULZOI distribution with small and moderately large sample sizes, the bias of the BCMLE of θ is negative. The RMSEs of the MLE and BCMLE coincide and are smaller than that of the CME. The RMSEs of the estimates of θ, α and p decrease as n increases. Further, the bias of the estimate of p is negative for moderately large and large sample sizes. Table 3 shows that, in the single inflation case, the accuracy of the empirical confidence intervals for both parameters increases with the sample size. The coverage probabilities for both α and θ are above 85% for all sample sizes when θ is large (θ = 7). In contrast, the coverage probability for θ is low (less than 80%) when θ is small (θ = 0.25).
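The bias, RMSE and coverage summaries reported in Tables 1 to 3 can be reproduced in spirit with a small Monte Carlo loop. The self-contained sketch below does this for the MLEs of the ULZI(α, θ) model only, using Wald intervals from the Fisher information. The replication count, seed, parameter settings and function names are hypothetical. The BCMLE and CME estimators are not implemented, because their definitions are not given in this section.

```python
import math
import random
from statistics import NormalDist


def simulate_ulzi(n, alpha, theta, rng):
    """ULZI(alpha, theta) sample: zero w.p. alpha, else X / (1 + X) with X ~ Lindley(theta)."""
    ys = []
    for _ in range(n):
        if rng.random() < alpha:
            ys.append(0.0)
            continue
        if rng.random() < theta / (1.0 + theta):
            x = rng.expovariate(theta)
        else:
            x = rng.gammavariate(2.0, 1.0 / theta)
        ys.append(x / (1.0 + x))
    return ys


def mle_ulzi(ys):
    """Closed-form MLEs: alpha-hat = share of zeros, theta-hat from the interior values."""
    interior = [v for v in ys if v != 0.0]
    a = 1.0 - len(interior) / len(ys)
    t_bar = sum(v / (1.0 - v) for v in interior) / len(interior)
    th = (1.0 - t_bar + math.sqrt(t_bar ** 2 + 6.0 * t_bar + 1.0)) / (2.0 * t_bar)
    return a, th


def bias_rmse(estimates, true_value):
    """Empirical bias and root mean square error over the replications."""
    bias = sum(e - true_value for e in estimates) / len(estimates)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))
    return bias, rmse


def monte_carlo(n, alpha, theta, reps=1000, level=0.95, seed=2020):
    """Return {parameter: (bias, RMSE, empirical coverage of the Wald interval)}."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    est_a, est_t = [], []
    cover_a = cover_t = 0
    for _ in range(reps):
        a, th = mle_ulzi(simulate_ulzi(n, alpha, theta, rng))
        se_a = math.sqrt(a * (1.0 - a) / n)
        se_t = th * (1.0 + th) / math.sqrt(n * (1.0 - a) * (th ** 2 + 4.0 * th + 2.0))
        cover_a += abs(a - alpha) <= z * se_a       # interval contains the true alpha
        cover_t += abs(th - theta) <= z * se_t      # interval contains the true theta
        est_a.append(a)
        est_t.append(th)
    return {"alpha": (*bias_rmse(est_a, alpha), cover_a / reps),
            "theta": (*bias_rmse(est_t, theta), cover_t / reps)}


if __name__ == "__main__":
    for n in (25, 50, 100, 200):
        print(n, monte_carlo(n, alpha=0.3, theta=7.0))
```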

APPLICATIONS
In this section, we consider real-life data arising from the High School Leaving Certificate Examination results of the state of Manipur, India, for the year 2020 ([3], [2] and [4]). In both cases, the data include observations pertaining to government schools and 67 observations pertaining to aided schools. We compare our models with the well-known inflated beta distribution [12]. Results are summarized in Table 5. The Kolmogorov-Smirnov (K-S) test statistic is used to compare the fit of each of the distributions to the data sets. Table 6 displays the maximum likelihood estimates and standard errors of the parameters of the fitted models. Table 7 shows that, for each of the four data sets, the K-S test statistic for the ULZOI distribution is smaller than that for the zero-and-one inflated beta (ZOIB) distribution. The zero-and-one inflated unit Lindley distribution therefore models these data sets better than the ZOIB distribution. Figures 7 to 10 likewise show that the ULZOI distribution fits each of the four data sets better than the ZOIB distribution, as the K-S statistic values confirm. The use of zero-and-one inflated models is warranted here, as all four data sets contain both zeroes and ones.
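The fitted ULZOI and ZOIB c.d.f.s jump at 0 and 1, and the school-level data contain many ties at those points. The K-S statistic therefore has to compare both the left limits and the values of the two c.d.f.s at every distinct observation. The sketch below (standard-library Python, hypothetical names) computes D = sup_y |F_n(y) − G(y)| for the ULZOI model. It plugs in parameters from the closed-form MLEs and uses made-up data rather than the Manipur results.

```python
import math
from itertools import groupby


def ul_cdf(y, theta):
    """Unit Lindley c.d.f. for 0 < y < 1."""
    x = y / (1.0 - y)
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)


def ulzoi_cdf(y, alpha, p, theta, left=False):
    """ULZOI c.d.f. G(y), or its left limit G(y-) when left=True."""
    if y < 0.0 or (y == 0.0 and left):
        return 0.0
    if y > 1.0 or (y == 1.0 and not left):
        return 1.0
    mass0 = alpha * (1.0 - p)                 # P(Y = 0)
    if y == 0.0:
        return mass0
    if y == 1.0:
        return mass0 + (1.0 - alpha)          # G(1-) = 1 - P(Y = 1)
    return mass0 + (1.0 - alpha) * ul_cdf(y, theta)


def ks_statistic(data, cdf):
    """sup_y |F_n(y) - G(y)| for a c.d.f. that may jump; ties are allowed.

    F_n is flat between distinct observations and G is non-decreasing, so the
    supremum is attained at a left limit or a value at some observed point.
    """
    xs = sorted(data)
    n = len(xs)
    d, seen = 0.0, 0
    for value, group in groupby(xs):
        d = max(d, abs(seen / n - cdf(value, True)))    # F_n(v-) vs G(v-)
        seen += sum(1 for _ in group)
        d = max(d, abs(seen / n - cdf(value, False)))   # F_n(v) vs G(v)
    return d


if __name__ == "__main__":
    # Hypothetical school-wise pass proportions, NOT the Manipur data.
    data = [0.0, 1.0, 1.0, 1.0, 0.62, 0.85, 0.47, 0.91, 0.73, 0.0, 1.0, 0.96]
    n0, n1 = data.count(0.0), data.count(1.0)
    interior = [v for v in data if 0.0 < v < 1.0]
    alpha_hat = (n0 + n1) / len(data)
    p_hat = n1 / (n0 + n1)
    t_bar = sum(v / (1.0 - v) for v in interior) / len(interior)
    theta_hat = (1.0 - t_bar + math.sqrt(t_bar ** 2 + 6.0 * t_bar + 1.0)) / (2.0 * t_bar)
    d = ks_statistic(data, lambda v, left: ulzoi_cdf(v, alpha_hat, p_hat, theta_hat, left))
    print(f"K-S statistic for the fitted ULZOI model: {d:.4f}")
```

Only the statistic is computed. With estimated parameters and a c.d.f. that has jumps, standard K-S p-values would not be valid, so the values are best read as relative measures of fit.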

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.