OPTIMUM STRATIFICATION FOR STRATIFIED PPSWR SAMPLING DESIGN UNDER A MODEL-BASED ALLOCATION

In this article we consider the problem of finding optimum points of stratification (OPS) for a model-based allocation under the stratified probability proportional to size with replacement (PPSWR) sampling design. The OPS are based on an auxiliary variable that is highly correlated with the study variable. The model-based allocation used here already exists in the literature: its authors obtained it under a superpopulation model in PPSWR sampling. We therefore use the same sampling design and superpopulation model in finding the OPS, and we derive equations that give the OPS. Of all the stratification methods so far developed under PPSWR sampling for stratifying heteroscedastic populations, the proposed method turns out to be the most concise and the easiest to use, because the OPS it produces are the geometric means of the means of consecutive strata. Although this method is easy to apply, its equations are implicit. We therefore also obtain alternative methods of finding approximately optimum points of stratification (AOPS). The efficiencies of all the proposed methods of stratification are examined using two randomly chosen live populations. The proposed methods are found to be efficient and suitable for practical applications.


INTRODUCTION
One of the problems in stratified sampling is the determination of optimum points of stratification (OPS) by minimizing the variance of an estimate of a population parameter. For stratification based on the estimation variable itself, the problem of determining OPS was first worked out mainly by Dalenius [1], Dalenius and Gurney [2], and Hayashi, Maruyama and Ishida [3]. In practice, information on the study variable is not available, but information on some highly correlated auxiliary variable is. Cochran [4] demonstrated that when such an auxiliary variable is available, a superpopulation model can be constructed. Under this model the finite population under consideration is regarded as a random sample from an infinite superpopulation, and so it shares the characteristics of that superpopulation. Based on this, Hanurav [5] and Rao [6] were the first to use auxiliary information to allocate the sample size to strata. Taga [7] and Singh and Sukhatme [8] were the first to deal with the problem of optimum stratification based on the auxiliary variable. Many workers derived the minimal equations for obtaining OPS under various allocations based on the auxiliary variable. Among them, Taga's [7] work was considered the most noteworthy at that time. Singh and Sukhatme [8] obtained equations giving OPS and approximately OPS for the construction of strata under the Tschuprow [9]-Neyman [10] optimum allocation (TNOA) and under proportional allocation. Under simple random sampling, Singh and Prakash [11] considered the problem of optimum stratification on an auxiliary variable for equal allocation. Under the same sampling design, Yadava and Singh [12] also obtained equations giving OPS and approximately OPS for allocation proportional to stratum totals.
Singh and Sukhatme [13] considered the problem of optimum stratification on an auxiliary variable for TNOA when the samples from different strata are selected with probability proportional to size with replacement (PPSWR). Singh [14] considered the same problem for proportional and equal allocations. For samples selected with PPSWR within each stratum, Gupt and Rao [15] obtained the allocation
$$ n_h \propto \sqrt{N_h (N_h - 1)\,\bar{X}_h} \tag{1} $$
where $n_h$ and $N_h$ are the sample and stratum sizes and $\bar{X}_h$ is the mean of $x$ in the $h$-th stratum. They obtained this allocation under the following superpopulation model:
$$ E(y_i \mid x_i) = \beta x_i, \qquad V(y_i \mid x_i) = \sigma^2 x_i, \qquad C(y_i, y_j \mid x_i, x_j) = 0 \quad (i \neq j), \tag{2} $$
where $\beta$ and $\sigma^2$ are superpopulation parameters with $\sigma^2 > 0$, and $E$, $V$ and $C$ denote the conditional expectation, variance and covariance given $x$, respectively. Gupt [16,17] considered the problem of sample size allocation in simple random sampling with and without replacement. He used a more general superpopulation model defined by $E(y \mid x) = \alpha + \beta x$ and $V(y \mid x) = \sigma^2 x^g$, allowing for an intra-stratum correlation coefficient, and obtained several allocations. Gupt and Ahamed [18] considered the problem of optimum stratification for the generalized auxiliary variable proportional allocation (GAVPA) obtained by Gupt [16,17].
A finite population drawn from a superpopulation is considered to be large. We therefore assume $N_h \approx N_h - 1$ for all $h = 1, 2, \ldots, L$, where $L$ is the number of strata into which the population is divided. Hence the allocation (1) reduces to
$$ n_h \propto N_h \sqrt{\bar{X}_h}. \tag{3} $$
In this article we deal with the problem of obtaining OPS and approximately optimum points of stratification (AOPS) for the allocation (3). Sub-section 3.1 gives numerical illustrations using two randomly chosen live populations, together with a discussion of the efficiencies of the proposed methods of stratification. Section 4 gives the conclusion.
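As a quick illustration of allocation (3), $n_h \propto N_h\sqrt{\bar{X}_h}$, the sketch below computes real-valued stratum sample sizes from hypothetical stratum sizes and stratum means of $x$. The function name `ppswr_allocation` and the numbers are ours, not taken from the paper. Setting `large_population=False` uses the exact form $n_h \propto \sqrt{N_h(N_h-1)\bar{X}_h}$ instead. For moderately large strata, this shows how little the approximation $N_h \approx N_h - 1$ changes the allocation.

```python
import math


def ppswr_allocation(n, sizes, x_means, large_population=True):
    """Real-valued stratum sample sizes n_h summing to n.

    large_population=True : n_h proportional to N_h * sqrt(Xbar_h)           (3)
    large_population=False: n_h proportional to sqrt(N_h (N_h - 1) Xbar_h)   (1)
    """
    if large_population:
        weights = [N * math.sqrt(m) for N, m in zip(sizes, x_means)]
    else:
        weights = [math.sqrt(N * (N - 1) * m) for N, m in zip(sizes, x_means)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: sizes N_h and means of x within each stratum.
approx = ppswr_allocation(100, [200, 100], [4.0, 16.0])
exact = ppswr_allocation(100, [200, 100], [4.0, 16.0], large_population=False)
print(approx)  # [50.0, 50.0]
print(exact)
```

In practice the $n_h$ must still be rounded to integers, with each kept at least 1. This sketch does not do that.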

Equations Giving Optimum Points of Stratification.
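Consider a finite population of size $N$ divided into $L$ strata of sizes $N_h$, $h = 1, 2, \ldots, L$. The allocation (3) can then be written as
$$ n_h = n\,\frac{N_h \sqrt{\bar{X}_h}}{\sum_{k=1}^{L} N_k \sqrt{\bar{X}_k}}. \tag{4} $$
Let $p_{hi} = x_{hi}/X_h$ be the selection probability of the $i$-th unit in the $h$-th stratum, where $X_h = \sum_{i=1}^{N_h} x_{hi} = N_h \bar{X}_h$, and let $Y_h = \sum_{i=1}^{N_h} y_{hi}$ be the stratum total of $y$. Gupt and Rao [15] obtained the following expression under the superpopulation model (2):
$$ E\Big[\sum_{i=1}^{N_h} \frac{y_{hi}^2}{p_{hi}} - Y_h^2 \,\Big|\, \mathbf{X}\Big] = \sigma^2 (N_h - 1) X_h = \sigma^2 N_h (N_h - 1)\,\bar{X}_h. \tag{5} $$
When $N_h \approx N_h - 1$ for all $h$, we get
$$ E\Big[\sum_{i=1}^{N_h} \frac{y_{hi}^2}{p_{hi}} - Y_h^2 \,\Big|\, \mathbf{X}\Big] \approx \sigma^2 N_h^2\,\bar{X}_h. \tag{6} $$
For the stratified PPSWR sampling design, the variance of the estimator $\hat{Y}$ of the population total is given by
$$ V(\hat{Y}) = \sum_{h=1}^{L} \frac{1}{n_h}\Big(\sum_{i=1}^{N_h} \frac{y_{hi}^2}{p_{hi}} - Y_h^2\Big). \tag{7} $$
Taking the conditional expectation of (7), we have
$$ E[V(\hat{Y}) \mid \mathbf{X}] = \sum_{h=1}^{L} \frac{1}{n_h}\, E\Big[\sum_{i=1}^{N_h} \frac{y_{hi}^2}{p_{hi}} - Y_h^2 \,\Big|\, \mathbf{X}\Big], \tag{8} $$
where $\mathbf{X}$ is the matrix of $x$ values in the population.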
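The stratum-level model expectation $\sigma^2 (N_h - 1) X_h$ can be checked by simulation. Here $X_h$ is the stratum total of $x$ and $p_{hi} = x_{hi}/X_h$. The sketch below repeatedly draws $y$ from model (2) for one small hypothetical stratum and averages $\sum_i y_{hi}^2/p_{hi} - Y_h^2$. Model (2) fixes only the first two moments, so the Gaussian errors are an assumption made for the simulation. The function name is hypothetical.

```python
import math
import random


def mc_stratum_term(x, beta, sigma2, reps, seed=1):
    """Monte Carlo average of sum_i y_i^2 / p_i - Y^2 for one stratum.

    Assumption: y_i ~ Normal(beta * x_i, sigma2 * x_i), drawn independently,
    which matches the mean, variance and zero covariance of model (2).
    """
    rng = random.Random(seed)
    X = sum(x)
    p = [xi / X for xi in x]  # PPS selection probabilities p_i = x_i / X_h
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(beta * xi, math.sqrt(sigma2 * xi)) for xi in x]
        Y = sum(y)
        total += sum(yi * yi / pi for yi, pi in zip(y, p)) - Y * Y
    return total / reps


x = [1.0, 2.0, 3.0, 4.0, 5.0]               # hypothetical stratum: N_h = 5, X_h = 15
simulated = mc_stratum_term(x, beta=2.0, sigma2=1.0, reps=20000)
theoretical = 1.0 * (len(x) - 1) * sum(x)   # sigma^2 (N_h - 1) X_h = 60
print(simulated, theoretical)
```

The $\beta$ terms cancel exactly inside $\sum_i y_i^2/p_i - Y^2$, so changing `beta` leaves the simulated average unchanged up to rounding.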
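By using (4), (6) and (8), we obtain
$$ E[V(\hat{Y}) \mid \mathbf{X}] = \frac{\sigma^2}{n}\Big(\sum_{h=1}^{L} N_h \sqrt{\bar{X}_h}\Big)^2. \tag{9} $$
To express (9) in terms of the points of stratification, let $f(x)$ be the density of $x$ on $[a, b]$. Let $x_h$ be the demarcation point between the $h$-th and $(h+1)$-th strata, with $x_0 = a$ and $x_L = b$. For a large population, $N_h \approx N W_h$ and $\bar{X}_h \approx \mu_h$, where
$$ W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx, \qquad \mu_h = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x f(x)\,dx. $$
Since (9) is then proportional to $\big(\sum_h W_h\sqrt{\mu_h}\big)^2$, we minimize $E[V(\hat{Y}) \mid \mathbf{X}]$ by differentiating $\sum_h W_h\sqrt{\mu_h}$ partially with respect to $x_h$ and equating the result to zero:
$$ \frac{\partial}{\partial x_h}\sum_{k=1}^{L} W_k\sqrt{\mu_k} = 0, \qquad h = 1, 2, \ldots, L-1. \tag{10} $$
When differentiating partially with respect to $x_h$, all terms vanish except the $h$-th and $(h+1)$-th, so (10) reduces to
$$ \frac{\partial}{\partial x_h}\big(W_h\sqrt{\mu_h}\big) + \frac{\partial}{\partial x_h}\big(W_{h+1}\sqrt{\mu_{h+1}}\big) = 0. \tag{11} $$
From the definitions of $W_h$ and $\mu_h$, we obtain
$$ \frac{\partial W_h}{\partial x_h} = f(x_h), \quad \frac{\partial W_{h+1}}{\partial x_h} = -f(x_h), \quad \frac{\partial \mu_h}{\partial x_h} = \frac{f(x_h)(x_h - \mu_h)}{W_h}, \quad \frac{\partial \mu_{h+1}}{\partial x_h} = -\frac{f(x_h)(x_h - \mu_{h+1})}{W_{h+1}}. \tag{12} $$
Taking the first term on the LHS of (11) and using (12), we get
$$ \frac{\partial}{\partial x_h}\big(W_h\sqrt{\mu_h}\big) = \frac{f(x_h)(x_h + \mu_h)}{2\sqrt{\mu_h}}. \tag{13} $$
Similarly, taking the second term of (11) and again using (12), we get
$$ \frac{\partial}{\partial x_h}\big(W_{h+1}\sqrt{\mu_{h+1}}\big) = -\frac{f(x_h)(x_h + \mu_{h+1})}{2\sqrt{\mu_{h+1}}}. \tag{14} $$
Substituting (13) and (14) in (11) and cancelling $f(x_h) > 0$, we obtain
$$ \frac{x_h + \mu_h}{\sqrt{\mu_h}} = \frac{x_h + \mu_{h+1}}{\sqrt{\mu_{h+1}}}, \tag{15} $$
whose solutions are
$$ x_h = \sqrt{\mu_h\,\mu_{h+1}}, \qquad h = 1, 2, \ldots, L-1. \tag{16} $$
Equations (16) give the OPS on the auxiliary variable $x$ for estimating the study variable $y$.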
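The derivative of the objective $\Phi = \sum_h W_h\sqrt{\mu_h}$ can be checked numerically. Squaring $\Phi$ gives a quantity proportional to the expected variance (9). Here $W_h$ and $\mu_h$ are the weight and mean of $x$ in stratum $h$ under a density $f(x)$. The claim under test is that $\partial\Phi/\partial x_h = f(x_h)\{(x_h+\mu_h)/(2\sqrt{\mu_h}) - (x_h+\mu_{h+1})/(2\sqrt{\mu_{h+1}})\}$, the sum of the two single-stratum derivatives. The sketch computes $W_h$ and $\mu_h$ by Simpson's rule for a hypothetical density $f(x) = x/40$ on $[1, 9]$. It then compares this formula with a central finite difference. All names are ours.

```python
import math


def simpson(g, lo, hi, m=200):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even)."""
    step = (hi - lo) / m
    s = g(lo) + g(hi)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(lo + i * step)
    return s * step / 3


def stratum_stats(f, lo, hi):
    """Return (W_h, mu_h): stratum weight and mean of x under density f."""
    w = simpson(f, lo, hi)
    return w, simpson(lambda x: x * f(x), lo, hi) / w


def phi(f, points):
    """Phi = sum_h W_h * sqrt(mu_h) for points = [a, x_1, ..., x_{L-1}, b]."""
    total = 0.0
    for lo, hi in zip(points[:-1], points[1:]):
        w, mu = stratum_stats(f, lo, hi)
        total += w * math.sqrt(mu)
    return total


def analytic_derivative(f, points, h):
    """dPhi/dx_h as the sum of the h-th and (h+1)-th stratum derivatives."""
    xh = points[h]
    _, mu_h = stratum_stats(f, points[h - 1], xh)
    _, mu_next = stratum_stats(f, xh, points[h + 1])
    return f(xh) * ((xh + mu_h) / (2 * math.sqrt(mu_h))
                    - (xh + mu_next) / (2 * math.sqrt(mu_next)))


def numeric_derivative(f, points, h, eps=1e-5):
    """Central finite difference of Phi with respect to x_h."""
    up, down = list(points), list(points)
    up[h] += eps
    down[h] -= eps
    return (phi(f, up) - phi(f, down)) / (2 * eps)


f = lambda x: x / 40           # hypothetical density on [1, 9]
pts = [1.0, 3.0, 6.0, 9.0]     # arbitrary, non-optimal boundaries for L = 3
for h in (1, 2):
    print(h, analytic_derivative(f, pts, h), numeric_derivative(f, pts, h))
```

The two columns agree closely. The derivative vanishes only where $(x_h+\mu_h)/\sqrt{\mu_h} = (x_h+\mu_{h+1})/\sqrt{\mu_{h+1}}$.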
The stratification method (16) gives each OPS as the geometric mean of the means of two consecutive strata. By contrast, under the stratified simple random sampling design with proportional allocation, the OPS are the arithmetic means of the means of consecutive strata (Murthy [19]).
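A purely arithmetic illustration, holding the two strata means fixed at the hypothetical values $\mu_h = 4$ and $\mu_{h+1} = 9$:
$$ x_h^{\mathrm{PPSWR}} = \sqrt{4 \times 9} = 6, \qquad x_h^{\mathrm{SRS,\,prop}} = \frac{4 + 9}{2} = 6.5 . $$
By the AM-GM inequality, $\sqrt{\mu_h\mu_{h+1}} \le (\mu_h + \mu_{h+1})/2$, with equality only when $\mu_h = \mu_{h+1}$. So for a given pair of strata means, the boundary from the geometric-mean rule never lies above the arithmetic-mean boundary. In an actual stratification the means themselves move with the boundaries, so this is only a local comparison.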
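Hence we obtain the following theorem.
Theorem. Under the superpopulation model (2) and the allocation (3) in the stratified PPSWR sampling design, the OPS $x_h$, $h = 1, 2, \ldots, L-1$, are given by $x_h = \sqrt{\mu_h \mu_{h+1}}$, where $\mu_h$ is the mean of $x$ in the $h$-th stratum and $L$ is the number of strata.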
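The geometric-mean equations $x_h = \sqrt{\mu_h\mu_{h+1}}$ are implicit, because each $\mu_h$ depends on the boundaries. A natural numerical approach is therefore a fixed-point iteration. Start from equal intervals, compute the strata means, reset each boundary to the geometric mean of its two neighbouring strata means, and repeat. The sketch below does this, assuming a known density $f(x)$ on $[a, b]$ with $a > 0$. The iteration is our illustration, not the authors' procedure, and its convergence is not guaranteed in general. The function names are hypothetical.

```python
import math


def simpson(g, lo, hi, m=200):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even)."""
    step = (hi - lo) / m
    s = g(lo) + g(hi)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(lo + i * step)
    return s * step / 3


def stratum_mean(f, lo, hi):
    """mu_h: mean of x over [lo, hi] under the density f."""
    return simpson(lambda x: x * f(x), lo, hi) / simpson(f, lo, hi)


def ops_fixed_point(f, a, b, L, tol=1e-10, max_iter=10_000):
    """Iterate x_h <- sqrt(mu_h * mu_{h+1}), h = 1..L-1, from equal intervals.

    Requires a > 0 so that every stratum mean is positive.
    Returns the boundaries [a, x_1, ..., x_{L-1}, b].
    """
    pts = [a + (b - a) * h / L for h in range(L + 1)]
    for _ in range(max_iter):
        mus = [stratum_mean(f, lo, hi) for lo, hi in zip(pts[:-1], pts[1:])]
        new = [a] + [math.sqrt(mus[h] * mus[h + 1]) for h in range(L - 1)] + [b]
        if max(abs(u - v) for u, v in zip(new, pts)) < tol:
            return new
        pts = new
    raise RuntimeError("fixed-point iteration did not converge")


# Uniform density on [1, 9] with L = 2: the equation reduces to 3x^2 - 10x - 9 = 0.
pts2 = ops_fixed_point(lambda x: 1 / 8, 1.0, 9.0, 2)
print(pts2[1], (10 + math.sqrt(208)) / 6)
```

For the uniform case, $\mu_1 = (1+x)/2$ and $\mu_2 = (x+9)/2$. The closed-form root $(10+\sqrt{208})/6 \approx 4.07$ is therefore available to check the iteration against. Each new boundary lies between the two strata means that define it, so the ordering of the points is preserved from one iteration to the next.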

Derivation of Alternative Methods of Finding Approximately Optimum Points of Stratification.
Although equations (16) give the OPS, they involve population parameters, namely the strata means $\mu_h$, which are themselves functions of the OPS. Because of this implicit nature, exact solutions are difficult to find. We therefore obtain alternative methods for finding approximate solutions to equations (16). We follow the technique of Singh and Sukhatme [8], who used Ekman's [20] identity to obtain a series expansion of the conditional mean.
We assume that the function $f(x)$ possesses derivatives of all required orders for all $x$ in the range $(a, b)$. This yields series expansions in which the function and its derivatives are evaluated at $x = x_h$, for $x \in [x_h, x_{h+1}]$. In these expansions, $k_i = x_{h+1} - x_h$ with $i = h + 1$.
When the number of strata is large, the stratum widths $k_h$ are very small. Higher powers of $k_h$ in the expansions can then be neglected.
Putting (18) and (19) in equations (15) gives (20). Using (21) in (20) then leads to the approximating relations (23) and (24). Either relation can be used to obtain the AOPS corresponding to (16). To find the AOPS $x_h'$ in $(a, b)$ by using (23) for a given number of strata, take equal intervals on the cumulative of $\sqrt[3]{f(x)/\sqrt{x^{3}}}$. The resulting points are approximately optimum points of stratification of the variable $x$. In Sub-section 2.2 we have proved that the approximation methods (23) and (24) are equivalent, so either may be used. Here we use (24) in our illustration.
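The cumulative rule used with (23) can be sketched as follows. The sketch assumes that the integrand is $\sqrt[3]{f(x)/\sqrt{x^{3}}}$, that is $f(x)^{1/3}/\sqrt{x}$. This form is our reading of (23) and should be checked against it. The density must be known, or fitted as in the numerical illustration. The helper names `cum_rule_points` and `aops_integrand` are hypothetical.

```python
import bisect
import math


def cum_rule_points(g, a, b, L, m=10_000):
    """Points dividing [a, b] into L equal intervals on the cumulative of g.

    The cumulative integral of g is built with the trapezoidal rule on m
    sub-intervals and inverted by linear interpolation. Requires g > 0.
    """
    step = (b - a) / m
    xs = [a + i * step for i in range(m + 1)]
    cum = [0.0]
    for x0, x1 in zip(xs[:-1], xs[1:]):
        cum.append(cum[-1] + 0.5 * (g(x0) + g(x1)) * step)
    points = []
    for h in range(1, L):
        target = cum[-1] * h / L
        i = bisect.bisect_left(cum, target)  # first index with cum[i] >= target
        frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
        points.append(xs[i - 1] + frac * step)
    return points


def aops_integrand(f):
    """Assumed integrand of (23): cube root of f(x) / sqrt(x**3), for x > 0."""
    return lambda x: (f(x) / math.sqrt(x ** 3)) ** (1 / 3)


# Hypothetical density f(x) = x**1.5 / 12.4 on [1, 4]. The integrand is then
# constant, so for L = 3 the AOPS should be (very close to) 2 and 3.
f = lambda x: x ** 1.5 / 12.4
aops = cum_rule_points(aops_integrand(f), 1.0, 4.0, 3)
print(aops)
```

Multiplying the density by a constant does not change the rule. An unnormalized fitted density can therefore be passed as `f`.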

Numerical Illustration for the Proposed Methods
We fit a linear least-squares regression of the study variable $y$ on the auxiliary variable $x$ in population P1. This gives a coefficient of determination $R^2 = 0.963817$, an intercept of 8.384 and a slope $\hat{\beta} = 1.157$. Since $R^2$ is close to 1 (correlation coefficient $r = \sqrt{R^2} \approx 0.982$), $x$ is highly correlated with $y$ in P1. Applying the same technique to population P2 gives $R^2 = 0.966078$, an intercept of 5.950 and a slope $\hat{\beta} = 5.874$ ($r \approx 0.983$). So $x$ is also highly correlated with $y$ in P2.
To stratify population P1 using (24), we must determine the probability density function (PDF) followed by the auxiliary variable $x$ of P1. To find a suitable PDF, we use the data on $x$, dividing each value by 100. With the fitdistrplus package in R, we fit a number of known PDFs to these data. The fitting methods are maximum likelihood estimation (MLE), moment matching estimation (MME) and quantile matching estimation (QME), applied in turn. To stratify population P2 using (24), we proceed in the same manner. Coincidentally, the log-normal PDF also fits the data on $x$ in P2 best, so $x$ in P2 likewise follows a log-normal distribution.
The results corresponding to the two methods for the two live populations P1 and P2 are given in Table 1 and Table 2, respectively.
From Table 1, both methods are more efficient than equal interval stratification for every number of strata. Equations (16) are more efficient for $L = 5$ and much more efficient for $L = 2, 3, 4$ and $6$. The approximation method (24) is more efficient for $L = 2$ and much more efficient for $L = 3, 4, 5$ and $6$. Equations (16) also perform better than the alternative method (24) for $L = 2, 3$ and $6$, and equally well for $L = 4$. For $L = 5$, however, the alternative method (24) performs better than equations (16).
From Table 2, both equations (16) and the alternative method (24) are more efficient than equal interval stratification for $L = 2, 3, 4, 5$ and $6$. Equations (16) perform slightly better than the alternative approximation method (24) for $L = 2, 3, 4$ and $6$. For $L = 5$, however, method (24) performs better than equations (16).
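The comparison with equal interval stratification can be expressed as a relative efficiency. The sketch assumes, as in (9), that the expected variance is proportional to $(\sum_h W_h\sqrt{\mu_h})^2$. Here $W_h$ and $\mu_h$ are the weight and mean of $x$ in stratum $h$ under a density $f(x)$. The relative efficiency of a set of boundaries over the equal-interval boundaries is then the ratio of these squared sums, in percent. The uniform density below is a hypothetical toy case, not data from P1 or P2, and the function names are ours.

```python
import math


def simpson(g, lo, hi, m=200):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even)."""
    step = (hi - lo) / m
    s = g(lo) + g(hi)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(lo + i * step)
    return s * step / 3


def phi(f, points):
    """sum_h W_h * sqrt(mu_h) for boundaries points = [a, x_1, ..., x_{L-1}, b]."""
    total = 0.0
    for lo, hi in zip(points[:-1], points[1:]):
        w = simpson(f, lo, hi)
        mu = simpson(lambda x: x * f(x), lo, hi) / w
        total += w * math.sqrt(mu)
    return total


def relative_efficiency(f, proposed, reference):
    """Percentage RE of `proposed` over `reference` boundaries, assuming the
    expected variance is proportional to phi**2."""
    return 100.0 * (phi(f, reference) / phi(f, proposed)) ** 2


f = lambda x: 1 / 8                           # hypothetical uniform density on [1, 9]
ops = [1.0, (10 + math.sqrt(208)) / 6, 9.0]   # geometric-mean OPS for L = 2
equal = [1.0, 5.0, 9.0]                       # equal interval stratification
re = relative_efficiency(f, ops, equal)
print(round(re, 2))
```

Even in this toy case, the geometric-mean boundary gives a relative efficiency slightly above 100%. Larger gains, like those reported for the live populations, require skewed densities.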
Overall, we find that the equations (16) as well as the approximation method (24) perform well in stratifying the live populations.

CONCLUSION
When illustrated on randomly chosen live populations, the proposed methods of stratification under the PPSWR design stratify populations with high efficiency. Equations (16) for obtaining OPS and the alternative method (24) for obtaining AOPS are more efficient than equal interval stratification for every number of strata considered ($L = 2$ to $6$). Both methods are therefore justified, empirically as well as analytically, for stratifying populations efficiently. The distinctive feature of the proposed stratification method (16) is that it gives the OPS as the geometric means of the means of consecutive strata. Both methods are user friendly, although method (16) is implicit. They are convenient in practice for stratifying heteroscedastic populations effectively when samples are selected from strata with probability proportional to size with replacement. Of all the stratification methods under the PPSWR sampling design available in the literature so far, the proposed methods are the easiest and shortest to apply, and they also have good efficiencies.