Estimation of finite population mean under PPS in presence of maximum and minimum values

1 Statistics Department, Faculty of Science, King Abdul Aziz University, Jeddah 21551, Saudi Arabia
2 Department of Mathematics, Université de Caen, LMNO, Campus II, Science 3, 14032 Caen, France
3 Department of Statistics, Government College University Lahore, Lahore, Pakistan
4 Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
5 Department of Statistics, Govt. S.A Postgraduate College Dera Nawab Sahib, Bahawalpur, Punjab 63100, Pakistan
6 The Higher Institute of Commercial Sciences, Al Mahalla Al Kubra, 31951, Algarbia, Egypt
7 Department of Mathematics, College of Science and Humanities in Al-Kharj, Prince Sattam bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
8 Department of Mathematics, Faculty of Science, Mansoura University, Mansoura 35516, Egypt


Introduction
In the survey sampling literature, researchers have sought estimators of population quantities such as the mean, total and median that have desirable statistical properties. This requires a representative part of the population. When the population of interest is homogeneous, units can be selected by simple random sampling. When sampling units vary considerably in size, units may instead be selected with probability proportional to size (PPS). Probability proportional to size sampling, usually called PPS sampling, is an unequal probability sampling scheme. Under it, the probability of selecting each unit in the population is proportional to the value of an auxiliary variable. Let Y be the study variable and X an auxiliary variable. For instance, suppose that we want to estimate the population of the villages in a particular district. We would then choose as the auxiliary variable some variable for which information is available. Two candidates are (a) the size of each village in the district (correlation with the study variable = 0.75, say), and (b) the number of households in each village in the district (correlation with the study variable = 0.95, say). We would select the auxiliary variable with the higher correlation with the study variable. Thus the variable in (b) would be more useful as the auxiliary variable when selecting a sample with probability proportional to size with replacement.
PPS sampling is natural in many other settings. In surveys of household income, households may differ in size. In a medical survey of the number of patients, health units may vary in size. In an agricultural context, fields may vary in size. Villages with larger geographical areas are likely to have larger populations and larger areas under food crops (see [21]). In surveys of socio-economic characteristics, which are likely to be related to population size, the number of persons in a previous period may be taken as the measure of size (see [14]).
Auxiliary information can be used at the selection stage, at the estimation stage, or at both stages. [15] proposed alternative estimators in PPS sampling for multiple characteristics. [22] proposed a regression type estimator under PPS sampling. Readers are also referred to [2,16,18,20] and the references cited therein. Auxiliary information can increase the precision of estimators of unknown population parameters such as the population mean, variance and correlation coefficient. Common estimators that use auxiliary information perform well when the auxiliary variable is highly correlated with the study variable. When this correlation is high, the rank of the auxiliary variable is also correlated with the study variable.
In this article, we propose ratio, product and regression type estimators of the finite population mean under the PPS sampling scheme, using maximum and minimum values. Consider a finite population $U = \{1, 2, \ldots, N\}$. Let $y_i$ and $(x_i, z_i)$ be the values of the study variable $y$ and the auxiliary variables $(x, z)$, respectively, for the $i$th unit. Let $r_{xi}$ denote the rank of $x_i$, and let $R_x$ denote the rank variable corresponding to $x$.
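The claim that the rank of a correlated auxiliary variable is itself correlated with the study variable can be checked on toy data. Below is a minimal Python sketch that does this. The helper names `ranks` and `pearson` and the data are hypothetical. Tied values receive their average rank, which is one common convention and is assumed here rather than taken from the paper.

```python
from statistics import fmean


def ranks(values):
    """Ranks 1..N (smallest value gets rank 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


x = [12, 5, 30, 18, 7, 25, 40, 9]      # hypothetical auxiliary values
y = [0.5 * xi ** 1.5 for xi in x]      # hypothetical study values, increasing in x
r_x = ranks(x)
print(pearson(y, x), pearson(y, r_x))
```

Because $y$ here is a monotone function of $x$, both correlations are high. In real data the rank correlation can be somewhat lower or higher than the correlation with $x$ itself.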
Let a sample of size $n$ be selected with probability proportional to size $z_i$ with replacement (PPSWR). That is, on each draw unit $i$ is selected with probability $p_i = z_i / \sum_{j=1}^{N} z_j$. Suppose that $u_i$ and $v_i$ are the study and auxiliary variables for PPS sampling, and let $v^*_i$ denote the rank of $v$.

Many real data sets contain unexpectedly large ($y_{\max}$) or small ($y_{\min}$) values. Estimates of the finite population mean are sensitive to such values. When $y_{\max}$ or $y_{\min}$ falls in the sample, the population mean may be overestimated or underestimated. To handle this situation, [17] suggested the following unbiased estimator of the finite population mean using the maximum and minimum values:
$$\bar y_s = \begin{cases} \bar y + c, & \text{if the sample contains } y_{\min} \text{ but not } y_{\max},\\ \bar y - c, & \text{if the sample contains } y_{\max} \text{ but not } y_{\min},\\ \bar y, & \text{for all other samples.} \end{cases}$$
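The estimator of [17] can be checked by brute force on a small population. This minimal sketch makes the following assumptions:

* sampling is simple random sampling without replacement, the design for which [17] gives the estimator;
* $y_{\min}$ and $y_{\max}$ are identified by unit;
* every possible sample is enumerated.

The function names and toy data are hypothetical. The closed-form variance coded below is derived directly from the case definition. With $c = (y_{\max} - y_{\min})/(2n)$ the estimator stays unbiased while its variance falls below that of $\bar y$.

```python
from itertools import combinations
from statistics import fmean, variance


def sarndal_mean(y, sample, c):
    """ybar + c if the sample has the min unit but not the max unit,
    ybar - c in the reverse case, and ybar otherwise."""
    i_min = min(range(len(y)), key=y.__getitem__)
    i_max = max(range(len(y)), key=y.__getitem__)
    ybar = fmean(y[i] for i in sample)
    has_min, has_max = i_min in sample, i_max in sample
    if has_min and not has_max:
        return ybar + c
    if has_max and not has_min:
        return ybar - c
    return ybar


def exact_mean_and_variance(y, n, c):
    """Exact design mean and variance over all C(N, n) equally likely SRSWOR samples."""
    est = [sarndal_mean(y, s, c) for s in combinations(range(len(y)), n)]
    m = fmean(est)
    return m, fmean((e - m) ** 2 for e in est)


def closed_form_variance(y, n, c):
    """(1-f) S_y^2 / n - 2c(1-f)(y_max - y_min - n c) / (N - 1), with f = n/N."""
    N, f = len(y), n / len(y)
    d = max(y) - min(y)
    # statistics.variance uses divisor N - 1, i.e. it returns S_y^2
    return (1 - f) * variance(y) / n - 2 * c * (1 - f) * (d - n * c) / (N - 1)


y = [3, 7, 8, 12, 15, 40]          # hypothetical population with one large value
n = 3
c_opt = (max(y) - min(y)) / (2 * n)
print(exact_mean_and_variance(y, n, c_opt), closed_form_variance(y, n, c_opt))
```

Enumeration is feasible only for tiny $N$, but it checks the closed form exactly rather than by simulation.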
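Under simple random sampling without replacement with sampling fraction $f = n/N$, the variance of $\bar y_s$ is
$$V(\bar y_s) = \frac{1-f}{n}S_y^2 - \frac{2c(1-f)}{N-1}\left(y_{\max} - y_{\min} - nc\right),$$
where $S_y^2$ is the population variance of $y$ and $c$ is a constant. The optimum value of $c$ is
$$c_{\mathrm{opt}} = \frac{y_{\max} - y_{\min}}{2n},$$
and the minimum variance of $\bar y_s$ is
$$V_{\min}(\bar y_s) = \frac{1-f}{n}S_y^2 - \frac{(1-f)(y_{\max} - y_{\min})^2}{2n(N-1)}.$$
This is always smaller than the variance of $\bar y$, namely $V(\bar y) = (1-f)S_y^2/n$.

The usual ratio estimator under probability proportional to size (PPS) sampling is
$$\bar y_{R(pps)} = \bar u\,\frac{\bar X}{\bar v}.$$
Here $\bar u$ and $\bar v$ are the sample means of $u$ and $v$, $\bar X$ is the known population mean of the auxiliary variable, and $\bar Y$ is the population mean of $y$. Let $\theta = 1/n$. Let $C_u$ and $C_v$ denote the coefficients of variation of $u$ and $v$ under PPSWR, and let $\rho_{uv}$ denote the correlation between $u$ and $v$. To the first order of approximation, the bias and MSE of $\bar y_{R(pps)}$ are
$$\mathrm{Bias}(\bar y_{R(pps)}) \approx \theta\bar Y\left(C_v^2 - \rho_{uv}C_uC_v\right)$$
and
$$\mathrm{MSE}(\bar y_{R(pps)}) \approx \theta\bar Y^2\left(C_u^2 + C_v^2 - 2\rho_{uv}C_uC_v\right).$$
The usual product estimator under PPS is
$$\bar y_{P(pps)} = \bar u\,\frac{\bar v}{\bar X}.$$
To the first order of approximation, its bias and MSE are
$$\mathrm{Bias}(\bar y_{P(pps)}) \approx \theta\bar Y\rho_{uv}C_uC_v$$
and
$$\mathrm{MSE}(\bar y_{P(pps)}) \approx \theta\bar Y^2\left(C_u^2 + C_v^2 + 2\rho_{uv}C_uC_v\right).$$
The usual regression estimator of the unknown population mean under the PPS sampling scheme is
$$\bar y_{lr(pps)} = \bar u + b\left(\bar X - \bar v\right),$$
where $b$ is the sample regression coefficient of $u$ on $v$. To the first order of approximation, the MSE of $\bar y_{lr(pps)}$ is
$$\mathrm{MSE}(\bar y_{lr(pps)}) \approx \theta\bar Y^2C_u^2\left(1 - \rho_{uv}^2\right).$$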
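A minimal Python sketch of the usual PPSWR estimators follows. It assumes the common transformation $u_i = y_i/(Np_i)$ and $v_i = x_i/(Np_i)$ with $p_i = z_i/\sum_j z_j$. The text does not spell this transformation out, so it is an assumption here. All function names and data are hypothetical. Under this transformation a single draw of $u$ has expectation $\bar Y$, which `expected_u` computes exactly.

```python
import random
from statistics import fmean


def selection_probs(z):
    """p_i = z_i / sum(z): per-draw selection probabilities under PPSWR."""
    total = sum(z)
    return [zi / total for zi in z]


def ppswr_sample(z, n, seed=0):
    """Draw n unit indices with replacement; unit i has probability p_i on each draw."""
    rng = random.Random(seed)
    return rng.choices(range(len(z)), weights=z, k=n)


def expected_u(y, z):
    """Exact E(u) for one draw: sum_i p_i * y_i / (N p_i), i.e. the mean of y."""
    N, p = len(y), selection_probs(z)
    return sum(pi * yi / (N * pi) for pi, yi in zip(p, y))


def usual_pps_estimators(y, x, z, sample):
    """Mean-per-draw, ratio, product and regression estimators of the mean of y.

    ASSUMPTION: u_i = y_i / (N p_i), v_i = x_i / (N p_i); X_bar is the known
    population mean of x. Requires at least two draws.
    """
    N, p = len(y), selection_probs(z)
    u = [y[i] / (N * p[i]) for i in sample]
    v = [x[i] / (N * p[i]) for i in sample]
    n = len(sample)
    u_bar, v_bar, X_bar = fmean(u), fmean(v), fmean(x)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (n - 1)
    s_vv = sum((b - v_bar) ** 2 for b in v) / (n - 1)
    # Sample regression coefficient; undefined if all sampled v are equal, so fall back to 0.
    b = s_uv / s_vv if s_vv > 0 else 0.0
    return {
        "mean": u_bar,
        "ratio": u_bar * X_bar / v_bar,
        "product": u_bar * v_bar / X_bar,
        "regression": u_bar + b * (X_bar - v_bar),
    }


if __name__ == "__main__":
    y = [25, 41, 60, 78, 103, 118]   # hypothetical study values
    x = [10, 20, 30, 40, 50, 60]     # hypothetical auxiliary values
    z = [3, 1, 4, 1, 5, 9]           # hypothetical size measures
    s = ppswr_sample(z, n=4)
    print(s, usual_pps_estimators(y, x, z, s))
```

When $y$ is exactly proportional to $x$, the ratio and regression estimators reproduce $\bar Y$ for any sample with at least two distinct values of $v$.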

Suggested estimators
Following the lines of [17], we propose ratio, product and regression type estimators under PPS sampling. They use the auxiliary variable that has the stronger correlation with the study variable, together with its rank. We also incorporate the minimum and maximum values of the study and auxiliary variables.

First situation
When the correlation between the study variable and the auxiliary variable is positive, a large value of the auxiliary variable tends to accompany a large value of the study variable. Likewise, a small value of the auxiliary variable tends to accompany a small value of the study variable. To use this information, we suggest a ratio type estimator $\hat Y_{R(pps)}$ and a regression type estimator, both based on the auxiliary variable and its rank. In both, the sample means are adjusted as
$$(\bar u_{c_{11}}, \bar v_{c_{21}}, \bar v^*_{c_{31}}) = \begin{cases} (\bar u + c_1,\ \bar v + c_2,\ \bar v^* + c_3), & \text{if the sample contains } u_{\min} \text{ and } (v_{\min}, v^*_{\min}),\\ (\bar u - c_1,\ \bar v - c_2,\ \bar v^* - c_3), & \text{if the sample contains } u_{\max} \text{ and } (v_{\max}, v^*_{\max}),\\ (\bar u,\ \bar v,\ \bar v^*), & \text{for all other samples.} \end{cases}$$
In other words, whatever sample is drawn, the proposed ratio estimator gives good results in terms of MSE compared with the usual ratio and product estimators that use two auxiliary variables.
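A minimal sketch of the sample-dependent adjustment follows, covering both correlation situations. It assumes this sign pattern:

* For positive correlation, all three means are shifted up when the sample contains $u_{\min}$ together with $(v_{\min}, v^*_{\min})$, and down when it contains $u_{\max}$ with $(v_{\max}, v^*_{\max})$.
* For negative correlation (the second situation), $u_{\min}$ pairs with $(v_{\max}, v^*_{\max})$ and the auxiliary shifts take the opposite sign.

Two further choices are assumptions of this sketch, not statements from the paper. A sample meeting both conditions is left unadjusted, and extremes are identified by unit. The function name is hypothetical.

```python
from statistics import fmean


def adjusted_means(u, v, v_star, sample, c1, c2, c3, positive=True):
    """Shifted sample means (u_bar, v_bar, v*_bar) for the two correlation situations.

    u, v, v_star: population values of the transformed study, auxiliary and rank
    variables; sample: indices of drawn units (repeats allowed under PPSWR).
    ASSUMPTION: a sample meeting both the 'up' and 'down' conditions is not adjusted.
    """
    def argmin(a):
        return min(range(len(a)), key=a.__getitem__)

    def argmax(a):
        return max(range(len(a)), key=a.__getitem__)

    drawn = set(sample)
    u_bar = fmean(u[i] for i in sample)
    v_bar = fmean(v[i] for i in sample)
    w_bar = fmean(v_star[i] for i in sample)
    has_u_min, has_u_max = argmin(u) in drawn, argmax(u) in drawn
    has_v_low = argmin(v) in drawn and argmin(v_star) in drawn
    has_v_high = argmax(v) in drawn and argmax(v_star) in drawn
    if positive:   # u_min pairs with (v_min, v*_min); all shifts share one sign
        up, down, sign = has_u_min and has_v_low, has_u_max and has_v_high, (1, 1, 1)
    else:          # u_min pairs with (v_max, v*_max); auxiliary shifts flip sign
        up, down, sign = has_u_min and has_v_high, has_u_max and has_v_low, (1, -1, -1)
    k = 1 if (up and not down) else -1 if (down and not up) else 0
    return (u_bar + k * sign[0] * c1,
            v_bar + k * sign[1] * c2,
            w_bar + k * sign[2] * c3)
```

For example, in a positively correlated population, a sample containing the smallest unit on all three variables returns each mean raised by its constant. A sample containing neither extreme returns the plain means.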

Second situation
When the correlation between the study variable and the auxiliary variable is negative, a large value of the auxiliary variable tends to accompany a small value of the study variable. Likewise, a small value of the auxiliary variable tends to accompany a large value of the study variable. In this situation, we propose a product type estimator and a regression type estimator based on the auxiliary variable $x$ and its rank. In both, the sample means are adjusted as
$$(\bar u_{c_{12}}, \bar v_{c_{22}}, \bar v^*_{c_{32}}) = \begin{cases} (\bar u + c_1,\ \bar v - c_2,\ \bar v^* - c_3), & \text{if the sample contains } u_{\min} \text{ and } (v_{\max}, v^*_{\max}),\\ (\bar u - c_1,\ \bar v + c_2,\ \bar v^* + c_3), & \text{if the sample contains } u_{\max} \text{ and } (v_{\min}, v^*_{\min}),\\ (\bar u,\ \bar v,\ \bar v^*), & \text{for all other samples.} \end{cases}$$
Here $c_1$, $c_2$ and $c_3$ are unknown constants. To obtain the biases and mean squared errors, we use relative error terms $e_0$, $e_1$ and $e_2$ such that $E(e_0) = E(e_1) = E(e_2) = 0$. Expressing $\hat Y_{R(pps)}$ in terms of these relative errors leads to (2.8). Taking expectations on both sides of (2.8), we obtain the bias of $\hat Y_{R(pps)}$.

Squaring (2.8) and taking expectations, we obtain (2.9), the first-order MSE of $\hat Y_{R(pps)}$.
Differentiating (2.9) with respect to $c_1$, $c_2$ and $c_3$ and setting the derivatives equal to zero gives the optimum values of these constants. Substituting these optimum values into (2.9), we obtain the minimum MSE of $\hat Y_{R(pps)}$. The bias and minimum MSE of the proposed product estimator under the PPS sampling scheme are obtained in the same way.
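The optimization step can be summarized generically. For illustration, assume that the first-order MSE in (2.9) can be written as a quadratic form in $\mathbf c = (c_1, c_2, c_3)^\top$ with a symmetric positive definite matrix $M$. The symbols $A$, $\mathbf b$ and $M$ are placeholders, not quantities defined in the paper. Setting the gradient to zero then gives

```latex
\mathrm{MSE}(\mathbf c) = A - 2\,\mathbf b^\top \mathbf c + \mathbf c^\top M \mathbf c,
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial \mathbf c} = -2\,\mathbf b + 2M\mathbf c = \mathbf 0
\;\Longrightarrow\;
\mathbf c_{\mathrm{opt}} = M^{-1}\mathbf b,
\qquad
\mathrm{MSE}_{\min} = A - \mathbf b^\top M^{-1}\mathbf b .
```

With a single constant, the variance of the estimator of [17] under simple random sampling without replacement ($f = n/N$) has exactly this form. Here $A = (1-f)S_y^2/n$, $b = (1-f)(y_{\max}-y_{\min})/(N-1)$ and $M = 2n(1-f)/(N-1)$. This gives $c_{\mathrm{opt}} = (y_{\max}-y_{\min})/(2n)$ and a variance reduction of $(1-f)(y_{\max}-y_{\min})^2/\{2n(N-1)\}$.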
Similarly, the minimum MSE of $\hat Y_{lr1P(pps)}$ in the case of negative correlation is obtained as (2.13). A general form of the MSE covering both positive and negative correlation between the study and auxiliary variables is given in (2.14).

Comparison of estimators
In this section, we compare the proposed estimators with the usual ratio, product and regression estimators under the PPS sampling scheme.

Condition (iii)
Comparing (1.9) and (2.14) gives condition (iii). We observe that the proposed estimators perform better than the existing estimators if conditions (i)-(iii) above are satisfied.
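Conditions of this kind come from differences of first-order MSEs. The sketch below applies the same kind of comparison to the usual PPS estimators, not the proposed ones. It assumes the standard first-order expressions, written with $\theta = 1/n$, coefficients of variation $C_u$, $C_v$ and correlation $\rho$ of the transformed variables. The function name is hypothetical. Algebraically, ratio MSE minus regression MSE equals $\theta\bar Y^2(C_v - \rho C_u)^2$, and product MSE minus regression MSE equals $\theta\bar Y^2(C_v + \rho C_u)^2$. Both are non-negative, so the usual regression estimator is never worse to this order.

```python
def usual_first_order_mses(Y_bar, C_u, C_v, rho, n):
    """First-order MSEs of the usual ratio, product and regression estimators.

    ASSUMPTION: standard expressions with theta = 1/n, coefficients of variation
    C_u, C_v and correlation rho of the PPS-transformed variables u and v.
    """
    theta = 1.0 / n
    base = theta * Y_bar ** 2
    ratio = base * (C_u ** 2 + C_v ** 2 - 2 * rho * C_u * C_v)
    product = base * (C_u ** 2 + C_v ** 2 + 2 * rho * C_u * C_v)
    regression = base * C_u ** 2 * (1 - rho ** 2)
    return ratio, product, regression


# Hypothetical parameter values: the regression MSE is the smallest for every rho.
for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(rho, usual_first_order_mses(50.0, 0.4, 0.3, rho, 10))
```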

Empirical study
We use four real data sets, described below, for the numerical comparisons. Population 1 [Source: [14]]: y = wheat cultivation in the region during 1964. The results are given in Table 1. Table 1 shows that, for all four populations, the MSEs of the proposed estimators are smaller than those of the corresponding existing estimators. The proposed regression estimator performs best among all the estimators considered.

Conclusions
In this paper, we have proposed ratio, product and regression type estimators for use in the presence of maximum and minimum values. They use the auxiliary variable X and its rank under the PPS sampling scheme. The bias and mean squared error of the proposed estimators were derived to the first order of approximation. The theoretical and numerical investigations show that the proposed estimators are more efficient than the corresponding existing estimators for all the populations considered here. The suggested regression estimator performs best in terms of MSE. We therefore recommend the proposed estimators over the existing estimators considered in this paper for estimating the finite population mean in new surveys that use probability proportional to size sampling.