A new population mean estimator under non-response cases

Using the information in x, we consider a new estimator that uses an exponential function to estimate the unknown population mean of y in the presence of non-responding units. Non-response is divided into two categories, Case I and Case II. In Case I, non-response occurs only on y, whereas in Case II it occurs on both x and y. The proposed estimators are derived for both scenarios. The necessary comparisons are made theoretically, and a numerical study on education data is carried out. We conclude that, in both non-response schemes, the proposed estimators can be preferred both in theory and in applications such as the education data.


Introduction
In sample surveys, efficient estimators are used to estimate unknown population parameters, such as the variance, percentage, total and mean. Using auxiliary variable information is a basic and common approach when a new estimator is proposed. For example, the study variable (y) can be the number of successful students, while the auxiliary variable (x) can be the number of students per teacher, the ability of the teachers, the number of teachers, the teaching methods, and so on. Different forms of estimators, such as ratio, regression, logarithmic, product and exponential, appear in sampling theory for estimating unknown population parameters using the information from x. Among these, exponential type estimators have become prominent [1].
Information on the variables of interest is not always fully available. To address this, Hansen and Hurwitz [2] proposed an approach based on sub-sampling. They accounted for the non-responding units when estimating population parameters in order to reduce the effect of non-response, and this technique is still popular in the sampling theory literature. In this technique, the population of size $N$ consists of two groups: $N_1$ responding units and $N_2$ non-responding units ($N_2 = N - N_1$). A sample of $n$ units is drawn from the population using simple random sampling without replacement (SRSWOR). In this sample, only $n_1$ units respond, whereas $n_2$ ($n_2 = n - n_1$) units do not. Following the Hansen-Hurwitz method, a sub-sample of $r = n_2/z$, $z > 1$, units is taken from the $n_2$ non-responding units and observed with extra effort. The value of $r$ changes with $z$, so various $z$ values can be used to show the appropriateness of the proposed estimator for all combinations. In the final part of the technique, the $(n_1 + r)$ units are used to estimate the unknown parameters. The sample means of y based on the $n_1$ and $r$ units are denoted by $\bar{y}_1$ and $\bar{y}_{2(r)}$, respectively.
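The sampling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hansen_hurwitz_sample`, the response flags and the rounding of $n_2/z$ to an integer are hypothetical choices, not part of the original method description.

```python
import random


def hansen_hurwitz_sample(population_y, responds, n, z, seed=0):
    """Draw an SRSWOR sample of n units and sub-sample the non-respondents.

    population_y : y values of all N population units
    responds     : booleans, True if the unit responds at the first attempt
    n            : sample size
    z            : inverse sub-sampling rate (z > 1), so that r = n2 / z
    """
    rng = random.Random(seed)
    drawn = rng.sample(range(len(population_y)), n)    # SRSWOR of n units
    resp = [i for i in drawn if responds[i]]           # n1 responding units
    nonresp = [i for i in drawn if not responds[i]]    # n2 non-responding units
    n1, n2 = len(resp), len(nonresp)

    # Assumption: r is rounded to an integer and at least one unit is taken
    # whenever non-respondents exist.
    r = max(1, round(n2 / z)) if n2 > 0 else 0
    sub = rng.sample(nonresp, r)                       # followed up with extra effort

    y1 = [population_y[i] for i in resp]
    y2r = [population_y[i] for i in sub]
    return y1, y2r, n1, n2
```

Only the $n_1 + r$ observed values are returned, which mirrors the fact that the remaining $n_2 - r$ non-respondents are never observed.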
Hansen and Hurwitz [2] proposed an unbiased estimator of the population mean, $t_H$, given in Equation (1), whose variance is given in Equation (2). In the $t_H$ estimator, $w_2 = n_2/n$ is the sample weight of the non-responding units, while $w_1 = n_1/n$ is the weight of the responding units. In Equation (2), the population mean of y is denoted by $\bar{Y}$.
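For illustration, the Hansen-Hurwitz estimator in its textbook form, $t_H = w_1\bar{y}_1 + w_2\bar{y}_{2(r)}$, and its textbook variance, $\frac{1-f}{n}S_y^2 + \frac{W_2(z-1)}{n}S_{y(2)}^2$ with $f = n/N$ and $W_2 = N_2/N$, can be computed as below. The function names and argument names are hypothetical, and the formulas are the standard ones from the sampling literature rather than a transcription of Equations (1) and (2).

```python
def hansen_hurwitz_mean(y1, y2r, n1, n2):
    """Textbook Hansen-Hurwitz estimate: w1 * mean(y1) + w2 * mean(y2r)."""
    n = n1 + n2
    w1, w2 = n1 / n, n2 / n
    mean1 = sum(y1) / len(y1) if y1 else 0.0
    mean2r = sum(y2r) / len(y2r) if y2r else 0.0
    return w1 * mean1 + w2 * mean2r


def hansen_hurwitz_variance(N, n, z, W2, S2_y, S2_y2):
    """Textbook variance of the Hansen-Hurwitz estimator.

    S2_y  : population variance of y
    S2_y2 : variance of y within the non-response group
    W2    : population share of non-respondents, N2 / N
    """
    f = n / N                                  # sampling fraction
    return (1 - f) / n * S2_y + W2 * (z - 1) / n * S2_y2
```

The second variance term vanishes as $z \to 1$, that is, when every non-respondent in the sample is followed up.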
Non-response is divided into two categories, Case I and Case II. In Case I, non-response occurs only on y, whereas in Case II it occurs on both x and y. In both cases, the population mean of x is known.
One of the most important aims in sampling theory is to estimate an unknown population parameter with an efficient estimator [3]. We believe this study makes a significant contribution to the literature by using an exponential function in a new estimator of the unknown population mean under non-response. In addition, the appropriateness of the proposed estimator is examined in detail through theoretical, numerical and simulation studies for both non-response cases. In Section 2, the estimators in the literature for the non-response approach are given. In Section 3, the proposed estimator is analysed for Case I and Case II. In Sections 4 and 5, the theoretical comparisons and the numerical study are presented, respectively. The simulation study is conducted in Section 6. In the final section, the results are discussed.

Existing estimators in literature
Many estimators of the population mean based on the sub-sampling method have been proposed in the literature. Tables 1 and 2 present the ratio, regression and exponential estimators, together with their MSE equations up to the first order of approximation, for Cases I and II, respectively. In Table 1, $\bar{y}^*$ represents the sample mean of y under the non-response approach. Here, $C_x^2 = S_x^2/\bar{X}^2$, $C_{yx} = \rho_{xy} C_y C_x$, and $\rho_{xy}$ is the population correlation coefficient between x and y. Furthermore, $\bar{x}$ and $\bar{X}$ are the sample mean and population mean of x, respectively, and $\bar{Y}$ is the population mean of y.
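As a small illustration of these quantities, the sketch below computes $C_x^2$, $C_y^2$, $\rho_{xy}$ and $C_{yx}$ from paired population values. The helper name `variation_terms` is hypothetical, and the $N - 1$ divisor for $S_x^2$ and $S_y^2$ is an assumption (the usual finite-population convention); positive means are also assumed so that $C_x$ and $C_y$ are the positive square roots.

```python
import math


def variation_terms(x, y):
    """Return C_x^2, C_y^2, rho_xy and C_yx for paired population values."""
    N = len(x)
    x_bar = sum(x) / N
    y_bar = sum(y) / N
    # Assumption: variances and covariance use the N - 1 divisor.
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (N - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (N - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (N - 1)

    rho = s_xy / math.sqrt(s2_x * s2_y)        # population correlation
    c2_x = s2_x / x_bar ** 2                   # C_x^2 = S_x^2 / X_bar^2
    c2_y = s2_y / y_bar ** 2                   # C_y^2 = S_y^2 / Y_bar^2
    c_yx = rho * math.sqrt(c2_y) * math.sqrt(c2_x)   # C_yx = rho * C_y * C_x
    return {"C2_x": c2_x, "C2_y": c2_y, "rho": rho, "C_yx": c_yx}
```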
In Table 2, $\bar{x}^*$ represents the sample mean of x under the non-response approach. In addition, $C_{yx(2)}$ and $\rho_{yx(2)}$ are the counterparts of $C_{yx}$ and $\rho_{xy}$ for the non-response group, where $\rho_{yx(2)}$ is the population correlation coefficient between x and y in that group.
The estimators in the literature and their MSEs involve several symbols. The values of $\alpha$, $d_1^*$, $d_2^*$, $k$, $\eta$ and $\delta$ are unknown constants whose optimum values are used to obtain the minimum MSE, and $s$ can take only the values $-1$, $0$ and $1$. In addition, $\theta_i = \dfrac{a\bar{X}}{a\bar{X}+b}$, $\varphi = \dfrac{a\bar{X}}{2(a\bar{X}+b)}$, and $(a, b)$ are either real numbers or functions of the known parameters of the auxiliary variable x.

Proposed estimators
This section introduces a new estimator of the population mean in the presence of non-response. In Subsections 3.1 and 3.2, this estimator is examined for Cases I and II, respectively.

Case I
The estimators listed in Tables 1 and 2 include those of Singh et al. [5] ($t_{exp1}$), Olufadi and Kumar [6] ($t_{YK1}$), Yadav et al. [7] ($t_{Y1}$), Pal and Singh [8], Pal and Singh [9], Dansawad [10], Singh and Usman [11], Sinha and Kumar [12], Unal and Kadilar [13] and Unal and Kadilar [14].

For Case I, we propose the estimator $t^*_{Ca1}$ given in Equation (3). Here, $\omega$ can take only the value 0 or 1, which makes $t^*_{Ca1}$ a ratio or a product type estimator: if $\omega = 1$, it is a ratio type estimator; if $\omega = 0$, it is a product type estimator. $\theta_1$ and $\theta_2$ are unknown constants whose optimum values are used later to obtain the minimum $\mathrm{MSE}(t^*_{Ca1})$. To obtain $\mathrm{Bias}(t^*_{Ca1})$, $\mathrm{MSE}(t^*_{Ca1})$ and the minimum $\mathrm{MSE}(t^*_{Ca1})$, error-term notations in $e^*_y$ and $e_x$ are used under Case I. Using these notations, we rewrite the $t^*_{Ca1}$ estimator of Equation (3) as Equation (4). Expanding the right-hand side of Equation (4) and ignoring the second and higher powers of $e^*_y$ and $e_x$ gives Equation (5). Taking the expectation of Equation (5) yields $\mathrm{Bias}(t^*_{Ca1})$. For $\mathrm{MSE}(t^*_{Ca1})$, we square both sides of Equation (5) and then take the expectation.
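For orientation, error terms of this kind are conventionally defined as follows in the Hansen-Hurwitz literature when non-response occurs only on y. The symbols $\lambda$ and $\lambda'$, and the exact arrangement of the terms, are assumptions based on that convention and are not a transcription of the notation used for $t^*_{Ca1}$.

```latex
\begin{align*}
e_y^{*} &= \frac{\bar{y}^{*}-\bar{Y}}{\bar{Y}}, &
e_x &= \frac{\bar{x}-\bar{X}}{\bar{X}}, &
E(e_y^{*}) &= E(e_x) = 0, \\
E\!\left(e_y^{*2}\right) &= \lambda C_y^2 + \lambda' C_{y(2)}^2, &
E\!\left(e_x^{2}\right) &= \lambda C_x^2, &
E\!\left(e_y^{*} e_x\right) &= \lambda C_{yx}, \\
\lambda &= \frac{1-f}{n}, \quad f = \frac{n}{N}, &
\lambda' &= \frac{W_2\,(z-1)}{n}, &
W_2 &= \frac{N_2}{N}, \quad C_{y(2)}^2 = \frac{S_{y(2)}^2}{\bar{Y}^2}.
\end{align*}
```

Under this convention, $\bar{Y}^2 E(e_y^{*2})$ is the variance of the Hansen-Hurwitz estimator, which is why that variance serves as the baseline in the later comparisons.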
To simplify the mathematical notation, we rewrite $\mathrm{MSE}(t^*_{Ca1})$ for Case I as Equation (8), whose coefficients collect the expectations of the error terms, including the term $-4\omega E(e^*_y e_x)$. The optimum values of $\theta_1$ and $\theta_2$, denoted by $\theta^*_1$ and $\theta^*_2$, are obtained by minimizing $\mathrm{MSE}(t^*_{Ca1})$ with respect to $\theta_1$ and $\theta_2$. Substituting $\theta^*_1$ and $\theta^*_2$ into Equation (8) gives the minimum $\mathrm{MSE}(t^*_{Ca1})$ for Case I.
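The minimization step can be illustrated on a generic quadratic MSE. The sketch below assumes, purely for illustration, that the MSE has the form $\bar{Y}^2\,[1 + \theta_1^2 A + \theta_2^2 B + 2\theta_1\theta_2 C - 2\theta_1 D - 2\theta_2 E]$, which is common for two-constant estimators; the coefficients `A` to `E` and both function names are hypothetical and do not reproduce the coefficients of Equation (8).

```python
def quadratic_mse(theta1, theta2, A, B, C, D, E, y_bar=1.0):
    """Evaluate the assumed quadratic MSE form at (theta1, theta2)."""
    return y_bar ** 2 * (
        1 + theta1 ** 2 * A + theta2 ** 2 * B
        + 2 * theta1 * theta2 * C - 2 * theta1 * D - 2 * theta2 * E
    )


def minimize_quadratic_mse(A, B, C, D, E, y_bar=1.0):
    """Solve dMSE/dtheta1 = dMSE/dtheta2 = 0 for the assumed quadratic form.

    The normal equations are  A*theta1 + C*theta2 = D  and
    C*theta1 + B*theta2 = E; they are solved by Cramer's rule.
    """
    det = A * B - C * C
    if A <= 0 or det <= 0:
        raise ValueError("MSE is not strictly convex in (theta1, theta2)")
    theta1 = (B * D - C * E) / det
    theta2 = (A * E - C * D) / det
    mse_min = y_bar ** 2 * (1 - (B * D * D + A * E * E - 2 * C * D * E) / det)
    return theta1, theta2, mse_min
```

Substituting the optimum values back into the quadratic form gives the closed-form minimum returned as `mse_min`, which plays the role of the minimum MSE in the text.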

Case II
For Case II, we propose the estimator $t^{**}_{Ca2}$ given in Equation (11), where $\theta_3$ and $\theta_4$ are unknown constants whose optimum values are used later to obtain the minimum $\mathrm{MSE}(t^{**}_{Ca2})$. To obtain $\mathrm{Bias}(t^{**}_{Ca2})$, $\mathrm{MSE}(t^{**}_{Ca2})$ and the minimum $\mathrm{MSE}(t^{**}_{Ca2})$, the corresponding error-term notations are used under Case II. Using these notations, the $t^{**}_{Ca2}$ estimator of Equation (11) is rewritten in terms of the error terms. Following the same steps as in Case I, the bias is obtained as $\mathrm{Bias}(t^{**}_{Ca2}) = \bar{Y}(\theta_3 + \theta_4 - 1)$. For $\mathrm{MSE}(t^{**}_{Ca2})$, we square both sides of Equation (13) and then take the expectation, which gives Equation (16). To simplify the mathematical notation, the MSE in Equation (16) is rewritten for Case II as Equation (17). The optimum values of $\theta_3$ and $\theta_4$, denoted by $\theta^*_3$ and $\theta^*_4$, are obtained by minimizing Equation (17). Substituting $\theta^*_3$ and $\theta^*_4$ into Equation (17) gives the minimum $\mathrm{MSE}(t^{**}_{Ca2})$ for Case II.
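When non-response also affects x, the error term for x is conventionally built on $\bar{x}^*$ and picks up a second component from the non-response group. The expressions below follow the usual convention in the Hansen-Hurwitz literature; the symbols $\lambda = (1-f)/n$ and $\lambda' = W_2(z-1)/n$ are assumptions and are not a transcription of the notation used for $t^{**}_{Ca2}$.

```latex
\begin{align*}
e_x^{*} &= \frac{\bar{x}^{*}-\bar{X}}{\bar{X}}, &
E(e_x^{*}) &= 0, \\
E\!\left(e_x^{*2}\right) &= \lambda C_x^2 + \lambda' C_{x(2)}^2, &
E\!\left(e_y^{*} e_x^{*}\right) &= \lambda C_{yx} + \lambda' C_{yx(2)}.
\end{align*}
```

Compared with the case where only y is affected, both the variance term for x and the covariance term gain a $\lambda'$ component.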

Efficiency comparisons
In this section, the proposed estimators, $t^*_{Ca1}$ and $t^{**}_{Ca2}$, are compared with several estimators in the literature to demonstrate their theoretical appropriateness. Subsection 4.1 covers Case I and Subsection 4.2 covers Case II.

Efficiency comparisons for the first case
We compare $\mathrm{MSE}_{\min}(t^*_{Ca1})$ with the MSEs of the estimators listed in Table 1 and obtain the efficiency conditions (19)-(26) for Case I. Here, the MSE of the $t_{reg1}$ estimator is equal to the MSEs of the $t_{US1}$, $t_{YK1}$, $t_{PS1}$, $t_{Y1}$, $t_{(\alpha_1,\delta_1)}$ and $t_{(\eta,\delta)}$ estimators. For this reason, the efficiency conditions for these estimators coincide with the condition in (22).
Based on these conditions, we conclude that the $t^*_{Ca1}$ estimator is more efficient than the other estimators in the literature whenever conditions (19)-(26) hold for Case I.

Efficiency comparisons for the second case
We compare $\mathrm{MSE}_{\min}(t^{**}_{Ca2})$ with the MSEs of the estimators listed in Table 2 and obtain the efficiency conditions (27)-(36) for Case II. Here, the MSE of the $t_{SK}$ estimator is equal to the MSEs of the $t_{PS2}$, $t_{UK}$, $t_{(\alpha,\beta)}$, $t_{(\alpha_2,\delta_2)}$ and $t_{US2}$ estimators. For this reason, the efficiency conditions for these estimators coincide with the condition in Equation (33).
Based on these conditions, we conclude that the $t^{**}_{Ca2}$ estimator is more efficient than the other estimators whenever conditions (27)-(36) hold for Case II.

Empirical study
After the theoretical comparisons, we use a numerical study on education data to show the appropriateness of the proposed estimators under non-response. The data set is taken from Satici and Kadilar [23]. The population consists of the numbers of teachers and successful students in 261 homogeneous districts of Turkey in 2006. For each district, the number of elementary school teachers is used as the auxiliary variable (x), and the number of students who succeeded in the transition to secondary education exam is taken as the study variable (y). In this population, the last 25% of the units ($W_2 = 0.25$, 65 units) is treated as the non-response group (missing data). Note that, in this data set, the correlation coefficient between the study variable and the auxiliary variable is positive. For this reason, the value of $\omega$ is taken as one.
The MSE values of the existing estimators in the literature, listed in Tables 1 and 2, as well as the MSE values of the $t^*_{Ca1}$ and $t^{**}_{Ca2}$ estimators, are obtained using the data set. In addition, the Percent Relative Efficiencies (PREs) of the proposed estimators ($t^*_{Ca1}$, $t^{**}_{Ca2}$) and of the existing estimators with respect to the Hansen-Hurwitz estimator ($t_H$) are computed using the formula $\mathrm{PRE}(t_{**}) = \dfrac{\mathrm{MSE}(t_H)}{\mathrm{MSE}(t_{**})} \times 100$, where $t_{**}$ denotes any of the compared estimators.
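The PRE computation is a one-line ratio; a minimal sketch is given below. The function name `percent_relative_efficiency` is hypothetical, and any MSE values passed to it are the reader's own inputs, not results from Tables 3 and 4.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator (here t_H).

    A value above 100 means the estimator has a smaller MSE than the
    reference, that is, it is more efficient.
    """
    if mse_estimator <= 0:
        raise ValueError("MSE of the estimator must be positive")
    return mse_reference / mse_estimator * 100.0
```

By construction, the reference estimator itself always has a PRE of 100, so the Hansen-Hurwitz row of such a comparison serves as the baseline.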
The MSE and PRE values for Cases I and II are given in Tables 3 and 4, respectively. According to the results in Tables 3 and 4, the proposed [...]

[...] literature when all information is available. Hansen and Hurwitz [2] developed a technique for situations in which complete information is not available. This study uses the Hansen-Hurwitz method and proposes a new exponential estimator of the unknown population mean of y using the information on x. The proposed estimators are examined separately for Case I and Case II, and their statistical properties, namely the bias, the MSE and the minimum MSE, are derived. First, the proposed estimators are compared theoretically with various estimators in the literature for the related cases. Based on these comparisons, the proposed estimators can be used instead of the estimators in the literature whenever the obtained conditions hold. These conditions are given in Equations (19)-(26) for Case I and in Equations (27)-(36) for Case II. Next, an educational data set is used for the numerical comparison. The numerical study confirms that the proposed estimators have the minimum MSE and the maximum PRE values among the compared estimators under both non-response approaches. In addition, a simulation study is conducted to show the performance of the proposed estimators. Based on all results, we recommend the proposed estimators for the non-response case.