Abstract

Using extreme values of the auxiliary variable can improve the efficiency of estimators, and the study variable may also be correlated with the ranks of a reasonably correlated auxiliary variable. These ranks can therefore be regarded as additional information about the study variable and used to improve the efficiency of the estimators. In this paper, assuming that the minimum and maximum values and the ranks of the auxiliary variable are known, we propose several improved estimators of the finite population mean of the study variable based on extreme values under simple random sampling. The bias and mean squared error (MSE) expressions of the proposed estimators are derived up to the first order of approximation, and the proposed estimators are compared mathematically with existing estimators. Simulation studies and real datasets used to illustrate the estimation of the finite population mean based on extreme values show that the proposed estimators are more efficient, in terms of percentage relative efficiency, than the other estimators considered here.

1. Introduction

The purpose of survey sampling is to make maximum use of the available information about the characteristics of interest. Many fields of study require estimating the finite population mean of a variable of interest, for example, the average wheat production per acre, the average household income, or the mean weight of meat-producing animals. The mean per unit estimator is the baseline estimator of the finite population mean.

Unusual observations can occur in sample survey data. The sample mean is sensitive to very large and/or very small values included in the sample; such values can produce misleading results and tempt the analyst to delete them from the sample data. In general, when the data contain extreme values, the efficiency of classical estimators declines (for more details, see [1] and the references cited therein).

Using supplementary information to enhance the precision of an estimator is a typical strategy in survey sampling. Ratio, regression, and product-type estimators all require information on one or more auxiliary variables, in addition to the study variable, to improve their relative efficiency. For example, when estimating total household income, the number of household members and total expenditure may be used as two auxiliary variables. A substantial body of research has developed new and improved estimators of population parameters such as the mean, total, cumulative distribution function (CDF), and median (for more details, see [2–9] and the references cited therein). Recent work on estimating the finite population mean using auxiliary information includes [9–13]. However, because classical estimators are sensitive to extreme values, the presence of outliers in the data reduces their efficiency (see [1] and the references cited therein).

When the data contain extreme values, the relative efficiency (RE) of classical ratio- and product-type estimators declines. Similarly, regression-type estimators do not perform well in the presence of outliers, because ordinary least squares (OLS) estimators are well known to be sensitive to outliers. However, extreme values, if known, can be retained in the data and used as auxiliary information to increase the precision of the estimates (for more details, see [6, 9, 14–17] and the references cited therein, to name a few).

In this study, we retain the extreme values of the auxiliary variable in the data, use them as auxiliary information, and suggest improved ratio- and product-type estimators. Along the lines of [17, 18, 20], we introduce an improved class of estimators for the finite population mean under simple random sampling (SRS) that uses the minimum and maximum values of the auxiliary variable as auxiliary information.

The rest of the paper is organized as follows. Section 2 describes the methodology and notation. Section 3 discusses the existing estimators. Section 4 presents the proposed estimators. Sections 5 and 6 contain the mathematical and numerical comparisons, respectively. Finally, Section 7 discusses the main findings and concludes the study.

2. Methodology and Notation

Let us consider a finite population $U = \{U_1, U_2, \ldots, U_N\}$ of size $N$. The values of the study variable $Y$ and the auxiliary variable $X$ on the $i$th unit are $y_i$ and $x_i$, respectively, and $r_i$ is the rank of $x_i$ among the values of the auxiliary variable, for $i = 1, 2, \ldots, N$. We use SRS without replacement to draw a sample of $n$ units from the population $U$. Let $\bar{Y} = \sum_{i=1}^{N} y_i/N$, $\bar{X} = \sum_{i=1}^{N} x_i/N$, and $\bar{R} = \sum_{i=1}^{N} r_i/N$ be the population means of the study variable, the auxiliary variable, and the ranks of the auxiliary variable, respectively. Further, let $S_y^2$, $S_x^2$, and $S_r^2$ be the corresponding population variances of $Y$, $X$, and the ranks of the auxiliary variable, respectively.

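Let $C_y = S_y/\bar{Y}$, $C_x = S_x/\bar{X}$, and $C_r = S_r/\bar{R}$ be the population coefficients of variation of the study variable, the auxiliary variable, and the ranks of the auxiliary variable, respectively, and let $\rho_{yx}$, $\rho_{yr}$, and $\rho_{xr}$ be the population correlation coefficients between the variables indicated by the subscripts.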

Let $\bar{y}$, $\bar{x}$, and $\bar{r}$ be the sample means and $s_y^2$, $s_x^2$, and $s_r^2$ the sample variances of the study variable, the auxiliary variable, and the ranks of the auxiliary variable, respectively.

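To derive the biases and MSEs of the existing and proposed classes of estimators, we use the relative error terms $e_0 = (\bar{y} - \bar{Y})/\bar{Y}$, $e_1 = (\bar{x} - \bar{X})/\bar{X}$, and $e_2 = (\bar{r} - \bar{R})/\bar{R}$, such that $E(e_i) = 0$ for $i = 0, 1, 2$, and
$E(e_0^2) = \lambda C_y^2$, $E(e_1^2) = \lambda C_x^2$, $E(e_2^2) = \lambda C_r^2$,
$E(e_0 e_1) = \lambda \rho_{yx} C_y C_x$, $E(e_0 e_2) = \lambda \rho_{yr} C_y C_r$, $E(e_1 e_2) = \lambda \rho_{xr} C_x C_r$,
where $\lambda = 1/n - 1/N$.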
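The population quantities defined above can be computed directly from full population arrays. The following is a minimal Python/NumPy sketch, assuming the population variances use the divisor $N - 1$ and that the auxiliary values have no ties (so the ranks run from 1 to $N$); the function name `population_constants` and the toy data are hypothetical.

```python
import numpy as np

def population_constants(y, x, n):
    """Return lambda, means, CVs, correlations, and ranks for SRSWOR of size n.

    Hypothetical helper: assumes population variances with divisor N - 1
    and no ties in x, so the ranks r_i run from 1 to N.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    N = y.size
    r = np.argsort(np.argsort(x)) + 1.0      # rank of each x_i (1 = smallest)
    lam = 1.0 / n - 1.0 / N                  # lambda = 1/n - 1/N
    data = {"y": y, "x": x, "r": r}
    means = {k: v.mean() for k, v in data.items()}
    cv = {k: v.std(ddof=1) / means[k] for k, v in data.items()}
    c = np.corrcoef(np.vstack([y, x, r]))    # 3 x 3 correlation matrix
    rho = {"yx": c[0, 1], "yr": c[0, 2], "xr": c[1, 2]}
    return lam, means, cv, rho, r

# Toy population of N = 5 units, sample size n = 2
lam, means, cv, rho, r = population_constants([3, 5, 4, 9, 7], [10, 30, 20, 50, 40], n=2)
print(lam, r)
```

The double `argsort` is a compact way to turn values into ranks when there are no ties; with ties, an averaging rank rule would be needed instead.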

3. Existing Estimators

In this section, we describe the existing estimators of the finite population mean with which the proposed estimators are compared.

3.1. Usual Unbiased Estimator

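The most commonly used unbiased estimator of the finite population mean is the sample mean $\bar{y} = \sum_{i=1}^{n} y_i/n$, whose variance is $\operatorname{Var}(\bar{y}) = \lambda S_y^2 = \lambda \bar{Y}^2 C_y^2$.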
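As a quick sanity check on this variance formula, the exact value $\lambda S_y^2$ can be compared with a Monte Carlo approximation obtained from repeated SRSWOR draws. This is a sketch with hypothetical function names and an arbitrary toy population.

```python
import numpy as np

def var_mean_srswor(y, n):
    """Exact variance of the sample mean under SRSWOR: (1/n - 1/N) * S_y^2."""
    y = np.asarray(y, dtype=float)
    return (1.0 / n - 1.0 / y.size) * y.var(ddof=1)

def mc_var_mean(y, n, reps=20000, seed=1):
    """Monte Carlo approximation of Var(y_bar) from repeated SRSWOR samples."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    means = np.array([rng.choice(y, size=n, replace=False).mean() for _ in range(reps)])
    return np.mean((means - y.mean()) ** 2)

y = np.arange(1, 21)          # toy population: 1, 2, ..., 20
print(var_mean_srswor(y, 5), mc_var_mean(y, 5))
```

The two numbers should agree up to Monte Carlo error, which shrinks as `reps` grows.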

3.2. Cochran’s Ratio Estimator

Cochran [2] proposed the ratio estimator of the finite population mean under SRS, which employs auxiliary information: $\bar{y}_R = \bar{y}\,\bar{X}/\bar{x}$.

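Up to the first order of approximation, the bias and MSE of $\bar{y}_R$ are $\operatorname{Bias}(\bar{y}_R) \approx \lambda \bar{Y}(C_x^2 - \rho_{yx} C_y C_x)$ and $\operatorname{MSE}(\bar{y}_R) \approx \lambda \bar{Y}^2 (C_y^2 + C_x^2 - 2\rho_{yx} C_y C_x)$, respectively.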
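The ratio estimator and its first-order bias and MSE translate directly into code. This minimal sketch uses hypothetical function names and illustrative parameter values, not values from the paper.

```python
import numpy as np

def ratio_estimate(y_s, x_s, X_bar):
    """Cochran's ratio estimator: y_bar * X_bar / x_bar."""
    return np.mean(y_s) * X_bar / np.mean(x_s)

def ratio_bias_mse(Y_bar, lam, Cy, Cx, rho_yx):
    """First-order bias and MSE of the ratio estimator."""
    bias = lam * Y_bar * (Cx**2 - rho_yx * Cy * Cx)
    mse = lam * Y_bar**2 * (Cy**2 + Cx**2 - 2 * rho_yx * Cy * Cx)
    return bias, mse

# Sample (y, x) pairs with a known population mean X_bar = 2
print(ratio_estimate([2, 4], [1, 3], X_bar=2.0))
print(ratio_bias_mse(Y_bar=10.0, lam=0.1, Cy=0.5, Cx=0.4, rho_yx=0.8))
```

Comparing this MSE with $\lambda \bar{Y}^2 C_y^2$ recovers the classical condition that the ratio estimator is more efficient than $\bar{y}$ when $\rho_{yx} > C_x/(2C_y)$.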

3.3. Classical Regression Estimator

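The classical regression estimator of $\bar{Y}$ under SRS is given by $\bar{y}_{reg} = \bar{y} + b(\bar{X} - \bar{x})$, where $b = s_{yx}/s_x^2$ is the sample regression coefficient of $y$ on $x$ and $s_{yx}$ is the sample covariance between $y$ and $x$. Up to the first order of approximation, its MSE is $\operatorname{MSE}(\bar{y}_{reg}) \approx \lambda \bar{Y}^2 C_y^2 (1 - \rho_{yx}^2)$.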
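A minimal sketch of the regression estimator and its first-order MSE follows; the function names and example values are hypothetical.

```python
import numpy as np

def regression_estimate(y_s, x_s, X_bar):
    """Regression estimator y_bar + b (X_bar - x_bar), with b = s_yx / s_x^2."""
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    b = np.cov(x_s, y_s, ddof=1)[0, 1] / x_s.var(ddof=1)
    return y_s.mean() + b * (X_bar - x_s.mean())

def regression_mse(Y_bar, lam, Cy, rho_yx):
    """First-order MSE: lambda * Y_bar^2 * C_y^2 * (1 - rho_yx^2)."""
    return lam * Y_bar**2 * Cy**2 * (1 - rho_yx**2)

print(regression_estimate([3, 5, 7], [1, 2, 3], X_bar=5.0))
print(regression_mse(Y_bar=10.0, lam=0.1, Cy=0.5, rho_yx=0.6))
```

Because $1 - \rho_{yx}^2 \le 1$, this first-order MSE never exceeds $\operatorname{Var}(\bar{y})$; it also never exceeds the ratio estimator's first-order MSE, since their difference is $\lambda \bar{Y}^2 (C_x - \rho_{yx} C_y)^2 \ge 0$.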

3.4. Mohanty and Sahoo Estimator

Along the lines of [17, 18], two estimators of the finite population mean that use the minimum and maximum values of the auxiliary variable as auxiliary information are given as follows. Here, $x_m$ and $x_M$ denote the minimum and maximum values of the auxiliary variable, respectively.

The bias and MSE of these two estimators are given, respectively, by

3.5. Walia et al. Estimator

Walia et al. [9] presented estimators based on known information about the minimum and maximum values of the auxiliary variable, using the following transformation:

The corresponding estimators of the finite population mean are given, respectively, by

The bias and MSE of these modified estimators are, respectively,

4. Proposed Estimator

In this section, we develop two auxiliary-information-based (AIB) classes of estimators, namely ratio-type and exponential ratio-type classes, for estimating the finite population mean $\bar{Y}$ under SRS.

4.1. First Proposed Class of Estimator

Motivated by [15], we present an improved class of estimators for estimating $\bar{Y}$ under SRS that uses known information about the minimum and maximum values of the auxiliary variable $X$. The improved class of estimators is given by the expression below, where the two unknown constants are to be chosen so that the bias and MSE of the estimator are minimized. Further, the three remaining quantities are known scalars that may assume different values; the resulting sub-cases of the class are summarized in Table 1 of the Appendix.

To derive approximate expressions for the bias and MSE of the proposed class, we write $\bar{y} = \bar{Y}(1 + e_0)$ and $\bar{x} = \bar{X}(1 + e_1)$. Expressing the right-hand side (RHS) of (20) in terms of the $e_i$, we get equation (21). Expanding the RHS of equation (21) and retaining terms up to the second power of the $e_i$, we have

Taking expectations on both sides of equation (22), the bias of the proposed class up to the first order of approximation is given by

Squaring both sides of equation (22) and taking expectations, the MSE of the proposed class under the first order of approximation is given by

Minimizing equation (25) with respect to the two unknown constants yields their optimum values.
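Many first-order MSE expressions of this kind are quadratic in the unknown constants. Assuming equation (25) can be written as $\operatorname{MSE}(w) = c_0 + w^\top A w - 2 b^\top w$ for a symmetric positive definite matrix $A$, a vector $b$, and a constant $c_0$ built from $\lambda$, the coefficients of variation, and the correlations, the minimization step reduces to solving a $2 \times 2$ linear system. In this sketch $A$, $b$, and $c_0$ are hypothetical placeholders, not the paper's coefficients.

```python
import numpy as np

def optimal_weights(A, b, c0):
    """Minimize MSE(w) = c0 + w'Aw - 2 b'w for a 2-vector of constants w.

    A (symmetric positive definite), b and c0 are hypothetical placeholders
    for the expectation terms of a first-order MSE expression.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    w_opt = np.linalg.solve(A, b)      # first-order conditions: A w = b
    mse_min = c0 - b @ w_opt           # MSE evaluated at w_opt
    return w_opt, mse_min

def quadratic_mse(w, A, b, c0):
    """Evaluate the quadratic MSE at an arbitrary w (for checking)."""
    w = np.asarray(w, dtype=float)
    return c0 + w @ np.asarray(A, float) @ w - 2 * np.asarray(b, float) @ w

A = [[2.0, 0.5], [0.5, 1.0]]
b = [1.0, 0.8]
w_opt, mse_min = optimal_weights(A, b, c0=2.0)
print(w_opt, mse_min, quadratic_mse(w_opt + 0.1, A, b, 2.0) > mse_min)
```

Positive definiteness of $A$ guarantees that the stationary point is the unique minimum, which is why any perturbation of `w_opt` increases the MSE.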

Substituting these optimum values in equations (23) and (25), we obtain the minimum bias and MSE of the proposed class, respectively.

4.2. Second Proposed Estimator

Along the lines of [6], we propose another improved class of exponential-type estimators for estimating $\bar{Y}$ under the SRS scheme, using supplementary information on the minimum and maximum values of $X$. The improved estimators are given by the expression below, where the unknown constants are chosen so that the bias and MSE of the estimators are as small as possible, and the three remaining quantities are known values.

To obtain approximate expressions for the bias and MSE of this class, we express the RHS of equation (30) in terms of the $e_i$. Using a first-order Taylor series approximation, we have

Simplifying and taking expectations in equation (32), the bias of the second class is given by

Squaring both sides of equation (32) and taking expectations, the MSE up to the first order of approximation is

Minimizing equation (34) with respect to the unknown constants gives their optimum values:

Substituting these optimum values in equations (33) and (34), we obtain the minimum bias and MSE of the second class, respectively, where $R^2_{y.xr}$ is the coefficient of multiple determination of $y$ on $x$ and $r$.
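For a numerical check, $R^2_{y.xr}$ can be computed from the three pairwise correlations with the standard formula $R^2_{y.xr} = (\rho_{yx}^2 + \rho_{yr}^2 - 2\rho_{yx}\rho_{yr}\rho_{xr})/(1 - \rho_{xr}^2)$, which agrees with the $R^2$ of an OLS fit of $y$ on $x$ and $r$ with an intercept. The function names and the simulated data in this sketch are hypothetical.

```python
import numpy as np

def multiple_r2(r_yx, r_yr, r_xr):
    """Coefficient of multiple determination of y on x and r from correlations."""
    return (r_yx**2 + r_yr**2 - 2 * r_yx * r_yr * r_xr) / (1 - r_xr**2)

def ols_r2(y, x, r):
    """R^2 of an OLS fit of y on x and r with an intercept (for comparison)."""
    D = np.column_stack([np.ones(len(y)), x, r])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
x = rng.gamma(2.0, 1.0, size=200)
r = np.argsort(np.argsort(x)) + 1.0          # ranks of x
y = 5 + 0.8 * x + rng.normal(size=200)
c = np.corrcoef(np.vstack([y, x, r]))
print(multiple_r2(c[0, 1], c[0, 2], c[1, 2]), ols_r2(y, x, r))
```

Both printed values should coincide up to rounding, since the correlation formula is an algebraic identity for the OLS $R^2$ with two regressors.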

5. Mathematical Comparison

In this section, we compare the proposed estimators mathematically with the existing estimators of Section 3.

5.1. First Proposed Estimator

Condition 1. From equations (2) and (28)

Condition 2. From equations (5) and (28)

Condition 3. From equations (7) and (28)

Condition 4. From equations (11) and (28)

Condition 5. From equations (13) and (28)

Condition 6. From equations (18) and (28)

Condition 7. From equations (20) and (28)

5.2. Second Proposed Estimator

Condition 8. From equations (2) and (36)

Condition 9. From equations (5) and (36)

Condition 10. From equations (7) and (36)

Condition 11. From equations (11) and (36)

Condition 12. From equations (13) and (36)

Condition 13. From equations (18) and (36)

Condition 14. From equations (20) and (36)

6. Numerical Comparison

In this section, simulated and real datasets are considered, and the percentage relative efficiencies (PREs) of the proposed estimators are computed.

6.1. Study of Simulation

Following the approach of [19], we conduct a simulation study to compare the performance of the proposed estimators with that of the corresponding existing estimators. Six populations of size $N = 1000$ are generated for the auxiliary variable $X$ from the following distributions, and the study variable is generated by the model below, where $\rho$ is the sample correlation coefficient between the study and auxiliary variables and $e$ is a random error term following the standard normal distribution, with mean $0$ and variance $1$.

We followed the steps below to obtain the results for the estimators considered in this study:
Step 1: Generate the populations of the auxiliary variable $X$, each of 1000 units, from the different distributions, and then compute $Y$ using the model given in equation (53).
Step 2: Obtain the optimal values of the unknown constants of the proposed estimators from the populations generated in Step 1.
Step 3: Draw a sample of size $n$ by SRS without replacement and compute all the estimators covered in this study.
Step 4: Repeat Step 3 50,000 times for each population to compute the variances and MSEs of the mean estimators. The variance of $\bar{y}$ and the MSE of an estimator $t$ are approximated by $\widehat{\operatorname{Var}}(\bar{y}) = \frac{1}{50000}\sum_{i=1}^{50000} (\bar{y}_i - \bar{Y})^2$ and $\widehat{\operatorname{MSE}}(t) = \frac{1}{50000}\sum_{i=1}^{50000} (t_i - \bar{Y})^2$, where $\bar{y}_i$ and $t_i$ are the values computed from the $i$th sample. The MSEs of the other estimators given in Section 3 are obtained in the same way. The PRE of $t$ with respect to $\bar{y}$ is given by $\operatorname{PRE}(t) = \frac{\widehat{\operatorname{Var}}(\bar{y})}{\widehat{\operatorname{MSE}}(t)} \times 100$.
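Steps 1, 3, and 4 can be sketched for the estimators whose forms are given in Section 3 ($\bar{y}$, the ratio estimator, and the regression estimator). The sketch assumes a hypothetical shifted gamma population for $X$ and the model $Y = 10 + \rho X + \sqrt{1 - \rho^2}\, e$ with $e \sim N(0, 1)$, which need not match equation (53); it also uses far fewer replications than the 50,000 of Step 4 to keep the run short. Step 2 is omitted because it concerns the proposed estimators.

```python
import numpy as np

def simulate_pre(N=1000, n=50, rho=0.8, reps=5000, seed=2023):
    """Monte Carlo PREs (w.r.t. y_bar) of the mean, ratio and regression estimators.

    Hypothetical population: X = 1 + Gamma(2, 1) and
    Y = 10 + rho * X + sqrt(1 - rho^2) * e with e ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    # Step 1: generate the population (N units)
    X = 1.0 + rng.gamma(2.0, 1.0, size=N)
    Y = 10.0 + rho * X + np.sqrt(1 - rho**2) * rng.normal(size=N)
    X_bar, Y_bar = X.mean(), Y.mean()
    est = np.empty((reps, 3))
    for i in range(reps):
        # Step 3: draw an SRSWOR sample of size n and compute each estimator
        idx = rng.choice(N, size=n, replace=False)
        y_s, x_s = Y[idx], X[idx]
        b = np.cov(x_s, y_s, ddof=1)[0, 1] / x_s.var(ddof=1)
        est[i] = (y_s.mean(),
                  y_s.mean() * X_bar / x_s.mean(),
                  y_s.mean() + b * (X_bar - x_s.mean()))
    # Step 4: empirical MSEs around the true mean, then PRE relative to y_bar
    mse = np.mean((est - Y_bar) ** 2, axis=0)
    pre = 100.0 * mse[0] / mse
    return dict(zip(["mean", "ratio", "regression"], pre))

print(simulate_pre(reps=2000))
```

Under this hypothetical model, the intercept of 10 makes $C_x$ large relative to $C_y$, so the condition $\rho_{yx} > C_x/(2C_y)$ fails and the ratio estimator's PRE falls below 100, whereas the regression estimator's PRE exceeds 100.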

Similarly, the PREs of the other estimators under SRS can be computed. The PREs of the proposed and existing estimators are reported in Table 2. The proposed estimators are more efficient than the usual unbiased estimator and the existing estimators in terms of PRE; all PRE values exceed 100. Although the effect of sample size is not examined in detail here, the PREs generally tend to increase as the sample size increases.

6.2. Real-Life Data

We use three real datasets to compare the PREs of the proposed estimators with those of the existing estimators. The descriptions and summary statistics of these datasets are as follows.

6.2.1. Population I

This dataset is taken from [20, p. 226] and relates to a survey conducted in Pakistan in 2012 covering 33 divisions. It may be downloaded from the Pakistan Bureau of Statistics web page via the link: https://www.pbs.gov.pk/content/microdata. The study variable $y$ is the employment level by division in 2012, the auxiliary variable $x$ is the number of registered factories in 2012, and $r$ is the rank of the number of registered factories in 2012. Our objective is to estimate the finite population mean under extreme values in SRS. The population constants are

6.2.2. Population II

Another dataset is taken from [20, p. 135] and relates to a survey conducted in Pakistan in 2012 covering 33 divisions. It may be downloaded from the Pakistan Bureau of Statistics web page via the link: https://www.pbs.gov.pk/content/microdata. The study variable $y$ and the auxiliary variable $x$ are the number of pupils enrolled in each division and the total number of government primary and secondary schools for boys and girls in each division in 2012, respectively, and $r$ is the rank of the auxiliary variable $x$. Our objective is to estimate the finite population mean under extreme values in SRS. The population constants are

6.2.3. Population III

This dataset is taken from [2, p. 24] and comprises 36 units giving the food cost and weekly income of families. The study variable $y$ is the food cost of families, the auxiliary variable $x$ is the weekly income of families, and $r$ is the rank of weekly income. To estimate the finite population mean under extreme values, the population constants are

The PREs of the proposed and existing estimators for the above datasets are given in Table 3. In terms of PRE, the proposed estimators are more efficient than the usual unbiased estimator and the existing estimators; all PRE values exceed 100.

7. Conclusion

In this paper, we present improved estimators of the finite population mean that use known information about the minimum and maximum values of the auxiliary variable. We have identified theoretical conditions under which the proposed estimators outperform the existing estimators. Tables 2 and 3 give the PREs of all estimators relative to the mean per unit estimator. According to our findings, the proposed estimators outperform the other estimators considered in this study. Among the suggested classes, these estimators are recommended because of their high PREs for all populations.

Appendix

Table 1: Sub-cases of the first proposed class of estimators.

Data Availability

All the data used in this study are available within the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.