Estimation of finite population mean using double sampling under probability proportional to size sampling in the presence of extreme values

Extremely large or extremely small values occur in many data sets. An estimator can therefore yield misleading results if several of these extreme values are selected in the sample. For such cases, we propose improved estimators of the finite population mean using double sampling based on probability proportional to size (PPS) sampling. The properties of the estimators are obtained up to the first order of approximation. PPS sampling may be employed when the sizes of the units vary widely. To determine the selection probabilities Pi under PPS, the population total of the auxiliary variable Xi must be known. However, the usual designs and estimation techniques are less effective when this information is difficult to obtain or when other information is missing. The two-phase approach is preferable and more feasible in such circumstances. To demonstrate how well the recommended estimators perform, we use three real data sets. We show theoretically and numerically that the suggested estimators outperform the alternative estimators.


Introduction
The effective use of auxiliary variables in survey sampling can improve the precision of estimators of population parameters. Researchers frequently seek estimators with the best statistical properties for population quantities such as the mean, total, and median. This requires a representative sample of the population. If the population of interest is homogeneous, the units can be selected by simple random sampling (SRS). To use the ratio, product, or regression methods of estimation, the population parameters of the auxiliary variable must be known. The ratio estimator plays an important role when the study variable and the auxiliary variable are strongly positively correlated, whereas the product estimator works well when they are negatively correlated. By suitably adapting the auxiliary information, numerous researchers have developed various ratio estimators. Ref. [1] recommended certain procedures for improving ratio and regression estimators; [2] recommended an improved class of estimators of the population mean under PPS sampling; [3] suggested ratio-type exponential estimators of the population mean under linear transformation of the auxiliary variable; [4] reviewed a class of estimators of the population mean that hold up satisfactorily under linear transformation of the auxiliary information; [5] discussed a class of exponential ratio estimators using two auxiliary variables; [6] studied mean estimation using quantile regression ratios under full and partial auxiliary information; [7] suggested robust quantile regression with two additional variables for mean estimation; [8] recommended methods of improving estimators; [9] discussed improved estimation of the mean using the skewness and kurtosis of the auxiliary character; [10] recommended a class of product estimators of the population mean utilizing auxiliary information; [11] suggested generalized exponential-type estimation of the population mean using auxiliary attributes; [12] recommended estimators of the population mean under simple random sampling that are based on robust ratios; [13] recommended estimators of the population mean that use auxiliary information in successive sampling on two occasions; [14] presented several imputation strategies for handling missing information in two-occasion successive sampling; and [15,16] recommended estimation of the population mean under probability proportional to size sampling with and without measurement errors.
In various settings, such as medical studies or surveys, the sizes of population units commonly differ substantially, which leads to variation in the selection probabilities or outcomes of different units. For example, in a medical study, the number of individuals with a specific disease may be small relative to the overall population. This difference in size affects the probability of selecting individuals with the disease in a random sample, and researchers may need to adjust their sampling methods or statistical analyses accordingly to ensure accurate representation and valid conclusions. Similarly, in surveys of family income, the number of siblings within families can vary widely. This variation in family size can influence the overall distribution of income levels in the survey population, so family size may need to be considered when analyzing the survey data or drawing conclusions about the relationship between family income and other variables. In such situations, statistical techniques such as weighting or stratified random sampling can be employed to account for the differing sizes and the varying selection probabilities of units. These techniques aim to provide accurate estimates and valid inference despite the differences in size. We use PPS sampling to deal with such unequal probabilities. PPS sampling is an unequal probability sampling design in which the selection probability of each sampling unit is proportional to an auxiliary variable. Consider a setting in which we must estimate the population of the districts within a province; we choose the auxiliary variable that has the strongest relationship with the study variable.
For example:
(i) The aggregate of all districts within the province (correlation with the study variable = 0.85).
(ii) The number of families in all communities within the districts (correlation with the study variable = 0.98).
On the basis of these correlations, (ii) is the more useful auxiliary variable.
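The PPS selection described above can be illustrated with a minimal sketch. It assumes sampling with replacement and uses the classical Hansen-Hurwitz estimator of the population mean, which is a standard textbook choice and not necessarily the form used later in this paper. The function names and the data values are hypothetical.

```python
import random


def pps_sample(sizes, n, rng):
    """Draw n unit indices with replacement, with selection
    probability proportional to the auxiliary size measure."""
    total = sum(sizes)
    probs = [s / total for s in sizes]
    idx = rng.choices(range(len(sizes)), weights=probs, k=n)
    return idx, probs


def hansen_hurwitz_mean(y, probs, idx):
    """Hansen-Hurwitz estimator of the population mean:
    (1 / (n * N)) * sum of y_i / p_i over the sampled units."""
    N = len(y)
    n = len(idx)
    return sum(y[i] / probs[i] for i in idx) / (n * N)


# Hypothetical data: number of families (auxiliary) and population (study)
# for five districts. These values are for illustration only.
families = [120, 300, 80, 500, 250]
population = [610, 1480, 390, 2550, 1240]

rng = random.Random(0)
idx, probs = pps_sample(families, n=3, rng=rng)
estimate = hansen_hurwitz_mean(population, probs, idx)
```

When the study variable is exactly proportional to the size measure, every ratio y_i / p_i is the same and the estimator reproduces the population mean for any sample, which is why a highly correlated auxiliary variable such as (ii) is preferred.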
Related work on PPS sampling includes the following. Ref. [17] discussed estimation of the population mean in the presence of outliers under a probability-based sampling design; [18,19] recommended PPS sampling when outliers are present; [20] discussed a combination of ratio and PPS estimators; [21] offered a more accurate estimation of the population size using PPS data; [22] discussed improved estimators in simple random sampling; [23] recommended a mixture of ratio and PPS estimators; [24] recommended alternative estimators in PPS sampling; and [25] suggested the use of two auxiliary variables for improved estimation of the population mean under PPS.
Therefore, when such information is not readily accessible or the auxiliary variable is not available, the earlier designs and estimation procedures do not produce reliable results, and their efficiency decreases. Double-phase sampling is more beneficial and effective in this situation. The population mean of the auxiliary variable, which is used at the estimation or selection stage, can be estimated from a sufficiently large initial sample.
For example, with a single auxiliary variable X, we take a large preliminary sample to estimate its population mean and only a subsample to measure the study variable Y, because information on X is cheaper to obtain. This implies allocating part of the budget to the large initial sample, which leaves a smaller sample for measuring the study variable. The technique is advantageous when the gain in precision is substantial relative to the added cost of collecting the auxiliary information on the large sample. Estimating total buffalo milk production in a given region is a practical illustration. Here the village is the sampling unit and the number of milk buffalo in a village is the auxiliary variable. Because the total number of milk buffalo in each village of the region may not be known, the investigator may select a large sample of villages and record the number of milk buffalo in each. These data are then used to estimate X, the total number of milk buffalo in the region. Related work on double-phase sampling includes the following. Ref. [26] proposed the generalized regression estimator for two-phase tax record samples; [27] recommended estimating the mean of a finite population using linear regression and the ratio-product; [24] presented modified exponential estimators of the finite population mean in double sampling; [28] recommended combining exponential functions for efficient estimation under two-phase sampling; [29] discussed exponential chain ratio estimators in stratified two-stage sampling; [30] presented a new, more accurate calibration estimator using two auxiliary variables in stratified two-phase sampling; [31] recommended a family of estimators for predicting the population mean from auxiliary proportions in single- and two-stage samples; [32] discussed a generalized approach to estimating a finite population mean in two-phase sampling; [33] estimated the mean of a finite population using a mixed exponential-type estimator under a two-stage sampling design; [34] proposed that two-phase sampling could improve estimates of the population mean; [35] recommended an efficient class of double-sampling estimators of the population mean; and [36] proposed an exponential chain-ratio-type estimator of the finite population mean in double sampling.
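The two-phase idea described above (a large, cheap first-phase sample for X and a smaller subsample for Y) can be sketched as follows. The sketch uses the classical double-sampling ratio estimator under simple random sampling without replacement, which is a textbook baseline and not one of the estimators proposed in this paper. The function name and the village data are hypothetical.

```python
import random


def double_sampling_ratio_mean(x, y, m, n, rng):
    """Classical two-phase ratio estimator of the population mean of y.

    Phase 1: SRSWOR of m units, only x is observed.
    Phase 2: SRSWOR subsample of n units from phase 1, x and y observed.
    Estimator: y_bar(phase 2) * x_bar(phase 1) / x_bar(phase 2).
    """
    N = len(x)
    first = rng.sample(range(N), m)   # first-phase unit indices
    second = rng.sample(first, n)     # second-phase subsample of the first
    x_bar_first = sum(x[i] for i in first) / m
    x_bar_second = sum(x[i] for i in second) / n
    y_bar_second = sum(y[i] for i in second) / n
    return y_bar_second * x_bar_first / x_bar_second


# Hypothetical village data: number of milk buffalo (x) and
# milk production (y). These values are for illustration only.
buffalo = [12, 30, 8, 45, 22, 17, 60, 25]
milk = [95, 250, 60, 370, 180, 130, 500, 210]

rng = random.Random(1)
estimate = double_sampling_ratio_mean(buffalo, milk, m=6, n=3, rng=rng)
```

In practice y would be unknown outside the second-phase subsample; the full list is passed here only to keep the sketch short.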
Our primary objectives are as follows.
1. To estimate the finite population mean using double sampling under PPS in the presence of extreme values (minimum and maximum values).
2. To derive the mathematical properties, i.e., the bias and MSE, of the recommended estimators up to the first order of approximation.
3. To highlight the application of the recommended estimators through the use of real data sets from various domains.

Sampling methodology
Let $\Psi = \{\Psi_1, \Psi_2, \ldots, \Psi_N\}$ be a population of N distinct units. In the first phase, we draw a large initial sample of size m (m < N) from $\Psi$ by simple random sampling without replacement (SRSWOR) and observe the auxiliary variable x. In the second phase, we draw a subsample of size n (n < m) from the first-phase sample by SRSWOR, or directly from $\Psi$, and observe both the study and auxiliary variables. Let $y_i$ denote the study variable and $x_i$ and $z_i$ the auxiliary variables, and let the probabilities proportional to size be defined for the i-th unit.
Some real data sets include extreme values. For example, when estimating the intelligence quotient (IQ), the brilliant students obtain the maximum marks and the weak students obtain the minimum marks. If the population contains unexpectedly large or small units, the estimator of the finite population mean is particularly sensitive to them: the population mean will be either understated or overstated depending on whether the sample contains the small or the large values. Consequently, if any of these extreme values are selected in the sample, the estimator can produce misleading conclusions. To overcome this issue, [37] suggested the unbiased estimator given in equation (1).
The MSE of $\hat{y}_{ss}$ at the unknown value of $c$ is given in equation (2), where $\mathrm{Var}(y) = \varphi s^2_{u11}$. Ref. [11] recommended an estimator of the population total under PPS, given in equation (3), where $p_i = (c x_i + n)/(c x_i + N n)$. For estimation of the population mean, equation (3) can be rewritten accordingly. The variance of $y_{pps}$ is given in equation (4). The ratio and product estimators [38,39] are given in equations (5) and (6), and the MSEs of $y_{RT,PPS}$ and $y_{PT,PPS}$ are given in equations (7) and (8), respectively. The regression estimator is given in equation (9), and its variance is given in equation (10).
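The extreme-value adjustment attributed to [37] above (add a constant when the sample contains the minimum but not the maximum, subtract it in the opposite case, and leave the mean unchanged otherwise) can be checked by brute force. The sketch below assumes SRSWOR and unique extreme values, and it uses c = (y_max - y_min) / (2n), the classical optimum for this setting; that value is an assumption here, since equation (2) is not reproduced in the text. The function names and the population values are hypothetical.

```python
from itertools import combinations


def adjusted_mean(sample, y_min, y_max, c):
    """Sample mean shifted by +c or -c depending on which extreme
    value (assumed unique in the population) the sample contains."""
    y_bar = sum(sample) / len(sample)
    has_min = y_min in sample
    has_max = y_max in sample
    if has_min and not has_max:
        return y_bar + c
    if has_max and not has_min:
        return y_bar - c
    return y_bar


def enumerate_estimates(population, n, c):
    """Evaluate the estimator on every possible SRSWOR sample of size n."""
    y_min, y_max = min(population), max(population)
    return [adjusted_mean(s, y_min, y_max, c)
            for s in combinations(population, n)]


def mean_and_variance(values):
    """Mean and variance over an equally likely set of outcomes."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


# Hypothetical population with one very large value.
population = [2, 5, 6, 7, 40]
n = 3
c_opt = (max(population) - min(population)) / (2 * n)  # assumed optimum

plain_mu, plain_var = mean_and_variance(enumerate_estimates(population, n, 0.0))
adj_mu, adj_var = mean_and_variance(enumerate_estimates(population, n, c_opt))
```

Because the two adjustment cases are equally likely under SRSWOR, both versions average to the population mean over all samples, while the adjusted version has the smaller sampling variance for this population.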

Suggested estimators
Some real data sets include extreme values, either very large or very small, and the efficiency of estimators may suffer in their presence. For example, when measuring the average export of goods, China may produce a large quantity of goods for the international market owing to new technology and the improved skills of its people, whereas Pakistan produces a small quantity owing to poor management and lack of technology. Similarly, if we wish to know the average yearly wheat production in our country, we find that wheat production in Punjab is extremely large compared with the other provinces. To deal with such extreme values, and taking motivation from Refs. [17,18], we suggest improved ratio, product, and regression type estimators for double-phase PPS sampling in the presence of extreme values. The recommended improved estimators are presented for three different situations.
Situation-I: Mean per unit estimator. The estimator is given in equation (11). The optimal value of C and the minimum variance at this value of C are given in equation (12).
Situation-II: When u and v are positively correlated. When the correlation between u and v is positive, the selection of a minimum value of u is expected to be accompanied by a minimum value of v, and the selection of a maximum value of v by a maximum value of u. In this scenario, we suggest the improved ratio type estimator given in equation (13), which is defined separately for three cases:
(i) the sample includes small values of $u_{i11}$ and $v_{i12}$;
(ii) the sample includes large values of $u_{i11}$ and $v_{i12}$;
(iii) all other samples.
Here $c_1$ and $c_2$ are constants whose values are to be determined under optimality conditions. The corresponding regression estimator is given in equation (14), with an adjusted form if the sample contains the maximum values of u and v, and $(u_{c11} = u_{11}, v_{c21} = v_{11})$ for all other samples.
Situation-III: When u and v are negatively correlated. When u and v are negatively correlated, the selection of a large value of v is expected to be accompanied by a small value of u, and the selection of a small value of v by a large value of u. In this scenario, we suggest the improved product type estimator given in equation (15), which is defined separately for three cases:
(i) the sample includes a small value of $u_{i11}$ and large values of $v_{i12}$;
(ii) the sample includes a large value of $u_{i11}$ and small values of $v_{i12}$;
(iii) all other samples.
The corresponding regression estimator is given in equation (16), with an adjusted form if the sample contains the maximum values of u and v, and $(u_{c12} = u_{11}, v_{c22} = v_{11})$ for all other samples.
To obtain the biases and MSEs, we define the relative error terms and their expectations. Expressing (12) in terms of the e's and taking expectations on both sides gives the bias, where $R = Y/X$. Unique values of $c_1$ and $c_2$ cannot be obtained, because there is one equation and two unknowns.
Substituting the optimal values $c_{1(optimal)}$ and $c_{2(optimal)}$, the minimum MSE of $\hat{Y}_{RT,PPS}$ is given in equation (17). The bias of the product type estimator is obtained similarly, and its MSE is given in equation (18). In the case of positive correlation, the variance of $y_{T,Reg1,PPS}$ is given in equation (19); in the case of negative correlation, the variance of $y_{T,Reg2,PPS}$ is given in equation (20). In general, the variance of the regression estimator can be written as given in equation (21).

Efficiency comparison
In this section, we compare the suggested estimators theoretically with their existing counterparts.

Numerical investigation
We used three data sets to compare the efficiency of the suggested estimators with that of their existing counterparts. The data sets are described below.
Data-I [Source: [24]]: Y = expected number of fish caught during 1995, X = expected number of fish caught during 1994, Z = expected number of fish caught during 1993.
Data-II [Source: [24]]: Y = expected number of fish caught during 1995, X = expected number of fish caught during 1993, Z = expected number of fish caught during 1992.
Data-III [Source: [40]]: Y = output for 80 yard, X = fixed capital in a region, Z = number of laborers.

Discussion
As previously mentioned, we evaluated the performance of the suggested estimators using three real data sets. The proposed estimators are compared numerically and theoretically with their existing counterparts. Summary statistics of the real data sets are given in Tables 1-3. The MSE and PRE of the proposed estimators and their existing counterparts are displayed in Tables 4 and 5. In terms of MSE and PRE, the suggested estimators are more efficient than the existing counterparts. The gain for data 2 is greater than for data 1 and data 3. Fig. 1 compares the estimators in terms of MSE, with the estimators on the X-axis and the MSE values on the Y-axis. The smaller the MSE, the more efficient the estimator, so the efficiency of an estimator can be read from the trend of the lines: the line graph slopes downward toward the estimators with the minimum MSE. Fig. 2 compares the estimators in terms of percentage relative efficiency (PRE), with the estimators on the X-axis and the PRE values on the Y-axis. The higher the PRE, the better the estimator. Compared with their counterparts, the proposed estimators attain the highest PRE, and the trend line rises accordingly.
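The relationship between the MSE and PRE comparisons discussed above can be sketched as follows. The sketch assumes the usual definition, PRE = 100 x MSE of a baseline estimator / MSE of the estimator in question; the paper's exact baseline is not stated in this section. The estimator labels and MSE values below are hypothetical placeholders, not results from Tables 4 and 5.

```python
def percentage_relative_efficiency(mse_by_estimator, baseline):
    """Return PRE for each estimator relative to the chosen baseline:
    100 * MSE(baseline) / MSE(estimator)."""
    base = mse_by_estimator[baseline]
    return {name: 100.0 * base / mse
            for name, mse in mse_by_estimator.items()}


# Hypothetical MSE values for illustration only.
mse = {
    "mean_per_unit": 4.0,
    "ratio": 2.5,
    "proposed_ratio": 2.0,
}
pre = percentage_relative_efficiency(mse, baseline="mean_per_unit")
best = max(pre, key=pre.get)  # estimator with the highest PRE
```

Under this definition the baseline always has a PRE of 100, and an estimator with half the baseline MSE has a PRE of 200, so ranking by increasing MSE and ranking by decreasing PRE give the same ordering.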

Conclusion
In this paper, we have recommended improved ratio, product, and regression type estimators of the finite population mean in double-phase PPS sampling in the presence of extreme values. The mathematical expressions of their properties are derived up to the first order of approximation. The purpose of this proposal is to improve the accuracy and precision of mean estimation relative to existing estimators. To evaluate the efficiency of the recommended estimators, we conducted a comparative analysis with several existing counterparts, with the aim of demonstrating their distinctiveness and superiority. We used three real data sets to obtain the MSEs and PREs. The numerical results show that the recommended estimators perform well, with lower mean square error and higher PRE. The empirical efficiency comparisons confirm that the proposed estimators perform more effectively than the traditional estimators. The recommended estimators showed the greatest gain in efficiency and are expected to perform well in applied surveys. The current work can be readily extended to yield an improved family of estimators under stratified random sampling and measurement error, using auxiliary information or attributes, for estimation of the population mean and variance. It would also be interesting to examine the efficiency of the recommended estimators in more complex survey settings, such as clustered and stratified sampling.

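Situation-I mean per unit estimator, equation (11):
\[
\begin{cases}
u_{11} + c, & \text{if the selected observation included small value of } u_{i11}, \\
u_{11} - c, & \text{if the selected observation included large value of } u_{i11}, \\
u_{11}, & \text{if the selected observation included other values.}
\end{cases}
\]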

Fig. 1. MSE of the suggested and existing estimators. The Y-axis shows the mean square error values and the X-axis shows the estimators.

Fig. 2. PRE of the suggested and existing estimators. The Y-axis shows the PRE values and the X-axis shows the estimators.
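Estimator of [37], equation (1):
\[
\hat{y}_{ss} =
\begin{cases}
\bar{y} + c, & \text{if the sample contains only minimum, not maximum values}, \\
\bar{y} - c, & \text{if the sample contains only maximum, not minimum values}, \\
\bar{y}, & \text{if the sample contains all other values.}
\end{cases}
\]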

Table 4
MSE of the existing and suggested estimators.

Table 5
PRE of the existing and suggested estimators.