A modified regression-cum-ratio estimator for finite population mean in presence of nonresponse using ranked set sampling

Several situations arise where decisions must be made about some characteristic of an asymmetrical population, for example, estimating the weekly number of server breakdowns at a company. Estimation methods based upon classical sampling designs are not suitable in such situations, and specialized methods and/or estimators are required. Ranked set sampling is a procedure that is suitable in such situations. In this paper, a new estimator is proposed that can be used to estimate population characteristics in the case of asymmetrical populations. The proposed estimator is useful for estimating the population mean in the presence of non-response in the study variable by using the ranked set sampling procedure. The estimator is based upon two auxiliary variables to reduce the effect of asymmetry. The use of two auxiliary variables is also helpful in minimizing the variation in the estimation of the population mean of the study variable. The ranked set sampling procedure is used to get better accuracy from a small sample, as the actual measurements may be time-consuming, expensive, or difficult to obtain. The use of ranked set sampling also reduces the effect of asymmetry in the characteristics under study. The expressions for the mean square error and bias of the proposed estimator have been derived. The performance of the proposed estimator is evaluated by using real-life data, and a simulation study is carried out to get an overview of its efficiency. The relative efficiency of the proposed estimator is compared with some existing


Introduction
There are several situations where some characteristics are to be estimated for a highly skewed or asymmetrical population. A sample drawn from such a population by simple random sampling may lead to unreliable results. Ranked set sampling (RSS) is a more appropriate method for drawing the sample in such situations, as it is based upon ordering a large number of sampling units on the basis of their relative sizes and then selecting a smaller number of units across all ranks for actual measurement. RSS thus improves the estimates by decreasing the sampling error. The improvement in estimation is also subject to the availability of suitable auxiliary variables that are correlated with the variable under study.
Ranked set sampling was first introduced by [1], where the units of the study variable are ranked by using simple judgment. An unbiased estimator of the population mean under ranked set sampling has been proposed by [2]. The use of an auxiliary variable has also played an important role in ranking the study variable, as proposed by [3,4]. An auxiliary variable is also useful for reducing the sampling variance and improving efficiency. Several authors have argued for the use of auxiliary variables in estimation to obtain more efficient results. It is pertinent to note that an increase in the number of auxiliary variables generally increases the efficiency of the estimates. The use of two auxiliary variables has been studied by several authors in the case of simple random and ranked set sampling. Some notable references in this regard are [5][6][7][8][9].
Sometimes information on some of the units of the study variable is not available; for example, we may not be able to record information about some sensitive characteristics from some of the units. In such cases non-response occurs, which calls for specialized treatment. Usually, the problem of non-response is reduced by using a subsampling technique, and various authors have proposed estimators to handle this situation. A subsampling approach to adjust for nonresponse in a survey has been proposed by [10]. The problem of non-response can also be reduced by using the information on some auxiliary variable with full response, as discussed by [11]. Some other notable references on dealing with non-response are [12][13][14][15][16][17][18][19].
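The subsampling idea can be illustrated with a minimal Python sketch. It assumes the classical form in which the n1 first-attempt respondents and a random subsample of about n2/k of the n2 nonrespondents are combined with weights n1/n and n2/n. The function name `hansen_hurwitz_mean`, its arguments, and the use of NumPy are illustrative assumptions and are not taken from [10] or from this paper.

```python
import numpy as np

def hansen_hurwitz_mean(y_resp, y_nonresp, k, rng):
    """Subsampling estimate of a mean under nonresponse (illustrative sketch).

    y_resp    : values from the n1 units that responded at the first attempt
    y_nonresp : values of the n2 initial nonrespondents; only a subsample is used
    k         : inverse subsampling rate; about n2 / k nonrespondents are re-contacted
    rng       : a numpy.random.Generator
    """
    y_resp = np.asarray(y_resp, dtype=float)
    y_nonresp = np.asarray(y_nonresp, dtype=float)
    n1, n2 = y_resp.size, y_nonresp.size
    if n2 == 0:                      # no nonresponse: ordinary sample mean
        return y_resp.mean()
    r = max(1, int(round(n2 / k)))   # size of the follow-up subsample
    sub = rng.choice(y_nonresp, size=r, replace=False)
    # weight the two groups by their shares of the original sample
    return (n1 * y_resp.mean() + n2 * sub.mean()) / (n1 + n2)

rng = np.random.default_rng(0)
estimate = hansen_hurwitz_mean([1.0, 2.0, 3.0], [4.0, 5.0], k=2, rng=rng)
```

With k = 2, half of the nonrespondents are followed up; a larger k lowers the follow-up cost at the price of a larger variance.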
Estimation in ranked set sampling aims to reduce the sampling variance, and new estimators have been proposed from time to time by various authors for this purpose. A class of ratio-in-exponential type estimators in ranked set sampling has been proposed by [20] by using the information of two auxiliary variables. Ratio-cum-product estimators of the finite population mean in ranked set sampling based on a single auxiliary variable have been proposed by [21]. Median and neoteric ranked set sampling provide more efficient estimates than classical ranked set sampling, and [22] have proposed some regression estimators in median and neoteric ranked set sampling. These estimators are based upon one and two auxiliary variables. A transformation of the auxiliary variable is sometimes helpful in improving the efficiency of the estimates, as discussed by [23], where the authors have proposed a ranked set sampling estimator by using a transformation of the auxiliary variable. A novel log-type class of estimators using RSS has been proposed by [24].
In some situations, sensitive auxiliary information is to be used and non-response is evident for such an auxiliary variable. Such situations usually arise in medical research where the study variable is related to post-operative conditions or transplantation of organs and observations are difficult to collect or are missed. Such situations also affect the sample size, and in this scenario the ranked set sampling technique is suitable. In this paper, we propose a new regression-cum-ratio estimator for estimating the population mean when the study variable is sensitive or when it is difficult to obtain information about that variable with a normal sample size. The estimator uses the information of two auxiliary variables, with the view that one of the variables has a high correlation with the study variable.
The plan of the paper is as follows. A brief description of ranked set sampling with two auxiliary variables in the presence of nonresponse is given in Section 2. Some existing ranked set sampling estimators are given in Section 3. The new estimator is proposed in Section 4 alongside the expressions for the mean square error and bias of the proposed estimator. The results of the simulation study are given in Section 5 alongside the relative efficiency of the proposed estimator as compared with Mohanty's estimator. A numerical study is presented in Section 6, followed by conclusions in Section 7.

Ranked set sampling with two auxiliary variables in presence of non-response
Ranked set sampling is based upon ranking the variable of interest either visually or by any cost-independent ranking method. This type of sampling differs from conventional random sampling and hence requires some description, which is given in this section.
Suppose we have a finite population W of size N from which the sample is to be drawn. Suppose further that two auxiliary variables, X and Z, are known and are highly correlated with the study variable Y. Let $\mu_x$ and $\mu_z$ be the population means of X and Z, respectively. It is assumed that information about the study variable Y, with population mean $\mu_y$, is not easy to obtain and that nonresponse potentially exists, so auxiliary information is to be utilized. Suppose that the population W is divided into two subgroups W1 and W2 such that W = W1 + W2, and let information on $(X_i, Y_i, Z_i)$ be unavailable from W2 at the first attempt. Also, let $(x, z, y)$ . Continue this procedure on all m sets.
4) Repeat steps 1-3 r times, where r is the pre-decided number of cycles. In this way, $mr$ observations are selected using r cycles of $m^2$ observations. It is assumed that some non-response has occurred such that complete response is available on a sample $s_1 \subset W_1$ of size $n_1$, and $n_2$ is the number of observations of the study variable that are not recorded from the first sample. Also, assume that incomplete response is available on $s_2 \subset W_2$ observations. The additional steps of ranked set sampling are based upon the procedures given by [10] and [11] and are given below.
5) Following [10], consider ( )
6) Draw a second phase sample of size s2 and record information on X, Y and Z.
Some notations to be used in ranked set sampling are:
The means and variances of auxiliary variables are:
The variances for ranked set sampling are
The estimator of the population mean, proposed by [24], in the case of ranked set sampling is
The sample mean of the study variable, in RSS, for each rank j is written
The sample means of the two auxiliary variables X and Z at a specific rank j are ( )
The above joint expectations are equal to specific covariances and are useful in deriving the mean square error of an estimator in ranked set sampling.
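The cyclic selection described above can be sketched in Python as follows. This is a minimal illustration of textbook ranked set sampling with ranking on one auxiliary variable, and it omits the nonresponse steps. The function name `ranked_set_sample` and the choice to rank on X are assumptions made for illustration only.

```python
import numpy as np

def ranked_set_sample(x, m, r, rng):
    """Return indices of m*r units chosen by ranked set sampling (sketch).

    x   : auxiliary variable used only for ranking (assumed cheap to observe)
    m   : set size; each cycle examines m*m units and keeps m of them
    r   : number of cycles
    rng : a numpy.random.Generator
    """
    x = np.asarray(x, dtype=float)
    chosen = []
    for _ in range(r):
        # draw m*m distinct units for this cycle and split them into m sets of size m
        sets = rng.choice(x.size, size=m * m, replace=False).reshape(m, m)
        for i in range(m):
            order = np.argsort(x[sets[i]])     # rank the units of set i on x
            chosen.append(sets[i][order[i]])   # keep the unit holding rank i + 1
    return np.array(chosen)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
idx = ranked_set_sample(x, m=3, r=4, rng=rng)  # 12 indices; measure Y and Z on these units
```

Units are distinct within a cycle but may recur across cycles in this sketch; a design that forbids repeats would draw all r cycles from a single permutation.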

Some existing estimators
In this section, we discuss some existing estimators in ranked set sampling.
1) A ratio-cum-product estimator by [21]. The estimator is
where the constant appearing in the estimator is suitably chosen. The MSE and Bias are:
and
2) A class of ratio-in-exponential-type estimators proposed by [20]. The estimator is ( )
3) A class of regression-cum-ratio estimators of the population mean in ranked set sampling by [23] ( )
where the two constants appearing in the estimator are known and can be the population coefficient of kurtosis, skewness, variation, correlation, etc. Also
Different estimators have been proposed by [23] by using different values of these constants in (14).
In the following section, we propose a new estimator for the population mean in the case of ranked set sampling. The estimator is proposed by using information on two auxiliary variables. The estimator is motivated by the fact that an additional auxiliary variable having a higher correlation with the study variable will improve the efficiency of the estimates. These types of estimators are useful for the estimation of population characteristics in different types of studies; for example, in information technology we may be interested in estimating the average time taken by a server to process a task by using the information on the processors and the installed RAM. The proposed estimator will also be suitable when there is non-response on some auxiliary variables.
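For orientation, the textbook ratio and regression estimators, and one generic way of combining them into a regression-cum-ratio form, can be written as below. These are standard illustrative forms and are not the specific estimators of [20], [21] or [23], nor the estimator proposed in this paper. The notation is an assumption made here: $\bar{y}$, $\bar{x}$ and $\bar{z}$ denote sample means, $\mu_x$ and $\mu_z$ denote the known population means of X and Z, and $b$ denotes the sample regression coefficient of Y on X.

```latex
\bar{y}_{R} = \bar{y}\,\frac{\mu_x}{\bar{x}}, \qquad
\bar{y}_{lr} = \bar{y} + b\,(\mu_x - \bar{x}), \qquad
\bar{y}_{RR} = \bigl[\bar{y} + b\,(\mu_x - \bar{x})\bigr]\,\frac{\mu_z}{\bar{z}}
```

The first form rescales the sample mean by how far $\bar{x}$ falls from its known mean, the second applies an additive correction instead, and the third uses one auxiliary variable for each kind of adjustment.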

A new ranked set sampling estimator with two auxiliary variables
In the following, we propose a new estimator in ranked set sampling with two auxiliary variables. The proposed estimator is
etc., the mean square error can be written as
It is to be noted that the bias of the estimator is not zero.
In the following section, a simulation study is conducted to assess the performance of the proposed estimator.

Simulation study
A simulation study has been conducted to assess the performance of the proposed estimator as compared with an existing estimator. The simulation study has been conducted by generating random observations from the normal distribution. We have generated artificial populations of size N = 5000 on the auxiliary variables from the N(0, 1) distribution. A response rate of 60% is considered, giving a nonresponse rate of 40%. We have drawn subsamples from the non-responding group. This procedure is repeated 20,000 times, giving 20,000 values of each estimator. Mohanty's estimator is also computed by using the above-mentioned procedure. The mean square errors of the proposed estimator and Mohanty's estimator are computed by using the 20,000 values of each estimator at different combinations of the parameters. The mean square errors of the proposed estimator and its efficiency relative to Mohanty's estimator are given in Table 1. Table 1 shows that the proposed estimator is considerably more efficient than Mohanty's estimator.
We have also obtained the mean square error of the proposed estimator for different combinations of the correlations between the study variable and the auxiliary variables. These mean square errors are given in Table 2. Table 2 shows that, for small values of m, the mean square error decreases as the correlation coefficient between X and Y increases.
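The Monte Carlo logic behind such comparisons can be sketched in Python as follows. The sketch is only a stand-in: it compares the plain sample mean under simple random sampling with the plain sample mean under ranked set sampling, because neither the proposed estimator nor Mohanty's estimator is reproduced here, and it omits the nonresponse step. The function names, the default settings (m = 4, r = 5, correlation 0.8) and the reduced number of replications are assumptions chosen for illustration.

```python
import numpy as np

def rss_indices(x, m, r, rng):
    """Indices of m*r units selected by ranked set sampling, ranking on x."""
    chosen = []
    for _ in range(r):
        sets = rng.choice(x.size, size=m * m, replace=False).reshape(m, m)
        for i in range(m):
            order = np.argsort(x[sets[i]])
            chosen.append(sets[i][order[i]])
    return np.array(chosen)

def simulate(reps=2000, N=5000, m=4, r=5, rho=0.8, seed=1):
    """Empirical MSEs of two stand-in estimators and their relative efficiency."""
    rng = np.random.default_rng(seed)
    # artificial population: auxiliary x ~ N(0, 1), study variable y correlated with x
    x = rng.normal(0.0, 1.0, N)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, N)
    mu_y = y.mean()                       # target: finite population mean of y
    n = m * r
    est_srs = np.empty(reps)
    est_rss = np.empty(reps)
    for t in range(reps):
        est_srs[t] = y[rng.choice(N, size=n, replace=False)].mean()
        est_rss[t] = y[rss_indices(x, m, r, rng)].mean()
    mse_srs = np.mean((est_srs - mu_y) ** 2)
    mse_rss = np.mean((est_rss - mu_y) ** 2)
    return mse_srs, mse_rss, mse_srs / mse_rss

mse_srs, mse_rss, rel_eff = simulate()
print(f"MSE(SRS)={mse_srs:.5f}  MSE(RSS)={mse_rss:.5f}  RE={rel_eff:.2f}")
```

A relative efficiency above one indicates that the second estimator has the smaller empirical mean square error; replacing the two sample means with the estimators of interest and adding the response mechanism would give a study of the kind summarized in Tables 1 and 2.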

Numerical study
In this section, we give a real data example for the proposed estimator. For this, we have used a population of 60 patients with type-II diabetes [25]. The study has been conducted by using the resistive index of the renal artery as the study variable, Y, alongside two auxiliary variables: blood urea, X, and serum creatinine, Z. The study variable had obvious non-response, and hence the proposed estimator is suitable for estimating its mean. The calculated values are 54.55 We can see that the proposed estimator again has a much smaller mean square error than Mohanty's estimator.

Conclusions
In this paper, we have proposed a new regression-cum-ratio estimator of the mean by using the ranked set sampling approach. The estimator has been proposed by using the information of two auxiliary variables. The proposed estimator is useful for the estimation of population characteristics when the study variable has non-response. The expression for the mean square error of the proposed estimator has been obtained. A simulation study has been conducted to assess the performance of the proposed estimator as compared with Mohanty's estimator. The simulation study has been conducted by using different subsampling fractions for the non-respondents. Specifically, the fractions used are 50% (k = 2) and 25% (k = 4). The mean square error of the proposed estimator is compared with that of Mohanty's estimator by using three different settings. We have seen that, for all the combinations of m and r, the proposed estimator is better than Mohanty's estimator.

Conflict of interest
The author declares no conflict of interest in this paper.