Nonparametric Estimation of Quantile and Quantile Density Function

In this article, we derive a new and unique method of estimating the quantile and quantile density functions, based on moments of fractional order statistics. The proposed estimators are compared with existing popular nonparametric quantile and quantile density estimators in terms of mean squared error (MSE) for censored and uncensored data. Recommendations for the choice of quantile and/or quantile density estimators are given.

Citation: Yang X, Hutson AD, Wang D (2017) Nonparametric Estimation of Quantile and Quantile Density Function. J Biom Biostat 8: 356. doi: 10.4172/2155-6180.1000356


Introduction
The quantile function $Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}$, $0 < u < 1$, where $F(\cdot)$ is the cumulative distribution function (CDF) of a continuous random variable X, is an alternative to the probability density function (PDF), the CDF and the characteristic function for describing a probability distribution. The estimation of Q(u) is of great interest, especially when one is unwilling to assume a parametric form for the distribution or when the underlying distribution is skewed.
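To make the definition concrete, the following minimal Python sketch approximates the generalized inverse $Q(u) = \inf\{x : F(x) \ge u\}$ by bisection for a known CDF. The function name `quantile_from_cdf`, its search bracket and tolerance are illustrative assumptions; the example uses the standard exponential distribution, whose quantile function is $Q(u) = -\log(1-u)$.

```python
import math


def quantile_from_cdf(cdf, u, lo=-1e3, hi=1e3, tol=1e-10):
    """Approximate Q(u) = inf{x : F(x) >= u} by bisection on the bracket [lo, hi].

    Assumes cdf is non-decreasing with cdf(lo) < u <= cdf(hi).
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= u:
            hi = mid  # mid belongs to {x : F(x) >= u}, so the infimum is <= mid
        else:
            lo = mid  # F(mid) < u, so the infimum lies above mid
    return hi


def exp_cdf(x):
    """CDF of the standard exponential distribution (illustrative choice)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


# Compare the numerical inverse with the closed form -log(1 - u).
print(quantile_from_cdf(exp_cdf, 0.5), -math.log(1 - 0.5))
```

Returning the upper end of the final bracket keeps the answer inside the set $\{x : F(x) \ge u\}$, which matches the infimum definition also for CDFs with flat stretches.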
Many nonparametric estimators of the quantile function have been proposed and studied extensively. For uncensored data, the simplest method is the empirical quantile (EQ) estimator based on a single order statistic. Because it is a piecewise constant function, it does not provide a useful quantile density function estimator. Details about the advantages of smoothed quantile estimators can be found in Cheng and Parzen [7]. Numerous smoothed quantile function estimators have been introduced; only the most representative ones are outlined here. The most commonly used estimator is the linear interpolation of successive order statistics, which is employed in applications such as Q-Q plots and in popular software packages such as SAS, BMDP, and MINITAB. Parzen [1] developed kernel smoothing of the EQ estimator, well known as the kernel quantile estimator, which has been extensively studied and analyzed [3-5,8]. More complete literature reviews on kernel-based quantile estimators can be found in Sheather and Marron [8] and Cheng and Parzen [7]. However, kernel quantile estimators are in general complicated and analytically intractable, and their performance in terms of MSE is very sensitive to the choice of bandwidth. In addition, the approximations to kernel estimators may violate the monotonicity requirement, as described by Yang [5]; this issue also appears in our simulation study and in the real data application in Sections 5 and 6. Generalized order statistics were considered as alternatives to sample quantiles by Harrell and Davis [14] and Kaigh and Lachenbruch [15]. Huang [16] proposed a modification of the Harrell-Davis (HD) estimator based on a weighting scheme that uses the level crossing empirical distribution function.
In the presence of right-censored data, the product-limit quantile (PLQ) estimator proposed by Sander [17] and its general kernel-smoothed version by Padgett [18] have gained the most popularity in the literature, although they share the drawbacks of their counterparts for uncensored data. More recently, Wang et al. [19] extended the HD quantile function estimator to censored data and proposed an exact bootstrap procedure for optimization with respect to MSE-related criteria.
In the same way that the CDF can be differentiated to give the PDF, Parzen [1] and Jones [20] defined the derivative of Q(u), $q(u) = Q'(u)$, as the quantile density function. Common applications of q(u) include, but are not limited to, constructing asymptotic confidence intervals for sample quantiles, inference procedures based on linear rank statistics (Hettmansperger [21]), and the quantile density based approach to the location-scale problem (Eubank [22]).
To estimate the quantile density function q(u) from either censored or uncensored data, two main approaches can be applied. One is to differentiate a quantile estimator (if it is differentiable); the other is to take the reciprocal of an estimate of the density quantile function $f(Q(u))$, since differentiating both sides of $F(Q(u)) = u$ gives $q(u) = 1/f(Q(u))$. The former is more efficient than the latter; see Jones [20] for a comparison of the two methods. Kernel smoothing of the reciprocal of the density quantile function has also been considered [23]. As mentioned before, the PLQ and EQ estimators fail to provide useful quantile density estimators. However, the linear interpolation of two successive order statistics is differentiable, and the resulting quantile density estimator is the histogram-type estimator of Siddiqui [24]. The derivative of the kernel quantile estimator by Parzen [1] was introduced by Falk [25], and Xiang [26] proposed a natural derivative of the quantile estimator by Padgett [18]. More reviews on quantile density estimators can be found in Cheng and Parzen [7].
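The identity $q(u) = Q'(u) = 1/f(Q(u))$ behind the two approaches can be checked numerically for a distribution with known closed forms. In this sketch the standard exponential distribution is chosen only for illustration, and the helper names are hypothetical: a central finite difference of Q(u) is compared with the reciprocal $1/f(Q(u))$, and both should approximate $q(u) = 1/(1-u)$.

```python
import math


def numeric_derivative(func, u, step=1e-5):
    """Central finite-difference approximation to func'(u)."""
    return (func(u + step) - func(u - step)) / (2.0 * step)


def Q(u):
    """Standard exponential quantile function, Q(u) = -log(1 - u)."""
    return -math.log(1.0 - u)


def f(x):
    """Standard exponential density, f(x) = exp(-x) for x >= 0."""
    return math.exp(-x)


def q_direct(u):
    """Quantile density as the derivative of the quantile function, Q'(u)."""
    return numeric_derivative(Q, u)


def q_reciprocal(u):
    """Quantile density as the reciprocal of the density quantile, 1 / f(Q(u))."""
    return 1.0 / f(Q(u))


for u in (0.25, 0.5, 0.75):
    print(u, q_direct(u), q_reciprocal(u), 1.0 / (1.0 - u))
```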
In this article we take a novel approach to quantile and quantile density function estimation based on estimating moments of fractional order statistics and solving a set of simultaneous equations pertaining to a series of moment expansions. We studied and compared the performance of our estimators with the EQ estimator, the PLQ estimator, the kernel smoothing of the EQ and PLQ estimators, the piecewise linear estimator, and their corresponding quantile density estimators (where they exist), for censored and uncensored data. These competitors were chosen because they are commonly used for quantile and quantile density estimation. The advantages of our method are as follows. First, it does not require selection of an optimal bandwidth and can therefore be more stable than common kernel-based methods. Second, it outperforms at least the PLQ and the piecewise linear quantile estimators in terms of MSE across all simulation settings we considered, and it appears to preserve the monotonicity of the quantile function curve. Third, the associated quantile density function estimator yields the smallest MSE among all quantile density estimators considered, for both censored and uncensored data.
In Sections 2 and 3, we outline the existing methods of quantile and quantile density estimation considered in this investigation. Our new quantile and quantile density estimators are introduced in Section 4. The performance of our estimators is illustrated in terms of MSE by a Monte Carlo simulation study in Section 5, followed in Section 6 by an application to the switch life data reported by Nair [27]. Recommendations for the choice of quantile and/or quantile density estimators are summarized in Section 7.
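Let $T_{(1)} \le T_{(2)} \le \dots \le T_{(n)}$ denote the ordered observed times and $\delta_{(i)}$ the corresponding censoring indicators. A value of $\delta_{(i)} = 1$ indicates that $T_{(i)}$ is uncensored, while a value of $\delta_{(i)} = 0$ indicates that $T_{(i)}$ is censored. The methods described below can be readily applied to uncensored data by setting all $\delta_{(i)} = 1$.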
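Then the well-known PLQ estimator of Sander [17] is defined as

$$\hat{Q}_{PL}(u) = \inf\{t : \hat{F}_{PL}(t) \ge u\}, \qquad \hat{F}_{PL}(t) = 1 - \hat{S}_{PL}(t),$$

where $\hat{S}_{PL}$ is the common PL estimator of the survival function by Kaplan and Meier [28]:

$$\hat{S}_{PL}(t) = \begin{cases} 1, & t < T_{(1)}, \\ \displaystyle\prod_{i:\,T_{(i)} \le t} \left(\frac{n-i}{n-i+1}\right)^{\delta_{(i)}}, & T_{(1)} \le t < T_{(n)}, \\ 0, & t \ge T_{(n)}. \end{cases} \tag{3}$$

Defining $\hat{S}_{PL}(t) = 0$ for $t \ge T_{(n)}$ was studied by Xiang [26] with respect to convergence properties for a class of kernel quantile function estimators, and was shown to be a more technically suitable definition in terms of large-sample theory than leaving $\hat{S}_{PL}(t)$ undefined.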
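A minimal sketch of the PLQ estimator follows, assuming no tied observations and using the convention $\hat{S}_{PL}(t) = 0$ for $t \ge T_{(n)}$. The function names `kaplan_meier` and `plq` are hypothetical, and the small tolerance `eps` is an added assumption that only guards against floating-point round-off when u falls exactly on a jump of $\hat{F}_{PL}$.

```python
def kaplan_meier(times, deltas):
    """Product-limit survival estimate evaluated at each ordered observation T_(i).

    Assumes no ties; deltas[i] = 1 for an observed event, 0 for a censored time.
    Uses the convention S_PL(t) = 0 for t >= T_(n).
    """
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    t_sorted = [t for t, _ in pairs]
    surv, s = [], 1.0
    for i, (_, d) in enumerate(pairs, start=1):
        if d == 1:
            s *= (n - i) / (n - i + 1)  # factor ((n - i) / (n - i + 1))^delta_(i)
        surv.append(s)
    surv[-1] = 0.0  # S_PL(t) = 0 for t >= T_(n), even if T_(n) is censored
    return t_sorted, surv


def plq(times, deltas, u, eps=1e-12):
    """PLQ estimator: smallest T_(i) with 1 - S_PL(T_(i)) >= u."""
    t_sorted, surv = kaplan_meier(times, deltas)
    for t, s in zip(t_sorted, surv):
        if 1.0 - s >= u - eps:
            return t
    return t_sorted[-1]  # not reached, since S_PL(T_(n)) = 0


times, deltas = [1, 2, 3, 4, 5], [1, 0, 1, 1, 0]
print(kaplan_meier(times, deltas)[1], plq(times, deltas, 0.5))
```

With all `deltas` equal to 1, `plq` returns the EQ estimate $X_{(i)}$ for $(i-1)/n < u \le i/n$.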
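In the absence of censoring, $\hat{Q}_{PL}(u)$ reduces to the EQ estimator

$$\hat{Q}_{EQ}(u) = X_{(i)}, \qquad \frac{i-1}{n} < u \le \frac{i}{n}, \quad i = 1, \dots, n,$$

where $X_{(i)}$ denotes the i-th order statistic. The linear interpolation estimator of the quantile function given uncensored data is denoted as

$$\hat{Q}_{L}(u) = X_{(i)} + n\left(u - \frac{i}{n}\right)\left(X_{(i+1)} - X_{(i)}\right), \qquad \frac{i}{n} \le u < \frac{i+1}{n}, \quad i = 1, \dots, n-1.$$

The kernel smoothing of the PLQ estimator by Padgett [18] is

$$\hat{Q}_{KPL}(u) = \frac{1}{h} \sum_{i=1}^{n} s_i\, T_{(i)}\, K\!\left(\frac{u - \hat{F}_{PL}(T_{(i)})}{h}\right), \tag{6}$$

where $s_i$ is the jump of (3) at $T_{(i)}$, h is the bandwidth, and $K(\cdot)$ is a symmetric kernel function. If there is no censoring, the estimator in eqn. (6) reduces to the general kernel smoothing of the EQ estimator by Parzen [1],

$$\hat{Q}_{KEQ}(u) = \frac{1}{nh} \sum_{i=1}^{n} X_{(i)}\, K\!\left(\frac{u - i/n}{h}\right).$$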
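The linear interpolation and kernel-smoothed EQ estimators can be sketched for uncensored data as below. This is an illustration under stated assumptions, not the authors' code: it uses the Epanechnikov kernel (the kernel adopted in the simulation study of Section 5), a user-supplied bandwidth `h` with no bandwidth selection, and hypothetical function names.

```python
def q_linear(x, u):
    """Linear interpolation estimator Q_L(u) for i/n <= u < (i+1)/n, i = 1, ..., n-1."""
    xs = sorted(x)
    n = len(xs)
    i = int(u * n)  # 1-based index i with i/n <= u < (i+1)/n
    if not 1 <= i <= n - 1:
        raise ValueError("u must satisfy 1/n <= u < 1")
    lo, hi = xs[i - 1], xs[i]  # X_(i) and X_(i+1) in 0-based storage
    return lo + n * (u - i / n) * (hi - lo)


def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on |x| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def q_kernel_eq(x, u, h):
    """Kernel-smoothed EQ estimator (1 / (n h)) * sum_i X_(i) K((u - i/n) / h)."""
    xs = sorted(x)
    n = len(xs)
    total = sum(xi * epanechnikov((u - i / n) / h) for i, xi in enumerate(xs, start=1))
    return total / (n * h)


print(q_linear([3, 1, 2, 5, 4], 0.5), q_kernel_eq(list(range(1, 101)), 0.5, 0.1))
```

Because this form is a discrete approximation to the kernel integral, its kernel weights need not sum exactly to one, which is one route by which such approximations can lose monotonicity in u.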

Estimation of q(u)
For $\frac{i}{n} \le u < \frac{i+1}{n}$, $i = 1, 2, \dots, n-1$, the first-order derivative of $\hat{Q}_{L}(u)$ is

$$\hat{q}_{L}(u) = n\left(X_{(i+1)} - X_{(i)}\right).$$

This is called the spacings of the sample (see Pyke [29,30]), or a histogram-type estimator, owing to the fact that $1/\hat{q}_{L}(u) = \left[n\left(X_{(i+1)} - X_{(i)}\right)\right]^{-1}$ is the density quantile function estimator based on finite differences introduced by Siddiqui [24].
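For a quick check of the spacings estimator, the sketch below evaluates $\hat{q}_L(u) = n(X_{(i+1)} - X_{(i)})$ and its reciprocal, the finite-difference density quantile estimate. It assumes uncensored data with no ties and $1/n \le u < 1$; the function names are illustrative.

```python
def q_spacings(x, u):
    """Histogram-type quantile density n (X_(i+1) - X_(i)) for i/n <= u < (i+1)/n."""
    xs = sorted(x)
    n = len(xs)
    i = int(u * n)  # 1-based index of the left order statistic
    if not 1 <= i <= n - 1:
        raise ValueError("u must satisfy 1/n <= u < 1")
    return n * (xs[i] - xs[i - 1])


def density_quantile(x, u):
    """Finite-difference estimate of f(Q(u)), the reciprocal of the spacings estimator."""
    return 1.0 / q_spacings(x, u)


data = [1, 2, 4, 7, 11]
print(q_spacings(data, 0.5), density_quantile(data, 0.5))
```

The estimate is constant on each interval $[i/n, (i+1)/n)$, which is why it is described as histogram-type.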
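Again, the PLQ and EQ estimators do not have corresponding quantile density estimators. The natural derivative of (6), $\hat{q}_{KPL}(u)$, was established by Xiang [26] as

$$\hat{q}_{KPL}(u) = \frac{1}{h^{2}} \sum_{i=1}^{n} s_i\, T_{(i)}\, K'\!\left(\frac{u - \hat{F}_{PL}(T_{(i)})}{h}\right),$$

where $s_i$ denotes the jump of $\hat{S}_{PL}$ at $T_{(i)}$ and $K'$ is the derivative of the kernel K.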
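The kernel quantile density estimator can be sketched by differentiating the kernel in the smoothed PLQ estimator. The following assumes the Epanechnikov kernel, whose derivative is $K'(x) = -1.5x$ on $|x| < 1$, no tied observations, and the convention $\hat{S}_{PL}(t) = 0$ for $t \ge T_{(n)}$. The function names and the kernel choice are illustrative assumptions rather than the exact implementation used in the study.

```python
def km_jumps(times, deltas):
    """Return sorted times, F_PL(T_(i)) = 1 - S_PL(T_(i)), and the jumps s_i of S_PL."""
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    surv, s = [], 1.0
    for i, (_, d) in enumerate(pairs, start=1):
        if d == 1:
            s *= (n - i) / (n - i + 1)
        surv.append(s)
    surv[-1] = 0.0  # S_PL(t) = 0 for t >= T_(n)
    t_sorted = [t for t, _ in pairs]
    jumps = [(1.0 if i == 0 else surv[i - 1]) - surv[i] for i in range(n)]
    cdf = [1.0 - v for v in surv]
    return t_sorted, cdf, jumps


def epanechnikov_deriv(x):
    """Derivative of K(x) = 0.75 (1 - x^2): K'(x) = -1.5 x on |x| < 1, else 0."""
    return -1.5 * x if abs(x) < 1.0 else 0.0


def q_kpl(times, deltas, u, h):
    """Kernel quantile density (1 / h^2) * sum_i s_i T_(i) K'((u - F_PL(T_(i))) / h)."""
    t_sorted, cdf, jumps = km_jumps(times, deltas)
    total = sum(s * t * epanechnikov_deriv((u - F) / h)
                for t, F, s in zip(t_sorted, cdf, jumps))
    return total / h ** 2


n = 100
print(q_kpl(list(range(1, n + 1)), [1] * n, 0.5, 0.105))
```

For the uncensored sample $X_{(i)} = i$, whose empirical quantile function grows at rate 100, the estimate is close to 100.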

Fractional Order Statistic-Based Quantile and Quantile Density Estimator
In order to develop our new quantile and quantile density function estimators, we need to derive asymptotic expansions for the k-th non-central moment of the fractional order statistic, $E\left(X_{n'u:n}^{k}\right)$, where the fractional order statistics $X_{n'u:n}$ are defined through fractional uniform order statistics $U_{n'u:n}$ which, for an i.i.d. uniform sample, jointly follow a particular Dirichlet process; see Stigler [31]. Even though fractional order statistics do not exist in the empirical sense, their expectations can be calculated.
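Under the usual construction, assumed here, $U_{n'u:n}$ has a Beta$(n'u,\, n'(1-u))$ marginal distribution. Its moments then reduce to a one-dimensional integral, $E(X_{n'u:n}^k) = \int_0^1 Q(t)^k g(t)\,dt$, where g is that Beta density. The sketch below additionally assumes the convention $n' = n + 1$, which is not stated in this section, and uses a simple midpoint rule that is adequate when both Beta parameters are at least 1. For integer $n'u = r$ it reproduces the moments of the ordinary r-th order statistic, which gives a convenient check. The function name `frac_os_moment` is hypothetical.

```python
import math


def frac_os_moment(Q, u, n, k=1, grid=100_000):
    """E[X_{n'u:n}^k] = int_0^1 Q(t)^k Beta(t; n'u, n'(1-u)) dt, assuming n' = n + 1."""
    nprime = n + 1  # assumption: n' = n + 1
    a, b = nprime * u, nprime * (1.0 - u)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for j in range(grid):
        t = (j + 0.5) / grid  # midpoint rule on (0, 1); crude if a < 1 or b < 1
        log_pdf = (a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta
        total += Q(t) ** k * math.exp(log_pdf)
    return total / grid


def exp_quantile(t):
    """Standard exponential quantile function, used only as a test case."""
    return -math.log1p(-t)


# u = 2/5 with n = 4 gives a = 2, b = 3: the 2nd order statistic of 4 exponentials,
# whose mean is 1/4 + 1/3.
print(frac_os_moment(exp_quantile, 0.4, 4), 1 / 4 + 1 / 3)
```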
In deriving the expansion we assume that the first three derivatives of Q are bounded in a neighborhood of u and denote them by $Q'(u)$, $Q''(u)$ and $Q'''(u)$. We also assume, similar to the results pertaining to the i-th order statistics as in Section 3.1 and Section … The weight $w_i$ is given as … for uncensored data and as … for censored data in Wang et al. [19], where … As an aside, estimates of $Q'(u)$ and $Q''(u)$ are also available as part of this process. In this investigation, we are only interested in the performance of the first-order derivative of Q(u), i.e., the quantile density function q(u), and we denote the quantile density estimator by …

If we are only interested in an estimate of Q(u), then the numerical solution of our system of equations (17)-(19) with respect to Q(u) is relatively straightforward and reduces to solving a simple cubic equation in Q(u) for fixed u, … where $E^{*}$ denotes the exact bootstrap moment estimator of the quantity at (4.10). Alternatively, we can reformulate (20) and define the estimator as follows.

Definition: The cubic quantile estimator … with respect to Q(u), where the weights $w_i$ are defined at (19) for censored data or at (18) …
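The exact bootstrap moment estimator $E^{*}$ is a weighted sum of powers of the order statistics, $E^{*}(X_{n'u:n}^{k}) = \sum_i w_i X_{(i)}^{k}$. The sketch below assumes the Harrell-Davis form for the uncensored weights, $w_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b)$ with $a = (n+1)u$ and $b = (n+1)(1-u)$, where I is the regularized incomplete beta function; the censored-data weights of Wang et al. [19] are not reproduced. For k = 1 this gives the HD quantile estimate. The incomplete beta increments are approximated by midpoint integration to avoid third-party dependencies, and the function names are hypothetical.

```python
import math


def hd_weights(n, u, points_per_bin=2000):
    """Beta-increment weights I_{i/n}(a, b) - I_{(i-1)/n}(a, b), a = (n+1)u, b = (n+1)(1-u).

    Each increment is approximated by midpoint integration of the Beta(a, b) density.
    """
    a, b = (n + 1) * u, (n + 1) * (1.0 - u)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    width = 1.0 / (n * points_per_bin)
    weights = []
    for i in range(1, n + 1):
        lo = (i - 1) / n
        acc = 0.0
        for j in range(points_per_bin):
            t = lo + (j + 0.5) * width
            acc += math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)
        weights.append(acc * width)
    return weights


def bootstrap_moment(x, u, k=1):
    """Weighted moment sum_i w_i X_(i)^k with the assumed Harrell-Davis weights."""
    xs = sorted(x)
    w = hd_weights(len(xs), u)
    return sum(wi * xi ** k for wi, xi in zip(w, xs))


print(bootstrap_moment(range(1, 11), 0.5))
```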

Simulation Results
To illustrate the behavior of our estimators, a straightforward simulation study was carried out for samples of size n = 30, 50, 100 from Weibull distributions with parameter values 0.5, 1 and 1.5, and from the standard normal, exponential, extreme value, and logistic distributions. The censoring distribution was uniform(0, T) with T = 2, 5; the uncensored case was also considered. We used the fixed quantiles u = 0.25, 0.5 and 0.75. For each combination, 2,000 Monte Carlo replications were performed. The Epanechnikov kernel, $K(x) = \frac{3}{4}(1 - x^2)$ for $|x| \le 1$, was used for the kernel estimators considered here. Note that even though the triangular kernel is more commonly used for kernel quantile estimators in the literature (see Padgett [18], Nair and Sankaran [33], and Soni [23]), it fails to provide a useful derivative when calculating $K_1(\cdot)$ in the kernel quantile density estimators. The Epanechnikov kernel, which is optimal in the asymptotic mean integrated squared error sense (see Prakasa Rao [34]), was studied by Soni [23] for comparing nonparametric quantile density estimators. Our simulation showed that the estimates of Q(u) under the Epanechnikov and triangular kernels behaved quite similarly, which was also confirmed in the study of Soni [23]. For simplicity, only the results for the Epanechnikov kernel are presented here. Bandwidth … Among estimators that do not require bandwidth selection, … behaves the best for both censored and uncensored cases, and $\hat{Q}_{M}$ and $\hat{Q}_{M.\hat{\imath}}$ are almost the same for fixed u.
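The structure of one simulation cell can be sketched as follows: draw lifetimes, censor them with an independent uniform(0, T) variable, estimate Q(u), and average the squared error over Monte Carlo replications. This is an illustrative sketch, not the study's code: it assumes Exp(1) lifetimes, includes only the PLQ estimator, and uses hypothetical function names. Passing `T=None` gives the uncensored case, in which PLQ reduces to EQ.

```python
import math
import random


def plq(times, deltas, u, eps=1e-12):
    """PLQ estimate inf{t : 1 - S_PL(t) >= u}, with S_PL(t) = 0 for t >= T_(n)."""
    pairs = sorted(zip(times, deltas))
    n, s = len(pairs), 1.0
    for i, (t, d) in enumerate(pairs, start=1):
        if d == 1:
            s *= (n - i) / (n - i + 1)
        if i == n or 1.0 - s >= u - eps:
            return t


def simulate_mse(n=30, u=0.5, T=2.0, reps=2000, seed=1):
    """Monte Carlo MSE of the PLQ estimator for Exp(1) lifetimes censored by uniform(0, T)."""
    rng = random.Random(seed)
    true_q = -math.log(1.0 - u)  # exact Exp(1) quantile
    sq_err = 0.0
    for _ in range(reps):
        life = [rng.expovariate(1.0) for _ in range(n)]
        cens = [rng.uniform(0.0, T) for _ in range(n)] if T is not None else [math.inf] * n
        times = [min(x, c) for x, c in zip(life, cens)]
        deltas = [1 if x <= c else 0 for x, c in zip(life, cens)]
        sq_err += (plq(times, deltas, u) - true_q) ** 2
    return sq_err / reps


print(simulate_mse(T=None), simulate_mse(T=2.0))
```

Other estimators can be substituted for `plq` inside the same loop, so that all methods are compared on identical replications.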

From Tables 7-12, we conclude the following for the nonparametric quantile density estimators:

1. $\hat{q}_{C}$ produces the smallest MSE in almost all cases.
2. $\hat{q}_{KPL}$ or $\hat{q}_{KEQ}$ yields the largest MSE, which is substantially larger than that of the other quantile density estimators.
3. $\hat{q}_{M}$ is the second-best quantile density estimator in terms of MSE, which also implies that it is better than $\hat{q}_{M.\hat{\imath}}$.

Application
A real-life test data set of n = 40 mechanical switches was reported by Nair [27] … $\hat{Q}_{M.\hat{\imath}}$ and $\hat{q}_{M.\hat{\imath}}$ were not included here, since the simulation study showed that they offer no additional advantage over the moment quantile method and the cubic quantile method in Definition 4.1-2. Figures 1 and 2 show the estimates of the quantile and quantile density functions for these data. Figure 1 shows that $\hat{Q}_{KPL}$, $\hat{Q}_{M}$, and $\hat{Q}_{C}$ do not differ much except at the tails; furthermore, only $\hat{Q}_{PL}$ and $\hat{Q}_{C}$ preserve the monotonicity of the quantile function curve for these data. This matches our finding in the simulation study that $\hat{Q}_{KPL}$ and $\hat{Q}_{M}$ can deviate from the true quantile function for large values of u when the data are heavily censored. Some techniques for correction at the tails have already been explored, and a brief review can be found in Soni et al. [23]. In addition, as found in the simulation study, $\hat{q}_{C}$ performs the best among all quantile density estimators. The fluctuating curve of $\hat{q}_{KPL}$ in Figure 2 also hints at how poorly the KPL method may perform in estimating quantile density functions.

Conclusion
In this article, we proposed three types of smooth quantile and … $\hat{q}_{C}$, performs better than the two moment quantile methods mentioned above for both Q(u) and q(u) estimation, and also shows a clear advantage in MSE over all the other nonparametric estimators, especially for quantile density function estimation.
In summary, if one is only interested in quantile estimation, $\hat{Q}_{KPL}$ or $\hat{Q}_{KEQ}$, with modification at the tails if needed, is a good choice. If one prefers a more stable estimator, $\hat{Q}_{L}$ or $\hat{Q}_{C}$ may also be considered. For the estimation of the quantile density function, $\hat{q}_{C}$ is clearly the preferred choice based on considerations of MSE, smoothness and simplicity. In addition, $\hat{q}_{KPL}$ and $\hat{q}_{KEQ}$ should be used with care for quantile density estimation, since their bias can be extremely large compared with the other alternatives.