Kernel density estimation under widely orthant dependence

Abstract: The kernel density estimator for widely orthant dependent random variables is studied. The exponential inequalities and the exponential rate for the estimator of a density function, with a uniform version over compact sets, are investigated. Further, the consistency of the estimator is proved. The results are generalizations of some existing outcomes for both associated and negatively associated samples. The convergence rate of the kernel density estimator is illustrated via a simulation study. Moreover, a real data analysis is presented.


Introduction
The density estimation of random variables has always been a classical problem in statistics. With the continuous development of statistics, numerous density estimation theories and methods have been proposed. Among them, kernel density estimation is commonly used to display the shape of a dataset without relying on a parametric model. Parzen [1], Rosenblatt [2], and Silverman [3] have provided early results on kernel density estimation, and since then, much research has been conducted in this area.
In most previous studies, it is assumed that random variables are independent and identically distributed with a common density. However, in many stochastic models and statistical applications, the assumption that random variables are independent is not plausible; therefore, the assumption of dependence is more appropriate. Many useful and interesting results have been obtained for dependent random variables. For instance, Yakowitz [4] showed that the kernel density estimator is asymptotically normal when the sample is obtained from a stationary time series. Bagai et al. [5,6] proved the pointwise and uniform consistency of the kernel estimator under association. Roussas [7], Masry [8] and Oliveira [9] introduced the consistency results for the kernel estimator under association. Henriques et al. [10] obtained an exponential rate for pointwise and uniform convergence on compact sets with additional conditions for the latter case. Ling et al. [11] proved the strong, uniformly strong, and square consistency of the kernel estimator for negative quadrant dependent variables under some suitable conditions. Su et al. [12] obtained the strong and moment consistency of the kernel estimator under suitable conditions. Wang and Li [13] provided the definitions of generalized consistency for kernel density estimation and obtained several types of generalized consistency of kernel density estimation, and so forth.
Recently, Kheyri et al. [14] studied the kernel density estimator for negatively superadditive dependent (NSD) random variables and obtained the exponential inequalities and exponential rate for NSD random variables. As an application, the corresponding results for Farlie-Gumbel-Morgenstern (FGM) sequences have been studied. It is known that some dependence structures, such as negatively orthant dependence (NOD), extended negative dependence (END) and widely orthant dependence (WOD), are much more general than NSD. Notably, the extensibility of the results of Kheyri et al. [14] to a more general case is an issue that has attracted much interest. Moreover, Kheyri et al. [14] did not consider the consistency of the kernel density estimator. Notwithstanding the vast number of reported studies in the literature on the asymptotic properties of the kernel density estimator under dependent cases, there are no reported studies concerning WOD random variables, which are the most general among the aforementioned dependent random variables. In this study, we consider samples that satisfy the WOD notion and study the exponential inequalities, consistency, and convergence rate of the kernel density estimator. The results obtained in this study extend and complement those of Kheyri et al. [14] to a much more general extent. Moreover, some numerical analysis and a real data analysis are presented to support the theoretical results.
Throughout this study, we always assume that denotes a positive constant that only depends on some given numbers and may vary from one place to another. The bandwidth and density function are denoted by and , respectively, while the limits are taken as unless indicated otherwise. The remainder of this paper is organized as follows. The definitions, necessary notations, and some preliminary lemmas are introduced in Section 2. In Section 3, we prove the exponential inequalities for pointwise convergence and, with an extra condition imposed on the kernel function, uniform convergence on compact sets under WOD. In addition, we derive the convergence rate of the kernel estimator, which is of the order , where is the dominating coefficient that will be discussed later. This convergence rate is slightly slower than the best-known rate for independent cases, which is of the order . In Section 4, we prove some results regarding the consistency of the estimator. Finally, some numerical analysis and a real data analysis are presented in Sections 5 and 6, respectively, to support the theoretical results.

Preliminaries
Let be a sequence of random variables with the common unknown density function .
Let be a fixed probability density function and be a sequence of nonnegative real numbers converging to zero. Then, the kernel estimator of the density function is, as usual, defined as which is well known to be asymptotically unbiased, provided that the density function is bounded and continuous. Moreover, the convergence of to is uniform on compact sets under these assumptions on . The concept of WOD random variables was introduced by Wang et al. [15] as follows: A finite collection of random variables is said to be widely upper orthant dependent (WUOD) if there exists a finite real number such that for all finite real numbers , A finite collection of random variables is said to be widely lower orthant dependent (WLOD) if there exists a finite real number such that for all finite real numbers , If are both WUOD and WLOD, then are WOD with dominating coefficients . A sequence of random variables is said to be WOD if every finite subcollection is WOD.
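As a minimal sketch of the kernel estimator described above, assuming the usual Parzen-Rosenblatt form (the average of kernel evaluations at the scaled distances between the evaluation point and the observations, divided by the bandwidth), the following Python snippet evaluates the estimator at a point. The function names `gaussian_kernel` and `kernel_density_estimate`, the Gaussian kernel, and the toy sample are illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density_estimate(x, sample, bandwidth, kernel=gaussian_kernel):
    """Average of kernel((x - X_j) / h) over the sample, divided by h."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    total = sum(kernel((x - xi) / bandwidth) for xi in sample)
    return total / (len(sample) * bandwidth)


# Toy sample (hypothetical values, for illustration only).
sample = [-1.0, 0.0, 1.0]
estimate_at_zero = kernel_density_estimate(0.0, sample, bandwidth=1.0)

# Riemann-sum check that the estimate behaves like a density on a wide grid.
step = 0.01
total_mass = step * sum(
    kernel_density_estimate(-10.0 + step * i, sample, 1.0) for i in range(2001)
)
```

Because the kernel is itself a probability density, the estimate is nonnegative and integrates to one for any positive bandwidth, which the Riemann sum above checks numerically.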
Recall that when for some positive constant , the random variables are said to be END. In particular, when for any , the random variables are said to be NOD or negatively dependent. Joag-Dev and Proschan [16] pointed out that negatively associated (NA) random variables are NOD. Hu [17] introduced the NSD concept and pointed out that NSD implies NOD. From the statements above, we can see that the class of WOD random variables includes END, NOD, NSD, and NA random variables, as well as independent random variables as special cases. In particular, every -dimensional FGM distribution describes a specific WOD structure. Furthermore, Wang et al. [15] also exemplified that WOD encompasses some positively dependent structures. Hence, it is of great interest to study the limiting behavior of WOD random variables and its applications.
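For reference, the WUOD and WLOD conditions and the special cases recalled above can be written out as follows. This is a sketch in the notation commonly used for the definition of Wang et al. [15]; the symbols $g_U(n)$, $g_L(n)$, $g(n)$ and $M$ are assumed here and may differ from the symbols used elsewhere in this paper.

```latex
% WUOD: for all finite real numbers x_1, ..., x_n
P(X_1 > x_1, \ldots, X_n > x_n) \le g_U(n) \prod_{i=1}^{n} P(X_i > x_i),

% WLOD: for all finite real numbers x_1, ..., x_n
P(X_1 \le x_1, \ldots, X_n \le x_n) \le g_L(n) \prod_{i=1}^{n} P(X_i \le x_i),

% WOD: both inequalities hold; dominating coefficient
g(n) = \max\{ g_U(n), g_L(n) \}.

% Special cases
\text{END: } g_U(n) = g_L(n) = M \text{ for some constant } M > 0,
\qquad
\text{NOD: } g_U(n) = g_L(n) = 1 \text{ for all } n \ge 1.
```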
A technical problem arises when dealing with for WOD variables. In fact, WOD is only preserved under monotone transformations; that is, in general, the variables are not WOD. One way to resolve this problem is to assume that the kernel has bounded variation. Now, we introduce a set of assumptions to prove our main results.
is a strictly stationary sequence of WOD random variables with a bounded and continuous density function .
has a bounded and continuous derivative of first and second orders.
and .

A3 The sequence of bandwidths { } is such that, as , and .

Remark 2.1. Assumptions A1−A3, except A2(i), are highly general and are frequently used in investigating kernel density estimation. Kheyri et al. [14] first provided some common examples of the kernel function satisfying A2(i).
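To illustrate why a bounded-variation condition on the kernel helps with the monotonicity issue raised above, the following sketch splits a kernel into the difference of two bounded nondecreasing functions. It assumes a Gaussian kernel, and the names `k1` and `k2` are hypothetical; the paper's own decomposition may differ, since any function of bounded variation admits many such splits.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


PEAK = gaussian_kernel(0.0)  # supremum of the kernel, attained at the mode


def k1(u):
    """Nondecreasing part: follows the kernel up to its mode, then stays flat."""
    return gaussian_kernel(u) if u <= 0 else PEAK


def k2(u):
    """Nondecreasing part: zero up to the mode, then the drop below the peak."""
    return 0.0 if u <= 0 else PEAK - gaussian_kernel(u)


# Numerical checks on a grid: kernel = k1 - k2, and both parts are monotone.
grid = [i / 10.0 for i in range(-50, 51)]
max_gap = max(abs(gaussian_kernel(u) - (k1(u) - k2(u))) for u in grid)
k1_monotone = all(k1(a) <= k1(b) for a, b in zip(grid, grid[1:]))
k2_monotone = all(k2(a) <= k2(b) for a, b in zip(grid, grid[1:]))
```

Both parts take values between zero and the peak of the kernel, so each transformed variable is bounded and, being a monotone transformation, keeps the dependence structure of the original sample.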
Lemma 2.1. (i) If are all non-decreasing (or non-increasing), then are still WOD.
(ii) For each n ⩾ 1 and any real number t,
Lemma 2.2. [19] Let be a centered random variable. If there exist such that , then for every ,
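The displayed inequalities of the two lemmas above did not survive in this copy. As a hedged reading, the first is usually stated as a product bound on the moment-generating function of a sum of WOD variables, and the second has the shape of Hoeffding's lemma for a bounded centered variable. The sketch below uses assumed symbols ($X_i$, $g(n)$, $a$, $b$, $\lambda$) and should be checked against the cited sources.

```latex
% Lemma 2.1 (ii), usual form for WOD variables with dominating coefficient g(n):
E \exp\Big( t \sum_{i=1}^{n} X_i \Big) \le g(n) \prod_{i=1}^{n} E \exp( t X_i ),
\qquad n \ge 1, \; t \in \mathbb{R}.

% Lemma 2.2, Hoeffding-type bound: if E X = 0 and a \le X \le b almost surely, then
E \exp( \lambda X ) \le \exp\Big( \frac{\lambda^{2} (b - a)^{2}}{8} \Big)
\qquad \text{for every } \lambda > 0.
```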
To obtain the main results, we require the following notations. Given A2(i), define . For each and , let Note that, by Lemma 2.1, we can see that if is WOD, then is also WOD. Finally, for each and , we define If the kernel satisfies A2(i), the functions and may be chosen bounded such that every random variable is bounded by , where represents the supremum norm. Here, we use Lemma 2.2 to control the moment-generating function of the variables. A simple application of this lemma yields the following upper bounds. The proof is straightforward; therefore, it is omitted.
Lemma 2.3. [14] Let be random variables, and suppose that A2(i) is satisfied. If , are defined by Eq. (4), then, for every ,

Exponential inequalities and convergence rate
Now, we prove an exponential probability inequality for and thereafter for the centered estimator of . Next, we obtain the exponential convergence rate of the centered estimator of .
Here, we extend and improve the results of Henriques and Oliveira [9] and Kheyri et al. [14] for associated random variables and NSD random variables, respectively, to the WOD case. One of our main results is presented in the following proposition.
Proposition 3.1. Suppose that A1(i) and A2(i) are satisfied. Then, for every and sufficiently large ,
Proof. Using Eq. (3), we have ε > 0 So for all and applying Lemma 3.1, This completes the proof.
Now, we use the decomposition of a compact interval to derive a uniform exponential rate for the centered estimator.
Proposition 3.2. Suppose that A1(i) and A2(i) are satisfied, the kernel is Lipschitzian and as . Then, for every , a sufficiently large , and each interval ,
Proof. Let be a fixed interval and decompose into subintervals of length . Then, it is easy to show that Because is Lipschitzian, there exists a constant such that, for all , Therefore, the condition as implies that for all , Thus, by (8), and the proof is complete.

We derived some sufficient conditions to prove an exponential rate for the kernel estimator of the density function. To prove the convergence rate, we choose depending on as To obtain a convergent series on the right-hand side of (8) and (10), must be conveniently chosen (it depends on a constant appearing in the inequality). Note that in Proposition 3.2, the assumption must be replaced by , and this is also verified by the choice made for and letting . Therefore, the convergence rate of the kernel estimator is of the order . This convergence rate is slightly slower than the best-known rate for independent cases, which is of the order . Noting that NSD implies WOD, namely , our result extends that of Kheyri et al. [14]. Furthermore, the dominating coefficient can increase with for any , whereas the convergence rate remains the same. Even if increases geometrically, our results remain robust to some extent.

Consistency
In this section, we study the consistency of kernel density estimation for the WOD sample. First, we present a lemma that is useful for proving the main results.
Lemma 4.1. [14] Let A1, A2(ii), and A3 be satisfied. Then With Lemma 4.1, we can obtain the following results for weak consistency and the rate of strong consistency of the kernel density estimator.
Then, at each point of continuity of , the estimator is weakly consistent, that is, for each , It is sufficient to prove that both summands in the last expression tend to 0 as . The first summand is stochastic and the second one is the deterministic component (the bias), both of which tend to 0 as . By (8), we have that Combining (12) and (13), we can easily obtain . Hence, is consistent in probability. The proof is complete.
Proof. According to (12), we have . Hence, it suffices to show that . By virtue of the Borel-Cantelli lemma, one only needs to prove . In fact, from (8), we can easily obtain . This completes the proof.

Numerical simulation study
In this section, we conduct a numerical simulation study using R software to examine the performance of the kernel density estimation with the WOD samples. The simulation is conducted for the following two cases.
Case 1 To generate the WOD random variables, for any fixed , , where represents zero vector and Here . In the following, we consider for simplicity. From Joag-Dev and Proschan [16], it can be seen that is an NA vector, and it is, therefore, a special case of the WOD vector. To examine the performance of the kernel density estimation with the WOD samples, we consider the kernel estimator with . We consider the fixed bandwidth and the bandwidth based on cross-validation (CV), and we take the sample sizes as , respectively. We use R software to compute the estimators 500 times to obtain the final values and then compare them with in Figs. 1 and 2 for different sample sizes.
In Figs. 1 and 2, the black lines represent the true probability density function and the red points are the estimation values. We can see that the estimator tends to as the sample size increases. These results are in good agreement with the theoretical results that we establish in this study.
Case 2 Let and be independent of each other, where and are possibly valued at and is a sequence of mutually independent random variables. From Liu [20], it follows that is an END vector; therefore, it is WOD. To examine the performance of kernel density estimation with the WOD samples, we generate a sample of the simulations chosen as , with the sample sizes n = 100, 200, 400, 800. Other settings are the same as those in Case 1. For every point , , , we also use R software to compute the estimator times to obtain the global root mean square error (GRMSE), as presented in Table 1 for different sample sizes, where the GRMSE is defined as GRMSE = . To express this tendency more clearly, we also obtain the boxplots of in Fig. 3. From Table 1, we can see that the GRMSE decreases for each bandwidth as the sample size increases. For n ⩽ 200, the GRMSE with a fixed bandwidth is slightly smaller than that with CV. For n ⩾ 400, the GRMSE with CV is reduced. Fig. 3 shows similar results. Overall, the fixed bandwidth and the bandwidth based on CV do not have evident differences.
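The GRMSE computation described above can be sketched as follows. Since the defining formula is missing from this copy, the sketch assumes one common form: the square root of the squared estimation error averaged over all grid points and all replications. It also substitutes independent standard normal draws (independence being a special case of WOD) for the paper's dependent samples, uses Python rather than R, and picks an illustrative bandwidth of n^(-1/5). All names, sample sizes, and settings below are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density; also the true density in this toy setup."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, bandwidth):
    """Kernel density estimate at x with a Gaussian kernel."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (len(sample) * bandwidth)


def grmse(sample_size, replications, grid, true_density, rng):
    """Assumed GRMSE: root of the mean squared error over grid and replications."""
    bandwidth = sample_size ** (-0.2)  # illustrative fixed bandwidth, not the paper's
    squared_errors = []
    for _ in range(replications):
        # Stand-in sample: independent N(0, 1) draws instead of a dependent sample.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        for x in grid:
            squared_errors.append((kde(x, sample, bandwidth) - true_density(x)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rng = random.Random(2024)  # fixed seed so the run is reproducible
grid = [-3.0 + 0.3 * j for j in range(21)]
grmse_small = grmse(50, 30, grid, gaussian_kernel, rng)
grmse_large = grmse(400, 30, grid, gaussian_kernel, rng)
```

In this toy setup the GRMSE for the larger sample should come out smaller, mirroring the qualitative pattern reported for Table 1; the numbers themselves are not comparable with the paper's.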

Real data analysis
In this section, we perform a real data analysis for kernel density estimation based on the urbanization rate time series of China (denoted as ). Fig. 4 shows 67 yearly urbanization rates from 1949 to 2015. The data are obtained from the website of the National Bureau of Statistics of China. First, we plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) in Fig. 4. From Fig. 4, it is evident that the sample is not stationary. For the best modelling, we must transform the data. The results of the first-order difference are shown in Fig. 5. The trend of and its ACF and PACF in Fig. 5 show a stationary process. Thereafter, we perform the unit root test and find that this time series is stationary. According to the Akaike information criterion (AIC) and Fig. 5 [15], we know that WOD comprises not only negatively dependent structures but also some positively dependent structures. Therefore, it is more reasonable to use WOD to describe the data. Because the distribution of linear transformations or linear combinations of multivariate normal variables is again multivariate normal, the difference in real data has a multivariate normal distribution with a zero mean vector and covariance matrix For the fixed bandwidth and the bandwidth based on CV, the kernel estimator using as the standard normal distribution is presented in Fig. 6. The black line in Fig. 6 is the probability density function of , where and 0.5982 are the estimation values of and , respectively, and the red dotted lines are the kernel estimation values. We can see that the kernel estimation fits the real data well. Overall, the fixed bandwidth and the bandwidth based on CV do not have evident differences.
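The preprocessing steps described above (first-order differencing followed by inspection of the sample ACF) can be sketched as follows. The series below is a hypothetical trending sequence, not the urbanization data, and the function names are illustrative; the paper's own analysis was carried out in R.

```python
def first_difference(series):
    """Return the first-order differences x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) * (v - mean) for v in series)
    if denom == 0:
        raise ValueError("series is constant; ACF is undefined")
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


# Hypothetical series: linear trend plus a small alternating perturbation.
trend = [10.0 + 0.5 * t + (0.3 if t % 2 else -0.3) for t in range(30)]
diffed = first_difference(trend)

acf_level = sample_acf(trend, 3)   # decays slowly, the usual sign of a trend
acf_diff = sample_acf(diffed, 3)   # no slow decay once the trend is removed
```

A slowly decaying ACF on the raw series together with a quickly changing ACF on the differenced series is the informal pattern that the figures are read for; a formal unit root test, as used in the paper, is still needed to confirm stationarity.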

Table 1. The GRMSE of the kernel density estimator.