A flexible method for estimating luminosity functions via Kernel Density Estimation -- II. Generalization and Python implementation

We propose a generalization of our previous KDE (kernel density estimation) method for estimating luminosity functions (LFs). This upgrade further extends the application scope of our KDE method, making it a flexible approach suitable for most bivariate LF calculation problems. From the mathematical point of view, the LF calculation can usually be abstracted as a density estimation problem in the bounded domain of $\{Z_1 < z < Z_2,\ L > f_{\mathrm{lim}}(z) \}$. We use the transformation-reflection KDE method ($\hat{\phi}$) to solve the problem, and introduce an approximate method ($\hat{\phi}_{\mathrm{1}}$) based on one-dimensional KDE to deal with the small-sample-size case. In practical applications, the different versions of the LF estimator can be flexibly chosen according to the Kolmogorov-Smirnov test criterion. Based on 200 simulated samples, we find that our method performs significantly better than the traditional binning method $\hat{\phi}_{\mathrm{bin}}$, both when redshift bins are used and, especially, when they are not. Moreover, as the sample size $n$ increases, our LF estimator converges to the true LF remarkably faster than $\hat{\phi}_{\mathrm{bin}}$. To implement our method, we have developed a public, open-source Python toolkit, called \texttt{kdeLF}. With the support of \texttt{kdeLF}, our KDE method is expected to be a competitive alternative to existing nonparametric estimators, due to its high accuracy and excellent stability. \texttt{kdeLF} is available at \url{http://github.com/yuanzunli/kdeLF} with extensive documentation available at \url{http://kdelf.readthedocs.org/en/latest}.


INTRODUCTION
The luminosity function (LF) is a fundamental statistic, providing one of the most important tools to probe the distribution and evolution of galaxies and active galactic nuclei (AGNs) over cosmic time. Accurately measuring the LF of extragalactic objects has long been an important pursuit in modern astronomy. However, this is difficult in observational cosmology because observational selection effects (e.g., due to detection thresholds in flux density, apparent magnitude, color, surface brightness, etc.) can make any given galaxy survey incomplete and thus introduce biases into the LF estimate (e.g., Marchetti et al. 2016). Numerous statistical approaches have been developed to overcome this limit. These mainly include parametric techniques (e.g., Sandage et al. 1979; Marshall et al. 1983; Jarvis & Rawlings 2000) and nonparametric methods such as the binned methods (e.g., Schmidt 1968; Avni & Bahcall 1980; Page & Carrera 2000, denoted as $\hat{\phi}_{\mathrm{bin}}$), and the Lynden-Bell (1971) $C^-$ estimator and its extended versions (e.g., Efron & Petrosian 1992; Caditz & Petrosian 1993; Singal et al. 2014).
The parametric technique itself is very mature, and the main focus is on choosing appropriate physical models. Nonparametric estimators are advantageous in cases where either it is difficult to find a proper parametric physical model or there is a desire to validate a parametric model (e.g., Schafer 2007). They still have great room for development. On the one hand, the traditional binned methods and the $C^-$ estimator have insurmountable disadvantages (Yuan & Wang 2013; Kelly et al. 2008). On the other hand, some newly developed methods (e.g., Schafer 2007; Kelly et al. 2008; Takeuchi 2010) have hardly shaken the dominant position of the traditional estimators, either due to issues of application scope or other factors such as computational complexity.
From the mathematical point of view, LF calculation is a typical density estimation problem in a bounded domain. The mathematics behind $\hat{\phi}_{\mathrm{bin}}$ is the histogram, the oldest and also the simplest density estimator. The $C^-$ estimator resorts to empirical cumulative distributions to avoid binning the data. More recent LF methods have adopted more advanced mathematics, such as local likelihood density estimation (Schafer 2007), the copula method (Takeuchi 2010), the $k$th-nearest-neighbor method (Caditz 2018), and the mixture-of-Gaussians model in a Bayesian framework (Kelly et al. 2008).
Kernel density estimation (KDE) is a well-established nonparametric approach to estimating continuous density functions in both univariate and multivariate cases. After years of development and continuous renewal, KDE has become the most popular method for the estimation, interpolation, and visualization of probability density functions (e.g., Silverman 1986; Botev et al. 2010; Zhang et al. 2014; Chen 2017; Davies & Baddeley 2018; Gramacki 2018). Caditz & Petrosian (1993) were probably the first to introduce the concept of KDE to LF calculation, although they mainly utilized KDE to improve the $C^-$ estimator. In the following years, the idea of KDE did not develop into a mature solution for LF measurement, due either to the limitation of computing power or to the lack of data in earlier years; both factors restricted the advantages of KDE. Yuan et al. (2020, hereafter Paper I) recently proposed a new method for estimating LFs in the framework of KDE. The new method, proved by simulation to be significantly more accurate and stable than $\hat{\phi}_{\mathrm{bin}}$, has the advantages of both parametric and nonparametric methods. Nevertheless, it was mainly devised to treat problems where the sample covers as broad a redshift range as possible. In practice one is usually interested in the LF in some redshift interval $(Z_1, Z_2)$, e.g., estimating the local LF of extragalactic sources (e.g., Condon et al. 2002; Mauch & Sadler 2007; Sadler et al. 2014; Lofthouse et al. 2018; Symeonidis & Page 2019), or measuring the high-redshift LFs of galaxies (e.g., Bhatawdekar et al. 2019; Bowler et al. 2020), AGNs (e.g., Matsuoka et al. 2018; Wang et al. 2019; Adams et al. 2020), and Lyα emitters (e.g., Zheng et al. 2017; Konno et al. 2018; Hu et al. 2019). In this paper, we further develop the KDE method to suit more common application scenarios. We have also developed kdeLF, a public, open-source Python toolkit.
This code is designed to standardize the calculations of our method into a coherent, well-documented user interface. The efficiency and generality of the algorithm makes it capable of treating most of the LF problems for AGNs and galaxies.

Kernel density estimation
Suppose we observe $n$ points in a two-dimensional (2D) space, $X = \{X_1, X_2, \cdots, X_n\}$. Assuming $X$ arises from an unknown probability density function $f(x)$, the goal of kernel density estimation (KDE) is to estimate $f$. The classical fixed-bandwidth kernel estimate of $f$ given $X$ is written as
$$\hat{f}(x_1, x_2) = \frac{1}{n h_1 h_2} \sum_{j=1}^{n} K\!\left(\frac{x_1 - X_{j,1}}{h_1}, \frac{x_2 - X_{j,2}}{h_2}\right),$$
where $X_{j,1}$ and $X_{j,2}$ are the $j$th observations of the first and second components of the vector $X$, and $n$ is the sample size. In this equation $K(\cdot, \cdot)$ is a bivariate kernel function (a non-negative probability density function that integrates to one), and $h_1$ and $h_2$ are smoothing parameters, also called the bandwidths. It is well known that the choice of kernel function is a secondary matter in comparison with the selection of the bandwidths (e.g., Wand & Jones 1995; Botev et al. 2010; Chen 2017; Gramacki 2018). The normal kernel is often used, i.e.,
$$K(u, v) = \frac{1}{2\pi} \exp\!\left(-\frac{u^2 + v^2}{2}\right).$$
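The fixed-bandwidth estimator above can be sketched in a few lines. The following is a minimal illustration with a product Gaussian kernel (not the kdeLF implementation):

```python
import numpy as np

def gaussian_kde2d(x1, x2, data, h1, h2):
    """Fixed-bandwidth bivariate KDE with a Gaussian kernel.

    data is an (n, 2) array of observations; (x1, x2) is the evaluation point.
    Implements f_hat = (1 / (n h1 h2)) * sum_j K((x1 - Xj1)/h1, (x2 - Xj2)/h2).
    """
    u = (x1 - data[:, 0]) / h1
    v = (x2 - data[:, 1]) / h2
    kern = np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi)  # bivariate normal kernel
    return kern.sum() / (len(data) * h1 * h2)
```

Because the kernel is itself a probability density, the resulting estimate integrates to one over the plane regardless of the chosen bandwidths.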

Estimating the LF via KDE
The differential LF of a sample of objects is defined as the number of objects per unit comoving volume per unit luminosity interval, where $z$ denotes redshift and $L$ denotes luminosity. Due to the typically large span of the luminosities, the LF is often defined in terms of $\log L$; hereafter $L \equiv \log L$ denotes the logarithm of luminosity. In practice, the main challenge is that only a limited number of objects are observed, and one has to estimate the LF from truncated data. Figure 1 depicts the typical $L$-$z$ plane of a truncated data set. The domain of the data is $W$, defined by $\{Z_1 < z < Z_2,\ L > f_{\mathrm{lim}}(z)\}$; $W$ is referred to as the survey region (Paper I) or study window; $f_{\mathrm{lim}}(z)$ is the truncation boundary of the sample. For the common flux-limited samples, $f_{\mathrm{lim}}(z)$ is given by Equation (5), where $d_L(z)$ is the luminosity distance, $F_{\mathrm{lim}}$ is the survey flux limit, and $K(z)$ represents the K-correction. The case described in Figure 1 is general to nearly all kinds of LF problems. If $Z_1 \approx 0$ and $Z_2 \gg Z_1$, it is the case studied by Paper I; if instead the interval $(Z_1, Z_2)$ is relatively small, this corresponds to the artificial or forced division of redshift bins adopted in many practical analyses. In addition, $f_{\mathrm{lim}}(z)$ can take any complicated form (e.g., Richards et al. 2006), not just the one given in Equation (5). The purpose of this work is to give an estimate of the LF $\phi$ over the survey region $W$. Following Paper I, $\phi$ is related to the probability distribution of $(z, L)$ by
$$\phi(z, L) = \frac{n}{\Omega\, \mathrm{d}V/\mathrm{d}z}\, p(z, L),$$
where $\Omega$ is the solid angle subtended by the survey, and $\mathrm{d}V/\mathrm{d}z$ is the differential comoving volume per unit solid angle (Hogg 1999); $n$ is the number of objects observed within $(Z_1, Z_2)$, hereafter referred to as the "sample size". The domain of $p(z, L)$ is also $W$. Seeing that $W$ is a bounded area, a direct estimate of $p(z, L)$ via KDE based on the data set $\{(z_1, L_1), (z_2, L_2), ..., (z_n, L_n)\}$ is difficult.
Serious bias can arise at and near the boundaries, known as boundary effects or boundary bias (e.g., Müller & Stadtmüller 1999). To mitigate this, the data are transformed via Equation (7), whose Jacobian determinant is given by Equation (8). Following the transformation-reflection method of Paper I, we first transform all pairs $(z_i, L_i)$ to $(x_i, y_i)$ according to Equation (7), then add the missing "probability mass" (e.g., Gramacki 2018) represented by the reflected data set $(x_1, -y_1), (x_2, -y_2), ..., (x_n, -y_n)$. Figure 2 provides a toy example showing how a data set with the bounded domain $\{Z_1 < z < Z_2,\ L > f_{\mathrm{lim}}(z)\}$ (top panel) looks after transformation-reflection (bottom panel). The purpose of transformation-reflection is thus to obtain an unbounded data set to which the KDE estimator can be applied. The density of $(x, y)$, denoted $\hat{f}(x, y)$, is estimated by Equation (9); the density of the original data set, $(z, L)$, then follows as $\hat{p}(z, L)$ in Equation (10), where the bandwidths $h_1$ and $h_2$ are the only unknown parameters that need to be determined.
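As a concrete illustration of the transformation-reflection idea, the sketch below assumes a simple logit-type map $x = \ln[(z - Z_1)/(Z_2 - z)]$ and $y = L - f_{\mathrm{lim}}(z)$; the actual transformation of Equation (7) may differ, but the reflection and Jacobian steps are the same:

```python
import numpy as np

def transform(z, L, Z1, Z2, f_lim):
    # Hypothetical transformation: logit-type map of z onto the real line,
    # and the offset from the flux limit for L (y >= 0 inside the survey region).
    x = np.log((z - Z1) / (Z2 - z))
    y = L - f_lim(z)
    return x, y

def p_hat(z, L, zs, Ls, Z1, Z2, f_lim, h1, h2):
    """Transformation-reflection KDE of p(z, L) on the survey region."""
    xs, ys = transform(zs, Ls, Z1, Z2, f_lim)
    x, y = transform(z, L, Z1, Z2, f_lim)

    def K(u, v):  # bivariate normal kernel
        return np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi)

    # Sum over the original points and their reflections about y = 0.
    dens = (K((x - xs) / h1, (y - ys) / h2)
            + K((x - xs) / h1, (y + ys) / h2)).sum() / (len(zs) * h1 * h2)
    # Jacobian determinant of (z, L) -> (x, y) for this particular choice of x(z).
    jac = (Z2 - Z1) / ((z - Z1) * (Z2 - z))
    return dens * jac
```

The density is estimated in the unbounded $(x, y)$ plane and mapped back to $(z, L)$ by multiplying by the Jacobian, mirroring Equations (9) and (10).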

Optimal Bandwidth Selection
Bandwidth selection is the most important issue in using KDE, because the accuracy of KDE is mainly determined by the chosen bandwidths (e.g., Gramacki 2018). To find the optimal bandwidths, we use the likelihood cross-validation (LCV) criterion of Paper I, but make slight modifications described as follows. The objective function $S$ is given by Equation (11), where the estimator $\hat{f}_{(-i)}(x_i, y_i)$ is defined in Equation (13) with $J_i = \{j : x_i = x_j \ \mathrm{and}\ y_i = y_j,\ \mathrm{for}\ j = 1, 2, ..., n\}$ and $J'_i = \{j : x_i = x_j,\ \mathrm{for}\ j = 1, 2, ..., n\}$, for $i = 1, 2, ..., n$. Note that $\hat{f}_{(-i)}(x_i, y_i)$ is slightly different from the leave-one-out estimator of Paper I (see their Equation 26). The purpose of the conventional leave-one-out estimator is to exclude the observation that makes the argument of the kernel function zero. Otherwise, the resulting kernel density estimator would contain an unwanted term $K(0/h_1, 0/h_2)$, and the cross-validated likelihood would approach infinity as $h_1$ and $h_2$ tend to zero. However, we find that it is not enough to simply leave one identical observation out, because there can be repeated observations (mainly in redshift). These lead to unwanted terms of $0/h_1$ or $0/h_2$ even when $j \neq i$. Thus, at any point $(x_i, y_i)$, we propose excluding all the $j$th terms that make $x_i = x_j$ or $y_i = y_j$ from the first summation, and $x_i = x_j$ from the second summation in Equation (13); $\eta_i$ denotes the total number of terms that are excluded. Similar ideas can be found in Zhang et al. (2014). We refer to $\hat{f}_{(-i)}(x_i, y_i)$ as the leave-more-out estimator. In Equation (11), $L_{\max}$ is the upper luminosity limit of the survey region $W$. Because our likelihood function was derived from Marshall et al. (1983), in Paper I we let $L_{\max}$ take a value slightly larger than $\max(L_i)$, according to the conventions in the literature (e.g., Cara & Lister 2008; Ajello et al. 2012). But we should note that the basis for the conventional value of $L_{\max}$ is ambiguous. Can we make it take a much larger value?
Obviously, the answer is yes, as a sample usually does not have an observed maximum luminosity limit. If we let $L_{\max} \gg \max(L_i)$, the double integral in Equation (11) approaches 1 and is independent of the values of $h_1$ and $h_2$. So we may drop the double-integral term and obtain a more simplified LCV criterion, $S_0$, given by Equation (14). The optimal bandwidths $h_1, h_2$ can be obtained by numerically minimizing the objective function $S$ or $S_0$. Note that $S$ and $S_0$ are negative logarithmic likelihood functions; they can also be used to perform Bayesian inference by combining them with prior information on $h_1$ and $h_2$. We find that the results based on $S$ and $S_0$ are consistent, especially for big samples ($n \gtrsim 1000$). We suggest using $S$ for samples of $n < 1000$, and otherwise using $S_0$. Our considerations are as follows: (1) the influence of failing to observe an object with $L > \max(L_i)$ is larger for small samples than for big samples, so setting a more conservative (small) $L_{\max}$ is necessary for small samples; thus $S$ is more suitable for small samples, as the double-integral term provides an extra constraint; (2) for big samples, dropping the double-integral term can significantly save computational time and has little effect on the result. In addition, if one has difficulty achieving convergence while minimizing $S_0$, we suggest trying $S$. Finally, the KDE of the LF (Equation 15) is obtained as
$$\hat{\phi}(z, L) = \frac{n}{\Omega\, \mathrm{d}V/\mathrm{d}z}\, \hat{p}(z, L).$$
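The minimization of the simplified criterion can be sketched as follows, using a plain leave-one-out version of the cross-validated negative log-likelihood (the full leave-more-out correction for tied observations is omitted for brevity, and the data here are synthetic):

```python
import numpy as np
from scipy.optimize import minimize

def lcv_score(log_h, X):
    """Simplified LCV objective: negative mean leave-one-out log-likelihood."""
    h1, h2 = np.exp(log_h)          # optimize in log space to keep h > 0
    n = len(X)
    d1 = (X[:, None, 0] - X[None, :, 0]) / h1
    d2 = (X[:, None, 1] - X[None, :, 1]) / h2
    K = np.exp(-0.5 * (d1 ** 2 + d2 ** 2)) / (2.0 * np.pi)
    np.fill_diagonal(K, 0.0)        # leave-one-out: drop the i = j term
    f_loo = K.sum(axis=1) / ((n - 1) * h1 * h2)
    return -np.mean(np.log(f_loo + 1e-300))

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))   # toy data set
res = minimize(lcv_score, x0=np.log([0.5, 0.5]), args=(X,), method="Nelder-Mead")
h1_opt, h2_opt = np.exp(res.x)
```

Since $S$ and $S_0$ are negative log-likelihoods, the same objective could equally be fed to an MCMC sampler with priors on $h_1$ and $h_2$, as noted above.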

Adaptive Kernel Density Estimation
In Equation (9) the bandwidths are constant for every individual kernel. This may suffer from a slight drawback when applied to typical $L$-$z$ data, which exhibit inhomogeneous point patterns. Fixed bandwidths, even when chosen "optimally" in some sense, cannot balance the estimates for sparsely and densely populated areas simultaneously. There is a tendency for spurious noise to appear in the former; if the estimates are smoothed sufficiently to deal with this, then oversmoothing occurs in the latter (e.g., Silverman 1986; Davies & Baddeley 2018). An effective solution to this problem is to use variable bandwidths, i.e., an adaptive kernel estimator, which allows the bandwidths of the kernels to vary from one point to another. Adaptive KDE has been shown to possess both theoretical and practical advantages over its fixed-bandwidth counterpart (Abramson 1982). Following Paper I, we implement the adaptive KDE in the following steps: 1. Make a pilot estimate using Equations (7) to (14), and obtain the optimal bandwidths, denoted $\tilde{h}_1, \tilde{h}_2$.
2. Let the bandwidths vary with the local density, as $\lambda_1(x_j, y_j)$ and $\lambda_2(x_j, y_j)$ given by Equation (16), where $\tilde{f}(x_j, y_j|\tilde{h}_1, \tilde{h}_2)$ is calculated via Equation (9) with $h_1 = \tilde{h}_1$ and $h_2 = \tilde{h}_2$; $h_{10}$ and $h_{20}$ are global bandwidths, and $\beta$ is the sensitivity parameter satisfying $0 \leq \beta \leq 1$ (Silverman 1986).
3. In Equations (9) and (13), we replace $h_1$ and $h_2$ with $\lambda_1(x_j, y_j)$ and $\lambda_2(x_j, y_j)$, respectively. We thus obtain the adaptive KDE for the density of $(x, y)$, denoted $\hat{f}_a(x, y|h_{10}, h_{20}, \beta)$, and its leave-more-out estimator, denoted $\hat{f}_{a,(-i)}(x, y|h_{10}, h_{20}, \beta)$. The expressions for $\hat{f}_a$ and $\hat{f}_{a,(-i)}$ are given in Appendix A.
4. The adaptive KDE for the density of $(z, L)$ and the corresponding leave-more-out estimator, $\hat{p}_a$ and $\hat{p}_{a,(-i)}$, follow as in Equation (10). 5. In Equations (11) and (14), replacing $\hat{p}$ and $\hat{p}_{(-i)}$ with $\hat{p}_a$ and $\hat{p}_{a,(-i)}$, respectively, it is easy to obtain the $S$ or $S_0$ functions for the adaptive KDE. The optimal values of $h_{10}$, $h_{20}$ and $\beta$ can then be obtained.
6. Finally, the adaptive KDE of the LF is
$$\hat{\phi}_a(z, L) = \frac{n}{\Omega\, \mathrm{d}V/\mathrm{d}z}\, \hat{p}_a(z, L).$$
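The six steps above can be condensed into a short sketch of a sample-point adaptive estimator. The local-bandwidth form below (pilot density normalized by its geometric mean and raised to the power $-\beta$) follows Silverman (1986) and is an illustrative assumption, not necessarily the exact Equation (16):

```python
import numpy as np

def adaptive_bandwidths(X, h_pilot, h0, beta):
    """Silverman-style adaptive bandwidths: a pilot KDE sets a local scale factor."""
    d = (X[:, None, :] - X[None, :, :]) / h_pilot
    K = np.exp(-0.5 * (d ** 2).sum(-1)) / (2.0 * np.pi * h_pilot[0] * h_pilot[1])
    f_pilot = K.mean(axis=1)                    # pilot density at each data point
    g = np.exp(np.log(f_pilot).mean())          # geometric-mean normalization
    lam = (f_pilot / g) ** (-beta)              # local scale factors (smaller h where dense)
    return h0[0] * lam, h0[1] * lam             # per-point bandwidths lambda_1, lambda_2

def adaptive_kde(x, X, lam1, lam2):
    """Adaptive KDE at point x: each kernel uses its own bandwidth pair."""
    u = (x[0] - X[:, 0]) / lam1
    v = (x[1] - X[:, 1]) / lam2
    return (np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi * lam1 * lam2)).mean()
```

With $\beta = 0$ the local factors all equal one and the fixed-bandwidth estimator is recovered, which is a useful sanity check.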

Small sample approximation
It is rather common in practice that the sample size is small ($n \lesssim 200$) in the redshift range of interest $(Z_1, Z_2)$. In this situation the bivariate KDE method of Equation (15) cannot exploit its advantages and may give a poor estimate. A possible solution is to approximate the two-dimensional problem by a one-dimensional KDE problem. A transformation of the data is also required in the small-sample case, but it is slightly different from the one given in Equation (7): instead, we transform $(z, L)$ to $(z, l)$ with $l = L - f_{\mathrm{lim}}(z)$. The domain of the new data set is $\{Z_1 < z < Z_2,\ l > 0\}$. The density of $(z, l)$, denoted $\hat{f}(z, l)$, can be approximated by KDE. As the redshift interval $(Z_1, Z_2)$ is small and $\phi$ changes little over $\Delta z$, we obtain the approximation of Equation (20), where $\hat{f}_l$ is the marginal probability density function that can be estimated by
$$\hat{f}_l(l) = \frac{1}{nh} \sum_{j=1}^{n} K_1\!\left(\frac{l - l_j}{h}\right),$$
where $l_j = L_j - f_{\mathrm{lim}}(z_j)$ corresponds to the $j$th object, $h$ is the bandwidth, and $K_1$ is the univariate kernel function. According to the theory of probability transformation, we arrive at Equation (23); the corresponding leave-one-out estimator is given by Equation (24). Combining Equations (23) and (24) with Equation (11), we can obtain the optimal bandwidth $h$. Finally, the KDE of the LF in the bin $\Delta z$ is estimated as $\hat{\phi}_1$ (Equation 25), where $z_0 = (Z_1 + Z_2)/2$; the subscript 1 in $\hat{\phi}_1$ indicates the one-dimensional approximation. Similar to the six steps introduced in Section 2.4, we can construct the adaptive version of $\hat{\phi}_1$, denoted $\hat{\phi}_{1a}$ (see Appendix B for details).
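The one-dimensional marginal $\hat{f}_l$ can be sketched directly. In the spirit of the transformation-reflection method of Section 2.3, the sketch below reflects the kernel about $l = 0$ to respect the lower boundary; this is an illustrative assumption, as the exact kernel used in the paper may differ:

```python
import numpy as np

def f_l_hat(l, l_data, h):
    """1D KDE of l = L - f_lim(z), reflected about l = 0 to respect the boundary.

    l may be a scalar or an array of evaluation points; l_data holds the l_j.
    """
    u = (l - l_data[:, None]) / h          # kernel at the data points
    v = (l + l_data[:, None]) / h          # kernel at the reflected points
    K = (np.exp(-0.5 * u ** 2) + np.exp(-0.5 * v ** 2)) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=0) / (len(l_data) * h)
```

Reflection guarantees that no probability mass leaks below $l = 0$, so $\hat{f}_l$ integrates to one over $[0, \infty)$; the LF at the bin center $z_0$ then follows from $\hat{f}_l$ as in Equation (25).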

Inclusion of the survey selection function
In practice, a quasar survey may have a complicated selection function, whose selection probability is a complicated function of $z$, $M$, and the spectral energy distribution of the object (e.g., Fan et al. 2001; Kim et al. 2020). In this case, our KDE method has to be slightly modified to include the contribution from this selection function. The basic idea is to assign a weight to each data point when using KDE. For the $i$th object with redshift $z_i$ and absolute magnitude $M_i$, its weight is $w_i \equiv 1/P(z_i, M_i)$, where $P$ is the selection function at $(z_i, M_i)$. Intuitively, an object with a selection probability of 0.5 is "like" two observations at that location (Richards et al. 2006). To perform the KDE, one first needs to transform the data as in Equation (26). The KDE for the new data set is then given by Equation (27), where $N_{\mathrm{eff}}$ is the effective sample size; $J_i = \{j : x_i = x_j \ \mathrm{and}\ y_i = y_j,\ \mathrm{for}\ j = 1, 2, ..., n\}$ and $J'_i = \{j : x_i = x_j,\ \mathrm{for}\ j = 1, 2, ..., n\}$, for $i = 1, 2, ..., n$; $\eta_i$ denotes the total number of terms that are excluded from the two summations. Finally, the weighted-version KDE LF is given by Equation (28). Following the steps of Section 2.4, it is easy to obtain the adaptive versions of Equations (27) and (28), whose expressions are given in Appendix C.
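The weighting scheme amounts to replacing the equal kernel weights $1/n$ by $w_i / N_{\mathrm{eff}}$. A minimal sketch, assuming $N_{\mathrm{eff}} = \sum_i w_i$ (the common convention for an effective sample size; the paper's Equation 27 may define it differently):

```python
import numpy as np

def weighted_kde2d(x, y, xs, ys, w, h1, h2):
    """KDE with per-point weights w_i = 1 / P(z_i, M_i).

    An object with selection probability 0.5 gets weight 2, i.e. it counts
    "like" two observations at that location.
    """
    u = (x - xs) / h1
    v = (y - ys) / h2
    K = np.exp(-0.5 * (u ** 2 + v ** 2)) / (2.0 * np.pi)
    return (w * K).sum() / (w.sum() * h1 * h2)   # normalize by N_eff = sum(w)
```

Normalizing by the weight sum keeps the estimate a proper density (it still integrates to one) whatever the individual selection probabilities are.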

SIMULATION RESULTS
Hereafter we denote the LF estimated by Equation (15) as $\hat{\phi}$, and the small-sample approximation of Equation (25) as $\hat{\phi}_1$. Their adaptive versions are denoted $\hat{\phi}_a$ and $\hat{\phi}_{1a}$, respectively. To test the effectiveness of each estimator, we apply them to the 200 simulated samples of Paper I. These samples are flux-limited, with their flux limits randomly drawn between $10^{-2.5}$ and $10^{-0.5}$ Jy, and they share the same input LF (the model A RLF of Yuan et al. 2017). Their sample sizes range from 2000 to 40,000. For more details about the simulation, see Section 4.2 of Paper I. As described in Section 2.2, the range $(Z_1, Z_2)$ can be flexibly expanded or narrowed according to practical needs. To test the performance of our estimators for different $(Z_1, Z_2)$ ranges, we discuss two cases, where samples are divided or not divided into redshift bins.

Dividing redshift bins
We divide each simulated sample into eight redshift bins, i.e., $(Z_1, Z_2)$ takes (0.0, 0.2), (0.2, 0.5), (0.5, 1.0), (1.0, 1.7), (1.7, 2.5), (2.5, 3.5), (3.5, 4.5), (4.5, 6.0) in turn. We then apply our $\hat{\phi}$ and $\hat{\phi}_1$ estimators and their adaptive versions ($\hat{\phi}_a$ and $\hat{\phi}_{1a}$) to each redshift bin of the 200 simulated samples. There are $8 \times 200 = 1600$ calculations for each LF estimator. For comparison, the result measured by the traditional binning method (denoted $\hat{\phi}_{\mathrm{bin}}$) of Page & Carrera (2000) is also given, with the same redshift binning adopted. The choice of luminosity bin size is also important for $\hat{\phi}_{\mathrm{bin}}$. Intuitively, neither too large nor too small $L$ bins can give the desired estimates. Choosing the right number of bins involves finding a good tradeoff between bias and variance (Wasserman 2006). There should exist an optimal $L$ bin size, which may depend mainly on the density distribution of $L$ and not simply on the sample size. In practice, however, it is difficult to find the "optimal" $L$ bin size. We thus use a uniform $L$ bin size of $\Delta L = 0.3$, as adopted by many authors in the literature (e.g., Page & Carrera 2000; Richards et al. 2006). To quantify the performance of each LF estimator, following Paper I, we define a statistic $d_{\mathrm{LF}}$ for the $i$th redshift bin, which measures the discrepancy of the estimated LF $\hat{\phi}$ from the true LF $\phi$ (which is known for the simulated samples). For the $\hat{\phi}$ (and $\hat{\phi}_a$) estimator, the statistic is given by Equation (30), where $n_i$ counts the number of objects in the $i$th redshift bin. For the $\hat{\phi}_{\mathrm{bin}}$ estimator, estimates are given only at some discrete points, i.e., the centers of the $z$-$L$ bins. The mathematics behind $\hat{\phi}_{\mathrm{bin}}$ is the histogram. In principle, a histogram can be considered a step function that can be evaluated at an arbitrary point. This means that any point in a $z$-$L$ bin has the same LF, i.e., $\hat{\phi}_{\mathrm{bin}}(z_j, L_j) \equiv \hat{\phi}_{\mathrm{bin}}(z_c, L_c)$, where $(z_c, L_c)$ is the center of the $z$-$L$ bin. Therefore, Equation (30) is applicable also to the $\hat{\phi}_{\mathrm{bin}}$ estimator.
For the $\hat{\phi}_1$ (and $\hat{\phi}_{1a}$) estimator, the statistic is given by Equation (31), where $z_0 = (Z_1 + Z_2)/2$. Note that this calculation should exclude those few points with $L_j < f_{\mathrm{lim}}(z_0)$. The statistic $d_{\mathrm{LF}}$ can be understood as the typical error of an LF estimator, so the smaller the $d_{\mathrm{LF}}$ value, the better. Figure 3 shows the distributions of $d_{\mathrm{LF}}$ in the eight redshift bins for the different LF estimators based on the 200 simulated samples. In each redshift bin, the black solid curve represents the $\hat{\phi}_{\mathrm{bin}}$ estimator; the red dashed and solid curves correspond to the $\hat{\phi}$ and $\hat{\phi}_a$ estimators, respectively. In Table 1, we summarize the median $d_{\mathrm{LF}}$ for the different LF estimators based on the 200 simulated samples. Except for the two highest redshift bins, the $d_{\mathrm{LF}}$ statistics of $\hat{\phi}$ and $\hat{\phi}_a$ are better than that of $\hat{\phi}_{\mathrm{bin}}$. Moreover, the smaller dispersion of their distributions suggests that they both have better stability than the $\hat{\phi}_{\mathrm{bin}}$ estimator, as expected. Comparing $\hat{\phi}$ and $\hat{\phi}_a$, we find that the latter generally performs better, indicating that an adaptive KDE procedure is worthwhile.
Note that for the two highest redshift bins, our $\hat{\phi}$ and $\hat{\phi}_a$ estimators do not exhibit advantages compared with $\hat{\phi}_{\mathrm{bin}}$. This is because the sample size is relatively small ($\sim 50-300$) in these bins, and for such small samples a bivariate KDE method cannot realize its advantages. This, however, is a good opportunity for the $\hat{\phi}_{1a}$ estimator (shown as cyan solid curves in Figure 3) to demonstrate its value.
In Figure 4, we show how the statistic $d_{\mathrm{LF}}$ changes with the sample size $n$ for the $\hat{\phi}$, $\hat{\phi}_{\mathrm{bin}}$ and $\hat{\phi}_1$ estimators. In each panel, there are 1600 points, corresponding to the $d_{\mathrm{LF}}$ values for the eight redshift bins of the 200 simulated samples. For our $\hat{\phi}$ estimator, there is a significant inverse correlation between $d_{\mathrm{LF}}$ and $n$, indicating that it converges to the true LF quickly as $n$ increases. The inverse correlation for $\hat{\phi}_{\mathrm{bin}}$ is also obvious, but the trend is less steep. We thus conclude that our $\hat{\phi}$ estimator converges to the true LF remarkably faster than the traditional $\hat{\phi}_{\mathrm{bin}}$ estimator. This conclusion is mathematically supported by Wasserman (2006), who proved that kernel density estimators converge to the true density at the rate $O(n^{-4/5})$, faster than the histogram (rate $O(n^{-2/3})$). As for the $\hat{\phi}_1$ estimator, there is no obvious inverse correlation between $d_{\mathrm{LF}}$ and $n$. This is not surprising, since $\hat{\phi}_1$ makes some approximations, which inevitably lead to information loss; as $n$ increases, the bias due to this information loss comes to dominate. In practice, $\hat{\phi}_1$ (and $\hat{\phi}_{1a}$) is mainly applied to small-$n$ cases ($n \lesssim 200$), where it performs better than $\hat{\phi}$ (and $\hat{\phi}_a$).

Not dividing redshift bins
Not dividing redshift bins corresponds to setting $(Z_1, Z_2)$ to (0.0, 6.0) for the simulated samples. This means considering each sample as a whole and estimating a global LF in the redshift range (0.0, 6.0). For this "global LF" calculated by $\hat{\phi}$ (and $\hat{\phi}_a$), we calculate the statistic $d_{\mathrm{LF}}$ separately for each of the eight redshift bins defined in Section 3.1. In this way we can judge the performance of the "global LF" from $\hat{\phi}$ (and $\hat{\phi}_a$) at different redshift intervals, and also compare with $\hat{\phi}_{\mathrm{bin}}$. To test the performance of $\hat{\phi}_{\mathrm{bin}}$ with different $L$ bins, we consider two cases: (1) adopting a uniform $L$ bin of $\Delta L = 0.3$, and (2) using variable $L$ bins with $\Delta L \propto 1/\log_{10}(n)$, where $n$ is the sample size in a redshift bin. Table 1 reports the median $d_{\mathrm{LF}}$ for $\hat{\phi}$ and $\hat{\phi}_a$. The values are significantly smaller than those for the case of dividing redshift bins. We therefore emphasize that redshift binning is only a matter of expediency for our $\hat{\phi}$ and $\hat{\phi}_a$ estimators.
Note that when setting $(Z_1, Z_2)$ to (0.0, 6.0), the estimate given by $\hat{\phi}_a$ should be equivalent to the result of $\hat{\phi}_{\mathrm{tra}}$ in Paper I. Figures 7 and 8 of Paper I show the LFs estimated by $\hat{\phi}_{\mathrm{tra}}$ and $\hat{\phi}_{\mathrm{bin}}$, as well as the true LF; we refer the interested reader there for an illustration. Figure 5 shows the distributions of $d_{\mathrm{LF}}$ for the $\hat{\phi}_a$ and $\hat{\phi}_{\mathrm{bin}}$ estimators in the eight redshift bins. $\hat{\phi}_a$ performs significantly better than $\hat{\phi}_{\mathrm{bin}}$ in all the redshift bins. The two cases of uniform and variable $L$ bins for $\hat{\phi}_{\mathrm{bin}}$ are shown as black solid and dashed curves, respectively. Adopting variable $L$ bins does not appear to improve the performance of $\hat{\phi}_{\mathrm{bin}}$, indicating that the "optimal" $L$ bin size may depend mainly on the density distribution of $L$ and not simply on the sample size. In practice it is therefore very difficult to find an "optimal" $L$ bin size (and redshift binning) for $\hat{\phi}_{\mathrm{bin}}$. The difficulties of $\hat{\phi}_{\mathrm{bin}}$ stem from the drawbacks of its mathematical foundation, i.e., the histogram. In the mathematical community, the histogram is an outdated density estimator, used only for rapid visualization of results in one or two dimensions (e.g., Gramacki 2018).

DISCUSSIONS
For the case of dividing redshift bins, we have four LF estimators: $\hat{\phi}$, $\hat{\phi}_a$, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$. One may be unsure which estimator to use and when. Generally, the adaptive versions perform better, i.e., $\hat{\phi}_a$ is better than $\hat{\phi}$, and $\hat{\phi}_{1a}$ is better than $\hat{\phi}_1$. At small sample sizes, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$ generally perform better than $\hat{\phi}$ and $\hat{\phi}_a$. However, there are exceptions to this qualitative criterion. Taking $\hat{\phi}$ and $\hat{\phi}_a$ as an example, about 20% of the $d_{\mathrm{LF}}(\hat{\phi}_a)$ values are larger than the corresponding $d_{\mathrm{LF}}(\hat{\phi})$ in our simulation (see Figure 6). Note also that there are some outliers, i.e., poor estimates by $\hat{\phi}_a$, which should be avoided. Therefore, we need a more quantitative criterion to decide which estimator to use in a specific problem.

Choosing the optimal estimator
One appealing feature of the KDE method is its continuous and smooth estimate. In Equations (10) and (15), once the values of $h_1$ and $h_2$ are determined numerically, the functional forms of $\hat{\phi}(z, L)$ and $\hat{p}(z, L)$ are uniquely determined. Therefore the Kolmogorov-Smirnov test (KS-test) can measure the goodness of fit of $\hat{\phi}$ to the data. The classical one-dimensional KS-test compares the empirical and expected cumulative distribution functions (CDFs), $F_n(x)$ and $F(x)$. The KS-test statistic, $D_{\mathrm{KS}}$, is the maximum absolute difference between $F_n(x)$ and $F(x)$ over $x$:
$$D_{\mathrm{KS}} = \sup_x \left| F_n(x) - F(x) \right|.$$
For the $\hat{\phi}$ estimator, the expected CDF is given by Equation (33), where $\hat{p}(z, L)$ is given by Equation (10). The expected CDFs for $\hat{\phi}_a$, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$ can be calculated similarly, by simply replacing $\hat{p}(z, L)$ with their own $\hat{p}$. When one is unsure how to choose among $\hat{\phi}$, $\hat{\phi}_a$, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$, the KS-test provides a criterion. One first calculates the LFs using the four estimators and obtains their respective $D_{\mathrm{KS}}$ values via Equation (32), then chooses the estimator with the smallest $D_{\mathrm{KS}}$ value as the optimal one. Specifically, there are three steps: (1) compare $\hat{\phi}$ and $\hat{\phi}_a$, recording the optimal one as $\hat{\phi}_{2d}$; (2) compare $\hat{\phi}_1$ and $\hat{\phi}_{1a}$, recording the optimal one as $\hat{\phi}_{1d}$; (3) compare $\hat{\phi}_{1d}$ and $\hat{\phi}_{2d}$, recording the optimal one as $\hat{\phi}_{\mathrm{auto}}$. Note that the KS-test criterion does not require knowledge of the true LF. In order to prove its effectiveness, we apply it to the 200 simulated samples and check the consistency between $d_{\mathrm{LF}}$ and $D_{\mathrm{KS}}$ statistically. A good consistency guarantees that the KS-test criterion is effective. For the eight redshift bins of the 200 simulated samples, we make 1600 calculations and thus have 1600 pairs $(d_{\mathrm{LF}}, D_{\mathrm{KS}})$ for each LF estimator. We only consider those with $n \lesssim 1000$, leaving 1061 pairs $(d_{\mathrm{LF}}, D_{\mathrm{KS}})$.
Taking $\hat{\phi}$ and $\hat{\phi}_a$ as an example, the relation between $D_{\mathrm{KS}}(\hat{\phi})/D_{\mathrm{KS}}(\hat{\phi}_a)$ and $d_{\mathrm{LF}}(\hat{\phi})/d_{\mathrm{LF}}(\hat{\phi}_a)$ is plotted in the top panel of Figure 7. We find that most points (located in the upper right and lower left quadrants of the figure) meet one of conditions (1) and (2). This indicates a good consistency between $d_{\mathrm{LF}}$ and $D_{\mathrm{KS}}$. For the remaining points, we find that quite a few meet one of conditions (3) and (4). Condition (3) implies that $d_{\mathrm{LF}}(\hat{\phi})$ and $d_{\mathrm{LF}}(\hat{\phi}_a)$ are very close, while condition (4) indicates that both $d_{\mathrm{LF}}(\hat{\phi})$ and $d_{\mathrm{LF}}(\hat{\phi}_a)$ are relatively small. Thus for these "neutral" data points, choosing either $\hat{\phi}$ or $\hat{\phi}_a$ is acceptable. In the top panel of Figure 7, the points that meet none of conditions (1) to (4) are shown as red solid circles, and the miss rate of the KS-test criterion is approximately equal to the percentage of these points. The miss rate of the KS-test criterion for the other two steps, $\hat{\phi}_1$ vs. $\hat{\phi}_{1a}$ and $\hat{\phi}_{1d}$ vs. $\hat{\phi}_{2d}$, can be estimated similarly. The only difference is that for $\hat{\phi}_{1d}$ vs. $\hat{\phi}_{2d}$, a final condition must also be involved, (5): $n_r < 320$ or $n_r > 1000$, where $n_r$ is the reduced sample size. This is because for small reduced sample sizes of $n_r < 320$ we always choose $\hat{\phi}_{1d}$ as the optimal one, while for $n_r > 1000$ we always choose $\hat{\phi}_{2d}$; the reason will be explained later. Table 2 summarizes the resulting miss rates. Basically, the miss rates for all three steps are acceptable. Figure 8 plots the $d_{\mathrm{LF}}$ ratio as a function of the reduced sample size $n_r$ for step (3). Intriguingly, there is an obvious trend that the ratio of $d_{\mathrm{LF}}(\hat{\phi}_{2d})$ to $d_{\mathrm{LF}}(\hat{\phi}_{1d})$ decreases as $n_r$ increases. A power-law fit gives $d_{\mathrm{LF}}(\hat{\phi}_{2d})/d_{\mathrm{LF}}(\hat{\phi}_{1d}) = 10.96\, n_r^{-0.38}$ (shown as the gray thick solid line). The orange shaded area takes into account the $3\sigma$ error band. The two outermost intersection points between the cyan dashed line and the shaded area are at $n_r \approx 320$ and $n_r \approx 1000$ (marked by the black dash-dotted lines).
Below $n_r \approx 320$ nearly all the points have ratios $d_{\mathrm{LF}}(\hat{\phi}_{2d})/d_{\mathrm{LF}}(\hat{\phi}_{1d}) > 1$, while above $n_r \approx 1000$ nearly all ratios are smaller than 1. This is where condition (5) comes from. In summary, we have shown that the KS-test criterion is effective for choosing the optimal estimator among $\hat{\phi}$, $\hat{\phi}_a$, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$ in practice.
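The KS-test criterion only needs the statistic $D_{\mathrm{KS}}$ of Equation (32). A minimal one-dimensional sketch, where `model_cdf` stands for the expected CDF computed from a given estimator:

```python
import numpy as np

def ks_statistic(samples, model_cdf):
    """One-dimensional KS statistic D_KS = sup_x |F_n(x) - F(x)|."""
    x = np.sort(samples)
    n = len(x)
    F = model_cdf(x)
    # The empirical CDF jumps at each data point, so check both sides of each step.
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(0, n) / n)
    return max(d_plus, d_minus)
```

Given the four candidate estimators, one would evaluate `ks_statistic` with each estimator's expected CDF and keep the one with the smallest value, exactly as in steps (1) to (3) above.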

A guidance for using our LF estimators
We summarize guidance for using our LF estimators as follows: 1. Do not divide redshift bins unless the sample size is very large ($n > 10^5$).
2. If binning is essential, make every redshift bin contain as many sources as possible.
3. For small reduced sample sizes of $n_r \lesssim 320$, use the small-sample approximation. The choice of $\hat{\phi}_1$ or $\hat{\phi}_{1a}$ depends on which one has the smaller $D_{\mathrm{KS}}$ value.
4. For small-to-medium reduced sample sizes of $320 < n_r < 1000$, use the KS-test criterion to choose the optimal one from $\hat{\phi}$, $\hat{\phi}_a$, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$.
5. For large reduced sample sizes of $n_r > 1000$, use $\hat{\phi}$ or $\hat{\phi}_a$, depending on which one has the smaller $D_{\mathrm{KS}}$ value.
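The guidance above reduces to a small decision rule. A sketch, where the estimator names and the $D_{\mathrm{KS}}$ container are hypothetical placeholders:

```python
def choose_estimator(n_r, D_KS):
    """Decision rule from the guidance.

    n_r is the reduced sample size; D_KS maps an estimator name to its
    KS statistic. The thresholds 320 and 1000 are the empirical values
    quoted in the text and may be adjusted.
    """
    if n_r <= 320:
        pool = ["phi_1", "phi_1a"]            # small-sample approximation only
    elif n_r > 1000:
        pool = ["phi", "phi_a"]               # bivariate estimators only
    else:
        pool = ["phi", "phi_a", "phi_1", "phi_1a"]  # full KS-test comparison
    return min(pool, key=lambda name: D_KS[name])
```

Whatever the pool, the winner is always the candidate with the smallest $D_{\mathrm{KS}}$, so the rule degrades gracefully if the thresholds are changed.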
It should be emphasized that the critical values $n_r = 320$ and $n_r = 1000$ are empirical, and appropriate changes are allowed. We denote the "optimal LFs" obtained by guidelines 3 to 5 as $\hat{\phi}_{\mathrm{auto}}$. Figure 5 shows the distributions of $d_{\mathrm{LF}}$ in the eight redshift bins for $\hat{\phi}_{\mathrm{auto}}$ based on the 200 simulated samples. In Table 1, we report the median $d_{\mathrm{LF}}$ for $\hat{\phi}_{\mathrm{auto}}$. On the whole, $\hat{\phi}_{\mathrm{auto}}$ combines the advantages of $\hat{\phi}$, $\hat{\phi}_a$, $\hat{\phi}_1$ and $\hat{\phi}_{1a}$, and performs significantly better than $\hat{\phi}_{\mathrm{bin}}$ in all eight redshift bins.
Figure 9. Importing kdeLF and its subroutines to illustrate the overall structure of kdeLF.

Comparison with Paper I
The KDE method in Paper I is only applicable to problems where the sample covers a redshift range as broad as possible, i.e., $Z_1 \simeq 0$ and $Z_2 \gg Z_1$. The KDE method in this paper is not subject to this restriction. In order to reduce the boundary effects of KDE, Paper I introduced two solutions, the transformation and transformation-reflection methods. This work inherits the latter method, and adds a treatment for the case of small sample size. In a word, this work generalizes our previous KDE method. The upgrade greatly extends its application scope, making it a very flexible approach suitable for nearly all kinds of bivariate LF estimation problems.

A PYTHON IMPLEMENTATION OF THE KDE METHOD
We have developed a software toolbox, called kdeLF, to implement our KDE method. The kdeLF package is mainly written in Python, with Fortran used to resolve speed bottlenecks. kdeLF's interface is designed to be simple, intuitive, and easily extensible.

Package Structure and Basic Usage
The overall structure of the kdeLF package is illustrated in Figure 9, where the package and its main subroutines are imported or executed in a typical Python session. These subroutines have fairly obvious names: get_optimal_h obtains the optimal values of the bandwidths using maximum likelihood estimation (MLE); get_lgLF returns the logarithmic values of the LFs estimated via the KDE estimators, and also plots the LFs; KStest performs the KS test and plots the empirical CDF together with the CDFs expected from the KDE estimators; run_mcmc implements a fully Bayesian Markov chain Monte Carlo (MCMC) method to determine the posterior distributions of the bandwidths and other parameters (e.g., $\beta$ for the adaptive estimators). The MCMC core embedded in kdeLF is the Python package emcee (Foreman-Mackey et al. 2013). The Bayesian method allows us to recover the parameters with a complete description of their uncertainties and degeneracies by calculating their probability density functions (PDFs). Then, by running the chain_analysis subroutine, one can probe the shapes of these PDFs and the correlations among bandwidth parameters, giving more information than just the best-fitting and marginalized values of the parameters (also see Calistro Rivera et al. 2016).
A calculation begins by setting up a KdeLF instance, named, e.g., test, that represents the initial conditions of an LF calculation. The first four arguments are required and the others are optional. sample_file is the name of the sample file, which contains at least two columns of data for $z$ and $L$ (or absolute magnitude $M$). If the user wishes to calculate a KDE LF with weighting, a third column containing selection probabilities should be provided in sample_file. zbin is the redshift range $[Z_1, Z_2]$. f_lim is a user-defined Python function calculating the truncation boundary $f_{\mathrm{lim}}(z)$ of the sample, and solid_angle is the solid angle (in sr) subtended by the sample. kdeLF adopts a Lambda cold dark matter (LCDM) cosmology, but it is not limited to this specific cosmological model. The optional arguments Om0 and H0, defaulting to 0.30 and 70 (km s$^{-1}$ Mpc$^{-1}$), represent the $\Omega_m$ parameter and the Hubble constant of the LCDM cosmology, respectively. The optional arguments small_sample_approximation and adaptive have four different combinations, (False, False), (False, True), (True, False) and (True, True), corresponding to the usage of $\hat{\phi}$, $\hat{\phi}_{\mathrm{a}}$, $\hat{\phi}_{\mathrm{1}}$ and $\hat{\phi}_{\mathrm{1a}}$, respectively. The speed of the kdeLF code mainly depends on the sample size $n$. The most time-consuming part of the calculation is implemented in Fortran, with parallel processing via OpenMP (Open Multi-Processing). For a calculation with $n = 5000$, kdeLF takes about one minute to finish the get_optimal_h subroutine on an 8-core personal computer. run_mcmc is the most time-consuming subroutine, taking about 1-2 hours.
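As an illustration of the required f_lim argument, the following is a minimal sketch of a user-defined truncation boundary for a flux-density-limited survey. It assumes a flat LCDM cosmology with the default Om0 = 0.30 and H0 = 70, a hypothetical flux-density limit of 1 mJy, and no K-correction; the commented KdeLF call at the end only indicates the intended usage pattern, not an exact signature:

```python
import math

C_KM_S = 299792.458      # speed of light (km/s)
H0, OM0 = 70.0, 0.30     # Hubble constant (km/s/Mpc) and matter density
S_LIM_JY = 1e-3          # hypothetical survey flux-density limit (1 mJy)
MPC_M = 3.0857e22        # metres per Mpc
JY_SI = 1e-26            # W m^-2 Hz^-1 per Jansky

def luminosity_distance(z, steps=1000):
    """Luminosity distance (Mpc) in flat LCDM via trapezoidal integration."""
    dz = z / steps
    E = lambda zz: math.sqrt(OM0 * (1 + zz) ** 3 + 1 - OM0)
    integral = sum((1 / E(i * dz) + 1 / E((i + 1) * dz)) / 2 * dz
                   for i in range(steps))
    return (1 + z) * C_KM_S / H0 * integral

def f_lim(z):
    """Truncation boundary: log10 of the limiting luminosity (W/Hz) at z."""
    d_l = luminosity_distance(z) * MPC_M                 # metres
    return math.log10(4 * math.pi * d_l ** 2 * S_LIM_JY * JY_SI)

# Intended usage (illustrative only; see the kdeLF documentation for the API):
# test = kdeLF.KdeLF(sample_file='data.txt', zbin=[0.6, 0.8],
#                    f_lim=f_lim, solid_angle=1.9)
```

The boundary is monotonically increasing in $z$, as expected: a fixed flux limit corresponds to a brighter luminosity limit at larger distances.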

An Example
As a test example, we apply kdeLF to the quasar sample compiled by Kulkarni et al. (2019). The sample is drawn from the SDSS DR7 quasar catalogue (Schneider et al. 2010). Kulkarni et al. (2019) restricted it to the 48 664 quasars at $0.1 < z < 2.2$ selected with the final SDSS quasar selection algorithm (Richards et al. 2006) from a survey area of 6248 deg$^2$. Considering that the lower redshift bins were excluded from Kulkarni et al. (2019)'s LF evolution analysis due to obvious systematic errors, we only use the $0.6 < z < 2.2$ quasars. The sample is not assumed to be complete within its survey region: the $i$th quasar $(z_i, M_i)$ is associated with a value of the selection function $P(z_i, L_i)$, which can be interpreted as the probability that a quasar at this location is selected. To use kdeLF, we prepare a text file named, say, 'data.txt', containing three columns: $z$, $M$, and the selection probability. With this three-column data, kdeLF will automatically implement the KDE analysis taking into account the weighting due to the selection function. For comparison, we divided the sample into the same seven redshift bins as Kulkarni et al. (2019). For the data of each redshift bin, we can run kdeLF through a few lines of simple code to obtain a KDE LF analysis; Figure 10 shows an example of the code for the $0.6 < z < 0.8$ bin. Figure 11 shows the resultant LF estimates by $\hat{\phi}_{\mathrm{a}}$ for the seven redshift bins. These are shown as blue solid curves with orange error bands of $3\sigma$ (99.93 per cent) uncertainties. Unlike the $\hat{\phi}_{\mathrm{bin}}$ estimator, our KDE method is able to give the LF values at any redshift in the range $Z_1 < z < Z_2$; the LFs are plotted at the mean redshift of each redshift bin. The red solid circles indicate the binned LFs inferred by Kulkarni et al. (2019). Our KDE LFs are in good agreement with their estimates, but at the bright end our KDE estimates have smaller uncertainties. The main difference occurs at the faint limit, where the binned LFs show a suspicious decline (red open circles in the figure).
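The three-column input layout described above can be illustrated as follows. The numbers are fabricated placeholders purely to show the expected format ($z$, $M$, selection probability); kdeLF's actual parsing conventions should be checked against its documentation:

```python
import os
import tempfile

# Mock rows in the three-column format: redshift, absolute magnitude,
# and selection probability (all values are made up for illustration).
rows = [
    (0.6123, -24.51, 0.92),
    (0.7348, -25.03, 0.88),
    (1.1052, -26.47, 0.95),
    (2.1507, -27.12, 0.81),
]

path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as fh:
    for z, M, p in rows:
        fh.write(f'{z:.4f}  {M:.2f}  {p:.2f}\n')

# Read it back to confirm the layout a whitespace-separated parser would see
with open(path) as fh:
    parsed = [line.split() for line in fh]
print(len(parsed), len(parsed[0]))   # 4 rows, 3 columns
```
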
Kulkarni et al. (2019) thus concluded that the SDSS selection function is systematically overestimated at its magnitude limit. They treated those few bins near the faint limit as "discrepant" ones, and discarded the contributing quasars from their analysis. We argue that although the possibility of an overestimated selection function cannot be ruled out, the $\hat{\phi}_{\mathrm{bin}}$ method they used is also an important factor. Through a detailed graphical analysis and Monte Carlo simulation, Yuan & Wang (2013) concluded that once the LF evolves with redshift, the classical binned methods will inevitably underestimate the density in the bins that are crossed by the flux-limit curves. This is a typical boundary bias. In our KDE analysis, we do not discard the "discrepant" quasars. Figure 11 shows that our KDE LFs do not exhibit a suspicious decline at the faint limit. This is owing to the reflection operation in our KDE method, which significantly reduces the boundary bias near the faint limit.
A careful inspection shows that our KDE LFs do not extend down to the faint luminosities of the open circles in Figure 11. This is due to the survey limit: the faint luminosity limits differ at different redshifts. Figure 12 illustrates how this happens. $\hat{\phi}_{\mathrm{bin}}$ gives discrete LF points, whose faint luminosity limit is shown as the red solid circle in Figure 12. $\hat{\phi}_{\mathrm{a}}$ (and our other KDE estimators) gives a continuous LF, whose faint luminosity limit is shown as the black solid circle in the figure. If the LF by $\hat{\phi}_{\mathrm{a}}$ is evaluated at a redshift close to $Z_1$, it can extend down to the faint luminosity of the red solid circle.
One important feature of our KDE method is its use of Bayesian inference. Figure 13 shows the marginalized 1D and 2D posterior probability distribution functions of the bandwidth parameters $h_1$ and $h_2$ for $\hat{\phi}$, and $h_{10}$, $h_{20}$ and $\beta$ for $\hat{\phi}_{\mathrm{a}}$. The values of $h_1$ and $h_2$ are used as pilot bandwidths for implementing $\hat{\phi}_{\mathrm{a}}$ (see Section 2.4). The figure illustrates that all five parameters are well constrained.
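The idea behind run_mcmc, treating the bandwidth as a parameter with a posterior distribution rather than a single best-fitting value, can be illustrated with a toy one-dimensional analogue. The sketch below is not kdeLF's actual algorithm (which uses emcee and the full LF likelihood); it samples the posterior of a single Gaussian-KDE bandwidth $h$ using a leave-one-out likelihood, a flat prior on $h > 0$, and a simple Metropolis random walk:

```python
import math
import random

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(60)]  # toy 1D sample

def loo_loglike(h):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    if h <= 0:
        return -math.inf
    n = len(data)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2 * math.pi))
    total = 0.0
    for i, xi in enumerate(data):
        dens = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                          for j, xj in enumerate(data) if j != i)
        total += math.log(max(dens, 1e-300))
    return total

# Metropolis random walk over h (flat prior on h > 0)
h, logp = 0.5, loo_loglike(0.5)
chain = []
for _ in range(1000):
    h_new = h + random.gauss(0.0, 0.05)
    logp_new = loo_loglike(h_new)
    if math.log(random.random()) < logp_new - logp:   # accept/reject step
        h, logp = h_new, logp_new
    chain.append(h)

post = chain[300:]                      # discard burn-in
h_mean = sum(post) / len(post)
print(round(h_mean, 3))                 # posterior mean of the bandwidth
```

In kdeLF the same principle applies in two dimensions (and with the extra adaptive parameter $\beta$), which is why marginalized 1D and 2D posteriors such as those in Figure 13 can be produced from the chains.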

Future Development
In mathematics, KDE is a very successful smoothing technique with many practical applications. Some of its aspects, such as bandwidth selection and fast KDE algorithms for big data, are still at the forefront of research. We are optimistic about the application prospects of the KDE method in LF measurement as well as other fields of astronomy. We anticipate that our method and the kdeLF code, especially the latter, will be under continuous development over the coming years, driven by the needs of its users. The source code is currently hosted in a git repository on GitHub at http://github.com/yuanzunli/kdeLF, and version 1.0.0 is archived on Zenodo (Yuan et al. 2022). Users are encouraged to report bugs through the issue tracker on GitHub, and contributions are welcome. To learn more about how to use kdeLF in practice, it is best to check out the documentation at http://kdelf.readthedocs.org/en/latest, where the API documentation and some examples are presented.

SUMMARY
We summarize the important points of this work as follows.
1. We propose a generalization of our previous KDE (kernel density estimation) method for estimating luminosity functions (LFs). This upgrade further extends the application scope of our KDE method, making it a very flexible approach suitable for nearly all kinds of bivariate LF estimation problems.
2. From the mathematical point of view, the LF calculation can be abstracted as a density estimation problem in the bounded domain $\{Z_1 < z < Z_2,\ L > f_{\mathrm{lim}}(z)\}$. We use the transformation-reflection KDE method to solve the problem and denote the estimator as $\hat{\phi}$. For the small sample size situation, an approximate method based on one-dimensional KDE is proposed, denoted as $\hat{\phi}_{\mathrm{1}}$. To further improve their performance, we introduce their adaptive versions, denoted as $\hat{\phi}_{\mathrm{a}}$ and $\hat{\phi}_{\mathrm{1a}}$.
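As a schematic one-dimensional illustration of the reflection idea (not the paper's exact bivariate estimator), for data $\{x_i\}$ bounded below by $x \geq b$, each point is mirrored across the boundary so that the kernel mass leaking below $b$ is folded back:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} \left[ K\!\left(\frac{x - x_i}{h}\right) + K\!\left(\frac{x - (2b - x_i)}{h}\right) \right], \qquad x \geq b,$$

where $K$ is the kernel and $h$ the bandwidth; the mirrored points $2b - x_i$ restore the density near the boundary that a plain KDE would otherwise underestimate.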
3. Based on 200 simulated samples, we find that when the sample is not divided into redshift bins, $\hat{\phi}$ and $\hat{\phi}_{\mathrm{a}}$, especially the latter, perform very well and are clearly superior to the traditional binning method ($\hat{\phi}_{\mathrm{bin}}$).
4. For the cases where the sample is divided into redshift bins, we propose a KS-test criterion to choose the optimal LF estimator from $\hat{\phi}$, $\hat{\phi}_{\mathrm{a}}$, $\hat{\phi}_{\mathrm{1}}$ and $\hat{\phi}_{\mathrm{1a}}$ according to a simple and easy-to-operate guidance. The selected estimator, denoted as $\hat{\phi}_{\mathrm{auto}}$, combines the advantages of the four estimators and performs significantly better than $\hat{\phi}_{\mathrm{bin}}$ in all eight redshift bins.
5. The simulations suggest that as $n$ increases, our KDE LF estimator converges to the true LF remarkably faster than $\hat{\phi}_{\mathrm{bin}}$.
6. We have developed a public, open-source Python toolkit, called kdeLF, to implement our KDE method. The performance of the code on real data was tested with a quasar sample from the SDSS. With the support of kdeLF, our KDE method could be a competitive alternative to existing nonparametric estimators, owing to its high accuracy and excellent stability.