Global Polynomial Kernel Hazard Estimation

This paper introduces a new bias-reducing method for kernel hazard estimation, called global polynomial adjustment (GPA). It is a global correction which is applicable to any kernel hazard estimator. The estimator has attractive theoretical properties: it asymptotically reduces bias while leaving the variance unchanged. A simulation study investigates the finite-sample properties of GPA, testing the method on local constant and local linear estimators. From the simulation experiment we conclude that the global estimator improves the goodness-of-fit. An especially encouraging result is that the bias correction works well for small samples, where traditional bias reduction methods have a tendency to fail.


Introduction
In this paper we introduce a polynomial correction to a kernel hazard estimator based on standard survival data formulated via counting processes. The global correction attempts to obtain the best of the two dominating non-parametric estimation worlds: kernel estimation and spline estimation. One clear advantage of kernel estimators is their analytical tractability, whereas many practitioners seem to prefer polynomial-based methods since they often give a good performance.
In this paper we start with an arbitrary kernel hazard pilot estimator, α̂, and correct it multiplicatively by a global polynomial. The parameters of the polynomial are chosen in such a way that the resulting estimator fits the empirical moments of the counting process up to some chosen order, r, of the adjusting polynomial. The resulting estimator is called the global polynomial adjustment (GPA).
Such a polynomial adjustment is known from density estimation; see Efron & Tibshirani (1996). They, however, use an exponential transformation of the data. This procedure requires more computations, but it does have the advantage of preserving positivity.
Over the last decades there has been a lot of research aimed at finding new ways of reducing the bias of basic kernel smoothers. See Jones & Signorini (1999) for an overview and a comparative simulation study in the case of kernel density estimation. Recently there has been substantial development in the case of filtered data; see, e.g., Nielsen, Tanggaard & Jones (2009), Gámiz Pérez, Martínez Miranda & Nielsen (2013) and Spreeuw, Nielsen & Jarner (2013). The case of kernel hazard estimation is studied in Nielsen & Tanggaard (2001). A multivariate study is given in Gámiz Pérez, Janys, Martínez Miranda & Nielsen (2013). However, none of these papers use the GPA technique introduced in this paper but rather stop with local adjustments. This is where our paper starts. We show that easily applicable global adjustments, here the GPA, can improve these estimates significantly.
In this paper, we consider as pilot estimators those of Nielsen & Tanggaard (2001), who introduced the concepts of local constant and local linear kernel hazard estimators and rephrased the well-known estimator of Ramlau-Hansen (1983) in this framework. Nielsen & Tanggaard (2001) also reformulated the traditional multiplicative bias correction method of Jones, Linton & Nielsen (1995) and Nielsen (1998) and introduced a local additive bias correction method based on the same minimisation.
The GPA method could be considered as a semiparametric approach to nonparametric estimation along the lines of Copas (1995), Eguchi & Copas (1998), Hjort & Glad (1995), Hjort & Jones (1996) and Loader (1996). However, as the asymptotic theory points out, our type of semiparametric estimation is quite different from the more common semiparametric estimation techniques. It is seen from the asymptotic theory that our estimator very often has a better asymptotic bias than if a GPA correction had not been applied. This holds even in situations where the true underlying hazard is far from the parametric model used: a polynomial in our case. This is different from these other semiparametric models, where the estimation method only works if the parametric model used is close to the true underlying hazard; see for example Hjort & Jones (1996).
The theory and the simulation experiment in Nielsen & Tanggaard (2001) demonstrated that local bias reduction methods are effective. Nevertheless, it turns out that the global polynomial correction of this paper is so good that it is only of minor importance to use local bias adjustment in addition to the global adjustment. The step from local constant estimation to local linear estimation is, however, still important even when our global adjustment is used as a final step. In particular, we note that the global correction method gives a substantial improvement for small data sets, where traditional bias reducing methods typically do not work. A feature of the traditional bias reducing methods is that they seem to work for data sets where there is no need for further precision, namely large samples, whereas they do not work well for small data sets. This is not the case for the global polynomial correction method, which works very well for both small and large data sets.
The simulation study shows that the usual conclusion holds for global polynomial adjustment, namely that it is better not to use too much bias correction when the data sets are small. It is preferable to reserve the more sophisticated bias correcting methods for large data sets. However, inspection of the simulation results suggests that applied researchers can do uniformly well by simply taking the local linear estimator as pilot estimator and making a global correction along the lines of this paper. This paper proceeds as follows. In Section 2 we introduce the general principle of globally correcting a pilot kernel hazard estimator. In Section 3 we give the asymptotic properties of the GPA method, and in Section 4 we give an extensive simulation evaluating the performance of five different pilot estimators.

Global Hazard Estimators in a Counting Process Setting
We observe n individuals i = 1, . . ., n. Let N_i count observed failures for the i'th individual in the time interval [0, T]. We assume that N_i is a one-dimensional counting process with respect to an increasing, right continuous, complete filtration F_t, t ∈ [0, T], i.e. one that obeys the usual conditions (les conditions habituelles); see Andersen, Borgan, Gill & Keiding (1993, p. 60). We model the random intensity as

λ_i(t) = α(t) Y_i(t),

with no restriction on the functional form of α(·). Here, Y_i is a predictable process taking values in {0, 1}, indicating (by the value 1) when the i'th individual is under risk. We assume that (N_1, Y_1), . . ., (N_n, Y_n) are iid for the n individuals.
We consider any preliminary estimator, α̂, of α, and we define the r'th order global polynomial correction based on adjusting the empirical counting process moments to this estimator.
Let r be a positive integer and let M̂ = (M̂_0, . . ., M̂_{r−1})′, where

M̂_j = Σ_{i=1}^n ∫_0^T s^j dN_i(s)

is the j'th empirical counting process moment. We construct the global polynomial estimator, α̂_r, by adjusting the original estimator by a multiplicative polynomial correction such that

α̂_r(t) = α̂(t) Σ_{j=0}^{r−1} c_j t^j.

The parameter vector c = (c_0, . . ., c_{r−1})′ is chosen such that the first r counting process moments are correct, i.e.

Σ_{i=1}^n ∫_0^T s^j α̂_r(s) Y_i(s) ds = M̂_j,  j = 0, . . ., r − 1.
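As an illustration, the moment-matching step can be sketched numerically on a discretised time grid. The function below is a schematic implementation of the construction above, not code from the paper; the Riemann-sum approximation of the integrals, the aggregation of individuals onto grid counts, and all variable names are our own choices.

```python
import numpy as np

def gpa_adjust(t_grid, alpha_pilot, exposure, occurrences, r):
    """Order-r global polynomial adjustment (schematic sketch).

    Chooses c so that the first r empirical counting-process moments
    M_j = sum_i int s^j dN_i(s) are reproduced by
    alpha_r(s) = alpha_pilot(s) * (c_0 + c_1 s + ... + c_{r-1} s^{r-1}).
    `occurrences` holds the failure counts dN aggregated over individuals
    at each point of the uniform grid `t_grid`, and `exposure` the
    aggregated at-risk count Y.
    """
    dt = t_grid[1] - t_grid[0]
    # empirical moments M_j = sum over grid points of t^j * dN(t)
    M = np.array([np.sum(t_grid**j * occurrences) for j in range(r)])
    # A[j, k] = int s^(j+k) alpha_pilot(s) Y(s) ds, so the moment
    # conditions read A @ c = M
    A = np.array([[np.sum(t_grid**(j + k) * alpha_pilot * exposure) * dt
                   for k in range(r)] for j in range(r)])
    c = np.linalg.solve(A, M)
    return alpha_pilot * sum(c[k] * t_grid**k for k in range(r))
```

By construction the adjusted estimate reproduces the first r moments exactly, whatever pilot is supplied; for r = 1 the correction reduces to a single multiplicative constant.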
Note. One could have chosen to multiply by an exponential function of a polynomial instead of by the polynomial itself. This would have the attraction of preserving positivity, but it would also have the disadvantage of requiring more complicated numerical computations.

Asymptotic Theory of the Global Polynomial Estimators
The general approach in non-parametric kernel estimation is to divide the estimation error into a stable part and a variable part. Under the standard conditions of smoothing theory (cf. e.g. Nielsen & Tanggaard 2001 for the local linear estimator) it is easy to verify that the asymptotic properties of the variable part, and thus the asymptotic variance, remain unchanged by the global adjustment. The bias will, however, change. While the order of magnitude of the bias is unchanged, the leading term of the bias will be substantially reduced.
Assume that, before the global polynomial adjustment, the bias of the estimator has the form

b^x g(t),

where b is a chosen bandwidth, x is the order of the bias, and g(t) is a function of the hazard argument, t.
For the estimators we consider in the next sections, we have x = 2 for the local constant and the local linear estimators, x = 4 for the multiplicatively bias corrected estimator, and x = 6 for the two times multiplicatively bias corrected estimator and the additively bias corrected estimator. After a global polynomial adjustment it is a straightforward exercise to show that, under the conditions of the theorems in Nielsen & Tanggaard (2001), the bias takes the same form with x unchanged and g replaced by a corrected version, ğ. The coefficients c_0, . . ., c_{r−1} are chosen so that ğ is a polynomial down-weight of g(s) towards 0; the larger r is, the heavier the down-weight will typically be. Therefore, the global polynomial method reduces the leading term of the bias without changing the variance.
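The down-weighting effect can be illustrated numerically: removing the best least-squares polynomial approximation of degree below r from a function g shrinks its norm, and typically more so as r grows. The snippet below is a schematic analogue of this mechanism under our own discretisation, not the paper's exact weighting of g.

```python
import numpy as np

def poly_residual_norm(g_vals, s_grid, r):
    """Discrete L2 norm of g after removing its best least-squares
    polynomial approximation of degree < r on s_grid.
    r = 0 leaves g untouched, mirroring "no adjustment"."""
    ds = s_grid[1] - s_grid[0]
    if r == 0:
        resid = g_vals
    else:
        # best-fitting polynomial of degree r-1 in the least-squares sense
        coeffs = np.polyfit(s_grid, g_vals, r - 1)
        resid = g_vals - np.polyval(coeffs, s_grid)
    return np.sqrt(np.sum(resid**2) * ds)
```

Because the polynomial spaces are nested, the residual norm is non-increasing in r, which is the sense in which larger r gives a heavier down-weight of the leading bias term.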
Accordingly, the global polynomial kernel hazard estimation approach has good theoretical properties and appears to be a promising method for applied research. In practice one should expect r to be rather small when the observed data set is small, whereas it takes a relatively large data set to allow for the increased complexity introduced in the estimation step by large r's in the global polynomial regression. This conjecture is supported by the study of the small sample properties in the next section.

A Simulation Study
In this section we conduct a Monte Carlo simulation study for several hazard estimators and the global polynomial adjustment of these estimators. We follow Nielsen & Tanggaard (2001) and consider five different kernel hazard estimators: α̂_1 (local constant), α̂_2 (local linear), α̂_3 (multiplicatively bias corrected local linear), α̂_4 (double multiplicatively bias corrected local linear) and α̂_5 (additively bias corrected local linear). More precisely, these estimators are defined as follows.
For a kernel K and a bandwidth b, let K_b(u) = b^{−1} K(u/b). The local constant estimator and the local linear estimator are both defined with the weighting W(s). The multiplicatively bias corrected estimator of the local linear estimator is defined through the local linear estimator of the multiplicative error, g_M(t) = α(t)/α̂_2(t), which is constructed with its own weighting function. We also consider the two times multiplicatively bias corrected estimator, α̂_4, which is constructed by bias correcting α̂_3.
Revista Colombiana de Estadística 38 (2015) 399-411
The additively bias corrected estimator is based on a preliminary bias estimator, B̂_t, which is constructed by bootstrapping the bias (see Nielsen & Tanggaard 2001). We call α̂_5(t) = α̂_2(t) + ĝ_A(t) B̂_t the local linear additively bias corrected estimator.
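To fix ideas, the simplest of the five pilots, the local constant estimator, can be sketched as the kernel-smoothed occurrence divided by the kernel-smoothed exposure. This is our own illustrative implementation with W(s) = 1; the Epanechnikov kernel and the interface (pooled failure times plus an at-risk count on a grid) are assumptions for the sketch, not necessarily the paper's choices.

```python
import numpy as np

def local_constant_hazard(t_grid, event_times, exposure, b):
    """Local constant kernel hazard estimator (sketch):
    smoothed occurrence over smoothed exposure.

    `event_times` pools all observed failure times over individuals;
    `exposure` gives the total number at risk at each point of the
    uniform grid `t_grid`; `b` is the bandwidth."""
    dt = t_grid[1] - t_grid[0]

    def K_b(u):
        # Epanechnikov kernel rescaled by the bandwidth b
        v = u / b
        return 0.75 * (1.0 - v**2) * (np.abs(v) <= 1) / b

    occ = np.array([np.sum(K_b(t - event_times)) for t in t_grid])
    exp_ = np.array([np.sum(K_b(t - t_grid) * exposure) * dt for t in t_grid])
    return occ / np.maximum(exp_, 1e-12)
```

The bias corrected estimators α̂_3, α̂_4 and α̂_5 are then built on top of such a pilot by re-smoothing its multiplicative or additive error.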
In the simulation studies we use the weighting W(s) = 1, and we consider global corrections of order r = 0, . . ., 8 (note that r = 0 amounts to no correction). All estimators are based on a kernel function in which we have left out the normalization constant. As in Nielsen (1998) and Nielsen & Tanggaard (2001), we use as the true hazard one of four functions, γ_1, . . ., γ_4, defined via B(t, α, β), the value at t ∈ (0, 1) of the Beta density with parameters α, β (see Nielsen 1998 for a graph of the four functions).
Each simulation run is constructed as follows. First, we define a discrete grid on the interval (0, 1) with grid length δ_M = 1/(M + 1), as {t_m : t_m = m δ_M, m = 1, . . ., M}. Then, for a sample of n individuals, failures at time t_m are generated from the binomial distribution, Binomial(Y^(n)(t_m), γ_j(t_m) δ_M). Although we tried several values of M, we report only the results for M = 500. Higher values of M do not seem to alter the conclusions.
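A run of this data-generating scheme might look as follows. The function is our own sketch of the design described above; in particular, removing failed individuals from the risk set is our reading of how Y(t_m) evolves, which the text leaves implicit.

```python
import numpy as np

def simulate_counting_data(hazard, n, M=500, rng=None):
    """Failures on the discrete grid {t_m = m/(M+1)}: at each t_m,
    failures ~ Binomial(Y(t_m), hazard(t_m) * delta), and failed
    individuals leave the risk set (sketch of the simulation design)."""
    rng = rng if rng is not None else np.random.default_rng()
    delta = 1.0 / (M + 1)
    t = np.arange(1, M + 1) * delta
    at_risk = np.empty(M, dtype=int)
    failures = np.empty(M, dtype=int)
    y = n
    for m in range(M):
        at_risk[m] = y
        failures[m] = rng.binomial(y, min(hazard(t[m]) * delta, 1.0))
        y -= failures[m]
    return t, at_risk, failures
```

The returned grid, at-risk counts and failure counts are exactly the inputs a grid-based kernel hazard estimator needs.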
To evaluate the simulations we use the following global measure of estimation error:

err(α̂_b) = Σ_{i=1}^n ∫ {α̂_b(s) − α(s)}^2 Y_i(s) ds.
Our simulation study is based on the best possible bandwidth and, in the case of the global polynomial fit, also on the best possible r, with 1 ≤ r ≤ 9. Comparisons are therefore made in a best-case scenario, which separates the choice of estimator from the bandwidth and polynomial degree selection problems. This means that for each simulation run where GPA is applied, we choose the best two-dimensional parameter, (b, r), giving the lowest possible value of err(α̂_b). This lower bound on the estimation error is unattainable in practice. This parallels the type of simulation study carried out in Jones et al. (1995), Jones & Signorini (1999), Nielsen (1998), Jones, Signorini & Hjort (1999) and Nielsen & Tanggaard (2001).
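The best-case selection amounts to a simple grid search over (b, r). In the sketch below, `fit` is a hypothetical closure returning a hazard estimate on the grid for a given (b, r); the error measure is the weighted integrated squared error used above, approximated by a Riemann sum.

```python
import numpy as np

def best_case_error(t_grid, exposure, alpha_true, fit, bandwidths, orders):
    """Best-case benchmark: smallest weighted integrated squared error
    err = sum_i int {alpha_hat(s) - alpha(s)}^2 Y_i(s) ds
    over a grid of candidate (b, r) pairs.  `fit(b, r)` is a
    user-supplied (hypothetical) estimator closure."""
    dt = t_grid[1] - t_grid[0]
    best = np.inf
    for b in bandwidths:
        for r in orders:
            est = fit(b, r)
            err = np.sum((est - alpha_true)**2 * exposure) * dt
            best = min(best, err)
    return best
```

Because the search uses the unknown true hazard, this is a benchmarking device for simulations only, matching the unattainable lower bound described above.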
Tables 1-5 summarise the simulation results for the five different pilot estimators, respectively, with samples of size n = 50, n = 150, n = 500 and n = 1000. The estimation error is measured as the average of err(α̂_b) over 250 simulation runs. The minimisation is done with respect to the bandwidth, b, in the left panel and with respect to the bandwidth and the polynomial degree, (b, r), in the right panel. The left panel thus shows the minimum estimation error with no global adjustment, and the right panel shows the corresponding values when the GPA is applied.
We can conclude that the GPA improves the goodness-of-fit for all estimators and all sample sizes. Furthermore, the GPA works relatively better on the local constant and the local linear estimators, that is, when no local bias correction is applied. It also seems to work better for small data sets. This is different from the classical local bias corrections (see α̂_3, α̂_4 and α̂_5 with no GPA), which seem not to be advisable for small data sets.
Furthermore, we can see that the optimal r increases with the size of the data set, which indicates that large data sets allow for more complexity in the estimation step.
In general, the simulation results support the asymptotic theory. We observe that when GPA is applied, a greater bandwidth is chosen than without GPA.
The asymptotic theory shows that for a fixed bandwidth, GPA reduces the bias. Therefore, a greater bandwidth (which adds bias and reduces variance) re-balances bias and variance so that the global estimation error is minimised.

Conclusion
In this paper, we have introduced the so-called global polynomial adjustment (GPA) for the bias reduction of kernel hazard estimators. The theoretical properties of the GPA, as well as a simulation study using several different pilot kernel hazard estimators, suggest that the GPA is very promising. Therefore, it is worth exploring the extension of the global estimation principle of this paper to the local likelihood estimation principle (see, e.g., Otneim, Karlsen & Tjøstheim 2013) and to the cases where variable bandwidth and variable kernel are considered (see Nielsen 2003, Koul & Song 2013). Also, parametric shapes other than global polynomials could have been used, for example the Gumbel distribution of Salinas, Pérez, González & Vaquera (2012), or multivariate structures as in Martínez-Flórez, Moreno-Arenas & Vergara-Cardozo (2013) or Lemonte, Martínez-Flórez & Moreno-Arenas (2014). In the latter case, one could for example adjust the marker dependent hazard estimators of Nielsen & Linton (1995) or Nielsen (1998) according to relevant global moments. The practical choice of the polynomial order r is of course important. We are currently investigating how to adapt cross-validation and do-validation to global polynomial kernel hazard estimation. This would involve generalising the cross-validation and do-validation procedures given in Gámiz Pérez, Martínez Miranda & Nielsen (2013) to also include picking the order of the adjusting polynomial. This would ensure that a classical variance/bias trade-off prevents r from being picked too big (overfitting) or too small (not taking advantage of the global correction trick).

Table 4: The two times multiplicatively bias corrected estimator, α̂_4.