Nonparametric Estimation for Hazard Rate Function by Wavelet Procedures with Simulation

This article studies nonparametric estimation of the hazard rate function using linear wavelet estimation for randomly right-censored data. The estimation strategy is based on the wavelet projection, using the father wavelet functions $\{\varphi_{j,k}(x),\ j = 1, 2, \ldots,\ 0 \le k \le 2^{j} - 1\}$, onto the subspace $V_j$ of the space $L^2(\mathbb{R})$, together with the Breslow estimate of the cumulative function. Real data on patients suffering from liver metastases are used as an application. Moreover, a simulation study is used to further illustrate the estimation method.


Introduction
Medical studies that follow patients until death or loss to follow-up give rise to censored data, more generally known as survival data. One common data collection scheme is random right censoring, in which a time interval is specified for each individual during which the event is awaited. If the event occurs before the specified time, the observation is uncensored; if it has not occurred by that time, the observation is censored. In general, waiting for a particular event to occur produces a survival dataset. The specific time determined for each individual is known as the failure time. Medically, death and failure to follow up are among the events associated with such a time. Mathematically, let $\{T_i\}_{i=1}^{n}$ denote the failure times, which under censoring cannot generally be observed for every individual. Let $\{C_i\}_{i=1}^{n}$ be the censoring times, so that each individual $i$ has a specific censoring time $C_i$. Both $\{T_i\}_{i=1}^{n}$ and $\{C_i\}_{i=1}^{n}$ are assumed to be non-negative, independent and identically distributed, with density functions $f$ and $g$ and distribution functions $F$ and $G$, respectively.
Because the failure and censoring times are independent, one can work with the observed variables $\{X_i\}_{i=1}^{n}$ and the indicators $\{\delta_i\}_{i=1}^{n}$, where $X_i = \min(T_i, C_i)$ and $\delta_i = \mathbf{1}_{\{T_i \le C_i\}}$. One of the topics of great importance in statistics is the hazard rate function, which is central to the calculation of risk rates and is particularly important when dealing with nonparametric data. Over the years, researchers have estimated the hazard rate function in many ways, such as the Kaplan-Meier, Nelson-Aalen and kernel methods.
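As a small illustration of how right-censored observations are formed, the following Python sketch takes a failure time and a censoring time for each individual and returns the observed time (the smaller of the two) together with an indicator that equals 1 when the failure is observed. The function name `observe`, its argument names, and the toy values are illustrative assumptions and do not come from the article.

```python
def observe(failure_times, censoring_times):
    """Return (x, delta): x[i] = min(T_i, C_i), delta[i] = 1 if T_i <= C_i else 0."""
    if len(failure_times) != len(censoring_times):
        raise ValueError("both lists must have one entry per individual")
    pairs = list(zip(failure_times, censoring_times))
    x = [min(t, c) for t, c in pairs]
    delta = [1 if t <= c else 0 for t, c in pairs]
    return x, delta


# Toy values (hypothetical): four individuals.
T = [2.0, 5.5, 3.1, 7.0]   # failure times, not always observable
C = [4.0, 1.5, 3.1, 6.0]   # censoring times
X, delta = observe(T, C)
censored_share = 1 - sum(delta) / len(delta)
```

In this toy sample the second and fourth individuals are censored, so only their censoring times are recorded.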
In statistics, and especially in nonparametric applications, wavelets have provided new and useful techniques for approximation and data analysis in function estimation problems. This is due to their effectiveness and their ability to respond to the features that affect the behavior of the functions to be estimated. One important role of wavelets in statistics is the estimation of the probability density function, the hazard rate function and other functions. A. Antoniadis and G. Gregoire [1] presented a wavelet-based method for estimating the hazard rate and density function for right-censored survival data. D.R.M. Herrick et al. (2001) [6] proposed a non-linear wavelet thresholding method that exploits the non-stationary variance structure of the wavelet coefficients. Juan-Juan C. et al. (2011) [16] estimated the density function using non- … [9] estimated the derivatives of a density function by wavelet block thresholding for randomly right-censored data and studied the performance of various wavelet threshold estimators. Christophe Chesneau and T. Willer (2013) [3] estimated the cumulative function for nonparametric data and constructed a new adaptive estimator based on a warped wavelet basis and a hard thresholding rule. H. Wendt et al. (2014) [13] investigated the potential of a new multifractal formalism, constructed on wavelet p-leader coefficients, to help discriminate between survivor and non-survivor patients. Maryam Farhadian et al. (2014) [18] developed a new method for estimating the hazard function based on combining wavelet approximation coefficients and Cox regression. Mahmoud Afshari (2014) [17] used a wavelet method to estimate the density function for censored data and evaluated the mean integrated squared error.
Christophe Chesneau et al. (2015) [4] presented two types of wavelet estimators for the quantile density function: a linear wavelet estimator based on projections of father wavelet functions and a nonlinear wavelet estimator based on a hard thresholding rule. Fabienne Comte et al. (2015) [10] estimated the hazard function by wavelets, focusing on the case where measurement errors affect both the variable of interest and the censoring variable. Chesneau and H. Doosti (2016) [5] developed a new estimator $g(x, m)$, based on wavelet methods, of a multivariate discrete and continuous density function. G. A. Schnaidt Grez and B. Vidakovic (2017) [12] estimated the density function of randomly censored data using an empirical linear estimator based on an orthogonal wavelet projection together with the Kaplan-Meier estimator, and proposed the multiresolution space index $J = \log_2(n) - \log_2(\log(n))$.
The rest of this article is organized as follows. Section two presents some concepts about wavelets; section three addresses randomly right-censored data and the hazard function; section four presents the wavelet method for estimating the hazard function; and section five discusses a real-data application and a simulation application of the hazard function estimate.

Wavelet
Wavelets are mathematical functions that divide data into different frequency components and then study each component separately. They are particularly accurate in the analysis of functions and signals that contain discontinuities.
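To make the idea of splitting data into frequency components concrete, the sketch below applies one level of the Haar wavelet transform to a short signal. Haar is used only because it is the simplest wavelet; this is an assumption for illustration, since the article itself works with Daubechies wavelets. The function names and the sample signal are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Low-frequency part: scaled local averages of neighbouring pairs.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # High-frequency part: scaled local differences of neighbouring pairs.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# A flat signal with one abrupt jump inside the third pair (hypothetical values).
signal = [4.0, 4.0, 4.0, 4.0, 10.0, 2.0, 4.0, 4.0]
approx, detail = haar_step(signal)
reconstructed = haar_inverse(approx, detail)
```

The detail coefficients vanish wherever the signal is locally constant and are nonzero only at the pair containing the jump, which is why wavelets localize discontinuities well.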

Model Setup and Hazard Rate Function
The data model in this paper follows the assumptions:
Our strategy for estimating the hazard function proceeds in stages: first we estimate the probability density function, denoted $\hat{f}(x)$, and then we estimate the survival function, denoted $\hat{S}(x) = 1 - \hat{F}(x)$.
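The two-stage strategy rests on the standard identity that the hazard rate is the density divided by the survival function. As a hedged illustration, the sketch below evaluates that ratio in closed form for a Gamma distribution with integer shape and scale 1; shape 5 matches the lifetime distribution used in the simulation study. The function names are illustrative assumptions, and the closed-form survival function is valid only for integer shape.

```python
import math


def gamma_density(x, shape=5):
    """Density of Gamma(shape, scale=1) for integer shape."""
    return x ** (shape - 1) * math.exp(-x) / math.factorial(shape - 1)


def gamma_survival(x, shape=5):
    """Survival function 1 - F(x) of Gamma(shape, scale=1) for integer shape."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(shape))


def hazard(x, shape=5):
    """Hazard rate as the ratio density / survival."""
    return gamma_density(x, shape) / gamma_survival(x, shape)


# Hazard evaluated on a small grid of times (hypothetical grid).
grid = [0.0, 2.0, 5.0, 10.0]
values = [hazard(x) for x in grid]
```

For shape 5 the hazard starts at zero and increases toward 1, while for shape 1 (the exponential case) the same ratio is constant.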

Estimation of the Density Function $\hat{f}(x)$
To estimate $\hat{f}(x)$, we use the wavelet projection method referred to previously in (8). This is followed by the construction of a hybrid of the wavelet estimator and the Breslow estimate.
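Equation (8) and the Breslow hybrid are not reproduced here, so the following is only a generic sketch of a linear wavelet projection density estimator: empirical coefficients are computed against the father wavelets at a fixed level and the estimate is their weighted sum. It assumes the Haar father wavelet rather than a Daubechies wavelet, data rescaled to [0, 1), and no censoring by default. The optional `weights` argument is a hypothetical hook through which a censoring correction could enter. All names and sample values are illustrative.

```python
def haar_phi(j, k, x):
    """Haar father wavelet phi_{j,k}(x) = 2^(j/2) on [k / 2^j, (k + 1) / 2^j)."""
    return 2.0 ** (j / 2.0) if k <= (2 ** j) * x < k + 1 else 0.0


def linear_wavelet_density(data, level, weights=None):
    """Return f_hat(x) = sum_k alpha_k * phi_{level,k}(x) for data in [0, 1)."""
    n = len(data)
    if weights is None:
        weights = [1.0 / n] * n          # uncensored case: plain averages
    # Empirical coefficients alpha_k = sum_i w_i * phi_{level,k}(X_i).
    coeffs = [
        sum(w * haar_phi(level, k, x) for x, w in zip(data, weights))
        for k in range(2 ** level)
    ]

    def f_hat(x):
        return sum(a * haar_phi(level, k, x) for k, a in enumerate(coeffs))

    return f_hat


# Hypothetical sample already rescaled to [0, 1).
data = [0.05, 0.10, 0.30, 0.35, 0.40, 0.60, 0.80, 0.90]
f_hat = linear_wavelet_density(data, level=2)
estimates = [f_hat(x) for x in (0.1, 0.3, 0.6, 0.8)]
```

With the Haar father wavelet the estimate reduces to a histogram with bin width 2^(-level), which makes the role of the resolution level easy to see; smoother wavelets give smoother estimates at the cost of a less explicit formula.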

Estimation of the Survival Function $\hat{S}(x)$
One general formula for the survival function is $S(x) = 1 - F(x)$. It follows from equation (22) above that it suffices to find $\hat{F}(x)$. Based on the work of F. Comte [10], $\hat{F}(x)$ can be found as follows: It then follows directly that $\hat{S}(x) = 1 - \hat{F}(x)$. Finally, the estimate of the hazard rate function takes the form $\hat{f}(x) / \hat{S}(x) = \hat{f}(x) / (1 - \hat{F}(x))$.
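The article's own formula for the distribution function estimate is not available in this section. As a stand-in, and only as an assumption, the sketch below computes the Nelson-Aalen cumulative hazard from right-censored data and converts it into a Breslow-type survival estimate via the exponential of the negative cumulative hazard. Dividing a density estimate by this survival estimate then gives a plug-in hazard estimate. Function names and the toy data are hypothetical.

```python
import math


def nelson_aalen(times, events):
    """Return [(t, cumulative hazard at t)] for each distinct observed failure time."""
    order = sorted(zip(times, events))
    n = len(order)
    at_risk = n
    cum = 0.0
    steps = []
    i = 0
    while i < n:
        t = order[i][0]
        deaths = 0
        leaving = 0
        # Group all individuals sharing the same observed time.
        while i < n and order[i][0] == t:
            deaths += order[i][1]
            leaving += 1
            i += 1
        if deaths:
            cum += deaths / at_risk      # increment d_j / n_j
            steps.append((t, cum))
        at_risk -= leaving
    return steps


def breslow_survival(steps, x):
    """Survival estimate exp(-cumulative hazard) evaluated at time x."""
    cum = 0.0
    for t, value in steps:
        if t <= x:
            cum = value
        else:
            break
    return math.exp(-cum)


# Toy right-censored sample (hypothetical): event = 1 means failure observed.
times = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1, 0]
steps = nelson_aalen(times, events)
s_hat = breslow_survival(steps, 2.5)
```

The step function jumps only at observed failure times; censored individuals contribute by staying in the risk set until their censoring time.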

Data Application
Two applications of the proposed method are presented: the first is a simulation study and the second uses real data on liver metastases.

Simulation Study
Simulation data are generated using a Gamma distribution for the lifetimes $\{T_i\}_{i=1}^{n}$ with two parameters, a shape parameter equal to 5 and a scale parameter equal to 1. The independent censoring times $\{C_i\}_{i=1}^{n}$ are generated using an exponential distribution with one parameter equal to 6. The parameters of both distributions were chosen to obtain simulated data with 50% censoring. For data generation, n = 100 and n = 200 were selected. In Figures 1 and 2, the dashed curve represents the wavelet estimate of the hazard rate and density functions, while the solid curve represents the true hazard rate and density functions. In the proposed estimation method, the Daubechies wavelet was used with the wavelet level determined by $2^{\hat{J}}$, where $\hat{J} = \log_2(n / \log_{10}(n))$. To give more information, a global error measure is used, where R = 200 is the number of repetitions of the experiment, and the Daubechies wavelet filter (db50) was chosen.
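The simulation design can be sketched in a few lines of Python. The sketch assumes that the exponential parameter 6 is the mean of the censoring distribution (a rate of 6 would censor almost every observation); under that assumption the theoretical censoring probability is 1 - (6/7)^5, about 0.537, which is close to the stated 50% target. The level formula is evaluated without rounding, since the rounding rule is not stated. Function names, the seed, and the large sample size used for the check are illustrative choices.

```python
import math
import random


def wavelet_level(n):
    """Resolution level J_hat = log2(n / log10(n)), left unrounded."""
    return math.log2(n / math.log10(n))


def simulate(n, rng):
    """Draw n right-censored observations (x_i, delta_i)."""
    x, delta = [], []
    for _ in range(n):
        t = rng.gammavariate(5.0, 1.0)      # lifetime: shape 5, scale 1
        c = rng.expovariate(1.0 / 6.0)      # censoring: mean 6 (assumed reading)
        x.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return x, delta


rng = random.Random(2024)                    # arbitrary seed for reproducibility
x, delta = simulate(20000, rng)              # large n only to check the censoring rate
censoring_rate = 1.0 - sum(delta) / len(delta)
theoretical_rate = 1.0 - (6.0 / 7.0) ** 5    # P(C < T) under the assumptions above
levels = {n: wavelet_level(n) for n in (100, 200)}
```

For the sample sizes in the study, the unrounded levels are about 5.64 for n = 100 and about 6.44 for n = 200.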

Liver Metastases
The data consist of the survival times of 622 patients suffering from liver metastases from a colorectal primary tumor, collected by Haupt and Mansmann (1995). The survival times are recorded in months, with 259 censored observations (41.64%). Moreover, the data are available in the R package locfit. We estimated the hazard function of the data using the wavelet method with the wavelet level $\hat{J} = \log_2(n / \log_{10}(n))$. The results were then compared with those obtained from the Nelson-Aalen estimate, as shown in Figure 3, where the dashed curve represents the wavelet estimate and the solid curve the Nelson-Aalen estimate. Note that the hazard rate is lower for times of less than 20 months; however, it begins to grow gradually for times of more than 20 months. To add more information about the estimation method, the MSE was calculated and found to be 0.363187572.

Conclusion
This research presented a method for estimating the hazard function using linear wavelet estimation for randomly right-censored data. The strategy consists of two stages of estimation: the first estimates the probability density function and the second estimates the survival function. The method of estimation uses the projection property of the father wavelets $\{\varphi_{j,k}(x)\}$, $0 \le k \le 2^{j} - 1$, $j \ge 0$, on the subspace $V_J$, and depends on the correct selection of $J$. The simulation showed the strength of the estimation of the hazard and probability density functions, as measured by the global error rate. In addition, a real application to data on liver metastases from a colorectal primary tumor was presented.