Modeling lifetime of parallel components system with covariate, right and interval censored data

This research aims to model the lifetime of a parallel components system with covariates and right- and interval-censored data. The lifetimes of the components are assumed to follow the exponential distribution with constant failure rates. A simulation study is conducted to assess the performance of the maximum likelihood estimates, with and without the midpoint imputation method, at various sample sizes, censoring proportions, and numbers of components in the system. The combination that produces the best parameter estimates is then identified by comparing the bias, standard error and root mean square error of these estimates. The simulation results indicate that the midpoint imputation method produces more efficient and accurate parameter estimates, with smaller bias, standard error and root mean square error. In general, better estimates are obtained at low censoring levels, large sample sizes, and a high number of parallel components in the system. The proposed model is then fitted to a modified real data set of diabetic retinopathy patients. The non-parametric log-rank test and the Wald hypothesis test are then carried out to check the significance of the covariate, age, in the model. The results show that the model fits the data rather well and that the age of patients has no significant effect on the survival time of the patients' eyes.


Introduction
The exponential distribution plays a significant role in life testing. According to Epstein [7], the survival times of electron tubes and the time intervals between successive failures of electronic systems are random variables that are exponentially distributed.
A parallel component system is a system that functions as long as at least one of its m components functions. Its survival time is the maximum of the survival times of all components, as discussed by Marshall and Olkin [14]. Parallel component systems are used in engineering to increase the expected system lifetime, and thus system reliability, by increasing the number of parallel components in the system, as discussed by Xie and Lai [16]. Basu and Mawaziny [3] studied a system with m independent components and found that the parameter estimators perform well when the component lifetimes are identically distributed. Arasan and Daud [1] studied the log-linear exponential regression model of Glasser [9] incorporated in a parallel system model with two covariates and Type II censoring, and found that the parallel model with two covariates works well, especially when the data have a low censoring proportion, a high number of components in the system and a large sample size.
Maximum likelihood estimation is commonly used to estimate the parameters of a distribution, and it can be used with many types of censored data, as discussed by Lawless [13]. Baklizi [2] and Kundu and Gupta [12] found that the performance of maximum likelihood estimators is quite satisfactory even for small sample sizes.
Feigl and Zelen [8] proposed that concomitant variates are important because, within a treatment group, patients vary in disease status and hence respond differently to treatment. Glasser [9] proposed that covariates should be incorporated in the model to obtain a better model, and that the model can be extended by adding covariables between groups with large differences. Breslow [4] claimed that incorporation of covariance is important for prognosis and that the adjustment for covariance had a marked effect on the treatment comparison. Byar and Corle [5] proposed that initial covariate values allow two subsets of patients to be identified, so that different treatments can be applied to different patient groups. Huster [11] extended the Clayton-Oakes model [6,15] and proposed a parametric model for the analysis of paired censored survival data with the incorporation of covariates. Gorfine et al. [10] then claimed that the dependence parameter estimator in the Clayton-Oakes model [6,15] may be significantly biased if the measurement error in the covariate is not accounted for.
Zyoud et al. [17] studied multiple imputation methods and found that the midpoint, random, mean, and median imputation approaches performed better in estimating the survival function than the left and right imputation methods.
2. Methodology

2.1. Parallel exponential distribution with m components
The probability density function (PDF) of the parallel exponential distribution with m components and a single covariate is defined in terms of the shape parameter $\beta_0$, the coefficient $\beta_1$ of the covariate, the lifetime t, and the number of components m in the system. The survival function of the parallel exponential distribution is defined in terms of the same quantities.
In this research, we used two approaches to analyze interval-censored data: estimation without imputation and estimation with the midpoint imputation method. Suppose there is a random sample of size n, and let $t_i$ denote the failure time of the ith observation and $\delta_i$ its censoring indicator. Under the first approach, the general likelihood function of the model for uncensored, right-censored and interval-censored lifetime data is constructed without any imputation. Under the second approach, the general likelihood function of the parallel exponential model for the same three types of data is constructed after midpoint imputation of the interval-censored observations. The corresponding log-likelihood functions, for i = 1, 2, ..., n observations with right and interval censoring, are obtained by taking logarithms of these two likelihood functions.
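As a sketch of the quantities described in this section, suppose each of the m independent components has an exponential lifetime with constant failure rate $\lambda_i$. The system lifetime is the maximum of the component lifetimes, so its distribution function is the mth power of the component distribution function. Several pieces of notation below are assumptions made for illustration and are not taken from the text: the log-linear link $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$ for covariate value $x_i$ (including its sign convention), the indicators $\delta_{E_i}$, $\delta_{R_i}$, $\delta_{I_i}$ for exact, right-censored and interval-censored observations, the interval endpoints $t_{L_i}$ and $t_{U_i}$, and the midpoint $t_{M_i}$.

```latex
\begin{align*}
F(t_i) &= \left(1 - e^{-\lambda_i t_i}\right)^{m},
  \qquad \lambda_i = \exp(\beta_0 + \beta_1 x_i) \quad \text{(assumed link)} \\
f(t_i) &= m \lambda_i e^{-\lambda_i t_i} \left(1 - e^{-\lambda_i t_i}\right)^{m-1} \\
S(t_i) &= 1 - \left(1 - e^{-\lambda_i t_i}\right)^{m} \\[4pt]
% Without imputation: an interval-censored observation contributes
% the probability of failing inside its interval.
L_{\text{none}} &= \prod_{i=1}^{n} f(t_i)^{\delta_{E_i}} \, S(t_i)^{\delta_{R_i}}
  \left[ S(t_{L_i}) - S(t_{U_i}) \right]^{\delta_{I_i}} \\
% With midpoint imputation: the interval midpoint is treated as an exact failure time.
L_{\text{mid}} &= \prod_{i=1}^{n} f(t_i)^{\delta_{E_i}} \, S(t_i)^{\delta_{R_i}} \,
  f(t_{M_i})^{\delta_{I_i}},
  \qquad t_{M_i} = \tfrac{1}{2}\left(t_{L_i} + t_{U_i}\right) \\[4pt]
\ell_{\text{none}} &= \sum_{i=1}^{n} \Big\{ \delta_{E_i} \ln f(t_i) + \delta_{R_i} \ln S(t_i)
  + \delta_{I_i} \ln\!\left[ S(t_{L_i}) - S(t_{U_i}) \right] \Big\} \\
\ell_{\text{mid}} &= \sum_{i=1}^{n} \Big\{ \delta_{E_i} \ln f(t_i) + \delta_{R_i} \ln S(t_i)
  + \delta_{I_i} \ln f(t_{M_i}) \Big\}
\end{align*}
```

The two likelihoods differ only in the contribution of the interval-censored observations: the first keeps the whole interval, while the second replaces it with a single imputed failure time.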

Log rank test
The log-rank test is a general test of the difference in survival between two or more independent groups. The log-rank statistic is approximately chi-square distributed. It is computed from the observed and expected numbers of events in each group at each distinct event time.
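To make the computation concrete, the following is a minimal sketch of the two-group log-rank test in its common variance-based form, which may differ from the exact expression used in this study. The function name `logrank_two_groups` and its argument layout are hypothetical. Only exact and right-censored observations are handled, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-sample log-rank test (illustrative sketch, not the paper's code).

    times  : observed times
    events : 1 for an observed failure, 0 for a right-censored time
    groups : 0 or 1, the group label of each observation
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group 1
    variance = 0.0       # hypergeometric variance summed over event times
    for tj in event_times:
        at_risk = [(t, e, g) for t, e, g in zip(times, events, groups) if t >= tj]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for t, e, _ in at_risk if t == tj and e == 1)
        d1 = sum(1 for t, e, g in at_risk if t == tj and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0  # no information to separate the groups
    chi_square = obs_minus_exp ** 2 / variance
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

A small p-value indicates that the survival experience differs between the two groups; a large one gives no evidence of a difference.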

Wald Confidence Interval
Let $\hat{\theta}$ be the maximum likelihood estimator of $\theta$ and $l(\theta)$ be the log-likelihood function of $\theta$. The information matrix $I(\theta)$ can be approximated using the observed information matrix, $i(\hat{\theta})$. The estimate of $\mathrm{var}(\hat{\theta}_j)$ is then given by the $(j, j)$th element of $i^{-1}(\hat{\theta})$. If $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, then the $100(1-\alpha)\%$ confidence interval for $\theta_j$ is given as

$$\hat{\theta}_j \pm z_{1-\alpha/2} \sqrt{\left[i^{-1}(\hat{\theta})\right]_{jj}}.$$

3. Simulation study
A simulation study using N = 1000 replications, with sample sizes n = 50, 100, 150, 200, 250 and numbers of components m = 2, 3, 4, 5, 6, was conducted to compare the performance of the parameter estimates of the parallel exponential model. Several approximate censoring proportions, cp = 0.000, 0.125, 0.250, 0.325, 0.500 and 0.625, were chosen to represent different levels of censoring. Values of -1 and 0.5 were chosen for the parameters $\beta_0$ and $\beta_1$ to simulate failure times. Let $F(t_i)$ be the cumulative distribution function of the parallel exponential distribution and U a uniform random variable on (0, 1); the lifetime $t_i$ is generated by the inverse transformation method, that is, by solving $F(t_i) = U$ for $t_i$.
For an exact observation, $t_i$ is observed exactly and recorded with its censoring indicator. For an interval-censored observation, an interval is recorded in place of $t_i$ and the observation is assigned the censoring indicator $\delta_{I_i} = 1$. Parameter estimates of $\beta_0$ and $\beta_1$ were obtained by maximum likelihood estimation with and without the midpoint imputation method. The bias, standard error (SE) and root mean square error (RMSE) of the estimates were calculated to compare their performance at different sample sizes, numbers of components and censoring proportions.
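The generation step and the three performance measures can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the log-linear link `lam = exp(beta0 + beta1 * x)` is assumed, the function names are hypothetical, and SE and RMSE follow one common convention (sample standard deviation of the estimates, and root of the mean squared deviation from the true value).

```python
import math
import random


def simulate_lifetime(beta0, beta1, x, m, rng):
    """Draw one parallel-system lifetime by inverse transformation.

    ASSUMPTION: component failure rate lam = exp(beta0 + beta1 * x).
    Solving F(t) = (1 - exp(-lam * t))**m = u gives
    t = -ln(1 - u**(1/m)) / lam.
    """
    lam = math.exp(beta0 + beta1 * x)
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    # 1 - u**(1/m), computed with expm1 so it stays strictly positive
    one_minus_root = -math.expm1(math.log(u) / m)
    return -math.log(one_minus_root) / lam


def summarize(estimates, true_value):
    """Return (bias, SE, RMSE) of replicated estimates of one parameter."""
    n_rep = len(estimates)
    mean_est = sum(estimates) / n_rep
    bias = mean_est - true_value
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n_rep - 1))
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n_rep)
    return bias, se, rmse
```

In a full study, `simulate_lifetime` would be called n times per replication, the censoring scheme applied, the model fitted, and `summarize` applied to the N fitted values of each parameter.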
The best approach was then chosen based on the simulation results by comparing the bias, standard error and root mean square error of the estimates, and was further applied in the real data analysis. The midpoint imputation method gives smaller bias, standard error and root mean square error values than estimation without imputation, and thus yields more efficient parameter estimates. Hence, the midpoint imputation technique will be used to handle the interval-censored data in the real data analysis.

Real data analysis
In this study, modified real data on diabetic retinopathy were fitted to the parallel exponential model. The diabetic retinopathy data were obtained from the R survival package. These data concern the survival of the eyes of patients at high risk of diabetic retinopathy.
The original data consist of 197 observations on survival time in months, measured from the day the patient received laser treatment, together with a censoring indicator (0 = lost to follow-up, 1 = loss of vision) and age (0 = juvenile, 1 = adult). The data were modified by taking the subset of patients aged between 10 and 30 whose left eye was treated, which consists of 56 observations. The data were further modified to contain right-censored, interval-censored and exact observations. Patients with a censoring indicator of 0 were treated as right-censored. Thirty-five of the 56 observations were censored, a censoring proportion of approximately 62.5%. Twenty-one of these censored observations were randomly selected using the Bernoulli distribution and converted to interval-censored observations, so that the censored data consist of 60% interval-censored and 40% right-censored observations. Interval-censored observations were then assigned a censoring indicator of 2. The remaining observations, those with a censoring indicator of 1, were treated as exact observations.
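The relabelling step can be illustrated with a short sketch. It only shows how Bernoulli draws can mark right-censored observations as interval-censored. The function name `relabel_censored` is hypothetical, the status vector is a stand-in with the stated counts (35 censored, 21 exact) rather than the actual data, and the construction of the intervals themselves is not covered. With independent Bernoulli(0.6) draws the number selected is random, so reaching exactly 21 of 35 depends on the particular draw.

```python
import random


def relabel_censored(status, p_interval, rng):
    """Return a copy of `status` in which each right-censored observation (0)
    is relabelled as interval-censored (2) with probability p_interval.
    Exact observations (1) are left unchanged."""
    return [2 if s == 0 and rng.random() < p_interval else s for s in status]


# Stand-in indicator vector with the counts reported in the text.
status = [0] * 35 + [1] * 21
new_status = relabel_censored(status, 0.6, random.Random(1))
n_interval = new_status.count(2)   # random, about 60% of 35 on average
n_right = new_status.count(0)
```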
The non-parametric Kaplan-Meier survival curve for the diabetic retinopathy data was plotted. The parallel exponential model without a covariate was fitted to the data and its survival curve was plotted on the same graph. The parallel exponential fit is close to the Kaplan-Meier survival curve, hence we conclude that the parallel exponential model is appropriate for the diabetic retinopathy data. Survival and hazard plots by age group (juvenile and adult) were also plotted. Survival probabilities for both juveniles' and adults' eyes decrease over time, while hazard rates rise over time. However, the survival probability for adults' eyes is slightly higher than for juveniles' eyes, because the adult survival curve lies above the juvenile curve almost all of the time; correspondingly, the failure rate of juveniles' eyes is higher than that of adults' eyes, since the juvenile hazard curve lies above the adult curve almost all of the time. This might be because treatment for diabetic retinopathy is more effective for adult-onset diabetics than for juvenile-onset diabetics, as mentioned by Huster [11]. Since the preliminary analysis suggests that age affects survival time, we perform the non-parametric log-rank test to check whether age, as a covariate, has a significant effect on survival time.
The log-rank test gives a chi-square value of 0.2 and a p-value of 0.7, which is higher than $\alpha = 0.05$. Hence, we fail to reject the null hypothesis at $\alpha = 0.05$: the effect of age on the survival times of the patients' eyes is not significant. Table 3 shows the parameter estimates of the parallel exponential model. The Wald statistic for $\hat{\beta}_1$ is -0.570377, which falls in the non-rejection region at $\alpha = 0.05$ and 0.10. Therefore, this result indicates that $\beta_1$ is not significant in the model.
The Wald hypothesis test is further carried out to check the significance of age for survival time. 90% and 95% confidence intervals for $\beta_1$ were constructed using the Wald statistic. $\beta_1$ is the coefficient of the covariate, which is age in this study. Age has no effect on the survival time of patients' eyes if $\beta_1 = 0$. Zero is included in all confidence intervals in Table 4. Hence, the null hypothesis is not rejected at $\alpha = 0.10$ and 0.05, and we deduce that the age of patients has no significant effect on the survival time of the patients' eyes.
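The Wald decision rule and interval can be checked with a few lines of Python. This is a sketch using only the standard library. The function names are hypothetical, and the estimate and standard error passed to `wald_interval` in the usage lines are placeholder values, not the values of Table 3 or Table 4. Only the Wald statistic -0.570377 is taken from the text.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, alpha):
    """100(1 - alpha)% Wald confidence interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


def wald_rejects(z_stat, alpha):
    """Two-sided Wald test: reject H0 when |z| exceeds the critical value."""
    return abs(z_stat) > NormalDist().inv_cdf(1 - alpha / 2)


# Wald statistic reported in the text, tested at both significance levels.
reject_05 = wald_rejects(-0.570377, 0.05)
reject_10 = wald_rejects(-0.570377, 0.10)

# Placeholder estimate and standard error (hypothetical values for illustration).
lower, upper = wald_interval(-0.57, 1.0, 0.05)
```

The two views agree by construction: the interval contains zero exactly when the absolute Wald statistic does not exceed the critical value at the same level.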

Conclusion
The performance of the parameter estimates was compared at different numbers of components, sample sizes and censoring proportions using the values of bias, standard error, and root mean square error. In general, the results show that the parameter estimates for the parallel model with a covariate perform best with a low censoring proportion, a large sample size and a high number of parallel components in the system. On the other hand, a high censoring proportion, fewer parallel components in the system and a small sample size give higher values of root mean square error, which indicates that the estimates are less efficient. Moreover, in the simulation study, which included interval-censored data, the midpoint imputation method produced better parameter estimates, with lower bias, standard error, and root mean square error.
For the analysis of the modified real data, we found that the parallel exponential model fits the diabetic retinopathy data well. From the preliminary analysis, we found that the survival probability for adults' eyes is slightly higher than for juveniles' eyes. However, the non-parametric log-rank test and the Wald hypothesis test led us to conclude that age has no significant effect on the survival time of patients' eyes.
This study focused only on the parallel exponential model with a single covariate, age. Other models, such as the Weibull, lognormal, log-logistic and Gaussian, can also be used to analyze the performance of parameter estimates with right- and interval-censored data in future studies. This study can also be extended to include more covariates and a higher number of components in the model. Furthermore, time-varying covariates can be incorporated into the model, as some covariates may change over time. Finally, the parameter estimates and the bias, standard error and root mean square error calculations in this study were based purely on maximum likelihood estimation. Maximum likelihood estimation relies heavily on asymptotic properties, and hence it might fail to give accurate and efficient parameter estimates at small sample sizes. Therefore, alternative methods with fewer assumptions, such as the bootstrap and the jackknife, can be applied in future research.