Interval-Censored Regression with Non-Proportional Hazards with Applications

: Proportional hazards models and, in some situations, accelerated failure time models, are not suitable for analyzing data when the failure ratio between two individuals is not constant. We present a Weibull accelerated failure time model with covariables on the location and scale parameters. By considering the effects of covariables not only on the location parameter, but also on the scale, a regression should be able to adequately describe the difference between treatments. In addition, the deviance residuals adapted for data with the interval censored and the exact time of failure proved to be satisfactory to verify the ﬁt of the model. This information favors the Weibull regression as an alternative to the proportional hazards models without masking the effect of the explanatory variables.


Introduction
Data that represent the time to event are characterized by the presence of censorship. In other words, they are partial observations of the response variable caused by the non-occurrence of the event of interest during the period under study. Interval-censored data arise when the occurrence time of the event is only known to have occurred in an interval. Interval censoring is a type of censorship that occurs naturally in studies in which the sample unit is evaluated periodically [1]. Such data appear, for example, in clinical or longitudinal experiments in which patients can be followed up only through periodic examinations. In this case, we do not know the exact occurrence time, but only that it occurs within an interval. Recently, some works on interval-censored data have been reported. In Betensky et al. [2], the authors introduced proportional hazards regressions, Calle and Gómez [3] used a Bayesian framework for analyzing regressions in which one of the covariables is interval-censored, Komárek and Lesaffre [4] proposed an accelerated failure time model, and Rodrigues et al. [5] discussed an alternative to the Kaplan-Meier method.
In fact, what can be said is that most studies adopt the assumption of proportional risks, i.e., the survival curves do not intersect. For example, Cox's model considers this assumption. In this research, the risks may be non-proportional.
On the other hand, in a study of dairy cow herds developed by the Department of Veterinary Medicine of the Federal University of Lavras, MG, in Brazil, to evaluate the effect of animal supplementation on the ovulation of dairy cows (interval-censored survival data), the survival curve of the control supplementation was crossed with the survival curve of the new supplementation (see Figure 1a). Another example refers to breast cancer data, where the goal is to compare the effects of radiotherapy alone versus radiotherapy and adjuvant chemotherapy on women. In Figure 1b, we present the graph of the survival curve with the two effects.
In both cases, proportional hazards models are not suitable for analyzing ovulation times and compare radiotherapy and radiotherapy + chemotherapy treatments, and as an alternative, we can consider accelerated failure time models. However, in some situations, the accelerated failure time models may not be able to capture the difference between treatments, possibly due to interference from the crossing of survival curves, as shown in Hashimoto et al. [1]. A solution to solve the effect of crossed survival curves (non-proportional hazards assumption) is to consider the scale parameter as a systematic component of the model.  In this context, we include the effects of covariables in the location and scale parameters in the Weibull accelerated model with two systematic components to analyze interval-censored data with non-proportional hazards. We use standard likelihood theory for inferential purposes. Another objective is to propose an extension of the deviance residuals [1] for interval-censored survival data without considering the proportional hazards assumption.
The paper is organized as follows. In Section 2, we propose the log-Weibull regression with two systematic components for the location and scale parameters for interval-censored data. The estimation of the parameters by maximum likelihood is addressed in Section 3. We provide a simulation study to check the accuracy of the estimates and residuals in Section 4. In Section 5, we empirically prove the utility of the new regression. We offer some concluding remarks in Section 6.

The Proposed Model
The Weibull and log-Weibull distributions are frequently adopted for analyzing censored data [10][11][12][13] and phenomena with monotone failure rates [14]. Let T ∼W(α, λ) be the Weibull random variable, where α > 0 is the shape parameter and λ > 0 is the scale parameter. The probability density function (pdf) of the log-Weibull defined by Y = log(T) under the re-parametrization σ = α −1 > 0 (scale) and µ = log(λ) ∈ R (location), say Y ∼ LW(µ, σ), has the form The log-exponential distribution comes with σ = 1. The survival function corresponding to (1) is Clearly, the random variable Z = (Y − µ)/σ has density The accelerated failure time parametric model, also known as the location-scale regression [14], has been used for studying the effects of covariables on the response variable. Let Y 1 , · · · , Y n be independent random variables such that Y i ∼ LW(µ i , σ i ). In addition, we consider the vectors of explanatory variables x i = (1, x i1 , · · · , x ip ) and w i = (1, w i1 , · · · , w iq ) . The linear location-scale regression for interval-censored data is defined as by (for i = 1, . . . , n) where µ i = x i β 1 and log(σ i ) = w i β 2 , the random error Z i has density function (3), and β 1 = (β 10 , · · · , β 1p ) and β 2 = (β 20 , · · · , β 2q ) are functionally independent. Thus, we associate a systematic component with the parameter σ to allow us to model data with non-proportional risks and the presence of heteroscedasticity. Equation (4) can be useful for presenting such data.
For interval-censored data, the logarithms of the failure times Y i (for i = 1, . . . , n) are not observed exactly, only that the event occurred at some moment in a interval time of the type (log(U i ), log(V i )], where log(U i ) < Y i < log(V i ). An exact failure time is observed if log(U i ) = log(V i ). In practice, the values of log(U i ) and log(V i ) refer to evaluation times, and hence for some individuals the event can occur after the last visit to the evaluator, thus characterizing observations subject to right censoring, where the failure time Y i occurs in the interval [log(U i ), ∞). A summary of the structure of the data is given below:

Estimation
We are only interested in models with an exact failure time and interval censoring. Let (log(u 1 ), log(v 1 ), x 1 , w 1 ), · · · (log(u n ), log(v n ), x n , w n ) be a set of interval-censored observations. The log-likelihood function of the log-Weibull regression for an exact failure time and interval-censored data has the form where θ = (β 1 , β 2 ) is the vector of parameters, and E, R, and I denote sets of elements with exact failure time, right censoring and interval censoring, respectively. The maximum likelihood estimate (MLE) θ of θ can be found by maximizing (5) in numerical platforms such as Ox 8.00 (MaxBFGS function) or R Version 3.6.3.

Modified Deviance Residuals
Various types of residuals have been proposed in the regression literature [14]. For some applications of censored data, see, for example, the log-Birnbaum-Saunders regression [15] and the generalized log-gamma regression [16]. Furthermore, [1] presented martingale residuals for interval-censored data. Following these ideas, we extend these residuals with no proportional risk assumption. The . . , n} is taken from Section 3. The martingale residuals [1,17,18] for interval-censored data have the form whereŜ(·) is the estimated survival function. So, replacingŜ(·) in Equation (2), we obtain the martingale residuals for the log-Weibull regression for interval-censored data and non-proportional hazards: where However, the martingale residuals are asymmetrical and have values less than or equal to one. In contrast, [1] used an adaptation of the deviance residuals [17] for intervalcensored data. Based on a combination of these residuals, we define the modified deviance residuals by where r M i are the martingale residuals given by (6).

Deviance residual
A first simulation study with 1000 replications analyzed the empirical distribution of the residuals (7) by combining n = 30, 50, 100 and 300 with the interval censoring percentages 100%, 90%, and 70%. For each combination, the generated data follow the proposal by [1]: (a) The coefficients are fixed at β 10 = 3.00, β 11 = 0.72, β 20 = −0.71 and β 21 = 0.60; (b) The explanatory variable x 1 is generated from the binomial distribution with a success probability equal to 0.5 and 1, and µ i = β 10 + β 11 x 1 and σ i = exp(β 20 For each fit, we calculate the residuals r D i and construct the QQ plots displayed in Figure 2. Table 1 reports the averages of the MLEs of the parameters and mean square errors (MSEs) of the Weibull regression, respectively. The numbers in Table 1 indicate that the MSEs of the estimates decay toward zero when n increases. However, for the estimate of σ, although its MSE is decreasing, the estimated bias does not change, which may be influenced by the presence of interval censoring. We can note in Figure 2, regardless of n, that there is a greater displacement of the residuals when the percentage of interval censorship decreases. This is because the sample is being contaminated by interval censoring.

2.
Estimating the survival function We construct a second simulation study with 200 replicates to estimate the survival functions with covariables on the parameters µ and σ simultaneously and on the parameter µ.
For the simulation scenarios, we set the sample size n = 100 combined with the same censoring percentages of the previous study. The data are also generated in the same way given in the item a. However, using the same generated data, the estimation is conducted as follows: (a) Effects of the covariables on the parameters µ and σ simultaneously: i Initial values β 10 = 3.00, β 11 = 0.72, β 20 = −0.71 and β 21 = 0.60; ii Estimate the survival function bŷ where x 1 is generated from a binomial distribution with a success probability 0.5 and 1.
(b) Effects of the covariables on the parameter µ: i Initial values β 10 = 3.00, β 11 = 0.72, and σ = 0.6129, where the initial value for σ is the estimate given in Table 2; ii Estimate the survival function bŷ where x 1 is generated from a binomial distribution with a success probability equal to 0.5 and 1.   Figure 3 shows that the systematic components for both parameters µ and σ can model the non-proportional hazards well. Figure 4 clearly reveals that we cannot model the non-proportional hazards by taking only the systematic component for µ.

Applications
We present two applications in different areas. The first application refers to experiments in veterinary science and the second in medicine.

Regression for the Supplementation Animal Data
The dataset refers to a study of dairy herds developed by the Department of Veterinary Medicine of the Federal University of Lavras. The objective was to verify if the supplementation offered to the herd was influencing the ovulation time of the animals. The experiment was carried out with fifty dairy cows divided into two groups: the control group corresponds to the animals without supplementation and the treatment group corresponds to the animals treated with supplementation to induce ovulation.
The response variable is the time (in days) after delivery until the first ovulation, but only for some animals was it possible to know the exact time of the first ovulation. For the other animals, the only information is the exact time that failure occurred in a time interval, which characterizes the presence of interval censoring. The following variables are considered (for i = 1, . . . , 50): • y i : Logarithm time after delivery until the first ovulation; • x i1 : Treatment (0 = control, 1 = supplementation); • x i2 : Ovary (0 = right, 1 = left); • x i3 : Number of pups (0 = two pups, 1 = two more pups).
The dataset is formed by dichotomous covariables. The estimated survival curves in Figure 5 created using the Turnbull method examine the behavior of the covariables in relation to the logs of ovulation times. Figure 5 indicates that the assumption of proportional risks is not satisfied for the current dataset. So, we adopt the regression in Equation (4) to explain these data.  Consider the log-Weibull regression for the current data, where z i has the density function (3) and The MLEs from the two regressions are reported in Table 3. The p-value of 0.003 for the estimate of β 21 indicates a significant difference between the levels of the treatment to explain the variability of the logs of the failure times.  Figure 6a displays the random residuals within the interval (−3, 3). The QQ plot with a generated envelope is reported in Figure 6b to verify the response distribution. Both plots support that the two components of the fitted regression are necessary to explain these data. Plots of the empirical and estimated survival functions and the estimated hazard rates are given in Figures 7 and 8, respectively. We note an increasing curve for the ovulation time data and the non-proportionality of the hazards, which supports the new regression with two components. We can also note that this change in 33 days is captured by the systematic part of the parameter σ.

Regression for Breast Cancer Data
We investigate the log-Weibull regression in the presence of interval-censored data [19] when proportional risks are not satisfied. These data are taken from a retrospective study reported by [20,21] to compare the cosmetic effects of radiotherapy alone versus radiotherapy and adjuvant chemotherapy on women with early breast cancer.
In this study, we consider the following variables (for i = 1, . . . , 94): • y i1 : Logarithm of time (in months) to first appearance of moderate or severe breast retraction; • x i1 : Type of treatment (0 = radiotherapy and chemotherapy, 1 = radiotherapy).
The dataset is composed of a dichotomous covariable. Figure 9 displays the estimated survival curves obtained using the Turnbull method to verify the behavior of this covariable in relation to the log retraction time. The proposed regression for the breast cancer data takes the form where z i has the density function (3) and the parameters are µ i = β 10 + β 11 x i1 and σ i = exp(β 20 + β 21 x i1 ). Table 2 provides some results from the fitted regressions. For a 5% significance level, the retraction time has significantly different effects from radiotherapy and radiotherapy plus chemotherapy, considering both location and dispersion parameters. Figure 10a gives the plots of the modified deviance residuals against the index. Figure 10b provides the QQ plot and generated envelope. These plots support the wider log-Weibull regression for the current data. In Figure 11, we present the plots for the empirical and estimated survival functions for the log-Weibull regression. Figure 11a considers only the systematic component for µ, whereas Figure 11b considers the systematic components for µ and σ.
The estimated hrf displayed in Figure 12 indicates increasing shapes for the ovulation time. Figure 12a refers to just one component for µ, whereas Figure 12b refers to two components for µ and σ. These plots support the non-proportionality of the hrf and a regression with two components for a better fit to these data.
Interpretation for µ • There is a significant difference between the levels of radiotherapy and chemotherapy and radiotherapy in terms of the covariable treatment in relation to the log time of the first appearance of moderate or severe breast retraction.

Interpretations of σ
• There is a significant difference between the levels of radiotherapy and chemotherapy and radiotherapy in terms of the covariable treatment in relation to the variability of the logarithm of the time of the first appearance of moderate or severe breast retraction. • We note from Figure 11b that before exp(2.5) = 12 months (approximately), the radiotherapy and chemotherapy level of treatment has a longer survival time than the radiotherapy level, but this difference is not significant. • After 12 months, we note the opposite, i.e., the survival time of the radiotherapy level is longer than that of the radiotherapy and chemotherapy level in relation to the time of the first appearance of moderate or severe breast retraction.
• So, we note that 12 months of applying radiotherapy and chemotherapy to the patient makes them less immune. • We can also note that this change after 12 months is captured by the systematic part of σ.

Conclusions
We define and study a new log-Weibull regression for interval-censored data with two systematic components for the location and scale parameters, whose risks are not proportional. The parameters are estimated by maximum likelihood and some Monte Carlo simulations are used to investigate the accuracy of the estimates and the normal approximation for the deviance residuals. We show significant differences between two treatments for the supplementation of dairy cows. We emphasize the utility of the log-Weibull regression in two applications to real data. The datasets can be obtained by contacting the main author. Several future works can be considered, based on the assumption of non-proportional hazards; for example, the research by Hashimoto et al. [1] referring to the regression model based on the log-exponentiated Weibull distribution for interval-censored data can be generalized considering the assumption of non-proportional hazards. Analogously, we can extend the research presented by Hashimoto et al. [22] and, in this case, the extension will be related to regression models with a cure fraction for interval-censored data and non-proportional hazards. The study presented by Yang et al. [23] can be extended to non-proportional hazards models for interval-censored data considering two systematic components. Other future works may include, for example, the use of regression models with random effects for interval-censored data with non-proportional hazards in the form of group structures or correlated data, and, finally, the use of regression models for interval-censored data under the assumption of hazards not being proportional to informative censorship.
Author Contributions: All the authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.