Time-varying effect in older patients with early-stage breast cancer: a model considering the competing risks based on a time scale

Background Patients with early-stage breast cancer may have a higher risk of dying from other diseases, which makes a competing risks model more appropriate. The commonly used subdistribution hazard ratio is limited by its model assumptions and is difficult to interpret clinically. We therefore aimed to quantify the effects of prognostic factors with a more intuitive absolute indicator, the difference in restricted mean time lost (RMTL). Additionally, prognostic factors of breast cancer may have dynamic (time-varying) effects during long-term follow-up. However, existing competing risks regression models provide only a static view of covariate effects, which can distort the assessment of prognostic factors. Methods To address this issue, we proposed a dynamic effect RMTL regression. It explores the between-group cumulative difference in mean life lost over a period of time, obtains the real-time effect from the speed of accumulation, and provides personalized predictions on a time scale. Results A simulation validated the accuracy of the coefficient estimates of the proposed regression. When the model was applied to a cohort of older patients with early-stage breast cancer, we found that 1) the protective effects of positive estrogen receptor status and chemotherapy decreased over time; 2) the protective effect of breast-conserving surgery increased over time; and 3) the deleterious effects of stage T2, stage N2, and histologic grade II cancer increased over time. Moreover, in terms of prediction, the mean C-index in external validation reached 0.78. Conclusion Dynamic effect RMTL regression can analyze both the dynamic cumulative effects and the real-time effects of covariates. It provides a more comprehensive prognostic assessment and better prediction when competing risks exist.


Background
Older patients with early-stage breast cancer (particularly those with advanced comorbidities) tend to die from other diseases; that is, the number of deaths from causes other than breast cancer is large (1). Patients may experience a variety of outcomes: death from breast cancer, death from heart disease, and so on. Suppose the outcome of interest is death from breast cancer (the event of interest). We then want to observe the time from the start of follow-up to the occurrence of the event of interest. However, this time cannot be observed for a patient who dies from heart disease (a competing event). In this case, traditional single-endpoint survival analysis, such as the Cox proportional hazards model, considers only the event of interest and simply treats patients who die of heart disease as censored. This violates the non-informative censoring assumption. That assumption requires the risk of dying from breast cancer among women who have already died of heart disease to be the same as that among women who remain in follow-up. Yet patients who have already died of heart disease can no longer die of breast cancer. It is therefore improper to treat patients who experience competing events as censored: doing so overestimates the cumulative incidence of the event of interest and introduces bias (2-4). When multiple outcomes compete with each other, competing risks models should be used.
In traditional multivariate analysis of competing risks, cause-specific Cox regression and Fine-Gray regression are often used. The corresponding effect sizes are the cause-specific hazard ratio (cHR) and the subdistribution hazard ratio (sHR). However, both the cHR and sHR are relative indicators, defined as ratios of hazard functions. Clinicians find it difficult to interpret them as intuitive clinical benefits and to communicate them to patients (5, 6). For example, an sHR of 0.43 for estrogen receptor (ER) status means that the subdistribution hazard of death from breast cancer in the ER-positive group is 0.43 times that in the ER-negative group. Because the baseline hazard is generally unknown, the absolute risk of death in the two groups cannot be determined from the sHR alone. In addition, both cause-specific Cox regression and Fine-Gray regression require the proportional hazards assumption to hold. From a clinical perspective, clinicians and patients are more interested in direct (absolute) effect sizes on a time scale, e.g., How long will I live? How long will surgery extend my life expectancy? As Blagoev (6) points out, "While a hazard ratio has some value, for the clinician caring for a patient and, more importantly, the patient, it does not convey benefit in terms that are meaningful-how much longer will the patient live or live without experiencing disease progression." Therefore, the restricted mean time lost (RMTL) has been proposed as an alternative measure to the hazard ratio (7-12). The RMTL is the area under the cause-specific cumulative incidence function (CIF) over a period of time (from 0 to a restricted time point $\tau$). It can be interpreted as the life expectancy lost due to a specific cause during this period. Compared with the hazard ratio, the RMTL is more intuitive. It gives the life lost in each group due to death from breast cancer over a period of time, and the effect of a factor is measured by the difference in RMTL between groups. For example, Figure 1 shows the areas under the CIF for patients in the ER-positive and ER-negative groups. Over 10.5 years ($\tau = 10.5$), the mean life lost due to death from breast cancer was 0.9 years ($S_0$) for patients in the ER-positive group and 2.4 years ($S_0 + S_1$) for patients in the ER-negative group. ER-negative patients therefore lost an additional 1.5 (2.4 - 0.9) years of life on average. Moreover, the RMTL does not require the proportional hazards assumption.
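The definition of the RMTL as an area under the CIF can be illustrated with a minimal sketch. It estimates the CIF of the event of interest with the Aalen-Johansen estimator and integrates the resulting step function from 0 to $\tau$. The function names and toy data are hypothetical. This nonparametric calculation only illustrates the quantity; it is not the regression proposed in this paper.

```python
def cif_event1(times, events):
    """Aalen-Johansen estimate of the CIF of the event of interest.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    Returns [(t, F1(t)), ...] at each distinct event time (right-continuous steps).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0          # all-cause Kaplan-Meier survival just before t
    f1 = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_t = d1 = d_all = 0
        while i < len(data) and data[i][0] == t:
            n_t += 1
            d1 += int(data[i][1] == 1)
            d_all += int(data[i][1] != 0)
            i += 1
        if d_all:
            f1 += surv * d1 / n_at_risk          # jump of F1 at t
            surv *= 1 - d_all / n_at_risk        # update all-cause survival
            steps.append((t, f1))
        n_at_risk -= n_t
    return steps


def rmtl(times, events, tau):
    """Restricted mean time lost to the event of interest: area under F1 on [0, tau]."""
    area, prev_t, prev_f = 0.0, 0.0, 0.0
    for t, f in cif_event1(times, events):
        if t >= tau:
            break
        area += prev_f * (t - prev_t)
        prev_t, prev_f = t, f
    return area + prev_f * (tau - prev_t)
```

For the uncensored toy data, `rmtl([1, 2, 3], [1, 2, 1], tau=4.0)` gives 4/3 (about 1.33). This equals the average of the per-patient life lost, 4 - 1, 0, and 4 - 3. A between-group difference such as the 1.5 years above is the difference of two such values.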
Existing multivariate analyses of the RMTL use two methods: regression based on the pseudo-value method (11) and regression based on inverse probability of censoring weighting (IPCW) (12). These regressions concern only the cumulative effects of prognostic factors during a $\tau$-year follow-up, which are constant values (static or time-fixed effects). However, the real-time effects of many covariates (i.e., covariates with time-varying effects) change over time; for example, the real-time effect of chemotherapy tends to decrease with increasing follow-up time. The effects of age, histological grade, and ER status on the survival of patients with breast cancer have been documented to change over time (13, 14). Fitting only the static effects of prognostic factors may therefore give an incomplete picture.
To accommodate covariates with time-varying effects in the competing risks setting, we proposed a dynamic effect RMTL regression model. Monte Carlo simulation was used to assess the accuracy of the coefficient estimates of the model. We then applied this model to older patients with early-stage breast cancer in the Surveillance, Epidemiology, and End Results (SEER) database. The aims were to explore the dynamic cumulative effects and real-time effects of prognostic factors and to establish a prediction model for the mean life lost due to death from breast cancer over a period of time. We hope this work will help doctors better determine patients' prognosis, select better therapeutic regimens, and improve patients' survival time.

Abbreviations: cHR, cause-specific hazard ratio; sHR, subdistribution hazard ratio; RMTL, restricted mean time lost; CIF, cause-specific cumulative incidence function; IPCW, inverse probability of censoring weighting; SEER, Surveillance, Epidemiology, and End Results; PR, progesterone receptor status; ER, estrogen receptor status; BCS, breast-conserving surgery; ALND, axillary lymph node dissection; SLNB, sentinel lymph node biopsy; 95%CI, 95% confidence interval.

FIGURE 1
Cumulative incidence curves for death from breast cancer in the ER-positive group and ER-negative group. $S_0$ and $S_1$ correspond to the blue and red areas, respectively.

Model construction
Let $T$ be the time to event and $C$ the censoring time, so that the observed time is $U = \min(T, C)$. An event indicator $e$ equals 1, 2, or 0 when the observed outcome is the event of interest, a competing event, or censoring, respectively. Let $Z^* = (1, Z)$ denote the $n \times (p + 1)$ covariate matrix including an intercept term. Thus, for patient $i$ ($i = 1, \dots, n$), the observed data are $\{U_i, e_i, Z_i^*\}$. Let a continuous variable $l$ ($0 \le l \le \tau$; $\tau \le t_{\max}$) be the pre-specified end time of follow-up. Here $\tau$ is the pre-specified maximum follow-up time and $t_{\max}$ is the natural maximum follow-up time of the data. $J$ time points are selected from 0 to $\tau$ in ascending order and denoted $(l_1, l_2, \dots, l_J)$.
For the purposes of the method, we move the end of follow-up forward from $t_{\max}$ to $l_j$. Correspondingly, each patient's survival outcome changes with the pre-specified end time of follow-up. When a patient experiences the event of interest or the competing event before $l_j$, $e(l_j)$ equals 1 or 2, respectively; otherwise, $e(l_j) = 0$. Thus, $e(l_j)$ denotes the survival outcome after truncation at $l_j$. Let $T(l_j) = \min(T, l_j)$ and $U(l_j) = \min(U, l_j)$ be the event time and observed time after truncation, respectively. For patient $i$, the observed data consist of $\{U_i(l_j), e_i(l_j), Z_i^*\}$. A regression model is developed to assess the dynamic effects of covariates on the RMTL:

$$g\{\mu_1(l_j \mid Z_i^*)\} = Z_i^* \beta(l_j),$$

where the link function $g(\cdot)$ is the identity function and $\mu_1(l_j \mid Z_i^*)$ is the life expectancy lost due to the event of interest for patient $i$ during the $l_j$-year follow-up. We write each regression coefficient as a quadratic function of the end time, $\beta_k(l) = (\beta_{k0}, \beta_{k1}, \beta_{k2})(1, l, l^2)^T$ for $k = 0, \dots, p$. The model can then be rewritten as

$$\mu_1(l_j \mid Z_i^*) = \sum_{k=0}^{p} Z_{ik}^* \left(\beta_{k0} + \beta_{k1} l_j + \beta_{k2} l_j^2\right).$$

Regression coefficients are estimated by solving an estimating equation $F(\beta) = 0$ that contrasts each patient's observed life lost with $\mu_1(l_j \mid Z_i^*)$. For an individual with the event of interest, the model takes the life lost to be $(l_j - T_i(l_j)) \times I(e_i(l_j) = 1) = l_j - T_i(l_j)$. For an individual with the competing event, it is $(l_j - T_i(l_j)) \times I(e_i(l_j) = 1) = 0$. Censored observations have $I(e_i(l_j) \neq 0) = 0$, which means they do not contribute to the estimating equation. Consequently, $E\{F(\beta)\} \neq 0$ in the presence of censoring. However, when IPCW is applied to the estimating equation, its expectation is 0 (15). The estimating equation is therefore replaced by its IPCW version. In this version, each contributing observation in the $j$-th dataset is weighted by the inverse of $\hat{G}(\cdot\,; l_j)$ at its observed time. The fitted data are obtained by stacking $J$ datasets, where the $j$-th dataset is the risk set with $l_j$-year follow-up (Figure 2). $\hat{G}(t; l_j)$ is the Kaplan-Meier estimator of the survival function of the censoring time (the probability of remaining uncensored) in the $j$-th dataset.
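The stacking and weighting steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. A patient still under follow-up at $l_j$ is treated as fully observed with zero life lost, which is the usual IPCW convention for RMTL regression and is assumed here. The weight is $1/\hat{G}$ evaluated just before $\min(U_i, l_j)$. The coefficients are obtained by weighted least squares, which solves the IPCW estimating equation under an identity link and an independence working correlation. Each covariate is multiplied by the time basis $(1, l, l^2)$, so the returned vector is ordered $(\beta_{00}, \beta_{01}, \beta_{02}, \beta_{10}, \dots)$. Standard errors are omitted.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    Returns a function t -> G(t-), the probability of remaining uncensored just before t.
    """
    data = sorted(zip(times, events))
    jumps = []                      # (censoring time, KM factor)
    n_at_risk = len(data)
    i = 0
    while i < len(data):
        t = data[i][0]
        n_t = c_t = 0
        while i < len(data) and data[i][0] == t:
            n_t += 1
            c_t += int(data[i][1] == 0)
            i += 1
        if c_t:
            jumps.append((t, 1 - c_t / n_at_risk))
        n_at_risk -= n_t

    def g_minus(t):
        g = 1.0
        for s, factor in jumps:
            if s >= t:
                break
            g *= factor
        return g

    return g_minus


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * p
    for i in reversed(range(p)):
        x[i] = (b[i] - sum(a[i][k] * x[k] for k in range(i + 1, p))) / a[i][i]
    return x


def fit_dynamic_rmtl(times, events, covariates, grid, degree=2):
    """Stack one dataset per end time l_j and solve the IPCW estimating equation
    sum_ij w_ij x_ij (y_ij - x_ij' theta) = 0, i.e. weighted least squares."""
    g_minus = censoring_survival(times, events)
    rows, ys, ws = [], [], []
    for l in grid:
        basis = [l ** d for d in range(degree + 1)]          # (1, l, l^2, ...)
        for u, e, z in zip(times, events, covariates):
            if e == 0 and u < l:
                continue                                     # censored before l: weight 0
            u_l = min(u, l)
            y = l - u if (e == 1 and u <= l) else 0.0        # observed life lost
            ws.append(1.0 / g_minus(u_l))                    # IPCW weight
            rows.append([zk * b for zk in [1.0, *z] for b in basis])
            ys.append(y)
    p = len(rows[0])
    a = [[sum(w * r[i] * r[k] for r, w in zip(rows, ws)) for k in range(p)] for i in range(p)]
    rhs = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(p)]
    return solve_linear(a, rhs)
```

With a single end time and `degree=0`, the fit reduces to a static, IPCW-weighted RMTL regression. With a single end time and no covariates, it reproduces the nonparametric area under the CIF.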
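We treat the inverse probability of censoring weights as fixed values rather than random variables (16). The variance of the regression coefficients therefore does not account for the variation introduced by estimating the weights. Under this simplification, $\sqrt{n}(\hat{\beta} - \beta)$ is asymptotically normal with mean zero and a sandwich-type covariance matrix, where $a^{\otimes 2} = aa^T$. Each patient contributes one record to each of the $J$ stacked datasets, so records from the same patient are correlated. We therefore estimate the regression coefficients with a generalized estimating equation, which corrects for this within-patient correlation.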
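For orientation, the usual robust (sandwich) covariance for a weighted estimating equation with fixed weights and records clustered by patient is written out below. It is a sketch of the standard form under those assumptions, not an expression taken from the paper. Here $X_{ij}$ is patient $i$'s covariate row in the $j$-th stacked dataset expanded by the time basis, $Y_{ij} = (l_j - T_i(l_j))\,I(e_i(l_j) = 1)$ is the observed life lost, $\hat{w}_{ij}$ is the IPCW weight (zero for observations censored before $l_j$), and $\hat{\theta}$ is the stacked coefficient vector $(\beta_{k0}, \beta_{k1}, \beta_{k2})_{k=0}^{p}$.

```latex
\hat{V}(\hat{\theta}) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}, \qquad
\hat{A} = \sum_{i=1}^{n}\sum_{j=1}^{J}\hat{w}_{ij}\,X_{ij}^{\otimes 2}, \qquad
\hat{B} = \sum_{i=1}^{n}\Bigl\{\sum_{j=1}^{J}\hat{w}_{ij}\,X_{ij}\bigl(Y_{ij} - X_{ij}^{T}\hat{\theta}\bigr)\Bigr\}^{\otimes 2}
```

Summing the scores within each patient before taking the outer product is what accounts for the correlation among the $J$ stacked records of the same patient.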

Simulation designs
Next, we assessed the performance of the dynamic effect RMTL regression estimator by simulation. The evaluation indicators were the mean bias, mean relative bias, root mean squared error, relative standard error, and empirical coverage rate.
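The five evaluation indicators can be computed from Monte Carlo replicates as sketched below. The sketch assumes a Wald interval with z = 1.96 for the coverage rate. It uses the definitions in the table notes: relative bias = mean bias / true value, and relative SE = mean estimated SE / empirical SD of the estimates. The function name is hypothetical.

```python
import math
import statistics


def simulation_metrics(estimates, std_errors, true_value, z=1.96):
    """Evaluation indicators for one coefficient across Monte Carlo replicates."""
    bias = statistics.fmean(estimates) - true_value
    rmse = math.sqrt(statistics.fmean((b - true_value) ** 2 for b in estimates))
    empirical_sd = statistics.stdev(estimates)          # Monte Carlo empirical error
    coverage = statistics.fmean(                        # share of Wald CIs covering truth
        abs(b - true_value) <= z * se for b, se in zip(estimates, std_errors)
    )
    return {
        "bias": bias,
        "rel_bias": bias / true_value,
        "rmse": rmse,
        "rel_se": statistics.fmean(std_errors) / empirical_sd,
        "coverage": coverage,
    }
```

A relative SE near 1 means that the model-based standard errors match the spread of the estimates across replicates. Together with coverage near 95%, this is the pattern reported for the simulation.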

Data generation
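First, we generated two independent binary covariates $Z = (Z_1, Z_2)$ from independent Bernoulli distributions. We let the subdistribution hazard function for the event of interest follow a Gompertz distribution, $\lambda_1(t \mid Z) = \gamma_Z \exp(\rho_Z t)$, where $\gamma_Z$ and $\rho_Z$ were set according to the four strata of $(Z_1, Z_2)$. We took $\gamma_Z > 0$ and $\rho_Z < 0$, so that $F_1(\infty \mid Z) = 1 - \exp(\gamma_Z / \rho_Z) < 1$. We defined the CIF for the event of interest as

$$F_1(t \mid Z) = P(T \le t, e = 1 \mid Z) = 1 - \exp\left\{\frac{\gamma_Z}{\rho_Z}\left(1 - e^{\rho_Z t}\right)\right\}$$

and the CIF for the competing event as $F_2(t \mid Z) = P(T \le t, e = 2 \mid Z) = \exp(\gamma_Z / \rho_Z)\{1 - \exp(-t)\}$. The event type was generated from a Bernoulli distribution with $P(e = 1 \mid Z) = 1 - \exp(\gamma_Z / \rho_Z)$. Thus, the conditional CIFs for the event of interest and the competing event were

$$P(T \le t \mid e = 1, Z) = \frac{1 - \exp\left\{\frac{\gamma_Z}{\rho_Z}\left(1 - e^{\rho_Z t}\right)\right\}}{1 - \exp(\gamma_Z / \rho_Z)}$$

and $P(T \le t \mid e = 2, Z) = 1 - \exp(-t)$, respectively. Next, we used the inverse transform method to generate the event time $T$. Finally, we generated right censoring and determined the final observed survival outcome.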
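The data-generating steps above can be sketched with the Python standard library. The section does not report the simulation settings, so the $(\gamma_Z, \rho_Z)$ values, the Bernoulli(0.5) covariates, and the Uniform(0, censor_max) censoring are hypothetical choices. The inverse of the conditional CIF comes from solving $P(T \le t \mid e = 1, Z) = u$ for $t$. This gives $t = \rho_Z^{-1}\log\{1 - (\rho_Z/\gamma_Z)\log(1 - u\,p_Z)\}$ with $p_Z = 1 - \exp(\gamma_Z/\rho_Z)$.

```python
import math
import random

# Hypothetical (gamma_z, rho_z) for the four strata of (Z1, Z2); gamma > 0, rho < 0.
PARAMS = {(0, 0): (0.10, -0.30), (0, 1): (0.15, -0.30),
          (1, 0): (0.20, -0.25), (1, 1): (0.25, -0.25)}


def f1(t, gamma, rho):
    """CIF of the event of interest under a Gompertz subdistribution hazard."""
    return 1 - math.exp(gamma / rho * (1 - math.exp(rho * t)))


def p_event1(gamma, rho):
    """P(e = 1 | Z) = F1(infinity | Z) = 1 - exp(gamma / rho)."""
    return 1 - math.exp(gamma / rho)


def inverse_conditional_f1(u, gamma, rho):
    """Solve P(T <= t | e = 1, Z) = u for t (inverse transform method)."""
    p1 = p_event1(gamma, rho)
    return math.log(1 - rho / gamma * math.log(1 - u * p1)) / rho


def simulate(n, censor_max, seed=2024):
    """Return (U, e, Z1, Z2) tuples; C ~ Uniform(0, censor_max) is an assumption."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2 = int(rng.random() < 0.5), int(rng.random() < 0.5)
        gamma, rho = PARAMS[(z1, z2)]
        u = rng.random()
        if rng.random() < p_event1(gamma, rho):      # event type ~ Bernoulli
            cause, t = 1, inverse_conditional_f1(u, gamma, rho)
        else:
            cause, t = 2, -math.log(1 - u)           # inverse of 1 - exp(-t)
        c = rng.uniform(0, censor_max)
        data.append((min(t, c), cause if t <= c else 0, z1, z2))
    return data
```

Shrinking `censor_max` raises the censoring rate. This is one simple way to vary censoring across scenarios, although the paper's censoring mechanism is not specified here.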
By integrating the CIF, we obtained the true RMTL of each group defined by $Z$, $\mu_1(l \mid Z) = \int_0^l F_1(t \mid Z)\,dt$. The true values of the regression coefficients were then obtained from the differences in RMTL between groups, with the baseline $\beta_0(l)$ given by the RMTL of the reference group ($Z_1 = Z_2 = 0$). We found that the regression coefficient $\beta_k(l)$ was a cumulative quantity whose absolute value increased as $l$ increased.
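The true RMTL can be obtained by numerically integrating $F_1(t \mid Z)$. The sketch below uses the composite Simpson rule with hypothetical stratum parameters. It computes $\beta_1(l)$ and $\beta_2(l)$ as single differences from the reference stratum. This assumes that the chosen strata are additive in $Z_1$ and $Z_2$, which is an assumption because the section does not give the full definition.

```python
import math

# Hypothetical (gamma_z, rho_z) per stratum of (Z1, Z2); gamma > 0, rho < 0.
PARAMS = {(0, 0): (0.10, -0.30), (0, 1): (0.15, -0.30),
          (1, 0): (0.20, -0.25), (1, 1): (0.25, -0.25)}


def f1(t, gamma, rho):
    """CIF of the event of interest under the Gompertz subdistribution hazard."""
    return 1 - math.exp(gamma / rho * (1 - math.exp(rho * t)))


def true_rmtl(l, gamma, rho, n_intervals=2000):
    """mu_1(l | Z) = integral_0^l F1(t | Z) dt via composite Simpson's rule."""
    if l <= 0:
        return 0.0
    h = l / n_intervals                     # n_intervals must be even for Simpson
    total = f1(0.0, gamma, rho) + f1(l, gamma, rho)
    for k in range(1, n_intervals):
        total += (4 if k % 2 else 2) * f1(k * h, gamma, rho)
    return total * h / 3


def true_coefficients(l):
    """Baseline RMTL and between-group RMTL differences at end time l
    (assumes, hypothetically, that the strata are additive in Z1 and Z2)."""
    base = true_rmtl(l, *PARAMS[(0, 0)])
    return {
        "beta0": base,
        "beta1": true_rmtl(l, *PARAMS[(1, 0)]) - base,
        "beta2": true_rmtl(l, *PARAMS[(0, 1)]) - base,
    }
```

With these parameters, the $Z_1 = 1$ stratum has the larger subdistribution hazard at every $t$. As a result, `true_coefficients(l)["beta1"]` grows with `l`, in line with the cumulative behaviour described above.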

Simulation results
Tables 1 and 2 demonstrate the accuracy of $\hat{\beta}(l)$ at different values of $l$. In all cases, the mean relative bias was small (less than 2% for $\beta_0(l)$), the relative standard error was approximately 1, and the coverage rate was approximately 95%. Because the absolute value of the true regression coefficient increased with $l$, the mean bias was expected to increase with $l$ as well. The simulation showed that the dynamic effect RMTL regression estimates were accurate.
As the sample size increased, the mean bias, mean relative bias, and root mean squared error tended to decrease. Moreover, different censoring rates had little effect on the mean relative bias and root mean squared error.

Model application
In this study, we extracted data from the SEER database for older patients with early-stage breast cancer.
We used 3892 patients diagnosed from 2000 to 2012 as the training set and another 1561 patients diagnosed from 2013 to 2015 as the external validation set. Details of data collection and variables are given in the Supplementary Material.
There were 769 deaths from breast cancer and 998 deaths from other causes in the training set, giving a censoring rate of approximately 55%. The follow-up time ranged from 0.17 to 18.92 years, with a median of 8 years.
Table 3 shows the results of the static effect RMTL regression ($\tau = 10.5$ years) (12). The regression coefficient $\beta$ is the cumulative difference in mean life lost due to death from breast cancer during the 10.5-year follow-up between the groups of a prognostic factor. For example, $\beta_{ER} = -0.638$ indicates that, during the 10.5 years, patients in the ER-positive group lost on average 0.638 fewer years of life to death from breast cancer than those in the ER-negative group. ER positivity was therefore a protective factor. In general, a prognostic factor was protective when $\beta$ was negative and deleterious when $\beta$ was positive. Table 3 shows that patients with ER positivity, PR positivity, breast-conserving surgery (relative to mastectomy), and chemotherapy had a better prognosis. Patients with older age, higher T stage, higher N stage, and higher histological grade had a worse prognosis. Race, marital status, axillary surgery, and radiation therapy were not significantly associated with survival. The significant covariates did not meet the proportional subdistribution hazards assumption, indicating that dynamic effects might exist. However, static effect RMTL regression gives only the static cumulative effect, and the real-time effect during the cumulative process remains unknown.

We therefore fitted the proposed dynamic effect RMTL regression and used backward stepwise selection to screen covariates, which removed race, marital status, axillary surgery, and radiation. Table 4 shows the results of the dynamic effect RMTL regression. The cumulative effect of the $k$-th prognostic factor was modeled as $\beta_k(l) = (\beta_{k0}, \beta_{k1}, \beta_{k2})(1, l, l^2)^T$, and these terms were also screened by the stepwise method. Each retained factor therefore has at least one of $\beta_{k0}$, $\beta_{k1}$, and $\beta_{k2}$. For example, $\beta_{ER}(l) = 0.291 - 0.141l + 0.005l^2$, which varies with $l$. At $l = 4.5$, $\beta_{ER}(4.5) \approx -0.25$ ($= 0.291 - 0.141 \times 4.5 + 0.005 \times 4.5^2$). This means that ER-negative patients lost an additional 0.25 years of life on average during the 4.5-year follow-up.

Figures 4A-J show the regression coefficients of the different prognostic factors at different values of $l$. For example, Figure 4G shows $\beta_{ER}(l)$ (solid black line). During the 4.5-year follow-up (x-axis $l = 4.5$), breast cancer deaths in ER-positive patients occurred on average 0.25 years later than in ER-negative patients (y-axis $\beta_{ER}(4.5) \approx -0.25$). During the 6.5-year follow-up (x-axis $l = 6.5$), they occurred on average 0.42 years later (y-axis $\beta_{ER}(6.5) \approx -0.42$). Figures 4G* and 4H* show the cumulative effects (black solid lines) and the speed of accumulation (blue dashed lines) of ER and PR, respectively. The speed is the additional difference in life lost between the positive and negative groups accumulated per additional year of follow-up. In Figure 4G*, the speed of decline of $\beta(l)$ was decreasing: it was 0.097 at $l = 4.5$ and 0.077 at $l = 6.5$. Therefore, the real-time effect of ER decreased over time. In Figure 4H*, the speed of decline of $\beta(l)$ was constant at 0.027 for both $l = 4.5$ and $l = 6.5$. Therefore, the real-time effect of PR remained unchanged over time.
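The arithmetic behind the ER example can be checked with a short sketch. It evaluates the reported cumulative effect $\beta_{ER}(l)$ and its derivative, which is the speed of accumulation plotted in Figure 4G*. It uses only the three rounded coefficients quoted in the text; the function names are hypothetical.

```python
def beta_er(l):
    """Cumulative ER effect as quoted in the text: 0.291 - 0.141 l + 0.005 l^2."""
    return 0.291 - 0.141 * l + 0.005 * l ** 2


def beta_er_slope(l):
    """Speed of accumulation: derivative of beta_er with respect to l."""
    return -0.141 + 2 * 0.005 * l


for l in (4.5, 6.5):
    print(f"l = {l}: beta_ER = {beta_er(l):.3f}, slope = {beta_er_slope(l):.3f}")
```

With these rounded coefficients, $\beta_{ER}(4.5) \approx -0.242$ and $\beta_{ER}(6.5) \approx -0.414$, with slopes of about $-0.096$ and $-0.076$. These values are close to the roughly $-0.25$, $-0.42$, 0.097, and 0.077 read from Figures 4G and 4G*. The small gaps presumably reflect rounding of the published coefficients.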
We added an auxiliary line (red dotted line) to Figures 4A-J. This is the straight line between the two endpoints of the regression coefficient curve $\beta(l)$, and it is used to determine whether the real-time effect of a prognostic factor changed. In general:
1) when $\beta(l)$ coincided with the auxiliary line (Figures 4A, D, F, H), the real-time effect was unchanged;
2) when $\beta(l)$ decreased with increasing $l$ (Figures 4G, I, J), a curve below the auxiliary line indicated that the real-time effect decreased (Figures 4G, J), and a curve above it indicated that the real-time effect increased (Figure 4I); and
3) when $\beta(l)$ increased with increasing $l$ (Figures 4B, C, E), a curve above the auxiliary line corresponded to a decrease in the real-time effect, and a curve below it corresponded to an increase (Figures 4B, C, E).
We therefore concluded that the real-time effects of age, stage N3 (relative to stage N1), histological grade III&IV (relative to grade II), and PR positivity were unchanged. The real-time effects of ER positivity and chemotherapy decreased. The real-time effects of T2 (relative to T1), N2 (relative to N1), histological grade II (relative to grade I), and breast-conserving surgery increased.
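For a quadratic $\beta_k(l) = \beta_{k0} + \beta_{k1} l + \beta_{k2} l^2$, the auxiliary-line rule reduces to a sign check. The curve lies below its chord exactly when $\beta_{k2} > 0$ (convex) and above it when $\beta_{k2} < 0$ (concave). The real-time effect $|\beta_k'(l)|$ grows when $\beta_k'(l)$ and $\beta_{k2}$ have the same sign. The minimal sketch below assumes that $\beta_k(l)$ is monotone on the follow-up window (nonzero slope); the function name is hypothetical.

```python
def real_time_trend(b1, b2, l):
    """How the real-time effect |d beta_k / dl| changes at end time l, for
    beta_k(l) = b0 + b1 * l + b2 * l**2 (b0 does not affect the answer).
    Assumes the slope is nonzero at l, i.e. beta_k is monotone there."""
    if b2 == 0:
        return "unchanged"                  # straight line: coincides with the chord
    slope = b1 + 2 * b2 * l                 # current speed of accumulation
    # d|slope|/dl = sign(slope) * 2 * b2, so the effect grows when they share a sign.
    # b2 > 0 means a convex curve (below the chord); b2 < 0 a concave one (above it).
    return "increasing" if slope * b2 > 0 else "decreasing"
```

For the quoted ER coefficients, `real_time_trend(-0.141, 0.005, 4.5)` returns `"decreasing"`. This matches the conclusion that the real-time effect of ER positivity decreased.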
Beyond exploring the dynamic cumulative effects and real-time effects of prognostic factors, dynamic effect RMTL regression can also provide personalized predictions for patients. Three patients were selected (see Table 5 for details). Figure 5 shows the predicted RMTL of each patient during the $l$-year follow-up, and Table 5 also lists the predicted RMTL during the 5-year and 10-year follow-up. For patient A, the predicted mean life lost due to death from breast cancer was 1.5 years during the 5-year follow-up and 4.2 years during the 10-year follow-up. Patients A and B differed only in treatment. Compared with patient A, patient B received breast-conserving surgery and chemotherapy, and patient B's predicted RMTL was lower. This suggests that breast-conserving surgery and chemotherapy may prolong the survival time of older patients with early-stage breast cancer. Patient C differed from patient B in N stage and histological grade. Because patient C had a lower N stage and histological grade, patient C's predicted RMTL was lower than that of patient B.
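A personalized prediction is the fitted model evaluated at a patient's covariates, $\hat{\mu}_1(l \mid Z) = \sum_k Z_k \hat{\beta}_k(l)$, traced over $l$. The sketch below shows that calculation. Only the ER row of the coefficient table is taken from the text. The intercept row is a made-up placeholder, not a value from Table 4, so the resulting numbers are illustrative only. The names are hypothetical.

```python
# Coefficient table: name -> (b_k0, b_k1, b_k2). The ER row is quoted in the text;
# the intercept row is a hypothetical placeholder, not a value from Table 4.
COEFS = {
    "intercept": (0.0, 0.3, 0.01),
    "er_positive": (0.291, -0.141, 0.005),
}


def predicted_rmtl(coefs, patient, l):
    """Predicted RMTL mu_1(l | Z) = sum_k Z_k * beta_k(l), with Z_intercept = 1."""
    total = 0.0
    for name, (b0, b1, b2) in coefs.items():
        z = 1.0 if name == "intercept" else patient.get(name, 0.0)
        total += z * (b0 + b1 * l + b2 * l ** 2)
    return total


# Predicted trajectory over end times l = 1, ..., 10 for a hypothetical ER-positive patient
trajectory = [predicted_rmtl(COEFS, {"er_positive": 1}, l) for l in range(1, 11)]
```

Comparing two patients who differ in a single covariate at the same $l$ recovers that covariate's $\beta_k(l)$. This is how the A-B and B-C comparisons above isolate treatment and tumour characteristics.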
In addition, the accuracy of prediction was evaluated in the external validation set. Figure 6 shows the C-index and relative prediction error at different pre-specified end times of follow-up (18). The mean C-index was 0.78, indicating good discrimination, and the relative prediction error was within 10%.
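A C-index for the event of interest under competing risks can be sketched as follows. This version is unweighted (it ignores censoring weights). It follows the common competing-risks convention in which a patient who had a competing event remains comparable, because that patient can no longer experience the event of interest. It is an assumption for illustration and not necessarily the estimator of reference (18). The function name is hypothetical, and the predicted risk can be taken as the predicted RMTL at end time $l$.

```python
def competing_risks_c_index(times, events, risk, l):
    """Unweighted concordance for the event of interest up to end time l.

    A pair (i, j) is comparable when i has event 1 at times[i] <= l and j is known
    to be free of event 1 at that time: times[j] > times[i], or j had a competing
    event (events[j] == 2). A higher predicted risk for i counts as concordant;
    ties in predicted risk count one half.
    """
    concordant = comparable = 0.0
    for i in range(len(times)):
        if events[i] != 1 or times[i] > l:
            continue
        for j in range(len(times)):
            if j == i:
                continue
            if times[j] > times[i] or events[j] == 2:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

Evaluating this at several values of `l` yields a curve of C-index against end time. The mean over those end times is one way to summarize discrimination in a single number.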
The prediction formula is given in Table 4, and the prediction model has been implemented as a web-based prediction tool available at https://m92imi-oscar-0.shinyapps.io/newapp/.

Discussion
When a prognostic factor has a large effect on competing events, a competing risks approach should be used; otherwise, the estimated effect of this factor on the event of interest may be substantially biased (19). In our data, the sHRs of age and chemotherapy for death from causes other than breast cancer (the competing event) were 2.486 (95% CI: 2.181 to 2.834) and 0.627 (95% CI: 0.545 to 0.722), respectively. Moreover, patients who experienced the competing event accounted for 26% of the total sample and 56% of all events. It is therefore necessary to consider competing risks in these data.
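The proportions quoted above, together with the approximately 55% censoring rate of the training set, follow directly from the reported training-set counts:

```python
n_total = 3892          # training set size
n_breast = 769          # deaths from breast cancer (event of interest)
n_other = 998           # deaths from other causes (competing event)

n_events = n_breast + n_other
share_of_sample = n_other / n_total                 # about 0.256
share_of_events = n_other / n_events                # about 0.565
censoring_rate = (n_total - n_events) / n_total     # about 0.546
print(f"{share_of_sample:.0%} {share_of_events:.0%} {censoring_rate:.0%}")  # 26% 56% 55%
```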
Static effect RMTL regression gives only the cumulative effect during the $\tau$-year follow-up, so the real-time effect during the cumulative process cannot be known. This result is particularly incomplete for covariates with time-varying effects. Additionally, for patients who have already been followed up for some time, the cumulative effect from 0 to $\tau$ years no longer applies. In contrast, dynamic effect RMTL regression can obtain not only the dynamic cumulative effect during the $l$-year follow-up but also the real-time effect. The real-time effect can help doctors and patients better understand the prognosis of breast cancer. For example, the real-time effect of ER positivity decreased, meaning that its protective effect is larger in the early period and smaller in the later period. This suggests that endocrine therapy should be used as early as possible. The real-time effect of breast-conserving surgery increased, meaning that its protective effect is larger in the later period. This suggests that the effect of breast-conserving surgery is delayed. Several studies have analyzed the prognosis of death from breast cancer. Yao used Cox regression and cause-specific Cox regression to analyze differences in the effects of prognostic factors on breast cancer between men and women (20), and Xu used Fine-Gray regression to develop a prediction model for patients with inflammatory breast cancer (21). However, none of these studies considered the potential time-varying effects of prognostic factors. Other studies have analyzed the time-varying effects of prognostic factors (13, 14, 17, 22). However, these were single-endpoint survival analyses that did not consider competing events, which may introduce competing risks bias.
In this paper, competing risks and time-varying effects were considered together for the first time. The real-time effects of the following prognostic factors were found to differ from previous single-endpoint analysis results. First, in a single-endpoint analysis of breast cancer, the deleterious effect of stage N2 relative to stage N1 decreased over time (17). In contrast, we found that stage N2 was also a risk factor but that its real-time effect increased over time (Figure 4C). Second, previous single-endpoint studies have shown that the deleterious effect of histological grade II relative to grade I decreased over time (13, 22). We found instead that this deleterious effect increased over time (Figure 4E). Third, in previous single-endpoint studies, ER positivity was a protective factor in the early period and a deleterious factor in the late period (13, 22, 23). This differs from our results, which showed that the protective effect of ER positivity decreased over time (Figure 4G). Fourth, in terms of treatment, we found that patients who underwent breast-conserving surgery had a better prognosis than those who underwent mastectomy (Figure 4I). This is consistent with Kim's study and a meta-analysis, both of which showed a higher overall survival rate after breast-conserving surgery than after mastectomy (24, 25). However, we further found that the protective effect of breast-conserving surgery increased over time (Figure 4I). Finally, chemotherapy was a protective factor, and its real-time effect decreased (Figure 4J). This is similar to Rakovich's study, which found that chemotherapy after breast-conserving surgery in patients with ductal carcinoma in situ reduced the risk of early local recurrence but not the risk of late recurrence (26). In addition, the final dynamic RMTL model was constructed with the full dataset (see Web Supplementary Table 2 in the Supplementary Material), and the results were similar to those obtained with the training set (Table 4).

FIGURE 5
Predicted trajectories of RMTL for different patients. The predicted mean life lost of patient A due to death from breast cancer was 1.5 years during the 5-year follow-up; during the 10-year follow-up, the corresponding predicted value was 4.2 years.

A time-varying covariate and a covariate with a time-varying effect are two different concepts that require different statistical methods (27). A time-varying covariate is one whose value changes over time, which calls for methods for longitudinal data. A covariate with a time-varying effect, by contrast, is one whose effect on the outcome changes over time (28). In competing risks analysis, covariates that do not meet the proportional subdistribution hazards assumption tend to have time-varying effects. Because time-varying effects are difficult to identify, they are often ignored. This leads to biased estimates and can cause a significant effect that occurs during only part of the follow-up period to be missed (29). This paper focuses on the latter type and proposes an extended RMTL regression model to depict time-varying effects, which can also be used for single-endpoint survival data. The extension to time-varying covariates will be the focus of our future research.
This study has some limitations. First, the model uses IPCW; when very few patients remain at risk near the end of follow-up, the weights may become large and unstable. Second, the life lost is the time lost due to death from breast cancer over a period of time (the $l$-year follow-up) rather than the reduction in total lifespan in the traditional sense. Third, HER2 status is also an important prognostic factor for breast cancer. Because the SEER database began recording HER2 status only in 2010, we chose not to include this variable in our analysis.

Conclusion
To explore the potential time-varying effects of prognostic factors in competing risks survival data, we developed a dynamic effect RMTL regression that models a stacked dataset using a generalized estimating equation with IPCW. The simulation of the regression coefficients and the external validation of the predictions show that dynamic effect RMTL regression is accurate for both prognostic analysis and prediction when competing risks exist. The new model can explore the dynamic cumulative effects and real-time effects of prognostic factors on a time scale. This gives clinical researchers a more comprehensive understanding of the progression of breast cancer. Moreover, individual predictions on a time scale allow physicians and patients to assess prognosis more intuitively and choose the best treatment.

FIGURE 2
Composition of the stacked dataset.

FIGURE 3
Subdistribution hazard ratios (sHRs) of two independent variables for the event of interest in the simulation.

FIGURE 4
The curves of the regression coefficients over time. Panels (A-J) represent different variables. The black solid line represents the regression coefficient $\beta(l)$, and the black dashed lines represent the 95% confidence interval of $\beta(l)$. The red dotted line is an auxiliary line (the straight line between the two endpoints of $\beta(l)$), which is used to judge whether $\beta(l)$ is curved. In (G*, H*), the blue dashed line corresponds to the right-hand axis and is the absolute value of the slope of $\beta(l)$.

FIGURE 6
C-index (A) and relative prediction error (B) at different end times of follow-up. The C-index refers to the accuracy of the model in predicting the order of occurrence of death from breast cancer during the $l$-year follow-up. Relative prediction error is the proportion of the prediction error to the length of follow-up.

Yu et al. 10.3389/fonc.2024.1352111 Frontiers in Oncology frontiersin.org

TABLES 1, 2
N, the sample size; Cen, the censoring rate; l, pre-specified end time of follow-up; Rel bias, mean bias relative to the true parameter; RMSE, root mean square error; Rel SE, mean estimated standard error/Monte Carlo empirical error; Cov, empirical coverage rate.

TABLE 3
Regression coefficients of the static effect RMTL regression ($\tau = 10.5$ years). Except for race, marital status, breast surgery, and axillary surgery, none of the variables met the proportional subdistribution hazards assumption. The regression formula of the static effect RMTL model is as follows: RMTL

TABLE 5
The definition of three example patients.
The patients were over 75 years of age, ER negative, PR negative, and had T2 stage cancer. RMTL(5) is the predicted mean life lost due to death from breast cancer during the 5-year follow-up; RMTL(10) is the corresponding predicted value during the 10-year follow-up.