A consistent version of distance covariance for right‐censored survival data and its application in hypothesis testing

Distance covariance is a powerful new dependence measure that was recently introduced by Székely et al. and Székely and Rizzo. In this work, the concept of distance covariance is extended to measuring dependence between a covariate vector and a right‐censored survival endpoint by establishing an estimator based on an inverse‐probability‐of‐censoring weighted U‐statistic. The consistency of the novel estimator is derived. In a large simulation study, it is shown that induced distance covariance permutation tests show a good performance in detecting various complex associations. Applying the distance covariance permutation tests on a gene expression dataset from breast cancer patients outlines its potential for biostatistical practice.


INTRODUCTION
Testing associations between continuous clinical covariates and survival are predominantly performed using some (semi-)parametric survival model, such as, for example, the Cox proportional hazards regression (Cox, 1972) or the accelerated failure time (AFT) model (Wei, 1992;Kalbfleisch and Prentice, 2011). These approaches assume that the influence of the clinical covariate on survival takes a simple parametric form. For example, a proportional hazards model with a single predictor assumes that the impact on the hazard ℎ( ) at all times is multiplicative with ℎ( ) = ℎ 0 ( ) exp( ), where ℎ 0 (⋅) is the baseline hazard and ∈ ℝ is the corresponding regression coefficient.
Consequently, tests based on these models are targeted to detect parametric associations of the specified form. Notably, tests based on the Cox proportional hazards model or the AFT model have low power against alter-natives that involve nonmonotone influences of covariates on survival.
Yet, the impact of many clinical parameters with survival is more complex and cannot be modeled using a simple proportional hazards model. For detecting such complex association in statistical practice, one can use visual diagnostic tools such as for example plotting the martingale residuals against the covariate values (Therneau and Grambsch, 2000). However, there is an increasing number of studies investigating the influence of high-dimensional molecular data on patient survival, see, for example Chen et al. (2007), Dvinge et al. (2013), and Ng et al. (2016). When considering such a high number of predictors-the number of covariates often exceed several thousand-visual diagnostic tools are not feasible anymore. Consequently, complex associations between high-dimensional molecular data and survival cannot be detected by the testing procedures that are predominantly used in statistical practice. This clearly urges the need for automatic tools for finding general associations between covariates and censored data and first approaches for this purpose have been developed during the last years (Peng and Fine, 2008;Wood, 2012).
For testing dependence in uncensored data, Gábor Székely and his coauthors (Székely et al., 2007;Székely and Rizzo, 2009) recently developed the concept of distance covariance. Different from classical dependence measures such as the usual covariance, the distance covariance coefficient can detect any possible association between random variables. Since the appearance of Székely et al. (2007) and Székely and Rizzo (2009), enormous interest in the theory and applications of distance correlation has arisen; we refer to Edelmann et al. (2019) for an extensive review of distance covariance methods with a focus on time series analysis.
In some recent work, distance covariance has been applied on variable screening for high-dimensional models (Zhao and Li, 2012;Song et al., 2014;Li et al., 2016;Zhang et al., 2017;Huang et al., 2019;Pan et al., 2019). The approach in  screens variables based on the distance correlation between covariates and the martingale residuals of a null Cox model, whereas Chen et al. (2018) introduce two different versions of robust distance correlation for survival data. While both approaches show good performance in simulations, neither of the two works derives an empirical version of distance correlation satisfying the desirable properties in Székely et al. (2007). Most importantly, they do not show that their estimators converge to a suitable population version of distance covariance or distance correlation and they do not derive tests, which can provably detect any dependencies between covariates and survival times.
In this work, we will fill this gap by developing an estimator for distance covariance between covariates and survival times under right-censoring. Notably, exploiting results by Datta et al. (2010) an Huo and Székely (2016), we derive the consistency and asymptotic normality of an inverse probability of censoring-weighted (IPCW) version of distance covariance. Moreover, using cumbersome combinatorial calculations, we derive computationally efficient versions of these novel estimators that can be calculated in ( 2 ) steps. The new estimator for distance covariance immediately induces a permutation test for detecting arbitrary associations between covariates and survival times.
The remainder of the paper is organized as follows. In Section 2, we introduce the concept of distance covariance and state the U-statistic estimate for the squared distance covariance derived by Huo and Székely (2016). Section 3 contains the definitions of the novel estimator of distance covariance for right-censored survival data and corresponding theoretical results; we also discuss important practical aspects. A simulation study in Section 4 compares a permutation test based on the estimator for distance covariance with tests based on the Cox model, thin-plate regression splines (Wood, 2003), and a test adapted from the residual distance covariance in . In Section 5, the distance covariance permutation test is applied on a gene expression dataset from breast cancer patients. All proofs to theoretical results are provided in the Appendix "Proofs" in the Supporting Information.

PRELIMINARIES: DISTANCE COVARIANCE
In the following, we will introduce the notion of distance covariance as defined by Székely et al. (2007) and Székely and Rizzo (2009). Moreover, we will state some important results that will be crucial for deriving the novel estimator.

An IPCW estimator for the distance covariance under right-censoring
We consider random variables ∈ ℝ , ∈ [0, ∞) and ∈ [0, ∞), where represents a covariate vector, rep-resents the survival time, and represents the censoring time. While and are allowed to feature any possible dependency, we assume that is independent of ( , ). Moreover, we will assume that the hazard function of the censoring distribution is absolutely continuous. The censoring indicator will be denoted by Δ = 1 { ≤ } and the follow-up time by = min( , ).
Given an i.i.d. sample ( 1 , 1 , Δ 1 ), … , ( , , Δ ) drawn from ( , , Δ), our goal is to detect associations between the covariate vector and the survival times by deriving a consistent estimator for the distance covariance ( , ). For this purpose, we will consider an IPCW versions of the U-statistic representation of distance covariance (2.6) given by Now applying (Datta et al., 2010, Theorem 1) (see Supporting Information for details) yields that (under the technical assumptions (C1)-(C3) given below), (3.1) is a consistent estimator for ( , ), where ℎ(⋅) denotes the kernel function defined in Equation (2.5),̂(⋅) is the Kaplan-Meier estimate of , and for any function defined on a subset of ℝ, ( −) denote the left-sided limit of (⋅) in .
However, different from the empirical version of distance covariance for uncensored data defined in Equation (2.2) that can be calculated in ( 2 ) steps, the representation (3.1) features a complexity of ( 4 ). Consequently, this representation is computationally highly inefficient and virtually unfeasible for most practical purposes.
Using nontrivial calculations (see Appendix "Proofs" in the Supporting Information for details), we can derive an alternative representation for (3.1), that features a computational complexity of ( 2 ) only. Notably, defining = ) . (3.2) While the representation forΩ stated in (3.2) may appear rather involved on the first sight, merely elementary algebra is needed to evaluate the summands in the respective expression. Moreover, it is evident that the calculation of (3.2) can be carried out in ( 2 ) steps.
For the purpose of stating the asymptotic properties ofΩ, we introduce the notation where ( , ,Δ) denotes the joint probability measure of the triple ( , , Δ). Now, assuming the conditions we can show (see Appendix "Proofs" in the Supporting Information) the following asymptotic results forΩ.
Moreover, if 2 > 0, then Remark 1. If and are independent, it holds 2 = 0 and hence In a similar fashion, we obtain an IPCW version of distance correlation that is presented in the Supporting Information.
Before concluding this section, we briefly discuss the meaning and purpose of the conditions (C1)-(C3). The existence of the second moments of the covariates and the survival times (C1) implies the existence of the second moments of the kernel function (2.5) (see ), which, in turn, guarantees the asymptotic normality of the estimator (3.2) when no censoring is present. (C2) now states that the second moments of the weighted kernel function ℎ 2 1 ( , )Δ 2 ( −) exist leading to the asymptotic normality of the estimator (3.2), when the censoring distribution is known and used for calculating the weights . The last condition (C3) now guarantees that this asymptotic normality is retained when is replaced by its Kaplan-Meier estimateˆ; technically, it ensures that the second summand of the asymptotic variance in (3.3) is finite.
For practical purposes, it is interesting to know that, given the existence of the second moments in (C1), conditions (C2) and (C3) are satisfied if is bounded and the supports of and are of the form [0, ] and [0, ], respectively, with < . This corresponds to studies, where the follow-up is longer than the maximal lifespan of an individual, but there may be censoring due to dropouts. On the other hand, (C2) is never satisfied, if the supports of and are of the form [0, ] and [0, ], respectively, with > . This relates to the situation, where the maximal lifespan of an individual is longer than the follow-up, which is often the case in clinical studies. In these settings, variations ofΩ are required, which are discussed in Section 3.3.
For an illustration of the computational performance, see Appendix "Computation time of the permutation test" in the Supporting Information.

Practical considerations
In Section 3.1, we have derived a consistent estimator for the distance covariance between covariates and the survival time under right-censoring using only mild assumptions. However, even these mild assumptions are often too restrictive in biomedical situations. Notably, it can happen that the distribution of the survival time is not completely identifiable and hence also the distance covariance between and cannot be consistently estimated. To be precise, in a clinical or observational study, the censoring time is typically upper-bounded by the total duration of the study 0 . On the other hand, the probability that a patient's survival time is larger than 0 is typically positive. Hence, ( > 0 ) > 0, which directly implies that ( > | > 0 ) is not identifiable for > 0 . Concerning the assumptions for Theorem 1, (C2) is obviously not satisfied in such a situation, since the corresponding integral is not finite.
As a remedy, we suggest to estimate the distance covariance of the covariate with the truncated survival times * = min( , ), where ∈ [0, ∞) is an suitably chosen constant. In most medical trials, a natural choice for is given by the follow-up time of the trial. If this is not the case, one may consider a clinically meaningful endpoint (e.g., 5 years) or choose a point in time, where a specified number of patients is still under observation (e.g., 20%). The truncated survival time * is clearly a meaningful substitute for , while featuring the advantage that the distance covariance of with * can be consistently estimated (as long as 2 < ∞ and ( ) is absolutely continuous).
For calculating the distance covariance between and * , all follow-up times larger than are replaced by ; moreover, the corresponding censoring indicators are set to 1.
We will see in Section 5 that this approach works well in practice.
Moreover, we point out that for any monotone transformation (⋅), the distance covariance between and ( ) also provides an association test that is consistent against all alternatives. In Section 5, we compare the distance covariance test after applying a log-transformation with the standard version on a real data example.

Computational issues in high dimensions and alternative tests
When analyzing high-dimensional covariate data, one may often perform a single association tests for each coefficient of the covariate resulting in a large number of tests. In this case, one may consider adjusting for multiple testing using, for example, a Benjamini-Hochberg or Bonferroni correction. For this purpose, it is crucial to precisely evaluate small p-values of the tests. When using a permutation test, this issue requires a large number of permutations and hence leads to an increased computation time.
For dealing with this issue, we propose two different strategies. The first strategy involves generating permuted samples and then fitting a generalized Pareto distribution to the tail of the empirical distribution of the corresponding statisticsΩ 1 , … ,Ω ; subsequently, the p-values are obtained by the fitted distribution.
The second strategy is a stepwise procedure. In the first step, the p-value for each test is calculated using a small number of permutations. In the second step, only tests for which p-values fall below a certain threshold are reevaluated using a larger number of permutations. In subsequent steps, very small p-values can then be more precisely evaluated using even more permutations.
More explanation and a comparison of the two different strategies are provided in the Supporting Information.

SIMULATION STUDY
To evaluate the performance of the proposed distancelikelihood ratio covariance permutation test, a simulation study was conducted. The novel method was compared to likelihood ratio (LR) tests based on the AFT and the Cox proportional hazards models with a linear predictor. As competitors that take into account nonlinear effects, we considered a Cox proportional hazards model with a thin-plate regression spline (Wood, 2003), a marginal test between covariates and martingale residuals of an empty Cox model based on sliced average variance estimation (Shao et al., 2007) and an ad-hoc permutation test based on the distance covariance between covariates and martingale residuals, cf. . For the thin-plate regression spline, a Wald-type test was used to test the null hypothesis that the smooth term equals zero (Wood, 2012). The regression spline was set to five degrees of freedom with one covariate and 25 degrees of freedom in the case of two covariates. The number of permutations in the two permutation tests was set to = 1000. The simulation study consists of two phases. The first phase uses prior simulations to specify the simulation parameters for the second phase. In the second phase, the different methods are compared with respect to power and type I error rates. For the first phase, a large dataset 1 of = 10 6 observations was randomly drawn once. For each observation, three covariates were generated independently with uniform distribution ∀ = 1, … , 3 ∶ ∼ (−3, 3). Based on the dataset 1 , nine different linear predictor structures with one covariate and five two-dimensional structures were calculated. Explicit formulas used in the construction of are given in Appendix "Simulation Study" in the Supporting Information; a graphical illustration of the different structures is given in Figure 1. This figure appears in color in the electronic version of this article, and any mention of color refers to that version. We investigated, for example, polynomials (linear, quadratic, cubic), cyclical patterns (cosinus, modulus, Rastrigin), and functions with random components (twoLines, twoParabolas, cond-Var). Survival and censoring times were log-normally distributed with ∼ log-Normal( ,log = , ,log ) and ∼ log-Normal( ,log , ,log ). Using the generated dataset 1 , was standardized. For each , the standard deviation ,log was set to achieve prespecified signal to noise ratios. Linear, quadratic, cubic, and piecewise constant scenarios signal-to-noise ratios were set to 0.125, cosinus and circle to 0.25, two lines and parabolas as well as two-way interactions to 0.5, and all remaining structures to 0.75. For each , the parameters ,log and ,log were then set to three different pairs of values corresponding to censoring rates of 0.25, 0.5, and 0.75, respectively. Finally, for each structure, was set to the 90% quantile of the survival times. All prior parameters (standardization of , ,log , ,log , ) were computed once for each structure and then remain fixed for the second phase of the simulation procedure.
In the second phase, the different tests were compared in terms of type I error and power. For each sample size ∈ {100, 200, … , 1000} and each of the 14 structures, we simulated = 1000 datasets, corresponding to observations of the vector ( 1 , 2 , 3 , , ) ∈ ℝ 5 . All survival times were cut by the parameter (corresponding to the 90% quantile of the survival times in the dataset 1 ). We evaluated six different tests: AFT LR test, Cox proportional hazards with linear effects (Cox) LR test, Cox proportional hazards with thin-plate regression spline (CoxSpline) Wald test, permutation test based on the IPCW distance covariance (DCov), test based on sliced average variance estimation (SAVE_martingale), and permutation test based on the distance covariance between covariates and martingale residuals (Dcov_martingale). Power was evaluated by testing the influential covariates with survival. Type I error rates were estimated by evaluation of the test statistics based on the third noise covariate, which had no relation to the survival time.
The results of the type I error rates are provided in Figures 3-5 in Appendix "Simulation Study" of the Supporting Information. In all scenarios, the type I errors of all tests are close to the nominal threhold = .05. In most cases, the AFT model test has slightly higher type I error than the Cox model based test (Cox). The type I error rates of the IPCW distance covariance test (Dcov) were quite comparable to those of the martingale-based distance covariance test (Dcov_martingale) and the Wald test based on thin-plate regression spline (CoxSpline). The SAVE_martingale methodology showed slightly higher type I errors.
The results of the power analyses are given in Figures 2-4. Most considered tests showed a high power in discovering linear, quadratic, and cubic effects. Even AFT and Cox could detect quadratic and cubic effects, which is likely due to the positive trend in the considered structure. Only SAVE_martingale showed a substantially worse performance than all other tests in detecting linear and cubic relationships. The two distance covariance based tests give very similar results and show a very good overall performance. In the low censoring scenario, Dcov shows the highest power in four scenarios (cosine, condVar, modulus, shubert), CoxSpline has the highest power in three scenarios (piecewise constant, circle, rastrigin), and SAVE_martingale has the highest power in two scenarios (twoLines, twoParabolas). As expected, tests based on linear models (AFT, Cox) could not detect the two-way interaction and continuous XOR example. Dcov, Dcov_martingale, SAVE_martingale, and CoxSpline performed equally well in those cases. The results for moderate censoring are similar; however, Dcov seems to show some slight problems with heavy censoring.

REAL DATA EXAMPLE
For demonstrating the performance of the proposed distance covariance test on real data, we applied this method on the breast cancer microarray dataset investigated by Van De Vijver et al. (2002). The dataset consists of expression values of 24,481 genes in 295 tumor samples from different patients. Moreover, it also contains data on overall survival for each individual and additional patient data such as estrogen receptor status and age. The median To search for genes that are associated with overall survival, we applied both an LR test based on a standard Cox model and the proposed distance covariance permutation test using = 10, 000 permutations. For the distance covariance test, we restricted our dataset to a 10-year follow-up. The censoring indicator of all individuals surviving 10 years was set to 1 and its survival time to 10 years; this corresponds to considering the truncated endpoint min( , 10years), see Section 3.3 for details.
For comparison of the two tests, we first evaluated the number of rejections for association tests between single genes and overall survival using a gene-wise significance level of = 0.05. The number of rejections for the distance covariance permutation test and the LR test based on the Cox model are illustrated on the left-hand side of Figure 5. Additionally to the test controlling the gene-wise significance level of .05, we additionally applied a Benjamini-Hochberg (BH) correction to control for a false-discovery rate (FDR) of 0.1. The number of rejections for an FDR of 0.1 is illustrated on the right-hand side of Figure 5.
When controlling for a gene-wise significance level for .05, the distance covariance test leads to 6649 rejections, whereas the tests based on the Cox model reject the null hypothesis of no association in 5362 cases. When adjusting the p-values using the Benjamini-Hochberg procedure and controlling for an FDR of 0.1, we obtain 4417 rejections for the distance covariance test and 3107 rejections for the Cox model. In total, these results suggest that, for this example, the distance covariance test is superior in detecting associations between gene expression and survival.
Additionally, we also performed the distance covariance permutation test on log-transformed follow-up times to investigate possible benefits arising from the use of such a transformation. When controlling for a gene-wise significance level for .05, the test on log-transformed survival times leads to 6759 rejections. When adjusting the p-values using the Benjamini-Hochberg procedure and controlling for an FDR of 0.1, the null hypothesis of no association is rejected in 4633 cases. A corresponding Venn diagram is provided in the Appendix "Real Data Example" in the Supporting Information. In total, we observe a slightly higher number of rejections when applying a log-transformation, suggesting that it may be beneficial to apply such a transformation in practice.
To better understand the nature of the dependencies, which are detected by a distance covariance test, but not by a Cox model, we performed an exploratory analysis investigating the associations that turned out to be highly F I G U R E 3 Estimated power of six different tests over all structures and varying sample sizes. Moderate censoring (censoring rate 0.5). This figure appears in color in the electronic version of this article, and any mention of color refers to that version significant (p-value < 10 −3 ) for the distance covariance test, but nonsignificant (p-value >.05) for the Cox modelbased test. In total, there were 101 of such associations. First, we fitted Cox models using penalized splines via the pspline function from the R package survival, which revealed nonlinear functional dependencies (i.e., relationships, where the hazard can be expressed by a function of the gene) between numerous genes and the corresponding log-hazard ratio (vs. a reference level of gene expression 0). Four of these dependencies are displayed in Figure 6. The upper two plots illustrate genes with a U-shaped and reverse U-shaped relationship with the log-hazard ratio, respectively. In the lower left plot, the log-hazard ratio seems to be approximately constant for expression values smaller than 0.2, followed by a sharp decrease. Finally the hazard ratio in the lower right plot is decreasing for small expression values, approximately constant for intermediate values and increasing for large expression values.
However, as we have seen in Section 4, distance covariance cannot only detect functional but any kind of relationships between a predictor and its log-hazard ratio, see the examples of two lines or two parabolas in Section 4. To investigate nonfunctional relationships, we fitted separate standard Cox models on different subgroups of the data. As an example, we considered the subgroups of patients with a small tumor (diameter ≤20 mm, = 155 patients) and patients with a large tumor (diameter >20 mm, , = 140 patients). We also explored the subgroups of younger (age ≤40 years) and older (age >40 years) women. In 46 of the 101 genes under consideration, a standard Cox model in at least one of the four subgroups gave a significant result. While this could possible indicate that there is a subgroup effect for some of these genes, we also emphasize that we encounter multiple testing issues at this point. In total, we found the evidence of subgroup effects in these four subgroups rather unconvincing.
Finally, since the distance standard deviation is more robust against outliers and heavy-tailed distributions than the classical standard deviation , one may conjecture that the distance covariance independence test for survival is more robust against outliers than the Cox model. Indeed, when removing the four most extreme observations (in terms of distance to the median) from each gene, Cox model based tests gives significant results in 47 of the 101 genes under consideration; this observation suggests that the distance covariance test may indeed be more robust than the LR test based on a standard Cox model.

DISCUSSION
The number of studies investigating associations between survival and high-dimensional molecular data has been rapidly increasing throughout the last decade. Concurrent with this development, a large number of different methods for testing and modeling in high-dimensional survival data have been developed (Tibshirani, 1997;Goeman et al., 2005;Gorst-Rasmussen and Scheike, 2013). An aspect, which is often neglected in hypothesis testing, is the possible presence of nonmonotone effects. Consequently, the overwhelming majority of approaches have little power against these alternatives. In this work, we have extended the well-studied distance correlation approach to right-censored survival data. Notably, we have established consistent estimators for the distance covariance and distance correlation between a covariate vector and the survival time. Using permutation tests based on the distance covariance estimator allows for testing against general alternatives in this setting. Both simulation results and a real data example demonstrate that this test appears to be a very promising approach, outperforming existing methods in numerous settings. Especially the real data example shows that this approach can detect a considerable number of associations that a standard Cox model would miss.
This article may be seen as a first step into distance covariance testing for survival data. Numerous extensions of the distance correlation and distance covariance methodology have been developed and it is obvious that many of these extensions are also sensible in the survival setting. An important topic is, for example, the consideration of confounders using partial (Székely and Rizzo, 2014) or conditional (Wang et al., 2015) distance covariance. Moreover, while permutation tests are still the standard option for distance covariance in applications, parametric tests have recently been developed, which may further speed up the necessary computations (Berschneider and Böttcher, 2018). Finally, distance covariance methods are not restricted to covariates in ℝ ; the concept of generalized distance covariance (Lyons, 2013;Sejdinovic et al., 2013) actually allows for testing independence between random variables on general metric spaces. It is our goal to address extensions of these methods on survival data in our future work.

A C K N O W L E D G M E N T S
We thank Thomas Alexander Gerds and Paul Blanche for helpful discussions. Dominic Edelmann is funded by the Deutsche Forschungsgemeinschaft (DFG), Project No. 417754611. This work was supported by grants from the German Federal Ministry of Education and Research (BMBF) (grant numbers: 01ER1505a and 01ER1505b).

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this paper are openly available in the R Package cancerdata at https://biocon ductor.org/packages/release/data/experiment/html/canc erdata.html [10.18129/B9.bioc.cancerdata] by Budczies and Kosztyla (2020).