Evaluating Random Forests for Survival Analysis Using Prediction Error Curves

Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec which provides a set of functions for eﬃcient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, like Cox regression or additive hazard regression, as well as state of the art machine learning methods such as random forests, a nonparametric method which provides promising alternatives to traditional strategies in low and high-dimensional settings. We show how the functionality of pec can be extended to yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party . Using data of the Copenhagen Stroke Study we use pec to compare random forests to a Cox regression model derived from stepwise variable selection.


Introduction
In survival analysis many different regression modeling strategies can be applied to predict the risk of future events.Often, however, the default choice of analysis relies on Cox regression modeling due to its convenience.Extensions of the random forest approach (Breiman 2001) to survival analysis provide an alternative way to build a risk prediction model.This bypasses the need to impose parametric or semi-parametric constraints on the underlying distributions and provides a way to automatically deal with high-level interactions and higher-order terms in variables and allows for accurate prediction (Ishwaran, Kogalur, Blackstone, and Lauer 2008).In certain applications it is of interest to compare the predictive accuracies of Cox regression to random forest or other strategies for building a risk prediction model.Several measures can be used to assess the resulting probabilistic risk predictions.Most popular are the Brier and logarithmic scoring rules (Gneiting and Raftery 2007) and rank statistics like the concordance index which equals the area under the ROC curve (AUC) for binary responses (Harrell Jr., Lee, and Mark 1996).For event time outcome in survival analysis these measures can be estimated pointwise over time, where pioneering work was done by Graf, Schmoor, Sauerbrei, and Schumacher (1999) for the time-dependent Brier score and by Heagerty, Lumley, and Pepe (2000) for time-dependent ROC analysis.Performance curves are obtained by combining time pointwise measures.
In this article we concentrate on prediction error curves that are time dependent estimates of the population average Brier score.However, similar results are usually obtained with the time-dependent AUC.At a given time point t, the Brier score for a single subject is defined as the squared difference between observed survival status (e.g., 1 = alive at time t and 0 = dead at time t) and a model based prediction of surviving time t.Typically the survival status at time t will be right censored for some data.Thus, inverse probability of censoring weights (IPCW) were proposed (Graf et al. 1999;Gerds and Schumacher 2006) to avoid bias in the population average.An important issue in prediction is correct prediction error estimation.If a risk prediction model fits well over the training data used to build the model, and has good prediction accuracy (assessed using the training data), we would like to know if it continues to predict well over independent validation data and what that prediction accuracy is.Various data splitting algorithms have been proposed, based on cross-validation and bootstrap, to correctly estimate the prediction accuracy of a model in the typical situation where a single data set has to be used to build the prediction models and again to estimate the prediction performance (Efron and Tibshirani 1997;Gerds and Schumacher 2007;Adler and Lausen 2009).
We present the R (R Development Core Team 2012) package pec, short for prediction error curves, that is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=pec.The package provides functions for IPCW estimation of the time-dependent Brier score and has an option for selecting between ordinary cross-validation, leave-one-out bootstrap, and the .632+bootstrap for estimating risk prediction performance.It is also possible to compute prediction error curves with independent test data.An important feature of pec is that the entire model building process can be taken into account in the evaluation of prediction error, including data dependent steps such as variable selection, shrinkage, or tuning parameter estimation.By using repeated data splitting (either crossvalidation or bootstrap), this yields estimates of the prediction error that are a composite of the prediction accuracy and the underlying variability of the prediction models due to whatever data dependent steps were used for their construction over the training splits of the data (Gerds and Van de Wiel 2011).
To illustrate the usage of pec we have extended the package to work with prediction models obtained using the R packages randomSurvivalForest (Ishwaran and Kogalur 2007;Ishwaran et al. 2008) and party (Hothorn, Bühlmann, Dudoit, Molinaro, and van der Laan 2006) which implement extensions of the random forest method for survival data.The new functions are illustrated in a worked out example where we analyse the data of the Copenhagen stroke study (COST, Jørgensen, Nakayama, Raaschou, Gam, and Olsen 1994).Earlier analyses of COST were based on a Cox regression model where the model was obtained by backward stepwise selection (Andersen, Andersen, Kammersgaard, and Olsen 2005).We compare the Cox prediction model obtained in this fashion to random forest prediction models.

Data structure
A survival prediction model uses data on the life history of subjects (the response) and their characteristics (the predictor variables).The response is ( Ti , ∆ i ), where Ti is the minimum of the survival time T i and the right censoring time C i and ∆ i = I{T i ≤ C i } is the status (censoring) value indicating whether a patient died (∆ i = 1) or was right-censored (∆ i = 0).The predictor variables X i = (X 1 i , . . ., X K i ) for subject i usually consists of both continuous scale variables, like age or blood pressure, or qualitative variables, like gender or genotype.
Example.We reconsider the data of the Copenhagen stroke study (COST, Jørgensen et al. 1994).In the COST study patients were enrolled after being admitted to a hospital with a stroke and were followed for 10 years.Recorded for each patient was the length of time from admission to death, for those who died, otherwise the length of time from admission to the maximal time where the patient was known to be alive was recorded (i.e., right censored).Table 1 provides summary information for the 13 variables collected among the 518 patients with complete case information.
For the purpose of illustration we construct three hypothetical new patients with values in the range of the predictor space defined by the COST patients and store them in a data frame called newData.These new "patients" have different ages, but do not differ otherwise: All are females, have a strokeScore of 38, a mean value of 6 for cholest, and the remaining predictor variables set to the value "no".

Predictor variables
Levels/ IQR

Stepwise variable selection in Cox regression
A Cox regression model specifies the conditional cumulative hazard function dependent on the vector of predictor variables X i = (X 1 i , . . ., X K i ): Here Λ 0 is an unspecified increasing function, referred to as the cumulative baseline hazard and β = (β 1 , . . ., β K ) ∈ R K is an unknown vector of regression coefficients.
Many different variable selection strategies can be applied within the context of Cox regression.Our approach will be to use backward stepwise variable selection (implemented using the function fastbw of the rms package, Harrell Jr. 2012) using the Akaike information criteria (AIC).We then fit a Cox model using only those variables selected in the stepwise procedure (we use cph from the rms package for the Cox regression analysis).In total, our approach is: Step 1. Run fastbw using AIC.
Step 2. Fit a Cox regression model using the predictors selected in Step 1.
Step 3. Using the estimates of β and Λ 0 obtained in Step 2, calculate for a subject with predictor values x: Note that the AIC criterion is likelihood based and closely related to the logarithmic scoring rule, which is strictly proper (Gneiting and Raftery 2007), and hence should be adequate for identifying a prediction model.However, as with any automated model selection procedure the result can be quite unstable (Austin and Tu 2004).

Random forests
A random forest is a nonparametric machine learning strategy that can be used for building a risk prediction model in survival analysis.In survival settings, the predictor is an ensemble formed by combining the results of many survival trees.The general strategy is as follows: Step 1. Draw B bootstrap samples.
Step 2. Grow a survival tree based on the data of each of the bootstrap samples b = 1, . . ., B: (a) At each tree node select a subset of the predictor variables.
(b) Among all binary splits defined by the predictor variables selected in (a), find the best split into two subsets (the daughter nodes) according to a suitable criterion for right censored data, like the log-rank test.
(c) Repeat (a)-(b) recursively on each daughter node until a stopping criterion is met.
Step 3. Aggregate information from the terminal nodes (nodes with no further split) from the B survival trees to obtain a risk prediction ensemble.
In what follows we consider two different implementations of random forests from the two R packages: randomSurvivalForest (Ishwaran et al. 2008) and party (Hothorn et al. 2006).To describe the risk predictions for the forest ensembles, let T b denote the bth survival tree and let T b (x) denote the terminal node of subjects in the bth bootstrap sample with predictor x (each x will sit inside a unique terminal node due to the recursive construction of the tree).Note that when bootstrap samples are drawn with replacement then subjects from the original data set may occur multiple times.Let c ib be the number of times subject i occurs in the bth bootstrap sample.Note that c ib = 0 if the ith subject is not included in the bth bootstrap sample.Using the counting process notation (Andersen, Borgan, Gill, and Keiding 1993) Ñi (s) = I( Ti ≤ s, ∆ i = 1); Ỹi (s) = I( Ti > s), we have Thus, in the terminal node corresponding to covariate value x, Ñ * b (s, x) counts the uncensored events until time s and Ỹ * b (s, x) is the number at risk at time s.

Random survival forests (randomSurvivalForest)
In random survival forests (Ishwaran et al. 2008), the ensemble is constructed by aggregating tree-based Nelson-Aalen estimators.Specifically, in each terminal node of a tree, the conditional cumulative hazard function is estimated using the Nelson-Aalen using the "in-bag" data (i.e., subjects in the bootstrap sample): .
The ensemble survival function from random survival forest is Ĥb (t|x) . (1)

Conditional inference forests (party)
The conditional inference forest for survival analysis (Hothorn, Lausen, Benner, and Radespiel-Tröger 2004) utilizes a weighted Kaplan-Meier estimate based on all subjects from the training data at x for prediction.Its ensemble survival function is . (2) If the underlying survival function is continuous, then this is asymptotically equivalent to .
Comparison to equation (1) shows that the way trees are aggregated in conditional inference forests is different from the random survival forest approach.Conditional inference forests put more weight on terminal nodes where there is a large number of subjects at risk: .
In contrast, random survival forests uses equal weights on all terminal nodes.It is difficult to say which formula may be better in general, however.

Extracting predicted survival probabilities
The S3-methods predictSurvProb.We now explain how we have extended the package pec to work with R objects of classes fastbw (rms), rsf (randomSurvivalForest), and cforest (party).Section 3.4 then shows some details for writing new extensions.

Selected Cox regression
The following function selectCox evaluates Step 1 and Step 2 the stepwise variable selection strategy for the Cox regression model described in Section 2.2.
selectCox <-function(formula, data, rule = "aic") { require("rms") require("prodlim") fit <-cph(formula, data, surv = TRUE) bwfit <-fastbw(fit, rule = rule) if (length(bwfit$names.kept The function cph from rms is used to fit a Cox regression model using the selected predictor variables, with one exception: If the set of selected predictor variables in Step 1 is empty, then the Kaplan-Meier method is applied to predict survival via the function prodlim (prodlim).The resulting R object is assigned to the S3 class selectCox.

randomSurvivalForest package: rsf
A random survival forest model is fitted with the function rsf (randomSurvivalForest) which results in an object of S3 class rsf.Using the built-in predict.rsfmethod we extract the averaged cumulative hazard function for each line in newdata at the event times of the original data set (see Section 2.3).The survival probabilities are then computed via Equation 1 and with the help of the function sindex (prodlim) these are evaluated at the requested times.

party package: cforest
A conditional inference forest model (Section 2.3) is fitted with the function cforest (party) and results in an S4 class object.We get around the class problem by creating a wrapper function pecCforest.The fitted cforest S4 class object is stored in a list which is supplied with the call.The output is assigned to the S3 class pecCforest.pecCforest <-function(formula, data, ...) { require("party") out <-list(forest = cforest(formula, data, ...)) class(out) <-"pecCforest" out$call <-match.call()out } The treeresponse method (party) can be applied to the list element pecCforest$forest of the S3 class object in order to extract survival probabilities for newdata at times (see Equation 2).The resulting object is a list which contains for each line in newdata the Kaplan-Meier curve in form of a survfit object (survival).We then apply predictSurvProb.survfit to the elements of the list.predictSurvProb.pecCforest<-function (object, newdata, times, ...) { survObj <-treeresponse(object$forest, newdata = newdata) p <-do.call("rbind", lapply(survObj, function( Example (continued).We use the complete case data of the COST study that contains data from 518 patients with no missing values in any of the 13 predictor variables.
R> library("pec") R> library("survival") R> library("rms") R> library("randomSurvivalForest") R> library("party") R> data("cost") We fit a random survival forest model based on 1000 trees under default settings of the package (Ishwaran and Kogalur 2007).We also fit a conditional inference forest model via pecCforest based on 1000 trees, otherwise using the default options.Finally a selected Cox regression model is fitted via selectCox as described above.

R>
Interestingly, for all newData the conditional inference forest model predicts a lower 10 year survival chance than the random survival forest model.
We use the function plotPredictSurvProb to plot the predicted survival curves for new subjects for a given modeling strategy.It applies the predictSurvProb method to predict at all event times but other time points can be selected.The following code produces the curves in Figure 1. Figure 1 displays the survival curves for the hypothetical new patients for each of the three different methods.The three models yield similar prediction at the median age but differ for the young and old patients.For these patients, Cox regression is more extreme.This is consistent with what we saw in Table 2.

Writing new extensions
A predictSurvProb method has three required arguments: object: The fitted R object.
newdata: A data frame with the predictor variables.times: A vector of time points.
To extend the functionality of pec a function predictSurvProb.xcan be written which extracts survival probabilities from an object of class x and returns them in a matrix with as many columns as there are times and as many rows as there are lines in newdata.It is possible to pass further arguments to the predictSurvProb method via the argument model.args of the function pec.For example the function predictSurvProb.rpartuses an optional argument train.data.Note that a requirement for repeated data splitting (cross-validation) is that the R object contains its call which has to be a list in which the argument data can be modified.Note also that presently only S3 objects are supported by the functionality of pec, but that it is relatively easy to wrap S4 objects into the required form, as shown earlier for cforest.

Prediction error curves
The function pec compares the predictive performances of rival survival modeling strategies over time.As outlined in the introduction, several measures are available for assessing a model in survival analysis.Here we restrict attention to prediction error defined as the timedependent expected Brier score: Here the expectation is taken with respect to the data of a subject i which does not belong to the training set, and Y i (t) = I(T i ≥ t) is the true status of subject i and Ŝ(t|X i ) is the predicted survival probability at time t for subject i with predictor variables X i .Useful benchmark values for the Brier score are 33%, which corresponds to predicting the risk by a random number drawn from U [0, 1], and 25% which corresponds to predicting 50% risk for everyone.The most important benchmark is the expected Brier score of a prediction model which ignores all predictor variables.In survival analysis the Kaplan-Meier estimate of survival calculated with all training samples yields such a null model.

Estimation from right censored data
The function pec provides several estimates of the expected Brier score.For the estimation of this prediction error the true status is replaced by the observed status defined as Ỹi (t) = I( Ti > t) and the squared residuals are weighted using inverse probability of censoring weights (IPCW, Gerds and Schumacher 2006), given by where Ĝ(t|x) ≈ P(C i > t|X i = x) is an estimate of the conditional survival function of the censoring times.If an independent test data set DM is available, the expected Brier score is estimated by where M is the number of subjects in DM and Ŝ is based on a training data.
The weights (3) can be optionally estimated using the function ipcw by making use of the marginal Kaplan-Meier estimator (ignoring the predictor variables), a Cox regression model, or an additive Aalen regression model.Furthermore, in the case of only discrete covariates the stratified Kaplan-Meier for the censoring times can be used and in case of a single continuous covariate a nonparametric kernel type estimator.See Section 6 for more discussion on how to choose the appropriate method in practice.

Cross-validation
Several methods are implemented to deal with over-fitting in situations where only one data set is available for building the prediction models and for the estimation of prediction performance (Gerds and Schumacher 2007).Optionally the function computes one or all of the following estimates: AppErr: Apparent or re-substitution estimate.
Since this terminology which is used in the package pec can be confusing and since also the literature is not always consistent, we provide explicit formulae for all estimates below.Note that the term "bootstrap cross-validation" has been used for example in Fu, Carroll, and Wang (2005) for an estimate which is currently not implemented in pec, where models are trained in bootstrap samples and validated in the full data.Note further that the estimate termed "leave-one-out bootstrap" (see below) was defined in Efron and Tibshirani (1997) and is also not implemented in the current version of pec.The subsampling version of the bootstrap .632+estimate was proposed in Binder and Schumacher (2008).
The apparent estimate of the prediction error re-substitutes the data of the N subjects, D N , that were used to build the models as follows: The bootstrap cross-validation approach splits the data D For bootstrap without replacement (subsampling) M b is a fixed user defined number smaller than N and the same for each b, and is the size of the bootstrap samples for resampling without replacement.For bootstrap with replacement M b is the number of subjects not drawn in the bootstrap sample D b .We note that the bootstrap cross-validation estimate (with replacement) is closely related to the leave-one-out bootstrap estimate, which is given by reversing the order of summation: where K i is the number of bootstrap samples where the ith subject is left-out.As noted above the leave-one-out bootstrap estimate is currently not implemented in pec.
k-fold cross-validation is similar to bootstrap cross-validation and differs only in the way in which training and test sets are constructed.The number of training sets is fixed by k in k-fold cross-validation.The data D N are split into k subsets D j (j = 1, . . ., k) and models Ŝj are trained with the data D N \ D j where the jth subset is removed.The data in the jth D j are used as test set.The cross-validation estimate is then obtained by averaging: Typical choices for k are 5 and 10.Since the resulting estimate may depend on how the data are split into k pieces, the function pec allows to repeat k-fold cross-validation B times.For example, when the user specifies splitMethod = cv10 and B = 20 then 10-fold crossvalidation is repeated 20 times and the average of the 20 cross-validation estimates returned.
Leave-one-out cross-validation is the same as N -fold cross-validation.Models S i are trained on the data Note that the leave-one-out cross-validation estimate is not random.
The bootstrap .632estimate of the prediction error is a weighted linear combination of the apparent estimate and the bootstrap cross-validation estimate: The constant 0.632 is independent of the sample size and corresponds to the probability to draw with replacement subject i into the bootstrap sample: The bootstrap .632+estimate of the prediction error is a weighted combination of the apparent estimate, the bootstrap cross-validation estimate and the no information estimate given below (Efron and Tibshirani 1997;Gerds and Schumacher 2007).The bootstrap .632+estimate of the prediction error is given by where one defines ω = 0.632 in the special case where BootCvErr(t, Ŝ) < AppErr(t, Ŝ).
The no information estimate is needed to construct the bootstrap .632+estimate and obtained by permuting the status indicator of the subjects: See Section 6 for more discussion of the different cross-validation estimates and further references.

Integrated Brier score
The prediction error curves can be summarized with the integrated Brier score defined as where predErr refers to any method for estimating the predictive performance, and τ > 0 can be set to any value smaller than the minimum of the maximum times for which estimated prediction errors can be evaluated in each bootstrap sample.

COST study
Example (continued).We fit a pec object with the three rival prediction models fitcox, fitrsf, and fitcforest described in Section 3. The models are passed to pec as a list.
We choose splitMethod = Boot632plus and base our analysis on the complete data of the COST study with N = 518 patients.We set B = 1000 and use bootstrap without replacement (subsampling) to find training sets of size M = 350, and correspondingly test sets of size N -M = 168.Note that with this option, pec computes in addition to the .632+estimate, the apparent estimate, the bootstrap cross-validation estimate, and the no information estimate of the prediction error curves.The integrated Brier scores between 0 and 4215 days for the bootstrap .632+estimates of the prediction error are lowest for random survival forest.The selected Cox regression model and the conditional inference forest have approximately the same value.All three models perform substantially better than Kaplan-Meier.
The following command plots prediction error curves estimated with the bootstrap .632+method for all four models in one graph (Figure 2): All curves start at time 0 where all subjects are alive and all predictions equal to 1.The  prediction error curve of the benchmark Kaplan-Meier model reaches its maximum value, at the median survival time of 4.9 years.This value is 0.25 for prediction errors estimated with the apparent method.For the bootstrap .632+estimator, random survival forest clearly outperforms the other strategies.However, the bootstrap cross-validation estimates of the prediction error curves of all three strategies are close to each other (Figure 3) showing that The bootstrap .632+estimate is a combination of the apparent estimate, the bootstrap crossvalidation estimate, and the no information estimate, so the difference in the prediction error curves for the bootstrap .632+estimates compared to the bootstrap cross-validation estimates rely on one of the other estimates.To observe how the different estimates behave for each of the three modeling strategies we plot in Figure 4 the apparent-, the bootstrap-, and the no information estimates of the prediction error together with the bootstrap .632+estimate and 100 prediction error curves obtained during the cross-validation procedure bootstrap.These latter curves were extracted via the argument keep.matrix= TRUE in pec.The following code produces these figures: special.addprederr= c("AppErr", "BootCvErr", "NoInfErr")) + }) For both random forest approaches the apparent estimate of the prediction error curves are much lower than the bootstrap cross-validation estimate of the curves.

Discussion
The pec package offers a highly extensible methodology for prediction error curve estimation under almost any type of prediction model.In a sequence of worked out detailed examples, we showed how to incorporate random survival forests, conditional inference forest, and a specific type of backward selection algorithm for Cox regression modeling within the pec framework.We used data from the COST study for illustration, which is a data scenario involving a lowdimensional predictor space, and is a common scenario seen in the field of medical applications.We compared prediction performance of the three modeling strategies and found that the bootstrap cross-validation estimate of the prediction error was comparable across all methods.Interestingly, the .632+bootstrap estimate showed that random survival forest was best.Also, despite the similarity of the overall bootstrap cross-validation performance, we found that the three strategies yielded different predictions when evaluated at synthetically made predictor values.Efron and Tibshirani (1997) proposed the .632+estimate as an improvement on crossvalidation for the misclassification rate.The .632+ was studied for other loss functions including the Brier score (Molinaro, Simon, and Pfeiffer 2005;Jiang and Simon 2007;Wehberg and Schumacher 2004;Gerds and Schumacher 2007).Hence the .632+estimate is an attractive choice.For high-dimensional settings, Binder and Schumacher (2008) recommended a bootstrap subsampling version of the .632+estimate where the size of the training sets is set at 63.2% times the full sample size.In any case, for models like the ones obtained with random forests, which can reduce the apparent error to almost zero, the usefulness of the .632+estimate has not yet been resolved.The main results obtained for the COST study in the present article were therefore based on the bootstrap cross-validation estimate.

Estimation of weights
The question of which model to use when estimating the weights in the IPCW approach may be hard to answer in general.However, some advice can be given.If there are good reasons to believe that the censoring times C i are mutually independent of the event times T i and the covariates X i , then marginal Kaplan-Meier weights yield consistent estimates of the Brier score (Graf et al. 1999).The result may not be asymptotically efficient (van der Laan and Robins 2003; Gerds and Schumacher 2006) but the advantage is that no further modeling is needed.However, if the censoring times are only conditionally independent of the event times given the covariates then marginal Kaplan-Meier weights will introduce a bias.Instead a so-called working model could be used for the weights, i.e., as obtained by a Cox proportional hazard or an additive Aalen regression model.However, a different bias will be introduced if the working model is mispecified.This dilemma needs to be resolved in the specific application at hand.
For the comparison of prediction models it is most important that the same IPCW weights are used for all models; this is a feature of the function pec.Also, if we compare different models that are based on different subsets of predictor variables, then the working model for the censoring distribution should include and combine all the predictor variables.This is controlled by the argument formula of the function pec.
By default, the weights are estimated using all subjects when bootstrapping or cross-validation is used.However, a new release of pec has an option that allows the weights to be estimated separately in each test sample.In our (short) experience there were no great differences between the two options.

Model variability
For data-adaptive modeling strategies one would expect models selected from different training samples to be different.For example, the algorithm described in Section 2.2 could select a Cox regression model containing two predictor variables in one training sample and a model with three predictor variables in another training sample.Similarly for the random forest approaches the trees will differ across bootstrap samples.This model uncertainty is well known (see e.g., Austin and Tu 2004) and should be considered as a substantial part of the prediction error (Gerds and Van de Wiel 2011).

Other packages
There are several other R packages for comparing prediction models in survival analysis with the Brier score as an accuracy measure.The package peperr (Porzelius and Binder 2010) is an early branch of pec featuring parallel computing and separate control of complexity parameters which is of interest for high-dimensional settings.However, presently there are more predictSurvProb methods implemented in pec, and notably peperr only supports Kaplan-Meier weights for the IPCW estimate.
The package ipred (Peters and Hothorn 2008) provides functionality for estimating the model performance in classification, regression, and survival settings.In the survival context the Evaluating Random Forests for Survival Analysis Using Prediction Error Curves expected Brier score can be estimated using cross-validation, and bootstrap 632+.But the IPCW weights can only be obtained from the marginal Kaplan-Meier estimate of the censoring survival distribution.As well, only one model is assessed in one call to the function.The pec function allows one to compute the performance of a list of different modeling strategies simultaneously, which guarantees that exactly the same bootstrap samples are used for the training of all models.

Alternative assessment measures
As noted in the introduction, ROC curves are another popular method for assessing prediction performance, which can be extended to survival analysis.The package survivalROC (Heagerty 2006) offers functions to estimate time-dependent ROC curve the area under the ROC curve (AUC).The package Hmisc (Harrell Jr. 2009) provides a popular estimate of the closely related C-index which assesses the frequency of concordant pairs of subjects, where the model predicted the lower risk for the subject with the higher survival time, among all usuable pairs.An IPCW estimate of the C-index which works similar as the function pec is implemented in pec, see Gerds, Kattan, Schumacher, and Yu (2010).
The package survcomp (Haibe-Kains, Sotiriou, and Bontempi 2009) can be used to compute the Brier score, ROC curves, and the C-index.However, there are no cross-validation estimates implemented, and the package can only be used to assess the predictive performances of Cox regression models and Kaplan-Meier risk.

Further extensions
Extensions of the pec package are in the works.One planned extension is the van de Wiel test (Van de Wiel, Berkhof, and Van Wieringen 2009) for pairwise testing of the difference of the prediction error curves from rival modeling strategies at selected time points.Another is an extension to compare modeling strategies in competing risks settings.
We also plan to implement other measures of prediction performance like the time-dependent AUC and the logarithmic score in pec.Note that estimates of the concordance index are readily implemented in pec (Gerds et al. 2010).
Full support for S4 methods is planned for future versions of the package pec.
N into many bootstrap training samples D b and corresponding test samples D N \ D b (b = 1, . . ., B).Here bootstrap samples can either be drawn with or without replacement from the original data.Then, models Ŝb are trained with the bootstrap training data D b , corresponding test samples are predicted and residuals are computed.Finally the bootstrap cross-validation estimate of the prediction error is calculated by averaging over the test data sets: The estimation of the IPCW weights (3) depends on two arguments in pec: The argument cens.modelspecifying the model class, and the predictor variables specified on the right hand side of the formula used to describe the censoring model.In our example there are few censored observations, and we use the Kaplan-Meier estimator for the censoring distribution.The default option is cens.model= "cox" which reverts to Kaplan-Meier when no predictor variables are specified.The left hand side of the argument of formula (either a Surv object or a Hist object) is used to identify the survival response.The argument keep.index= TRUE controls whether or not to keep the indices of the bootstrap samples and keep.matrix= TRUE controls whether or not to keep for each model all estimates of the prediction error curves obtained in the B cross-validation steps.R> extends <-function(...) TRUE R> set.seed(13)R> library("doMC") R> registerDoMC() R> fitpec <-pec(list("selectcox" = fitcox, "rsf" = fitrsf, + "cforest" = fitcforest), data = cost, formula = Surv(time, status) ~1, + splitMethod = "Boot632plus", B = 1000, M = 350, keep.index= = Surv(time, status) ~1) $selectcox selectCox(formula = fitform, data = cost, rule = "aic") $rsf rsf(formula = fitform, data = cost, ntree = 1000, forest = TRUE) $cforest pecCforest(formula = fitform, data = cost, controls = cforest_classical(ntree = 1000)) rightCensored response of a survival model Noshows information of the three modeling strategies and of the Kaplan-Meier model, a null model added by default.

Figure 2 :
Figure 2: The bootstrap .632+estimates of the prediction error based on 1000 bootstrap samples.Both random forest approaches are based on 1000 trees per bootstrap sample.

Figure 3 :
Figure 3: The bootstrap cross-validation estimates of the prediction error based on 1000 bootstrap samples.Both random forest approaches are based on 1000 trees per bootstrap sample.

Figure 4 :
Figure 4: Each of the three panels shows four different estimates of the prediction error together with a cloud of 100 bootstrap cross-validation curves (grey lines).Both random forest approaches are based on 1000 trees per bootstrap sample.

Table 1 :
Shown are the count of the 518 complete case COST patients.For factors the count are shown for each level listed in column 2. For continuous variables are shown for the interquartile range (IQR); median [Q1; Q3].

Table 2 :
Predicted survival (in %) for newData at 10 years based on the selected Cox regression model (selectCox) and two random forest models: Random survival forest (rsf) and conditional inference forest (cforest) both based on 1000 trees.
The function extends is required to ensure that cforest yields a survival probability.This is not necessary if only functions in party are used.