A note on estimating the Cox-Snell $R^2$ from a reported C statistic (AUROC) to inform sample size calculations for developing a prediction model with a binary outcome

In 2019 we published a pair of articles in Statistics in Medicine that describe how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, or with a binary or time-to-event outcome.
As for any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a prediction model with a binary outcome, the outcome proportion and a conservative estimate for the overall fit of the developed model, as measured by the Cox-Snell $R^2$ (proportion of variance explained), must be specified. This proposal raises the question of how to identify a plausible value for $R^2$ in advance of model development. Our articles suggest researchers should identify $R^2$ from closely related models already published in their field. In this letter, we present details on how to derive $R^2$ using the reported C statistic (AUROC) for such existing prediction models with a binary outcome. The C statistic is commonly reported, and so our approach allows researchers to obtain $R^2$ for subsequent sample size calculations for new models. Stata and R code are provided, along with a small simulation study.

KEYWORDS
clinical prediction model, C statistic (AUROC), R squared, sample size

INTRODUCTION
In 2019 we published a pair of articles in Statistics in Medicine that describe how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, 1 or with a binary or time-to-event outcome. 2 These approaches have been implemented in the package pmsampsize produced for Stata and R by Ensor et al. 3 The required sample size aims to minimize model overfitting and to ensure key parameters (such as the model intercept) are estimated precisely. As for any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a logistic regression-based prediction model, the outcome proportion and a conservative estimate for the overall fit of the developed model, as measured by the Cox-Snell $R^2$ (proportion of variance explained), must be specified. 4,5 For example, to minimize overfitting when developing a logistic regression-based prediction model for a binary outcome, we showed that the sample size (number of participants, $n$) needed to achieve an expected uniform shrinkage factor of $S$ is

$$n = \frac{P}{(S - 1)\,\ln\left(1 - \frac{R^2_{CS}}{S}\right)},$$

where $P$ is the total number of parameters corresponding to the predictors to be considered for inclusion in the model, $S$ is recommended to be ≥ 0.9 (so that predictor effects need to be shrunk by ≤10%), and $R^2_{CS}$ is a conservative guess at the actual overall fit of the model after model development (ie, the adjusted Cox-Snell $R^2_{CS}$).

This proposal raises the question of how to identify a plausible value for $R^2_{CS}$ in advance of model development. In most clinical fields, previous prediction models already exist. Indeed, a new prediction model is often developed specifically to update or improve upon the performance of an existing model (eg, by adding predictors). Therefore, our articles suggest researchers should identify $R^2_{CS}$ from closely related models already published in their field, 1,2 and use it to inform the value of $R^2_{CS}$ in the sample size calculation for the development of their new model. Extraction of $R^2_{CS}$ is straightforward for prediction models with continuous outcomes, as $R^2_{CS}$ is nearly always reported. For binary and time-to-event outcomes, it is rarely reported, but our article explains how to obtain it from other reported measures, including the likelihood ratio statistic, Nagelkerke's $R^2$, and McFadden's $R^2$ (for binary outcomes), and Royston's D statistic, O'Quigley's $R^2$, Royston's $R^2$, and Royston and Sauerbrei's $R^2$ (for survival outcomes).

A widely reported performance measure is the C statistic, which measures the discrimination performance of a model, and for a binary outcome is equivalent to the area under the receiver operating characteristic curve (AUROC). For time-to-event outcomes, we also discussed how to use the approach of Jinks et al to predict Royston's D (and thus subsequently $R^2_{CS}$) from a reported C statistic from a survival model such as Cox regression. 6 However, we did not present details on how to derive $R^2_{CS}$ when only the C statistic is reported for a prediction model with a binary outcome, which is often the case. Hence, we now address this to further help researchers to implement our sample size proposal.
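The shrinkage-based sample size formula given earlier in this Introduction, $n = P / ((S - 1)\ln(1 - R^2_{CS}/S))$, can be evaluated directly once a value of $R^2_{CS}$ is in hand. The following Python sketch illustrates that single criterion only; the function name and the inputs in the example call (20 predictor parameters, $R^2_{CS}$ of 0.21) are hypothetical choices made for illustration, and this is not the pmsampsize implementation.

```python
import math


def min_n_for_shrinkage(n_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Sample size from n = P / ((S - 1) * ln(1 - R2_CS / S)), rounded up.

    n_params  : P, number of candidate predictor parameters
    r2_cs     : anticipated (conservative) Cox-Snell R-squared
    shrinkage : S, targeted expected uniform shrinkage factor (>= 0.9 recommended)
    """
    if not 0 < shrinkage < 1:
        raise ValueError("shrinkage must lie strictly between 0 and 1")
    if not 0 < r2_cs < shrinkage:
        raise ValueError("r2_cs must lie strictly between 0 and shrinkage")
    n = n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    return math.ceil(n)


# Hypothetical inputs: 20 predictor parameters and an anticipated R2_CS of 0.21.
n_required = min_n_for_shrinkage(n_params=20, r2_cs=0.21, shrinkage=0.9)
print(n_required)
```

A lower anticipated $R^2_{CS}$ gives a larger required sample size, which is why a conservative (low) value is recommended as the input.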

OBTAINING $R^2_{CS}$ FROM A REPORTED C STATISTIC FOR A PREDICTION MODEL WITH A BINARY OUTCOME
We consider the scenario where a new prediction model for a binary outcome is being developed for a particular target population. Assume that an article exists that describes the performance of a closely related model (eg, similar outcome and target population), which reports the model's C statistic but not the model's $R^2_{CS}$. We want to use the reported C statistic to estimate the unreported $R^2_{CS}$, which is needed for our sample size calculation. To do this, we proceed as follows.

First, let $p_i$ denote the existing model's predicted risk of the outcome event for an individual ($i$), conditional on their values of the predictors included in the model. We refer to $\mathrm{logit}(p_i) = LP_i$ as the linear predictor (LP) values of the existing model. Second, assume $LP_i$ is normally distributed in those with the event and also in those without the event, with different means but a common variance. Under these (potentially strong) assumptions, the difference in means of these two normal distributions is a function of the C statistic, as described by various authors elsewhere 7-10; specifically, with the common variance set to 1, the difference in means is

$$\mu = \sqrt{2}\,\Phi^{-1}(C),$$

where $C$ is the C statistic, and $\Phi^{-1}(\cdot)$ denotes the inverse of the standard normal cumulative distribution function. Third, we simulate a large dataset of $LP_i$ values based on these two normal distributions, whilst also ensuring the overall outcome proportion matches that assumed for the target population. A logistic regression model can then be fitted to this simulated data, and $R^2_{CS}$ obtained post estimation. The steps can be outlined more formally as:

i. Simulate a large dataset (eg, one million participants).
ii. Assign an outcome of $Y_i = 0$ (no event) or $Y_i = 1$ (event) based on sampling from a Bernoulli($\phi$) distribution, where $\phi$ is the outcome proportion in the article reporting the existing prediction model.
iii. Simulate $LP_i$ values for every participant assuming $LP_i \sim N(0, 1)$ in the non-events group and $LP_i \sim N(\mu, 1)$ in the events group, where $\mu = \sqrt{2}\,\Phi^{-1}(C)$.
iv. Fit a logistic regression to the simulated data; that is, $\mathrm{logit}(p_i) = \alpha + \beta LP_i$. This fitted model will have the same C statistic as specified in step (iii). The estimated values of $\alpha$ and $\beta$ ensure a perfect calibration-in-the-large (= 0) and calibration slope (= 1), respectively, in new data from the same assumed target population.
v. Obtain the $R^2_{CS}$ value for this fitted logistic regression model post estimation, for example, by using the fitstat command in Stata or the PseudoR2(model, which="CoxSnell") function in the DescTools package in R. Alternatively, it can be calculated directly using $R^2_{CS} = 1 - \exp(-LR/n)$, where $n$ is the number of simulated participants (step i) and $LR$ is the likelihood ratio statistic of the fitted logistic regression model.

The obtained $R^2_{CS}$ value can now be used in the sample size calculation for the new prediction model.
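The five steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not the Stata or R code referred to in this letter: the function names, the random seed, and the hand-written Newton-Raphson logistic fit are assumptions made to keep the example self-contained. The example call uses the C statistic (0.81) and outcome proportion (0.77) from the applied example in this letter, for which an $R^2_{CS}$ of 0.21 is reported, and a smaller simulated dataset than the suggested one million purely to keep pure-Python runtime short.

```python
import math
import random
from statistics import NormalDist


def fit_logistic(lp, y, max_iter=25, tol=1e-8):
    """Fit logit(p) = alpha + beta * LP by Newton-Raphson; return (alpha, beta)."""
    p0 = sum(y) / len(y)
    alpha, beta = math.log(p0 / (1 - p0)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for alpha
            g1 += (yi - p) * x      # score for beta
            h00 += w                # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta


def r2_cs_from_c(c_stat, prevalence, n=1_000_000, seed=2021):
    """Approximate the Cox-Snell R-squared from a C statistic and outcome proportion."""
    rng = random.Random(seed)                                   # seed is an arbitrary choice
    mu = math.sqrt(2) * NormalDist().inv_cdf(c_stat)            # difference in means
    # Steps i-ii: n participants with Bernoulli(prevalence) outcomes.
    y = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    # Step iii: LP ~ N(0, 1) in non-events and N(mu, 1) in events.
    lp = [rng.gauss(mu if yi else 0.0, 1.0) for yi in y]
    # Step iv: logistic regression of the outcome on LP.
    alpha, beta = fit_logistic(lp, y)
    # Step v: likelihood ratio statistic, then R2_CS = 1 - exp(-LR / n).
    ll_model = sum(
        yi * (alpha + beta * x) - math.log1p(math.exp(alpha + beta * x))
        for x, yi in zip(lp, y)
    )
    events = sum(y)
    p0 = events / n
    ll_null = events * math.log(p0) + (n - events) * math.log(1 - p0)
    lr = 2 * (ll_model - ll_null)
    return 1 - math.exp(-lr / n)


# Inputs from the applied example; n reduced from one million for speed.
r2 = r2_cs_from_c(c_stat=0.81, prevalence=0.77, n=200_000)
print(round(r2, 2))
```

Because the result comes from simulated data, it varies slightly with the seed and the number of simulated participants; a larger dataset reduces that variation.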
Stata and R code are provided in the appendix to implement the approach, and we plan to embed it within the pmsampsize package. Note that, as discussed in our articles, 2,11 the value of $R^2_{CS}$ depends on the outcome proportion in the target population. Therefore, if the outcome proportion is anticipated to be lower than that reported by the article of the existing model (eg, perhaps because outcomes have since improved), then this lower value could be used in step (ii) (and subsequent sample size calculations) instead.
Where there are a few options for the choice of C statistic (eg, based on multiple validation studies of a previous model), we recommend taking the lowest value, as this is conservative (ie, leads to larger required sample sizes for the new model development study). When using the C statistic reported from a model development study, ideally the C statistic should be adjusted for optimism due to any overfitting. For example, this could be the C statistic after a penalized regression approach has been used; the C statistic after optimism-adjustment based on results from bootstrapping 12; or the C statistic estimated in any independent validation (test) datasets.

A SIMULATION STUDY TO INVESTIGATE THE FIVE-STEP PROCESS WHEN THE ASSUMPTIONS ARE POTENTIALLY INCORRECT
Our five-step approach makes strong assumptions of normality of the existing model's LP, with a common variance for both events and non-events groups. These assumptions are a practical compromise, to help researchers elicit an approximate value for $R^2_{CS}$ in situations where only a C statistic is reported, so that they can apply our sample size proposal. Further research might investigate whether the approach gives a good approximation in other situations where the assumptions are invalid. For example, in Figure 1 we show the accuracy of the $R^2_{CS}$ estimate from our five-step process, compared with the actual value (ie, that which would have been observed but is unreported), when the overall $LP_i$ distribution is normal, but the $LP_i$ distributions in the events and non-events groups may not be normal with a common variance. We generated 100 different $LP_i$ distributions (ie, 100 different true prediction models) with $LP_i \sim N(\mu, \sigma^2)$, $\mu \sim \text{uniform}(0, 5)$ and $\sigma \sim \text{uniform}(0.5, 3)$, to cover a range of true C statistic values from about 0.63 to 0.94, corresponding to true $R^2_{CS}$ values of about 0.002 to 0.49. Reassuringly, there is still close agreement between the estimated and actual $R^2_{CS}$ in most scenarios (Figure 1), even though the assumptions made in the five-step process are not necessarily correct.
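One scenario of the kind described above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code behind Figure 1: outcomes are assumed to be generated as Bernoulli draws with probability expit($LP_i$) (ie, the LP is the true model), the LP mean and SD (both 1), the sample size, the seed, and all function names are hypothetical choices, and the logistic fit is a hand-written Newton-Raphson routine so that only the Python standard library is needed.

```python
import math
import random
from statistics import NormalDist


def cox_snell_r2(lp, y, max_iter=25, tol=1e-8):
    """Fit logit(p) = a + b * LP by Newton-Raphson and return 1 - exp(-LR / n)."""
    n, events = len(y), sum(y)
    p0 = events / n
    a, b = math.log(p0 / (1 - p0)), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da, db = (h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    ll_model = sum(yi * (a + b * x) - math.log1p(math.exp(a + b * x))
                   for x, yi in zip(lp, y))
    ll_null = events * math.log(p0) + (n - events) * math.log(1 - p0)
    return 1 - math.exp(-2 * (ll_model - ll_null) / n)


def c_statistic(lp, y):
    """Rank-based (Mann-Whitney) C statistic; ties are ignored for continuous LP."""
    order = sorted(range(len(y)), key=lp.__getitem__)
    n1 = sum(y)
    n0 = len(y) - n1
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if y[i] == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def one_scenario(lp_mean, lp_sd, n=50_000, seed=7):
    rng = random.Random(seed)
    # "True" model: LP ~ N(lp_mean, lp_sd^2), outcome ~ Bernoulli(expit(LP)) (assumed).
    lp = [rng.gauss(lp_mean, lp_sd) for _ in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(-v)) else 0 for v in lp]
    actual = cox_snell_r2(lp, y)          # value that would go unreported
    c = c_statistic(lp, y)                # value that would be reported
    prevalence = sum(y) / n
    # Five-step process applied to the "reported" C statistic and outcome proportion.
    mu = math.sqrt(2) * NormalDist().inv_cdf(c)
    y_sim = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    lp_sim = [rng.gauss(mu if yi else 0.0, 1.0) for yi in y_sim]
    estimated = cox_snell_r2(lp_sim, y_sim)
    return c, actual, estimated


c, actual, estimated = one_scenario(lp_mean=1.0, lp_sd=1.0)
print(f"C = {c:.3f}, actual R2_CS = {actual:.3f}, estimated R2_CS = {estimated:.3f}")
```

Repeating this over many draws of the LP mean and SD, and plotting estimated against actual values, would give a comparison in the spirit of Figure 1.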

APPLIED EXAMPLE
Thangaratinam et al 13 developed a prediction model for calculating the risk of an adverse maternal outcome by discharge, in women with early onset preeclampsia in the context of current care. Upon external validation in the target population, the reported C statistic was 0.81. The $R^2_{CS}$ was not provided, and so we applied the five-step procedure described in the previous section, assuming a C statistic of 0.81 and an outcome proportion of 0.77 as reported in the validation study. This gave an $R^2_{CS}$ of 0.21.
FIGURE 1 Agreement between the estimated $R^2_{CS}$ value (estimated from the reported C statistic estimate using our five-step procedure) and the actual $R^2_{CS}$ (ie, that which would have been observed but is unreported) in 100 prediction model scenarios corresponding to $LP_i \sim N(\mu, \sigma^2)$ and $\mu \sim$ uniform(0, 5) and $\sigma \sim$ uniform(0.

CONCLUDING REMARK
We have shown how to derive an estimate of the Cox-Snell $R^2$ from a reported C statistic of a prediction model for a binary outcome. As C statistics (or equivalently AUROCs) are commonly reported for existing prediction models of binary outcomes, our approach allows researchers to quickly obtain a Cox-Snell $R^2$ to use within sample size calculations when developing new prediction models in the same field.