Modeling Biomarker-Informed Adaptive Design

Abstract
Adaptive clinical trial designs incorporating biomarkers have gained much attention because of their potential benefits: shorter trial duration, smaller study sizes, higher probability of trial success, enhancement of the benefit-risk relationship, and mitigation of ever-escalating development costs. In the planning of a biomarker-informed adaptive design, it is important to perform clinical trial simulations in order to understand the operating characteristics of the design. This manuscript is concerned with simulating the trial data for a biomarker-informed adaptive design that uses a biomarker/surrogate endpoint for interim treatment selection. We demonstrate, using an example of a non-small-cell lung cancer trial, that the correlation between the biomarker and the primary endpoint alone is not sufficient and that the relationship between the two endpoints must instead be modeled. We present an alternative hierarchical model for the two endpoints, study how each parameter in the new model affects the power of a biomarker-informed two-stage winner design, and propose methods for estimating the parameters. R code for applying the new methodology is provided.


Background
With the surge in advanced technology, especially in the "OMICS" space (e.g., genomics and proteomics), adaptive clinical trial designs that incorporate biomarker information have attracted significant attention.
Biomarkers are measurable biological indicators of the status of an organism in a particular health condition or disease state (Chen et al. [1]). In drug development, biomarkers can be classified into four categories: prognostic biomarkers, predictive biomarkers, pharmacodynamic biomarkers, and surrogate endpoints (BDWG [2], Lassere [3], Wang [4]).
Prognostic biomarkers identify patients with differing risks of an overall disease outcome, regardless of treatment. Predictive biomarkers predict the likelihood of a patient's response to a particular treatment. Pharmacodynamic biomarkers indicate the effect of a drug on its target in an organism; they are often used in earlier phases of drug development to demonstrate drug activity, to provide information on likely clinical benefit, and to support go/no-go decisions. A surrogate endpoint is a measure of the effect of a treatment that correlates well with a clinical endpoint (Jenkins [5], Buyse [6]). Surrogate endpoints are mostly used as biomarkers intended to substitute for clinical endpoints, allowing faster and more sensitive evaluation of treatment effects.
Many types of adaptive clinical trial designs incorporating biomarkers have been proposed and discussed, including biomarker-enrichment designs that use predictive biomarkers for interim study-population selection (Freidlin et al., Jiang et al., Freidlin et al., Zhou et al. and Lee et al.) and biomarker-informed adaptive designs that use surrogate endpoints for interim treatment selection (Todd and Stallard, Stallard, Shun et al., Di Scala and Glimm, and Friede et al.) [7-16]. Focused clinical trials using a biomarker strategy have the potential to shorten trial duration, reduce study size, increase the probability of trial success, enhance the benefit-risk relationship, and mitigate ever-escalating development costs.
In the planning of a phase III or phase II/III study that uses biomarker-informed adaptive procedures, a biomarker or set of biomarkers must be available and must have been well studied beforehand, in the phase II development stage or in other validation studies. Furthermore, as suggested in the FDA guidance on adaptive design clinical trials for drugs and biologics and by Chow and Chang [17,18], it is important to perform clinical trial simulations before conducting the study, in order to evaluate multiple trial design options and the clinical scenarios that might occur when the study is actually conducted, and to assess the operating characteristics of the design, including the sample size required for a target power.
In general, clinical trial simulations rely on a statistical model to generate the trial data. This manuscript is concerned with generating trial data for a biomarker-informed adaptive design that uses a biomarker/surrogate endpoint for interim treatment selection; that is, with the statistical model for the relationship between the biomarker and the primary endpoint.
Friede et al. proposed a simulation model based on standardized test statistics that allows the generation of biomarker-informed adaptive trials [16]; the test statistics of the trial were simulated directly instead of the trial data. To simulate individual patient data for the trial, on the other hand, the conventional statistical model is a one-level correlation model. For example, when both endpoints follow normal distributions, Shun et al. used a bivariate normal distribution to model the biomarker and the primary endpoint [14]. Wang et al. showed that the bivariate normal model, which considers only the individual-level correlation between the two endpoints, is inappropriate when little is known about how the means of the two endpoints are related [19].
Wang et al. further proposed a two-level correlation (individual-level and mean-level correlation) model to describe the relationship between biomarker and primary endpoint [19]. The two-level correlation model incorporates a new variable that describes the mean-level correlation between the two endpoints. This variable, together with its distribution, reflects the uncertainty about the mean-level relationship between the two endpoints that arises from the small sample size of historical data. It was shown that the two-level correlation model is a better choice for modeling the two endpoints.
ISSN: 2641-8681
In this manuscript, we demonstrate the necessity of considering the uncertainty about the mean-level relationship between biomarker and primary endpoint using an example of a non-small-cell lung cancer trial [20], and present an alternative hierarchical model for the relationship between biomarker and primary endpoint in Section 2. We investigate how each parameter in the hierarchical model affects the power of a biomarker-informed two-stage winner design in Section 3 and discuss methods to estimate the parameters of the hierarchical model in Section 4. Conclusions are drawn in Section 5.

A non-small-cell lung cancer trial that uses a biomarker-informed two-stage winner design
For simplicity, we present our discussion and results in the context of a biomarker-informed two-stage winner design; however, the proposed model and the conclusions drawn could be extended to other biomarker-informed adaptive designs that use a biomarker/surrogate endpoint for interim treatment selection.
The biomarker-informed two-stage winner design of Shun et al. [14] combines a phase II and a phase III study. It starts with several active treatment arms and a control arm, with a planned interim analysis on the biomarker. At the interim analysis, the active arms are ranked by their observed biomarker responses; the inferior arms are terminated, and only the most promising treatment (the "winner") is retained and carried to the end of the study together with the control arm. The final comparison between the winner arm and the control arm is performed on the primary endpoint, using data from both stages. This design has the potential to shorten the duration of drug development and can be cost-effective.
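The interim selection and final comparison described above can be sketched in Python. This is a minimal illustration that assumes the winner is the active arm with the largest mean interim biomarker response and that the final test is a simple two-sample z statistic on the pooled stage-1 and stage-2 primary-endpoint data; the function names `select_winner` and `final_z` are hypothetical, and the final test statistic used by Shun et al. may differ. Because the selection uses data that also enter the final test, a naive critical value such as 1.96 generally does not control the type I error, which is why the critical value is calibrated by simulation.

```python
import numpy as np


def select_winner(stage1_biomarker):
    """Index of the active arm with the largest mean interim biomarker response.

    stage1_biomarker: one 1-D array of biomarker values per active arm
    (control excluded). Assumes larger biomarker values are better.
    """
    means = [np.mean(x) for x in stage1_biomarker]
    return int(np.argmax(means))


def final_z(y_winner, y_control):
    """Two-sample z statistic (winner minus control) on the primary endpoint.

    y_winner should pool the stage-1 and stage-2 data of the winner arm.
    Uses sample variances, i.e. a large-sample approximation.
    """
    y_winner = np.asarray(y_winner, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    diff = y_winner.mean() - y_control.mean()
    se = np.sqrt(y_winner.var(ddof=1) / y_winner.size
                 + y_control.var(ddof=1) / y_control.size)
    return diff / se


# Toy usage with made-up numbers: arm index 1 has the best interim biomarker mean.
stage1 = [np.array([0.1, 0.2]), np.array([0.5, 0.7]), np.array([0.3, 0.4])]
w = select_winner(stage1)
z = final_z([2.0, 4.0], [0.0, 2.0])
```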
In this section, we demonstrate the necessity of considering the uncertainty about the mean-level relationship between biomarker and primary endpoint through an example of a non-small-cell lung cancer trial that uses a biomarker-informed two-stage winner design.
In cancer trials, early tumor size reduction allows early assessment of the activity of an experimental regimen, and can serve as an early biomarker for survival prediction and assist in early drug development decisions.
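Wang et al. quantified the relationship between early tumor size reduction and patient survival in non-small-cell lung cancer patients and developed a parametric model for survival times, using data from four non-small-cell lung cancer registration trials [20]. The proposed parametric survival model includes baseline tumor size (centered at 8.5 cm), ECOG status (0/1/2/3, as a categorical variable) and percentage tumor reduction from baseline at week 8 ($PTR_{8wk}$) as predictors of time to death ($T$). The regression model is
$$\log T = \alpha_0 + \alpha_1\,\mathrm{ECOG} + \alpha_2\,(\mathrm{baseline} - 8.5) + \alpha_3\,PTR_{8wk} + \varepsilon_{TD},$$
where $T$ is the time to death (days), $\alpha_0$ is the intercept, $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the slopes for ECOG, centered baseline tumor size and $PTR_{8wk}$, respectively, and $\varepsilon_{TD}$ is the residual variability, following a normal distribution with mean 0 and variance $\sigma_{TD}^2$. Wang et al. also reported the fitted model for second-line treatments, in which $\varepsilon_{TD} \sim N(0, 0.68)$. These models have been shown to have reasonably good predictive ability.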
Given the above historical information, we evaluate the performance of a non-small-cell lung cancer trial with 3 experimental treatments and a control arm that uses a biomarker-informed two-stage winner strategy. In such a design, one would expect that the more closely the biomarker and the primary endpoint are correlated, the better the design performs. Taking the individual-level correlation $\rho = \mathrm{Corr}(PTR_{8wk}, \log T)$ as the only measure of the relationship between biomarker and primary endpoint, we simulate the power of the design for different values of $\rho$. The assumptions for the design are as follows.
Assume that the expected mean survival times for patients in the 3 active treatment arms are 8, 10 and 12 months, respectively, and that the expected mean survival time for patients in the control arm is around 6 months. For simplicity, we assume that all patients enrolled in the study share the same baseline characteristics: baseline tumor size 8.5 cm and ECOG status 1.
We consider the two-stage winner design with maximum sample size N = 86 per treatment group, with the interim analysis planned at information time 0.5 (that is, the interim sample size is $n_1 = 43$ per group). The same total sample size would yield 99% power for a non-adaptive design with three active treatment arms and a control arm, with the family-wise error rate controlled at 0.05. (The sample size is chosen to ensure that the sample correlation coefficient in our simulations is not significantly different from the theoretical value.)
Two sets of measurements will be obtained, $(PTR_{8wk}, \log T)$: the percentage tumor reduction from baseline at week 8, which serves as the biomarker, and the log survival time, which is the primary endpoint. Details on how to simulate data that satisfy models (1) and (2) while preserving the correlation coefficient $\rho$ can be found in Appendix 1. The type I error rate of the trial is preserved at the 0.025 level by adjusting the critical value of the final test statistic of the design (Wang et al. [21]).
Our simulations show that even for $\rho = 0.1$ (with an average sample correlation coefficient of around 0.05), the two-stage winner design has power above 95%. This contradicts the presumption that a biomarker-informed two-stage winner design should perform better when the interim and final endpoints are more strongly correlated, and it suggests that the individual-level correlation alone is not sufficient to describe the relationship between biomarker and primary endpoint.
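To see why power stays high for small $\rho$, the one-level model can be simulated directly. The Python sketch below is an illustration only: the arm means are taken from the text, while the standard deviations `sd_x = 0.5` and `sd_y = 0.68`, the seed, the pooled two-sample z statistic and the function name `simulate_one_level` are assumptions, and in practice the critical value would be calibrated by simulation as described above. Because the arm means are fixed in this model, ranking on the biomarker at interim almost always selects the arm that is also best on the primary endpoint, whatever the value of $\rho$.

```python
import numpy as np


def simulate_one_level(rho, n1=43, n_max=86, n_sims=2000,
                       sd_x=0.5, sd_y=0.68, seed=1):
    """Simulate the two-stage winner design under the one-level model.

    Each patient's (PTR, log T) pair is bivariate normal with FIXED arm
    means, correlation rho and hypothetical SDs sd_x, sd_y.
    Returns the selected winner index and the final z statistic per trial.
    """
    rng = np.random.default_rng(seed)
    mean_x = np.array([-0.21, 0.32, 0.75])   # biomarker means, active arms
    mean_y = np.array([5.48, 5.70, 5.88])    # log-survival means, active arms
    mean_y_control = 5.42
    cov = np.array([[sd_x**2, rho * sd_x * sd_y],
                    [rho * sd_x * sd_y, sd_y**2]])
    winners = np.empty(n_sims, dtype=int)
    z = np.empty(n_sims)
    for s in range(n_sims):
        # Stage 1: n1 patients per active arm, (X, Y) jointly normal
        stage1 = [rng.multivariate_normal([mean_x[j], mean_y[j]], cov, size=n1)
                  for j in range(3)]
        w = int(np.argmax([d[:, 0].mean() for d in stage1]))
        # Stage 2: the remaining patients are enrolled on the winner arm only
        stage2 = rng.multivariate_normal([mean_x[w], mean_y[w]], cov,
                                         size=n_max - n1)
        y_w = np.concatenate([stage1[w][:, 1], stage2[:, 1]])
        y_c = rng.normal(mean_y_control, sd_y, size=n_max)  # control, both stages
        se = np.sqrt(y_w.var(ddof=1) / n_max + y_c.var(ddof=1) / n_max)
        winners[s] = w
        z[s] = (y_w.mean() - y_c.mean()) / se
    return winners, z


winners, z = simulate_one_level(rho=0.1, n_sims=500)
share_arm3 = np.mean(winners == 2)  # how often the 12-month arm is selected
```

Under these assumed SDs, `share_arm3` should be close to 1 for any value of `rho`, which mirrors the behavior reported in the text.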
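In the model we considered, with individual-level correlation alone, the mean of $\log T$ is a fixed linear function of the mean of $PTR_{8wk}$. In our context (ECOG status 1, baseline tumor size 8.5 cm), $\mu_T = 5.57 + 0.42\,\mu_{PTR}$ for the treatment groups and $\mu_T = 5.42 + 0.38\,\mu_{PTR}$ for the placebo group, where $\mu_T$ denotes the mean of $\log T$ and $\mu_{PTR}$ the mean of $PTR_{8wk}$.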
Therefore, once $\mu_T$ is specified, $\mu_{PTR}$ is a fixed number, and a larger value of $\mu_T$ corresponds to a larger value of $\mu_{PTR}$. The power of the design is high even for small values of $\rho$ because the rank order of the mean responses is always the same for the two endpoints. In practice, however, the mean responses of the two endpoints need not have the same rank order across treatment groups: when the parameters of the regression model are estimated, the estimates $\hat\alpha_0, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3$ come with sampling variance.
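The arithmetic behind this argument can be checked with a few lines of Python. The sketch uses only the treatment-group intercept (5.57) and slope (0.42) quoted above; it shows that the fixed linear map reproduces, to within rounding, the primary-endpoint means used later in the simulation study (5.48, 5.70 and 5.88) from the biomarker means (-0.21, 0.32 and 0.75), and that it preserves their rank order. The names are illustrative.

```python
import numpy as np

# Fixed mean-level mapping quoted in the text (ECOG 1, baseline 8.5 cm):
# mean log T = 5.57 + 0.42 * mean PTR for the treatment groups.
INTERCEPT, SLOPE = 5.57, 0.42


def mean_log_t(mean_ptr):
    """Mean of log T implied by a given mean PTR (treatment groups)."""
    return INTERCEPT + SLOPE * np.asarray(mean_ptr, dtype=float)


mean_ptr = np.array([-0.21, 0.32, 0.75])   # biomarker means of the 3 active arms
mu_t = mean_log_t(mean_ptr)                # approx. 5.48, 5.70, 5.88

# A positive slope makes the map strictly increasing, so both endpoints rank
# the arms identically: the best arm on PTR is always the best arm on log T.
same_order = np.array_equal(np.argsort(mean_ptr), np.argsort(mu_t))
```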
The uncertainty in the estimates $\hat\alpha_0, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3$, which corresponds to uncertainty about the mean-level relationship, should be taken into account when describing the relationship between biomarker and primary endpoint.
In other words, instead of a fixed effects model, a random effects model should be used to describe the relationship between biomarker and primary endpoint.
For the case in which both the biomarker and the primary endpoint follow normal distributions, a hierarchical (multilevel) model (MEM) was proposed as an alternative to the two-level correlation model of Wang et al. [21,22]. In this model, $X_i^{(j)}$ is the measurement of the biomarker for the $i$-th person in the $j$-th treatment group, $Y_i^{(j)}$ is the measurement of the primary endpoint, and $\rho$ and $\rho_u$ are the common correlations between biomarker and primary endpoint at the individual and mean levels, respectively.
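Because the model equations are not reproduced above, the sketch below assumes one plausible form of the MEM: the arm means $(\mu_X^{(j)}, \mu_Y^{(j)})$ are drawn from a bivariate normal centered at nominal arm means, with mean-level standard deviations $\sigma_u$ and correlation $\rho_u$, and individual observations are drawn around those arm means with standard deviations $\sigma$ and correlation $\rho$. The separate SDs for X and Y, the function names and the $\sigma_u$ values are assumptions. The hypothetical `winner_mismatch_rate` illustrates the point made in the text: without mean-level variability the arm that is best on the biomarker is always best on the primary endpoint, whereas mean-level uncertainty lets the two rankings disagree.

```python
import numpy as np


def corr_cov(sd, rho):
    """2x2 covariance matrix from standard deviations (sd_x, sd_y) and a correlation."""
    sx, sy = sd
    return np.array([[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]])


def draw_arm_means(theta_x, theta_y, sigma_u, rho_u, rng):
    """Mean level (assumed form): (mu_X, mu_Y) per arm = nominal means + N(0, Sigma_u)."""
    centers = np.column_stack([theta_x, theta_y])
    return centers + rng.multivariate_normal([0.0, 0.0], corr_cov(sigma_u, rho_u),
                                             size=len(theta_x))


def draw_patients(mu, sigma, rho, n, rng):
    """Individual level: n patients (X, Y) ~ N(mu, Sigma) around one arm's means."""
    return rng.multivariate_normal(mu, corr_cov(sigma, rho), size=n)


def winner_mismatch_rate(theta_x, theta_y, sigma_u, rho_u, n_draws=2000, seed=0):
    """Share of mean-level draws in which the best arm on X is not the best arm on Y."""
    rng = np.random.default_rng(seed)
    mismatches = 0
    for _ in range(n_draws):
        mu = draw_arm_means(theta_x, theta_y, sigma_u, rho_u, rng)
        mismatches += int(np.argmax(mu[:, 0]) != np.argmax(mu[:, 1]))
    return mismatches / n_draws


theta_x = np.array([-0.21, 0.32, 0.75])
theta_y = np.array([5.48, 5.70, 5.88])
rate_fixed = winner_mismatch_rate(theta_x, theta_y, (0.0, 0.0), 0.0)  # no mean-level noise
rate_indep = winner_mismatch_rate(theta_x, theta_y, (0.2, 0.1), 0.0)  # hypothetical sigma_u
patients = draw_patients([0.75, 5.88], (0.5, 0.68), 0.5, 10, np.random.default_rng(1))
```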

Biomarker-informed two-stage winner design using the hierarchical model
To construct clinical trial simulations for a biomarker-informed two-stage winner design using the hierarchical model, the following steps could be used. First, simulate the stage-1 data for the interim analysis based on the biomarker X and determine the winner based on the best response in X. Next, simulate the remaining data of Y in the winner arm w and N samples of Y for the placebo. Finally, test the hypothesis based on the primary endpoint Y at the final analysis, using the data of the winner arm from both stages and all the data of Y from placebo.
An R function for simulating the power of a biomarker-informed two-stage winner design with the hierarchical model can be found in Appendix 2. By specifying the null hypothesis and the worst-case scenario $\rho = \rho_u = 1$, this R function can also be used to determine the critical value of the test statistic that controls the type I error. The required sample size for the design can be obtained by invoking the same R function with the alternative hypothesis $H_a$ and the target power specified.
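The simulation steps, including calibration of the critical value under the worst case $\rho = \rho_u = 1$, can be sketched in Python. This is not the R function of Appendix 2; it assumes one plausible MEM form (arm means drawn around nominal values with mean-level SDs $\sigma_u$ and correlation $\rho_u$, patients drawn around the arm means with SDs $\sigma$ and correlation $\rho$), a pooled two-sample z statistic and a fixed placebo mean. The SDs, the seed, the alternative-hypothesis values of $\rho$ and $\rho_u$, and the function name `simulate_z` are hypothetical, so the resulting numbers need not match those reported in this paper.

```python
import numpy as np


def corr_cov(sd, rho):
    sx, sy = sd
    return np.array([[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]])


def simulate_z(theta_x, theta_y, theta_y0, sigma, rho, sigma_u, rho_u,
               n1, n_max, n_sims, rng):
    """Final z statistics (winner vs placebo) for n_sims trials (assumed MEM form)."""
    k = len(theta_x)
    cov, cov_u = corr_cov(sigma, rho), corr_cov(sigma_u, rho_u)
    z = np.empty(n_sims)
    for s in range(n_sims):
        # Mean level: draw the true arm means around their nominal values
        mu = np.column_stack([theta_x, theta_y]) + rng.multivariate_normal(
            [0.0, 0.0], cov_u, size=k)
        # Interim: n1 patients per active arm; winner = best mean biomarker X
        stage1 = [rng.multivariate_normal(mu[j], cov, size=n1) for j in range(k)]
        w = int(np.argmax([d[:, 0].mean() for d in stage1]))
        # Stage 2: remaining patients on the winner arm only
        stage2 = rng.multivariate_normal(mu[w], cov, size=n_max - n1)
        y_w = np.concatenate([stage1[w][:, 1], stage2[:, 1]])
        # Placebo: n_max values of Y around a fixed mean (an assumption)
        y_c = rng.normal(theta_y0, sigma[1], size=n_max)
        se = np.sqrt(y_w.var(ddof=1) / n_max + y_c.var(ddof=1) / n_max)
        z[s] = (y_w.mean() - y_c.mean()) / se
    return z


rng = np.random.default_rng(2024)
sigma, sigma_u = (0.5, 0.68), (0.2, 0.1)   # hypothetical SDs (individual, mean level)
n1, n_max, n_sims = 43, 86, 1000

# Critical value under H0 (equal nominal means), worst case rho = rho_u = 1
z0 = simulate_z(np.zeros(3), np.full(3, 5.42), 5.42, sigma, 1.0, sigma_u, 1.0,
                n1, n_max, n_sims, rng)
crit = np.quantile(z0, 0.975)

# Power under the arm means from the text, with rho = rho_u = 0.5 (hypothetical)
z1 = simulate_z(np.array([-0.21, 0.32, 0.75]), np.array([5.48, 5.70, 5.88]), 5.42,
                sigma, 0.5, sigma_u, 0.5, n1, n_max, n_sims, rng)
power = np.mean(z1 > crit)
```

Selecting the winner on the biomarker inflates the null distribution of the final statistic when the endpoints are strongly related, so `crit` is expected to exceed 1.96, consistent with the adjusted critical value discussed in the text.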
Simulation studies that investigate how each parameter in this hierarchical model affects the power of the design have been carried out. For the purpose of simulation, we borrow the data from the above non-small-cell lung cancer example and consider a control arm and three active arms with primary-endpoint responses of 5.42, 5.48, 5.70 and 5.88, respectively. The biomarker responses are -0.21, 0.32 and 0.75 for the three active arms, respectively.
The critical value of the final test statistic that controls the type I error at 0.025 can be obtained by simulation and equals 2.4 in our case; the simulated power is summarized in Table 1. The mean-level parameters $\rho_u$ and $\sigma_u$ affect the power substantially, whereas $\rho$ and $\sigma$ have only a mild impact on power.

Estimation of the Parameters
To estimate the parameters of the hierarchical model from historical data, maximum likelihood and Bayesian inference can be considered.
Assume that, for each treatment group $j$, a set of historical data $\{(x_i^{(j)}, y_i^{(j)}),\ i = 1, \dots, N_j\}$ is available, where $x_i^{(j)}$ is the measurement of the biomarker and $y_i^{(j)}$ is the measurement of the primary endpoint.
The maximum likelihood estimator of each parameter can be obtained by maximizing the likelihood function of the hierarchical model. However, since there is no closed-form solution for the parameters, numerical iterative methods must be applied to obtain the estimates.
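A minimal sketch of the numerical maximization, in Python with NumPy and SciPy. Because the likelihood is not written out above, it assumes a specific version of the hierarchical model: group means $(\mu_X^{(j)}, \mu_Y^{(j)})$ are i.i.d. bivariate normal around a common center $\theta$ with covariance $\Sigma_u$ (SDs $\sigma_u$, correlation $\rho_u$), and individual observations are bivariate normal around their group mean with covariance $\Sigma$ (SDs $\sigma$, correlation $\rho$). Under that assumption each group contributes, independently, its sample mean, distributed $N(\theta, \Sigma_u + \Sigma/N_j)$, and its within-group scatter matrix, distributed Wishart$(N_j - 1, \Sigma)$, which gives an exact likelihood. The log-SD/atanh-correlation parameterization, the Nelder-Mead optimizer, the function names and the simulated data are choices made for illustration.

```python
import numpy as np
from scipy import optimize


def cov2(sx, sy, r):
    return np.array([[sx**2, r * sx * sy], [r * sx * sy, sy**2]])


def unpack(p):
    """Unconstrained vector -> (theta, Sigma, Sigma_u) via log-SDs and atanh-correlations."""
    theta = p[:2]
    sigma = cov2(np.exp(p[2]), np.exp(p[3]), np.tanh(p[4]))
    sigma_u = cov2(np.exp(p[5]), np.exp(p[6]), np.tanh(p[7]))
    return theta, sigma, sigma_u


def sufficient_stats(groups):
    """Per group: size N_j, sample mean vector, within-group scatter matrix."""
    out = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        dev = g - g.mean(axis=0)
        out.append((g.shape[0], g.mean(axis=0), dev.T @ dev))
    return out


def neg_loglik(p, suff):
    """Negative log-likelihood, dropping terms that do not depend on the parameters."""
    theta, sigma, sigma_u = unpack(p)
    sign_s, logdet_s = np.linalg.slogdet(sigma)
    if sign_s <= 0:
        return np.inf
    sigma_inv = np.linalg.inv(sigma)
    nll = 0.0
    for n, xbar, scatter in suff:
        c = sigma_u + sigma / n                   # covariance of the group mean
        sign_c, logdet_c = np.linalg.slogdet(c)
        if sign_c <= 0:
            return np.inf
        d = xbar - theta
        nll += 0.5 * (d @ np.linalg.solve(c, d) + logdet_c)                # group-mean term
        nll += 0.5 * ((n - 1) * logdet_s + np.trace(sigma_inv @ scatter))  # Wishart term
    return nll


def fit_mle(groups):
    suff = sufficient_stats(groups)
    s_w = sum(s for _, _, s in suff) / sum(n - 1 for n, _, _ in suff)  # pooled within cov
    means = np.array([m for _, m, _ in suff])
    sd_w = np.sqrt(np.diag(s_w))
    p0 = np.concatenate([means.mean(axis=0), np.log(sd_w),
                         [np.arctanh(s_w[0, 1] / (sd_w[0] * sd_w[1]))],
                         np.log(means.std(axis=0, ddof=1) + 1e-3), [0.0]])
    res = optimize.minimize(neg_loglik, p0, args=(suff,), method="Nelder-Mead",
                            options={"maxiter": 5000, "maxfev": 5000, "adaptive": True})
    return unpack(res.x), res


# Hypothetical historical data: 8 groups of 200 patients from known parameters
rng = np.random.default_rng(7)
true_sigma, true_sigma_u = cov2(0.5, 0.68, 0.5), cov2(0.2, 0.1, 0.8)
groups = []
for _ in range(8):
    mu_j = np.array([0.3, 5.6]) + rng.multivariate_normal([0.0, 0.0], true_sigma_u)
    groups.append(rng.multivariate_normal(mu_j, true_sigma, size=200))
(theta_hat, sigma_hat, sigma_u_hat), res = fit_mle(groups)
```

The individual-level parameters are pinned down by the pooled within-group data, while the mean-level parameters $\sigma_u$ and $\rho_u$ are informed only by the number of historical groups, so they are estimated imprecisely when few groups are available.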
Bayesian inference is an easier option in our case when an appropriate prior distribution is chosen. The normal-inverse-Wishart distribution is a multivariate four-parameter family of continuous probability distributions; it is the conjugate prior of a multivariate normal distribution with unknown mean and covariance (Murphy et al. [23]). Since the conjugate prior for $\mu_{ind}^{(j)}$ is Gaussian and the likelihood is Gaussian, the updated distribution after observing the data is also Gaussian. Given the above, we propose to estimate the parameters in the hierarchical model as follows:
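As an illustration of the conjugate machinery referred to above (not of the proposed estimation steps themselves), the sketch below implements the standard normal-inverse-Wishart posterior update, in the form given for example by Murphy et al. [23], together with the Gaussian update for a mean when the covariance is known. The function names are hypothetical, and how the posterior quantities map onto $\rho$, $\sigma$, $\rho_u$ and $\sigma_u$ of the hierarchical model is left to the estimation procedure the text proposes.

```python
import numpy as np


def niw_posterior(x, m0, kappa0, nu0, psi0):
    """Normal-inverse-Wishart posterior for multivariate normal data x (n x p).

    Prior: mu | Sigma ~ N(m0, Sigma / kappa0), Sigma ~ IW(psi0, nu0).
    """
    x = np.asarray(x, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    n = x.shape[0]
    xbar = x.mean(axis=0)
    dev = x - xbar
    scatter = dev.T @ dev
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    m_n = (kappa0 * m0 + n * xbar) / kappa_n
    diff = (xbar - m0).reshape(-1, 1)
    psi_n = psi0 + scatter + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return m_n, kappa_n, nu_n, psi_n


def gaussian_mean_posterior(x, sigma, m0, s0):
    """Posterior N(mean, cov) of a normal mean with KNOWN covariance sigma, prior N(m0, s0)."""
    x = np.asarray(x, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    n = x.shape[0]
    prec0 = np.linalg.inv(s0)
    prec_lik = n * np.linalg.inv(sigma)
    post_cov = np.linalg.inv(prec0 + prec_lik)
    post_mean = post_cov @ (prec0 @ m0 + prec_lik @ x.mean(axis=0))
    return post_mean, post_cov


x = np.array([[1.0, 2.0], [3.0, 4.0]])
m_n, kappa_n, nu_n, psi_n = niw_posterior(x, m0=[0.0, 0.0], kappa0=1.0,
                                          nu0=4.0, psi0=np.eye(2))
sigma_post_mean = psi_n / (nu_n - x.shape[1] - 1)  # posterior mean of Sigma (needs nu_n > p + 1)
post_mean, post_cov = gaussian_mean_posterior(x, sigma=np.eye(2),
                                              m0=np.zeros(2), s0=np.eye(2))
```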

Table 1: Biomarker-informed two-stage winner design with MEM: simulated power for different values of $\rho$, $\sigma$, $\rho_u$ and $\sigma_u$