Personalized Schedules for Surveillance of Low Risk Prostate Cancer Patients

Low risk prostate cancer patients enrolled in active surveillance (AS) programs commonly undergo biopsies on a frequent basis for examination of cancer progression. AS programs employ a fixed schedule of biopsies for all patients. Such fixed and frequent schedules, may schedule unnecessary biopsies for the patients. Since biopsies have an associated risk of complications, patients do not always comply with the schedule, which increases the risk of delayed detection of cancer progression. Motivated by the world's largest AS program, Prostate Cancer Research International Active Surveillance (PRIAS), in this paper we present personalized schedules for biopsies to counter these problems. Using joint models for time to event and longitudinal data, our methods combine information from historical prostate-specific antigen (PSA) levels and repeat biopsy results of a patient, to schedule the next biopsy. We also present methods to compare personalized schedules with existing biopsy schedules.


Introduction
Cancer screening is a widely used practice to detect cancer before symptoms appear in otherwise healthy individuals.A major issue of screening programs that has been observed in many types of cancers is overdiagnosis (Esserman et al., 2014).To avoid subsequent overtreatment, patients diagnosed with low grade cancers are commonly advised to join active surveillance (AS) programs.The goal of AS is to routinely examine the progression of cancer and avoid serious treatments such as surgery, chemotherapy, or radiotherapy as long as they are not needed.To this end, AS includes, but is not limited to periodical evaluation of biomarkers pertaining to the cancer, physical examination, medical imaging, and biopsy.
In this paper we focus on AS programs for prostate cancer (PCa), wherein the decision to exit AS and start active treatment (e.g., operation, chemotherapy) is typically based on invasive examinations, such as biopsies (Bokhorst et al., 2016).Biopsies can be reliable, but they are also painful, and have an associated risk of complications such as urinary retention, hematuria and sepsis (Loeb et al., 2013).Because of this reason the schedule of biopsies has significant medical consequences for patients.A frequent schedule of biopsies may help detecting PCa progression earlier but the corresponding risk of complications will be high.Although such a schedule may work well for patients with faster progressing cancer, for slowly progressing PCa patients many unnecessary biopsies may be scheduled.Furthermore, patients do not always comply with such a schedule, as has been observed by Bokhorst et al. (2015).In the specific case of the world's largest AS program, Prostate Cancer Research International Active Surveillance (PRIAS) (Bokhorst et al., 2016), the compliance rate for biopsies steadily decreased from 81% at year one of follow up, to 60% at year four, 53% at year seven and 33% at year ten.Such non-compliance can lead to delayed detection of PCa progression, which may reduce the effectiveness of AS programs.This paper is motivated by the need to reduce the medical burden of repeat biopsies while simultaneously avoiding late detection of PCa progression.For the latter purpose, several AS programs employ a fixed annual schedule (biopsies with a gap of one year) of biopsies (Tosoian et al., 2011;Welty et al., 2015).However, given the medical burden of biopsies, most AS programs also strongly advise against scheduling biopsies more frequently than the annual schedule.The PRIAS schedule for biopsies is relatively lenient: one biopsy each is scheduled at year one of follow up, year four, year seven, year ten, and every five years thereafter.However, PRIAS also switches to the annual schedule if a patient's prostate-specific antigen (PSA) doubling time, also known as PSA-DT and measured as the inverse of the slope of the regression line through the base two logarithm of PSA values, is less than 10 years.We intend to improve upon such fixed schedules by creating personalized schedules for biopsies.That is, a different schedule for every patient utilizing their periodically measured serum PSA levels (measured in ng/mL) and repeat biopsy results.Biopsies are graded using the Gleason score, which takes an integer value between 6 and 10, with 10 corresponding to the most serious state of the cancer.Patients enter AS only if their Gleason score is 6.When the Gleason score becomes greater than 6, also known as Gleason reclassification (referred to as GR hereafter), patients are often advised to switch from AS to active treatment.Hence, for AS programs it is of prime interest to detect GR soonest with least number of biopsies possible.
Personalized schedules for screening have received much interest in the literature, especially in the medical decision making context.For diabetic retinopathy, cost optimized personalized schedules based on Markov models have been developed by Bebu and Lachin (2017).For breast cancer, personalized mammography screening policy based on the prior screening history and personal risk characteristics of women, using partially observable Markov decision process (MDP) models have been proposed by Ayer, Alagoz, and Stout (2012).MDP models have also been used to develop personalized screening policies for cervical cancer (Akhavan-Tabatabaei, Sánchez, and Yeung, 2017) and colorectal cancer (Erenay, Alagoz, and Said, 2014).Another type of model called joint model for time to event and longitudinal data (Rizopoulos, 2012;Tsiatis and Davidian, 2004) has also been used to create personalized schedules, albeit for the measurement of longitudinal biomarkers (Rizopoulos et al., 2016).In context of PCa, Zhang et al. (2012) have used partially observable MDP models to personalize the decision of (not) deferring a biopsy to the next checkup time during the screening process.The decision is based on the baseline characteristics as well as a discretized PSA level of the patient at the current check up time.
Our work differs from the above referenced work in certain aspects.Firstly, the schedules we propose in this paper, account for the latent between-patient heterogeneity.We achieve this using joint models, which are inherently patient-specific because they utilize random effects.Secondly, joint models allow a continuous time scale and utilize the entire history of PSA levels.Lastly, instead of making a binary decision of (not) deferring a biopsy to the next pre-scheduled check up time, we schedule biopsies at a per patient optimal future time.To this end, using joint models we first obtain a full specification of the joint distribution of PSA levels and time of GR.We then use it to define a patient-specific posterior predictive distribution of the time of GR given the observed PSA measurements and repeat biopsies up to the current check up time.Using the general framework of Bayesian decision theory, we propose a set of loss functions which are minimized to find the optimal time of conducting a biopsy.These loss functions yield us two categories of personalized schedules, those based on expected time of GR and those based on the risk of GR.In addition we analyze an approach where the two types of schedules are combined.We also present methods to evaluate and compare the various schedules for biopsies.
The rest of the paper is organized as follows.Section 2 briefly covers the joint modeling framework.Section 3 details the personalized scheduling approaches we have proposed in this paper.In Section 4 we discuss methods for evaluation and selection of a schedule.In Section 5 we demonstrate the personalized schedules by employing them for the patients from the PRIAS program.Lastly, in Section 6, we present the results from a simulation study we conducted to compare personalized schedules with PRIAS and annual schedule.

Joint Model for Time to Event and Longitudinal Outcomes
We start with the definition of the joint modeling framework that will be used to fit a model to the available dataset, and then to plan biopsies for future patients.Let T * i denote the true GR time for the i-th patient enrolled in an AS program.Let S be the schedule of biopsies prescribed to this patient.The corresponding vector of time of biopsies is denoted by ik , ∀j < k}, where N S i are the total number of biopsies conducted.Because of the periodical nature of biopsy schedules, T * i cannot be observed directly and it is only known to fall in an interval l i < T * i ≤ r i , where , r i = ∞ if GR is not observed yet.Further let y i denote the n i × 1 vector of PSA levels for the i-th patient.For a sample of n patients the observed data is denoted by D n = {l i , r i , y i ; i = 1, . . ., n}.
The longitudinal outcome of interest, namely PSA level, is continuous in nature and thus to model it the joint model utilizes a linear mixed effects model (LMM) of the form: where x i (t) denotes the row vector of the design matrix for fixed effects and z i (t) denotes the same for random effects.Correspondingly the fixed effects are denoted by β and random effects by b i .
The random effects are assumed to be normally distributed with mean zero and q × q covariance matrix D. The true and unobserved PSA level at time t is denoted by m i (t).Unlike y i (t), the former is not contaminated with the measurement error ε i (t).The error is assumed to be normally distributed with mean zero and variance σ 2 , and is independent of the random effects b i .
To model the effect of PSA on hazard of GR, joint models utilize a relative risk sub-model.The hazard of GR for patient i at any time point t, denoted by h i (t), depends on a function of subject specific linear predictor m i (t) and/or the random effects: where M i (t) = {m i (v), 0 ≤ v ≤ t} denotes the history of the underlying PSA levels up to time t.The vector of baseline covariates is denoted by w i , and γ are the corresponding parameters.The function f (•) parametrized by vector α specifies the functional form of PSA levels (Brown, 2009;Rizopoulos, 2012;Rizopoulos et al., 2014;Taylor et al., 2013) that is used in the linear predictor of the relative risk model.Some functional forms relevant to the problem at hand are the following: These formulations of f (•) postulate that the hazard of GR at time t may be associated with the underlying level m i (t) of the PSA at t, or with both the level and velocity m ′ i (t) of the PSA at t. Lastly, h 0 (t) is the baseline hazard at time t, and is modeled flexibly using P-splines.More specifically: where B q (t, v) denotes the q-th basis function of a B-spline with knots v = v 1 , . . ., v Q and vector of spline coefficients γ h0 .To avoid choosing the number and position of knots in the spline, a relatively high number of knots (e.g., 15 to 20) are chosen and the corresponding B-spline regression coefficients γ h0 are penalized using a differences penalty (Eilers and Marx, 1996).
For the estimation of joint model's parameters we use a Bayesian approach.The details of the estimation method are presented in Web Appendix A of the supplementary material.

Personalized Schedules for Repeat Biopsies
Once a joint model for GR and PSA levels is obtained, the next step is to use it to create personalized schedules for biopsies.Let us assume that a personalized schedule is to be created for a new patient j, who is not present in the original sample D n of patients.Further let us assume that this patient did not have a GR at his last biopsy performed at time t, and that the PSA levels are available up to a time point s.The goal is to find the optimal time u > max(t, s) of the next biopsy.

Posterior Predictive Distribution for Time to GR
Let Y j (s) denote the history of PSA levels taken up to time s for patient j.The information from PSA history and repeat biopsies is manifested by the posterior predictive distribution g(T * j ), given by (conditioning on baseline covariates w i is dropped for notational simplicity hereafter): The distribution g(T * j ) depends on the observed longitudinal history Y j (s) of patient j via the random effects b j , and on the information from the original dataset D n via the posterior distribution of the parameters p(θ | D n ), where θ denotes the vector of all parameters.

Loss Functions
To find the time u of the next biopsy, we use principles from statistical decision theory in a Bayesian setting (Berger, 1985;Robert, 2007).More specifically, we propose to choose u by minimizing the posterior expected loss E g L(T * j , u) , where the expectation is taken with respect to g(T * j ).The former is given by: Various loss functions L(T * j , u) have been proposed in literature (Robert, 2007).The ones we utilize, and the corresponding motivations are presented next.
Given the medical burden of biopsies, ideally only one biopsy performed at the exact time of GR is sufficient.Hence, neither a time which overshoots the true GR time T * j , nor a time which undershoots is preferred.In this regard, the squared loss function L(T * j , u) = (T * j − u) 2 and the absolute loss function L(T * j , u) = T * j − u have the properties that the posterior expected loss is symmetric on both sides of T * j .Secondly, both loss functions have well known solutions available.The posterior expected loss for the squared loss function is given by: The posterior expected loss in (2) attains its minimum at u = E g (T * j ), the expected time of GR.The posterior expected loss for the absolute loss function is given by: The posterior expected loss in (3) attains its minimum at the median of g(T * j ), given by u Rizopoulos, 2011).It is given by: For ease of readability we denote π −1 j (0.5 | t, s) as median(T * j ) hereafter.Even though the mean or median time of GR may be obvious choices from a statistical perspective, from the viewpoint of doctors or patients, it could be more intuitive to make the decision for the next biopsy by placing a cutoff 1 − κ, where 0 ≤ κ ≤ 1, on the dynamic incidence/risk of GR.This approach would be successful if κ can sufficiently well differentiate between patients who will obtain GR in a given period of time, and those who will not.This approach is also useful when patients are apprehensive about delaying biopsies beyond a certain risk cutoff.Thus, a biopsy can be scheduled at a time point u such that the dynamic risk of GR is higher than a certain threshold 1 − κ, beyond u.To this end, the posterior expected loss for the following multilinear loss function can be minimized to find the optimal u: (5) where k 1 , k 2 are constants parameterizing the loss function.The posterior expected loss (Robert, 2007).The choice of the two constants k 1 and k 2 is equivalent to the choice of κ = k 1 /(k 1 + k 2 ).
In practice, for some patients we may not have sufficient information to accurately estimate their PSA profile.The resulting high variance of g(T * j ) could make using a measure of central tendency such as mean or median time of GR unreliable (i.e., overshooting the true T * j by a big margin).In such occasions, the approach based on dynamic risk of GR could be more robust.This consideration leads us to a hybrid approach, namely, to select u using dynamic risk of GR based approach when the spread of g(T * j ) is large, while using E g (T * j ) or median(T * j ) when the spread of g(T * j ) is small.What constitutes a large spread will be application-specific.In PRIAS, within the first 10 years, the maximum possible delay in detection of GR is three years.Thus we propose that if the difference between the 0.025 quantile of g(T * j ), and E g (T * j ) or median(T * j ) is more than three years then proposals based on dynamic risk of GR be used instead.

Estimation
Since there is no closed form solution available for E g (T * j ), for its estimation we utilize the following relationship between E g (T * j ) and π j (u | t, s): There is no closed form solution available for the integral in (6), and hence we approximate it using Gauss-Kronrod quadrature.We preferred this approach over Monte Carlo methods to estimate E g (T * j ) from g(T * j ), because sampling directly from g(T * j ) involved an additional step of sampling from the distribution p(T * j | T * j > t, b j , θ), as compared to the estimation of π j (u | t, s) (Rizopoulos, 2011).The former approach was thus computationally faster.
As mentioned earlier, selection of the optimal biopsy time based on E g (T * j ) alone will not be practically useful when the var g (T * j ) is large, which is given by: Since a closed form solution is not available for the variance expression, it is estimated similar to the estimation of E g (T * j ).The variance depends both on last biopsy time t and PSA history Y j (s).The impact of the observed information on variance is demonstrated in Section 5.2.
For schedules based on dynamic risk of GR, the value of κ dictates the biopsy schedule and thus its choice has important consequences.Often it may be chosen on the basis of the amount of risk that is acceptable to the patient.For example, if the maximum acceptable risk is 5%, then κ = 0.95.
In cases where κ cannot be chosen on the basis of the input of the patients, we propose to automate the choice of κ.More specifically, we propose to choose a threshold κ for which a binary classification accuracy measure (López-Ratón et al., 2014), discriminating between cases and controls, is maximized.In PRIAS, cases are patients who experience GR and the rest are controls.However, a patient can be in control group at some time t and in the cases at some future time point t + ∆t, and thus time dependent binary classification is more relevant.In joint models, a patient j is predicted to be a case if π j (t + ∆t | t, s) ≤ κ and a control if π j (t + ∆t | t, s) > κ (Rizopoulos, 2016;Rizopoulos, Molenberghs, and Lesaffre, 2017).In this work we choose the time window ∆t to be one year.This because, in AS programs at any point in time, it is of interest to identify patients who may obtain GR in the next one year from those who do not, so that they can be provided immediate attention (in exceptional cases a biopsy within an year of the last one).As for the choice of the binary classification accuracy measure, we require a measure which is in line with the goal to focus on patients whose true time of GR falls in the time window ∆t.To this end, a measure which combines both sensitivity and precision is the F 1 score.It is defined as: TPR(t, ∆t, s) PPV(t, ∆t, s) TPR(t, ∆t, s) + PPV(t, ∆t, s) , where TPR(•) and PPV(•) denote time dependent true positive rate (sensitivity) and positive predictive value (precision), respectively.The estimation for both is similar to the estimation of AUC(t, ∆t, s) given by Rizopoulos, Molenberghs, and Lesaffre (2017).Since a high F 1 score is desired, the optimal value of κ is arg max κ F 1 (t, ∆t, s).In this work we compute the latter using a grid search approach.That is, first F 1 is computed using the available dataset over a fine grid of κ values between 0 and 1, and then κ corresponding to the highest F 1 is chosen.Furthermore, in this paper we use κ chosen only on the basis of F 1 score.

Algorithm
The aforementioned personalized schedules, schedule biopsy at a time u > max(t, s).However, if time u < T * j , then GR is not detected at u and at least one more biopsy is required at an optimal time u new > max(u, s).This process is repeated until GR is detected.To aid in medical decision making, we elucidate this process via an algorithm in Figure 1.Since AS programs strongly advise that biopsies are conducted at a gap of at least one year, when u − t < 1, the algorithm postpones u to t + 1, because it is the time nearest to u, at which the one year gap condition is satisfied.

Evaluation of Schedules
Given a particular schedule S of biopsies, our next goal is to evaluate the schedule and to compare it with other schedules.To this end, we first present the methods to evaluate the biopsy schedules and then discuss the choice of a schedule.
We evaluate a schedule S using two criteria, namely the number of biopsies N S j ≥ 1 a schedule conducts for the j-th patient to detect GR, and the offset O S j ≥ 0 by which it overshoots the true GR time T * j .The offset O S j is defined as O S j = T S jN S j − T * j , where T S jN S j ≥ T * j is the time at which GR is detected.Our interest lies in the joint distribution p(N S j , O S j ) of the number of biopsies and the offset.Given the medical burden of biopsies, ideally only one biopsy with zero offset should be conducted.Hence, realistically we should select a schedule with a low mean number of biopsies E(N S j ) as well a low mean offset E(O S j ).It is also desired that a schedule has low variance of the number of biopsies var(N S j ), as well as low variance of the offset var(O S j ), so that the schedule works similarly for most patients.Quantiles of p(N S j ) may also be of interest.For example, it may be desired that a schedule detects GR with at most two biopsies in 95% of the patients.

Choosing a Schedule
Given the multiple criteria for evaluation of a schedule, the next step is to use them to select a schedule.Using principles from compound optimal designs (Läuter, 1976) we propose to choose a schedule S which minimizes a loss function of the following form: where R r (•) is an evaluation criteria based on either the number of biopsies or the offset (for brevity of notation, only N S j is used in the equation above).Some examples of R r (•) are mean, median, variance and quantile function.Constants η 1 , . . ., η R , where 0 ≤ η r ≤ 1 and R r=1 η r = 1, are weights to differentially weigh-in the contribution of each of the R criteria.An example loss function is: Enter Active Surveillance.
1. Measure PSA and Gleason at induction.
(2) Set u = new optimal u. u ≤ u pv Set u = u pv .
Set u pv = u.
Set u = t + 1. Conduct biopsy at u.The choice of η 1 and η 2 is not easy, because biopsies have associated medical risks and consequently the cost of an extra biopsy cannot be quantified or compared to a unit increase in offset easily.
To obviate this problem we utilize the equivalence between compound and constrained optimal designs (Cook and Wong, 1994).More specifically, it can be shown that for any η 1 and η 2 there exists a constant C > 0 for which minimization of loss function in ( 9) is equivalent to minimization of the loss function subject to the constraint that E(O S j ) < C. That is, a suitable schedule is the one which conducts the least number of biopsies to detect GR while simultaneously guaranteeing an offset less than C. The choice of C now can be based on the protocol of AS program.In the more generic case in (8), a schedule can be chosen by minimizing R R (•) under the constraint R r (•) < C r ; r = 1, . . ., R − 1.

Demonstration of Personalized Schedules
To demonstrate how the personalized schedules work, we apply them to the patients enrolled in PRIAS.To this end, we divide the PRIAS dataset into a training dataset with 5264 patients and a demonstration dataset with three patients who never experienced GR.We fit a joint model to the training dataset and then use it to create personalized schedules for patients in demonstration dataset.We fit the joint model using the R package JMbayes (Rizopoulos, 2016), which uses the Bayesian methodology to estimate the model parameters.

Fitting the Joint Model to PRIAS Dataset
The training dataset contains age at the time of induction in PRIAS, PSA levels and the time interval in which GR is detected, for 5264 prostate cancer patients.PSA was measured at every three months for the first two years and every six months thereafter.To detect GR, biopsies were conducted as per the PRIAS schedule (Section 1).For the longitudinal analysis of PSA we use log 2 PSA measurements instead of the raw data (Nieboer et al., 2015).The longitudinal sub-model of the joint model we fit is given by: where B k (t, K) denotes the k-th basis function of a B-spline with three internal knots at K = {0.1,0.5, 4} years, and boundary knots at zero and seven years.The spline for the random effects consists of one internal knot at 0.1 years and boundary knots at zero and seven years.The choice of knots was based on exploratory analysis as well as on model selection criteria AIC and BIC.Age of patients was median centered to avoid numerical instabilities during parameter estimation.For the relative risk sub-model the hazard function we fit is given by: where α 1 and α 2 are measures of strength of the association between hazard of GR and log 2 PSA value m i (t) and log 2 PSA velocity m ′ i (t), respectively.Since the PRIAS schedule depends only on the observed PSA values (via PSA-DT), the interval censoring observed in PRIAS is independent and non informative of the underlying health of the patient.
From the joint model fitted to the PRIAS dataset we found that only log 2 PSA velocity was strongly associated with hazard of GR.For any patient, a unit increase in log 2 PSA velocity led to an 11 fold increase in the hazard of GR.The parameter estimates for the fitted joint model are presented in detail in Web Appendix C of the supplementary material.

Personalized Schedules for the First Demonstration Patient
Using the demonstration dataset, we next present the functioning of personalized schedules based on expected time of GR and dynamic risk of GR.The evolution of PSA, time of last biopsy and proposed biopsy times for the first demonstration patient are shown in the top panel of Figure 2. We can see the combined effect of decreasing PSA levels and a negative repeat biopsy on personalized schedules, between year three and year 4.5 for this patient.In accordance with the two negative repeat biopsies and consistently decreasing PSA, the proposed time of biopsy based on dynamic risk of GR increases from 14 years to 15 years in this period.Whereas, the proposed time of biopsy based on expected time of GR increases from 16.6 years to 18.8 years.We can also see in the bottom panel of Figure 2 that after each negative repeat biopsy, SD[T * j ] = var g (T * j ) decreases sharply.Thus, if the expected time of GR based approach is used, then the offset O S j will be smaller on average for biopsies scheduled after the second repeat biopsy than those scheduled after the first repeat biopsy.
The demonstration of personalized schedules for the two other patients from the demonstration data set is presented in Web Appendix D of the supplementary material.

Simulation Study
The application of personalized schedules for patients from PRIAS demonstrated that these schedules adapt according to the historical data of each patient.However we could not perform a full scale comparison between personalized and PRIAS schedule, because the true time of GR was not known for the PRIAS patients.To this end, we conducted a simulation study comparing personalized schedules with PRIAS and annual schedule, whose details are presented next.

Simulation Setup
First we assume a population of patients enrolled in AS, with the same entrance criteria as that of PRIAS.The PSA and hazard of GR for patients from this population follow a joint model of the form postulated in Section 5.1, with parameters equal to the posterior mean of parameters estimated from the joint model fitted to PRIAS dataset (Web Appendix C of the supplementary material).We further assume that there are three equal sized subgroups G 1 , G 2 and G 3 of patients in the population, differing in the baseline hazard of GR.This was done because we wanted to test the performance of different schedules for a population with a mixture of patients, namely those with faster progressing PCa, as well as those with slowly progressing PCa.For the three subgroups we use a Weibull distributed baseline hazard with the following shape and scale parameters (k, λ): (1.5, 4), (3, 5) and (4.5, 6) for G 1 , G 2 and G 3 , respectively.The effect of these parameters is that the mean GR time is lowest in G 1 (faster progressing PCa) and highest in G 3 (slowly progressing PCa).
From this population we have sampled 500 datasets with 1000 patients each.Patients are randomly assigned to a subgroup.Further, each dataset is split into a training (750 patients) and a test (250 patients) part.The k-th simulated training dataset D k is given by D k = {l ki , r ki , y ki ; i = 1, . . ., 750}, where y ki denote the PSA measurements for the i-th patient in D k .The frequency of PSA measurements is same as that in PRIAS.Other than simulating a true GR time T * ki , we also generate a random and non-informative censoring time C ki .When T * ki < C ki , then l ki = r ki = T * ki , otherwise l ki = C ki and r ki = ∞.For the test patients, censoring time is not generated.
We next fit a joint model of the specification given in ( 10) and ( 11) to each of the D k , k = 1, . . ., 500, and obtain a MCMC sample from the posterior distribution p(θ | D k ).We then obtain g(T * kl ) for each of the l-th test patient of the k-th data set and conduct hypothetical biopsies for him.For every patient we conduct biopsies using the following six types of schedules (abbreviated names in parenthesis): personalized schedules based on expected time of GR (Exp.GR time) and median time of GR (Med.GR time), personalized schedules based on dynamic risk of GR (Dyn.risk GR), a hybrid approach between median time of GR and dynamic risk of GR (Hybrid), PRIAS schedule and annual schedule.The biopsies are conducted iteratively in accordance with the algorithm in Figure 1.
To compare the aforementioned schedules we require estimates of the various criteria based on offset and number of biopsies conducted to detect GR (Section 4).To this end, we compute pooled estimates of each of the E(N S j ), var(N S j ), E(O S j ) and var(O S j ), as below: , where n k denotes the number of test patients, is the estimated variance of the offset for the k-th simulation.The estimates for number of biopsies are obtained similarly.

Results
The pooled estimates of the aforementioned criteria are summarized in Table 1.In addition, mean offset is plotted against mean number of biopsies conducted to detect GR in Figure 3. From the figure it is evident that across the schedules there is an inverse relationship between E(N S j ) and E(O S j ).For example, the annual schedule conducts on average 5.2 biopsies to detect GR, which is the highest among all schedules, however it has the least average offset of 6 months as well.On the other hand the schedule based on expected time of GR conducts only 1.9 biopsies on average to detect GR, the least among all schedules, but it also has the highest average offset of 15 months.The schedule based on median time of GR performs similar to that based on expected time of GR.Since the annual schedule attempts to contain the offset within an year it has the least SD(O S j ) = var(O S j ).However to achieve so, it conducts a wide range of number of biopsies from patient to patient, i.e., highest SD(N S j ) = var(N S j ).Schedules based on expected and median time of GR perform the opposite of annual schedule in terms of SD(N S j ) and SD(O S j ).The PRIAS schedule conducts only 0.3 biopsies less than the annual schedule, but with a higher variance of offset, it does not guarantee early detection for everyone.If we compare the PRIAS schedule with dynamic risk of GR based schedule, we can see that the latter performs slightly better than PRIAS schedule in all four criteria.The hybrid approach combines the benefits of methods with low E(N S j ) and SD(N S j ), and methods with low E(O S j ) and SD(O S j ).It conducts 1.5 biopsies less than the annual schedule on average and with a E(O S j ) of 9.7 months it detects GR within an year since its occurrence.Moreover, it has both SD(N S j ) and SD(O S j ) comparable to PRIAS.
The performance of each schedule differs for the three subgroups G 1 , G 2 and G 3 .The annual schedule remains the most consistent across subgroups in terms of the offset, but it conducts 2 extra biopsies for subgroup G 3 (slowly progressing PCa) than G 1 (faster progressing PCa).The performance of schedule based on expected time of GR is the most consistent in terms of number of biopsies but it detects GR an year later on average in subgroup G 1 than G 3 .For the dynamic risk of GR based schedule and the hybrid schedule the dynamics are similar to that of the annual schedule.Unlike the latter two schedules, the PRIAS schedule not only conducts more biopsies in G 3 than G 1 but also detects GR later in G 3 than G 1 .
The choice of a suitable schedule using (8) depends on the chosen criteria for evaluation of schedules.For example, the schedule based on dynamic risk of GR is suitable if on average the least number of biopsies are to be conducted to detect GR, while simultaneously making sure that at least 90% of the patients have an average offset less than one year (Figure 4 and 5).The schedule based on expected time of GR is suitable if on average the least number of biopsies are to be conducted to detect GR, while simultaneously making sure that at least 90% of the patients have an average offset less than three years.If a stricter cutoff is required on offset the hybrid approach may be suitable, since it conducts only 3.8 biopsies on average while guaranteeing an offset of two years for 95% of the patients and three years for 99.9% of the patients.Besides if further cutoffs are required on variance of number of biopsies or offset they are not too high either for the hybrid approach.

Discussion
In this paper we presented personalized schedules for surveillance of PCa patients.The problem at hand was the following: Low risk PCa patients enrolled in AS have to undergo repeat biopsies on a frequent basis for examination of PCa progression, and since biopsies have an associated risk of complications, not all patients comply with the schedule of biopsies.This may reduce the effectiveness of AS programs because detection of PCa progression is delayed.To approach these problems we proposed personalized schedules based on joint models for time to event and longitudinal data.At any given point in time, the proposed personalized schedules utilize a patient's information from historical PSA measurements and repeat biopsies conducted up to that time.We proposed two different classes of personalized schedules, namely schedules based on expected and median time of GR of a patient, and schedules based on dynamic risk of GR.In addition we also proposed a combination (hybrid approach) of these two approaches, which is useful in scenarios where variance of time of GR for a patient is high.We then proposed criteria for evaluation of various schedules and a method to select a suitable schedule.
We demonstrated using the PRIAS dataset that the personalized schedules adjust the time of biopsy on the basis of results from historical PSA measurements and repeat biopsies, even when the two are not in concordance with each other (Web Appendix D).Secondly, we conducted a simulation study to compare various schedules.We observed that the schedules based on expected and median time of GR conduct only two biopsies on average to detect GR, which is promising compared to PRIAS (4.9 biopsies) and annual schedule (5.2 biopsies).We also observed that the performance of the schedules depends on the true GR time of the patient.For example, in simulated patients who have a slowly progressing PCa (subgroup G 3 ), personalized schedule based on expected time of GR detects GR one year earlier on average compared to patients who have a faster progressing PCa (subgroup G 1 ), while conducting approximately the same number of biopsies for both subgroups.For subgroup G 1 , the annual or PRIAS schedule may be preferred because they detect GR at 6 and 7.4 months since its occurrence, respectively.However for slowly progressing PCa patients up to 6 biopsies were needed to detect GR at 6 and 8 months, respectively.In such scenarios, that is, where it is not known in advance if the patient will have a faster or slower progression of PCa, the hybrid approach provides an interesting alternative.This because, it conducts one biopsy less than the annual schedule in faster progressing PCa patients while detecting GR at 10.3 months since its occurrence on average.Whereas, for slowly progressing PCa patients it conducts two biopsies less than the annual schedule while detecting GR at 8.6 months since its occurrence on average.
While each of the personalized schedules have their own advantages and disadvantages, they also offer multiple choices to the AS programs to choose one as per their requirements, instead of choosing a common fixed schedule for all patients.In this regard, we proposed to choose schedules which conduct as less biopsies as possible while restricting the offset to be below a AS program specific threshold, or vice versa.There is also potential to develop more personalized schedules.For example, using loss functions which asymmetrically penalize overshooting/undershooting the target GR time can be interesting.Furthermore, for dynamic risk of GR based schedules, it is also possible to choose κ using other binary classification accuracy measures or as per patient's wish (Web Appendix E).Although in this work we assumed that the time of GR was interval censored, in reality the Gleason scores are susceptible to inter-observer variation (Carlson et al., 1998).Models and schedules which account for error in measurement of time of GR will be interesting to investigate further.Lastly, there is potential for including diagnostic information from magnetic resonance imaging (MRI) or DRE.Unlike PSA levels, such information may not always be continuous in nature, in which case our proposed methodology can be extended by utilizing the framework of generalized linear mixed models.

Figure 1 :
Figure1: Algorithm for creating a personalized schedule for patient j.The time of the latest biopsy is denoted by t.The time of the latest available PSA measurement is denoted by s.The proposed personalized time of biopsy is denoted by u.The time at which a repeat biopsy was proposed on the last visit to the hospital is denoted by u pv .The time of the next visit for the measurement of PSA is denoted by T nv .

Figure 2 :
Figure 2: Top panel: Evolution of PSA, history of repeat biopsies and corresponding personalized schedules for the first demonstration patient.Bottom Panel: History of repeat biopsies and SD g (T * j ) = var g (T * j ) over time for the first demonstration patient.

Figure 3 :
Figure3: Estimated mean number of biopsies and mean offset (months) for the 6 schedules, using all simulated patients.

Figure 4 :Figure 5 :
Figure 4: Boxplot showing variation in number of biopsies conducted by different methods, using all simulated patients.

Table 1 :
Estimated mean and standard deviation of the number of biopsies and offset (months).