Cancer recurrence times from a branching process model

As cancer advances, cells often spread from the primary tumor to other parts of the body and form metastases. This is the main cause of cancer related mortality. Here we investigate a conceptually simple model of metastasis formation where metastatic lesions are initiated at a rate which depends on the size of the primary tumor. The evolution of each metastasis is described as an independent branching process. We assume that the primary tumor is resected at a given size and study the earliest time at which any metastasis reaches a minimal detectable size. The parameters of our model are estimated independently for breast, colorectal, headneck, lung and prostate cancers. We use these estimates to compare predictions from our model with values reported in clinical literature. For some cancer types, we find a remarkably wide range of resection sizes such that metastases are very likely to be present, but none of them are detectable. Our model predicts that only very early resections can prevent recurrence, and that small delays in the time of surgery can significantly increase the recurrence probability.


Author summary
The majority of cancer related deaths are due to the development of secondary tumors called metastases. However, the dynamics of metastases establishment and growth and their relation with the primary tumor evolution are still not clear. A standard treatment starts with the resection of the primary tumor. At this time metastases may have already formed and still be too small to be detected. The presence of only undetectable metastases poses a challenge for deciding on the follow-up therapy. These small metastases could grow to a detectable size-thus leading to a recurrence of the disease-some time after surgery. We are interested in this time until cancer relapse. We present a mathematical model of metastases formation using tools from probability theory and estimate the model parameters for five different cancer types. Our predictions for the probability of visible metastases present at surgery and the mean time to relapse when no visible metastases are found at surgery are both in agreement with clinical data. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Introduction
Metastases develop as cancer cells disseminate from a primary tumor and establish new malignant lesions in the surrounding tissue or at other sites [1]. However, the full process of metastasis formation is much more complex and many related aspects are not yet fully understood. In particular, it is still unclear whether metastases are initiated during early or late stages of carcinogenesis (see e.g. [2][3][4]). These details, however, affect the chances of a patient presenting detectable or undetectable metastases at diagnosis, which in turn influence treatment strategies and prognosis. For these reasons, different authors (see e.g. [5][6] and the references therein) have proposed mathematical models to improve our understanding of the dynamics of metastasis formation.
Metastases frequently arise in cancer patients, and their occurrence greatly diminishes the chances of effective treatment. In fact, even when a therapy is initially successful, metastases often lead to relapse and are responsible for an estimated 90% of cancer related deaths [7]. Despite this common disease progression, reliable predictions for cancer recurrence rates and times are still lacking [8].
Recently, many generalizations of the Luria-Delbrück model [9] have been employed to study specific traits of tumor evolution, such as the development of drug resistance [10][11][12][13], the role of driver mutations [14,15] and metastasis formation [5,6,16,17]. Another line of research focused on temporal features, after the first stochastic model for the time to tumor onset was proposed by Armitage and Doll in their pioneering work on carcinogenesis [18]. A few decades later authors began to investigate stochastic models of tumor latency time. In particular, these works led to mathematical descriptions of optimal schedules of cancer surveillance [19,20], cure rates [21] and cancer recurrence [22]. These models are studied in the context of survival analysis and reviewed in the excellent book of Yakovlev and Tsodikov [23].
In this paper we build a model for cancer recurrence by joining these two approaches, that is we use Luria-Delbrück type models to study cancer relapse times. In particular, we consider a deterministically growing tumor seeding metastases at a rate depending on its size [24], and model the evolution of each metastasis (or clone) as independent birth-death branching processes. A similar setup was used by Lea and Coulson to mimic mutations occurring in a growing bacterial population [25]. In our model though we interpret these mutation events (from wild-type cells to mutants) as metastasis initiation events. The distribution of mutant close sizes was studied with an exponentially growing wild-type population [26] and with more general wild-type growth function [16]. Kendall [27] also allowed the wild-type population to grow stochastically, but this extension left the mutant behavior unchanged for small initiation (mutation) rates [28,29]. Hence in this paper we model the size of the primary tumor as a deterministic function (focusing on exponential and logistic growth as examples), while allow the seeded metastases to evolve stochastically according to branching processes.
Within this framework we study the time to cancer relapse, defined as the interval between the primary onset and the first time that any of the metastases reaches a fixed detectable size. Similar characterizations are employed in the threshold models described in [22,23].
The rest of the paper is organized as follows: In Results we present our mathematical model of metastases initiation and growth, and derive an explicit formula for the probability distribution of the time to relapse. We then extend our model to include the resection of the primary tumor at a given time and distinguish between synchronous and metachronous metastases. In Discussion we report parameter estimates for five different cancer types (namely breast, colorectal, headneck, lung and prostate) and analyze the corresponding predictions yielded by our model. Quantitative results are compared with data collected from clinical literature. In Material and Methods we present details about the mathematical formulation of our model and related derivations.

Results
Our mathematical characterization of the time to cancer recurrence is based on a stochastic model of metastasis formation. We first present the fundamental assumptions and features of this model, and then use them to derive the probability distribution of the time to relapse.

Model setup
We model the number of cells in the primary tumor as a deterministic function of time n(t). The tumor initiates metastases at rate νn(t), where ν is constant. We implicitly assume that all tumor cells can metastasize at the same rate. Since we make no assumptions on n(t), one can define initiation at rate νn(t) γ to model scenarios where only a fraction of the primary tumor can metastasize, for example only the cells near its surface or close to blood vessels (see e.g. [6]). The initiated metastases are then modelled as independent branching birth-death processes [31], all with the same birth rate α and death rate β. We assume that they are supercritical, that is they have a positive net growth rate λ = α − β > 0, and consequently grow exponentially for large times [31]. Exponential growth has also been observed in clinical studies [32] for untreated human lung metastases, which supports our modelling choice.
Under these assumptions each metastasis will eventually go extinct with probability q = β/α < 1. The surviving ones instead grow unboundedly and will reach any given size [31]. Let M be a fixed number of cells representing the minimal detectable size of a cancerous lesion. We aim to describe the time to cancer recurrence, defined as the first time τ that any metastasis reaches the detectable size M.
The minimal detectable size M is typically very large, with estimates over 10 6 (see parameter estimations in Discussion). As the probability that a large supercritical population goes extinct is negligibly small, we assume that each metastasis survives indefinitely if it reaches M. Then, due to the splitting property of Poisson processes, the surviving metastases that eventually reach the detectable size are initiated as a non-homogeneous Poisson process (K t ) t�0 with rate ν(1 − q)n(t). Here K t denotes the number of metastases established by t, conditioned on survival. The expected number of established metastases at time t is thus nðsÞds and the probability that at least one is present at t is equal to Surviving metastases are initiated at times σ i ≔ inf{t � 0: K t = i} and are described by i.i.d. birth-death processes (S i (s)) s�0 , where S i (s) is the number of cells in the i-th metastasis at time s after its establishment. In particular, we have S i (0) = 1 for every i. For each of these processes we define Θ i ≔ inf{s � 0: S i (s) = M} as the time needed by the i-th established metastasis to grow to the detectable size M, counting again from its initiation. Since the processes S i (s) are independent, the hitting times Θ i are also independent and identically distributed. As shown in Material and Methods, for M large their distribution asymptotically satisfies where O ðiÞ 1 denotes the eventual survival for the i-th metastasis. Interestingly, the distribution G(t) is of a Gumbel type, which generally describes the maximum of independent random variables with exponential (right) tail. This Gumbel type has two parameters, a and b > 0, and distribution function exp À e À xÀ a b ð Þ. Hence, conditioned on survival, we asymptotically have Y i � Gumb max logMð1À qÞ l ; 1 l À � for every i.

Time to reach detectable size
Given the definitions in the previous section, we have that the i-th metastasis reaches detectable size at time τ i ≔ σ i + Θ i , measured from primary onset. Metastases are initiated at time s at rate ν(1 − q)n(s) and then reach the detectable size before t with probability G(t − s) for s � t. Hence, the thinning property of Poisson processes yields that metastases which become detectable by time t are initiated at time s at rate ν(1 − q)n(s)G(t − s). Consequently, the number of metastases detectable by t follows a Poisson process (S s ) 0�s�t with respect to time s for a fixed t. In particular, the number S t of such metastases established by t is thus a Poisson random variable with mean The relapse time is defined as the first time that any metastasis reaches the detectable size, τ ≔ min i {τ i }. Hence, τ is smaller than t if by that time at least one metastasis that becomes detectable before t is initiated, and so A sample realisation of our model, including the relapse time τ, is depicted in Fig 1. In the large detectable size M limit, the relapse time distribution converges to a simpler form (see Material and Methods) where the random variable � t is distributed as Hence for large M the relapse time decomposes as t � 1 l log M þ � t into a deterministic part which depends only on λ and M, plus random fluctuations described by � t. This decomposition leads to the estimate E½t� � 1 l log M þ C for the expected value of the relapse time, where the constant C ¼ E½� t� can be obtained from Eq 5.

Exponential population growth
Two commonly employed primary growth functions are the exponential and the logistic ones (see e.g. [33]). These are given by n(t) = e δt and nðtÞ ¼ Ke dt Kþe dt À 1 , respectively, where δ denotes the primary tumor net growth rate and K a carrying capacity. Relapse time densities for these two growth types and different initiation rates are shown in Fig 2. We observe that as ν increases, the logistic distributions converge to the exponential ones (see Material and Methods for more details). Moreover, for all our parameter estimates our model predicts the same results with these two growth types. The reason is that the metastases determining the time to relapse are initiated during the early phase of tumor evolution which is almost exponential even for a logistic growth. Therefore, from now on we will focus on exponentially growing primary tumors. Exponential growth has the additional advantage that if only a portion of primary cells can metastasize and their number is proportional to n(t) γ (say only cells close to the surface of a spherical tumor for γ = 2/3), then this would be equivalent to changing the primary net growth rate, that is using n(t) = e γδt .
Since the initiation rate ν is by far the slowest rate in our model, here we study in detail the most relevant case, that is the small ν limit for an exponentially growing tumor. The deterministic part of the relapse time remains 1 l log M, but interestingly the fluctuations � t are distributed as This Gumbel distribution describes the minimum of independent random variables with exponential (left) tail, has two parameters a and b < 0 and distribution 1 À expð À e À xÀ a b Þ. Parameter a describes a shift in the distribution, and since a * log ν, it explains the equal spacing between the densities in Fig 2 for logarithmically-spaced values of the initiation rate. Also notice that these curves are left skewed, as it is expected from the Gumbel for the minimum. The primary tumor grows according to a deterministic exponential function n(t)-depicted by the blue line. It initiates distant metastases at rate νn(t), and each of them grows as an independent branching process (only the first five are plotted). The first time τ that any of these metastases reaches a minimal detectable size M is defined as the time to cancer relapse. Also, the primary tumor is surgically removed at a given time T, when it is made of N = n(T) cells. In the realisation shown, the third established metastases (green curve) is the first to reach detectable size, and hence determines the time to cancer relapse τ. Based on clinical data (summarized in Table 1), we estimated model parameters (summarized in Table 2), and here we use those for colorectal cancer, with N = 2 × 10 11 . Note that a similar illustration for metastasis formation appears in [30].
On the other hand, the Gumbel for the maximum-which describes the fluctuations of the time to detection starting from a single initial cell-is right skewed.
For small initiation rates ν and large detectable sizes M, the mean relapse time is approximately given by where γ E � 0.5772 denotes the Euler-Mascheroni constant. As shown by Fig 3, Table 2) we find C � 250 andC � 309. The reason for this difference is that even in the small ν-large M limit, later established metastases can outrun the earlier ones in reaching M first. While the mean relapse time τ shows logarithmic increase in terms of M and δ/ν, its variance stays constant, Var(τ) = π 2 /(6δ 2 ), see Eq 15. Hence, due to the slow logarithmic growth of  Table 2), the logistic densities (dashed lines) converge to the corresponding exponential ones as the initiation rate increases. Furthermore, in the exponential case and for all the above values of ν, the densities derived from Eq 4 and their approximation obtained from Eq 6 are indistinguishable.
https://doi.org/10.1371/journal.pcbi.1007423.g002 the mean, the fluctuations of the relapse time stay relevant even for large detection sizes and small mutation rates.

Relapse time with resection
Surgery is still the most common and effective type of treatment for solid tumors, although often used in combination with other kind of therapies (see e.g. [34]). However, how the time of resection affects prognosis, and in particular the estimation of the time to relapse, is still unclear. In order to investigate this question in a theoretical framework, we now embed surgery in our model and study how it changes the distribution of the time τ to relapse. Let us assume that at a given moment after detection a primary solid tumor is surgically removed. This event can be mathematically implemented in our model by considering a resection time T such that n(t) � 0 for t � T. In particular, this implies that after T no metastases can be initiated. The number of metastases already established at resection is equal to K T , and their size distribution is given in [16]. The distribution of the time τ to relapse can then be expressed exactly as in Eq 4, however here τ is not a proper random variable. In fact, as there is a positive probability that no metastasis will ever occur (notice that from this point of view our framework can be seen as a cure model-see e.g. [35]) and in this case we set τ = 1. The distribution of the relapse time conditioned on at least one  Table 2). Symbols represent simulation results for a deterministic (diamonds) or a stochastic primary growth (circles, see Discussion at the end of the parameter estimation section), while solid lines correspond to the theory in the small ν-large M limit. On the left, each starred dot denotes the mean of 1000 simulations, while lines represent the theoretical expectation given by Eq 7. These match the simulated means well for values of ν = 10 −6 or less. On the right, the relapse time densities derived from Eq 14 yield a good approximation of the simulated data (10000 simulations per curve) for M = 100 or greater for both deterministic and stochastic primary growth.
where we used that a relapsing metastasis had to be initiated before resection, that is {τ � t} � {K T � 1}. This conditional distribution for different resection times is depicted in Fig 4. In this and following figures, the resection time is shown at the bottom of the figure, and the corresponding resection size N = e δT is shown on the top. As T ! 0 all metastases have to be initiated close to time zero, so the relapse time becomes the time to reach size M from a single cell, which has the Gumbel distribution for the maximum given by Eq 2. If we then increase the resection time, the conditional densities shift to the right by the same amount. Finally, as T ! 1 the relapse time distribution converges to the case without resection The fluctuations for the unconditional distribution follow a Gumbel type for the minimum, as per Eq 6. Hence, as time increases, the relapse time distribution turns from a right-skewed Gumbel to a left-skewed Gumbel.
Note that the densities in Fig 4 become indistinguishable from the large time limit as P(K T � 1) approaches one. The reason is that by this time metastases have probably already been initiated and one of the early established ones is likely to relapse first. This suggests that only early enough resection times change the behaviour of the model. For example in the case of colorectal cancer, according to Fig 4, only resections of tumors smaller than 10 9 cells affect the time to recurrence.
Right skewed densities are often chosen to fit probability distributions arising in survival analysis. This is due to the fact that most survival data suffer from right censoring [36], where only a lower bound is known for data points. Looking at the densities in Fig 4, though, we can see both left and right skewed distributions. While a few survival datasets are negatively skewed [37], cancer relapse times are typically right censored as a consequence of limited follow-up and patients decease before relapse (see e.g. [38]). However, our model does not take into account any of these events. Furthermore [39] recently proposed a model for the estimation of screening times for colorectal cancer based on the observation that some datasets suffer from left censoring as well.

Metastasis classification
If the resection is successful and the primary tumor is completely removed, the therapy can still fail due to the formation of metastases. For this reason, it is common practice to start looking for detectable metastases several weeks before the surgery. In this section we thus want to characterize the metastases which are detectable at a given time and those which are not.
In general, for a fixed time t, the metastasizing process (K s ) 0�s�t can be split into two independent Poisson processes (S s ) 0�s�t and (M s ) 0�s�t describing the initiation of metastases which reach size M before or after t, respectively. Following the same argument we used to derive the relapse time distribution, we obtain that In particular, we have that the events {τ > t} and {S t = 0} are equivalent. We also stress that the definitions above naturally extend to the case of a primary resection, by simply redefining n(t) to be zero after the resection time T. Now, despite an ongoing discussion on the following nomenclature (see e.g. [40]), in the rest of the paper we will call a metastasis synchronous if it reaches the detectable size M before or up to the time of resection, and metachronous otherwise (hereby the choice of notation S t and M t ). These characterizations immediately allow us to estimate the probability of some clinically relevant events. The probability of no synchronous metastases is equal to Also, under this condition, relapse is not certain: the probability that at least one metastasis was initiated given that there are no visible ones at resection is since S T and M T are independent. In next section we will study the above and related quantities in greater detail.

Discussion
In this section we compare the predictions provided by our model with clinical data collected for different cancer types. To this purpose, we first need to estimate the parameter values for each of these cancer types.

Parameter estimation
The net growth rates of the primary and metastatic tumors, δ and λ, are inferred from the corresponding tumor volume doubling times (denoted DT pt and DT m , respectively) as These times have been studied by many authors, starting from the influential papers of [32, 41, 42]. Many authors still refer to these early works, although in some case more recent estimates are available. Colorectal, breast and lung cancers are the most frequently studied. Furthermore, more papers focus on primary doubling times than on metastatic ones.
Similarly, the birth rate α is derived from the potential doubling time T pot , defined as the average time between cell divisions in the absence of cell death [43][44][45]. In this case we simply use the estimation Note that some authors (see e.g. [46]) define instead T pot as the tumor doubling time in absence of cell death. While in this paper we employ the former definition, the latter would simply yield a factor log 2 in the formula above.
As for the primary tumor size N at resection, many studies report data on the primary maximum diameter, allowing for ellipsoidal forms. However, given the relatively small tumor volume and the wide interpatient variability, we assume a spherical shape and estimate d pt from the corresponding typical range. By also assuming 10 9 cells per cm 3 , the primary size at resection (expressed in number of cells) is thus estimated as N ¼ 1 6 pd 3 pt 10 9 . Table 1 summarizes typical ranges of these quantities for five different cancer types, together with the estimates we picked for our model and the corresponding literature references. Difficulties in distinguishing between primary and secondary tumors or in tracking down the primary origin of a metastatic cancer could in principle affect some of these data, but the wide range and multiple references reported reduce the potential impact of this effect.
Notice that by estimating the rates λ and α we also infer values for the death rate β = α − λ and the extinction probability q = 1 − λ/α. For the two remaining parameters, namely the initiation rate ν and the minimal detectable size of a metastasis M, we use common estimates across different cancer types. Various studies report a lowest detectable tumor diameter of 0.2cm for different cancer types (see e.g. [90-92]), corresponding to M � 4.19 × 10 6 cells. Moreover, several papers argue that the first metastases are likely to be established long before the detection of the primary tumor (see for example [54] and the references therein). In particular, the review of the progression model for metastases formation in [24] reports that dissemination starts when the primary diameter is between 0.1 and 0.4cm. We thus consider the primary tumor size at the expected time of the first metastasis initiation and estimate it to be e dE½s 1 � ¼ 10 8 cells, corresponding to a diameter of about 0.58cm. Hence, by using the results in Material and Methods, we set  Table 2.
Before we proceed to study our model predictions, let us further discuss the assumption of a deterministic primary tumor growth function. Firstly, as we just showed, the only data we Cancer recurrence times from a branching process model found to infer the rate of growth of a primary tumor refer to doubling times, whose notion implicitly assumes an exponential growth. For this reason we focus here on growth functions that (at least in their early stages) show an exponential behaviour. Other growth functions, for example n(t) = ct 3 for spherically growing tumours or n(t) = c 0 t 2 for tumors with active cells only around the surface, could be studied when more data becomes available. Secondly, one could model not just the metastases but also the primary tumor growth as a branching process to account for further stochastic effects. However, due to the large tumor size at resection a branching process model would predict an almost perfect exponential growth around resection time. For this reason we set n(t) = e δt , which then determines the initial time t = 0. Note that the tumor is not initiated precisely at t = 0, but that time is distributed according to a Gumbel distribution, analogously to the results in the Single type process section in Materials and methods. Since the initiation time is not accessible experimentally anyway, for simplicity we use this above definition for t = 0. In order to justify the exponential deterministic approximation for the primary size, we performed simulations where we modelled the primary tumor as a branching process as well. We found that for initiation rates of ν = 10 −5 or less (and all other parameters set for colorectal cancer) the exponential approximation of the primary causes less than a few percent error in the relapse time distribution (see Fig 3). The relationship between stochastic and deterministic wild type populations has been studied rigorously in [29].

Model predictions
Now that we have estimated the parameters of our model in Table 2, we are in position to study its predictions and compare them to clinical data. Let us start by analyzing the simplest predictions of the model, which are about the presence of synchronous and metachronous metastases. Fig 5 shows the probability of initiated metastasis by resection PðK T � 1Þ ¼ 1 À e À a T (Eq 1), and the probability of visible metastasis by resection PðS T � 1Þ ¼ 1 À e À b T (Eq 4) as functions of the resection time T, for five different cancer types. that, obviously, the probability of having initiated metastasis is always higher than the probability of having visible metastasis at resection. For all five cancer types considered, one or more metastases have likely been initiated by the time the primary tumor reaches about 8.2 × 10 8 cells (diameter 1.16cm). While this value is similar across different primary types (as a consequence of the parameters estimation procedure), the results for the probability of synchronous metastases vary widely. For breast, colorectal, headneck, lung and prostate cancer, Table 3 reports primary tumor sizes at which synchronous metastases might start to appear and are likely to be present, respectively (expressed both in terms of number of cells and tumor diameter). By comparing these values to typical resection sizes in Table 1, we find that detecting metastases at resection is very likely for lung and prostate cancer and rare for headneck primary tumors.
One of the most challenging scenarios for the development of an effective treatment is when there are only undetectable metastases present. In our framework this scenario corresponds to the event which has probability (see Eqs 9 and 10) Because of the last identity, the probability of established and all metachronous metastases can be read out from Fig 5 as the difference of the two curves. There, the shaded areas highlight intervals of resection times yielding P(U T ) > 85%. These intervals, often referred to as highrisk period [94], are especially wide for breast, colorectal and headneck cancers. The reason is that these cancer types have a lower ratio of metastatic over primary net growth rates, so metastases take longer to grow to visible size. Hence, although for these cancer types metastases grow slower, which improves prognosis, they stay undetectable for longer, which poses a challenge for diagnosis. The estimated resection sizes given in Table 1 fall within or close to these ranges (P(U T ) equal to 93.87%, 79.83%, 98.35%, 66.04% and 85.85% for the five primary tumor types studied, respectively). In general, by assuming that the primary tumor diameter at resection fits a normal distribution (with mean computed as the mean of d pt and variance set so that 95% of the observations belong to its typical range, see Table 1) we estimate that resections for breast and headneck cancers fall in the high-risk window 99.8% and 99.58% of the   Table 3. For each cancer type, the shaded areas highlight resection time intervals leading to a probability higher than 85% of established and all undetectable metastases. Using the parameter estimates from Cancer recurrence times from a branching process model times respectively, followed by colorectal (24.67%), prostate (13.41%) and lung (0.69%) cancers.
In order to check how robust the presence of a wide high-risk interval is, we plotted in Fig 6 the probability of having only undetectable metastasis at detection, P(U T ), for different values of the primary net growth rate δ and of the initiation rate ν. Other parameters are taken for colorectal cancer. The width of the high-risk interval is constant with respect to ν, and shrinks only as the ratio between the primary and metastatic net growth rate becomes very small. The same qualitative behaviour can be obtained with the parameter estimates for the other cancer types. As most metastases grow up to two times faster than the primary tumor they originated from [24], our model suggests that for a wide choice of parameters there is a substantial range of resection sizes that lead to a high probability of established and all undetectable metastases.
Next, we ask how such a probability, P(U T ), influences the time to cancer recurrence. The conditional distribution of the relapse time τ becomes for t � T, where we used the definition of U T (see Eq 11) and Eq 8. From this distribution we compute the expected relapse time measured from resection and conditioned on U T , E[τ − TjU T ]. This expectation and the probability P(U T ) are plotted in Fig 7. We see that for resection sizes smaller than 10 8 cells the relapse occurs on average between 4 and 5 years after resection, independently of the primary size. For resection sizes around 10 8 cells, undetectable metastases become likely to be present and E[τ − TjU T ] starts to decrease with tumor size. At

Fig 6. Probability of established and all metachronous metastases-P(U T ), as given by Eq 12-plotted as a function of T and δ (left panel) and of T and ν (right panel).
The parameter estimates used are those for colorectal cancer reported in about 19 years the probability of only undetectable metastases present and the conditional mean relapse time both approach zero. Let us stress that while some clinical studies report data on the whole distribution of recurrence times, these are usually measured from a varying time of surgery, which corresponds to different primary tumor resection sizes. Therefore, unless the distribution of relapse times is reported together with the corresponding resection sizes, we cannot compare it directly to the predictions of our model. However, we expect the variability of primary sizes at resection to average out across large cohorts of patients, which is why we analyzed the expected value of the time to recurrence. Using the values from Table 2 we tested our model by computing the probability of synchronous metastases and the mean relapse time conditioned on established but all undetectable metastases. The predictions from our model, typical ranges and references for each cancer type considered are summarized in Table 4. Notice that our predictions for the mean relapse time fall on the lower end of the respective typical ranges. This is expected since we compute the time to recurrence τ based on the minimal detectable size M, while in practice metastases are often detected only at larger sizes. In general, for different cancer types it is observed that metastases can grow up to 2 times faster than the primary tumor they originated from [24], although values as high as 4 have been proposed [95]. Our estimates fall within this range (λ/δ = 4 for prostate cancer, 3 for lung and between 1.5 and 2 for the others). As per the time interval from primary onset to surgery, the typical range is 15 − 25 years [43]. The high variability  Table 2. For resection times close to zero this conditional expectation coincides with that of the Gumbel distribution given by Eq 14, at about 5 years. As The last trait of cancer recurrence that we are going to examine is disease-free rates. These generally correspond to the survival function of the relapse time, P(τ > t). However, following the previous discussion we will condition this probability on no synchronous metastases, obtaining for t � T. In this case we do not observe any convergence to the density without resection, because if T ! 0 then no metastasis can be initiated and if T ! 1 the condition S T = 0 pushes the relapse time to infinity. Let us also stress that our model does not provide information on survival rates, as no modelling of the time to decease is incorporated. Furthermore, notice that P(τ > t) yields a good description of the disease-free rates in terms of metastases detectability, but not necessarily with respect to cancer symptomaticity. The relapse time distribution in case of no synchronous metastasis, P(τ > t j τ > T), for different resection times is shown in Fig 8, studying again the case of colorectal cancer. As we are not conditioning on at least one metastasis being initiated, there is always a positive probability that relapse will not occur, that is τ = 1. The resection times are thus chosen so to yield cure probabilities-P(K T = 0), corresponding to the final plateaus-equal to 0.75, 0.6, 0.45, 0.3, 0.15 and 0.001, respectively. These times span across a total range of about 2.2 years. Furthermore, excluding the latest resection time considered, the difference between two consecutive of these T values is between 0.28 and 0.4 years. Hence, our model suggests that delays of the order of months in the time of primary resection can lead to a significant decrease in the cure probability.
To quantify more precisely the implications of surgery delays, we study the probability that the first metastasis is initiated in the time interval (T, T + ΔT) This probability is depicted in Fig 9 using our parameter estimates for colorectal cancer. We see in the figure that there is a middle range of resection sizes where the recurrence probability can be significantly affected by small surgery delays. For colorectal cancer we estimate that if the primary resection is originally planned for a tumor of diameter between 0.44 and 0.9 Cancer recurrence times from a branching process model centimeters (4.39 × 10 7 and 3.89 × 10 8 cells, respectively), then a surgery delay of 60 days would decrease the cure probability by 5 − 9%. Conversely, tumors smaller than this critical range are less likely to metastasize during the surgery delay, while larger tumors likely metastasized already, so that the effect of surgery delay for these sizes is smaller.

Conclusions
We introduced a model of metastasis formation where metastases are initiated at a time dependent rate, in the simplest case proportional to the size of a growing primary tumor. All initiated metastases then evolve as independent supercritical branching processes. Parameters of the model were estimated for five different cancer types from the clinical literature. We studied the relapse time τ, that is the earliest time when any of the metastases becomes detectable. We obtained the distribution of τ for a general primary tumor growth and focused in particular on logistic and exponential growth functions. For clinically relevant initiation rates the metastases which relapse first are typically initiated in the early phase of the primary tumor development, which is exponential for both growth functions considered. Hence the distributions of τ for exponential and logistic primary growths are practically identical unless the initiation rate is unrealistically small (ν � 10 −13 or smaller) and we can thus exploit the much simpler formulas for the exponentially growing tumor. We model the resection of the primary tumor by introducing a cut-off for the growth function n(t). If metastases are likely already established at surgery, their time of relapse is not  Table 2) these times range from 12.28 to 14.48 years, corresponding to sizes between 5.12 × 10 7 and 1.23 × 10 9 cells (diameter 0.46 − 1.33cm), respectively. https://doi.org/10.1371/journal.pcbi.1007423.g008 Cancer recurrence times from a branching process model influenced by the resection timing. We categorized all metastases into synchronous and metachronous and computed corresponding occurrence probabilities. With our estimated parameters we found that the probability of synchronous metastases and the mean relapse time after resection falls in the typical clinical range for all five different cancer types we study.
A challenging scenario for treatment is that of patients with established but all undetectable metastases. For all five cancer types we considered, the probability of this event is high within a significant range of resection sizes. Unfortunately, the typical size of a resected tumor falls in or near this range for all cancer types. We found that relatively small delays in these resection times can cause significant decrease in the cure probability. Within our model, surgery only prevents recurrence if it is done before the onset of the first surviving metastases.
The parameter estimates summarized in Table 2 yield realistic predictions for several quantities of clinical interest. Although in principle we can explore our model predictions across the whole range of parameters, this would often lead to unrealistic outcomes. In this sense the quantitative predictions of our model are quite sensitive to the parameter values, but we have been able to find a combination of parameters that yields realistic results. On the other hand, the qualitative features of our model are more robust to parameter changes, as demonstrated for example in Fig 6. In particular, our estimate for the initiation rate of metastases, ν, is based on the assumption of early dissemination at primary size, e dE½s 1 � ¼ 10 8 cells. However, since the metastatic net growth rate δ and extinction probability q are estimated independently from data on tumor volume doubling times (see Table 1), by changing the early dissemination assumption our model predictions could fall outside their typical ranges. For example, for colorectal cancer ,   Fig 9. Probability of the first metastasis being initiated during surgery delay. This probability P(K T+ΔT � 1, K T = 0)-where T is the set time of resection and ΔT the surgery delay-is plotted as a function of the resection size N = e δT (x-axis) and of the delay ΔT (y-axis). With the parameter estimates for colorectal cancer (see Table 2) we see that if a primary tumor is resected at a critical size (around 2 × 10 8 cells, diameter � 0.725cm), surgery delays of 2-3 months can decrease the cure probability of more than 10%. https://doi.org/10.1371/journal.pcbi.1007423.g009 Cancer recurrence times from a branching process model assuming e dE½s 1 � ¼ 10 9 cells would lead to the unrealistic values E[τ − T j U T ] = 836 days and P (S T � 1) = 2.23%. Thus, indirectly, our model supports the hypothesis of early metastatic dissemination.
Note that in this paper we focused on the presence or absence of synchronous or metachronous metastasis at resection as these events determine if there is ever a relapse (K T � 1) or if relapse has already occurred by resection (S T � 1). Our model also provides estimates for the number and sizes of metastases at resection, but these are less relevant for the study of the time to cancer relapse, and have already been studied in detail in [16,29]. A general feature the model predicts is that the cumulative distribution of metastases sizes at resection has a power law tail with exponent δ/λ. This power law tail was observed in [16] using data on 21 patients with colorectal cancer from [13], and the exponent was found to typically be in the range 0.3 − 0.8. Our estimate δ/λ = 0.61 falls in this range, supporting our parameter inference. The paper above also reports data on the number of visible metastases at surgery. In our model, for a given primary resection size e δT , this number is a Poisson random variable S T . However, since the primary tumor sizes are not published and likely different for all patients in the data, we could not use this quantity reliably for our parameter estimation. For example, if we infer the initiation rate ν for colorectal cancer from the probability of visible metastasis at resection (given by PðS T � 1Þ ¼ 1 À e À b T and with estimate 0.2 from the data reported in Table 4) we would get essentially the same estimate as in Table 2. But by using instead the mean number of visible metastases (expressed as EðS T jS T � 1Þ ¼ b T =ð1 À e À b T Þ, with estimated lower bound 1.4 from [107]), we would infer a 3 times greater estimate for ν. Again, a possible cause of this discrepancy lies in the different resection sizes for patients which we have no data for.
Metastases are seeded and establish colonies via a specific and complex process called metastatic cascade (for details see e.g. [125]). Since this is known to be a multi-stage process, some authors (see for example [6,126,127] and references therein) have described metastases initiation through two-type stochastic models, where a cell needs to gain the ability to metastasize before it can establish a new metastatic lesion. We did not choose that route for several reasons: (i) the precise details of how and when cells reach this ability are not clear [43,128], (ii) in our model we can think of n(t) as the number of cells which can metastasize and so tailor the two approaches, and (iii) if we assume that an acquired metastatic ability lowers the primary net growth rate and that the seeding rate is sufficiently small (at most 10 −4 according to simulations), a branching process model would predict the same exponential growth for the cells with this ability [29,129], and hence this would only change the estimate of the initiation rate in our model.
We did not include into our model a mechanism for metastases seeding other metastases, although this phenomenon has been observed in clinical studies [130]. The main reason for this omission was the lack of reliable data for the estimation of the secondary seeding rate. By assuming the same primary and secondary seeding rates, however, we would expect metastases to initiate secondary ones when they reach around 10 8 cells, at which size they are already detectable. Hence, by considering this scenario our predictions for the time to cancer relapse would not change.
We aim to compare our model in the future to data where relapse times are given jointly with primary tumor sizes at resection. Tumor size is of course not the only relevant factor in predicting relapse times, so the model should be extended to involve other features like a measure of malignancy, possibly as in [131]. Many of the parameters of the model can differ between patients, and also between each metastasis. Therefore, including a probability distribution for the parameters could also make our model more realistic, provided that such distributions can be estimated from data. Other possible extensions could include interactions among metastatic cells and among metastatic lesions, effects of the immune system, allowing metastases to seed other metastases, and providing an estimate for the fraction of cells which can metastasize, perhaps through modelling angiogenesis.

Materials and methods
In this section we provide more details about the mathematical foundations of our model.

Single type process
Let (Z t ) t�0 be a birth-death branching process, i.e. a Markov chain on non-negative integers with transition rates The two positive constants α and β are called birth and death rate, respectively. In our model we employ this process to describe the evolution of each metastasis. We assume that all metastases have the same birth and death rate and that they are supercritical, that is they have positive net growth rate λ = α − β > 0. Moreover, since we only want to model surviving metastasis, we condition on the eventual survival of the process, that is on the event O 1 = {ω: Z t (ω) > 0 for all t � 0}. The probability of such event is equal to P( We define the first passage time to size M as T M ≔ inf{t > 0: Z t = M}. A well known property of branching processes is that e −λt Z t ! W almost surely as t ! 1, and conditioned on survival and a single initial cell W * Expo(λ/α) [31]. Since W and T M are connected by lim Early derivations of this result already appear in [132,133]. Interestingly, T M follows the Gumbel distribution Gumb max log Mð1À qÞ l The Gumbel type is an extreme value distribution. If M n denotes the maximum of n IID random variables X i , the Gumbel distribution above generally describes the limit of M n as n ! 1, when X i have an exponential (right) tail. A similar definition can be given for the reverse Gumbel distribution, i.e. the limit of minimum of IID random variables with an exponential (left) tail Y � Gumb min ða; bÞ , PðY � yÞ ¼ 1 À e À e À yÀ a b ; a 2 R; b < 0 For both of these distributions we have where γ E � 0.5772 denotes the Euler-Mascheroni constant. Hence the mean hittingtime to M cells grows logarithmically with M, while its variance remains constant Thus, for sizes M � α/λ the standard deviation is approximately equal to the mean, but since the mean only grows logarithmically with M, fluctuations of T M stay relevant even for much larger values of M.

Scaled relapse time distribution
In Results we derive the general expression for the relapse time distribution, whose full expression is obtained by combining Eqs 3, 4 and 14. Here we show how to scale the detectable size M out of this expression, so to split the distribution into a deterministic part and a stochastic term. Let us focus on the integral and apply the change of variables z ≔ t À s À 1 l log M to find Z tÀ 1 By plugging this expression back into Eq 4, at time t þ 1 l log M we get Hence, as M tends to infinity we obtain nðtÀ sÞe À ð1À qÞe À ls ds From the last two equations we also see that asymptotically as M ! 1 Explicit results for exponential primary growth Two commonly employed growth functions for primary tumors are the exponential n e (t) = e δt and the logistic n l ðtÞ ¼ Ke dt Kþe dt À 1 ones (see e.g. [33]). A logistic growth implies that the primary tumor has a carrying capacity K. During the first stages of its development n l (t) follows the same exponential trajectory of n e (t) and then approaches a constant as it gets closer to size K. As the carrying capacity is typically large, this slowdown for n l (t) happens around t ¼ logðKÞ=d. The differences between the results provided by these two growths functions thus depend on the probability of metastases being initiated by timet, i.e.
metastases are likely established in the first stages of the primary growth, i.e. when n l (t) � n e (t). Otherwise, metastases are initiated late in the primary evolution, when the two growth functions are substantially different. This feature is visualized in Fig 2, where τ densities for a logistic growth are shown to converge to the exponential ones as ν increases and the other parameters are fixed.
Using the parameter values from Table 2, however, we observe that the condition in Eq 17 is satisfied for all cancer types considered. In other words, our estimates for ν, q, K and δ yield no difference between exponential and logistic growth functions. In light of this, we study in greater detail the results obtained with n e (t).