Tumour Growth Models of Breast Cancer for Evaluating Early Detection—A Summary and a Simulation Study

Simple Summary
Advanced statistical methods can be useful for understanding the roles of breast cancer risk factors in cancer progression and detection, and for assessing impacts of early detection of breast cancer in populations with implemented breast cancer screening programmes. In this article, we summarise approaches for estimating, from observational data, tumour progression models that are inspired by biological arguments. These models have the potential to be used in studies of personalised screening. We describe a simulation study that explores the impact of extending the age of screening invitation, which is currently being considered by Sweden’s National Board of Health and Welfare.

Abstract
With the advent of nationwide mammography screening programmes, a number of natural history models of breast cancers have been developed and used to assess the effects of screening. The first half of this article provides an overview of a class of these models and describes how they can be used to study latent processes of tumour progression from observational data. The second half of the article describes a simulation study which applies a continuous growth model to illustrate how effects of extending the maximum age of the current Swedish screening programme from 74 to 80 can be evaluated. Compared to no screening, the current and extended programmes reduced breast cancer mortality by 18.5% and 21.7%, respectively. The proportion of screen-detected invasive cancers which were overdiagnosed was estimated to be 1.9% in the current programme and 2.9% in the extended programme. With the help of these breast cancer natural history models, we can better understand the latent processes, and better study the effects of breast cancer screening.


Introduction
The main objective of breast cancer screening programmes is to detect breast cancer tumours early in their progression and thereby increase the chance of treatment being successful. Before national breast cancer screening programmes were introduced widely across Europe and other countries around the globe, several randomised mammography screening trials were conducted. These typically compared a screening/intervention group to a control (no screening) group and used mortality from breast cancer as a primary endpoint. An independent review panel analysed data from nine trials and, using Poisson regression and studying breast cancer mortality, estimated relative risks of approximately 0.8 for invited women aged between 50 and 70 [1,2].
Latent variable model-based approaches, in particular multistate models [3], have also been used to analyse breast cancer screening trials data, in order to quantify the natural history of breast cancer [3,4]. The simplest multistate model has three states: no detectable cancer, preclinical cancer, and clinical cancer [3]. Transitions between states are not observed, but distributions of time spent in these states can be estimated from observed data, under parametric modelling assumptions. An important quantity in the multistate model is the mean sojourn time, which is the average time spent in the preclinical state. If a woman attends screening when in the preclinical state, there is a probability that her tumour will be detected. The basic 3-state model can be extended in various ways to include additional states.
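As a rough illustration of the three-state idea, the following Python sketch simulates onset of the preclinical state, an exponentially distributed sojourn time, and biennial screens with a fixed sensitivity. The uniform onset distribution, the mean sojourn time of 3 years, and the sensitivity of 0.8 are hypothetical placeholder values chosen for the example; they are not estimates from any of the cited studies.

```python
import random

def simulate_three_state(n, mean_sojourn, sensitivity, screen_ages, seed=1):
    """Fraction of cancers that are screen-detected in a simple 3-state model."""
    rng = random.Random(seed)
    screen_detected = 0
    for _ in range(n):
        # Hypothetical onset distribution: age at entry to the preclinical state.
        onset = rng.uniform(40.0, 75.0)
        # Exponentially distributed sojourn time in the preclinical state.
        sojourn = rng.expovariate(1.0 / mean_sojourn)
        clinical = onset + sojourn  # age at transition to the clinical state
        for age in screen_ages:
            # A screen can only detect the tumour while it is preclinical.
            if onset <= age < clinical and rng.random() < sensitivity:
                screen_detected += 1
                break
    return screen_detected / n

screen_ages = list(range(40, 75, 2))  # biennial screens at 40, 42, ..., 74
frac = simulate_three_state(20_000, mean_sojourn=3.0, sensitivity=0.8,
                            screen_ages=screen_ages)
```

In this sketch, a longer mean sojourn time gives screening more opportunities to find a tumour before it becomes clinical, so the screen-detected fraction increases with the mean sojourn time.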
Since the introduction of population-based screening programmes, a range of (latent variable) model-based approaches have been used to assess the impacts of screening. Notably, in 2000, the US National Cancer Institute established the consortium CISNET (Cancer Intervention and Surveillance Network), which studies breast (and other) cancer mortality trends. For breast cancer, there are six groups [5], each using different models. Some are based on multistate modelling assumptions, while others use continuous growth models.
The CISNET groups used simulation approaches based on calibrations to US population incidence and mortality statistics and employed their natural history models to study the relative impacts of treatment and screening on incidence and mortality trends [6,7]. Changes in these impacts across time are important to study, as the methodologies of both screening and breast cancer treatment are continuously evolving [8]. CISNET argues that, by analysing data using several different models, it incorporates model uncertainty. Data on screening attendance were not available in these studies.
It is important to study the impacts of screening not only in terms of early detection and mortality reduction but also in terms of its harms; see Table 2 in Trentham-Dietz et al. [9] for common outputs of the CISNET breast cancer models. Recalling a woman after screening, when further testing does not lead to a diagnosis (i.e., a false positive), can cause a psychological burden [10] and lead to unnecessary resource use. False positives can also reduce the probability that a woman attends further screens [11]. Another major harm of screening is overdiagnosis, which can be defined as screen-detected cancer that would otherwise not have been diagnosed in the woman's lifetime [12,13].
The age-based screening programmes used in today's health care systems may, in the future, be replaced by individualised risk-based screening programmes [14], although it must be acknowledged that there are a number of hurdles that first need to be overcome [15]. When designing such programmes, more detailed knowledge of the natural history of breast cancer will be useful. Screening programmes will need to distinguish which women have a high risk of getting the disease but can also make use of individual-level information on how long breast cancers are likely to be present in women's bodies before symptoms emerge, and on the detectability of the cancers at screening as they progress. In recent years, a number of studies have made use of data on screening attendance collected in population-based studies of breast cancer, in order to learn more about these processes. Screening attendance data can be important for better estimating underlying/latent processes in the natural history of the disease.
In this article, we summarise some recent developments in this area, i.e., for fitting natural history models to observational (and detailed) breast cancer screening data. Our focus is on continuous growth models and invasive breast cancer. Continuous growth models represent a class of models in which tumours are assumed to grow in size according to a particular mathematical growth function. These models have advantages over multistate models in terms of the ease with which individual risk factors can be incorporated and in terms of interpretability. We focus on invasive breast cancer (invasive ductal or invasive lobular) as the models, so far, have not been developed fully for noninvasive breast cancer. After describing the methodology, we provide an example of how continuous growth models can be employed, once trained on observational data. We describe a simulation study which explores consequences of extending the age limit of screening, a potential policy change that has been raised by local political groups and is currently being considered by Sweden's National Board of Health and Welfare [16]. Our simulation approach does not account for all aspects of the benefits/harms of screening (we do not, e.g., include false-positive results) and should therefore be considered as illustrative. It does, however, show how detailed models of tumour onset, progression, and detection, once trained on observational data, can in turn be used to study impacts of changes in screening policy. Overdiagnosis is a particularly relevant concern when considering the consequences of extending the age limit of screening, and this is considered here, albeit only for invasive breast cancer.

Inference Methods
In this section, we summarise inference approaches that have been proposed for estimating continuous growth models using observational mammography screening data. Just like multistate models, continuous growth models are latent variable models. The simplest continuous growth model assumes that tumour size follows a particular growth function, and it accounts for individual variability between tumours by allowing growth rates to follow a probability distribution. Continuous growth models incorporate biological assumptions more explicitly than multistate models do. Multistate models developed for breast cancer screening data have mostly assumed exponentially distributed transition times in the nonabsorbing states, although other distributional assumptions have also been made [3]. As Uhry et al. [3] have pointed out, the most commonly employed multistate models are prone to underestimating the variances in tumour progression processes. Other problems with "classical" multistate Markov models are described in Section 2.2 of Weedon-Fekjaer et al. [17].

Components of Models of Breast Cancer Tumour Progression
The earliest continuous growth model for observational mammography screening data that we are aware of is the one described by Weedon-Fekjaer et al. [17,18]. This included two components, namely a growth function for tumour volume (logistic growth function with lognormal random effects) and a logistic screening sensitivity function (of latent tumour diameter). Prior to this, a related approach was described for data collected from unscreened populations, which included components for tumour growth, localised spread, and distant metastatic spread [19]. More recently, a number of models were described for observational screening data, which also incorporate components for symptomatic detection of the primary tumour [20], tumour onset [21], and lymph node metastasis [22]. The inclusion of symptomatic detection of the primary tumour built on an earlier approach described for unscreened populations [23]. The continuous growth model which we will use in our analyses (Section 3) has the following components/submodels:

• Onset of the primary tumour: onset is defined as the point at which a tumour reaches a diameter of 0.5 mm (from which it can reasonably be assumed to have deterministic growth), and its distribution is defined according to the Moolgavkar-Venzon-Knudson carcinogenesis model [24]. Tumours are assumed to be spherical.
• Growth of the primary tumour: tumour volume is assumed to follow an exponential growth function where the inverse growth rate is a gamma random effect.
• Lymph node spread: a nonhomogeneous Poisson process with rate of spread assumed to be proportional to the number of cell divisions (raised to a power) and the rate of growth of the primary tumour.
• Symptomatic detection: the hazard rate of symptomatic detection is proportional to the (latent) tumour volume.
• Detection via mammography: screening test sensitivity follows a logistic function of the (latent) tumour diameter.
The above lymph node spread model has been shown to fit observational data more closely than models based on alternative rate functions (e.g., those that are used in CISNET models) [22]. The exponential growth model (with inverse growth rates distributed as gamma random variables) has convenient mathematical properties: for example, in the absence of screening, the distribution of symptomatic tumour sizes has a closed form. Other choices of the above submodels are of course possible, and theoretical results for general functions of tumour growth and tumour detection have recently been established [25]; see also Section 2.2.1.
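The growth, symptomatic detection, and screening sensitivity submodels listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the parameterisation V(t) = V0 exp(t / r), the shape/rate form of the gamma distribution, and the symbols eta, beta0, and beta1 are assumptions made for the example, and all numerical values are placeholders rather than the estimates in Appendix A. Onset and lymph node spread are left out.

```python
import math
import random

V0 = math.pi / 6 * 0.5 ** 3  # volume (mm^3) of a sphere with 0.5 mm diameter

def diameter(volume):
    """Diameter (mm) of a spherical tumour with the given volume (mm^3)."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)

def volume_at(t, r):
    """Exponential growth: volume t years after onset, inverse growth rate r."""
    return V0 * math.exp(t / r)

def sample_inverse_growth_rate(rng, shape, rate):
    """Gamma random effect for the inverse growth rate (shape/rate form)."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)

def sample_symptomatic_time(rng, r, eta):
    """Time from onset to symptomatic detection when the hazard is eta * V(t).

    The cumulative hazard is eta * V0 * r * (exp(t / r) - 1); setting it equal
    to a unit-exponential draw and solving for t gives the return value.
    """
    e = rng.expovariate(1.0)
    return r * math.log(1.0 + e / (eta * V0 * r))

def screening_sensitivity(d, beta0, beta1):
    """Logistic screening sensitivity as a function of tumour diameter d (mm)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * d)))

# Usage with placeholder parameter values (hypothetical, for illustration only).
rng = random.Random(0)
r = sample_inverse_growth_rate(rng, shape=2.0, rate=2.0)
t_symp = sample_symptomatic_time(rng, r, eta=1e-4)
d_symp = diameter(volume_at(t_symp, r))
```

Under these assumptions the volume at symptomatic detection equals V0 + e / (eta * r) for a unit-exponential draw e, which hints at why this combination of submodels is mathematically convenient.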

Likelihood Inference for Incident Cases and Cohort Designs
Both likelihood [3,4] and Bayesian inference [26] methods have been developed for fitting multistate models to screening cohort data. For trials data, likelihood-based methods have been used to fit multistate Markov models; these have been cohort studies, and likelihoods have, e.g., been constructed on the basis of incidence of interval cancers (modelled using a Poisson process) and the incidence of cancers at screening rounds [3,4,27]. In addition to cohorts, large studies of incident breast cancer have the potential to provide information on latent processes of tumour progression, although these types of studies have rarely been used for this purpose. In this section, we summarise likelihood inference approaches for fitting continuous tumour growth models to both samples of incident cases with information on screening histories and to mammography screening cohort data.

Continuous Growth Models for Collection of Incident Cases
Likelihood inference of tumour progression based on incident cases relies on the concept of a stable disease population, which describes the disease population dynamics of women with breast cancer, under the assumption of no screening. Screening conditions are then imposed onto the stable disease population through the likelihood function. The general idea behind the likelihood is to probabilistically project the tumours backwards in time to the hypothetical times of onset (taking each woman's screening history, mode of detection, and tumour characteristics into account) and from there, at a population level, to infer the most likely growth trajectories for tumours.
The stable disease population is represented by breast cancer incidence being constant (in the absence of screening) and by the disease progressing according to time-constant rules. In Isheden and Humphreys [28], a stable disease population was formalised in the absence of screening by assuming that each woman can pass through three discrete states:
• P_free: a disease-free state (prior to breast cancer tumour onset).
• P_tumour: a breast cancer state (preclinical/as yet undetected).
• P_after: a post-symptomatic detection state.
A woman can only pass from P_free to P_tumour and from P_tumour to P_after. Three further assumptions about the population were also made:

• The rate of births in the population is constant across calendar time.
• The distribution of age at tumour onset is constant across calendar time.
• The distribution of time to symptomatic detection is constant across calendar time.
These formulations and assumptions make the connection with multistate Markov models explicit (the same assumptions are in fact made in multistate Markov models). In continuous growth models, however, tumour growth is modelled explicitly on a continuous scale and, unlike in multistate models, entrance to P_tumour is defined explicitly in terms of a (small) tumour size at which tumours become detectable.
The above three assumptions together lead to a constant rate of new cancers entering the population and a constant incidence of (diagnosed) breast cancers. From these assumptions, two important properties were formalised in mathematical lemmas [25].
The first relates to the probability of screening a woman with breast cancer and states that "The probability for an individual to have a pre-clinical tumour at a particular/current time point is proportional to the time it will spend in tumour growth".
The second property relates to the time a screen-detected woman has spent in the preclinical state and can be stated as "For an individual currently with a pre-clinical tumour, the probability that tumour onset occurred t years earlier is uniformly distributed over the eventual time it will spend in the pre-clinical state".
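Both properties can be checked numerically with a small simulation. The sketch below is only an illustration under simplifying assumptions: onsets arrive at a constant rate, the time spent in the preclinical state is drawn from an exponential distribution with a hypothetical mean of 3 years (this choice is a placeholder, not the distribution implied by the growth model), and the population is inspected at a single time point.

```python
import random

def prevalent_snapshot(n_onsets, horizon, mean_sojourn, seed=0):
    """Simulate constant-rate onsets and inspect who is preclinical at `horizon`.

    Returns the preclinical durations of the prevalent cases and, for each,
    the fraction of that duration that has already elapsed at `horizon`.
    """
    rng = random.Random(seed)
    sojourns, elapsed_fraction = [], []
    for _ in range(n_onsets):
        onset = rng.uniform(0.0, horizon)              # constant onset rate
        sojourn = rng.expovariate(1.0 / mean_sojourn)  # hypothetical distribution
        if onset + sojourn > horizon:                  # still preclinical at horizon
            sojourns.append(sojourn)
            elapsed_fraction.append((horizon - onset) / sojourn)
    return sojourns, elapsed_fraction

sojourns, frac = prevalent_snapshot(400_000, horizon=100.0, mean_sojourn=3.0)
mean_prevalent_sojourn = sum(sojourns) / len(sojourns)
mean_elapsed_fraction = sum(frac) / len(frac)
```

If the first property holds, the prevalent cases are a length-biased sample, so their mean preclinical duration should be close to E[T^2]/E[T], which is 6 years for an exponential distribution with mean 3. If the second property holds, the elapsed fraction should be approximately uniform on (0, 1), with a mean close to 0.5.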
These two properties greatly simplify likelihoods for observable data (tumour volumes given mode of detection and history of negative screens) in terms of tumour growth, screening sensitivity, and symptomatic hazard functions. Isheden and Humphreys [25] essentially showed that likelihood inference of tumour progression based on these three functions can be carried out as long as the functions are calculable and the tumour growth function is monotonically increasing (they presented explicit results for exponential, Gompertz, and logistic growth functions). This was made possible by establishing two further properties of the progression models under stable disease assumptions.
The first property is that the probability density function for the growth parameter conditioned on there being an undetected tumour of size v is equal to the probability density function for growth rate conditioned on a tumour being symptomatically found at size v.
The second property is that the probability density for tumour size, conditioned on an individual belonging to the preclinical tumour growth state, is proportional to the probability density of volume at symptomatic detection divided by the hazard for symptomatic detection at that volume.
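In symbols, the two further properties can be written as follows. The notation is introduced here for illustration and is not taken from [25]: R denotes the growth parameter, V the latent tumour volume of a woman in the preclinical state, V_s the volume at symptomatic detection, and h(v) the hazard of symptomatic detection at volume v.

```latex
% Notation assumed for this sketch: R (growth parameter), V (latent volume),
% V_s (volume at symptomatic detection), h(v) (hazard of symptomatic detection).
\begin{align}
  f_{R \mid V = v,\ \text{preclinical}}(r) &= f_{R \mid V_{s} = v}(r), \\
  f_{V \mid \text{preclinical}}(v) &\propto \frac{f_{V_{s}}(v)}{h(v)}.
\end{align}
```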
Based on the above-mentioned properties and functions, a likelihood for tumour size conditional on screening history, clinical characteristics, and mode of detection can be calculated and used to estimate the parameters of the (e.g., tumour growth) functions, with limited bias.
Finally, we note that the stable disease assumption described here is typically used also in cohort studies (both when fitting multistate and continuous growth models), although there the assumptions are not relied on so explicitly as they are with incident cases.

Continuous Growth Models for Screening Cohorts
The continuous growth model of Weedon-Fekjaer et al. [17,18] was fitted to data from a screening examination in the Norwegian Breast Cancer Screening Program. In [18], this was performed by optimising a likelihood function that was based on jointly modelling the incidence of cases at screening and the incidence of interval cancers (the cancers detected symptomatically after the prevalent screen and before the next scheduled screen). In [17], tumour sizes of the screen-detected cases (using a multinomial distribution) and the incidence of interval cancers (using a Poisson distribution) were jointly modelled. Strandberg et al. [29] fitted a continuous growth model to mammography screening cohort data by jointly modelling age at detection, mode of detection (screen/symptomatic), and tumour size, conditional on screening attendance. This was made possible by the inclusion of a submodel for age at tumour onset. This also made it possible to incorporate the left truncation (women previously diagnosed with breast cancer are not included) inherent in these kinds of cohorts. From an epidemiological methods/statistical epidemiology point of view, these approaches are more conventional than those described above for the collection of incident cases (Section 2.2.1), so we do not provide additional descriptions of the methods here but refer the reader to the original articles. We note that in Strandberg et al. [29], risk of tumour onset was modelled as a function of a number of breast cancer risk factors. There is good reason to believe that the cohort design is likely to be more reliable (more strongly identified parameter estimates, lower variances) than the incident cases design for estimating the parameters of the natural history models, because the cohort approach uses information on time (age at diagnosis) that is not included in the incident cases approach. Strandberg et al. [29] concluded, in their cohort study, that their estimates for growth rates and tumour doubling times were comparable to those obtained from studies of sequential mammograms/ultrasound images.

Simulation-Based and Inference-Based Evaluations of Early Detection
Consequences of early detection of breast cancer can be evaluated in different ways. Screening trials and CISNET simulation approaches focus on the effect of the intervention, i.e., of being invited to screening, or the effect of participating in the programme on breast cancer mortality. This is the type of approach we took in our simulation study (Section 3). It is, however, also possible to study the effect of being screen-detected, e.g., on breast cancer survival. Analysis of trials data for studying the effect of invitation to/participation in screening on breast cancer mortality is relatively straightforward (although consideration may need to be given to adherence to "treatment" or healthy screenee bias [30]) and can be based on estimating relative risks. Focusing on screening participation, CISNET have used simulation-based approaches extensively, e.g., to gain insights into the relative contributions of treatment and screening across time [6,7]. This type of analysis is more complex.
For evaluating the effect of being screen-detected on breast cancer survival, special care has to be taken to handle a number of biases [31] which can have a large impact on survival comparisons between different sets of individuals, in particular between those detected by screening and those detected in the interval between two screens. The biases, which specifically occur in screening data, result from lead time (the time between screen detection and when the tumour would have been detected through symptoms), length time (may be defined as the time a tumour is observable by screening or the time a tumour is present in a woman's body), and overdiagnosis (women diagnosed with breast cancer by screening who, in the absence of screening, would not have been diagnosed in their lifetime).
Based on the continuous tumour growth model, including separate models for tumour growth, time to symptomatic detection, and mammography screening sensitivity, Abrahamsson et al. [32] derived a formula for the lead-time distribution, conditional on a screen-detected individual's tumour size at detection, previous screening history, and mammographic percentage density. These distributions are informative in their own right, e.g., for quantifying how lead time is longer for tumours that are small at screen detection. If covariates are included in the different submodels of the continuous tumour growth model as, for example, in Abrahamsson et al. [33], it is possible to make individual-level comparisons. From quantifying the inverse association between breast size and rate of symptomatic detection [33], it is possible to quantify the extent to which lead time is longer for women with large breasts than for women with small breasts. The conditional lead-time distributions can in turn be used to correct for the lead-time bias in survival comparisons, by subtracting lead times from screen-detected cases' survival times. Lead times may also be useful for estimating an individual's risk of being overdiagnosed.
Abrahamsson et al. [32] also described a simulation study using the same tumour growth model, to illustrate potential biases and causal effects of screen detection on breast cancer survival. Additional models for the risk of breast cancer (age at tumour onset), death from breast cancer (measured from time of detection), and deaths from causes other than breast cancer were postulated for these comparisons. Different counterfactual scenarios (in terms of screening attendance) were simulated. These simulations were used to study biases of lead time and length time. It was demonstrated that not only the tumour growth rate but also the symptomatic tumour size is a part of the length bias through any link between tumour size and survival, and it was also explained how this has a bearing on the way that observable breast cancer-specific survival curves should be interpreted.
Finally, we note that with continuous growth models, equations can be derived to infer the impacts of screening directly. Isheden et al. [34] demonstrated this for lymph node spread. For lymph node-positive cancers at the time of diagnosis, they characterised the probabilities of having already seeded lymph node metastases at different lengths of time prior to detection of the primary tumour and evaluated, for screen-detected lymph node-negative cancers, the probabilities that they would have exhibited lymph node spread if their primary tumours had instead been detected later in time.

A Simulation Study-Extending the Age of Screening Participation
A mammography screening programme with nationwide coverage has been in place in Sweden for more than three decades. Currently, women in Sweden are invited to mammography screening between the ages of 40 and 74, every 18-24 months (depending on the healthcare region) [35]. There have been some suggestions from local political groups that the age limit should be increased [16].
To showcase how continuous growth models can be used to assess screening (which can be easily extended to risk-based individual screening), we present a simulation study based on hypothetically extending the current screening programme in Sweden to age 80. This amounts to adding three more screening rounds, at ages 76, 78, and 80. We use simulations from a breast cancer natural history model to compare the outcomes in a population subjected to such an extended programme to the outcomes of the current programme.

Parameter Values Used in the Simulation
Our simulations are based on models that have previously been estimated on data from a large Swedish mammography screening cohort [29] and estimates of breast cancer survival taken from published literature; see Appendices A and B. The parameter estimates of the tumour onset, progression, and detection models that we obtained from the Swedish screening cohort are maximum likelihood estimates (see [29] for a description of the estimation procedure). Point estimates and confidence intervals are provided in Table A1 in Appendix A. Estimates of the variance of the parameter estimates were used in some of the simulations. For example, when estimating life expectancy increases due to attending screening, 90% confidence intervals for the means were generated by propagating the estimation errors: parameter values were sampled from a multivariate normal distribution using the point estimates as the mean vector and the inverse of minus the Hessian of the log-likelihood function as the covariance matrix (see Section 3.3; Results). This was performed over 100 simulations, each containing one million in silico individuals. Uncertainties in estimates of breast cancer survival were not incorporated (uncertainties were unknown since the estimates were extracted from published sources). For other simulations, results are based on averages from ten simulations, each of one million individuals, and do not incorporate variability in parameter estimates (see Section 3.3). For evaluating overdiagnosis, we used external data on mortality from causes other than breast cancer. Mortality rates from causes other than breast cancer were extrapolated by subtracting the 2019 Swedish breast cancer-specific mortality rates from NORDCAN [36] from the 2019 Swedish all-cause mortality rates from population data by Statistics Sweden [37].

Description of the Simulation Approach
We simulated data using the four-component model presented by Strandberg et al. [29]. This consists of the Moolgavkar-Venzon-Knudson carcinogenesis model for the age at onset, exponential tumour growth with gamma-distributed inverse growth rates, a continuous hazard of symptomatic detection proportional to the concurrent tumour volume, and a logistic function of the concurrent tumour diameter for the screening sensitivity. Additionally, lymph node metastasis is simulated according to the model developed by Isheden et al. [34]. According to this model, the number of affected lymph nodes at diagnosis, conditional on the observed tumour size, follows a negative binomial distribution. As described in Section 3.1, the submodel formulas and parameter values used are listed in Appendix A.
We begin by sampling the age at onset and the inverse growth rate of the tumour. Based on these, we sample the age when symptomatic detection will occur, including the tumour size and number of affected lymph nodes at detection. We then superimpose a screening programme where the screening sensitivity at each screening round is calculated, and the result is simulated. Screen detection occurs at the first positive screen, with concurrent tumour size and sampled lymph node status. (Note that, since the screening programmes are not indefinite, screen detection is not guaranteed. In those cases, the age at screen detection is set to infinity, and the tumour size and node status are left missing.)
Based on the outcome of each of the two modes of detection, two separate breast cancer survivals are sampled, along with age of death from other causes. The observed outcome is then determined by whichever occurs first, either screen-detected breast cancer, symptomatic breast cancer, or death before breast cancer diagnosis. For the breast cancer cases, cause of death and all-cause survival are determined by whichever death occurs first.
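The sampling steps described above can be sketched for a single in silico woman as follows. This is a simplified illustration and not the simulation code used in the study: the uniform onset distribution and the normal distribution for death from other causes are placeholders (the study uses the carcinogenesis model and Swedish mortality rates), every simulated woman develops a tumour, lymph node spread and breast cancer survival are omitted, and all values in `PARAMS` are hypothetical. Each woman gets her own random number stream so that the same woman can be compared across screening programmes.

```python
import math
import random

V0 = math.pi / 6 * 0.5 ** 3  # onset volume (mm^3): sphere of 0.5 mm diameter

# Hypothetical placeholder values, not the estimates in Appendix A.
PARAMS = {"shape": 2.0, "scale": 0.5, "eta": 2e-4, "beta0": -4.0, "beta1": 0.5,
          "death_mean": 84.0, "death_sd": 10.0}

def simulate_woman(rng, screen_ages, p):
    """Return (outcome, age at outcome, overdiagnosed) for one in silico woman."""
    # Placeholder onset and other-cause death models (assumptions of this sketch).
    onset = rng.uniform(30.0, 90.0)
    death_other = rng.gauss(p["death_mean"], p["death_sd"])
    # Gamma-distributed inverse growth rate (shape/scale form).
    r = rng.gammavariate(p["shape"], p["scale"])
    # Symptomatic detection with hazard eta * V(t), sampled by inverting the
    # cumulative hazard eta * V0 * r * (exp(t / r) - 1).
    t_symp = r * math.log(1.0 + rng.expovariate(1.0) / (p["eta"] * V0 * r))
    age_symp = onset + t_symp
    # Superimpose the screening programme: find the first positive screen, if any.
    age_screen = math.inf
    for age in screen_ages:
        if onset <= age < age_symp:
            volume = V0 * math.exp((age - onset) / r)
            d = (6.0 * volume / math.pi) ** (1.0 / 3.0)
            sens = 1.0 / (1.0 + math.exp(-(p["beta0"] + p["beta1"] * d)))
            if rng.random() < sens:
                age_screen = age
                break
    # Observed outcome: whichever event occurs first.
    first = min(age_screen, age_symp, death_other)
    if first == death_other:
        return "no diagnosis", death_other, False
    if first == age_screen:
        # Overdiagnosed if she dies of other causes before would-be symptoms.
        return "screen-detected", age_screen, death_other < age_symp
    return "symptomatic", age_symp, False

def run(n, screen_ages, params, seed=0):
    """Tally outcomes over n women, using one random stream per woman."""
    tally = {"no diagnosis": 0, "screen-detected": 0, "symptomatic": 0,
             "overdiagnosed": 0}
    for i in range(n):
        rng = random.Random(seed * 1_000_000 + i)
        outcome, _, over = simulate_woman(rng, screen_ages, params)
        tally[outcome] += 1
        tally["overdiagnosed"] += over
    return tally

current = run(20_000, list(range(40, 75, 2)), PARAMS)   # screens at 40, ..., 74
extended = run(20_000, list(range(40, 81, 2)), PARAMS)  # screens at 40, ..., 80
```

Because each woman's latent history is drawn before any screening draws, the two programmes share identical results for all rounds up to age 74 in this sketch, and any difference between `current` and `extended` comes only from the three added rounds.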
We then tally the following metrics for each screening programme we simulate:
• Number of mammograms performed.
• Stage shift, which occurs when early detection causes the stages of either the primary tumour size or the number of affected lymph nodes to shift down one or more levels according to the categorisation as described in Table 1 (T: primary tumour or N: lymph node metastasis).
• Number of breast cancer deaths.
• Lead time, the time "gained" between screen detection and would-be symptomatic detection.
• Survival differences (all causes, difference between screen-detected and would-be symptomatic diagnosis).
N-stage = lymph node metastasis stage.

Results
In Table 2, we compare three different screening programmes: no screening; biennial screening between 40 and 74, representing the current Swedish programme; and biennial screening between 40 and 80, representing a proposed extension to the Swedish programme. Each programme is simulated using the procedures described above, and the numbers presented are averages over 10 runs of one million in silico women each. The columns labelled "None", "Current", and "Extended" represent these respective screening programmes. We separate the cases into the following five categories, four of which represent screen-detected cases:

• Overdiagnosed: a woman with screen-detected cancer who has a non-cancer-related death before the time she would have been symptomatically detected.
• T-shifted: cases that were not overdiagnosed and where the T-stage was shifted due to early detection (but N-stage was not shifted).
• N-shifted: cases that were not overdiagnosed and where the N-stage was shifted. (This category also includes cases which were both T- and N-shifted.)
• Other screen-detected: the remainder of the screen-detected cases.
• Interval cancer: detected symptomatically between the current and next screening rounds.

breast cancer, of which 1070 are overdiagnosed. This implies that 0.1% of women will be overdiagnosed with breast cancer under the current programme. The number of breast cancer deaths is reduced by 18.5%, and the average survival difference is 2.79 years.
If screening is extended to age 80, the number of mammograms performed increases by 12.9%, and 24.7% more cases are detected through screening. The number of breast cancer deaths decreases by another 3.9%, but the number of overdiagnosed women increases by 95%. Compared to no screening, the number of breast cancer deaths is reduced by 21.7%.
For the 1301 women whose death from breast cancer was prevented by attending the extended screening programme, survival times increased by a mean (median) of 4.1 (3.1) years. For the 1018 additionally overdiagnosed cases, the mean (median) number of remaining life years was 3.7 (2.4). Due to the unnecessary treatment, these life years would be of a lower quality than if the women had not attended the extended programme.
Instead of considering a screening programme in its entirety, we can inspect each screening round separately. In Figure 1, we present a breakdown of the case types for each screening round in the extended screening programme. (Note that the results for each screening round up to age 74 are identical to those that would be obtained when simulating from the current screening programme). Numbers are presented as per 100,000 mammograms performed. The number of cases (per 100,000 mammograms) increases with age-including the number of screen-detected cases. The proportions of screen-detected cases at each screening round which are overdiagnosed ranges from 0.1% to 4.8% over the current screening programme; the respective proportions for the additional screening rounds are 5.9%, 7.2%, and 10.0%.   For the screen-detected cases, we can estimate survival differences that are due to early detection. (For all but the screen-detected cases, this survival difference is zero). In Figure 2, we present the differences in all-cause survival, comparing the observed screen-detected survival to the would-be symptomatic survival. To account for lead-time bias, we begin counting the screen-detected survival from the time of would-be symptomatic detection. The screen-detected cases are separated by screening round, and the mean, 5th, and 95th percentiles of survival difference are presented.  The mean survival difference decreases as the remaining life expectancy (due to deaths from other causes) decreases. The average survival differences are 5.1 years for screen-detected cases at age 40; 4.2 years at age 50; 2.8 years at age 60; and 1.6 years at age 70. The last screening of the current programme at age 74 shows an average survival difference of 1.1 years, and we observe that extending the screening programmes would provide respective mean survival differences of 1.0, 0.8, and 0.6 years for the additional screenings at 76, 78, and 80. 
Thus, the average survival difference across the whole extended programme would be lower than that of the current one, as seen in Table 2.
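The lead-time adjustment used for the survival differences can be illustrated with a minimal Python sketch. The function names and the example ages are hypothetical and are not taken from the simulation; the sketch only assumes that, for a screen-detected case, the simulation provides the age at screen detection, the age at would-be symptomatic detection, and the age at death under each scenario.

```python
def adjusted_survival_difference(age_symptomatic, death_if_screened, death_if_symptomatic):
    """Survival difference with both survival times counted from the
    would-be symptomatic detection, so that lead time is excluded."""
    survival_screened = death_if_screened - age_symptomatic
    survival_symptomatic = death_if_symptomatic - age_symptomatic
    return survival_screened - survival_symptomatic


def naive_survival_difference(age_screen, age_symptomatic,
                              death_if_screened, death_if_symptomatic):
    """Survival difference with each survival time counted from its own
    detection time; this includes the lead time and overstates the benefit."""
    survival_screened = death_if_screened - age_screen
    survival_symptomatic = death_if_symptomatic - age_symptomatic
    return survival_screened - survival_symptomatic


# Hypothetical case: screen-detected at 62.0, would have surfaced at 64.5,
# dies at 80.0 when screen-detected and at 76.0 when symptomatically detected.
adjusted = adjusted_survival_difference(64.5, 80.0, 76.0)
naive = naive_survival_difference(62.0, 64.5, 80.0, 76.0)
print(adjusted, naive, naive - adjusted)  # the last value is the lead time
```

In this hypothetical case, the naive difference exceeds the adjusted one by exactly the lead time (2.5 years), which is why a common time origin is used.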
By adding all the nonzero survival differences for a screening round and dividing by the number of mammograms performed, we obtain the prolonged life expectancy for attending that round. The resulting number is counted in days and is shown in the left panel of Figure 3 for each screening round between 40 and 80.
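As a rough illustration of this calculation, the following Python sketch sums the nonzero survival differences of one round and divides by the number of mammograms. The function name, the input values, and the conversion factor of 365.25 days per year are assumptions made for the example; they are not taken from the simulation.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor, not stated in the article


def days_gained_per_mammogram(survival_differences_years, n_mammograms):
    """Prolonged life expectancy (in days) per mammogram for one screening round.

    survival_differences_years: one entry per case in the round; the value is
    zero for every case that is not screen-detected.
    n_mammograms: number of mammograms performed in the round.
    """
    total_years = sum(d for d in survival_differences_years if d != 0)
    return total_years * DAYS_PER_YEAR / n_mammograms


# Hypothetical round: 10,000 mammograms, four cases, one with no survival gain.
gain = days_gained_per_mammogram([4.0, 2.5, 0.0, 1.5], 10_000)
print(f"{gain:.4f} days per mammogram")
```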
We observe that the average life expectancy increases by 3.3 days if attending the first round at age 40. This steadily increases until it reaches 4.4 days with the screening round at age 58. The reason for the increase is that breast cancer incidence is relatively low in the 40s. After this, the additional life expectancy decreases until the end of the current screening programme, where the additional life expectancy is 2.6 days if attending at age 74 (Figure 3). The decrease is due to the fact that the remaining life expectancy decreases with age (regardless of the screening). We observe that the extra life expectancy decreases further for each of the three additional screening rounds: 2.2, 1.9, and 1.6 days, respectively. The estimated combined life expectancy change for the current screening programme, considered in its entirety, is 60.8 days (90% CI: 51.7-71.4), whilst it is 64.8 days (90% CI: 55.0-76.4) for the extended programme.
The reciprocal of these results can be represented as the number of mammograms performed per year of life (life year) gained, seen in the right panel of Figure 3. The average number of mammograms required across the current programme is 101, and each new round from age 76 to 80 would require 164, 195, and 234 mammograms per life year, respectively.
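The reciprocal relationship can be checked with a short Python sketch. The function name and the factor of 365.25 days per year are assumptions. Applying it to the rounded per-round gains of 2.2, 1.9, and 1.6 days gives values close to, but not identical with, the reported 164, 195, and 234; the small discrepancies are presumably due to rounding of the published day values.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor, not stated in the article


def mammograms_per_life_year(days_gained_per_mammogram):
    """Number of mammograms needed to gain one year of life, given the
    average gain in days per mammogram."""
    return DAYS_PER_YEAR / days_gained_per_mammogram


# Rounded gains (days per mammogram) for the additional rounds at 76, 78, and 80.
rounded_gains = {76: 2.2, 78: 1.9, 80: 1.6}
approx = {age: round(mammograms_per_life_year(d)) for age, d in rounded_gains.items()}
print(approx)  # {76: 166, 78: 192, 80: 228}
```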

Discussion
In this article, we first summarised recently developed approaches for inferring processes of tumour progression, based on continuous growth models, from observational breast cancer screening data. We then presented a simulation study that used models of tumour onset, growth, and spread, estimated using data from a Swedish screening cohort [29], together with published survival data, to investigate the hypothetical effects of extending the Swedish screening programme to age 80.
In the first part of this article, we summarised approaches that have been developed for both samples of incident cases and for screening cohort designs. Although we described fundamental assumptions of the models and other assumptions upon which estimation procedures are based, it is important to note that other intricacies and assumptions have, to some extent, been overlooked. For example, we did not consider implications of an association between growth rate and screening attendance. In addition, samples may in practice be subject to selection processes (e.g., may include only women of screening age), which may have some bearing on the consistency of the parameter estimates. In the Swedish cohort study, we accounted for left truncation. For incident cases, such selection processes can be more difficult to account for. Whilst these are interesting issues from a statistical perspective, they are probably not crucially important, and it has to be acknowledged that these issues have also been omitted in the considerable literature on more conventional multistate Markov models. The statistical analysis of cancer screening data is notoriously complex, with many subtle issues and sources of bias.
Our simulation study illustrates how natural history models can be used for studying screening. However, it is important to note that we have combined models based on data from different countries (the survival models were not trained on Swedish data). Some caution should therefore be taken when interpreting results. In our study, which investigated extending the age of screening invitation, we could observe how additional screening rounds increase early detection; 13% more mammograms allowed early diagnosis of 25% more cases. Not surprisingly, screening at later ages considerably increases the rate of overdiagnosis: the rate is nearly doubled compared to the current programme. For policy makers, an important question that is related to other questions of cost-effectiveness [38] is whether or not such absolute rates of overdiagnosis would be tolerable. We also observed (results not shown) that overdiagnosis rates increase rapidly if the upper age of screening is extended even further. Additional screenings will further reduce breast cancer mortality, but survival differences will be lower than at previous screening rounds, due to lower conditional life expectancies. While further considerations of the benefits and harms are necessary for any decision to be made, our simulation study demonstrates key points and illustrates how effects of changing screening can be quantified. These kinds of results could, in principle, be used to inform the design of a screening trial. With regard to harms of screening, a particular limitation of our simulation, as mentioned in the Introduction, is that we do not include false positives. To include false positives, we would require a model of screening specificity. CISNET uses specificities of film and digital mammography provided by the U.S. Breast Cancer Surveillance Consortium as a common input; these values are based on age, breast density, and screening interval (first screen or subsequent, and if subsequent, whether they have annual, biennial, or triennial screening) [39].
Our particular simulation study of screening programmes was strictly age-based, but still demonstrated how screening changes affect population outcomes. The models used, and the simulation study performed, can be extended to individualised risk-based screening. The key to such studies would be to enable each submodel to depend on individual covariates. This is where the observational studies play an important role. For example, Abrahamsson et al. [33] allowed screening sensitivity to depend on mammographic/breast density, tumour growth rates to depend on BMI, and rate of symptomatic detection to depend on breast size. In Isheden et al. [34], both the tumour growth rates and rates of lymph node metastasis were modelled as a function of hormone replacement therapy. This modelling flexibility allows us to untangle various relationships between risk factors and screening. For example, breast density is known to increase breast cancer risk as well as reduce the screening sensitivity by masking. Risk-based screening (screening high-risk individuals more frequently or with particular imaging modalities) will rely on risk prediction tools. For effective risk-based screening, one challenge is to predict the risk of different subtypes of cancer, since different subtypes defined, e.g., by gene expression, have very different prognoses [40,41].
A limitation of our simulation study is that it only considers invasive cancers. Ductal carcinoma in situ (DCIS) is a precursor to invasive breast cancer and accounts for approximately 20% of all breast cancers. Detected cases of DCIS are usually considered to have limited malignant potential (while DCIS with high malignant potential will typically have progressed to invasive by the time it is detected). The DCIS cases which are detected are predominantly screen-detected. This combination of features means that DCIS most likely constitutes the majority of alleged overdiagnosis [13], and when DCIS cases are included, estimates of levels of overdiagnosis are much higher than those reported here [42]. At the same time, screen-detected DCIS represents the ideal outcome for early detection, provided that it would progress to invasive and not be a case of overdiagnosis. To obtain the full picture of mammography screening, DCIS needs to be better understood in order to separate the screen-detected cases into cases of successful early detection and cases of overdiagnosis.

Conclusions
Natural history models can be useful for understanding the underlying events and processes of breast cancer and represent useful tools for assessing screening, particularly when programmes are already implemented and widespread. We summarised recent developments for estimating one type of these models, continuous growth models, from observational screening studies. If these models can capture the complex roles of breast cancer risk factors in cancer progression and detection, they may be able to facilitate the process of designing personalised risk-based screening. Simulation studies using detailed models of tumour onset and progression can be powerful tools for evaluating screening policy changes and risk-based screening, but they need to consider a range of benefits and harms of screening.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Regional Ethical Review Board in Stockholm (Dnr 2010/958-31/1).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this article are available on request from the corresponding author. The data are not publicly available because they consist of large, simulated data sets with up to 100 different runs. The description of the simulation procedures and the contents of the appendices should be sufficient to understand how the data were generated. Upon request, one simulated example data set can be provided.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Submodels and Parameter Values
The Moolgavkar-Venzon-Knudson model used for the age at onset is given by the where V(z) is the concurrent latent tumour volume at time z.
for parameters β₀, βₛ, where d is the concurrent latent tumour diameter. The number of affected lymph nodes N at detection, given the primary tumour volume V = v at detection, is distributed according to a negative binomial with probability mass function with the parameters k, γ₁, γ₂ > 0, and v₀ ≈ 0.0655 mm³. This has been shown to be consistent with a model in which lymph node spread follows a nonhomogeneous Poisson process with rate of spread proportional to the number of cell divisions (raised to a power, k) and the rate of growth of the primary tumour [22]. The values of the parameters of each respective submodel used in the simulations (with 95% confidence intervals, which were used to construct the error bars in Figure 3) are presented in Table A1.

Appendix B. Survival Functions
In the simulation case study, we used breast cancer survival functions taken from a Surveillance Epidemiology and End Results (SEER) program study [43]. The Kaplan-Meier curves for lymph node-negative (N0) breast cancer survival, stratified by tumour size, are plotted in Figure A1. The baseline hazards in Figure A1 were modified by (proportional) hazard ratios based on the N-stage at diagnosis [44]. These are shown in Table A2.
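Under proportional hazards, multiplying a baseline hazard by a hazard ratio HR corresponds to raising the baseline survival function to the power HR, i.e., S(t) = S0(t)^HR. The Python sketch below illustrates this standard relationship only; the function name, the baseline survival values, and the hazard ratio are hypothetical and are not the values from Figure A1 or Table A2.

```python
def apply_hazard_ratio(baseline_survival, hazard_ratio):
    """Survival curve after scaling the baseline hazard by a constant hazard
    ratio (proportional hazards): S(t) = S0(t) ** HR at each time point."""
    return [s ** hazard_ratio for s in baseline_survival]


# Hypothetical baseline (node-negative) survival at 0, 5, and 10 years,
# and a hypothetical hazard ratio for a node-positive stage.
baseline = [1.0, 0.95, 0.90]
modified = apply_hazard_ratio(baseline, 2.0)
print([round(s, 4) for s in modified])  # [1.0, 0.9025, 0.81]
```

A hazard ratio above 1 lowers survival at every time point, while a hazard ratio of 1 leaves the baseline curve unchanged.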