Alzheimer’s disease as a multistage process: an analysis from a population-based cohort study

In cancer research, multistage models are used to assess the multistep process that leads to the onset of cancer. In view of biological and clinical similarities between cancer and dementia, we used these models to study Alzheimer’s disease (AD). From the population-based Rotterdam Study, we included 9,362 non-demented participants, of whom 1,124 developed AD during up to 26 years of follow-up. Under a multistage model, we regressed the logarithm of AD incidence rate against the logarithm of five-year age categories. The slope in this model reflects the number of steps (n–1) required for disease onset before the final step leading to disease manifestation. A linear relationship between log incidence rate and log age was observed, with a slope of 12.82 (95% confidence interval: 9.01-16.62), equivalent to 14 steps. We observed fewer steps for those at high genetically determined risk: 12 steps for APOE-ε4 carriers, and 10 steps for those at highest genetic risk based on APOE and a genetic risk score. The pathogenesis of AD complies with a multistage disease model, requiring 14 steps before disease manifestation. Genetically predisposed individuals require fewer steps, indicating that they have already inherited several of these steps. Unravelling these steps in AD pathogenesis could benefit the development of intervention strategies.


INTRODUCTION
AGING
Originating in cancer research, multistage models have been used to gain more insight into the number of steps before disease manifestation. These models are able to estimate the number of steps ('mutations') required for a healthy cell to become malignant [5]. After a cell has undergone several of these rate-limiting steps, the last mutation will ultimately lead to clinical manifestation of the disease. These models have yielded consistent findings across a variety of cancers, supporting the notion that the occurrence of cancer is the end result of seven successful mutations [5].
Cancer and neurodegenerative disease, including AD as its most common form, may be seen as two opposite ends in cell proliferation. Yet they share biological and clinical characteristics, including dysregulations in key DNA repair and inflammation processes, an increasing incidence with advancing age, and rapid disease progression after diagnosis [6,7]. Moreover, they share a complex inheritance pattern with genetic pleiotropy [8]. For instance, a recent GWAS found a positive genetic correlation between AD and cancer genes, further supporting the genetic overlap between these two diseases [8].
Given the commonalities between neurodegenerative diseases and cancer, the multistage model has recently been successfully applied to model the incidence rate of amyotrophic lateral sclerosis, a rare neurodegenerative disease, as a six-step process [9]. So far, multistage modelling has not been used for AD. We therefore applied a multistage model within a large, population-based study to test the hypothesis that AD is a multistage process. We determined the number of steps required for disease onset and hypothesized that if AD complies with a multistage process, the number of steps will be smaller in genetically predisposed individuals, as these individuals may already have inherited one of these key steps.

RESULTS
During a follow-up of up to 26.1 years (median follow-up 10.3 years [interquartile range 10.1 years]), 1,124 out of 9,362 participants were diagnosed with AD. Table 1 shows the baseline characteristics of the study population. In this sample, 58.2% of the participants were women. Of the included participants, 2,624 were APOE ε4 carriers (28.0%).

Multistep model
The adjusted R-squared for the relation between log AD incidence rate and log age was 0.93, indicating a linear relationship, which is in line with the multistage model. The estimate of the slope (number of steps minus 1) was 12.82 (95% confidence interval: 9.01-16.62), equivalent to 14 steps (Figure 1).
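As a small worked illustration of how a slope maps onto a step count, the sketch below adds the final step to the slope (n−1) and rounds to a whole number. The function name `steps_from_slope` and the choice of rounding to the nearest integer are assumptions made for this example; the text does not spell out the rounding rule.

```python
def steps_from_slope(slope: float) -> int:
    """Convert a log-log slope (n - 1) into a whole number of steps n.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(slope + 1)


# Slope reported in the abstract for the total study population.
print(steps_from_slope(12.82))
```

Applied to the slope of 12.82 reported in the abstract, this reproduces the 14 steps stated there.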

Considering genetic risk
When considering only the APOE-related risk of developing AD, we found that APOE ε4 non-carriers needed more steps to develop AD than APOE ε4 carriers (16 steps for non-carriers, 12 for carriers). In an exploratory analysis, we also examined the number of steps among participants homo- or heterozygous for APOE ε4 separately. Participants homozygous for the APOE ε4 allele required 10 steps, while participants heterozygous for APOE with ε3 and ε4 or ε2 and ε4 required 16 steps to develop AD. Similarly, we found that participants in the low-risk tertile of the genetic risk score required more steps to develop AD than those in the high-risk tertile (16 steps versus 13 steps). When stratifying on both APOE ε4 carriership and the genetic risk score, we found that for every increase in tertile of the genetic risk score, APOE ε4 carriers needed fewer steps to develop AD than APOE ε4 non-carriers. This translated into ten steps for APOE ε4 carriers in the high-risk tertile, compared to 16 steps for APOE ε4 non-carriers in the low-risk tertile (Table 2).

DISCUSSION
In this population-based study using long-term follow-up of AD, we found evidence that the development of AD follows a multistage process with 14 steps. This indicates that 14 steps are required for the clinical occurrence of AD in the general population. The number of steps was modified by the level of genetic predisposition, translating into six fewer steps for those individuals at highest genetic risk for AD, compared to those at the lowest genetic risk.
Multistage models have been extensively used in cancer research to provide more insight into the underlying pathogenesis [10-14]. Several studies showed that seven steps were required to develop cancer, which may reflect somatic mutations, genomic rearrangements, or changes in tissue interactions and environment. Neurodegenerative diseases show several similarities with cancer, such as dysregulation of DNA repair mechanisms. Yet, among neurodegenerative diseases the multistage model has only been applied to amyotrophic lateral sclerosis, which appears to follow a multistage process with six rate-limiting steps. In this study, we show that AD can also be modelled as a multistage condition consisting of 14 steps, stressing the genetic complexity and the variety of potential biological pathways involved in the development of this disease.

We found that the number of steps for AD differed between individuals with different degrees of genetic predisposition. APOE ε4 carriers require a smaller number of steps to develop AD than APOE ε4 non-carriers. Moreover, these effects became even more pronounced when additionally considering 23 AD-associated genetic variants. Compared to those at highest genetic risk (i.e. APOE ε4 carrier and within the third tertile of the weighted genetic score), individuals at lowest genetic risk (i.e. APOE ε4 non-carrier and within the first tertile of the weighted genetic score) needed six more rate-limiting steps to develop AD. These findings are in line with previous observations in cancer research showing different thresholds before disease becomes clinically apparent between inherited and sporadic cancer events. For instance, individuals with familial adenomatous polyposis are at increased risk of colon cancer due to one mutated copy of the APC gene. It has been shown that these individuals need one step fewer in the overall pathological process to develop clinical colon cancer than individuals without this mutated gene [10]. Furthermore, children with inherited retinoblastoma required only one hit to develop this disease, whilst sporadic retinoblastoma cases became clinically apparent after two hits [15]. Our findings may suggest that individuals with genetic predisposition begin several stages closer to the pathological threshold that must be reached before AD becomes clinically apparent.
Although our findings suggest that 14 steps are needed for AD to emerge clinically, the underlying biological pathways and changes reflected by these steps still need to be identified. To date, eight different biological pathways involved in the pathogenesis of AD have been identified using genetic variants in AD [16]. The APOE ε4 allele is the most significant genetic risk factor due to its high prevalence and strong relation to AD. It is involved in four of these pathways: the cholesterol transport, hematopoietic cell lineage, clathrin/AP2 adaptor complex, and protein folding pathways. Our finding that APOE ε4 non-carriers need four more steps before AD clinically manifests compared to APOE ε4 carriers taps into this observation, and could indicate that changes in the above-mentioned four pathways must indeed be acquired before AD manifests clinically. This could mean that these pathways are already changed or dysregulated at birth in APOE ε4 carriers, indicating that these individuals subsequently have a lower resilience to the development of dementia. This could in turn lead to a lower required number of subsequent steps before disease manifestation. Indeed, up to 18% of the APOE ε4 carriers in this study developed AD during follow-up, yet the lifetime risk of AD among these individuals is even higher, with almost half of them developing AD in their remaining lifetime. For carriers homozygous for APOE ε4 in the high-risk tertile, this risk is even higher, and the disease moreover manifests earlier, with a 29-year difference in age at onset for AD, compared to homozygous APOE carriers in the low-risk tertile of the genetic risk score [2].
The search for successful AD therapies is among the most challenging and expensive healthcare issues to date. So far, many disease-modifying agents aim to reduce the production of amyloid-beta (Aβ), or target a specific but single part of the disease process [17]. Our present study shows that as many as 14 steps are required before AD becomes clinically apparent. This high number of required steps may signal the need to develop multi-domain approaches that target various underlying disease processes simultaneously in order to halt or deter neurodegeneration.
Several limitations of this study need to be discussed. Firstly, although the use of multistage models produces a single number as a simple and concrete result, its exact biological meaning is complex and remains hard to interpret. For instance, in cancer research multistage models reflect the trajectory of a single cell or cell lineage that becomes malignant in several rate-limiting steps. However, the biological unit and meaning of these independent steps is more variable in the case of AD, as indeed it is for other neurodegenerative diseases such as ALS. A step could for instance reflect an essential pathophysiological change in a single neurovascular unit, but could also relate to a key genetic mutation in a single cell or cell lineage. Secondly, the underlying multistage model assumes that disease development is predominantly genetically determined. This means that a certain number of steps, all with a similar exposure time, have to occur before the specific disease manifests clinically. In most instances, this means that the exposure under study must be present at birth or during an individual's early life, such as their genes, ethnicity, sex or environmental factors present from birth onwards. This leaves little room for the incorporation of environmental factors that start later in life, such as smoking. While AD has a strong genetic component [2], the importance of lifestyle and environmental factors is also substantial [18,19]. These factors, however, remain partly unaddressed in the current multistage models. Some studies in cancer epidemiology have tried to model these effects in more complex multistage models, but the results of these models turned out to be difficult to interpret and are currently poorly validated [10]. Since this is the first application of multistage modelling in AD, we relied on a simpler, yet widely used multistage model.
Future research is encouraged to incorporate (time-varying) extensions with environmental and lifestyle factors. Thirdly, results derived from exploratory analyses amongst participants either homo- or heterozygous for APOE ε4 should be interpreted with caution, as these analyses are based on relatively small sample sizes. Fourthly, for various reasons, including for instance selection bias, the presented frequencies of homo- and heterozygous carriers of the APOE ε4 allele in this population-based cohort study (2.8% homozygous, 25.8% heterozygous) may differ from those in the unselected general population [20]. Nevertheless, the frequencies in this study fell within the reported ranges from several other large population-based cohort studies (Supplementary Table 1). Finally, estimates of multistage models are vulnerable to several artificial influences on the observed incidence patterns, such as community-wide disease screening programs or misclassification of diagnoses at high ages due to limited diagnostic work-ups [21]. For some diseases, this could subsequently influence the estimation of the slope and thus the number of steps needed for disease onset. We nevertheless minimized these effects by using a cohort study with standardized and consistent AD ascertainment over time and virtually complete follow-up (>95% of potential person-years).
In conclusion, we found that AD complies with a multistage model characterized by 14 steps that include essential facets of biological change which are required before AD becomes clinically apparent. Moreover, we observed that individuals with a higher genetic susceptibility require fewer of these additional steps before disease manifests clinically. Future research is warranted to validate the number of steps, to study the effects of environmental and lifestyle factors, and to further investigate the processes underlying these rate-limiting steps. These findings could further increase the understanding of the pathogenesis of AD, which in turn could benefit the development of prevention and treatment strategies.

Study design
This study was embedded within the Rotterdam Study, a prospective population-based cohort designed to study the occurrence and determinants of age-related diseases in the general population. Details regarding the objectives and design have been reported previously [22]. Briefly, in 1990 inhabitants aged ≥55 years from a well-defined suburb in the city of Rotterdam, the Netherlands, were invited to participate. To model AD as a multistage process, we excluded participants with a history of any type of dementia at baseline (N=531) and those who were insufficiently screened for dementia (N=637). We further excluded participants who did not provide informed consent to access medical records or hospital discharge letters (N=159). Lastly, participants without information on their APOE genotype (N=964) or on the AD-associated genetic variants needed to calculate the genetic risk score (N=1,565) were excluded, leaving 11,070 participants for analyses (Figure 2).

APOE genotyping and calculation of a weighted genetic risk score
DNA was extracted from blood samples drawn by venepuncture at baseline. APOE genotype was determined using polymerase chain reaction on coded DNA samples in the initial cohort and with a bi-allelic TaqMan assay (rs7412 and rs429358) in the two extensions (RS-II and RS-III). The majority of samples (81.1%) were further genotyped with the Illumina 610K and 660K chips and imputed to the Haplotype Reference Consortium reference panel (version 1.0) with Minimac 3. We included 23 genetic variants that showed genome-wide significant evidence of association with AD to calculate a weighted genetic risk score (see Supplementary Table 2 for an overview of the included variants) [9,23-37]. This score was calculated as the sum of the products of the single nucleotide polymorphism dosages of the 23 genetic variants (excluding APOE) and their respective reported effect estimates. All 23 variants selected for the calculation of the genetic risk score were well imputed (imputation score R² > 0.3, median 0.99).
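The weighted score described above is a sum of dosage-times-effect products, which can be sketched in a few lines. The following is a minimal illustration, not the study's pipeline: the function name, the variant identifiers and all numeric values are hypothetical placeholders, and dosages are assumed to be (possibly fractional, imputed) effect-allele counts between 0 and 2.

```python
from typing import Mapping


def weighted_genetic_risk_score(
    dosages: Mapping[str, float],
    effects: Mapping[str, float],
) -> float:
    """Sum of (allele dosage x reported effect estimate) over all variants.

    `dosages` maps variant id -> effect-allele dosage for one participant.
    `effects` maps variant id -> reported effect estimate (e.g. a log odds ratio).
    """
    missing = set(effects) - set(dosages)
    if missing:
        raise ValueError(f"missing dosages for: {sorted(missing)}")
    return sum(dosages[variant] * beta for variant, beta in effects.items())


# Hypothetical variants and values, for illustration only.
example_effects = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
example_dosages = {"variant_a": 2.0, "variant_b": 1.0, "variant_c": 0.5}

score = weighted_genetic_risk_score(example_dosages, example_effects)
print(round(score, 4))
```

In an analysis like the one described here, one such score per participant would then be cut into tertiles to define low, intermediate and high genetic risk.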

Ascertainment methods of dementia
Baseline and follow-up ascertainment methods for dementia have previously been described in detail [19].  [38].

The multistage model
Multistage models originate from cancer epidemiology, where they were first employed to study the age distribution of several cancer types [5,12,39,40]. Within this framework it is assumed that cancer manifests clinically after a certain threshold number of n mutations within one cell has been reached. This threshold for disease occurrence in that cell has a certain probability distribution over time (t), e.g. for one individual the nth mutation occurs at age 50, whereas for another individual this nth mutation may occur at age 80. Of the required mutations, (n−1) mutations have independently taken place at a certain point during the lifespan. For each of these mutations, a certain probability per time unit (e.g., year) exists that the mutation will occur ($u_j$ for the jth mutation). When a cell is primed, such that it has undergone all of these necessary preceding mutations, the final mutation (nth mutation) leads to clinical manifestation of disease. This final nth mutation has to occur after all of these steps and cannot, for example, occur in between preceding steps. So, the probability density function of time-point t, when the nth change takes place, is approximately:

$$f(t) \approx u_1 u_2 \cdots u_{n-1} u_n \, t^{n-1}$$

It was noted in cancer epidemiology that the age-specific incidence rate of cancer ('i') roughly coincided with the probability that at least one cell of all independent cells acquired the necessary number of seven mutations by that specific age. This means that for most types of cancer six preceding rate-limiting steps (n−1) are necessary during the lifespan, with a seventh and final mutation (nth mutation) leading to disease manifestation [41]. It can subsequently be shown that if the disease under study fits a multistep process, the number of these steps (n) can be estimated with the following formula:

$$\log(i) = (n-1)\log(t) + c$$

in which c is a constant number containing $\log(u_1 u_2 \cdots u_{n-1})$.
The common ground of these rate-limiting definitions is that the speed of a reaction step will have a significant effect on the speed of the overall chain of events to which the step belongs [42]. A reaction step is thus considered a rate-limiting step when the rate of that particular step is identical to the overall rate of the entire reaction.
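The log-linear relation at the heart of the multistage model can be checked numerically. The sketch below assumes the incidence at age t is proportional to the product of the per-year step probabilities times t raised to the power n−1; the symbol u_j for those probabilities, the function names and the numeric values are all assumptions chosen for illustration. It shows that the slope on the log-log scale equals n−1 regardless of the individual probabilities.

```python
import math


def log_incidence(age: float, step_rates: list[float]) -> float:
    """log i(t) = log(u_1 * ... * u_n) + (n - 1) * log(t), with n = len(step_rates)."""
    n = len(step_rates)
    return sum(math.log(u) for u in step_rates) + (n - 1) * math.log(age)


def log_log_slope(age_a: float, age_b: float, step_rates: list[float]) -> float:
    """Slope of log incidence against log age between two ages."""
    rise = log_incidence(age_b, step_rates) - log_incidence(age_a, step_rates)
    run = math.log(age_b) - math.log(age_a)
    return rise / run


# Hypothetical per-year probabilities for a 14-step process.
hypothetical_rates = [0.01] * 14
slope = log_log_slope(65.0, 85.0, hypothetical_rates)
print(round(slope, 6), "-> steps:", round(slope) + 1)
```

Changing the values in `hypothetical_rates` shifts the intercept only; the slope, and therefore the inferred number of steps, depends solely on how many steps there are.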

Statistical analysis
We applied a multistage model to determine the slope and the number of steps for the development and clinical onset of AD. In line with previous studies, the incidence rate of AD was calculated per five-year age category [5,9]. Each participant contributed person-years to specific age categories until the age at AD diagnosis or censoring. To minimize the effects of outliers on the slope of the model, we excluded age categories with fewer than 500 person-years or with an incidence rate below 1 per 1000 person-years, given that estimated incidence rates often become unstable in the extremes of the age distribution [40]. This additional criterion resulted in an exclusion of 213,530.6 person-years, which corresponded to the exclusion of 1,708 of the 11,070 participants with age at AD or censoring below the first included age category. This left 9,362 participants available for the final analyses (Figure 2). The incidence rate of AD and the five-year age categories (log age) were natural log-transformed. Linearity was tested based on the adjusted R-squared obtained from a linear regression model with log age as the determinant and log incidence rate of AD as the outcome. Linear models were unadjusted.
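The person-year bookkeeping described above can be sketched as follows. This is an illustrative reading of the text rather than the study's code: the function names are hypothetical, ages are assumed to be in years, and the two thresholds mirror the stated criteria (at least 500 person-years and at least 1 case per 1000 person-years).

```python
def person_years_by_age_band(entry_age: float, exit_age: float, width: int = 5) -> dict:
    """Split one participant's follow-up into five-year age bands.

    Returns {band start age: person-years contributed to that band}.
    """
    contributions = {}
    age = entry_age
    while age < exit_age:
        band_start = int(age // width) * width
        upper = min(exit_age, band_start + width)
        contributions[band_start] = contributions.get(band_start, 0.0) + (upper - age)
        age = upper
    return contributions


def keep_band(cases: int, person_years: float,
              min_person_years: float = 500.0,
              min_rate_per_1000: float = 1.0) -> bool:
    """Apply the two exclusion criteria to one age band."""
    rate_per_1000 = 1000.0 * cases / person_years
    return person_years >= min_person_years and rate_per_1000 >= min_rate_per_1000


# A participant followed from age 63.0 until diagnosis or censoring at 71.5.
print(person_years_by_age_band(63.0, 71.5))
```

Summing these contributions and the diagnosed cases over all participants gives, per age band, the numerator and denominator of the incidence rate that enters the regression.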
Additionally, we stratified by APOE ε4 carrier status and by tertiles of the weighted genetic risk score, and we combined both into mutually exclusive categories of genetic risk in order to contrast the individuals with the lowest and those with the highest genetic risk of AD.
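A stratified version of the log-log regression might look like the sketch below, which fits an ordinary least-squares line per stratum and reports the slope, the adjusted R-squared and the implied number of steps. Everything numeric here is synthetic: the ages stand in for age-band values (using band midpoints is an assumption), the rates are generated from exact power laws, and the stratum names are placeholders, so the output says nothing about the study's data.

```python
import math


def fit_log_log(ages, rates):
    """Unadjusted OLS of log(rate) on log(age); returns (slope, intercept, adjusted R^2)."""
    x = [math.log(a) for a in ages]
    y = [math.log(r) for r in rates]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)  # one predictor
    return slope, intercept, adj_r2


# Synthetic strata: incidence follows an exact power law of age.
ages = [67.5, 72.5, 77.5, 82.5, 87.5]
strata = {
    "stratum_a": [2e-12 * a ** 5 for a in ages],
    "stratum_b": [3e-18 * a ** 8 for a in ages],
}

for name, rates in strata.items():
    slope, _, adj_r2 = fit_log_log(ages, rates)
    print(name, round(slope, 3), round(adj_r2, 3), "steps:", round(slope) + 1)
```

With real data, each stratum would supply its own incidence rate per age band after applying the same exclusion criteria, and a lower slope in a higher-risk stratum would correspond to fewer required steps.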

ACKNOWLEDGMENTS
We acknowledge the dedication, commitment, and contribution of inhabitants, general practitioners, and pharmacists of the Ommoord district who took part in the Rotterdam Study. We acknowledge Frank van Rooij as data manager, and Brenda C.T. Leening-Kieboom as study coordinator.

CONFLICTS OF INTEREST
We declare no competing interests.