Transferability of Alzheimer’s disease progression subtypes to an independent population cohort

In the past, methods to subtype or biotype patients using brain imaging data have been developed. However, it is unclear whether and how these trained machine learning models can be successfully applied to population cohorts to study the genetic and lifestyle factors underpinning these subtypes. This work, using the Subtype and Stage Inference (SuStaIn) algorithm


Introduction
Alzheimer's disease (AD) is a global health and economic burden affecting around 47 million individuals worldwide. Currently, there exists only one FDA-approved drug with some disease-modifying potential in a subset of patients. In general, a key confound intracellular neurofibrillary tangles (NFTs) consisting of the tau protein, decline in brain glucose metabolism, atrophy of grey matter (in particular in the hippocampus) and finally cognitive decline (Jack Jr et al., 2010). This theoretical model of biomarker progression has been supported by longitudinal studies of asymptomatic mutation carriers for autosomal dominant AD (Bateman et al., 2012) and by disease progression models applied to cross-sectional studies of sporadic, late-onset AD (Donohue et al., 2014; Young et al., 2014; Lorenzi et al., 2019; Koval et al., 2021; Venkatraghavan et al., 2019). However, the evolution of biomarkers in AD and many other brain disorders remains uncertain, and there is heterogeneity in the order in which biomarkers become abnormal. For instance, different patterns of hypometabolism, brain atrophy and tau distribution have been linked to AD subtypes with different cognitive profiles (Laforce et al., 2014; Ossenkoppele et al., 2015, 2016).
Studies tackling this heterogeneity can be broadly categorised into two streams: staging methods that emphasize the temporal progression of the disease process (Bilgel et al., 2016; Donohue et al., 2014; Young et al., 2014), and subtyping methods that focus on identifying distinct groups of patients based on their phenotypic heterogeneity (Nettiksimmons et al., 2010; Scheltens et al., 2017; Tijms et al., 2020; Whitwell et al., 2009). The recently developed Subtype and Stage Inference (SuStaIn) algorithm (Young et al., 2018) is an unsupervised learning technique that identifies disease progression subtypes in cross-sectional data. It uniquely defines subtypes by a trajectory of change, thereby avoiding the confound between temporal change and phenotypic difference. Using SuStaIn, Young et al. (2018) identified three data-driven atrophy subtypes with distinct temporal progression patterns based on cross-sectional brain MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Furthermore, within AD, SuStaIn has been successfully used to identify subtypes of the temporal progression of NFTs and amyloid plaques (Aksman et al., 2020) as well as of the spatio-temporal spread of cortical tau (Vogel et al., 2021). Beyond AD, SuStaIn has identified atrophy subtypes reflecting the causative mutation in genetic frontotemporal dementia (Young et al., 2018, 2021), as well as subtypes in multiple sclerosis (Eshaghi et al., 2021) and in chronic obstructive pulmonary disease (Young et al., 2020). The reason for the emergence of distinct disease subtypes in AD remains elusive; it is likely linked to genetic risk factors, co-morbidities and environmental factors, many of which may act upstream of amyloid-β.
While recent studies have uncovered many factors that increase the risk of developing AD, such as diabetes (Andrews et al., 2020), blood pressure complexity and variability (Ma et al., 2019), various lifestyle factors including smoking status, alcohol consumption, and physical, social and leisure activities (Lourida et al., 2019; Sommerlad et al., 2019; Yates et al., 2016), and genetic risk factors (Kunkle et al., 2019), no such study has yet been performed for AD progression subtypes. One of the challenges is that available datasets are either disease-focused (with deep phenotyping using disease-related biomarkers, but lacking decades of pre-clinical data) or (longitudinal) population cohorts (with deep phenotyping on risk factors, but lacking disease-related biomarkers). Thus, neither type of dataset alone can be used both to train subtyping tools and to explore early risk factors influencing subtype development. To tackle this challenge, we aim to connect datasets that offer windows on different phases of disease development. We propose to train atrophy-based subtyping models on disease-focused datasets such as ADNI and then 'transfer' them to population cohorts such as UK Biobank (UKBB). This is not transfer learning in the traditional sense, where a model trained on one dataset is adjusted to solve a different, related task on a different dataset; here, the trained model is applied to the new dataset. To quantify the success of a model transfer, we measure the subject-level consistency of subtype and stage assignments between the transferred model and a reference model trained on the target dataset. Successful model transfers will enable the investigation of subtype-related risk factors beyond what can be discovered in each dataset individually.

Methods
Working across the ADNI and UKBB datasets raises several research questions: (1) do consistent atrophy subtypes emerge from the two datasets? (2) how can cohort effects on biomarkers be eliminated while preserving biological information? (3) can a model trained on ADNI effectively subtype and stage participants in UKBB? Fig. 1 summarises the analysis strategy, the datasets used for modelling and the models built on them. We briefly outline the overall organisation of the analysis here; details are provided later in the Methods section. We addressed these questions by first comparing the SuStaIn models for cortical and subcortical volumes trained separately on the original ADNI dataset and on an AD-at-risk population constructed from the UKBB dataset. We then applied the harmonization technique ComBat (Johnson et al., 2007; Fortin et al., 2017) to these cortical and subcortical volumes to remove the cohort effect while preserving the effects of biological covariates, and trained SuStaIn models on the two harmonized datasets. We hypothesized that the ADNI dataset, due to its focus on AD pathology, would constitute the better training set and that harmonization would be critical to a successful model transfer. In total, we trained four SuStaIn models: an ADNI SuStaIn, a UKBB SuStaIn, a Harmonized ADNI SuStaIn and a Harmonized UKBB SuStaIn model. For each subject in the two datasets, we compared the subtype and stage assignments under three models, namely the model trained on the original dataset the subject belongs to and the models trained on the two harmonized datasets. In addition, to highlight the central role of harmonization, we also applied the ADNI SuStaIn model to the unharmonized UKBB data and compared the UKBB subjects' subtype assignments under the ADNI SuStaIn model and under the UKBB SuStaIn model.
Finally, to illustrate the use of transferred SuStaIn models, we studied associations of AD subtypes with age, CSF biomarkers (Aβ, tau and p-tau) and medication history variables.

ADNI dataset
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). ADNI was launched in 2004, led by Principal Investigator Michael W. Weiner, MD, and funded as a public-private partnership; it has made major contributions to AD research by enabling the sharing of data between researchers around the world. The primary goal of ADNI has been to support studies of early AD detection and of tracking disease progression using biomarkers including magnetic resonance imaging (MRI), positron emission tomography (PET), and clinical and neuropsychological assessments. For up-to-date information, see www.adni-info.org.
Demographic information (i.e., age, sex, intracranial volume [ICV] and years of education), diagnostic labels (CN: cognitively normal; SMC: significant memory concern; EMCI: early mild cognitive impairment; LMCI: late mild cognitive impairment; AD: Alzheimer's disease) and CSF biomarker values (Aβ, tau and p-tau) were extracted from the table 'ADNIMERGE'. MRI-based features were extracted from T1-weighted MRIs obtained on 3T scanners at baseline for ADNI1/GO/2 participants. The scans were processed with FreeSurfer 5.1 to obtain 80 cortical and subcortical regional measures based on the Desikan-Killiany atlas. Only scans that passed the overall quality control carried out by ADNI were retained (for details, see http://adni.loni.usc.edu/methods/mritool/mri-analysis/).

UKBB dataset
This research has been conducted using the UK Biobank under application 70047.
UK Biobank is a major UK collaborative research project that was initially established as a charitable company in 2003. This large-scale biomedical database and research resource contains in-depth genetic and follow-up health information from half a million UK participants aged between 40 and 69 years. It enables researchers to investigate associations between subgroups of the population and particular diseases, including determining the risk of disease in different groups; this provides direct evidence of the scope for prevention, supports the development of improved diagnostic methods and treatments, and thereby allows healthcare to be improved according to the varying causes, prognosis and response to treatment of common diseases. For up-to-date information, see https://www.ukbiobank.ac.uk.
Fig. 1. Analysis overview. To explore the transferability of SuStaIn models, we trained models on different datasets with or without feature harmonization (A). The training data for SuStaIn are either the original ADNI dataset or a high-risk AD population constructed from UKBB; the resulting models are referred to as the ADNI SuStaIn and UKBB SuStaIn models, respectively. This allows us to investigate how similar the models from the two datasets are (B). Next, these models were applied to assign subtype and stage to individuals within the same dataset to create a subtype and stage reference (C). We then harmonized the features in the training datasets using ComBat (A) and trained two additional SuStaIn models on the harmonized versions of the two datasets. The resulting Harmonized ADNI SuStaIn and Harmonized UKBB SuStaIn models were used to assign subtype and stage to individuals within and across the two harmonized datasets (D). This setup enabled us to compare models trained on un-harmonized and harmonized data and to assess the consistency of subtype and stage assignments at the subject level. Our results suggest using the Harmonized ADNI SuStaIn model to assign individual subtypes and stages in both datasets for subsequent association studies of subtypes and risk factors.
We downloaded the regional volume data derived from T1-weighted MRI scans (80 cortical and subcortical volume metrics based on the Desikan-Killiany atlas) from UKBB. All scans were acquired on the same type of 3T scanner (Siemens Skyra) and processed with FreeSurfer 6.0 for cortical and FreeSurfer 7.0 for subcortical regions (see http://www.fmrib.ox.ac.uk/ukbiobank/protocol/V4_23092014.pdf and Alfaro-Almagro et al., 2018). Quality control was carried out prior to upload to UKBB; quality measures for the image data include the signal-to-noise ratio for individual modalities and the 'discrepancy' between a given pair of images after co-registration (for details, see https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf). To define control and AD-at-risk populations in the UKBB dataset, we also obtained AD-related variables, including the number of APOE-ε2 and APOE-ε4 alleles (from genotyping data), family history of dementia, self-reported neurological or psychiatric disorders and cognitive test results. Of the eleven cognitive tests available in UKBB, we focused on three that assess reaction time (UKBB RT), executive function (UKBB Trail Making Tests A and B) and delayed memory (UKBB Prospective Memory Test [PMT]). The UKBB PMT is a simple test: early in the touchscreen cognitive section, the participant is shown a message with an instruction to be carried out at a later point in the session. The first and final answers, the history of attempts and the time taken to answer at that later point are recorded. To illustrate the practical benefit of transferring trained SuStaIn models and leveraging the deep phenotyping of non-AD cohorts, we further obtained data on cholesterol and blood pressure medication use from UKBB.

Defining control and case (training) study populations in ADNI and UKBB
For the ADNI dataset, the control set consisted of amyloid-negative SMC and CN subjects, where amyloid negativity was defined on Florbetapir PET using the ADNI criterion, i.e., a cortical SUVR normalized by the whole cerebellum < 1.11. All subjects who passed overall quality control and were not in the control set were regarded as cases (training set) for SuStaIn modelling.
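The ADNI split above amounts to two boolean masks. The sketch below is a minimal illustration, assuming diagnosis labels and Florbetapir SUVRs are already available as arrays; the function name `split_adni` is hypothetical. Subjects without a PET SUVR cannot be confirmed as amyloid-negative, so they fall into the case set, which follows from defining cases as everyone not in the control set.

```python
import numpy as np

# Amyloid-negativity cut-off from the text: Florbetapir cortical SUVR
# normalized by the whole cerebellum below 1.11.
SUVR_CUTOFF = 1.11


def split_adni(diagnosis, suvr):
    """Return boolean masks (control, case) for QC-passed ADNI subjects.

    Controls: CN or SMC at baseline with an amyloid-negative PET scan.
    Cases: every remaining subject. A missing SUVR (NaN) compares as
    False below, so such subjects end up in the case set.
    """
    diagnosis = np.asarray(diagnosis)
    suvr = np.asarray(suvr, dtype=float)
    unimpaired = np.isin(diagnosis, ["CN", "SMC"])
    amyloid_negative = suvr < SUVR_CUTOFF
    control = unimpaired & amyloid_negative
    return control, ~control
```

For example, `split_adni(["CN", "SMC", "EMCI"], [1.02, 1.20, 1.05])` marks only the first subject as a control: the SMC subject is amyloid-positive and the EMCI subject is cognitively impaired.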
To assess the quality of the model transfer, we require a SuStaIn model trained on the UKBB data to establish a subtyping reference. However, the UKBB dataset lacks AD-specific phenotyping such as diagnostic labels and CSF or PET measurements. Consequently, we can define neither a control set free of AD pathology nor a training set with clear AD pathology. Because SuStaIn is an unsupervised method, however, it is sufficient to enrich the training set with participants showing early AD pathology for AD-related subtypes to emerge. Likewise, the control set is only used to standardize biomarkers and therefore needs only to be 'mostly healthy', as a small number of subjects exhibiting pathological aging is unlikely to substantially affect the z-scoring. We therefore use the following definitions for a 'healthy control' set and an 'AD-at-risk' training set.
The UKBB control set consisted of subjects who did not carry an APOE-ε4 allele, had no family history of dementia and no self-reported neurological or psychiatric disorders (see Table S1 in the supplementary materials for the list of disorders examined), and performed well in all the AD-related cognitive tests according to the following criteria: (1) no errors in the two trail making tests; (2) time to complete the trail making path less than the UKBB sample median; (3) mean time to correctly identify items in the reaction time test less than the UKBB sample median; (4) correct recall in the UKBB PMT on the first attempt, with a time to answer less than the first decile of the UKBB imaging subset. The 'AD-at-risk' training population was defined as the union of two sets: (1) subjects with high genetic risk (i.e., at least one family member with dementia and homozygous for APOE-ε4); (2) subjects with high cognitive risk according to the following criteria: (i) more than four errors in trail making (the 90th percentile) and (ii) failure to recall correctly on the first or the second attempt in the UKBB PMT. Self-reported AD subjects were added to this population, and subjects with non-AD self-reported neurological or psychiatric disorders were excluded.
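Because the UKBB definitions combine many conditions, a sketch helps make them unambiguous. This is an illustration under stated assumptions, not the exact pipeline used here: all argument names are hypothetical (they are not UKBB field IDs), trail-making errors are assumed to be summed over tests A and B, trail-making time is taken as a single value per subject, and the exclusion of non-AD neurological or psychiatric disorders is applied to the whole at-risk union.

```python
import numpy as np


def ukbb_groups(apoe4, fam_hist, other_neuro, self_ad, tmt_errors,
                tmt_time, rt_mean, pmt_first_ok, pmt_second_ok, pmt_time):
    """Return boolean masks (control, at_risk) for a UKBB imaging sample.

    All arguments are equal-length arrays; the names are hypothetical:
      apoe4          number of APOE-e4 alleles (0, 1 or 2)
      fam_hist       at least one family member with dementia
      other_neuro    self-reported neurological/psychiatric disorder other than AD
      self_ad        self-reported AD
      tmt_errors     errors summed over Trail Making Tests A and B (assumption)
      tmt_time       trail-making completion time
      rt_mean        mean time to correctly identify matches in the RT test
      pmt_first_ok   correct PMT recall on the first attempt
      pmt_second_ok  correct PMT recall on the second attempt
      pmt_time       time taken to answer the PMT question
    Medians and deciles are computed over the arrays passed in.
    """
    apoe4 = np.asarray(apoe4)
    fam_hist, other_neuro, self_ad, pmt_first_ok, pmt_second_ok = (
        np.asarray(x, dtype=bool)
        for x in (fam_hist, other_neuro, self_ad, pmt_first_ok, pmt_second_ok))
    tmt_errors, tmt_time, rt_mean, pmt_time = (
        np.asarray(x, dtype=float)
        for x in (tmt_errors, tmt_time, rt_mean, pmt_time))

    # Healthy control: no genetic, family or self-reported risk ...
    no_risk = (apoe4 == 0) & ~fam_hist & ~other_neuro & ~self_ad
    # ... and good performance on all four cognitive criteria (1)-(4).
    good_cognition = (
        (tmt_errors == 0)
        & (tmt_time < np.median(tmt_time))
        & (rt_mean < np.median(rt_mean))
        & pmt_first_ok
        & (pmt_time < np.quantile(pmt_time, 0.1))
    )
    control = no_risk & good_cognition

    # AD at-risk: genetic risk OR cognitive risk OR self-reported AD,
    # then excluding other self-reported neurological/psychiatric disorders.
    genetic_risk = fam_hist & (apoe4 == 2)
    cognitive_risk = (tmt_errors > 4) & ~pmt_first_ok & ~pmt_second_ok
    at_risk = (genetic_risk | cognitive_risk | self_ad) & ~other_neuro
    return control, at_risk
```

Note that the two masks are not complementary: most UKBB participants belong to neither group and are used only when a trained model is applied.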

Data harmonization
Applying a machine learning model trained on one dataset to a new dataset generated under different conditions (scanner hardware, scanning protocols, scan site) is hampered by shifts in the distribution of the derived features. In our case, both the ADNI and the UKBB T1-weighted brain MRI data were acquired on 3T scanners. However, the ADNI data were processed with FreeSurfer 5.1, while the UKBB data were processed with FreeSurfer 6.0 and FreeSurfer 7.0. Both datasets provide the same 80 cortical and subcortical volume metrics. In addition, there are biological differences; e.g., the ADNI cohort is older on average (see Table 1 for details of the demographics). Our first goal was to harmonize the 80 cortical and subcortical volume metrics so as to remove cohort differences due to variations in acquisition and processing while preserving biological information. Of note, the ADNI cohort itself spans different sites and scanners, and previous work demonstrated that harmonization across sites improves machine learning models (Chen et al., 2020).
We selected the ComBat harmonization technique to eliminate the cohort effect, following an analysis similar to that suggested by Fortin et al. (2018) (see Supplementary materials).
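For readers unfamiliar with ComBat, the sketch below shows its location/scale core: cohort (batch) effects are estimated jointly with the effects of the biological covariates, removed on a standardized scale, and the covariate effects are then added back. It is a simplified version, not a reference implementation: full ComBat (Johnson et al., 2007) additionally shrinks the per-batch estimates with empirical Bayes priors, which this sketch omits. The function name and argument layout are hypothetical.

```python
import numpy as np


def combat_location_scale(Y, batch, covariates):
    """Simplified ComBat-style harmonization WITHOUT empirical Bayes shrinkage.

    Y          (n_subjects, n_features) regional volumes
    batch      (n_subjects,) cohort label, e.g. "ADNI" or "UKBB"
    covariates (n_subjects, n_covariates) biological covariates to preserve
               (e.g. age, sex); no intercept column.
    """
    Y = np.asarray(Y, dtype=float)
    batch = np.asarray(batch)
    X = np.asarray(covariates, dtype=float)
    levels = np.unique(batch)
    B = (batch[:, None] == levels[None, :]).astype(float)  # one-hot batches

    # Joint OLS fit: batch indicators act as per-batch intercepts.
    D = np.hstack([B, X])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    gamma_hat, beta_hat = coef[:len(levels)], coef[len(levels):]
    # Grand mean: sample-size-weighted average of the batch intercepts.
    alpha_hat = (B.sum(axis=0) / len(Y)) @ gamma_hat
    sigma = np.sqrt(((Y - D @ coef) ** 2).mean(axis=0))  # pooled residual SD

    # Standardize: remove grand mean and covariate effects, scale by pooled SD.
    Z = (Y - alpha_hat - X @ beta_hat) / sigma
    Z_adj = np.empty_like(Z)
    for lev in levels:
        idx = batch == lev
        shift = Z[idx].mean(axis=0)          # additive batch effect
        scale = Z[idx].std(axis=0, ddof=1)   # multiplicative batch effect
        Z_adj[idx] = (Z[idx] - shift) / scale
    # Back to the original scale, re-adding the preserved covariate effects.
    return Z_adj * sigma + alpha_hat + X @ beta_hat
```

After this adjustment, every cohort shares the same mean and variance on the standardized scale, which is the property that separates ComBat from a mean-only correction.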

Data pre-processing
The computational time complexity of the SuStaIn algorithm scales exponentially with the number of input features. Thus, to make computations feasible, the 80 cortical and subcortical regional volumes from the Desikan-Killiany atlas were grouped into 13 regions (accumbens area, amygdala, caudate, hippocampus, insula, pallidum, putamen, thalamus, cingulate, frontal lobe, parietal lobe, temporal lobe, occipital lobe) according to the lobe groupings provided by the FreeSurferWiki (https://surfer.nmr.mgh.harvard.edu/fswiki/CorticalParcellation). For each of the 13 regional volumes, we removed the effects of age, sex and ICV using a linear regression model. We then expressed these variables as z-scores relative to the cohort's control population (see above). The signs of the z-scores were adjusted such that increasing values correspond to increasing pathology.
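The adjustment and z-scoring steps can be sketched as follows. The text does not state which subjects the confound regression is fitted on; this sketch assumes it is fitted on the control set and then applied to everyone, and that the 13 summed regional volumes are already available. The function name is hypothetical.

```python
import numpy as np


def adjust_and_zscore(volumes, confounds, is_control):
    """Regress out confounds and express volumes as control-referenced z-scores.

    volumes    (n_subjects, n_regions) summed regional volumes
    confounds  (n_subjects, 3) age, sex (coded 0/1) and ICV
    is_control (n_subjects,) boolean mask of the cohort's control set
    Assumption: the regression is fitted on controls only.
    """
    V = np.asarray(volumes, dtype=float)
    C = np.column_stack([np.ones(len(V)), np.asarray(confounds, dtype=float)])
    ctrl = np.asarray(is_control, dtype=bool)

    # Linear model of each region on intercept + age + sex + ICV (controls).
    beta, *_ = np.linalg.lstsq(C[ctrl], V[ctrl], rcond=None)
    resid = V - C @ beta

    # z-score relative to the control distribution of the residuals.
    z = (resid - resid[ctrl].mean(axis=0)) / resid[ctrl].std(axis=0, ddof=1)
    # Smaller volume means more atrophy, so flip the sign: larger z = worse.
    return -z
```

Fitting on controls keeps disease-related atrophy from being absorbed into the confound coefficients, which is one reason this choice is common in event-based modelling.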

Subtype and stage inference (SuStaIn)
The SuStaIn algorithm, introduced by Young et al. (2018), uses unsupervised learning to uncover data-driven disease subtypes with distinct temporal progression patterns. SuStaIn is a continuous generalisation of the event-based model (EBM; Fonteijn et al., 2012; Young et al., 2014). While the EBM describes disease progression as a single event sequence $(E_1, E_2, \ldots, E_N)$, where each event $E_i$ is a biomarker turning from normal to abnormal, SuStaIn allows for multiple event sequences. In SuStaIn, each event is a biomarker's value reaching or exceeding a z-score cut-off, each sequence defines a disease progression subtype, and the position in an event sequence (the number of accumulated events) defines the disease progression stage. During training, SuStaIn simultaneously estimates subtype membership and subtype trajectories, together with their posterior distributions. In brief, one can think of SuStaIn as a clustering algorithm that takes the progressive nature of the disease into account.
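To make events and stages concrete, the sketch below builds, for one subtype, the expected z-score of every biomarker at each stage and scores a subject against each stage with a Gaussian likelihood. It is a deliberate simplification: once an event occurs, the biomarker's expected z-score steps up to the event threshold, whereas SuStaIn's z-score model (Young et al., 2018) uses a piecewise-linear trajectory between thresholds. Function names and the unit noise SD are assumptions.

```python
import numpy as np


def expected_z_by_stage(sequence, n_biomarkers):
    """Expected z-score of each biomarker at stages 0..N for one subtype.

    sequence: ordered list of (biomarker_index, z_threshold) events.
    Simplification: after an event occurs, the biomarker's expected z-score
    steps up to that threshold (SuStaIn itself interpolates linearly).
    """
    mu = np.zeros((len(sequence) + 1, n_biomarkers))
    for k, (b, threshold) in enumerate(sequence, start=1):
        # From stage k onwards, this event has occurred.
        mu[k:, b] = np.maximum(mu[k:, b], threshold)
    return mu


def stage_loglik(z, mu, sigma=1.0):
    """Gaussian log-likelihood of one subject's z-scores at every stage."""
    z = np.asarray(z, dtype=float)
    resid = (z - mu) / sigma
    return (-0.5 * resid**2 - np.log(sigma * np.sqrt(2 * np.pi))).sum(axis=1)
```

For a two-biomarker sequence `[(0, 1.0), (1, 1.0), (0, 2.0), (1, 2.0)]`, a subject with z-scores `[2.0, 1.0]` matches stage 3 exactly, so that stage has the highest log-likelihood.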
The uncertainty in the subtype progression patterns and in the proportion of individuals belonging to each subtype is estimated using a Markov Chain Monte Carlo (MCMC)-based procedure (Young et al., 2018). The event order of each subtype is visualized and examined in a positional variance diagram (PVD). For mathematical details, see the Supplementary Materials and Young et al. (2018).
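A PVD can be read as a matrix of frequencies over MCMC samples. Assuming the sampler returns one event order per iteration (this array layout is an assumption of the sketch, and the function name is hypothetical), the entry for event e and position p is the fraction of samples that place e at p:

```python
import numpy as np


def positional_variance(samples):
    """Positional variance matrix from MCMC samples of one subtype's sequence.

    samples: (n_samples, n_events) integer array; row s lists the event
    indices in the order sampled at MCMC iteration s.
    Returns P with P[e, p] = fraction of samples placing event e at position p.
    """
    samples = np.asarray(samples)
    n_samples, n_events = samples.shape
    P = np.zeros((n_events, n_events))
    positions = np.arange(n_events)
    for order in samples:
        # Each row is a permutation, so no index repeats within one update.
        P[order, positions] += 1
    return P / n_samples
```

Rows concentrated in a single column correspond to events whose position is certain; rows spread over many columns show the high positional variance typically seen at later stages.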
As for other clustering models, the number of subtypes in SuStaIn is a hyperparameter, and the optimal number is generally unknown beforehand. To find the optimal number of subtypes for a SuStaIn model, we used tenfold cross-validation to compute the Cross-Validation Information Criterion (CVIC) (Gelman et al., 2014), which captures out-of-sample generalizability. We selected the number of subtypes by comparing CVIC values and inspecting the PVDs.
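As a sketch of the selection step, assume (as we understand the criterion) that CVIC is -2 times the held-out log-likelihood summed over the ten folds, with lower values preferred. The function names are hypothetical, and the per-fold held-out log-likelihoods would come from a fitted SuStaIn implementation.

```python
import numpy as np


def cvic(test_loglik_per_fold):
    """CVIC from held-out log-likelihoods (lower is better).

    test_loglik_per_fold: one array per fold with the log-likelihood of each
    held-out subject under the model trained on the remaining folds.
    """
    return -2.0 * sum(float(np.sum(ll)) for ll in test_loglik_per_fold)


def select_n_subtypes(cvic_by_n):
    """Subtype count with the lowest CVIC; keys are numbers of subtypes."""
    return min(cvic_by_n, key=cvic_by_n.get)
```

The minimum is only a starting point: as described above, the PVDs were inspected as well, and a marginal CVIC improvement can be outweighed by two subtypes that turn out to be near-duplicates.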
To assign a subject to a subtype and stage, SuStaIn first evaluates the likelihood that the subject belongs to each subtype by summing over all disease stages, and chooses the subtype with the highest likelihood. It then selects the stage with the highest likelihood within that subtype. We expect low maximum subtype probabilities (i.e., low subtype certainty) for subjects assigned to stage zero or to late stages. Zero-stage subjects have no obvious abnormalities in the features, so there is not enough signal to confidently support any subtype. Likewise, subjects at late stages of disease progression have abnormalities distributed across nearly all brain regions, in which case the differences in MRI patterns between subtypes become less distinguishable.
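The two-step assignment can be sketched as below, assuming a matrix of per-subtype, per-stage log-likelihoods for one subject is already available. The uniform prior over stages and the optional subtype proportions are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def assign_subtype_and_stage(loglik, subtype_fractions=None):
    """Most likely subtype, stage and subtype probability for one subject.

    loglik: (n_subtypes, n_stages) array of log p(data | subtype, stage).
    subtype_fractions: prior subtype proportions (uniform if None).
    Assumes a uniform prior over stages.
    """
    loglik = np.asarray(loglik, dtype=float)
    n_subtypes, n_stages = loglik.shape
    if subtype_fractions is None:
        subtype_fractions = np.full(n_subtypes, 1.0 / n_subtypes)
    # Step 1: marginalise over stages, weight by the subtype prior, normalise.
    log_evidence = (logsumexp(loglik, axis=1) - np.log(n_stages)
                    + np.log(subtype_fractions))
    prob = np.exp(log_evidence - logsumexp(log_evidence))
    subtype = int(np.argmax(prob))
    # Step 2: most likely stage within the chosen subtype.
    stage = int(np.argmax(loglik[subtype]))
    return subtype, stage, float(prob[subtype])
```

In the z-score model, the expected biomarker values at stage zero are all zero regardless of subtype, so a subject whose best fit is stage zero has nearly equal evidence for every subtype and hence a low subtype probability, matching the behaviour described above.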

SuStaIn models
We applied SuStaIn to each of the four training populations: ADNI, harmonized ADNI, UKBB and harmonized UKBB. The z-score events and the distributions of biomarkers in the four training datasets are listed in Supplementary Table S2. For each training population, SuStaIn models were fitted with up to five subtypes, yielding four SuStaIn models: ADNI SuStaIn, Harmonized ADNI SuStaIn, UKBB SuStaIn and Harmonized UKBB SuStaIn.

SuStaIn subtyping and staging
For each subject in the ADNI and UKBB datasets, we assigned subjects to subtypes and stages predicted by three SuStaIn models ( Fig. 1 ).For example, for a subject in the ADNI dataset, subtype and stage assignment was computed using ADNI SuStaIn, Harmonized ADNI SuStaIn and Harmonized UKBB SuStaIn, respectively.To quantify transferability, we measured the consistency of subtype and stage assignments by different models.

Association studies
In our association study, we focused on individuals whose atrophy patterns can be reliably assigned to a SuStaIn subtype, i.e., with a non-zero stage and high subtype certainty (Prob > 0.8) under a model, and we compared the individuals' subtype and stage assignments under different SuStaIn models. Additionally, we repeated the analysis at a lower subtype certainty threshold (Prob > 0.5).
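The subtype consistency measure and the silver-standard filter can be combined as in the sketch below. The `require` options mirror the variants reported in the Results (silver standard under either model, under one specific model, or under both); the function names are hypothetical.

```python
import numpy as np


def silver_standard(stage, prob, threshold=0.8):
    """Non-zero stage and subtype probability above the threshold."""
    return (np.asarray(stage) > 0) & (np.asarray(prob, dtype=float) > threshold)


def subtype_consistency(sub_a, sub_b, ok_a, ok_b, require="either"):
    """Count subjects with identical subtype labels under models A and B.

    ok_a, ok_b: silver-standard masks under each model.
    require: which model(s) must meet the silver standard for a subject
             to be counted: 'either', 'both', 'a' or 'b'.
    Returns (n_agree, n_counted); consistency = n_agree / n_counted.
    """
    ok_a, ok_b = np.asarray(ok_a, dtype=bool), np.asarray(ok_b, dtype=bool)
    counted = {"either": ok_a | ok_b, "both": ok_a & ok_b,
               "a": ok_a, "b": ok_b}[require]
    agree = np.asarray(sub_a)[counted] == np.asarray(sub_b)[counted]
    return int(agree.sum()), int(counted.sum())
```

Passing `threshold=0.5` to `silver_standard` gives the more lenient variant of the analysis.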
We investigated the hypothesis that the AD atrophy subtypes identified by SuStaIn are associated with risk factors and biomarkers including age, cerebrospinal fluid (CSF) biomarkers and medication history. We applied the Kruskal-Wallis test to study the association of SuStaIn subtypes with age and CSF biomarkers. In addition, we used linear regression to adjust for age and sex when testing for associations between subtype and CSF biomarkers. Chi-squared tests, contingency tables and logistic regressions were used to assess the associations of SuStaIn subtypes with cholesterol and high blood pressure medication use, before and after controlling for age and sex.
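A minimal sketch of the unadjusted tests with SciPy (`scipy.stats.kruskal` and `scipy.stats.chi2_contingency`); the helper names are hypothetical. The age- and sex-adjusted analyses (linear and logistic regression) would need a regression library such as statsmodels and are not shown.

```python
import numpy as np
from scipy.stats import chi2_contingency, kruskal


def continuous_by_subtype(values, subtype):
    """Kruskal-Wallis H-test of a continuous variable (age, CSF) across subtypes."""
    values, subtype = np.asarray(values, dtype=float), np.asarray(subtype)
    groups = [values[subtype == s] for s in np.unique(subtype)]
    return kruskal(*groups)


def medication_by_subtype(uses_med, subtype):
    """Chi-squared test on the subtype x medication-use contingency table."""
    uses_med, subtype = np.asarray(uses_med, dtype=bool), np.asarray(subtype)
    table = np.array([[np.sum((subtype == s) & uses_med),
                       np.sum((subtype == s) & ~uses_med)]
                      for s in np.unique(subtype)])
    chi2, p, dof, _ = chi2_contingency(table)
    return table, chi2, p, dof
```

Each row of the returned table holds, for one subtype, the number of medication users followed by the number of non-users.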

Results
Our ADNI sample consists of 795 subjects in total (183 CN, 86 SMC, 244 EMCI, 164 LMCI and 118 AD at baseline). Of these, 163 amyloid-negative SMC and CN subjects formed the control set, and the remaining 632 subjects were used for SuStaIn model training. In the UKBB dataset, 36,494 subjects had MRI scans. According to our training and control set criteria, 616 subjects were selected as the training population for SuStaIn model fitting (age range 48 to 81 years, median 68; 319 females; 288 had at least one family member with dementia and were APOE-ε4 homozygotes; 333 performed poorly in the AD-related cognitive tests), and 262 subjects (age range 48 to 77 years, median 65; 117 females; no family member with dementia and no APOE-ε4 allele) formed the control set (Table 1).
Our data harmonization analysis showed that all three harmonization techniques performed well in reducing site effects. Among them, ComBat had the additional advantage of also harmonizing the variances of the features (Figs. S1, S2, Tables S3-S10).

Three consistent subtypes with distinct atrophy patterns were identified from ADNI and UKBB before and after data harmonization
We first investigated whether harmonization affects the models estimated on the same dataset, and whether training on a disease group (ADNI) and on an AD-at-risk group (UKBB) results in similar SuStaIn models. We summarise the findings in Fig. 2, which shows the PVDs of the SuStaIn models with 13 biomarkers.
For the ADNI model, the CVIC suggested three subtypes, in line with the original work (Young et al., 2018). For the UKBB model, the CVIC suggested four subtypes (Fig. S1). However, two of the subtypes in the UKBB model were nearly identical (Fig. S2), as supported by a Bhattacharyya coefficient (BC) of 0.99 between these subtypes (a BC of 1.0 indicates identical distributions). Thus, we also selected three subtypes for the UKBB models. Moreover, having the same number of subtypes simplifies the comparison between ADNI and UKBB subtypes, which is a key objective of this work.
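The Bhattacharyya coefficient between two discrete distributions p and q is the sum over i of sqrt(p_i q_i); it equals 1 for identical distributions and 0 for non-overlapping ones. The text does not specify which distributions were compared, so the sketch simply normalizes two non-negative arrays (for example, two subtypes' positional variance matrices, which is an assumption on our part) before computing the coefficient. The function name is hypothetical.

```python
import numpy as np


def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient of two non-negative arrays after normalization.

    Arrays of any shape (e.g. positional variance matrices) are flattened
    and rescaled to sum to one before comparison.
    """
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

A value of 0.99, as reported for the two near-duplicate UKBB subtypes, means their distributions overlap almost completely.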
These three subtypes agree with the original analysis by Young et al. (2018) : the 'typical' subtype has atrophy starting in the hippocampus and amygdala, atrophy in the 'cortical' subtype originates in the lobes, cingulate and insula, while in the 'subcortical' subtype atrophy is first observed in the pallidum, putamen and caudate.
Harmonization did not change the subtype patterns within a dataset, as can be seen from the almost identical PVDs for each dataset before and after harmonization in Fig. 2. We also noted that, for all subtypes of every SuStaIn model, the positional variance, which reflects the level of uncertainty, increases as the disease progresses to later stages; this increases the staging uncertainty of the models towards the end of the disease trajectory. Fig. 3 depicts the disease progression pattern of each subtype in the Harmonized ADNI SuStaIn and Harmonized UKBB SuStaIn models, rendered on brains. Further, Fig. 4 shows the correlation of the most likely biomarker ordering of each subtype between the models trained on ADNI and UKBB data. The correlation was stronger for the typical (r = 0.85) and cortical (r = 0.91) subtypes than for the subcortical subtype (r = 0.61).
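One way to compute such a correlation, assuming both models order the same set of events, is to correlate each event's position in one ordering with its position in the other; because positions are ranks without ties, the Pearson correlation here equals Spearman's rho. Whether Fig. 4 used exactly this statistic is an assumption, and the function name is hypothetical.

```python
import numpy as np


def ordering_correlation(order_a, order_b):
    """Correlation between the positions of the same events in two orderings."""
    order_a, order_b = list(order_a), list(order_b)
    position_in_b = {event: i for i, event in enumerate(order_b)}
    x = np.arange(len(order_a))                       # positions in ordering A
    y = np.array([position_in_b[e] for e in order_a])  # same events in B
    return float(np.corrcoef(x, y)[0, 1])
```

Swapping the first two of three events gives r = 0.5, while fully reversing the order gives r = -1.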

Individual subtype and stage assignments under different SuStaIn models were highly consistent
After observing that the estimated models are similar between datasets and are not affected by harmonization, we investigated whether the subtype and stage assignments are consistent on the subject level.A high degree of agreement in both dimensions indicates a successful model transfer.

High subtype consistency
Over 91% of subject assignments to subtypes are consistent across three models built on different datasets.
Table 2 shows the subtype consistency for ADNI participants between the Harmonized ADNI and Harmonized UKBB models. As a silver standard, we focused on the non-zero-stage population with high-certainty subtype assignments (> 0.8) under a model. Of the 320 subjects that meet the silver standard (i.e., non-zero stage with a high-certainty subtype assignment) under at least one of the two models, 292 have identical subtype assignments under the two models, a consistency of 91%. Requiring the silver standard only under the Harmonized ADNI (UKBB) model, the consistency increases to 96% (92%). Further, when requiring the silver standard under both models, only one of 153 participants received a discordant subtype assignment.
Similarly, for UKBB subjects, Table 2 shows a 95% subtype consistency between the two harmonized models. Of the 9865 subjects that meet the silver standard under at least one model, 9356 have identical subtype assignments under the two models (95% consistency). Requiring the silver standard only under the Harmonized ADNI (UKBB) model, the consistency increases to 98% (95%). Further, when requiring the silver standard under both models, only 12 of 4845 participants received a discordant subtype assignment (99.8% consistency).
Using the lower subtype certainty cutoff of 0.5 doubled the number of subtyped participants in ADNI and tripled it in UKBB. While the consistency decreased by about 10%, it remained above 80% (Table S12).
To further highlight the importance of data harmonization when applying a SuStaIn model trained on one dataset to another, we examined the UKBB individual subtype and stage assignments under the ADNI SuStaIn model without any data harmonization (Table S3). Only 4% of the total population (1449 subjects) were staged as non-zero, and 1.7% (631 subjects) received a non-zero stage with high subtype certainty, compared to 19% (6956 subjects) when using harmonization. Only 424 of 631 (67%) participants received subtype assignments concordant with the UKBB SuStaIn model, compared to 6838 of 6956 (98%) when using harmonization. Thus, harmonization increases the fraction of participants successfully subtyped by the transferred model, and it increases the concordance between subtype assignments by the original and by the transferred models.
Fig. 2. Three consistent subtypes were identified by all SuStaIn models. In the positional variance diagrams (PVDs) of the models, each row corresponds to a biomarker (regional brain volume) and each coloured entry marks the probability that the biomarker has surpassed an event (here: z-score) threshold. The figures display z-score thresholds of 1.0 (red), 2.0 (magenta) and 3.0 (blue). A higher z-score reflects a more severe abnormality of the biomarker, while a low positional variance, shown as a clearer colour and typically observed at the early stages of a diagram, indicates a higher degree of certainty. (2) Indistinguishable patterns between subtypes in the late stages of the disease trajectory, evidenced by the increasing positional variance as the disease progresses to later stages; (3) focusing on the early stages, three consistent subtypes were identified by all SuStaIn models: the 'typical' subtype has atrophy starting in the hippocampus and amygdala, atrophy in the 'cortical' subtype originates in the lobes, cingulate and insula, while in the 'subcortical' subtype atrophy is first observed in the pallidum, putamen and caudate.

High stage consistency
Fig. 5 shows the staging consistency across the models for all 795 ADNI and all 36,493 UKBB subjects, regardless of their subtype certainty. Harmonization did not influence staging with the ADNI model (r² = 0.999). By comparison, when the unharmonized ADNI model is used to stage UKBB participants, the correlation with the UKBB SuStaIn staging drops to r² = 0.28. Further, on average, ADNI subjects received higher stages from the Harmonized UKBB model than from the Harmonized ADNI model (regression slope > 1), while the reverse was observed for UKBB subjects (regression slope < 1).
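The r² and regression slope used above can be obtained with NumPy. This sketch assumes the stages from the reference model are on the x-axis, so a slope above 1 means the other model assigns later stages on average; the axis convention of Fig. 5 is not stated in the text, and the function name is hypothetical.

```python
import numpy as np


def stage_agreement(stage_ref, stage_other):
    """Squared Pearson correlation and OLS slope of stage_other on stage_ref."""
    x = np.asarray(stage_ref, dtype=float)
    y = np.asarray(stage_other, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope, _intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    return float(r**2), float(slope)
```

Note that r² and the slope answer different questions: stages can be perfectly correlated (r² = 1) while one model systematically assigns later stages (slope > 1).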
In contrast to Fig. 5, which uses all available subjects, Fig. 6 compares the Harmonized ADNI and Harmonized UKBB models for subjects with a non-zero stage and a high-certainty subtype assignment (> 0.8) under the Harmonized ADNI model, and suggests comparable consistency between these two models.

Participants' atrophy subtypes were found to be associated with age, CSF biomarkers, and cholesterol and high blood pressure medications
The previous results demonstrated that harmonization enables us to transfer models across datasets. Next, we investigated how risk factors and biomarkers are associated with subtypes under the harmonized SuStaIn models. Table 3 gives an overview of the demographics of the ADNI and UKBB subjects assigned to each subtype that meet the silver standard under the Harmonized ADNI SuStaIn model (i.e., non-zero stage and subtype certainty > 0.8). The association study population comprises 219 subjects in the ADNI dataset and 6956 subjects in the UKBB dataset. At the more lenient subtype certainty cutoff of 0.5, there were 455 and 17,315 subtyped subjects in ADNI and UKBB, respectively.
The average age is highest in the typical subtype and lowest in the subcortical subtype. In both the ADNI and UKBB datasets, there was a statistically significant age difference between the typical and the subcortical subtypes (p = 0.001 in ADNI and p < 0.0001 in UKBB; Fig. 7, top row), with subjects assigned to the typical subtype being older than those assigned to the subcortical subtype. Similar results hold for subjects that meet the silver standard under the Harmonized UKBB SuStaIn model (Fig. 7, bottom row).
Subjects assigned to the typical subtype exhibit lower Aβ and higher tau and p-tau values. We further compared the 185 (82 female) ADNI subjects with CSF measures, comprising 99 subjects in the typical subtype, 65 in the cortical subtype and 21 in the subcortical subtype. The Kruskal-Wallis H-test showed significant differences amongst the subtypes, and subjects in the typical subtype had more AD-like Aβ and tau measures (Fig. 8, top row). These results remained significant after controlling for age and sex in a regression model: Aβ (p = 0.014), tau (p = 0.026) and p-tau (p = 0.001). Conclusions remained unchanged when using the Harmonized UKBB SuStaIn model to subtype the ADNI subjects (Fig. 8, bottom row). Furthermore, the biomarker associations were not sensitive to the subtype certainty threshold (Table S13).
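The two-step analysis above (an omnibus Kruskal-Wallis test across subtypes, then a regression adjusted for age and sex) can be sketched as follows. The text does not specify the regression model beyond its covariates, so this sketch assumes ordinary least squares with subtype indicator variables; the function names, the reference-subtype choice and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def kruskal_by_subtype(values, subtypes):
    """Kruskal-Wallis H-test of a biomarker across subtype groups."""
    groups = [values[subtypes == s] for s in np.unique(subtypes)]
    return stats.kruskal(*groups)


def adjusted_subtype_effects(values, subtypes, age, sex, reference):
    """OLS of biomarker ~ subtype indicators + age + sex.

    Returns {subtype: (coefficient, two-sided p-value)} for each subtype
    relative to `reference`.
    """
    levels = [s for s in np.unique(subtypes) if s != reference]
    X = np.column_stack(
        [np.ones(len(values))]
        + [(subtypes == s).astype(float) for s in levels]
        + [age, sex]
    )
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    resid = values - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return {s: (beta[i + 1], p[i + 1]) for i, s in enumerate(levels)}


# Synthetic data: subtype 0 ("typical") has lower Abeta than subtypes 1 and 2
rng = np.random.default_rng(0)
subtypes = np.repeat([0, 1, 2], 30)
age = rng.normal(73, 7, subtypes.size)
sex = rng.integers(0, 2, subtypes.size).astype(float)
abeta = 1000 - 300 * (subtypes == 0) + rng.normal(0, 100, subtypes.size)

h = kruskal_by_subtype(abeta, subtypes)
effects = adjusted_subtype_effects(abeta, subtypes, age, sex, reference=1)
print(h.pvalue, effects[0])
```

The Kruskal-Wallis test is rank-based and makes no normality assumption, whereas the adjusted step assumes a linear model; a statistics package would normally be used for the latter.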
Compared with the subcortical subtype, subjects in the cortical subtype are more likely to use cholesterol and high blood pressure medications. SuStaIn subtypes (as assigned by the Harmonized ADNI model) differed in both cholesterol and high blood pressure medication use (Table 4; p < 0.0001). Specifically, in regression models adjusted for age and sex, the subcortical subtype had a lower proportion of users of both medications than the cortical subtype (cholesterol: p = 0.005, OR = 0.82; blood pressure: p < 0.0001, OR = 0.75). The same holds for the typical subtype compared to the cortical subtype (cholesterol: p = 0.012, OR = 0.80; blood pressure: p = 0.001, OR = 0.74). Similar observations were made when using the Harmonized UKBB model for subtyping (Table 4). However, with that model, only the association with blood pressure medication in the subcortical subtype remained significant after controlling for age and sex (p < 0.0001, OR = 0.32). Again, the subtype certainty threshold did not affect these conclusions (Table S14).
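The odds ratios above come from logistic regressions adjusted for age and sex. To make the computation concrete, here is a minimal sketch that fits such a model with Newton-Raphson (IRLS) in NumPy and reports each subtype's odds ratio as the exponentiated coefficient. It assumes binary medication use, subtype labels, age and sex as arrays; the function names are hypothetical, and p-values, which a statistics package would also report, are omitted.

```python
import numpy as np


def logistic_newton(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson (IRLS); returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
        w = mu * (1.0 - mu)                        # IRLS weights
        grad = X.T @ (y - mu)                      # score vector
        hess = X.T @ (X * w[:, None])              # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def adjusted_odds_ratios(uses_med, subtypes, age, sex, reference):
    """Odds ratios of medication use for each subtype vs `reference`, adjusted for age and sex."""
    levels = [s for s in np.unique(subtypes) if s != reference]
    age_c = np.asarray(age, dtype=float) - np.mean(age)   # centring aids convergence
    X = np.column_stack(
        [np.ones(len(uses_med))]
        + [(subtypes == s).astype(float) for s in levels]
        + [age_c, np.asarray(sex, dtype=float)]
    )
    beta = logistic_newton(X, np.asarray(uses_med, dtype=float))
    return {s: float(np.exp(beta[i + 1])) for i, s in enumerate(levels)}


# Sanity check on a 2x2 table: 30/100 users in group 0, 20/100 users in group 1
group = np.repeat([0.0, 1.0], 100)
y = np.concatenate([np.repeat([1.0, 0.0], [30, 70]), np.repeat([1.0, 0.0], [20, 80])])
beta_2x2 = logistic_newton(np.column_stack([np.ones(200), group]), y)
print(np.exp(beta_2x2[1]), (20 * 70) / (80 * 30))
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2x2 table, which is a convenient check; an OR below one (as reported for the subcortical subtype) means lower odds of medication use than in the reference subtype.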

Discussion
In this work, we presented results of building subtyping and staging models across ADNI and UKBB, which represent different time windows of the disease, to better investigate the link between AD atrophy subtypes and early risk factors. We proposed that a subtype model (i.e., a SuStaIn model) first be built on the dataset with detailed disease-related information (e.g., ADNI), and then applied to the UKBB dataset for individual subtyping and for investigating early risk factor associations.
Previous studies have pointed out that ADNI participants are not representative of the wider general population, so that data-driven studies on ADNI data may not necessarily generalize (Veitch et al., 2021). The impact of heterogeneity across cohort datasets on the reproducibility of data-driven results has also been discussed (Birkenbihl et al., 2021). Our findings address several practical questions raised in training and transferring data-driven disease progression models across datasets: (1) consistent subtypes emerged from the ADNI cohort and the UKBB AD-at-risk cohort, (2) ComBat, a data harmonization technique, helped to remove cohort effects on biomarkers for these two datasets, and (3) a model trained on ADNI can be robustly applied to UKBB after appropriate data harmonization.
Across various pre-processing settings and datasets, SuStaIn identified three subtypes characterised by distinct temporal patterns of grey matter volume change. These subtypes differ in their origin of atrophy and are consistent with previous analyses of the ADNI dataset (Young et al., 2018). The subtypes are referred to by their characteristic pattern of temporal progression as 'typical', 'cortical' and 'subcortical'. Their cognitive and biomarker profiles have been explored in depth in research datasets as well as in clinical datasets (Archetti et al., 2021; Young et al., 2018).
One interesting finding is that the SuStaIn model built on the AD-at-risk population constructed from the UKBB dataset yielded the same atrophy subtypes as the model built on ADNI, albeit with a different prevalence between the cohorts. This is remarkable, since there were no subjects with a clinical AD diagnosis in the UKBB training set. This finding first supports the existence of the subtypes per se and, second, suggests that different atrophy patterns may already be observable as they emerge during the prodromal phase of Alzheimer's disease.
Transferring machine learning models to novel datasets is in general an ongoing challenge due to biases and covariate shifts that arise during data acquisition. A recent study demonstrated that SuStaIn could be transferred from ADNI to a mixture of clinical and pre-clinical cohorts including OASIS, Pharma-Cog and ViTA (Archetti et al., 2021). Unlike OASIS, a relatively small disease cohort that, like ADNI, covers the full spectrum of disease progression, UKBB is a large dataset that represents a much earlier time window, even before the disease starts to emerge. Thus, our work suggests a possible route to further investigating the link between AD atrophy subtypes and early risk factors collected in large population cohorts such as UKBB.
To facilitate the analysis across ADNI and UK Biobank, we applied ComBat, a data harmonization technique, to the two datasets, and we demonstrated that harmonization is important for successfully transferring the trained models. In our harmonization analysis, we studied the impact of cohort (ADNI vs UKBB), where cohort differences originate from both biological variation (age, disease status, etc.) and technical variation (scanner manufacturer, scanning protocol, processing pipelines). Since FreeSurfer version and cohort map one-to-one in this study, the impact of differing FreeSurfer versions is addressed as part of our data harmonization study (see the supplementary materials, e.g., Fig. S3). Adding harmonization to the biomarker pre-processing pipeline did not affect the estimated models: models estimated on the same dataset before and after harmonization were almost identical (99% similarity). This was expected, since SuStaIn converts biomarker values to z-scores with reference to a control population; thus, any shift or scaling applied to each feature is removed and the resulting models are identical. However, harmonization removes cohort-specific bias and therefore mainly enables the successful transfer of the SuStaIn model to external datasets, according to our definition of a successful transfer as outlined in the Introduction section. We further showed that harmonization was essential to achieve high consistency in individuals' subtype and stage assignments across the different models. While subject-level subtype consistency was generally very high across experiments (> 91%), the correlation of subject-level stages was lower (e.g., r² = 0.68 on the UKBB data), although this was a considerable improvement over using unharmonized models (r² = 0.28). The lower staging agreement has two main sources. Firstly, although the progression sequences estimated for each subtype from ADNI and UKBB were similar, they were not identical (Fig. 4). Secondly, discordant subtyping by the two models negatively impacts the agreement between stage assignments.
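The argument that harmonization cannot change a model fitted within one dataset rests on the z-scoring step: a shift and positive rescaling of a feature changes both the feature and its control mean and SD in the same way. The sketch below checks this numerically; the function name, volumes and control split are hypothetical, and the invariance is exact only for a transformation shared by all subjects of the dataset (harmonization terms that depend on subject-level covariates are not removed exactly by this step).

```python
import numpy as np


def zscore_to_controls(x, is_control):
    """Express a biomarker as z-scores relative to the control group."""
    mu = x[is_control].mean()
    sd = x[is_control].std(ddof=1)
    return (x - mu) / sd


rng = np.random.default_rng(42)
volume = rng.normal(4000.0, 400.0, size=200)   # hypothetical regional volumes (mm^3)
is_control = np.arange(200) < 80                # first 80 subjects act as controls

# A per-feature shift and positive rescaling applied to the whole dataset
shifted = 0.9 * volume + 150.0

z_before = zscore_to_controls(volume, is_control)
z_after = zscore_to_controls(shifted, is_control)
print(np.allclose(z_before, z_after))
```

Because the z-scores, and hence the SuStaIn likelihood, are unchanged, harmonization matters only when a model fitted on one dataset is applied to another, which is the transfer setting studied here.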
A further alternative is to combine the harmonized datasets prior to model training.Such a model could potentially add value to situations where there are not enough early-stage subjects in the disease cohort.Indeed, when testing this approach, we observed that (1) the three subtypes are consistent with other models, and (2) stage assignments are highly consistent with both the Harmonized ADNI SuStaIn and Harmonized UKBB models (data not shown).
Linking the patterns of MRI abnormality in the subtypes to clinical phenotypes, other biomarkers, comorbidities, and genetic and lifestyle factors could provide new insights into disease mechanisms. Understanding how these factors are associated with subtype emergence may thus aid in stratifying patients at early stages for precision medicine and for multidomain interventional studies such as the FINGER trial (Ngandu et al., 2015). We investigated the associations of subtypes with a few selected variables from ADNI and from UK Biobank. In ADNI, we focused on associations with other known AD-related biomarkers, such as CSF levels of Aβ and tau. Our results suggested that the typical subtype was associated with statistically worse CSF biomarker values (i.e., more AD-like) than the other two subtypes, before and after controlling for age and sex. This finding is in line with previous observations that MCI subjects assigned to the typical subtype have the highest risk of progressing from MCI to AD (Young et al., 2018).

Table 4
Subjects in the cortical subtype are more likely to use cholesterol and high blood pressure medications than those in the subcortical subtype. Results remain significant after controlling for age and sex. For the non-zero-stage UKBB subjects with high subtype certainty (> 0.8) under the Harmonized ADNI model (6956 subjects), there were significant differences between SuStaIn subtypes in both cholesterol and high blood pressure medication use (p < 0.001). Specifically, in regression models adjusted for age and sex, the subcortical subtype had a lower proportion of users of both medications than the cortical subtype (cholesterol: p = 0.005; blood pressure: p < 0.001). The same held true for the typical subtype compared to the cortical subtype (cholesterol: p = 0.012; blood pressure: p < 0.001). Some of these observations were also made with the Harmonized UKBB and UKBB models, although after controlling for age and sex, the associations were only significant for blood pressure medication in the subcortical subtype (p < 0.001). Significant results related to subtype assignments are highlighted in bold font. The column 'status' indicates medication use (0 = no, 1 = yes). Percentages refer to the medication users per subtype.

In UKBB, we focused on variables that may contribute to subtype emergence but are not readily available in ADNI, which showcases the benefit of transferring SuStaIn models. Our results suggested that the subcortical subtype was associated with a statistically lower likelihood of cholesterol and high blood pressure medication use than the cortical subtype, before and after controlling for age and sex.
While early hippocampal involvement in the typical subtype may explain its more severe CSF biomarker pathology, the reason why participants assigned to the cortical subtype are more likely to use cholesterol and high blood pressure medications is unclear. Previous studies have demonstrated a link between cholesterol and AD pathology (Di Paolo and Kim, 2011), and statins, a class of cholesterol-lowering drugs, are routinely used in the treatment of AD patients. Likewise, elevated midlife blood pressure has been associated with increased atrophy in brain regions linked to AD (Lane et al., 2019; Power et al., 2016). More broadly, hypertension has been identified as one of the modifiable midlife risk factors contributing to AD (Livingston et al., 2020). One potential explanation for the links between cholesterol, high blood pressure and AD subtypes is that hypertension is associated with lower grey matter volume in specific regions (Gianaros et al., 2006; Schaare et al., 2019). Further studies are needed to better understand these associations.
A key limitation of this work is that, because such information is lacking in UKBB, we were unable to use AD diagnosis (CN, MCI or AD), AD biomarkers such as CSF and PET measurements, or cognitive tests tailored for diagnosing AD, either to construct the control and training sets for the models built on UKBB or to evaluate the subtypes. Instead, we used variables that are highly correlated with the disease, including the number of APOE ε4 alleles, family history of AD and the available cognitive tests, to identify the groups least and most likely to develop the disease as the control and training sets, respectively. We also excluded subjects with self-reported non-AD neurological or psychiatric disorders. In addition, we transferred the models between only two datasets. Future work on transferring SuStaIn to additional datasets would help to generalize the findings from this study.
Another limitation is that all the harmonization methods examined and employed in this work mostly capture linear relationships between cohorts and volume metrics. Improvements over ComBat that consider the covariance of features have been proposed (e.g., CovBat (Chen et al., 2020)). Furthermore, while ComBat demonstrated strong efficacy in removing cohort effects for data in ADNI and UKBB, it does not support an efficient extension of the analysis to new datasets. In practice, when applying a SuStaIn model built on ADNI to a new dataset, one would need to repeat the process of first harmonizing the new dataset with ADNI. Future studies of harmonization methods could include Hierarchical Bayesian Regression (HBR) (Kia et al., 2020), which allows a SuStaIn model to be applied to a new dataset without repeating the model fitting process.
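To make the "mostly linear" nature of this harmonization concrete, the sketch below implements a simplified location-and-scale adjustment in the spirit of ComBat: it regresses each feature on cohort indicators plus covariates to preserve, standardizes, and removes a per-cohort mean and SD. It omits ComBat's empirical Bayes shrinkage and is not the pipeline used in this work; established implementations (e.g., the neuroCombat package) should be preferred in practice. The function name and toy data are hypothetical.

```python
import numpy as np


def combat_location_scale(Y, batch, covariates):
    """Simplified ComBat-style harmonization without empirical Bayes shrinkage.

    Y: (n_subjects, n_features) regional volumes; batch: cohort label per
    subject; covariates: (n_subjects, q) biological covariates to preserve.
    """
    Y = np.asarray(Y, dtype=float)
    batch = np.asarray(batch)
    C = np.asarray(covariates, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    levels = np.unique(batch)
    B = np.column_stack([(batch == b).astype(float) for b in levels])
    design = np.hstack([B, C])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)   # (n_batches + q, n_features)

    n_b = B.sum(axis=0)
    alpha = (n_b / n_b.sum()) @ coef[: len(levels)]      # grand mean per feature
    cov_effect = C @ coef[len(levels):]                  # covariate effects to keep
    resid = Y - design @ coef
    sigma = np.sqrt((resid ** 2).mean(axis=0))           # pooled residual SD

    Z = (Y - alpha - cov_effect) / sigma                 # standardized data
    Y_adj = np.empty_like(Y)
    for b in levels:
        m = batch == b
        gamma = Z[m].mean(axis=0)                        # additive (location) batch effect
        delta = Z[m].std(axis=0, ddof=1)                 # multiplicative (scale) batch effect
        Y_adj[m] = sigma * (Z[m] - gamma) / delta + alpha + cov_effect[m]
    return Y_adj


rng = np.random.default_rng(7)
batch = np.repeat(["ADNI", "UKBB"], [60, 90])
Y = rng.normal(0.0, 1.0, size=(150, 2))
Y[batch == "UKBB"] = 2.0 * Y[batch == "UKBB"] + 5.0   # simulated cohort shift and scale
Y_adj = combat_location_scale(Y, batch, np.zeros((150, 0)))
print(Y_adj[batch == "ADNI"].mean(axis=0), Y_adj[batch == "UKBB"].mean(axis=0))
```

Each feature is adjusted independently by a mean and a scale per cohort, so cross-feature covariance differences between cohorts are left untouched; this is the gap that methods such as CovBat aim to close.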
While the methodology presented in this paper was based on volume metrics, the same methodology can in principle be generalised to other brain morphological measures (e.g., cortical thickness) and to non-imaging metrics. In fact, Fortin et al. applied ComBat to harmonize cortical thickness (Fortin et al., 2018) and diffusion tensor imaging metrics (Fortin et al., 2017). Thus, if such features proved useful within a SuStaIn model, they could be harmonized in the same way.
In conclusion, to study the association of AD atrophy subtypes with a broad range of risk factors, we proposed to harmonize the disease cohort and the aging cohort using ComBat, to train the SuStaIn model on the disease cohort, and to use it to subtype and stage subjects in the aging cohort. Our methodology enables further detailed investigation of the link between late-stage clinicopathological features and earlier risk factors, which has the potential to lead to a better understanding of disease aetiology and to help individuals avoid lifestyle and behaviour choices that put them at risk.

Data and code availability statement
The data used in this work are available upon application to the Alzheimer's Disease Neuroimaging Initiative (ADNI; http://adni.loni.usc.edu/) and the UK Biobank (UKBB; https://www.ukbiobank.ac.uk/). The SuStaIn algorithm implemented in Python (pySuStaIn) is open source and available at https://github.com/ucl-pond/pySuStaIn. Further analysis scripts are available upon request.
Fig. 2. Three consistent subtypes were identified by all SuStaIn models. In the positional variance diagrams (PVDs) of the models, each row corresponds to a biomarker (regional brain volume) and each coloured entry marks the probability that the biomarker has surpassed an event threshold (here: a z-score). The figures display z-score thresholds of 1.0 (red), 2.0 (magenta) and 3.0 (blue). A higher z-score reflects a more severe abnormality of the biomarker, while a lower positional variance, shown as a clearer colour and typically observed at the early stages of a diagram, indicates a higher degree of certainty. For instance, in the Harmonized ADNI PVD (b), hippocampal volume reaches high abnormality (z-score > 2, magenta) very early (stage 3) in the typical subtype with high certainty (low positional variance, clear magenta). The PVDs correspond to different training datasets (ADNI and UKBB) and harmonization options: ADNI (a), Harmonized ADNI (b), UKBB (c) and Harmonized UKBB (d). The four PVDs show that: (1) harmonization did not change the subtype patterns within a dataset (compare (a) with (b), and (c) with (d)); (2) the subtypes become indistinguishable in the late stages of the disease trajectory, as evidenced by the increasing positional variance as the disease progresses; (3) focusing on the early stages, all SuStaIn models identified three consistent subtypes: the 'typical' subtype has atrophy starting in the hippocampus and amygdala, atrophy in the 'cortical' subtype originates in the lobes, cingulate and insula, while in the 'subcortical' subtype atrophy is first observed in the pallidum, putamen and caudate.

Fig. 3. Representation of the three subtypes in brain space for the Harmonized ADNI SuStaIn and for the Harmonized UKBB SuStaIn models.

Fig. 4. Agreement between the biomarker orderings of each subtype in the SuStaIn models trained on ADNI and UKBB data. In the scatter plots, each mark represents the expected stage at which a biomarker reaches a particular z-score (1.0: red, 2.0: magenta and 3.0: blue) in the Harmonized ADNI (x-axis) and Harmonized UKBB (y-axis) models for a particular subtype. Different shapes correspond to different brain regions. The expected stage was calculated as the expectation of the stage under the subtype model: $\bar{k} = \sum_{k=1}^{N} k\,P(k)$, where $P(k)$ is the probability that the biomarker reaches the given z-score at stage $k$ and $N$ is the number of stages.
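The expected stage used in Fig. 4 can be computed from posterior samples of the event ordering: first estimate, for each event (a biomarker reaching a given z-score), the probability of occupying each stage, then take the probability-weighted mean stage. The sketch below assumes a hypothetical layout in which each row of `sampled_sequences` lists event indices in their order of occurrence for one MCMC sample; it does not reproduce pySuStaIn's internal data structures.

```python
import numpy as np


def position_probabilities(sampled_sequences):
    """Estimate P[e, k]: probability that event e occupies stage k + 1.

    sampled_sequences: (n_samples, n_events) integer array; row s lists the
    event indices in the order they occur in MCMC sample s (assumed layout).
    """
    S = np.asarray(sampled_sequences)
    n_samples, n_events = S.shape
    P = np.zeros((n_events, n_events))
    for seq in S:
        P[seq, np.arange(n_events)] += 1.0   # event seq[k] occurs at stage k + 1
    return P / n_samples


def expected_stage(P):
    """Expected stage sum_k k * P(k) for every event (stages numbered from 1)."""
    stages = np.arange(1, P.shape[1] + 1)
    return P @ stages


# Four hypothetical posterior samples of the ordering of three events
samples = np.array([[0, 1, 2],
                    [1, 0, 2],
                    [0, 1, 2],
                    [0, 2, 1]])
P = position_probabilities(samples)
print(expected_stage(P))
```

The rows of `P` are what a positional variance diagram visualizes, so events with concentrated rows (low positional variance) have expected stages that are stable across models.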

Fig. 5. Subject-level stage consistency across different SuStaIn models. The left panel in (a) shows that the subject-level stage assignments for all 795 ADNI subjects under the ADNI and Harmonized ADNI models were almost identical: the dots in the scatter plot are distributed closely around the 45-degree line through the origin, with an OLS regression coefficient of 0.9999 and an r² of 99.9%. The right panel in (a) shows the correlation between the stages provided by the Harmonized ADNI and Harmonized UKBB models, with an OLS regression coefficient of 1.1297 and an r² of 83.0%. Individuals, especially those in the late stages, were assigned slightly higher stages by the Harmonized UKBB model than by the Harmonized ADNI model (resulting in an OLS regression coefficient larger than one). Similarly, the left panel in (b) shows that the subject-level stage assignments for all 36,493 UKBB subjects under the UKBB and Harmonized UKBB models were almost identical: the dots in the scatter plot are distributed closely around the 45-degree line through the origin, with an OLS regression coefficient of 1.0016 and an r² of 99.6%. The right panel in (b) shows the alignment of stages between the Harmonized ADNI and Harmonized UKBB models, with an r² of 68.1% and an OLS coefficient of 0.6932.

Fig. 7. The average age is highest in the typical subtype and lowest in the subcortical subtype. Subtype and age association for ADNI subjects (left column) and UKBB subjects (right column) meeting the silver standard under the Harmonized ADNI model (top row; ADNI: 219 in total: 119 typical, 75 cortical and 25 subcortical; UKBB: 6956 in total: 813 typical, 3641 cortical and 2502 subcortical) and under the Harmonized UKBB model (bottom row; ADNI: 251 in total: 95 typical, 118 cortical and 38 subcortical; UKBB: 7754 in total: 536 typical, 4755 cortical and 2463 subcortical), respectively.

Fig. 8. Subjects in the typical subtype are associated with statistically worse CSF biomarker values (Aβ, tau and p-tau) than those in the cortical subtype. Subtype and CSF biomarker association for ADNI subjects with CSF measures at baseline meeting the silver standard under the Harmonized ADNI SuStaIn model (top row; 185 in total: 99 typical, 65 cortical and 21 subcortical) and under the Harmonized UKBB SuStaIn model (bottom row; 203 in total: 64 typical, 113 cortical and 26 subcortical), respectively. Subjects in the typical subtype have lower Aβ and higher tau and p-tau values than subjects in the cortical subtype.

Table 1
ADNI and UKBB training and control sets demographics.

Table 2
Consistency of subject-level subtype assignments under different SuStaIn models. Columns indicate the subtypes assigned by the reference model, i.e., the model trained on the dataset to which it is applied. Rows indicate the subtypes assigned by the transferred model (e.g., the model trained on UKBB but applied to ADNI data). Subtype assignments are separated according to whether they satisfy the 'silver' standard (subtype probability > 0.8; silver) or are simply the maximum likelihood assignment (ML). Bold font along the diagonals indicates agreement between the two models.

Table 3
Demographics of ADNI and UKBB subjects assigned to each subtype under the Harmonized ADNI model with non-zero stage and high subtype certainty.