MRI-based Alzheimer’s disease-resemblance atrophy index in the detection of preclinical and prodromal Alzheimer’s disease

Alzheimer’s Disease-resemblance atrophy index (AD-RAI) is an MRI-based, machine learning-derived biomarker developed to reflect the characteristic brain atrophy associated with AD. A recent study showed that AD-RAI (≥0.5) had the best performance in predicting conversion from mild cognitive impairment (MCI) to dementia and from cognitively unimpaired (CU) to MCI. We aimed to validate the performance of AD-RAI in detecting preclinical and prodromal AD. We recruited 128 subjects (MCI=50, CU=78) from two cohorts: CU-SEEDS and ADNI. Amyloid (A+) and tau (T+) status were confirmed by PET (11C-PIB, 18F-T807) or CSF analysis. We investigated the performance of AD-RAI in detecting preclinical and prodromal AD (i.e. A+T+) among MCI and CU subjects and compared it with that of hippocampal measures. In detecting A+T+ subjects, AD-RAI achieved the best metrics of all measures among all subjects (sensitivity 0.74, specificity 0.91, accuracy 85.94%) and among MCI subjects (sensitivity 0.92, specificity 0.81, accuracy 86.00%). Among CU subjects, AD-RAI yielded the best specificity (0.95) and accuracy (85.90%) of all measures, while hippocampal volume achieved a higher sensitivity (0.73) than AD-RAI (0.47) in detecting preclinical AD. These results show the potential of AD-RAI in the detection of early AD, in particular at the prodromal stage.


INTRODUCTION
Detecting subjects at risk of developing dementia associated with Alzheimer's disease (AD) and intervening at an early stage offer the greatest opportunity to reduce the growing dementia burden associated with AD, which is the commonest cause of dementia among the older population. The latest 2018 National Institute on Aging and Alzheimer's Association (NIA-AA) research framework defined AD biologically by the presence of 2 core pathologic molecular biomarkers, amyloid-β (A+) and neurofibrillary tau (T+), rather than by the presence of cognitive impairment [1]. With this definition, subjects harboring A+T+ may exhibit a continuum of severity of cognitive impairment, ranging from cognitively unimpaired (CU) (i.e. preclinical AD), to mild cognitive impairment (MCI) (i.e. prodromal AD), to dementia (i.e. AD with dementia). The evolution from preclinical to prodromal AD, or from prodromal AD to AD with dementia, may take several years, and this slow transition provides an excellent window to implement strategies that may prevent conversion to dementia. This shift in paradigm (i.e. from reliance on clinical symptoms to molecular biomarkers, and from a focus on dementia to the pre-dementia stage) makes an accurate in-vivo method for detecting AD biomarkers highly important. At present, accurate in-vivo detection of beta-amyloid and neurofibrillary tau is feasible with positron emission tomography (PET) and cerebrospinal fluid (CSF) analysis. Studies comparing antemortem amyloid and tau PET and CSF analysis of beta-amyloid1-42 (Aβ1-42) and phosphorylated tau (p-tau) showed excellent correlation with post-mortem amyloid and tau burden [2][3][4]. PET and CSF analysis are currently considered the gold standard in-vivo diagnostic tests for preclinical and prodromal AD.
Apart from beta-amyloid and neurofibrillary tau, the 2018 NIA-AA research framework also considers neurodegeneration (N) as another biomarker for AD [1]. However, neurodegeneration is considered a downstream and relatively more advanced event in the biological cascade of AD progression and is also non-specific, as many other brain diseases may cause neurodegeneration. Neurodegeneration in AD is currently captured in-vivo by fluorodeoxyglucose (FDG) PET hypometabolism, CSF total-tau, and atrophy on magnetic resonance imaging (MRI). Although neurodegeneration is considered an advanced event in the biological cascade of AD, previous studies suggested that a subtle yet characteristic pattern of neurodegeneration could still be detected by FDG PET, CSF total-tau, or MRI at the preclinical or prodromal stage of AD [5][6][7][8]. Moreover, subjects with A+T+(N)+ are at higher risk of future cognitive decline than those with A+T+(N)- [9,10]. Hence, detection of a characteristic pattern of neurodegeneration may have a role in the detection or prognostication of preclinical and prodromal AD.
Among the 3 conventional modalities for capturing neurodegeneration in AD, only MRI is non-invasive, and it is relatively more accessible than PET and CSF analysis. Structural MRI can capture the unique pattern of brain atrophy associated with AD, which is initially more prominent in the medial temporal lobe (e.g. hippocampus) and then spreads throughout the entire temporal lobe, parietal lobe, and frontal lobe [5,8,11]. Medial temporal lobe atrophy (MTA) or hippocampal volume (HV) as determined by MRI is the commonest imaging biomarker used for the diagnosis of AD with dementia or as a prognostic biomarker predicting conversion from MCI to AD with dementia [12,13]. With the advancement of MRI-based automated brain segmentation tools, global and regional brain volumes (e.g. HV) can now be quantified accurately, reliably, easily, and quickly. In addition, several studies have combined multiregion brain atrophy features on MRI into a single severity index derived from machine learning methods and investigated its accuracy in predicting the risk of conversion from MCI to dementia or from CU to MCI at an individual level [14][15][16][17][18][19]. We recently showed that an MRI-based, machine learning-derived AD-resemblance atrophy index (AD-RAI) had the best prognostic performance over other regional volumetric measures in predicting conversion from MCI to dementia and from CU to MCI using subjects from the Alzheimer's Disease Neuroimaging Initiative-2 (ADNI-2) [19]. This index indicates the similarity in atrophy pattern between the subject's brain and the brains of those with AD with dementia. It ranges from 0 to 1.0, and a value closer to 1 implies greater similarity. The optimal AD-RAI cutoff for differentiating converters from non-converters, derived from subjects recruited from ADNI, was ≥ 0.5 [19].
In this study, we aimed to validate the performance of AD-RAI at the cutoff of ≥ 0.5 obtained from our recent derivation study [19] in the detection of preclinical and prodromal AD among MCI and CU subjects recruited from our prospective cohort and the ADNI cohort (excluding ADNI-2), and to compare its performance with that of traditional MRI-based measures, namely visual MTA rating and quantitative hippocampal measures. We hypothesized that AD-RAI is able to reflect the characteristic pattern of brain atrophy that is associated with A+T+ at the prodromal or preclinical stage of AD.

AGING
We performed separate analyses of the performance of the various imaging measures in detecting subjects harboring A+T+ in the respective cohorts. Results of these analyses are shown in Table 3A-3C. In general, the performance metrics of AD-RAI were similar between the two cohorts.
The metrics of various imaging measures in detecting subjects harboring A+ with or without T (i.e. A+T+ and A+T-) can be found in Supplementary Table 3A-3C. Overall, almost all imaging measures had lower sensitivity and accuracy in detecting A+ with or without T when compared to that in detecting A+T+.

DISCUSSION
In the present validation study, using the cutoff derived previously from the ADNI-2 database (i.e. ≥ 0.5) [19], [...] performance of AD-RAI at the pre-specified cutoff of ≥ 0.5 in detecting early AD and supported the hypothesis that the pattern and severity of brain atrophy or neurodegeneration as reflected by MRI-based AD-RAI can aid the detection of early AD, in particular at the prodromal stage. To date, this is the first in vivo study exploring the performance of an MRI-based machine learning method in detecting preclinical and prodromal AD as defined by the 2018 NIA-AA research framework, i.e. by the presence of A+ and T+. Previous in vivo studies mainly investigated the ability of MRI-based machine learning methods to differentiate between converters and non-converters without knowledge of subjects' amyloid and tau status [14][15][16][17][18].

Although there is still no definitive pharmacological treatment approved for preventing subjects with prodromal AD from progressing to AD with dementia, emerging studies have shown promising results for various strategies in slowing cognitive decline at an early or prodromal stage [20,21]. Moreover, making a diagnosis of prodromal AD among subjects with MCI is also important for providing a correct diagnosis of the MCI syndrome, for prognostication, and for recruiting prodromal AD subjects into preventive clinical trials. Recent trials for AD have shifted from targeting subjects at the dementia stage to targeting those at the prodromal or even preclinical stage [22]. Although PET or CSF analyses are now available to detect A+T+ at the early stage and have been used to recruit prodromal or preclinical AD subjects into clinical trials, the availability of an easier method for detecting A+T+ subjects will help to reduce the cost of conducting clinical trials. Among MCI subjects, AD-RAI (≥ 0.5) achieved a high NPV of 91.30%; hence a "negative" AD-RAI will first help to rule out subjects without AD. For subjects with a "positive" AD-RAI, further investigations (i.e. PET or CSF analyses) can be arranged to confirm the diagnosis of prodromal AD. Moreover, using MRI as an initial investigation in MCI is also useful in ruling out other common brain lesions, e.g. cerebral small vessel disease (Figure 1), or other rare yet potentially reversible causes, e.g. normal pressure hydrocephalus or brain tumor. It is noteworthy that fewer than half of our MCI subjects (48%) had A+T+. This frequency is very similar to that in a meta-analysis showing that the prevalence of amyloid positivity in MCI subjects at around 70 years of age (i.e. an age similar to that of our MCI subjects) was around 50% [23]. Overall, the prevalence of amyloid positivity in MCI subjects ranges from about 30% at age 50 to 60% at age 80 [23]. This highlights the need for an additional tool to aid the detection of A+T+ among subjects presenting with MCI syndrome.
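The metrics quoted for the MCI subgroup all follow from a single 2×2 table. The following is a minimal sketch of that arithmetic. The four counts are not taken from the paper's tables; they are an assumption, inferred here as integer counts consistent with the reported figures (n=50, 48% A+T+, sensitivity 0.92, specificity 0.81, accuracy 86.00%, NPV 91.30%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of test result (AD-RAI >= 0.5) against reference status (A+T+)."""
    tp: int  # AD-RAI positive, A+T+
    fn: int  # AD-RAI negative, A+T+
    tn: int  # AD-RAI negative, not A+T+
    fp: int  # AD-RAI positive, not A+T+

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# ASSUMPTION: counts inferred from the reported MCI metrics, not read from the paper.
mci = Confusion(tp=22, fn=2, tn=21, fp=5)

if __name__ == "__main__":
    print(f"sensitivity {mci.sensitivity:.2f}")
    print(f"specificity {mci.specificity:.2f}")
    print(f"accuracy    {mci.accuracy:.2%}")
    print(f"NPV         {mci.npv:.2%}")
```

Under these assumed counts, 21 of the 23 AD-RAI-negative MCI subjects are not A+T+, which is where the rule-out value of a "negative" result comes from.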

Among CU subjects, AD-RAI obtained the best specificity (0.95) and accuracy (85.90%) in the detection of preclinical AD, although its sensitivity was low (0.47). Given the very high specificity of 0.95, CU subjects who have a "positive" AD-RAI are very likely to have preclinical AD. In comparison, HV achieved a higher sensitivity (0.73) than AD-RAI in the detection of preclinical AD. A recent study also showed that HV had acceptable accuracy in predicting conversion from normal cognition to MCI [24]. Note that in our recent study [19], although AD-RAI achieved the best specificity (0.98) and accuracy (79.45%) of all measures, its sensitivity (0.39) was also lower than that of HV (0.70) (see Supplementary Material). Overall, the higher sensitivity of HV over AD-RAI is consistent with our current understanding of the temporal evolution of brain atrophy in AD, which is most apparent in the hippocampus at the very early stage (e.g. preclinical stage) and then spreads to other regions as the disease progresses (e.g. prodromal stage). Given the high NPV of HV (91.67%), it may be useful in ruling out AD among CU subjects. For CU subjects with a "positive" HV but a "negative" AD-RAI, a confirmatory diagnostic test (e.g. PET, CSF analyses) can be further arranged. Hence, to detect preclinical AD, we may need to take both HV and AD-RAI into account.
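The reasoning in this paragraph can be written out as a small decision rule. The sketch below is illustrative only and is not a validated clinical algorithm. The function name, the return strings, and the order of the checks are assumptions. The text does not discuss a "positive" AD-RAI with a "negative" HV, so that combination is simply treated as AD-RAI positive here.

```python
def triage_cu_subject(hv_positive: bool, ad_rai: float, ad_rai_cutoff: float = 0.5) -> str:
    """Hypothetical triage of a cognitively unimpaired (CU) subject.

    hv_positive: True when hippocampal volume falls on the abnormal side of its cutoff.
    ad_rai: AD-resemblance atrophy index in [0, 1]; positive when >= ad_rai_cutoff.
    """
    if not 0.0 <= ad_rai <= 1.0:
        raise ValueError("AD-RAI is defined on the range 0 to 1.0")

    ad_rai_positive = ad_rai >= ad_rai_cutoff
    if ad_rai_positive:
        # High specificity (0.95): a positive index makes preclinical AD very likely.
        return "preclinical AD very likely"
    if hv_positive:
        # Positive HV but negative AD-RAI: the text suggests confirmatory testing.
        return "arrange confirmatory PET or CSF analysis"
    # Negative HV: high NPV (91.67%) supports ruling out AD.
    return "preclinical AD unlikely"


if __name__ == "__main__":
    print(triage_cu_subject(hv_positive=True, ad_rai=0.2))
```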

In this study, the sensitivity of visual MTA rating in detecting AD at an early stage was low, which might partly be explained by the floor effect of the current visual grading (Figure 2). However, devising a finer visual scale may be challenging, as detecting small volumetric changes by human vision may not be possible and is also unreliable. The intra-rater reliability in this study, obtained from an experienced neuroradiologist, was 0.74 (weighted kappa), which was comparable to that obtained in a study among experienced neuroradiologists [25]. However, among general radiologists, the intra-rater reliability could be as low as 0.38 [25]. If a finer visual scale were used, the reliability would likely be even lower. Note that the current machine-based automated tool had a test/re-test precision of 100%.
Among subjects having A+ with or without T+, the imaging measures had poorer performance when compared to that among subjects having both A+T+. This is expected because brain atrophy is likely absent or negligible when only beta-amyloid is present. Therefore, assessing brain atrophy using MRI is unlikely to be able to identify the earliest stage of the Alzheimer's continuum, i.e. A+T-.
A strength of our study was that all our subjects received comprehensive clinical and imaging assessment, including amyloid and tau PET or CSF concentrations of Aβ1-42 and p-tau, hence allowing accurate classification of the cognitive, amyloid, and tau status of each individual. Another strength was that our study included participants from two separate cohorts involving different ethnicities. The majority of the participants in the ADNI cohort were Caucasian, while all those in CU-SEEDS were Chinese. Note that the performance of AD-RAI and HV was similar between these 2 cohorts, thus enhancing the generalizability of our findings. Our study has several limitations. Although we recruited more than a hundred CU and MCI subjects with clearly defined amyloid and tau status, our sample size was relatively small. In particular, among the CU subjects, only 15 were A+T+. Note, however, that a previous study showed that a sample of 15 converters, who converted from CU to AD dementia (i.e. presumably A+T+), and 50 non-converters (presumably non-AD) was able to show a statistically significant difference in the volume of multiple brain regions [11]. Hence, our current sample size should be adequate to investigate the differentiating ability of AD-RAI and other HV measures. Yet, a larger study is needed to further validate the performance of AD-RAI and other HV measures in the detection of early AD. Another limitation was that the current threshold (0.5) of AD-RAI was generated based on subjects' conversion status. Although the performance using the current threshold of 0.5 was good, we could not assume that those who converted to MCI or dementia were all A+T+, as other non-AD pathological processes (e.g. cerebral small vessel disease) could also drive the conversion. Ideally, the optimal threshold for preclinical or prodromal AD will need to be derived from a larger cohort of CU and MCI subjects with clearly defined amyloid and tau status.
Moreover, we used 18F-T807 PET for detection of tau pathology, and off-target 18F-T807 binding unrelated to tau has been reported in the basal ganglia [26,27] and in some tau-negative conditions [28,29]. Note that in our study, we did not label subjects with 18F-T807 uptake in the basal ganglia as T+. Ideally, the performance of the automatic volumetric segmentation tool needs to be validated against brain pathology.
In conclusion, we validated an MRI-based, machine learning-derived AD-RAI at the cutoff of ≥ 0.5 in the detection of early AD, in particular at the prodromal stage. Given its validity, reliability, and ease of use, AD-RAI may provide additional information to guide physicians or researchers in selecting who should receive further confirmatory investigations for the diagnosis of early AD as defined by the presence of A+ and T+, in particular among subjects presenting with MCI.

Participants
Half of the participants of this study were recruited from an on-going CU-SEEDS ( [...] (2) known history of stroke, parkinsonism, major psychiatric disease, or any significant neurological diseases (e.g. brain tumor); and/or (3) contraindication for MRI/PET. An experienced dementia specialist (L.W.C.A.) examined all potential subjects for eligibility for this study.
The other half of the MCI and CU participants were recruited from the ADNI cohort, excluding subjects from ADNI-2, who were used as the training cohort in our previous derivation study. Details on the ADNI cohort can be found online at: http://adni.loni.usc.edu.

Syndromal staging of cognitive continuum of the participants
In CU-SEEDS, CU and MCI were defined according to the 2018 NIA-AA research framework [1]. We used the Chinese Abbreviated Memory Inventory (CAMI) to define the presence of memory complaints [30]. Subjects answering "Yes" to one or more of the 5 questions in CAMI were classified as having subjective memory complaints. We performed the Hong Kong List Learning Test (HKLLT) [31] and the Hong Kong version of the Montreal Cognitive Assessment (HK-MoCA) [32] for all subjects. We defined MCI as the presence of subjective memory complaints that represented a decline from baseline, objective memory impairment as defined by an age-adjusted z-score in Trial 4 (i.e. 10 min-delayed recall) of HKLLT of ≤ -1 standard deviation (SD) [33], and cognitive impairment that has no major impact on daily function as defined by a clinical dementia rating scale (CDR) of ≤ 0.5. We defined CU as having an age-adjusted z-score in Trial 4 of HKLLT > -1 SD and a CDR of 0. Apart from MCI and CU subjects, we also recruited 10 dementia subjects for the purpose of validating our PET protocols. These 10 dementia subjects presented with an AD-like dementia syndrome (i.e. episodic memory decline as the initial presentation, slowly progressive over time, no atypical features such as motor deficits or parkinsonism) and had a CDR of 1. They were diagnosed by an experienced dementia specialist (L.W.C.A.). All participants provided written informed consent, and this study was approved by the local ethics committee.
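The CU-SEEDS staging criteria described above can be expressed as a short rule. This is a minimal sketch: the function and parameter names are hypothetical, and the "unclassified" outcome for subjects meeting neither definition is an assumption, since the text does not say how such subjects were handled.

```python
from typing import Sequence


def has_memory_complaint(cami_answers: Sequence[bool]) -> bool:
    """Subjective memory complaint: at least one 'Yes' among the 5 CAMI questions."""
    if len(cami_answers) != 5:
        raise ValueError("CAMI has 5 questions")
    return any(cami_answers)


def syndromal_stage(cami_answers: Sequence[bool], hkllt_trial4_z: float, cdr: float) -> str:
    """Return 'MCI', 'CU', or 'unclassified' using the CU-SEEDS definitions.

    hkllt_trial4_z: age-adjusted z-score of HKLLT Trial 4 (10 min-delayed recall).
    cdr: clinical dementia rating scale score.
    """
    complaint = has_memory_complaint(cami_answers)
    if complaint and hkllt_trial4_z <= -1.0 and cdr <= 0.5:
        return "MCI"
    if hkllt_trial4_z > -1.0 and cdr == 0:
        return "CU"
    return "unclassified"  # ASSUMPTION: handling of other combinations is not stated


if __name__ == "__main__":
    print(syndromal_stage([True, False, False, False, False], -1.3, 0.5))
```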
In ADNI, CU subjects were defined as having Mini Mental State Examination (MMSE) scores between 24-30 (inclusive) and a CDR of 0, without depression, MCI, or dementia. MCI subjects were defined by the presence of subjective memory complaints that represented a decline from baseline, MMSE scores between 24-30 (inclusive), a CDR of 0.5, and objective memory loss measured by education-adjusted scores on a delayed logical memory score (9-11 for those with 16 or more years of education, 5-9 for 8-15 years of education, or 3-6 for 0-7 years of education, where possible scores range from 0 to 25), with impairments in other cognitive domains not significant enough to meet the criteria for dementia, largely preserved activities of daily living, and an absence of dementia. Details of the inclusion and exclusion criteria can be found online at: http://adni.loni.usc.edu.

MRI
MRI in the CU-SEEDS cohort was performed at Prince of Wales Hospital using a 3.0 Tesla scanner (Achieva TX; Philips Medical Systems, Best, Netherlands). The scanning protocol included a 3D T1-weighted MPRAGE sequence acquired at a resolution of 1.1 mm × 1.1 mm × 1.2 mm, which was used for visual assessment and volumetric analysis, as well as standard T2-weighted and FLAIR sequences.
MRI data for the ADNI cohort were obtained from http://adni.loni.usc.edu for further analyses. The imaging analyzed in our study was performed on 3.0 Tesla scanners and included a 3D T1-weighted sequence, which was used for visual rating and post-processing analysis, as well as T2-weighted and FLAIR sequences. Details can be found on the website above.

PET in CU-SEEDS cohort
We performed 11C-PIB and 18F-T807 PET/CT to quantify beta-amyloid and tau deposition, respectively, at the Department of Nuclear Medicine and PET of Hong Kong Sanatorium and Hospital, Hong Kong SAR. All subjects received 11C-PIB intravenously and were scanned at 35 min post injection. Within one week, they underwent 18F-T807 PET/CT at 85 min post IV injection. 11C-PIB and 18F-T807 uptake were quantified by the "global cortical to cerebellum Standard Uptake Value ratio (SUVR)". The calculation of SUVR included 13 target regions of interest contoured automatically: frontal gyrus, gyrus rectus, lateral temporal lobe, medial temporal lobe, posterior cingulate gyrus, precuneus, putamen, thalamus, superior parietal lobe, occipital lobe, head of the caudate, cerebellar vermis and brainstem.
We defined A+ if (1) increased 11C-PIB uptake was visually observed in regions known to have beta-amyloid deposits in the early stage of AD, i.e. posterior cingulate and/or precuneus with or without involvement of other brain regions (e.g. frontal lobes) [34] and/or (2) global retention ≥1.42 [35]. We defined T+ if (1) increased 18F-T807 uptake was visually observed in regions known to have tau deposits in the early stage of AD, i.e. medial temporal lobe, with or without involvement of other brain regions (e.g. rest of the temporal lobe, parietal lobe) [34,36,37] and/or (2) SUVR ≥1.14 [38]. CU and MCI subjects who had A+T+ based on PET findings were defined as having preclinical and prodromal AD, respectively [1]. All PET imaging data were interpreted by an experienced nuclear medicine specialist (E.Y.L.L.) who was blinded to subjects' cognitive and structural imaging data.
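Because each PET criterion is joined by "and/or", positivity amounts to a logical OR of the visual read and the quantitative threshold. A minimal sketch of that rule follows. The function and parameter names are hypothetical, and the visual read is represented as a boolean supplied by the reader.

```python
PIB_SUVR_CUTOFF = 1.42   # global 11C-PIB retention threshold quoted in the text
T807_SUVR_CUTOFF = 1.14  # 18F-T807 SUVR threshold quoted in the text


def amyloid_positive(visual_positive: bool, pib_global_suvr: float) -> bool:
    """A+ if the visual read is positive and/or global retention >= 1.42."""
    return visual_positive or pib_global_suvr >= PIB_SUVR_CUTOFF


def tau_positive(visual_positive: bool, t807_suvr: float) -> bool:
    """T+ if the visual read is positive and/or SUVR >= 1.14."""
    return visual_positive or t807_suvr >= T807_SUVR_CUTOFF


if __name__ == "__main__":
    # Hypothetical subject: negative visual amyloid read but SUVR above threshold.
    a = amyloid_positive(visual_positive=False, pib_global_suvr=1.50)
    t = tau_positive(visual_positive=True, t807_suvr=1.05)
    print("A+" if a else "A-", "T+" if t else "T-")
```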

CSF biomarkers in ADNI cohort
CSF concentrations of Aβ1-42 and p-tau at baseline were obtained from http://adni.loni.usc.edu. We defined A+ if the concentration of Aβ1-42 was equal to or less than 192 pg/ml [39]. We defined T+ if the concentration of p-tau was equal to or above 23 pg/ml [39]. CU and MCI subjects harboring A+T+ based on CSF findings were defined as having preclinical and prodromal AD, respectively [1].
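The CSF rule combines two thresholds with the syndromal stage to give the study's outcome labels. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the `None` return for subjects who are not A+T+ is a choice made here for illustration.

```python
from typing import Optional, Tuple

ABETA42_CUTOFF_PG_ML = 192.0  # A+ when Abeta1-42 is at or below this value
PTAU_CUTOFF_PG_ML = 23.0      # T+ when p-tau is at or above this value


def csf_at_status(abeta42_pg_ml: float, ptau_pg_ml: float) -> Tuple[bool, bool]:
    """Return (A+, T+) from baseline CSF concentrations."""
    return abeta42_pg_ml <= ABETA42_CUTOFF_PG_ML, ptau_pg_ml >= PTAU_CUTOFF_PG_ML


def early_ad_label(stage: str, a_pos: bool, t_pos: bool) -> Optional[str]:
    """Map syndromal stage ('CU' or 'MCI') plus A/T status to the study's labels."""
    if not (a_pos and t_pos):
        return None  # not A+T+, so neither preclinical nor prodromal AD
    if stage == "CU":
        return "preclinical AD"
    if stage == "MCI":
        return "prodromal AD"
    raise ValueError("stage must be 'CU' or 'MCI'")


if __name__ == "__main__":
    a, t = csf_at_status(abeta42_pg_ml=150.0, ptau_pg_ml=30.0)  # hypothetical values
    print(early_ad_label("MCI", a, t))
```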

Visual ratings of MTA
An experienced neuroradiologist (J.A.) rated MTA using Scheltens's scale [40] in both the CU-SEEDS and ADNI cohorts. Ten individuals were randomly selected and rated again by the same neuroradiologist to obtain intra-rater reliability. We took the average of the left and right MTA scores as the final MTA score. We used the cutoff of ≥ 1 to define prodromal [41] and preclinical AD.
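As an illustration of the scoring and the reliability statistic, the sketch below averages the left and right ratings, applies the ≥ 1 cutoff, and computes a weighted Cohen's kappa between two rating sessions. The linear weighting scheme is an assumption, since the text does not state which weights were used, and the ratings shown are hypothetical.

```python
from typing import Sequence


def mta_score(left: int, right: int) -> float:
    """Final MTA score: mean of the left and right ratings (each 0 to 4)."""
    return (left + right) / 2


def mta_positive(left: int, right: int, cutoff: float = 1.0) -> bool:
    return mta_score(left, right) >= cutoff


def weighted_kappa(first: Sequence[int], second: Sequence[int], n_categories: int = 5) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1) (ASSUMED scheme)."""
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    k, n = n_categories, len(first)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement_obs = disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            disagreement_obs += w * observed[i][j]
            disagreement_exp += w * row[i] * col[j]
    return 1 - disagreement_obs / disagreement_exp


if __name__ == "__main__":
    session_1 = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]  # hypothetical ratings of 10 scans
    session_2 = [0, 1, 2, 2, 0, 3, 1, 1, 2, 1]
    print(round(weighted_kappa(session_1, session_2), 2))
```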

MRI post-processing
All the MRIs from CU-SEEDS and ADNI were processed automatically using AccuBrain® IV 1.1 (BrainNow Medical Technology Company Ltd.), which performs brain structure and tissue segmentation and quantification using 3D T1-weighted MR images [42]. This automatic post-processing method takes 20 minutes to generate AD-RAI and other quantitative measures.
We used the sum of the left and right hippocampal volumes in milliliters (mL) as the final raw HV. AccuBrain® also generated the hippocampal fraction (HF) (bilateral absolute HV/intracranial volume) and the AD-RAI, which indicates the similarity in atrophy pattern between the subject's brain and the brains of those with AD with dementia (ranging from 0 to 1.0). AD-RAI is based on a machine learning method and does not require extraction of radiomic features. Based on an in-house training database with the brain volumetric data of both normal subjects and AD dementia patients, AccuBrain® computes and selects the most relevant brain regional volumetry and projects the multidimensional brain regional volumetry features into a single atrophy index (i.e. AD-RAI) for the individual to be tested. The in-house training database contains brain MRI scans of 400 subjects, with 45% AD dementia patients and 55% CU subjects. The inclusion criteria of the in-house training database for the AD group were: (1) diagnosis of AD according to the International Classification of Diseases, 10th Revision (ICD-10), (2) CDR ≥ 1, and (3) ability to perform the neuropsychological test and tolerate the MRI scanning. The inclusion criteria for the CU group were: (1) normal general physical status, (2) a CDR of 0, and (3) no memory complaints.
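The two hippocampal measures reduce to simple arithmetic, sketched below. The volumes used are hypothetical, the function names are not part of AccuBrain®, and expressing HF as a percentage is an assumption based on the cutoffs being reported in percent.

```python
def hippocampal_volume_ml(left_ml: float, right_ml: float) -> float:
    """Raw HV: sum of the left and right hippocampal volumes, in mL."""
    return left_ml + right_ml


def hippocampal_fraction_percent(hv_ml: float, icv_ml: float) -> float:
    """HF: bilateral HV divided by intracranial volume, expressed as a percentage."""
    if icv_ml <= 0:
        raise ValueError("intracranial volume must be positive")
    return 100.0 * hv_ml / icv_ml


if __name__ == "__main__":
    # Hypothetical subject: 3.1 mL left, 3.2 mL right, 1500 mL intracranial volume.
    hv = hippocampal_volume_ml(3.1, 3.2)
    hf = hippocampal_fraction_percent(hv, 1500.0)
    print(f"HV = {hv:.2f} mL, HF = {hf:.2f}%")
```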
We investigated the performance of AD-RAI in detecting subjects with A+T+ using an index of ≥ 0.5, which was found in the derivation study to be the optimal cutoff for differentiating between "converters" and "stable" subjects using the ADNI-2 database [19]. Note that in our derivation study, we did not obtain the optimal cutoffs of HV and HF for differentiating between "converters" and "stable" subjects. In order to compare AD-RAI with conventional imaging measures (i.e. HV and HF) in detecting A+T+ subjects in the present validation study, we further generated receiver operating characteristic (ROC) curves among all subjects with mild or no cognitive impairment (i.e. MCI and CU subjects) and among the MCI and CU subgroups for the differentiation between "converters" and "stable" subjects. The derived optimal cutoffs were as follows: all subjects (i.e. MCI and CU): HV 6.44 mL, HF 0.42%; MCI subjects: HV 6.07 mL, HF 0.41%; and CU subjects: HV 6.64 mL, HF 0.44%. The performance metrics (sensitivity, specificity, positive predictive values, negative predictive values, accuracy) using the optimal cutoffs of AD-RAI, HV, and HF in differentiating converters and stable subjects among ADNI subjects can be found in Supplementary Table 1A-1C. MRIs of the 10 individuals who were randomly selected for evaluation of intra-rater reliability for visual MTA rating were processed again by AccuBrain® to assess the test/re-test precision of the tool in generating HV, HF, and AD-RAI.
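The text does not state which criterion defined the "optimal" point on each ROC curve. As one common possibility, the sketch below selects the cutoff that maximizes Youden's index (sensitivity + specificity - 1). Both the criterion and the convention that a volume at or below the cutoff counts as test-positive are assumptions, and the data are synthetic.

```python
from typing import Sequence, Tuple


def youden_optimal_cutoff(values: Sequence[float], is_converter: Sequence[bool]) -> Tuple[float, float]:
    """Scan every observed value as a candidate cutoff and return (cutoff, J).

    ASSUMPTION: a subject is test-positive when value <= cutoff (smaller volume is abnormal).
    Ties in J keep the smallest cutoff.
    """
    n_pos = sum(is_converter)
    n_neg = len(is_converter) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both converters and stable subjects are required")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, c in zip(values, is_converter) if c and v <= cutoff)
        tn = sum(1 for v, c in zip(values, is_converter) if not c and v > cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Synthetic hippocampal volumes (mL): first four are converters, last four are stable.
    hv = [5.8, 6.0, 6.2, 6.9, 6.5, 7.0, 7.2, 7.4]
    converter = [True, True, True, True, False, False, False, False]
    print(youden_optimal_cutoff(hv, converter))
```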

Statistical analyses
Continuous variables were presented as means (SD), whilst categorical variables were presented as numbers (percentage). We compared the demographic characteristics of the MCI and CU subjects using the independent-samples t-test. Intra-rater reliability was assessed with the weighted Cohen's kappa test [43]. Sensitivity and specificity with 95% confidence intervals (CI), positive and negative predictive values (PPV, NPV), and accuracy were used to evaluate the performance of four different imaging measures (i.e. AD-RAI, HV, HF, visual MTA) in the identification of A+T+ subjects among all subjects with MCI and CU (n=128), MCI subjects (n=50), and CU subjects (n=78). The metrics of the various imaging measures were also calculated separately for the CU-SEEDS and ADNI cohorts. We also explored the metrics of the various imaging measures in the detection of A+ with or without T+ (i.e. Alzheimer's continuum). Statistical analyses were performed using SPSS version 25.0 for IOS.

[...] University; Albany Medical College; University of Iowa; Dartmouth-Hitchcock Medical Center; Wake Forest University Health Sciences Center; Rhode Island Hospital; Cornell Medical Center; Cleveland Clinic Lou Ruvo Center for Brain Health (CCLRBC); Roper St. Francis Hospital; and Butler Hospital Memory and Aging Program. The information on ethical approval and the centres involved in the ADNI study as listed above was obtained from the ADNI Data and Publications Committee. The authors of this article used the publicly available ADNI data but were not involved in the conduct of the study.

Supplementary Tables
Supplementary Table 1A. Performance metrics of derived AD-RAI, HV and HF in differentiating converters and stable subjects from ADNI-2 database among all subjects (n=158).

Measures
Sensitivity (