Predicting progression from mild cognitive impairment to Alzheimer’s disease on an individual subject basis by applying the CARE index across different independent cohorts

The purposes of this study are to investigate whether the Characterizing Alzheimer’s disease Risk Events (CARE) index can accurately predict progression from mild cognitive impairment (MCI) to Alzheimer’s disease (AD) on an individual subject basis, and to investigate whether this model can be generalized to an independent cohort. Using an event-based probabilistic model approach to integrate widely available biomarkers from behavioral data and brain structural and functional imaging, we calculated the CARE index. We then applied the CARE index to identify which MCI individuals from the ADNI dataset progressed to AD during a three-year follow-up period. Subsequently, the CARE index was generalized to the prediction of MCI individuals from an independent Nanjing Aging and Dementia Study (NADS) dataset during the same time period. The CARE index achieved high prediction performance with 80.4% accuracy, 75% sensitivity, 82% specificity, and 0.809 area under the receiver operating characteristic (ROC) curve (AUC) on MCI subjects from the ADNI dataset over three years, and a highly validated prediction performance with 87.5% accuracy, 81% sensitivity, 90% specificity, and 0.861 AUC on MCI subjects from the NADS dataset. In conclusion, the CARE index is highly accurate, sufficiently robust, and generalized for predicting which MCI individuals will develop AD over a three-year period. This suggests that the CARE index can be usefully applied to select individuals with MCI for clinical trials and to identify which individuals will convert from MCI to AD for administration of early disease-modifying treatment.

The general eligibility, inclusion, and exclusion criteria for ADNI subjects can be found on the ADNI website (www.adni-info.org). We selected a total of 74 MCI subjects with a baseline diagnosis of amnestic MCI, based on the following requirements: First, all subjects had at least one resting-state functional magnetic resonance imaging (R-fMRI) scan with corresponding anatomical scans. Second, all subjects had cerebrospinal fluid (CSF), amyloid beta (Aβ), and phosphorylated tau (p-tau) concentration values. Third, all subjects had scores on the Mini-Mental State Examination (MMSE), modified 13-item Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), and Rey Auditory Verbal Learning Test (AVLT) (immediate recall score, i.e., the sum of trials 1 to 5). Of note, in this study, the CSF, p-tau, and ADAS-Cog biomarkers were used only to estimate the optimal temporal sequence of events by using event-based probabilistic model. Finally, in this study, 46 subjects had a threeyear follow-up clinical diagnosis of MCI and met criteria for inclusion as part of either a nonprogressive MCI (N-MCI) group or a progressive MCI (P-MCI) group, depending on whether they progressed to ADtype dementia at the three-year follow-up. At study entry (baseline), all subjects underwent a standardized clinical interview, cognitive/functional assessments, structural MRI, and R-fMRI scans. The clinical status of each MCI subject was reevaluated at the 36-month follow-up time point. According to the follow-up clinical diagnosis by the NINCDS-ADRDA criteria for the diagnosis of probable AD [2], those MCI subjects who progressed to AD-type dementia within 36 months of entering the study were labeled as P-MCI, those who did not progress were labeled as N-MCI subjects. The clinical status for N-MCI and P-MCI subjects is employed as the "ground truth" in our classification experiments, as described below. Note that most of the subjects included in the ADNI dataset had not met both comprehensive neuropsychological assessment scores and had at least one R-fMRI scan with corresponding anatomical scans at the three-year follow-up. Therefore, data from the ADNI dataset were not used to investigate the links between the changes of the characterizing AD risk events and the changes of neuropsychological performance.
There are 12 P-MCI subjects who progressed to AD and 34 N-MCI subjects who did not within 36 months of entering the study. Among the 34 N-MCI subjects, four MCI subjects reverted to normal cognitive status and remained dementia-free, the other 30 MCI subjects remained cognitively stable.

Participants for the NADS dataset
The NADS study recruited 87 subjects with a baseline diagnosis of amnestic MCI status, through normal community health screening, newspaper advertisement, and hospital outpatient service. Written informed consent was obtained from all of the participants, and the study was approved by the responsible Human Participants Ethics Committee of the Affiliated ZhongDa Hospital, Southeast University, Nanjing, China. To be considered for inclusion, participants had to have had functional and structural MRIs performed in ZhongDa Hospital, which is affiliated with Southeast University, on a Neuroscience Imaging Center 3T scanner.
All amnestic MCI subjects met the diagnostic criteria proposed by Petersen and colleagues [3] and the revised consensus criteria of the International Working Group on amnestic MCI [4], including the following: (1) subjective memory impairment was corroborated by the subject and an informant, (2) objective memory performance was documented according to an Auditory Verbal Learning Test-delayed recall score that was within ≤ 1.5 SD of ageand education-adjusted norms (the cutoff was ≤ 4 correct responses on 12 items for ≥ 8 years of education), (3) normal general cognitive function was evaluated by a MMSE of 24 or higher, (4) a Clinical Dementia Rating of 0.5 with at least a 0.5 rating in the memory domain, (5) no or minimal impairment in daily living activities, and (6) the absence of dementia, or symptoms that were sufficient to meet the National Institute of Neurological and Communicative Disorders and Stroke or the AD and Related Disorders Association criteria for AD. The exclusion criteria were as follows: (1) a past history of known stroke (modified Hachinski score of > 4), alcoholism, head injury, Parkinson's disease, epilepsy, major depression (excluded using a Self-Rating Depression Scale), or other neurological or psychiatric illness (excluded by clinical assessment and case history), (2) major medical illness (e.g., cancer, anemia, or thyroid dysfunction), (3) severe visual or hearing loss, and (4) a T2-weighted MRI showing major white matter (WM) changes, infarction, or other lesions (two experienced radiologists analyzed the scans). Finally, 56 subjects had a three-year follow-up clinical diagnosis of amnestic MCI.
The clinical status of each MCI subject was reevaluated at 36 months and classified into the N-MCI and P-MCI groups, as described above. There were 16 P-MCI and 40 N-MCI subjects in the NADS study. Of 40 N-MCI subjects, six MCI subjects reverted to normal cognitive status and remained dementia-free; the other 34 MCI subjects remained cognitively stable.

Image acquisition for the ADNI dataset and the NADS dataset
The ADNI data acquisition process is described at http://adni.loni.ucla.edu/. Briefly, R-fMRI datasets were scanned on 3.0 Tesla MRI scanners (Philips, Netherlands). Axial R-fMRI images of the whole brain were obtained in seven minutes with a single-shot gradient echo planar imaging (EPI) sequence. Highresolution MP-RAGE (magnetization-prepared rapid gradient-echo) 3-D sagittal T1-weighted images also were acquired.

Functional connectivity indices of the regions of interest
This study calculated the functional connectivity indices (FCI) of the three regions of interest (ROIs): bilateral hippocampus (HIP FCI ), posterior cingulate cortex (PCC FCI ), and fusiform gyrus (FUS FCI ). The details regarding the FCI calculation stream were found in our previously published study [6]. First, the whole cerebral cortex was separated into 90 regions based on the Automated Anatomical Labeling (AAL) template, and the blood oxygen level dependent (BOLD) time series of each region was extracted using the AAL template mask from the preprocessed resting-state dataset [7]. Second, functional connectivity between each ROI and the other brain regions was calculated using the Pearson cross-correlation analysis. Thus, a vector consisting of 89 cross-correlation coefficient (CC) values for each ROI was obtained. Finally, each ROI's FCI valueidentified separately as HIP FCI , PCC FCI , and FUS FCI was calculated by summating 89 CC values within each ROI's vector and averaging them across each pair of bilateral ROIs.

Gray matter index
For each subject, the gray matter index [8] of each brain region (using the same AAL template) was calculated using SPM8 software (www.fil.ion.ucl.ac.uk/spm/ software/spm8/). First, the anatomical image of each individual's brain was normalized into the Montreal Neurological Institute (MNI) space. Second, the gray matter of the whole brain was segmented and separated from WM and CSF areas, and a threshold of 0.8 was used to exclude non-gray-matter areas. Third, each region's gray matter concentration index was determined by summing the gray matter concentration values of all voxels within the region and averaging across each pair of bilateral ROIs.

Event-based probabilistic model
Given that a set of N events, E 1 , E 2 , …, E N , is measured by N biomarkers, x 1 , x 2 , …, x N , respectively, the temporal order of events, S={s(1), s(2), …, s(N)}, is calculated by a permutation of the integers 1, …, N. For subjects j=1, …, J, the dataset X could be regarded as X={X 1 , X 2 , …, X J }. Specifically, X J represents the subject j data that is given by X j ={x 1j , x 2j , …, x Nj }, where x ij is the ith biomarker measurement for subject j. This study determined the optimal temporal order in a data-driven manner, based on the criteria that the optimal temporal order, defined as the S optimal , yielded the highest probability in measuring dataset X. That is, the p(X|S) value of the S optimal sequence was calculated to be maximal among all of the possible sequences. To accomplish this objective, we first estimated the likelihood of measurement x ij given that biomarker event E i has or has not occurred. These likelihoods are labeled below: (  ) = likelihood of measurement given that event E i has occurred (1) ( ¬ ) = likelihood of measurement given that event E i has occurred (2) We assumed that subject j is at stage k, although the authentic biomarkers sequence and the subject's stage were unavailable. This means that for subject j, events Es(1), Es(2), …, Es(k) already have occurred, and eevents E s(k+1) , E s(k+2) , …, E s(N) have not occurred. The likelihood of data X j given the sequence S and the subject's stage at k was obtained using the formula below: Where is the overall likelihood of measurements given that corresponding events have already occurred, is the overall likelihood of measurements given that these events have not yet occurred. Then, we obtained the likelihood of data X j in the condition of sequence S by summing the likelihood values of data X j across all possible stages within sequence S, as shown in Equ. (4) below: (4) Next, we combined the measurements of all subjects, j=1, …, J, assuming that the intersubject relationship is independent: In theory, the above analysis needs to be repeated for each possible sequence to determine the sequence S optimal with the maximal value of ( | ). However, such a computation strategy is extremely time consuming; total calculation times in this study would be 2.7942e+009, given that there are 10 biomarker events, 11 possible stages (including stage 0), and 70 subjects (cognitively normal [CN] and AD groups only) involved. Therefore, we employed a greedy algorithm to improve processing efficiency.

Event occurrence and nonoccurrence distribution modeling
We used a mixture model of two Gaussian distributions to fit the event data from the CN and AD groups, based on the assumption that an event occurring and an event not occurring are estimated by a mixed distribution of normal and abnormal groups. The fitted Gaussian distributions separated the data into two groups, i.e., abnormal (event occurred) and normal (event did not occur), similar to the approach by Young et al. [9]. Notably, we modified Young et al.'s approach by applying a k-mean clustering algorithm to separate the whole distribution into two clusters before applying the Gaussian mixture model fitting. This modified modeling method led to high consistency in the obtained model.

Self-growing greedy algorithm
The amount of time such an analysis would take to find a global optimal result is unpredictable due to the randomized initial sequence, and it may be quite long due to the inevitable searching loop. Therefore, we developed a new greedy algorithm to address this deficiency. The greedy algorithm explores the globally optimal solution by making the locally optimal choice at each stage, in a greedy heuristic manner. The greedy Markov chain Monte Carlo (MCMC) algorithm is a useful approach to find globally optimal results. Specifically, we started with a set of all possible initial root sequences, each of which consisted of two randomly selected events from the 10 biomarker events total. Second, for each initial sequence S, we generated the children of S by inserting a randomly selected event from the remaining events. Third, we selected the children sequence with the maximal ( | ) value; this replaced the initial sequence. Then, we entered another randomly selected event into the sequence and repeated the second and the third steps until no events were left. Thus, we generated whole sequences for each root sequence. Ultimately, we determined the sequence with the maximum ( | ) value as the final optimal sequence, S optimal . We repeated this greedy algorithm 100 times to ensure the S optimal had a high reliability.
This study used 45 CN and 25 AD subjects to determine the S optimal . Note that the MCI subjects were not used to train the S optimal .
The S optimal reflects the order in which the sequential pathophysiological events occurred and provides a numeric score to measure disease progression from one stage to the next.

CARE index score calculation based on the obtained sequence
Using the following equation to determine each subject's CARE index score, we calculated the likelihood value of k at each possible stage in the sequence and defined the CARE index score as that at which k had the highest likelihood value at the S optimal : (6) In Equ. 6, implications of and refer to those in Equ. 3, except that the optimal sequence, S optimal , is obtained.

Biomarker events
Ten well-studied AD biomarkers, as described above, were selected (Table S1); each represents an event that occurs along with AD progression.
It also determines the likelihood of subject j being in stage k, given the biomarker measurement X j and optimal , by the formula below: where X j ={x 1j , x 2j , …, x Nj }, and x ij is the ith biomarker measurement for subject j. The normalized likelihood is defined as: where normalization factor is determined by: (9) The weighted average (WA) stage , ℎ for subject j is defined as (10)

S optimal in missing biomarkers
In the previous EBP model [6], the individual subject's disease stage was determined by the "winner take all" approach, i.e., the stage k with the highest likelihood value determines the subject's disease stage. However, this poses a potential problem when a biomarker is missing. For example, when a subject's disease stage corresponds to the missing marker k, it is impossible to determine the subject's disease stage to be k in the "winner take all" approach. The disease stage will fall to the next available highest likelihood stage, most likely k−1 or k+1. To address this problem, here we employ the WA stage defined in Equ (10). Theoretically, the WA stage of a subject can be k even when the corresponding marker is missing; this is demonstrated with a representative subject in Fig. S1. In the case of missing biomarker , and were set to be 1. This is equivalent to removing t hem from Equ. (7) without having to modify the existing programs. Meanwhile, for was set to 0. With this numerical modification, neither the analytical equations (7-10) nor the existing programs from [6] needed to be modified. However, for clarity, Equ. (7) was rewritten to account for missing biomarker(s): The program was employed to calculate optimal when the biomarkers were missing. The WA stage for each subject was calculated.

A representative subject
By employing the WA calculation, theoretically, the WA stage of a subject can be k even when the corresponding marker is missing. This benefit is demonstrated with a representative subject. Fig. S1 shows the advantage of the WA stage calculation. Fig  S1A shows that the subject's stage is 3 with complete data and the original method; Fig. S1B shows that the AGING subject's stage is 1 with missing data and the original method. Note: With missing data at stages 3 and 4, the normalized likelihood of stages 1, 2, 5, and 7 increases and is maximized at stage 1. Compared to that calculated using complete data (stage 3), the result with missing data (stage 1) is two stages lower; Fig. S1C shows that the subject's WA stage is 3.01 calculated with the formula shown in the figure. The difference is only 0.01 compared with the original stage calculated with the complete data.

Demographic and neuropsychological data
To increase statistical power by reducing random variability, this study composited the neuropsychological tests into four cognitive domains and transformed the raw scores into four composite Z scores, as previously described [10][11][12]. First, for each neuropsychological test, the individual raw scores were transformed to Z scores, according to the mean and standard deviation of the scores for all subjects. The following is the equation for Z transformation: where is the Z score of the ith subject, r i is the raw score of the ith subject, r is the average raw score of the neuropsychological test for all subjects, and S is the standard deviation of the scores. Notably, for tests measured in time, including TMT-A, TMT-B, Stroop A, Stroop B, and Stroop C, the raw scores were defined as the reciprocal of the time required for the test. Then, each cognitive domain's composite Z score was determined by averaging the Z scores related to the tests. We divided these tests into four cognitive