Alzheimer’s Disease Prediction Using Long Short-Term Memory with Early-Phase 18F-Florbetaben Imaging Data

A single amyloid-beta (Aβ) imaging test is not sufficient for diagnosing Alzheimer's disease (AD) because of the existence of Aβ-negative AD and Aβ-positive cognitively normal (CN) cases. We aimed to distinguish AD from CN with dual-phase 18F-Florbetaben (FBB) using machine learning algorithms and to evaluate the resulting AD positivity scores against those from delay-phase FBB (dFBB), which is currently adopted for AD diagnosis. A total of 264 patients (74 CN and 190 AD) who underwent FBB imaging and neuropsychological tests were retrospectively analyzed. We compared three kinds of machine learning-based models and evaluated their performance with 4-fold cross-validation. AD positivity scores estimated from dual-phase FBB showed better accuracy (ACC) and area under the receiver operating characteristic curve (AUROC) for AD detection (ACC: 84.091 %, AUROC: 0.900) than those from dFBB imaging (ACC: 81.364 %, AUROC: 0.890). When the associations between predicted AD positivity and AD occurrence were compared, the odds ratio (OR) was highest for dual-phase FBB (OR: 56.333), followed by dFBB (OR: 35.182). These results show that the combined model, which interprets dual-phase FBB with long short-term memory, can provide a more accurate AD positivity score, with a closer association with AD, than prediction with single-phase FBB alone.


Introduction
Approximately 50 million people worldwide suffer from dementia, and nearly 10 million new cases occur every year. The total population with dementia is expected to reach 82 million by 2030 and 152 million by 2050 [1]. Alzheimer's disease (AD), the most common cause of dementia, is a complex and multifactorial disease spanning a continuum from the asymptomatic stage through mild cognitive impairment to dementia. Amyloid-β (Aβ), which can be measured through positron emission tomography (PET) or cerebrospinal fluid analysis, is one of the features defining the pathology of AD and is known as the earliest sign among AD biomarkers. Therefore, Aβ-related biomarkers have been studied as a clinical diagnostic index as well as for early diagnosis or prediction [2][3][4]. However, as AD is also known to be affected by neurofibrillary tangles formed by aggregated hyperphosphorylated tau protein, genetics, and environmental influences [5], both Aβ-negative AD and Aβ-positive cognitively normal (CN) cases inevitably exist [6]. In addition, it is difficult to monitor the patient's condition because Aβ plaques are already saturated by the time cognitive function clinically declines [7].
These facts show that additional AD biomarkers are required to understand and respond to AD. 18F-Fluorodeoxyglucose (FDG), a radiopharmaceutical that enables imaging of changes in glucose metabolism in brain tissue, is another representative AD biomarker. Hypometabolism measured using FDG-PET is known to be associated with neurodegeneration and cognitive decline [8]. However, such a series of PET imaging tests has the drawback that patients who need a diagnosis or longitudinal follow-up for AD undergo relatively frequent radiation exposure and high financial expenditure.
Uptake in early-phase Aβ-PET is known to be a potential perfusion imaging modality that reflects cerebral blood flow [9][10][11]. Reference 4 reviewed the coupled relationship between hypoperfusion, which causes deleterious changes in neurons, and cerebral hypometabolism, which underlies neuronal/synaptic dysfunction, together with their respective associations with cognitive impairment. Given an adequate evaluation of neuronal function and Aβ load from dual-phase Aβ-PET imaging, we may be able to provide patients with a more accurate AD diagnosis and prognostic evaluation without compromising patient convenience.
Compared to delay phase Aβ-PET, however, there is no consensus or a well-established guide regarding how to interpret and evaluate the potential perfusion imaging for AD.
In the field of imaging biomarkers, various efforts have been made to continuously improve the quality of medical services. In particular, the latest technologies incorporating artificial intelligence have been reported to show consistent inference and classification performance comparable to that of human doctors. Such technologies are excellent not only at reducing part of the workload of human doctors but also at addressing inter-observer variability [12,13]. In addition, machine learning-based studies on imaging for AD biomarkers are actively reported [14]. Existing machine learning-based studies for AD have commonly suggested a predictive model that learns from one or more kinds of imaging data, such as magnetic resonance imaging, FDG, or Aβ-PET. Such attempts to use a variety of information for AD detection could be appropriate designs that address the complex and heterogeneous characteristics of AD.
In this study, we attempted to develop and evaluate an improved imaging biomarker using machine learning algorithms that incorporate dynamic early-phase Aβ-PET in addition to the single delay-phase Aβ-PET conventionally used for AD diagnosis. The method included (1) extracting the mean standard uptake value ratio (SUVr) with consistent target areas and a consistent reference region from individual dual-phase Aβ-PET imaging, (2) developing machine learning-based predictive models, namely logistic regression (LR), support vector machine (SVM), and neural network (NN), which estimate the AD positivity score, and (3) comparing the classification performance among models and evaluating the association between predicted AD positivity scores and cognitive function or occurrence of AD.

Materials And Methods
Participants
We adopted FBB PET as Aβ-PET for this experiment and retrospectively recruited subjects who visited the Department of Neurology and Nuclear Medicine of the Dong-A University Hospital (DAUH) and underwent dual-phase FBB from November 2015 to June 2020. The total number of subjects was 264, consisting of 74 cognitively normal (CN) and 190 AD. Detailed demographic data of the participants are presented in Figure 1. All CN cases had normal age-, gender-, and education-adjusted performance on standardized cognitive tests. The AD participants met the following inclusion criteria: 1) the criteria for dementia according to the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV-TR) [15], and 2) the criteria for probable AD according to the NIA-AA core clinical criteria [16]. The individual FBB PET imaging for Aβ load was visually evaluated with the brain Aβ plaque load (BAPL) scoring system, which defines BAPL scores of 1 (no Aβ load), 2 (minor Aβ load), and 3 (significant Aβ load) [17]. The Dong-A University Hospital Institutional Review Board (DAUHIRB), with the members listed in the Institutional Review Board Membership List, reviewed this study and finally approved the study protocol (DAUHIRB-17-108). All procedures for data acquisition were in accordance with the ethical standards of DAUHIRB and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all participants in this study.

PET acquisition
All FBB PET imaging for this experiment was performed using a Biograph 40mCT Flow PET/CT scanner (Siemens Healthcare, Knoxville, TN, USA) and reconstructed through UltraHD-PET (TrueX-TOF). An FBB dose of 300 MBq was administered to the participants as an intravenous bolus injection, and PET images were acquired from 0 to 20 min and from 90 to 120 min post-injection, after helical CT with a 0.5 sec rotation time at 100 kVp and 228 mAs. The image acquisition time for dual-phase FBB PET was determined from related studies, so as to sufficiently include the peak of uptake for early-phase FBB PET (eFBB), and from the manufacturer's recommendations for delay-phase FBB PET (dFBB) [10,17,18]. The acquired dynamic eFBB and static dFBB were 27 frames of 128 × 128 × 110 (3.19 mm × 3.19 mm × 1.5 mm) resliced from a field of view of 408 mm × 408 mm × 165 mm, and one frame of 400 × 400 × 110 (1.02 mm × 1.02 mm × 1.5 mm) resliced from a field of view of 408 mm × 408 mm × 165 mm, respectively. Static eFBB for evaluating potential perfusion was made by averaging the frames corresponding to 2-7 min from dynamic eFBB. The optimal time period required to obtain static eFBB was internally determined using the approach in Reference 10.
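The construction of static eFBB from the dynamic frames can be illustrated with a minimal sketch. The frame schedule (start times and durations) is not given in the text, so the values in the example are purely hypothetical, and the duration weighting is an assumption; with equal-length frames it reduces to the plain average described above. The function name `static_from_dynamic` is illustrative only.

```python
def static_from_dynamic(frames, starts_min, durations_min, t0=2.0, t1=7.0):
    """Duration-weighted mean of the frames overlapping the window [t0, t1] (minutes).

    frames: list of frames, each a flat list of voxel (or regional) values.
    starts_min / durations_min: hypothetical frame timing, one entry per frame.
    """
    accum = [0.0] * len(frames[0])
    total_weight = 0.0
    for frame, start, dur in zip(frames, starts_min, durations_min):
        # Length of the part of this frame that falls inside the window.
        overlap = min(start + dur, t1) - max(start, t0)
        if overlap <= 0:
            continue
        total_weight += overlap
        for i, value in enumerate(frame):
            accum[i] += overlap * value
    if total_weight == 0:
        raise ValueError("no frame overlaps the requested window")
    return [a / total_weight for a in accum]


# Hypothetical example: four frames with two values each.
example_frames = [[10.0, 0.0], [2.0, 4.0], [4.0, 8.0], [100.0, 100.0]]
example_static = static_from_dynamic(example_frames, [0, 2, 5, 7], [2, 3, 2, 3])
```

Only the second and third example frames fall inside the 2-7 min window, so the first and last frames do not contribute to the result.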

Pre-processing
The pre-processing procedures applied to extract the regional mean SUVr for dynamic eFBB and static eFBB/dFBB were as follows. For the spatial normalization of all PET images, we used an in-house eFBB PET template [19], which averaged 8 CN and 8 AD cases randomly selected from the collective spatially normalized FBB data pool in Montreal Neurological Institute (MNI) space [20]. Each static eFBB was spatially non-linearly registered to the template space. For dynamic eFBB, the deformation field to the template space estimated from the mean of all frames was used. The deformation field for each dFBB was identical to that estimated in the spatial normalization process of the matched static eFBB [21]. As a result, the spatially registered imaging was in a voxel space of 95 × 79 × 68 (height × width × depth). We merged the Hammers atlas [22] into nine representative regions (frontal lobe, temporal lobe, parietal lobe, occipital lobe, posterior cingulate, caudate, putamen, and thalamus) to define the reference region for count normalization and the volumes of interest for estimating the mean SUVr. After spatial normalization, the intensities of each image were normalized with respect to the mean uptake of the whole cerebellar region as a reference region. Finally, for static eFBB/dFBB and dynamic eFBB, regional mean SUVr of 8 × 1 and regional time-activity curve (TAC) data of 8 × 27 (number of target regions × temporal length) were obtained, respectively.
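As a minimal sketch of the count-normalization step, the regional mean SUVr is the regional mean uptake divided by the mean uptake of the cerebellar reference region. The region names come from the text; the function names and the numeric values are hypothetical, and normalizing each dynamic frame by that frame's own reference mean is an assumption about how the TAC was built.

```python
TARGET_REGIONS = [
    "frontal lobe", "temporal lobe", "parietal lobe", "occipital lobe",
    "posterior cingulate", "caudate", "putamen", "thalamus",
]


def regional_suvr(regional_mean_uptake, reference_mean_uptake):
    """Static case: {region: mean uptake} -> {region: SUVr} (8 x 1 per image)."""
    if reference_mean_uptake <= 0:
        raise ValueError("reference uptake must be positive")
    return {
        region: value / reference_mean_uptake
        for region, value in regional_mean_uptake.items()
    }


def regional_tac(per_frame_regional_means, per_frame_reference_means):
    """Dynamic case: one SUVr value per region and frame (8 x 27 in the paper).

    Assumption: each frame is normalized by its own cerebellar mean.
    """
    tac = {region: [] for region in per_frame_regional_means[0]}
    for regional, reference in zip(per_frame_regional_means,
                                   per_frame_reference_means):
        for region, value in regional_suvr(regional, reference).items():
            tac[region].append(value)
    return tac


# Hypothetical uptake values for one static image.
uptake = {region: 3.0 for region in TARGET_REGIONS}
uptake["thalamus"] = 1.5
suvr = regional_suvr(uptake, reference_mean_uptake=1.5)
```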

Calculation of AD positivity score based on brain blood flow and amyloid-β plaque
To calculate the AD positivity score, we aimed to build a proper machine learning-based classification model to predict the probability that the given regional TAC or mean SUVr data belong to the CN or AD distribution. Fig. 1 shows the structure of the models adopted for this study. Fig. 1a is a combined model that predicts AD using both dynamic eFBB and dFBB extracted from dual-phase FBB, and Fig. 1b shows the structure of the model using only one of either eFBB or dFBB (single-phase FBB). The combined model (NN_aggregated) in Fig. 1a consists of a long short-term memory model (LSTM_eFBB), which is used to extract features from temporal data such as dynamic eFBB, a feedforward neural network (NN_dFBB) for dFBB, and a final NN that makes the diagnostic decision (NN_Dx) using the concatenated features (eFBB and dFBB) delivered from the preceding layers. LSTM_eFBB, NN_dFBB, and NN_Dx each have an output layer with two nodes leading to the softmax function, so that the model output can be interpreted as the probability of each diagnostic label, and their model parameters were trained to minimize the cross-entropy loss between the predicted probability and the one-hot encoded actual label. To train the weights of each internal model (LSTM_eFBB, NN_dFBB, and NN_Dx) constituting the combined model, we alternately trained the internal models in a 1:1:2 ratio of epochs. The models depicted in Fig. 1b are a logistic regression (LR), a support vector machine (SVM), or an NN-based model, and were used as baselines to evaluate the feasibility of the combined model that interprets dual-phase FBB.
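The two-node softmax output and the cross-entropy loss described above can be written out in a few lines; the sketch below also shows one possible reading of the 1:1:2 alternating training ratio, namely one epoch for LSTM_eFBB, one for NN_dFBB, and two for NN_Dx per cycle. That reading, and the function names, are assumptions for illustration rather than the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the two output nodes (CN, AD)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, one_hot))


def alternating_schedule(n_cycles,
                         ratio=(("LSTM_eFBB", 1), ("NN_dFBB", 1), ("NN_Dx", 2))):
    """List which internal model is trained in each epoch (assumed 1:1:2 cycle)."""
    schedule = []
    for _ in range(n_cycles):
        for name, epochs in ratio:
            schedule.extend([name] * epochs)
    return schedule


probs = softmax([0.0, 0.0])              # equal logits for CN and AD
loss = cross_entropy(probs, [0, 1])      # true label: AD
one_cycle = alternating_schedule(1)
```

In this reading, the second softmax output plays the role of the AD positivity score.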

Detailed parameters for model selection and model evaluation
For the purpose of this experiment, we focused on showing that the model with dual-phase FBB is more useful for estimating AD positivity than a model with only dFBB. Therefore, we tried to simplify and unify the model structure and detailed parameters of each model as much as possible. NN-based models, including LSTM, have two hidden layers with eight nodes in each hidden layer, and L2 regularization was applied with a weight of 0.01. All models were set to train for up to 10,000 epochs, but training was stopped if the validation loss had not improved for more than 200 epochs. The learning rate was 0.0005, and the Adam optimizer [23] was used for each setting. If the validation loss had not improved for more than 100 epochs at a given point, a decay rate of 0.001 was applied to the learning rate at that point.
The SVM used in the experiment had a linear kernel as its kernel function. A radial basis function and a polynomial kernel were also tested in an internal experiment, but no meaningful difference was observed, and the simpler model was finally adopted to prevent overfitting.
The software used in this experiment was the SPM12 library and MATLAB R2020a for the data preprocessing, including spatial normalization, count normalization, and the calculation of regional mean SUVr based on the Hammers atlas [22]. The Keras 2.2.4 library and Python 3.6.9 were used to select and evaluate a model for estimating AD positivity. The experimental tool was implemented and tested on Linux Ubuntu 16.04 LTS with an Intel Core i7-6800K CPU and two GPUs (NVIDIA GeForce GTX 1080).
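The stopping and learning-rate rules above can be sketched as a small, framework-independent monitor. The patience values (200 and 100 epochs), the initial learning rate (0.0005), and the decay rate (0.001) come from the text; how the decay rate is applied is not specified, so the multiplicative update `lr * (1 - decay_rate)` used here is only one assumed reading, and the class name is hypothetical.

```python
class PatienceMonitor:
    """Tracks validation loss to decide on learning-rate decay and early stopping."""

    def __init__(self, stop_patience=200, decay_patience=100,
                 lr=0.0005, decay_rate=0.001):
        self.stop_patience = stop_patience
        self.decay_patience = decay_patience
        self.lr = lr
        self.decay_rate = decay_rate
        self.best = float("inf")
        self.stale = 0          # epochs since the last improvement
        self.decayed = False    # decay is applied once per stale run

    def update(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
            self.decayed = False
            return False
        self.stale += 1
        if self.stale > self.decay_patience and not self.decayed:
            # Assumed interpretation of "decay rate applied to the learning rate".
            self.lr *= (1.0 - self.decay_rate)
            self.decayed = True
        return self.stale > self.stop_patience


# Small hypothetical run with shortened patience values.
monitor = PatienceMonitor(stop_patience=3, decay_patience=1, lr=1.0, decay_rate=0.5)
stops = [monitor.update(loss) for loss in [1.0, 1.1, 1.2, 1.3, 1.4]]
```

In a training loop, `update` would be called once per epoch with the validation loss, and the current `lr` passed on to the optimizer.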
For model selection and evaluation, we used a nested cross-validation framework [24,25]. When the dataset is limited, nested cross-validation can be an appropriate alternative if it is difficult to divide the dataset into training, validation, and testing sets. Its inner loop achieves model selection through a predefined hyper-parameter search space and wrapper function, and the outer loop is used to measure the generalization performance of the selected model. Although we used 4-fold cross-validation, instead of using a wrapper function on the dataset corresponding to the inner loop, we manually searched for a unified hyper-parameter setting under which the NN-based models neither underfit nor overfit. We assigned a fold index to each sample in the total dataset to simulate nested cross-validation and to maintain the reproducibility of our experiment.
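Assigning a fixed fold index to every sample, as described above, could look like the following sketch. The seed, the stratification by diagnostic label, and the function names are assumptions made for illustration; the text only states that indices were assigned per fold for reproducibility.

```python
import random


def assign_folds(labels, n_folds=4, seed=0):
    """Return one fold index per sample, stratified by label and reproducible."""
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin into the folds.
        for k, i in enumerate(indices):
            folds[i] = k % n_folds
    return folds


def outer_split(folds, test_fold):
    """Indices for one outer iteration: held-out fold vs. the remaining data."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test


# Cohort sizes from the paper: 74 CN and 190 AD.
labels = ["CN"] * 74 + ["AD"] * 190
folds = assign_folds(labels)
train_idx, test_idx = outer_split(folds, test_fold=0)
```

The remaining three folds of each outer iteration would then serve as the inner-loop data for choosing hyper-parameters.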

Statistical Analysis
The experimental data included only cases that underwent dual-phase FBB imaging at the Department of Neurology or Nuclear Medicine of the DAUH, excluding cases with other types of dementia or inconsistent imaging protocols. As a result, a total of 264 cases, including 74 CN and 190 AD, were included in this analysis. We used independent-sample t-tests for numerical variables such as age and education, and Pearson's Chi-square test for categorical variables such as sex, FBB reading, K-MMSE, CDR, and GDS to determine whether the characteristics of subjects in our experimental dataset are biased according to the diagnostic label. For the demographic analysis, we used IBM SPSS Statistics version 23.
To evaluate the classification performance of the trained models, we calculated the accuracy (ACC) and area under the receiver operating characteristic curve (AUROC) for AD detection using DeLong's method [26], as well as the Spearman correlation between predicted AD positivity scores and neuropsychological tests or the actual diagnostic label. For these processes, we used MedCalc version 18.9.1 (MedCalc Software). In all tests, the statistical significance level was set at p < 0.001 with a two-sided test.
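For readers who want to reproduce the reported quantities without MedCalc, the sketch below computes ACC, AUROC (as the probability that an AD case scores higher than a CN case), and the odds ratio (OR) of a 2 × 2 table from predicted AD positivity scores. The 0.5 decision threshold and the toy data are assumptions, and the sketch does not implement DeLong's method for AUROC confidence intervals or comparisons.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label (1 = AD)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def auroc(y_true, scores):
    """AUROC via pairwise comparison of AD and CN scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(y_true, scores, threshold=0.5):
    """OR = (TP * TN) / (FP * FN) for predicted positivity vs. actual AD."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    if fp == 0 or fn == 0:
        raise ValueError("odds ratio undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Toy data: three AD (1) and three CN (0) cases with hypothetical scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
```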

Results
Data demographics
Table I shows the statistics and test results for the characteristics of the participants in the experimental data we used. There was no statistically significant difference between the CN and AD groups collected retrospectively by DAUH in the age, sex, and education variables (P_age < 0.142, P_sex = 0.065, P_education = 0.188). The results of K-MMSE, CDR, and GDS (which are the dominant variables in the diagnosis of AD and reflect cognitive function) and dFBB readings (which reflect the state of Aβ plaque load) showed statistically significant differences between groups (P_MMSE < 0.0001, P_FBB_reading = 0.0001). Our experimental data included 20.83 % of Aβ-positive CN and 16.84 % of Aβ-negative AD.

Pre-processed Image
As the result of spatial registration, Fig. 2a shows static eFBB and dFBB registered in MNI space for a randomly selected case of each diagnostic label, compared with the corresponding raw images in native space. After pre-processing, it was confirmed that the spatial characteristics of the individual images disappeared once they were transformed into MNI space, but the functional characteristics remained according to the diagnostic label.
In Fig. 2b, to check whether the functional information of eFBB obtained with our pre-processing method and selected time period is feasible, eFBB (2-7 min) was examined by t-contrast according to the diagnostic label. The functional information of dFBB was omitted because the results have already been verified in previous studies [21]. In the t-contrast, regions in which eFBB uptake was relatively higher in AD than in CN were not dominant, except for the cerebellar area. On the other hand, regions with relatively higher uptake in CN than in AD were clustered throughout the brain tissue area.

AD classification performance
The table of classification results shows the AD classification performance of the ML-based predictive models measured by the outer 4-fold cross-validation. The best performance was obtained by NN_aggregated (ACC: 84.091, AUROC: 0.900), which was trained on dual-phase FBB, followed by the SVM trained on dFBB and the LSTM trained on eFBB.
The AD positivity scores of all participants measured by the three models (NN_aggregated, LSTM_eFBB, and SVM_dFBB) representing each kind of FBB (dual-phase FBB, dynamic eFBB, and static dFBB) are presented in the accompanying table. Fig. 3c and 3e show the distribution of mean SUVr and features extracted from dFBB, and those do not seem to fully explain Aβ-positive CN and Aβ-negative AD. In the distribution of mean SUVr and features extracted from eFBB shown in Fig. 3a, b, and d, the Aβ-negative AD distribution appears closer to the Aβ-positive AD distribution than in those extracted from dFBB. However, the Aβ-positive CN distribution is still close to that of Aβ-positive AD. On the other hand, in Fig. 3f, the feature distribution extracted from dual-phase FBB showed a continuously distributed pattern rather than a clustered one, and separated Aβ-negative and Aβ-positive AD relatively well.

Discussion
We designed an experimental model to improve on the conventional imaging biomarker, which uses only static dFBB, by incorporating dynamic eFBB, based on the following two assumptions: (1) The potential blood flow information included in eFBB is sufficiently distinct from dFBB, and the two provide complementary information with respect to AD diagnosis. (2) The temporal information included in dynamic eFBB can be summarized by the LSTM model as a feature vector representing blood flow information. In the remaining paragraphs, we discuss the experimental results and related problems with respect to these assumptions.
To improve the accuracy of AD classification by analyzing dual-phase FBB, compared with the conventional use of only dFBB, eFBB and dFBB must contain sufficient complementary information regarding AD; that is, eFBB should be able to sufficiently explain AD in aspects different from dFBB. As shown in Fig. 2 and the accompanying table, we tried to confirm whether the potential perfusion information of eFBB is suitable for this experiment. Even though the deformation field used for registration of eFBB was applied to dFBB, both eFBB and dFBB could be located in the MNI space, and the Aβ load pattern in gray matter regions was still observed in each preprocessed dFBB. In voxel-based analysis, hypoperfusion was observed in the AD group regardless of the Aβ distribution (Fig. 2). From the comparison of characteristics within the same Aβ distribution in the table, it was observed that the AD positivity calculated from eFBB explained the AD distribution relatively well in cases where it was difficult to discriminate the diagnostic label with dFBB within the same Aβ distribution. Therefore, the dynamic or static eFBB acquired with our experimental protocol is complementary to the uptake of dFBB for AD detection, and the improved classification performance of NN_aggregated with dual-phase FBB could be based on the additional potential blood flow data.
LSTM is a representative NN for time-series data that captures long-term contextual information by managing, through input, output, and forget gates, the cell state needed to determine the output from inputs over time [28]. In research on medical data, LSTM has been frequently used for EEG/ECG [29], imaging reports [30], electronic health records [31], and static or dynamic imaging data [32,33], which include temporal information. A common delay-phase static PET image is acquired at an acquisition time determined by investigating, through TAC data, the pseudo-equilibrium interval in which specific binding remains stable, and by considering other parameters such as image quality and diagnostic accuracy. In the case of the FBB radiotracer, the manufacturer provides an acquisition time for the delay phase but not for the early phase. In eFBB, the optimal acquisition time interval closest to perfusion cannot be found in a stable state owing to the curve that changes rapidly around the peak; therefore, the interval must be determined in an exploratory manner. Moreover, the interval for ideal potential perfusion imaging is not deterministic and may vary from case to case. In related work, this issue was mainly addressed by studies that explored a specific acquisition time based on the similarity between eFBB and FDG images. They either randomly selected an interval including the peak uptake [11] or searched for the combination of start time and time window whose acquisition showed the highest correlation with FDG [10]. In the performance table, the LSTM model showed better performance than the NN model trained on static eFBB at 2-7 min, which had a good correlation with FDG in our prior study [18]. These experimental results may indicate that the LSTM could capture the temporal features required for AD classification from the potential perfusion information in dynamic eFBB and that the calculation of the optimal acquisition time could be omitted.
Fig. 5a and 5b show that eFBB and dFBB discriminate AD from CN using different features of each image. In Fig. 5b, most misclassifications occurred in the Aβ-positive CN and Aβ-negative AD populations, whereas, in Fig. 5a, the eFBB classifier consistently assigns a proper AD positivity score to CN or AD regardless of Aβ distribution. Therefore, the performance of the dual-phase FBB classifier could be considered to originate from its assessment of the state of neuronal injury through a comprehensive evaluation of the degree of hypoperfusion from eFBB and Aβ plaque deposition from dFBB (Fig. 5c). In the table, AD positivity scores calculated by dual-phase FBB for the entire population showed an AUROC that had no statistically significant difference compared to the MMSE, and better classification performance than those calculated using only dFBB regardless of Aβ distribution. These results may indicate that the evaluation of the degree of neuronal damage, in research or in clinical practice, can be improved when the AD positivity score of dual-phase FBB is provided. In addition, it could provide nuclear medicine physicians with a quantitative index to explain false negative/positive cases in FBB imaging tests. This quantitative method could be considered for application to other types of tracers or PET imaging in which early-phase PET reflects potential perfusion information.
This study proposes a quantitative method for the interpretation of dual-phase FBB at a time when the evaluation criteria for the potential perfusion information of eFBB have not yet been established. Ultimately, it could help to reduce radiation exposure and costs for patients with AD, and for nuclear medicine physicians, it could be a helpful tool in the visual assessment of dual-phase FBB. On the other hand, as a limitation of this study, the predictive model analyzing dual-phase FBB needs to be evaluated in terms of external validation and clinical validity in the future. As mentioned earlier, AD is associated with neurofibrillary tangles aggregated by phosphorylated tau, CSF biomarkers, genetics, and environmental factors, in addition to Aβ plaque accumulation. Given additional clinical and laboratory data in the future, it would be possible to develop a predictive model that aggregates a wider variety of predictive factors for AD, in addition to improving the performance of the quantitative model in this study.

Conclusion
In this paper, we report on how to interpret dual-phase FBB using ML-based models and on their evaluation results. In the comparison of AD classification, the model trained on mean SUVr extracted from dual-phase FBB imaging (ACC: 84.091%, AUROC: 0.900) showed better AD classification than the models trained on single-phase FBB, either eFBB (ACC: 78.182%, AUROC: 0.846) or dFBB (ACC: 81.364%, AUROC: 0.890). In addition, the AD positivity score estimated by dual-phase FBB (ρMMSE: −0.556, OR: 59.333) showed a higher correlation with neuropsychological test results and a stronger association with AD occurrence than the score estimated with only dFBB (ρMMSE: −0.174, OR: 27.679). These experimental results show that the proposed method could be used to interpret eFBB in dual-phase FBB and that, by reflecting eFBB in the current reading system, Aβ-PET reading, AD diagnosis, or the monitoring system could be improved.