Prediction of the Clinical Severity of Progressive Supranuclear Palsy by Diffusion Tensor Imaging

Progressive supranuclear palsy (PSP) is characterized by a rapid and progressive clinical course. A timely and objective image-based evaluation of disease severity before standard clinical assessments might increase the diagnostic confidence of the neurologist. We sought to investigate whether features from diffusion tensor imaging of the entire brain, rather than of a few pathogenically involved regions, can predict the clinical severity of PSP when combined with a machine learning algorithm. Fifty-three patients who met the diagnostic criteria for probable PSP underwent diffusion tensor imaging. Of them, 15 underwent follow-up imaging. Clinical severity was assessed by neurological examination. Mean diffusivity and fractional anisotropy maps were spatially co-registered, normalized, and parcellated into 246 brain regions from the human Brainnetome atlas. The predictors of clinical severity were determined from a stepwise linear regression model after feature reduction by the least absolute shrinkage and selection operator. Performance estimates were obtained using bootstrapping, cross-validation, and application of the model to the patients who underwent repeated imaging. The algorithm confidently predicts the clinical severity of PSP at the individual level (adjusted R²: 0.739 and 0.892, p < 0.001). The machine learning algorithm for selection of diffusion tensor imaging-based features is accurate in predicting the motor subscale of the Unified Parkinson’s Disease Rating Scale and the postural instability and gait disturbance score in PSP.


Introduction
Progressive supranuclear palsy (PSP) is one of the most common causes of neurodegenerative parkinsonism after Parkinson's disease (PD). The two disorders share several clinical symptoms (bradykinesia, rigidity, dysarthria, dysphagia, and dementia), although resting tremor is rare in PSP [1]. Because of the substantial overlap in clinical symptoms and the inadequate accuracy of current tests, differential diagnosis from PD is challenging [2]. PSP is generally characterized by a rapid and progressive clinical course. Therefore, an objective evaluation of disease severity at an early stage might significantly improve the diagnostic confidence of the neurologist.
Neuroimaging examinations, such as structural magnetic resonance imaging (MRI), are often prescribed for patients with suspected PSP in order to rule out concomitant neurological disorders. Structural MRI-based signs can be detected in the brain of patients with PSP, for example, characteristic midbrain atrophy (the "hummingbird" sign) on the midsagittal plane and rounded midbrain peduncles (the "Mickey Mouse" sign) on the axial plane [3]. Unfortunately, these signs may appear only in advanced disease stages. Although they could be diagnostically useful, these measures or their ratios on the midbrain or pons were not found to be related to age, disease duration, or clinimetric scores, and they rarely result in a meaningful change in patient management [4]. In recent years, diffusion tensor imaging (DTI) has been extensively used to study damage in white matter tracts [5]. Although DTI parameters, such as fractional anisotropy (FA) and mean diffusivity (MD), are thought to reflect clinical rating scores, the measurements still require extensive methodological standardization before they can be translated into clinical practice [4].
From a pathological standpoint, PSP is a tauopathy characterized by diffuse deposits of globose neurofibrillary tangles, tufted astrocytes, and coiled bodies and threads in different brain areas [1]. The pathological alterations in PSP are not limited to the midbrain but affect various regions, including the cerebellum, brainstem, deep nuclei, and cerebral white and grey matter. Therefore, predicting clinical severity might require a comprehensive assessment of the microstructural damage in the whole brain.
Previous radiomic studies in oncology have already included multiple imaging features in a regression model for the prediction of treatment outcome [6,7]. Here, we designed a tailored machine learning algorithm in which the predictors of clinical severity were selected from a multivariable linear regression model. The aim was to examine whether the metrics derived from DTI are related to disease severity in patients with PSP, in order to support a timely clinical diagnosis.

Materials and Methods
This retrospective study is a re-analysis of images collected from three prospective studies during 2008-2017. All studies were reviewed and approved by the Institutional Review Board of Chang Gung Memorial Hospital (Approval No. 97-0510B, 98-3626A, 100-3761A3 and 201600426B0) and conducted following the Declaration of Helsinki. All participants provided written informed consent following a detailed explanation in the prospective studies.

Study Patients
All of the study patients were enrolled from the neurology clinics and had a clinical diagnosis of probable PSP according to the diagnostic criteria of either (1) the National Institute of Neurological Disorders and Stroke (NINDS) and the Society for PSP (NINDS_PSP criteria, between July 2008 and August 2011) or (2) Litvan et al. [1] (between June 2012 and December 2017). All participants underwent MRI examinations using a 3T scanner. Both DTI and structural images (T1-weighted magnetization-prepared rapid acquisition gradient echo sequence, T1-MPRAGE) were acquired. Diagnoses of PSP were made by three senior neurologists (CS Lu, YH Weng, and WY Lin, with 28, 21, and 8 years of experience, respectively). Patients were excluded if they had brain abnormalities that may impair cognitive function, including hydrocephalus or encephalomalacia, on MRI and/or 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) studies; a history of intracranial surgery such as thalamotomy, pallidotomy, and/or deep brain stimulation; major physical or neuropsychiatric disorders; or general contraindications to MRI. The study sample consisted of 53 patients (21 men and 32 women; mean age: 65.7 ± 6.5 years; mean disease duration: 5.4 ± 3.2 years). Of them, 15 patients (8 men and 7 women; mean age: 65.9 ± 5.7 years) underwent follow-up imaging examinations and served as an additional validation cohort.

Image Acquisition
Imaging was performed on a 3T MR scanner (Magnetom Trio; Siemens, Erlangen, Germany). A total of 160 contiguous axial T1-weighted images were acquired with T1-MPRAGE using the following parameters: TR/TE = 2000 ms/2.63 ms; flip angle = 9°; field of view = 224 mm × 256 mm; matrix size = 224 × 256, resulting in a voxel size of 1 mm × 1 mm × 1 mm. Three senior neuroradiologists (YL Chen, SH Ng, and YM Wu, with 28, 20, and 10 years of experience, respectively) blinded to clinical data independently interpreted all MR images. Diffusion images were acquired with three different imaging protocols. Two diffusion-weighting values (b-values), 0 and 1000 s/mm², were used in the final analysis. Imaging parameters are shown in Table 1.

Image Processing
Images were processed as previously described by Ng and coworkers [9] using MATLAB (MATLAB 2015a; MathWorks, Inc., Natick, MA, USA). Briefly, individual diffusion tensor parametric maps (MD and FA) were calculated from diffusion-weighted images with the Diffusion Kurtosis Estimator software [10]. Using structural T1 images, a parenchymal mask was created to remove the signal from the cerebrospinal fluid. Both MD and FA maps were spatially co-registered, normalized, and parcellated into 210 cortical and 36 subcortical brain regions according to the Human Brainnetome Atlas [11] using the Statistical Parametric Mapping software (SPM8, 2009) [12]. The 90th, 50th, and 10th percentiles of each parcellated region of interest were recorded from the diffusion tensor parametric maps, resulting in a total of 1476 features (246 regions × 2 maps × 3 percentiles).
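The feature count above can be illustrated with a minimal sketch. It is not the authors' MATLAB/SPM pipeline: the function names, the flat voxel lists, the label convention (0 for voxels outside the parenchymal mask), and the synthetic values are all assumptions made for illustration.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def extract_features(maps, labels, n_regions=246, percentiles=(90, 50, 10)):
    """Summarize each parametric map by percentiles within each atlas region.

    maps   : dict such as {"MD": [...], "FA": [...]}, one value per voxel
    labels : atlas region index per voxel (1..n_regions); 0 = outside the mask
    """
    features = {}
    for metric, voxels in maps.items():
        by_region = {r: [] for r in range(1, n_regions + 1)}
        for value, region in zip(voxels, labels):
            if region > 0:  # skip masked-out voxels (e.g., cerebrospinal fluid)
                by_region[region].append(value)
        for region, vals in by_region.items():
            for q in percentiles:
                name = f"{metric}_R{region:03d}_P{q}"  # hypothetical naming scheme
                features[name] = percentile(vals, q) if vals else float("nan")
    return features


# Synthetic stand-in for co-registered, normalized MD and FA maps.
rng = random.Random(0)
n_voxels = 246 * 20
labels = [(i % 246) + 1 for i in range(n_voxels)]
maps = {
    "MD": [rng.uniform(0.5, 1.5) for _ in range(n_voxels)],
    "FA": [rng.uniform(0.1, 0.8) for _ in range(n_voxels)],
}
features = extract_features(maps, labels)
print(len(features))  # 2 maps x 246 regions x 3 percentiles = 1476
```

Percentiles rather than regional means are used here because that is what the text describes; the interpolation rule is one common convention and may differ from the one used in the study.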

Statistical Analysis and Feature Reduction Process
All calculations were performed using the SAS statistical package, version 9.4 (SAS Institute Inc, Cary, NC, USA). Four clinical severity measures were used as the ground truth, and each was entered into its own regression model: the motor subscale of the Unified Parkinson's Disease Rating Scale (UPDRS-III), the postural instability and gait disturbance score (PIGD), the modified Hoehn and Yahr stage (MHY), and the levodopa equivalent daily dose (LEDD). All participants were used as the training cohort. The results were further validated using leave-one-out and five-fold cross-validation. In addition, the second imaging dataset from the 15 patients who returned for follow-up served as an additional blind validation dataset.
The number of features was reduced in two steps. First, L1-norm regularized least absolute shrinkage and selection operator (LASSO) regression with 1000 bootstrap replications was performed in order to reduce the number of features to below the sample size (53, the number of participants). To clarify their potential effect on the predictability of the models, age, sex, disease duration, and imaging protocol were examined by LASSO together with the diffusion metric features. The features that were selected in at least 20% of the bootstrapping models (i.e., ≥200 of 1000) were entered into the second analysis. As a result, approximately 22 to 32 of the original image features survived for each clinical severity scale.
Second, linear regression with stepwise selection was used to identify the features that were finally used to predict each clinical severity scale. The number of features entering each regression model was limited to one fifth of the number of participants [13,14], which gives 11 features per model (53/5 ≈ 11). The coefficients of these 11 features in the final model were determined by linear regression with 1000 bootstrap replications.
The robustness of the regression model was expressed by the mean absolute error and the mean adjusted R². We further validated our findings in the subgroup of patients who underwent serial MRI (using the same image processing method). The extracted features were entered into the model developed in the initial analysis. For the PIGD model, three patients with missing assessment data (see Table 1) were excluded from the analysis. The difference in mean absolute error among the baseline cross-validations (leave-one-out and five-fold) and the follow-up subgroup was examined using the Friedman test, with p < 0.05 regarded as significant. Figure 1 is a flowchart of the study procedure.
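The two cross-validation schemes can be sketched as index splits over the 53 participants. This is an illustration only: the analysis itself was run in SAS, and the helper names, the shuffling seed, and the way the folds are cut are assumptions.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Return k (train, test) index pairs whose test sets partition range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # random assignment of subjects to folds
    folds = [sorted(idx[i::k]) for i in range(k)]
    return [(sorted(set(idx) - set(fold)), fold) for fold in folds]


def leave_one_out_indices(n):
    """Return n (train, test) pairs, each holding out exactly one subject."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]


five_fold = k_fold_indices(53, 5)
loo = leave_one_out_indices(53)
print([len(test) for _, test in five_fold])  # [11, 11, 11, 10, 10]
print(len(loo))  # 53
```

In each split the model would be refitted on the training indices and its error measured on the held-out indices; the follow-up scans of the 15 returning patients are a separate check that is not part of these splits.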
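A minimal sketch of bootstrapped LASSO selection follows, to make the selection-frequency threshold concrete. It assumes a plain coordinate-descent solver with a fixed penalty `lam`, synthetic data, and only 30 bootstrap replications to keep it quick; the study used SAS with 1000 replications, and its penalty selection is not described here, so these choices are illustrative rather than a reproduction.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    cols holds the centered feature columns; y is centered as well.
    """
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(v * v for v in c) / n for c in cols]
    for _ in range(n_iter):
        for j, c in enumerate(cols):
            if col_sq[j] == 0.0:
                continue
            rho = sum(a * r for a, r in zip(c, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                resid = [r - a * delta for a, r in zip(c, resid)]
                beta[j] = new
    return beta


def selection_frequency(cols, y, lam, n_boot, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(y), len(cols)
    counts = [0] * p
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y[i] for i in rows]
        y_mean = sum(yb) / n
        yb = [v - y_mean for v in yb]
        cb = []
        for c in cols:
            vals = [c[i] for i in rows]
            m = sum(vals) / n
            cb.append([v - m for v in vals])
        for j, b in enumerate(lasso_cd(cb, yb, lam)):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: 53 subjects, 20 features, only features 0 and 1 carry signal.
rng = random.Random(1)
n, p = 53, 20
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * cols[0][i] - 2 * cols[1][i] + rng.gauss(0, 0.5) for i in range(n)]

freq = selection_frequency(cols, y, lam=0.3, n_boot=30)
selected = [j for j, f in enumerate(freq) if f >= 0.2]  # 20% threshold from the text
print(selected)
```

Features that pass the threshold form the candidate pool for the second, stepwise step; the threshold trades stability against the risk of discarding weak but real predictors.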
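The cap on model size can be sketched with a greedy forward selection. This is a simplification: SAS stepwise selection also uses entry and removal significance levels, which are omitted here, so the sketch simply adds the feature that most reduces the residual sum of squares until the cap of about one fifth of the sample size is reached. All names and the synthetic data are assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss(cols, y):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    n = len(y)
    design = [[1.0] * n] + cols
    A = [[sum(a * b for a, b in zip(u, v)) for v in design] for u in design]
    b = [sum(a * t for a, t in zip(u, y)) for u in design]
    coef = solve(A, b)
    return sum(
        (t - sum(c * u[i] for c, u in zip(coef, design))) ** 2
        for i, t in enumerate(y)
    )


def forward_select(cols, y, max_features):
    """Greedily add the feature that lowers the RSS most, up to max_features."""
    chosen = []
    while len(chosen) < max_features:
        remaining = [j for j in range(len(cols)) if j not in chosen]
        if not remaining:
            break
        best = min(remaining, key=lambda j: rss([cols[k] for k in chosen + [j]], y))
        chosen.append(best)
    return chosen


n_subjects = 53
max_features = round(n_subjects / 5)  # 53 / 5 = 10.6, rounded to 11

rng = random.Random(2)
cols = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(25)]
y = [3 * cols[0][i] - 2 * cols[1][i] + rng.gauss(0, 0.5) for i in range(n_subjects)]

chosen = forward_select(cols, y, max_features)
print(max_features, chosen)
```

Because this sketch has no stopping rule other than the cap, it keeps adding noise features after the informative ones are exhausted; a significance-based entry criterion, as in stepwise selection, would stop earlier.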
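The two performance measures can be written out as a short sketch. The adjusted R² formula is the standard one for n observations and p predictors; the example values are made up, and the normalization that turns the mean absolute error into a percentage is not specified in the text, so only the raw error is computed.

```python
def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed scores."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted scores."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up scores, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mean_absolute_error(observed, predicted))
print(r_squared(observed, predicted))

# Penalty for model size with the study's dimensions (53 patients, 11 features).
print(adjusted_r_squared(0.90, n=53, p=11))
```

With 53 patients and 11 predictors the correction factor is 52/41, so an unadjusted R² of 0.90 corresponds to an adjusted value of about 0.873.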

Results
The number of features selected by LASSO and allowed to enter the regression model was smaller than the sample size (i.e., 32, 29, 24, and 22 for UPDRS-III, PIGD, MHY, and LEDD, respectively). Only diffusion metrics were selected into the final models. The final number of features in each prediction model was limited to 11 (approximately 53/5). The regression analysis revealed a strong correlation between diffusion imaging parameters and clinical severity (Figure 2) for all measures (A: UPDRS-III; B: PIGD; C: MHY; and D: LEDD). We were able to predict the clinical severity scores by using a combination of FA and MD values extracted from multiple regions of interest, i.e., not limited to areas known to be related to PSP pathogenesis. All of the predictions for the four different subtypes were within the 95% confidence interval, the only exception being the UPDRS-III of one patient with PSP-PD (which fell slightly outside this interval).
J. Clin. Med. 2020, 9, x FOR PEER REVIEW 5 of 13

The predictive equation for each clinical assessment can be calculated from Table 2 by combining the diffusion metrics from multiple brain regions with the unstandardized coefficients. Table 3 summarizes the statistical results of the regression model for each assessment at training and validation. In all assessments, the adjusted R² varied from 0.739 to 0.892 in both model training and cross-validation. The mean absolute error of the estimation varied between 5.6% (UPDRS-III) and 40.1% (LEDD). The complete nomenclature, including diffusion metrics, percentile values, modified cytoarchitecture, and Montreal Neurological Institute (MNI) [15] coordinates, is reported in Supplementary Table S2.
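Applying a predictive equation of this kind amounts to a weighted sum of the selected diffusion features plus an intercept. The sketch below shows only the arithmetic: the feature names, coefficients, intercept, and patient values are hypothetical placeholders, not the entries of Table 2.

```python
def predict_score(intercept, coefficients, features):
    """Linear prediction: intercept + sum(coefficient * feature value)."""
    return intercept + sum(coef * features[name] for name, coef in coefficients.items())


# Hypothetical placeholders, NOT values from Table 2.
intercept = 10.0
coefficients = {"MD_R231_P90": 25.0, "FA_R057_P10": -40.0}
patient_features = {"MD_R231_P90": 1.2, "FA_R057_P10": 0.3, "FA_R001_P50": 0.45}

print(predict_score(intercept, coefficients, patient_features))
```

Features that are extracted but not part of the model (such as `FA_R001_P50` above) are simply ignored, so the same extraction output can feed the equations of all four clinical measures.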

Follow-up MRI examinations were performed in a subset of patients who served for blind validation purposes (Figure 4). The estimation error increased by less than approximately three-fold for all measures. However, the model initially developed in the entire cohort still retained its ability to predict UPDRS-III (mean absolute error: 15.5%) and, to a lesser extent, LEDD (mean absolute error: 33.9%). Figure 4 plots the predicted (using the initially developed model) versus observed scores in the subset of patients who underwent follow-up imaging.

Main Findings
In this study, we developed a machine learning algorithm based on DTI to predict the clinical severity of PSP. Our model was found to be accurate for all of the clinical measures under consideration. Performance estimates of the prediction model were obtained using bootstrapping (1000 replications), leave-one-out and five-fold cross-validation, and application of the model to the subset of patients who underwent repeated imaging. Notably, the highest adjusted R² was 0.892, for UPDRS-III.

Clinical Impact
Despite continuing efforts, the identification of reliable imaging biomarkers for predicting PD severity remains elusive [16]. During the traditional diagnostic workup, MRI is generally performed to rule out concomitant neurological disorders. The use of our machine learning algorithm may allow a timely evaluation of disease severity before standard clinical assessments. This possibility may be especially advantageous in high-volume tertiary centers with long waiting lists. In this regard, traditional clinical evaluations in patients with movement disorders are known to be time-consuming and prone to fluctuations of symptoms over time.
In contrast, DTI data may be obtained rapidly (less than 15 min in our study) and are increasingly becoming a routine part of current MRI protocols. Notably, our data revealed a consistent association between microstructural damage reflected on DTI and clinical severity scales, most noticeably in the motor domain (UPDRS-III and PIGD), even when age, sex, disease duration, and imaging protocol were controlled for. A timely and accurate assessment of this rapidly progressive disease with our technique might improve the diagnostic confidence of the neurologist, so that an appropriate treatment course can be designed based on the response.

Regions Related to Motor Function
The basal ganglia are commonly considered the main region involved in the pathogenesis of movement disorders. However, there is no obvious relation between the extent of damage in the basal ganglia and clinical severity. This observation is not unexpected given that the execution of movements requires inputs from multiple brain areas. Motor abnormalities occurring in patients with PSP are likely the result of an extensive involvement of various motor-related areas, including the thalamus, precentral gyrus, middle temporal gyrus, and middle frontal gyrus.
In the current study, we found that DTI parameters measured in the thalamus were associated with UPDRS-III (sensory region) and MHY (rostral temporal part). Structural atrophy in the thalamus has been associated with impaired motor function and seems to be one of the hallmarks that distinguish PSP from both PD and multiple system atrophy (MSA) [17]. Activation of these thalamic regions has been reported in pain, sleep, execution, attention, and, notably, motion-related vision [18]. Similarly, the regions identified within the globus pallidus and the sensory thalamus may be linked to the execution and/or planning of different action tasks [19]. Our data indicate that the severity of motor damage may be assessed by measuring the diffusion metrics in multiple parcellated regions selected from the whole brain.

Regions Related to Psychomotor Interactions
The clinical manifestations of PSP predominantly, but not exclusively, affect motor function. It is widely recognized that proper motor execution requires adequate sensorimotor feedback, visuo-spatial perception, and motor learning. In the current study, the parahippocampal gyrus was found to contribute significantly to the prediction of both UPDRS-III and LEDD. This region is involved in memory and/or semantic language function [20], and its impairment has been related to memory decline in patients with PD [21]. A previous connectome analysis demonstrated numerous connections from the parahippocampus to subcortical regions, including the thalamus, basal ganglia, hippocampus, and amygdala [11]. Besides its role in memory, the parahippocampus may therefore serve as a local functional hub linked to various operating nodes within the motor neural network. Our data may prompt further investigations into the role played by psychomotor functions in the clinical manifestations of PSP.
An accurate prediction of LEDD may be hampered by numerous factors that can influence drug dosage, including age, sex, disease duration, genetic background, and pathological status. Here, we found that regions associated with LEDD were related to face recognition (medial superior temporal gyrus) [22], emotions of fear or disgust (amygdala) [23], and emotion processing, especially in the reward/pain domain (caudal cingulate gyrus) [24]. Difficulties in recognizing negative emotions are part of the cognitive impairment occurring in patients with PSP [25], who are characterized by impaired metabolism in this part of the caudal cingulate gyrus [26]. The functions implicated by our neuroimaging findings accord with clinical observations showing apathy and impaired emotion processing of facial expressions in PSP [25]. The general prediction rule outlined in our study, which is based on a combination of cortical and subcortical neural networks, suggests that PSP is characterized by alterations in emotion and cognitive processing [27]. Consequently, a comprehensive evaluation of these patients cannot be limited to the assessment of motor function alone.

Validation of the Prediction Model
Notably, 15 of the 53 study patients underwent a follow-up MRI examination and served as an additional validation cohort. Many factors can contribute to deviations in our prediction, for example, the patient's condition at the clinical evaluation, the disease course or response to treatment during the follow-up period, as well as scanner fluctuations at acquisition and the subsequent post-processing procedures [28]. The mean absolute error (MAE) of UPDRS-III and LEDD obtained in this patient subset did not differ significantly from that of the cross-validation analysis (p = 0.936 and 0.282, respectively), suggesting that our prediction rule might be reliable and consistent for both assessments.
The prediction of LEDD in the validation cohort deviated slightly from the forecast; LEDD also had the lowest adjusted R² in the original model (leave-one-out/five-fold cross-validation = 0.772 ± 0.008/0.739 ± 0.047). The response of patients with PSP to levodopa may be either absent or transient. However, there may be differences across the clinical spectrum of the disease, with patients with PSP-RS showing a poorer response than those with PSP-P [29]. The predominant inclusion of patients with PSP-P (as in our study) may lead to an underestimation of the predictive value. Notably, it has been previously suggested that common criteria for defining the response of patients with PSP to levodopa may not be entirely accurate and thus need further examination [30]. Similarly, our findings related to LEDD require additional validation in larger studies.

Technical Consideration and Additional Issues
Previous DTI analyses have focused on white matter damage and frequently relied on complicated algorithms for accurate fiber tracking [31], which makes them unsuitable for routine clinical use. Notably, our current approach for predicting severity scores was not based on white matter tractography or connectivity analysis. Rather, we performed a reconstruction of the diffusion tensor followed by image normalization and parcellation, a method that can be easily implemented in a clinically oriented environment with freely available software such as SPM [12] or FSL (FMRIB Software Library) [32].
The inclusion of a large number of features might result in model overfitting. Several approaches have been developed to remove features unassociated with the outcome, for example, principal component analysis [33] and independent component analysis [34]. However, it can be difficult to adjust the performance estimates when using these data-driven approaches [33,35]. The least absolute shrinkage and selection operator (LASSO) is a dimension-reduction technique that balances bias and variance to minimize the mean squared error of the predictive model [36]. Features that survive the LASSO procedure have been shown to give stable predictive performance when compared with other feature selection methods [37]. Here, we implemented an L1-norm regularized LASSO procedure to reduce the number of features that could be entered into the regression model. The number of features that survived LASSO (22 to 32 for the different clinimetrics) was smaller than the sample size (53 patients). The final number of features in each predictive model was further reduced to one fifth of the sample size by stepwise regression. This procedure is believed to minimize the identification of spurious correlations [36].
PSP is a neurodegenerative disease caused by four-repeat (4R) tauopathy and region-specific tau deposits. It has been postulated that an increase in amyloid-β (Aβ) might trigger tau pathology, leading to eventual neuronal death [38]. However, an appropriate tau-ligand positron emission tomography (PET) tracer seems yet to be developed. Although ATN characterization (biomarkers of Aβ amyloid, Tau, and Neurodegeneration or neuronal injury) and post-mortem evidence were not available in our study, these patients with PSP are unlikely to be amyloid positive, given their clinical presentations, the underlying neuropathological findings, and our DTI characteristics.
Because the subgroups of patients with PSP are considered to have different clinical presentations, which might be related to region-specific tau deposits [39], a larger number of patients in each subgroup would be required to reliably verify our results. The combination of DTI with an appropriate tau-PET tracer might be a powerful and complementary approach to the precise diagnosis and prognosis of PSP. In future studies, it would be interesting to investigate this disease with specific radioactive tracers that are appropriate and clinically available in general practice.
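For reference, the LASSO estimate described above can be written as a penalized least-squares problem. The notation is an assumption introduced here (y_i is the clinical score of patient i, x_ij the j-th candidate feature, n = 53 the sample size, p the number of candidate features, and λ ≥ 0 the penalty weight), and the 1/(2n) scaling is one common convention that may differ from the one used by the software in the study.

```latex
\hat{\beta}^{\mathrm{LASSO}}
  = \arg\min_{\beta_0,\,\beta}\;
    \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term is the squared prediction error and the second is the L1 penalty; larger values of λ shrink more coefficients exactly to zero, which is what allows LASSO to act as a feature selector when p greatly exceeds n.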

Limitations
Because of the retrospective nature of the study, a PSP-specific rating scale was not available, and the images of the participants were collected with three MRI protocols in an effort to increase the number of participants. However, UPDRS-III and PIGD are used for the general evaluation of clinical severity, and our results showed that the predictive models can be valid across different imaging protocols.
Owing to the data-driven approach, we cannot infer any direct causal relationship between the observed brain alterations and clinical severity measures. The question as to whether these associations are truly causal needs to be addressed in larger longitudinal investigations.
Because PSP is a rare disease, we were only able to include 53 patients in our analysis. Therefore, we did not set aside a specific portion of the participants as a hold-out validation cohort. However, we did use a subset of data from the 15 patients who returned for follow-up as an additional validation. A methodological point that merits comment is that diffusion metrics in volumes of interest are generally reported as means. It is difficult to forecast whether such parameters would increase (or decrease) under certain disease states. Consequently, the 90th, 50th, and 10th percentiles of each variable were recorded in this study. Because these values are unrelated to the morphometric features of the ROI (e.g., length, area, and volume), they contributed to the accuracy of the normalization algorithm.