A neuroimaging measure to capture heterogeneous patterns of atrophy in Parkinson’s disease and dementia with Lewy bodies

Highlights
• Neuroanatomical normative modelling used for the first time in Lewy body disorders.
• Marked inter-subject heterogeneity in brain atrophy patterns in DLB and PD.
• Total outlier count is associated with clinically relevant cognitive measures in DLB.
• Total outlier count indicates deviations in brain structure in DLB and PD.


Introduction
Cognitive impairment is a core diagnostic feature of Dementia with Lewy bodies (DLB) (McKeith et al., 2017) and is common in Parkinson's disease (PD), where almost half of patients develop dementia within ten years of diagnosis (Williams-Gray et al., 2013). Conventional group-level neuroimaging studies of brain structure in DLB and PD have yielded heterogeneous findings (Weil et al., 2019; Oppedal et al., 2019), with no consistent atrophy pattern predicting future cognitive decline (Weintraub et al., 2012; Lee et al., 2014) or correlating with symptom severity (Oppedal et al., 2019). This has limited the value of conventional neuroimaging measures as biomarkers.
A key issue in group-level analysis is between-subject heterogeneity, which arises from intrinsic biological differences as well as psychosocial and environmental factors independent of the disease (Cohen-Mansfield, 2000). This has implications for case-control studies that compare group means: such studies only allow inferences about the 'average subject' and treat between-subject variability as noise (Verdi et al., 2021).
To better understand the neural basis of neurodegenerative disorders such as PD and DLB, we need to understand between-patient heterogeneity. Neuroanatomical normative modelling is a recently established framework that maps each individual's deviation from the expected norm (given age and sex) for a given neuroimaging measure (Verdi et al., 2021; Marquand et al., 2016; Marquand et al., 2019). Exemplifying this approach, Rutherford and colleagues (Rutherford et al., 2022) modelled lifespan trajectories of cortical thickness and subcortical volumes using Bayesian Linear Regression in a reference cohort of 58,836 healthy participants. A new individual's cortical and subcortical measures can then be placed within each normative distribution to quantify their deviation from the expected pattern. Statistical thresholds can be used to binarise the resulting z-scores and so identify neuroanatomical outliers, and the number of outlier regions can be summed to give the total outlier count, an individualised measure of overall neurodegeneration.
Neuroanatomical normative modelling has recently been applied in Alzheimer's disease (AD) (Verdi et al., 2023; Verdi et al., 2023; Flavia et al., 2022), showing more outlier regions in AD than in mild cognitive impairment (MCI) or healthy controls (Verdi et al., 2023). Importantly, total outlier count correlated with poorer cognitive performance and with fluid biomarker measures of Alzheimer's pathology, and predicted future conversion from MCI to dementia (Verdi et al., 2023; Verdi et al., 2023). Given that atrophy patterns in Lewy body disorders may be more neuroanatomically heterogeneous than in AD (Scheltens and Korf, 2000; Mak et al., 2014), this approach may have even greater utility in Parkinson's and DLB.
Here, we employed neuroanatomical normative modelling to investigate heterogeneity in Lewy body diseases and to evaluate whether this technique can provide useful measures of disease severity. In PD, previous work has shown that visual performance predicts future cognitive decline, with poor visual performance associated with risk of future dementia (Zarkali et al., 2021; Hannaway et al., 2023). We a) investigated differences in total outlier count between high and low visual performers with PD, and between PD and DLB, and compared these with conventional cortical thickness analyses; b) compared patterns of dissimilarity between PD participants with high versus low visual performance, and between PD and DLB participants; and c) evaluated whether total outlier count correlated with cognitive severity in PD and DLB. We hypothesised a) significant differences in total outlier count between high and low visual performers with PD, and between PD and DLB; b) greater dissimilarity between individual patients among low than among high visual performers with PD, and in DLB compared with PD; and c) that greater total outlier count would be associated with poorer cognitive performance in PD and DLB.

Participants
Structural T1w-MRI data from two sites were used. The first site, University College London (UCL), contributed 108 participants with PD, 36 with DLB and 38 controls from the Vision in Parkinson's disease study (PI: Dr Weil, Queen Square Ethics Committee reference 15/LO/00476). The second site was the pseudonymised Alzheimer's Disease Research Center (ADRC) "8361", which contributes data to the National Alzheimer's Coordinating Center (NACC) database (Beekly et al., 2007); it contributed 25 participants with DLB and 127 controls. Participants from the UCL site were recruited from the National Hospital for Neurology and Neurosurgery outpatient clinics and affiliated hospitals, or from national patient support groups (Lewy Body Society and Rare Dementia Support). They were diagnosed with PD or probable DLB if they satisfied the Queen Square Brain Bank PD diagnostic criteria (Daniel and Lees, 1993) or the Dementia with Lewy Bodies Consortium criteria (McKeith et al., 2017), respectively. Exclusion criteria were a history of traumatic brain injury or major co-morbid psychiatric or confounding neurological disorders; for participants with PD, the presence of dementia, defined using Movement Disorder Society criteria (Emre et al., 2007), was an additional exclusion criterion. All UCL participants were assessed by a neurologist (RSW) to ascertain the diagnosis of PD or DLB. Controls were recruited from patients' spouses and UCL volunteer databases. For controls, the inclusion criterion was age 50-80 years, and exclusions were a past neurological or psychiatric history, or cognitive impairment on history or neuropsychological testing.
Participants from site "8361" were included if they had a structural MRI scan and met the following criteria based on descriptors available in the NACC data file (06/2022 data freeze): 1) dementia diagnosis; 2) primary or contributing cause of cognitive impairment: Lewy body disease; 3) not classed as MCI; and 4) absence of a diagnosis of PD.Controls in the NACC dataset had no evidence of cognitive impairment or history of neurological illness.

Clinical assessment
PD participants at the UCL site were divided into high (n = 64) and low (n = 32) visual performers based on performance on two computerised visual tasks: biological motion and the 'Cats-and-dogs' task (see Supplementary material). These tasks have been described previously (Zarkali et al., 2021; Weil et al., 2017; Weil et al., 2018; Leyland et al., 2020) and shown to predict dementia and poor outcomes in Parkinson's (Zarkali et al., 2021; Hannaway et al., 2023). We stratified the Parkinson's group on this basis rather than on mild cognitive impairment (MCI) status because the phenomenology and presentation of PD-MCI are often heterogeneous (Muslimovic et al., 2005; Litvan et al., 2012; Yarnall et al., 2014; Weil et al., 2018) and a significant proportion of patients revert to normal cognition within five years (Pedersen et al., 2017). In contrast, patients with PD and visual dysfunction are at heightened risk of developing dementia, as shown in several longitudinal cohorts and population studies (Williams-Gray et al., 2013; Zarkali et al., 2021; Hannaway et al., 2023; Hamedani et al., 2020; Han et al., 2020).
Disease-specific measures included the Movement Disorder Society Unified PD Rating Scale (MDS-UPDRS) that measures motor and nonmotor domains (Goetz et al., 2008), part III of the MDS-UPDRS (MDS-UPDRS-III) to assess motor function (Goetz et al., 2008), the University of Miami PD Hallucinations Questionnaire (UM-PDHQ) to evaluate hallucinations (Papapetropoulos et al., 2008) and the Hospital Anxiety and Depression Scale (HADS) (Zigmond and Snaith, 1983) to measure depression severity.

MRI acquisition and processing
Structural T1w-MRI scans at UCL were acquired on a 3 T Siemens Magnetom Prisma scanner with a 64-channel head coil. Magnetisation-prepared rapid acquisition gradient echo (MPRAGE) data were acquired with the following parameters: 1 × 1 × 1 mm voxels, TE = 3.34 ms, TR = 2530 ms, flip angle = 7°, acquisition time = 9 min. Structural T1w-MRI scans from NACC ADRC "8361" were acquired on 1.5 T GE scanners (further information on scanning parameters is available via the NACC database).
The "recon-all" function in FreeSurfer v6.0.0 (http://www.freesurfer.net) was used to process all UCL and NACC MRI data. Cortical thickness values (Destrieux parcellation; lh.aparc.a2009s.stats, rh.aparc.a2009s.stats) (Destrieux et al., 2010) and subcortical volumes (aseg.stats) were extracted. Processed images were quality controlled by a researcher blind to clinical status, who visually inspected the grey and white matter boundaries and the subcortical segmentation boundaries superimposed on the corresponding structural T1-weighted image. Particular attention was paid to atrophied scans, in which atrophy can sometimes compromise robust segmentation of brain structures.

Reference normative dataset
Rutherford and colleagues (Rutherford et al., 2022) modelled normative lifespan curves for cortical thickness in 148 regions (Destrieux parcellation) and for subcortical volumes derived from FreeSurfer, using warped Bayesian Linear Regression with age and sex as covariates and accounting for site differences (Bayer et al., 2022). Bayesian linear regression with likelihood warping allows accurate modelling of non-Gaussian effects and scaling of normative models to large cohorts (Fraza et al., 2021). Their reference cohort comprised 58,836 participants from 82 sites.
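To make the normative-model idea concrete, the sketch below fits a plain Bayesian linear regression of one regional measure on age and sex and converts new observations into deviation z-scores. It is a simplified stand-in under stated assumptions, not the PCNToolkit implementation: it omits likelihood warping, site effects and hyperparameter estimation, and the prior precision `alpha`, the plug-in noise precision, the quadratic age term and the synthetic data are all illustrative choices.

```python
import numpy as np


def design_matrix(age, sex):
    """Intercept, scaled age, scaled age squared and sex (coded 0/1).

    Centring age at 50 and scaling by 20 are arbitrary illustrative choices.
    """
    a = (np.asarray(age, dtype=float) - 50.0) / 20.0
    s = np.asarray(sex, dtype=float)
    return np.column_stack([np.ones_like(a), a, a ** 2, s])


class BayesianLinearRegression:
    """Conjugate Bayesian linear regression with a Gaussian prior on the weights (no warping)."""

    def __init__(self, alpha=1e-2):
        self.alpha = alpha  # prior precision on the weights (assumed, not optimised)

    def fit(self, X, y):
        # Plug-in noise precision from ordinary least-squares residuals
        # (a simplification; a full implementation would estimate hyperparameters)
        w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        self.beta = 1.0 / np.var(y - X @ w_ols, ddof=X.shape[1])
        # Posterior precision A = alpha*I + beta*X^T X; posterior mean m = beta * A^-1 X^T y
        A = self.alpha * np.eye(X.shape[1]) + self.beta * X.T @ X
        self.S = np.linalg.inv(A)
        self.m = self.beta * self.S @ X.T @ y
        return self

    def predict(self, X):
        mean = X @ self.m
        # Predictive variance = noise variance + uncertainty in the weights (x^T S x)
        var = 1.0 / self.beta + np.einsum("ij,jk,ik->i", X, self.S, X)
        return mean, var

    def z_scores(self, X, y):
        mean, var = self.predict(X)
        return (np.asarray(y, dtype=float) - mean) / np.sqrt(var)


# Synthetic reference cohort: "thickness" declines with age (illustrative values only)
rng = np.random.default_rng(0)
age = rng.uniform(40, 90, 2000)
sex = rng.integers(0, 2, 2000)
thickness = 2.8 - 0.008 * (age - 50) - 0.05 * sex + rng.normal(0, 0.1, 2000)

model = BayesianLinearRegression().fit(design_matrix(age, sex), thickness)
z_train = model.z_scores(design_matrix(age, sex), thickness)

# Two new 70-year-olds (sex coded 0): one with thin cortex, one at the expected value
z_new = model.z_scores(design_matrix([70, 70], [0, 0]), [2.3, 2.64])
```

In this toy setting the first new individual falls well below -1.96 (an atrophy outlier), while the second sits close to zero; reference participants have z-scores with roughly unit spread.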

Applying neuroimaging normative modelling to study data
The reference normative model was recalibrated to the study datasets with an adapted transfer learning approach (Kia et al., 2022). This involved inputting control data from our two study sites into the reference normative model to generate stable parameters for cortical thicknesses and subcortical volumes, accounting for residual differences in data distributions caused by factors such as scanner differences. Z-scores were then generated for each individual with DLB or PD, per region, relative to the recalibrated reference values. All modelling steps were performed using PCNToolkit (v0.20) (Rutherford et al., 2022).
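The adapted transfer learning of Kia et al. (2022) updates the model within a Bayesian framework and is not reproduced here. As a much simpler illustration of why same-site controls are needed, the sketch below applies a location-scale correction: deviation scores at a new site are re-centred and re-scaled using that site's own controls, so that controls average z = 0 with unit spread. The function `recalibrate_z` and this correction are hypothetical, swapped-in simplifications, and the simulated scanner effect is invented for illustration.

```python
import numpy as np


def recalibrate_z(z_controls, z_patients):
    """Hypothetical location-scale recalibration (not the Kia et al. method).

    z_controls, z_patients: arrays (n_subjects, n_regions) of raw deviation
    scores from a reference model that has not seen this site.
    """
    z_controls = np.asarray(z_controls, dtype=float)
    z_patients = np.asarray(z_patients, dtype=float)
    # Per-region offset and spread among this site's controls (e.g. a scanner effect)
    offset = z_controls.mean(axis=0)
    spread = z_controls.std(axis=0, ddof=1)
    return (z_patients - offset) / spread


# Toy example: a scanner shifts every region by +0.8 SD and inflates spread 1.5-fold
rng = np.random.default_rng(1)
controls = 0.8 + 1.5 * rng.standard_normal((200, 3))
patients = 0.8 + 1.5 * (rng.standard_normal((50, 3)) - 2.0)  # true atrophy: -2 SD
z_adj = recalibrate_z(controls, patients)
```

Without the correction, the simulated patients would appear much less atrophied than they are; after it, their average deviation is close to the true -2 SD.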

Statistical analysis
Total outlier count
From the z-scores generated for each cortical and subcortical region by the normative modelling pipeline described above, outliers were defined as z-scores < -1.96. This is a commonly used threshold (Fisher, 1925): ±1.96 is the critical value of the standard normal distribution corresponding to p = 0.05 (two-tailed) in frequentist statistics. Because we are interested in atrophy, we only consider the lower tail, i.e., the bottom 2.5 % of the population distribution for a given neuroimaging metric. To ensure our findings were not driven by this particular threshold, we repeated the analysis using a more liberal outlier threshold of z < -1.282 (the bottom 10 %) (see Supplementary Material).
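The correspondence between each threshold and the share of the reference population it flags can be checked with Python's standard library: a z-score below -1.96 lies in the lowest 2.5 % of a standard normal distribution, and one below -1.282 in the lowest 10 %. The binarisation step is then a single comparison per region; the helper `binarise` and the example z-scores are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Lower-tail probability for each outlier threshold
tail_strict = std_normal.cdf(-1.96)    # about 0.025 (bottom 2.5 %)
tail_liberal = std_normal.cdf(-1.282)  # about 0.10 (bottom 10 %)

# Inverse direction: threshold for a chosen lower-tail proportion
threshold_2_5 = std_normal.inv_cdf(0.025)  # about -1.96


def binarise(z_scores, threshold=-1.96):
    """Hypothetical helper: 1 for regions below the threshold (atrophy outlier), else 0."""
    return [1 if z < threshold else 0 for z in z_scores]


example_z = [-2.4, -1.5, 0.3, -1.97, 1.2]  # made-up regional z-scores
strict_flags = binarise(example_z)                     # [1, 0, 0, 1, 0]
liberal_flags = binarise(example_z, threshold=-1.282)  # [1, 1, 0, 1, 0]
```

The liberal threshold flags the -1.5 region as well, which is why it captures more subthreshold atrophy.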
The number of outliers across the 169 regions (148 cortical and 21 subcortical) was summed per participant to give the total outlier count. Linear regressions, adjusting for age and sex, were used to test for group differences in total outlier count between high and low visual performers with PD, and between DLB and PD. Subgroup analyses further compared DLB participants at the UCL and NACC sites, and PD and DLB participants at the UCL site only. Group comparisons of the proportion of outliers in each region were conducted using Mann-Whitney U tests and corrected for multiple comparisons using the False Discovery Rate (FDR).
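A minimal NumPy sketch of the total outlier count, together with an FDR adjustment across regional tests, assuming the Benjamini-Hochberg procedure. The function names, array shapes and toy numbers are illustrative; in practice the regional p-values would come from Mann-Whitney U tests (for example `scipy.stats.mannwhitneyu`), and `statsmodels.stats.multitest.multipletests(..., method='fdr_bh')` provides an equivalent adjustment.

```python
import numpy as np


def total_outlier_count(z, threshold=-1.96):
    """z: array (n_participants, n_regions), e.g. 169 regions. Returns outliers per participant."""
    return (np.asarray(z) < threshold).sum(axis=1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Toy data: 3 participants x 5 regions of z-scores (hypothetical)
z = np.array([
    [-2.1, -0.5, -3.0, 0.2, -1.0],
    [0.1, 0.4, -0.2, -1.2, 0.9],
    [-2.5, -2.2, -1.99, -0.1, -2.0],
])
counts = total_outlier_count(z)  # array([2, 0, 4])

q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Because the count ignores which regions are outliers, the first and third participants differ in severity even though their outlier regions only partly overlap.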

Measuring dissimilarity within and between groups
Hamming distance is widely used in information theory and measures the dissimilarity between two strings of equal length. At each position, a distance of 1 is assigned if the symbols differ and 0 if they are the same; these values are summed across the length of the strings to give the Hamming distance (Hamming, 2018). We calculated Hamming distances between participants' vectors of binarised outlier z-scores across brain regions. Participants were compared pairwise within their group, so each participant had n-1 Hamming distance scores, each ranging from 0 to 169, where n is the group size. The median Hamming distance was calculated for each participant (rather than the mean, as distributions were skewed), and these medians were compared between groups using Mann-Whitney U tests.
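The within-group Hamming computation can be sketched with NumPy: each participant's binarised outlier vector is compared with every other participant in the same group, and each participant is summarised by the median of their n-1 distances. The function name and toy vectors are hypothetical; the per-participant medians from two groups could then be compared with `scipy.stats.mannwhitneyu`.

```python
import numpy as np


def median_hamming(outliers):
    """outliers: binary array (n_participants, n_regions), 1 = outlier region.

    Returns each participant's median Hamming distance to the other n-1
    members of the same group, plus the full pairwise distance matrix.
    """
    b = np.asarray(outliers, dtype=int)
    # Pairwise Hamming distance: number of regions where two binary vectors differ
    dist = (b[:, None, :] != b[None, :, :]).sum(axis=2)
    n = b.shape[0]
    # Drop the zero self-distance on the diagonal, leaving n-1 scores per participant
    off_diag = dist[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return np.median(off_diag, axis=1), dist


group = np.array([
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
])
medians, dist = median_hamming(group)
# dist    -> [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
# medians -> [3.0, 2.5, 4.5]
```

The third participant, whose outliers share no regions with the others, has the highest median distance, which is the kind of individual dissimilarity the group mean would hide.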
To visualise spatial patterns of cortical thickness outliers, we calculated the proportion of participants within each group who were outliers in each region and mapped these proportions onto the Destrieux atlas cortical surface using ggseg in R (Mowinckel and Vidal-Piñeiro, 2020).
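The per-region proportions that are mapped onto the cortical surface are column means of the binarised outlier matrix. The sketch below also flags regions whose proportion exceeds the 2.5 % expected by chance at the z < -1.96 threshold; the surface plotting itself (ggseg in R) is not reproduced, and the function name and data are illustrative assumptions.

```python
import numpy as np


def regional_outlier_proportion(outliers, chance=0.025):
    """outliers: binary array (n_participants, n_regions).

    Returns the proportion of participants who are outliers in each region and a
    boolean mask of regions above the chance level implied by the threshold.
    """
    prop = np.asarray(outliers, dtype=float).mean(axis=0)
    return prop, prop > chance


# 40 hypothetical participants x 3 regions
outliers = np.zeros((40, 3), dtype=int)
outliers[:1, 0] = 1  # 1/40 = 2.5 %: not above chance
outliers[:6, 1] = 1  # 6/40 = 15 %
prop, above_chance = regional_outlier_proportion(outliers)
```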

Associations between total outlier count and clinical features
Linear regressions adjusting for age and sex were used to test associations between total outlier count and the composite cognitive score, MoCA and visuo-perception (measured using the Hooper Visual Organisation Test). In exploratory analyses, we tested associations with disease-specific measures including global disease severity (MDS-UPDRS), motor severity (MDS-UPDRS-III), hallucination severity (UM-PDHQ) and depression score (HADS). Associations were tested in the PD and DLB groups separately. For the DLB group, we only included data from UCL, where clinical severity data had been comprehensively collected.
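Each association test is an ordinary least-squares model of a clinical score on total outlier count with age and sex as covariates. A minimal NumPy version returning the coefficient for outlier count, its standard error and t statistic is sketched below; the function name and simulated data are assumptions, and a package such as statsmodels would normally also report the p-value.

```python
import numpy as np


def adjusted_slope(outcome, predictor, age, sex):
    """OLS of outcome ~ 1 + predictor + age + sex; returns (beta, SE, t) for the predictor."""
    X = np.column_stack([np.ones(len(outcome)), predictor, age, sex]).astype(float)
    y = np.asarray(outcome, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the coefficients
    beta, se = coef[1], np.sqrt(cov[1, 1])
    return beta, se, beta / se


# Simulated example: each extra outlier region lowers a test score by 0.5 points
rng = np.random.default_rng(2)
n = 100
age = rng.uniform(55, 85, n)
sex = rng.integers(0, 2, n)
outlier_count = rng.poisson(8, n)
score = 30 - 0.5 * outlier_count - 0.05 * age + rng.normal(0, 2, n)
beta, se, t = adjusted_slope(score, outlier_count, age, sex)
```

A negative β means more outlier regions go with lower scores after accounting for age and sex.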

Potential outliers in total outlier count measure
One PD participant and two DLB participants (one from UCL and one from NACC) had, on data visualisation, much higher total outlier counts (45, 50 and 53, respectively) than other participants (ranges excluding these participants: PD 0-24; DLB 0-38). Their brain imaging was carefully quality controlled by three authors (RB, RSW, JHC) and showed no significant structural abnormalities or acquisition or processing errors, and clinical assessment of the UCL participants was consistent with unambiguous diagnoses (see Supplementary Table 1). We present results with and without these participants below.

Conventional cortical thickness analysis
We used a conventional General Linear Model (GLM) (FreeSurfer v6.0) to test for regional group-level differences in cortical thickness between high and low visual performers with PD, and between PD and DLB. Age and sex were included as covariates, and Monte Carlo simulation was used to correct for multiple comparisons (threshold p < 0.05).

Participants
We included 108 participants with PD (all from the UCL site) and 61 with DLB (36 from UCL, 25 from the NACC site), plus 165 controls (38 from UCL, 127 from NACC), who were used to calibrate the reference dataset models to the study data (see Table 1 for demographic and clinical information; for further details of clinical measures in PD high and low visual performers, see Supplementary Table 2).
Mean cortical thickness z-scores derived from the normative modelling showed a similar pattern of group differences to the total outlier count, except for the comparison of DLB with PD at the UCL site, where the mean z-score was significantly lower in the DLB group, reflecting greater overall atrophy (Table 1).
The proportion of regional outliers was mapped by group (Fig. 2). For low compared with high visual performers with PD, and for DLB compared with PD, more regions had a higher proportion of outliers than would be expected by chance (i.e., >2.5 %), suggesting greater heterogeneity and more widespread atrophy.
In the PD group as a whole, 125 of the 169 regions had at least one patient with an outlier, compared with 147/169 regions in the DLB group. The region with the most PD outliers was the left paracentral lobule and sulcus (n = 15, 13.9 %). In the DLB group, the region with the most outliers was the right posterior-dorsal part of the cingulate gyrus (dPCC), yet this comprised only 15 people (24 % of the group). For further information on the proportion of outliers per region, regions with significant differences between groups, and a comparison between PD low and high visual performers, see Supplementary Table 3 and Supplementary Fig. 1.

Total outlier counts are associated with cognitive performance in DLB and with visuospatial processing in PD
There were significant differences in several clinical measures between PD and DLB groups, with the latter more severely affected (Supplementary Table 4).
In the PD group, total outlier score showed a significant association with the Hooper Visual Organisation Test (β = -0.67 (SE = 0.19); t = -3.59; p < 0.01) but not with global cognitive performance. As in DLB, no associations were found in PD between total outlier count and other disease measures (Table 2).
We repeated our analyses using a more liberal outlier threshold (z-score < -1.282) to ensure the results were not driven by a particular threshold. The key findings were qualitatively similar to those obtained with the original threshold of z-score < -1.96 (Supplementary Tables 5 and 6).

Group-level cortical thickness analysis is less sensitive to differences in cortical atrophy between groups
A conventional GLM approach did not find any significant clusters of differences in cortical thickness between high and low visual performers with PD. Comparing PD with DLB, there were two significant clusters in the left precentral region and, on the right, one significant cluster in both the superior frontal and precentral regions, signifying reduced cortical thickness in DLB compared with PD in these regions (Supplementary Table 7).

Discussion
We used neuroanatomical normative modelling to examine heterogeneity of brain atrophy in PD and DLB, overcoming the limitations of 'group-average' analyses. We found greater and more variable atrophy in DLB compared with PD, despite limited spatial overlap in the cortical regions affected. We showed a similar effect in people with PD at higher risk of developing dementia (low visual performers) compared with those at lower risk (high visual performers): low visual performers had a higher total outlier count and greater dissimilarity. Importantly, conventional GLM group-average analyses did not reveal atrophy differences between these groups.
Total outlier count is agnostic to the regional location of cortical atrophy, whereas conventional GLM approaches require atrophy to occur in the same locations across individuals. Strikingly, total outlier count was significantly associated with cognitive measures in both DLB and PD. Overall, this indicates that measures derived from neuroanatomical normative modelling may have utility in Parkinson's and DLB.
We observed differences in total outlier count in patients at different stages of progression to dementia: in a PD group at risk of dementia (who did not yet have dementia), as well as in DLB. This suggests that neuroanatomical normative modelling may have clinical utility as a prognostic neuroimaging measure of disease progression in Lewy body disorders, as previously shown in Alzheimer's disease (Verdi et al., 2023; Verdi et al., 2023). Importantly for clinical application, total outlier count can be calculated from the cortical thickness and subcortical volume outputs of freely available automated pipelines applied to commonly acquired T1w-MRI scans.
Higher total outlier count was significantly associated with poorer global cognition (lower composite cognitive and MoCA scores) in DLB, but not with a measure of visuo-perceptual processing (the Hooper test). In contrast, in PD we found no relationship between composite cognitive scores and total outlier count, but did find a relationship between total outlier count and visuo-perception. The lack of relationship between cognitive measures and total outlier count in PD may reflect ceiling effects in the MoCA and composite cognitive scores. The Hooper test, which measures visuo-perceptual processing, may be particularly sensitive to cognitive impairment in PD, and less prone to ceiling effects, because visuo-perceptual and visuospatial abilities are early and key cognitive domains affected in PD (Curtis et al., 2019).
The Hamming distance enabled quantification of dissimilarity within groups, and revealed greater inter-individual heterogeneity in low compared with high visual performers with PD, and in DLB compared with PD. In both comparisons, the group with poorer cognitive functioning showed greater dissimilarity. This is consistent with previous work showing increased dissimilarity in Alzheimer's disease compared with MCI and controls (Verdi et al., 2023). Greater dissimilarity in DLB than in PD may relate to greater cortical involvement in DLB (Tsuboi and Dickson, 2005). Our findings highlight the benefits of considering individual differences over group-level analyses in Lewy body disease.
Normative modelling has some key advantages over alternative approaches to quantifying atrophy relative to a reference group, such as W-scores, which have previously been applied to PD and DLB (Tremblay et al., 2021; Spotorno et al., 2020). The reference dataset of 58,836 participants used in normative modelling is around a thousand-fold larger than most W-score reference datasets, capturing much greater population variability and providing more robust estimates of deviation from control data. Furthermore, the neuroanatomical normative modelling pipeline enables inter-individual heterogeneity to be quantified, which is not usually examined with W-score approaches.
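For comparison, a W-score is conventionally computed from a linear regression fitted in a (typically small) control group. Written out for participant $i$ with age and sex as covariates, as a general illustration rather than the exact specification of the studies cited above, it has the same form as a normative z-score, but the expected value and its spread are estimated only from those controls:

```latex
W_i = \frac{y_i - \left(\hat{\beta}_0 + \hat{\beta}_{\text{age}}\,\text{age}_i + \hat{\beta}_{\text{sex}}\,\text{sex}_i\right)}{\hat{\sigma}_{\text{res}}}
```

Here $y_i$ is the regional measure, the $\hat{\beta}$ coefficients are estimated in controls, and $\hat{\sigma}_{\text{res}}$ is the standard deviation of the control residuals. Normative modelling follows the same logic, but draws the expected value and its spread from a model of the large reference cohort recalibrated to each site.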

Limitations
There are some limitations to consider. Outliers were defined as z-scores < -1.96, so total outlier count may fail to capture potentially relevant subthreshold levels of neurodegeneration. However, we found similar results when using an alternative threshold, and when using the mean cortical thickness z-score, which requires no threshold.
A further limitation is the difference between the sites from which DLB participant data were collected. DLB data from the UCL site were collected prospectively with our study aims in mind, whereas NACC is a large relational database with limited neuropsychological and clinical information. Although participants at the UCL and NACC sites did not differ in age or sex, NACC participants had longer disease duration, which may partly account for the higher total outlier count observed in that group. Alternative explanations are differences in study inclusion criteria and the testing demands placed on participants at the UCL site, which may have led to selection of less functionally impaired participants. MRI scans from the NACC site were acquired on 1.5 T scanners and those at UCL on a 3 T scanner. However, the normative modelling pipeline is designed to help account for such differences in input data (Bayer et al., 2022), and the adapted transfer learning approach (Kia et al., 2022) allowed us to recalibrate the reference normative model for site differences, including scanner parameters.
Finally, our DLB dataset is relatively small, although its size is consistent with other DLB imaging studies (Ye et al., 2020). This may have left the correlational analyses in the DLB group underpowered. DLB patients are generally frailer than those with PD and can be more challenging to assess. Recent data-sharing initiatives could enable normative modelling to be applied to larger DLB datasets from multi-site collaborations.

Conclusions
We showed that neuroanatomical normative modelling provides a new perspective on PD and DLB, which show more variable atrophy patterns between patients, and that total outlier count has potential as a clinically useful measure of disease severity. This methodology yields personalised measures rather than traditional case-control group averages.

Fig. 1. Outlier heterogeneity. Outlier Hamming distance matrices for PD low visual performers (A) and PD high visual performers (B). Kernel density estimates (Y-axis) for each Hamming distance score (X-axis) show that PD low visual performers had more dissimilarity, as evidenced by the flatter peak and longer tail compared with PD high visual performers (C). Outlier Hamming distance matrices for the DLB (D) and PD (E) groups. Kernel density estimates (Y-axis) for each Hamming distance score (X-axis) show that DLB participants had more dissimilarity, as evidenced by the flatter peak and longer tail compared with the overall PD group (F). In A, B, D and E, dark blue/indigo represents low Hamming distance scores, where two participants are relatively similar in their regional distribution of outliers, whereas yellow represents high Hamming distance scores, signifying greater dissimilarity. The more yellow in the plot, the greater the dissimilarity between individuals in the group. PD, Parkinson's disease; DLB, Dementia with Lewy bodies. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Regional maps of outliers. The proportion of participants who are outliers in each cortical region, mapped onto the cortical surface. A. Low visual performers with PD (at risk of Parkinson's dementia) compared with high visual performers with PD (at lower risk of Parkinson's dementia). Qualitatively, more regions have a higher proportion of participants with outliers in the low visual performer group. B. PD and DLB. Qualitatively, more regions have higher proportions of participants with outliers in DLB than in PD. Of note, no region has more than 25 % of participants as outliers, highlighting the heterogeneity of cortical atrophy in DLB and PD. Grey represents regions with 0-2.5 % outliers.

Fig. 3. Relationship between total outlier count and cognitive measures in DLB and PD. Regression plots for the association between total outlier count (independent variable) and the following dependent variables: composite cognitive score, MoCA and Hooper Visual Organisation Test, in DLB (A, B, C, respectively) and PD (D, E, F, respectively). In DLB, total outlier score correlated with cognition (but not with visuo-perception), whereas in PD, total outlier score did not correlate with cognition but did show a relationship with visuo-perception. β coefficients, adjusted for age and sex, are presented along with p values. * denotes a significant association. MoCA, Montreal Cognitive Assessment; HVOT, Hooper Visual Organisation Test; DLB, Dementia with Lewy bodies; PD, Parkinson's disease.

Table 1
Demographics, clinical characteristics and total outlier counts.
β = 0.48 (SE = 0.19); p < 0.01*
PD, Parkinson's disease; DLB, Dementia with Lewy bodies; UCL, University College London; NACC, National Alzheimer's Coordinating Center. All data are shown as mean (SD) apart from sex. *p values were obtained from linear regressions adjusting for age and sex. Bold signifies a statistically significant difference.
Weil has received speaker honoraria from GE Healthcare, consulting fees from Therakind, and honoraria from Britannia. J.H. Cole is a scientific consultant to and shareholder in BrainKey and Claritas HealthTech.
Funding
Rohan Bhome is supported by a Wolfson-Eisai Clinical Research Training Fellowship. Naomi Hannaway is supported by a grant from the Rosetrees and Stoneygate Trusts. Neil P Oxtoby and Gonzalo Castro Leal acknowledge support from a UKRI Future Leaders Fellowship (MR/S03546X/1) and the National Institute for Health Research University College London Hospitals Biomedical Research Centre. A.F. Marquand gratefully acknowledges funding from the Dutch Organization for Scientific Research via a VIDI fellowship (grant number 016.156.415). Rimona S Weil is supported by a Wellcome Clinical Research Career Development Fellowship (205167/Z/16/Z). The NACC database is funded by NIA/NIH Grant U24 AG072122. NACC data are contributed by the NIA-funded ADRCs: P30 AG062429

Table 2
Association of total outlier count with measures of cognitive performance and other disease-specific measures. PD, Parkinson's disease; DLB, Dementia with Lewy bodies; MoCA, Montreal Cognitive Assessment; HVOT, Hooper Visual Organisation Test; HADS, Hospital Anxiety and Depression Scale; MDS-UPDRS, Movement Disorder Society Unified Parkinson's Disease Rating Scale; UM-PDHQ, University of Miami Parkinson's Disease Hallucinations Questionnaire. ᵃ p values were obtained from linear regressions adjusting for age and sex.