Leveraging speech and artificial intelligence to screen for early Alzheimer’s disease and amyloid beta positivity

Abstract
Early detection of Alzheimer’s disease is required to identify patients suitable for disease-modifying medications and to improve access to non-pharmacological preventative interventions. Prior research shows detectable changes in speech in Alzheimer’s dementia and its clinical precursors. The current study assesses whether a fully automated speech-based artificial intelligence system can detect cognitive impairment and amyloid beta positivity, which characterize early stages of Alzheimer’s disease. Two hundred participants (age 54–85, mean 70.6; 114 female, 86 male) from sister studies in the UK (NCT04828122) and the USA (NCT04928976) completed the same assessments and were combined in the current analyses. Participants were recruited from prior clinical trials in which amyloid beta status (97 amyloid positive, 103 amyloid negative, as established via PET or CSF test) and clinical diagnostic status (94 cognitively unimpaired, 106 with mild cognitive impairment or mild Alzheimer’s disease) were known. The automatic story recall task was administered during supervised in-person or telemedicine assessments, where participants were asked to recall stories immediately and after a brief delay. An artificial intelligence text-pair evaluation model produced vector-based outputs from the original story text and the recorded and transcribed participant recalls, quantifying differences between them. Vector-based representations were fed into logistic regression models, trained with tournament leave-pair-out cross-validation analysis to predict amyloid beta status (primary endpoint), and mild cognitive impairment and amyloid beta status in diagnostic subgroups (secondary endpoints). Predictions were assessed by the area under the receiver operating characteristic curve for the test result in comparison with reference standards (diagnostic and amyloid status). Simulation analysis evaluated two potential benefits of speech-based screening: (i) mild cognitive impairment screening in primary care compared with the Mini-Mental State Exam, and (ii) pre-screening prior to PET scanning when identifying an amyloid positive sample. Speech-based screening predicted amyloid beta positivity (area under the curve = 0.77) and mild cognitive impairment or mild Alzheimer’s disease (area under the curve = 0.83) in the full sample, and predicted amyloid beta in subsamples (mild cognitive impairment or mild Alzheimer’s disease: area under the curve = 0.82; cognitively unimpaired: area under the curve = 0.71). Simulation analyses indicated that in primary care, speech-based screening could modestly improve detection of mild cognitive impairment (+8.5%), while reducing false positives (−59.1%). Furthermore, speech-based amyloid pre-screening was estimated to reduce the number of PET scans required by 35.3% and 35.5% in individuals with mild cognitive impairment and cognitively unimpaired individuals, respectively. Speech-based assessment offers accessible and scalable screening for mild cognitive impairment and amyloid beta positivity.
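The area under the receiver operating characteristic curve used as the outcome metric above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The Python sketch below illustrates that pairwise definition only. The function name `pairwise_auc` and the example scores are illustrative assumptions; this is not the study's code or data, and it does not reproduce the tournament leave-pair-out cross-validation procedure.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    scores: model outputs (higher = more likely positive).
    labels: 1 for the positive class, 0 for the negative class.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: two positives, two negatives.
    print(pairwise_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

Under this reading, an area under the curve of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.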

Thermal thresholds were determined based on the method of limits using a computer-controlled Peltier device (TSA-II-Neurosensory Analyzer; Medoc Ltd, Israel) with a 15 × 15 mm thermode secured to the fingertip as the participant's hand rested palm up on a supportive surface. The participant's other hand rested on a mouse, which they were instructed to click to indicate the moment they perceived the sensation being assessed, at which point the thermode temperature would return to baseline (32°C) in preparation for the next trial. There were three consecutive trials for each threshold. The method of limits was used in the following order: warm detection threshold (WDT), heat pain threshold (HPT), cool detection threshold (CDT) and cold pain threshold (CPT). Temperature was ramped from baseline at a rate of 0.5°C/s. Cutoff temperatures were set to 0°C for CDT and CPT, and 50°C for WDT and HPT. Thermal thresholds were calculated by averaging the last two trials.
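The threshold calculation and ramp timing described above reduce to simple arithmetic, sketched here in Python. The function and constant names are illustrative assumptions and the example trial values are hypothetical; only the baseline (32°C), ramp rate (0.5°C/s), cutoffs (0°C and 50°C) and the "average of the last two of three trials" rule come from the text.

```python
BASELINE_C = 32.0      # thermode baseline temperature from the protocol
RAMP_C_PER_S = 0.5     # ramp rate used for the threshold trials


def thermal_threshold(trial_temps_c):
    """Average the last two of three consecutive trials (degrees C)."""
    if len(trial_temps_c) != 3:
        raise ValueError("the protocol uses exactly three trials per threshold")
    return sum(trial_temps_c[-2:]) / 2


def seconds_to_reach(target_c, baseline_c=BASELINE_C, ramp=RAMP_C_PER_S):
    """Time for a linear ramp to travel from baseline to a target temperature."""
    return abs(target_c - baseline_c) / ramp


if __name__ == "__main__":
    # Hypothetical warm detection trials for one participant.
    print(thermal_threshold([35.0, 34.0, 36.0]))
    # Longest possible trials: ramping all the way to each cutoff.
    print(seconds_to_reach(50.0), seconds_to_reach(0.0))
```

At this ramp rate, a trial that runs to the 50°C cutoff lasts 36 s and one that runs to the 0°C cutoff lasts 64 s.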
We then determined the temperature that elicited a verbal pain rating of 50/100 (where 0 is no pain and 100 is the worst pain imaginable). To do this, participants underwent a familiarization paradigm in which a 30 × 30 mm thermode was strapped to the left volar forearm, 15 cm from the wrist. Different thermal intensities were delivered, and the participant was asked to verbally rate the pain of each stimulus (0-100). The thermal stimuli increased from baseline at a ramp rate of 2°C/s and were held at their target temperature for 6 s, with rest periods between stimuli. They were delivered in the following order: 44°C, 45°C, 43°C, 46°C, 42°C; if the participant did not rate any of these stimuli greater than 75/100, a final stimulus of 47°C was delivered. If one of the thermal stimuli from this familiarization protocol elicited a pain rating of 50-60/100, that temperature was taken as the participant's Pain50. If not, a stimulus 1°C above the participant's pain threshold was delivered at a ramp rate of 1°C/s and maintained for 7 s. The participant was asked to provide a verbal pain rating (0-100), and the temperature then returned to baseline (32°C). This process was repeated, adjusting the temperature up or down by 0.3°C each trial, until a temperature that elicited a pain rating of 50-60 was reached, referred to as the participant's "Pain50".
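The Pain50 search described above can be summarized as a two-stage procedure, sketched here in Python. Several details are assumptions made for illustration and are not stated in the text: the callback `rate_pain` stands in for the participant's verbal rating, the first familiarization temperature (in delivery order) rated 50-60 is taken when more than one qualifies, the starting point of the second stage uses the heat pain threshold, and `max_trials` is an arbitrary stopping guard.

```python
FAMILIARIZATION_ORDER_C = [44.0, 45.0, 43.0, 46.0, 42.0]  # delivery order
OPTIONAL_FINAL_C = 47.0   # delivered only if no earlier rating exceeded 75
STEP_C = 0.3              # adjustment per trial in the second stage
TARGET_LOW, TARGET_HIGH = 50, 60


def find_pain50(rate_pain, heat_pain_threshold_c, max_trials=50):
    """Return the temperature rated 50-60/100, or None if not found.

    rate_pain: hypothetical callable mapping a temperature (C) to a 0-100 rating.
    """
    # Stage 1: fixed familiarization stimuli.
    ratings = [(t, rate_pain(t)) for t in FAMILIARIZATION_ORDER_C]
    if all(r <= 75 for _, r in ratings):
        ratings.append((OPTIONAL_FINAL_C, rate_pain(OPTIONAL_FINAL_C)))
    for temp, rating in ratings:
        if TARGET_LOW <= rating <= TARGET_HIGH:
            return temp  # assumption: first qualifying stimulus wins

    # Stage 2: start 1 C above the pain threshold and step by 0.3 C.
    temp = heat_pain_threshold_c + 1.0
    for _ in range(max_trials):  # assumption: guard against endless search
        rating = rate_pain(temp)
        if TARGET_LOW <= rating <= TARGET_HIGH:
            return round(temp, 1)
        temp += STEP_C if rating < TARGET_LOW else -STEP_C
    return None


if __name__ == "__main__":
    # Hypothetical participant whose rating rises 10 points per degree above 40 C.
    print(find_pain50(lambda t: (t - 40.0) * 10.0, heat_pain_threshold_c=44.0))
```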
The Pain50 temperature was subsequently used in the temporal summation of pain (TSP) paradigm, a behavioural measure thought to reflect central sensitization, described in the Methods section of the main manuscript. The location of the familiarization protocol on the volar forearm was chosen for two reasons. The first was to keep this measurement consistent with our previously established healthy control database, as well as several other patient databases that use the same protocol to determine Pain50 and the same thermal TSP protocol. This enables us to compare results from these two quantitative sensory testing measures between our patients and controls, and allows future studies to investigate these measures across various chronic pain groups. Secondly, the location of the thermal familiarization and TSP protocols (15 cm from the wrist) was purposefully kept outside the site of primary injury (i.e., median nerve innervated territory) because the intention of the TSP paradigm was to assess central sensitization in this patient group. Behavioural signs of central sensitization in patients with carpal tunnel syndrome have previously been reported as secondary hyperalgesia at distal sites on the affected arm, unaffected arm, or distant body sites including the leg and foot, 7-9 as well as enhanced TSP. 10 We wanted to avoid any primary hyperalgesia at the "injury site" influencing the temperature used to determine Pain50, and subsequent performance in the TSP paradigm.

Results
Pre- and post-operative sensory threshold data collected from the index finger of each patient's most affected hand, along with HC data (n=10) and Pain50 data for all participants, are presented in the Results section of the main manuscript and in Figure 2. Here, pre- and post-operative sensory threshold data across all testing sites (index and pinky fingers of both hands) are presented in Supplemental Table 2.
Group data are displayed as mean ± standard deviation. * P < 0.05. One-tailed paired t-tests compared pre-operative thresholds on the most affected digit (D2, index finger) of the most affected hand with post-operative thresholds on that same digit (bolded data). Exact P values are reported in the Results section of the main manuscript.
a Two patients had unilateral CTS affecting the right hand only (therefore, their left hand would be considered "unaffected"), while the remaining patients had bilateral CTS.
b Seven of the patients who returned for their post-op study visit had carpal tunnel release surgery on both of their hands (surgeries were performed one at a time, and post-op visits took place at least three months after the second surgery), and nine patients had surgery on just one hand.
D2 = digit 2 (index finger), D5 = digit 5 (pinky finger), MDT = mechanical detection threshold, VDT = vibration detection threshold, WDT = warm detection threshold, HPT = heat pain threshold, CDT = cool detection threshold, CPT = cold pain threshold

Supplementary Material Part 2: Additional Neuroimaging Analyses
Dynamic FC between the thalamus and S1

Methods
We determined the dynamic FC between our thalamus and S1 BA3a seeds using the dynamic conditional correlation method used previously. 1-3 Each participant's time course was prewhitened using an autoregressive moving average, ARMA(1,1), model. A generalized autoregressive conditional heteroscedastic (GARCH) model was then applied to estimate the conditional standard deviation (SD) over time, which was used to standardize the residuals of the time series. We used the dynamic conditional correlation to calculate the time-varying correlation between the time series with an exponentially weighted moving average derived from the data using maximum likelihood. 4 We then calculated the SD of each dynamic conditional correlation across the time series and used this SD as the summary metric for dynamic FC. 5 Higher SD values indicate greater fluctuations in FC strength between two brain regions over time. We calculated the dynamic conditional correlation for each pre-op patient and HC between our left thalamus and left S1 BA3a seeds. This provided us with an SD value representing the dynamic FC between these two seeds, which we compared between pre-operative and post-operative patients with carpal tunnel syndrome and HCs in GraphPad Prism 7 using two independent-samples t-tests.
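The final two steps of this pipeline (an exponentially weighted moving average of the standardized residuals to obtain a time-varying correlation, then the SD of that correlation as the summary metric) can be illustrated with a heavily simplified Python sketch. It is not the dynamic conditional correlation implementation used in the study: the ARMA(1,1) prewhitening and GARCH steps are omitted and replaced by a plain z-score, and the smoothing weight `lam=0.94` is an arbitrary placeholder rather than a maximum-likelihood estimate. All function names are hypothetical.

```python
import math
import statistics


def zscore(x):
    """Stand-in for ARMA/GARCH standardization (assumption: plain z-score)."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    return [(v - mu) / sd for v in x]


def ewma_dynamic_correlation(x, y, lam=0.94):
    """Time-varying correlation from EWMA-updated variances and covariance."""
    ex, ey = zscore(x), zscore(y)
    # Initialize at the sample (unconditional) moments of the z-scored series.
    q11, q22 = 1.0, 1.0
    q12 = statistics.fmean([a * b for a, b in zip(ex, ey)])
    corr = []
    for a, b in zip(ex, ey):
        corr.append(q12 / math.sqrt(q11 * q22))
        # Update after recording, so each estimate uses past values only.
        q11 = lam * q11 + (1 - lam) * a * a
        q22 = lam * q22 + (1 - lam) * b * b
        q12 = lam * q12 + (1 - lam) * a * b
    return corr


def dynamic_fc_sd(x, y, lam=0.94):
    """Summary metric: SD of the time-varying correlation across the series."""
    return statistics.stdev(ewma_dynamic_correlation(x, y, lam))


if __name__ == "__main__":
    # Hypothetical seed time courses.
    thalamus = [0.1, 0.5, -0.3, 0.9, -1.2, 0.4, 0.0, 0.7]
    s1 = [0.3, -0.2, 0.8, 0.1, -0.5, 0.6, -0.9, 0.2]
    print(dynamic_fc_sd(thalamus, s1))
```

Two identical time courses give a correlation of 1 at every time point and therefore an SD of essentially zero, which matches the interpretation that higher SD values reflect greater fluctuation in FC strength.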

Results
There was no significant difference in dynamic FC between the thalamus and S1 BA3a seeds, either between pre-operative patients and HCs (two-tailed unpaired t-test, P=0.4519) or between post-operative patients and HCs (two-tailed unpaired t-test, P=0.8830). See Supplemental Figure 1.

Sex-disaggregated FC analyses in patients with carpal tunnel syndrome and HCs
Carpal tunnel syndrome is more prevalent in women, 6,7 with some evidence that women report worse sensory symptoms than men with the same level of nerve impairment. 8,9 While we previously demonstrated resting state FC abnormalities in the descending antinociceptive system that are influenced by sex, 10 we did not have specific hypotheses about whether sex would influence S1 and thalamic FC in these patients. The smaller number of men (n=7) compared to women (n=18) in the patient sample precluded a direct statistical comparison between the sexes in this study. Therefore, to investigate whether sex influences the abnormality in S1-somatosensory association cortex FC revealed in Analysis 1a, we performed sex-disaggregated static FC analyses of the S1 hand area to whole brain. Although Analysis 2a did not reveal any abnormalities in thalamocortical FC in pre-op carpal tunnel patients, we included a sex-disaggregated version of this analysis to investigate potential influences of sex on thalamic FC.

Methods
To perform the sex-disaggregated S1 hand area seed-to-whole brain FC analyses, we used the first-level GLM analyses created in Analysis 1a and entered them into two separate second-level FEAT analyses (two-group difference model with FLAME 1+2), one comparing women with carpal tunnel syndrome to healthy women (n=36), and the other comparing men with carpal tunnel syndrome to healthy men (n=14). We repeated this same procedure for the thalamus to somatosensory mask analyses, using the first-level GLMs created in Analysis 2a. A cluster-based statistical threshold of Z > 2.3 and P < 0.05 with family-wise error (FWE) correction was used for all FC analyses.

S1 to whole brain FC in pre-operative patients vs. HCs
We conducted sex-disaggregated analyses to investigate the potential influence of sex on our finding of abnormally low left S1 hand area to right somatosensory association cortex FC in carpal tunnel patients pre-operatively. The main finding of reduced S1-right supramarginal (BA 40) FC was identified in women with carpal tunnel syndrome compared to healthy women (n=36, P=0.001), but not in men with this condition compared to healthy men (n=14). Additionally, women with carpal tunnel syndrome had stronger S1 FC with a cluster including the right angular gyrus (BA 39), fusiform gyrus (BA 37) and lateral occipital cortex compared to healthy women (P=0.0289), while men with carpal tunnel syndrome had stronger FC with bilateral primary visual areas (BA 17) than healthy men (P=0.03). See Supplemental Table 3.

Thalamocortical FC in pre-operative patients vs. HCs
There were no differences in thalamic FC with the somatosensory system mask between women with carpal tunnel syndrome and healthy women, or between men with carpal tunnel syndrome and healthy men.
Peak MNI coordinates, Z, and P values are reported for significant clusters found in the S1 (hand area) seed-to-voxel whole brain functional connectivity analyses comparing women with carpal tunnel syndrome (CTS) before surgery to HC women (n=36), and men with CTS before surgery to HC men (n=14). Brain regions are provided using FSL's Harvard-Oxford Cortical Atlas & Talairach Daemon Label tools. Thresholded at P < 0.05 (FWE-corrected for multiple comparisons).

Supplemental
Supplemental Figure 1 Normal thalamus to S1 BA3a FC in carpal tunnel syndrome. Our seed-to-seed analyses of static FC between the left thalamus and left S1 BA3a area revealed no differences between pre-op patients with carpal tunnel syndrome (n=25) and HCs (

Supplemental Figure 2 No Correlations between FC and clinical measures.
In Analysis 3, pre-operative patients' first-level S1 and thalamus FC analyses were entered into second-level regression analyses in FEAT (single-group average with additional covariate), with the statistically significant clusters from the group difference used as a mask, and the demeaned pre-op clinical scores entered as a covariate. We did not find significant correlations between pre-operative patients' clinical scores and their FC. Because FEAT does not provide statistical values for results that do not meet the threshold of significance, we performed the correlations in GraphPad Prism 7.0 and graphed the results here for visualization purposes. Specifically, we ran correlations between pre-operative patients' (n=25) mean Z statistic from a 2 mm spherical seed centered on the peak coordinates of the significant cluster being investigated (y-axis) and their pre-op BCTQ, BPI or painDETECT scores (x-axis). (A) Regression analyses in FEAT (single-group average with additional covariate) masked with a significant cluster in the supramarginal gyrus (revealed to have abnormally low FC compared to HCs in Analysis 1a) did not find significant relationships between pre-operative S1 FC with this region and pre-op BCTQ symptom, BPI, and painDETECT scores. This was confirmed with subsequent correlation analyses performed in GraphPad between S1-supramarginal gyrus FC and BCTQ symptom scores (Pearson's r=0.1289