Can detection and prediction models for Alzheimer’s Disease be applied to Prodromal Parkinson’s Disease using explainable artificial intelligence? A brief report on Digital Neuro Signatures.

Parkinson's disease (PD) is the fastest growing neurodegenerative disorder and has a prediagnostic phase in which identifying clinical and laboratory biomarkers for those in the earliest stages or those 'at risk' remains challenging. Despite the current research effort, further progress in this field hinges on the more effective application of digital biomarkers and artificial intelligence at the prediagnostic stages of PD. It is of the highest importance to stratify the prediagnostic subjects who are likely to gain the most neuroprotective benefit from drugs. However, current initiatives to identify individuals at risk or in the earliest stages who might be candidates for future clinical trials are still hampered by the limited accuracy and explainability of existing prediagnostic detection and progression prediction solutions. In this brief paper, we report on a novel digital neuro signature (DNS) for prodromal PD based on selected digital biomarkers previously discovered in preclinical Alzheimer's disease (AD). Our preliminary results demonstrated a common DNS signature for both preclinical AD and prodromal PD, containing a ranked selection of features. This novel DNS signature was rapidly repurposed from 793 digital biomarker features by selecting the top 20 digital biomarkers that are predictive and could detect both the biological signature of preclinical AD and the biological mechanism of α-synucleinopathy in prodromal PD. The resulting model can provide physicians with a pool of patients potentially eligible for therapy, together with information about the importance of the predictive digital biomarkers, based on SHapley Additive exPlanations (SHAP). Similar initiatives could clarify the stage before and around diagnosis, enabling the field to push into uncharted territory at the earliest stages of the disease.


Introduction
Parkinson's disease (PD) prevalence increases with age, rising from about 1% in individuals aged ≥60 years to 3.5% in older adults of 85-89 years [1][2][3]. The complexity of cross-sectional diagnosis is stereotypically exemplified in PD, which is the second-most common neurodegenerative disorder after Alzheimer's disease [4][5][6]. The pathological hallmark of PD is misfolded α-synuclein protein (aSyn) structures, and the gold standard for diagnosis is their identification in post mortem pathological examinations of the brain [7]. However, given that most idiopathic patients experience years or sometimes decades of unspecific symptomatology, the field of prediagnostic Parkinson's disease (prodromal-PD) is fast-moving, with multiple strategies seeking to discover a panel of clinical and laboratory biomarkers for those 'at risk' [8]. Prodromal-PD [9] is the stage in which individuals do not fulfill diagnostic criteria for PD (i.e., bradykinesia and at least one other motor sign) but exhibit signs and symptoms that indicate a higher-than-average risk of developing motor symptoms and a diagnosis of PD in the future. Presently, most imaging markers across a range of modalities and the emerging literature on fluid and peripheral tissue biomarkers are limited in predicting prodromal-PD, pointing to the need to identify robust predictors of change across the entire spectrum from ordinary to symptomatic PD for more realistic primary or secondary preventive trials for PD [10].
Consequently, longitudinal measures of pre-motor symptoms and behavioral/cognitive decline are essential for evaluating preclinical markers and monitoring prodromal-PD progression. Such longitudinal characterization of non-motor features has been identified by the Movement Disorder Society (MDS) as being valuable for early identification of PD, according to the research criteria for prodromal PD [11], which include two types of measurements: the delineation of the relative temporal trajectories of specific quiet motor and non-motor features that can be present before diagnosis, and the fluctuation of those features over time within and across neurocognitive domains [12]. The utility of such markers in evaluating prodromal-PD progression depends on early symptoms and signs before PD diagnosis is possible and may vary across different primary care settings [12]. However, intra-individual variability (IIV) across several measurements, called dispersion, is a sensitive marker for detecting change even at prodromal stages of a disease [13]. One digital biomarker tool that utilizes dispersion to provide such measurements is the Altoida Digital Neuro Signatures platform (DNS), a more efficient, accurate, and sensitive assessment of cognitive function than traditional neuropsychological tests, both in cross-sectional and longitudinal evaluations [14]. Previous studies have validated the machine learning model's performance in measuring dementia disease progression and detecting the biological signature of prodromal AD, which predicts conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) with 94% prognostic accuracy [15].
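To make the notion of dispersion concrete, the following is a minimal sketch and not the DNS platform's implementation. It assumes each subject has a score on the same set of tasks; scores are z-standardized per task across subjects, and a subject's dispersion is the standard deviation of their z-scores. The function name `dispersion` and the example scores are hypothetical, and the sketch assumes every task has non-zero variance across subjects.

```python
from statistics import mean, pstdev

def dispersion(scores):
    """Per-subject intra-individual variability (IIV) across tasks.

    scores: dict mapping subject id -> list of task scores (same task order).
    Each task is z-standardized across subjects; a subject's dispersion is
    the standard deviation of that subject's z-scores.
    """
    subjects = list(scores)
    n_tasks = len(scores[subjects[0]])
    # Mean and standard deviation of each task across all subjects.
    task_mean = [mean(scores[s][t] for s in subjects) for t in range(n_tasks)]
    task_sd = [pstdev([scores[s][t] for s in subjects]) for t in range(n_tasks)]
    result = {}
    for s in subjects:
        z = [(scores[s][t] - task_mean[t]) / task_sd[t] for t in range(n_tasks)]
        result[s] = pstdev(z)
    return result

# Hypothetical scores on three tasks for three subjects.
example = {
    "s1": [10.0, 20.0, 30.0],
    "s2": [12.0, 22.0, 32.0],
    "s3": [8.0, 26.0, 25.0],
}
iiv = dispersion(example)
```

In this toy example, subject `s3` performs inconsistently across tasks relative to the group and therefore receives the largest dispersion value.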
In this work, we will briefly report on DNS signature similarities from previous studies and the dataset collected in The ANANEOS Project, an ambitious longitudinal community-based study for healthy aging in Greece. The project is part of the GR2021 Priority project Healthy Brains for Life (age 20-99 years) and focuses on the decentralized and remote assessment of the symptoms of preclinical stages in Alzheimer's disease and movement disorders, e.g., Parkinson's, with a rationale and a methodology similar to other international initiatives. Relevant examples of similar large-scale national initiatives can be found in Japan with the IROOP registry system for identifying risk factors for dementia [16], the Sydney (Australia) memory and ageing study [17], the Framingham heart study in the USA [18], the UK Biobank study of lifestyle and genetic factors incidence in dementia [19], the European Prevention of Alzheimer's Dementia Longitudinal Cohort Study [20], the FINGER project in Finland [21], the INTERCEPTOR Project in Italy [22], or The Vallecas Project in Spain [23].
The emergence of large longitudinal primary care cohorts, alongside advances in digital biomarkers and artificial intelligence (AI), has allowed detailed exploration of the full range of early motor and non-motor symptoms that predate PD. In turn, advanced prodromal PD detection and prediction models could become a platform for medical practitioners who plan to diagnose or detect the disease earlier and more accurately. Objective motor dysfunction occurs prior to diagnosis in PD, and a variety of measuring devices have been developed, including software applications that harness passive and active digital biomarkers (e.g., activity and motion, and in some cases speech, captured by smartphones and tablet devices), custom-built sensors that measure gait, bradykinesia, and dyskinesia, and nocturnal movement detection devices. Despite this enthusiasm, there are currently very few examples of the application of wearable devices or AI models in prediagnostic PD.
Our goal here was to answer a single question: Can detection and prediction models for Alzheimer's disease be rapidly applied to prodromal Parkinson's disease using explainable artificial intelligence? A major foreseeable hurdle is ensuring that any detection and prediction model focuses on improving both system performance and AI interpretability, employing natural language explanations that could help physicians understand the predictions. To answer this question, we focused on DNS signature patterns between our existing databases and ANANEOS, using permutation-based techniques to help us understand the actual effect of the predictors (DNS signatures from the existing AD database) in the target database (preclinical markers that predict prodromal-PD progression).

Data collection
We used a combination of clinical and population data, collected and provided by Altoida, Inc. The clinical data (n=438) is described in previous studies [15] and consists of controlled tests of elderly (≥50 years) subjects with known biological and psychological biomarkers (e.g., MCI, amyloid-beta (Ab)+, Ab-, AD). We used the dataset described as "New validation study" (ClinicalTrials.gov Identifier: NCT02843529) for this work, the original purpose of which was to evaluate the performance of Altoida's application as an adjunctive tool for diagnosing AD. This data was collected in various major cities in Italy, Greece, Spain, USA, and Ireland.
The dataset above was enriched with two more databases: 1) A clinical dataset called RADAR-AD. RADAR-AD is a multicentre observational, cross-sectional, cohort study in subjects within the preclinical-to-moderate AD spectrum as well as healthy controls. The design entails three tiers: (1) main study, which includes smartphone applications and wearable devices only; (2) first sub-study, which in addition includes fixed sensors at the participant's home; and (3) second sub-study, which in addition includes fixed sensors in an existing smart home environment. Participating clinical sites were selected based on their geographic location, expertise in digital technologies and disease population of interest, and the availability of clinical cohorts with known AD biomarkers [24].
2) A population dataset collected by Altoida named "healthy basket." The healthy basket is a population sample consisting of middle-aged cognitively healthy Japanese subjects (n=130). The inclusion criteria for participation were age 20-50 years and self-assessed cognitively healthy (i.e., no known cognitive disorders). The subjects received no stipend for participation, and permission for scientific studies was provided by accepting the terms and conditions of Altoida, Inc. All subject information was anonymized and de-identified.
Beyond the digital biomarkers collected by the Altoida application, no further biomarkers were recorded for this population sample. For both datasets, the subject's sex was self-reported. All subjects (of both groups) performed multiple test sessions using Altoida's application.
Finally, the target database was part of the Digitally enhanced, Decentralized, Multi-omics Observational Cohort (ANANEOS) study. ANANEOS is an ongoing single-centre, observational, longitudinal cohort (n=500,000) for individuals (aged ≥50 years) with ClinicalTrials.gov Identifier: NCT04701177. To counter the imbalance from multiple data points per subject and from combining demographically different datasets, we stratified all analysis by dataset, sex, and number of data points. This ensures that we have exactly the same number of data points from each sex and from each study (clinical and population). The flowchart showing the overall dataset structure and the preliminary study purpose is shown in Figure 1.
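One way to obtain the same number of data points from each sex and each study is to downsample every stratum to the size of the smallest one. The sketch below illustrates this under stated assumptions: the record fields `dataset` and `sex`, the function name `balance_by_strata`, and the example records are all hypothetical, and the report does not specify which balancing procedure was actually used.

```python
import random
from collections import Counter, defaultdict

def balance_by_strata(records, keys=("dataset", "sex"), seed=0):
    """Downsample so that every stratum contributes the same number of points."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)
    # The smallest stratum determines how many points are kept per stratum.
    n_min = min(len(group) for group in strata.values())
    balanced = []
    for group in strata.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced

# Hypothetical data points from two studies with unequal sex distribution.
records = (
    [{"dataset": "clinical", "sex": "F", "id": i} for i in range(5)]
    + [{"dataset": "clinical", "sex": "M", "id": i} for i in range(3)]
    + [{"dataset": "population", "sex": "F", "id": i} for i in range(4)]
    + [{"dataset": "population", "sex": "M", "id": i} for i in range(6)]
)
balanced = balance_by_strata(records)
counts = Counter((r["dataset"], r["sex"]) for r in balanced)
```

After balancing, each of the four (dataset, sex) strata holds three data points, the size of the smallest original stratum.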

Digital neuro signatures (DNS)
For this work, we repurposed data from Altoida's application, which collects digital biomarkers for neurocognitive function measurement and predictive diagnosis of AD [15]. Altoida's application collects digital biomarker data for detecting early-onset AD. While holding a tablet or smartphone device, the subject is asked to perform a series of motor functioning tasks and two augmented reality (AR) tasks. In the motor functioning tasks, the subject is required to draw shapes and tap on the (touch)screen using the finger of their dominant hand (see Figure 2 for an illustration of all the motor functioning tasks).
In one of the AR tasks, the subject is asked to place three virtual objects in a small space (approximately 3×3 m or 2×4 m) and afterward find them again. The AR task is performed by navigating around the space with the tablet or smartphone in both hands (see Figure 3). During these tasks, the handheld device collects telemetry and touch data from the built-in sensors, enabling profiling of hand micro-movements, screen-touch pressures, walking speed, navigation trajectory, cognitive processing speed, and additional proprietary inputs.
A single test session using Altoida's application consists of two batches of motor tasks and two AR tasks. After a subject completes all tasks, the recorded digital biomarker data from the onboard electronic sensors is bundled and securely and anonymously uploaded to a server for further processing. Given the data of multiple subjects, machine learning can be used to detect patterns. In previous work, machine learning was used to classify subjects as either healthy or at risk of AD [15]. In this work, we examined DNS signatures from our development dataset for AD to see if they demonstrated preclinical markers that predict prodromal-PD progression, expressed by the capacity of results to inform a novel prodromal-PD DNS signature.

Machine learning
We extracted 793 digital biomarker features from the onboard electronic sensors describing various cognitive, functional, and physiological characteristics of each subject. These features include response times, eye-hand coordination precision, fluctuations in the telemetry (accelerometer and gyroscope) data, Fourier analysis of the telemetry data, step detection, and additional proprietary data. Based on the digital biomarker feature data from a selection of healthy subjects, we trained a DNSmatch classifier to distinguish prodromal-PD individuals from any other group. We used the XGBoost algorithm [25] with DNS preclinical markers that predict prodromal-PD as the target variable for the classification.

Performance evaluation
We applied stratified five-fold grouped cross-validation to estimate the generalization performance of the DNS prodromal-PD classifier. We grouped data points by subject to ensure that multiple data points of a single subject were all in the same fold (either training or testing), preventing learning bias. For our prodromal-PD classifier, we measured accuracy and precision averaged over the five cross-validation testing folds. To assess the classifier's performance on different age groups, we trained nine additional classifiers (10 in total), each using different random subsets of the data.
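The grouping and stratification constraints described above can be sketched in plain Python. This is an illustrative sketch under assumptions, not the authors' pipeline: it assumes one class label per subject, and the names `stratified_group_folds` and `split` as well as the toy cohort are hypothetical.

```python
import random
from collections import defaultdict

def stratified_group_folds(subject_labels, n_folds=5, seed=0):
    """Assign each subject to one fold, spreading each class evenly.

    subject_labels: dict subject id -> class label (one label per subject).
    Returns dict subject id -> fold index in range(n_folds).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for subject, label in subject_labels.items():
        by_class[label].append(subject)
    fold_of = {}
    for label in sorted(by_class):
        members = sorted(by_class[label])
        rng.shuffle(members)
        # Round-robin assignment keeps the class ratio similar in every fold.
        for i, subject in enumerate(members):
            fold_of[subject] = i % n_folds
    return fold_of

def split(data_points, fold_of, test_fold):
    """Split data points by the fold of the subject they belong to."""
    train = [p for p in data_points if fold_of[p["subject"]] != test_fold]
    test = [p for p in data_points if fold_of[p["subject"]] == test_fold]
    return train, test

# Hypothetical cohort: 20 subjects, 5 of them positive, 3 sessions each.
labels = {f"s{i:02d}": int(i < 5) for i in range(20)}
points = [{"subject": s, "session": k, "label": y}
          for s, y in labels.items() for k in range(3)]
folds = stratified_group_folds(labels)
train, test = split(points, folds, test_fold=0)
```

Because folds are assigned per subject rather than per data point, no subject's sessions can appear on both sides of a split. In practice a library routine such as scikit-learn's `StratifiedGroupKFold` could serve the same purpose.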

Model explainability
We used the SHapley Additive exPlanations (SHAP) [26] method to better understand the predictions made by the DNS prodromal-PD classifier. The SHAP method allocates to each feature of a classifier a game-theoretical value representing the contribution of that feature towards the classification targets. The sign of the SHAP value indicates the direction of the contribution, and the magnitude of the SHAP value indicates the importance. For our classifier, negative SHAP values contribute to classifying as non-prodromal-PD, positive values towards prodromal-PD. SHAP values have an additive property, meaning they can be summed together to provide the feature contribution of a group of features [27].
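The additive property can be illustrated with exact Shapley values on a toy model. The sketch below uses brute-force enumeration over feature orderings, which is only feasible for a handful of features and is not how SHAP is computed for a 793-feature tree ensemble. The scoring function `toy_model`, the input `x`, and the all-zero `baseline` are hypothetical.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x.

    Features not yet added to a coalition keep their baseline value.
    The marginal contribution of each feature is averaged over all orderings.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            value = model(current)
            phi[j] += value - prev
            prev = value
    return [p / len(orders) for p in phi]

def toy_model(f):
    # Hypothetical score: two linear terms plus one interaction term.
    return 2.0 * f[0] - 1.0 * f[1] + 0.5 * f[0] * f[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Additivity: a feature group's contribution is the sum of its members' values.
group_contribution = phi[0] + phi[2]
```

Here the values are 3, -2, and 1 for the three features: the sign gives the direction of each contribution, the values sum to the difference between the model output at `x` and at the baseline, and the interaction term is split equally between the two features involved.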

Common DNS signature for preclinical AD and prodromal PD and features contribution
We wanted to investigate whether detection and prediction models for Alzheimer's disease can be rapidly applied to prodromal PD using explainable artificial intelligence. The analysis of our prodromal PD classifier revealed at least 20 features that are common to both preclinical AD and prodromal PD. We arrived at this conclusion after computing a SHAP value for each DNS signature containing more than 793 features from our development dataset. We compared them with a novel prodromal-PD classifier, which detected a DNS signature in the ANANEOS validation dataset. We then investigated which of the 793 features contain the most relevant preclinical markers that predict prodromal-PD. Figure 4 shows a grouping of those features that were ranked as having the highest overall contribution in the classifier.
The primary contributing group of features comprises the "ignore high tone percentage" and the "AR object placement directness". This group consists of an interference index (non-motor feature) and a set of frequency magnitudes obtained while the participant moves around trying to find a virtual object in the AR test (motor feature). These features could therefore be interpreted as reflecting brain network function and navigation micro-errors. The second most important group of digital biomarker features is the AR global telemetry variance. The global telemetry variance is the variance in the accelerometer and gyroscope signal over the entire duration of the AR task. It could be interpreted as coarse-scale hand micro-movement (motor feature). The third and fourth most important features are frequency magnitudes during object placement, belonging to the top group for placing virtual objects in the AR test, obtained using a fast Fourier transform (FFT) on the accelerometer and gyroscope signal measured over the 1.28 seconds before placing (motor feature). The remaining elements of the novel DNS prodromal-PD signature take age into account and group together "Motor test drawing features", which capture the speed and accuracy of the subject while drawing various patterns with the index finger (motor feature). The "Circle drawing test" measures how long the user spent within the limits of the circle while performing the motor tests.
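The two telemetry feature types described above, frequency magnitudes over a 1.28-second window and variance over the whole task, can be sketched as follows. This is an illustration under assumptions: the 100 Hz sampling rate and the synthetic 6.25 Hz oscillation are hypothetical, and a direct discrete Fourier transform is used in place of an FFT so that the example needs only the standard library (both yield the same magnitudes).

```python
import cmath
import math
from statistics import pvariance

def dft_magnitudes(window):
    """Normalized DFT magnitudes for the non-negative frequency bins."""
    n = len(window)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(coeff) / n)
    return mags

SAMPLE_RATE_HZ = 100   # assumed sensor rate, not stated in the report
WINDOW_S = 1.28        # window length before object placement
n_samples = round(SAMPLE_RATE_HZ * WINDOW_S)   # 128 samples

# Hypothetical accelerometer trace: a 6.25 Hz oscillation lasting 10 s.
signal = [math.sin(2 * math.pi * 6.25 * t / SAMPLE_RATE_HZ)
          for t in range(1000)]

window = signal[-n_samples:]           # last 1.28 s before "placement"
magnitudes = dft_magnitudes(window)
bin_hz = SAMPLE_RATE_HZ / n_samples    # frequency resolution per bin
peak_hz = magnitudes.index(max(magnitudes)) * bin_hz

# Variance of the signal over the entire task duration.
global_variance = pvariance(signal)
```

With these assumptions the window holds 128 samples, a power of two that suits FFT implementations, and the spectrum peaks at the bin matching the synthetic oscillation frequency.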

Conclusion
Our work demonstrates that it is possible to detect a novel DNS signature from existing datasets using digital biomarker data collected from Altoida's application. The intrinsic similarities between preclinical AD markers and preclinical markers that predict prodromal-PD seem to capture quiet motor and non-motor features dependent on age. In the prediagnostic Parkinson's disease population, the primary differentiating features are micro-errors and micro-movements that are detectable by Fourier analysis of accelerometer data, although they are not visible to the naked eye. Such preliminary results can provide physicians with some insights into the driving factors of our prediction model from multiple points of view, including visualization and feature importance based on SHapley Additive exPlanations (SHAP). Further validation with a larger sample size and multiple additional biological markers and endpoints is pending.

Ioannis TARNANAS
Dear Dr. Tort, many thanks for your thorough review and excellent questions. We would like to address them here for your attention: Learning effects and the influence of tech literacy have indeed been measured in another study performed by an independent third party. Since this is a brief report on the specific finding about PD, we are reserving the right to publish those results in another publication.

Similar to the answer above, yes, further analysis of the psychometric properties of the test is also being carried out. The clinical protocol of that study will be published soon. It has been submitted to another journal.

The application of digital biomarkers in clinical practice and clinical trials is actually a very interesting question. Different barriers and opportunities exist in those two areas. For clinical trials, we aim to become a secondary/primary endpoint that could identify and report meaningful change in different groups of patients, e.g., preclinical PD, prodromal PD, and PD diagnosis, for therapeutic clinical trials. In the clinical domain, and once the results of the primary endpoint above are satisfactory, we would like to become a companion diagnostic for novel treatments. In general, however, the barrier to entering clinical practice in the primary care setting is multifaceted. It involves a dialogue with regulators, medical associations, healthcare economics (HTA), and other units, and extends long after the demonstration of the clinical benefits. Therefore, we do believe that the first to benefit are going to be the clinical trials and, after a long time, also the primary care setting.
Competing Interests: No competing interests were disclosed.

I have two questions for the authors:
1. How do they explain the fact that the same 20 features, out of 793 possibilities, are the best for both AD and Parkinson's disease (PD), when we know that cognitive impairments are more important in AD whereas motor impairments are the main problem in PD, mainly at the beginning of these diseases?
2. If each of the 20 features is compared head to head, are there differences between them and, if so, what are they?

As explained above, this is an excellent paper, well written, with a good methodology, and it deserves to be published. My opinion is that this paper is approved.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and does the work have academic merit? Yes

Figure 1. Flowchart showing the overall dataset structure and the preliminary study purpose.

Figure 2. The motor functioning tasks in the Altoida test. These are executed one after another. Using the index finger of their dominant hand, the subject performs the following tasks, shown from left to right: 1) draw a circle, 2) draw a square, 3) draw a rotated W shape within 7 seconds, 4) draw as many circles as possible within 7 seconds, 5) tap the highlighted buttons (left, right, left, right, etc.), 6) tap the highlighted button as fast as possible, with the buttons highlighting at random.

Figure 3. Illustration of the Augmented Reality (AR) task in the Altoida test. During the AR test, the subject is asked to place and find three virtual objects in the room. To do so, the subject is required to walk around the room holding a tablet or smartphone device in front of him/her. While doing so, the camera of the device records the environment and displays it back to the user on the screen, augmented with virtual objects (in this illustration, a teddy bear). The user needs to place the objects on flat surfaces and later recall their position by walking back to that location.

Figure 4. Feature importance of the Prodromal-PD classifier. A) The top twenty feature groups according to the SHAP method. Each bar represents the summed SHAP value of the features in that feature group. B) A feature value SHAP distribution plot for the top five contributing features. Subject-specific SHAP values were computed for each datapoint in the classifier training data. For each feature, we then plot for each datapoint a dot with the feature value of that datapoint, with the dot color coded by the relative feature value. The position of each dot on the SHAP value x-axis represents the magnitude and the direction of the contribution of that specific feature value of that specific datapoint towards classifying as female (-1) or male (+1). Acronyms in the plots are Augmented Reality (AR), Fast Fourier Transform (FFT), SHapley Additive exPlanations (SHAP), Accelerometer (ACC), variance (var), first part of a single test (1st) or second part of a single test (2nd).

Reviewer Report 13 December 2021
https://doi.org/10.21956/openreseurope.15331.r28143
© 2021 Anghinah R. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Renato Anghinah, Department of Neurology, University of São Paulo, Sao Paulo, Brazil

First of all, I would like to congratulate the authors for the high level of this paper. AI and machine learning have been used to help MDs make decisions about diagnosis and treatment in a wide range of diseases. A very important area that needs this kind of approach is cognitive disease. The computational solution applied to Alzheimer's Disease (AD) with the Altoida device has proved useful, and the Digital Neuro Signature (DNS) is a good candidate for a new biomarker. The use of the XGBoost algorithm is an excellent choice since, in my view, it is the best machine learning method for structured data such as the data in this study.

Table 1. Data characteristics. P-value is calculated using a two-sided t-test for age, chi2 for status, and the Mann-Whitney rank test for the number of data points per subject.

The ANANEOS participants, recruited since March 2021 in Athens, Greece, are home-dwelling volunteers with known biological and psychological biomarkers at the preclinical stages in Alzheimer's disease and movement disorders, e.g., Parkinson's, without relevant psychiatric, neurological, or systemic disorders. The initial cohort size was 2,180 subjects at baseline. At the time of this writing (10/13/2021), the project is in the first wave of the 24-week follow-up visits (n=133). Table 1 describes our data characteristics for the entire sample and stratified by sex, with univariate comparisons. Our data consists of 788 subjects combined from three datasets, two clinical datasets [15,24] and a healthy population dataset. Subjects were distributed over several stages of the AD clinical continuum, namely healthy, preclinical AD Ab+, MCI (amyloid-β negative) Ab-, MCI Ab+, dementia due to AD, and prodromal PD as reported by clinical assessment.

Alzheimer's Disease and Other Cognitive Disorders Unit, Neurology Service, Hospital Clínic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain

Tarnanas et al. present an interesting study on the application of Alzheimer's disease detection and prediction models for the identification and monitoring of prodromal stages of Parkinson's disease. The use of AI-driven assessments in these populations provides an alternative method for detecting subtle cognitive deficits preceding the symptomatic phases of Parkinson's disease in a more sensitive way than that provided by standard neuropsychological measures. The predictive value of the digital neuro signature (DNS) appears very promising. This is an interesting, well-justified, well-reported, and well-analyzed large-scale study and deserves indexing. I would like to ask the authors some questions:

Cognitive performance is strongly correlated with age. Also, the use of novel devices and techniques in older adults may be mediated by the participant's skills and familiarization. How do the authors think this could impact task sensitivity and specificity? How easy to run is Altoida's application, and how were learning effects controlled?

The present findings are in line with previous studies suggesting that individuals within the preclinical phase of Alzheimer's disease may exhibit subtle motor dysfunction (Albers et al., 2015; Buchman & Bennett, 2011) and that intrasubject variability (i.e., the inconsistency of performance) might be a more relevant marker of early AD than motor speed (Verghese et al., 2008; Mollica et al., 2019). Also, a less-studied variable such as multitasking performance appears to be a sensitive indicator of subtle dysfunction. Notwithstanding the conclusion that the DNS seems to be a valid probe of early cognitive changes in at-risk populations, do the authors think that further research is needed to isolate the critical elements of presymptomatic evaluation?

Is the study design appropriate and does the work have academic merit? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Neuropsychology, Alzheimer's disease, Dementia, Biomarkers, early detection
I confirm that I