Technological Evolution in the Instrumentation of Ataxia Severity Measurement

Cerebellar ataxia is the poorly coordinated movement that results from injury or disease affecting the cerebellum. The diagnosis and assessment of ataxia are challenging because they depend on clinicians' experience and are therefore subjective. In recent years, neuroimaging and sensor-based approaches, supported by effective machine learning techniques, have made advances in addressing these clinical challenges. In this work, we present an outline of approaches to applying machine learning to this clinical challenge. We first provide a fundamental clinical overview with practical problems and then, from a machine learning perspective, outline possible approaches with which to address these clinical challenges. Also discussed are the limitations of existing methods, the provision of cross-disciplinary approaches and the current state-of-the-art as a potential basis for future research.


I. INTRODUCTION
The cerebellum is an important brain region that regulates almost all aspects of movement. Although less well understood, it also influences cognition through its effects on frontal lobe function [1]. Many medical conditions impair cerebellar function, which results in impaired co-ordination of movement known as cerebellar ataxia (CA) [2]. CA is common, yet it is still recognised and assessed by an expert clinician interpreting the characteristic uncoordinated movements, often by asking the subject to perform specific tasks that accentuate the incoordination. This is known as the ''cerebellar examination'' and was first described over 100 years ago [3]. There are several reasons why recognition and assessment of ataxia require objective measurements. First, detection of disease progression or the effect of treatment on that progression is difficult without accurate measurement [4], [5]. Second, the development of new therapies depends on measurements that show their efficacy or superiority over existing treatments (where available). Otherwise, there is a very real risk of overlooking genuinely effective treatments or having unnecessarily large, protracted and therefore costly clinical trials to overcome the noise of less sensitive or inconsistent measurements [4], [6]. Third, some ataxias (e.g., those associated with multiple sclerosis) already have treatments, but their efficacy in certain individuals is difficult to assess without accurate measurement [7]. In other ataxias there is active research with the prospect of new and emerging therapies whose success may depend critically on accurate measurement to identify any beneficial effect.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman.
This therefore presents an opportunity for an engineering approach to build a system that recognises the presence and severity of ataxia. In the interest of reducing the gap between clinical need and engineering endeavor, research groups have been using machine learning (ML) technology to advance the development of clinically useful objective measures of CA [8], [9], [10], [11]. To this end, the main aims of this paper are: 1) providing experience in approaching clinical problems in cerebellar medicine which may benefit from engineering solutions using ML techniques; 2) reviewing progress in applying ML in the identification and assessment of CA; 3) proposing a practical approach with which to treat CA-related time series data, in addition to introducing a novel methodology for the severity estimation problem. This paper is structured as follows: Section II outlines an overview of ataxia with clinical problems and related background. Section III discusses variants of ML targets with possible ML-based solutions. This section also reviews statistical tools and ML platforms that have been used in CA research. In Section IV, applications of ML in assessing CA from neuroimaging are described. Likewise, Section V discusses applications of ML in assessing CA via voluntary movements. This section categorises movement patterns across the sub-domains of speech, axial and appendicular movement. Section VI presents the challenges and future of ML approaches in CA, and finally concluding remarks are delivered in Section VII.

II. CLINICAL OVERVIEW
A. THE TERM ATAXIA
To someone first entering the neurological literature, the term ''ataxia'' is quite nuanced, as it has several different meanings depending on the context in which it is used. Literally translated, ''ataxia'' means uncoordinated movement, and the term was introduced by 19th century neurologists to describe cases where the patterns of disorganized movement appeared similar [12]. Initially these were thought to be specific disease processes and, as was the custom of the time, they were given eponymous names, some of which, such as Friedreich's ataxia (FRDA), survive to this time. An important contribution of the early neurologists was to formulate the proposition that specific neurological signs (such as ataxia) arose from dysfunction of particular brain regions. It thus became apparent that these different eponymously named diseases that featured ataxia also affected similar brain regions, even if through different pathological processes. The brain regions whose dysfunction resulted in ataxia were identified, and the ataxia was described by using the brain region as an adjective: cerebellar ataxia, vestibular (inner ear balance mechanism) ataxia, and proprioceptive (sensory) ataxia. Another important concept is that the presence of other signs (e.g., spasticity or hearing loss) points to involvement of other neural regions, which, in conjunction with the tempo of onset and progression, can help point to the cause of dysfunction (pathology).
The process of describing ataxia either by the location affected (e.g., CA) or by the disease process has persisted, but the understanding and classification of disease mechanisms is now more sophisticated. The common feature of all of the disorders listed in Fig. 1 is that their pathology either affects the cerebellum as well as many other parts of the nervous system (e.g., multiple sclerosis) or affects the cerebellum exclusively (e.g., post-infectious cerebellitis). Some of the many pathological processes can cause dysfunction of the cerebellum and/or its input pathways (afferents). Dysfunction may be temporary and without associated neuronal loss: for example, short term alcohol use or post-infectious cerebellitis. Broadly, CA is caused by many pathological processes including toxins (e.g., long term alcohol use), infections, autoimmune disorders (e.g., multiple sclerosis or post-infectious), trauma and stroke [13], [14]. These are known as acquired CA to separate them from neurodegenerative CA, which are considered to be linked to intrinsic processes, principally inherited factors. Much of the literature around measurement of CA is biased toward neurodegenerative disorders. As this study is directed at measuring CA, it will focus on CA as a neurodegenerative disorder. A more detailed description of neurodegenerative ataxia is provided in subsection II-D below.
The key point for this discussion is that CA is the sign that results from dysfunction of the cerebellum and/or its neural afferents, and it may be caused by many pathological processes and in conjunction with a range of other signs.

B. WHAT ARE THE SIGNS OF CA?
CA results in potential dysfunction of every voluntary movement and so affects eye movement, speech, axial movement (balance and gait), and appendicular movement (upper and lower limbs) [14]. Cognition, especially executive function, may also be affected. Although experienced clinicians have a rapid, almost subconscious pattern recognition of ataxia, the examination of some of the tasks used to accentuate ataxia has been codified and standardised in an effort to facilitate rating ataxia severity [2]. General rating scales include the Scale for the Assessment and Rating of Ataxia (SARA) [15], the International Cooperative Ataxia Rating Scale (ICARS) [16] and the Brief Ataxia Rating Scale (BARS) [17]. Some rating scales have been developed to service specific CA (e.g., the Neurological Examination Score for Spinocerebellar Ataxia [NESSCA] [18], the Friedreich Ataxia Rating Scale [FARS] [19] and the modified FARS [20]). These scales provide an ordinal score (usually 5 points with ''0'' rated as normal) for each domain. The scores for each domain are summed to provide a total score, a lower score indicating less clinical severity. By calculating discriminant validity and interobserver and intra-observer agreement (intraclass correlation coefficient [ICC]), works by Brandsma and Saute et al. have shown that the ICARS, SARA, and BARS are among the most reliable of the general clinical scales for CA [21].
These clinical rating scales are invaluable for ML approaches as they provide a quantified version of the ''ground truth'', indicating not only the presence of CA but also its severity. It is important to acknowledge that the clinical examination for ataxia is used in two quite different settings. One setting is the diagnosis of CA (Section III-B, ii). This is the initial recognition that cerebellar dysfunction is present and usually leads to further investigations that aid in establishing which particular pathological process is the cause of the CA. Usually (but not always), this occurs relatively early in the course of the disease, when signs of CA are relatively mild. There is a case to be made that objective measurement of CA using ML approaches would be helpful here, because the first contact of a person with CA is often with clinicians who are not experienced with CA and so may overlook these early signs; an objective measurement might aid early detection. However, the second setting, which is the measurement of the severity of established CA (Section III-B, iii) and its progression, has received the most interest because objective measures would be helpful in clinical trials. It is important for ML approaches to distinguish between these settings because the feature sets that establish the presence of CA may differ from those features that change with increasing disease severity [22].

C. THE CLINICAL EXAMINATION VERSUS PHENOTYPING
A key concept behind the clinical examination in neurology is that particular ''signs'' are sought because they are signs of dysfunction of specific brain regions, for example, CA: cerebellum, spasticity: the long tracts, monocular blindness: optic nerve or retina. Originally this was an empirical observation that specific brain regions subserved specific functions. With increasing knowledge that particular disease processes tend to affect particular brain regions, a knowledge of functional neuroanatomy allowed a clinician to deduce the site of dysfunction in the nervous system and the probable pathology. The advent of modern brain and spine imaging has helped in this localization and deduction of pathology.
However, genetics has provided an added perspective. The full set of genes in neurones is the same as in every other cell in the body. However, cerebellar neurones have a different appearance and connectivity from, for example, motor cortex neurones, and the reason for these differences is the subset of genes they express. Two cells that have the same set of genes (e.g., one individual's cardiac muscle cell and a cerebellar Purkinje cell) yet differing morphology, function and connections due to expression of different subsets of these genes are said to have a different phenotype. Thus, in effect, detecting CA caused by, for instance, alcohol, is detecting impairment of neurones with a specific phenotype. Nevertheless, the term phenotyping is not commonly used in this context but is more commonly reserved for the specific case where altered gene function (due to, for example, a mutation) in cerebellar neurones causes those neurones to have impaired function or even degenerate.
We have focused so far on phenotyping by clinical examination. However, the changes on a Magnetic Resonance Imaging (MRI) scan also reflect the effect of specific gene dysfunction on particular neuronal subtypes, as do the neuropathological changes on histological examination. Thus, phenotypical classification is based on all the genes that express function in specific neuronal types. The origins of phenotyping are explained here because they clarify why idiopathic non-hereditary CA (Fig. 1) and inherited ataxias are often discussed together (as will be the case in this review) as neurodegeneration.

D. CA RESULTING FROM NEURODEGENERATION
The term ''Neurodegenerative ataxias (NA)'' has its roots in the late 19th century, when features of ataxia were being codified and various syndromes were described [12]. They were progressively worsening ataxias, often with specific patterns of neuropathology and where known acquired causes of ataxia had been excluded. Many were recognised as being inherited, even before modern genetics. However, in the last forty years the genetics of many of these diseases has been revealed and many cases of new inherited ataxias have been described (hereditary ataxias, Fig. 1) [12], [23]. However, where the genetics of NA remains unknown, these are referred to as idiopathic (a term that dresses up ignorance of the cause), although it is expected that genetic factors drive many of these conditions, either as specific mutations or as risk factors [24].
Prior to the availability of molecular genetic testing, specific hereditary ataxias (e.g., FRDA) were diagnosed on the patient's clinical phenotype (the constellation of clinical features associated with a particular genetic inheritance, as described above). It is now clear that while the clinical phenotype is a reasonably accurate indicator of the underlying pathology, neither clinical nor pathological phenotype is a completely reliable witness to the underlying genetic mutation. For example, some conditions that have a genotype for hereditary motor and sensory polyneuropathy (respectively, impairment of the nerves carrying information to the muscles and information on sensation from the periphery) present with the phenotype of NA. It is also common to divide the inherited NA according to their pattern of inheritance: autosomal dominant and recessive, X-linked and mitochondrial. Dominance here refers to whether one of the copies of genetic material dominates the expression, with a special case for the sex-linked X chromosome. While this classification of heritability is useful, it does not give full recognition to the underlying complexity of the genetic basis for these conditions. Rather than simple single point mutations, repeating nucleotide expansions [e.g., FRDA, Cerebellar Ataxia with Neuronopathy and Vestibular Areflexia (CANVAS) and certain spinocerebellar ataxias (SCAs)] are important. Furthermore, polygenic expression may impart a risk for neurodegeneration, as it does in other forms of neurodegenerative disease [e.g., Parkinson's Disease (PD)].
This highlights an unresolved problem that the genetics revolution has presented to neurology: to classify according to the phenotype or the genotype? This question is not resolved and while genetics is allowing enormous strides in understanding underlying molecular pathology and potential therapeutic targets, ataxia (the phenotype) is the reason that people consult clinicians and is one of the main ways these diseases impact quality of life. Thus, for the clinician having a phenotypic classification is important and being able to accurately detect the presence and severity of ataxia, and the extent to which it changes with disease progression or the effect of therapy, will be a cornerstone in developing new therapies and then deploying them effectively.

1) PHENOTYPES
NA are classified into inherited and idiopathic ataxia:

Inherited Ataxias
Autosomal Dominant: There are now a very large number of dominantly inherited CAs. The SCAs are inherited in an autosomal dominant manner; SCAs 1, 2, 3 and 6 are the most prevalent [25]. Their occurrence is low and in part depends on geography, with some forms being especially common in specific localities. The episodic ataxias (EAs) are another group of autosomal dominantly inherited CAs, which are characterized by episodes of (rather than constant) ataxia.
Autosomal Recessive: FRDA has been the most prevalent autosomal recessive CA but CANVAS (Cerebellar ataxia, neuropathy, and vestibular areflexia syndrome), a recently described condition, led to the discovery of the RFC1 gene, which appears to be relatively common [26], [27]. FRDA typically has its onset in late childhood (although variants occur with an onset after 25 years or later) whereas CANVAS is mid to late adult onset. There are a very small number of X-linked and mitochondrial CAs.
Idiopathic Ataxias
Most people diagnosed with idiopathic ataxia fall into a large and somewhat heterogeneous diagnostic category, although there are a few, such as Multiple System Atrophy of the cerebellar type (MSAc), that are exceptions. MSA constitutes 11% of sporadic ataxia [28]. Historically the terms Idiopathic Late Onset Cerebellar Ataxia (ILOCA) [29] or Sporadic Adult Onset Cerebellar Ataxia (SAOA) [30], [31] have been used to describe idiopathic CAs where a more specific diagnosis could not be reached. The homogeneity of these diagnostic entities has been questioned, as some consider them to be more akin to diagnoses of exclusion than defined diseases [32]. Increasingly, idiopathic ataxias are recognised as being (1) a pure cerebellar syndrome (CA); (2) CA plus bilateral vestibular dysfunction (CABV); (3) CA plus/minus vestibulopathy or pathology of peripheral proprioceptive input. Fig. 1 summarizes an overall schema of CA phenotypes and Fig. 2 illustrates the relationship of the key pure and combination phenotypes discussed in this paper.

2) CLINICAL PROTOCOLS TO IDENTIFY PHENOTYPE
To date, clinicians use a variety of tests to identify phenotype based on the known characteristics of each phenotype. Depending on the individual's presentation, the identification of phenotype may be central to reaching a diagnosis. These tests are a mixture of strategies that comprise medical profile, family history, and a comprehensive neurological evaluation. Diagnosis of lesion-related ataxia caused by brain tumors, abscesses, strokes, or multiple sclerosis relies on brain MRI to aid in detecting a cause (i.e., the lesions) [14]. Acquired ataxias, for example infectious or gluten ataxia, require laboratory tests such as serum biomarkers or serum antibody levels. Gene tests are required to diagnose a genetic cause. Deficits in the perception of sensation may be apparent during the bedside clinical examination but are more robustly assessed by nerve conduction studies (NCS) [33].

III. ENGINEERING OVERVIEW
A. IMPLICATIONS FOR THE ML ENGINEER
Because CA affects movement, ML research in CA can extract features from the movements of people with CA and compare them with those of individuals without ataxia. However, as mentioned above (section II-D1), CA may have pathologies involving the cerebellum alone or be accompanied by pathology of the vestibular system and/or the proprioceptive system (the component of the somatosensory system which senses the position of a body part in space). While impairments of each of these neural systems can produce ataxia, clinically it is difficult to discern differences between the CA that results from involvement of the cerebellum alone and the CA that results when one or both of the other systems are also involved. However, an ML modeler should be alert to the possibility that subsets of these entities that constitute NA may actually provide slightly different feature sets. A further complexity is that some forms of CA are also accompanied by other disturbances of movement [e.g., spasticity in FRDA (prolonged contraction of muscle) or bradykinesia (slowness of movement) in MSA] that may be recognizable by distinct features, allowing both the ataxia and other characteristic movements to be recognised and quantified.

B. MACHINE LEARNING
1) ML TARGETS
The first aim of ML is to categorically separate ataxic from non-ataxic movements and then to grade the severity of the ataxia. Secondary aims might include examining whether people with CA can be sorted into those with and without afferent (vestibular and proprioceptive) involvement and separated from those with other movement disorders (e.g., spasticity or bradykinesia). It is important to realise that the process of diagnosis will remain clinically based on the manifestation of disease, the presence of relevant signs and symptoms, and relevant genetic testing. Four main questions, ranked according to their hierarchical level of difficulty for ML engineering, are: (i) Univariate analysis: What features differentiate individuals with ataxia from those without? The features may be physically expressible (e.g., step length in gait) or signal processing based (e.g., recurrence quantification analysis [34]); (ii) Differential diagnosis (footnote 3): how to utilize a combination of standalone features to classify people with ataxia from those without ataxia; (iii) Severity estimation: how to estimate the severity of ataxia. This function plays an integral role in assessing the rehabilitation process or response to drug treatment; (iv) Phenotype identification: is it possible to determine the phenotype of a person with ataxia using movement data? Movement-based identification, when fully developed, would help to optimize the effectiveness of treatment and reduce the cost of diagnostic procedures, which are currently based on expensive genetic testing.

2) ML DATA RESOURCES
As outlined above, CA results from dysfunction in cerebellar nerve cells (neurones) and those of its afferents. It can also be added that when there is an underlying genetic cause, the neuronal dysfunction results from altered gene expression (the process by which the instructions in our DNA are converted into a physical structure that may alter bodily functions) in these brain locations. Thus, the presence and severity of CA can be assessed by observing changes within cerebellar neurones (and the afferent pathways), either by looking at the microscopic changes to these structures (histopathological examination) or through its proxy, structural change, e.g., atrophy (footnote 4), as seen on brain imaging, in particular MRI [35], [36], [37]. Histopathological examination is superior because not only does it make an assessment of neuronal loss and disruption of normal neuronal processes possible, but it can also identify abnormal levels of molecules that are directly or indirectly the result of specific gene action, that is, enable measurement of disease-related substances. Histopathological examination whilst an individual is alive is generally not performed in the regions of the brain that we are concerned with, as the risk of significant brain 'damage' in harvesting the brain tissue is too high. Recent advances in MRI techniques enable the quantification of neuronal count (as a proxy of neuronal loss), identification of neuroinflammation and assessment of the integrity of afferent pathways. Thus, MRI is a potential means of assessing the progression (severity) of the process causing CA. The other and oldest means of assessing cerebellar dysfunction is direct assessment of movement.
Implicitly, by naming it ''ataxia'', clinicians have recognised a characteristic pattern of the movements associated with cerebellar dysfunction. Thus, strictly, ataxia can only be measured from the subject's voluntary movements.
3 In the clinical context, this is the process whereby a clinician, having gathered a patient's medical history (including symptoms), conducted a physical examination (to demonstrate the presence or absence of any relevant signs) and reviewed any medical test results (e.g., vestibular impairment), generates a list of potential diagnoses in an effort to reach a definitive diagnosis.
4 The wasting away of body tissue, in particular as a result of cell degeneration or loss.

3) ML APPROACHES
With data sourced from neuroimaging and movement-based logger devices, three fundamental ML approaches are predominantly applied to process the raw data, as illustrated in Fig. 3.
(i) Hand-crafted ML: Researchers can extract features manually from raw data using signal processing techniques and utilize them to train ML models. The classification or regression models then classify cohorts or produce a severity score. This traditional approach has been used widely in CA research. The advantages of hand-crafted ML are its low demand for data points (fewer participants) and the fact that the engineer can interpret the relationship between feature vectors and model outputs. However, feature extraction is a time-consuming process which requires statistical, signal processing, and domain knowledge to be able to deliver a superior model;
(ii) End-to-end ML: Researchers can utilise a neural network (NN) or deep learning (DL) approach, which has the advantage over hand-crafted ML of eliminating the feature extraction step. However, end-to-end ML requires a large dataset and is less interpretable (a black box) because feature vectors are automatically generated and embedded in the model itself. In this approach, transfer learning is usually employed to leverage a pre-trained NN, which helps to overcome the problem of a small dataset [38]. Time series data can also be converted into images in order to leverage state-of-the-art NNs.
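The hand-crafted route described in (i) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration only: the synthetic sine-plus-noise recordings stand in for sensor data, the four features (mean, standard deviation, root mean square, zero-crossing rate) are generic signal descriptors rather than features taken from any cited study, and the nearest-centroid classifier is chosen only because it needs no external library.

```python
import math
import random
import statistics


def extract_features(signal):
    """Return a small hand-crafted feature vector for one recording."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    rms = math.sqrt(statistics.fmean([x * x for x in signal]))
    # Sign changes of the mean-removed signal: a crude irregularity measure.
    centred = [x - mean for x in signal]
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    return [mean, std, rms, crossings / len(signal)]


def fit_centroids(features, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, feature_vector):
    """Assign the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], feature_vector))


def synthetic_recording(noise, rng, n=200):
    """Hypothetical stand-in for a movement signal: a sine wave plus noise."""
    return [math.sin(2 * math.pi * i / 50) + rng.gauss(0, noise) for i in range(n)]


rng = random.Random(0)
train_signals = ([synthetic_recording(0.05, rng) for _ in range(10)]
                 + [synthetic_recording(0.6, rng) for _ in range(10)])
train_labels = ["control"] * 10 + ["ataxic"] * 10
centroids = fit_centroids([extract_features(s) for s in train_signals], train_labels)

new_recording = synthetic_recording(0.6, rng)
print(predict(centroids, extract_features(new_recording)))
```

Because the features are explicit, each one can be inspected against the class centroids, which is the interpretability advantage noted above. In practice the features would normally be standardised before computing distances; that step is omitted here for brevity.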

4) ML PLATFORMS
Although many open-source tools are available with which to work with ML algorithms, CA researchers have mainly utilised TensorFlow, Keras, R, PyTorch, MATLAB, and scikit-learn. Another less popular ML platform is the Waikato Environment for Knowledge Analysis (WEKA) [44], which has been used to segregate people with FRDA from those without ataxia [45]. Tsfresh [46] is a supporting library for feature extraction from time series. This Python package automatically employs 63 time-series signal processing techniques to calculate a total of 794 features. In contrast to other neurological diseases such as Parkinson's disease (PD) or Alzheimer's Disease (AD), no dataset for CA is publicly available. Across all domains, researchers have engaged conventional ML algorithms such as Random Forest, Linear discriminant and regression analysis (LDRA) [47], [48], Support Vector Machine (SVM), LASSO, Hidden Markov Model [49], Decision Trees, K-Nearest Neighbor [50], Least-squares support vector machine (LSSVM) [51], 3-nearest neighbour (3-NN), and neural network (NN) [52]. DL has been limited to the use of Deep Convolutional Neural Networks, Recurrent Neural Networks (RNNs), and Transfer Learning [53]. Model selection usually employs generic metrics including accuracy, f-beta score, sensitivity, specificity, precision or Akaike's Information Criterion corrected for small sample sizes (AICC) [54], [55].
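As a reminder of how the generic model selection metrics listed above are defined, the following sketch computes them from the cells of a binary confusion matrix, together with the small-sample corrected AIC. The function names and the example counts are hypothetical, and the code assumes non-zero denominators (at least one positive, one negative and one predicted positive case, and n > k + 1).

```python
def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Standard metrics from a binary confusion matrix (ataxia = positive)."""
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    b2 = beta * beta
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_beta": f_beta,
    }


def aicc(log_likelihood, k, n):
    """Akaike's Information Criterion with the small-sample correction.

    k is the number of fitted parameters and n the number of observations.
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical counts: 45 people with ataxia, 55 controls.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"AICc: {aicc(log_likelihood=-50.0, k=3, n=30):.3f}")
```

Setting `beta` above 1 weights sensitivity more heavily than precision, which may suit screening settings where missed cases are costlier than false alarms.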

C. STATISTICAL TOOLS IN CA STUDIES
Statistical tests used in CA research have principally been for univariate analysis, in order to evaluate the difference between control subjects and individuals with ataxia. Data normality is usually verified using the Shapiro-Wilk, Kolmogorov-Smirnov or Jarque-Bera tests [56]. If the data are normally distributed, a one-way analysis of variance (one-way ANOVA) or Student's t-test can be employed to evaluate the differences between the means of each cohort. The Kruskal-Wallis or Mann-Whitney U test is conducted if normality cannot be ensured. The Wilcoxon signed rank test, the Fisher discriminant ratio, and effect sizes (Cohen's d or Hedges' g) are utilised to assess the significance or magnitude of differences in the analyses [56]. Hedges' g is closely related to Cohen's d, but uses a pooled standard deviation that accounts for unbalanced sample sizes and corrects for small-sample bias. Hedges' g is read in the same way as Cohen's d, where a value of 0.2 is typically interpreted as small, 0.5 as medium, and 0.8 as large. Correlation tests available include the Pearson Correlation Coefficient (PCC) and the Spearman rank order Correlation Coefficient (SCC), depending on the normality of the data set. By convention, statistical results are considered significant if p values are less than 0.05. Intraclass correlation coefficients (ICCs) may be computed to ascertain repeatability of measures. In a longitudinal study, such as one which examines the effect of rehabilitation, the chosen metric must measure changes over time or in response to a therapy [57]. Power analyses are statistical calculations employed before a trial (a priori analysis) to determine the smallest number of samples required to detect a difference, or after a trial (post hoc analysis) to justify the significance of the research outcome. Available statistical programs to calculate the power and other multivariate analyses include G*Power3 [58], PASS 2020, 5 SPSS SamplePower, and SAS (SAS Institute, Cary, NC). As a general rule of thumb, 30 samples per group are considered necessary to reach a significant level [59].
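The two effect sizes mentioned above can be written out explicitly. The sketch below uses the common textbook definitions (pooled standard deviation for Cohen's d, and the usual approximate small-sample correction factor for Hedges' g); the step-length values are made-up numbers for illustration, and the labelling function simply encodes the 0.2 / 0.5 / 0.8 thresholds quoted in the text.

```python
import math
import statistics


def cohens_d(a, b):
    """Standardised mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d multiplied by the approximate small-sample bias correction."""
    df = len(a) + len(b) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(a, b) * correction


def effect_label(value):
    """Conventional reading of |d| or |g|: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(value)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical step lengths (metres) for a control and an ataxic cohort.
controls = [0.62, 0.66, 0.70, 0.64, 0.68, 0.71]
ataxic = [0.48, 0.55, 0.60, 0.51, 0.58]

d = cohens_d(controls, ataxic)
g = hedges_g(controls, ataxic)
print(f"Cohen's d = {d:.2f}, Hedges' g = {g:.2f} ({effect_label(g)})")
```

With small cohorts the correction factor is noticeably below 1, so g is smaller in magnitude than d; as the sample size grows the two converge.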

IV. ML IN ASSESSING ATAXIA VIA NEUROIMAGING
A range of quantitative measures relevant to CA can be derived from brain and spinal MRI, indirectly measuring pathological changes resulting from neuronal atrophy, axonal damage, metabolic dysfunction, vascular abnormalities, or glial changes. ''Macrostructural'' changes in regional brain and spinal volume can be measured using standard T1- or T2-weighted MRI, reflecting neuronal atrophy, loss of neuropil, or developmental hypoplasia. Modern MRI acquisitions at submillimeter resolutions using 3-Tesla (or greater) field strengths allow for detection of increasingly subtle and more localised anatomical changes in the cerebellum and brainstem in people with CA, particularly alongside the rapid development of automated image processing approaches, including DL algorithms (e.g., ACAPULCO, FastSurfer) [37].
''Microstructural'' measures reflecting changes in tissue organisation or cellular environment can also be acquired, typically using diffusion imaging approaches. These are most commonly used to quantify white matter integrity (i.e., ''fractional anisotropy'') or fluid accumulation (''mean diffusivity''). More complex multicompartment biophysical modelling approaches can also be applied to diffusion-weighted imaging to delineate more specific biological features; e.g., the ''NODDI'' model provides regional quantification of neurite density, axonal dispersion, and extracellular fluid [35], [60]. Tractography approaches can also be applied to isolate particular white matter tracts of interest. In people with CA, these approaches are most useful for interrogating the integrity of the cerebellar peduncles (which carry afferent signals into the cerebellum and efferent signals from the cerebellum to other brain regions), and the ascending and descending spinal tracts (carrying sensory and motor signals from/to the rest of the body) [61], [62], [63].
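For readers less familiar with diffusion imaging, the two scalar measures named above are computed per voxel from the three eigenvalues of the fitted diffusion tensor. The sketch below uses the standard definitions of mean diffusivity and fractional anisotropy; the eigenvalues in the example are hypothetical values of a plausible order of magnitude, not measurements from any cited study.

```python
import math


def mean_diffusivity(l1, l2, l3):
    """Mean of the three diffusion tensor eigenvalues."""
    return (l1 + l2 + l3) / 3


def fractional_anisotropy(l1, l2, l3):
    """Normalised spread of the eigenvalues: 0 = isotropic, 1 = fully directional."""
    md = mean_diffusivity(l1, l2, l3)
    numerator = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        return 0.0  # no diffusion signal in this voxel
    return math.sqrt(1.5 * numerator / denominator)


# Hypothetical eigenvalues in mm^2/s for a coherent white matter voxel.
eigenvalues = (1.7e-3, 0.3e-3, 0.3e-3)
print(f"MD = {mean_diffusivity(*eigenvalues):.2e} mm^2/s")
print(f"FA = {fractional_anisotropy(*eigenvalues):.2f}")
```

Maps of these per-voxel values, averaged over a region or tract of interest, are the kind of hand-crafted imaging feature that could be passed to an ML model.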
Quantitative susceptibility mapping is another microstructural MRI technique which is sensitive to iron in grey matter structures and has become increasingly influential in assessing volumetric changes and iron-related pathology (likely reflecting mitochondrial dysfunction) in the deep cerebellar nuclei, such as the dentate nuclei, in CA [36]. Magnetic resonance spectroscopy offers opportunities to dive even deeper to quantify the molecular concentrations of particular metabolites or proteins that are reflective of neuronal or glial health and activity [64], [65]. Notably, although functional imaging (e.g., fMRI) provides a more 'living' snapshot of neuronal function, it is not generally considered to have adequate reliability to meaningfully contribute to classification or prediction [66].
Given the wealth of quantitative data that can be obtained using MRI, the task for the ML engineer is to identify data features that can categorise or sort NA from non-ataxic subjects, assess the severity of NA and distinguish between CA arising from involvement of the cerebellum only, from CA due to involvement of the cerebellum plus its afferents. The challenges here are the ''gold standards'' for severity, which inevitably fall to clinical assessment, with the previously described attendant limitations, such as subjectivity. A second factor is that clinical features (i.e., ataxia) will not be linearly related to neuronal loss. Neuropathology generally precedes clinical expression in CA. For example, significant thinning of the spinal cord is already evident at the time of first symptom expression in FRDA, and atrophy of the cerebellum and brainstem can be observed in people with SCAs many years prior to symptom onset [67]. It is therefore likely that neuronal loss must reach a threshold, exhausting 'neural reserve' and/or overwhelming compensatory mechanisms within the system before function is impacted. Additionally, neurodegeneration with disease progression in CA is likely nonlinear in both time and space. For example, in FRDA, spinal cord shrinkage largely occurs due to developmental hypoplasia, with only limited ongoing degeneration, dentate nucleus atrophy is maximal early in the disease course before approaching an asymptote, and cerebellar cortex is relatively normal until late in the disease [68], [69], [70], [71]. As such, feature selection likely needs to be closely tailored to each use case. It is likely that accurate and robust ML will be best served by considering multiple imaging modalities and features, and even 4D approaches where feasible [64].
Importantly, MRI provides a static snapshot of the brain at a point in time, providing information about the structure, connectivity and metabolism, but not about the behavioural/clinical implications of those biological changes. MRI quantifies pathology but not the effects of pathology. Indeed, the need exists to more clearly link the effects of pathology to function by means of identifying the presence and severity of ataxia.

A. ADVANCES IN ML TO IDENTIFY AN NA USING NEUROIMAGING
DL frameworks with an end-to-end approach have been used with MRI (T1-weighted Magnetization Prepared-RApid Gradient Echo images) on a data set of 61 controls and 107 individuals with ataxia to discriminate three CA phenotypes (SCA2, SCA6, and AT) with an error rate of 13.75% [72]. Here, disease functional score estimation with a stacked autoencoder (SAE) reached a Pearson correlation of up to 0.69. The proposed technique trains weak classifiers first and then combines them to overcome the limited number of training samples, which is one of the most common limitations impeding the use of DL approaches in health-care research.
Applications of DL to diagnose neurodegenerative diseases (AD and PD) used Deep Convolutional Neural Networks, Recurrent Neural Networks (RNNs), and Transfer Learning [53]. MRI provides hand-crafted measures of cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM) with a data set of mild cognitive impairment, PD, and scans without evidence of dopaminergic deficit [73], [74]. Other neuroimaging techniques that may provide utility include Computed Tomography (CT), Single-Photon Emission Computed Tomography (SPECT), and Positron Emission Tomography (PET) [53], [75]. Conventional ML and DL can further aid early diagnosis through interpreting clinical scanning images as well as discovering new treatment therapies [76]. Some supporting libraries for feature extraction include the Voxel-Based Morphometry toolbox [77] with Statistical Parametric Mapping [78], which helps to produce WM, GM, and CSF brain images. Preprocessing steps may include the use of software tools such as Freesurfer 6 to generate a masked and intensity-normalized image which contains labels for the cerebellum [72] and brainstem, and the ACAPULCO CNN for segmentation of the cerebellar lobules [79].

V. ML IN ASSESSING ATAXIC MOVEMENT
As previously stated, the uncoordinated movement resulting from CA affects eye movements, speech, axial function (balance and gait), and appendicular function (upper and lower limbs), which are here referred to as ''domains''. Although the incoordination intrudes into everyday activities, there is a specific repertoire of movements used by clinicians during the bedside examination to accentuate ataxia. The common theme of these movements is a focus on timing and accuracy, using repetitive movements or tasks that require accuracy. The ability to maintain a stable posture is also examined directly (in the case of axial function) or indirectly (where movements at more distal joints require more proximal joints to provide a stable platform; e.g., accurate pointing of the finger at a target requires coordinated contraction of the muscles that stabilize the shoulder joint). The nervous system is constantly correcting and adjusting posture, and in CA errors in the timing and accuracy of these corrections result in increased sway and irregular gait and balance.
Advances in ML to identify a person with NA by employing data gathered on ataxic movements can be categorized based on research approaches of assessing ataxia through routine movements (subsection V-A), beyond task modeling (subsection V-A), or tasks emulating standard clinical tests (subsection V-B).

A. ASSESSMENT VIA ROUTINE MOVEMENT AND BEYOND TASK MODELING
In addition to body worn sensors [34], [47], we have developed hardware in the forms of a cup [22], [83] and a spoon [80], [81], [82] to examine ataxia through routine daily movements (as opposed to the specific movements which are based on the cerebellar bedside examination). Inspired by devices of daily use, the primary focus is the objective assessment of CA while completing usual activities of daily living. These devices have the potential for being used in the home, an option that a clinic setup is unable to deliver. Ecologically relevant activities such as using a spoon and cup are functionally relevant and demonstrate high performance; 88% segregation rate between individuals with and without ataxia, and 0.72 correlation with clinical scales [22], [80]. Nevertheless, this research may be viewed as limited, in that it is still adhering to specific, pre-defined tasks such as feeding and drinking.
In the context of assessment via task-free activities, Khan et al. investigated the possibility of using a continuous logger (a single wrist-worn sensor) to estimate the severity of ataxia in children with ataxia-telangiectasia, recorded in the home environment [10]. Table 1 lists research in assessing CA through routine movement and task-independent modeling.

B. ASSESSMENT VIA EMULATING STANDARD CLINICAL TESTS
1) EYE MOVEMENTS
There are many clinical eye movement abnormalities in cerebellar disease [87], which include incoordination of eye movements (e.g., inaccurate eye tracking of a moving object, or when moving between objects) or additional abnormal eye movements (e.g., nystagmus, a range of involuntary repetitive eye movements). Most abnormal cerebellar-related eye movements can be readily captured using portable infra-red goggles (video-oculography) [88]. Recently, a high-speed smartphone camera was used to capture eye movement with a diagnostic sensitivity of 0.84, specificity of 0.77, and a correlation value of 0.63 with the BARS oculomotor subscore [89]. This involved recording two minutes of slow-motion video of the face, capturing the centre point of the iris whilst the subject tracked a moving dot on a screen. Facial landmarks were detected and the relative position of the centre point of the iris to fixed facial landmarks was recorded. Feature extraction was a combination of the total power and variance of frequencies embedded within the iris movement. TABLE 2 presents the research and corresponding features implemented within the eye movement domain analyses.

2) ATAXIC SPEECH (DYSARTHRIA)
People with ataxia often exhibit disordered speech referred to as ataxic speech or cerebellar dysarthria. This dysarthria is generally recognized by specialist clinicians as containing a combination of a reduced rate of speech, uncoordinated production of speech sounds and volume [107], [108], [109]. Functionally, this combines to affect speech intelligibility [91], [110]. Clinical assessment tasks often require a subject to repeat a certain vowel or word [111]. By observing factors such as the rhythmicity of the task, the clinician or sensory system can endeavour to capture any dysarthria. At the bedside, vocal tasks such as the utterance of the phrase ''British Constitution'' [90] or /ta/-/ta/-/ta/ [112] are undertaken. Other speech and verbal assessments include SARA (sub-item four), the SCA Functional Index (SCAFI, PATA rate) [113], ICARS (sub-items 15 and 16), Rapid Verbal Retrieve [114], the Controlled Oral Word Association Test, the Animal Test, the Action Fluency Test [115], Phonemic Verbal Fluency [116], and lexical and semantic fluency tasks [117]. In addition, researchers in CA have used other vocal tasks which have been investigated in PD, such as repetition of the vowel /a/ or /pa/-/ta/-/ka/ per one breath; narrating a fictional story for 90 seconds [118]; or producing a sample of continuous speech [91]. Apart from the clinical evaluation in formal scales such as SARA and SCAFI, the researcher may also evaluate speech motor function by the Munich Intelligibility Profile (MVP-Online) and the Bogenhausen Dysarthria Scales (BoDyS) [92].
Researchers may employ a professional microphone or, more recently, portable devices such as a smartphone's microphone or recording over a telephone. The speech domain is a promising candidate for developing tele-diagnostic and tele-rehabilitation programs. This is enabled by methods that are inexpensive, non-invasive, provide easy access to a relatively large population, are simple to administer and can be performed remotely.
Features used in general speech studies have an extended history of development. Work by Buder reviewed more than 100 acoustic measurements, many of which have been implemented in the actual clinical setting [119]. Kent and colleagues revealed that excessive variation of fundamental frequency (F0) was the most frequent biomarker in ataxic speech [120]. Individuals with FRDA also presented significantly higher scores on the Cepstral Spectral Index of Dysphonia during vowel production [91]. Work by Brendel et al. used time-based acoustic parameters including speaking rate, maximum syllable repetition rate, and pitch variation coefficient to assess dysarthria severity and the speech impairment profile in individuals with FRDA [92]. Ataxic speech has been analysed by combining perceptual and acoustic features of dysphonia to categorize individuals with FRDA against control participants with more than 80% accuracy [91]. Phase-based cepstral features employed Mel-frequency cepstral coefficients (MFCC) and the modified group delay function cepstral coefficients (MGDCCs) to differentiate ataxia cohorts and estimate ataxic severity [90].
Supporting libraries used in speech studies include Librosa and the Snack sound toolkit. Librosa is a well-known Python package for music and audio analysis [121]. It extracts spectral and rhythm features including chroma, mel-spectrogram, MFCC, spectral centroid, flatness, and bandwidth. The Snack sound toolkit is another library that supports scripting languages to extract visual features from speech [122]. Another stand-alone program used to extract vocal features is the Analysis of Dysphonia in Speech and Voice program. To date, research in the speech domain has been somewhat constrained to the first approach of hand-crafted ML. We suggest transforming the 1-D speech signal into a 2-D image to leverage state-of-the-art DL techniques to tackle the severity and phenotype problems. Ataxia research may also try to apply techniques which have been utilised in other relevant domains. In the PD speech domain, Wan and colleagues utilised a smartphone to record speech data to estimate a severity score [123]. Work by Rusz and colleagues considered a smartphone to evaluate acoustic indices to diagnose people with PD [118]. Animated humans (artificial intelligence interactive avatars) ask individuals with PD questions and record the resulting conversation to capture more complex language-related data [122], [124]. The computer-based chat agent distinguished healthy controls and individuals with dementia with an accuracy of 93% [122]. These techniques are all promising and suitable for application in the CA research field.
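The suggestion above of transforming a 1-D speech signal into a 2-D image can be illustrated with a short-time Fourier transform. The following is a minimal sketch only, written in plain Python so that it is self-contained; the function name `spectrogram`, the frame length, the hop size, the Hann window and the synthetic 125 Hz tone are all illustrative assumptions and are not taken from any of the cited studies. In practice, an audio library such as Librosa would normally be used for this step.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a 2-D list [frame][frequency_bin] of STFT magnitudes.

    Each row is the magnitude spectrum of one Hann-windowed frame, so the
    result can be treated as a (time x frequency) image.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Keep only non-negative frequencies (bins 0 .. frame_len/2).
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(coeff))
        image.append(row)
    return image


# Synthetic stand-in for a recording: a 125 Hz tone sampled at 1 kHz (assumed values).
fs = 1000
tone = [math.sin(2 * math.pi * 125 * n / fs) for n in range(256)]
image = spectrogram(tone)

# Each frequency bin spans fs / frame_len = 15.625 Hz, so 125 Hz falls in bin 8.
peak_bins = [row.index(max(row)) for row in image]
```

The resulting `image` has one row per frame and one column per frequency bin, which is the kind of 2-D input that image-oriented DL models expect; a real pipeline would typically add a mel or logarithmic scaling step.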

3) LOWER LIMB
In the domain of lower limb ataxia, assessments have primarily been based on two clinical tasks: foot-tapping and the heel-knee-shin task. The foot-tapping task requires the subjects to place their heel on the floor and repetitively tap the floor with the 'ball' of the foot. In the heel-knee-shin task, subjects are seated and place the heel of one foot onto the opposite knee and repetitively slide the heel down along the shin to the ankle and back up to the knee. Both protocols are then repeated with the other limb. These tasks involve movements across multiple joints, which require greater involvement of the cerebellum and are therefore likely to disclose signs of ataxia. TABLE 2 presents studies which applied ML and their corresponding features implemented within the lower limb analyses. Wireless sensing techniques presented in the gait section can also be implemented to recognize and assess CA with the heel-knee-shin movement. Researchers in this sector may refer to activity and gesture recognition based on wireless sensing [125].

4) TRUNCAL ATAXIA
Truncal coordination is of great significance in forming a stable base of support for limb movements and ambulation. Balance-based analytics contribute significantly to the accuracy of a classification model and make up the largest portion of the final optimal feature set [126]. Milne et al. conducted a systematic review of the efficacy of rehabilitation in genetic CAs and found consistent evidence that rehabilitation improves mobility and balance [127]. Diaz and colleagues presented a complete picture of using wearable sensors in balance analysis [128]. Ghislieri and colleagues summarised balance assessment using wearable inertial sensors for CA and other diseases [104].
Bedside balance testing involves tasks such as the Romberg's Balance manoeuvre (SARA sub-item two [15]), Limits of Stability Test [129], Single Leg Stance Test [130], Five Times Sit to Stand Test [131], Functional Reach Test [132], Clinical Test of Sensory Interaction and Balance [133], Tinetti Test [134], Berg Balance Scale (BBS) [135], and the Balance Evaluation Systems Test (BESTest) [136]. Among forty-one relevant balance measures, the BBS, SARA, and the TUG were identified as the most robust outcome measures with at least 75% expert consensus [137]. Associated closely with truncal balance is the loss of ambulation (LoA). The Kaplan-Meier estimator and Turnbull method together with the FARS have been studied to predict the LoA with a dataset of 1021 individuals with FRDA [138]. The findings concluded that individuals with early onset (less than 15 years of age) FRDA typically become wheelchair dependent at an average of 11.5 years after the onset of the first symptoms. The authors also suggested employing LoA as a valuable progression biomarker ranking approach for individuals with ataxia. TABLE 3 lists some relevant research and the features implemented within truncal analyses. In one study, rehabilitation enhanced balance performance in individuals with CA and MRI revealed enhanced structural connections between the cerebellum and cortex [139].
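As an illustration of the Kaplan-Meier estimator mentioned above for loss of ambulation, the following minimal sketch computes the product-limit survival curve from right-censored follow-up data. The data values are invented for demonstration and are not from the cited FRDA cohort, the function names are hypothetical, and the Turnbull method for interval-censored data used alongside it in [138] is not shown.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: years from symptom onset to loss of ambulation (LoA)
               or to last follow-up.
    events:    True if LoA was observed, False if the record is censored.
    """
    records = sorted(zip(durations, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        observed = 0   # LoA events at time t
        removed = 0    # events plus censored records at time t
        while i < len(records) and records[i][0] == t:
            observed += records[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented example: six individuals, two of them censored.
durations = [5, 8, 8, 10, 12, 15]
events = [True, True, False, True, False, True]
curve = kaplan_meier(durations, events)
median_loa = median_time(curve)
```

Here `curve` gives the estimated fraction of individuals still ambulant at each event time, and `median_loa` is the time by which at least half are estimated to have lost ambulation.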

5) UPPER LIMB
Similar to the lower limbs, CA impacts on upper limb dexterity and may be apparent in a range of tasks, such as degradation of handwriting and difficulty performing tasks such as fitting a key into a lock. Bedside testing includes finger tapping (tap on a surface or between fingers, SARA [15]), the finger-to-nose test, rapid alternating hand movements, and ballistic and ramp tracking [142], [147]. These tasks aim to stress the cerebellum by employing movements through multiple joints and aim to uncover ataxic movements which include dysmetria (difficulty in effecting the correct distance of a movement), intention tremor (abnormal movements which increase in amplitude as the intended target is approached) and dyssynergia (the decomposition of a movement into smaller component movements). TABLE 4 presents research and the corresponding features implemented in upper limb movement analyses. Work by Impedovo et al. presented a comprehensive review of online handwriting analyses from the perspective of pattern recognition [149]. Developed as online software, these approaches may be scaled up to enable the development of tele-rehabilitation programs. They have proved to be useful biomarkers in the assessment of AD and PD and hence may find applicability in CA research [149].

6) GAIT
Gait deficit is often the first ataxic feature to be noted [14]. Gait changes include widening of the lateral base of support, variability in stride length and timing, hesitancy and unsteadiness [151]. Spatiotemporal variability measures, spatial step variability compound measures, lateral step deviation, and harmonic ratios are among the higher sensitivity biomarkers quantifying ataxic gait [159], [160].
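The variability in stride timing described above is often summarised as a coefficient of variation. The following minimal sketch assumes that heel-strike timestamps for one foot have already been detected by some sensor pipeline; the function names and the timestamp values are illustrative assumptions rather than data or methods from the cited studies.

```python
from statistics import mean, stdev


def stride_times(heel_strikes):
    """Durations between successive heel strikes of the same foot (seconds)."""
    return [later - earlier
            for earlier, later in zip(heel_strikes, heel_strikes[1:])]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Invented heel-strike timestamps in seconds.
regular_strikes = [0.0, 1.0, 2.0, 3.0, 4.0]
irregular_strikes = [0.0, 0.9, 2.1, 2.9, 4.2]

cv_regular = coefficient_of_variation(stride_times(regular_strikes))
cv_irregular = coefficient_of_variation(stride_times(irregular_strikes))
```

A perfectly periodic gait gives a coefficient of variation of zero, while irregular stride timing gives a larger value; the same function could be applied to stride length or step width.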
Clinical walking tasks used to evaluate ataxic gait include slow, preferred, and fast speed walking [154], SCAFI (8 meter/25-foot walk), Timed Up and Go (TUG) test [161], FARS (a timed walk of 50 feet) [19], SARA (sub-item one) [15], ICARS (gait disturbances sub-items one and two) [16], U-turning [162], walking in a circle [163], Tandem Walk test [146], Timed 25 Foot Walk Test [164], 10 Metre Walk Test [165], Six Metre Walk Test [166], and Functional Ambulation Classification [167]. A review summarised the structure of typical motion capture techniques applied in neurological diseases across six categories of sensor systems including inertial kinematic, optical, magnetic, mechanical, acoustic, and computer vision [168]. Works of Boekesteijn et al. [169] and Jourdan et al. [170] list 34 hardware platforms that are commercially available to quantify gait deficit. Specifically, APDM Mobility Lab™ [153] and BioSensics [171], [172] are two that have been used in ataxia research. TABLE 5 presents research and the corresponding features implemented in certain gait analyses. Intensive rehabilitation has demonstrated benefits in individuals with degenerative CA [127], [167], [173]. A tele-rehabilitation study in PD employed a pedometer or step counter to monitor the progression of mobility during a drug trial [174].
Mobility exerts a very significant effect on quality of life, and hence, measurement of the effect of CA on gait is an important endeavor. Our experience with gait metrics has utilized wearable kinematic sensors to look at mean power frequency bands, fuzzy entropy and resonance frequency [154]. Recently, wireless sensing has emerged as a privacy-preserving and non-invasive solution for assessing the impact of neurological diseases on gait [175], [176]. It has advantages over a wearable sensory system as it does not involve any devices attached to the subject's body and protects privacy to a greater extent than camera-based systems, as it records only movement disturbances. Other platforms using continuous wave radar or ultra-wideband are less active than WiFi-based platforms due to installation cost, reduced portability and dedicated transmitter requirements. Using consumer WiFi devices, the design and implementation of an indoor falls detection system was reported to have more than 90 per cent detection precision [177], [178]. Falls risk prediction is also an important component of research in ataxic gait. Along with the developments of deep neural networks, we would expect applications using state-of-the-art models to address CA-related problems. Long Short-Term Memory networks have proved their potential with a gait data set containing 48 participants living with PD, Huntington's disease and multiple sclerosis. The system was able to differentiate unaffected subjects from people with one of the three diagnoses with more than 94% accuracy and classify affected individuals into one of the three diseases with more than 95.67% accuracy [179]. Virtual reality is another promising tool with which to study CA rehabilitation with its ability to provide an engaging and highly individualized environment. In PD research, the virtual environment helps to mimic real-world situations [180].

VI. CHALLENGES AND FUTURE PROSPECTS
A. PHENOTYPIC IDENTIFICATION AND DEEP LEARNING
Phenotype classification plays a vital role in diagnosis, disease discovery, symptom management and in customizing rehabilitation therapy. By understanding the pathology underlying ataxia we can develop appropriate methods to measure the efficacy of interventions such as medications and rehabilitation. It remains a challenging question as current approaches do not reflect significant phenotypic difference. In the context of clinical assessment, phenotype identification holds the possibility of reduced costs in genetic testing and increased diagnostic certainty. Such a system is not designed to substitute for clinicians or to conduct a self-diagnosis, but rather to assist with supplementary evidence. In CA, phenotypic diagnosis has been reported with neuroimaging-based approaches facilitated by end-to-end approaches [72]. There has been limited work to distinguish pure CA from CABV and CA plus proprioceptive loss or the presence of bradykinesia or spasticity. There has also been a lack of work to separate known genotypes with pure CA. One approach may be to employ the end-to-end framework with input sources from time-series signals.

B. SEVERITY ESTIMATION AND ITS SUBJECTIVE GROUND TRUTH
ML models classifying phenotypes use objective ground truth derived from genetic tests. By contrast, current supervised ML models estimating ataxia severity use subjective ground truth derived from human-rated scores. Although these models obtain high correlations with clinical scores (0.62 to 0.91 Pearson's correlation [34], [80], [86], [89], [90], [147]), the subjective ground truth sets a limit on what ML models can learn. This limitation is compounded by the fact that most research involves fewer than three clinicians to assess the participants and that reported correlation results had high standard deviations [34]. We suggest a novel approach that utilizes a set of non-ataxic controls' data as a reference base. Severity can then be estimated from the amount by which the performance of an individual with CA deviates from the controls' baselines. This approach can utilise signal processing techniques such as recurrence quantification analysis [181], multiscale entropy [182], or the Poincaré plot [39]. Also, unsupervised learning techniques such as clustering or association can be used as approaches that are independent of the clinical subjective ground truth.  [183], [184], [185]. In epilepsy, Page and colleagues have utilised DL on EEG data [186]. Another research branch to use DL is baseline prediction. This research branch attempts to predict information of an individual based on a known larger dataset. Baseline prediction is a novel research direction in CA and has been reported in FRDA [187], [188]. By collecting certain reliable baseline information, the researchers can predict the uncertain or missing components of the new data to an acceptable level of accuracy.
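The control-referenced severity estimate suggested in subsection VI-B can be sketched as a distance from a baseline built from non-ataxic controls. The example below is one possible reading of that idea under stated assumptions: each subject is described by a small vector of hand-crafted features, the baseline is the per-feature mean and standard deviation of the controls, and the score is the root-mean-square z-score. The feature values and function names are invented for illustration, and no clinical rating enters the calculation.

```python
import math
from statistics import mean, stdev


def control_baseline(control_features):
    """Per-feature (mean, sample SD) computed from control feature vectors."""
    columns = list(zip(*control_features))
    return [(mean(col), stdev(col)) for col in columns]


def deviation_score(subject, baseline):
    """Root-mean-square z-score of one subject relative to the controls."""
    z = [(x - m) / s for x, (m, s) in zip(subject, baseline)]
    return math.sqrt(sum(v * v for v in z) / len(z))


# Invented feature vectors: [stride time (s), sway amplitude (arbitrary units)].
controls = [
    [1.00, 0.10],
    [1.10, 0.12],
    [0.90, 0.08],
    [1.05, 0.11],
    [0.95, 0.09],
]
baseline = control_baseline(controls)

score_near = deviation_score([1.02, 0.10], baseline)  # resembles the controls
score_far = deviation_score([1.60, 0.30], baseline)   # deviates strongly
```

A larger score indicates performance further from the control reference. Features derived from recurrence quantification analysis, multiscale entropy or a Poincaré plot could be substituted for the two toy features used here.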

D. DEVICES AND APPROACHES
Activities of daily living are an important basis for research [189]. Recently, radiofrequency technologies assessing motor control in daily activities have been reviewed [189]. Magnetic induction or wireless sensing systems applied with deep recurrent neural networks may emerge substantially in limb and gait research [190]. Handwriting and drawing are being progressively adopted in the assessment of PD and AD, which may spark particular interest in CA research [149], [191]. These approaches can be easily performed on patients' personal computers or take advantage of the popularity of smartphones to scale up to a broad market.

E. FALL SCREENING AND PREDICTION
Falls are identified as a significant cause of injury in individuals with CA [192]. In an effort to provide a non-invasive screening platform, work by Gallamini and colleagues suggested employing a standing platform and Romberg's balance test to investigate the risk of falling [193]. Falls prediction employed the traditional approach of searching for a linkage between a signal's features and a fall. For instance, research in PD showed that the high frequency (3 Hz to 8 Hz) components of leg movements during Freezing of Gait (FOG) were not present in ordinary walking or standing. By calculating a power ratio of the ''freeze'' band [3-8 Hz] to the ''locomotor'' band [0.5-3 Hz], the authors could estimate a threshold to identify FOG events [194]. With a similar method, Handojoseno et al. showed that the power spectral density and wavelet energy of electroencephalography (EEG) could function as promising biomarkers to indicate FOG with 80% accuracy [195]. Another approach used a DL autoencoder with wearable devices to predict FOG with 71.3% accuracy, on average 4.2 seconds before the event occurred [196]. The authors built anomaly detection models to monitor the wearer's gait continually and established connections between FOG and abnormalities captured by the sensor. These approaches may find applications in CA.
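The power ratio between the ''freeze'' band (3-8 Hz) and the ''locomotor'' band (0.5-3 Hz) described above can be sketched with a plain discrete Fourier transform. This is an illustrative sketch only: the 50 Hz sampling rate, the four-second window and the synthetic signals are assumptions, and the windowing scheme and detection threshold of the cited study are not reproduced.

```python
import math


def band_power(signal, fs, f_lo, f_hi):
    """Sum of periodogram power of `signal` over f_lo <= f < f_hi (Hz)."""
    n = len(signal)
    offset = sum(signal) / n
    centred = [x - offset for x in signal]
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n)
                     for i, x in enumerate(centred))
            im = sum(x * math.sin(2 * math.pi * k * i / n)
                     for i, x in enumerate(centred))
            power += (re * re + im * im) / n
    return power


def freeze_index(signal, fs):
    """Ratio of 3-8 Hz power to 0.5-3 Hz power for one window of data."""
    freeze = band_power(signal, fs, 3.0, 8.0)
    locomotor = band_power(signal, fs, 0.5, 3.0)
    return freeze / locomotor if locomotor > 0 else float("inf")


# Synthetic leg acceleration: 4 s at an assumed 50 Hz sampling rate.
fs = 50
t = [i / fs for i in range(200)]
walking = [math.sin(2 * math.pi * 1.0 * x) + 0.1 * math.sin(2 * math.pi * 5.0 * x)
           for x in t]
trembling = [0.1 * math.sin(2 * math.pi * 1.0 * x) + math.sin(2 * math.pi * 5.0 * x)
             for x in t]

fi_walking = freeze_index(walking, fs)
fi_trembling = freeze_index(trembling, fs)
```

The walking-like signal, dominated by its 1 Hz component, yields a ratio well below one, whereas the signal dominated by 5 Hz trembling yields a ratio well above one; a threshold between the two would have to be calibrated on real recordings.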

VII. CONCLUSION
This review paper presented (I) a brief overview of clinical background and relevant information for ML applications in CA; (II) an overview of ML objectives, promising approaches, and supporting tools to solve significant clinical problems in the practice of cerebellar medicine and research; (III) existing and recent advancements in the diagnosis and assessment of CA based on neuroimaging; (IV) examples of highlights in measuring ataxia using instrumentation; and finally (V) a list of potential future directions in this field.
The area of CA measurement has advanced substantially within the last decade and is now entering a new era of application in clinical and research settings. Motor-based systems combined with the development of new machine learning techniques will take advantage of device compactness and the popularity of smart devices to advance various applications, including tele-assessments. Continued collaboration and global efforts may lead to a publicly accessible CA dataset. We foresee the future challenges to be estimating severity scores in a more reliable, consistent, and accurate manner; identifying early onset ataxia by susceptibility biomarkers; and progressing phenotype identification. With this review, we hope that we have succeeded in stimulating the CA community toward the development of advanced objective assessment tools.

IAN H. HARDING is currently the Head of the Mechanisms of Neurodegeneration Research Group, Department of Neuroscience and Monash Biomedical Imaging. He uses magnetic resonance imaging (MRI) and positron emission tomography (PET) to investigate and track brain changes in people with neurodegenerative diseases. This work principally focuses on individuals with inherited subcortical diseases, including Friedreich ataxia, spinocerebellar ataxias, and Huntington's disease. Additional work in other neurodegenerative disorders and in preclinical animal models is also being undertaken with our collaborators. His research interests include biological phenotyping, such as describing changes in brain structure and function, and mechanistic inferences, including cellular/molecular-level measurements of inflammation, oxidative stress, and metabolic dysfunction. His studies seek to provide more comprehensive disease descriptions and to identify measures relevant to track disease progression or treatment efficacy.
DAVID J. SZMULEWICZ received the Ph.D. degree from The University of Melbourne. He is a Neurologist, a Neurotologist, and a Medical Researcher. He is currently the Head of balance disorders and ataxia service with The Royal Victorian Eye and Ear Hospital, the Cerebellar Ataxia Clinic, and the Alfred Hospital; and a Neurologist with the Monash Medical Centre, Friedreich's Ataxia Clinic. He is also the Co-Director of the Australian Temporal Bone Bank which aims to facilitate pathological investigation of hearing and balance diseases. He is a Lead Investigator in research defining novel ataxia-cerebellar ataxia with neuropathy and vestibular areflexia syndrome (CANVAS), a project to develop objective ataxia metrics and an objective bedside test of imbalance-the video VVOR. His research interests include balance disorders that affect the vestibular systems, cerebellum, and the combination of the two.