Multi-Modal Deep Learning Diagnosis of Parkinson’s Disease—A Systematic Review

Parkinson’s Disease (PD) is among the most frequent neurological disorders. Approaches that employ artificial intelligence, and notably deep learning, have been widely adopted with promising outcomes. This study presents an exhaustive review of the literature published between 2016 and January 2023 on deep learning techniques used in the prognosis and evolution of symptoms and characteristics of the disease based on gait, upper limb movement, speech, and facial expression-related information, as well as the fusion of more than one of these modalities. The search resulted in the selection of 87 original research publications, for which we summarize the relevant information regarding the learning and development process, demographic information, primary outcomes, and sensory equipment. According to the research reviewed, various deep learning algorithms and frameworks have attained state-of-the-art performance in many PD-related tasks by outperforming conventional machine learning approaches. Meanwhile, we identify significant drawbacks in the existing research, including a lack of data availability and of model interpretability. The fast advancements in deep learning and the rise in accessible data provide the opportunity to address these difficulties in the near future and to apply this technology broadly in clinical settings.

in combination with genetics, all contribute to the pathogenesis of PD.
Even though there is presently no cure for PD, pharmacological approaches based on dopamine substitution, as well as surgical techniques such as deep brain stimulation (DBS), provide substantial improvement of motor symptoms. While mortality is not increased in the first decade after disease onset, it eventually doubles compared to the general population, with falls and aspiration pneumonia being the leading causes of hospitalization and death.
PD is clinically defined by the presence of bradykinesia and at least one additional cardinal motor symptom (rigidity or rest tremor), in combination with other supporting and exclusionary features [2], [3]. Reduced facial expression, diminishing handwriting size (micrographia), speech and voice impairment, and difficulty swallowing all represent additional motor aspects of PD. These motor symptoms gradually interfere with patients' daily activities, negatively affecting their quality of life. Progression of motor dysfunction, with increasing gait abnormalities and the onset of postural instability, further compromises patients' autonomy and safety. Although the motor symptoms define the clinical syndrome, a majority of PD patients have other complaints that have been classified as non-motor and are probably related to non-dopaminergic pathways. These include mood and mental alterations, such as depression, lack of motivation or apathy, and declining cognitive capacity. Fatigue, sleep disturbances, autonomic complaints (i.e., orthostatic hypotension, urogenital dysfunction, constipation, and excessive sweating), and sensory complaints are common components of the clinical spectrum of PD. Non-motor symptoms are present in the early stages of the disease, and some antedate the onset of cardinal motor features by years or even decades [4]. Moreover, these symptoms become increasingly prevalent as the disease advances and are major determinants of quality of life, progression of overall disability, and mortality [5].
The diagnosis of Parkinson's disease is mostly dependent on the patient's clinical evaluation [6], [7]. The Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is currently regarded as the gold standard for the evaluation and monitoring of PD [8], followed by the well-established Hoehn and Yahr (H&Y) scale. Although there are some promising candidates, there are no clear biomarkers for PD, and all pertinent research findings serve to support the clinical confirmation of the diagnosis. Even more difficult is the identification of the disease in its early stages, since the heterogeneous appearance and progression of the symptoms result in complicated clinical presentations of PD [9]. Fortunately, Machine Learning (ML), a field of Artificial Intelligence (AI) defined as the capability of systems to autonomously acquire knowledge and detect patterns from experience or existing data without being explicitly programmed [10], has become increasingly successful in identifying nonlinear connections in high-dimensional data. Moreover, a cutting-edge machine learning technology, Deep Learning (DL), has recently achieved performance that surpasses the previous state of the art in several health areas [11]. DL accepts high-dimensional, unprocessed data and automatically learns its representation through the use of Deep Neural Networks (DNN), which need minimal feature engineering during data preprocessing [12].
DL strategies are essential to manage the complex data extracted from sensor-enabled devices [13]. During the last decade, there has been rapid development in the use of DL modeling techniques that employ sensor data to track, monitor, and predict the course of Parkinson's disease [14], [15]. Other techniques involve the use of biomarker data sets, such as dopamine transporter data from tomographic images and serum cytokines, to evaluate classification performance using shape features derived from the extracted regions of interest [16]. Additionally, a recent study provided evidence that AI can identify people who suffer from PD from their nocturnal breathing and can accurately estimate disease severity and progression [17].
Nowadays, research on the development of wearable sensory equipment for the detection of symptoms or characteristics of PD has made significant progress [18]. The sensors can be located on different parts of the body and are associated with different modalities and combinations thereof. The proper use of sensors and the selection of the investigated modalities can support the provision of more remote healthcare services and tailored diagnoses [19]. In addition, these digital sensors enable the autonomous, non-disruptive collection of real-world data [20]. Meanwhile, the real-time nature of wearable technology and the depth of insight into a patient's vitals enable physicians to detect disease early and provide the most accurate diagnoses, whether in-clinic or remotely [19]. Therefore, it is of utmost importance to use sensors to obtain biomarkers from modalities that can offer real-time monitoring and thus may accurately reflect everyday symptoms and their variation.
In the presented systematic literature review (SLR) we focus on studies that employ DL methods towards PD diagnosis by utilizing data from the modalities of gait, upper limb motion, speech and facial expressions. Information from the aforementioned four modalities can be obtained from wearable devices during daily life activities in real-time, outside of the clinical settings, and thus, can provide caregivers and clinicians with a more realistic picture of the progression of the disease, leading to more personalized treatment of the patient.
The rest of the manuscript is structured as follows: Section II describes the most common PD symptoms related to the investigated modalities and highlights the importance of their use for PD diagnosis. Section III covers recently published related reviews on the use of AI for the diagnosis of PD. The research method followed in this review is presented in detail in Section IV, including the research goal and questions, the study search and selection strategy, and the extraction of information from the selected studies. In Section V, our findings are analyzed and the research questions are answered in detail. Finally, in Section VI, we summarize and discuss our results and the possible limitations of our work, whilst in Section VII we draw the main conclusions from our review.

II. MANIFESTATION OF PD SYMPTOMS
In this Section, the clinical features of Parkinson's disease regarding the modalities of gait, upper limb motion, speech and facial expressivity, both motor and non-motor, are described in the context of the progression of the disease.

A. Analysis of Gait
Gait impairment is an evolving condition, and different patterns of gait disturbance can be detected throughout the progression of PD, such as reduced smoothness of locomotion, increased interlimb asymmetry [21], decreased speed, reduced step length [22], shuffling steps, increased double-limb support, defragmentation of turns, and decreased balance and postural control [23].
In addition, data related to gait mechanisms are utilized to create strategies for predicting the risk of falling. Falls represent a significant risk factor that greatly influences the quality of life of Parkinson's patients [24]. Falls in people with severe Parkinson's disease are often related to a paroxysmal symptom known as "Freezing of Gait" (FoG). More than 60% of PD patients experience some form of freezing as the disease progresses, and insufficient knowledge of the pathophysiology and circuit mechanisms limits the effectiveness of treatment [25]. FoG is characterized as an episodic (seconds-long) failure to generate an effective step, and its pathophysiology likely involves context-dependent dysfunction across multiple neuronal levels, including cortical, subcortical, and brainstem regions [26]. FoG can occur at the beginning of the first step, when turning, performing several tasks, walking through narrow spaces, arriving at a destination, or walking through doors. Accelerated pacing (festination), another sign of Parkinson's disease, is clinically characterized as the patient's tendency to walk forward in increasingly rapid and shorter steps, with the patient's center of gravity shifting forward over the supporting leg. There is evidence that individuals with Parkinson's disease who frequently present FoG follow a pattern of accelerated pacing prior to "freezing" [27]. Several studies show that, among a number of gait metrics, step duration, step length, double support time, and swing time might be used to create models that predict episodes of FoG [28].
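To make the temporal gait metrics named above concrete, the following minimal Python sketch derives step duration, swing time, and double support time from a sequence of labeled foot events. The event format (time, foot, event kind) and the function name are hypothetical conventions chosen for illustration and are not taken from any of the reviewed studies; step length is omitted because it requires spatial measurements.

```python
from statistics import mean


def gait_metrics(events):
    """Compute mean temporal gait metrics from labeled foot events.

    `events` is a time-sorted list of (time_s, foot, kind) tuples, where
    foot is "L" or "R" and kind is "HS" (heel strike) or "TO" (toe-off).
    This event format is an assumption made for this sketch.
    """
    step_times, swing_times, double_support = [], [], []
    last_hs = None   # (time, foot) of the most recent heel strike
    last_to = {}     # foot -> time of its most recent toe-off

    for t, foot, kind in events:
        if kind == "TO":
            last_to[foot] = t
            # Double support ends when the trailing foot leaves the ground
            # after the opposite foot has struck.
            if last_hs is not None and last_hs[1] != foot:
                double_support.append(t - last_hs[0])
        elif kind == "HS":
            # Step duration: heel strike of one foot to heel strike of the other.
            if last_hs is not None and last_hs[1] != foot:
                step_times.append(t - last_hs[0])
            # Swing time: toe-off to the next heel strike of the same foot.
            if foot in last_to:
                swing_times.append(t - last_to.pop(foot))
            last_hs = (t, foot)

    def avg(values):
        return mean(values) if values else None

    return {
        "step_time": avg(step_times),
        "swing_time": avg(swing_times),
        "double_support_time": avg(double_support),
    }


# Illustrative (synthetic) event sequence for two steps.
example_events = [
    (0.0, "L", "HS"), (0.1, "R", "TO"),
    (0.5, "R", "HS"), (0.6, "L", "TO"),
    (1.0, "L", "HS"),
]
example_metrics = gait_metrics(example_events)
```

In practice, the heel-strike and toe-off events would first have to be detected from sensor signals, which is a separate problem not covered by this sketch.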
Therefore, an objective and quantitative analysis of the modality of gait could potentially improve the current approach, which may aid in Parkinson's disease patient diagnosis, symptom monitoring, therapy management, rehabilitation, and fall risk assessment and prevention.

B. Analysis of Upper Limb Motion
Typically, the motor manifestations of PD have a localized onset (i.e., reduced arm swing, hand tremor [29]). Uncertainty exists over whether there is a recognized somatotopic pattern of development and progression of motor symptoms in PD. According to patterns based on postmortem studies and in vivo positron emission tomography (PET) imaging, motor indications of Parkinson's disease should begin in the lower extremities and progress upward [29]. However, the majority of neurologists would concur that the early motor symptoms of PD manifest in the upper limbs. This is a subjective conclusion based on experience, which might be influenced by observational bias. Therefore, the motor manifestations of the upper limb may be more apparent to patients and clinicians than the early, modest impairments of the lower limb and face [30].
Tremor, which is defined as an involuntary, rhythmic, and oscillating movement of a body part, is one of the most common symptoms of PD [31] and is referred to as Parkinsonian tremor (PT). The objective of tremor analysis is to quantify PD tremor, which is rhythmic, has a typical frequency, and is more prevalent in the hands during rest [32]. There are various types of tremor, each with unique causes and characteristics [33]. The clinical picture, the correct interpretation of medical history, and the doctor's experience all contribute to a more precise determination of tremor type. Despite the existence of a variety of diagnostic methods for Parkinson's disease and tremor, their rapid and effective differentiation, especially in the early stages, proves especially challenging due to their wide variety of causes and the similarity of their symptoms [34].
In addition to tremor episodes, freezing phenomena are a significant cause of disease-related disability in PD [35]. Freezing occurs most frequently during walking (FoG), as well as during swallowing, speech, and especially repetitive movements of the upper limbs [36]. As with FoG, freezing of the upper limbs impairs daily activities such as handwriting, tooth brushing, typing, and bimanual coordination to a significant degree [37]. Consequently, it is essential to determine the cause and detect the abnormalities in the motion of the upper limbs in order to effectively reduce and treat the associated syndrome.

C. Analysis of Speech
Speech is an essential biological feature of human beings, and impairments in voice can indicate possible disorders, such as PD. The majority of PD patients develop a variety of speech disorders, and speech might be affected years before the main motor symptoms of the disease appear [38]. Therefore, it is crucial that the modality of speech be investigated early and in depth.
The most important and common impairments include reduced loudness, monopitch, monoloudness, breathy and hoarse voice quality, reduced stress and imprecise articulation [39], [40]. These changes in speech belong to the hypokinetic dysarthria category and can be exploited as possible biomarkers so as to identify early signs of this neurological disorder [41]. Sustained vowels as well as reading sentences, words and short texts have been employed to distinguish the PD patients from healthy controls [38], [42]. Moreover, imprecise articulation, phonation and prosody have been observed in specific consonants, as mentioned in [41]. All these symptoms can be characterized as mild, moderate or severe [38].

D. Analysis of Facial Expressions
Facial expressions constitute a fundamental source of information for disease analysis, as changes manifest in the early stages of the disease, even years before diagnosis [43]. Hypomimia, also known as facial amimia, is one of the most prominent clinical indications of Parkinson's disease, characterized by a decrease or lack of spontaneous facial movements, small-amplitude and low-velocity voluntary orofacial movements, and reduced emotional expressiveness. Moreover, it has been reported that there is a possible relationship between amimia and other axial symptoms, such as gait freezing [44].
Observable signs include, among others, abnormalities associated with a deeper indentation of the eyelids, a staring expression, involuntary mouth opening, and stiffness in the orbicularis oculi muscles [45]. On the upper face, hypomimia often presents as a reduction in blink rate, whereas on the lower face, difficulties with spontaneous smiling are noticed [46]. In addition, the early manifestation of symptoms related to facial muscle movement, along with their possible correlation with motor symptoms of PD, highlights the need for more research on the alterations of facial expressions.

III. RELATED WORKS
In recent years, the number of publications on the application of deep learning to the diagnosis of PD has increased. Although previous studies have reviewed the use of machine learning in the diagnosis and assessment of PD, they were limited to the analysis of motor symptoms, kinematics, and wearable sensor data as well as the utilization of traditional ML techniques [47], [48]. Recently, [49] provided a comprehensive analysis of the influence of machine learning and deep learning approaches applied towards Parkinson's disease diagnosis on the development of new research areas based on neuroimaging techniques as well as physiological signals obtained from speech, gait, and handwriting. In addition, that work investigates the existing status and potential applications of data-driven AI technologies in the diagnosis of Parkinson's disease.
Healthcare services are gaining interest in computer-aided diagnostic (CAD) technologies based on artificial intelligence methods that can perform automated diagnosis of Parkinson's disease. To this end, the authors in [50] collected 63 studies (2011-2021) on deep learning from various modalities, including brain analyses (SPECT, PET, MRI, and EEG) and motion symptoms (gait, handwriting, speech, EMG). They demonstrated that deep learning models can reach excellent prediction accuracy for Parkinson's disease, particularly the convolutional neural network (CNN) model, which has been widely recommended by studies focusing on image classification for brain imaging and handwriting analysis. Additionally, the CNN model worked well with one-dimensional inputs such as EEG and speech signals. This work suggests that academics will be incentivized to use more explainable and interpretable methodologies in deep learning-based CAD tools, which will subsequently be adopted by end-users and enhance the health care results for the increasing number of people affected by PD around the world.
In order to offer a complete overview of the data modalities and machine learning algorithms utilized in the diagnosis and differential diagnosis of PD, [51] conducted a literature evaluation of papers published through 2020, using PubMed and IEEE Xplore. This study examines the aims, data sources and types, machine learning methods, and associated outcomes of 209 papers that were included. These results reveal a strong potential for the implementation of ML techniques and novel biomarkers in clinical decision making, resulting in a more systematic and accurate diagnosis of Parkinson's disease.
In this systematic review, we aim to (a) comprehensively summarize published studies that applied advanced DL models to the diagnosis of PD-related symptoms and the estimation of disease severity levels, providing an exhaustive overview of data sources, data types, deep learning models, and associated outcomes, (b) assess and compare the feasibility and efficiency of the different DL methods in the diagnosis of PD by utilizing information obtained from four different modalities: gait, upper limb motion, speech, and facial expressions, (c) provide machine learning practitioners interested in the diagnosis of PD with an overview of previously used models and data modalities and the associated outcomes, as well as the different PD diagnostic targets in relation to each modality studied, and (d) present and comment upon the variety and types of sensors used to acquire the desired information from the studied modalities. The application of AI to clinical and non-clinical data of different modalities has often led to high diagnostic accuracies in human participants, and may therefore encourage the adoption of cutting-edge DL algorithms and novel biomarkers in clinical settings to assist more accurate and informed decision making.

IV. RESEARCH METHOD
This study was designed and carried out by following guidelines for systematic literature reviews [52]. We followed the process depicted in Fig. 1, which can be divided into the three common phases of planning, conducting, and documenting.
The objective of Phase 1, i.e., the Planning phase, was to:
• establish the need for a review of DL methods implemented towards the diagnosis of PD,
• identify the research goal and, more importantly, the research questions, and
• define the protocol to be followed by the research team for carrying out the work in a systematic and precise manner.
The output of Phase 1 is a detailed review protocol.
The objective of Phase 2 was to perform the SLR by carrying out all the steps defined in the review protocol, as follows:
• Search and selection: Four peer-reviewed databases were searched automatically. Then, candidate entries were filtered to create the final list of studies to be reviewed. After selection, we performed comprehensive backward and forward snowballing.
• Data extraction form definition and classification framework: We compared and categorized studies by research topic [53]. This was accomplished methodically through keywording.
• Data extraction and synthesis: We examined each primary study in depth and completed the associated data extraction form. The forms were gathered and aggregated for further analysis and synthesis. We also reviewed and analyzed the previously collected data. This task elaborated on the extracted data to address each research question.
In Phase 3, we analyzed and synthesized the data. For independent replication and verification, we developed a thorough replication kit, which includes the raw search and selection data, the full list of primary studies, and the raw extracted data.

A. Research Goal and Questions
The goal of this study was to identify, classify, and evaluate trends, focus, and open challenges in existing research on the diagnosis of Parkinson's disease by utilizing cutting-edge deep learning techniques. Our focus is to search for deep learning methods utilized towards the detection of various indicators of Parkinson's disease such as motor and non-motor symptoms, severity levels or other PD related characteristics [54].
Our overall goal can be refined into the following specific research questions (RQ), for each of which we also provide the primary objective of investigation:
• RQ1: What are the most advanced DL methods used for the diagnosis of PD and the assessment of PD severity using the modalities of interest?
• RQ2: Has DL been used for non-conventional PD-related diagnostic targets?
• RQ3: Could the fusion of multi-modal information offer more personalized and accurate diagnosis?
• RQ4: Which sensing systems are used to analyze the symptoms of PD related to speech, facial expressions, upper limb movement, and gait?
Based on the review results obtained by answering RQ1, we offer in-depth knowledge of the most advanced DL techniques and frameworks employed towards the diagnosis of PD, its symptoms, and its severity levels from data obtained from the modalities of speech, gait, upper limb motion, and facial expressiveness. Answering RQ2 gives insight into the existence of DL models aimed at the identification of non-conventional PD diagnostic targets. Moreover, answering RQ3 helps the community to understand whether there is potential in the adoption of DL methods based on the fusion of different modalities and room for improvement in this research area. Finally, by answering RQ4 we provide a solid foundation for a thorough comparison of the sensory equipment solutions currently used to extract the necessary information, which is then processed and served as input to the DL models.
This contribution is useful for both (i) researchers, to further contribute to this research area by defining new approaches or refining existing ones, and (ii) practitioners, to better understand existing methods and techniques and thereby be able to adopt the one that best suits their research and business goals.

B. Search and Selection Strategy
In this phase, we gathered the set of research studies that are relevant and representative for our purposes. Before performing the actual search and selection of relevant studies, we manually selected a set of ten pilot studies. They were selected based on the authors' knowledge of the targeted research domain (i.e., DL methods for multi-modal PD diagnosis) and on an informal preliminary screening that we performed on the available literature on the topic. The selected pilot studies fulfil our selection criteria (see below) and are presented in Table I. Pilot studies were used to validate our search and selection strategy; more specifically, we used them to obtain quick feedback on the quality of the search string used for the automatic search and to guide the refinement of the selection criteria.
1) Automatic Search: During this phase, automated searches were conducted on the electronic databases and indexing systems specified in Table II. As indicated in [64], we selected four of the largest and most comprehensive scientific databases and indexing systems in biomedical engineering, namely SCOPUS, IEEE Xplore Digital Library, PubMed, and ACM Digital Library in order to cover as much relevant material as possible. The selection of these electronic databases and indexing systems was strongly affected by their high accessibility, their capacity to export search results to well-defined, computation-friendly formats, and the fact that they have been acknowledged as an efficient way to conduct systematic literature reviews in biomedical engineering.
To create the search string, we first considered the research questions and then the set of pilot studies. We then compiled a list of relevant concepts, their synonyms, abbreviations, and alternative spellings, and combined them with ANDs and ORs to form the final search string. The search string displayed below was evaluated by conducting pilot searches on the four databases and comparing the findings to all pilot studies, which were required to be included in the acquired results. The actual search strings used for each database were obtained by syntactically adapting the string to each database's specific properties. We applied the search string to each paper's title, abstract, and keywords; the automated searches yielded 3342 candidate studies.

parkinson* AND
(disease OR phenotyp* OR symptom* OR stage* OR severity) AND
(diagnos* OR assess* OR identif* OR classif* OR recogn*) AND
(lower limb* OR gait OR face OR facial OR speech OR voice OR tremor OR upper limb* OR multimodal* OR multi-modal*) AND
(machine learning OR deep learning OR neural net*)

2) Impurity and Duplicates Removal: Due to the nature of electronic databases and indexing systems, search results may contain items that are obviously not research papers, such as conference and workshop proceedings, international standards, textbooks, book chapters, etc., as well as duplicates. At this step, we manually eliminated impurities and merged duplicates. Fig. 2 presents the percentage of the included studies for the years 2016 to 2023.
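The search-string construction and duplicate merging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record structure (a dictionary with a "title" field), and the title-normalization rule are hypothetical, and the review itself reports that impurities and duplicates were handled manually. Database-specific syntax (field tags, phrase quoting) is not modeled.

```python
import re


def build_query(groups):
    """Join synonyms with OR inside each group, then join groups with AND."""
    clauses = []
    for terms in groups:
        clause = " OR ".join(terms)
        # Parenthesize multi-term groups so AND/OR precedence is explicit.
        clauses.append(f"({clause})" if len(terms) > 1 else clause)
    return " AND ".join(clauses)


def normalize_title(title):
    """Lowercase a title and collapse punctuation/whitespace (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_duplicates(records):
    """Keep the first record seen for each normalized title."""
    unique = {}
    for record in records:
        unique.setdefault(normalize_title(record["title"]), record)
    return list(unique.values())


# Concept groups taken from the search string reported in the text.
concept_groups = [
    ["parkinson*"],
    ["disease", "phenotyp*", "symptom*", "stage*", "severity"],
    ["diagnos*", "assess*", "identif*", "classif*", "recogn*"],
    ["lower limb*", "gait", "face", "facial", "speech", "voice", "tremor",
     "upper limb*", "multimodal*", "multi-modal*"],
    ["machine learning", "deep learning", "neural net*"],
]
query = build_query(concept_groups)
```

Keeping the concept groups as data makes it straightforward to adapt the same query to each database's syntax, which mirrors the per-database modification step described in the text.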
3) Selection Criteria: After removing impurities and duplicates, our inclusion and exclusion criteria were applied to all the remaining studies to determine their possible inclusion in the set of studies. Each study was analyzed in two steps: initially, its title, keywords, and abstract were considered; secondly, if the analysis did not result in a clear conclusion, the introduction and conclusion parts were reviewed. The following selection criteria were applied:

Inclusion Criteria for Peer-Reviewed Literature:
1) Studies proposing DL-based PD diagnostic methods based on gait, speech, facial, and upper limb movement characteristics.
2) Studies subject to peer review [65].
3) Studies written in English.
4) Studies available as full-text.

Exclusion Criteria for Peer-Reviewed Literature:
1) Secondary and tertiary studies (e.g., systematic literature reviews, surveys).
2) Studies in the form of tutorial papers, short papers (≤ 3 pages), poster papers, editorials, and manuals, because they do not provide enough information.
3) Studies that do not involve subjects diagnosed with PD, but healthy volunteers mimicking movements and symptoms caused by the disease.
4) Studies that do not include any information about the DL models that were employed.

To select studies objectively, two researchers actively participated in this phase (V.S. and A.P.). More specifically, by following the method proposed in [66], each potentially relevant study was classified by the researchers as relevant, uncertain, or irrelevant according to the selection criteria above. Studies classified as irrelevant by both raters were immediately excluded, while those marked as relevant were preliminarily included. For the uncertain cases, the selection team held a discussion mediated by one more researcher, the mediator (M.T.). The PRISMA diagram, shown in Fig. 3, depicts the overall paper review process and the studies included, and provides the number of research papers involved at each stage of the pipeline.
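The two-rater screening rule described above can be expressed as a small decision function. The sketch below is illustrative only: the function name and return labels are hypothetical, and it assumes that a study is preliminarily included only when both raters mark it as relevant, with every other combination (including disagreements) sent to the mediated discussion. The text does not state how split ratings were handled, so that part is an assumption.

```python
VALID_RATINGS = {"relevant", "uncertain", "irrelevant"}


def screening_decision(rating_a, rating_b):
    """Combine two independent ratings into a screening outcome.

    Returns "exclude", "include", or "discuss" (mediated discussion).
    Treating disagreements as "discuss" is an assumption of this sketch.
    """
    if rating_a not in VALID_RATINGS or rating_b not in VALID_RATINGS:
        raise ValueError("ratings must be relevant, uncertain, or irrelevant")
    if rating_a == rating_b == "irrelevant":
        return "exclude"      # both raters agree the study is irrelevant
    if rating_a == rating_b == "relevant":
        return "include"      # preliminarily included
    return "discuss"          # uncertain or conflicting ratings


# Example: one rater is unsure, so the study goes to the mediator.
example_outcome = screening_decision("relevant", "uncertain")
```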

C. Data Extraction
At the beginning of this phase, we developed a data extraction form to collect the data retrieved from each primary study, as shown in Table III. To address specific questions about the identified research, we took into account standard information such as the title, authors, type, and publication year of each study. For the research questions, we followed a systematic procedure based on keywording to define the characteristics of each cluster of the data extraction form and to obtain the corresponding data from the studies.
The objective of the keywording was to create an extraction form that was compatible with previous research and took its features into consideration [53]. Specifically, we gathered keywords and topics by reviewing the complete texts of the pilot studies. The gathered keywords and concepts were then clustered in order to organize them according to the selected categories. During the actual extraction process, we also recorded any relevant information that did not fit in the data extraction form. We assessed the collected additional data and, when necessary, modified the data extraction form to better accommodate the data; previously analyzed studies were re-analyzed using the modified data extraction form. This procedure continued until all studies had been analyzed. The final total number of studies considered and evaluated was 87.

V. DATA ANALYSIS AND SYNTHESIS
In this Section we focus on providing answers to the questions framed in Section IV, based on the analysis of the available evidence in the studies reviewed. Each subsection covers one distinct research question.

A. Advanced DL Methods for PD Diagnosis and Severity Assessment Based on the Investigated Modalities
In the next subsections we present the advanced deep learning techniques and frameworks that have been reported in the literature for the detection of PD's intensity and evolution based on the analysis of gait, upper limb motion, speech, and facial expressiveness. With the term "advanced" we refer to recent DL techniques that go beyond the most commonly used methods and are either based on the fusion of DL algorithms or explore cutting-edge techniques such as attention models, autoencoders, and Generative Adversarial Networks (GANs) [67], [68]. The presented literature review identified a number of advanced DL-based models built for PD diagnosis based on the four modalities discussed. The results are presented in Tables IV-V.
1) Analysis of Gait: As described in Section II, the objective and quantitative analysis of the modality of gait could potentially lead to a more automatic diagnosis of PD and its symptoms. Based on the findings of the presented literature review, advanced DL methods have been developed that accept as input gait signals recorded during various walking trials. Notably, the majority of studies aim at the automatic identification of FoG-related events since, until now, the assessment of FoG has required well-trained experts to perform time-consuming annotations via real-time or vision-based observations. The respective studies are presented in Table IV.
Evaluating a patient's stride in order to make a clinical diagnosis is time-consuming and subjective for professionals. Currently, formulating FoG identification as a human action recognition task in video analysis offers a viable answer to these problems. Nevertheless, the majority of existing human action detection algorithms are inadequate for this task, as FoG is extremely subtle and is easily overlooked when it is obscured by irrelevant motion. A novel action detection technique, the convolutional 3D attention network (C3DAN), comprising a Spatial Attention Network (SAN) and a 3-dimensional convolutional network (C3D), was developed to address this issue [69]. The SAN seeks to produce a coarse-to-fine attention area, whereas the C3D extracts discriminative characteristics. The suggested method can pinpoint the attention region without requiring manual annotation and extract discriminative characteristics in an end-to-end manner. The suggested C3DAN approach for quantifying FoG in PD was assessed on a video dataset gathered from 45 PD patients in a clinical environment and achieved an accuracy of 79.30%. Further, acceleration signals comprising 237 FoG occurrences were acquired from the lower back of 10 PD patients during walking trials in [70]. From successive gait cycles, acceleration patterns and spectrograms were generated and used for training the FoG detection model, and the contributions of the various domains to model training were evaluated by comparing the model's performance. Deep convolutional layers in conjunction with recurrent layers (DeepCNN-LSTM) were trained offline and then used to detect FoG or non-FoG events. The authors achieved an average FoG detection accuracy of 94.30% using the acceleration spectrogram as input.
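To illustrate what an acceleration spectrogram of the kind used as network input in [70] looks like computationally, the following sketch computes a magnitude spectrogram with a short-time discrete Fourier transform in plain Python. It is a minimal sketch under stated assumptions: the window length, hop size, Hann window, and sampling rate are illustrative choices and are not the parameters used in [70], which are not given in this text.

```python
import cmath
import math


def spectrogram(signal, window=32, hop=16):
    """Magnitude spectrogram via a windowed short-time DFT.

    Returns a list of frames; each frame holds the magnitudes of
    frequency bins 0..window//2. Window and hop are illustrative values.
    """
    # Hann window reduces spectral leakage at the frame edges.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1))
            for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        segment = [signal[start + n] * hann[n] for n in range(window)]
        magnitudes = []
        for k in range(window // 2 + 1):
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window)
                        for n in range(window))
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Synthetic example: an 8 Hz oscillation sampled at an assumed 64 Hz.
sampling_rate = 64
synthetic = [math.sin(2 * math.pi * 8 * n / sampling_rate) for n in range(128)]
spec = spectrogram(synthetic)
# Bin spacing is sampling_rate / window = 2 Hz, so 8 Hz falls in bin 4.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

The resulting time-by-frequency array is the kind of two-dimensional input that a convolutional front end can process, with a recurrent layer then modeling how the frames evolve over time. A production pipeline would normally use an FFT routine from a numerical library rather than this direct DFT.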
Another research attempt [71] combined CNN and LSTM networks for FoG recognition. In particular, the authors proposed a FoG detection system in which hand-crafted features were fed to a hybrid CNN-LSTM model for additional feature learning and classification. The hand-crafted features with time-frequency representation were extracted from the raw sensor data using a multi-level discrete wavelet transform (DWT). The hybrid deep learning architecture, consisting of a CNN and a bidirectional long short-term memory network (BiLSTM), was then used to extract deep features and classify FoG events with an accuracy of 90.01%. Concerning hybrid neural networks, researchers in [72] outlined a deep learning-based strategy consisting of a hybrid neural network constructed by merging a CNN, LSTM, and DNN. This approach produced a classification accuracy that was 3.90% higher than comparable efforts. Finally, squeeze-and-excitation deep learning was used to address FoG detection with wearable sensors in [73], where an SE-CNN model was proposed. Through this mechanism, each convolutional layer integrated channel-specific information from the squeeze-and-excitation module. The Daphnet dataset was used to assess the suggested model, which obtained 95.66% accuracy and was compared against other methods.
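To make the multi-level DWT feature extraction step more concrete, the following is a minimal sketch of a two-level decomposition followed by per-band energy features. It assumes a Haar wavelet and uses hypothetical accelerometer samples; the wavelet family, number of levels, and features actually used in [71] are not stated here, so these choices are illustrative only.

```python
import math

def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_wavedec(signal, levels):
    """Multi-level decomposition, ordered as [approx_L, detail_L, ..., detail_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def band_energies(coeffs):
    """Energy of each sub-band, a common hand-crafted time-frequency feature."""
    return [sum(c * c for c in band) for band in coeffs]

# Hypothetical accelerometer samples (assumption, not data from any reviewed study).
accel = [0.0, 0.1, 0.4, 0.2, -0.3, -0.5, 0.1, 0.6]
coeffs = haar_wavedec(accel, levels=2)
features = band_energies(coeffs)
```

In a pipeline of the kind described above, a feature vector such as `features` would be computed per window and passed to the downstream network.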
Since FoG events may be identified through the motion patterns of joints, vision-based FoG detection was formulated in [74] as a fine-grained graph sequence modelling challenge by modeling the anatomic joints in each temporal segment with a directed graph. To describe FoG patterns, a novel graph sequence recurrent neural network (GS-RNN) with graph recurrent cells that accept graph sequences of dynamic structures as inputs was presented. Experimental results on more than 150 videos collected from 45 patients indicated that the suggested GS-RNN for FoG identification displayed good performance with an AUC of 0.80. To overcome the lack of data, patient-independent models were employed to identify FoG and demonstrated high sensitivity but low specificity, or vice versa. Authors in [75] created a Deep Gait Anomaly Detector (DGAD) employing a transfer learning-based technique to increase FoG detection accuracy, while examining the influence of data augmentation and extra pre-FoG segments on the prediction rate. Seven patients with Parkinson's disease undertook a variety of everyday walking exercises while wearing inertial measurement units, with the target models accounting for 87.40% of FoG onsets. Additionally, researchers in [76] presented the classification of FoG episodes using Wi-Fi and radar imaging by leveraging multiresolution scalograms formed from the channel state information (CSI) imprint and the micro-Doppler signatures produced by the reflected radar signal. A total of 120 participants took part in a variety of activities, including walking at varying speeds, voluntary stops, sitting and standing, and FoG-inducing exercises. Combining the images received from both sensing approaches, the suggested improved autoencoder was used to identify FoG episodes via a data fusion procedure. Using data fusion, the suggested technique achieved an overall accuracy of 98.00%.
Due to the limited number of patients and characteristics in the most frequently used datasets for deep learning, the bulk of research concentrates on binary classification tasks, particularly the distinction between PD and non-PD subjects and the presence of FoG in PD patients. However, the early detection of varying degrees of disease severity would additionally aid doctors and likely result in a more individualized disease evaluation.
Following this perspective, a novel PD-ResNet structure based on the ResNet unit was introduced to achieve automated recognition of PD severity based on the H&Y scale (early PD: H&Y score ≤ 2.5; moderate to advanced PD: H&Y score > 2.5) [17]. In this study, a polynomial dimension-raising technique was used to increase the dimensionality of the input features, and the synthetic minority oversampling technique (SMOTE) was employed to balance the samples. Further, PD-ResNet, which gathers abundant feature information and effectively addresses the gradient problem, was proposed along with an enhanced focal loss function. Experiments demonstrated that the proposed PD-ResNet with the enhanced focal loss function could identify the H&Y stage efficiently with an accuracy of 92.00%.
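The core idea behind SMOTE-style sample balancing can be illustrated with a short sketch: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name, the feature vectors, and the parameter values below are hypothetical, and this simplified version is not the implementation used in [17].

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Hypothetical 2-D gait feature vectors for the under-represented class (assumption).
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_samples = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the generated data stay within the region already occupied by that class.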
Furthermore, researchers in [77] performed pairwise analysis of gait data and introduced a novel method for assessing the relative severity level of PD patients based on the scores of UPDRS scale (normal, mild, and moderate severity level). In order to achieve this objective, a novel deep learning architecture for pairwise rating of multivariate time-series data acquired by Ground Reaction Force (GRF) sensors worn on the foot was developed. In 10-fold cross validation, the proposed model, called Ranking by Siamese Recurrent Network with Attention, achieved 81.00% with an AUC of 0.88. To realize automated quantitative assessment of gait motor disorder in PD patients using gait videos, authors in [55] proposed a two-stream spatial-temporal attention graph convolutional network (2s-ST-AGCN) under deep supervision and model-driven scheme. In particular, the spatial organization and temporal dynamics of the joints and bones were modeled. Experiments done on a clinical dataset to demonstrate the usefulness of the proposed model for identifying PD severity levels (MDS-UPDRS scores 0-4) yielded satisfactory results with an accuracy of 98.90%.
The review also identified two studies utilizing advanced DL methods for PD recognition. By fusing and aggregating data from several sensors on the lower limbs, a novel hybrid model was suggested in [78] to discover the gait differences among three neurodegenerative disorders (ALS, PD, HD). Using a spatial feature extractor (SFE), representative features of images or signals were generated. A novel correlative memory neural network (CorrMNN) architecture was developed to extract temporal features from the input of the two modalities. The researchers then incorporated a multiswitch discriminator to link the observations with individual state predictions, achieving an accuracy of 99.00%. Finally, authors in [79] created a deep time series-based method for the detection of aberrant walking patterns in the gait dynamics of elderly individuals based on a hybrid LSTM-MLP network. The results demonstrated a testing accuracy of 70.00%.
2) Analysis of Upper Limb Motion: As already mentioned in previous Sections, motor symptoms occurring at upper limbs (fingers, wrists, hands, upper arms) are common manifestations of PD. Table IX in the Appendix summarizes the studies and their findings that employ DL methods for the detection of PD using motion characteristics presented in the upper limbs.
As discussed in Section II, a typical manifestation of PD is tremor occurring at the fingers, wrists and hands [80]. This type of tremor, also known as pathological hand tremor (PHT), compromises manual aiming, motor coordination, and movement dynamics, and it is also considered the main characteristic of essential tremor (ET) [81]. Effective treatment and management of the symptoms depend on the accurate and timely identification of the afflicted individuals, with the PHT features serving as a crucial parameter for differential diagnosis [33]. To this end, the presented review identified two studies that used advanced deep learning to discriminate between ET and Parkinson's tremor (PT). The respective studies are presented in Table V. Researchers in [82] integrated Gated Recurrent Unit (GRU) and LSTM algorithms. Initially, accelerometer sensors were used to capture hand tremors along three axes for each subject. These data were then pre-processed with a standard scaler (scaling to unit variance), passed through the GRU model, and then fed into the LSTM model to enhance its performance. Finally, the authors used a blockchain network to confirm the testing accuracy of the trained model, which was 74.10%. Additionally, in [83] a data-driven NeurDNet model was proposed that analyzes hand kinematics to classify between PD and ET. NeurDNet was trained on more than 90 hours of hand motion signals including 250 tremor evaluations from 81 patients, exceeding its state-of-the-art counterparts with a differential diagnostic accuracy of 95.55%.
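The scaling step applied to the tri-axial accelerometer data before the recurrent layers can be sketched as a per-axis standardization to zero mean and unit variance. The recording values and names below are hypothetical, and the exact pre-processing in [82] may differ.

```python
import statistics

def standardize(axis_samples):
    """Scale one axis to zero mean and unit (population) variance."""
    mean = statistics.fmean(axis_samples)
    std = statistics.pstdev(axis_samples)
    if std == 0:
        # A constant signal carries no variance; map it to zeros.
        return [0.0 for _ in axis_samples]
    return [(v - mean) / std for v in axis_samples]

# Hypothetical tri-axial accelerometer recording of a hand tremor (assumption).
recording = {
    "x": [0.12, 0.30, -0.25, 0.41, -0.38, 0.05],
    "y": [9.70, 9.85, 9.60, 9.92, 9.55, 9.78],
    "z": [-0.50, -0.20, -0.65, -0.10, -0.70, -0.35],
}
scaled = {axis: standardize(values) for axis, values in recording.items()}
```

Standardizing each axis separately removes offsets such as gravity on one axis, so that the network sees signals on a comparable scale.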
Considering the detection of PD, researchers in [84] studied the applicability of CNN and CNN-BLSTM models using time series classification. Raw time series from pen-based signals were used by the CNN-BLSTM, and various data augmentation methodologies were presented in order to train these algorithms for PD detection on large-scale data. In this context, the Multi-Modal Collection (PDMultiMC) was produced [85], which contains recordings of online handwriting, voice signals, and eye movements. The HandPDMultiMC dataset, a subset of PDMultiMC, contains handwriting samples from 42 participants (21 PD and 21 controls). Experimental results on the dataset revealed that CNN-BLSTM models trained with jittering and synthetic data augmentation provided the highest performance for early PD identification (97.62% accuracy).
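Jittering, one of the augmentation strategies mentioned above, is commonly implemented by adding small zero-mean Gaussian noise to each sample of a time series. The sketch below assumes this common form; the noise level `sigma`, the function name, and the pen-pressure values are hypothetical and not taken from [84].

```python
import random

def jitter(series, sigma=0.03, seed=None):
    """Return a copy of the series with zero-mean Gaussian noise added to each sample."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in series]

# Hypothetical normalized pen-pressure signal from a handwriting task (assumption).
pen_pressure = [0.10, 0.35, 0.62, 0.80, 0.77, 0.54, 0.21]

# Several noisy variants of the same recording enlarge the training set.
augmented = [jitter(pen_pressure, sigma=0.03, seed=s) for s in range(3)]
```

The noise level should stay small relative to the signal so that the augmented copies keep the label of the original recording.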
3) Analysis of Speech: As a fundamental biological characteristic of humans, the voiceprint is widely used in medical research and diagnostics, particularly in the identification of Parkinson's disease [89]. Although several symptoms and features signal PD, voice characteristics play a significant part among the predictive factors, as explained in Section II. A person with PD exhibits a variety of vocal impairments, including vocal tremor and impaired speech. Voice analysis has the added advantages of being non-invasive, inexpensive, and easy to perform. The presented literature review identified a number of studies that proposed new models or refined existing ones to classify between PD and healthy subjects. The selected studies can be seen in Table V.
To address the restricted amount of current patient voiceprint datasets and samples, [86] developed a Spectrogram Deep Convolutional Generative Adversarial Network (S-DCGAN) for sample augmentation. S-DCGAN created a high-resolution spectrogram by adding network layers, the Spectral Normalization (SN) approach, and a feature matching strategy. To enhance the samples, spectrograms with high similarity and low distortion were chosen based on the Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Fréchet Inception Distance (FID) and GAN-train results demonstrated the data's potential to generalize. Moreover, the authors built the ResNet50 model with a Global Average Pooling (GAP) layer to efficiently collect and categorize voiceprint data in order to increase recognition accuracy. Finally, comparison tests on various models and classification techniques were performed with the results indicating that the S-DCGAN-ResNet50 hybrid model achieved the maximum voiceprint recognition accuracy of 91.25% and sensitivity of 92.5%, allowing it to discriminate between PD patients and healthy individuals more precisely than the simpler DCGAN-ResNet50.
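As an illustration of the quality-based selection of generated spectrograms, the sketch below computes the PSNR between a reference and a generated image and keeps only candidates above a threshold. The standard definition PSNR = 10 log10(MAX^2 / MSE) is used; the tiny 2x2 "spectrograms", the threshold, and the helper names are hypothetical, and the SSIM criterion also used in [86] is not reproduced here.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images."""
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if len(ref) != len(gen):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def select_by_psnr(reference, candidates, threshold_db):
    """Keep generated images whose PSNR against the reference meets the threshold."""
    return [c for c in candidates if psnr(reference, c) >= threshold_db]

# Hypothetical 2x2 grayscale "spectrograms" (assumption, for illustration only).
real = [[0, 255], [255, 0]]
close = [[0, 255], [255, 10]]
far = [[255, 0], [0, 255]]
kept = select_by_psnr(real, [close, far], threshold_db=30.0)
```

A higher PSNR indicates lower distortion relative to the reference, which is why a lower bound on PSNR acts as a simple filter on generated samples.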
Finally, in [87], features related to shimmer, jitter, harmonic parameters, frequency parameters, Detrended Fluctuation Analysis (DFA), Recurrence Period Density Entropy (RPDE), and Pitch Period Entropy (PPE) were employed, and Conv-XGB was selected to differentiate PD patients from healthy subjects. In [88], a spectrogram and a scalogram of voice signals were used as input to a stacked autoencoder deep network. Support vector machine (SVM) and softmax classifiers were used to evaluate the extracted features, and the stacked autoencoder-based time-frequency features with the softmax classifier achieved the highest accuracy (87.00%).

4) Analysis of Facial Expressions:
Reduced facial expressions are among the prevalent symptoms of Parkinson's disease. In most cases, medical professionals identify Parkinson's disease in patients through intrusive, costly, and arduous medical testing as well as careful monitoring over time. Thus, it is vital to design an alternative, cost-effective, and lasting approach that can aid the physician in analyzing the overall behavior of PD patients [90]. However, based on the results of this review, only a few studies have used the modality of facial expressions to diagnose PD-related characteristics by integrating deep learning algorithms, as can be observed in Table XI in the Appendix. These studies follow more traditional DL architectures, with typical CNNs dominating over other models such as LSTMs for the differential diagnosis of PD, the symptom of hypomimia, and emotional expressiveness.

B. DL for Non-Conventional PD-Related Diagnostic Targets
In this Section we focus on studies that do not concern common PD-related diagnostic targets as identified in Section II as well as our conducted literature review.
1) Vascular Parkinsonism: Vascular Parkinsonism (VaP) is often characterized by lower body parkinsonism with early and rapid impairment of gait and/or postural control and less tremor, in contrast to Idiopathic Parkinsonism (IPD). Patients with VaP often present a characteristic shuffling gait, but may also exhibit significant FoG, even in early stages of the disease [91]. Two studies evaluated the effectiveness of machine learning strategies in distinguishing IPD and VaP gait and are presented in Tables VII -I. In [92], two supervised machine learning techniques, Multilayer Perceptrons (MLPs) and Deep Belief Networks (DBNs), were used in a comparative classification study. The decision space consisted of the gait characteristics, with or without a neuropsychological test score (Montreal Cognitive Assessment, MoCA), that were ranked highest in an error incremental analysis. For the classification task of identifying parkinsonian gait by discriminating between patients (IPD+VaP) and healthy controls, both algorithms achieved excellent accuracy (93.00% with or without MoCA). In the classification of the two patient groups (VaP and IPD), the DBN classifier performed better (73.00% with MoCA) than the MLP classifier. In 2021, a new approach for gait pattern differentiation was proposed that used CNNs on gait time series with and without the influence of levodopa medication [93]. The gait data of VaP patients, IPD patients, and healthy people were collected using sensors worn on both feet. Recursive feature elimination with a linear support vector machine, lasso, and random forest was used to determine the feature subset that led to the best outcomes. CNNs were implemented with multiple hyperparameter settings and feature subsets. The best CNN classifier obtained an accuracy of 86.00% when the impact of levodopa medication (OFF/ON state) was considered simultaneously. Taking the response to levodopa into account improved classification performance.
2) Multiple FoG Events: The combination of freezing of gait prediction and rhythmic laser signals may assist PD patients in overcoming FoG episodes. Two studies sought to use the impaired gait patterns preceding FoG to develop DL-based models for multi-class FoG prediction [94], [95]. For the accurate classification of the gait prior to FoG (pre-FoG), the slope of the impaired gait pattern was used to define the individualized pre-FoG phase. On the basis of the pre-FoG phase and the relabeled gait data, customized FoG prediction LSTM and CNN models were constructed, yielding positive performance results. In another study [96], machine learning methods were used to identify the freezing of gait event prior to its onset, thereby producing walking, FoG, and gait transition classes. Using the Boruta technique for feature selection, the DMLP-based model was applied to 5-second windows with an average accuracy of 78.00%.
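Windowed multi-class labelling of the kind described above can be sketched as follows: a continuous recording is cut into fixed-length windows, and each window receives the majority label of its samples. The 5-second window length comes from the text; the 50 Hz sampling rate, the 50% overlap, the majority-vote rule, and the label sequence are assumptions made for illustration.

```python
from collections import Counter

def window_bounds(n_samples, sampling_rate_hz, window_s=5.0, overlap=0.5):
    """Return (start, end) sample indices of fixed-length, overlapping windows."""
    size = int(round(window_s * sampling_rate_hz))
    step = max(1, int(round(size * (1.0 - overlap))))
    return [(start, start + size) for start in range(0, n_samples - size + 1, step)]

def majority_label(labels):
    """Most frequent label inside one window."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical per-sample annotations for a 20 s recording at 50 Hz (assumption).
fs = 50
labels = ["walk"] * 400 + ["transition"] * 200 + ["fog"] * 400

bounds = window_bounds(len(labels), fs)
window_labels = [majority_label(labels[a:b]) for a, b in bounds]
```

Each window, together with its label, would then form one training example for a classifier that separates walking, gait transition, and FoG.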
3) Speech Dysarthria Recognition: In the earliest stages of PD, 90% of patients experience voice abnormalities, specifically hypokinetic dysarthria [97]. Researchers in [39] examined the speech-based classification of amyotrophic lateral sclerosis (ALS) patients, Parkinson's disease patients, and healthy controls (HC). To this end, a spectrogram-based method was used with a 2D-CNN. By feeding overlapping windows to the CNN, temporal elements were taken into account through the use of short signal segments or broad analysis filters. Dysarthria was classified into three classes (ALS, PD, or HC). In addition, the authors conducted a three-class classification experiment for PD severity. Both baseline Mel frequency cepstral coefficients (MFCCs) and log Mel spectrograms were used in the experiments. For a variety of audio durations, classification results indicate that models trained on log Mel spectrograms consistently outperform those trained on MFCCs, achieving an accuracy of 93.00%.
Additionally, a multitask learning scheme was developed in [41] to evaluate the severity of several speech deficits in PD patients. A CNN-based deep learning strategy for multitask learning was considered. Time-frequency representations of the transitions between voiced and unvoiced segments served as input to the CNNs. The evaluated tasks corresponded to subscores of a comprehensive scale developed to assess the patients' dysarthria impairments. Multitask learning enhanced the generalization of the CNN, resulting in more representative feature maps for assessing the speech symptoms of PD patients. The results suggested that training a CNN in a multitask learning scheme is preferable to training individual CNNs for each speech deficit of PD patients. The aforementioned studies are presented in Table X in the Appendix.
4) Alterations in Facial Expressions - Hypomimia: In clinical practice, the evaluation of hypomimia symptoms remains subjective or is confined to the identification of a few landmarks that inadequately capture the disease's subtle manifestations [59]. A recent study [98] presented a novel digital biomarker, a spatio-temporal convolutional representation that learns facial movement patterns to differentiate between PD patients and control subjects, as seen in Table XI in the Appendix. The suggested architecture constructs a representation using 3D convolutional layers combined in inception modules, yielding salient facial expression activation maps. The method was validated in a retrospective investigation including 16 PD patients and 16 controls. Using 480 video sequences, the architecture achieved an average accuracy of 91.87% in a disease classification task.
Further, researchers in [99] tested whether contemporary computer vision techniques may be used to detect masked facial expression (hypomimia) and measure medication states in PD. To identify PD hypomimia signals, a CNN model was trained using images extracted from videos of PD patients and controls. This trained model was applied to clinical interviews with 35 PD patients in their on-drug and off-drug motor states. Overall, the algorithm detected PD hypomimia with a test set AUC of 0.71, compared to 0.75 for expert neurologists using the UPDRS-III Facial Expression score. In addition, the classification accuracy of the model for on and off drug states in clinical samples was 63.00%, compared to 46.00% when using clinical rater scores.
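Since several of the studies above report an AUC, a short sketch of how this metric can be computed may be helpful. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical model outputs: 1 = hypomimia present, 0 = control (assumption).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts values such as 0.71 and 0.75 in context.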

C. Fusion of Multi-Modal Information
Although preclinical Parkinson's disease detection has been investigated, a practical, cost-effective, and comprehensive screening diagnostic has not yet been developed [100]. Due to the high heterogeneity and complexity in the progression of PD, as well as the difficulties in collecting a single time-point measurement of a single sign, it would be almost impossible to fulfill the aim of precise treatment and severity evaluation without incorporating a combination of bio-signals from different modalities. In this work, we identified a number of studies that followed a multi-modal approach towards PD diagnosis and are depicted in Table VI.
Researchers in [63] proposed a novel classification method for PD patients and healthy controls using Bidirectional Long Short-Term Memory (BLSTM) networks. SensHand and SensFoot inertial wearable sensors for upper and lower limb motion analysis were used to capture motion data for thirteen tasks drawn from the MDS-UPDRS Part III. The extracted spatiotemporal and frequency features were combined for each participant into a single input for the development of a recurrent BLSTM to distinguish between the two groups. The maximum accuracy achieved was 82.40%, and the findings demonstrated that the selected features contributed greatly to capturing the long-term pattern in the BLSTM for the evaluation of PD, and that an increase in batch size could affect the accuracy of the training and testing models. Furthermore, a DMLP classifier for mobile phone-based behavior analysis was presented in [105] to evaluate the course of PD by assessing patients' speech and movement patterns, as monitored by a smartphone accelerometer in their pockets at various times of the day. Popular machine learning classification algorithms were applied to a dataset from UCI and a dataset collected by the authors in order to classify each patient as Parkinson's positive or negative. In addition, the performance of each approach was evaluated based on its ability to correctly assign patients to one of these groups, showing that the DMLP outperformed the rest of the models on both datasets.
Research in [102] offered a multimodal investigation of the motor skills of PD patients, using deep learning architectures based on time-frequency representations and CNNs that integrate information from speech, handwriting, and gait signals. The proposed method modeled the difficulty of patients in initiating and terminating movement of their lower and upper limbs and vocal muscles while carrying out the tasks outlined in Table VI. The feature maps learned by the CNN trained with multimodal input allowed for the interpretation of the neural network's hidden representations. The initial convolutional layers of the CNN trained using time-frequency representations of speech revealed statistically significant differences between PD patients and controls. With the final handwriting-trained CNN layer, comparable results were obtained. The combination of the three bio-signals was the most reliable method for classifying PD patients according to their disease stage. CNNs seemed well suited to modeling the difficulties of PD patients in initiating and terminating the movement of separate limbs, allowing for the correct classification of PD patients and control participants.
TABLE VI: SUMMARY OF SELECTED ARTICLES FROM THE LITERATURE ON PD AND ITS SYMPTOMS DIAGNOSIS BASED ON MULTI-MODAL DATA
Combining dataset deconstruction with multi-source ensemble learning enabled participants with incomplete data to be included in the training of machine learning models, as demonstrated in [103]. Using multi-source ensemble learning in conjunction with CNNs that capitalize on the quantity of available data, the researchers achieved an accuracy of 82.00% in PD classification, 9.00% higher than conventional strategies. The rise in accuracy was attributed in part to the use of CNNs with a DNN and in part to the development of models employing a large cohort of participants. Finally, [104] presented a novel deep learning-based time-series method for Parkinson's Disease prediction using remotely and irregularly acquired speech, hand motion, and gait data from smartphones. Using Neural Ordinary Differential Equations, the researchers synchronized discrete data to unified observational time points in order to generate multimodal time-series representations. Two proposed attention mechanisms, one over the temporal dimension and one over the modality dimension, adaptively extracted key features from noisy signals. The effectiveness of the suggested method was supported by improved quantitative and qualitative results on a large public dataset.
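The general principle of attention over time steps (or over modalities) can be sketched as softmax-weighted pooling: each feature vector receives a relevance score, the scores are normalized with a softmax, and the vectors are averaged with those weights. In a trained network the scores are produced by learned layers; here they are supplied by hand, and all names and values are hypothetical rather than taken from [104].

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Weighted average of feature vectors using softmax-normalized scores."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights

# Hypothetical feature vectors at three time points (assumption).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical relevance scores; the second time point is judged most informative.
pooled, weights = attention_pool(features, scores=[0.1, 2.0, 0.3])
```

With equal scores the operation reduces to a plain average, so attention can be read as a data-dependent generalization of mean pooling that down-weights noisy inputs.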

D. Sensory Equipment for Detection of PD-Related Clinical Signs
1) Sensors Capturing PD Gait: The analysis of gait, and specifically the analysis of the various events taking place during the stance phase and swing phase, is essential for the treatment of a variety of orthopaedic and neurological disorders [106]. The detection of typical gait events is a helpful technique for improving gait analysis, evaluating therapies for abnormal gait, and developing devices and sensors for gait support [22], [107], [108]. Therefore, gait analysis requires the quantified study of the parameters related to force, time and distance, by calculating a range of important temporal and spatial characteristics [109].
To analyze key features and movement patterns in normal and pathological gait, a variety of methods for computer-assisted analysis have been developed and are in clinical use. Human gait characteristics are detected by fixed devices such as optical motion capture systems, floor-based force platforms, and electronic treadmills, which are considered "gold standards" [110]. However, these systems are only suitable for hospitals or clinical facilities due to their size, high cost, and the need for specialized personnel.
Video-based motion capture systems and instrumented motion analysis systems have been well studied for obtaining gait characteristics. From the total forty-four (46) studies, 20% utilized commercial off-the-shelf (COTS) camera equipment to capture the body poses and motion of the subjects, providing the researchers with precise lower limb markers for gait analysis. However, the value of camera-based setups has been mostly limited to a laboratory context, as they are expensive and require specialized movement laboratories [111].
To alleviate the limitations of camera-based systems, wearable motion sensing devices and systems have been developed for accurate gait analysis in daily life settings. In particular, recent technological advancements in microelectromechanical systems (MEMS) have enabled the design and development of lightweight and low-cost wearable sensors based on inertial measurement units (IMUs) that are used for gait and balance monitoring [112], [113]. The authors in [114] report results that indicate the superiority of IMUs over camera-based setups regarding reliability and precision for the detection of PD based on gait-related features. It is well established that mobile inertial measurement equipment, i.e., accelerometers, gyroscopes and magnetometers, is capable of objectively tracking gait motion [115], [116]. Based on the results of our literature review, almost half of the selected studies (45%) capitalized on the accuracy and reliability offered by IMUs and used them to analyse gait and extract gait-related characteristics. More specifically, these studies used data obtained from IMUs placed primarily on the trunk (waist, lower back, chest), ankles and thighs, at rates of 33%, 31% and 20%, respectively. Other sites where IMUs were placed were the wrists, calves and feet. Fig. 4 presents the placement and the percentage of the studies that employ IMU sensors on different lower extremity sites.
Nevertheless, due to the well-known defects associated with inertial sensor measurement data, such as time-variant sensor biases and measurement noise [117], there is an increasing need to exploit other types of sensors integrated in wearable devices. One such type of sensors are insole-based plantar pressure monitoring systems, that have risen to prominence as a key wearable tool for monitoring the course of chronic disease in patients. Internal plantar pressure measurement insoles and foot switches are the cutting-edge technology for gait phase analysis, since each phase can be associated with a specific value of the sensor output. Insoles offer better resolution compared to foot switches, as they allow the recording of the whole foot contact with the ground, which is not affected by the positioning of the foot switch [107], [118]. In addition, an important advantage is the ease of use as minimal intrusion is achieved.
Consequently, insole-based plantar pressure monitoring systems are expanding rapidly around the globe, with several research institutes and businesses demonstrating an increased interest in the field [119]. In gait analysis, they are frequently used to count steps and extract spatiotemporal information. They are regularly used in stability investigations to identify the center of pressure and, consequently, postural stability [120]. Although plantar pressure insoles have proven to be reliable for the analysis of gait and walking patterns, only 30% of the total number of selected studies have utilized insole systems, possibly due to their higher cost concerning hardware equipment but also software licences. Interestingly, none of the studies combined two or three of the aforementioned technologies (cameras, IMUs, insole pressure sensors). Finally, in 5% of the selected studies the sensory equipment was not specified.
Typically, while normal gait steps have a main frequency from 0.5 to 3 Hz, FoG events present a main frequency in the range of 6 to 8 Hz [121]. It is therefore important to note that the maximum and minimum sampling frequencies among all IMUs and pressure sensors used in the selected studies were 1080 Hz and 30 Hz, respectively, whereas the maximum and minimum frame rates among the studies that employed camera-based equipment were 240 and 25 fps, respectively.

2) Sensors Capturing Upper Limb PD Motion and Tremor: Considering the importance of studying the motion of upper limbs for the detection of PD and its evolution, the majority of the studies selected in this review used IMUs for PD detection, estimation of PD severity or the identification of Parkinson's tremor (PT). Specifically, in 44% of the studies the IMUs were placed on the subjects' wrists, in 25% they were placed on the subjects' fingers and the metacarpal area, and in 6% of the studies they were placed on the area, as shown in Fig. 5. The rest of the studies did not clarify the exact point of placement of the IMUs. Furthermore, in addition to using IMUs, the hand position during clinical evaluation tasks was captured with a 3-camera setup in [122]. Additionally, a number of studies used digitizing tablets to locate the coordinates of the pen during handwriting tasks and smartphones placed on a desk to analyze finger tapping activity.
Finally, the characteristics of Parkinson's disease tremors have been thoroughly investigated. It has been reported that the frequencies of the classical rest tremor, isolated postural tremor, and kinetic tremor during slow movement are 3-7 Hz, 4-9 Hz, and 7-12 Hz, respectively [123]. The sensory equipment used in the relevant reviewed studies obtained motion information from the upper limbs with sampling rates ranging from 40 Hz to 12 kHz, which is sufficient to satisfy the Nyquist criterion.
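The Nyquist argument can be checked with a few lines of arithmetic: a signal is sampled without aliasing when the sampling rate exceeds twice its highest frequency component. The sketch below uses the tremor bands and the lowest reported sampling rate quoted in the text; the function and variable names are illustrative.

```python
def satisfies_nyquist(sampling_rate_hz, max_signal_hz):
    """True when the sampling rate exceeds twice the highest signal frequency."""
    return sampling_rate_hz > 2.0 * max_signal_hz

# Tremor frequency bands in Hz as quoted in the text.
tremor_bands_hz = {"rest": (3, 7), "postural": (4, 9), "kinetic": (7, 12)}
lowest_reported_rate_hz = 40.0

highest_tremor_hz = max(high for _, high in tremor_bands_hz.values())
minimum_required_rate_hz = 2.0 * highest_tremor_hz
adequate = satisfies_nyquist(lowest_reported_rate_hz, highest_tremor_hz)
```

Even the slowest reported device (40 Hz) exceeds the 24 Hz required for the fastest tremor band, although in practice a larger margin is preferred to accommodate anti-aliasing filters and harmonics.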
3) Sensors Capturing PD Speech: As speech is considered of major importance for the early detection of PD, the way the subjects' voice is recorded is a crucial issue, in addition to the appropriate selection of the tasks to be performed. As PD patients suffer from a variety of neurological symptoms, the recording procedure should be as convenient as possible. The data are usually recorded at 44.1 kHz or 48 kHz [124], [125]. The recording devices include cell phones, which make the recording procedure available in any environment [39]. In other studies, systems consisting of an external USB sound card with a low noise level combined with clip-on microphones were used [42]. Furthermore, the understanding of neurological disorders through speech requires recordings not only of good quality but also of sufficient duration for accurate analysis, though usually not very long. Finally, for the creation of a well-defined dataset, the age of the subjects and a balanced number of male and female subjects play an important role [40], [41].
4) Sensors Capturing PD Facial Expressions: As noted in the analysis of facial expressions above, PD is currently identified largely through intrusive, costly, and arduous medical testing and careful monitoring over time, which motivates alternative, cost-effective approaches [90]. In the studies reviewed, researchers recorded the facial expressions of participants using high-resolution digital cameras in order to identify relevant biomarkers and patterns that link hypomimia with PD.

VI. DISCUSSION
The primary objective of this survey is to examine DL-based PD diagnosis techniques and to project future research directions in the field. This review provides a comprehensive overview of advanced deep-learning-based approaches for predicting PD manifestations using data from different modalities, in particular gait, speech, upper limb movement, and facial movement. We have analysed the importance of collecting physiological signals from these different modalities to diagnose Parkinson's disease.
Based on the results presented in this literature review, almost one third (30%) of the investigated studies used CNNs to reach their diagnostic targets, as depicted in Fig. 6. CNNs were also employed in combination with other DL algorithms such as LSTMs or attention models. LSTM networks were the second most widely used models, while a vast majority of the studies made use of simpler neural network models such as deep multilayer perceptrons. Additionally, 11% of the selected studies followed the more classical deep neural network concept to achieve their diagnostic targets. Most articles considered accuracy, specificity, and sensitivity as metrics to validate the performance of the developed PD classification models.
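For reference, the three metrics named above follow directly from the counts of a binary confusion matrix. The sketch below assumes PD is the positive class; the class name, field names, and example counts are hypothetical and are not taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # PD subjects predicted as PD
    fn: int  # PD subjects predicted as non-PD
    tn: int  # non-PD subjects predicted as non-PD
    fp: int  # non-PD subjects predicted as PD


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of PD subjects correctly identified (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-PD subjects correctly identified (true negative rate)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts for illustration only: 50 PD and 50 non-PD subjects.
example = ConfusionCounts(tp=45, fn=5, tn=40, fp=10)
print(accuracy(example), sensitivity(example), specificity(example))
```

With these illustrative counts, accuracy is 85/100, sensitivity is 45/50, and specificity is 40/50, which shows how a single accuracy figure can hide an asymmetry between the two error types.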
Fig. 7 comprises three diagrams, which differ according to the number of participants in each study. Each graph shows the accuracy of the applied model in relation to the five main classification targets identified in the literature review, i.e., FoG detection, PT detection, PD detection, PD severity, and PT severity. The highest success rates were observed in studies that analysed gait and reached accuracy rates of over 98.00%, highlighting the efficiency of this modality in PD identification. These studies focused on the identification of PD and FoG events, but also on discriminating the level of severity of the disease. In fact, two of these studies involved more than 100 volunteers, which may have encouraged researchers to explore the potential of more complex DL models such as autoencoders and the two-stream spatial-temporal attention graph convolutional network (2s-ST-AGCN). Interestingly, despite the widespread use of CNNs in the selected studies, only one of the studies with the highest accuracy scores used a CNN, and then only in combination with an RNN, for FoG detection.
For the studies that detected disease characteristics from upper extremity movement data, Fig. 7 shows that research has focused either on the recognition of the disease or on the identification of PT and the classification of its severity. The disease was recognized at a rate of 97.00% by the hybrid CNN-LSTM model and by a DMLP neural network. The presence of PT was identified at the same rate by the CNN algorithm. Interestingly, the severity levels of tremor were classified at rates between 83.00% and 91.00%, which are among the lowest success rates of the upper limb studies investigated.
The majority of the research that used the speech modality aimed at identifying or differentially diagnosing PD. According to the results of this literature review, the highest recognition rate, 96.00%, was achieved by the NNge model. The classification of disease severity levels was similarly successful. More specifically, the levels of dysarthria were classified at a rate of 93.00% by the CNN model. Finally, novel network architectures such as ResNet, Conv XGB, and SADN achieved disease recognition rates above 87.00%.
The difficulty in altering facial expressions and the phenomenon of hypomimia result from the muscle stiffness caused by Parkinson's disease. The CNN model was able to distinguish between PD and non-PD patients at a rate of 92.00%, while in another study with the same diagnostic target the LSTM model achieved a rate of 76.00%.
Without a combination of biosignals from various modalities, it would be nearly impossible to achieve the goal of precise treatment and severity evaluation, as previously stated. As mentioned, the highest accuracy rates in the single-modality studies were related to the binary diagnosis of the disease. In accordance with this, the highest accuracy rate in the selected multi-modality studies (82.00%) was found when classifying between PD and non-PD subjects. Notably, the two models that achieved this rate were the recurrent BLSTM model and the DL architecture combining DNN and CNN models. In fact, we must highlight the importance of a large database, as the highest rate of disease classification was found in two studies with over 100 participants, while the lowest (56.00%) was found in a study with 84 subjects.
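One simple way to combine modalities is decision-level (late) fusion, in which each modality-specific model outputs a PD probability and the outputs are averaged. The sketch below is a generic illustration under that assumption; it is not the fusion scheme of any particular reviewed study, and the function name, modality keys, weights, and scores are hypothetical.

```python
from typing import Mapping, Optional


def late_fusion(
    probabilities: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted average of per-modality PD probabilities (equal weights by default)."""
    if not probabilities:
        raise ValueError("at least one modality is required")
    if weights is None:
        weights = {modality: 1.0 for modality in probabilities}
    total_weight = sum(weights[modality] for modality in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[m] * p for m, p in probabilities.items())
    return weighted_sum / total_weight


# Hypothetical per-modality outputs for one subject (illustration only).
scores = {"gait": 0.80, "speech": 0.60, "face": 0.40}

fused_equal = late_fusion(scores)
fused_weighted = late_fusion(scores, {"gait": 2.0, "speech": 1.0, "face": 1.0})
predicted_pd = fused_equal >= 0.5  # assumed decision threshold
```

Weighting a modality more heavily, as in the second call, is one way to reflect differing reliability across modalities; feature-level fusion inside a single network is an alternative design that this sketch does not cover.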

A. Challenges and Limitations
Despite the significant contributions made through a comprehensive synthesis of the most pertinent information on deep learning methods for clinical diagnosis, this systematic review has some limitations. It is acknowledged that collecting actual patient data is the most difficult task in the healthcare sector compared to other research fields. In most instances, neurodegenerative disease-related medical datasets are imbalanced. Specifically, deep learning studies for each investigated modality (gait, upper limb motion, speech, facial expressiveness) may utilize either public or private datasets to train their models. Therefore, comparing the performance of two deep learning models that were not trained with the same dataset could be quite challenging.
Furthermore, due to the significant variability of the studies with respect to the data used and the presentation of results, it was difficult to directly compare the outcomes associated with each type of model across studies. Some studies failed to indicate whether model performance was evaluated using a test set, and some reported results from models that did not yield the best per-study performance. In addition, it is our view that there is a lack of research on facial movement and expressiveness, as well as on analyses involving multiple modalities. This makes it difficult to identify the model that performs better in these two categories. Lastly, the vast number of deep learning models proposed for gait analysis makes it difficult to identify the most effective one.
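The class imbalance noted above is one reason why accuracy figures from different datasets are hard to compare. The sketch below, which assumes a hypothetical cohort of 90 PD subjects and 10 controls, contrasts plain accuracy with balanced accuracy (the mean of per-class recall) for a trivial classifier that labels everyone as PD; the function names and data are illustrative only.

```python
from typing import Sequence


def plain_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall, so each class counts equally regardless of size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in indices) / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced cohort: 90 PD subjects (1) and 10 controls (0).
y_true = [1] * 90 + [0] * 10
# Trivial classifier that predicts PD for every subject.
y_pred = [1] * 100

print(plain_accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

In this example the trivial classifier reaches 90% plain accuracy but only 50% balanced accuracy, so a high accuracy on an imbalanced dataset does not by itself indicate a useful model.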

VII. CONCLUSION
Parkinson's disease requires early diagnosis and treatment to minimize its impact and preserve patients' independence. Early and accurate clinical diagnosis of PD is especially crucial in the context of emerging neuroprotective treatments. While prodromal diagnosis primarily relies on non-motor symptoms, our systematic review proposes a multi-modal deep learning approach to enhance early clinical diagnosis based on motor signs. By integrating data from various sources, DL-based solutions aim to improve the accuracy of clinical diagnosis.
In this study, we reviewed 87 studies on deep learning for four modalities (gait, upper limb motion, speech, and facial expressions) and their fusion. To diagnose Parkinson's disease and improve model accuracy, numerous studies have focused on gait, upper limb movement, and speech signals using both conventional and advanced techniques. However, there may be a need to investigate other diagnostic modalities that can be obtained outside of clinical settings, such as facial muscle movement signals. Additionally, signal fusion from multiple modalities is required to achieve more complex diagnostic goals, such as assessing the disease's severity.
Taking this into consideration, identifying the disease in its early stages is even more difficult, since the heterogeneous appearance and progression of the symptoms result in a complicated clinical picture of PD. Therefore, clinical research is also directed at deep phenotyping of the disease, an extensive analysis of the disease's distinct components that exceeds the scope of standard medical records. Deep learning models should be enriched to achieve high accuracy in the diagnosis of Parkinson's disease. Finally, we believe that metrics other than specificity and sensitivity could be introduced to give experts better support in diagnosing PD. These recommendations may help address the challenges in enhancing the accuracy of Parkinson's disease classification.

APPENDIX
See Tables VII-XI.