Employing Classification Techniques on SmartSpeech Biometric Data towards Identification of Neurodevelopmental Disorders

Abstract: Early detection and evaluation of children at risk of neurodevelopmental disorders and/or communication deficits is critical. While the current literature indicates a high prevalence of neurodevelopmental disorders, many children remain undiagnosed, resulting in missed opportunities for interventions that would have had a greater impact if administered earlier. Clinicians face a variety of complications when evaluating neurodevelopmental disorders and need to make greater use of digital tools to support efficient early detection. Artificial intelligence offers new approaches to decision-making, classification, and diagnosis. The current research investigates the efficacy of various machine learning approaches on the biometric SmartSpeech datasets. These datasets come from a new system that includes a serious game which gathers children's responses to specifically designed speech and language activities and their manifestations, intending to assist during the clinical evaluation of neurodevelopmental disorders. The machine learning approaches employed were the Radial Basis Function Neural Network, Deep Learning Neural Networks, and a variation of Grammatical Evolution (GenClass). The most significant results show improved accuracy (%) when using the eye tracking dataset; more specifically: (i) for the class Disorder with GenClass (92.83%), (ii) for the class Autism Spectrum Disorders with the four-layer Deep Learning Neural Network (86.33%), (iii) for the class Attention Deficit Hyperactivity Disorder with the four-layer Deep Learning Neural Network (87.44%), (iv) for the class Intellectual Disability with GenClass (86.93%), (v) for the class Specific Learning Disorder with GenClass (88.88%), and (vi) for the class Communication Disorders with GenClass (88.70%). Overall, the results indicated GenClass to be among the top competitors, opening up additional avenues for future studies toward automatically classifying and assisting clinical assessments for children with neurodevelopmental disorders.


Introduction
Neurodevelopmental disorders (NDs) are a group of disorders that typically appear in childhood and are characterized by impairments in neurological development that affect multiple aspects of communication, learning, social, behavior, cognitive, and emotional ability to function [1][2][3][4][5][6]. NDs include Autism Spectrum Disorders (ASD), Attention Deficit Hyperactivity Disorder (ADHD), Intellectual Disability (ID), Specific Learning Disorder (SLD), and Communication Disorders (CD) [1]. DSM 5 [1] defines these disorders' profiles with certain characteristics [1][2][3][4][5][6]: (i) ASD exhibits persistent difficulties with social digital technologies, letting novel insights into their visual and cognitive processing [18]. Eye tracking is a method for identifying diagnostic biomarkers with evidence in children with ASD [19][20][21], ADHD [22][23][24][25], ID [26][27][28], SLD [29,30], and CD [24,27]. The role of the autonomic nervous system has earned consideration for many types of neurophysiological features of NDs, such as ASD [31][32][33][34]. There are many characteristics that can be studied by taking heart rate measurements, of which a very common one is heart rate variability signal (HRV), since it has been found to be directly related to health [35], mental stress [36], cognitive functions [37], and psychosomatic state [38]. Autonomic dysregulation is a biomarker for ASD and ADHD. Specifically, assessment using HRV can distinguish sensory reactivity in ASD children from that found in typically developed children [31,39]. Furthermore, ADHD can be assessed using HRV to distinguish measurements regarding sustained attention and emotional and behavioral regulation deficits seen in ADHD, and it may help to define the pathophysiology of the disorder [40,41].
Machine learning (ML) is a subset of AI and a rapidly evolving field of study that aims to establish high-quality prediction models using search strategies, deep learning, and computational analysis to enable machines to learn to make autonomous decisions and improve their performance at specific tasks [42]. There are several uses for ML in health and healthcare [12][43][44][45][46][47][48]. The way we approach disease/disorder screening, diagnosis, and treatment may change as a result; for example, ML algorithms can examine patient data to spot trends and forecast the course of diseases/disorders. Supervised ML for classification is a type of machine learning where a model is trained to predict a categorical output variable. Metrics such as accuracy, error rate, precision, and recall can be used to evaluate a classification model's performance [39,49]. A good classification model should have high accuracy, precision, and recall, but the optimal values may depend on the specific problem being addressed. For instance, type 2 diabetes and its complications have been detected early from electronically collected data using ML and deep learning techniques [50,51]. Further, towards individualized treatment plans, ML algorithms can examine patient data, including genetic data and medical history, improving treatment results [52,53]. Wearable technology and sensor data can be analyzed by ML algorithms to track patient health and spot early disease symptoms [54,55].
In relation to this, a soft computing approach of predictive fuzzy cognitive maps has been employed successfully to represent human reasoning and to derive conclusions and decisions in a human-like way for a Medical Decision Support System [48]. This system was intended for medical education, employing a scenario-based learning approach to safely explore extensive "what-if" scenarios in case studies and prepare for dealing with critical adversity [48]. Additionally, a sub-band morphological operation method has been used successfully to detect cerebral aneurysms [56], and convolutional neural networks have been employed for the classification of leukocyte categories and leukemia prediction [57]. Furthermore, wearable electroencephalogram (EEG) recorders and Brain Computer Interface software have been proposed to aid in the assessment of alcohol-related brain waves [58]. More specifically, calculated spectral and statistical properties were used for classification, and Grammatical Evolution was applied. The suggested approach reported high accuracy (89.95%), and thus it was considered suited for direct evaluation of drivers' mental state for road safety and accident avoidance in a future in-vehicle smart system. Further, for hemiplegia type classification among patients and healthy individuals, an automatic feature selection and construction method based on grammatical evolution (GE) for radial basis function (RBF) networks was presented [59]. Using an accelerometer sensor dataset, this approach was tested with four different classification techniques: an RBF network, a multi-layer perceptron (MLP) trained using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) training algorithm, a support vector machine (SVM), and GenClass, a GE-based parallel tool for data classification. The test results showed that the suggested solution had the best classification accuracy (90.07%) [59].
Various approaches of neural networks and deep neural networks have been used for classification of speech quality and voice disorders with very promising results [43][44][45][46][47][60][61].
New prospects are presented to assist clinical decision-making through the use of AI algorithms and automated instruments for measuring, decision-making, and classification in communication deficiencies and NDs in the research setting [11][12][13][14][15][62]. Traditional ML approaches use separate feature extraction procedures and classification methods, but with Deep Learning these two procedures are performed jointly [42]. For ASD diagnosis in young children from 5 to 10 years old, an intelligent model has been presented based on resting-state functional magnetic resonance imaging data from the global Autism Brain Imaging Data Exchange I and II datasets and using convolutional neural networks (CNNs) [63]. The best results were obtained with the Adamax optimization technique. A review of ML research for MRI-based ASD identification deduced that the accuracy of research studies with a significant number of participants is generally lower than that of studies with fewer participants, implying the further need for large-scale studies [64]. Regarding participants' age, it is shown that the accuracy of automated ASD diagnosis is higher for younger individuals [64]. Another thorough examination of deep learning approaches looks into the prognosis of neurological and neuropsychiatric disorders, reporting more potential for diagnosing stroke, cerebral palsy, and migraines using various deep learning models [65].
A deep neural network model employed in the early screening of ASD, assessing children's eye tracking data applicability, reported outcomes that strongly indicated efficiency in helping clinicians for a quick and reliable evaluation [15]. The outcomes of a review article on ML methods of feature selection and classification for ASD, used to analyze and investigate ASD, indicate an improvement in diagnostic accuracy, time, and quality without complexity [66]. In an analysis and detection of ASD after applying various ML techniques and handling missing values, the results strongly suggest that convolutional neural network-(CNN) based prediction models work better on their datasets with a significantly higher accuracy for ASD screening in children, adolescents, and adult data [67]. A CNN is employed for the classification of ADHD, trained with EEG spectrograms of 20 patients and 20 healthy participants. The model has an accuracy of 88% ± 1.12%, outperforming the Recurrent Neural Network and the Shallow Neural Network, with the advantage of avoiding the manual EEG spectral or channel features [68]. Furthermore, a CNN was used to identify ADHD from a dataset of children (ADHD: 50, Healthy: 57) and the network input data consisted of power spectrum density of EEGs. The accuracy obtained was 90.29% ± 0.58% [69].
Additionally, serious games which embed fine motor activities obtained from a mobile device and deep learning convolutional neural networks (CNN) are proposed as novel digital biomarkers for the classification of developmental disorders [12]. A pilot study of an integrated system that includes a serious game and a mobile app, and utilizes ML models that measure ADHD behaviors, suggests their significant potential in the domain of ADHD prediction [14]. Moreover, a gamified online test and ML using Random Forests for the predictive model were designed with results revealing that their model correctly detected over 80% of the participants with dyslexia, pointing out that dyslexia can be screened using an ML approach [62].
Consequently, more in-depth research is needed which utilizes automatic classification techniques to assist clinicians' decision making. The aim of the current study is to examine automatic classification for the assistance and support of evaluation procedures in speech and language skills on biometric data gathered for children with a Disorder (NDs or no-NDs). Further, and in more detail, we also examine five types of NDs: ASD, ADHD, ID, SLD, and CD. Hence, we overall study six binary classification problems. The methods utilized to classify the data are a Radial Basis Function (RBF) neural network, a Deep Neural Network (DNN), and a Grammatical Evolution variant named GenClass [70].

The SmartSpeech Project
This study is part of the ongoing research project "Smart Computing Models, Sensors, and Early diagnostic speech and language deficiencies indicators in Child Communication" (also known as SmartSpeech), funded by the Region of Epirus in Greece and the European Regional Development Fund (ERDF). Specifically designed activities based on ND assessment procedures were used to create a serious game for the SmartSpeech project [71]. This serious game collects players' responses. A dedicated server backend service processes gathered data and examines whether specified domains or skills can be used for the early clinical screening/diagnostic procedures toward automated indications.

The Sample
The sample of this study consists of the SmartSpeech biometric data. A total of 435 participants of mixed gender (M: 224, F: 211), with a mean age of 8.8 ± 7.4 years, contributed to the sample. The participants were divided into groups with NDs (96) and without NDs (no-NDs: 339). NDs were categorized in agreement with DSM-5 (ASD: 17, ADHD: 18, ID: 8, SLD: 19, and CD: 42). Six participants with NDs had more than one co-occurring disorder.

Data Collection
To recruit for the sample, many calls were made through health and educational sectors that support children with NDs and no-NDs. For each child participant, an adult has also been involved to provide the required (parental) consent and the child's developmental and communication history. The project's nature, purpose, procedures, and approval by the University of Ioannina Research Ethics Committee, Greece (Reg. Num.: 18435/15 May 2020), which complies with the General Data Protection Regulation GDPR, were then thoroughly explained to parents during an informative meeting. Parents then endorsed the consent form.
Next, the child interacts with the serious game, under the clinician's guidance. The game is designed to record the child's responses while playing the game along with biometric measurements, i.e., eye tracking and heart rate. The responses are quantified as variables forming four categories, more specifically, hand movements on the touch screen, verbally answering questions or executing commands, eye tracking data while watching scenes, and spontaneous heart rate reactions in real time. Regarding the first category, the game automatically outputs the scores that correspond to the player's performance. The remaining variables are analyzed as follows.
The digital game recognizes words through its built-in speech-to-text (STT) capability. In several phases of the game, the participant is asked to pronounce words such as the names of characters and objects in the game. The words that make up the correct answers are predetermined. Each recording lasts 10 s, long enough to capture the participant's response. For word recognition, the speech-to-text program CMUSphinx [72] was chosen. It is available for free, works on different platforms (desktop, mobile), is relatively fast, and works offline. A corresponding recognition model for Greek has been created and trained for this software [73]. A total of 40 words are expected to be "heard" in the target language, that is, Greek. The program essentially detects which of these words best matches what the child has said and, consequently, whether the child gave a correct answer.
During the game, the player wears a smartwatch that records the heart rate in real time. The values are sent online to the database, where they are synchronized with the different phases of the game. For each activity of interest, the samples collected during the corresponding time period are selected and the corresponding statistical variables are calculated. The heart rate (HR) variables are the mean, standard deviation, and range for every distinct activity of the game. HRV is the variation in the time difference between successive heart beats, and several ways of calculating it have been defined [33]. An exact calculation requires the time differences between successive pulses, which are most reliably obtained with an electrocardiogram (ECG). The wearable device (smartwatch) used in the game allows only the heart rate to be measured, not the individual pulses. HRV cannot be calculated directly from the heart rate, especially when the measuring devices apply filtering and smoothing techniques that alter the original information of the measurements. However, because rate variability is considered a more important feature than the rate itself, we calculated the standard deviation and range of the heart rate as an alternative estimate of it.
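As a minimal sketch of the per-activity heart rate variables described above, the following Python function computes the mean, standard deviation, and range of the samples recorded during one activity. The function name, the sample readings, and the use of the sample (rather than population) standard deviation are assumptions made for illustration; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def heart_rate_features(samples):
    """Return mean, standard deviation, and range of heart-rate samples (bpm)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a standard deviation")
    return {
        "mean": mean(samples),
        "sd": stdev(samples),  # sample standard deviation (assumed estimator)
        "range": max(samples) - min(samples),
    }


# Hypothetical readings (bpm) recorded during a single game activity.
activity_samples = [88, 92, 95, 90, 85]
features = heart_rate_features(activity_samples)
```

In practice one such feature set would be produced for every activity of the game, after the samples have been aligned with the game phases.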
The game is implemented in the Unity environment (Unity®, 2022). Eye tracking is provided through the SeeSo software [74], which detects the position of the eyes through the camera of the mobile device (tablet) and calculates the target on the screen that the observer is looking at. In certain activities in the game this functionality is enabled, and the result is a sequence of X, Y coordinates that correspond to the screen of the device at certain time periods. These coordinates give direct information about what the player is looking at (gaze points). It is known that human eye movements made to obtain information when watching a scene are generally very fast, with a duration of a few milliseconds, so one can quickly process any visual stimulus by literally scanning the scene. Fast eye movements are called saccades [75], while gaze points that are relatively close both spatially and temporally constitute what we call a fixation [75], which refers to where and when we mentally process the scene by deriving information from it. The software gives information about the fixations, and we extract three basic variables that are common in eye movement research [76]: (a) the number of fixations (fixation count, FC), (b) the time that passed until the first fixation (time to first fixation, TTFF), and (c) the total duration of fixations (time spent, TS).
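The three fixation variables can be illustrated with a short Python sketch. It assumes that the fixations for one scene are already available as (onset, duration) pairs in milliseconds; the `Fixation` class, its field names, and the example values are hypothetical and do not reflect the SeeSo output format.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float     # onset relative to the start of the scene
    duration_ms: float  # how long the gaze stayed in place


def fixation_metrics(fixations):
    """Compute FC, TTFF, and TS for one scene from a list of fixations."""
    if not fixations:
        return {"FC": 0, "TTFF": None, "TS": 0.0}
    return {
        "FC": len(fixations),                             # fixation count
        "TTFF": min(f.start_ms for f in fixations),       # time to first fixation
        "TS": sum(f.duration_ms for f in fixations),      # total time spent
    }


# Hypothetical fixations detected while the child watched one scene.
scene = [Fixation(320.0, 180.0), Fixation(640.0, 250.0), Fixation(1100.0, 90.0)]
metrics = fixation_metrics(scene)
```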

Data Formulation
Three datasets were formed, corresponding to the categories of (i) game scores, N: 435 (NDs: 96, no-NDs: 339); (ii) heart rate statistics, N: 321 (NDs: 88, no-NDs: 233); and (iii) eye-tracker metrics, N: 182 (NDs: 42, no-NDs: 140). Each dataset provides the set of classification input variables. Invalid or missing data were filtered out, which reduced the number of cases, and thus of pathological cases, in the datasets. Table 1 summarizes the input variables used in each dataset: 30 variables in the game-scores dataset, 15 in the heart-rate dataset, and 16 in the eye tracking dataset (as depicted visually in Section 2.6). The TTFF variables in the eye tracking dataset had to be removed during the filtering process. Figure 1a-c provides a visualization summarizing descriptive statistics for this study's variables (means, standard error). The classes that were used are defined by the target binary variables. The Disorder variable indicates whether the child has ND(s) or not, i.e., no-ND. The remaining variables, namely ASD, ADHD, ID, SLD, and CD, indicate the presence of the corresponding disorder as previously described according to the DSM-5.

Classification Methods
The methods used in this study to classify the data are RBF, DNN, and a Grammatical Evolution variant named GenClass, which are also depicted visually in Section 2.6.
The RBF is a kind of artificial neural network which has been widely used for a range of tasks, including classification, regression, and clustering, and is effective in problems with high-dimensional input spaces and complex patterns [77][78][79]. The RBF network has several advantages over other neural network architectures, including its ability to handle high-dimensional data, fast training and testing times, and the ability to approximate any continuous function with arbitrary precision. According to [79], the RBF network has three layers (input, hidden, and output). Each input corresponds to a variable in Table 1. The hidden layer uses radial basis functions as activation functions to transform the input data into a new representation. This representation is then used for further processing in the output layer. The output of the network is computed as a linear combination of the transformed inputs. The output is then expressed as a binary decision in the form of 0 or 1 (TRUE or FALSE), representing the two outcomes of each of the six classes of the study. Figure 2 presents the RBF Neural Network flowchart.
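The forward computation of such a network can be sketched in a few lines of Python. This is an illustrative sketch only: the Gaussian basis function, the decision threshold of 0.5, and all parameter values are assumptions, and the example uses two hidden units for brevity rather than the configuration used in the study.

```python
import math


def rbf_predict(x, centers, widths, weights, bias=0.0, threshold=0.5):
    """Forward pass of a Gaussian RBF network followed by a 0/1 decision."""
    hidden = []
    for center, width in zip(centers, widths):
        # Squared Euclidean distance between the input and this unit's center.
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        hidden.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    # The output layer is a linear combination of the hidden activations.
    output = bias + sum(w * h for w, h in zip(weights, hidden))
    return 1 if output >= threshold else 0


# Hypothetical parameters: two hidden units over two input variables.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [0.0, 1.0]
near_second = rbf_predict([0.9, 1.1], centers, widths, weights)
near_first = rbf_predict([0.1, -0.1], centers, widths, weights)
```

With these weights, an input close to the second center activates the unit that drives the output, so it is assigned to class 1, while an input close to the first center is assigned to class 0.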

DNNs [80] are artificial neural networks formed of many layers, each composed of a specific number of neurons. The received input is transformed nonlinearly at each layer, and the outputs are then passed on to the next layer until the network's output is reached. Their architecture (Figure 3) allows them to learn highly complex representations of input data and link them to the desired output, rendering them a suitable and effective tool for a wide range of applications, including but not limited to image recognition, speech recognition, natural language processing, and classification.
The core element of a DNN is the artificial neuron [81]. Specifically, a neuron applies a nonlinear function to the weighted sum of its inputs and outputs the result. In a fully connected network, each neuron of each layer is connected with every neuron of the next layer, and the weights of those connections are learned during training. The learning procedure, or training phase, of a DNN usually involves adjusting the weights of all neuron connections to minimize an error between the network's predictions and the actual output values. Usually, this is conducted using a method called backpropagation [82], in which the gradient of the error with respect to each weight in the network is calculated and used to reduce the error. This minimization can also be conducted using more sophisticated optimization techniques, but at an additional cost. One of the challenges in training DNNs is to avoid overfitting. Overfitting means that the network becomes so specialized to the training data that it cannot perform well on new unseen data. Many techniques have been proposed to mitigate this problem, such as dropout and weight decay [83], among others.
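To make the layer-by-layer computation concrete, the following Python sketch propagates an input through a small fully connected network: each hidden neuron applies a sigmoid to the weighted sum of its inputs, and the output layer applies a softmax. The network size and all weights are hypothetical, and training (backpropagation, dropout, weight decay) is deliberately left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Weighted sums for one fully connected layer (one weight row per neuron)."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_layers, output_layer):
    """Pass the inputs through sigmoid hidden layers and a softmax output layer."""
    activations = inputs
    for weights, biases in hidden_layers:
        activations = [sigmoid(z) for z in dense(activations, weights, biases)]
    weights, biases = output_layer
    return softmax(dense(activations, weights, biases))


# Hypothetical network: two inputs, one hidden layer of two neurons, two outputs.
hidden = [([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0])]
output = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
probabilities = forward([1.0, 2.0], hidden, output)
```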
Grammatical evolution is a genetic programming technique that uses a grammar-based strategy to evolve computer programs [84]. It is an evolutionary process that has been used in various cases such as music composition [85], economics [86], symbolic regression [87], and caching algorithms [88]. In the genetic algorithm, the chromosomes are vectors of integer values that represent production rules of a Backus-Naur Form (BNF) grammar [89].
The algorithm proposed by Tsoulos, named GenClass [70], is a classification algorithm based on grammatical evolution. The start symbol of the grammar serves as the starting point for the production procedure, which gradually produces the program string by substituting nonterminal symbols with the right hand of the chosen production rule. Figure 4 shows the GenClass flowchart.
The main advantage of GenClass is that it does not require any additional information, such as the derivatives of the objective problem, which are costly in time and memory. Specifically, it generates a series of classification rules in a C-like language that can be easily programmed and used in real C programs without many modifications. The generated rules are constructed with the use of if-else conditions, and the variables represent the corresponding features. The source code of the method can be found at https://github.com/itsoulos/GenClass (accessed on 30 December 2022).
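The production procedure can be sketched with the mapping commonly used in grammatical evolution, in which each integer of the chromosome selects a production rule for the leftmost nonterminal symbol by taking it modulo the number of available rules. The toy grammar below is hypothetical and much simpler than the grammar used by GenClass, and the wrap-around limit is an assumption; the sketch only shows how a vector of integers becomes an if-else rule in a C-like syntax.

```python
# Hypothetical toy grammar; GenClass's actual grammar is not reproduced here.
GRAMMAR = {
    "<rule>": [["if (", "<cond>", ") class = 1; else class = 0;"]],
    "<cond>": [["<var>", " ", "<op>", " ", "<const>"],
               ["<cond>", " && ", "<cond>"]],
    "<var>": [["x1"], ["x2"], ["x3"]],
    "<op>": [[">"], ["<"]],
    "<const>": [["0.25"], ["0.5"], ["0.75"]],
}


def decode(chromosome, start="<rule>", max_wraps=2):
    """Map a list of integers to a program string; return None if it does not finish."""
    symbols = [start]
    used = 0
    limit = len(chromosome) * (max_wraps + 1)  # allow a few passes over the genes
    while True:
        # Find the leftmost nonterminal symbol still to be expanded.
        idx = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
        if idx is None:
            return "".join(symbols)
        if used >= limit:
            return None  # the derivation did not terminate within the gene budget
        rules = GRAMMAR[symbols[idx]]
        gene = chromosome[used % len(chromosome)]
        symbols[idx:idx + 1] = rules[gene % len(rules)]
        used += 1


program = decode([7, 4, 1, 3, 2])
# program == "if (x2 < 0.75) class = 1; else class = 0;"
```

A chromosome whose derivation never terminates, such as `[1]` with this grammar, yields `None` and would simply be treated as an invalid individual by the genetic algorithm.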
The application details of the utilized classifiers are as follows. The following techniques were used to identify the categories in the three datasets: RBF with 10 processing neurons [79], the DNN approaches described below, and GenClass [70]. GenClass used 500 chromosomes, and a maximum of 2000 generations was allowed. The experimental settings are shown in Table 2.
The DNN approaches were implemented in Python using the Keras library. Three different approaches with different numbers of fully connected layers were considered for the comparisons. The approaches were named according to the number of adopted layers: DNN-3, DNN-4, and DNN-5. The architecture of DNN-3 consists of three fully connected layers with 64, 32, and 16 neurons, respectively, and a final output layer with three neurons. The neurons of these layers use the sigmoid activation function [90], while the final output neurons use the softmax activation. The model is compiled with the Nadam optimizer and the categorical cross-entropy loss function and trained over 1000 epochs with a batch size of 8. Accordingly, the extra layer added for DNN-4 has 128 neurons, and the one added for DNN-5 has 256.
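To show how the three architectures relate in size, the following Python sketch lists their hidden-layer widths and counts the trainable weights and biases of each. It is not the Keras implementation used in the study. Placing each extra layer before the existing ones is an assumption, as is the choice of 16 input variables, which is used here only as an illustrative input size.

```python
def dense_parameter_count(n_inputs, hidden_widths, n_outputs):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_widths, n_outputs]
    # Each layer has fan_in weights plus one bias per neuron.
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


# Hidden-layer widths from the text; the position of the extra layers is assumed.
ARCHITECTURES = {
    "DNN-3": [64, 32, 16],
    "DNN-4": [128, 64, 32, 16],
    "DNN-5": [256, 128, 64, 32, 16],
}

# Illustrative input size of 16 variables and three output neurons.
counts = {name: dense_parameter_count(16, widths, 3)
          for name, widths in ARCHITECTURES.items()}
```

Under these assumptions, each additional layer multiplies the number of trainable parameters several times over, which is worth keeping in mind given the modest size of the datasets.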

Performance Estimation
The 10-fold cross-validation technique was employed as the evaluation method to fairly assess the predictive ability of each classifier (Figure 5). We divided each dataset into ten partitions. Nine of the partitions were used for training, and the remaining partition was used for testing. For each instance we performed thirty independent experiments and calculated the average classification error of each algorithm. Moreover, we used a different seed number for every experiment with the C programming language's drand48() random number generator. For the experiments, we used freely downloadable software from https://github.com/itsoulos/IntervalGenetic (accessed on 18 February 2023). For classification evaluation, a confusion matrix is used to calculate the error rate, precision, recall, and accuracy, as given by Equations (1)-(4), respectively [39,49].
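In terms of the confusion-matrix counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), these four metrics are conventionally defined as shown below. The notation is an assumption chosen to match standard usage, and the numbering follows the order in which the metrics are listed above.

```latex
\begin{align}
\text{Error rate} &= \frac{FP + FN}{TP + TN + FP + FN} \tag{1}\\
\text{Precision}  &= \frac{TP}{TP + FP} \tag{2}\\
\text{Recall}     &= \frac{TP}{TP + FN} \tag{3}\\
\text{Accuracy}   &= \frac{TP + TN}{TP + TN + FP + FN} \tag{4}
\end{align}
```

With these definitions, the error rate and the accuracy always sum to one (or to 100 when expressed as percentages).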

Overall, Figure 6 illustrates the workflow of the methods followed in this study.
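The evaluation protocol of this section (ten partitions, one held out for testing in turn, repeated with different seeds) can be sketched in Python as follows. This is an illustration under stated assumptions, not the software used in the study: Python's `random` module stands in for drand48(), each of the thirty experiments is assumed to be one full 10-fold run with its own seed, and `evaluate` is a hypothetical callback that trains a classifier on the training indices and returns its error on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal partitions
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv_error(n_samples, evaluate, k=10, repeats=30):
    """Average test error over `repeats` runs of k-fold CV, one seed per run."""
    errors = []
    for seed in range(repeats):
        for train, test in k_fold_indices(n_samples, k, seed):
            errors.append(evaluate(train, test))
    return sum(errors) / len(errors)


# Example split for a dataset of 182 cases (the size of the eye tracking dataset).
splits = list(k_fold_indices(182, k=10, seed=1))
```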

Results
The experimental results are shown in Tables 3-7. Tables 3-5 show the results as average error rate percentages (%) for the eye tracking, the heart rate, and the game-based datasets. For the eye tracking dataset, the best overall results are obtained with the GenClass method, with a total average error rate of 12.03%. More specifically, this method is found to be more suitable for the Disorder class, which denotes whether a child has a disorder or not, and particularly for the disorders ID, SLD, and CD, with average error rates of 13.07%, 11.12%, and 11.30%, respectively. DNNs, on the other hand, proved more accurate for distinguishing the NDs of ASD and ADHD, with average error rates of 13.67% and 12.56%, respectively. The number of layers seems to have a small impact on the outcome, with the four-layer DNN achieving the highest performance.
For the heart rate dataset, the RBF classifier is superior to the others for all the classes, with an overall average error rate of 18.73%. This method proved more appropriate when the biometric data consist of heart rate measurements, and the difference in performance compared with the other classifiers is remarkable.
For the game scores dataset, the GenClass classifier is found to be slightly better in detecting all the target disorder variables with an average error rate of 22.08%.
Furthermore, Table 6 compares the precision and recall for the Disorder datasets. Finally, a comparison in terms of higher classification accuracies is shown in Table 7 for each class and SmartSpeech dataset.
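Because accuracy and error rate are complementary, the error rates quoted above for the eye tracking dataset map directly onto the accuracies reported for the same classes. A minimal Python sketch of this conversion, using only values stated in the text (the function and variable names are illustrative):

```python
def accuracy_from_error(error_rate_percent):
    """Accuracy (%) is the complement of the error rate (%)."""
    return round(100.0 - error_rate_percent, 2)


# Eye tracking error rates (%) quoted in the text above.
eye_tracking_errors = {"ID": 13.07, "SLD": 11.12, "CD": 11.30,
                       "ASD": 13.67, "ADHD": 12.56}
eye_tracking_accuracy = {name: accuracy_from_error(error)
                         for name, error in eye_tracking_errors.items()}
```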

Discussion
This study aimed to use ML to develop innovative automated solutions for the early identification of NDs in children with communication deficiencies, drawing on technology-based data-gathering techniques such as motion tracking, heart rate metrics, and eye tracking from the new SmartSpeech dataset developed in Greek. Ten-fold cross-validation was chosen for evaluating model efficacy since it produces high variability in testing and training data, decreases bias, and delivers consistent findings for all tries, parameters, and models. The results of this research give a direct comparison of the different machine learning methods employed on this dataset, which are RBF, DNN, and GenClass.
The reported results of this study (Tables 3-5) compare all the methods using the error rate (%) as the performance metric; thus, a smaller value implies better performance. Precision and recall are also reported for the class Disorder (Table 6). Finally, the best-performing classification methods in terms of accuracy are reported for each class and dataset (Table 7). In particular, Table 7 clearly illustrates the tendency of specific methods to dominate in each dataset and class; more specifically:

• For the eye tracking measurements, GenClass and DNN-4 have proven to be the best choices, with accuracies of at least 86.33% (the lowest value, obtained for the ASD class). GenClass is superior for the classes Disorder, ID, SLD, and CD, whereas DNN-4 is better for ASD and ADHD. For the aggregate class Disorder, GenClass has the highest observed accuracy of 92.83%. This finding may be utilized for automated screening to determine whether an individual has an ND.

• The RBF method is the most accurate on the heart rate dataset, with an accuracy of at least 80.05%. Notably, it achieves the best performance for all the classes under study.

• As for the game-based dataset, the GenClass method has the highest accuracy for the classes Disorder, ASD, ID, and CD. The classes ADHD and SLD are better identified using the RBF algorithm.
However, in most other cases GenClass and DNN-4 outperform the rest. It is worth noting that GenClass is expected to have longer execution times since it is based on genetic algorithms. Nevertheless, in this study we employed the parallelization feature of the GenClass software [91] to speed up the process.
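The accuracies quoted in this discussion are the complements of the error rates reported in the results, that is, accuracy (%) = 100 − error rate (%). As a minimal sketch, the eye tracking error rates stated in the text can be converted as follows; the dictionary keys and the helper name are illustrative assumptions.

```python
# Eye tracking error rates (%) as quoted in the results text.
eye_tracking_error = {
    "ID (GenClass)": 13.07,
    "SLD (GenClass)": 11.12,
    "CD (GenClass)": 11.30,
    "ASD (DNN-4)": 13.67,
    "ADHD (DNN-4)": 12.56,
}


def accuracy_from_error(error_rate_pct):
    """Convert an error rate in percent to an accuracy in percent."""
    return round(100.0 - error_rate_pct, 2)


eye_tracking_accuracy = {
    name: accuracy_from_error(err) for name, err in eye_tracking_error.items()
}

# Class with the lowest accuracy among the best eye tracking results.
lowest = min(eye_tracking_accuracy, key=eye_tracking_accuracy.get)
```

Under this conversion, the lowest of these accuracies is the one for ASD, which corresponds to the 86.33% lower bound cited for the eye tracking dataset.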
Similar research attempts to identify NDs have been reported in the literature. For example, one such study evaluated whether drag-and-drop data can be used to classify children with developmental disabilities [12]. Data were collected from 223 children with typical development and 147 children with developmental disabilities via a mobile application (DoBrain). A deep learning CNN algorithm was developed for the classification, achieving an area under the curve (AUC) of 0.817. Furthermore, in line with our study, a binary classifier has also been trained using paralinguistic features extracted from typically developing children and children with Speech Sound Disorders (SSD), reporting 87% accuracy [60]. In the same direction as our study, HRV was also used as a biomarker to distinguish autistic and typical children by applying several machine learning algorithms, namely Logistic Regression, Linear Discriminant Analysis, and Cubic Support Vector Machine [39]. Logistic Regression proved to be the best classifier for a color stimulus test in that study, whereas Linear Discriminant Analysis was better in the baseline test. Moreover, similar to our research, eye tracking data have been considered an important biomarker for detecting ASD [15]. In a search for the best method to predict autism from eye tracking scan path images, a DNN classifier was compared to traditional machine learning approaches such as Boosted Decision Tree, Deep Support Vector Machine, and Decision Jungle. The DNN model outperformed the other machine learning techniques with an AUC of 97%, sensitivity of 93.28%, specificity of 91.38%, negative predictive value (NPV) of 94.46%, and positive predictive value (PPV) of 90.06% [15].
Moreover, RBF also yielded reliable results in a study that attempted to identify children with ID using two different feature extraction methods for speech samples, namely Linear Predictive Coding-based cepstral parameters and Mel-frequency cepstral coefficients, along with four classifiers: k-nearest neighbor, support vector machine, linear discriminant analysis, and RBF neural network [92]. The RBF classification model was the best technique for classifying disordered speech, giving higher accuracy than the rest of the classifiers (>90%).
Furthermore, this study's sample size is comparable to that of other research [12,15,93], owing to the high costs of collecting data involving human subjects and the ongoing development of tasks and experimental techniques that can discriminate between various situations to the greatest extent possible. Similar to prior studies [93], in this study, collecting a single multi-dimensional data sample may take 1.5 to 4 h of participant's time (including setting up, testing, and setting down) and 2 to 6 h of participant time (which encompasses travel time). Furthermore, reaching out to people and encouraging participation is complex, which makes recruiting many participants with NDs difficult. As a result, the resources available for early-stage studies do not allow for gathering samples from thousands of people. Although this study's sample size is not very large, its results form one of the first attempts at employing ML on data from digital gameplay and sensors to automatically assist the clinician's decision, reducing the inherent uncertainty of clinical diagnosis regarding speech and language activities and their manifestations. This study contributes to the automatic classification of NDs based on new datasets derived from responses during software interactions, primarily designed and implemented for the Greek language. Future research may focus on enriching the dataset and considering recent advances in classification to enhance accuracy.

Conclusions
This study examines a number of ML approaches to explore how to automatically identify children with various neurodevelopmental disorders. The ML techniques utilize modern optimization algorithms such as the Radial Basis Function (RBF) Neural Network, Deep Learning Neural Networks (DNN), and a variant of the Grammatical Evolution method, namely GenClass. These methods are used for disorder classification on our dataset, derived from SmartSpeech, an innovative system with a digital mobile serious game designed to assist clinicians in speech and language therapy in Greek. The dataset is split into three parts: one for the game-based data and two for the measured biometric data, that is, eye tracking and heart rate. The results of this study have shown that the best-performing classifiers were GenClass and DNN-4 for the eye tracking dataset, the RBF method for the heart rate dataset, and GenClass and RBF for the game-based dataset.
The outcomes of this study motivate further research in the future. Evidently, modern technologies, and especially ML methodologies, give clinicians an opportunity to improve their assessments in terms of both speed and accuracy.

Funding: This research was funded by the project titled "Smart Computing Models, Sensors, and Early Diagnostic Speech and Language Deficiencies Indicators in Child Communication", with code HP1AB-28185 (MIS: 5033088), supported by the European Regional Development Fund (ERDF).

Data Availability Statement:
The participants of this study did not give written consent for their data to be shared publicly; therefore, due to privacy restrictions and the sensitive nature of this research, data sharing is not applicable to this article.