ASD2-TL∗ GTO: Autism spectrum disorders detection via transfer learning with gorilla troops optimizer framework

Autism Spectrum Disorder (ASD) treatment requires accurate diagnosis and effective rehabilitation. Artificial intelligence (AI) techniques in medical diagnosis and rehabilitation can aid doctors in detecting a wide range of diseases more effectively. Nevertheless, due to its highly heterogeneous symptoms and complicated nature, ASD diagnosis continues to be a challenge for researchers. This study introduces an intelligent system based on the Artificial Gorilla Troops Optimizer (GTO) metaheuristic to detect ASD using Deep Learning and Machine Learning. Kaggle and the UCI ML Repository are the data sources used in this study. The first dataset is the Autistic Children Data Set, which contains 3,374 facial images of children divided into Autistic and Non-Autistic categories. The second dataset is a compilation of three numerical repositories: (1) Autism Screening Adults, (2) Autistic Spectrum Disorder Screening Data for Adolescents, and (3) Autistic Spectrum Disorder Screening Data for Children. For the image dataset experiments, the most notable results are that (1) a TF learning ratio greater than or equal to 50 is recommended, (2) data augmentation is recommended for all models, and (3) the DenseNet169 model reports the lowest loss value of 0.512. For the numeric dataset, five experiments recommend standardization, and the final five attributes are optional in the classification process. The performance metrics demonstrate that the proposed GTO-based feature selection technique outperforms its counterparts in the literature.


Introduction
Behavioral, social, and communication impairments are the hallmarks of autism spectrum disorder (ASD) [1]. Repetitive behaviors and delays in motor skill development are also symptoms of ASD [2]. These disorders can generally be distinguished from three years of age using available diagnostic protocols. ASD usually manifests in the first two years, and the symptoms last a lifetime. Autism affects many parts of the brain, and the gene interactions or polymorphisms contributing to the disorder also give it a genetic component. ASD affects approximately one child out of 70 worldwide. The US CDC [3] stated that 168 out of 10,000 children were diagnosed with ASD in 2018, the highest rate on record. Boys have a higher prevalence rate of ASD than girls: in the United States, an estimated 3.63 % of boys aged 3-17 have ASD, compared with an estimated 1.5 % of girls.
There is no specific treatment for ASD, but various treatment approaches have been devised to alleviate symptoms and improve cognitive abilities, daily life skills, and functionality in individuals with ASD [2]. Early treatment, however, can significantly improve symptoms and functional abilities. The most common intervention methods for patients with ASD are behavioral and cognitive, with some relying on evolutionary approaches. Early intervention can improve social skills, interaction, and neurodevelopment for individuals with ASD. To achieve this, it is necessary to develop an effective and precise method of diagnosing ASD. However, ASD diagnosis can be challenging due to its complex and heterogeneous symptoms, and manual screenings are tedious, time-consuming, and susceptible to human error. As a branch of artificial intelligence, machine learning (ML) is a technique that enables computers to automatically analyze large datasets and find patterns on which to base decisions. Supervised ML can predict the class of a given data point by building mathematical models based on training data [4]. Computer-aided diagnosis (CAD) systems aim to assist clinicians and medical professionals in diagnosing diseases and conditions. For example, scholars are trying to design classifier-based computer models to diagnose autism thanks to recent advances in ML. CAD systems are not intended to diagnose patients on their own; in the hands of clinicians, however, they can be valuable instruments for achieving a more efficient and faster diagnosis. Data preparation, dimension reduction, model training, validation, and testing are all aspects of ML pipelines. Pre-processing algorithms act as the front end of the pipeline because they can mitigate some aspects of poor-quality data, namely outliers, unscaled features, and missing values, thus making the data more ready for the subsequent learning process [5]. Deep learning (DL) methods have recently gained popularity and have shown great potential in the medical field. More high-level features can be discovered with DL than with traditional ML methods. ML and image processing techniques have dramatically improved healthcare image processing and illness detection, achieving performance comparable to that of skilled specialists.
Deep learning is a powerful technique that simulates brain activity to create prototypes that can assist decision-making and data processing. Convolutional neural networks (CNNs), a class of deep learning models, are commonly used for analyzing visual images with minimal pre-processing. Machine learning (ML) as a diagnostic tool has become increasingly popular, providing additional information [6]. However, DL models are less reliable in clinical settings because they require large datasets and are sensitive to the choice of hyperparameters. Transfer learning (TL) is a process that takes parts of one model and uses them to build another model serving a different purpose. It can potentially improve DL models by enabling knowledge transfer between tasks, and utilizing meta-learning for reuse may become more common in the future [7]. To improve performance, an optimization process is used to select appropriate hyperparameter values rather than choosing them randomly [8].
Optimization is valuable in numerous fields, including engineering, mathematics, medicine, and the military. It involves selecting the best or most effective solution for a problem and improving its efficiency and effectiveness in the long run. An optimization process is an iterative approach that involves a thorough search of all possibilities to develop an ideal solution. Optimization methods in the literature fall into two groups: deterministic and stochastic. Deterministic methods can achieve globally optimal solutions within negligible error tolerance and converge in a finite amount of time, but their performance degrades in proportion to the size of the optimization problem. Stochastic optimization exploits randomness to probe the search space and can provide very efficient results despite not guaranteeing optimality. Heuristic approaches such as evolutionary algorithms, nearest neighbor algorithms, memetic algorithms, insertion algorithms, and dynamic relaxations are less expensive and achieve near-optimal results; still, most are tailored toward specific problems [9][10][11][12].
A proposed framework will likely contain many layers, intermediate processing elements, and other structural elements, so search metaheuristics are needed to explore them. Metaheuristics, also known as stochastic algorithms, are a class of methods that provide efficient and reliable solutions to nonlinear optimization problems. A variety of metaheuristics solve optimization problems, and much of this intelligence is inspired by organisms in nature. Metaheuristics aim to offer a set of guidelines or rules for developing algorithms independent of the problem [10]. Regardless of their structural properties, metaheuristic approaches initiate arbitrary trials within their bounds, and each algorithm-specific equation evolves candidate solutions until the termination condition is met.
Metaheuristic optimization involves two main phases, diversification and intensification, to find an optimal solution. Diversification is the exploration phase, which uses randomized searches to reduce the chance of being trapped in local minima and to maintain a global search. Intensification is the exploitation phase, which concentrates successful samples near the population memory to identify promising regions near the best solution. Balancing these two phases is crucial for successful metaheuristic optimization. Nature-inspired algorithms are commonly used for optimization problems, with physics-based, nature-based, human-based, swarm-based, and animal-based examples. Most metaheuristic algorithms are inspired by animal hunting and prey behavior, with three common types being evolutionary, physics, and swarm algorithms. Swarm algorithms simulate population behavior, with examples such as particle swarm optimization (PSO), ant colony optimization (ACO), and artificial bee colony algorithms. Other examples of swarm intelligence include the firefly, gray wolf, Gorilla Troops, and whale optimization algorithms [12].
Despite the existence of several algorithms for ASD detection, these algorithms fail to provide exact solutions to NP-hard multidimensional problems.This paper aims to fill that gap by introducing a novel deep-learning framework using TL with a Gorilla Troops optimizer for detecting ASD.The main contributions of this study can be organized as follows.
- Based on pre-trained CNNs, an innovative Autism Spectrum Disorders Detection via Transfer Learning with Gorilla Troops Optimizer (ASD2-TL*GTO) framework has been devised.
- The GTO outperformed other recent top nature-inspired algorithms.
- The ASD2-TL*GTO framework is flexible; hyperparameters are not manually assigned.
- Two separate datasets are used, which eases the ASD2-TL*GTO framework's deployment and data availability.
- Standard performance measurements have yielded extremely good outcomes.
The rest of the paper is organized as follows. First, related work is reviewed in Section 2. Then, Section 3 presents the proposed ASD2-TL*GTO framework. Next, Section 4 describes the experimental results. Section 5 concludes the paper and discusses further research.

Related studies
Many fields of medicine, including structural and functional neuroimaging, are experiencing an expansion in the use of ML algorithms and DL approaches. Several neuroimaging studies have been conducted over the past few years to capture and analyze brain activity, including electroencephalography (EEG), magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), resting-state functional magnetic resonance imaging (rsfMRI), positron emission tomography (PET), and electrocorticography (ECoG) [13][14][15][16][17]. The following section discusses different techniques and neuroimaging studies proposed for ASD identification.

Machine learning for ASD identification
Many researchers have applied ML techniques to ASD classification [17][18][19][20]. An automated postural control pattern detection algorithm using ML was developed and validated on the COP dataset for identifying children with autism. Several supervised ML techniques were used to determine ASD postural control. According to the findings, all ML algorithms successfully distinguished postural control patterns between typically developing children and children with ASD [4]. On the other hand, physical activity levels in the ASD and typical development groups were not closely observed. Bilgen et al. [5] considered T1-weighted brain MRI data for autism spectrum disorders. This study investigated the link between brain area morphology and spatial representation using an ML-based diagnostic technique.
Nevertheless, the model has not proved accurate enough for ASD identification. Several ML and DL approaches, such as Naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), K-nearest neighbor (KNN), and convolutional neural network (CNN), have therefore been applied to the UCI dataset to analyze features and predict autism symptoms in children [21]. The results illustrate that CNN and SVM achieve the highest accuracy. An ensemble learning method has been presented to represent deep features of the brain obtained from functional MRI (fMRI). This study used a stacked denoising autoencoder (SDA) to derive deep feature representations from multi-atlas images. To solve the ASD classification task, multi-layer perceptron (MLP) and ensemble learning methods are employed. The proposed model exhibited an accuracy of 74.52 % [22]. Another study combined convolutional networks based on multi-atlas graphs with ensemble learning to diagnose ASD automatically [23]. A dataset of 949 subjects, including 419 patients with ASD and 530 typical control (TC) subjects, is used to evaluate the proposed approach. Chaitra et al. [24] combined graph-theoretic techniques with a support vector machine for ASD identification in 432 ASD patients. However, the experimental results reveal that the model could diagnose ASD with only 70.1 % accuracy.

Deep learning for ASD identification
Deep learning-based classification techniques have recently attracted much attention from researchers due to their ability to automatically identify features and diagnose ASD effectively [2,25]. By analyzing the brain activity patterns of patients, Heinsfeld et al. captured autism symptoms from a large brain imaging dataset [26]. The maximum accuracy of the model was 70 % based on rs-fMRI data. In a study by Ari et al. [27], EEG signals were used to diagnose high-risk autism in children. The model's architecture consists of a sparse coding-based feature mapping (SCFM) algorithm, the Douglas-Peucker (DP) approach, and CNNs. Initially, the DP algorithm reduces the EEG signal by decreasing the number of samples per channel. The wavelet-derived EEG signals are then encoded using SCFM. In addition, extreme learning machine (ELM)-based autoencoders (AE) are utilized to improve the CNN models' performance. The experimental results showed that the model was 98.88 % accurate. However, this study includes small samples; only 20 children with autism were used for ASD classification.
Xu et al. examined inferior frontal gyrus and temporal lobe abnormalities among 47 children with ASD using long short-term memory networks (LSTMs) with attention mechanisms. ASD was classified with a high level of accuracy, with a specificity of 97.5 % [28]. However, sophisticated algorithms are needed to identify and capture high-level information from the fNIRS data. Moreover, a novel cognitive learning method based on long short-term memory and autoencoder networks was developed to investigate untraditional brain characteristics and capture ASD symptoms [29]. Enhanced convolutional neural networks (ECNNs) have also been proposed to identify specific patterns for diagnosing ASD by analyzing functional connectivity between different brain areas [30]. According to experimental results, the proposed ECNN can achieve 80 % classification accuracy. In Epalle et al.'s study, multi-input DL networks were used to classify autism symptoms. The proposed model incorporates three different atlases to pre-process the neuroimaging data, and the hinge loss function was utilized for training the DL network. As a result, the model reached a classification accuracy of 78.07 % [31].
Elbattah et al. [32] presented a novel application of transfer learning for ASD detection using eye-tracking. They employed transfer learning models such as VGG-16, ResNet, and DenseNet [33] for autism diagnosis. These models comprise a base model for feature extraction and a classifier model for classification. Eye-tracking scan paths are converted into a visual representation to facilitate the use of pre-trained vision models. However, the study acknowledges that its review of potential machine-learning approaches for autism detection is confined to facial expressions and eye-gaze movements, potentially overlooking other significant features or modalities. Moreover, the small sample size used in the experiments may limit the generalizability of the results.
In summary, the literature indicates that many researchers use neuroimaging modalities, such as fMRI and rsfMRI, to detect Autism.

A.M. Almars et al.
Table 1 summarizes the related work for ASD. However, three main limitations can be drawn. First, detecting ASD using EEG has traditionally been accomplished with traditional machine learning algorithms. EEG has shown superiority over other neuroimaging methods in terms of high temporal resolution, convenience, noninvasive nature, general availability to physicians, and low setup costs. Second, only a few DL-based studies have been suggested to capture autism using EEG [27]. Third, NP-hard multidimensional problems cannot be solved exactly by the existing algorithms. To fill this gap, this paper proposes a novel ASD2-TL*GTO framework based on TL and the Artificial GTO for detecting autism.

Methodology
This study proposes an ASD2-TL*GTO framework for Deep Learning (DL) and Machine Learning (ML) classification and optimization, leveraging the Artificial Gorilla Troops Optimizer (GTO) metaheuristic optimizer. The GTO was chosen for its proficiency in efficiently exploring complex, high-dimensional search spaces and balancing exploration and exploitation during optimization. Inspired by the social behavior and intelligence of gorilla troops in the wild, the GTO is a novel metaheuristic optimization algorithm. It has demonstrated superior performance over other metaheuristic algorithms across various optimization tasks and has been successfully applied to optimization problems in diverse fields.
In the context of this study, we chose to use the Gorilla Troops optimizer in conjunction with transfer learning because of its ability to effectively optimize the weights of deep neural networks, which are widely used in transfer learning. Transfer learning involves reusing pre-trained models such as AlexNet [34], DenseNet [33], and MobileNet [35] to improve the performance of new models on different tasks, and the optimization of these models is a crucial step in achieving high performance; this makes the Gorilla Troops optimizer well-suited for use in transfer learning applications.
The flowchart of the GTO algorithm is depicted in Fig. 1. This section introduces the Artificial Gorilla Troops Optimizer (GTO) metaheuristic optimizer, followed by a detailed discussion of the proposed framework's internal components. The framework comprises four phases, beginning with data collection. The current study utilizes two distinct datasets, one numerical and the other comprising images. These datasets are then pre-processed to better suit the subsequent classification and optimization stage. Following this, the initial GTO population is generated. The pre-processed datasets and the GTO population are then employed in the classification and optimization phase.

Artificial Gorilla Troops Optimizer (GTO)
GTO is a metaheuristic optimization algorithm rooted in the gorillas' lifestyle. It was proposed by Abdollahzadeh et al. in 2021 and simulates gorilla social behavior and movement [36,37]. A gorilla troop comprises a silverback and a family of females and their offspring; other groups of male gorillas also exist. A silverback leads the troop for about 12 years and gets its name from the silvery hair that grows on its back during puberty. Members of the group revolve around the silverback. Additionally, it dictates the group's movement, determines whether they fight, ensures their safety, and directs them to food sources. Young male gorillas, known as blackbacks, provide backup protection for the silverback; their backs are not covered in silver hair, and they are between 8 and 12 years old. Both male and female gorillas migrate from their birthplaces, and new groups of gorillas usually form from these migrations. It is common for a male gorilla to break away from his group and form a new one with a female who has recently moved out. A male gorilla may also remain in the community where he was raised and become its silverback. When the silverback dies, certain gorillas may compete with one another or continue to lead the troop to achieve its goals without the silverback [36,37].
The GTO optimizer has shown high accuracy and efficiency [38]. It requires little adjustment for engineering applications [38], as it is easy to use. Additionally, by enhancing search capabilities, the GTO method may be used to explore other system dimensions. When the number of dimensions is increased, the performance of other optimizers declines noticeably, giving GTO an advantage at comparable dimensionality [38]. Gorillas prefer to live in groups and cannot live alone. Consequently, the gorillas hunt together for food and remain under a silverback leader who makes all the group's decisions. The silverback is considered the best gorilla in this algorithm, and all others tend to approach it, while the weakest is ignored since it is the least preferred. A gorilla is represented by X in this algorithm, while GX represents silverbacks. Take a gorilla that is seeking better food sources, for example: GX is generated at each iteration and exchanged with the current solution if an improved value is found [36]. The algorithm can be divided into two phases, as illustrated below.

GTO exploration phase
GTO considers the silverback the best candidate solution at each stage of the optimization, and all gorillas are contenders. Exploration is conducted through three operators. The first, migration to an unknown position, increases GTO's exploration. The second, movement toward other gorillas, balances exploration and exploitation. The third, transfer toward a known and effective position, significantly improves the search over different optimization spaces. A parameter named p selects the migration mechanism for an unknown position: the attribute p, in the range [0, 1], must be provided before executing the optimization process to determine the likelihood of selecting a transition to an unknown position [39]. If rand < p, the first mechanism is selected. Otherwise, if rand ≥ 0.5, movement toward other gorillas is selected, and if rand < 0.5, migration to a known position is chosen. Any of the mechanisms can help the algorithm perform well, depending on the situation. All candidates are evaluated at the end of the exploration phase, and GX(t) is used instead of X(t) (the silverback) if it is the lowest-cost option.
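The three-way choice among the exploration mechanisms described above can be sketched as a simple branch on the random draw. The function and mechanism names below are illustrative, not from the paper:

```python
def choose_exploration_mechanism(p: float, r: float) -> str:
    """Select one of the three GTO exploration mechanisms.

    p is the user-supplied probability (in [0, 1]) of migrating to an
    unknown position; r is a fresh random draw in [0, 1). The branch
    order follows the rules stated above (sketch only).
    """
    if r < p:                       # first mechanism: unknown position
        return "unknown_position"
    if r >= 0.5:                    # second mechanism: toward other gorillas
        return "toward_other_gorillas"
    return "known_position"         # third mechanism: known position
```

A small p thus biases the search toward the two balancing mechanisms, while p close to 1 favors pure exploration.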

GTO exploitation phase
This phase can be divided into two processes: following the silverback and competition for adult females. First, the value of D is used to decide between them. D is calculated in the equations below with the randomly selected variable W at the start of the optimization process [38].

Follow the silverback
The silverback, the leader of the newly formed group, is a young, healthy male whom the other gorillas closely watch. They likewise follow the silverback's instructions to collect food and explore different areas. In addition, members of the group can affect movement within the group. Under this strategy, applied when D ≥ W, the silverback commands his gorillas to search for food from different food sources.

Competition for adult females
At puberty, adolescent gorillas compete with other males for the female members of their group, a process that is usually violent. This strategy is applied when D < W. As a result, if the cost of GX(t) is lower than the cost of X(t), then GX(t) replaces X(t) as the better alternative (silverback) [39].
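The replacement rule at the end of this step can be sketched as follows; this is a minimal illustration where `cost` stands for any objective function:

```python
def update_silverback(x_t, gx_t, cost):
    """Replace the current best solution X(t) with the candidate GX(t)
    only when the candidate has lower cost, as described above."""
    return gx_t if cost(gx_t) < cost(x_t) else x_t
```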

Data acquisition and description
The datasets in this study are acquired from two public sources: Kaggle and the UCI ML Repository. The first dataset is the Autistic Children Data Set, consisting of 3,374 facial images of children partitioned into "Autistic" and "Non-Autistic" cases [40]. The second dataset is merged from three numerical repositories: (1) Autism Screening Adult, which consists of 704 records [41]; (2) Autistic Spectrum Disorder Screening Data for Adolescents, which consists of 104 records [42]; and (3) Autistic Spectrum Disorder Screening Data for Children, which consists of 292 records [43]. Each repository consists of 21 attributes. After the merge process, the second dataset consists of 1,100 records; five samples are shown in Table 2. The first ten columns consist of Boolean values; there are another three numeric attributes and eight categorical attributes.
The datasets utilized in this study are publicly accessible and have been anonymized. The authors, however, have limited knowledge regarding the specific process by which the facial images were classified, as the Autistic Children Data Set was procured from a public source (Kaggle). It is presumed that the contributors who uploaded the dataset annotated the images, with medical professionals such as clinicians and doctors involved in the diagnosis and classification of the cases.

Data pre-processing
The data pre-processing phase readies the datasets for the subsequent classification and optimization stages. As the data acquisition phase outlines, this study employs two different datasets (i.e., images and numerical records), necessitating distinct pre-processing techniques. The images in the Autistic Children Data Set vary in size; therefore, they are resized to a uniform dimension of (128, 128, 3). Given that this dataset is balanced, there is no need for data-balancing techniques at this stage. The numerical dataset contains categorical attributes, which are transformed using label encoding; this method converts categorical labels into numerical ones. Cells with question marks, representing unanswered entries, are replaced with zeros.
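The label encoding and missing-value handling described above can be sketched as follows; this is a minimal stand-in (function names are ours), not the paper's implementation:

```python
def label_encode(column):
    """Assign integer codes to categorical labels in first-seen order,
    mirroring the label-encoding step described above (sketch)."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in column]

def replace_unanswered(column):
    """Replace '?' cells (unanswered entries) with zeros."""
    return [0 if v == "?" else v for v in column]
```

For example, `label_encode(["m", "f", "m"])` yields `[0, 1, 0]`, and `replace_unanswered(["?", 7])` yields `[0, 7]`.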
Five data scaling techniques are utilized for the images in the current study: (1) normalization, (2) standardization, (3) min-max scaling, (4) max-abs scaling, and (5) robust scaling. The equations behind them are shown in Equation (1) to Equation (5), respectively, where X_i is the input record, X_o is the scaled output record, μ is the record mean, σ is the record standard deviation, Q_1 is the first quartile, and Q_3 is the third quartile.
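Since Equations (1) to (5) are not reproduced here, the following sketch uses the standard textbook definitions of four of these scalers; the paper's exact formulas may differ slightly:

```python
from statistics import mean, median, pstdev, quantiles

def standardize(xs):
    """Standardization: (x - mu) / sigma."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def min_max_scale(xs):
    """Min-max scaling: (x - min) / (max - min), mapping into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def max_abs_scale(xs):
    """Max-abs scaling: x / max(|x|), mapping into [-1, 1]."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]

def robust_scale(xs):
    """Robust scaling: (x - median) / (Q3 - Q1), resistant to outliers."""
    q1, _, q3 = quantiles(xs, n=4)
    med = median(xs)
    return [(x - med) / (q3 - q1) for x in xs]
```

For instance, `min_max_scale([0, 5, 10])` returns `[0.0, 0.5, 1.0]`.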

GTO initial population generation
This study utilizes GTO optimization for both datasets. The GTO is used with the first dataset and DL CNN models to find the best model hyperparameters, while it is used with the second dataset and machine learning models to select the most promising features. However, the initial population generation is the same for both. The population is randomly generated, and the population size is set to N_max. Each solution in the population is a vector of size (1 × D), where each element is in the range [0, 1]. The value of D is determined with respect to the dataset: D equals the number of hyperparameters for the first dataset and the number of attributes for the second dataset. Equation (6) illustrates the initialization of the population matrix, where population represents the whole population matrix, LB and UB represent the lower and upper boundaries, and rand represents random values in [0, 1].
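The initialization described by Equation (6) can be sketched as follows (an illustrative implementation; the function name is ours):

```python
import random

def initialize_population(n_max, d, lb=0.0, ub=1.0):
    """Build the (N_max x D) initial population: each element is
    lb + rand * (ub - lb) with rand drawn uniformly from [0, 1)."""
    return [[lb + random.random() * (ub - lb) for _ in range(d)]
            for _ in range(n_max)]
```

With the boundaries used in this study (LB = 0, UB = 1), every element is simply a uniform draw from [0, 1).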

Classification and GTO optimization phase
The learning phase begins once the datasets have been pre-processed and the initial population has been created. In this phase, various transfer learning hyperparameters, such as data augmentation and batch size, are optimized using the GTO metaheuristic optimizer. For each pre-trained transfer learning model being utilized, the goal is to determine the best hyperparameter values. The processes involved in this step are summarized in Fig. 2. The first and second steps in the figure run once, while the others iterate T_max times.

Fitness score calculation
In this step, the fitness function score for each solution is evaluated with respect to the given dataset. As previously mentioned, each solution cell contains a random value in [0, 1]. Therefore, mapping these floating-point numbers to corresponding values is necessary for each dataset. For the image dataset, the cell values are mapped to hyperparameters, as outlined in Table 3. For the numerical dataset, the cell values are mapped to Boolean flags determining whether to retain or drop the feature columns.
How does the mapping from cell values to hyperparameters work? Consider mapping the batch size from a solution cell to the associated hyperparameter. The range of batch sizes from which to choose must first be determined; this paper utilizes the "4 → 48 (step = 4)" range, which yields twelve possible values. Equation (7) computes the selected index. For instance, if the random numeric value is 0.75 and there are 12 possibilities, then the index is 9 (i.e., a batch size of 36). The ranges of the hyperparameters are shown in Table 4. The target pre-trained transfer learning model is assembled after mapping each element in the solution to its associated hyperparameter. This study's pre-trained transfer learning CNN models include DenseNet169, DenseNet201, MobileNet, MobileNetV2, MobileNetV3Small, and MobileNetV3Large with the "ImageNet" pre-trained weights [44,45]. In the current investigation, each pre-trained transfer learning CNN model learns on the split subsets for 5 epochs, and its generalization is tested on the entire dataset.
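The worked example above (a cell value of 0.75 over 12 options giving index 9, i.e., batch size 36) can be reproduced with the following sketch. Since Equation (7) itself is not shown here, the 1-based indexing below is an assumption chosen to match that example:

```python
import math

def map_cell_to_option(v, options):
    """Map a solution-cell value v in [0, 1] to one discrete option.

    idx = floor(v * n) is treated as a 1-based index (clamped to at
    least 1), which reproduces the paper's example: v = 0.75 over 12
    batch sizes -> index 9 -> value 36. (Assumed indexing convention.)
    """
    idx = max(1, math.floor(v * len(options)))
    return options[idx - 1]

batch_sizes = list(range(4, 49, 4))  # "4 -> 48 (step = 4)": 12 options
```

Under this convention, `map_cell_to_option(0.75, batch_sizes)` returns 36, and the extremes 0.0 and 1.0 map to 4 and 48, respectively.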

Fig. 2. Proposed autism detection framework integrating deep learning, machine learning, and gorilla troops optimizer (GTO).
How does the mapping from cell values to a new data subset work? Equation (8) maps the cell values to Boolean flags to keep or drop the feature columns: for the cell at index i, if its value is greater than or equal to 0.5, the feature column is kept; otherwise, it is dropped.
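The keep-or-drop rule of Equation (8) amounts to thresholding each cell at 0.5; a minimal sketch (the column names below are hypothetical):

```python
def apply_feature_mask(solution, feature_columns):
    """Keep a feature column when its solution cell is >= 0.5,
    drop it otherwise, per Equation (8) as described above."""
    return [col for cell, col in zip(solution, feature_columns)
            if cell >= 0.5]
```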
The ML models used in this study include Decision Tree (DT), Extra Trees (ET), and Light Gradient Boosting Model (LGBM). Grid search is applied to fetch the best hyperparameters for the ML models. There are two hyperparameters for the DT and ET (i.e., criterion and splitter): the splitter has the options best and random, while the criterion has the options Gini and entropy. The LGBM has a single hyperparameter (i.e., learning rate) with the values [0.01, 0.1, 1.0]; the number of estimators is fixed at 300. Five-fold cross-validation is used for the ML models.
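The grids described above can be enumerated as follows; this sketch only lists the combinations a grid search would visit, with the model fitting and cross-validation omitted:

```python
from itertools import product

# Hyperparameter grids as described above (DT/ET: criterion x splitter;
# LGBM: learning rate with 300 estimators fixed elsewhere).
GRIDS = {
    "DT":   {"criterion": ["gini", "entropy"], "splitter": ["best", "random"]},
    "ET":   {"criterion": ["gini", "entropy"], "splitter": ["best", "random"]},
    "LGBM": {"learning_rate": [0.01, 0.1, 1.0]},
}

def grid_points(grid):
    """Enumerate every hyperparameter combination in a grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]
```

The DT and ET grids each contain four combinations, and the LGBM grid contains three.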
How is performance evaluated? Various performance metrics, such as accuracy, Area Under the Curve (AUC), and specificity, are computed to assess the model's performance. Specifically, accuracy is calculated by dividing the number of correct predictions by the total sample count. Sensitivity, or recall, indicates the proportion of actual positive samples correctly identified, reflecting the classifier's ability to detect positive instances. Specificity, on the other hand, deals with truly negative samples, indicating the proportion of actual negative samples correctly identified. Precision represents the proportion of true positives among all instances classified as positive. The F1-score is the harmonic mean of precision and recall, providing a balanced measure of these two metrics. The AUC intuitively evaluates the overall quality of the classifier. In this study, we employ the following performance metrics: Accuracy (Equation (9)), Precision (Equation (10)), Specificity (Equation (11)), Recall (or Sensitivity) (Equation (12)), Area Under Curve (AUC), Intersection over Union (IoU), Dice (Equation (13)), Cosine Similarity, F1-score (Equation (14)), Youden Index (Equation (15)), Balanced Accuracy, and Overlap Index.
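The count-based metrics above follow directly from confusion-matrix entries, as in this sketch (AUC, IoU, Dice, and the similarity measures need more than raw counts and are omitted):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the headline metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    f1          = 2 * precision * recall / (precision + recall)
    youden      = recall + specificity - 1    # Youden Index
    balanced    = (recall + specificity) / 2  # Balanced Accuracy
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity,
            "f1": f1, "youden": youden, "balanced_accuracy": balanced}
```

For example, with tp = 8, tn = 9, fp = 1, fn = 2, the accuracy is 17/20 = 0.85 and the recall is 0.8.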

Population updating
The population is ordered in descending order according to fitness ratings; consequently, the first solution is the best and the last is the worst. This step determines X^t_best and X^t_worst, which are required for the population update process. The GTO metaheuristic optimizer determines the ideal hyperparameters for each CNN model. The GTO's operation comprises three stages: (1) three mechanisms for exploration, (2) one mechanism for exploitation, and (3) one mechanism for competition for adult females. Equation (16) represents the expanded exploration process; the exploitation mechanism is presented in Equation (17).
Equation (18) presents the mechanism for competition for adult females, where r_1, r_2, and r_4 are three random values, X^(t)_r represents a random solution from the population, X_silverback is the silverback gorilla position vector (i.e., the best solution), Q mimics the impact force, and the coefficient vector A represents the level of violence in a conflict [48].

The suggested framework overall pseudocode
The steps are computed iteratively for a maximum number of iterations T_max. After that, the best combination can be used in any subsequent analysis. Algorithm 1 summarizes the suggested overall classification, learning, and hyperparameter optimization strategy.
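The iterate-sort-update skeleton of Algorithm 1 can be sketched generically; the specific GTO exploration/exploitation updates (Equations (16)-(18)) are abstracted behind an `update` callback here, and the toy fitness function and seed are assumptions for the example:

```python
import random

def optimize(fitness, init_population, update, T_max=50):
    """Generic loop mirroring Algorithm 1: evaluate, sort, update, repeat."""
    population = init_population()
    for t in range(T_max):
        # Descending fitness order: first is X_best, last is X_worst
        population.sort(key=fitness, reverse=True)
        x_best, x_worst = population[0], population[-1]
        population = update(population, x_best, x_worst, t)
    return max(population, key=fitness)

# Toy usage: maximize -(x - 3)^2 over scalar candidates with a
# placeholder update that moves each solution halfway toward the best
random.seed(0)
best = optimize(
    fitness=lambda x: -(x - 3) ** 2,
    init_population=lambda: [random.uniform(-10, 10) for _ in range(20)],
    update=lambda pop, b, w, t: [xi + 0.5 * (b - xi) for xi in pop],
)
print(round(best, 2))
```

In the paper's setting, each candidate would encode a CNN hyperparameter combination (first dataset) or a feature-selection vector (second dataset), and `fitness` would train and evaluate the corresponding model.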

Experiments and discussions
The experiments' results and analyses are presented in the current section. The common configurations used in the experiments for both datasets are listed in Table 4.

The "Autistic Children Data Set" experiments
For the first dataset, the best hyperparameters are presented in Table 5, while the corresponding performance metrics are presented in Table 6. Fig. 3 summarizes the best performance metrics for each TL model graphically; the metrics are displayed on the x-axis, and the scores on the y-axis. The results show that a TF learning ratio greater than or equal to 50% is recommended, and three models recommend the categorical cross-entropy loss function.
Four models recommend the SGD optimizer, three models recommend the robust scaling technique, all models recommend applying data augmentation, and five models recommend horizontal flipping. The lowest loss value, 0.512, is reported by the DenseNet169 model. The highest accuracy, AUC, IoU, Dice, cosine similarity, and F1-score values are 86.55%, 90.76%, 89.66%, 90.18%, 87.22%, 73.11%, and 86.55%, reported by DenseNet201, DenseNet169, DenseNet201, DenseNet201, DenseNet169, DenseNet201, and DenseNet201, respectively. Hence, DenseNet201 and DenseNet169 are considered the best models.

Table 5
Optimal hyperparameters for pre-trained CNN models applied to the autistic children dataset.

The second merged dataset experiments
The current study uses three numerical datasets, as discussed in Subsection 3.4. Table 7 shows the performance metrics before feature selection, while Table 8 shows them after feature selection. The results show that feature selection using GTO improves the performance metrics and decreases the elapsed time. They also show that five experiments recommend standardization and that the last five attributes are unnecessary in the classification process. Fig. 4 shows a graphical comparison before and after feature selection.
Potential limitations of the study include the limited size and heterogeneity of the datasets used, which may affect the generalizability of the findings. Furthermore, the study focuses mainly on diagnostic accuracy without considering other relevant clinical outcomes, such as the impact of the proposed intelligent system on patient management and quality of life. Finally, this study does not address the interpretability of the proposed framework, which is an essential aspect of medical diagnosis and rehabilitation.

Conclusions and future work
The study proposes a Deep Learning and Machine Learning-based intelligent system using the Artificial Gorilla Troops Optimizer (GTO) metaheuristic to detect autism spectrum disorders (ASD). Two datasets are used: facial images of autistic and non-autistic children, and merged numerical repositories. Various data scaling techniques are applied. GTO is employed to determine the optimal hyperparameters of the deep learning CNN models for the first dataset, whereas it is utilized to find the best features for the second dataset with machine learning models. The proposed method has potential clinical applications in aiding doctors to accurately diagnose ASD, leading to more effective treatments and better patient outcomes. In future research, the study plans to expand to a larger dataset and to investigate other transfer learning architectures. The proposed method also has the potential to be applied in other medical areas where diagnosis is challenging, leading to improved medical decision-making and patient outcomes.
Compliance with Ethical Standards: Ethics Approval. This article contains no studies with human participants or animals performed by the authors. The authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Table 6
The best performance metrics reported by the Autistic Children Dataset.

Fig. 4 .
Fig. 4. Graphical comparison of performance metrics before and after feature selection, with green bars representing "before" and blue bars indicating "after".

Table 1
Summary of previous studies in the field.

Table 2
Samples from the second merged dataset.

Table 3
Hyperparameters for optimization using transfer learning and image dataset.

Table 4
Configurations of the conducted experiments.
A.M.Almars et al.

Table 7
Performance metrics before feature selection for the second dataset.

Table 8
Performance metrics following the feature selection process for the second dataset.