Optimized MobileNetV3: a deep learning-based Parkinson’s disease classification using fused images

Background and Objective: Parkinson’s disease (PD) is a progressive neurological condition that manifests motor and non-motor symptoms. Tremors, muscle rigidity, slowed movement, impaired balance, and psychological symptoms are among its signs, and PD patients frequently experience vocal difficulties early in the course of the disease. It is essential to diagnose and treat PD early to ensure that patients live healthy and productive lives. Methods: The dynamics of handwritten records are one of the critical mechanisms supporting PD identification and assessment. Several machine-learning techniques have been researched for the early detection of this disease, yet most of these manual feature extraction methods suffer from poor performance and accuracy, which is not acceptable when diagnosing such a chronic condition. We therefore propose a powerful deep learning model for the early diagnosis of PD. In the preprocessing stage, multi-focus image fusion was used to enhance the quality of the input images, and a Pyramid channel-based feature attention network (PCFAN) selects the critical features; classification is performed with MobileNetV3. Results: To enhance classification performance further, the MobileNetV3-based approach was optimized by the Improved Dwarf Mongoose Optimization algorithm (IDMO). The efficiency of the approach was tested on the PPMI and NTUA datasets. Our proposed approach obtains 99.34% accuracy, 98.53% sensitivity, 97.78% specificity, and a 99.12% F-score. Conclusion: The optimized model outperforms previously reported methods on both datasets.


INTRODUCTION
Parkinson's disease (PD) is a progressive disorder of the nervous system that gradually impairs brain function. It causes the brain's dopamine-producing cells to perish (Rezaee et al., 2022). Consequently, it affects the body's motor system. Pesticide exposure and a history of head injuries are leading risk factors for PD. Research indicates that people with PD are more prone to smoking, drinking alcohol, having high cholesterol, or consuming excessive food (Gunduz, 2021; Masud et al., 2021). The signs of PD include delayed movement, tremors, imbalance, poor posture, and speech distortion. Dopamine-boosting

Novel contribution
The significant key contributions of this article are as follows:
- In the pre-processing stage, we first applied an image contrast enhancement algorithm to improve the contrast of the given input image. Next, a mean filter was used to eliminate noise from the input image. The contrast-enhanced image and the filtered image were then blended into a single fused image using the multi-scale morphological gradient method.
- To extract features from the preprocessed image, we present a new Pyramid channel-based feature attention network (PCFAN) that employs a multi-stage design with attention blocks at every step.
- To classify the levels of PD (mild, moderate, and severe), we employed the MobileNetV3 technique with the Improved Dwarf Mongoose Optimization (IDMO) algorithm to improve the classification accuracy.
- Several ablation experiments were performed on the PPMI and NTUA datasets. The experimental results show that the suggested network outperforms all other state-of-the-art methods.

RELATED WORKS
This section summarizes current studies on the machine learning algorithms used in PD categorization and provides information on the most recent deep learning techniques.
To assess gait data and develop a DNN classifier for Parkinson's disease, El Maachi, Bilodeau & Bouachir (2020) introduced a 1D convolutional neural network (1D-Convnet). The suggested framework processes 18 parallel 1D signals from foot sensors measuring the vertical ground reaction force (VGRF). A total of 18 parallel 1D-Convnets corresponding to the system inputs make up the first segment of the network. In the second component, a fully connected network, the outcomes of the 1D-Convnets are concatenated to produce the final classification. Their tests showed that the suggested strategy detects PD from gait data with high reliability, reaching a 98.7% accuracy rate. Using deep learning, Loh et al. (2021) developed a 2D-CNN model to diagnose Parkinson's disease automatically. Their computerized PD detection algorithm used the suggested 2D-CNN model to classify spectral images into PD patients with or without dopamine-boosting medication and healthy controls. The suggested model obtained a high accuracy of 99.46% for multi-class categorization employing tenfold cross-validation. Ortiz et al. (2019) suggested a technique for PD identification utilizing isosurface-based characteristics and CNNs. They employed convolutional neural network (CNN) models based on LeNet-5 and AlexNet to identify isosurfaces and extract descriptive information. Similar to how contour lines connect places of equal height, isosurfaces link voxels with a specified intensity or value. The effort culminated in a categorization system that used supervised learning via CNN architectures to categorize DaTSCAN pictures. Karaman et al. (2021) created deep CNNs to automatically identify PD using voice signals obtained from biomarkers. The established CNN algorithms comprised two primary stages: data pre-processing and fine-tuning-based transfer learning. They investigated whether combining a speech dataset from a large corpus with fine-tuning of transfer learning models could improve PD identification. The findings showed that the suggested deep CNN model, which combines transfer learning with a fine-tuning strategy, can diagnose PD with 91.17% accuracy.
Using one-dimensional convolutions and bidirectional gated recurrent units (BiGRU), Diaz et al. (2021) developed a novel categorization framework to investigate the utility of handwriting's sequential information for detecting Parkinson's symptoms. In this research, the raw sequences and derived features were subjected to one-dimensional convolutions; the generated sequences were then fed to BiGRU layers to produce the final categorization. The recommended approach outperformed other existing alternatives in comparison. Olivares et al. (2020) developed an improved ELM by utilizing the Bat Algorithm, which augments the machine learning method's training phase to enhance accuracy while lowering or maintaining loss during learning. The approximation technique simultaneously defines an optimal vector of input weights and bias values. It is intended to optimize the ELM's training phase to produce the best categorization model. Compared with existing approaches, the presented approach yields better performance.
Employing deep learning (DL) approaches, Vyas et al. (2022) offered two unique methods: 2D and 3D convolutional neural networks (CNNs) trained on axial-plane MRI data. Four pre-processing stages, N4 bias correction, histogram matching, z-score normalization, and image scaling, were performed to increase the algorithm's effectiveness. Using this information, the 3D model could categorize the test data with 88.9% accuracy and a 0.86 area under the curve (AUC).

Limitations
However, these studies have some limitations. Due to the computational difficulty of two-dimensional CNN models, training is time-consuming. Substantial amounts of computer memory are necessary, as a method may crash if the memory required for training exceeds what is available. The limited sample size of the PD datasets employed in these investigations could also constrain the models' generalizability.
To overcome these problems, a Pyramid channel-based feature attention network (PCFAN) is used to extract the relevant features and reduce computational complexity. In our proposed method, we used a large number of images for training, with sufficient memory, for better prediction. Compared with our proposed approach, the existing approaches yield lower prediction accuracy and higher error rates, and also require more computational time. A 99.34% accuracy rate is achieved by this system, offering results that compare favorably with the recently suggested approaches. It can therefore be concluded that the input data is significantly reduced when the models are computed while the relevant information is retained, producing excellent classification accuracy with a low computational effort.

PROPOSED METHODOLOGY
The proposed methodology comprises three phases: preprocessing, feature extraction, and classification. During the pre-processing step, we first used the image contrast enhancement algorithm to improve the contrast of the provided input image, and then a mean filter was used to remove the noise.
The contrast-enhanced image and the filtered image were then combined into a single fused image using a multi-scale morphological gradient approach. We introduce a novel Pyramid channel-based feature attention network (PCFAN) that uses a multi-stage design with attention blocks at each level to extract features from the preprocessed image. Finally, the images were classified as mild, moderate, and severe with the help of MobileNetV3, whose hyperparameters were optimized by the IDMO algorithm. Figure 1 represents the architecture diagram of the proposed methodology.

Image preprocessing
This subsection provides a detailed explanation of the proposed preprocessing algorithm. Preprocessing methods remove some contrast abnormalities to improve classification results. To accomplish this, we created two copies of the input image. Contrast adjustment was performed on the first copy to enhance the visualization of the region of interest. The image's contrast was altered using the maximum and minimum values of the image pixels.

Image contrast enhancement algorithm
A novel approach suggested by Ying et al. (2017) provides accurate contrast enhancement and was utilized to build the enhancement dataset. The weight matrix for image fusion was first built using illumination prediction methods. The algorithm then functioned as follows. The camera response model allowed multiple exposure images to be combined. The optimal exposure ratio was then found so that the synthetic image was well exposed in areas where the source image was underexposed. Finally, a weight matrix was used to combine the input image and the generated image into a superior image. Equations (1) and (2) provide the essential formulas used by the algorithm. The images were integrated as in Eq. (4) to create an image in which all pixels are well exposed.
Here, N indicates the number of images, P_i is the i-th image of the exposure set, W_i represents the weight map of the i-th image, c is the index of the three color channels, and R is the enhancement's outcome. P_i is calculated from Eq. (2).
g is referred to as the brightness transform function (BTF), and k_i is the exposure ratio. In our investigation, the BTF was the beta-gamma model of Eq. (3):

g(P, k) = βP^γ, with β = e^{b(1 − k^a)} and γ = k^a.   (3)

The parameters β and γ are computed from the camera parameters a and b and the exposure ratio k. As in the initial investigation, we used constant parameters (a = 0.3293; b = 1.158). At the end of the method, Eq. (4) was used to produce the enhanced image.
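As an illustration, the beta-gamma BTF described above can be sketched in a few lines; the closed form (β = e^{b(1 − k^a)}, γ = k^a) follows the camera response model of Ying et al. (2017), and the constants are those reported in the text.

```python
import numpy as np

# Sketch of the beta-gamma brightness transform function (BTF).
# beta = exp(b * (1 - k**a)) and gamma = k**a follow the beta-gamma
# camera response model; A and B are the constants reported in the text.
A, B = 0.3293, 1.158

def btf(P, k, a=A, b=B):
    """Map pixel values P (in [0, 1]) under exposure ratio k."""
    gamma = k ** a
    beta = np.exp(b * (1.0 - gamma))
    return beta * P ** gamma

# An exposure ratio of k = 1 must leave the image unchanged.
P = np.linspace(0.0, 1.0, 5)
print(btf(P, 1.0))
```

Note that for k = 1 both β and γ reduce to 1, so the transform is the identity, which is a quick sanity check on any implementation.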

Mean filtering
Using image-flattening templates for convolution processing is typical for image de-noising in the spatial domain. The foundation of mean filtering is replacing a pixel's grey value with the average of the grey values of all pixels in its neighborhood. For a pixel (a, b) in a source image f(a, b), with a surrounding area S containing M pixels, the filtered image g is created as:

g(a, b) = (1/M) Σ_{(i,j)∈S} f(i, j).
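The neighborhood-averaging step above can be sketched as follows; edge replication at the borders is an implementation choice not specified in the text.

```python
import numpy as np

# Minimal sketch of 3x3 mean filtering: each pixel is replaced by the
# average grey value of its M = 9 neighbours (the pixel itself included).
# Borders are handled by edge replication, an implementation choice.
def mean_filter(img, size=3):
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out

noisy = np.array([[10, 10, 10],
                  [10, 100, 10],
                  [10, 10, 10]], dtype=float)
print(mean_filter(noisy)[1, 1])  # centre spike averaged down to 20.0
```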

Multi-scale morphological gradient
The multi-scale morphological gradient (MSMG) is an efficient operator that extracts gradient information from an image to show the contrast level in a pixel's neighborhood. As a result, the MSMG technique is quite effective and widely used for edge detection and image segmentation. MSMG has been employed in multi-focus image fusion as a focus measure.
The details of MSMG are as follows. A multi-scale structuring element is derived from a fundamental structuring element, denoted SE_1; the element at scale t, SE_t, is obtained by dilating SE_1 with itself t − 1 times. Using the morphological gradient operator on image f, the gradient feature G_t at scale t can be expressed as

G_t = (f ⊕ SE_t) − (f ⊖ SE_t),

where the morphological dilation and erosion operators are denoted by ⊕ and ⊖, respectively, and t refers to the number of scales. The multi-scale morphological gradient is then derived from the gradient features by estimating the weighted sum of gradients across all scales:

MSMG = Σ_t w_t · G_t.

The same preprocessed images are shown in Fig. 2.
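The multi-scale gradient computation can be sketched as below; the per-scale weights w_t = 1/(2t + 1) are a common choice in the MSMG literature and are an assumption here, as the text does not give them.

```python
import numpy as np

# Sketch of the multi-scale morphological gradient (MSMG) focus measure.
# At each scale t, G_t is dilation minus erosion with a square structuring
# element grown from the basic element; the scales are combined as a
# weighted sum with assumed weights w_t = 1 / (2t + 1).
def dilate(img, size):
    pad = size // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.array([[p[i:i + size, j:j + size].max() for j in range(w)]
                     for i in range(h)])

def erode(img, size):
    pad = size // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.array([[p[i:i + size, j:j + size].min() for j in range(w)]
                     for i in range(h)])

def msmg(img, scales=3):
    grad = np.zeros(img.shape, dtype=float)
    for t in range(1, scales + 1):
        size = 2 * t + 1                    # structuring element grows with t
        g_t = dilate(img, size) - erode(img, size)
        grad += g_t / (2 * t + 1)           # weighted sum over scales
    return grad

# A step edge produces a strong gradient response near the boundary only.
step = np.hstack([np.zeros((4, 4)), np.ones((4, 4))])
print(msmg(step)[2, 3] > msmg(step)[2, 0])
```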
Here, w_t denotes the weight of the gradient at the t-th scale.

Feature extraction
The early stages of the network increase the depth of the feature maps to 64 and 128, respectively, and halve the resolution of the feature maps. Unlike earlier studies, which mainly used the third stage's output features, the PCFA, made of several channel-attention blocks, aggregates the output features of all three stages. The channel-attention block can extract both spatial and channel-dimension information. Then, an image-reconstruction network with a single CNN layer is used to recover the original clean image.
Channel attention block: To make sure that the network collects additional features, the channel attention strategy is employed in this study to look at the dependencies between feature channels.
The framework of PCFA is shown in Fig. 3. A concatenation operation produces f_c at position (i, j) from [v_1, v_2, ..., v_C]; as a result, l is the concatenation of v_k (k = 1...C). These steps record how the aggregated features depend on one another across channels. The convolution layers, sigmoid activation function, and ReLU are denoted by f, g, and r, respectively. The goal of the first convolution layer f_1 is to reduce the number of input feature channels. The features are then activated by the ReLU r, and a second convolution layer f_2 expands them back to their original width. The final output feature F_out is attained by an element-wise product of the attention weights and the original feature f.

PCFAN: This model can retrieve features from various CNN layers and merge those characteristics simultaneously to produce more valuable features. Earlier techniques, however, typically employ an intuitive fusion process, such as addition or concatenation. In order to integrate the advantages of the feature pyramid and the channel attention mechanism, we present the PCFA approach. PCFA comprises two layers of upsampling, two levels of concatenation, and four channel-attention blocks. In PCFA, both top-down and bottom-up pathways are available. Features from three tiers feed the corresponding channel-attention blocks of the bottom-up pathway. In the top-down pathway, PCFA reconstructs layers with higher spatial resolution from their semantically richer layers. Combining information from the top-down and bottom-up routes makes it possible to express features effectively and to understand their relative importance at different levels. The complementary information between low-level and high-level attributes can, therefore, be fully utilized by PCFA during extraction.
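A minimal sketch of the channel-attention computation described above (squeeze by global average pooling, reduce, ReLU, expand, sigmoid, channel-wise rescale); the dense matrices stand in for the convolution layers f_1 and f_2 and are random placeholders rather than trained parameters.

```python
import numpy as np

# Channel attention in numpy: global average pooling squeezes each channel
# to one value, a reduction layer (f1 + ReLU) and an expansion layer
# (f2 + sigmoid) produce per-channel weights in (0, 1), and the input
# feature map is rescaled channel-wise. W1/W2 are random stand-ins for
# the trained convolution weights.
rng = np.random.default_rng(0)

def channel_attention(feat, reduction=4):
    """feat: (C, H, W) feature map -> channel-reweighted feature map."""
    C = feat.shape[0]
    pooled = feat.mean(axis=(1, 2))                 # squeeze: (C,)
    W1 = rng.standard_normal((C // reduction, C))   # f1: reduce channels
    W2 = rng.standard_normal((C, C // reduction))   # f2: restore channels
    hidden = np.maximum(W1 @ pooled, 0.0)           # ReLU
    weights = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))  # sigmoid -> (0, 1)
    return feat * weights[:, None, None]            # element-wise rescale

feat = rng.standard_normal((8, 16, 16))
out = channel_attention(feat)
print(out.shape)  # (8, 16, 16)
```

Because the weights are squashed into (0, 1) by the sigmoid, the block can only attenuate channels, never amplify them, which is the standard squeeze-and-excitation behavior.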

Loss function
The suggested network is optimized using two loss functions, the Mean Square Error (MSE) loss L_mse and the edge loss L_edge. The MSE loss is employed to assess pixel-wise differences between image outcomes and is defined as follows:

L_mse = (1/N) Σ_{i=1}^{N} (I_out,i − I_gt,i)²,

where N is the number of pixels, I_out is the network output, and I_gt is the ground-truth image.
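The pixel-wise MSE loss can be sketched directly:

```python
import numpy as np

# Pixel-wise MSE loss between a restored image and its ground truth,
# a numpy sketch of the standard definition used for L_mse.
def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

pred = np.array([[0.0, 0.5], [1.0, 1.0]])
target = np.array([[0.0, 0.5], [1.0, 0.0]])
print(mse_loss(pred, target))  # 0.25
```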

Classification
In order to classify Parkinson's disease after successfully identifying brain images in scans, we suggest utilizing the MobileNetV3 deep learning model. MobileNet is a CNN family created by a team of Google, Inc. researchers for image classification. Across its iterations, MobileNet introduced several innovative ideas to decrease the number of parameters while maintaining excellent classification accuracy. Compared to several other CNN architectures of equivalent size, MobileNet performs well in terms of accuracy per multiply-add operation (MAdds). MobileNetV3 (Zivkovic et al., 2022), in particular, has the best top-1 accuracy among these models, which was the primary motivation for investigating it for this classification challenge.
MobileNet is built from a collection of construction blocks called bneck blocks. Figure 4A shows the overall MobileNet architecture, whereas Fig. 4B shows the details of a bneck block. MobileNetV1 used depth-wise convolutional operations in place of typical convolutions. In Fig. 4B, a residual link between the input and output tensors is shown. As depicted in Fig. 4, the creators of MobileNetV3 added both expansion and compression stages to the start and end of every bneck block. This setup is called an "Inverted Residual Block" (IRB), as the residual connections only connect the narrow input and output tensors.
The IRB idea contributed to further lowering the model's computational cost. To reduce computations further, the authors used linear activations rather than non-linear activation functions (such as ReLU) after filtering the input and output tensors. The authors included the SE (squeeze-and-excitation) module to finalize the MobileNetV3 concept.
The SE module also features the h-swish activation function added by the authors. The Swish activation function is defined as swish(x) = x · σ(x), where σ is the sigmoid function; h-swish approximates it by replacing the sigmoid with the piecewise-linear ReLU6(x + 3)/6. As described, a bneck block constructs a feature map optimized using SE components and residual connections. This motivated our decision to use the bneck block as the fundamental unit of an architecture resembling the UNet.
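The swish and h-swish activations mentioned above can be sketched and compared numerically:

```python
import numpy as np

# swish(x) = x * sigmoid(x), and the hardware-friendly h-swish used in
# MobileNetV3: h-swish(x) = x * ReLU6(x + 3) / 6, which replaces the
# sigmoid with a cheap piecewise-linear term.
def swish(x):
    return x / (1.0 + np.exp(-x))

def h_swish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

x = np.linspace(-4, 4, 9)
print(np.abs(swish(x) - h_swish(x)).max() < 0.5)  # close approximations
```

For large negative inputs h-swish is exactly zero, and for x ≥ 3 it is exactly x, which is what makes it cheap on mobile hardware.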

Improved dwarf mongoose optimization algorithm (IDMO)
This phase provides details on the IDMO algorithm working process.

The IDMO model
The Improved Dwarf Mongoose Optimization algorithm (IDMO) is suggested to improve DMO exploration and exploitation. This optimization technique modifies the DMO in three straightforward yet efficient ways. First, alphas are chosen differently: whereas the DMO's alpha selection adds computational load, the IDMO selects the alpha mongoose based on fitness, and a new operator is presented to control the alpha's movement. This increases the IDMO's exploration and exploitation ability.
Second, randomization changes the scout group motions to diversify the search and investigate previously unexplored places. The suggested approach performs optimization in three stages: the babysitters, the foraging area, and the abundant food source. The search agents are the individual mongooses, described as an n-by-d matrix. The modified alpha guides the group to unexplored territory during the exploration phase by following the modeled processes.
The exploitation is accomplished once the requirement for a babysitter swap is satisfied and the babysitters are switched. At this stage, the obtained solution is upgraded in order to yield superior results.

Population initialization
According to Eq. (14), a matrix of potential dwarf mongooses (X) is used to stochastically initialize the IDMO population. The population vector lies between the optimization problem's lower bound (L) and upper bound (U).

X = [ x_{1,1}  x_{1,2}  ...  x_{1,d−1}  x_{1,d}
      x_{2,1}  x_{2,2}  ...  x_{2,d−1}  x_{2,d}
      ...             x_{i,j}           ...
      x_{n,1}  x_{n,2}  ...  x_{n,d−1}  x_{n,d} ]   (14)

where n represents the total number of dwarf mongooses in a mound and d is the problem dimension. Each x_{i,j} in Eq. (15) reflects the position of the i-th mongoose in the j-th dimension.
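The initialization of Eq. (14) can be sketched as uniform sampling between the bounds:

```python
import numpy as np

# Sketch of the IDMO population initialization: n candidate dwarf
# mongooses with d dimensions are drawn uniformly between the problem's
# lower bound L and upper bound U.
rng = np.random.default_rng(42)

def init_population(n, d, L, U):
    return L + rng.random((n, d)) * (U - L)

X = init_population(n=10, d=4, L=-5.0, U=5.0)
print(X.shape, X.min() >= -5.0, X.max() <= 5.0)
```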

Alpha group
Babysitters are subtracted from the overall number of dwarf mongooses to get the total population size for this group. The fittest dwarf mongoose is chosen to be the alpha, as expressed in Eq. (16).
The alpha female uses "peep" vocalizations to keep the group integrated. As specified in Eq. (16), the IDMO moves about the problem space while searching. The alpha is initially identified as the fittest dwarf mongoose in the family and leads the other members of the pack towards a prospective food source. This deviates from the DMO, which relies solely on the vocalization of the alpha to sway the behavior of the other dwarf mongooses. The IDMO's exploration and exploitation are improved by the new operator. The IDMO uses the location of the alpha to establish the locations of the other mongooses.
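A sketch of fitness-based alpha selection; choosing the alpha with probability proportional to fitness is the form used in the original DMO literature and is assumed here (for non-negative fitness values), since the body of Eq. (16) is not reproduced in the text.

```python
import numpy as np

# Fitness-proportional alpha selection: each mongoose's probability of
# becoming alpha is its fitness divided by the population's total fitness.
# Assumes non-negative fitness values; a stand-in for Eq. (16).
rng = np.random.default_rng(1)

def select_alpha(fitness):
    prob = fitness / fitness.sum()
    return rng.choice(len(fitness), p=prob)

fitness = np.array([0.1, 0.2, 0.3, 0.4])
idx = select_alpha(fitness)
print(0 <= idx < len(fitness))
```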

Scout group
Scouts are responsible for selecting an appropriate sleeping mound, as dwarf mongooses are semi-nomadic and are reported never to return to a previous sleeping mound. Due to the dwarf mongooses' propensity for congregating around plentiful food sources, the scouts' fitness level is considered when choosing a prospective sleeping mound. Consequently, the fittest scout is chosen. The scouts are simulated as stated in Eq. (19).
The babysitters
Equation (20) provides the exchange criterion for the babysitters. Once the requirement is met, the counter is reset to 0, and the swapped babysitters communicate with the dwarf mongooses. By doing this, they can create mongooses better suited to their environment rather than starting from scratch, as in the DMO. If the counter L reaches zero, it is reset by multiplying L by the current iteration and CF.
Directing the dwarf mongooses' overall volitional movement, X_b, X_v, and X_u are chosen at random to take the position of the babysitters, and br denotes the birth rate. This simplifies the alpha-selection overhead, and the IDMO's computational complexity is significantly decreased. Dwarf mongooses forage under the leadership of the alpha female, which initiates the optimization process. The nest is tended by a small group known as the babysitters. Finding sufficient food sources resembles the IDMO's exploration phase, while tending the nest represents the exploitation phase. The search region is further investigated and exploited by looking for a sleeping mound at night.

RESULT AND DISCUSSIONS
This section investigates the specifics of the analytical results obtained using the suggested methodology, based on evaluation criteria such as sensitivity, accuracy, specificity, and F1 score. The five steps in this research are dataset description, preprocessing, feature extraction, classification, and analysis of outcomes. This study categorized PD using the PPMI and NTUA datasets. The full dataset was split into a training set containing 80% of the data and a validation set containing 20%. Several tests were conducted using various network configurations; the configuration options were kernel size, batch size, stride, and padding. During each epoch of model training, validation and training accuracy statistics were recorded, and the model was tested on the test set after every training cycle. The categorization accuracy remained above 99% across all experiments. The output samples of the proposed methodology are shown in Fig. 5.

Experimental setting
Windows 10 was used on an Intel i5 2.60 GHz processor with 16 GB of RAM. The investigations were conducted using Python, Keras, and TensorFlow within the Anaconda3 environment. The PPMI and NTUA datasets were used for validation to calculate the performance of our suggested strategy. The data samples were divided into two groups: one served as the training dataset and was used to build the classifier; in the second step, the classifier was assessed using the testing dataset.

Dataset description
The Parkinson's Progression Markers Initiative (PPMI) collection contained images of patients and controls accessed for the research. The PPMI was conducted across an international network of clinical sites. One goal of the PPMI was to gather medical, biological, consumer, and imaging data in order to hasten the establishment of biomarkers of PD progression, with the ultimate objective that these biomarkers be employed in therapeutic investigations. For this investigation, T1-weighted MR images from PPMI were chosen.
These images were produced with a 1.5-3 Tesla scanner; a scan takes about 20 to 30 min to complete. Axial, sagittal, and coronal views were used to acquire the three-dimensional (3D) sequence of T1-weighted MR images with a slice thickness of 1.5 mm or less. The dataset contains approximately 6,500 images. From the dataset, we utilized 4,550 (70%) images for training, and the remaining 1,950 (30%) images were utilized for testing.

NTUA
The NTUA Parkinson dataset contains MRI, DaT scans, and clinical information from PD-affected patients. About 42,000 images were available for research purposes. The frames per sequence and the resolution of the T1, T2, and FLAIR MRI image samples in this dataset varied for each image. From the dataset, we utilized 29,400 (70%) images for training, and the remaining 12,600 (30%) images were utilized for testing.

Evaluation criteria
To evaluate the prediction capabilities of the classifiers, evaluation measures were required.
The following metrics were employed to evaluate the proposed method.
Accuracy: the proportion of correct predictions among all predictions made.
Sensitivity: sometimes called the true positive rate (TPR) or recall, it measures the system's ability to identify positive cases correctly.
Specificity: commonly called the true negative rate (TNR), it measures the system's capacity to identify negative cases correctly.
Precision: also called the positive predictive value (PPV), it assesses the system's capacity to return only relevant outcomes.
F-measure: the harmonic mean of precision and recall.
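The five metrics above can be computed from the confusion-matrix counts; a minimal sketch for the binary case:

```python
import numpy as np

# Accuracy, sensitivity, specificity, precision, and F-measure computed
# from binary predictions via the confusion-matrix counts TP, TN, FP, FN.
def metrics(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # recall / TPR
    specificity = tn / (tn + fp)        # TNR
    precision = tp / (tp + fp)          # PPV
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f_measure

y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])
print(metrics(y_true, y_pred))
```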

Evaluation of training and testing time
The batch size, learning rate, momentum, and weight decay were 32, 0.03, 0.9, and 0.01, respectively. A 0.01 learning rate was used initially, and the learning rate reached saturation at the ReLU layer. The number of epochs was also a vital training parameter, as the network might otherwise be either under- or over-fitted. We ran the network through 200 epochs of training on these datasets. The suggested model's training and testing accuracy ranged between 0.98 and 0.99. The hyperparameter configuration is shown in Table 1. Figures 6 and 7 show the categorization accuracy and loss value of the proposed model with respect to the number of iterations. As observed in the figures, the approach used in this study had a positive convergence effect. We divided the dataset into two halves for the model's training and testing.

Differentiation of existing machine learning approaches vs proposed
In this section, we compared our proposed deep learning approach with existing machine learning approaches such as SVM, RVM, and Naïve Bayes. We found that the suggested strategy outperformed the existing approaches, demonstrating 99.06% accuracy, followed by SVM, GA-ELM, RVM, Decision tree, Naïve Bayes, ANN, and CNN with 92.35%, 89.22%, 90%, 92%, 93%, 94%, and 96%, respectively. The approach proposed in this study provides a better outcome than the other existing solutions. CNN was the second-best strategy, while GA-ELM performed less effectively than the other strategies. A comparison of existing approaches is shown in Fig. 8 and Table 2.

Comparison of transfer learning approaches
The existing transfer learning approaches, DenseNet121, VGG16, ResNet, MobileNet, and Inception V3, were compared with the proposed method.
F-score, sensitivity, accuracy, and specificity were used to compare the results. The comparison findings for the transfer learning approaches are presented in Table 3. Compared with the other approaches, the proposed method yielded superior performance. Figure 9 shows the comparison of existing approaches with the proposed method.
The receiver operating characteristic (ROC) curve shows the performance of a classification model across all thresholds. Two variables, TPR and FPR, are plotted on this curve; the ROC curve is shown in Fig. 10. Table 4 shows the comparison of features. The essential features were extracted based on the PCFA approach, which extracted all the relevant features and performed well compared with the previous methods. A comparison of existing machine learning approaches with the proposed deep learning approach is shown in Table 5; the machine learning approaches employed were neural network, naïve Bayes, random forest, and SVM. Compared with all four methods, the proposed approach exhibited greater performance in accuracy, sensitivity, and specificity. The results of the optimized MobileNetV3 and MobileNetV3 during the testing process are shown in Table 6 and Figs. 11 and 12. Before the optimization was employed, MobileNetV3 yielded 98.09% accuracy on the PPMI dataset and 96.53% on the NTUA dataset, slightly lower than the optimized approach. When we used the IDMO algorithm to optimize the proposed approach, the performance improved considerably.
Training accuracy, testing accuracy, and loss comparisons are shown in Table 7. The existing approaches, DenseNet, VGG 19, ResNet, and Inception V3, obtained lower performance than the proposed model. The approach proposed in this study, optimized by IDMO, showed high accuracy and a smaller loss.
A comparison with existing related works is presented in Table 8. The existing authors used various datasets, including PD, PPMI, and PD audio datasets, whereas our proposed approach employed the PPMI and NTUA datasets. Compared with the existing approaches, the proposed approach yields greater predictive performance; the second-best performance was reported by Loh et al. (2021). We used the PPMI and NTUA datasets for our experimental work and compared our proposed approach with other existing state-of-the-art approaches to analyze the efficiency of our work. The suggested approach performed better than the existing systems in terms of accuracy, sensitivity, specificity, and F-score, with 99.34%, 98.53%, 97.78%, and 99.12%, respectively. Previous studies achieved lower accuracy with higher computational time; in this research, the proposed methodology achieved higher classification accuracy with less computational time, using deep learning-based feature extraction and optimized classification techniques. Future studies will tackle three important subjects. First, the suggested expert system's performance will be evaluated using a variety of datasets. Second, hybridization of algorithms or new nature-inspired algorithms for feature selection can be investigated to identify PD and for other applications. Third, it will be crucial to broaden the variety of deep learning comparison techniques.

Figure 5
Figure 5 Result of the proposed methodology. Photo credit: B. Srinivasa Rao. Full-size DOI: 10.7717/peerj-cs.1702/fig-5

A total of 200 training epochs of the processed training set were used to train the suggested strategy, with the learning rate configured at 0.01. L1 or L2 regularization techniques were employed to penalize large weights in the network, which helps prevent the model from fitting the noise in the training data. Dropout, another regularization technique, randomly drops a fraction of neurons during training to reduce overfitting. Reducing the depth and width of the MobileNet model, for example by using smaller variants such as MobileNetV3 or custom architectures tailored to the specific task, can lower the computational cost further.

Table 1
Optimized hyperparameter used for training.

Table 2
Comparison of existing approaches.

Table 4
Comparison of extracting features.

Table 6
Result of optimized MobileNet V3 and MobileNet V3 during the testing process.

Table 7
Training and testing accuracy comparison with existing approaches.

CONCLUSION
Parkinson's diagnosis is an extremely challenging medical problem. Although it is technically difficult to confirm a Parkinson's diagnosis, practitioners can recognize the disorder by examining patients and looking at various symptoms. The proposed study used the optimized MobileNetV3 to examine MRI data and identify Parkinson's disease classes. The MobileNetV3 method was optimized with the help of the Improved Dwarf Mongoose Optimization algorithm (IDMO). For extracting features from the preprocessed image, we presented a new Pyramid channel-based feature attention network (PCFAN) that employs a multi-stage design with attention blocks at every step.

Table 8
Comparison of existing research with proposed.