Classification of breast cancer using a manta-ray foraging optimized transfer learning framework

Breast cancer is highly prevalent and particularly dangerous, and early detection and diagnosis improve the chances of survival. For medical image analysts, diagnosis is difficult, time-consuming, and repetitive, so automated medical image analysis can be a useful aid for detecting the disease. Recently, artificial intelligence has been used to help radiologists identify breast cancer more rapidly and reliably. Convolutional neural networks (CNNs), among other technologies, are promising tools for medical image recognition and classification. This study proposes a framework for automatic and reliable breast cancer classification based on histopathological and ultrasound data. The framework is built on CNNs and employs transfer learning and metaheuristic optimization. The Manta Ray Foraging Optimization (MRFO) algorithm is used to improve the framework's adaptability. Eight modern pre-trained CNN architectures are examined for transfer learning using the Breast Cancer Dataset (two classes) and the Breast Ultrasound Dataset (three classes). The framework uses MRFO to improve the performance of the CNN architectures by optimizing their hyperparameters. Extensive experiments record performance metrics including accuracy, AUC, precision, F1-score, sensitivity, Dice, recall, IoU, and cosine similarity. The proposed framework achieved an accuracy of 97.73% on histopathological data and 99.01% on ultrasound data. The experimental results show that the proposed framework outperforms other state-of-the-art approaches reviewed in the literature.


INTRODUCTION
With over two million cases diagnosed globally in 2020 and about seven hundred thousand deaths, breast cancer is classified as the most lethal female disease (Sung et al., 2021). By 2040, the number of patients is anticipated to rise to 4.07 million, with 1.4 million deaths (Global Cancer Observatory, 2020). With approximately 26% of women affected worldwide, it is regarded as the first cancer type to cause death. However, detecting and diagnosing breast cancer in its early stages can enhance survival rates by up to 80% (Yu et al., 2022). Clinicians must identify suspicious tumors after segmenting them to diagnose breast abnormalities. As the number of breast cancer patients climbs, clinicians will find it increasingly difficult to accurately detect the disease in a short time (Oza et al., 2022). For medical image analysts, the diagnosis process is complex, time-consuming, and monotonous. Medical image analysis is one of the most exciting research areas that have gained significant attention in academia and the medical sector, allowing for the detection and treatment of various illnesses, including breast cancer (Dewangan et al., 2022;Zerouaoui & Idri, 2022;Chowdhury et al., 2022;Mohamed et al., 2022). Breast cancers are classified into five stages according to Peiris et al. (2015). The stages progress from precancerous to metastatic or advanced cancer. Digital mammography is a tool for identifying early-stage breast cancer with no need for surgery (Demb et al., 2020). With mammography screening, radiologists can detect cancers in mammogram images with smaller diameters and seemingly random places. The ability of mammography to detect significant early signs has resulted in a 15% reduction in breast cancer deaths (Løberg et al., 2015). Breast ultrasound is a regularly used diagnostic modality in clinical practice (Pourasad et al., 2021). Breast cancer is caused by epithelial cells surrounding the terminal duct lobular unit. 
Cancer cells inside the basement membrane of the draining duct and the basement membrane of the terminal duct lobular unit are referred to as in situ, or non-invasive, cancer cells (Jabeen et al., 2022). The presence of axillary lymph node metastases is an important factor when selecting prospective therapy for breast cancer (Sun et al., 2020). Ultrasound imaging is one of the most commonly used modalities for detecting and characterizing breast disorders. It is a common imaging method that complements mammography in radiological cancer diagnostics.
Artificial intelligence (AI) technology has lately been used to aid radiologists in detecting breast cancer more quickly and accurately utilizing mammography (Zhang et al., 2021). Radiologists may concentrate on discrete or localized areas, but AI-based systems analyze mammograms as a whole at the pixel level and have spatially long-range memory. AI strategies for enhancing mammographic detection of breast cancer have been widely studied, involving several issues and challenges. The use of artificial intelligence (AI) in mammography analysis is widely thought to make the computer-aided diagnosis (CAD) process more interesting, effective, and objective. Earlier CAD systems relied on handcrafted features to recognize suspicious bulk using traditional machine learning algorithms (Muramatsu et al., 2016;Virmani et al., 2016;Yassin et al., 2018;Li et al., 2017). Manual feature extraction has led to a higher number of false positives (Dhungel, Carneiro & Bradley, 2017). Because the image has numerous details, identifying the optimal set of features for recognition becomes quite difficult. Deep learning methods have overcome the shortcomings of traditional machine learning approaches by learning an object's attributes during the training phase (Rupapara et al., 2022;Gardezi et al., 2019).
In recent decades, convolutional neural networks (CNNs) have been used in CAD systems for breast cancer diagnosis. A CAD system functions as a decision-support mechanism by providing extra information to discriminate between malignant and benign tumors (Oza et al., 2022). CAD can potentially increase the overall accuracy, sensitivity, and specificity of breast lesion diagnosis. In addition, it cuts down the number of false positives caused by human error. Several studies on the use of CAD systems to diagnose breast cancer have been published, and some of these systems have already passed clinical testing (Yu et al., 2022; Bruno et al., 2020). This study introduces a framework for reliable breast cancer classification based on histopathological and ultrasound data using CNN and Transfer Learning (TL) (Agarwal et al., 2021). The Manta Ray Foraging Optimization (MRFO) algorithm is used for parameter and hyperparameter optimization. The following points outline the current study's contributions:
- Presenting a novel CNN-based framework for automatic and reliable breast cancer classification based on histopathological and ultrasound data.
- Applying Transfer Learning using eight modern pre-trained models.
- Optimizing CNN and Transfer Learning hyperparameters via the MRFO algorithm to achieve the best configuration for each pre-trained model and better classification performance.
- Removing the need to set the hyperparameters of the CNN architecture manually, which makes the proposed framework more adaptable.
- Showing that the proposed framework outperforms most related research in terms of accuracy and other criteria.
The rest of the article is organized as follows: In 'Background', the background of deep learning, CNN, and metaheuristic optimization are introduced. 'Related Studies' reviews the related studies of breast cancer classification. In 'Methodology', the approach is described, as well as the proposed framework. In 'Numerical Results and Analysis', the experimental results are examined and debated. Finally, in 'Conclusions', the article is concluded.

BACKGROUND
This section presents comprehensive preliminaries for the most relevant subjects and methodologies employed in this study. Convolutional neural network (CNN), transfer learning (together with the pre-trained CNN networks utilized in this study), metaheuristic optimization, and the MRFO optimization technique are among the subjects covered.

Convolutional neural network (CNN)
Hinton invented the term ''deep learning'' in 2007 to describe machine learning models for high-level object representation, although it was not widely acknowledged until late 2012 (LeCun, Bengio & Hinton, 2015). Deep learning gained prominence in the field of computer vision after a deep learning technique based on a convolutional neural network (CNN) won the most well-known computer vision competition, ImageNet (Krizhevsky, Sutskever & Hinton, 2012). CNN is a deep learning system inspired by the human brain's visual cortex and seeks to mimic human visual function (Sarvamangala & Kulkarni, 2021). CNN is a significant advancement in image comprehension, including image classification, segmentation, localization, and detection. The fundamental reason for CNNs' extensive adoption is their efficiency in image recognition. The CNN takes an image as input and produces classification categories such as cancer or non-cancer. Local shift-invariant interconnections connect the layers. CNN is a multi-layer generative graphical model that employs pixel values in images as input information rather than features generated from segmented objects, obviating the need for feature calculation or object segmentation. CNNs, like human neurons, are made up of convolutions with learnable weights and biases. Convolutional layers, activation functions, pooling, and fully-connected layers are the primary building blocks of CNNs, as depicted in Fig. 1.
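The building blocks named above can be illustrated on a tiny grayscale image. The following is a minimal pure-Python sketch, not the networks used in this study; the function names, the 5 × 5 image, and the edge-detecting kernel are all illustrative assumptions.

```python
# Minimal sketch of CNN building blocks; every name and value here is an
# illustrative assumption, not part of the proposed framework.

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as computed by a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Activation function: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2x2(feature_map):
    """Pooling: keep the strongest response in each 2 x 2 window."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]

def dense(inputs, weights, bias):
    """Fully-connected unit: weighted sum of the flattened features."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# A 5 x 5 image containing a bright vertical stripe.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds where intensity rises left to right

features = max_pool2x2(relu(conv2d(image, edge_kernel)))
flat = [v for row in features for v in row]
score = dense(flat, weights=[1.0] * len(flat), bias=0.0)
print(features, score)  # [[2, 0.0], [2, 0.0]] 4.0
```

In a trained CNN the kernel and dense weights are learned from data rather than fixed by hand as they are here.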

Transfer learning
A key assumption in many machine learning (ML) algorithms is that training and testing data lie in the same feature space. This assumption, however, may not hold in various real-world applications. For example, there are instances where training data is prohibitively expensive or difficult to obtain (Baghdadi et al., 2022). As a result, high-performance classifiers trained using more readily available data from different domains are required. This strategy is known as transfer learning (TL) (Agarwal et al., 2021). TL performs particularly well when the target domain has limited data but the source domain has a large dataset. Informally, TL is the process of obtaining knowledge from a source domain D_S with a learning task T_S and applying that knowledge to enhance a learning task T_T in a target domain D_T, where D_S ≠ D_T or T_S ≠ T_T (Zhuang et al., 2020; Iman, Rasheed & Arabnia, 2022).

CNN architectures
This section highlights the CNN architectures pre-trained to perform transfer learning. Among several designs, InceptionV3, Xception, EfficientNetB7, NASNetLarge, VGG19, SeNet154, DenseNet201, and ResNet152V2 are discussed since they were used in this work.

InceptionV3:
With transfer learning, InceptionV3 achieved high classification accuracy in numerous biomedical applications (Ashwath Rao, Kini & Nostas, 2022;Al Husaini et al., 2022). The symmetric and asymmetric construction components of the InceptionV3 model include convolutions, average pooling, max pooling, concatenations, dropouts, and fully connected layers (Szegedy et al., 2016). Batch normalization is employed frequently throughout the model and for activation inputs. In addition, InceptionV3 employs Softmax to compute Loss, Max-pooling for spatial pooling, and FC for final classification.

Xception:
Xception, which stands for Extreme version of Inception, was proposed as an extension of the Inception architecture built entirely on depthwise separable convolutions rather than conventional convolutions (Chollet, 2017). Using depthwise separable convolutional layers simplifies the network's computations. This aids in learning efficient features from images and improves model performance.

EfficientNetB7:
The core building component of EfficientNet architecture is Mobile Inverted Bottleneck Convolution (MBConv) (Sandler et al., 2018). The number of these MBConv chunks varies within the EfficientNet network family. From EfficientNetB0

VGG19:
The VGG-19 deep learning model has 16 convolutional layers and three FC layers (Simonyan & Zisserman, 2014). Maxpool, FC, ReLU, dropout, and softmax layers are among its 41 total layers. VGG tackled the impact of large-size filters by replacing them with a stack of (3 × 3) filters. However, the calculation complexity is increased by using small-size filters.

SeNet154:
Squeeze-and-Excitation Networks provide a building block for CNNs that improves channel interdependencies at nearly no computational cost (Hu, Shen & Sun, 2018). Aside from providing a significant performance improvement, they are also simple to include in existing architectures. In essence, the Squeeze-and-Excitation Network proposes a channel-wise attention mechanism for CNNs. The blocks learn to use global information, boosting informative features while suppressing less useful ones.

DenseNet201:
The dense convolutional network (DenseNet) is a deep learning model that connects each layer to all subsequent layers in a feedforward fashion (Huang et al., 2017). Each layer produces a feature map, and the feature maps of all preceding layers serve as the input for the following layer. Connecting all layers directly allows optimum information transfer inside the network. DenseNet's key benefits are that it drastically reduces the number of parameters, alleviates the vanishing-gradient problem, improves feature propagation, and encourages feature reuse.

ResNet152V2:
A Residual Network (ResNet) is a CNN architecture composed of many stacked convolutional layers. In earlier CNN designs, adding more layers eventually degraded performance. ResNet addresses this with residual (skip) connections, which allow it to be very deep while remaining highly performant (He et al., 2016). The main difference between ResNetV2 and the original (V1) is that V2 applies batch normalization before each weight layer. ResNet performs well in image recognition and across a wide range of visual recognition tasks.

Metaheuristic optimization
Optimization is a critical procedure for making the best use of various resources. It is a necessary step in any field of study. Several strategies have been developed to address various optimization challenges (Meraihi et al., 2021). Each algorithm employs a particular acceptable technique for certain problems and efficiently solves them while being ineffective for others (Abualigah et al., 2022). However, the vast majority of them fall into one of two groups. The conventional approaches, such as gradient descent and Newton (Wang et al., 2021), fall under the first category. These approaches are basic and straightforward in general, but they only yield a single solution every iteration, which is time-consuming. The strategies in the second category, known as metaheuristic (MH), tend to escape the limits of traditional methods (Alshinwan et al., 2021). Nowadays, MH optimization methods are highly appealing due to their particular benefits over traditional algorithms. MH is used to identify high-quality solutions to a wider range of challenging real-world scenarios since it can handle multi-objective multi-modal, and nonlinear formulations. These MH strategies are generally inspired by nature, physics principles, and human behavior. The primary classes of MH include natural phenomena-based, swarm-based, human-based, and evolutionary-based techniques (Abualigah et al., 2022). Natural phenomena-based approaches imitate natural phenomena such as spirals, rain, wind, and light (Ewees et al., 2021;Şahin, Dinler & Abualigah, 2021). The swarm-based methods mimic the behavior of animals, birds, fish, and other swarms when they are looking for food (Sharma & Kaur, 2021;Zhao, Zhang & Wang, 2020). As optimization methods, human-based methods mimic human behavior (Moosavi & Bardsiri, 2019;Khishe & Mosavi, 2020). 
Finally, evolutionary-based approaches are inspired by the principles of natural genetics (Tzanetos & Dounias, 2021; Bentéjac, Csörgő & Martínez-Muñoz, 2021). These approaches rely on the crossover, mutation, and natural selection operators.

Manta ray foraging optimization
Manta Ray Foraging Optimization (MRFO) (Zhao, Zhang & Wang, 2020) is a swarm-based MH optimization algorithm inspired by the way manta rays forage for food. Manta rays feed on plankton and have evolved three effective foraging tactics: chain foraging, cyclone foraging, and somersault foraging. Chain foraging emulates the fundamental activity of food hunting. The hunting manta rays form a chain to capture all the plankton on the way. The manta ray at the head of the chain guides it, as it holds the best solution. This process decreases the risk of missing plankton and increases the amount of food captured. In cyclone foraging, once a high plankton density is located, the manta rays swim towards the plankton in a spiral pattern. The spiral route is determined by the location of the plankton and the position of each manta ray relative to the agent in front of it. This mechanism produces a vortex in the eye of the cyclone, allowing the manta rays to capture the plankton easily. Somersault foraging treats the best plankton position found so far as a pivot. Each searching manta ray performs backward somersaults and rolls around the pivot. The leading manta ray then updates its location with the best location found so far. Despite being uncommon in nature, these foraging activities are exceedingly effective.

RELATED STUDIES
Most breast cancer detection strategies have relied on machine learning or deep learning to classify cases into binary or multi-class classifications. Accuracy, precision, recall, and F1-score performance indicators are all tracked using these methodologies. Breast cancer diagnosis and categorization have recently become the subject of substantial investigation. Wang et al. (2016) proposed a deep learning-based approach for automatically detecting metastatic cancer using entire slide images of sentinel lymph nodes. To improve the training set, they included patches from normal lymph node regions that the system had initially misclassified as cancer. The method used a 27-layer deep network architecture and achieved 98.40% classification accuracy. Han et al. (2017b) developed a deep learning-based breast cancer multi-classification approach. The structured model worked brilliantly on a large-scale dataset (averaging 93.2% accuracy). This indicates the proposed strategy's strength in terms of providing a useful tool for breast cancer multi-classification. Han et al. (2017a) employed a deep learning framework to distinguish between different types of tumors in breast ultrasound imaging. The training data used ten-fold cross-validation to find the optimal values. The networks had a 91% accuracy rate. Jannesari et al. (2018) used pre-trained deep neural networks and fine-tuned them to distinguish between several cancer types such as breast, bladder, lung, and lymphoma. This method was then applied to the BreakHis database to categorize breast cancer subtypes. They were able to perform breast tumor binary classification with 96.4% accuracy. Golatkar, Anand & Sethi (2018) created a two-stage system for categorizing tissue images into four groups: normal, benign, in situ, and invasive. First, they employed a pre-processing technique to select only important regions from tissue images for training and testing. 
Next, they employed transfer learning to train the patch-level classifier and majority voting to classify images. The average accuracy across the four classifications was 85%, with carcinoma vs. non-carcinoma classification accuracy of 93%. Fujioka et al. (2019) employed a CNN to classify breast mass images obtained from ultrasound. The CNN model had an accuracy of 92.5%. Compared to radiologists, the CNN model performed better in the binary classification of breast tumors based on ultrasound. Hijab et al. (2019) investigated three deep learning variations for ultrasound (US) images for computer-aided recognition of breast cancers. To improve accuracy, they fine-tuned and employed the VGG16 pre-trained model. The accuracy of the results ranges from 79 to 97%. Singh et al. (2020) embraced the notion of employing transfer learning in breast cancer, where they worked on unbalanced data from the WSI dataset and employed VGG-19 with other classifiers such as logistic regression, random forest, and other dense layers. As a result, they were able to achieve maximum accuracy of 90.30%. In their study, Ayana, Dese & Choe (2021) focused on transfer learning methods applied to ultrasound breast image classification and detection utilizing pre-trained CNN models. Furthermore, their examination of some of the most regularly utilized transfer learning approaches highlights the potential for future study in this area. In their research, Khamparia et al. (2021) proposed a hybrid transfer learning model incorporating MVGG and ImageNet. They used the WSI dataset to test their model, and its accuracy was 94.3%. They also employed image segmentation and 3D mammography throughout their study, which helped them get superior results. Choudhary et al. (2021) conducted extensive experiments with three significant pretrained CNNs, including VGG19, ResNet34, and ResNet50. They attained an accuracy of 91.25% using the VGG19 pruned model, outperforming early techniques on the same dataset. 
Alzubaidi et al. (2021) proposed a solution to the data scarcity problem in medical imaging. The authors employed many unlabeled histopathological images of breast cancer to train the CNN model. The model was fine-tuned before being trained on a small labeled breast cancer dataset. The authors attained a 97.51% overall accuracy with this method. The authors also used new double transfer learning, which resulted in a 97.7% accuracy. Dewangan et al. (2022) developed a technique for early-stage detection of breast cancer. Initially, the system is trained using the acquired MRI breast image dataset. Following that, the pre-process function was used to eliminate the training faults. Consequently, the pre-processed data was supplied into the classification layer, and the feature extraction and classification processes were completed. Finally, the proposed model's findings demonstrate its capability to efficiently detect breast cancer, whether benign or malignant, at an early stage. As a result, the planned mechanism has attained 99.6% more accuracy. Jabeen et al. (2022) created an approach for classifying breast cancer using US images. The proposed approach deployed a DarkNet-53 deep learning model; the breast US data is first augmented and then retrained. Following that, the pooling layer's features were recovered, and the best feature was selected using two different optimization strategies. The selected attributes are then fused using a specific approach, and the classification was performed using machine learning techniques. The suggested technique (using feature fusion and Clustered Support Vector Machines(CSVM) classifier) achieved an accuracy of 99.1%. Zerouaoui & Idri (2022) investigated 28 hybrid architectures for breast cancer imaging classification using two datasets (BreakHis and FNAC). They used different classifiers such as MLP, SVM, DT, and KNN, besides seven Deep Learning techniques for feature extraction. 
The DenseNet201 was utilized by the three top-ranked hybrid architectures that significantly outperformed other hybrid architectures. Using MLP, SVM, and KNN classifiers, they attained average accuracies of 93.85%, 93.21%, and 83.87%, respectively. Chowdhury et al. (2022) developed a breast cancer classification system that uses pretrained transfer learning models to extract fine-tuned characteristics prior to training on histopathological images. It assists users in classifying tissues by allowing them to upload a single histopathological image at a time. The proposed model has achieved an accuracy of 99.58%. Mohamed et al. (2022) suggested a three-stage, fully automated breast cancer detection method. First, the thermal images are decreased in size to speed up processing. Second, the region of interest is retrieved automatically using the U-Net architecture. Third, they proposed CNN architecture to classify breast tissues. Their model achieved a classification accuracy of 99.33%. Lahoura et al. (2021) established a breast cancer detection system based on cloud computing and Extreme Learning Machine (ELM) as a classifier. They used the WBCD dataset to test multiple classifiers for breast cancer diagnosis. First, the gain ratio approach was utilized to choose the most relevant features and reject extraneous information. Second, numerous cutting-edge algorithms were implemented and compared to ELM on the standalone system. Furthermore, the ELM model was implemented on the Amazon EC2 cloud platform. The experimental results proved an obtained accuracy of 98.68%.
Senan et al. (2021) classified breast cancer based on transfer learning and histopathology images. They created a network based on the AlexNet architecture. Transfer learning was used to provide efficient and accurate classification. ImageNet information can be transferred via the network as convolutional features for image histology problem classification. Despite the minimal number of images in the target data set (BreaKHis), they achieved 95% accuracy. Boumaraf et al. (2021) applied the transfer learning technique to the VGG-19 architecture. It evolved into a block-wise fine-tuned architecture on histopathology pictures after pre-training on ImageNet. Finally, they assessed the proposed approach for magnification-dependent breast cancer classification using the BreaKHis dataset. The best-obtained classification accuracies vary from 94.05% to 98.13% utilizing the DL technique for binary classification.

METHODOLOGY
Based on histopathological and ultrasound data, the current study presents a framework for classifying breast cancer automatically and accurately using convolutional neural networks, transfer learning, and Manta Ray Foraging Optimization for parameter and hyperparameter optimization. The proposed hybrid framework, depicted in Fig. 2, comprises two mechanisms. The first is the learning and optimization mechanism, in which the model is created and optimized to reach state-of-the-art (SOTA) performance metrics. It consists of five phases: data acquisition, data pre-processing, data splitting, classification and optimization, and production preparation. The second is the production mechanism. In it, the patient will perform an X-ray scan on the brain, and the scan will be classified using the first suggested classifier. It should be diagnosed with one of ''Healthy'' or ''Tumor''. If the scan is ''Healthy,'' the patient has a healthy brain. If the scan is ''Tumor'', then the patient will perform an MRI scan, which will be classified using the second suggested classifier to determine the

Phase 1: dataset acquisition
The two datasets used in this study are publicly available on Kaggle and cover two different modalities: histopathological slides and ultrasound records. The first dataset, Breast Cancer Dataset (BreaKHis) (Elmasry, 2021), is partitioned into two classes, ''Benign'' and ''Malignant,'' containing 2,479 and 5,304 images, respectively. The second dataset, US images data (Fouad, 2021), is divided into three categories, ''Benign,'' ''Malignant,'' and ''Normal,'' with 437, 210, and 133 images, respectively. The ''Normal'' class includes cases with no tumor, while the ''Benign'' class includes cases where a tumor is present but considered benign. In both datasets, data augmentation is used prior to training to up-sample and equalize the number of images per category. After equalization, the first dataset contained 10,608 images, with 5,304 images in each class, and the second dataset contained 1,311 images, with 437 images in each class. The specifications of the datasets used are summarized in Table 1.

Phase 2: dataset pre-processing
The second phase pre-processes the datasets in four steps: image resizing, dimensional scaling, dataset balancing, and data splitting.
Image Resizing: The images do not all have the same dimensions. Therefore, bicubic interpolation is used to resize every image, in RGB mode, to (128 × 128 × 3).
Dimensional Scaling: Four different scaling techniques are considered, namely (1) normalization, (2) standardization, (3) min-max scaling, and (4) max-abs scaling. The equations underlying them are shown in Eqs. (1)-(4), respectively, where X represents the RGB image, X_output is the scaled output image, µ is the image's mean, and σ is the image's standard deviation.
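The four scaling techniques can be sketched as follows. This is a minimal illustration on a flat list of pixel values, not the study's implementation: standardization, min-max scaling, and max-abs scaling follow their standard definitions, while the form used for ''normalization'' (division by the maximum value) is an assumption. The function names and the zero-division guards are also illustrative.

```python
import statistics

def normalize(pixels):
    """Assumed form of normalization: divide by the maximum value."""
    peak = max(pixels) or 1  # guard against an all-zero image
    return [p / peak for p in pixels]

def standardize(pixels):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(pixels)
    sigma = statistics.pstdev(pixels) or 1.0  # guard against a constant image
    return [(p - mu) / sigma for p in pixels]

def min_max_scale(pixels):
    """Min-max scaling: map the value range onto [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    span = (hi - lo) or 1.0
    return [(p - lo) / span for p in pixels]

def max_abs_scale(pixels):
    """Max-abs scaling: divide by the largest absolute value."""
    peak = max(abs(p) for p in pixels) or 1
    return [p / peak for p in pixels]

pixels = [0, 64, 128, 255]  # hypothetical 8-bit intensities
print(min_max_scale(pixels))
print(standardize(pixels))
```

For non-negative 8-bit images, the assumed normalization and max-abs scaling coincide; they differ only when the input contains negative values.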
Dataset Balancing: The datasets are imbalanced, which can increase misclassification or overfitting during training and optimization. To mitigate this issue, the current study balances the classes using data augmentation: shifting (width and height), shearing, rotation, flipping (horizontal and vertical axes), zooming, and brightness changes. Table 2 lists the data augmentation techniques used and their configurations.
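The arithmetic behind the equalization can be checked with a short sketch. It assumes that every class is up-sampled to the size of the largest class, which is consistent with the totals reported in Phase 1; the function name is illustrative.

```python
def augmentation_targets(class_counts):
    """Number of augmented images each class needs to match the largest class."""
    target = max(class_counts.values())
    return {label: target - count for label, count in class_counts.items()}

# Class counts as reported in the dataset acquisition phase.
histopathology = {"Benign": 2479, "Malignant": 5304}
ultrasound = {"Benign": 437, "Malignant": 210, "Normal": 133}

print(augmentation_targets(histopathology))  # {'Benign': 2825, 'Malignant': 0}
print(augmentation_targets(ultrasound))      # {'Benign': 0, 'Malignant': 227, 'Normal': 304}

# Totals after equalization match the reported 10,608 and 1,311 images.
print(len(histopathology) * max(histopathology.values()))  # 10608
print(len(ultrasound) * max(ultrasound.values()))          # 1311
```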
Data Splitting: The dataset is divided into three subsets: training, validation, and testing. The current study employs an 85% to 15% ratio. First, the entire dataset is divided into training and testing subsets using this ratio. Then, the training subset is further divided into training and validation subsets using the same ratio.
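The two-stage split can be sketched on sample indices as follows. The shuffling, the fixed seed, and the rounding rule are assumptions made for illustration; the study does not state how fractional subset sizes are handled.

```python
import random

def split_indices(n_samples, holdout_ratio=0.15, seed=0):
    """Two-stage split: hold out a test subset, then a validation subset."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random, reproducible split

    n_test = round(n_samples * holdout_ratio)
    test, remainder = indices[:n_test], indices[n_test:]

    n_val = round(len(remainder) * holdout_ratio)
    val, train = remainder[:n_val], remainder[n_val:]
    return train, val, test

# Balanced histopathological dataset size reported in Phase 1.
train, val, test = split_indices(10608)
print(len(train), len(val), len(test))  # 7664 1353 1591
```

Applying the same ratio twice leaves roughly 72% of the images for training, 13% for validation, and 15% for testing.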

Phase 3: classification and optimization
The current phase optimizes the various transfer learning hyperparameters (e.g., dropout ratio, whether data augmentation is applied, and scaling technique) using the MRFO metaheuristic population-based optimizer. The proposed mechanism seeks the best configuration for each pre-trained transfer learning CNN model. This phase implements three internal processes and runs repeatedly for T_max iterations. The process is summarized in Fig. 3 and discussed in detail in the current subsection. Initially, the MRFO population sack is randomly created once at the beginning of the classification and optimization phase. The number of solutions is N_max, set to 10 in the current study. Each solution in the population sack is a vector of size (1 × D), where each element ∈ [0,1]. Each cell in a solution reflects a specific learning hyperparameter (e.g., dropout ratio and batch size). The current study aims to optimize 15 hyperparameters, as shown in Table 3, which presents the solution indexing (from 1 to 15) and the corresponding hyperparameter definitions and ranges. If data augmentation is applied, D = 15; otherwise, D = 7.
After that, the fitness (i.e., objective) function score of each solution in the population sack is calculated. The objective function consists of three inner steps: (1) hyperparameter mapping, (2) model creation and preparation, and (3) model training and evaluation. The hyperparameter mapping step maps the solution into the corresponding actual hyperparameters as defined in Table 3. Internally, a simple calculation, given in Eq. (5), maps each random number to the value of the corresponding hyperparameter. For example, suppose the model dropout ratio (i.e., the third element) must be converted from the solution's numeric value to the corresponding hyperparameter. The range of model dropout ratios to select from is determined first (the current study uses the ''[0 → 0.6]'' range). Then, from Eq. (5), if the random numeric value is 0.85 and the range is from 0 to 0.6, the mapped value is 0 + 0.85 × (0.6 − 0) = 0.51.

Value = Lower Bound + Solution[Element Index] × (Upper Bound − Lower Bound). (5)
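Eq. (5) can be checked with a short sketch. `map_continuous` applies Eq. (5) directly. `map_categorical` is an assumption about how a discrete hyperparameter might be selected from a list of options, since the text does not describe that case, and the option names below are illustrative only.

```python
def map_continuous(value, lower, upper):
    """Eq. (5): scale a solution element in [0, 1] to [lower, upper]."""
    return lower + value * (upper - lower)


def map_categorical(value, options):
    """Assumed scheme (not from the paper): scale the element to an index."""
    index = min(int(value * len(options)), len(options) - 1)
    return options[index]


# Worked example from the text: element 0.85 with the dropout range [0, 0.6].
dropout = map_continuous(0.85, 0.0, 0.6)
print(round(dropout, 2))  # 0.51

# Hypothetical option list, used only to show the index mapping.
print(map_categorical(0.85, ["option_a", "option_b", "option_c"]))  # option_c
```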
After each element in the solution is mapped to its hyperparameter, the target pre-trained transfer learning model is created and compiled with the mapped hyperparameters. The current study uses the MobileNet, MobileNetV2, MobileNetV3Large, VGG16, VGG19, Xception, DenseNet201, and NASNetMobile pre-trained transfer learning CNN models. Their initial weights (i.e., parameters) are set to the "ImageNet" pre-trained weights. The model is then trained for a number of epochs, which is set to 5 in the current study. After training, the model is evaluated on the whole dataset for validation and generalization purposes. The model's performance is assessed using different metrics (e.g., precision, accuracy, recall, and F1-score). This study uses a variety of performance metrics illustrated in Eqs. (6) to (11).
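As an illustration of how such metrics follow from confusion-matrix counts, the sketch below uses the standard textbook definitions. It is an assumption that these match Eqs. (6) to (11); the function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from true/false positive/negative counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
print(metrics["accuracy"])  # 0.875
```

For a binary problem, the Dice coefficient computed this way coincides with the F1-score.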
TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively. After the performance metrics are calculated for each solution, the population solutions are sorted in descending order of objective function score. In other words, the best solution is placed at the top to determine X best , which is used throughout the process. After that, the population is updated using the MRFO equations. MRFO works on two branches: cyclone foraging (Eq. (12)) and chain foraging (Eq. (13)). To determine which branch to follow, a random value (∈ [0,1]) is generated; the first branch is followed if the value is < 0.5, and the second otherwise. After the solutions are updated using either branch, each solution's fitness function is reevaluated, and X best is redetermined. After that, somersault foraging (Eq. (14)) is performed. Again, each solution's fitness function is reevaluated, and X best is redetermined.
where X i (t) is the ith solution at iteration t, t is the current iteration number, r, r 1 , and r 2 are random numbers ∈ [0,1], β and α are weight coefficients, and S is the somersault factor. Algorithm 1 explains how the population (that is, the solutions) is updated using MRFO metaheuristic optimization.
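The update cycle can be sketched as below. This is a simplified illustration under stated assumptions, not Algorithm 1 itself: the update rules follow the commonly published form of MRFO (without the random-reference exploration variant of cyclone foraging), the branch is chosen per solution, coordinates are clipped to [0,1], S = 2 is assumed, and `toy_fitness` is a hypothetical stand-in for the expensive train-and-evaluate objective.

```python
import math
import random


def clip01(value):
    """Keep a coordinate inside the [0, 1] range used by the solutions."""
    return min(1.0, max(0.0, value))


def chain_or_cyclone(population, best, t, t_max, rng):
    """Update each solution by cyclone (rand < 0.5) or chain foraging."""
    updated = []
    for i, x in enumerate(population):
        # The first solution follows the best one; the others follow the
        # solution in front of them in the previous population.
        lead = best if i == 0 else population[i - 1]
        if rng.random() < 0.5:  # cyclone foraging
            r_beta = rng.random()
            beta = (2.0 * math.exp(r_beta * (t_max - t + 1) / t_max)
                    * math.sin(2.0 * math.pi * r_beta))
            moved = [b + rng.random() * (l - xi) + beta * (b - xi)
                     for xi, l, b in zip(x, lead, best)]
        else:  # chain foraging
            r = max(rng.random(), 1e-12)  # guard against log(0)
            alpha = 2.0 * r * math.sqrt(abs(math.log(r)))
            moved = [xi + rng.random() * (l - xi) + alpha * (b - xi)
                     for xi, l, b in zip(x, lead, best)]
        updated.append([clip01(v) for v in moved])
    return updated


def somersault(population, best, rng, s=2.0):
    """Somersault foraging around the best solution (S = 2 is assumed)."""
    return [[clip01(xi + s * (rng.random() * b - rng.random() * xi))
             for xi, b in zip(x, best)] for x in population]


def mrfo_iteration(population, best, fitness, t, t_max, rng):
    """One iteration: branch update, re-evaluate, somersault, re-evaluate."""
    population = chain_or_cyclone(population, best, t, t_max, rng)
    best = max(population + [best], key=fitness)
    population = somersault(population, best, rng)
    best = max(population + [best], key=fitness)
    return population, best


def toy_fitness(solution):
    """Hypothetical objective: higher is better, peak at all elements 0.5."""
    return -sum((v - 0.5) ** 2 for v in solution)


rng = random.Random(0)
population = [[rng.random() for _ in range(7)] for _ in range(10)]
best = max(population, key=toy_fitness)
initial_score = toy_fitness(best)
T_MAX = 5
for t in range(1, T_MAX + 1):
    population, best = mrfo_iteration(population, best, toy_fitness, t, T_MAX, rng)
print(toy_fitness(best) >= initial_score)  # True: the best solution is never discarded
```

In the framework itself, each fitness evaluation corresponds to training and evaluating a CNN, so the number of evaluations per iteration dominates the run time.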

The proposed framework steps
The steps are repeated for a maximum of T max iterations. After the MRFO optimization iterations are completed, the best model combination can be applied to any further analysis, such as the production mechanism prescribed in the framework. Finally, Algorithm 2 summarizes parameter learning and hyperparameter optimization.

NUMERICAL RESULTS AND ANALYSIS
This section presents the results of the applied experiments and comments on them. A summary of all experiment configurations is provided in Table 4. Table 5 summarizes the configurations of the binary dataset. According to the analysis of Table 7, five models recommend the KLDivergence loss. Moreover, Nesterov-based optimizers are recommended by three models, and three models recommend the min-max scaler. Different performance metrics can be derived by combining the values reported in Table 6 and the learning history. There are two types of metrics presented.

Experiments with the binary dataset
The first set contains the metrics that need to be maximized (e.g., Acc, F1-score, P, Sensitivity, Recall, S, AUC, IoU, Dice, and Cosine Similarity). The second contains the metrics that must be minimized (e.g., Logcosh Error, Mean Absolute Error, Mean Squared Error, Mean Squared Logarithmic Error, and Root Mean Squared Error). Table 8 reports the metrics of the first category, while Table 9 reports those of the second. According to these results, the Xception pre-trained model outperforms the others on the two-class dataset. Recall and Sensitivity share the same formula and therefore yield the same results.
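For reference, the error metrics in the second category can be computed as in the sketch below. It uses the usual definitions of these quantities, which is an assumption about the study's implementation; the function name and the example labels and predictions are hypothetical.

```python
import math


def error_metrics(y_true, y_pred):
    """Error metrics to be minimized, using their usual definitions.

    Inputs are assumed to be non-negative (e.g., labels and probabilities),
    which the logarithmic error requires.
    """
    n = len(y_true)
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(d * d for d in diffs) / n
    return {
        "logcosh": sum(math.log(math.cosh(d)) for d in diffs) / n,
        "mae": sum(abs(d) for d in diffs) / n,
        "mse": mse,
        "msle": sum((math.log1p(t) - math.log1p(p)) ** 2
                    for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(mse),
    }


# Hypothetical labels and predicted probabilities.
errors = error_metrics([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.1])
print(round(errors["mae"], 3), round(errors["mse"], 3))  # 0.15 0.025
```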

Three-classes dataset experiments
The three-class dataset is summarized in Table 10. Table 11 reports the TP, TN, FP, and FN of the best solutions after learning and optimizing each pre-trained model. According to the results, the ResNet152V2 pre-trained model has the lowest FP and FN values. Conversely, EfficientNetB7 has the highest FP and FN values. Table 12 shows the optimal combination for each model. Five models recommend the Categorical Crossentropy loss. Three models recommend standardization, and four recommend normalization scalers. Three models recommend Adam as the optimizer. Seven models recommend performing data augmentation, with six recommending vertical flipping and five recommending horizontal flipping.
Different performance metrics can be derived from the values provided in Table 11 and the learning history. There are two types of metrics reported. The first shows the metrics that need to be maximized (i.e., Acc, F1-score, P, Sensitivity, Recall, S, AUC, IoU, DC, and Cosine Similarity). The second represents the metrics that must be minimized (namely, Logcosh Error, Mean Absolute Error, Mean Squared Error, Mean Squared Logarithmic Error, and Root Mean Squared Error). Table 13 reports these results.
Algorithm 1: The population (i.e., solutions) updating process using the MRFO metaheuristic optimizer. Function UpdateMRFOSolutions(solutions, scoresList, model, trainX, trainY, validationX, validationY, testX, testY) // Sort the population scores.
Environment: CUDA v.11.2 and 12 GB RAM.
Dataset size before data balancing: "Benign": 2,479 and "Malignant": 5,304. Dataset size after data balancing: "Benign": 5,304 and "Malignant": 5,304.

Graphical summarizations
Based on the experiments conducted on the suggested approach, Fig. 4 shows the best combination of the different alternatives. Figs. 5 and 6 present the confusion matrices for the two-class and three-class datasets, respectively. A graphical summary of the reported learning and optimization results is presented in Figs. 7 and 8 for the two-class and three-class datasets, respectively.

Related studies comparisons
The proposed approach is compared with other state-of-the-art approaches in Table 15. It compares the whole systems as black boxes. In comparison to related studies, the present study has demonstrated superior results.
This study proposes a deep convolutional neural network (CNN) approach combining transfer learning with the Manta Ray Foraging Optimization (MRFO) for analyzing histopathological slides and ultrasound images. Optimizing CNN hyperparameters through MRFO increases the performance of the framework. In addition, this approach uses MRFO to drive down the run-time resource requirements of the trained deep learning models, thus improving their adaptability and reducing their run-time costs. Several pre-trained CNNs were used in the experiments, including MobileNet, MobileNetV2, MobileNetV3Large, VGG16, VGG19, Xception, DenseNet201, and NASNetMobile. The datasets used for the study are the Breast Cancer Dataset, which has two classes, and the Breast Ultrasound Dataset, whose images fall into one of three classes: Benign, Malignant, and Normal. Several preprocessing steps are undertaken before the training process begins for each dataset, including data augmentation, image resizing, dimensional scaling, and data balancing. For the two-class dataset, the Xception pre-trained model is the best. However, for the three-class dataset, the ResNet152V2 pre-trained model is the best. The proposed framework scored 97.73% on the histopathological data and 99.01% on the ultrasound data in terms of accuracy. In