A Selected Deep Learning Cancer Prediction Framework

Deep learning (DL) algorithms are crucial for predicting various diseases because they can analyze large amounts of healthcare data with a short prediction time. One of these diseases is cancer, which causes one out of six deaths worldwide. Many researchers have adopted predictive frameworks based on machine learning and DL to predict cancer prognosis, the probability of recurrence and progression, and patient survival. Currently, all stakeholders are interested in the accuracy of cancer prognosis prediction. To improve the performance of cancer prognosis prediction, this study selects, from three DL frameworks, the one with high accuracy and short prediction time. Because this prediction requires a fast and highly accurate optimizer, we propose a binary version of the continuous AC-parametric whale optimization algorithm. This version is built on S-shaped transfer functions to identify the minimal optimal subset of features and maximize the classification accuracy. The proposed frameworks take the following forms: the first is a Feed-Forward Neural Network (FFNN) whose input is the optimal feature subset; the second is an FFNN with optimized parameters; and the third is composed of a feature selection layer whose best subset of selected features is used as the input to the optimized FFNN. We compared these frameworks in a comparative study. Our results show that, under all conditions, the third framework is superior to the others, with an average accuracy of 100%, whereas the first and second frameworks achieved 94.97% and 93.12% accuracy, respectively.


I. INTRODUCTION
Cancer is the second leading cause of death worldwide, accounting for one out of six deaths [1]. In 2020, the International Agency for Research on Cancer estimated 19,300,000 new cases and 10,000,000 deaths [2]. Cancer is a growing concern because it weakens the immune system and causes imbalances in other biological processes. The most common kinds of cancer are breast, colon, cervical, lung, prostate, and ovarian cancer.
Several previous studies have introduced frameworks for predicting cancer prognosis, the probability of recurrence and progression, and patient survival [1]. All stakeholders, including patients, their caregivers, and providers, are interested in the accuracy of cancer prognosis prediction, and prediction accuracy is one of the factors that contribute to the effective treatment of patients [3]. Disease detection involves classifying tumor types and identifying cancer symptoms in order to train a machine that can recognize new metastatic tumor types or diagnose a disease at an early stage, because treatment in later stages is more difficult. However, because of the enormous number of gene expression levels in a person, diagnosing cancer can be challenging. The basic difficulties in treating and preventing such illnesses are known to be reflected in gene expression levels [4].
As precision medicine and early detection procedures have developed recently [4][5][6], with many detection screens reaching 70%-80% [7], the demand for new machine learning (ML) approaches for discovering biomarkers has become one of the primary drivers of most biomedical research.
Deep sequencing is a high-throughput DNA sequencing technique that has changed genomic science significantly. Over the past decade, deep sequencing has continuously generated huge volumes of data, making genomics one of the top fields of data generation [8]. Although the raw sequence does not provide ready-to-use information, it can be processed by a complicated procedure that infers the proteins encoded by the sequence. When the sequence agrees with historically identified cancer genome sequences, the expression of the protein is evaluated to check whether it is cancerous [9]. Collecting genomic data has raised several difficulties in providing a logical description of the genetic origin of cancer. Moreover, cancer prognosis is complex because genomic datasets contain many features but comparatively few samples. Early diagnosis increases the chances of recovery; thus, its study is crucial.
The H2O framework provides a multi-layer neural network (NN) system suitable for DL tasks [10]. A Deep Learning Architecture (DLA), which includes multiple levels of non-linearity, is a hierarchical model for extracting features. DL models can learn useful representations of the original data, and they perform well on complicated data such as text, images, and audio [11].
Single-solution and population-based algorithms are the two classes of meta-heuristic algorithms. In the first class, the optimization process uses only one candidate solution, which evolves and is updated during the iterations, whereas the second class starts the optimization process with a random population of search agents. Each search agent is a candidate solution to the optimization problem. Individuals exchange information about the search area and work together to avoid stagnation in local optima and to converge toward the global optimum. Many studies have used different optimization algorithms to solve decision-making problems [12][13][14][15][16]. Notably, the quality of a metaheuristic algorithm is determined by its ability to control and balance exploitation and exploration [17]. Exploitation is the ability to find better solutions around the best-known solutions, whereas exploration is the ability to locate new regions of the search space that contain better points. Most metaheuristics emphasize exploration early in the optimization process to examine the feasible region thoroughly and avoid becoming trapped in local optima.
Based on the above-mentioned reasons, multiple metaheuristic techniques have been adopted with wrapper methods to produce an acceptable solution in an acceptable time.
In addition to using a single optimization algorithm to solve the Feature Selection (FS) problem, researchers have proposed different hybrid approaches to binary optimization problems. A hybrid of the Whale Optimization Algorithm (WOA) and simulated annealing is studied in [18], and a hybrid of the Genetic Algorithm (GA) and Particle Swarm Optimizer (PSO) is reported in [19]. Hybrids of filter and wrapper FS methods have also been studied [20,21].
In the FS problem, there is no guarantee that a better subset of features will be identified. Moreover, according to the No Free Lunch (NFL) theorem [22], no optimizer is ideally suited to solving all optimization problems. This explains why certain optimizers perform poorly on certain optimization problems.
This paper is structured as follows. The motivations and contributions of the study are given in Section II. A literature review and short overviews of the FFNN and the AC-parametric WOA (ACP-WOA) are presented in Section III. The proposed BACP-WOA-S and the designed frameworks are described in Section IV. The experimental results are discussed in Section V. Finally, Section VI presents the conclusion and future work.

II. MOTIVATIONS & CONTRIBUTIONS OF THE STUDY
The motivations of this study are as follows: 1) to select, from three proposed DL H2O frameworks that handle big data, the framework that best improves the performance of cancer prognosis prediction; and 2) to propose a highly accurate and fast optimizer, the S-shaped Binary AC-Parametric WOA (BACP-WOA-S), which is used for FS to reduce the size of the dataset and to tune the FFNN (number of layers and number of neurons per layer). The relevant contributions of this study are as follows:
• DL frameworks support large amounts of data in various formats. Because patient health data come from multiple sources, these frameworks are suitable for cancer prognosis prediction.
• FS efficiently selects the optimal features for NN training, potentially improving cancer prognosis prediction and reducing the size of the input data to the FFNN.
• The three proposed frameworks can be used in biomedical diagnostic applications to improve the accuracy of disease diagnosis. The third framework has the highest accuracy, and its prediction time is close to that of the first framework.
• The frameworks have been evaluated on six benchmark datasets covering breast, cervical, colon, lung, prostate, and ovarian cancers to demonstrate their reliability and efficiency. These datasets are publicly available and are still used in most current studies [23,24,25,26,27,28,29].
• The advantages and efficiency of BACP-WOA-S are compared with other common optimizers.
• 100% accuracy is achieved in predicting all types of cancer, which benefits patients, because the earlier treatment begins, the better the chance of cure. This study will help in selecting the most suitable and accurate framework based on the severity of the case (critical or non-critical patients).

III. RELATED WORK
Several studies on cancer diagnosis have used various prediction methods, some of which have demonstrated significant prediction accuracy. Table 1 lists the results of previously reported methods.
Some researchers have used different ML classifiers, such as k-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM), to improve treatment and drug discovery for diagnosis. The study in [23] was performed on a cervical cancer dataset, the studies in [24] and [25] were performed on four and six different breast cancer datasets, respectively, and the study in [26] was performed on two different colon cancer datasets.
The study in [29] applied a multilayered DL algorithm to different microarray datasets to detect the type of disease.
The authors of [30] proposed an ensemble of Artificial Neural Network (ANN) classifiers for the classification of microarray data, using four different cancer datasets.
In addition to other forms of omics data, [34] proposes combining gene FS with cancer classification based on gene expression.
For dimensionality reduction, [35] applied Principal Component Analysis (PCA) with SVM and LMBP classifiers, and [38] combined PCA with an ANN and a GA.
We use DL rather than traditional methods because we expect to handle big data. In addition, the prediction time for cancer diagnosis is critical, because the patient's life depends on it, especially in critical cases. DL is the most suitable methodology in this case, although it requires high-end infrastructure to train in a reasonable time.
This section is organized as follows: subsection A describes the artificial neural network used in our proposal, and subsection B describes the optimized whale algorithm (AC-PARAMETRIC WOA) used in the proposed framework.

A. Feed-Forward Neural Network (FFNN)
The ANN was inspired by the human neural system and is used in various applications, including pattern recognition, optimization, and control. An FFNN is an ANN whose node connections do not form a loop [39]. The main reason for selecting this network is that it generalizes readily from the input data and makes disruptions across time scales easier to categorize.
The neuron is the essential part of the FFNN. Data flow in numeric form from the m neurons of the previous layer to a neuron i, where they are aggregated as follows:
$$s_i = \sum_{j=1}^{m} w_{ij}\,x_j + w_{i0}$$
where $w_{ij}$ is the weight of the connection from neuron j of the previous layer to the current neuron i, $x_j$ is the corresponding input data, and $w_{i0}$ is the built-in threshold (bias) of neuron i, which is treated as an ordinary weight.
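The aggregation step above, followed by an activation function, can be sketched in plain Python. This is a minimal, illustrative forward pass, not the H2O implementation; the hypothetical helpers `neuron_output` and `forward`, the hand-picked weights, and the default tanh activation (the TanH activation mentioned for the DL layer) are assumptions for the example.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Return a(s_i), where s_i = sum_j w_ij * x_j + w_i0.

    inputs  -- outputs x_1..x_m of the m neurons in the previous layer
    weights -- connection weights w_i1..w_im into neuron i
    bias    -- threshold w_i0, treated as an ordinary weight on a constant input of 1
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)


def forward(inputs, layers, activation=math.tanh):
    """Propagate an input vector through fully connected layers (no feedback loops).

    layers -- list of (weight_rows, biases); weight_rows[i] holds the weights into neuron i
    """
    x = list(inputs)
    for weight_rows, biases in layers:
        x = [neuron_output(x, row, b, activation) for row, b in zip(weight_rows, biases)]
    return x


# Illustrative network: 2 inputs -> 2 hidden neurons -> 1 output neuron (hand-picked weights)
net = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(forward([1.0, 1.0], net))
```

Passing `activation=lambda s: s` gives the linear activation instead of tanh.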
An activation function a is then applied to the aggregate of each hidden or output neuron, $y_i = a(s_i)$ (Table 2). The network comprises three layers, namely the input, hidden, and output layers, with m and n neurons in the input and output layers, respectively (Figure 1).

TABLE 2. Activation Functions
The aim is to identify an FFNN weight set that correctly reflects the relationship between the input vectors and the desired output vectors. The network is trained on a collection of P input-output vector pairs with the error back-propagation algorithm [40], which minimizes the following performance function:
$$E = \frac{1}{P}\sum_{z=1}^{P}\sum_{g=1}^{n}\left(y_{zg} - d_{zg}\right)^2$$
where E is the total mean sum-squared error between the measured and desired outputs, $y_{zg}$ is the actual state, and $d_{zg}$ is the desired state; z and g index the training patterns and the components of the output vector, respectively.
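As a minimal sketch, assuming the actual and desired outputs are given as lists of P output vectors, the error E above can be computed as follows. The function name is hypothetical, and back-propagation itself (the procedure that minimizes E) is not shown.

```python
def mean_sum_squared_error(actual, desired):
    """E = (1/P) * sum over patterns z and output components g of (y_zg - d_zg)^2.

    actual, desired -- lists of P output vectors (each a list of n values)
    """
    if not actual or len(actual) != len(desired):
        raise ValueError("need the same, non-zero number of patterns")
    total = 0.0
    for y_z, d_z in zip(actual, desired):
        if len(y_z) != len(d_z):
            raise ValueError("output vectors must have the same length")
        total += sum((y - d) ** 2 for y, d in zip(y_z, d_z))
    return total / len(actual)


# Two patterns with one output each: ((0.9 - 1.0)^2 + (0.2 - 0.0)^2) / 2
print(mean_sum_squared_error([[0.9], [0.2]], [[1.0], [0.0]]))
```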

B. AC-Parametric WOA (ACP-WOA)

1) ENCIRCLING PREY
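In this strategy, the humpback whale treats the best candidate solution as the target prey, and the remaining search agents update their positions toward the best search agent. This behavior can be formulated mathematically as follows:
$$\vec{D} = \left|\vec{C}\cdot\vec{X}^*(i) - \vec{X}(i)\right|$$
$$\vec{X}(i+1) = \vec{X}^*(i) - \vec{A}\cdot\vec{D}$$
where i is the current iteration, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{X}^*$ is the position vector of the best solution obtained so far, and $\vec{X}$ is the position vector of the current whale. The vectors $\vec{A}$ and $\vec{C}$ are computed as
$$\vec{A} = 2\vec{a}\cdot\vec{r} - \vec{a}, \qquad \vec{C} = 2\vec{r}$$
where $\vec{r}$ is a random vector in [0, 1] and $\vec{a}$ is linearly decreased from 2 to 0 over the iterations.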
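A minimal sketch of the encircling-prey update, assuming the standard WOA coefficient schedule (a decreasing linearly from 2 to 0) rather than the ACP-WOA modifications of [42], which are not reproduced here. The helper names `linear_a`, `coefficients`, and `encircle` are hypothetical.

```python
import random


def linear_a(i, max_iter):
    """Standard WOA schedule: a decreases linearly from 2 to 0 over the iterations."""
    return 2.0 - 2.0 * i / max_iter


def coefficients(a, dim, rng):
    """A = 2*a*r - a and C = 2*r, with fresh uniform r in [0, 1] per dimension."""
    A = [2.0 * a * rng.random() - a for _ in range(dim)]
    C = [2.0 * rng.random() for _ in range(dim)]
    return A, C


def encircle(x, x_best, A, C):
    """Encircling prey, component-wise: D = |C*X* - X|, X(i+1) = X* - A*D."""
    return [xb - a_d * abs(c * xb - xd) for xd, xb, a_d, c in zip(x, x_best, A, C)]


rng = random.Random(0)
x, x_best = [0.8, -0.3], [0.5, 0.1]
A, C = coefficients(linear_a(10, 50), len(x), rng)
print(encircle(x, x_best, A, C))
```

When a reaches 0, A is zero and the agent lands exactly on the best position, which is why the schedule shifts the search from wide moves to tight ones.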

2) BUBBLE-NET ATTACKING METHOD (EXPLOITATION PHASE)
The bubble-net attacking behavior comprises two mechanisms. 1) Shrinking encircling mechanism: this behavior is achieved by decreasing the value of $\vec{a}$ from 2 to 0 in (5) over the iterations, so that $\vec{A}$ is a random value in the interval $[-a, a]$. With $\vec{A}$ restricted to random values in $[-1, 1]$, the new position of a search agent lies anywhere between its original position and the position of the current best agent.
2) Spiral updating mechanism: the position update is based on the distance between the whale and the prey:
$$\vec{X}(i+1) = \vec{D}'\cdot e^{bl}\cdot\cos(2\pi l) + \vec{X}^*(i), \qquad \vec{D}' = \left|\vec{X}^*(i) - \vec{X}(i)\right|$$
where $\vec{D}'$ is the distance from the best solution of the population to the current whale, b is a constant that defines the shape of the logarithmic spiral, and l is a random number in [−1, 1]. To switch between these two mechanisms, we assume a 50% probability of choosing the shrinking encircling mechanism and a 50% probability of choosing the spiral mechanism.
Consequently, the mathematical model is as follows:
$$\vec{X}(i+1) = \begin{cases} \vec{X}^*(i) - \vec{A}\cdot\vec{D} & \text{if } p < 0.5 \\ \vec{D}'\cdot e^{bl}\cdot\cos(2\pi l) + \vec{X}^*(i) & \text{if } p \ge 0.5 \end{cases}$$
where p is a random number in [0, 1].

3) SEARCH FOR PREY (EXPLORATION PHASE)
Figure 2 shows the exploration phase, in which $\vec{A}$ takes random values greater than 1 or less than −1 to move the agent away from the reference whale. The new position of a search agent is obtained with respect to a randomly selected agent rather than the best one, which allows the WOA to perform a global search:
$$\vec{D} = \left|\vec{C}\cdot\vec{X}_{rand} - \vec{X}\right|, \qquad \vec{X}(i+1) = \vec{X}_{rand} - \vec{A}\cdot\vec{D}$$
where $\vec{X}_{rand}$ is a random position vector selected from the current population. Multiple parameters are used to customize the algorithm during these two phases, and some of them have a considerable impact on efficiency. Many whale variants achieve higher efficiency; however, researchers have paid little attention to the interpretability of the standard whale algorithm. In [42], the suggested revisions to the parameters a, a2, A, and C could affect WOA exploration and exploitation. The following equations describe these modifications.
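The shrinking encircling, spiral, and search-for-prey behaviors can be combined into a single position update as in the standard WOA. The sketch below is illustrative only: it does not include the ACP-WOA parameter changes from [42], it draws A and C as scalars per agent so that the |A| < 1 test is unambiguous, and the spiral constant b = 1, the search bounds, and the names `woa_step` and `woa_minimize` are assumptions.

```python
import math
import random


def woa_step(x, x_best, population, a, rng, b=1.0):
    """One standard-WOA position update for agent x (not the ACP-WOA variant)."""
    A = 2.0 * a * rng.random() - a   # scalar per agent, |A| <= a
    C = 2.0 * rng.random()
    if rng.random() < 0.5:           # p < 0.5: encircling-type update
        if abs(A) < 1.0:
            ref = x_best                  # exploitation: shrink around the best agent
        else:
            ref = rng.choice(population)  # exploration: X_rand from the population
        return [r - A * abs(C * r - xd) for xd, r in zip(x, ref)]
    l = rng.uniform(-1.0, 1.0)       # p >= 0.5: logarithmic spiral around the best agent
    return [abs(xb - xd) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + xb
            for xd, xb in zip(x, x_best)]


def woa_minimize(f, dim, n_agents=10, max_iter=50, lo=-1.0, hi=1.0, seed=0):
    """Minimize f over [lo, hi]^dim and return the best position found."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(pop, key=f)
    for i in range(max_iter):
        a = 2.0 - 2.0 * i / max_iter  # linear schedule from 2 toward 0
        # Move every agent, then clip positions back into the search bounds
        pop = [[min(hi, max(lo, v)) for v in woa_step(x, best, pop, a, rng)] for x in pop]
        candidate = min(pop, key=f)
        if f(candidate) < f(best):
            best = candidate
    return best


print(woa_minimize(lambda v: sum(t * t for t in v), dim=3))
```

The best position is only replaced by a strictly better one, so the returned solution is never worse than the best agent of the initial population.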
As shown, a and a2 become time-varying parameters whose range of change decreases slowly because the denominator is changed to the square of the maximum number of iterations. After A and C are changed to vary sinusoidally, their fluctuation is more effective than standard randomization. Figure 3 shows the flowchart of the modified ACP-WOA.

IV. THE PROPOSED CANCER PREDICTION FRAMEWORK
The most crucial factors for a cancer patient are accuracy and prediction time: the higher the prediction accuracy, the higher the patient's chance of receiving treatment, whereas a long prediction time decreases the patient's chance of receiving treatment.
This paper proposes three different cancer prediction frameworks, from which the most suitable one can be selected according to the severity of the case (critical or non-critical patients).
The main contributions of this work are to: A) propose three different frameworks using the FFNN; and B) propose a new binary optimizer, the S-SHAPED BINARY AC-PARAMETRIC WOA (BACP-WOA-S), for the binary classification problem of feature selection and for tuning the FFNN. The details of these frameworks are described first.

A. THE PROPOSED FRAMEWORKS BASED ON THE FFNN
The main objective is to propose three different deep learning frameworks and then select the framework with the best accuracy and the shortest processing time, to improve the performance of cancer prediction according to the severity of the patient's condition.
An FFNN was used in all frameworks, and the whale algorithm was modified for this purpose. We built simple NN frameworks with three to four layers. The first framework is an NN whose input is the optimal feature subset, whereas the second framework is an NN with optimized parameters. The third framework combines the two.
The details of the first and second frameworks are given in appendix A.
The third framework, namely, the FFNN framework with FS and best configuration, is composed of a feature selection layer in which the best subset of features is selected for use as inputs in the optimized FFNN.
In this framework, H2O is used to process big data. The framework consists of four main layers, namely the processing, feature selection, deep learning, and prediction layers, as shown in figure 4. 1) The processing layer: medical datasets collected from PCs or sensors are often incomplete, inconsistent, and contain errors, which may cause classification problems. Normalization: features whose values differ in magnitude may hinder the performance of some ML algorithms, so their values may be scaled to the interval [0, 1]. Data imbalance reduction: class imbalance refers to a classification problem in which some classes are highly underrepresented, which biases the classifier toward the majority class. The algorithm described in [43] was used to treat the imbalance problem by over-sampling the minority class instances.
These erroneous values should be eliminated before the dataset is divided into training, validation, and testing subsets. To evaluate the performance of this layer, six benchmark datasets are used: the first is for breast cancer, the second and third are for colon and cervical cancer, respectively, the fourth is for lung cancer, the fifth is for prostate cancer, and the sixth is for ovarian cancer. In the first phase of pre-processing, we normalize certain columns of these datasets so that their values are rescaled. The second phase removes unnecessary columns, such as the id column. In the third and final phase, the label of each entry is converted from a string value to a numeric value. Each dataset has two classes (Malignant and Benign), which are converted to 0 and 1, respectively.
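The pre-processing phases described above can be sketched with the Python standard library. The column names `id` and `diagnosis` and the labels `"M"`/`"B"` are hypothetical, and `random_oversample` is a simple stand-in that duplicates minority samples; it is not the over-sampling algorithm of [43].

```python
import random

LABELS = {"M": 0, "B": 1}  # Malignant -> 0, Benign -> 1, as described in the text


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(rows, label_key="diagnosis", drop=("id",)):
    """rows: list of dicts (one per sample). Drops unneeded columns, rescales
    the remaining features, and encodes the string labels as 0/1."""
    feature_keys = [k for k in rows[0] if k != label_key and k not in drop]
    scaled = {k: min_max_scale([float(r[k]) for r in rows]) for k in feature_keys}
    X = [[scaled[k][i] for k in feature_keys] for i in range(len(rows))]
    y = [LABELS[r[label_key]] for r in rows]
    return X, y, feature_keys


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until the classes are balanced.
    A simple stand-in, not the over-sampling algorithm of [43]."""
    rng = random.Random(seed)
    by_class = {c: [i for i, t in enumerate(y) if t == c] for c in set(y)}
    target = max(len(idx) for idx in by_class.values())
    X_out, y_out = list(X), list(y)
    for c, idx in by_class.items():
        for _ in range(target - len(idx)):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(c)
    return X_out, y_out


rows = [
    {"id": 1, "x": 2.0, "diagnosis": "M"},
    {"id": 2, "x": 4.0, "diagnosis": "B"},
    {"id": 3, "x": 6.0, "diagnosis": "B"},
]
X, y, keys = preprocess(rows)
print(keys, X, y)
print(random_oversample(X, y))
```

In practice, over-sampling is usually applied only to the training portion so that duplicated samples do not leak into the validation or test data.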
2) The FS layer: in this layer, a new binary variant of WOA, BACP-WOA-S, is proposed to solve the FS problem. For a dataset with N features there are $2^N$ possible feature subsets, which is a large space to search exhaustively. The proposed algorithm is employed to search this space adaptively for the optimal feature combination, and the fewer the selected features, the better the solution. A dedicated fitness function evaluates each solution based on two primary objectives: the number of selected features and the error rate. To achieve these aims, the selected features are used to train an NN, and the most efficient model is identified using the following fitness function:
$$Fitness = \alpha\,ER(D) + \beta\,\frac{|L|}{|N|}$$
where ER(D) is the error rate, |N| is the total number of features, |L| is the length of the selected feature subset, α determines the importance of the classification quality, and β corresponds to the importance of feature reduction. Following [44], α ∈ [0, 1], β = 1 − α, and β = 0.01 (i.e., α = 0.99). This fitness function indicates that classification quality and feature subset length are of distinct importance for the FS problem. In our experiments, the parameters were selected by trial and error in modest, common simulations; for example, a high value of α ensures the optimal position or, at least, a reduced error, as with a real rough set.
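The fitness function above translates directly into code. This sketch assumes α = 0.99 (so β = 0.01, as adopted from [44]) and represents a candidate solution as a 0/1 mask over the features; `error_of_subset` is a hypothetical callable standing in for training and validating the NN on the selected features, and scoring an empty subset as infinity is an assumption.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Fitness = alpha * ER(D) + beta * |L| / |N| with beta = 1 - alpha (lower is better)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total


def evaluate_mask(mask, error_of_subset, alpha=0.99):
    """mask: 0/1 list over all features (1 = selected).
    error_of_subset: hypothetical callable returning ER(D) for the selected indices."""
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    if not selected:
        return float("inf")  # an empty subset cannot train a classifier
    return fs_fitness(error_of_subset(selected), len(selected), len(mask), alpha)


# Same error rate, fewer features -> lower (better) fitness
print(fs_fitness(0.1, 2, 50), fs_fitness(0.1, 10, 50))
```

Because α dominates, a lower error rate almost always wins; the feature-count term mainly breaks ties between subsets with similar error.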
3) The DL layer: this layer represents the FFNN with the best settings found by the proposed optimizer. The FFNN is trained on the selected subset of features with the following structure parameters: the number of layers, the number of neurons in the hidden layer, the biases, and the activation function are 3, 10, random, and TanH, respectively. The initial weights are initialized by H2O using the default initial_weight_distribution and initial_weight_scale parameters, which are UniformAdaptive and 1, respectively. The training parameters, namely the learning rule and the sum-squared error goal, are Levenberg-Marquardt and 0.01, respectively. The FFNN is then trained on the selected features and tested on the validation data, and the resulting error rate is used to compute the fitness value. These steps are repeated for every solution in the population at every iteration. The proposed BACP-WOA-S, binary WOA (BWOA), binary PSO (BPSO), binary GA (BGA), and binary gray wolf optimizer (BGWO) algorithms are all examined in this layer. Each optimizer produces its best solution, which is verified on the test data after the optimization process, and various metrics are reported for comparison in this final testing step. BACP-WOA-S uses the training and validation portions during optimization and the test portion only after optimization. We therefore ensure that every optimizer examines the same dataset portions in every iteration, which makes the comparison fair.
4) The prediction layer: the proposed optimizer, described in the next subsection, is used in this layer to select a solution to an optimization problem that requires balancing exploration and exploitation, in order to identify the infected cases.
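The DL-layer settings described above could be expressed with H2O's Python estimator roughly as follows. This is a sketch under assumptions: reading "3 layers, 10 neurons" as three hidden layers of 10 neurons is our interpretation, the fixed seed and the helper name `train_dl_layer` are hypothetical, and H2O's deep learning trains with stochastic-gradient back-propagation (with an adaptive learning rate by default), so to our knowledge the Levenberg-Marquardt rule and the 0.01 error goal have no direct equivalent among its parameters.

```python
# Hypothetical configuration for the DL layer; the keys are parameter names of
# h2o.estimators.H2ODeepLearningEstimator.
DL_PARAMS = {
    "hidden": [10, 10, 10],                            # assumption: 3 hidden layers of 10 neurons
    "activation": "Tanh",
    "initial_weight_distribution": "UniformAdaptive",  # H2O default
    "initial_weight_scale": 1.0,                       # H2O default
    "seed": 1,                                         # assumption: fixed seed
}


def train_dl_layer(train_frame, valid_frame, feature_names, label_name, params=None):
    """Train the FFNN on the selected features; needs h2o installed and h2o.init() called."""
    from h2o.estimators import H2ODeepLearningEstimator  # imported lazily

    # Treat the 0/1 label as categorical so that H2O builds a classifier
    train_frame[label_name] = train_frame[label_name].asfactor()
    valid_frame[label_name] = valid_frame[label_name].asfactor()
    model = H2ODeepLearningEstimator(**(params or DL_PARAMS))
    model.train(x=list(feature_names), y=label_name,
                training_frame=train_frame, validation_frame=valid_frame)
    return model
```

Here `feature_names` would be the columns selected by the FS layer, so each candidate solution of the optimizer maps to one call of `train_dl_layer`.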

B. THE PROPOSED ALGORITHM: S-SHAPED BINARY AC-PARAMETRIC WOA (BACP-WOA-S)
The continuous version of ACP-WOA uses (4) to move search agents inside the search space, so their positions can be adjusted to any point of a continuous space. The FS problem, however, is naturally binary, so the continuous version of ACP-WOA cannot solve it without modification. We therefore propose BACP-WOA-S, a version suited to the FS problem in which candidate solutions take only the binary values {0, 1}: a feature is not selected if its value is 0 and is selected if its value is 1.
To convert the solutions of ACP-WOA from continuous to binary, we first scale the values to the interval [0, 1]. As in a previous study, the conversion is achieved using an S-shaped (sigmoid) transfer function (TF). This family has four TFs, S1, S2, S3, and S4 (Table 3), which decide whether each element of the position vectors becomes 0 or 1, forcing the agents to move in a binary space. Table 3 lists the mathematical formulation of each TF, figure 5 shows the S-shaped curves, and Algorithm 1 shows the steps of BACP-WOA-S. We use the following rule, proposed by Kennedy and Eberhart [45], to convert the scaled continuous values to binary values:
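$$x^{b}_d(i+1) = \begin{cases} 1 & \text{if } rand < \mathrm{sigmoid}\left(x_d(i+1)\right) \\ 0 & \text{otherwise} \end{cases}$$
where $x^{b}_d(i+1)$ is the updated binary position at iteration i and dimension d, $x_d(i+1)$ is the corresponding continuous position, rand is a random number in [0, 1], and sigmoid(x) is given by (17)-(20).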
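The binarization rule can be sketched as follows. The four functions below are the standard S-shaped family S1-S4 from the binary-optimization literature (Mirjalili and Lewis); that they match Table 3 is an assumption. The sketch also applies the transfer function directly to the continuous positions, omitting the preceding scaling step, and the names `S_SHAPED` and `binarize` are hypothetical.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Standard S-shaped family (assumed to correspond to S1-S4 in Table 3)
S_SHAPED = {
    "S1": lambda x: _sigmoid(2.0 * x),
    "S2": lambda x: _sigmoid(x),
    "S3": lambda x: _sigmoid(x / 2.0),
    "S4": lambda x: _sigmoid(x / 3.0),
}


def binarize(position, tf="S2", rng=random):
    """Kennedy-Eberhart rule: bit d is 1 if rand < S(x_d), else 0 (1 = feature selected)."""
    s = S_SHAPED[tf]
    return [1 if rng.random() < s(v) else 0 for v in position]


mask = binarize([2.5, -1.0, 0.3, -4.0], tf="S2", rng=random.Random(42))
print(mask)
```

S1 is the steepest curve and S4 the flattest, so S4 leaves more randomness in the bit decisions for a given continuous value.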

V. Experimental Results and Discussion
We conducted six experiments. Experiments 1 and 2 tested the performance of the proposed optimizer, Experiments 3-5 tested the performance of the three frameworks, and Experiment 6 tested the statistical significance of the results. All experiments were conducted on an Intel® Core™ i7 2.90 GHz processor with 32 GB of RAM and an NVIDIA Quadro M2000M GPU.

A. DATASET DESCRIPTION
The experiments were conducted on six benchmark datasets that have two labels: B = Benign and M = Malignant.
• Each sample is described by 15,154 genes and was collected from ovarian cancer patients [28].
As descriptive examples of the problems that the proposed frameworks could solve, the datasets were selected to cover various numbers of features, classes, and instances (Table 4). In each dataset, the instances were randomly divided into three subsets: train (80% of the data), test (5% of the data), and valid (15% of the data), in a cross-validation manner.
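A minimal sketch of the random 80/15/5 division into train, valid, and test subsets, assuming a single shuffled split with a fixed seed; the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n, fractions=(0.80, 0.15, 0.05), seed=0):
    """Shuffle indices 0..n-1 and cut them into (train, valid, test) lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


train_idx, valid_idx, test_idx = split_indices(200)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The test subset takes whatever remains after rounding, so every instance lands in exactly one subset.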

B. Evaluation Metrics
Different evaluation metrics may be used to evaluate the performance of the trained model. The confusion matrix illustrated in figure 6 is commonly used to derive various classification metrics and performance evaluation parameters.
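The classification metrics reported later can be derived from the four confusion-matrix counts. This sketch treats class 0 (Malignant, per the label encoding in the processing layer) as the positive class, which is an assumption, and returns 0.0 whenever a denominator is zero; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=0):
    """Return (TP, FP, TN, FN) for the given positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred, positive=0):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, tn + fp),
    }


print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```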

C. PERFORMANCE ANALYSIS OF THE PROPOSED BACP-WOA-S
1) EXPERIMENT #1
The first experiment concerns the second layer (the FS layer) of the first and third frameworks. The proposed BACP-WOA-S was tested against the BWOA, BPSO, BGA, and BGWO algorithms to verify its performance. The algorithms were compared in terms of average error, average fitness value, average selection size, and average standard deviation.
The configuration values are listed in table 5, and the results of this experiment are summarized in tables 6-9, which show the cumulative results for all optimizers on the six datasets. BACP-WOA-S achieved the lowest average error, average selection size, and standard deviation values, which indicates that it is more stable than the other algorithms. In terms of fitness value, BACP-WOA-S outperformed the other FS algorithms on all datasets except the cervical and lung cancer datasets, which have relatively few features. This indicates that BACP-WOA-S can select the optimal feature subset with the lowest error.

2) EXPERIMENT #2
The accuracy of the proposed optimizer is compared with the accuracy obtained by the WOA, GA, PSO, and GWO optimizers to confirm that the proposed method improves accuracy. The results for 50 iterations are shown in the following figures, which show the superiority of the proposed algorithm over the others. Further results are given in appendix D.

D. PERFORMANCE ANALYSIS OF THE PROPOSED FRAMEWORKS
In Experiments 3-5, the proposed frameworks were compared in terms of accuracy, precision, recall, F1-score, specificity, confusion matrix values, computational time, mean squared error (MSE), and logarithmic loss (log-loss) values.
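MSE and log-loss are computed from predicted probabilities rather than from hard class labels, which is one way a model can reach 100% accuracy while still reporting non-zero MSE and log-loss, as in Table 12. A minimal sketch, assuming `p_pred` holds the predicted probability of class 1 and that log-loss clips probabilities with a small epsilon; the function names are hypothetical.

```python
import math


def mse(y_true, p_pred):
    """Mean squared difference between 0/1 labels and predicted probabilities."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy, with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Both samples are classified correctly at a 0.5 threshold, yet both losses are non-zero
y, p = [1, 0], [0.8, 0.4]
print(mse(y, p), log_loss(y, p))
```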

1) EXPERIMENT #3 TESTS THE BEHAVIOR OF THE 1ST FRAMEWORK
This experiment demonstrates the effectiveness of the first framework, which classifies cancer cases using four layers: preprocessing, FS, DL without FFNN optimization, and prediction. Table 10 summarizes its performance on the cancer datasets. As shown in this table, the first framework achieved an average of 94.97% accuracy, 95.15% precision, 93.05% recall, 94.9% F1-score, 96.51% specificity, 0.113 MSE, and 0.409 log-loss across all cancer datasets (see appendix B, figure 9).

2) EXPERIMENT #4 TESTS THE BEHAVIOR OF THE 2ND FRAMEWORK
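This experiment demonstrates the effectiveness of the second framework, which classifies cancer cases using three layers: preprocessing, DL with FFNN optimization, and prediction. Table 11 summarizes its performance on the cancer datasets; the second framework achieved an average accuracy of 93.12% (see appendix B, figure 10).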

3) EXPERIMENT #5 TESTS THE BEHAVIOR OF THE 3RD FRAMEWORK
This experiment demonstrates the effectiveness of the third framework, which classifies cancer cases using four layers: preprocessing, FS, DL with FFNN optimization, and prediction. Table 12 summarizes its performance on the cancer datasets. As shown in this table, the third framework achieved an average of 100% accuracy, 100% precision, 100% recall, 100% F1-score, 100% specificity, 0.085 MSE, and 0.309 log-loss across all cancer datasets (see appendix B, figure 11).
Notably, the best performance was obtained with the third framework (Tables 10-12; see appendix C, figure 12).

4) EXPERIMENT #6 WILCOXON'S RANK-SUM TEST
The p-values of the proposed third framework are calculated using Wilcoxon's rank-sum test, which determines whether the results of the proposed framework differ significantly from the compared results. A p-value < 0.05 means that the results of the third framework are significantly different; otherwise (p-value > 0.05), there is no significant difference. Table 15 shows the p-values and the mean accuracies obtained using the t-test. All p-values are smaller than 0.05, which shows that the superiority of the third proposed framework is statistically significant.
Table 13 shows that the first framework is less accurate than the third, although their prediction times are close. The third framework is also more accurate than the second, but with a comparatively longer prediction time: the accuracy of the second framework is slightly lower than that of the third, but its prediction time is much shorter. As a result, the third framework is suitable for non-critical patients because of its high accuracy, whereas the second framework is suitable for critical patients because of its shorter prediction time. These results are shown in bold in the table.
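For reference, the two-sided Wilcoxon rank-sum test used in Experiment 6 can be sketched with the normal approximation (the same approximation used by SciPy's `scipy.stats.ranksums`, without tie correction). The inputs would be, for example, per-run accuracies of two frameworks; the function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average 1-based rank for a run of tied values
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of sample x
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```

With very small samples such as the six datasets here, the normal approximation is coarse; an exact test would be the more conservative choice.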
In table 14, the proposed method uses the BACP-WOA-S optimizer to select the optimal set of features used as input to the optimized FFNN (with the best number of layers and neurons). The table shows the results obtained by the proposed framework: 100% on the breast, colon, cervical, lung, prostate, and ovarian cancer datasets. Bold values represent the best results. The third framework outperforms the other methods on the breast, colon, cervical, and prostate cancer datasets. On the lung and ovarian cancer datasets, the third framework and the methods of Xiongshi, D., et al. [28], Wu, P., et al. [34], Adiwijaya et al. [35], Saqib, P., et al. [37], and Cahyaningrum, K., et al. [38] outperform the other methods.

F. DATA ANALYSIS
First, as mentioned earlier, cancer is the second leading cause of death worldwide, accounting for one out of six deaths. Ovarian cancer was one of the most common causes of cancer death in 2021, when 13,770 women died of it [49]. In addition, lung (1,800,000 deaths), colon (935,000 deaths), and breast (685,000 deaths) cancers were among the most common causes of cancer death in 2020 [49]. Prostate and cervical cancer were among the most common causes of cancer death in 2019 and 2018, causing the deaths of 31,638 men and 34,000 women, respectively [49]. These numbers can be reduced if cases are detected and treated early. The most crucial factors for a cancer patient are accuracy and prediction time, because the higher the prediction accuracy, the higher the patient's chance of receiving treatment; early detection also lowers the cost of treatment.
Second, for these reasons, this paper proposed three cancer prediction frameworks, each with a different prediction time. According to the experimental results, the third framework is suitable for non-critical patients because of its high accuracy, whereas the second framework is suitable for critical patients because of its shorter prediction time.
Third, in the proposed frameworks, the modified optimizer is used for FS to select the optimal features and to tune the FFNN (number of layers and number of neurons per layer). In DL, accuracy tends to increase with the number of hidden layers, but the processing time also grows as layers are added. Therefore, we propose three frameworks, the first with one layer, the second with two layers, and the third with three layers, and choose the one that combines high accuracy with short processing time.

VI. CONCLUSION AND FUTURE DIRECTIONS
In this study, we proposed a selected DL cancer prediction framework based on a dynamic group to achieve a balance between exploration and exploitation. The proposed BACP-WOA-S divides the solutions into two groups: the first group is responsible for exploration and the second for exploitation. The first group applies two techniques, search around individuals and mutation, whereas the second group applies two techniques, move toward the leader and search around the leader.
In this study, we selected one framework from three DL H2O frameworks based on their accuracy and processing time. The FFNN framework performed better than previous methods on the breast, colon, cervical, and prostate cancer datasets, whereas on the lung and ovarian cancer datasets the methods in [28,34,35,37,38] achieved the same results as ours. The proposed algorithm achieved an accuracy of 100% on each of the breast, colon, cervical, lung, prostate, and ovarian cancer datasets. Besides achieving higher accuracy on the six benchmark datasets, the selected DL framework ensures the stability of the control scheme because this scheme is designed into the proposed framework.
In the future, new metaheuristic algorithms will be tested against the proposed algorithm, and the algorithm will be applied to additional binary classification problems. The impact of increasing the difficulty of the datasets used in DL is still under study. From tables 6-9, the proposed algorithm achieves an average error of 0.0046, an average selection size of 0.350, an average fitness value of 0.0186, and an average standard deviation of 0.0245, and it outperforms the other FS algorithms in fitness value on all datasets except the cervical and lung cancer datasets. Therefore, we plan to apply the broad learning system (BLS) to improve these results.

APPENDIX A
1) FFNN FRAMEWORK WITH FEATURE SELECTION
In the first framework, H2O is based on four layers. The first layer is the pre-processing layer. In the second layer, the FS layer, BACP-WOA-S selects the optimal features that are used in the third layer, the DL layer, which trains the NN with the selected optimal features and the NN's default settings. The proposed optimizer is used in the final layer to identify the infected cases (Appendix A, figure 7).

2) FFNN FRAMEWORK WITH THE BEST CONFIGURATION
In the second framework, H2O comprises three layers. In the second layer, the DL layer uses the best NN settings to train the FFNN and obtain the best classification accuracy (Appendix A, figure 8).

APPENDIX C
The following figure shows the detailed flowchart of the Python program for the third proposed framework: