An Amalgamated Approach to Bilevel Feature Selection Techniques Utilizing Soft Computing Methods for Classifying Colon Cancer

One of the deadliest diseases affecting the large intestine is colon cancer. It typically affects older adults, though it can occur at any age. It generally starts as a small benign growth of cells that forms on the inside of the colon and later develops into cancer. Colon cancer is caused by the accumulation of somatic alterations that affect gene expression. DNA microarray technology provides a standardized format for assessing the expression levels of thousands of genes, and the patterns of gene expression it captures can distinguish tumors of various anatomical regions. As microarray data are too large to process directly due to the curse of dimensionality, an amalgamated approach utilizing bilevel feature selection techniques is proposed in this paper. In the first level, the genes or features are dimensionally reduced with the help of the Multivariate Minimum Redundancy-Maximum Relevance (MRMR) technique. In the second level, six optimization techniques are utilized to select the best genes or features before proceeding to the classification process: Invasive Weed Optimization (IWO), Teaching Learning-Based Optimization (TLBO), League Championship Optimization (LCO), Beetle Antennae Search Optimization (BASO), Crow Search Optimization (CSO), and Fruit Fly Optimization (FFO). Finally, the selected features are classified with five suitable classifiers. The best results are obtained when IWO is combined with MRMR and classified with Quadratic Discriminant Analysis (QDA), giving a classification accuracy of 99.16%.


Introduction
Cancer is the abnormal growth of cells in an affected region, with the ability to spread to other regions of the body [1]. Colon cancer is one of the most commonly occurring cancers, and it arises from genetic, lifestyle, and aging factors. Other associated risk factors are lack of physical activity, obesity, dietary issues, and smoking [2]. The main symptoms include blood in the stool, weight loss, fatigue, and changes in bowel movements. It often starts as a benign tumor in the form of a polyp that later becomes cancerous [3]. Treatments for colon cancer include radiation therapy, targeted therapy, chemotherapy, and surgery. The cancer may be cured if it is confined within the walls of the colon; if it has spread widely, it is not curable, but it can be managed to a certain extent with improvements in quality of life [4].
For the identification of cancer, microarray data classification is widely utilized [5]. Microarray technology is one of the vital tools that many biologists use to monitor genome-wide expression, with data acquired from tissue samples in the form of gene expression differences. The huge size of such scientific data poses many problems for researchers trying to identify the useful information for data mining [6]. This tremendous amount of microarray data is also quite asymmetric in nature, as the number of genes ranges from a few hundred to many thousands [7]. Classification with this huge amount of data is difficult, as it increases computational cost and thereby degrades classifier performance. It is therefore very difficult to apply traditional classifiers to such asymmetric data, and dimensionality reduction is highly required for the analysis of microarray data. A rank-based approach is mostly utilized to select the dominant features in high-dimensional data analysis [8]. Some of the common ranking approaches used in the literature are Information gain, t-test, ANOVA, Relief F, BW ratio, t-statistic, Fisher score, correlation-based feature selection, Wilcoxon score test, Wilks' lambda score, and Signal-to-Noise Ratio (SNR) Euclidean distance [9]. In this work, multivariate MRMR is used to select the top 600 genes. Then, using six optimization techniques, the best 30, 60, and 90 genes are selected. The benefits of feature selection are manifold: the comprehensibility of the classifier model improves, and the imbalance between the number of features and the number of samples is reduced. A few well-known works on microarray-based classification of colon cancer reported in the literature are given below.
A feature selection from the colon cancer dataset for cancer classification using Artificial Neural Networks (ANN) was done by Rahman and Muniyandi [10]. Gene expression analysis was used to assess the risk of colorectal cancer incidence by Shangkuan et al. [11]. Based on machine learning and similarity measures, gene selection and classification of colon cancer microarray data were done by Liu et al. [12]. Using multiple machine learning paradigms, the statistical characterization and classification of colon microarray gene expression data were done by Maniruzzaman et al. [13]. The prediction of colon cancer from genetic profiles utilizing intelligent techniques was done by Alladi et al. [14]. For the diagnosis and survival prediction of colon cancer, ANN was proposed by Ahmed [15]. Polygon models for glandular structures [16] and the detection and classification of nuclei in routine colon cancer histology images were done by Sirinukunwattana et al. [17]. A deep learning-based tissue analysis predicting outcome in colorectal cancer was done by Bychkov et al. [18]. Colon cancer classification analysis using machine learning on DNA microarray data was done by Cho and Won [19]. An evolutionary neural network was utilized to predict colon cancer by Kim and Cho [20]. A classification framework applied to cancer gene expression profiles was done by Hijazi and Chan [21]. A hybrid gene selection algorithm based on interaction information was utilized for microarray-based colon cancer classification [22]. A gene selection methodology based on clustering for classification tasks in colon cancer was done by Garzon and Gonzalez [23]. A hybrid gene selection method using MRMR and Artificial Bee Colony (ABC) was utilized for colon cancer classification by Alshamlan et al. [24]. A random subspace aggregation for colon cancer prediction was done by Yang et al. [25].
A supervised locally linear embedding technique with correlation coefficient was utilized for colon cancer microarray classification by Xu et al. [26]. Genetic programming was used for colon cancer classification by Vanneschi et al. [27]. Sparse representation for classification of colon tumors was done by Hang et al. [28]. A standardized comparative analysis of biomarker selection techniques was done by Dessi et al. [29]. A Node Influenced Method (NIM) for colon cancer classification was also used [30]. In this work, however, multivariate MRMR with six optimization techniques is used. The organization of the work is as follows. In Section 2, the materials and methods are given, followed by the usage of the MRMR technique to select the genes. In Section 3, the second-level optimization using different optimization algorithms is done; in Section 4, the classifiers are explained, followed by the results and discussion in Section 5 and the conclusion in Section 6.

Materials and Methods
For colon cancer classification, a publicly available dataset was used [31]. It contains about 2000 genes. Class 1 represents the tumor class with 40 samples, and Class 2 represents the healthy class with 22 samples, giving 62 samples in total. The details of the dataset are tabulated in Table 1, and the illustration of the work is shown in Figure 1. The first-level gene selection uses the multivariate MRMR technique [32], which is carried out with the help of several statistical measures. Mutual Information (MI) assesses the information that one random variable gives about another with respect to both the gene activity and the class label, and it can be applied to both continuous and categorical variables. For discrete variables, MI is utilized to seek genes that are minimally redundant (R) and maximally relevant (T) with respect to an assigned target label, where MI represents the Mutual Information, i and j index the genes, |N| represents the number of features in the gene set N, and h represents the class label. For continuous variables, the F-statistic (ANOVA test) is utilized to trace the maximum relevance between a gene and a class label; to minimize redundancy, the correlation c of each gene pair within that class is measured, where F is the F-statistic, i and j are the genes, and h is the class label. Correlation is utilized together with entropy. To analyze the relevance and redundancy of a gene cluster, normalized MI is utilized, and the combination of the most relevant genes is traced; for continuous variables, MI replaces linear relationship measures. For both discrete and categorical data, this method gives lower error rates.
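The relevance/redundancy trade-off described above can be illustrated with a greedy selection loop. This is a minimal sketch under stated assumptions, not the authors' exact implementation: mutual information stands in for relevance and absolute Pearson correlation for redundancy, and the function name `mrmr_select` and all parameters are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: repeatedly pick the gene maximizing relevance
    (MI with the class label) minus mean redundancy (|correlation| with
    the genes already chosen)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))       # gene-gene redundancy
    selected = [int(np.argmax(relevance))]            # most relevant gene first
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

On the real data, `k` would be 600 for the first level; the sketch works identically for any synthetic gene matrix.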

Optimization Techniques
The selection of the best element from a particular set of available alternatives can be done with the help of optimization techniques [33]. Applications ranging from computer science and economics to biology and mechatronics have utilized optimization techniques predominantly based on their necessity. Optimization is therefore the minimization of a real function by systematically choosing input values from a specific set and computing the value of the function; in a defined domain, it finds the best available values of a particular objective function. This work utilizes six optimization techniques to find the best 30, 60, and 90 features/genes from the 600 shortlisted genes so that they can be well classified, thus forming the amalgamated approach.

Invasive Weed Optimization

A famous population-based metaheuristic algorithm is IWO [34]. By utilizing the randomness and imitating property of a weed colony, the global optimum of a mathematical function is found. The growth of weeds is a serious threat to crops, as they have an aggressive growth habit; they are very resilient, being highly adaptable and resistant to environmental changes. Considering these characteristics yields a powerful and simple optimization algorithm that incorporates three qualities of a weed: randomness, resistance, and adaptability. The technique is inspired by colonies of invasive weeds in agriculture. A weed is a plant that grows suddenly and unintentionally; when weeds grow where they do not interfere with basic human needs, they are not considered a problem. Based on colonizing weeds, a simple numerical optimization algorithm, called the IWO algorithm, has been proposed.
This algorithm is very powerful and effective in converging to optimal solutions with the help of preliminary features such as seeding, growth, and competition in a weed colony. The basic features by which the method simulates the habitat behavior of weeds are as follows:
(1) Initialization of the primary population: a limited number of seeds is distributed in the search space.
(2) Reproduction: each seed grows into a flowering plant, and each flowering plant in turn produces seeds according to its fitness value; the number of seeds decreases linearly from A_max to A_min.
(3) Spatial dispersal: the produced seeds are distributed normally around the parent with a decreasing standard deviation, where T is the maximum number of iterations, σ_t is the current standard deviation, and m is the nonlinear modulation index.

BioMed Research International
This equation ensures that the spread of the seeds decreases nonlinearly, so that fitter plants are produced and inappropriate plants are eliminated.
(4) Competitive exclusion: if the number of weeds exceeds the maximum number of weeds in the colony (C_max), the weed with the worst fitness value is removed from the colony, so that a standard number of weeds remains.
(5) This process continues until the maximum number of iterations is reached, and then the weed with the minimum cost function value in the colony is stored.
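The five steps above can be sketched in a few lines. This is an illustrative implementation on a toy sphere function; the bounds, population sizes, and parameter names (`n_init`, `p_max`, `s_min`, `s_max`, `sigma_init`, `sigma_final`) are assumptions chosen for the sketch, not values from the paper.

```python
import numpy as np

def iwo(f, dim, n_init=5, p_max=20, s_min=1, s_max=5,
        sigma_init=1.0, sigma_final=0.01, m=3, iters=50, seed=0):
    """Minimal IWO sketch: seed count scales linearly with fitness, and the
    dispersal standard deviation shrinks nonlinearly with modulation index m."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, (n_init, dim))
    for t in range(iters):
        # nonlinear decrease of the dispersal standard deviation (step 3)
        sigma = ((iters - t) / iters) ** m * (sigma_init - sigma_final) + sigma_final
        fit = np.array([f(w) for w in pop])
        best, worst = fit.min(), fit.max()
        seeds = []
        for w, fw in zip(pop, fit):
            # linear mapping of fitness to seed count, s_min..s_max (step 2)
            ratio = 1.0 if worst == best else (worst - fw) / (worst - best)
            n_seeds = int(s_min + ratio * (s_max - s_min))
            for _ in range(n_seeds):
                seeds.append(w + rng.normal(0.0, sigma, dim))
        pop = np.vstack([pop, np.array(seeds)])
        # competitive exclusion: keep only the p_max fittest weeds (step 4)
        fit = np.array([f(w) for w in pop])
        pop = pop[np.argsort(fit)[:p_max]]
    return pop[0], float(f(pop[0]))
```

In the paper's setting, each "weed" would encode a candidate gene subset rather than a point on a continuous sphere.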

Teaching Learning-Based Optimization

One of the famous population-based optimization techniques is TLBO, which mimics the classic teaching-learning phenomenon within a classroom environment [35]. Here, a group of learners is treated as the population, and the design variables are treated as the different subjects offered to the learners; a learner's result is therefore analogous to the fitness value of the optimization problem. The best solution in the entire population is taken as the teacher. The teacher phase and the learner phase are the two important phases of TLBO, elaborated as follows.

Teacher's Phase

In this stage, the learners learn from the teacher, who tries to raise the mean of the whole class toward the teacher's own level. The difference between the existing mean and the new mean is expressed in terms of E_new, the new mean for the j-th iteration, and E_j, the mean of each design variable; n_j is a random number between 0 and 1. The teaching factor is represented as T_F, and in our work it is set to 2, which has a major effect on how strongly the mean is shifted. T_F plays the role of an adjusting factor in this algorithm, controlling the scale and direction of movement when the solutions are updated; its value is decided at random. The existing solution is updated based on this Diff_Mean according to the following expression.

Learner's Phase

In this second part of the algorithm, the learners interact among themselves and increase their knowledge. Random interaction between one learner and another occurs so that knowledge is enhanced.
If a particular learner has more knowledge, then other learners can make use of this learner and improve their skills. The learning phenomenon is expressed mathematically as follows: at a specific iteration j, A_j and A_k are considered as two different learners (solutions), where j ≠ k. If A_new provides a better function value, it is accepted into the population. For the implementation of TLBO, the steps are as follows: Step 1. The optimization problem is defined, and the algorithm parameters are initialized: the population size (P_s), the total number of generations (G_s), and the number of design variables (D_s). The optimization problem is defined for our case as minimize f(A), where f(A) is the objective function and A denotes the vector of design variables. Initial solutions are constructed as per P_s and D_s.
Step 2. The columnwise mean of the population is calculated so that the mean of each design variable is obtained as E_j. The best solution (the teacher) is identified as A_teacher, the A at which f(A) is minimum; E_j is moved toward A_teacher, so E_new = A_teacher is assumed. Step 3. The Diff_Mean based on (8) is calculated using the teaching factor T_F.
Step 4. Based on (9), the solution in the teacher phase is modified, and the new solution is accepted if it is better than the existing one.
Step 5. Based on (10) and (11), the solution in the learner phase is updated, and then, the better one is accepted into population.
Step 6. Until the termination criterion is met, the steps (2) to (5) are repeated.
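Steps 1-6 can be sketched compactly. This is a minimal illustrative implementation on a toy sphere function under assumed bounds and population settings; note the sketch draws T_F randomly from {1, 2} (the random rule described in the text) rather than fixing it to 2.

```python
import numpy as np

def tlbo(f, dim, pop_size=20, gens=100, lb=-5.0, ub=5.0, seed=1):
    """Minimal TLBO sketch: teacher phase then learner phase (Steps 1-6)."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(lb, ub, (pop_size, dim))          # learners (solutions)
    fit = np.array([f(a) for a in A])
    for _ in range(gens):
        teacher = A[np.argmin(fit)].copy()            # best learner is the teacher
        mean = A.mean(axis=0)                         # class mean E_j
        for i in range(pop_size):
            # teacher phase: shift the learner toward the teacher via Diff_Mean
            Tf = rng.integers(1, 3)                   # teaching factor T_F in {1, 2}
            new = np.clip(A[i] + rng.random(dim) * (teacher - Tf * mean), lb, ub)
            fn = f(new)
            if fn < fit[i]:                           # greedy acceptance (Step 4)
                A[i], fit[i] = new, fn
            # learner phase: learn from a randomly chosen distinct learner k
            k = rng.integers(pop_size)
            while k == i:
                k = rng.integers(pop_size)
            step = A[i] - A[k] if fit[i] < fit[k] else A[k] - A[i]
            new = np.clip(A[i] + rng.random(dim) * step, lb, ub)
            fn = f(new)
            if fn < fit[i]:                           # greedy acceptance (Step 5)
                A[i], fit[i] = new, fn
    best = int(np.argmin(fit))
    return A[best], float(fit[best])
```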

League Championship Optimization

LCO is a new evolutionary algorithm inspired by sporting competitions in various sports leagues, and its main intention is to trace the optimum solution for problems over a continuous search space [36].
A randomly created group of A solutions forms the initial population of the algorithm. Each solution is attributed to a team as that team's current formation, and the fitness value serves as the playing strength assigned to the corresponding formation. Because of the greedy selection of LCO, a more potent formation is always aimed to replace the present formation. The number of seasons (N) is assigned as the termination factor; each season comprises A − 1 weeks, yielding N × (A − 1) contest weeks (A is assumed to be even). In the league schedule, the existing teams play in pairs every week, and the playing strengths of the team formations determine the match outcome. During the recovery time, each team's formation is updated by tracking the events of previous matches. The basic rule of LCO is that a higher playing strength gives a higher likelihood of winning; the outcome of a match cannot be predicted deterministically, and only a win/lose result is represented.
(1) League Schedule Development. For a season, a nonrandom order is generated so that each team plays a match against every other team. LCO does this by constructing a single round-robin schedule so that, during a season, only one match is held between any 2 teams. When A teams are involved, A(A − 1)/2 games are required.
(2) Winner/Loser Determination. As per the idealized rule (the higher the playing strength of a team, the higher its probability of winning), consider Z_j^w and Z_k^w as the formations and f(Z_j^w) and f(Z_k^w) as the playing strengths of teams j and k at week w. Then C_k^w is the chance of team k beating its opponent at week w, C_j^w is defined accordingly, and f[Z = (z_1, z_2, …, z_M)] is an M-variable function which is to be minimized over the entire search space.
The above formula indicates that the likelihood of a win for team k (or j) is proportional to the difference between f(Z_k^w) or f(Z_j^w) and an ideal reference value. A better team is assumed to have more factors in compliance with the ideal team, and teams are evaluated by their distance from this common reference point; the winning portion of a team is therefore expressed by the ratio of these distances. Under the idealized rule, viewed from both teams, the probabilities that team j beats team k and vice versa sum to one. From (12) and (13), C_j^w is obtained. A random number between 0 and 1 is then generated and compared with C_j^w to assess the winning and losing teams: team j wins the game if C_j^w is greater than or equal to this random number; otherwise, team k wins.
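The idealized winning rule can be written as a one-line probability. This is a sketch of the standard league-championship expression for a minimization problem; since equations (12)-(14) are not reproduced in the text, the exact closed form below is an assumption, and the function name `win_probability` is hypothetical.

```python
def win_probability(f_j, f_k, f_ideal):
    """Chance that team j beats team k at a given week, assuming a
    minimization problem: the closer f_j is to the ideal playing strength
    f_ideal, the larger the probability. By construction the two teams'
    probabilities sum to one, matching the symmetry of (12) and (13)."""
    return (f_k - f_ideal) / (f_j + f_k - 2.0 * f_ideal)
```

For example, with playing strengths 2 and 6 and an ideal value of 0, team j wins with probability 0.75; comparing this value against a uniform random draw in [0, 1] then decides the match, as described above.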
Beetle Antennae Search Optimization

The richest order of species is Coleoptera, the beetles. Beetles have two long antennae, usually longer than the body, which they use for detecting food resources and potential mates. When unknown areas are explored, these antennae act as an exploration apparatus. A metaheuristic algorithm modelled on this two-antennae exploration behavior is the BAS algorithm [37]. The position of every beetle represents an achievable solution, and the optimal solution is the one at the minimum distance from the food. BAS performs optimization without any gradient information. The search process is as follows: Step 1. All the BAS algorithm parameters are defined. The P positions of beetles x_p (p = 1, 2, ⋯, P) are initialized randomly. The maximum number of iterations is set as I_max, and i = 0.
Step 2. In a random dimensional space, the initial antennae directions of the beetles are constructed and normalized so that the initial exploration environment can be expanded. A random search direction is normalized as follows, where rnd(·) denotes a random function and dim is the dimension of the solution.
Step 3. Beetles use their antennae to assess the location of food when foraging. If the antenna on one side is closer to the food, that antenna receives a stronger odor, and the individual moves toward that side. The right and left antenna positions are normalized as follows, where i is the iteration number, z_r^i and z_l^i are the positions of the right and left antennae, z^i is the position of the beetle, and s^i is the sensing length of the antennae.
Step 4. The next position of the beetle is determined by detecting the odor: the beetle moves toward whichever antenna (left or right) receives the stronger odor. The beetle's location is updated as follows, where δ^i is the search step size, f(·) is the evaluation function, m is the movement direction of the beetle, and sign(·) is the sign function. Step 5. The sensing length of the antennae s^i and the search step size δ^i are updated using the fixed reduction factors c_1 and c_2 (between 0 and 1).
Step 6. The evaluation function of every individual is computed and compared with all candidate solutions to assess the optimal solution. The iteration counter is updated as i = i + 1, and the process returns to Step 2; it is repeated until i = I_max is reached.
Step 7. The optimal solution is expressed as output.
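Steps 1-7 can be sketched as follows. For brevity this illustrative sketch uses a single beetle rather than the P-beetle population of Step 1, and all parameter values (`step`, `sense`, `c1`, `c2`) are assumptions, not settings from the paper.

```python
import numpy as np

def bas(f, dim, iters=300, step=1.0, sense=0.5, c1=0.95, c2=0.95, seed=2):
    """Minimal single-beetle BAS sketch following Steps 1-7."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-3, 3, dim)                   # beetle position (Step 1)
    best_z, best_f = z.copy(), f(z)
    for _ in range(iters):
        b = rng.normal(size=dim)
        b /= np.linalg.norm(b) + 1e-12            # normalized random direction (Step 2)
        z_r, z_l = z + sense * b, z - sense * b   # right/left antennae (Step 3)
        # move toward the antenna sensing the stronger odor, i.e. lower cost (Step 4)
        z = z - step * b * np.sign(f(z_r) - f(z_l))
        fz = f(z)
        if fz < best_f:                           # track the optimal solution (Step 6)
            best_z, best_f = z.copy(), fz
        sense, step = c1 * sense, c2 * step       # shrink antennae and step (Step 5)
    return best_z, float(best_f)
```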

Crow Search Optimization

CSO is a famous metaheuristic algorithm widely applied in many fields and problems, inspired by the highly intelligent attitude and behavior of crows [38]. Crows naturally exhibit intelligent behaviors such as self-awareness, recognizing faces, advanced communication between individuals, warning the flock about unfriendly individuals, and remembering a food hiding place over a long period. The brain-to-body ratio of crows is only slightly lower than that of humans, and crows are generally recognized as among the most intelligent birds in nature.
The CSO evolutionary process emulates the natural behavior of crows in hiding and recovering food. The algorithm is population based: the flock consists of M crows (individuals), each of which is m-dimensional. The position Y_{j,g} of crow j at iteration g is a candidate solution, Y_{j,g} = (y¹_{j,g}, y²_{j,g}, ⋯, y^m_{j,g}), where the maximum number of iterations is max_iter. Due to its natural ability to hide food, each crow remembers its best visited location L_{j,g} up to the current iteration. Each position is modified based on two behaviors:
(1) Pursuit: a crow k follows crow j with the intention of discovering its hidden place. The purpose of crow k is achieved because crow j does not notice the presence of the other crow.
(2) Evasion: crow j knows about the presence of crow k and deliberately takes a random trajectory to protect its food; CSO simulates this behavior with a random movement.
An Awareness Probability (AP) determines which behavior each crow j adopts. A random value r_j is sampled from a uniform distribution between 0 and 1; if r_j is greater than or equal to AP, the first behavior is applied, or else the second is chosen, as summarized in the following model. The magnitude of the movement of crow Y_{j,g} toward the best position L_{k,g} of crow k is given by the flight length fl_{j,g}, and the random number r_j lies in [0, 1] with uniform distribution. Once the crows have been modified, their positions are evaluated and the memory vector is updated as follows, where F(·) is the objective function to be minimized.
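The pursuit/evasion mechanism above can be sketched compactly. This is an illustrative implementation on a toy continuous function; the awareness probability, flight length, and search bounds are assumed values, not the paper's settings.

```python
import numpy as np

def cso(f, dim, n_crows=20, iters=200, AP=0.1, fl=2.0, seed=3):
    """Minimal CSO sketch: pursuit vs. evasion controlled by the awareness
    probability AP; L stores each crow's best hiding place (memory)."""
    rng = np.random.default_rng(seed)
    Y = rng.uniform(-5, 5, (n_crows, dim))        # crow positions
    L = Y.copy()                                  # memory of hiding places
    L_fit = np.array([f(y) for y in Y])
    for _ in range(iters):
        for j in range(n_crows):
            k = int(rng.integers(n_crows))        # crow j follows crow k
            if rng.random() >= AP:
                # pursuit: move toward crow k's remembered hiding place
                Y[j] = Y[j] + rng.random() * fl * (L[k] - Y[j])
            else:
                # evasion: crow k is aware, so take a random trajectory
                Y[j] = rng.uniform(-5, 5, dim)
            fy = f(Y[j])
            if fy < L_fit[j]:                     # memory (best location) update
                L[j], L_fit[j] = Y[j].copy(), fy
    best = int(np.argmin(L_fit))
    return L[best], float(L_fit[best])
```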
Fruit Fly Optimization

The FFO algorithm is a famous, relatively fast, and simple method for global optimization based on the food-finding behavior of the fruit fly [39]. The smell of a food source can attract the fruit fly even from a faraway location, and the fly then moves rapidly in that direction. Once it gets close to the location of the food, it utilizes its vision to trace the food. When compared with other optimization algorithms, FFO can achieve accurate optimization quickly. The original FFO is summarized as follows: Step 1. Initialization: for the fly group, the population size is defined along with the random initial fruit fly swarm location (X_axis, Y_axis) and the iteration termination criterion.
Step 2. Specific location assignment: the location (A_i, B_i) of each individual fruit fly is randomly assigned. Step 3. The smell concentration judgement value SC_i is set as the reciprocal of the distance from the fruit fly to the origin. Step 4. The smell concentration judgement function (the fitness function) is defined; SC_i is substituted into it to trace the smell concentration of the corresponding position.
Step 5. The maximum smell concentration value along with its corresponding position is found: [bestSmell, bestIndex] = max(Smell) (30). Step 6. The maximum-smell location is used to replace the swarm centre location. Step 7. If bestSmell is superior to the swarm's historical best Smellbest, proceed to Step 6 to update the centre; otherwise, return to Step 2 and continue the iteration.
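Steps 1-7 can be sketched as a short 2-D loop. This is an illustrative implementation in which the smell (fitness) function, swarm size, and search radius are all assumed values; in the paper, the smell function would score candidate gene subsets instead.

```python
import numpy as np

def ffo(smell_fn, n_flies=30, iters=100, seed=4):
    """Minimal 2-D FFO sketch (Steps 1-7): the smell concentration SC is
    the reciprocal distance to the origin, scored by the smell function."""
    rng = np.random.default_rng(seed)
    x_axis, y_axis = rng.uniform(-1, 1, 2)            # swarm centre (Step 1)
    best_smell = -np.inf
    for _ in range(iters):
        # random fly locations around the swarm centre (Step 2)
        A = x_axis + rng.uniform(-1, 1, n_flies)
        B = y_axis + rng.uniform(-1, 1, n_flies)
        SC = 1.0 / (np.sqrt(A ** 2 + B ** 2) + 1e-12) # judgement value (Step 3)
        smell = np.array([smell_fn(s) for s in SC])   # fitness function (Step 4)
        i = int(np.argmax(smell))                     # [bestSmell, index] (Step 5)
        if smell[i] > best_smell:                     # Steps 6-7: keep if superior
            best_smell = float(smell[i])
            x_axis, y_axis = A[i], B[i]               # move the swarm centre
    return best_smell, (float(x_axis), float(y_axis))
```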

Classification Procedures
The best selected feature values, or the optimized values, are then used for classification. Five different types of classifiers are used in this work.

Random Forest (RF) Classifiers.
One of the famous ensemble learning techniques for regression and classification is the Random Forest. It constructs multiple Decision Trees with the help of bootstrap aggregation. RF classifies based on the predictions of these tree structures: the result of each tree is judged, and the ultimate decision is reached by majority voting, which generally yields a better fit.

Adaboost Classifiers.
A famous machine learning technique is Adaboost, meaning adaptive boosting. It is utilized in conjunction with various kinds of algorithms to improve classifier performance. This classifier is generally less prone to overfitting, though it is quite sensitive to noisy data. To achieve optimal classification performance on a dataset, many parameters must be adjusted to suit the underlying learning algorithm, and Adaboost does this well.

Logistic Regression (LR).
It is a famous supervised learning classifier, widely used when the input variables are discrete or continuous and the output variable is categorical. Logistic Regression estimates its parameters from the input variables so that the probability of the output variable is predicted accurately.

Decision Trees (DT).
It is a famous decision support tool that utilizes a tree structure constructed from the input features. The main objective of this classifier is to predict the target variable from the many input features. DTs are used in many kinds of applications because decision rules can be extracted easily for a given input.

Quadratic Discriminant Analysis (QDA).
A famous supervised learning technique in the machine learning field, it is widely used to classify objects into 2 or more classes by means of a quadratic decision surface. It is a simple extension of LDA with the same classification rule, except that equal covariance matrices among the groups are not assumed.
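The five classifiers can be compared side by side with scikit-learn. This is a sketch on synthetic stand-in data (the real 62-sample, post-selection feature matrix comes from [31]); all hyperparameters are library defaults or illustrative choices, not the paper's settings, and `reg_param` is added here only because QDA needs regularization when features outnumber samples.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 62-sample, 90-gene matrix after feature selection
X, y = make_classification(n_samples=62, n_features=90, n_informative=10,
                           random_state=0)
classifiers = {
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    # reg_param stabilizes QDA when features outnumber samples (assumed value)
    "QDA": QuadraticDiscriminantAnalysis(reg_param=0.1),
}
# 10-fold cross-validated mean accuracy per classifier, as in the paper
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in classifiers.items()}
```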

Results and Discussion
The data are classified with a 10-fold cross-validation method, and the performance is shown in the tables below. The mathematical formulae for computing the Performance Index (PI), Sensitivity, Specificity, and Accuracy are given in the literature, and using the same, the values are computed and exhibited [33]. In these expressions, PC is Perfect Classification, MC is Missed Classification, and FA is False Alarm. In addition, the Good Detection Rate (GDR) is also computed and shown. Table 2 shows the performance analysis of the classifiers in terms of classification accuracy with the six optimization techniques for 30, 60, and 90 selected genes. Table 2 reveals that the QDA classifier with 90 genes selected by the IWO technique reached the highest accuracy of 99.16%. The LR classifier with 60 genes attained the lowest individual classification accuracy of 75.609% under CSO. Across the classifiers, the FFO method acquired the highest average accuracy of 91.43%. Table 3 demonstrates the performance analysis of the classifiers in terms of PC with the six optimization techniques for 30, 60, and 90 selected genes. Table 3 shows that the QDA classifier with 90 genes selected by IWO reached the highest PC of 98.96%, while the Adaboost classifier with 30 genes attained the lowest individual PC of 51.125% under CSO. Across the classifiers, the FFO method maintained the highest average PC of 82.865%, owing to the smoothening effect of the FFO-selected features across the classifiers. Table 4 reports the performance analysis of the classifiers in terms of PI with the six optimization techniques for 30, 60, and 90 selected genes. From Table 4, the QDA classifier with 90 genes selected by IWO reached the highest PI of 98.935%.
The Adaboost classifier with 30 genes ebbed to the lowest individual PI of 4.391% under CSO. Across the classifiers, the FFO method maintained the highest average PI of 76.672%, while the BASO method showed the lowest average PI of 29.83%. Table 5 depicts the performance analysis of the classifiers in terms of GDR with the six optimization techniques for 30, 60, and 90 selected genes. From Table 5, the QDA classifier with 90 genes selected by IWO reached the highest GDR of 98.96%. The LR classifier with 60 genes ebbed to the lowest individual GDR of 4.758% under CSO. Across the classifiers, the FFO method maintained the highest average GDR of 82.86%, and the BASO method attained the lowest average GDR of 53.16%. Table 6 deals with the average performance of the classifiers in terms of accuracy, PC, PI, and GDR averaged over the six optimization techniques for 30, 60, and 90 selected genes. Table 6 indicates that the QDA classifier under the 90-gene condition scores the highest parametric values: accuracy of 91.22%, PC of 82.36%, PI of 72.44%, and GDR of 82.37%. Therefore, the QDA classifier with 90 selected genes is considered the better choice. Figure 2 shows the accuracy of the various classifiers under the six optimization methods for 30, 60, and 90 genes selected in colon cancer. As depicted in Figure 2, the QDA classifier with 90 genes selected by IWO reached the highest accuracy of 99.16%, the LR classifier with 60 genes attained the lowest accuracy of 75.609% under CSO, the FFO method acquired the highest average accuracy of 91.43% across the classifiers, and the BASO method ebbed to the lowest average accuracy of 80.55%.
Figure 3 represents the average performance analysis of the classifier benchmark parameters accuracy, PC, PI, and GDR.
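For orientation, the conventional confusion-matrix forms of Sensitivity, Specificity, and Accuracy can be sketched as below; note that the paper's PC/MC/FA-based PI and GDR variants follow [33] and are not reproduced here, so these standard definitions are an assumption about the underlying quantities, not the paper's exact formulae.

```python
def sensitivity(tp, fn):
    """Fraction of tumor samples correctly detected, in percent."""
    return 100.0 * tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of healthy samples correctly detected, in percent."""
    return 100.0 * tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Overall fraction of correct classifications, in percent."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

For example, on the 62-sample dataset (40 tumor, 22 healthy), misclassifying a single tumor sample would give an accuracy of 100 × 61/62 ≈ 98.39%.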

Conclusion and Future Work
The classification of colon cancer thus has huge importance in the medical field. As many existing cancer classification models are clinically based, they have rather limited diagnostic ability. With the rapid advancement of gene expression technology, many kinds of cancers can be classified with the help of DNA microarrays. As gene expression data possess a high dimension, nonbalanced distribution, and a small sample size, their classification is quite difficult. Therefore, to get a better insight into the colon cancer classification problem, a systematic approach has been proposed. In this paper, the problem of colon cancer classification is confronted with the help of MRMR and six optimization techniques. Finally, the selected features are classified with five suitable classifiers, and the best results are obtained when IWO is utilized with MRMR and classified with QDA, giving a classification accuracy of 99.16%. Future work aims to explore other feature selection techniques and optimization methods for the better classification and analysis of microarray-based colon cancer data.

Data Availability
The data will be provided to genuine researchers upon request to the corresponding author and verification.

Conflicts of Interest
The authors declare that there is no conflict of interest.