Particle Swarm Optimized Hybrid Kernel-Based Multiclass Support Vector Machine for Microarray Cancer Data Analysis

Determining an optimal decision model is an important but difficult combinatorial task in imbalanced microarray-based cancer classification. Though the multiclass support vector machine (MCSVM) has already made an important contribution in this field, its performance solely depends on three aspects: the penalty factor C, the type of kernel, and the kernel parameters. To improve the performance of this classifier in microarray-based cancer analysis, this paper proposes the PSO-PCA-LGP-MCSVM model, which is based on particle swarm optimization (PSO), principal component analysis (PCA), and the MCSVM. The MCSVM is based on a hybrid kernel, i.e., linear-Gaussian-polynomial (LGP), that combines the advantages of three standard kernels (linear, Gaussian, and polynomial) in a novel manner, where the linear kernel is linearly combined with the Gaussian kernel embedding the polynomial kernel. Further, this paper proves that the LGP kernel satisfies the conditions of a valid kernel. To demonstrate the effectiveness of our model, several experiments were conducted and the results were compared with those of three single kernel-based models, namely, PSO-PCA-L-MCSVM (utilizing a linear kernel), PSO-PCA-G-MCSVM (utilizing a Gaussian kernel), and PSO-PCA-P-MCSVM (utilizing a polynomial kernel). Two binary and two multiclass imbalanced standard microarray datasets were used in the comparison. Experimental results in terms of three extended assessment metrics (F-score, G-mean, and Accuracy) reveal the superior global feature extraction, prediction, and learning abilities of this model against the three single kernel-based models.


Introduction
Cancer is a disorder caused by excessive and uncontrolled cell division in the body. A total of 9.6 million people died of cancer in 2018 [1]. Deaths due to cancer can be reduced to nearly half if the cancer types are detected early and the right treatment is administered in time. However, it is still a challenge for researchers to effectively diagnose cancer on the basis of morphological structure since different cancer types exhibit only subtle differences [2].
This challenge encourages application of data mining techniques, especially the use of gene expression data in determining the types of cancer cells. The level of gene expression can duly indicate the activity of a gene in a body cell based on the number of messenger ribonucleic acids (mRNAs). It is well known to contain information about the disease that may be in the gene sample, which may help experts in treating or preventing the disease [3].
Though next-generation sequencing (NGS), especially RNA-sequencing (RNA-Seq), is slowly replacing microarrays for analyzing and identifying complex mechanisms in gene expression, e.g., in the gene expression-based cancer classification problem, it is relatively expensive compared to microarrays. Since microarrays have been used for a long time, there exist robust statistical and operational methods for their processing [4][5][6][7][8][9][10][11][12][13]. In addition, many significant microarray experiments have been conducted and are publicly available to the research community [14][15][16][17][18][19][20]. For microarrays, there exist large and well-maintained repositories that have collected these types of data for a long time. While the preprocessing and analysis steps of microarray data are mostly standardized, the establishment of RNA-Seq data analysis techniques is still ongoing in the field of transcriptomics. For these reasons, microarrays are still utilized in many gene expression-based cancer classification studies, as presented in the most recent survey of hybrid feature selection methods in microarray gene expression data for cancer classification [20][21][22][23].
The DNA microarray technology has the capability of determining the level of thousands of genes concurrently in a given experiment, which so far has facilitated the development of cancer classification by the use of gene expression data [4][5][6][7][8][9][10][11][12][13].
Clinical decision support is the most recent application of DNA microarrays in the medical domain. This support can take the form of disease diagnosis or predicting clinical outcomes in response to a treatment. Currently, the two major areas in medicine that are drawing much attention in this regard are the management of cancer and of contagious diseases [24].
With the rapid development of artificial intelligence (AI), many researchers have applied machine-learning algorithms such as the artificial neural network (ANN), support vector machine (SVM), and k-nearest neighbor (KNN) to gene expression-based cancer diagnosis. For instance, ANNs have been proposed for microarray gene classification due to their superior ability to map input-output structured data. Khan and Meltzer utilized the ANN in analyzing microarray gene data from patients with small round blue-cell tumours [9]. Bevilacqua and Tommasi developed an accurate classifier model based on the feed-forward ANN for estrogen receptor (ER) ± metastasis recurrence of breast cancer tumours [25]. Chen et al. [26] also modeled a classifier for microarray gene data using ANN ensembles that were based on filtering of samples. In all these studies, attractive classification accuracies were obtained.
Furey proposed an SVM based on a simple kernel to carry out gene expression data analysis, which turned out to perform remarkably well [27]. Vanitha et al. utilized SVM alongside mutual information gained (MI-SVM) for feature selection [11]. In their research, they used various SVM models: linear SVM, radial basis function (RBF) SVM, quadratic SVM, and polynomial SVM. They further compared the results obtained from the proposed scheme with the k-nearest neighbor (K-NN) and ANN classifier results. The MI-SVM achieved better results than K-NN and ANN, and on some datasets it reached 100% accuracy.
Based on these previous research studies, it is evident that SVM has already made an important contribution in the field of microarray-based cancer classification. However, many researchers have pointed out that though the SVM is a promising classifier in microarray-based cancer classification, its performance solely depends on three aspects: the penalty parameter C of this classifier, the type of kernel utilized, and its parameters [28][29][30][31][32].
To improve the classification accuracy of the SVM classifier, some techniques have been presented to search for the optimal model parameters, such as grid search and gradient descent [1]. Although these approaches have proven their effectiveness in the corresponding experiments, in most cases they easily fall into local optima and suffer from low efficiency [1,18].
Recently, some meta-heuristic techniques, such as particle swarm optimization (PSO), genetic algorithm (GA), bat algorithm (BA), and dragonfly algorithm (DA), have attained promising results when utilized in tuning the SVM classifier's parameters [18]. However, most of these research studies have not been applied to gene expression-based cancer analysis. In addition, they only focus on SVM with a single kernel function. Though some research studies [28] point out that combining multiple kernel functions can achieve better performance compared to a single kernel function, little research has provided an in-depth formulation and analysis of the performance of a multiclass support vector machine (MCSVM) with a combined kernel function. Thus, there is a definite need to systematically study the complex optimization problem in the MCSVM classifier with a combined kernel applicable to gene expression-based cancer classification.
PSO has a number of desirable properties, including simplicity of implementation, scalability in dimension, good empirical performance, and computational efficiency compared to other optimization techniques [33]. Given these properties, and given that few studies exist on MCSVM classifiers with combined kernels in microarray-based cancer classification, this paper proposes a novel gene expression-based cancer classification model, i.e., PSO-PCA-LGP-MCSVM. This model is based on particle swarm optimization (PSO), principal component analysis (PCA), and multiclass support vector machine (MCSVM) with a novel hybrid kernel function, i.e., linear-Gaussian-polynomial (LGP) kernel.
The objective of this research is to construct an MCSVM classifier that combines three different standard kernel functions (linear, Gaussian, and polynomial), to use PCA to reduce the dimensional complexity of the considered microarray datasets, and to optimize all the parameters of this model using PSO.
This paper is organized into five sections, including this introductory section. The remainder of the paper proceeds as follows: a detailed presentation of the proposed model is given in Section 2. Section 3 deals with the considered cancer microarray datasets and the evaluation metrics used. Section 4 focuses on the experimental results and discussions. Finally, conclusions and recommendations are given in Section 5.

PSO-PCA-LGP-MCSVM Principles
2.1. Normalization. Microarray gene expressions can differ by an order of magnitude. Thus, it is necessary to normalize these data to improve the performance of subsequent microarray data analysis stages like gene selection/feature extraction, clustering, and classification [1].
In this paper, the microarray gene expressions are linearly transformed from the interval $[X_{\min}, X_{\max}]$ to $[0, 1]$ using the following equation [1]:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}},$$

where $X'$ is the new normalized value of the gene expression level and $X$ is the value of the gene expression level before normalization, while $X_{\max}$ and $X_{\min}$ denote, respectively, the largest and smallest values of all the data in the attribute (gene) to be normalized.
Since the min-max normalization has the advantage of preserving exactly all the relationships among the original gene data values and does not introduce any bias [1], it is considered in this paper.
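The min-max transformation described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `min_max_normalize`, the samples-as-rows layout, and the handling of constant genes are assumptions, not details taken from the paper.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each gene (column) of X to [0, 1]; rows are assumed to be samples."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    # Assumption: a constant gene (span == 0) is mapped to 0 to avoid division by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (X - x_min) / safe_span

# Toy data: 3 samples x 3 genes with very different magnitudes.
X = np.array([[2.0, 100.0, 5.0],
              [4.0, 300.0, 5.0],
              [6.0, 200.0, 5.0]])
X_norm = min_max_normalize(X)
```

After the transformation every non-constant gene spans exactly [0, 1], so genes measured on different scales contribute comparably to the later PCA and kernel computations.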

Principal Component Analysis (PCA).
One of the major challenges encountered in working with DNA microarray data is their high dimensionality that is coupled with a relatively small sample size. While there is a plethora of crucial information that can be derived from these large datasets, their high-dimensional nature can often hide the critical information. Thus, a process that can reduce the dimensionality complexity of this type of data is required. In addition, a dimensionality reduction step will minimize errors obtained in the subsequent classification stage [1, 12, 33-35].
In this paper, principal component analysis (PCA) with component selection based on the cumulative proportion of variance of the eigenvectors is used. The steps of this algorithm are as follows:
(a) Let $X'$ (the normalized microarray gene expression data) be the input matrix for PCA. Each row vector of $X'$ holds the normalized expression values of one gene across all samples.
(b) Compute the mean (centroid) $\bar{X}_j$ of each gene $j$, where the sum runs over all $M$ samples (tissues):

$$\bar{X}_j = \frac{1}{M}\sum_{i=1}^{M} X'_{ji},$$

where $M$ is the number of tissues and $X'_{ji}$ is the expression level of gene $j$ in sample $i$.
(c) Compute the covariances (the degree to which the genes are linearly correlated):

$$C_{kj} = \frac{1}{M-1}\sum_{i=1}^{M}\left(X'_{ki} - \bar{X}_k\right)\left(X'_{ji} - \bar{X}_j\right),$$

where $C_{kj}$ is the covariance of gene $k$ and gene $j$, $M$ is the number of samples (tissues), $X'_{ki}$ is the expression level of gene $k$ in sample $i$, $X'_{ji}$ is the expression level of gene $j$ in sample $i$, $\bar{X}_k$ is the mean of the expression levels of gene $k$, and $\bar{X}_j$ is the mean of the expression levels of gene $j$.
(d) Form a covariance matrix $C$ using the computed covariances and transform it into a diagonal matrix. The diagonal elements of the transformed matrix are the eigenvalues $z_1, z_2, \ldots, z_M$, which denote the amount of variability captured along a particular new dimension.
(e) Calculate the corresponding eigenvectors $\rho_1, \rho_2, \ldots, \rho_M$ from

$$(C - z_i I)\,\rho_i = 0, \quad i = 1, \ldots, M,$$

where $I$ is the identity matrix.
(f) Sort the eigenvalues in descending order, i.e., $z_1 \ge z_2 \ge \cdots \ge z_M$.
(g) The eigenvectors corresponding to the $k$ largest eigenvalues (where $k < M$) are the first $k$ principal components.
(h) Select the first $k$ eigenvectors via the cumulative proportion of variance (eigenvalues). The proportion of variance (PPV) for each principal component is determined as follows:

$$PPV_i = \frac{z_i}{\sum_{j=1}^{M} z_j}.$$

(i) Form the principal component matrix $P$, a matrix consisting of the selected $k$ eigenvectors that correspond to the largest $k$ eigenvalues, where $k$ is chosen so that the cumulative proportion of variance reaches at least 95%:

$$\sum_{i=1}^{k} PPV_i \ge 0.95.$$

(j) Compute the dimensionally reduced microarray gene expression data $X'_{DimRed}$ using the following equation:

$$X'_{DimRed} = P^{\top} X'.$$

Hence, the analysis reduces the high-dimensional original microarray data to $k$ principal component scores for each sample, which are the inputs for the multiclass support vector machine (MCSVM).
To measure the generalization error of each considered model, per-fold PCA was adopted. A separate PCA is first conducted on each calibration set, and the resulting transformation is then applied to the validation set: the means of the calibration set are subtracted from the validation set, and the centered data are then projected onto the principal components of the calibration set. The underlying assumption, which justifies this process, is that the testing and training sets are drawn from the same distribution.
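The per-fold procedure above (fit PCA on the calibration set only, then reuse its means and loadings on the validation set) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `fit_pca` and `apply_pca` are hypothetical, samples are stored as rows, and the eigen-decomposition is obtained through an SVD of the centered data rather than by forming the covariance matrix explicitly, which yields the same principal components.

```python
import numpy as np

def fit_pca(X_cal, var_threshold=0.95):
    """Fit PCA on the calibration set (rows = samples, columns = genes)."""
    mean = X_cal.mean(axis=0)
    Xc = X_cal - mean
    # The right singular vectors of the centered data are the eigenvectors
    # of the covariance matrix; s**2 / (M - 1) are its eigenvalues.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X_cal.shape[0] - 1)          # already sorted descending
    cum_ppv = np.cumsum(eigvals) / eigvals.sum()   # cumulative proportion of variance
    # Smallest k whose cumulative proportion of variance reaches the threshold.
    k = min(int(np.searchsorted(cum_ppv, var_threshold)) + 1, len(eigvals))
    return mean, Vt[:k].T                          # loadings: genes x k

def apply_pca(X, mean, loadings):
    """Center with the calibration means, then project onto the calibration PCs."""
    return (X - mean) @ loadings

rng = np.random.default_rng(0)
X_cal = rng.normal(size=(20, 50))   # toy calibration set: 20 samples, 50 genes
X_val = rng.normal(size=(5, 50))    # toy validation set
mean, P = fit_pca(X_cal)
Z_cal = apply_pca(X_cal, mean, P)
Z_val = apply_pca(X_val, mean, P)
```

Because the validation data never influence `mean` or `P`, the validation accuracy computed on `Z_val` is not contaminated by information from the held-out fold.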
The main objective of MCSVM is to map the preprocessed, nonlinearly inseparable microarray gene expression data into a high-dimensional linear manifold $\theta$ by the use of a transformation $\phi: R^N \to \theta$, and then to obtain the optimal hyperplane $\Psi: \psi(x) = \omega \cdot \phi(x) + b$ by solving the following convex optimization problem (the soft margin problem) [36]:

$$\min_{\omega, b, \xi}\ \frac{1}{2}\lVert \omega \rVert^2 + \beta \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(\omega \cdot \phi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n,$$

where $\omega$ is a coefficient vector of the hyperplane in the manifold (feature space), $b$ is the threshold value of the hyperplane, $y_i$ is the class label of training sample $x_i$, $n$ is the number of training samples, $\xi_i$ is a slack factor introduced for classification errors, and $\beta$ is a penalty factor for errors. The parameter $\beta$ controls the penalty of misclassification and its value is normally determined via cross-validation. Larger values of $\beta$ normally lead to a small margin, which minimizes classification errors, while smaller values of $\beta$ may produce a wider margin, resulting in many misclassifications.
The feature space $\theta$ is high-dimensional, so its direct computation can lead to "dimension disaster." However, since $\omega = \sum_{i=1}^{n} \delta_i y_i \phi(x_i)$, all the operations of the MCSVM in the feature space $\theta$ are only dot products. And since kernel functions, i.e., $G(x_i, x_{i'}) = \phi(x_i) \cdot \phi(x_{i'})$, are efficient at handling dot products, they were introduced into the SVM. This implies there is no need to know how to map the microarray gene expression data from its original space to the feature space $\theta$. Thus, selection of a kernel and its coefficients is vital to the computational efficiency and accuracy of an MCSVM classifier model [28][29][30][31][32].
These MCSVM kernel functions can be broadly categorized into local kernel functions and global kernel functions. Samples far apart have a great impact on the global kernel values while samples close to each other greatly influence the local kernel values. The linear and polynomial kernels are good examples of global kernels while the Gaussian (radial basis function) kernel is a local kernel [28, 30-32, 37].
Relatively speaking, the linear kernel function has a better extraction of global features from samples, the polynomial kernel has good generalization ability, and the Gaussian kernel (the most widely used kernel) has a good learning ability among all the single kernel functions. Thus, it is evident that a single kernel function-based MCSVM classifier in a given application such as gene expression data may not simultaneously attain good learning ability, proper global feature extraction ability, and good generalization capability. To overcome this limitation, two or more kernel functions can be combined [28][29][30][31][32].
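To make the distinction between global and local kernels concrete, the three standard kernels can be written as small NumPy functions. This is a sketch, not the paper's implementation: the parameter names `gamma`, `coef0`, `degree`, and `eta` and their default values are assumptions chosen for illustration.

```python
import numpy as np

def linear_kernel(A, B):
    """Global kernel: plain dot products between rows of A and rows of B."""
    return A @ B.T

def polynomial_kernel(A, B, gamma=1.0, coef0=1.0, degree=3):
    """Global kernel: (gamma * <a, b> + coef0) ** degree."""
    return (gamma * (A @ B.T) + coef0) ** degree

def gaussian_kernel(A, B, eta=0.5):
    """Local kernel: exp(-eta * ||a - b||^2), decaying with distance."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-eta * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])
K_lin = linear_kernel(X, X)
K_poly = polynomial_kernel(X, X, degree=2)
K_gauss = gaussian_kernel(X, X, eta=0.5)
```

The Gaussian entries shrink towards zero as two samples move apart, whereas the linear and polynomial entries depend on the dot product and can keep growing with the magnitude of the samples, which is the local versus global behaviour described above.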

Linear-Gaussian-Polynomial MCSVM (LGP-MCSVM).
In trying to build a kernel model that has better global feature extraction, good learning, and prediction abilities, the work presented in this paper combines the merits of two global kernels (linear and polynomial) and one local kernel (Gaussian). This paper therefore proposes a novel kernel, the linear-Gaussian-polynomial (LGP) kernel $G_{LGP}(x_i, x_{i'})$ formulated in equation (13), in which the linear kernel is linearly combined with a Gaussian kernel that embeds the polynomial kernel, where $\beta_1 + \beta_2 + \beta_3 = 1$, $\beta \in R$, and $\delta, d > 0$.
In this paper, we utilize different values of $\beta$ to mix the three standard kernels (different regions of the input space). In this case, $\beta$ is a vector, i.e., $\beta = [\beta_1, \beta_2, \beta_3]$. Through this approach, the relative contribution of each kernel to the hybrid kernel, i.e., $G_{LGP}(x_i, x_{i'})$, can be easily varied over the input space.
The LGP kernel function takes better global feature extraction ability from the linear kernel, good prediction ability from the polynomial kernel, and better learning ability from the Gaussian kernel. Mercer's theorem provides the necessary and sufficient qualifiers of a valid kernel function. It states that a kernel function is a permissible kernel if the corresponding kernel matrix is symmetric and positive semidefinite (PSD) [5,38].
A kernel matrix can be verified to be PSD by determining its spectrum of eigenvalues. It is important to note that a symmetric matrix is positive semidefinite if and only if all its eigenvalues are nonnegative. Considering this, for the proposed kernel to be permissible, it must satisfy Mercer's theorem. This validity can be proved by applying the Taylor expansion of the exponential function to equation (13). From the resulting expansion, equation (19), it is evident that $G_{LGP}(x_i, x_{i'})$ is a mixed kernel comprising a weighted linear kernel, a constant $\beta_2$, and a weighted summation of polynomial kernels. Using propositions (20)-(22) of Theorem 1 and propositions (21) and (22)

Corollary 1. Functions of a Mercer kernel K1 are also Mercer's kernels:
Since the proposed hybrid LGP kernel combines three valid Mercer's kernels, i.e., linear, Gaussian, and polynomial kernels, it is also a valid Mercer's kernel that can be used for training and classification of the multiclass support vector machine (MCSVM).
With the proposed LGP-MCSVM, the nonlinear transformation of the microarray gene sample points is applied to obtain the corresponding kernel matrix, from which the classification results are obtained during the training phase of the MCSVM classifier.
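The Mercer condition discussed above (a symmetric kernel matrix with nonnegative eigenvalues) can also be checked numerically on a sample of points. The sketch below is illustrative only: `is_psd` is a hypothetical helper, the tolerance is an assumption, and `K_mix` is a simple nonnegative weighted sum of linear, Gaussian, and polynomial kernel matrices used as a stand-in, not the paper's LGP kernel of equation (13).

```python
import numpy as np

def is_psd(K, tol=1e-8):
    """Return True if K is symmetric and all eigenvalues are >= 0 (up to tol)."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T, atol=tol):
        return False
    eigvals = np.linalg.eigvalsh(K)                 # eigenvalues of a symmetric matrix
    scale = max(1.0, float(np.abs(eigvals).max()))  # relative tolerance for round-off
    return bool(eigvals.min() >= -tol * scale)

rng = np.random.default_rng(0)
X = rng.random((10, 3))                             # 10 toy samples, 3 features
gram = X @ X.T
sq_dist = np.diag(gram)[:, None] + np.diag(gram)[None, :] - 2.0 * gram

K_lin = gram
K_gauss = np.exp(-0.5 * np.maximum(sq_dist, 0.0))
K_poly = (gram + 1.0) ** 2
# Stand-in hybrid (assumption): nonnegative weights that sum to 1.
K_mix = 0.3 * K_lin + 0.4 * K_gauss + 0.3 * K_poly

not_a_kernel = np.array([[1.0, 2.0],
                         [2.0, 1.0]])               # eigenvalues 3 and -1
```

A numerical check of this kind can only reject a candidate kernel on the sampled points; it does not replace the analytical proof, but it is a cheap sanity test before passing a precomputed kernel matrix to an SVM solver.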

Particle Swarm Optimization (PSO).
Currently, there is no widely accepted method for optimizing the penalty and kernel parameters of the MCSVM. The "grid-search (GS)" with exponentially growing sequences of combinations of (C, η) for the commonly utilized Gaussian kernel is often applied in microarray analysis [1,18]. Though it is easy to implement, it has low computing efficiency. In addition, the optimal result of the GS can only be generated from the preset grid combinations, while unknown, possibly optimal parameters cannot be explored and discovered.
In this paper, the particle swarm optimization (PSO) technique is adopted to search for the best parameter combinations for the considered models [18,33]. The PSO technique is derived from the movement patterns of birds during foraging; it offers fast convergence, efficient parallel computing, and a strong universality, and it is able to efficiently avoid local optima [20]. In addition, the velocity of each particle at every iteration is determined by the sum of its current velocity, the particle's previous best value, the current global optimal value, and random interferences, which greatly helps avoid local optima and improves the search coverage and effectiveness. To effectively evaluate the performance of the considered models, different values were considered for all kernel parameters within the ranges presented in Table 1.
The parameters that need to be determined in the PSO algorithm include the dimension of the search space D, the swarm size n, the cognitive learning factor $c_1$, the social learning factor $c_2$, the inertia weight w, and the maximum number of iterations. The search space dimension D for each considered model is equal to the number of parameters required to be set for that model, i.e., PSO + L-MCSVM (D = 1), PSO + P-MCSVM (D = 4), PSO + G-MCSVM (D = 2), and PSO + LGP-MCSVM (D = 8). Since each model has a different dimensional search space and there is no exact rule in the literature for selecting the swarm size, as a rule of thumb with heuristic optimization algorithms, the swarm size for each model was set to 10 × D [39]. According to [40], both the cognitive learning factor and social learning factor were set to 2, i.e., $c_1 = c_2 = 2$, and the inertia weight w was set to 1 as suggested in [41]. To prevent the searches from terminating prematurely and to avoid unnecessary additional computational complexity, the maximum number of iterations for all models was set to 50. Table 2 presents these initial PSO parameters of each model. More information on the PSO algorithm is presented in [18-20, 33, 39-43].
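The PSO settings listed above (swarm size 10 × D, $c_1 = c_2 = 2$, w = 1, 50 iterations) can be illustrated with a generic PSO loop. This is a sketch under stated assumptions, not the authors' code: the function name `pso_minimize`, the position clipping to the search bounds, and the velocity clamp `v_max` are assumptions added to keep the swarm inside the search ranges, and a toy quadratic objective stands in for the cross-validated classification error.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_iter=50, w=1.0, c1=2.0, c2=2.0, seed=0):
    """Minimize f over the box [lower, upper] with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size
    n = 10 * D                                   # swarm size rule of thumb
    pos = rng.uniform(lower, upper, size=(n, D))
    vel = np.zeros((n, D))
    v_max = upper - lower                        # assumption: velocity clamp
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    history = [gbest_val]
    for _ in range(n_iter):
        r1, r2 = rng.random((n, D)), rng.random((n, D))
        # Velocity: inertia + cognitive pull (own best) + social pull (global best).
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, lower, upper)   # keep particles inside the ranges
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(pbest_val.argmin())
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
        history.append(gbest_val)
    return gbest, gbest_val, history

# Toy objective (assumption): in the paper's setting f would instead return
# the negative 5-fold cross-validation accuracy for a parameter vector.
toy = lambda p: float(((p - 1.0) ** 2).sum())
best, best_val, history = pso_minimize(toy, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

In an actual tuning run, each particle would encode one candidate parameter vector (for example C and η for the Gaussian model, D = 2), and the bounds would be the parameter ranges of Table 1.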

PSO-PCA-LGP-MCSVM Model. The main process of the proposed algorithm is outlined as follows:
(1) Transform the cancer microarray data into the right format for the SVM package.
(2) Load a cancer microarray dataset.
(3) Randomly divide the loaded microarray data into two sets: a training set and a testing set.
(4) Initialize the PSO parameters such as the population size, the maximum number of iterations, and the considered multiclass SVM parameters.
(5) Adopt PSO to search for the optimal solution of particles in the global space by using 5-fold cross-validation that incorporates per-fold PCA feature extraction. This process is presented in step (6).
(6) To achieve 5-fold cross-validation incorporating PCA, the following steps are followed:
    (i) For j = 1 to 5, repeat steps (ii) to (vi).
    (ii) Carry out PCA on the data present in the remaining 4 folds to generate a loadings matrix.
    (iii) Transform this data (data in the remaining 4 folds, i.e., the calibration set) into a set of principal component (PC) scores using the first P components (that account for at least 95% cumulative variance) of the loadings matrix generated in step (ii).
    (iv) Build the considered SVM classification model with a set of parameter values, using the PC score data generated in step (iii).
    (v) Transform the held-out test fold data (i.e., data in fold j) into a set of PC scores using the P component loading matrix retained in step (iii).
    (vi) Compute the classification accuracy of the SVM classification model built in step (iv) using the transformed test fold j data from step (v).
    (vii) For the considered parameter set, store the optimal parameter values (i.e., the set of parameters that yields the highest classification accuracy).
(7) Report the optimal parameters for the considered model.
(8) Carry out PCA on the whole training set data (i.e., the training set obtained in step (3)) to generate a loading matrix.
(9) Transform this whole training set data into a set of PC scores using the first P components (that account for at least 95% cumulative variance).
(10) Build an optimal model for the considered SVM classification model using the optimal parameter values obtained in step (vii) and the PC scores generated in step (9).
(11) Transform the whole testing set data (i.e., the testing set obtained in step (3)) into a set of PC scores using the P component loading matrix retained in step (9).
(12) Compute the classification accuracy of the optimal SVM classification model built in step (10) using the transformed testing set data from step (11).
(13) Report this test classification accuracy.
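The cross-validation loop in steps (i) to (vi) can be sketched as below. This is a minimal sketch, not the authors' MATLAB/LIBSVM code: the function names are hypothetical, the folds are not stratified, and a nearest-centroid classifier is swapped in as a stand-in for the MCSVM so that the example stays self-contained. Any function with the same `fit_predict(Z_train, y_train, Z_test)` signature could be plugged in instead.

```python
import numpy as np

def fit_pca(X, var_threshold=0.95):
    """PCA on rows-as-samples data; keep components up to 95% cumulative variance."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum, var_threshold)) + 1, len(s))
    return mean, Vt[:k].T

def nearest_centroid_fit_predict(Z_train, y_train, Z_test):
    """Stand-in classifier (assumption): assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cv_accuracy_with_per_fold_pca(X, y, fit_predict, n_folds=5, seed=0):
    """Mean validation accuracy over n_folds, refitting PCA inside every fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for j in range(n_folds):
        val_idx = folds[j]
        cal_idx = np.concatenate([folds[i] for i in range(n_folds) if i != j])
        mean, P = fit_pca(X[cal_idx])            # steps (ii)-(iii): calibration only
        Z_cal = (X[cal_idx] - mean) @ P
        Z_val = (X[val_idx] - mean) @ P          # step (v): reuse calibration loadings
        y_pred = fit_predict(Z_cal, y[cal_idx], Z_val)   # step (iv)
        accs.append(np.mean(y_pred == y[val_idx]))       # step (vi)
    return float(np.mean(accs))

# Toy two-class data (assumption): 40 samples, 30 "genes", well-separated classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 30)), rng.normal(3.0, 1.0, (20, 30))])
y = np.array([0] * 20 + [1] * 20)
acc = cv_accuracy_with_per_fold_pca(X, y, nearest_centroid_fit_predict)
```

In the full procedure, this cross-validated accuracy would serve as the fitness value that PSO maximizes for each candidate parameter set, and the final model would be refit on the whole training set as in steps (8) to (12).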
The schematic diagram in Figure 1 shows the complete process of the PSO-PCA-LGP-MCSVM algorithm.
It is important to mention that the whole analysis process is conducted using the LIBSVM framework in MATLAB [44,47] on a machine with an Intel(R) Core(TM) i3-3240M CPU @ 3.4 GHz and 12 GB of RAM.

Considered Microarray Datasets.
To assess the performance of the proposed PSO-PCA-LGP-MCSVM algorithm, several experiments were conducted on four publicly available datasets. A summary of all the datasets utilized in this research can be found in Table 3, and a brief description of each dataset follows: Colon dataset [8]:
Due to the small number of instances in the considered datasets, all the datasets were initially split into two disjoint sets: the training set and the test set. Utilizing 5-fold cross-validation, the training set was randomly divided further into 5 subsets of approximately equal size. Each time, 4 subsets were selected as the calibration set and the remaining subset was used as the validation set. This process was repeated 5 times. Finally, the average classification accuracy on the validation sets was used as one of the evaluation metrics. It is important to point out that by using 5-fold cross-validation to dynamically divide the microarray training samples, the evaluation of the considered models becomes more stable and objective.
The percentage proportion for the calibration, validation, and test sets for all the considered microarray datasets is presented in Table 4.

Performance Measures for Imbalanced Microarray Datasets. When the samples in a dataset are unevenly distributed among the classes (as is the case for microarray datasets), the classification task falls into the imbalanced domain. The majority class then influences the data mining algorithms, skewing their performance towards it [15].
Most algorithms simply compute the accuracy as the percentage of correctly classified samples. However, in the case of microarrays, these results are highly deceiving since the minority classes have minimal effect on the overall classification accuracy. Thus, the complete confusion matrix (Table 5) must be considered to assess the classification of the positive and negative classes independently [15].
The description in Table 5 gives four baseline statistical components, where TP and FN denote the number of positive samples, which are accurately and falsely predicted, respectively, and TN and FP depict the number of negative samples that are predicted accurately and wrongly, respectively.
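Two of the most frequently used metrics for the class imbalance problem, namely, F-measure and G-mean, can be regarded as functions of these four statistical components and are calculated as follows:

$$F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \qquad G\text{-mean} = \sqrt{TPR \times TNR},$$

where precision, recall, TPR, and TNR are further defined as follows:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = TPR = \frac{TP}{TP + FN}, \qquad TNR = \frac{TN}{TN + FP}.$$

The overall classification accuracy (Acc) can be calculated using the following equation:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}.$$

However, all these evaluation metrics are appropriate only for binary-class imbalance tasks. To extend them to multiclass tasks, the following transformations should be considered [15]. G-mean computes the geometric mean of all the classes' accuracies and is defined by

$$G\text{-mean} = \left(\prod_{i=1}^{m} Acc_i\right)^{1/m},$$

where $Acc_i$ denotes the accuracy of the $i$th class and $m$ is the number of classes. F-measure can be transformed into F-score, which is computed using the following equation:

$$F\text{-score} = \frac{1}{m}\sum_{i=1}^{m} F\text{-measure}_i,$$

where $F\text{-measure}_i$ is calculated further using the following equation:

$$F\text{-measure}_i = \frac{2 \times \text{precision}_i \times \text{recall}_i}{\text{precision}_i + \text{recall}_i}.$$

Acc can be transformed as depicted by the following equation:

$$Acc = \sum_{i=1}^{m} P_i \times Acc_i,$$

where $P_i$ is the percentage of samples in the $i$th class. To impartially and comprehensively assess the classification performance of the proposed model in comparison with the PSO-PCA-L-MCSVM, PSO-PCA-G-MCSVM, and PSO-PCA-P-MCSVM models that utilize the standard linear, Gaussian, and polynomial kernels, respectively, the three extended measures, namely, F-score, G-mean, and Acc, described in (29), (30), and (32), respectively, are used.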
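The three extended measures can be computed directly from a multiclass confusion matrix. The sketch below assumes the standard per-class definitions (per-class accuracy equal to per-class recall, macro-averaged F-measure, and class-proportion-weighted accuracy); the function name `multiclass_metrics` and the row/column convention are illustrative assumptions.

```python
import numpy as np

def multiclass_metrics(cm):
    """Return (F-score, G-mean, Acc) from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    Assumes every class has at least one true sample.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)          # true samples per class
    predicted = cm.sum(axis=0)        # predicted samples per class
    recall = tp / support             # per-class accuracy Acc_i
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f_i = np.divide(2.0 * precision * recall, denom,
                    out=np.zeros_like(tp), where=denom > 0)
    f_score = float(f_i.mean())                               # mean of per-class F-measures
    g_mean = float(np.prod(recall) ** (1.0 / len(recall)))    # geometric mean of Acc_i
    acc = float((support / support.sum() * recall).sum())     # sum of P_i * Acc_i
    return f_score, g_mean, acc

# Toy binary example: 10 samples per class, 3 errors in total.
f_score, g_mean, acc = multiclass_metrics([[8, 2],
                                           [1, 9]])
```

Because G-mean multiplies the per-class accuracies, a single poorly recognized minority class pulls it towards zero even when the overall accuracy stays high, which is why it is informative for imbalanced data.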

Results and Discussions
The experimental results for the four classification models on the four microarray datasets are reported in Tables 6-8, where the best result in each dataset is highlighted in bold and the worst is italicized. From Tables 6-8, the following observations can be made:
(i) The Lung and St. Jude datasets are slightly sensitive to class imbalance while Colon and AML-ALL are not, as shown by the difference between the Accuracy and G-mean values. An Accuracy slightly lower than the G-mean value implies that the MCSVM is affected by the imbalanced class distribution. This is largely attributed to the large number of true negatives (TNs) achieved by all the models when analyzing both the Lung and St. Jude datasets.
(ii) The hybrid kernel boosted the classification performance of the MCSVM on three datasets, i.e., Colon, Lung, and St. Jude. These improvements are better portrayed by the F-score and G-mean metrics, which evaluate how balanced the classification results are. However, a tie is reported for the AML-ALL dataset. This implies that although the complementary characteristics of the three standard kernels, i.e., linear, Gaussian, and polynomial, in the proposed hybrid linear-Gaussian-polynomial (LGP) kernel may improve the classification ability of the MCSVM on most microarray datasets, for some datasets a single suitable kernel is sufficient.
(iii) Of all the considered models, the PSO-PCA-P-MCSVM reported the lowest performance in all the considered metrics for all four datasets. However, it is important to note that a promising kernel can be obtained if the polynomial kernel is embedded into the exponential kernel.
In summary, compared with the single kernel-based models (i.e., PSO-PCA-L-MCSVM, PSO-PCA-G-MCSVM, and PSO-PCA-P-MCSVM), the proposed PSO-PCA-LGP-MCSVM model, which is based on a hybrid linear-Gaussian-polynomial (LGP) kernel with a better global feature extraction ability, good prediction ability, and better learning ability, has an attractive classification ability in cancer diagnosis using both binary and multiclass imbalanced microarray datasets. Moreover, due to the excellent global searching ability of particle swarm optimization, it can effectively optimize the hybrid kernel-based MCSVM when solving a wider range of classification problems.

Conclusion
Techniques to choose or construct suitable kernel functions and optimally tune their parameters for MCSVM have received considerable attention in imbalanced microarray-based cancer diagnosis. A novel classification model, PSO-PCA-LGP-MCSVM, that is based on MCSVM with a hybrid kernel, i.e., linear-Gaussian-polynomial (LGP), is proposed in this paper. The LGP kernel combines the advantages of three standard kernels, i.e., linear, Gaussian, and polynomial kernels, in a novel manner where the linear kernel is linearly combined with a polynomial kernel that is embedded into a Gaussian kernel. Using PSO to optimally tune the LGP kernel-based MCSVM resulted in better generalization, learning, and predicting ability, as evidenced by the promising results in terms of three extended measures (F-score, G-mean, and Accuracy) for both binary and multiclass imbalanced microarray datasets. The performance of the proposed model was compared with those of three models, i.e., PSO-PCA-L-MCSVM, PSO-PCA-G-MCSVM, and PSO-PCA-P-MCSVM, that are based on single linear, Gaussian, and polynomial kernels, respectively, and the experimental results show that the proposed model is superior to the three single-kernel-based models. This reflects the good practical value of the proposed model in the field of microarray-based cancer diagnosis; the model can also be extended to more applications of medical diagnostic classification to explore its potential.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.