A Graph-Theoretic Approach for Identifying Non-Redundant and Relevant Gene Markers from Microarray Data Using Multiobjective Binary PSO

The purpose of feature selection is to identify the relevant and non-redundant features in a dataset. In this article, the feature selection problem is formulated as a graph-theoretic problem in which a feature-dissimilarity graph is constructed from the data matrix. The nodes represent features and the edges represent their dissimilarity. Nodes are weighted by feature relevance and edges by the dissimilarity between features. The problem of finding relevant and non-redundant features is then mapped to a densest-subgraph problem. We propose a multiobjective particle swarm optimization (PSO)-based algorithm that simultaneously optimizes the average node weight and the average edge weight of the candidate subgraph. The proposed algorithm is applied to identify relevant and non-redundant disease-related genes from microarray gene expression data. Its performance is compared with that of several existing feature selection techniques on real-life microarray gene expression datasets.


Introduction
Data dimensionality reduction can be done in two ways: 1) feature extraction, which creates new features by combining existing ones, and 2) feature selection, which chooses a subset of features by eliminating those with little or no predictive information. This study focuses only on feature selection. Feature selection has an immense impact on the quality of classification and clustering techniques in machine learning and pattern classification. It can be applied to both supervised and unsupervised learning. In a supervised scenario [1], [2], the correct class of every training sample is known, and the feature evaluation criteria used to generate the selected feature set are based on the known class labels of the samples. In contrast, in unsupervised cases the assessment criteria are completely independent of the true class labels. Performance in unsupervised classification is typically regarded as the capability of a clustering algorithm to expose groupings (clusters) in a given data set. Subsequently, the clustering solution is evaluated using cluster validation measures such as entropy (E), class separability (S), fuzzy feature evaluation index (FFEI), etc. [3]. Feature selection may also follow a filter-based or a wrapper-based approach. When the utility of a feature is measured in terms of some proxy measure, the approach is called filter-based feature selection. In a supervised filter, the proxy measure uses the class labels. In an unsupervised filter, the proxy measure considers the degree to which the distribution of the feature values exhibits the class structure in the feature space. Utility measures for wrapper methods [2] rely completely on a classifier or clustering result.
As filter methods are independent of the classifier applied subsequently, they have excellent generalization properties, but may be less effective at decreasing the dimensionality of the feature space and boosting classification accuracy. Generally, they are computationally cheaper than wrapper approaches, and wrapper-based methods are more prone to over-fitting the data. Feature selection has been addressed in a variety of ways, such as clustering based [4], [5], content based [6], for ensemble classifiers [7], graph based [8], [9] and feature similarity based [3].
In this context, two opposite strategies have been proposed in the literature: those that aim at the exclusion of redundant features [3] and those that focus on the elimination of irrelevant features [10]. Besides these methods, some Particle Swarm Optimization (PSO) based feature selection techniques exist in the literature. In [11], a multiswarm binary PSO was introduced, in which a scheduling algorithm selects the fittest subswarm and classification accuracy and F-score are combined as the objective function. In [12], the authors used PSO and a Least Squares Support Vector Machine for feature selection, and in [13] an improved PSO with a sign test was described for identifying relevant features. Article [14] also used bPSO. However, all these methods are modeled in a single-objective fashion, with classification accuracy as the objective function. Multiobjective PSO-based approaches also exist, such as [15], [16] and [17], where MOPSO has been well studied, but they do not consider the redundancy among features, which should be minimized to reduce computation cost and improve performance. Therefore, the objective of feature selection should be to select the most significant (relevant) as well as non-redundant features.
In this article we propose a novel graph-theoretic model for selecting the most relevant and non-redundant features from the input dataset. In the proposed method, a complete graph is first constructed in which the nodes represent the features and the edge weights are defined by the dissimilarity among the features. We then extract the densest subgraph from this feature-dissimilarity graph. The features contained in the extracted subgraph constitute the final selected relevant and non-redundant features. For identifying the densest subgraph, we propose a multiobjective binary particle swarm optimization (MO-bPSO) based algorithm. The particles are encoded as binary strings representing the feature subset. Two objective functions, average node weight and average edge weight, are optimized simultaneously. Unlike single-objective optimization, which yields a single best solution, multiobjective optimization (MOO) [18], [19] algorithms produce a set of non-dominated solutions, none of which can be improved on any one objective without degrading another. Here the multiobjective optimization problem is tackled by applying bPSO [20], in which fitness comparison takes Pareto dominance [21] into account during the movement of the particles in the search space. The non-dominated solutions are stored in an archive to approximate the Pareto front [22].
In this article, the feature selection technique is applied to identify relevant and non-redundant gene markers from microarray gene expression data [23]. Microarray is a rapidly growing technology that makes it possible to assay the expression levels of many genes in a single experiment. A microarray gene expression data set contains the expression levels of thousands of genes over a number of tissue samples. Hence it is a sample-versus-gene matrix, which also contains the class label of each sample. Although it has recently gained popularity for finding disease-related genes or markers, its high dimensionality and noise pose a challenging problem. Moreover, some genes may not be very relevant to the corresponding class labels and hence are not helpful for phenotype classification. In binary classification [24], the samples of the microarray dataset are classified as normal (benign) or cancer (malignant) tissue. When the samples represent three or more subtypes of cancer, the task [25] is called multiclass cancer classification.
In practice, in order to find the most relevant genes, most existing feature selection techniques [26], [27] produce a redundant set of genes. This has encouraged us to apply our proposed graph-based multiobjective binary particle swarm optimization technique, which selects a set of genes that is not only relevant but also non-redundant. The performance of the proposed technique is evaluated on different real-life microarray gene expression data sets and compared with that of various existing gene selection techniques.

Other Related Methods
Many feature selection techniques in the existing literature claim their own superiority. In this article, we consider some of them, namely T-test, Ranksum test, SFS, SBE, CFS, mRMR (MIQ), Graph-based feature selection [36] and Cluster-based feature selection [37]. Moreover, as our method is a multiobjective one, its single-objective versions are also taken into account. Sequential Forward Search (SFS) [28] adds features sequentially according to the adopted criterion. In contrast, Sequential Backward Elimination (SBE) [29] discards features on the basis of the adopted criterion. Additionally, Correlation-based Feature Selection (CFS) [30] is used for performance analysis. Here, the ratio of the SNR value to the mean correlation value is taken as the criterion for a feature's importance. The number of features selected by our proposed approach is given as input to the other comparative algorithms, i.e., T-test, Ranksum test, SFS, SBE, CFS and mRMR (MIQ). For the T-test [31] and Ranksum test [32], [26], the p-values of the features are first sorted and the required number of features is taken for validation. In the mRMR feature selection technique [33], [34], the relevance of a gene is calculated as the mutual information [35] between the feature and the class labels, and redundancy is computed as the mutual information among the features. The basic concept of mRMR is to select genes that are relevant and, at the same time, maximally dissimilar to each other. Let S denote the subset of genes that we are seeking. The average minimum redundancy is given as Equation 1:

$$\min W, \quad W = \frac{1}{|S|^2} \sum_{i,j \in S} I(i,j) \qquad (1)$$

where $I(i,j)$ is the mutual information between the i-th gene and the j-th gene and $|S|$ is the number of genes in S. The discriminant power of a gene is measured by the mutual information $I(h, g_i)$, that is, the mutual information between the targeted classes $h = \{h_1, h_2, \ldots, h_k\}$ and the gene expression $g_i$ is the measure of relevance of that gene. Thus the maximum relevance condition, which maximizes the average relevance of all genes in S, is Equation 2:

$$\max V, \quad V = \frac{1}{|S|} \sum_{i \in S} I(h, g_i) \qquad (2)$$

Therefore, the redundancy has to be minimized and the relevance has to be maximized. As the two conditions are equally important, the two simplest combined criteria are $\max(V - W)$ and $\max(V / W)$. Here only the mRMR for discrete variables in the form of the mRMR mutual information quotient (mRMR MIQ) is described. The mRMR with MIQ scheme, in which the next gene is chosen among those not yet in S, is formulated as per Equation 3:

$$\max_{i \notin S} \left\{ \frac{I(h, g_i)}{\frac{1}{|S|} \sum_{j \in S} I(i,j)} \right\} \qquad (3)$$
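As an illustration of the mRMR MIQ criterion, the following is a minimal sketch of a greedy selection loop for discrete expression values. The function names, the greedy seeding with the most relevant gene, and the small constant added to the denominator are assumptions made for this sketch and are not taken from [33], [34].

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in pab.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (pa[x] * pb[y]))
    return mi


def mrmr_miq(genes, labels, k):
    """Greedily pick k gene indices by the relevance / redundancy quotient.

    genes  : list of discrete expression vectors, one per gene
    labels : class label of each sample
    """
    relevance = [mutual_information(g, labels) for g in genes]
    # Assumption: start from the single most relevant gene.
    selected = [max(range(len(genes)), key=lambda i: relevance[i])]
    while len(selected) < k:
        best, best_score = None, -math.inf
        for i in range(len(genes)):
            if i in selected:
                continue
            redundancy = sum(
                mutual_information(genes[i], genes[j]) for j in selected
            ) / len(selected)
            # Assumption: a tiny constant guards against division by zero.
            score = relevance[i] / (redundancy + 1e-12)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

Continuous expression values would have to be discretized before this sketch could be applied.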
Next, in the Graph-based feature selection method [36], a graph $G = (V, E)$ is constructed with node set V, edge set $E \subseteq V \times V$ and edge weight matrix W whose elements lie in the interval $[0, 1]$. Each vertex represents a feature and the edge between two features represents their pairwise relationship. The weight on an edge reflects the degree of relevance between the two features. Thus the graph G is formed with the corresponding edge-weight (weighted relevance) matrix. The algorithm consists of: a) computing the relevance matrix $W = (w_{ij})_{n \times n}$ based on the mutual information between feature vectors, b) dominant-set clustering to cluster the feature vectors and c) selecting the optimal feature set from each dominant set using the multidimensional interaction information (MII) criterion. In the Cluster-based feature selection method [37], the feature set is partitioned into clusters of similar features, where the number of clusters and the cardinality of the subset of selected features are automatically estimated from the data. However, this method relies on some user-defined parameters.

Multiobjective Optimization (MOO) and Problem Description
In this section first the basic concepts of multiobjective optimization are described. Subsequently, the formulation of gene selection problem as multiobjective optimization problem is described.
MOO concepts. In many real-world problems, there are different aspects of a solution that are partially or wholly in conflict. Treating such problems as single-objective optimization therefore produces unreliable results. In a multiobjective optimization problem, the objectives estimate those different, conflicting aspects of the solutions. Multiobjective optimization can formally be stated as follows [18], [19]. Find the vector $\bar{x}^* = [x_1^*, x_2^*, \ldots, x_n^*]^T$ of decision variables which satisfies m inequality constraints:

$$g_i(\bar{x}) \geq 0, \quad i = 1, 2, \ldots, m \qquad (4)$$

and p equality constraints:

$$h_i(\bar{x}) = 0, \quad i = 1, 2, \ldots, p \qquad (5)$$

and optimizes the vector function:

$$\bar{f}(\bar{x}) = [f_1(\bar{x}), f_2(\bar{x}), \ldots, f_k(\bar{x})]^T \qquad (6)$$

The constraints in Equations 4 and 5 define the feasible region, which contains all the allowable solutions. Any solution outside this region is inadmissible since it violates one or more constraints.
The vector $\bar{x}^*$ denotes an optimal solution in the feasible region $\mathcal{F}$. The essence of multiobjective optimization can be captured through Pareto optimality [21]. The Pareto optimal set comprises all those solutions for which it is impossible to improve any objective without simultaneously worsening some other objective. A vector of decision variables $\bar{x}^* \in \mathcal{F}$ is said to be Pareto optimal if there is no $\bar{x} \in \mathcal{F}$ such that $f_i(\bar{x}) \leq f_i(\bar{x}^*)$ for every objective i and $f_j(\bar{x}) < f_j(\bar{x}^*)$ for at least one j, when the problem is a minimization one. Here, $\mathcal{F}$ denotes the feasible region of the problem (i.e., where the constraints are satisfied). The Pareto optimal set [22] generally contains more than one solution because there exist different 'trade-off' solutions to the problem with respect to the different objectives. The solutions contained in the Pareto optimal set are called non-dominated solutions. The plot of the objective functions whose non-dominated vectors are in the Pareto optimal set is called the Pareto front [22]. Specifically, MOO is the process of generating the whole Pareto front or an approximation to it.
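The notion of non-dominated solutions can be illustrated with a short sketch for the minimization case. The function names and the example objective vectors are hypothetical and serve only to show the dominance test.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the objective vectors that no other vector in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values for five candidate solutions.
candidates = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
front = non_dominated(candidates)
```

In this example, (3, 3) and (4, 4) are both dominated by (2, 2), so the front consists of the three remaining trade-off points.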
Problem description. In this article the target is to find non-redundant but relevant features from a data matrix. In other words, the resulting features should be not only uncorrelated but also significant. The problem should therefore be defined in such a manner that correlated and irrelevant features are not selected. In our proposed scheme, the problem is equivalent to finding the densest subgraph of a weighted undirected graph. The data matrix can be viewed as a two-dimensional matrix whose rows indicate instances and whose columns indicate attributes or features. One additional column holds the class labels of the instances. Possible similarity/dissimilarity measures include the correlation coefficient [38], Euclidean distance [39] and the maximal information compression index [3]. Using one of these dissimilarity (negative similarity) measures, a symmetric matrix is generated, which is termed the dissimilarity matrix. Let the data set have n features, $F = \{f_1, f_2, f_3, \ldots, f_n\}$. Calculating the pairwise negative similarity between the features of F yields an $(n \times n)$ symmetric dissimilarity matrix Sm. From this dissimilarity matrix Sm a weighted complete graph G can be formed. Since a node represents a feature, the vertex set of the graph G is $V = \{f_1, f_2, f_3, \ldots, f_n\}$, i.e., the graph contains n nodes in total. The value at row i and column j of Sm represents the weight of the edge between nodes $f_i$ and $f_j$. As each feature has a dissimilarity value with every other feature (present in the symmetric dissimilarity matrix Sm), the graph G is a complete graph. Fig. 1 demonstrates the conversion from data matrix to feature-dissimilarity graph. First the dissimilarity matrix (for edge weights) is calculated from the data matrix using the correlation coefficient between each pair of genes.
The correlation coefficient s between two random variables x and y can be defined as [38]:

$$s(x,y) = \frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} \qquad (7)$$

where var() denotes the variance of a variable and cov(x,y) the covariance between the variables. If x and y are completely correlated, i.e., an exact linear dependency exists, then $s(x,y)$ is 1 or -1, and if they are totally uncorrelated then $s(x,y)$ is 0. Hence $(1 - |s(x,y)|)$ represents the dissimilarity between x and y. Subsequently, a graph G is formed from the dissimilarity matrix. Let the samples belong to either class1 (denoted by c1) or class2 (denoted by c2). Then the signal-to-noise ratio (SNR) value (node weight) of each feature $f_i$ is calculated using the mean and standard deviation (s.d.) of the class1 samples (c1) and class2 samples (c2) and is defined as [40]:

$$SNR(f_i) = \frac{\mathrm{mean}(c1) - \mathrm{mean}(c2)}{\mathrm{s.d.}(c1) + \mathrm{s.d.}(c2)} \qquad (8)$$

The SNR is the ratio of the difference between the two class means to the sum of the standard deviations of the two classes. In other words, it relates the separation of the class averages to the dispersion of the data points around those averages. A low SNR indicates that the feature does not take very different values in the two classes, whereas a high SNR indicates that the feature values in the two classes are well separated relative to their spread. A feature with very low SNR may be considered insignificant with respect to the class labels, and a high SNR value means the feature is highly differentially expressed. Therefore the SNR value is treated as feature relevance. For the graph G, a larger edge weight means that the features connected by that edge are more dissimilar, and a larger node weight means that the feature is more relevant. Thus finding the densest subgraph g of graph G is equivalent to finding the non-redundant and most relevant feature set, as the features (nodes) enclosed by the subgraph g will have maximum average edge weight (dissimilarity) and maximum average node weight (SNR). Therefore the problem can be defined as: find the densest subgraph g of G, i.e., the one with maximum average node weight and maximum average edge weight.
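The construction of the feature-dissimilarity graph can be sketched as follows, using only the two quantities defined above: one minus the absolute correlation as edge weight and the SNR as node weight. The function names are hypothetical, and two choices are assumptions of this sketch rather than statements of the original method: the population standard deviation is used, and the absolute value of the SNR is taken as node weight.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    # Assumption: population standard deviation (divide by n).
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson(x, y):
    """Correlation coefficient: cov(x, y) / sqrt(var(x) * var(y))."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def snr(values, labels):
    """Signal-to-noise ratio of one feature for a two-class problem."""
    c1_label, c2_label = sorted(set(labels))
    c1 = [v for v, l in zip(values, labels) if l == c1_label]
    c2 = [v for v, l in zip(values, labels) if l == c2_label]
    return (_mean(c1) - _mean(c2)) / (_std(c1) + _std(c2))


def build_graph(columns, labels):
    """Return (node_weights, edge_weight_matrix) for a list of feature columns.

    Constant features would cause a division by zero and are assumed
    to have been removed beforehand.
    """
    n = len(columns)
    node_w = [abs(snr(col, labels)) for col in columns]  # assumption: |SNR|
    edge_w = [
        [0.0 if i == j else 1.0 - abs(pearson(columns[i], columns[j]))
         for j in range(n)]
        for i in range(n)
    ]
    return node_w, edge_w
```

Because the correlation coefficient is symmetric, the resulting edge weight matrix is symmetric as well, matching the dissimilarity matrix Sm described above.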

Proposed Multiobjective Binary PSO-based Approach
Particle Swarm Optimization (PSO) [41], [42] is a well-known swarm-based optimization technique which optimizes a problem by iteratively trying to improve candidate solutions with respect to a given fitness measure. In PSO, a set of particles (candidate solutions) traverses the search space with velocities based on their own experience and the experience of their neighbors. During each traversal, the velocity and thereby the position of each particle are updated. This process is repeated until some stopping criterion is met. Unlike classical optimization techniques, which tend to converge prematurely to a locally optimal solution, PSO is known for its global search capability.
In this article, the input data matrix is first transformed into a weighted undirected complete feature graph, in which the nodes (with relevance as node weight) represent the genes and the edges are weighted according to the dissimilarity between genes. In each iteration, a reduced subgraph is computed for which the average relevance and the average dissimilarity among its genes are maximized. Thus the densest subgraph, having maximum average weight (node + edge), is identified by applying binary PSO [20]. The bPSO is applied in a multiobjective setting and, with the help of non-dominated sorting [43] and the crowding distance measure [18], a small set of non-redundant informative genes is identified.
Particle encoding. Here the population is called a swarm and it consists of m candidate solutions, or particles. Each particle has n cells, where n is the total number of genes in the data matrix, i.e., each cell signifies one gene of the data matrix. A cell can take the value 0 or 1. If the i-th cell of a particle has value 1 then the i-th gene is selected from the dataset; otherwise it is ignored.
Initialization. Initially each cell of a particle is set to 0 or 1 at random. After the initial particles are chosen, their fitness values are calculated. The velocity of each cell of a particle is then initialized to zero. For each dataset, the algorithm is executed for 100 iterations. The swarm size is set to 25, and the weighting factors c1 and c2, which are the cognitive and social parameters respectively, are both set to 2.
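The encoding and initialization described above can be sketched in a few lines. The function names and the optional seed argument are hypothetical; the swarm size of 25 and the zero initial velocities follow the text.

```python
import random


def init_swarm(n_genes, swarm_size=25, seed=None):
    """Create random binary particles and all-zero velocities."""
    rng = random.Random(seed)
    positions = [
        [rng.randint(0, 1) for _ in range(n_genes)] for _ in range(swarm_size)
    ]
    velocities = [[0.0] * n_genes for _ in range(swarm_size)]
    return positions, velocities


def decode(particle):
    """Indices of the genes selected by a particle (cells with value 1)."""
    return [i for i, bit in enumerate(particle) if bit == 1]
```

For example, `decode([1, 0, 1, 1])` selects the genes at indices 0, 2 and 3.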
Fitness computation. Here two objectives, average dissimilarity (negative correlation) and average signal-to-noise value, are maximized. Each particle forms a reduced subgraph for which the average negative correlation (avg_ncorr) and the average SNR value (avg_snr) are computed. As the bPSO algorithm is designed as a minimization problem, the fitness values are computed as (1 - avg_ncorr) and (1 - avg_snr). The cells are then iterated as in the usual PSO evaluation [44]. To calculate the fitness values of a particle, those genes are selected whose cells have value 1. These selected genes form a subgraph g[v, e, vw, ew], where v is the set of nodes, e is the set of edges, vw is the vector of node weights, given by the SNR value of each node, and ew is the edge weight matrix, calculated as (1 - correlation) between each pair of nodes. Thereafter, avg_ncorr (Equation 9) and avg_snr (Equation 10) are defined as

$$avg\_ncorr = \frac{\sum_{(i,j) \in e} ew_{ij}}{|e|} \qquad (9)$$

$$avg\_snr = \frac{\sum_{i \in v} vw_i}{|v|} \qquad (10)$$

Updating position and velocity. As each cell represents one gene, the two terms cell and gene are used interchangeably here. The position of a gene within a particle is either 0 or 1, and the velocity of each gene is initialized to zero. Using the information obtained in the previous step, the position and velocity of each particle are updated. Each particle keeps track of the best position it has achieved so far, and this best position is called pbest or local best. From the multiobjective perspective, pbest is the position whose fitness dominates the other fitness values acquired by that particle in its history; if there is no such fitness, a random choice is made between the current and previous positions of that particle. The best position among all the particles is called global best or gbest, and it is chosen randomly from the archive of non-dominated candidate solutions.
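As an illustration of the fitness computation, the sketch below evaluates the two minimization objectives, one minus the average edge weight and one minus the average node weight, for a binary particle. The function name and the handling of particles that select fewer than two genes are assumptions of this sketch.

```python
def fitness(particle, node_w, edge_w):
    """Return (1 - avg_ncorr, 1 - avg_snr) for the subgraph a particle selects.

    node_w : list of node weights (SNR values)
    edge_w : symmetric matrix of edge weights (dissimilarities)
    """
    selected = [i for i, bit in enumerate(particle) if bit == 1]
    if len(selected) < 2:
        # Assumption: a subgraph without edges gets the worst possible fitness.
        return (float("inf"), float("inf"))
    pairs = [
        (i, j) for a, i in enumerate(selected) for j in selected[a + 1:]
    ]
    avg_ncorr = sum(edge_w[i][j] for i, j in pairs) / len(pairs)
    avg_snr = sum(node_w[i] for i in selected) / len(selected)
    return (1.0 - avg_ncorr, 1.0 - avg_snr)


# Hypothetical three-gene graph used only to exercise the function.
example_nodes = [0.9, 0.5, 0.1]
example_edges = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.6], [0.8, 0.6, 0.0]]
example_fitness = fitness([1, 0, 1], example_nodes, example_edges)
```

Each unordered pair of selected genes is counted once, so the average is taken over the edges of the induced complete subgraph.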
Whenever a particle moves to a new position with a velocity, its position and velocity are updated according to Equations 11 and 12 given below [20]:

$$v_{ij}(t+1) = w \, v_{ij}(t) + c_1 r_1 \left(pbest_{ij}(t) - x_{ij}(t)\right) + c_2 r_2 \left(gbest_{j}(t) - x_{ij}(t)\right) \qquad (11)$$

$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1) \qquad (12)$$

Here t is the time stamp, and the i-th particle and j-th position are considered. In Equation 11 the new velocity $v_{ij}(t+1)$ is obtained from the velocity at the previous time $v_{ij}(t)$, pbest and gbest. The new position $x_{ij}(t+1)$ is then obtained by adding the new velocity to the current position $x_{ij}(t)$, as shown in Equation 12. $r_1$ and $r_2$ are two random values in the range 0 to 1, and w is the inertia weight, which is computed according to Equation 13.

Updating archive. The repository in which the non-dominated solutions found so far are stored is called the archive. First the archive A is initialized with the non-dominated solutions of the population $P_i$. Next, to update the archive, the next-generation population $P_{i+1}$ is merged with the archive $A_i$, i.e., $A_{i+1} = A_i + P_{i+1}$, and the non-dominated solutions are then extracted by applying non-dominated sorting and crowding distance sorting to the combined archive $A_{i+1}$. These two sorting steps are applied to the combined population to obtain better diversity along the Pareto optimal front.
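The two update steps described above can be sketched as follows. Several details are assumptions of this sketch: the standard inertia-weighted velocity rule, the binarization of the updated position against a threshold (suggested by the threshold input thr = 0.9 of Algorithm 1), the archive capacity `max_size`, and all function names. The inertia weight w is passed in as a plain number.

```python
import random


def update_particle(x, v, pbest, gbest, w, c1=2.0, c2=2.0, thr=0.9, rng=random):
    """One velocity and position update for a binary particle."""
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vj = w * v[j] + c1 * r1 * (pbest[j] - x[j]) + c2 * r2 * (gbest[j] - x[j])
        new_v.append(vj)
        # Assumption: the real-valued position is mapped back to {0, 1}
        # by comparing it with the threshold thr.
        new_x.append(1 if x[j] + vj >= thr else 0)
    return new_x, new_v


def dominates(a, b):
    """True if fitness a Pareto-dominates fitness b (minimization)."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def crowding_distance(fits):
    """Crowding distance of each fitness vector within one front."""
    n = len(fits)
    dist = [0.0] * n
    for m in range(len(fits[0])):
        order = sorted(range(n), key=lambda i: fits[i][m])
        lo, hi = fits[order[0]][m], fits[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = fits[order[k + 1]][m] - fits[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def update_archive(archive, population, max_size):
    """Merge archive and population, keep non-dominated (position, fitness) pairs."""
    merged = archive + population
    front = [p for p in merged if not any(dominates(q[1], p[1]) for q in merged)]
    if len(front) > max_size:
        d = crowding_distance([f for _, f in front])
        keep = sorted(range(len(front)), key=lambda i: d[i], reverse=True)[:max_size]
        front = [front[i] for i in sorted(keep)]
    return front
```

When the merged front exceeds the capacity, the least crowded solutions are kept, which favors an even spread along the approximated Pareto front.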
Proposed MObPSO algorithm. The proposed multiobjective binary particle swarm optimization (MObPSO) is designed to maximize the dissimilarity (negative correlation) and the SNR, which are represented as edge weight and node weight, respectively. The adopted graph-based MObPSO technique is given in Table 1 (Algorithm 1). The population is initialized with randomly selected features from the data matrix, and the fitness values of the population are calculated using Equation 9 and Equation 10. The archive A is initialized with the non-dominated solutions of the initial population. Velocity and position are updated using Equations 11 and 12, respectively. The local best P is updated by comparing the current and previous fitness of a particle, and the global best G is updated by randomly picking a particle from the archive. After the position and velocity are updated, the next-generation solutions are added to the archive, and non-dominated sorting [43] and crowding distance sorting [18] are then used to revise the extended archive. These steps are repeated for a fixed number of iterations.
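The rules for the local and global best used in the algorithm can be sketched as below. The function names are hypothetical, and the equal-probability coin flip for the case in which neither fitness dominates the other is an assumption; the text only states that the choice is random.

```python
import random


def dominates(a, b):
    """True if fitness a Pareto-dominates fitness b (minimization)."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def update_pbest(pbest, pbest_fit, x, x_fit, rng=random):
    """Return the (position, fitness) pair to keep as the particle's local best."""
    if dominates(x_fit, pbest_fit):
        return x, x_fit
    if dominates(pbest_fit, x_fit):
        return pbest, pbest_fit
    # Neither dominates: random choice (assumed to be 50/50).
    return (x, x_fit) if rng.random() < 0.5 else (pbest, pbest_fit)


def pick_gbest(archive, rng=random):
    """Global best: a randomly chosen member of the non-dominated archive."""
    return rng.choice(archive)
```

Drawing gbest at random from the archive means that different particles may be guided toward different regions of the approximated Pareto front.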

Results and Discussion
In this section, we first describe the real-life datasets and their preprocessing procedure, thereafter portray the performance metrics followed by the results of different algorithms.

Datasets and Preprocessing
In this article three real-life gene expression datasets are used, which are publicly available from the following website: www.biolab.si/supp/bi-cancer/projections/info/. These two-class datasets are available in matrix format, with genes as columns and samples as rows, and are preprocessed by computing the SNR (Equation 8) for each gene (column). The genes (columns) of the data matrix are sorted in decreasing order of |SNR|, and the top 100 genes are retained. After that the data matrix is normalized so that each gene expression value lies in the range from 0 to 1.
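The preprocessing steps can be sketched as follows, assuming the SNR of every gene has already been computed. The function names are hypothetical, and min-max scaling per gene is an assumption about how the values are brought into the range from 0 to 1.

```python
def top_k_by_abs_score(scores, k):
    """Indices of the k genes with the largest absolute score, best first."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return order[:k]


def min_max_normalize(column):
    """Rescale one gene's expression values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # constant gene: map every value to 0
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical SNR values for four genes; keep the two strongest.
kept = top_k_by_abs_score([0.1, -2.0, 1.5, 0.3], 2)
scaled = min_max_normalize([2, 4, 6])
```

With the datasets described above, k would be 100 and the normalization would be applied to each retained gene column.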

Score Analysis
Performance is evaluated using sensitivity, specificity, accuracy, F-score, AUC and average correlation. The entire dataset is divided into two sets: a training set and a test set. The proposed approach is applied to the training data, which yields a set of non-dominated candidate solutions. For the final selection of marker genes, we employ the BMI-score [45], which considers the discriminative power of each gene by incorporating the true positive rate from logistic regression. In mathematical terms, let us assume a data set D consisting of two groups, 'control (ctr)' and 'experiment (exp)'. BMI assigns a score to a feature x based on the following quantities: $\lambda$ is a scaling factor and $TP^2$ is the product of the true positive (TP) rates determined for each group using logistic regression. $CV_{ctr}$ and $CV$ denote the coefficient of variation of the feature x in the 'control' group and in both groups, respectively. Also, $\Delta = \bar{x} / \bar{x}_{ctr}$, where $\bar{x}_{ctr}$ and $\bar{x}$ denote the mean value of x in the 'control' group and in both groups, respectively. The candidate solution with the maximum BMI-score is considered the most informative solution. The performance of the proposed algorithm is compared with that of its single-objective versions and of two statistical tests, the T-test and the Wilcoxon Ranksum test. The datasets are randomly divided into a training set and a test set; this process is repeated 10 times, giving 10 training sets and their corresponding 10 test sets. Each algorithm is executed on each training set and evaluated on the corresponding test set. Thus for each algorithm we obtain 10 sensitivity, 10 specificity, 10 accuracy and 10 F-score values. The average of these 10 values for each performance metric, together with the standard deviation, is computed and tabulated. For the prostate data, Table 2 shows that for each score metric the proposed method (0.8962, 0.9, 0.898, 0.9002, 0.964) outperforms the single-objective versions, T-test, Ranksum test, SFS and SBE.
Regarding sensitivity, our method is better than the Graph-based and Cluster-based methods but differs only slightly from CFS and mRMR (MIQ). With respect to specificity, the performance is average. In terms of accuracy and F-score, the proposed method is better than mRMR (MIQ) and the Cluster-based method but not as good as CFS and the Graph-based method. The AUC produced by the proposed method is 0.964, which is better than that of all the other methods. The proposed method produces an average correlation of 0.4714, which is lower than that of the other methods except the T-test. This indicates that non-correlated genes are identified. As population-based optimization techniques take more time to execute, the execution time of our method is 81.176 seconds, which is not much higher than that of the other comparative methods.

Table 1. Algorithm 1: MObPSO.
Input: data matrix dt, C = number of genes, N = number of particles, threshold thr = 0.9, graph G = [V, E, VW, EW] designed from dissimilarity matrix Sm.
Output: archive A
1: $[x_n, v_n, G_n, P_n]_{n=1}^{N}$ := initialize(dt)   ▷ random locations and velocities
2: $g_N[V1, E1, VW1, EW1] = Sm(V1_n^N \times V1_n^N)$   ▷ subgraphs $g_N$ for N particles are formed from dissimilarity matrix Sm
For the DLBCL data, Table 2 shows that with respect to average sensitivity, F-score and AUC our proposed technique (0.9111, 0.8428 and 0.9644) scores better than all the other methods. With respect to specificity, the proposed method scores better than all methods except CFS and the Graph-based method. The accuracy produced by our method is also better than that of the others except the Graph-based method. Although CFS and the Graph-based method yield less correlated genes, their sensitivity is very poor. The execution time of the proposed method is higher than that of the others, but the difference is not large.
Moreover, for the Child-ALL data, it is clear from Table 2 that the proposed scheme is superior in terms of sensitivity and accuracy. With respect to average specificity, however, its score is 0.8233, which is not better than that of the single-objective (SNR) version, Ranksum test, SFS, CFS, mRMR (MIQ) and the Graph-based method. With respect to F-score and AUC, the proposed method produces a better score than the others in most cases. The average correlation of the proposed method is 0.7324, which is lower than that of the others except CFS. Hence the proposed technique yields better values on most metrics, which supports its effectiveness.
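The score metrics reported in this section, and their aggregation over repeated train/test splits, can be sketched as follows. The function names and the convention of returning 0 when a denominator is zero are assumptions of this sketch; AUC and average correlation are not covered.

```python
import statistics


def _ratio(num, den):
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    return num / den if den else 0.0


def classification_scores(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and F-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "fscore": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def summarize(runs):
    """Mean and sample standard deviation of each metric over several runs."""
    return {
        name: (
            statistics.mean(r[name] for r in runs),
            statistics.stdev(r[name] for r in runs),
        )
        for name in runs[0]
    }
```

Here `summarize` would receive one score dictionary per train/test split, for instance the 10 splits described above.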

Cross-Validation Performance
The performance analysis is extended using 10-fold cross-validation. All the algorithms are executed on the full sample-versus-gene dataset and the output genes are validated by 10-fold cross-validation with a Support Vector Machine (SVM). The cross-validation scores of the different algorithms are reported in Table 3. It is clear from the table that for the prostate dataset, with respect to sensitivity, specificity, accuracy and F-score, the proposed method outperforms the other methods except CFS. With respect to AUC, our method is better than CFS, mRMR (MIQ), Graph-based and Cluster-based. The average correlation for our method is much lower than for the other methods, i.e., the proposed method yields more non-redundant features than the other comparative methods. However, the table also shows that it takes more time to execute than the others. For the DLBCL dataset, with respect to sensitivity, accuracy, F-score and AUC, the proposed method performs best among all the methods. With respect to specificity, the proposed method performs slightly worse than the single-objective (SNR) version, T-test, Ranksum test, CFS and the Graph-based method. The average correlation produced by the proposed technique is lower than that of the other methods except mRMR (MIQ). It can also be noticed from the table that the execution time of the proposed method is 3.6832 seconds, but the difference from the other methods is small. For the Child-ALL dataset, with respect to accuracy, F-score and AUC the proposed method performs better than the other comparative methods. With respect to sensitivity, the score is average and lower than that of other methods. The specificity scored by the proposed technique is 0.719, which is considerably better than that of the other methods except the Graph-based method. The proposed method produces an average correlation of 0.6764, which is lower than that of the other methods except CFS.
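The k-fold protocol used for this validation can be sketched as below. The classifier is passed in as a hypothetical `fit_predict` callable standing in for the SVM, which is not implemented here; the function names, the shuffling seed and the use of plain accuracy as the returned score are assumptions of this sketch.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split shuffled sample indices into k disjoint folds (needs n_samples >= k)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Mean test accuracy over k folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels for X_test;
    in the setting described above it would wrap an SVM.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each sample appears in exactly one test fold, so every sample is used once for testing and k - 1 times for training.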

Gene Marker Analysis
After executing the proposed technique 10 times we obtained 10 feature sets. We then took as markers those genes which appear at least 5 times in the 10 feature sets. Table 4 lists the ID, symbol and description of the gene markers for the three datasets. Many of these gene markers have already been validated in the existing literature as being associated with the respective cancer classes. For example, for the prostate cancer data the genes 32243_g_at (CRYAB) and 33904_at (CLDN3) have been reported in [46], and 37639_at (HPN) and 41504_s_at (MAF) have been reported in [47]. The genes X02152_at (LDHA) and M14328_s_at (ENO1) of DLBCL have been reported in [48]. In [49], the genes 41117_s_at (SLC9A3R2) and 33069_f_at (UGT2B15) of the Child-ALL data are reported. Fig. 2, Fig. 3 and Fig. 4 show the heatmaps of the feature sets identified by our proposed technique for the prostate, DLBCL and Child-ALL datasets, respectively. The heatmaps show the gene-versus-sample matrix. The cells of a heatmap represent the expression levels of the genes in terms of colors. Red shades represent high expression levels, green shades represent low expression levels and colors towards black represent medium expression values. It is evident from Figs. 2, 3 and 4 that the gene markers for each tumor subtype have either high expression values (up-regulated) or low expression values (down-regulated) over all the samples of the respective tumor class. From Fig. 2, it is clear that the genes 37639_at (HPN), 32243_g_at (CRYAB), 33904_at (CLDN3) and 41504_s_at (MAF) are up-regulated (high expression value in normal tissue and low expression in tumor tissue) and the genes 40435_at (SLC25A6) and 33614_at (RPL18A) are down-regulated (vice versa). It can be seen from Fig. 3 that the genes X02152_at (LDHA), M14328_s_at (ENO1) and U59309_at (FH) are all down-regulated with respect to DLBCL versus FL.
Finally, for the Child-ALL data all genes are down-regulated, as Fig. 4 shows high expression values in the before-therapy class and low expression values in the after-therapy class.
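The frequency rule used in this section to pick markers, keeping genes that appear in at least 5 of the 10 feature sets, can be sketched as follows. The function name and the gene names in the example are hypothetical.

```python
from collections import Counter


def consensus_markers(feature_sets, min_count=5):
    """Genes that occur in at least min_count of the given feature sets."""
    counts = Counter(gene for fs in feature_sets for gene in set(fs))
    return sorted(gene for gene, c in counts.items() if c >= min_count)


# Hypothetical output of 10 runs: geneA in all, geneB in 5, geneC in 2.
runs = (
    [["geneA", "geneB"]] * 5
    + [["geneA", "geneC"]] * 2
    + [["geneA"]] * 3
)
markers = consensus_markers(runs, min_count=5)
```

Each feature set is converted to a set before counting, so a gene is counted at most once per run.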

Conclusion
In this study, the problem of supervised feature selection is posed as the identification of relevant and non-redundant gene markers from microarray gene expression data. The microarray data matrix is converted into a feature-dissimilarity graph in which nodes stand for features. The nodes and edges are weighted according to feature relevance and the dissimilarity between features, respectively. The densest subgraph, having maximum average node and edge weight, is then identified, which means that features with high relevance and low redundancy are selected as output. For identifying the subgraph with non-redundant and relevant feature nodes, a graph-based multiobjective bPSO has been proposed. Here, bPSO is modeled in a multiobjective framework based on non-dominated sorting and crowding distance sorting. Three real-life datasets have been used for performance analysis. A comparative study between the proposed technique and its single-objective versions, T-test and Ranksum test has been performed. Moreover, gene marker analysis for each dataset is also presented. As future work, we plan to incorporate a supervised wrapper-based approach to calculate the objective functions using fuzzy association rules.