A multiobjective multi-view cluster ensemble technique: Application in patient subclassification

Recent high-throughput omics technologies have been used to assemble large biomedical omics datasets. Clustering of single-omics data has proven invaluable in biomedical research. For the task of patient sub-classification, all the available omics data should be utilized jointly rather than individually. Clustering of multi-omics datasets has the potential to reveal deep insights. Here, we propose a late-integration-based multiobjective multi-view clustering algorithm which uses a special perturbation operator. Initially, a large number of diverse clustering solutions (called base partitionings) are generated for each omics dataset using four clustering algorithms, viz., k-means, complete linkage, spectral clustering and fast search clustering. These base partitionings of the multi-omics datasets are suitably combined using a special perturbation operator, which uses an ensemble technique to generate new solutions from the base partitionings. The optimal combination of multiple partitioning solutions across different views is determined by optimizing two kinds of objective functions: conn-XB, which checks the quality of the partitionings for the different views, and the agreement index, which checks the agreement between the views. The search capability of a multiobjective simulated annealing approach, namely AMOSA, is used for this purpose. Lastly, the non-dominated solutions of the different views are combined based on similarity to generate a single set of non-dominated solutions. The proposed algorithm is evaluated on 13 multi-view cancer datasets. An elaborate comparative study with several baseline methods and five state-of-the-art models demonstrates the effectiveness of the algorithm.


Introduction
In the fields of biology and medicine, classification has a wide range of applications [1]. With advances in microarray technology, it has become possible to generate thousands of gene expression data points for cancer-tissue datasets, making it possible to differentiate accurately between diagnostic and treatment groups. Available methods for patient stratification depend on gene expression data, and patients are grouped based on their expression profiles [2,3]. In addition to gene expression data, other data types, such as miRNA (microRNA) expression and DNA methylation, can be explored to improve the accuracy of patient classification models [4]. Each of these data types is termed an 'omic' (genomics, transcriptomics, methylomics, respectively). The objective here is to identify groups with similar molecular characteristics.
Integrative clustering of several omics data for the same set of samples can disclose more precise structures that are not exposed by examining a single omic data. By exploiting the information present in multiple omics, clustering techniques can obtain better performance compared to a single omic. Some of the advantages of clustering based on multiple omics are given as follows: (i) multi-omics clustering reduces the effect of noise in the data, (ii) each omic can reveal structures that are not present in other omics, (iii) different omics can unfold different cellular aspects.
A major difficulty of cluster analysis is the selection of the best clustering algorithm for a given dataset [5]. Many omics datasets possess heterogeneous structures, whereas most existing clustering algorithms search for homogeneous structures in a dataset. The problem of algorithm selection for clustering datasets with heterogeneous structures can be addressed by the combined use of cluster ensemble and multi-objective clustering techniques [6].
Recently, Li et al. [7] proposed a novel method combining multi-objective optimization (MOO) with integrated decision making (IDM) to address the problem of combined heat and power economic emission dispatch. The authors used a two-stage approach. In the first stage, a θ-dominance-based evolutionary algorithm is used to generate the Pareto-optimal front of the model. In the second stage, the obtained Pareto-optimal solutions are clustered using fuzzy c-means, and the best compromise solutions are identified using grey relation projection.
In this paper, the clustering problem is formulated as an optimization problem where different cluster quality measures are used as the objective functions. We have introduced a multi-objective-based multi-view cluster ensemble algorithm (enAMOSA, in short), which simultaneously uses concepts from both cluster ensemble and multi-objective-based multi-view clustering algorithms. The key idea is to minimize the problems associated with cluster analysis, as well as to overcome the limitations of multi-objective-based multi-view clustering and cluster ensemble methods when they are used separately. Here, the ensemble is not used as a late-integration technique; instead, it is used as a perturbation operator for generating new solutions from the selected parent solutions. Throughout this paper, an omic is termed a view and multi-omic data are termed multi-view data in the context of algorithms. An overview of the proposed method is given below: • enAMOSA conducts multi-view-based multi-objective clustering by first identifying different partitions of the same dataset using different views. To capture the goodness of an individual clustering generated using a single view, an internal cluster validity index, the conn-XB index [8], is used. The values of the conn-XB index for the different partitions obtained using the various views are simultaneously optimized along with the agreement index [9], which measures the agreement among multiple partitions obtained using different views in a new way. A special perturbation operator is used which replaces the traditional mutation operator. This operator uses an ensemble method along with the initial population for generating new diverse solutions. Finally, the partitions obtained on the multiple views are combined to generate a single solution.
• A large number of experiments are conducted to illustrate the efficacy of the different components of the proposed enAMOSA algorithm. We have developed several baseline methods by generating all possible combinations of the base partitions used in the experiment. These baselines are explained in detail in later sections of the paper. To further demonstrate the effectiveness of the proposed perturbation operator, we have also compared the results of enAMOSA with another version of multi-view AMOSA in which the normal perturbation operator is used (the perturbation operator of [9]) and an ensemble technique is applied separately for combining the final Pareto-optimal solutions generated by the different clustering algorithms.
• The developed algorithm is tested on 13 genomic datasets. Results are compared with those obtained by the baseline algorithms and existing state-of-the-art models.
The overall steps of the proposed algorithm are shown in Fig 1. Some of the contributions of our proposed methodology are as follows: • To the best of our knowledge, this work is the first multi-objective based multi-view approach for capturing heterogeneous structures from multi-omics data in the field of patient classification.
• A new perturbation operator is designed by combining the concepts of both multi-objective multi-view clustering and cluster ensemble. It improves the robustness of the proposed algorithm to deal with data having different types of clusters.
• The algorithm is capable of capturing heterogeneous structures within a view and also amongst different views.
• In our proposed algorithm, different views can be clustered using different clustering algorithms. Further, the views can have a different number of clusters. To the best of our knowledge, previous multi-view multi-objective algorithms, like MvAMOSA [9], allow different views to be clustered by the same clustering algorithm and also restrict all the views to have the same number of clusters.

Background
In the literature, several semi-supervised or supervised classification methods [10][11][12] have been developed for cancer diagnosis. These classification techniques classify tumor samples in a cancer dataset as malignant, benign or other subtypes [13]. However, it is not always possible to obtain labeled tissue samples; for example, the real-life gene expression datasets in Ref. [14] and the microRNA datasets in Ref. [15] are unlabeled. Hence, the application of supervised classification techniques to the cancer classification problem is difficult due to the unavailability of labeled data, and clustering techniques have become popular for solving different problems in the bioinformatics domain. Multiple molecular profiling data can be collected for the same individual. Exploiting these data separately and then combining them can significantly improve clinically relevant patient subclassifications [16]. This section discusses existing works on multi-view clustering and cluster ensemble techniques, the drawbacks of the state-of-the-art models, and the motivation of the work.

Existing works on multi-omics/view clustering methods
The increase of multi-modal datasets in real-world applications has raised the interest in multi-view learning [17].
Based on the algorithmic approach, multi-view clustering methods can be broadly classified into three categories: (i) Early integration, (ii) Late integration, and (iii) Intermediate integration.
The early integration approach is the simplest of all. In this approach, all the different views are first concatenated to form a single large dataset with features from multiple views. The resulting dataset is then clustered using any single-view clustering method. However, this approach has some major drawbacks. Firstly, it causes a significant increase in the data dimension, which is a challenge for clustering algorithms. Secondly, it ignores the different distributions present in different views of the dataset. LRACluster [18] and Structured sparsity [19] are some of the methods that use the early integration approach. LRACluster [18] uses a latent representation of the samples to determine the distribution of numeric, count and binary features. It optimizes a convex objective and provides a globally optimal solution. The Structured sparsity [19] method concatenates the views and applies a weighted linear transformation for clustering; features that do not contribute to the cluster structure are assigned low weights.
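The concatenation step of early integration can be sketched as follows; the view names and sizes below are hypothetical, and the subsequent single-view clustering step is left out:

```python
import numpy as np

# Toy illustration of early integration: two views measured on the same
# n samples are concatenated feature-wise into one matrix, after which any
# single-view clustering algorithm can be applied.
rng = np.random.default_rng(0)
n = 6
gene_expr = rng.normal(size=(n, 4))    # view 1: 4 features per sample
methylation = rng.normal(size=(n, 3))  # view 2: 3 features per sample

D = np.hstack([gene_expr, methylation])  # n x (4 + 3) concatenated matrix
print(D.shape)  # (6, 7): the dimension grows with every added view
```

The printed shape makes the first drawback concrete: every added view inflates the feature dimension of the single concatenated dataset.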
In the late integration approach, each view of the dataset is clustered separately using a single-view algorithm. Here, each view can be clustered using a different clustering algorithm. Finally, the clusters from the different views are integrated to form combined global clusters. COCA [20] and PINS [21] are examples of methods using this approach. PINS [21] uses a connectivity matrix to integrate the clusters of different views. This algorithm first adds some Gaussian noise to the data; the number of clusters is then chosen in such a way that the clustering is robust to this perturbation. Serra et al. [16] proposed a multi-view approach, MVDA, for identifying clinically relevant patient subclasses by combining the information present in multiple high-throughput molecular profiling datasets generated by omics technologies.
The intermediate integration approach involves the following: (i) methods where views are integrated using similarity/distance measures, (ii) methods that use joint dimension reduction for the different views, and (iii) methods using statistical modelling of the views.
Chikhi [22] proposed a generalized spectral clustering algorithm, Multi-View Normalized Cuts (MVNC). It is a two-step approach: first, spectral clustering is applied to the dataset, followed by a local search to refine the initial clustering. Similarity Network Fusion (SNF) [23] is another similarity-based method, which constructs a similarity network for each view separately; these networks are then fused together through an iterative process. Regularized Multiple Kernel Learning with Locality Preserving Projections (rMKL-LPP) [24] performs dimensionality reduction on the different views such that the similarities amongst the samples are preserved in low dimensions; subsequently, K-means is applied to this low-dimensional representation. Zhang et al. [25] proposed CMVNMF (Constrained Multi-View clustering based on NMF), an extension of the NMF model in which different views can contain different samples, but certain samples from different views are constrained to be in the same cluster. iCluster [26] utilizes a joint latent-variable model to detect the grouping structure in multi-omics data. iCluster+ [27], an extension of iCluster, includes different models but maintains the idea of iCluster that the data originate from a low-dimensional space. The latest extension is iClusterBayes [28]; this method uses Bayesian regularization and is much faster than its previous variants. In [29], the authors proposed a parameter-free clustering model, the Adaptively Weighted Procrustes technique, for multi-view clustering. The authors of [30] proposed a self-weighted multi-view clustering technique (SwMC).

Existing works on cluster ensemble
Cluster ensemble is a technique for deriving a better clustering solution from a set of candidate clustering solutions [5,31]. A cluster ensemble algorithm can be presented as a two-step approach: (i) a diverse set of base partitions is generated; and (ii) these partitions are combined to form a single consensus partition. Depending on the type of base partitions, cluster ensembles are of two types, viz., homogeneous and heterogeneous: when the base partitions are obtained from the same clustering algorithm, the ensemble is called homogeneous; in contrast, if the base partitions are obtained from different clustering algorithms, it is called heterogeneous. Based on the type of consensus function used, the existing cluster ensemble approaches are mainly categorized under co-association, graph/hyper-graph partitioning, mutual information or relabeling [6].
In [32], the authors formalized the cluster ensemble problem as a combinatorial optimization problem in terms of shared mutual information. They proposed three algorithms: the meta-clustering algorithm (MCLA), the cluster-based similarity partitioning algorithm (CSPA) and the hyper-graph partitioning algorithm (HGPA). Depending on the mutual information shared, a consensus function can be applied to select the best partition amongst those produced by these three algorithms.
Based on the base partitions, the CSPA algorithm constructs a similarity matrix. Values in the matrix denote the fraction of partitions where two objects belong to the same cluster. Further, a similarity-based clustering algorithm is applied on this matrix to generate the consensus partitioning.
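The co-association construction at the heart of CSPA can be sketched as follows; this is a minimal NumPy version with hypothetical label vectors, and the final similarity-based clustering step on the matrix is omitted:

```python
import numpy as np

def cspa_similarity(partitions):
    """Co-association matrix for CSPA: Sim[i, j] is the fraction of base
    partitions in which samples i and j fall in the same cluster."""
    P = np.asarray(partitions)                # shape (num_partitions, n)
    agree = (P[:, :, None] == P[:, None, :])  # per-partition co-membership
    return agree.mean(axis=0)

# three base partitions of 4 samples (hypothetical labels)
parts = [[0, 0, 1, 1],
         [0, 0, 0, 1],
         [1, 1, 0, 0]]
sim = cspa_similarity(parts)
print(sim[0, 1])  # 1.0 -> samples 0 and 1 co-cluster in every base partition
```

Any similarity-based algorithm (e.g. hierarchical clustering) can then be run on `sim` to produce the consensus partitioning.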
In the HGPA algorithm, a hypergraph is constructed by representing the clusters of the base partitions as hyper-edges of the graph. This hypergraph is then partitioned by cutting a minimal number of hyper-edges.
In the MCLA algorithm, a meta-graph is constructed in which each cluster of a base partition forms a vertex. The similarity between the vertices defines the edge weights of this graph; vertices belonging to the same partition have no edges between them. On partitioning the meta-graph, the clusters belonging to the same group are considered correspondents. The objects are assigned to the meta-clusters with which they are most strongly associated, generating the consensus partition.
In HBGF (hybrid bipartite graph formulation) [33], a bipartite graph is constructed from the set of base partitions: objects and clusters are simultaneously modeled as vertices of the graph. Finally, a graph partitioning algorithm is applied to the generated bipartite graph, and the resulting division of the objects is the consensus partitioning.

Drawbacks of the existing literature
In the field of patient sub-classification, multi-view data from multiple omics technologies can be obtained for the same individual. Clinically relevant patient sub-classification can be significantly improved by combining these data rather than exploiting them separately. However, by and large, multi-view clustering approaches have not yet penetrated bioinformatics [34]. The existing multi-view classification techniques for patient sub-classification suffer from the following drawbacks: 1. Existing multi-view clustering problems are mostly solved as single-objective optimization problems. A single quality measure for partitioning is optimized implicitly or explicitly using various paradigms of unsupervised single-view learning: initially the different views of the dataset are partitioned, and later the agreement between the partitions obtained on the different views is optimized. Instead of treating these two objectives (the goodness of the partitions obtained using individual views and the agreement amongst the partitions) separately, it is better to optimize them simultaneously to capture better partitioning structures among the views.
2. The existing multi-view approaches applied to the patient sub-classification problem are very simple in structure and cannot effectively identify more than one relevant structure of the datasets.
3. Multi-objective clustering algorithms can identify different alternative partitionings of a dataset after a single execution. But as the number of alternatives increases, the analysis becomes harder.
4. In the patient-stratification problem, cluster ensembles are mostly used during view integration. The literature lacks the use of a multi-view multi-objective algorithm combined with an ensemble technique, rather than applying them separately, to capture the fine structures present among different views.
5. Most of the existing multi-view algorithms are designed to capture homogeneous structures among multiple views.
6. Existing multi-view multi-objective algorithms only allow the same clustering algorithm to be used for partitioning the data over the multiple views of the samples, and also restrict the views to have the same number of clusters.

Motivation
The general aim of any multi-view clustering is to improve the cluster quality in each view and to increase the agreement between the multiple partitionings obtained using the individual views. By nature, this is a multi-objective optimization problem with two types of objectives, cluster quality over the different views and agreement between the multiple views, to be optimized simultaneously. Further, multi-omics datasets exhibit complex structures that are difficult for single-objective clustering algorithms to capture. Although the multi-objective approach offers a set of alternative structures of the dataset, as the number of alternatives increases, the analysis becomes harder. All of this motivated us to develop a new multi-objective-based multi-view algorithm with a unique ensemble-based perturbation operator that is capable of capturing the fine-grained structures in multi-omics datasets.

Problem formulation
The multi-view cluster ensemble problem is formulated as a multiobjective optimization problem.
• Given:
• A multi-view dataset containing V views and n samples, S = {x̄_1, x̄_2, . . ., x̄_n}.
• d_m is the number of features in the m-th view, and D^m is the n × d_m matrix representing the m-th view.
• D^m_ij is the j-th feature of the i-th sample in the m-th view.
• Concatenation of the V views produces a matrix D of size n × d, where d = Σ_{m=1}^{V} d_m is the total number of features.
• A set of objective functions, where each CV_m is a cluster validity index measured on the partitioning obtained after considering only view m of the given dataset, and AI is used for measuring the agreement between the partitions obtained for the different views.
• Find:
• A consensus partitioning (U), generated by ensembling the outputs of clustering algorithms CA_1, CA_2, . . ., CA_p, satisfying all views. The set of samples, S, is divided into K clusters, {U_1, U_2, . . ., U_K}, where U_i = {x̄^i_1, . . ., x̄^i_{n_i}}, n_i is the number of samples in cluster i, and x̄^i_j is the j-th sample of cluster i.
• The consensus partitioning simultaneously optimizes the objective functions; the simultaneous optimization of these objectives produces a Pareto-optimal front.

enAMOSA: Ensemble based multi-view archived multi-objective simulated annealing
This section discusses the proposed multiobjective-based multi-view cluster ensemble approach, namely enAMOSA.
To overcome the difficulties of traditional clustering algorithms, enAMOSA combines characteristics of cluster ensemble and multi-view-based multi-objective clustering methods. enAMOSA comprises three main steps: (1) generation of a diverse set of base partitions for each view, (2) determination of an ensembled partitioning considering the multiple base partitions, and (3) generation of a final consensus partitioning satisfying the different views. The proposed algorithm differs from the traditional ensemble approach in two ways. Firstly, instead of producing a single consensus partitioning, it produces a set of consensus partitionings. In fact, this set can contain partitionings that are combinations of other partitionings, as well as high-quality partitionings that already appeared in the set of individual partitionings. Secondly, it is an iterative process: in each iteration, it combines pairs of partitionings for each view, and the views are then integrated to generate a new solution for evaluation. The steps involved in enAMOSA are shown in Algorithm 1.
The calculation of dominance among the solutions is the same as in AMOSA [35]. In Algorithm 1, the temperature (temp) plays a significant role in calculating the probability of accepting a solution.
Algorithm 1: Algorithm for enAMOSA
Initialize: iter, SL, HL, T_min, T_max, no_views, α; temp = T_max
1 begin
2 Initialize the pool with solutions from k-means, complete linkage, fast search clustering and spectral clustering.
3 Compute the dominance of the solutions in the pool.
. . . (the remaining annealing steps follow the AMOSA framework [35])
end
Generation of base partitions. To generate the initial solutions (called base partitions), four different clustering techniques (called base clustering algorithms), hierarchical (complete linkage) [36], K-means, fast search [37] and spectral clustering, are applied to each view of the given dataset. These four algorithms belong to different categories of clustering algorithm: K-means represents centroid models, hierarchical clustering represents connectivity models, and spectral and fast search clustering represent density-based models. The more diverse the base algorithms, the higher the chances of capturing differently shaped clusters in the dataset. It is essential to have different types of partitionings in the initial archive so that enAMOSA receives as much information as possible to find the optimal number of possible existing structures.
The choice of clustering algorithms for generating base partitions are not merely restricted to these four clustering algorithms only, but other clustering algorithms can also be used.
The number of clusters (K) given as input to a base clustering algorithm (say, C) is determined randomly. The number of clusters is varied over the range K_min to K_max, with K_min = 2 and K_max = √n, where n denotes the number of samples. A value of K is selected from this range uniformly at random, and C is applied to each view of the dataset with the number of clusters set to K.
Fast search [37], however, is a density-based clustering algorithm whose parameters are determined automatically from the corresponding view. It does not take the number of clusters as input; instead, it determines the number of clusters automatically from the given dataset.
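For the other base algorithms, the random choice of K can be sketched as follows (the sample count n is hypothetical):

```python
import numpy as np

# Draw the target cluster count K for one base clustering run:
# uniformly from [K_min, K_max] with K_min = 2 and K_max = sqrt(n).
rng = np.random.default_rng(42)
n = 100                            # number of samples (hypothetical)
k_min, k_max = 2, int(np.sqrt(n))  # here k_max = 10
K = int(rng.integers(k_min, k_max + 1))  # upper bound is exclusive, hence +1
assert k_min <= K <= k_max
```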
At the end of this step, we have a set of base partitions for each view.
Archive initialization. For each view, we compute the dominance of the base partitions obtained in the previous step. A set of non-dominated solutions is generated from each view, and the archive is initialized with these non-dominated solutions.
The initial population of the archive in enAMOSA is not generated randomly, as is done for most of the AMOSA based clustering techniques. Instead, it is composed of a set of base partitions, π 1 , generated by running a diverse set of conceptually different algorithms.
String representation. In order to represent the initial partitioning solutions generated by the different clustering algorithms, a membership-matrix-based representation scheme is used.
For example, if K-means is executed on each of the V views (with its corresponding set of attributes) with the number of clusters = K, then for each case a membership matrix Mem of size K × n is obtained, where Mem_ij = 1 if x̄_j belongs to cluster U_i, and Mem_ij = 0 otherwise. Here, x̄_j denotes the j-th data point and U_i denotes cluster i; Mem_ij denotes the membership value of the j-th data point for the i-th cluster.
Suppose the dataset has V views in total and the clustering algorithm C is selected to be executed on the dataset. Then, for a given view, a membership matrix of size K × n is generated. In total, V such membership matrices of size K × n are encoded in the string; thus the length of the string is V × K × n. Fig 2 shows an example of the proposed string representation. All the strings of the archive are initialized in the above way.
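A crisp membership matrix of this kind can be built from a cluster-label vector as follows (the labels are hypothetical); a full string would concatenate V such matrices, giving the stated length V × K × n:

```python
import numpy as np

def membership_matrix(labels, K):
    """Crisp K x n membership matrix: Mem[i, j] = 1 iff sample j is in cluster i."""
    labels = np.asarray(labels)
    mem = np.zeros((K, labels.size), dtype=int)
    mem[labels, np.arange(labels.size)] = 1
    return mem

# one view, 5 samples, K = 3 clusters (hypothetical label vector)
mem = membership_matrix([0, 2, 1, 0, 2], K=3)
print(mem.shape)        # (3, 5)
print(mem.sum(axis=0))  # [1 1 1 1 1]: every sample belongs to exactly one cluster
```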
Perturbation operator. The special perturbation operator uses an ensemble method along with the initial population for generating new solutions.
This operator finds the consensus partitioning between a pair of selected parents for each individual view. Any existing cluster ensemble method can be used in enAMOSA as the perturbation operator. The idea is to generate new good-quality solutions that are combinations of two previous solutions. First, two parents to be combined are randomly selected from the archive. The combination is done for the individual views: the ensemble-based operator is applied to the membership matrices present in the two selected solutions for a given view. Let π_1^1 and π_1^2 be the membership matrices of the two selected solutions for view 1, with K_1^1 and K_1^2 clusters, respectively (π_1^2 denotes the second selected parent for view 1, and similarly for the others). Let the ensembled solutions be represented by π_1^F and π_2^F. The number of clusters K_1^F for partition π_1^F is chosen randomly in the interval [K_1^1, K_1^2]. Second, the parents are combined using the ensemble method; the consensus partition generated has K_1^F clusters. An illustration of the operator is given in Fig 3. The operator is briefly described in Algorithm 2.
Here, the CSPA (cluster-based similarity partitioning algorithm) technique [32] is used as the underlying ensemble method. The operator works as follows: • Let the selected clustering solutions be π_1 and π_2, with K_1 and K_2 clusters respectively, and let U^1 and U^2 be the corresponding partitionings. • For each partitioning solution, an adjoint matrix A^k of size n × n, k = 1, 2, is generated, where A^k_ij = 1 if samples x̄_i and x̄_j belong to the same cluster in partitioning U^k, and A^k_ij = 0 otherwise. • A new similarity matrix Sim of size n × n is computed as Sim = (A^1 + A^2)/2, so that Sim_ij is the fraction of the two partitionings in which x̄_i and x̄_j co-cluster. This similarity matrix is used for clustering the dataset with any standard similarity-based clustering algorithm, such as a hierarchical clustering technique. In general, an induced similarity graph (vertex = object, edge weight = similarity) can be partitioned using METIS [38] along with the newly generated similarity matrix. The above ensemble-based operator is applied to the individual views separately; the steps, which return the new solution new_pt, are summarized in Algorithm 2 (algorithm for the new perturbation operator).
Objective functions. The optimization framework uses two objective functions: (i) the Agreement Index [9], for measuring the agreement between partitions obtained from different views, and (ii) the connectivity-based XB-Index, or conn-XB Index [8].
Agreement index. The Agreement Index [9] is used for measuring the agreement between partitions obtained from different views.
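A Rand-style reading of this index can be sketched as follows, under the assumption that agreement counts the sample pairs co-clustered in both views plus the pairs separated in both views, averaged over all view pairs:

```python
import numpy as np
from itertools import combinations

def adjoint(labels):
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def agreement_index(view_partitions):
    """Agreement averaged over all view pairs: for each pair of views, count
    sample pairs co-clustered in both (n_a) or separated in both (n_d),
    divided by the total number of pairs n(n-1)/2."""
    n = len(view_partitions[0])
    total_pairs = n * (n - 1) / 2
    scores = []
    for p, q in combinations(view_partitions, 2):
        A, B = adjoint(p), adjoint(q)
        iu = np.triu_indices(n, k=1)       # each unordered pair counted once
        n_a = np.sum((A[iu] == 1) & (B[iu] == 1))
        n_d = np.sum((A[iu] == 0) & (B[iu] == 0))
        scores.append((n_a + n_d) / total_pairs)
    return float(np.mean(scores))

# two views that partition 4 samples identically (up to relabeling)
print(agreement_index([[0, 0, 1, 1], [1, 1, 0, 0]]))  # 1.0
```

Because the index works on adjoint matrices rather than raw labels, it is invariant to how the clusters are numbered in each view.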
For a pair of views v1 and v2 with adjoint matrices A^{v1} and A^{v2}, the pairwise agreement AI(v1, v2) is computed from the number of sample pairs on which the two adjoint matrices agree, divided by the total number of sample pairs n(n − 1)/2. The final Agreement Index for the total partitioning is the average of AI(v1, v2) over all V(V − 1)/2 pairs of views. Here, n is the total number of samples in the dataset and V is the number of views.
connected-XB or conn-XB Index. In [8], the authors developed the connectivity-based XB-Index. The definition of this index follows the formulation of the popular XB-Index [39].
Following the XB formulation, the index can be written as
conn-XB = [ Σ_{i=1}^{K} Σ_{x̄ ∈ U_i} d_short²(x̄, z̄_i) ] / [ n · min_{i≠j} d_short²(z̄_i, z̄_j) ].
Here, d_short(x̄, z̄_i) is the shortest distance between two points x̄ and z̄_i along the relative neighborhood graph (RNG) [8]; it measures the connectivity between the two points. If two points are connected, i.e., a path exists between them along the RNG, then the d_short value will be low. U_i denotes cluster i, z̄_i is the medoid of cluster i, n is the size of the whole dataset, and z̄_j is the medoid of cluster j. The objective is to lower the value of the conn-XB index in order to obtain a good partitioning.
A solution encodes V membership matrices/partitionings in total. For each such membership matrix/partitioning, the value of conn-XB is calculated to measure the goodness of the partitioning. Let the values be conn-XB_1, conn-XB_2, . . ., conn-XB_V. Then the objective functions corresponding to a single solution are {conn-XB_1, conn-XB_2, . . ., conn-XB_V, 1/AI}. enAMOSA simultaneously optimizes these (V + 1) objective functions.
Consensus function for view combination. At the end of the execution of enAMOSA (the pseudo-code is given in Algorithm 1), we get a set of non-dominated solutions in the final archive. Each of these solutions encodes V membership matrices. A new late-integration method is proposed to combine the membership matrices present in a single solution, yielding a consensus partitioning that satisfies all the available views. To obtain the consensus partitioning, the common points of the different clusters present in the different partitionings (obtained using different views) are first identified. This is achieved by a majority-voting scheme: if a pair of points cluster together in the majority of the views, then they are also grouped together in the final partitioning. All pairs of data points are evaluated likewise. If some points are not assigned to any group (this situation may occur if an even number of views is used and a tie occurs), then in the final partitioning these points are assigned to the group of their nearest neighbors. The process is illustrated below: • Let the adjoint matrices of the partitionings present in a string, corresponding to the different views, be denoted by A^k, where k = 1 . . . V and V is the total number of views. Then a new adjoint matrix A_sum is computed as A_sum = Σ_{k=1}^{V} A^k. • A new matrix A_new is generated as A_new,ij = 1 if (A_sum)_ij > V/2, and 0 otherwise. • The matrix A_new is used to generate the final partitioning. Following a link-based approach, the connected components of the matrix A_new are identified.
Points are considered as vertices, and any pair of points (i, j) with A_new,ij = 1 is connected by an edge. The connected components of this graph are treated as the initial clusters. Let the total number of clusters be K.
• For the rest of the points, which are not part of any of the clusters extracted in the previous step, cluster assignment is done as follows. A point x̄_i is assigned to the k-th cluster, where
k = argmin_{k' ∈ {1, . . ., K}} (1/n_{k'}) Σ_{j=1}^{n_{k'}} d_short(x̄_i, x̄^{k'}_j).
Here, K denotes the total number of clusters/connected components identified in the previous step, n_k denotes the number of points in the k-th connected component/cluster, x̄^k_j denotes the j-th point of the k-th cluster, and d_short(x̄_i, x̄^k_j) denotes the shortest distance [8] between x̄_i and x̄^k_j. • Finally, a partitioning is obtained in which all the points are part of some cluster. This partitioning is reported as the final consensus partitioning for that particular solution.
• For each solution present in the archive, a single consensus partitioning is obtained. If archive-size = N, then N such consensus partitionings are generated.
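The majority-voting combination of the views can be sketched as follows; the view label vectors are hypothetical, and the nearest-neighbor reassignment of tied samples is omitted for brevity:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def consensus_by_majority(view_partitions):
    """Link samples i, j iff they co-cluster in a strict majority of views
    (A_sum[i, j] > V/2); the connected components of the resulting graph
    are taken as the consensus clusters."""
    V = len(view_partitions)
    P = np.asarray(view_partitions)
    A_sum = (P[:, :, None] == P[:, None, :]).sum(axis=0)  # vote counts
    A_new = (A_sum > V / 2).astype(int)                   # strict majority
    _, labels = connected_components(csr_matrix(A_new), directed=False)
    return labels

views = [[0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1],
         [1, 1, 0, 0, 0]]
labels = consensus_by_majority(views)
print(labels)  # [0 0 1 1 1]: samples {0,1} form one cluster, {2,3,4} another
```

With an even number of views, a pair that ties at exactly V/2 votes is not linked; such unassigned samples would then be attached to their nearest cluster via d_short, as described above.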

Theoretical analysis
Complexity analysis. In this section, the time complexity of enAMOSA is discussed. Let HL = N, where N is the size of the archive, and SL = γ × HL with γ ≥ 2. The final time complexity of enAMOSA is obtained by combining the complexities of the basic steps described above (base-partition generation, perturbation, objective evaluation and archive maintenance).
Convergence analysis. In the proposed algorithm enAMOSA, we have simultaneously optimized two kinds of objectives, the conn-XB Index [8] and the Agreement Index [9].
The conn-XB index [8] follows the formulation of the popular XB index [39]. It measures the ratio between cluster compactness and cluster separation. The Xie-Beni validation index behaves convexly when the samples lie around the optimal values of the centroids [40]; similarly, the conn-XB index [8] behaves convexly under the same condition.
The agreement index [9] measures the agreement between partitions obtained using different views. It is given by AI = (n_a + n_d) / (n(n-1)/2) (Eq 7), where n_a is the number of pairs of samples occurring together in both views and n_d is the number of pairs of samples occurring apart in both views. If there are n samples, the total number of sample pairs is n(n-1)/2 (Eq 8). Substituting Eq 8 into Eq 7 expresses AI directly in terms of n_a and n_d. The value of n_a satisfies 0 ≤ n_a ≤ n(n-1)/2: it is 0 when all pairs disagree and n(n-1)/2 when all pairs agree.
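The agreement index between two views can be computed directly over sample pairs; a minimal sketch:

```python
from itertools import combinations

def agreement_index(labels_a, labels_b):
    """Fraction of sample pairs on which two partitionings agree."""
    n = len(labels_a)
    n_a = n_d = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n_a += 1      # pair grouped together in both views
        elif not same_a and not same_b:
            n_d += 1      # pair grouped apart in both views
    return (n_a + n_d) / (n * (n - 1) / 2)
```

Identical partitionings give AI = 1; as fewer pairs agree, AI decreases toward 0.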
Hence, AI is a monotonically increasing function of n_a. enAMOSA follows the formulation of AMOSA [35]. The acceptance probability is crucial for the behavior of simulated annealing. enAMOSA adopts a dynamic acceptance probability that depends on the domination status [35], of the form p(q → s) = 1 / (1 + exp(ΔE_{q,s,T})), where ΔE_{q,s,T} represents the change in energy between state q and state s at a given temperature T. The convergence proof of simulated-annealing-based multi-objective optimization is elaborately explained in [41]. All the above-mentioned factors ensure the convergence of the proposed algorithm.

Dataset collection and preparation
To evaluate the performance of the proposed algorithm we have used a total of 13 benchmark omic datasets. The first dataset [44] contains data for three views: microRNA expression (accession number GSE22220), mRNA expression (accession number GSE22219) and DNA methylation. Using clinical data retrieved from the same source, patients were classified into four categories: Level1, Level2, Level3 and Level4.
MSKCC.PRCA. This dataset contains samples from patients with prostate cancer tumors. It has three views: gene expression, miRNA expression and DNA methylation. According to a study performed on this dataset [45], patients are grouped into two categories: the first class is Tumor stage I and the second class comprises Tumor stages II, III and IV.
TCGA.GBM. This glioblastoma cancer dataset has three views: gene expression, miRNA expression and DNA methylation. As described in [46], patients are grouped into four categories: Classical, Mesenchymal, Neural and Proneural.

Preprocessing of datasets
One of the common features of omics datasets is that the number of samples is much smaller than the number of features. Normalization of the features in different omics is necessary to handle their different distributions. Further, feature selection for dimensionality reduction is essential to give the different omics an equal prior opportunity to contribute to clustering; it also keeps the most informative features and reduces the load on the clustering algorithm. In our approach, we have used an unsupervised feature selection technique based on the variance score: we calculate the variance of each feature and select the top 22-24% of features with the highest variance. The numbers of selected features for the different benchmark datasets are given in Table 1.
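Variance-based unsupervised feature selection of this kind can be sketched in a few lines (the fraction value is illustrative; the paper uses 22-24% per dataset):

```python
import numpy as np

def select_by_variance(X, fraction=0.23):
    """X: samples x features matrix; keep the top `fraction` of features by variance."""
    variances = X.var(axis=0)
    k = max(1, int(round(fraction * X.shape[1])))
    top = np.argsort(variances)[::-1][:k]   # indices of the highest-variance features
    keep = np.sort(top)                     # preserve original column order
    return X[:, keep], keep
```

Since the selection is unsupervised, it can be applied independently to each omic view before clustering.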

Evaluation metrics
To compare enAMOSA with other methods we have used two evaluation metrics, normalized mutual information (NMI) [47] and adjusted Rand index (ARI) [48]. These metrics measure the similarity between the true and predicted partitions; higher values signify that the predicted partitioning is more similar to the true one.
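As an illustration of one of these metrics, the adjusted Rand index can be computed from the contingency counts of the two partitions (a self-contained sketch; in practice library implementations such as scikit-learn's are typically used):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(true, pred):
    """ARI between two labelings; 1.0 for identical partitions (up to label permutation)."""
    n = len(true)
    cont = Counter(zip(true, pred))          # contingency-table cell counts
    a = Counter(true)                        # row sums (true-class sizes)
    b = Counter(pred)                        # column sums (predicted-cluster sizes)
    sum_ij = sum(comb(c, 2) for c in cont.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)    # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Note that ARI is invariant to relabeling of clusters, which is why it suits comparing a clustering against ground-truth classes.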

Input parameters
The proposed approach, enAMOSA, is based on the multiobjective optimization technique AMOSA [35]. It has three main components: (i) the initial temperature value (T_max); (ii) the cooling schedule; and (iii) the number of iterations (iter) at each temperature.
The initial temperature is selected such that the algorithm can explore the entire search space. If the initial temperature is set too high, the algorithm will accept all proposed solutions; if set too low, it degenerates into a greedy search. Here, the initial temperature (T_max) is set to achieve an initial acceptance rate of approximately 50% on derogatory (worse) proposals, based on the acceptance ratio z and the average positive change in the objective function, Δf_o [49]. T_min is set to 10^-3.
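One common way to realize this rule (a sketch only; the exact estimator of [49] may differ) is to solve exp(-Δf_o / T_max) = z for T_max, so that a fraction z of derogatory moves is initially accepted:

```python
from math import log

def initial_temperature(delta_f_avg, z=0.5):
    """T_max such that exp(-delta_f_avg / T_max) = z, i.e. a fraction z
    of average-size worsening moves is accepted at the start."""
    return delta_f_avg / log(1.0 / z)
```

With z = 1/2, an average worsening of 1.0 yields T_max = 1/ln 2 ≈ 1.44.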
Here z = 1/2. The cooling schedule determines the functional form of the temperature change in SA [35]. The temperature is changed using the commonly used geometric schedule, T_{i+1} = α × T_i, where α is the cooling rate and 0 < α < 1. As stated in [35], the value of α is chosen between 0.5 and 0.99. This schedule is simple in nature and requires only a small number of transitions to reach thermal equilibrium. Here, the value of α is set to 0.8.
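The geometric schedule can be enumerated directly; a minimal sketch using the paper's T_min = 10^-3 and α = 0.8:

```python
def cooling_schedule(t_max, t_min, alpha=0.8):
    """Return the sequence of temperatures T, alpha*T, alpha^2*T, ... above t_min."""
    temps = []
    t = t_max
    while t > t_min:
        temps.append(t)
        t *= alpha            # geometric decay: T_{i+1} = alpha * T_i
    return temps
```

Smaller α cools faster (fewer temperature levels) but risks freezing before equilibrium; α near 1 explores longer at each level.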
The number of iterations at each temperature is chosen so that the system is sufficiently close to the stationary distribution at that temperature [35]. Too few iterations will significantly reduce the explored search space, and the solution will not reach the global optimum. For our problem, as the sample size is not considerably large, the iteration value is set to 100.
In Table 2, we have reported the parameter settings used in the experiments.

Clustering performance
An extensive comparative study is performed to show the effectiveness of enAMOSA with respect to different approaches. The competing approaches are briefly described below: 1. Baseline versions of enAMOSA obtained using different combinations of the base partitionings; the corresponding results are reported in Tables 3 and 4. 2. In order to show the efficacy of the ensemble-based perturbation operator in the enAMOSA process, we have developed another ensemble-based multiobjective multi-view approach, namely AMOSA(ensemble). The steps of this approach are enumerated below: • The initialization of the archive is done as in enAMOSA. Four different clustering techniques, K-means, complete linkage, spectral and fast search clustering, are executed multiple times with varying parameter values and numbers of clusters. The membership matrices generated by these clustering techniques are encoded as solutions of the archive. • For the perturbation operator, simple binary mutation is applied on each membership matrix encoded in a string: some points are randomly selected and their membership values are flipped with some probability.
• In order to compute the objective functions, the V membership matrices present in the string are obtained. The conn-XB index values of all these V partitionings are calculated, along with the agreement index between them. The objective functions are {conn-XB_1, conn-XB_2, . . ., conn-XB_V, 1/AI}. • The AMOSA process is applied to simultaneously minimize these objective functions. Note that the above process differs from the proposed approach only in the perturbation operator: unlike enAMOSA, normal binary perturbation operations are used here to generate new solutions. Thus the initial solutions generated by the base clustering algorithms are not ensembled during the optimization process; each individual solution evolves separately without mixing with other solutions. This algorithm is developed to show that the ensemble-based mutation operation indeed plays an important role in generating good solutions.
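The simple binary mutation used in AMOSA(ensemble) can be sketched as follows (an illustrative implementation; the mutation probability is a hypothetical value):

```python
import random

def binary_mutation(labels, n_clusters, p=0.05, rng=random):
    """Flip the cluster assignment of each point with probability p,
    choosing uniformly among the other clusters."""
    mutated = list(labels)
    for i in range(len(mutated)):
        if rng.random() < p:
            choices = [c for c in range(n_clusters) if c != mutated[i]]
            mutated[i] = rng.choice(choices)
    return mutated
```

In contrast to the ensemble-based operator of enAMOSA, this mutation perturbs each solution in isolation, without mixing information across base partitionings.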
Using Eq 12, we have calculated the degree of contribution of each view to the final clustering obtained. The contributions computed for the different views are shown in Fig 4.
Here, Adj_v is the adjoint matrix of the partitioning obtained using view v, and Adj_final is the adjoint matrix corresponding to the final consensus partitioning.
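Eq 12 is not reproduced here; one plausible reading (a hypothetical sketch, not necessarily the authors' exact formula) measures a view's contribution as the fraction of consensus co-clustered pairs that the view also groups together:

```python
import numpy as np

def view_contribution(adj_view, adj_final):
    """Fraction of co-cluster entries of the consensus adjoint matrix
    that are also present in the view's adjoint matrix."""
    mask = adj_final.astype(bool)
    return (adj_view.astype(bool) & mask).sum() / mask.sum()
```

A view that fully agrees with the consensus scores 1.0; a view that groups none of the consensus pairs scores 0.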

Statistical significance test
For the statistical significance test we have used one-way Analysis of Variance (ANOVA), performed at the 1% significance level. The results obtained by all seven algorithms on each dataset are divided into seven groups, and one-way ANOVA is conducted between the enAMOSA group and each of the remaining groups; the results are reported in Table 5. All the p-values reported in Table 5 are less than 0.01, establishing that the improvements obtained by enAMOSA over the competing algorithms are statistically significant.
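The F statistic underlying one-way ANOVA can be computed directly from the grouped scores (a minimal sketch; obtaining a p-value additionally requires the F distribution's CDF, e.g. from SciPy):

```python
def one_way_anova_F(groups):
    """groups: list of lists of scores (e.g. NMI values per algorithm).
    Returns the one-way ANOVA F statistic."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    # between-group sum of squares: group sizes times squared mean deviations
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares: deviations from each group's own mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F indicates that between-algorithm differences dominate the run-to-run variation within each algorithm.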

Gene marker identification
From the clustering results obtained by enAMOSA on the OXF.BRC.1 dataset, we tried to extract the group of genes that contributed most to patient classification. There are four patient classes in the OXF.BRC.1 dataset, viz., Her2, Basal, LumA and LumB. To identify the gene markers of the Her2 class, we solved a binary classification problem: one group contains the samples of the Her2 class and the other contains the samples of the remaining classes. Considering both groups, the Signal-to-Noise Ratio (SNR) [50] is calculated for each gene as SNR = (μ1 − μ2) / (σ1 + σ2), where σ_j and μ_j, j ∈ {1, 2}, respectively denote the standard deviation and the mean of class j for the corresponding gene. A higher SNR value for a gene signifies higher expression in the class it belongs to and lower expression in the others. Finally, a total of 10 genes is selected from the SNR list: the top 5 genes with the highest SNR values (up-regulated genes) and the bottom 5 genes with the lowest SNR values (down-regulated genes). As with Her2, the process is repeated for the other classes present in the dataset. The selected genes are reported in Table 6. Note that a gene up-regulated in one class can be down-regulated in another.
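The one-vs-rest SNR ranking can be sketched as follows (a minimal illustration of the procedure; `k` defaults to the paper's 5-and-5 selection):

```python
import numpy as np

def snr_scores(X, labels, target):
    """X: samples x genes; labels: class label per sample; target: the
    one-vs-rest class. Returns SNR = (mu1 - mu2) / (sigma1 + sigma2) per gene."""
    in_c = X[labels == target]
    out_c = X[labels != target]
    mu1, mu2 = in_c.mean(axis=0), out_c.mean(axis=0)
    s1, s2 = in_c.std(axis=0), out_c.std(axis=0)
    return (mu1 - mu2) / (s1 + s2)

def marker_genes(X, labels, target, k=5):
    """Return (up-regulated, down-regulated) gene indices for the target class."""
    order = np.argsort(snr_scores(X, labels, target))
    return order[-k:][::-1], order[:k]
```

Genes with constant expression (zero standard deviation in both groups) would need filtering beforehand to avoid division by zero.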

Biological significance test
To demonstrate the biological significance of the selected genes, a biological significance test is conducted using the Gene Ontology Consortium (http://www.geneontology.org/). For each GO term, the percentage of genes sharing that term among the genes of the cluster (%Cluster) and among the whole genome (%Genome) is reported in Table 7. The results show that genes belonging to the same cluster share a higher percentage of GO terms than the whole genome, signifying that the genes of a particular cluster are more involved in similar biological processes than the remaining genes of the genome.

Discussion
The average NMI and ARI values obtained over 20 executions of our proposed method, enAMOSA, on all 13 datasets are shown in Tables 8 and 9, respectively. From Tables 8 and 9, it is observed that our proposed methodology outperforms the state-of-the-art single objective algorithms (the unsupervised MVDA algorithm [16], LRAcluster [18], PINS [21], SNF [23] and iClusterBayes [28]) by approximately 10-11% and 10-14% in terms of NMI and ARI, respectively. Comparison of enAMOSA with its baseline versions (enAMOSA_km, enAMOSA_spec, enAMOSA_cl and enAMOSA_fs) shows that the combination of all four base partitions (i.e., enAMOSA) performs better than its single base partition counterparts by approximately 5-10% and 8-11% in terms of NMI and ARI, respectively. To show the effectiveness of the new perturbation operator, enAMOSA is compared with AMOSA(ensemble): the NMI and ARI scores obtained by enAMOSA exceed those of AMOSA(ensemble) by approximately 5-9% and 8-10%, respectively. These results reflect the efficiency of integrating the ensemble and multi-objective approaches through the new perturbation operator over using them separately.
To further explore the importance of diversity in the base partitions, all possible combinations of the base partitions are generated, and the corresponding NMI and ARI results are presented in Tables 3 and 4, respectively. Comparing these results with those obtained by the proposed algorithm enAMOSA (Tables 8 and 9), it is observed that the proposed method outperforms its counterparts by approximately 2-4% for both NMI and ARI. The following observations are drawn from a careful analysis of the results in Tables 3 and 4: 1. The hypothesis of this work is that diversity in the initial solutions allows the algorithm to capture more accurate cluster structures. From Tables 3 and 4, we can see that the NMI and ARI values obtained by combined base partitions are higher than those of their single counterparts in Tables 8 and 9, respectively. Further, within Tables 3 and 4, combinations of 3 base partitions produce better results than combinations of 2 base partitions. For example, the results obtained by enAMOSA_cl,km,spec are better than those of any of its 2-base-partition counterparts, enAMOSA_cl,km, enAMOSA_cl,spec and enAMOSA_km,spec. Similar results hold for the other combinations reported in Tables 3 and 4. These results support the initial hypothesis of the work.
2. The performance of the algorithm depends on the type of base partitions used to populate the archive initially. In the worst case, enAMOSA generates results comparable to the best base partition solution with which the archive is initialized. That is, if the archive is initialized with solutions obtained from two different base partitionings, the final result is at least as good as the better of the two, as can be seen in Table 3; analysis of Tables 3 and 4 shows that a similar pattern is followed for the other datasets as well. At the very least, enAMOSA ensures that the best possible solution among the base partitions is generated. 3. From Tables 3 and 4, it is also observed that enAMOSA_km,spec,fs performs better than the other algorithms presented in these tables for most of the datasets. A closer analysis shows that mainly the density-based clustering algorithms (fast search and spectral clustering) capture better structures from the datasets than K-means and hierarchical clustering, possibly because density-based algorithms are capable of capturing arbitrary cluster structures.

Apart from the NMI and ARI scores, we have also reported the F1-measure and accuracy scores obtained by enAMOSA on all 13 benchmark datasets in Table 10.
In Fig 6, we have reported the gene expression profile plot for each individual class (Basal, Her2, LumA and LumB) of the OXF.BRC.1 dataset. The compactness of the structures shows that the clustered samples share the same type of gene expression, i.e., within a cluster the genes have good coherence among them.
In Table 11, we have reported the execution time (in seconds) of all the algorithms used in the experiments. All the algorithms are executed on a machine with an Intel Core i5 7th-generation processor and 8 GB of RAM. The time is calculated by averaging over 20 runs of each algorithm. The execution time of enAMOSA is comparable to that of LRAcluster and iClusterBayes.

Scalability analysis
An important aspect of performance analysis is the study of how the algorithm's performance varies with its parameters. In particular, we evaluate the scalability of the algorithm, that is, how effectively it can handle an increased number of samples. From the time complexity equation, Eq 6, the execution time of the model depends on the total number of iterations (TotalIter), the number of samples (n), the number of clusters (r), the size of the archive (N) and the number of objective functions (M). The total number of iterations, the archive size and the number of objective functions are fixed across all datasets. As for the number of clusters, our algorithm does not fix it for any particular dataset; it is determined automatically. So the increase in execution time depends on the number of samples (n) in the dataset. The empirical values reported in Table 11 clearly support this finding: TCGA.BRC has the highest number of samples (629) and the highest execution time (9631.21 seconds), while MSKCC.PRCA has the lowest number of samples (151) and the lowest execution time (727.08 seconds). A similar trend holds for the other datasets in Table 11, that is, execution time increases with the number of sample points. The results in Table 11 reveal that the execution time does not grow excessively with the number of samples; the algorithm converges in polynomial time even with a large number of samples, and its execution time is comparable to that of the state-of-the-art method iClusterBayes.

Conclusion
In order to properly subclassify patient data, the consideration of multiple views is highly desirable, and a single clustering method is not sufficient to capture all possible structures in a dataset. This multi-view classification problem is solved with the help of the proposed multiobjective multi-view cluster ensemble technique.
In the current paper, we have proposed a multiobjective cluster ensemble technique for multi-view classification. Initially, different simple clustering algorithms are applied to generate a set of base partitionings by varying the number of clusters. These initial solutions are then combined using cluster ensemble based operators. The goodness of the individual partitionings obtained from different views is measured using a connectivity-based internal cluster validity index, conn-XB, and an agreement index computing the agreement amongst the partitionings captured on different views. The values of these measures are simultaneously optimized using the search capability of AMOSA, a multiobjective simulated annealing based optimization technique. Results obtained on 13 cancer datasets illustrate the utility of the proposed approach for the patient sub-classification task. An extensive comparative study has been conducted to show the efficacy of the individual components of the proposed enAMOSA approach: several approaches were developed to show the utility of the initialization step of enAMOSA, and another multi-view cluster ensemble technique was developed that uses normal mutation operators instead of the ensemble-based operator. This comparative study reveals that all the components of the proposed approach are important. Our main findings are: (i) the proposed algorithm successfully captures complex heterogeneous structures from multi-omics data compared to other state-of-the-art methodologies; (ii) the proposed perturbation operator proves effective in integrating the ensemble technique with the multi-objective technique.
The comparative results support its effectiveness; (iii) the algorithm, enAMOSA, effectively combines multiple views having different numbers of clusters; (iv) the execution time of the algorithm is not excessive; it converges in polynomial time, and its execution time is comparable to that of the state-of-the-art method iClusterBayes. The various comparative results presented in this paper support these findings.
In future research, work will be carried out on developing a multi-view biclustering framework. The developed multi-view clustering techniques will also be applied to real-life problems involving social media data. Since documents can be represented using multiple views, many document classification problems can likewise be solved using the developed multiobjective multi-view clustering technique.