Autoencoder-assisted latent representation learning for survival prediction and multi-view clustering on multi-omics cancer subtyping

Cancer subtyping (or cancer subtype identification) based on multi-omics data has played an important role in advancing diagnosis, prognosis and treatment, which has spurred the development of advanced multi-view clustering algorithms. However, the high dimensionality and heterogeneity of multi-omics data greatly affect the performance of these methods. In this paper, we propose to learn an informative latent representation based on an autoencoder (AE) that naturally captures nonlinear omic features in lower dimensions, which helps to identify the similarity of patients. Moreover, to take advantage of survival or clinical information, a multi-omic survival analysis approach is embedded when integrating the similarity graphs of heterogeneous data at the multi-omics level. Then, a clustering method is performed on the integrated similarity to generate subtype groups. In the experimental part, the effectiveness of the proposed framework is confirmed on five different multi-omics datasets taken from The Cancer Genome Atlas. The results show that the AE-assisted multi-omics clustering method can identify clinically significant cancer subtypes.


Introduction
Currently, the research and application of big data technologies have spread from the internet to many other industries. Among them, the rapid development of high-throughput sequencing technologies has accelerated the accumulation of biological big data, which has triggered a multifaceted revolution in advanced biology and medicine. As in other fields that benefit from big data, biological big data bring new opportunities and challenges to bioinformatics as well.
The key issue is how to efficiently discover insightful knowledge from biological big data, which has attracted a lot of attention from both academia and industry.
Thanks to advanced sequencing technologies, multi-omics data are generated in large quantities [1], usually covering genomes, transcriptomes, proteomes, metabolomes, etc. It is worth noting that multi-omics data can be regarded as a specific type of multi-view data. As analyzed in the survey [2], over the past several years more and more researchers have paid attention to analyzing multi-omics data via machine learning methods with the aim of obtaining new knowledge. The general framework of Cox-regularized-model-based (or machine learning-based) multi-omics analysis is shown in Figure 1, which illustrates that each omic type is represented by a data matrix from the perspective of a specific view. Although the omics are connected to one another, the multi-source heterogeneity among them, once exploited through data integration, offers more potential for discovering new knowledge, which benefits disease identification and drug development. Also, prior knowledge can be used to enhance the performance of machine learning methods. For example, prior information about the relationships between different omics data can be considered, so as to reduce false-positive results and strengthen the relevance of true molecular interactions. Over the last decade, considerable efforts have been devoted to developing computational methods for multi-omics data integration [3], which is the foundation of knowledge discovery. These approaches can be roughly categorized into three classes according to their major strategy: early, intermediate and late integration [4]. Early integration methods simply concatenate the features of all omic data into a single feature set, while late integration methods learn each omic layer separately and then merge the clustering results into a single solution. Both early and late integration methods fail to model the interactions among features at different omics levels. Instead, intermediate integration methods, which consolidate data by constructing a holistic model for joint dimensionality reduction and clustering without simply concatenating features or merging results, have gradually become mainstream.
Nowadays, clustering of histology-oriented data has generated significant value for research in biology and medicine (e.g., disease typing, drug research and precision medicine) [5]. Among such approaches, multi-omics data clustering, which considers the connections among different omics and belongs to intermediate data integration, can lead to more systematic discoveries [6]: 1) it can reduce the effects of experimental and biological noise in the data; 2) different groups can reveal different cellular levels; 3) even at the same molecular level, each group may contain data that are not available in other groups; and 4) different groups can represent data from different levels of organisms. Although existing multi-omics clustering algorithms have made progress in recent years [7][8][9][10][11][12], there is still considerable room for performance improvement through efficient algorithms based on advanced multi-view learning techniques, especially for large-scale multi-omics data.
In this paper, we propose autoencoder-assisted latent space learning for survival analysis and multi-omics clustering (AELSMC) to identify meaningful cancer subtypes. First, an autoencoder (AE) maps high-dimensional omic features nonlinearly into a lower-dimensional space, so as to determine a more accurate similarity between patients. Next, clinical information is incorporated by embedding a multi-omic survival analysis approach into the learning of the similarity graph of heterogeneous data at the multi-omics level. Then, we perform spectral clustering on the patient similarity matrix with a given number of clusters to generate the subtype groups. The proposed method is compared with several representative algorithms on five multi-omics datasets. Experimental results validate the potential of AEs to capture multi-omics feature information in lower dimensions, and the superiority of the proposed method in generating more distinct subtypes.

Survival prediction
Survival analysis concerns the time elapsed from a starting event until an event of interest or a censoring point. It is usually used to estimate the survival time of an observed patient [13], namely, the time from the diagnosis of a disease to death. Nevertheless, it can also be applied to other time-to-event outcomes, such as time in hospital or time until a disease recurs (the latter is often termed disease-free survival). In the literature, various survival prediction techniques have been developed for the clinical analysis of diseases; widely used categories include multi-task learning based analysis [14], deep learning based analysis models [15] and reweighted regression models [16].
Note that most existing survival prediction methods are developed for a single type of data. When different types of data exist, multi-view learning can exploit the complementary information between them through a joint optimization model so as to improve generalization performance. Research on multi-view learning techniques has gained a lot of attention [17]; however, survival prediction methods based on multi-view learning on multi-omics data are still under-explored. In view of this, we attempt to take full advantage of multi-view survival prediction on multi-omics data, thereby facilitating tasks in clinical analysis such as cancer subtyping.

Cancer subtyping based on multi-omics analysis
During the past decade, cancer subtyping (or cancer subtype identification) has become one of the vital steps for advancing diagnosis, prognosis and treatment. The essence of cancer subtyping is to group patient samples with similar omic profiles, which usually relies on unsupervised or semi-supervised clustering methods.
Early research on cancer subtyping focused on clustering single-omic data, such as gene expression data, similar to early survival prediction techniques. However, this is insufficient today, since large quantities of multi-omics data are being generated rapidly in this field. Subsequently, various multi-omics clustering methods were proposed and applied to cancer subtyping, e.g., [18][19][20][21]; they can be briefly summarized into three main categories: multi-view clustering (MvC) methods, model-based methods and similarity-based methods. Among them, MvC techniques appear to be the most prevalent in the literature. For example, a multi-view clustering method with low-rank and sparsity constraints (MVCLRS) was proposed to capture both the global and the local structures by integrating multi-omics data [18]. A multi-view spectral clustering method with latent representation learning was proposed in [20], which can deal with incomplete multi-omics data with missing values. Most existing cancer subtyping methods are developed in an unsupervised manner; however, knowledge such as multi-view (omic) survival analysis (survival prediction) is very helpful in MvC [13,16].
Generally, omics (or multi-omics) data are characterized by sample scarcity and high dimensionality, hence dimension reduction or subspace learning techniques are very useful, as can be seen in some recent multi-omics clustering algorithms. For example, a vector-quantized representation learned with a vector-quantized variational autoencoder was developed in [19]. The noise and redundant information in high-dimensional omics data have been addressed by the latent representation learning in [20]. Principal component analysis (PCA)-based feature extraction and singular value decomposition (SVD) were utilized for latent subspace learning in [22]. To simultaneously deal with the high-level noise and high heterogeneity of multi-omics data, the deep latent space fusion (DLSF) model [23] was proposed based on a cycle AE with a shared self-expressive layer, which can learn a consistent manifold in the sample latent space.

Multi-view clustering
In this section, we discuss MvC in more detail, owing to its predominance in multi-omics analysis. MvC is a hot topic in unsupervised learning, as clustering is performed by exploiting the heterogeneous feature perspectives in multi-view data to achieve accurate and meaningful solutions. In recent years, various MvC algorithms [24][25][26][27] have been proposed. Roughly, existing MvC methods fall into three categories: 1) matrix decomposition methods [24]; 2) subspace-based methods that identify consensus low-dimensional subspaces [28]; and 3) graph-based methods that utilize consensus nearest-neighbor matrices in graphs [25,27]. Meanwhile, based on the data information employed, they can also be divided into two types [25]: feature-driven methods (covering the matrix decomposition and subspace-based categories) and relation-driven methods (the graph-based category). The former establish explicit models from data features to estimate the distributions of data. The latter analyze the point-to-point relationships of data, as commonly represented in graphs, and apply various optimization methods [25,29,30] to the graphs to obtain high-quality data partitions.
As claimed in [25,27], graph-based MvC methods have the advantages of simplicity and efficiency (e.g., they handle nonlinear data efficiently), and they have gained more attention recently. In this study, we employ the graph-based MvC technique for multi-omics clustering, in which a patient-to-patient similarity graph is learned via the AE-assisted informative latent space. At the same time, graph learning and survival prediction are optimized simultaneously in a joint model built on the latent embedding space, as discussed in the next section.

The workflow of the proposed framework and basic definitions
An overall workflow of the proposed framework for multi-omics cancer subtyping is illustrated in Figure 2; it has three major components. First, a more informative latent space representation (i.e., $Z$) is captured by training the AE neural network on the multi-view (multi-omics) dataset $X$. Second, two tasks, i.e., survival analysis and similarity graph learning, are integrated into a unified optimization procedure based on the latent space representation. Finally, the cancer subtypes are obtained by performing the spectral clustering algorithm on the affinity matrix of the patient-to-patient similarity graph, which can further provide insights for survival analysis and biological enrichment. Given a dataset $X = \{X^{v_1}, X^{v_2}, \ldots, X^{v_m}\}$ of $n$ points with $m$ views, namely $\{v_1, v_2, \ldots, v_m\}$, the feature matrix of the $k$-th view is $X_k \in \mathbb{R}^{p_k \times n}$, where $p_k$ is the number of features in this view. Here, the multi-omics dataset of the $k$-th omic is defined as $\{X_k, (T_i, \delta_i)_{i=1}^{n}\}$, where $X_k$ is the aforementioned feature matrix, $T_i$ is the observation time, and $\delta_i$ is the censoring indicator, which indicates whether patient $i$ is censored ($\delta_i = 0$) or observed ($\delta_i = 1$). Thus, $T_i$ is defined as follows:
$$T_i = \begin{cases} O_i, & \delta_i = 1,\\ C_i, & \delta_i = 0,\end{cases}$$
where $O_i$ denotes a survival time while $C_i$ is a censored time.
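As a minimal illustration of the definition of $T_i$ and $\delta_i$ above, the following Python sketch derives observation times and censoring indicators from hypothetical arrays of survival times `O` and censoring times `C`. The function name and the convention that an event counts as observed when $O_i \le C_i$ are assumptions made for illustration; they are not taken from the AELSMC code.

```python
import numpy as np

def observation_times(O, C):
    """Return (T, delta) from survival times O and censoring times C.

    Assumed convention: the event is observed (delta = 1) when it happens
    no later than censoring; otherwise the patient is censored (delta = 0).
    """
    O = np.asarray(O, dtype=float)
    C = np.asarray(C, dtype=float)
    delta = (O <= C).astype(int)      # 1 = observed, 0 = censored
    T = np.where(delta == 1, O, C)    # T_i = O_i if delta_i = 1, else C_i
    return T, delta

# Hypothetical example with three patients.
T, delta = observation_times([5.0, 10.0, 3.0], [8.0, 6.0, 3.0])
print(T, delta)
```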

Latent representation learning based on autoencoders
Generally, the number of samples is much smaller than the number of features in current biological datasets. The AE, as an efficient dimension reduction tool, can map high-dimensional data into a low-dimensional hidden representation $Z^{v}_{i}$, where $v$ represents a particular view of the input data. AEs are unsupervised neural networks that attempt to reconstruct their inputs at their outputs through a process of encoding and decoding [31,32]. As shown in Figure 3, an AE generally consists of three parts, i.e., the encoder, the code (bottleneck) and the decoder. Specifically, the encoder is a function that compresses the input into a latent representation, so that only useful information/features are retained by squeezing the input through the bottleneck. Accordingly, the decoder attempts to map the learned representation from the encoder back to the original input space.
Here we adopt minimization of the reconstruction error, i.e., the difference between the output $X'$ and the input $X$, as the training goal of an AE. More specifically, each node $Z^{v}_{j}$ in the hidden layer can be obtained as
$$Z^{v}_{j} = \sigma\Big(\sum_{i} w_{ij} X^{v}_{i} + a_j\Big),$$
where $\sigma(\cdot)$ is the activation function of the encoder, and $a$ and $w$ are the parameters of the encoder. Hence, the data in the latent space are represented as $Z$.
The decoder takes the hidden representation as input and tries to reconstruct the original input; hence the value of each node $X^{v\prime}_{i}$ in the output layer can be calculated as
$$X^{v\prime}_{i} = \sigma\Big(\sum_{j} w'_{ji} Z^{v}_{j} + a'_i\Big),$$
where $\sigma(\cdot)$ is the activation function of the decoder, and $a'$ and $w'$ are the parameters of the decoder. The goal of training an AE is to optimize a predefined loss function: the weight parameters are updated by minimizing the loss of reconstructing the input. For a particular view $v$, the reconstruction loss $L^{v}_{r}$ measures the difference between $X^{v}$ and $X^{v\prime}$ (Eq (3.4)). This loss is used to update the weight parameters so as to reconstruct the original data $X$, and the more informative latent space representation (i.e., $Z$) is captured by training the AE neural network. Then, both the survival analysis model and the similarity graph are built on the optimized latent embedding space $Z$.
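The encoder/decoder equations above can be sketched in NumPy as a single-hidden-layer AE with sigmoid activations, trained by full-batch gradient descent on the squared reconstruction error. The layer sizes, learning rate, initialization, and the assumption that inputs are scaled to [0, 1] (so that a sigmoid decoder can reproduce them) are illustrative choices, not the paper's settings; the AE used in the experiments additionally carries L2 and sparsity regularization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, latent_dim, epochs=100, lr=0.5, seed=0):
    """One-hidden-layer AE trained by full-batch gradient descent.

    X: (n_samples, n_features) array scaled to [0, 1] (assumed here).
    Returns the latent codes Z and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.normal(0.0, 0.1, (p, latent_dim))    # encoder weights (w)
    a = np.zeros(latent_dim)                     # encoder bias (a)
    W2 = rng.normal(0.0, 0.1, (latent_dim, p))   # decoder weights (w')
    a2 = np.zeros(p)                             # decoder bias (a')
    losses = []
    for _ in range(epochs):
        Z = sigmoid(X @ W + a)          # encoder: Z = sigma(XW + a)
        X_rec = sigmoid(Z @ W2 + a2)    # decoder: X' = sigma(ZW' + a')
        diff = X_rec - X
        losses.append(np.mean(np.sum(diff ** 2, axis=1)))
        # Backpropagate the mean (over samples) squared reconstruction error.
        g_out = (2.0 / n) * diff * X_rec * (1.0 - X_rec)
        g_hid = (g_out @ W2.T) * Z * (1.0 - Z)
        W2 -= lr * (Z.T @ g_out)
        a2 -= lr * g_out.sum(axis=0)
        W -= lr * (X.T @ g_hid)
        a -= lr * g_hid.sum(axis=0)
    return sigmoid(X @ W + a), losses

# Hypothetical data: 40 patients, 30 features of one omic view.
rng = np.random.default_rng(1)
X = rng.random((40, 30))
Z, losses = train_autoencoder(X, latent_dim=5)
```

The returned `Z` plays the role of the latent representation on which the survival model and similarity graph are then built.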

Survival analysis and similarity graph learning via the latent embedding space
As mentioned in Section 2.1, survival or clinical information is helpful for identifying biologically meaningful cancer subtypes, as also claimed in [13,16,20]. More specifically, the quality of the patient-to-patient similarity graph can be improved by simultaneously optimizing the survival analysis model and the similarity matrix. Therefore, the two tasks, namely survival analysis and similarity graph learning, are integrated into a unified optimization procedure.
Here, we employ the multi-view (multi-omics) Cox model [33] as the survival analysis function. In addition, an omics-consistency co-regularization term [34] is introduced to explore consistent and complementary information across different omics. It aims to shrink the disagreement between the predictions of each pair of views in the multi-omic data, and hence improves learning performance. The regularized survival analysis model of multi-omic data is thus formulated as Eq (3.5), where $w = [w_1, w_2, \ldots, w_m]$ is the survival prediction coefficient across all views (omics) and $R_i$ denotes the risk set of $T_i$, containing instances whose observed time is not less than $T_i$. Note that Eq (3.5) is optimized on the latent representation $Z$ rather than on the original data $X$.
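Because the exact form of Eq (3.5) is not reproduced here, the sketch below assumes one common instantiation of a co-regularized multi-view Cox objective: the sum over views of the Cox negative log partial likelihood computed on each view's latent features, a squared penalty on the disagreement between every pair of view-wise risk predictions (weight `rho`, hypothetical), and an L1 penalty weighted by $\lambda$. All function names are illustrative and this is not the authors' formula.

```python
import numpy as np
from itertools import combinations

def cox_nll(eta, T, delta):
    """Negative log Cox partial likelihood for the linear predictor eta."""
    nll = 0.0
    for i in np.flatnonzero(delta == 1):
        risk_set = T >= T[i]                 # R_i: observed time >= T_i
        m = eta[risk_set].max()              # stabilize the log-sum-exp
        nll -= eta[i] - (m + np.log(np.sum(np.exp(eta[risk_set] - m))))
    return nll

def multiview_cox_loss(Z_views, w_views, T, delta, rho=1.0, lam=1.0):
    """Assumed form of a co-regularized multi-view Cox objective."""
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    etas = [Z @ w for Z, w in zip(Z_views, w_views)]   # per-view risk scores
    loss = sum(cox_nll(eta, T, delta) for eta in etas)
    # Co-regularization: penalize disagreement between each pair of views.
    loss += rho * sum(np.sum((e1 - e2) ** 2) for e1, e2 in combinations(etas, 2))
    loss += lam * sum(np.sum(np.abs(w)) for w in w_views)   # L1 on w
    return loss
```

With all coefficients at zero, every risk score is 0 and each view contributes $\sum_{i:\delta_i=1}\log|R_i|$, which gives a quick sanity check of the implementation.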
Let $S$ be the similarity (affinity) matrix, which reflects the global structure of the data. In $S$, $S_{i,j}$ is the similarity weight between the $i$-th and the $j$-th samples: a pair of similar data points receives a large weight, and vice versa.
Note that, in clinical analysis, it is usually observed that patients in the same disease subtype have similar feature distributions as well as similar survival times. Hence, jointly learning the patient-to-patient similarity graph and the survival prediction tends to discover disease subtypes more precisely. To this end, adaptive affinity learning that measures the edge weights based on both the similarity of patient samples [35] and survival analysis [16] is defined in Eq (3.6), where $\gamma$ and $\mu$ are tradeoff parameters. For simplicity, we set both $\gamma$ and $\mu$ to 1. In the affinity matrix, two patients are assigned a larger weight if they are closer in the latent space and have similar prediction values.
It should be pointed out that the coefficient $w$ is obtained by optimizing the survival prediction model (i.e., Eq (3.5)), so it cannot be estimated from Eq (3.6) alone. To address this issue, a collaborative learning model of survival prediction and graph affinity learning is developed by combining Eqs (3.5) and (3.6) into a joint optimization model. Hence, the unified loss function is formulated as $L_{unified} = L_{survival} + L_{similarity}$ (Eq (3.7)), where $L_{survival}$ and $L_{similarity}$ denote the objectives of Eqs (3.5) and (3.6), respectively. It would be interesting to include the AE training step in the multi-view Cox model of Eq (3.5). Specifically, a more unified optimization model could be built by adding Eq (3.4) to Eq (3.7); that is, step 1 (AE training) and step 2 (unified optimization of survival analysis and similarity graph learning) in Figure 2 would be combined into a single objective. This requires additional theoretical analysis, which is beyond the scope of this work, and we will explore it in the future.
In this way, the two tasks of graph learning and survival analysis are optimized simultaneously. The quality of the similarity matrix $S$ can be improved using both the distances between patient instances and the predicted survival, in order to identify more reasonable disease subtypes. Meanwhile, the similarity matrix $S$ can in turn reinforce the survival analysis model, so the two tasks improve each other. Specifically, through a joint alternating optimization strategy [13,16], the coefficient $w$ of the survival analysis model and the similarity matrix $S$ are updated iteratively, each while keeping the other fixed. In addition, the proximal gradient algorithm [36] and Lagrange multipliers [35,37] are adopted to obtain, exactly or approximately, the closed-form solutions of $w$ and $S$, respectively. For brevity, the detailed derivation is omitted here (see [13,16]); instead, we directly provide the iterative solutions of $w$ and $S$.
The update of $w$ is obtained by keeping $S$ fixed in $L_{unified}$. Let $h(w)$ be the part of $L_{unified}$ that depends on $w$, i.e., $h(w) = L_{survival} + \gamma \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \| Z_i w - Z_j w \|^2 S_{i,j}$. Following the proximal gradient algorithm utilized in [36], $w$ is calculated iteratively from $v = w^{(t)} - (\tau/2) \nabla h(w^{(t)})$, where the step size $\tau$ can be estimated by line search. Thus, the coefficient vector $w$ is updated by a closed-form solution in which $(a)_+ = \max(a, 0)$ denotes the positive-part function.
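The closed-form update itself is not reproduced in this text. Assuming that the nonsmooth part of the survival objective is an L1 penalty $\lambda\|w\|_1$ (an assumption consistent with the positive-part function mentioned above, not a statement of the paper's exact formula), the proximal step from $v$ reduces to soft-thresholding with threshold $\lambda\tau/2$ under the $\tau/2$ step scaling used above. The sketch substitutes a toy smooth function $h(w) = \|w - c\|^2$ for the real $h(w)$ and uses a fixed step $\tau$ instead of a line search.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||w||_1: sign(v) * (|v| - thresh)_+."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def proximal_gradient(grad_h, w0, lam, tau, n_iter=200):
    """Minimize h(w) + lam * ||w||_1 with v = w - (tau / 2) * grad h(w).

    With this step scaling, the matching L1 threshold is lam * tau / 2.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        v = w - (tau / 2.0) * grad_h(w)
        w = soft_threshold(v, lam * tau / 2.0)
    return w

# Toy example: h(w) = ||w - c||^2, whose L1-regularized minimizer is
# soft_threshold(c, lam / 2).
c = np.array([3.0, -0.2, 1.0])
w_hat = proximal_gradient(lambda w: 2.0 * (w - c), np.zeros(3), lam=1.0, tau=0.5)
print(w_hat)
```

Small coefficients (here the second one) are driven exactly to zero, which is the sparsity effect the positive-part function produces.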
Similarly, the update of $S$ is obtained by keeping $w$ fixed in $L_{unified}$ (in this case, minimizing $L_{unified}$ reduces to minimizing $L_{similarity}$). In addition, let $d_{i,j}$ collect the distance terms of Eq (3.6) for fixed $w$; then, for each $i$, Eq (3.6) can be redefined as $\min_{S_i} \sum_{j=1, j \neq i}^{n} \left( S_{i,j} d_{i,j} + S_{i,j}^2 \right)$, with $S_{i,j}$ and $S_i$ subject to the constraints of Eq (3.6). Following the Lagrange multiplier approach utilized in [35,37], the closed-form solution of $S$ is approximately calculated, where $K \in (1, n)$ is a pre-specified constant that controls the neighbourhood size ($K = 20$ here), $\hat{d}_i$ is obtained by sorting $d_i$ in ascending order, and $(A)_+$ is the positive-part function.
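The closed form referenced above is not reproduced in this text. The sketch below implements the standard adaptive-neighbour solution of [35], in which the quadratic weight is chosen per row so that exactly $K$ neighbours stay nonzero: $S_{i,j} = (\hat{d}_{i,K+1} - d_{i,j})_+ / (K\hat{d}_{i,K+1} - \sum_{h=1}^{K}\hat{d}_{i,h})$ for the $K$ nearest $j$, and 0 otherwise, under the constraints $S_i^\top \mathbf{1} = 1$ and $S_{i,j} \ge 0$. It also assumes $d_{i,j} = \|Z_i - Z_j\|^2 + \gamma (Z_i w - Z_j w)^2$, i.e., latent distance plus the squared difference of predicted risks. Both the constraint set and this choice of $d_{i,j}$ are assumptions.

```python
import numpy as np

def update_similarity(Z, w, K=20, gamma=1.0, eps=1e-12):
    """Adaptive-neighbour update of S for fixed w (sketch).

    Z: (n, q) latent representation, w: (q,) survival coefficients.
    Each row S_i has at most K nonzero entries and sums to (about) 1.
    """
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T      # squared distances
    risk = Z @ w                                        # predicted risks
    D = D + gamma * (risk[:, None] - risk[None, :]) ** 2
    np.fill_diagonal(D, np.inf)                         # exclude j = i
    K = min(K, n - 2)                                   # need K + 1 neighbours
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[: K + 1]                 # sorted d_i (ascending)
        d = D[i, idx]
        denom = K * d[K] - np.sum(d[:K]) + eps
        S[i, idx[:K]] = np.maximum(d[K] - d[:K], 0.0) / denom
    return S

# Hypothetical latent codes and coefficients.
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 4))
w = rng.normal(size=4)
S = update_similarity(Z, w, K=5)
```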
After obtaining the similarity matrix $S$ of the patient-to-patient graph via the latent embedding space, we perform spectral clustering on $S$ with a given number of clusters to generate the clustering result, namely the subtype groups.
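One way to carry out this step is sketched below: $S$ is symmetrized as $(S + S^\top)/2$ (our assumption, since the row-wise update need not give a symmetric matrix) and passed to scikit-learn's `SpectralClustering` with `affinity="precomputed"`. The helper name and the toy affinity matrix are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def subtypes_from_similarity(S, n_clusters, seed=0):
    """Spectral clustering on a precomputed (symmetrized) affinity matrix."""
    A = 0.5 * (S + S.T)                       # symmetric affinity matrix
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                               random_state=seed)
    return model.fit_predict(A)

# Toy affinity: two blocks of five patients with weak cross-block links.
S = np.full((10, 10), 0.01)
S[:5, :5] = 1.0
S[5:, 5:] = 1.0
labels = subtypes_from_similarity(S, n_clusters=2)
```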
The complete optimization procedure of the AELSMC algorithm is presented as Algorithm 1. The source code of AELSMC is freely available at https://github.com/ShuweiZhu/AELSMC.git. After the iterations end, the final step of Algorithm 1 performs the spectral clustering algorithm on the similarity matrix S to generate k subtypes.

Clustering ensemble for a consensus solution
As mentioned in Section 1, the late integration strategy can take advantage of a set of clustering solutions (base solutions) obtained from different perspectives, such as different methods, parameters and subspaces. As a typical late integration strategy, clustering ensemble fuses these base solutions to generate a consensus clustering [38,39]. Such a consensus clustering usually benefits from the complementary information of multiple views of omics data while compensating for their individual shortcomings. For example, a graph-based multi-method and multi-source consensus clustering strategy [12], named ClustOmics, has been proposed based on evidence accumulation clustering (EAC) [40] to improve the robustness of predictions. A clustering ensemble method is usually composed of two main components: 1) clustering generation and 2) a consensus function. For the former, the quality and diversity of the generated clusterings are two key factors. During the last two decades, a large number of clustering ensemble methods have been proposed; among them, the three consensus methods proposed in [41], namely the hypergraph partitioning algorithm (HGPA), the cluster-based similarity partitioning algorithm (CSPA) and the meta-clustering algorithm (MCLA), are classical yet still prevalent due to their efficiency and efficacy.
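LWGP [42] itself is more involved. As a simpler illustration of what a consensus function does, the sketch below implements the evidence accumulation (EAC) idea [40] mentioned above: it builds a co-association matrix from several base labelings and cuts an average-linkage dendrogram of $1 -$ co-association into $k$ clusters. This is a swapped-in technique for illustration, not the LWGP algorithm used by AELSMC, and the base labelings are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def co_association(labelings):
    """Fraction of base clusterings in which each pair of samples co-occurs."""
    L = np.asarray(labelings)              # (n_solutions, n_samples)
    M = np.zeros((L.shape[1], L.shape[1]))
    for labels in L:
        M += labels[:, None] == labels[None, :]
    return M / L.shape[0]

def eac_consensus(labelings, k):
    """Average-linkage cut of the co-association distances into k clusters."""
    M = co_association(labelings)
    dist = squareform(1.0 - M, checks=False)          # condensed distances
    tree = linkage(dist, method="average")
    return fcluster(tree, t=k, criterion="maxclust")

base = [[0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],   # same partition, permuted labels
        [0, 0, 1, 2, 2, 2]]   # a noisier base solution
consensus = eac_consensus(base, k=2)
```

Because co-association only asks whether two samples share a cluster, it is invariant to label permutations across base solutions, which is why the first two labelings count as full agreement.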
In this study, we take advantage of the clustering ensemble strategy by integrating useful information from multiple solutions obtained from different perspectives, for example, with different parameter settings. To do this, the locally weighted clustering ensemble algorithm LWGP [42] is used. First, a set of clustering candidates is generated as the ensemble pool. Note that the dimension of the latent space for each omic is set to $\beta \times d$, where $\beta$ controls the size of the latent space and takes values from $B = \{0.01, 0.02, \ldots, 0.1\}$. Thus, the ensemble pool is obtained by running the proposed framework with different values of $\beta$. Thereafter, the LWGP algorithm is executed on the ensemble pool to obtain the final solution.

Experimental results and analysis
In this section, we present the experimental settings, the comparison with competing methods on five multi-omics datasets, and the corresponding analysis (effectiveness verification and parameter analysis of the proposed framework).
Table 1 shows the properties of these multi-omics datasets, listing the number of samples, the censored times and the dimensions of the three omics (miRNA, mRNA and methylation). Note that Z-score standardization is adopted to map the original data $X$ to a distribution with a mean of 0 and a standard deviation of 1, i.e., $x' = (x - \bar{x})/s$, where $\bar{x}$ and $s$ denote the mean and the standard deviation, respectively. The proposed AELSMC method is compared with several competing approaches: affinity network fusion (ANF) [7], similarity regression fusion (SRF) [8], similarity network fusion (SNF) [47], subspace merging (SM) [9], DLSF [23], AutoCox [48], survival supervised graph clustering (S2GC) [13] and SparseAE [43] (the authors did not name this algorithm; we call it SparseAE for convenience). We conduct experiments on a PC with an Intel Core i7-1065G7 CPU and 16 GB RAM. Moreover, the Cox log-rank test (reported as the -ln of the log-rank test p-value) [6] and the widely used internal validity index Silhouette [49] are adopted to evaluate clustering performance.
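A minimal sketch of the Z-score step is given below. It assumes a samples × features layout (the transpose of $X_k \in \mathbb{R}^{p_k\times n}$ defined earlier) and per-feature standardization; the small `eps` that guards against constant features is our addition.

```python
import numpy as np

def zscore(X, eps=1e-12):
    """Standardize each column (feature) of X to mean 0 and std 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

# Hypothetical 3 patients x 2 features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_std = zscore(X)
```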
First, the -ln of the log-rank test p-value measures whether the survival times differ significantly among subgroups. The log-rank test statistic is defined as
$$\chi^2 = \sum_{k=1}^{c} \frac{(O_k - E_k)^2}{E_k},$$
where $c$ denotes the number of clusters, $O_k$ is the observed number of events in the $k$-th subgroup and $E_k$ is the expected number. A smaller p-value indicates a better result. Since no ground-truth clustering is available for the multi-omics datasets, the Silhouette index [49] is used to quantify the goodness of clustering solutions by measuring the compactness within clusters and the separation between clusters [38,39,50]. It takes values in $[-1, 1]$, and a larger Silhouette value indicates a better result.
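The statistic above can be sketched for $c$ groups as follows: $E_k$ is accumulated over distinct event times as $\sum_t d_t\, n_{k,t}/n_t$ (the number of events at time $t$ times the group's share of the risk set), and the p-value comes from a $\chi^2$ distribution with $c-1$ degrees of freedom via `scipy.stats.chi2.sf`. This is the simplified form stated in the text; dedicated survival libraries that use the full variance-covariance form can give somewhat different p-values. The example data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def logrank_test(T, delta, labels):
    """Simplified c-group log-rank test: sum_k (O_k - E_k)^2 / E_k."""
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    O = np.zeros(len(groups))
    E = np.zeros(len(groups))
    for t in np.unique(T[delta == 1]):             # distinct event times
        at_risk = T >= t
        died = (T == t) & (delta == 1)
        n_t, d_t = at_risk.sum(), died.sum()
        for g, grp in enumerate(groups):
            in_grp = labels == grp
            O[g] += np.sum(died & in_grp)                    # observed events
            E[g] += d_t * np.sum(at_risk & in_grp) / n_t     # expected events
    stat = np.sum((O - E) ** 2 / E)
    p_value = chi2.sf(stat, df=len(groups) - 1)
    return stat, p_value, -np.log(p_value)         # -ln(p) as reported

# Hypothetical example: subtype 0 dies early, subtype 1 survives longer.
T = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
delta = [1] * 10
labels = [0] * 5 + [1] * 5
stat, p, neg_ln_p = logrank_test(T, delta, labels)
```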

Comparison with the state-of-the-art methods
The experimental results are obtained either by running the source code provided by the authors or, when the source code is not available, by taking the results reported in the papers. For example, the code of DLSF and S2GC was provided by their authors. For convenience, the control parameter $\beta$ of the latent space size is set to 0.05 here; a sensitivity analysis of $\beta$ is presented in a later subsection. Since comparisons must be made under the same conditions, in this subsection the number of clusters $k$ is kept consistent for all methods on each dataset, i.e., k = 5 for BIC, k = 3 for COAD, k = 3 for GBM, k = 4 for KRCCC, k = 4 for LSCC, k = 3 for OV, k = 2 for SARC and k = 3 for SKCM, as suggested or used in the literature [13,43].
The clustering performance of different cancer subtype identification methods in terms of the Cox log-rank test p-value is shown in Table 2, where the best and the second-best results for each dataset are shown in bold with a dark gray background and with a light gray background, respectively. Among the competing algorithms, S2GC obtains the best result on dataset LSCC and two second-best results, and SparseAE obtains the best results on datasets KRCCC and OV. Thus, among the nine state-of-the-art cancer subtyping algorithms, none significantly outperforms the others in all cases. For our AELSMC method, the identified cancer subtypes differ significantly, as the Cox log-rank test p-values for all datasets are quite low. Overall, the proposed AELSMC method shows clear superiority over the other cancer subtyping algorithms, since it obtains the best result on 5 out of 8 datasets and the second best on the remaining 3 datasets (LSCC, KRCCC and OV). All these observations demonstrate the effectiveness of the proposed AELSMC model. Notably, AELSMC obtains a much lower p-value (4.7E-12) than the others on the GBM dataset, which may be owing to the high-quality similarity graph learned via the latent embedding space. Using Kaplan-Meier survival analysis, the survival curves of AELSMC on five multi-omics datasets are generated, as shown in Figure 4. Note that the Kaplan-Meier estimator is a commonly used approach for comparing the survival times of different groups. It is observed from Figure 4 that the subtypes of all datasets are mostly well distinguished, except for two subtypes of the COAD dataset. That is, in most cases there is a significant difference between the survival probability curves of the clusters identified by AELSMC. Taking the BIC dataset as an example, among all five subtypes, subtype 5 has the longest average survival time, followed by subtypes 1 and 2, while the prognosis of subtypes 3 and 4 is relatively poor. The prognosis of the different subtypes in the other four datasets can be interpreted in a similar way.
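The survival curves discussed above come from the Kaplan-Meier estimator, $\hat{S}(t) = \prod_{t_j \le t}(1 - d_j/n_j)$, where $d_j$ is the number of events and $n_j$ the number at risk at event time $t_j$. A minimal NumPy sketch, computing one curve per identified subtype, is shown below; the function name and example data are hypothetical, and plotting is omitted.

```python
import numpy as np

def kaplan_meier(T, delta):
    """Kaplan-Meier estimate S(t) at each distinct event time."""
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    event_times = np.unique(T[delta == 1])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(T >= t)
        n_events = np.sum((T == t) & (delta == 1))
        s *= 1.0 - n_events / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)

# One curve per identified subtype (times, events and labels are hypothetical).
T = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 6.0])
delta = np.array([1, 0, 1, 1, 1, 0])
labels = np.array([0, 0, 0, 1, 1, 1])
curves = {k: kaplan_meier(T[labels == k], delta[labels == k]) for k in np.unique(labels)}
```

Censored patients (here the one at time 2 in subtype 0) leave the risk set without lowering the curve, which is what distinguishes this estimator from a naive fraction of survivors.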
Moreover, the average Silhouette values obtained by six multi-omics clustering methods, SRF, SNF, DLSF, S2GC, SparseAE and AELSMC, are presented in Table 3, where the best score is highlighted.

First, we present the effectiveness verification of the AE in our proposed framework. Taking dataset GBM as an example, the training process of an AE on the mRNA and methylation views, respectively, is shown in Figure 5. The loss function of the AE is the mean squared error with an L2 regularization term and a sparsity regularization term (msesparse for short), and the total number of epochs is set to 100 here. The AE quickly converges toward a low msesparse value, which shows the effectiveness and efficiency of the latent space learned by the AE model. Also, the decrease of both msesparse curves slows down at the late stage, so it is not necessary to set a large total number of epochs. Generally, we suggest setting the total number of epochs below 100; sometimes 50 or fewer is enough. From Figure 5

The number of clusters $k$ was varied from 2 to 9, and we ran AELSMC on the five datasets with the control parameter of the latent space size set to 0.05.
Figure 6 shows the -log10(p-value) of the AELSMC method with varying $k$. An appropriate $k$ helps to obtain more significant cancer subtypes: if the number of subgroups is chosen according to the Cox log-rank test p-value, AELSMC can generate better results than those in Table 2, which shows its promising potential. To further investigate the effectiveness of AELSMC, we compare the result of integrating all three omic types with the results of our framework using each single omic type, as well as with simply concatenating the encoder features. The comparison is shown in Table 4, where the best result in each case is shown in bold. Integrating multiple views clearly yields much better results than using only a single view. Moreover, the results of simply concatenating the encoder features (the concatenation column) demonstrate the practical advantage of the proposed method, since the p-value on each dataset is much worse than that in the integration column. Hence, we conclude that the proposed data integration technique significantly improves clustering performance and generates different survival profiles.

Parameter analysis
In this section, parameter sensitivity is analyzed; specifically, the parameter sensitivity of the AE and of the cancer subtyping process is investigated, respectively. For latent representation learning with an AE, the dimension of the reduced space is set to $\beta \times d$, where $\beta$ takes its value from $B = \{0.01, 0.02, \ldots, 0.1\}$. Figure 7 shows the performance of AELSMC on five multi-omics datasets in terms of -log10(p-value) for the given number of clusters. As we can see, neither a high nor a low value of $\beta$ yields desirable results in most cases. For example, the result on dataset LSCC is very poor when $\beta$ is large, while the results on datasets COAD and KRCCC are relatively poor when $\beta$ is small. This may be because an over-reduced latent space loses too much information from the original dataset, while a relatively high-dimensional latent space is less beneficial for learning the similarity matrix. In contrast, a moderate value of $\beta$ (i.e., in the middle of $B$) usually generates a desirable result, though not the best among the 10 cases ($|B| = 10$) for each dataset. Note that a worse result (around 12) is obtained with a moderate $\beta$ on dataset GBM. However, the performance in this case is still much better than that of the other algorithms, since the second-best -log10(p-value) in Table 2 is only 3.01, obtained by S2GC. Hence, we suggest setting $\beta$ to a moderate value in $B$, e.g., $\beta = 0.05$ as the default. How to choose the best $\beta$ for each multi-omics dataset still requires further investigation.
For the survival analysis, the parameter λ in Eq (3.5) must be set, and it may have a significant impact on the final performance. We therefore compare the performance of AELSMC on the five multi-omics datasets for different values of λ in terms of -log10(p-value), as illustrated in Figure 8, where λ takes values from the set {0.1, 0.5, 1, 5, 10}. Setting λ = 1 gives desirable results in general, and it is clearly superior to the other values on COAD and GBM. Hence, we suggest λ = 1 when analyzing multi-omics datasets with the proposed method. It would also be interesting to develop multi-objective clustering methods [38,39] for multi-omics cancer subtyping, since they can use multi-objective evolutionary algorithms [29,30] to balance several clustering objectives simultaneously, measuring clustering quality from different perspectives (or views/omics). Such methods may also remove the need for some of the parameters in the optimization model, such as those in Eq (3.7).
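Eq (3.5) is not reproduced in this section, so the exact role of λ cannot be shown here. Purely as an illustrative assumption, the sketch below treats λ as the weight of an L1 penalty on Cox regression coefficients w and fits it with the same kind of proximal gradient (soft-thresholding) iteration used in Step 1 of Algorithm 1, sweeping the grid {0.1, 0.5, 1, 5, 10}. Under this assumption the fixed `step` plays the role of τ/2 and `step * lam` the role of the threshold η. The function names and synthetic data are hypothetical, and since the real Eq (3.5) may contain other terms or scale λ differently, the numbers are not comparable to Figure 8.

```python
import numpy as np


def risk_set_matrix(time):
    """R[i, j] = 1 if patient j is still at risk at patient i's time (t_j >= t_i)."""
    return (time[None, :] >= time[:, None]).astype(float)


def cox_grad(w, X, R, event):
    """Gradient of the negative Breslow log partial likelihood (summed over events)."""
    eta = X @ w
    e = np.exp(eta - eta.max())                    # shift cancels in the ratio below
    risk_mean = (R * e[None, :]) @ X / (R @ e)[:, None]   # risk-set weighted mean of x
    return -((X - risk_mean) * event[:, None]).sum(axis=0)


def soft_threshold(v, thr):
    """Proximal operator of thr * ||.||_1: sign(v) * max(|v| - thr, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)


def fit_cox_l1(X, time, event, lam, step=1e-3, n_iter=2000):
    """ISTA for  min_w  coxNLL(w) + lam * ||w||_1, started from w = 0."""
    R = risk_set_matrix(time)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = w - step * cox_grad(w, X, R, event)    # gradient step on the smooth part
        w = soft_threshold(v, step * lam)          # proximal step on the L1 part
    return w


# Toy data: two informative features out of five, ~30% random censoring
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
true_w = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
time = rng.exponential(scale=1.0 / np.exp(X @ true_w))   # hazard = exp(x w)
event = (rng.random(n) < 0.7).astype(float)
for lam in [0.1, 0.5, 1, 5, 10]:
    w = fit_cox_l1(X, time, event, lam)
    print(f"lambda={lam:<4}  w={np.round(w, 3)}  nonzeros={np.count_nonzero(w)}")
```

Larger λ shrinks ‖w‖₁ and eventually zeroes every coefficient, which is one way a single penalty weight can strongly change the downstream similarity and clustering.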

Conclusions
To deal with multi-omics data characterised by high dimensionality and heterogeneity, we propose an AE-assisted latent representation learning method for survival prediction and multi-omics clustering. AEs capture nonlinear omic features in a lower-dimensional space, which helps identify the similarity between patients. Moreover, we make full use of survival or clinical information by embedding a multi-omic survival analysis approach when integrating the similarity graphs of heterogeneous data at the multi-omics level. To validate the method, we compared its performance with other state-of-the-art algorithms on five multi-omics datasets. The experimental results show that the proposed AELSMC algorithm is highly competitive for cancer subtype identification on biological big data. The promising performance of AEs suggests that further deep learning methods [51,52] merit investigation to advance biology and medicine.
In addition, more types of omics data can be explored [44][45][46], and these datasets may help uncover insights that support clinical analysis. For example, blood bioenergetics and metabolomics can serve as predictive biomarkers of patient response to immune checkpoint inhibitor therapy [45], which is important for guiding treatment decisions and developing approaches to the treatment of therapeutic resistance. Moreover, OmicsDI [44] is an open-source platform for accessing, discovering and disseminating omics datasets. It integrates proteomics, genomics, metabolomics and transcriptomics datasets and has great potential for our further study.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Figure 1. Multi-omics data analysis based on Cox-regularized models (or other machine learning models).

Figure 2. Workflow of the proposed framework on multi-omics cancer subtyping.

Figure 3. General structure of an AE.

Algorithm 1: The complete optimization procedure of the AELSMC algorithm
Input: multi-omics dataset X; parameters λ, β and the number of subtypes k.
Output: the identified subtypes (clustering result).
1 Train an autoencoder on the multi-omics dataset X to generate the latent space representation Z.
2 Set t = 0, and initialize the model coefficient w and the similarity matrix S.
3 while the unified model has not converged do
4 Step 1. Survival analysis.
5 Fix S and estimate w by the proximal gradient algorithm, i.e., repeat:
6 Compute the gradient $\nabla h(w^{(t)})$ and let $v = w^{(t)} - \frac{\tau}{2}\nabla h(w^{(t)})$.
7 Update $w = \operatorname{sign}(v)\odot(|v| - \eta)_+$ and increase the learning rate τ.
8 Update $w^{(t+1)} = w$.
9 Step 2. Similarity learning.
10 Estimate S by fixing w: compute $d_{ij} = \frac{\gamma}{\mu}\|Z_i - Z_j\|_2^2 + \|Z_i w - Z_j w\|_2^2$ and update each element of S by Eq (3.10).
11 t = t + 1.
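Step 2 of Algorithm 1 combines distances in the latent space with differences in the survival risk score. The sketch below computes d_ij = (γ/μ)‖Z_i − Z_j‖² + ‖Z_i w − Z_j w‖², assuming that the fused "γ µ" in the extracted algorithm denotes the ratio γ/μ and that w is a coefficient vector, so that Z_i w is a scalar risk score per patient. Because Eq (3.10) is not reproduced in this section, the S update uses, as a plainly named stand-in, the closed-form adaptive-neighbour rule of Nie, Wang and Huang (KDD 2014); the actual Eq (3.10) may differ. Function names and data are hypothetical.

```python
import numpy as np


def pairwise_d(Z, w, gamma, mu):
    """d_ij = (gamma / mu) * ||Z_i - Z_j||^2 + (Z_i w - Z_j w)^2 for all pairs."""
    sq = np.sum(Z ** 2, axis=1)
    d_latent = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    r = Z @ w                                   # one risk score per patient
    d_risk = (r[:, None] - r[None, :]) ** 2
    return (gamma / mu) * d_latent + d_risk


def adaptive_neighbour_S(D, n_neighbors):
    """Adaptive-neighbour closed form (stand-in for Eq (3.10)): row i of S is a
    probability vector supported on the n_neighbors nearest patients to i."""
    n = D.shape[0]
    S = np.zeros_like(D)
    for i in range(n):
        d = D[i].astype(float).copy()
        d[i] = np.inf                           # no self-similarity
        order = np.argsort(d)
        nn = order[:n_neighbors]                # nearest neighbours of patient i
        d_next = d[order[n_neighbors]]          # distance to the next neighbour
        denom = n_neighbors * d_next - d[nn].sum()
        if denom > 0:
            S[i, nn] = (d_next - d[nn]) / denom
        else:                                   # all compared distances tied
            S[i, nn] = 1.0 / n_neighbors
    return S


rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 5))    # stand-in for the AE latent codes
w = rng.standard_normal(5)          # stand-in for the survival coefficients
D = pairwise_d(Z, w, gamma=1.0, mu=1.0)
S = adaptive_neighbour_S(D, n_neighbors=5)
print(np.round(S[0], 3))
```

The neighbour count `n_neighbors` only controls the sparsity of S; it is distinct from the number of subtypes k in Algorithm 1, which is used when clustering the learned similarity.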
From Figure 5(a), the msesparse at epoch = 50 is approximately 0.75, while at epoch = 100 it is 0.49; from Figure 5(b), the msesparse at epoch = 50 is approximately 1.0, while at epoch = 100 it is 0.79. Hence, there is no significant difference between the latent spaces learned with epoch = 50 and epoch = 100 here.

Figure 5. Training process of the AE on the BIC dataset.

Figure 6. The -log10(p-value) of the AELSMC method on five multi-omics datasets with a varying number of subtypes.

Figure 7. Performance comparison for different values of the parameter β in AE-assisted latent representation learning on five multi-omics datasets in terms of -log10(p-value).

Figure 8. Performance comparison of the AELSMC method on five multi-omics datasets with different values of λ in terms of -log10(p-value).

Table 2. Clustering performance of different cancer subtype identification methods in terms of Cox log-rank test p-value.

Table 3. Comparative cluster evaluation in terms of the average silhouette.

Table 4. Performance comparison, in terms of Cox log-rank test p-value, of the AELSMC method on multi-omics datasets (Integration) with the framework applied to each single omic type and with a simple concatenation of the encoder features (Concatenation).