Unsupervised Classification of Multi-Omics Data during Cardiac Remodeling using Deep Learning

Integration of multi-omics in cardiovascular diseases (CVDs) presents high potentials for translational discoveries. By analyzing abundance levels of heterogeneous molecules over time, we may uncover biological interactions and networks that were previously unidentifiable. However, to effectively perform integrative analysis of temporal multi-omics, computational methods must account for the heterogeneity and complexity in the data. To this end, we performed unsupervised classification of proteins and metabolites in mice during cardiac remodeling using two innovative deep learning (DL) approaches. First, long short-term memory (LSTM)-based variational autoencoder (LSTM-VAE) was trained on time-series numeric data. The low-dimensional embeddings extracted from LSTM-VAE were then used for clustering. Second, deep convolutional embedded clustering (DCEC) was applied on images of temporal trends. Instead of a two-step procedure, DCEC performes a joint optimization for image


Introduction
Cardiovascular diseases (CVDs), as the leading cause of death in the U.S, are the subject of significant research investigation [1]. CVDs are complex biological phenomena with molecular mechanisms that are largely unknown. With the rise in new experimental technologies over the last three decades, a large volume of omics data describing CVDs has been accumulated [2][3][4]. To elucidate their complex biological mechanisms and identify key molecules associated with different disease phenotypes, we often analyze high-dimensional measurements on DNA sequences, RNA, proteins, metabolites and others. Conventional analyses tend to focus on single-omics datasets. While single-omics analyses have helped improve our understanding of CVDs [5,6], a lack of translational discoveries for multifaceted diseases like CVDs suggests the need for integrated molecular investigations [7]. A comprehensive picture of complex molecular mechanisms related to a CVD of interest can be constructed by simultaneous analysis of multiple molecular data types that are so-called multi-omics [8,9]. In particular, new computational approaches are required to overcome the challenges in integrating heterogenous and temporal data [10][11][12][13].
From predicting clinical outcomes from biomedical datasets [3,14,15] to analyzing multiple data types simultaneously [10][11][12], machine learning (ML) has emerged as a novel and essential component of modern biomedical research. In supervised learning with known group labels (such as controls and experimental conditions), we may use ML algorithms to classify samples based on clinical and molecular variables and build prognostic models [15,16]. With omics datasets where labels are often unknown or unmeasured, unsupervised ML algorithms such as auto-encoders (AEs) [17,18] can be employed to extract hidden sources of variation and help discover patient sub-groups [19]. Recently, deep neural networks have been developed and applied successfully in a wide range of data-intensive fields [20][21][22][23]. With the continuous availability of efficient computational resources, deep learning (DL) is being proven to outperform conventional ML approaches in areas, such as, automatic feature engineering, representation learning, sequence modelling, etc. [24,25]. In particular, we are interested in unsupervised deep learning that can help us uncover molecular signatures of CVDs. Using multiple hidden layers that may be seen as performing data-driven non-linear transformations, deep neural networks can obtain high-level abstractions that are characteristics of underlying phenotypes. However, unsupervised DL in the context of CVDs with multi-omics has been understudied.
In order to better understand CVDs, we have been using diverse genetic strains of mouse and an experimental treatment to induce cardiac hypertrophy. A landmark proteomics dataset based on this mouse model is publicly available [26] and has been analyzed for biomarker identifications [27]. This dataset comprises thousands of proteins quantified at several time points under control (CTRL) and isoproterenol-induced hypertrophy (ISO) across six genetic strains [26]. Based on the same model, we have also quantified plasma concentration levels of 610 metabolites over a period of 14 days. The temporal nature of proteomics and metabolomics data makes the integrative multi-omics analysis challenging and presents new opportunities for adapting DL approaches. We are interested in investigating relationships among proteins and metabolites during cardiac remodeling in an unsupervised manner. Therefore, we implemented two unsupervised DL approaches for the integrative analysis of our time-series multi-omics data.
First, we implemented a two-step approach where low-dimensional embeddings from a variational autoencoder (VAE) [28] are used for clustering. As a popular representation learning method, VAE extracts low-dimensional embeddings, which may shed new insights on molecular mechanisms. As proteins and metabolites were measured over time, we implemented a long short-term memory (LSTM) [29][30][31][32]-based VAE. LSTM is a type of recurrent neural network (RNN) that can capture time-dependent behavior but without the vanishing gradient problem [32]. Then, we applied clustering on the embeddings to identify important temporal trends in the data, and identify groups of molecules (proteins and metabolites) with similar temporal characteristics. Second, we utilized images of abundance levels of proteins and metabolites over 14 days under ISO condition to find clusters based on their visual similarities. To achieve this, we employed deep convolution embedded clustering (DCEC). Unlike two-step approaches, DCEC performs joint optimization for extracting lowdimensional embeddings and assigning cluster memberships [33][34][35].
Integrative analysis using unsupervised deep learning could contribute to characterizing multi-omics data. Specially for complex diseases such as CVDs, heterogeneous data from proteomics and metabolomics could possess hidden groups of biomarkers. We investigated whether DL may extract more biologically meaningful clusters than conventional clustering approaches. The biological pathway analysis using Reactome knowledgebase [36] revealed that the clusters obtained from DL methods harbored more significant pathways than from K-means clustering [37,38], hierarchal clustering (HC) [39] and partitioning around medoids (PAM) [40] methods. Furthermore, we investigated whether scale-free images of temporal trends as inputs could help better cluster the molecules based on visual similarities. We found that DCEC which uses image data as input performed much better than conventional clustering approaches as well as LSTM-VAE using time-series numeric data.

Materials and Methods
We implemented and compared three conventional clustering algorithms and two unsupervised DL approaches on temporal proteomics and metabolomics datasets. Both large-scale datasets are based on six genetically diverse mouse strains with induced cardiovascular conditions, recently generated by our lab. In proteomics dataset [26], six genetic mouse strains (FVB/NJ, CE/J, C57BL/6J, DBA/2J, BALB/cJ, and A/J; 26 mice per strain), purchased from The Jackson Laboratory, were selected for their wide relevance to CVDs, cancer, metabolic and other diseases. The mice were all juvenile males between 9-12 weeks of age. One group (72 mice of six genetic strains) received 15 mg·kg −1 ·d −1 isoproterenol (ISO) (subcutaneous micro-osmotic pumps; Alzet) treatment, whereas the second group (84 mice of six genetic strains) received sham treatment (CTRL). Tissue samples were collected at 0, 1, 3, 5, 7, 10, and 14 days. Cardiac protein samples were prepared and liquid chromatography tandem-mass spectrometry (LC-MS/MS) analysis for protein half-life and steady-state abundance was performed. In this study, we focused on protein abundance levels quantified as normalized spectral abundance factor (NSAF).
The metabolomics dataset is based on a similar cardiovascular mouse model for plasma metabolomics profiling. Six genetic mouse strains (FVB/NJ, CBA/J, C57BL/6J, DBA/2J, BALB/cJ, and A/J; 54 mice per strain) with divergent susceptibilities towards ISO stimulation were continuously treated with 15 mg·kg −1 ·d −1 ISO for 14 days using miniosmotic pump. At baseline (day 0) and during ISO treatment (days 1, 3, 5, 7, and 14), blood samples and whole hearts were collected for plasma metabolomics profiling, cardiac phenotypic assessments, and annotation analyses of metabolic pathways. The absolute concentrations of 610 plasma metabolites were quantified in six genetic mouse strains during ISO stimulation. Over time, changes in concentrations of these metabolites represent the underlying metabolic responses associated with the progression of ISO-induced maladaptive remodeling of the heart. Note that CBA/J and CE/J strains are genetically closely related [41] and have very similar response to ISO (i.e., isoproterenol-induced lethality and cardiac fractional shortening) [42]. The other five strains are the same in both datasets.
For proteomics dataset, we normalized all abundance values under ISO with respect to corresponding CTRL. Concentration levels of metabolites in ISO were normalized with levels at day 0. Proteins and metabolites with quantified values available at 3 timepoints or more were included in the analysis. The missing values were imputed using a multiple imputation (MI) technique; fully condition specifications (FCS) with predictive mean matching (PMM) [43,44]. PMM does not make any assumption regarding the normality of the distribution and MI takes into account the uncertainty associated with missing data by generating multiple imputed values for each missing value. Smoothing splines were fitted to each molecule across time points and the degree of smoothing was automatically determined by leave-one-out cross-validation. The aggerated dataset has complete time-series data for 3,479 proteins and 513 metabolites. The abundance and concentration values were scaled so that they are in the same range between −1 and 1. As a baseline, we employed three highly popular clustering algorithms: K-means clustering, hierarchical clustering (HC) and partitioning around medoids (PAM). Then, we employed two clustering approaches based on DL; the first approach uses the time-series numeric data while the second approach uses images of temporal trends as input. Figure 1 provides an overview of our integrated clustering workflow for the multi-omics data.
The first DL approach uses low-dimensional embeddings from variational autoenconder (VAE) [28] to form clusters. In autcoencoders (AEs), an encoder uses the input data to extract low-dimensional embeddings whereas a decoder reconstructs the input data from these embeddings. Improving upon standard AEs whose low-dimensional embeddings may not be structured in a meaningful manner, VAE imposes a distributional assumption on lowdimensional embeddings that is latent vectors. In the encoder of VAE, a constraint is added so that the latent vectors closely follows a unit Gaussian distribution. Consequently, beyond minimizing the reconstruction error such that the input data may be reconstructed by the decoder, low-dimensional embeddings from VAE are more robust, meaningful, and generative. The loss term in VAE comprises two errors; the reconstruction error and the latent error. The reconstruction error is the mean square error, that evaluates the network's performance in recontructing the input data. The latent loss measures how well the latent vectors follows the assumed distribution, using Kullback-Liebler divergence (KL). Given the input data, the encoder of VAE generates vectors of means and standard deviation. This allows one to sample from the corresponding distribution and to reconstruct the input data using the decoder.
Furthermore, for temporal multi-omics data, we propose a VAE architecture based on long short-term memory (LSTM) which directly models time-or sequence-dependent behavior [29][30][31][32]. The input to an LSTM-based encoder is a time-series and the output are the lowdimensional embeddings. These embeddings are then fed to an LSTM-based decoder which tries to reconstruct the original time-series ( Figure S1). Given enough iterations, the decoder accurately reconstructed the input data. Subsequently, the low-dimensional embeddings obtained from LSTM-VAE were fed to a K-means clustering algorithm. We refer to this clustering approach as LSTM-VAE throughout this paper. LSTM-VAE was implemented utilizing TensorFlow [45] and Keras [46] libraries in python.
The second DL approach utilizes images of the temporal trends of 3,992 molecules to cluster them based on their visual similarities. Specifically, deep convolutional embedded clustering (DCEC) [33] method was employed to perform joint optimization for feature learning and clustering. DCEC is a recently proposed version of improved DEC (IDEC) [34,35] with convolutional layers since these layers are better suited for images ( Figure S2). To adapt DCEC to temporal multi-omics data, the numeric time-series data must be visualized. Therefore, we investigated multiple combinations of line widths and image sizes. Figure S3 shows different images generated for the same molecule. Generally, one would like to choose the image size such that it is large enough to be clearly visualized by human eye, and small enough to keep the computational complexity at a reasonable level. Similarly, the line width needs to be dense enough so the signal pattern is emphasized, and thin enough so the entire signal lies within the image frame. The kernel size is another parameter which may affect the deep convolutional learning in DCEC. Therefore, we performed a systematic analysis to select the best combination of line width, image and kernel sizes. All images were generated in portable network graphics (PNG) format.
Note that we are interested in the method that can obtain clusters which are biologically more meaningful. Therefore, once the cluster assignments were obtained from K-means, HC, PAM, LSTM-VAE and DCEC, we performed biological pathway enrichment analysis to identify the method enriching the highest number of significant pathways. In general, the clustering methods that identify a high number of significantly enriched pathways can be considered biologically important. The functional annotations on cellular pathway information were performed through Reactome knowledgebase (release v67, 2018_12) (https://reactome.org/) [36]. Reactome links proteins and metabolites to their molecular functions, describing hierarchical relationships of molecules in biological processes during cardiac hypertrophy progression [27]. All the molecules in a given cluster were used together as input for the Reactome pathway enrichment analysis. Specifically, UniProt identifiers [47] and KEGG identifiers [48] were used for proteins and metabolites, respectively. The significance of biological pathways identified by Reactome is determined by false discovery rate (FDR), calculated from corrected over-representation probability and providing statistical validation of molecular enrichment within the given pathways [49].
The clustering algorithms were implemented on a server with Intel Xeon Processor E5-2643 v4, 128 GB RAM and NVIDIA Quadro M4000 GPU. K-means and HC algorithm were implemented using scikit-learn [50] library in python programming language while PAM was implemented using 'cluster' package in R programming language. K-means, HC and PAM clustering assignments are provided in supplemental Tables S1 to S3, respectively.

Results
Unsupervised classification of multi-omics data helps us dissect the molecular basis for the complex diseases such as cardiovascular diseases (CVDs). Using the integrated proteomics and metabolomics data from mice undergoing cardiac remodeling, we investigated diverse clustering approaches, including K-means, HC, PAM, LSTM-VAE, and DCEC.
Clustering typically requires an appropriate number of clusters, K. Generally, a low value of K may extract very general patterns while a high value may be severely affected by noise.
This number can be selected based on prior biological knowledge and/or computational analysis. We selected K = 6 based on prior biological knowledge, the within-cluster sum of squares, and the silhouette analysis. The within-cluster sum of squares visualized in Figure  S4, suggests at least 3 or 4. However, from our biological understanding of the integrated dataset and its temporal patterns, we anticipated more distinct patterns. Evaluating the Kmeans clustering performance using multiple K values with the silhouette analysis [51], we decided to proceed with K = 6 ( Figure S5). While a range of K values extracted clusters with above the average scores, there are wide fluctuations when K > 6. Particularly, with K > 6, the differences in cluster sizes are large and smaller clusters would contain limited numbers of molecules. Therefore, the number of clusters is set to K = 6 for all the five clustering methods.
Beyond the number of clusters K, deep neural networks require hyperparameters that must be selected for particular applications. Our LSTM-VAE network naturally has the input layer of dimension 7×1 since each molecule is associated with a vector of length 7. Input timeseries data were processed through an encoder LSTM layer with an intermediate dimension n, followed by fully-connected layers for obtaining means and standard deviations, from which the latent vectors may be sampled in VAE. We set the embedding dimension to 3 as it reconstructed the input signal faithfully. It was observed when the dimension of latent vectors was set to 2, the time-series reconstructed by the decoder was too smooth and failed to capture all the variations in the original. In the decoder, we used two LSTM layers with units n (intermediate dimension) and 1 (input dimension) respectively. Therefore, the first layer generated 7×n sequences while the second layer reconstructed the time-series matching the original input dimension of 7×1. The number of epochs were set to 1000 with adadelta optimizer [52]. Finally, K-means clustering was employed on the low-dimensional embeddings extracted by LSTM-VAE. We tried a range of different values for n from 3 to 10 to be used in the LSTM layers. The average cluster losses on the original time-series obtained from different LSTM-VAE architectures demonstrated that n = 6 is a suitable number of units in intermediate LSTM layers (Figure S6). Overall, we selected a 7-6-3 (input-intermediate-embedding) architecture for the encoder layers. The LSTM-VAE model with number of hidden units in each layer (right cell) is shown in Figure S7 and the source code is available at https://github.com/bilalmirza8519/LSTM-VAE. Figure 2 shows the visualization of the 6 clusters in embedded feature space of LSTM-VAE. The clustering assignments from LSTM-VAE are reported in Supplemental Table S4.
For the DCEC method, we used the same architecture as in the original paper [33]. The encoder has 3 convolutional layers with dimensions of 32, 64, 128, followed by a fullyconnected embedding layer of dimensions 10. The stride length was set to 2. However, instead of a fixed kernel size from [33], we tried 3 different values, 3×3, 4×4, 5×5 as we wanted to find the best combination of image size, line width and kernel size. Note that the last layer has a kernel size of 3×3 and the decoder is a mirror of the encoder. The autoencoder in DCEC was pre-trained with 300 epochs with adadelta optimizer. The source code for DCEC was obtained from https://github.com/XifengGuo/DCEC. Figure S8 shows the average cluster loss calculated on the original time-series as a function of image size and line width for three different kernel sizes. Generally, a line width of > 4 with an image size greater than 40×40 produced better results than with thinner lines and smaller image sizes. The method was found to be not very sensitive to the kernel size. Note that the lowest clustering loss does not necessarily mean the best results, and therefore we also visually validated that the clusters have distinct characteristics. We selected an image size of 80×80, a kernel size of 3×3 and a line width of 7 as this combination consistently provided low clustering loss over 3 trials ( Figure S9). Image sizes greater than 80×80 did not achieve any better results but increased computational time. Figure 3 shows the t-distributed stochastic neighbor embedding (t-SNE) visualization of the embeddings obtained from DCEC. Clustering assignments from DCEC are reported in Supplemental Table S5.
We are interested in the temporal patterns identified from different clustering methods. Visualization of six cluster centers from the five methods shows how similar (or different) their patterns are ( Figure 4). Even though the clusters centers have relatable patterns across clustering methods, there are some notable differences, which may hold key biological characteristics. Nonetheless, we labeled these clusters as "increase", "decrease", "increasedecrease", "decrease-increase", "late increase" and "late decrease", accordingly to the most dominant trends. We performed pathway enrichment analysis for the five clustering methods separately using Reactome knowledgebase [36,49]. Table 1 lists the number of significant pathways (and number of molecules in parenthesis) enriched with FDR < 0.05, by each method. The highest numbers of pathways are reported in bold while the highest numbers of molecules are reported in Italics. As can be observed, DCEC enriched more pathways in total compared to LSTM-VAE, K-means, HC and PAM. Moreover, DCEC enriched the highest number of significant pathways in most of the clusters in comparison with other four methods. K-means and PAM enriched highest number of pathways for one cluster each, but the differences in comparison with other methods are not substantial.
The temporal patterns of these molecules may be both positively or negatively correlated, as biological homeostasis is commonly maintained by regulatory feedback loops. Therefore, we also merged the complementing clusters together: increase with decrease, increasedecrease with decrease-increase, and late increase with late decrease. We refer to these merged clusters as "one-directional change", "two-directional change" and "late change", respectively. Table 2 shows the number of significant pathways enriched by the merged clusters at an FDR threshold of 0.05. DCEC enriched the highest number of significant pathways followed by LSTM-VAE in total. Among a total of 52 unique pathways enriched by DCEC, we found Mitochondrial Translation Termination and Mitochondrial Translation Elongation in one-directional change cluster. These are the essential cellular pathways of mitochondrial protein synthesis, significantly relevant to the cardiac remodeling and heart failure [53,54]. Furthermore, pathways of programmed cell death (e.g, Apoptotic Execution Phase) and PTEN regulation (e.g., Regulation of PTEN Stability and Activity), which were previously reported as important factors of cardiac remodeling and homeostasis [53,[55][56][57][58], were detected solely by DCEC in two-directional and one-directional change clusters, respectively. Supplemental Tables S6-S8 list the unique pathways enriched by K-means, HC, PAM, LSTM-VAE, and DCEC in one-directional change, two-directional change and late change clusters, respectively. Figure 5 shows the number of shared and unique molecules across different methods for the three merged clusters. The majority of the molecules in each merged cluster are shared in the clustering results of at least three methods. Approximately 8.8% of molecules are unique to each method, that demonstrate its particular way to treat the molecules with less clear pattern.
Lastly, to summarize multiple pathway enrichment analyses, we ranked five clustering methods based on the number of significant pathways in both original and merged clusters. For a given cluster, the method enriching the highest number of pathways was ranked 1, the second highest 2 and so on. If the same number of pathways are enriched for more than one method, an average rank was assigned. Table 3 provides the ranking of the five methods for each individual cluster as well the average rank over nine clusters. Both deep learning methods are ranked higher than the conventional clustering methods. Specifically, DCEC is the highest ranked methods with an average rank of 1.72 and LSTM-VAE is the second highest ranked method with an average rank of 2.77. The average ranks of PAM, HC and Kmeans are 3.16, 3.61 and 3.72 respectively.

Discussion
Temporal multi-omics helps us dissect the dynamic molecular basis of complex diseases, such as CVDs. As molecular processes tend to be regulated in a temporal fashion, we expect a network of molecules to exhibit distinct temporal patterns under a given environmental or experimental condition. However, methodologically it is unclear how to classify heterogeneous molecules in multi-omics data without using their annotations. In this temporal multi-omics study, we applied three well-known clustering algorithms (K-means, HC, and PAM) and two recent approaches based on deep learning (LSTM-VAE and DCEC) to systematically classify responses of proteins and metabolites during cardiac remodeling. Pathway enrichment analysis using Reactome knowledgebase provided an in silico biological validation and comparative evaluation. We found that deep learning approaches of LSTM-VAE and DCEC help reveal more biologically meaningful clusters. Particularly, the clusters from DCEC contain the highest numbers of significant biological pathways.
Unsupervised classification requires a number of hyperparameters, that must be set using domain expertises and computational analyses. Finding the optimal number of meaningful clusters has long been a challenging issue. We explored using a smaller number of clusters than K = 6. However, it was observed that the highly plausible delayed responses were lost in such cases since there were no clusters for late increase and late decrease. In contrast, larger values lead to a greater imbalance in the cluster sizes, where a small cluster likely comprises a pattern specific to limited molecules or even noise. We further investigated the dimensions of intermediate layers and latent vectors, as well as how to generate images based on numeric time-series. However, the possible combinations of hyperparameters are endless and may warrant investigations beyond the scope of this paper. Nonetheless, we suggested a workflow and heuristics for a systematic exploration.
For interpretability and comparison of results across five methods, the clusters were grouped and labeled based on the dominant temporal trends. However, the temporal trends from different methods have subtle differences. Such that some molecules may be classified differently, and clusters may exhibit unique biological characteristics. For example, the 'increase' cluster of DCEC contained much more significant pathways than that of other methods. Many of the molecules unique in this DCEC cluster start to follow increase pattern at day 3 rather than day 1. Most of the other methods classified those molecules to the 'late increase' cluster. On the other hand, many molecules in the 'late increase' and 'late decrease' clusters of DCEC generally have a slow or no change until day 7 or day 10, followed by a sharp change. These differences can also be observed from the cluster centers plotted for each method in Figure 4. Similarly, it can be observed from the figure that clusters corresponding to two-directional change are different for DCEC in comparison with most of other methods. These clusters from DCEC include molecules with an initial sharp change followed by a slow change in the other direction.
Pathway analysis evaluates whether a cluster of molecules may share similar functions, according to their annotations. This in silico biological validation allows us to evaluate unsupervised classification methods, as we expect good clustering would result in biologically meaningful groups. Notably, the individual clusters obtained by DCEC not only contained more enriched pathways than that by other methods, they also represented much more enriched pathways in total. Moreover, both DL approaches generated clusters with higher numbers of significant pathways than three conventional clustering methods, resulting in DCEC and LSTM-VAE ranking the highest in Table 3. Therefore, we expect that DL-based clustering approaches will be highly useful for temporal multi-omics studies. Moreover, as DCEC was originally developed for computer vision, it may be wise for molecular biologists and bioinformaticians to stay multi-disciplinary.
Multi-omics data are often imbalanced due to technological and biological differences. Our integrated dataset comprises approximately 87% proteins and 13% metabolites, such that most of the clusters and their associated pathways are dominated by proteins. Thus, it is of our interest to increase the number of metabolites in subsequent studies to extract more pathways enriched by proteins and metabolites in concert. Similarly, genomics and others on the same mouse model would expand our multi-omics study. Moreover, further biological significance related to CVDs may be understood by consulting other pathway knowledgebases such as KEGG Pathway [48] and SMPDB [61].

Conclusions
We have performed integrative analysis of temporal proteomics and metabolomics data in the context of CVDs using unsupervised deep learning. We utilized both time-series numeric data and scale-free images of the temporal trends obtained from heterogenous molecules. It was found that unsupervised methods based on deep learning generally obtained clusters that are biologically more relevant than those extracted by conventional methods such as Kmeans, HC and PAM. Specifically, we utilized Reactome knowledgebase to identify the number of significantly enriched biological pathways by each clustering method. The clusters obtained from DCEC, a method incorporating deep convolutional layers to effectively learn from images, contained the highest number of significant pathways. DCEC method clusters molecules based on the visual similarity of their temporal trends, demonstrating that deep convolutional learning from images is an effective approach for integrative analysis of temporal multi-omics data. We also provided a systematic approach to image generation for molecules, and found the best combination of line width, image size and kernel size for deep convolutional learning. Due to the generality of the proposed unsupervised deep learning workflow, it can be straightforwardly extended to other complex diseases.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material. Missing values in proteomics and metabolomics datasets are completed by fully conditional specification (FCS)-based multiple imputations. Cubic splines were employed to smooth the temporal trends. Images were also generated based on the temporal trends over 14 days. As a baseline, K-means clustering, hierarchical clustering (HC), partitioning around medoids (PAM)) were employed. In contrast, two deep neural network architectures for unsupervised learning were also implemented. First, long short-term memory (LSTM)-based variational autoencoder (LSTM-VAE) was used to extract low-dimensional embeddings from the timeseries numeric data and K-means clustering was employed on the embeddings. Second, deep convolutional embedded clustering (DCEC) was employed to perform clustering of molecules based on image data. The results obtained from the clustering methods were fed to Reactome knowledgebase for pathway enrichments analyses. Three axes represent the three dimensions of the latent space obtained from LSTM-VAE. Different colors denote six clusters identified by K-means clustering. The ellipsoids represent 80% of the concentration for each cluster.   Cluster centers for the each of the 6 clusters obtained by K-means, HC, PAM, LSTM-VAE and DCECE methods are plotted. Based on the temporal similarities of cluster centers across methods, we labeled the clusters as increase, decrease, increase-decrease, decrease-increase, late increase and late decrease. However, there are some obvious differences, which can be attributed to the differences in learning algorithms and input data types (e.g., time-series numeric or image). In each merged cluster, majority of the molecules are shared by at least three clustering methods. All the methods have some unique molecules in each cluster due to differences in learning algorithm and/or input data type. Hierarchal clustering generally has a high concentration of unique molecules while LSTM-VAE has the least.