Temporal response characterization across individual multiomics profiles of prediabetic and diabetic subjects

Zheng, Minzhang; Piermarocchi, Carlo; Mias, George I.

doi:10.1038/s41598-022-16326-9

Download PDF

Article
Open access
Published: 15 July 2022

Temporal response characterization across individual multiomics profiles of prediabetic and diabetic subjects

Minzhang Zheng^1,2,
Carlo Piermarocchi³ &
George I. Mias^1,2,3

Scientific Reports volume 12, Article number: 12098 (2022) Cite this article

1525 Accesses
5 Citations
6 Altmetric
Metrics details

Subjects

Abstract

Longitudinal deep multiomics profiling, which combines biomolecular, physiological, environmental and clinical measures data, shows great promise for precision health. However, integrating and understanding the complexity of such data remains a big challenge. Here we utilize an individual-focused bottom-up approach aimed at first assessing single individuals’ multiomics time series, and using the individual-level responses to assess multi-individual grouping based directly on similarity of their longitudinal deep multiomics profiles. We used this individual-focused approach to analyze profiles from a study profiling longitudinal responses in type 2 diabetes mellitus. After generating periodograms for individual subject omics signals, we constructed within-person omics networks and analyzed personal-level immune changes. The results identified both individual-level responses to immune perturbation, and the clusters of individuals that have similar behaviors in immune response and which were associated to measures of their diabetic status.

Microbiota in health and diseases

Article Open access 23 April 2022

Genome-wide association studies

Article 26 August 2021

scGPT: toward building a foundation model for single-cell multi-omics using generative AI

Article 26 February 2024

Introduction

The development of novel technologies in personal health monitoring devices, high throughput sequencing and computational methods has generated massive omics data, and provides both a great opportunity and challenge to precision health^1,2,3,4,5. The big data provides plentiful health information ranging from biomolecular, physiological, and environment data to clinical measures. This information helps identify potential deviations from a healthy baseline and improves health risk predictions¹. A big challenge of a big data approach to precision health is how to integrate and understand these multi-dimensional, extremely diverse sources, with highly heterogeneous data³. Early efforts by Chen, Mias, Li-Pook-Than et al. focused on assessing the feasibility of integrated Personal Omics Profiling (iPOP), by utilizing a multiomics integration framework to interpret healthy and diseased states followed through an individual’s blood-based multiomics assessment⁶. More recent efforts by Sara Ahadi et al. revealed personal aging markers by using deep longitudinal profiling⁷, Abdellah Tebani et al. discovered how the personal cohort changes during the wellness period⁸. Environmental effects have also been studied by M. Reza Sailani et al., revealing two biological seasonal patterns in California by multiomics profiling⁹. Wearable sensors have also been used in digitalized health in tracking physiomes and activity¹⁰. Other implementations have used multiomics to monitor the drug responses¹¹. Non-invasive longitudinal saliva multiomics have been recently used by Mias et al. to monitor immune responses in a vaccinated individual¹². Although these efforts have shown the great promise of deep multiomics profiling, the complexity of data presents limitations for practical implementations. Deep multiomics data come from diverse sources, and have different types, sizes and ranges, which complicates comparisons between different individuals’ personal multiomics. In the Pioneer study by Price et al.¹³ dynamic data clouds were used for longitudinal monitoring of individual subjects, in a study that also incorporated behavioral coaching to improve clinical biomarkers. In recent work, Zhou et al.¹⁴ carried out iPOP across multiple individuals, and built correlation networks of molecular associations. However, to the best of our knowledge, direct networks of individuals associated with longitudinal individual deep multiomics profiles have not been constructed.

In this investigation we took an individual-focused approach to categorize personal longitudinal deep multiomics profiles and group individuals into communities, using spectral representations of individual multiomics time series. In taking this individual-focused approach, one of our goals was to perform an analysis closer to clinical applications, where inherently the individual is monitored over time to enable a personal diagnosis. We implemented this approach on personal multiomics profiling data from prediabetic or diabetic individuals (type 2 diabetes mellitus, T2D) at its earliest stage from the study by Zhou et al.¹⁴. We first identified individual-level molecular responses to immune perturbation associated to individual physiological state changes. Based on the individual temporal responses, we built clusters across individuals showing similar trends. The microscopic molecular behavior was linked to phenotypic differences, including body mass index and insulin resistance, with the immune response dominating differences attributed to diabetic status.

Methods

Summary of cohort details and data

The original data used in this analysis comes from the study by Zhou et al.¹⁴, that focused on multiomics characterization of host-microbe dynamics in prediabetics. The measures, SSPG, Matsuda, DI and isrMax, came from the other paper of the same project by Shussler-Fiorenza Rose et al.⁴. All data obtained were made publicly available by the original authors^4,14 as described therein (under Stanford IRB No. 23602), and no additional institutional review board approvals (IRB) were required for this investigation.

The participants had been classified as diabetic or prediabetic in the original study according to one of the following three criteria¹⁴: (i) haemoglobin A1C (A1C; diabetic $\ge 6.5\%>$ prediabetic $\ge 5.7\%$), (ii) fasting glucose (diabetic $\ge$ 126 mg $\hbox {dl}^{-1}>$ prediabetic $\ge 100$ mg dl $^{-1}$) and (iii) based on an annual oral glucose tolerance test (OGTT; diabetic $\ge$ 200 mg $\hbox {dl}^{-1}>$ prediabetic $\ge 140$ mg $\hbox {dl}^{-1}$ at 2 h). The different subjects in the study had highly heterogeneous visit records: some subjects only had one visit record but one subject has more than 150 visit records (time points), which is about 15 times more than the average of 10 visits per subject. Multiple omics were generated from the subjects including blood based transcriptomics, microbiome data (nares/gut), cytokine measurements and clinical measures. To ensure the individual omics profiles had enough time points and time series, we filtered records from individuals so that the number of time points $N_t >= 4$ and the number of omics $N_o > 500$, across participants from the data source¹⁴. We also excluded the subject with the 150+ visits records, as this was not comparable directly to other subjects in our analysis, given the density of points. Our final filtered dataset contained 69 subjects from the original data source. Figure 1a shows the age distribution across sexes for the subjects. We had 34 males and 35 females, and most subjects were older than 50. Summary distributions of the subjects’ observation window are shown in Fig. 1b. The observation window is heterogeneous, ranging from 200 to 1200 days. During the observation window, there were a total of 846 visits with different conditions, including: 486 healthy visits as baseline, 148 visits when subjects got infected, 119 visits had immunization effects, 43 visits with subject weight gain or loss period, 18 visits with subjects on antibiotics, and 32 other healthy conditions, summarized in Fig. 1c. Overall we analyzed 733,425 time series across 5 datasets: 713,874 RNA-sequencing (RNA-seq) data, 3221 clinical measures, 4554 cytokines, 6336 Gut and 5440 Nares measurements. The majority ($>90$%) of the time series comes from RNA-seq, Fig. 1d. As RNA-seq provides a comprehensive and accessible map of the transcriptome, with more omics profiled (by number comparison) compared to other-omes (e.g. proteomes/metabolomes, etc.) we expect that the majority of future omics profiling studies data will follow similar trends (as has been the case so far), though we do expect more microbiome data to emerge, as the host-microbiome interaction is still under considerable investigation.

Data preprocessing

To obtain an individual’s omics profile, we combined all the omics source dataset into a dataframe, then separated into dataframes for each individual. Since our workflow has two branches: single subject analysis and multi-subject similarity analysis, as seen in Fig. 2, the following data preprocessing was carried out for the two branches: (i) For the single subject analysis, we selected the signals with less than 25% time points missing from each individual’s dataframe as the input for single subject analysis, using each individual’s time points as possible measurement points. (ii) For multi-subject analysis, we sorted each individual’s time frame from their dataframes, then combined all individual time frames to get all the possible time points as the common time frame. We then calculated Lomb–Scargle periodograms from each individual’s dataframe using this common time frame as the set of possible measurement points. The transformed dataframe was then used as the input for the multi-subject analysis.

Individual subject analysis

Time series categorization

Individual subject analysis was carried out in Python, with the package PyIOmica utilities for time series categorization¹⁵ (command calculateTimeSeriesCategorization). Briefly, for each subject s, for each omics i its time series X was analyzed over its constituent timepoints. The individual omics intensities at timepoint j were compared to the initial timepoint by subtracting its intensity from all values, $\tilde{X}_{is}(t_j)=X_{is}(t_j)-X_{is}(0)$. The data were then normalized to a time series Q, using the Euclidean norm, so that $Q_{is}(t_j)=\frac{\tilde{X}_{is}(t_j)}{\sqrt{\sum _k\left( \tilde{X}_{is}(t_k)^2\right) }}$.

The algorithm’s classification of signals into trends uses spectral methods, as previously described^12,15. Briefly, for each signal a Lomb–Scargle periodogram is calculated as a list $P_{is}$. The inverse Fourier transform of $P_{is}$ returns a list of autocorrelations, $\{\rho _{isk}\}$, where $k\in \{0,\ldots , N/2\}$ is the lag. In parallel with the original time series signals, a bootstrap set of $10^5$ time series was generated by sampling from the original data with replacement. The autocorrelations at different lags of the bootstrap set were computed to generate an autocorrelation null distribution for each lag from which a set of cutoffs $\{\rho _{ck}\}$ were obtained corresponding to a 0.95 quantile. A time series was then assigned to a class labeled with the lowest Lag l for which the series’ autocorrelation $\rho _{isl}$ is larger than the cutoff, i.e., where $l=\text {Min}\left[ \left\{ j:\rho _{\text {isj}}\ge \rho _{\text {cj}}\right\} \right]$, and $j\in {1,\ldots ,k}$.

If a time series $\tilde{X_{is}}$ does not have autocorrelations that satisfy the cutoff criteria, the algorithm then checked if the series has a pronounced peak or trough at any time point. The time series’ maximum, $\max _{is}=\max \tilde{X_{is}}$, and minimum, $\min _{is}=\min \tilde{X_{is}}$ were compared to $\left\{ \text {max}_{cn},\text {min}_{cn}\right\}$, which are maxima and minima cutoffs from null distributions again computed using the bootstrap time series for all possible time series lengths n. $\tilde{X_{is}}$ is then labeled as a SpikeMax signal if $\max _{is}>\text {max}_{cn}$, or as a SpikeMin if $\min _{is}<\text {min}_{cn}$. A time series that did not meet any of the cutoff criteria was not labeled as having a statistically significant trend. The approach categorizes each omics signal based on its own trend, and is thus not affected by differences in the omics modality, making it directly extensible to incorporating different kinds of time series data¹⁶.

Clustering and heatmaps

After classification, and using PyIOmica’s clusterTimeSeriesCategorization function, we carried out a two-tier hierarchical clustering (agglomerative; complete linkage) for each temporal class, to identify groups (G) and sub-groups (S). The clustering grouping used a similarity based on $\{\rho _{isk}\}$ (for the autocorrelation classification) and $\{Q_{is}\}$ for the second tier. Groups and subgroups were determined using a silhouette algorithm¹⁷. The results were visualized for each subject and every temporal class identified using PyIOmica’s visualizeTimeSeriesCategorization. Example outputs are shown in Fig. 3 and included in the Online Data Files (ODFs). The first tier of clustering aims at capturing the autocorrelation structure, and hence the dominant pattern in the data. The second tier clustering based on the data values can distinguish pattern variations, and particularly sign differences that the autocorrelations (being the inverse Fourier transform of the periodogram frequencies) would not capture¹⁶.

Reactome enrichment analysis

Reactome¹⁸ pathway enrichment analysis was carried out for each Group/Subgroup and each subject using the Reactome application programming interface (API) in PyIOmica. Examples are shown in Table 1 for two subjects, and complete output for all subjects is included in the ODFs.

Across subject comparisons

Individual results aggregation

The individual subject omics that showed statistically significant trends were aggregated to identify consistency across individuals. Signals identified as having statistically significant trends, $FDR < 0.05$, in more than 50% of the individuals, are shown in Table 2.

Network construction

The network analysis was carried out in Python, using networkx¹⁹ and scikit-network²⁰. First the time series periodograms for all the omics time series were computed for all subjects using the LombScargle function in PyIOmica. Next, for pairs of subjects p, q and for each omics time series i, a Spearman correlation matrix $S_i$ was constructed. In parallel, a bootstrap simulation of 50,000 time series was also generated from the data, as a null distribution, and the pairwise Spearman correlations were computed for these as well to determine a Spearman correlation cutoff for significance, $s_c$ at the 0.99 quantile level. Entries were kept that were most correlated to each other, by creating a restricted distance matrix $R_{i}$, such that

$$\begin{aligned}{}[R_{i}]_{p,q} = {\left\{ \begin{array}{ll} 1, \text {if } [S_i]_{p,q} > s_c\\ 0, \text {if } [S_i]_{p,q} \le s_c~. \end{array}\right. } \end{aligned}$$

(1)

A weighted network was then constructed, with the subjects represented as nodes, with an adjacency matrix A, constructed as $A = \sum _i R_i$. The entries p, q of the adjacency matrix represent the connections in the network. A nonzero entry $A_{p,q}$ means there is an edge connecting nodes (subject) p and q. The magnitude of $A_{p,q}$ provides a weight of the edge. In summary, the edges connecting pairs of nodes (subjects) were added to the network if there was at least one omics for which the Spearman correlation between the subjects was greater than $s_c$, and the edges were weighted by the number of omics that met this criterion for each pair of subjects. Thus, the network captures the similarity between subjects, putting more weight on edges connecting subjects with a higher number of similar omics signals.

Network communities calculation

To determine the network community structure, a k-means approach was used. The computation used scikit-network’s clustering.kmeans, applying an embedding, and utilizing singular value decomposition with dimension one. The number of communities (4) was selected based on the elbow and silhouette methods¹⁷, and the sklearn.metrics.silhouette_score²¹ was used for the silhouette scores calculation.

Mann-Whitney U tests

Subject measurements were compared between members of the communities in the network calculations (see above). We used non-parametric Mann–Whitney U tests²², to test for statistically significant pairwise differences across communities (p value < 0.05). The results shown in Table 3 were computed using the scipy.stats.mannwhitneyu Python functionality²³.

Results

Single subject analysis

We used PyIOmica’s¹⁵ spectral methods to classify the time series for each individual’s omics into temporal trends. The objective is to identify sets of omics that show similar temporal behavior that deviates from each individual’s own baseline. The PyIOmica categorization algorithm generates 3 sets of classes from an individual’s omics time series: (i) Lag classes, of time series showing statistically significant autocorrelation at different lags, (ii) Spike Maxima class of time series with no autocorrelation but with positive spikes (high intensity pulses), and (iii) Spike Minima class of time series with no autocorrelation but with negative spikes (low intensity pulses)^15,16. Then within each class, the algorithm separates time series into groups and subgroups based on the autocorrelation patterns and signal intensities. Further details are provided in the “Methods” section.

All analyses and results from each individual’s classifications are provided in the Online Data Files (ODFs). Here we show examples of Lag1 classification for two subjects, ZKFV71L (Female, 66-year old, Prediabetic) and ZTMFN3O (Female, 40-year old, Prediabetic), Fig. 3a and b respectively. The Lag 1 class for subject ZKFV71L has 323 time series, which were assigned to 1 group and 3 subgroups, shown in Fig. 3a left panel. The intensity changes of the 3 subgroups represented the healthy status changes, indicating the systemic immune responses in the subject. Within each subgroup, we created a mean time series, whose intensities of each time point equals to the average intensity of the time series at this time point within the subgroups. These are used to obtain the time points community structure using our visibility-based community detection algorithm²⁴. The communities of each subgroup were related to the health status changes, as indicated in Fig. 3 middle panel (circular visibility graph layouts are shown for each subgroup’s mean time series). The autocorrelation heatmap corresponding to the time series in this category is also shown in Fig. 3 right panel. Similar results were found in subject ZTMFN30, Fig. 3b. The results for all other subjects, including corresponding omics in each class and group/subgroup classifications, and visualizations are all available in the ODFs.

Once sets of omics that show similar profiles in time are identified, we can assess the biological significance of these temporal associations. Following classification, we carried out Reactome pathway¹⁸ enrichment analysis for the genes that showed statistically significant trends for each subject. The statistically significant (False Discovery Rate, FDR < 0.05) pathways results for subject ZKFV71L and ZTMFN3O for the autocorrelation Lag 1 class are shown in Table 1. The over-representation for subject ZKFV71L included Endosomal/Vacuolar pathway (14 genes), Antigen Presentation: Folding, assembly and peptide loading of class I MHC (15 genes), Interferon alpha/beta signaling (16 genes), ER-Phagosome pathway (14 genes), Interferon gamma signaling (16 genes), Antigen processing-Cross presentation (14 genes), Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell (15 genes), Interferon Signaling (16 genes), Class I MHC mediated antigen processing and presentation (17 genes) and Adaptive Immune System (23 genes), etc. These Reactome pathways indicated immune responses of this subject (ZKFV71L) corresponding to the health status change from Healthy to Immunization. Similarly, we also found statistically significant Reactome pathways for subject ZTMFN3O, including: Cellular response to heat stress (7 genes), Cellular responses to stress (12 genes), Cellular responses to stimuli (12 genes), Interleukin-10 signaling (7 genes), Signaling by Interleukins (13 genes) and Cytokine Signaling in Immune system (14 genes). These pathways are also indicative of an immune response of this subject (ZTMFN3O) corresponding in this case to a health status change from Healthy to Infection. Reactome Pathway enrichment analyses for all subjects including all subgroups are included in the ODFs.

Multi-subject similarity analysis

Based on the individual results, we first aggregated the omics that showed statistically significant trends in each individual to identify the signals that are consistent across the majority of individuals ($>50$% of subjects, number of occurrences $\ge 35$), included in Table 2. We found that high frequency signals came from 3 data sources: cytokines, nares and gut microbiome. Many studies have now shown that cytokines have a profound relationship with type 2 diabetes^25,26,27, our findings are consistent with these previous works and provide potential biomakers for type 2 diabetes (see also Discussion).

To further investigate common responses across individuals, we created a multi-subject similarity network. The network was constructed by comparing the Spearman correlation of the spectral time series representation (periodogram) across common omics between every pair of subjects. The network, shown in Fig. 4a, has nodes corresponding to the 69 subjects, with weighted edges corresponding to the number of omics that showed similar temporal behavior. We used a k-means algorithm to calculate communities of the network (see “Methods” section for details). Four communities were identified, denoted as Community 0, Community 1, Community 2 and Community 3. Community 0 has 28 individuals, 15 Females and 13 Males, ages from 29 to 69, with disease status including 19 Prediabetic, 1 Diabetic, 5 Crossover and 3 Control. Community 1 has 16 individuals, 6 Females and 10 Males, ages from 33 to 75, 12 Prediabetic, 1 Crossover and 3 Control. Community 2 has 15 individuals, 9 Females and 6 Males, age from 39 to 67, 13 Prediabetic, 2 Crossover. Community 3 has 10 individuals, 5 Females and 5 Males, age from 39 to 70, 6 Prediabetic, 2 Crossover and 2 Control. We then compared clinical measures between the subjects in the community, including body mass index (BMI), disposition index (DI), steady-state plasma glucose (SSPG), Matsuda index (Matsuda), and maximum insulin secretion rate (isrMax). The violin plots show group separated by community and sex, Fig. 4b–f. These 5 distribution figures qualitatively indicate that the Community 0, Community 1 and Community 2 have similar distribution but have large differences with Community 3. We also found that the female and male subjects have different distributions, even within the same community for BMI, DI and isrMax measures. We carried out non-parametric Mann–Whitney U tests^22,28 to compare across the different communities for statistical significance (p value < 0.05), with the results summarized in Table 3. The 5 measures do not show statistical significant differences between Community 0 and Community 1,Community 0 and Community 2, and Community 1 and Community 2. The BMI and SSPG distributions between Community 0 and Community 3, and between Community 2 and Community 3 have statistically significant differences, the BMI distributions between Community 1 and Community 3 also have statistically significant differences, indicating that the subjects in Community 3 have statistical difference in physiological states comparing with Community 0, Community 1 and Community 2. Differences in Male vs. Females in the comparisons, particularly for BMI, DI and isrMax, suggest that even though overall subjects in these three communities may have similar physiological states and responses, females and males still display different physiological states (though the low number of subjects is affecting power in further breaking down of community differences).

We next ranked the omics in each community (represented by the weighted edges), by their frequencies of occurrence. We then carried out Reactome Pathway Enrichment analysis for the top 25% ranked genes of each community (to reduce noise effects from low frequency genes). The statistically significant pathway results (FDR < 0.05) are shown in Table 4. Community 0 has 16 statistical significant Reactome pathways, related to immune response. For Community 2 the statistically significant Reactome pathways include TRKA activation by NGF, DDX58/IFIH1-mediated induction of interferon-alpha/beta, Activation of TRKA receptors, and Ca2+ activated K+ channels. Community 1 and Community 3 did not have statistically significant Reactome results. The Reactome pathway Enrichment analysis, along with the subjects’ BMI, DI, SSPG, Matsuda and isrMax distributions differences indicate that these affect temporal immune responses in different detected Communities, and also have sex differences (BMI and SSPG) between two of the identified communities (Community 0 and Community 3).

Result summary

In summary, our results separate individuals with different physiological conditions and immune response based on the multiomics time series similarity of individual profiling analyses. We can detect changes in each individual compared to their own baseline that coincide with immune responses in each subject, and correspond to the individual time of onset per individual. The groups of genes discovered are associated with immune-relevant pathways, including antigen processing and presentation, interferon signaling and interleukin signaling (e.g., Table 1). The trends detected are consistently detected across multiple individuals (Table 2), and include multiple cytokines (IP10, IL13, PDGFBB, TGFA, IL27, TGFB, MCP1, GCSF, MIG, IL22, CD40L, VEGF, SCF and EGF) as well as bacterial signals (with top results by occurrence in the highest number of individuals including genus Streptococcus in both nares and gut and Blautia in gut, classes Gammaproteobacteria and Betaproteobacteria in nares, family Streptococcaceae in both nares and gut, and phyla Proteobacteria in gut and Bacteroidetes in nares, and order Lactobacillales in gut). Finally, a network with nodes representing subjects and weighted edges omics with similar temporal behavior across individuals, shown in Fig. 4, separated the individuals in 4 communities, with statistically significant differences (p value < 0.05) detected in BMI, SSPG, DI and isrMax, including sex differences, Table 3. The network communities detected involve different pathways for the top genes (highest weight edges) with similar behavior between subjects, Table 4. This involves in one group of subjects (Community 0) several immune responses (antigen processing and presentation, interferon and cytokine signaling, and several pathways also identified in SARS-CoV2-2 immune responses, as multiple immune-related genes are involved), and different pathways in a second group (Community 2, including TRKA activation by NGF, DDX58/IFIH1-mediated induction of interferon-alpha/beta, Activation of TRKA receptors and Ca2+ activated K+ channels).

Table 1 Statistically significant (FDR < 0.05) Reactome pathways results for subject ZKFV71L and ZTMFN3O for autocorrelation Lag 1.

Full size table

Table 2 Frequency of signals with statistically significant temporal trends in individuals.

Full size table

Table 3 Mann–Whitney U test for different measures between two different communities.

Full size table

Table 4 Statistically significant (FDR < 0.05) Reactome pathways results of top quartile highest genes for Communities 0 and 2.

Full size table

Discussion

In this manuscript, we applied spectral methods to analyze multiomics individual profiles from public data for 69 individuals. Our goals were to take an individual-focused approach and: (i) detect molecular-level deviations from each individual’s own baseline in response to dynamic changes in their physiological states, and (ii) build on the individual results to compare across subjects in a bottom-up approach and identify common molecular signatures. We generated periodograms for individual subject omics time series categorization, constructed within-person omics networks and detected personal-level immune changes corresponding to the individual’s physiological state changes. We identified similar individual-level responses to immune perturbation across multiple subjects. We then used the periodograms across subjects to identify network clusters of individuals with similarities across their common omics temporal patterns. The multi-individuals’ similarity network revealed different communities within which the molecular behavior was linked to phenotypic differences, including body mass index and insulin resistance, with the immune response dominating differences attributed to diabetic status, Table 3.

Our results are consistent with and extend previous research that has reported several cytokines with important roles in type 2 diabetes development, including cytokines from our findings across individuals in Table 2, with some examples highlighted below. Elevated concentrations of IP10 (CXCL10) have been reported in type 2 diabetes, and are associated with higher diabetes risk^29,30. The IL13 pathway is a potential therapeutic target for glycemic control in type 2 diabetes³¹. IL27 has been implicated in insulin resistance in genome wide association studies³². Wang et al. had reported the pathogenic role of IL27, using diabetic NOD mice to investigate T-cell mediated autoimmune diabetes³³, but recently the role of IL27-IL27R$\alpha$ in promoting adipocyte thermogenesis has been investigated in the context of treating insulin resistance³⁴. Decreased plasma IL22 level was found to be a potential trigger of impaired fasting glucose and type 2 diabetes, in a retrospective study of Han Chinese subjects³⁵. PDGFBB is reported as associated with type 2 diabetes mellitus and complications^36,37. TGFA³⁸ and TGFB³⁹ have shown a pathologic contribution in diabetic kidney disease. TGFB is also reported associated with type 2 diabetic nephropathy⁴⁰. ALKP has been investigated as an independent predictor for diabetes incidence^41,42. MCP1 has been found significantly increased in patients with type 2 diabetes⁴³. Furthermore, through rat studies GCSF has been reported as a potential novel therapeutic drug in early diabetic nephropathy patients⁴⁴. The involvement of MIG (CXCL9) in the progression of type 2 diabetes nephropathy has been reported⁴⁵. CD40-CD40L has been associated with type 2 diabetes mellitus⁴⁶, VEGF is involved in the pathogenesis of diabetic complications⁴⁷, c-Kit and its ligand, stem cell factor (SCF) have been reported as a potential novel target for treating diabetes⁴⁸. Finally, chemotactic cytokines, including eosinophil chemotactic factors (ECFs), have been shown to be related to type 2 diabetes mellitus⁴⁹. In our results, the cytokines above had high occurrence rates with statistically significant trends across the diabetic and prediabetic individuals. The findings suggest that our approach has potential application in clinical trials to identify disease biomakers and treatment targets.

In our analysis we used unsupervised methods to classify time-resolved trends in each subject, as well as to construct the network and identify the communities that showed differences in different diabetes-relevant measures (BMI, SSPG, DI and isrMax), Fig. 4 and Table 3. As the analysis did not depend on prior knowledge of the prediabetes/diabetes status of the subject, and given that the prediabetes-diabetes distinction is a rather continuous spectrum this suggests that potential pathophysiological signals are playing a part, and merits further future investigation.

Our study has limitations: In the current study we focused on immune changes and perturbations as these were available in the source data. As there are multiple other pathogenic factors that contribute to T2D, the data do not offer a complete picture and will need to be supplemented with additional followup studies. The data used in this investigation contained fine-grained omics data, and comparatively crude phenotypic data. Future investigations can address this at the outset of the experimental design prior to data collection. Furthermore, we have focused on immune markers, as these were the perturbation data available. To comprehensively study T2D more kinds of perturbations from baseline need to be investigated, including other disorders. Furthermore, the transcriptomic data used were generated from bulk RNA-sequencing, and hence do not allow for cell-type specific analyses, which are important is evaluating T2D. We expect that cell-type-specific studies with longitudinal data will become more prevalent with the recent focus on single-cell RNA-sequencing approaches. In terms of the data, there is uneven sampling (inherent in any real-world/subject based study), which we have addressed with our approach, as well as having different lengths of time series across individuals. Also, the RNA-seq data modality dominates in terms of number of omics. Our approach is robust as it analyzes single omics signals when assigning autocorrelation classes, irrespective of modality and hence is not affected by the unevenness between modality sizes. In the dendrogram constructions of determining groups and subgroups, it is possible that the trends are dominated by the large RNA-seq data, and potentially overfitting to a particular trend (i.e. missing subtle trend variation). Still, in this unsupervised clustering approach any distinct singleton omics patterns will be displayed.

Furthermore, environmental measures are not included and the low number of participants does not allow for a nuanced analysis of heterogeneity across subjects. For example, different subjects received different treatments which are analyzed in bulk. In terms of broad applicability of data collection and analysis, the data are obtained using invasive approaches (at least for blood components), which should improve with non-invasive transcriptomic mapping, such as using saliva. While our analysis included microbiome data (nares and gut), their association with transcriptome results, the cytokines and other measurements noted in Table 2, and their mechanistic role still requires further model-based experimentation. Finally, time series analyses with multiomics are an evolving area of research. We currently do not have T2D datasets that can be used for validation of findings. We anticipate that more studies will be generated that will also provide the necessary data to enable validation beyond the discovery approach in this investigation. Such data will also allow us to evaluate different methodologies, as there are many approaches to time-series analysis that we intend to continue investigating. In this investigation we have used the spectral methods as a first approach to detect data patterns in an unsupervised analysis. The spectral methods aid in addressing missing data/uneven sampling while maintaining the statistical properties of the data, and provide a streamlined pattern recognition based on frequency/autocorrelation behavior which is often used in time series analyses. Future work can also employ different approaches, especially for longer time series and with higher sampling rates to aid in the development of differential equation models of departures from signal baselines in response to perturbations, that provide more mechanistic interpretations of T2D dynamics.

In this manuscript we performed an individual-focused analysis of a prediabetes study cohort^4,14, which revealed insights from this longitudinal dataset and can lead to actionable health discoveries, providing relevant information for precision health monitoring. Our approach is the first, to our knowledge, with a temporal bottom-up/individual focus. Starting from individual microscopic measurements (molecular level omics), intra-subject immune responses can be characterized. Then, building on the individual responses, a macroscopic inter-subject temporal clustering of subjects based on temporal similarity provides information on how immune responses can be related to diabetic states, consistent with and supplementing the original work.

In summary, our findings utilize personal temporal omics to identify collective responses across individuals associated with macroscopic characteristics, and provide an approach that can potentially help predict disease responses and outcomes towards clinical implementations. The approach is non-disease specific, and extensible: any time signal measurement can be incorporated and any number of individuals can be compared spanning periods of individualized wellness and departures therefrom. Additionally, real-world application limitations of missing data and uneven sampling can be addressed. Beyond identifying trends within an individual, expanding to larger cohorts can eventually provide individual temporal signatures of specific disease onset. Such temporal disease signatures can be used to train models for monitoring departures from a healthy baseline, towards prevention or minimally timely treatments. Coupling multi-timepoint monitoring with non-invasive sampling (e.g. saliva¹²) can help eventually provide affordable population-level precision health.

Data availability

The original data analyzed in this investigation were made publicly available by Zhou et al.¹⁴ and Schussler-Fiorenza Rose et al.⁴ as described therein, on https://med.stanford.edu/ipop.html. All data files used in this investigation, including original code and results files have been released on Github (https://github.com/gmiaslab/TemporalMultiomicsDiabetes), and also deposited on Zenodo (https://doi.org/10.5281/zenodo.6751960), and are referred to as Online Data Files (ODFs) in the manuscript.

References

Gambhir, S. S., Ge, T. J., Vermesh, O. & Spitler, R. Toward achieving precision health. Sci. Transl. Med.https://doi.org/10.1126/scitranslmed.aao3612 (2018).
Article PubMed PubMed Central Google Scholar
Snyder, M. & Zhou, W. Big data and health. Lancet Digit Health 1, e252–e254. https://doi.org/10.1016/S2589-7500(19)30109-8 (2019).
Article PubMed Google Scholar
Kellogg, R. A., Dunn, J. & Snyder, M. P. Personal omics for precision health. Circ. Res. 122, 1169–1171. https://doi.org/10.1161/CIRCRESAHA.117.310909 (2018).
Article CAS PubMed Google Scholar
Schussler-Fiorenza Rose, S. M. et al. A longitudinal big data approach for precision health. Nat. Med. 25, 792–804. https://doi.org/10.1038/s41591-019-0414-6 (2019).
Article CAS PubMed Google Scholar
Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 83. https://doi.org/10.1186/s13059-017-1215-1 (2017).
Article CAS PubMed PubMed Central Google Scholar
Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307. https://doi.org/10.1016/j.cell.2012.02.009 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ahadi, S. et al. Personal aging markers and ageotypes revealed by deep longitudinal profiling. Nat. Med. 26, 83–90. https://doi.org/10.1038/s41591-019-0719-5 (2020).
Article CAS PubMed PubMed Central Google Scholar
Tebani, A. et al. Integration of molecular profiles in a longitudinal wellness profiling cohort. Nat. Commun. 11, 4487. https://doi.org/10.1038/s41467-020-18148-7 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Sailani, M. R. et al. Deep longitudinal multiomics profiling reveals two biological seasonal patterns in California. Nat. Commun. 11, 4933. https://doi.org/10.1038/s41467-020-18758-1 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, X. et al. Digital health: Tracking physiomes and activity using wearable biosensors reveals useful health-related information. PLoS Biol. 15, e2001402. https://doi.org/10.1371/journal.pbio.2001402 (2017).
Article CAS PubMed PubMed Central Google Scholar
Tasaki, S. et al. Multi-omics monitoring of drug response in rheumatoid arthritis in pursuit of molecular remission. Nat. Commun. 9, 1–12. https://doi.org/10.1038/s41467-018-05044-4 (2018).
Article CAS Google Scholar
Mias, G. I. et al. Longitudinal saliva omics responses to immune perturbation: A case study. Sci. Rep. 11, 710. https://doi.org/10.1038/s41598-020-80605-6 (2021).
Article CAS PubMed PubMed Central Google Scholar
Price, N. D. et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat. Biotechnol. 35, 747–756. https://doi.org/10.1038/nbt.3870 (2017).
Article CAS PubMed PubMed Central Google Scholar
Zhou, W. et al. Longitudinal multi-omics of host-microbe dynamics in prediabetes. Nature 569, 663–671. https://doi.org/10.1038/s41586-019-1236-x (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Domanskyi, S., Piermarocchi, C. & Mias, G. I. PyIOmica: Longitudinal omics analysis and trend identification. Bioinformatics 36, 2306–2307. https://doi.org/10.1093/bioinformatics/btz896 (2019).
Article CAS PubMed Central Google Scholar
Mias, G. I. et al. MathIOmica: An integrative platform for dynamic omics. Sci. Rep. 6, 37237. https://doi.org/10.1038/srep37237 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7 (1987).
Article MATH Google Scholar
Croft, D. et al. Reactome: A database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697. https://doi.org/10.1093/nar/gkq1018 (2010).
Article CAS PubMed PubMed Central Google Scholar
Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using networkx. In Varoquaux, G., Vaught, T. & Millman, J. (eds.) Proceedings of the 7th Python in Science Conference 11–15 (Pasadena, CA USA, 2008). https://conference.scipy.org/proceedings/scipy2008/paper_2/ (Last checked June 26, 2022).
Bonald, T., de Lara, N., Lutz, Q. & Charpentier, B. Scikit-network: Graph analysis in python. J. Mach. Learn. Res. 21, 1–6 (2020).
MATH Google Scholar
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
MathSciNet MATH Google Scholar
Mann, H. B. & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60. https://doi.org/10.1214/aoms/1177730491 (1947).
Article MathSciNet MATH Google Scholar
Virtanen, P. et al. SciPy 1.0: Fundamental algorithms for scientific computing in python. Nat. Methods 17, 261–272. https://doi.org/10.1038/s41592-019-0686-2 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zheng, M., Domanskyi, S., Piermarocchi, C. & Mias, G. I. Visibility graph based temporal community detection with applications in biological time series. Sci. Rep. 11, 1–12. https://doi.org/10.1038/s41598-021-84838-x (2021).
Article CAS Google Scholar
Randeria, S. N., Thomson, G. J., Nell, T. A., Roberts, T. & Pretorius, E. Inflammatory cytokines in type 2 diabetes mellitus as facilitators of hypercoagulation and abnormal clot formation. Cardiovasc. Diabetol. 18, 1–15. https://doi.org/10.1186/s12933-019-0870-9 (2019).
Article CAS Google Scholar
Dovio, A. & Angeli, A. Cytokines and type 2 diabetes mellitus. JAMA 286, 2233. https://doi.org/10.1001/jama.286.18.2233 (2001).
Article CAS PubMed Google Scholar
Miranda, T. S. et al. The ratios of pro-inflammatory to anti-inflammatory cytokines in the serum of chronic periodontitis patients with and without type 2 diabetes and/or smoking habit. Clin. Oral Invest. 23, 641–650. https://doi.org/10.1007/s00784-018-2471-5 (2019).
Article Google Scholar
Fay, M. P. & Proschan, M. A. Wilcoxon–Mann–Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Stat. Surv. 4, 1–39. https://doi.org/10.1214/09-SS051 (2010).
Article MathSciNet PubMed PubMed Central MATH Google Scholar
Pan, X., Kaminga, A. C., Wen, S. W. & Liu, A. Chemokines in prediabetes and type 2 diabetes: A meta-analysis. Front. Immunol. 12, 934. https://doi.org/10.3389/fimmu.2021.622438 (2021).
Article CAS Google Scholar
Herder, C. et al. Chemokines as risk factors for type 2 diabetes: Results from the Monica/Kora Augsburg study, 1984–2002. Diabetologia 49, 921–929. https://doi.org/10.1007/s00125-006-0190-y (2006).
Article CAS PubMed Google Scholar
Stanya, K. J. et al. Direct control of hepatic glucose production by interleukin-13 in mice. J. Cln. invest.https://doi.org/10.1172/JCI64941 (2012).
Article Google Scholar
Vargas-Alarcon, G. et al. Interleukin 27 polymorphisms, their association with insulin resistance and their contribution to subclinical atherosclerosis. The GEA Mexican study. Cytokine 114, 32–37. https://doi.org/10.1016/j.cyto.2018.11.028 (2019).
Article CAS PubMed Google Scholar
Wang, R. et al. The pathogenic role of interleukin-27 in autoimmune diabetes. Cell. Mol. Life Sci. 65, 3851–3860. https://doi.org/10.1007/s00018-008-8540-1 (2008).
Article CAS PubMed Google Scholar
Wang, Q. et al. Il-27 signalling promotes adipocyte thermogenesis and energy expenditure. Naturehttps://doi.org/10.1038/s41586-021-04127-5 (2021).
Article PubMed PubMed Central Google Scholar
Shen, J., Fang, Y., Zhu, H. & Ge, W. Plasma interleukin-22 levels are associated with prediabetes and type 2 diabetes in the Han Chinese population. J. Diabetes Invest. 9, 33–38. https://doi.org/10.1111/jdi.12640 (2018).
Article CAS Google Scholar
Shen, S., Wang, F., Fernandez, A. & Hu, W. Role of platelet-derived growth factor in type ii diabetes mellitus and its complications. Diab. Vasc. Dis. Res. 17, 1479164120942119. https://doi.org/10.1177/1479164120942119 (2020).
Article PubMed PubMed Central Google Scholar
Yeboah, J., Sane, D. C., Crouse, J. R., Herrington, D. M. & Bowden, D. W. Low plasma levels of FGF-2 and PDGF-BB are associated with cardiovascular events in type II diabetes mellitus (diabetes heart study). Dis. Markers 23, 173–178. https://doi.org/10.1155/2007/962892 (2007).
Article CAS PubMed PubMed Central Google Scholar
Heuer, J. G. et al. Role of TGF-alpha in the progression of diabetic kidney disease. Am. J. Physiol.-Renal Physiol. 312, F951–F962. https://doi.org/10.1152/ajprenal.00443.2016 (2017).
Article CAS PubMed Google Scholar
Qiao, Y.-C. et al. Changes of transforming growth factor beta 1 in patients with type 2 diabetes and diabetic nephropathy: A prisma-compliant systematic review and meta-analysis. Medicinehttps://doi.org/10.1097/MD.0000000000006583 (2017).
Article PubMed PubMed Central Google Scholar
Mou, X. et al. Serum TGF-$\beta$1 as a biomarker for type 2 diabetic nephropathy: A meta-analysis of randomized controlled trials. PLoS ONE 11, e0149513. https://doi.org/10.1371/journal.pone.0149513 (2016).
Article CAS PubMed PubMed Central Google Scholar
Chen, S.C.-C. et al. Liver fat, hepatic enzymes, alkaline phosphatase and the risk of incident type 2 diabetes: A prospective study of 132,377 adults. Sci. Rep. 7, 1–9. https://doi.org/10.1038/s41598-017-04631-7 (2017).
Article CAS Google Scholar
Malo, M. S. A high level of intestinal alkaline phosphatase is protective against type 2 diabetes mellitus irrespective of obesity. EBioMedicine 2, 2016–2023. https://doi.org/10.1016/j.ebiom.2015.11.027 (2015).
Article PubMed PubMed Central Google Scholar
Panee, J. Monocyte chemoattractant protein 1 (MCP-1) in obesity and diabetes. Cytokine 60, 1–12. https://doi.org/10.1016/j.cyto.2012.06.018 (2012).
Article CAS PubMed PubMed Central Google Scholar
So, B.-I. et al. G-CSF prevents progression of diabetic nephropathy in rat. PLoS ONE 8, e77048. https://doi.org/10.1371/journal.pone.0077048 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Higurashi, M. et al. Increased urinary levels of CXCL5, CXCL8 and CXCL9 in patients with type 2 diabetic nephropathy. J. Diabetes Complic. 23, 178–184. https://doi.org/10.1016/j.jdiacomp.2007.12.001 (2009).
Article Google Scholar
Seijkens, T., Kusters, P., Engel, D. & Lutgens, E. Cd40-cd40l: Linking pancreatic, adipose tissue and vascular inflammation in type 2 diabetes and its complications. Diab. Vasc. Dis. Res. 10, 115–122. https://doi.org/10.1177/1479164112455817 (2013).
Article CAS PubMed Google Scholar
Zhang, Q. et al. VEGF levels in plasma in relation to metabolic control, inflammation, and microvascular complications in type-2 diabetes: A cohort study. Medicinehttps://doi.org/10.1097/MD.0000000000010415 (2018).
Article PubMed PubMed Central Google Scholar
Feng, Z.-C., Riopel, M., Popell, A. & Wang, R. A survival kit for pancreatic beta cells: Stem cell factor and c-kit receptor tyrosine kinase. Diabetologia 58, 654–665. https://doi.org/10.1007/s00125-012-2566-5 (2015).
Article CAS PubMed Google Scholar
Sell, H. & Eckel, J. Chemotactic cytokines, obesity and type 2 diabetes: In vivo and in vitro evidence for a possible causal correlation? Symposium on ‘Frontiers in Adipose Tissue Biology’. Proc. Nutr. Soc. 68, 378–384. https://doi.org/10.1017/S0029665109990218 (2009).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This work was supported by the Translational Research Institute for Space Health through NASA Cooperative Agreement NNX16AO69A (project T0412). CP acknowledges support by NIH R01GM122085.

Author information

Authors and Affiliations

Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
Minzhang Zheng & George I. Mias
Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
Minzhang Zheng & George I. Mias
Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
Carlo Piermarocchi & George I. Mias

Authors

Minzhang Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Carlo Piermarocchi
View author publications
You can also search for this author in PubMed Google Scholar
George I. Mias
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization, M.Z., C.P. and G.I.M.; Methodology, M.Z., C.P. and G.I.M.; Software, M.Z. and G.I.M.; Investigation, M.Z., C.P. and G.I.M.; Visualization, M.Z.; Resources, M.Z. and G.I.M.; Writing—Original Draft, M.Z. and G.I.M.; Writing—Review and Editing, M.Z., C.P. and G.I.M.; Funding Acquisition, C.P. and G.I.M.; Supervision, C.P. and G.I.M.

Corresponding author

Correspondence to George I. Mias.

Ethics declarations

Competing Interests

C.P. owns equity in Salgomed, Inc. G.I.M. has consulted for Colgate Palmolive North America. M.Z. declares the absence of any commercial or financial relationships that could be construed as a potential competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Zheng, M., Piermarocchi, C. & Mias, G.I. Temporal response characterization across individual multiomics profiles of prediabetic and diabetic subjects. Sci Rep 12, 12098 (2022). https://doi.org/10.1038/s41598-022-16326-9

Download citation

Received: 23 February 2022
Accepted: 08 July 2022
Published: 15 July 2022
DOI: https://doi.org/10.1038/s41598-022-16326-9

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Microbiota in health and diseases

Genome-wide association studies

scGPT: toward building a foundation model for single-cell multi-omics using generative AI

Introduction

Methods

Summary of cohort details and data

Data preprocessing

Individual subject analysis

Time series categorization

Clustering and heatmaps

Reactome enrichment analysis

Across subject comparisons

Individual results aggregation

Network construction

Network communities calculation

Mann-Whitney U tests

Results

Single subject analysis

Multi-subject similarity analysis

Result summary

Discussion

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing Interests

Additional information

Publisher's note

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links